
AI Usage Playbook — Development

Practical guidance on AI-assisted software development — for backend and frontend developers.

Core Principles: All recommendations in this playbook align with Xebia's official Core Principles for Working with AI. Refer to that document for the foundational rules that govern every AI interaction at Xebia.


How This Playbook Is Organized

This playbook focuses on development methodology and workflows. For AI tools, models, costs, security, and privacy, see the AI Common Playbook — it covers everything that applies across all roles.

Part I — Methodology and Practices covers development-specific techniques: coding best practices, task management, spec-driven development, quality metrics, and use cases. Come back to Part I when you need to refine your workflow or explore advanced patterns.

Part II — Quick Start is a step-by-step guide to agentic development. It walks you through a concrete workflow for coding with an AI agent, from preparing your project all the way to committing reviewed code. The Quick Start uses Claude Code as the primary tool, GitHub for version control, and the agent's built-in planning capabilities — no additional spec-driven development frameworks required. It gives you one clear path you can follow today and adapt to other tools, frameworks, and methodologies once you have a working baseline.

Part I covers the methodology. Part II (Quick Start) puts it into practice on a real task.


Part I — Methodology and Practices

Introduction

Use this part as a reference — for refining your AI-assisted workflow, adopting spec-driven development, or going deeper on any topic covered in the Quick Start.


1. Best Practices for AI Coding Assistants

These practices apply regardless of which tool you use.

1. Invest in context engineering

The highest-leverage investment is providing excellent context. This means maintaining project-level instruction files that describe your architecture, coding standards, approved tech stack, and conventions. Keep these files updated and version-controlled.

Deep dive: Anthropic's engineering team published two essential articles on this topic: Effective context engineering for AI agents explains why context management has surpassed prompt writing as the critical skill, and Effective harnesses for long-running agents covers strategies for maintaining quality across multi-session projects.

Project instruction files — each tool has its own configuration file format. See AI Common Playbook, Section 3 (Project Instruction Files) for the full reference table and loading behavior.

# Example: CLAUDE.md (or equivalent)
## Architecture
- Microservices with event-driven communication via RabbitMQ
- CQRS pattern with PostgreSQL + Marten for event sourcing
- .NET 8, C# 12, ASP.NET Core

## Coding Standards
- PascalCase for public methods, _camelCase for private fields
- Async/await for all I/O, suffix async methods with "Async"
- Constructor injection, guard clauses at method start
- XML documentation for all public APIs

## Constraints
- No GPL dependencies in production code
- API responses must be <200ms at p95
- All external dependencies require ADR approval

2. Start with small, focused tasks

Don't ask an agent to build an entire feature in one go. Break work into focused steps: data model → repository → service layer → API endpoints → tests. Review and validate at each step.

3. Use approval modes wisely

Most agents offer tiered permission levels. For production codebases, prefer modes that require explicit approval before the agent runs commands or writes to files outside your working directory. Increase autonomy only as you build trust and establish guardrails — but know that guardrails are not airtight. Attackers can bypass them through jailbreaking, context drift, and semantic inconsistency.

4. Review everything critically

AI-generated code can look plausible while containing subtle bugs, security holes, or performance issues. Apply the same rigor to AI-generated code that you would to any code review. Never commit code you don't understand.

5. Maintain version control discipline

Make small, atomic commits with descriptive messages. Never bulk-commit large AI outputs without understanding each change. Your git history should tell a coherent story.
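A sketch of the discipline in practice — each logical change becomes its own commit with a message that explains the why (the file names and messages here are hypothetical):

```shell
# Throwaway repo purely for illustration
cd "$(mktemp -d)" && git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# Suppose an agent session produced two logical changes
echo "export class User {}" > user-model.ts
echo "export class UserRepository {}" > user-repository.ts

# Commit each change separately instead of one bulk commit
git add user-model.ts
git commit -q -m "feat(users): add User data model"
git add user-repository.ts
git commit -q -m "feat(users): add UserRepository for persistence"

git log --oneline   # history now tells the story change by change
```

When the agent touched several concerns inside one file, `git add -p` lets you stage individual hunks so each commit stays atomic.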

Task Management and Planning Strategies

For complex projects, consider these agent-compatible planning approaches:

Plan → Implement → Verify loop — Before coding, ask the agent to create a plan. Review the plan, then authorize implementation. After implementation, run tests and verification. This three-phase approach catches issues early. Most major tools now have a dedicated plan mode for this: Copilot's Plan mode (select from agents dropdown or press Shift+Tab in CLI), Claude Code's Plan mode (Shift+Tab x 2 or --permission-mode plan), and Cursor's agent planning step.
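A plan-phase prompt can stay short; the point is forcing the plan out before any code. The feature named here is hypothetical:

```
Create an implementation plan for adding rate limiting to our public API.
Do NOT write any code yet. The plan should cover:
1. Which components change and why
2. The order of implementation steps
3. What tests will verify each step
4. Open questions that need my decision before you start
```

Review the plan, answer the open questions, then authorize implementation.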

Cross-model validation for critical plans — For high-stakes features, consider validating an agent's implementation plan with a second model before writing any code. After your primary agent (e.g., Claude Sonnet) produces a plan, feed that plan to a reasoning-focused model (e.g., o1/o3, Claude with extended thinking) with the prompt: "Find flaws, edge cases, and missing requirements in this implementation plan." This is not necessary for routine work, but for complex features with significant business impact, a five-minute cross-model review can catch architectural blind spots that a single model's planning step might miss. It's a lightweight second opinion, not a mandatory gate.

Multi-agent parallelism — Tools like Claude Code (sub-agents), Codex (cloud tasks), and Cursor (background agents) let you run multiple agents in parallel on independent tasks. Use this for large refactors where tasks don't have heavy dependencies on each other. When tasks are sequential or interdependent, parallelism won't work — instead, create a tasks.md file with a clear checklist and work through it across focused sessions, manually tracking progress.
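For sequential work, the tasks.md file doesn't need ceremony — a checklist the agent (and you) update as sessions progress is enough. A sketch, reusing the CSV import example from later in this playbook (task contents are illustrative):

```markdown
# Tasks: CSV Import (see spec.md)

- [x] 1. Add ImportJob entity and EF Core migration
- [x] 2. Implement CsvParser with schema validation
- [ ] 3. Wire up Hangfire background job for processing
- [ ] 4. Add SignalR progress notifications
- [ ] 5. Error report generation and download endpoint

Each task = one focused session. Check off and commit before starting the next.
```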


2. Spec-Driven Development

Coding agents plan, execute multi-step tasks, and run across sessions that outlast your working memory. Context windows have grown, but context alone doesn't equal intent — without a durable, explicit specification, even the most capable agent can build the wrong thing. Ad-hoc prompting works for small tasks; it breaks down once a feature spans multiple sessions, multiple people, or both.

Consider a team adding CSV import. Without a spec, a developer describes the feature conversationally across three agent sessions. By session three the agent has forgotten the original validation rules, the error-handling approach contradicts day one, and the reviewer spends two days reverse-engineering intent from a 1,200-line diff. With a spec — functional requirements, technical constraints, acceptance criteria — every session starts from the same file, the reviewer checks the diff against written criteria, and gaps are obvious. The difference isn't ceremony; it's a single source of truth that outlives any individual session.

What Is Spec-Driven Development?

A spec is not just a better prompt. Four properties separate it from even a well-crafted prompt:

  • Durable — it lives in a file, not in a chat message. It survives across sessions, agent restarts, and team handoffs.
  • Versioned — it lives in git alongside the code. As the feature evolves, the spec evolves with it — and the history is auditable.
  • Reusable — multiple agents (or humans) can work from the same spec independently without re-explaining context.
  • Verifiable — it contains acceptance criteria that the agent, a reviewer, or CI can check against mechanically.

The core workflow:

Specify → Plan → Tasks → Implement → Validate
  1. Specify — Write detailed requirements including functional behavior, technical constraints, architecture decisions, and acceptance criteria
  2. Plan — Ask the AI to produce an implementation plan; review and refine it before any code is written
  3. Tasks — Break the plan into discrete, ordered tasks with clear inputs and outputs
  4. Implement — The AI agent implements tasks one at a time, with review checkpoints
  5. Validate — Run tests, review diffs, and verify the implementation against the specification

How It Compares to Other Approaches

  • Vibe coding — no specification (ad-hoc prompts), no upfront planning, minimal oversight (review output only). Best for quick prototypes, throwaway scripts, and exploration.
  • Prompt engineering — specification implicit in prompts, minimal planning, per-prompt oversight (review each response). Best for single-file tasks and targeted edits.
  • Spec-Driven (SDD) — formal, versioned specs, a full plan before code, structured oversight (review at each phase). Best for production features, team collaboration, and complex systems.

When to Use SDD (and When Not To)

Use SDD when:

  • Building production features that will be maintained long-term
  • Working in a team where multiple people (or agents) need to understand intent
  • The feature involves complex business logic or cross-cutting concerns
  • You need traceability between requirements and implementation

Skip SDD when:

  • Rapid prototyping or throwaway experiments
  • Simple, well-scoped tasks (fix a typo, add a log line)
  • You're exploring an unfamiliar domain and need to iterate freely

Practical Example: SDD Spec

Here's what the CSV import spec from the intro might look like as a concrete template:

# Spec-Driven Development: CSV Import

## Requirements
- User can upload a CSV file of up to 50MB through the web UI
- System parses the CSV, validates against the expected schema, and stores records in PostgreSQL
- Invalid rows are collected and returned as a downloadable error report
- Processing happens asynchronously; user sees a progress indicator

## Technical Constraints
- Backend: .NET 8, ASP.NET Core, EF Core 8
- Frontend: React 18 with TypeScript
- File storage: Azure Blob Storage for temporary uploads
- Max concurrent uploads per user: 3
- Must handle files with up to 500,000 rows

## Architecture
- Use background job (Hangfire) for CSV processing
- SignalR for real-time progress updates
- Repository pattern for data access
- Follow existing coding standards in .cursorrules

## Instructions
1. First, produce a `plan.md` with your implementation approach — do NOT write any code yet
2. After I review the plan, break it into discrete tasks in `tasks.md`
3. Implement tasks one at a time, waiting for my approval after each

SDD Tools

Several tools target spec-driven workflows specifically:

GitHub Spec Kit — An agent-agnostic toolkit that bootstraps the specification, planning, and task breakdown process. Creates a .specify folder in your repo with structured artifacts. Works with any coding agent.

Source: developer.microsoft.com/blog/spec-driven-development-spec-kit

AWS Kiro — Amazon's AI IDE built around spec-driven development, using "spec" artifacts to guide implementation with strong integration into AWS services.

Source: kiro.dev | Introducing Kiro | Specs documentation

JetBrains Junie — Supports SDD workflows through its planning and guidelines system, producing requirements.md, plan.md, and tasks.md files. The "Think More" toggle encourages deeper planning.

Source: blog.jetbrains.com/junie/2025/10/how-to-use-a-spec-driven-approach-for-coding-with-ai/

Claude Code — Supports SDD through CLAUDE.md project files and Plan mode, which lets you research and plan before writing code. Combine with task files for structured implementation. Community-maintained skill collections (e.g. replicating Kiro's spec workflow) exist, but review them carefully before adopting — they're unofficial and may introduce unexpected behavior.

OpenAI Codex — Uses AGENTS.md project files and the $create-plan skill for structured planning. Combine with task files for step-by-step implementation.

What's Next for SDD

SDD is useful today, and several trends point to it becoming more important:

  • Multi-agent workflows — When multiple agents work on the same feature in parallel (one on backend, another on frontend, a third writing tests), the spec becomes the shared contract that keeps them synchronized. Without it, agents diverge silently.
  • Agentic CI/CD — Agents that don't just write code but also create PRs, run tests, and fix failing builds need something to validate their work against. A spec with clear acceptance criteria lets an agent self-check before requesting human review.
  • Traceability and audit — As AI generates a growing share of the codebase, the question "why was this implemented this way?" becomes harder to answer from git history alone. The chain from spec to plan to task to commit creates an audit trail — valuable internally, and increasingly relevant for clients and regulators.

3. Measuring AI Impact — Development Metrics

AI tools change how you work, but "it feels faster" won't convince a client or a budget holder. You need numbers.

Universal metrics (ROI, Developer Experience) — see AI Common Playbook, Section 6.

Delivery Output (DORA Metrics)

DORA (DevOps Research and Assessment) metrics measure team delivery performance. Working with AI directly impacts two of them:

  • Cycle Time — the time from starting a task to completing it (e.g., status change in Jira). Compare the average cycle time of a solo developer against a "developer + AI agent" pair — the difference shows where AI actually speeds things up.
  • Lead Time for Changes — the time from committing code to deploying it to production. AI speeds up test writing and CI/CD configuration — track whether that translates to shorter lead times in practice.
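Computing cycle time needs nothing beyond your issue tracker's export — a minimal sketch (Python for brevity; the field names and figures are hypothetical):

```python
from datetime import datetime

def avg_cycle_time_days(tasks):
    """Average number of days between 'started' and 'done' timestamps."""
    durations = [
        (datetime.fromisoformat(t["done"]) - datetime.fromisoformat(t["started"])).days
        for t in tasks
    ]
    return sum(durations) / len(durations)

# Tasks completed by a solo developer vs. a developer + AI agent pair
solo = [
    {"started": "2025-01-06", "done": "2025-01-10"},
    {"started": "2025-01-13", "done": "2025-01-20"},
]
ai_pair = [
    {"started": "2025-02-03", "done": "2025-02-05"},
    {"started": "2025-02-10", "done": "2025-02-13"},
]

print(avg_cycle_time_days(solo))     # 5.5
print(avg_cycle_time_days(ai_pair))  # 2.5
```

Run the same calculation per sprint and the trend line — not the anecdote — is what you show the budget holder.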

Quality and Technical Debt

Speed means nothing if defect rates climb. AI generates code and catches debt — track both sides:

  • SonarQube Metrics — test coverage, Code Smells, and Technical Debt Ratio. Prompt agents to write missing tests and refactor flagged code — then check whether the numbers actually move.
  • QA Defect Rate — bugs per release or sprint. If developers run AI-assisted code review and logic validation before opening a PR, defect rates should drop — measure whether they do.


4. Use Cases in Software Development

4.1 Understanding an Existing Codebase

When joining a new project or navigating an unfamiliar module, an AI agent can walk you through architecture, data flows, and conventions faster than reading docs that may be outdated.

Effective approach:

I'm new to this project. Here is the directory structure:
[paste tree output]

And here is the main entry point:
[paste code]

Please explain:
1. The overall architecture and main components
2. How data flows through the system
3. Key dependencies and their roles
4. Any patterns or conventions I should follow

With terminal agents (Claude Code, Codex CLI): These tools scan your entire repository and answer questions about code structure, data flow, and dependencies — no pasting needed.

Generating a visual Code Map (Claude Code): When you need a shareable visual overview — for onboarding, architecture discussions, or stakeholder communication — Claude Code's official Playground plugin generates one. Invoke /playground and ask for a code map of your project. It produces a self-contained interactive HTML file that visualizes the codebase's structure, dependencies, and key components.

4.2 Generation of Documentation

Documentation rots fast. AI agents can draft docs directly from your code, giving you a starting point that reflects what actually exists — not what someone remembered to write down six months ago.

Use cases:

  • API documentation from code signatures and comments
  • README files for new repositories
  • Architecture Decision Records (ADRs)
  • Inline code comments for complex logic
  • Migration guides and changelog entries
  • Onboarding guides from existing documentation

Example prompt:

Based on the following controller and service code, generate OpenAPI
documentation in YAML format. Include descriptions for each endpoint,
request/response schemas, error codes, and example payloads.

[paste code]

With terminal agents (Claude Code, Codex CLI): These tools have direct access to your codebase, so they can generate documentation from your actual code without any pasting — just describe what you need documented.

4.3 Coding with Agents

Agent-based coding delegates implementation steps to the AI. You describe what you want at a high level, and the agent plans, writes code, runs tests, and iterates — with you reviewing at each step.

Best practices:

Official guides: Claude Code best practices (Anthropic) | GitHub Copilot customization docs | GPT-5 prompting guide — agentic patterns (OpenAI)

  1. Write a CLAUDE.md / AGENTS.md first — Before you start any serious coding, spend 30 minutes on a context document that describes your project's architecture, conventions, and constraints. This front-loads context that would otherwise leak into every prompt. Keep these files up to date and consistent, but not overloaded; consider what belongs in the project file versus in separate instructions or skills
  2. Identify and offload "friction" tasks first — Before adopting full spec-driven workflows, look for daily, repetitive tasks that disrupt your flow or feel like a chore. You don't need a fully autonomous agent — start with manual, on-demand commands. Create specialized prompts or scripts for common friction points: unit test generation based on your project's testing standards, contextual code review that checks against your team's priorities, localization workflows (generating translations, replacing hardcoded strings with keys), boilerplate scaffolding for new modules or components. These quick wins build confidence and skill before tackling complex agentic workflows
  3. Use the spec-driven workflow for anything non-trivial (see Section 2)
  4. Provide feedback, not just approval — When the agent proposes code, explain why something should be different. This teaches it your preferences for subsequent turns
  5. Run tests after each step — or use pre/post hooks, which most agentic coding tools support. Don't let the agent pile up five changes before verifying. Small steps, frequent verification
  6. Keep context window health in mind — Long sessions degrade quality. The agent typically reports context usage after key steps — when it's above ~50%, or you're switching to unrelated work, start a fresh session. For complex plans that require more work than a single session can handle, create a tasks.md file and split work across multiple sessions manually — parallelism via sub-agents is not always possible when tasks have sequential dependencies

4.4 Writing Good Tests with AI

AI-generated tests often look right but test the wrong things — implementation details instead of behavior, happy paths only, mocks that make tests pass by definition. The Core Principles cover common failure modes and include a review checklist. Here are practical techniques for getting better results.

1. Specify what to test, not just "write tests"

❌ "Write tests for UserService"

✅ "Write unit tests for UserService.AddUserAsync with the following scenarios:
   - Successful creation with valid data
   - Duplicate email → throws DuplicateEmailException with the email in the message
   - Null user object → throws ArgumentNullException
   - Empty username → throws ValidationException
   - Password hashing is applied (verify hash differs from plaintext)
   - CreatedAt audit field is set to current UTC time
   - User is NOT persisted when validation fails"

Also show examples of good tests in your prompt — few-shot examples make a noticeable difference here.

2. Demand behavioral tests, not interaction tests

Tell the AI to verify outcomes and state, not just method calls. A test should answer: "Does the system behave correctly?" — not "Did the code run?"
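The difference in one sketch (Python for brevity; the Cart class is hypothetical). The behavioral test asserts observable state, where a mock-based interaction test would only confirm that a method was invoked:

```python
class Cart:
    """Minimal cart with an observable quantity invariant."""
    def __init__(self):
        self._items = {}

    def add(self, sku, qty):
        if qty <= 0:
            raise ValueError("qty must be positive")
        self._items[sku] = self._items.get(sku, 0) + qty

    def total_quantity(self):
        return sum(self._items.values())

def test_add_accumulates_quantity():
    cart = Cart()
    cart.add("SKU-1", 2)
    cart.add("SKU-1", 3)
    # Behavioral: verify the outcome the user cares about...
    assert cart.total_quantity() == 5
    # ...not merely that add() was called twice. An interaction test
    # against a mocked cart would pass even if quantities were dropped.
```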

3. Require the AAA pattern explicitly

Ask for Arrange / Act / Assert with one logical assertion per test and descriptive names:

[Test]
public async Task AddUserAsync_WithDuplicateEmail_ThrowsDuplicateEmailException()

4. Validate tests by breaking the code

After the AI generates tests, intentionally introduce a bug in the production code. If the tests still pass, they are not testing real behavior. Nothing else catches false-confidence tests as reliably.

5. Consider property-based testing for complex logic

For algorithms, data transformations, or parsers, ask AI to generate property-based tests that verify invariants across thousands of random inputs, rather than a handful of specific examples.
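Frameworks like FsCheck (.NET), Hypothesis (Python), or fast-check (TypeScript) do the heavy lifting in real projects. A dependency-free sketch of the idea, with a hypothetical chunk function and two invariants chosen for illustration:

```python
import random

def chunk(xs, size):
    """Split a list into pieces of at most `size` elements."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]

def check_chunk_invariants(trials=500):
    """Property test: verify invariants over many random inputs."""
    for _ in range(trials):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 40))]
        size = random.randint(1, 10)
        chunks = chunk(xs, size)
        # Invariant 1: concatenating the chunks reproduces the input
        assert [x for c in chunks for x in c] == xs
        # Invariant 2: no chunk is empty or exceeds the requested size
        assert all(1 <= len(c) <= size for c in chunks)

check_chunk_invariants()
```

When asking an AI for property-based tests, name the invariants explicitly — roundtrip, size bound, idempotence — rather than leaving it to guess them.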


5. Staying Up to Date

Models, tools, and best practices shift fast enough that what you learned last month may already be outdated.

Current recommendations are in Recommended Learning Resources.

6. Trainings and Certifications

Certifications give clients concrete proof you know what you're doing. Current options are listed in Recommended Learning Resources.


Part II — Quick Start

Introduction

This is a logical tutorial, not a technical one. The steps below describe the thinking process and the sequence of decisions you should follow when working with an AI coding agent. They are deliberately abstract and tool-agnostic — they outline what to do and why, not the exact keystrokes or CLI commands for every scenario. You will need to adapt the details to your specific tool, tech stack, and project context.

Part I gave you the methodology — best practices, spec-driven development, quality metrics. Now you have a real coding task and a practical question: "What do I actually do, step by step?"

What follows is one concrete workflow built around one toolset. Once it clicks, swap components, add spec-driven development frameworks (see Part I, Section 2), or integrate other tools.

The toolset:

  • Claude Code — Anthropic's terminal-native agentic coding assistant (also available as a VS Code / JetBrains extension). See AI Common Playbook, Section 3 for a detailed profile.
  • GitHub — version control and collaboration
  • Built-in plan mode — Claude Code's native planning capability (no external spec framework)
  • GitHub Actions — for automated AI-powered code review in CI/CD

Prerequisites: A Claude Pro, Max, or Team subscription (or Anthropic Console account), Git and GitHub set up. See the official Claude Code docs for installation instructions.

Cost management — agentic sessions consume significantly more tokens than simple chat. See AI Common Playbook, Section 2 for subscription vs. API economics and strategies to keep costs under control.


Step 1 — Set Up Claude Code

Get Claude Code running and configured before your first real task.

1.1 Install Claude Code

Follow the official installation guide to set up Claude Code on your machine: Install Claude Code - Native Install (recommended)

Once installed, launch Claude Code from your project's root directory:

claude

On first launch, Claude Code will authenticate with your Anthropic account. It automatically reads CLAUDE.md files in your project and picks up git status, then explores other files as needed during the conversation.

1.2 Getting around Claude Code

Key interface elements worth knowing:

Slash commands

Type / to see the full list of available commands. The ones you will use most often:

  • /model — Switch between models mid-conversation. Use Opus for complex reasoning and architecture decisions, Sonnet for routine implementation, Haiku for quick questions. Switching models is free and instant — match the model to the task, not the other way around.
  • /effort — Control how hard the model thinks before responding. Lower effort means faster (and cheaper) answers for straightforward tasks; higher effort gives you deeper reasoning for complex problems.
  • /context — Visualize current context usage as a colored grid. Shows optimization suggestions when context gets heavy — useful for understanding how much room you have left in a long session.
  • /clear — Clear conversation history and free up context. Useful when context gets cluttered after a long session or when you are switching to an unrelated task. Claude Code preserves the previous session so you can resume it later.
  • /usage — Show your plan usage limits and current rate limit status. Useful for checking how much capacity you have left before hitting a rate limit. For per-session token costs, use /cost instead.

The ! prefix — running shell commands

You do not need to leave Claude Code to interact with your terminal. Prefix any command with ! and it runs directly in your shell:

! git log --oneline -5
! npm test
! docker ps

The output lands in the conversation, so the agent can see it too.

@ — referencing files

When you want the agent to look at a specific file, type @ followed by the path. Claude Code treats this as an explicit reference and loads the file into context:

@src/auth/middleware.ts What does the token validation logic do here?

This is more precise than asking Claude Code to "look at the auth middleware" — it removes ambiguity and avoids unnecessary file search.

Pasting images

Claude Code is multimodal. You can paste screenshots, diagrams, or mockups directly into the conversation. This is practical for:

  • Showing a UI bug ("this button should be aligned to the right")
  • Sharing a design mockup as a reference for implementation
  • Pasting an error screenshot from a browser or mobile device

Just paste the image into the terminal prompt — Ctrl+V in most terminals, Cmd+V in iTerm2, Alt+V on Windows. No special syntax needed.

Extending with plugins

Claude Code supports plugins that add new slash commands and skills. To manage them:

/plugin

This opens the plugin manager. A few plugins worth considering early on:

  • code-review — structured multi-agent code review (used later in this workflow)
  • claude-md-management — helps maintain and improve your CLAUDE.md over time

Plugins live locally — add or remove them at any time.

Further reading: The official Claude Code docs cover all of this in depth — keybindings, configuration files, permission modes, MCP integrations, and more. What is listed here is enough to get started.

1.3 Configure Claude Code

Configure permissions thoughtfully. Claude Code asks for permission before running commands or editing files. For production codebases, keep the default approval mode while you build trust. You can use the /permissions command to grant selective access with wildcard syntax:

/permissions
Bash(npm run *)        — allow all npm scripts
Bash(git *)            — allow git operations
Edit(/src/**)          — allow edits within src/

Security note: Avoid the --dangerously-skip-permissions command-line flag (passed when launching claude) at first, especially on client projects or production codebases. Guardrails exist for a reason — they prevent accidental destructive commands. As you become more advanced, you may find legitimate uses for it, but exercise caution when working with shared repositories.

Pro tip: Set up the status line. Run /statusline without arguments and Claude Code will auto-configure a status line from your shell prompt. It sits at the bottom of your terminal showing the current model, mode, and session state.


Step 2 — Prepare Your Project

Before you write your first prompt, set up your project so the agent can understand your codebase, standards, and constraints.

2.1 Create your CLAUDE.md

CLAUDE.md is a project instruction file that Claude Code automatically loads at the start of every session. It tells the agent what the project is about, how it is structured, and what conventions to follow — build commands, testing practices, coding style, architecture decisions, and anything else that shapes how work gets done in the repo.

Generate it with Claude Code itself:

The fastest way to bootstrap a CLAUDE.md is to let Claude Code analyze your project:

> /init

This command scans your project structure, tech stack, and conventions, then generates a CLAUDE.md file automatically.

A CLAUDE.md generated by /init is a solid starting point, but it can't capture everything — team conventions, preferred patterns, or lessons from real sessions tend to surface over time. For an existing CLAUDE.md that could use a refresh, the official claude-md-management plugin helps: its claude-md-improver skill analyzes your current file and suggests concrete improvements, and the /revise-claude-md command updates it with lessons learned from your current session.

Review the generated file carefully and refine it. A good CLAUDE.md is concise (aim for under 100-150 lines), specific (include actual commands, not generic advice), and maintained (update it as the project evolves).

What to put in CLAUDE.md — and what not to:

The key principle: don't explain things the model already knows. You don't need to describe what TDD is, how SOLID works, or what microservices mean. The model knows all of that. Instead, state your preferences and constraints — what you want the model to do differently from its defaults.

  • Do: Use TDD with red-green-refactor approach — the model knows what this means, you just need to tell it to do it
  • Do: Keep files under 200 lines — split into modules when they grow beyond that
  • Do: Prefer composition over inheritance or Use functional style where possible
  • Don't: Explain what TDD is, what SOLID stands for, or how dependency injection works
  • Don't: Include generic best practices ("write clean code", "use meaningful variable names") — the model does this by default

Every line in CLAUDE.md costs tokens on every session. State preferences concisely, skip the explanations.

/init generates a solid starting structure — architecture, key commands, coding standards, constraints. After generating, review it and add your team's specific preferences that /init couldn't infer from the code alone (e.g., Use TDD (red-green-refactor), Keep files under 200 lines, preferred patterns for error handling or testing).

Pro tip: Reference existing files as examples rather than describing patterns in words. One concrete example file is worth a hundred words of explanation.

2.2 Set up coding standards and linters

Agents follow your coding standards only if those standards are explicit and enforceable. Before starting agentic work:

  1. Configure your linter/formatter — ESLint, Prettier, Black, dotnet format, Ruff — whatever your stack requires. Make sure these run via a simple command listed in CLAUDE.md.
  2. Set up pre-commit hooks or Claude Code hooks — Claude Code supports hooks that run automatically after tool use. Scope them carefully: PostToolUse hooks fire after every single file edit during active sessions. A slow hook here will noticeably degrade your workflow: if the agent edits 20 files, that hook runs 20 times.

Rule of thumb: PostToolUse is for fast, single-file operations only (formatting or linting the one changed file, nothing more). Never run project-wide commands here (full test suite, mypy, eslint ., static analysis across the codebase). For those, use Stop hooks, which fire once when Claude finishes the entire task, not after every individual edit.

Example — fast per-file formatting in PostToolUse (.claude/settings.json):

json { "hooks": { "PostToolUse": [{ "matcher": "Write(*.py)", "hooks": [{ "type": "command", "command": "python -m black $file" }] }], "Stop": [{ "matcher": "", "hooks": [{ "type": "command", "command": "npm test" }] }] } }

In this example, Black formats only the single changed .py file on every write (fast, scoped to $file), while the full test suite runs once at the end of the task (slow, project-wide). Notice that the PostToolUse command targets $file, not the whole directory; this distinction matters.

  3. Commit your CLAUDE.md — This file belongs in version control. It is documentation for both humans and AI agents. Keep it updated alongside your codebase.

2.3 Set up AI-powered code review

AI code review works best as a two-layer strategy: iterative in-session review during implementation (primary) and CI-based review on pull requests (optional safety net). In-session review catches issues early, costs almost nothing extra (the context is already loaded), and gives you multiple chances to fix things before the code leaves your machine. The second layer is useful but consumes API credits or subscription quota on every push — treat it as a final gate, not the main mechanism.

Layer 1: Iterative in-session review (primary)

The most effective review happens during implementation, not after it. Because the AI agent already has your codebase in context, in-session review adds minimal overhead while catching issues at the earliest — and cheapest — point.

Recommended practice: multi-pass review before creating a PR

After completing implementation (Step 4) and before creating a pull request, run 2-3 review iterations within your Claude Code session:

  1. Pass 1 — Built-in code review: Run /code-review (an official plugin for Claude Code) to launch parallel review agents that analyze your changes for logic errors, security issues, and standards violations
  2. Pass 2 — Project-specific skill: If you have created a custom review skill tailored to your project's specific concerns (domain rules, architecture constraints, common pitfalls), run it as a second pass. Project-specific skills catch things generic review cannot — they know your team's patterns, your client's requirements, and your codebase's known weak spots
  3. Pass 3 (optional) — Targeted review: If the feature touches security-sensitive code, performance-critical paths, or complex business logic, run a focused third pass with a specific prompt (e.g., "Review only the authorization logic in these changes for OWASP Top 10 vulnerabilities")

Why multiple passes work: Each review iteration operates with slightly different focus and heuristics. In practice, a second or third pass regularly turns up issues the first one missed — subtle logic errors, missing edge cases, convention violations. The cost is marginal (the context is already loaded), while the cost of shipping a bug to production is not.

Creating a project-specific review skill:

A custom review skill (stored as a command file in .claude/commands/ or as an Agent Skill) should encode your team's specific review priorities. Example:

<!-- .claude/commands/project-review.md -->
Review the current changes with focus on our project-specific concerns:
1. All database queries use parameterized statements (no string concatenation)
2. New API endpoints have proper authorization attributes
3. Event handlers follow our idempotency pattern (see OrderEventHandler.cs)
4. No direct HttpClient usage — all external calls go through typed clients
5. DTOs use records with required properties, not mutable classes
6. Background jobs have proper retry policies and dead-letter handling
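
To make rule 1 concrete, here is a minimal sketch of what such a review pass would flag versus accept — illustrated in Python with the stdlib sqlite3 driver for brevity, even though the project rules above assume a .NET codebase:

```python
import sqlite3

# In-memory database with one sample row, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")

user_input = "Ada"  # imagine this came from an HTTP request

# Flagged by rule 1: building SQL via string concatenation invites injection.
# conn.execute("SELECT id FROM customers WHERE name = '" + user_input + "'")

# Accepted: a parameterized statement — the driver handles escaping.
rows = conn.execute(
    "SELECT id FROM customers WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [(1,)]
```

The same distinction applies verbatim to parameterized queries in ADO.NET, JDBC, or any other driver — the review skill encodes the rule once and the agent applies it to your stack.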

Layer 2: CI-based review on pull requests (optional safety net)

CI-based review should ideally find nothing new — that means Layer 1 did its job. Its value is as a final gate: it runs on a fresh context (no session drift), reviews the complete diff against the base branch, and leaves comments directly on the PR for team visibility.

Cost consideration: CI-based review triggers a full API call with the entire diff context on every push. For active PRs with frequent pushes, this adds up quickly. To manage costs: trigger only on opened and ready_for_review events (not synchronize), use a cost-efficient model (mid-tier rather than frontier), and rely on Layer 1 for iterative fixes. If budget is tight, Layer 1 alone covers most of the value.

Quick setup with Claude Code:

Requires GitHub CLI installed and authenticated (gh auth login).

claude
> /install-github-app

This command interactively walks you through the entire setup — it asks questions, gives you links to open, and tells you what commands to run. By the end, it installs the Claude GitHub App, configures the required secrets, and creates a PR with two workflow files. Once you merge that PR, both workflows are active:

  • claude.yml — responds to @claude mentions in PR comments, review comments, and issues. Use it for on-demand questions, fixes, or implementation requests.
  • claude-code-review.yml — runs automatic code review on every PR using the code-review plugin. Posts inline findings directly on the PR diff.

Who can set this up? Installing a GitHub App requires admin access to the GitHub repository. However, GitHub organization owners can block repo admins from installing apps — in that case, only the org owner can do it. On Enterprise Cloud plans, enterprise owners can restrict this further. Repository secrets (for the API key or OAuth token) also require GitHub repo admin access. If you don't have it, you'll need to coordinate with someone who does.

Tuning the generated workflows:

The default workflows work out of the box, but a few settings are worth adjusting before you merge the PR:

  • Control costs: The default review workflow triggers on every push (synchronize), which adds up fast on active PRs. Remove it and keep only opened and ready_for_review:

# in claude-code-review.yml, change:
on:
  pull_request:
    types: [opened, ready_for_review]
    # was: [opened, synchronize, ready_for_review, reopened]

  • Change the model if needed: In both claude.yml and claude-code-review.yml, add claude_args to the action's with: block. Model aliases (sonnet, opus, haiku) resolve to the latest available version:

- uses: anthropics/claude-code-action@v1
  with:
    claude_args: "--model sonnet"

Customize what gets flagged by adding a REVIEW.md file to your repository root. This is the official mechanism for review-specific guidance — Claude reads it during code review and treats it as additive rules on top of its default correctness checks. Use it to encode what to always flag, what to skip, and team-specific conventions. Your CLAUDE.md also influences reviews (Claude flags violations as nits), but REVIEW.md keeps review-only rules separate.

Example REVIEW.md (adapted from official Claude Code docs):

# Code Review Guidelines

## Always check
- New API endpoints have corresponding integration tests
- Database migrations are backward-compatible
- Error messages don't leak internal details to users

## Style
- Prefer early returns over nested conditionals
- Use structured logging, not f-string interpolation in log calls

## Skip
- Generated files under `src/gen/`
- Formatting-only changes (our linter handles it)
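
As an illustration of the structured-logging style rule, a hypothetical Python sketch (substitute your stack's logging library):

```python
import logging

logger = logging.getLogger("orders")
order_id = 42

# Flagged: f-string interpolation pre-renders the message, so log
# aggregators cannot index order_id as a separate field.
# logger.info(f"processing order {order_id}")

# Preferred: pass the template and arguments separately — stdlib logging
# defers formatting and keeps the arguments available to handlers.
logger.info("processing order %s", order_id)
```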

Further reading: Code Review (Claude Code docs) for full setup and customization options. For security-focused reviews, check out anthropics/claude-code-security-review — a dedicated GitHub Action for OWASP-aligned vulnerability detection.


Step 3 — Plan Before You Code

Regardless of where the task comes from — GitHub issue, Jira ticket, Slack message — plan before you code.

No real task at hand? Use a genuine task from your current project if possible — real tasks teach more than toy examples. Otherwise, create a small project: a CLI tool, a REST API, a data pipeline. The goal is to exercise the full workflow — plan, implement, review, commit.

3.1 Start in plan mode

Start in plan or read-only mode. Let the agent explore and reason about the codebase without editing files or running commands — you want it to understand the problem fully before touching any code.

3.2 Write your prompt

Give Claude the task with enough context to produce a meaningful plan. Include the "what" (the desired outcome), link to requirements, and mention any constraints. You do not need to over-specify the "how" — that is what planning is for.

Example prompt in plan mode:

Implement the CSV upload feature from GitHub issue #42.

Requirements:
- Users upload CSV files (up to 50MB) through the web UI
- System validates against the CustomerImport schema
- Invalid rows collected into a downloadable error report
- Processing happens asynchronously with progress indicator

Constraints:
- Use Hangfire for background processing (already in our stack)
- Follow the patterns in OrderService.cs
- All new code needs unit tests following our test patterns

Clarify any details with me before proposing a plan.

Key prompting principles:

  • Be specific about outcomes, not implementation steps — let the agent figure out the approach
  • Reference existing code if you have it — point to a file that implements a similar pattern, so the agent follows the same approach
  • Say "clarify details with me if needed" — this invites the agent to ask questions rather than guess
  • Mention what NOT to do if there are common pitfalls (e.g., "Do not add new NuGet packages without asking me first")

3.3 Clarify requirements through dialogue

Claude will ask clarifying questions — answer them. Push back on parts of the plan you disagree with. Ask for alternatives. This dialogue is where you catch misunderstandings that would be expensive to fix in code.

Example exchange:

Claude: "Should the error report include row numbers from the original CSV?"
You:    "Yes, include original row numbers and the specific validation error
         for each failed row."

Claude: "The current schema validation uses FluentValidation. Should I follow
         that pattern or use data annotations?"
You:    "Use FluentValidation — it's our standard."

Claude: "I'm proposing to stream the CSV parsing rather than loading the
         entire file into memory. This handles the 50MB limit efficiently.
         Does that align with your expectations?"
You:    "Yes, good call. Make sure it handles different encodings — we've had
         issues with UTF-8 BOM and Windows-1252 from customer files."

Iterate until the plan feels right. Two or three rounds of clarification typically produce a solid plan.

3.4 Review and approve the plan

Once Claude presents a complete plan, review it critically:

  • Does it cover all requirements from the issue?
  • Does it follow the architectural patterns described in CLAUDE.md?
  • Are there edge cases it missed?
  • Is the scope reasonable — not over-engineered, not under-engineered?

If the plan is solid, approve it and move to implementation — or, even better, save it as a durable artifact first (see 3.5). If not, tell Claude exactly what to change.

3.5 Save the plan as a durable artifact

This is one of the most important habits to build. Before writing any code, ask the agent to save the approved plan to a file (e.g. docs/plans/feature-x-plan.md or simply feature-x-plan.md in the project root).

A saved plan file gives you several advantages:

  • Team review before implementation — Share the plan with teammates or the tech lead for feedback before committing to an approach. Catching a wrong assumption in a plan file costs minutes; catching it in code costs hours
  • Verification checklist after implementation — Once all code is written, ask the agent to verify the implementation against the plan (see Step 4.4). Agents sometimes skip or simplify details during longer sessions — the plan file keeps them honest
  • Frequent context resets without losing progress — You can /clear the context at any point and resume from the plan file in a fresh session. This is particularly valuable for complex features that span multiple sessions
  • Audit trail — The plan documents what was agreed and why, which is useful for code review, post-mortems, and onboarding

Context window health: The agent typically reports context usage after planning. If it's above ~50%, or the implementation task is straightforward, run /clear and start implementation with a fresh context — a clean context produces better code. Point the new session to the plan file and continue from there.


Step 4 — Implement with the Agent

4.1 Switch to implementation mode

Exit plan mode (press Shift+Tab to cycle back, or Escape). If you started a new session with the approved plan, point the agent to the plan file. For example:

Implement the plan in /docs/plans/csv-upload.md step by step. After each
logical step, run the tests and show me the results before moving on.
Commit each completed step with a descriptive commit message.

4.2 Work in small, reviewable steps

Break the work into logical chunks and review at each checkpoint — never let the agent implement everything in one shot:

Step 1: Data model and validation → review → commit
Step 2: Repository and data access → review → commit
Step 3: Background job and processing logic → review → commit
Step 4: API endpoints → review → commit
Step 5: Unit and integration tests → review → commit

After each step, Claude should run the relevant tests and linters. If anything fails, let it fix the issues before moving on.

What to watch for during implementation:

  • Does the code follow your conventions? Check naming, patterns, error handling against what is in CLAUDE.md
  • Are tests meaningful? AI-generated tests can look good but test nothing. Verify they test behavior, not just method calls. Ask Claude to introduce a bug and verify the test catches it
  • Is it using approved dependencies? The agent might suggest new libraries. Check if they are necessary and approved
  • Security considerations — sanitized inputs, no hardcoded credentials, proper authorization checks
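
On the second point — verifying that tests exercise behavior — the difference is easy to see side by side. A minimal sketch (validate_row is a hypothetical function, not from the codebase above):

```python
# Hypothetical CSV-row validator, used only to contrast test styles.
def validate_row(row: dict) -> list[str]:
    """Return a list of validation errors for a single CSV row."""
    errors = []
    if not row.get("email") or "@" not in row["email"]:
        errors.append("invalid email")
    if not row.get("name"):
        errors.append("missing name")
    return errors

# Weak test: passes even if validate_row always returns [] — it only
# checks that the call succeeds, not what it does.
def test_weak():
    validate_row({"email": "a@b.com", "name": "Ada"})

# Meaningful test: asserts behavior for valid and invalid inputs, so an
# introduced bug (e.g. dropping the email check) makes it fail.
def test_behavior():
    assert validate_row({"email": "a@b.com", "name": "Ada"}) == []
    assert "invalid email" in validate_row({"email": "nope", "name": "Ada"})
    assert "missing name" in validate_row({"email": "a@b.com", "name": ""})
```

Asking Claude to introduce a deliberate bug is exactly the check that separates these two: the weak test stays green, the behavioral test turns red.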

4.3 Use git discipline

Ask Claude Code to make atomic commits with clear messages after each completed step:

Commit the data model changes with a descriptive message following
conventional commits format (feat:, fix:, etc.)

Atomic commits make review easier and let you revert individual changes cleanly. Never bulk-commit an entire feature in one go.
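
For example, the five steps from 4.2 might produce a history like this (hypothetical messages, following the conventional commits format):

```
feat(import): add CustomerImport data model and validation
feat(import): add repository and data access layer
feat(import): add background processing job
feat(import): expose CSV upload API endpoints
test(import): add unit and integration tests for CSV upload
```

Each commit maps to one reviewed checkpoint, so reverting step 3 does not disturb steps 1 and 2.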

4.4 Verify implementation against the plan

The plan file from Step 3.5 pays off here. After all steps are implemented, point the agent back at the plan and ask it to verify completeness:

Please verify whether all uncommitted changes are consistent
with feature-x-plan.md. Check that every requirement is addressed
and nothing was missed or simplified beyond what was agreed.

Agents often skip or simplify small details during longer sessions — a validation rule that was in the plan but got lost after a /clear, an edge case that seemed obvious but never made it into code. This verification catches those gaps reliably, precisely because the plan file lives outside the context window and survives resets and session drift.

4.5 Handle context window limits

Complex features that span multiple hours will hit context limits. Watch the agent's context usage report — when it's above ~50%, or you're switching to a different part of the task:

  1. Save progress — ensure all code is committed
  2. Update the plan file from Step 3.5 — mark completed steps and note what remains
  3. Start a new Claude Code session
  4. Point the new session to the plan file and continue from where you left off
> Read docs/plans/csv-upload.md and continue implementation from step 4.
  The previous steps are committed. Verify the existing code before
  proceeding.

Step 5 — Review and Finalize

5.1 Run your own code review

After the agent has completed all steps, run the multi-pass in-session review described in Step 2.3 (Layer 1): start with /code-review, then run your project-specific review skill if you have one, and optionally a targeted third pass for security-sensitive or complex changes.

Automated review is a starting point, not a substitute for your own eyes. Read the diff yourself:

git diff main..HEAD

Ask yourself: Do I understand every line? Would I be comfortable defending this code in a pull request review? If not, ask Claude to explain the parts you do not understand — and then decide whether the approach is correct.

5.2 Run the full test suite

# Run the complete test suite, not just the new tests
npm test          # or dotnet test, pytest, etc.

If anything fails, ask Claude to investigate and fix. Pay special attention to existing tests — regressions in unrelated code are one of the most common AI pitfalls.

5.3 Create the pull request

Once everything passes, create a PR:

> Create a pull request for the CSV upload feature. Reference issue #42.
  Include a summary of what was implemented and any decisions made
  during planning.

Claude Code creates the branch, pushes, and opens the PR through the GitHub CLI. The CI pipeline (with the Claude review action from Step 2.3) reviews the PR automatically, and your human teammates review as well.

5.4 Address review feedback

If reviewers (human or AI) leave comments on the PR, you can address them directly from Claude Code:

> Read the review comments on PR #87 and address each one.
  Run tests after each fix.

Or, if using the GitHub integration, simply reply to the review comment with @claude and a description of what to fix.


Step 6 — Evolve Your Workflow

After a few tasks with this workflow, you'll spot friction points worth automating. The highest-impact next steps:

Create custom slash commands

For repetitive workflows, create reusable prompt templates in .claude/commands/:

<!-- .claude/commands/new-endpoint.md -->
Create a new API endpoint following the patterns in our existing controllers.

Steps:
1. Create the endpoint in the appropriate controller
2. Add request/response DTOs with FluentValidation
3. Add service layer method
4. Write unit tests following our test patterns
5. Update OpenAPI documentation
6. Run all tests and linter

Endpoint details: $ARGUMENTS

Then invoke it with: /new-endpoint POST /api/customers/import

Build skills for recurring patterns

Skills are the next step beyond slash commands — reusable, self-contained capability packages that the agent loads on demand. Use the skill-creator skill (from the official plugins repo) to generate skills from your existing workflows. If you notice you keep giving the agent the same instructions (e.g., "when writing tests, always use our factory pattern"), that's a skill waiting to be extracted.

Similarly, use the claude-md-management plugin and its claude-md-improver skill to keep your CLAUDE.md up to date with lessons learned from your sessions.

Practical tip — learning from mistakes: After a long session where the agent struggled and eventually found the right approach, open a second session and tell it: "In another session we worked on X and you had trouble with Y. Analyze what went wrong and do what you can — create a skill, update CLAUDE.md, whatever you think will prevent the same mistakes next time." Give the agent a brief summary of what happened — it can access past session files stored in ~/.claude/projects/, but searching through them is hit-or-miss with long conversations. Your summary gives the agent enough context to turn those lessons into durable project knowledge — skills, CLAUDE.md updates, or both.

Use sub-agents for larger tasks

When a feature has independent components, add "use subagents" to your prompt or planning feedback — the agent handles the rest. Sub-agents run in their own context, so they keep your main conversation clean and are especially useful for tasks that read many files or produce verbose output.

Consider Spec-Driven Development for complex features

The workflow described in this Quick Start uses Claude Code's built-in planning — this is sufficient for most tasks. However, for large, complex features (especially those involving multiple developers or agents), you may benefit from a more structured approach called Spec-Driven Development (SDD).

SDD introduces formal, versioned specification documents that serve as the contract between requirements and implementation. Tools like GitHub Spec Kit, AWS Kiro, and JetBrains Junie provide structured SDD workflows. Claude Code and Codex also support SDD through task files and CLAUDE.md/AGENTS.md conventions.

See Part I, Section 2 — Spec-Driven Development for a detailed explanation, comparison, and practical examples.


Quick Reference — The Workflow at a Glance

┌──────────────────────────────────────────────────────────────────┐
│                  AGENTIC DEVELOPMENT WORKFLOW                    │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  SET UP (once)                                                   │
│  ├─ Install Claude Code                                          │
│  ├─ Learn the interface (slash commands, !, @)                   │
│  └─ Configure permissions                                        │
│                                                                  │
│  PREPARE (once per project)                                      │
│  ├─ Generate and refine CLAUDE.md                                │
│  ├─ Set up linters, formatters, hooks                            │
│  └─ Set up AI-powered code review                                │
│                                                                  │
│  PLAN (every task)                                               │
│  ├─ Enter plan mode (Shift+Tab × 2)                              │
│  ├─ Describe the task with context and constraints               │
│  ├─ Clarify requirements through dialogue                        │
│  ├─ Review and approve the plan                                  │
│  └─ Save approved plan to a file (e.g. feature-x-plan.md)        │
│                                                                  │
│  IMPLEMENT (every task)                                          │
│  ├─ Work in small steps with review at each checkpoint           │
│  ├─ Run tests after each step                                    │
│  ├─ Make atomic commits with descriptive messages                │
│  ├─ Verify implementation against plan file                      │
│  └─ Monitor context window — /clear when needed                  │
│                                                                  │
│  REVIEW & SHIP (every task)                                      │
│  ├─ Run /code-review + read the diff yourself                    │
│  ├─ Run full test suite                                          │
│  ├─ Create PR (CI runs automated Claude review)                  │
│  └─ Address feedback, merge                                      │
│                                                                  │
│  EVOLVE (ongoing)                                                │
│  ├─ Create custom slash commands for common workflows            │
│  ├─ Use sub-agents for independent components                    │
│  └─ Explore SDD for complex features (see Part I, Section 2)     │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

This playbook evolves alongside the tools and practices it covers. Contributions, corrections, and suggestions are welcome.

Core Principles for Working with AI: Read more