# Codex Development Flow Example

This page shows how the demo K2 assets support a real Codex development
workflow. The task is intentionally framed as a legacy Java change request:
developers need source code, tests, release docs, and Confluence-style
engineering rules before editing code.

## Scenario

Customer task:

```text
Add a Flink REST endpoint that exposes checkpoint summary information for a
running job. Before editing, identify the implementation pattern, route
registration path, request/response message classes, tests to update, and
team guardrails.
```

The customer analogue is a large Java financial application where conventions
live in Confluence and the agent must avoid inventing controllers or using
stale framework patterns.

## What K2 Adds To Codex

| K2 asset | Demo object | How it helps Codex |
| --- | --- | --- |
| Collection | `flink-docs-2.2` | Gives Codex version-pinned release docs instead of general model memory. |
| Collection | `flink-code-2.2` | Gives Codex exact Java classes, methods, modules, and test anchors. |
| Collection | `java-rd-guides` | Gives Codex Confluence-style guardrails before it proposes code. |
| Agent | `Java R&D Guides Agent` | Answers "what rules must I obey?" before code planning. |
| Agent | `Flink Docs Agent` | Answers "what does this version say?" with documentation citations. |
| Agent | `Flink Code Agent` | Answers "what files and tests prove the implementation pattern?" |
| Agent | `Java R&D Architect Agent` | Synthesizes docs, code, tests, and guide context into an implementation plan. |
| Knowledge Feed | `Flink REST Guide Feed` | Turns repeated source findings into durable guide material for future runs. |
| Pipeline Spec | `Java Customer Demo Pipeline` | Shows the graph that binds collections, agents, feed, and subscriptions. |

## Baseline Codex Prompt

Use this first to show the weak path. Codex can write Java, but it has to infer
where the framework-specific and team-specific rules live.

```text
You are working in a large legacy Java codebase.

Task:
Add a REST endpoint that exposes checkpoint summary information for a running
job. Before editing, identify the implementation pattern, route registration
path, request/response message classes, tests to update, and any engineering
guardrails.

Constraints:
- Do not use web search.
- Do not use K2 or MCP tools.
- Use only what you already know and local repository search.
- If you are uncertain, say so.

Return:
1. Implementation plan
2. Files to inspect or edit
3. Tests to add or update
4. Risks and uncertainties
```

Expected demo observation: Codex may find useful files with `rg`, but it has no
durable memory of the guide rules and may miss the customer convention that
controller-like endpoints must not be implemented with Spring MVC, servlets, or
JAX-RS.

## K2-Assisted Codex Prompt

Use this after enabling the K2 MCP server. The important difference is the
order of work: guide rules first, release docs second, source and tests third,
then implementation.

```text
You are working in a large legacy Java codebase with K2 MCP available.

Task:
Add a REST endpoint that exposes checkpoint summary information for a running
job. Before editing, retrieve K2 context and produce a cited implementation
plan.

Required retrieval order:
1. Search guides for REST endpoint and checkpointing guardrails.
2. Search Flink 2.2 docs for REST and checkpointing requirements.
3. Search code for handler, message header, route registration, response body,
   and neighboring test patterns.
4. Only then propose files to edit and tests to add.

Use metadata filters:
- framework=flink
- framework_version=2.2.0
- api_surface in [rest, checkpointing]
- source_kind in [guide, docs, code, test]

Return:
1. Implementation plan
2. Files to inspect or edit
3. Tests to add or update
4. Risks and uncertainties
5. Citations for every major claim
```

Expected demo observation: Codex can explain the implementation path in terms
of the customer's actual rules and the versioned Java source evidence it
retrieved, instead of relying on a generic Java web-controller mental model.

## Codex MCP Setup

Set credentials and live IDs in your shell. Do not commit the key.

```bash
export K2_API_HOST="https://api-dev.knowledge2.ai"
export K2_API_KEY="..."
export K2_PROJECT_ID="<project-id>"
export K2_FLINK_DOCS_CORPUS_ID="<flink-docs-corpus-id>"
export K2_FLINK_CODE_CORPUS_ID="<flink-code-corpus-id>"
export K2_GUIDES_CORPUS_ID="<guides-corpus-id>"
```
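A small preflight script can catch missing settings before Codex starts, failing fast instead of surfacing a confusing mid-run MCP error. This is a sketch, not part of the demo repo; it only assumes the variable names from the export block above.

```python
# Hypothetical preflight check for the K2 shell settings exported above.
import os

REQUIRED = [
    "K2_API_HOST",
    "K2_API_KEY",
    "K2_PROJECT_ID",
    "K2_FLINK_DOCS_CORPUS_ID",
    "K2_FLINK_CODE_CORPUS_ID",
    "K2_GUIDES_CORPUS_ID",
]

def missing_vars(env: dict) -> list:
    """Return required variable names that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Check the real environment (or any dict standing in for it).
problems = missing_vars(os.environ)
if problems:
    print("Missing K2 settings:", ", ".join(problems))
```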

For the current dev tenant used during setup, the project was:

```text
demo-java-customer-2026-05-10
```

Run Codex with the repo-local MCP server:

```bash
codex -C /path/to/java/repo \
  -c 'mcp_servers.k2-java-rd.command="/path/to/k2-java-rd-demo/scripts/k2_java_rd_mcp_server.py"' \
  -c 'mcp_servers.k2-java-rd.env.K2_MCP_BACKEND="sdk"' \
  -c 'mcp_servers.k2-java-rd.env.K2_MCP_RETRIEVAL_PROFILE="java_exact"' \
  -c 'mcp_servers.k2-java-rd.env.K2_MCP_COMPACT_TOOL_SURFACE="true"' \
  -c 'mcp_servers.k2-java-rd.env_vars=["K2_API_KEY","K2_API_HOST","K2_PROJECT_ID","K2_FLINK_DOCS_CORPUS_ID","K2_FLINK_CODE_CORPUS_ID","K2_GUIDES_CORPUS_ID"]'
```

The same setup can be used non-interactively with `codex exec` for repeatable
evaluation runs.
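For repeatable `codex exec` runs, it can help to assemble the command line programmatically so every evaluation uses identical overrides. The helper below is a hypothetical sketch that mirrors the interactive command above; the `env_vars` passthrough is omitted for brevity.

```python
# Hypothetical helper assembling a `codex exec` argv for evaluation runs.
# Flag names mirror the interactive command above; only the prompt varies.
def build_codex_exec_argv(repo: str, server_cmd: str, prompt: str) -> list:
    overrides = {
        "mcp_servers.k2-java-rd.command": server_cmd,
        "mcp_servers.k2-java-rd.env.K2_MCP_BACKEND": "sdk",
        "mcp_servers.k2-java-rd.env.K2_MCP_RETRIEVAL_PROFILE": "java_exact",
        "mcp_servers.k2-java-rd.env.K2_MCP_COMPACT_TOOL_SURFACE": "true",
    }
    argv = ["codex", "exec", "-C", repo]
    for key, value in overrides.items():
        argv += ["-c", f'{key}="{value}"']
    argv.append(prompt)
    return argv

argv = build_codex_exec_argv(
    "/path/to/java/repo",
    "/path/to/k2-java-rd-demo/scripts/k2_java_rd_mcp_server.py",
    "Add a REST endpoint that exposes checkpoint summary information.",
)
# Run with subprocess.run(argv) once the paths and prompt are real.
```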

## Live K2 Transcript Command

Use this command to produce a JSON transcript that demonstrates the same flow
through K2 Agents, Knowledge Feed, and Pipeline APIs:

```bash
PYTHONPATH=src python -m k2_java_rd_demo.cli run-customer-demo-flow \
  --execute \
  --project-id "<project-id>" \
  --flink-docs-agent-id "<docs-agent-id>" \
  --flink-code-agent-id "<code-agent-id>" \
  --java-rd-guides-agent-id "<guides-agent-id>" \
  --java-architect-agent-id "<architect-agent-id>" \
  --flink-rest-guides-feed-id "<feed-id>" \
  --pipeline-spec-id "<pipeline-spec-id>" \
  --top-k 6
```

The transcript should show four agent steps:

1. `java_rd_guides_agent`: guide guardrails.
2. `flink_docs_agent`: version-pinned REST and checkpointing docs.
3. `flink_code_agent`: exact Java source and test patterns.
4. `java_architect_agent`: implementation plan synthesized from the above.

It should also show:

- a Knowledge Feed dry-run request for `Flink REST Guide Feed`;
- a Pipeline Spec dry-run with no issues;
- optionally, a pipeline graph linking collections, agents, feed, and target
  corpus.
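Once a transcript JSON exists, the four-step agent structure above can be checked mechanically. The schema assumed here, a top-level `steps` list whose items carry an `agent` field, is an illustration for the check, not the documented transcript format.

```python
# Hypothetical transcript check; adapt field names to the real JSON schema.
EXPECTED_AGENT_ORDER = [
    "java_rd_guides_agent",
    "flink_docs_agent",
    "flink_code_agent",
    "java_architect_agent",
]

def agent_order(transcript: dict) -> list:
    """Extract the agent name from each step, in order of execution."""
    return [step.get("agent") for step in transcript.get("steps", [])]

def check_transcript(transcript: dict) -> bool:
    """True when all four agents ran in the required retrieval order."""
    return agent_order(transcript) == EXPECTED_AGENT_ORDER

# Stand-in transcript with the expected shape:
sample = {"steps": [{"agent": name} for name in EXPECTED_AGENT_ORDER]}
```

Enforcing the order, not just the set, matters here: the demo's claim is that guide guardrails are retrieved before docs, code, and the architect plan.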

## Concrete Evidence Codex Should Use

The committed K2 asset bundle contains these proof points:

| Development decision | Evidence source |
| --- | --- |
| Do not implement this as a Spring MVC, servlet, or JAX-RS controller. | `generated://guides/flink/confluence-rest-handler-guardrails.md` |
| Use Flink REST patterns with `MessageHeaders`, request/response message classes, `AbstractRestHandler`, and `WebMonitorEndpoint` registration. | `generated://guides/flink/confluence-rest-handler-guardrails.md` |
| Start from the version-pinned REST API docs and identify request path, headers, body classes, and neighboring handler tests before changing production code. | `https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/ops/rest_api/` |
| Route registration for dispatcher-specific handlers happens in `DispatcherRestEndpoint.initializeHandlers(...)`. | `repo://apache/flink@release-2.2.0/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/DispatcherRestEndpoint.java` |
| Existing job REST handlers extend access-execution-graph handler patterns and return response DTOs from `handleRequest(...)`. | `repo://apache/flink@release-2.2.0/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobDetailsHandler.java` |
| Checkpointing changes should use current Flink 2.2 checkpointing APIs and call out migration risk. | `https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/datastream/fault-tolerance/checkpointing/` and `generated://guides/flink/confluence-checkpoint-upgrade-guardrails.md` |

## Existing Codex Patch Artifact

The repo also contains an actual Codex patch-generation artifact that can be
opened during the demo. It is not a slide; it includes the prompt, Codex answer,
patch, and verification result for the same style of Flink REST development
task.

| Arm | Artifacts |
| --- | --- |
| K2-assisted Codex | `docs/evaluations/patch-runs/20260502T085752Z-patch-generation/artifacts/flink-rest-job-vertex-watermarks-include-missing/codex_with_k2_mcp/` |
| Repo-only Codex | `docs/evaluations/patch-runs/20260502T085752Z-patch-generation/artifacts/flink-rest-job-vertex-watermarks-include-missing/codex_repo_only/` |

The K2-assisted answer cites:

- `JobVertexWatermarksHandler.java`;
- `MetricCollectionResponseBody.java`;
- `JobVertexWatermarksHeaders.java`;
- `JobVertexWatermarksHandlerTest.java`;
- the version-pinned Flink REST API docs;
- `generated://guides/flink/rest-handler-checklist.md`.

The focused verification for that K2-assisted patch recorded:

```text
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
BUILD SUCCESS
```

Use this artifact when the customer asks what "development workflow" means in
practice: Codex received K2 context, made a Java patch, and the focused REST
handler test passed.

## Demo Talk Track

1. Run baseline Codex and show where it spends time searching or guessing from
   memory.
2. Run K2-assisted Codex and show that it starts from guide guardrails, not
   generic Java conventions.
3. Open the K2 transcript and point to the exact source documents behind the
   plan.
4. Show the Pipeline Spec graph to explain how docs, code, guides, Agents, and
   Feed are connected.
5. Explain the Knowledge Feed as the maintenance loop: when source patterns are
   repeatedly useful, K2 can materialize them back into the guide corpus so the
   next Codex session starts with better internal guidance.

## Readiness Gate

Before running the customer demo, inspect the live project:

```bash
PYTHONPATH=src python -m k2_java_rd_demo.cli inspect-live-k2 \
  --project-id "<project-id>"
```

Proceed only when the project has:

- 3 collections and 15 documents;
- 4 active agents;
- 1 Knowledge Feed;
- 1 Pipeline Spec;
- ready indexes for docs, guides, and code.

If any corpus reports `No indexes built for this corpus`, build or wait for the
index before using that part of the transcript as evidence.
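The readiness gate can be scripted as well. The summary-dict keys below are assumptions for illustration; adapt them to whatever `inspect-live-k2` actually reports.

```python
# Hypothetical readiness check over a parsed `inspect-live-k2` summary.
# The dict keys are illustrative assumptions, not the CLI's real schema.
EXPECTED_COUNTS = {
    "collections": 3,
    "documents": 15,
    "active_agents": 4,
    "knowledge_feeds": 1,
    "pipeline_specs": 1,
}

def readiness_issues(summary: dict) -> list:
    """Return human-readable problems; an empty list means ready to demo."""
    issues = [
        f"{name}: expected {want}, found {summary.get(name, 0)}"
        for name, want in EXPECTED_COUNTS.items()
        if summary.get(name, 0) != want
    ]
    issues += [
        f"index not ready for corpus: {corpus}"
        for corpus, ready in summary.get("indexes_ready", {}).items()
        if not ready
    ]
    return issues

ready_summary = {
    **EXPECTED_COUNTS,
    "indexes_ready": {"docs": True, "guides": True, "code": True},
}
```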
