
Measuring Decision Lag, the New Long Pole

Isaac Flath · 6 min read

When agents can ship in hours, the bottleneck is the decision. Most teams know when code landed. They do not know how long the decision sat before anyone wrote the first implementation commit.

A shipped change has a commit timestamp. The work before that commit is harder to see. When did the team decide to make the change? Stoa defines that gap as Intent Lead Time: the time from product decision to first implementation commit.

[Figure: Intent Lead Time audit timeline]

We measured it by tracing recent Stoa work through four points: idea, decision, artifact, and first commit. Tracing a single decision back to its origin could take up to an hour of manual review across transcripts, docs, and git history. Even then, we were still not sure we had found the earliest signal.

1. Start With a Commit

The audit started with commits because Git gives the timestamp, message, and patch. The harder endpoint is the decision, which might live in a meeting transcript, meeting note, topic summary, Claude Code agent log, design doc, Slack thread, or nowhere structured at all.

Intent Lead Time = first implementation commit - captured product decision

The endpoints rarely sit next to each other. Some decisions never become commits, and some commits trace back through several earlier discussions.
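In code, the metric is a plain timestamp difference; the hard part is resolving the two endpoints. A minimal sketch, with illustrative timestamps standing in for a captured decision and a first commit:

```python
from datetime import datetime, timezone

# Illustrative timestamps; in the audit, finding these two
# endpoints was the expensive part, not the subtraction.
decision_at = datetime(2024, 4, 6, 14, 0, tzinfo=timezone.utc)
first_commit_at = datetime(2024, 4, 13, 9, 30, tzinfo=timezone.utc)

intent_lead_time = first_commit_at - decision_at
print(intent_lead_time)  # 6 days, 19:30:00
```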

To diagnose bottlenecks, each record tracked four dates:

  • When was the idea first mentioned?
  • When did we first decide to pursue the idea?
  • When did the first tangible artifact, such as a design spec or prototype, get created?
  • When was the first commit?
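One way to hold those four dates is a small record type. This is a sketch, not Stoa's actual schema; the field names are ours:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class IntentRecord:
    """One piece of work, tracked at four checkpoints (hypothetical schema)."""
    idea_at: Optional[datetime]          # idea first mentioned
    decision_at: Optional[datetime]      # decision to pursue it
    artifact_at: Optional[datetime]      # first design spec or prototype
    first_commit_at: Optional[datetime]  # first implementation commit

    @property
    def intent_lead_time(self) -> Optional[timedelta]:
        """Decision to first commit; None for design-doc-only records."""
        if self.decision_at and self.first_commit_at:
            return self.first_commit_at - self.decision_at
        return None
```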

2. Pull Together the Sources

Each trace drew from four source groups:

  1. Conversations. We exported sessions, transcripts, topics, meeting notes, summaries, and metadata from our database into local files.
  2. Git history. We pulled commit metadata, changed files, and patches from the Stoa repo so implementation evidence could be inspected next to meeting evidence.
  3. Design docs. We collected specs, prototypes, and other planning artifacts.
  4. Slack and other side channels. We left these out of scope for this pass.
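The second source above is the easiest to export, since git carries its own timestamps. A sketch of the commit-metadata pull using plain `git log` (the repo path and field layout are our choices, not Stoa's tooling):

```python
import subprocess

def list_commits(repo_path: str) -> list[dict]:
    """Return hash, ISO author date, and subject for every commit."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%H|%aI|%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    commits = []
    for line in out.splitlines():
        sha, date, subject = line.split("|", 2)
        commits.append({"sha": sha, "date": date, "subject": subject})
    return commits
```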

No source maps cleanly to one stage. A discussion may introduce the idea, record the decision, sketch a prototype, or only hint at the work. A prototype may follow a decision or start as a loose idea. A commit may implement the feature or preserve an experiment before the team has decided to pursue it.

The audit had to separate four kinds of evidence:

  • intent evidence
  • decision evidence
  • design-doc evidence
  • implementation evidence
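A simple tag is enough to keep those categories apart downstream. The type names here are illustrative, not Stoa's data model:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class EvidenceKind(Enum):
    INTENT = "intent"                  # idea first mentioned
    DECISION = "decision"              # commitment to pursue it
    DESIGN_DOC = "design_doc"          # spec or prototype
    IMPLEMENTATION = "implementation"  # commits and patches

@dataclass
class Evidence:
    kind: EvidenceKind
    source: str            # e.g. a transcript ID, doc path, or commit SHA
    observed_at: datetime  # when the artifact was created
    excerpt: str           # the passage that supports the classification
```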

3. Build Records From the Evidence

A record ties several artifacts to the same piece of work. A valid record might include:

  • this meeting note introduced the idea
  • this meeting transcript is where we decided to act
  • this doc defined the approach details
  • these commits implemented it

The initial target was the last week of work. The workspace grew beyond that because recent work pointed back to earlier docs and meetings. The current corpus covers a broader April window:

  • 1,156 ingested commits
  • 583 indexed docs
  • 1,530 extracted doc items
  • 94 current intent records
  • about 1,000 evidence objects

Of the 94 records, 48 currently link to implementation commits, 42 are design-doc-only, and 4 have no commit found. Most records are still medium or low confidence.

4. Audit Whether the Records Are True

One feature, the starter-space template, showed why manual audit mattered.

New users should land in a prepared starter space when they first create an account, with files and examples already waiting in the workspace.

The first pass traced the decision to an April meeting. The team discussed how onboarding should copy the canonical starter space into a new org and decided to publish the files into Supabase/S3.

Manual audit found an earlier transcript: new users landed in Stoa without enough context. Several looser conversations circled the same fix: give users hands-on examples after signup. The implementation had a longer lead-in than the first trace showed.

The trace became:

  1. Transcript: activation problem identification
  2. Transcript: comment that we should give users a warm start
  3. Transcript: discussion about pre-loaded use cases for new users
  4. Transcript: explicit assignment to an individual
  5. Design doc: onboarding flow
  6. Implementation commits

Even with search and agents, this chain took manual audit. Slack was out of scope, so the trace may still miss earlier decisions.

5. Fix the Other Failure Modes

The same source-order problem appeared in another record. The first pass pointed to an April 22 implementation doc, but manual review found April 13 meeting evidence for the same work.

A second record failed differently: it used broad, noisy evidence and an April 14 decision timestamp that looked like follow-up context. The better source was an April 6 doc that described the idea before the later discussion.

If the source is wrong, the lead time is wrong. If the record combines several pieces of work, the number is a blend. If a follow-up note is treated as the original decision, the lead time stops meaning anything.

In both cases, the number changed only after manual review.

6. Use Semantic Search for Candidates

Semantic search helped because names drifted across systems. The same work might appear as "starter teammate" in a commit, "Theo" in a doc, and "a first-meeting experience" in meeting notes.

Agents seeded searches from commits, docs, and extracted meeting-note items. They expanded those searches with exact phrases, keywords, and semantic matches.

An agent reviewed the large candidate pool and clustered the candidates into possible records. Starting with candidates gave the agent evidence to verify, but manual audit still found missed links.
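As a sketch of the semantic half, here is candidate ranking by embedding similarity. The model choice and the sample texts are our assumptions, not Stoa's stack:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def rank_candidates(seed: str, candidates: list[str], top_k: int = 10):
    """Rank candidate snippets by cosine similarity to a seed phrase."""
    vecs = model.encode([seed] + candidates)
    seed_vec, cand_vecs = vecs[0], vecs[1:]
    sims = cand_vecs @ seed_vec / (
        np.linalg.norm(cand_vecs, axis=1) * np.linalg.norm(seed_vec)
    )
    order = np.argsort(-sims)[:top_k]
    return [(candidates[i], float(sims[i])) for i in order]

# Name drift: the same work under different names still lands nearby.
rank_candidates(
    "starter teammate onboarding",
    ["Theo design notes", "first-meeting experience", "unrelated billing doc"],
)
```

Exact and keyword matching catch the literal repeats; the embedding pass is what bridges the renames.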

The pass produced 2,289 proposal files:

  • 14 commit-first seeds
  • 108 doc-first seeds
  • 167 doc-item seeds
  • 2,000 meeting-item seeds

Since most proposals required review, this workflow held up best:

  1. Seed searches from commits, docs, doc items, and meeting items.
  2. Use exact, keyword, and semantic search to gather possible matches.
  3. Have an agent review and cluster the candidates.
  4. Verify the source artifacts.
  5. Apply the best-supported records.
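A runnable skeleton of that loop, with stand-ins for the search and clustering passes (the real ones ran over the indexed corpus and used an agent):

```python
def seed_searches(commits, docs, doc_items, meeting_items):
    """Step 1: every ingested item becomes a search seed."""
    return commits + docs + doc_items + meeting_items

def search(seed, corpus):
    """Step 2 stand-in: the audit combined exact, keyword, and
    semantic search; substring matching keeps this sketch runnable."""
    return [text for text in corpus if seed.lower() in text.lower()]

def cluster(matches):
    """Step 3 stand-in for the agent pass: deduplicate into one proposal."""
    return [sorted(set(matches))] if matches else []

corpus = ["starter space design doc", "fix: starter space onboarding"]
seeds = seed_searches([], [], [], ["starter space"])
proposals = cluster([m for s in seeds for m in search(s, corpus)])
# Steps 4 and 5, verifying source artifacts and applying the
# best-supported records, stayed manual in the audit.
```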

7. Capture the Work Thread Earlier

The audit showed why this cannot stay a historical reconstruction exercise. We need a workspace that lets reviewers see meetings, docs, transcripts, and commits together, then inspect proposed links and missing pieces. A record needs auditable evidence that the artifacts describe the same work.

We also need to capture intent, decisions, docs, and implementation as work happens, so the trace exists before the audit starts.

We built 94 records, and most are still medium or low confidence. A single feature trace could touch six or more artifacts across four systems. Rebuilding the chain after the fact from disconnected systems is too fragile for an ongoing business metric.

Intent Lead Time has to be captured inside the way the team already works.
