Knowledge Sharing

Engineering Learning Review

Share discoveries from experiments.

Purpose: Share the outcomes of technical experiments so the whole team learns, not just the people who ran them.

How to run this meeting

Pre-register your success criteria before the experiment starts — this is the most important discipline in a learning review culture. When you define what "success" looks like in advance, you prevent the common failure of unconsciously shifting the goalposts after seeing the results. A pre-registered hypothesis forces honesty: you either hit the criteria or you didn't, and both outcomes are valuable. Teams that skip this step tend to report experiments as successes by finding the metric that looks best after the fact.

Share negative results with the same rigor as positive ones. An experiment showing that a promising new caching strategy had no measurable impact on latency is enormously useful — it saves the next team from re-running the same experiment. Negative results are often more valuable than positive ones because they close off directions that might otherwise be relitigated repeatedly. Create explicit social reinforcement for sharing "we tried this and it didn't work" — celebrate the learning, not just the win.

Document what you'd do differently: this is the section that most teams skip and most teams need most. Reflecting on the methodology — not just the result — is what builds experimental skill over time. Link to the actual code, data, or configuration used in the experiment. This allows others to replicate, extend, or audit the work, and prevents the "we tried something like that two years ago" problem where institutional knowledge disappears when people leave.

Before the meeting

  • Write up the pre-registered hypothesis, method, and success criteria before the meeting (these should exist from when the experiment was started)
  • Prepare result data in a shareable format — charts, tables, or a dashboard link
  • Identify who should attend beyond the team that ran the experiment (who else would benefit from these findings?)
  • Link to all relevant code, configuration, and data in the document before the meeting so attendees can review in advance
  • If the experiment ran over multiple weeks, prepare a brief narrative of how the approach evolved

Meeting Details

  • Date:
  • Facilitator:
  • Presenter(s):
  • Attendees:
  • Duration: 45–60 minutes
  • Experiment period:

Experiment Goal

What question were you trying to answer? State it as a falsifiable hypothesis. Include the pre-registered success criteria.

Question: Can we reduce p99 API response times for the search endpoint by switching from our custom Redis caching layer to a CDN-edge cache?

Hypothesis: Serving search results from CDN edge nodes will reduce p99 latency by at least 40% for users outside the us-east-1 region, because the primary source of latency is geographic distance to our single-region Redis cluster.

Pre-registered success criteria:

  • p99 latency reduction of ≥40% for non-us-east-1 users
  • Cache hit rate ≥75% within 72 hours of rollout
  • No increase in stale-result complaints from users (measured via support tickets)

Method

How did you run the experiment? Be specific enough that someone else could replicate or audit it.

Approach: A/B test using feature flags. 20% of non-us-east-1 traffic routed to CDN-cached responses for 14 days (Jan 6–20, 2025). CDN configured with a 60-second TTL for search results; cache invalidated on any write to the search index.

Tooling: Cloudflare Workers for edge caching, LaunchDarkly for traffic splitting, Grafana for latency metrics, custom dashboard for cache hit rate.

Control group: Remaining 80% of non-us-east-1 traffic continued using the existing Redis caching path.

What we changed from the original plan: We reduced TTL from 5 minutes to 60 seconds on day 3 after seeing elevated stale-result reports in the treatment group. This was a protocol adjustment logged in the experiment doc.
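The 20/80 split described above depends on stable assignment: a user must land in the same group on every request. One common way to get this (a minimal sketch, not the team's actual LaunchDarkly configuration — the salt and percentage here are illustrative) is to hash the user ID with a per-experiment salt and bucket the result:

```python
import hashlib

def in_treatment(user_id: str, salt: str = "cdn-edge-cache", pct: int = 20) -> bool:
    """Deterministically assign a user to the treatment group.

    Hashing the user ID with an experiment-specific salt keeps assignment
    stable across requests while staying independent of other experiments
    that bucket the same users.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform bucket in [0, 100)
    return bucket < pct
```

Because the hash is deterministic, the same user always sees the same caching path, which keeps the latency comparison clean.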


Results

Present the actual numbers against the pre-registered success criteria. Do not editorialize yet — just the data.

| Metric | Pre-registered target | Actual result | Met? |
| --- | --- | --- | --- |
| p99 latency reduction (non-us-east-1) | ≥40% | 61% reduction (820ms → 320ms) | Yes |
| Cache hit rate at 72 hours | ≥75% | 68% (at original 5-min TTL); 71% after TTL reduction | No |
| Stale-result support tickets | No increase vs. baseline | +340% at 5-min TTL; baseline levels after TTL reduction | Mixed |

Additional findings (not pre-registered):

  • p99 latency for us-east-1 users showed no meaningful change (expected)
  • Cost: CDN edge caching is 40% cheaper per request than Redis for this traffic pattern
  • Cache hit rate improved to 78% by day 10 as Cloudflare's edge nodes warmed up

Insights

Interpretation of the results. What do the numbers mean? What was surprising?

The latency improvement significantly exceeded our hypothesis — 61% vs. a 40% target — validating that geographic distance to Redis was the dominant latency driver for international users. The stale-result problem was a significant surprise: a 5-minute TTL that felt conservative in planning created real user-facing issues in production because our search index is updated much more frequently than we estimated (roughly every 90 seconds during business hours).
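The interaction between TTL and write frequency can be seen with a back-of-envelope model (an illustrative simplification, not the team's actual analysis): an entry cached at a random moment between index writes stays fresh for about half a write interval on average, then serves stale data until its TTL expires.

```python
def expected_stale_fraction(ttl_s: float, write_interval_s: float) -> float:
    """Rough fraction of a cache entry's lifetime spent serving stale data.

    Simplifying assumptions: writes arrive at a fixed interval, and the
    entry is cached at a uniformly random point between writes, so its
    expected fresh period is write_interval / 2, capped at the TTL.
    """
    fresh_s = min(ttl_s, write_interval_s / 2)
    return 1 - fresh_s / ttl_s

# With writes every ~90s: a 5-min TTL serves stale data ~85% of the time,
# a 60s TTL only ~25% — consistent with the drop in stale-result tickets.
```

Under this model, `expected_stale_fraction(300, 90)` is 0.85 while `expected_stale_fraction(60, 90)` is 0.25, which matches the observed behavior directionally even though the real write pattern is burstier.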

The 60-second TTL represents a practical tradeoff: it reduced the stale-result issue to baseline while preserving 71% of the latency benefit. Cache hit rate falling slightly short of the pre-registered target is largely a consequence of the shorter TTL and is an acceptable tradeoff given the stale-result constraint.

The unexpected cost finding is meaningful — the CDN approach may be worth evaluating for other high-read, low-write API surfaces across the platform.


Implications

What should we do with these findings? Who else needs to know about this? What should we try next?

Recommendation: Roll out CDN edge caching to 100% of non-us-east-1 search traffic with a 60-second TTL. Update the cache invalidation logic to trigger on search index writes to further reduce stale results.
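The invalidation logic in the recommendation amounts to purging affected cache keys whenever the search index is written. A minimal sketch of the shape of that hook (the stub class, key scheme, and function names are hypothetical — a real implementation would call the CDN's purge API):

```python
class EdgeCacheStub:
    """Hypothetical stand-in for a CDN edge cache with a purge API."""

    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

    def purge(self, key):
        # A real CDN client would issue a purge-by-key (or by-tag) request here.
        self.store.pop(key, None)

def on_index_write(cache, affected_queries):
    """Purge cached results for every query touched by an index write,
    so stale entries are evicted before their TTL would expire."""
    for query in affected_queries:
        cache.purge(f"search:{query}")
```

Purge-on-write complements the 60-second TTL: the TTL bounds worst-case staleness, while event-driven purges remove it for the common case.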

What we'd do differently: Pre-measure search index write frequency before choosing a TTL. A 5-minute TTL seemed reasonable in the abstract but was wrong for this specific endpoint's write pattern. Add "write frequency" as a standard input to the experiment design template for caching experiments.

Who else should know: The data pipeline team is considering a similar CDN caching approach for the analytics dashboard. This experiment's learnings on TTL selection and invalidation strategy are directly applicable. Share with @carlos's team.

Next experiment: Test whether aggressive cache warming (pre-populating the most common search queries) can push the hit rate above 85% while keeping the 60-second TTL.
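Cache warming as proposed above could be prototyped as a small loop that pre-populates the hottest queries before real traffic arrives (a sketch under assumed interfaces — the `TTLCache` class and `fetch` callback are illustrative, not existing infrastructure):

```python
import time

class TTLCache:
    """Toy in-memory cache with per-entry expiry, mimicking a 60s edge TTL."""

    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        return None  # missing or expired

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl_s)

def warm(cache, top_queries, fetch):
    """Pre-populate the cache with the most common queries so the first
    real request for each is already a hit."""
    for query in top_queries:
        cache.put(query, fetch(query))
```

With a 60-second TTL, warming only pays off if it re-runs at least once per TTL window for each hot query — a constraint the follow-up experiment would need to account for.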


Action Items

| Owner | Action | Due Date | Status |
| --- | --- | --- | --- |
| @mia | Roll out CDN edge caching to 100% of non-us-east-1 search traffic | 2025-02-07 | Open |
| @mia | Implement cache invalidation on search index write events | 2025-02-14 | Open |
| @facilitator | Share experiment summary with @carlos's data pipeline team | 2025-02-05 | Open |
| @priya | Add "write frequency" field to the standard experiment design template | 2025-02-12 | Open |
| @mia | Design follow-up experiment for cache warming approach | 2025-02-21 | Open |

Follow-up

  • Publish the full experiment document (including methodology, data, and this write-up) to the engineering wiki under the experiments log
  • Tag it with relevant technology and domain keywords for discoverability
  • Post a summary to the #engineering channel linking to the full doc
  • Archive the raw data and Grafana dashboard snapshots so the results can be audited later
  • Add the follow-up experiment to the backlog for the next quarter's experiment planning session
