Research & Findings
A hook-based instrumentation framework across
Claude Code, Codex CLI, and Cursor
agentisd
April 2026
Burak Mert Köseoğlu — @mksglu
Prepared for Berkay Mollamustafaoğlu
The Problem
"Engineering teams spend $42K/month on AI coding tools.
80% have zero usage metrics.
Can we measure what these tools actually do?"
Source: Anthropic internal telemetry analysis — 63 tengu_* events tracked in bridge/replBridge.ts; total_cost_usd computed in cli/print.ts:SDKResult
Literature Review
PUBLISHED FINDINGS
METR 2025 — Randomized Controlled Trial
19% slower with AI · believed they were 20% faster
arXiv:2507.09089 — n=16, 246 tasks, mature OSS repos
Faros AI 2025 — The Verification Bottleneck
+21% tasks, +98% PRs — but +91% review time
10K+ developers. Organizational throughput: flat.
SonarSource Jan 2026
42% of code is AI-generated · 96% don't trust it
Also: DORA 2024 — -1.5% throughput, -7.2% stability with AI adoption (dora.dev, Figure 7, 39K respondents). METR Feb 2026 — RCT abandoned, 30-50% refused tasks without AI.
"Vendors track everything.
Customers track nothing."
THE MEASUREMENT GAP
What Anthropic Measures Internally
bridge/replBridge.ts, cli/print.ts
Market Validation
context-mode is an open-source MCP plugin that saves 98% of the context window.
It already captures every tool call, every session, every outcome.
WHAT context-mode ALREADY TRACKS (per session)
"The data already exists. ctx_agentisd makes it visible."
Methodology
"AI coding tools expose hook events at every tool call."
WIRE PROTOCOL
Same protocol across Claude Code, Codex CLI, and Cursor
cli/structuredIO.ts — Claude Code hook I/O types and processing
codex-rs/hooks/hook_runtime.rs — Codex CLI hook execution runtime
cursor/plugins/schemas — Cursor plugin hook schemas
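The protocol reduces to: one JSON event on stdin per tool call, exit code as the verdict. A minimal hook-script sketch in Python, assuming the Claude Code payload shape (`session_id`, `tool_name`, `tool_input`) and an exit code of 2 meaning "block"; the production-secrets policy is purely illustrative:

```python
import json
import sys

def handle_pre_tool_use(payload: dict) -> int:
    """Return the process exit code: 0 allows the tool call, 2 blocks it."""
    tool = payload.get("tool_name", "")
    args = payload.get("tool_input", {})
    # Illustrative policy: block shell commands that touch production secrets.
    if tool == "Bash" and ".env.production" in args.get("command", ""):
        print("blocked: production secrets", file=sys.stderr)
        return 2  # exit code 2 signals "block" back to the agent
    return 0

# Entry point (one JSON object per invocation):
#   event = json.load(sys.stdin); sys.exit(handle_pre_tool_use(event))
```

Treating the same payload shape as valid across Codex CLI and Cursor is an assumption here; the slide's claim is that the wire protocol is identical, not that field casing is.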
Data Model
| Field | Claude Code | Codex CLI | Cursor |
|---|---|---|---|
| tool_name | ✓ PreToolUse | ✓ PreToolUse | ✓ preToolUse |
| tool_input | ✓ full args | ✓ full args | ✓ full args |
| tool_output | ✓ PostToolUse | ✓ PostToolUse | ✓ postToolUse |
| session_id | ✓ all hooks | ✓ all hooks | ⚠ conversation_id |
| cost / tokens | ✓ SDKResult | ✓ OTEL metrics | ✗ not exposed |
| exit_code | ✓ PostToolUse | ✓ PostToolUse | ✓ postToolUse |
FIELD COVERAGE
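The coverage table above implies a shared schema. A minimal sketch of such a normalized record, where field names mirror the table and the Cursor `conversation_id` → `session_id` mapping is an assumption, not the actual ctx_agentisd storage format:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ToolEvent:
    platform: str                 # "claude-code" | "codex-cli" | "cursor"
    tool_name: str                # from PreToolUse / preToolUse
    tool_input: dict[str, Any]    # full args on all three platforms
    session_id: str               # Cursor: mapped from conversation_id
    exit_code: int                # from PostToolUse / postToolUse
    tool_output: Optional[str] = None
    cost_usd: Optional[float] = None   # None on Cursor (not exposed)

def normalize(platform: str, raw: dict[str, Any]) -> ToolEvent:
    """Map a raw hook payload onto the shared schema (sketch)."""
    session = raw.get("session_id") or raw.get("conversation_id", "")
    return ToolEvent(
        platform=platform,
        tool_name=raw.get("tool_name", ""),
        tool_input=raw.get("tool_input", {}),
        session_id=session,
        exit_code=raw.get("exit_code", 0),
        tool_output=raw.get("tool_output"),
        cost_usd=raw.get("cost_usd"),
    )
```

Optional fields encode the ✗/⚠ cells: Cursor rows simply carry `None` where the platform exposes nothing.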
Core Analysis — Business Personas
Core Analysis — Operational Personas
Full Metric Coverage
| Persona | Metrics | Claude Code | Codex CLI | Cursor | Top Impact |
|---|---|---|---|---|---|
| CTO | 8 | 7 | 7 | 5 | $27K/yr license savings |
| Engineering Manager | 7 | 7 | 7 | 5 | 2.6x sprint velocity |
| DevEx Lead | 7 | 6 | 6 | 4 | 6wk → 2wk onboarding |
| Security Officer | 5 | 5 | 5 | 3 | Full compliance audit |
| FinOps Manager | 5 | 5 | 5 | 1 | 15-30% cost optimization |
| QA Lead | 5 | 5 | 5 | 4 | Targeted tech debt sprints |
| Developer | 5 | 5 | 5 | 4 | Personal mastery curve |
| Onboarding | 5 | 5 | 5 | 3 | 4x faster ramp time |
| Context Sharing | 5 | 4 | 4 | 2 | Knowledge compounds |
| TOTAL | 52 | 88% | 87% | 57% | — |
Full catalog: github.com/mksglu/agentisd/blob/main/docs/vc/metric-catalog-full.md
Analysis
"Teams explain the same thing to AI 47 times a week."
CLAUDE.md Freshness
"3 teams haven't updated project instructions in 45 days"
SessionStart content hash tracking detects stale context files
SessionStart hook → SHA-256 hash comparison
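A sketch of that hash comparison, assuming a simple `{hash, changed_at}` record per context file (the record shape is illustrative, not the actual ctx_agentisd schema) and the 45-day threshold from the example above:

```python
import hashlib
import time

STALE_AFTER_DAYS = 45  # threshold from the example above; configurable

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def check_freshness(current_text: str, record: dict) -> dict:
    """Update a per-file freshness record on SessionStart (sketch)."""
    now = time.time()
    h = content_hash(current_text)
    if h != record.get("hash"):
        # Content changed: reset the clock, file is fresh again.
        return {"hash": h, "changed_at": now, "stale": False}
    age_days = (now - record.get("changed_at", now)) / 86400
    return {**record, "stale": age_days > STALE_AFTER_DAYS}
```

Hashing rather than diffing keeps the check O(1) per session and avoids storing file contents anywhere.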
Skill Effectiveness
QA skill → 92% commit rate
Finance skill → 41% commit rate
PostToolUse skill invocation → outcome analysis
PostToolUse tool_name + git commit correlation
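A minimal sketch of the correlation, assuming a flattened event stream in which each record carries a `session_id` plus either a skill invocation or a commit flag (a simplification of the real PostToolUse stream):

```python
from collections import defaultdict

def commit_rate_by_skill(events: list[dict]) -> dict[str, float]:
    """Share of sessions per skill that later produced a git commit."""
    invoked = defaultdict(set)    # skill -> sessions that used it
    committed = set()             # sessions that ended in a commit
    for e in events:
        if e.get("skill"):
            invoked[e["skill"]].add(e["session_id"])
        if e.get("committed"):
            committed.add(e["session_id"])
    return {
        skill: len(sessions & committed) / len(sessions)
        for skill, sessions in invoked.items()
    }
```

This is the shape of the 92%-vs-41% comparison above: same pipeline, per-skill denominators.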
"Explain It Twice" Detection
"5 engineers typed similar deploy instructions"
UserPromptSubmit cosine similarity clustering → auto-suggest skill creation
UserPromptSubmit prompt embedding analysis
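The clustering step can be sketched with bag-of-words cosine similarity; real prompt embeddings would replace the word counts, and both the 0.8 threshold and the greedy single-pass assignment are illustrative assumptions:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_prompts(prompts: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy single-pass clustering of similar prompts (sketch)."""
    clusters: list[tuple[Counter, list[str]]] = []
    for p in prompts:
        vec = Counter(p.lower().split())
        for centroid, members in clusters:
            if cosine(vec, centroid) >= threshold:
                members.append(p)   # close enough: same cluster
                break
        else:
            clusters.append((vec, [p]))   # start a new cluster
    return [members for _, members in clusters]
```

Any cluster with members from several engineers is a candidate for auto-suggested skill creation.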
Reference: Brian Scanlan, VP Engineering at Intercom —
@brian_scanlan
30+ custom skills in production. JAMF-managed deployment. Weekly usage reports.
Built over 6+ months of platform engineering. agentisd automates the detection.
Product Feature
Teams upload, share, and measure skills across Claude Code, Codex CLI, and Cursor.
Skill Management
Skill Analytics (agentisd measures)
PLATFORM SUPPORT
| Capability | Claude Code | Codex CLI | Cursor |
|---|---|---|---|
| Skill invocation detection | ✓ PostToolUse | ✓ PostToolUse | ✓ postToolUse |
| Marketplace integration | ✓ plugins.ts | ✓ plugins/ | ✓ cursor-team-kit |
| Enterprise push (managed) | ✓ admin settings | ⚠ config.toml | ⚠ Business plan |
| Skill effectiveness tracking | ✓ PostToolUse → outcome | ✓ PostToolUse → outcome | ✓ postToolUse → outcome |
Industry Reference
Brian Scanlan, VP Engineering at Intercom — @brian_scanlan, March 2026
| What Intercom Built | Effort | agentisd Metric |
|---|---|---|
| 30+ analytics skills (Snowflake, Gong, finance, QA) | 6+ months | Skill adoption tracking |
| JAMF deployment to 200+ engineers | Enterprise MDM | Marketplace health score |
| Weekly usage reports & quality evals | Custom Snowflake | Skill effectiveness score |
| QA skill: 7-stage pipeline → GitHub issues | Weeks | Test pass rate tracking |
| Code review agents with quality filters | Custom dev | Session effectiveness |
| Weekly CLAUDE.md fact-check GitHub Action | Automation | Context freshness score |
| Incident/troubleshooting with progressive disclosure | Months | "Explain it twice" detection |
| All runbooks followable by Claude in 6 weeks | Systematic program | Runbook→skill coverage |
Intercom spent 6+ months of platform engineering.
agentisd automates the measurement from day one.
Platform Analysis
| Capability | Claude Code | Codex CLI | Cursor |
|---|---|---|---|
| Hook Events | 5 | 5 + Stop | 5 (stop, afterAgentResponse) |
| PreToolUse | ✓ block + modify | ✓ block + modify | ⚠ block only |
| PostToolUse | ✓ full output | ✓ full output | ✓ read-only |
| SessionStart | ✓ inject context | ✓ inject context | ⚠ unreliable |
| UserPromptSubmit | ✓ | ✓ | ✗ |
| Token / Cost | ✓ native | ✓ OTEL | ✗ |
| Coverage | 93% (28/30) | 90% (27/30) | 53% (16/30) |
Claude Code: cli/structuredIO.ts, cli/print.ts:SDKResult
Codex CLI: codex-rs/hooks/hook_runtime.rs, OTEL codex.cost_usd
Cursor: cursor/plugins/schemas, cursor/coreCommands
Adoption Intelligence
ADOPTION BY SENIORITY
Juniors: 45% adoption but 2.1x slower time-to-solution → training opportunity
ADOPTION BY TEAM
| Team | Score | Sessions/wk |
|---|---|---|
| Platform | 92 | 1,247 |
| API | 87 | 983 |
| Frontend | 71 | 756 |
| Mobile | 54 | 312 |
| Design Eng | 43 | 89 |
Mobile: score 54, low session count → investigate blockers or tool fit
Engineering Teams (Platform, API, Backend)
Score: 87-92 · Sessions: 983-1,247/wk
High adoption, high effectiveness. Focus: optimize model mix, reduce cost.
Design & Non-Engineering Teams
Score: 31-54 · Sessions: 89-312/wk
Low adoption. Action: specialized skills, onboarding, or re-evaluate tool fit.
DATA PIPELINE — HOW THIS WORKS
From Hooks (automatic)
Onboarding Form (developer self-selects)
Computed Metrics
All 3 platforms (Claude Code, Codex CLI, Cursor). Seniority can be admin-configured or auto-inferred after 4 weeks of session data.
Competitive Landscape
Every developer analytics tool today is git-based.
They measure the output. They can't see the process.
| Capability | Jellyfish | DX | Swarmia | LinearB | Sleuth | agentisd |
|---|---|---|---|---|---|---|
| Data source | Git + Jira | Git + Surveys | Git + Jira | Git only | Config layer | Hook-level sessions |
| AI session observation | ✗ | ✗ | ✗ | ✗ | ⚠ governance | ✓ |
| Edit→Test→Edit cycles | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Cost per AI session | ✗ | ✗ | ✗ | ✗ | ⚠ per-skill | ✓ |
| Context quality scoring | ✗ | ⚠ catalog | ✗ | ✗ | ⚠ versioning | ✓ |
| "Explain it twice" detection | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Cross-tool comparison | ⚠ inferred | ⚠ inferred | ⚠ inferred | ✗ | ⚠ distribution | ✓ |
| AI tool ROI | ⚠ git-derived | ⚠ git-derived | ⚠ adoption only | ✗ | ✗ | ✓ |
Git-based tools
Measure commits, PRs, cycle time.
Can't see what happens inside an AI session.
agentisd (hook-based)
Observes every tool call, every iteration, every outcome.
Data that is structurally impossible for git-based tools.
Competitor data verified via jellyfish.co, getdx.com, swarmia.com, linearb.io, sleuth.io (April 2026). Jellyfish/DX "AI Impact" modules infer AI usage from git metadata, not session data.
System Design
Two sibling products, not parent-child. ctx_agentisd sends nothing to the cloud.
ctx_agentisd — Local MCP Tool (inside context-mode)
Purely local. Free. No cloud dependency.
agentisd Cloud — Separate Product
Paid, Separate Infrastructure
k-anonymity ≥ 5
No raw events, no code, no prompts
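The k ≥ 5 guarantee can be sketched as a filter on aggregate export; the row shape (`team`, `user`, `sessions`) is an illustrative assumption, but the invariant is the point — no group with fewer than k distinct users ever leaves the local store:

```python
from collections import defaultdict

K_THRESHOLD = 5  # k-anonymity floor stated above

def aggregate_with_k_anonymity(rows: list[dict], k: int = K_THRESHOLD) -> dict:
    """Aggregate per-team session counts, dropping groups smaller than k."""
    users = defaultdict(set)
    sessions = defaultdict(int)
    for r in rows:
        users[r["team"]].add(r["user"])
        sessions[r["team"]] += r["sessions"]
    return {
        team: {"users": len(u), "sessions": sessions[team]}
        for team, u in users.items()
        if len(u) >= k   # small groups are silently withheld
    }
```

Filtering before export, rather than in the cloud, is what makes "no raw events, no code, no prompts" enforceable locally.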
Developer Experience
Developer types ctx_agentisd or /agentisd — browser opens with local SessionDB metrics
Tool Distribution
"ctx_agentisd is a local MCP tool. No separate install. No cloud. Just your data in your browser."
Team analytics is a separate product: agentisd cloud (Cloudflare D1).
Evidence
63 tengu_* analytics events tracked internally by Anthropic. total_cost_usd computed per session.
Zero exposed to customers.
bridge/replBridge.ts
cli/print.ts:SDKResult
5 hook events, Zod-validated JSON I/O, blocking + async modes. Identical wire protocol across 3 platforms.
cli/structuredIO.ts
codex-rs/hooks/hook_runtime.rs
forceLoginMethod (SSO), organization.uuid, maxBudgetUsd, allowedMcpServers, RBAC roles, trusted devices.
bridge/types.ts
cli/auth/
Business Model
Free — ctx_agentisd
$0
MCP tool inside context-mode. Local dashboard.
56K installs, 6.5K GitHub stars
agentisd — Enterprise Product
$18/seat/mo
Team
$34/seat/mo
Enterprise
Financial Projection (50-seat engineering org)
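The raw seat math for the two listed tiers, assuming 12 billed months and no volume discount (both assumptions; the deck states only the per-seat prices):

```python
SEATS = 50  # org size from the projection above

def annual_cost(per_seat_monthly: float, seats: int = SEATS) -> float:
    """Annual license spend at a flat per-seat monthly price."""
    return per_seat_monthly * seats * 12

team_tier = annual_cost(18)        # $18/seat/mo tier
enterprise_tier = annual_cost(34)  # $34/seat/mo tier
```

Either figure sits well under the $27K/yr license savings cited in the CTO row of the metric table.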
Conclusion
context-mode (open source)
Data collection layer. 56K installs.
ctx_agentisd: free local dashboard.
agentisd (enterprise product)
Team analytics. Cloudflare D1.
$18-34/seat. Privacy-first.
"You measure everything about your code.
Why not the AI writing it?"
Research & documentation: github.com/mksglu/agentisd