measure-okr-grader


Scores completed OKR sets at cycle close with KR-level scoring per the canonical OKR type enum (committed | aspirational | learning | operational_health | compliance_or_safety), committed-vs-aspirational interpretation, evidence quality assessment, learning synthesis, and next-cycle recommendations. Refuses to retroactively change targets or shrink committed scope, average away guardrail KRs, treat 0.7 as success for committed or compliance_or_safety KRs, equate effort with impact, or use scores for individual performance. Hands off to iterate-lessons-log, iterate-retrospective, define-hypothesis, measure-dashboard-requirements, measure-instrumentation-spec, and foundation-okr-writer.

Data & Analytics

What this skill does

# OKR Grader

An OKR Cycle Review is a backward-looking artifact that closes the loop on a completed OKR set. It scores each KR against its baseline and target, separates committed from aspirational interpretation, surfaces what the evidence does and does not support, names what the team learned, and prepares input for next-cycle drafting. Done well, a cycle review protects the integrity of the OKR operating system by refusing to dress up missed commitments as aspirational stretch, refusing to celebrate effort over outcome, and refusing to let scoring carry weight it cannot bear.

This skill is an evidence interpreter, not an arithmetic engine. Its job is to read final KR values, compare them against the original OKR set's intent, and produce a review that names the learning honestly. It enforces the empirical scoring conventions drawn from Doerr (`Measure What Matters`), Wodtke (`Radical Focus`), Castro (committed vs aspirational interpretation), Grove (`High Output Management`), and the OKR community's accumulated practice on misuse failure modes. It pairs with `foundation-okr-writer` (which produced the OKR set being scored) and hands off the learnings produced here to the iterate skills that consume them.

## When to Use

- The OKR cycle has ended (or you are scoring a partial-cycle close)
- You have final or interim KR values, baselines, and targets
- Stakeholders need a clear review with score, evidence, and learning
- The team is deciding what to continue, stop, change, or carry forward
- There is disagreement about whether a score is good or bad
- Evidence quality across KRs is uneven and needs to be made visible

## When NOT to Use

- You are still drafting OKRs: use `foundation-okr-writer`
- You want a generic team retro: use `iterate-retrospective`
- You are reporting a single experiment result: use `measure-experiment-results`
- You need a stakeholder progress update without scoring: use `foundation-stakeholder-update`
- The OKR set was never agreed on or never tracked: scoring requires an authored set; backfill via `foundation-okr-writer` first
- You want to use scores to evaluate individuals: the skill refuses this

## Instructions

When asked to score completed OKRs, follow these steps:

1. **Validate scoring readiness**
Check inputs: original OKR set, cycle dates, final KR values (or interim values for partial-close), baselines, targets, evidence sources, and OKR types (committed | aspirational | learning | operational_health | compliance_or_safety). If a value is missing, mark it explicitly (`not-yet-observable`, `not-instrumented`, `not-supplied`); never fabricate. Refuse to grade KRs whose original definitions are missing entirely.
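
As a rough illustration of the readiness check above, the sketch below assumes a hypothetical `KRInput` record and a hypothetical `check_readiness` helper; neither name comes from the skill itself, and the field layout is only one plausible shape. It shows the two rules from this step: a missing value must carry an explicit marker, and a KR with no original definition is refused.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# The three explicit markers named in Step 1.
MISSING_MARKERS = {"not-yet-observable", "not-instrumented", "not-supplied"}


@dataclass
class KRInput:
    """Hypothetical input record for one KR (illustrative field names)."""
    name: str
    definition: Optional[str]          # original KR wording from the OKR set
    baseline: Union[float, str, None]  # a number or one of MISSING_MARKERS
    target: Union[float, str, None]
    final_value: Union[float, str, None]


def check_readiness(kr: KRInput) -> List[str]:
    """Return problems found; an empty list means every input is either
    present or explicitly marked as missing."""
    if not kr.definition:
        # A KR whose original definition is missing entirely is not graded.
        return [f"{kr.name}: original definition missing, refuse to grade"]
    problems = []
    for field_name in ("baseline", "target", "final_value"):
        value = getattr(kr, field_name)
        if value is None:
            # Silent gaps are not allowed; the caller must pick a marker.
            problems.append(
                f"{kr.name}: {field_name} must be a number or an explicit marker"
            )
        elif isinstance(value, str) and value not in MISSING_MARKERS:
            problems.append(f"{kr.name}: unknown marker {value!r} for {field_name}")
    return problems
```

In this sketch a KR whose `final_value` is `"not-instrumented"` passes the check, because the gap is stated rather than hidden; it still cannot receive a numeric score later.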

2. **Classify each KR's type and indicator class**
The OKR type is one of `committed | aspirational | learning | operational_health | compliance_or_safety` (the five values produced by `foundation-okr-writer`). The indicator class is one of `leading | lagging | guardrail | health | evidence_generation`. Carry both forward from the original OKR set, or assign defaults if the original set did not specify. The OKR type determines the scoring convention: `aspirational` uses the 0.6 to 0.7 sweet spot; `committed` targets 1.0; `compliance_or_safety` is binary; `operational_health` is pass | fail | drift-within-tolerance against a threshold band; `learning` grades by validated or invalidated rather than by score. The indicator class adds independent rules that apply on top of the type's scoring (see Step 3).
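
One way to keep the two classifications separate is a pair of enums plus a lookup from OKR type to scoring convention. This is a minimal sketch: the class names `OkrType` and `IndicatorClass`, the `SCORING_CONVENTION` table, and the `convention_for` helper are hypothetical, while the enum values and convention descriptions are taken from the step above.

```python
from enum import Enum


class OkrType(Enum):
    COMMITTED = "committed"
    ASPIRATIONAL = "aspirational"
    LEARNING = "learning"
    OPERATIONAL_HEALTH = "operational_health"
    COMPLIANCE_OR_SAFETY = "compliance_or_safety"


class IndicatorClass(Enum):
    LEADING = "leading"
    LAGGING = "lagging"
    GUARDRAIL = "guardrail"
    HEALTH = "health"
    EVIDENCE_GENERATION = "evidence_generation"


# Scoring convention per OKR type, paraphrased from Step 2.
SCORING_CONVENTION = {
    OkrType.ASPIRATIONAL: "numeric; sweet spot 0.6 to 0.7",
    OkrType.COMMITTED: "targets 1.0",
    OkrType.COMPLIANCE_OR_SAFETY: "binary",
    OkrType.OPERATIONAL_HEALTH: "pass | fail | drift-within-tolerance",
    OkrType.LEARNING: "validated or invalidated; no score",
}


def convention_for(okr_type: str) -> str:
    """Look up the convention; an unknown type string raises ValueError."""
    return SCORING_CONVENTION[OkrType(okr_type)]
```

Keeping the indicator class in its own enum reflects that its rules apply on top of the type's convention rather than replacing it.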

3. **Score each KR**
For each KR, compute or assign a score using the convention for its OKR type:
- `aspirational` KR: numeric score = (actual - baseline) / (target - baseline). Sweet spot is 0.6 to 0.7.
- `committed` KR: pass or fail against the target. Anything below 1.0 is a miss.
- `compliance_or_safety` KR: binary. Met or not met. No partial credit. No retroactive scope shrinkage when coverage is partial; mark as not-yet-fully-observable instead.
- `operational_health` KR: pass | fail | drift-within-tolerance against the threshold band.
- `learning` KR: validated, invalidated, partially-validated, or insufficient-evidence. No numeric score.
Then apply indicator-class rules independently of the OKR type:
- Any KR with indicator class `guardrail` is reported as its own signal and is NEVER averaged into the primary objective score, regardless of its OKR type. A high primary KR score must not mask a failed guardrail, and a failed guardrail is not blended into that score.
For each score, state the calculation or rationale and the evidence confidence (high | medium | low | unknown).
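
The two numeric conventions above can be sketched as follows. The function names are hypothetical, and treating a `committed` KR as a pass when the same progress ratio reaches 1.0 is an assumption made here so that both increase-type and decrease-type targets are handled; the skill itself only says "pass or fail against the target".

```python
def progress_ratio(baseline: float, target: float, actual: float) -> float:
    """(actual - baseline) / (target - baseline), as given for aspirational KRs."""
    if target == baseline:
        raise ValueError("target must differ from baseline")
    return (actual - baseline) / (target - baseline)


def score_aspirational(baseline: float, target: float, actual: float) -> float:
    """Numeric score; the sweet spot named in the skill is 0.6 to 0.7."""
    return progress_ratio(baseline, target, actual)


def score_committed(baseline: float, target: float, actual: float) -> str:
    """Assumption: pass only when the progress ratio is at least 1.0."""
    return "pass" if progress_ratio(baseline, target, actual) >= 1.0 else "fail"


# Worked example: baseline 20, target 40, actual 33.
print(round(score_aspirational(20, 40, 33), 2))  # 0.65
print(score_committed(20, 40, 33))               # fail
```

The same 0.65 reads as a healthy result for an `aspirational` KR and as a miss for a `committed` one, which is why the type has to be fixed before the number is interpreted.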

4. **Interpret the objective score**
Avoid naive averaging when one KR is a guardrail, compliance threshold, or learning KR. Produce a qualitative read of the objective alongside any rough numeric average. State explicitly what the score does and does not mean.
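
A minimal sketch of that rule, assuming each scored KR is a plain dict with hypothetical keys `name`, `okr_type`, `indicator_class`, and `score`. The choice to also keep non-numeric scores (such as `"pass"` or `"fail"`) out of the average is an assumption of this sketch, not something the skill prescribes.

```python
from statistics import mean
from typing import Any, Dict, List

# OKR types that this sketch never folds into the rough average.
EXCLUDED_TYPES = {"compliance_or_safety", "learning"}


def interpret_objective(krs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Split KRs into those that feed a rough average and those that are
    reported as separate signals."""
    averaged, separate = [], []
    for kr in krs:
        score = kr.get("score")
        numeric = isinstance(score, (int, float)) and not isinstance(score, bool)
        if (
            kr["indicator_class"] == "guardrail"
            or kr["okr_type"] in EXCLUDED_TYPES
            or not numeric
        ):
            separate.append(kr["name"])
        else:
            averaged.append(score)
    return {
        "rough_average": mean(averaged) if averaged else None,
        "reported_separately": separate,
    }


krs = [
    {"name": "activation", "okr_type": "aspirational",
     "indicator_class": "leading", "score": 0.7},
    {"name": "retention", "okr_type": "aspirational",
     "indicator_class": "lagging", "score": 0.5},
    {"name": "latency guardrail", "okr_type": "committed",
     "indicator_class": "guardrail", "score": 0.0},
    {"name": "audit passed", "okr_type": "compliance_or_safety",
     "indicator_class": "lagging", "score": "not met"},
]
result = interpret_objective(krs)
```

Here the rough average comes out near 0.6 from the two aspirational KRs, while the failed guardrail and the unmet compliance KR appear by name in `reported_separately`. The number is still only an input to the qualitative read, not a replacement for it.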

5. **Assess evidence quality**
For each KR, name the evidence's reliability and any caveats (instrumentation gaps, target shifts mid-cycle, cohort definition changes, measurement window mismatches, sample-size limitations). Recommend fixes for next cycle's measurement plan.

6. **Review initiatives as bets**
For each initiative the team ran, name which KR it was expected to move, whether it shipped, what its apparent contribution was, and whether the evidence supports continuing, retiring, or reworking it. Use Castro's "initiatives are bets, not commitments" framing. Separate ship-status from KR-impact; an initiative that shipped on time but did not move its KR is not a partial win.

7. **Synthesize learning**
Capture validated assumptions, invalidated assumptions, surprises, and decision implications. Distinguish between learnings about the customer or product (carry forward), learnings about team process (hand to `iterate-retrospective`), and learnings about measurement (hand to `measure-instrumentation-spec` or `measure-dashboard-requirements`).

8. **Prepare next-cycle recommendations**
For each objective: continue, revise, retire, or escalate. Suggest candidate next-cycle OKRs or open questions, and hand next-cycle drafting off to `foundation-okr-writer`. Hand off measurement gaps to `measure-dashboard-requirements` or `measure-instrumentation-spec`, assumption tests to `define-hypothesis`, team-process work to `iterate-retrospective`, and organizational memory to `iterate-lessons-log`.

9. **Surface risks in interpretation**
Make explicit any places where the score could mislead a reader: forced numeric scores on KRs that are not yet observable, confounded initiative results, stakeholder framings that understate the evidence, and single-cycle results that need a second cycle of confirmation.

10. **Note the source of truth**
The artifact is a review document, not the canonical OKR system. Include a `source_of_truth` field pointing to the original OKR tracker.

11. **Finalize for direct use**
Remove all skill instruction commentary from the final artifact. The final output should be reader-facing.

## Constraint Rules (MUST / MUST NOT)

These rules are non-negotiable. The skill enforces them in every grading run.

- **MUST NOT** retroactively change baselines, targets, or KR definitions. If the team adjusted these mid-cycle, document the change explicitly and grade against both the original and adjusted versions.
- **MUST NOT** retroactively shrink the scope of a `committed` or `compliance_or_safety` KR to mark partial coverage as a pass. If the original commitment named 3 healthcare accounts and only 1 has been audited, the KR is `not-yet-fully-observable`. The 1-account result is a sub-signal, not the KR score.
- **MUST NOT** treat 0.7 as success for `committed`, `compliance_or_safety`, or `operational_health` KRs. Those target 1.0 (or the threshold band).
- **MUST NOT** average away a failed guardrail. A failed guardrail is a separate s

Files: 3

Size: 37.3 KB

Complexity: 55/100

Category: Data & Analytics

Source: https://github.com/product-on-purpose/pm-skills/tree/main/skills/measure-okr-grader
