oss-kpi-evaluation
Evaluates open source projects against Key Performance Indicators using multi-agent research. Performs live web research to gather current metrics on community health, maintenance, security, documentation, adoption, and code quality. Use when evaluating OSS projects for adoption, investment, or comparison.
# OSS KPI Evaluation Agent Skill
## Purpose and Scope
This skill evaluates open source projects against six Key Performance Indicator (KPI) categories using multi-agent research with built-in bias prevention. It performs live web research to gather current, verifiable metrics. An optional 7th category (Red Hat Engagement) can be included when explicitly requested to measure Red Hat AI Engineering's participation in a project.
### When to Use This Skill
- **Adoption Decisions**: Evaluating whether to adopt an OSS project
- **Investment Analysis**: Assessing project health for funding decisions
- **Project Comparison**: Objective comparison of competing projects
- **Due Diligence**: Comprehensive project health assessment
- **Risk Assessment**: Identifying maintenance, security, or community risks
### Scope Limitations
- **GitHub Only**: Projects must be hosted on GitHub
- **Single Project**: Evaluates one project per invocation
- **English Language**: Optimized for English-language projects and documentation
## Quick Start
```
Evaluate the project: https://github.com/fastapi/fastapi
```
Or with specific focus:
```
Evaluate https://github.com/psf/requests focusing on security and maintenance
```
## Evaluation Workflow
The evaluation follows six numbered phases, plus two intermediate checks (Phase 1.5 and Phase 3.5), with strict quality controls.
### Phase 1: Input Validation
- [ ] Extract owner and repository name from input
- [ ] Validate repository exists via `gh repo view`
- [ ] Confirm repository is public
- [ ] Record repository metadata (stars, forks, language, license)
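The first checklist item can be sketched in Python as below. This is a minimal illustration, not the skill's actual parser: the names `RepoRef` and `parse_repo` are hypothetical, and the sketch assumes the input has already been reduced to a bare `owner/repo` string or a repository root URL.

```python
import re
from typing import NamedTuple


class RepoRef(NamedTuple):
    """Hypothetical container for the parsed project identifier."""
    owner: str
    repo: str


# Accepts "owner/repo" or a github.com repository root URL,
# with an optional ".git" suffix or trailing slash.
_REPO_PATTERN = re.compile(
    r"^(?:https?://)?(?:www\.)?(?:github\.com/)?"
    r"(?P<owner>[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?)/"
    r"(?P<repo>[A-Za-z0-9._-]+?)(?:\.git)?/?$"
)


def parse_repo(text: str) -> RepoRef:
    """Extract owner and repository name, or raise ValueError."""
    match = _REPO_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a GitHub repository reference: {text!r}")
    return RepoRef(match["owner"], match["repo"])


print(parse_repo("https://github.com/fastapi/fastapi"))
```

Deeper URLs such as issue or file links are rejected rather than guessed at, which keeps the later `gh repo view` check unambiguous.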
### Phase 1.5: LDAP Prerequisite Check (Conditional)
If Red Hat Engagement analysis is requested, verify prerequisites before dispatch:
- [ ] Verify Kerberos ticket: `klist` — must show a valid TGT
- [ ] Test LDAP connectivity: `ldapsearch -x -H ldap://ldap.corp.redhat.com -b dc=redhat,dc=com '(uid=shuels)' uid`
- [ ] If both checks pass: Set `ldap_available=true` for the RH subagent
- [ ] If either check fails: Set `ldap_available=false` — RH subagent will use email-only fallback with reduced confidence
- [ ] Log prerequisite check results for inclusion in the report
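The prerequisite logic above might look like the following sketch. The two commands are copied from the checklist; everything else (`run_ok`, `check_ldap_prerequisites`, the 15-second timeout, the result keys other than `ldap_available`) is an assumption made for illustration. Treating exit status 0 from `klist` as "valid TGT" is only an approximation of the check described above.

```python
import subprocess
from typing import Callable, Sequence

# Commands taken from the checklist above.
KLIST_CMD = ["klist"]
LDAP_CMD = [
    "ldapsearch", "-x", "-H", "ldap://ldap.corp.redhat.com",
    "-b", "dc=redhat,dc=com", "(uid=shuels)", "uid",
]


def run_ok(argv: Sequence[str]) -> bool:
    """Return True when the command runs and exits with status 0."""
    try:
        result = subprocess.run(list(argv), capture_output=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or a hung connection counts as a failed check.
        return False
    return result.returncode == 0


def check_ldap_prerequisites(
    runner: Callable[[Sequence[str]], bool] = run_ok,
) -> dict:
    """Run both checks and derive the ldap_available flag."""
    kerberos_ok = runner(KLIST_CMD)
    ldap_ok = runner(LDAP_CMD)
    return {
        "kerberos_ticket": kerberos_ok,
        "ldap_connectivity": ldap_ok,
        # Both checks must pass; otherwise the RH subagent falls back to email-only.
        "ldap_available": kerberos_ok and ldap_ok,
    }
```

The returned dictionary doubles as the logged prerequisite result for the report. The `runner` parameter exists so the logic can be exercised without a real Kerberos or LDAP environment.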
### Phase 2: Parallel Research Dispatch
Launch the six core research subagents in parallel, plus the Red Hat Engagement subagent when that analysis was requested. Each subagent:
- Receives ONLY the project identifier and metrics to collect
- Has no access to other subagent results
- Must use live data sources (WebSearch, WebFetch, gh CLI)
- Returns structured JSON with sources
**Subagents to dispatch:**
1. `kpi-community-researcher` - Community health metrics
2. `kpi-maintenance-researcher` - Maintenance activity metrics
3. `kpi-security-researcher` - Security posture metrics
4. `kpi-documentation-researcher` - Documentation quality metrics
5. `kpi-adoption-researcher` - Adoption and usage metrics
6. `kpi-codequality-researcher` - Code quality metrics
7. `kpi-rh-engagement-researcher` - Red Hat Engagement metrics *(conditional — only if requested)*
See `references/RESEARCH-PROMPTS.md` for subagent prompt templates.
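The isolation and parallelism rules above can be illustrated with a small Python sketch. The skill itself dispatches through the Task tool; here `run_subagent` is a hypothetical stand-in for that call, and `dispatch` and `select_subagents` are illustrative names, not part of the skill.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

CORE_SUBAGENTS = [
    "kpi-community-researcher",
    "kpi-maintenance-researcher",
    "kpi-security-researcher",
    "kpi-documentation-researcher",
    "kpi-adoption-researcher",
    "kpi-codequality-researcher",
]
RH_SUBAGENT = "kpi-rh-engagement-researcher"


def select_subagents(include_rh: bool = False) -> list[str]:
    """Six core researchers, plus the conditional Red Hat researcher."""
    return CORE_SUBAGENTS + ([RH_SUBAGENT] if include_rh else [])


def dispatch(
    owner: str,
    repo: str,
    run_subagent: Callable[[str, dict], dict],  # hypothetical Task-tool wrapper
    include_rh: bool = False,
) -> dict[str, dict]:
    """Run every selected subagent concurrently with an isolated payload."""
    names = select_subagents(include_rh)
    # Each subagent gets its own fresh payload holding only the project
    # identifier, so no result or opinion can leak between researchers.
    payloads = {name: {"owner": owner, "repo": repo} for name in names}
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = {
            name: pool.submit(run_subagent, name, payloads[name])
            for name in names
        }
        return {name: future.result() for name, future in futures.items()}
```

No subagent ever sees another's payload or output, which is the property the bias-prevention rules depend on.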
### Phase 3: Result Collection & Parsing
- [ ] Collect JSON output from each subagent
- [ ] Validate JSON structure matches expected schema
- [ ] Flag any subagent failures or incomplete data
- [ ] Aggregate results into unified data structure
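A minimal sketch of this collection step follows. The expected schema is not shown in this document, so `REQUIRED_KEYS` is a placeholder assumption, and `collect_results` is a hypothetical name.

```python
import json

# Assumed top-level keys; the real schema is not shown in this document.
REQUIRED_KEYS = {"category", "metrics", "sources"}


def collect_results(raw_outputs: dict) -> dict:
    """Parse each subagent's raw JSON text and separate successes from failures."""
    aggregated = {"categories": {}, "failures": []}
    for name, raw in raw_outputs.items():
        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            # TypeError covers a subagent that returned nothing (None).
            aggregated["failures"].append({"subagent": name, "reason": "invalid JSON"})
            continue
        if not isinstance(data, dict):
            aggregated["failures"].append(
                {"subagent": name, "reason": "top-level value is not an object"}
            )
            continue
        missing = sorted(REQUIRED_KEYS - data.keys())
        if missing:
            aggregated["failures"].append(
                {"subagent": name, "reason": "missing keys: " + ", ".join(missing)}
            )
            continue
        aggregated["categories"][name] = data
    return aggregated
```

Failures are recorded rather than raised, so one broken subagent does not discard the other categories.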
### Phase 3.5: Source Verification
**Purpose**: Prevent hallucinated values by independently re-fetching critical metrics.
**HIGH Priority Metrics** (must verify):
- `star_count` - via `gh repo view {owner}/{repo} --json stargazerCount`
- `contributor_count` - via `gh api repos/{owner}/{repo}/stats/contributors`
- `download_count` - via package registry API
**Verification Protocol**:
- [ ] Re-fetch each HIGH priority metric using the source command
- [ ] Compare subagent-reported value with freshly fetched value
- [ ] Apply 5% tolerance for timing differences
- [ ] Mark metrics exceeding tolerance as "UNVERIFIED" with both values
- [ ] If >3 metrics fail verification, trigger subagent re-run for affected category
**Verification Output Format**:
```json
{
"metric": "star_count",
"subagent_value": 15234,
"verified_value": 15189,
"variance_percent": 0.3,
"status": "VERIFIED",
"verified_at": "ISO-8601 timestamp"
}
```
**Unverified Metric Handling**:
- NEVER report disputed values as fact
- Report as: "star_count: UNVERIFIED (subagent: 15234, verification: 12891)"
- Flag in Data Quality Assessment section
- Reduce confidence score for affected category
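The tolerance check, record format, and unverified reporting described above can be sketched as follows. The function names are hypothetical. The sketch assumes variance is measured relative to the freshly fetched value, because the protocol does not say which value is the denominator.

```python
from datetime import datetime, timezone

TOLERANCE_PERCENT = 5.0  # tolerance for timing differences, per the protocol


def verify_metric(metric, subagent_value, verified_value,
                  tolerance_percent=TOLERANCE_PERCENT):
    """Compare a reported value with a re-fetched one and build the record."""
    if verified_value == 0:
        # Relative variance is undefined against zero; any non-zero report
        # is treated as a full mismatch (an assumption of this sketch).
        variance = 0.0 if subagent_value == 0 else 100.0
    else:
        variance = abs(subagent_value - verified_value) / abs(verified_value) * 100
    status = "VERIFIED" if variance <= tolerance_percent else "UNVERIFIED"
    return {
        "metric": metric,
        "subagent_value": subagent_value,
        "verified_value": verified_value,
        "variance_percent": round(variance, 1),
        "status": status,
        "verified_at": datetime.now(timezone.utc).isoformat(),
    }


def format_unverified(record):
    """Render a disputed metric with both values instead of a single 'fact'."""
    return (
        f'{record["metric"]}: UNVERIFIED '
        f'(subagent: {record["subagent_value"]}, '
        f'verification: {record["verified_value"]})'
    )


record = verify_metric("star_count", 15234, 12891)
print(format_unverified(record))
```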
### Phase 4: Cross-Validation
- [ ] Compare overlapping metrics between subagents
- [ ] Flag discrepancies exceeding 10% variance
- [ ] Verify all URLs are accessible
- [ ] Check temporal consistency (data freshness)
- [ ] **Bounds Validation**: Verify logical constraints between metrics
  - `bus_factor` must be ≤ `contributor_count`
  - `contributor_count` must be ≤ `fork_count * 10` (heuristic upper bound)
  - `issue_engagement` >90% with <5 contributors warrants scrutiny
- [ ] **Independent Recalculation**: For calculated metrics (bus_factor, engagement rates), recompute from intermediate data provided by subagents. Flag >5% variance from reported value.
- [ ] **Temporal Sanity**: Verify all subagent collections completed within 5 minutes of each other
See `references/BIAS-PREVENTION.md` for validation protocols and semantic consistency rules.
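The bounds and temporal checks in this phase are simple enough to sketch directly. The function names are hypothetical, and the sketch assumes `issue_engagement` is a percentage between 0 and 100. Metrics that are missing or marked `NOT_AVAILABLE` are skipped rather than guessed.

```python
from datetime import datetime, timedelta


def _number(metrics: dict, key: str):
    """Return a numeric metric, or None when it is missing or NOT_AVAILABLE."""
    value = metrics.get(key)
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    return None


def check_bounds(metrics: dict) -> list:
    """Return a human-readable flag for every violated logical constraint."""
    flags = []
    bus = _number(metrics, "bus_factor")
    contributors = _number(metrics, "contributor_count")
    forks = _number(metrics, "fork_count")
    engagement = _number(metrics, "issue_engagement")  # assumed to be a percent

    if bus is not None and contributors is not None and bus > contributors:
        flags.append("bus_factor exceeds contributor_count")
    if contributors is not None and forks is not None and contributors > forks * 10:
        flags.append("contributor_count exceeds fork_count * 10")
    if (engagement is not None and contributors is not None
            and engagement > 90 and contributors < 5):
        flags.append("issue_engagement above 90% with fewer than 5 contributors")
    return flags


def collected_within(timestamps: list, window: timedelta = timedelta(minutes=5)) -> bool:
    """Temporal sanity: all collection times fall inside one window."""
    return not timestamps or max(timestamps) - min(timestamps) <= window


print(check_bounds({"bus_factor": 12, "contributor_count": 8, "fork_count": 40}))
```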
### Phase 5: Mechanical Scoring
- [ ] Load scoring rubric from `assets/scoring-rubric.json`
- [ ] Apply threshold-based scoring to each metric
- [ ] Calculate category scores (average of metric scores)
- [ ] Calculate overall score (weighted average of categories)
- [ ] Apply custom weights if specified by user
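The scoring arithmetic can be sketched as below. The layout of `assets/scoring-rubric.json` is not shown in this document, so the threshold format, the 0 fallback score, the equal default weights, and all function names here are assumptions for illustration only.

```python
from statistics import mean
from typing import Optional


def score_metric(value: float, thresholds: list) -> int:
    """Threshold-based scoring.

    `thresholds` is a hypothetical rubric entry: (minimum_value, score)
    pairs sorted from the highest minimum to the lowest.
    """
    for minimum, score in thresholds:
        if value >= minimum:
            return score
    return 0  # below every threshold


def category_score(metric_scores: list) -> float:
    """Category score is the plain average of its metric scores."""
    return mean(metric_scores)


def overall_score(category_scores: dict, weights: Optional[dict] = None) -> float:
    """Weighted average of category scores; equal weights unless overridden."""
    if weights is None:
        weights = {name: 1.0 for name in category_scores}
    total_weight = sum(weights[name] for name in category_scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted = sum(score * weights[name] for name, score in category_scores.items())
    return weighted / total_weight


stars_rubric = [(1000, 5), (100, 3), (10, 1)]  # illustrative values only
print(score_metric(15189, stars_rubric))
```

Because every step is a lookup or an average, the same inputs always give the same score, which is the point of keeping this phase mechanical.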
### Phase 6: Report Generation
- [ ] Generate executive summary with overall score
- [ ] Create per-category detailed sections
- [ ] Include evidence tables with sources
- [ ] Document data quality issues
- [ ] List all sources with retrieval timestamps
See `references/REPORT-TEMPLATE.md` for output format.
## Subagent Architecture
Each research subagent operates in complete isolation to prevent bias contamination.
### Architecture Principles
1. **Statelessness**: Each subagent receives a complete, self-contained prompt
2. **No Shared Context**: Subagents cannot reference parent conversation
3. **Live Data Only**: All metrics from WebSearch, WebFetch, or gh CLI
4. **Structured Output**: JSON format for unambiguous parsing
5. **Source Attribution**: Every data point includes its source
### Subagent Responsibilities
| Subagent | Purpose | Primary Tools |
|----------|---------|---------------|
| kpi-community-researcher | Contributor metrics, engagement | WebSearch, gh CLI |
| kpi-maintenance-researcher | Release cadence, issue resolution | WebSearch, gh CLI |
| kpi-security-researcher | Security policy, vulnerabilities | WebSearch, gh CLI |
| kpi-documentation-researcher | Docs quality, completeness | WebSearch, Read |
| kpi-adoption-researcher | Usage metrics, trends | WebSearch, WebFetch |
| kpi-codequality-researcher | Tests, CI, code review | WebSearch, gh CLI |
| kpi-rh-engagement-researcher | RH employee contributions, governance | ldapsearch, gh CLI, git log *(conditional)* |
### Launching Subagents
Use the Task tool with `subagent_type: "general-purpose"` for each researcher. Launch all of them (six, or seven when Red Hat Engagement is requested) in parallel using a single message with multiple Task tool invocations.
**Critical**: Use the prompts from `references/RESEARCH-PROMPTS.md` verbatim, substituting only the `{owner}` and `{repo}` placeholders.
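A minimal sketch of the substitution rule, assuming a template is plain text containing the two placeholders. The function name `fill_prompt` and the example template are hypothetical, and the real templates live in `references/RESEARCH-PROMPTS.md`.

```python
def fill_prompt(template: str, owner: str, repo: str) -> str:
    """Substitute only {owner} and {repo}, leaving all other text verbatim."""
    # str.replace touches just the two named placeholders. str.format would
    # also try to interpret any other literal braces in the template, such
    # as an embedded JSON example.
    return template.replace("{owner}", owner).replace("{repo}", repo)


example = 'Research {owner}/{repo}. Return JSON like {"metrics": {}}.'
print(fill_prompt(example, "psf", "requests"))
```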
## Bias Prevention Protocol
This skill implements strict bias prevention measures. See `references/BIAS-PREVENTION.md` for complete protocols.
### Key Controls
1. **Subagent Isolation**: Research agents receive only project ID and metrics to collect - no opinions, no context about why evaluation is happening
2. **Evidence-Based Only**: Every claim requires a verifiable source URL
3. **No Assumptions**: Missing data is flagged as "NOT_AVAILABLE" - never estimated
4. **Multiple Sources**: Critical metrics cross-validated with 2+ sources
5. **Neutr…