metric-runner

Included with Lifetime

$97 forever

# Metric Runner Skill

Data & Analytics

What this skill does

# Metric Runner Skill

Runs verification tests and extracts metrics for ralph-loop++.

## Purpose

Execute the verification test created by the Test Architect and parse the results into a usable format for decision-making.

## Running a Test

**IMPORTANT:** Never use `eval` or `bash -c` to execute test commands. Use the safe execution script instead.

```bash
# Use the run-test.sh script for safe execution
TEST_COMMAND="$1"
WORKING_DIR="$2"

# Safe execution (validates command format, no eval)
OUTPUT=$("${CLAUDE_PLUGIN_ROOT}/scripts/run-test.sh" "$TEST_COMMAND" "$WORKING_DIR")
EXIT_CODE=$?

echo "$OUTPUT"
exit $EXIT_CODE
```

### Allowed Test Commands

The following command prefixes are allowed:
- `node <script>`
- `python <script>` / `python3 <script>`
- `npm test` / `npx <command>`
- `pnpm <command>`
- `bun <command>`
- `pytest <args>`
- `jest <args>`
- `vitest <args>`

Commands containing shell metacharacters (`;`, `|`, `&`, `` ` ``, `$`, `(`) are rejected for security.

## Parsing Results

### Numeric Metric
```javascript
const output = `{"success": true, "metric_value": 45.2, "unit": "ms"}`;
const result = JSON.parse(output);

if (result.success && result.metric_value !== undefined) {
  console.log(`Metric: ${result.metric_value} ${result.unit || ''}`);
}
```

### Boolean Success
```javascript
const output = `{"success": true, "reason": "All tests passed"}`;
const result = JSON.parse(output);

if (result.success !== undefined) {
  console.log(`Success: ${result.success}`);
  if (result.reason) console.log(`Reason: ${result.reason}`);
}
```

## Expected Output Formats

### Valid Formats
```json
{"success": true, "metric_value": 42, "unit": "ms"}
{"success": true, "metric_name": "latency", "metric_value": 42}
{"success": false, "error": "Connection refused"}
{"success": true, "reason": "All 10 runs passed"}
{"success": true}
```

### Invalid Formats (Handle Gracefully)
```
# Plain text output - treat as failure
Error: something went wrong

# Non-JSON - treat as failure
42

# Missing success field - infer from metric_value
{"metric_value": 42}
```

## Comparison Logic

### Numeric Metrics

```javascript
function checkNumericGoal(current, target, direction) {
  if (direction === 'decrease') {
    return current <= target;
  } else if (direction === 'increase') {
    return current >= target;
  }
  return false;
}

// Example
const achieved = checkNumericGoal(45, 50, 'decrease');  // true
```

### Boolean Metrics

```javascript
function checkBooleanGoal(result) {
  return result.success === true;
}
```

## Progress Tracking

After each test run, update the state file:

```yaml
# Append to metrics_history
metrics_history:
  - timestamp: "2024-01-06T10:30:00Z"
    worker: 1
    iteration: 5
    metric_value: 85
    goal_achieved: false
```

## Handling Test Failures

### Test Crashes
```bash
if [[ $EXIT_CODE -ne 0 ]]; then
  echo '{"success": false, "error": "Test crashed with exit code '$EXIT_CODE'"}'
fi
```

### Timeout
```bash
timeout 300 $TEST_COMMAND || echo '{"success": false, "error": "Test timed out after 5 minutes"}'
```

### Invalid JSON
```javascript
try {
  const result = JSON.parse(output);
} catch (e) {
  return {
    success: false,
    error: `Invalid JSON output: ${output.substring(0, 100)}`
  };
}
```

## Baseline Measurement

When establishing baseline:

```javascript
// Run test multiple times for stability
const runs = 3;
const results = [];

for (let i = 0; i < runs; i++) {
  const result = runTest(testCommand);
  results.push(result.metric_value);
}

// Use median for baseline
results.sort((a, b) => a - b);
const baseline = results[Math.floor(runs / 2)];

console.log(`Baseline established: ${baseline}`);
```

## Integration with Ralph Loop

The worker stop hook uses this skill to:
1. Run the verification test after each iteration
2. Parse the metric value
3. Compare against target
4. Decide whether to continue or signal completion

```bash
# In stop hook
RESULT=$(run-metric-test "$TEST_COMMAND" "$WORKTREE_PATH")
METRIC=$(echo "$RESULT" | jq -r '.metric_value // empty')

if [[ -n "$METRIC" ]] && (( $(echo "$METRIC <= $TARGET" | bc -l) )); then
  echo "<promise>GOAL ACHIEVED: $METRIC</promise>"
fi
```

Files: 1

Size: 4.1 KB

Complexity: 6/100

Category: Data & Analytics

Source: https://github.com/ponderingbgi/ralph-loop-pp/tree/main/plugin/skills/metric-runner

Related in Data & Analytics

clawarr-suite

Included

Comprehensive management for self-hosted media stacks (Sonarr, Radarr, Lidarr, Readarr, Prowlarr, Bazarr, Overseerr, Plex, Tautulli, SABnzbd, Recyclarr, Unpackerr, Notifiarr, Maintainerr, Kometa, FlareSolverr). Deep library exploration, analytics, dashboard generation, content management, request handling, subtitle management, indexer control, download monitoring, quality profile sync, library cleanup automation, notification routing, collection/overlay management, and media tracker integration (Trakt, Letterboxd, Simkl).

Data & Analyticsscripts

querying-soql

Included

SOQL query generation, optimization, and analysis with 100-point scoring. Use this skill when the user needs SOQL/SOSL authoring or optimization: natural-language-to-query generation, relationship queries, aggregates, query-plan analysis, and performance or safety improvements for Salesforce queries. TRIGGER when: user writes, optimizes, or debugs SOQL/SOSL queries, touches .soql files, or asks about relationship queries, aggregates, or query performance. DO NOT TRIGGER when: bulk data operations (use handling-sf-data), Apex DML logic (use generating-apex), or report/dashboard queries.

Data & Analyticsscripts

app-store-optimization

Included

App Store Optimization (ASO) toolkit for researching keywords, analyzing competitor rankings, generating metadata suggestions, and improving app visibility on Apple App Store and Google Play Store. Use when the user asks about ASO, app store rankings, app metadata, app titles and descriptions, app store listings, app visibility, or mobile app marketing on iOS or Android. Supports keyword research and scoring, competitor keyword analysis, metadata optimization, A/B test planning, launch checklists, and tracking ranking changes.

Data & Analyticsscripts

habit-flow

Included

AI-powered atomic habit tracker with natural language logging, streak tracking, smart reminders, and coaching. Use for creating habits, logging completions naturally ("I meditated today"), viewing progress, and getting personalized coaching.

Data & Analyticsscripts

app-store-optimization

Included

Data & Analyticsscripts

visualizing-data

Included

Builds dashboards, reports, and data-driven interfaces requiring charts, graphs, or visual analytics. Provides systematic framework for selecting appropriate visualizations based on data characteristics and analytical purpose. Includes 24+ visualization types organized by purpose (trends, comparisons, distributions, relationships, flows, hierarchies, geospatial), accessibility patterns (WCAG 2.1 AA compliance), colorblind-safe palettes, and performance optimization strategies. Use when creating visualizations, choosing chart types, displaying data graphically, or designing data interfaces.

Data & Analyticsscripts