metric-runner
# Metric Runner Skill
What this skill does
# Metric Runner Skill
Runs verification tests and extracts metrics for ralph-loop++.
## Purpose
Execute the verification test created by the Test Architect and parse the results into a usable format for decision-making.
## Running a Test
**IMPORTANT:** Never use `eval` or `bash -c` to execute test commands. Use the safe execution script instead.
```bash
# Use the run-test.sh script for safe execution
TEST_COMMAND="$1"
WORKING_DIR="$2"
# Safe execution (validates command format, no eval)
OUTPUT=$("${CLAUDE_PLUGIN_ROOT}/scripts/run-test.sh" "$TEST_COMMAND" "$WORKING_DIR")
EXIT_CODE=$?
echo "$OUTPUT"
exit $EXIT_CODE
```
### Allowed Test Commands
The following command prefixes are allowed:
- `node <script>`
- `python <script>` / `python3 <script>`
- `npm test` / `npx <command>`
- `pnpm <command>`
- `bun <command>`
- `pytest <args>`
- `jest <args>`
- `vitest <args>`
Commands containing shell metacharacters (`;`, `|`, `&`, `` ` ``, `$`, `(`) are rejected for security.
## Parsing Results
### Numeric Metric
```javascript
const output = `{"success": true, "metric_value": 45.2, "unit": "ms"}`;
const result = JSON.parse(output);
if (result.success && result.metric_value !== undefined) {
console.log(`Metric: ${result.metric_value} ${result.unit || ''}`);
}
```
### Boolean Success
```javascript
const output = `{"success": true, "reason": "All tests passed"}`;
const result = JSON.parse(output);
if (result.success !== undefined) {
console.log(`Success: ${result.success}`);
if (result.reason) console.log(`Reason: ${result.reason}`);
}
```
## Expected Output Formats
### Valid Formats
```json
{"success": true, "metric_value": 42, "unit": "ms"}
{"success": true, "metric_name": "latency", "metric_value": 42}
{"success": false, "error": "Connection refused"}
{"success": true, "reason": "All 10 runs passed"}
{"success": true}
```
### Invalid Formats (Handle Gracefully)
```
# Plain text output - treat as failure
Error: something went wrong
# Non-JSON - treat as failure
42
# Missing success field - infer from metric_value
{"metric_value": 42}
```
## Comparison Logic
### Numeric Metrics
```javascript
function checkNumericGoal(current, target, direction) {
if (direction === 'decrease') {
return current <= target;
} else if (direction === 'increase') {
return current >= target;
}
return false;
}
// Example
const achieved = checkNumericGoal(45, 50, 'decrease'); // true
```
### Boolean Metrics
```javascript
function checkBooleanGoal(result) {
return result.success === true;
}
```
## Progress Tracking
After each test run, update the state file:
```yaml
# Append to metrics_history
metrics_history:
- timestamp: "2024-01-06T10:30:00Z"
worker: 1
iteration: 5
metric_value: 85
goal_achieved: false
```
## Handling Test Failures
### Test Crashes
```bash
if [[ $EXIT_CODE -ne 0 ]]; then
echo '{"success": false, "error": "Test crashed with exit code '$EXIT_CODE'"}'
fi
```
### Timeout
```bash
timeout 300 $TEST_COMMAND || echo '{"success": false, "error": "Test timed out after 5 minutes"}'
```
### Invalid JSON
```javascript
try {
const result = JSON.parse(output);
} catch (e) {
return {
success: false,
error: `Invalid JSON output: ${output.substring(0, 100)}`
};
}
```
## Baseline Measurement
When establishing baseline:
```javascript
// Run test multiple times for stability
const runs = 3;
const results = [];
for (let i = 0; i < runs; i++) {
const result = runTest(testCommand);
results.push(result.metric_value);
}
// Use median for baseline
results.sort((a, b) => a - b);
const baseline = results[Math.floor(runs / 2)];
console.log(`Baseline established: ${baseline}`);
```
## Integration with Ralph Loop
The worker stop hook uses this skill to:
1. Run the verification test after each iteration
2. Parse the metric value
3. Compare against target
4. Decide whether to continue or signal completion
```bash
# In stop hook
RESULT=$(run-metric-test "$TEST_COMMAND" "$WORKTREE_PATH")
METRIC=$(echo "$RESULT" | jq -r '.metric_value // empty')
if [[ -n "$METRIC" ]] && (( $(echo "$METRIC <= $TARGET" | bc -l) )); then
echo "<promise>GOAL ACHIEVED: $METRIC</promise>"
fi
```
Related in Data & Analytics
clawarr-suite
IncludedComprehensive management for self-hosted media stacks (Sonarr, Radarr, Lidarr, Readarr, Prowlarr, Bazarr, Overseerr, Plex, Tautulli, SABnzbd, Recyclarr, Unpackerr, Notifiarr, Maintainerr, Kometa, FlareSolverr). Deep library exploration, analytics, dashboard generation, content management, request handling, subtitle management, indexer control, download monitoring, quality profile sync, library cleanup automation, notification routing, collection/overlay management, and media tracker integration (Trakt, Letterboxd, Simkl).
querying-soql
IncludedSOQL query generation, optimization, and analysis with 100-point scoring. Use this skill when the user needs SOQL/SOSL authoring or optimization: natural-language-to-query generation, relationship queries, aggregates, query-plan analysis, and performance or safety improvements for Salesforce queries. TRIGGER when: user writes, optimizes, or debugs SOQL/SOSL queries, touches .soql files, or asks about relationship queries, aggregates, or query performance. DO NOT TRIGGER when: bulk data operations (use handling-sf-data), Apex DML logic (use generating-apex), or report/dashboard queries.
app-store-optimization
IncludedApp Store Optimization (ASO) toolkit for researching keywords, analyzing competitor rankings, generating metadata suggestions, and improving app visibility on Apple App Store and Google Play Store. Use when the user asks about ASO, app store rankings, app metadata, app titles and descriptions, app store listings, app visibility, or mobile app marketing on iOS or Android. Supports keyword research and scoring, competitor keyword analysis, metadata optimization, A/B test planning, launch checklists, and tracking ranking changes.
habit-flow
IncludedAI-powered atomic habit tracker with natural language logging, streak tracking, smart reminders, and coaching. Use for creating habits, logging completions naturally ("I meditated today"), viewing progress, and getting personalized coaching.
app-store-optimization
IncludedApp Store Optimization (ASO) toolkit for researching keywords, analyzing competitor rankings, generating metadata suggestions, and improving app visibility on Apple App Store and Google Play Store. Use when the user asks about ASO, app store rankings, app metadata, app titles and descriptions, app store listings, app visibility, or mobile app marketing on iOS or Android. Supports keyword research and scoring, competitor keyword analysis, metadata optimization, A/B test planning, launch checklists, and tracking ranking changes.
visualizing-data
IncludedBuilds dashboards, reports, and data-driven interfaces requiring charts, graphs, or visual analytics. Provides systematic framework for selecting appropriate visualizations based on data characteristics and analytical purpose. Includes 24+ visualization types organized by purpose (trends, comparisons, distributions, relationships, flows, hierarchies, geospatial), accessibility patterns (WCAG 2.1 AA compliance), colorblind-safe palettes, and performance optimization strategies. Use when creating visualizations, choosing chart types, displaying data graphically, or designing data interfaces.