output-dev-evaluator-function

Create evaluator functions in evaluators.ts for Output SDK workflows. Use when implementing quality assessment, validation logic, or content evaluation.


# Creating Evaluator Functions

## Overview

This skill documents how to create evaluator functions in `evaluators.ts` for Output SDK workflows. Evaluators are used to assess quality, validate outputs, and provide confidence-scored judgments about workflow results.

## When to Use This Skill

- Implementing quality assessment for workflow outputs
- Adding validation logic with confidence scores
- Creating LLM-powered content evaluation
- Building reusable evaluation components

## File Organization

### Option 1: Flat File (Default)

For smaller workflows, use a single `evaluators.ts` file:

```
src/workflows/{workflow-name}/
├── workflow.ts
├── steps.ts
├── evaluators.ts    # All evaluators in one file
├── types.ts
└── ...
```

### Option 2: Folder-Based (Large Workflows)

For larger workflows with many evaluators, use an `evaluators/` folder:

```
src/workflows/{workflow-name}/
├── workflow.ts
├── steps.ts
├── evaluators/      # Evaluators split into individual files
│   ├── quality.ts
│   ├── accuracy.ts
│   └── completeness.ts
├── types.ts
└── ...
```

## Component Location Rules

**Important**: `evaluator()` calls MUST live in files whose path contains `evaluators`:
- `src/workflows/my_workflow/evaluators.ts` ✓
- `src/workflows/my_workflow/evaluators/quality.ts` ✓
- `src/shared/evaluators/common_evaluators.ts` ✓
- `src/workflows/my_workflow/helpers.ts` ✗ (cannot contain evaluator() calls)
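
The rule above can be sketched as a simple path check. This is only an illustration: `isValidEvaluatorPath` is a hypothetical helper, not an Output SDK function, and it assumes the rule is a plain substring match on the file path, which is the literal reading of the text above.

```typescript
// Hypothetical helper, not part of the Output SDK.
// Assumption: the location rule is a substring match on the file path.
function isValidEvaluatorPath( filePath: string ): boolean {
  return filePath.includes( 'evaluators' );
}

const candidatePaths = [
  'src/workflows/my_workflow/evaluators.ts',
  'src/workflows/my_workflow/evaluators/quality.ts',
  'src/shared/evaluators/common_evaluators.ts',
  'src/workflows/my_workflow/helpers.ts'
];

// Report which files may contain evaluator() calls under the assumed rule.
for ( const candidate of candidatePaths ) {
  const verdict = isValidEvaluatorPath( candidate ) ? 'allowed' : 'not allowed';
  console.log( `${candidate}: ${verdict}` );
}
```

A check like this could run in a pre-commit script to catch misplaced `evaluator()` calls early, though the SDK's own enforcement may be stricter.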

## Activity Isolation Constraints

Evaluators are Temporal activities with strict import rules to ensure deterministic replay.

### Evaluators CAN import from:
- Local workflow files: `./utils.js`, `./types.js`, `./helpers.js`
- Local subdirectories: `./lib/helpers.js`
- Shared utilities: `../../shared/utils/*.js`
- Shared clients: `../../shared/clients/*.js`
- Shared services: `../../shared/services/*.js`

### Evaluators CANNOT import:
- Other evaluator files (activity isolation)
- Step files
- Workflow files

**Example of WRONG imports:**
```typescript
// WRONG - evaluators cannot import other evaluators
import { otherEvaluator } from '../../shared/evaluators/other.js'; // ✗
import { anotherEvaluator } from './other_evaluators.js'; // ✗
```
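
As a rough illustration of the CAN/CANNOT lists above, the sketch below classifies import specifiers. `checkEvaluatorImport` and its matching rules are assumptions made for this example (for instance, that step and workflow files are named `steps.js` and `workflow.js`); the SDK may detect violations differently.

```typescript
// Hypothetical lint-style helper; the name and matching rules are assumptions.
type ImportVerdict = { allowed: boolean; reason: string };

function checkEvaluatorImport( specifier: string ): ImportVerdict {
  // Package imports (e.g. '@outputai/core') are outside the isolation rules.
  if ( !specifier.startsWith( '.' ) ) {
    return { allowed: true, reason: 'package import' };
  }
  const segments = specifier.split( '/' );
  // Any path segment mentioning 'evaluators' is treated as another evaluator file.
  if ( segments.some( s => s.includes( 'evaluators' ) ) ) {
    return { allowed: false, reason: 'other evaluator file' };
  }
  // Assumed naming: steps live in steps.js or a steps/ folder.
  if ( segments.some( s => s === 'steps' || s === 'steps.js' ) ) {
    return { allowed: false, reason: 'step file' };
  }
  // Assumed naming: the workflow definition lives in workflow.js.
  if ( segments.some( s => s === 'workflow.js' ) ) {
    return { allowed: false, reason: 'workflow file' };
  }
  return { allowed: true, reason: 'local or shared module' };
}

const specifiers = [
  './types.js',
  '../../shared/utils/text.js',
  '../../shared/evaluators/other.js',
  './steps.js'
];
for ( const specifier of specifiers ) {
  const { allowed, reason } = checkEvaluatorImport( specifier );
  console.log( `${specifier}: ${allowed ? 'ok' : 'blocked'} (${reason})` );
}
```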

## Critical Import Patterns

### Core Imports

```typescript
// CORRECT - Import from @outputai/core
import {
  evaluator,
  z,
  EvaluationBooleanResult,
  EvaluationNumberResult,
  EvaluationStringResult,
  EvaluationFeedback
} from '@outputai/core';

// WRONG - Never import z from zod
import { z } from 'zod';
```

### LLM Client Import (for LLM-powered evaluators)

```typescript
// CORRECT - Use @outputai/llm wrapper
import { generateText, Output } from '@outputai/llm';

// WRONG - Never call LLM providers directly
import OpenAI from 'openai';
```

### ES Module Imports

All relative imports MUST use the `.js` extension (package imports such as `@outputai/core` are unaffected):

```typescript
// CORRECT
import { BlogContent } from './types.js';

// WRONG - Missing .js extension
import { BlogContent } from './types';
```

## Basic Structure

```typescript
import { evaluator, z, EvaluationBooleanResult } from '@outputai/core';

export const myEvaluator = evaluator( {
  name: 'my_evaluator',
  description: 'Description of what this evaluator assesses',
  inputSchema: z.object( { /* input schema */ } ),
  fn: async input => {
    // Evaluation logic
    return new EvaluationBooleanResult( {
      value: true,
      confidence: 0.95
    } );
  }
} );
```

## Required Properties

### name (string)
Unique identifier for the evaluator. Use `snake_case`.

```typescript
name: 'evaluate_content_quality'
```
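
To make the naming convention concrete, here is a minimal sketch of a `snake_case` check. `isSnakeCase` is a hypothetical helper, and the exact pattern (lowercase letters and digits separated by single underscores) is an assumption; the source does not say whether the SDK validates names itself.

```typescript
// Hypothetical check; the SDK may or may not enforce naming on its own.
// Assumed pattern: lowercase words (letters/digits) joined by single underscores.
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

function isSnakeCase( name: string ): boolean {
  return SNAKE_CASE.test( name );
}

console.log( isSnakeCase( 'evaluate_content_quality' ) ); // true
console.log( isSnakeCase( 'evaluateContentQuality' ) );   // false
```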

### description (string)
Human-readable description of what the evaluator assesses.

```typescript
description: 'Evaluate the quality and completeness of generated content'
```

### inputSchema (Zod schema)
Schema for validating evaluator input.

```typescript
inputSchema: z.object( {
  content: z.string(),
  expectedLength: z.number()
} )
```

### fn (async function)
The evaluator execution function. Returns an evaluation result with value and confidence.

```typescript
fn: async input => {
  const isValid = input.content.length >= input.expectedLength;
  return new EvaluationBooleanResult( {
    value: isValid,
    confidence: 0.95
  } );
}
```

## Result Types

### EvaluationBooleanResult

Use for pass/fail or true/false evaluations:

```typescript
import { EvaluationBooleanResult } from '@outputai/core';

return new EvaluationBooleanResult( {
  value: true,           // boolean result
  confidence: 0.95,      // 0.0 to 1.0
  reasoning: 'Optional explanation of the evaluation'
} );
```

### EvaluationNumberResult

Use for numeric scores or ratings:

```typescript
import { EvaluationNumberResult } from '@outputai/core';

return new EvaluationNumberResult( {
  value: 85,             // numeric result (e.g., 0-100 score)
  confidence: 0.85,      // 0.0 to 1.0
  reasoning: 'Optional explanation of the score'
} );
```

### EvaluationStringResult

Use for categorical or text-based evaluations:

```typescript
import { EvaluationStringResult } from '@outputai/core';

return new EvaluationStringResult( {
  value: 'positive',     // string result (e.g., category, sentiment, label)
  confidence: 0.9,       // 0.0 to 1.0
  reasoning: 'Optional explanation of the classification'
} );
```

## Result Properties

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `value` | `boolean`, `number`, or `string` | Yes | The evaluation result |
| `confidence` | `number` (0.0-1.0) | Yes | Confidence in the evaluation |
| `reasoning` | `string` | No | Explanation of the evaluation |
| `name` | `string` | No | Name for this specific result (useful in dimensions) |
| `feedback` | `EvaluationFeedback[]` | No | Array of feedback objects with issues and suggestions |
| `dimensions` | `EvaluationResult[]` | No | Nested results for multi-dimensional evaluation |
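
The sketch below shows how the properties in the table fit together, including nested `dimensions`. It uses plain stand-in interfaces rather than the SDK classes so that it runs on its own; `ResultSketch`, `FeedbackSketch`, `findInvalidConfidences`, and the `issue`/`suggestion` field names are all assumptions for illustration, since the source does not spell out the `EvaluationFeedback` shape.

```typescript
// Stand-in types mirroring the table above. These are NOT the SDK classes.
// The feedback field names (`issue`, `suggestion`) are assumed.
interface FeedbackSketch {
  issue: string;
  suggestion?: string;
}

interface ResultSketch {
  value: boolean | number | string;
  confidence: number;
  reasoning?: string;
  name?: string;
  feedback?: FeedbackSketch[];
  dimensions?: ResultSketch[];
}

// Collects every confidence outside 0.0-1.0, walking nested dimensions recursively.
function findInvalidConfidences( result: ResultSketch, path = 'root' ): string[] {
  const problems: string[] = [];
  if ( !( result.confidence >= 0 && result.confidence <= 1 ) ) {
    problems.push( `${path}: confidence ${result.confidence} is outside 0.0-1.0` );
  }
  ( result.dimensions ?? [] ).forEach( ( dim, i ) => {
    problems.push( ...findInvalidConfidences( dim, `${path}.${dim.name ?? i}` ) );
  } );
  return problems;
}

// Illustrative values only.
const overall: ResultSketch = {
  value: 85,
  confidence: 0.85,
  reasoning: 'Strong structure, minor clarity issues',
  feedback: [ { issue: 'Intro is long', suggestion: 'Trim the first paragraph' } ],
  dimensions: [
    { name: 'clarity', value: 80, confidence: 0.9 },
    { name: 'accuracy', value: 'high', confidence: 1.2 } // deliberately out of range
  ]
};

console.log( findInvalidConfidences( overall ) );
```

Naming each dimension makes a problem easy to locate: here only the `accuracy` dimension is reported.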

## Simple Evaluator Examples

### Boolean Evaluator - Content Validation

```typescript
import { evaluator, z, EvaluationBooleanResult } from '@outputai/core';

export const evaluateCompleteness = evaluator( {
  name: 'evaluate_completeness',
  description: 'Check if content meets minimum length requirements',
  inputSchema: z.object( {
    content: z.string(),
    minLength: z.number().default( 100 )
  } ),
  fn: async ( { content, minLength } ) => {
    const isComplete = content.length >= minLength;

    return new EvaluationBooleanResult( {
      value: isComplete,
      confidence: 1.0,
      reasoning: isComplete ?
        `Content has ${content.length} characters, meets minimum of ${minLength}` :
        `Content has ${content.length} characters, below minimum of ${minLength}`
    } );
  }
} );
```

### Boolean Evaluator - Pattern Detection

```typescript
import { evaluator, z, EvaluationBooleanResult } from '@outputai/core';

export const evaluateGibberish = evaluator( {
  name: 'evaluate_gibberish',
  description: 'Check if a given string is gibberish',
  inputSchema: z.string(),
  fn: async content => {
    const gibberishPatterns = [ 'foo', 'bar', 'lorem', 'ipsum' ];
    const isGibberish = gibberishPatterns.some( p => content.toLowerCase().includes( p ) );

    return new EvaluationBooleanResult( {
      // value is true when the content passes the check, i.e. it is NOT gibberish
      value: !isGibberish,
      confidence: 0.95
    } );
  }
} );
```

### Number Evaluator - Quality Score

```typescript
import { evaluator, z, EvaluationNumberResult } from '@outputai/core';

export const evaluateReadability = evaluator( {
  name: 'evaluate_readability',
  description: 'Calculate readability score based on sentence structure',
  inputSchema: z.object( {
    content: z.string()
  } ),
  fn: async ( { content } ) => {
    const sentences = content.split( /[.!?]+/ ).filter( s => s.trim() );
    const words = content.split( /\s+/ ).filter( w => w.trim() );
    const avgWordsPerSentence = words.length / Math.max( sentences.length, 1 );

    // Simple readability score (lower avg words = more readable)
    const score = Math.max( 0, Math.min( 100, 100 - ( avgWordsPerSentence - 15 ) * 5 ) );

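    // NOTE: the source is truncated at this point. The return below is an assumed
    // completion; in particular, the confidence value is illustrative.
    return new EvaluationNumberResult( {
      value: score,
      confidence: 0.8
    } );
  }
} );
```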
