esphome-box3-builder

This skill should be used when the user asks to "configure esp32-s3-box-3", "set up box-3", "create box-3 voice assistant", "display lambda on box-3", "configure ili9xxx display", "set up gt911 touch", "configure i2s audio", "es7210 microphone", "es8311 speaker", "box-3 audio pipeline", or mentions error messages like "I2S DMA buffer error", "Touch not responding", "Display flicker", "Audio popping", "PSRAM not detected". Provides complete ESP32-S3-BOX-3 hardware templates, display lambda cookbook, touch patterns, and voice assistant configurations.

# ESP32-S3-BOX-3 Builder Skill

Specialist skill for ESP32-S3-BOX-3 hardware providing complete configuration templates, display lambda cookbook, touch interaction patterns, and voice assistant integration for complex audio/display/touch projects.

## Purpose

This skill accelerates ESP32-S3-BOX-3 development by providing:
- Complete hardware initialization templates
- Display lambda rendering cookbook (text, shapes, icons, multi-page UI)
- Audio pipeline recipes (I²S, ES7210 ADC, ES8311 DAC)
- Touch interaction patterns (buttons, swipes, gestures)
- Voice assistant integration (wake word, ducking, Home Assistant Assist)
- Material Design UI components
- Hardware-specific troubleshooting and workarounds

Use this skill for ESP32-S3-BOX-3 specific projects. For general ESPHome configuration, use the esphome-config-helper skill instead.

## When to Use This Skill

Use this skill when:
- Configuring ESP32-S3-BOX-3 hardware from scratch
- Implementing display lambda rendering (ILI9xxx)
- Setting up I²S audio pipeline (ES7210, ES8311)
- Configuring GT911 touch interaction
- Building voice assistant with wake word detection
- Creating multi-page touchscreen UI
- Troubleshooting BOX-3 specific issues

Delegate to specialized ESPHome agents for:
- Deep technical explanations (esphome-box3 agent)
- General ESPHome concepts (esphome-core agent)
- Network configuration (esphome-networking agent)

## Hardware Overview

The ESP32-S3-BOX-3 is a complete development kit with:
- **Module**: ESP32-S3-WROOM-1 (16MB Flash, 16MB Octal PSRAM)
- **Display**: ILI9342C (320x240, SPI, PSRAM required for 16-bit color)
- **Touch**: GT911 capacitive (I²C, multi-touch)
- **Microphone**: ES7210 4-channel ADC (I²S, 16kHz)
- **Speaker**: ES8311 mono DAC (I²S, 48kHz, requires MCLK)
- **Sensors**: BME688 environmental, ICM-42607-P IMU

**Critical Requirements**:
- PSRAM must be explicitly configured (2025.2+ breaking change)
- ESP-IDF framework recommended (better audio/display support)
- Shared I²S bus for microphone and speaker
- Reset pin GPIO48 shared between display and touch

For complete hardware specifications, consult:
- **`references/box3-pinout.md`** - Complete GPIO pinout, component addresses, known issues
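
The PSRAM and framework requirements above come down to a few lines of configuration. A minimal sketch, assuming the `esp32s3box` board definition and an 80MHz PSRAM clock (both are assumptions, not values taken from the templates); compare with `templates/box3-base.yaml` before relying on it:

```yaml
esp32:
  board: esp32s3box        # assumption: generic S3-BOX board definition
  flash_size: 16MB         # matches the 16MB flash listed above
  framework:
    type: esp-idf          # the recommended framework

# PSRAM is declared explicitly instead of relying on board defaults
psram:
  mode: octal              # the module carries Octal PSRAM
  speed: 80MHz             # assumption: a commonly used speed
```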

## Configuration Templates

### Available Templates

Located in `templates/` directory:

1. **`box3-base.yaml`** - Hardware initialization foundation
   - ESP-IDF framework with PSRAM octal mode
   - I²S audio bus configuration
   - ES7210 ADC and ES8311 DAC setup
   - ILI9xxx display basic config
   - GT911 touch initialization
   - Use as foundation for all BOX-3 projects

2. **`box3-voice.yaml`** - Complete voice assistant
   - micro_wake_word with okay_nabu
   - Voice assistant pipeline (wake word → HA Assist → TTS)
   - Audio ducking with Nabu media player
   - State management for voice interaction
   - Use for voice-controlled BOX-3 projects

3. **`box3-display-ui.yaml`** - Multi-page touchscreen UI
   - 3-page navigation system
   - Touch zone binary sensors
   - Display lambda with Material Design
   - Page state management with globals
   - Use for interactive touchscreen projects

4. **`box3-audio-player.yaml`** - Music/media player
   - Media player entity
   - Volume control with touch buttons
   - Display showing playback status
   - Play/pause/skip controls
   - Use for audio playback projects
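
As a rough illustration of what the voice template wires together, the sketch below connects `micro_wake_word` to `voice_assistant`. The IDs `box_mic` and `box_speaker` are hypothetical placeholders for the microphone and speaker defined elsewhere in the configuration, and the tuning values are assumptions. The actual `box3-voice.yaml` may be organized differently, for example by routing output through a media player to get ducking.

```yaml
micro_wake_word:
  microphone: box_mic              # hypothetical ID of the I²S microphone
  models:
    - model: okay_nabu
  on_wake_word_detected:
    # Hand the detected phrase to the Assist pipeline
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

voice_assistant:
  microphone: box_mic
  speaker: box_speaker             # hypothetical ID of the I²S speaker
  noise_suppression_level: 2       # assumption: tune for the room
  auto_gain: 31dBFS                # assumption
  volume_multiplier: 2.0           # assumption
  on_end:
    # Resume listening for the wake word once the pipeline finishes
    - micro_wake_word.start:
```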

### Using Templates

To use a template:
1. Read the appropriate template file
2. Customize the device name and WiFi credentials
3. Adjust UI elements, colors, fonts as needed
4. Flash using BOX-3 specific script (see Scripts section)

**Template Workflow:**
```bash
# 1. Read template
cat ${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/templates/box3-voice.yaml

# 2. Copy to project
cp ${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/templates/box3-voice.yaml my-box3.yaml

# 3. Edit device-specific values
# - Update device name
# - Set WiFi credentials (use secrets.yaml)
# - Customize wake word model if desired
# - Adjust display text and layout

# 4. Flash with BOX-3 script
${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/scripts/flash-box3.sh my-box3.yaml
```
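
An alternative to editing the copied file in step 3 is to layer a small project file on top of the template. This is only a sketch: it assumes the template can be loaded as an ESPHome package, and `kitchen-box3`, `wifi_ssid` and `wifi_password` are example names rather than values from the skill.

```yaml
# my-box3.yaml, placed next to the copied template
packages:
  box3: !include box3-voice.yaml   # assumption: template works as a package

esphome:
  name: kitchen-box3               # example name; overrides the packaged value

wifi:
  ssid: !secret wifi_ssid          # example keys defined in secrets.yaml
  password: !secret wifi_password
```

Values set in the project file take precedence over the packaged ones, so the template itself stays unmodified.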

## Display Lambda Rendering

### Lambda Rendering Cookbook

For complete display lambda examples and patterns, consult:
- **`references/display-lambdas.md`** - Display lambda cookbook
  - Text rendering (fonts, alignment, wrapping)
  - Shapes (rectangles, circles, lines)
  - Icons (Material Design Icons integration)
  - Images and sprites
  - Animation patterns
  - Multi-page navigation
  - Coordinate system and positioning

### Quick Display Patterns

**Basic Text Display**:
```cpp
it.printf(160, 10, id(roboto_16), TextAlign::TOP_CENTER, "ESP32-S3-BOX-3");
```

**Filled Rectangle (Card Background)**:
```cpp
it.filled_rectangle(10, 30, 300, 80, COLOR_PRIMARY);
```
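
`COLOR_PRIMARY` is not defined in the snippet above. One way to supply a color, sketched here with hypothetical IDs and assumed hex values, is the `color:` component. A color declared this way is referenced in a lambda as `id(color_primary)` rather than as a bare constant.

```yaml
color:
  - id: color_primary        # hypothetical ID
    hex: "2196F3"            # assumption: a Material blue
  - id: color_background
    hex: "121212"            # assumption: dark surface
  - id: color_on_primary
    hex: "FFFFFF"            # text drawn on top of the primary color
```

With these in place the card background becomes `it.filled_rectangle(10, 30, 300, 80, id(color_primary));`.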

**Multi-Line Text**:
```cpp
it.printf(20, 40, id(roboto_12), "Temperature: %.1f°C", id(temp_sensor).state);
it.printf(20, 60, id(roboto_12), "Humidity: %.1f%%", id(humidity_sensor).state);
```
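
Sensor states are `NAN` until the first reading arrives, so the two lines above can briefly print `nan`. A small guard avoids that; the sketch below shows it inside a `display:` entry. The platform options are left out on purpose (they belong in the base template), and `box3_display` is the display ID used elsewhere in this document.

```yaml
display:
  - platform: ili9xxx
    id: box3_display
    # model, pins and other platform options as in templates/box3-base.yaml
    lambda: |-
      // Print a placeholder until each sensor has published a value
      if (std::isnan(id(temp_sensor).state)) {
        it.print(20, 40, id(roboto_12), "Temperature: --");
      } else {
        it.printf(20, 40, id(roboto_12), "Temperature: %.1f°C", id(temp_sensor).state);
      }
      if (std::isnan(id(humidity_sensor).state)) {
        it.print(20, 60, id(roboto_12), "Humidity: --");
      } else {
        it.printf(20, 60, id(roboto_12), "Humidity: %.1f%%", id(humidity_sensor).state);
      }
```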

**Icon Rendering** (Material Design Icons):
```cpp
// Requires MDI font in assets/fonts/
it.printf(30, 100, id(mdi_icons_24), "\U000F050F");  // thermometer icon
```

### Material Design UI Components

For Material Design color schemes, typography, and layouts, consult:
- **`references/material-design.md`** - Material Design UI guide
  - Color palette (primary, accent, background, text)
  - Typography hierarchy (headlines, body, captions)
  - Card layouts and spacing
  - Icon integration
  - Touch zone sizing (minimum 48x48 pixels)

### Font Integration

Required fonts are in `assets/fonts/`:
- **Roboto-Regular.ttf** - Material Design typography
- **materialdesignicons-webfont.ttf** - MDI icon font

**Font Configuration**:
```yaml
font:
  - file: ${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/assets/fonts/Roboto-Regular.ttf
    id: roboto_16
    size: 16

  # Smaller size used by the multi-line text pattern
  - file: ${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/assets/fonts/Roboto-Regular.ttf
    id: roboto_12
    size: 12

  - file: ${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/assets/fonts/materialdesignicons-webfont.ttf
    id: mdi_icons_24
    size: 24
    # Only the listed glyphs are compiled in; check codepoints against mdi-codepoints.txt
    glyphs: [
      "\U000F050F",  # thermometer
      "\U000F058E",  # water-percent
      "\U000F040A",  # play
      "\U000F03E4",  # pause
    ]
```

Icon codepoints reference: `assets/fonts/mdi-codepoints.txt`

## Touch Interaction Patterns

### Touch Configuration

For complete touch patterns and gesture detection, consult:
- **`references/touch-patterns.md`** - Touch interaction patterns
  - Binary sensor zones for buttons
  - Swipe gesture detection
  - Long press patterns
  - Multi-touch handling
  - Page navigation integration

### Quick Touch Patterns

**Touch Button Zone**:
```yaml
binary_sensor:
  - platform: touchscreen
    name: "Button 1"
    x_min: 10
    x_max: 150
    y_min: 200
    y_max: 230
    on_press:
      - logger.log: "Button 1 pressed"
```
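
The same zone can distinguish a tap from a long press by using `on_click` with duration windows instead of `on_press`. A minimal sketch; the thresholds are assumptions to tune by feel, and note that `on_click` fires on release, once the press duration is known.

```yaml
binary_sensor:
  - platform: touchscreen
    name: "Button 1"
    x_min: 10
    x_max: 150
    y_min: 200
    y_max: 230
    on_click:
      # Short tap: released within 400ms (assumed threshold)
      - min_length: 50ms
        max_length: 400ms
        then:
          - logger.log: "Button 1 tapped"
      # Long press: held between 800ms and 5s (assumed thresholds)
      - min_length: 800ms
        max_length: 5s
        then:
          - logger.log: "Button 1 long-pressed"
```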

**Page Navigation**:
```yaml
binary_sensor:
  - platform: touchscreen
    name: "Next Page"
    x_min: 240
    x_max: 310
    y_min: 200
    y_max: 230
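    on_press:
      - lambda: |-
          // current_page is an int global. show_page() expects a page pointer,
          // not an index, so advance the index and trigger a redraw; the display
          // lambda is expected to branch on current_page.
          id(current_page) = (id(current_page) + 1) % 3;
          id(box3_display).update();
```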
```
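
The navigation lambda above relies on a `current_page` global that is not shown. A sketch of the declaration, together with a display lambda that branches on it, could look like this. The page titles are hypothetical, and the platform options are omitted because they belong in the base template.

```yaml
globals:
  - id: current_page
    type: int
    restore_value: no
    initial_value: "0"

display:
  - platform: ili9xxx
    id: box3_display
    # model, pins and other platform options as in templates/box3-base.yaml
    lambda: |-
      // Draw whichever page the global currently selects
      switch (id(current_page)) {
        case 0:
          it.print(160, 120, id(roboto_16), TextAlign::CENTER, "Home");
          break;
        case 1:
          it.print(160, 120, id(roboto_16), TextAlign::CENTER, "Sensors");
          break;
        default:
          it.print(160, 120, id(roboto_16), TextAlign::CENTER, "Settings");
          break;
      }
```

Keeping the page index in a global means touch zones, the display lambda and any automations all read the same state.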

## Audio Pipeline Configuration

### I²S Audio Setup

**Shared I²S Bus** (LRCLK=GPIO45, BCLK=GPIO17, MCLK=GPIO2):
```yaml
i2s_audio:
  - id: i2s_shared
    i2s_lrclk_pin: GPIO45
    i2s_bclk_pin: GPIO17
    i2s_mclk_pin: GPIO2
```

**ES7210 Microphone ADC** (16kHz, GPIO16):
```yaml
audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: 16bit
    sample_rate: 16000
    mic_gain: 24DB
    address: 0x40

microphone:
  - platform: i2s_audio
    adc_type: external
    i2s_din_pin: GPIO16
    sample_rate: 16000
```

**ES8311 Speaker DAC** (48kHz, GPIO15, requires MCLK):
```yaml
audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: 16bit
    sample_rate: 48000
    use_mclk: true  # Required
    address: 0x18

speaker:
  - platform: i2s_audio
    dac_type: external
    i2s_dout_pin: GPIO15
    sample_rate: 48000
    buffer_duration: 100ms  # Prevents audio popping
```
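
Both codecs are configured over I²C and stream over the shared I²S bus, so a complete configuration also needs an `i2c:` bus and, optionally, explicit links between the components. The sketch below shows one way to wire them. The I²C pins (GPIO8 and GPIO18) and the IDs `bus_a`, `box_mic` and `box_speaker` are assumptions not stated above; check the pins against `references/box3-pinout.md`. The `microphone` and `speaker` entries here would replace the shorter ones above rather than sit alongside them.

```yaml
i2c:
  - id: bus_a
    sda: GPIO8               # assumption: verify against the pinout reference
    scl: GPIO18              # assumption: verify against the pinout reference

microphone:
  - platform: i2s_audio
    id: box_mic              # hypothetical ID
    i2s_audio_id: i2s_shared # the shared bus declared above
    adc_type: external
    i2s_din_pin: GPIO16
    sample_rate: 16000

speaker:
  - platform: i2s_audio
    id: box_speaker          # hypothetical ID
    i2s_audio_id: i2s_shared
    dac_type: external
    i2s_dout_pin: GPIO15
    sample_rate: 48000
    audio_dac: es8311_dac    # lets volume changes go through the ES8311
```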

## Voice

Source: https://github.com/nodnarbnitram/claude-code-extensions/tree/main/plugins/cce-esphome/skills/esphome-box3-builder
