esphome-box3-builder
This skill should be used when the user asks to "configure esp32-s3-box-3", "set up box-3", "create box-3 voice assistant", "display lambda on box-3", "configure ili9xxx display", "set up gt911 touch", "configure i2s audio", "es7210 microphone", "es8311 speaker", "box-3 audio pipeline", or mentions error messages like "I2S DMA buffer error", "Touch not responding", "Display flicker", "Audio popping", "PSRAM not detected". Provides complete ESP32-S3-BOX-3 hardware templates, display lambda cookbook, touch patterns, and voice assistant configurations.
# ESP32-S3-BOX-3 Builder Skill
Specialist skill for ESP32-S3-BOX-3 hardware providing complete configuration templates, display lambda cookbook, touch interaction patterns, and voice assistant integration for complex audio/display/touch projects.
## Purpose
This skill accelerates ESP32-S3-BOX-3 development by providing:
- Complete hardware initialization templates
- Display lambda rendering cookbook (text, shapes, icons, multi-page UI)
- Audio pipeline recipes (I²S, ES7210 ADC, ES8311 DAC)
- Touch interaction patterns (buttons, swipes, gestures)
- Voice assistant integration (wake word, ducking, Home Assistant Assist)
- Material Design UI components
- Hardware-specific troubleshooting and workarounds
Use this skill for ESP32-S3-BOX-3 specific projects. For general ESPHome configuration, use the esphome-config-helper skill instead.
## When to Use This Skill
Use this skill when:
- Configuring ESP32-S3-BOX-3 hardware from scratch
- Implementing display lambda rendering (ILI9xxx)
- Setting up I²S audio pipeline (ES7210, ES8311)
- Configuring GT911 touch interaction
- Building voice assistant with wake word detection
- Creating multi-page touchscreen UI
- Troubleshooting BOX-3 specific issues
Delegate to specialized ESPHome agents for:
- Deep technical explanations (esphome-box3 agent)
- General ESPHome concepts (esphome-core agent)
- Network configuration (esphome-networking agent)
## Hardware Overview
The ESP32-S3-BOX-3 is a complete development kit with:
- **Module**: ESP32-S3-WROOM-1 (16MB Flash, 16MB Octal PSRAM)
- **Display**: ILI9342C (320x240, SPI, PSRAM required for 16-bit color)
- **Touch**: GT911 capacitive (I²C, multi-touch)
- **Microphone**: ES7210 4-channel ADC (I²S, 16kHz)
- **Speaker**: ES8311 mono DAC (I²S, 48kHz, requires MCLK)
- **Sensors**: BME688 environmental, ICM-42607-P IMU
**Critical Requirements**:
- PSRAM must be explicitly configured (breaking change in ESPHome 2025.2 and later)
- ESP-IDF framework recommended (better audio/display support)
- Shared I²S bus for microphone and speaker
- Reset pin GPIO48 shared between display and touch
For complete hardware specifications, consult:
- **`references/box3-pinout.md`** - Complete GPIO pinout, component addresses, known issues
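A minimal sketch of the first two requirements above (explicit PSRAM, ESP-IDF framework). The `esp32s3box` board ID and the 80MHz PSRAM speed are assumptions, not values stated in this document; verify them against `templates/box3-base.yaml` before use.

```yaml
esp32:
  board: esp32s3box        # assumption: generic S3-BOX board definition
  flash_size: 16MB         # matches the module listed under Hardware Overview
  framework:
    type: esp-idf          # recommended framework per the requirements above

# PSRAM is declared explicitly; octal mode matches the 16MB Octal PSRAM module.
psram:
  mode: octal
  speed: 80MHz             # assumption: common setting, not stated in this document
```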
## Configuration Templates
### Available Templates
Located in `templates/` directory:
1. **`box3-base.yaml`** - Hardware initialization foundation
   - ESP-IDF framework with PSRAM octal mode
   - I²S audio bus configuration
   - ES7210 ADC and ES8311 DAC setup
   - ILI9xxx display basic config
   - GT911 touch initialization
   - Use as foundation for all BOX-3 projects
2. **`box3-voice.yaml`** - Complete voice assistant
   - micro_wake_word with okay_nabu
   - Voice assistant pipeline (wake word → HA Assist → TTS)
   - Audio ducking with Nabu media player
   - State management for voice interaction
   - Use for voice-controlled BOX-3 projects
3. **`box3-display-ui.yaml`** - Multi-page touchscreen UI
   - 3-page navigation system
   - Touch zone binary sensors
   - Display lambda with Material Design
   - Page state management with globals
   - Use for interactive touchscreen projects
4. **`box3-audio-player.yaml`** - Music/media player
   - Media player entity
   - Volume control with touch buttons
   - Display showing playback status
   - Play/pause/skip controls
   - Use for audio playback projects
### Using Templates
To use a template:
1. Read the appropriate template file
2. Customize device name, WiFi credentials
3. Adjust UI elements, colors, fonts as needed
4. Flash using BOX-3 specific script (see Scripts section)
**Template Workflow:**
```bash
# 1. Read template
cat ${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/templates/box3-voice.yaml
# 2. Copy to project
cp ${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/templates/box3-voice.yaml my-box3.yaml
# 3. Edit device-specific values
# - Update device name
# - Set WiFi credentials (use secrets.yaml)
# - Customize wake word model if desired
# - Adjust display text and layout
# 4. Flash with BOX-3 script
${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/scripts/flash-box3.sh my-box3.yaml
```
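Step 3 of the workflow (device name and WiFi credentials via `secrets.yaml`) might look like the sketch below. The substitution name `device_name` and the secret keys `wifi_ssid` and `wifi_password` are hypothetical; use whatever keys your own `secrets.yaml` defines.

```yaml
substitutions:
  device_name: my-box3              # hypothetical device name

esphome:
  name: ${device_name}

wifi:
  # Values are read from secrets.yaml next to this file, so credentials
  # never appear in the device config itself.
  ssid: !secret wifi_ssid           # assumption: key name in secrets.yaml
  password: !secret wifi_password   # assumption: key name in secrets.yaml
```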
## Display Lambda Rendering
### Lambda Rendering Cookbook
For complete display lambda examples and patterns, consult:
- **`references/display-lambdas.md`** - Display lambda cookbook
  - Text rendering (fonts, alignment, wrapping)
  - Shapes (rectangles, circles, lines)
  - Icons (Material Design Icons integration)
  - Images and sprites
  - Animation patterns
  - Multi-page navigation
  - Coordinate system and positioning
### Quick Display Patterns
**Basic Text Display**:
```cpp
it.printf(160, 10, id(roboto_16), TextAlign::TOP_CENTER, "ESP32-S3-BOX-3");
```
**Filled Rectangle (Card Background)**:
```cpp
it.filled_rectangle(10, 30, 300, 80, COLOR_PRIMARY);
```
**Multi-Line Text**:
```cpp
it.printf(20, 40, id(roboto_12), "Temperature: %.1f°C", id(temp_sensor).state);
it.printf(20, 60, id(roboto_12), "Humidity: %.1f%%", id(humidity_sensor).state);
```
**Icon Rendering** (Material Design Icons):
```cpp
// Requires MDI font in assets/fonts/
it.printf(30, 100, id(mdi_icons_24), "\U000F050F"); // thermometer icon
```
### Material Design UI Components
For Material Design color schemes, typography, and layouts, consult:
- **`references/material-design.md`** - Material Design UI guide
  - Color palette (primary, accent, background, text)
  - Typography hierarchy (headlines, body, captions)
  - Card layouts and spacing
  - Icon integration
  - Touch zone sizing (minimum 48x48 pixels)
### Font Integration
Required fonts are in `assets/fonts/`:
- **Roboto-Regular.ttf** - Material Design typography
- **materialdesignicons-webfont.ttf** - MDI icon font
**Font Configuration**:
```yaml
font:
  - file: ${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/assets/fonts/Roboto-Regular.ttf
    id: roboto_16
    size: 16
  - file: ${CLAUDE_PLUGIN_ROOT}/skills/esphome-box3-builder/assets/fonts/materialdesignicons-webfont.ttf
    id: mdi_icons_24
    size: 24
    # Every glyph drawn with this font must be listed here.
    glyphs: [
      "\U000F050F", # thermometer
      "\U000F058E", # water-percent
      "\U000F040A", # play
      "\U000F03E4", # pause
    ]
```
Icon codepoints reference: `assets/fonts/mdi-codepoints.txt`
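The C++ snippets under Quick Display Patterns run inside a display `lambda:`. The fragment below is a sketch of where they go, not a complete display config: model, SPI, and pin settings are deliberately left out, and `color_primary` is a hypothetical ID standing in for the `COLOR_PRIMARY` name used earlier.

```yaml
color:
  - id: color_primary            # hypothetical ID standing in for COLOR_PRIMARY
    hex: "2196F3"                # assumption: a Material blue, pick your own

display:
  - platform: ili9xxx
    id: box3_display
    # Model, SPI, and pin settings omitted; copy them from templates/box3-base.yaml.
    lambda: |-
      // Card background first, then the title, so text is never painted over.
      it.filled_rectangle(10, 30, 300, 80, id(color_primary));
      it.printf(160, 10, id(roboto_16), TextAlign::TOP_CENTER, "ESP32-S3-BOX-3");
```

The lambda redraws the whole frame on every update, so draw order matters: backgrounds come before the text that sits on top of them.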
## Touch Interaction Patterns
### Touch Configuration
For complete touch patterns and gesture detection, consult:
- **`references/touch-patterns.md`** - Touch interaction patterns
  - Binary sensor zones for buttons
  - Swipe gesture detection
  - Long press patterns
  - Multi-touch handling
  - Page navigation integration
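Touch zones only work once a `touchscreen:` component exists. A minimal GT911 sketch follows; the I²C pins (GPIO8, GPIO18) and interrupt pin (GPIO3) are assumptions that do not appear in this document, so check them against `references/box3-pinout.md`. Leaving `reset_pin` off the touchscreen is one way to handle the shared GPIO48 reset line, not the only one.

```yaml
i2c:
  sda: GPIO8                     # assumption: verify in references/box3-pinout.md
  scl: GPIO18                    # assumption: verify in references/box3-pinout.md

touchscreen:
  - platform: gt911
    id: box3_touch               # hypothetical ID
    display: box3_display        # maps touch coordinates onto this display
    interrupt_pin: GPIO3         # assumption: verify in references/box3-pinout.md
    # No reset_pin here: GPIO48 is shared with the display, and this sketch
    # assumes the display component drives that line.
```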
### Quick Touch Patterns
**Touch Button Zone**:
```yaml
binary_sensor:
  - platform: touchscreen
    name: "Button 1"
    x_min: 10
    x_max: 150
    y_min: 200
    y_max: 230
    on_press:
      - logger.log: "Button 1 pressed"
```
**Page Navigation**:
```yaml
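binary_sensor:
  - platform: touchscreen
    name: "Next Page"
    x_min: 240
    x_max: 310
    y_min: 200
    y_max: 230
    on_press:
      - lambda: |-
          // Advance the page index stored in the current_page global (3 pages).
          id(current_page) = (id(current_page) + 1) % 3;
          // show_page() expects a page object, not an int, so trigger a redraw
          // and let the display lambda choose what to draw from current_page.
          id(box3_display).update();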
```
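The page navigation pattern relies on a `current_page` global and a display lambda that reads it. A sketch of both halves, assuming three pages and the `roboto_16` font from Font Configuration; the page titles are placeholders and the display hardware settings are omitted.

```yaml
globals:
  - id: current_page
    type: int
    restore_value: no
    initial_value: "0"

display:
  - platform: ili9xxx
    id: box3_display
    # Hardware settings omitted; see templates/box3-base.yaml.
    lambda: |-
      // Draw whichever page the touch handler last selected.
      switch (id(current_page)) {
        case 0:
          it.print(160, 120, id(roboto_16), TextAlign::CENTER, "Home");
          break;
        case 1:
          it.print(160, 120, id(roboto_16), TextAlign::CENTER, "Sensors");
          break;
        default:
          it.print(160, 120, id(roboto_16), TextAlign::CENTER, "Settings");
          break;
      }
```

Keeping the page index in a global means any automation (touch, button, Home Assistant service) can change pages by writing one integer.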
## Audio Pipeline Configuration
### I²S Audio Setup
**Shared I²S Bus** (LRCLK=GPIO45, BCLK=GPIO17, MCLK=GPIO2):
```yaml
i2s_audio:
  - id: i2s_shared
    i2s_lrclk_pin: GPIO45
    i2s_bclk_pin: GPIO17
    i2s_mclk_pin: GPIO2
```
**ES7210 Microphone ADC** (16kHz, GPIO16):
```yaml
audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: 16bit
    sample_rate: 16000
    mic_gain: 24DB
    address: 0x40
microphone:
  - platform: i2s_audio
    adc_type: external
    i2s_din_pin: GPIO16
    sample_rate: 16000
```
**ES8311 Speaker DAC** (48kHz, GPIO15, requires MCLK):
```yaml
audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: 16bit
    sample_rate: 48000
    use_mclk: true  # Required
    address: 0x18
speaker:
  - platform: i2s_audio
    dac_type: external
    i2s_dout_pin: GPIO15
    sample_rate: 48000
    buffer_duration: 100ms  # Prevents audio popping
```
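With microphone and speaker defined, the wake word flow described for `box3-voice.yaml` (wake word → HA Assist → TTS) could be wired roughly as below. This is a sketch, not the template's contents: it assumes you have added `id: box3_mic` and `id: box3_speaker` (both hypothetical names) to the microphone and speaker entries above, and that the `api:` component is enabled so Home Assistant can run the pipeline.

```yaml
micro_wake_word:
  microphone: box3_mic           # hypothetical ID added to the microphone entry
  models:
    - model: okay_nabu
  on_wake_word_detected:
    # Hand the detected phrase to the voice assistant pipeline.
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

voice_assistant:
  microphone: box3_mic           # hypothetical ID
  speaker: box3_speaker          # hypothetical ID added to the speaker entry
```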
## Voice