image-annotations
Annotate screenshots, diagrams, and images with callout rectangles, arrows, labels, and color-coded highlights using PIL. Includes rules for animated GIF annotations with timing and pacing.
# Image Annotations
Add visual callouts to any image — screenshots, diagrams, architecture docs, demo frames — using PIL/Pillow. Highlights what changed or what to look at, so reviewers don't have to guess.
## When to Use This Skill
Use this skill when you need to:
- Highlight a specific area in a screenshot for a PR description
- Annotate before/after images to show what changed
- Add labels and callouts to diagrams or architecture images
- Create annotated frames for animated GIF demos
## Prerequisites
```bash
pip install Pillow numpy -q  # numpy is only needed by annotate.py
```
## Color Rules
- **Red (`#E63946`)** — only for "bad" / "removed" things (e.g., circling a bug being fixed)
- **Yellowish-orange (`#FF9F1C`)** — for neutral highlights ("look here", "new feature", etc.)
- Never use red just because it's eye-catching — red = bad/removed
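The color rules above can be encoded once so that scripts pick a color by meaning rather than by hex value. A minimal sketch: `annotation_color` and the category names in `_BAD_KINDS` are hypothetical, not part of this skill.

```python
# Hypothetical helper: choose an annotation color by meaning, per the color rules.
RED = '#E63946'     # reserved for bad / removed things
ORANGE = '#FF9F1C'  # neutral highlights

_BAD_KINDS = {'bug', 'removed', 'error'}  # assumed category names


def annotation_color(kind: str) -> str:
    """Return red only for bad/removed kinds; everything else is orange."""
    return RED if kind.lower() in _BAD_KINDS else ORANGE


print(annotation_color('removed'))      # #E63946
print(annotation_color('new feature'))  # #FF9F1C
```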
## Font
- Use **Ink Free** (`C:/Windows/Fonts/Inkfree.ttf`) for a handwritten look on Windows
- On Linux/macOS, fall back to `ImageFont.load_default()`
- Size **36** for annotations on ~1400px-wide images
- `stroke_width=1` with `stroke_fill=<same color as fill>` — gives body without being too thick
- Do NOT use white stroke — looks like a bad glow effect
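One way to apply the font rules across platforms is to try Ink Free first and fall back when the file is missing. This is a sketch under that assumption; `load_annotation_font` is a hypothetical helper name.

```python
from PIL import ImageFont

INK_FREE = 'C:/Windows/Fonts/Inkfree.ttf'  # Windows-only path from the notes above


def load_annotation_font(path: str = INK_FREE, size: int = 36):
    """Try the handwritten font first; fall back to Pillow's built-in font."""
    try:
        return ImageFont.truetype(path, size)
    except (OSError, ImportError):
        # OSError: font file not found (typical on Linux/macOS).
        # ImportError: Pillow was built without FreeType support.
        # load_default() accepts a size only on newer Pillow releases (10.1+),
        # so it is called without one here.
        return ImageFont.load_default()


font = load_annotation_font()
```

The returned object can be passed as `font=` to `draw.text(...)` together with `stroke_width=1` and a `stroke_fill` equal to the fill color.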
## Shapes
- Prefer **rounded rectangles** over circles/ellipses — less pixelation at edges
- `draw.rounded_rectangle([x1, y1, x2, y2], radius=14, outline=color, width=5)`
- **Padding 18px** around the target content
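The padding rule amounts to growing the tight bounding box by 18px on every side before drawing. A small sketch: `padded_box` is a hypothetical helper, and clamping to the image bounds is an added assumption, not something the rules above require.

```python
def padded_box(box, pad=18, image_size=None):
    """Expand a tight (x1, y1, x2, y2) box by `pad` pixels on every side.

    If `image_size` (width, height) is given, the result is clamped so the
    rectangle stays inside the image.
    """
    x1, y1, x2, y2 = box
    x1, y1, x2, y2 = x1 - pad, y1 - pad, x2 + pad, y2 + pad
    if image_size is not None:
        w, h = image_size
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w - 1, x2), min(h - 1, y2)
    return (x1, y1, x2, y2)


print(padded_box((560, 275, 635, 390)))                 # (542, 257, 653, 408)
print(padded_box((5, 5, 60, 40), image_size=(64, 64)))  # (0, 0, 63, 58)
```

The result can be passed straight to `draw.rounded_rectangle(..., radius=14, outline=color, width=5)`.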
## Reference Snippet
```python
from PIL import Image, ImageDraw, ImageFont

# Setup
font = ImageFont.truetype('C:/Windows/Fonts/Inkfree.ttf', 36)  # or ImageFont.load_default()
color = '#FF9F1C'  # orange for highlights
stroke = 5
pad = 18

img = Image.open('screenshot.png')
draw = ImageDraw.Draw(img)

# Tight bounding box of the target in pixels (example values; replace with your own)
x1, y1, x2, y2 = 560, 275, 635, 390
cy = (y1 + y2) // 2  # vertical center of the target (assumed meaning of cy)

# Rounded rect with padding
draw.rounded_rectangle(
    [x1 - pad, y1 - pad, x2 + pad, y2 + pad],
    radius=14, outline=color, width=stroke
)

# Leader line (same thickness as rect)
draw.line([x2 + pad, cy, x2 + pad + 40, cy - 30], fill=color, width=stroke)

# Label: same-color stroke for body, NO white stroke
draw.text(
    (x2 + pad + 45, cy - 60), 'label text',
    fill=color, font=font, stroke_width=1, stroke_fill=color
)

img.save('annotated.png')
```
## Algorithmic Annotation — `annotate.py`
For images with multiple elements to annotate, use the `annotate.py` module below. Save it next to your script and import from it. It handles automatic label placement without overlapping.
### Quick start
```python
from annotate import annotate_image

result = annotate_image(
    'screenshot.png',
    [
        {'elem': (560, 275, 635, 390), 'label': 'button', 'draw_box': True},
        {'elem': (105, 453, 236, 470), 'label': 'status text'},
    ],
    debug=True,
)
result.save('annotated.png')
```
- `elem`: `(x1, y1, x2, y2)` tight bounding box — must be exact pixel coordinates
- `label`: text label (supports `\n` for multi-line)
- `draw_box`: if `True`, draws a rounded rectangle around the element. If `False` (default), draws a V-arrowhead pointing at the element
- `debug`: shows targeting rectangles and candidate heatmap for placement validation
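Because `elem` must hold exact pixel coordinates, it can help to sanity-check the annotation list before rendering. The sketch below is not part of `annotate.py`; `validate_annotations` is a hypothetical helper and the 1400x900 image size is an example value.

```python
def validate_annotations(annotations, image_size):
    """Return a list of problems found in annotation specs (empty if none)."""
    width, height = image_size
    problems = []
    for i, ann in enumerate(annotations):
        elem = ann.get('elem')
        if elem is None or len(elem) != 4:
            problems.append(f'#{i}: elem must be (x1, y1, x2, y2)')
            continue
        x1, y1, x2, y2 = elem
        if not (x1 < x2 and y1 < y2):
            problems.append(f'#{i}: elem has zero or negative size')
        if x1 < 0 or y1 < 0 or x2 > width or y2 > height:
            problems.append(f'#{i}: elem lies outside the {width}x{height} image')
    return problems


specs = [
    {'elem': (560, 275, 635, 390), 'label': 'button', 'draw_box': True},
    {'elem': (236, 453, 105, 470), 'label': 'status text'},  # x1 > x2: swapped
]
for problem in validate_annotations(specs, image_size=(1400, 900)):
    print(problem)  # #1: elem has zero or negative size
```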
### Coordinate grid helper
**Always use `grid_image()` before annotating an unfamiliar image.** Viewers and previews usually show the image scaled down from its actual pixel dimensions, so coordinates read off a preview are off by the scale factor, and the absolute error grows the farther a point is from (0,0).
```python
from annotate import grid_image
grid = grid_image('screenshot.png', step=100)
grid.save('grid.png')
```
Then verify with small crops:
```python
from PIL import Image
img = Image.open('screenshot.png')
crop = img.crop((x1 - 20, y1 - 20, x2 + 20, y2 + 20))
crop.save('verify.png')
```
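To see why coordinates read off a preview drift, consider mapping a preview point back to real pixels. This is an illustrative sketch only: `preview_to_pixel` is a hypothetical helper and the image sizes are example values.

```python
def preview_to_pixel(point, preview_size, actual_size):
    """Map a point read off a scaled preview to actual image pixels."""
    px, py = point
    pw, ph = preview_size
    aw, ah = actual_size
    return (round(px * aw / pw), round(py * ah / ph))


# A 1400x900 screenshot shown as a 700x450 preview (scale factor 2):
print(preview_to_pixel((10, 10), (700, 450), (1400, 900)))    # (20, 20)
print(preview_to_pixel((600, 400), (700, 450), (1400, 900)))  # (1200, 800)
```

Near the origin, using the preview coordinates directly is wrong by only 10px, but at the far corner the same mistake is off by 600px horizontally and 400px vertically. That is why the grid and crop checks matter.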
### Algorithm overview
1. **Ring search**: candidates between MIN_ARROW (25px) and MAX_ARROW (120px) from element edge
2. **Contrast scoring**: prefers placements where label text is readable — `abs(avg_brightness - 147) - std * 0.3 - dist * 0.02`
3. **Joint resolution**: candidates computed independently, placed greedily (best score first)
4. **Hard blocks**: labels cannot overlap any other annotation's element or breathing box
5. **Proximity penalty**: labels within 40px of other placed boxes get a score penalty
6. **Arrow crossing penalty**: -50 for arrows crossing already-placed arrows
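Steps 2 to 5 can be sketched in plain Python. This is a simplified illustration, not the module's actual code: it omits breathing boxes and the arrow crossing penalty, it also treats already-placed labels as hard blocks (an assumption), and `contrast_score`, `place_greedily`, and the `cands` input format are hypothetical.

```python
import math
from statistics import fmean, pstdev


def contrast_score(brightness, dist):
    """Step 2: flat regions far from mid-gray (147) and near the element rank highest."""
    return abs(fmean(brightness) - 147) - pstdev(brightness) * 0.3 - dist * 0.02


def rects_overlap(a, b):
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]


def rect_gap(a, b):
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)


def place_greedily(annots, margin=40, penalty=50):
    """annots: dicts with 'elem' (box) and 'cands' [(label_box, score), ...].

    Returns {index: label_box} for every annotation that could be placed.
    """
    blocked = [a['elem'] for a in annots]  # hard blocks: every element box
    placed = {}
    # Step 3: annotations with the best top candidate are placed first.
    order = sorted(
        range(len(annots)),
        key=lambda i: -max((s for _, s in annots[i]['cands']), default=-math.inf),
    )
    for i in order:
        best_box, best_score = None, -math.inf
        for box, score in annots[i]['cands']:
            if any(rects_overlap(box, b) for b in blocked):
                continue  # step 4: hard block
            if any(rect_gap(box, p) < margin for p in placed.values()):
                score -= penalty  # step 5: proximity penalty
            if score > best_score:
                best_box, best_score = box, score
        if best_box is not None:
            placed[i] = best_box
            blocked.append(best_box)
    return placed


annots = [
    {'elem': (100, 100, 200, 150),
     'cands': [((230, 100, 330, 130), 40.0), ((100, 40, 200, 70), 35.0)]},
    {'elem': (100, 200, 200, 250),
     'cands': [((230, 140, 330, 170), 30.0), ((230, 260, 330, 290), 20.0)]},
]
print(place_greedily(annots))
# {0: (230, 100, 330, 130), 1: (230, 260, 330, 290)}
```

In the example, the second annotation's top candidate sits 10px from the first label, so the proximity penalty drops its score from 30 to -20 and the lower-scored but roomier candidate wins.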
### Debug mode colors
| Color | Meaning |
|-------|---------|
| Cyan | Target element box (elem + padding) |
| Gray | Exclusion zone (MIN_ARROW buffer) |
| Red→Green | Candidate heatmap (red=bad, green=good) |
| Magenta | Chosen label position |
| Orange | Final rendered annotation |
### Arrow styles
- **`draw_box=True`**: rounded rectangle + straight line to label, no arrowhead
- **`draw_box=False`**: V-shaped arrowhead with rounded line caps
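The geometry of a V-shaped arrowhead can be sketched as two short "wing" segments that meet at the tip. This is an illustration, not the module's implementation: `arrowhead_wings` is a hypothetical helper and the default `length` and `spread_deg` values are assumptions.

```python
import math


def arrowhead_wings(tail, tip, length=18, spread_deg=28):
    """Return the two wing endpoints of a V-shaped arrowhead at `tip`.

    The arrow runs from `tail` (label side) to `tip` (element side).
    """
    angle = math.atan2(tip[1] - tail[1], tip[0] - tail[0])
    spread = math.radians(spread_deg)
    wings = []
    for sign in (+1, -1):
        # Point back toward the tail, rotated by +/- spread.
        a = angle + math.pi + sign * spread
        wings.append((tip[0] + length * math.cos(a),
                      tip[1] + length * math.sin(a)))
    return wings


left, right = arrowhead_wings((0, 0), (100, 0))
print(left, right)
```

Each wing can then be drawn with `draw.line([wing, tip], fill=color, width=stroke)`. One way to approximate rounded caps is to draw a small filled `draw.ellipse` of diameter `stroke` at each endpoint.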
### `annotate.py` — full module
Save this as `annotate.py` and import from it:
```python
"""
Algorithmic screenshot annotation with automatic label placement.
pip install Pillow numpy
Optional for diff_images: pip install scipy
"""
import math

import numpy as np
from PIL import Image, ImageDraw, ImageFont

# --- Defaults ---
DEFAULT_FONT = 'C:/Windows/Fonts/Inkfree.ttf'
DEFAULT_FONT_SIZE = 32
DEFAULT_COLOR = '#FF9F1C'
DEFAULT_STROKE = 5
MIN_ARROW = 25
MAX_ARROW = 120
TEXT_PAD = 6
BREATH = 18
CROSSING_PENALTY = 50
PROXIMITY_MARGIN = 40
PROXIMITY_PENALTY = 50


# True if two (x1, y1, x2, y2) rects overlap with non-zero area.
def _rect_intersects(a, b):
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]


# True if segments p1-p2 and p3-p4 properly cross (touching endpoints do not count).
def _segments_intersect(p1, p2, p3, p4):
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    return ((d1 > 0 and d2 < 0) or (d1 < 0 and d2 > 0)) and \
           ((d3 > 0 and d4 < 0) or (d3 < 0 and d4 > 0))


# Point where the segment from (cx, cy) to (tx, ty) leaves rect;
# assumes (cx, cy) lies inside rect.
def _line_rect_exit(cx, cy, tx, ty, rect):
    x1, y1, x2, y2 = rect
    dx, dy = tx - cx, ty - cy
    tmin, tmax = 0.0, 1.0
    for lo, hi, p, d in [(x1, x2, cx, dx), (y1, y2, cy, dy)]:
        if abs(d) < 1e-9:
            continue
        t0, t1 = (lo - p) / d, (hi - p) / d
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return (cx + dx * tmax, cy + dy * tmax)


# Euclidean gap between two rects; 0 if they touch or overlap.
def _rect_gap(a, b):
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    if dx == 0 and dy == 0:
        return 0
    return math.sqrt(dx**2 + dy**2)


# Score candidate label boxes of size (pw, ph) in a ring around the element box `cyan`.
def _find_candidates(pixels, W, H, cyan, pw, ph, font):
    cx, cy = (cyan[0] + cyan[2]) / 2, (cyan[1] + cyan[3]) / 2
    excl_zone = (cyan[0] - MIN_ARROW, cyan[1] - MIN_ARROW,
                 cyan[2] + MIN_ARROW, cyan[3] + MIN_ARROW)
    sx1 = max(0, cyan[0] - MAX_ARROW - pw)
    sy1 = max(0, cyan[1] - MAX_ARROW - ph)
    sx2 = min(W - pw, cyan[2] + MAX_ARROW)
    sy2 = min(H - ph, cyan[3] + MAX_ARROW)
    step_x = max(8, min(pw // 2, MAX_ARROW // 3))
    step_y = max(8, min(ph // 2, MAX_ARROW // 3))
    cands = []
    for px in range(sx1, sx2, step_x):
        for py in range(sy1, sy2, step_y):
            pink = (px, py, px + pw, py + ph)
            # Skip boxes that come closer to the element than MIN_ARROW.
            if _rect_intersects(pink, excl_zone):
                continue
            # Edge-to-edge distance between the candidate and the element.
            gl, gr = cyan[0] - pink[2], pink[0] - cyan[2]
            gt, gb = cyan[1] - pink[3], pink[1] - cyan[3]
            hd, vd = max(gl, gr, 0), max(gt, gb, 0)
            ed = math.sqrt(hd**2 + vd**2) if (hd > 0 and vd > 0) else max(hd, vd)
            if ed > MAX_ARROW:
                continue
            # Contrast score: far from mid-gray, low variance, close to the element.
            region = pixels[py:py + ph, px:px + pw, :3].astype(float)
            score = abs(np.mean(region) - 147) - np.std(region) * 0.3
            dist = math.sqrt((px + pw/2 - cx)**2 + (py + ph/2 - cy)**2)
            score -= dist * 0.02
            cands.append(((px, py), score))
    return cands


def _resolve_placements(annots, font):
    placed = []
    all_elem_zones = []
    for ann in annots:
        all_elem_zones.append(ann['cyan'])
        if ann.get('draw_box', False):
            c = ann['cyan']
            all_elem_zones.append((c[0]-BREATH, c[1]-BREATH, c[2]+BREATH, c[3]+BREATH))
    for ann in sorted(annots, key=lambda a: -a['best_score']):
        pw, ph =
```