visual-creation

AI image and video generation. Use when: creating artwork, images, illustrations, animations, videos, visual assets, AI art generation, style guidance, choosing image or video models, text-in-image.

What this skill does


# AI Visual Creation

Decision frameworks for AI image and video generation. Not tutorials — corrections, gotchas, and "which tool for which job."

---

## Midjourney: Version Gotchas

### V7 Breaking Changes (Critical)

| Feature | V6 | V7 |
|---------|----|----|
| Multi-prompt `::` weighting | ✅ Works | ⚠️ **CHANGED** (different behavior) |
| Negative weights `::-0.5` | ✅ Works | ⚠️ Less predictable |
| `--cref` (Character Ref) | ✅ | ❌ **DEPRECATED** (use `--oref`) |
| `--stylize` scale | 0-1000 | 0-1000 (**different results!**) |
| `--no` parameter | ✅ | ✅ |
| `--iw` range | 0-2 | 0-3 |
| `--oref` (Omni Reference) | ❌ | ✅ New (2x GPU cost) |
| `--draft` mode | ❌ | ✅ New (10x faster, half cost) |
| `--exp` parameter | ❌ | ✅ New (0-100) |

**Stylize Scale Migration:** V6 `--s 100` ≈ V7 `--s 300-400`; V6 `--s 250` ≈ V7 `--s 600-700`
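
The two migration points above can be turned into a rough estimator. This is a minimal sketch, not an official formula: it assumes the midpoint of each V7 range (350 and 650) and a straight line between and beyond the two points. The function name `v6_to_v7_stylize` is hypothetical.

```python
def v6_to_v7_stylize(v6_value: float) -> int:
    """Estimate a V7 --stylize value from a V6 one (rough heuristic only)."""
    # Anchor points taken from the guide, using the midpoint of each V7 range:
    #   V6 100 -> V7 300-400 (midpoint 350)
    #   V6 250 -> V7 600-700 (midpoint 650)
    x0, y0 = 100, 350
    x1, y1 = 250, 650

    # Assumption: linear behavior between and beyond the two anchor points.
    slope = (y1 - y0) / (x1 - x0)
    estimate = y0 + slope * (v6_value - x0)

    # --stylize accepts 0-1000 in both versions, so clamp to that range.
    return int(round(max(0, min(1000, estimate))))


if __name__ == "__main__":
    for v6 in (100, 250):
        print(f"V6 --s {v6} -> V7 --s {v6_to_v7_stylize(v6)}")
```

Treat the result as a starting value and adjust by eye; only the two anchor points come from the guide.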

**V7 workarounds for changed weighting:**
- Word order matters (early = more weight)
- Use natural language emphasis
- `--no` for exclusion
- Repetition for emphasis

**V6 prompt:** `cyberpunk::2 nature::1 dystopian::-0.5`
**V7 equivalent:** `cyberpunk city with nature elements, NOT dystopian --no dystopian, grim, dark`
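
The workarounds above (word order, `--no` for exclusion) can be applied mechanically to an old weighted prompt. The sketch below is an assumption about how one might automate a first draft, not a Midjourney feature: the helper `v6_weights_to_v7` is hypothetical, it only handles `text::number` pairs, and it does not add the natural-language phrasing a hand-written V7 prompt would have.

```python
import re

# Matches "some text::weight", where weight is an optionally negative number.
_WEIGHTED_PART = re.compile(r"\s*(.*?)::(-?\d+(?:\.\d+)?)")


def v6_weights_to_v7(prompt: str) -> str:
    """Rewrite a V6 multi-prompt as a V7-style prompt (first draft only)."""
    parts = []
    end = 0
    for match in _WEIGHTED_PART.finditer(prompt):
        parts.append((match.group(1).strip(), float(match.group(2))))
        end = match.end()

    # Any trailing text without an explicit weight gets the default weight 1.
    tail = prompt[end:].strip()
    if tail:
        parts.append((tail, 1.0))

    # Heavier parts go first, because early words carry more weight in V7.
    positives = sorted((p for p in parts if p[1] > 0), key=lambda p: -p[1])
    # Negatively weighted parts become --no exclusions.
    negatives = [text for text, weight in parts if weight < 0]

    result = ", ".join(text for text, _ in positives)
    if negatives:
        result += " --no " + ", ".join(negatives)
    return result


if __name__ == "__main__":
    print(v6_weights_to_v7("cyberpunk::2 nature::1 dystopian::-0.5"))
```

The output is a skeleton to edit by hand; parts with weight 0 are dropped, and the magnitude of each weight is reduced to ordering only.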

---

## Midjourney: Reference Type Decision

### Quick Selector

| I want... | Use | Parameter | Version |
|-----------|-----|-----------|---------|
| Composition inspiration + text | Image Prompt | `--iw 1-2` | All |
| Same aesthetic, different subject | `--sref` | `--sw 100-300` | All |
| Same character, new pose/outfit | `--cref` | `--cw 0-50` | **V6 only** |
| Same character, keep everything | `--cref` | `--cw 100` | **V6 only** |
| Exact object/character preservation | `--oref` | `--ow 100-400` | **V7 only** |

⚠️ **V7 Migration:** `--cref` deprecated in V7. Use `--oref` instead (works for characters AND objects).

### Reference Type Deep Dive

**Image Prompt (--iw)**
- Mental model: Addition (image + text = result)
- Preserves: Composition, layout
- Changes: Details, style via text
- Range: 0-3 (V7), 0-2 (V6)

**Style Reference (--sref)**
- Mental model: Multiplication (style × subject = result)
- Preserves: Color palette, mood, rendering
- Changes: Subject, composition entirely
- Range: --sw 0-1000

**Character Reference (--cref) — V6 ONLY**
- ⚠️ **Deprecated in V7** — use `--oref` instead
- **CRITICAL:** Works best with Midjourney-generated images, NOT real photos
- --cw 0 = face only (max outfit flexibility)
- --cw 100 = everything (face, hair, clothing)
- Cannot preserve: fine freckles, small logos, detailed tattoos

**Omni Reference (--oref) — V7 ONLY**
- 2x GPU cost
- Only ONE reference allowed
- NOT compatible with inpainting/outpainting or Draft mode
- `--stylize` competes with `--ow`: at high `--stylize`, raise `--ow` so the reference still dominates
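
The version constraints described in this guide can be checked before a prompt is submitted. This is a minimal sketch: the function `check_reference_flags` and its messages are hypothetical, and the rules encoded are only the ones stated in this document (`--cref` is V6 only, `--oref` is V7 only and not usable in Draft mode, `--iw` tops out at 2 in V6 and 3 in V7).

```python
import re


def check_reference_flags(prompt: str, version: int, draft: bool = False) -> list[str]:
    """Return a list of conflicts between a prompt's flags and the MJ version."""
    problems = []
    flags = set(re.findall(r"--([a-z]+)", prompt))

    if "cref" in flags and version >= 7:
        problems.append("--cref is deprecated in V7; use --oref")
    if "oref" in flags and version < 7:
        problems.append("--oref requires V7")
    if "oref" in flags and (draft or "draft" in flags):
        problems.append("--oref is not compatible with Draft mode")

    # Image weight range per this guide: 0-2 in V6, 0-3 in V7.
    iw = re.search(r"--iw\s+(\d+(?:\.\d+)?)", prompt)
    if iw:
        limit = 3.0 if version >= 7 else 2.0
        if float(iw.group(1)) > limit:
            problems.append(f"--iw exceeds the V{version} maximum of {limit:g}")

    return problems


if __name__ == "__main__":
    for issue in check_reference_flags("a knight --oref https://example.com/k.png --draft", 7):
        print(issue)
```

An empty list means no conflict was found among these specific rules; it does not guarantee the prompt is otherwise valid.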

### Common Failures

| Problem | Cause | Fix |
|---------|-------|-----|
| Reference ignored | --iw too low | Increase to 2.0+ |
| Shape lost, got mandala | Symmetry bias | Add "asymmetrical", use `--no symmetric, mandala` |
| Character looks different | Using real photo | Use Midjourney-generated source |
| Style overwhelms shape | High --sw, low --iw | Lower --sw OR increase --iw |
| --oref not working | V6 or Draft mode | Switch to V7 standard mode |

---

## Model Selection: Images

### Decision Matrix

| Need | Best Choice | Why | Backup |
|------|-------------|-----|--------|
| Photorealism | Flux 2 / Imagen 4 | Best benchmark quality | Midjourney V7 |
| Artistic/stylized | Midjourney V7 | Color harmony, mood, abstract | Leonardo.ai |
| **Text in images** | Ideogram 3.0 | 85-90% accuracy (best) | GPT Image 1.5 |
| Character consistency | Leonardo.ai | Custom LoRA training | Flux Kontext |
| Technical diagrams | Flux 2 | Text + spatial control | Recraft V3 |
| Speed priority | SDXL / SD4 Turbo | 13 sec/image | Ideogram Turbo |
| Quality priority | Flux 2 Pro | Best 2026 benchmarks | GPT Image 1.5 |
| Commercial safety | Adobe Firefly | Licensed training only | DALL-E 3 |
| Budget (API) | Flux Schnell | $0.003/image | SDXL |
| Open source | Stable Diffusion | 80% market share | HunyuanImage |
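
If the matrix needs to drive a script, it can be encoded as a lookup table. The sketch below copies a subset of rows from the table above; the key names (`"photorealism"`, `"text"`, and so on) and the helper `pick_image_model` are hypothetical labels chosen for illustration.

```python
# need -> (best choice, backup), copied from the decision matrix above.
IMAGE_MODELS = {
    "photorealism": ("Flux 2 / Imagen 4", "Midjourney V7"),
    "artistic": ("Midjourney V7", "Leonardo.ai"),
    "text": ("Ideogram 3.0", "GPT Image 1.5"),
    "character_consistency": ("Leonardo.ai", "Flux Kontext"),
    "technical_diagrams": ("Flux 2", "Recraft V3"),
    "commercial_safety": ("Adobe Firefly", "DALL-E 3"),
    "budget_api": ("Flux Schnell", "SDXL"),
}


def pick_image_model(need: str, primary_available: bool = True) -> str:
    """Return the best choice for a need, or the backup if the best is unavailable."""
    if need not in IMAGE_MODELS:
        raise KeyError(f"unknown need {need!r}; expected one of {sorted(IMAGE_MODELS)}")
    best, backup = IMAGE_MODELS[need]
    return best if primary_available else backup


if __name__ == "__main__":
    print(pick_image_model("text"))
    print(pick_image_model("text", primary_available=False))
```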

### Text Rendering Hierarchy

**Best → Worst:** Ideogram 3.0 (85-90%) >> GPT Image 1.5 >> Recraft V3 >> Flux 2 (~60%) >> Imagen 4 >> DALL-E 3 >> Midjourney V7 (~15% better than V6, still poor)

**Rule:** If you need readable text, don't use Midjourney. Use Ideogram, GPT Image, or Flux 2.

---

## Model Selection: Video

### Decision Matrix

| Need | Best Choice | Why | Backup |
|------|-------------|-----|--------|
| Highest quality | Runway Gen-4.5 | Benchmark leader (1,247 Elo) | Veo 3.1 |
| **With audio sync** | Kling 2.6 | Only model with simultaneous audio-visual generation | — |
| Longest duration | Kling 2.6 | 3 minutes native | Runway |
| Character consistency | Kling O1 | Unified multimodal | Kling 2.6 |
| Professional color | Luma Ray3 | Only native HDR, 16-bit EXR | Runway |
| Budget | Hailuo 2.3 | Best cost-effectiveness | Kling 2.3 |
| Free/open source | HunyuanVideo | Beats Gen-3 quality | Stable Video |

### Key Insight

**Audio-visual sync is now a competitive differentiator.** Only Kling 2.6 generates video + voiceover + sound effects + ambient audio in a single pass.

---

## Troubleshooting Patterns

### "It won't preserve the shape"

1. Use Image Prompt with high --iw (2.0+)
2. Match aspect ratio (input 1:1 → output --ar 1:1)
3. Add `--style raw` for tighter adherence
4. Lower --stylize (30-50) for more literal interpretation
5. **If still failing:** Try **Imagen 4** or **Flux 2** — they preserve shapes more literally than Midjourney
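
Steps 1 to 4 above can be assembled into a parameter string, with the aspect ratio derived from the reference image's pixel size. This is a minimal sketch: the helper `shape_preserving_params` is hypothetical, and the defaults (`--iw 2`, `--stylize 40`) are picked from the ranges suggested in the list.

```python
from math import gcd


def shape_preserving_params(width: int, height: int,
                            iw: float = 2.0, stylize: int = 40) -> str:
    """Build Midjourney parameters that favor literal adherence to a reference."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Reduce the pixel dimensions to the smallest whole-number ratio,
    # so the output aspect ratio matches the input image.
    divisor = gcd(width, height)
    aspect = f"{width // divisor}:{height // divisor}"

    return f"--iw {iw:g} --ar {aspect} --style raw --stylize {stylize}"


if __name__ == "__main__":
    print(shape_preserving_params(1024, 1024))
    print(shape_preserving_params(1920, 1080))
```

Append the returned string to the text prompt after the reference image URL.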

### "It keeps making it symmetric"

Midjourney defaults to symmetry. Fight it:
1. Add "asymmetrical" keyword explicitly
2. Use `--no symmetric, mandala, radial, mirrored, balanced, centered`
3. Add `--chaos 6-10`
4. Use directional language ("positioned to the left", "stepping diagonally")
5. Use material words: "weathered metal" and "carved stone" resist perfect symmetry
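
The first three steps above are mechanical and can be applied by a small helper. This sketch assumes the incoming prompt is plain text with no parameters of its own; the function `break_symmetry` is hypothetical, and the default `--chaos 8` is simply the middle of the suggested 6-10 range.

```python
# Exclusion terms copied from step 2 above.
NO_TERMS = ["symmetric", "mandala", "radial", "mirrored", "balanced", "centered"]


def break_symmetry(prompt: str, chaos: int = 8) -> str:
    """Add the anti-symmetry keyword, exclusions, and chaos to a text prompt."""
    subject = prompt.strip()

    # Step 1: state the keyword explicitly unless it is already present.
    if "asymmetrical" not in subject.lower():
        subject = f"asymmetrical {subject}"

    # Steps 2 and 3: exclude symmetry-related terms and add some variation.
    return f"{subject} --no {', '.join(NO_TERMS)} --chaos {chaos}"


if __name__ == "__main__":
    print(break_symmetry("weathered metal emblem, positioned to the left"))
```

Directional language and material words (steps 4 and 5) still need to be written into the prompt by hand.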

### "Style overwhelms subject"

Balance the competing forces:
- Lower --sw (style weight)
- Increase --iw (image weight) if using reference
- Use `--style raw`
- Simplify text prompt

### "Character keeps changing"

**V7 (recommended):**
1. Use `--oref` with Midjourney-generated source (2x GPU cost)
2. Start at `--ow 100`, increase to 200-400 for facial accuracy
3. For many images: Leonardo.ai with custom LoRA

**V6 (legacy):**
1. Use `--cref` with Midjourney-generated source (not real photos)
2. `--cw 0` for face only, `--cw 100` for everything

---

## References

| Need | Load |
|------|------|
| Midjourney reference types detail | [midjourney/reference-types.md](references/midjourney/reference-types.md) |
| Midjourney V7 full guide | [midjourney/v7-guide.md](references/midjourney/v7-guide.md) |
| Midjourney parameters | [midjourney/parameters.md](references/midjourney/parameters.md) |
| Midjourney animation/video | [midjourney/animation.md](references/midjourney/animation.md) |
| Image model comparison | [image-models.md](references/image-models.md) |
| Video model comparison | [video-models.md](references/video-models.md) |

**Sources:** All claims cite official documentation (docs.midjourney.com, vendor APIs) and benchmarks (Artificial Analysis, LM Arena). Full URLs in reference files.

Files: 8

Size: 48.4 KB

Complexity: 65/100

Category: Image & Video

Source: https://github.com/yzavyas/claude-1337/tree/main/plugins/visuals-1337/skills/visual-creation
