videoagent-image-studio

Included with Lifetime

$97 forever

Tired of juggling 8 API keys? This skill gives you one-command access to Midjourney, Flux, Ideogram, and more, with zero setup. Use when you want to generate any image without worrying about API keys.

Image & Videovideoimage-generationmidjourneyfluxgeminifalideogramrecraft

What this skill does


# 🎨 VideoAgent Image Studio

**Use when:** User asks to generate, draw, create, or make any kind of image, photo, illustration, icon, logo, or artwork.

Generate images with 8 state-of-the-art AI models. This skill automatically picks the best model for the job and handles all the complexity — including Midjourney's async polling — so you can focus on the conversation.

---

## Quick Reference

| User Intent | Model | Speed |
|---|---|---|
| Artistic, cinematic, painterly | `midjourney` | ~15s |
| Photorealistic, portrait, product | `flux-pro` | ~8s |
| General purpose, balanced | `flux-dev` | ~10s |
| Quick draft, fast iteration | `flux-schnell` | ~2s |
| Image with text, logo, poster | `ideogram` | ~10s |
| Vector art, icon, flat design | `recraft` | ~8s |
| Anime, stylized illustration | `sdxl` | ~5s |
| Gemini-powered, consistent style | `nano-banana` | ~12s |

---

## How to Generate an Image

### Step 1 — Enhance the prompt

Before calling the script, expand the user's prompt with style, lighting, and quality descriptors appropriate for the chosen model.

- **Midjourney**: Add `cinematic lighting`, `ultra detailed`, `--v 7`, `--style raw`
- **Flux**: Add `masterpiece`, `highly detailed`, `sharp focus`, `professional photography`
- **Ideogram**: Be explicit about text content, font style, and layout
- **Recraft**: Specify `vector illustration`, `flat design`, `icon style`

### Step 2 — Run the script

```bash
node {baseDir}/tools/generate.js \
  --model <model_id> \
  --prompt "<enhanced prompt>" \
  --aspect-ratio <ratio>
```

**All parameters:**

| Parameter | Default | Description |
|---|---|---|
| `--model` | `flux-dev` | Model ID from the table above |
| `--prompt` | *(required)* | The image generation prompt |
| `--aspect-ratio` | `1:1` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `21:9` |
| `--num-images` | `1` | Number of images (1–4; Midjourney always returns 4) |
| `--negative-prompt` | — | Things to avoid (not supported by Midjourney) |
| `--seed` | — | Seed for reproducibility |

### Step 3 — Return the result

The script always waits and returns the final image URL(s). No polling required.

```json
{
  "success": true,
  "model": "flux-pro",
  "imageUrl": "https://...",
  "images": ["https://..."]
}
```

Send the `imageUrl` to the user.

---

## Midjourney Actions

After generating a 4-image grid with Midjourney, offer the user these options:

```bash
# Upscale image #2 (subtle, preserves details)
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action upscale \
  --index 2 \
  --job-id <job_id>

# Create a strong variation of image #3
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action variation \
  --index 3 \
  --job-id <job_id> \
  --variation-type 1

# Regenerate with same prompt
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action reroll \
  --job-id <job_id>
```

**Upscale types:** `0` = Subtle (default, best for photos), `1` = Creative (best for illustrations)

**Variation types:** `0` = Subtle (default), `1` = Strong (dramatic changes)

---

## Example Conversations

**User:** "Draw a snow leopard on a snowy mountain with cinematic lighting"

```bash
# Choose midjourney for artistic quality
node {baseDir}/tools/generate.js \
  --model midjourney \
  --prompt "a majestic snow leopard on a snowy mountain peak, cinematic lighting, dramatic atmosphere, ultra detailed --ar 16:9 --v 7" \
  --aspect-ratio 16:9
```

> 🎨 Done! Which one to upscale? (U1-U4) Or create a variant? (V1-V4)

---

**User:** "Use Flux to generate a perfume product poster, white background"

```bash
# Choose flux-pro for photorealistic product shots
node {baseDir}/tools/generate.js \
  --model flux-pro \
  --prompt "a luxury perfume bottle on a clean white background, professional product photography, soft shadows, 8k, highly detailed" \
  --aspect-ratio 3:4
```

---

**User:** "Show me a quick draft"

```bash
# flux-schnell for instant previews
node {baseDir}/tools/generate.js \
  --model flux-schnell \
  --prompt "..." \
  --aspect-ratio 1:1
```

---

**User:** "Make me an App icon, flat style, blue theme"

```bash
# recraft for vector/icon style
node {baseDir}/tools/generate.js \
  --model recraft \
  --prompt "a minimal flat design app icon, blue color scheme, simple geometric shapes, vector style, white background"
```

---

## Setup

**Zero API keys needed!** All requests go through a hosted proxy that handles authentication server-side.

The skill works out of the box — just install and use.

### Advanced: Custom proxy or token

If you want to use your own proxy or a persistent token, set these environment variables:

```json
{
  "skills": {
    "entries": {
      "videoagent-image-studio": {
        "enabled": true,
        "env": {
          "IMAGE_STUDIO_PROXY_URL": "https://your-proxy.vercel.app",
          "IMAGE_STUDIO_TOKEN": "your_token_here"
        }
      }
    }
  }
}
```

| Variable | Required | Description |
|---|---|---|
| `IMAGE_STUDIO_PROXY_URL` | No | Custom proxy base URL (default: `https://image-gen-proxy.vercel.app`) |
| `IMAGE_STUDIO_TOKEN` | No | Persistent token (auto-obtained if not set, 100 free uses per token) |

To deploy your own proxy, see the [videoagent-audio-studio proxy](../videoagent-audio-studio/proxy/) as a reference implementation. You'll need `FAL_KEY` and `LEGNEXT_KEY` as Vercel environment variables.

---

## Changelog

### v2.0.0
- **Simplified async**: The script now blocks until Midjourney completes. No more `--async` / `--poll` flags needed in SKILL.md instructions.
- **Unified output format**: All models return the same `{ success, imageUrl, images }` shape.
- **Reference images for Nano Banana**: Pass `--reference-images "url1,url2"` for character/style consistency across generations.

### v1.3.0
- Added non-blocking async mode for Midjourney (`--async` + `--poll`).

### v1.2.0
- Midjourney turbo mode enabled by default (~10-20s).

### v1.1.0
- Switched Midjourney provider from TTAPI to Legnext.ai for better stability.

### v1.0.0
- Initial release with Midjourney, Flux, SDXL, Nano Banana, Ideogram, Recraft.

Files: 6

Size: 21.1 KB

Complexity: 36/100

Category: Image & Video

Source: https://github.com/pexoai/pexo-skills/tree/main/skills/videoagent-image-studio

Related in Image & Video

watch

Included

Watch a video (URL or local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or Whisper API fallback), and hands the result to Claude so it can answer questions about what's in the video.

Image & Videoscriptsfeatured

physical-ai-defect-image-generation

Included

Use when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path performs inference and labeling on real images. This skill helps with first-time asset setup, creation of finetuning checkpoints, and configuring deployment. Trigger keywords: defect image generation, dig workflow, dig pipeline, defect image detection workflow, aoi pipeline, aoi anomalygen, usd2roi anomalygen, day 0 pcba, day 1 pcba, day 1 real-photo alignment, day 1 manual roi, metal surface anomaly, glass defect, anomalygen finetune, setup_pcb, setup_metal, setup_glass, setup_pretrained, dig setup, dig datasets, dig pretrained checkpoint, dig image-edit endpoint.

Image & Videoscripts

accelint-react-best-practices

Included

React performance optimization and best practices. ALWAYS use this skill when working with any React code - writing components, hooks, JSX; refactoring; optimizing re-renders, memoization, state management; reviewing for performance; fixing hydration mismatches; debugging infinite re-renders, stale closures, input focus loss, animations restarting; preventing remounting; implementing transitions, lazy initialization, effect dependencies. Even simple React tasks benefit from these patterns. Covers React 19+ (useEffectEvent, Activity, ref props). Triggers - useEffect, useState, useMemo, useCallback, memo, inline components, nested components, components inside components, re-render, performance, hydration, SSR, Next.js, useDeferredValue, combined hooks.

Image & Videoscripts

elevenlabs-agents

Included

Build conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication

Image & Videoscripts

humanizer

Included

Humanize AI-generated text by detecting and removing patterns typical of LLM output. Rewrites text to sound natural, specific, and human. Uses 28 pattern detectors, 560+ AI vocabulary terms across 3 tiers, and statistical analysis (burstiness, type-token ratio, readability) for comprehensive detection. Use when asked to humanize text, de-AI writing, make content sound more natural/human, review writing for AI patterns, score text for AI detection, or improve AI-generated drafts. Covers content, language, style, communication, and filler categories.

Image & Videoscripts

generating-mermaid-diagrams

Included

Salesforce architecture diagrams using Mermaid with ASCII fallback. Use this skill when generating text-based diagrams for Salesforce architecture, OAuth flows, ERDs, integration sequences, or Agentforce structure. TRIGGER when: user says "diagram", "visualize", "ERD", or asks for sequence diagrams, flowcharts, class diagrams, or architecture visualizations in Mermaid. DO NOT TRIGGER when: user wants PNG/SVG image output (use generating-visual-diagrams), or asks about non-Salesforce systems.

Image & Videoscripts