image-prompt-generator
Generate AI images using the Gemini image generation API. Use this skill when content needs images: thumbnails, social posts, blog headers, or creative visuals. Follows an iterative workflow: brainstorm concepts, select direction, generate in multiple styles, then produce via API.
What this skill does
# Image Prompt Generator

Generate professional, non-generic images using Google's Gemini API for image generation.

## Prerequisites & Setup

### Getting Your Gemini API Key

1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Sign in with your Google account
3. Click "Create API Key"
4. Copy the generated key

### Configuring the API Key

**Option 1: Environment file (recommended)**

Create a `.env` file in your project root:

```bash
GEMINI_API_KEY=your_api_key_here
```

**Option 2: Direct environment variable**

```bash
export GEMINI_API_KEY=your_api_key_here
```

### Install Dependencies

```bash
pip install google-generativeai python-dotenv pillow
```

### Available Models

| Model | API Name | Best For |
|-------|----------|----------|
| **Flash** | `gemini-2.5-flash-image` | Speed, drafts, iteration |
| **Pro** | `gemini-3-pro-image-preview` | **Final assets, 16:9 aspect ratio, quality** |

**CRITICAL**: Use `gemini-3-pro-image-preview` for:

- Thumbnails (need 16:9 aspect ratio)
- Final production images
- Any image where aspect_ratio config is needed

---

## Workflow Overview

1. **Brainstorm Concepts** - Generate 4-6 high-level visual ideas
2. **Select Direction** - User picks the concept they like
3. **Optimize Prompt** - Refine into a strong, detailed prompt
4. **Style Variations** - Adapt to 2-3 different visual styles
5. **Generate Images** - Run via Gemini API

## Step 1: Brainstorm Concepts

When the user provides a topic or use case, generate 4-6 high-level visual concepts. Each concept should be:

- **One sentence** describing the visual idea
- **Concrete and immediate** - you can picture it instantly
- **Conceptual but not abstract** - a clear object/scene with meaning
- **Non-generic** - avoid cliches (no lightbulbs for ideas, no handshakes for partnership)

**Format:**

```
1. **[Short label]** - One sentence description of the visual concept and why it works.
2. **[Short label]** - One sentence description...
```

**Example for "newsletter about personal productivity":**

```
1. **Compass with coffee stain** - A vintage compass where the needle points toward a coffee ring stain on a map, suggesting direction emerges from daily rituals.
2. **Clock face with seasons** - A clock where the 12 hours show seasonal changes, suggesting time management over long arcs, not just hours.
3. **Empty desk with shadow** - A minimalist desk in morning light, but the shadow shows a cluttered desk - the gap between intention and reality.
4. **Single key on many keychains** - One small key attached to dozens of decorative keychains, suggesting we overcomplicate simple solutions.
```

Wait for the user to select a concept before proceeding.

## Step 2: Optimize the Prompt

Once the user selects a concept, develop it into a full prompt. Structure:

```
Create a [style type] illustration of [subject].

CONCEPT: [Expand the one-sentence idea into a clear visual description]
STYLE: [Artistic approach - load from references/styles/ if brand-specific]
COMPOSITION: [Framing, focal point, negative space, balance]
COLORS: [Palette - describe by name, not hex codes which may render as text]
TEXTURE: [Surface qualities, analog/digital feel]
AVOID: [What should NOT appear - be specific]
FORMAT: [Aspect ratio]
```

**Key principles:**

- Natural language, full sentences - no tag soup
- Describe colors by name (burnt orange, sky blue, near-black) not hex codes
- Maximum 2-3 elements - if it feels busy, remove something
- Favor metaphor over literal depiction

## Step 3: Style Variations

**Default style: Risograph** - Use `references/styles/risograph.md` unless the content calls for something different.

Available styles in `references/styles/`:

- **risograph.md** - DEFAULT. Halftone dots, misregistration, indie printmaking aesthetic. Warm, tactile, analog.
- **minimalist-ink.md** - High-contrast black and white, crosshatching. For craft/mastery posts.
- **watercolor-line.md** - Ink linework with watercolor washes, warm. For organic topics.
- **editorial-conceptual.md** - Conceptual, sophisticated, editorial wit. For abstract/philosophical posts.

Present the style options to the user, recommending risograph as the default.

## Step 4: Generate via API

### Running the Script

```bash
# Load key from .env and generate
export $(grep GEMINI_API_KEY .env) && \
python scripts/generate_image.py "prompt here" --model pro --aspect 16:9

# Save to specific folder
python scripts/generate_image.py "prompt" --output "./images" --name "my_image"
```

**Options:**

- `--model flash` (faster, cheaper) or `--model pro` (higher quality)
- `--aspect 16:9`, `1:1`, or `9:16` (**PRO MODEL ONLY** - for flash, you MUST include ratio in prompt text)
- `--variations N` - generate N versions
- `--output ./path` - save location
- `--name prefix` - filename prefix

**Output location:** Save images alongside the content they belong to - not a generic images dump.

## Step 5: Iterate

After the user reviews the generated images:

- **80% good?** Request specific edits conversationally rather than regenerating
- **Composition off?** Adjust framing or element placement in prompt
- **Wrong style?** Try a different style reference
- **Too busy?** Simplify to fewer elements
- **Colors wrong?** Be more explicit about palette

## Prompting Principles

### Write Like a Creative Director

Brief the model like a human artist. Use proper grammar, full sentences, and descriptive adjectives.

| Don't | Do |
|-------|-----|
| "Cool car, neon, city, night, 8k" | "A cinematic wide shot of a futuristic sports car speeding through a rainy Tokyo street at night. The neon signs reflect off the wet pavement and the car's metallic chassis." |

**Be specific about:**

- **Subject:** Instead of "a woman," say "a sophisticated elderly woman wearing a vintage Chanel-style suit"
- **Materiality:** Describe textures - "matte finish," "brushed steel," "soft velvet," "crumpled paper"
- **Setting:** Define location, time of day, weather
- **Lighting:** Specify mood and light source
- **Mood:** Emotional tone of the image

### Provide Context

Context helps the model make logical artistic decisions. Include the "why" or "for whom."

**Example:** "Create an image of a sandwich for a Brazilian high-end gourmet cookbook." *(Model infers: professional plating, shallow depth of field, perfect lighting)*

### Keep It Simple

- One clear focal point
- Maximum 2-3 elements total
- Generous negative space
- If it feels busy, remove something

### Avoid the Generic

- No lightbulbs for "ideas"
- No handshakes for "partnership"
- No happy stock photo poses
- No glossy AI aesthetic

## Resources

### references/styles/

Aesthetic style definitions:

- `risograph.md` - **DEFAULT** - Halftone, misregistration, indie printmaking
- `minimalist-ink.md` - Black and white ink illustration
- `watercolor-line.md` - Ink with watercolor washes
- `editorial-conceptual.md` - Conceptual editorial style

### scripts/

- `generate_image.py` - Gemini API image generation

## Prompt Modifiers Reference

| Category | Examples |
|----------|----------|
| **Lighting** | golden hour, dramatic shadows, soft diffused light, neon glow, overcast |
| **Style** | cinematic, editorial, technical diagram, hand-drawn, photorealistic |
| **Texture** | matte finish, brushed steel, soft velvet, crumpled paper, weathered wood |
| **Composition** | wide shot, close-up, bird's eye view, dutch angle, symmetrical |
| **Mood** | energetic, serene, dramatic, playful, sophisticated |
| **Quality** | 4K, high-fidelity, pixel-perfect, professional grade |

## Advanced Capabilities

### Text Rendering & Infographics

Put exact text in quotes. Specify style: "polished editorial," "technical diagram," or "hand-drawn whiteboard."

**Example prompts:**

```
Earnings Report Infographic:
"Generate a clean, modern infographic summarizing the key financial highlights from this earnings report. Include charts for 'Revenue Growth' and 'Net Income', and highlight the CEO's key quote in a stylized
```
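
### Sketch: resolving the Step 4 options

The source of `scripts/generate_image.py` is not shown here, so the following is only a rough sketch of how the Step 4 options might be resolved before any API call. It encodes two things stated above: the model aliases from the Available Models table, and the rule that an aspect ratio is passed as config for the pro model but written into the prompt text for flash. The names `MODEL_NAMES`, `resolve_request`, and `parse_args`, the plain `config` dict, and all default values are assumptions made for illustration. No request is sent to Gemini.

```python
import argparse

# Aliases and API names copied from the "Available Models" table.
MODEL_NAMES = {
    "flash": "gemini-2.5-flash-image",
    "pro": "gemini-3-pro-image-preview",
}


def resolve_request(prompt, model="pro", aspect=None):
    """Return (api_model_name, final_prompt, config) for one generation call.

    Hypothetical helper. The config dict is a placeholder, not an SDK type.
    """
    if model not in MODEL_NAMES:
        raise ValueError(f"unknown model alias: {model!r}")
    config = {}
    final_prompt = prompt
    if aspect:
        if model == "pro":
            # Pro: the ratio travels as a config value.
            config["aspect_ratio"] = aspect
        else:
            # Flash: per Step 4, the ratio must be stated in the prompt text.
            final_prompt = f"{prompt}\n\nFORMAT: {aspect} aspect ratio."
    return MODEL_NAMES[model], final_prompt, config


def parse_args(argv=None):
    """Mirror the documented flags; the defaults here are assumptions."""
    parser = argparse.ArgumentParser(description="Sketch of the Step 4 options")
    parser.add_argument("prompt")
    parser.add_argument("--model", choices=sorted(MODEL_NAMES), default="pro")
    parser.add_argument("--aspect", choices=["16:9", "1:1", "9:16"])
    parser.add_argument("--variations", type=int, default=1)
    parser.add_argument("--output", default=".")
    parser.add_argument("--name", default="image")
    return parser.parse_args(argv)


# Example: a flash draft with a 16:9 request ends up with the ratio in the prompt.
args = parse_args(["A vintage compass on a map", "--model", "flash", "--aspect", "16:9"])
api_name, final_prompt, config = resolve_request(args.prompt, args.model, args.aspect)
print(api_name)
print(final_prompt)
print(config)
```

Keeping this resolution step separate from the network call makes the pro-only aspect rule easy to check without spending API quota.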