nanobanana-image-gen
Generate and edit images using Google's Nanobanana model via Replicate API. Use this skill when users request image generation from text descriptions, image-to-image transformations, style transfers, or any creative image editing tasks. Supports multiple input images, custom aspect ratios, and various output formats.
# Nanobanana Image Generation
## Overview
This skill enables image generation and editing using Google's Nanobanana model (google/nano-banana) through the Replicate API. Generate images from text prompts, transform existing images, apply style transfers, or create variations with precise control over aspect ratios and output formats.
## Core Capabilities
### 1. Text-to-Image Generation
Generate images from natural language descriptions.
**Example requests:**
- "Generate an image of a sunset over snow-capped mountains"
- "Create a portrait of a cat wearing a bow tie in a Victorian setting"
- "Make me an abstract image with blue and purple swirls"
**Process:**
1. Extract the text description from the user's request
2. Determine appropriate aspect ratio (default: 1:1 for general images, 16:9 for landscapes, 2:3 for portraits)
3. Execute `scripts/generate_image.py` with the prompt
4. Save the generated image to the user's working directory
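Step 2 above can be sketched as a small keyword heuristic. This is only an illustration: `pick_aspect_ratio` and both keyword sets are hypothetical and not part of `scripts/generate_image.py`; only the three default ratios come from the process description.

```python
import re

# Hypothetical keyword sets; the skill does not specify how a request is classified.
LANDSCAPE_WORDS = {"landscape", "panorama", "skyline", "banner", "widescreen"}
PORTRAIT_WORDS = {"portrait", "headshot", "selfie"}


def pick_aspect_ratio(prompt: str) -> str:
    """Return a default aspect ratio: 2:3 for portraits, 16:9 for landscapes, else 1:1."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & PORTRAIT_WORDS:
        return "2:3"
    if words & LANDSCAPE_WORDS:
        return "16:9"
    return "1:1"


print(pick_aspect_ratio("Create a portrait of a cat wearing a bow tie in a Victorian setting"))
```

An explicit ratio in the user's request (for example "16:9 banner") should take precedence over any heuristic like this one.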
### 2. Image-to-Image Transformation
Transform or edit existing images using text prompts.
**Example requests:**
- "Make this photo look like a watercolor painting"
- "Transform this image to black and white with high contrast"
- "Turn this photo into an anime-style illustration"
- "Edit this image to make it look like it was taken at golden hour"
**Process:**
1. Ensure the input image is accessible as a URL or a local file path
2. If the input is a local file, pass its path directly; the script uploads it to Replicate's file hosting (see Image URL Requirements below)
3. Extract the transformation instructions from the user's request
4. Execute `scripts/generate_image.py` with both prompt and image input
5. Save the transformed image to the user's working directory
### 3. Multi-Image Input
Use multiple images as references or inputs for generation.
**Example requests:**
- "Combine the style of this painting with the subject of this photo"
- "Generate an image that merges elements from these three images"
- "Create a variation that incorporates aspects from both of these images"
**Process:**
1. Ensure every input image is accessible as a URL or a local file path
2. Pass each image with its own `--image-input` flag; the images are sent to the model together in the `image_input` array parameter
3. Execute `scripts/generate_image.py` with the prompt and all image inputs
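As a rough sketch of what the model input could look like with several reference images: `build_model_input` is a hypothetical helper, and the assumption here is that the prompt is sent under a `prompt` key alongside the `image_input` array named above. See `references/nanobanana-api.md` for the actual parameters.

```python
def build_model_input(prompt, image_urls=()):
    """Assemble a model input dict; image_input is included only when images are given."""
    payload = {"prompt": prompt}  # assumed key name for the text prompt
    if image_urls:
        # All reference images travel together in one array.
        payload["image_input"] = list(image_urls)
    return payload


payload = build_model_input(
    "combine these styles",
    ["https://example.com/img1.jpg", "https://example.com/img2.jpg"],
)
print(payload)
```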
### 4. Aspect Ratio Control
Generate images with specific dimensions for different use cases.
**Available aspect ratios:**
- `1:1` - Square (social media posts, profile pictures)
- `16:9` - Widescreen (presentations, YouTube thumbnails)
- `9:16` - Vertical (mobile stories, TikTok)
- `4:3` - Standard (traditional photos)
- `3:4` - Portrait orientation
- `21:9` - Ultra-wide (cinematic)
- `4:5` - Instagram portrait
- `5:4` - Medium format
- `2:3` - Portrait photos
- `3:2` - Landscape photos
- `match_input_image` - Match input image dimensions (default when image input provided)
**Example requests:**
- "Generate a 16:9 banner image of a forest"
- "Create a square profile picture of a logo"
- "Make a vertical 9:16 image for Instagram stories"
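A requested ratio can be checked against the list above before calling the script. The helper below is a hypothetical sketch, not the script's own validation; the fallback to `1:1` when no image is supplied follows the default suggested earlier for general images.

```python
ASPECT_RATIOS = (
    "1:1", "16:9", "9:16", "4:3", "3:4", "21:9",
    "4:5", "5:4", "2:3", "3:2", "match_input_image",
)


def resolve_aspect_ratio(requested, has_image_input):
    """Validate a requested ratio, or pick a default when none was requested."""
    if requested is None:
        return "match_input_image" if has_image_input else "1:1"
    if requested not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {requested!r}")
    if requested == "match_input_image" and not has_image_input:
        raise ValueError("match_input_image needs at least one input image")
    return requested


print(resolve_aspect_ratio("21:9", has_image_input=False))
```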
## Using the Generation Script
The `scripts/generate_image.py` script handles all Replicate API interactions with proper error handling and polling.
**Basic usage:**
```bash
python scripts/generate_image.py "a sunset over mountains" --output sunset.jpg
```
**With image input:**
```bash
python scripts/generate_image.py "make this look like a watercolor" \
  --image-input https://example.com/photo.jpg \
  --output watercolor.jpg
```
**With multiple images:**
```bash
python scripts/generate_image.py "combine these styles" \
  --image-input https://example.com/img1.jpg \
  --image-input https://example.com/img2.jpg \
  --output combined.jpg
```
**Custom aspect ratio:**
```bash
python scripts/generate_image.py "a wide landscape" \
  --aspect-ratio 21:9 \
  --output landscape.jpg
```
**PNG output:**
```bash
python scripts/generate_image.py "transparent logo concept" \
  --output-format png \
  --output logo.png
```
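When the script is driven from Python rather than a shell, the same invocations can be assembled as an argument list. `build_command` is a hypothetical wrapper; it uses only the flags shown in the examples above and assumes it is run from the skill's root directory.

```python
import sys


def build_command(prompt, output, image_inputs=(), aspect_ratio=None, output_format=None):
    """Build an argv list for scripts/generate_image.py using the documented flags."""
    cmd = [sys.executable, "scripts/generate_image.py", prompt]
    for image in image_inputs:
        # One --image-input flag per image, as in the multi-image example.
        cmd += ["--image-input", image]
    if aspect_ratio:
        cmd += ["--aspect-ratio", aspect_ratio]
    if output_format:
        cmd += ["--output-format", output_format]
    cmd += ["--output", output]
    return cmd


cmd = build_command("a wide landscape", "landscape.jpg", aspect_ratio="21:9")
print(cmd[1:])
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list avoids shell quoting problems in prompts that contain quotes or special characters.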
## Important Implementation Details
### Image URL Requirements
Replicate requires input images to be uploaded to its file hosting service. The script handles this automatically:
1. If the input is a URL from any domain, the script downloads and re-uploads to Replicate
2. If the input is a local file path, the script reads and uploads to Replicate
3. This ensures compatibility with the Nanobanana model's requirements
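The URL-versus-local-path decision described above might look like the following. This is a sketch under assumptions, not the script's actual code: `classify_image_input` is a hypothetical name, and the download and upload steps are left out.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_image_input(value):
    """Return ("url", value) for http(s) inputs, or ("file", absolute_path) for local files."""
    scheme = urlparse(value).scheme.lower()
    if scheme in ("http", "https"):
        return ("url", value)
    # Anything else is treated as a local path and resolved to an absolute path.
    path = Path(value).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"image input not found: {path}")
    return ("file", str(path))


print(classify_image_input("https://example.com/photo.jpg"))
```

Resolving local paths up front also sidesteps the relative-path problems that arise when the script is started from a different working directory.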
### Error Handling
The script includes comprehensive error handling:
- API authentication failures (missing or invalid `REPLICATE_API_KEY`)
- Network timeouts and connection errors
- Invalid image URLs or file paths
- Model execution failures with helpful error messages
### Output Files
Generated images are saved to the specified output path. If no output path is specified, the script saves to `./generated_image_{timestamp}.jpg` in the current directory.
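The default filename can be reproduced as below. The timestamp format is not stated here, so this sketch assumes Unix seconds; `default_output_path` is a hypothetical helper.

```python
import time
from pathlib import Path


def default_output_path(now=None, directory="."):
    """Build generated_image_{timestamp}.jpg, assuming the timestamp is Unix seconds."""
    stamp = int(time.time() if now is None else now)
    return Path(directory) / f"generated_image_{stamp}.jpg"


print(default_output_path(now=1700000000).name)
```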
### Performance Considerations
- Image generation typically takes 5-15 seconds depending on complexity
- The script uses polling with appropriate intervals (configurable)
- Progress indicators show generation status
- Timeout is set to 5 minutes by default to handle complex generations
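A polling loop with a timeout, in the spirit of the points above, could be written as follows. It is a generic sketch rather than the script's implementation: `fetch_status` is a hypothetical callable that returns the current prediction status, the 2-second interval is an assumed value, and the terminal states are Replicate's `succeeded`, `failed`, and `canceled`.

```python
import time

TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def poll_until_done(fetch_status, interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"prediction still {status!r} after {timeout} seconds")
        sleep(interval)


# Simulated run: the status sequence stands in for real API responses.
statuses = iter(["starting", "processing", "succeeded"])
print(poll_until_done(lambda: next(statuses), sleep=lambda seconds: None))
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; the 300-second default mirrors the 5-minute timeout mentioned above.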
## API Reference
For detailed information about the Nanobanana model parameters, capabilities, and best practices, refer to `references/nanobanana-api.md`.
## Environment Setup
Ensure the `REPLICATE_API_KEY` environment variable is set:
```bash
export REPLICATE_API_KEY="your-api-key-here"
```
Get an API key from https://replicate.com/account/api-tokens
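A caller can fail fast when the variable is missing instead of waiting for an authentication error from the API. `require_api_key` is a hypothetical helper; only the variable name comes from this skill.

```python
import os


def require_api_key(env=os.environ):
    """Return the API key from the environment, or raise if it is missing or blank."""
    key = env.get("REPLICATE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("REPLICATE_API_KEY is not set; export it before running the script")
    return key


print(len(require_api_key({"REPLICATE_API_KEY": "your-api-key-here"})))
```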
## Common Pitfalls
1. **Image URLs not publicly accessible**: Ensure input images are either publicly accessible URLs or properly uploaded to Replicate's file hosting
2. **Missing API key**: The script will fail if `REPLICATE_API_KEY` is not set in the environment
3. **Aspect ratio mismatch**: When using `match_input_image`, ensure at least one input image is provided
4. **File path issues**: Always use absolute paths or properly resolve relative paths for local image files
## Best Practices
1. **Descriptive prompts**: More detailed prompts generally produce better results (e.g., "a serene mountain lake at sunset with pine trees in the foreground" vs "a lake")
2. **Appropriate aspect ratios**: Choose aspect ratios that match the intended use case
3. **Image quality**: When using image inputs, higher quality source images generally produce better results
4. **Iterative refinement**: Generate multiple variations by adjusting prompts to find the best result
5. **Output format**: Use PNG for images requiring transparency, JPG for photographs and general images (smaller file size)