groq-api

Included with Lifetime

$97 forever

Groq API integration for building AI-powered applications with ultra-fast LLM inference. Use when working with Groq's Chat Completions API, Python SDK (groq), TypeScript SDK (groq-sdk), tool use/function calling, vision/image processing, audio transcription with Whisper, streaming responses, text-to-speech, content moderation with Llama Guard, batch processing, or any Groq API integration task. Triggers on mentions of Groq, GroqCloud, or fast LLM inference needs.

Image & Video

What this skill does


# Groq API

Build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec).

## Quick Start

### Installation

```bash
# Python
pip install groq

# TypeScript/JavaScript
npm install groq-sdk
```

### Environment Setup

```bash
export GROQ_API_KEY=<your-api-key>
```

### Basic Chat Completion

**Python:**
```python
from groq import Groq

client = Groq()  # Uses GROQ_API_KEY env var

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```

**TypeScript:**
```typescript
import Groq from "groq-sdk";

const client = new Groq();

const response = await client.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);
```

## Model Selection

| Use Case | Model | Notes |
|----------|-------|-------|
| Fast + cheap | `llama-3.1-8b-instant` | Best for simple tasks |
| Balanced | `llama-3.3-70b-versatile` | Quality/cost balance |
| Highest quality | `openai/gpt-oss-120b` | Built-in tools + reasoning |
| Agentic | `groq/compound` | Web search + code exec |
| Reasoning | `openai/gpt-oss-20b` | Fast reasoning (low/med/high) |
| Vision/OCR | `llama-4-scout-17b-16e-instruct` | Image understanding |
| Audio STT | `whisper-large-v3-turbo` | Transcription |
| TTS | `playai-tts` | Text-to-speech |

See [references/models.md](references/models.md) for full model list and pricing.

## Common Patterns

### Streaming Responses

```python
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### System Messages

```python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"}
    ]
)
```

### Async Client (Python)

```python
import asyncio
from groq import AsyncGroq

async def main():
    client = AsyncGroq()
    response = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello"}]
    )
    return response.choices[0].message.content

print(asyncio.run(main()))
```

### JSON Mode

```python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "List 3 colors as JSON array"}],
    response_format={"type": "json_object"}
)
```

### Structured Outputs (JSON Schema)

Force output to match a schema. Two modes available:

| Mode | Guarantee | Models |
|------|-----------|--------|
| `strict: true` | 100% schema compliance | `openai/gpt-oss-20b`, `openai/gpt-oss-120b` |
| `strict: false` | Best-effort compliance | All supported models |

**Strict Mode (guaranteed compliance):**
```python
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30 years old"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"],
                "additionalProperties": False
            }
        }
    }
)
```

**With Pydantic:**
```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": Person.model_json_schema()
        }
    }
)
person = Person.model_validate(json.loads(response.choices[0].message.content))
```

See [references/structured-outputs.md](references/structured-outputs.md) for schema requirements, validation libraries, and examples.

## Audio

### Transcription (Speech-to-Text)

```python
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
        language="en",  # Optional: ISO-639-1 code
        response_format="verbose_json",  # json, text, verbose_json
        timestamp_granularities=["word", "segment"]
    )
print(transcription.text)
```

### Translation (to English)

```python
with open("french_audio.mp3", "rb") as f:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=f
    )
print(translation.text)  # English text
```

### Text-to-Speech

```python
response = client.audio.speech.create(
    model="playai-tts",
    input="Hello, world!",
    voice="Fritz-PlayAI",
    response_format="wav",  # flac, mp3, mulaw, ogg, wav
    speed=1.0  # 0.5 to 5
)
response.write_to_file("output.wav")
```

## Vision

Process images with Llama 4 multimodal models. Supports up to 5 images per request.

**Models:** `meta-llama/llama-4-scout-17b-16e-instruct` (faster), `meta-llama/llama-4-maverick-17b-128e-instruct` (higher quality)

### Image from URL

```python
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)
```

### Local Image (Base64)

```python
import base64

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encode_image('photo.jpg')}"}}
        ]
    }]
)
```

### OCR / Extract Data as JSON

```python
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text and data as JSON"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
        ]
    }],
    response_format={"type": "json_object"}
)
```

See [references/vision.md](references/vision.md) for multi-image, tool use with images, and multi-turn conversations.

## Tool Use

For tool calling patterns and examples, see [references/tool-use.md](references/tool-use.md).

**Quick example:**
```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    for tc in response.choices[0].message.tool_calls:
        args = json.loads(tc.function.arguments)
        # Execute function and continue conversation
```

## Built-In Tools (Agentic)

Use `groq/compound` or `openai/gpt-oss-120b` for built-in web search and code execution:

```python
response = client.chat.completions.create(
    model="groq/compou

Files: 13

Size: 106.6 KB

Complexity: 59/100

Category: Image & Video

Source: https://github.com/diskd-ai/groq-api

Related in Image & Video

watch

Included

Watch a video (URL or local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or Whisper API fallback), and hands the result to Claude so it can answer questions about what's in the video.

Image & Videoscriptsfeatured

physical-ai-defect-image-generation

Included

Use when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path performs inference and labeling on real images. This skill helps with first-time asset setup, creation of finetuning checkpoints, and configuring deployment. Trigger keywords: defect image generation, dig workflow, dig pipeline, defect image detection workflow, aoi pipeline, aoi anomalygen, usd2roi anomalygen, day 0 pcba, day 1 pcba, day 1 real-photo alignment, day 1 manual roi, metal surface anomaly, glass defect, anomalygen finetune, setup_pcb, setup_metal, setup_glass, setup_pretrained, dig setup, dig datasets, dig pretrained checkpoint, dig image-edit endpoint.

Image & Videoscripts

accelint-react-best-practices

Included

React performance optimization and best practices. ALWAYS use this skill when working with any React code - writing components, hooks, JSX; refactoring; optimizing re-renders, memoization, state management; reviewing for performance; fixing hydration mismatches; debugging infinite re-renders, stale closures, input focus loss, animations restarting; preventing remounting; implementing transitions, lazy initialization, effect dependencies. Even simple React tasks benefit from these patterns. Covers React 19+ (useEffectEvent, Activity, ref props). Triggers - useEffect, useState, useMemo, useCallback, memo, inline components, nested components, components inside components, re-render, performance, hydration, SSR, Next.js, useDeferredValue, combined hooks.

Image & Videoscripts

elevenlabs-agents

Included

Build conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication

Image & Videoscripts

humanizer

Included

Humanize AI-generated text by detecting and removing patterns typical of LLM output. Rewrites text to sound natural, specific, and human. Uses 28 pattern detectors, 560+ AI vocabulary terms across 3 tiers, and statistical analysis (burstiness, type-token ratio, readability) for comprehensive detection. Use when asked to humanize text, de-AI writing, make content sound more natural/human, review writing for AI patterns, score text for AI detection, or improve AI-generated drafts. Covers content, language, style, communication, and filler categories.

Image & Videoscripts

generating-mermaid-diagrams

Included

Salesforce architecture diagrams using Mermaid with ASCII fallback. Use this skill when generating text-based diagrams for Salesforce architecture, OAuth flows, ERDs, integration sequences, or Agentforce structure. TRIGGER when: user says "diagram", "visualize", "ERD", or asks for sequence diagrams, flowcharts, class diagrams, or architecture visualizations in Mermaid. DO NOT TRIGGER when: user wants PNG/SVG image output (use generating-visual-diagrams), or asks about non-Salesforce systems.

Image & Videoscripts