groq-api

Groq API integration for building AI-powered applications with ultra-fast LLM inference. Use when working with Groq's Chat Completions API, Python SDK (groq), TypeScript SDK (groq-sdk), tool use/function calling, vision/image processing, audio transcription with Whisper, streaming responses, text-to-speech, content moderation with Llama Guard, batch processing, or any Groq API integration task. Triggers on mentions of Groq, GroqCloud, or fast LLM inference needs.

# Groq API

Build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec).

## Quick Start

### Installation

```bash
# Python
pip install groq

# TypeScript/JavaScript
npm install groq-sdk
```

### Environment Setup

```bash
export GROQ_API_KEY=<your-api-key>
```
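
A minimal sketch of failing fast when the key is missing, so a misconfigured environment surfaces before the first request. `require_api_key` is a hypothetical helper for illustration, not part of the Groq SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GROQ_API_KEY or raise a clear error.

    Hypothetical helper; not part of the Groq SDK.
    """
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; export it before creating a client")
    return key
```

One way to use it is `Groq(api_key=require_api_key())`, which raises a readable error at startup instead of on the first API call.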

### Basic Chat Completion

**Python:**
```python
from groq import Groq

client = Groq()  # Uses GROQ_API_KEY env var

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```

**TypeScript:**
```typescript
import Groq from "groq-sdk";

const client = new Groq();

const response = await client.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);
```

## Model Selection

| Use Case | Model | Notes |
|----------|-------|-------|
| Fast + cheap | `llama-3.1-8b-instant` | Best for simple tasks |
| Balanced | `llama-3.3-70b-versatile` | Quality/cost balance |
| Highest quality | `openai/gpt-oss-120b` | Built-in tools + reasoning |
| Agentic | `groq/compound` | Web search + code exec |
| Reasoning | `openai/gpt-oss-20b` | Fast reasoning (low/med/high) |
| Vision/OCR | `meta-llama/llama-4-scout-17b-16e-instruct` | Image understanding |
| Audio STT | `whisper-large-v3-turbo` | Transcription |
| TTS | `playai-tts` | Text-to-speech |

See [references/models.md](references/models.md) for full model list and pricing.
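
The table can be kept in one place in code, which avoids scattering model IDs across call sites. The sketch below is only an illustration: the use-case keys and `pick_model` are hypothetical names, the model IDs are copied from this document, and the vision ID uses the full `meta-llama/` form from the Vision section.

```python
# Model IDs copied from the table above; the dictionary keys are hypothetical labels.
MODEL_BY_USE_CASE = {
    "fast": "llama-3.1-8b-instant",
    "balanced": "llama-3.3-70b-versatile",
    "quality": "openai/gpt-oss-120b",
    "agentic": "groq/compound",
    "reasoning": "openai/gpt-oss-20b",
    "vision": "meta-llama/llama-4-scout-17b-16e-instruct",
    "stt": "whisper-large-v3-turbo",
    "tts": "playai-tts",
}


def pick_model(use_case: str, default: str = "balanced") -> str:
    """Return the model ID for a use case, falling back to the default entry."""
    return MODEL_BY_USE_CASE.get(use_case, MODEL_BY_USE_CASE[default])
```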

## Common Patterns

### Streaming Responses

```python
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        # flush=True makes each token appear immediately instead of waiting for a newline
        print(chunk.choices[0].delta.content, end="", flush=True)
```
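
When the full text is needed after streaming (for logging or for appending to the conversation), the deltas can be accumulated while they are printed. This is a sketch under the assumption that chunks have the `choices[0].delta.content` shape used above; `collect_stream` is a hypothetical helper name.

```python
from typing import Iterable


def collect_stream(stream: Iterable, echo: bool = True) -> str:
    """Consume a streaming response and return the concatenated text."""
    parts = []
    for chunk in stream:
        # Skip chunks that carry no choices (defensive; some chunks may be metadata only).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    return "".join(parts)
```

Passing the `stream` object from the example above, as in `text = collect_stream(stream)`, prints tokens as they arrive and returns the complete reply.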

### System Messages

```python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"}
    ]
)
```

### Async Client (Python)

```python
import asyncio
from groq import AsyncGroq

async def main():
    client = AsyncGroq()
    response = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello"}]
    )
    return response.choices[0].message.content

print(asyncio.run(main()))
```
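
The async client is most useful when several requests run concurrently. The following sketch fans out prompts with `asyncio.gather` and caps concurrency with a semaphore; `complete_many` and `max_concurrency` are hypothetical names, and the client is passed in so that any object with the same `chat.completions.create` coroutine works.

```python
import asyncio
from typing import List


async def complete_many(
    client,
    prompts: List[str],
    model: str = "llama-3.3-70b-versatile",
    max_concurrency: int = 4,
) -> List[str]:
    """Send one request per prompt, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> str:
        async with semaphore:
            response = await client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

    # gather returns results in the same order as the prompts
    return await asyncio.gather(*(one(p) for p in prompts))
```

With the real SDK this would be called as `asyncio.run(complete_many(AsyncGroq(), ["Hello", "Hi"]))`. A suitable concurrency cap depends on the account's rate limits, which this sketch does not inspect.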

### JSON Mode

```python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
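    # json_object mode returns a JSON object, so ask for the array under a key
    messages=[{"role": "user", "content": "List 3 colors as a JSON object with a 'colors' array"}],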
    response_format={"type": "json_object"}
)
```
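
The reply still arrives as a string in `message.content`, so it has to be parsed before use. A minimal sketch of that step, with `parse_json_content` as a hypothetical helper that turns empty or malformed output into a single error type:

```python
import json
from typing import Any, Optional


def parse_json_content(content: Optional[str]) -> Any:
    """Parse the message content of a JSON-mode response."""
    if not content:
        raise ValueError("response had no content to parse")
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model returned invalid JSON: {exc}") from exc
```

For the request above, `data = parse_json_content(response.choices[0].message.content)` yields a regular Python object.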

### Structured Outputs (JSON Schema)

Force output to match a schema. Two modes available:

| Mode | Guarantee | Models |
|------|-----------|--------|
| `strict: true` | 100% schema compliance | `openai/gpt-oss-20b`, `openai/gpt-oss-120b` |
| `strict: false` | Best-effort compliance | All supported models |

**Strict Mode (guaranteed compliance):**
```python
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30 years old"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"],
                "additionalProperties": False
            }
        }
    }
)
```
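
Since strict mode is limited to the two models in the table, one option is to derive the `strict` flag from the model name rather than hard-coding it. This is a sketch only: `STRICT_MODELS` mirrors the table above and would need updating if the supported list changes, and `build_response_format` is a hypothetical helper.

```python
from typing import Any, Dict

# Models listed in the table above as supporting strict mode.
STRICT_MODELS = {"openai/gpt-oss-20b", "openai/gpt-oss-120b"}


def build_response_format(name: str, schema: Dict[str, Any], model: str) -> Dict[str, Any]:
    """Build a json_schema response_format, enabling strict only where the table allows it."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": model in STRICT_MODELS,
            "schema": schema,
        },
    }
```

The result can be passed directly as `response_format=build_response_format("person", schema, model)`.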

**With Pydantic:**
```python
import json

from pydantic import BaseModel

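class Person(BaseModel):
    # extra="forbid" makes the generated schema include "additionalProperties": false,
    # matching the hand-written strict schema above
    model_config = {"extra": "forbid"}

    name: str
    age: int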

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": Person.model_json_schema()
        }
    }
)
person = Person.model_validate(json.loads(response.choices[0].message.content))
```

See [references/structured-outputs.md](references/structured-outputs.md) for schema requirements, validation libraries, and examples.

## Audio

### Transcription (Speech-to-Text)

```python
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
        language="en",  # Optional: ISO-639-1 code
        response_format="verbose_json",  # json, text, verbose_json
        timestamp_granularities=["word", "segment"]
    )
print(transcription.text)
```
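
With `verbose_json` and segment timestamps, the result can be turned into readable, timestamped lines. The sketch below assumes the payload follows the Whisper `verbose_json` layout, with a `segments` list whose entries carry `start`, `end` (seconds) and `text`; how the SDK object exposes that list is not shown in this document, so the function takes plain mappings. Both helper names are hypothetical.

```python
from typing import Iterable, List, Mapping


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_lines(segments: Iterable[Mapping]) -> List[str]:
    """Render segments (assumed keys: start, end, text) as timestamped lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]
```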

### Translation (to English)

```python
with open("french_audio.mp3", "rb") as f:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=f
    )
print(translation.text)  # English text
```

### Text-to-Speech

```python
response = client.audio.speech.create(
    model="playai-tts",
    input="Hello, world!",
    voice="Fritz-PlayAI",
    response_format="wav",  # flac, mp3, mulaw, ogg, wav
    speed=1.0  # 0.5 to 5
)
response.write_to_file("output.wav")
```

## Vision

Process images with Llama 4 multimodal models. Supports up to 5 images per request.

**Models:** `meta-llama/llama-4-scout-17b-16e-instruct` (faster), `meta-llama/llama-4-maverick-17b-128e-instruct` (higher quality)

### Image from URL

```python
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)
```
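
The same content list can carry several images. A small sketch that builds such a message and enforces the 5-image limit stated above before the request is sent; `build_vision_message` is a hypothetical helper.

```python
from typing import Any, Dict, List

MAX_IMAGES_PER_REQUEST = 5  # limit stated in the Vision section above


def build_vision_message(prompt: str, image_urls: List[str]) -> Dict[str, Any]:
    """Build one user message containing a text prompt and one or more images."""
    if not image_urls:
        raise ValueError("at least one image URL is required")
    if len(image_urls) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"got {len(image_urls)} images; at most {MAX_IMAGES_PER_REQUEST} are supported per request"
        )
    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    return {"role": "user", "content": content}
```

The returned dictionary goes into `messages=[...]` exactly like the hand-written message above.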

### Local Image (Base64)

```python
import base64

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encode_image('photo.jpg')}"}}
        ]
    }]
)
```
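
The example above hard-codes `image/jpeg`. Where inputs are a mix of formats, the MIME type can be inferred from the file name with the standard library. This is a sketch, and `to_data_url` is a hypothetical helper; whether a given image format is accepted by the API is not checked here.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Encode image bytes as a data URL, inferring the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The resulting string can be used as the `url` value of an `image_url` content part.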

### OCR / Extract Data as JSON

```python
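# base64_image is assumed to hold a base64-encoded PNG string,
# for example the output of an encode_image-style helper
response = client.chat.completions.create(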
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text and data as JSON"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
        ]
    }],
    response_format={"type": "json_object"}
)
```

See [references/vision.md](references/vision.md) for multi-image, tool use with images, and multi-turn conversations.

## Tool Use

For tool calling patterns and examples, see [references/tool-use.md](references/tool-use.md).

**Quick example:**
```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    for tc in response.choices[0].message.tool_calls:
        args = json.loads(tc.function.arguments)
        # Execute function and continue conversation
```
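
The "execute function and continue conversation" step can be sketched as follows, assuming the OpenAI-compatible convention of answering each tool call with a `role: "tool"` message that echoes its `tool_call_id`. `get_weather` here is a stand-in implementation, and `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(location: str) -> Dict[str, Any]:
    """Stand-in implementation; a real one would query a weather service."""
    return {"location": location, "forecast": "unknown"}


# Maps tool names (as declared in `tools`) to local Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls, registry=TOOL_REGISTRY) -> List[Dict[str, str]]:
    """Execute each requested tool and return the matching tool messages."""
    results = []
    for tc in tool_calls:
        fn = registry.get(tc.function.name)
        if fn is None:
            # Report unknown tools back to the model instead of raising.
            output = {"error": f"unknown tool {tc.function.name}"}
        else:
            output = fn(**json.loads(tc.function.arguments))
        results.append(
            {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(output)}
        )
    return results
```

To continue the conversation, append the assistant message (`response.choices[0].message`) and then the returned tool messages to `messages`, and call `client.chat.completions.create` again with the same `tools`.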

## Built-In Tools (Agentic)

Use `groq/compound` or `openai/gpt-oss-120b` for built-in web search and code execution:

```python
response = client.chat.completions.create(
    model="groq/compou