groq-api
Groq API integration for building AI-powered applications with ultra-fast LLM inference. Use when working with Groq's Chat Completions API, Python SDK (groq), TypeScript SDK (groq-sdk), tool use/function calling, vision/image processing, audio transcription with Whisper, streaming responses, text-to-speech, content moderation with Llama Guard, batch processing, or any Groq API integration task. Triggers on mentions of Groq, GroqCloud, or fast LLM inference needs.
What this skill does
# Groq API
Build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec).
## Quick Start
### Installation
```bash
# Python
pip install groq
# TypeScript/JavaScript
npm install groq-sdk
```
### Environment Setup
```bash
export GROQ_API_KEY=<your-api-key>
```
### Basic Chat Completion
**Python:**
```python
from groq import Groq

client = Groq()  # Uses the GROQ_API_KEY env var

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
**TypeScript:**
```typescript
import Groq from "groq-sdk";

const client = new Groq(); // Uses the GROQ_API_KEY env var

const response = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);
```
## Model Selection
| Use Case | Model | Notes |
|----------|-------|-------|
| Fast + cheap | `llama-3.1-8b-instant` | Best for simple tasks |
| Balanced | `llama-3.3-70b-versatile` | Quality/cost balance |
| Highest quality | `openai/gpt-oss-120b` | Built-in tools + reasoning |
| Agentic | `groq/compound` | Web search + code exec |
| Reasoning | `openai/gpt-oss-20b` | Fast reasoning (low/med/high) |
| Vision/OCR | `meta-llama/llama-4-scout-17b-16e-instruct` | Image understanding |
| Audio STT | `whisper-large-v3-turbo` | Transcription |
| TTS | `playai-tts` | Text-to-speech |
See [references/models.md](references/models.md) for full model list and pricing.
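
The table above can be kept in code as a small lookup, so model IDs live in one place. This is only a sketch: the dictionary keys and the `pick_model` helper are hypothetical names chosen for illustration, and the vision ID uses the `meta-llama/` prefix shown in the Vision section.

```python
# Hypothetical lookup mirroring the Model Selection table; keys are illustrative.
MODEL_BY_USE_CASE = {
    "fast": "llama-3.1-8b-instant",
    "balanced": "llama-3.3-70b-versatile",
    "quality": "openai/gpt-oss-120b",
    "agentic": "groq/compound",
    "reasoning": "openai/gpt-oss-20b",
    "vision": "meta-llama/llama-4-scout-17b-16e-instruct",
    "stt": "whisper-large-v3-turbo",
    "tts": "playai-tts",
}


def pick_model(use_case: str, default: str = "balanced") -> str:
    """Return the model ID for a use case, falling back to the default entry."""
    return MODEL_BY_USE_CASE.get(use_case, MODEL_BY_USE_CASE[default])
```

Calling `pick_model("fast")` returns the 8B instant model, and an unrecognized use case falls back to the balanced entry.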
## Common Patterns
### Streaming Responses
```python
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
# Each chunk carries an incremental delta; content may be None on some chunks
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
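
When the full text is needed after streaming (for logging or for appending to the conversation history), the deltas can be accumulated instead of printed. The sketch below assumes chunks expose `choices[0].delta.content` as in the loop above; `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a real stream so the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Concatenate delta.content from each chunk, skipping empty deltas."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # defensive: skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    # Stand-in with the same attribute path as a streamed chunk (illustration only).
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


demo_stream = [fake_chunk("Once "), fake_chunk(None), fake_chunk("upon a time")]
full_text = collect_stream(demo_stream)
```

With a real client, passing the `stream` object from the example above to `collect_stream` should behave the same way.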
### System Messages
```python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
)
```
### Async Client (Python)
```python
import asyncio

from groq import AsyncGroq


async def main():
    client = AsyncGroq()  # Uses the GROQ_API_KEY env var
    response = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello"}],
    )
    return response.choices[0].message.content


print(asyncio.run(main()))
```
### JSON Mode
```python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "List 3 colors as JSON array"}],
    response_format={"type": "json_object"},
)
```
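
The JSON Mode request above returns its result as a string in `message.content`, so it still has to be parsed. A minimal sketch of defensive parsing follows; `parse_json_content` is a hypothetical helper name, not part of the SDK.

```python
import json


def parse_json_content(content):
    """Parse a JSON-mode reply; return None instead of raising on bad input."""
    if not content:
        return None
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return None
```

Typical use would be `data = parse_json_content(response.choices[0].message.content)`, followed by a check for `None` before reading any keys.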
### Structured Outputs (JSON Schema)
Force output to match a schema. Two modes available:
| Mode | Guarantee | Models |
|------|-----------|--------|
| `strict: true` | 100% schema compliance | `openai/gpt-oss-20b`, `openai/gpt-oss-120b` |
| `strict: false` | Best-effort compliance | All supported models |
**Strict Mode (guaranteed compliance):**
```python
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30 years old"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
        },
    },
)
```
**With Pydantic:**
```python
import json

from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int


response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": Person.model_json_schema(),
        },
    },
)
# Parse the JSON string, then validate it against the Pydantic model
person = Person.model_validate(json.loads(response.choices[0].message.content))
```
See [references/structured-outputs.md](references/structured-outputs.md) for schema requirements, validation libraries, and examples.
## Audio
### Transcription (Speech-to-Text)
```python
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
        language="en",  # Optional: ISO-639-1 code
        response_format="verbose_json",  # json, text, verbose_json
        timestamp_granularities=["word", "segment"],
    )
print(transcription.text)
```
### Translation (to English)
```python
with open("french_audio.mp3", "rb") as f:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=f,
    )
print(translation.text)  # English text
```
### Text-to-Speech
```python
response = client.audio.speech.create(
    model="playai-tts",
    input="Hello, world!",
    voice="Fritz-PlayAI",
    response_format="wav",  # flac, mp3, mulaw, ogg, wav
    speed=1.0,  # 0.5 to 5
)
response.write_to_file("output.wav")
```
## Vision
Process images with Llama 4 multimodal models. Supports up to 5 images per request.
**Models:** `meta-llama/llama-4-scout-17b-16e-instruct` (faster), `meta-llama/llama-4-maverick-17b-128e-instruct` (higher quality)
### Image from URL
```python
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }],
)
```
### Local Image (Base64)
```python
import base64


def encode_image(path: str) -> str:
    """Read a local file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encode_image('photo.jpg')}"}},
        ],
    }],
)
```
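
Since a request accepts at most 5 images, it can help to build the user message through a helper that enforces the limit before the call is made. The sketch below is an illustration only: `build_image_message` and `MAX_IMAGES` are hypothetical names, and each entry may be an HTTPS URL or a `data:` URL built as in the example above.

```python
from typing import Sequence

MAX_IMAGES = 5  # per-request limit stated at the top of this section


def build_image_message(prompt: str, image_urls: Sequence[str]) -> dict:
    """Build one user message with a text part followed by image parts."""
    if not image_urls:
        raise ValueError("at least one image URL is required")
    if len(image_urls) > MAX_IMAGES:
        raise ValueError(f"got {len(image_urls)} images, limit is {MAX_IMAGES}")
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

The returned dict can be passed as the single element of `messages` in a vision request.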
### OCR / Extract Data as JSON
```python
# base64_image: a base64-encoded PNG string, e.g. produced by encode_image()
# from the Local Image example
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text and data as JSON"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
        ],
    }],
    response_format={"type": "json_object"},
)
```
See [references/vision.md](references/vision.md) for multi-image, tool use with images, and multi-turn conversations.
## Tool Use
For tool calling patterns and examples, see [references/tool-use.md](references/tool-use.md).
**Quick example:**
```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools,
)

if response.choices[0].message.tool_calls:
    for tc in response.choices[0].message.tool_calls:
        args = json.loads(tc.function.arguments)
        # Execute function and continue conversation
```
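
The "execute function and continue conversation" step can be sketched as a dispatcher that maps tool names to local functions and returns one `role="tool"` message per call. This assumes the OpenAI-compatible tool message shape (`tool_call_id` plus `content`). `TOOL_REGISTRY`, `run_tool_calls`, and the stubbed `get_weather` are hypothetical, and the fake tool call only mimics the attributes used in the loop above.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    # Hypothetical stub; a real implementation would query a weather service.
    return json.dumps({"location": location, "forecast": "unknown"})


# Hypothetical registry mapping tool names to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list:
    """Execute each requested tool and return role="tool" messages."""
    results = []
    for tc in tool_calls:
        fn = TOOL_REGISTRY.get(tc.function.name)
        if fn is None:
            output = json.dumps({"error": f"unknown tool {tc.function.name}"})
        else:
            args = json.loads(tc.function.arguments)
            output = fn(**args)
        results.append(
            {"role": "tool", "tool_call_id": tc.id, "content": output}
        )
    return results


# Stand-in for an entry of response.choices[0].message.tool_calls.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
tool_messages = run_tool_calls([fake_call])
```

To continue the conversation, one would typically append the assistant message that requested the tools, then these tool messages, to `messages` and call `client.chat.completions.create` again.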
## Built-In Tools (Agentic)
Use `groq/compound` or `openai/gpt-oss-120b` for built-in web search and code execution:
```python
response = client.chat.completions.create(
    model="groq/compou