
Use this skill when the user wants to connect to Ollama or use it in any form inside their project. It guides users through integrating Ollama for local AI inference, covering installation, connection setup, model management, and API usage in both Python and Node.js, and helps with text generation, chat interfaces, embeddings, streaming responses, and building AI-powered applications on local LLMs.



# Ollama

## Overview

This skill helps users integrate Ollama into their projects for running large language models locally. The skill guides users through setup, connection validation, model management, and API integration for both Python and Node.js applications. Ollama provides a simple API for running models like Llama, Mistral, Gemma, and others locally without cloud dependencies.

## When to Use This Skill

Use this skill when users want to:
- Run large language models locally on their machine
- Build AI-powered applications without cloud dependencies
- Implement text generation, chat, or embeddings functionality
- Stream LLM responses in real-time
- Create RAG (Retrieval-Augmented Generation) systems
- Integrate local AI capabilities into Python or Node.js projects
- Manage Ollama models (pull, list, delete)
- Validate Ollama connectivity and troubleshoot connection issues

## Installation and Setup

### Step 1: Collect Ollama URL

**IMPORTANT**: Always ask users for their Ollama URL. Do not assume it's running locally.

Ask the user: "What is your Ollama server URL?"

Common scenarios:
- **Local installation**: `http://localhost:11434` (default)
- **Remote server**: `http://192.168.1.100:11434`
- **Custom port**: `http://localhost:8080`
- **Docker**: `http://localhost:11434` (if port mapped to 11434)

If the user says they're running Ollama locally or doesn't know the URL, suggest trying `http://localhost:11434`.

### Step 2: Check if Ollama is Installed

Before proceeding, verify that Ollama is installed and running at the provided URL. Users can check by opening the URL in a browser (a running server replies with "Ollama is running") or by running:

```bash
curl <OLLAMA_URL>/api/version
```
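The same check can be scripted with Python's standard library. This is a minimal sketch: it assumes `/api/version` answers with JSON of the form `{"version": "..."}`, and the helper name `get_ollama_version` is hypothetical rather than part of any Ollama package.

```python
import json
import urllib.request
from typing import Optional


def get_ollama_version(base_url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the Ollama server version, or None if the server cannot be reached."""
    url = base_url.rstrip("/") + "/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return None
```

For example, `get_ollama_version("http://192.168.1.100:11434")` returns the version string when the server is up and `None` otherwise, which is enough to decide whether to walk the user through installation.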

If Ollama is not installed, guide users to install it:

**Linux:**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

**macOS/Windows:**
Download from https://ollama.com/download

**Docker:**
```bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

### Step 3: Start Ollama Service

Ensure Ollama is running:

**macOS/Linux:**
```bash
ollama serve
```

**Docker:**
```bash
docker start ollama
```

By default, the service listens on `http://localhost:11434`.

### Step 4: Validate Connection

Use the validation script to test connectivity and list available models.

**IMPORTANT**: The script path is relative to the skill directory. When running the script, either:
1. Use the script's full path (e.g., `/path/to/ollama/scripts/validate_connection.py`)
2. Change to the skill directory first and then run `python scripts/validate_connection.py`

```bash
# Run from the skill directory
cd /path/to/ollama
python scripts/validate_connection.py <OLLAMA_URL>
```

Example with the user's Ollama URL:
```bash
cd /path/to/ollama
python scripts/validate_connection.py http://192.168.1.100:11434
```

The script will:
- Normalize the URL (remove any path components)
- Check if Ollama is accessible
- Display the Ollama version
- List all installed models with sizes
- Provide troubleshooting guidance if connection fails
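The URL normalization step can be approximated as below. This sketch is an assumption about the behavior, not the script's actual code: it falls back to the default local URL on empty input, assumes `http://` when no scheme is given, and keeps only the scheme, host and port. `normalize_ollama_url` is a hypothetical name.

```python
from urllib.parse import urlsplit

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def normalize_ollama_url(raw: str) -> str:
    """Reduce user input to scheme://host[:port], dropping any path, query or fragment."""
    raw = raw.strip()
    if not raw:
        # Nothing provided: suggest the default local installation
        return DEFAULT_OLLAMA_URL
    if "://" not in raw:
        # Bare "host:port" input: assume plain HTTP
        raw = "http://" + raw
    parts = urlsplit(raw)
    return f"{parts.scheme}://{parts.netloc}"
```

With this, `192.168.1.100:11434` and `http://localhost:11434/api/tags` both become a clean base URL that API paths can be appended to.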

**Success output:**
```
✓ Connection successful!
  URL: http://localhost:11434
  Version: Ollama 0.1.0
  Models available: 2

Installed models:
  - llama3.2 (2.0 GB)
  - mistral (4.1 GB)
```

**Failure output:**
```
✗ Connection failed: Connection refused
  URL: http://localhost:11434

Troubleshooting:
  1. Ensure Ollama is installed and running
  2. Check that the URL is correct
  3. Verify Ollama is accessible at the specified URL
  4. Try: curl http://localhost:11434/api/version
```

## Model Management

### Pulling Models

Help users download models from the Ollama library. Common models include:

- `llama3.2` - Meta's Llama 3.2 (various sizes: 1B, 3B)
- `llama3.1` - Meta's Llama 3.1 (8B, 70B, 405B)
- `mistral` - Mistral 7B
- `phi3` - Microsoft Phi-3
- `gemma2` - Google Gemma 2

Users can pull models using:
```bash
ollama pull llama3.2
```

Or programmatically using the API (examples in reference docs).
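A programmatic pull can also be sketched with the standard library. It assumes a recent Ollama version whose `POST /api/pull` accepts `{"model": ..., "stream": false}` and, with streaming disabled, replies once with `{"status": "success"}` when the download completes; both helper names are hypothetical. Building the request separately keeps it easy to inspect or test without a running server.

```python
import json
import urllib.request


def build_pull_request(base_url: str, model: str) -> urllib.request.Request:
    """Build the POST /api/pull request for `model` with progress streaming turned off."""
    body = json.dumps({"model": model, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pull_model(base_url: str, model: str, timeout: float = 3600.0) -> str:
    """Download `model` and return the final status string reported by the server."""
    # With stream=False nothing is sent until the download finishes, so the socket
    # timeout has to be generous for multi-gigabyte models.
    with urllib.request.urlopen(build_pull_request(base_url, model), timeout=timeout) as resp:
        return json.load(resp)["status"]
```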

### Listing Models

Guide users to list installed models:
```bash
ollama list
```

Or use the validation script to see models with detailed information.
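From code, the installed models come from `GET /api/tags`, which returns a `models` array whose entries include `name` and `size` (in bytes). The sketch below fetches that list and formats it like the validation script's output; the helper names are hypothetical and the exact output format is an assumption.

```python
import json
import urllib.request


def list_models(base_url: str, timeout: float = 10.0) -> list:
    """Return the raw model entries reported by GET /api/tags."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/tags", timeout=timeout) as resp:
        return json.load(resp).get("models", [])


def format_models(models: list) -> list:
    """Render each model as '  - name (X.Y GB)' using decimal gigabytes."""
    lines = []
    for m in models:
        size_gb = m.get("size", 0) / 1e9  # /api/tags reports size in bytes
        lines.append(f"  - {m['name']} ({size_gb:.1f} GB)")
    return lines
```

Usage: `print("\n".join(format_models(list_models("http://localhost:11434"))))`.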

### Removing Models

Help users delete models to free space:
```bash
ollama rm llama3.2
```
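The API equivalent is `DELETE /api/delete` with a JSON body naming the model; Ollama answers 200 on success and 404 if the model does not exist. A minimal sketch, with hypothetical helper names:

```python
import json
import urllib.request


def build_delete_request(base_url: str, model: str) -> urllib.request.Request:
    """Build the DELETE /api/delete request that removes `model`."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/delete",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


def delete_model(base_url: str, model: str, timeout: float = 30.0) -> None:
    """Remove `model`; urlopen raises HTTPError (e.g. 404) if the model is not installed."""
    with urllib.request.urlopen(build_delete_request(base_url, model), timeout=timeout):
        pass  # a 200 response with no useful body means the model was deleted
```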

### Model Selection Guidance

Help users choose appropriate models based on their needs:

- **Small models (1-3B)**: Fast, good for simple tasks, lower resource requirements
- **Medium models (7-13B)**: Balanced performance and quality
- **Large models (70B+)**: Best quality, require significant resources
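A back-of-the-envelope way to match these tiers to the user's hardware is to estimate the memory the weights alone need: parameters × bits per weight ÷ 8 bytes. This is a rough approximation rather than an Ollama formula; it ignores the context (KV) cache and runtime overhead, and it assumes 4-bit quantization, which is common for the default tags in the Ollama library. `estimate_weight_gb` is a hypothetical helper.

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the model weights in decimal GB (1e9 bytes)."""
    # 1e9 parameters x (bits / 8) bytes per parameter = (bits / 8) GB per billion parameters
    return params_billions * bits_per_weight / 8


for size in (1, 3, 7, 13, 70):
    print(f"{size:>3}B at 4-bit ~ {estimate_weight_gb(size):.1f} GB of weights")
```

So a 7B model needs roughly 3.5 GB for its weights at 4-bit, while a 70B model needs about 35 GB, which is why the large tier calls for significant resources.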

## Implementation Guidance

### Python Projects

For Python-based projects, refer to the Python API reference:

- **File**: `references/python_api.md`
- **Usage**: Load this reference when implementing Python integrations
- **Contains**:
  - REST API examples using `urllib.request` (standard library)
  - Text generation with the Generate API
  - Conversational interfaces with the Chat API
  - **Streaming responses for real-time output (RECOMMENDED)**
  - Embeddings for semantic search
  - Complete RAG system example
  - Error handling patterns
  - PEP 723 inline script metadata for dependencies
- **No dependencies required**: Uses only Python standard library

**IMPORTANT**: When creating Python scripts for users, include PEP 723 inline script metadata to declare dependencies. See the reference docs for examples.

**DEFAULT TO STREAMING**: When implementing text generation or chat, use streaming responses unless the user explicitly requests non-streaming.
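A PEP 723 header is a `# /// script` comment block at the top of the file; tools such as `uv run` read it to prepare the environment before running the script. The skeleton below is a sketch: because it uses only the standard library its dependency list is empty, and `build_chat_payload` is a hypothetical helper that also shows streaming switched on by default in the request body.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
import json


def build_chat_payload(messages: list, model: str = "llama3.2", stream: bool = True) -> bytes:
    """Encode a POST /api/chat body; streaming stays on unless the caller opts out."""
    return json.dumps({"model": model, "messages": messages, "stream": stream}).encode("utf-8")
```

If a script later needs a third-party package, it goes in the `dependencies` list, for example `dependencies = ["requests"]`.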

Common Python use cases:
```python
# Assumes helper functions like those in references/python_api.md
# Streaming text generation (RECOMMENDED)
for token in generate_stream("Explain quantum computing"):
    print(token, end="", flush=True)

# Streaming chat conversation (RECOMMENDED)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]
for token in chat_stream(messages):
    print(token, end="", flush=True)

# Non-streaming (use only when needed)
response = generate("Explain quantum computing")

# Embeddings for semantic search
embedding = get_embeddings("Hello, world!")
```
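`generate_stream` and `chat_stream` above are helpers, not library functions. If `references/python_api.md` is not at hand, a minimal standard-library sketch might look like the following. It assumes Ollama streams newline-delimited JSON objects (generate chunks carry text in `response`, chat chunks in `message.content`, and the last chunk has `done: true`), and it reads the server address from a module-level `OLLAMA_URL` so the one-argument calls above work unchanged. These signatures are assumptions and may differ from the reference file.

```python
import json
import urllib.request
from typing import Iterable, Iterator

OLLAMA_URL = "http://localhost:11434"  # replace with the user's Ollama URL


def iter_ndjson(lines: Iterable[bytes]) -> Iterator[dict]:
    """Decode a newline-delimited JSON stream, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def _post_stream(path: str, payload: dict) -> Iterator[dict]:
    req = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the response yields one line at a time as the server sends it
        for chunk in iter_ndjson(resp):
            if "error" in chunk:
                raise RuntimeError(chunk["error"])
            yield chunk
            if chunk.get("done"):
                break


def generate_stream(prompt: str, model: str = "llama3.2") -> Iterator[str]:
    payload = {"model": model, "prompt": prompt, "stream": True}
    for chunk in _post_stream("/api/generate", payload):
        yield chunk.get("response", "")


def chat_stream(messages: list, model: str = "llama3.2") -> Iterator[str]:
    payload = {"model": model, "messages": messages, "stream": True}
    for chunk in _post_stream("/api/chat", payload):
        yield chunk.get("message", {}).get("content", "")
```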

### Node.js Projects

For Node.js-based projects, refer to the Node.js API reference:

- **File**: `references/nodejs_api.md`
- **Usage**: Load this reference when implementing Node.js integrations
- **Contains**:
  - Official `ollama` npm package examples
  - Alternative Fetch API examples (Node.js 18+)
  - Text generation and chat APIs
  - **Streaming with async iterators (RECOMMENDED)**
  - Embeddings and semantic similarity
  - Complete RAG system example
  - Error handling and retry logic
  - TypeScript support examples

Installation:
```bash
npm install ollama
```

**DEFAULT TO STREAMING**: When implementing text generation or chat, use streaming responses unless the user explicitly requests non-streaming.

Common Node.js use cases:
```javascript
import { Ollama } from 'ollama';
// Point the client at the Ollama URL the user provided
const ollama = new Ollama({ host: 'http://localhost:11434' });

// Streaming text generation (RECOMMENDED)
const stream = await ollama.generate({
  model: 'llama3.2',
  prompt: 'Explain quantum computing',
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}

// Streaming chat conversation (RECOMMENDED)
const chatStream = await ollama.chat({
  model: 'llama3.2',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' }
  ],
  stream: true
});

for await (const chunk of chatStream) {
  process.stdout.write(chunk.message.content);
}

// Non-streaming (use only when needed)
const response = await ollama.generate({
  model: 'llama3.2',
  prompt: 'Explain quantum computing'
});

// Embeddings
const embedding = await ollama.embeddings({
  model: 'llama3.2',
  prompt: 'Hello, world!'
});
```

## Common Integration Patterns

### Text Generation

Generate text completions from prompts. Use cases:
- Content generation
- Code completion
- Question answering
- Summarization
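Streaming remains the default, but for cases such as summarization where the whole result is needed at once, a non-streaming request can be sketched as follows. It assumes `POST /api/generate` with `stream: false` returns one JSON object whose `response` field holds the full completion; `system` and `options.temperature` are optional fields of that endpoint, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_generate_request(
    base_url: str,
    prompt: str,
    model: str = "llama3.2",
    system: Optional[str] = None,
    temperature: Optional[float] = None,
) -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system is not None:
        payload["system"] = system  # overrides the system prompt defined in the Modelfile
    if temperature is not None:
        payload["options"] = {"temperature": temperature}  # lower values are more deterministic
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(base_url: str, prompt: str, **kwargs) -> str:
    """Return the full completion once the model has finished generating."""
    req = build_generate_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]
```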

Guide


Source: https://github.com/balloob/llm-skills/tree/main/ollama
