Claude
Skills
Sign in
Back

object-store-best-practices

Included with Lifetime
$97 forever

Ensures proper cloud storage operations with retry logic, error handling, streaming, and efficient I/O patterns. Activates when users work with object_store for S3, Azure, or GCS operations.

Cloud & DevOps

What this skill does


# Object Store Best Practices Skill

You are an expert at implementing robust cloud storage operations using the object_store crate. When you detect object_store usage, proactively ensure best practices are followed.

## When to Activate

Activate this skill when you notice:
- Code using `ObjectStore` trait, `AmazonS3Builder`, `MicrosoftAzureBuilder`, or `GoogleCloudStorageBuilder`
- Discussion about S3, Azure Blob, or GCS operations
- Issues with cloud storage reliability, performance, or errors
- File uploads, downloads, or listing operations
- Questions about retry logic, error handling, or streaming

## Best Practices Checklist

### 1. Retry Configuration

**What to Look For**:
- Missing retry logic for production code
- Default settings without explicit retry configuration

**Good Pattern**:
```rust
use object_store::aws::AmazonS3Builder;
use object_store::RetryConfig;

let s3 = AmazonS3Builder::new()
    .with_region("us-east-1")
    .with_bucket_name("my-bucket")
    .with_retry(RetryConfig {
        max_retries: 3,
        retry_timeout: Duration::from_secs(10),
        ..Default::default()
    })
    .build()?;
```

**Bad Pattern**:
```rust
// No retry configuration - fails on transient errors
let s3 = AmazonS3Builder::new()
    .with_region("us-east-1")
    .with_bucket_name("my-bucket")
    .build()?;
```

**Suggestion**:
```
Cloud storage operations need retry logic for production resilience.
Add retry configuration to handle transient failures:

.with_retry(RetryConfig {
    max_retries: 3,
    retry_timeout: Duration::from_secs(10),
    ..Default::default()
})

This handles 503 SlowDown, network timeouts, and temporary outages.
```

### 2. Error Handling

**What to Look For**:
- Using `unwrap()` or `expect()` on storage operations
- Not handling specific error types
- Missing context in error propagation

**Good Pattern**:
```rust
use object_store::Error as ObjectStoreError;
use thiserror::Error;

#[derive(Error, Debug)]
enum StorageError {
    #[error("Object store error: {0}")]
    ObjectStore(#[from] ObjectStoreError),

    #[error("File not found: {path}")]
    NotFound { path: String },

    #[error("Access denied: {path}")]
    PermissionDenied { path: String },
}

async fn read_file(store: &dyn ObjectStore, path: &Path) -> Result<Bytes, StorageError> {
    match store.get(path).await {
        Ok(result) => Ok(result.bytes().await?),
        Err(ObjectStoreError::NotFound { path, .. }) => {
            Err(StorageError::NotFound { path: path.to_string() })
        }
        Err(e) => Err(e.into()),
    }
}
```

**Bad Pattern**:
```rust
let data = store.get(&path).await.unwrap();  // Crashes on errors!
```

**Suggestion**:
```
Avoid unwrap() on storage operations. Use proper error handling:

match store.get(&path).await {
    Ok(result) => { /* handle success */ }
    Err(ObjectStoreError::NotFound { .. }) => { /* handle missing file */ }
    Err(e) => { /* handle other errors */ }
}

Or use thiserror for better error types.
```

### 3. Streaming Large Objects

**What to Look For**:
- Loading entire files into memory with `.bytes().await`
- Not using streaming for large files (>100MB)

**Good Pattern (Streaming)**:
```rust
use futures::stream::StreamExt;

let result = store.get(&path).await?;
let mut stream = result.into_stream();

while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    // Process chunk incrementally
    process_chunk(chunk)?;
}
```

**Bad Pattern (Loading to Memory)**:
```rust
let result = store.get(&path).await?;
let bytes = result.bytes().await?;  // Loads entire file!
```

**Suggestion**:
```
For files >100MB, use streaming to avoid memory issues:

let mut stream = store.get(&path).await?.into_stream();
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    process_chunk(chunk)?;
}

This processes data incrementally without loading everything into memory.
```

### 4. Multipart Upload for Large Files

**What to Look For**:
- Using `put()` for large files (>100MB)
- Missing multipart upload for big data

**Good Pattern**:
```rust
async fn upload_large_file(
    store: &dyn ObjectStore,
    path: &Path,
    data: impl Stream<Item = Bytes>,
) -> Result<()> {
    let multipart = store.put_multipart(path).await?;

    let mut stream = data;
    while let Some(chunk) = stream.next().await {
        multipart.put_part(chunk).await?;
    }

    multipart.complete().await?;
    Ok(())
}
```

**Bad Pattern**:
```rust
// Inefficient for large files
let large_data = vec![0u8; 1_000_000_000];  // 1GB
store.put(path, large_data.into()).await?;
```

**Suggestion**:
```
For files >100MB, use multipart upload for better reliability:

let multipart = store.put_multipart(&path).await?;
for chunk in chunks {
    multipart.put_part(chunk).await?;
}
multipart.complete().await?;

Benefits:
- Resume failed uploads
- Better memory efficiency
- Improved reliability
```

### 5. Efficient Listing

**What to Look For**:
- Not using prefixes for listing
- Loading all results without pagination
- Not filtering on client side

**Good Pattern**:
```rust
use futures::stream::StreamExt;

// List with prefix
let prefix = Some(&Path::from("data/2024/"));
let mut list = store.list(prefix);

while let Some(meta) = list.next().await {
    let meta = meta?;
    if should_process(&meta) {
        process_object(&meta).await?;
    }
}
```

**Better Pattern with Filtering**:
```rust
let prefix = Some(&Path::from("data/2024/01/"));
let list = store.list(prefix);

let filtered = list.filter(|result| {
    future::ready(match result {
        Ok(meta) => meta.location.as_ref().ends_with(".parquet"),
        Err(_) => true,
    })
});

futures::pin_mut!(filtered);
while let Some(meta) = filtered.next().await {
    let meta = meta?;
    process_object(&meta).await?;
}
```

**Bad Pattern**:
```rust
// Lists entire bucket!
let all_objects: Vec<_> = store.list(None).collect().await;
```

**Suggestion**:
```
Use prefixes to limit LIST operations and reduce cost:

let prefix = Some(&Path::from("data/2024/01/"));
let mut list = store.list(prefix);

This is especially important for buckets with millions of objects.
```

### 6. Atomic Writes with Rename

**What to Look For**:
- Writing directly to final location
- Risk of partial writes visible to readers

**Good Pattern**:
```rust
async fn atomic_write(
    store: &dyn ObjectStore,
    final_path: &Path,
    data: Bytes,
) -> Result<()> {
    // Write to temp location
    let temp_path = Path::from(format!("{}.tmp", final_path));
    store.put(&temp_path, data).await?;

    // Atomic rename
    store.rename(&temp_path, final_path).await?;

    Ok(())
}
```

**Bad Pattern**:
```rust
// Readers might see partial data during write
store.put(&path, data).await?;
```

**Suggestion**:
```
Use temp + rename for atomic writes:

let temp_path = Path::from(format!("{}.tmp", path));
store.put(&temp_path, data).await?;
store.rename(&temp_path, path).await?;

This prevents readers from seeing partial/corrupted data.
```

### 7. Connection Pooling

**What to Look For**:
- Creating new client for each operation
- Not configuring connection limits

**Good Pattern**:
```rust
use object_store::ClientOptions;

let s3 = AmazonS3Builder::new()
    .with_client_options(ClientOptions::new()
        .with_timeout(Duration::from_secs(30))
        .with_connect_timeout(Duration::from_secs(5))
        .with_pool_max_idle_per_host(10)
    )
    .build()?;

// Reuse this store across operations
let store: Arc<dyn ObjectStore> = Arc::new(s3);
```

**Bad Pattern**:
```rust
// Creating new store for each operation
for file in files {
    let s3 = AmazonS3Builder::new().build()?;
    upload(s3, file).await?;
}
```

**Suggestion**:
```
Configure connection pooling and reuse the ObjectStore:

let store: Arc<dyn ObjectStore> = Arc::new(s3);

// Clone Arc to share across threads
let store_clone = store.clone();
tokio::spawn(async move {
    upload(store_clone, file).await
});
```

### 8. Environment-Based Configu

Related in Cloud & DevOps