Integrity Verification
Included with Lifetime
$97 forever
SHA-256 checksum manifest generation, self-verification, and PREMIS fixity patterns
media-curator
What this skill does
# Integrity Verification
Cryptographic checksum verification patterns for detecting bit rot, tampering, and transfer errors in media archives. Implements self-verifying manifests with PREMIS fixity metadata.
## Manifest Generation Script
Complete bash implementation for generating self-verifying checksum manifests:
```bash
#!/bin/bash
set -euo pipefail
# Archive Checksum Manifest Generator
# Generates self-verifying SHA-256 checksum manifest
# Usage: ./generate-checksums.sh /path/to/archive
ARCHIVE_PATH="${1:-.}"
CHECKSUM_FILE="CHECKSUMS.sha256"
TEMP_FILE="/tmp/checksums-$$.tmp"
# Validate archive exists
if [ ! -d "$ARCHIVE_PATH" ]; then
echo "Error: Archive directory not found: $ARCHIVE_PATH" >&2
exit 1
fi
cd "$ARCHIVE_PATH"
echo "Generating checksums for: $ARCHIVE_PATH"
# Find all files, exclude checksum manifest itself
# Use null-terminated strings for handling filenames with spaces
find . -type f ! -name "$CHECKSUM_FILE" -print0 | \
sort -z | \
xargs -0 sha256sum > "$TEMP_FILE"
# Count files
FILE_COUNT=$(wc -l < "$TEMP_FILE")
echo "Found $FILE_COUNT files"
# Compute manifest hash (hash of the checksum content)
MANIFEST_HASH=$(sha256sum "$TEMP_FILE" | awk '{print $1}')
# Generate timestamp (ISO 8601 UTC with nanosecond precision)
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ)
# Write final manifest with self-verifying header
{
echo "# MANIFEST_HASH: $MANIFEST_HASH"
echo "# Generated: $TIMESTAMP"
echo "# Verify with: tail -n +4 $CHECKSUM_FILE | sha256sum"
cat "$TEMP_FILE"
} > "$CHECKSUM_FILE"
# Clean up
rm "$TEMP_FILE"
echo "✓ Generated $CHECKSUM_FILE"
echo " Manifest hash: $MANIFEST_HASH"
echo " Timestamp: $TIMESTAMP"
echo " Files: $FILE_COUNT"
```
**Key features**:
- Handles filenames with spaces via null-terminated strings (`-print0`, `-z`, `-0`)
- Deterministic output (sorted by path)
- Self-verifying header with manifest hash
- ISO 8601 UTC timestamps with nanosecond precision
- Exit on error (`set -euo pipefail`)
## Verification Commands
### Quick Manifest Integrity Check
Verify manifest has not been tampered with (sub-second):
```bash
#!/bin/bash
# Quick verification - manifest integrity only
CHECKSUM_FILE="CHECKSUMS.sha256"
if [ ! -f "$CHECKSUM_FILE" ]; then
echo "✗ Checksum manifest not found: $CHECKSUM_FILE" >&2
exit 1
fi
# Extract expected hash from header
EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}')
if [ -z "$EXPECTED" ]; then
echo "✗ Manifest header missing or malformed" >&2
exit 1
fi
# Compute actual hash of manifest content (lines 4+)
ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}')
# Compare
if [ "$EXPECTED" = "$ACTUAL" ]; then
echo "✓ Manifest integrity verified"
echo " Hash: $EXPECTED"
exit 0
else
echo "✗ Manifest has been tampered with" >&2
echo " Expected: $EXPECTED" >&2
echo " Actual: $ACTUAL" >&2
exit 1
fi
```
**Use case**: Daily automated checks. Fast execution regardless of archive size.
**Exit codes**:
- `0` - Manifest integrity verified
- `1` - Manifest corrupted or tampered
### Full File Verification
Verify all files match their checksums:
```bash
#!/bin/bash
# Full verification - manifest integrity + all files
CHECKSUM_FILE="CHECKSUMS.sha256"
# Step 1: Verify manifest integrity
echo "Step 1: Verifying manifest integrity..."
EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}')
ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}')
if [ "$EXPECTED" != "$ACTUAL" ]; then
echo "✗ Manifest integrity check failed - stopping" >&2
exit 1
fi
echo "✓ Manifest integrity verified"
# Step 2: Verify all files
echo "Step 2: Verifying all files..."
if tail -n +4 "$CHECKSUM_FILE" | sha256sum -c; then
echo "✓ All files verified successfully"
exit 0
else
echo "✗ One or more files failed verification" >&2
exit 1
fi
```
**Output format** (from `sha256sum -c`):
```
./audio/episode-001.opus: OK
./audio/episode-002.opus: OK
./video/recording.mp4: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
```
**Exit codes**:
- `0` - All files verified successfully
- `1` - Verification failed (manifest or files)
### Quiet Mode
Show only failures:
```bash
#!/bin/bash
# Quiet verification - only show failures
CHECKSUM_FILE="CHECKSUMS.sha256"
# Quick manifest check (silent)
EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}')
ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}')
if [ "$EXPECTED" != "$ACTUAL" ]; then
echo "MANIFEST: FAILED" >&2
exit 1
fi
# Verify files (quiet mode - only show failures)
tail -n +4 "$CHECKSUM_FILE" | sha256sum -c --quiet
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
# Silent success
exit 0
else
# sha256sum already printed failures to stderr
exit 1
fi
```
**Use case**: Cron jobs, automated monitoring, CI/CD pipelines.
**Output**: Nothing on success, only failed files on failure.
## Regeneration After Changes
Script for regenerating manifest after archive modifications:
```bash
#!/bin/bash
set -euo pipefail
# Regenerate checksum manifest after archive changes
# Usage: ./fix-checksums.sh /path/to/archive
ARCHIVE_PATH="${1:-.}"
CHECKSUM_FILE="CHECKSUMS.sha256"
BACKUP_FILE="CHECKSUMS.sha256.bak"
TEMP_FILE="/tmp/checksums-$$.tmp"
cd "$ARCHIVE_PATH"
# Backup existing manifest
if [ -f "$CHECKSUM_FILE" ]; then
cp "$CHECKSUM_FILE" "$BACKUP_FILE"
echo "Backed up existing manifest to $BACKUP_FILE"
fi
# Generate new manifest
echo "Regenerating checksums..."
find . -type f ! -name "$CHECKSUM_FILE" ! -name "$BACKUP_FILE" -print0 | \
sort -z | \
xargs -0 sha256sum > "$TEMP_FILE"
FILE_COUNT=$(wc -l < "$TEMP_FILE")
MANIFEST_HASH=$(sha256sum "$TEMP_FILE" | awk '{print $1}')
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ)
{
echo "# MANIFEST_HASH: $MANIFEST_HASH"
echo "# Generated: $TIMESTAMP"
echo "# Verify with: tail -n +4 $CHECKSUM_FILE | sha256sum"
cat "$TEMP_FILE"
} > "$CHECKSUM_FILE"
rm "$TEMP_FILE"
# Detect changes
if [ -f "$BACKUP_FILE" ]; then
echo ""
echo "Changes detected:"
# Extract file paths from old and new manifests
tail -n +4 "$BACKUP_FILE" | awk '{print $2}' | sort > /tmp/old-files-$$.txt
tail -n +4 "$CHECKSUM_FILE" | awk '{print $2}' | sort > /tmp/new-files-$$.txt
# Added files
ADDED=$(comm -13 /tmp/old-files-$$.txt /tmp/new-files-$$.txt)
if [ -n "$ADDED" ]; then
echo " Added:"
echo "$ADDED" | sed 's/^/ /'
fi
# Removed files
REMOVED=$(comm -23 /tmp/old-files-$$.txt /tmp/new-files-$$.txt)
if [ -n "$REMOVED" ]; then
echo " Removed:"
echo "$REMOVED" | sed 's/^/ /'
fi
# Modified files (different hash for same path)
# This requires comparing hashes, not just paths
COMMON_FILES=$(comm -12 /tmp/old-files-$$.txt /tmp/new-files-$$.txt)
if [ -n "$COMMON_FILES" ]; then
while IFS= read -r file; do
OLD_HASH=$(grep -F "$file" "$BACKUP_FILE" | awk '{print $1}')
NEW_HASH=$(grep -F "$file" "$CHECKSUM_FILE" | awk '{print $1}')
if [ "$OLD_HASH" != "$NEW_HASH" ]; then
echo " Modified: $file"
fi
done <<< "$COMMON_FILES"
fi
rm /tmp/old-files-$$.txt /tmp/new-files-$$.txt
fi
echo ""
echo "✓ Generated new $CHECKSUM_FILE"
echo " Manifest hash: $MANIFEST_HASH"
echo " Files: $FILE_COUNT"
```
**Features**:
- Backs up existing manifest to `.bak` file
- Regenerates checksums for all current files
- Reports added, removed, and modified files
- Preserves backup for comparison
## VERIFY.md Template
Human-readable instructions placed in archive root:
```markdown
# Archive Integrity Verification
This archive contains a self-verifying checksum manifest (`CHECKSUMS.sha256`) for detecting corruption, tampering, or transfer errors.
## Archive Information
- **Generated**: {TIMESTAMP}
- **Total files**: {FILE_COUNT}
- **Total size**: {TOTAL_SIZE}
- **Manifest hash**: {MANIFEST_HASH}
## Quick Verification (30 seconds)
Verify the manifest has not been tampered with:
\`\`\`bash
EXPECTED=$(greRelated in media-curator
Archive Acquisition
IncludedPatterns for acquiring content from Internet Archive and archival sources
media-curator
YouTube Acquisition
Includedyt-dlp patterns for acquiring content from YouTube and video platforms
media-curator
Cover Art Embedding
IncludedPatterns for finding, processing, and embedding cover artwork into media files
media-curator
Quality Filtering
IncludedAccept/reject logic and quality scoring heuristics for media content
media-curator
Metadata Tagging
Includedopustags and ffmpeg patterns for applying metadata to audio and video files
media-curator
Transcribe Media
IncludedProduce timestamped transcript sidecars for acquired audio/video with hashes, source metadata, speaker labels when available, and explicit degraded plans when STT tooling is missing
media-curator