sec-edgar-pipeline
SEC EDGAR extraction pipeline: setup, filing discovery by CIK, recipe-driven extraction, and report generation.
# SEC EDGAR Pipeline
## Overview
This pipeline is centered on `edgar-analyzer` and the EDGAR data sources. The core loop is: configure credentials, create a project with examples, analyze patterns, generate code, run extraction, and export reports.
## Setup (Keys + User Agent)
Use the setup wizard to configure required keys:
```bash
python -m edgar_analyzer setup
# or
edgar-analyzer setup
```
Entries to configure:
- `OPENROUTER_API_KEY` (required)
- `JINA_API_KEY` (optional)
- `EDGAR` user agent string (required; your name followed by a contact email address)
## End-to-End CLI Workflow
```bash
# 1. Create project
edgar-analyzer project create my_project --template minimal
# 2. Add examples + project.yaml
# projects/my_project/examples/*.json
# 3. Analyze examples
edgar-analyzer analyze-project projects/my_project
# 4. Generate extraction code
edgar-analyzer generate-code projects/my_project
# 5. Run extraction
edgar-analyzer run-extraction projects/my_project --output-format csv
```
Outputs land in `projects/<name>/output/`.
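When the workflow needs to be scripted, steps 3 to 5 can be driven from Python once the examples and `project.yaml` are in place. This is a minimal sketch, assuming `edgar-analyzer` is on `PATH`; the helper names `extraction_commands` and `run_extraction_steps` are hypothetical and not part of the tool.

```python
import subprocess


def extraction_commands(name: str, output_format: str = "csv") -> list[list[str]]:
    """Build the CLI calls for steps 3-5 of the workflow (hypothetical helper)."""
    project_dir = f"projects/{name}"
    return [
        ["edgar-analyzer", "analyze-project", project_dir],
        ["edgar-analyzer", "generate-code", project_dir],
        ["edgar-analyzer", "run-extraction", project_dir,
         "--output-format", output_format],
    ]


def run_extraction_steps(name: str, output_format: str = "csv") -> None:
    """Run each step in order; check=True raises on the first failing command."""
    for command in extraction_commands(name, output_format):
        subprocess.run(command, check=True)
```

Calling `run_extraction_steps("my_project")` would then run the three commands in sequence and stop at the first non-zero exit code.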
## EDGAR-Specific Conventions
- **CIK** values are 10-digit, zero-padded (e.g., `0000320193`).
- **Rate limit**: SEC allows at most 10 requests/sec. Scripts sleep ~0.11s between requests to stay under that limit.
- **User agent** is mandatory; include name + email.
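The three conventions above can be sketched in a few lines of Python. This is an illustration, not the code used by `edgar-analyzer`; the names `format_cik`, `build_headers`, and `Throttle` are hypothetical.

```python
import time

REQUEST_DELAY_SEC = 0.11  # slightly more than 1/10 s, so at most ~9 requests/sec


def format_cik(cik) -> str:
    """Return the CIK as a 10-digit, zero-padded string."""
    padded = str(int(cik)).zfill(10)
    if len(padded) > 10:
        raise ValueError(f"CIK has more than 10 digits: {cik!r}")
    return padded


def build_headers(name: str, email: str) -> dict:
    """Build the mandatory User-Agent header from a name and contact email."""
    if not name.strip() or "@" not in email:
        raise ValueError("user agent needs a name and an email address")
    return {"User-Agent": f"{name.strip()} {email.strip()}"}


class Throttle:
    """Sleep as needed so consecutive calls are at least `delay` seconds apart."""

    def __init__(self, delay: float = REQUEST_DELAY_SEC, sleep=time.sleep):
        self._delay = delay
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self._delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = time.monotonic()
```

A caller would invoke `throttle.wait()` before every request and pass `build_headers(...)` as the request headers. The `sleep` parameter exists only so the delay can be replaced in tests.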
## Scripted Example (Apple DEF 14A)
`edgar/scripts/fetch_apple_def14a.py` shows the direct flow:
1. Fetch latest DEF 14A metadata
2. Download HTML
3. Parse Summary Compensation Table (SCT)
4. Save raw HTML + extracted JSON + ground truth
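Steps 1 and 2 can be approximated as follows. This is a sketch under stated assumptions, not the contents of `fetch_apple_def14a.py`: it assumes the script relies on SEC's public submissions JSON (`data.sec.gov/submissions/`) and the standard `Archives/edgar/data/` URL layout, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik}.json"
ARCHIVE_URL = "https://www.sec.gov/Archives/edgar/data/{cik_int}/{accession}/{document}"


def fetch_submissions(cik: str, user_agent: str) -> dict:
    """Download the filing index for a 10-digit, zero-padded CIK."""
    request = urllib.request.Request(
        SUBMISSIONS_URL.format(cik=cik), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def latest_filing(submissions: dict, form_type: str = "DEF 14A") -> Optional[dict]:
    """Pick the most recently filed entry of the given form type, if any."""
    recent = submissions["filings"]["recent"]  # parallel arrays, one item per filing
    rows = zip(recent["form"], recent["filingDate"],
               recent["accessionNumber"], recent["primaryDocument"])
    matches = [row for row in rows if row[0] == form_type]
    if not matches:
        return None
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    form, filing_date, accession, document = max(matches, key=lambda row: row[1])
    return {"form": form, "filing_date": filing_date,
            "accession": accession, "document": document}


def filing_url(cik: str, filing: dict) -> str:
    """Build the document URL: unpadded CIK, accession number without dashes."""
    return ARCHIVE_URL.format(
        cik_int=int(cik),
        accession=filing["accession"].replace("-", ""),
        document=filing["document"],
    )
```

Selecting by form type with an exact match is why the distinction between `DEF 14A` and its variants matters: a filing indexed under a different form string is silently skipped.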
## Recipe-Driven Extraction
`edgar/recipes/sct_extraction/config.yaml` defines a multi-step pipeline:
- Fetch DEF 14A filings by company list
- Extract SCT tables with `SCTAdapter`
- Validate with `sct_validator`
- Write results to `output/sct`
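What `sct_validator` checks is not described here. One plausible check, shown purely as an illustration, is that each row's reported total matches the sum of its compensation components, since the Summary Compensation Table reports a total column alongside the individual columns. The field names and the `validate_sct_row` function below are hypothetical and may not match the recipe's actual schema.

```python
# Hypothetical field names for the SCT component columns.
COMPONENT_FIELDS = (
    "salary",
    "bonus",
    "stock_awards",
    "option_awards",
    "non_equity_incentive",
    "pension_change",
    "other_compensation",
)


def validate_sct_row(row: dict, tolerance: float = 1.0) -> list:
    """Return a list of problems found in one extracted SCT row (empty if none)."""
    errors = []
    if not row.get("name"):
        errors.append("missing executive name")
    total = row.get("total")
    if total is None:
        errors.append("missing total")
        return errors
    # Missing or null components count as zero.
    components = sum(row.get(field) or 0 for field in COMPONENT_FIELDS)
    if abs(components - total) > tolerance:
        errors.append(f"total {total} differs from component sum {components}")
    return errors
```

The tolerance absorbs rounding in the filing; a mismatch larger than that usually points at a mis-parsed column rather than at the filing itself.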
## Report Generation
`edgar/scripts/create_csv_reports.py` converts JSON results into:
- `executive_compensation_<timestamp>.csv`
- `top_25_executives_<timestamp>.csv`
- `company_summary_<timestamp>.csv`
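The conversion can be pictured roughly as below. This is a sketch rather than the code in `create_csv_reports.py`: the record fields (`company`, `name`, `total`), the timestamp format, and the helper names are all assumptions.

```python
import csv
from datetime import datetime
from pathlib import Path


def top_executives(records: list, limit: int = 25) -> list:
    """Return the `limit` highest-paid records, sorted by total descending."""
    return sorted(records, key=lambda record: record["total"], reverse=True)[:limit]


def write_report(rows: list, out_dir, prefix: str, timestamp: str = None) -> Path:
    """Write rows to <out_dir>/<prefix>_<timestamp>.csv and return the path."""
    timestamp = timestamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{prefix}_{timestamp}.csv"
    # Column order follows the first row; all rows are assumed to share its keys.
    fieldnames = list(rows[0]) if rows else []
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

For example, `write_report(top_executives(records), "output", "top_25_executives")` would produce a file named in the `top_25_executives_<timestamp>.csv` pattern listed above.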
## Troubleshooting
- **No filings found**: confirm CIK formatting and filing type (DEF 14A vs DEF 14A/A).
- **API errors**: slow down requests and confirm user-agent is set.
- **Extraction errors**: regenerate code or use manual ground truth in POC scripts.
## Related Skills
- `universal/data/reporting-pipelines`
- `toolchains/python/testing/pytest`