test-trading-strategies
Backtest trading strategies on historical data and interpret performance metrics. Provides run_backtest (crypto strategies) and run_prediction_market_backtest (Polymarket strategies). Fast execution (20-60s), minimal cost ($0.001). Returns Sharpe ratio, max drawdown, win rate, profit factor, and trade statistics. Use this skill after building or improving strategies to validate performance before deploying. NEVER deploy without thorough backtesting (6+ months recommended).
# Test Trading Strategies
## Quick Start
This skill validates strategy performance on historical data before risking real capital. Testing is fast (20-60s), cheap ($0.001), and essential for safe trading.
**Load the tools first**:
```
Use MCPSearch to select: mcp__workbench__run_backtest
Use MCPSearch to select: mcp__workbench__get_latest_backtest_results
```
**Basic backtest**:
```
run_backtest(
    strategy_name="MyStrategy",
    start_date="2024-01-01",
    end_date="2024-12-31",
    symbol="BTC-USDT",
    timeframe="1h"
)
```
Returns performance metrics in 20-40 seconds. Example output:
- Sharpe ratio: 1.4 (good risk-adjusted return)
- Max drawdown: 12% (moderate risk)
- Win rate: 52% (realistic)
- Profit factor: 1.8 (profitable)
**When to use this skill**:
- After building a new strategy (validate that it works)
- After improving a strategy (confirm the improvement)
- Before deploying to live trading (ALWAYS)
- Comparing multiple strategy versions
- Testing parameter variations
**Critical rule**: NEVER deploy without backtesting 6+ months of data
## Available Tools (3)
### run_backtest
**Purpose**: Test crypto trading strategy performance on historical data
**Parameters**:
- `strategy_name` (required): Strategy to test
- `start_date` (required): Start date (YYYY-MM-DD)
- `end_date` (required): End date (YYYY-MM-DD)
- `symbol` (required): Trading pair (e.g., "BTC-USDT")
- `timeframe` (required): Timeframe (1m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d)
- `config` (optional): Backtest configuration object:
  - `fee`: Trading fee per side (default: 0.0005 = 0.05%)
  - `slippage`: Slippage per trade (default: 0.0005 = 0.05%)
  - `leverage`: Position multiplier (default: 1, max: 5)
**Returns**: Performance metrics:
- **Net profit**: Total profit/loss in USDC
- **Total return**: Percentage return
- **Annual return**: Annualized return percentage
- **Sharpe ratio**: Risk-adjusted return (industry standard metric)
- **Max drawdown**: Largest peak-to-trough decline
- **Win rate**: Percentage of profitable trades
- **Profit factor**: Gross profit / gross loss
- **Trade statistics**: Total trades, average trade duration, consecutive losses
- **Equity curve**: Balance over time (for visualization)
**Pricing**: $0.001 (essentially free)
**Execution Time**: ~20-40 seconds
**Use when**: Testing crypto perpetual strategies on Hyperliquid
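Fees and slippage add up quickly, especially on short timeframes. The sketch below estimates round-trip trading cost at the default `config` values. It assumes that `fee` and `slippage` are each charged once on entry and once on exit as a fraction of position notional, and that `leverage` sets notional as a multiple of equity; the backtester's actual cost model is not documented here and may differ. The function names are hypothetical helpers, not part of the tool.

```python
def round_trip_cost(notional, fee=0.0005, slippage=0.0005):
    """Approximate cost of opening and closing one position.

    Assumption: fee and slippage are each applied once at entry and once
    at exit, as fractions of the position's notional value.
    """
    per_side = fee + slippage
    return notional * per_side * 2


def cost_as_fraction_of_equity(equity, leverage=1, fee=0.0005, slippage=0.0005):
    """Round-trip cost relative to equity (assumes notional = equity * leverage)."""
    notional = equity * leverage
    return round_trip_cost(notional, fee, slippage) / equity


print(round(round_trip_cost(10_000), 2))                         # 20.0 dollars per round trip
print(round(cost_as_fraction_of_equity(10_000, leverage=5), 4))  # 0.01, i.e. 1% of equity at 5x
```

Under these assumptions, a trade must move roughly 0.2% in its favor before it breaks even, which is why strategies with a small edge are vulnerable to costs.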
### run_prediction_market_backtest
**Purpose**: Test Polymarket prediction market strategy on historical data
**Parameters**:
- `strategy_name` (required): PolymarketStrategy to test
- `start_date` (required): Start date (YYYY-MM-DD)
- `end_date` (required): End date (YYYY-MM-DD)
- `condition_id` (for single market): Specific Polymarket condition ID
- `asset` (for rolling markets): Asset symbol ("BTC", "ETH")
- `interval` (for rolling markets): Market interval ("15m", "1h")
- `initial_balance` (optional): Starting USDC (default: 10000)
- `timeframe` (optional): Execution timeframe (default: 1m)
**Returns**: Backtest metrics:
- Profit/loss
- Win rate
- Position history for YES/NO tokens
- Market resolution outcomes
**Pricing**: $0.001
**Execution Time**: ~20-60 seconds
**Use when**: Testing Polymarket prediction market strategies
### get_latest_backtest_results
**Purpose**: View recent backtest results without re-running
**Parameters**:
- `strategy_name` (optional): Filter by strategy name
- `limit` (optional, 1-100): Number of results (default: 10)
- `include_equity_curve` (optional): Include equity curve data
- `equity_curve_max_points` (optional, 50-1000): Curve resolution
**Returns**: List of recent backtest records with metrics
**Pricing**: Free
**Use when**: Checking if backtest already exists, comparing strategies, avoiding redundant backtests
## Core Concepts
### Performance Metrics Interpretation
**Sharpe Ratio** (risk-adjusted return):
```
Formula: (Mean Return - Risk-Free Rate) / Standard Deviation of Returns
Interpretation:
>2.0 → Excellent (very rare for algo strategies)
1.0-2.0 → Good (achievable with solid strategy)
0.5-1.0 → Acceptable (worth testing further)
<0.5 → Poor (likely not profitable after costs)
Why it matters:
- Accounts for volatility (high return with high volatility = lower Sharpe)
- Industry standard for comparing strategies
- More useful than total return alone
```
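A minimal sketch of the Sharpe formula above, assuming simple per-period returns (0.01 = 1%), the sample standard deviation, and annualization by the square root of periods per year. The default of 365 periods reflects 24/7 crypto markets; whether `run_backtest` uses the same convention is an assumption.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from per-period (e.g. daily) returns."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    # Subtract the risk-free rate from every period's return.
    excess = [r - risk_free_per_period for r in returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("returns have zero volatility; Sharpe is undefined")
    # Scale the per-period ratio to a yearly figure.
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


daily_returns = [0.01, -0.005, 0.02, 0.0, -0.01, 0.015]
print(round(sharpe_ratio(daily_returns), 2))
```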
**Max Drawdown** (largest peak-to-trough decline):
```
Example: Strategy grows from $10k → $15k → $12k
Drawdown: ($15k - $12k) / $15k = 20%
Interpretation:
<10% → Conservative (lower returns, safer)
10-20% → Moderate (balanced risk/reward)
20-40% → Aggressive (higher returns, higher risk)
>40% → Very risky (difficult to recover from)
Why it matters:
- Measures the worst historical decline (future drawdowns can be deeper)
- Indicates how emotionally difficult the strategy will be to hold through losing stretches
- A 50% drawdown requires a 100% return to recover
```
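The drawdown definition and the recovery rule above can be sketched as follows. This assumes the equity curve is a list of positive balances in time order; the helper names are illustrative.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest balance seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


def gain_to_recover(drawdown):
    """Return needed to climb back to the prior peak after a drawdown."""
    return drawdown / (1 - drawdown)


print(max_drawdown([10_000, 15_000, 12_000]))  # 0.2
print(gain_to_recover(0.5))                    # 1.0 (i.e. +100%)
```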
**Win Rate** (percentage of profitable trades):
```
Formula: (Winning Trades / Total Trades) × 100%
Interpretation:
45-65% → Realistic for most strategies
>70% → Suspicious (possible overfitting or unrealistic fills)
<40% → Needs improvement (unless very high profit factor)
Why it matters:
- High win rate doesn't guarantee profitability
- Can have 40% win rate but profitable (if winners > losers)
- Very high win rate (>75%) often indicates overfitting
Common misconception: Higher is always better
Reality: 40% win rate with 3:1 reward:risk is better than 60% win rate with 1:1
```
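The claim that a 40% win rate at 3:1 beats a 60% win rate at 1:1 follows from expectancy. The sketch below assumes, for simplicity, that every losing trade loses exactly 1R (one unit of risk) and every winner gains `reward_to_risk` R; breakeven trades are counted as non-wins. Both helper names are hypothetical.

```python
def win_rate(trade_pnls):
    """Fraction of trades with positive P&L (breakeven trades count as non-wins)."""
    if not trade_pnls:
        raise ValueError("no trades")
    wins = sum(1 for pnl in trade_pnls if pnl > 0)
    return wins / len(trade_pnls)


def expectancy_in_r(p_win, reward_to_risk):
    """Average result per trade in units of risk (R).

    Simplifying assumption: each loser loses 1R, each winner gains reward_to_risk R.
    """
    return p_win * reward_to_risk - (1 - p_win)


print(round(expectancy_in_r(0.40, 3.0), 2))  # 0.6 R per trade
print(round(expectancy_in_r(0.60, 1.0), 2))  # 0.2 R per trade
```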
**Profit Factor** (gross profit / gross loss):
```
Formula: Sum of Winning Trade P&L / |Sum of Losing Trade P&L|
Interpretation:
>2.0 → Excellent
1.5-2.0 → Good
1.2-1.5 → Acceptable
<1.2 → Marginal (risky to deploy)
<1.0 → Unprofitable (losses exceed profits)
Why it matters:
- Simple profitability measure
- <1.5 means small edge, vulnerable to slippage/fees
- Combines win rate and average win/loss size into a single metric
Example:
10 trades: 6 winners ($100 each), 4 losers ($50 each)
Gross profit: $600, Gross loss: $200
Profit factor: $600 / $200 = 3.0 (excellent)
```
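A minimal sketch of the profit-factor calculation, reproducing the 10-trade example above. The handling of a strategy with no losing trades (returning infinity, or NaN when there are no trades at all) is a choice made for this illustration.

```python
import math


def profit_factor(trade_pnls):
    """Gross profit divided by the absolute value of gross loss."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = abs(sum(p for p in trade_pnls if p < 0))
    if gross_loss == 0:
        # No losing trades: the ratio is unbounded (undefined if there are no wins either).
        return math.inf if gross_profit > 0 else math.nan
    return gross_profit / gross_loss


trades = [100] * 6 + [-50] * 4  # 6 winners of $100, 4 losers of $50
print(profit_factor(trades))    # 3.0
```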
**Total Return vs Annual Return**:
```
Total Return: 50% over 6 months
Annual Return: ~100% if extrapolated linearly to 12 months; ~125% if compounded ((1.5)² - 1)
Why both matter:
- Total return: Actual profit over test period
- Annual return: Standardized for comparison across time periods
- Longer test periods are more reliable (6-12 months minimum)
```
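Which annualization convention the backtester uses is not stated, so this sketch computes both. The test period is expressed in years (6 months = 0.5) to avoid day-count ambiguity; the function names are illustrative.

```python
def annualize_linear(total_return, years):
    """Scale a period return to one year without compounding."""
    return total_return / years


def annualize_compound(total_return, years):
    """Geometric (compounded) annualization: (1 + R) ** (1 / years) - 1."""
    return (1 + total_return) ** (1 / years) - 1


print(annualize_linear(0.50, 0.5))    # 1.0  -> 100% per year
print(annualize_compound(0.50, 0.5))  # 1.25 -> 125% per year
```

When comparing results across strategies, make sure both figures use the same convention.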
### Testing Methodology
**Minimum data requirements**:
```
Quick test: 1-3 months
- Limited validation
- Use for initial screening only
- High risk of luck/overfitting
Standard test: 6-12 months (RECOMMENDED MINIMUM)
- Captures multiple market regimes
- Sufficient trades for statistical significance
- Industry standard for strategy validation
Robust test: 12-24 months
- Ideal for high-confidence validation
- Includes bull, bear, and ranging markets
- Best for strategies before live deployment
```
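Before calling `run_backtest`, you can check which tier a date range falls into. This sketch assumes `end_date` is inclusive and approximates a month as 30.44 days; the helper name `classify_test_window` is hypothetical.

```python
from datetime import date

AVG_DAYS_PER_MONTH = 365.25 / 12  # about 30.44 days


def classify_test_window(start_date, end_date):
    """Classify a YYYY-MM-DD date range into the testing tiers above."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    # +1 because the end date is assumed to be inclusive.
    months = ((end - start).days + 1) / AVG_DAYS_PER_MONTH
    if months < 6:
        tier = "quick (screening only)"
    elif months < 12:
        tier = "standard"
    else:
        tier = "robust"
    return round(months, 1), tier


print(classify_test_window("2024-01-01", "2024-08-31"))
```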
**Multi-period testing** (essential for robustness):
```
1. Train period: 2024-01-01 to 2024-08-31
run_backtest(..., start_date="2024-01-01", end_date="2024-08-31")
→ Sharpe: 1.5
2. Validation period: 2024-09-01 to 2024-12-31
run_backtest(..., start_date="2024-09-01", end_date="2024-12-31")
→ Sharpe: 1.3
3. Compare:
Performance similar → Robust strategy ✓
Performance degraded significantly → Overfit to train period ✗
```
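The "Compare" step can be made explicit by measuring the relative drop in Sharpe from the train period to the validation period. The 30% cutoff used here is an illustrative assumption, not an established standard; pick a threshold that matches your own risk tolerance.

```python
def sharpe_degradation(train_sharpe, validation_sharpe):
    """Relative drop in Sharpe from the train period to the validation period."""
    if train_sharpe <= 0:
        raise ValueError("train Sharpe must be positive to measure degradation")
    return (train_sharpe - validation_sharpe) / train_sharpe


def looks_robust(train_sharpe, validation_sharpe, max_drop=0.30):
    # max_drop=0.30 is a hypothetical cutoff chosen for illustration.
    return sharpe_degradation(train_sharpe, validation_sharpe) <= max_drop


print(round(sharpe_degradation(1.5, 1.3), 3))  # 0.133 -> about a 13% drop
print(looks_robust(1.5, 1.3))                  # True
```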
**Market regime testing**:
```
Test strategy across different market conditions:
1. Trending up (bull market): 2023-10 to 2024-03
→ Sharpe: 1.8
2. Trending down (bear market): 2024-04 to 2024-07
→ Sharpe: 0.9
3. Ranging (sideways): 2024-08 to 2024-12
→ Sharpe: 1.1
Analysis:
- Works acceptably in all regimes → Robust ✓
- Works only in specific regimes → Acceptable if expected (e.g., trend-following does well in trends)
- Fails in all regimes → Fundamentally broken ✗
```
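A sketch of the regime analysis, using the 0.5 "acceptable" Sharpe floor from the interpretation table earlier in this guide as the pass mark. Regime names and the function name are illustrative.

```python
def classify_regimes(sharpe_by_regime, min_sharpe=0.5):
    """Summarize per-regime Sharpe ratios against a minimum acceptable value."""
    if not sharpe_by_regime:
        raise ValueError("no regimes supplied")
    passing = [name for name, s in sharpe_by_regime.items() if s >= min_sharpe]
    if len(passing) == len(sharpe_by_regime):
        return "robust: acceptable in every regime"
    if passing:
        return "regime-dependent: acceptable only in " + ", ".join(passing)
    return "broken: below threshold in every regime"


results = {"bull": 1.8, "bear": 0.9, "ranging": 1.1}  # example figures above
print(classify_regimes(results))  # robust: acceptable in every regime
```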
### Red Flags (Overfitting Indicators)
**Warning signs that backtest results may not persist**:
**1. Unrealistically high win rate (>70%)**:
```
Win rate: 82%
Problem: Markets are noisy; >70% suggests strategy memorized past data
Solution: Test on out-of-sample data; expect performance degradation
```
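The win-rate bands from the interpretation section and this red flag can be encoded as a quick check. Rates between the guide's bands (40-45% and 65-70%) are labeled "borderline" here; that handling is an assumption, since the guide does not classify them.

```python
def win_rate_assessment(win_rate):
    """Map a win rate (0-1) onto the bands used in this guide."""
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win rate must be between 0 and 1")
    if win_rate > 0.70:
        return "red flag: check out-of-sample data for overfitting"
    if 0.45 <= win_rate <= 0.65:
        return "realistic"
    if win_rate < 0.40:
        return "needs improvement unless profit factor is high"
    # 40-45% and 65-70% fall between the guide's bands.
    return "borderline: judge together with profit factor"


print(win_rate_assessment(0.82))
```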
**2. Very few trades (<20 in 6 months)**: