
rangebar-eval-metrics
by terrylica
Claude Code Skills Marketplace: plugins, skills for ADR-driven development, DevOps automation, ClickHouse management, semantic versioning, and productivity workflows
SKILL.md
name: rangebar-eval-metrics
description: >
  SOTA metrics for evaluating range bar (price-based sampling) financial data.
  Use when computing Sharpe ratios, risk metrics, ML prediction quality for range bars.
  TRIGGERS - range bar metrics, evaluate range bars, Sharpe ratio range bars, WFO metrics, walk-forward metrics, BiLSTM evaluation, crypto metrics, daily aggregation, sqrt(7), sqrt(365), PSR DSR MinTRL, IC information coefficient.
allowed-tools: Read, Grep, Glob, Bash
Range Bar Evaluation Metrics
Machine-readable reference + computation scripts for state-of-the-art metrics evaluating range bar (price-based sampling) data.
Quick Start
# Compute metrics from predictions + actuals
python scripts/compute_metrics.py --predictions preds.npy --actuals actuals.npy --timestamps ts.npy
# Generate full evaluation report
python scripts/generate_report.py --results folds.jsonl --output report.md
Metric Tiers
| Tier | Purpose | Metrics | Compute |
|---|---|---|---|
| Primary (5) | Research decisions | weekly_sharpe, hit_rate, cumulative_pnl, n_bars, positive_sharpe_rate | Per-fold + aggregate |
| Secondary/Risk (5) | Additional context | max_drawdown, bar_sharpe, return_per_bar, profit_factor, cv_fold_returns | Per-fold |
| ML Quality (3) | Prediction health | ic, prediction_autocorr, is_collapsed | Per-fold |
| Diagnostic (5) | Final validation | psr, dsr, autocorr_lag1, effective_n, binomial_pvalue | Aggregate only |
| Extended Risk (5) | Deep risk analysis | var_95, cvar_95, omega_ratio, sortino_ratio, ulcer_index | Per-fold (optional) |
Why Range Bars Need Special Treatment
Range bars violate standard IID assumptions:
- Variable duration: Bars form based on price movement, not time
- Autocorrelation: High-volatility periods cluster bars → temporal correlation
- Non-constant information: More bars during volatility = more information per day
Canonical solution: Daily aggregation via _group_by_day() before Sharpe calculation.
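The helper `_group_by_day()` is referenced but not shown here. A minimal sketch of what such a helper might look like follows; the name `group_by_day_sketch`, the `datetime64` timestamp type, and the use of UTC calendar days are all assumptions, not the skill's actual implementation.

```python
import numpy as np


def group_by_day_sketch(pnl: np.ndarray, timestamps: np.ndarray) -> np.ndarray:
    """Sum per-bar PnL into one value per calendar day (hypothetical helper).

    Assumes `timestamps` is a NumPy datetime64 array aligned with `pnl`.
    """
    pnl = np.asarray(pnl, dtype=float)
    if pnl.size == 0:
        return np.array([], dtype=float)
    # Truncate each timestamp to its calendar day.
    days = np.asarray(timestamps).astype("datetime64[D]")
    # `inverse` maps every bar to the index of its day in `unique_days`.
    unique_days, inverse = np.unique(days, return_inverse=True)
    # Weighted bin count = sum of PnL per day, in chronological order.
    return np.bincount(inverse, weights=pnl, minlength=len(unique_days))
```

Note that in this sketch days without any bars are simply absent rather than zero-filled; whether quiet days should count as zero-PnL days is a design choice that changes the resulting Sharpe.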
References
Core Reference Files
| Topic | Reference File |
|---|---|
| Sharpe Ratio Calculations | sharpe-formulas.md |
| Risk Metrics (VaR, Omega, Ulcer) | risk-metrics.md |
| ML Prediction Quality (IC, Autocorr) | ml-prediction-quality.md |
| Crypto Market Considerations | crypto-markets.md |
| Temporal Aggregation Rules | temporal-aggregation.md |
| JSON Schema for Metrics | metrics-schema.md |
| Anti-Patterns (Transaction Costs) | anti-patterns.md |
| SOTA 2025-2026 (SHAP, BOCPD, etc.) | sota-2025-2026.md |
| Worked Examples (BTC, EUR/USD) | worked-examples.md |
| Structured Logging (NDJSON) | structured-logging.md |
Related Skills
| Skill | Relationship |
|---|---|
| adaptive-wfo-epoch | Uses weekly_sharpe, psr, dsr for WFE calculation |
Dependencies
pip install -r requirements.txt
# Or (quoted so the shell does not treat ">" as a redirect):
# pip install "numpy>=1.24" "pandas>=2.0" "scipy>=1.10"
Key Formulas
Daily-Aggregated Sharpe (Primary Metric)
import numpy as np

def weekly_sharpe(pnl: np.ndarray, timestamps: np.ndarray) -> float:
    """Sharpe with daily aggregation for range bars."""
    daily_pnl = _group_by_day(pnl, timestamps)  # Sum PnL per calendar day
    if len(daily_pnl) < 2 or np.std(daily_pnl) == 0:
        return 0.0
    daily_sharpe = np.mean(daily_pnl) / np.std(daily_pnl)
    # For crypto (7-day week): sqrt(7). For equities: sqrt(5)
    return daily_sharpe * np.sqrt(7)  # Crypto default
Information Coefficient (Prediction Quality)
import numpy as np
from scipy.stats import spearmanr

def information_coefficient(predictions: np.ndarray, actuals: np.ndarray) -> float:
    """Spearman rank IC - captures rank-order alignment of predictions with actuals."""
    ic, _ = spearmanr(predictions, actuals)
    return ic  # Range: [-1, 1]. >0.02 acceptable, >0.05 good, >0.10 excellent
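Because the IC above is a Spearman correlation, it depends only on the ordering of predictions, not on their scale. The following small illustration uses synthetic data (the random inputs and variable names are made up for the example) to show that a monotone transform of the target scores an IC of 1, while unrelated noise scores near 0.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
actuals = rng.normal(size=500)

# A monotone (but highly non-linear) function of the target preserves ranks.
monotone_predictions = np.exp(actuals)
ic_monotone, _ = spearmanr(monotone_predictions, actuals)

# Predictions unrelated to the target should give an IC close to zero.
noise_predictions = rng.normal(size=500)
ic_noise, _ = spearmanr(noise_predictions, actuals)

print(f"IC (monotone transform): {ic_monotone:.3f}")
print(f"IC (pure noise):         {ic_noise:.3f}")
```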
Probabilistic Sharpe Ratio (Statistical Validation)
from scipy.stats import norm

def psr(sharpe: float, se: float, benchmark: float = 0.0) -> float:
    """P(true Sharpe > benchmark)."""
    return norm.cdf((sharpe - benchmark) / se)
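The `psr` function takes a standard error `se`, but how it is obtained is not shown in this section. One common choice is the non-normality-adjusted standard error associated with Mertens (2002) and used in the Probabilistic Sharpe Ratio literature. The sketch below assumes that choice and assumes per-period (for example daily-aggregated) returns; the function names are hypothetical and the skill's scripts may compute this differently.

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew


def sharpe_standard_error(returns: np.ndarray) -> float:
    """Standard error of the per-period Sharpe ratio, adjusted for skew and kurtosis."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    sr = r.mean() / r.std(ddof=1)       # per-period (non-annualized) Sharpe
    g3 = skew(r)                        # sample skewness
    g4 = kurtosis(r, fisher=False)      # raw kurtosis (normal distribution = 3)
    variance = (1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr**2) / (n - 1)
    return float(np.sqrt(variance))


def psr_from_returns(returns: np.ndarray, benchmark: float = 0.0) -> float:
    """P(true Sharpe > benchmark), with Sharpe and SE on the same frequency."""
    r = np.asarray(returns, dtype=float)
    sr = r.mean() / r.std(ddof=1)
    return float(norm.cdf((sr - benchmark) / sharpe_standard_error(r)))
```

The Sharpe ratio and its standard error must be expressed at the same frequency: passing a weekly-scaled Sharpe together with a daily standard error would inflate the resulting probability.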
Annualization Factors
| Market | Daily → Weekly | Daily → Annual | Rationale |
|---|---|---|---|
| Crypto (24/7) | sqrt(7) = 2.65 | sqrt(365) = 19.1 | 7 trading days/week |
| Equity | sqrt(5) = 2.24 | sqrt(252) = 15.9 | 5 trading days/week |
NEVER use sqrt(252) for crypto markets.
CRITICAL: Session Filter Changes Annualization
| View | Filter | Weekly factor (sqrt(days_per_week)) | Rationale |
|---|---|---|---|
| Session-filtered (London-NY) | Weekdays, London 08:00 to NY 16:00 | sqrt(5) | Trading like equities |
| All-bars (unfiltered) | None | sqrt(7) | Full 24/7 crypto |
Using sqrt(7) for session-filtered data overstates Sharpe by ~18%!
See crypto-markets.md for detailed rationale.
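The ~18% figure follows directly from the ratio of the two weekly scaling factors. A quick arithmetic check (variable names are illustrative only), which also shows the size of the error made by applying sqrt(252) to 24/7 crypto data:

```python
import math

# Session-filtered data scaled with sqrt(7) instead of sqrt(5):
weekly_overstatement = math.sqrt(7) / math.sqrt(5) - 1.0

# 24/7 crypto data annualized with sqrt(252) instead of sqrt(365):
annual_understatement = 1.0 - math.sqrt(252) / math.sqrt(365)

print(f"sqrt(7) vs sqrt(5):     +{weekly_overstatement:.1%}")
print(f"sqrt(252) vs sqrt(365): -{annual_understatement:.1%}")
```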
Dual-View Metrics
For comprehensive analysis, compute metrics with BOTH views:
- Session-filtered (London 08:00 to NY 16:00): Primary strategy evaluation
- All-bars: Regime detection, data quality diagnostics
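A session filter for the first view could be sketched as below. This is an illustration under stated assumptions: timestamps are timezone-aware UTC, "London 08:00" and "NY 16:00" are interpreted in their local time zones (so daylight saving is handled by the conversion), and the function name `session_mask` is hypothetical.

```python
import numpy as np
import pandas as pd


def session_mask(timestamps_utc: pd.DatetimeIndex) -> np.ndarray:
    """Boolean mask: True for bars between London 08:00 and New York 16:00 on weekdays."""
    london = timestamps_utc.tz_convert("Europe/London")
    new_york = timestamps_utc.tz_convert("America/New_York")
    is_weekday = london.dayofweek < 5        # Monday=0 ... Friday=4
    after_london_open = london.hour >= 8     # local London time
    before_ny_close = new_york.hour < 16     # local New York time
    return np.asarray(is_weekday & after_london_open & before_ny_close)


bars = pd.to_datetime(
    ["2024-01-03 07:30", "2024-01-03 12:00", "2024-01-03 21:30", "2024-01-06 12:00"],
    utc=True,
)
mask = session_mask(bars)  # only the Wednesday 12:00 UTC bar is inside the session
```

Metrics computed on `bars[mask]` would then use the sqrt(5) factor, while metrics on all bars keep sqrt(7).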
Academic References
| Concept | Citation |
|---|---|
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) |
| Sharpe SE with Non-Normality | Mertens (2002) |
| Statistics of Sharpe Ratios | Lo (2002) |
| Omega Ratio | Keating & Shadwick (2002) |
| Ulcer Index | Peter Martin (1987) |
Decision Framework
Go Criteria (Research)
go_criteria:
- positive_sharpe_rate > 0.55
- mean_weekly_sharpe > 0
- cv_fold_returns < 1.5
- mean_hit_rate > 0.50
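The go criteria above can be checked mechanically once per-fold results are available. The sketch below assumes that `positive_sharpe_rate` is the fraction of folds with a positive weekly Sharpe and that `cv_fold_returns` is the standard deviation of fold returns divided by the absolute mean; both definitions are assumptions, as is the function name.

```python
import numpy as np


def evaluate_go_criteria(fold_sharpes, fold_returns, fold_hit_rates) -> dict:
    """Evaluate each research go criterion and an overall 'go' flag."""
    sharpes = np.asarray(fold_sharpes, dtype=float)
    returns = np.asarray(fold_returns, dtype=float)
    hit_rates = np.asarray(fold_hit_rates, dtype=float)

    mean_return = returns.mean()
    # Coefficient of variation; treated as infinite when the mean is ~0.
    cv = returns.std() / abs(mean_return) if abs(mean_return) > 1e-12 else float("inf")

    checks = {
        "positive_sharpe_rate > 0.55": float(np.mean(sharpes > 0)) > 0.55,
        "mean_weekly_sharpe > 0": float(sharpes.mean()) > 0.0,
        "cv_fold_returns < 1.5": float(cv) < 1.5,
        "mean_hit_rate > 0.50": float(hit_rates.mean()) > 0.50,
    }
    checks["go"] = all(checks.values())
    return checks
```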
Publication Criteria
publication_criteria:
- binomial_pvalue < 0.05
- psr > 0.85
- dsr > 0.50 # If n_trials > 1
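This section does not define what `binomial_pvalue` tests. One plausible reading, sketched here as an assumption, is a one-sided binomial test of whether the number of profitable folds exceeds what a coin flip would produce. `scipy.stats.binomtest` is a real SciPy function; the wrapper name is hypothetical.

```python
from scipy.stats import binomtest


def binomial_pvalue_sketch(n_positive_folds: int, n_folds: int) -> float:
    """One-sided p-value for H0: P(fold is profitable) = 0.5."""
    result = binomtest(n_positive_folds, n_folds, p=0.5, alternative="greater")
    return float(result.pvalue)


# 9 profitable folds out of 10: p = (C(10,9) + C(10,10)) / 2**10 = 11/1024
p_value = binomial_pvalue_sketch(9, 10)
```

With 9 of 10 folds profitable the p-value is about 0.011, which would clear the `binomial_pvalue < 0.05` threshold; with 6 of 10 it would not.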
Scripts
| Script | Purpose |
|---|---|
| scripts/compute_metrics.py | Compute all metrics from predictions/actuals |
| scripts/generate_report.py | Generate Markdown report from fold results |
| scripts/validate_schema.py | Validate metrics JSON against schema |
Remediations (2026-01-19 Multi-Agent Audit)
The following fixes were applied based on a 12-subagent adversarial audit:
| Issue | Root Cause | Fix | Source |
|---|---|---|---|
| weekly_sharpe=0 | Constant predictions | Model collapse detection + architecture fix | model-expert |
| IC=None | Zero variance predictions | Return 1.0 for constant (semantically correct) | model-expert |
| prediction_autocorr=NaN | Division by zero | Guard for std < 1e-10, return 1.0 | model-expert |
| Ulcer Index divide-by-zero | Peak equity = 0 | Guard with np.where(peak > 1e-10, ...) | risk-analyst |
| Omega/Profit Factor unreliable | Too few samples | min_days parameter (default: 5) | robustness-analyst |
| BiLSTM mean collapse | Architecture too small | hidden_size: 16→48, dropout: 0.5→0.3 | model-expert |
| profit_factor=1.0 (n_bars=0) | Early return wrong value | Return NaN when no data to compute ratio | risk-analyst |
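Two of the guards in the table can be illustrated as follows. This is a sketch under assumptions, not the audited code: the Ulcer Index is taken in its usual form (root mean square of percentage drawdowns from the running peak), the profit factor as gross gains over gross losses, and the function names are hypothetical.

```python
import numpy as np


def ulcer_index_sketch(equity: np.ndarray) -> float:
    """Root-mean-square percentage drawdown, guarded against a zero peak."""
    equity = np.asarray(equity, dtype=float)
    if equity.size == 0:
        return float("nan")
    peak = np.maximum.accumulate(equity)
    # Substitute a dummy divisor where the peak is ~0 so no division by zero occurs.
    safe_peak = np.where(peak > 1e-10, peak, 1.0)
    drawdown_pct = np.where(peak > 1e-10, (equity - peak) / safe_peak * 100.0, 0.0)
    return float(np.sqrt(np.mean(drawdown_pct**2)))


def profit_factor_sketch(pnl: np.ndarray) -> float:
    """Gross gains / gross losses; NaN when there is no data to form a ratio."""
    pnl = np.asarray(pnl, dtype=float)
    if pnl.size == 0:
        return float("nan")
    gains = pnl[pnl > 0].sum()
    losses = -pnl[pnl < 0].sum()
    if losses == 0:
        return float("inf") if gains > 0 else float("nan")
    return float(gains / losses)
```

Returning NaN rather than 1.0 for an empty fold keeps "no data" distinguishable from "break-even" when folds are aggregated later.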
Model Collapse Detection
# ALWAYS check for model collapse after prediction
pred_std = np.std(predictions)
if pred_std < 1e-6:
    logger.warning(
        f"Constant predictions detected (std={pred_std:.2e}). "
        "Model collapsed to mean - check architecture."
    )
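The same idea applies to the `prediction_autocorr` guard listed in the remediation table. A minimal sketch, assuming lag-1 Pearson autocorrelation and following the table's convention of returning 1.0 for (near-)constant predictions; the function name is hypothetical.

```python
import numpy as np


def prediction_autocorr_sketch(predictions: np.ndarray) -> float:
    """Lag-1 autocorrelation of predictions with a zero-variance guard."""
    p = np.asarray(predictions, dtype=float)
    if p.size < 3:
        return float("nan")  # too few points to correlate
    head, tail = p[:-1], p[1:]
    # Constant (or near-constant) series: correlation is undefined, so follow
    # the convention from the remediation table and report 1.0.
    if np.std(head) < 1e-10 or np.std(tail) < 1e-10:
        return 1.0
    return float(np.corrcoef(head, tail)[0, 1])
```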
Recommended BiLSTM Architecture
# BEFORE (causes collapse on range bars)
HIDDEN_SIZE = 16
DROPOUT = 0.5
# AFTER (prevents collapse)
HIDDEN_SIZE = 48 # Triple capacity
DROPOUT = 0.3 # Less aggressive regularization
See reference docs for complete implementation details.

