How to Compare Backtest Results for a 0DTE Strategy

Comparing backtests is about evidence, not impressions. If you change entry time, deltas, or management rules, you need a consistent method to decide whether Version B is actually better than Version A, or only looks better because of chance or because the settings drifted between runs. This guide gives a practical, repeatable process tailored to 0DTE SPX strategies and GreeksLab’s analytics.

1) Make It an Apples-to-Apples Test

Keep everything identical except the variable you’re testing.

Hold constant:

Date range and trading calendar (exclude half-days or keep them in both)
Capital, sizing model, commissions, slippage
Entry window and its granularity (e.g., 9:31 vs 9:45), unless entry time is the variable being tested
Underlying/universe (e.g., SPX only)

GreeksLab tip: Duplicate a strategy, change one parameter, and re-run. Log the change (“A: 16Δ; B: 12Δ”) and the run IDs.
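One way to enforce the one-change rule is to diff the two runs' settings before looking at any results. The sketch below is illustrative only: the dictionary keys are hypothetical and do not reflect a GreeksLab export format, so adapt them to however you record your settings.

```python
def changed_settings(run_a: dict, run_b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every setting that differs."""
    keys = set(run_a) | set(run_b)
    return {
        k: (run_a.get(k), run_b.get(k))
        for k in sorted(keys)
        if run_a.get(k) != run_b.get(k)
    }

# Hypothetical settings; key names are made up for illustration.
run_a = {"start": "2022-01-03", "end": "2024-12-31", "slippage": 0.05,
         "commission": 1.50, "entry_time": "09:45", "short_delta": 16}
run_b = {**run_a, "short_delta": 12}

diff = changed_settings(run_a, run_b)
print(diff)  # {'short_delta': (16, 12)}

# Refuse to compare if more (or fewer) than one setting changed.
if len(diff) != 1:
    raise ValueError(f"Expected exactly one changed setting, found: {diff}")
```

If the check raises, fix the settings drift first; any performance gap between the runs cannot be attributed to a single parameter.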

2) Choose Primary and Guardrail Metrics

Pick 1–2 primary metrics that reflect your objective, and 3–5 guardrails to control risk.

Common primary metrics

Risk-adjusted return: Sharpe, Sortino
Profit efficiency: Average daily P&L, Return / Margin used
Drawdown efficiency: MAR (CAGR / Max DD) for multi-year tests

Guardrails

Max drawdown (absolute and %)
CVaR / Expected shortfall (e.g., average P&L of the worst 5% of days)
Win rate vs payoff ratio (don’t accept a higher win rate if the payoff ratio collapses)
Tail risk indicators: worst day, cluster of losses
Exposure: average and peak number of open positions

GreeksLab tip: Use the Overview tab for headline KPIs and the P&L distribution and underwater charts to sanity-check tails.
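If you export daily P&L and want to recompute these metrics yourself, the following is a minimal sketch under stated assumptions: a zero risk-free rate, 252 trading days per year, a zero target for Sortino, and drawdown measured on cumulative dollar P&L. GreeksLab may define its KPIs differently, so treat the numbers as a cross-check rather than a replica. The sample P&L series is made up.

```python
import math
import statistics

def sharpe(daily_pnl, periods=252):
    """Annualized mean / sample stdev of daily P&L (risk-free rate assumed zero)."""
    return statistics.mean(daily_pnl) / statistics.stdev(daily_pnl) * math.sqrt(periods)

def sortino(daily_pnl, periods=252):
    """Like Sharpe, but divides by downside deviation around a zero target.
    Raises ZeroDivisionError if there are no losing days."""
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in daily_pnl) / len(daily_pnl))
    return statistics.mean(daily_pnl) / downside * math.sqrt(periods)

def max_drawdown(daily_pnl):
    """Largest peak-to-trough drop in cumulative P&L, as a positive number."""
    equity = peak = worst = 0.0
    for x in daily_pnl:
        equity += x
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def cvar(daily_pnl, alpha=0.05):
    """Mean of the worst alpha fraction of days (always at least one day)."""
    n = max(1, math.ceil(alpha * len(daily_pnl)))
    return statistics.mean(sorted(daily_pnl)[:n])

# Made-up daily P&L in dollars, for illustration only.
pnl = [120, -300, 80, 150, -50, 200, -400, 90, 110, 60]
print(max_drawdown(pnl))   # 400.0
print(cvar(pnl, 0.2))      # -350
print(round(sharpe(pnl), 2), round(sortino(pnl), 2))
```

Computing all four from the same daily series for both versions keeps the primary metric and the guardrails on an identical footing.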

3) Compare by Market Regime (Not Just Aggregate)

A strategy can “win” overall while losing in key regimes.

Segment by:

Volatility (e.g., VIX buckets: <15, 15–20, 20–25, >25)
Trend/Range (up, down, chop)
Time-of-day (entry hour)
Event days (e.g., FOMC vs non-event)

GreeksLab tip: Use the Insights tab to slice results by volatility, weekday, entry hour, etc. Prefer a version that is robust across slices, not just top-line.
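The volatility slicing above can also be reproduced on exported data. This sketch assumes you have, for each day, a VIX reading plus the daily P&L of both versions; the rows are invented, and the handling of boundary values (exactly 15, 20, or 25) is an assumption, since the bucket list does not say which side they fall on.

```python
from collections import defaultdict
from statistics import mean

def vix_bucket(vix: float) -> str:
    """Map a VIX level to a bucket. Assumed boundaries: 15 and 20 go to the
    higher bucket; 25 stays in '20-25' because the top bucket is '>25'."""
    if vix < 15:
        return "<15"
    if vix < 20:
        return "15-20"
    if vix <= 25:
        return "20-25"
    return ">25"

def mean_delta_by_regime(rows):
    """rows: iterable of (vix, pnl_a, pnl_b).
    Returns {bucket: (number_of_days, mean of PnL_B - PnL_A)}."""
    buckets = defaultdict(list)
    for vix, pnl_a, pnl_b in rows:
        buckets[vix_bucket(vix)].append(pnl_b - pnl_a)
    return {name: (len(d), mean(d)) for name, d in buckets.items()}

# Invented sample rows: (VIX, P&L of A, P&L of B).
rows = [(13.2, 100, 120), (14.8, -50, -40), (17.5, 80, 60),
        (22.0, -200, -150), (31.0, -400, -600)]

for name, (n, avg) in mean_delta_by_regime(rows).items():
    print(f"VIX {name}: {n} day(s), mean delta {avg}")
```

Reporting the day count next to each mean matters: a bucket with only a handful of days says little about robustness, however large the difference looks.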

4) Use Paired, Day-Matched Comparisons

When A and B trade on the same days, compare their day-by-day differences. It’s more sensitive and fair than comparing unpaired aggregates.

Procedure

Align the two backtests on calendar days.
Compute ΔPnL = PnL_B − PnL_A per day.
Review:
- Median ΔPnL (less sensitive to outliers)
- % of days B > A
- Worst ΔPnL (downside surprise)
- Distribution of ΔPnL (fat left tail?)

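Why it matters: 0DTE returns are non-normal and serially dependent. Because both versions face the same market path on each day, differencing their P&L cancels much of the shared day-to-day noise and isolates the effect of the change.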
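The procedure above can be sketched in a few lines. The example assumes each backtest is available as a mapping from ISO date to daily P&L (a hypothetical structure, with invented values) and compares only the days on which both versions have a result.

```python
from statistics import median

def paired_summary(pnl_a: dict, pnl_b: dict) -> dict:
    """pnl_a / pnl_b map ISO date -> daily P&L.
    Only days present in both backtests are compared."""
    days = sorted(set(pnl_a) & set(pnl_b))
    delta = [pnl_b[d] - pnl_a[d] for d in days]
    return {
        "days": len(delta),
        "median_delta": median(delta),
        "pct_b_better": 100.0 * sum(x > 0 for x in delta) / len(delta),
        "worst_delta": min(delta),
    }

# Invented sample data; B has one extra day that A did not trade.
pnl_a = {"2024-03-04": 100, "2024-03-05": -250, "2024-03-06": 80, "2024-03-07": 40}
pnl_b = {"2024-03-04": 130, "2024-03-05": -400, "2024-03-06": 95,
         "2024-03-07": 60, "2024-03-08": 10}

print(paired_summary(pnl_a, pnl_b))
# {'days': 4, 'median_delta': 17.5, 'pct_b_better': 75.0, 'worst_delta': -150}
```

In this toy data B is ahead on three of four days and on the median, yet its worst relative day is a $150 shortfall, which is exactly the kind of trade-off the aggregate totals would hide.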

5) Look Beyond Averages: Distributions and Tails

Histogram / KDE of daily PnL: Did the “better” version just add a few outsized wins?
Left-tail focus: Compare the 5th and 1st percentiles (or CVaR). 0DTE can fail by tail clustering.
Run-lengths: Max consecutive loss days. Can you survive that sequence?

GreeksLab tip: Use the P&L distribution and drawdown charts. A small Sharpe improvement is not worth a much fatter left tail.
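For a numeric check alongside the charts, the left-tail percentiles and the longest losing streak can be computed directly from daily P&L. This is a sketch using the nearest-rank percentile definition, which is one of several conventions; other tools may interpolate and give slightly different values. The sample series is invented.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest observation such that at least
    p percent of the data is at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def max_losing_streak(daily_pnl):
    """Longest run of consecutive days with negative P&L."""
    longest = current = 0
    for x in daily_pnl:
        current = current + 1 if x < 0 else 0
        longest = max(longest, current)
    return longest

# Invented daily P&L in dollars.
pnl = [50, -20, -30, -10, 40, -5, -15, 60, 70, 80]
print(percentile(pnl, 5))       # -30
print(max_losing_streak(pnl))   # 3
```

Run both functions on Version A and Version B over the same days; a version whose 1st and 5th percentiles are clearly worse deserves scrutiny even if its average improved.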

6) Sample Size and Stability Checks

Minimum sample: For intraday 0DTE, aim for hundreds of trading days across multiple vol regimes.
Stability: Break your test into yearly or quarterly chunks. Does B outperform A in most chunks?
Walk-forward: Optimize on Period 1, validate on Period 2, then roll forward.
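The chunked stability check and the walk-forward split can be sketched as follows. The input is assumed to be the per-day difference PnL_B − PnL_A keyed by ISO date (invented values here), and the window lengths passed to the walk-forward helper are arbitrary illustration values, not recommendations.

```python
from collections import defaultdict

def delta_by_year(delta_by_day: dict) -> dict:
    """delta_by_day maps ISO date -> (PnL_B - PnL_A). Returns {year: summed delta}."""
    totals = defaultdict(float)
    for day, delta in delta_by_day.items():
        totals[day[:4]] += delta
    return dict(totals)

def share_of_chunks_won(totals: dict) -> float:
    """Fraction of chunks in which B beat A (summed delta > 0)."""
    return sum(v > 0 for v in totals.values()) / len(totals)

def walk_forward_windows(days, train, test):
    """Yield (train_days, test_days) pairs, rolling forward by `test` days each step."""
    start = 0
    while start + train + test <= len(days):
        yield days[start:start + train], days[start + train:start + train + test]
        start += test

# Invented per-day deltas.
deltas = {"2022-03-01": 40.0, "2022-09-15": -10.0, "2023-02-10": -60.0,
          "2023-11-03": 25.0, "2024-05-20": 30.0}
totals = delta_by_year(deltas)
print(totals)                        # {'2022': 30.0, '2023': -35.0, '2024': 30.0}
print(share_of_chunks_won(totals))   # B wins 2 of 3 years

for train_days, test_days in walk_forward_windows(list(range(10)), train=4, test=2):
    print(train_days, "->", test_days)
```

Swapping `day[:4]` for a year-plus-quarter key gives the quarterly version of the same check.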

Red flags

Performance concentrated in a short window
Version B wins only in one regime you over-weighted
Highly parameter-sensitive results (tiny tweaks flip the outcome)

7) Multiple Comparisons Discipline

If you run 20 variations, some will “win” by chance.

Mitigations

Pre-register 2–3 hypotheses (e.g., “12Δ vs 16Δ,” “9:31 vs 10:00 entry”).
Use out-of-sample validation or walk-forward.
Prefer simpler rules if performance is similar.
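To put a rough number on the "20 variations" problem: if each variation is judged against a 5% false-positive threshold and the tests were independent, the chance that at least one looks like a winner by luck alone is about 64%. The threshold and the independence assumption are both illustrative (backtest variants usually share data and are correlated), and the Bonferroni adjustment shown is one simple, conservative correction rather than the only option.

```python
def chance_of_false_winner(n_variants: int, alpha: float = 0.05) -> float:
    """P(at least one variant passes at level alpha purely by luck),
    assuming the n tests are independent."""
    return 1.0 - (1.0 - alpha) ** n_variants

def bonferroni_alpha(n_variants: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall false-positive rate near alpha."""
    return alpha / n_variants

print(round(chance_of_false_winner(20), 3))  # 0.642
print(bonferroni_alpha(20))                  # stricter per-test threshold
```

Pre-registering 2 or 3 hypotheses keeps `n_variants` small, which is why it appears first in the mitigations above.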

8) Execution Reality Check

Backtests can be over-optimistic if fills are too generous.

Sanity checks

Increase slippage assumptions and re-run; does B still beat A?

GreeksLab tip: In Backtest Settings, model slippage and commissions and keep them identical across runs. Stress them higher to test fragility.
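Re-running in the backtester is the reliable way to stress costs, but a quick offline approximation can show how sensitive the ranking is. The sketch below assumes a flat cost per contract applied to gross daily P&L, which ignores any effect that worse fills would have on stops or exits; all inputs are invented.

```python
def net_pnl(gross_pnl, contracts, cost_per_contract):
    """Subtract a flat per-contract cost (slippage + commission) from each day's gross P&L."""
    return [g - c * cost_per_contract for g, c in zip(gross_pnl, contracts)]

def ranking_under_stress(gross_a, contracts_a, gross_b, contracts_b,
                         base_cost, multipliers=(1, 2, 3)):
    """Return {multiplier: 'A' or 'B'} by total net P&L at each cost level.
    Ties go to A, the baseline."""
    result = {}
    for m in multipliers:
        total_a = sum(net_pnl(gross_a, contracts_a, base_cost * m))
        total_b = sum(net_pnl(gross_b, contracts_b, base_cost * m))
        result[m] = "B" if total_b > total_a else "A"
    return result

# Invented data: B earns more gross but trades twice as many contracts.
gross_a, contracts_a = [100, -50, 120], [4, 4, 4]
gross_b, contracts_b = [140, -60, 130], [8, 8, 8]

print(ranking_under_stress(gross_a, contracts_a, gross_b, contracts_b, base_cost=2.0))
# {1: 'B', 2: 'A', 3: 'A'}
```

Here B's edge disappears once costs double, which is the fragility this section warns about.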

9) Decision Framework (Go / No-Go)

Use a simple scoring sheet. Example:

Criterion	Weight	A	B	Notes
Primary metric (Sortino, higher is better)	3x	6.2	7.1	B wins
Max drawdown (shallower is better)	3x	-18%	-23%	A wins on DD
CVaR 5% (smaller loss is better)	2x	-$1.9k	-$2.4k	A has the better tail
Median ΔPnL (B − A)	2x	—	+$35	B wins per-day median
Regime robustness (wins in slices)	2x	3/6	5/6	B more consistent
Execution stress (slippage ↑ 2×)	2x	Breaks	Holds	B more resilient

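Rule of thumb: Approve B only if it wins on the primary metric, does not worsen tail risk materially, and survives execution stress. In the example above, B wins the primary metric and holds under stress, so the decision turns on whether its deeper drawdown and CVaR count as a material worsening.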
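The rule of thumb can be written as an explicit check so the decision is made the same way every time. The source does not define "materially", so the `tail_tolerance` parameter below is a hypothetical threshold you would need to choose before looking at results; the inputs reuse the Sortino and CVaR values from the example table.

```python
def approve_b(primary_a, primary_b, cvar_a, cvar_b,
              holds_under_stress, tail_tolerance=0.10):
    """Go/No-Go sketch: B must win the primary metric, keep its CVaR loss within
    tail_tolerance of A's, and survive the execution stress test.
    cvar_a / cvar_b are negative dollar amounts (losses).
    tail_tolerance is an assumed threshold, not a standard value."""
    wins_primary = primary_b > primary_a
    tail_ok = abs(cvar_b) <= abs(cvar_a) * (1 + tail_tolerance)
    return wins_primary and tail_ok and holds_under_stress

# Values from the example table: Sortino 6.2 vs 7.1, CVaR -$1.9k vs -$2.4k.
print(approve_b(6.2, 7.1, -1900, -2400, holds_under_stress=True))
# False: CVaR is about 26% worse, beyond the 10% tolerance
print(approve_b(6.2, 7.1, -1900, -2400, holds_under_stress=True, tail_tolerance=0.30))
# True: the same result passes under a looser 30% tolerance
```

The two calls give opposite answers for the same backtests, which is the argument for fixing the tolerance in advance rather than after seeing which version you prefer.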

10) Common Pitfalls (Avoid These)

Comparing runs with different date ranges or costs
Changing multiple parameters at once
Cherry-picking regimes after looking at results
Declaring victory on tiny effect sizes with short samples

GreeksLab Workflow (Step-by-Step)

Duplicate baseline strategy. Rename clearly (e.g., “IC 16Δ → IC 12Δ”).
Change one parameter only (delta, entry time, stop rule, etc.).
Run both with the same backtest settings (dates, costs, slippage).
In Overview, record: Sharpe/Sortino, Max DD, CVaR, Avg Daily PnL.
In Insights, compare by VIX buckets, weekday, entry hour.
In Positions/Daily, export day-matched results; compute ΔPnL distribution.
Stress slippage/commissions; re-run. Check if ranking holds.
Decide with the Go/No-Go framework. Document run IDs and rationale.
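For the export step, a day-matched ΔPnL series can be built from two CSV files with the standard library alone. The column names `date` and `pnl` are assumptions made for this sketch; check the headers of your actual export and adjust them. In-memory strings stand in for the files so the example runs as written.

```python
import csv
import io

def load_daily_pnl(csv_file) -> dict:
    """Read rows into {date: total P&L}, summing multiple positions per date.
    Assumes hypothetical columns named 'date' and 'pnl'."""
    daily = {}
    for row in csv.DictReader(csv_file):
        daily[row["date"]] = daily.get(row["date"], 0.0) + float(row["pnl"])
    return daily

# Stand-ins for two exported files (use open("run_a.csv", newline="") for real files).
export_a = io.StringIO("date,pnl\n2024-03-04,100\n2024-03-04,-20\n2024-03-05,-250\n")
export_b = io.StringIO("date,pnl\n2024-03-04,130\n2024-03-05,-400\n2024-03-06,15\n")

a = load_daily_pnl(export_a)
b = load_daily_pnl(export_b)

# Keep only days present in both runs, then difference them.
delta = {d: b[d] - a[d] for d in sorted(set(a) & set(b))}
print(delta)  # {'2024-03-04': 50.0, '2024-03-05': -150.0}
```

Days that appear in only one run are dropped from the comparison; if many days fall out, that is a sign the two runs were not configured over the same calendar.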

Summary

Keep comparisons controlled (one change at a time, identical settings).
Evaluate risk-adjusted performance and tail behavior, not just averages.
Segment by regime, do paired day comparisons, and stress execution assumptions.
Approve changes only when improvements are consistent, robust, and practically tradable.

Use this checklist every time you iterate. It helps guard against overfitting and false positives, and it surfaces the changes that actually matter in live 0DTE trading.
