SMARTFINANCEDATA
How It Works

Our Methodology, Fully Transparent

Every dataset on SmartFinanceData is built on a rigorous, reproducible statistical framework. This page explains exactly how we source raw price data, define market events, run probability calculations, and validate results before publishing.

25yr Price History | Chi-Square Validated | Daily Refresh | No Lookahead Bias

Price Candles: 10M+ (daily OHLC records processed)
History Depth: 25 yrs (back to January 2000)
Instruments: 50+ (across all asset classes)
Significance: p<0.05 (minimum publication threshold)
Step 01

Data Sources

All price data is sourced from institutional-grade providers with full tick history. We use multiple redundant providers and cross-validate OHLC figures before ingestion to ensure data integrity.

💱
Forex
Spot FX Rates

Daily OHLC from interbank spot market. NY close (5pm EST) candles used as the standard session boundary for all forex pairs.

Timeframes: Daily, H4, H1
📈
Indices & Equities
Exchange Data

Cash index OHLC sourced directly from exchange feeds. Adjusted for index rebalancing events. Futures-adjusted series used for extended history where needed.

Timeframes: Daily, H4
₿
Crypto
Aggregated CEX

Volume-weighted aggregated OHLC across major centralised exchanges. UTC midnight used as the daily candle boundary for all crypto assets.

Timeframes: Daily, H4

Data provider redundancy: OHLC values are cross-checked across at least two independent sources per instrument. Any discrepancy exceeding 0.05% triggers a manual review before the candle is included in the analysis database.
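
The cross-check described above can be sketched as follows. This is a minimal illustration, not the production logic: the function names, the dictionary layout, and the choice to measure the discrepancy relative to the mean of the two quotes are all assumptions, and the sample prices are made up.

```python
def relative_discrepancy(a: float, b: float) -> float:
    """Relative difference between two quotes, measured against their mean (assumed convention)."""
    mid = (a + b) / 2.0
    return abs(a - b) / mid


def needs_manual_review(candle_a: dict, candle_b: dict, tolerance: float = 0.0005) -> bool:
    """True if any OHLC field differs by more than the tolerance (0.05% = 0.0005)."""
    return any(
        relative_discrepancy(candle_a[field], candle_b[field]) > tolerance
        for field in ("open", "high", "low", "close")
    )


# Hypothetical candles for the same instrument and session from two providers.
provider_a = {"open": 1.1000, "high": 1.1050, "low": 1.0980, "close": 1.1020}
provider_b = {"open": 1.1001, "high": 1.1050, "low": 1.0980, "close": 1.1030}

# The closes differ by roughly 0.09%, so this pair would be held for review.
print(needs_manual_review(provider_a, provider_b))
```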

Step 02

Data Pipeline

Raw price data passes through a six-stage pipeline before it reaches the analysis layer. Each stage has automated quality gates. Any failure halts the pipeline and triggers an alert — no partial or corrupted data is ever published.

01
Ingest & Normalise

Raw OHLC data is pulled from providers via API, converted to a uniform schema (UTC timestamp, Open, High, Low, Close, Volume), and stored in the raw data lake.

02
Quality Checks

Automated checks flag: missing candles, High < Low violations, zero-volume sessions, extreme outliers (>5σ from rolling mean), and timestamp gaps.

03
Corporate Actions & Adjustments

For indices and commodities, contract rolls, index rebalances, and split events are identified and back-adjusted using the ratio method to maintain return continuity.

04
Feature Engineering

Derived features are computed from clean OHLC: daily range, wick ratios, body-to-range ratio, gap size, session overlaps, ATR-normalised values, and rolling volatility.

05
Event Classification

Market events (e.g. "trend day", "Asian range break", "monthly high sweep") are classified using deterministic rule sets. Each classification has a documented definition, with no ambiguity and no look-ahead.

06
Analysis & Publish

Statistical analysis runs on the classified dataset. Results are written to the publication layer only if they pass all significance thresholds. Outputs include probability tables, confidence intervals, and distribution data.
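
As an illustration of the stage 02 quality gates listed above, the sketch below flags High < Low violations, zero-volume sessions, and closes more than 5σ from a rolling mean. The 20-candle window, the field names, and the function name are assumptions; missing-candle and timestamp-gap checks are left out for brevity.

```python
from statistics import mean, stdev


def quality_flags(candles, window=20, sigma=5.0):
    """Return (index, reason) tuples for candles that fail basic sanity checks."""
    flags = []
    for i, c in enumerate(candles):
        if c["high"] < c["low"]:
            flags.append((i, "high_below_low"))
        if c["volume"] == 0:
            flags.append((i, "zero_volume"))
        # Rolling statistics use only earlier candles, never the one being tested.
        history = [p["close"] for p in candles[max(0, i - window):i]]
        if len(history) >= 2:
            mu, sd = mean(history), stdev(history)
            if sd > 0 and abs(c["close"] - mu) > sigma * sd:
                flags.append((i, "outlier"))
    return flags


# Hypothetical data: 20 quiet candles followed by one obviously broken candle.
candles = [
    {"high": 101.5, "low": 99.5, "close": 100.0 + (i % 2), "volume": 1000}
    for i in range(20)
]
candles.append({"high": 99.0, "low": 100.0, "close": 200.0, "volume": 0})

print(quality_flags(candles))
```

In a pipeline like the one described, any non-empty result would halt the run rather than silently drop the offending candle.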

Step 03

Analysis Types

SmartFinanceData publishes six core analysis types. Each answers a specific class of trading question by categorising historical price behaviour into measurable, actionable outcomes.

📊
Streak
Streak Analysis

Measures the probability of consecutive directional closes (up/down streaks). Answers: "After N bullish days, how likely is another bullish close?" Calculated per day-of-week and month.

P(up | streak=N) = count(up after N up days) / count(N up days)
// Bayesian posterior also computed with a Beta prior (the two-outcome case of a Dirichlet prior)
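
A minimal sketch of the streak count above, assuming daily closes have already been reduced to a list of booleans (True for an up close). The function name is hypothetical, and "streak = N" is read here as "at least N consecutive up days"; an exact-length definition would need one extra check on the day before the streak.

```python
def streak_probability(closes_up, n):
    """Return (P(up | previous n days all up), number of qualifying streaks)."""
    occurrences = 0
    continuations = 0
    for i in range(n, len(closes_up)):
        if all(closes_up[i - n:i]):      # the n days before day i were all up
            occurrences += 1
            if closes_up[i]:             # day i extended the streak
                continuations += 1
    if occurrences == 0:
        return None, 0
    return continuations / occurrences, occurrences


# Hypothetical sequence of daily outcomes.
days = [True, True, False, True, True, True, False, True]
prob, sample_size = streak_probability(days, 2)
print(prob, sample_size)
```

Returning the sample size alongside the probability makes it straightforward to apply a minimum-observation rule before publishing a figure.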
🕐
Session
Session Analytics

Analyses price behaviour within and across trading sessions (Asian, London, New York). Identifies trend day probability given session compression, London breakout continuation rates, and more.

Asian Range = max(H₁..Hₙ) − min(L₁..Lₙ) // all H4 within 00:00–08:00 GMT
// P(trend day | Asian range ≤ ATR₁₄ × 0.25)
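
The Asian range and the compression condition can be sketched as below. The candle layout and function names are assumptions, and the prices are illustrative; the two H4 candles are taken to open at 00:00 and 04:00 GMT.

```python
def asian_range(h4_candles):
    """Highest high minus lowest low across the session's H4 candles."""
    highs = [c["high"] for c in h4_candles]
    lows = [c["low"] for c in h4_candles]
    return max(highs) - min(lows)


def is_compressed(range_value, atr14, factor=0.25):
    """True when the Asian range is at most `factor` times the 14-period ATR."""
    return range_value <= atr14 * factor


# Hypothetical session: two H4 candles covering 00:00 to 08:00 GMT.
session = [
    {"high": 1.1010, "low": 1.0995},
    {"high": 1.1015, "low": 1.1000},
]
rng = asian_range(session)
print(is_compressed(rng, atr14=0.0100))
```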
📐
Range
Range Distribution

Distributional analysis of daily, weekly, and monthly ranges expressed as multiples of ATR. Used to build probabilistic price targets and assess where the current range sits relative to historical norms.

Norm Range = (H − L) / ATR₁₄
// Percentile rank and Z-score calculated per day-of-week
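
A sketch of the ATR normalisation, under two assumptions that the page does not specify: ATR₁₄ is taken as a simple average of the last 14 true ranges (Wilder smoothing is a common alternative), and it is computed from candles before the day being normalised. All names and prices are hypothetical.

```python
def true_range(high, low, prev_close):
    """Largest of the day's range and the two gaps from the previous close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def atr(candles, period=14):
    """Simple-average ATR over the last `period` candles (needs period + 1 candles)."""
    pairs = zip(candles[-period - 1:-1], candles[-period:])
    ranges = [true_range(c["high"], c["low"], p["close"]) for p, c in pairs]
    return sum(ranges) / period


def normalised_range(candle, atr_value):
    """Norm Range = (H - L) / ATR."""
    return (candle["high"] - candle["low"]) / atr_value


# Hypothetical history: 15 identical candles, each with a true range of 2.0.
history = [{"high": 102.0, "low": 100.0, "close": 101.0} for _ in range(15)]
today = {"high": 105.0, "low": 101.0, "close": 104.0}

atr14 = atr(history)
print(normalised_range(today, atr14))  # a 4-point day against a 2-point ATR
```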
🎯
Sweep
Level Sweep Probability

Calculates how often price sweeps a prior period high/low (monthly, weekly, previous day) before reversing. Useful for understanding liquidity grab behaviour at key structural levels.

Sweep = H > prev_H ∧ C < prev_H // wick above, close below
// P(sweep | prior range) segmented by volatility regime
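
The sweep rule translates directly into a predicate. The sketch below applies it to previous-day highs; the function names and sample candles are hypothetical, and the mirrored rule for lows (L < prev_L and C > prev_L) is an assumption by symmetry.

```python
def is_high_sweep(candle, prev_high):
    """Wick above the prior high, close back below it."""
    return candle["high"] > prev_high and candle["close"] < prev_high


def high_sweep_rate(candles):
    """Share of days that sweep the previous day's high."""
    pairs = list(zip(candles, candles[1:]))
    sweeps = sum(is_high_sweep(cur, prev["high"]) for prev, cur in pairs)
    return sweeps / len(pairs)


# Hypothetical daily candles.
daily = [
    {"high": 10.0, "low": 8.0, "close": 9.0},
    {"high": 11.0, "low": 9.0, "close": 9.5},    # sweeps 10.0, closes below it
    {"high": 12.0, "low": 10.0, "close": 11.5},  # breaks 11.0 and holds above
    {"high": 12.5, "low": 11.0, "close": 11.8},  # sweeps 12.0, closes below it
]
print(high_sweep_rate(daily))
```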
📅
Seasonal
Seasonal Tendencies

Monthly and quarterly bias analysis using long-run averages. Statistical separation from random walk tested per calendar segment. Day-of-week and week-of-month breakdowns included.

Monthly Bias = avg(C − O) / ATR₁₄ over N years
// t-test vs H₀: bias = 0, with Bonferroni correction for 12 months
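
For the t-test against H₀: bias = 0, the test statistic can be computed with the standard library alone, as in this sketch. The sample values are invented, and the function name is hypothetical; converting the statistic to a p-value needs a t-distribution CDF, for which a library routine such as `scipy.stats.ttest_1samp` is one option.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(samples, mu0=0.0):
    """One-sample t statistic: (sample mean - mu0) / (sample std dev / sqrt(n))."""
    n = len(samples)
    return (mean(samples) - mu0) / (stdev(samples) / sqrt(n))


# Hypothetical ATR-normalised January bias, one value per year.
january_bias = [0.5, 0.25, -0.25, 0.75, 0.5, 0.25, 0.0, 0.5]

t = t_statistic(january_bias)
degrees_of_freedom = len(january_bias) - 1
print(round(t, 2), degrees_of_freedom)
```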
🏗️
Structure
Market Structure

HTF/LTF break of structure probability after defined price patterns. Includes Outside Day follow-through, Inside Bar resolution direction, and post-consolidation breakout continuation rates.

Outside Day: H > prev_H ∧ L < prev_L
// P(bullish/bearish resolution | wick dominance ratio)
Step 04

Statistical Tests

Raw probability counts are necessary but not sufficient. Every published figure must pass at least one formal statistical test. The test chosen depends on the data type and the null hypothesis being evaluated.

| Test | Used For | Null Hypothesis | Threshold |
|---|---|---|---|
| Chi-Square (χ²) | Streak, session, sweep datasets: comparing observed vs expected frequency of categorical outcomes | Outcome frequencies match a uniform or 50/50 distribution | p < 0.05 |
| Z-Test (two-tail) | Directional bias in range and streak tables with large samples (n > 30) | Population proportion = 0.5 (no directional edge) | \|Z\| > 1.96 |
| Wilson CI | Confidence intervals on all binary probability estimates; avoids Wald interval breakdown at extreme probabilities | N/A: interval estimation, not hypothesis testing | 95% CI |
| Bayesian Posterior | Streak tables with small sample sizes (n < 30). Beta distribution with uniform prior | N/A: posterior distribution computed, not tested | 90% HDI |
| One-sample t-test | Seasonal bias tables: testing whether mean monthly return differs significantly from zero | μ = 0 (no seasonal effect) | p < 0.05 |
| Bonferroni Correction | Applied to seasonal tests across 12 months, 5 weekdays, and multi-comparison streak tables | N/A: controls family-wise error rate | α / n comparisons |
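
To make two rows of the table concrete, here is a sketch of the Wilson score interval and the two-tailed z-score for a proportion, using the textbook formulas. The function names and the 58-out-of-100 sample are illustrative assumptions.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def z_score(successes, n, p0=0.5):
    """Z statistic for H0: population proportion equals p0."""
    return (successes / n - p0) / sqrt(p0 * (1 - p0) / n)


# Hypothetical table cell: 58 bullish closes out of 100 observations.
low, high = wilson_interval(58, 100)
z = z_score(58, 100)
print(round(low, 3), round(high, 3), round(z, 2))
```

In this example the z-score is about 1.6, short of the 1.96 threshold, and the interval still contains 0.5, so a 58% hit rate on 100 observations would not clear the publication bar on its own.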

Sample size minimum: Any probability with fewer than 20 historical observations is marked with a low-confidence indicator (⚠) in the dataset tables. Figures below 10 observations are suppressed entirely. This prevents overfitting to noise in thin data conditions.

Step 05

Bias Controls

The two most dangerous failure modes in backtested analytics are lookahead bias and data snooping. We apply strict procedural controls to eliminate both.

🚫
No Lookahead Bias

All event classifications use only data that would have been available at the candle close. The analysis database is strictly append-only — historical rows are never retroactively modified once published.

Example: An "Asian range" is computed at 08:00 GMT using only candles from 00:00–07:59 GMT.

🔬
Walk-Forward Validation

Each analysis is checked for stability using a walk-forward approach: we verify that probabilities computed on a 10-year training window hold within ±5% on the subsequent 5-year out-of-sample period.

Datasets that fail walk-forward stability checks are flagged as "unstable" and excluded from featured datasets.
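
A single walk-forward split can be sketched as below. It assumes the ±5% band means five percentage points of absolute difference, which the text does not state explicitly; the function names and the yearly outcome lists are hypothetical.

```python
def probability(outcomes):
    """Share of successes in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def walk_forward_stable(outcomes_by_year, train_years=10, test_years=5, tolerance=0.05):
    """Compare the training-window probability with the following out-of-sample window."""
    years = sorted(outcomes_by_year)
    train = [o for y in years[:train_years] for o in outcomes_by_year[y]]
    test = [o for y in years[train_years:train_years + test_years] for o in outcomes_by_year[y]]
    return abs(probability(test) - probability(train)) <= tolerance


# Hypothetical event outcomes: a 60% hit rate in every year from 2000 to 2014.
steady = {year: [1] * 6 + [0] * 4 for year in range(2000, 2015)}

# Same history, but the hit rate drops to 40% in the out-of-sample years.
shifted = dict(steady)
for year in range(2010, 2015):
    shifted[year] = [1] * 4 + [0] * 6

print(walk_forward_stable(steady), walk_forward_stable(shifted))
```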

🎲
Multiple Comparison Control

When testing many sub-groups (e.g., 12 months × 5 weekdays = 60 cells), Bonferroni correction is applied to control the family-wise error rate. This prevents spurious "signals" emerging from random variation at scale.

α_adjusted = 0.05 / n_comparisons
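
Applied to the 60-cell example, the adjustment looks like this. The cell labels and p-values are made up for illustration, and the function name is hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_comparisons


alpha_adjusted = bonferroni_alpha(0.05, 12 * 5)   # 12 months x 5 weekdays

# Hypothetical p-values for two of the 60 cells.
p_values = {"Jan-Mon": 0.0004, "Mar-Fri": 0.02}
significant = {cell: p < alpha_adjusted for cell, p in p_values.items()}
print(significant)
```

A cell with p = 0.02 would pass an unadjusted 0.05 threshold but fails once the 60 comparisons are accounted for.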
📏
Regime Segmentation

Probabilities are stress-tested across three volatility regimes (low / medium / high, defined by 14-period ATR quartiles) to confirm the edge is not regime-specific. Regime breakdowns are available in Pro datasets.

An edge that only exists in low-volatility regimes is disclosed prominently, not hidden in aggregate statistics.

Step 06

Update Cadence

Datasets are not static snapshots — they are living tables that update automatically as new price data is confirmed. Different analysis types update on different schedules based on the cadence of the underlying data.

| Analysis Type | Update Trigger | Lag | Recalc Depth |
|---|---|---|---|
| Streak Tables | Daily close confirmed (NY session close) | ~2 hrs post-close | Rolling 5yr + full history |
| Session Analytics | NY session close (17:00 EST) | ~2 hrs post-close | Full history recalc |
| Range Distribution | Daily close confirmed | ~2 hrs post-close | Percentile tables rebuilt fully |
| Sweep Probability | Weekly close (Friday NY close) + daily | ~3 hrs post-close | Weekly + daily tables refreshed |
| Seasonal Tendencies | Month-end close | Within 24 hrs of month end | Full history, re-tested with t-test |
| Structure Analysis | Daily close confirmed | ~2 hrs post-close | Pattern tables rebuilt |
Important

Known Limitations

All datasets are based on historical price data. Past statistical tendencies do not guarantee future outcomes. Markets are non-stationary — structural regime changes can and do invalidate historical edges.

Non-Stationarity

Financial time series are not stationary. Central bank policy shifts, algorithmic market structure changes, and macro regime changes all affect the persistence of historical probabilities. We publish walk-forward stability scores to flag at-risk datasets.

Transaction Costs

Published probabilities are pre-cost and pre-slippage. A dataset showing 58% directional edge may not be viable after spread, commissions, and execution slippage are applied. Edge sizing should account for total round-trip cost.
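
The effect of costs on a 58% edge can be illustrated with a simple expectancy calculation in units of risk (R). The 1:1 reward-to-risk ratio and the 0.2R round-trip cost are assumptions chosen for the example, not figures from any dataset, and the function name is hypothetical.

```python
def net_expectancy(win_prob, reward_risk=1.0, cost_in_r=0.0):
    """Expected profit per trade in units of risk (R), after round-trip cost."""
    return win_prob * reward_risk - (1.0 - win_prob) - cost_in_r


gross = net_expectancy(0.58)                 # before costs
net = net_expectancy(0.58, cost_in_r=0.2)    # hypothetical cost of 0.2R per round trip
print(round(gross, 2), round(net, 2))
```

Under these assumptions the gross expectancy of roughly 0.16R per trade turns negative once the cost is deducted.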

Market Hours & Liquidity

Analysis uses continuous price data. Holiday periods, low-liquidity sessions, and data gaps during major events (flash crashes, circuit breakers) are excluded but may affect live execution if those conditions recur.

Interpretation Risk

Probabilities describe population averages across historical conditions. They should be used as one input in a systematic trading framework — not as standalone trade signals. No single dataset constitutes a complete trading strategy.

Questions

Frequently Asked

Forex and Indices daily candles use the standard NY close session — opening at 17:00 EST Sunday and closing at 17:00 EST Friday. This is the conventional institutional daily bar and aligns with how most professional charting platforms display daily candles. Crypto daily candles use UTC midnight open/close as the convention on crypto exchanges.
Most major forex pairs have data from January 2000, giving approximately 25 years of daily history (~6,500 candles). Crypto assets are limited to exchange inception dates — Bitcoin from 2012, most altcoins from 2017 onwards. Indices vary by contract — US indices (S&P 500, Nasdaq 100) go back to 2000; some European indices to 2005.
Forex weekends (Saturday/Sunday sessions) are excluded. Bank holidays for each market are identified using a curated exchange calendar — any session with anomalously low volume (bottom 2% by instrument) is also excluded from the analysis database to prevent holiday period distortions in range and session analytics.
Pro subscribers can export dataset tables as CSV or XLSX for use in their own analysis tools. Full methodology documentation including event classification rule sets is available in the Pro member resources section. Raw tick data is not distributed — only processed OHLC and derived probability tables.
The Kelly Criterion provides a theoretical optimal position sizing fraction given an estimated edge probability and assumed risk/reward ratio. We display full Kelly alongside half-Kelly (a more conservative practical sizing). These are educational reference figures assuming a 1:1 R:R — they should be adjusted to your actual strategy's win rate and risk/reward before application.
No — updates are triggered by confirmed session closes, not real-time tick data. Daily datasets update within approximately two hours of the NY session close (17:00 EST), after data quality checks pass. Intra-day data is not currently published. Subscribers can check the "last updated" timestamp on each dataset page for confirmation.
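
For the Kelly figures mentioned in the answers above, the standard formula is f* = p − (1 − p) / b, where p is the win probability and b the reward-to-risk ratio. The sketch below assumes that formula and floors the result at zero; the function name and the 58% input are illustrative only.

```python
def kelly_fraction(win_prob, reward_risk=1.0):
    """Kelly fraction f* = p - (1 - p) / b, floored at zero for a negative edge."""
    return max(0.0, win_prob - (1.0 - win_prob) / reward_risk)


full_kelly = kelly_fraction(0.58)   # 1:1 reward-to-risk, as in the reference figures
half_kelly = full_kelly / 2         # the more conservative sizing
print(round(full_kelly, 2), round(half_kelly, 2))
```

Changing `reward_risk` shows how sensitive the fraction is to the assumed payoff: the same 58% win rate with a 0.5:1 ratio gives no positive Kelly fraction at all.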
Ready to Explore

Put the Methodology to Work

The datasets are built on the framework above. Explore market analytics, browse instruments, or get full Pro access to every probability table on the platform.

SmartFinanceData

Probabilistic market analytics across Forex, Indices, Commodities & Crypto — powered by 50+ datasets and millions of data points.


© 2026 SmartFinanceData. All data is historical and does not guarantee future performance.