In 1998, the Long-Term Capital Management hedge fund imploded. The firm had two Nobel Prize winners on its board and the most sophisticated risk models on Wall Street. They knew their positions. They knew their Greeks. They knew their correlations. What they didn't capture was the reasoning behind their decisions, the assumptions that made their models seem safe.
When Russia defaulted on its debt and markets panicked, LTCM's models broke. Not because they lacked data, but because nobody had recorded why certain correlations were assumed to hold, what conditions would break them, or what the fallback should be when they did.
This is the difference between knowing what a trading system does and understanding how it thinks. The first is useless for debugging. The second is trainable.
What Actually Goes Into a Training Episode
Most "trading datasets" on the internet are just price data with trade logs. Buy here, sell there, made 2.3%. That's like training a medical AI on prescription records without noting the patient symptoms, the doctor's reasoning, or whether the treatment worked for the right reasons.
A complete decision episode captures six components, each serving a specific training purpose:
- Agent Reasoning - The actual thought process
- Market Context - Everything the agent observed
- Trade Execution - What actions were taken
- Position Journey - How it played out over time
- Outcomes - What happened
- Counterfactuals - What could have happened differently
Miss any one of these and you're training on partial information. The model learns to pattern-match without understanding.
1. Agent Reasoning
The reasoning component captures how the agent (human or AI) processed information and arrived at a decision.
{
"agent_reasoning": {
"explicit_reasoning": "BTC showing strength above 65k with
decreasing sell pressure. 4h RSI resetting from overbought.
Looking for continuation to 68k resistance.",
"decision_confidence": 0.72,
"thoughts": [
{
"phase": "analysis",
"reasoning_type": "chain-of-thought",
"output": "Market structure bullish on higher timeframes...",
"tool_calls": 3
},
{
"phase": "decision",
"reasoning_type": "react",
"output": "Entry criteria met. Sizing for 2% portfolio risk...",
"tool_calls": 1
},
{
"phase": "execution",
"reasoning_type": "react",
"output": "Limit order placed at 65,240. Stop at 63,800...",
"tool_calls": 2
}
]
}
}
Why Each Field Matters
explicit_reasoning: Natural language explanation of the decision. Models learn to connect observations to conclusions. Without this, they only see what happened, not why.
decision_confidence: Self-assessed confidence score (0-1). When paired with outcomes, this trains calibration. Models learn when to trust their analysis and when to be uncertain.
thoughts.phase: Tags each reasoning step as analysis, decision, or execution. Models learn that different phases require different cognitive modes.
thoughts.reasoning_type: Whether the step used ReAct (interleaved reasoning and action) or chain-of-thought. Different approaches work better for different situations.
thoughts.tool_calls: How many tools were invoked at each step. Models learn appropriate information-gathering behavior, neither under-researching nor over-researching.
Reasoning traces enable process supervision. Models don't just learn which decisions to make; they learn how to think through them. This is fundamentally more robust than outcome-only training.
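The confidence-calibration idea is easy to make concrete. The sketch below scores how well `decision_confidence` predicts outcomes using a Brier score; the episode dicts and their `confidence`/`win` keys are simplified assumptions, not the full schema.

```python
def brier_score(episodes):
    """Mean squared gap between self-assessed confidence and outcome.

    Each episode is assumed to carry a 0-1 `confidence` and a binary
    `win` label; lower scores mean better-calibrated confidence.
    """
    return sum((e["confidence"] - float(e["win"])) ** 2
               for e in episodes) / len(episodes)

episodes = [
    {"confidence": 0.72, "win": True},
    {"confidence": 0.40, "win": False},
    {"confidence": 0.90, "win": False},  # overconfident loss dominates the score
]
```

A dataset whose confidence scores don't beat a constant baseline on this metric can't teach a model when to trust its own analysis.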
2. Market Context
The context component captures everything the agent observed when making the decision.
{
"market_context": {
"candles_15m": [...], // Last 20 bars
"candles_1h": [...], // Last 20 bars
"candles_4h": [...], // Last 20 bars
"candles_1d": [...], // Last 10 bars
"indicators": {
"rsi_14": 58.3,
"macd": {"line": 124, "signal": 98, "histogram": 26},
"bollinger": {"upper": 66200, "mid": 65100, "lower": 64000}
},
"volatility_regime": "moderate",
"atr_1h": 312.5,
"atr_4h": 847.2,
"btc_price": 65240,
"order_book": {
"spread_bps": 2.3,
"imbalance_ratio": 1.24,
"bid_depth_usd": 4200000,
"ask_depth_usd": 3800000,
"book_depth_score": 0.73,
"large_bid_count": 12,
"large_ask_count": 8
}
}
}
Why Each Field Matters
Multi-timeframe candles: Models learn to synthesize information across time horizons. 15-minute data shows entry timing; daily data shows macro context. Different timeframes serve different purposes.
Pre-computed indicators: RSI, MACD, and Bollinger Bands are pre-calculated so every episode uses identical definitions. Models learn to interpret indicator readings in direct connection with outcomes.
volatility_regime: Explicit regime classification (low/moderate/high). Models learn that the same signal means different things in different regimes.
atr_1h/4h: Average True Range enables volatility-adjusted sizing. A 2% stop in a low-vol regime is different from 2% in high-vol.
order_book: Current market microstructure. Spread, depth, and imbalance inform execution quality expectations and short-term directional bias.
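The order-book fields can all be derived from raw bid/ask levels. This is a sketch under simple assumptions: levels arrive as `(price, size)` tuples with the best level first, and the $100k "large order" threshold is illustrative, not part of the schema.

```python
def book_metrics(bids, asks, large_usd=100_000):
    """Derive spread_bps, imbalance_ratio, and depth from raw levels."""
    best_bid, best_ask = bids[0][0], asks[0][0]
    mid = (best_bid + best_ask) / 2
    spread_bps = (best_ask - best_bid) / mid * 10_000   # spread in basis points
    bid_depth = sum(p * s for p, s in bids)             # notional resting on bid
    ask_depth = sum(p * s for p, s in asks)             # notional resting on ask
    return {
        "spread_bps": round(spread_bps, 2),
        "imbalance_ratio": round(bid_depth / ask_depth, 2),
        "bid_depth_usd": bid_depth,
        "ask_depth_usd": ask_depth,
        "large_bid_count": sum(1 for p, s in bids if p * s >= large_usd),
        "large_ask_count": sum(1 for p, s in asks if p * s >= large_usd),
    }
```

Snapshotting these at decision time, rather than recomputing later from candles, is what makes the microstructure context faithful to what the agent actually saw.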
3. Trade Execution
The execution component captures the specific actions taken.
{
"trade_execution": {
"action": "create_order",
"entry_price": 65247.50,
"position_size_usd": 5000,
"leverage": 3,
"stop_loss": 63800,
"take_profits": [66500, 67200, 68000],
"tool_name": "hyperliquid_perp",
"executed_at": "2025-03-15T14:32:17Z"
}
}
Why Each Field Matters
action: Classification of what was done (create_order, close, resize). Different action types have different risk profiles and appropriate contexts.
entry_price: The actual fill price. Compared against the intended limit price, it enables slippage analysis. Models learn to anticipate execution costs.
position_size_usd + leverage: Together these define risk. Models learn appropriate sizing for different setups and confidence levels.
stop_loss: Where the protective exit sits. Models learn stop placement relative to market structure and volatility.
take_profits (array): Multiple TP levels indicate scaling strategy. Models learn that exits aren't binary.
tool_name: Which execution tool was used. Different venues have different characteristics; models learn venue-appropriate behavior.
executed_at: Timestamp enables learning temporal patterns. Time-of-day and day-of-week effects are real in markets.
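The "sizing for 2% portfolio risk" logic from the reasoning trace can be made explicit. This is a minimal sketch, assuming the simplest definition of risk (distance to stop as a fraction of entry); the function and parameter names are illustrative.

```python
def size_position(equity_usd, risk_fraction, entry, stop, max_leverage=5):
    """Choose a notional size so a stop-out loses at most
    `risk_fraction` of equity, capped at `max_leverage`."""
    stop_distance = abs(entry - stop) / entry        # loss per $1 of notional
    risk_usd = equity_usd * risk_fraction            # max acceptable loss
    notional = min(risk_usd / stop_distance,         # implied position size
                   equity_usd * max_leverage)        # leverage cap
    leverage = max(1.0, notional / equity_usd)
    return notional, leverage
```

Note how `position_size_usd`, `leverage`, and `stop_loss` are jointly determined: recording only one of them throws away the risk reasoning behind the other two.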
4. Position Journey
The journey component tracks how the position evolved over time.
{
"position_journey": {
"checkpoints": [
{
"type": "1h",
"hours_since_entry": 1,
"price": 65420,
"pnl_percent": 0.27,
"mfe_at_checkpoint": 0.35,
"mae_at_checkpoint": -0.12
},
{
"type": "4h",
"hours_since_entry": 4,
"price": 65890,
"pnl_percent": 0.99,
"mfe_at_checkpoint": 1.24,
"mae_at_checkpoint": -0.12
},
// ... more checkpoints
],
"max_favorable_excursion": 2.47,
"max_adverse_excursion": -0.89
}
}
Why Each Field Matters
checkpoints: Regular snapshots of position state. This turns a single trade into a sequence, enabling trajectory modeling.
pnl_percent at checkpoint: Running P&L shows how the trade unfolded. Models learn patterns of winning vs. losing trades over time.
mfe_at_checkpoint: Best profit reached to that point. Combined with final outcome, this shows "could have exited here" opportunities.
mae_at_checkpoint: Worst loss experienced to that point. Shows how much drawdown was endured.
max_favorable_excursion: The peak unrealized profit. This is the benchmark for exit optimization.
max_adverse_excursion: The worst unrealized loss. This is the true risk that was taken.
Maximum Favorable and Adverse Excursion metrics are often missing from trading datasets. Without them, you can't train exit optimization. A trade that made 1% but could have made 5% looks identical to a trade that made 1% at its peak.
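Computing MFE and MAE requires nothing more than the post-entry price path. A minimal sketch, assuming percentage excursions and a `direction` flag for long/short:

```python
def excursions(entry_price, prices, direction=1):
    """Max favorable/adverse excursion (%) over a price path.

    direction: +1 for long, -1 for short. Returns (mfe, mae) where
    mfe >= 0 is the best unrealized gain and mae <= 0 the worst loss.
    """
    mfe = mae = 0.0
    for p in prices:
        pnl = direction * (p - entry_price) / entry_price * 100
        mfe = max(mfe, pnl)
        mae = min(mae, pnl)
    return round(mfe, 2), round(mae, 2)
```

Running this incrementally at each checkpoint yields the `mfe_at_checkpoint` / `mae_at_checkpoint` fields; running it over the whole journey yields the trade-level maxima.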
5. Outcomes
The outcome component captures what actually happened.
{
"outcome": {
"result": "win",
"exit_price": 66480,
"realized_pnl": 94.20,
"realized_pnl_percent": 1.89,
"hold_duration_minutes": 487,
"closed_at": "2025-03-15T22:39:42Z"
}
}
Why Each Field Matters
result: Win/loss/breakeven classification. The basic label for supervised learning.
realized_pnl: Absolute dollar P&L. Important for understanding scale.
realized_pnl_percent: Percentage return. Normalized for comparison across position sizes.
hold_duration_minutes: How long the position was held. Combined with returns, this measures efficiency (return per unit time).
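The outcome fields compose into simple derived labels. Both helpers below are illustrative: the per-hour efficiency metric and the breakeven band width are assumptions, not part of the schema.

```python
def efficiency(pnl_percent, hold_minutes):
    """Return per hour held -- a simple capital-efficiency score."""
    return pnl_percent / (hold_minutes / 60)

def label_outcome(realized_pnl_percent, breakeven_band=0.05):
    """Win/loss/breakeven label; trades within the band count as breakeven."""
    if abs(realized_pnl_percent) <= breakeven_band:
        return "breakeven"
    return "win" if realized_pnl_percent > 0 else "loss"
```

On the example above (1.89% over 487 minutes), efficiency comes out to roughly 0.23% per hour, which lets the model compare a slow grinder against a fast scalp on equal terms.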
6. Counterfactuals
The counterfactual component captures what could have happened with different decisions.
{
"counterfactuals": {
"mfe_price": 66720,
"mfe_pnl_percent": 2.26,
"mae_price": 64940,
"mae_pnl_percent": -0.47,
"pnl_trailing_2pct": 1.94,
"pnl_trailing_5pct": 2.18,
"optimal_pnl_percent": 2.26,
"timing_score": 0.84,
"held_too_long": false,
"exited_too_early": true
}
}
Why Each Field Matters
mfe_price/pnl: The best exit that was available. This is the ceiling for what this trade could have made.
mae_price/pnl: The worst point in the trade. If stop was hit here, this would have been the loss.
pnl_trailing_2pct/5pct: What would have happened with trailing stop strategies. Enables comparing actual exit strategy vs. alternatives.
optimal_pnl_percent: Theoretical maximum achievable. The upper bound.
timing_score: A 0-1 composite metric for exit quality. Higher means closer to optimal.
held_too_long / exited_too_early: Explicit error flags. These enable targeted learning on specific failure modes.
Counterfactuals enable learning from "what could have been." A single trade yields multiple training signals: the actual outcome plus comparisons to alternatives. This multiplies the information density of each episode.
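The trailing-stop counterfactuals can be generated by replaying the price path. This is a sketch under stated simplifications: it checks the stop against closing prices only, so real fills would differ with intrabar movement and slippage.

```python
def trailing_stop_pnl(entry, prices, trail_pct, direction=1):
    """Simulate a trailing stop over the post-entry price path.

    The stop trails the best price seen by `trail_pct` percent; the
    trade exits when price retraces through it, else at the last price.
    """
    best = entry
    for p in prices:
        if direction * (p - best) > 0:
            best = p                                    # new favorable extreme
        stop = best * (1 - direction * trail_pct / 100)
        if direction * (p - stop) <= 0:                 # retraced through stop
            return round(direction * (stop - entry) / entry * 100, 2)
    return round(direction * (prices[-1] - entry) / entry * 100, 2)
```

Running this once per trail width (2%, 5%, ...) over the same path produces the `pnl_trailing_*` fields, turning one trade into several comparable exit strategies.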
How Components Connect
These six components aren't independent. They form a coherent narrative:
- Market context shows what information was available
- Agent reasoning shows how that information was processed
- Trade execution shows what action resulted
- Position journey shows how the trade unfolded
- Outcomes show what happened
- Counterfactuals show what could have happened
Training on this complete chain teaches models not just what to do, but how to think, and how to evaluate their own performance.
Optional: Social Sentiment
Some episodes include social context:
{
"social_sentiment": {
"tweet_count": 847,
"time_window_minutes": 1440,
"avg_sentiment_score": 0.23,
"sentiment_distribution": {
"bullish": 412,
"bearish": 198,
"neutral": 237
},
"tweets": [
{
"text": "BTC breaking out of this range...",
"sentiment": "bullish",
"sentiment_score": 0.76,
"importance": "high",
"engagement": {
"likes": 2847,
"retweets": 342
}
}
]
}
}
Social sentiment enables models to incorporate crowd signals. The engagement metrics teach appropriate weighting, distinguishing high-reach signals from noise.
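One way to turn the tweet list into a single signal is an engagement-weighted average. The weighting scheme below (log-scaled reach, retweets counted double) is an illustrative assumption, not a standard; its point is that log scaling keeps one viral tweet from drowning out the rest.

```python
import math

def weighted_sentiment(tweets):
    """Engagement-weighted average sentiment in [-1, 1]."""
    total = weighted = 0.0
    for t in tweets:
        reach = t["engagement"]["likes"] + 2 * t["engagement"]["retweets"]
        w = math.log1p(reach)                      # damp viral outliers
        sign = {"bullish": 1, "bearish": -1, "neutral": 0}[t["sentiment"]]
        weighted += w * sign * t["sentiment_score"]
        total += w
    return weighted / total if total else 0.0
```

A raw count of bullish vs. bearish tweets would weight a bot with 3 likes the same as an account with 2,847; the weighting is where "crowd signal vs. noise" actually gets encoded.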
Quality Requirements
The structure above is necessary but not sufficient. Quality episodes also require:
- Consistency: Same fields populated the same way across all episodes
- Completeness: No missing components or null values in critical fields
- Accuracy: Prices, timestamps, and calculations that match reality
- Diversity: Coverage of different market conditions, strategies, and outcomes
- Recency: Fresh data that reflects current market structure
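The completeness and accuracy requirements are mechanically checkable. A minimal validator sketch, assuming the component names from this article and dict-shaped episodes; real checks would also verify prices, timestamps, and calculations against market data.

```python
REQUIRED = ["agent_reasoning", "market_context", "trade_execution",
            "position_journey", "outcome", "counterfactuals"]

def validate_episode(episode):
    """Return a list of quality problems; an empty list means the episode passes."""
    problems = [k for k in REQUIRED if not episode.get(k)]  # missing/empty components
    conf = episode.get("agent_reasoning", {}).get("decision_confidence")
    if conf is not None and not 0.0 <= conf <= 1.0:
        problems.append("decision_confidence out of [0, 1]")
    return problems
```

Running a validator like this over the whole dataset, and rejecting rather than imputing failures, is the cheapest way to enforce the consistency rule across episodes.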
Using This Structure
If you're evaluating training data for financial AI:
- Check for completeness. Are all six components present? Missing reasoning traces or counterfactuals severely limit training value.
- Verify the reasoning chain. Does the reasoning connect observations to actions? Or is it post-hoc rationalization?
- Examine counterfactual quality. Are MFE/MAE properly calculated? Are timing scores meaningful?
- Look at diversity. Does the dataset cover wins and losses, different assets, different market conditions?
- Assess freshness. When were episodes generated? Markets from 2023 don't teach models about 2025 conditions.
The structure of your training data determines the ceiling for what your models can learn. Get the structure right, and everything else becomes easier.