In 1998, the Long-Term Capital Management hedge fund imploded. The firm had two Nobel Prize winners on its board and the most sophisticated risk models on Wall Street. They knew their positions. They knew their Greeks. They knew their correlations. What they didn't capture was the reasoning behind their decisions, the assumptions that made their models seem safe.
When Russia defaulted on its debt and markets panicked, LTCM's models broke. Not because they lacked data, but because nobody had recorded why certain correlations were assumed to hold, what conditions would break them, or what the fallback should be when they did.
This is the difference between knowing what a trading system does and understanding how it thinks. The first is useless for debugging. The second is trainable.
What Actually Goes Into a Training Episode
Most "trading datasets" on the internet are just price data with trade logs. Buy here, sell there, made 2.3%. That's like training a medical AI on prescription records without noting the patient symptoms, the doctor's reasoning, or whether the treatment worked for the right reasons.
A complete decision episode captures six components, each serving a specific training purpose:
- Agent Reasoning - The actual thought process
- Market Context - Everything the agent observed
- Trade Execution - What actions were taken
- Position Journey - How it played out over time
- Outcomes - What happened
- Counterfactuals - What could have happened differently
Miss any one of these and you're training on partial information. The model learns to pattern-match without understanding.
1. Agent Reasoning
The reasoning component captures how the agent (human or AI) processed information and arrived at a decision.
{
"agent_reasoning": {
"explicit_reasoning": "BTC showing strength above 65k with
decreasing sell pressure. 4h RSI resetting from overbought.
Looking for continuation to 68k resistance.",
"decision_confidence": 0.72,
"thoughts": [
{
"phase": "analysis",
"reasoning_type": "chain-of-thought",
"output": "Market structure bullish on higher timeframes...",
"tool_calls": 3
},
{
"phase": "decision",
"reasoning_type": "react",
"output": "Entry criteria met. Sizing for 2% portfolio risk...",
"tool_calls": 1
},
{
"phase": "execution",
"reasoning_type": "react",
"output": "Limit order placed at 65,240. Stop at 63,800...",
"tool_calls": 2
}
]
}
}
Why Each Field Matters
explicit_reasoning: Natural language explanation of the decision. Models learn to connect observations to conclusions. Without this, they only see what happened, not why.
decision_confidence: Self-assessed confidence score (0-1). When paired with outcomes, this trains calibration. Models learn when to trust their analysis and when to be uncertain.
thoughts.phase: Tags each reasoning step as analysis, decision, or execution. Models learn that different phases require different cognitive modes.
thoughts.reasoning_type: Whether the step used ReAct (interleaved reasoning and action) or chain-of-thought. Different approaches work better for different situations.
thoughts.tool_calls: How many tools were invoked at each step. Models learn appropriate information-gathering behavior, neither under-researching nor over-researching.
Reasoning traces enable process supervision. Models don't just learn which decisions to make; they learn how to think through them. This is fundamentally more robust than outcome-only training.
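The confidence-calibration idea is easy to make concrete. The sketch below scores how well `decision_confidence` predicts outcomes using a Brier score; the episode dicts and their `confidence`/`win` keys are simplified assumptions, not the full schema.

```python
def brier_score(episodes):
    """Mean squared gap between self-assessed confidence and outcome.

    Each episode is assumed to carry a 0-1 `confidence` and a binary
    `win` label; lower scores mean better-calibrated confidence.
    """
    return sum((e["confidence"] - float(e["win"])) ** 2
               for e in episodes) / len(episodes)

episodes = [
    {"confidence": 0.72, "win": True},
    {"confidence": 0.40, "win": False},
    {"confidence": 0.90, "win": False},  # overconfident loss dominates the score
]
```

A dataset whose confidence scores don't beat a constant baseline on this metric can't teach a model when to trust its own analysis.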
2. Market Context
The context component captures everything the agent observed when making the decision.
{
"market_context": {
"candles_15m": [...], // Last 20 bars
"candles_1h": [...], // Last 20 bars
"candles_4h": [...], // Last 20 bars
"candles_1d": [...], // Last 10 bars
"indicators": {
"rsi_14": 58.3,
"macd": {"line": 124, "signal": 98, "histogram": 26},
"bollinger": {"upper": 66200, "mid": 65100, "lower": 64000}
},
"volatility_regime": "moderate",
"atr_1h": 312.5,
"atr_4h": 847.2,
"btc_price": 65240,
"order_book": {
"spread_bps": 2.3,
"imbalance_ratio": 1.24,
"bid_depth_usd": 4200000,
"ask_depth_usd": 3800000,
"book_depth_score": 0.73,
"large_bid_count": 12,
"large_ask_count": 8
}
}
}
Why Each Field Matters
Multi-timeframe candles: Models learn to synthesize information across time horizons. 15-minute data shows entry timing; daily data shows macro context. Different timeframes serve different purposes.
Pre-computed indicators: RSI, MACD, and Bollinger Bands are pre-calculated so every episode uses identical definitions. Models learn to interpret indicator readings in direct connection with outcomes.
volatility_regime: Explicit regime classification (low/moderate/high). Models learn that the same signal means different things in different regimes.
atr_1h/4h: Average True Range enables volatility-adjusted sizing. A 2% stop in a low-vol regime is different from 2% in high-vol.
order_book: Current market microstructure. Spread, depth, and imbalance inform execution quality expectations and short-term directional bias.
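The order-book fields can all be derived from raw bid/ask levels. This is a sketch under simple assumptions: levels arrive as `(price, size)` tuples with the best level first, and the $100k "large order" threshold is illustrative, not part of the schema.

```python
def book_metrics(bids, asks, large_usd=100_000):
    """Derive spread_bps, imbalance_ratio, and depth from raw levels."""
    best_bid, best_ask = bids[0][0], asks[0][0]
    mid = (best_bid + best_ask) / 2
    spread_bps = (best_ask - best_bid) / mid * 10_000   # spread in basis points
    bid_depth = sum(p * s for p, s in bids)             # notional resting on bid
    ask_depth = sum(p * s for p, s in asks)             # notional resting on ask
    return {
        "spread_bps": round(spread_bps, 2),
        "imbalance_ratio": round(bid_depth / ask_depth, 2),
        "bid_depth_usd": bid_depth,
        "ask_depth_usd": ask_depth,
        "large_bid_count": sum(1 for p, s in bids if p * s >= large_usd),
        "large_ask_count": sum(1 for p, s in asks if p * s >= large_usd),
    }
```

Snapshotting these at decision time, rather than recomputing later from candles, is what makes the microstructure context faithful to what the agent actually saw.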
3. Trade Execution
The execution component captures the specific actions taken.
{
"trade_execution": {
"action": "create_order",
"entry_price": 65247.50,
"position_size_usd": 5000,
"leverage": 3,
"stop_loss": 63800,
"take_profits": [66500, 67200, 68000],
"tool_name": "hyperliquid_perp",
"executed_at": "2025-03-15T14:32:17Z"
}
}
Why Each Field Matters
action: Classification of what was done (create_order, close, resize). Different action types have different risk profiles and appropriate contexts.
entry_price: The actual fill price. Compared against the intended limit price, it enables slippage analysis. Models learn to anticipate execution costs.
position_size_usd + leverage: Together these define risk. Models learn appropriate sizing for different setups and confidence levels.
stop_loss: Where the protective exit sits. Models learn stop placement relative to market structure and volatility.
take_profits (array): Multiple TP levels indicate scaling strategy. Models learn that exits aren't binary.
tool_name: Which execution tool was used. Different venues have different characteristics; models learn venue-appropriate behavior.
executed_at: Timestamp enables learning temporal patterns. Time-of-day and day-of-week effects are real in markets.
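The "sizing for 2% portfolio risk" logic from the reasoning trace can be made explicit. This is a minimal sketch, assuming the simplest definition of risk (distance to stop as a fraction of entry); the function and parameter names are illustrative.

```python
def size_position(equity_usd, risk_fraction, entry, stop, max_leverage=5):
    """Choose a notional size so a stop-out loses at most
    `risk_fraction` of equity, capped at `max_leverage`."""
    stop_distance = abs(entry - stop) / entry        # loss per $1 of notional
    risk_usd = equity_usd * risk_fraction            # max acceptable loss
    notional = min(risk_usd / stop_distance,         # implied position size
                   equity_usd * max_leverage)        # leverage cap
    leverage = max(1.0, notional / equity_usd)
    return notional, leverage
```

Note how `position_size_usd`, `leverage`, and `stop_loss` are jointly determined: recording only one of them throws away the risk reasoning behind the other two.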
4. Position Journey
The journey component tracks how the position evolved over time.
{
"position_journey": {
"checkpoints": [
{
"type": "1h",
"hours_since_entry": 1,
"price": 65420,
"pnl_percent": 0.27,
"mfe_at_checkpoint": 0.35,
"mae_at_checkpoint": -0.12
},
{
"type": "4h",
"hours_since_entry": 4,
"price": 65890,
"pnl_percent": 0.99,
"mfe_at_checkpoint": 1.24,
"mae_at_checkpoint": -0.12
},
// ... more checkpoints
],
"max_favorable_excursion": 2.47,
"max_adverse_excursion": -0.89
}
}
Why Each Field Matters
checkpoints: Regular snapshots of position state. This turns a single trade into a sequence, enabling trajectory modeling.
pnl_percent at checkpoint: Running P&L shows how the trade unfolded. Models learn patterns of winning vs. losing trades over time.
mfe_at_checkpoint: Best profit reached to that point. Combined with final outcome, this shows "could have exited here" opportunities.
mae_at_checkpoint: Worst loss experienced to that point. Shows how much drawdown was endured.
max_favorable_excursion: The peak unrealized profit. This is the benchmark for exit optimization.
max_adverse_excursion: The worst unrealized loss. This is the true risk that was taken.
Maximum Favorable and Adverse Excursion metrics are often missing from trading datasets. Without them, you can't train exit optimization. A trade that made 1% but could have made 5% looks identical to a trade that made 1% at its peak.
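Computing MFE and MAE requires nothing more than the post-entry price path. A minimal sketch, assuming percentage excursions and a `direction` flag for long/short:

```python
def excursions(entry_price, prices, direction=1):
    """Max favorable/adverse excursion (%) over a price path.

    direction: +1 for long, -1 for short. Returns (mfe, mae) where
    mfe >= 0 is the best unrealized gain and mae <= 0 the worst loss.
    """
    mfe = mae = 0.0
    for p in prices:
        pnl = direction * (p - entry_price) / entry_price * 100
        mfe = max(mfe, pnl)
        mae = min(mae, pnl)
    return round(mfe, 2), round(mae, 2)
```

Running this incrementally at each checkpoint yields the `mfe_at_checkpoint` / `mae_at_checkpoint` fields; running it over the whole journey yields the trade-level maxima.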
5. Outcomes
The outcome component captures what actually happened.
{
"outcome": {
"result": "win",
"exit_price": 66480,
"realized_pnl": 94.20,
"realized_pnl_percent": 1.89,
"hold_duration_minutes": 487,
"closed_at": "2025-03-15T22:39:42Z"
}
}
Why Each Field Matters
result: Win/loss/breakeven classification. The basic label for supervised learning.
realized_pnl: Absolute dollar P&L. Important for understanding scale.
realized_pnl_percent: Percentage return. Normalized for comparison across position sizes.
hold_duration_minutes: How long the position was held. Combined with returns, this measures efficiency (return per unit time).
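The outcome fields compose into simple derived labels. Both helpers below are illustrative: the per-hour efficiency metric and the breakeven band width are assumptions, not part of the schema.

```python
def efficiency(pnl_percent, hold_minutes):
    """Return per hour held -- a simple capital-efficiency score."""
    return pnl_percent / (hold_minutes / 60)

def label_outcome(realized_pnl_percent, breakeven_band=0.05):
    """Win/loss/breakeven label; trades within the band count as breakeven."""
    if abs(realized_pnl_percent) <= breakeven_band:
        return "breakeven"
    return "win" if realized_pnl_percent > 0 else "loss"
```

On the example above (1.89% over 487 minutes), efficiency comes out to roughly 0.23% per hour, which lets the model compare a slow grinder against a fast scalp on equal terms.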
6. Counterfactuals
The counterfactual component captures what could have happened with different decisions.
{
"counterfactuals": {
"mfe_price": 66720,
"mfe_pnl_percent": 2.26,
"mae_price": 64940,
"mae_pnl_percent": -0.47,
"pnl_trailing_2pct": 1.94,
"pnl_trailing_5pct": 2.18,
"optimal_pnl_percent": 2.26,
"timing_score": 0.84,
"held_too_long": false,
"exited_too_early": true
}
}
Why Each Field Matters
mfe_price/pnl: The best exit that was available. This is the ceiling for what this trade could have made.
mae_price/pnl: The worst point in the trade. If stop was hit here, this would have been the loss.
pnl_trailing_2pct/5pct: What would have happened with trailing stop strategies. Enables comparing actual exit strategy vs. alternatives.
optimal_pnl_percent: Theoretical maximum achievable. The upper bound.
timing_score: A 0-1 composite metric for exit quality. Higher means closer to optimal.
held_too_long / exited_too_early: Explicit error flags. These enable targeted learning on specific failure modes.
Counterfactuals enable learning from "what could have been." A single trade yields multiple training signals: the actual outcome plus comparisons to alternatives. This multiplies the information density of each episode.
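The trailing-stop counterfactuals can be generated by replaying the price path. This is a sketch under stated simplifications: it checks the stop against closing prices only, so real fills would differ with intrabar movement and slippage.

```python
def trailing_stop_pnl(entry, prices, trail_pct, direction=1):
    """Simulate a trailing stop over the post-entry price path.

    The stop trails the best price seen by `trail_pct` percent; the
    trade exits when price retraces through it, else at the last price.
    """
    best = entry
    for p in prices:
        if direction * (p - best) > 0:
            best = p                                    # new favorable extreme
        stop = best * (1 - direction * trail_pct / 100)
        if direction * (p - stop) <= 0:                 # retraced through stop
            return round(direction * (stop - entry) / entry * 100, 2)
    return round(direction * (prices[-1] - entry) / entry * 100, 2)
```

Running this once per trail width (2%, 5%, ...) over the same path produces the `pnl_trailing_*` fields, turning one trade into several comparable exit strategies.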
How Components Connect
These six components aren't independent. They form a coherent narrative:
- Market context shows what information was available
- Agent reasoning shows how that information was processed
- Trade execution shows what action resulted
- Position journey shows how the trade unfolded
- Outcomes show what happened
- Counterfactuals show what could have happened
Training on this complete chain teaches models not just what to do, but how to think, and how to evaluate their own performance.
Optional: Social Sentiment
Some episodes include social context:
{
"social_sentiment": {
"tweet_count": 847,
"time_window_minutes": 1440,
"avg_sentiment_score": 0.23,
"sentiment_distribution": {
"bullish": 412,
"bearish": 198,
"neutral": 237
},
"tweets": [
{
"text": "BTC breaking out of this range...",
"sentiment": "bullish",
"sentiment_score": 0.76,
"importance": "high",
"engagement": {
"likes": 2847,
"retweets": 342
}
}
]
}
}
Social sentiment enables models to incorporate crowd signals. The engagement metrics teach appropriate weighting, distinguishing high-reach signals from noise.
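One way to turn the tweet list into a single signal is an engagement-weighted average. The weighting scheme below (log-scaled reach, retweets counted double) is an illustrative assumption, not a standard; its point is that log scaling keeps one viral tweet from drowning out the rest.

```python
import math

def weighted_sentiment(tweets):
    """Engagement-weighted average sentiment in [-1, 1]."""
    total = weighted = 0.0
    for t in tweets:
        reach = t["engagement"]["likes"] + 2 * t["engagement"]["retweets"]
        w = math.log1p(reach)                      # damp viral outliers
        sign = {"bullish": 1, "bearish": -1, "neutral": 0}[t["sentiment"]]
        weighted += w * sign * t["sentiment_score"]
        total += w
    return weighted / total if total else 0.0
```

A raw count of bullish vs. bearish tweets would weight a bot with 3 likes the same as an account with 2,847; the weighting is where "crowd signal vs. noise" actually gets encoded.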
Quality Requirements
The structure above is necessary but not sufficient. Quality episodes also require:
- Consistency: Same fields populated the same way across all episodes
- Completeness: No missing components or null values in critical fields
- Accuracy: Prices, timestamps, and calculations that match reality
- Diversity: Coverage of different market conditions, strategies, and outcomes
- Recency: Fresh data that reflects current market structure
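The completeness and accuracy requirements are mechanically checkable. A minimal validator sketch, assuming the component names from this article and dict-shaped episodes; real checks would also verify prices, timestamps, and calculations against market data.

```python
REQUIRED = ["agent_reasoning", "market_context", "trade_execution",
            "position_journey", "outcome", "counterfactuals"]

def validate_episode(episode):
    """Return a list of quality problems; an empty list means the episode passes."""
    problems = [k for k in REQUIRED if not episode.get(k)]  # missing/empty components
    conf = episode.get("agent_reasoning", {}).get("decision_confidence")
    if conf is not None and not 0.0 <= conf <= 1.0:
        problems.append("decision_confidence out of [0, 1]")
    return problems
```

Running a validator like this over the whole dataset, and rejecting rather than imputing failures, is the cheapest way to enforce the consistency rule across episodes.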
Using This Structure
If you're evaluating training data for financial AI:
- Check for completeness. Are all six components present? Missing reasoning traces or counterfactuals severely limit training value.
- Verify the reasoning chain. Does the reasoning connect observations to actions? Or is it post-hoc rationalization?
- Examine counterfactual quality. Are MFE/MAE properly calculated? Are timing scores meaningful?
- Look at diversity. Does the dataset cover wins and losses, different assets, different market conditions?
- Assess freshness. When were episodes generated? Markets from 2023 don't teach models about 2025 conditions.
The structure of your training data determines the ceiling for what your models can learn. Get the structure right, and everything else becomes easier.