
Why Financial AI Needs Reasoning Traces, Not Just Outcomes

TL;DR

Outcome-only training conflates luck with skill; reasoning traces are the only way to separate them and teach models how to think about financial decisions.

  • OpenAI's research shows process supervision "significantly outperforms outcome supervision" for reasoning tasks
  • FOMO entries have 23% lower win rates than planned entries, but this only shows up in reasoning data, not outcomes
  • ReAct (Reasoning and Acting) traces capture the adaptive information-gathering process that mirrors how human traders actually work

Here's a trade: BTC long at $65,000, closed at $67,000, profit 3%. Good or bad?

You can't tell. That's the problem with outcome-only training data.

Maybe the trader had a $70,000 target and exited early because of a scary headline. Maybe they executed a textbook breakout trade with perfect risk management. Maybe they just aped in because someone on Twitter said "BTC to 100k." The outcome is identical. The reasoning is completely different. And the model trained on this data will have no idea which behavior to replicate.

In 2023, OpenAI published a paper called "Let's Verify Step by Step" that demonstrated something financial AI builders should pay attention to. When they trained models to evaluate each step of mathematical reasoning rather than just checking if the final answer was correct, accuracy on challenging problems improved substantially. Process supervision, they found, "significantly outperforms outcome supervision."

The same principle applies to trading, only it's even more pronounced. Because in math, a wrong answer is clearly wrong. In trading, a wrong decision can produce the right outcome through pure luck, and vice versa.

Why Outcome Learning Fails in Markets

Behavioral finance research shows that traders feel losses nearly twice as strongly as equivalent gains. This asymmetry makes them hold losing positions too long, hoping for a recovery. After a loss, revenge trading increases average loss size by 340%. FOMO entries have win rates 23% lower than planned entries.

None of these psychological patterns show up in outcome data. A FOMO entry that happens to profit looks identical to a planned entry that profits. A revenge trade that gets lucky looks like skill. If you train on outcomes alone, you're training the model to replicate behaviors that will eventually blow up.

The problem gets worse: market outcomes are noisy by nature. Even skilled traders experience long losing streaks from variance. Even random strategies occasionally string together winners. Studies suggest distinguishing genuine skill from luck requires hundreds or thousands of independent trades. Yet most training happens on far smaller datasets.
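The "hundreds or thousands" figure follows from basic sample-size arithmetic. As a back-of-envelope sketch (normal approximation, 5% significance, 80% power; the specific win rates are illustrative assumptions), distinguishing a trader with a 55% win rate from a coin flip takes roughly:

```python
# How many independent trades to distinguish a 55% win rate from a
# 50% coin flip? Standard two-sided sample-size formula for a
# proportion (normal approximation).
z_alpha = 1.96  # 5% significance, two-sided
z_beta = 0.84   # 80% power
p = 0.5         # null win rate (pure luck)
delta = 0.05    # detectable edge: 55% vs 50%

n = ((z_alpha + z_beta) ** 2 * p * (1 - p)) / delta ** 2
# n works out to 784 trades -- and a smaller edge needs far more
```

A 2% edge instead of 5% pushes the requirement toward five thousand trades, which is why evaluating the process directly matters so much with limited samples.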

The Core Problem

Reasoning traces are the only way to separate luck from skill with limited samples. They let you evaluate whether the process was sound, regardless of whether the outcome in this particular instance was favorable.

What Reasoning Traces Actually Contain

A reasoning trace captures the step-by-step cognitive process:

Phase: Analysis
"4-hour chart showing higher lows since yesterday's
consolidation. RSI at 42, not overbought. Volume
declining into the range, suggesting accumulation.
BTC.D stable, no major alt rotation underway."

Phase: Decision
"Setup matches my breakout criteria. Risk-reward
is roughly 1:2.5 to the next resistance at 68k.
Position sizing at 2% of portfolio gives stop
distance of ~2.2%. Confidence: moderate-high."

Phase: Execution
"Placing limit order 50 points below current bid
to catch a wick. Stop at yesterday's low. First
TP at 67k (50% size), second at 68.2k (remaining)."

Each phase serves a training purpose:

  • Analysis shows what observations matter and why
  • Decision shows how observations convert to actionable conclusions
  • Execution shows how conclusions become specific actions
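The three phases above map naturally onto a structured record. Here is a minimal sketch; the field names are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class ReasoningTrace:
    analysis: str      # what was observed and why it matters
    decision: str      # how observations became a conclusion
    execution: str     # the concrete actions taken
    confidence: float  # 0.0-1.0, usable later for calibration

trace = ReasoningTrace(
    analysis="4h higher lows; RSI 42; volume declining into the range.",
    decision="Breakout setup, R:R ~1:2.5 to 68k, sizing 2% of portfolio.",
    execution="Limit 50 points below bid; stop at yesterday's low.",
    confidence=0.72,
)
```

Keeping the phases as separate fields, rather than one free-text blob, is what makes the trace usable as supervision later.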

ReAct: Reasoning and Acting

The most effective reasoning pattern for financial AI is ReAct (Reasoning and Acting). Rather than thinking through everything first and then acting, ReAct interleaves reasoning with tool use:

  1. Think: What do I need to know next?
  2. Act: Call a tool to get that information
  3. Observe: Process the result
  4. Think: What does this mean for my decision?
  5. Repeat until ready to decide

This pattern mirrors how human traders actually work. They don't analyze everything upfront; they iteratively gather information based on what they've learned.
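The loop above can be sketched in a few lines. The `think` and `call_tool` callables are hypothetical stand-ins for a model and a tool router:

```python
def react_loop(question, think, call_tool, max_steps=5):
    """Interleave reasoning with tool use until a decision is reached."""
    history = []
    for _ in range(max_steps):
        thought = think(question, history)        # Think: what next?
        if thought["action"] == "finish":
            return thought["answer"], history     # ready to decide
        observation = call_tool(thought["action"],  # Act: fetch info
                                thought["args"])
        history.append((thought, observation))      # Observe: record it
    return None, history  # ran out of steps without deciding
```

The key property for training data is that `history` records each thought alongside the observation it produced, so the trace preserves why each piece of information was fetched.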

Why ReAct Matters

Financial decisions require dynamic information gathering. The questions you need to answer depend on what you've already learned. ReAct traces capture this adaptive process, not just the final conclusions.

Confidence Calibration

One critical element of reasoning traces is confidence scoring. Models need to learn not just what to decide, but how confident to be in that decision.

"decision_confidence": 0.72

When paired with outcomes, this enables calibration training. Over many examples, the model learns:

  • When it says 70% confident, it should be right about 70% of the time
  • What kinds of situations warrant high confidence vs. uncertainty
  • How to express appropriate hedging when confidence is low

This is essential for production systems. An overconfident model that's wrong is worse than no model at all. A well-calibrated model can be trusted to know what it knows and acknowledge what it doesn't.

Process Supervision vs. Outcome Supervision

The research community increasingly recognizes the value of process supervision:

Outcome supervision: Reward the model based on whether the final answer was correct.

Process supervision: Provide feedback on each step of the reasoning, not just the conclusion.

Studies have shown that process supervision produces more robust and generalizable reasoning. Models learn the underlying logic rather than pattern-matching to correct answers.

For financial AI, this translates to:

  • Training on whether the analysis correctly identified key factors (not just whether the trade profited)
  • Training on whether risk was appropriately sized (not just whether the stop got hit)
  • Training on whether the reasoning was sound (not just whether the outcome was good)
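The contrast between the two reward signals can be made concrete. In this sketch, per-step labels (1 = sound, 0 = flawed) are hypothetical human annotations on each reasoning phase:

```python
def outcome_reward(trade):
    """Outcome supervision: did the trade make money?"""
    return 1.0 if trade["pnl"] > 0 else 0.0

def process_reward(trade):
    """Process supervision: average soundness of each reasoning step."""
    steps = trade["step_labels"]
    return sum(steps) / len(steps)

lucky_fomo = {"pnl": 0.03, "step_labels": [0, 0, 0]}  # aped in, got lucky
planned    = {"pnl": 0.03, "step_labels": [1, 1, 1]}  # textbook process
# Identical outcome reward; completely different process reward.
```

Outcome supervision scores both trades 1.0 and teaches the model to imitate the lucky FOMO entry; process supervision scores it 0.0 and only rewards the replicable one.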

A Tale of Two Trades

Consider two trades with identical outcomes:

Trade A:

Entry: BTC $65,000
Exit: BTC $67,000
Profit: 3.07%

Reasoning: "My friend said BTC was going to moon.
I aped in. Got lucky I guess."

Trade B:

Entry: BTC $65,000
Exit: BTC $67,000
Profit: 3.07%

Reasoning: "Daily trend bullish, 4h showing
higher low structure. RSI reset from overbought.
Entry on retest of breakout level with stop
below structure. Target at next resistance zone.
Risk-reward 1:2.4, sizing for 2% portfolio risk."

Outcome-only training sees these as equivalent. Both were profitable. Both would get the same reward signal.

Reasoning-trace training sees them as fundamentally different. Trade A provides no useful signal; it's noise. Trade B demonstrates a replicable process that can be learned and applied to future decisions.

Implications for Training Infrastructure

Capturing reasoning traces requires more than adding a text field to your database. It requires:

Structured Reasoning Formats

Free-form text is hard to learn from. Structured formats with explicit phases, tool calls, and confidence scores provide cleaner training signal.

Real-Time Capture

Post-hoc rationalization isn't the same as real reasoning. The trace needs to be captured as the decision happens, not reconstructed afterward.

Integration with Execution

Reasoning and action must be linked. A trace that says "I decided X" while the execution shows "they did Y" creates noise, not signal.
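One simple guard is a consistency filter that drops traces whose stated decision contradicts the executed action. The record layout here is an illustrative assumption:

```python
def is_consistent(trace):
    """Keep only traces where the decided side matches the executed side."""
    return trace["decision"]["side"] == trace["execution"]["side"]

samples = [
    {"decision": {"side": "long"}, "execution": {"side": "long"}},
    {"decision": {"side": "long"}, "execution": {"side": "short"}},  # noise
]
clean = [t for t in samples if is_consistent(t)]
```

Real pipelines would also check sizes, stops, and timestamps, but even this minimal filter removes the "said X, did Y" traces that corrupt the training signal.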

Human-in-the-Loop Validation

Not all reasoning is good reasoning. Human feedback on reasoning quality, not just outcomes, helps filter training data.

What Models Learn From Reasoning Traces

When trained on high-quality reasoning traces, models develop capabilities that outcome-only training can't produce:

  • Structured analysis: Breaking complex situations into manageable components
  • Appropriate information seeking: Knowing what to look at and what to ignore
  • Decision frameworks: Consistent logic for converting analysis to action
  • Risk awareness: Natural integration of risk considerations into every decision
  • Self-monitoring: Recognition of when reasoning is solid vs. when it's shaky

These are the capabilities that distinguish usable financial AI from models that can merely talk about finance.

The Bottom Line

If you want AI that can execute financial tasks reliably, you need training data that shows how decisions get made, not just what decisions were made. Reasoning traces are the bridge between language understanding and operational competence.

Need Training Data With Reasoning Traces?

UV Labs captures complete reasoning processes from real financial decisions.

Schedule a Conversation