On August 1, 2012, Knight Capital deployed a software update to their trading systems. Within 45 minutes, the firm had lost $440 million. A dormant piece of code called "Power Peg" was accidentally left active on one of eight servers, triggering millions of unintended trades and single-handedly generating 20-50% of total market volume in 75 different stocks.
Knight had decades of market data. Petabytes of price history, order flow, execution records. What they didn't have was a system that understood how to make decisions about trading, as opposed to data about what trading looked like.
This distinction matters for AI.
The $3 Million Problem
When Bloomberg built BloombergGPT, their 50-billion-parameter finance model, they spent approximately $3 million on training compute alone. They fed it 363 billion tokens of financial data from their proprietary terminals, news feeds, and research archives.
The result? A model that outperforms general LLMs on financial NLP benchmarks by 8-10 percentage points. It can classify sentiment, extract entities, and answer questions about financial concepts. Useful stuff.
But can it trade? Can it look at a market situation and decide whether to go long, short, or flat? Can it size a position appropriately given a risk budget? Does it know when to cut a loss versus when to hold through noise?
No. And the reason isn't compute or architecture. It's data.
What Jim Simons Understood
Renaissance Technologies' Medallion Fund returned 66% annually (gross) for 30 years. A dollar invested in 1988 would be worth $14 million today. During the 2008 financial crisis, when most funds were hemorrhaging money, Medallion returned 98%.
Jim Simons, the mathematician who founded Renaissance, once said: "Patterns of price movement are not random. However, they're close enough to random so that getting some excess, some edge out of it, is not easy and not so obvious."
What made Renaissance different wasn't access to better price data. Everyone has price data. What made them different was capturing and learning from the decisions their systems made. Every trade, every signal, every reasoning chain got recorded and fed back into their models. They built a flywheel of decision data that's been compounding for decades.
The secret wasn't in the market data. It was in the decision data.
Market data shows what happened. Decision data shows what to do about it. Bloomberg has 363 billion tokens of the former. Nobody has built a comparable dataset of the latter.
Why Market Data Doesn't Transfer
Imagine you wanted to teach someone to play poker by showing them hand histories. Here's the board, here's the action, here's who won the pot.
Now imagine you had millions of these histories. Would watching enough hands teach you to play well?
Probably not. You'd learn the rules and some patterns. You'd see that certain hands tend to win. But you wouldn't learn how to think about the game. When to bluff, how to read opponents, when a pot odds calculation should override your gut feeling about an opponent's range.
Financial data has the same problem. You can feed a model:
Price data: BTC went from $64,000 to $61,000 in 3 hours on heavy volume. But should you buy the dip? Sell the breakdown? The data doesn't say.
Order book snapshots: There's 500 BTC of bids within 1% of current price. But is that real demand or spoofing? Should you front-run it or fade it?
News and sentiment: The Fed hinted at rate cuts. But is that priced in already? How much? Are traders positioned for it or against it?
Behavioral finance research shows that traders feel losses nearly twice as strongly as equivalent gains. This makes them hold losing positions too long, hoping for a recovery rather than cutting losses. Revenge trading after a loss increases average loss size by 340%. FOMO entries have win rates 23% lower than planned entries.
None of this shows up in price charts. It's embedded in human decision-making. And until models can learn from decision data that captures this reasoning, they'll keep making the same mistakes amateur traders make.
The Catch-22
Here's the awkward part: this data doesn't exist because nobody's built the infrastructure to create it.
Professional traders don't document their reasoning. They're under time pressure, incentivized to execute rather than explain. Even traders who keep journals rarely capture the full context: what they saw on their screens, what news they'd read that morning, what alternatives they considered and rejected.
Quant funds have this data internally, but they guard it religiously. According to Barclays, quantitative funds now manage over 35% of all hedge fund assets, up from just 10% in 2010. The firms with the best decision data have no incentive to share it. Renaissance's decision records are probably worth more than most countries' GDP.
And even if someone wanted to build a public dataset, there's no standard format. Everyone tracks different things in different ways. Aggregating across sources is nearly impossible.
So research labs face a chicken-and-egg problem: they can't demonstrate financial AI progress without decision data, and nobody's building decision data because nobody's demonstrated it's worth the effort.
The RLHF Comparison
When OpenAI trained ChatGPT, they didn't just throw more text at it. They hired 40 contractors to provide human feedback on model outputs. Those contractors rated responses, identified problems, suggested improvements. This feedback data, not the pre-training corpus, is what made ChatGPT actually useful.
Today, companies like Scale AI and Surge AI have built massive labeling workforces. Generalists earn $20-40 per hour. Experts with doctoral degrees can earn $90-200 per hour for specialized domains. Surge AI alone bootstrapped to $1.2 billion in annual revenue by 2024, surpassing Scale's $870 million.
The lesson: human feedback data at scale transforms model capabilities. The companies that invested in building this data infrastructure became essential to the AI industry.
Financial AI needs the same thing. Not more market data, not bigger models, but structured decision data with human feedback baked in. Reasoning traces that explain not just what happened but why, and whether the reasoning was sound regardless of the outcome.
What Actually Needs to Exist
For financial AI to work, someone needs to build datasets that include:
Complete decision episodes. Not just "bought BTC at $65,000" but the entire context: what was the thesis, what signals were considered, what alternatives were rejected, what risk parameters constrained the decision, how was it sized, what was the plan for exit.
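One way such an episode could be structured, sketched as a Python dataclass. Every field name here is illustrative, not an existing standard; the point is that the record captures reasoning and plan, not just the fill:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DecisionEpisode:
    """One complete trading decision with its context -- an illustrative schema."""
    timestamp: str                       # when the decision was made (ISO 8601)
    instrument: str                      # e.g. "BTC-USD"
    thesis: str                          # the trader's stated reasoning
    signals_considered: list[str]        # inputs that informed the decision
    alternatives_rejected: list[str]     # options considered and discarded, with reasons
    risk_budget_pct: float               # max portfolio % at risk on this trade
    action: str                          # "long" | "short" | "flat"
    size: float                          # position size in units of the instrument
    entry_price: float
    stop_loss: Optional[float] = None    # planned exit if the thesis is wrong
    take_profit: Optional[float] = None  # planned exit if the thesis is right
    exit_plan: str = ""                  # exit conditions beyond the price levels

episode = DecisionEpisode(
    timestamp="2024-03-12T14:30:00Z",
    instrument="BTC-USD",
    thesis="Dip to support on declining sell volume; expect mean reversion",
    signals_considered=["3h volume profile", "funding rates", "Fed minutes"],
    alternatives_rejected=["wait for daily close (would likely miss the entry)"],
    risk_budget_pct=1.0,
    action="long",
    size=0.5,
    entry_price=61_000.0,
    stop_loss=59_800.0,
    take_profit=63_500.0,
    exit_plan="Exit early if funding flips strongly positive",
)
```

Note how much of the record is text: the thesis, the rejected alternatives, the exit plan. That is exactly the reasoning-trace material a price feed never contains.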
Counterfactual environments. The ability to replay the same market situation and explore different decisions. What would have happened if the trader had waited an hour? Sized up 2x? Set a tighter stop?
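A toy sketch of what counterfactual replay means: run alternative decisions against the same recorded price tape. This ignores fills, fees, and market impact (a real engine would need order-book simulation), and the stop is expressed in dollars purely for simplicity:

```python
def replay_counterfactual(prices, entry_index, direction, size, stop_dollars=None):
    """Replay a recorded price path under a hypothetical decision.

    prices: recorded prices from the decision point onward
    direction: +1 long, -1 short, 0 flat
    stop_dollars: optional max dollar loss before the position is cut
    Returns the P&L the alternative decision would have produced on this tape.
    """
    if direction == 0:
        return 0.0
    entry = prices[entry_index]
    for price in prices[entry_index + 1:]:
        pnl = direction * (price - entry) * size
        if stop_dollars is not None and pnl <= -stop_dollars:
            return -stop_dollars  # stopped out on this path
    return direction * (prices[-1] - entry) * size

# Same tape, four different decisions:
tape = [61_000, 60_400, 60_900, 62_200, 63_100]
actual   = replay_counterfactual(tape, 0, direction=+1, size=0.5)
waited   = replay_counterfactual(tape, 1, direction=+1, size=0.5)  # entered one step later
sized_up = replay_counterfactual(tape, 0, direction=+1, size=1.0)  # 2x size
stopped  = replay_counterfactual(tape, 0, direction=+1, size=0.5, stop_dollars=200.0)
```

On this tape, waiting an hour and sizing up both improve the result, while the tight stop gets hit on the initial dip. That spread of outcomes from one recorded situation is the training signal a static dataset can't provide.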
Process feedback, not just outcome feedback. A trade can be profitable from luck or unprofitable from bad variance. The reasoning quality matters separately from the P&L. OpenAI's research shows that "rewarding each correct step significantly outperforms outcome supervision" for reasoning tasks.
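A minimal way to keep the two signals separate. The scoring function and scales are illustrative, assuming per-step human ratings of the reasoning in the spirit of process supervision:

```python
def score_decision(reasoning_ratings, pnl):
    """Score process quality and outcome separately.

    reasoning_ratings: per-step human ratings of the reasoning chain, each in [0, 1]
    pnl: realized profit or loss of the trade
    Returns (process_score, outcome_score) as two numbers rather than one blend,
    so a lucky bad decision isn't rewarded and an unlucky good one isn't punished.
    """
    process_score = sum(reasoning_ratings) / len(reasoning_ratings)
    outcome_score = 1.0 if pnl > 0 else 0.0
    return process_score, outcome_score

# A profitable trade built on sloppy reasoning: good outcome, poor process.
proc, out = score_decision([0.2, 0.4, 0.1], pnl=+1200.0)
```

Training on `proc` and `out` as separate targets, instead of P&L alone, is what distinguishes process feedback from outcome feedback.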
Continuous fresh data. Markets change. A model trained on 2021 bull market data knows nothing about the 2022 Fed hiking cycle, the FTX collapse, or the 2024 recovery. Static datasets create brittle systems.
Why This Matters Now
Quant funds are estimated to manage $1.2-1.5 trillion in assets, somewhere between a quarter and a third of total hedge fund AUM depending on the estimate. Algorithmic trading accounts for roughly 70% of U.S. equity volume. These numbers keep growing.
But here's the thing: most current quant systems are narrow ML models: gradient-boosted trees and simple neural networks trained for specific signal-prediction tasks. They're powerful within their domains, but they can't generalize, can't adapt to new situations, and can't explain their reasoning.
LLMs could change that. A model that can reason about markets, use tools, and explain its thinking would be qualitatively different from today's quant systems. But only if it can be trained on the right data.
The teams building financial decision data infrastructure today are positioning themselves for this transition. The data layer is the bottleneck. Models can be replicated, architectures copied, but unique high-quality training data cannot be easily substituted.
The race to build financial decision data is quiet but consequential. Unlike model architecture papers that generate Twitter buzz, data infrastructure gets built in private. But when financial AI suddenly gets good, it won't be because someone invented a new transformer variant. It'll be because someone finally built the training data.