Your models need financial reasoning, not just financial facts. UV captures complete decision episodes so your models learn how expert agents think and trade.
Every episode preserves exact market conditions, order book depth, and agent context. Train a model, replay the episode, measure improvement. Same inputs, better outputs.
Not just what happened, but how and why. Chain-of-thought reasoning, multi-timeframe data, order book state, sentiment, and every tool call. Full supervision for full decisions.
Real P&L from real trades. Verified results plus counterfactual analysis: optimal exits, alternative strategies, luck-vs-skill breakdown. What happened and what should have.
Stream episodes for training or plug directly into our environment via API. Replay historical decisions, run your agents against real market conditions, test strategies before deployment.
Each episode captures the complete decision lifecycle, from market state through reasoning to verified outcome.
Train your models to transact with confidence. From payments to portfolio management, UV delivers the capabilities AI needs to operate as a trusted financial actor.
We offer flexible arrangements for universities, research labs, and individual researchers. Let's discuss how UV can support your work in financial AI.
Market data shows what happened. UV shows how decisions were made: complete reasoning traces, tool calls, and verified outcomes. It's the difference between price feeds and decision supervision.
Episodes are delivered as structured JSON via streaming API or batch export. Each episode includes market state, reasoning trace, action taken, and outcome. Compatible with standard ML pipelines.
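A hypothetical episode record might look like the sketch below. The field names (`episode_id`, `market_state`, `reasoning_trace`, and so on) are illustrative assumptions drawn from the description above, not the actual UV schema.

```python
import json

# Illustrative episode record -- field names are assumptions, not UV's schema.
episode_json = """
{
  "episode_id": "ep-0001",
  "market_state": {"symbol": "XYZ", "mid_price": 101.25, "book_depth": 5},
  "reasoning_trace": ["checked momentum", "sized position at 1% of equity"],
  "action": {"type": "limit_buy", "price": 101.20, "qty": 100},
  "outcome": {"realized_pnl": 42.0, "holding_period_s": 900}
}
"""

episode = json.loads(episode_json)
print(episode["action"]["type"])           # limit_buy
print(episode["outcome"]["realized_pnl"])  # 42.0
```

Because each record is plain JSON, it drops straight into standard tooling: stream records into a dataloader, or batch-load them into a dataframe for analysis.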
A mix of human traders and AI agents with varying strategies and skill levels. We include the full distribution (wins, losses, and breakevens) to avoid survivorship bias in your training data.
Yes. Episodes contain full state-action-reward-next_state tuples. The environment also supports online interaction for policy evaluation and live testing against real market conditions.
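As a sketch of what offline consumption of those tuples looks like, the snippet below defines a minimal transition record and computes a discounted return over one episode. The `Transition` class and `replay_return` helper are illustrative, not part of the UV SDK.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Transition:
    """One state-action-reward-next_state tuple from an episode (illustrative)."""
    state: Any
    action: Any
    reward: float
    next_state: Any

def replay_return(transitions: List[Transition], gamma: float = 0.99) -> float:
    """Discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for t in reversed(transitions):
        g = t.reward + gamma * g
    return g

episode = [
    Transition({"t": 0}, "buy",  1.0, {"t": 1}),
    Transition({"t": 1}, "hold", 0.0, {"t": 2}),
    Transition({"t": 2}, "sell", 2.0, {"t": 3}),
]
print(replay_return(episode, gamma=1.0))  # 3.0
```

The same tuples feed offline RL (behavior cloning, CQL, etc.) directly; the online environment is only needed when you want fresh rollouts.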
All market context is point-in-time. State snapshots reflect only information available at decision time. Outcomes and counterfactuals are computed post-hoc and clearly separated.
Realized P&L is the ground truth. Higher tiers include shaped rewards: risk-adjusted returns, timing scores, and counterfactual comparisons. You can also define custom reward functions via the API.
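A custom reward function might combine realized P&L with a risk penalty, in the spirit of the shaped rewards described above. This is a hedged sketch: `shaped_reward` and its parameters are invented for illustration and do not reflect the actual API.

```python
import statistics

def shaped_reward(realized_pnl: float,
                  step_returns: list,
                  risk_penalty: float = 0.5) -> float:
    """Risk-adjusted reward (illustrative): realized P&L minus a penalty
    proportional to the volatility of per-step returns along the path."""
    vol = statistics.pstdev(step_returns) if len(step_returns) > 1 else 0.0
    return realized_pnl - risk_penalty * vol

# A smooth path keeps the full P&L; a volatile path is penalized.
print(shaped_reward(10.0, [1.0, 1.0, 1.0]))   # 10.0
print(shaped_reward(10.0, [2.0, -2.0]))       # 9.0
```

Shaping like this keeps realized P&L as the anchor while letting you discourage strategies that reach the same endpoint through riskier paths.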
The Python SDK provides a Gym-style interface for episode replay and live interaction. Standard observation and action spaces with configurable wrappers for your architecture.
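A Gym-style replay loop typically looks like the skeleton below. This is a self-contained sketch of the interface shape (`reset`/`step` returning observation, reward, done, info), not the UV SDK itself; the `ReplayEnv` class and its transition format are assumptions.

```python
class ReplayEnv:
    """Illustrative Gym-style episode-replay environment (a sketch, not
    the UV SDK). Steps through pre-recorded transitions; the agent's
    action is accepted, but the logged outcome is what gets replayed."""

    def __init__(self, transitions):
        self.transitions = transitions
        self._i = 0

    def reset(self):
        self._i = 0
        return self.transitions[0]["state"]

    def step(self, action):
        t = self.transitions[self._i]
        self._i += 1
        done = self._i >= len(self.transitions)
        info = {"logged_action": t["action"]}  # expert action, for comparison
        return t["next_state"], t["reward"], done, info

env = ReplayEnv([
    {"state": 0, "action": "buy",  "reward": 1.0,  "next_state": 1},
    {"state": 1, "action": "sell", "reward": -0.5, "next_state": 2},
])
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, reward, done, info = env.step("hold")  # stub policy
    total += reward
print(total)  # 0.5
```

Exposing the logged expert action in `info` is a common pattern for replay environments: it lets you score your policy's choices against the recorded decision at every step.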
Yes. Academic and research use is encouraged. We provide anonymized datasets and can discuss data licensing for publications. See our Academic Program for discounted access.
Get access to decision episodes that teach your models how to reason about markets and act on that reasoning.