Post-Training Data & Environments

Decision data for financial AI.

Complete trading episodes with reasoning, actions, outcomes, and counterfactuals. Gym-compatible environments for on-policy training.

episode_nvda_20260507.json

{
  "symbol": "NVDA",
  "direction": "long",
  "market_context": {
    "regime": "low_vol",
    "rsi_4h": 58.2,
    "atr_pct": 0.024,
    "spread_bps": 1.2,
    "imbalance_ratio": 0.63,
    "volatility_regime": "compressing"
  },
  "social_sentiment": {
    "avg_score": 0.23,
    "distribution": { "bullish": 20, "bearish": 2, "neutral": 27 },
    "top_signal": "Institutional accumulation flagged by 3 accounts"
  },
  "agent_reasoning": {
    "explicit_reasoning": "NVDA consolidating above 200d MA post-earnings. Vol compressing, order book imbalance at 0.63, and 3 accounts flagging institutional accumulation. Entering long with stop below the consolidation range.",
    "decision_confidence": 0.84,
    "tool_calls": [{
      "name": "fetch_technicals",
      "args": { "symbol": "NVDA", "tf": "4h" },
      "result": "Above 200d MA. RSI 58, ATR contracting."
    }]
  },
  "trade_execution": {
    "entry_price": 131.42,
    "position_size_usd": 24800,
    "stop_loss": 126.50,
    "take_profits": [138.00, 142.50]
  },
  "position_journey": [
    { "at": "1h",  "pnl": 0.31, "mfe": 0.44, "mae": -0.12 },
    { "at": "4h",  "pnl": 1.12, "mfe": 1.38, "mae": -0.12 },
    { "at": "24h", "pnl": 2.89, "mfe": 3.41, "mae": -0.54 },
    { "at": "48h", "pnl": 4.72, "mfe": 6.31, "mae": -0.54 }
  ],
  "outcome": {
    "result": "win",
    "exit_price": 137.62,
    "realized_pnl_percent": 4.72,
    "hold_duration_minutes": 2847,
    "timing_score": 0.91,
    "sharpe_contribution": 0.34
  },
  "counterfactuals": [{
    "mfe_pnl_percent": 6.31,
    "optimal_pnl_percent": 8.43,
    "trailing_stop_pnl": 5.88,
    "timing_score": 0.74,
    "held_too_long": false,
    "exited_too_early": true
  }]
}
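As a quick sanity check on the sample above, the realized P&L can be recomputed from the entry and exit prices. This is a minimal sketch that assumes `realized_pnl_percent` is a simple price return on the entry price, ignoring fees and slippage; the helper name `pnl_percent` is illustrative and not part of the SDK.

```python
# Values copied from the sample episode above.
entry_price = 131.42
exit_price = 137.62
stop_loss = 126.50
direction = "long"


def pnl_percent(entry, exit_, direction):
    """Percentage return on the entry price, signed by trade direction.

    Assumption: no fees, funding, or slippage are included.
    """
    sign = 1.0 if direction == "long" else -1.0
    return sign * (exit_ - entry) / entry * 100.0


realized = pnl_percent(entry_price, exit_price, direction)
risk_at_stop = pnl_percent(entry_price, stop_loss, direction)

print(f"realized: {realized:.2f}%")          # matches realized_pnl_percent
print(f"risk at stop: {risk_at_stop:.2f}%")  # loss if the stop had been hit
```

Under that assumption the recomputed figure agrees with the `realized_pnl_percent` stored in `outcome`.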

500K+

Decision episodes

Full Traces

Reasoning, actions & outcomes. Intent to outcome.

3 Years

Live collection

Every other financial dataset captures what happened. Ours captures the reasoning behind every decision, the outcome that followed, and what should have happened instead.

train.py

import uv

# `model` and `train` stand in for your own policy and training loop.

# Offline: download complete episodes
episodes = uv.download("NVDA", n=10000)
train(model, episodes)

# Replay: evaluate against historical decisions
env = uv.replay("episode_nvda_20260507")
obs = env.reset()
action = model.predict(obs)
obs, reward, done, info = env.step(action)

# Online: generate live trajectories
env = uv.connect(market="live", symbol="NVDA")
obs, done = env.reset(), False
while not done:
    action = model.predict(obs)
    obs, reward, done, info = env.step(action)

RL Environment

Three ways to train

`uv.download()`

Complete episodes as state-action-reward tuples for offline training.

`uv.replay()`

Run your policy against historical decisions and measure counterfactual outcomes.
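One way to read the counterfactual fields is as gaps between what the agent realized and what alternative exits would have returned. The sketch below uses the field names from the sample episode; the derived metric names (`regret_vs_optimal`, `mfe_capture`, and so on) are illustrative assumptions, not fields the dataset ships.

```python
# Values copied from the sample episode's "outcome" and "counterfactuals".
outcome = {"realized_pnl_percent": 4.72}
counterfactual = {
    "mfe_pnl_percent": 6.31,
    "optimal_pnl_percent": 8.43,
    "trailing_stop_pnl": 5.88,
}


def counterfactual_gaps(outcome, cf):
    """Compare the realized return against each counterfactual exit."""
    realized = outcome["realized_pnl_percent"]
    mfe = cf["mfe_pnl_percent"]
    return {
        # Percentage points left on the table versus the best possible exit.
        "regret_vs_optimal": cf["optimal_pnl_percent"] - realized,
        # Percentage points a trailing stop would have added.
        "regret_vs_trailing_stop": cf["trailing_stop_pnl"] - realized,
        # Share of the maximum favorable excursion actually captured.
        "mfe_capture": realized / mfe if mfe else None,
    }


gaps = counterfactual_gaps(outcome, counterfactual)
for name, value in gaps.items():
    print(f"{name}: {value:.2f}")
```

For the sample episode, every gap is positive, which is consistent with the `exited_too_early` flag.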

`uv.connect()`

Generate on-policy trajectories in live or simulated markets.

Custom Trajectories

Specify the market, conditions, and strategy type. We deploy real traders to generate complete decision sequences with full reasoning traces, on demand.

Any Asset Class · Custom Schemas · On-Demand Generation · Full Reasoning Traces

FAQ

Common questions

Can you customize the schema for our use case?

Yes. Schemas, observation spaces, reward functions, and instrument coverage are all configurable. If you need fields that don't exist yet, we build them with you.

What format is the data in?

Structured JSON, delivered via streaming API or batch export. Each episode contains market state, reasoning trace, action, and outcome. Plugs into standard ML pipelines out of the box.

How do you handle lookahead bias?

Every state snapshot is strictly point-in-time: only information available at decision time. Outcomes and counterfactuals are computed after the fact and stored separately.
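On the consumer side, the same separation can be enforced when loading an episode. The sketch below groups the top-level keys of the sample episode into observation, action, and post-hoc labels; the grouping is our reading of the sample, and `split_episode` is a hypothetical helper rather than an SDK function.

```python
# Assumed grouping of the sample episode's top-level keys.
OBSERVATION_KEYS = ("symbol", "market_context", "social_sentiment")
ACTION_KEYS = ("direction", "agent_reasoning", "trade_execution")
LABEL_KEYS = ("position_journey", "outcome", "counterfactuals")


def split_episode(episode):
    """Separate decision-time inputs from the action and after-the-fact labels."""
    def pick(keys):
        return {k: episode[k] for k in keys if k in episode}

    return pick(OBSERVATION_KEYS), pick(ACTION_KEYS), pick(LABEL_KEYS)


# Trimmed copy of the sample episode.
episode = {
    "symbol": "NVDA",
    "direction": "long",
    "market_context": {"regime": "low_vol", "rsi_4h": 58.2},
    "trade_execution": {"entry_price": 131.42, "stop_loss": 126.50},
    "outcome": {"result": "win", "realized_pnl_percent": 4.72},
    "counterfactuals": [{"optimal_pnl_percent": 8.43}],
}

observation, action, labels = split_episode(episode)

# Guard against leakage: nothing computed after the fact may reach the model input.
assert not set(observation) & set(LABEL_KEYS)
```

Feeding only `observation` to the policy keeps outcomes and counterfactuals out of the model input, so they can be used purely as training targets or rewards.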

What's the reward signal?

Episodes include realized P&L, risk-adjusted metrics (Sharpe, Sortino, max drawdown), timing efficiency, and counterfactual comparisons. Most teams will want custom reward shaping on top of that (multi-objective, exposure-weighted, regime-conditional) and we support all of it.
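As an illustration of reward shaping on top of these fields, the sketch below penalizes realized P&L by the worst adverse excursion seen during the trade. The function name `shaped_reward` and the `drawdown_penalty` weight are hypothetical choices for this example, not a recommended or built-in reward.

```python
# Values copied from the sample episode.
outcome = {"realized_pnl_percent": 4.72}
position_journey = [
    {"at": "1h", "pnl": 0.31, "mfe": 0.44, "mae": -0.12},
    {"at": "4h", "pnl": 1.12, "mfe": 1.38, "mae": -0.12},
    {"at": "24h", "pnl": 2.89, "mfe": 3.41, "mae": -0.54},
    {"at": "48h", "pnl": 4.72, "mfe": 6.31, "mae": -0.54},
]


def shaped_reward(outcome, journey, drawdown_penalty=0.5):
    """Realized P&L minus a penalty proportional to the worst drawdown.

    drawdown_penalty is an illustrative weight; tune it for your objective.
    """
    worst_mae = min(step["mae"] for step in journey)  # most negative excursion
    return outcome["realized_pnl_percent"] - drawdown_penalty * abs(worst_mae)


reward = shaped_reward(outcome, position_journey)
print(f"shaped reward: {reward:.2f}")
```

Setting `drawdown_penalty=0` recovers plain realized P&L, which makes the shaped and unshaped rewards easy to compare.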

Do you support online and on-policy RL?

Yes. We provide offline datasets, episode replay for counterfactual evaluation, and live environments for on-policy trajectory generation. The SDK is Gym-compatible with configurable observation and action spaces.
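To show what the Gym-style `reset()` / `step()` contract looks like in practice, here is a toy stand-in that replays the `position_journey` from the sample episode. It is not the SDK's environment: `JourneyReplayEnv`, the two-action space, and the reward rule are all assumptions made for this sketch.

```python
HOLD, EXIT = 0, 1  # hypothetical two-action space


class JourneyReplayEnv:
    """Toy Gym-style environment that steps through a recorded position journey."""

    def __init__(self, journey):
        self.journey = journey
        self.t = 0

    def reset(self):
        self.t = 0
        return self.journey[self.t]

    def step(self, action):
        current = self.journey[self.t]
        is_last = self.t == len(self.journey) - 1
        if action == EXIT or is_last:
            # The episode ends; the reward is the P&L at the exit checkpoint.
            return current, current["pnl"], True, {"exited_at": current["at"]}
        self.t += 1
        return self.journey[self.t], 0.0, False, {}


journey = [
    {"at": "1h", "pnl": 0.31, "mfe": 0.44, "mae": -0.12},
    {"at": "4h", "pnl": 1.12, "mfe": 1.38, "mae": -0.12},
    {"at": "24h", "pnl": 2.89, "mfe": 3.41, "mae": -0.54},
    {"at": "48h", "pnl": 4.72, "mfe": 6.31, "mae": -0.54},
]


def run(env, policy):
    """Roll out one episode and return (total reward, final info dict)."""
    obs, done, total, info = env.reset(), False, 0.0, {}
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total, info


# A simple rule: exit as soon as unrealized P&L reaches 1%.
total, info = run(JourneyReplayEnv(journey), lambda obs: EXIT if obs["pnl"] >= 1.0 else HOLD)
print(total, info)
```

The rollout loop has the same shape as the `env.step()` loop in `train.py`, so a policy written against this toy interface only needs its observation handling adapted to the real environment.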

Can you scale to our data requirements?

750+ agents generate episodes continuously across multiple market regimes. Generation scales to match your pipeline throughput, and you can prioritize specific asset classes or decision types.

How does a pilot work?

You tell us what you're training and we scope the engagement: asset classes, episode volume, schema, and delivery format. Most pilots are live within a few weeks.

750+ agents generating episodes continuously

Run a pilot.

Tell us what you're training and we'll scope the data, schema, and environment around your research.

University and academic researchers — ask about discounted access.