Post-Training Data & Environments

Decision data for financial AI.

Reasoning traces, tool calls, counterfactuals, and verified outcomes. Three years of live collection, built for post-training.

View Environment

episode_nvda_20260507.json

{
  "symbol": "NVDA",
  "direction": "long",
  "market_context": {
    "regime": "low_vol",
    "rsi_4h": 58.2,
    "atr_pct": 0.024,
    "spread_bps": 1.2,
    "imbalance_ratio": 0.63,
    "volatility_regime": "compressing"
  },
  "social_sentiment": {
    "avg_score": 0.23,
    "distribution": { "bullish": 20, "bearish": 2, "neutral": 27 },
    "top_signal": "Institutional accumulation flagged by 3 accounts"
  },
  "agent_reasoning": {
    "explicit_reasoning": "NVDA consolidating above 200d MA post-earnings. Vol compressing, order book imbalance at 0.63, and 3 accounts flagging institutional accumulation. Entering long with stop below the consolidation range.",
    "decision_confidence": 0.84,
    "tool_calls": [{
      "name": "fetch_technicals",
      "args": { "symbol": "NVDA", "tf": "4h" },
      "result": "Above 200d MA. RSI 58, ATR contracting."
    }]
  },
  "trade_execution": {
    "entry_price": 131.42,
    "position_size_usd": 24800,
    "stop_loss": 126.50,
    "take_profits": [138.00, 142.50]
  },
  "position_journey": [
    { "at": "1h",  "pnl": 0.31, "mfe": 0.44, "mae": -0.12 },
    { "at": "4h",  "pnl": 1.12, "mfe": 1.38, "mae": -0.12 },
    { "at": "24h", "pnl": 2.89, "mfe": 3.41, "mae": -0.54 },
    { "at": "48h", "pnl": 4.72, "mfe": 6.31, "mae": -0.54 }
  ],
  "outcome": {
    "result": "win",
    "exit_price": 137.62,
    "realized_pnl_percent": 4.72,
    "hold_duration_minutes": 2847,
    "timing_score": 0.91,
    "sharpe_contribution": 0.34
  },
  "counterfactuals": [{
    "mfe_pnl_percent": 6.31,
    "optimal_pnl_percent": 8.43,
    "trailing_stop_pnl": 5.88,
    "timing_score": 0.74,
    "held_too_long": false,
    "exited_too_early": true
  }]
}

Illustrative schema: episodes span crypto and equities venues.
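As a quick sanity check on the sample above, the realized P&L can be recomputed from the entry and exit prices and compared against the hindsight-optimal exit. This is a minimal sketch, not part of the SDK: `pnl_percent` is a hypothetical helper, and it assumes `realized_pnl_percent` is a simple price return with no fees or slippage.

```python
import json

# Trimmed copy of the illustrative episode; only the fields used here.
EPISODE_JSON = """
{
  "direction": "long",
  "trade_execution": { "entry_price": 131.42 },
  "outcome": { "exit_price": 137.62, "realized_pnl_percent": 4.72 },
  "counterfactuals": [{ "optimal_pnl_percent": 8.43, "trailing_stop_pnl": 5.88 }]
}
"""

def pnl_percent(entry: float, exit_: float, direction: str) -> float:
    """Percent return of one round trip, signed by trade direction.

    Assumption: simple price return, ignoring fees and slippage.
    """
    sign = 1.0 if direction == "long" else -1.0
    return sign * (exit_ - entry) / entry * 100.0

episode = json.loads(EPISODE_JSON)
recomputed = pnl_percent(
    episode["trade_execution"]["entry_price"],
    episode["outcome"]["exit_price"],
    episode["direction"],
)

# Gap between the best exit in hindsight and what was actually realized.
regret = (
    episode["counterfactuals"][0]["optimal_pnl_percent"]
    - episode["outcome"]["realized_pnl_percent"]
)

print(round(recomputed, 2))  # matches realized_pnl_percent in the sample
print(round(regret, 2))
```

The second number is the kind of "what should have happened instead" signal the counterfactual block is meant to expose.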

500K+

Decision episodes from 1M+ live trades

Full Traces

Reasoning, actions & outcomes

Intent to outcome

3 Years

Live collection

Other datasets record what happened. Ours records why, and what should have happened instead.

train.py

import uvlabs

# `model` and `train` stand for your own policy and training routine.

# Offline: download complete episodes
episodes = uvlabs.download("NVDA", n=10000)
train(model, episodes)

# Replay: evaluate against historical decisions
env = uvlabs.replay("episode_nvda_20260507")
obs = env.reset()
action = model.predict(obs)
obs, reward, done, info = env.step(action)

# Online: generate live trajectories
env = uvlabs.connect(market="live", symbol="NVDA")
obs, done = env.reset(), False
while not done:
    action = model.predict(obs)
    obs, reward, done, info = env.step(action)

Python SDK (preview).

RL Environment

Three ways to train

`uvlabs.download()`

Complete episodes as state-action-reward tuples for offline training.

`uvlabs.replay()`

Run your policy against historical decisions and measure counterfactual outcomes.

`uvlabs.connect()`

Generate on-policy trajectories in live or simulated markets.

From UV Harness, running in live markets.

Custom Trajectories

Specify the market, conditions, and strategy type. We deploy real traders to generate complete decision sequences with full reasoning traces, on demand.

Any Asset Class · Custom Schemas · On-Demand Generation · Full Reasoning Traces

Talk to Us

FAQ

Common questions

Every episode in the corpus comes from the same production Harness that platforms run in live markets. The answers below cover schema, licensing, and integration.

Can you customize the schema for our use case?

Yes. Schemas, observation spaces, reward functions, and instrument coverage are all configurable. If you need fields that don't exist yet, we build them with you.

What format is the data in?

Structured JSON, delivered via streaming API or batch export. Each episode contains market state, reasoning trace, action, and outcome. Plugs into standard ML pipelines out of the box.
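A batch export could be consumed along these lines. This is a sketch under stated assumptions: it assumes one JSON episode per line (JSON Lines), which the page does not specify, and `iter_episodes` and `REQUIRED_KEYS` are hypothetical names, with the key set taken from the illustrative episode.

```python
import io
import json
from typing import Iterable, Iterator

# Top-level keys taken from the illustrative episode; treat as an assumption.
REQUIRED_KEYS = {"market_context", "agent_reasoning", "trade_execution", "outcome"}

def iter_episodes(lines: Iterable[str]) -> Iterator[dict]:
    """Yield episodes from an iterable of JSON lines, skipping blank lines."""
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        episode = json.loads(line)
        missing = REQUIRED_KEYS - episode.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        yield episode

# Stand-in for a batch export file; an open file handle works the same way.
export = io.StringIO(
    '{"symbol": "NVDA", "market_context": {}, "agent_reasoning": {}, '
    '"trade_execution": {}, "outcome": {"result": "win"}}\n'
    "\n"
)
episodes = list(iter_episodes(export))
print(len(episodes), episodes[0]["outcome"]["result"])
```

Validating keys at load time keeps malformed episodes out of the training set instead of failing midway through a run.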

How do you handle lookahead bias?

Every state snapshot is strictly point-in-time: only information available at decision time. Outcomes and counterfactuals are computed after the fact and stored separately.
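On the consumer side, the same separation can be enforced before anything reaches the model. The sketch below is an assumption about how you might guard your own pipeline, not a description of how the data is stored: `split_episode` is a hypothetical helper, and the hindsight key names come from the illustrative episode.

```python
from typing import Tuple

# Assumed from the illustrative episode: fields only known after the decision.
HINDSIGHT_KEYS = {"position_journey", "outcome", "counterfactuals"}

def split_episode(episode: dict) -> Tuple[dict, dict]:
    """Split an episode into decision-time fields and hindsight labels."""
    decision_time = {k: v for k, v in episode.items() if k not in HINDSIGHT_KEYS}
    hindsight = {k: v for k, v in episode.items() if k in HINDSIGHT_KEYS}
    return decision_time, hindsight

episode = {
    "symbol": "NVDA",
    "direction": "long",
    "market_context": {"regime": "low_vol", "rsi_4h": 58.2},
    "trade_execution": {"entry_price": 131.42},
    "position_journey": [{"at": "1h", "pnl": 0.31}],
    "outcome": {"result": "win", "realized_pnl_percent": 4.72},
    "counterfactuals": [{"optimal_pnl_percent": 8.43}],
}

decision_time, hindsight = split_episode(episode)

# Model inputs are built from `decision_time` only; `hindsight` feeds rewards and evals.
print(sorted(decision_time))
print(sorted(hindsight))
```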

What's the reward signal?

Episodes include realized P&L, risk-adjusted metrics (Sharpe, Sortino, max drawdown), timing efficiency, and counterfactual comparisons. Most teams will want custom reward shaping on top of that (multi-objective, exposure-weighted, regime-conditional) and we support all of it.
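As one example of shaping on top of the raw fields, the sketch below penalizes realized P&L by the worst adverse excursion along the way. `shaped_reward` and `drawdown_penalty` are hypothetical names, the weight of 0.5 is arbitrary, and the code assumes `mae` is reported in the same percent units as P&L.

```python
from typing import List

def shaped_reward(
    realized_pnl_percent: float,
    position_journey: List[dict],
    drawdown_penalty: float = 0.5,
) -> float:
    """Realized P&L minus a penalty on the worst adverse excursion.

    Assumption: `mae` is in the same percent units as P&L and is <= 0
    when the position went against the trade.
    """
    worst_mae = min((step["mae"] for step in position_journey), default=0.0)
    return realized_pnl_percent - drawdown_penalty * abs(min(worst_mae, 0.0))

# Values copied from the illustrative episode.
journey = [
    {"at": "1h", "pnl": 0.31, "mfe": 0.44, "mae": -0.12},
    {"at": "4h", "pnl": 1.12, "mfe": 1.38, "mae": -0.12},
    {"at": "24h", "pnl": 2.89, "mfe": 3.41, "mae": -0.54},
    {"at": "48h", "pnl": 4.72, "mfe": 6.31, "mae": -0.54},
]

reward = shaped_reward(4.72, journey)
print(round(reward, 2))
```

Two trades with the same final P&L then earn different rewards if one of them sat through a deeper drawdown to get there.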

Do you support online and on-policy RL?

Yes. We support offline datasets, episode replay for counterfactual evaluation, and live environments for on-policy trajectory generation. The SDK is Gym-compatible with configurable observation and action spaces.
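For readers unfamiliar with the Gym contract, the sketch below shows the classic `reset()` / `step()` loop against a toy environment. `ReplayStub` is a hypothetical stand-in written for this example and is not the SDK; its observation is just the current P&L, and its only actions are `"hold"` and `"exit"`.

```python
from typing import Callable, List, Tuple

class ReplayStub:
    """Toy stand-in that follows the classic Gym reset/step contract."""

    def __init__(self, pnl_path: List[float]):
        self.pnl_path = pnl_path
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return self.pnl_path[0]

    def step(self, action: str) -> Tuple[float, float, bool, dict]:
        # "exit" (or reaching the last snapshot) ends the episode and pays
        # the current P&L; "hold" advances to the next snapshot with no reward.
        current = self.pnl_path[self.t]
        is_last = self.t == len(self.pnl_path) - 1
        if action == "exit" or is_last:
            return current, current, True, {"t": self.t}
        self.t += 1
        return self.pnl_path[self.t], 0.0, False, {"t": self.t}

def rollout(env: ReplayStub, policy: Callable[[float], str]) -> float:
    """Run one episode and return the total reward."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total

# P&L snapshots copied from the illustrative episode's position_journey.
path = [0.31, 1.12, 2.89, 4.72]
early_exit = rollout(ReplayStub(path), lambda pnl: "exit" if pnl >= 2.0 else "hold")
hold_to_end = rollout(ReplayStub(path), lambda pnl: "hold")
print(early_exit, hold_to_end)
```

Any policy that works with this loop shape should need only a different environment object to run in offline, replay, or live settings.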

Can you scale to our data requirements?

750+ agents generate episodes continuously across multiple market regimes. Generation scales to match your pipeline throughput, and you can prioritize specific asset classes or decision types.

How does a pilot work?

You tell us what you're training and we scope the engagement: asset classes, episode volume, schema, and delivery format. Most pilots are live within a few weeks.
