Post-Training Data & Environments

Decision data for financial AI.

Complete trading episodes with reasoning, actions, outcomes, and counterfactuals. Gym-compatible environments for on-policy training.

episode_nvda_20260507.json
{
  "symbol": "NVDA",
  "direction": "long",
  "market_context": {
    "regime": "low_vol",
    "rsi_4h": 58.2,
    "atr_pct": 0.024,
    "spread_bps": 1.2,
    "imbalance_ratio": 0.63,
    "volatility_regime": "compressing"
  },
  "social_sentiment": {
    "avg_score": 0.23,
    "distribution": { "bullish": 20, "bearish": 2, "neutral": 27 },
    "top_signal": "Institutional accumulation flagged by 3 accounts"
  },
  "agent_reasoning": {
    "explicit_reasoning": "NVDA consolidating above 200d MA post-earnings. Vol compressing, order book imbalance at 0.63, and 3 accounts flagging institutional accumulation. Entering long with stop below the consolidation range.",
    "decision_confidence": 0.84,
    "tool_calls": [{
      "name": "fetch_technicals",
      "args": { "symbol": "NVDA", "tf": "4h" },
      "result": "Above 200d MA. RSI 58, ATR contracting."
    }]
  },
  "trade_execution": {
    "entry_price": 131.42,
    "position_size_usd": 24800,
    "stop_loss": 126.50,
    "take_profits": [138.00, 142.50]
  },
  "position_journey": [
    { "at": "1h",  "pnl": 0.31, "mfe": 0.44, "mae": -0.12 },
    { "at": "4h",  "pnl": 1.12, "mfe": 1.38, "mae": -0.12 },
    { "at": "24h", "pnl": 2.89, "mfe": 3.41, "mae": -0.54 },
    { "at": "48h", "pnl": 4.72, "mfe": 6.31, "mae": -0.54 }
  ],
  "outcome": {
    "result": "win",
    "exit_price": 137.62,
    "realized_pnl_percent": 4.72,
    "hold_duration_minutes": 2847,
    "timing_score": 0.91,
    "sharpe_contribution": 0.34
  },
  "counterfactuals": [{
    "mfe_pnl_percent": 6.31,
    "optimal_pnl_percent": 8.43,
    "trailing_stop_pnl": 5.88,
    "timing_score": 0.74,
    "held_too_long": false,
    "exited_too_early": true
  }]
}
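The percentage fields in an episode can be cross-checked against its raw prices. The sketch below is a minimal example. It assumes `realized_pnl_percent` is the plain entry-to-exit price move (no fees, slippage, or position sizing) and that risk is measured from entry to `stop_loss`. `pct_move`, `r_multiple`, `mfe_capture`, and `left_on_table` are hypothetical derived metrics, not dataset fields.

```python
import json

# Trimmed copy of the episode above (only the fields this check uses).
episode = json.loads("""
{
  "direction": "long",
  "trade_execution": {"entry_price": 131.42, "stop_loss": 126.50},
  "outcome": {"exit_price": 137.62, "realized_pnl_percent": 4.72},
  "counterfactuals": [{"mfe_pnl_percent": 6.31, "optimal_pnl_percent": 8.43}]
}
""")


def pct_move(entry: float, price: float, direction: str) -> float:
    """Signed price move in percent; positive means favorable for the position.
    Assumption: no fees, slippage, or sizing adjustments."""
    sign = 1.0 if direction == "long" else -1.0
    return sign * (price - entry) / entry * 100.0


ex, out = episode["trade_execution"], episode["outcome"]
cf = episode["counterfactuals"][0]
side = episode["direction"]

# Realized move recomputed from prices, to compare with the stored field.
realized = pct_move(ex["entry_price"], out["exit_price"], side)
# Risk taken: adverse move from entry to the stop, as a positive number.
risk = -pct_move(ex["entry_price"], ex["stop_loss"], side)
# Reward in units of initial risk (the "R multiple").
r_multiple = realized / risk
# Share of the best open profit (MFE) actually kept at exit.
mfe_capture = out["realized_pnl_percent"] / cf["mfe_pnl_percent"]
# Gap between the stored hindsight-optimal exit and the realized result.
left_on_table = cf["optimal_pnl_percent"] - out["realized_pnl_percent"]

print(f"realized={realized:.2f}% risk={risk:.2f}% R={r_multiple:.2f}")
print(f"mfe_capture={mfe_capture:.2f} left_on_table={left_on_table:.2f}%")
```

For this episode the recomputed move is about 4.72%, matching `realized_pnl_percent`, which is roughly 1.26 times the 3.74% distance to the stop.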
500K+ decision episodes
Full traces: reasoning, actions & outcomes, from intent to outcome
3 years of live collection

Most financial datasets capture only what happened. Ours also captures the reasoning behind each decision, the outcome that followed, and what should have happened instead.
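The "what should have happened instead" part can be made concrete by replaying a position's P&L path under alternative exit rules. This is a minimal sketch. It assumes a list of P&L checkpoints in percent and a trailing stop measured in percentage points below the running peak. The path is made up for illustration. These rules are one possible way to derive fields like `trailing_stop_pnl` and `optimal_pnl_percent`, not the dataset's documented method.

```python
def trailing_stop_exit(path: list[float], trail: float) -> float:
    """P&L (%) at which a stop trailing `trail` points below the running peak exits.
    Exits at the first checkpoint at or below the stop; holds to the end otherwise.
    Simplification: ignores price moves between checkpoints."""
    peak = float("-inf")
    for pnl in path:
        peak = max(peak, pnl)
        if pnl <= peak - trail:
            return pnl
    return path[-1]


def hindsight_best_exit(path: list[float]) -> float:
    """Best achievable exit with perfect foresight: the path maximum."""
    return max(path)


# Illustrative P&L path in percent (not dataset values).
path = [0.5, 1.0, 3.0, 6.0, 5.0, 4.5]
actual = path[-1]  # suppose the trader held to the last checkpoint

print("actual:", actual)
print("trailing 0.5:", trailing_stop_exit(path, 0.5))
print("trailing 2.0:", trailing_stop_exit(path, 2.0))
print("hindsight best:", hindsight_best_exit(path))
```

A 0.5-point trail exits at 5.0 instead of 4.5. A 2.0-point trail never triggers, so it matches holding. Neither reaches the hindsight maximum of 6.0, which is why a hindsight-optimal figure is an upper bound rather than an achievable target.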

train.py
import uv

# `model` (your policy) and `train` (your training routine) are placeholders defined elsewhere.

# Offline: download complete episodes
episodes = uv.download("NVDA", n=10000)
train(model, episodes)

# Replay: evaluate against historical decisions
env = uv.replay("episode_nvda_20260507")
obs = env.reset()
action = model.predict(obs)
obs, reward, done, info = env.step(action)

# Online: generate live trajectories, stepping until the episode ends
env = uv.connect(market="live", symbol="NVDA")
obs, done = env.reset(), False
while not done:
    action = model.predict(obs)
    obs, reward, done, info = env.step(action)
RL Environment

Three ways to train

uv.download()

Complete episodes as state-action-reward tuples for offline training.
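One way to flatten an episode into a state-action-reward tuple is sketched below. The feature choice, the +1/-1 direction encoding, and the use of `realized_pnl_percent` as the reward are all assumptions for illustration. The tuple format that `uv.download()` actually returns is not specified here, and `Transition` and `episode_to_transition` are hypothetical names.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    state: tuple[float, ...]      # numeric market and sentiment features
    action: tuple[float, float]   # (direction sign, position size in USD)
    reward: float                 # realized P&L in percent


# Hypothetical feature list drawn from the episode's market_context.
STATE_KEYS = ("rsi_4h", "atr_pct", "spread_bps", "imbalance_ratio")


def episode_to_transition(ep: dict) -> Transition:
    """Map one episode dict to a single (state, action, reward) tuple."""
    ctx = ep["market_context"]
    state = tuple(float(ctx[k]) for k in STATE_KEYS) + (
        float(ep["social_sentiment"]["avg_score"]),
    )
    sign = 1.0 if ep["direction"] == "long" else -1.0
    action = (sign, float(ep["trade_execution"]["position_size_usd"]))
    reward = float(ep["outcome"]["realized_pnl_percent"])
    return Transition(state, action, reward)


# Fields copied from the NVDA episode above.
nvda = {
    "direction": "long",
    "market_context": {"rsi_4h": 58.2, "atr_pct": 0.024,
                       "spread_bps": 1.2, "imbalance_ratio": 0.63},
    "social_sentiment": {"avg_score": 0.23},
    "trade_execution": {"position_size_usd": 24800},
    "outcome": {"realized_pnl_percent": 4.72},
}

t = episode_to_transition(nvda)
print(t)
```

Categorical fields such as `regime` and `volatility_regime` are simply dropped here. A fuller pipeline would likely encode them (for example one-hot) rather than discard them.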

uv.replay()

Run your policy against historical decisions and measure counterfactual outcomes.
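The replay idea can be illustrated with a toy environment built from an episode's `position_journey`. This is a minimal sketch. It assumes exits are only allowed at the recorded checkpoints and that the reward is the P&L at exit. `JourneyReplayEnv` and `evaluate` are hypothetical and not the `uv.replay()` implementation. The environment uses the same 4-tuple `step` return as `train.py`.

```python
class JourneyReplayEnv:
    """Hypothetical replay env over one episode's position_journey checkpoints.

    Observation: (checkpoint index, pnl %, mfe %, mae %).
    Action: 0 = hold, 1 = exit. Reward: pnl % at exit, 0.0 while holding.
    The position is closed at the last checkpoint whatever the action.
    """

    HOLD, EXIT = 0, 1

    def __init__(self, journey: list[dict]):
        self.journey = journey
        self.t = 0

    def _obs(self) -> tuple:
        cp = self.journey[self.t]
        return (self.t, cp["pnl"], cp["mfe"], cp["mae"])

    def reset(self) -> tuple:
        self.t = 0
        return self._obs()

    def step(self, action: int):
        at_last = self.t == len(self.journey) - 1
        if action == self.EXIT or at_last:
            cp = self.journey[self.t]
            return self._obs(), cp["pnl"], True, {"exit_at": cp["at"]}
        self.t += 1
        return self._obs(), 0.0, False, {}


def evaluate(env, policy) -> tuple[float, dict]:
    """Run one episode and return (total reward, final info)."""
    obs, done, total, info = env.reset(), False, 0.0, {}
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total, info


def hold(obs):
    """Never exit early; ride the position to the last checkpoint."""
    return JourneyReplayEnv.HOLD


def take_profit(obs):
    """Exit at the first checkpoint where P&L is at least 2.5%."""
    return JourneyReplayEnv.EXIT if obs[1] >= 2.5 else JourneyReplayEnv.HOLD


# Checkpoints copied from the NVDA episode's position_journey.
journey = [
    {"at": "1h", "pnl": 0.31, "mfe": 0.44, "mae": -0.12},
    {"at": "4h", "pnl": 1.12, "mfe": 1.38, "mae": -0.12},
    {"at": "24h", "pnl": 2.89, "mfe": 3.41, "mae": -0.54},
    {"at": "48h", "pnl": 4.72, "mfe": 6.31, "mae": -0.54},
]

env = JourneyReplayEnv(journey)
print(evaluate(env, hold))
print(evaluate(env, take_profit))
```

Holding to the end reproduces the recorded 4.72%, while the 2.5% take-profit rule banks 2.89% at 24h. Scoring alternative policies against the recorded decision in this way is the kind of counterfactual comparison replay is meant to support.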

uv.connect()

Generate on-policy trajectories in live or simulated markets.
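On-policy means the trajectories come from the policy currently being trained, so each batch is collected, used for one update, and then regenerated with the updated policy. Below is a minimal sketch of the collection step and of the discounted returns a policy-gradient method would weight actions by. It assumes the 4-tuple `step` return shown in `train.py`. `ToyEnv` is a hypothetical stand-in for an environment from `uv.connect()`.

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym-style env (classic 4-tuple step API).
    Ends after `horizon` steps; the reward equals the action taken."""

    def __init__(self, horizon: int = 3):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: float):
        self.t += 1
        return self.t, float(action), self.t >= self.horizon, {}


def rollout(env, policy, max_steps: int = 10_000):
    """Collect one on-policy trajectory as (obs, action, reward) tuples."""
    trajectory = []
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done, _info = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory


def returns_to_go(rewards: list[float], gamma: float = 0.99) -> list[float]:
    """Discounted return from each step onward: G_t = r_t + gamma * G_{t+1}."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


traj = rollout(ToyEnv(horizon=3), policy=lambda obs: 1.0)
rewards = [r for _, _, r in traj]
print(rewards, returns_to_go(rewards, gamma=0.5))
```

Because the returns are computed from the policy's own rollouts, they go stale once the policy is updated, which is why on-policy training keeps generating fresh trajectories.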

Custom Trajectories

Specify the market, conditions, and strategy type. We deploy real traders to generate complete decision sequences with full reasoning traces, on demand.

Any asset class · Custom schemas · On-demand generation · Full reasoning traces
Talk to Us

750+ agents generating episodes continuously

Run a pilot.

Tell us what you're training and we'll scope the data, schema, and environment around your research.

Schedule a Call

University and academic researchers — ask about discounted access.