Complete trading episodes with reasoning, actions, outcomes, and counterfactuals. Gym-compatible environments for on-policy training.
{
  "symbol": "NVDA",
  "direction": "long",
  "market_context": {
    "regime": "low_vol",
    "rsi_4h": 58.2,
    "atr_pct": 0.024,
    "spread_bps": 1.2,
    "imbalance_ratio": 0.63,
    "volatility_regime": "compressing"
  },
  "social_sentiment": {
    "avg_score": 0.23,
    "distribution": { "bullish": 20, "bearish": 2, "neutral": 27 },
    "top_signal": "Institutional accumulation flagged by 3 accounts"
  },
  "agent_reasoning": {
    "explicit_reasoning": "NVDA consolidating above 200d MA post-earnings. Vol compressing, order book imbalance at 0.63, and 3 accounts flagging institutional accumulation. Entering long with stop below the consolidation range.",
    "decision_confidence": 0.84,
    "tool_calls": [{
      "name": "fetch_technicals",
      "args": { "symbol": "NVDA", "tf": "4h" },
      "result": "Above 200d MA. RSI 58, ATR contracting."
    }]
  },
  "trade_execution": {
    "entry_price": 131.42,
    "position_size_usd": 24800,
    "stop_loss": 126.50,
    "take_profits": [138.00, 142.50]
  },
  "position_journey": [
    { "at": "1h", "pnl": 0.31, "mfe": 0.44, "mae": -0.12 },
    { "at": "4h", "pnl": 1.12, "mfe": 1.38, "mae": -0.12 },
    { "at": "24h", "pnl": 2.89, "mfe": 3.41, "mae": -0.54 },
    { "at": "48h", "pnl": 4.72, "mfe": 6.31, "mae": -0.54 }
  ],
  "outcome": {
    "result": "win",
    "exit_price": 137.62,
    "realized_pnl_percent": 4.72,
    "hold_duration_minutes": 2847,
    "timing_score": 0.91,
    "sharpe_contribution": 0.34
  },
  "counterfactuals": [{
    "mfe_pnl_percent": 6.31,
    "optimal_pnl_percent": 8.43,
    "trailing_stop_pnl": 5.88,
    "timing_score": 0.74,
    "held_too_long": false,
    "exited_too_early": true
  }]
}
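You can cross-check the headline numbers in this sample episode against its own fields. The sketch below recomputes three values: realized P&L from the entry and exit prices, the share of the maximum favorable excursion (MFE) that the exit captured, and the R-multiple implied by the stop. The capture ratio and R-multiple are illustrative metrics of our own and are not fields in the schema.

```python
# Values copied from the sample NVDA episode above.
entry_price = 131.42
exit_price = 137.62
stop_loss = 126.50
mfe_pnl_percent = 6.31  # best unrealized P&L reached during the hold

# Realized P&L for a long position, in percent of the entry price.
realized_pnl_percent = (exit_price - entry_price) / entry_price * 100

# Fraction of the best available move the exit captured
# (illustrative metric, not a schema field).
capture_ratio = realized_pnl_percent / mfe_pnl_percent

# R-multiple: gain per share divided by the per-share risk to the stop
# (illustrative metric, not a schema field).
risk_per_share = entry_price - stop_loss
r_multiple = (exit_price - entry_price) / risk_per_share

print(f"realized P&L: {realized_pnl_percent:.2f}%")  # realized P&L: 4.72%
print(f"MFE capture:  {capture_ratio:.2f}")          # MFE capture:  0.75
print(f"R-multiple:   {r_multiple:.2f}")             # R-multiple:   1.26
```

A capture ratio of about 0.75 is consistent with `"exited_too_early": true`: roughly a quarter of the peak unrealized gain was given back before the exit.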
import uv

# `model` and `train` are placeholders for your own policy and training routine.

# Offline: download complete episodes
episodes = uv.download("NVDA", n=10000)
train(model, episodes)

# Replay: evaluate a policy against a recorded historical episode (one step shown)
env = uv.replay("episode_nvda_20260507")
obs = env.reset()
action = model.predict(obs)
obs, reward, done, info = env.step(action)

# Online: generate live on-policy trajectories until the episode ends
env = uv.connect(market="live", symbol="NVDA")
obs, done = env.reset(), False
while not done:
    action = model.predict(obs)
    obs, reward, done, info = env.step(action)
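The loop above depends only on the classic Gym contract: `reset()` returns an observation, and `step(action)` returns `(obs, reward, done, info)`. To test a policy locally before connecting to the SDK, you could use a stand-in like the sketch below, which implements that contract over a fixed, made-up price path. `ToyReplayEnv`, its action encoding (0 = flat, 1 = long), and `ThresholdPolicy` are all hypothetical and are not part of the `uv` SDK.

```python
class ToyReplayEnv:
    """Hypothetical replay env using the classic Gym 4-tuple step API."""

    def __init__(self, prices):
        self.prices = list(prices)
        self.t = 0
        self.position = 0  # 0 = flat, 1 = long

    def _obs(self):
        # Observation exposes only the current bar, never future prices.
        return {"t": self.t, "price": self.prices[self.t], "position": self.position}

    def reset(self):
        self.t = 0
        self.position = 0
        return self._obs()

    def step(self, action):
        # The action sets the position held over the next bar.
        self.position = int(action)
        prev_price = self.prices[self.t]
        self.t += 1
        price = self.prices[self.t]
        # Reward: the bar's percent return, earned only while long.
        reward = self.position * (price - prev_price) / prev_price * 100
        done = self.t == len(self.prices) - 1
        return self._obs(), reward, done, {"price": price}


class ThresholdPolicy:
    """Hypothetical policy: be long whenever price is above a fixed level."""

    def __init__(self, level):
        self.level = level

    def predict(self, obs):
        return 1 if obs["price"] > self.level else 0


# Invented five-bar path from the sample's entry price to its exit price.
env = ToyReplayEnv([131.42, 132.10, 133.05, 135.20, 137.62])
model = ThresholdPolicy(level=131.0)
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = model.predict(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward  # simple sum of per-bar percent returns, not compounded
print(f"episode return: {total_reward:.2f}%")  # episode return: 4.64%
```

The summed per-bar return (4.64%) is slightly below the compounded move from entry to exit (4.72%). That gap is one reason to state explicitly how a reward is aggregated.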
uv.download(): Complete episodes as state-action-reward tuples for offline training.
uv.replay(): Run your policy against historical decisions and measure counterfactual outcomes.
uv.connect(): Generate on-policy trajectories in live or simulated markets.
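Counterfactual fields such as MFE, MAE, and trailing-stop P&L can all be derived from the price path after entry. The platform's exact definitions are not documented here, so the sketch below assumes common ones. MFE and MAE are the best and worst unrealized percent moves. The trailing stop exits once price falls a fixed percentage below its running high. The function name and the `trail_pct` parameter are hypothetical.

```python
def long_counterfactuals(entry, path, trail_pct):
    """Hypothetical counterfactuals for a long entry over a post-entry price path.

    Returns MFE, MAE, and trailing-stop P&L, each in percent of the entry price.
    """

    def pct(price):
        return (price - entry) / entry * 100

    mfe = max(pct(p) for p in path)  # best unrealized move
    mae = min(pct(p) for p in path)  # worst unrealized move

    # Trailing stop: exit at the first price at or below trail_pct under the running high.
    high = entry
    trailing_exit = path[-1]  # assumption: if never triggered, exit at the last price
    for p in path:
        high = max(high, p)
        if p <= high * (1 - trail_pct / 100):
            trailing_exit = p
            break
    return {"mfe": mfe, "mae": mae, "trailing_stop_pnl": pct(trailing_exit)}


# Invented path: rallies to 106, pulls back to 103 (stop triggers), then runs to 107.
cf = long_counterfactuals(100.0, [101.0, 99.0, 104.0, 106.0, 103.0, 107.0], trail_pct=2.0)
print(f"mfe {cf['mfe']:.2f}, mae {cf['mae']:.2f}, trailing {cf['trailing_stop_pnl']:.2f}")
```

In this path the 2% trail gets stopped out at 103 (+3%), while the path's MFE is +7%. This is the kind of gap the `exited_too_early` and `optimal_pnl_percent` fields summarize.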
Specify the market, conditions, and strategy type. We deploy real traders to generate complete decision sequences with full reasoning traces, on demand.
Yes. Schemas, observation spaces, reward functions, and instrument coverage are all configurable. If you need fields that don't exist yet, we build them with you.
Episodes are delivered as structured JSON via a streaming API or batch export. Each episode contains market state, reasoning trace, action, and outcome, and the format plugs into standard ML pipelines out of the box.
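Because each episode is structured JSON, converting it into a state-action-reward tuple for offline training needs only the standard library. The sketch below assumes a batch export in JSON Lines format, with one episode per line, and uses field names from the sample episode. Three choices are our own assumptions, not a documented mapping: which fields count as state, the action encoding, and the use of `realized_pnl_percent` as the reward.

```python
import json


def to_transition(episode):
    """Map one episode dict to a (state, action, reward) tuple (assumed mapping)."""
    ctx = episode["market_context"]
    sent = episode["social_sentiment"]
    state = (
        ctx["rsi_4h"],
        ctx["atr_pct"],
        ctx["spread_bps"],
        ctx["imbalance_ratio"],
        sent["avg_score"],
    )
    # Encode direction as +1 for long, -1 for short.
    action = 1 if episode["direction"] == "long" else -1
    reward = episode["outcome"]["realized_pnl_percent"]
    return state, action, reward


def load_transitions(lines):
    """Parse JSON Lines (one episode per line), skipping blank lines."""
    return [to_transition(json.loads(line)) for line in lines if line.strip()]


# A trimmed copy of the sample episode, serialized as one JSON Lines record.
sample = json.dumps({
    "direction": "long",
    "market_context": {"rsi_4h": 58.2, "atr_pct": 0.024, "spread_bps": 1.2,
                       "imbalance_ratio": 0.63},
    "social_sentiment": {"avg_score": 0.23},
    "outcome": {"realized_pnl_percent": 4.72},
})
print(load_transitions([sample]))
# [((58.2, 0.024, 1.2, 0.63, 0.23), 1, 4.72)]
```

For a real export file, you can pass an open file handle to `load_transitions`, since iterating a file yields its lines.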
Every state snapshot is strictly point-in-time: only information available at decision time. Outcomes and counterfactuals are computed after the fact and stored separately.
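In practice, point-in-time discipline on the consumer side means outcome fields must never leak into model inputs. The minimal sketch below uses the top-level keys of the sample episode. It splits each episode into fields known at decision time and fields computed in hindsight, and it fails loudly on any key it cannot classify. Which keys go on which side is our reading of the sample, not an official list.

```python
# Assumed classification of the sample episode's top-level keys.
DECISION_TIME_KEYS = {
    "symbol", "direction", "market_context",
    "social_sentiment", "agent_reasoning", "trade_execution",
}
HINDSIGHT_KEYS = {"position_journey", "outcome", "counterfactuals"}


def split_point_in_time(episode):
    """Return (inputs, labels), refusing to guess about unknown fields."""
    unknown = set(episode) - DECISION_TIME_KEYS - HINDSIGHT_KEYS
    if unknown:
        raise ValueError(f"unclassified episode fields: {sorted(unknown)}")
    inputs = {k: v for k, v in episode.items() if k in DECISION_TIME_KEYS}
    labels = {k: v for k, v in episode.items() if k in HINDSIGHT_KEYS}
    return inputs, labels


episode = {"symbol": "NVDA", "direction": "long", "outcome": {"result": "win"}}
inputs, labels = split_point_in_time(episode)
print(sorted(inputs), sorted(labels))
# ['direction', 'symbol'] ['outcome']
```

Raising on unknown keys is deliberate. A new hindsight field added to the schema later should break the pipeline rather than slip silently into the inputs.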
Episodes include realized P&L, risk-adjusted metrics (Sharpe, Sortino, max drawdown), timing efficiency, and counterfactual comparisons. Most teams will want custom reward shaping on top of that (multi-objective, exposure-weighted, or regime-conditional), and we support all of it.
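The risk-adjusted metrics named above have standard definitions, and custom reward shaping often blends several of them into one scalar. The sketch below computes three metrics from a per-period return series. Sharpe and Sortino use a zero risk-free rate and target and are not annualized; the Sortino downside deviation averages over all periods. Maximum drawdown is taken on the compounded equity curve. The weights in `shaped_reward` are arbitrary placeholders rather than recommended values, and the platform's own metric conventions may differ.

```python
import math


def sharpe(returns):
    """Mean return over sample standard deviation (risk-free rate assumed 0)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var)


def sortino(returns):
    """Mean return over downside deviation (target return 0, averaged over all periods)."""
    mean = sum(returns) / len(returns)
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    return mean / downside if downside > 0 else math.inf


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def shaped_reward(returns, w_sharpe=1.0, w_sortino=0.5, w_dd=2.0):
    """Hypothetical multi-objective reward: favor risk-adjusted return, penalize drawdown."""
    return (w_sharpe * sharpe(returns)
            + w_sortino * sortino(returns)
            - w_dd * max_drawdown(returns))


rets = [0.01, -0.02, 0.015, 0.03, -0.01]  # invented per-period returns
print(f"Sharpe {sharpe(rets):.2f}, Sortino {sortino(rets):.2f}, "
      f"max DD {max_drawdown(rets):.2%}, shaped {shaped_reward(rets):.2f}")
# Sharpe 0.25, Sortino 0.50, max DD 2.00%, shaped 0.46
```

An exposure-weighted or regime-conditional variant would follow the same pattern. It would scale the returns or swap the weights according to fields such as `position_size_usd` or `market_context.regime` before combining them.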
Yes. We provide offline datasets, episode replay for counterfactual evaluation, and live environments for on-policy trajectory generation. The SDK is Gym-compatible, with configurable observation and action spaces.
750+ agents generate episodes continuously across multiple market regimes. Generation scales to match your pipeline throughput, and you can prioritize specific asset classes or decision types.
You tell us what you're training and we scope the engagement: asset classes, episode volume, schema, and delivery format. Most pilots are live within a few weeks.
Tell us what you're training and we'll scope the data, schema, and environment around your research.
Schedule a Call
University and academic researchers: ask about discounted access.