RLHF (Reinforcement Learning from Human Feedback)
RLHF is the technique that transformed raw language models like GPT-3 into useful assistants like ChatGPT. Instead of just predicting the next word, RLHF teaches models to generate responses that humans actually find helpful, accurate, and appropriate.
The process has three steps. First, humans compare pairs of AI responses and indicate which is better. Second, these preferences are used to train a "reward model" that predicts how highly a human would rate any given response. Third, the reward model guides the language model through reinforcement learning (typically PPO), fine-tuning it to produce outputs the reward model scores highly.
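The second step can be sketched with a toy example. Below, a reward model is trained from pairwise preferences using the Bradley-Terry loss, which is the standard formulation for this step; the feature vectors, dimensions, and training settings are illustrative stand-ins for real response embeddings, not any production setup.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
true_w = rng.normal(size=dim)          # hidden "human taste" used to simulate labels

# Simulate comparisons: each pair is (chosen, rejected), where the
# chosen response has the higher true score.
pairs = []
for _ in range(200):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    pairs.append((a, b) if true_w @ a >= true_w @ b else (b, a))

# Train a linear reward model by gradient descent on the
# Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected)).
w = np.zeros(dim)
lr = 0.1
for _ in range(100):
    grad = np.zeros(dim)
    for chosen, rejected in pairs:
        margin = w @ chosen - w @ rejected
        p = 1.0 / (1.0 + np.exp(-margin))     # P(chosen preferred)
        grad += (p - 1.0) * (chosen - rejected)
    w -= lr * grad / len(pairs)

# The learned reward model should now rank chosen above rejected
# for most of the pairs it was trained on.
accuracy = np.mean([w @ c > w @ r for c, r in pairs])
print(f"training preference accuracy: {accuracy:.2f}")
```

In the third step, this learned reward function would score candidate responses during reinforcement learning; here we only verify that it recovers the simulated preferences.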
Standard RLHF doesn't work well for financial AI because rewards are delayed (you don't know if a trade was good until later), experts are scarce and expensive, and good decisions can have bad outcomes due to market randomness. This is why financial AI requires specialized approaches like decision episodes with process supervision.
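The outcome-randomness problem above can be made concrete with a small simulation. This is a hedged illustration, not the document's method: a trade with positive expected value (a "good decision") still loses money almost half the time, so a reward based on the realized outcome is noisy, while a process-based score of the decision itself stays stable. All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# A good decision: expected return +0.1 per trade, but noisy
# realized outcomes (standard deviation 1.0).
outcomes = 0.1 + rng.normal(0.0, 1.0, size=10_000)

# Fraction of good trades that still lost money.
loss_rate = np.mean(outcomes < 0)
print(f"good decisions that lost money: {loss_rate:.0%}")

# Outcome supervision rewards the noisy realized return; process
# supervision instead scores the decision's reasoning (here, its
# known expected value), which is identical across outcomes.
outcome_reward_std = outcomes.std()
process_reward_std = 0.0               # same decision -> same score
print(f"outcome reward std: {outcome_reward_std:.2f}, "
      f"process reward std: {process_reward_std:.2f}")
```

The point is not the specific numbers but the variance gap: an agent trained on outcomes alone would be punished for many good decisions, which is what motivates scoring the decision process instead.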