The Mathematics of Baseball Prediction
About Our Mathematical Model
Our prediction engine uses logistic regression, a statistical method that models the probability of a specific outcome. This page explains the mathematics behind our predictions and shows how small changes to the model coefficients can significantly change the results.
The Logistic Function
$$P(\text{outcome}) = \frac{1}{1 + e^{-z}}$$
where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
The logistic function maps any real-valued input z to a value strictly between 0 and 1, which can be read as a probability. This makes it well suited to predicting baseball outcomes, where we need to estimate the probability of events like walks, strikeouts, or hits.
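A few worked values illustrate the shape of the curve. They follow directly from the logistic formula and are not specific to our model:

```latex
\begin{align*}
z = 0  &\;\Rightarrow\; P = \frac{1}{1 + e^{0}}  = 0.5 \\
z = 2  &\;\Rightarrow\; P = \frac{1}{1 + e^{-2}} \approx 0.881 \\
z = -2 &\;\Rightarrow\; P = \frac{1}{1 + e^{2}}  \approx 0.119
\end{align*}
```

The last two values sum to 1, reflecting the symmetry $P(-z) = 1 - P(z)$.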
In our model, z is a linear combination of player statistics ($x_i$) and their corresponding coefficients ($\beta_i$). These coefficients determine how much weight each statistic has in the final prediction.
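To make the role of the coefficients concrete, here is a minimal sketch that builds z from an intercept and a list of weights, then nudges one weight to see how the probability moves. The helper names (`logistic`, `linearCombination`) and every number are hypothetical, chosen only for illustration; they are not the model's fitted coefficients.

```typescript
// Logistic function: maps any real z to a value strictly between 0 and 1.
function logistic(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// Linear combination z = beta0 + sum(beta_i * x_i).
function linearCombination(intercept: number, coefs: number[], xs: number[]): number {
  if (coefs.length !== xs.length) {
    throw new Error("coefs and xs must have the same length");
  }
  let z = intercept;
  for (let i = 0; i < coefs.length; i++) {
    z += coefs[i] * xs[i];
  }
  return z;
}

// Hypothetical inputs (illustration only, not the model's values).
const intercept = -3.0;
const xs = [0.10, 3.0];            // e.g. a walk rate as a fraction, and a BB/9 value
const baseCoefs = [8.0, 0.15];
const nudgedCoefs = [9.0, 0.15];   // first coefficient raised by 1.0

const baseProb = logistic(linearCombination(intercept, baseCoefs, xs));
const nudgedProb = logistic(linearCombination(intercept, nudgedCoefs, xs));

// Raising a positive-input coefficient raises z, and therefore the probability.
console.log(baseProb.toFixed(3), nudgedProb.toFixed(3));
```

Because the logistic curve is steepest near z = 0, the same change in a coefficient moves the probability more when z is close to 0 than when it is far out in either tail.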
Our Prediction Formula
// Calculate base probabilities using logistic regression
const walkProb = logisticRegression(
  REGRESSION_COEFFICIENTS.WALK.CONSTANT +
    REGRESSION_COEFFICIENTS.WALK.BATTER_WALK_COEF * batter.stats.walkPct +
    REGRESSION_COEFFICIENTS.WALK.PITCHER_BB9_COEF * pitcher.stats.bb9
)

// Logistic function: maps the linear combination z to a probability in (0, 1)
function logisticRegression(z: number): number {
  return 1 / (1 + Math.exp(-z))
}
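The excerpt above relies on definitions that live elsewhere in the codebase. As a rough sketch of how it could be run on its own, the version below supplies hypothetical shapes for `batter`, `pitcher`, and `REGRESSION_COEFFICIENTS`. The interfaces, the `walkProbability` wrapper, the assumption that `walkPct` is a fraction, and all coefficient values are placeholders, not the model's actual definitions.

```typescript
// Hypothetical shapes for the names used in the excerpt.
interface Batter { stats: { walkPct: number } }   // assumed: walks per plate appearance, as a fraction
interface Pitcher { stats: { bb9: number } }      // walks allowed per nine innings

interface WalkCoefficients {
  CONSTANT: number
  BATTER_WALK_COEF: number
  PITCHER_BB9_COEF: number
}

// Placeholder values for illustration only; not the fitted coefficients.
const REGRESSION_COEFFICIENTS: { WALK: WalkCoefficients } = {
  WALK: { CONSTANT: -3.5, BATTER_WALK_COEF: 6.0, PITCHER_BB9_COEF: 0.2 },
}

function logisticRegression(z: number): number {
  return 1 / (1 + Math.exp(-z))
}

// Same formula as the excerpt, wrapped in a function for reuse.
function walkProbability(batter: Batter, pitcher: Pitcher): number {
  const c = REGRESSION_COEFFICIENTS.WALK
  return logisticRegression(
    c.CONSTANT +
      c.BATTER_WALK_COEF * batter.stats.walkPct +
      c.PITCHER_BB9_COEF * pitcher.stats.bb9
  )
}

// A patient batter facing a wild pitcher vs. a free swinger facing a control pitcher.
const patientVsWild = walkProbability({ stats: { walkPct: 0.14 } }, { stats: { bb9: 4.0 } })
const swingerVsControl = walkProbability({ stats: { walkPct: 0.05 } }, { stats: { bb9: 2.0 } })

console.log(patientVsWild > swingerVsControl)
```

With positive coefficients on both inputs, as assumed here, the first matchup always yields the higher walk probability.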
Implementation Notes
- We calculate probabilities for all possible outcomes (Single, Double, Triple, Home Run, Walk, Strikeout).
- The remaining probability is assigned to "Out" (any out other than a strikeout).
- We apply additional adjustments for factors like handedness matchups and historical data.
- The fixed walk coefficients have been implemented in our codebase to provide more accurate predictions.
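The first two notes can be sketched as follows: sum the modeled outcome probabilities and give whatever is left to "Out". The `withRemainingOut` helper, the `Outcome` type, and the example numbers are hypothetical, and clamping "Out" at 0 when the modeled outcomes sum past 1 is an assumption of this sketch rather than documented behavior. The handedness and historical adjustments are not modeled here.

```typescript
type Outcome = "Single" | "Double" | "Triple" | "HomeRun" | "Walk" | "Strikeout"

const OUTCOMES: Outcome[] = ["Single", "Double", "Triple", "HomeRun", "Walk", "Strikeout"]

// Given per-outcome probabilities, assign the leftover probability to "Out".
function withRemainingOut(probs: Record<Outcome, number>): Record<Outcome | "Out", number> {
  let total = 0
  for (const outcome of OUTCOMES) {
    total += probs[outcome]
  }
  // Assumption: never return a negative probability if the inputs sum past 1.
  const out = Math.max(0, 1 - total)
  return { ...probs, Out: out }
}

// Hypothetical probabilities for a single plate appearance.
const plateAppearance = withRemainingOut({
  Single: 0.15,
  Double: 0.05,
  Triple: 0.005,
  HomeRun: 0.03,
  Walk: 0.08,
  Strikeout: 0.22,
})

console.log(plateAppearance.Out.toFixed(3))
```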