Historical Data

Training Dataset

Explore the historical MLB data used to train our deep learning models.

Data Explorer
Browse through historical MLB statistics by season and player.

Data Collection Methodology
Learn how we collect and process MLB statistics for our models.

Data Sources

Our historical data comes from official MLB statistics, Statcast, and other reputable baseball analytics sources. We use MLB's Statcast API to gather pitch-by-pitch data, including velocity, spin rate, and precise location. This lets our models analyze specific pitch types and their effectiveness in different zones.
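
As an illustration of what one pitch-level record might look like once collected, here is a minimal Python sketch. The `Pitch` class, its field names, and the `parse_pitch` helper are hypothetical and chosen for readability; they are not the Statcast schema or our production code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pitch:
    """One pitch-level record. Field names are illustrative, not an official schema."""
    pitcher_id: int
    batter_id: int
    pitch_type: str                  # e.g. "FF" for a four-seam fastball
    velocity_mph: float
    spin_rate_rpm: Optional[float]   # may be missing in raw data
    plate_x_ft: float                # horizontal location at the plate, feet from center
    plate_z_ft: float                # height at the plate, feet above the ground


def parse_pitch(row: dict) -> Pitch:
    """Convert a raw row of strings (as from a CSV export) into a typed Pitch."""
    spin = row.get("spin_rate_rpm")
    return Pitch(
        pitcher_id=int(row["pitcher_id"]),
        batter_id=int(row["batter_id"]),
        pitch_type=row["pitch_type"],
        velocity_mph=float(row["velocity_mph"]),
        # Keep a missing spin reading as None instead of inventing a value.
        spin_rate_rpm=float(spin) if spin not in ("", None) else None,
        plate_x_ft=float(row["plate_x_ft"]),
        plate_z_ft=float(row["plate_z_ft"]),
    )
```

Keeping missing spin readings as `None` leaves the choice of how to fill or drop them to the later processing step.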

Data Processing

Raw statistics undergo preprocessing, normalization, and feature engineering before they reach our deep learning algorithms. We account for ballpark effects, weather conditions, and situational context. Our pitch classification system uses computer vision and machine learning to categorize each pitch type based on movement, velocity, and spin.
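
Two of these steps, normalization and a ballpark adjustment, can be sketched in a few lines of Python. This is a simplified illustration and not our actual pipeline: the function names are hypothetical, and the park factor is assumed to be a simple multiplicative factor where 1.0 means a neutral park.

```python
from statistics import mean, pstdev


def zscore(values):
    """Scale values to zero mean and unit variance (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def park_adjust(rate, park_factor):
    """Divide a rate stat by an assumed multiplicative park factor (1.0 = neutral)."""
    if park_factor <= 0:
        raise ValueError("park_factor must be positive")
    return rate / park_factor


# Example: normalize three fastball velocities and adjust a rate stat
# recorded in a hitter-friendly park (assumed factor of 1.10).
velocities = zscore([90.0, 95.0, 100.0])
adjusted = park_adjust(0.330, 1.10)
```

Z-scoring puts features with very different scales, such as velocity in mph and spin in rpm, on a comparable footing before training.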

Model Training

Our neural networks are trained on millions of historical at-bats, learning complex patterns and relationships between pitcher and batter characteristics. We use a combination of recurrent neural networks (RNNs) to capture sequential patterns and convolutional neural networks (CNNs) to analyze spatial data like pitch locations. The models are regularly updated with the latest MLB data to maintain prediction accuracy.
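
To make the two kinds of input concrete, the sketch below shows one way an at-bat could be encoded: a fixed-length sequence of per-pitch features for a recurrent model, and a 3x3 grid of location counts for a convolutional model. It covers input preparation only, not the networks themselves. The function names, the choice of features, and the zone numbering (1 to 9, row by row from the top-left) are assumptions made for this example.

```python
def pad_sequence(steps, max_len, n_features, pad_value=0.0):
    """Truncate or pad a list of per-pitch feature tuples to exactly max_len rows."""
    rows = [list(s) for s in steps[:max_len]]
    for r in rows:
        if len(r) != n_features:
            raise ValueError("every step must have n_features values")
    # Append padding rows so every at-bat has the same sequence length.
    rows += [[pad_value] * n_features for _ in range(max_len - len(rows))]
    return rows


def location_grid(zones):
    """Count pitches per zone (1-9) into a 3x3 grid; None means outside the zone."""
    grid = [[0] * 3 for _ in range(3)]
    for z in zones:
        if z is None:
            continue
        if not 1 <= z <= 9:
            raise ValueError("zone must be between 1 and 9")
        r, c = divmod(z - 1, 3)
        grid[r][c] += 1
    return grid


# Example: a one-pitch at-bat with (velocity_mph, spin_rate_rpm) features,
# padded to three steps, plus a location grid for four pitches.
sequence = pad_sequence([(95.0, 2300.0)], max_len=3, n_features=2)
grid = location_grid([5, 5, 1, None])
```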

Zone Analysis

We divide the strike zone into nine regions (a 3x3 grid) and track performance metrics for both pitchers and batters in each one. This lets our models identify strengths and weaknesses in specific areas of the strike zone. For pitchers, we analyze how often they throw each pitch type in each zone; for batters, we track batting average, slugging percentage, and whiff rate by zone.
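
The grid and one of the per-zone metrics can be sketched as follows. This is an illustrative example under stated assumptions, not our production code: zones are numbered 1 to 9 row by row from the top-left, the zone is as wide as home plate (17 inches), the top and bottom of the zone are supplied per batter, and whiff rate is taken to be swings and misses divided by total swings. All function and parameter names are hypothetical.

```python
from collections import defaultdict

PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide


def zone_index(plate_x, plate_z, sz_bot, sz_top):
    """Return the zone number 1-9, or None if the pitch is outside the strike zone."""
    if sz_top <= sz_bot:
        raise ValueError("sz_top must be greater than sz_bot")
    if not -PLATE_HALF_WIDTH_FT <= plate_x <= PLATE_HALF_WIDTH_FT:
        return None
    if not sz_bot <= plate_z <= sz_top:
        return None
    col_width = 2 * PLATE_HALF_WIDTH_FT / 3
    row_height = (sz_top - sz_bot) / 3
    # min(..., 2) keeps pitches exactly on the far edge inside the last cell.
    col = min(int((plate_x + PLATE_HALF_WIDTH_FT) / col_width), 2)
    row = min(int((sz_top - plate_z) / row_height), 2)
    return row * 3 + col + 1


def whiff_rate_by_zone(pitches):
    """pitches: iterable of (zone, swung, made_contact). Returns {zone: whiff rate}."""
    swings = defaultdict(int)
    whiffs = defaultdict(int)
    for zone, swung, made_contact in pitches:
        if zone is None or not swung:
            continue  # only swings at pitches inside the grid are counted
        swings[zone] += 1
        if not made_contact:
            whiffs[zone] += 1
    return {zone: whiffs[zone] / swings[zone] for zone in swings}


# Example: a pitch down the middle for a batter whose zone spans 1.5 to 3.5 feet.
middle = zone_index(0.0, 2.5, sz_bot=1.5, sz_top=3.5)
```

Batting average and slugging percentage by zone would follow the same pattern, grouping at-bat outcomes by the zone of the final pitch.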