Teaching Statistics with Fantasy Football: A Unit Using FPL Data

Unknown

2026-01-28

Turn FPL data into an engaging statistics unit — teach descriptive stats, probability, hypothesis testing & modeling with real Fantasy Premier League data.

Hook: Turn exam anxiety into matchday excitement — teach statistics with Fantasy Premier League

Students and teachers often struggle with bland datasets, abstract probability, and a disconnect between classroom exercises and real-world decision making. Use Fantasy Premier League (FPL) data to solve those pain points: it’s timely, rich, and emotionally engaging. In 2026, with sports analytics booming in schools and easy access to FPL APIs and public datasets, you can turn a statistics unit into a hands-on sports-data project that teaches descriptive statistics, probability, hypothesis testing, and model building.

The 2026 context: Why FPL is perfect now

Late 2025 and early 2026 saw three trends that make this unit especially practical:

Wider classroom adoption of sports analytics curricula and micro-credentials — schools are adding applied data projects to boost engagement.
Improved access to live FPL data via community-maintained APIs and scraping-friendly endpoints (the FPL /bootstrap-static/ JSON remains a common starting point), plus more third-party analytics (Understat, FBref) for enriched features.
Ubiquitous AI tooling in classrooms — students can use notebooks (Google Colab, Jupyter) and LLMs for code scaffolding and exploratory analysis while educators maintain assessment integrity with process-based rubrics.

As the BBC’s FPL coverage highlighted in January 2026, keeping up with injuries and team news is central to FPL decision-making — and that real-time angle gives students a compelling reason to analyze data week-to-week.

"Before the latest round of Premier League fixtures, here is all the key injury news alongside essential Fantasy Premier League statistics." — BBC Sport, 16 Jan 2026

Unit overview: goals, duration, and outcomes

This unit is designed for a 6-week term (adaptable from 4–10 weeks depending on depth) and targets secondary or early-college statistics/data science learners.

Learning objectives

Use FPL data to compute and interpret measures of central tendency and dispersion.
Apply probability concepts to match outcomes and player points.
Formulate and test statistical hypotheses (t-tests, chi-square, permutation tests).
Build and evaluate predictive models (linear regression, Poisson models, simple machine learning) to forecast FPL points.
Communicate findings in visual and narrative form, considering ethics and data quality.

Week-by-week plan (6 sessions)

Week 1 — Data literacy and sourcing

Introduce FPL as a dataset. Practical tasks:

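Explore the FPL /bootstrap-static/ JSON (or an exported CSV). Show students the structure: players (elements), teams, gameweeks (events), positions (element_types), and element_stats. Fixture lists typically come from a separate /fixtures/ endpoint.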
Assign a mini-task: pick a player and compute season-to-date average points per game.
Discuss data ethics: accuracy, timeliness, and privacy (don’t publish personal identifying info from players or students).
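
A minimal sketch of the first look and the mini-task, in plain Python so it runs without extra libraries. The sample dictionary only mimics the shape of a /bootstrap-static/ snapshot; its key and field names, the player names, and the per-gameweek points list are assumptions for illustration and should be checked against the data you actually download.

```python
# Hypothetical, heavily trimmed sample shaped like a /bootstrap-static/ snapshot.
# The key and field names below are assumptions; check them against your own download.
snapshot = {
    "elements": [
        {"web_name": "Player A", "total_points": 96},
        {"web_name": "Player B", "total_points": 54},
    ],
    "teams": [{"name": "Team X"}, {"name": "Team Y"}],
}

# First look: which top-level tables exist, and how many rows does each have?
table_sizes = {key: len(rows) for key, rows in snapshot.items()}
print(table_sizes)

# Hypothetical gameweek-by-gameweek points for one player (None = did not play).
gw_points = [2, 6, None, 9, 1, 2, None, 12]


def average_points_per_game(points):
    """Mean points over the gameweeks the player actually appeared in."""
    played = [p for p in points if p is not None]
    return sum(played) / len(played) if played else None


print(average_points_per_game(gw_points))
```

Skipping missed gameweeks is a deliberate choice here: averaging over all gameweeks instead would answer a different question (points per gameweek, not points per appearance), which is a useful discussion point in itself.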

Week 2 — Descriptive statistics & visualization

Teach summarising techniques using real FPL variables (total points, minutes played, expected goals (xG) if available).

Compute mean, median, mode, variance, standard deviation for a squad or set of players.
Plot histograms and boxplots for distribution of points; discuss skewness and outliers (e.g., a hat-trick week).
Activity: compare distribution of forwards vs defenders’ points; interpret differences.
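
The summary measures above can be sketched with Python's built-in statistics module. The two points lists are made-up numbers, not real FPL results, and the helper name summarise is just an illustrative choice.

```python
import statistics

# Hypothetical gameweek points for two small groups of players.
forwards = [2, 2, 5, 8, 13, 1, 2, 6]
defenders = [1, 2, 6, 6, 2, 1, 7, 2]


def summarise(points):
    """Return the basic measures of centre and spread for a list of points."""
    return {
        "mean": statistics.mean(points),
        "median": statistics.median(points),
        "mode": statistics.mode(points),
        "variance": statistics.variance(points),  # sample variance (divides by n - 1)
        "stdev": statistics.stdev(points),        # sample standard deviation
    }


for label, group in [("forwards", forwards), ("defenders", defenders)]:
    print(label, summarise(group))
```

In this toy data the forwards' mean sits well above their median, which is the kind of right skew a single big haul produces and a natural lead-in to the boxplot discussion.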

Week 3 — Probability & expected values

Use match data to teach probability basics and expected point calculations.

Estimate probability of a player scoring based on past frequency (empirical probability).
Compute expected points for a player for the next gameweek using weighted averages (incorporate home/away, opponent strength).
Class exercise: if Player A has a 0.25 probability to score (4 points), 0.40 probability for an assist (3 points), compute expected points.
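
The class exercise is a direct expected-value sum. The sketch below assumes each event happens at most once in the match and ignores appearance and bonus points, which is a simplification rather than the full FPL scoring system.

```python
def expected_points(events):
    """Sum of probability * points over (probability, points) pairs."""
    return sum(prob * pts for prob, pts in events)


# Player A from the exercise: goal (4 points) and assist (3 points).
player_a = [(0.25, 4), (0.40, 3)]
ev = expected_points(player_a)

# 0.25 * 4 + 0.40 * 3 = 1.0 + 1.2
print(round(ev, 2))  # 2.2
```

Because expectation is linear, the sum holds whether or not goals and assists are independent, so students do not need an independence assumption for this step.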

Week 4 — Hypothesis testing

Frame real questions and test them:

Example hypothesis: Players perform better at home than away. Choose a test (paired t-test if same players, bootstrap/permutation tests otherwise).
Demonstrate assumptions and show non-parametric alternatives when assumptions fail.
Students perform hypothesis tests and report p-values, effect sizes, and practical significance.
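
When the t-test assumptions look doubtful, a permutation test is one non-parametric option. This is a minimal sketch for independent groups with a one-sided alternative (home greater than away); the function name, the default of 5,000 resamples, and the sample numbers are illustrative assumptions.

```python
import random
from statistics import mean


def permutation_test(home, away, n_resamples=5000, seed=0):
    """One-sided permutation test for mean(home) > mean(away).

    Returns the observed mean difference and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = mean(home) - mean(away)
    pooled = list(home) + list(away)
    n_home = len(home)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the home/away labels
        diff = mean(pooled[:n_home]) - mean(pooled[n_home:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical points samples.
home_points = [6, 2, 9, 5, 8, 3, 7, 6]
away_points = [2, 1, 6, 2, 5, 3, 2, 4]
diff, p_value = permutation_test(home_points, away_points)
print(f"mean difference = {diff:.2f}, p = {p_value:.4f}")
```

If the same players appear in both groups, a paired version (randomly flipping the sign of each player's home-minus-away difference) would respect the pairing better than this pooled shuffle.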

Week 5 — Model building

Teach simple predictive models using weekly features:

Linear regression to predict points, using features such as minutes played, xG, fixture difficulty.
Poisson regression for goal counts and logistic regression for binary outcomes like a clean sheet.
Evaluate models using train/test split, cross-validation, and metrics (RMSE, MAE, ROC AUC).
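
To make the fit-then-evaluate loop concrete, here is a small sketch of a one-feature linear regression with RMSE and MAE, written in plain Python. The minutes and points values are invented, and the simple "earlier rows train, later rows test" split is one possible choice; in a notebook, scikit-learn or statsmodels would normally do this work.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss, in points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: minutes played and points scored in the same gameweek.
minutes = [90, 60, 30, 90, 75, 15, 90, 45]
points = [6, 3, 1, 8, 5, 1, 2, 2]

# Earlier rows train the model; the last two rows are held out for testing.
train_x, test_x = minutes[:6], minutes[6:]
train_y, test_y = points[:6], points[6:]

intercept, slope = fit_line(train_x, train_y)
predictions = [intercept + slope * m for m in test_x]
print(f"RMSE = {rmse(test_y, predictions):.2f}, MAE = {mae(test_y, predictions):.2f}")
```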

Week 6 — Project presentations & assessment

Students present mini-projects: a predictive model and a short report with visualizations. Assess using a rubric covering data cleaning, analysis, interpretation, and communication.

Data sources, tools, and classroom setup

Provide practical options so teachers can pick what fits their tech stack.

Data sources

FPL API: community endpoint /bootstrap-static/ gives player and team master data. Many education projects use this as a canonical starting point.
FBref & Understat: for xG, xA, and advanced stats if you need richer features.
BBC/Fantasy Football news: for team news and injuries (useful as binary features for modeling; see BBC Sport coverage in Jan 2026).

Tools

Beginner-friendly: Google Sheets / Excel for descriptive stats and simple visualizations.
Intermediate: Google Colab / Jupyter with Python (pandas, matplotlib/seaborn, statsmodels, scikit-learn).
Advanced: R with tidyverse, glm for Poisson, and caret for modeling pipelines.

Worked example: Does home advantage increase FPL points?

Walk through a complete hypothesis-testing mini-project. This example assumes students have a CSV of match-by-match player points labeled with home_or_away.

Formulate: H0: mean points at home = mean points away. H1: mean points at home > mean points away.
Visualize: draw boxplots for home vs away points to inspect distributions.
Check assumptions: if the distributions are roughly normal, use a paired t-test (the same players observed home and away) or an independent-samples t-test (Welch’s version does not require equal variances); otherwise use a permutation test.

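Compute test (Python):

# pandas + scipy example; assumes columns named points and home_or_away
import pandas as pd
from scipy.stats import ttest_ind

df = pd.read_csv("player_matches.csv")  # hypothetical filename
home = df.loc[df["home_or_away"] == "home", "points"]
away = df.loc[df["home_or_away"] == "away", "points"]
# Welch's t-test (unequal variances), one-sided to match H1: home > away
stat, p = ttest_ind(home, away, equal_var=False, alternative="greater")
print(f"t = {stat:.3f}, p = {p:.4f}")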

Interpret: if p < 0.05, reject H0 at the 5% level. Also report an effect size, such as Cohen’s d or the raw mean difference, to show practical significance.
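
One common way to compute Cohen's d uses the pooled sample standard deviation, sketched below. The two short lists are made-up values chosen so the arithmetic is easy to check by hand.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical home and away points.
home_sample = [2, 4, 6]
away_sample = [1, 3, 5]

# Both samples have variance 4, so d = (4 - 3) / 2.
print(cohens_d(home_sample, away_sample))  # 0.5
```

Conventional rules of thumb read values near 0.2 as small, 0.5 as medium, and 0.8 as large, but in FPL terms the raw difference in points is often easier for students to interpret.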

Model building: practical tips and baseline models

Start simple, then add complexity. Students learn best by iterating.

Baseline: predict each player’s next-gameweek points as their season average. This is the benchmark.
Linear model features: recent points (last 3 GW average), minutes, fixture difficulty index, home/away, opponent xGA.
Poisson model for goals: model goals scored with exposure = minutes/90, entered as a log offset. Add team-level attacking strength as a covariate when needed.
Ensembling: average predictions from linear and tree-based models for robustness.
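
The exposure idea in the Poisson bullet can be shown without fitting anything: the expected goal count is a per-90 scoring rate scaled by minutes/90, and the Poisson formula turns that into probabilities. The rate of 0.5 goals per 90 and the 60 minutes below are hypothetical inputs, not estimates from real data; a fitted model would estimate the rate from features instead.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def expected_goals(rate_per_90, minutes):
    """Scale a per-90 scoring rate by exposure (minutes / 90)."""
    return rate_per_90 * minutes / 90


# Hypothetical player: 0.5 goals per 90 minutes, expected to play 60 minutes.
lam = expected_goals(rate_per_90=0.5, minutes=60)
p_at_least_one = 1 - poisson_pmf(0, lam)

print(f"expected goals = {lam:.3f}")
print(f"P(at least one goal) = {p_at_least_one:.3f}")
```

Halving the minutes halves the expected count but does not halve the probability of scoring, which is a good prompt for discussing why exposure belongs inside the model rather than being applied afterwards.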

Explain overfitting: keep a separate holdout set or use time-series split because FPL data is temporal.
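
A time-aware split can be sketched as an expanding window: train on every gameweek up to some point, test on the next one, then move forward. The helper name, the minimum of three training gameweeks, and the points list are all illustrative assumptions. The example scores the season-average baseline this way.

```python
def expanding_window_splits(n_gameweeks, min_train=3):
    """Yield (train_indices, test_index) pairs that never train on the future."""
    for test_gw in range(min_train, n_gameweeks):
        yield list(range(test_gw)), test_gw


# Hypothetical points for one player across eight gameweeks.
points = [2, 6, 1, 9, 2, 5, 3, 8]

errors = []
for train_idx, test_idx in expanding_window_splits(len(points)):
    # Baseline prediction: the player's average over all earlier gameweeks.
    baseline = sum(points[i] for i in train_idx) / len(train_idx)
    errors.append(abs(points[test_idx] - baseline))

baseline_mae = sum(errors) / len(errors)
print(f"baseline MAE over {len(errors)} held-out gameweeks: {baseline_mae:.2f}")
```

Any model students build can be run through the same loop, so its error is compared with the baseline on exactly the same held-out gameweeks.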

Assessment & rubric (practical, process-based)

To prevent copy-paste misuse of AI tools and ensure learning, grade the process as much as the product.

Data collection and cleaning (20%): documented steps, rationale for imputations or exclusions.
Analysis and methodology (30%): correctly chosen tests/models and justification.
Interpretation (30%): accurate explanation of results, limitations, and practical implications for FPL managers.
Communication & reproducibility (20%): clean notebook, clear visuals, and a 5-minute presentation.

Differentiation & extensions

Adjust difficulty by grade level or class size.

Lower level: descriptive stats, simple probability, and small-group visualization tasks.
Middle tier: hypothesis testing and basic regression.
Advanced students: time-series forecasting, network analysis of transfers, or NLP on manager press conference text to extract sentiment and include it as a feature.

Classroom pitfalls & how to avoid them

Data freshness: FPL data changes weekly. Freeze a dataset for assessment to ensure reproducibility, or require students to log data collection times. Consider a short data-snapshot policy for grading windows.
Confounding variables: injuries and rotation affect minutes. Teach students to include minutes as exposure or filter for players who start regularly.
Multiple comparisons: when testing many players, use false discovery rate controls or emphasize practical effect sizes.
Overreliance on AI: let LLMs scaffold code but require students to explain and justify each step in their own words.
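
For the multiple-comparisons pitfall, one widely used false discovery rate control is the Benjamini-Hochberg procedure. The sketch below is a plain-Python version for classroom illustration; the p-values are invented, and libraries such as statsmodels provide tested implementations for real analyses.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from testing home advantage for four different players.
p_values = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(p_values))  # [True, False, False, False]
```

Testing each of these at 0.05 separately would flag three players, whereas the procedure keeps only one, which shows students how quickly "significant" results shrink once many tests are run.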

Assessment example question set

Compute the mean, median, variance, and skewness of midfielders’ points across the season. Interpret.
Estimate the probability that a chosen forward scores at least 6 points next GW using empirical frequencies.
Test whether the average points for players who played at least 60 minutes differ significantly from those who played fewer than 60 minutes, using an appropriate test.
Build a model to predict next-match points and compare it to the season-average baseline. Explain where your model helps and where it fails.
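
Two of these questions need calculations that Python's statistics module does not provide directly: skewness and an empirical "at least k points" probability. A minimal sketch follows, using the population (Fisher-Pearson) definition of skewness and invented points; other software may apply a sample-size correction and report slightly different values.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of cubed z-scores (undefined if all values are equal)."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def prob_at_least(values, threshold):
    """Empirical probability: the share of observations at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)


# Hypothetical gameweek points for one forward.
forward_points = [2, 6, 9, 1, 2, 12, 5, 7]

print(f"skewness = {skewness(forward_points):.3f}")
print(f"P(points >= 6) = {prob_at_least(forward_points, 6)}")  # 4 of 8 gameweeks -> 0.5
```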

Real-world relevancy & career links

Sports analytics is a fast-growing career path. Students who engage with FPL-based projects practice skills used in data journalism, performance analysis, and betting/odds modelling. Use student projects as portfolio items, with consent and anonymization, to show admissions officers or internship supervisors. For consent best practices, refer to safety guidelines on consent and anonymization for creative and micro-gig work.

Actionable checklist for teachers (ready-to-use)

Week 0: Reserve computer lab or set up Google Colab links; prepare sample CSV (players, fixture, points).
Download or snapshot the FPL JSON and convert to CSV. Provide both raw and cleaned versions.
Create a rubric and share it at project start to focus student effort on process.
Prepare 2–3 starter notebooks: descriptive, hypothesis test, simple model.
Plan 1 live data week where students apply their models to new GW data and reflect on prediction errors.
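
The snapshot-and-convert step might look like the sketch below, which flattens the player list from an already-downloaded JSON snapshot into CSV text and stamps each row with a collection time. The function name, the chosen fields, and the sample snapshot are assumptions for illustration; fetching the JSON and writing the file to disk are left out so the example stays self-contained.

```python
import csv
import io
from datetime import datetime, timezone


def elements_to_csv(snapshot, fields, collected_at=None):
    """Flatten snapshot['elements'] into CSV text, adding a collected_at column."""
    collected_at = collected_at or datetime.now(timezone.utc).isoformat()
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[*fields, "collected_at"], lineterminator="\n"
    )
    writer.writeheader()
    for player in snapshot["elements"]:
        row = {field: player.get(field, "") for field in fields}  # blank if missing
        row["collected_at"] = collected_at
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical trimmed snapshot; the field names are assumptions to verify.
sample_snapshot = {
    "elements": [
        {"web_name": "Player A", "total_points": 96},
        {"web_name": "Player B", "total_points": 54},
    ]
}

csv_text = elements_to_csv(
    sample_snapshot,
    fields=["web_name", "total_points"],
    collected_at="2026-01-28T00:00:00+00:00",
)
print(csv_text)
```

Recording the collection time in the file itself means a frozen dataset carries its own provenance, which helps when students later compare predictions against newer gameweeks.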

Final thoughts and 2026 forward-looking tips

In 2026, expect more schools to integrate sports-data units and more students to use AI-assisted workflows. Keep the unit current by:

Updating datasets each season and bookmarking reliable sources like FPL's API, FBref, and major outlets' injury feeds (e.g., BBC Sport).
Teaching model humility: show where models fail (randomness in football is high) and how uncertainty estimates make predictions useful.
Encouraging ethical reflection: when is predictive modeling helpful vs. harmful (e.g., gambling implications)? See the voice and micro-gig safety guidance on safety and consent for classroom projects.

Key takeaways

Use FPL data to teach both foundational and advanced statistics in a motivating, real-world context.
Design assessments around process (data collection, cleaning, reasoning) to ensure real learning.
Start simple: baseline models and descriptive statistics create wins that scale to more complex modeling tasks.
Leverage 2026 tools — notebooks, APIs, and AI scaffolding — but keep students accountable for interpretation and ethics.

Call to action

Ready to launch this unit? Download our ready-to-run Google Colab starter notebooks, a printable rubric, and a sample FPL dataset adapted for classroom use. Test one lesson this week with a small group and share results — you’ll be surprised how quickly students move from confusion to insight.

Start now: pick a fixture, snapshot the data, and run the first descriptive stats lesson. If you want the starter pack, visit our teacher resources page or email our curriculum team to get bespoke adaptations for your class level.
