
Automating Sunday with Python, SQL, Jupyter Notebooks & Google Cloud Platform

Automating Sports Analytics with Data Science

Table of Contents:

Automation

The Pipeline

Authentication

Extraction

Transformation

Presentation

Jupyter Notebook Link

Automation

Automate everything until only the fun stuff is left.

This is a quote that guides me in almost everything I do. I'm not sure if it is an original thought or something I heard along the way … but after a 15-minute unsuccessful Google search, I'm happy to claim ownership.

What follows is an application of data process automation specific to NFL handicapping, but as many will notice, the approach applies to plenty of other situations.

In my case, I had a spreadsheet that I created every week to help me analyze NFL matchups. At first I put it together to get an edge in my friend's fantasy league, but over time, as sports gambling was legalized throughout the country, I found it useful from a handicapping perspective as well.

The spreadsheet aggregates pre-game data such as game spreads/totals, implied fantasy points per team, opponent defensive ranking, points-per-game trends and record against the spread. Having a view of all of this data helps me make informed decisions around risk, value and expected return, much like running a business. The problem was that it took me hours to put the file together before I could start analyzing anything. I also ran the risk of making decisions based on out-of-date or incorrect data due to human error.

So after years of copy-pasting, vlookups and ridiculous spreadsheet formulas, I finally dedicated some time to build an automated data pipeline that allows me to spend more time picking and less time clicking.

The Pipeline

This exercise was easy-ish because I understood my target very well (the spreadsheet) and I knew exactly where I could get a consistent stream of data. I have been using https://sportsdata.io/ (and more recently https://bettingdata.com/) for years, and they have an extremely friendly API for any sports-related data you could ever need.

The vision I had in my head looked something like … SportsDataIO as the source of data, BigQuery to store/transform the data and a DataStudio dashboard as a replacement for my manual spreadsheet … all orchestrated through a Jupyter Notebook.

Authentication

A lot of these articles skip boring things like authentication, which can be really frustrating unless you are familiar with the topic. Below is an example of how I easily authenticated with Google BigQuery (gbq) and connected to a specific project (sportsdataio) within my account.

Getting your credentials from Google Cloud Platform to plug into this template is easy. Follow the instructions for Creating a Service Account and insert the contents of the downloaded JSON file into the template below.
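The original embedded snippet isn't reproduced in this version of the article, so here is only a minimal sketch of that authentication step, assuming the service account key sits next to the notebook and the project is the sportsdataio project mentioned above:

from google.cloud import bigquery
from google.oauth2 import service_account

# Path to the service account key downloaded from Google Cloud Platform
# (placeholder -- point this at your own JSON file)
KEY_PATH = "service_account.json"
PROJECT_ID = "sportsdataio"

# Build credentials from the key file and open a BigQuery client for the project
credentials = service_account.Credentials.from_service_account_file(KEY_PATH)
client = bigquery.Client(project=PROJECT_ID, credentials=credentials)

# Quick sanity check: list the datasets the client can see
for dataset in client.list_datasets():
    print(dataset.dataset_id)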

Extraction

I decided I needed to extract data from a handful of endpoints provided by https://sportsdata.io/, including TeamSeasonStats and GameOddsByWeek.

The framework I use to extract data from each of these endpoints is standardized into the same four steps, which the examples below illustrate.

A simple example of accessing TeamSeasonStats for the current season is below:
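The notebook code itself isn't embedded here, so the snippet below is only a sketch of what that step might look like; the endpoint path, season string and destination table are assumptions based on the SportsDataIO documentation rather than the article's exact code:

import pandas as pd
import pandas_gbq
import requests
from google.oauth2 import service_account

API_KEY = "YOUR_SPORTSDATAIO_KEY"   # placeholder for your SportsDataIO key
SEASON = "2021REG"                  # placeholder for the current season
credentials = service_account.Credentials.from_service_account_file("service_account.json")

# Pull season-level team stats from SportsDataIO (endpoint path assumed from their docs)
url = f"https://api.sportsdata.io/v3/nfl/scores/json/TeamSeasonStats/{SEASON}"
resp = requests.get(url, params={"key": API_KEY})
resp.raise_for_status()

# Flatten the JSON response into a DataFrame and load it into BigQuery
df = pd.DataFrame(resp.json())
pandas_gbq.to_gbq(df, "sportsdataio.team_season_stats", project_id="sportsdataio",
                  credentials=credentials, if_exists="replace")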

A more complicated historical extraction of GameOddsByWeek is below.
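As with the previous example, this is a hedged reconstruction rather than the author's actual code — the seasons, week range, endpoint path and table name are all illustrative:

import pandas as pd
import pandas_gbq
import requests
from google.oauth2 import service_account

API_KEY = "YOUR_SPORTSDATAIO_KEY"            # placeholder
SEASONS = ["2019REG", "2020REG", "2021REG"]  # illustrative historical seasons
credentials = service_account.Credentials.from_service_account_file("service_account.json")

frames = []
for season in SEASONS:
    for week in range(1, 18):  # assumed number of regular-season weeks
        # Weekly game odds (spreads/totals); endpoint path assumed from the docs
        url = f"https://api.sportsdata.io/v3/nfl/odds/json/GameOddsByWeek/{season}/{week}"
        resp = requests.get(url, params={"key": API_KEY})
        resp.raise_for_status()
        week_df = pd.json_normalize(resp.json())
        week_df["season"], week_df["week"] = season, week
        frames.append(week_df)

# Stack every season/week into one table and load it into BigQuery
odds = pd.concat(frames, ignore_index=True)
pandas_gbq.to_gbq(odds, "sportsdataio.game_odds_by_week", project_id="sportsdataio",
                  credentials=credentials, if_exists="replace")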

All of these extractions, written in a Jupyter Notebook, are executed with a simple command in a Terminal:

jupyter nbconvert --to notebook --execute sdio_extraction.ipynb
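To make it truly hands-off, that command can be scheduled rather than run by hand — for example with a crontab entry along these lines (the timing and path are purely illustrative, not the article's actual setup):

# Illustrative crontab entry: run the extraction notebook every morning at 6 AM
0 6 * * * cd /path/to/project && jupyter nbconvert --to notebook --execute sdio_extraction.ipynb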

Transformation

There are a number of transformations to make, all of which can now be handled in a couple of SQL queries since the information is stored in gbq.

One of the most involved transformations is calculating how each team measures Against The Spread (ATS). For instance, if the Tampa Bay Buccaneers are laying 7 points (-7) this week against the Atlanta Falcons, and the final score of the game is TB 30 — ATL 20, then Tampa covered the spread. This is a helpful metric when wagering on point spreads because you generally want to gravitate towards teams who have a positive ATS record.

I could probably write a whole post on ATS alone, but in short, this script produces the table pictured below, which makes it very easy to calculate an ATS win percentage (ats_win/games) per team (e.g. TB at 53.85% ats_win_p).
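The SQL itself isn't included in this extract, so the following is only a rough sketch of the cover logic, run from the notebook against BigQuery; the table and column names (a games table with final scores and a home-team point spread) are invented for illustration, and pushes are simply counted as non-covers here:

from google.cloud import bigquery

client = bigquery.Client(project="sportsdataio")  # credentials as in the authentication step

ATS_QUERY = """
WITH results AS (
  -- one row per team per game, flagged 1 when that team covered the spread
  -- (point_spread assumed to be quoted relative to the home team, negative when favored)
  SELECT home_team AS team,
         IF(home_score + point_spread > away_score, 1, 0) AS ats_win
  FROM `sportsdataio.games`
  UNION ALL
  SELECT away_team AS team,
         IF(home_score + point_spread < away_score, 1, 0) AS ats_win
  FROM `sportsdataio.games`
)
SELECT team,
       COUNT(*) AS games,
       SUM(ats_win) AS ats_win,
       ROUND(SUM(ats_win) / COUNT(*), 4) AS ats_win_p
FROM results
GROUP BY team
ORDER BY ats_win_p DESC
"""

ats = client.query(ATS_QUERY).to_dataframe()

Checking the Tampa Bay example above against this logic: with a -7 spread and a 30-20 final, 30 + (-7) = 23 is still greater than 20, so that row gets an ats_win of 1.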

From a macro perspective, the more important transformation queries are actually scheduled in BigQuery. I have two queries scheduled to run daily, which complement the extraction schedule. These queries populate views that will ultimately be used in DataStudio.

Presentation

Now, instead of a shoddy spreadsheet that took me hours to prepare, I wake up to a dashboard automatically populated with the most recent information. From here, it's easy for me to create charts and visualizations that make decisions easier.

For example:

To be continued!

Setting up this pipeline has given me more time to analyze data and increase my likelihood of success, instead of repeating menial, low-value tasks. This is not only my goal for NFL handicapping, but for life! Data technology today gives us an amazing opportunity to automate the boring stuff so we can empower ourselves, or others, to execute a higher function.

You can access the Jupyter Notebook that is the “engine” for this pipeline here.

If you have any questions or suggestions, please feel free to comment or reach out to dmelillo@gmail.com.
