The Limits of AI in College Football Analytics - Part 1
How the 2021 CFP semifinal between Georgia and Michigan demonstrates the limited utility of AI in college football analytics
In the fall of 2021 I lived two doors down from a diehard Michigan Wolverines fan. An alum in his mid-40s, he had spent his college Saturdays cheering on the squads that led up to the undefeated 1997 team, which featured Heisman Trophy winner Charles Woodson.
I would see this neighbor mostly in the spring, summer, and early fall, when the Connecticut weather was nice enough for our kids to play outside (our sons are 6 weeks apart in age and are best friends). So, while I never got a detailed breakdown of his thoughts leading up to the CFP semifinal game between #2 Michigan and #3 Georgia, I do remember him mentioning that he thought it was a tough matchup, and that he had been rooting for Georgia to win the SEC Championship, which likely would have afforded Michigan a more favorable matchup with Cincinnati. Still, I would say his mood was cautiously optimistic leading up to the game.
Consensus among the pundits was that Georgia would win and likely cover the 7.5-point spread. Still, there was a clear path to victory for Michigan, and had the Wolverines won, it would have been considered an upset, not a miracle.
In the days between the CFP selection show and December 21, I thought Michigan was set up well to be competitive and make it a four-quarter game. They were coming off the big win over Ohio St. and had completely dominated Iowa in the Big Ten Championship. They seemed to be playing their best football when it mattered most.
Then I saw this headline and knew all bets were off:
Up until November 2021, Covid was my full-time job. I earned my Master of Public Health in Health Policy & Management at Columbia University and worked for two different health systems during the course of the pandemic. The healthcare data I had at my fingertips was every bit as rich as the college football data that underpins this Substack. I voraciously read then, and continue to read now, the medical literature on the topic (research and clinical journals).
On its face, to the medically illiterate, Michigan’s decision to mass vaccinate all its players just days before the biggest game of their lives was a logical one. Harbaugh’s biggest worry was an outbreak that could cause key players, or even entire position groups, to miss the game due to a positive test.
“The card that is so high and wild…”
Artificial Intelligence (AI) is on a lot of people’s minds these days, and there seems to be a great deal of fear about its capabilities. I went from healthcare to managing Analytics for the fintech startup Vic.ai, which hopefully lends some credence to my thoughts on the topic. As I wrote in a LinkedIn post a few weeks back:
The New Yorker ran this article in [a recent] issue about the prospect of using AI to treat mental illness. Severe mental illness was an area I focused on during my MPH studies at Columbia University Mailman School of Public Health - I completed a geospatial analysis of state inpatient psychiatric hospitals during an internship at the Milbank Memorial Fund, and I volunteered in the evaluation center at NewYork-Presbyterian Hospital's White Plains psychiatric hospital (what NYP calls The Westchester Division). And yet, I found that the history of AI mental health initiatives goes back further than I would have guessed. Still, the results, when positive, have tended to be at the margins. I couldn't help but compare those results to what we see at Vic.ai with our clients in the accounts payable space. For example, the 80% decline in per-invoice processing time that some of our established clients have realized is light years ahead of anything achieved in the mental health space. When I asked myself, "Why?", the answer I came up with was: "Because Vic.ai leverages AI to improve performance on a tightly defined set of human-to-machine interactions. At the present time, AI has not demonstrated an ability to consistently solve issues governed by human-to-human interactions."
To expand on the above, AI has also not demonstrated an ability to reach beyond its training data. By definition, it cannot, and that will always limit its utility in dynamic environments such as college football. Factors can crop up out of nowhere, factors that were heretofore considered unrelated, if they were considered at all. And in key moments, these factors can emerge in a way that renders all others moot. Leonard Cohen captured the importance of these factors to the skilled analyst in "The Stranger Song" when he wrote:
Like any dealer he was watching for the card that is so high and wild
He'll never need to deal another
He was just some Joseph looking for a manger
Think about it: the average fan plays this game as well. So do talk radio hosts and television and print commentators. Data analysts are just a bit more structured in how they consider it.
My central argument is that it takes a human, and not just any human, but a human with some know-how, to recognize that card when it is dealt, and to either incorporate it into the dataset, so that the magnitude of its effect can be captured, or toss the dataset entirely.
To Be Continued…
Next week, in Part 2, I will, in somewhat painstaking detail, present the folly of Harbaugh’s attempt to maximize Michigan’s chances of victory in the 2021 CFP semifinal by employing mass vaccination, and discuss why AI would fail to pick up on the impact this decision was likely to have on the game’s outcome. Then I will bring it back to Virginia Tech and discuss how this line of thinking impacts how I interpret models.