The Billingsley Report has been a staple of the BCS since its inception. However, there is one inherent flaw that can cause it to disagree with every other computer. The data backs this up. This article explains the flaw and reveals the data.
By Yeshayahu Ginsburg
The Billingsley Report is unique among the six BCS computer polls. Each poll is different and has its own quirks, but Billingsley’s is an entirely different style from the rest. The other five BCS computer polls each use some formula that, every week, takes into account every game played so far in the season and assigns each team a rating based on whatever criteria the programmer decided were important. That is not how Billingsley’s works at all.
In Billingsley’s system, each team begins each week with a rating, carried forward and adjusted solely by the previous week’s result. A team gains or loses points depending on its performance each week. The value of every game is determined by the ratings of both teams involved and by the location of the game. The higher the opponent’s rating, the more points a team gains for beating that opponent (and the fewer it loses in defeat).
The number of losses a team has is also central to these rankings. A team receives the full amount of rating points earned from each win only if it is undefeated; every loss lowers the percentage of the possible points that the team actually receives. Teams thus move up and down each week based on how many points they gained or lost. The system looks like it functions similarly to the human polls, and it does remarkably well in that respect. However, it has a fatal flaw that renders it inherently unfair.
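To make the mechanics concrete, here is a toy sketch of a Billingsley-style weekly update in Python. Billingsley has never published his exact formula, so every constant below (the points-at-stake scaling, the road adjustment, the per-loss discount) is a hypothetical placeholder of my own, chosen only to illustrate the ideas described above.

```python
# Toy sketch of a Billingsley-style weekly update. The real formula is
# unpublished; all constants here are hypothetical illustrations.

def weekly_update(rating, opp_rating, won, losses, home=True):
    """Return a team's new rating after one game.

    The points at stake grow with the opponent's current rating, and a
    hypothetical road adjustment rewards winning away from home.
    """
    stake = opp_rating / 10.0            # stronger opponent -> more at stake
    if not home:
        stake *= 1.1                     # assumed road bonus
    if won:
        # Each prior loss shaves a fraction off the points actually earned,
        # so only an undefeated team collects the full amount.
        earned = stake * max(0.0, 1.0 - 0.15 * losses)
        return rating + earned
    else:
        # Losing to a highly rated opponent costs less than losing to a
        # weak one.
        penalty = max(0.0, 20.0 - stake)
        return rating - penalty
```

Note that under this sketch the same win is worth less to a two-loss team than to an undefeated one, which is exactly the reward structure described above.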
The system judges each team only on a week-by-week basis. It does nothing to correct itself if a current ranking is later proven to have been wrong. Billingsley defended this design very well on his site (the explanation has since been removed). He used the 2008 season as an example, when Texas Tech famously upset Texas. Before that game, Texas was clearly the #1 team in the country, so every team that played the Longhorns deserved credit for playing the #1 team; Texas eventually losing a game doesn’t change that. Thus, he claims, retroactively changing a team’s ranking is unfair: the team, at that time, deserved its ranking, and therefore all opponents should be credited as having played such a highly ranked team.
The flaw, however, is that Billingsley uses a preseason ranking based solely on where each team finished the previous year. So while his logic for not retroactively readjusting rankings is sound, it does not apply at the beginning of the year. Most teams do not show massive variation in ranking from year to year, but it can happen, and when it does, it can throw Billingsley’s whole system off.
Take, for example, Ball State. The Cardinals had one amazing year in 2008, sandwiched between two pretty bad ones. Ball State started 2008 ranked #75 in Billingsley following a decent (by MAC standards) 2007. And while they reached the top 20 in November 2008 before losing their final two games, the trek up the rankings was a slow one for a MAC team with a poor strength of schedule. Any team that played them early in the season was judged as having lost to a team ranked in the 70s or 60s, when Ball State clearly should have been ranked higher. This artificially deflated the ratings of all of Ball State’s opponents, which in turn affected all of those teams’ future opponents.
Then, in 2009, the exact opposite happened. Ball State was ranked #31 to start the year after their incredible 2008, but this was clearly a different team. They lost their opener to #119 North Texas. According to Billingsley, of course, this was a massive upset, and North Texas’s rating skyrocketed accordingly. The problem, obviously, is that both Ball State and North Texas were terrible that year: the two went a combined 4-20 and both finished ranked in the 110s. For those first few weeks of the season, though, every team that played either of them got an undeserved boost. And because the rankings care only about each opponent’s rating that week, the boost then propagated to all of the teams that Ball State’s and North Texas’s opponents played.
The system does have one way of correcting itself: a built-in rule that an undefeated team remains ahead of every team it has beaten. Thus, when Boise State beat Oregon to open 2009, Oregon’s subsequent wins continued to boost Boise State’s rating, because Boise State had to stay ahead of the Ducks. The system also attempts, where record and ranking permit, to move a winning team ahead of the team it just beat; usually the winner stays ahead of that team until it incurs another loss.
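As a rough illustration of how such a rule could be enforced (this is my own sketch, not Billingsley’s implementation), one could run a pass after the weekly update that nudges any undefeated team back ahead of every team it has beaten, repeating until the rankings are stable:

```python
# Hypothetical sketch of the "undefeated stays ahead" correction. The
# function names and data shapes are my own inventions for illustration.

def enforce_undefeated_rule(ratings, wins, losses):
    """ratings: team -> rating; wins: team -> set of teams it has beaten;
    losses: team -> number of losses. Mutates and returns ratings."""
    changed = True
    while changed:                         # repeat until stable
        changed = False
        for team, beaten in wins.items():
            if losses.get(team, 0) == 0:   # rule applies only when undefeated
                for opp in beaten:
                    if ratings[team] <= ratings[opp]:
                        # Nudge the undefeated winner just ahead.
                        ratings[team] = ratings[opp] + 0.1
                        changed = True
    return ratings
```

In the Boise State example, any later rise in Oregon’s rating would drag Boise State’s rating up along with it, since Boise State can never fall behind a team it beat while still undefeated.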
Mr. Billingsley was kind enough to address this issue for me. He pointed out that his system is designed to reward being undefeated and usually gets the top two right, especially when they are undefeated. He also pointed out that his rankings do not usually deviate much from the other computers’. His first point is very true: because the system rewards being undefeated and punishes losses, it is good at finding the top teams. However, the issues raised above can still affect it. Though it made no difference by season’s end, Alabama (the eventual national champion) was one of the teams that got an unfair boost from beating North Texas in 2009.
That his computer is not too different from the rest is also true, but it is irrelevant. The Billingsley Report and Sagarin are the two polls that most often produce outliers: over the last three years of BCS computer rankings (2009-2011), Billingsley accounts for about 30% of all outliers and Sagarin for about 23%, with Colley next at about 18% and Wolfe lowest at about 6%. Just because the results end up being correct most of the time does not change the fact that if the process is invalid, there is a higher risk of invalid results every time the process is used.