Tuesday, March 24, 2009

More on Pomeroy, My Favorite Subject

My usefulness around here during February and March is limited to asking stupid questions about the usefulness of everyone's favorite predictive tool for college basketball, the Pomeroy Ratings.

In today's episode, I ask, once again, whether Pomeroy undervalues the elite teams. I've previously asked whether the nation's best teams are undervalued by his efficiency statistics because those teams tend to play a lot of garbage time, which does not reflect their true quality but still feeds into the Pomeroy numbers. As an example, I used the Duke beatdown of Maryland in February, when the Blue Devils ran out to a 40-point lead about two-thirds of the way through the game and then coasted home to a final margin that was about equal to that. The teams at the top of the traditional rankings do this on a regular basis, yet Pomeroy treats those waning minutes as just as important as the waning minutes of a close game. Today I'll take a new look at whether this might lead to a slight undervaluation of the elite teams as compared to the rest of the country.

The Basketball Prospectus predictions for 2008 and 2009, as compared to the seeds' actual performance, suggest that there may be some truth to this. Let's look first at the success of the top seeds in advancing to the Sweet 16. Since the tournament switched to the 64-team format in 1985, there have been exactly 100 top seeds. How convenient for us! Of those top seeds, 88 have advanced to the Sweet 16, for a ... wait for it ... 88% success rate. However, of the eight pre-tournament regional bracket predictions issued by Basketball Prospectus, only one #1 seed was given a better than 88% chance of advancing to the second round: Kansas in 2008, which was bestowed with a 93.6% chance of advancing, mostly because they were the Pomeroy overall #1 and their 8/9 draw was ridiculously weak according to the Pomeroy ratings. The other seven #1 seeds? UNC 2008 got a 70.99% chance of advancing, Memphis 2008: 84.3%, UCLA 2008: 77.86%. Moving on to 2009, Pitt was given a 72.07% chance of advancing, Louisville got 79.85%, UConn got a 70.35% shot, and UNC got 79.61%.

We know that all eight of these teams advanced, but that's not what's important. What is important is that these eight predictions average out to only a 78.58% chance of advancing, almost 10 percentage points lower than the historical data would suggest. So, either: (1) Basketball Prospectus thought the 2008 and 2009 #1 seeds were weaker than the average crop of #1 seeds historically; (2) the 2008-2009 #1 seeds were extraordinarily unlucky in their 8/9 draws; or (3) Basketball Prospectus and Pomeroy are undervaluing the top seeds.
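For anyone who wants to check the math, here is a rough Python sketch that averages the eight Sweet 16 percentages quoted above and compares the result with the 88% historical rate. The last part, multiplying the eight chances together to see how likely a clean sweep would be, assumes the eight outcomes are independent; that assumption is mine for illustration and not something Basketball Prospectus or Pomeroy published.

```python
# Pre-tournament chances (in percent) that each #1 seed advances,
# exactly as quoted in this post from the Basketball Prospectus previews.
sweet16_pct = {
    "Kansas 2008": 93.6,
    "UNC 2008": 70.99,
    "Memphis 2008": 84.3,
    "UCLA 2008": 77.86,
    "Pitt 2009": 72.07,
    "Louisville 2009": 79.85,
    "UConn 2009": 70.35,
    "UNC 2009": 79.61,
}

# Historical rate quoted in this post: 88 of 100 top seeds since 1985.
historical_pct = 100 * 88 / 100

# Simple average of the eight projections, and the gap in percentage points.
mean_pct = sum(sweet16_pct.values()) / len(sweet16_pct)
gap_points = historical_pct - mean_pct


def prob_all_advance(pcts):
    """Chance that every team advances.

    ASSUMPTION (mine, for illustration): the outcomes are independent,
    so the individual chances can simply be multiplied.
    """
    p = 1.0
    for pct in pcts:
        p *= pct / 100
    return p


p_all_projected = prob_all_advance(sweet16_pct.values())
p_all_historical = (historical_pct / 100) ** len(sweet16_pct)

print(f"mean projection:            {mean_pct:.2f}%")
print(f"gap vs. history:            {gap_points:.2f} points")
print(f"all 8 advance (projected):  {p_all_projected:.1%}")
print(f"all 8 advance (historical): {p_all_historical:.1%}")
```

The first number should reproduce the 78.58% average. The sweep comparison is only a sanity check on a very small sample, not a formal test of the projections.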

Running the data with Final Four likelihoods results in similar findings. 43 of 96 #1 seeds have made the Final Four over the years, for a 44.79% chance of Final Four advancement historically. The BP/Pomeroy numbers are far more pessimistic for the 2008 and 2009 teams, averaging out to 37.68% even with the 2008 Kansas squad getting an amazing 61% chance of advancing.

Does this mean anything? It's hard to say. A sample of eight #1 seeds is small, to the point where a single Pomeroy divergence from conventional wisdom (like Memphis, the #1 Pomeroy team in 2009, being a 2 seed that is twice as likely as UConn to advance to the Final Four) can skew the data. Still, I think there's enough disparity between history and the projections to at least ask some questions.

Of course, it is also worth noting that the one team given a likelihood of advancement to the Sweet 16 and Final Four greater than the historical percentages for #1 seeds was the 2008 Kansas team, the only one of these eight that has a national title under its belt as of this writing. That certainly had some value for the folks who read this blog regularly.

For the record, it seems that Pomeroy is well aware that the data has some minor shortcomings. For example, with respect to Gonzaga's amazingly high projections in the 2009 South Region and their impact on UNC's percentages, they wrote the following:

"Second on the list of apparent log5 absurdities [the Memphis numbers were the first] is the chance of Gonzaga winning the title. With the Zags' last marquee game being a humiliating loss in Spokane to Memphis, there's some well-founded skepticism that their chances of escaping the South region could approach a healthy UNC's chances."

I guess that the one thing we can take from this is that the rankings are not gospel, but are simply a useful predictive tool. I imagine even Pomeroy would agree with that sentiment.

Tune in later this week, when the coming baseball season finally gives me something to write about besides bizarre nitpicky breakdowns of predictive models.

2 comments:

Anonymous said...

....only one #1 seed was given a better than 88% chance of advancing to the second round: Kansas in 2008

You must mean advancing from the second round, to the regional semis.

Interesting premise of your post. But wouldn't there be more garbage time for Memphis, or a vintage UNLV team, or someone else who plays outmatched competition on a regular basis? The top half of the Big East playing the bottom half also comes to mind.

One way to test your hypothesis would be to derive your own Pomeroys controlling for blowouts. First we need to define "garbage time" (margin in points > 2 * minutes remaining?) and then drop those stats out of the calculation. I wonder if Ken himself would be open to providing some source data to help someone do this calculation.

If doing it play-by-play is too hard, we could just drop individual games that are blowouts.

All more work than I want to do personally, but someone already doing CBB research might be open to it.
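A rough Python sketch of what the play-by-play filter might look like, using the "margin > 2 * minutes remaining" rule floated above. Everything here is hypothetical: the `Possession` fields, the sample numbers, and the raw points-per-100-possessions figure, which is far simpler than Pomeroy's actual opponent-adjusted efficiency.

```python
from dataclasses import dataclass


@dataclass
class Possession:
    """One possession from a hypothetical play-by-play log."""
    minutes_remaining: float  # game clock when the possession started
    margin: int               # absolute score margin at that moment
    points: int               # points scored on this possession


def is_garbage_time(p: Possession, factor: float = 2.0) -> bool:
    """Rule of thumb from the comment: margin > 2 * minutes remaining."""
    return p.margin > factor * p.minutes_remaining


def points_per_100(possessions: list[Possession]) -> float:
    """Raw offensive efficiency: points scored per 100 possessions."""
    if not possessions:
        return 0.0
    return 100 * sum(p.points for p in possessions) / len(possessions)


# Made-up possessions for one team in a blowout (not real game data).
sample = [
    Possession(minutes_remaining=30.0, margin=5, points=2),
    Possession(minutes_remaining=20.0, margin=18, points=3),
    Possession(minutes_remaining=12.0, margin=30, points=0),  # 30 > 24
    Possession(minutes_remaining=5.0, margin=38, points=0),   # 38 > 10
    Possession(minutes_remaining=1.0, margin=40, points=2),   # 40 > 2
]

# Keep only the possessions played while the game was still competitive.
competitive = [p for p in sample if not is_garbage_time(p)]

raw_eff = points_per_100(sample)
filtered_eff = points_per_100(competitive)

print(f"all possessions:       {raw_eff:.1f} points per 100")
print(f"garbage time excluded: {filtered_eff:.1f} points per 100")
```

The game-level version would be the same idea with a simpler test, dropping whole games whose final margin clears some cutoff instead of filtering possession by possession.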

Grover said...

Correct, I meant advancing past the second round. My mistake.

I thought about that with respect to teams like UNLV and Memphis, and frankly, I don't know the answer to that. I do think there may be some other problems with Pomeroy sometimes overvaluing dominance over weak conference competition, despite his best efforts to weight the efficiency numbers. After all, even he recognized that the two biggest anomalies produced by his numbers were incredibly optimistic projections for Memphis and Gonzaga. It will be interesting, although obviously not dispositive, to see how those two teams do this week.