10 April 2006

The Limits of Statistical Determinism, the Failure of Pythagorean Expectation


Note: This is a statistics-related post that may be quite boring, and it doesn't deal directly with DC United, if you care about such things.

Bruce's Belly has an interesting post up about winning soccer games through the other team's mistakes, with results decided only in the margins. It mirrored something I had been wondering about: does Pythagorean Expectation work in MLS? The answer the numbers suggest to me is "Not really. In fact, almost the reverse."

For those unfamiliar with the concept, Pythagorean Expectation is credited to baseball empiricist Bill James and is almost a cornerstone of sabermetric analysis in baseball. The fundamental idea is that you can calculate what a baseball team's record should be from the runs it has scored (RS) and allowed (RA). The formula is most commonly written as Win% = RS^2 / (RS^2 + RA^2). More recently, statisticians seem to agree that an exponent other than 2 is more accurate, something like 1.81.
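
A minimal Python sketch of the formula, assuming nothing more than season totals for runs scored and allowed. The function name `pythagorean_expectation` and the sample totals are hypothetical, for illustration only; the optional `exponent` argument lets you try the 1.81 variant.

```python
def pythagorean_expectation(scored: float, allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage from totals scored and allowed.

    With exponent=2 this is the classic form: Win% = RS^2 / (RS^2 + RA^2).
    """
    if scored < 0 or allowed < 0:
        raise ValueError("scored and allowed must be non-negative")
    if scored == 0 and allowed == 0:
        raise ValueError("undefined when nothing has been scored or allowed")
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)


# Hypothetical season totals, not real data.
rs, ra = 800, 700
print(round(pythagorean_expectation(rs, ra), 3))        # classic exponent of 2
print(round(pythagorean_expectation(rs, ra, 1.81), 3))  # refined exponent
```

A smaller exponent pulls every team's expectation closer to .500 for the same scoring ratio, which is why the choice of exponent matters.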

Part of the fundamental assumption is that baseball games decided by one run are likely the result of luck, and the wider the margin of victory in a game, the more likely it reflects a true difference in skill between the two teams. Keep that in mind when I tell you that I looked at Pythagorean Expectation in MLS.

The idea in baseball is that if you have two teams that both have 20 wins and 20 losses, you can still detect a difference in their level of play. A team with a Pythagorean record of 23-17 has probably played better than a team with a Pythagorean record of 16-24, and one team has simply been luckier than the other. This isn't necessarily a prediction of future success, but it does let one evaluate how well teams are really playing in a way that luck in one-run games might obscure.
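
To make the 20-20 example concrete, here is a small sketch that turns an expected winning percentage into a whole-game record and a rough "luck" figure (actual wins minus expected wins). The helper names and the .575 and .400 percentages are hypothetical, chosen only because they reproduce the 23-17 and 16-24 records above over 40 games.

```python
def expected_record(win_pct: float, games: int) -> tuple[int, int]:
    """Round an expected winning percentage to a whole-game W-L record."""
    if not 0.0 <= win_pct <= 1.0:
        raise ValueError("win_pct must be between 0 and 1")
    wins = round(win_pct * games)  # Python rounds exact halves to the even number
    return wins, games - wins


def luck(actual_wins: int, win_pct: float, games: int) -> float:
    """Actual wins minus expected wins; positive means luckier than the totals suggest."""
    return actual_wins - win_pct * games


# Two hypothetical 20-20 teams with different expected percentages.
for name, pct in [("Team A", 0.575), ("Team B", 0.400)]:
    w, l = expected_record(pct, 40)
    print(f"{name}: expected {w}-{l}, luck {luck(20, pct, 40):+.1f}")
```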

Using MLS results since 2001, I gathered the final records, goals allowed (GA), and goals scored (GS) for all teams. Now, given the existence of draws in MLS, winning percentage is not an entirely simple concept. So I calculated several different winning metrics that could be used: percentage of standings points received out of the maximum possible, percentage of games won, percentage of games won with draws counting as half a win, and percentage of decided games won (wins divided by wins plus losses, throwing draws out).
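
The four metrics can be sketched as below. This assumes standings points of 3 for a win and 1 for a draw (the usual MLS scheme, but treated here as a default parameter you can change); the `Record` type, the function names, and the 12-10-10 record are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Record:
    wins: int
    draws: int
    losses: int

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses


def points_pct(r: Record, win_pts: int = 3, draw_pts: int = 1) -> float:
    """Standings points earned divided by the maximum available (assumed 3-1-0 scheme)."""
    return (win_pts * r.wins + draw_pts * r.draws) / (win_pts * r.games)


def win_pct(r: Record) -> float:
    """Share of all games won; draws count as non-wins."""
    return r.wins / r.games


def win_pct_half_draws(r: Record) -> float:
    """Share of games won, counting each draw as half a win."""
    return (r.wins + 0.5 * r.draws) / r.games


def win_pct_decided(r: Record) -> float:
    """Wins as a share of decided games only (draws thrown out)."""
    decided = r.wins + r.losses
    if decided == 0:
        raise ValueError("no decided games")
    return r.wins / decided


team = Record(wins=12, draws=10, losses=10)  # hypothetical 32-game season
for metric in (points_pct, win_pct, win_pct_half_draws, win_pct_decided):
    print(f"{metric.__name__}: {metric(team):.3f}")
```

The four numbers can differ a lot for a draw-heavy team, which is exactly why it is unclear which one a Pythagorean percentage should be compared against.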

Using the simple formula for Pythagorean expectation (in this case, PythagExp% = GS^2 / (GS^2 + GA^2)), I calculated a Pythagorean expectation for each team. But what does this percentage describe? The percentage of points won? Of games won? Can it account for draws?

So I computed the R^2 between Pythag and each of the metrics I had calculated for a team's performance. The data showed some pretty strong relationships. The R^2 between Pythag and % of Total Points was .838. Even better was the R^2 of .847 between Pythag and winning percentage with draws counted as one-half a win. Pythag was looking pretty useful as a way of discussing how well a team was truly performing, based on goals for and against.
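
One way to compute such an R^2 is as the squared Pearson correlation between the expected and actual percentages, which equals the R^2 of a simple linear fit with an intercept. The sketch below assumes that definition; the function names and the four (GS, GA, points %) rows are made up for illustration and are not MLS data.

```python
def r_squared(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation; equals R^2 of the linear fit y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either sequence is constant")
    return sxy * sxy / (sxx * syy)


def pythag(gs: float, ga: float, exponent: float = 2.0) -> float:
    """PythagExp% = GS^e / (GS^e + GA^e), with e = 2 by default."""
    return gs ** exponent / (gs ** exponent + ga ** exponent)


# Made-up (goals scored, goals against, points %) rows for illustration only.
teams = [(45, 30, 0.60), (38, 40, 0.47), (30, 45, 0.35), (50, 42, 0.55)]
expected = [pythag(gs, ga) for gs, ga, _ in teams]
actual = [pts for _, _, pts in teams]
print(f"R^2 = {r_squared(expected, actual):.3f}")
```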

The problem is that Pythag was not better than the simpler formula Expectation = GS / (GS + GA), which is just the Pythagorean formula with an exponent of 1. That formula had slightly higher R^2 values against all of the metrics I created. For Total Points %, its R^2 was .844, and against Win% (with draws as half-wins) its R^2 was .855. Further poking around produced the highest R^2 values with an exponent of around 0.5. Since a smaller exponent pulls every team's expectation toward .500 for the same goal ratio, this means lopsided goal totals should have their impact de-emphasized, not exaggerated as in the traditional Pythagorean discussion.
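
The "poking around" over exponents can be sketched as a simple grid search: compute the generalized expectation GS^e / (GS^e + GA^e) for each candidate exponent e, and keep the one with the highest R^2 against the actual percentages. This is an assumed approach, not necessarily how the numbers above were produced; the function names, the 0.1 to 3.0 grid, and the sample data are hypothetical.

```python
def expectation(gs: float, ga: float, exponent: float) -> float:
    """Generalized expectation GS^e / (GS^e + GA^e); e=1 gives GS / (GS + GA)."""
    s, a = gs ** exponent, ga ** exponent
    return s / (s + a)


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation (equals R^2 of a simple linear fit)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def best_exponent(goals: list[tuple[float, float]], actual_pct: list[float],
                  grid: list[float]) -> tuple[float, float]:
    """Return (exponent, R^2) for the grid exponent that best fits actual_pct."""
    results = []
    for e in grid:
        predicted = [expectation(gs, ga, e) for gs, ga in goals]
        results.append((r_squared(predicted, actual_pct), e))
    r2, e = max(results)
    return e, r2


grid = [round(0.1 * k, 1) for k in range(1, 31)]  # 0.1, 0.2, ..., 3.0

# Hypothetical (GS, GA) totals and points percentages, not MLS data.
goals = [(20, 50), (30, 40), (40, 40), (50, 30), (60, 25)]
actual = [0.30, 0.44, 0.50, 0.57, 0.65]
e, r2 = best_exponent(goals, actual, grid)
print(f"best exponent {e}, R^2 = {r2:.3f}")
```

A finer grid (or a one-dimensional optimizer) would pin the exponent down more precisely, but a coarse grid is enough to see whether the best fit sits above or below 1.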

Which brings me back to what The Belly wrote. The fact that DC United defeated Chivas 2-nil is fine with me. But if forced to swear on my Marco Etcheverry handtowel, I would also have to admit that 1-1 would have been a fair representation of the play. In MLS, and I think in soccer generally, a significant margin of victory is usually the result of a fluke or two, whereas close games are more representative of the teams' true skill levels. Given that soccer is a game where a less skilled team may well bunker in to keep the score down (happy to settle for a nil-nil draw) in a way that doesn't really happen in the NFL, NHL, MLB, or NBA, this makes a certain degree of sense.

So one of the fundamental cornerstones of statistical analysis in baseball simply doesn't seem to transfer well to MLS. That doesn't mean other models won't emerge, but I did think it was interesting that Pythag won't be one of them.


At 10 April, 2006 14:12, Anonymous said...

In my opinion, in soccer, the team that did not play better that day wins more frequently than it does in other sports. This may be a function of all those fluke 1-0 victories.

I think this would go away with higher scoring. In other words, it would take more than one fluke to win 4-3. Or, a 4-3 result would come from demonstrable performances on both sides.

Is the 1-0 upset or 0-0 draw part of the tradition and spirit of soccer? Would interest fade if the result were more closely tied to skill or performance?


At 10 April, 2006 22:34, Joe said...

Dude, are you trying to impress scaryice? Or challenge him to a soccer math tournament?

At 11 April, 2006 02:16, the belly said...

You used my post as an excuse to do math!?!


