Friday, April 28, 2006

Baseketball Statistics - Pythagorean Win Percentage

In baseball there are many theories that try to quantify a concept called "statistical luck." Take, for example, Kevin Millwood, a pitcher for the Cleveland Indians in 2005. Statistically, he had a very, very good year.

Here's his line:
9 W
11 L
30 G
1 CG
192.0 IP
182 H
72 R
61 ER
20 HR
52 BB
146 SO
2.86 ERA
143 ERA+ (park adjusted)
1.219 WHIP

OK, a lot of people may not understand these statistics and I'll briefly run through them, but I wanted to first point out the obvious. These are awesome statistics. Yet he only had 9 wins! That, my friends, is called "unlucky." And it can be statistically shown. We can look back through history at pitchers who have had similar years and compare them against each other. What you will find is that when a pitcher has a 2.86 ERA and a 143 ERA+ (roughly 43% better than the average pitcher in the league, after park adjustment), that pitcher should win more than 9 of the 30 games he appears in. One important statistic that we are missing is "Run Support." What we would see is that Mr. Millwood had terrible run support from his team when he pitched (the runs scored for him were considerably below the team average for the year). This is "luck."
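For anyone who wants to check two of the numbers in that line, here is a minimal sketch of how ERA and WHIP fall out of the raw totals. The function names are my own for illustration, and it assumes innings pitched is a plain decimal (192.0 means exactly 192 innings).

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per 9 innings."""
    return 9 * earned_runs / innings_pitched


def whip(hits, walks, innings_pitched):
    """Walks plus hits per inning pitched."""
    return (hits + walks) / innings_pitched


# Millwood's 2005 totals from the line above: 61 ER, 182 H, 52 BB, 192.0 IP
millwood_era = era(61, 192.0)
millwood_whip = whip(182, 52, 192.0)

print(f"ERA:  {millwood_era:.2f}")
print(f"WHIP: {millwood_whip:.3f}")
```

ERA+ is left out on purpose, since it depends on league and park figures that are not in the line.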

To make it a little simpler, let's "step back" for a minute. The object of any game is to score more runs (or points) than you give up. In this case, Mr. Millwood gave up an average of only 2.86 earned runs for every 9 innings he pitched. Yet this was only good enough to win 9 games. That means that in the majority of his games (21 out of 30, to be exact) he came away without a win, because his team too often failed to outscore the few runs he allowed while he was in the game. This makes sense at the player level. Anyone who followed Mr. Millwood's season last year would describe it as "unlucky." He pitched well, but just didn't get the win.

If you apply that to the team, it's even easier to prove. You can see how many runs a team scores and how many they give up. Based on those numbers you can come surprisingly close to predicting the number of games they "should" have won. Any difference between the "predicted wins" and the "actual wins" is "luck." Which tells you something. If a team comes very close to its "predicted wins," you can say that the team's record accurately reflects the performance of the team. Thus, a team predicted to win 82 games that actually won 82 games is, you could say, a genuinely average team. In any event, it is fair to say that teams with a high predicted win total that finish close to it are actually good, and teams that won a lot of games without the predicted wins to back it up were just lucky. (Of course, at the end of the day, who cares if you're the one holding the World Series trophy over your head, right?)

Baseball statisticians have figured out how to calculate predicted wins, and the formula looks eerily similar to the "pythagorean theorem." The basic idea is that if you were to plot runs scored on one axis and runs allowed on another axis, the closest distance between them (the hypotenuse) is wins (sort of). Anyway, the actual formula derived by Bill James is: runs scored squared divided by (runs scored squared plus runs allowed squared). This will give you the team's predicted winning percentage. Multiply by games played (in baseball = 162) and voila - Predicted Wins.
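The formula described above can be sketched in a few lines. This is just an illustration: the function names and the exponent parameter (defaulting to 2) are my own choices, not anything official.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2):
    """Predicted winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def predicted_wins(runs_scored, runs_allowed, games=162, exponent=2):
    """Predicted winning percentage times games played, rounded to whole wins."""
    return round(pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games)


# Sanity check: a team that scores exactly as many runs as it allows
# should come out as a .500 team, i.e. 81 wins over 162 games.
print(pythagorean_win_pct(700, 700))
print(predicted_wins(700, 700))
```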

In 2005, the Indians scored 790 runs and allowed 642. Running that through the ol' formula gives you: 624100/(624100+412164) = 624100/1036264 = .6023 (predicted win pct) = 98 Wins. They actually won 93. There's about +/-4 of error on the calculation, but that still makes them "unlucky" by about 5 wins. If they had won those 5 games, they still would not have won their division (the White Sox won 99), but they would have made the playoffs. Let's look at the White Sox: 741 runs scored, 645 runs allowed = 549081/(549081+416025) = 549081/965106 = .5689 (predicted win pct) = 92 Wins. So, the White Sox were 7 games over their predicted wins, which means they were lucky. The Indians had 93 wins, and they didn't make the playoffs. So, like I said, at the end of the day, who cares about luck when you're the one with the World Series trophy, right?
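If you'd rather let a script do the arithmetic, here is a rough sketch using the 2005 runs and win totals quoted above. The helper name and the dictionary layout are made up for the example, and predicted wins are rounded to the nearest whole win (.5689 x 162 works out to about 92.2).

```python
def predicted_wins(runs_scored, runs_allowed, games=162):
    """Return (predicted win pct, predicted wins rounded to whole games)."""
    win_pct = runs_scored**2 / (runs_scored**2 + runs_allowed**2)
    return win_pct, round(win_pct * games)


# (runs scored, runs allowed, actual wins) as quoted in the post
teams = {
    "Indians": (790, 642, 93),
    "White Sox": (741, 645, 99),
}

luck = {}
for name, (rs, ra, actual) in teams.items():
    win_pct, predicted = predicted_wins(rs, ra)
    # Positive luck = won more than predicted; negative = won fewer.
    luck[name] = actual - predicted
    print(f"{name}: pct={win_pct:.4f} predicted={predicted} "
          f"actual={actual} luck={luck[name]:+d}")
```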

So, what can we take away from that though? Well, we can say that despite winning the World Series, the White Sox were lucky last year. Which tells Kenny Williams, their GM, that they aren't really as good as their record. And that chances are they won't be so lucky next year - if they don't improve, they may not even make the playoffs. Thus, in the offseason, Mr. Williams made quite a few moves to improve their offense (their runs allowed was very good, but their runs scored was in the lower half of the league). On the other hand, the Indians knew that they were unlucky; thus they made very few moves in the offseason - knowing that if they just got rid of the bad luck, they would make the playoffs.

Anyway. These same ideas apply to both football and basketball. The exponents change a bit (to 2.47 and 16.5, respectively - though basketball-reference suggests 14 as an exponent), but the theory is still the same. I was going to show how it worked with them, but you get the idea, and I'm out of time and space and patience. There has already been some significant work done on Pythagorean Wins for football and basketball (see the above links). Here are some links to a football statistics database and a basketball statistics database that you can play around with.
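To give a feel for what the bigger exponents do, here is a small sketch that runs the same formula with the exponent as a parameter. The exponents are the ones quoted above; the point totals are invented purely for illustration and don't belong to any real team.

```python
def pythagorean_win_pct(points_for, points_against, exponent):
    """Same formula, any exponent: PF^x / (PF^x + PA^x).

    Written as 1 / (1 + (PA/PF)^x), which is algebraically identical
    and avoids raising large point totals to large powers.
    """
    return 1.0 / (1.0 + (points_against / points_for) ** exponent)


# Hypothetical team that outscores its opponents 8000 to 7800 over a season.
points_for, points_against = 8000, 7800

for label, exponent in [("baseball", 2), ("football", 2.47),
                        ("basketball", 16.5), ("basketball (alt)", 14)]:
    pct = pythagorean_win_pct(points_for, points_against, exponent)
    print(f"{label:17s} exponent={exponent:<5} predicted pct={pct:.3f}")
```

The higher the exponent, the more a given scoring margin gets translated into wins, which is roughly why a high-scoring sport like basketball needs a much larger one.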

I ran a file for this year's NBA pythagorean wins. Here's a summary: (predicted/actual)

Luckiest Team - Eastern Conference: (+4) Detroit (60/64) and New Jersey (45/49)
Unluckiest Team - Eastern Conference: (-6) Toronto (33/27) and Indiana (47/41)

Luckiest Team - Western Conference: (+8) Utah (33/41)
Unluckiest Team - Western Conference: (-3) Golden State (37/34), LA Lakers (48/45), and Memphis (52/49)
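The "luck" number in parentheses is just actual wins minus predicted wins. A quick sketch using only the (predicted/actual) pairs listed above (the dictionary is my own layout, not the file I ran):

```python
# team: (predicted wins, actual wins), taken from the summary above
nba = {
    "Detroit": (60, 64),
    "New Jersey": (45, 49),
    "Toronto": (33, 27),
    "Indiana": (47, 41),
    "Utah": (33, 41),
    "Golden State": (37, 34),
    "LA Lakers": (48, 45),
    "Memphis": (52, 49),
}

# Positive = luckier than predicted, negative = unluckier.
nba_luck = {team: actual - predicted for team, (predicted, actual) in nba.items()}

# Print from luckiest to unluckiest.
for team, diff in sorted(nba_luck.items(), key=lambda item: item[1], reverse=True):
    print(f"{team:13s} {diff:+d}")
```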

So what does all this mean? Well, quite frankly, not much at the top of the playoff brackets. There are no teams that would have made or missed the playoffs if they had or had not met their predicted wins. However, some of the seedings would have been a little different.

Current Playoff Matchups - Western Conference:
San Antonio (1)/Sacramento (8)
Phoenix (2)/LA Lakers (7)
Denver (3)/LA Clippers (6)
Dallas (4)/Memphis (5)

Pythagorean Playoffs - Western Conference:
San Antonio (1)/Sacramento (8)
Phoenix (2)/LA Clippers (7)
Denver (3)/LA Lakers (6)
Dallas (4)/Memphis (5)

As you can see - not much difference. Only the Kobe/Nash matchup would not have happened. But you'd have Kobe/Carmelo, which would have been equally interesting.

Current Playoffs - Eastern Conference:
Detroit (1)/Milwaukee (8)
Miami (2)/Chicago (7)
New Jersey (3)/Indiana (6)
Cleveland (4)/Washington (5)

Pythagorean Playoffs - Eastern Conference:
Detroit (1)/Milwaukee (8)
Miami (2)/Chicago (7)
New Jersey (3)/Washington (6)
Cleveland (4)/Indiana (5)

So, again, no interesting deviations other than the regional matchups of Cleveland/Indiana and New Jersey/Washington.
