Friday, April 28, 2006

Baseketball Statistics - Pythagorean Win Percentage

In baseball there are many theories that set out to prove a concept called "statistical luck." Take, for example, a guy like Kevin Millwood, a pitcher for the Cleveland Indians in 2005. Statistically, he had what would be considered a very, very good year.

Here's his line:
9 W
11 L
30 G
1 CG
192.0 IP
182 H
72 R
61 ER
20 HR
52 BB
146 SO
2.86 ERA
143 ERA+ (park adjusted)
1.219 WHIP

OK, a lot of people may not understand these statistics, and I'll briefly run through them, but first I want to point out the obvious: these are awesome statistics. Yet he won only 9 games! That, my friends, is called "unlucky." And it can be shown statistically. We can look back through all of history at pitchers who have had similar years and compare them against each other. What you will find is that when a pitcher has a 2.86 ERA and an ERA+ of 143 (meaning he was roughly 43% better than the average pitcher in the league), that pitcher should win more than 9 of the 30 games he appears in. One important statistic missing from the line above is "run support." What we would see is that Mr. Millwood got terrible run support from his team when he pitched (the runs his team scored in his games were considerably below its average for the year). That is "luck."
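For anyone who wants that run-through, here is a rough Python sketch that recomputes the rate stats from the counting stats in Millwood's line. ERA is earned runs allowed per nine innings, and WHIP is walks plus hits allowed per inning. ERA+ needs league and park data, so it is simply taken as given here; it is scaled so that 100 is league average. The function and constant names are my own, purely for illustration, and the sketch assumes a whole number of innings (it doesn't handle the ".1"/".2" notation for thirds of an inning).

```python
# Recompute Millwood's 2005 rate stats from the counting stats in his line.
# Assumption: innings are a whole number (192.0); thirds of an inning
# written as ".1"/".2" are not handled by this sketch.

INNINGS = 192.0
HITS = 182
RUNS = 72
EARNED_RUNS = 61
WALKS = 52
ERA_PLUS = 143  # taken as given; computing it needs league and park data


def era(earned_runs: int, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


def whip(walks: int, hits: int, innings: float) -> float:
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings


def pct_better_than_average(era_plus: int) -> int:
    """ERA+ is scaled so 100 is league average; 143 is usually read as ~43% better."""
    return era_plus - 100


if __name__ == "__main__":
    print(f"ERA: {era(EARNED_RUNS, INNINGS):.2f}")
    # All runs (earned and unearned) per nine, for comparison with ERA.
    print(f"Runs per 9 innings: {9 * RUNS / INNINGS:.2f}")
    print(f"WHIP: {whip(WALKS, HITS, INNINGS):.4f}")
    print(f"ERA+ {ERA_PLUS}: about {pct_better_than_average(ERA_PLUS)}% better than average")
```

Running it should reproduce the 2.86 ERA and, to rounding, the 1.219 WHIP from the line above. Printing all runs per nine alongside ERA also shows how much of the damage against him was unearned.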

To make it a little simpler, let's "step back" for a minute. The object of any game is to score more runs (or points) than you give up. In this case, Mr. Millwood gave up an average of only 2.86 earned runs for every 9 innings he pitched. Yet that was only good enough to win 9 games. That means that in the majority of his games (all but 9 of the 30, to be exact), his team failed to score more than he gave up while he was in the game. This makes sense at the player level. Anyone who followed Mr. Millwood's season last year would describe it as "unlucky." He pitched well, but just didn't get the win.

If you apply that to the team, it's even easier to prove. You can see how many runs a team scores and how many it gives up. Based on those numbers you can come surprisingly close to predicting the number of games it "should" have won. Any difference between the "predicted wins" and the "actual wins" is "luck." That tells you something. If a team comes very close to its "predicted wins," you can say that its record accurately reflects its performance. Thus a team that was predicted to win 81 games and actually won 81 games is, you could say, an average team. In any event, it is fair to say that teams that come close to a high predicted win total are actually good, and those that win a lot of games despite a much lower predicted total were just lucky. (Of course, at the end of the day, who cares if you're the one holding the World Series trophy over your head, right?)

Baseball statisticians have figured out how to calculate predicted wins, and the formula looks eerily similar to the Pythagorean theorem. The basic idea is that if you were to plot runs scored on one axis and runs allowed on another, the line connecting them (the hypotenuse) is wins (sort of). Anyway, the actual formula, derived by Bill James, is: runs scored squared divided by (runs scored squared plus runs allowed squared). This gives you the team's predicted winning percentage. Multiply by games played (in baseball, 162) and voila: predicted wins.
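Since the formula is just a couple of arithmetic steps, it fits in a few lines of code. A minimal Python sketch, with function names of my own invention; the exponent is left as a parameter defaulting to 2, the value in the formula above.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Bill James' Pythagorean expectation: RS^k / (RS^k + RA^k), with k = 2 in the original."""
    scored_k = runs_scored ** exponent
    allowed_k = runs_allowed ** exponent
    return scored_k / (scored_k + allowed_k)


def predicted_wins(runs_scored: float, runs_allowed: float,
                   games: int = 162, exponent: float = 2.0) -> float:
    """Predicted winning percentage times games played (162 in a baseball season)."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # A team that scores exactly as many runs as it allows projects to .500, i.e. 81-81.
    print(predicted_wins(700, 700))
```

The result is deliberately left unrounded, so you can see how close a team sits to the next whole win before you round it off.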

In 2005, the Indians scored 790 runs and allowed 642. Running that through the ol' formula gives you: 624100/(624100+412164) = 624100/1036264 = .6023 (predicted win pct) = 98 wins. They actually won 93. The formula has a margin of error of about plus or minus 4 wins, but that's "unlucky" by about 5 wins. If they had won those 5 games, they still would not have won their division (the White Sox won 99), but they would have made the playoffs. Let's look at the White Sox: 741 runs scored, 645 runs allowed = 549081/(549081+416025) = 549081/965106 = .5689 (predicted win pct) = 92 wins. So the White Sox were 7 games over their predicted wins, which means they were lucky. The Indians had 93 wins, and they didn't make the playoffs. So, like I said, at the end of the day, who cares about luck when you're the one with the World Series trophy, right?
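To double-check that arithmetic, here is a small Python sketch that runs both 2005 teams through the squared formula, using the runs scored, runs allowed, and actual wins quoted in this paragraph. Rounding predicted wins to the nearest whole game is my choice, and `luck_report` is a hypothetical helper name; "luck" here just means actual wins minus predicted wins.

```python
GAMES = 162

# (runs scored, runs allowed, actual wins) as quoted in the post for 2005
TEAMS_2005 = {
    "Cleveland": (790, 642, 93),
    "Chicago White Sox": (741, 645, 99),
}


def pyth_pct(runs_scored: int, runs_allowed: int) -> float:
    """Squared Pythagorean expectation."""
    return runs_scored ** 2 / (runs_scored ** 2 + runs_allowed ** 2)


def luck_report(teams: dict, games: int = GAMES) -> dict:
    """Map team -> (predicted pct, predicted wins rounded, actual minus predicted)."""
    report = {}
    for team, (rs, ra, actual) in teams.items():
        pct = pyth_pct(rs, ra)
        predicted = round(pct * games)
        report[team] = (pct, predicted, actual - predicted)
    return report


if __name__ == "__main__":
    for team, (pct, predicted, luck) in luck_report(TEAMS_2005).items():
        print(f"{team}: {pct:.4f} -> {predicted} predicted wins, luck {luck:+d}")
```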

So, what can we take away from that? Well, we can say that despite winning the World Series, the White Sox were lucky last year. That tells Kenny Williams, their GM, that they aren't really as good as their record, and that chances are they won't be so lucky next year; if they don't improve, they may not even make the playoffs. Thus, in the offseason, Mr. Williams made quite a few moves to improve their offense (their runs allowed total was very good, but their runs scored ranked in the lower half of the league). The Indians, on the other hand, knew that they had been unlucky, so they made very few moves in the offseason, figuring that if the bad luck simply went away, they would make the playoffs.

Anyway, these same ideas apply to both football and basketball. The exponents change a bit (to 2.47 and 16.5, respectively, though basketball-reference suggests 14 as an exponent), but the theory is the same. I was going to show how it works with them, but you get the idea, and I'm out of time and space and patience. There has already been some significant work done on Pythagorean wins for football and basketball (see the above links). Here are some links to a football statistics database and a basketball statistics database that you can play around with.
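Because only the exponent changes, one function can cover all three sports. A minimal Python sketch, assuming the exponents quoted above and a made-up team that scores 100 and allows 97 per game (hypothetical numbers for illustration only, not real data). It uses an algebraically equivalent form of the formula, 1 / (1 + (allowed/scored)^k), which sidesteps very large intermediate values when k is big.

```python
def pythagorean_win_pct(scored: float, allowed: float, exponent: float) -> float:
    """Same as scored**k / (scored**k + allowed**k), rewritten as 1 / (1 + (allowed/scored)**k).

    The ratio form avoids huge intermediate numbers when k is large (basketball)
    and season point totals run into the thousands.
    """
    return 1.0 / (1.0 + (allowed / scored) ** exponent)


# Exponents as quoted in the post.
EXPONENTS = {
    "baseball": 2.0,
    "football": 2.47,
    "basketball (14)": 14.0,
    "basketball (16.5)": 16.5,
}

if __name__ == "__main__":
    scored, allowed = 100.0, 97.0  # hypothetical: outscores opponents by 3 per game
    for label, k in EXPONENTS.items():
        print(f"{label:>18}: {pythagorean_win_pct(scored, allowed, k):.3f}")
```

The bigger the exponent, the further the same scoring margin pushes the expected winning percentage away from .500, which is why the choice between 14 and 16.5 matters for basketball.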

I ran the numbers for this year's NBA Pythagorean wins. Here's a summary (predicted/actual):

Luckiest Teams - Eastern Conference: (+4) Detroit (60/64) and New Jersey (45/49)
Unluckiest Teams - Eastern Conference: (-6) Toronto (33/27) and Indiana (47/41)

Luckiest Team - Western Conference: (+8) Utah (33/41)
Unluckiest Teams - Western Conference: (-3) Golden State (37/34), LA Lakers (48/45), and Memphis (52/49)
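The summary above is just actual wins minus predicted wins, grouped by conference. Here is a small Python sketch that reproduces it from the predicted/actual pairs listed above. It only knows about the teams named in the post, not the whole league, so it checks the luck arithmetic rather than recomputing the point-differential predictions; `extremes` is a hypothetical helper name.

```python
# (predicted wins, actual wins) for the 2005-06 teams listed in the post
RESULTS = {
    "East": {"Detroit": (60, 64), "New Jersey": (45, 49),
             "Toronto": (33, 27), "Indiana": (47, 41)},
    "West": {"Utah": (33, 41), "Golden State": (37, 34),
             "LA Lakers": (48, 45), "Memphis": (52, 49)},
}


def extremes(teams: dict) -> tuple:
    """Return ((best luck, luckiest teams), (worst luck, unluckiest teams))."""
    lucks = {team: actual - predicted for team, (predicted, actual) in teams.items()}
    best, worst = max(lucks.values()), min(lucks.values())
    luckiest = sorted(t for t, v in lucks.items() if v == best)
    unluckiest = sorted(t for t, v in lucks.items() if v == worst)
    return (best, luckiest), (worst, unluckiest)


if __name__ == "__main__":
    for conference, teams in RESULTS.items():
        (best, lucky), (worst, unlucky) = extremes(teams)
        print(f"{conference}: luckiest {best:+d} {', '.join(lucky)}; "
              f"unluckiest {worst:+d} {', '.join(unlucky)}")
```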

So what does all this mean? Well, quite frankly, not much as far as the playoff brackets go. No team would have made or missed the playoffs if it had hit its predicted wins. However, some of the seedings would have been a little different.

Current Playoff Matchups - Western Conference:
San Antonio (1)/Sacramento (8)
Phoenix (2)/LA Lakers (7)
Denver (3)/LA Clippers (6)
Dallas (4)/Memphis (5)

Pythagorean Playoffs - Western Conference:
San Antonio (1)/Sacramento (8)
Phoenix (2)/LA Clippers (7)
Denver (3)/LA Lakers (6)
Dallas (4)/Memphis (5)

As you can see - not much difference. Only the Kobe/Nash matchup would not have happened. But you'd have Kobe/Carmelo, which would have been equally interesting.

Current Playoff Matchups - Eastern Conference:
Detroit (1)/Milwaukee (8)
Miami (2)/Chicago (7)
New Jersey (3)/Indiana (6)
Cleveland (4)/Washington (5)

Pythagorean Playoffs - Eastern Conference:
Detroit (1)/Milwaukee (8)
Miami (2)/Chicago (7)
New Jersey (3)/Washington (6)
Cleveland (4)/Indiana (5)

So, again, no interesting deviations other than the regional matchups of Cleveland/Indiana and New Jersey/Washington.

Tuesday, April 18, 2006

Baseketball Statistics (part 1)

I'm a huge baseball fan. I'm a ridiculous baseball fan. I admit to being attracted to the 'dark side' of baseball, the side that the likes of Tommy Lasorda refuse to admit exists. The side of baseball that has produced winning teams in Oakland for years, got Boston its first World Series victory since before my grandfather was old enough to remember, and created winning teams in Cleveland all through the '90s. I'm a believer in statistics, in numbers, in knowing that so many games are played, and so many at-bats, hits, runs, and put-outs are made, that every player tends toward an average, and that a player's numbers are an accurate representation of his comparative ability for that particular year. Using statistical methods, you can normalize years against each other and get a relatively decent year-to-year comparison of a player's ability.

Because baseball has been played for so long, and most statistics are based on simple metrics that have been tracked for ages, we can follow trends over time. Indeed, most statistics are based simply on hits, runs, walks, and strikeouts, and those four numbers have been tracked since at least the early 1900s. What we can say, after analyzing a lot of players, is that players who show a propensity to do x also show a propensity to do y. Thus, when Player A does x, we can postulate that there is a very good chance he will do y.

This analysis is very useful in baseball. If you want to build a team around On-Base Percentage ("OBP") and Slugging Percentage ("SLG"), you can find players who have exhibited tendencies indicative of getting on base and getting high-value hits (doubles, triples, home runs). While statistics aren't perfect, teams like Oakland are showing that they work more often than not.

Even without a whole lot of money, if you spend it on computing power and invest in raw data collection, you can assemble teams of relatively young or inexpensive players to fill roles and put together a team capable of winning year after year. In some circles it's called "Moneyball," after the book Michael Lewis wrote about Billy Beane, the GM of the Oakland Athletics.
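For reference, the two stats named above have standard definitions: OBP is (H + BB + HBP) / (AB + BB + HBP + SF), and SLG is total bases divided by at-bats. Here is a short Python sketch of both. `BattingLine` is my own illustrative type, and the example batting line is hypothetical, invented only to exercise the formulas.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int

    @property
    def singles(self) -> int:
        # Every hit that wasn't an extra-base hit.
        return self.hits - self.doubles - self.triples - self.home_runs

    @property
    def obp(self) -> float:
        """On-base percentage: times on base over plate appearances that count toward OBP."""
        times_on = self.hits + self.walks + self.hit_by_pitch
        chances = self.at_bats + self.walks + self.hit_by_pitch + self.sacrifice_flies
        return times_on / chances

    @property
    def slg(self) -> float:
        """Slugging percentage: total bases per at-bat."""
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.home_runs)
        return total_bases / self.at_bats


# Hypothetical season line, for illustration only.
example = BattingLine(at_bats=500, hits=150, doubles=30, triples=2, home_runs=25,
                      walks=70, hit_by_pitch=5, sacrifice_flies=5)

if __name__ == "__main__":
    print(f"OBP {example.obp:.3f}  SLG {example.slg:.3f}")
```

Note that walks help OBP but not SLG, while extra-base hits weigh heavily in SLG; that split is what lets a team shop separately for "gets on base" and "hits for power."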

What I find most interesting are two things: 1) professionals in baseball continue to deride statistics as a "dork's game" and worthless; in their opinion, nothing beats the judgment of a time-worn scout; 2) other sports have not adopted similar approaches, despite the fact that some, like basketball, could probably profit greatly from them.

As to the first, I'll just point to the Los Angeles Dodgers, perhaps the team most singularly attached to the 'old school' way of scouting. They've been terrible over the last few years, spending money on players who 'look good' but have shown no real consistent ability at the major league level. They consistently outspend most teams in the majors, yet teams like Oakland consistently put up better winning percentages. Why? I would argue it's an inept front office led by Tommy Lasorda, the holy ghost of all things Dodger. He single-handedly ran out of town the only GM to make any sense there and instituted his own old-school methodologies. Simply put, the old ways are out of date. When they work, they are great, but I think GMs like Billy Beane, Mark Shapiro, and Kenny Williams (notice they're all AL GMs??) are leading the way in building teams using a combination of statistical analysis and scouting ground-work.

As for the second issue, I think the other sports are missing out. This series of posts will point out some places where those sports might be able to bastardize some of baseball's well-developed analytics for their own use: things like the Pythagorean winning percentage, runs created, scoring efficiency, park ratings, etc. These metrics have a (relatively) long history in baseball and have been refined to the point where they work pretty well. In each post I'll try to take one metric and adapt it for use in some other sport, to show how its theories hold up (or not!) to cross-sports usage.

Enjoy. When I get some time, I'll look first at the pythagorean theory for predicting expected wins and losses.

Wednesday, April 05, 2006

Tom DeLay and Other Partisan Politics Ridiculousness

A New York Times analysis piece looks at Tom DeLay's decision to quit the House of Representatives. I also heard a short discussion of this on NPR this morning on my drive in to work. In any event, it provides a good opportunity to look at the role of partisan politics, since Mr. DeLay is widely credited with, if not inventing, at least feeding and exacerbating partisan politics in Washington and, by extension, throughout the United States. Of course, say what you will, the Times is unabashedly Democrat, and this piece is no exception: it lays into Mr. DeLay and Republicans in general something fierce, all but calling the Republican Party a pack of tyrants who use their control over both houses of Congress to feed their own power. It fails to acknowledge that if the roles were reversed and the Democrats were in power, they would do precisely the same thing.

And therein lies the problem of modern two-party politics, and I think it's an indictment of modern society in general. In politics and law, in entertainment, in sports, indeed in everyday life, there is an overwhelming contrariness. People feel the need to be contrary simply for the sake of disagreeing. Granted, it's fun, but it can be really frustrating when you are trying to get something accomplished.

Those who know me are screaming "HOLY POT AND FUCKING KETTLE BATMAN!" I am the most contrary person you'd ever want to meet. Well, sometimes I'm the most contrary person you'd ever want to meet. I have honestly argued with people over whether the sky was blue. I argued that the concept of "blue" is a linguistic construct and that other people call "blue" different things, and that it's just mere coincidence that it's called "blue" at all and not "speaker" or some other word that someone along the line had to have invented because there HAD to have been a time when "blue" was "invented" - it's a stupid, childish argument that, I think, everyone has had when they were 16 and finally figured out that they knew everything.

The problem is, I think this is the equivalent of just about every argument held anymore in the political arena (I promise to try to keep this post centered on politics). On "Fresh Air" yesterday, Ben Karlin, a producer of "The Daily Show," was asked about a segment that frequently appears on that show, "Great Moments in Punditry as Read by Children," where children assume the roles of the 'talking heads' found on MSNBC, FOX News, CNBC, CNN, etc., to point out just how childish the arguments can be. And I think he's right on. But it isn't limited to pundits; it happens every day in politics.

It seems that politicians will say whatever they think we (the public) want to hear. Or, if not what we want to hear, what they (by "they" I mean "their party") want us to hear on the subject. There is no objective debate of an issue. There is the Republican side and the Democrat side. Either you're "for" abortion, or you're "against" it. Forget for the moment that no one is going around saying they are "pro" abortion! It's a ridiculous position, but those in the Republican Party want you to believe that if you aren't against abortion, you are no better than a murderer. Abortion is just a representative topic; the same is true for Medicare, education, the War in Iraq, and just about every topic you can imagine. You're either for us or you're against us. There is no middle-ground policy. They can afford to take this position because they are the dominant party, and it doesn't matter whether you agree with them or not.

Such contrarian stances result in bad law. They result in the state of South Dakota banning all abortions for the SOLE REASON of testing Roe v. Wade. What's the point!? Yet we have to go through the motions because the Republicans feel some need to assert their power on the issue. Actually, they know that inertia is a powerful force. Half the battle in any contrarian policy is getting enough people to agree with you, so if they can power through an issue, there is a good chance that it will never get reversed. It's inertia, and the result of an A.D.D. society. They know they can put an idea out there, the public will get in a huff over it, and if it survives the huff, the public will move on to the next topic and they can get their legislation through. It's happening with abortion. It's already happened with the War in Iraq and wiretapping. And it's going to happen with Digital Rights Management and other IP issues.

What the Republicans (or indeed the party in power, which just happens to be the Republican Party right now) have figured out is that all you really need is a lot of money. If someone, say a political action committee, has enough money, it can keep an issue on the legislative agenda forever. And if the issue stays on the agenda forever, eventually the public will move on and it will get passed. For instance, look at the "Broadcast Flag" issue. For three years the MPAA, the RIAA, and TV groups have been trying to get the broadcast flag passed. It refuses to go away despite overwhelming resistance from manufacturers and the public. Yet it stays on the plate, and every year some Republican stooge (it's always a Republican, by the way) tries to sneak it into some budget as a rider that, hopefully, no one will notice. And every year someone notices (usually the EFF), and the senators and representatives are flooded with mail and email and telephone calls and counter-propaganda about how unproductive it would be. Eventually it will get passed. Why? Because the powers that want it have more money than the powers that don't, and they can keep it on the agenda forever. And eventually the EFF will be looking the other way.

And that's how politics works. There's no intelligent debate. There's just "the way it's going to be." And that way is whatever way someone wants to pay for it to be. We don't argue the relative merits of a position and select a course of action that is reasonable and move on to another topic. We yell at each other until one party forgets what the other is yelling about. It's like throwing shit at a wall to see what sticks. Except when the shit falls off the wall, it just gets picked up (assuming someone has paid for it to get picked up) and put back in the pile to throw at the wall.

Right now it is the Republican agenda that is getting most of the shit to stick. Why? Because they are in control of both houses of Congress, the executive branch, and the judicial branch. So throwing money at them is a good idea: you are more likely to get your agenda passed. In the 1970s and into the 1980s it was the Democrats. And what happened? Well, let's see: civil rights, advances in gender equality, increased quality of education (especially at the university level among the lower-tier universities), increased access to education, etc. And who was making money? Teachers' unions, old people, manufacturers' unions, and other political action committees that have a vested, monetary interest in ensuring that their constituency gets paid.

Of course, we also had war, and terrible foreign relations, and oil shortages. Uhhh...wait. Maybe that stuff isn't related to politics. But let's look a little deeper, because while we have those things again, we also have some very powerful companies making a shit load of money off of them (unlike the first time around). For example, Halliburton is seeing record profits because of the war, the oil companies are seeing all-time record profits, etc. So, unlike the first time, the economy isn't being hurt by those 'problems'; this time it's being helped!!

Anyway, as usual, this has gotten far afield. But the main thesis is this: because we don't have a negotiation process in politics, and because our political process is so contrarian and majoritarian, the result is a dominant party that forces through paid-for legislation rather than legislation aimed at the best interests of the general public. Am I just going to bitch, or do I have a real solution? Mostly, I'm just going to bitch today. But my suggestion, which will likely be the subject of another post some other day, is a viable third party.