Tuesday, April 18, 2006

Baseketball Statistics (part 1)

I'm a huge baseball fan. I'm a ridiculous baseball fan. I admit to being attracted to the 'dark side' of baseball: the side that the likes of Tommy Lasorda refuse to admit exists. The side of baseball that has produced winning teams in Oakland for years, got Boston its first World Series victory since my grandfather was too young to remember it, and created winning teams in Cleveland all through the 90s.

I'm a believer in statistics, in numbers, in knowing that so many games are played, with so many at-bats, hits, runs, and put-outs, that every player tends toward an average. A player's numbers are an accurate representation of their comparative ability for that particular year. Using statistical methods you can normalize years against each other and get a reasonably decent year-to-year comparison of a player's ability. Because baseball has been played for so long, and most statistics are based on simple metrics that have been tracked for ages, we can follow trends over time. Indeed, most statistics are based simply on Hits, Runs, Walks and Strikeouts, and those four numbers have been tracked since at least the early 1900s.

What we can say, after analyzing a lot of players, is that players who show a propensity to do x also show a propensity to do y. Thus, when Player A does x, we can postulate that there is a very good chance he will do y. This analysis is very useful in baseball. If you want to build a team around On-Base Percentage ("OBP") and Slugging Percentage ("SLG"), you can find players who have exhibited tendencies that are indicative of getting on base and getting high-profit hits (doubles, triples, home runs). While statistics aren't perfect, teams like Oakland are showing that they work more often than not.

Even without a whole lot of money, if you spend what you have on computing power and invest in raw data collection, you can assemble relatively young or inexpensive players to fill roles and put together a team capable of winning year after year. In some circles it's called "Moneyball", after the book Michael Lewis wrote about Billy Beane, the GM of the Oakland Athletics.
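
For anyone who hasn't met OBP and SLG before, here is a minimal sketch of how the two are usually computed. The formulas are the standard ones; the function names and the stat line at the bottom are made up purely for illustration and don't belong to any real player.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP: times on base divided by the plate appearances that count toward it."""
    chances = at_bats + walks + hit_by_pitch + sac_flies
    if chances == 0:
        return 0.0
    return (hits + walks + hit_by_pitch) / chances


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG: total bases per at-bat, so extra-base hits count for more."""
    if at_bats == 0:
        return 0.0
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season line (invented numbers, not a real player).
singles, doubles, triples, home_runs = 100, 30, 5, 15
hits = singles + doubles + triples + home_runs  # 150 hits in total

obp = on_base_percentage(hits, walks=60, hit_by_pitch=5, at_bats=500, sac_flies=5)
slg = slugging_percentage(singles, doubles, triples, home_runs, at_bats=500)

print(f"OBP: {obp:.3f}")
print(f"SLG: {slg:.3f}")
```

Notice that a walk raises OBP but does nothing for SLG, while a home run is worth four times a single in SLG. That is why the two numbers together say more about a hitter than batting average does alone.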

What I find most interesting are two things: 1) professionals in baseball continue to deride statistics as a "dork's game" and worthless, since in their opinion nothing beats the judgment of a time-worn scout; 2) other sports have not adopted similar approaches, despite the fact that some, like basketball, could probably profit greatly from them.

As to the first, I'll just point to the Los Angeles Dodgers, perhaps the team most firmly attached to the 'old school' way of scouting. They've been terrible over the last few years, spending money on players who 'look good' but have shown no real, consistent ability at the major league level. They consistently outspend most teams in the majors, yet teams like Oakland consistently put up better winning percentages. Why? I would argue an inept front office led by Tommy Lasorda, the holy ghost of all things Dodger. He single-handedly ran out of town the only GM to make any sense there and instituted his own old school methodologies. Simply put, the old ways are out of date. When they work, they are great, but I think GMs like Billy Beane and Mark Shapiro and Kenny Williams (notice they're all AL GMs??) are leading the way in developing teams using a combination of statistical analysis and scouting groundwork.

As for the second issue, I think the other sports are missing out. The point of this series of posts is to show where those sports might be able to bastardize some of the well-developed baseball analytics for their own use: things like the Pythagorean winning percentage, runs created, scoring efficiency, park ratings, etc. These metrics have a (relatively) long history in baseball and have been refined to the point where they work pretty well. In each post I'll try to take one metric and adapt it for use in some other sport to show how its theory holds up (or not!) across sports.
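
As a taste of what these metrics look like, here is a rough sketch of the Pythagorean winning percentage in its classic form: runs scored squared, divided by the sum of runs scored squared and runs allowed squared. The function names, the sample run totals, and the adjustable `exponent` parameter are my own assumptions for illustration. The exponent of 2 is the traditional baseball value, and another sport would almost certainly need a different one.

```python
def pythagorean_win_pct(scored, allowed, exponent=2.0):
    """Expected winning percentage from points (runs) scored and allowed."""
    if scored == 0 and allowed == 0:
        return 0.5  # assumption: with no scoring data at all, call it a coin flip
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)


def expected_wins(scored, allowed, games=162, exponent=2.0):
    """Scale the expected winning percentage up to a full schedule."""
    return games * pythagorean_win_pct(scored, allowed, exponent)


# Hypothetical team: 800 runs scored, 700 allowed (invented totals).
pct = pythagorean_win_pct(800, 700)
wins = expected_wins(800, 700)

print(f"Expected winning percentage: {pct:.3f}")
print(f"Expected wins over 162 games: {wins:.1f}")
```

Leaving the exponent as a parameter is deliberate: it is the one knob you would expect to re-tune when carrying the formula over to a sport with very different scoring levels.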

Enjoy. When I get some time, I'll look first at the Pythagorean theory for predicting expected wins and losses.
