It's Quasi-Greek to Me

June 21, 2006

Baseball is a complicated game. Its basics are simple enough: throw the ball, hit the ball and catch the ball. But analysis of the game is kinda like chasing after a greased pig: every time you think you have a handle on it, it wriggles away. There's a very popular formula that purports to determine how well a team is expected to perform. It's nicknamed the Pythagorean theorem because it uses the squares of two numbers to determine a third: runs scored squared, divided by the sum of runs scored squared and runs allowed squared. The result should give one a reasonably accurate expected winning percentage. The reasoning behind it is logical enough - teams that score more runs than they allow should win more games than they lose, and the greater the difference between those two totals, the further the expected winning percentage moves from .500.
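For anyone who wants to see the arithmetic, here is a minimal sketch of the formula as described above. The function name and the 800/700 run totals are made up for illustration and don't belong to any real team.

```python
def pythagorean_pct(runs_scored, runs_allowed):
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs_sq = runs_scored ** 2
    ra_sq = runs_allowed ** 2
    return rs_sq / (rs_sq + ra_sq)


# Hypothetical team: 800 runs scored and 700 allowed over a 162-game season.
pct = pythagorean_pct(800, 700)
expected_wins = round(pct * 162)
print(f"{pct:.3f} winning percentage, about {expected_wins} wins")
# prints "0.566 winning percentage, about 92 wins"
```

A team that scores exactly as many runs as it allows comes out at .500, which is the sanity check you'd hope for.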

But how accurate and/or reliable is it?

There appear to be several problems with it as an indicator of how good a team really is. The first is that it's based on the assumption that a team's potential for scoring and preventing runs remains the same over the course of a six-month season. Unfortunately, teams always use more than the 25 players they begin the season with, so as soon as the roster changes due to an injury, a trade, a demotion or a manager getting fired (because the new one will usually change the way the team approaches the game, otherwise the previous one wouldn't have been fired), the projection is invalid. Teams are always making significant roster changes, which means that their ability to score and prevent runs is also constantly changing. This is also why this theorem is such a terrible predictor of which team will win the World Series once the playoffs begin: the projection is based on what 45-55 guys did over the course of a 162-game season, not just on what the 25 guys who are actually playing in the playoffs did.

Another flaw is that teams with good bullpens routinely exceed the expectation and teams with bad ones routinely fall short, and the margin of error can be as great as 12 games either way. Given that the vast majority of teams finish with between 70 and 90 wins in any given year anyway, it doesn't appear to be particularly useful. Just as an example, the White Sox have missed their Pythagorean projection by at least 5 wins four times in the last ten years and missed another by four wins. Last year they beat it by 8 wins. The Yankees have missed theirs by 5, 12, 5, 4 and 6 wins over the last five years.

Another problem with the formula has come up in the SOMBOE Strat-o-matic league. I took a tally of my box scores through the first 45 games of this simulated season and it turns out that I've been on the wrong end of six blowouts in which my team lost by 10 or more runs. In those games, my team was outscored by a combined 102-21. However, in all the other games my team has outscored its opponents 220-163. Add it all up and my team has been outscored 265-241 overall, so my squad should have a losing record according to the theorem. Remove the blowouts and my second-best won-loss record in the league is fully supported by the run differential.
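Running the tallies above through the formula shows how much six games can skew it. This is just a sketch: the variable names are mine, and the only inputs are the run totals quoted in the previous paragraph.

```python
def pythagorean_pct(runs_scored, runs_allowed):
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    return runs_scored ** 2 / (runs_scored ** 2 + runs_allowed ** 2)


# Run totals from the first 45 games of the simulated season.
blowout_for, blowout_against = 21, 102    # the six blowout losses
other_for, other_against = 220, 163       # the remaining 39 games

overall = pythagorean_pct(blowout_for + other_for,
                          blowout_against + other_against)
without_blowouts = pythagorean_pct(other_for, other_against)

print(f"All 45 games:      {overall:.3f}")
print(f"Excluding blowouts: {without_blowouts:.3f}")
```

With all 45 games included the expected winning percentage comes out at roughly .453; leave out the six blowouts and it jumps to about .646 for the other 39.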

Which raises the question: would it be possible for a real major league team to field three ace pitchers at the top of the rotation and have an ace closer and set-up man (as my SOMBOE team does), but have lousy pitchers on the rest of the staff? Economically, it seems very possible. The effect would be that in three out of every five games the team would hold the opposition to a very low score, but in the other two they would surrender quite a few runs. In addition, any game where one of the aces didn't go deep would also result in a very high score. The likely overall difference between runs scored and runs allowed wouldn't be that great, but it would not be at all surprising to see this team win 88-90 games if the offense were half decent, because of their starting troika's ability to shut down opposing offenses in 60% of their games plus an ability to win close games with two studs pitching the 8th and 9th innings. And in fact the Minnesota Twins used this very formula to win 90 games in 2003 and 92 in 2004, outperforming their projection by five wins each season. And one might even argue that the Yankees have been successful in recent years using a similar model with Rivera and Gordon at the end, although it's debatable whether they've had more than one ace in the rotation.

So what to make of the baseball version of the Pythagorean theorem? Well, it's certainly better than randomly generating numbers between 70 and 90 and better than a lot of the "educated" guesses one gets from sportswriters. Just as a point of interest, it's not certain that the original Pythagoras was the actual author of the original theorem (A squared + B squared = C squared, where A, B and C are the lengths of the sides of a right triangle). In ancient Greece it was fairly common to credit teachers with the discoveries of their students. And it's also thought that an Indian mathematician named Baudhayana discovered the relationship 300 years before the famous Greek did. Given the flaws of baseball's version and the dubious authorship of the original, maybe neither formula is all it's been made out to be.
