It's Quasi-Greek to Me
June 21, 2006
Baseball is a complicated game.  Its basics are simple enough:
throw the ball, hit the ball and catch the ball.  But analysis of the
game is kinda like chasing after a greased pig: every time you think
you have a handle on it, it wriggles away. There's a very popular
formula that is purported to determine how well a team is expected to
perform. It's nicknamed the Pythagorean theorem because it uses
squares of two numbers to determine a third: runs scored squared,
divided by the sum of runs scored squared and runs allowed
squared.  The result should give one a reasonably accurate
expected winning percentage.  The reasoning behind it is logical
enough: teams that score more runs than they allow should win more
games than they lose, and the greater the difference between those two
totals, the further the winning percentage should sit from .500.
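For anyone who wants to try the formula, here is a minimal Python sketch of it as described above.  The function names and the 800/700 run totals are made up for illustration, and the 162-game default simply reflects the length of a regular season.

```python
def pythagorean_expectation(runs_scored, runs_allowed):
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs_squared = runs_scored ** 2
    ra_squared = runs_allowed ** 2
    if rs_squared + ra_squared == 0:
        raise ValueError("need at least one run scored or allowed")
    return rs_squared / (rs_squared + ra_squared)


def expected_wins(runs_scored, runs_allowed, games=162):
    """Scale the expected winning percentage to a number of games."""
    return games * pythagorean_expectation(runs_scored, runs_allowed)


# Hypothetical team (not from any real season): 800 scored, 700 allowed.
pct = pythagorean_expectation(800, 700)
print(f"expected winning percentage: {pct:.3f}")                  # 0.566
print(f"expected wins over 162 games: {expected_wins(800, 700):.0f}")  # 92
```

A team that scores exactly as many runs as it allows comes out at .500, which is the sanity check the reasoning above suggests.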
But how accurate and/or reliable is it?
There appear to be several problems with it as an indicator of how good
a team really is.  The first is that it's based on the assumption
that a team's potential for scoring and preventing runs remains the
same over the course of a six-month season.  Unfortunately, teams
always use more than the 25 players they begin the season with, so as
soon as the roster changes due to an injury, a trade or a demotion, or a
manager gets fired (because the new one will usually change the way
the team approaches the game, otherwise the previous one wouldn't have
been fired), the projection is invalid.  Teams are always
making significant roster changes, which means that their ability to
score and prevent runs is also constantly changing.  This is also
why this theorem is such a terrible predictor of which team will win
the World Series once the playoffs begin; the projection is based on
the results of what 45-55 guys did over the course of a 162-game season
and not limited to just what the 25 guys who are actually playing in
the playoffs did.
Another flaw is that teams with good bullpens routinely exceed the
expectation and teams with bad ones routinely fall short, and the
margin of error can be as great as 12 games either way.  Given
that the vast majority of teams finish between 70 and 90 wins in any
given year anyway, the formula doesn't appear to be particularly useful.
Just as an example, the White Sox have missed their Pythagorean
projection by at least 5 wins four times in the last ten years and
missed another by four wins.  Last year they beat it by 8 wins.  The
Yankees have missed theirs by 5, 12, 5, 4 and 6 wins over the last
five years.
Another problem with the formula has come up in the SOMBOE
Strat-O-Matic league.  I took a tally of my box scores through the
first 45 games of this simulated season and it turns out that I've been
on the wrong end of six blowouts in which my team lost by 10 or more
runs.  In those games, they were outscored by a combined 102-21.
However, in all the other games my team has outscored its opponents
220-163.  So looking at the overall numbers (241 runs scored, 265
allowed), my squad should have a losing record according to the
theorem.  Remove the blowouts and my second-best won-loss record in the
league is fully supported by the run differential.
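To put rough numbers on that, here is a small Python sketch that runs the run totals quoted above through the formula, once for all 45 games and once with the six blowouts removed.  The variable names are mine, and the only inputs are the totals already given in the text.

```python
def pythagorean_expectation(runs_scored, runs_allowed):
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs_squared = runs_scored ** 2
    ra_squared = runs_allowed ** 2
    return rs_squared / (rs_squared + ra_squared)


# Run totals quoted in the text.
blowout_scored, blowout_allowed = 21, 102   # six losses by 10 or more runs
other_scored, other_allowed = 220, 163      # the other 39 of the first 45 games

overall = pythagorean_expectation(blowout_scored + other_scored,
                                  blowout_allowed + other_allowed)
without_blowouts = pythagorean_expectation(other_scored, other_allowed)

print(f"all 45 games:       {overall:.3f}")           # 0.453
print(f"blowouts removed:   {without_blowouts:.3f}")  # 0.646
```

Six games out of 45 are enough to drag the expected winning percentage from about .646 down to about .453, which is the distortion described above.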
Which raises the question: would it be possible for a real major league
team to field three ace pitchers at the top of the rotation and have an
ace closer and set-up man (as my SOMBOE team does), but have lousy
pitchers on the rest of the staff? Economically, it seems very
possible. The effect would be that in three out of every five
games the team would hold the opposition to a very low score, but in
the other two they would surrender quite a few runs.  In addition, any
game where one of the aces didn't go deep would also result
in a very high score for the opposition.  The likely overall difference
between runs scored and runs allowed wouldn't be that great, but it
would not be at all surprising to see this team win 88-90 games if the
offense were half decent, because of the starting troika's ability to
shut down opposing offenses in 60% of their games plus an ability to
win close games with two studs pitching the 8th and 9th innings.  And in
fact the Minnesota Twins used this very approach to win 90 games in
2003 and 92 in 2004, outperforming their projection by five wins each
season.  And one might even argue that the Yankees have been successful
in recent years using a similar model with Rivera and Gordon at the
end, although it's debatable whether they've had more than one ace in
the rotation.
So what to make of the baseball version of the Pythagorean
theorem? Well, it's certainly better than randomly generating
numbers between 70 and 90 and better than a lot of the "educated"
guesses one gets from sportswriters.  Just as a point of interest,
it's not certain that Pythagoras himself was the actual author of the
original theorem (A squared + B squared = C squared, where A and B are
the lengths of the two shorter sides of a right triangle and C is the
length of the longest).  In ancient Greece it was fairly common to
credit teachers with the discoveries of their students.  And it's also
thought that an Indian mathematician named Baudhayana discovered the
relationship 300 years before the famous Greek did.  Given the flaws of
baseball's version and the dubious
authorship of the original, maybe neither formula is all it's been made
out to be.