The Fail of Statistics
January 19, 2013
Many people believe that
the natural world is largely mechanical, that everything is measurable and that outcomes are predictable if you know the actors. However, astrophysicists now understand that
less than 5% of our universe falls into that category, that the other 95% is
dark energy (75%) and dark matter (20%) for which we have little or no
observable evidence or measure. We only know it exists because the equations that calculate the mass of our universe don’t add up without it. It
should be humbling to know that only 5% of our existence is presently knowable
but that doesn’t stop some from speaking authoritatively and conclusively about
the statistical value of human endeavors.
Of course, I’m talking about baseball and the Hall of Fame.
The voting that failed to induct any candidates this past ballot
was viewed as a huge disappointment because many felt the voting was more about
morality than statistics. Never mind that a
couple of players were only able to achieve their impressive statistics by
cheating. To that end, many have
proposed that the Hall of Fame should only be about statistics.
There are several reasons why a Hall of Statistics would not be
a better solution. Even though baseball statistics are viewed as objective measurements, they aren't purely objective.
There are plenty of events that impact the game that aren't recorded,
plenty more that are attributed incorrectly to one player and still others that
are the subjective interpretation of the official scorer. Granted, at least the accumulation of
statistics is an honest attempt at objectivity but it falls short of being
reliable enough to ignore all other data inputs. Not all valuable human endeavors are measurable.
Sometimes we know something is different but can't quite put our finger on
it. We strive to understand and measure the changes, but sometimes the
technology just isn't there. That
doesn't mean that what we're seeing isn't important, though. For example, among the things that aren’t
recorded or are mis-attributed that significantly affect the outcomes of games are players throwing to the wrong base, overly
aggressive/passive baserunning, team errors (which
are recorded as hits) and the impact a catcher’s arm and/or a pitcher’s move
have on scoring by limiting/extending secondary leads. The latter can very easily be mis-attributed to outfielders’ arms or infielders’ range. How about the impact a manager
has warming up relievers whom he does not bring into the game? Do they get fatigued from multiple warm-ups
inning after inning, and if so, how much does that affect their effectiveness
both in the near term and over the course of a season? There are no statistics for warm-ups so we
really don’t have a clear gauge on how much impact they have on each pitcher’s
performance. Those are just a few; I
haven’t even begun to explore all the ways umpire calls affect the game yet are
not incorporated into the statistical history.
And don’t get me started about how things “even out” over the course of
a season or a lifetime, because any honest appraisal of the evidence suggests
otherwise.
But if those things really
mattered, they’d show up in the Pythagorean projection, right? Or be reflected in a team’s record in one-run games, which typically yields about as many wins as
losses in a given year. Well, maybe they
do, maybe they don’t. Last year, 8 teams were 5 games outside their statistical projection. The year before, another 8 teams finished more than 5 games from where conventional predictive means said they should have. In 2010 there were only 2 teams outside the 5-game range, but in 2009 that number was 14 and there were 9 more in 2008. This is not just a recent development. In 32 of the last 43 seasons - since divisional play began in 1969 - at least 20% of the league missed its
projection by at least 5 games, and from 2003-2005 at least 30% of the league
exceeded that margin. Even in the strike
years of 1981 and 1994, when roughly two-thirds of the season was played, at
least a quarter of the teams missed their projection by at least 4 games. Given that the vast majority of teams win
between 70 and 95 games, a 5-game margin of error is pretty significant. Framed in terms of a popular single-player stat, WAR (wins above replacement), five wins is the difference between Mike
Trout and Angel Pagan, between David Wright and Alberto Callaspo,
between Buster Posey and Ryan Hanigan, between Justin
Verlander and Luis Mendoza. A discrepancy of five wins is a pretty big
deal. And as you can see, this kind of
miscalculation happens with a significant portion of the league and with
regularity. The most common culprit for
these kinds of anomalies in expectation is a team’s performance in one-run
games, but that doesn’t always provide the answer. For example, last year the Cubs
under-performed their projection by 4 games, but were 15-27 in one-run games,
12 games under the break-even mark.
Clearly they were doing something much worse than everyone else. In 2010, the Astros out-performed their
projection by 8 games but were 21-18 in tight games. In 2009, they were 6 games better than they
should have been but 24-23 in close affairs, and were even more “lucky” in 2008
when they beat Pythagoras by 9 games with an even 21-21 record in one-run
games. Luck is not known for striking
consistently and/or predictably. The
Astros seemed to be doing something better than everyone else, but whatever it
was, statistically it was invisible.
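For readers who want to check these misses themselves, here is a minimal sketch of the Bill James Pythagorean projection discussed above. The run totals in the example are illustrative, not actual season data for any team.

```python
# A minimal sketch of the Bill James Pythagorean projection:
# expected winning percentage ~= RS^2 / (RS^2 + RA^2).
# The team totals below are illustrative, not actual season data.

def pythagorean_wins(runs_scored: int, runs_allowed: int,
                     games: int = 162, exponent: float = 2) -> float:
    """Project wins from runs scored and allowed (classic exponent of 2)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return games * rs / (rs + ra)

# A team scoring 750 runs while allowing 700 projects to about 87 wins;
# actually finishing at 81 would be the kind of 5-plus-game miss described above.
projected = pythagorean_wins(750, 700)
print(round(projected), 81 - round(projected))  # prints: 87 -6
```

Later refinements of the formula replace the fixed exponent of 2 with values closer to 1.83, or with exponents that vary by run environment, but the basic shape of the projection is the same.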
Our grasp of statistics
evolves all the time, sometimes significantly so. Imagine a Hall of Fame where
the only statistics to measure a player are batting average and wins. That is likely how primitive the current state of statistical analysis will look to us in 20 years. With that evolution, do we then re-evaluate each year the players
we've already inducted because our standard of understanding will have
improved? Do we start kicking guys out
because what were once viewed as great players statistically aren’t as good as
we thought they were? Imagine if we did
that with other awards like the Grammys or the Nobel Prize. “Sorry, sir, we realize your music was very
popular back in the day, but now we realize it was completely toxic rubbish and
we’re taking all your awards back.”
Wouldn’t that make a great reality show?
Or, “We’re sorry but in retrospect your innovative equations and
insightful revelations about the inner workings of the universe turn out to be
nothing more than common sense. We
expect you to fully return your endowment.”
No, I don’t think we can go that direction. The mistakes of human history should not be
swept away into a closet, but held completely public so we can learn from them
and not continue to repeat them. That
last part is the important bit, particularly with the Hall of Fame. There are a number of arguments stating that
because one particular player was mistakenly enshrined, we should lower the bar
for entry to allow all equivalent players in.
Just stop. One mistake is enough,
thank you. I’d prefer we not make the
same one repeatedly.
Another problem with the
Hall of Statistics is that even if we were to agree to such a thing, no one can
agree which statistics are the most relevant.
For example, sometimes individual stats camouflage a player's complete
value. I'm not saying Jack Morris is necessarily a Hall-of-Famer, but he
completed 175 games in the DH era entirely in the American League, which is
more than anyone except for Bert Blyleven. He also went at least 8 full innings
in 241 starts (the most of anyone during that span), almost half the time he took the
mound. There is value in being able to rest the bullpen once or twice a week. I
don't know how much value statistically that ability has because there aren’t
any statistics to measure it, but ask any Manny Acta-managed
team how important that is when August and September roll around. His teams routinely get out of the gate
quickly because he likes to use his bullpen liberally as if each game is the 7th
game of the World Series, but they fade badly late in the summer when his
bullpen is tired and worn out.
Regardless, the voters won’t put Morris in because of his less than
impressive ERA and ERA+. So despite the
fact that he gave his teams innings that kept their bullpens fresh - by
the way, Morris’ career ERA in the ninth inning of games was 2.78, which was
his best mark of any inning by more than half a run - they won’t put him
in. But on the flip side, they also
won’t vote for guys like Jose Rijo, who never
finished higher than 4th in the Cy Young voting and placed in the top 10 only twice. Why is that
significant? Because before he injured
his arm in his final season, Jose Rijo had the lowest
career ERA in the National League since Pete Alexander, at 2.63. That’s lower than Bob Gibson, lower than
Sandy Koufax, lower than Nolan Ryan, Steve Carlton, Warren Spahn,
Tom Seaver and Greg Maddux. His adjusted ERA (ERA+) of 147 during his
healthy NL years would have been good enough for 5th best
all-time. And yet he never got close to
being honored for his achievement. The voters’
reasoning for not giving him more Cy Young
consideration? He didn’t give his team
enough innings.
Suggesting that all
players can be evaluated competently using only stats is like suggesting that
all dishes made with chicken and rice are the same, that
the spices don’t matter. Sure, you get
sustenance from eating any one of them but it doesn’t take a gourmand to know
that jambalaya, chicken biryani, chicken tikka masala,