The Fail of Statistics

January 19, 2013



Many people believe that the natural world is largely mechanical, that everything is measurable and that outcomes are largely predictable if you know the actors.  However, astrophysicists now understand that less than 5% of our universe falls into that category, that the other 95% is dark energy (75%) and dark matter (20%) for which we have little or no observable evidence or measure.  They only know it exists because the equations that calculate the mass of our universe don’t add up without it.  It should be humbling to know that only 5% of our existence is presently knowable, but that doesn’t stop some from speaking authoritatively and conclusively about the statistical value of human endeavors.  Of course, I’m talking about baseball and the Hall of Fame.


The voting that failed to induct any candidates this past ballot was viewed as a huge disappointment because many felt the voting was more about morality than statistics.  Never mind that a couple of players were only able to achieve their impressive statistics by cheating.  To that end, many have proposed that the Hall of Fame should only be about statistics.


There are several reasons why a Hall of Statistics would not be a better solution.  Even though they are viewed as an objective measurement, baseball statistics unfortunately aren't purely objective.  There are plenty of events that impact the game that aren't recorded, plenty more that are attributed incorrectly to one player and still others that are the subjective interpretation of the official scorer.  Granted, at least the accumulation of statistics is an honest attempt at objectivity, but it falls short of being reliable enough to ignore all other data inputs.  Not all valuable human endeavors are measurable. Sometimes we know something is different but can't exactly put our fingers on it. We strive to understand and measure the changes, but sometimes the technology just isn't there.  That doesn't mean that what we're seeing isn't important, though.  For example, among the things that aren’t recorded or are misattributed that significantly affect the outcomes of games are players throwing to the wrong base, overly aggressive/passive baserunning, team errors (which are recorded as hits) and the impact a catcher’s arm and/or a pitcher’s move have on scoring by limiting/extending secondary leads.  The latter can very easily be misattributed to outfielders' arms or infielders' range.  How about the impact a manager has warming up relievers whom he does not bring into the game?  Do they get fatigued from multiple warm-ups inning after inning, and if so, how much does that affect their effectiveness both in the near term and over the course of a season?  There are no statistics for warm-ups, so we really don’t have a clear gauge of how much impact they have on each pitcher’s performance.  Those are just a few; I haven’t even begun to explore all the ways umpire calls affect the game yet are not incorporated into the statistical history.
And don’t get me started about how things “even out” over the course of a season or a lifetime, because any honest appraisal of the evidence suggests otherwise.


But if those things really mattered, they’d show up in the Pythagorean projection, right?  Or be reflected in a team’s record in one-run games, which typically yields about as many wins as losses in a given year.  Well, maybe they do, maybe they don’t.  Last year, eight teams finished at least five games away from their statistical projection.  The year before, there were again eight teams five or more games from where they should have finished using conventional predictive means.  In 2010 there were only 2 teams outside the 5-game range, but in 2009 that number was 14 and there were 9 more in 2008.  This is not just a recent development.  In 32 of the last 43 seasons (since divisional play began in 1969), at least 20% of the league missed its projection by at least 5 games, and from 2003-2005 at least 30% of the league exceeded that margin.  Even in the strike years of 1981 and 1994, when roughly two thirds of the season was played, at least a quarter of the teams missed their projection by at least 4 games.  Given that the vast majority of teams win between 70 and 95 games, a 5-game margin of error is pretty significant.  Expressed as a single player's value using a popular stat, WAR (wins above replacement), five wins is the difference between Mike Trout and Angel Pagan, between David Wright and Alberto Callaspo, between Buster Posey and Ryan Hanigan, between Justin Verlander and Luis Mendoza.  A discrepancy of five wins is a pretty big deal.  And as you can see, this kind of miscalculation happens with a significant portion of the league and with regularity.  The most common culprit for these kinds of anomalies in expectation is a team’s performance in one-run games, but that doesn’t always provide the answer.  For example, last year the Cubs under-performed their projection by 4 games, but were 15-27 in one-run games, 12 games under the break-even mark.  Clearly they were doing something much worse than everyone else.
In 2010, the Astros out-performed their projection by 8 games but were 21-18 in tight games.  In 2009, they were 6 games better than they should have been but 24-23 in close affairs, and were even more “lucky” in 2008 when they beat Pythagoras by 9 games with an even 21-21 record in one-run games.  Luck is not known for striking consistently and/or predictably.  The Astros seemed to be doing something better than everyone else, but whatever it was, statistically it was invisible.    
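
For anyone who hasn't worked through the Pythagorean projection by hand, here is a minimal sketch in Python.  It assumes the classic Bill James form with an exponent of 2 (other exponents are in common use), and the function names and the sample season are hypothetical, made up purely for illustration; they are not any team's real totals.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed.

    Assumes the classic form RS^x / (RS^x + RA^x) with x = 2 by default.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals cannot be negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def projection_gap(wins: int, games: int, runs_scored: float,
                   runs_allowed: float, exponent: float = 2.0) -> float:
    """Actual wins minus projected wins; positive means the team 'beat Pythagoras'."""
    expected_wins = pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games
    return wins - expected_wins


# Hypothetical season, for illustration only (not a real team's numbers).
gap = projection_gap(wins=93, games=162, runs_scored=700, runs_allowed=650)
print(f"Games above projection: {gap:+.1f}")
```

In this made-up season the team projects to roughly 87 wins, so 93 actual wins is about six games better than the formula expects.  That is the kind of gap I'm counting when I say a team missed its projection by five games or more.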


Our grasp of statistics evolves all the time, sometimes significantly so. Imagine a Hall of Fame where the only statistics to measure a player are batting average and wins. That is how primitive the current state of statistical analysis will likely look to us in 20 years. With that evolution, do we then re-evaluate each year the players we've already inducted because our standard of understanding will have improved?  Do we start kicking guys out because players once viewed as statistically great aren’t as good as we thought they were?  Imagine if we did that with other awards like the Grammys or the Nobel Prize.  “Sorry, sir, we realize your music was very popular back in the day, but now we realize it was completely toxic rubbish and we’re taking all your awards back.”  Wouldn’t that make a great reality show?  Or, “We’re sorry, but in retrospect your innovative equations and insightful revelations about the inner workings of the universe turn out to be nothing more than common sense.  We expect you to return your endowment in full.”  No, I don’t think we can go that direction.  The mistakes of human history should not be swept away into a closet, but kept fully public so we can learn from them and not continue to repeat them.  That last part is the important bit, particularly with the Hall of Fame.  There are a number of arguments stating that because one particular player was mistakenly enshrined, we should lower the bar for entry to allow all equivalent players in.  Just stop.  One mistake is enough, thank you.  I’d prefer we not make the same one repeatedly.


Another problem with the Hall of Statistics is that even if we were to agree to such a thing, no one can agree which statistics are the most relevant.  For example, sometimes individual stats camouflage a player's complete value. I'm not saying Jack Morris is necessarily a Hall-of-Famer, but he completed 175 games in the DH era entirely in the American League, which is more than anyone except for Bert Blyleven. He also went at least 8 full innings in 241 starts (most of anyone during that span), almost half the time he took the mound. There is value in being able to rest the bullpen once or twice a week. I don't know how much value statistically that ability has because there aren’t any statistics to measure it, but ask any Manny Acta-managed team how important that is when August and September roll around.  His teams routinely get out of the gate quickly because he likes to use his bullpen liberally, as if each game is the 7th game of the World Series, but they fade badly late in the summer when his bullpen is tired and worn out.  Regardless, the voters won’t put Morris in because of his less than impressive ERA and ERA+.  So despite the fact that he gave his teams innings that helped keep their bullpens fresh (by the way, Morris’ career ERA in the ninth inning of games was 2.78, which was his best mark of any inning by more than half a run), they won’t put him in.  But on the flip side, they also won’t vote for guys like Jose Rijo, who never finished higher than 4th in the Cy Young voting and finished in the top 10 only twice.  Why is that significant?  Because before he injured his arm in his final season, Jose Rijo had the lowest career ERA in the National League since Pete Alexander, at 2.63.  That’s lower than Bob Gibson, lower than Sandy Koufax, lower than Nolan Ryan, Steve Carlton, Warren Spahn, Tom Seaver and Greg Maddux.  His adjusted ERA (ERA+) of 147 during his healthy NL years would have been good enough for 5th best all-time.
And yet he never got close to being honored for his achievement.  Their reasoning for not giving him more Cy Young consideration?  He didn’t give his team enough innings.  


Suggesting that all players can be evaluated competently using only stats is like suggesting that all dishes made with chicken and rice are the same, that the spices don’t matter.  Sure, you get sustenance from eating any one of them, but it doesn’t take a gourmand to know that jambalaya, chicken biryani, chicken tikka masala, Hunan chicken, chicken béarnaise, chicken teriyaki, kabob-e joojeh and paella are significantly different.  There’s still a lot of flavor in baseball we don’t completely understand, and we’ll miss out on how important it is if we only focus on the rice.