Friday, August 31, 2007

The Disappearance of .400 Hitting

I've been reading Stephen Jay Gould's book Full House, which is all about how many things we usually describe as "trends" are better understood as side effects of changes in statistical variation. One of the two main examples he uses throughout the book is the disappearance of .400 batting averages over the last century. This "trend" does not represent a decline in hitting ability over time, but rather a narrowing of the variation in hitting ability.

As the excellence of play in professional baseball has increased over the years, the difference in ability between the best and worst batters, and between the best and worst fielders, has become smaller. In addition, MLB management has occasionally tweaked the rules to keep the mean batting average essentially constant from season to season. Also, the fact that .000 represents a hard lower limit (you can't have a negative batting average) means that the distribution of averages will skew right. As a consequence of these three factors, the right-hand tail of the batting average distribution curve has been pulled in: whereas a century or so ago the handful of batters out at the end of that curve were hitting above .400, the narrower modern curve puts them well below .400. Here are some graphs showing how the curve has changed shape over the years:
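To make the tail argument concrete, here's a small Python sketch. It's a simplification, not Gould's data: it treats batting averages as normally distributed (ignoring the right skew mentioned above), and the league mean of .260 and the three spreads are made-up illustrative values. The helper name `fraction_above` is my own. Holding the mean fixed and shrinking the standard deviation shows how quickly the share of hitters above .400 collapses.

```python
import math

def fraction_above(threshold, mean, sd):
    """Share of a Normal(mean, sd) population above `threshold`."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

LEAGUE_MEAN = 0.260  # hypothetical mean, held fixed as the rule tweaks aim to do
for sd in (0.050, 0.040, 0.030):  # hypothetical spreads, narrowing over time
    share = fraction_above(0.400, LEAGUE_MEAN, sd)
    print(f"sd = {sd:.3f}: share of hitters above .400 = {share:.1e}")
```

With these made-up numbers, narrowing the spread from .050 to .030 takes the share above .400 from roughly 1 in 400 to well under 1 in 100,000, and nobody's average had to drop for that to happen.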

So I've been thinking about all sorts of trends that may or may not be better understood with this sort of model. One that came to mind this morning was the stock market. It's sort of a general rule that market indexes and averages march ever upward. There are corrections and recessions and stuff, but the overall trend seems to be one of continuous increase. Does this represent a miracle of capitalism, in which all firms tend to gain in value over time, or is it merely a side effect of widening variation among firms?

I didn't look for historical data (I know that the mean value of publicly traded firms has increased, so all I'm really interested in is the shape of the distribution of firms' values at a single point in time), but I was able to find the current values of all the US corporations listed on the NASDAQ. Here's the histogram showing how many firms fall into each value range (Y-axis is number of companies, X-axis is value in millions of dollars as of market close on 8/31):

It's the same kind of right-skewed curve as with batting averages, which makes sense: $0 represents another hard lower limit on firm value, because a company can't have negative worth without disappearing from the market. Gould talked about the idea of a "drunkard's walk": a drunk leaves a bar and staggers back and forth on a sidewalk between the bar wall on one side and a ditch on the other. Even if he staggers completely at random (an even chance of lurching toward the wall or toward the ditch), he is certain to end up in the ditch eventually, because the wall stops him on one side while nothing stops him on the other.
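The drunkard's walk is easy to simulate. This is a minimal sketch under my own assumptions: the sidewalk is a row of discrete positions, the wall is position 0 (a lurch toward it just leaves him pressed against it), the ditch is position `width`, and each lurch is a coin flip. The function name `steps_to_ditch`, the width of 10, and the seed are all made up for illustration.

```python
import random

def steps_to_ditch(width, rng):
    """Count the lurches a drunk takes before landing in the ditch.

    Position 0 is the bar wall: a lurch toward it just leaves him against it.
    Position `width` is the ditch, which ends the walk.
    Every lurch goes either way with equal probability.
    """
    pos, steps = 0, 0
    while pos < width:
        pos = max(0, pos + rng.choice((-1, 1)))  # the wall stops him at 0
        steps += 1
    return steps

rng = random.Random(2007)
trials = [steps_to_ditch(10, rng) for _ in range(1000)]
print("walks simulated:", len(trials))
print("average lurches before the ditch:", sum(trials) / len(trials))
```

Every simulated walk ends in the ditch. For this particular setup the expected number of lurches works out to width * (width + 1), so the printed average should land somewhere around 110.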

In the same way, it could be completely random whether a given company increases or decreases in value each day: some firms will move to the right in the distribution curve and some will move left, but the left limit will never drop below zero, and a very few (Microsoft, say) will push the right limit further out. There need be no general tendency for company values to increase, just variation increasing over time in the only direction it can.
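Here's a toy version of that argument in Python, assuming (purely for illustration) that every firm starts at the same value and each day moves up or down one unit with equal probability, with zero as a hard floor. The function `simulate_firms` and all the parameter values are hypothetical; none of this is fit to the NASDAQ data above.

```python
import random
import statistics

def simulate_firms(n_firms, start_value, n_days, seed=0):
    """Toy model: every firm does a fair +/-1 random walk each day.

    No single step favors up over down; the only asymmetry is that
    a firm's value is floored at zero. Returns the final values.
    """
    rng = random.Random(seed)
    values = [start_value] * n_firms
    for _ in range(n_days):
        # Each firm independently moves up or down one unit, never below 0.
        values = [max(0, v + rng.choice((-1, 1))) for v in values]
    return values

start = 10
final = simulate_firms(n_firms=1000, start_value=start, n_days=1000)
print("start value: ", start)
print("final mean:  ", statistics.mean(final))
print("final stdev: ", statistics.pstdev(final))
print("largest firm:", max(final))
```

Without the floor, the expected mean would stay at the starting value forever. With it, the mean ends well above the start and the spread and the maximum keep growing with the number of days, which is the "variation increasing in the only direction it can" effect in miniature.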

Of course, that doesn't mean that the value of a company is random, just that the shape of the curve could be explained by randomness. To actually demonstrate that it's all random you'd need to track individual firms over time. And in fact I would expect there to be a real tendency to increase in value as a result of the steadily increasing labor pool. But it is interesting that the growing mean value of corporations need not indicate anything more than an increase in variability.
