Sunday, June 2, 2013

Income by Subway Station

In April, the New Yorker ran a piece about income inequality along each of the subway lines, in which they made an interactive graphic portraying the median household income (from census data) at each of the subway stops on the line selected. For example, this is their graph for the F line.

It is an interesting exercise and produces some potentially informative graphics.  However, in determining household income, the New Yorker used a... I'll be polite: "unusual" technique.  They simply used the income value for the census tract in which their coordinates for the subway station lay.  This results in a number of problems.

Census tracts must contain between 1,200 and 8,000 people, and most in New York seem to fall in the 2,000 to 5,000 range.  They vary in size as population density changes, but in New York are generally 1/16 to 1/4 square mile -- on the order of 8 blocks.  Many of the subway stations, like Columbus Circle or Carroll St, have entrances 3 blocks apart from each other and in two different tracts.  This means that the New Yorker's analysis would produce different income values based simply on which stairway they chose to mark the station.  The amazing thing is that they celebrate these statistical artifacts:

"$142,265 -- The largest gap in median household income between two consecutive subway stations on the same line (between Fulton Street and Chambers Street on the A and the C lines, in Lower Manhattan)."

As a first correction, we should at least average all of the census tracts that actually have subway stairs in them.  But what about other nearby ones?  How far out should we go?  Should neighboring tracts count in our average the same amount as slightly more distant ones?

In my analysis (you can argue with me if you want), I created a linearly decaying income weighting function, out to 1/2 mile (2.5 avenues or 10 blocks in Manhattan).  What this means is that tracts with a center on top of a given station get full weight, those 1/4 mile away get half weight, and those 1/2 mile or more away have no influence.  It is important to note that the weights are relative: a tract's share of a station's average depends on its weight compared with the weights of all the other tracts for that station.  So for example, if a station is surrounded by 6 tracts, all with centers 1/4 mile away, all 6 would count equally towards the station's average income.  If there are two at 1/8 mile and two at 3/8 mile, the closer ones (weight 3/4) will influence the average three times as much as the farther ones (weight 1/4).
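For anyone who wants to play with the weighting, here is a minimal sketch of it in Python. It is an illustration rather than the code behind the map: the function names, the (distance, income) input format, and the choice to return None when no tract center is in range are all assumptions made for the example.

```python
def linear_weight(distance_miles, cutoff_miles=0.5):
    """Weight 1.0 on top of the station, falling linearly to 0.0 at the cutoff."""
    return max(0.0, 1.0 - distance_miles / cutoff_miles)


def station_income(tracts, cutoff_miles=0.5):
    """Weighted average income for one station.

    tracts: list of (distance_miles, income) pairs, one per census tract,
    where the distance is measured from the station to the tract center.
    (This input format is hypothetical, chosen for the example.)
    """
    weighted = [(linear_weight(d, cutoff_miles), income) for d, income in tracts]
    total_weight = sum(w for w, _ in weighted)
    if total_weight == 0:
        return None  # no tract center within the cutoff distance
    # Dividing by the total makes the weights relative to each other.
    return sum(w * income for w, income in weighted) / total_weight


# Made-up numbers: one tract on top of the station, one 1/4 mile away,
# and one beyond the cutoff, which drops out entirely.
print(station_income([(0.0, 90000), (0.25, 60000), (0.75, 20000)]))
```

Because the result is divided by the total weight, only the ratios between weights matter, which is why six tracts all at 1/4 mile count equally.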

Thus, I get a map with the following median incomes. I have not created a line-by-line graphic like the New Yorker, but the data's all there if someone wants to be clever (see borough names below).

You can also explore a full screen version of the map.

The first thing you'll notice is that the income along the lines is much smoother in this analysis. The Fulton-Chambers difference is now $50,000, not $142,000.  The really big differences that remain are for stops actually separated by large distances.  The 4 largest (I believe) are the 4/5's 86th-125th St difference of over $100,000; the 2/3's Chambers-14th of $65,000; the A/D's Columbus-125th of $65,000; and the F's York-East Broadway of almost $60,000 across the East River. The greatest change for stops that are actually near each other is on the Upper East Side, where the income drops from $158k to $133k to $91k to $43k to $29k at 77th, 86th, 96th, 103rd, and 116th Streets.  And poor Sutter Avenue on the L remains $12,000 less than any of its neighbors or anywhere else in Brooklyn.

Since this technique involved computing the distance from every station to every census tract (using data from the American FactFinder), I broke the analysis down by borough to avoid creating a truly gigantic matrix.  For Manhattan and the Bronx, this is fine because no one walks across the East or Harlem Rivers to catch a train.  In Brooklyn and Queens, there may be some loss of accuracy along the border: for example, no Brooklyn tracts are counted for the Seneca Ave M station.  But there are few stations where this could really have an impact, and the data do not seem unusual.
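The borough-by-borough distance step could look something like the sketch below. It assumes each station and tract center comes as a (name, borough, latitude, longitude) tuple and uses the haversine great-circle formula for distance. Both are choices made for this example, not necessarily what the original analysis used, and the coordinates are invented.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distances_by_borough(stations, tracts):
    """Return {borough: {station: {tract: miles}}}.

    Only station/tract pairs in the same borough are compared, so the work
    is split into several small tables instead of one citywide matrix.
    """
    result = {}
    for s_name, s_boro, s_lat, s_lon in stations:
        row = {
            t_name: haversine_miles(s_lat, s_lon, t_lat, t_lon)
            for t_name, t_boro, t_lat, t_lon in tracts
            if t_boro == s_boro
        }
        result.setdefault(s_boro, {})[s_name] = row
    return result


# Hypothetical inputs, for illustration only.
stations = [("Station A", "Manhattan", 40.750, -73.990)]
tracts = [
    ("Tract 1", "Manhattan", 40.752, -73.990),
    ("Tract 2", "Brooklyn", 40.680, -73.980),  # skipped: different borough
]
print(distances_by_borough(stations, tracts))
```

The same-borough filter is exactly what produces the border effect mentioned above: a Queens station never sees a Brooklyn tract, however close it is.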

I will not claim that this is the best way to analyze the data.  Maybe I should have used a larger or smaller decay distance than 1/2 mile.  Maybe I should have used a different decay function than linear.  Maybe I should have used no decay function at all and simply given every tract with a center within 1/2 mile of the station full weight.  Maybe I should even have divided the map up into Voronoi polygons with one station in each and assigned each census tract to exactly one subway stop.  But at any rate, this analysis produces more realistic and informative results than the technique used by the New Yorker.
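To make those alternatives concrete, here is a rough sketch of what they might look like. The exponential decay and its 1/4 mile scale are just one example of a non-linear function, picked for illustration. Assigning each tract center to its nearest station is used as a simple stand-in for the Voronoi idea, since a point lies in a station's Voronoi polygon exactly when that station is the closest one. Coordinates are assumed to be planar (x, y) in miles, and all names are hypothetical.

```python
import math


def flat_weight(distance_miles, cutoff_miles=0.5):
    """No decay: full weight inside the cutoff, none outside."""
    return 1.0 if distance_miles <= cutoff_miles else 0.0


def exponential_weight(distance_miles, scale_miles=0.25):
    """One possible non-linear decay; never reaches exactly zero."""
    return math.exp(-distance_miles / scale_miles)


def nearest_station(tract_xy, stations_xy):
    """Name of the station closest to a tract center.

    stations_xy: dict mapping station name -> (x, y) in miles.
    """
    return min(stations_xy, key=lambda name: math.dist(tract_xy, stations_xy[name]))


# Made-up layout: two stations one mile apart, tract center near the first.
stations_xy = {"Station A": (0.0, 0.0), "Station B": (1.0, 0.0)}
print(nearest_station((0.1, 0.0), stations_xy))
```

Either weight function could be swapped in for the linear one without changing the rest of the averaging.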
