You want to obtain US Census data at a low level of geographic aggregation. This example uses Washington, DC, Census tracts, but other states or levels of aggregation work similarly.
Install (if needed) and load the UScensus2010 package in R:
# install.packages("UScensus2010")
library(UScensus2010)
Select a level of geography, then install and load the companion package that contains the data for that level, specifying your operating system. Note: at low levels of geography there are many units, so the installation may take a while, depending on your computer and connection speed.
install.tract("osx")
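If the same script is run on several machines, the operating system string can be derived rather than typed by hand. The following is a minimal sketch, assuming the installer accepts "osx", "linux", and "windows" (only "osx" appears in this example, so the other two values are assumptions). The helper `census_os()` is hypothetical and not part of UScensus2010.

```r
# Hypothetical helper: map the running OS to the string passed to install.tract()
# Assumption: the accepted values are "osx", "linux", and "windows"
census_os <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         Darwin = "osx",
         Linux = "linux",
         Windows = "windows",
         stop("Unrecognized operating system: ", sysname))
}

os <- census_os()
# install.tract(os)  # uncomment to download and install the tract-level package
```

The install call is left commented out so that the sketch does not trigger a download when run.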
After the first installation, you can start each new session here, with
library(UScensus2010tract)
Make the data available in the workspace:
data("district_of_columbia.tract10")
The data are stored as a SpatialPolygonsDataFrame. This includes a standard data frame with 179 tracts and 461 Census variables. See help(district_of_columbia.tract10) for details.
Obtaining the spatial data involves S4 classes. The data frame is in the @data slot.
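For readers less familiar with S4 objects, here is a minimal sketch of slot access using a toy class. The class `ToySpatialFrame` and its contents are hypothetical stand-ins, not the actual SpatialPolygonsDataFrame definition. Only the `@` and `slot()` mechanics carry over.

```r
library(methods)

# Hypothetical toy class with a data slot, loosely mimicking the layout above
setClass("ToySpatialFrame", representation(data = "data.frame"))

toy <- new("ToySpatialFrame",
           data = data.frame(tract = c("002201", "002202"),
                             pop = c(3442, 3087),
                             stringsAsFactors = FALSE))

# List the slots defined for the object
slot_names <- slotNames(toy)

# Two equivalent ways to reach the data frame
via_at <- toy@data
via_slot <- slot(toy, "data")

print(slot_names)
print(identical(via_at, via_slot))
```

Calling slotNames() on the real object should similarly show which slots are available before you reach into them with `@`.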
To see the first 6 rows and columns,
district_of_columbia.tract10@data[1:6, 1:6]
## state county tract fips P0010001
## district_of_columbia.tract10_0 11 001 002201 11001002201 3442
## district_of_columbia.tract10_1 11 001 002202 11001002202 3087
## district_of_columbia.tract10_2 11 001 002301 11001002301 2974
## district_of_columbia.tract10_3 11 001 002302 11001002302 2036
## district_of_columbia.tract10_4 11 001 002400 11001002400 3618
## district_of_columbia.tract10_5 11 001 002501 11001002501 2554
## P0020001
## district_of_columbia.tract10_0 3442
## district_of_columbia.tract10_1 3087
## district_of_columbia.tract10_2 2974
## district_of_columbia.tract10_3 2036
## district_of_columbia.tract10_4 3618
## district_of_columbia.tract10_5 2554
To get the center of the \(k^{th}\) tract (its label point, stored in the labpt slot as longitude and latitude), e.g., the \(10^{th}\) tract,
k <- 10
district_of_columbia.tract10@polygons[[k]]@labpt
## [1] -77.04182 38.93018
To store the centers of all 179 tracts,
## Get the number of tracts:
n_tract <- length(district_of_columbia.tract10)
## Create storage matrix:
centers <- matrix(NA, nrow = n_tract, ncol = 2)
## Loop over tracts, extracting longitude and latitude and storing:
for (i in seq_len(n_tract)) {
  this_center <- district_of_columbia.tract10@polygons[[i]]@labpt
  centers[i, ] <- this_center
}
Then store this as a data frame and rename the columns:
centers <- data.frame(centers)
names(centers) <- c("long", "lat")
Here are the first few longitudes and latitudes:
head(centers)
## long lat
## 1 -77.02329 38.94913
## 2 -77.01386 38.94884
## 3 -77.01657 38.94259
## 4 -77.01088 38.93390
## 5 -77.02241 38.94178
## 6 -77.03173 38.94465
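The loop above can also be written without a preallocated matrix or explicit indexing. Below is a minimal sketch of that pattern on toy objects. The class `ToyPolygons` is a hypothetical stand-in with only a labpt slot, and the coordinates are the first three rows shown above.

```r
library(methods)

# Hypothetical toy class with a labpt slot, standing in for the real polygon objects
setClass("ToyPolygons", representation(labpt = "numeric"))

toy_polygons <- list(
  new("ToyPolygons", labpt = c(-77.02329, 38.94913)),
  new("ToyPolygons", labpt = c(-77.01386, 38.94884)),
  new("ToyPolygons", labpt = c(-77.01657, 38.94259))
)

# Extract each label point and stack the results row by row
toy_centers <- do.call(rbind, lapply(toy_polygons, function(p) p@labpt))
toy_centers <- data.frame(toy_centers)
names(toy_centers) <- c("long", "lat")

print(toy_centers)
```

With the real data, the same pattern would apply to district_of_columbia.tract10@polygons in place of `toy_polygons`. The sp package's coordinates() function is also expected to return these label points as a matrix when called on the full object, though that is not checked here.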