US Names by State: Part I (Mary is everywhere!)

I was browsing the Social Security Administration’s website and found a link for the open government initiative (http://www.ssa.gov/open/data/). There seems to be a fair amount of interesting data here, but I grabbed the names of people born in the US since 1910 (http://www.ssa.gov/oact/babynames/limits.html). Each state has a data file that lists the number of births under a given name by year in that state and the gender of the child.
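As a rough illustration of the file layout, the sketch below parses a few rows in the same five-column shape (state, gender, year, name, count). The rows and counts are made up for the example, and the column names are my own labels, since the files have no header row.

```r
# Hypothetical rows in the state-file layout: state, gender, year, name, count.
# The counts are invented for illustration only.
sample_lines <- c(
  "PA,F,1910,Mary,100",
  "PA,F,1910,Helen,60",
  "PA,M,1910,John,90",
  "PA,M,1911,John,95"
)

# The files have no header row, so column names are supplied by hand.
# colClasses keeps the gender code "F" from being read as logical FALSE.
sample_data <- read.csv(
  text = sample_lines,
  header = FALSE,
  col.names = c("State", "Gender", "Year", "Name", "People"),
  colClasses = c("character", "character", "integer", "character", "integer")
)

str(sample_data)
```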

There’s a lot of interesting analysis that could be done with this data, but I’m going to start by just plotting the most popular name for each state and gender, summed across the entire dataset (1910 onward).

Here is the plot for males:

Male

We can see that John is most popular in the Mid-Atlantic (PA, NY, etc.). Robert is most popular in the Midwest and the northeastern states. James dominates large portions of the South, while Michael is most popular in the West, Southwest and Florida.

Here is the plot for females:

Female

Mary was the most popular name basically everywhere in the country (with the exceptions of CA and NV where there were more Jennifers).

It’s interesting to see how dominant Mary is across the entire country, while the male names show more regional dominance. This is particularly unusual because states tended to have many more distinct female names than male names.
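One way to put numbers on that comparison is to count the distinct names per gender and measure what share of births the top name holds. Here is a minimal base R sketch on made-up counts; the data frame `births` and its values are hypothetical, standing in for one state's totals.

```r
# Hypothetical per-name totals for one state; values are invented.
births <- data.frame(
  Gender = c("F", "F", "F", "F", "M", "M", "M"),
  Name   = c("Mary", "Helen", "Ruth", "Anna", "John", "James", "Robert"),
  People = c(500, 200, 150, 150, 400, 350, 250),
  stringsAsFactors = FALSE
)

# Number of distinct names recorded for each gender.
distinct_names <- tapply(births$Name, births$Gender,
                         function(x) length(unique(x)))

# Share of births captured by the most common name within each gender.
top_share <- tapply(births$People, births$Gender,
                    function(x) max(x) / sum(x))

print(distinct_names)
print(top_share)
```

Running the same two summaries on the real state files would be one way to check whether Mary's lead holds despite a larger pool of female names.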

More analysis will follow, but here is the code…

###### Settings
library(plyr)
library(maps)
setwd("C:/Blog/StateName")
# Keep only the state files; the pattern is anchored so "." is a literal dot
files<-list.files(pattern="\\.TXT$")
files<-files[files!="DC.TXT"]
 
###### State structure
regions1=c("alabama","arizona","arkansas","california","colorado","connecticut","delaware",
	"florida","georgia","idaho","illinois","indiana","iowa","kansas",
	"kentucky","louisiana","maine","maryland","massachusetts:main","michigan:south","minnesota",
	"mississippi","missouri","montana","nebraska","nevada","new hampshire","new jersey",
	"new mexico","new york:main","north carolina:main","north dakota","ohio","oklahoma",
	"oregon","pennsylvania","rhode island","south carolina","south dakota","tennessee",
	"texas","utah","vermont","virginia:main","washington:main","west virginia",
	"wisconsin","wyoming")
 
# V2 holds the top male name and V3 the top female name for each region
mat<-data.frame(regions1,V2=NA_character_,V3=NA_character_,stringsAsFactors=FALSE)
 
###### Reading files
for (i in seq_along(files))
	{
	data<-read.csv(files[i],header=FALSE,stringsAsFactors=FALSE)
	colnames(data)<-c("State","Gender","Year","Name","People")
	# Total births per name and gender across all years
	data1<-ddply(data,.(Name,Gender),summarise,SUM=sum(People))
	male1<-data1[data1$Gender=="M",]
	female1<-data1[data1$Gender=="F",]
	male1<-male1[order(male1$SUM,decreasing=TRUE),]
	female1<-female1[order(female1$SUM,decreasing=TRUE),]
 
	# Anchor the pattern so "kansas" does not also match "arkansas"
	# and "virginia" does not also match "west virginia"
	state<-tolower(state.name[match(data$State[1],state.abb)])
	idx<-grep(paste0("^",state,"(:|$)"),mat$regions1)
	mat$V2[idx]<-as.character(male1$Name[1])
	mat$V3[idx]<-as.character(female1$Name[1])
	}
 
jpeg("Male.jpeg",width=1200,height=800,quality=100)
map("state",fill=TRUE,col="skyblue")
map.text(add=TRUE,"state",regions=regions1,labels=mat$V2)
title("Most Popular Male Name (since 1910) by State")
dev.off()
 
jpeg("Female.jpeg",width=1200,height=800,quality=100)
map("state",fill=TRUE,col="pink")
map.text(add=TRUE,"state",regions=regions1,labels=mat$V3)
title("Most Popular Female Name (since 1910) by State")
dev.off()
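
A note on matching state abbreviations to map regions: a plain substring pattern such as "kansas" also matches "arkansas", and "virginia" also matches "west virginia". The sketch below shows one way to anchor the pattern so that each state hits only its own region. The helper name `region_index` is my own and is not part of the maps package.

```r
# A few region names in the style used by map("state").
regions <- c("arkansas", "kansas", "virginia:main", "west virginia")

# Hypothetical helper: index of the region belonging to a state abbreviation.
# The pattern is anchored at the start and must end at ":" or end of string.
region_index <- function(abbr, regions) {
  full_name <- tolower(state.name[match(abbr, state.abb)])
  grep(paste0("^", full_name, "(:|$)"), regions)
}

grep("kansas", regions)        # unanchored: matches both arkansas and kansas
region_index("KS", regions)    # anchored: matches kansas only
region_index("VA", regions)    # anchored: matches virginia:main only
```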


US Population by Ethnicity Visualization

US Census 2011 (ACS) – choroplethr

As a statistician, I’ve always had a soft spot in my heart for the US Census. I love the rich data sets that are made publicly available and I’ve often experimented with visualizing the results. A couple of months ago, Ari Lamstein (a data scientist at Trulia) released the choroplethr package on CRAN (a repository for R packages). I pulled it up a couple of days ago and found it to be simple and intuitive. Only a few commands are required to build plots like this:

USPop

1) Go to http://www.census.gov/developers/tos/key_request.html to get an ACS API key.
2) Visit http://factfinder2.census.gov/faces/affhelp/jsf/pages/metadata.xhtml?lang=en&type=survey&id=survey.en.ACS_ACS to find the appropriate ACS table ID for the attribute that you’re looking to explore.
3) Open up R, install the choroplethr package, and define your API key using the api.key.install() command.
4) Explore away!

I started looking at the US population split by ethnicity.
USPopWhite

USPopBlack

USPopAsian

We can see very clearly the heavier concentrations of African-Americans in the Southeastern states, the Eastern seaboard and Southern CA. Asian-American population centers are focused on the West Coast and the NE Coast.

The R code is shown below:

###### Settings
library(choroplethr)
library(acs)
library(ggplot2)
 
###### API key
# Need to go to http://www.census.gov/developers/tos/key_request.html to set API key
api.key.install("###############")
 
###### Basic ACS Table IDs 
# B19301 = Per Capita Income
# B01003 = Population
 
###### Plotting
## Basic by State
choroplethr_acs(tableId="B19301",lod="state")
choroplethr_acs(tableId="B19301",lod="state",showLabels=FALSE)
choroplethr_acs(tableId="B19301",lod="state",showLabels=FALSE,num_buckets=9)
choroplethr_acs(tableId="B19301",lod="state",showLabels=FALSE,num_buckets=9)+labs(title="US 2011 Per Capita Income by State")
 
## Per Capita Income by County
choroplethr_acs(tableId="B19301",lod="county")
choroplethr_acs(tableId="B19301",lod="county",num_buckets=9,states=c("CA"))
 
## Population by County by Ethnicity
choroplethr_acs(tableId="B01003",lod="county")+labs(title="Total US Population by County (2011)")
choroplethr_acs(tableId="B02008",lod="county")+labs(title="US Population by County (2011) - White")
choroplethr_acs(tableId="B02009",lod="county")+labs(title="US Population by County (2011) - Black")
choroplethr_acs(tableId="B02011",lod="county")+labs(title="US Population by County (2011) - Asian")
choroplethr_acs(tableId="B03001",lod="county")+labs(title="US Population by County (2011) - Hispanic")