NYC Motor Vehicle Collisions – Street-Level Heat Map

In this post I extend a previous analysis that created a borough-level heat map of NYC motor vehicle collisions. The data is from NYC Open Data. Here I go from borough-level to street-level collisions. The code is very similar to the previous analysis, with a few more functions that map streets to colors. Below, I load the ggmap package and the data, and keep only collisions that have longitude and latitude information.

library(ggmap)

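# read the collisions export from NYC Open Data
d=read.csv('.../NYPD_Motor_Vehicle_Collisions.csv')
# keep only rows whose LOCATION contains a comma, i.e. rows with coordinates
d_clean=d[which(regexpr(',',d$LOCATION)!=-1),]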

#### 1. Clean Data ####
# get long and lat coordinates from concatenated "location" var
comm=regexpr(',',d_clean$LOCATION)
d_clean$loc=as.character(d_clean$LOCATION)
d_clean$lat=as.numeric(substr(d_clean$loc,2,comm-1))
d_clean$long=as.numeric(substr(d_clean$loc,comm+1,nchar(d_clean$loc)-1))

# create year variable
d_clean$year=substr(d_clean$DATE,7,10)
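
To make the cleaning step concrete, here is a minimal sketch of the same idea on toy values. It assumes LOCATION strings look like "(lat, long)" and that DATE is formatted MM/DD/YYYY, which is what the substr() positions above imply; the values are made up, and the parsing uses strsplit() rather than the substr() approach above.

```r
# Toy LOCATION values in the assumed "(lat, long)" format; not real records.
loc <- c("(40.7128, -74.0060)", "(40.6782, -73.9442)", "")

# Keep only entries that contain a comma, i.e. entries with coordinates.
has_coords <- grepl(",", loc, fixed = TRUE)
loc <- loc[has_coords]

# Drop the parentheses, then split each string on the comma.
parts <- strsplit(gsub("[()]", "", loc), ",", fixed = TRUE)
lat  <- as.numeric(sapply(parts, `[`, 1))
long <- as.numeric(sapply(parts, `[`, 2))

# Characters 7 to 10 of an MM/DD/YYYY string are the year.
year <- substr("01/15/2013", 7, 10)

data.frame(lat = lat, long = long)
```

The empty string is dropped by the comma check, which mirrors how rows without coordinates are excluded from d_clean.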

I use the three functions below to process my data. The boro() function subsets to collisions with street names in a specified borough, since some collisions with coordinate data do not have street name data. The function then subsets to collisions in 2013. The accident_freq() function calculates the frequency of collisions per street, merges these counts back to the collision-level data, and rescales them to a per-1,000 share of the borough's collisions (with a minimum of 1). This is important since the map needs collision-level data: each collision's coordinates become a point on its street's path. The assign_col() function takes a collision-level data set (created with the accident_freq() function) for a particular borough and returns a palette of shades ranging from white to a specified color (e.g. green, red, etc.), one shade per possible frequency value. Streets with more collisions will be darker.

# boro() subsets to 2013 accidents in specified borough
boro=function(x){
 d_clean2=d_clean[which(d_clean$ON.STREET.NAME!='' & d_clean$BOROUGH==x),]
 d_2013_2=d_clean2[which(d_clean2$year=='2013'),c('long','lat','ON.STREET.NAME')]
 return(d_2013_2)
}

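# accident_freq() gets frequency of accidents per street for specified borough
accident_freq=function(x){
 # count collisions per street
 tab=data.frame(table(x$ON.STREET.NAME))
 # merge counts back to collision-level rows (merge() sorts rows by street name)
 d_merge=merge(x=x,y=tab,by.x=c('ON.STREET.NAME'),by.y=c('Var1'))
 # street's share of borough collisions per 1,000, with a floor of 1 so it can index a palette
 d_merge$freqPerc=round((d_merge$Freq/length(x$ON.STREET.NAME))*1000,digits=0)
 d_merge$freqPerc=ifelse(d_merge$freqPerc==0,1,d_merge$freqPerc)
 return(d_merge)
}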

# assign_col() builds a white-to-color palette with one shade per freqPerc value
assign_col=function(x,c){
 pal=colorRampPalette(c('white',c))
 colors=pal(max(x$freqPerc))
 return(colors)
}
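
As a quick check of how the frequency and color steps fit together, here is a small sketch on toy data. The street names and coordinates are made up, and the code inlines the same table(), merge() and colorRampPalette() calls rather than calling the functions above, so it can be run on its own.

```r
# Toy collision-level data for one hypothetical borough (illustrative values).
toy <- data.frame(
  ON.STREET.NAME = c("BROADWAY", "BROADWAY", "BROADWAY", "CANAL ST"),
  long = c(-73.99, -73.98, -73.97, -74.00),
  lat  = c(40.72, 40.73, 40.74, 40.72),
  stringsAsFactors = FALSE
)

# Count collisions per street and merge the counts back onto each collision row.
tab <- data.frame(table(toy$ON.STREET.NAME))
toy_freq <- merge(x = toy, y = tab, by.x = "ON.STREET.NAME", by.y = "Var1")

# Share of the borough's collisions on that street, per 1,000, floored at 1
# so that it can be used as an index into the palette.
toy_freq$freqPerc <- pmax(round(toy_freq$Freq / nrow(toy) * 1000), 1)

# One shade per possible freqPerc value, from white up to the borough color.
pal <- colorRampPalette(c("white", "dodgerblue"))
shades <- pal(max(toy_freq$freqPerc))

# Each row looks up its shade by freqPerc; the busiest street gets the darkest one.
toy_freq$shade <- shades[toy_freq$freqPerc]
toy_freq[, c("ON.STREET.NAME", "Freq", "freqPerc", "shade")]
```

Because the palette has max(freqPerc) entries, the street with the most collisions always lands on the full borough color, and every other street gets a proportionally lighter shade.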

man=boro('MANHATTAN')
bronx=boro('BRONX')
brook=boro('BROOKLYN')
si=boro('STATEN ISLAND')
q=boro('QUEENS')

man_freq=accident_freq(man)
bronx_freq=accident_freq(bronx)
brook_freq=accident_freq(brook)
si_freq=accident_freq(si)
q_freq=accident_freq(q)

man_col=assign_col(man_freq,'dodgerblue')
bronx_col=assign_col(bronx_freq,'darkred')
brook_col=assign_col(brook_freq,'violet')
si_col=assign_col(si_freq,'darkgreen')
q_col=assign_col(q_freq,'darkgoldenrod4')

Finally, I use ggmap’s get_map() function to get a toner-style map of NYC and add geom_path() layers. There is one geom_path() layer per borough. geom_path() connects all longitude and latitude points that are on the same street with a line or “path.” Essentially, it uses street as a grouping factor for the coordinates. All coordinates in a group are connected, in the order they appear in the data. Each line is then given a color determined by assign_col() using the col= parameter.

ny_plot=ggmap(get_map('New York, New York',zoom=11,maptype='toner'))

# use the *_freq data frames in each layer so that coordinates and colors line up row by row
# (merge() in accident_freq() re-sorts rows, so freqPerc does not follow the row order of man, bronx, etc.)
plot3=ny_plot+
 geom_path(data=man_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=man_col[man_freq$freqPerc])+
 geom_path(data=bronx_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=bronx_col[bronx_freq$freqPerc])+
 geom_path(data=brook_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=brook_col[brook_freq$freqPerc])+
 geom_path(data=si_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=si_col[si_freq$freqPerc])+
 geom_path(data=q_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=q_col[q_freq$freqPerc])+
 ggtitle('Street-Level NYC Vehicle Accidents by Borough')+
 xlab(" ")+ylab(" ")
plot3
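
To see the grouping behavior in isolation, here is a minimal sketch that uses ggplot2 only, with no map tiles. The two streets, their coordinates and the shade column are made up; shade stands in for the per-row colors looked up from a palette, and it is mapped through scale_colour_identity() instead of the col= parameter used above.

```r
library(ggplot2)

# Toy coordinates for two hypothetical streets; 'shade' holds a hex color per row.
toy <- data.frame(
  street = c("A", "A", "A", "B", "B"),
  long   = c(-73.99, -73.98, -73.97, -74.00, -73.99),
  lat    = c(40.72, 40.73, 40.74, 40.70, 40.70),
  shade  = c("#1E90FF", "#1E90FF", "#1E90FF", "#A5D2FF", "#A5D2FF")
)

# group = street tells geom_path() to draw one path per street, joining that
# street's points in the order they appear in the data frame.
p <- ggplot(toy, aes(x = long, y = lat, group = street)) +
  geom_path(aes(colour = shade)) +
  scale_colour_identity()

# print(p) draws two separate paths, one per street.
```

Since points are joined in row order, a street whose collisions are not sorted along its length will be drawn as a zig-zag rather than a clean line.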

3 thoughts on “NYC Motor Vehicle Collisions – Street-Level Heat Map”

  1. got this error

    Error: Results must be all atomic, or all data frames
    In addition: Warning messages:
    1: In loop_apply(n, do.ply) :
    Removed 202 rows containing missing values (geom_path).
    2: In loop_apply(n, do.ply) :
    Removed 54 rows containing missing values (geom_path).

    1. Hi Syed,

      Those warning messages are familiar to me. They occur because the zoom setting in get_map() is too high (zoomed in too far), so some points fall outside of the map’s range (hence the 54 and 202 removed rows).

      Try setting the zoom level lower and confirm that the numbers go to zero. I just tried replicating, but it seems the API is down this morning.

      This illustrates the point that there is a trade-off between data inclusion and map readability/zoom.
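
One way to check this without re-plotting is to count how many points fall outside the map's bounding box. The sketch below is a hypothetical helper, not part of ggmap. It assumes the bounding box is a data frame with columns ll.lon, ll.lat, ur.lon and ur.lat, which is the layout I understand attr(map, "bb") to have for a get_map() result; the toy box and points are made up.

```r
# Hypothetical helper: count points that fall outside a map's bounding box.
# 'bb' is assumed to have the columns ll.lon, ll.lat, ur.lon, ur.lat.
n_outside <- function(long, lat, bb) {
  inside <- long >= bb$ll.lon & long <= bb$ur.lon &
            lat  >= bb$ll.lat & lat  <= bb$ur.lat
  sum(!inside, na.rm = TRUE)
}

# Toy bounding box and points (illustrative values, not taken from the real map).
bb <- data.frame(ll.lat = 40.5, ll.lon = -74.2, ur.lat = 40.9, ur.lon = -73.8)
long <- c(-74.0, -73.9, -73.7)
lat  <- c(40.7, 41.0, 40.7)

n_outside(long, lat, bb)
```

Here the second point is too far north and the third too far east, so two of the three points would be dropped. Running the same count at a lower zoom level should bring the number toward zero.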
