# Motor Vehicle Collision Density in NYC

In a previous post, I visualized crime density in Boston using R’s ggmap package. In this post, I use ggmap to visualize vehicle collisions in New York City. The R code and data source are included below.

The data comes from NYC Open Data; my extract covers 2012 to 2015. Each record includes the type of vehicle, the name of the street on which the accident occurred, and the longitude and latitude coordinates of the accident. Both coordinates are stored in a single character variable called “LOCATION”.
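To make the LOCATION format concrete, here is a small self-contained sketch of the parsing idea used in the code below (the sample coordinate pair is made up):

```r
# LOCATION stores both coordinates in one string: "(lat, long)"
loc  <- "(40.7128, -74.0060)"                  # made-up sample value
comm <- regexpr(",", loc)                      # position of the comma
lat  <- as.numeric(substr(loc, 2, comm - 1))   # strip the leading "("
long <- as.numeric(substr(loc, comm + 1, nchar(loc) - 1))  # strip the ")"
```

as.numeric() ignores the leading space after the comma, so no extra trimming is needed.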

Below, I load the ggmap and gridExtra packages, load the data, drop all accidents without location coordinates, and parse the LOCATION variable to extract the longitude and latitude. I also parse the date variable to create a year variable, which I use to build two data sets: one with all vehicle accidents in 2013 and another with all vehicle accidents in 2014.


library(ggmap)
library(gridExtra)

# d is the raw collision extract read in from NYC Open Data
d_clean=d[which(regexpr(',',d$LOCATION)!=-1),]   # keep rows with coordinates
comm=regexpr(',',d_clean$LOCATION)               # comma position within LOCATION
d_clean$loc=as.character(d_clean$LOCATION)
d_clean$lat=as.numeric(substr(d_clean$loc,2,comm-1))                      # "(lat"
d_clean$long=as.numeric(substr(d_clean$loc,comm+1,nchar(d_clean$loc)-1))  # "long)"
d_clean$year=substr(d_clean$DATE,7,10)           # DATE is mm/dd/yyyy
d_2013=d_clean[which(d_clean$year=='2013'),c('long','lat')]
d_2014=d_clean[which(d_clean$year=='2014'),c('long','lat')]

# Visualizing Hubway Trips in Boston

Most popular Hubway stations (in order):

1. Post Office Sq. – located in the heart of the financial district.
2. Charles St. & Cambridge – the first Hubway stop after crossing from Cambridge over the Longfellow Bridge.
3. Tremont St & West – east side of the Boston Common.
4. South Station
5. Cross St. & Hanover – entrance to the North End coming from the financial district.
6. Boylston St & Berkeley – between Copley and the Common.
7. Stuart St & Charles – Theatre District, just south of the Common.
8. Boylston & Fairfield – located in front of the Boylston Street entrance to the Pru.
9. The Esplanade (Beacon St & Arlington) – on the north end of the Esplanade running along the Charles.
10. Chinatown Gate Plaza
11. Prudential at Belvidere
12. Boston Public Library on Boylston
13. Boston Aquarium
14. Newbury Street & Hereford
15. Government Center

# Drug Crime Density in Boston

The original map can be found in my previous post, Spatial Data Visualization with R. As a review, I use the get_map function in the ggmap package to grab a map of Boston from Google Maps. The “crime report” data can be found at https://data.cityofboston.gov/.

In the code below, I bring in the crime data as a csv file. The data contains one observation per crime, including a description, a crime category (drug, traffic violation, etc.), and the longitude/latitude coordinates of the crime scene.

I added density areas using the stat_density2d function. I feed this function a set of coordinates using the x= and y= parameters. The alpha parameter adjusts transparency, with 1 being fully solid and 0 being fully transparent. I set the fill to vary with the number of points in the shaded area (..level..). I also specify bins=8, which gives us 7 shades of blue. The density areas can be interpreted as follows: all the shaded areas together contain 7/8 of drug crimes in the data.
Each shade represents 1/8 of drug crimes in the data. Since all shades represent the same proportion of the data, the smaller the area of a particular shade, the higher the arrest density. R code is given below.

Update: Thanks to Verbal for the meaningful distinction between crimes and arrests. It’s not really clear what this data actually tracks. I believe this is reported crime, as called in by people; I don’t think every crime here leads to an arrest, but I could be wrong.

# Finding and Plotting Lorenz Solution using MATLAB

I use MATLAB to solve the following Lorenz initial value problem:

$\begin{cases} x'=-10(x-y) \\ y'=-x(z-28)-y \\ z'=xy-\frac{8}{3}z \\ x(0)=y(0)=z(0)=5 \end{cases}$

I wrote a function, LorenzRK4IVP(), that takes the system of three differential equations as input and solves the system using the Runge-Kutta method with step size $h=.01$. I plot the strange attractor and use MATLAB to produce a GIF of the solution.

# Interactive Cobb-Douglas Web App with R

I used Shiny to make an interactive Cobb-Douglas production surface in R. It reacts to the user’s share-of-labor and share-of-capital inputs and allows the user to rotate the surface. The contour plot (isoquants) is also dynamic.

Shiny works using two R scripts stored in the same folder. One script handles the user interface (UI) side and the other the server side. On the UI side, I take user inputs for figure rotation and capital/labor shares via sliders and output a plot of the surface and isoquants.

library(shiny)

shinyUI(pageWithSidebar(
  headerPanel("Cobb-Douglas Production Function"),
  sidebarPanel(
    sliderInput("L","Share of Labor:", min=0, max=1, value=.5, step=.1),
    sliderInput("C","Share of Capital:", min=0, max=1, value=.5, step=.1),
    sliderInput("p1","Rotate Horizontal:", min=0, max=180, value=40, step=10),
    sliderInput("p2","Rotate Vertical:", min=0, max=180, value=20, step=10)
  ),
  mainPanel(
    plotOutput("p"),
    plotOutput('con'))
))

The manipulation of the inputs is done on the server side:

library(shiny)
options(shiny.reactlog=TRUE)

shinyServer(function(input,output){
  observe({
    # create x, y ranges for plots
    x=seq(0,200,5)
    y=seq(0,200,5)

    # Cobb-Douglas model
    model<-function(a,b){
      (a**(as.numeric(input$L)))*(b**(as.numeric(input$C)))
    }

    # generate surface values at given x-y points
    z<-outer(x,y,model)

    # create gradient for surface
    pal<-colorRampPalette(c("#f2f2f2", "Blue"))
    colors<-pal(max(z))
    colors2<-pal(max(y))

    # plot functions
    output$p<-renderPlot({persp(x,y,z,
      theta=as.numeric(input$p1), phi=as.numeric(input$p2),
      col=colors[z],
      xlab="Share of Labor",
      ylab="Share of Capital",
      zlab="Output"
    )})
    output$con<-renderPlot({contour(x,y,z,
      col=colors2[y],
      xlab="Share of Labor",
      ylab="Share of Capital"
    )})
  })
})

# Iterative OLS Regression Using Gauss-Seidel

I just finished covering a few numerical techniques for solving systems of equations, which can be applied to find best-fit lines through a given set of data points.

The four points $\{(0,0), (1,3), (2,3), (5,6)\}$ are arranged into an inconsistent system of four equations and two unknowns:

$b+a(0) = 0 \\ b+a(1) = 3 \\ b+a(2) = 3 \\ b+a(5) = 6$

The system can be represented in matrix form:

$\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 5 \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 3 \\ 6 \end{bmatrix}$

The least-squares solution vector can be found by solving the normal equations $A^TA\vec{x_*}=A^T\vec{b}$. Computing $A^TA$ and $A^T \vec{b}$ yields the following matrix system:

$\begin{bmatrix} 4 & 8 \\ 8 & 30 \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} 12 \\ 39 \end{bmatrix}$

The system can be solved using the Gauss-Seidel method. Below, I run 10 iterations of Gauss-Seidel (visualized in the figure above). The estimated line gets successively closer to the true solution, shown in green. The estimates are shown in blue, with each iteration drawn in a darker shade than the last (see highlighted lines).

library(MASS)            # package needed for ginv() function
A=cbind(c(4,8),c(8,30))  # coefficient matrix
b<-t(t(c(12,39)))        # b-vector
x0<-t(t(c(0,0)))         # starting vector
iter<-10                 # number of Gauss-Seidel iterations to run
L<-lower.tri(A)*A        # strictly lower triangular part of A
U<-upper.tri(A)*A        # strictly upper triangular part of A
D<-diag(diag(A))         # diagonal of A

# plot points
xc<-c(0,1,2,5)
yc<-c(0,3,3,6)
plot(xc,yc, col='red', xlab="X-Values", ylab="Y-Values", bty='n')
title(main="OLS Estimates")
legend("bottomright", c("Data","Estimates","True Line"),
       lty=c(0,1,1), pch=c(1,NA,NA),
       col=c("red","blue","green"), bty='n')

# create color palette - lines get darker with each iteration
pal<-colorRampPalette(c("#f2f2f2", "Blue"))
colors<-pal(iter)        # creates color palette of length iter

# plot true least-squares line (intercept 0.8571, slope 1.0714)
abline(a=.8571,b=1.0714,col='green')

n<-1
while(n<=iter){
  print(x0)
  # Gauss-Seidel update: x1 = (L+D)^(-1) (b - U x0)
  x1<-ginv(L+D)%*%((-U%*%x0)+b)
  x0<-x1
  # plot estimated line; x0[1,1] is the intercept and x0[2,1] the slope
  # (abline's a= argument is the intercept, b= the slope)
  abline(a=as.numeric(x0[1,1]),
         b=as.numeric(x0[2,1]),
         col=colors[n])   # nth color in palette
  n=n+1
}

# Spatial Data Visualization with R

I’ve been fooling around with spatial data lately. As it turns out, there are some great R packages for visualizing this kind of data. Below is a set of charts I put together; it’s a good sample of the possibilities.

Motherjones.com keeps a dataset with characteristics of every mass shooting since 1982. The location of each shooting is marked on the map below with a red circle, whose size is determined by the number of fatalities. Newtown and Virginia Tech (both school shootings) are among the deadliest within this time period. In the vast majority of these cases, the shooters were white males with a history of mental illness who obtained their weapons legally.

# Mass Shootings
# packages used: rworldmap rworldxtra
# data source: www.motherjones.com
US <- getMap(resolution = "high")
plot(US,xlim=c(-125,-65),ylim=c(39,39), asp=1.31803)
title(main="Mass Shootings 1982-2013")
points(d$longitude,d$latitude,col="red",cex=d$Fatalities*.25)

text(-69.31142,37.21232,"Newtown")
text(-72.41394,30.22957,"Virginia Tech")
text(-111.04308,38.55200,"San Ysidro \n McDonald's Massacre")
text(-89.72780,25.9,"Luby's Massacre")

#using locator() -- add lines from circles to labels
points(c(-77.67630,-72.99422),c(36.08547,31.16065),type='l')
points(c(-71.71729, -69.05702),c(39.79927,37.94237),type='l')
points(c(-96.51104, -92.68024),c(29.62669,26.23582),type='l')
points(c(-115.8778, -111.4086),c(33.98637, 36.73135),type='l')


R is flexible with spatial data. It can zoom out of the United States and display global data. Malaysia Airlines has been in the news a few times over the last year, so it’s a pretty topical example. We can plot all of Malaysia Airlines’ routes below using data from openflights.org. In the last chart, magnitude was shown using the size of the circles; here, magnitude is shown using the shade of the routes. Routes to popular destinations are drawn in a brighter shade of blue.

I’ve also plotted the Air France and American Airlines routes. The actual mapping was easy to do. I used a combination of rworldmap and rworldxtra for the world map along with geosphere for the route arcs.

# Airline Data
# Packages: maps rworldmap rworldxtra geosphere
# Source: OpenFlights.org; flowingdata.com

# plot world map
map("world", col="grey15", fill=TRUE, bg="Black")

pal <- colorRampPalette(c("#f2f2f2", "Blue"))
colors <- pal(100)

# plot each route, shaded by destination popularity
attach(gs)
for(i in 1:length(S_Long)){
  # great-circle arc between source and destination airports
  inter <- gcIntermediate(cbind(gs[i,]$S_Long, gs[i,]$S_Lat),
                          cbind(gs[i,]$D_Long, gs[i,]$D_Lat), n=100)

  # shade by this route's destination popularity
  index <- round((gs[i,]$Dest_Count/max(Dest_Count))*length(colors))

  lines(inter, col=colors[index], lwd=.2)
}
title(main="American Airline Routes",col.main="Blue")


I saved the best for last.

The ggmap package allows R to fetch maps directly from Google and zoom into specific cities. Below is a map of Boston showing crime locations in 2014. The red dots represent shootings and the blue dots represent drug offenses. I downloaded the data from data.cityofboston.gov. Darker red areas represent more shooting events at that location. Most of the shootings seem to be clustered around Brookline/Roxbury.

If we zoom into the center of Boston, we see far fewer shootings. There are still many drug busts, but they’re concentrated mainly in three areas: Chinatown (shocker!), East Boston, and the South End.


bos_2<-boston[which(boston$Shooting=='Yes' & boston$Year=='2014'),]
bos_3<-boston[which(boston$INCIDENT_TYPE_DESCRIPTION=='DRUG CHARGES' & boston$Year=='2014'),]

bos_plot<-ggmap(get_map("Boston, Massachusetts",zoom=13))

# longitude on the x-axis, latitude on the y-axis
bos_plot+
  geom_point(data=bos_2,aes(x=Long,y=Lat),
             col='red',alpha=.5,size=5)+
  geom_point(data=bos_3,aes(x=Long,y=Lat),
             col='blue',alpha=.5,size=2)
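
The drug-crime post above describes the stat_density2d settings (alpha for transparency, fill tied to ..level.., bins=8), but its code block did not survive in this archive. Below is a minimal reconstruction of that layer on simulated points, using plain ggplot2 so it runs without fetching a basemap; the drugs data frame and its column names are made up for illustration:

```r
library(ggplot2)

# made-up stand-in for the crime coordinates
set.seed(1)
drugs <- data.frame(long = rnorm(500, -71.06, 0.01),
                    lat  = rnorm(500,  42.36, 0.01))

# density layer as described above: alpha sets transparency, fill varies
# with the number of points in the shaded area (..level..), bins=8
p <- ggplot(drugs, aes(x = long, y = lat)) +
  stat_density2d(aes(fill = ..level..), geom = "polygon",
                 alpha = 0.5, bins = 8)
```

With ggmap, the same layer would be added on top of a ggmap(get_map(...)) base instead of a bare ggplot() call; newer ggplot2 versions prefer after_stat(level) over the ..level.. notation.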