Motor Vehicle Collision Density in NYC

DensityByBorough

In a previous post, I visualized crime density in Boston using R’s ggmap package. In this post, I use ggmap to visualize vehicle accidents in New York City. R code and data are included in this post.

The data comes from NYC Open Data. My data cut ranges from 2012 to 2015. The data tracks the type of vehicle, the name of the street on which the accident occurs, as well as the longitude and latitude coordinates of the accident. Both coordinates are saved as a single character variable called “LOCATION”.

Below, I load the ggmap and gridExtra packages, load the data, drop all accidents without location coordinates, and parse the LOCATION variable to get the longitude and latitude coordinates. I also parse the date variable to create a year variable and use that variable to create two data sets: one with all vehicle accidents in 2013 and another with all vehicle accidents in 2014

 
library(ggmap)
library(gridExtra)

d=read.csv('.../NYPD_Motor_Vehicle_Collisions.csv')

d_clean=d[which(regexpr(',',d$LOCATION)!=-1),]

comm=regexpr(',',d_clean$LOCATION)
d_clean$loc=as.character(d_clean$LOCATION)
d_clean$lat=as.numeric(substr(d_clean$loc,2,comm-1))
d_clean$long=as.numeric(substr(d_clean$loc,comm+1,nchar(d_clean$loc)-1))
d_clean$year=substr(d_clean$DATE,7,10)

d_2013=d_clean[which(d_clean$year=='2013'),c('long','lat')]
d_2014=d_clean[which(d_clean$year=='2014'),c('long','lat')]

Continue reading Motor Vehicle Collision Density in NYC

Visualizing Hubway Trips in Boston

Most Popular Hubway Stations (in order):

  1. Post Office Sq. – located in the heart of the financial district.
  2. Charles St. & Cambridge – the first Hubway stop after crossing from Cambridge over Longfellow Bridge.
  3. Tremont St & West – East side of the Boston Common
  4. South Station
  5. Cross St. & Hannover – entrance to North End combing from financial district.
  6. Boylston St & Berkeley – between Copley and the Common.
  7. Stuart St & Charles – Theatre district, just south of the Common.
  8. Boylston & Fairfield – located in front of the Boylston Street entrance to the Pru.
  9. The Esplanade (Beacon St & Arlington) – this stop is on the north end of the Esplanade running along the Charles.
  10. Chinatown Gate Plaza
  11. Prudential at Belvidere
  12. Boston Public Library on Boylston
  13. Boston Aquarium
  14. Newbury Street & Hereford
  15. Government Center

Continue reading Visualizing Hubway Trips in Boston

Drug Crime Density in Boston

Drug Crimes in Boston with high-density areas shown using blue gradient shading. Each shade represents 1/8 of the data. Crime scenes are shown using gray dots.
Drug Crimes in Boston with high-density areas shown using blue gradient shading.
Each shade represents 1/8 of the data.
Crime scenes are shown using gray dots.

The original map can be found in my previous post, Spatial Data Visualization with R.

As a review, I use get_map function in the ggmap package to grab a map of Boston from Google Maps. The “crime report” data can be found at https://data.cityofboston.gov/. In the code below, I bring in crime data as a csv file. The data contains one observation per crime. It includes a description, crime category (drug, traffic violation, etc), and the longitude/latitude coordinates of the crime scene.

I added density areas using stat_density2d function. I feed this function a set of coordinates using the x= and y= parameters. The alpha parameter adjust transparency with 1 being perfect solid and 0 being fully transparent. I set the fill to vary with the number of points in the shaded area (..level..). I also specify bins=8, which gives us 7 shades of blue. The density areas can be interpreted as follows: all the shaded areas together contain 7/8 of drug crimes in the data. Each shade represents 1/8 of drug crimes in the data. Since all shades represent the same proportion of the data, the smaller the area of a particular shade, the higher the arrest density.

R code is given below.

//update: Thanks to Verbal for the meaningful distinction between crimes and arrest. It’s not really clear what this data actually tracks. I’m sure that this is reported crime, as called in by people. I don’t think every crime here leads to an arrest. I could be wrong.

Continue reading Drug Crime Density in Boston

Finding and Plotting Lorenz Solution using MATLAB

I use MATLAB to solve the following Lorenz initial value problem:
\begin{cases} x'=-10(x+y) \\ y'=-x(z+28)-y \\ z'=xy-\frac{8}{3}z \\ x(0)=y(0)=z(0)=5 \end{cases}

I wrote a function, LorenzRK4IVP(), that takes the system of three differential equations as input and solves the system using the Runge-Kutta method with step size h=.01. I plot the strange attractor as well as use MATLAB to produce a GIF of the solution.

LorenzRK4

testnew51

Continue reading Finding and Plotting Lorenz Solution using MATLAB

Interactive Cobb-Douglas Web App with R

Screen Shot 2014-11-04 at 7.24.02 PM

I used Shiny to make an interactive cobb-douglass production surface in R. It reacts to user’s share of labor and capital inputs and allows the user to rotate the surface. The contour plot (isoquants) is also dynamic.

Shiny works using two R codes stored in the same folder. One R code works on the user interface (UI) side and the other works on the server side.

On the UI side, I take user inputs for figure rotations and capital/labor inputs via slidebars and output a plot of the surface and isoquants.

library(shiny)

shinyUI(pageWithSidebar(
  headerPanel("Cobb-Douglas Production Function"),
  sidebarPanel(
    sliderInput("L","Share of Labor:",
                min=0, max=1, value=.5, step=.1),
    sliderInput("C","Share of Capital:",
                min=0, max=1, value=.5, step=.1),  
    sliderInput("p1","Rotate Horizontal:",
              min=0, max=180, value=40, step=10),
    sliderInput("p2","Rotate Vertical:",
            min=0, max=180, value=20, step=10)
    ),
  mainPanel( plotOutput("p"),
             plotOutput('con'))  
))

The manipulation of the inputs is done on the server side:

library(shiny)
options(shiny.reactlog=TRUE)

shinyServer(function(input,output){
  observe({
  # create x, y ranges for plots
  x=seq(0,200,5)
  y=seq(0,200,5)
  
  # Cobb-Douglass Model
  model<-  function(a,b){
      (a**(as.numeric(input$L)))*(b**(as.numeric(input$C)))
  }

  # generate surface values at given x-y points
  z<-outer(x,y,model)

  # create gradient for surface
  pal<-colorRampPalette(c("#f2f2f2", "Blue"))
  colors<-pal(max(z))
  colors2<-pal(max(y)) 
  
  # plot functions
  output$p<-renderPlot({persp(x,y,z,
                              theta=as.numeric(input$p1),
                              phi=as.numeric(input$p2),
                              col=colors[z],
                              xlab="Share of Labor",
                              ylab="Share of Capital",
                              zlab="Output"
                      )})
  output$con<-renderPlot({contour(x,y,z,
                              theta=as.numeric(input$p1),
                              phi=as.numeric(input$p2),
                              col=colors2[y],
                              xlab="Share of Labor",
                              ylab="Share of Capital"
  )})
  })
})

Iterative OLS Regression Using Gauss-Seidel

Figure shows 10 iterations of Gauss-Seidel's OLS estimates. Estimates get successively closer to the true line, shown in green.
Figure shows 10 iterations of Gauss-Seidel’s OLS estimates. Estimates get successively closer to the true line, shown in green.

I just finished covering a few numerical techniques for solving systems of equations, which can be applied to find best-fit lines through a give set of data points.

The four points \{(0,0), (1,3), (2,3), (5,6)\} are arranged into an inconsistent system of four equations and two unknowns:

b+a(0) = 0 \\  b+a(1) = 3 \\  b+a(2) = 3 \\  b+a(5) = 6

The system can be represented in matrix form:

\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 5 \end{bmatrix}  \begin{bmatrix} b \\ a \end{bmatrix}  =  \begin{bmatrix} 1 \\ 3 \\ 3 \\ 6 \end{bmatrix}

The least-squares solution vector can be found by solving the normal equations

A^TA\vec{x_*}=A^T\vec{b}

Solving for A^TA and A^T \vec{b} yields the following matrix system:

\begin{bmatrix} 4 & 8 \\ 8 & 30 \end{bmatrix}  \begin{bmatrix} b \\ a \end{bmatrix}  =  \begin{bmatrix} 12 \\ 39 \end{bmatrix}

The system can be solved using Gauss-Seidel method.

Below, I run 10 iterations of Gauss-Seidel (visualized in the figure above). The estimated line gets successively closer to the true solution in green. The estimates are shown in blue – each iteration is shown in a darker shade than the next (see highlighted lines).

library(MASS) # package needed for ginv() function 

A=cbind(c(4,8),c(8,30)) # coefficient matrix
b<-t(t(c(12,39))) # b-vector
x0<-t(t(c(0,0)))  # starting vector
iter<-10          # number of Gauss-Seidel iterations to run

L<-lower.tri(A)*A  # lower triang. A
U<-upper.tri(A)*A  # upper triang. A
D<-diag(diag(A))   # diag of A

# plot points 
xc<-c(0,1,2,5)
yc<-c(0,3,3,6)
plot(xc,yc,
     col='red',
     xlab="X-Values",
     ylab="Y-Values",
     bty='n')
title(main="OLS Estimates")
legend("bottomright",
       c("Data","Estimates","True Line"),
       lty=c(0,1,1),
       pch=c(1,NA,NA),
       col=c("red","blue","green"),
       bty='n')

# create color palette - lines will get darker with each iter
pal<-colorRampPalette(c("#f2f2f2", "Blue"))
colors<-pal(iter) # creates color palette of length(iter)

# plot true line
abline(a=1.0714,b=.8571,col='green')

n<-1
while(n<=iter){
  print(x0)
  
  # Gauss-Seidel formula
  x1<-(ginv((L+D)))%*%((-U%*%x0)+b)
  x0<-x1
  n=n+1
  
  # plot estimated line  
  abline(a=as.numeric(x0[2,1]), # slope of estimated line
         b=as.numeric(x0[1,1]), # y-intercept of estimated line
         col=colors[n]) # pick nth color in palette
}

Spatial Data Visualization with R

I’ve been fooling around with spatial data lately. As it turns out, there are some great R packages for visualizing this kind of data.

Below is a set of charts I put together. It’s a good sample of the possibilities.


Motherjones.com keeps a dataset with characteristics of every mass shooting since 1983. The location of each shooting is marked on the map below with a red circle. The size of the circle is determined by the number of fatalities. Newtown and Virginia Tech (both school shootings) are among the deadliest within this time period. 

In the vast majority of these cases, the shooters were white males with a history of mental illness who obtained their weapons legally.

Larger circles indicate higher fatalities.
Larger circles indicate higher fatalities.
# Mass Shootings
# packages used: rworldmap rworldxtra
# data source: www.motherjones.com
US &lt;- getMap(resolution = "high")

plot(US,xlim=c(-125,-65),ylim=c(39,39), asp=1.31803)
title(main="Mass Shootings 1982-2013")

points(d$longitude,d$latitude,col="red",cex=d$Fatalities*.25)

text(-69.31142,37.21232,"Newtown")
text(-72.41394,30.22957,"Virginia Tech")
text(-111.04308,38.55200,"San Ysidro \n McDonald's Massacre")
text(-89.72780,25.9,"Luby's Massacre")

#using locator() -- add lines from circles to labels
points(c(-77.67630,-72.99422),c(36.08547,31.16065),type='l')
points(c(-71.71729, -69.05702),c(39.79927,37.94237),type='l')
points(c(-96.51104, -92.68024),c(29.62669,26.23582),type='l')
points(c(-115.8778, -111.4086),c(33.98637, 36.73135),type='l')

 

R is flexible with spatial data. It can zoom out of the United States and display global data. Malaysia Airlines has been in the news a few times over the last year, so it’s a pretty topical example. We can plot all of the Malaysia airline’s routes below using data from openflights.org. In the last chart, magnitude was shown using the size of the circles, but here we can show magnitude using the shade of the routes. Routes to popular destinations are a brighter shade of blue.

I’ve also plotted the Air France and American Airlines routes. The actual mapping was easy to do. I used a combination of xworldmap and xworldxtra for the world map along with geosphare for the route arcs.

MalaysiaAirline  AirFrance

 

AmericanAirlines

 

# Airline Data
# Packages: rworldmap rworldxtra geosphere
# Source: OpenFlights.org; flowingdata.com

# plot world map
map("world", col="grey15", fill=TRUE, bg="Black")

#create 100 shades of blue
pal &lt;- colorRampPalette(c("#f2f2f2", "Blue"))
colors &lt;- pal(100)

#plot each route
attach(gs)
for(i in 1:length(S_Long)){
  inter &lt;- gcIntermediate(cbind(gs[i,]$S_Long, gs[i,]$S_Lat), 
  cbind(gs[i,]$D_Long, gs[i,]$D_Lat), n=100)
 
  index&lt;-round( (Dest_Count/max(Dest_Count))*length(colors))
 
  lines(inter, col=colors[index], lwd=.2)
}
title(main="American Airline Routes",col.main="Blue")

 


 

I saved the best for last.

Ggmap allows R to fetch maps directly from Google and zoom into specific cities. Below is a map of Boston showing crime locations in 2014. The red dots represent shootings and blue dots represent drug offenses. I downloaded the data from data.cityofboston.gov Darker red areas represent more shooting events at that location. Most of the shootings seem to be clustered around Brookline/Roxbury.

Blue marks represent drug charges and red dots represent shootings.
Blue marks represent drug charges and red dots represent shootings.

If we zoom into the center of Boston, we see much fewer shootings. There are still many drug busts, but they’re concentrated in mainly three areas: Chinatown (shocker!), East Boston, and the South End.

Boston2

 

 
bos_2&lt;-boston[which(boston$Shooting=='Yes' &amp; boston$Year=='2014'),]
bos_3&lt;-boston[which(boston$INCIDENT_TYPE_DESCRIPTION=='DRUG CHARGES' &amp; boston$Year=='2014'),]

bos_plot&lt;-ggmap(get_map("Boston, Massachusetts",zoom=13))
bos_plot+geom_point(data=bos_2,aes(x=bos_2$Lat,y=bos_2$Long), 
 col='red',alpha=.5,
 size=5)+geom_point(data=bos_3,aes(x=bos_3$Lat,y=bos_3$Long), 
 col='blue',alpha=.5,
 size=2)