Tag Archives: computing

NYC Motor Vehicle Collisions – Street-Level Heat Map

In this post I extend a previous analysis, which built a borough-level heat map of NYC motor vehicle collisions, down to the street level. The data comes from NYC Open Data. The processing code is very similar to the previous analysis, with a few extra functions that map streets to colors (a rough sketch of that step follows the code). Below, I load the ggmap package, load the data, and keep only collisions with longitude and latitude information.

library(ggmap)

d=read.csv('.../NYPD_Motor_Vehicle_Collisions.csv')
# keep only collisions whose LOCATION field contains coordinates
d_clean=d[which(regexpr(',',d$LOCATION)!=-1),]

#### 1. Clean Data ####
# get long and lat coordinates from concatenated "location" var
comm=regexpr(',',d_clean$LOCATION)
d_clean$loc=as.character(d_clean$LOCATION)
d_clean$lat=as.numeric(substr(d_clean$loc,2,comm-1))
d_clean$long=as.numeric(substr(d_clean$loc,comm+1,nchar(d_clean$loc)-1))

# create year variable
d_clean$year=substr(d_clean$DATE,7,10)
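
The street-to-color functions themselves are behind the link below. As a rough sketch, one way to build such a mapping from the dataset's ON STREET NAME field (the aggregation and palette here are my assumptions, not the post's actual code):

# hypothetical sketch: assign each street a color by its collision count
street_counts <- table(d_clean$ON.STREET.NAME)            # collisions per street
pal <- colorRampPalette(c('#f2f2f2','red'))               # light-to-dark palette
street_colors <- pal(max(street_counts))[street_counts]   # darker = more collisions
names(street_colors) <- names(street_counts)              # look up a color by street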

Continue reading NYC Motor Vehicle Collisions – Street-Level Heat Map

Motor Vehicle Collision Density in NYC

[Figure: collision density by borough]

In a previous post, I visualized crime density in Boston using R’s ggmap package. Here, I use ggmap to visualize vehicle accidents in New York City. R code and the data source are included below.

The data comes from NYC Open Data; my cut of it ranges from 2012 to 2015. The data tracks the type of vehicle, the name of the street on which the accident occurred, and the longitude and latitude coordinates of the accident. The coordinates are saved together in a single character variable called “LOCATION”.

Below, I load the ggmap and gridExtra packages, load the data, drop all accidents without location coordinates, and parse the LOCATION variable to get the longitude and latitude coordinates. I also parse the date variable to create a year variable and use it to build two data sets: one with all vehicle accidents in 2013 and another with all vehicle accidents in 2014.

library(ggmap)
library(gridExtra)

d=read.csv('.../NYPD_Motor_Vehicle_Collisions.csv')

# drop accidents without location coordinates
d_clean=d[which(regexpr(',',d$LOCATION)!=-1),]

# parse the "(lat, long)" LOCATION string into numeric lat/long columns
comm=regexpr(',',d_clean$LOCATION)
d_clean$loc=as.character(d_clean$LOCATION)
d_clean$lat=as.numeric(substr(d_clean$loc,2,comm-1))
d_clean$long=as.numeric(substr(d_clean$loc,comm+1,nchar(d_clean$loc)-1))

# parse year from the mm/dd/yyyy DATE field
d_clean$year=substr(d_clean$DATE,7,10)

# split accident coordinates into 2013 and 2014 data sets
d_2013=d_clean[which(d_clean$year=='2013'),c('long','lat')]
d_2014=d_clean[which(d_clean$year=='2014'),c('long','lat')]
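
The maps themselves are behind the link below, but with ggmap and gridExtra loaded, a minimal sketch of the side-by-side density comparison this sets up might look like the following (the map center, zoom, and plotting choices are my assumptions, not necessarily the post's):

# hypothetical sketch: 2013 vs. 2014 collision densities side by side
nyc <- get_map(location = 'New York City', zoom = 11)

density_map <- function(dat, title) {
  ggmap(nyc) +
    stat_density2d(data = dat, aes(x = long, y = lat, fill = ..level..),
                   geom = 'polygon', alpha = .3, bins = 8) +
    ggtitle(title) + theme(legend.position = 'none')
}

grid.arrange(density_map(d_2013, '2013'), density_map(d_2014, '2014'), ncol = 2)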

Continue reading Motor Vehicle Collision Density in NYC

Visualizing Hubway Trips in Boston

Most Popular Hubway Stations (in order):

  1. Post Office Sq. – located in the heart of the financial district.
  2. Charles St. & Cambridge – the first Hubway stop after crossing the Longfellow Bridge from Cambridge.
  3. Tremont St & West – east side of the Boston Common.
  4. South Station
  5. Cross St. & Hanover – entrance to the North End coming from the financial district.
  6. Boylston St & Berkeley – between Copley and the Common.
  7. Stuart St & Charles – Theatre district, just south of the Common.
  8. Boylston & Fairfield – located in front of the Boylston Street entrance to the Pru.
  9. The Esplanade (Beacon St & Arlington) – this stop is on the north end of the Esplanade running along the Charles.
  10. Chinatown Gate Plaza
  11. Prudential at Belvidere
  12. Boston Public Library on Boylston
  13. Boston Aquarium
  14. Newbury Street & Hereford
  15. Government Center

Continue reading Visualizing Hubway Trips in Boston

Drug Crime Density in Boston

Drug Crimes in Boston with high-density areas shown using blue gradient shading. Each shade represents 1/8 of the data. Crime scenes are shown using gray dots.

The original map can be found in my previous post, Spatial Data Visualization with R.

As a review, I use the get_map function in the ggmap package to grab a map of Boston from Google Maps. The crime report data can be found at https://data.cityofboston.gov/. In the code below, I bring in the crime data as a csv file. The data contains one observation per crime, including a description, a crime category (drug, traffic violation, etc.), and the longitude/latitude coordinates of the crime scene.

I added density areas using the stat_density2d function. I feed this function a set of coordinates using the x= and y= parameters. The alpha parameter adjusts transparency, with 1 being fully opaque and 0 fully transparent. I set the fill to vary with the number of points in the shaded area (..level..). I also specify bins=8, which gives us 7 shades of blue. The density areas can be interpreted as follows: all the shaded areas together contain 7/8 of the drug crimes in the data, and each shade represents 1/8 of them. Since every shade represents the same proportion of the data, the smaller the area of a particular shade, the higher the arrest density.
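
As a rough illustration of that layering (the data frame name drugs and its Long/Lat column names are my placeholders, and the map call is an assumption rather than the post's actual code):

library(ggmap)

# hypothetical sketch of the layers described above
boston <- get_map(location = 'Boston', zoom = 13)    # base map from Google Maps
ggmap(boston) +
  geom_point(data = drugs, aes(x = Long, y = Lat),   # gray crime-scene dots
             col = 'gray40', alpha = .5, size = .8) +
  stat_density2d(data = drugs,                       # shaded density regions
                 aes(x = Long, y = Lat, fill = ..level..),
                 geom = 'polygon', alpha = .3, bins = 8) +
  scale_fill_gradient(low = '#f2f2f2', high = 'blue')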

R code is given below.

//update: Thanks to Verbal for the meaningful distinction between crimes and arrests. It’s not entirely clear what this data actually tracks. My best guess is that it is reported crime, as called in by people; I don’t think every crime here leads to an arrest. I could be wrong.

Continue reading Drug Crime Density in Boston

Finding and Plotting Lorenz Solution using MATLAB

I use MATLAB to solve the following Lorenz initial value problem:
\begin{cases} x'=-10(x-y) \\ y'=-x(z-28)-y \\ z'=xy-\frac{8}{3}z \\ x(0)=y(0)=z(0)=5 \end{cases}

I wrote a function, LorenzRK4IVP(), that takes the system of three differential equations as input and solves it using the fourth-order Runge-Kutta method with step size h=.01. I plot the strange attractor and use MATLAB to produce a GIF of the solution.
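
The MATLAB function itself is behind the link below; for readers following along in R (the language used elsewhere on this blog), a minimal RK4 loop for this IVP might look like the following (all names here are mine, not the post's):

# minimal RK4 sketch for the Lorenz IVP above (an R translation, not the post's MATLAB)
lorenz <- function(v) c(-10*(v[1]-v[2]),          # x'
                        -v[1]*(v[3]-28)-v[2],     # y'
                        v[1]*v[2]-(8/3)*v[3])     # z'

h <- .01; steps <- 5000                           # step size h=.01, 50 time units
path <- matrix(0, steps+1, 3)
path[1,] <- c(5,5,5)                              # x(0)=y(0)=z(0)=5
for (n in 1:steps) {
  v  <- path[n,]
  k1 <- lorenz(v)
  k2 <- lorenz(v + h/2*k1)
  k3 <- lorenz(v + h/2*k2)
  k4 <- lorenz(v + h*k3)
  path[n+1,] <- v + h/6*(k1 + 2*k2 + 2*k3 + k4)
}
plot(path[,1], path[,3], type='l')                # x-z projection of the attractor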

[Figures: the computed Lorenz attractor and an animated GIF of the solution]

Continue reading Finding and Plotting Lorenz Solution using MATLAB

Interactive Cobb-Douglas Web App with R

[Screenshot: the interactive Cobb-Douglas production surface]

I used Shiny to make an interactive Cobb-Douglas production surface in R. It reacts to the user’s chosen shares of labor and capital and allows the user to rotate the surface. The contour plot (isoquants) is also dynamic.

Shiny works using two R scripts stored in the same folder. One script defines the user interface (UI) and the other handles the server-side computation.

On the UI side, I take user inputs for figure rotation and capital/labor shares via sliders and output a plot of the surface and a plot of the isoquants.

library(shiny)

shinyUI(pageWithSidebar(
  headerPanel("Cobb-Douglas Production Function"),
  sidebarPanel(
    sliderInput("L","Share of Labor:",
                min=0, max=1, value=.5, step=.1),
    sliderInput("C","Share of Capital:",
                min=0, max=1, value=.5, step=.1),  
    sliderInput("p1","Rotate Horizontal:",
              min=0, max=180, value=40, step=10),
    sliderInput("p2","Rotate Vertical:",
            min=0, max=180, value=20, step=10)
    ),
  mainPanel( plotOutput("p"),
             plotOutput('con'))  
))

The manipulation of the inputs is done on the server side:

library(shiny)
options(shiny.reactlog=TRUE)

shinyServer(function(input,output){
  observe({
  # create x, y ranges for plots
  x=seq(0,200,5)
  y=seq(0,200,5)
  
  # Cobb-Douglas model
  model<-  function(a,b){
      (a**(as.numeric(input$L)))*(b**(as.numeric(input$C)))
  }

  # generate surface values at given x-y points
  z<-outer(x,y,model)

  # create gradient for surface
  pal<-colorRampPalette(c("#f2f2f2", "Blue"))
  colors<-pal(max(z))
  colors2<-pal(max(y)) 
  
  # plot functions
  output$p<-renderPlot({persp(x,y,z,
                              theta=as.numeric(input$p1),
                              phi=as.numeric(input$p2),
                              col=colors[z],
                              xlab="Share of Labor",
                              ylab="Share of Capital",
                              zlab="Output"
                      )})
  # contour() draws a 2-D plot, so the persp() rotation angles don't apply here
  output$con<-renderPlot({contour(x,y,z,
                              col=colors2[y],
                              xlab="Share of Labor",
                              ylab="Share of Capital"
  )})
  })
})
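
With the two scripts saved as ui.R and server.R in a single folder, the app launches locally via runApp (the folder name below is arbitrary):

library(shiny)
runApp('cobb-douglas-app')   # path to the folder holding ui.R and server.R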

Iterative OLS Regression Using Gauss-Seidel

Figure shows 10 iterations of Gauss-Seidel’s OLS estimates. Estimates get successively closer to the true line, shown in green.

I just finished covering a few numerical techniques for solving systems of equations, which can be applied to find best-fit lines through a given set of data points.

The four points \{(0,0), (1,3), (2,3), (5,6)\} are arranged into an inconsistent system of four equations in two unknowns:

b+a(0) = 0 \\  b+a(1) = 3 \\  b+a(2) = 3 \\  b+a(5) = 6

The system can be represented in matrix form:

\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 5 \end{bmatrix}  \begin{bmatrix} b \\ a \end{bmatrix}  =  \begin{bmatrix} 0 \\ 3 \\ 3 \\ 6 \end{bmatrix}

The least-squares solution vector can be found by solving the normal equations

A^TA\vec{x_*}=A^T\vec{b}

Computing A^TA and A^T \vec{b} yields the following matrix system:

\begin{bmatrix} 4 & 8 \\ 8 & 30 \end{bmatrix}  \begin{bmatrix} b \\ a \end{bmatrix}  =  \begin{bmatrix} 12 \\ 39 \end{bmatrix}
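
As a quick check of that arithmetic in R (the object names here are mine):

A4 <- cbind(rep(1,4), c(0,1,2,5))   # design matrix built from the four points
y4 <- c(0,3,3,6)                    # right-hand side
t(A4) %*% A4                        # returns the 2x2 matrix [4 8; 8 30]
t(A4) %*% y4                        # returns the vector [12, 39]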

The system can be solved using the Gauss-Seidel method.

Below, I run 10 iterations of Gauss-Seidel (visualized in the figure above). The estimated line gets successively closer to the true solution, shown in green. The estimates are drawn in blue, with each iteration in a darker shade than the last (see the highlighted lines).
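
Concretely, with A split into its strictly lower triangle L, its diagonal D, and its strictly upper triangle U (exactly the L, D, and U built in the code), each Gauss-Seidel step computes

\vec{x}^{(k+1)}=(L+D)^{-1}\left(\vec{b}-U\vec{x}^{(k)}\right)

which is the update inside the while loop.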

library(MASS) # package needed for ginv() function 

A=cbind(c(4,8),c(8,30)) # coefficient matrix
b<-t(t(c(12,39))) # b-vector
x0<-t(t(c(0,0)))  # starting vector
iter<-10          # number of Gauss-Seidel iterations to run

L<-lower.tri(A)*A  # lower triang. A
U<-upper.tri(A)*A  # upper triang. A
D<-diag(diag(A))   # diag of A

# plot points 
xc<-c(0,1,2,5)
yc<-c(0,3,3,6)
plot(xc,yc,
     col='red',
     xlab="X-Values",
     ylab="Y-Values",
     bty='n')
title(main="OLS Estimates")
legend("bottomright",
       c("Data","Estimates","True Line"),
       lty=c(0,1,1),
       pch=c(1,NA,NA),
       col=c("red","blue","green"),
       bty='n')

# create color palette - lines will get darker with each iter
pal<-colorRampPalette(c("#f2f2f2", "Blue"))
colors<-pal(iter) # creates color palette of length(iter)

# plot true line
abline(a=.8571,b=1.0714,col='green') # abline's a = intercept, b = slope

n<-1
while(n<=iter){
  print(x0)
  
  # Gauss-Seidel update: x1 = (L+D)^(-1) (b - U x0)
  x1<-ginv(L+D)%*%((-U%*%x0)+b)
  x0<-x1
  
  # plot estimated line (shades darken as the estimates converge)
  abline(a=as.numeric(x0[1,1]), # y-intercept of estimated line
         b=as.numeric(x0[2,1]), # slope of estimated line
         col=colors[n])         # pick nth color in palette
  n<-n+1
}