Category Archives: Data Visualization

Stop and Frisk: Spatial Analysis of Racial Differences

Stops in 2014. Red contour lines indicate high white stop density areas and blue shades indicate high black stop density areas.
Notice that high white stop density areas are very different from high black stop density areas.
The star in Brooklyn marks the location where officers Liu and Ramos were killed. The star on Staten Island marks the location of Eric Garner’s death.

In my last post, I compiled and cleaned publicly available data on over 4.5 million stops over the past 11 years.

I also presented preliminary summary statistics showing that blacks had been consistently stopped 3-6 times as often as whites over the last decade in NYC.

Since the last post, I managed to clean and reformat the coordinates marking the locations of the stops. While I compiled data from 2003-2014, coordinates were only available for 2004 and 2007-2014. All the code can be found in my GitHub repository.

My goals were to:

  • See if blacks and whites were being stopped at the same locations
  • Identify areas with especially high amounts of stops and see how these areas changed over time.

Killing two birds with one stone, I made density plots to identify areas with high and low stop densities. Snapshots were taken in 2 year intervals from 2007-2013. Stops of whites are indicated in red contour lines and stops of blacks are indicated in blue shades.
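The two-group overlay can be sketched with ggplot2's density geoms. This is a minimal, self-contained illustration with simulated coordinates and no basemap (the real plots layer these geoms on a Google map via ggmap, and the column names here are made up for illustration):

```r
library(ggplot2)

# Simulated stop coordinates for two groups -- the real data has
# one row per stop with a race indicator and long/lat columns
set.seed(1)
stops <- data.frame(
  long = c(rnorm(500, -73.95, .03), rnorm(500, -73.90, .03)),
  lat  = c(rnorm(500,  40.70, .03), rnorm(500,  40.75, .03)),
  race = rep(c("White", "Black"), each = 500)
)

# Blue filled shades for one group, red contour lines for the other
p <- ggplot() +
  stat_density2d(data = subset(stops, race == "Black"),
                 aes(x = long, y = lat, fill = ..level..),
                 geom = "polygon", alpha = .5, bins = 8) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  stat_density2d(data = subset(stops, race == "White"),
                 aes(x = long, y = lat), color = "red", bins = 8)
p
```

Drawing one group as filled shades and the other as contour lines keeps the two densities legible where they overlap.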


Continue reading Stop and Frisk: Spatial Analysis of Racial Differences

Modeling Ebola Contagion Using Airline Networks in R

I first became interested in networks when reading Matthew O. Jackson’s 2010 paper describing their application to economics. During the 2014 Ebola outbreak, there was a lot of concern over the disease spreading to the U.S. I was caught up with work/classes at the time, but decided to use airline flight data to at least explore the question.

The source for the data can be found in a previous post on spatial data visualization.

I assumed that the disease had a single origin (Liberia) and wanted to explore the question of how the disease could travel to the U.S. through a network of airline flights.

A visualization of the network can be seen below. Each node is a country and each edge represents an existing airline route from one country to another. Flights that take off and land in the same country are omitted to avoid clutter.

Each node is a country and each edge represents an existing airline route between two countries. Flights beginning and ending in the same country are not represented for clarity.
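A network like this can be built and queried with the igraph package. The routes below are a toy stand-in for the full airline data, just to show the idea:

```r
library(igraph)

# Toy edge list standing in for the full airline-route data --
# these routes are illustrative, not the actual dataset
routes <- data.frame(
  from = c("Liberia", "Liberia", "Ghana", "Morocco", "France", "UK"),
  to   = c("Ghana",   "Morocco", "UK",    "France",  "USA",    "USA")
)
g <- graph_from_data_frame(routes, directed = TRUE)

# One question the network lets us ask: how few flights connect
# the assumed origin (Liberia) to the U.S.?
path <- shortest_paths(g, from = "Liberia", to = "USA")$vpath[[1]]
names(path)
```

On the real network, the same shortest_paths call gives the minimum number of flights connecting Liberia to the U.S.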

Continue reading Modeling Ebola Contagion Using Airline Networks in R

NYC Motor Vehicle Collisions – Street-Level Heat Map

In this post I will extend a previous analysis that created a borough-level heat map of NYC motor vehicle collisions. The data is from NYC Open Data. In particular, I will go from borough-level to street-level collisions. The processing code is very similar to the previous analysis, with a few more functions that map streets to colors. Below, I load the ggmap package and the data, and keep only collisions with longitude and latitude information.

library(ggmap)

# load the data and keep only collisions whose LOCATION field
# contains coordinates (a comma separates latitude and longitude)
d=read.csv('.../NYPD_Motor_Vehicle_Collisions.csv')
d_clean=d[which(regexpr(',',d$LOCATION)!=-1),]

#### 1. Clean Data ####
# get long and lat coordinates from concatenated "location" var
comm=regexpr(',',d_clean$LOCATION)
d_clean$loc=as.character(d_clean$LOCATION)
d_clean$lat=as.numeric(substr(d_clean$loc,2,comm-1))
d_clean$long=as.numeric(substr(d_clean$loc,comm+1,nchar(d_clean$loc)-1))

# create year variable
d_clean$year=substr(d_clean$DATE,7,10)
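The street-to-color functions mentioned above boil down to counting collisions per street and indexing into a color ramp. A minimal sketch of that idea (the toy data frame and its street column are made up for illustration):

```r
# Toy stand-in for the cleaned collision data
d_clean <- data.frame(
  street = c("BROADWAY", "BROADWAY", "BROADWAY", "ATLANTIC AVE", "MAIN ST")
)

# Count collisions per street, then map each count to a color:
# the busiest street gets the deepest red
counts <- sort(table(d_clean$street), decreasing = TRUE)
pal <- colorRampPalette(c("yellow", "red"))(max(counts))
street_colors <- pal[counts]
names(street_colors) <- names(counts)
street_colors
```

The resulting named vector can then be used to color each street's segments when plotting.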

Continue reading NYC Motor Vehicle Collisions – Street-Level Heat Map

Motor Vehicle Collision Density in NYC


In a previous post, I visualized crime density in Boston using R’s ggmap package. In this post, I use ggmap to visualize vehicle accidents in New York City. R code and data are included in this post.

The data comes from NYC Open Data. My data cut ranges from 2012 to 2015. The data tracks the type of vehicle, the name of the street on which the accident occurs, as well as the longitude and latitude coordinates of the accident. Both coordinates are saved as a single character variable called “LOCATION”.

Below, I load the ggmap and gridExtra packages, load the data, drop all accidents without location coordinates, and parse the LOCATION variable to get the longitude and latitude coordinates. I also parse the date variable to create a year variable and use that variable to create two data sets: one with all vehicle accidents in 2013 and another with all vehicle accidents in 2014.

library(ggmap)
library(gridExtra)

d=read.csv('.../NYPD_Motor_Vehicle_Collisions.csv')

# keep only accidents whose LOCATION field contains coordinates
d_clean=d[which(regexpr(',',d$LOCATION)!=-1),]

# parse the "(lat, long)" string into numeric coordinates
comm=regexpr(',',d_clean$LOCATION)
d_clean$loc=as.character(d_clean$LOCATION)
d_clean$lat=as.numeric(substr(d_clean$loc,2,comm-1))
d_clean$long=as.numeric(substr(d_clean$loc,comm+1,nchar(d_clean$loc)-1))

# create year variable from the date string
d_clean$year=substr(d_clean$DATE,7,10)

d_2013=d_clean[which(d_clean$year=='2013'),c('long','lat')]
d_2014=d_clean[which(d_clean$year=='2014'),c('long','lat')]
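From here, each year's data set can be turned into a density panel and the two placed side by side with gridExtra. A rough sketch with simulated coordinates and no basemap (the real plots draw these layers on a Google map via ggmap):

```r
library(ggplot2)
library(gridExtra)

# Simulated coordinates standing in for d_2013 / d_2014
set.seed(2)
d_2013 <- data.frame(long = rnorm(300, -73.95, .05), lat = rnorm(300, 40.72, .05))
d_2014 <- data.frame(long = rnorm(300, -73.95, .05), lat = rnorm(300, 40.72, .05))

# One shaded density panel per year
density_panel <- function(df, title) {
  ggplot(df, aes(x = long, y = lat)) +
    stat_density2d(aes(fill = ..level..), geom = "polygon", bins = 8) +
    ggtitle(title)
}

# Arrange the two years side by side for comparison
grid.arrange(density_panel(d_2013, "2013"),
             density_panel(d_2014, "2014"), ncol = 2)
```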

Continue reading Motor Vehicle Collision Density in NYC

Visualizing Hubway Trips in Boston

Most Popular Hubway Stations (in order):

  1. Post Office Sq. – located in the heart of the financial district.
  2. Charles St. & Cambridge – the first Hubway stop after crossing from Cambridge over Longfellow Bridge.
  3. Tremont St & West – East side of the Boston Common
  4. South Station
  5. Cross St. & Hanover – entrance to the North End coming from the financial district.
  6. Boylston St & Berkeley – between Copley and the Common.
  7. Stuart St & Charles – Theatre district, just south of the Common.
  8. Boylston & Fairfield – located in front of the Boylston Street entrance to the Pru.
  9. The Esplanade (Beacon St & Arlington) – this stop is on the north end of the Esplanade running along the Charles.
  10. Chinatown Gate Plaza
  11. Prudential at Belvidere
  12. Boston Public Library on Boylston
  13. Boston Aquarium
  14. Newbury Street & Hereford
  15. Government Center

Continue reading Visualizing Hubway Trips in Boston

Drug Crime Density in Boston

Drug crimes in Boston with high-density areas shown using blue gradient shading. Each shade represents 1/8 of the data. Crime scenes are shown using gray dots.

The original map can be found in my previous post, Spatial Data Visualization with R.

As a review, I use the get_map function in the ggmap package to grab a map of Boston from Google Maps. The crime report data can be found at https://data.cityofboston.gov/. In the code below, I bring in the crime data as a csv file. The data contains one observation per crime. It includes a description, a crime category (drug, traffic violation, etc.), and the longitude/latitude coordinates of the crime scene.

I added density areas using the stat_density2d function. I feed this function a set of coordinates using the x= and y= parameters. The alpha parameter adjusts transparency, with 1 being fully opaque and 0 being fully transparent. I set the fill to vary with the number of points in the shaded area (..level..). I also specify bins=8, which gives us 7 shades of blue. The density areas can be interpreted as follows: all the shaded areas together contain 7/8 of drug crimes in the data, and each shade represents 1/8 of drug crimes in the data. Since all shades represent the same proportion of the data, the smaller the area of a particular shade, the higher the arrest density.

R code is given below.
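As a rough sketch of the layering described above, with simulated coordinates and no basemap (the real plot draws these layers on a Google map fetched with get_map):

```r
library(ggplot2)

# Simulated drug-crime coordinates, standing in for the Boston data
set.seed(3)
drug <- data.frame(long = rnorm(400, -71.08, .02),
                   lat  = rnorm(400,  42.33, .02))

p <- ggplot(drug, aes(x = long, y = lat)) +
  # gray dots mark individual crime scenes
  geom_point(color = "grey60", size = .5) +
  # shaded density regions; each shade covers an equal share of the data
  stat_density2d(aes(fill = ..level..), geom = "polygon",
                 alpha = .5, bins = 8) +
  scale_fill_gradient(low = "lightblue", high = "darkblue")
p
```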

//update: Thanks to Verbal for the meaningful distinction between crimes and arrests. It’s not really clear what this data actually tracks. My guess is that this is reported crime, as called in by people. I don’t think every crime here leads to an arrest. I could be wrong.

Continue reading Drug Crime Density in Boston

Interactive Cobb-Douglas Web App with R


I used Shiny to make an interactive Cobb-Douglas production surface in R. It reacts to the user’s chosen shares of labor and capital and allows the user to rotate the surface. The contour plot (isoquants) is also dynamic.
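For reference, the surface being plotted is the Cobb-Douglas production function F(L, K) = L^a * K^b. A minimal base-R version of the calculation the app performs:

```r
# Cobb-Douglas production: F(L, K) = L^a * K^b
# (a = b = 0.5 here, matching the app's default slider values)
model <- function(L, K, a = 0.5, b = 0.5) (L^a) * (K^b)

# Evaluate the surface on the same grid the server code uses
x <- seq(0, 200, 5)
z <- outer(x, x, model)

model(100, 100)  # -> 100; with a + b = 1, returns to scale are constant
```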

Shiny works using two R scripts stored in the same folder. One script handles the user interface (UI) side and the other handles the server side.

On the UI side, I take user inputs for figure rotation and capital/labor shares via sliders and output a plot of the surface and isoquants.

library(shiny)

shinyUI(pageWithSidebar(
  headerPanel("Cobb-Douglas Production Function"),
  sidebarPanel(
    sliderInput("L","Share of Labor:",
                min=0, max=1, value=.5, step=.1),
    sliderInput("C","Share of Capital:",
                min=0, max=1, value=.5, step=.1),  
    sliderInput("p1","Rotate Horizontal:",
              min=0, max=180, value=40, step=10),
    sliderInput("p2","Rotate Vertical:",
            min=0, max=180, value=20, step=10)
    ),
  mainPanel( plotOutput("p"),
             plotOutput('con'))  
))

The manipulation of the inputs is done on the server side:

library(shiny)
options(shiny.reactlog=TRUE)

shinyServer(function(input,output){
  observe({
  # create x, y ranges for plots
  x=seq(0,200,5)
  y=seq(0,200,5)
  
  # Cobb-Douglas model
  model<-  function(a,b){
      (a**(as.numeric(input$L)))*(b**(as.numeric(input$C)))
  }

  # generate surface values at given x-y points
  z<-outer(x,y,model)

  # create gradient for surface
  pal<-colorRampPalette(c("#f2f2f2", "Blue"))
  colors<-pal(max(z))
  colors2<-pal(max(y)) 
  
  # plot functions
  output$p<-renderPlot({persp(x,y,z,
                              theta=as.numeric(input$p1),
                              phi=as.numeric(input$p2),
                              col=colors[z],
                              xlab="Share of Labor",
                              ylab="Share of Capital",
                              zlab="Output"
                      )})
  output$con<-renderPlot({contour(x,y,z,
                              col=colors2[y],
                              xlab="Share of Labor",
                              ylab="Share of Capital"
  )})
  })
})