In a previous post, I visualized crime density in Boston using R’s ggmap package. In this post, I use ggmap to visualize vehicle accidents in New York City. R code and data are included in this post.

The data comes from NYC Open Data. My data cut ranges from 2012 to 2015. The data tracks the type of vehicle, the name of the street on which the accident occurs, as well as the longitude and latitude coordinates of the accident. Both coordinates are saved as a single character variable called “LOCATION”.

Below, I load the ggmap and gridExtra packages, load the data, drop all accidents without location coordinates, and parse the LOCATION variable to get the longitude and latitude coordinates. I also parse the date variable to create a year variable and use that variable to create two data sets: one with all vehicle accidents in 2013 and another with all vehicle accidents in 2014

As a review, I use get_map function in the ggmap package to grab a map of Boston from Google Maps. The “crime report” data can be found at https://data.cityofboston.gov/. In the code below, I bring in crime data as a csv file. The data contains one observation per crime. It includes a description, crime category (drug, traffic violation, etc), and the longitude/latitude coordinates of the crime scene.

I added density areas using stat_density2d function. I feed this function a set of coordinates using the x= and y= parameters. The alpha parameter adjust transparency with 1 being perfect solid and 0 being fully transparent. I set the fill to vary with the number of points in the shaded area (..level..). I also specify bins=8, which gives us 7 shades of blue. The density areas can be interpreted as follows: all the shaded areas together contain 7/8 of drug crimes in the data. Each shade represents 1/8 of drug crimes in the data. Since all shades represent the same proportion of the data, the smaller the area of a particular shade, the higher the arrest density.

R code is given below.

//update: Thanks to Verbal for the meaningful distinction between crimes and arrest. It’s not really clear what this data actually tracks. I’m sure that this is reported crime, as called in by people. I don’t think every crime here leads to an arrest. I could be wrong.

I use MATLAB to solve the following Lorenz initial value problem:

I wrote a function, LorenzRK4IVP(), that takes the system of three differential equations as input and solves the system using the Runge-Kutta method with step size . I plot the strange attractor as well as use MATLAB to produce a GIF of the solution.

I used Shiny to make an interactive cobb-douglass production surface in R. It reacts to user’s share of labor and capital inputs and allows the user to rotate the surface. The contour plot (isoquants) is also dynamic.

Shiny works using two R codes stored in the same folder. One R code works on the user interface (UI) side and the other works on the server side.

On the UI side, I take user inputs for figure rotations and capital/labor inputs via slidebars and output a plot of the surface and isoquants.

I just finished covering a few numerical techniques for solving systems of equations, which can be applied to find best-fit lines through a give set of data points.

The four points are arranged into an inconsistent system of four equations and two unknowns:

The system can be represented in matrix form:

The least-squares solution vector can be found by solving the normal equations

Solving for and yields the following matrix system:

Below, I run 10 iterations of Gauss-Seidel (visualized in the figure above). The estimated line gets successively closer to the true solution in green. The estimates are shown in blue – each iteration is shown in a darker shade than the next (see highlighted lines).

library(MASS) # package needed for ginv() function
A=cbind(c(4,8),c(8,30)) # coefficient matrix
b<-t(t(c(12,39))) # b-vector
x0<-t(t(c(0,0))) # starting vector
iter<-10 # number of Gauss-Seidel iterations to run
L<-lower.tri(A)*A # lower triang. A
U<-upper.tri(A)*A # upper triang. A
D<-diag(diag(A)) # diag of A
# plot points
xc<-c(0,1,2,5)
yc<-c(0,3,3,6)
plot(xc,yc,
col='red',
xlab="X-Values",
ylab="Y-Values",
bty='n')
title(main="OLS Estimates")
legend("bottomright",
c("Data","Estimates","True Line"),
lty=c(0,1,1),
pch=c(1,NA,NA),
col=c("red","blue","green"),
bty='n')
# create color palette - lines will get darker with each iter
pal<-colorRampPalette(c("#f2f2f2", "Blue"))
colors<-pal(iter) # creates color palette of length(iter)
# plot true line
abline(a=1.0714,b=.8571,col='green')
n<-1
while(n<=iter){
print(x0)
# Gauss-Seidel formula
x1<-(ginv((L+D)))%*%((-U%*%x0)+b)
x0<-x1
n=n+1
# plot estimated line
abline(a=as.numeric(x0[2,1]), # slope of estimated line
b=as.numeric(x0[1,1]), # y-intercept of estimated line
col=colors[n]) # pick nth color in palette
}

I’ve been fooling around with spatial data lately. As it turns out, there are some great R packages for visualizing this kind of data.

Below is a set of charts I put together. It’s a good sample of the possibilities.

Motherjones.com keeps a dataset with characteristics of every mass shooting since 1983. The location of each shooting is marked on the map below with a red circle. The size of the circle is determined by the number of fatalities. Newtown and Virginia Tech (both school shootings) are among the deadliest within this time period.

In the vast majority of these cases, the shooters were white males with a history of mental illness who obtained their weapons legally.

# Mass Shootings
# packages used: rworldmap rworldxtra
# data source: www.motherjones.com
US <- getMap(resolution = "high")
plot(US,xlim=c(-125,-65),ylim=c(39,39), asp=1.31803)
title(main="Mass Shootings 1982-2013")
points(d$longitude,d$latitude,col="red",cex=d$Fatalities*.25)
text(-69.31142,37.21232,"Newtown")
text(-72.41394,30.22957,"Virginia Tech")
text(-111.04308,38.55200,"San Ysidro \n McDonald's Massacre")
text(-89.72780,25.9,"Luby's Massacre")
#using locator() -- add lines from circles to labels
points(c(-77.67630,-72.99422),c(36.08547,31.16065),type='l')
points(c(-71.71729, -69.05702),c(39.79927,37.94237),type='l')
points(c(-96.51104, -92.68024),c(29.62669,26.23582),type='l')
points(c(-115.8778, -111.4086),c(33.98637, 36.73135),type='l')

R is flexible with spatial data. It can zoom out of the United States and display global data. Malaysia Airlines has been in the news a few times over the last year, so it’s a pretty topical example. We can plot all of the Malaysia airline’s routes below using data from openflights.org. In the last chart, magnitude was shown using the size of the circles, but here we can show magnitude using the shade of the routes. Routes to popular destinations are a brighter shade of blue.

I’ve also plotted the Air France and American Airlines routes. The actual mapping was easy to do. I used a combination of xworldmap and xworldxtra for the world map along with geosphare for the route arcs.

# Airline Data
# Packages: rworldmap rworldxtra geosphere
# Source: OpenFlights.org; flowingdata.com
# plot world map
map("world", col="grey15", fill=TRUE, bg="Black")
#create 100 shades of blue
pal <- colorRampPalette(c("#f2f2f2", "Blue"))
colors <- pal(100)
#plot each route
attach(gs)
for(i in 1:length(S_Long)){
inter <- gcIntermediate(cbind(gs[i,]$S_Long, gs[i,]$S_Lat),
cbind(gs[i,]$D_Long, gs[i,]$D_Lat), n=100)
index<-round( (Dest_Count/max(Dest_Count))*length(colors))
lines(inter, col=colors[index], lwd=.2)
}
title(main="American Airline Routes",col.main="Blue")

I saved the best for last.

Ggmap allows R to fetch maps directly from Google and zoom into specific cities. Below is a map of Boston showing crime locations in 2014. The red dots represent shootings and blue dots represent drug offenses. I downloaded the data from data.cityofboston.gov Darker red areas represent more shooting events at that location. Most of the shootings seem to be clustered around Brookline/Roxbury.

If we zoom into the center of Boston, we see much fewer shootings. There are still many drug busts, but they’re concentrated in mainly three areas: Chinatown (shocker!), East Boston, and the South End.

bos_2<-boston[which(boston$Shooting=='Yes' & boston$Year=='2014'),]
bos_3<-boston[which(boston$INCIDENT_TYPE_DESCRIPTION=='DRUG CHARGES' & boston$Year=='2014'),]
bos_plot<-ggmap(get_map("Boston, Massachusetts",zoom=13))
bos_plot+geom_point(data=bos_2,aes(x=bos_2$Lat,y=bos_2$Long),
col='red',alpha=.5,
size=5)+geom_point(data=bos_3,aes(x=bos_3$Lat,y=bos_3$Long),
col='blue',alpha=.5,
size=2)