Miami’s top 10 Jail Bookings

Hello World!

So I’ve been working on building some interesting visualizations with open data.  Today I get to show off a really interesting one, not only will we discuss the visualization in depth, but also dive into how I built it.  And here it is, the top 10 bookings in Miami where the legend is in descending order for most common bookings holistically.


So some interesting key points about the chart.  Notice that you can click on any item in the legend to draw or not draw the stack on that graphic.  This allows us to look at individual arrests as well as the aggregation of arrests.  Also notice how you can click and drag on areas of the chart to get a nice zoomed in view of that section of data.  One interesting item is that it appears there is a trend for summer having more issues.  This is of course difficult to defend as we do not have a whole ton of data here.

Drivers License issues seem to also beak in August and there appears to be some form of minor peak for DUIs in August as well.  All in all, my guess is we will see a nice bell curve form as we roll into and out of summer as it appears we are doing this on the 2015 and 2016 border, however it is really non-defensible as that is all the data we have.  As time goes on, perhaps the end of 2016 there will be more interesting information.

All in all, though I was surprised to see so much theft and battery as it relates to the other issues.  So to me this reads as though Miami’s top social problems as it pertains to sheer quantity are Drugs, Theft and Battery.  Sounds about right I suppose.

On to the discussion of what built this graphic.  The first thing I had to do was categorize every offense as something more generic.  There were over 900 unique offenses in the database.  After going through this, the number 1 problem Miami really has actually is folks not showing up to court when they are supposed to and officers have to go get them.  There were so many of those I ended up filtering it out of the data set.  There was no trend to it that I could tell easily, it was just all the time folks just did not want to go to jail (duh!).  Interesting to see though that our number 1 issue really is giving folks court dates and they never show.  Maybe we should just deal with it right there.

Anyways, before we get to the nitty gritty, I wanted to call out a few items.

  1. Notice the Axis Labels and Title.
    1. This is a bug in plotly’s R wrappers, if rendered in ggplot2, you get the below graphic.
  2. Notice the hover overs.
    1. Not ideal, there is a different workflow I want to discuss in a later article on how to resolve this issue.
    2. It should indicate the number of observations for that category and it should have more friendly verbiage.

Top10MiamiBookings

One last aside, I swear.  The big reason I have been using plotly, even though my graphics end up a little less professional due primarily to these small things is that the interactivity provided is just stellar.  Its really only on advanced graphics such as this one that I run into issues.  The issues I encounter however are all resolve-able, its just a matter of time commitment.

So to begin with, I started by exporting the data set as a .csv from here: https://opendata.miamidade.gov/Corrections/Jail-Bookings-May-29-2015-to-current/7nhc-4yqn 

Open data is a great place to get started with learning visualizations and experimenting with predictive analytics.  This data in its raw form is who, what, when where and multiple charges.

So on to the code, which I am recording a video today to explain 🙂

library(plotly)
library(plyr)
library(dplyr)
library(rjson)
library(lubridate)

Sys.setenv("plotly_username"="YOURS")
Sys.setenv("plotly_api_key"="YOURS")

#Read Data in.
correctionsInit = read.csv("C:\\data\\jailbookings.csv")
correctionsInit = correctionsInit %>% tbl_df()
corrections = correctionsInit

#Number of Charges we want to display
#ie top 10.
numCharges = 10

#Questions 
corrections$BookDate = as.Date(corrections$BookDate, format = "%m/%d/%Y")
#
#Rework some of the labelling.
#
ChargeLUT = c("BENCH WARRANT" = "Misc. Warrants",
              "PROBATION WARRANT" = "Misc. Warrants",
              "OUT-OF-CNTY/WARRANT" = "Misc. Warrants",
              "ALIAS CAPIAS" = "Misc. Warrants",
              "ASSAULT/AGGRAVATED" = "Assault",
              "DOMESTIC VIOL WARRNT" = "Misc. Warrants",
              "GRAND THEFT/ARM/CONS" = "Grand Theft",
              "GRAND THEFT 2ND DEG" = "Grand Theft",
              "GRAND THEFT 3RD DEG" = "Grand Theft",
              "COKE/SELL/DEL/W/INT" = "Drug Traffic",
              "CONT SUB/SELL/DEL" = "Drug Traffic",
              "DWLS/HABITUAL" = "Drivers Lisence",
              "DWLS/KNOWINGLY" = "Drivers Lisence",
              "CANN/TRF/25-2000LBS" = "Drug Traffic",
              "CANN/POSN/-20/ATTEMP" = "Drug Possession",
              "DWLS/3RD & SUBS OFFN" = "Drivers Lisence",
              "NO VALID DRIVERS LIC" = "Drivers Lisence",
              "CANNABIS/POSN/0-20" = "Drug Possession",
              "COCAINE/POSSESSION" = "Drug Possession",
              "ALC BEV/DRK IN PUBLC" = "Disorderly in Public",
              "DISORDERLY CONDUCT" = "Disorderly in Public",
              "PETIT THEFT" = "Theft",
              "PETIT THEFT 1D" = "Theft",
              "ASSAULT/AGG/FIREARM" = "Armed Assault",
              "ASSAULT/AGG/DWEAP" = "Armed Assault",
              "NO VALID DRIVERS LIC" = "Drivers Lisence",
              "GRD THEFT/3D/VEHICLE" = "Grand Theft",
              "CONT SUBS/POSS" = "Drug Possession",
              "ALCOHOL/PUB/MIA BCH" = "Disorderly in Public",
              "DL/EXPIRED 6 MTHS+" = "Drivers Lisence",
              "ALC/OPN CNTR/POSN" = "Drug Possession",
              "INDECENT EXPOSURE" = "Disorderly in Public",
              "DISORDERLY INTOX" = "Disorderly in Public",
              "RESIST OFF W/O VIOL" = "Resist Officer",
              "BATTERY/AGGRAVATED" = "Battery",
              "LOITERING OR PROWL" = "Loitering",
              "ARREST WARRANT" = "Misc. Warrants",
              "ROBBERY/STRONGARM" = "Theft",
              "RETAIL THEFT/300>" = "Theft",
              "BATTERY" = "Battery",
              "BATT/DOM/STRANGLE" = "Battery",
              "DRUG PARAPHERNA/POSN" = "Drug Possession",
              "TRES PROP/AFTER WARN" = "Misdemeanor",
              "BURGLARY/UNOCC CONVY" = "Theft",
              "CANN/SELL/DEL/PSNW/I" = "Drug Traffic",
              "BATTERY/LEO/COR/FIRE" = "Battery",
              "FUG WARR/OUT O STATE" = "Misc. Warrants",
              "TRESPASS/STRUC/UNOCC" = "Misdemeanor",
              "DIS COND/ESTAB" = "Disorderly in Public",
              "RECKLESS DRIVING" = "Reckless Driving",
              "RBRY/ARM/FA/DW - PBL" = "Theft",
              "BURGLARY/OCC/DWELL" = "Theft",
              "BATTERY/AGG/DWEAP" = "Armed Assault",
              "DUI/DAM/PROP/PERSON" = "DUI",
              "ALCOHOL/CURB DRNKNG" = "Disorderly in Public",
              "DUI/BREATH LEVEL" = "DUI",
              "BATTERY/FELONY" = "Battery",
              "ASSAULT" = "Assault",
              "PROBATION VIOLATION" = "Probation",
              "CRIM MIS/1000+" = "Misdemeanor",
              "CRIM MIS/200-1000" = "Misdemeanor",
              "BURGLARY/UNOCC/DWELL" = "Theft",
              "WRIT/BODILY ATTACH" = "Misc. Warrants",
              "PANHANDLING/BEG/SOL" = "Misdemeanor",
              "CONCEALED F/A /CARRY" = "Weapons",
              "CANNABIS/POSN/20+" = "Drug Possession",
              "BURGLARY/UNOCC/PREJU" = "Theft",
              "PROST/COMMIT/ENGAGE" = "Prostitution",
              "HOLD FOR AGENCY" = "Misc. Warrants",
              "BATTERY/AGG/PREGNANT" = "Battery",
              "LSA/PROP DAMAGE" = "Reckless Driving",
              "PANHANDLE/AGGRESV" = "Misdemeanor",
              "ALCOHOL/CONSUM/STORE" = "Disorderly in Public",
              "CRIM MIS/0-200" = "Misdemeanor",
              "FA/WEAP/POSN/FEL/DEL" = "Weapons",
              "BURGLARY/UNOCC/STRUC" = "Theft",
              "UTTERING/CHECK" = "Forgery",
              "DOM VIOL/INJUNC VIOL" = "Assault",
              "GT 3RD/VEHICLE/CONS" = "Grand Theft",
              "FLEE/ELUDE PO" = "Fleeing",
              "BURG/ASLT/BATT - PBL" = "Theft",
              "BATTERY/ELDERLY" = "Battery",
              "PARK/ENT AFT HRS" = "Misdemeanor",
              "BATTERY/AGG/HRM/WEAP" = "Armed Assault",
              "STOLEN PROP/DEAL IN" = "Theft",
              "TRESP/STRUC/OCC/WARN" = "Misdemeanor",
              "HER/SELL/DEL W/INT" = "Drug Traffic",
              "BUSINESS W/O LIC" = "Bussiness Offense",
              "PETIT THEFT 2> CONV" = "Theft",
              "HOLD/CORRECTIONS" = "Misc. Warrants",
              "PANHANDLE/AGGR/OBSTR" = "Misdemeanor",
              "BATTERY/ATTEMPT" = "Assault",
              "ORGANIZED FRD/0-20K" = "Business Offense",
              "RESIST OFF W/VIOL" = "Resist Officer",
              "FLS NAME/ID/AFT ARRE" = "Other",
              "RETAIL THFT/PRI CON" = "Theft",
              "BOATING UNDER INFLU" = "DUI",
              "ATT FEL MUR/DLY WEA" = "Armed Assault"
) 
#Rename for real.
corrections$Charge1 = corrections$Charge1 %>%
  revalue(ChargeLUT)

#What are the top issues
#aside from warrants.
topCharges = corrections %>%
  filter(Charge1 != "") %>%
  filter(Charge1 != "Misc. Warrants") %>%
  group_by(Charge1) %>%
  summarise(
    totalCharge1 = n()
  ) %>%
  top_n(numCharges, totalCharge1) %>%
  arrange(desc(totalCharge1))

#Get data together and in good display format
d = corrections %>%
  mutate(
    bookYear = year(BookDate),
    bookMonth = ordered(months(BookDate, abbreviate = TRUE), month.abb)
  ) %>%
  filter(Charge1 != "") %>%
  filter(Charge1 != "Misc. Warrants") %>%
  group_by(bookYear, bookMonth, Charge1) %>%
  summarise(
    chargeObs = n() / 1000
  ) %>%
  mutate(
    total = sum(chargeObs)
  ) %>%
  filter(Charge1 %in% topCharges$Charge1)

#Draw a professional chart
# First re-order the levels for charges in descending order
d$Charge1 = factor(d$Charge1, levels = topCharges$Charge1)
p = d %>%
  ggplot(aes(x = factor(bookMonth), y = chargeObs, fill = factor(Charge1))) +
  geom_bar(position = "stack", stat = "identity") +
  facet_wrap(~ bookYear) +
  scale_x_discrete(breaks = levels(d$bookMonth)[seq(1, length(d$bookMonth), 2)]) +
  ggtitle("Top 10 Miami Jail Bookings") +
  labs(y="Bookings (thousands)", x="Date") +
  scale_fill_brewer(palette="Spectral", name = "Charge") +
  theme(plot.title = element_text(size = rel(2)),
        panel.background = element_rect(fill = "white"),
        panel.grid.major = element_line(colour = "#eaeded"),
        strip.background = element_rect(fill = "#eaeded", size = 1),
        axis.title.y = element_text(size = rel(1.25), angle = 90, margin = unit(c(0, 10, 0, 0), "pt")),
        axis.title.x = element_text(size = rel(1.25), angle = 0, margin = unit(c(10, 0, 0, 0), "pt")),
        strip.text.x = element_text(size = rel(1.5))
  )
p = ggplotly(p)

plotly_POST(p, filename = "blog/MiamiJailBookings_5_2_2016")

Well, thats it folks!  90% data manipulation and cleaning and 10% graphics.  Sounds about right.

One thought on “Miami’s top 10 Jail Bookings

  1. Pingback: From Pirates to Snakes: Top-down processing – calderonj.com

Leave a Reply

Your email address will not be published. Required fields are marked *