Daily data on recorded crime were obtained from the Los Angeles Police Department and the Los Angeles Sheriff’s Department. Both datasets cover eighty-eight cities in the county from January 2010 to December 2015 and span multiple crime categories, including assault, robbery, theft, rape, vandalism, arson, burglary, and larceny. February 29, 2012 was excluded from the study because no weather data were reported for that leap day. The individual crime types in the datasets were then combined into two categories, violent crime and property crime.
Combining the types in this way cleans up the data and removes most zero values. For this study, theft comprises shoplifting, petty theft ($950 or lower), auto theft, and grand theft ($950.01 or higher). The violent crime category consists of homicide, assault, robbery, and rape. The property crime category consists of arson, burglary, theft, and vandalism. All other crime categories were excluded, mainly because they are not among the crime types normally included in studies of this kind.
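The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure actually used in the study: the label strings, the `CATEGORY_OF` mapping, the `daily_category_counts` helper, and the sample records are all hypothetical, and the real datasets may name their crime types differently.

```python
from collections import Counter
from datetime import date

# Mapping from individual crime types to the two study categories,
# following the definitions given in the text. The label strings are
# hypothetical; the actual datasets may spell them differently.
CATEGORY_OF = {
    "homicide": "violent",
    "assault": "violent",
    "robbery": "violent",
    "rape": "violent",
    "arson": "property",
    "burglary": "property",
    "vandalism": "property",
    # Theft sub-types, all counted as property crime.
    "shoplifting": "property",
    "petty theft": "property",
    "auto theft": "property",
    "grand theft": "property",
}


def daily_category_counts(records):
    """Count crimes per (report date, category), skipping unmapped types."""
    counts = Counter()
    for report_date, crime_type in records:
        category = CATEGORY_OF.get(crime_type.strip().lower())
        if category is not None:  # all other crime types are excluded
            counts[(report_date, category)] += 1
    return counts


# Made-up records for illustration only: (report date, crime type).
sample = [
    (date(2010, 1, 1), "Assault"),
    (date(2010, 1, 1), "Shoplifting"),
    (date(2010, 1, 1), "Grand Theft"),
    (date(2010, 1, 2), "Fraud"),  # not in the mapping, so it is dropped
]
counts = daily_category_counts(sample)
```

Summing the individual types into two daily totals is what reduces the number of zero-count days, since a day with no arson may still have a burglary or a theft.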
Prior research also omits these other categories, so excluding them makes the results more comparable. In most cases, violent crimes are reported days after the crime actually occurred, which complicates the analysis of violent crime categories. Although the crime data are not completely accurate, this study analyzes crimes on the days they were reported, because accuracy is higher at the point when the crime is described to a police officer.
As with violent crimes, property crimes were analyzed by the day on which they were reported. The reporting lag is a more serious issue for property crimes, since they are less likely to be reported immediately after they occur. Although this is a limitation of the study, several weather variables are used to address the problem. Daily summaries of weather data were obtained from the National Oceanic and Atmospheric Administration for January 2010 to December 2015. The weather variables considered were average, maximum, and minimum daily temperature (°F), wind speed (mph), and precipitation (in).
Each crime observation had to be paired with a corresponding weather observation. This step was needed because on some days no crime occurred and on others multiple crimes occurred. To match crime observations to weather observations, the nearest weather station had to be selected. Although there are many weather stations around Los Angeles County, finding one with appropriate data was complicated, because station datasets often have incomplete or missing records, or the station does not measure temperature or precipitation. Of all the stations near Los Angeles County, the Burbank airport weather station best represents the study area because it is located near its center. The weather data also include daily average summaries, which contain normalized values and standard deviations for precipitation and for average, maximum, and minimum temperature.
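The pairing of daily crime counts with daily weather can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the function names, the dictionary layout, the field names (`tavg_f`, `wind_mph`, `precip_in`), and all sample values are hypothetical. It builds one row per calendar day, records a count of zero on days with no crime, and leaves out February 29, 2012.

```python
from datetime import date, timedelta

START, END = date(2010, 1, 1), date(2015, 12, 31)
EXCLUDED = {date(2012, 2, 29)}  # leap day dropped, as stated in the text


def study_days(start=START, end=END, excluded=EXCLUDED):
    """Return every calendar day in the study period except excluded days."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if d not in excluded]


def merge_crime_weather(crime_counts, weather_by_day, days):
    """Build one row per day: weather values plus a crime count (0 if none)."""
    rows = []
    for day in days:
        weather = weather_by_day.get(day)
        if weather is None:
            continue  # no weather observation available for this day
        rows.append({"date": day, "crimes": crime_counts.get(day, 0), **weather})
    return rows


# Made-up values for illustration only.
crime_counts = {date(2010, 1, 1): 3, date(2010, 1, 3): 1}
weather_by_day = {
    date(2010, 1, 1): {"tavg_f": 58.0, "wind_mph": 4.0, "precip_in": 0.0},
    date(2010, 1, 2): {"tavg_f": 60.0, "wind_mph": 6.5, "precip_in": 0.1},
    date(2010, 1, 3): {"tavg_f": 57.0, "wind_mph": 3.0, "precip_in": 0.0},
}
days = study_days()
rows = merge_crime_weather(crime_counts, weather_by_day, days)
```

The six-year period contains 2,191 calendar days, so `study_days()` returns 2,190 after the leap day is removed. Iterating over calendar days rather than over crime records is what guarantees that days without any recorded crime still appear in the merged data.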
A strength of these crime datasets is that they include statistics on several crime types, each classified as either violent crime or property crime. Violent crime comprises murder, rape, assault, and robbery, while property crime comprises arson, burglary, larceny, theft, and motor vehicle theft. Both datasets are also valuable for the information they provide on the specific crimes that occurred during the period, the characteristics of the victims, the value of property stolen, and the location of each crime. A weakness of these datasets is that police may be careless or manipulative when writing up a report or, worse, may not report a crime at all. Police departments also want to assure the public that the city is safe, which could lead to false or skewed crime data. Patrol patterns are a further concern: because law enforcement usually patrols low-income or minority neighborhoods, the data can suggest that some areas are more dangerous than others. Some crimes are also not released to the public, which can hurt the study. Overall, crime data cannot be fully relied on, but incomplete data are better than no data at all.