I've been searching for an interesting topic for a personal project - something engaging that would let me practise the data skills I've been learning during my time at the Data School.
I was speaking to my brother, who is just starting university, about his friends who started university last year while he took a gap year. Two of his good friends were lucky enough to get football scholarships to universities in the USA - one in California and the other in Atlanta. It turns out that during their first year, BOTH of their universities had on-campus shootings! Both incidents were random and seemingly indiscriminate (based on what his friends said).
Anecdotally, this makes going to university in the USA seem extremely unsafe. I had heard of the issue before, but what are the chances that both of them would experience incidents like this within a relatively short space of time, in completely different locations?
After this conversation, I think the topic could be an interesting one to explore. I would like to know: how likely are you, as a U.S. citizen, to be caught up in a shooting? Are you more likely to become a victim in certain locations? Have these incidents become more common over time, or are we just hearing about them more because of the increased media coverage available today?
From a data perspective, this project would allow me to practise my skills in the following areas:
- Finding and extracting data from relevant sources. So far I've identified a website, https://www.gunviolencearchive.org/, which has collected 10 years of data on the subject and is regularly updated and verified. The data is available to download as CSV files. I can combine this with population data from the U.S. Census in order to normalise the findings (for example, as a rate per 100,000 residents), and I will continue to look for supplementary data which may give further context to the project.
- Data cleaning and joining. Once I have downloaded all of the data I will create a workflow in order to put it into a useful format for analysis. My aim is that the workflow should be able to manage additional information as it is added in the future so that this piece of work can continue to analyse trends over the coming years.
- Planning and creating dashboards and (hopefully) uncovering some interesting findings through analysis. I will have to think in more detail about which angles I would like to explore, especially once I know exactly which data I have available and what it contains.
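To make the normalisation idea concrete, here is a minimal Python sketch of how incident counts might be combined with population figures to get a rate per 100,000 residents. I haven't inspected the real files yet, so the column names (`incident_id`, `date`, `state`, `population`), the function name `incidents_per_100k` and every value in the sample data are made up for illustration - the real workflow will depend on what the downloads actually contain.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: column names and all values are invented
# for illustration and do not come from any real source.
INCIDENTS_CSV = """incident_id,date,state
1,2023-01-04,State B
2,2023-02-11,State A
3,2023-03-19,State A
"""

POPULATION_CSV = """state,population
State A,2000000
State B,500000
"""


def incidents_per_100k(incidents_file, population_file):
    """Return a dict mapping each state to its incidents per 100,000 residents."""
    # Count one incident per row, grouped by state.
    counts = Counter(
        row["state"].strip() for row in csv.DictReader(incidents_file)
    )
    rates = {}
    # Join on state name: every state in the population file gets a rate,
    # even if it has no recorded incidents (rate 0).
    for row in csv.DictReader(population_file):
        state = row["state"].strip()
        population = int(row["population"])
        rates[state] = counts.get(state, 0) * 100_000 / population
    return rates


rates = incidents_per_100k(
    io.StringIO(INCIDENTS_CSV), io.StringIO(POPULATION_CSV)
)
for state, rate in sorted(rates.items()):
    print(f"{state}: {rate:.2f} incidents per 100,000 residents")
```

In this toy example the state with more incidents ends up with the lower rate because its population is much larger, which is exactly why I want to normalise before comparing locations. Because the function takes file objects, it should also work on newer downloads as they are added, provided the columns stay the same.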
I will blog again when there is progress to report!