1

The Global Terrorism Database (GTD) is an open-source database of terrorist events around the world from 1970 through 2014. For each GTD incident, information is available on the date and location of the incident, the weapons used and the nature of the target, the number of casualties, and, when identifiable, the group or individual responsible.

One idea for a question this dataset could answer:
- Use attack type, weapons used, the description of the attack, etc. to build a model that predicts which group may have been responsible for an incident.
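The modeling idea above could be prototyped along the following lines. This is only a minimal sketch, not the project's actual method: it assumes a naive Bayes classifier over two categorical fields, and the rows, field values, and group labels ("Group A", "Group B") are made-up placeholders rather than GTD records.

```python
from collections import Counter, defaultdict
import math

# Toy, made-up incidents: (attack_type, weapon_type) -> group label.
# These rows are placeholders, not GTD records.
TRAIN = [
    (("bombing", "explosives"), "Group A"),
    (("bombing", "explosives"), "Group A"),
    (("armed assault", "firearms"), "Group B"),
    (("kidnapping", "firearms"), "Group B"),
    (("bombing", "firearms"), "Group A"),
]

def fit(rows):
    """Count how often each group and each (group, field position, value) occurs."""
    group_counts = Counter()
    value_counts = defaultdict(Counter)  # (group, position) -> Counter of values
    vocab = defaultdict(set)             # position -> set of values seen in training
    for features, group in rows:
        group_counts[group] += 1
        for pos, value in enumerate(features):
            value_counts[(group, pos)][value] += 1
            vocab[pos].add(value)
    return group_counts, value_counts, vocab

def predict(model, features):
    """Return the group with the highest Laplace-smoothed log-probability."""
    group_counts, value_counts, vocab = model
    total = sum(group_counts.values())
    best_group, best_score = None, -math.inf
    for group, n in group_counts.items():
        score = math.log(n / total)  # prior for this group
        for pos, value in enumerate(features):
            seen = value_counts[(group, pos)][value]
            # +1 smoothing; the extra +1 in the denominator leaves room for unseen values
            score += math.log((seen + 1) / (n + len(vocab[pos]) + 1))
        if score > best_score:
            best_group, best_score = group, score
    return best_group

model = fit(TRAIN)
prediction = predict(model, ("bombing", "explosives"))
```

On real data, the free-text attack description would need its own feature extraction, and many incidents have no identified group, so any such model should be evaluated carefully before its predictions are trusted.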

There is more to learn from this project, with plenty of room for interdisciplinary collaboration. We need people from the social sciences and humanities who have a special interest in this issue and deeper knowledge of specific regions or terrorist groups. That expertise is essential for asking the right questions and getting meaningful insight, especially when we want to dig deeper into a problem. For instance, terrorism in the Middle East, Africa, and Asia may have different motives. Example questions:
- How do the types of terror strikes differ across geopolitical situations?
- Which factors, such as political or social issues, are responsible for the frequent occurrence of terrorist attacks, and which regions are targeted?
- How does a terrorist group choose its attack locations?


Springleaf Marketing Data

2

The Springleaf marketing data is very high-dimensional: each entity in the data has a large number of associated variables. In this case, the names of the variables are unknown (probably to protect privacy), so you won't be able to hack together a predictor from the names; instead, you will need machine learning techniques to build a more general-purpose predictor. This is the type of data you should expect to encounter on the job if you work with data.


Police Data

3

The datasets from the Bureau of Justice Statistics, including one covering over 60,000 interactions between the police and the public, could be used to investigate racial profiling and other problems with the justice system.


Flight On-time Arrival Data

4

This project analyzes flight arrival times using a dataset from the Bureau of Transportation Statistics on the delays and on-time arrivals of flights all around the nation. It could be used to predict future flight delays.


Recipe Project

5

The beginner project includes using machine learning to sort recipes into cuisine types and playing with the Twitter API to categorize tweets.
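To give a feel for the recipe task, here is a small sketch of one possible approach, assuming each recipe is a set of ingredient names and using a 1-nearest-neighbor rule with Jaccard similarity. The recipes, the `classify` helper, and the cuisine labels are hypothetical illustrations, not the project's dataset or code.

```python
# Toy, hand-written recipes (placeholders, not from the project's dataset).
RECIPES = [
    ({"soy sauce", "ginger", "rice", "scallion"}, "chinese"),
    ({"tortilla", "black beans", "cilantro", "lime"}, "mexican"),
    ({"pasta", "tomato", "basil", "olive oil"}, "italian"),
]

def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(ingredients, labeled=RECIPES):
    """Return the cuisine of the most similar labeled recipe (1-nearest neighbor)."""
    _, cuisine = max(labeled, key=lambda item: jaccard(ingredients, item[0]))
    return cuisine

guess = classify({"tomato", "basil", "garlic", "pasta"})
```

With a real dataset, the same idea scales poorly, so a bag-of-ingredients model trained with a standard library classifier would likely be a better next step.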
