Brown Datathon

March 4-5, 2017

@ Providence, RI

Datathon is a celebration of data in which teams of undergraduates work around the clock to discover and share insights about large, rich, and complex data sets.

The event is sponsored by the Computer Science Department and the recently launched Data Science Initiative at Brown University. The Data Science Initiative is a cross-departmental program that aims to develop and promote data-driven research and education on campus.

Apart from the competition itself, which is the core of the event, there will be many other supporting activities ranging from educational workshops to opportunities to network with peers and industry representatives.

FAQ

Who can attend?

Anyone who is a high school, undergraduate or graduate student can attend Datathon. No experience with data science is assumed. There will be workshops and speakers that students of any background can learn from!

What if I don't have a team?

You can still participate in Datathon even if you don't have a team. You can also form a team at the event during one of our team-forming sessions! A team can consist of a maximum of four people.

What should I bring?

You should bring your laptop, phone, chargers and sleeping equipment!

Other questions?

If you have any more questions or concerns, feel free to email us at outreachcommittee.bds@gmail.com. You can also check out our Facebook page.

I’m not a Brown student. Can I still participate?

Absolutely! We welcome students from anywhere, and will do our best to reimburse you for transport. Amtrak or MBTA will get you to Providence Station, just a ten-minute walk from Brown’s campus.

How much is this going to cost?

Datathon is 100% free. We will be providing free food, drinks, and swag!

What do I hack on?

We will be providing datasets, including, but not limited to, data from our sponsors. While at least one of our provided datasets has to be a central part of your project, you are free to use external data as well. The use of common data ensures a level playing field for all applicants, and encourages you to try different approaches with data you are not yet familiar with - a crucial skill for any data scientist :)

Keynote Speaker

Marzyeh Ghassemi

MIT Computer Science and Artificial Intelligence Laboratory; PhD Candidate

Sat, March 4 at 10:00-11:00am (MacMillan 117)


Why Machine Learning Should Change Health

The explosion of clinical data provides an exciting new opportunity to use machine learning to discover new and impactful clinical information. Among the questions that can be addressed are establishing the value of treatments and interventions in heterogeneous patient populations, creating risk stratification for clinical endpoints, and investigating the benefit of specific practices or behaviors. However, there are many challenges to overcome.

Come and learn about how machine learning could change health!

Workshops

Introduction to Deep Learning with TensorFlow

Sat, February 25 at 2:00-3:00pm (Wilson 102)


Sidd Karamcheti, Brown '18, CS and Literary Arts

Deep neural networks have been used to obtain state-of-the-art results on several problems in computer vision, natural language processing, and speech recognition. In this workshop, learn how to use the popular framework TensorFlow to quickly prototype deep neural network models. We'll be covering the basics of feed-forward networks, convolutional neural networks, and recurrent neural networks.

Assumes experience/coursework in linear algebra, probability, and general machine learning.

Data Science at TripAdvisor

Sat, March 4 at 11:00am-12:00pm (MacMillan 117)


TripAdvisor

TripAdvisor is the world’s largest travel site, enabling travelers to unleash the potential of every trip. TripAdvisor offers advice from millions of travelers, with 435 million reviews and opinions covering 6.8 million accommodations, restaurants and attractions, and a wide variety of travel choices and planning features — checking more than 200 websites to help travelers find and book today’s lowest hotel prices. TripAdvisor branded sites make up the largest travel community in the world, reaching 390 million average unique monthly visitors in 49 markets worldwide.

TripAdvisor: Know better. Book better. Go better.

Intro to D3.js

Sat, March 4 at 1:00-2:00pm (Smith-Buonanno 106)


Daniel Kunin, Brown '17, APMA-Bio

D3.js is an extremely powerful JavaScript library for web-based data visualization. It is used by organizations such as the New York Times and FiveThirtyEight and has been used to create some of the most beautiful and elegant data visualizations on the web. We will go through the basic structure of a D3.js visualization and work on a couple of short demos as a group. No experience necessary.

Check out Daniel's workshop via GitHub here!

Intro to Web Scraping by Spotter (using Beautiful Soup)

Sat, March 4 at 2:00-3:00pm (Smith-Buonanno 106)


Albie Brown @ Spotter, Brown '16

Is your hand still sore from copy-pasting for hours? Ever wished you could just download the internet? Hang out with the Spotter team and learn to turn websites into datasets. We'll teach you how to use the Beautiful Soup library to start you off on your web-scraping journey. Beginners welcome!

Check out Albie's workshop via GitHub here!

The Impact of Data Analysis on the World of Sports

Sat, March 4 at 4:00-5:00pm (Smith-Buonanno 106)


Colby Tresness, Brown '17, APMA-CS

The Sports Analytics Workshop will focus on how data can be leveraged to drive new insights in sports for front offices, coaching staffs, and fans. We will touch on key figures in the sports analytics community and the role of data science in the industry.

As a group, we will then explore how to predict future NBA career success of college basketball prospects and the many challenges that come with this endeavor. This workshop is open to all, regardless of concentration, background, or experience level!

Data Analytics in Healthcare

Sat, March 4 at 5:00-6:00pm (Smith-Buonanno 106)


Grant Fong, Brown '19

With the massive increase in data being produced in the healthcare space, new insights are constantly being discovered. This workshop will explore what sorts of datasets are available, and what some of the best practices are when it comes to analyzing health data.

Interactive Data Visualizations in R

Sun, March 5 at 10:00-11:00am (Smith-Buonanno 106)


Andreas Karagounis, Brown '17, CS-Econ

All computers on campus have R on them. In this workshop, participants will get familiar with simple features of the R language and see the ease with which data can be visualized in R. We will be using ggplot2, a graphing package, and Shiny in order to make our visualizations interactive. This workshop is open to all, regardless of concentration, background, or experience level. People with no coding experience who want to be able to apply some visualization/data science techniques to their non-CS classes are especially encouraged to come.

Dirty Secrets of Data Science

Sun, March 5 at 2:00-3:00pm (Smith-Buonanno 106)


Onur Keskin, Software Engineer at Google

What's swept under the rug? Every company does data science to some extent. Some do it more systematically than others, but unfortunately this is not a common practice. We can attribute this to the intrinsic problem of data science: dependency on data. I'll showcase some of the bad practices and the impact of these practices in the long run. I'll also share some anecdotes on how to avoid them.

Analytics in the Global Enterprise

Sun, March 5 at 11:00am (Smith-Buonanno 106)


Serdar Kadioglu, Ph.D. Brown

We are undoubtedly in the middle of an Analytics Revolution that has enabled turning huge amounts of data into insights, and insights into predictions and actions. In this talk, I would like to present an overview of the analytics landscape in the global enterprise and share some successful commercial showcases. I would also like to open a discussion on various challenges faced during the life cycle of a decision support system.

Sponsors




Schedule

Saturday, March 4

Registration

9:00-10:00am

Check-in at MacMillan Hall lobby.

Keynote Speaker

10:00-11:00am

Happening at MacMillan Auditorium (117).

Introduction to the Datasets

11:00-11:30am

Check it out at MacMillan Auditorium (117).

Team-Forming Mixer

11:30am-12:30pm

Find teammates in MacMillan Auditorium (115).

Hacking Begins!

11:30am

Get started in Smith-Buonanno.

Lunch - Sponsored by TripAdvisor

12:00-1:00pm

Grab lunch in MacMillan Hall lobby.

Workshops

1:00-6:00pm

All workshops are held in Smith-Buonanno (106).

Dinner

6:30-7:30pm

Dinner is in Smith-Buonanno lobby.

Sunday, March 5

Breakfast

7:00-9:00am

Grab food in Smith-Buonanno lobby.

Workshops

10:00am-12:00pm

All workshops are held in Smith-Buonanno (106).

Lunch

12:00-1:00pm

Get food in Smith-Buonanno lobby.

Demos & Judging

12:00pm

Closing Talks and Results Announcement

2:00-3:00pm

Announced in Smith-Buonanno (106).

Location

Rules and Guidelines

  1. The event is open to students at any level from high school through graduate school. However, we'll need a parental consent form from those under 18 (get in touch with us!).
  2. You can work individually or in groups of up to 4 people. For those who don't have teams, don't worry! We will be holding a team-building mixer on Saturday morning.
  3. We will be providing datasets, including, but not limited to, data from our sponsors. While at least one of our provided datasets has to be a central part of your project, you are free to use external data as well. The use of common data ensures a level playing field for all applicants, and encourages you to try different approaches with data you are not yet familiar with - a crucial skill for any data scientist :)
  4. Here are some broad categories that you can consider for your projects:
    • Visualization
    • Statistical insights/correlations
    • Machine learning models
    • Interactive data exploration application
    Or any combination of these! Your project DOES NOT have to include all of these areas. You may focus on only one. Judging will be holistic and will prioritize the quality/creativity of your overall approach irrespective of which direction you take.
  5. You may not use any code that you wrote before the event.
  6. You can use any language/framework/library/API that you want.
  7. We reserve the right to disqualify/expel anyone who does not meet our code of conduct or violates rule 5.