March 3-4, 2018 @ Providence, RI

Brown Data Science is dedicated to creating and supporting opportunities related to Data Science for the undergraduate community.

We bring together students interested in Data Science and Machine Learning by hosting workshops and talks, and through competitions in which participants gain real-world experience analyzing data, often with access to unique and exclusive datasets.

Working with the Data Science Initiative and Brown Computer Science, we hope to soon develop and promote independent study projects and internship connections. Brown Data Science is for everyone, so check out our upcoming Datathon!


What is the Datathon?

At Brown Datathon 2018, we’ll bring together over 250 students from Brown, MIT, Harvard, and other top schools across the country to Providence, RI.

Datathon is a celebration of data in which teams of undergraduates work around the clock to discover and share insights about large, rich, and complex data sets. Apart from the competition itself, which is the core of the event, there will be many other supporting activities ranging from educational workshops to networking opportunities with peers and industry representatives.

Who can attend?

Any undergraduate or graduate student over 18 years old can attend Datathon. No experience with data science is assumed, and there will be workshops and speakers that students of any background can learn from!

What if I don't have a team?

You can still participate in Datathon even if you don't have a team, or you can form one at the event during one of our team-forming sessions! Teams can have at most four people.

I’m not a Brown student. Can I still participate?

Absolutely! We welcome students from anywhere and will do our best to reimburse you for travel. Amtrak or the MBTA will get you to Providence Station, just a ten-minute walk from Brown’s campus.

How much is this going to cost?

Datathon is 100% free. We will be providing free food, drinks, and swag!

What do I hack on?

We will be providing datasets, including, but not limited to, data from our sponsors. While at least one of our provided datasets has to be a central part of your project, you are free to use external data as well. The use of common data ensures a level playing field for all participants and encourages you to try different approaches with data you are not yet familiar with, a crucial skill for any data scientist :)

What should I bring?

You should bring your laptop, phone, chargers, and sleeping equipment!

Other questions?

If you have any more questions or concerns, feel free to email us at: You can also check out our Facebook page.

Keynote Speaker

Matthew Rothman, Goldman Sachs Managing Director

Senior Lecturer in Finance at the MIT Sloan School of Management

Sat, March 3 at 10:00-11:00am (Bert 130)

The Alternative Data Under Our Noses

Investors are continually pushing into new frontiers to find new and powerful data. But the more removed a dataset is from the financial or economic domain, the more challenging the inference problem becomes. Matthew Rothman, MD at Goldman Sachs, thinks economic relevance is an essential attribute of any truly powerful dataset. Furthermore, he believes the world’s major investment banks have a crucial role to play in creating, interpreting and protecting the most economically relevant data in the world.


Build Your Own Roboadvising Trading Strategy & Careers in Finance

Sat, March 3 (Smith-Buonanno 106)

Zach Hamed, Product Manager at Goldman Sachs

While Artificial Intelligence and Machine Learning are fantastic tools to have in your programming toolkit, the current hype around today’s trendiest technologies can overshadow just how much is possible with basic CS and math skills. Come learn the basics of building a trading strategy that buys and sells stocks, some caveats on implementing that trading strategy in reality, and why working on web software and algorithms around these strategies is an incredible way to boost your engineering and business skills.

Prototype to Product: Trade-offs in Scaling Data Science

Sat, March 3 (Smith-Buonanno 106)

David Stuebe, Senior Software Engineer, and Jeff Mayse, Data Scientist, Upserve

This workshop will showcase Upserve’s products and some of the data science techniques it took to go from prototype to product. Members of Upserve’s data team will also stand by to answer questions about the company’s Datathon dataset.

Intro to Deep Learning in Keras

Sat, March 3 (Smith-Buonanno 106)

Andreas Karagounis, Computer Science master’s student in the Serre Lab

Deep learning is at the forefront of AI development. It is a branch of machine learning inspired by information processing and communication patterns in biological nervous systems. In this workshop, we will investigate some of the fundamental properties and techniques of deep learning using Keras, a popular neural network library written in Python that is used in both industry and research. This workshop is open to anyone with some background in Python.

Intro to R programming

Sat, March 3 (Smith-Buonanno 106)

Erin Bugbee ’20, Computer Science and Statistics

Statistics is a crucial component of data science. As a statistical programming language, R is one of the most popular languages used for data science. This workshop serves as an introduction to programming in R. Learn the basics of how to turn data into knowledge through exploratory data analysis, visualizations, statistical models, and machine learning algorithms.

D3 Visualization of Spotify Data

Sat, March 3 (Smith-Buonanno 106)

Brad Guesman ’20, Computer Science and Physics

Visualizing Music Data with d3.js and the Spotify API: in this workshop, we'll use the d3 JavaScript library to create interactive visualizations of artist data from Spotify's Web API. We'll start with the basics, like programming a dynamic bar graph, and by the end of the workshop we'll have built a visual artist search engine! No prior experience with d3, the Spotify API, or JavaScript is required, though some background in HTML and CSS may help.

Changing the Game: Recent Applications in Sports Analytics

Sat, March 3 (Smith-Buonanno 106)

Ozan Adiguzel ’19, Statistics

The advent of analytics has revolutionized the world of sports. New technologies such as player tracking have improved the quality and availability of data to an unprecedented degree, and data-driven decision making is now indispensable for gaining a competitive advantage in professional sports. This workshop will cover some of the cutting-edge applications in the field.

Data Driven Health and Wellness

Sat, March 3 (Smith-Buonanno 106)

Grant Fong, Brown '19

When it comes to people's physical and mental health, there is often a stark contrast between clinical data and data on their symptoms outside of care. In this workshop, we will look at novel sources of data and specific ways to analyze them in order to interpret health data more accurately.

Closing Talks

Large-Scale Virtual Indoor Scene Data

Sun, March 4 at 12:30-1:30pm (Smith-Buonanno 106)

Daniel Ritchie, Asst. Prof. of CS, Brown

The increasing availability of large-scale text and image data has received a lot of recent attention. But did you know that large-scale 3D data is also increasingly available? This talk will focus on one such type of data: virtual indoor scenes, such as bedrooms, living rooms, and office spaces. I’ll discuss some of the unique characteristics of this data, as well as recent research projects using it. This includes some of my own research on machine learning models which learn to generate new scenes.

Don’t Just Predict. Explain!

Sun, March 4 at 1:30-2:30pm (Smith-Buonanno 106)

Ellie Pavlick, Google

The data revolution has upended the way science is done and has touched nearly every discipline. It is no longer enough to discuss a problem solely through blackboard theories and clever debates. Now, everything is an empirical question. Data or it didn’t happen! But just as focusing on models alone can lead us into the traps of armchair philosophy and confirmation bias, so too can focusing on data alone. I will argue for the importance of forming and testing defeasible models of the world, especially in the age of big data. I will present several case studies of how over-reliance on data and prediction accuracy, and a lack of attention to underlying models, can lead to false conclusions and biased predictions with real-world consequences.


The Data Science Initiative supports broad engagement with the campus community. Through public lectures, panel discussions, boot camps, and other projects, we explore the challenges in translating data into knowledge and in understanding its impact. Brown’s Data Science Initiative offers academic and professional programs for a rigorous, distinctive, and innovative approach to learning and collaboration for anyone building a career in data-enabled fields.

At Goldman Sachs, our Engineers don’t just make things – we make things possible. Change the world by connecting people and capital with ideas. Solve the most challenging and pressing engineering problems for our clients. Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn data into action. Create new businesses, transform finance, and explore a world of opportunity at the speed of markets. We are excited to meet you at the Brown Data Science Datathon and tell you about how our engineers use data science in their roles!


Like our Facebook page to stay up to date with the Keynote and Workshop lineup.

Datasets will be released soon!

Saturday, March 3

Opening & Keynote

Lunch & Team-Forming Mixer

Hacking Begins!

Get started in Smith-Buonanno.

All workshops are held in Smith-Buonanno 106.

Sunday, March 4

Hacking Ends

Closing Talks

Demo & Prizes

Rules and Guidelines

  1. The event is open to students at any level from college through graduate school. Students must be over 18 years old to participate.
  2. You can work individually or in groups of up to 4 people. If you don't have a team, don't worry! We will be holding a team-forming mixer on Saturday morning.
  3. We will be providing datasets, including, but not limited to, data from our sponsors. While at least one of our provided datasets has to be a central part of your project, you are free to use external data as well. The use of common data ensures a level playing field for all participants and encourages you to try different approaches with data you are not yet familiar with, a crucial skill for any data scientist :)
  4. Here are some broad categories that you can consider for your projects:
    • Visualization
    • Statistical insights/correlations
    • Machine learning models
    • Interactive data exploration application
    Or any combination of these! Your project DOES NOT have to include all of these areas. You may focus on only one. Judging will be holistic and will prioritize the quality/creativity of your overall approach irrespective of which direction you take.
  5. You may not use any code that you wrote before the event.
  6. You can use any language/framework/library/API that you want.
  7. We reserve the right to disqualify or expel anyone who violates our code of conduct.