Hacking for Social Sciences

The core concept of the course (and its answer to distance learning) is to mimic an open source software community. Open source communities have been tackling complex software projects collaboratively for decades – often without meeting in person. Hacking for Social Sciences sees programming (and learning about it) as a team sport. The course implements the belief that much of the success and motivation of open source software comes from the communities’ ability to collaborate online smoothly.

Implementation of the course during the time of distance learning

Hence, Hacking for Social Sciences runs on state-of-the-art, industry-standard software development platforms, collaboration tools and project management approaches rather than traditional learning software. Learning hands-on how to use platforms like GitHub or tools like Kanban boards, which are widely used in and beyond academia, helps students fit into modern teams. The consistent use of a free, professional open source software development ecosystem alongside publicly available, accessible, screen-reader-friendly course material licensed under a dual Creative Commons license (CC BY-NC-SA 4.0) is a teaching innovation that distinguishes this course from many others.

The source code of all lecture material, including interactive elements, is fully available to students. This allows students to reproduce and modify all material. Interactive elements such as an in-class survey with live reporting are shared in fully reproducible fashion, so students can learn hands-on how to create and operate such tools on independent infrastructure – simply by reproducing them.

Hacking for Social Sciences chooses its means of communication carefully so that each channel suits the message and purpose. Official administrative communication runs through ETH edoz emails, which make it easy to involve ETH administration or IT services. Tasks and assignments are discussed through GitHub’s issue tracker and boards, as these provide the best opportunity to track progress and give context-aware, asynchronous feedback linked to students’ source code. The course Slack community offers an informal (and optional) opportunity to get feedback and interact with classmates, the lecturer and course alumni. The Slack workspace offers general channels, work-group or topic-specific channels, as well as direct messaging and calls.

I am a research software engineer and lecturer from KOF’s administrative IT staff with an economics background and deep ties to the open source software community. I am the key contact person for all student questions. Thanks to the course’s community concept, its communication approach and my experience with a large number of online collaborators, it has been feasible to operate a course of up to 40 students without teaching assistants.

The course implements a flipped classroom concept in four blocks that take place several weeks apart. Each block consists of two half-day sessions. Live blocks contain video lectures held via Zoom as well as breakout rooms in which smaller groups work on problems collaboratively. The first two blocks show the big picture and aim to create a common denominator to account for heterogeneous backgrounds and starting points (see course registrations). The latter two blocks are highly adaptive: students are encouraged to bring their own applied problems to class and work on them with the help of the course environment. Students who have not yet encountered suitable problems of their own can choose from a wide array of programming topics to find a motivating applied task.

These tasks range from building a personal website to parallel computing and working with SQL databases. The course forms teams early on, which improves identification and involvement with the group and therefore motivation. Students can use large stretches of the course to get input and support from the class on their own applied problems. Given the high degree of intrinsic and collaborative motivation, and given that output builds on heterogeneous backgrounds and diverse starting points, an ungraded pass/fail assessment (ungraded semester work) seems appropriate for this course. Active participation in class and completion of a collaboratively solved applied programming task are required to pass. Last but not least, the course intends to give participants entry points and techniques to stay up to date in a rapidly evolving field, hone their programming skills, find support and connect to open source communities in general. As a global coordinator for useR! 2021 (~chair), the most established conference for the R Language for Statistical Computing, I offered my network to interested students and enriched the course with the latest developments and updates from the developer community. In 2021, useR! had more than 1800 participants from more than 120 countries.

Overall concept of the course before the pandemic – during – after

Hacking for Social Sciences is a doctoral education course offered at D-MTEC. It draws students from almost a dozen ETH departments as well as external guests from public administration and industry backgrounds. Researchers learn to leverage open source infrastructure and develop computational and data engineering competencies. While working on their own research in a coaching setup, students embrace project management, open source infrastructure, modern collaboration and software development workflows. Accessible and reproducible course material mirrors the course’s commitment to an innovative, inclusive and motivating environment.

The course consciously chooses state-of-the-art, industry-standard platforms such as GitHub over traditional learning management systems. To familiarize students with software engineering environments used in research, the infrastructure taught in the course is also used to operate it. The open source software (OSS) approach and Creative Commons based licensing of the course material allow students to learn from its infrastructure and empower them to use what they learned beyond the course.

In the first two blocks of the course, students develop a common view of the big picture and learn how to navigate components of a software development ecosystem such as git version control. Students aim to reach a level of programming-with-data proficiency that the community often refers to as ‘software carpentry’.
The second two blocks follow a coaching approach. Participants co-shape the course by bringing their problems to class or choosing the focus of input sessions. A flipped classroom concept, a publicly available online book, a GitHub organization, a course YouTube channel, a course Slack community and my 15+ years of experience in OSS allow not only for vibrant live sessions but also for making the most of the time between the four two-day blocks. In addition, students get a chance to connect to the global OSS community network.

Course Description

Hacking for Social Sciences - An Applied Guide to Programming with Data
The vast majority of data has been created within the last decade. As a result, more and more fields of research start to consider and embrace programming to process and analyse data. This course teaches applied programming with data and aims to leverage the open source tech stack to deal with this new wealth and complexity of data.
The idea behind Hacking for Social Sciences is to build a solid understanding of core technologies and concepts that helps researchers develop a data processing strategy and widens their possibilities when working with data. The course approach is to single out those concepts from software development that are easy to adopt and useful to social scientists. The course has three major learning objectives:

- Understand the role of focal components in a data science tech toolbox.
Learn how technologies like R, Python, git version control, Docker or cloud computing could play together in your research project.
- Learn how to manage and version control source code.
Hacking for Social Sciences teaches how to use git version control to collaborate professionally, make your research reproducible and your code base persistent.
- Applied data sourcing and data transformation.
Learn how to communicate with SQL databases. Learn how to consume data from different sources using machine to machine communication interfaces (APIs) such as the OpenStreetMap geocoding API / Routing Engine or the KOF data API for macroeconomic time series.
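
To make the third objective more concrete, here is a minimal Python sketch of both halves: querying an SQL database and preparing a machine-to-machine request. It is not taken from the course material. The table `indicators`, its columns and the sample values are made up for illustration, and an in-memory SQLite database stands in for whatever database a project actually uses. The second part only builds a request URL for the public OpenStreetMap Nominatim search endpoint and parses a response of the shape that endpoint returns; it does not send a request, and the helper names are hypothetical.

```python
import json
import sqlite3
from urllib.parse import urlencode

# --- 1. Communicating with an SQL database -------------------------------
# Assumption: a toy table "indicators" in an in-memory SQLite database.
def mean_value_by_country(rows):
    """Load (country, year, value) rows and return the mean value per country."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE indicators (country TEXT, year INTEGER, value REAL)"
        )
        # Parameterized inserts keep data and SQL text separate.
        con.executemany("INSERT INTO indicators VALUES (?, ?, ?)", rows)
        cur = con.execute(
            "SELECT country, AVG(value) FROM indicators "
            "GROUP BY country ORDER BY country"
        )
        return cur.fetchall()
    finally:
        con.close()

# --- 2. Consuming data from an API ---------------------------------------
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"

def geocoding_url(place, limit=1):
    """Build a search URL for a free-text place name (no request is sent)."""
    query = urlencode({"q": place, "format": "json", "limit": limit})
    return f"{NOMINATIM_SEARCH}?{query}"

def parse_coordinates(payload):
    """Extract (lat, lon) pairs from a JSON list of result objects."""
    results = json.loads(payload)
    # The service delivers coordinates as strings, so convert them to floats.
    return [(float(r["lat"]), float(r["lon"])) for r in results]

if __name__ == "__main__":
    rows = [("CH", 2020, 1.0), ("CH", 2021, 3.0), ("DE", 2020, 2.0)]
    print(mean_value_by_country(rows))
    print(geocoding_url("ETH Zurich"))
    # Made-up sample response, not real geocoding output.
    sample = '[{"lat": "47.5", "lon": "8.5", "display_name": "sample"}]'
    print(parse_coordinates(sample))
```

In a real project the URL would be fetched with an HTTP client and the service's usage policy respected; keeping URL construction and response parsing in small functions makes both easy to test without network access.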

Hacking for Social Sciences is not a statistics, econometrics or machine learning course. Experience in these fields helps inasmuch as students will find it easier to justify investing in programming and to come up with their own application examples, but profound methodological knowledge is not a prerequisite.
PhD (Master)
Ungraded semester performance