People vs. Algorithms: Data Ethics in the 21st Century

Welcome! Here you’ll find the syllabus and readings for “People vs. Algorithms: Data Ethics in the 21st Century,” a mentored research experience with guided readings and computing labs, taught in the Department of Statistics at Columbia University in Spring 2022, from February 1 to April 26.

Course Details

STAT UN 3107 Sec 3 / GR5298 Sec 5 Mentored Research: “People vs. Algorithms: Data Ethics in the 21st Century”
Tuesdays, 10:10–12:00. Spring 2022.
Classroom: To be held over Zoom.
Instructors: Jonathan Reeve, Isabelle Zaugg, Tian Zheng
Email addresses: jonathan.reeve@columbia.edu; iz2153@columbia.edu. But please direct all questions to our chatroom on Matrix, where appropriate.
Chatroom: data-ethics-spr2022 on Matrix
Website, including course readings: https://data-ethics.jonreeve.com
Source code: https://github.com/JonathanReeve/course-data-ethics

Description

This interdisciplinary mentored research experience introduces students to the field of data ethics through an exploration of the societal impacts of data-driven technologies. It aims to bridge the philosophy of ethics, humanities and social science scholarship, and computational thinking and practice. We include hands-on, guided lab activities where students wrestle with intellectually challenging ethical questions firsthand. The research experience is designed for students from all disciplinary backgrounds and supports the development of introductory computational skills for beginners. An interest in Python is recommended, and a crash course will be provided for students who are new to Python.

Objectives

By the end of the semester, students will be able to:

Understand ethical challenges posed by the big data era.
Analyze public data critically, with a sensitivity towards social issues.
Develop a clear vision of their own ethical framework, and how best to apply it in the realm of data science.
Develop skills for data analysis in the Python programming language.

Requirements

The class is Pass/Fail. To earn a passing grade, you must write 2 annotations on each required reading (2 readings per week). Further details are in the section Readings and Annotations, below. You can skip 1 week, no questions asked. You are also required to attend at least ten of our twelve class sessions and to participate actively in the discussion and lab activity, including serving as discussion leader for one or more readings.

Readings and Annotations

For each reading, please write 2-3 annotations on our editions of the text, using hypothes.is. Annotations are not required for videos or other non-textual websites. Links to the texts are provided below. You’ll have to sign up for a hypothes.is account first. Please use your real name as your username, so we know who you are. You may write about anything you want, but it will help to think about ethical problems. Good annotations are:

Concise (think: a long tweet)
Well-written
Observant, rather than evaluative

You may respond to another student’s annotation for one or two of your annotations, if you want. Just make your responses equally thoughtful.

Prerequisites

There are no prerequisites. We will use the Python programming language for computational data analysis, and a crash course will be provided for those who are new to Python, on Feb. 5th and 6th.

Communication

Please direct all questions to our course chatroom on Matrix.

Getting Started

Sign up for a user account on hypothes.is, our annotation platform. Please use your real name as your username.
Sign up for an account on Matrix, and introduce yourself in the course chatroom.
Download and install Anaconda, a Python distribution, which contains a lot of useful data science packages.
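
Once Anaconda is installed, a quick check along the following lines can confirm that Python and a few data science packages are available. This is only a minimal sketch: the package list is an assumption (the course names Pandas explicitly; the others are illustrative), and the helper name check_packages is hypothetical.

```python
import importlib.util
import sys

# Assumed list of packages to check. Only Pandas is named in the syllabus;
# the others are illustrative examples of packages that ship with Anaconda.
PACKAGES = ["pandas", "numpy", "matplotlib"]


def check_packages(names):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print the version of the Python interpreter that is running this script.
    print("Python version:", sys.version.split()[0])
    for name, found in check_packages(PACKAGES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any package is reported as missing, the course chatroom is a good place to ask for help.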

Extra Resources

If you want some extra help, or want to read a little more about some of the things we’re doing, there are plenty of resources out there. If you want a second opinion about a question, or have questions that we can’t answer in the chatroom, a good website for getting help with programming is Stack Overflow. The Internet is also full of Python learning resources. One of my favorites is Codecademy, which has a game-like interactive interface, badges, and more. There’s also the fantastic interactive textbook How to Think Like a Computer Scientist, which is the textbook for Computing in Context, the introduction to Python in Columbia’s Computer Science department.

Jonathan Reeve and a colleague have also put together a few guides for beginning programming:

Schedule

Note: this schedule is subject to some change, so please check the course website for the most up-to-date version.

Week 0, 2-1: Introduction to the Course

To be read in class:

Sloane, Mona. 2019. “Inequality Is the Name of the Game: Thoughts on the Emerging Field of Technology, Ethics and Social Justice.” In Weizenbaum Conference, 9. DEU.

Week 0.5, 2-5 and 2-6: Python Bootcamp

Over the weekend of February 5th and 6th, the Department of Statistics will host a Python bootcamp, led by Jonathan Reeve. If you’re not already proficient in Python and data science libraries like Pandas, please attend this event.

Lab TBA.

Week 4: Workers’ rights and data collection

Lab TBA.

Week 11: Our bodies, our data

Readings:

Optional readings:

Sweeney, Latanya. 2013. “Matching Known Patients to Health Records in Washington State Data.” arXiv:1307.1370 [Cs], July.
Montgomery, Chester, and Kopp. 2018. “Health Wearables: Ensuring Fairness, Preventing Discrimination, and Promoting Equity in an Emerging Internet-of-Things Environment.” Journal of Information Policy 8: 34.
Duhigg, Charles. 2012. “How Companies Learn Your Secrets.” The New York Times, February.