James Mickens on why all data science is political
Data science and artificial intelligence have inescapable influence and power in our world. The people who are the most negatively affected are often the ones whose voices are not heard. What does a digital world that works for everyone look like? And who gets a seat at the table?
In this episode, our hosts Colleen Ammerman and David Homa speak with James Mickens about the ethical challenges in cybersecurity, the societal implications of data science, and the importance of humor in teaching. James is a professor of computer science at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS), as well as a director of the Berkman Klein Center for Internet and Society at Harvard University.
Read the transcript, which is lightly edited for clarity.
Colleen Ammerman (Gender Initiative director): Today, we are joined by James Mickens. James is the Gordon McKay Professor of Computer Science at Harvard’s John Paulson School of Engineering and Applied Sciences, as well as a director at the Berkman Klein Center for Internet and Society at Harvard University. Welcome, James. We’re very excited to talk to you today.
James Mickens (Harvard computer science professor): Thank you. Thank you for that introduction.
David Homa (Digital Initiative director): Good to see you again, James. Glad you could join us.
JM: Glad to be here.
DH: So, let’s get started. I want to talk specifically first about people and technology and actually, the people who create technology. The people who create technology are in a unique position to sort of know when bias might be introduced. Do you have a sense of what proportion of those people are actually aware that they have this big responsibility?
JM: I can’t give you a specific number. I know that the number is lower than we would hope. I think that a lot of people come through their technical education, whether it be formal, through university, or whether it be sort of informal, through self-teaching or watching courses on the Internet. But a lot of technologists, and in fact, a lot of tech-centered entrepreneurs, they come through that educational process thinking that, implicitly, technology is value-neutral — that somehow we just create these products. And sure, they can be used for good or bad. But ultimately, that’s not for the technologist or for the businessperson to decide. There’s some sort of hope that: “If it gets too bad, maybe the government will intervene. Or, social forces will take over. There will be boycotts. It’s not our problem.” But that’s wrong. That’s the wrong way to think about it. And so I think that the fraction of people who actively devote thought to this is getting bigger. That’s the good news. I think that particularly the younger generation of entrepreneurs and tech folks are starting to think about these things more explicitly. But there’s still a large swath of the tech community that would sort of prefer to not get bogged down in these “nontechnical” details.
DH: We reached out on Twitter with a poll before interviewing you and asked: “If a developer sees something that really should be dealt with, or if there’s heavy bias in a product, should they always say something? Should they only point it out if they think it’s a really big deal? Or, is it not their problem?” And it was pretty universal that they felt people should always say something. So maybe that’s a sign that says something about our community, or maybe the trend is heading in the right direction.
JM: Or maybe we can’t necessarily trust the polls. I mean, we would all say that we’d help the grandmother try to cross the street. We would certainly do that. But of course, when we actually get to the intersection, you see everyone looking around saying, “Who else is going to help this person across the street? I got somewhere to go.” I think you’re exactly right that in the abstract, when you ask people these questions, they say, “Oh, of course, of course. We should think about the well-being of other people.” But once you get the pressure of deadlines – the product has to ship next week. Once you get the pressure of shareholders – they want to make sure that you’re competitive, where “competitive” is oftentimes defined as how many features you have. “How fast is your software?” So on and so forth. When you start looking at these sort of more complex, real-world situations, I think it’s easier for people to lose some of their moral centering, if you will.
CA: I wanted to follow up on that theme around the way that we sort of divide technical issues, questions, and problems from social ones. We just don’t traverse that boundary a lot. So, it seems to me what you’re saying about education is really critical. Can we educate people differently so that these ethical considerations are embedded in how they think from the get-go? That seems ideal. But of course, today there are plenty of people in positions of power making decisions that didn’t have that education. You had this great quote in a talk that I was watching in preparation for this interview where you said something like, “Ethical considerations become less important to us when considering them could hurt our revenue.” We just have these different incentives. So, I would be curious to hear you talk about where we are today. How do we meaningfully integrate ethical considerations into our decision-making? And how do we even communicate about them when there’s so much variation in what people know on both the social and the technical sides?
JM: Those are great questions. I think the first step is always realizing that you have a problem. In other words, the first step is always sort of stepping back and saying, “Look, I can’t just silo myself in my narrow domain of expertise. The things that I build, the company or the technology or the people that I train, they interact with the larger society.” So, I think just sort of getting those high-level ideas in people’s heads, that’s sort of the first fight. Then, after you’ve won that battle, I think the next struggle is to convince people that many of these ethical challenges, they don’t just have a very simple, “yes” or “no” answer. And of course, as engineers, that is what we want to hear. As engineers, we want to basically say, “Look, I get it. Ethics. Certainly I want to make sure that I don’t go to jail and I get to heaven. So, can you just give me a checklist?” And then whenever something ethical happens to me, I have to make some sort of ethical decision, I just consult ye olde checklist. And then I just go, “yep, OK.” Then we’re done. Unsurprisingly, perhaps, this is not the way that these situations actually resolve themselves in the real world. You actually have to think about these things and you have to make difficult decisions whereby the decision you end up making may still have some bad effects. It may be the best of a series of difficult decisions to make.
So, that’s where I think it’s actually helpful to have people either on staff, or people that you can talk to, who are classically trained in thinking about these difficult issues. You know, many hospitals, for example, have an ethics board. And it’s not because the doctors aren’t aware of these issues. It’s that the doctors were, first and foremost, trained to heal people. They weren’t trained to think about, you know, philosophers in caves, or whatever it was that obsessed the Greeks back in the day. So, I think that increasingly, in at least some of the bigger companies, you’re starting to see there be some roles inside the company where the job is to think about some of these issues. In the same way that you have a lawyer to think about legal compliance, you might have some philosophers, some ethicists, some sociologists on staff to think about these issues, about what is the right and wrong thing to do here. “Who are the stakeholders? Are we ensuring that we’re providing equity along multiple dimensions — gender, race, disability status?” Things like that. So, I think that’s ultimately where we want to go. The endgame is that even if you have a company that seemingly is only focused on one thing — like making trades go fast, or providing a social network, or things like that — they still have the capacity to think more holistically about how those products are integrating with the rest of society.
DH: There’s a lot of talk about bias and data. But there’s a lot of steps in data: there’s the data gathering, there’s the collection, the storage, and the parsing of it. Then, there’s the analyzing of it. Then, on the back end, there are systems of AI and ML interpreting them, or making decisions, or projecting the future. I wonder, are there different problems at each stage there? And are some of the problems bigger than others? And where do you see the biggest problem? Or does everyone just have to be aware all along?
JM: Yeah, it’s wall-to-wall problems, chock full of problems. [laughter] Christmas has come early, and your gifts are problems! That’s the way that I would look at it. I mean, I think that intuitively human nature is to look at problems and try to be as reductionist as possible. It’s to say, “I’ve got this complicated process. But here, here’s the problem.” I want to point a finger at something and say, “If we fix that, then we’re done.” I think that particularly when you look at big data pipelines, machine learning, things like that, because these pipelines are sometimes so deep, because the data sets are so complex and so multidimensional, it’s oftentimes hard to say, “Yeah, if we just fix this one thing, that will solve 90% of our ‘ethics problems’ or our ‘diversity problems’ or whatnot.” It’s typically a more holistic type of reasoning that you have to apply. And I think that this is somewhat related to our conversation about how there is no checklist, right? In as much as, if you do this, this, and this, then you’re scot-free. Instead, what it typically is, I mean, it’s really a lifestyle. So, you have to — at every step of design and then implementation and then testing — you have to be thinking about some of these questions. And one pushback that you’ll sometimes get from that, particularly from engineers, or from quants (or, [in general], people who view themselves as being hard technical people), they’ll say, “This isn’t what you hired me for. You didn’t hire me to watch these videos about the value of diversity. I believe it. I believe it. Let everyone in. But I don’t want to talk about this.” And the problem is that there’s a lot of research which shows that that type of attitude of, “I understand bias, but I don’t want to deal with it in my day-to-day professional life,” does not lead to unbiased outcomes. It’s a thing that you always have to sort of think about, in addition to the technical side of things, or to the business side of things. 
So, it’s really — to get back to the original question — you’ve got some big data pipeline that involves a lot of different players, that involves a lot of different systems. At every step, you have to think about who are the stakeholders, who are the people you’re trying to help, what are their interests? How do you make sure those interests are being protected? Who are the set of people that you don’t care about, right? Who are the set of people for whom you’re not actually targeting their concerns? Being explicit about that stuff is very important. Because when you don’t think about it explicitly, you end up getting tech that fails in these ways, that ends up causing a lot of harm, even though, let’s say, the developers and the business folks don’t have any explicit malice in their heart.
A great example is facial recognition for cameras built into laptops. A lot of the early cameras that came out, they couldn’t track people’s faces of a certain skin tone. And I have no doubt that the people who designed a lot of these early systems, they weren’t explicitly trying to do that. But they didn’t ask these questions about, you know, “Where’s our training data coming from?” for example. So, they get training data that itself is biased. It then results in a biased facial recognition algorithm. They didn’t know that, though, and so they did ship the product saying, “We’ve hit all of our internal metrics for accuracy.” Until you started seeing things on YouTube where you’d have someone come into frame and then the camera would just freak out and just start emitting smoke. You know, it took that to make them understand, “We really need to rethink our process from the ground up, not just the engineering and the algorithms once you have the data.” But where is that data coming from in the first place?
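Editor's note: the point about hitting internal accuracy metrics can be made concrete with a small sketch. The Python below is illustrative only and does not come from the interview. The function name accuracy_by_group, the group labels "A" and "B", and every number are made up for illustration. It shows how one overall accuracy figure can look healthy while a group that is underrepresented in the test data is served poorly.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return (overall_accuracy, {group: accuracy}).

    `records` is an iterable of (group_label, was_correct) pairs.
    Hypothetical helper for illustration, not a real library API.
    """
    totals = defaultdict(int)  # number of test examples per group
    hits = defaultdict(int)    # number of correct predictions per group
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group


# Invented evaluation results: 90 test faces from group "A", only 10 from group "B".
records = (
    [("A", True)] * 88 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"group {group}: {acc:.2f}")
```

With these invented numbers the overall figure is 92%, while group B sits at 40%. Reporting accuracy per group, and checking how many examples each group contributes, is one simple way to surface the kind of gap that a single headline metric hides.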
DH: We’re talking a lot about engineers thinking differently. What’s the space or need for people who are not engineers, and what’s their role? And what would they have to be doing differently? Do we just take ethicists and put them in a room with engineers or do they need to learn something first? What’s your view on that?
JM: Exactly. Yeah. My number one recommendation: You take the ethicist, chain them to a radiator, bring in the other set of people, chain them to a radiator, see who makes it out. [laughter] My bet is on the ethicist. They’re kind, but they’re cunning.
I would say that, if you look at any sort of modern enterprise, there’s a very good chance that the set of job titles that you have in that enterprise are pretty varied. Even in a tech company, they have a huge number of lawyers, they have a huge number of business people and economists. These are very diverse companies. And so what that means is that even if you think, “Oh, I work at a widget company, the majority of people who work at this company are directly making widgets,” that’s oftentimes false. You know, there are the support roles to support the widget-forward workers. [laughter] You can tell I’m not a business person. That’s not business talk. [laughter] So, I think that what ends up happening a lot of times is that there are these decisions about products and services that have to be made that don’t just involve the people on the ground, the people who are actually making those services or products. The decisions sometimes bubble up to other parts of the company, which are not those front-facing people — the lawyers, the H.R. people, things like that. And so I think that when you talk about things like diversity training, when you talk about things like ethics training, it’s not just teaching to the people who are frontline, making the widgets. It’s all the way up and down the stack. And to be honest, I think a lot of the training also has to be directed to shareholders as well, because I think another key tension that oftentimes arises is that people — and by “people,” I mean shareholders — say things like, “Well, yeah, all these things that you’re doing that are not directly profit-focused — great. You should definitely do that… But also, don’t hurt profit.” They want to have this ambivalence towards these things, and that ambivalence is problematic.
CA: You do a lot of research on cybersecurity and it seems like that’s an area where we’re just beginning to grapple with the implications for diversity, inclusion, justice, and fairness. I wonder if you could talk a bit about the connections between cybersecurity and equity.
JM: You know, all of these issues that we look at in technology — and I think increasingly in business, too — I think just defining these terms is becoming messier and messier, is becoming more and more difficult. So, going back to this example of facial recognition: So, imagine that you have cities — you don’t have to imagine it, there are cities that have cameras deployed throughout the streets, and those cameras are used, among other things, to help prevent and then later on, unravel what types of criminal activity happen. Well, if the data that was used to train those cameras to identify faces was biased, that puts certain communities at greater risk. And so there’s an interesting cybersecurity angle there too because if someone were to break into those systems and let’s say, change the way that they identified criminals versus not-criminals, that risk would fall disproportionately on certain segments of society. And so, to put a finer point on it, depending on what zip code you live in, you are more or less likely to have, let’s say, police cameras in that zip code. And as a result, the security, or lack thereof, of that camera system deployed by the city, the impact of that system being hacked into will fall disproportionately on people from different zip codes.
So, I think that, you know, the notion of cybersecurity is evolving. It used to just be, “Can people break into my stuff?” It’s become more encompassing as technology has become more pervasive. So now, for example, cybersecurity includes things like “Can people break into my power grid?” And by “my,” I mean a county’s, state’s, or nation’s. Cybersecurity includes things like, “Can someone tamper with our elections?” And once again, there are opportunities for disproportionate impact in terms of the way that, you know, let’s say foreign states might try to tamper with the votes that have been registered by certain members of certain communities. So once again, I think this all harkens back to this idea that when we talk about these issues of tech, or business, or cybersecurity, or bias, or diversity, or ethics, you really have to take this increasingly broader perspective on things because so many aspects of society are entangled now with so many other aspects of society.
DH: I always wonder about integrating these data sets. You mention certain zip codes have more data or more gathered information on the residents in those areas. And out there, there are companies like, for instance, financial institutions that want to evaluate people for loans. And they’re aggregating data from many, many sources. So, inherently it would seem there’d be more information about people and potential crime, or crime in certain zip codes. And if the financial institutions just take that data at face value, they may make interpretations. Because then that’s another thing someone has to decide — how do I aggregate all this data together and decide on a profile of who you are? And there’s implications further beyond just the criminal justice system. How many people are looking at that and what are they saying?
JM: It’s a real problem, what you just described. And for all the people listening, I want to look right in the camera. Where is it? Oh, I’m stuck in Zoom World. It’s right here. [laughter] I want to make this very, very clear. All data science is political. It’s impossible to take a dataset and analyze it in a “perfectly objective” way. Because you’re always going to be putting on there some type of value judgment about what the dataset represents; whether that dataset covers all the attributes that you care about; and what you are trying to do with the dataset. And I think that, once again, it’s very easy for technologists and entrepreneurs to say, “Let the machines handle it, because it’s just zeros and ones.” But that’s not at all the way that this sort of a system works. You look at things, for example, like predicting whether someone’s going to default on a loan, maybe to give them a mortgage or something like that. Let’s say I give you some dataset which looks at the historical rate of loan defaults for a bunch of different people. First of all, which communities are you looking at? You know, were they already the target of, let’s say, previous predatory lending, which put them in a poor position to pay for new loans — things like that? Thinking about those questions and whether those questions are important, that is a political process. And by “political” I don’t mean political in the sense that like, you’re a capital “R” Republican or a capital “D” Democrat. I mean “political” in the sense that you are making a statement about what you would prefer the world to look like, given that you’re going to analyze data in a certain way.
That’s what I mean by political. I think it’s so important for people to understand that because so often you hear this attitude of, “Yeah, we have these humans making these decisions and of course, humans are biased. But once we feed it into this machine learning thing, once you feed into this algorithm, all of our bias problems go away.” And that’s just completely false. And what you end up seeing, time and time again, is that if you don’t ask these political questions, if you’re not honest about that kind of stuff, then you see the old biases that you were supposedly trying to get rid of being replicated in these new systems that you create. Except now you have this sort of facade of like “it’s just zeros and ones.” So, when I see, or I hear about things like, “oh, we’re going to use algorithms to determine the first pass of CV screening.” You know, you submit an application, then an algorithm basically says, “Here’s the first cut” of things. That concerns me. Because there’s many studies that show, for example, if you have two resumes that are exactly the same, you just change the name, and all kinds of bad things happen based on whether you change the name to a woman’s or a person of color sounding name, you know, so on and so forth. And so if you say, “Oh, the goal of our algorithm is to be just as accurate as our old system,” your old system wasn’t accurate. So, I think it’s really important for everyone listening, if you’re in a company and your CTO or some data scientist says, “Don’t worry, we’ve got an algorithm on the case. We’re not going to have any bias problems,” fire that person! Arguably make a citizen’s arrest. [laughter] Look it up, be knowledgeable of the statutes. Try to do something to them till the authorities can show up. Because that’s just a terrible way of thinking about things.
CA: So, this is a question that I was thinking about after watching some presentations you’ve given and then also reading about how you think about teaching. You’re an educator — you’re not just a researcher — and you’re really a communicator and somebody who cares a lot about trying to foster these conversations. I think anybody who knows about you knows that you really lean into humor and storytelling and narrative in these public talks, and I imagine maybe in the classroom, too. So, I would just love to hear you talk about why you do that. I imagine that it’s a deliberate choice that you’re making.
JM: You know, heavy is the crown. Sometimes I wake up and just have too many jokes in my mind and it’s difficult to find a way to share that gift with humanity. [laughter] And yet, I try. So, I think that one reason I try to incorporate storytelling and humor into my public speaking is that you hear from politicians and leaders all the time saying, “We need more people in tech. Think more about tech. Tech is a great thing to get into: science, math and engineering, blah, blah.” And yet, we don’t have as many popularizers of STEM subjects as one might expect, given all these exhortations to go into that field. And I think that it’s very easy for laypeople to get this impression that, “Oh, you know, STEM stuff is very stodgy and I’m just going to be locked away in a lab all day. And it’s not fun.” But I think it is, in fact, fun. I think it is, in fact, interesting. And furthermore, it is, in fact, important. You know, many of the issues we’ve talked about in this conversation are issues that are extremely important to large swaths of society. And I think that because of this latter issue in particular, that there are so many important issues which need to be talked about, but which can be uncomfortable to talk about. That’s one reason why I think humor can be very useful — because I know from personal experience teaching engineers, also being an engineer myself, sometimes there’s this reaction when someone comes to you and they’re like, “Hey, have you thought about this thorny ethical dilemma?” You’re just like, “Get off my lawn. I don’t want to hear about this kind of stuff. You couldn’t come into my chair and write as many lines of bug-free code as I could. So, just get out of here.” And that misses the point. You know, it misses the point about what it means to be a member of society where you have to care more about things beyond just your narrow worldview. 
But sometimes you have to lead people to that river so they can drink like a horse, or whatever that saying is. [laughter]
So, what I find is that if you bring up some of these issues using the delivery mechanism of humor, people are more likely to be less defensive when you start talking about the more difficult things. Because, you know, no one likes to be told, for example, that they’re biased. I mean, anyone out there who’s listening, you know, take an implicit bias test. You will find out that you are basically like one bad day away from living in the 1500s. [laughter] I mean, it is rough! It doesn’t matter how open-minded you think you are, you’ll take the test and you’ll be like, “I’ve probably got a ninety five out of one hundred.” You’ll get like a negative 18 out of 100! I guarantee that. I’ve never seen a score higher than negative five. [laughter] And so I think that’s tough. It’s tough for people to hear that message. And so that’s why I think it’s helpful to use comedy to soften some of those blows, and to tell personal stories. You know, I grew up in the South, and I’ve had a certain set of experiences there, and some of them were troubling. But it’s useful to share some of those stories. Because ultimately, you know, we’re all people. I mean, I don’t want to tear up on camera. [laughter] But we’re all people. There’s a set of these universal experiences that we all have. And I think that people realize that quicker through laughing, because when you laugh together… What is comedy? It’s very interesting. Not to get too philosophical here, but, you know, comedy, when you tell a joke, you’ve built a worldview and you’re asking people to join you in that worldview — to join you in this little universe that you’ve created. And then the same things that you find funny, you want them to find funny themselves. And in a certain sense, that’s what any teaching is. That’s what any advocacy is. 
You’re saying, “I’ve built this universe, this way of thinking about things, and I’m inviting you to come live in that universe.” And so that’s why I think it’s so important that we get these messages out there, and we deliver them in a way that is both honest but also sort of caring. That makes it clear that we all make mistakes. But if you’re trying to constantly learn and you’re trying to constantly think about these issues explicitly, then that’s the best that we can do, and that’s the most you could ever ask of someone.
CA: That’s great. What you’re saying about humor and storytelling as a way to bring people into a place where they feel like they’re part of a community, and there’s kind of a shared experience, I think it really concurs with research that’s been done on bias training, which has found that if you simply educate people about bias, it actually makes people ultimately behave in more biased ways. And the only way to avoid that is to frame it around, “We’re all trying to overcome these biases.” And I think that’s what you’re doing, is creating that kind of condition around, “We’re all trying to learn together and grow and overcome this.”
Before we close, is there anything that we haven’t asked you? Or anything that you haven’t had a chance to speak about that you want to leave people with? Or any resources where people can go to learn more?
JM: I think that — and this is kind of building upon something that I mentioned a bit tangentially, earlier — one of the most important things anyone can do, regardless of what their job is, what profession they’re in, is talk to people. Because I feel that a lot of frictions or issues or problems that arise, they arise because people just haven’t been exposed to certain ideas or certain perspectives. So, one thing I’d really recommend that people do is that they try to talk to their coworkers, talk to their neighbors, talk to their friends, and just listen and see what kind of issues are top of mind for those people. Because I feel like if we look specifically at this problem of tech that’s sort of gone awry, tech that didn’t serve the population the way that we thought it would, many of the issues that arise were foreseeable. You know, they could have been dealt with early on if only we had talked to and valued the opinions of other people, who in many cases are not far away. It’s not like you need to put in a telegram to someone living at the center of the earth. You know, you just need to just go talk to the person in your next cubicle or down the street. So that’s one thing I’d really encourage people to do.
I’d also say that if you are interested in some more formal training for these types of things — and by “these types of things,” I mean ethical reasoning, diversity training, things like that — if you’re currently a student, or you know anyone who is at a university, oftentimes universities offer resources. If you’re particularly interested in issues at the intersection of computer science and ethics, you can check out Harvard’s website for embedded ethics that’s publicly available. You can see some of the readings that we have. Really, my high-level piece of advice is just talk to people, try to think about these issues involving business, tech, and ethics holistically, and I think you’ll see better outcomes.
CA: And that’s a perfect note to end on.
DH: That’s a wrap on the interview. But the conversation continues.
CA: And we want to hear from you. Send your comments, questions, and ideas to email@example.com.
James is a computer scientist and the Gordon McKay Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences.
The HBS Gender Initiative drives change and eradicates gender, race, & other forms of inequality in business and society through research.