
More ethical machine learning using model cards at Wikimedia

apply(conf) - May '22 - 10 minutes

First proposed by Mitchell et al. in 2018, model cards are a form of transparent reporting of machine learning models, their uses, and performance for public audiences. As part of a broader effort to strengthen our ethical approaches to machine learning at Wikimedia, we started implementing model cards for every model hosted by the Foundation. This talk is a description of our process, motivation, and lessons learned along the way.

Chris Albon:

The talk that I’m giving today for 10 minutes is Making Model Cards. I am Chris Albon. For those who don’t know, I’m the Director of Machine Learning at the Wikimedia Foundation. Just a little background for people who don’t know, the Wikimedia Foundation is the nonprofit that runs the backend part of Wikipedia.

Chris Albon:

I don’t spend my days writing Wikipedia articles. We don’t control the content on Wikipedia. We are the ones behind the scenes making sure all the servers work, making sure everything is up, and building features that most of you use but don’t really see that much.

Chris Albon:

The first thing everyone asks me is, “How many models does Wikipedia actually host?” We host about 150 models in production. They range from things like anti-vandalism and particular features that use ML, to language translation and topic detection.

Chris Albon:

The big thing about Wikimedia that will always blow my mind is that it is governed by the community and funded by all of you, right? The board has lots of community members on it. They’re voted into that position of being on the board and they govern what the Foundation works on, and my salary and all of my servers and all of my team are funded by you, right? When you get that little Jimmy Wales personal appeal, that is you putting money into the stuff that we end up building. We end up being pretty careful about what we spend money on.

Chris Albon:

The thing about Wikimedia, and this becomes very relevant as we go further into this talk, is that it is wildly, wildly transparent. Big thing: all of our code is public, right? So every single line of code that I write, you can all go and see and post comments on. But in addition to that, my team’s internal chat is public. This is actually a screenshot, I think from this morning; this is my team behind the scenes.

Demetrios:

Oh, just a heads up, Chris. We can’t see what you’re sharing if you’re sharing slides.

Chris Albon:

Oh, you can’t see that. Why can’t you? Let me try that again. Let me try that again. I got this. I got this, share screen. All right. I’m going to –

Demetrios:

Now we see it. Yeah. Cool.

Chris Albon:

It’s weird, weird change. All right. Let me go to slideshow. All right. See it now?

Demetrios:

Yep. Yeah, we’re good.

Chris Albon:

All right. This is my team’s chat. You can see everything my team works on all day. You can go in there and chat with us, watch us bang our heads against the wall about whatever we’re dealing with. My team’s work tickets are public too. Every single Monday we go through and groom all of our tickets. You can actually put in comments about the tickets and everything else.

Chris Albon:

The thing about it is that we strive for just no black boxes, no secret sauce. We want you to know everything that we’re working on. We want you to see the code that we’re working on. We want you to have that participation in it, and the implication of that is that we need a way for you to see when we’re hosting a model. We want you to be able to understand what that model is and what the implications of that model are.

Chris Albon:

Maybe you’re a reader of Wikipedia and some model is affecting a choice, like recommending something. We want you to be able to actually go and read about that model and understand it and dive right into it.

Chris Albon:

Model cards, which have been talked about previously, two or three talks ago, were introduced by the Mitchell et al. paper in 2018. Think of a model card as a public-facing, single source of truth for the model. It talks about the good and the bad of the model, the details of how it was trained, and the motivation behind it. What should it be used for? What should it not be used for? What should you watch out for? How can you go get the code, if that’s available? How can you go get the data, if that’s available? That kind of stuff.

Chris Albon:

To us, this was a pretty important thing to think about because, again, we are very much trying to make sure that you can know for a fact what you’re putting your money towards if you contribute to the site: how we’re using stuff, how we’re recommending stuff to you, how we’re saying something is possible vandalism or not. We want to make sure that you don’t need to trust us; you can go and check it out yourself.

Chris Albon:

The first thing we did when we started setting these up was go straight to the literature. This is the original paper, which actually provides a really nice example of what they think a model card would be. But one of the things that you run into a lot is that translating that kind of stuff into a real-world thing that you’re going to put in production obviously takes a little bit of time: figuring out how you want to do that, and where the line is between a model card and documentation in a certain setting.

Chris Albon:

We spent a lot of time speaking to the community, to researchers, to our contributors, to the people who spend a lot of time on Wikipedia and are very well known on the site. We wanted to know what kind of stuff they were looking for in something like this. Really, it fell into three groups.

Chris Albon:

One, it was community members who really wanted to govern the model, right? If people on French Wikipedia don’t want us to use a model on French Wikipedia, they have a right to say, “Hey, we don’t want to use it” and we’ll stop. We’ll turn it off for them.

Chris Albon:

It was also researchers and the volunteer contributors who just really wanted to know the ins and outs of the model: how we were training it, what training data was used. They wanted to go pick that apart and contribute to it, right? You’re totally free to come and contribute code to our models, make them better. Then finally, the general public who was interested in that.

Chris Albon:

Once we knew that, it became pretty straightforward to come up with a proof of concept, and I’ll link this in the Slack when we’re done. This is a proof of concept of a model card for our language-agnostic topic model. It detects the topic of articles without taking into account the actual language that’s used, like English or something like that.

Chris Albon:

The big part about it, just as a start, is that right at the front of this model card we wanted to have something that talked about how that model was being used, right? This is the motivation behind what we think this model should be doing, how it’s working, and how we got to the point that we needed to make a model.

Chris Albon:

The idea behind this is that it should be the least technical part of the description. We want the least technical part at the top of the page, and then as you scroll down the model card, you get more and more technical. This is something that we hope anyone from any kind of background, whether technical or not, could come in and have an opinion about or sit and discuss.

Chris Albon:

Off to the side, we have that nitty-gritty detail that’s really nice to have, just an explanation of a model card. One interesting thing to note: there is actually no Wikipedia page for model cards, so our attempt to link to one didn’t really work at first. Interesting twist there.

Chris Albon:

This is also where you’d have the code linked to the data, linked to who made the model, that kind of stuff so you could end up reaching out. This is probably one of the most important ones. Up in the top corner it says, “Discussion.” Every single Wikipedia page has a talk page, which is where people talk about the contents of that Wikipedia page. Following that model, we wanted to have a discussion page and make sure that there was a place where people can come and discuss the model.

Chris Albon:

This is really a key part to us. This is where we want the community to go, where they want to say if a model is good, say if they have problems, if they have questions. Anything like that, there’s a place to go to talk about that kind of stuff. To contest it, to say we’re idiots, all that’s totally fine.

Chris Albon:

View history, just like any other Wikipedia page: you can view every single edit that’s made on the page. You can also favorite it. You don’t need an account on Wikipedia, but if you do have an account, you can favorite the page and you’ll get a notification whenever the page changes.

Chris Albon:

Scrolling down the page, this follows pretty closely what we have in the Mitchell et al. paper. There’s this uses section: who should be the people using this model, and how should you use it? I think one of the most interesting things for us was how you should not use it. That’s been a pretty interesting one: “Okay, where are the limits to where we think this model will apply?” Also, where we’re using it on the site right now, so you could know that we’re using it on this particular wiki for this particular purpose.

Chris Albon:

Next is ethical considerations. This part has been pretty interesting because the folks that are making the model have a lot of thoughts about what you should think about when you end up using it. You know, details like, “Okay, I set this threshold at this level. Is that good or bad? What are the costs and benefits of doing so?” This is a place for them to put that rather than hiding it in documentation.

Chris Albon:

Scrolling down farther, we have stuff about model performance. Our goal is for all of this to be auto-generated, so whenever we train a new model, this would be auto-populated with new information and stay as fresh as possible. Then also to provide a deep dive where you could look at model performance over time to deal with things like model drift.
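To make the auto-population idea concrete, here is a minimal sketch of how a retraining run could regenerate a performance section as wikitext. The function name, metric names, and table layout are illustrative assumptions, not Wikimedia’s actual pipeline.

```python
# Hypothetical sketch: regenerate a model card's performance section
# whenever a model is retrained. All names/layout are assumptions.
from datetime import date


def render_performance_section(model_name, metrics, eval_date=None):
    """Render evaluation metrics as a wikitext table for a model card."""
    eval_date = eval_date or date.today().isoformat()
    lines = [
        "== Model performance ==",
        f"Evaluated on {eval_date} for '''{model_name}'''.",
        "",
        '{| class="wikitable"',
        "! Metric !! Value",
    ]
    for name, value in metrics.items():
        lines.append("|-")
        lines.append(f"| {name} || {value:.3f}")
    lines.append("|}")
    return "\n".join(lines)


section = render_performance_section(
    "language-agnostic topic model",
    {"precision": 0.912, "recall": 0.874, "f1": 0.893},
    eval_date="2022-05-01",
)
print(section)
```

Hooking something like this into the training pipeline is what keeps the card fresh without anyone hand-editing numbers; archiving each rendered section would also give the over-time view needed to spot model drift.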

Chris Albon:

Implementation is typically just the details that we want on the model. It’s less than documentation, but more than just a brief description of the model so you could have an idea of how we went about it, what models we used and that kind of stuff.

Chris Albon:

Data, we’d like to expand to a full data card, but for a start we’re just going to work on data, which is some description of how the data’s being used. Ideally a link to the data; in this case we didn’t do that, but that’s where we’d like to go with something like this.

Chris Albon:

Then finally at the end, just the licensing and citation information. If you wanted to know, “Okay, can I use this model?”, anything like that, we have it all right there for you. That is the proof of concept for the card. Our next step is to start rolling it out for all 150 models that we have.
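The sections walked through above can be sketched as a simple data structure, ordered from least to most technical as the talk describes. This is an illustrative sketch only; the field names and section titles are assumptions, not the Foundation’s actual schema.

```python
# Illustrative sketch of the model card sections described in the talk,
# ordered least-technical first. Field names are assumed, not official.
from dataclasses import dataclass


@dataclass
class ModelCard:
    name: str
    motivation: str              # least technical, shown at the top
    intended_uses: list
    out_of_scope_uses: list      # "how you should not use it"
    ethical_considerations: str  # e.g. threshold trade-offs
    performance: dict            # ideally auto-generated at training time
    implementation: str
    data_description: str        # a full data card would go here later
    license: str
    citation: str = ""

    def sections(self):
        """Return (title, body) pairs in order of increasing technicality."""
        return [
            ("Motivation", self.motivation),
            ("Intended uses", "; ".join(self.intended_uses)),
            ("Out-of-scope uses", "; ".join(self.out_of_scope_uses)),
            ("Ethical considerations", self.ethical_considerations),
            ("Performance", str(self.performance)),
            ("Implementation", self.implementation),
            ("Data", self.data_description),
            ("License and citation", f"{self.license} {self.citation}".strip()),
        ]


card = ModelCard(
    name="language-agnostic topic model",
    motivation="Detect article topics independent of language.",
    intended_uses=["topic tagging to help editors"],
    out_of_scope_uses=["ranking article quality"],
    ethical_considerations="Threshold choices trade precision for recall.",
    performance={"f1": 0.893},
    implementation="Classifier over the article link graph (hypothetical).",
    data_description="Publicly available article data.",
    license="CC BY-SA 4.0",
)
titles = [title for title, _ in card.sections()]
```

Keeping the card as structured data rather than free text is one way to make the performance section machine-writable while leaving the prose sections (motivation, ethics) editable by the community.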

Chris Albon:

The reason I’m giving this talk is I would love for you all to give us feedback. This is actually a link; I will put the slide deck in the Slack and you can go and give us feedback. We want you to tell us what we’re doing wrong. That is the reason I’m giving this talk, and the other reason is that we’re hiring an MLE, so if you want to come work on stuff like this and have the internet shout at everything you do, come on board. We’d love to have you.

Demetrios:

Wow. This is, yeah, this is very forward thinking and I appreciate it so much. I mean, a lot is talked about, like AI ethics and this responsible AI, but it feels like you are just doing it and then you come and show us that, “Oh yeah. We’ve already got the proof of concept out.” So, super cool to see. I have one question. Since you are the last person, I’m going to assume you have a minute to answer it.

Chris Albon:

Definitely do.

Demetrios:

When you mention the ethical considerations there, is it only from the team or is it from anyone on the internet? They can also add that.

Chris Albon:

Yeah. The interesting thing is that it is treated just like any Wikipedia page. It’s not on Wikipedia, but it is using the same software that runs Wikipedia, and the goal is that it would be crowdsourced. We want people to participate in this kind of stuff, and that is why we treat it like a Wikipedia page.

Chris Albon:

The thing that we know is wiki content and how to deal with that kind of stuff, so integrating it with things like the talk page and allowing anybody to edit the pages is a really, really important idea to us, because it’s fundamental to how we do things.

Chris Albon:

So yeah, if someone comes in with an ethical consideration, edit the page, and then if people disagree, they’ll go on the talk page and they’ll argue it out and debate it and discuss it, and we’ll get to some kind of place where we feel like we’re saying it correctly, and then that’s what ends up being on the main page.

Chris Albon

Director of Machine Learning

Wikimedia Foundation

Chris spent over a decade applying statistical learning, artificial intelligence, and software engineering to political, social, and humanitarian efforts. He is the Director of Machine Learning at the Wikimedia Foundation. Previously, Chris was the Director of Data Science at Devoted Health, Director of Data Science at the Kenyan startup BRCK, cofounded the AI startup Yonder, created the data science podcast Partially Derivative, was the Director of Data Science at the humanitarian non-profit Ushahidi, and was the director of the low-resource technology governance project at FrontlineSMS. Chris also wrote Machine Learning with Python Cookbook (O'Reilly 2018) and created Machine Learning Flashcards. Chris earned a Ph.D. in Political Science from the University of California, Davis, researching the quantitative impact of civil wars on health care systems. He earned a B.A. from the University of Miami, where he triple majored in political science, international studies, and religious studies.
