How do you hire data people in a fair and unbiased way?


Hey all :wave:, what advice do you have for ensuring a hiring process for data analysts/scientists/engineers/etc. is as fair and unbiased as possible? How can I ensure I get a diverse pool of applicants, and don’t filter them out for the wrong reasons along the way? What has worked well for you, what hasn’t, what do you wish you or others had done differently?

Links, quotes, original thoughts all welcome!

I’ll start it off with a blog post @stefanie sent me about what Wikimedia did recently when they needed a data person:


This has worked well for one narrow area of diversity: I generally want to see examples of work from candidates, but realize that not everyone is free to post their code publicly. So we make this request to candidates who reach the phone-interview stage, and ask for an example of work with the text below.

…So we can further understand your analytical, computational and organizational skills, we ask that you provide a research compendium of a representative analysis project of yours by [TIME/DATE]. This should be a self-contained folder that contains inputs and outputs of the analysis and the steps required to reproduce it. This may include:

  • Raw data files
  • Processed data files
  • Spreadsheets
  • Computer code
  • Outputs (plots, text summaries, tables)

Your research compendium should be drawn from your existing work, not a novel analysis. You should include a summary of up to 500 words explaining the organization, motivation, logic, and key decisions made in the analysis. You may use materials you have already provided in your original application.

Please provide your compendium as a zip file or a link to a Dropbox folder, GitHub repository or other online repository. If the results were included in a published report or a peer-reviewed manuscript, please include a link to that in your summary.

We understand that you may not have full control over sharing materials or data worked on as an employee. None of this information will be shared beyond our hiring committee, and it will be used only for the purposes of this evaluation. However, if you are limited in your ability to share parts of your work (for instance, raw data are not under your control), you may omit portions of the work and provide an explanation in the summary.


Credit goes to Hannah Frick, co-founder of R-Ladies Global, for the excellent Wikimedia “Hiring a data scientist” link.

That post by Mikhail Popov includes a job description and a take-home task for interviewees.


Noam, have you ever received pushback on that request from potential hires?

I’ve been part of a consulting firm for a year and a half, and if I were asked that for a potential new position, I don’t think there’s anything I’ve done here that I could ethically provide code or data from. I’d be happy to talk about things at a high level: how I translated client requests into the systems and analyses I planned and executed, what tools I used, etc. But sharing any code or data would be a big no-no.


No, but we primarily hire academics.


Here are two blog post suggestions from Gina Helfrich, PhD, Communications Director & Program Manager for Diversity & Inclusion at NumFOCUS


BUT! @ClaytonJY’s original question included “How can I ensure I get a diverse pool of applicants, and don’t filter them out for the wrong reasons along the way?”
This is hard. Any suggestions?


I think it’s hard to do but not hard to figure out what methodology to use. For example, if you followed all the directives here, you’d be in pretty good shape:


Does anyone implement blinded application review (e.g., blacking out names and pronouns in letters of interest/resumes)? How does it work? I don’t think it’s practical for the late-stage process for us, but we could probably do it for initial screening.


That’s what we did the last time we hired; we had an employee who wasn’t involved in hiring decisions white out as many indicators as possible. Hard to say how well it worked, and there are many more subtle differences between resumes/cover letters across gender/race/background than could ever be whited out, but I’d do it again; it seems at least a bit more fair and unbiased than not doing it.


@ClaytonJY Quick Q: In terms of pure mechanics/workflow, how did you do this? Did the employee actually white out printed applications or did they do it online? Did any tool make it easier/faster? Just trying to reduce the burden on whoever I ask for help with this.


They started by whiting out printed ones because we didn’t think to anonymize up front, but then the employee did all the anonymizing digitally before printing hard copies for review. I think at the time most applicants came from the same place (Indeed), so that probably made it easier, though I don’t know the specific steps needed to go from Indeed application -> anonymized printed version.

In hindsight, seems like a valuable service for a job portal to offer, if they don’t already. Possible caveat is they’d likely have to standardize resume format, and I personally won’t consider applying to anywhere that won’t accept my LaTeX-PDF resume.

Then again, resume formatting might be another subtle signal that allows for subconscious discrimination on the wrong characteristics.
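
For anyone wanting to try the digital anonymizing step, here is a minimal sketch in Python, assuming applications are already available as plain text. The `redact` helper, the placeholder labels, and the patterns are all hypothetical illustrations, not the process we actually used, and the list of names has to be supplied by whoever does the anonymizing. It only catches explicit strings, so the subtler signals mentioned above would still get through.

```python
import re

# Illustrative patterns only (assumptions, not a complete list).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PRONOUNS = re.compile(r"\b(?:he|him|his|she|her|hers)\b", re.IGNORECASE)


def redact(text, names):
    """Replace emails, the supplied names, and gendered pronouns with placeholders."""
    # Emails go first, since they often contain the applicant's name.
    text = EMAIL.sub("[EMAIL]", text)
    # Longest names first, so a full name is replaced before its parts.
    for name in sorted(names, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, "[NAME]", text, flags=re.IGNORECASE)
    return PRONOUNS.sub("[PRONOUN]", text)


letter = "Jane Doe (jane.doe@example.com) led the project; she wrote the pipeline."
print(redact(letter, names=["Jane Doe", "Jane", "Doe"]))
# → [NAME] ([EMAIL]) led the project; [PRONOUN] wrote the pipeline.
```

Someone not involved in the hiring decision would still need to skim the output, since schools, addresses, and co-authors are not handled here.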


One more Q: What did you white out? We’re finally implementing a test of this and I am asking for name and indications of gender, race, and nationality (address) to be removed. I am also considering:

  • School (I made an effort to reach out to HBCUs and my guess is that these schools will be less familiar to reviewers)
  • Publication co-authors. Not sure about this. In any case this and name will be revealed when reviewers look at publications on the second round.

Thanks for your insight!


We only whited out names, and possibly addresses, which really isn’t enough. If you had asked us then why we didn’t white out school names, our excuse would have been that most applicants were local, from similar departments at 3 nearby schools whose variance in quality we already know a lot about, so we think that’s a useful signal. In hindsight I’m less sure that’s defensible, so we might white those out next time.

We weren’t hiring academics or even PhDs, so publications weren’t much of a concern for us. I would be very interested in hearing more opinions from those who do hire academics about how they treat publication history and how big a source of bias that might be.