How do you hire data people in a fair and unbiased way?

ClaytonJY · September 21, 2017, 5:32pm

Hey all, what advice do you have for ensuring a hiring process for data analysts/scientists/engineers/etc. is as fair and unbiased as possible? How can I ensure I get a diverse pool of applicants, and don’t filter them out for the wrong reasons along the way? What has worked well for you, what hasn’t, what do you wish you or others had done differently?

Links, quotes, original thoughts all welcome!

I’ll start it off with a blog post @stefanie sent me about what wikimedia did recently when they needed a data person: https://blog.wikimedia.org/2017/02/02/hiring-data-scientist/

noamross · September 21, 2017, 6:21pm

This has worked well for one narrow area of diversity: I generally want to see examples of work from candidates, but realize that not everyone is free to post their code publicly. So we make this request to candidates who reach the phone-interview stage, and ask for an example of work with the text below.

…So we can further understand your analytical, computational and organizational skills, we ask that you provide a research compendium of a representative analysis project of yours by [TIME/DATE]. This should be a self-contained folder that contains inputs and outputs of the analysis and the steps required to reproduce it. This may include:

Raw data files

Processed data files

Spreadsheets

Computer code

Outputs (plots, text summaries, tables)

Your research compendium should be drawn from your existing work, not a novel analysis. You should include a summary of up to 500 words explaining the organization, motivation, logic, and key decisions made in the analysis. You may use materials you have already provided in your original application.

Please provide your compendium as a zip file or a link to a dropbox folder, GitHub repository or other online repository. If the results were included in a published report or a peer-reviewed manuscript, please include a link to that in your summary.

We understand that you may not have full control over sharing materials or data worked on as an employee. None of this information will be shared beyond our hiring committee, and will be used only for the purposes of this evaluation. However, if you are limited in your ability to share parts of your work - for instance, raw data are not under your control - you may omit portions of the work and provide an explanation in the summary.

stefanie · September 21, 2017, 8:46pm

Credit goes to Hannah Frick, co-founder of R-Ladies Global, for the excellent wikimedia “Hiring a data scientist” link.

That post by Mikhail Popov includes a job description and a take-home task for interviewees.

ClaytonJY · September 22, 2017, 5:17pm

Noam, have you ever received pushback on that request from potential hires?

I’ve been part of a consulting firm for a year and a half, and if I was to be asked that for a potential new position, I don’t think there’s anything I’ve done here I could ethically provide code or data from. I’d be happy to talk about things at a high level, how I translated client requests into the systems and analyses I planned and executed, what tools I used, etc. but sharing any code or data would be a big no-no.

noamross · September 24, 2017, 11:39am

No, but we primarily hire academics.

stefanie · September 26, 2017, 12:04am

Here are two blog post suggestions from Gina Helfrich, PhD, Communications Director & Program Manager for Diversity & Inclusion at NumFOCUS

We analyzed thousands of technical interviews on everything from language to code style. Here’s what we found, from interviewing.io
I like the latter part of this post, How to conduct a good Programming Interview, where it gets at how to structure the interview

stefanie · September 26, 2017, 12:05am

BUT! @ClaytonJY’s original question included "How can I ensure I get a diverse pool of applicants, and don’t filter them out for the wrong reasons along the way?"
This is hard. Any suggestions?

Dr-G · September 26, 2017, 2:20pm

I think it’s hard to do but not hard to figure out what methodology to use. For example, if you followed all the directives here, you’d be in pretty good shape: http://projectinclude.org/hiring#standardize-your-decision-making-process

noamross · October 9, 2017, 3:44pm

Does anyone implement blinded application review (e.g., blacking out names and pronouns in letters of interest/resumes)? How does it work? I don’t think it’s practical for the late-stage process for us, but we could probably do it for initial screening.

ClaytonJY · October 10, 2017, 3:19pm

That’s what we did the last time we hired; had an employee that wasn’t involved in hiring decisions white out as many indicators as possible. Hard to say how well it worked, and there’s a lot more subtle differences between resumes/cover-letters across gender/race/background than could ever be whited-out, but I’d do it again; seems at least a bit more fair or unbiased than not doing it.

noamross · October 19, 2017, 7:42pm

@ClaytonJY Quick Q: In terms of pure mechanics/workflow, how did you do this? Did the employee actually white out printed applications or did they do it online? Did any tool make it easier/faster? Just trying to reduce the burden on whoever I ask for help with this.

ClaytonJY · October 19, 2017, 10:16pm

They started whiting-out printed ones because we didn’t think to anonymize up-front, but then the employee did all the anonymizing digitally before printing hard copies for review. I think at the time most applicants came from the same place (Indeed), so that probably made it easier, though I don’t know the specific steps that had to be done to go from Indeed app -> anonymized printed version.

In hindsight, seems like a valuable service for a job portal to offer, if they don’t already. Possible caveat is they’d likely have to standardize resume format, and I personally won’t consider applying to anywhere that won’t accept my LaTeX-PDF resume.

Then again, resume formatting might be another subtle signal that allows for subconscious discrimination on the wrong characteristics.

noamross · October 24, 2017, 1:01pm

One more Q: What did you white out? We’re finally implementing a test of this and I am asking for name and indications of gender, race, and nationality (address) to be removed. I am also considering:

School (I made an effort to reach out to HBCUs and my guess is that these schools will be less familiar to reviewers)
Publication co-authors. Not sure about this. In any case this and name will be revealed when reviewers look at publications on the second round.

Thanks for your insight!

ClaytonJY · November 3, 2017, 6:23pm

We only whited out names, and possibly addresses, which really isn’t enough. If you had asked us then why we didn’t white out school names, our excuse would have involved the fact that most applicants were local, from similar departments at 3 nearby schools, which we already know a lot about the variance in quality of, so we think that’s a useful signal. In hindsight I’m less sure that’s defensible, so we might white those out next time.

We weren’t hiring academics or even PhDs, so publications weren’t much of a concern for us. Would be very interested in hearing more opinions from those that do hire academics about how they treat publication history and how big a source of bias that might be.
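
For anyone who wants to try the digital anonymizing step discussed above, here is a minimal Python sketch of one way it might be done on plain-text applications. It is not the process anyone in this thread actually used: the `redact` function, its parameters, and the placeholder labels are all hypothetical, and the pronoun and title lists are illustrative rather than complete. It masks email addresses, the candidate's name, any extra terms you supply (a school name, for instance), and gendered pronouns and titles. As noted earlier in the thread, subtler signals will survive this kind of pass, so a person should still read the result before it goes to reviewers.

```python
import re

# Illustrative, incomplete lists (assumptions, not a vetted vocabulary).
PRONOUNS = ["he", "she", "him", "her", "hers", "his", "himself", "herself"]
PRONOUN_RE = re.compile(r"\b(?:" + "|".join(PRONOUNS) + r")\b", re.IGNORECASE)
# Titles are matched case-sensitively so that units like "ms" are left alone.
TITLE_RE = re.compile(r"\b(?:Mr|Mrs|Ms|Miss)\.?\s+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text, names=(), other_terms=()):
    """Return text with emails, titles, names, extra terms and pronouns masked.

    names       -- strings identifying the candidate, e.g. ["Jane Doe", "Jane"]
    other_terms -- anything else to hide, e.g. school names or a street address
    """
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = TITLE_RE.sub("", text)
    for label, terms in (("[NAME]", names), ("[REDACTED]", other_terms)):
        # Longest terms first, so "Jane Doe" is masked before "Jane".
        for term in sorted(terms, key=len, reverse=True):
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            text = pattern.sub(label, text)
    return PRONOUN_RE.sub("[PRONOUN]", text)


if __name__ == "__main__":
    letter = (
        "Jane Doe (jane.doe@example.org) studied at Example University. "
        "She led the analysis, and her code is on GitHub."
    )
    print(redact(letter, names=["Jane Doe", "Jane"],
                 other_terms=["Example University"]))
```

The names and terms are passed in per applicant rather than guessed automatically, which keeps the sketch simple and avoids silently missing or over-masking text. PDF resumes would need a text-extraction step first, which is not shown here.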
