Trey Causey – Hiring Data Scientists
8/18/2019 Trey Causey – Hiring Data Scientists
trey causey
Hiring data scientists
May 04, 2015
These are my thoughts about the process of hiring data scientists. By process I
mean the actual process of conducting interviews and asking candidates
questions. I don't mean the topic of the questions or the skills you need to look for
in data scientists (I covered that earlier here). Many of these thoughts also extend
to hiring software developers and beyond, but they are written specifically about
data scientists. This piece is the result of my interviewing candidates and being
interviewed as a candidate at mid-size pre-IPO companies, large public
companies, and small startups. Full disclosure -- these are not data-driven, at least
not with any large samples. However, I believe they are internally consistent and
will result in a better, more equitable hiring process for both those doing the
hiring and those seeking a position and help fight against employee churn caused
by mismatched hires.
Obligatory note that these are my own views and not the views of any of my
employers, past or present.
My ideal process looks something like this. I'll detail each of these steps below.
1) Initial phone call to discuss the details of the position, hear from the candidate
about his/her experience, and, depending on the seniority of the position, a brief
technical screen.
2) Homework assignment
3) On-site interview
4) Offer / no offer
That's it. There aren't multiple on-site interviews unless the candidate requests
them (which is a totally good thing). The goal is to hire people who are interested
in the job, who would be interesting to work with, and have the potential to grow
in the position while having a positive impact on the company. That's it. You're
not looking for ninjas, rockstars, pirates, unicorns, 10x data scientists, or Stanford
degrees. You're looking for people -- of all genders and all ethnicities and all
backgrounds.
These four steps should start with a fairly wide net and get narrower as the
process is completed. Err on the side of interesting, not on the side of exacting
resume requirements. People with diverse backgrounds bring a lot to the table --
things you may not even realize you're missing in your organization. When you
select for highly specific resume requirements (e.g., a specific degree level, a
degree in a specific field, a certain programming language, etc.), you are
unnecessarily limiting the search. Not only are you missing out on extremely
intelligent and motivated people, you're most likely thinking about solving the
very specific problems you're facing right now rather than the problems you'll be
facing in six or twelve months.
Don't overfit to your organization's immediate situation or some template for what
a "data science resume" should look like. Data science is a new field and there are
many different kinds of data scientists. Check out this O'Reilly report by Harlan
Harris (http://www.oreilly.com/data/free/analyzing-the-analyzers.csp) for a good
overview. If you only hire people with PhDs in computer
science from Stanford or Berkeley, you're likely to miss out on great candidates
from other fields and schools. If you value diverse knowledge, viewpoints, and
expertise, you need to branch out. Drew Conway has a great talk on this
(http://drewconway.com/zia/2013/7/18/warning-do-not-feed-the-wildebeests).
Obviously, if the Stanford CS PhD is the best candidate for the job, you should
hire him/her! But don't make this your primary selection characteristic.
Guiding Principles
1) Treat candidates like they're intelligent human beings, with honest intentions,
and with something to add. That doesn't mean the something they can add is the
thing you need -- but it doesn't mean they're stupid, or incompetent, or any of the
other pejoratives that are often thrown around about candidates. Looking for a job
is stressful for everyone.
[...] work environment as possible. The candidate can use his/her own equipment in a
setting of their choosing using their favorite development environment. It's to
remove the artificiality of coding on a whiteboard, or coding up an algorithm
from scratch that the candidate probably wouldn't have to do in a typical day's
work.
For data scientists, this means providing a dataset that doesn't require a ton of
cleaning, but perhaps requires some cleaning. Then ask the candidate to build a
model to answer some question, explain why they made the modeling choices they
made, and how they evaluated the performance of their model. That's it.
Seriously. The parameters are broad enough that candidates can be as
sophisticated or not as they like, and can demonstrate substantive knowledge about a
problem domain (e.g., class imbalance in some prediction cases, or the
in/appropriateness of linear models for certain tasks).
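To make the class imbalance example concrete, here is a minimal sketch of the kind of evaluation reasoning a candidate might show. The labels and both "models" are made up for illustration; nothing here comes from a real assignment. It only shows why accuracy alone can mislead when one class is rare.

```python
# Toy illustration: all labels and predictions below are invented for this sketch.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives that the model actually found."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced dataset: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# "Model" A never predicts the rare class.
always_negative = [0] * 100

# Hypothetical "model" B: 10 false alarms, but it catches 4 of the 5 positives.
some_model = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

for name, preds in [("always_negative", always_negative), ("some_model", some_model)]:
    print(f"{name}: accuracy={accuracy(y_true, preds):.2f} recall={recall(y_true, preds):.2f}")
```

The do-nothing model scores higher on accuracy (0.95 vs. 0.89) while finding none of the rare cases (recall 0.00 vs. 0.80). A candidate who notices this and picks a metric suited to the question is showing exactly the substantive knowledge the assignment is meant to surface.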
Many people tend to make both technical screens and homework assignments too
hard and overly specific to the job. Do a survey of your colleagues and try to
gauge where they were at, skill-wise, when they were hired for a similar job to the
one listed. Figure out what they learned on the job. If a skill / method was
primarily learned on the job, it doesn't make for a good screen / homework
question. It might make a good speculative "how would you handle" on-site
question.
Two common arguments against assigning homework are a) it means you won't
be able to attract the very top candidates who don't have to do homework for
other prospective jobs, and b) it provides an undue burden on those candidates
who have kids or other obligations at night / on the weekends.
The first criticism makes little sense to me. Candidates that are excited about the
work and want to make sure they will like the kind of work they will be doing are
the kinds of candidates you want to hire. Candidates who see themselves as
invaluable commodities that will only go to the employer that showers them with
money and asks nothing in return are probably not. I'm more afraid of missing out
on that person who may not have the typical profile but will be an awesome
colleague than I am of missing out on someone with a picture-perfect but
uninteresting resume.
The second criticism can be a fair one -- that's why it's so important to make sure
you make the homework task a reasonable one, that can be completed to
satisfaction in a short amount of time. This takes practice! Make sure you solicit
feedback from all of the candidates who complete the homework assignment
about how long it took to complete the task. Remember that candidates may be
asked to do homework assignments for several different companies. There are
also options to use a time-boxed environment.
If the candidate really can't do the homework assignment or you really want to be
strict about how long the homework assignment takes, you can make this part of
the on-site interview. In this case, the first two hours of the day will be devoted to
the assignment and the candidate can bring his or her own laptop to work on. Naturally, this increases the number of on-site interviews you will likely do.
On-site Interview
On-site interviews should be conversational, not adversarial. They should be
technical but not trivia sessions. One good idea for a session is to have the
candidate present his/her homework to one or more of the interviewers, where
they can ask probing questions, hear explanations about why the candidate made
certain choices, etc. This isn't a defense, but a discussion. Other good questions
include extended conversations about projects that the candidate has worked on
(within the range of discussion allowed by NDAs, of course) where
implementation details, architectures, and technologies can be discussed.
Ask the same questions (within reason) of all candidates. This provides
standardization and a baseline that you can use to evaluate candidates. It also
(somewhat) removes your own personal biases from the process where you tailor certain kinds of questions to certain kinds of candidates.
Whiteboard Coding
No. Whiteboard. Coding. That merits repeating. No. Whiteboard. Coding. I said
it. Whiteboard coding is a bad idea for a number of reasons.
First, it's not a natural environment for writing code in; even if it is "unnatural for
everyone" as its proponents like to argue, it's not actually true that the degree of
unnatural-ness is uniformly distributed. No one actually writes code on the board,
without any kind of tools like tab-complete or an IDE, or without Google/Stack
Overflow/help files.
Second, whiteboard coding problems naturally skew towards trivial programming
exercises that reflect "Algorithms 101" style problems. If you've been taking my
advice and recruiting across a variety of backgrounds, many very qualified
candidates won't have a formal CS education and won't be able to rattle off
various sort algorithms from memory. Building on the first point, it's rare that
people have to implement these things from scratch anyway -- and when they do,
they have tools and books at their disposal. Making a whiteboard coding exercise
look like an Algorithms 101 quiz question will tend to favor young candidates
who just graduated with a CS degree and candidates who have spent a lot of time
prepping for coding interviews. That's about it.
If the homework assignment was "iffy" in quality, ask the candidate to email in
some code samples, or post them on GitHub, so they can be discussed during the
interview as well. This eliminates the need for whiteboard coding.
Conduct
Interviewing is extremely stressful. Make sure you build in plenty of small breaks
for the candidate to use the restroom, have water/coffee/etc., and collect their
thoughts. If the candidate wants to whiteboard things, let them, but don't force
them. Treat them like humans who might be your co-workers one day soon!
Always, always, always leave time for the candidate to ask questions about the
company and the position -- never omit this portion of an interview session
merely because you want to ask more questions. As a candidate, I see skipping it
as a huge red flag. I interviewed at a very large tech firm where an interviewer
told me he was forgoing Q&A on my part because he wanted to ask me one more
question. This was a data point I used when I declined their offer.
Hiring and job-seeking is a matching process. You want everyone involved to
have as much information as possible to avoid the possibility of a mismatch --
which is costly for everyone. Candidates need to know the truth about what it's
like to work at your company; you need to see their work and discuss it with
them. This is why conversations, not inquisitions, are important.
Conclusion
I am advocating for an empathetic but potentially more time-consuming process
here. I realize a lot of this advice runs contrary to widespread hiring practices, and
I'm comfortable with that. I'm interested in promoting a workplace that
welcomes diversity in all forms and that matches candidates with jobs that help
them grow as people and as data scientists. Similarly, you should be hiring people
who will push your organization forward and who encourage learning and
teaching between peers.
Posted on: May 04, 2015