Trey Causey – Hiring Data Scientists


  • 8/18/2019 Trey Causey – Hiring Data Scientists


    trey causey


    Hiring data scientists

    May 04, 2015

    These are my thoughts about the process of hiring data scientists. By process I

    mean the actual process of conducting interviews and asking candidates questions. I don't mean the topic of the questions or the skills you need to look for

    in data scientists (I covered that earlier here). Many of these thoughts also extend

    to hiring software developers and beyond, but they are written specifically about

    data scientists. This piece is the result of my interviewing candidates and being

    interviewed as a candidate at mid-size pre-IPO companies, large public

    companies, and small startups. Full disclosure -- these are not data-driven, at least

    not with any large samples. However, I believe they are internally consistent and

    will result in a better, more equitable hiring process for both those doing the

    hiring and those seeking a position, and will help fight the employee churn caused

    by mismatched hires.

    Obligatory note that these are my own views and not the views of any of my

    employers, past or present.

    My ideal process looks something like this. I'll detail each of these steps below.

    1) Initial phone call to discuss the details of the position, hear from the candidate

    about his/her experience, and, depending on the seniority of the position, conduct a brief

    technical screen.

    http://treycausey.com/getting_started.html


    2) Homework assignment

    3) On-site interview

    4) Offer / no offer 

    That's it. There aren't multiple on-site interviews unless the candidate requests

    them (which is a totally good thing). The goal is to hire people who are interested

    in the job, who would be interesting to work with, and who have the potential to grow in the position while having a positive impact on the company. That's it. You're

    not looking for ninjas, rockstars, pirates, unicorns, 10x data scientists, or Stanford

    degrees. You're looking for people -- of all genders and all ethnicities and all

     backgrounds.

    These four steps should start with a fairly wide net and get narrower as the

     process is completed. Err on the side of interesting, not on the side of exacting

    resume requirements. People with diverse backgrounds bring a lot to the table -- things you may not even realize you're missing in your organization. When you

    select for highly specific resume requirements (e.g., a specific degree level, a

    degree in a specific field, or a certain programming language), you are

    unnecessarily limiting the search. Not only are you missing out on extremely

    intelligent and motivated people, you're most likely thinking about solving the

    very specific problems you're facing right now rather than the problems you'll be

    facing in six or twelve months.

    Don't overfit to your organization's immediate situation or some template for what

    a "data science resume" should look like. Data science is a new field and there are

    many different kinds of data scientists. Check out this O'Reilly report by Harlan

    Harris for a good overview. If you only hire people with PhDs in computer 

    science from Stanford or Berkeley, you're likely to miss out on great candidates

    from other fields and schools. If you value diverse knowledge, viewpoints, and

    expertise, you need to branch out. Drew Conway has a great talk on this.

    Obviously, if the Stanford CS PhD is the best candidate for the job, you should

    hire him/her! But don't make this your primary selection characteristic.

    Guiding Principles

    1) Treat candidates like they're intelligent human beings, with honest intentions,

    and with something to add. That doesn't mean the something they can add is the

    thing you need -- but it doesn't mean they're stupid, or incompetent, or any of the

    other pejoratives that are often thrown around about candidates. Looking for a job

    is stressful for everyone.

    http://drewconway.com/zia/2013/7/18/warning-do-not-feed-the-wildebeests
    http://twitter.com/drewconway
    http://twitter.com/harlanh
    http://www.oreilly.com/data/free/analyzing-the-analyzers.csp


    work environment as possible. The candidate can use his/her own equipment in a

    setting of their choosing using their favorite development environment. The goal is to

    remove the artificiality of coding on a whiteboard, or coding up an algorithm

    from scratch that the candidate probably wouldn't have to do in a typical day's

    work.

    For data scientists, this means providing a dataset that doesn't require a ton of cleaning, but perhaps requires some cleaning. Then ask the candidate to build a

    model to answer some question, explain why they made the modeling choices they

    made, and how they evaluated the performance of their model. That's it.

    Seriously. The parameters are broad enough that candidates can be as

    sophisticated or not as they like, and can demonstrate substantive knowledge about a

     problem domain (e.g., class imbalance in some prediction cases, or the

    in/appropriateness of linear models for certain tasks).
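
    As one illustration of the class-imbalance point, here is a minimal sketch of the
    kind of evaluation reasoning a candidate might show. The labels are made up for
    illustration (95 negatives, 5 positives), and the helper names are hypothetical,
    not part of any particular assignment or library. It shows why accuracy alone
    can look good for a model that never predicts the rare class.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical, heavily imbalanced labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the most common class.
majority_class = Counter(y_true).most_common(1)[0][0]
baseline_pred = [majority_class] * len(y_true)

baseline_accuracy = accuracy(y_true, baseline_pred)
baseline_precision, baseline_recall = precision_recall(y_true, baseline_pred)

print(f"accuracy={baseline_accuracy:.2f}")
print(f"precision={baseline_precision:.2f} recall={baseline_recall:.2f}")
```

    The baseline scores high on accuracy while finding none of the positive cases.
    A candidate who reports precision and recall alongside accuracy, and explains
    the gap, is showing the kind of judgment the assignment is meant to surface.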

    Many people tend to make both technical screens and homework assignments too

    hard and overly specific to the job. Do a survey of your colleagues and try to

    gauge where they were at, skill-wise, when they were hired for a similar job to the

    one listed. Figure out what they learned on the job. If a skill / method was

     primarily learned on the job, it doesn't make for a good screen / homework 

    question. It might make a good speculative "how would you handle" on-site

    question.

    Two common arguments against assigning homework are a) it means you won't

     be able to attract the very top candidates who don't have to do homework for 

    other prospective jobs, and b) it places an undue burden on those candidates

    who have kids or other obligations at night / on the weekends.

    The first criticism makes little sense to me. Candidates that are excited about the

    work and want to make sure they will like the kind of work they will be doing are

    the kinds of candidates you want to hire. Candidates who see themselves as invaluable commodities that will only go to the employer that showers them with

    money and asks nothing in return are probably not. I'm more afraid of missing out

    on that person who may not have the typical profile but will be an awesome

    colleague than I am of missing out on someone with a picture-perfect but

    uninteresting resume.

    The second criticism can be a fair one -- that's why it's so important to make sure

    you make the homework task a reasonable one, that can be completed to satisfaction in a short amount of time. This takes practice! Make sure you solicit

    feedback from all of the candidates who complete the homework assignment

    about how long it took to complete the task. Remember that candidates may be


    asked to do homework assignments for several different companies. There are

    also options to use a time-boxed environment.

    If the candidate really can't do the homework assignment or you really want to be

    strict about how long the homework assignment takes, you can make this part of 

    the on-site interview. In this case, the first two hours of the day will be devoted to

    the assignment and the candidate can bring his or her own laptop to work on. Naturally, this increases the number of on-site interviews you will likely do.

    On-site Interview

    On-site interviews should be conversational, not adversarial. They should be

    technical but not trivia sessions. One good idea for a session is to have the

    candidate present his/her homework to one or more of the interviewers, where

    they can ask probing questions, hear explanations about why the candidate made certain choices, etc. This isn't a defense, but a discussion. Other good questions

    include extended conversations about projects that the candidate has worked on

    (within the range of discussion allowed by NDAs, of course) where

    implementation details, architectures, and technologies can be discussed.

    Ask the same questions (within reason) of all candidates. This provides

    standardization and a baseline that you can use to evaluate candidates. It also

    (somewhat) removes your own personal biases from the process where you tailor certain kinds of questions to certain kinds of candidates.

    Whiteboard Coding

     No. Whiteboard. Coding. That merits repeating. No. Whiteboard. Coding. I said

    it. Whiteboard coding is a bad idea for a number of reasons.

    First, it's not a natural environment for writing code in; even if it is "unnatural for 

    everyone" as its proponents like to argue, it's not actually true that the degree of 

    unnatural-ness is uniformly distributed. No one actually writes code on the board,

    without any kind of tools like tab-complete or an IDE, or without Google/Stack

    Overflow/help files.

    Second, whiteboard coding problems naturally skew towards trivial programming

    exercises that reflect "Algorithms 101" style problems. If you've been taking my

    advice and recruiting across a variety of backgrounds, many very qualified

    candidates won't have a formal CS education and won't be able to rattle off 

    various sort algorithms from memory. Building on the first point, it's rare that

     people have to implement these things from scratch anyway -- and when they do,


    they have tools and books at their disposal. Making a whiteboard coding exercise

    look like an Algorithms 101 quiz question will tend to favor young candidates

    who just graduated with a CS degree and candidates who have spent a lot of time

     prepping for coding interviews. That's about it.

    If the homework assignment was "iffy" in quality, ask the candidate to email in

    some code samples, or post them on GitHub, so they can be discussed during the interview as well. This eliminates the need for whiteboard coding.

    Conduct

    Interviewing is extremely stressful. Make sure you build in plenty of small breaks

    for the candidate to use the restroom, have water/coffee/etc., and collect their

    thoughts. If the candidate wants to whiteboard things, let them, but don't force

    them. Treat them like humans that might be your co-worker one day soon!

    Always, always, always leave time for the candidate to ask questions about the

    company and the position -- never omit this portion of an interview session

    merely because you want to ask more questions. As a candidate, I consider that a

    huge red flag. I interviewed at a very large tech firm where an interviewer

    told me he was forgoing Q&A on my part because he wanted to ask me one more

    question. This was a data point I used when I declined their offer.

    Hiring and job-seeking is a matching process. You want everyone involved to

    have as much information as possible to avoid the possibility of a mismatch --

    which is costly for everyone. Candidates need to know the truth about what it's

    like to work at your company; you need to see their work and discuss it with

    them. This is why conversations, not inquisitions, are important.

    Conclusion

    I am advocating for an empathetic but potentially more time-consuming process

    here. I realize a lot of this advice runs contrary to widespread hiring practices, and

    I'm comfortable with that. I'm interested in promoting a workplace that

    welcomes diversity in all forms and that matches candidates with jobs that help

    them grow as people and as data scientists. Similarly, you should be hiring people

    that will push your organization forward, and that encourage learning and

    teaching between peers.


    Posted on: May 04, 2015
