Hiring the right data scientist, or any data professional, remains a difficult task for HR.
There are so many different skill sets required to take data from application systems and put it into analytical ones. From there, it takes a whole other set of skills to create machine learning and statistical models.
The problem is not just finding the right data scientist, but defining what your team needs.
To help simplify this process, our team has put together a few pointers as well as interview questions to help those out there looking to hire new data scientists.
Let’s first start by defining some of the skill sets you will often need when looking for a data scientist.
Data Science Skills
Statistics
One of the tasks data scientists perform is running experiments. This requires hypothesis testing and working with p-values, which we consider a baseline for statistical understanding.
Of course, there is a lot more to statistics than just understanding p-values. There are so many different levels of statistics that a team may require. For example, there is the concept of Frequentist vs. Bayesian statistics (which is usually a good intro question to gauge where someone is in their statistics journey). Some data scientists might only understand the high-level difference, while others might be keeping up with all the most recent updates on when to apply which method.
Perhaps you are looking for Ph.D.-level statistics to research and work on very abstract and ambiguous problems, or maybe you just need a data scientist who can perform basic analytics and run A/B tests.
These are two very different sides of the mathematical spectrum and their pay-grades will often reflect that.
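To make the "basic analytics and A/B testing" end of that spectrum concrete, here is a minimal sketch of the kind of calculation we have in mind: a two-sided, two-proportion z-test on conversion counts. The function name and the counts are hypothetical, chosen only for illustration, and the test assumes samples large enough for the normal approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a result at least this extreme if the null were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 200 of 5,000 control users convert,
# versus 250 of 5,000 treatment users.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A candidate at the baseline level should be able to explain what the p-value here does and does not tell you; a more senior statistician might push back on the test choice or the stopping rule.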
Machine learning and statistical models
Along with general statistics, there are loads of models a data scientist can be familiar with. The generalized separation of model types starts with supervised vs. unsupervised, and then further continues to split into other model types, like classification and regression. Even here, there are further nuances like the concept of ensemble models.
Overall, it is very difficult for one data scientist to fully grasp every single model, let alone to have implemented even half of them in production.
Programming
Programming skills are interesting skills to look for, due to all the varying levels and nuances in programming. For example, some data scientists might only use Python and R for analytical purposes.
This means they might not be a good fit for implementing models into larger systems.
Other data scientists are not only great coders, but also DevOps-level engineers who are experts in Docker, Kubernetes, and CI/CD. They can not only create models but also deploy them to an enterprise system or an AWS Lambda function.
This is really dependent upon what role your team plays. If your team focuses more on research and analysis, then you don’t need data scientists that can also program entire systems.
On the other hand, if your team is heavily entwined with a specific product and needs to deploy models, then having at least one strong programming data scientist is helpful.
SQL
One skill that anyone who uses data inevitably requires is SQL. But not every data scientist or researcher is proficient at writing super complex queries. Personally, we believe that pretty much every data scientist should know the basics of SQL at the very least; otherwise, you can’t even access the data you want, in many cases.
Now, if you already have a strong SQL expert on your team, you might be able to get away with not requiring every new hire to have SQL skills. But it is usually required.
Data visualization
Part of a data scientist’s role is communicating their findings to directors and managers to help drive decisions. One great way to do that is through data visualizations. These can be built in Jupyter Notebooks, Tableau, or any number of other data visualization tools.
Even in the data visualization category, there are varying levels. Some data scientists are only interested in learning just enough to create a report using a few basic charts, while others really enjoy creating beautiful dashboards that managers and directors can drill into. So make sure you know which kind of data visualization skills you are looking for.
Product and business sense
One skill that is often overlooked is a person’s product or business sense. Not every data scientist on your team needs to be amazing at this. However, having one or two team members that are good at knowing where your team could find value in your business is key.
Just knowing how to run models and perform hypothesis testing is not enough as a data scientist. Companies will often have a seemingly endless number of problems to solve, and knowing where to find the best ROI helps justify a data scientist’s six-figure salary.
So you can’t just rely purely on a highly technical data scientist to find and create the right models. Having a team that has good product sense makes a huge difference. Otherwise, the team could be focusing all their brilliance in all the wrong places.
Deep learning for NLP and computer vision
Honestly, most deep learning work is done by software engineers, in our experience. Perhaps the data scientist did some research but the heavy lifting portion of the deep learning is done by the SWEs. Of course, that doesn’t mean companies aren’t looking for data scientists with an understanding of TensorFlow.
So we will add this skill set as well because we have seen it on job descriptions in the past.
Other Factors to Consider Before Creating a Job Requisition
Company size
In the previous section, we mentioned that not all data scientists double as software engineers and DevOps specialists. However, when you are a start-up, your data scientists probably have to.
When you are looking for a data scientist at a large company, often you may already have 10, 20, 100, or maybe even nearly 1,000 data scientists working for your company. On top of that, you probably have just as many or more software engineers and a deployment team.
This allows your team to fill in any technical gaps.
This is usually not the case when you are looking for a data scientist at a start-up. In this case, you are often looking for more of a jack-of-all-trades type of data scientist.
There are nuances when looking for a new data scientist.
Client-facing role
Some data scientists enjoy being client-facing, while others do not. At the end of the day, it is usually unavoidable: every team needs to go out and talk to its clients and internal partners to better align with the company’s goals.
Knowing where the data science team is positioned inside the company can also help you better understand the person you are trying to hire.
Team dynamics
We will bring this up a few more times in this piece, but data science is a team sport. Just like in any other team sport, not everyone is a quarterback, and not everyone on your team should be. There are so many different facets to what makes a successful data science project that hiring for one specific set of skills wouldn’t make sense.
Instead, look for people with a diverse set of skills and experiences. Hire someone from the biology field, the physics field, and the business field. Having a team built of strong individuals who work together and bring a variety of skills provides the ability to amplify your data science team’s impact.
The Data Science Interview
Once you have a general idea of the skills your team needs, you can now develop an interview process to help find those skills.
Our team has put together a very extensive interview guide to help data scientists prepare for their future interviews.
One thing you will notice is that the guide runs the gamut from data science, SQL, and statistics to even algorithms and data structures.
The vast array of topics has come from our past experiences and us seeing so many different types of questions being asked for data science interviews.
Some of this is because data scientists need a large skill set to do their work. However, we also feel that perhaps it is because employers don’t always know what they need. So below, we have provided a few sections of questions you can ask your data scientists based on their role.
The Technical Interview Process
Many of us have interviewed at tech companies like Google and Amazon.
We have had some great interviews, some awkward ones, and some terrible ones.
The question becomes: Who is responsible for the interview experience?
Personally, we believe it is up to the company to create a great interview experience. To this day, many of our team members still recall the best interview experiences they have had, as well as the worst.
So what made the great experiences?
Simply put, we felt like we were solving problems with the interviewer, rather than being asked to perform a trick.
Technical interviews are very stressful. You can’t use your typical tools like a linter or StackOverflow. So even small problems might take a minute or two to solve.
When the interviewer provides little feedback or fails to provide hints that can get the interviewee unstuck, it’s not good for either side.
One, the interviewee has a bad experience, and two, the company could be losing out on a chance to hire a smart engineer who just had their brain freeze on how to convert an integer or do some simple task.
So creating an interview culture that focuses not just on getting the right answer, but also on gauging the candidate’s level as the interview unfolds, is a great tip.
Every company has its own idea here, so we won’t say that this notion is 100% correct. But we will also say that we stand behind the idea of attempting to create a collaborative interview vs. a one-sided interview that just feels like one is being asked to perform tricks.
Data Science Interview Questions
There is still a lot of debate in the technical world over whether whiteboarding and technical interviews work. We personally find value in being able to assess some baseline skill sets through a few technical questions.
You shouldn’t be looking to trick your candidate. Your interview should be structured to compensate for stress and the lack of an IDE. This means your questions should focus on laying out simple concepts that help see if your candidate has at least a general understanding of the skills you need.
For example, if you need SQL skills, then you should probably have at least one question with a join. From there, it could depend on how much SQL and what level of SQL is required. If your position needs a lot of complex SQL, then you can follow up with a question that requires either a self or left join, just to ensure the candidate understands the question.
If, on the other hand, the role requires only the occasional query or your data scientists can rely on data engineers to pull their data, then you can just ask what the difference is between a left and an inner join.
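As a rough illustration of the answer you would hope to hear, the sketch below runs the same query with an inner join and a left join using Python’s built-in `sqlite3` module. The `customers` and `orders` tables and their rows are hypothetical, made up for this example.

```python
import sqlite3

# In-memory database with two tiny, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# LEFT JOIN keeps every customer; order columns are NULL when nothing matches.
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

print(inner_rows)  # Linus has no orders, so he does not appear here
print(left_rows)   # Linus appears once, with None for the amount
```

A candidate who can explain why the customer without orders disappears from the first result and shows up with a NULL in the second has the conceptual understanding this kind of role needs.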
Statistics
Statistics questions can vary. Some companies we have seen will ask the statistics question inside of a programming question if the role requires a larger amount of programming, while others might just ask more probability-style questions, like these ones taken directly from glassdoor.com:
- A die is rolled twice. What is the probability of showing a 3 on the first roll and an odd number on the second roll?
- In any 15-minute interval, there is a 20% probability that you will see at least one shooting star. What is the probability that you see at least one shooting star in the period of an hour?
- Alice has two kids and one of them is a girl. What is the probability that the other child is also a girl? You can assume that there is an equal number of males and females in the world.
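For interviewers who want to check their own answers before asking these, here is a small sketch that works through all three by enumeration or direct calculation. It rests on assumptions the questions leave implicit, which are worth discussing with the candidate: the four 15-minute intervals are independent, and "one of them is a girl" is read as "at least one of the two children is a girl" (other readings give a different answer).

```python
from fractions import Fraction
from itertools import product

# Question 1: a 3 on the first roll and an odd number on the second.
rolls = list(product(range(1, 7), repeat=2))
hits = [r for r in rolls if r[0] == 3 and r[1] % 2 == 1]
p_dice = Fraction(len(hits), len(rolls))

# Question 2: at least one shooting star in an hour, assuming the four
# 15-minute intervals are independent of one another.
p_none_in_15 = 1 - Fraction(1, 5)
p_star_in_hour = 1 - p_none_in_15 ** 4

# Question 3: both children are girls, given that at least one is a girl
# (this reading of the question is an assumption).
families = list(product("BG", repeat=2))
with_a_girl = [f for f in families if "G" in f]
both_girls = [f for f in with_a_girl if f == ("G", "G")]
p_other_girl = Fraction(len(both_girls), len(with_a_girl))

print(p_dice)                                  # 1/12
print(p_star_in_hour, float(p_star_in_hour))   # 369/625 0.5904
print(p_other_girl)                            # 1/3
```

The value of these questions is less in the final number than in whether the candidate states assumptions like independence out loud.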
Product and metrics questions
- An important metric goes down. How would you dig into the causes?
- What metrics would you use to quantify the success of YouTube ads? (This could also be extended to other products like Snapchat filters, Twitter live-streaming, Fortnite new features, etc.)
- How do you measure the success or failure of a product/product feature?
- Google has released a new version of its search algorithm, for which they used A/B testing. During the testing process, engineers realized that the new algorithm was not implemented correctly and returned less relevant results. Two things happened during testing:
  - People in the treatment group performed more queries than the control group.
  - Advertising revenue was higher in the treatment group as well.
  What may be the cause of people in the treatment group performing more searches than the control group? There are different possible answers here.
(Question 4 borrowed from Zarantech; we really enjoyed it and thought it was a good example of how things can go wrong.)
SQL
We brought up SQL earlier as an important skill, but as we said, there are levels. A good way of figuring out your candidate’s level is to ask a few styles of questions.
- Selects with an aggregate.
- Selects with an inner join.
- Selects with a left join.
- Selects with a self-join.
- Bonus: selects that force some use of an analytic clause.
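To give a feel for how the difficulty climbs across these styles, here is a sketch of three of them (an aggregate, a self-join, and an analytic clause) run through Python’s built-in `sqlite3` module. The `employees` table and its rows are hypothetical. Note that window functions such as `RANK() OVER (...)` require SQLite 3.25 or newer, which we assume is what your Python installation bundles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept TEXT,
        salary INTEGER,
        manager_id INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'Ada',   'eng',   150, NULL),
        (2, 'Grace', 'eng',   120, 1),
        (3, 'Linus', 'eng',   110, 1),
        (4, 'Edgar', 'sales',  90, NULL),
        (5, 'Joan',  'sales',  80, 4);
""")

# Select with an aggregate: headcount and total salary per department.
totals = conn.execute("""
    SELECT dept, COUNT(*), SUM(salary)
    FROM employees
    GROUP BY dept
    ORDER BY dept
""").fetchall()

# Select with a self-join: pair each employee with their manager's name.
reports = conn.execute("""
    SELECT e.name, m.name
    FROM employees AS e
    INNER JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

# Select with an analytic clause: rank salaries within each department.
ranked = conn.execute("""
    SELECT name, dept,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY dept, salary_rank
""").fetchall()

print(totals)
print(reports)
print(ranked)
```

In an interview, you would hand the candidate the table definition and describe the desired output, then see which of these they can write without prompting.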
Data science algorithms and machine learning
For this section, you can stay high-level and focus more on conceptual questions about different types of models, concepts, and gotchas in data science. For example, one problem a more junior data scientist might not have encountered in production is data imbalance. So asking about it can help surface how much experience a candidate has.
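To see why data imbalance makes a good probe, consider this minimal sketch with made-up labels: a "model" that always predicts the majority class looks excellent on accuracy while catching none of the cases that matter. The 990-to-10 split is a hypothetical example, not data from any real project.

```python
# Hypothetical labels: 990 negatives and only 10 positives (e.g., fraud cases).
y_true = [0] * 990 + [1] * 10

# A naive "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: share of actual positives the model managed to find.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```

A candidate with production experience will usually bring up metrics like recall or precision, resampling, or class weighting without being prompted.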
Here are some other questions you could also try asking:
- What is a recent model you implemented, and how did you have to prepare the data in order to make the model work?
- What is logistic regression?
- What is A/B testing?
Programming algorithms and data structures
It’s rare that data scientists need a deep knowledge of programming. However, in the case that they do, there are plenty of problems across the internet that give good insight into a data scientist’s programming knowledge without being so complex that they would cause the candidate to stumble.
For example, here are three.
Onboarding Your Data Science Team
Alright, now that you have spent so much time getting your data science team, you need to both onboard and train them so they can understand your technical environment, culture, and pain points.
These three things are the core areas we believe companies need to focus on when onboarding technical teams to help them become efficient quickly.
Onboarding is your company’s first impression as an employer; until now, the relationship has been one of interviewer and interviewee.
You need to set the tone for how working at this company will be.
Good Luck With Hiring
Interviewing data scientists isn’t always about a large checklist of skills. Very few people have every skill a data science team needs, just like not every player on a basketball team needs to know every position. Even though some might argue that each player in basketball really just needs to know how to dribble and shoot, there are nuances that separate each role. Similarly, most data scientists should have a basic understanding of SQL and statistics, but from there they could vary in level and specialize in various other skills.
Data science isn’t usually an individual effort; it’s a team sport. So don’t just focus on hiring individual all-stars; focus on developing strong teams.
Thanks for reading and good luck hiring!