Data Science job interviews vary a lot from one domain to another. This blog post sheds light on what you should expect once you get past the introductory stages and head to the onsite, and on how to prepare.
- Take Home / Onsite Data Exercise : Some companies give you a data exercise to work on and expect you to submit your answers or predictions. At a former startup I worked at, we had an easier take-home data exercise that was expected to take only a few hours to complete. If we brought you on-site, we gave you the harder data exercise, which was expected to take you all day: you had to clean the data, build the ETL, do some modeling, summarize your results, and then present them to the team by the end of the day. The on-site data exercise was not perfect, since some candidates buckled under the pressure and others didn't finish in the allotted time, but we found that, more than anything else we did, it gave us the highest signal-to-noise ratio when evaluating Data Scientist candidates. In fact, it has almost become standard practice to get at least a take-home data exercise when you apply for a Data Science job at a startup today. It is difficult to prepare for this, but the best resource for lots of practice, or for building a portfolio, is the Collection of data science take home challenges
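To make the cleaning/ETL step concrete, here is a minimal sketch of the kind of first pass a take-home usually starts with. The table, column names, and cleaning rules (normalize casing, drop rows missing required fields, dedupe on `user_id`) are all hypothetical, not from any specific company's exercise:

```python
import csv
import io

# Hypothetical raw export: missing values, inconsistent casing, and a
# duplicate row -- the kind of mess a take-home exercise hands you.
raw = """user_id,signup_date,plan,monthly_spend
1,2021-03-04,Pro,49.00
2,2021-03-05,free,
3,,pro,49.00
2,2021-03-05,free,
"""

def clean_rows(fileobj):
    """Minimal ETL pass: normalize casing, drop incomplete rows, dedupe."""
    seen = set()
    for row in csv.DictReader(fileobj):
        if not row["user_id"] or not row["signup_date"]:
            continue  # drop rows missing required fields
        if row["user_id"] in seen:
            continue  # dedupe on user_id
        seen.add(row["user_id"])
        row["plan"] = row["plan"].strip().lower()
        row["monthly_spend"] = float(row["monthly_spend"] or 0.0)
        yield row

rows = list(clean_rows(io.StringIO(raw)))
```

In a real exercise you would likely reach for pandas, but being able to articulate each decision (what counts as a required field, how you impute, what you dedupe on) matters more than the tool.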
- Whiteboard programming : this is something we all love to hate because it's not natural. On the job, you would typically use an IDE, shell or text editor. For an employer, this gives them an opportunity to test your problem-solving skills and your thought process as you work through your solution. You typically see this in software engineering interviews, but it has become a mainstay in Data Science interviews as well. You're usually given a few problems and asked to solve them in the language you're most comfortable with. To prepare for this, it is essential that you review basic computer science concepts, data structures, Big-O notation and complexity, and practice a lot of coding challenges. Some helpful resources to use here are Data Structures Interview, Grokking Algorithms and Project Euler
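A representative whiteboard-style problem (my choice of example, not one any particular company asks) is two-sum, because it lets you narrate exactly the complexity trade-off interviewers want to hear:

```python
def two_sum(nums, target):
    """Return indices of two numbers in nums that add up to target,
    or None if no such pair exists.

    A single pass with a dict gives O(n) time and O(n) space, versus
    O(n^2) time for the brute-force nested loop. Talking through that
    trade-off out loud is the point of the exercise.
    """
    seen = {}  # value -> index of where we saw it
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return (seen[complement], i)
        seen[n] = i
    return None
```

For example, `two_sum([2, 7, 11, 15], 9)` returns `(0, 1)`. Practicing until you can write something like this while explaining each line is what the coding-challenge grind is for.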
- ML Project Discussion : Most interviewers will ask you to discuss one of the recent projects you've completed. This helps them test your ability to communicate your ideas succinctly. It also helps if you're able to tie some of your experience on that project to the role you're interviewing for. Highlighting which parts of the project you collaborated on with other colleagues, spent the most time working on, or were most excited about will probably help guide the course of the interview.
- General / Applied Machine Learning knowledge : when you're discussing your recent project, the interviewer will probably ask which machine learning algorithms you considered and used. Assuming you say Random Forest, you should be prepared to discuss the Random Forest algorithm in excruciating detail: why you used that particular algorithm, in what scenarios it works best, where it fails, and how you tuned it. They could also throw in hypothetical questions, like whether you would do anything differently if you had to scale your project up to handle large amounts of data (from hundreds of GBs to hundreds of TBs or more). Another important topic to discuss here is how and why you chose your evaluation metric, and how you extracted insights from your model to actually help the business.
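On metric choice in particular, a common line of questioning is why accuracy misleads on imbalanced data (fraud, churn, etc.). It is worth being able to compute the alternatives from raw counts on the whiteboard. A sketch in plain Python, using a made-up label vector:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 from raw label lists.

    On imbalanced data, accuracy can look great while the model misses
    most positives, which is exactly why interviewers push on this.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# A classifier that predicts "negative" for everything on a 90%-negative
# dataset scores 90% accuracy but 0 recall on the class you care about.
p, r, f = precision_recall_f1([0] * 9 + [1], [0] * 10)
```

Being able to connect the chosen metric back to a business cost (what a false positive versus a false negative costs the company) is usually the strongest answer here.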
- Theoretical Machine Learning knowledge : some of the expectations here go beyond just being able to use prepackaged machine learning models from different programming languages. Some roles could require you to rewrite machine learning algorithms to fit your use case, whether because of scaling concerns, getting them to work in a distributed environment, or integrating the algorithm or eventual model with other internal production systems. Here, you might be expected to derive some machine learning algorithm formulations while stating your assumptions. A simple example could be deriving logistic regression from first principles, or maybe your favorite neural network activation function. They can get a bit into the weeds here, especially in domain-specific areas. Some resources that will help you implement these algorithms and develop an intuitive understanding of what's going on under the hood are ML Refined and ML Algorithms from Scratch
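To give a flavor of the "derive logistic regression from first principles" prompt, here is one minimal sketch. Minimizing the negative log-likelihood L(w) = -Σᵢ [yᵢ log sᵢ + (1 - yᵢ) log(1 - sᵢ)], where sᵢ = sigmoid(w·xᵢ), gives the gradient ∂L/∂w = Σᵢ (sᵢ - yᵢ) xᵢ, and plain gradient descent follows it. The toy data and hyperparameters below are my own illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Logistic regression from first principles via gradient descent.

    Follows the gradient of the negative log-likelihood,
    dL/dw = sum_i (sigmoid(w . x_i) - y_i) * x_i.
    The bias is handled by appending a constant 1.0 feature to each row.
    """
    X = [row + [1.0] for row in X]  # append bias term
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            s = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (s - yi) * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x + [1.0])))

# Tiny separable 1-D dataset: inputs below ~1.5 are class 0, above are class 1.
w = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```

In an interview, walking through the derivation on the board usually matters more than the code; the code just proves the gradient you derived actually learns.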
- Engineering / Scaling : Depending on the role, the interviewer could ask you to discuss how you would scale a particular algorithm from small and medium data to big data (hundreds of TBs or more) with and without specialized hardware or infrastructure while still hitting your SLAs for prediction latency and model performance. I’ll discuss this in more detail in a later post.
- SQL : this is a vital skill for a Data Scientist and it isn't going away anytime soon. Your interviewer could ask you to write SQL queries to answer a question, which could require different SQL constructs like joins, subqueries, window functions, aggregations, etc. SQL resource
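As an illustration of the level these questions are usually pitched at, here is a made-up two-table schema and a question combining a join, an aggregation, and a subquery ("which users spent more than the average total spend?"), runnable against an in-memory SQLite database:

```python
import sqlite3

# Hypothetical users/orders schema, mirroring what an interviewer
# might hand you on a whiteboard.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'grace'), (3, 'alan');
    INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, 75.0);
""")

# Which users spent more than the average per-user total?
query = """
    SELECT u.name, SUM(o.amount) AS total
    FROM users u
    JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
    HAVING total > (
        SELECT AVG(t) FROM (SELECT SUM(amount) AS t
                            FROM orders GROUP BY user_id)
    )
    ORDER BY total DESC;
"""
rows = conn.execute(query).fetchall()  # [('grace', 75.0)]
```

Expect follow-ups probing the details: why `alan` (no orders) disappears under an inner join, or how the query changes with a LEFT JOIN or a window function instead of the subquery.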
- Business Questions : The interviewer will ask questions specific to the business. A typical question could be: how would you improve [main metric that team cares about]? It could also be how you would improve mobile usage, response time, subscription retention, churn or search, depending on the team you're interviewing with. This usually boils down to describing how you would approach the problem, do the analysis, give an example of a segment that performs badly, explain why you think that segment is doing poorly, and come up with an idea to fix it. It is essential that you have a good understanding of the business, its top products, how it makes money and its risk factors. You should also be prepared for open-ended / case questions, since they are the core of the onsite interview at a lot of places where Data Science sits within the product team.
- Always be asking Questions : Make sure you have questions prepared for them. This shows a lot of interest on your end, and it also helps you figure out whether the role will be a good fit.
You should remember that for the interviewer, the interview process helps them de-risk the investment they’re about to make in an Engineer or Data Scientist. They should come out of the process with enough new signal to make a decision while for the interviewee, the process helps them ask the tough questions and determine if there’s a good fit. It’s really a two-way street.
You should look for companies that have actually taken the time to think through and implement an interview regimen that enables them to extract more signal than noise during the interview process.
P.S. If you have attended and graduated from a Data Science Bootcamp and you'd like to do a review of your experience, we'd love to hear from you. Please fill out this form and we'll reach out to you to conduct the interview.