In a world increasingly dominated by data, data engineers are critical, as they figure out how to store, move, and clean an organization’s data. Data scientists and analysts, in turn, depend on these engineers’ work in order to mine data for valuable insights.
Because their jobs are so complex, data engineers need a wide range of skills, including programming in SQL and Scala and working with Big Data platforms such as Apache Hadoop. Large companies hire data engineers frequently, and they want candidates with the right mix of skills and background to succeed on very large projects.
“Data has definitely not slowed down in its importance to enterprises—it comes in so quickly and there’s so much of it, and the function of the data engineer is filling in the gap between the vast amount of data coming in and making it useful to the business,” said David Harris, senior product manager, enterprise, at e-learning specialist Udacity.
Dice Insights spoke with Harris about the critical role data engineers play in today’s organizations, the essential skills they need, and the important questions they should ask during the interview process.
What are the challenges faced today by data engineers?
Data engineers must move data from one place to another and ensure it arrives at its destination as clean and organized as possible, which is difficult with messier datasets. Because data can come from hundreds or even thousands of sources, these engineers must know as much as possible about data formats and structures, as well as the processes and tools needed to standardize them.
“Beyond that, there’s the question of whether the data is correct and questions of data governance, how the data gets shared, how privacy plays into the information that’s being shared and funneled to different places,” Harris explained. “Data engineers really have to understand who’s using the information, what it’s being used for, understanding the risk if it’s inaccurate or not updated. There’s a lot to think about.”
What questions do technical recruiters ask?
“It’s going to be similar to a software engineering interview—the types of projects you’ve worked on, tell me about how you designed a pipeline or data structure for a data warehouse,” Harris said. “Some companies will be looking for experience with working with streaming data, with Spark or Kafka, so typically they’re looking for specific experience with critical functions of data engineering like managing data lakes and data warehouses, building data pipelines for moving data from one source to another. Those are the biggest check boxes they’re looking for.”
What are ways data engineers can best prepare for a job interview?
Harris said there are two important areas to focus on when preparing for an interview: your background, and what you’ve learned from your previous projects, whether or not those projects succeeded.
“One of the things every interview will ask is to describe a recent project,” Harris said. “So document the process you went through and the impact of the work you did, and how you made those decisions.”
What are the most important skills one should know?
According to Burning Glass, which analyzes job postings, key skills for data engineers include Python, machine learning, and Big Data, so expect questions from recruiters and hiring managers in those areas. It’s also important to be able to describe and defend a data infrastructure design. That means thinking through how you would store data in a data warehouse or data lake, and being able to explain how you would implement that design.
“When it comes to communication and collaboration skills, you need to be able to ask good, probing questions to figure out the needs of your users and what problems they’re trying to solve,” Harris said. “It’s going beyond taking a request for data and executing it—it’s about figuring out the impact it’s going to have.”
Beyond that, data engineers need to understand the tradeoffs involved in choosing to work on one project over another. “As an engineer, you need to prioritize, and then figure out what to work on next,” Harris added. “Knowing who to communicate that strategy to is also super important to being a successful data engineer.”
How does one answer questions about “greatest achievements” or strengths/weaknesses?
“Greatest achievement” questions provide you with the opportunity to talk about projects that really challenged you, and that you’re proud of having accomplished. “The excitement you have from accomplishing something really hard comes across, so you have the opportunity to show why you’re passionate about your work,” Harris said.
When it comes to weaknesses, highlight your opportunities for development. If you’re weak in a particular area of data engineering, that’s okay so long as you’re self-aware and interested in improving. Not everyone knows every tool or skill; your capacity to learn is crucial.
What questions should a data engineer ask during the interview?
Ask about your prospective employer’s current data infrastructure; while they might not be able to share all details with you, they can at least give you a sense of how the company treats its data in motion and at rest.
“Also, ask what type of mentorship is available, how collaborative the teams are, how people ask questions and collaborate,” Harris said. Don’t be afraid to ask probing questions; you want as much information about a company as possible before committing to a job as complex as data engineering.