The Hadoop market reached $256 million in vendor revenue during 2012 and is forecast to grow to nearly $1.7 billion in 2017, according to Wikibon, an open source advisory community based in Marlborough, Mass. So veteran developers, architects and data warehousing specialists are spending every spare moment learning the framework for storing and processing large-scale data sets.

If you’re new to Hadoop and are interviewing for a Hadoop-heavy job, be ready to describe your hands-on experience with the framework, advises Jobin Chacko, senior associate, recruitment, for Synechron, an IT solutions firm based in New York City. Chacko’s job is to determine whether a candidate has amassed enough practical experience to navigate the data volumes and stringent security requirements of the financial services industry. Here are some of the questions he asks newcomers to Hadoop.

Have you worked on a go-live project or a prototype?
- What Most People Say: “I’ve dabbled with Hadoop in my spare time.”
- What You Should Say: “I had considerable experience as a data warehouse architect before taking classes to learn Hadoop. Then, to make sure I was ready to handle big data sets, I pulled massive amounts of historical data from the New York Stock Exchange and used that sample data set to hone my analytical skills. I also used the data to create programs in MapReduce. You can see samples of my work by visiting my website.”
- Why You Should Say It: If you’re going to hone your skills in a simulated environment, make sure it emulates what you’ll find in the real world, says Chacko. Real jobs require you to handle big, heavy data sets. A minimal sketch of the kind of MapReduce program described above follows this list.
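If you build a practice environment like the one described above, it helps to have a small but complete MapReduce job to show for it. Here is a minimal sketch, assuming Hadoop’s standard Java MapReduce API and a CSV layout of exchange, symbol, date, prices and volume on each line; the class names and input format are illustrative assumptions, not details from the article.

```java
// Minimal sketch: total traded volume per stock symbol from daily NYSE-style CSV lines,
// e.g. "NYSE,AAPL,2012-06-01,12.35,12.80,12.10,12.60,1824500" (layout is assumed).
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class VolumeBySymbol {

  public static class VolumeMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      if (fields.length >= 8) {
        try {
          // fields[1] = stock symbol, fields[7] = daily volume (assumed layout)
          context.write(new Text(fields[1]),
                        new LongWritable(Long.parseLong(fields[7])));
        } catch (NumberFormatException e) {
          // skip header rows and malformed lines
        }
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text symbol, Iterable<LongWritable> volumes, Context context)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable v : volumes) {
        total += v.get();
      }
      context.write(symbol, new LongWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "volume-by-symbol");
    job.setJarByClass(VolumeBySymbol.class);
    job.setMapperClass(VolumeMapper.class);
    job.setCombinerClass(SumReducer.class);   // same types in and out, so it doubles as a combiner
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a JAR, a job like this runs with `hadoop jar volume.jar VolumeBySymbol <input> <output>`; the paths are whatever you choose for your practice data set. Being able to walk through a job at this level of detail is exactly the hands-on experience Chacko is probing for.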
How many nodes can be in one cluster?
- What Most People Say: “I would say no more than two to three nodes.”
- What You Should Say: “Hadoop scales out horizontally, so cluster size really depends on the workload and how the data is structured. A modest cluster of 10 to 50 nodes is easy to run, and production clusters scale to hundreds or even thousands of nodes.”
- Why You Should Say It: Inspire confidence by showing that you understand Hadoop’s clusters and how to coordinate the parallel processing of data using MapReduce. Also, be sure to highlight your previous experience working with large data sets, even if it didn’t involve Hadoop. A short sketch of querying a cluster for its DataNodes follows this list.
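To back up an answer like that, it helps to know how to ask a cluster about itself. Here is a minimal sketch, assuming the HDFS Java API; the NameNode address is a placeholder, and the command line `hdfs dfsadmin -report` returns the same information.

```java
// Minimal sketch: list the DataNodes registered with the NameNode via the HDFS Java API.
// The fs.defaultFS address below is an assumption; point it at your own NameNode.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ClusterNodes {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");   // assumed address

    FileSystem fs = FileSystem.get(conf);
    if (fs instanceof DistributedFileSystem) {
      DatanodeInfo[] nodes = ((DistributedFileSystem) fs).getDataNodeStats();
      System.out.println("DataNodes in cluster: " + nodes.length);
      for (DatanodeInfo node : nodes) {
        System.out.println(node.getHostName() + "  capacity=" + node.getCapacity());
      }
    }
    fs.close();
  }
}
```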
Which NoSQL databases have you worked with?
- What Most People Say: “I’ve worked with Cassandra.”
- What You Should Say: “There are four categories of NoSQL databases. The first is key-value stores; I’ve used Redis, primarily when working with semi-structured data. The second is column-family stores; I’ve used Cassandra when I needed scalability and high availability. The third is document databases; when I’ve needed to store and query semi-structured documents in formats like JSON, I’ve used CouchDB. Finally, there are graph databases, such as InfiniteGraph.”
- Why You Should Say It: Professionals are sometimes told to adopt an open source database simply because it’s cheap. Unfortunately, that leaves them unprepared, because they don’t understand why they’re using it or which type of NoSQL database is the most efficient choice for large quantities of structured, semi-structured or unstructured data. A minimal key-value store example follows this list.
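The category distinctions are easier to show than to tell. Here is a minimal sketch of the first category, a key-value store, using the Jedis client for Redis; the host, port, key names and values are assumptions for illustration, and each of the other three categories has its own client libraries and query model.

```java
// Minimal sketch of the key-value category: Redis accessed through the Jedis client.
// Host, port and key names are assumed for illustration.
import redis.clients.jedis.Jedis;

public class KeyValueExample {
  public static void main(String[] args) {
    Jedis jedis = new Jedis("localhost", 6379);   // assumed local Redis instance

    // Plain key-value pair: fast lookup by key, no schema.
    jedis.set("ticker:AAPL:last", "12.60");
    System.out.println(jedis.get("ticker:AAPL:last"));

    // A hash groups semi-structured fields under one key, which is often
    // how semi-structured records land in a key-value store.
    jedis.hset("trade:1001", "symbol", "AAPL");
    jedis.hset("trade:1001", "volume", "1824500");
    System.out.println(jedis.hgetAll("trade:1001"));

    jedis.close();
  }
}
```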
Which tool have you used for monitoring nodes and clusters?
- What Most People Say: “I haven’t used one.”
- What You Should Say: “I’ve used Nagios for monitoring servers and switches. And I’ve used Ganglia for monitoring the entire grid.”
- Why You Should Say It: “There are approximately 59 tools that can be used with Hadoop,” explains Chacko. “And not all of them can be used at the same time.” A sketch of the raw metrics those monitoring tools collect follows this list.
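Whichever monitoring tool you name, an interviewer may probe whether you know where the numbers come from. Hadoop daemons publish their metrics as JSON over a built-in `/jmx` HTTP endpoint, which is the raw material many Nagios checks and Ganglia graphs are built on. Here is a minimal sketch that dumps the NameNode’s metrics; the host name and web port (50070 on Hadoop 1.x/2.x) are assumptions.

```java
// Minimal sketch: fetch NameNode metrics from Hadoop's built-in /jmx HTTP endpoint.
// The NameNode host and web port below are assumptions for illustration.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class NameNodeMetrics {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://namenode:50070/jmx");   // assumed NameNode web UI address
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");

    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // JSON covering heap usage, capacity, node counts, etc.
      }
    }
    conn.disconnect();
  }
}
```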
“An experienced IT professional may think they’re qualified because they’ve worked with NoSQL or other databases,” says Chacko. “But when you start asking questions, you realize that they really don’t have hands-on experience with some of the most common tools. In fact, some of them don’t know Hadoop at all.”