Data scientists have a deceptively straightforward job to do: make sense of the torrent of data that enters an organization as unstructured hash. Somewhere in that confusion (hopefully) lies vital insight. But is skill with algorithms and datasets enough for data scientists to succeed? What else do they need to know to advance their careers?
While many tech pros might think that pushing data from query to conclusion is enough to get by, they also need to know how the overall business works, and how their data work will ultimately impact strategies and revenue. The current hunger for data analytics means that companies always want more from their data scientists.
Hard and Soft Skills
“There is a shortage, a skills gap in data science. It is enormous and it is growing,” said Crystal Valentine, vice president of technology strategy at MapR, a Big Data firm. As proof of this, Valentine cited a report from consulting firm McKinsey & Co. that suggests a national shortage of as many as 190,000 people with “deep analytical skills” by 2018. That’s in addition to a gap of roughly 1.5 million “Big Data” analysts and managers during the same time period.
Modern data science evolved from three fields: applied mathematics, statistics, and computer science. In recent years, however, the term “data scientist” has broadened to include anyone with “a background in the quantitative field,” Valentine added. Other fields—including physics and linguistics—are developing more of a symbiotic relationship with data science, thanks in large part to the evolution of artificial intelligence, machine learning, and natural language processing.
In addition to aptitude with math and algorithms, successful data scientists have also mastered soft skills. “They need to know more than what is happening in the cubicle,” said Mansour Raad, senior software architect at ESRI, which produces mapping software. “You have to be a people person.”
In order to effectively crunch numbers, in other words, data scientists need to work with the people who know the larger business. They must interact with managers who can frame the company’s larger strategy, as well as colleagues who will turn data insights into real action. With more input from those other stakeholders, data scientists can better formulate the right questions to drive their analysis.
“Soft skills” also means a healthy curiosity, said Thomas Redman, a.k.a. the “Data Doc,” who consults and speaks extensively about data science. Ideally, the applicant “likes to understand data, to understand what is going on in the world.” When applying for data-science jobs, he added, applicants are often judged on their intellectual curiosity in addition to their other skills—employers fear “they will stay in front of a computer screen,” Redman observed.
That can create an issue for some data scientists who are used to keeping their noses in the data, and not interacting with other business units. When Redman was a statistician at Bell Labs (long before the term “data scientist” was even coined), managers made a point of telling those employees who worked with data that the ultimate mission was to make the telephone network run better. That meant more than understanding statistics; it meant understanding the broader problems facing the company.
Faith vs. Skepticism
There’s an old saying in business: If you want to manage a problem, put a number on it. Data does that, to a certain extent. While the data scientist will wrangle the data, it’s up to the manager to make sense of it. Data can be taken on faith or questioned. Doing the former risks “GIGO”—Garbage In, Garbage Out. The latter requires “data skepticism”—a good skill for anyone who works with data on a daily basis.
Sometimes Raad spends about 80 percent of his time just cleaning data: “The data you get is just garbage.” In this respect, a data scientist is really a “data janitor.”
In the real world, “data is messy,” MapR’s Valentine concurred. “You have to have a real healthy skepticism when looking at data collected from a real-life effort.” One can’t assume a uniform distribution: “Data is the side-effect of real-world processes.”
A good data scientist keeps in mind that collected data is not unbiased. “You are trying to leverage the data to answer a question. You are not trying to stretch it too far,” Valentine added. “As a rule of thumb, gathering as much data as possible is a good strategy.”
Even if you’re not a data scientist, taking the results of an analysis simply on faith is rarely a good idea. “We’re uncomfortable when someone else knows more than you do,” Redman said. Whenever you’re studying the results of an analysis, have a list of questions handy: Where did the data come from? What’s the worst thing that can happen? What has to be true for the recommendation to be correct? “People who don’t question things are fair victims,” Redman said.
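For readers who want to see what data skepticism can look like in practice, here is a minimal sketch of a first-pass check on a single column. It is not a method described by anyone quoted here. The function name `profile_column`, the set of missing-value markers, and the sample `regions` data are all made up for illustration. The idea is simply to count the gaps and to see how far the most common value strays from what a uniform spread would give.

```python
from collections import Counter

# Assumed markers for "no data"; real datasets often use others.
MISSING = (None, "", "N/A")

def profile_column(values):
    """Report missing entries and how lopsided the remaining values are."""
    present = [v for v in values if v not in MISSING]
    counts = Counter(present)
    top_value, top_count = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "top_value": top_value,
        # Share of non-missing rows taken by the most common value.
        "top_share": top_count / len(present) if present else 0.0,
        # Share each value would have if the column were uniform.
        "uniform_share": 1 / len(counts) if counts else 0.0,
    }

# Hypothetical sample column, invented for this example.
regions = ["west", "west", "west", "east", "", None, "west", "north", "west", "N/A"]
report = profile_column(regions)
print(report)
```

In this invented sample, three of ten rows are missing and one value dominates the rest. Either finding would be a reason to ask where the data came from before trusting any conclusion built on it.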
Bias vs. Objectivity
“Getting something right in the beginning is not a sign of victory,” Raad said. Be skeptical: Do you have all the data? Is the data too good to be true? “The trick is to remove the human from the equation… Let the math speak for itself.” The data skeptic can then take the next step, showing how much of a conclusion is not random.
Don’t try to be perfect. The solution you craft need only be sufficient, getting the user from Point A to B. “You build a good, working Volkswagen [rather] than a Cadillac,” Raad said. “You have to be able to settle for the Volkswagen sometimes.”
Teams’ preconceptions are often built into algorithms. For example, take a credit algorithm that rates applicants for loans; while you might think the underlying math is neutral, the programmer may have fed their biases into the code.
Bias is not a new problem, Valentine said. Engineers often have to make a “subjective decision” when trying to meet goals—crafting portions of solutions that are sufficient to meet immediate needs. But it isn’t as if the underlying algorithms are black boxes: data scientists will need to determine for themselves if the software is producing a good outcome.
For data scientists, both hard and soft skills are necessary to do the job, along with a healthy skepticism. When it comes to advancing a data-science career, not taking things on faith seems like a solid course of action.