Is Python Edging Out R in the Data Science Space?
At the moment, tech pros rely on a small handful of well-known languages to develop machine-learning (ML) apps for various industries. But a subtle change may be coming: the Python ecosystem may eventually overtake R as the platform of choice for data analytics and machine learning. A recent KDnuggets poll of tech pros who use both R and Python showed that, over the past two years, there's been a slow decline in R usage in favor of Python. Meanwhile, a separate survey from Burtch Works revealed that Python use among analytics professionals grew from 53 percent to 69 percent over that same two-year period, while the R user base shrank by nearly a third.
Are these user samples valid? If you're involved in machine learning and data analytics, should you prefer Python over R? "Broadly speaking," said Greg Ambrose, CEO of Stack Talent, a technology search and recruitment company headquartered in Chicago, "why one would gain traction over another is, to some degree, a function of philosophy and preference."
As Ambrose also noted, his computer science candidates nearly all gravitate toward flexible, broader-use Python, while candidates with traditional data science and statistics backgrounds focus more on R. And while he hasn't seen any substantial shift in hiring, he has noticed that more hiring managers seeking to fill statistics roles are interested in candidates with Python experience. (Companies continue to use both languages, he added.)
"Combining R and Python is both reasonable and feasible," agreed Enriko Aryanto, the CTO and a co-founder of the Redwood City, Calif.-based QuanticMind, a data platform for intelligent marketing. "We run them both in our data science platform internally. But if I were starting my career all over again today, I might consider focusing on Python rather than R. It's a more-general language with broader applications."
R's limitations are potentially giving Python the edge in the data science and machine learning space. "R has issues with scalability," Aryanto said. "It's a single-threaded language that runs in RAM, so it's memory-constrained, while Python has full support for multi-threading and doesn't have memory issues. When choosing a language, it all comes down to choosing what's best to solve your problem."
That said, Aryanto's team uses R in their day-to-day data science and machine learning tasks. R is also firmly entrenched in academic programs, and Aryanto surmises that there will be statistics majors training in it for years to come.
It's worth mentioning that, despite Python's current ascendancy in the space, it will likely end up eclipsed by some other language or platform. That's just how technology works; it wasn't that long ago that SAS was in wide use, only to have R replace it as a cost-effective (and open-source) competitor.
But there may come a day when it's no longer critical for tech pros in the analytics and machine learning space to know R. Two of Aryanto's current team members don't even have a background in it. "In a few years, could R gradually be de-emphasized in favor of Python and eventually fall off the map for résumés? Yes, I guess it could," he said. "But for me, the most important thing is finding someone who's smart enough to learn new languages to solve new problems."
Python may seem very hot right now, he cautioned, "but in ten years that might not be the case. The people we hire have to be able to adjust and adapt to change. Ultimately, it goes back to solving our most important problems, whatever they may be, in the quickest, most scalable and easiest way."