I was thinking about the eternal "what is a data scientist?" question again recently, and was reminded of this article I wrote back in December 2015.
Even though the Data Science field has changed a lot since then, I was pleasantly surprised to find that I still agree with what I thought back then.
The article below was slightly edited for clarity.
This fairly lengthy article by Robert Chang, Data Scientist at Twitter, got me thinking about Data Science. It caught my interest for two reasons. First, it includes this quote, which I love, because it is very true:
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it. — Dan Ariely
It also identifies two types of Data Scientists. The article mentions that this distinction comes from Michael Hochster on Quora:
Type A Data Scientist: The A is for Analysis. This type is primarily concerned with making sense of data or working with it in a fairly static way. The Type A Data Scientist is very similar to a statistician (and may be one) but knows all the practical details of working with data that aren’t taught in the statistics curriculum: data cleaning, methods for dealing with very large data sets, visualisation, deep knowledge of a particular domain, writing well about data, and so on.
Type B Data Scientist: The B is for Building. Type B Data Scientists share some statistical background with Type A, but they are also very strong coders and may be trained software engineers. The Type B Data Scientist is mainly interested in using data “in production.” They build models which interact with users, often serving recommendations (products, people you may know, ads, movies, search results).
I love this distinction, because I think every Data Scientist actually lives somewhere on a scale between a Pure Type A and a Pure Type B… Below is my attempt to visualise this explanation.
I guess the top line is that the “Data Science” area is roughly defined by the common definition of a Data Scientist by Josh Wills:
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
There have been a lot of articles written about Data Scientists in the last few years; my old manager Kevin Schmidt even wrote one on his blog.
Contrary to what the press has been saying for the last few years, data scientists are not “as rare as unicorns”. What is rare is people who are masters of both the Analysis and the Building sides of Data Science; just like unicorns, I haven’t met one yet, and I’m not convinced they actually exist.
But that’s okay, because you probably don’t actually need a Data Science unicorn. A Data team, just like any other team, is all about balance and complementarity: if your team already has a very good data engineer, then you probably only need an analyst with a good understanding of your data pipeline. Similarly, if you already have a good analyst with some programming skills (even if it’s just Excel formulas and VBA), then hiring a data engineer to build up your data platform would probably be a good idea.
It’s all about covering the most area possible with your data team (complementary skills) and having great communication: everyone in the team should understand what their teammates are doing and understand both the data and the data pipeline very well. As always, balance is key!
Another thing worth mentioning is that, in my experience, people falling inside that Data Science scope are curious by nature, and self-taught. To me, the magic formula to being a good data scientist isn’t:
Great knowledge in Building + Great knowledge in Analysis
It’s probably something including fewer hard skills and a lot more soft skills, maybe along the lines of:
Good understanding of Building + Good understanding of Analysis + Ability to learn new skills + Flexibility to adapt to business and environment
In other words, good data scientists will probably strive to learn more and might grow to one day become that rare unicorn.
Or at least, I know that’s my plan ;-)