Data Science Tech Stack 2020

2021-10-27 10:10

There are a lot of things to know in Data Science.

If we tried to survey the technologies used by Data Scientists, we might get a picture like this:

This list is by no means comprehensive (it is already filtered by my personal interests). On top of that, the list will change from year to year. Keeping up with everything is impossible. Thankfully, no Data Scientist will need to know or use all of these tools.

Still, since we should strive to be T-shaped people, we should at least learn a good chunk of these technologies, right? But where do we begin? How much should we learn, and which technologies? First, let us briefly discuss the nature of the profession itself.

Data science is a broad and loosely defined field.

Data science is still relatively young and continues to evolve. Once the field was popularised by the tagline "sexiest job of the 21st century", many people were attracted to the profession.

What may have begun as an application of statistics to business problems is now a name that encompasses big data engineering, visualisation, machine learning, deep learning and artificial intelligence. The rapid evolution was due in part to the breadth of areas to which data science can be applied, but also to the rapid development of the technologies themselves. The number of skills that a Data Scientist must possess has grown along with the nebulous definition of the job.

I imagine that a lone Data Scientist in a small organisation would have a different set of tasks from a Data Scientist in a team within a large organisation. I also imagine that the exact job depends very much on the industry and the nature of the organisation. Compounded by the rapid decrease in job tenure (or the increase in job mobility), this variance in job descriptions requires the practising Data Scientist to keep up with a large number of skills and technologies.

If we had to group some of the subfields of data science, they would look something like this:

  • Data Analysis: This part of the job is about understanding the data. It involves data wrangling, exploratory data analysis, and "explanatory" data analysis. In a larger team, dedicated Data Analysts will perform these tasks.
  • Data Visualisation: This part of the job is about communicating the data, usually to a non-technical audience. In a larger team, dedicated Business Intelligence Analysts will perform these tasks, although this can be a part of the Data Analyst's duties.
  • Machine Learning: This part of the profession is probably where the "sexy" comes from. It uses regression, classification, and clustering to solve a wide range of problems, including computer vision and natural language processing. Sometimes, the people who develop new and better ways of solving problems through machine learning are called Machine Learning Scientists, and the people who implement the solutions are called Machine Learning Engineers.
  • Data Engineering: This part of the field has become so important that Data Engineers are more in demand than Data Scientists. To do data science, we need data and tools. Making these available is what data engineering is about.
  • Cloud DevOps: Increasingly, both the data and the tools required to do data science are being made available on the cloud. Navigating the large number of cloud products, managing scalable infrastructure, and managing access and security are the duties of Cloud DevOps Engineers.
  • Web Development: This part might seem out of place, but if we consider end-to-end data science projects, the web is the most likely platform for prototyping or deployment. In larger teams, there may be dedicated Front-End and Full-Stack Developers.

Sure, these groupings are not clear-cut, and there are overlaps. Still, they give me a way to organise things. As I noted in the brief descriptions, these roles can be carried out by dedicated specialists in a team. But in a small organisation, it could be up to a one-person generalist Data Scientist to carry out all of these functions.

Whether or not we need to perform all of these roles, it helps to understand a little about what other people in the team do. Or perhaps you are looking to switch career tracks, say from data analyst to data engineer or from web developer to machine learning engineer, in which case you will benefit from knowing something about everything.

Coming back to the tech stack, we can (loosely) group the technologies according to these roles.

EDIT: Notes on what these technologies are have been added towards the end of the post.
