Five years ago, there was nothing called MLOps. Now it's an indispensable part of any data science project.
MLOps refers to the practice of applying DevOps principles to machine learning (ML) systems. It helps maintain seamless integration between the development and deployment of ML models in large-scale data science projects.
In most projects, the operational aspects are enormous compared to the actual model building. Thus it often requires several roles other than data scientists.
Depending on the size of the organization and the nature of the project, data science teams can have one or many of these roles. In small and medium-sized teams, their responsibilities are often blurry too. Yet, large firms manage distinct roles and responsibilities to a reasonable extent.
This post may help you distinguish between the different roles and their responsibilities. If you're early in your data science career or hoping to switch to another role, this post may also help you decide which skillset to focus on.
Business Analyst or Domain Expert
Most people don't realize that business analysts (BA) are part of the data science team. Yet, their contribution is the most critical part of machine learning operations.
They play a translator role between the business stakeholders and the technical team. They specialize in speaking the language of both worlds.
BAs help the technical team break down the business problem into actionable machine learning problems. Further, they keep external stakeholders informed about the timeline, progress, and achievements of the project.
BAs rarely code, but they have excellent data literacy. They spend most of their time preparing helpful documentation and presentations. They are very good at project planning and management.
In my opinion, the most distinguishing feature of successful business analysts is their storytelling skills. If you enjoy telling stories and creatively conveying messages, this may be a fantastic role for you.
Data Analysts
Data analysts (DAs) are very similar to business analysts, but DAs focus more on the technical aspects than BAs do.
The primary responsibility of data analysts is to draw insights from data.
For the most part, they spend time exploring data from different angles. DAs worry less about machine learning models and their production implementation. That's because they need an open mind to draw valuable insights, and this detachment keeps them from circling around a trivial solution that isn't useful.
Yet, their work is essential to solving the right problem.
A DA's toolkit is incomplete without Excel (or any spreadsheet software). Modern DAs also care about self-service analytics platforms such as Tableau and Power BI. Depending on the project needs, some DAs should have extensive knowledge of SQL, and others may have expertise in technologies such as GIS software.
Most DAs know how to code. They use either R or Python for exploratory data analysis.
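To give a flavor of what exploratory analysis in Python can look like, here is a minimal sketch that groups records and summarizes them. The sales records, the field names, and the `summarize_by` helper are all made up for illustration, and it uses only the standard library; in practice a DA would more likely reach for a dataframe library.

```python
import statistics
from collections import defaultdict

# Hypothetical sales records; in practice these would come from a file or database.
records = [
    {"region": "north", "revenue": 120.0},
    {"region": "north", "revenue": 80.0},
    {"region": "south", "revenue": 200.0},
    {"region": "south", "revenue": 220.0},
    {"region": "south", "revenue": 180.0},
]


def summarize_by(rows, key, value):
    """Group rows by `key` and return count, mean, and median of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        name: {
            "count": len(vals),
            "mean": statistics.mean(vals),
            "median": statistics.median(vals),
        }
        for name, vals in groups.items()
    }


summary = summarize_by(records, "region", "revenue")
for region, stats in summary.items():
    print(region, stats)
```

Looking at the same data grouped by a different key is a one-line change, which is the kind of "different angles" exploration described above.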
Software Engineers and Developers
Software engineers help productize the machine learning model.
Not everyone possesses the skills to open a Jupyter notebook and run Python scripts to use a machine learning prediction. They'd use it if it's wrapped in a friendly UI instead.
Further, software engineers worry about a range of other things that data scientists usually don't deal with. These include software access control, usage statistics collection, cross-platform integration, and hosting.
Some level of data literacy is preferable for software developers working in a machine learning project. Yet, not all software developers excel in model-building aspects.
On the other hand, software engineering has its own skillset. Software engineers create web apps with frameworks such as Django and Flask. Most also prefer front-end frameworks such as React and Vue. With React Native, React developers can also build cross-platform mobile applications for iOS and Android. But some large-scale teams have native mobile developers too.
Data and ML Architects
ML architects worry about the entire machine learning project lifecycle. They create the structure of the project and strategies to execute them in stages. They also try to foresee any risks to completing the project and maintaining the model in production.
They also act as the connectors between data scientists, engineers, and software developers. Architects help the team choose whether to go with an on-premise implementation, a data warehouse, or a cloud data lake. They help the team choose the proper data storage and retrieval strategy for optimal performance and cost.
Data architects are skilled in various tools and technologies because it's their responsibility to evaluate and pick the right one. Thus they usually possess a broad understanding of everything from data pipeline technologies to algorithms to front-end web frameworks.
Data Engineers
Data engineers are the platform enablers. They ensure the relevant data is available for the machine learning project and that its quality meets the required standards.
Data engineers spend most of their time building data pipelines. A pipeline ensures an uninterrupted flow of data from its sources, applies a preliminary transformation, and loads the result into appropriate data storage. This process is commonly known as ETL (extract, transform, load).
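The three ETL stages can be sketched in a few lines of standard-library Python. Everything here is an assumption made for illustration: the CSV content, the `orders` table, and the cleaning rules (drop rows with a missing amount, upper-case the country code). A real pipeline would read from actual sources and load into a proper data store.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real pipeline would read from a file, API, or database.
RAW_CSV = """order_id,amount,country
1,19.99,us
2,,de
3,5.50,US
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, then normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(row["amount"]), row["country"].upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount, country FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping each stage as its own function makes it easy to test the transformation rules separately from the I/O around them.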
Data engineers use tools such as Airflow and Prefect to build ETL pipelines. These tools help orchestrate individual tasks and run them on schedules.
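The core idea behind orchestration, running tasks in an order that respects their dependencies, can be illustrated without any orchestration tool. The sketch below is not the Airflow or Prefect API; it uses Python's standard `graphlib` module (Python 3.9+), and the task names and the `run_pipeline` helper are hypothetical. Scheduling, retries, and monitoring, which the real tools provide, are left out.

```python
from graphlib import TopologicalSorter

run_log = []  # records the order in which tasks actually ran


def extract():
    run_log.append("extract")


def transform():
    run_log.append("transform")


def validate():
    run_log.append("validate")


def load():
    run_log.append("load")


tasks = {
    "extract": extract,
    "transform": transform,
    "validate": validate,
    "load": load,
}

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
}


def run_pipeline():
    """Run every task once, in an order that satisfies the dependencies."""
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()


run_pipeline()
print(run_log)
```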
Data engineers also play a key part in setting up data lakes and data warehouses. They are well versed in SQL and in database technologies such as Postgres and MySQL.
MLOps Engineers
MLOps engineers automate the deployment of models to production systems. The level of automation can differ from organization to organization.
An MLOps engineer's responsibility is to take the model from a data scientist and make it available to the software that uses it. Data scientists often use Jupyter notebooks or script files to build, test, and validate their machine learning models. Software engineers, on the other hand, expect the model to be accessible through callable interfaces such as REST APIs.
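As a rough illustration of that handoff, the sketch below wraps a model behind a small HTTP endpoint using only the standard library. The `predict` function is a hand-written stand-in for a trained model, and the `/predict` path, the feature names, and port 8000 are assumptions made for this example. A production service would typically use a web framework and load a serialized model instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model: a hypothetical hand-written linear scorer."""
    weights = {"age": 0.02, "income": 0.00001}
    score = sum(weights[name] * features.get(name, 0.0) for name in weights)
    return {"score": round(score, 4), "approved": score >= 1.0}


def handle_predict(body):
    """Turn a JSON request body into an (HTTP status, JSON response body) pair."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        return 200, json.dumps(predict(features)).encode()
    except (ValueError, TypeError) as exc:
        return 400, json.dumps({"error": str(exc)}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Serve POST /predict until interrupted."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server; a client can then POST a JSON body such as `{"age": 40, "income": 50000}` to `/predict`. Keeping the request handling in `handle_predict` means it can be tested without starting a server.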
MLOps is slightly different from DevOps. That's because machine learning projects are experimental in nature, whereas software projects are definitive (for the most part). Also, testing a machine learning model is drastically different from software testing. Testing in data science involves prediction quality and a range of other unique challenges.
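One way to picture the difference: instead of asserting an exact output, a model test often asserts that a quality metric on held-out data stays above a threshold. The following is a minimal sketch of that idea. The `threshold_model` classifier, the held-out examples, and the 0.75 accuracy bar are all invented for illustration.

```python
def accuracy(model, examples):
    """Fraction of labeled examples the model predicts correctly."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


def threshold_model(features):
    """Hypothetical stand-in for a trained classifier."""
    return 1 if features["score"] >= 0.5 else 0


# Hypothetical held-out examples: (features, expected label).
holdout = [
    ({"score": 0.9}, 1),
    ({"score": 0.7}, 1),
    ({"score": 0.4}, 0),
    ({"score": 0.2}, 0),
    ({"score": 0.6}, 0),  # a case the model gets wrong
]

MIN_ACCURACY = 0.75  # assumed release threshold, chosen for illustration


def test_model_meets_quality_bar():
    """Fail if the model's held-out accuracy drops below the agreed bar."""
    acc = accuracy(threshold_model, holdout)
    assert acc >= MIN_ACCURACY, f"accuracy {acc:.2f} is below {MIN_ACCURACY}"


test_model_meets_quality_bar()
```

Note that the test passes even though one prediction is wrong. That tolerance for individual errors is what separates this kind of check from a conventional unit test.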
MLOps engineers usually have an excellent understanding of data pipelines. Like data engineers, they use technologies such as Airflow and Prefect to automate tasks. A relatively new technology that helps in MLOps is MLflow.
Optimization Engineers
Most organizations spend a large portion of their budget on data pipelines, storage, and model training. However, low-code libraries such as scikit-learn are not optimized for every kind of hardware. They are built to perform well in everyday use cases.
For smaller projects, this doesn't matter at all. However, as you scale up, your infrastructure cost can shoot up sharply. This is where optimization engineers come to the rescue.
The job of optimization engineers is to convert a model built with low-code libraries so that it performs better on the chosen hardware. They know how to compile models to LLVM bitcode, run them in parallel, and a lot more.
Data Scientists
What is a data science project without a data scientist? They are the central figure in any machine learning project.
This post is mainly about the aspects of data science other than model building, testing, and evaluating (i.e., MLOps). However, this list is not complete without mentioning data scientists.
Their responsibility is to find the right machine learning model to solve the business problem. They try out different algorithms, tune their hyperparameters, evaluate them, and validate the results against various criteria.
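The try, tune, and validate loop can be sketched as a small grid search. This is an illustration under stated assumptions, not a recommended workflow: the data is synthetic (generated from y = 2x + 1), the "model" is a line fitted by plain gradient descent, and the grid values are arbitrary. Libraries such as scikit-learn provide ready-made tools for this.

```python
from itertools import product

# Hypothetical data following y = 2x + 1, split into training and validation sets.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
valid = [(4.0, 9.0), (5.0, 11.0)]


def fit_line(data, learning_rate, epochs):
    """Fit y = w*x + b with plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(params, data):
    """Mean squared error of the fitted line on `data`."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Hyperparameter grid to search; the values are arbitrary choices for this sketch.
grid = {"learning_rate": [0.001, 0.01, 0.05], "epochs": [50, 500]}

results = []
for lr, epochs in product(grid["learning_rate"], grid["epochs"]):
    params = fit_line(train, lr, epochs)
    # Score each candidate on validation data it was not trained on.
    results.append((mse(params, valid), lr, epochs))

best_error, best_lr, best_epochs = min(results)
print(best_lr, best_epochs, best_error)
```

The key design point is that candidates are compared on the validation set rather than the training set, so the chosen hyperparameters are the ones that generalize best among those tried.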
However, if the team is small, data scientists absorb most other responsibilities. In smaller units, data scientists are the data architects and data engineers too.
Thus the term data scientist sort of encapsulates all other roles described above.
Building algorithms is only a tiny portion of a data science project. The puzzle has many other complicated pieces that must come together to form a complete solution.
In small projects, a single person or a few data scientists could build the machine learning model and productize it with technologies such as Streamlit.
However, larger organizations need many other roles, each specializing in a specific aspect of the overall business problem. We refer to the data science project tasks that go beyond model building as MLOps.
In this article, we discussed eight roles, including the data scientist, that are very common in MLOps. For each of them, we've discussed the responsibilities and the required skillset.