Data Teams Are Becoming Less Centralized, and That's Wonderful

April 22, 2022

I firmly believe there won’t be any data teams in the future.

We have all witnessed the birth and growth of data teams over the past decades. I belong to one of them myself.

But we will have to see them disappear, too!

Yet that’s something the data science community should celebrate.

Traditionally, every organization has had dedicated units for every aspect of its business, such as finance and planning. That’s why we thought data science, too, deserved a dedicated team.

Fair enough, it has worked well so far.

Yet we also see many people with no STEM background getting into AI/ML. And they do as well in their jobs as anyone else.

As we have uncovered new ways to use our data, we’ve also built great tools that enable anyone to become a data scientist.

The evolution of modern data science

When I started in data science, the most significant achievement at the time was low-code libraries.

Take scikit-learn, for example. It hides the details of how an algorithm is implemented. All the data scientist has to worry about is its application.

For instance, to build a linear regression model for your data set, you import the LinearRegression class and fit it to your independent and dependent variables, like in the example below.

from sklearn.linear_model import LinearRegression
# X holds the independent variables (features); y holds the dependent variable (target).
reg = LinearRegression().fit(X, y)
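The snippet above assumes X and y already exist. Here is a minimal runnable sketch of the same idea. The synthetic data (a noisy line with slope 3 and intercept 2), the random seed, and the variable names are assumptions made up for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y = 3x + 2 plus a little noise.
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(100, 1))               # independent variable, shape (n_samples, n_features)
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)  # dependent variable

# Fitting is a single call; the least-squares math stays hidden.
reg = LinearRegression().fit(X, y)

print(reg.coef_, reg.intercept_)  # should land close to the slope and intercept used above
print(reg.predict([[4.0]]))       # prediction for a new observation
```

Nothing in these few lines requires you to know how the coefficients are computed, which is exactly the point.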

Other libraries, such as Keras and PyTorch, extended the same philosophy to neural networks and deep learning.

We still had to have programming skills to work with low-code tools. Later came the visual analytics platforms such as KNIME. They allowed us to use sophisticated data science techniques without writing a single line of code.

Making sense of data is common sense. Coding skills aren’t the superpower of a data scientist or data engineer.

That was amazing.

Visual analytics had several benefits compared to low-code libraries.

Those who are just starting in data science find it especially helpful. The drag-and-drop implementation of the data science pipeline helps them grasp the core concepts faster.

Even experienced data scientists find it useful for rapid prototyping and a shorter time to go live. 

It’s not the end; we’ve got AutoML!

I wouldn’t call it a drawback, but we still have to know about different algorithms and their internals to use visual analytics.

This leads us to AutoML, where AI figures out which AI works best with your data.

AutoML platforms help everyone solve business problems with AI. They solve a set of pre-defined problem types using arbitrary data supplied by the user.
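To give a feel for the core idea, here is a toy sketch of automated model selection with scikit-learn. It is not how any particular AutoML product works internally; real platforms also automate feature engineering and hyperparameter tuning, which this sketch skips. The synthetic dataset and the list of candidate models are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for a user-supplied dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate algorithms the "platform" is willing to try.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Score every candidate with 5-fold cross-validation and keep the best one.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)

print(scores)
print("selected:", best_name)
```

The user only supplies the data. Choosing the algorithm is left to the loop.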


For instance, you can perform churn prediction by uploading a dataset to SageMaker Canvas and running a classification model over it. You don’t have to do feature selection, hyperparameter tuning, or other time-consuming technical work.

In minutes, you get your model trained and ready for prediction. You can even deploy it as a prediction service (a REST API).


How is the role of data scientists becoming more embedded?

Centralized data science teams are very productive. Such units offer incredible opportunities for their members to learn new skills quickly.

Centralized data team where a group of data science professionals works to serve the needs of other business units

Also, management can prioritize critical resources at the organizational level. This ensures (at least from management’s perspective) that the ROI is at its greatest.

A drawback of centralized data teams is that they are often disconnected from the actual business problem. For instance, a model that predicts drug efficacy has to be more accurate than one that predicts market demand.


In this example, it may be common sense to choose a higher threshold for drug efficacy. But real-life data science problems require tons of such domain expertise for a model to be useful.
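As a small sketch of what that domain knowledge looks like once it is written down, the same model can pass in one business unit and fail in another. The threshold values and names below are hypothetical, made up for illustration and not industry guidance.

```python
# Hypothetical minimum accuracy each business unit is willing to accept.
MIN_ACCURACY = {
    "drug_efficacy": 0.99,
    "market_demand": 0.80,
}

def is_fit_for_purpose(use_case: str, accuracy: float) -> bool:
    """Return True if the model meets the bar set by the domain experts."""
    return accuracy >= MIN_ACCURACY[use_case]

model_accuracy = 0.90
print(is_fit_for_purpose("market_demand", model_accuracy))  # True
print(is_fit_for_purpose("drug_efficacy", model_accuracy))  # False
```

A centralized team would have to learn each of these thresholds from the business unit. An embedded data scientist already knows them.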

In centralized teams, there is always a risk of disconnection from the business. There is no better solution than giving each business unit one or more dedicated data scientists. In other words, make the data team decentralized and embedded.

Embedded data teams where each team has its own dedicated data science professional

The challenge in decentralizing data scientists is limited communication with other data scientists. It reduces opportunities for collaborative learning. The results are poor data standards across the organization and slow problem-solving.

This is where I see the real benefit of AutoML and visual analytics.

In other words, it is now possible to extract the application layer from the data science workflow and move it closer to the business units.

In this new approach, each team will have either a dedicated data scientist or someone trained in data science. This person can now build ML models and BI dashboards on top of the organizational data warehouse or data lake.

The benefit here is that this person has more domain expertise than someone in a centralized team.

Well, not that decentralized!

Some tasks in the data science workflow have nothing much to do with the subject matter.

Take the case of a data engineer. A data engineer’s work is to get the right data into the data warehouse. It has very little to do with decisions such as which model to use or how accurate it needs to be.

Embedding such roles in individual business units can do more harm than good. Instead, they could work in a centralized data team.

Thus, what’s more relevant to modern organizations is a hybrid approach.

Hybrid data teams where the central team only maintains the data warehouse/data lake. The in-house data scientist in every team works on the application of data science, such as building ML models.

Roles such as data engineer, or anything else that has little relevance to the application side of data science, could go into a centralized data team. Roles that work closely with business units are better off embedded.

Having a centralized team for data engineering also ensures common data standards are used across the organization. With this hybrid approach, inconsistent data standards are no longer a problem for embedded data teams.
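One way to picture common data standards is a schema that the central team publishes and every embedded team checks its records against. The sketch below is only an illustration under that assumption; the schema, field names, and helper function are hypothetical.

```python
# Hypothetical schema published by the central data engineering team.
CUSTOMER_SCHEMA = {"customer_id": int, "signup_date": str, "monthly_spend": float}

def violations(record: dict, schema: dict) -> list:
    """List every way a record departs from the shared schema."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    return problems

good = {"customer_id": 1, "signup_date": "2022-04-22", "monthly_spend": 49.9}
bad = {"customer_id": "1", "signup_date": "2022-04-22"}

print(violations(good, CUSTOMER_SCHEMA))  # []
print(violations(bad, CUSTOMER_SCHEMA))   # wrong type for customer_id, missing monthly_spend
```

Because the schema lives in one place, embedded data scientists in different units end up building on the same definitions.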

Final thoughts

Data science teams have reached new heights in the past couple of decades. Along the way, we discovered new roles and responsibilities, too.

Traditionally, data teams worked in isolation as a centralized entity. They served other business units in the organization, but the data team itself was driven by its own set of standards and KPIs.

However, recent advancements in data science, such as AutoML and visual analytics, make data science skills available to everyone. You don’t need formal training to build and deploy machine learning models.

Hence, most business problems can be solved by the respective units themselves, without needing a data scientist.

Only common tasks such as data engineering remain a central role in the organization. 

Thanks for reading, friend! Say Hi to me on LinkedIn, Twitter, and Medium.
