Machine Learning Systems in Real-Life Production vs. Research/Academic Settings.

When I started machine learning (ML) modeling, I thought it would be a piece of cake. I’d had good training in my degree program. But it turned out to be more complex than it seemed.

I’m not the only one who has experienced this. The differences between ML in production and ML in academic settings are already widely recognized.

According to Andrew Ng, cofounder of Google Brain, 85% of companies’ AI projects are stuck at the proof-of-concept stage. In a separate blog post, Google shared an infographic showing how small the ML code is within the overall ML architecture.


Google’s MLOps diagram


But anyone who has completed a data science or related academic program would attest that their assignments almost always had a correct answer. Besides, even researchers in academic settings pursue different goals from practitioners in production.

Beginners entering ML modeling should realize the difference early in their careers. This article outlines some of the key differences between the two environments.

Before diving deep, let’s see the major differences between academia and production systems.

What are the significant differences in production ML systems vs. research settings?

There are several differences between machine learning in academia and machine learning in production:

  1. Purpose: In academia, machine learning is often used for research and experimentation, whereas in production, it is used to solve real-world problems and drive business value.
  2. Data: In academia, machine learning is often performed on small, curated datasets that may not represent the real-world data the model will encounter in production. In production, machine learning models must be trained on large, diverse, and often noisy datasets representative of the real-world data they will encounter.
  3. Model performance: In academia, machine learning models are often evaluated on their ability to achieve high accuracy on benchmark datasets. In production, the focus is often on the model’s ability to deliver business value, which may require trade-offs in terms of accuracy to achieve other goals, such as speed or scalability.
  4. Model deployment: In academia, machine learning models are often developed and evaluated in isolation, without considering factors such as scalability, security, or reliability. In production, these factors are critical and must be carefully considered when deploying a machine-learning model.
  5. Explainability and transparency: In academia, the focus is often on achieving high accuracy, and explainability and transparency may not be a primary concern. In production, explainability and transparency are often critical, as the model’s decisions may have significant consequences and must be understood and trusted by stakeholders.

In production systems, data continuously changes.

The first difference in a production ML system is the nature of the data. In an academic or research setting, the data rarely changes. In production, it rarely stays the same.

When we do research, our goal is almost always to find the best model for the available data. For instance, if you’re developing a predictive model for customer churn, you’d probably have an extract of an organization’s database. The question is mainly about which model gives you the highest accuracy score.

You’d probably play around with a range of other techniques, such as feature engineering and the ensembling of models. But all these are to ensure the best possible model.

But as soon as you connect your model to a live system, you may quickly see its performance degrade. That’s because the properties of the underlying data have changed since the model was trained.

COVID-19 is an excellent illustration for our customer churn example. When the pandemic hit, almost all businesses experienced a change in customer behavior. A change like this in the distribution of the input data is often referred to as data drift.
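One common way to put a number on this kind of shift is the Population Stability Index (PSI), which compares how a feature is distributed in the training data and in recent live data. The sketch below is a minimal, standard-library-only version. The feature ("monthly spend"), the synthetic samples, and the bin count are assumptions made for illustration, and the often-quoted rule of thumb that a PSI above roughly 0.25 signals a major shift is a convention, not a guarantee.

```python
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    ref_sorted = sorted(reference)
    # Interior bin edges are taken from the quantiles of the reference sample.
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # The number of edges at or below v is the index of v's bin.
            counts[sum(1 for e in edges if v >= e)] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    p_ref = proportions(reference)
    p_cur = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "monthly spend" values; purely illustrative, not real data.
    training = [rng.gauss(50, 10) for _ in range(5000)]
    live_similar = [rng.gauss(50, 10) for _ in range(5000)]
    live_shifted = [rng.gauss(65, 10) for _ in range(5000)]
    print("PSI, similar period:", round(psi(training, live_similar), 4))
    print("PSI, shifted period:", round(psi(training, live_shifted), 4))
```

In practice you would run a check like this per feature on a schedule and investigate the features whose score keeps climbing.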

Even within the existing customer base, the reasons why some customers switched are not the same as before. Thus, a model trained on outdated data performs poorly. This change in the mapping from X to Y, that is, in Y given X, is known as concept drift.
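Concept drift is harder to see in the inputs alone, so a simple approach is to track accuracy on recent predictions once the true outcomes arrive. Here is a minimal sketch of such a monitor. The class name, window size, and tolerance are hypothetical choices, and a drop in accuracy only suggests that something changed; it does not prove the cause is concept drift.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent labelled predictions."""

    def __init__(self, baseline_accuracy, window=200, tolerance=0.05):
        self.baseline = baseline_accuracy      # accuracy measured at training time
        self.tolerance = tolerance             # allowed drop before raising a flag
        self.outcomes = deque(maxlen=window)   # True where the prediction was right

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self):
        # Wait until the window is full to avoid alarms from a handful of samples.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


if __name__ == "__main__":
    monitor = RollingAccuracyMonitor(baseline_accuracy=0.90, window=10)
    for _ in range(10):
        monitor.record(predicted="stay", actual="stay")
    print(monitor.accuracy(), monitor.drift_suspected())
    for _ in range(5):
        monitor.record(predicted="stay", actual="churn")
    print(monitor.accuracy(), monitor.drift_suspected())
```

For churn, the true label may only be known weeks later, so the window reflects the past rather than the present. That delay is part of what makes concept drift hard to catch.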

Both concept drift and data drift are challenging aspects of a production ML system. Yet, you’d rarely encounter these when you’re in academic research unless drift is the area you’re researching.

Design Priorities in Production ML Systems

Design priorities in production ML are dramatically different from those in academic settings.

As we already discussed, the goal in an academic setting is to find the best model. Thus, the highest accuracy is the priority.

Most academic ML projects are not meant to run continuously. Thus, optimizing for cost may not be a priority. You’d happily rent a high-performance cloud instance and complete your training.

But priorities change when the model is supposed to run continuously and serve users directly.

Training and serving costs carry equal, if not more, weight than accuracy. Thus, in production ML systems, we aim for reasonable accuracy, not the best possible.
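To make "reasonable accuracy" concrete, one simple framing is to set an accuracy floor and then pick the cheapest model that clears it. The sketch below illustrates that idea only. The candidate names, accuracy figures, and monthly costs are made up for the example, and a real decision would weigh more than two numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float          # validation accuracy between 0 and 1
    monthly_cost_usd: float  # estimated training plus serving cost


def pick_model(candidates, min_accuracy):
    """Return the cheapest candidate that meets the accuracy floor, or None."""
    acceptable = [c for c in candidates if c.accuracy >= min_accuracy]
    if not acceptable:
        return None
    # Cheapest first; among equally priced models, prefer the more accurate one.
    return min(acceptable, key=lambda c: (c.monthly_cost_usd, -c.accuracy))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    candidates = [
        Candidate("large_ensemble", 0.94, 4200.0),
        Candidate("gradient_boosting", 0.92, 600.0),
        Candidate("logistic_regression", 0.88, 90.0),
    ]
    print(pick_model(candidates, min_accuracy=0.90))
```

With a floor of 0.90, the ensemble's extra accuracy no longer justifies its cost, which is the trade-off production teams make all the time.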

The need for continuous retraining, which we’ll discuss next, further amplifies the need for cost optimization. Techniques such as transfer learning come in handy in these situations.

Related: How to Train Deep Learning Models With Small Datasets

Besides accuracy and cost, production ML has many other factors to consider.

Inference speed is one of the most critical factors. In live applications, you don’t want to keep your users waiting for predictions. Thus, predictions often need to be near-instant.

Take a credit card fraud detection system. Predictions should happen in the blink of an eye whenever your customers make a purchase through your eCommerce system. Anything longer than that would create customer dissatisfaction.

Say you’ve built a fire detection system using ML. What good is it if the model takes an hour to produce a prediction?

The need for faster inference prompts a series of other design decisions. In production systems, engineers often prefer simpler models at the expense of some accuracy. They can also improve prediction speed further through dimensionality reduction techniques. Models that work on fewer dimensions may run significantly faster than full-fledged ones.
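Before trading accuracy for speed, it helps to measure how slow the model actually is. A minimal sketch, assuming your model is exposed as a plain `predict` callable (a hypothetical stand-in here), times each call and compares a tail percentile against a latency budget. The 50 ms budget in the example is an arbitrary assumption.

```python
import statistics
import time


def latency_percentile_ms(predict, inputs, percentile=95):
    """Time predict() on each input and return the given latency percentile in ms."""
    timings = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points: the 1st through 99th percentiles.
    return statistics.quantiles(timings, n=100)[percentile - 1]


def within_budget(predict, inputs, budget_ms, percentile=95):
    return latency_percentile_ms(predict, inputs, percentile) <= budget_ms


if __name__ == "__main__":
    # Stand-in for a real model: a trivial scoring function.
    def toy_predict(x):
        return x * 0.5 > 10

    p95 = latency_percentile_ms(toy_predict, range(1000))
    print(f"p95 latency: {p95:.4f} ms")
    print("within 50 ms budget:", within_budget(toy_predict, range(1000), budget_ms=50))
```

Looking at a tail percentile rather than the average matters because the slowest requests are the ones your users notice.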

Then we have model interpretability.

Explainable AI has become a popular topic after a series of scandals involving inappropriate predictions. Businesses prefer ML models that are easy to explain over a black-box approach. When there’s a direct impact on people’s lives, interpretability becomes critical.
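One reason simple models are easy to explain is that a prediction can be broken down feature by feature. The sketch below does this for a logistic regression, where each feature's contribution to the log-odds is just its weight times its value. The weights, bias, and feature names are hypothetical and are not taken from any real churn model.

```python
import math

# Hypothetical coefficients of an already-trained logistic regression churn model.
WEIGHTS = {"months_inactive": 0.8, "support_tickets": 0.3, "tenure_years": -0.5}
BIAS = -1.0


def explain(features):
    """Return the churn probability and per-feature log-odds contributions."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    log_odds = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    # Largest absolute contribution first, so the main driver is easy to spot.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


if __name__ == "__main__":
    customer = {"months_inactive": 3, "support_tickets": 2, "tenure_years": 1}
    probability, ranked = explain(customer)
    print(f"churn probability: {probability:.2f}")
    for name, value in ranked:
        print(f"  {name}: {value:+.2f}")
```

A deep network offers no such direct breakdown, which is why explaining it usually requires separate tooling.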

In a recent HBR article, Reid Blackman and Beena Ammanath outline four situations when you need to explain your model predictions:

1) regulation requires it;

2) it’s essential to understand how to use the tool;

3) it could improve the system; and

4) it can help determine fairness.

Reid’s book, Ethical Machines: Your Concise Guide to Totally Unbiased, Transparent, and Respectful AI, is an excellent resource for expanding your knowledge in this space.

Model training in production systems

Building and training an ML model involves a lot of conscious choices. For instance, you decide how many layers to use and which activation function and cost function to pick.

Most academic projects have a shorter lifespan than live, ongoing production systems. Thus, all the choices are aimed at optimal tuning.

But production systems expect change: changes in data, changes in design, and so on. Thus, optimal has a different meaning in real-life ML applications.

Models need to be monitored and continuously retrained, which incurs costs. Retraining may or may not use the same feature space, so a feature store is preferred alongside a database. To speed up retraining and model serving, smaller models are often favored.
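Because every retraining run costs money, teams often retrain on a trigger rather than on every new batch of data. The following sketch shows one possible policy that checks model age, a drift statistic, and the drop in live accuracy. The `ModelHealth` fields and all three thresholds are assumptions chosen for illustration, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class ModelHealth:
    days_since_training: int
    drift_score: float        # any drift statistic computed on key features
    live_accuracy: float      # accuracy on recent labelled predictions
    baseline_accuracy: float  # accuracy measured when the model was trained


def retraining_reasons(health, max_age_days=30, drift_threshold=0.25,
                       max_accuracy_drop=0.05):
    """Return the list of reasons to retrain; an empty list means no action."""
    reasons = []
    if health.days_since_training > max_age_days:
        reasons.append("model is older than the allowed age")
    if health.drift_score > drift_threshold:
        reasons.append("input drift exceeds the threshold")
    if health.baseline_accuracy - health.live_accuracy > max_accuracy_drop:
        reasons.append("live accuracy dropped too far")
    return reasons


if __name__ == "__main__":
    healthy = ModelHealth(10, 0.05, 0.89, 0.90)
    degraded = ModelHealth(45, 0.40, 0.80, 0.90)
    print(retraining_reasons(healthy))
    print(retraining_reasons(degraded))
```

Returning the reasons, rather than a bare yes or no, makes it easier to see later why a costly retraining job was started.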

Related: The Difference Between Data Warehouses, Data Lakes, and Data Lakehouses.

Fairness in real-life machine learning applications

Last, the models we train must be fair to all our users. This is very easy to overlook. But numerous recent ethical AI issues have pushed fairness to the top of the priority list for businesses adopting ML in their workstreams.

Automated decision-making is inevitable, and ML is at center stage. But prolonged unfair decision-making could build up social issues.

Here’s a list of bias issues encountered in recent years.

Most fairness issues can be traced back to bias in datasets. We often have fewer samples of underrepresented groups. Thus, algorithms trained predominantly on the majority groups may ignore the underrepresented ones. But that’s not an excuse to make biased predictions.

In academic settings, research results are a guide for improvement. But in production systems, they affect people’s lives immediately. Thus, while fair representation in the training dataset and unbiased predictions are important in academic settings, they are absolutely crucial in production systems.
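A basic first check for biased predictions is to compare how often the model gives a positive outcome to each group. The sketch below computes per-group selection rates and the gap between the highest and the lowest, a measure often called the demographic parity difference. The group labels and predictions are made up, and demographic parity is only one of several fairness definitions, so a small gap does not by itself prove the model is fair.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Share of positive predictions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Difference between the highest and lowest group selection rate."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Illustrative data: 1 means the model approved the request.
    predictions = [1, 1, 1, 0, 1, 0, 0, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    print(selection_rates(predictions, groups))
    print(demographic_parity_gap(predictions, groups))
```

Running a check like this on live predictions, not just on the training set, helps catch unfairness that only appears once real users arrive.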

Related: How to Improve Data Quality With Data Quality Assessment?

Final thoughts

As in any discipline, there is a difference in ML between the academic and real-life approaches. The two work in conjunction, each building on the other. Yet, if you’re pursuing a career as an ML engineer, you should know this difference early.

We discussed some prominent production ML situations that aren’t usually present in academia. Dynamic data, design priorities, model retraining, and fairness in prediction are the aspects young ML engineers should focus on more.

This post discussed how production ML differs from academic ML. Also check out the specific data challenges ML engineers face in real-life projects.

