I believe the first profound attempt at making data analysis accessible was in 1985. A revolutionary piece of software changed the way we think about data. It allowed ordinary people to do extraordinary data analyses. We call it Excel, developed by Microsoft initially for Macintosh.
Since then, the field of data science has evolved and become accessible to everyone.
- Access to knowledge has improved phenomenally. If you’ve been listening to data science-related interviews, you may have noticed that one in every ten mentions Andrew Ng’s machine learning course. It’s a free online resource available to anyone aspiring to be a data scientist.
- Affordable infrastructure: training a model can cost less than a cup of coffee. Before cloud computing, people bought heavy hardware and struggled to maintain it. Today you can rent a machine and pay far less. If your machine learning model needs one hour of training on a machine with 96 CPUs and 192 GB of RAM, an EC2 instance costs only $4.08. That’s a Caffe Mocha at Starbucks.
- Open-source software rules the world. The majority of a data scientist’s toolkit is open-source software, and most of it is free for commercial use as well. Programming languages for data science, such as Python and R, are open source too. Unlike proprietary tools, open-source projects are typically supported by a global community of developers.
- Accessible data: collecting and maintaining data has never been easier. Mobile apps track a dozen biometrics and store them in the cloud. With a few clicks, anyone can create a survey and distribute it across the planet. Cloud storage in most modern software is also conveniently tied to your SSO.
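The rental arithmetic in the infrastructure point above can be sketched in a few lines. This is only an illustration: the $4.08 hourly rate is the figure quoted in this article, the function name `training_cost` is made up for the example, and real cloud prices vary by region and over time.

```python
def training_cost(hours: float, hourly_rate_usd: float = 4.08) -> float:
    """Estimate the cost of renting one machine for a training run.

    The default rate is the hourly figure quoted in the article;
    treat it as an assumption, not a current price.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return round(hours * hourly_rate_usd, 2)


# One hour of training, as in the example above.
print(training_cost(1))

# A longer run, to see how the bill scales with time.
print(training_cost(2.5))
```

The point of the sketch is that the bill scales with the hours you actually use, which is what makes renting cheaper than owning for occasional training runs.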
Thanks to these improvements, everyone today enjoys the benefits of data science. Soon the remaining barriers will disappear too.
But will the advances in data science lead to the extinction of data scientists?
What makes a data scientist in the future?
Data literacy and critical thinking are the answer.
Exceptional mathematical skills and programming in more than one language are no longer required. Any high-school kid knows enough mathematics to begin their data science journey.
If you are a research scientist, you may still need advanced mathematics. But not many data scientists are inventing new algorithms. Instead, they solve practical problems by using existing ones. For them, algorithms are configurable black boxes. Their internals don’t always matter.
Likewise, you no longer have to learn programming to become a data scientist. You can use tools such as KNIME, RapidMiner, AutoML, and DataRobot instead. They allow you to express your logic without a programming language.
A lab coat and goggles won’t make a chemist. Likewise, programming skills won’t create data scientists. It’s only a preference.
Your kids won’t be solving the same problems you do today.
We spent the last couple of decades in data science fitting models to real-world problems. We worked hard to make predictions accurate by tuning hyperparameters manually. Most of our energy went into coding models with our bare hands and optimizing them to match the available computing power.
But the landscape is changing. Hyperparameter tuning, which I thought would always remain manual, is now semi-automated. Programming, too, is getting out of the way with projects such as GitHub Copilot.
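To make “semi-automated” concrete, here is a minimal sketch of random search, one common way to automate tuning. Everything in it is a stand-in: `validation_score` is a made-up function that replaces “train a model and measure it”, and the parameter names and ranges are hypothetical. Real tools run the same kind of loop around actual model training.

```python
import random


def validation_score(learning_rate: float, depth: int) -> float:
    """Hypothetical stand-in for training a model and scoring it.

    By construction, the score peaks at learning_rate=0.1 and depth=6.
    """
    return -100 * (learning_rate - 0.1) ** 2 - 0.01 * (depth - 6) ** 2


def random_search(n_trials: int = 200, seed: int = 0):
    """Try random hyperparameter combinations and keep the best one."""
    rng = random.Random(seed)  # seeded so the search is repeatable
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # The human still chooses these ranges; the machine does the trying.
        params = {
            "learning_rate": rng.uniform(0.001, 0.5),
            "depth": rng.randint(2, 12),
        }
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search()
print(best_params, best_score)
```

The “semi” part is visible in the code: a person still decides which parameters to search, over what ranges, and for how many trials. The repetitive trial and error is what gets handed to the machine.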
It’s fascinating to think about what’s left in data science for our kids. There is plenty. Their efforts will focus more on defining problems than on solving them, because machines can solve problems that are well defined.
Future generations won’t be fitting models and tuning them for accuracy and performance. Domain experts will take over the application, and data scientists will focus on developing the science itself.
It’s the democratization of data science. At the current rate, it won’t take another decade to realize it.
Data science becomes accessible to more people every day. Thanks to rapid improvements in knowledge sharing, infrastructure, open-source software, and access to data, it’s no longer limited to high-tech companies.
In the future, the application of data science won’t be the role of a data scientist. Domain experts will handle it themselves with great platforms such as KNIME.
Developing the science itself will be the responsibility of data scientists. But that, too, won’t be the same, as even complex tasks such as hyperparameter tuning and programming are being automated.