Can Data Scientists Be Completely Replaced by Generative AI?
Is this the end of the data scientist hype?
Data science and machine learning have grown significantly in recent years, but 2023 has truly put AI on the map. I am sure almost everyone has heard of AI and what it can do, and many businesses have eagerly integrated it into their operations.
Why is AI more prominent now than ever, even though it has been around for a long time? I would argue it comes down to the accessibility of generative AI and its transformative impact on how we work.
Before ChatGPT, virtually no generative AI tool was both easily accessible to the general public and broadly appealing to the market. When ChatGPT arrived, people were astonished, especially by how readily the model could improve their businesses and daily lives.
It's also affecting the work of data scientists. Various publications have suggested that AI could replace data scientists, or at least automate part of their tasks. For example, articles in Business Insider and Datafloq argue that AI might replace data science jobs.
Essentially, the argument that AI could replace data scientists rests on the following points:
Data scientists’ programming work, such as machine learning development, could be taken over by AI;
Data analysis could be delegated to AI, which would then explain the findings to us;
Even production deployment and maintenance could possibly be handed over to AI.
Looking at these points, the work that AI could supposedly replace is mostly the technical portion of a data scientist's responsibilities. But is technical work all a data scientist does?
I would argue that a data scientist's job extends beyond the technical aspects, as there are many things to understand before a project even starts. And even within the technical work, AI might struggle to take over completely.
Here are some reasons why I think that generative AI might not be able to replace data scientists entirely.
Understanding the Business Requirements
No matter how advanced generative AI becomes, there will be times when the model is not flexible enough to meet the business requirements. We can’t expect the model to run on its own without a data scientist present to understand the business need.
My previous newsletter discusses how data scientists can improve by learning the business side.
Data Preparation and Processing
Generative AI might handle basic data preparation and processing and speed it up, but meeting specific project needs still requires a data scientist's work. The intricate preparation and processing tasks often fall to the data scientist.
Of course, once the process has been streamlined, preparation and processing can be automated. Even then, it takes a data scientist's expertise to decide what should be automated.
Interpretation of Results
Generative AI models can produce impressive outputs, but interpreting those outputs meaningfully often requires human knowledge and domain expertise.
Relying solely on generative AI for interpretation means trusting the model's reading of the results, which might be misleading. This is especially true when there are current events or business needs the AI hasn't considered, or when the model hallucinates.
It is therefore the data scientist’s responsibility to interpret the results and communicate them to non-technical stakeholders.
Ethics and Bias Considerations
Generative AI models can inadvertently perpetuate or even amplify biases in the data they're trained on. This may lead to discrimination that puts certain social groups at a disadvantage.
Avoiding ethical and bias issues is a human responsibility, and that's where data scientists come in. Generative AI might not be able to resolve these issues without human intervention.
I recently wrote about social bias in NLP models, a topic closely related to current generative AI. You can read more about it here.
Research and Development
It would be amazing if generative AI could develop its own model for further improvement, something existing studies, such as Sheng and Padmanabhan's (2022) study on self-programming AI, have already shown to be possible.
However, humans still need to set the direction of research and development. Self-developing models wouldn't exist without data scientists building them and steering them in the right direction.
This might sound far-fetched, but leaving everything to AI could lead to a dystopian future in which AI takes over. It might never happen, but it's a possibility.
Conclusion
In short, I don't believe that generative AI will completely replace what data scientists do.
However, generative AI will certainly accelerate the work of data scientists and will become one of the tools used to help solve business problems.
We don't need to fear that AI will replace our jobs; instead, we should focus on developing our skills.
Thank you for reading the Non-Brand Data Newsletter. If you found this post helpful, please share it with your friends. I also encourage you to comment with any topics you'd like me to write about!