ChatGPT - Are Data Science Jobs Now Obsolete?

I asked ChatGPT to do 5 things, from explaining a formula to creating a neural network model. The results got me thinking…

ChatGPT is an artificial intelligence chatbot developed by OpenAI. It specializes in dialogue, but you can use it for things as advanced as code generation. Naturally, software engineers and data scientists got worried about their job security, but should they? Can AI replace tech professionals?

The media tends to blow topics like this out of proportion, but after trying ChatGPT on Data Science and Machine Learning topics, I can say I have never been so amazed and frightened at the same time.

To make sense of what ChatGPT brings to the table, I decided to ask it the following questions:

Explain the bias-variance tradeoff in simple terms
Explain the linear regression formula in machine learning
Write a machine learning model for predicting passenger survival on the Titanic dataset using a Random Forest classifier
Write a neural network in TensorFlow for classifying handwritten digits of the MNIST dataset
How to make a REST API around a machine learning model in Python with FAST API?

Want to know the answers? Continue reading.

Q1: Explain the Bias-Variance Tradeoff in Simple Terms

In machine learning and statistics, the bias-variance tradeoff often comes up when discussing model complexity. A model with too few parameters tends to have high bias and low variance, while a model with too many parameters tends to have low bias and high variance. You need to find the right balance so you don’t overfit or underfit your data.
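To make the tradeoff concrete, here’s a minimal sketch (mine, not ChatGPT’s) that fits polynomials of increasing degree to noisy data. The toy sine data, the sample sizes, and the chosen degrees are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Hypothetical toy data: a sine curve plus Gaussian noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


x_train, y_train = make_data(20)    # small training set
x_test, y_test = make_data(200)     # larger held-out set

errors = {}
for degree in (1, 3, 10):
    # More polynomial terms means more parameters, i.e. a more flexible model
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree:>2}: train MSE = {errors[degree][0]:.3f}, "
          f"test MSE = {errors[degree][1]:.3f}")
```

The straight line (degree 1) is too rigid to follow the curve, so it underfits. The training error keeps dropping as the degree grows, but a very flexible model will usually start chasing the noise, and its error on held-out data tends to get worse again.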

Here’s what ChatGPT thinks of the bias-variance tradeoff:

Image 1 - ChatGPT bias-variance tradeoff (image by author)

Well, isn’t this spot on? No reason to worry yet, it’s just a generic question. Let’s ask another one before asking ChatGPT to generate code.

Q2: Explain the Linear Regression Formula in Machine Learning

Linear regression is a fundamental algorithm in statistics and machine learning. It models a linear relationship between the dependent variable Y and the independent variable X (or a set of independent variables).
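With a single predictor, the model is the line y = b0 + b1 * x, where b0 is the intercept and b1 is the slope. As a rough sketch of what fitting that line involves, the snippet below estimates both values with ordinary least squares in NumPy. The data is synthetic, and the "true" intercept of 2 and slope of 3 are assumptions picked just for the example.

```python
import numpy as np

# Hypothetical data generated from y = 2 + 3x plus a little noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, 100)

# Design matrix: a column of ones (for the intercept) next to the predictor
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: find the coefficients that minimize squared error
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The estimates should land close to the values used to generate the data. Adding more columns to X is all it takes to handle more than one predictor.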

Let’s see what ChatGPT thinks of it:

Image 2 - ChatGPT linear regression formula and explanation (image by author)

It’s safe to say all four paragraphs of generated explanations are accurate. ChatGPT even explained the line equation and covered how each parameter works. The one thing it failed to capture was the case of multiple linear regression, or linear regression with several independent variables (predictors).

Nevertheless, I have no doubt it knows more about multiple linear regression than I do. You just have to ask it explicitly.

Q3: Write a Machine Learning Model for Predicting Passenger Survival on the Titanic Dataset using a Random Forest Classifier

I decided to explicitly ask ChatGPT to generate the code for training a machine learning model on the Titanic dataset. To make the request more specific, I asked it to use a Random Forest classifier algorithm.

Here’s what ChatGPT came up with:

Image 3 - Titanic dataset forecast with ChatGPT (image by author)

This is more or less accurate. ChatGPT knows about the Titanic dataset since it’s aware of the columns it has. It abstracts the loading phase away into a load_titanic_dataset() function, so you’ll have to work out how to load the data yourself.

Also, it builds a Random Forest classifier model only on two input features - age and gender. There are many more available in the dataset that would require further feature engineering, so it looks like ChatGPT has cut some corners here. It also generated the code for making predictions on new data.

Overall, I’m impressed, especially by how it comments on each portion of the code.
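For readers who can’t see the screenshot, here’s a minimal sketch of what a script along those lines could look like. It is my own reconstruction of the idea, not ChatGPT’s actual output. The load_titanic_dataset() body is a hypothetical stand-in that returns a handful of made-up rows, and the Age, Sex, and Survived column names are assumed to match the dataset you load.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def load_titanic_dataset() -> pd.DataFrame:
    """Hypothetical stand-in loader: returns a few made-up rows, not real data."""
    return pd.DataFrame({
        "Age": [22, 38, 26, 35, 54, 2, 27, 14, 4, 58, 20, 39],
        "Sex": ["male", "female", "female", "female", "male", "male",
                "female", "female", "female", "female", "male", "male"],
        "Survived": [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0],
    })


df = load_titanic_dataset()

# Encode gender as a number so the classifier can use it
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Use only the two input features mentioned above: age and gender
X = df[["Age", "Sex"]]
y = df["Survived"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

# Predict survival for new, unseen passengers
new_passengers = pd.DataFrame({"Age": [30, 8], "Sex": [0, 1]})
predictions = model.predict(new_passengers)
print("Predictions:", predictions)
```

On the real dataset you would swap in your own loader and deal with missing ages before fitting, since the classifier may not accept missing values.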

I’ve asked the same question once again:

Image 4 - Titanic dataset forecasting instructions by ChatGPT (image by author)

This time, ChatGPT didn’t generate a single line of code. Instead, it returned a generic step-by-step machine learning pipeline. It’s in no way tied to the Titanic dataset specifically, but you can use this pipeline to work on the dataset.

Q4: Write a Neural Network in TensorFlow for Classifying Handwritten Digits of the MNIST Dataset

Now, ChatGPT can’t write the code needed to train a neural network model, right? Wrong.

I asked it to write the entire model for the MNIST dataset with TensorFlow, and this is what it came up with:

Image 5 - ChatGPT TensorFlow neural network model example (image by author)

Overall, it loaded the dataset, flattened it, scaled the flattened arrays to values between 0 and 1, and successfully trained/evaluated the model. I’m impressed, to say the least.
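The flattening and scaling steps are easy to gloss over, so here’s a small NumPy-only sketch of just that part. The random array is an assumed stand-in for MNIST images (28x28 grayscale, pixel values 0 to 255), and the snippet doesn’t reproduce ChatGPT’s code or train any model.

```python
import numpy as np

# Stand-in for a batch of MNIST images: 5 random 28x28 grayscale pictures
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a single vector of 784 pixel values
flat = images.reshape(len(images), -1)

# Scale pixel values from the 0-255 range down to 0-1
scaled = flat.astype("float32") / 255.0

print(flat.shape, scaled.min(), scaled.max())
```

The resulting 784-value vectors are what a simple fully connected network would take as input.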

MNIST is known as a “hello world” dataset in deep learning, so there’s a lot of information online on how to train a neural network model on it. Still, asking a question this way is much faster than figuring things out manually after a Google search.

Q5: How to Make a REST API Around a Machine Learning Model in Python with FAST API?

As data scientists and machine learning engineers, we often have to make our models accessible to the public. That’s where REST APIs come into play, and FastAPI is one of the most popular Python libraries for the job.

So, I asked ChatGPT to explain how I can make a REST API around a machine learning model:

Image 6 - Instructions for a REST API around a machine learning model (image by author)

ChatGPT assumed I have a model saved in Pickle format, which is probably the case if I’m thinking about deployment. It unnecessarily imported the sklearn.datasets module, but everything else looks good. It’s a valid prototype for deploying machine learning models, and it also included a list of steps and instructions.

If you don’t understand some terms, such as API endpoint, you can just Google it. Or even better, ask ChatGPT instead!

ChatGPT - How will it Change Data Science and Machine Learning?

I’ve only asked ChatGPT some fundamental questions that are well-documented on the web. It’s nothing you wouldn’t stumble upon by using Google Search, but ChatGPT takes that to a whole new level. You don’t need to do manual research to see what works. Instead, just ask a detailed question and ChatGPT will do its best to be the one-stop shop you always wanted.

I think the TensorFlow example best demonstrated how ChatGPT can be a be-all and end-all solution for simple problems, since it generated the entire code needed to train and evaluate the model. Other times, either manual intervention or additional questions were required.

So, what does this mean for the future of data science jobs?

ChatGPT is still in a research preview version, and nobody knows what the future brings. It gave me a lot of WTF moments when writing this article because I didn’t expect it to do so well. Overall, I think it gives pretty accurate answers to the kinds of questions a junior data scientist or software engineer would ask. It can generate baseline code that you can then tweak for your specific problem. On the other hand, it can give a different answer each time you ask the same question, as the two Titanic responses showed, which can be both a good and a bad thing.

I don’t see ChatGPT making tech professionals obsolete, at least not yet. Instead, I see it as a tool that will eliminate some boring aspects of our jobs. For example, it can’t design an entire highly specific system architecture at once, but it can likely assist you in every step. Good news if you’re a senior or a solution architect, but not so good if you’re just starting out.

Overall, it’s just too soon to tell. We’ll have to wait and see how companies will adopt ChatGPT and its future versions, and what will happen to the tech job market in the process.

What do you think of ChatGPT and its impact on the future of the job market? Let me know in the comment section below.