Numpy Array to Pandas DataFrame: 3 Ways to Convert a Numpy Array to DataFrame in Python

Numpy Array to Pandas DataFrame

If you want to convert a Numpy array to a Pandas DataFrame, you have three options. The first two boil down to passing a 1D or 2D Numpy array to pd.DataFrame(), and the last one uses the built-in from_records() method. You’ll learn all three approaches today, with plenty of hands-on examples.

To be perfectly honest, there are more ways to convert a Numpy array to a DataFrame, but in practice you only need these three. The rest are variations on the same calls.

Before proceeding, it would be helpful if you already know how to Convert Python List to Pandas DataFrame, and also how to Convert Python Dictionary to Pandas DataFrame. Reading these articles isn’t mandatory, but it can’t hurt to know.

Regarding library imports, you’ll need both Numpy and Pandas today, so stick these two lines at the top of your Python script or notebook:

import numpy as np
import pandas as pd

Convert Numpy Array to Pandas DataFrame - 1D Numpy Arrays

Think of 1D arrays as vectors or distinct features in the dataset. For example, a 1D array can represent age, first name, date of birth, or job title - but it can’t represent all of them. You’d need four 1D arrays to do so.

Let’s see this in action. The following code snippet converts a 1D Numpy array to Pandas DataFrame:

arr = np.array([1, 2, 3])

data = pd.DataFrame(arr)
data

It’s just a vector of numbers, so the resulting DataFrame won’t be too interesting:

Image 1 - DataFrame from Numpy array (Image by author)
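If you can’t see the rendered table, a quick way to inspect the result is to check its shape and labels. The snippet below is a minimal sketch of that check; it relies only on the fact that Pandas falls back to default integer labels when no columns argument is given.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])
data = pd.DataFrame(arr)

# A 1D array of length 3 becomes 3 rows and a single column
print(data.shape)          # (3, 1)

# Without a columns argument, the only column gets the default integer label 0
print(list(data.columns))  # [0]

# The values are unchanged and sit in that one column
print(data[0].tolist())    # [1, 2, 3]
```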

In case you want to convert a Numpy array to a Pandas DataFrame with a column name, you’ll have to provide a value to the columns argument. It has to be list-like (a plain list works), even when there’s only one column, so keep that in mind:

arr = np.array([1, 2, 3])

data = pd.DataFrame(arr, columns=["Number"])
data

The resulting DataFrame has a bit more context now:

Image 2 - DataFrame from Numpy array with column name (Image by author)
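As mentioned earlier, one 1D array holds a single feature, so several features need several arrays. One way to combine them, sketched below, is to pass a dictionary of 1D arrays to pd.DataFrame(). This is a variation on the same constructor call, and the sample names and ages are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample features, one 1D array per feature
first_names = np.array(["Bob", "Mark", "Jane"])
ages = np.array([34, 28, 45])

# Each dictionary key becomes a column name; the arrays must have equal length
people = pd.DataFrame({"First name": first_names, "Age": ages})

print(people.shape)          # (3, 2)
print(list(people.columns))  # ['First name', 'Age']
```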

Now, DataFrames with only a single feature aren’t the most interesting, so let’s see how we can spice things up with multidimensional Numpy arrays.

Numpy Array to DataFrame - 2D Numpy Arrays

Think of 2D arrays as matrices. We have rows and columns, where each row represents the values for one observation, measured across multiple features (columns). Each column contains information on the same feature across multiple observations.

Let’s go through a dummy example first, just so you can grasp how to use Pandas to create a DataFrame from an array:

arr = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

data = pd.DataFrame(arr, columns=["Num 1", "Num 2", "Num 3"])
data

The DataFrame has three observations (rows) measured through three features (columns):

Image 3 - DataFrame from Multidimensional Numpy array (Image by author)
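To confirm that rows and columns map over as described, you can compare the DataFrame against the original array. The following is a small sketch of such a sanity check, using the same dummy data as above.

```python
import numpy as np
import pandas as pd

arr = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
data = pd.DataFrame(arr, columns=["Num 1", "Num 2", "Num 3"])

# The DataFrame keeps the array's (rows, columns) shape
print(data.shape == arr.shape)               # True

# A DataFrame column corresponds to an array column, i.e. arr[:, 1]
print(data["Num 2"].tolist())                # [2, 5, 8]

# Converting back with to_numpy() recovers the original values
print(np.array_equal(data.to_numpy(), arr))  # True
```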

If the dummy example doesn’t paint the full picture, the following one should help. In it, we’re declaring a 2D Numpy array of employees.

Each row is a single observation holding an employee’s first name, last name, and email address. Each column is essentially a 1D array (vector) representing either first names, last names, or emails across all records:

employees = np.array([
    ["Bob", "Doe", "[email protected]"],
    ["Mark", "Markson", "[email protected]"],
    ["Jane", "Swift", "[email protected]"],
    ["Patrick", "Johnson", "[email protected]"]
])

data = pd.DataFrame(employees, columns=["First name", "Last name", "Email"])
data

Here’s the resulting DataFrame:

Image 4 - DataFrame from real data in Numpy arrays (Image by author)
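One caveat with text-heavy 2D arrays: a regular Numpy array holds a single data type, so mixing strings and numbers makes Numpy store everything as strings. The sketch below illustrates this with a made-up "Years" column (not part of the example above) and converts it back to numbers with pd.to_numeric().

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing text and numbers in one 2D array
mixed = np.array([
    ["Bob", 5],
    ["Jane", 8]
])

# Numpy promotes the whole array to a Unicode string dtype
print(mixed.dtype.kind)      # U

df = pd.DataFrame(mixed, columns=["First name", "Years"])
print(df["Years"].tolist())  # ['5', '8']

# Convert the column back to a numeric dtype after the DataFrame is built
df["Years"] = pd.to_numeric(df["Years"])
print(df["Years"].tolist())  # [5, 8]
```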

And that’s how you can convert both 1D and 2D Numpy arrays to Pandas DataFrames. Let’s take a look at another way of doing the same thing, which is with the built-in from_records() method.

How to Convert Numpy Array to Pandas DataFrame with the from_records() Method

Pandas has a built-in method that allows you to convert a multidimensional Numpy array to a Pandas DataFrame. It’s called from_records(), and it’s a class method of DataFrame.

Truth be told, you don’t have to use it for a plain 2D array like the one below, since here it gives the same result as the approaches we’ve covered so far (it’s mainly aimed at record-like data such as structured arrays). But still, if you want a dedicated method, here’s how to use it:

employees = np.array([
    ["Bob", "Doe", "[email protected]"],
    ["Mark", "Markson", "[email protected]"],
    ["Jane", "Swift", "[email protected]"],
    ["Patrick", "Johnson", "[email protected]"]
])

data = pd.DataFrame.from_records(employees, columns=["First name", "Last name", "Email"])
data

The resulting Pandas DataFrame is identical to the one from the previous section:

Image 5 - DataFrame from Numpy array with from_records() (Image by author)
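Where from_records() fits more naturally is with Numpy structured arrays, which carry their own field names. The example below is a minimal sketch under that assumption; the field names and values are invented for illustration, and the index argument is used to turn one field into the row index.

```python
import numpy as np
import pandas as pd

# Hypothetical structured array: each field has its own name and dtype
records = np.array(
    [("Bob", 5), ("Jane", 8)],
    dtype=[("first_name", "U10"), ("years", "i4")]
)

# Field names become column names; index="first_name" moves that field to the index
df = pd.DataFrame.from_records(records, index="first_name")

print(list(df.columns))         # ['years']
print(df.index.tolist())        # ['Bob', 'Jane']
print(df.loc["Jane", "years"])  # 8
```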

And that’s how you can convert a Numpy array to Pandas DataFrame. Let’s go over some commonly asked questions next.


Numpy Array to Pandas DataFrame Q&A

This section will walk you through some common questions regarding the Numpy array to Pandas DataFrame conversion.

Q: Can Pandas Work with Numpy Arrays?

A: Yes, Pandas can work with Numpy arrays, just as well as with plain Python lists. You can declare either a bunch of 1D arrays or a single 2D Numpy array and convert it to a Pandas DataFrame by passing it into the pd.DataFrame() call. Just remember to specify the column names; otherwise, the columns get default integer labels (0, 1, 2, and so on).

Q: How Can You Convert a Numpy Array Into a Pandas DataFrame?

A: You can use either a call to pd.DataFrame() or the pd.DataFrame.from_records() method. For a plain 2D Numpy array (matrix), both produce the same DataFrame.
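If you’d like to verify this yourself, the sketch below builds the same small DataFrame both ways and compares them with pd.testing.assert_frame_equal(). The array and column names are placeholders, and dtype checking is switched off because the two calls aren’t guaranteed to infer identical integer types on every platform.

```python
import numpy as np
import pandas as pd

# Placeholder 2D array and column names
arr = np.array([[1, 2], [3, 4]], dtype=np.int64)
cols = ["A", "B"]

via_constructor = pd.DataFrame(arr, columns=cols)
via_records = pd.DataFrame.from_records(arr, columns=cols)

# Raises AssertionError if labels or values differ; dtypes are not compared
pd.testing.assert_frame_equal(via_constructor, via_records, check_dtype=False)

print(via_records.values.tolist())  # [[1, 2], [3, 4]]
```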

Q: How to Convert Numpy Array to DataFrame Column

A: You can use Numpy to add additional columns to an existing Pandas DataFrame. For example, the following code snippet declares a Pandas DataFrame from a 2D Numpy array:

employees = np.array([
    ["Bob", "Doe", "[email protected]"],
    ["Mark", "Markson", "[email protected]"],
    ["Jane", "Swift", "[email protected]"],
    ["Patrick", "Johnson", "[email protected]"]
])

data = pd.DataFrame(employees, columns=["First name", "Last name", "Email"])
data

Image 6 - DataFrame from 2D Numpy array (Image by author)

To convert a Numpy array to a DataFrame column, you only have to declare a new Numpy array and assign it to a new column. Here’s the code:

years_of_experience = np.array([5, 3, 8, 12])

data["Years of Experience"] = years_of_experience
data

The DataFrame now has four columns instead of three:

Image 7 - Adding a DataFrame column from Numpy array (Image by author)
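Keep in mind that the new array needs one value per row. The sketch below, which uses a cut-down DataFrame with only first names, shows that assigning an array of the wrong length raises a ValueError, while a correctly sized array is accepted.

```python
import numpy as np
import pandas as pd

# Simplified stand-in for the employees DataFrame (4 rows)
data = pd.DataFrame({"First name": ["Bob", "Mark", "Jane", "Patrick"]})

too_short = np.array([5, 3, 8])  # only 3 values for 4 rows
raised = False
try:
    data["Years of Experience"] = too_short
except ValueError as err:
    raised = True
    print(f"Assignment failed: {err}")

# An array with exactly one value per row works as expected
data["Years of Experience"] = np.array([5, 3, 8, 12])
print(data["Years of Experience"].tolist())  # [5, 3, 8, 12]
```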

And that’s all for today. Let’s make a short recap next.


Summing up Numpy Array to Pandas DataFrame

To conclude, Python’s Pandas library provides a user-friendly API for converting most common data types into Pandas DataFrames - Numpy array being one of them. This article covered three ways to convert a Numpy array to Pandas DataFrame, and these are all you need when working with Numpy.

There are some variations on these approaches, but they mostly differ in how the Numpy array is prepared rather than in the Pandas call itself. Learn these three, and you’ll be well prepared for the Numpy side of a data analysis project.

Stay tuned to the Practical Pandas website, because next we’ll explore how to add rows and columns to a Pandas DataFrame.