Deep Dive into Pandas DataFrame Merging
Leverage Pandas merging functions and learn how they work.
🔥Reading Time: 14 Minutes🔥
In data work, it is common to have multiple datasets, whether they come from different data sources or are the result of data analysis.
Sometimes, we want to merge two or more different datasets for various reasons. For example:
We want to integrate data from multiple data sources into one dataset for deeper analysis
We want to impute missing values in one dataset using values from another dataset
We split a dataset to perform different analyses on each part, and we want to combine the parts back into one dataset
Pandas provides several functions for merging datasets. In this article, we will learn these merging functions through coding examples. Let's get into it.
1. merge
The merge function is the go-to function in Pandas for basic dataset merging. It combines two datasets based on a given index or column.
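Because merge can match rows either on columns or on the index, here is a minimal sketch of both styles. The products and sales tables are hypothetical and exist only for illustration; the key column is deliberately named differently in each table, which is what the left_on/right_on and left_index/right_index parameters of pd.merge are for.

```python
import pandas as pd

# Hypothetical example data: the key has a different column name in each table
products = pd.DataFrame({'product_id': [10, 20, 30],
                         'product_name': ['Pen', 'Book', 'Lamp']})
sales = pd.DataFrame({'item': [20, 10, 20],
                      'qty': [3, 1, 5]})

# Merge on columns: left_on/right_on name the key column in each DataFrame
by_column = pd.merge(products, sales, left_on='product_id', right_on='item')

# Merge on the index: move each key into the index, then match index to index
by_index = pd.merge(products.set_index('product_id'),
                    sales.set_index('item'),
                    left_index=True, right_index=True)

print(by_column)
print(by_index)
```

Both calls use the default inner join, so product 30 ('Lamp'), which has no sales, is left out of both results.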
For example, let's create two sample datasets to show how the merge function works.
import pandas as pd

# Customer data: one row per customer, identified by cust_id
customer = pd.DataFrame({'cust_id': [1, 2, 3, 4, 5],
                         'cust_name': ['Maria', 'Fran', 'Dominique', 'Elsa', 'Charles'],
                         'country': ['Germany', 'Spain', 'Japan', 'Poland', 'Argentina']})

# Order data: one row per order; cust_id links each order to a customer
order = pd.DataFrame({'order_id': [200, 201, 202, 203, 204],
                      'cust_id': [1, 3, 3, 4, 2],
                      'order_date': ['2014-07-05', '2014-07-06', '2014-07-07', '2014-07-07', '2014-07-08'],
                      'order_value': [10.1, 20.5, 18.7, 19.1, 13.5]})
In the sample above, we simulate two different datasets, customer data and order data, where the cust_id column exists in both DataFrames.
Let's perform DataFrame merging to understand the function better.
pd.merge(customer, order)
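To see what this default call produces, here is a minimal sketch that rebuilds the sample data and inspects the merged result. Customer 3 placed two orders, so Dominique appears twice, while Charles (cust_id 5), who has no orders, does not appear at all.

```python
import pandas as pd

customer = pd.DataFrame({'cust_id': [1, 2, 3, 4, 5],
                         'cust_name': ['Maria', 'Fran', 'Dominique', 'Elsa', 'Charles'],
                         'country': ['Germany', 'Spain', 'Japan', 'Poland', 'Argentina']})
order = pd.DataFrame({'order_id': [200, 201, 202, 203, 204],
                      'cust_id': [1, 3, 3, 4, 2],
                      'order_date': ['2014-07-05', '2014-07-06', '2014-07-07', '2014-07-07', '2014-07-08'],
                      'order_value': [10.1, 20.5, 18.7, 19.1, 13.5]})

merged = pd.merge(customer, order)

# One row per matching (customer, order) pair; the shared cust_id column appears once
print(merged)
print(merged.columns.tolist())
```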
By default, the merge function behaves as follows:
It needs only two arguments: the two DataFrames we want to merge.
It merges on columns, automatically using the columns whose names appear in both DataFrames as the key.
It keeps only the rows whose key values appear in both DataFrames (an inner join).
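These defaults can be made explicit. The sketch below, using the same sample data, checks which column names the two DataFrames share and confirms that the bare call gives the same result as spelling out how='inner' and on='cust_id'.

```python
import pandas as pd

customer = pd.DataFrame({'cust_id': [1, 2, 3, 4, 5],
                         'cust_name': ['Maria', 'Fran', 'Dominique', 'Elsa', 'Charles'],
                         'country': ['Germany', 'Spain', 'Japan', 'Poland', 'Argentina']})
order = pd.DataFrame({'order_id': [200, 201, 202, 203, 204],
                      'cust_id': [1, 3, 3, 4, 2],
                      'order_date': ['2014-07-05', '2014-07-06', '2014-07-07', '2014-07-07', '2014-07-08'],
                      'order_value': [10.1, 20.5, 18.7, 19.1, 13.5]})

# The key merge() uses when `on` is not given: column names present in both DataFrames
common = customer.columns.intersection(order.columns)
print(list(common))  # ['cust_id']

# The bare call and the fully explicit call produce identical DataFrames
default = pd.merge(customer, order)
explicit = pd.merge(customer, order, how='inner', on='cust_id')
print(default.equals(explicit))  # True
```

Passing on explicitly is often the safer habit: if both DataFrames later gain another column with the same name, the default key would silently change.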
Let's explore some of the merge function parameters that we can tweak.
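As a sketch of a few of these parameters, based on the pandas documentation for pd.merge: how selects the join type ('inner', 'left', 'right', 'outer', or 'cross'), indicator=True adds a _merge column recording where each row came from, and validate raises an error if the key relationship is not what we expect. Which parameters the rest of this post covers is not assumed here.

```python
import pandas as pd

customer = pd.DataFrame({'cust_id': [1, 2, 3, 4, 5],
                         'cust_name': ['Maria', 'Fran', 'Dominique', 'Elsa', 'Charles'],
                         'country': ['Germany', 'Spain', 'Japan', 'Poland', 'Argentina']})
order = pd.DataFrame({'order_id': [200, 201, 202, 203, 204],
                      'cust_id': [1, 3, 3, 4, 2],
                      'order_date': ['2014-07-05', '2014-07-06', '2014-07-07', '2014-07-07', '2014-07-08'],
                      'order_value': [10.1, 20.5, 18.7, 19.1, 13.5]})

# Left join: keep every customer; customers without orders get NaN order columns.
# indicator=True adds a '_merge' column with 'both', 'left_only', or 'right_only'.
left_join = pd.merge(customer, order, how='left', on='cust_id', indicator=True)
print(left_join[['cust_name', 'order_id', '_merge']])

# Right and outer joins for comparison
right_join = pd.merge(customer, order, how='right', on='cust_id')
outer_join = pd.merge(customer, order, how='outer', on='cust_id')

# validate checks the key relationship: each customer may have many orders,
# so 'one_to_many' passes; it would raise a MergeError if cust_id repeated in customer
checked = pd.merge(customer, order, on='cust_id', validate='one_to_many')
```

With this data, the left and outer joins both return 6 rows (Charles is kept with a missing order), while the right join returns the 5 orders, because every order's cust_id exists in customer.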
Subscribe to Non-Brand Data to keep reading this post and get 7 days of free access to the full post archives.