Non-Brand Data

Deep Dive into Pandas DataFrame Merging

Learn how the Pandas merging functions work and how to use them.

Cornellius Yudha Wijaya
Sep 29, 2024

🔥Reading Time: 14 Minutes🔥

In data work, it's common to end up with multiple datasets, either from different data sources or as the result of an analysis.

Sometimes, we want to merge two or more different datasets for various reasons. For example:

  • We want to integrate data from multiple sources into one dataset for deeper analysis

  • We want to fill in missing values in one dataset using values from another dataset

  • We split a dataset to run different analyses on each part, and we want to combine the results back into one dataset

Pandas provides several functions for merging datasets. In this article, we will go through these functions with code examples. Let's get into it.


1. merge

The merge function is the go-to function in Pandas for basic dataset merging. It combines two datasets based on their indexes or on one or more columns.

For example, let's create some sample data to show how the merge function works.

import pandas as pd

# Customer table: one row per customer, keyed by cust_id
customer = pd.DataFrame({'cust_id': [1, 2, 3, 4, 5],
                         'cust_name': ['Maria', 'Fran', 'Dominique', 'Elsa', 'Charles'],
                         'country': ['Germany', 'Spain', 'Japan', 'Poland', 'Argentina']})

# Order table: one row per order; cust_id refers to a row in the customer table
order = pd.DataFrame({'order_id': [200, 201, 202, 203, 204],
                      'cust_id': [1, 3, 3, 4, 2],
                      'order_date': ['2014-07-05', '2014-07-06', '2014-07-07', '2014-07-07', '2014-07-08'],
                      'order_value': [10.1, 20.5, 18.7, 19.1, 13.5]})

In the sample above, we simulate two datasets, customer data and order data, and the cust_id column exists in both DataFrames.

Let's perform DataFrame merging to understand the function better.

pd.merge(customer, order)
Merged Dataset (Image by Author)
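To make the merged result easier to inspect without the image, here is a minimal sketch that rebuilds a trimmed copy of the sample data (only cust_id, cust_name, order_id, and order_value are kept; the trimming is my assumption, for brevity) and runs the same call.

```python
import pandas as pd

# Trimmed copy of the sample data: fewer columns, same keys and values
customer = pd.DataFrame({'cust_id': [1, 2, 3, 4, 5],
                         'cust_name': ['Maria', 'Fran', 'Dominique', 'Elsa', 'Charles']})

order = pd.DataFrame({'order_id': [200, 201, 202, 203, 204],
                      'cust_id': [1, 3, 3, 4, 2],
                      'order_value': [10.1, 20.5, 18.7, 19.1, 13.5]})

# Same call as in the article: no extra arguments
merged = pd.merge(customer, order)
print(merged)

# Charles (cust_id 5) has no order, so he does not appear in the result.
# Dominique (cust_id 3) has two orders, so she appears twice.
```

The result has one row per matching order, with the customer columns repeated for each order that customer placed.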

By default, the merge function makes a few choices for us:

  • It needs only two arguments: the two DataFrames we want to merge,

  • It merges on columns and uses every column name the two DataFrames share as the join key (here, only cust_id),

  • It keeps only the rows whose key values appear in both DataFrames (an inner join).
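The defaults listed above can be checked by spelling them out. The sketch below uses a trimmed copy of the sample data (an assumption made for brevity) and compares the bare call with one that passes how and on explicitly.

```python
import pandas as pd

# Trimmed copy of the sample data: fewer columns, same keys and values
customer = pd.DataFrame({'cust_id': [1, 2, 3, 4, 5],
                         'cust_name': ['Maria', 'Fran', 'Dominique', 'Elsa', 'Charles']})

order = pd.DataFrame({'order_id': [200, 201, 202, 203, 204],
                      'cust_id': [1, 3, 3, 4, 2],
                      'order_value': [10.1, 20.5, 18.7, 19.1, 13.5]})

# Bare call: pandas picks the join key and join type itself
implicit = pd.merge(customer, order)

# Same merge with the defaults written out:
# inner join on the only shared column, cust_id
explicit = pd.merge(customer, order, how='inner', on='cust_id')

# Both calls should produce identical DataFrames
print(implicit.equals(explicit))
```

Writing on explicitly is usually the safer habit: if the two DataFrames later gain another column with the same name, the bare call would silently start joining on both columns.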

Let's explore some of the merge function parameters that we can tweak.
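As one illustration of what such tweaking can look like, here is a minimal sketch that changes the join type with the how parameter and adds the indicator parameter. Both are real pd.merge parameters, but choosing them for this example is my assumption, and the data is a trimmed copy of the sample above.

```python
import pandas as pd

# Trimmed copy of the sample data: fewer columns, same keys and values
customer = pd.DataFrame({'cust_id': [1, 2, 3, 4, 5],
                         'cust_name': ['Maria', 'Fran', 'Dominique', 'Elsa', 'Charles']})

order = pd.DataFrame({'order_id': [200, 201, 202, 203, 204],
                      'cust_id': [1, 3, 3, 4, 2],
                      'order_value': [10.1, 20.5, 18.7, 19.1, 13.5]})

# how='left' keeps every customer, even those without an order.
# indicator=True adds a _merge column saying where each row came from.
left_merged = pd.merge(customer, order, how='left', on='cust_id', indicator=True)
print(left_merged)

# Charles (cust_id 5) is kept this time; his order columns are NaN
# and his _merge value is 'left_only'.
```

Compared with the default inner join, the left join keeps customers who have not ordered anything, which is useful when the goal is to find them rather than drop them.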

© 2025 Cornellius Yudha Wijaya