Pandas join() function | Docs With Examples

The join() function in Pandas allows you to combine DataFrames efficiently based on their indexes.

It is useful for merging datasets when working with relational data in Python.

Importing Pandas

Before using join(), it's essential you have Pandas installed and imported, and it's standard practice to import it with an alias:

import pandas as pd

Creating Sample DataFrames

Let's create two sample DataFrames to demonstrate how join() works:

df1 = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["x", "y", "z"]
}, index=["one", "two", "four"])  # Note: "four" does not match with df2

df2 = pd.DataFrame({
    "C": [4, 5, 6],
    "D": ["p", "q", "r"]
}, index=["one", "three", "four"])  # Note: "three" does not match with df1

Using `join()` to Merge DataFrames

The join() method merges DataFrames based on their index:

df_merged = df1.join(df2)
print(df_merged)

Output:

      A  B    C    D
one   1  x  4.0    p
two   2  y  NaN  NaN
four  3  z  6.0    r

Explanation: This joins df1 with df2 on the index, adding the columns from df2 to df1.

Specifying Join Types

By default, join() performs a left join. You can change this using the how parameter:

1. Left Join (Default)

df_left = df1.join(df2, how="left")

Output:

      A  B    C    D
one   1  x  4.0    p
two   2  y  NaN  NaN
four  3  z  6.0    r

Explanation: Keeps all rows from df1 and adds matching values from df2. Missing values are filled with NaN.

2. Right Join

df_right = df1.join(df2, how="right")

Output:

         A    B  C  D
one    1.0    x  4  p
three  NaN  NaN  5  q
four   3.0    z  6  r

Explanation: Keeps all rows from df2 and adds matching values from df1. Missing values are filled with NaN.

3. Inner Join

df_inner = df1.join(df2, how="inner")

Output:

      A  B  C  D
one   1  x  4  p
four  3  z  6  r

Explanation: Keeps only rows with matching indexes in both DataFrames.

4. Outer Join

df_outer = df1.join(df2, how="outer")

Output:

         A    B    C    D
four   3.0    z  6.0    r
one    1.0    x  4.0    p
three  NaN  NaN  5.0    q
two    2.0    y  NaN  NaN

Explanation: Keeps all rows from both DataFrames, filling missing values with NaN. Note, the order of the outer join output is determined by the sorting of the index values. Pandas automatically sorts the index in lexicographical order when performing an outer join unless explicitly modified.

Joining on Different Indexes

If the DataFrames have different indexes, join() aligns them automatically:

df1 = pd.DataFrame({"A": [1, 2]}, index=["one", "two"])
df2 = pd.DataFrame({"C": [3, 4]}, index=["two", "three"])

df_joined = df1.join(df2, how="outer")
print(df_joined)

Output:

         A    C
one    1.0  NaN
three  NaN  4.0
two    2.0  3.0

Explanation: This ensures all data is preserved, filling missing values with NaN.

Key Takeaways

join() merges DataFrames based on their index.
Use how="left", "right", "inner", or "outer" to specify join types.
It is best used when DataFrames share the same index structure in your Python projects.

Practice Exercise

Here's a simple challenge, open up your Python editor and try to create two DataFrames with different indexes and perform an outer join:

df1 = pd.DataFrame({"X": [10, 20]}, index=["a", "b"])
df2 = pd.DataFrame({"Y": [30, 40]}, index=["b", "c"])

df_result = df1.join(df2, how="outer")
print(df_result)

Wrapping Up

Pandas join() is a powerful method for merging DataFrames efficiently when working with indexed data. Understanding how to use it correctly helps streamline data analysis tasks. Happy coding!

Pandas join() function | Docs With Examples

Importing Pandas

Creating Sample DataFrames

Using join() to Merge DataFrames

Specifying Join Types

1. Left Join (Default)

2. Right Join

3. Inner Join

4. Outer Join

Joining on Different Indexes

Key Takeaways

Practice Exercise

Wrapping Up

Learn More

Using `join()` to Merge DataFrames