Text Manipulation Methods in pandas

pandas
dataframe
text-manipulation
Comprehensive guide to string methods in pandas for text data manipulation, including case conversion, searching, regex, and splitting.
Author

Mohammed Adil Siraju

Published

September 23, 2025

This notebook explores pandas string methods (accessed via .str) for manipulating text data in DataFrames. It covers case conversion, searching, regex, replacement, and splitting.

Introduction

pandas provides vectorized string operations through the .str accessor. Each method is applied element-wise to a Series (or Index) of strings and returns a new object of the same length, so no explicit loop is needed.

import pandas as pd

Sample Data

We’ll use a simple DataFrame with text data to demonstrate string methods.

data = {
    'TextData': ['Hello','World','Python', 'Pandas', 'Data Science']
}

df = pd.DataFrame(data)
df
       TextData
0         Hello
1         World
2        Python
3        Pandas
4  Data Science

Case Conversion

Convert text to lowercase or uppercase using .str.lower() and .str.upper().

df['LowerCase'] = df['TextData'].str.lower()
df
       TextData     LowerCase
0         Hello         hello
1         World         world
2        Python        python
3        Pandas        pandas
4  Data Science  data science

df['UpperCase'] = df['TextData'].str.upper()
df
       TextData     LowerCase     UpperCase
0         Hello         hello         HELLO
1         World         world         WORLD
2        Python        python        PYTHON
3        Pandas        pandas        PANDAS
4  Data Science  data science  DATA SCIENCE
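
Beyond .str.lower() and .str.upper(), the accessor offers a few other case helpers. The sketch below is illustrative only: it rebuilds the sample values as a standalone Series (the names text, lower, title_case, sentence_case, swapped, and is_python are assumptions made for this example, not part of the notebook above).

```python
import pandas as pd

# Same values as the TextData column, rebuilt so the example is self-contained
text = pd.Series(['Hello', 'World', 'Python', 'Pandas', 'Data Science'])
lower = text.str.lower()

# title(): capitalize the first letter of every word
title_case = lower.str.title()

# capitalize(): capitalize only the first character of the whole string
sentence_case = lower.str.capitalize()

# swapcase(): flip upper to lower and lower to upper
swapped = text.str.swapcase()

# casefold(): aggressive lowercasing intended for case-insensitive comparison
is_python = text.str.casefold() == 'PYTHON'.casefold()

print(title_case.tolist())
print(sentence_case.tolist())
print(swapped.tolist())
print(is_python.tolist())
```

The difference between title() and capitalize() only shows up on the multi-word row: 'Data Science' versus 'Data science'.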

Searching in Text

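Check whether each string contains a substring with .str.contains(). Pass case=False for a case-insensitive search; in the example below, 'O' therefore matches the lowercase 'o' in Hello, World, and Python. Note that the pattern is interpreted as a regular expression by default.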

df['Contains'] = df['TextData'].str.contains('O', case=False)
df
       TextData     LowerCase     UpperCase  Contains
0         Hello         hello         HELLO      True
1         World         world         WORLD      True
2        Python        python        PYTHON      True
3        Pandas        pandas        PANDAS     False
4  Data Science  data science  DATA SCIENCE     False
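
The result of .str.contains() is a boolean Series, so it can be used to filter rows. The sketch below, under the assumption of a standalone Series with the same sample values, also shows the related .str.startswith() and .str.endswith() methods and the regex=False option for literal matching. The variable names and the small dotted Series are made up for illustration.

```python
import pandas as pd

text = pd.Series(['Hello', 'World', 'Python', 'Pandas', 'Data Science'])

# Literal, case-insensitive substring test (no regex interpretation)
has_o = text.str.contains('o', case=False, regex=False)

# Prefix and suffix checks always treat their argument literally
starts_with_p = text.str.startswith('P')
ends_with_n = text.str.endswith('n')

# A boolean Series works as a row filter
p_words = text[starts_with_p].tolist()

# Why regex=False matters: '.' is a wildcard when treated as a regex
dotted = pd.Series(['a.b', 'axb'])  # hypothetical values for illustration
as_regex = dotted.str.contains('.')
as_literal = dotted.str.contains('.', regex=False)

print(p_words)
print(as_regex.tolist(), as_literal.tolist())
```

When the search term comes from user input or contains punctuation, regex=False is usually the safer choice.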

Regular Expressions (Regex)

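Use regular expressions with methods like .str.findall(), which returns a list of all matches for each row. Here the pattern is the single character 'o', so each list holds every lowercase 'o' found in that string (and is empty when there is none).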

df['Matches'] = df['TextData'].str.findall('o')
df
       TextData     LowerCase     UpperCase  Contains  Matches
0         Hello         hello         HELLO      True      [o]
1         World         world         WORLD      True      [o]
2        Python        python        PYTHON      True      [o]
3        Pandas        pandas        PANDAS     False       []
4  Data Science  data science  DATA SCIENCE     False       []
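
A single literal character does not show much of what regex can do. As a slightly richer sketch, assuming the same sample values in a standalone Series, the example below uses a character class with .str.findall() and .str.count(), and a capture group with .str.extract(). The patterns and variable names are chosen only for illustration.

```python
import pandas as pd

text = pd.Series(['Hello', 'World', 'Python', 'Pandas', 'Data Science'])

# findall(): every lowercase vowel in each string, as a list per row
vowels = text.str.findall(r'[aeiou]')

# count(): number of matches per row instead of the matches themselves
vowel_count = text.str.count(r'[aeiou]')

# extract(): first match of the capture group; expand=False returns a Series
first_upper = text.str.extract(r'([A-Z])', expand=False)

print(vowels.tolist())
print(vowel_count.tolist())
print(first_upper.tolist())
```

.str.findall() keeps all matches, while .str.extract() keeps only the first match of each capture group, which is often more convenient when one value per row is expected.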

Replacement and Splitting

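Replace substrings with .str.replace() and split strings with .str.split(). In pandas 2.0 and later, .str.replace() treats the pattern as a literal string by default; pass regex=True to use a regular expression. .str.split() returns a list of parts for each row.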

df['Replaced'] = df['TextData'].str.replace('o', 'x')
df
       TextData     LowerCase     UpperCase  Contains  Matches      Replaced
0         Hello         hello         HELLO      True      [o]         Hellx
1         World         world         WORLD      True      [o]         Wxrld
2        Python        python        PYTHON      True      [o]        Pythxn
3        Pandas        pandas        PANDAS     False       []        Pandas
4  Data Science  data science  DATA SCIENCE     False       []  Data Science

df['Split'] = df['TextData'].str.split(' ')
df
       TextData     LowerCase     UpperCase  Contains  Matches      Replaced            Split
0         Hello         hello         HELLO      True      [o]         Hellx          [Hello]
1         World         world         WORLD      True      [o]         Wxrld          [World]
2        Python        python        PYTHON      True      [o]        Pythxn         [Python]
3        Pandas        pandas        PANDAS     False       []        Pandas         [Pandas]
4  Data Science  data science  DATA SCIENCE     False       []  Data Science  [Data, Science]
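
Two options are worth knowing here: regex on .str.replace() and expand on .str.split(). The following is a minimal sketch on a standalone copy of the sample values; the names literal, no_vowels, and parts are assumptions made for this example.

```python
import pandas as pd

text = pd.Series(['Hello', 'World', 'Python', 'Pandas', 'Data Science'])

# Being explicit about regex avoids relying on the version-dependent default
literal = text.str.replace('o', 'x', regex=False)

# With regex=True the pattern is a regular expression: drop lowercase vowels
no_vowels = text.str.replace(r'[aeiou]', '', regex=True)

# expand=True returns a DataFrame with one column per part instead of lists;
# rows with fewer parts are padded with missing values
parts = text.str.split(' ', expand=True)

print(literal.tolist())
print(no_vowels.tolist())
print(parts)
```

With expand=True, only the 'Data Science' row has a value in the second column; the single-word rows are missing there.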

Best Practices

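  • Handle missing values: .str methods propagate NaN instead of raising an error, so decide how missing entries should be treated. For example, .str.contains() accepts na=False so the result can be used directly as a boolean mask.
  • For complex regex, test patterns separately (for example with Python's re module) before applying them to a whole column.
  • Prefer .str methods to explicit Python loops: they are more concise, treat missing values consistently, and are often faster, though it is worth measuring on your own data.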
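
The point about missing values can be sketched as follows. The small Series with a None entry is hypothetical and exists only to show the behavior; the variable names are likewise assumptions.

```python
import pandas as pd

# Hypothetical data with one missing entry
text = pd.Series(['Hello', None, 'Python'])

# Missing values pass through .str methods as missing rather than raising
upper = text.str.upper()

# na=False makes missing rows count as "no match", giving a clean boolean mask
mask = text.str.contains('o', na=False)
matched = text[mask].tolist()

print(upper.isna().tolist())
print(mask.tolist())
print(matched)
```

Without na=False, the missing row would stay missing in the result of .str.contains(), which generally cannot be used directly for boolean indexing.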

Summary

This notebook covered essential pandas string methods for text manipulation. Experiment with real datasets to master these techniques!