Pandas: Advanced Techniques for Conditional Filtering and Querying

pandas
dataframe
filtering
querying
data-manipulation
Master advanced pandas techniques for conditional filtering, boolean indexing, query methods, sorting, and ranking data. Learn efficient data manipulation strategies with practical examples.
Author

Mohammed Adil Siraju

Published

September 18, 2025

📋 What I Learnt

  • Conditional Filtering: Using boolean indexing to filter data based on conditions
  • Query Method: Writing SQL-like queries on pandas DataFrames
  • Sorting Techniques: Multi-column sorting and custom sort orders
  • Ranking Methods: Different ranking strategies for data analysis
  • Best Practices: Efficient data manipulation patterns

🎯 Prerequisites

  • Basic pandas knowledge
  • Understanding of Python data structures
  • Familiarity with DataFrame operations

Let’s dive into the advanced filtering techniques!

📊 Setup and Data Loading

First, let’s import pandas and load our sample dataset to work with.

import pandas as pd
df = pd.read_csv('example.csv')
df
Name Age City
0 Adil 23 Matannur
1 Aman 19 Vellore
2 Ziya 15 Tly
3 Zahra 9 Knr

🔍 Conditional Filtering with Boolean Indexing

Boolean indexing allows you to filter DataFrames based on conditions. This is one of the most powerful features of pandas for data manipulation.

Basic Boolean Filtering

Let’s start with simple conditional filtering using comparison operators.

Simple Condition: Filter by Age

Filter rows where Age is less than 20:

df[df['Age'] < 20]
Name Age City
1 Aman 19 Vellore
2 Ziya 15 Tly
3 Zahra 9 Knr

OR Condition: Multiple Criteria (Either condition true)

Filter rows where City is ‘Vellore’ OR Age is less than 20:

df[(df['City']=='Vellore') | (df['Age']<20)]
Name Age City
1 Aman 19 Vellore
2 Ziya 15 Tly
3 Zahra 9 Knr

AND Condition: Multiple Criteria (All conditions true)

Filter rows where City is ‘Vellore’ AND Age is less than 20:

df[(df['City']=='Vellore') & (df['Age']<20)]
Name Age City
1 Aman 19 Vellore

🔎 Query Method: SQL-like Filtering

The .query() method provides a more readable way to filter DataFrames using SQL-like syntax. It’s often cleaner than boolean indexing for complex conditions.

Basic Query Syntax

Query rows where Age is greater than 10:

df.query('Age > 10')
Name Age City
0 Adil 23 Matannur
1 Aman 19 Vellore
2 Ziya 15 Tly

Query with AND Condition

Query rows where Age > 10 AND City equals ‘Matannur’:

df.query("Age > 10 and City == 'Matannur'")
Name Age City
0 Adil 23 Matannur

Query with OR Condition

Query rows where Age < 18 OR City equals ‘Matannur’:

df.query("Age < 18 or City == 'Matannur'")
Name Age City
0 Adil 23 Matannur
2 Ziya 15 Tly
3 Zahra 9 Knr

Query with Mathematical Operations

Query with mathematical expressions - rows where Age * 2 > 30:

df.query("Age * 2 > 30")
Name Age City
0 Adil 23 Matannur
1 Aman 19 Vellore
df
Name Age City
0 Adil 23 Matannur
1 Aman 19 Vellore
2 Ziya 15 Tly
3 Zahra 9 Knr

🔄 Sorting Data

Sorting is essential for data analysis. Pandas provides flexible sorting capabilities for both single and multiple columns.

Basic Sorting by Single Column

Sort by Age in descending order:

df.sort_values('Age',ascending=False)
Name Age City
0 Adil 23 Matannur
1 Aman 19 Vellore
2 Ziya 15 Tly
3 Zahra 9 Knr

Multi-Column Sorting

Sort by Name first (ascending), then by Age (ascending) for ties:

df.sort_values(['Name', 'Age'],ascending=True)
Name Age City
0 Adil 23 Matannur
1 Aman 19 Vellore
3 Zahra 9 Knr
2 Ziya 15 Tly
df
Name Age City
0 Adil 23 Matannur
1 Aman 19 Vellore
2 Ziya 15 Tly
3 Zahra 9 Knr

🏆 Ranking Data

Ranking assigns ordinal numbers to data points. This is useful for competitions, percentiles, and comparative analysis.

Default Ranking (Average method)

Rank ages - ties get average rank:

df['Age'].rank()
0    4.0
1    3.0
2    2.0
3    1.0
Name: Age, dtype: float64

Ranking with ‘min’ Method

Rank ages - ties get minimum rank:

df['Age'].rank(method='min')
0    4.0
1    3.0
2    2.0
3    1.0
Name: Age, dtype: float64

🎉 Summary

You’ve now learned advanced pandas techniques for:

  • Conditional Filtering: Using boolean indexing with & (AND) and | (OR) operators
  • Query Method: Writing SQL-like queries with .query() for cleaner, more readable code
  • Sorting: Single and multi-column sorting with sort_values()
  • Ranking: Different ranking methods for comparative analysis

Key Takeaways

  1. Boolean Indexing: df[condition] is powerful but can get complex with multiple conditions
  2. Query Method: df.query('condition') is more readable for complex filtering
  3. Sorting: Use sort_values() with lists for multi-column sorting
  4. Ranking: Choose ranking method based on your analysis needs (average, min, max, etc.)

Next Steps

  • Practice with real datasets
  • Combine filtering with other pandas operations
  • Explore advanced query features with variables
  • Learn about method chaining for efficient data pipelines

Happy pandas coding! 🐼