import pandas as pdPandas: Advanced Techniques for Conditional Filtering and Querying
📋 What I Learnt
- Conditional Filtering: Using boolean indexing to filter data based on conditions
- Query Method: Writing SQL-like queries on pandas DataFrames
- Sorting Techniques: Multi-column sorting and custom sort orders
- Ranking Methods: Different ranking strategies for data analysis
- Best Practices: Efficient data manipulation patterns
🎯 Prerequisites
- Basic pandas knowledge
- Understanding of Python data structures
- Familiarity with DataFrame operations
Let’s dive into the advanced filtering techniques!
📊 Setup and Data Loading
First, let’s import pandas and load our sample dataset to work with.
df = pd.read_csv('example.csv')
df| Name | Age | City | |
|---|---|---|---|
| 0 | Adil | 23 | Matannur |
| 1 | Aman | 19 | Vellore |
| 2 | Ziya | 15 | Tly |
| 3 | Zahra | 9 | Knr |
🔍 Conditional Filtering with Boolean Indexing
Boolean indexing allows you to filter DataFrames based on conditions. This is one of the most powerful features of pandas for data manipulation.
Basic Boolean Filtering
Let’s start with simple conditional filtering using comparison operators.
Simple Condition: Filter by Age
Filter rows where Age is less than 20:
df[df['Age'] < 20]| Name | Age | City | |
|---|---|---|---|
| 1 | Aman | 19 | Vellore |
| 2 | Ziya | 15 | Tly |
| 3 | Zahra | 9 | Knr |
OR Condition: Multiple Criteria (Either condition true)
Filter rows where City is ‘Vellore’ OR Age is less than 20:
df[(df['City']=='Vellore') | (df['Age']<20)]| Name | Age | City | |
|---|---|---|---|
| 1 | Aman | 19 | Vellore |
| 2 | Ziya | 15 | Tly |
| 3 | Zahra | 9 | Knr |
AND Condition: Multiple Criteria (All conditions true)
Filter rows where City is ‘Vellore’ AND Age is less than 20:
df[(df['City']=='Vellore') & (df['Age']<20)]| Name | Age | City | |
|---|---|---|---|
| 1 | Aman | 19 | Vellore |
🔎 Query Method: SQL-like Filtering
The .query() method provides a more readable way to filter DataFrames using SQL-like syntax. It’s often cleaner than boolean indexing for complex conditions.
Basic Query Syntax
Query rows where Age is greater than 10:
df.query('Age > 10')| Name | Age | City | |
|---|---|---|---|
| 0 | Adil | 23 | Matannur |
| 1 | Aman | 19 | Vellore |
| 2 | Ziya | 15 | Tly |
Query with AND Condition
Query rows where Age > 10 AND City equals ‘Matannur’:
df.query("Age > 10 and City == 'Matannur'")| Name | Age | City | |
|---|---|---|---|
| 0 | Adil | 23 | Matannur |
Query with OR Condition
Query rows where Age < 18 OR City equals ‘Matannur’:
df.query("Age < 18 or City == 'Matannur'")| Name | Age | City | |
|---|---|---|---|
| 0 | Adil | 23 | Matannur |
| 2 | Ziya | 15 | Tly |
| 3 | Zahra | 9 | Knr |
Query with Mathematical Operations
Query with mathematical expressions - rows where Age * 2 > 30:
df.query("Age * 2 > 30")| Name | Age | City | |
|---|---|---|---|
| 0 | Adil | 23 | Matannur |
| 1 | Aman | 19 | Vellore |
df| Name | Age | City | |
|---|---|---|---|
| 0 | Adil | 23 | Matannur |
| 1 | Aman | 19 | Vellore |
| 2 | Ziya | 15 | Tly |
| 3 | Zahra | 9 | Knr |
🔄 Sorting Data
Sorting is essential for data analysis. Pandas provides flexible sorting capabilities for both single and multiple columns.
Basic Sorting by Single Column
Sort by Age in descending order:
df.sort_values('Age',ascending=False)| Name | Age | City | |
|---|---|---|---|
| 0 | Adil | 23 | Matannur |
| 1 | Aman | 19 | Vellore |
| 2 | Ziya | 15 | Tly |
| 3 | Zahra | 9 | Knr |
Multi-Column Sorting
Sort by Name first (ascending), then by Age (ascending) for ties:
df.sort_values(['Name', 'Age'],ascending=True)| Name | Age | City | |
|---|---|---|---|
| 0 | Adil | 23 | Matannur |
| 1 | Aman | 19 | Vellore |
| 3 | Zahra | 9 | Knr |
| 2 | Ziya | 15 | Tly |
df| Name | Age | City | |
|---|---|---|---|
| 0 | Adil | 23 | Matannur |
| 1 | Aman | 19 | Vellore |
| 2 | Ziya | 15 | Tly |
| 3 | Zahra | 9 | Knr |
🏆 Ranking Data
Ranking assigns ordinal numbers to data points. This is useful for competitions, percentiles, and comparative analysis.
Default Ranking (Average method)
Rank ages - ties get average rank:
df['Age'].rank()0 4.0
1 3.0
2 2.0
3 1.0
Name: Age, dtype: float64
Ranking with ‘min’ Method
Rank ages - ties get minimum rank:
df['Age'].rank(method='min')0 4.0
1 3.0
2 2.0
3 1.0
Name: Age, dtype: float64
🎉 Summary
You’ve now learned advanced pandas techniques for:
- Conditional Filtering: Using boolean indexing with
&(AND) and|(OR) operators - Query Method: Writing SQL-like queries with
.query()for cleaner, more readable code - Sorting: Single and multi-column sorting with
sort_values() - Ranking: Different ranking methods for comparative analysis
Key Takeaways
- Boolean Indexing:
df[condition]is powerful but can get complex with multiple conditions - Query Method:
df.query('condition')is more readable for complex filtering - Sorting: Use
sort_values()with lists for multi-column sorting - Ranking: Choose ranking method based on your analysis needs (average, min, max, etc.)
Next Steps
- Practice with real datasets
- Combine filtering with other pandas operations
- Explore advanced query features with variables
- Learn about method chaining for efficient data pipelines
Happy pandas coding! 🐼