Filtering Data#
In geospatial analysis, vector datasets often need to be filtered down to the features that meet specific conditions. Kalpa supports this with filter conditions written as Python expressions in a Pandas style. This chapter demonstrates how to filter vector data using conditions on one or more columns, so you can create datasets tailored to your analysis.
Vector Filtering#
The VectorLayerFiltering function allows you to filter a vector layer or a file containing vector data (.gpkg/.shp) using custom conditions written in Python’s Pandas-style operations.
Function Overview#
The VectorLayerFiltering function applies a condition to filter rows in a vector dataset and returns the filtered result.
Key Arguments:
A file path to a vector dataset (e.g., .gpkg, .shp) or a layer selection.
filter_condition: A Python condition for filtering rows. The condition uses the format: row['column_name'] <condition> value
You can combine multiple conditions using logical operators like and, or, and not.
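This page does not show how Kalpa evaluates the condition string internally. As a minimal sketch, assuming the condition is evaluated once per row with the current row bound to the name row and that matching rows are kept, the behavior can be imitated in plain Python. The helper filter_rows and the sample records are hypothetical and are not part of Kalpa.

```python
# Hypothetical stand-in for row-wise condition evaluation; not Kalpa's implementation.
def filter_rows(rows, filter_condition):
    """Return the rows for which the condition string evaluates to a truthy value."""
    code = compile(filter_condition, "<filter_condition>", "eval")
    kept = []
    for row in rows:
        # Expose only `row` and the built-in `len` to the expression.
        if eval(code, {"__builtins__": {"len": len}}, {"row": row}):
            kept.append(row)
    return kept


# Made-up records standing in for the attribute table of a vector layer.
rows = [
    {"City": "A", "Population": 800},
    {"City": "B", "Population": 1200},
    {"City": "C", "Population": 1500},
]

big = filter_rows(rows, "row['Population'] > 1000")
big_not_c = filter_rows(
    rows, "row['Population'] > 1000 and not row['City'].startswith('C')"
)
print([r["City"] for r in big])
print([r["City"] for r in big_not_c])
```

Because eval runs arbitrary Python, a sketch like this should only ever be given condition strings from a trusted source.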
Examples#
Single-Column Based Filtering#
Scenario: Filter all rows where the Population column is greater than 1,000.
Condition String:
row['Population'] > 1000
Multi-Column Based Filtering#
Scenario: Filter rows where the Population is greater than 1,000 and the City starts with the letter ‘C’.
Condition String:
row['Population'] > 1000 and row['City'].startswith('C')
Filtering Using Equality Conditions#
Scenario: Filter rows where the City column is equal to ‘B’.
Condition String:
row['City'] == 'B'
Combining Conditions with OR#
Scenario: Filter rows where the Population is less than 1,000 or the City starts with ‘A’.
Condition String:
row['Population'] < 1000 or row['City'].startswith('A')
Filtering by Distance (Geospatial Attributes)#
Scenario: Filter rows where the distance to a fault line is less than 5 km.
Condition String:
row['Fault_Dist'] < 5
Filtering Rows with Numerical Ranges#
Scenario: Filter rows where Population is between 500 and 1,500 (inclusive).
Condition String:
500 <= row['Population'] <= 1500
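Python evaluates a chained comparison such as 500 <= x <= 1500 as 500 <= x and x <= 1500, so both bounds are inclusive. A short sketch on plain values illustrates this; the helper in_range and the sample numbers are illustrative only.

```python
def in_range(value, low=500, high=1500):
    # Chained comparison: equivalent to (low <= value) and (value <= high).
    return low <= value <= high


populations = [499, 500, 1000, 1500, 1501]
selected = [p for p in populations if in_range(p)]
print(selected)
```

This form relies on the compared value being a single number. Applied to a whole pandas column instead of one row's value, a chained comparison raises a ValueError.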
Filtering Rows Based on String Patterns#
Scenario: Filter rows where the City name contains the letter ‘a’ (case-insensitive).
Condition String:
'a' in row['City'].lower()
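When a condition is evaluated per row, as the other examples in this chapter imply, row['City'] is a plain Python string rather than a pandas column, so ordinary string methods apply. The following is a minimal sketch of a case-insensitive containment check under that assumption; the helper contains_letter and the city names are made up.

```python
cities = ["Austin", "Boston", "CAIRO", "Oslo"]


def contains_letter(name, letter="a"):
    # Lowercase both sides so the match ignores case.
    return letter.lower() in name.lower()


matches = [c for c in cities if contains_letter(c)]
print(matches)
```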
Filtering Rows with Missing or Null Values#
Scenario: Filter rows where the geometry column is None (missing).
Condition String:
row['geometry'] is None
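A missing value is not always None: numeric columns loaded through pandas commonly store missing entries as NaN, which an is None test does not catch. The hypothetical helper is_missing below is a sketch that treats both as missing; it is not part of Kalpa.

```python
import math


def is_missing(value):
    # None covers absent objects (for example an empty geometry slot);
    # NaN covers missing numeric entries.
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


values = [None, float("nan"), 0.0, "B", 1200]
flags = [is_missing(v) for v in values]
print(flags)
```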
Filtering Rows Based on Multiple Conditions (Advanced)#
Scenario: Filter rows where Population is greater than 1,000 and the City does not start with ‘A’.
Condition String:
row['Population'] > 1000 and not row['City'].startswith('A')
Filtering Rows Using Custom Functions#
Scenario: Use a function call (here the built-in len) to filter rows where the City name is longer than 1 character.
Condition String:
len(row['City']) > 1
Filtering Geospatial Data by Attribute and Proximity#
Scenario: Filter rows where faults are older than 50 million years and within 10 km of the grid points.
Condition String:
row['Fault_Age'] > 50 and row['Fault_Dist'] < 10
Filtering Using Logical OR Conditions#
Scenario: Filter rows where the City is either ‘A’ or ‘C’.
Condition String:
row['City'] in ['A', 'C']
Filtering by Area or Length Attributes#
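For vector datasets with polygons or lines, you can filter by geometric properties such as area or length. These values are computed in the units of the layer's coordinate reference system, so the examples here assume a projected CRS with meter units.
Scenario: Filter polygons where the area is greater than 1,000 square meters.
Condition String: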
row['geometry'].area > 1000
Scenario: Filter line features where the length is less than 500 meters.
Condition String:
row['geometry'].length < 500
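Area and length come out in the units of the coordinates, so thresholds such as 1,000 and 500 only mean square meters and meters when the coordinates themselves are in meters. As a rough illustration, the planar area of a simple polygon can be computed from its vertices with the shoelace formula. The helper polygon_area is hypothetical and only stands in for what a geometry's area property reports for a simple polygon without holes.

```python
def polygon_area(vertices):
    """Planar area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# A 40 m x 30 m rectangle in a projected (meter-based) coordinate system.
rectangle = [(0, 0), (40, 0), (40, 30), (0, 30)]
area = polygon_area(rectangle)
print(area, area > 1000)
```

The same rectangle expressed in degrees of longitude and latitude would yield an area in square degrees, which cannot be compared with a threshold in square meters.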
Tips for Writing Filtering Conditions#
Use Logical Operators: Combine conditions with and, or, or not to create complex queries.
Check Data Types: Ensure your column data types match the condition. Numeric values should not be compared to strings.
Handle Missing Values: Use a per-row check such as row['column'] is not None to filter out rows with missing data.
Validate Columns: Ensure the columns used in filtering exist in the dataset.
Optimize Conditions: Use simple, efficient conditions to avoid unnecessary computational overhead.
Test Conditions: Before applying complex filters, test them on a small subset of data to ensure correctness.
Export Results: Save filtered datasets for further analysis or visualization.
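The Validate Columns tip can be turned into a quick pre-check before a filter is run. The sketch below assumes that conditions reference columns only in the form row['name'] or row["name"]; the helper names referenced_columns and missing_columns are hypothetical and not part of Kalpa.

```python
import re

# Matches row['name'] or row["name"], capturing the column name.
COLUMN_REF = re.compile(r"""row\[\s*(['"])(.+?)\1\s*\]""")


def referenced_columns(condition):
    """Column names that appear as row['name'] in the condition string."""
    return {match.group(2) for match in COLUMN_REF.finditer(condition)}


def missing_columns(condition, available):
    """Referenced columns absent from the dataset, sorted for stable output."""
    return sorted(referenced_columns(condition) - set(available))


condition = "row['Fault_Age'] > 50 and row['Fault_Dist'] < 10"
available = ["Fault_Dist", "geometry"]
print(missing_columns(condition, available))
```

An empty result means every referenced column exists. A non-empty result names the columns to fix before the condition is applied.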