Creating Data for Geodata Engineering and Artificial Intelligence Modeling

Kalpa can prepare datasets for geodata engineering, geostatistics, and machine learning. It samples information from the loaded raster and vector layers into a structured, machine-trainable dataset tailored to your modeling task. This chapter explains how to define the sampling area, choose the layers and attributes to sample, and generate and save the resulting dataset.

Overview

When several raster and vector layers are loaded, Kalpa can extract their values and attributes for further analysis. It creates a grid of points over a defined area of interest (AOI) and samples raster values, vector attributes, or both at each point into a new dataset. The result is saved as a GeoPackage vector file (.gpkg) that contains:

Raster Values: Values sampled directly at each grid point.
Vector Attributes: Selected columns from the vector layers, together with the distance from each grid point to the nearest vector geometry.
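Sampling a raster value "directly at a grid point" can be pictured as a nearest-cell lookup: the point's map coordinates are converted to a row and column, and that cell's value is read. The sketch below is an illustration only, not Kalpa's implementation; it assumes a north-up raster georeferenced by a GDAL-style six-element geotransform, and the function `sample_raster` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# GDAL-style geotransform (assumed): (origin_x, pixel_width, row_rotation,
#                                     origin_y, column_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def sample_raster(
    band: Sequence[Sequence[float]],
    transform: GeoTransform,
    points: Sequence[Tuple[float, float]],
    nodata: Optional[float] = None,
) -> List[Optional[float]]:
    """Hypothetical helper: return the value of the cell containing each (x, y) point.

    Assumes a north-up raster (rotation terms are zero). Points outside the
    raster extent, or on a nodata cell, yield None.
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = transform
    n_rows, n_cols = len(band), len(band[0])
    values: List[Optional[float]] = []
    for x, y in points:
        # Map coordinates -> integer column/row indices (pixel_h is negative
        # for north-up rasters, so row numbers increase southwards).
        col = int((x - origin_x) // pixel_w)
        row = int((y - origin_y) // pixel_h)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            value = band[row][col]
            values.append(None if value == nodata else value)
        else:
            values.append(None)
    return values


if __name__ == "__main__":
    # Toy 2 x 3 raster with 10-unit cells whose top-left corner is at (0, 20).
    band = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    transform = (0.0, 10.0, 0.0, 20.0, 0.0, -10.0)
    print(sample_raster(band, transform, [(5.0, 15.0), (25.0, 5.0), (35.0, 5.0)]))
    # prints [1.0, 6.0, None]
```

Nearest-cell lookup keeps the original cell values unchanged; a tool could instead interpolate (for example bilinearly) between neighboring cells. The `value == nodata` check would not catch a NaN nodata value, which would need `math.isnan`.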

For example, if you select a vector column named Fault_Age, the new layer will contain:

A column named Fault_Age, which stores the sampled attribute value.
A column named Fault_Age_dist, which stores the distance from the grid point to the nearest vector geometry.
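The two columns can be sketched as a nearest-feature search: for each grid point, find the closest vector geometry, copy its attribute, and record the distance to it. This sketch assumes the attribute is taken from the nearest feature, that distances are Euclidean in the layer's projected CRS units, and that features are simple polylines stored as plain dictionaries. `sample_vector_attribute` and the helper functions are hypothetical, not part of Kalpa.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping the projection to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polyline_distance(p: Point, line: Sequence[Point]) -> float:
    """Distance from p to the closest part of a polyline (or to a single point)."""
    if len(line) == 1:
        return math.hypot(p[0] - line[0][0], p[1] - line[0][1])
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def sample_vector_attribute(
    points: Sequence[Point], features: Sequence[Dict[str, Any]], column: str
) -> List[Dict[str, Any]]:
    """Hypothetical helper: for each grid point, copy `column` from the nearest
    feature and store the distance to that feature as `<column>_dist`."""
    if not features:
        raise ValueError("no vector features to sample")
    rows = []
    for p in points:
        # Brute-force nearest-feature search; adequate for small layers.
        dist, nearest = min(
            ((polyline_distance(p, f["coords"]), f) for f in features),
            key=lambda pair: pair[0],
        )
        rows.append({column: nearest[column], f"{column}_dist": dist})
    return rows


if __name__ == "__main__":
    # Two made-up fault traces with made-up ages.
    faults = [
        {"Fault_Age": 120.0, "coords": [(0.0, 0.0), (10.0, 0.0)]},
        {"Fault_Age": 45.0, "coords": [(0.0, 10.0), (10.0, 10.0)]},
    ]
    for row in sample_vector_attribute([(5.0, 2.0), (5.0, 9.0)], faults, "Fault_Age"):
        print(row)
    # {'Fault_Age': 120.0, 'Fault_Age_dist': 2.0}
    # {'Fault_Age': 45.0, 'Fault_Age_dist': 1.0}
```

The search compares every grid point with every segment, so its cost grows with both counts; for large layers a spatial index (such as an R-tree) would normally be used to find candidate features first.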

Grid Types for Data Sampling

Kalpa supports two gridding methods for placing the sample points:

Random Grid
- Description: Randomly distributes points across the AOI.
- Applications:
  - Suitable for creating unbiased training datasets for machine learning models.
  - Can reduce spatial autocorrelation in the training data, which helps models generalize.
- Benefits:
  - Provides diverse sampling across the area.
  - Reduces overrepresentation of specific regions or patterns.

Regular Grid
- Description: Creates a grid with uniform spacing set by a specified resolution.
- Applications:
  - Ideal for geodata engineering tasks such as image processing or geophysical filtering (e.g., upward and downward continuation).
  - Useful for spatial modeling and interpolation.
- Benefits:
  - Ensures consistent coverage across the AOI.
  - Works naturally with raster-based algorithms, which expect regularly spaced samples.

For regular grids, the X and Y resolutions are identical, ensuring a uniform grid layout.
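The two gridding methods can be sketched over a rectangular AOI as follows. This is an illustration under stated assumptions, not Kalpa's code: the AOI is given as a bounding box `(min_x, min_y, max_x, max_y)` in projected units, regular-grid points start at the lower-left corner and include the far edges, and the names `random_grid` and `regular_grid` are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Point = Tuple[float, float]


def random_grid(bbox: BBox, n_points: int, seed: Optional[int] = None) -> List[Point]:
    """Draw n_points uniformly at random inside the bounding box."""
    min_x, min_y, max_x, max_y = bbox
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [(rng.uniform(min_x, max_x), rng.uniform(min_y, max_y)) for _ in range(n_points)]


def regular_grid(bbox: BBox, resolution: float) -> List[Point]:
    """Place points every `resolution` units, starting at the lower-left corner.

    The same resolution is used for X and Y, so the grid cells are square.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    min_x, min_y, max_x, max_y = bbox
    # The small tolerance keeps the far edge when the extent is an exact
    # multiple of the resolution but the division rounds slightly down.
    n_cols = math.floor((max_x - min_x) / resolution + 1e-9) + 1
    n_rows = math.floor((max_y - min_y) / resolution + 1e-9) + 1
    return [
        (min_x + c * resolution, min_y + r * resolution)
        for r in range(n_rows)
        for c in range(n_cols)
    ]
```

For example, `len(regular_grid((0.0, 0.0, 100.0, 50.0), 10.0))` is 66 (11 columns by 6 rows), while `random_grid` returns exactly the number of points requested. Seeding the random generator lets the same training sample be regenerated when comparing models.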

Step-by-Step Guide: Sampling Data

1. Accessing the Sampling Tool
   - Navigate to Data Processing > Data Sampling to open the data sampling interface.
2. Defining the Area of Interest (AOI)
   - Specify the region where the data will be sampled. You can define the AOI in two ways:
     - Bounding Box: Use the Bounding Box Utility to create a rectangular AOI from an existing raster or vector layer. Go to Vector > Bounding Box, select the layer, and save the bounding box layer.
     - Vector File: Use a vector file with complex polygon or multipolygon geometries, or point geometries, as the AOI.
3. Selecting Data Layers
   - Raster Layers: Select one or more loaded raster layers whose values will be sampled at the grid points.
   - Vector Layers: Select vector layers; a dropdown with checkboxes appears, where you can choose the specific columns (attributes) to sample.
4. Choosing a Gridding Method
   - Select one of the two available gridding methods:
     - Random Grid: Specify the number of points to generate.
     - Regular Grid: Set the resolution of the grid.
5. Creating the Grid and Sampling Data
   1. Click Create to generate the grid and begin sampling data.
   2. Wait for the process to complete. A progress bar or notification indicates the status.
6. Saving the Sampled Data
   - When sampling finishes, a save window appears. Specify the file name and save the output as a vector file (.gpkg).
   - The vector file is added as a new layer in the Layer Window.
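With either AOI option above, a natural way to picture the result is to build the grid over the AOI's bounding box and then discard points that fall outside the AOI polygons. The sketch below assumes polygons are given as lists of rings (exterior ring first, then any holes) and uses an even-odd ray-casting test; it is an illustration, not Kalpa's documented behavior, and the names `bounding_box`, `in_multipolygon`, and `clip_to_aoi` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]
Polygon = Sequence[Ring]          # exterior ring followed by any holes
MultiPolygon = Sequence[Polygon]


def bounding_box(coords: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Rectangular AOI (min_x, min_y, max_x, max_y) enclosing all given coordinates."""
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)


def _crossings(p: Point, ring: Ring) -> int:
    """Count the ring edges crossed by a ray cast from p towards +X."""
    x, y = p
    count = 0
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                count += 1
    return count


def in_multipolygon(p: Point, multipolygon: MultiPolygon) -> bool:
    """Even-odd rule over all rings, so points inside holes count as outside."""
    return any(
        sum(_crossings(p, ring) for ring in polygon) % 2 == 1
        for polygon in multipolygon
    )


def clip_to_aoi(points: Sequence[Point], aoi: MultiPolygon) -> List[Point]:
    """Keep only the grid points that fall inside the AOI polygons."""
    return [p for p in points if in_multipolygon(p, aoi)]
```

A single polygon is passed as a one-element multipolygon. Points lying exactly on a polygon boundary may be classified either way by ray casting; a production implementation would typically rely on a geometry library with well-defined boundary rules.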

Tips for Effective Data Sampling

For large datasets, restrict the AOI to the region you actually need; fewer grid points mean less computation time and lower resource use.
When working with machine learning models, using a Random Grid can help reduce sampling bias and improve model performance.
For spatially dense geodata engineering tasks, use a Regular Grid with a resolution that matches the scale of your analysis.
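When choosing a regular-grid resolution, it can help to estimate the number of points before sampling. Assuming points are placed every r units starting at one corner and including both far edges (a hypothetical layout; Kalpa's exact placement is not documented here), a W × H AOI holds (⌊W/r⌋ + 1)(⌊H/r⌋ + 1) points. The helper name `regular_grid_size` is illustrative.

```python
import math


def regular_grid_size(width: float, height: float, resolution: float) -> int:
    """Estimated point count of a regular grid with square cells of side `resolution`."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    # Small tolerance so exact multiples are not lost to floating-point rounding.
    n_cols = math.floor(width / resolution + 1e-9) + 1
    n_rows = math.floor(height / resolution + 1e-9) + 1
    return n_cols * n_rows
```

For a 10 km × 10 km AOI in a metric CRS, `regular_grid_size(10_000, 10_000, 100)` gives 10,201 points, while a 50 m resolution gives 40,401. Halving the resolution roughly quadruples the number of points to sample.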
