GeoParquet

Purpose

Describe how GeoParquet supports scalable storage, analytics, and interchange of vector data in modern geospatial data systems.

Outline

  • Relationship between Parquet columnar storage and geospatial vector workloads
  • Geometry encoding, coordinate reference systems, and metadata expectations
  • Partitioning and indexing strategies for spatial and temporal queries
  • Integration patterns with data lakes, query engines, and ML feature generation
  • Tradeoffs compared with GeoJSON, Shapefiles, spatial databases, and tiled vector formats

Later Examples

  • Designing a partitioned GeoParquet dataset for repeated analysis
  • Reading only required columns and spatial subsets
  • Preparing vector features for a geospatial ML pipeline