Spatial Data Synthesis Capabilities

  Spatial data processing capabilities for processing a wide range of spatial big data streams and types.  
  Spatial data integration capabilities for integrating location-based social media data with authoritative geospatial data sources.  
  Spatial data presentation capabilities for interactively visualizing complex spatial relations and uncertainties associated with geospatial big data.  
  Data retrieval and storage capabilities for efficient and scalable storage of geospatial big data.  

GeoBalance: Workload-aware Management of Spatiotemporal Data

GeoBalance is an adaptive, workload-aware partitioning system for spatiotemporal data that considers both the data distribution and the query workload. It addresses workload skew caused by hotspots, time-varying skew, and load spikes, and it is optimized for high-velocity, multi-user, write-intensive geospatial applications. An evolutionary algorithm was developed for GeoBalance to modify partitions when the load becomes imbalanced. GeoBalance migrates partitions in a rolling fashion so that other services are not disrupted while partitions are changing.
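The idea of evolving partitions toward a balanced load can be illustrated with a minimal sketch. This is not GeoBalance's actual algorithm, which is not specified here; it is a simple accept-if-not-worse evolutionary loop under assumed inputs. The cell identifiers, the per-cell loads (standing in for combined data and query weight), and the function names `imbalance` and `evolve` are all hypothetical.

```python
import random


def imbalance(assignment, loads, n_parts):
    """Ratio of the heaviest partition's load to the mean load (1.0 is perfect balance)."""
    totals = [0.0] * n_parts
    for cell, part in assignment.items():
        totals[part] += loads[cell]
    mean = sum(totals) / n_parts
    return max(totals) / mean if mean > 0 else 1.0


def evolve(assignment, loads, n_parts, generations=200, seed=0):
    """Mutate one cell's partition per generation; keep the child if it is no worse."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best = dict(assignment)
    best_score = imbalance(best, loads, n_parts)
    cells = sorted(best)
    for _ in range(generations):
        child = dict(best)
        cell = rng.choice(cells)
        child[cell] = rng.randrange(n_parts)  # mutation: reassign one cell
        score = imbalance(child, loads, n_parts)
        if score <= best_score:  # selection: never accept a worse layout
            best, best_score = child, score
    return best, best_score


# Hypothetical per-cell loads: two hot cells start on the same partition.
loads = {"c0": 90.0, "c1": 80.0, "c2": 5.0, "c3": 5.0, "c4": 10.0, "c5": 10.0}
initial = {"c0": 0, "c1": 0, "c2": 1, "c3": 1, "c4": 2, "c5": 2}
balanced, score = evolve(initial, loads, n_parts=3)
```

A production system would also have to weigh the cost of moving data between partitions, which this sketch ignores.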


  • Prototyped a cyberGIS-based application for spatial big data synthesis and sharing by leveraging heterogeneous cyberinfrastructure with both cloud and HPC resources
  • Employed modern microservices architecture
  • Lowered barriers to accessing, processing, analyzing, and visualizing large raster datasets
  • Made data available for 18 hydrologic regions (Hydrologic Unit Code level 2) and the 48 conterminous states


  • Synthesized geospatial big data from social media with authoritative land use data
  • Investigated population dynamics and land use changes based on the synthesized data
  • Implemented a cyberGIS workflow using Apache Hadoop to process large volumes of streaming data from Twitter (e.g., 965 million geo-tagged tweets in 2014)
  • Developed a novel distributed point-in-polygon algorithm suitable for large vector datasets
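The point-in-polygon step in the list above can be sketched as follows. This is the standard ray-casting test wrapped in a map/reduce-style count, not the novel distributed algorithm mentioned above, whose details are not given here. The function names, the sample polygons, and the sample points are hypothetical, and the "map" and "reduce" steps run locally rather than on Hadoop.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going in +x from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def map_point(point, polygons):
    """Map step: emit the id of every polygon that contains the point."""
    x, y = point
    return [pid for pid, poly in polygons.items() if point_in_polygon(x, y, poly)]


def count_points(points, polygons):
    """Reduce step: total the emitted polygon ids."""
    counts = Counter()
    for point in points:
        counts.update(map_point(point, polygons))
    return counts


# Hypothetical land use polygons and geo-tagged points.
polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(6, 0), (8, 0), (8, 2), (6, 2)],
}
points = [(1, 1), (2, 3), (5, 5), (7, 1)]
counts = count_points(points, polygons)
```

In a distributed setting the points would be split across workers and each worker would run the map step on its share. For large vector datasets, a spatial index over the polygons would normally replace the linear scan in `map_point`.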

Geospatial Operation Management

Data preparation and handling operations are broadly applicable and are crucial for data sharing and synthesis. To support them, we will develop a capability that lets users register and publish operations relevant to specific types of data, in a form that others can easily integrate into their own workflows.
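One way such a registry might look is sketched below. It is a minimal in-memory illustration, not the planned capability's design. The names `register_operation`, `list_operations`, and `get_operation`, the data type labels, and the two sample operations are all hypothetical.

```python
# Hypothetical registry: data type -> operation name -> callable.
_REGISTRY = {}


def register_operation(data_type, name):
    """Decorator that publishes a function as an operation for one data type."""
    def decorator(func):
        _REGISTRY.setdefault(data_type, {})[name] = func
        return func
    return decorator


def list_operations(data_type):
    """Names of the operations published for a data type, sorted."""
    return sorted(_REGISTRY.get(data_type, {}))


def get_operation(data_type, name):
    """Look up a published operation so it can be called from a workflow."""
    return _REGISTRY[data_type][name]


@register_operation("raster", "rescale")
def rescale(values, factor):
    """Multiply every cell value by a constant factor."""
    return [v * factor for v in values]


@register_operation("vector", "bounding_box")
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


# A workflow step discovers and applies an operation by data type and name.
op = get_operation("vector", "bounding_box")
bbox = op([(1, 5), (3, 2), (2, 4)])
```

Keying the registry by data type lets a workflow builder show only the operations that apply to the dataset in hand. A shared service would additionally need persistence and metadata describing each operation's inputs and outputs.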