Spatial data, which are embedded with geographic references, are important to numerous scientific domains (e.g., ecology, geography, geosciences, spatial sciences, and social sciences, to name just a few) and are also valuable for solving many critical societal problems (e.g., environmental and urban sustainability). In recent years, such data have grown to massive size and significant complexity as increasingly sophisticated location-based sources (e.g., social networks, smartphones, and environmental sensors) have been widely deployed and used. Big spatial data collected from these numerous sources are extensively used to observe natural, human, and social systems at unprecedented scales, providing tremendous opportunities to gain dynamic insight into complex phenomena. However, synthesizing diverse spatial data, a foundational process in many scientific problem-solving practices, has become increasingly difficult and does not scale to the size, complexity, and diversity of today's spatial data.

Therefore, the overarching goal of this project is to establish fundamental and scalable capabilities for spatial data synthesis through integration with cyberGIS (i.e., geographic information systems based on advanced cyberinfrastructure, or CI) and novel cloud computing strategies, enabling cutting-edge data-intensive research and education across multiple scientific communities. Our project will achieve the following specific objectives:

  1. Develop a core set of community-driven and scalable capabilities that meet the requirements of spatial data synthesis in two representative scientific case studies: measuring urban sustainability based on a number of social, environmental, and physical factors and processes, and examining population dynamics at high spatial and temporal resolutions by synthesizing multiple state-of-the-art population data sources with location-based social media data;
  2. Establish a scalable suite of data synthesis capabilities: (1) data integration capabilities that ensure data from different sources can be combined regardless of the format and type in which they were originally produced, and (2) data aggregation capabilities that scale to accommodate varying numbers of data sources, user requests, and processing workloads while providing response times appropriate to the data being handled;
  3. Evaluate and improve these capabilities by engaging the broad cyberGIS community, which spans scientists across the bio, computational, engineering, geo, and social sciences;
  4. Integrate the data synthesis capabilities with the CyberGIS Science Gateway to ensure open and wide access to the capabilities;
  5. Develop novel education and training materials that enable a large number of users to learn the capabilities and build on them to understand the scientific principles of spatial data synthesis.

Architecture for Scalable Spatial Data Synthesis.