Welcome to MRSDS!

The Methane Research Science Data System (MRSDS) organization hosts repositories for the processing, analysis, and visualization of atmospheric methane data at the full range of spatial scales: local, regional, and global. The collection of software providing this functionality is referred to as the Multi-Scale Methane Analytic Framework (M2AF). Please refer to Jacob et al. (2021) for additional background on M2AF software deployed across high-performance computing and cloud computing platforms. The primary visualization component for this work is the Methane Source Finder (MSF) portal, and several of the code repositories implement the MSF front-end, back-end, and data-ingestion components.

The following describes the function of each repository in the MRSDS organization:

In order to run the Methane Source Finder (MSF) web application you will need msf-ui (the web app), msf-be (the back end that serves the APIs providing data to the UI), and msf-static-layers (gridded data layers stored as flat files), plus a database with the point-source data and GeoTIFF plume images (loaded by msf-ingestion).

In order to run the point-source data pipeline you will need msf-flow.

SDAP data ingestion and test Jupyter notebooks are in the sdap_collections and sdap_notebooks repositories.

Science Data Analytics Platform (SDAP)

We have demonstrated the use of the open-source Science Data Analytics Platform (SDAP) to perform basic analytics on our gridded methane data products, like our regional and global inversions. SDAP uses Apache Spark for parallel computations in the "map-reduce" style. In map-reduce computations, one or more "map" functions operate independently on different subsets of the data. Reduction operators combine the distributed map results to produce the final analytics product. The final product is expected to be smaller than the collection of input data files that were used to compute the result. In this way, SDAP performs the computations remotely, close to the data, and eliminates the need for large data file downloads.
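
To make the pattern concrete, here is a minimal, self-contained sketch of a map-reduce computation in PySpark. The "tiles" and the area-mean statistic are illustrative stand-ins; this is not SDAP's actual tile format or analytics code.

```python
# A minimal sketch of the map-reduce pattern described above, written with PySpark.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("mapreduce-sketch").getOrCreate()
sc = spark.sparkContext

# Pretend each "tile" is a small array of methane values covering part of the domain.
tiles = [np.random.rand(10, 10) for _ in range(8)]

def map_tile(tile):
    # Map step: summarize each tile independently (partial sum and count).
    return (tile.sum(), tile.size)

def reduce_partials(a, b):
    # Reduce step: combine the partial results from different tiles.
    return (a[0] + b[0], a[1] + b[1])

total, count = sc.parallelize(tiles).map(map_tile).reduce(reduce_partials)
print("Area mean:", total / count)  # the final product is far smaller than the inputs

spark.stop()
```

Note that each map call sees only its own tile, so the work distributes cleanly across executors; only the small (sum, count) pairs travel over the network for the reduce step.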

SDAP Web Service API

Analytics requests to SDAP are made through a web service API that can be called from a variety of programming languages or from any web browser. Please refer to Jupyter Notebooks for examples of how to call SDAP from Python.
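
As a sketch of what such a request looks like from Python, the snippet below uses the `requests` library to call a `timeSeriesSpark` endpoint. The host URL and dataset ID are placeholders, and endpoint names and parameters can vary across SDAP versions, so treat this as illustrative rather than authoritative.

```python
# A minimal sketch of calling an SDAP web service endpoint from Python.
import requests

SDAP_HOST = "https://sdap.example.org/nexus"  # hypothetical deployment URL

params = {
    "ds": "methane_inversion_global",  # hypothetical dataset ID assigned at ingest
    "minLon": -125, "minLat": 30, "maxLon": -110, "maxLat": 45,
    "startTime": "2020-01-01T00:00:00Z",
    "endTime": "2020-12-31T23:59:59Z",
}

resp = requests.get(f"{SDAP_HOST}/timeSeriesSpark", params=params, timeout=120)
resp.raise_for_status()
print(resp.json())  # area-averaged time series for the requested box and period
```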

SDAP Datastore

SDAP delivers rapid subsetting by using a tile-based datastore, instead of operating on many files. The data array for each variable of interest is partitioned into equal-sized tiles, each covering a particular time range and coordinate bounds. These tiles are ingested into the SDAP datastore, which has two components:

  1. Solr: Hosts tile attributes, including a unique identifier for each tile, its spatial bounds, the time range covered, and summary statistics, and enables rapid geospatial search for tiles that intersect a user-defined bounding box.
  2. Cassandra: Hosts the actual data tiles and enables each tile to be retrieved directly using the unique identifier returned by Solr.

The SDAP algorithms access the necessary data subsets rapidly in a two-step process. First, SDAP performs a geospatial search in Solr to find the unique identifiers of the data tiles that intersect the time range and spatial area of interest. It then uses those tile identifiers as keys to retrieve the tile data directly from Cassandra's key-value store.
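
The following sketch mirrors that two-step lookup using the `pysolr` and `cassandra-driver` client libraries. The Solr core name, field names, Cassandra keyspace, and table schema shown here are assumptions for illustration; SDAP's actual schema may differ.

```python
# An illustrative two-step tile lookup: Solr for search, Cassandra for retrieval.
import pysolr
from cassandra.cluster import Cluster

solr = pysolr.Solr("http://solr.example.org:8983/solr/tiles")  # hypothetical core

# Step 1: geospatial + temporal search in Solr for matching tile identifiers.
results = solr.search(
    "dataset_s:methane_inversion_global",  # hypothetical field and dataset names
    fq=[
        "geo:[30,-125 TO 45,-110]",  # bounding-box filter (illustrative syntax)
        "time_dt:[2020-01-01T00:00:00Z TO 2020-12-31T23:59:59Z]",
    ],
    fl="tile_id",
)
tile_ids = [doc["tile_id"] for doc in results]

# Step 2: direct key-value retrieval of the tile data from Cassandra.
session = Cluster(["cassandra.example.org"]).connect("tilestore")  # hypothetical keyspace
stmt = session.prepare("SELECT tile_blob FROM tiles WHERE tile_id = ?")  # hypothetical table
tiles = [session.execute(stmt, (tid,)).one() for tid in tile_ids]
```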

SDAP Analytics Algorithms

The SDAP analytics algorithms that may be highly relevant to multi-scale methane analysis are:

All of the SDAP computations can be constrained to a spatiotemporal bounding box.

SDAP Deployment

SDAP is deployed using Kubernetes and Helm. A functional SDAP consists of the following components:

SDAP Data Ingest

To run analytics on a dataset with SDAP you need to first "ingest" the dataset. Please refer to the SDAP Helm deployment instructions to learn how to deploy SDAP and ingest data into it. SDAP supports ingest from NetCDF4 or HDF5 files that follow the CF Metadata Conventions.

As described in the SDAP documentation, you will need to compose a configuration in YAML format for each dataset you want to ingest. Please refer to the examples we provide at Example collections.yaml, and to the illustrative sketch below.
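
For illustration, here is a minimal hypothetical collections.yaml entry, embedded as a string and parsed with PyYAML. The field names follow common SDAP ingester examples but are assumptions here; check the SDAP documentation for the exact schema supported by your version.

```python
# Parse an illustrative collections.yaml entry with PyYAML.
import yaml

EXAMPLE_COLLECTIONS_YAML = """
collections:
  - id: methane_inversion_global     # hypothetical dataset ID used in API calls
    path: /data/methane/global/*.nc  # hypothetical granule location
    priority: 1
    projection: Grid
    dimensionNames:
      latitude: lat
      longitude: lon
      time: time
      variable: ch4                  # hypothetical CF variable name
    slices:                          # tile size along each dimension
      time: 1
      lat: 30
      lon: 30
"""

config = yaml.safe_load(EXAMPLE_COLLECTIONS_YAML)
for collection in config["collections"]:
    print(collection["id"], "->", collection["path"])
```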

Additional Recommendations for Using SDAP

The following are some additional recommendations that, in our experience, increase the likelihood that the SDAP ingesters will support your data files:
