Using the STAC Specification#

Purpose of the STAC Item descriptors#

The STAC Item specification is used to encode the metadata of the resources that may be created, shared and published in the AIOPEN Service. Concretely, this currently applies to trained models and to training data.

Complete examples of STAC Item descriptors for both resource types are provided in the Sharing and Publishing section of the Developer Manual.

A web-based STAC Validator tool has been integrated in the Development Services to facilitate the creation and validation of the STAC Items. See: Using the STAC Validator.

When STAC Item descriptors containing resource metadata are pushed to a GitHub repository monitored by the Service, they are automatically registered either in the user workspace Local Catalogue or in the Service Global Catalogue. The destination Catalogue depends on the git branch to which the file is pushed:

  • STAC Item descriptors pushed (or merged) in the develop branch are registered in the user workspace Local Catalogue.

  • STAC Item descriptors pushed (or merged) in the main branch are registered in the Service Global Catalogue.
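The branch rule above can be summarised in a few lines of Python. This is only an illustrative sketch: the function name and the returned labels ("local", "global") are hypothetical and are not part of the AIOPEN Service.

```python
from typing import Optional

# Hypothetical labels for the two catalogues named in the rules above.
BRANCH_TO_CATALOGUE = {
    "develop": "local",  # user workspace Local Catalogue
    "main": "global",    # Service Global Catalogue
}


def target_catalogue(branch: str) -> Optional[str]:
    """Return the catalogue fed by a branch, or None when the rules above do not mention it."""
    return BRANCH_TO_CATALOGUE.get(branch)
```

For instance, target_catalogue("main") yields "global", while a branch that is not listed above yields None.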

Registering a resource in the Global Catalogue allows publishing it on the Marketplace where it may be discovered by all the visitors (anonymous and authenticated). It is thus crucial to include in the STAC Item descriptors accurate and sufficient information about the resources.

The following sections describe the different pieces of information that must, or should, be included in the STAC Item descriptors, and explain how this must be done for the descriptors to be properly managed in AIOPEN.

Trained Models Information#

The STAC Items must be valid and must include all the information marked as REQUIRED in the core STAC specification and in the STAC extensions in use. Information indicated as Recommended is not used by the AIOPEN service but is displayed in the Details pages of the Marketplace to help users determine if a given resource meets their needs. It also informs on how data (e.g. satellite imagery) must be pre-processed before being used as inference input to obtain predictions.

Specific content is also required to ensure the resources can be shared or published in the AIOPEN platform. Required information is different if the resource is a trained model or a training dataset.

Required Information#

The following table describes the information that is either required or recommended to be included in the STAC Items representing trained models:

| Element / Field | Required | Comment |
|---|---|---|
| STAC extension mlm | Required | Trained models must use version 1.3.0 of the mlm STAC extension (see Trained Model assets). |
| Asset with role mlm:model | Required | The href of this asset must refer to the MLmodel file generated by MLflow and stored in a workspace S3 bucket (see Trained Model assets). |
| properties/mlm:name, properties/mlm:architecture, properties/mlm:tasks | Required | These properties are defined as required in the mlm extension specification. |
| properties/mlm:input | Required | This field provides the characteristics of the model input (e.g. bands, shape, datatype) and describes the transformation (pre-processing) between the EO data and the input value. |
| properties/mlm:output | Required | This field describes model outputs and how to interpret them (e.g. classes). |
| properties/status | Optional (see comment) | This property is required to publish or unpublish a model. If it is not specified, the STAC Item is ignored (see Publish & Unpublish status). |
| properties/type | Optional | If the type property is provided, it must comply with the STAC extension. The type must thus have the value model. |
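As a quick pre-check before pushing a descriptor, the requirements listed above for trained models can be tested with a small Python function. This is a minimal sketch, not the validation performed by the Service or by the STAC Validator: the function name and the wording of the messages are hypothetical, and only the presence of the fields is checked, not their content.

```python
# Schema URL of the mlm extension, version 1.3.0, as declared in the STAC Items.
MLM_SCHEMA = "https://stac-extensions.github.io/mlm/v1.3.0/schema.json"

# Properties listed as required for trained models.
REQUIRED_MLM_PROPERTIES = (
    "mlm:name",
    "mlm:architecture",
    "mlm:tasks",
    "mlm:input",
    "mlm:output",
)


def missing_model_information(item: dict) -> list:
    """Return a list of human-readable problems; an empty list means nothing is missing."""
    problems = []
    if MLM_SCHEMA not in item.get("stac_extensions", []):
        problems.append("stac_extensions: mlm v1.3.0 is not declared")
    assets = item.get("assets", {})
    if not any("mlm:model" in asset.get("roles", []) for asset in assets.values()):
        problems.append("assets: no asset with role mlm:model")
    properties = item.get("properties", {})
    for name in REQUIRED_MLM_PROPERTIES:
        if name not in properties:
            problems.append(f"properties/{name} is missing")
    return problems
```

An empty result only means the listed elements are present; full schema validation is still needed to confirm that the Item is valid.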

Training Data Information#

Required Information#

The following table describes the information that is either required or recommended to be included in the STAC Items representing training data:

| Element / Field | Required | Comment |
|---|---|---|
| STAC extension ml-aoi | Required | Training data must specify this STAC extension (see Training Data assets: label or feature). |
| Asset with field ml-aoi:role and value feature or label | Required | The href of this asset must refer to the actual data file or folder in a workspace S3 bucket (see Training Data assets: label or feature). |
| properties/status | Optional (see comment) | This property is required to publish or unpublish training data. If it is not specified, the STAC Item is ignored (see Publish & Unpublish status). |
| properties/type | Optional | If the type property is provided, it must comply with the STAC extension. The type must thus have the value TrainingData. |

Publish & Unpublish status#

As explained in the Sharing and Publishing section of the Developer Manual, resources may be published to, but also unpublished from, the catalogues. To publish or unpublish a resource, the resource status in the corresponding STAC Item descriptor must be updated and the file must be pushed again to GitHub.

The target status must be specified in properties/status as follows:

  • "status": "publish" (or "published") to register the resource in the catalogue (and thus publish to the Marketplace).

  • "status": "unpublish" (or "unpublished") to unregister the resource from the catalogue (and thus remove from the Marketplace).

Example to publish a new resource or modify a resource already published (with the same id):

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "model-deforestation",
  "properties": {
    "title": "Deforestation tracking using U-Net",
    "description": "Deforestation-tracking model using Sentinel-2 data",
    "status": "published"
  }
}

Example to unpublish a resource:

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "model-deforestation",
  "properties": {
    "title": "Deforestation tracking using U-Net",
    "description": "Deforestation-tracking model using Sentinel-2 data",
    "status": "unpublished"
  }
}
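The accepted status values can be reduced to the action they request, as in the following Python sketch. The function name is hypothetical, and treating an unrecognised value like a missing one is an assumption; the documentation above only states that an Item without a status is ignored.

```python
from typing import Optional

PUBLISH_VALUES = {"publish", "published"}
UNPUBLISH_VALUES = {"unpublish", "unpublished"}


def requested_action(item: dict) -> Optional[str]:
    """Map properties/status to "publish" or "unpublish"; None means the Item would be ignored."""
    status = item.get("properties", {}).get("status")
    if status in PUBLISH_VALUES:
        return "publish"
    if status in UNPUBLISH_VALUES:
        return "unpublish"
    # Missing status (or, by assumption, an unknown value): nothing to do.
    return None
```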

Target catalogue collection#

Resource developers and providers may choose in which catalogue collection they want to register their resources. It is typically the name or organisation of the user publishing the resources but this is not mandatory.

The collection identifier must be provided in the collection field.

For example:

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "model-deforestation",
  "collection": "kplabs",
  "properties": {
    "title": "Deforestation tracking using U-Net",
    "description": "Deforestation-tracking model using Sentinel-2 data",
    "status": "published"
  }
}

Note

The identifier of the catalogue collections in which resources are published is in reality <collection-id>:published. This allows the Marketplace to filter and only display the resources located in *:published collections.
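The naming convention described in this note can be expressed as follows. This is an illustrative Python sketch with hypothetical function names; it only mirrors the <collection-id>:published pattern and does not call the catalogue API.

```python
PUBLISHED_SUFFIX = ":published"


def published_collection_id(collection_id: str) -> str:
    """Build the catalogue collection identifier from the value of the collection field."""
    return f"{collection_id}{PUBLISHED_SUFFIX}"


def is_published_collection(catalogue_collection_id: str) -> bool:
    """Tell whether a catalogue collection matches the *:published pattern."""
    return catalogue_collection_id.endswith(PUBLISHED_SUFFIX)
```

With the example above, the collection field "kplabs" corresponds to the catalogue collection "kplabs:published".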

Reference to the resource assets#

STAC Item descriptors represent either a trained model or a training dataset, and each descriptor must contain the reference to the actual resource files (assets) stored in one of the user workspace buckets.

Trained Model assets#

Initially, AIOPEN used the ml-model STAC extension to include the reference to the model assets. This extension was deprecated in 2024, and version 1.3.0 of the “mlm” STAC extension must be used instead.

It is thus mandatory to declare the extension URL in the STAC Item descriptor. Optionally, the “file” STAC extension may be used to indicate the size of the model assets.

"stac_extensions": [
  "https://stac-extensions.github.io/mlm/v1.3.0/schema.json",
  "https://stac-extensions.github.io/file/v2.1.0/schema.json"
]

In both cases (legacy ml-model extension or mlm extension), the asset roles are checked: they must contain either “ml-model:inference-runtime” or “mlm:model”.

The link (href) must refer to the “MLmodel” file generated by MLflow. The service automatically takes into account all the files (objects) having the same prefix (thus those in the same “folder” and in its “sub-folders”).

Example using the mlm STAC extension and:

  • experiment ID = 2

  • run ID = 69f168eaebc04b99af345720d34e6264

  • model name = model (default value in MLflow)

"assets": {
  "inferencing-compose": {
    "href": "s3://developer-modelrepo/2/69f168eaebc04b99af345720d34e6264/artifacts/model/MLmodel",
    "type": "application/yaml; application=mlflow",
    "title": "Model inference runtime definition",
    "file:size": 12345,
    "roles": [
      "mlm:model"
    ]
  }
}
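The following Python sketch shows how the href of the example is composed and which objects share its prefix. It is only an illustration: the function names are hypothetical, the path layout is assumed to follow the example above, and the object list is passed in as plain strings instead of being read from S3.

```python
def mlmodel_href(bucket: str, experiment_id: int, run_id: str, model_name: str = "model") -> str:
    """Compose the href of the MLmodel file, assuming the layout used in the example above."""
    return f"s3://{bucket}/{experiment_id}/{run_id}/artifacts/{model_name}/MLmodel"


def model_prefix(href: str) -> str:
    """Return the "folder" prefix of an MLmodel href, including the trailing slash."""
    prefix, _, filename = href.rpartition("/")
    if filename != "MLmodel":
        raise ValueError("the href is expected to end with the MLmodel file")
    return prefix + "/"


def objects_of_model(href: str, object_uris: list) -> list:
    """Keep the objects located in the same "folder" as the MLmodel file, or below it."""
    prefix = model_prefix(href)
    return [uri for uri in object_uris if uri.startswith(prefix)]
```

Calling mlmodel_href("developer-modelrepo", 2, "69f168eaebc04b99af345720d34e6264") reproduces the href of the example, and objects_of_model then keeps every object stored under .../artifacts/model/.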

Training Data assets: label or feature#

STAC Item descriptors representing training data must use the ml-aoi STAC extension.

It is thus mandatory to declare the extension URL in the STAC Item descriptor. Optionally, the “file” STAC extension may be used to indicate the size of the training data files.

"stac_extensions": [
  "https://stac-extensions.github.io/ml-aoi/v0.2.0/schema.json",
  "https://stac-extensions.github.io/file/v2.1.0/schema.json"
]

The data files must be referred to using asset entries.

Instead of defining asset roles (to be included in the roles array), the ml-aoi STAC extension defines fields to be included directly in the asset definition. The “roles” field is then optional.

The field to be used to indicate that an asset contains labels or features is ml-aoi:role, with the value label or feature, respectively.

Multiple assets of type label or feature may coexist in the same STAC Item.

For example:

"assets": {
  "data-files": {
    "ml-aoi:role": "feature",
    "href": "s3://developer-data/path/to/my/dataset",
    "type": "image/tiff; application=geotiff",
    "title": "Training data files",
    "file:size": 1324543
  }
}
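Since the role is carried by the ml-aoi:role field and not by the roles array, the label and feature assets of an Item can be listed as in this Python sketch. The function name is hypothetical and the code only inspects an already loaded STAC Item (a dictionary).

```python
def assets_by_ml_aoi_role(item: dict) -> dict:
    """Group the asset keys of a STAC Item by their ml-aoi:role value (label or feature)."""
    grouped = {"label": [], "feature": []}
    for key, asset in item.get("assets", {}).items():
        role = asset.get("ml-aoi:role")
        if role in grouped:
            grouped[role].append(key)
    return grouped
```

An Item is usable as training data only if at least one of the two lists is not empty.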

Resource versioning#

Although not mandatory, it is recommended to version shared and published resources. When specified, the resource version is displayed both on the Marketplace main page (displaying resource cards) and in the resource details pages.

Note that the Marketplace does not allow searching or filtering on the resource version. Also, when multiple versions of the same resource exist, it is up to the user to identify the one to use (most frequently the most recent one).

The version information displayed by the Marketplace must be located in the version field in the properties section of the STAC Items. This field is defined in the “version” STAC extension.

This extension also defines two boolean fields experimental and deprecated and a number of relation types, which are not used by the Marketplace, but may be used by the users who are discovering the resources using the catalogue API.

STAC Items that include version information should thus indicate that they comply with the related schema:

"stac_extensions": [
  "...",
  "https://stac-extensions.github.io/version/v1.2.0/schema.json"
]

Version information is included in the STAC Item properties:

"properties": {
  "version": "1.2.0",
  "...": "..."
}
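Both steps (declaring the schema and setting the property) can be applied together, as in this Python sketch. The helper name is hypothetical; it works on a copy so that the original dictionary is left untouched.

```python
import copy

# Schema URL of the version extension, as shown above.
VERSION_SCHEMA = "https://stac-extensions.github.io/version/v1.2.0/schema.json"


def with_version(item: dict, version: str) -> dict:
    """Return a copy of the Item that declares the version extension and sets properties/version."""
    updated = copy.deepcopy(item)
    extensions = updated.setdefault("stac_extensions", [])
    if VERSION_SCHEMA not in extensions:
        extensions.append(VERSION_SCHEMA)
    updated.setdefault("properties", {})["version"] = version
    return updated
```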

Terms and Conditions (license)#

A user who wants to use (order or execute) a resource that has a license must accept the license before being allowed to proceed.

The resource license may be specified using a STAC Item property or a link:

Example using the license property field:

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "EuroSAT-subset-train-sample-59-class-SeaLake",
  "properties": {
    "license": "SPDX-License-Identifier: MIT",
    "<other-properties>": "..."
  }
}

Example using a license link:

"links": [
  {
    "rel": "license",
    "href": "https://www.gnu.org/licenses/gpl-3.0.html",
    "type": "text/html",
    "title": "GPL-3.0"
  }
]
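The two ways of declaring a license can be read back with a single function, sketched below in Python. The function name is hypothetical, and the sketch does not claim to reproduce how the Marketplace decides that a license must be accepted.

```python
def license_references(item: dict) -> list:
    """Collect the license property (if any) and the href of every link with rel "license"."""
    references = []
    declared = item.get("properties", {}).get("license")
    if declared:
        references.append(declared)
    for link in item.get("links", []):
        if link.get("rel") == "license" and "href" in link:
            references.append(link["href"])
    return references
```

An empty list means that the Item declares no license with either mechanism.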

Contact persons#

The “contacts” STAC extension is used to specify contact information, such as the name, phone number and email address of the resource developers and providers.

The extension must be declared in the STAC Item descriptor, next to the mlm or the ml-aoi extension:

"stac_extensions": [
  "https://stac-extensions.github.io/mlm/v1.3.0/schema.json",
  "https://stac-extensions.github.io/contacts/v0.1.1/schema.json"
]

The contact information is included in the STAC Item descriptor under the contacts property. The value is a list (array) and thus allows specifying multiple contacts.

See the full specification of the extension for the complete list of fields. For example:

"properties": {
  "contacts": [
    {
      "name": "KP Labs",
      "organization": "KP Labs",
      "phones": [
        {
          "value": "+12345678933",
          "roles": [
            "work"
          ]
        }
      ],
      "emails": [
        {
          "value": "aiopen@example.com",
          "roles": [
            "work"
          ]
        }
      ]
    }
  ]
}

Themes#

Assigning themes to resources helps end users choose the model or dataset that best suits their needs.

The AIOPEN Marketplace does not yet allow searching or filtering on theme values; however, this information is provided in the resource details pages.

The “themes” STAC extension is used to associate resources with themes, each expressed as a list of concepts taken from a given scheme.

The extension must be declared in the STAC Item descriptor, next to the mlm or the ml-aoi extension:

"stac_extensions": [
  "https://stac-extensions.github.io/mlm/v1.3.0/schema.json",
  "https://stac-extensions.github.io/themes/v1.0.0/schema.json"
]

Example themes property in a STAC Item:

"properties": {
  "themes": [
    {
      "concepts": [
        {
          "id": "Deforestation",
          "name": "Deforestation"
        }
      ],
      "scheme": "https://en.wikipedia.org/wiki"
    },
    {
      "concepts": [
        {
          "id": "Category:Deforestation",
          "name": "Deforestation"
        }
      ],
      "scheme": "https://dbpedia.org/page"
    }
  ]
}

Publication DOIs and Citations#

Including external references to related publications provides users with additional insights to the published models and training data and helps them determine if a given resource is of interest to them or not.

The “scientific” STAC extension allows providing this information and also allows indicating how the resource must be cited in publications.

The properties fields specified in this extension use the sci: prefix.

Although the Marketplace does not allow searching or filtering on DOIs or citations, this information is displayed in the item details pages.

When used, the scientific extension must be declared in the STAC Item descriptor next to the mlm or the ml-aoi extension:

"stac_extensions": [
  "https://stac-extensions.github.io/mlm/v1.3.0/schema.json",
  "https://stac-extensions.github.io/scientific/v1.0.0/schema.json"
]

Related publications are listed in the STAC Item property sci:publications. Each publication entry must contain the publication Digital Object Identifier (in doi) and a citation string (free text).

If the resource itself has a DOI, it may be specified either in the sci:doi property or as a hyperlink in an item link with relation type cite-as.

Example use of scientific fields and links in a STAC Item:

"properties": {
  "id": "unique-item-id",
  "sci:doi": "10.5061/dryad.s2v81.2/27.2",
  "sci:publications": [
    {
      "doi": "10.5061/dryad.s2v81.2",
      "citation": "Vega GC, Pertierra LR, Olalla-Tárraga MÁ (2017) Data from: MERRAclim, a high-resolution global dataset of remotely sensed bioclimatic variables for ecological modelling. Dryad Digital Repository."
    },
    {
      "doi": "10.1038/sdata.2017.78",
      "citation": "Vega GC, Pertierra LR, Olalla-Tárraga MÁ (2017) MERRAclim, a high-resolution global dataset of remotely sensed bioclimatic variables for ecological modelling. Scientific Data 4: 170078."
    }
  ]
},
"links": [
  {
    "rel": "cite-as",
    "href": "https://doi.org/10.5061/dryad.s2v81.2"
  }
]
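The publication entries and the cite-as link can be generated from DOIs, as in this Python sketch. The helper names are hypothetical; the only assumption about the outside world is that DOIs are resolved through https://doi.org/, as in the example above.

```python
DOI_RESOLVER = "https://doi.org/"


def publication(doi: str, citation: str) -> dict:
    """Build one entry of the sci:publications list."""
    return {"doi": doi, "citation": citation}


def cite_as_link(doi: str) -> dict:
    """Build the item link that points to the resolvable form of a DOI."""
    return {"rel": "cite-as", "href": DOI_RESOLVER + doi}
```

For example, cite_as_link("10.5061/dryad.s2v81.2") produces the link shown in the links array above.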