eNews

#01 2025

SAEON systems development lead presents expert opinion to the International Atomic Energy Agency

The uLwazi Node’s systems development lead, Mark J. Jacobson, was invited by the United Nations International Atomic Energy Agency (IAEA) to participate in an Interregional Meeting on the Architecture and Development of the NUTEC Plastic Database for Marine Microplastic Monitoring, held over four days at the Vienna International Centre in Austria. 

The meeting was facilitated by the IAEA Department of Technical Cooperation under project INT7021: Contributing to the Global Monitoring of Marine Plastic Pollution under the IAEA NUclear TEChnology for Controlling Plastic Pollution (NUTEC Plastics) Initiative.

Mark was among seven experts invited from various institutions around the world that are involved in one way or another with the collection and management of data in the marine environment. Several IAEA staff were also present in a facilitatory capacity, including the technical lead for the development of the NUTEC plastics database.

Introductory session of the interregional meeting on the architecture and development of the NUTEC plastics database for marine microplastic monitoring.

Data management and marine science experts who participated in adjacent meetings under the NUTEC Plastics Initiative of the IAEA at the Vienna International Centre.

Data collection in the marine environment 

On the first day of the meeting, the invited experts presented their respective experiences on data collection in the marine environment. Representing SAEON, Mark presented an overview of the organisation and its research activities and elaborated on the architecture of the SAEON Open Data Platform and its role in supporting the Marine Information Management System, a component of the National Oceans and Coasts Information Management System (OCIMS) of South Africa.

The remaining three days of the meeting were dedicated to discussions on data and metadata management, standardisation and the use of vocabularies, interconnection with existing marine databases and data products, reporting on progress against the relevant UN Sustainable Development Goals, and technical aspects of the development of the NUTEC plastics database.

On the last two days of the meeting, combined sessions were additionally held with marine scientists participating in an adjacent meeting under the same project. The adjacent meeting was focused on harmonisation of data collection methods and establishment of protocols for the monitoring of marine microplastics pollution. The combined sessions were highly productive, with the sharing of understandings from the data management perspective on the one hand and the marine science perspective on the other. The invited experts were asked to contribute their respective recommendations to the end-of-mission report. Mark’s recommendations are summarised below.

Looking at the NUTEC access control requirements, it is proposed that end-user roles (the NUTEC terms of reference describe six: Global Admin, Country Admin, Lab Admin, Analyst, Sampler, Public) be modelled as dynamic entities distinct from functional permissions. In this approach, a role is defined as an aggregation of fine-grained API (application programming interface) permissions, and a user may hold any number of roles, with their permissions applied cumulatively.

This approach to role-based access control – successfully implemented in SAEON’s Open Data Platform – enables roles to be easily added, removed or reconfigured in the future, without requiring modifications to the authorisation code that controls access to individual API endpoints.
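The idea of roles as aggregations of permissions can be illustrated with a minimal Python sketch. It is not taken from the SAEON Open Data Platform or from the NUTEC terms of reference: the permission names and the role-to-permission mapping are assumptions made purely for illustration, and only four of the six roles are shown.

```python
from dataclasses import dataclass, field

# Hypothetical permission names, invented for this sketch.
SAMPLE_READ = "sample:read"
SAMPLE_CREATE = "sample:create"
SAMPLE_APPROVE = "sample:approve"
LAB_MANAGE = "lab:manage"

# Roles are data, not code: each role is a named set of permissions.
# Adding or reconfiguring a role only changes this mapping.
ROLES: dict[str, frozenset[str]] = {
    "Public": frozenset({SAMPLE_READ}),
    "Sampler": frozenset({SAMPLE_READ, SAMPLE_CREATE}),
    "Analyst": frozenset({SAMPLE_READ, SAMPLE_APPROVE}),
    "Lab Admin": frozenset({SAMPLE_READ, LAB_MANAGE}),
}


@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)


def effective_permissions(user: User) -> frozenset[str]:
    """Return the union of the permissions of every role the user holds."""
    permissions: set[str] = set()
    for role in user.roles:
        permissions |= ROLES.get(role, frozenset())
    return frozenset(permissions)


def is_authorised(user: User, required: str) -> bool:
    """An endpoint checks for a permission, never for a role name."""
    return required in effective_permissions(user)


example_user = User("example", {"Sampler", "Analyst"})
print(is_authorised(example_user, SAMPLE_CREATE))
print(is_authorised(example_user, LAB_MANAGE))
```

Because the endpoint check refers only to a permission, a new role can be introduced by adding an entry to the mapping, leaving the authorisation function untouched.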

The information architecture of the SAEON Open Data Platform, version 3.0 (in development).

(Photo: Miklos Gaspar, IAEA)

With respect to data management, it is proposed that sample records be stored as JSON objects (rather than as wide table rows with many columns). In this approach, each record is associated with a JSON schema that describes the record’s overall structure, its individual field formats and its validation rules. Syntactic and semantic validations are thus made explicit and public through published schemas, instead of being hidden in API code. This is advantageous for developers tasked with creating data pipelines to submit data from regional laboratories to the central repository.
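As a rough illustration of schema-driven validation, the Python sketch below checks a sample record against a published schema. The field names and the polymer vocabulary are hypothetical, and the validator handles only a small subset of JSON Schema keywords (type, required, properties, enum, minimum, maximum). A production pipeline would more likely rely on a complete JSON Schema implementation.

```python
import json

# Hypothetical schema for a sample record; field names are illustrative only.
SAMPLE_SCHEMA = {
    "type": "object",
    "required": ["sample_id", "latitude", "longitude", "polymer"],
    "properties": {
        "sample_id": {"type": "string"},
        "latitude": {"type": "number", "minimum": -90, "maximum": 90},
        "longitude": {"type": "number", "minimum": -180, "maximum": 180},
        "polymer": {"type": "string", "enum": ["PE", "PP", "PS", "PET", "other"]},
        "comments": {"type": "string"},
    },
}

_TYPES = {"object": dict, "string": str, "number": (int, float)}


def validate(value, schema, path="$"):
    """Return a list of human-readable errors; an empty list means valid."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, but is not a JSON number.
        if expected == "number" and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} is not an allowed value")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: {value} is below the minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{path}: {value} is above the maximum {schema['maximum']}")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: required field is missing")
        for name, subschema in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], subschema, f"{path}.{name}"))
    return errors


good = json.loads(
    '{"sample_id": "S-001", "latitude": -33.9, "longitude": 18.4, "polymer": "PE"}'
)
bad = {"sample_id": "S-002", "latitude": 123.0, "polymer": "nylon"}

print(validate(good, SAMPLE_SCHEMA))
for message in validate(bad, SAMPLE_SCHEMA):
    print(message)
```

Since the rules live in the schema rather than in the validator, a laboratory developer can run the same checks locally before submitting data.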

The NUTEC terms of reference define four classes of laboratory – Level I (Basic), Level II (Intermediate), Level III (Advanced) and Level III+ (Advanced Plus) – each of which will have an associated protocol for which a distinct JSON schema would apply. The JSON schema approach is cognisant of the possibility of more advanced laboratory techniques and new protocols emerging in the future, of existing protocols being revised, and of obsolete protocols being shelved. The approach furthermore caters to changes in formal vocabularies that might happen from time to time as new methodologies and new classifications emerge.
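One way to picture how schemas might track protocols over time is a small registry keyed by laboratory level and schema version, sketched below in Python. The class names, the integer versioning and the example field `mesh_size_um` are assumptions for illustration and do not describe the NUTEC design.

```python
from dataclasses import dataclass, replace

# Laboratory levels as named in the NUTEC terms of reference.
LEVELS = ("I", "II", "III", "III+")


@dataclass(frozen=True)
class ProtocolSchema:
    level: str
    version: int
    schema: dict
    retired: bool = False


class SchemaRegistry:
    """Hypothetical registry of protocol schemas, keyed by (level, version)."""

    def __init__(self):
        self._entries = {}

    def publish(self, entry: ProtocolSchema) -> None:
        if entry.level not in LEVELS:
            raise ValueError(f"unknown laboratory level: {entry.level}")
        key = (entry.level, entry.version)
        if key in self._entries:
            # A revised protocol gets a new version; old ones stay unchanged.
            raise ValueError("already published; publish a new version instead")
        self._entries[key] = entry

    def retire(self, level: str, version: int) -> None:
        """Shelve a schema without deleting it, so old records stay readable."""
        key = (level, version)
        self._entries[key] = replace(self._entries[key], retired=True)

    def get(self, level: str, version: int) -> ProtocolSchema:
        return self._entries[(level, version)]

    def current(self, level: str) -> ProtocolSchema:
        """Return the highest non-retired version for a level."""
        active = [
            e for e in self._entries.values() if e.level == level and not e.retired
        ]
        if not active:
            raise LookupError(f"no active schema for level {level}")
        return max(active, key=lambda e: e.version)


registry = SchemaRegistry()
registry.publish(ProtocolSchema("I", 1, {"required": ["sample_id"]}))
registry.publish(ProtocolSchema("I", 2, {"required": ["sample_id", "mesh_size_um"]}))
registry.retire("I", 1)
print(registry.current("I").version)
```

Keeping retired schemas retrievable means that a record collected under an older protocol can still be validated against the schema that applied at the time.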

Open Archival Information System 

With respect to the NUTEC information model, it is proposed that the Open Archival Information System (OAIS) be considered. This abstract architecture has informed SAEON’s approach to data and metadata management and continues to inform ongoing development.

The OAIS defines three distinct classes of package, each comprising a dataset and its associated metadata. A Submission Information Package (SIP) is supplied by a data provider and is stored as is. A resubmission is typically handled as a new SIP instance. An Archival Information Package (AIP) is derived from one or more SIPs, optionally transformed, and stored independently. It is the source of truth for a dataset and may be identified by a DOI.

A Dissemination Information Package (DIP) is a distributable record of the dataset, again stored separately (or generated on-the-fly, in some implementations). The DIP may take many different forms, depending on target audience and intended use, but its metadata typically aligns with the FAIR principles – that the dataset be findable, accessible, interoperable and reusable.
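The relationship between the three package classes can be sketched in Python as follows. This is a simplified illustration, not the OAIS reference model or SAEON's implementation: the class fields, the function names and the placeholder DOI are assumptions, and the DIP is generated on the fly as CSV purely as an example of one dissemination format.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SIP:
    provider: str
    payload: str  # JSON text, kept exactly as received


@dataclass(frozen=True)
class AIP:
    sources: tuple  # the SIPs this package was derived from
    records: tuple  # transformed records; the source of truth
    doi: Optional[str] = None


@dataclass(frozen=True)
class DIP:
    media_type: str
    body: str
    metadata: dict


def build_aip(sips, doi=None) -> AIP:
    """Derive one AIP from one or more SIPs, tagging each record's origin."""
    if not sips:
        raise ValueError("an AIP needs at least one SIP")
    records = []
    for sip in sips:
        for record in json.loads(sip.payload):
            records.append({**record, "provider": sip.provider})
    return AIP(tuple(sips), tuple(records), doi)


def to_csv_dip(aip: AIP) -> DIP:
    """Generate a CSV dissemination package from the archived records."""
    fields = sorted({key for record in aip.records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(aip.records)
    metadata = {"identifier": aip.doi, "record_count": len(aip.records)}
    return DIP("text/csv", buffer.getvalue(), metadata)


sip_a = SIP("lab-a", '[{"site": "A1", "count": 12}]')
sip_b = SIP("lab-b", '[{"site": "B7", "count": 3}]')
# "10.0000/example" is a placeholder, not a registered DOI.
aip = build_aip([sip_a, sip_b], doi="10.0000/example")
dip = to_csv_dip(aip)
print(dip.body)
```

The submitted payloads are never modified: the AIP holds its own transformed copy of the records, and any number of DIPs in other formats could be derived from it in the same way.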

Finally, commentary was offered on an interesting discussion around the relevance of a general, free-text “comments” field. Scientists in the adjacent meeting were generally in favour of including such a field in sample records.

One of the data management experts, however, argued for excluding such a field, on the grounds that free-text comments are not machine-readable, cannot be compared across records, and may be open to interpretation and subjectivity. But, argued Mark, while it is true that protocols should endeavour to use controlled vocabularies as far as possible, it is important to acknowledge that new methods and new interpretations may be devised in the future, arising from observations and discoveries made by researchers under current protocols. Such observations and discoveries might not be identifiable or classifiable under any existing vocabulary, but might comprise valuable information that could be harvested and migrated into controlled fields under future protocols. The only suitable place for such information is a general comments field.

Even in scenarios where a regional laboratory has extended the JSON schema for a protocol to systematically collect additional information using a new or extended vocabulary, there will always be a lag between novel observations and innovations and their standardised representations.

The IAEA has indicated that further collaborative meetings may be held in the future, as the NUTEC microplastics monitoring initiative progresses and evolves.