EDI – A Template-Driven Metadata Editor for Research Data

Fabio Pavesi; Anna Basoni; Cristiano Fugazza; Stefano Menegon; Alessandro Oggioni; Monica Pepe; Paolo Tagliolato; Paola Carrara

(1) Overview

Introduction

The effective provisioning of spatial resources on the Internet has always been hampered by the intrinsic characteristics of this category of data: They are either of non-textual nature or, even when utilizing a text-based format for their encoding, generally convey little information on their semantics, purpose, and associated principals. For this reason, metadata is typically associated with resources in order to ground search, retrieval, and ultimately reuse of spatial information. Also, once instances of this category of resources are successfully “discovered” through the ad-hoc search engines generally referred to as geoportals, compliance with a number of access protocols is required by user agents in order to actually exploit the associated data.

The management of geospatial information in Europe is regulated by the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) Directive [], which itself is based on the groundwork set by the ISO 19115 (“Geographic information – Metadata) and ISO 19119 (Geographic information – Services) standards. Unfortunately, these standards can easily become inadequate because of the emergence of categories of data not previously considered. As an example, spatial data (in the broadest sense) have been recently enriched by the new category of data sources constituted by real-time/near-real-time data pulled from sensors. Consequently, these new data sources require specific data and metadata representations for management and enactment.

In this context, metadata editing plays a pivotal role that requires a flexible strategy. EDI, is a template-based metadata editor that is capable of abstracting from the specific XML schema a given metadata format is complying with. Also, the service-based deployment strategy allows users to retain full control on the data structures that are produced. Further details on EDI’s role can be found in [, ]

EDI’s main goals are depicted in Figure 1 and listed here:

Assisting metadata creation
- by concealing complexity, e.g. storing mandatory default data inside hidden fields, driving users in fields filling, etc.
- by helping avoid typos, e.g. by providing combo boxes and autocompletion functionalities
Fostering semantic enrichment
- With an eye to the Semantic Web, EDI allows plugging in external data sources that are made available as SPARQL endpoints. On the basis of these, EDI can semantically annotate metadata by providing, beside text descriptions of the entities that are referred to in the metadata (e.g., keywords, points of contact, toponyms, etc.), unique identifiers for these, in the form of URIs
Constraining metadata completion
- by validating field types and mandatory field presence on the client-side
- by facilitating input with dedicated widgets; As an example, the geographic extent of a dataset (the “bounding box”) can be entered either by drawing a rectangle on a map or by filling in the coordinates

Figure 1

EDI’s goals.

EDI allows system administrators to easily associate codelists, controlled vocabularies, and any kind of context information that are specific to their domain with the application. This capability is supported by allowing the editing interface to draw information from generic RDF data structures [] made available as SPARQL endpoints.

EDI was developed as part of sub-project 7 (http://www.ritmare.it/en/articolazione/sottoprogetto-7.html) of the Italian Flagship Project RITMARE (http://www.ritmare.it/).

Creating a form based on a specific template is as easy as inserting the following line inside a script tag:

edi.loadLocalTemplate(“INSPIRE_dataset”, “1.00”,
onTemplateLoaded);

Templates are described by XML files, in terms of elements (groups of controls somehow related to one another) and items (single controls).

Items

are associated with data types to enable validation,
can have a display directive (a URL can be displayed as a single text box or as a text box with underlying picture preview)
can have a datasource

Implementation and architecture

EDI consists of a client- and a server-side components.

The relationship between them is shown in Figure 2.

Figure 2

Architecture.

The EDI client gathers input from the user.

It then packs the information provided by the user into an intermediate data structure (referred to as EDIML) and sends it to the EDI server to be compiled into the specific metadata (XML) format the template refers to.

The server sends the XML metadata back to the client for further usage (e.g. storage, CSW sharing, …).

The client

The client is written in Javascript and is based on the jQuery (https://jquery.com) and Bootstrap 3 (http://getbootstrap.com) frameworks.

Its main components are:

UI Generation module – it renders HTML5 controls according to the data type and constraints defined in the template.
Internationalisation module – in templates, all labels and help boxes can be localised by specifying them in an arbitrary number of languages, each being declared by an xml:lang attribute.
Validation module – on the basis of the data types and constraints specified in the template, form content is validated upon submission, returning a “pass”, “warning” or “error” state. Validation can be disabled at template-level for testing purposes.
EDIML module – this component hosts the main data structure, i.e. ediml.content, and methods to maintain it. This is also the module managing conversations with the EDI server.
Datasource Mgmt module – manages the data sources underlying codelists or autocompletion functionalities.

The server

The server is written in Java as a Spring Boot Application.

The server consists of:

A RESTful[] interface – that the client uses to compile metadata and access utility services.
Business logic – it consists of two main blocks:
- Metadata compilation – this is the actual core of the server: it transforms an EDIML XML document containing the data and structure of the user filled-in form into the target XML metadata. Compilation can be followed by a template-defined chain of XSLT transformations if further processing is required.
- Utility services – various services, e.g. XSLT transformations, EDIML retrieval, etc.
Metadata persistence – metadata generated by the server is stored in a PostgreSQL relational database.

Listings 1, 2, 3, 4 show excerpts of the template for SensorML v2.0.0 and allow to pinpoint the main characteristics of the EDI meta-language. The outer template tag contains a settings section that defines the general parameters that are taken into account for creating the HTML editing front-end: Particularly, the metadataEndpoint and sparqlEndpoint tags contain, respectively, the URL of the web service processing the client input and the default SPARQL endpoint for the datasources below. The baseDocument tag contains the outermost levels of the document’s XML hierarchy that is shared by all descriptions that are created with a given template. The next essential component of a template is the definition of datasources, that is, the specification of where to look up when the editor fills drop-down lists, provides alternatives for the autocompletion features, etc. For convenience, these can also be clustered into endpointTypes in order to avoid duplication of parameters that are shared by all instances of a specific triple store (i.e., the data base for RDF).

Listing 1

SensorML template excerpt – (Settings).

Listing 2

Sensorml template excerpt – (endpointtypes).

Listing 3

Sensorml template excerpt – (datasources).

Listing 4

Sensorml template excerpt – (groups, elements and items).

The template then contains a sequence of group tags that allow the developer, as the name suggests, to group metadata elements together, a feature that can be employed to divide the editing interface into sections or tabs. Group tags contain a number of element tags that represent metadata fields, even when these are composite entities made up by a number of distinct items. Each of these three constructs can be given multilingual labels in order to tailor the interface to multiple languages (and automatically switch between them). Each element tag specifies whether the metadata element isMandatory or isMultiple. Also, tag hasRoot specifies at which level of the XML node tree multiple instances of the same element shall be rooted. Finally, item tags represent all the nodes that are required in order to fully define the metadata item (ISO metadata is particularly redundant in the specification of these).

A key component in the specification of items is the associated data type, specified by means of the hasDatatype attribute.

EDI templates define the primitives necessary for the definition of metadata elements.

Each element defines one or more items that correspond to the individual XML nodes that are populated in the metadata record.

The essential information defining an element is:

the element must or must not show in the interface (isFixed, see Listing 5)
textual explanation in the interface (label and help)
compulsoriness (isMandatory)
the XPath defining the position where the XML node shall be placed (hasRoot, hasPath)
its value (hasValue, defaultValue, see Listing 6)
data types (hasDatatype, see Listing 6)

Listing 5

Hidden predefined item example.

Listing 6

Default value example.

Elements contain items, which have a data type declared for validation purposes.

A list of available data types is shown in Table 1.

Table 1

Item data types.

Simple	Composite	External sources	Internal references

string	dateRange	code (alias: codelist)	copy
text	boundingBox	autoCompletion	ref
date	URL	select	function
int
real
autonumber
hidden
URN
URI
sensorID (URL)

The interchange format – EDIML

EDIML is an XML format we devised to pass on user-entered metadata from the EDI client to the EDI server and backwards, without losing the semantic enrichment EDI is capable of when the destination format is semantics-unaware, as ISO standards, unfortunately, are.

This is the reason why EDI constantly keeps two versions of each metadata: one in EDIML, e.g. for subsequent editing, and one in the destination format (e.g. SensorML 2.00).

Listing 7 shows an example of the EDIML format. Please note that, for item manuf_name_3, the value is “DeepSea Power & Light”, but the URI (i.e. “http://sp7.irea.cnr.it/rdfdata/sensors/manufacturer/31”) is brought along.

Listing 7

Sample ediml.

Quality control

Functional testing was conducted by manually creating a collection of metadata, used as a test bench for validation.

Both Sensor ML (ver 1.0.0 and 2.0.0) and INSPIRE metadata are created by different experts of different disciplines (physical and chemical oceanography, marine geology, geophysics, coastal systems, marine ecology, fishery and aquaculture, marine biomolecular science, human impacts, climatology, biogeochemistry, remote sensing, agriculture) and for different type of data resources, in order to account for the huge variability of contents, lexicons, outlooks, as well as for the subjectivity, in the metadata filling. Several experts participating to research projects, mainly RITMARE, are involved in the testing; metadata are manually compiled using the template available in the standard distribution that refers to one specific metadata (XML) format. More than 60 samples of Sensor ML and more than 50 samples of INSPIRE metadata are compiled for the purpose.

The consistency was carried out for the whole metadata collection against the corresponding XML Schemas (as declared in the baseDocument tag), by means of SensorML and ISO XSDs.

Quality has been further tested specifically for the INSPIRE profile by means of the official INSPIRE validator (available at http://inspire-geoportal.ec.europa.eu/validator2/), manually uploading the samples, and getting positive returns.

(2) Availability

Operating system

Any OS that can run Java 1.7 for the back end, any OS for the client

Programming language

Javascript / JavaEE 1.7

Additional system requirements

EDI Client is tested on Mozilla Firefox

Dependencies

Java 1.7 (EDI Server only)

PostgreSQL 9.0+ (EDI Server only, tested on 9.3.4)

List of contributors

Pepe Monica, Oggioni Alessandro, Fugazza Cristiano, Tagliolato Paolo – template design, functional testing
Pavesi Fabio – software design and development
Menegon Stefano – functional testing, integration with the GET-IT platform

Software location

Archive (e.g. institutional repository, general repository) (required)

Name: GitHub

Persistent identifiers:

EDI Server: https://github.com/SP7-Ritmare/EDI-NG_server.git

Distribution (v1.2, commit ID b351a9d): https://github.com/SP7-Ritmare/EDI-NG_server/releases/download/v1.2/edi.zip

EDI Client: https://github.com/SP7-Ritmare/EDI-NG_client.gitDistribution (v 1.2, commit ID 2f4f05c): https://github.com/SP7-Ritmare/EDI-NG_client/releases/tag/v1.2

Licence: GPL v3

Publisher: Fabio Pavesi

Date published: 15/12/14

Language

English

(3) Reuse potential

EDI actually has great reuse potential.

In fact, it can generate documents according to any XML schema and then any data formats with an XML serialisation can be produced.

Plenty of templates are available in the EDI-NG_templates repository, which we have been using in several research projects, also serve as samples of what can be attained with templates.

For instance, we provide both SensorML 1.0.1 and SensorML 2.0

As an example, an EDI template producing an RDF/XML output has been developed.

On the other hand, as EDI allows for specifying an arbitrary chain of XSL Transformations for post-processing the XML output, it can generate any text-based output format as well.

[B1] Basoni, B, Bastianini, M, Fugazza, C, Menegon, S, Minuzzo, T, Oggioni, A, Pavesi, F, Sarretta, A, Tagliolato, P, Vianello, A and Carrara, P (2014). EDI: Software per la metadatazione di risorse geografiche, dati osservativi e documenti, conformi RNDT e INSPIRE Available at: https:// github.com/SP7-Ritmare?query=EDI.

[B2] European Commission (2008). Commission Regulation (EC) No 1205/2008 of 3 December 2008 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards metadata Official Journal of the European Union L 326((1205)): 12–30. Available at: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2008:326:0012:0030:EN:PDF.

[B3] Fugazza, C et al. (2014). A Holistic, Semantics-aware Approach to Spatial Data Infrastructures Proceedings of 3rd International Conference on Data Management Technologies and Applications. SCITEPRESS – Science and and Technology Publications: 349–356, DOI: https://doi.org/10.5220/0004997603490356 Available at: http://www.scitepress.org/DigitalLibrary/Link.aspx?.

[B4] Harris, S and Seaborne, A (2013). SPARQL 1.1 Overview. W3C recommendation March 21 2013 Available at: http://www.w3.org/TR/sparql11-query/.

[B5] W3C (2014). RDF 1.1 Semantics – W3C Recommendation In: W3C. February 25 2014 Available at: http://www.w3.org/TR/rdf11-mt/.

[B6] Fugazza, C et al. (2014). The RITMARE Starter Kit: Bottom-up Capacity Building for Geospatial Data Providers Proceedings of the 9th International Conference on Software Paradigm Trends. : 169–176. Available at: http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0004999801690176.

[B7] Fielding, R T and Taylor, R N (2000). Principled design of the modern Web architecture Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium. 2(2): 115–150.

[B8] Open Geospatial Consortium (). http://www.opengeospatial.org/

[B9] Hickson, I, Berjon, R, Faulkner, S, Leithead, T, Navara, E, O’Connor, E and Pfeiffer, S (2014). HTML5, W3C Recommendation. REC-html5-20141028 Available at: <http://www.w3.org/TR/2014/REC-html5-20141028>.

[B10] European Commission Joint Research Centre (2013). INSPIRE Metadata Implementing Rules: Technical Guidelines based on EN ISO 19115 and EN ISO 19119, version 1.3 Available at: http://inspire.ec.europa.eu/documents/Metadata/MD_IR_and_ISO_20131029.pdf.

[B11] Open Geospatial Consortium (2014). OGC® SensorML: Model and XML Encoding Standard, version 2.0.0 Available at: https://portal.opengeospatial.org/files/?artifact_id=55939.

[B12] International Organization for Standardization (2003). ISO 19115 Geographic Information – Metadata In: ISO/TC-211.

[B13] International Organization for Standardization (2005). ISO 19119 Geographic information – Services In: ISO/TC-211.

[B14] International Organization for Standardization (2007). ISO/TS 19139:2007 Geographic information – Metadata – XML schema implementation In: ISO. Available at: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=32557.

Journal of Open Research Software

Software Metapapers