EDI – A Template-Driven Metadata Editor for Research Data

EDI is a general-purpose, template-driven metadata editor for creating XML-based descriptions. Originally aimed at defining rich, standard-compliant metadata for geospatial resources, it can easily be customised to comply with a broad range of schemata and domains. EDI creates HTML5 [9] metadata forms with advanced assisted-editing capabilities and compiles them into XML files. The examples included in the distribution implement profiles of the ISO 19139 standard for geographic information [14], such as core INSPIRE metadata [10], as well as the OGC [8] standard for sensor description, SensorML [11]. Templates (the blueprints for a specific metadata format) drive form behaviour through element data types and provide advanced features such as codelists underlying combo boxes or autocompletion functionalities. The editing of virtually any metadata format can be supported by creating a specific template.


Introduction
The effective provisioning of spatial resources on the Internet has always been hampered by the intrinsic characteristics of this category of data: They are either of non-textual nature or, even when utilizing a text-based format for their encoding, generally convey little information on their semantics, purpose, and associated principals. For this reason, metadata is typically associated with resources in order to ground search, retrieval, and ultimately reuse of spatial information. Also, once instances of this category of resources are successfully "discovered" through the ad-hoc search engines generally referred to as geoportals, compliance with a number of access protocols is required by user agents in order to actually exploit the associated data.
The management of geospatial information in Europe is regulated by the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) Directive [2], which is itself based on the groundwork laid by the ISO 19115 ("Geographic information – Metadata") and ISO 19119 ("Geographic information – Services") standards. Unfortunately, these standards can easily become inadequate because of the emergence of categories of data not previously considered. As an example, spatial data (in the broadest sense) have recently been enriched by a new category of data sources: real-time and near-real-time data pulled from sensors. Consequently, these new data sources require specific data and metadata representations for management and enactment.
In this context, metadata editing plays a pivotal role that requires a flexible strategy. EDI is a template-based metadata editor that is capable of abstracting from the specific XML schema a given metadata format complies with. Also, the service-based deployment strategy allows users to retain full control over the data structures that are produced. Further details on EDI's role can be found in [1,3]. EDI's main goals are depicted in Figure 1 and listed here:
• Assisting metadata creation
  • by concealing complexity, e.g. storing mandatory default data inside hidden fields, guiding users in filling in fields, etc.
  • by helping avoid typos, e.g. by providing combo boxes and autocompletion functionalities
• Fostering semantic enrichment
  • With an eye to the Semantic Web, EDI allows plugging in external data sources that are made available as SPARQL endpoints. On the basis of these, EDI can semantically annotate metadata by providing, besides text descriptions of the entities referred to in the metadata (e.g., keywords, points of contact, toponyms, etc.), unique identifiers for these in the form of URIs
• Constraining metadata completion
  • by validating field types and mandatory field presence on the client side
  • by facilitating input with dedicated widgets; as an example, the geographic extent of a dataset (the "bounding box") can be entered either by drawing a rectangle on a map or by filling in the coordinates
EDI allows system administrators to easily associate codelists, controlled vocabularies, and any kind of context information specific to their domain with the application. This capability is supported by allowing the editing interface to draw information from generic RDF data structures [5] made available as SPARQL endpoints.
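As an illustration of how a codelist might be drawn from a SPARQL endpoint, the following minimal Python sketch turns a result set in the standard SPARQL JSON results format into (label, URI) pairs suitable for a combo box. It is not EDI's own code: the query, the assumption that the vocabulary is published as SKOS concepts, the function name codelist_from_sparql_json, and the example.org URIs are all hypothetical, and a canned response stands in for a real endpoint.

```python
import json

# Hypothetical query: the vocabulary is ASSUMED to be published as SKOS concepts.
CODELIST_QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?uri ?label WHERE {
  ?uri a skos:Concept ;
       skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
ORDER BY ?label
"""


def codelist_from_sparql_json(payload):
    """Turn a SPARQL JSON result set into (label, uri) pairs for a combo box."""
    bindings = json.loads(payload)["results"]["bindings"]
    return [(b["label"]["value"], b["uri"]["value"]) for b in bindings]


# Canned response standing in for a real endpoint reply (made-up URIs).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["uri", "label"]},
    "results": {"bindings": [
        {"uri": {"type": "uri", "value": "http://example.org/vocab/salinity"},
         "label": {"type": "literal", "xml:lang": "en", "value": "salinity"}},
        {"uri": {"type": "uri", "value": "http://example.org/vocab/turbidity"},
         "label": {"type": "literal", "xml:lang": "en", "value": "turbidity"}},
    ]},
})

options = codelist_from_sparql_json(SAMPLE_RESPONSE)
for label, uri in options:
    print(f"{label} -> {uri}")
```

Keeping the URI next to each label is what would allow the text shown to the user and the unique identifier to travel together into the metadata.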
EDI was developed as part of sub-project 7 (http://www.ritmare.it/en/articolazione/sottoprogetto-7.html) of the Italian Flagship Project RITMARE (http://www.ritmare.it/). Creating a form based on a specific template is as easy as inserting the following line inside a script tag:
edi.loadLocalTemplate("INSPIRE_dataset", "1.00", onTemplateLoaded);
Templates are described by XML files, in terms of elements (groups of controls somehow related to one another) and items (single controls). Items
• are associated with data types to enable validation,
• can have a display directive (e.g. a URL can be displayed as a single text box or as a text box with an underlying picture preview),
• can have a datasource.

Implementation and architecture
EDI consists of client-side and server-side components.
The relationship between them is shown in Figure 2.
The EDI client gathers input from the user. It then packs the information provided by the user into an intermediate data structure (referred to as EDIML) and sends it to the EDI server to be compiled into the specific metadata (XML) format the template refers to.
The server sends the XML metadata back to the client for further usage (e.g. storage, CSW sharing, …).

The client
The client is written in JavaScript and is based on the jQuery (https://jquery.com) and Bootstrap 3 (http://getbootstrap.com) frameworks.
Its main components are:
• UI Generation module – renders HTML5 controls according to the data type and constraints defined in the template.
• Internationalisation module – in templates, all labels and help boxes can be localised by specifying them in an arbitrary number of languages, each declared by an xml:lang attribute.
• Validation module – on the basis of the data types and constraints specified in the template, form content is validated upon submission, returning a "pass", "warning" or "error" state. Validation can be disabled at template level for testing purposes.
• EDIML module – hosts the main data structure, i.e. ediml.content, and the methods to maintain it. This is also the module managing conversations with the EDI server.
• Datasource Mgmt module – manages the data sources underlying codelists or autocompletion functionalities.
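The three-state outcome of the Validation module can be sketched as follows. This is a minimal Python illustration rather than the client's JavaScript code: the data type names, the checker functions, and in particular the mapping of problems to states (a malformed or missing mandatory value gives "error", an empty optional field gives "warning") are assumptions made for the example.

```python
import re
from datetime import date


# Hypothetical data type checkers; the actual list of types is not reproduced here.
def _is_int(value):
    return re.fullmatch(r"[+-]?\d+", value) is not None


def _is_date(value):
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def _is_url(value):
    return re.fullmatch(r"https?://\S+", value) is not None


CHECKERS = {"string": lambda value: True, "int": _is_int,
            "date": _is_date, "URL": _is_url}
SEVERITY = {"pass": 0, "warning": 1, "error": 2}


def validate_item(value, datatype, mandatory):
    """Assumed rule: empty optional -> warning; empty mandatory or bad type -> error."""
    if value == "":
        return "error" if mandatory else "warning"
    return "pass" if CHECKERS[datatype](value) else "error"


def validate_form(items):
    """The form state is the most severe state found among its items."""
    states = [validate_item(i["value"], i["datatype"], i["mandatory"]) for i in items]
    return max(states, key=SEVERITY.__getitem__, default="pass")


form = [
    {"value": "Sea surface temperature", "datatype": "string", "mandatory": True},
    {"value": "2016-02-30", "datatype": "date", "mandatory": True},  # no such day
    {"value": "", "datatype": "URL", "mandatory": False},
]
print(validate_form(form))
```

Reducing the per-item states to the most severe one gives a single form-level result that a submit handler could act upon.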

The server
The server is written in Java as a Spring Boot application. The server consists of:
• A RESTful [7] interface – used by the client to compile metadata and access utility services.
• Business logic – it consists of two main blocks:
  • Metadata compilation – this is the actual core of the server: it transforms an EDIML XML document containing the data and structure of the form filled in by the user into the target XML metadata. Compilation can be followed by a template-defined chain of XSLT transformations if further processing is required.
  • Utility services – various services, e.g. XSLT transformations, EDIML retrieval, etc.
• Metadata persistence – metadata generated by the server is stored in a PostgreSQL relational database.

Listings 1-4 show excerpts of the template for SensorML v2.0.0 and illustrate the main characteristics of the EDI meta-language. The outer template tag contains a settings section that defines the general parameters taken into account when creating the HTML editing frontend. In particular, the metadataEndpoint and sparqlEndpoint tags contain, respectively, the URL of the web service processing the client input and the default SPARQL endpoint for the datasources below. The baseDocument tag contains the outermost levels of the document's XML hierarchy, which are shared by all descriptions created with a given template.
Listing 1: SensorML template excerpt (settings).
The next essential component of a template is the definition of datasources, that is, the specification of where to look when the editor fills drop-down lists, provides alternatives for the autocompletion features, etc. For convenience, these can also be clustered into endpointTypes in order to avoid duplicating parameters that are shared by all instances of a specific triple store (i.e., the database for RDF).
Listing 2: SensorML template excerpt (endpointTypes).
The template then contains a sequence of group tags that allow the developer, as the name suggests, to group metadata elements together, a feature that can be employed to divide the editing interface into sections or tabs. Group tags contain a number of element tags that represent metadata fields, even when these are composite entities made up of a number of distinct items. Each of these three constructs can be given multilingual labels in order to tailor the interface to multiple languages (and automatically switch between them). Each element tag specifies whether the metadata element isMandatory or isMultiple. Also, the hasRoot tag specifies at which level of the XML node tree multiple instances of the same element shall be rooted. Finally, item tags represent all the nodes that are required in order to fully define the metadata item (ISO metadata is particularly redundant in the specification of these).
A key component in the specification of items is the associated data type, specified by means of the hasDatatype attribute.
EDI templates define the primitives necessary for the definition of metadata elements.
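To make the group/element/item hierarchy more concrete, the following Python sketch parses a small, invented template and summarises each element. The tag and attribute names are those mentioned in the text, but their exact arrangement (for instance, isMandatory and isMultiple as attributes and label as a child tag), the data type names, and the example.org URLs are assumptions; the listings of the actual SensorML template remain the authoritative reference.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented template: the layout is an assumption, not the real EDI schema.
TEMPLATE = """<template>
  <settings>
    <metadataEndpoint>http://example.org/edi/</metadataEndpoint>
    <sparqlEndpoint>http://example.org/sparql</sparqlEndpoint>
  </settings>
  <group>
    <label xml:lang="en">Identification</label>
    <element isMandatory="true" isMultiple="false">
      <label xml:lang="en">Title</label>
      <label xml:lang="it">Titolo</label>
      <item hasDatatype="string"/>
    </element>
    <element isMandatory="false" isMultiple="true">
      <label xml:lang="en">Keyword</label>
      <label xml:lang="it">Parola chiave</label>
      <item hasDatatype="string"/>
      <item hasDatatype="URI"/>
    </element>
  </group>
</template>"""


def label_for(node, lang):
    """Return the label of a node in the requested language, if present."""
    for label in node.findall("label"):
        if label.get(XML_LANG) == lang:
            return label.text
    return None


def summarise(template_xml, lang="en"):
    """List every element with its localised label, flags and item data types."""
    root = ET.fromstring(template_xml)
    return [{
        "label": label_for(element, lang),
        "mandatory": element.get("isMandatory") == "true",
        "multiple": element.get("isMultiple") == "true",
        "datatypes": [item.get("hasDatatype") for item in element.findall("item")],
    } for element in root.iter("element")]


summary = summarise(TEMPLATE, lang="it")
for row in summary:
    print(row)
```

Selecting labels by xml:lang is what would let the same template drive the interface in more than one language.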

Listing 3: SensorML template excerpt (datasources).
Listing 5: Hidden predefined item example.
Each element defines one or more items that correspond to the individual XML nodes that are populated in the metadata record.
The essential information defining an element is:
• whether the element is shown in the interface (isFixed, see Listing 5)
• the textual explanation shown in the interface (label and help)
• whether it is compulsory (isMandatory)
• the XPath defining the position where the XML node shall be placed (hasRoot, hasPath)
• its value (hasValue, defaultValue, see Listing 6)
• its data type (hasDatatype, see Listing 6)
Elements contain items, which have a data type declared for validation purposes.
A list of available data types is shown in Table 1.
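How an item's value might end up at the position given by its path can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's compilation logic: paths are simplified to slash-separated element names rather than full XPath, the item dictionaries and the helper names set_at_path and resolve_value are hypothetical, and the rule that fixed items always take hasValue while other items fall back to defaultValue is an interpretation of the attributes listed above.

```python
import xml.etree.ElementTree as ET


def set_at_path(root, path, value):
    """Create any missing nodes along a simplified slash-separated path, then set the text."""
    steps = [step for step in path.split("/") if step]
    if not steps or steps[0] != root.tag:
        raise ValueError(f"path {path!r} does not start at <{root.tag}>")
    node = root
    for step in steps[1:]:
        child = node.find(step)
        if child is None:
            child = ET.SubElement(node, step)
        node = child
    node.text = value
    return node


def resolve_value(item, user_input):
    """Assumed rule: fixed (hidden) items use hasValue, others use input or defaultValue."""
    if item.get("isFixed"):
        return item["hasValue"]
    return user_input.get(item["id"]) or item.get("defaultValue", "")


# Hypothetical items: one hidden predefined item and one filled in by the user.
items = [
    {"id": "lang", "isFixed": True, "hasValue": "eng", "hasPath": "/record/language"},
    {"id": "email", "isFixed": False, "defaultValue": "",
     "hasPath": "/record/contact/email"},
]
user_input = {"email": "a@example.org"}

root = ET.Element("record")
for item in items:
    set_at_path(root, item["hasPath"], resolve_value(item, user_input))

xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```

The hidden item reaches the output without ever appearing in the form, which is how mandatory default data can be concealed from the user.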

The interchange format -EDIML
EDIML is an XML format we devised to pass user-entered metadata from the EDI client to the EDI server and back without losing the semantic enrichment EDI provides, even when the destination format is semantics-unaware, as ISO standards unfortunately are. For this reason, EDI always keeps two versions of each metadata record: one in EDIML, e.g. for subsequent editing, and one in the destination format (e.g. SensorML 2.0.0).
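The reason for keeping two versions can be illustrated with a minimal Python sketch. The structure used here for the interchange document (ediml, item, value and uri nodes), the target format, and the example.org URI are invented for the example and do not reproduce the actual EDIML schema; the point is only that the URI survives in the interchange document while a semantics-unaware target retains just the text.

```python
import xml.etree.ElementTree as ET


def to_ediml(fields):
    """Pack form fields into a hypothetical EDIML-like document, keeping URIs."""
    root = ET.Element("ediml")
    for field in fields:
        item = ET.SubElement(root, "item", {"path": field["path"]})
        ET.SubElement(item, "value").text = field["label"]
        if field.get("uri"):
            ET.SubElement(item, "uri").text = field["uri"]
    return root


def compile_target(ediml):
    """Compile to a semantics-unaware target format: only the text value is kept."""
    root = ET.Element("metadata")
    for item in ediml.findall("item"):
        ET.SubElement(root, item.get("path")).text = item.findtext("value")
    return root


fields = [
    {"path": "title", "label": "Mooring data 2015"},
    {"path": "keyword", "label": "salinity",
     "uri": "http://example.org/vocab/salinity"},  # made-up identifier
]

ediml_doc = to_ediml(fields)
ediml_xml = ET.tostring(ediml_doc, encoding="unicode")
target_xml = ET.tostring(compile_target(ediml_doc), encoding="unicode")
print(ediml_xml)
print(target_xml)
```

Reopening a record for editing from the interchange document rather than from the target format would therefore restore the annotations that the target cannot carry.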

Quality control
Functional testing was conducted by manually creating a collection of metadata records, used as a test bench for validation.
Both SensorML (versions 1.0.0 and 2.0.0) and INSPIRE metadata were created by experts from different disciplines (physical and chemical oceanography, marine geology, geophysics, coastal systems, marine ecology, fishery and aquaculture, marine biomolecular science, human impacts, climatology, biogeochemistry, remote sensing, agriculture) and for different types of data resources, in order to account for the wide variability of contents, lexicons, and outlooks, as well as for the subjectivity involved in filling in metadata. Several experts participating in research projects, mainly RITMARE, were involved in the testing; metadata were manually compiled using the template, available in the standard distribution, that refers to the specific metadata (XML) format in question. More than 60 samples of SensorML and more than 50 samples of INSPIRE metadata were compiled for the purpose.
The consistency of the whole metadata collection was checked against the corresponding XML Schemas (as declared in the baseDocument tag), by means of the SensorML and ISO XSDs.
Quality was further tested specifically for the INSPIRE profile by means of the official INSPIRE validator (available at http://inspire-geoportal.ec.europa.eu/validator2/), by manually uploading the samples and obtaining positive results.

(3) Reuse potential
EDI has considerable reuse potential: it can generate documents conforming to any XML schema, so any data format with an XML serialisation can be produced.
Many templates are available in the EDI-NG_templates repository. These templates, which we have been using in several research projects, also serve as samples of what can be attained with templates.
For instance, we provide templates for both SensorML 1.0.1 and SensorML 2.0. As a further example, an EDI template producing RDF/XML output has been developed.
Moreover, as EDI allows for specifying an arbitrary chain of XSL Transformations for post-processing the XML output, it can generate any text-based output format as well.