Transitive Credit and JSON-LD

Science and engineering research increasingly relies on activities that facilitate research but are not currently rewarded or recognized. This includes the sharing of data; development of common data resources, software and methodologies; and annotation of data and publications. This situation has been documented in a number of recent reports [1, 2] that focus on changing needs and mechanisms for attribution and citation of digital products, from the use of alternative metrics [3] that track reports of research impact apart from research publications, to work on data [4]. About half of the articles in many recent issues of Science describe research that depended on software, and a larger fraction analyze data. Indeed, the US National Science Foundation recently updated its guide to proposers to instruct them to provide a list of their “products”— objects that are “citable and accessible including but not limited to publications, data sets, software, patents, and copyrights”—rather than publications [5]. To promote and advance pursuit of activities that facilitate research, we must develop mechanisms for assigning credit, facilitate the appropriate attribution of research outcomes, devise incentives for activities that facilitate research, and allocate funds to maximize return on investment. In this article, we explore how the idea of transitive credit [6, 7], which would credit both direct and indirect contributions, can be implemented. Note that this article is an extended version of an earlier paper [8].

in this case 75% to her/himself, and 25% to the software code previously mentioned. Credit is now transitive, in that the lead software developer of the code can be given credit for 12.5% of the paper. If another paper is later written that extends the product B paper and gives 10% credit to that paper, the lead software package developer will also have 1.25% credit for the new paper.
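The arithmetic in this example can be checked with a short Python sketch. It is a minimal illustration rather than an implementation proposed in this article: the function name transitive_credit, the dictionary layout, and the entity labels are all hypothetical, and the 50% share of the software's credit given to the lead developer is an assumption chosen so that the result matches the 12.5% figure quoted above.

```python
from collections import defaultdict


def transitive_credit(product, creditmaps, scale=1.0, totals=None, path=()):
    """Return the direct plus indirect credit each contriponent holds in `product`.

    `creditmaps` maps a product to {contriponent: weight}; the weights of one
    product are assumed to sum to 1 and the graph is assumed to be acyclic.
    """
    if totals is None:
        totals = defaultdict(float)
    if product in path:
        raise ValueError(f"cycle detected at {product!r}")
    for contriponent, weight in creditmaps.get(product, {}).items():
        share = scale * weight
        totals[contriponent] += share
        # Pass this share on to whatever the contriponent itself credits.
        transitive_credit(contriponent, creditmaps, share, totals, path + (product,))
    return totals


# Hypothetical creditmaps; only 75%, 25%, and 10% come from the text.
creditmaps = {
    "software": {"lead_developer": 0.50, "other_developers": 0.50},  # assumed split
    "paper_B": {"paper_B_author": 0.75, "software": 0.25},
    "paper_C": {"paper_C_authors": 0.90, "paper_B": 0.10},
}

credit_B = transitive_credit("paper_B", creditmaps)
credit_C = transitive_credit("paper_C", creditmaps)

print(f"lead developer's share of paper B: {credit_B['lead_developer']:.2%}")
print(f"lead developer's share of paper C: {credit_C['lead_developer']:.2%}")
```

Each product's own weight is recorded before being passed down, so the software is credited with 25% of paper B while its lead developer is credited with half of that share.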
The value of transitive credit is in measuring the indirect contributions to a product, which today are not quantitatively captured. Because they aren't captured, they aren't rewarded, and there is a disincentive to perform them, due to the cost (in time or something else). If they were captured, this disincentive would be replaced by an incentive, which for software and data would be an incentive to publish and share them in a reusable form.
Transitive credit could be implemented either by adding creditmaps to the metadata stored with DOIs [9], or by building a separate system to store creditmaps and then adding pointers to entries in this system to the metadata stored in DOIs. Here, we focus on the creditmap itself, and how it can be described in a structured, machine (and human) readable form.

Determining Creditmaps and Weights
One of the challenges in implementing transitive credit is in how creditmap entries are determined: what are the items that should be credited? Automated systems, such as those that are imagined to store provenance and encourage reproducibility, will help here, as these systems will store the tools and products (e.g., software and data) used in a set of work. As systems such as Mendeley [10], CiteULike [11], and Zotero [12] become more common, they will be able to generate lists of items that have been read during a set of work. Assembling the list of authors and people to be acknowledged will probably remain somewhat manual, though again tools can help; for example, GitHub provides a listing of committers to a repository.
Once the items are determined, the second challenge is what weight should be assigned to each. While this could be carried out manually, it probably won't be in general, since determining weights for dozens or hundreds of items is probably beyond the average human capability and is also probably not needed. A tool to help determine weights (or a feature of a creditmap tool) seems most likely to be successful. Such a tool would likely have to provide two simultaneous views of the creditmap and weights: one that allows a detailed view of any particular contriponent and its weight, and the other that provides a view (perhaps graphical) of the entire creditmap and weights. Such a tool could offer an equal distribution of weights as a starting point, or a manual distribution of weights to each type of contriponent, with an equal distribution within types (e.g., 50% credit to authors, all of which start as equal; 25% credit to citations, all of which again start as equal; and 25% credit to software and data, again all of which start as equal). A particular value could then be modified, with the tool changing all the other weights so that they continue to sum to 1.
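The weighting behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names initial_weights and set_weight and the item labels are hypothetical, and the text does not say how the other weights should change, so proportional rescaling is used here as one plausible choice.

```python
def initial_weights(items_by_type, type_shares):
    """Split each type's share equally among the items of that type."""
    weights = {}
    for kind, items in items_by_type.items():
        for item in items:
            weights[item] = type_shares[kind] / len(items)
    return weights


def set_weight(weights, item, new_value):
    """Set one weight and rescale all others so the total stays 1.

    Proportional rescaling is an assumption, not a rule from the text.
    """
    if not 0.0 <= new_value <= 1.0:
        raise ValueError("a weight must lie between 0 and 1")
    others_total = sum(w for key, w in weights.items() if key != item)
    if others_total == 0:
        raise ValueError("no other non-zero weights to rebalance")
    factor = (1.0 - new_value) / others_total
    return {
        key: (new_value if key == item else w * factor)
        for key, w in weights.items()
    }


# Hypothetical contriponents, using the 50/25/25 split from the text.
items_by_type = {
    "authors": ["author_1", "author_2"],
    "citations": ["ref_1", "ref_2", "ref_3", "ref_4", "ref_5"],
    "software_and_data": ["software_1"],
}
type_shares = {"authors": 0.50, "citations": 0.25, "software_and_data": 0.25}

weights = initial_weights(items_by_type, type_shares)
adjusted = set_weight(weights, "author_1", 0.40)
```

With these inputs each author starts at 0.25 and each citation at 0.05. Raising author_1 to 0.40 scales every other weight by the same factor, which preserves their relative proportions.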

JSON-LD
JSON-LD (JavaScript Object Notation for Linked Data, http://json-ld.org) is a subset of the key-value based JSON document format that provides a way of describing machine-readable information with semantic context. Popular as an alternative to XML in the web development community, JSON is also used as a base data format for search engines such as Elastic Search [13] and NoSQL data stores such as MongoDB [14].
In the final stages of standardization at W3C [15], JSON-LD is designed to lower the barrier for data publishers who wish to provide 'Linked Data' so that concepts and entities can be identified with certainty. As an example of machine-readable data with semantics, take this JSON snippet describing a person: {"name": "Daniel S. Katz"}. Without additional context there is ambiguity as to what the term describes (such as the name of a person, place, or thing). A better alternative is: {"@context": "http://schema.org", "@type": "Person", "name": "Daniel S. Katz", "@id": "http://orcid.org/0000-0001-5934-7525"}
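To make the role of the context concrete, the following Python sketch parses the second snippet and rewrites its short terms as full identifiers. It is a deliberately simplified illustration and not a conforming JSON-LD processor: the helper expand_terms is hypothetical, and it assumes that every plain term resolves against the single vocabulary prefix http://schema.org/.

```python
import json

person_jsonld = """
{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "Daniel S. Katz",
  "@id": "http://orcid.org/0000-0001-5934-7525"
}
"""

person = json.loads(person_jsonld)


def expand_terms(doc, vocab):
    """Toy expansion: prefix plain keys and the @type value with `vocab`."""
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is consumed, not copied to the output
        if key == "@type":
            expanded[key] = vocab + value
        elif key.startswith("@"):
            expanded[key] = value  # keywords such as @id are kept as-is
        else:
            expanded[vocab + key] = value
    return expanded


expanded = expand_terms(person, "http://schema.org/")
print(json.dumps(expanded, indent=2))
```

After expansion the key "name" is no longer an arbitrary string but an identifier for a specific property, and "@id" ties the record to one particular person.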

Using JSON-LD for Transitive Credit
Smith recently proposed using JSON-LD for research tools [16]. In this paper, we extend this idea to suggest that it could be used for transitive credit of any scholarly product. Because of the namespaced nature of the JSON-LD structure, it is trivial to include all contriponents such as datasets, software, and articles with their appropriate semantic context definition while maintaining a structure that is both human- and machine-readable. Note that we use vocabularies from http://schema.org for the different entities; other examples include DOAP (https://github.com/edumbill/doap) and SPDX (http://spdx.org).
In order to use JSON-LD for product metadata, it needs to be standardized, at least in a de facto sense, both in terms of exactly how it is used (the schema) and how it is stored and accessed.
To determine a schema, a set of trials would likely help. They could be done through the Force11 Attribution working group [17], which then would also provide a venue for discussion and consensus. This group exists to "catalyze rapid convergence on requirements, approaches, and practical implementation of a system for tracking contributions to any scholarly product," [17], including transitive credit among other ideas currently being explored.
This credit information could be stored as part of the metadata within the DOI system, perhaps by updating the kernel metadata and adding a creditmap entry to the DOI data dictionary [9], though the authors note that this is a standardization activity which requires community effort, similar to that being undertaken today by CASRAI (http://casrai.org), VIVO (http://vivoweb.org), and others. Then, creditmap information would need to be added by registration authorities (RAs) that hold these scholarly products (i.e., publishers of all sorts, including journals, data archivers, and universities). Indexing systems such as Thomson Reuters and universities would also need to develop support for creditmaps. Similar complementary changes have also been discussed recently, aimed at characterizing contributions rather than assigning weights [18, 19, 20].

A JSON-LD Example
A subset of a possible creditmap for this article follows, as an illustrative example of the power of this concept. Some interesting aspects of the creditmap include:
• We assign credit to people who are acknowledged, as they have made a contribution to the article, e.g., Howison, Allen, Proctor.
• Software and tools are also cited, e.g., MongoDB, Mendeley. In this case, they are cited solely for the purpose of providing information about tools used; as they have not made a contribution to this article, we give them 0 credit.
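For readers who want to experiment with the idea, the Python sketch below builds a small creditmap-like JSON-LD document and checks that its weights sum to 1. It is not the creditmap of this article: the property names "creditmap", "contriponent", and "weight", the placeholder people, and every non-zero weight are hypothetical and chosen only for illustration. The only values taken from the text are the zero weights for MongoDB and Mendeley.

```python
import json
import math

creditmap_doc = {
    "@context": "http://schema.org",
    "@type": "ScholarlyArticle",
    "name": "Transitive Credit and JSON-LD",
    # "creditmap", "contriponent", and "weight" are hypothetical property names.
    "creditmap": [
        {"contriponent": {"@type": "Person", "name": "Author A (placeholder)"},
         "weight": 0.45},
        {"contriponent": {"@type": "Person", "name": "Author B (placeholder)"},
         "weight": 0.45},
        {"contriponent": {"@type": "Person",
                          "name": "Acknowledged person (placeholder)"},
         "weight": 0.10},
        # Cited for information only, so they receive 0 credit (as in the text).
        {"contriponent": {"@type": "SoftwareApplication", "name": "MongoDB"},
         "weight": 0.0},
        {"contriponent": {"@type": "SoftwareApplication", "name": "Mendeley"},
         "weight": 0.0},
    ],
}


def check_creditmap(doc):
    """Return the total weight, raising ValueError if it is not 1."""
    total = sum(entry["weight"] for entry in doc["creditmap"])
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1")
    return total


total_weight = check_creditmap(creditmap_doc)
zero_credit = [
    entry["contriponent"]["name"]
    for entry in creditmap_doc["creditmap"]
    if entry["weight"] == 0.0
]
print(json.dumps(creditmap_doc, indent=2))
```

Keeping zero-weight entries in the list records that a tool was cited without granting it a share of the credit.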

Conclusions
In this paper we have outlined a mechanism (a creditmap) and a language (JSON-LD) for ascribing complete credit for all contriponents that have led to a scholarly product. This is one of the necessary elements for a transitive credit system. Others include standard practices for how to write the language (a schema), and how to store and access it. Other issues that would need to be resolved for this system to be used in practice are related to identifiers and social practices.
The manner in which we use unique identifiers is somewhat rough; we use ORCIDs for people and DOIs for papers, but for other elements, such as software and tools that may not have a DOI, the solution is not clear. Additionally, the manner in which weights are assigned is also rough; it might be easier to assign weights within categories, then assign weights to categories, and let the system determine the detailed weights.
While this paper addresses the technical aspects of how to make transitive credit possible, many social questions remain unresolved. For example, disciplinary communities will need to decide what the contriponents are that are relevant to their discipline, how to weigh various categories of contriponents (e.g., are developers more, equally, or less important than libraries?), and how the authors of products should assign weights to specific product contriponents (e.g., reference 1 is more/less important to a manuscript than reference 2). Overall, the idea of transitive credit is powerful, as the recognition of contributions (credit) to products would encourage the development of such products, and the transitive nature of the credit would encourage the development of products at all levels, not just those that are likely to be cited in papers.
One potential extension to the creditmap we have proposed would be to replace the "keywords" entry with a description of the subject area from a defined taxonomy. When appropriate, such as in the case of software, this could also include a description of the function of the tool. Using indexing tools such as Elastic Search [13], it would then be possible to build indexes of these fields and then make powerful faceted searches such as 'find astrophysics software, written in Python, designed to manipulate spectroscopic data'.
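The kind of faceted query described above can be illustrated with a small in-memory Python sketch. It does not use a search engine or any real indexing API. The records, the field names "subject", "language", and "function", and the helper names are all hypothetical stand-ins for metadata that a creditmap entry might carry.

```python
from collections import Counter

# Hypothetical metadata records for three software products.
records = [
    {"name": "tool_a", "subject": "astrophysics", "language": "Python",
     "function": "manipulate spectroscopic data"},
    {"name": "tool_b", "subject": "astrophysics", "language": "C++",
     "function": "manipulate spectroscopic data"},
    {"name": "tool_c", "subject": "genomics", "language": "Python",
     "function": "align sequences"},
]


def faceted_search(items, **facets):
    """Return the items whose fields equal every requested facet value."""
    return [
        item for item in items
        if all(item.get(field) == value for field, value in facets.items())
    ]


def facet_counts(items, field):
    """Count how many items carry each value of one facet."""
    return Counter(item[field] for item in items if field in item)


matches = faceted_search(
    records,
    subject="astrophysics",
    language="Python",
    function="manipulate spectroscopic data",
)
language_counts = facet_counts(records, "language")
print([item["name"] for item in matches])
```

Exact matching of this sort only works when the values come from a controlled vocabulary, which is why a defined taxonomy matters more here than free-text keywords.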
One future project that would test the value of transitive credit would be to build creditmaps for a set of software in a given domain, potentially by working with data stored in GitHub that includes contributors and dependencies, and in some cases, citations. This would build a creditmap graph that could then be analyzed.
Finally, we note that our proposal in this paper is compatible with the current altmetrics activities [3], and would add to the alternative metrics that could be collected and analyzed.