Transitive Credit and JSON-LD

Daniel S. Katz; Arfon M. Smith

1 Introduction

Science and engineering research increasingly relies on activities that facilitate research but are not currently rewarded or recognized. These include sharing data; developing common data resources, software, and methodologies; and annotating data and publications. This situation has been documented in a number of recent reports [, ] that focus on the changing needs and mechanisms for attribution and citation of digital products, ranging from the use of alternative metrics [] that track research impact beyond publications, to work on data attribution and citation []. About half of the articles in many recent issues of Science describe research that depended on software, and a larger fraction analyze data. Indeed, the US National Science Foundation recently updated its guide to proposers to instruct them to list their “products” (objects that are “citable and accessible including but not limited to publications, data sets, software, patents, and copyrights”) rather than only their publications [].

To promote activities that facilitate research, we must develop mechanisms for assigning credit, facilitate the appropriate attribution of research outcomes, devise incentives for such activities, and allocate funds to maximize return on investment. In this article, we explore how the idea of transitive credit [, ], which would credit both direct and indirect contributions, can be implemented. This article is an extended version of an earlier paper [].

2 Transitive Credit

Transitive credit involves three main elements. The first is complete credit. Any product should list all authors (as currently listed as authors of a paper), all contributors (as currently listed in the acknowledgements of a paper) and all component products that have been used, including both publications and other products such as software and data (as currently either cited, acknowledged, or not included in a paper). We coin the term “contriponents” for the combination of these contributors and components (though we eagerly welcome suggestions for a better term).

Second, each contriponent needs an assigned weight. Determining how to divide credit among the authors may be difficult, but it should be possible. If this overall idea moves forward, methods for this weighting, whether based on a taxonomy of roles or on a more traditional author list, would likely be developed, along with analyses of these methods and their impact. Just as publications today are submitted by one person who is responsible for making sure that all authors are listed (and perhaps assigned roles in a taxonomy) and that the publication is complete, the submitter would also be responsible for registering this fractional credit, however the values are determined.

The third element of transitive credit is its transitive nature: the creditmap for a product A that is used by a product B feeds into the creditmap for product B. Suppose product A is a software package written by three developers, and its creditmap assigns 50% of the credit to the lead developer, 20% to the second developer, and 10% to the third developer. In addition, 5% goes to each of the four libraries that are needed to run the code. When this product is created and registered, this creditmap is registered along with it. Product B is a paper that obtains new scientific results and depends on product A. The person who registers the publication also registers its creditmap, in this case 75% to her/himself and 25% to the software package. Credit is now transitive: the lead developer of the software can be given credit for 12.5% of the paper (50% of 25%). If another paper later extends the product B paper and gives it 10% credit, the lead developer will also receive 1.25% credit for the new paper (12.5% of 10%).
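The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a registered system: the dictionary layout, the identifiers (such as `lead-dev` and `paper-C-author`), and the 90% self-credit in the follow-on paper C are hypothetical, and the creditmap graph is assumed to have no cycles.

```python
from collections import defaultdict

# Hypothetical creditmaps: product id -> {contriponent id: weight}.
# A contriponent that has its own entry here is a registered product whose
# credit is passed on transitively; anything else (a person, an unregistered
# library) is treated as a leaf that keeps the credit it receives.
CREDITMAPS = {
    "product-A": {  # the software package from the example above
        "lead-dev": 0.50, "second-dev": 0.20, "third-dev": 0.10,
        "lib-1": 0.05, "lib-2": 0.05, "lib-3": 0.05, "lib-4": 0.05,
    },
    "product-B": {"paper-B-author": 0.75, "product-A": 0.25},
    # Hypothetical follow-on paper: 10% to product B, the rest to its author.
    "product-C": {"paper-C-author": 0.90, "product-B": 0.10},
}


def transitive_credit(product, creditmaps, scale=1.0, totals=None):
    """Flatten a product's creditmap into credit for leaf contriponents.

    Each leaf's share is the product of the weights along a path from
    `product` to it, summed over all such paths. Assumes an acyclic graph.
    """
    if totals is None:
        totals = defaultdict(float)
    for contriponent, weight in creditmaps[product].items():
        share = scale * weight
        if contriponent in creditmaps:
            # Registered product: pass its share down through its own creditmap.
            transitive_credit(contriponent, creditmaps, share, totals)
        else:
            totals[contriponent] += share
    return dict(totals)


credit_b = transitive_credit("product-B", CREDITMAPS)
credit_c = transitive_credit("product-C", CREDITMAPS)
print(credit_b["lead-dev"])            # 0.125
print(round(credit_c["lead-dev"], 6))  # 0.0125
```

In this sketch the intermediate products A and B do not appear in the flattened result, because their share is passed through to their own contriponents; a different design could let a product retain part of the credit it receives.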

The value of transitive credit lies in measuring indirect contributions to a product, which today are not quantitatively captured. Because they are not captured, they are not rewarded, and their cost (in time or other resources) becomes a disincentive to making them. If they were captured, this disincentive would be replaced by an incentive, which for software and data would mean an incentive to publish and share them in a reusable form.

Transitive credit could be implemented either by adding creditmaps to the metadata stored with DOIs [], or by building a separate system to store creditmaps and adding pointers to its entries to the DOI metadata. Here, we focus on the creditmap itself, and on how it can be described in a structured, machine- (and human-) readable form.

3 Determining Creditmaps and Weights

One of the challenges in implementing transitive credit is determining the creditmap entries: which items should be credited? Automated systems, such as those envisioned to record provenance and support reproducibility, will help here, as they will record the tools and products (e.g., software and data) used in a piece of work. As reference managers such as Mendeley [], CiteULike [], and Zotero [] become more common, they will be able to generate lists of items that were read during a piece of work. Assembling the list of authors and of people to be acknowledged will probably remain somewhat manual, though again tools can help; for example, GitHub lists the committers to a repository.

Once the items are determined, the second challenge is deciding what weight to assign to each. While this could be done manually, in general it probably will not be: determining weights for dozens or hundreds of items is probably beyond what most people can do well, and that level of precision is probably not needed anyway. A tool to help determine weights (or a feature of a creditmap tool) seems most likely to succeed. Such a tool would likely provide two simultaneous views of the creditmap and weights: a detailed view of any particular contriponent and its weight, and an overall view (perhaps graphical) of the entire creditmap and its weights. Such a tool could offer an equal distribution of weights as a starting point, or a manual distribution of weights by contriponent type with an equal distribution within each type (e.g., 50% of the credit to authors, 25% to citations, and 25% to software and data, with each type's share initially split equally among its members). A particular value could then be modified, with the tool adjusting all the other weights so that they continue to sum to 1.
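The starting distribution and rebalancing behavior described above could look like the following minimal Python sketch. The function names, the contriponent labels, and the choice to rescale the other weights proportionally (so their relative sizes are preserved) are assumptions; the text only requires that the weights keep summing to 1.

```python
def initial_weights(contriponents, type_shares):
    """Give each type its share of credit, split equally within the type.

    contriponents: list of (name, type) pairs.
    type_shares: {type: share}; shares should sum to 1, and every listed
    type should have at least one contriponent, or its credit is lost.
    """
    counts = {}
    for _, ctype in contriponents:
        counts[ctype] = counts.get(ctype, 0) + 1
    return {name: type_shares[ctype] / counts[ctype] for name, ctype in contriponents}


def set_weight(weights, name, new_value):
    """Set one weight, rescaling the others proportionally so the total stays 1."""
    if not 0.0 <= new_value <= 1.0:
        raise ValueError("a weight must lie between 0 and 1")
    others = {k: v for k, v in weights.items() if k != name}
    if not others:
        return {name: 1.0}  # a lone contriponent always receives all the credit
    remaining = sum(others.values())
    updated = {name: new_value}
    for k, v in others.items():
        # Keep the others' relative sizes; if they are all zero, split evenly.
        fraction = v / remaining if remaining > 0 else 1.0 / len(others)
        updated[k] = fraction * (1.0 - new_value)
    return updated


items = [("Author 1", "author"), ("Author 2", "author"),
         ("Ref 1", "citation"), ("Ref 2", "citation"),
         ("Ref 3", "citation"), ("Ref 4", "citation"),
         ("Code X", "software/data"), ("Dataset Y", "software/data")]
weights = initial_weights(items, {"author": 0.50, "citation": 0.25, "software/data": 0.25})
print(weights["Author 1"], weights["Ref 1"], weights["Code X"])  # 0.25 0.0625 0.125
weights = set_weight(weights, "Author 1", 0.40)  # the other seven now share 0.60
```

Proportional rescaling is only one possible policy; a tool could instead take the difference from a single category, or lock weights the user has already set.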

4 JSON-LD

JSON-LD (JavaScript Object Notation for Linked Data, http://json-ld.org) is a JSON-based format (every JSON-LD document is also valid JSON) that provides a way of describing machine-readable information with semantic context. JSON, a key-value based document format, is popular as an alternative to XML in the web development community and is also used as a base data format for search engines such as Elastic Search [] and NoSQL data stores such as MongoDB [].

A W3C Recommendation since January 2014 [], JSON-LD is designed to lower the barrier for data publishers who wish to provide ‘Linked Data’, so that concepts and entities can be identified with certainty. As an example of machine-readable data with semantics, consider this JSON snippet describing a person: {"name": "Daniel S. Katz"}. Without additional context, it is ambiguous what the term describes (for example, the name of a person, a place, or a thing). A better alternative is:

{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "Daniel S. Katz",
  "@id": "http://orcid.org/0000-0001-5934-7525"
}

5 Using JSON-LD for Transitive Credit

Smith recently proposed using JSON-LD for research tools []. In this paper, we extend this idea to suggest that it could be used for transitive credit of any scholarly product. Because JSON-LD terms are namespaced, it is straightforward to include all contriponents, such as datasets, software, and articles, with their appropriate semantic context while keeping the structure both human- and machine-readable. Note that we use vocabularies from http://schema.org for the different entities; other vocabularies that could be used include DOAP (https://github.com/edumbill/doap) and SPDX (http://spdx.org).
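To illustrate what namespacing provides, the Python sketch below expands the keys of a document into absolute IRIs using a small local context, so terms from different vocabularies cannot collide. It is a deliberately simplified stand-in for a conforming JSON-LD processor, which would fetch the remote context and also normalize values into arrays. It assumes, for illustration only, that the schema.org context behaves like an `@vocab` of `http://schema.org/`; the `doap` prefix maps to the DOAP namespace, and the `doap:programming-language` entry on the software record is a hypothetical addition.

```python
CONTEXT = {
    "@vocab": "http://schema.org/",           # assumed stand-in for the schema.org context
    "doap": "http://usefulinc.com/ns/doap#",  # example prefix for the DOAP vocabulary
}


def expand_iri(term, context):
    """Expand a key or @type value to an absolute IRI (simplified rules)."""
    if term.startswith("@") or "://" in term:
        return term  # JSON-LD keyword, or already an absolute IRI
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix  # compact IRI such as doap:name
    return context["@vocab"] + term  # plain term falls back to the vocabulary


def expand(node, context):
    """Recursively rewrite keys (and string @type values) as absolute IRIs."""
    if isinstance(node, list):
        return [expand(item, context) for item in node]
    if not isinstance(node, dict):
        return node  # plain values such as strings are left unchanged
    out = {}
    for key, value in node.items():
        if key == "@context":
            continue  # the context is consumed, not copied
        if key == "@type":
            out[key] = expand_iri(value, context)  # assumes a single string type
        else:
            out[expand_iri(key, context)] = expand(value, context)
    return out


software = {"@type": "Code", "name": "MongoDB", "doap:programming-language": "C++"}
print(expand(software, CONTEXT))
# {'@type': 'http://schema.org/Code', 'http://schema.org/name': 'MongoDB', 'http://usefulinc.com/ns/doap#programming-language': 'C++'}
```

After expansion, the schema.org `name` and the DOAP property sit side by side under distinct IRIs, which is why mixing vocabularies in one creditmap is safe.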

To use JSON-LD for product metadata, the community needs to standardize, at least in a de facto sense, both exactly how it is used (the schema) and how the resulting metadata is stored and accessed.

To determine a schema, a set of trials would likely help. These could be carried out through the Force11 Attribution Working Group [], which would also provide a venue for discussion and consensus. This group exists to “catalyze rapid convergence on requirements, approaches, and practical implementation of a system for tracking contributions to any scholarly product” [], and transitive credit is among the ideas it is currently exploring.

This credit information could be stored as part of the metadata within the DOI system, perhaps by updating the kernel metadata and adding a creditmap entry to the DOI data dictionary [], though this is a standardization activity that requires community effort, similar to that being undertaken today by CASRAI (http://casrai.org), VIVO (http://vivoweb.org), and others. Creditmap information would then need to be deposited, through the DOI registration agencies (RAs), by the organizations that publish these scholarly products (i.e., publishers of all sorts, including journals, data archives, and universities). Indexing services, such as those run by Thomson Reuters, and universities would also need to develop support for creditmaps. Related, complementary changes aimed at characterizing contributions rather than assigning weights have also been discussed recently [, , ].

6 A JSON-LD Example

A subset of a possible creditmap for this article follows, as an illustrative example of the concept. Some interesting aspects of this creditmap include:

We assign credit to people who are acknowledged, as they have made a contribution to the article (e.g., Howison, Allen, and Proctor).
Software and tools are also cited (e.g., MongoDB and Mendeley). Here they are cited solely to provide information about tools mentioned in the text; as they have not contributed to this article, we give them zero credit.

{
  "@context": "http://schema.org",
  "@type": "ScholarlyArticle",
  "headline": "Implementing Transitive Credit with JSON-LD",
  "dateCreated": "2014-07-10",
  "keywords": "transitive credit, credit for code, json-ld, linked data",
  "author": [
    {
      "@type": "Person",
      "name": "Daniel S. Katz",
      "@id": "http://orcid.org/0000-0001-5934-7525",
      "email": "d.katz@ieee.org",
      "creditWeight": "0.25"
    },
    {
      "@type": "Person",
      "name": "Arfon Smith",
      "@id": "http://orcid.org/0000-0002-7217-4494",
      "email": "arfon@github.com",
      "creditWeight": "0.25"
    }
  ],
  "citation": {
    "articles": [
      {
        "@type": "ScholarlyArticle",
        "headline": "Transitive credit as a means to address social and technological concerns stemming from citation and attribution of digital products",
        "doi": "10.5334/jors.be",
        "creditWeight": "0.32"
      }
    ],
    "software": [
      {
        "@type": "Code",
        "name": "MongoDB",
        "codeRepository": "https://github.com/mongodb",
        "license": "http://www.apache.org/licenses/LICENSE-2.0",
        "creditWeight": "0.0"
      }
    ],
    "tool": [
      {
        "@type": "Tool",
        "name": "Mendeley",
        "toolURL": "http://www.mendeley.com",
        "creditWeight": "0.0"
      }
    ],
    "acknowledgment": [
      {
        "@type": "Person",
        "name": "James Howison",
        "@id": "http://orcid.org/0000-0002-5702-149X",
        "email": "james@howison.name",
        "creditWeight": "0.01"
      },
      {
        "@type": "Person",
        "name": "Gabrielle Allen",
        "@id": "http://orcid.org/0000-0003-3106-5360",
        "email": "gdallen@illinois.edu",
        "creditWeight": "0.01"
      },
      {
        "@type": "Person",
        "name": "David Proctor",
        "@id": "http://orcid.org/0000-0002-6068-7110",
        "email": "djproctor@gmail.com",
        "creditWeight": "0.01"
      }
    ],
    "other": [
      {
        "@type": "BlogPosting",
        "headline": "JSON-LD for software discovery, reuse and credit",
        "url": "http://www.arfon.org/json-ld-for-software-discovery-reuse-and-credit",
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "creditWeight": "0.15"
      }
    ]
  }
}
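Because a creditmap like this is plain JSON, simple consistency checks are easy to automate. The Python sketch below is one possible approach; the function names and the tolerance value are assumptions, not part of any proposed standard. It gathers every entry that carries a `creditWeight`, converts the string values to numbers, and checks that they are non-negative and sum to 1. It does not descend into an entry that already has a weight, on the assumption that any weights nested inside it would belong to that contriponent's own creditmap.

```python
import math


def collect_weights(node):
    """Return (label, weight) pairs for every entry carrying a creditWeight.

    Recursion stops at an entry that has its own weight, since any weights
    nested inside it would belong to that contriponent's own creditmap.
    """
    found = []
    if isinstance(node, dict):
        if "creditWeight" in node:
            label = node.get("name") or node.get("headline") or node.get("@id", "?")
            found.append((label, float(node["creditWeight"])))  # stored as strings
            return found
        for value in node.values():
            found.extend(collect_weights(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_weights(item))
    return found


def check_creditmap(doc, tol=1e-9):
    """Raise ValueError unless all creditWeights are non-negative and sum to 1."""
    weights = collect_weights(doc)
    if any(w < 0 for _, w in weights):
        raise ValueError("negative creditWeight found")
    total = sum(w for _, w in weights)
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"creditWeights sum to {total}, not 1")
    return weights
```

Applied to the creditmap above (for example, `check_creditmap(json.load(f))` on a file `f` holding it), the weights pass: 0.50 for the authors, 0.32 for the cited article, 0.03 for the acknowledgments, 0.15 for the blog post, and 0 for the software and the tool.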

7 Conclusions

In this paper we have outlined a mechanism (a creditmap) and a language (JSON-LD) for ascribing complete credit to all contriponents that have led to a scholarly product. This is one of the necessary elements of a transitive credit system. Others include standard practices for how to use the language (a schema) and for how to store and access creditmaps. Other issues that would need to be resolved before this system could be used in practice relate to identifiers and social practices.

The way we use unique identifiers is somewhat rough: we use ORCIDs for people and DOIs for papers, but for other elements, such as software and tools that may not have a DOI, the solution is not clear. Our approach to assigning weights is similarly rough; it might be easier to assign weights within categories, then assign weights to the categories themselves, and let the system determine the detailed weights.
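The category-first approach suggested above amounts to multiplying each category's weight by a contriponent's normalized weight within that category. A minimal Python sketch, in which the function name, category names, and relative weights are hypothetical:

```python
def two_level_weights(category_weights, within_category):
    """Combine per-category weights with relative weights inside each category.

    category_weights: {category: weight}, expected to sum to 1.
    within_category: {category: {contriponent: relative weight}}; relative
    weights are normalized per category, so they need not sum to 1.
    """
    result = {}
    for category, cat_weight in category_weights.items():
        members = within_category[category]
        subtotal = sum(members.values())
        if subtotal <= 0:
            raise ValueError(f"category {category!r} has no positive weights")
        for name, relative in members.items():
            result[name] = cat_weight * relative / subtotal
    return result


weights = two_level_weights(
    {"authors": 0.5, "references": 0.5},
    {"authors": {"Author 1": 1, "Author 2": 1},
     "references": {"Ref 1": 3, "Ref 2": 1}},  # Ref 1 judged 3x as important
)
print(weights)
# {'Author 1': 0.25, 'Author 2': 0.25, 'Ref 1': 0.375, 'Ref 2': 0.125}
```

Because each category is normalized separately, authors only need to express relative judgments (e.g., "three times as important") within a category, and the final weights still sum to 1 whenever the category weights do.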

While this paper addresses the technical aspects of making transitive credit possible, many social questions remain unresolved. For example, disciplinary communities will need to decide which contriponents are relevant to their discipline, how to weight various categories of contriponents (e.g., are developers more, equally, or less important than libraries?), and how the authors of products should assign weights to specific contriponents (e.g., whether reference 1 is more or less important to a manuscript than reference 2).

Overall, the idea of transitive credit is powerful, as the recognition of contributions (credit) to products would encourage the development of such products, and the transitive nature of the credit would encourage the development of products at all levels, not just those that are likely to be cited in papers.

One potential extension to the creditmap we have proposed would be to replace the “keywords” entry with a description of the subject area drawn from a defined taxonomy. Where appropriate, such as in the case of software, this could also include a description of the function of the tool. Using indexing tools such as Elastic Search [], it would then be possible to index these fields and perform powerful faceted searches such as ‘find astrophysics software written in Python and designed to manipulate spectroscopic data’.
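A faceted query like this could be expressed in Elasticsearch's JSON query DSL. The Python sketch below only builds the request body and does not contact a cluster; the function name and the field names (`subject`, `programmingLanguage`, `function`) are hypothetical creditmap fields, assumed to be indexed as exact-match keyword fields.

```python
import json


def faceted_query(filters, facet_fields):
    """Build an Elasticsearch request body that filters on exact field values
    and returns per-value counts (facets) for the requested fields."""
    return {
        "query": {
            "bool": {
                # Each filter is an exact-match "term" clause; all must match.
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        },
        # A "terms" aggregation per field yields the facet counts.
        "aggs": {f"by_{field}": {"terms": {"field": field}} for field in facet_fields},
    }


body = faceted_query(
    {"subject": "astrophysics",
     "programmingLanguage": "Python",
     "function": "spectroscopic data manipulation"},
    facet_fields=["programmingLanguage"],
)
print(json.dumps(body, indent=2))
```

The sketch uses `filter` clauses rather than `must` because exact facet matches do not need relevance scoring.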

One future project that would test the value of transitive credit would be to build creditmaps for a set of software in a given domain, potentially by working with data stored in GitHub that includes contributors and dependencies and, in some cases, citations. This would produce a creditmap graph that could then be analyzed.

Finally, we note that our proposal is compatible with current altmetrics activities [] and would add to the set of alternative metrics that could be collected and analyzed.

Competing Interests

The authors declare that they have no competing interests.

[B1] National Science Foundation (2012). A Vision and Strategy for Software for Science, Engineering, and Education: Cyberinfrastructure Framework for the 21st Century. NSF 12-113. Available from: http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113. Accessed: 2015-02-08.

[B2] Katz, D S, Choi, S C, Lapp, H, Maheshwari, K, Löffler, F, Turk, M, et al. (2014). Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1). Journal of Open Research Software 2(1). DOI: https://doi.org/10.5334/jors.an

[B3] Priem, J, Taraborelli, D, Groth, P and Neylon, C (2011). Altmetrics Manifesto. Available from: http://altmetrics.org/manifesto.

[B4] National Research Council (2012). For Attribution: Developing Data Attribution and Citation Practices and Standards. National Academies Press.

[B5] National Science Foundation (2014). Grant Proposal Guide. NSF 14-1.

[B6] Katz, D S (2013). Citation and Attribution of Digital Products: Social and Technological Concerns. figshare. DOI: https://doi.org/10.6084/m9.figshare.791606

[B7] Katz, D S (2014). Transitive credit as a means to address social and technological concerns stemming from citation and attribution of digital products. Journal of Open Research Software. DOI: https://doi.org/10.5334/jors.be

[B8] Katz, D S and Smith, A M (2014). Implementing Transitive Credit with JSON-LD. CoRR abs/1407.5117. Available from: http://arxiv.org/abs/1407.5117. Accessed: 2015-02-08.

[B9] International DOI Foundation (2012). DOI Data Model. In: The DOI Handbook. International DOI Foundation. Available from: http://www.doi.org/doi_handbook/4_Data_Model.html. Accessed: 2014-07-14.

[B10] Mendeley. Available from: http://www.mendeley.com. Accessed: 2015-02-08.

[B11] CiteULike. Available from: http://www.citeulike.org. Accessed: 2015-02-08.

[B12] Zotero. Available from: http://www.zotero.org. Accessed: 2015-02-08.

[B13] Elastic Search. Available from: http://www.elasticsearch.org. Accessed: 2014-07-07.

[B14] MongoDB. Available from: http://www.mongodb.org. Accessed: 2014-07-07.

[B15] W3C JSON-LD definition. Available from: http://www.w3.org/TR/json-ld/. Accessed: 2014-07-07.

[B16] Smith, A (2014). JSON-LD for software discovery, reuse, and credit. Available from: http://www.arfon.org/json-ld-for-software-discovery-reuse-and-credit. Accessed: 2014-07-10.

[B17] Force11 Attribution Working Group. Available from: https://www.force11.org/group/attributionwg. Accessed: 2015-02-10.

[B18] Davenport, E and Cronin, B (2001). Who dunnit? Metatags and hyperauthorship. Journal of the American Society for Information Science and Technology 52(9): 770–773. DOI: https://doi.org/10.1002/asi.1123

[B19] Allen, L, Scott, J, Brandt, A, Hlava, M and Altman, M (2014). Publishing: Credit where credit is due. Nature 508: 312–313. DOI: https://doi.org/10.1038/508312a

[B20] Häussler, C and Sauermann, H (2014). The Anatomy of Teams: Division of Labor and Allocation of Credit in Collaborative Knowledge Production. SSRN working paper series. DOI: https://doi.org/10.2139/ssrn.2434327

Journal of Open Research Software

Issues in Research Software