In September 2017, 37 people interested in sustainable research software came together at the Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE5.1, wssspe.researchcomputing.org.uk/wssspe5-1/) meeting in Manchester, UK. WSSSPE5.1 immediately preceded the Second Research Software Engineers (RSE) Conference, so that RSE attendees could also attend WSSSPE5.1.
WSSSPE is an international community-driven organization that promotes sustainable research software by addressing challenges related to the full lifecycle of research software through shared learning and community action. It envisions a world where research software is accessible, robust, sustained, and recognized as a scholarly research product critical to the advancement of knowledge, learning, and discovery.
WSSSPE promotes sustainable research software by positively impacting:
WSSSPE defines sustainable software as software that has the capacity to endure, such that it will continue to be available in the future, on new platforms, and meeting new needs. The research software lifecycle includes: acquiring and assembling resources (including funding and people) into teams and communities, developing software, using software, recognizing contributions to and of software, and maintaining software.
Six previous WSSSPE events¹ [1, 2, 3, 4] included group discussions about problems and potential solutions in the sustainable research software space. WSSSPE5.1 used the speed blog methodology to generate eight reports on different views of this space. The blogs were published by the UK Software Sustainability Institute (SSI) on its website.
The remainder of this paper includes a set of presentations from the accepted papers and lightning talks (§2), outputs from the speed-blogging groups (§3), and a thematic analysis of the resulting blogs (§4), before concluding (§5).
Submissions to WSSSPE5.1 comprised six papers and eight lightning talks. Of these, four papers [5, 6, 7, 8] and seven lightning talks [9, 10, 11, 12, 13, 14, 15] were accepted. All papers and lightning talks were published as a figshare collection [16]. Slides for the given talks have been published on the WSSSPE5.1 website (wssspe.researchcomputing.org.uk/wssspe5-1/wssspe5-1-agenda/).
The “research software sustainability space” can be described as a set of activities that impact the sustainability of research software in different ways and on different levels, e.g., how research software is developed by developers, how it is funded, how and where it is published, etc. These activities consist of “actions”, which are undertaken by “agents”, where an “actor” acts on an “actee”. Activities in the space can thus be modeled as a directed graph and represented schematically. In this section, we aim to evaluate which subset of activities was represented in the presentations given at WSSSPE5.1.
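As a minimal illustration (not part of the original analysis), the following Python sketch models activities as (actor, action, actee) triples, i.e., labeled edges of a directed graph; the node and edge labels shown are a small, hypothetical subset of the schematic in Figure 1.

```python
# Minimal sketch of the activity model: each activity is a labeled,
# directed edge from an actor node to an actee node.
# The nodes and actions below are a hypothetical subset of the schematic.
from collections import defaultdict

activities = [
    ("People", "develop", "Software"),
    ("People", "use", "Software"),
    ("Funding organizations", "fund", "People"),
    ("Publishers, repositories, indices", "publish", "Software"),
]

# Index the graph by actor so we can ask what each agent does.
outgoing = defaultdict(list)
for actor, action, actee in activities:
    outgoing[actor].append((action, actee))

for actor, edges in outgoing.items():
    for action, actee in edges:
        print(f"{actor} --{action}--> {actee}")
```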
Figure 1 shows a version of a schematic of activities within the research software sustainability space, as introduced by Katz [17]. The original schematic was the result of introspective work based on experience rather than quantitative research. Author Druskat manually remodeled the original schematic using draw.io [18] to produce Figure 1. To adapt it for the purposes of workshop evaluation, both node outlines and edge labels were weighted to show the distribution of recorded agents and actions across the workshop presentations [19]. This work was based on the presentation abstracts [16] rather than the actual presentations, to ensure traceability.
Figure 1: Research software sustainability space schematic; topic weights based on WSSSPE5.1 presentations.
To calculate the applicable weights, the schematic was resolved into unique activities. A unique activity is a single edge-label verb (the action) going from one node (the actor) to another node (the actee). The presentations were then coded, recording for each presentation whether or not an activity was present. Based on the total number of occurrences of an activity across presentations (x) and the range of occurrence totals of all activities across presentations (r), the font size in points (pt) for the edge labels representing single actions was normalized to the target scaling range t = 12–24, yielding the scaled font size x′:
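Assuming the standard min-max rescaling implied by this description (a reconstruction, not reproduced verbatim from the original), the scaled font size takes the form

\[
x' = \frac{\left(x - \min(r)\right)\left(\max(t) - \min(t)\right)}{\max(r) - \min(r)} + \min(t)
\]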
Similarly, the total number of occurrences of an agent across presentations, either as actor or actee of an activity, was normalized to the target scaling range t = 2–20, yielding the scaled width of the node outline in pt.
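A minimal Python sketch of this normalization, assuming the min-max rescaling above and using made-up occurrence counts rather than the actual coding data from [19]:

```python
def rescale(x, r_min, r_max, t_min, t_max):
    """Min-max rescale an occurrence count x from the observed range
    [r_min, r_max] to the target range [t_min, t_max]."""
    if r_max == r_min:                      # flat range: fall back to the midpoint
        return (t_min + t_max) / 2
    return (x - r_min) * (t_max - t_min) / (r_max - r_min) + t_min

# Hypothetical occurrence counts per action and per agent (not the real coding).
action_counts = {"develop": 9, "use": 5, "fund": 2}
agent_counts = {"People": 11, "Software": 10, "Funding organizations": 2}

# Edge-label font sizes scaled to 12-24 pt.
a_lo, a_hi = min(action_counts.values()), max(action_counts.values())
font_sizes = {a: rescale(n, a_lo, a_hi, 12, 24) for a, n in action_counts.items()}

# Node-outline widths scaled to 2-20 pt.
g_lo, g_hi = min(agent_counts.values()), max(agent_counts.values())
outline_widths = {g: rescale(n, g_lo, g_hi, 2, 20) for g, n in agent_counts.items()}

print(font_sizes)       # {'develop': 24.0, 'use': ~17.1, 'fund': 12.0}
print(outline_widths)   # {'People': 20.0, 'Software': 18.0, 'Funding organizations': 2.0}
```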
Figure 2 shows the distribution of actions over all presentations, based on [19], where an action is a labeled edge from one node in the sustainability schematic (Figure 1) to another node. The source node represents the actor, and the target node the actee, of the respective action. The weights and action distribution show a clear primary focus of presentations on software development (People develop Software). More generally, the people involved in research software feature prominently, as do software engineering principles and research software itself. However, most actors, actees, and actions within the space were the topic of at least one presentation, with the exception of hardware and underlying software. Future workshops could address this topic specifically in their calls for submissions in order to close gaps in research, discussion and progress.
Figure 2: Distribution of activity coverage over presentations.
Taken by themselves, these figures represent only a snapshot of activities around research software sustainability. In order to represent the continuum of efforts and make quantitatively informed statements about both general progress and the specific insights and outcomes from the particular workshop reported on here, a larger corpus of workshop products (abstracts, presentations, blogs, etc.) would have to be analyzed. While such a larger-scale analysis is out of scope for this report, work has started within the WSSSPE community to create an ontology of activities in the research software sustainability space [20, 21], which will enable analyses of this kind in future work.
Table 1 shows the distribution of combined actor and actee references from the research software sustainability space over the workshop presentations; a small illustrative aggregation sketch follows the table.
Table 1
Distribution of topics from the research software sustainability space over workshop presentations for actors and actees (combined).
Presentation | Communities | Funding organizations | Hardware & underlying software | Hiring organizations | People | Publishers, repositories, indices | Software | Software engineering processes |
---|---|---|---|---|---|---|---|---|
[5] | • | • | ||||||
[6] | • | • | • | • | • | |||
[7] | • | • | • | |||||
[8] | • | • | • | • | ||||
[9] | • | • | • | • | ||||
[10] | • | • | • | • | • | |||
[11] | • | • | • | • | ||||
[12] | • | • | • | |||||
[13] | • | • | • | • | ||||
[14] | • | • | • | • | • | • | ||
[15] | • | • | • | • | • | • | • |
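To illustrate how per-topic coverage counts such as those underlying Table 1 (and the action distribution in Figure 2) can be aggregated from a presentation-by-topic coding, the following Python sketch uses made-up coding data rather than the actual coding from [19]:

```python
from collections import Counter

# Hypothetical coding: for each presentation, the set of actor/actee topics
# recorded as present (cf. Table 1); not the actual coding from [19].
coding = {
    "[5]": {"People", "Software"},
    "[6]": {"People", "Software", "Communities", "Software engineering processes"},
    "[7]": {"People", "Software", "Hiring organizations"},
}

# Count in how many presentations each topic occurs.
topic_coverage = Counter(topic for topics in coding.values() for topic in topics)

for topic, count in topic_coverage.most_common():
    print(f"{topic}: {count} presentation(s)")
```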
The Software Sustainability Institute (SSI) discusses speed blogging at www.software.ac.uk/term/speed-blogging. The goal of speed blogging is to preserve as much content and context as possible from working groups, and to publish results in an easily digestible form, usually blog posts. Half the available working group time at WSSSPE5.1 was allocated to discussion, and the other half to writing the blogs. The rest of this section summarizes the WSSSPE5.1 blog posts, which are accessible at wssspe.researchcomputing.org.uk/wssspe5-1/. Indented italicized text indicates quotations from the blogs.
For many, the role of research software project manager (RSPM) may be an accidental calling. The career path for this role isn’t well-established, and research software development in academia may itself be something of a haphazard, nigh-accidental byproduct of conducting domain research. Individuals approaching this role may have little to no wider industry experience, instead approaching the project manager role from research or research software engineering.
Nowadays, software is used in most research. But how the software is created and used, and what it depends on, are not well-understood questions. The importance of such knowledge varies based on the motivation of the reader. On one side, we could be interested in the impact of the software: how many times it has been used and by whom. This type of analysis could come, for example, from funding bodies and organisations to reward the creation of something and help its sustainability, from institutions that hire the people behind that software, or from the software authors to get an understanding of the needs of their users or simply to get credit for their work. Another motivation may be trying to understand the research being carried out with a particular piece of software or set of tools, either for purely academic purposes (e.g., by historians and scholars of science) or with a commercial perspective (such as by intellectual property teams from universities for the monetisation of the software).
Ensuring reproducibility of research has been identified as one of the challenges in scientific research. While reproducibility of results is a concern in all fields of science, the emphasis of this group is in the area of computer software reuse and the reproduction of results. The availability of complete descriptions, ideally including program source code, documentation and archives of all necessary components and input datasets would be a major step to resolving research reproducibility concerns.
Higher quality output is on every university’s wish list, as it leads to a potential increase in QR funding (www.hefce.ac.uk/rsrch/funding/mainstream/), while reducing the reputational risk associated with substandard research practices. Reproducibility is notoriously difficult to achieve and RSEs are an essential part of enabling this […] leading to higher citations and greater research impact.
An RSE team that is permanently employed can be truly agile. Recruiting an expert for a short period of time in an academic institution is virtually impossible. As long as we rely on fixed-term contracts for RSEs, a lot of important work will fail to be done, and funds will not be spent as effectively as they could be.
The citation of research software has a number of purposes, most importantly attribution and credit, but also the provision of impact metrics for funding proposals, job interviews, etc. Stringent software citation practices […] therefore include the citation of a software version itself, rather than a paper about the software. Direct software citation also enables reproducibility of research results as the exact version can be retrieved from the citation.
As a general concept: start small and then go as far as necessary. Reaching for the perfect software development approach is intimidating and overwhelming, and it is not the task of a researcher nor necessary for most research projects. A maturity model can help researchers identify where they are and where they should be […]. Restricting the use of tech jargon to a minimum and offering explanations where necessary can help, too.
What can be termed “coding” is a subset of wider software engineering practices such as version control, continuous integration and good software design. Coding is prevalent in academia, but practices that allow sustainable software to be produced are frequently overlooked. Motivating the uptake of these approaches, methods and tools, and highlighting the benefits they deliver, by engaging with researchers who develop software, is the first step in spreading best practice in our community.
The authors point out the benefit of using online systems such as GitLab to reduce entry barriers and to motivate the use of, e.g., version control and continuous integration (CI). Other software engineering practices such as pair programming and code review are also encouraged early in (graduate) students' training, in order to demonstrate the benefits for future use and to bridge gaps between disciplines. If these research software management practices become a requirement in grant applications and reporting, widespread adoption is inevitable. Increasing recognition of software as a valid research output could provide further motivation, along with better reproducibility and reduced duplication of effort.
The speed blog topics were determined using an unconference format—where participants chose what to cover as a group—and therefore provided a snapshot of issues that were particularly timely and relevant to the WSSSPE community. Participants were free to choose which speed blogging group to join, depending on their own particular interests and goals. In this section, we treat the speed blogs as qualitative data, which we systematically analyse to determine the prevalence of particular themes.
We used a hybrid thematic/framework analytic approach, where we used the schematic of the research software sustainability space shown in Figure 1 as a starting point for our analysis, and then refined this as we familiarized ourselves with the data. The schematic nodes formed the categories: funders (‘funding organizations’ in the schematic); employers (‘hiring organizations’); publishers, repositories, indices; research software (‘software’); software engineering processes; communities. The edges provided the categories: reward & recognition (‘recognize, reward’); training; standardization (‘standardize’); reproducibility (‘reproduce’). Based on a bottom-up analysis of the data, we broke the schematic ‘people’ node down further into users, research software engineers and researchers, and added a further category of software infrastructure.
Authors Jay and Haines coded the blog posts independently, recording for each blog post whether or not each theme was present. This process resulted in an agreement of 79%. Disagreements were then resolved via discussion. The original data set (blog posts, and the individual and joint coding scores) is available for further analysis [24].
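As a minimal sketch (using made-up coding values rather than the actual data set [24]), percentage agreement between two coders' binary theme codings can be computed as follows:

```python
# Percentage agreement between two coders over a flattened blog-by-theme matrix.
# 1 = theme coded as present, 0 = absent; these values are made up.
coder_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1]
coder_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1]

matches = sum(a == b for a, b in zip(coder_a, coder_b))
agreement = matches / len(coder_a)
print(f"Agreement: {agreement:.0%}")   # 11 of 14 codes match -> 79%
```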
All eight blogs mentioned research software and researchers (i.e., domain specialists rather than RSEs). All blogs also mentioned reward and recognition, either for the software itself or for the people writing it. The prevalence of all of the categories across the blog posts can be seen in Figure 3. During the discussion of the speed blog topics, it was decided that there seemed to be sufficient writing on citation and credit already, so while this topic fits under the WSSSPE umbrella, it was not suggested as a speed blog topic. The fact that it was nevertheless mentioned in all the blogs indicates that it is still an important theme for the community.
Figure 3: The number of speed blogs mentioning each of the themes.
In summary, the community recognizes the need to improve software engineering practices and has started to take action around this. A set of initial success stories has been reported from a handful of organizations. Recognition for work, including via citations, remains a topic of interest, with less progress than is needed. Based on the presentations and working groups at WSSSPE5.1, we have presented a set of topics and mapped the presentations and speed blogs onto these topics. The presentations and speed blogs cover most of the topics and do not address subjects outside the list, indicating that the topics (or the themes, a more fine-grained mapping of the space as discussed in Section 4) are a good representation of the space.
While the existing set of topics may serve as a broad overview of issues in the research software sustainability space, we cannot yet use them to make broader, informed statements about how the space itself and the activities taking place within it have developed, or whether – and which – activities have helped to solve issues, and whether there are gaps in activities that the community should aim to fill.
Overall, as we have stated previously [25], we have learned from the first four years of WSSSPE that it is relatively easy to get motivated people to attend a meeting and to spend their time there productively, both doing work and planning more work, but it is very hard to get that additional work to take place after the meeting. Given this, we have turned WSSSPE meetings (including this one, another in 2017, and one in 2018) into gathering places to discuss scientific software sustainability, and into venues where groups that are already in place, or that can be composed of related funded activities, can meet.
¹ The first WSSSPE workshop was named “Working towards Sustainable Software for Science: Practice and Experiences,” which remains the expansion of the WSSSPE group's name, but subsequent workshops were named “Workshop on Sustainable Software for Science: Practice and Experiences.” Together, these reflect that WSSSPE is both a community and a set of workshops.
S. Druskat would like to acknowledge funding assistance from the Software Sustainability Institute. The Software Sustainability Institute is supported by the EPSRC, BBSRC and ESRC Grant EP/N006410/1.
The authors have no competing interests to declare.
Katz, D S, Allen, G, Chue Hong, N, Parashar, M and Proctor, D 2013 First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE): Submission and Peer-Review Process, and Results. arXiv. 1311.3523. Available from: http://arxiv.org/abs/1311.3523.
Katz, D S, Choi, S C T, Lapp, H, Maheshwari, K, Löffler, F, Turk, M, et al. 2014 Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1). Journal of Open Research Software, 2(1). DOI: https://doi.org/10.5334/jors.an
Katz, D S, Allen, G, Chue Hong, N, Cranston, K, Parashar, M, Proctor, D, et al. 2014 Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2): Submission, Peer-Review and Sorting Process, and Results. arXiv. 1411.3464. Available from: http://arxiv.org/abs/1411.3464.
Katz, D S, Choi, S T, Wilkins-Diehr, N, Chue Hong, N, Venters, C C, Howison, J, et al. 2016 Report on the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2). Journal of Open Research Software, 4(1): e7. DOI: https://doi.org/10.5334/jors.85
Nangia, U and Katz, D S 2017 Track 1 Paper: Surveying the U.S. National Postdoctoral Association Regarding Software Use and Training in Research. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5328442.v3
Haupt, C and Schlauch, T 2017 Track 1 Paper: The Software Engineering Community at DLR — How We Got Where We Are. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5331703.v2
Queiroz, F, Silva, R, Miller, J, Brockhauser, S and Fangohr, H 2017 Track 1 Paper: Good Usability Practices in Scientific Software Development. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5331814.v3
Mulholland, D, Alencar, P and Cowan, D 2017 Track 2 Paper: The Future of Metadata-Oriented Testing of Research Software: Automated Generation of Test Regimes and Other Benefits. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5334091.v1
Silva, R 2017 Track 1 Lightning Talk: Research Software in Brazil. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5328043.v1
Struck, A 2017 Track 1 Lightning Talk: How Red Tape and Other Obstacles Are Holding Us Back. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5350501.v2
Washbrook, A 2017 Track 1 Lightning Talk: Continuous Software Quality Analysis for the ATLAS Experiment at CERN. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5348830.v2
Dasler, R 2017 Track 1 Lightning Talk: CERN Analysis Preservation – Contextualising Analyses through Data and Software Preservation. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5330812.v2
Alhozaimy, S, Haines, R and Jay, C 2017 Track 1 Lightning Talk: Forking as a Tool for Software Sustainability — An Empirical Study. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5328796.v1
Maassen, J, Drost, N, van Hage, W and van Nieuwpoort, R 2017 Track 2 Lightning Talk: Software Development Best Practices at the Netherlands eScience Center. In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.5327587.v2
Druskat, S 2017 Track 2 Lightning Talk: Should CITATION Files Be Standardized? In: Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.), Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). figshare. DOI: https://doi.org/10.6084/m9.figshare.3827058
Chue Hong, N, Druskat, S, Haines, R, Jay, C, Katz, D S and Sufi, S (eds.) 2017 Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). University of Manchester, Manchester, UK: figshare. DOI: https://doi.org/10.6084/m9.figshare.c.3869782
Katz, D S 2018 Research Software Sustainability: WSSSPE & URSSI. DOI: https://doi.org/10.6084/m9.figshare.6081248.v1
The JGraph drawio authors 2018 draw.io (Version v8.7.10). Available from: https://github.com/jgraph/drawio/releases/tag/v8.7.10.
Druskat, S 2018 WSSSPE5.1 presentations: Research software sustainability activity analysis (Version 0.2.0). DOI: https://doi.org/10.5281/zenodo.1291506
Druskat, S and Katz, D S 2018 Mapping the research software sustainability space. Available from: https://arxiv.org/abs/1807.01772.
Druskat, S and Katz, D S forthcoming Mapping the Research Software Sustainability Space. In: 2018 IEEE 14th International Conference on eScience (eScience). IEEE Computer Society, 25–30. DOI: https://doi.org/10.1109/eScience.2018.00014
Fenner, M, Katz, D S, Nielsen, L H and Smith, A 2018 DOI Registrations for Software. DataCite. Available from: https://blog.datacite.org/doi-registrations-software/.
Li, K, Yan, E and Feng, Y 2017 How is R cited in research outputs? Structure, impacts, and citation standard. Journal of Informetrics, 11(4): 989–1002. DOI: https://doi.org/10.1016/j.joi.2017.08.003
Jay, C and Haines, R 2018 WSSSPE 5.1–Data for speed blog analysis. DOI: https://doi.org/10.5281/zenodo.1305091
Katz, D S, Niemeyer, K E, Gesing, S, Hwang, L, Bangerth, W, Hettrick, S, et al. 2018 Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4). Journal of Open Research Software, 6(1): 10. DOI: https://doi.org/10.5334/jors.184