Subjective listening tests are routinely conducted by academic researchers and industry professionals to assess the quality of various speech and audio processing algorithms and transmission services. Listening tests often take place in controlled environments for the sake of consistency, but in many cases, listening tests could be undertaken remotely using a suitable web interface. Despite the work of several projects in the past, there is no publicly available, fully hosted listening test platform which allows for easy test creation, deployment and data collection. Here, we present a fully functional end-to-end listening test platform which allows a user to create and share MUSHRA, ACR and A/B tests within minutes. Collected data can then be downloaded in various forms. For users who would prefer to host the system on their own servers, we provide Docker and AWS images for easy installation.
Advances in audio processing technology are typically evaluated with a series of objective and subjective tests that compare performance against previous algorithms. In the case of subjective listening tests, the most commonly used methodologies include Multiple Stimuli with Hidden Reference and Anchor (MUSHRA), Absolute Category Rating (ACR) and A/B comparison testing. Increasingly, these tests are being conducted online using browser-based technology, and it has been shown that the results of online tests correlate well with those obtained in controlled laboratory conditions. Typically, a researcher will design a bespoke testing system or extend an existing framework. The most common frameworks are briefly described here.
The frameworks differ in their support for customisation to suit various experimental requirements. BeaqleJS mainly supports A/B and MUSHRA tests, or modified tests based on these two predefined test forms; webMUSHRA and WAET allow researchers to build tests from a wider range of test forms and rating scales; TheFragebogen adds support for behavioural research by incorporating behavioural scales (the NASA Task Load Index (TLX) and the visual analogue scale (VAS)), allowing free-hand input, and recording the user’s response time; webMUSHRA provides 2D and 3D GUIs to facilitate the reporting of localisation judgements in spatial audio assessment. The frameworks also offer different solutions to the challenge of controlling the environment in remote tests. TheFragebogen’s response time records can be used to scrutinise anomalous user behaviour; the P.808 Toolkit implements the standard crowdsourcing methods, which test the participant’s hearing ability, headphone usage and environment in order to control the test conditions remotely. The P.808 Toolkit is a welcome addition, but it currently runs only on Amazon Mechanical Turk (AMT), a common platform for crowdsourced experiments, where it integrates the experiment design, setup and recruitment procedures in one place. These frameworks are admirable, but they require coding knowledge and a significant time investment to configure, connect to a suitable database layer and deploy on a secure, publicly accessible server.
Our goal in creating this system was to allow any researcher to create and share listening tests easily without the need for coding, managing deployment or hosting. To that end, we have created GoListen, a fully hosted end-to-end listening test platform with support for several popular listening test types along with a variety of survey question types. We are making the source code available, but the key advantage of our system is that we also provide a fully hosted web application. In total, we provide three options for using the system.
The system we present here allows users to create and share a test in minutes with no code or setup required. Currently supported test types include A/B, ACR and MUSHRA, with more test types to follow. Survey questions can be inserted before, after and between audio examples. Question types include multiple choice, check box and free input text answers. Audio playback for each question can be set to loop infinitely or a fixed number of times. The subject can also be required to listen to each audio example in full any number of times before being allowed to proceed. Switching between audio examples is fully synchronised, meaning that audio playback continues uninterrupted regardless of which audio condition is being listened to. Subjects may also loop sections of audio if enabled by the test creator. Individual questions can be optional or required for both audio examples and survey questions. Questions and audio examples can be viewed on individual pages or the full test can be configured as a single page view. Audio test order can be randomised for all test types and stimulus presentation order can be randomised within a MUSHRA test. The system also allows for timing each question. In terms of the test builder, questions can be reordered and edited easily. Tests can be created from a blank template or a predefined template. They can also be edited and duplicated. When the results have been collected, they can be downloaded in CSV and JSON format.
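As a rough illustration of what can be done with a downloaded CSV file, the sketch below computes the mean rating per stimulus. The column names (`respondent`, `stimulus`, `rating`) and the sample rows are assumptions made for this example; the actual export layout is not described here and may differ.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical export: these column names and rows are invented for
# illustration and are not the platform's documented CSV layout.
SAMPLE_CSV = """respondent,stimulus,rating
r1,reference,95
r1,anchor,20
r1,codec_a,70
r2,reference,100
r2,anchor,30
r2,codec_a,60
"""


def mean_rating_per_stimulus(csv_text):
    """Return {stimulus: mean rating} for CSV text with the assumed columns."""
    ratings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ratings[row["stimulus"]].append(float(row["rating"]))
    return {name: mean(values) for name, values in ratings.items()}


summary = mean_rating_per_stimulus(SAMPLE_CSV)
for name, value in sorted(summary.items()):
    print(f"{name}: {value:.1f}")
```

With a real export, the sample text would be replaced by the contents of the downloaded file and the column names adjusted to match it.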
In the following sections, we present the user interface and illustrate how it can be used to create tests.
The user interface is organised in a familiar manner whereby the main navigation is presented in the left pane and the content is presented in the right pane. When a user logs in, they are presented with the dashboard. This is the “home” view and most actions start from here.
A new test is created by tapping the Add Test button (Figure 1). The user can select either a blank test or a pre-populated template created by the administrators of the system. Upon selection, the test design screen opens; this screen is discussed in a later section.
Another way to create a test is to copy an existing one. Tapping the Copy icon adds a new test to the dashboard with the same name as the copied test and “copy” appended to the end. The new test can be renamed and will not contain the responses from the parent test.
Any existing test can be edited from the dashboard by tapping on the Edit icon. This brings the user back into the test design screen, which is discussed below. Note that if a test has already been shared and has received responses, it cannot be edited without making a new copy.
A test can be shared by tapping on the share icon. This presents a modal allowing the user to copy the public link for the test or to launch the test in a new browser window. The test can be shared from the dashboard or from the test design screen.
Designing a test is straightforward using the set of editing tools the system provides. Every test is given a name and description so that the user can locate it on the dashboard later on (Figure 2).
Survey questions can be added by tapping the Add Question button, which reveals a context menu listing the available question types (Figure 2). Options include multiple choice, check box and free input text questions.
Audio questions can be added by tapping the Add Question button, which reveals a context menu listing the available question types (Figure 2). Audio options include A/B, ACR and MUSHRA tests.
Each audio question has a settings menu which controls how the respondent will interact with audio playback, as seen in Figure 7. The audio formats supported are those described in the HTML5 Audio specification. The options are as follows:
As you build the test in the test design screen, you may want to check what it will look like for the respondent. This can be achieved by tapping on the preview icon which can be found on the tool bar which appears at the top and bottom of the test design screen as seen in Figure 2.
Every test has a set of global parameters which can be accessed by tapping on the gear icon in the tool bar at the top of the test design screen. Options include:
After inserting all questions in a test, you may need to edit or change some attributes of the test. The system contains some useful tools to make editing quick and easy. Tools include:
The system currently supports the following tests: Multiple Stimuli with Hidden Reference and Anchor (MUSHRA), Absolute Category Rating (ACR) and A/B comparison testing. In all test types, audio switching is synchronous and seamless. This allows the respondent to switch audio conditions without hearing any discontinuities in the audio playback. A loop tool is optional in all test types.
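The idea behind synchronous switching can be pictured with a small conceptual model in which all stimuli share a single playhead, so that changing the audible condition never resets the playback position. The class and method names below are hypothetical; this is a toy model written in Python for illustration, not the platform's browser-side implementation.

```python
from dataclasses import dataclass


@dataclass
class SyncedPlayer:
    """Toy model: every stimulus shares one playhead (in seconds)."""
    stimuli: tuple
    duration: float
    active: int = 0
    position: float = 0.0

    def advance(self, seconds):
        # Playback wraps around at the end of the excerpt (looping).
        self.position = (self.position + seconds) % self.duration

    def switch_to(self, index):
        # Only the audible stimulus changes; the playhead is left untouched.
        if not 0 <= index < len(self.stimuli):
            raise IndexError("unknown stimulus")
        self.active = index


player = SyncedPlayer(stimuli=("A", "B"), duration=10.0)
player.advance(3.5)   # listen to A for 3.5 s
player.switch_to(1)   # switch to B at the same point in the excerpt
print(player.stimuli[player.active], player.position)
```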
An A/B test is used to compare two versions of the same piece of audio where each version has been subjected to a different version of the same process. When presented with the test, the respondent will see an interface similar to that in Figure 8. The respondent may interact in the following ways:
An ACR test is used to assess the quality of a piece of audio on a labeled 5 point scale. When presented with the test, the respondent will see an interface similar to that in Figure 9. The respondent may interact in the following ways:
A MUSHRA test is used to assess the relative quality of multiple audio stimuli on a continuous scale ranging from 0 to 100. Within the stimuli there is a hidden reference and an anchor. The test is more flexible than the ITU-R BS.1534-3 standard and allows the user to add or omit anchors. When presented with the test, the respondent will see an interface similar to that in Figure 10. The respondent may interact in the following ways:
Test order randomisation is supported for all audio test types. This is achieved during survey creation by adding a Group Divider before and after a group of audio tests as shown in Figure 11. Randomisation of stimuli within a single MUSHRA audio test is also supported and can be activated within the question settings menu of any MUSHRA audio example as seen in Figure 12.
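The two levels of randomisation can be sketched as follows. The `DIVIDER` marker, the function names and the item labels are hypothetical and chosen only for this example; the platform's internal representation of a Group Divider is not described here.

```python
import random

DIVIDER = "<group divider>"  # hypothetical marker standing in for a Group Divider


def randomise_groups(items, rng):
    """Shuffle the items enclosed by each pair of dividers; keep the rest in order."""
    result, group, inside = [], [], False
    for item in items:
        if item == DIVIDER:
            if inside:  # closing divider: emit the group in random order
                rng.shuffle(group)
                result.extend(group)
                group = []
            inside = not inside
        elif inside:
            group.append(item)
        else:
            result.append(item)
    result.extend(group)  # an unclosed group keeps its original order
    return result


def randomise_stimuli(stimuli, rng):
    """Return the stimuli of one MUSHRA trial in a random presentation order."""
    return rng.sample(stimuli, len(stimuli))


rng = random.Random(2020)  # fixed seed only to make the demo repeatable
survey = ["consent form", DIVIDER, "ACR 1", "ACR 2", "MUSHRA 1", DIVIDER, "exit survey"]
order = randomise_groups(survey, rng)
stimuli_order = randomise_stimuli(["reference", "anchor", "codec A", "codec B"], rng)
print(order)
print(stimuli_order)
```

In this sketch, questions outside the dividers stay in their original positions, so an introductory or exit survey is never moved into the randomised block.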
Once the link to a test has been shared, the system will collect responses from any respondent who completes the test. The test owner can check the responses at any time from the Responses tab at the top of the test design screen (Figure 2).
The response view shown in Figure 13 allows a user to:
Respondents also have the ability to delete their response using a unique link furnished to them at the end of a completed test. In this instance, the test owner will receive a notification requesting that the updated results be downloaded.
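One common way to realise such a deletion link is to issue a long random token with each stored response. The sketch below illustrates that general technique under stated assumptions: the base URL is a placeholder, the dictionary is an in-memory stand-in for the database, and none of the names reflect the platform's actual code.

```python
import secrets

BASE_URL = "https://example.org/delete-response"  # placeholder, not the real endpoint

responses = {}  # in-memory stand-in for the response store


def store_response(answers):
    """Save a response and return a hard-to-guess deletion link for it."""
    token = secrets.token_urlsafe(32)  # URL-safe random token
    responses[token] = answers
    return f"{BASE_URL}/{token}"


def delete_response(link):
    """Delete the response addressed by the link; return True if one was removed."""
    token = link.rsplit("/", 1)[-1]
    if token in responses:
        del responses[token]
        return True
    return False


link = store_response({"q1": 4, "q2": "no audible artefacts"})
print(delete_response(link))  # first call removes the response
print(delete_response(link))  # second call finds nothing to delete
```

Because the token is the only handle on the response, no respondent account or identity is needed to support deletion.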
The software is separated into two independent parts: a front-end and a back-end which serves a database. These two parts currently run on the same server as separate processes but could easily be served independently in a microservices-style architecture. The front-end is implemented in React.js and requests data from the back-end, which uses Python to make requests to the database, which is implemented in MongoDB. The architecture is RESTful. In addition to the source code, we also provide the software both as a Docker install and as an AWS AMI (Amazon Machine Image). Details can be found in the GitHub repository.
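To make the RESTful style concrete, the following sketch maps HTTP-style requests onto a single `tests` resource. The route names, status handling and in-memory dictionary (standing in for a MongoDB collection) are assumptions made for illustration; the real back-end's routes and web framework are documented in the repository, not here.

```python
import json

# In-memory stand-in for a MongoDB collection (assumption for this sketch).
tests = {}
_next_id = 1


def _error(status, message):
    return status, json.dumps({"error": message})


def handle(method, path, body=None):
    """Map an HTTP-style request onto the 'tests' resource; return (status, JSON text)."""
    global _next_id
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "tests" or len(parts) > 2:
        return _error(404, "not found")

    if len(parts) == 1:  # collection: /tests
        if method == "GET":
            return 200, json.dumps(list(tests.values()))
        if method == "POST":
            doc = dict(json.loads(body), id=_next_id)
            tests[_next_id] = doc
            _next_id += 1
            return 201, json.dumps(doc)
        return _error(405, "method not allowed")

    # single item: /tests/<id>
    if not parts[1].isdecimal() or int(parts[1]) not in tests:
        return _error(404, "not found")
    test_id = int(parts[1])
    if method == "GET":
        return 200, json.dumps(tests[test_id])
    if method == "DELETE":
        del tests[test_id]
        return 204, ""
    return _error(405, "method not allowed")


status, created = handle("POST", "/tests", json.dumps({"name": "Codec comparison"}))
print(status, created)
print(handle("GET", "/tests/%d" % json.loads(created)["id"]))
```

Each resource is addressed by its URL and manipulated only through the request method, which is what allows the front-end and back-end to be deployed independently.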
For the Docker install, there are 3 containers: frontend, backend and database, all working on the host network. This means each container must have some ports open and connected to local ports of the host. For example, the frontend has ports 80 and 443 opened, which are connected to ports 80 and 443 of the host.
The AWS AMI is based on Ubuntu 18.04 with the latest Docker files installed. The configuration is the same as above.
Some components contain unit testing within the code but in general, end-to-end testing was primarily used to ensure the quality of the software. Alpha testing was carried out by 5 researchers at QxLab in University College Dublin. Testers were instructed to create variants of each test type and to provide test responses in order to ensure that the data is collected and stored as expected. This phase of testing also revealed many useful user experience optimisations. Beta testing was carried out by conducting two real-world studies using the software. The software was then released to some invited universities and companies where more feedback was gathered.
The web application is built with privacy in mind. End-to-end SSL encryption (256-bit) is used by default between client and server. All passwords are stored in encrypted format (256-bit), so they are not human readable in the event of a breach. All passwords must meet a minimum length and combine upper and lower case letters, numbers and symbols. The researcher can delete any and all data at any point in time using the survey deletion and account deletion tools we provide within the app. The respondent is completely anonymous. No login is required and the app collects no data beyond the survey responses. The app does not store any IP addresses or track the respondents in any other way. The respondent can delete their response to a survey at any time using a unique link provided to them at the end of each survey. The privacy statement we use for the hosted web application can be found at https://golisten.ucd.ie/GoListenPrivacyStatment.html.
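A password policy and non-readable storage of this kind could look like the sketch below. It is an illustration under stated assumptions: the minimum length of 8, the use of salted PBKDF2 with SHA-256 (which yields a 256-bit digest) and the iteration count are choices made for the example, since the text does not name the scheme the platform actually uses.

```python
import hashlib
import hmac
import os
import string

MIN_LENGTH = 8        # assumed; the actual minimum is not stated
ITERATIONS = 200_000  # assumed work factor for this sketch


def meets_policy(password):
    """Check length plus presence of lower case, upper case, digit and symbol."""
    return (len(password) >= MIN_LENGTH
            and any(c.islower() for c in password)
            and any(c.isupper() for c in password)
            and any(c.isdigit() for c in password)
            and any(c in string.punctuation for c in password))


def hash_password(password, salt=None):
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Recompute the digest and compare it in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("Listen2Audio!")
print(meets_policy("Listen2Audio!"), verify_password("Listen2Audio!", salt, digest))
```

Only the salt and digest would be stored, so the original password cannot be read back from the database.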
The system is available as a web service at https://golisten.ucd.ie/ where a user can create a free account and begin using the service immediately. We have also made the source code available for use under the MIT License.
The software can be run on many operating systems. The following systems were used during testing.
Minimum system requirements:
All contributors are members of QxLab at University College Dublin, Ireland.
Name: Listening Test Platform
Persistent identifier: https://github.com/QxLabIreland/listening-test
Publisher: QxLab, University College Dublin
Version published: 0.1.0
Date published: 24/05/20
Name: Listening Test Platform
Persistent identifier: https://github.com/QxLabIreland/listening-test
Date published: 24/05/20
The software itself is a listening quality survey creation tool, so the reuse potential is broad. Furthermore, the software is designed to be an end-user system, meaning that it can be used by non-technical users with no coding ability once it has been set up correctly. For example, within a university setting, an audio research lab could install and run the system on a server. Once running, the system handles all user account creation and management for other members of the lab. The survey creation tools within the system follow common design patterns, allowing non-technical users to create and share very complex surveys without needing any knowledge of coding. As described throughout the article, the user can choose from many standard survey components along with the custom audio test components we provide (A/B, ACR and MUSHRA). Combining these components allows a user to achieve a wide variety of listening quality test scenarios.
Since the software provides a full end-to-end survey platform, it could also be extended to other media types such as image and video giving rise to even greater reuse potential. The existing framework already handles account creation, user management, data storage and the survey creation tools which can be used as the common core for any other media quality assessment tasks one might think of. Even beyond media quality, many of the core survey features could be reused for any bespoke platform requiring online survey delivery and data collection.
We would like to acknowledge all those at QxLab who helped test the software.
Funded in part by the Science Foundation Ireland (SFI), and the European Regional Development Fund under Grant 12/RC/2289_P2 and Grant 13/RC/2077_P2.
The authors have no competing interests to declare.
Cartwright M, Pardo B, Mysore GJ, Hoffman M. “Fast and easy crowdsourced perceptual audio evaluation.” In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2016; 619–623. DOI: https://doi.org/10.1109/ICASSP.2016.7471749
Schoeffler M, Bartoschek S, Stöter F-R, Roess M, Westphal S, Edler B, Herre J. “webMUSHRA – A Comprehensive Framework for Web-based Listening Tests.” Journal of Open Research Software. 2018; 6(1): 8. DOI: https://doi.org/10.5334/jors.187
Guse D, Orefice HR, Reimers G, Hohlfeld O. “Thefragebogen: A web browser-based questionnaire framework for scientific research.” In 11th International Conference on Quality of Multimedia Experience, QoMEX, 2019. DOI: https://doi.org/10.1109/QoMEX.2019.8743231
Naderi B, Cutler R. “An Open Source Implementation of ITU-T Recommendation P.808 with Validation.” In Proc. Interspeech 2020. 2020; 2862–2866. [Online]. DOI: https://doi.org/10.21437/Interspeech.2020-2665
MDN Web Docs – Media container formats. 2020 [Online]. Available: https://developer.mozilla.org/en-US/docs/Web/Media/Formats/Containers.
GoListen – GitHub repository. 2020 [Online]. Available: https://github.com/QxLabIreland/listening-test.