(1) Overview

Introduction

Over the past decade or so, digital action cameras (e.g. Garmin VIRB, GoPro, Campark, Akaso, VanTop Moment) have become an increasingly popular tool in various research fields owing to their affordability, compactness, high-resolution imagery, and durability. In the social sciences, they are typically used in visual anthropology [, ] and geography [, ] for collecting a broad range of data in the field. Indeed, videos recorded by digital action cameras (DACs) can enhance analyses based on innovative methodologies such as mobile methods [, ], participatory visual research [, , ], or non-participatory observation studies [, ]. There are also many applications of DACs in transportation studies [, , , ], environmental sciences [, ], and organizational research [, ].

Furthermore, recently introduced automatic video detection tools in the field of computer vision include SegNet [], based on deep convolutional neural networks; Vatic []; YOLOv3; the BGSLibrary []; and development tools associated with the OpenCV libraries [], such as Simple Vehicle Counting [] and Vehicle Detection with Haar Cascades []. Automatic video detection tools are most commonly developed and used in studies of road traffic, where vehicles must be tracked and counted in recordings from a fixed camera [, ]. Consequently, the combined use of computer vision and video analysis has become an interesting tool for understanding urban areas, allowing the detection of traffic conflicts [, ], the classification of users [, ], and the analysis of the behavior of motor-vehicle users [], pedestrians [, , ], and cyclists [].

Since these applications are designed to identify features of road traffic recorded by a fixed camera, automatic detection becomes more difficult when the camera is secured to a moving object, for example, the handlebars of a moving bicycle.

The application presented in this article (Vifeco) does not align with the work on automatic object detection in videos described above, for two reasons. First, we want users to be able to define their own features for identification, so that the application can be used in many domains outside transportation. Second, we sought to develop an application that is not limited to fixed cameras.

The aim of this article is therefore to describe an open-source application (Vifeco) that makes it possible to manually identify features in a video.

Implementation and Architecture

From a technical point of view, Vifeco is open-source software (Apache Licence, version 2.0) written in Java 11 with the JavaFX UI toolkit. The application can be installed on any platform that has a Java Runtime Environment (JRE) and the JavaFX runtime (see the Dependencies section). The user interface currently supports three languages (English, French, and Spanish). The application and its Java source code are available free of charge and can be downloaded at https://github.com/LAEQ/vifeco/releases/.

Vifeco is built with the Griffon framework. Inspired by Grails, this framework enforces concepts such as convention over configuration and dependency injection (JSR 330), which facilitate building modular and independent components and services []. The Griffon framework is also based on the MVC pattern, which allows a clear separation between the model, the controller, and the view. Communication between the modules follows the observer pattern: events are dispatched and listeners are registered, allowing data to be exchanged throughout the application.
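To illustrate this event-driven communication, the following is a minimal, framework-agnostic Java sketch of the observer pattern; the class and event names are hypothetical, and Vifeco itself relies on Griffon's built-in event routing rather than a hand-rolled dispatcher like this one.

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Hypothetical event: a feature of a given category counted at a video timestamp.
    class FeatureAdded {
        final String category;
        final long timestampMillis;
        FeatureAdded(String category, long timestampMillis) {
            this.category = category;
            this.timestampMillis = timestampMillis;
        }
    }

    interface FeatureListener {
        void onFeatureAdded(FeatureAdded event);
    }

    // Minimal dispatcher: modules register listeners and receive published events.
    class EventBus {
        private final List<FeatureListener> listeners = new CopyOnWriteArrayList<>();

        void register(FeatureListener listener) { listeners.add(listener); }

        void publish(FeatureAdded event) {
            // Each registered module (e.g. a timeline or summary panel) reacts independently.
            for (FeatureListener l : listeners) l.onFeatureAdded(event);
        }
    }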

The application’s features allow: 1) the management of several users for counting features on videos; 2) the creation of categories (i.e. features) and collections of categories; 3) playing a video and identifying the features in it (e.g. moving car, moving truck, moving bicycle); 4) exporting and importing the results as JSON files; and 5) analyzing the counting concordance between two sessions (e.g. two users). All of these functions are described in detail in the following section.

The application is split into multiple MVC modules (Figure 1). The Main module loads the different sub-panels of the application and displays the navigation bar from which the user loads the different sub-modules. Next, four modules implement a CRUD (create, read, update, delete) interface for managing videos, users, categories, and collections. These four components receive an injected DAO (data access object) service to persist data in an embedded H2 database (https://www.h2database.com/html/main.html), using the Hibernate ORM (object-relational mapping) library to manage transactions with the database (a minimal sketch of such an entity and DAO pair is given after Figure 1). The module entitled Editor is dedicated to the manipulation of a video and comprises:

  • a Player component with all basic functionalities (play, pause, rewind, forward);
  • a Controls component for adjusting video player settings (playback speed) and customizing the display of category icons (size, opacity, and duration of icons on the video);
  • a Timeline component displaying a panel where the category icons are plotted along the video;
  • a Summary component displaying a panel where the counts for each category are reported.
Figure 1 

Architecture of Vifeco as reflected by the different modules.
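To make the persistence layer concrete, here is a minimal sketch of a JPA-annotated entity and its DAO in the spirit of the description above; the class and field names are hypothetical and do not reproduce Vifeco's actual schema.

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;

    // Hypothetical entity: one counted feature (category + timestamp) on a video.
    @Entity
    class Point {
        @Id @GeneratedValue
        Long id;
        String categoryName;
        double timestampSeconds;
    }

    // Hypothetical DAO: Hibernate (via the JPA API) persists entities to the embedded H2 database.
    class PointDao {
        private final EntityManager em;

        PointDao(EntityManager em) { this.em = em; }

        void save(Point p) {
            em.getTransaction().begin(); // one transaction per write operation
            em.persist(p);
            em.getTransaction().commit();
        }

        Point find(Long id) {
            return em.find(Point.class, id);
        }
    }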

Finally, the last module is dedicated to calculating several concordance indicators, described in detail in the next section, for evaluating the agreement between two users.

Before describing the interface, it is worth noting that when the executable is first launched, the application creates a folder entitled vifeco at the root of the user's home directory (e.g., ~/vifeco on Linux and macOS, and c:\users\<username>\vifeco on Windows).

Vifeco Interface

The application’s interface is easy to use and is organized into different panels. First, four CRUD interfaces are provided to manage users, categories (i.e. features), collections, and videos (Figures 2 to 5, in which the red square indicates the button that activates each panel).

Figure 2 

CRUD interface for managing users.

It is important to remember that a category is a feature that the user wants to locate and count in a video, such as a moving car or a moving truck. The application already includes several categories (Figure 3). To create a new category, the user types its name and specifies a unique keyboard shortcut. For the icon shape, the application only supports an SVG (Scalable Vector Graphics) file with a single path attribute (https://developer.mozilla.org/en-US/docs/Web/SVG/Element/path). Using this W3C vector standard makes it easy to change the color and size of the icons that will be overlaid on the video (Figure 3); a minimal sketch using the JavaFX SVGPath class is given after Figure 4. Once the icons are created, application users can then create a new collection of categories (Figure 4).

Figure 3 

CRUD interface for managing categories.

Figure 4 

CRUD interface for managing collections.
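As announced above, the JavaFX SVGPath class is one straightforward way to turn a single-path SVG string into a node whose color and size can be changed at runtime. The helper below is a minimal sketch; the path string in the comment is a placeholder triangle, not one of Vifeco's actual icons.

    import javafx.scene.Node;
    import javafx.scene.paint.Color;
    import javafx.scene.shape.SVGPath;

    class IconFactory {
        // Build a recolorable, rescalable icon from a single SVG path string.
        static Node makeIcon(String svgPath, Color color, double scale) {
            SVGPath icon = new SVGPath();
            icon.setContent(svgPath); // e.g. "M 0 0 L 10 0 L 5 10 Z" (placeholder)
            icon.setFill(color);      // per-category color
            icon.setScaleX(scale);    // size adjustable from the Controls panel
            icon.setScaleY(scale);
            return icon;
        }
    }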

Another interface allows users to manage videos. First, the user can add a video to Vifeco by clicking on the icon surrounded by the blue square, which activates an open-dialog window for selecting a single file. Each new video is assigned to the default user and the default collection; these can be changed by clicking on the user or collection name, which opens a drop-down menu listing the users or collections to choose from. Selecting a video displays the counts for each category and allows the user to start a session by clicking Start (Figure 5).

Figure 5 

CRUD interface for managing videos.

Once a video has been added, the user can start a work session by selecting it and then clicking the Start button. A new window opens with the editor. The interface is divided into four panels indicated by the label numbers in Figure 6: 1) the video, 2) the player functionalities, 3) a table summarizing the counts for each category, and 4) a table listing all the features added by the user. While viewing the video, the user can place the mouse pointer on a video feature and press the keyboard shortcut associated with a category to precisely add an icon above it (a minimal sketch of this interaction is given after Figure 6). In case of an error, a feature can be removed either by clicking its icon or with the delete button in panel 4.

Figure 6 

Interface for editing video.
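As a rough illustration of this interaction, the sketch below tracks the mouse position over the video pane and, when a category's keyboard shortcut is pressed, overlays a fresh icon at that position. All names are hypothetical and the logic is simplified relative to Vifeco's actual editor (which also records the timestamp and updates the tables).

    import java.util.function.Supplier;
    import javafx.scene.Node;
    import javafx.scene.input.KeyCode;
    import javafx.scene.layout.Pane;

    class OverlayInstaller {
        // Track the pointer over the video pane; drop a category icon there on each key press.
        static void install(Pane videoPane, KeyCode shortcut, Supplier<Node> iconFactory) {
            double[] mouse = new double[2];
            videoPane.setOnMouseMoved(e -> { mouse[0] = e.getX(); mouse[1] = e.getY(); });
            videoPane.setOnKeyPressed(e -> {
                if (e.getCode() == shortcut) {
                    Node icon = iconFactory.get();
                    icon.relocate(mouse[0], mouse[1]);
                    videoPane.getChildren().add(icon);
                }
            });
            videoPane.setFocusTraversable(true); // the pane must have focus to receive key events
        }
    }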

Clicking on a row of the table brings the video directly to the timestamp associated with that feature. The user can also manually edit the elapsed-time text field using the format HH:MM:SS.
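A minimal sketch of this seek behavior, assuming playback through a standard JavaFX MediaPlayer (the helper itself is hypothetical; seek() and Duration are part of the JavaFX API):

    import javafx.scene.media.MediaPlayer;
    import javafx.util.Duration;

    class SeekHelper {
        // Parse an "HH:MM:SS" string and move the player to that position.
        static void seekTo(MediaPlayer player, String hhmmss) {
            String[] parts = hhmmss.split(":");
            int seconds = Integer.parseInt(parts[0]) * 3600
                        + Integer.parseInt(parts[1]) * 60
                        + Integer.parseInt(parts[2]);
            player.seek(Duration.seconds(seconds));
        }
    }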

The player functionalities offer a play button, a pause button, and a slider to scroll through the video. The Controls button opens a separate window with sliders to adjust the playback speed and various settings for the display of the icons (size, opacity, duration of display). The blue button allows selection of a second video file to be played in a separate window that stays synchronized with the main player; this can be useful for watching a second camera recording from a different angle.
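The speed adjustment and two-player synchronization can likewise be pictured with the standard MediaPlayer API. The sketch below only aligns the playback rate and re-seeks the secondary player when it drifts; it is a simplification written for illustration, not Vifeco's actual implementation.

    import javafx.scene.media.MediaPlayer;
    import javafx.util.Duration;

    class PlayerSync {
        // Apply the same playback speed to both players and re-align the secondary
        // player whenever it drifts from the main one by more than half a second.
        static void sync(MediaPlayer main, MediaPlayer secondary, double rate) {
            main.setRate(rate);
            secondary.setRate(rate);
            main.currentTimeProperty().addListener((obs, oldTime, newTime) -> {
                Duration drift = newTime.subtract(secondary.getCurrentTime());
                if (Math.abs(drift.toMillis()) > 500) {
                    secondary.seek(newTime);
                }
            });
        }
    }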

Evaluating the agreement between two users

Vifeco allows cross-validation between the counts produced by two users. This is a crucial feature for validating the counting process and ensuring the robustness of the final measure.

This is achieved by pairwise comparison of the counts from two users for the same video. Tabular and graphical outputs are provided to describe the level of agreement (Figure 7).

Figure 7 

Concordance between two users.

In the table, three columns are shown for each user: the number of times each category has been counted (Total), and the numbers of matched (Concordant) and unmatched (Discordant) counts. Finally, the last column provides the global concordance index (Figure 8), defined as the percentage of all recorded features that have been counted by both users (this definition is formalized after Figure 8).

Figure 8 

Concordance index between two users.
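Writing C for the number of matched (concordant) features and D1 and D2 for the discordant counts of the two users, the verbal definition above can be formalized as follows (this is our reading of the definition, not a formula taken from the source code):

\[ \mathrm{CI} = 100 \times \frac{C}{C + D_1 + D_2} \]

so that CI equals 100 when every feature recorded by either user is matched by the other, and 0 when no features are matched.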

The overall concordance index is calculated for the full video (last row of the table, Figure 7). Note that if the number of features differs widely between categories, the overall index will mask strong disagreement in categories with fewer features. This is why a concordance index per category is also provided.

Finally, the chart provides temporally detailed information for a temporal window selected by the user (one minute in the figure), so that the analyst can quickly locate the portions of the video where agreement or disagreement is strong. The bar chart shows the matched counts (agreements) in green and the unmatched features from the two users (disagreements) in red and orange. A second chart represents the concordance index (%) calculated for each temporal window.

Vifeco can also export the counts and their matching as JSON files to allow more in-depth analysis with other statistical software (e.g. R) if needed.

Quality Control

All the entities and core services are fully tested with the Spock framework. Validation and relations between entities are defined using annotations with the Hibernate ORM. To guarantee data integrity, the configuration does not allow any cascading deletion, which could accidentally result in corrupted data. The data access object (DAO) layer, which executes the CRUD operations, is tested against an in-memory H2 database. Serialization and deserialization of data are also tested to validate the import and export of JSON files. The statistics service is tested against different data sets to check the accuracy of the Tarjan algorithm and the creation of the graphs used for matching features between two sessions.
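As a rough illustration of what matching involves, the greedy sketch below pairs the features of one category from two sessions when their timestamps fall within a tolerance. This simplification is ours, written for illustration only; Vifeco's actual service builds graphs and applies Tarjan's algorithm, so the repository should be consulted for the real logic.

    import java.util.ArrayList;
    import java.util.List;

    class Matcher {
        // Greedy pairing for one category: two features match if their timestamps
        // (in seconds) differ by at most `tolerance`; each feature matches at most once.
        static int countConcordant(List<Double> a, List<Double> b, double tolerance) {
            List<Double> remaining = new ArrayList<>(b);
            int concordant = 0;
            for (double t : a) {
                for (int i = 0; i < remaining.size(); i++) {
                    if (Math.abs(remaining.get(i) - t) <= tolerance) {
                        remaining.remove(i);
                        concordant++;
                        break;
                    }
                }
            }
            return concordant;
        }
    }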

Because of constant changes in the UI, we did not automate UI testing; it was done manually during the development of the application. Any bugs can be reported on GitHub by creating an issue.

(2) Availability

Operating system

Windows, Mac OS X and Linux operating systems.

Programming language

Java with the Griffon framework 2.15.1 (http://griffon-framework.org/), an embedded H2 database (https://www.h2database.com/html/main.html), and the Bootstrap toolkit (https://getbootstrap.com/). The free icons used in Vifeco are distributed by http://iconmonstr.com.

Additional system requirements

The JavaFX media player supports a limited number of audio and video encoding types. Please refer to the JavaFX documentation at https://openjfx.io/javadoc/11/javafx.media/javafx/scene/media/package-summary.html.

Dependencies

Starting with Java 9, JavaFX is bundled with neither the Oracle nor the OpenJDK JDK/JRE; it has been broken out into its own separate modules. Therefore, it must be downloaded separately and configured at compile time and runtime. More information on how to compile and run Java 11 programs with JavaFX is available at http://openjfx.io.
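For reference, launching a JavaFX application with a separately downloaded JavaFX SDK typically looks like the following; the paths and the exact list of modules are placeholders, and the authoritative instructions are at http://openjfx.io.

    java --module-path /path/to/javafx-sdk-11/lib \
         --add-modules javafx.controls,javafx.media \
         -jar vifeco.jar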

A simpler alternative is to use Liberica JDK from BellSoft (https://bell-sw.com/pages/downloads/). Based on OpenJDK, it comes bundled and configured with JavaFX, and it is available for Java 8, 11, and 15 on Windows, macOS, and Linux. This is the configuration we used for development and the one we recommend for Vifeco; it is also the runtime bundled with the Vifeco installer.

Another alternative is Zulu JDK from Azul (https://www.azul.com/downloads/zulu-community/?package=jdk), although we do not guarantee its compatibility with Vifeco.

List of contributors

Project management: Philippe Apparicio (Full professor, INRS, Montreal, Canada). Software development: David Maignan (Undergraduate student in computer science and software engineering, UQAM, Montreal, Canada). Collaborator for the concordance module: Jérémy Gelb (PhD student in urban studies, INRS, Montreal, Canada). Contributors for translation: Jean-Marie Buregeya and Victoria Jepson (English), Dominique Mathon (Spanish).

Software location

Archive

Name: Zenodo

Persistent identifier: DOI: 10.5281/zenodo.3417052

Licence: Apache Licence 2.0

Publisher: David Maignan

Version published: 3.0

Date published: 09/04/2021

Code Repository

Name: GitHub

Identifier: https://github.com/LAEQ/vifeco

Licence: Apache Licence 2.0

Date published: 09/04/2021

Language

English, French, Spanish

(3) Reuse potential

Vifeco has been designed to be a versatile application, and the functionalities developed make it usable in a broad set of research fields.

Our urban studies laboratory uses the application for counting the different types of vehicles encountered during bicycle trips (moving or stopped cars, heavy vehicles, bicycles). Additionally, it can be used to record and categorize conflicts with other road users and pedestrians. In short, every research project in need of a human count of objects appearing in recorded videos could make good use of Vifeco. For example, Vifeco can be useful for studies analyzing recorded discourses, audiences, fauna, or interactions in public space. That is why we publish it as open-source software, making it accessible and modifiable for more specific research needs.

We strongly believe that this software could have a direct use in many research fields. The development of algorithms to automatically detect features in video is very promising; however, two problems limit this potential. First, most machine-learning algorithms require a large number of pre-classified examples before they can perform well, and are thus not suitable for many research projects. Second, the features to be detected in videos risk being too subtle to be caught by a computer, such as in studies of people's interactions. In the same vein, when counting vehicles during bicycle trips, stationary vehicles with running engines must be distinguished from parked vehicles. Therefore, human judgement remains, and will remain, a necessity in many projects despite the development of computer vision.