GIFT-Grab: Real-time C++ and Python Multi-channel Video Capture, Processing and Encoding API

GIFT-Grab is an open-source API for acquiring, processing and encoding video streams in real time. GIFT-Grab supports video acquisition using various frame-grabber hardware as well as from standard-compliant ­network­streams­and­video­files.­The­current­GIFT-Grab­release­allows­for­multi-channel­video­acquisition­ and encoding at the maximum frame rate of supported hardware – 60 frames per second (fps). GIFT-Grab builds­on­well-established­highly­configurable­multimedia­libraries­including­FFmpeg­and­OpenCV.­­GIFT-Grab­ exposes­a­simplified­high-level­API,­aimed­at­facilitating­integration­into­client­­applications­with­minimal­ coding­effort.­The­core­implementation­of­GIFT-Grab­is­in­C++11.­GIFT-Grab­also­features­a­Python­API­ compatible­with­the­widely­used­scientific­computing­packages­NumPy­and­SciPy. GIFT-Grab was developed for capturing multiple simultaneous intra-operative video streams from medical imaging devices. Yet due to the ubiquity of video processing in research, GIFT-Grab can be used in many other areas. GIFT-Grab is hosted and managed on the software repository of the Centre for Medical Image Computing (CMIC) at University College London, and is also mirrored on GitHub. In addition it is available for installation from the Python Package Index (PyPI) via the pip installation tool.


Introduction
GIFT-Grab is an open-source application programming interface (API) for acquiring, processing and encoding video data in real time.GIFT-Grab supports live video acquisition via supported frame-grabber hardware as well as the capture of standard-compliant network streams [1].GIFT-Grab also supports offline video acquisition from video files.GIFT-Grab is implemented in C++11 yet it also comes with a Python API that facilitates video data processing with NumPy [2] and SciPy [3], two popular Python scientific computing packages widely used in academia.GIFT-Grab can be built from source code and can also be installed from the Python Package Index (PyPI) [4] using the pip installation tool [5] (please see the installation note in the Availability section).
GIFT-Grab leverages the video capture and processing functionality of external multimedia libraries including FFmpeg [6] and OpenCV [7], and of software development kits (SDKs) provided by frame-grabber hardware manufacturers.Although FFmpeg comes with powerful capabilities, it is architected in a low-level C-language coding style.Due to this, it requires unintuitive configuration involving numerous parameters, even for seemingly simple tasks.On the other hand, OpenCV provides a higherlevel API than FFmpeg, resulting in a shallower learning curve.However it does not support callbacks; but instead requires the client application to query video sources for new data.
Video data acquired from external sources is frequently used in an image processing pipeline.It becomes a challenge for the end user to convert between various datatypes representing video data in different software packages, especially when using various capabilities offered by different libraries and SDKs.It is not uncommon for different video processing libraries to have no common data interface other than a byte array.On the other hand, scientific computing packages like SciPy [3] provide powerful image processing capabilities in conjunction with more natural datatypes like the NumPy array [2].GIFT-Grab aims to bridge this gap of video acquisition, processing and subsequent encoding, by providing a pipeline-oriented architecture which facilitates real-time video processing with minimal coding effort.We have developed GIFT-Grab to meet the requirements of the international research initiative "Guided Instrumentation for Fetal Therapy and Surgery" ("GIFT-Surg") [8].GIFT-Surg involves major innovations in science and engineering combined with clinical translation for improving fetal therapy and diagnosis by providing advanced image processing and visualisation capabilities.We have already obtained promising results with a number of novel methods including real-time mosaicing of the placenta [9] and sensor-free real-time instrument tracking [10].These methods rely on real-time video streams from medical devices such as endoscopes and ultrasound probes.GIFT-Grab seamlessly makes video data from medical devices available for use in these medical imaging methods.It supports the simultaneous acquisition of multi-channel video streams at up to 60 Full HD (1080p) fps.
An integral part of the development cycle of new medical imaging methods is validation with data representative of the targeted medical scenario.In the case of GIFT-Surg this requires making intra-operative video data from medical devices available for offline use.However raw video data imposes an impractical burden in terms of storage requirements.For instance one Full HD (1920 × 1080) video frame with three channels occupies approximately 6 megabytes (MB) in raw format.At a frame rate of 60 fps, storing an hour's worth of intra-operative endoscopic video requires 1.25 terabytes (TB) of storage space.It is clear that permanently storing raw video is not a scalable solution.However it is also easy to see that even temporarily storing raw video (e.g. for offline encoding) would push storage hardware to its limit if not handled carefully.To remedy this, GIFT-Grab supports real-time encoding of video in popular formats including High Efficiency Video Coding (HEVC) [11] and Xvid [12].GIFT-Grab leverages the real-time data processing capabilities of graphics processing units (GPU) for streamlining intra-operative video capture.GIFT-Grab is made available to the community as open-source under a permissive licence [1,4].

Implementation and architecture
Before diving into the details of the GIFT-Grab architecture, listing 1 illustrates a simple video processing pipeline using the GIFT-Grab Python API.As an example, this pipeline includes saving Gaussian-smoothed video data captured from a frame-grabber card to a file.The reader should note that the source code in listing 1 is trimmed for brevity.A full working example is provided in listing 3. Listing 1: Python code snippet demonstrating a GIFT-Grab pipeline for capturing live video data from an Epiphan DVI2PCIe Duo card [13]; processing and encoding it for saving to a video file.Note that the Python imports (six lines of code) have been excluded for brevity.The processing node of the pipeline is a class called GaussianS-mootherBGRA and defined in listing 2. A full working example is provided in listing 3.
GIFT-Grab uses the abstract factory design pattern [14] as an abstraction layer representing the creation of connections to supported devices.Upon request, the video source factory creates a polymorphic object that implements the abstract video source interface (IVideoSource).This is illustrated in the UML class diagram in Figure 1.The video source factory manages each device connection as a singleton [14].As a safety net against potential resource issues, it also takes care of properly destroying all created device connection singletons at the end of its lifetime.
GIFT-Grab relates video producers (sources) to video consumers (targets) via the observer design pattern [14].The observer design pattern is a high-level design paradigm similar in concept to function callbacks.It allows observers to subscribe to video sources (i.e. to be attached), so as to be notified of each new video frame.The observableobserver hierarchy in GIFT-Grab is illustrated by the UML class diagram in Figure 2.
The observable-observer hierarchy is a solid building block that facilitates video processing pipelines in GIFT-Grab.The exemplary processing pipeline shown in listing 1 includes a processing node.This node is the GaussianSmootherBGRA class detailed in listing 2 that smoothes arriving frames using the GIFT-Grab NumPy [2] data wrapper in conjunction with SciPy image processing functions [3], and subsequently passes them further down the pipeline.Similarly, each IVideoTarget is an IObservers, with its update method automatically calling the append method (i.e.appending that particular frame to the file).Listing 2: Python code snippet showing a sample video processing node: upon receiving a video frame GaussianSmootherBGRA smoothes it with a Gaussian kernel and feeds it further down the processing pipeline (such as the one shown in listing 1) -cf. Figure 2. GIFT-Grab wraps video data into NumPy arrays [2] to allow for processing with SciPy routines [3].Note that the sole purpose of this code snippet is to demonstrate GIFT-Grab's compatibility with NumPy and SciPy.As such GaussianSmootherBGRA does not implement any data buffering (such as a ring buffer).Data buffering might be required to enable such a processing node to cope with real-time frame rates depending on the capabilities of the platform hardware.
The GIFT-Grab IVideoSource implementors make use of functionality provided by external libraries such as OpenCV [7] and Epiphan Video Grabber SDK [15] to realise the actual data connection to the supported devices.Some of these libraries inherently support callbacks, thereby making the implementation of the observer design pattern intuitive.However others operate on a data-querying basis rather than providing a callback mechanism.In GIFT-Grab we remedy this issue by using the visitor design pattern [14]: a daemon object acts as the visitor to any IVideoSource implementor that does not inherently support the observer paradigm.This daemon object queries the video source in regular time intervals for new video frames, and subsequently notifies all observers of that video source.This is shown in Figure 3.
GIFT-Grab uses the abstract factory design pattern [14] also for abstracting the creation of video targets (see Figure 4).Video targets abstract the encoding of video frames to video files.The VideoTargetFactory operates on the "resource acquisition is initialisation" (RAII) principle [16]: each created IVideoTarget is immediately ready for use (see listing 1).As shown in Figure 2 each video target is also an observer in the observable-observer hierarchy, which allows it to be attached to a video source for automatically saving each frame from a connected device.The abstract factory design pattern helps minimise the exposed API, which allows for a better encapsulation of the underlying implementation details.

Real-time encoding benchmarks
We benchmarked the real-time performance of GIFT-Grab by measuring the time it takes to encode video frames to a video file using a GPU.The video frames were read from the "Big Buck Bunny" sequence [17], in two different colour spaces: I420 and BGRA.They were then used for producing three different HEVC-encoded MP4 video files [11,18] of different resolutions: We measured the wall-clock execution time of the append function of VideoTargetFFmpeg using the Boost Timer Library [19].This execution time includes copying the video data from the CPU to the GPU, encoding the frame and actually writing it to the target video file.In

Table 1:
Recorded maximum CPU load (as percentage) and maximum system memory used (in gigabytes) while recording each of the datasets shown in figure 5. Note that these are system-wide measurements to give an idea of the system load during the benchmarking process.
GeForce GTX TITAN X GeForce GTX 980 Ti I420 frames HD: 14.3%, 5.7 GB addition we monitored the GPU memory used throughout the process, via the NVIDIA System Management Interface [20].We report the maximum recorded GPU memory value for each case.Furthermore we monitored the total CPU load and the total system memory usage, using the Bash commands top and free respectively (see Table 1).
We obtained these measurements on a workstation with two GPUs.At the time of recording, the workstation was in an idle state, with each of the GPUs running only a graphical user session.The hardware and software specifications of the workstation follow:  2. This is because the first frame is used for inferring the resolution to use in the resulting video file.This information is subsequently used for initialising the file writer object, including memory allocation.
The upper row in Figure 5 demonstrates that in the absence of colour conversion (when I420 frames are fed to the file writer), all recorded execution times are well below 16.67 msec, the time interval between two video frames at 60 fps.The lower row shows that even when colour conversion is involved, 60 fps can still be achieved for Full HD video frames.However neither 60 fps nor 30 fps can be guaranteed for 4K video frames when colour conversion is involved.On the other hand, the relatively low GPU memory consumption suggests that multiple frames of the same video stream or frames of multiple video streams may be encoded in parallel.This could thus potentially yield higher frame rates when colour conversion is needed.Encoding multiple frames of the same video stream however requires proper synchronisation mechanisms to preserve the order of video frames, as well as proper handling of the inter-frame dependencies.

Quality control
GIFT-Grab can be built from source code or installed via the pip installation tool [5] from the Python Package Index (PyPI) [4].We encourage the reader interested in the GIFT-Grab Python API to use this option (please see the installation note in the Availability section).Building GIFT-Grab from source requires configuration via CMake [21].This installation option comes with a CMake discovery script that allows for seamlessly including GIFT-Grab in C++ software projects that use CMake [21].
Delivering high-quality sustainable medical imaging software is one of the core goals of GIFT-Surg [8].Owing to the fact that the GIFT-Surg software deliverables are intended to be used for pre-operative surgical planning and intra-operative interventional guidance applications, we follow a development process that is inspired by the internationally accepted IEC EN 62304:2006 Standard "Medical device software -Software life cycle processes" [22].This standard outlines the key requirements for the safe design and maintenance of software products intended for clinical use.We will briefly discuss the key sections of EN 62304 in relation to our development practices.
Section 5 of EN 62304 defines the requirements of the medical software development process.It emphasises traceability between each software development step and the software specifications, tests and risk control measures.It allows for an existing development process compatible with the standard to be adopted instead of defining a new one.We follow a development process very similar to that demonstrated by Höss et al. in a recent paper [23].Höss et al. use state-of-the-art software development tools and technologies to implement the key requirements of EN 62304.As such they have not only achieved quality assurance (QA) of their software tool, but also laid a solid practical example for translating EN 62304 to the clinical ecosystem.
We use GitLab [24] for project management and Gitlab CI [25] for continuous integration (CI) in a similar fashion to how Höss et al. use Redmine [26] and Jenkins [27] respectively for these two tasks.The issue tracker functionality of GitLab serves to document feature requests and bug reports, which then steer the development process.GitLab allows for referencing issues from code commits, which ensures a direct link between the software specifications and the concrete development steps that lead to the realisation of those.This practice is also in line with section 6 of EN 62304, which outlines the requirements for maintaining clinical software.
We have adopted the test-driven development (TDD) process [28] for GIFT-Grab.In TDD each new feature or bug is first defined through a set of failing tests.Subsequently the implementation of the new feature or the bug fix boils down to adding new and/or refactoring the existing source code in a manner that makes the new tests pass.GitLab CI automatically executes the GIFT-Grab tests as well as the documented deployment procedures.This is a safety net that ensures changes to GIFT-Grab do not break existing functionality.In other words, when implementing a new feature or fixing a bug, only the code changes that make the new tests pass without causing the existing ones to fail are merged into the stable code branch.This practice is inspired by the requirement for identifying and resolving potential regressions during the active development of medical device software, as outlined in section 5 of EN 62304.
The GIFT-Grab tests are organised as a comprehensive test suite that is configured using CTest [29].This test suite uses Catch [30] and pytest [31], at the C++ and Python levels respectively.The unit tests implemented using Catch [30] serve two purposes.First, they ensure that the video source and target factory instances are created according to the singleton pattern.Second, they check that the video source factory manages the connection to each supported device as a singleton.The unit tests implemented using pytest [31] check the correct operation of all documented GIFT-Grab features.All the tests pertaining to video acquisition from supported frame-grabber hardware are executed on dedicated test machines with the respective cards installed.Owing to the polymorphism that enables treating video files and frame-grabber cards simply as video sources, the unit tests for files are designed following a similar pattern to the unit tests for frame-grabber hardware.Sample video files with known specs are also included as part of the testing infrastructure.These together with a provided internal Python module that makes use of FFmpeg utility applications [6] serve to ensure video de-/encoding capabilities are operational.In addition to the unit tests, the GIFT-Grab pytest suite also provides real-time integration tests that create image processing pipelines where intermediate processing nodes measure the acquisition frame rates and report in cases where the measured frame rates are lower than documented values.Last but not least, the GIFT-Grab test suite also checks the correct exposure of raw video data as NumPy arrays.For further technical details we refer the reader to the GIFT-Grab documentation [1,4].Apart from the automated tests, we also run manual tests, notably before each release, where a human observer runs a sample Python application related to a specific GIFT-Grab feature and (visually) assesses the result (e.g. a video file produced) for correctness.Section 8 of EN 62304 defines the requirements for the configuration management of medical device software.This among others involves the version management of the "software of unknown provenance" (SOUP) in relation to the versions of developed software.In the case of GIFT-Grab, SOUP corresponds to the third party software libraries used by GIFT-Grab.We clearly document these software libraries in the GIFT-Grab documentation alongside their respective versions that GIFT-Grab has been tested with.GitLab further allows for documenting project-specific information using the Markdown language [32].Using GitLab, information can be organised in the form of wiki pages and the Markdown language provides means for cross-referencing between different documentation resources.We use these GitLab features for documenting details related to the relevant SOUP libraries, such as detailed and custom installation instructions wherever applicable.
Section 9 of EN 62304 stipulates that problems related to medical device software are to be documented, classified for criticality and impact, and subsequently investigated.It further demands that users of the software be advised of existing problems.Our GitLab infrastructure serves to document reported problems in the form of issue tickets as detailed above.In addition, one or more labels related to an issue's criticality and impact can be assigned to each ticket.Last but not least, each ticket can be marked for resolution in a specific release.We have also open-sourced GIFT-Grab on the popular GitHub [1].GitHub provides a similar issue tracking and documentation functionality, and has millions of users.We believe making GIFT-Grab thus available to a broader audience will accelerate the discovery (and potentially the resolution) of problems.
GIFT-Grab is not a full medical diagnostic or guidance tool, but one of the building blocks thereof.In other words, GIFT-Grab is a component that serves to make data from external medical devices available for processing in clinical software.As such, we focus on defining interfaces i) between GIFT-Grab and medical devices by supporting appropriate frame-grabber hardware, and ii) between GIFT-Grab and clinical software via an appropriate representation of video data to facilitate integration and compatibility with a broad range of available software tools.This is inspired by the requirement of EN 62304 for defining software inputs and outputs, and the interfaces between the software system and other systems (section 5.2.2).Although we endeavour to follow development practices compatible with EN 62304, we are also aware that compliance with EN 62304 requires a formal process for quality management and risk mitigation, as well as for usability engineering [23].Furthermore, these processes are not only to be formalised and followed, but will also be subject to a formal independent audit.
(2) Availability Installation note: The reader interested in installing GIFT-Grab via the pip installation tool should read the installation instructions in [33] before proceeding.Availability information for GIFT-Grab at the time of preparing this manuscript is listed below.

Dependencies
The required GIFT-Grab dependencies are listed in bold face below.The requirement for each of the others depends on the desired GIFT-Grab features.As such, all features are inactive by default, to ensure the minimal number of dependencies for installing GIFT-Grab.We refer the reader to the GIFT-Grab documentation [1] for more detailed information, in particular the minimum acceptable versions of the listed dependencies.
At time of composing this manuscript GIFT-Grab uses stable versions of all the listed external dependencies except for libVLC.This is because some critical issues in the stable releases of libVLC have been fixed in recent release candidates.
• A C++11 compatible compiler such as GCC [36] • CMake [21] • Python [37] Language English (3) Reuse potential Real-time and non-real-time video processing are ubiquitous in our age.Many applications in different domains use video data.The GIFT-Surg research initiative [8] develops novel medical imaging methods for fetal therapy and surgery, that rely on video from medical devices [10,9].Other examples include real-time stereo reconstruction [48] and probe tracking [49] in robotic surgery, surveillance endoscopies [50], real-time panorama image synthesis [51], vehicle surveillance [52] and content-based video identification [53].
GIFT-Grab allows real-time video processing applications to connect to external devices and to encode video data from these devices.In addition GIFT-Grab exposes a simplified API architecture suitable for video processing pipelines.Wrapping video data as NumPy arrays [2] makes it possible to create video processing pipelines using the rich collection of methods in SciPy [3].In order to facilitate easy deployment, GIFT-Grab is also bundled as a PyPI package [4].This allows for a one-liner installation using pip, the recommended tool for Python packages [5] (please see the installation note in the Availability section).
GIFT-Grab relies on highly configurable external multimedia libraries and SDKs.Wherever applicable, it uses default configuration options for these.In other words, GIFT-Grab does not attempt to tweak the parameters of these external dependencies unless absolutely necessary.As such GIFT-Grab may not be a suitable choice for applications that have specific requirements, for instance custom video quality.

Appendix A. Full working Python example
The source code listing below shows a full Python example that can be copy-pasted out to a Python interpreter and executed.Listing 3: A full Python example that demonstrates a GIFT-Grab pipeline capturing video data from a local file; processing and subsequently encoding processed frames out to another video file.The processing node of this pipeline is the GaussianSmootherBGRA class defined in listing 2 and duplicated here for convenience.

Figure 1 :
Figure 1: UML class diagram showing how connections to supported devices are created and maintained in GIFT-Grab.The VideoSourceFactory singleton creates on demand a polymorphic IVideoSource object that serves as the interface for acquiring video frames from a supported device.The diagram shows three of the derived classes which implement this interface.The choice of the IVideoSource type depends on the connection request.

Figure 2 :
Figure 2: UML class diagram showing the observer design pattern hierarchy for video sources and targets in GIFT-Grab.Every IVideoSource is also an IObservable, to which IObservers can be attached.Each time a new VideoFrame becomes available, the IObservable notifies all its IObservers by calling the update method.Similarly, each IVideoTarget is an IObservers, with its update method automatically calling the append method (i.e.appending that particular frame to the file).

Figure 3 :
Figure 3: UML sequence diagram showing how the visitor design pattern [14] is used for realising the observer pattern with an IVideoSource implementor that does not inherently support callbacks: The BroadcastDaemon visitor queries the VideoSourceOpenCV object in regular intervals for a new VideoFrame.Once obtained, the new VideoFrame is propagated to all attached observers by calling notify.

Figure 4 :
Figure 4: UML class diagram representing the video target hierarchy in GIFT-Grab.Two implementors of the IVideoTarget interface are shown: VideoTargetOpenCV and VideoTargetFFmpeg classes which are used for encoding video frames and saving them to video files.Similar to the case of VideoSourceFactory, the VideoTargetFactory singleton creates on demand a polymorphic IVideoTarget object.However contrary to the case of VideoSourceFactory, the ownership of the created IVideoTarget object passes to the caller, i.e. the caller is responsible for destroying it to free up memory at the end of its lifetime.

• 1 .Figure 5
Figure5shows the wall-clock execution times for 100 video frames in each configuration.The execution time for the first video frame has been excluded from these graphs, but reported in Table2.This is because the first frame is used for inferring the resolution to use in the resulting video file.This information is subsequently used for initialising the file writer object, including memory allocation.The upper row in Figure5demonstrates that in the absence of colour conversion (when I420 frames are fed to the file writer), all recorded execution times are well below 16.67 msec, the time interval between two video frames at 60 fps.The lower row shows that even when colour conversion is involved, 60 fps can still be achieved for Full HD video frames.However neither 60 fps nor 30

Figure 5 :
Figure 5: Wall-clock execution times in milliseconds for encoding 100 video frames of three different resolutions on two GPUs and subsequently writing them to a video file.Each column is for one GPU.The upper row shows execution times when I420 frames are fed into the file writer, i.e. no colour space conversion before encoding.The lower row shows the case when BGRA frames are fed into the file writer, i.e.BGRA-to-I420 conversion performed before encoding.The darker shade in each graph shows the region within the time limits that would allow for processing 60 fps.The lighter shade in the lower row shows the respective region for 30 fps (but not 60 fps).The numbers in parantheses in the figure legends indicate the maximum GPU memory allocated at a time for each dataset.

Table 2 :
Encoding times in milliseconds for the excluded first frame (initialisation) in each of the datasets shown in figure5.