(1) Overview

Introduction

Optical flow estimation is one of the fundamental tasks of computer vision. It aims to reconstruct the motion between two consecutive images by finding pixel mappings between them, expressed as a 2D vector field. The seminal works by Horn and Schunck [] and Lucas and Kanade [] are now over 40 years old, and still underpin the progress of the field.

Interest in optical flow algorithms remains high, as an analysis of the publication dates of papers evaluated on Sintel shows []. The explosion of deep learning methods for optical flow prediction, coupled with new potential applications such as autonomous driving, has led to a renewed push to improve the state of the art on ever more complex benchmarks, such as Sintel [], KITTI [], and DAVIS [].

Several methods to estimate optical flow from image sequences are available in Python, ranging from the OpenCV function cv2.calcOpticalFlowFarneback [] based on Farnebäck’s algorithm [] to implementations of methods aiming to better merge different scales, such as the Python wrapper by Pathak et al. [] for Liu et al. []. Virtually all recent deep-learning-based methods were developed in Python, with many important works, e.g. FlowNet2 [], PWC-Net [], or RAFT [], either making use of the PyTorch machine learning framework [] from the outset, or making a PyTorch implementation available later on.

Despite the extensive research to date, to the best of our knowledge there exists no off-the-shelf software that allows for easy handling and manipulation of optical flow fields. We examined existing code: what we retrieved was either algorithm-specific, dealt mainly with visualisation (flowvid [], flow-vis []), or added only basic operations such as reading, writing, and warping flows to its visualisation capabilities (flowpy []).

In contrast, we provide a structured approach to flow fields and their manipulation. We take both possible reference frames for flow vectors into account (see Figure 1), and offer a wide range of operations on the flow fields themselves. This rigorous method ensures mathematically correct handling, thereby helping users avoid pitfalls such as simply negating flow vectors to obtain the inverse of a flow field. Beyond that, we especially highlight the flow composition functions, which have not previously been implemented in this form. As an example, the composition functions allow users to calculate the flow field which, combined sequentially with another known flow, is equivalent to a third flow. These operations were derived from first principles, and proved paramount for the construction of the optical flow ground truth in the synthetic datasets used in our work presented in Ravasio et al. [].

Figure 1 

Flow fields in the two possible frames of reference: “source” means all pixels at time $t_1$ are mapped to a new location at time $t_2$, while “target” means all pixels at time $t_2$ are matched with a different previous location at time $t_1$.

Theory

We can define a flow field as mapping the coordinates $x$ of features $1$ to $i$ at time $t_1$ to their coordinates at time $t_2$:

(1)
$$F_{1\rightarrow 2} := X_{t_1} \rightarrow X_{t_2}; \quad X = \{x_1, \ldots, x_i\}$$

In the context of this work, the features are image pixels, corresponding to a discretised regular grid in the theoretically continuous image. However, there are two possible frames of reference, illustrated in Figure 1:

  • Source, “s”: The pixel features whose motion is tracked by the flow vectors are those in what we term the source domain, or the image at time $t_1$. Thus, the flow field $F_{s;1\rightarrow 2}$ indicates the motion of every pixel present in the image at time $t_1$ to their respective end position at time $t_2$.
  • Target, “t”: The pixel features whose motion is tracked by the flow vectors are those in what we term the target domain, or the image at time $t_2$. The flow field $F_{t;1\rightarrow 2}$ matches every pixel present in the image at time $t_2$ to their origin at time $t_1$.

Therefore, we can extend the previous definition as follows:

(2)
$$F_{s;1\rightarrow 2} := G_{t_1} \rightarrow X_{t_2}, \qquad F_{t;1\rightarrow 2} := X_{t_1} \rightarrow G_{t_2}$$

where $G = H \times W$ is a 2D space with a regular grid defined by $H = \{0, 1, \ldots, H-2, H-1\}$ and $W = \{0, 1, \ldots, W-2, W-1\}$.
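
To make the notation concrete, the following minimal NumPy sketch (purely illustrative, independent of the library) constructs the grid $G$ for a small image and interprets a flow field in the “source” reference:

```python
import numpy as np

H, W = 3, 4  # small example image size

# Regular grid G of pixel coordinates, one (x, y) pair per pixel
xx, yy = np.meshgrid(np.arange(W), np.arange(H))  # each of shape (H, W)
grid = np.stack((xx, yy), axis=-1)                # shape (H, W, 2)

# A flow field assigns a 2D vector to every grid position. In the "source"
# reference, grid + flow yields the coordinates X_t2 that the pixels
# located at G_t1 move to.
flow = np.zeros((H, W, 2))
flow[..., 0] = 1.0        # every pixel moves one pixel to the right
coords_t2 = grid + flow   # X_t2 in the "source" reference
print(coords_t2[0, 0])    # [1. 0.]: pixel (0, 0) has moved to (1, 0)
```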

Given this, we set up the following equation:

(3)
$$F_{1\rightarrow 2} \oplus F_{2\rightarrow 3} = F_{1\rightarrow 3}$$

where $F_{1\rightarrow 2}$ is the flow from time $t_1$ to time $t_2$, and $\oplus$ is the non-commutative operation corresponding to the sequential application of two flow fields. To calculate the actual flow vector values of $F_{1\rightarrow 3}$, the naive approach would be to simply add the vectors:

(4)
$$F_{1\rightarrow 2} + F_{2\rightarrow 3} = \begin{cases} (G_{t_1} \rightarrow X_{t_2}) + (G_{t_2} \rightarrow X_{t_3}) & \text{if source reference} \\ (X_{t_1} \rightarrow G_{t_2}) + (X_{t_2} \rightarrow G_{t_3}) & \text{if target reference} \end{cases}$$

However, a closer inspection reveals this to be incorrect: a different set of features is tracked from time $t_1$ to $t_2$ than from time $t_2$ to $t_3$ (see also Figure 2). It is therefore necessary to add an intermediate step which refers the features in $G_{t_2}$ back to $G_{t_1}$, or the features in $G_{t_2}$ forward to $G_{t_3}$, for the source and the target reference frame respectively. The correct operation in the source reference frame is then:
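
The following illustrative sketch demonstrates the error with integer flows (chosen so that no interpolation is needed): the second flow vector must be read at the position the feature has moved to by time $t_2$, not at its original position:

```python
import numpy as np

H, W = 5, 5
xx, yy = np.meshgrid(np.arange(W), np.arange(H))

# Flow 1 (source ref): shift everything 2 pixels to the right
flow_12 = np.zeros((H, W, 2))
flow_12[..., 0] = 2

# Flow 2 (source ref): non-uniform - only columns x >= 2 move down by 1
flow_23 = np.zeros((H, W, 2))
flow_23[xx >= 2, 1] = 1

x0, y0 = 0, 0        # feature at (0, 0) at time t1
x1, y1 = x0 + 2, y0  # its position at time t2 is (2, 0)

naive = flow_12[y0, x0] + flow_23[y0, x0]    # samples flow_23 at (0, 0): wrong
correct = flow_12[y0, x0] + flow_23[y1, x1]  # samples flow_23 at (2, 0): right

print(naive)    # [2. 0.]: misses the downward motion
print(correct)  # [2. 1.]: the feature actually ends up at (2, 1) at time t3
```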

(5)
$$F_{s;1\rightarrow 3} = F_{s;1\rightarrow 2} \oplus F_{s;2\rightarrow 3} = F_{s;1\rightarrow 2} + F_{2\rightarrow 1}\{F_{s;2\rightarrow 3}\} = F_{s;1\rightarrow 2} + F_{s;1\rightarrow 2}^{-1}\{F_{s;2\rightarrow 3}\}$$

where $F_{s;1\rightarrow 2}\{G\}$ means applying the flow $F_{s;1\rightarrow 2}$ to the grid $G_{t_1}$ to obtain the new feature coordinates $X_{t_2}$, followed by an interpolation operation to obtain new grid values $G_{t_2}$. The grid can either contain the pixel values of an image or, as in the previous equation, the vector values of another flow field. The inverse of a flow field can be calculated with the formula $F_{s;1\rightarrow 2}^{-1} = F_{s;1\rightarrow 2}\{-F_{s;1\rightarrow 2}\}$, i.e. reversing the flow vectors and then applying the flow field to the result, in order to obtain an interpolation of the result in a new regular grid, as above.
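
The following is a minimal NumPy/SciPy sketch of these two fundamental operations in the “source” reference, using scipy.interpolate.griddata for the unstructured-to-regular-grid interpolation; it is illustrative only, and the library’s internal implementation may differ:

```python
import numpy as np
from scipy.interpolate import griddata

def apply_s(flow, field):
    """F_s{field}: warp 'field' (H, W, C) with a source-reference flow (H, W, 2).

    The known values sit at the *moved* positions G + flow, so resampling them
    onto a regular grid requires unstructured-to-grid interpolation.
    """
    h, w = flow.shape[:2]
    xx, yy = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack((xx + flow[..., 0], yy + flow[..., 1]), axis=-1).reshape(-1, 2)
    out = np.stack([
        griddata(pts, field[..., c].ravel(), (xx, yy), method='linear')
        for c in range(field.shape[-1])
    ], axis=-1)
    return np.nan_to_num(out)  # positions outside the data hull become 0

def invert_s(flow):
    """F_s^{-1} = F_s{-F_s}: negate the vectors, then resample onto the grid."""
    return apply_s(flow, -flow)

def compose_s(flow_12, flow_23):
    """Equation (5): F_s;1->3 = F_s;1->2 + F_s;1->2^{-1}{F_s;2->3}."""
    return flow_12 + apply_s(invert_s(flow_12), flow_23)
```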

Figure 2 

Schematic of the flow field composition operation. To obtain the correct coordinate mapping $x_{t_1} \rightarrow x_{t_3}$ from Flows 1 and 2, it is necessary to select the flow vector from $F_{2\rightarrow 3}$ which starts at the end position of the tracked feature at time $t_2$, i.e. $x_{t_2}$. Therefore, it is not sufficient to simply add the vectors of the two flow fields: this would be equivalent to selecting the flow vector from $F_{2\rightarrow 3}$ which starts at $x_{t_1}$ (assuming the “source” frame of reference is used).

If $F_{2\rightarrow 3}$ in Equation (3) is unknown, given known inputs $F_{1\rightarrow 2}$ and $F_{1\rightarrow 3}$, a similar consideration applies. In this case, the flow vectors can be subtracted directly, but then need to be mapped onto the grid at time $t_2$:

(6)
$$F_{s;2\rightarrow 3} = F_{s;1\rightarrow 2}\{F_{s;1\rightarrow 3} - F_{s;1\rightarrow 2}\}$$

Finally, if $F_{1\rightarrow 2}$ is unknown, we can subtract the known input $F_{2\rightarrow 3}$ from the second known input $F_{1\rightarrow 3}$, after mapping the former from $t_2$ to $t_1$ via $t_3$. The following equation applies:

(7)
$$F_{s;1\rightarrow 2} = F_{s;1\rightarrow 3} - F_{2\rightarrow 1}\{F_{s;2\rightarrow 3}\} = F_{s;1\rightarrow 3} - (F_{2\rightarrow 3} \oplus F_{3\rightarrow 1})\{F_{s;2\rightarrow 3}\} = F_{s;1\rightarrow 3} - (F_{s;2\rightarrow 3} \oplus F_{s;1\rightarrow 3}^{-1})\{F_{s;2\rightarrow 3}\} = F_{s;1\rightarrow 3} - \left(F_{s;2\rightarrow 3} + F_{s;2\rightarrow 3}^{-1}\{F_{s;1\rightarrow 3}^{-1}\}\right)\{F_{s;2\rightarrow 3}\}$$

Equations (5) to (7) allow us to achieve all three important modes of composition of flow fields in the “source” reference frame using just two fundamental operations: vector addition, and applying a flow field to an input. An analogous approach yields corresponding formulae for flow fields in the “target” frame of reference.
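
Using the apply_s, invert_s, and compose_s helpers from the sketch above, Equations (6) and (7) can be verified numerically; with the smooth example flows below, agreement holds away from the image boundary, where the interpolation lacks data:

```python
import numpy as np

# Two smooth example flows in the source reference (shape (H, W, 2))
h, w = 32, 32
xx, yy = np.meshgrid(np.arange(w), np.arange(h))
flow_12 = np.stack((0.1 * yy, np.full((h, w), 1.5)), axis=-1)
flow_23 = np.stack((np.full((h, w), -1.0), 0.05 * xx), axis=-1)

flow_13 = compose_s(flow_12, flow_23)  # Equation (5)

# Equation (6): recover F_s;2->3 from F_s;1->2 and F_s;1->3
flow_23_rec = apply_s(flow_12, flow_13 - flow_12)

# Equation (7): recover F_s;1->2 from F_s;2->3 and F_s;1->3
f = flow_23 + apply_s(invert_s(flow_23), invert_s(flow_13))  # F_2->3 (+) F_3->1
flow_12_rec = flow_13 - apply_s(f, flow_23)

# Agreement away from the boundary (border values suffer from missing data)
print(np.allclose(flow_23_rec[8:-8, 8:-8], flow_23[8:-8, 8:-8], atol=0.1))
print(np.allclose(flow_12_rec[8:-8, 8:-8], flow_12[8:-8, 8:-8], atol=0.1))
```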

Implementation and architecture

Oflibnumpy and oflibpytorch are implemented as a custom flow class based on NumPy arrays and PyTorch tensors, respectively, with detailed documentation and usage guides available on oflibnumpy.rtfd.io and oflibpytorch.rtfd.io. Using tensors instead of arrays means oflibpytorch can partially run on the GPU instead of being limited to CPU operations, and can therefore integrate better into PyTorch-based deep learning algorithms that use optical flow fields. This comes at the cost of having to control the tensor devices of inputs and outputs.
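
To illustrate the parallel APIs, the following sketch follows the usage examples in the documentation; the device argument assumes a CUDA-capable GPU is available:

```python
import oflibnumpy as ofn
import oflibpytorch as ofp

# oflibnumpy: flow vectors as a NumPy array of shape (H, W, 2)
flow_np = ofn.Flow.from_transforms([['rotation', 200, 150, -30]], (300, 400))
print(flow_np.vecs.shape)   # (300, 400, 2)

# oflibpytorch: flow vectors as a PyTorch tensor of shape (2, H, W),
# optionally created and operated on on the GPU
flow_pt = ofp.Flow.from_transforms([['rotation', 200, 150, -30]], (300, 400),
                                   device='cuda')
print(flow_pt.vecs.shape)   # torch.Size([2, 300, 400])
```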

The flow class has three main attributes (see the construction sketch after the list):

  • Vectors vecs: The flow vectors themselves. They are expressed as an array of shape (H, W, 2) (oflibnumpy) or a tensor of shape (2, H, W), following the PyTorch channel-first convention (oflibpytorch). The dimension of size 2 corresponds to the vector components (x, y), with x defined positive towards the right, and y defined positive downwards. This follows the OpenCV convention for flow vectors, e.g. as output by the calcOpticalFlowFarneback function.
  • Reference ref: The flow reference determines which frame of reference the flow vectors are in. The options are “source” or “target”: either the flow vectors originate from a regular grid in the flow source, or they point to a regular grid in the flow target (see Figure 1 and Equation (2)).
  • Mask mask: A boolean array or tensor of shape (H, W) which indicates which flow vectors are valid. This is not relevant for simple flow operations such as resizing, or tracking points, but becomes very important when several flow fields are combined. In that case, often only parts of the area H × W of the resulting flow field will contain useful vector values. This is not a limitation of the algorithm, but a characteristic that arises from the nature of the operation.
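
A flow object can be constructed directly from these three attributes, as in the following sketch (assuming the documented constructor, with the mask defaulting to all-valid if omitted):

```python
import numpy as np
import oflibnumpy as of

h, w = 100, 200
vecs = np.random.randn(h, w, 2)      # example flow vectors in (x, y) order
mask = np.ones((h, w), dtype=bool)   # mark all vectors as valid...
mask[:, :20] = False                 # ...except the leftmost 20 columns

flow = of.Flow(vecs, ref='t', mask=mask)  # "target" frame of reference
print(flow.ref)                           # 't'
print(flow.mask.sum())                    # number of valid flow vectors
```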

The implemented class methods can be grouped as follows (a usage sketch follows the list):

  • Constructors: to create flow fields from a given transformation matrix, a list of transforms, or filled with zero-magnitude vectors
  • Manipulation: inverting, resizing, and padding flows, or switching their reference frame
  • Application: warping an input, or tracking specific points
  • Evaluation: finding the valid source or target area, necessary padding of inputs, or fitting a transformation matrix to the flow field
  • Visualisation: either using the classic hue-based method, or showing arrows
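
The following sketch touches each group; the method names follow the RTD documentation, but exact signatures and argument conventions should be checked there:

```python
import numpy as np
import oflibnumpy as of

# Constructor: a rotation of -30 degrees around the point (100, 50)
flow = of.Flow.from_transforms([['rotation', 100, 50, -30]], (200, 300))

# Manipulation: resizing, inverting, switching the reference frame
small = flow.resize(0.5)       # scales both the field and its vectors
inverse = flow.invert()        # mathematically correct inverse
switched = flow.switch_ref()   # same flow, other frame of reference

# Application: warping an image, or tracking individual points
img = np.zeros((200, 300, 3), dtype=np.uint8)
warped = flow.apply(img)
tracked = flow.track(np.array([[50, 50], [100, 150]]))  # one point per row

# Evaluation: valid areas, and fitting a transformation matrix
valid = flow.valid_target()    # boolean array of valid target pixels
matrix = flow.matrix()         # transformation matrix fitted to the flow

# Visualisation: hue-based colour coding, or arrows on a grid
vis = flow.visualise('bgr')
vis_arrows = flow.visualise_arrows(grid_dist=25)
```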

Finally, as a key component building on the entire rest of the flow library, the method combine_flows allows for the composition of two input flow fields into one output flow field. Three modes are available: the output of the function corresponds to either $F_{1\rightarrow 2}$, $F_{2\rightarrow 3}$, or $F_{1\rightarrow 3}$ in Equation (3), the other two being the given input flows. The equations used for this function were derived from first principles as shown in Equations (5) to (7), and rely exclusively on the implemented flow class methods. These also ensure the valid area is tracked correctly through each operation, and returned as the mask of the resulting flow object.
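
A usage sketch of the three modes, assuming the convention that the mode number selects which term of Equation (3) is returned while the inputs are the remaining two terms in order (as described in the documentation):

```python
import oflibnumpy as of

shape = (300, 400)
flow_12 = of.Flow.from_transforms([['translation', 10, 0]], shape)
flow_23 = of.Flow.from_transforms([['rotation', 200, 150, -30]], shape)

# Mode 3: both inputs known, compute F_1->2 (+) F_2->3 = F_1->3
flow_13 = of.combine_flows(flow_12, flow_23, mode=3)

# Mode 2: recover the second flow, F_2->3, from F_1->2 and F_1->3
flow_23_rec = of.combine_flows(flow_12, flow_13, mode=2)

# Mode 1: recover the first flow, F_1->2, from F_2->3 and F_1->3
flow_12_rec = of.combine_flows(flow_23, flow_13, mode=1)
```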

Warping inputs with a flow field in the “source” frame of reference is a slow operation, as it requires interpolation from an unstructured to a regular grid. To optimise performance, we therefore adapt some of the flow composition equations to avoid this operation wherever possible. For example, we make use of a fundamental relationship between the two frames of reference derived from Figure 1: we can invert a flow field and switch its frame of reference at the same time by simply inverting the flow vectors.

(8)
$$F_{s;1\rightarrow 2}^{-1} := -(G_{t_1} \rightarrow X_{t_2}) = X_{t_2} \rightarrow G_{t_1} = F_{t;2\rightarrow 1}$$

Applied to Equation (5), this allows us to replace two slow operations, namely inverting the flow field $F_{s;1\rightarrow 2}$ and applying it to an input, with two fast operations: obtaining the inverse in the “target” frame of reference, and applying that to an input.

(9)
$$F_{s;1\rightarrow 3} = F_{s;1\rightarrow 2} + F_{s;1\rightarrow 2}^{-1}\{F_{s;2\rightarrow 3}\} = F_{s;1\rightarrow 2} + \left(F_{s;1\rightarrow 2}^{-1}\right)_{s\rightarrow t}\{F_{s;2\rightarrow 3}\} = F_{s;1\rightarrow 2} + F_{t;2\rightarrow 1}\{F_{s;2\rightarrow 3}\}$$

Quality control

We implemented tests based on the unittest package for all relevant functions and class methods. They verify the mathematical validity of the flow composition operations, compare the output of functions such as Flow.resize with expected results, and ensure unexpected inputs raise the expected errors. They are available from the test folder on GitHub.
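
A simplified, hypothetical test in this style (illustrative rather than copied from the actual test suite) might look as follows:

```python
import unittest
import numpy as np
import oflibnumpy as of

class TestFlowComposition(unittest.TestCase):
    def test_double_inversion(self):
        # Inverting a flow twice should approximately recover the original
        flow = of.Flow.from_transforms([['rotation', 100, 50, -30]], (200, 300))
        twice = flow.invert().invert()
        valid = flow.mask & twice.mask  # compare only where both are valid
        np.testing.assert_allclose(twice.vecs[valid], flow.vecs[valid], atol=0.5)

if __name__ == '__main__':
    unittest.main()
```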

These tests were written and continuously updated during development. The coverage package (see coverage.rtfd.io) reports an overall test coverage of 99% for both oflibnumpy and oflibpytorch.

Official documentation is available on ReadTheDocs (RTD; see oflibnumpy.rtfd.io and oflibpytorch.rtfd.io). The introduction page provides simple code examples that use the core functionality, along with the expected output. Users can find this code along with further sample usages in a file called examples.py in the source code on GitHub. Comparing the outputs will serve to confirm the installation is working as intended. A more complete demonstration and explanation of the capabilities of oflibnumpy and oflibpytorch is available on the “Usage” page of the RTD documentation, including visualisations of the outputs.

(2) Availability

Operating system

As both oflibnumpy and oflibpytorch are pure Python packages, they are compatible with any operating system that can provide a Python 3 environment. Development took place on a Windows 10 system.

Programming language

Oflibnumpy and oflibpytorch require Python 3, and have been specifically tested for Python 3.7 and 3.9.

Additional system requirements

There are no additional system requirements.

Dependencies

The following packages are required and will be installed as dependencies when using the commands pip install oflibnumpy or pip install oflibpytorch, or when installing the code from source via setup.py:

  • NumPy ≥ 1.15 []
  • SciPy ≥ 1.4 []
  • OpenCV ≥ 3.4 []

Oflibpytorch additionally requires PyTorch ≥ 1.4 [], and a compatible version of the CUDA toolkit [] if operations on GPU are required. We recommend an installation in a virtual Conda environment, using the install command suggested on PyTorch.org.

List of contributors

The contributors to this work are Claudio S. Ravasio, Christos Bergeles, and Lyndon Da Cruz.

Software location: Oflibnumpy

Archive 1

Name: Zenodo

Persistent identifier: 10.5281/zenodo.4916270

License: MIT License

Publisher: Claudio S. Ravasio

Version published: 1.0.0

Date published: 09/06/21

Archive 2

Name: Python Package Index (PyPI)

Persistent identifier: pypi.org/project/oflibnumpy

License: MIT License

Publisher: Claudio S. Ravasio

Version published: 1.0.0

Date published: 09/06/21

Code repository

Name: GitHub

Persistent identifier: github.com/RViMLab/oflibnumpy

License: MIT License

Date published: 09/06/21

Software location: Oflibpytorch

Archive 1

Name: Zenodo

Persistent identifier: 10.5281/zenodo.4916367

License: MIT License

Publisher: Claudio S. Ravasio

Version published: 1.0.0

Date published: 09/06/21

Archive 2

Name: Python Package Index (PyPI)

Persistent identifier: pypi.org/project/oflibpytorch

License: MIT License

Publisher: Claudio S. Ravasio

Version published: 1.0.0

Date published: 09/06/21

Code repository

Name: GitHub

Persistent identifier: github.com/RViMLab/oflibpytorch

License: MIT License

Date published: 09/06/21

Language

English

(3) Reuse potential

An early version of the library described in this paper was used in Ravasio et al. []. We especially made use of the ability to find a flow field which, when combined sequentially with a first known flow, results in a known third flow (see Equation (6)). This combine_flows function, which in turn relies on the implementation of the flow class and its methods, is key to the creation of the complex synthetic optical flow datasets used in our work, and continues to be used in ongoing research on the topic. We therefore see this software as being of great value for specialised work with flow fields, as well as broadly applicable to any optical flow task that requires common operations such as warping an input, resizing a flow field, or inverting it. We intend it to be an off-the-shelf tool that is easy to install, easy to use, and well documented for researchers from any field. As an example, Sintel [] as well as KITTI [] provide ground truth flow fields along with data on invalid pixels. Once loaded into NumPy, both can be used to construct a single oflib flow object, making use of the mask attribute, which will then automatically keep track of any changes to invalid pixels effected by operations carried out on the flow field. More experienced programmers with very specific needs can also modify the source code, or simply make use of the existing structure by extending the flow class as required.
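
As a sketch of this workflow for Sintel (the .flo parsing follows the published Middlebury format; the file paths and the invalid-mask convention shown here are illustrative assumptions):

```python
import numpy as np
import cv2
import oflibnumpy as of

def read_flo(path):
    """Read a Middlebury/Sintel .flo file into an array of shape (H, W, 2)."""
    with open(path, 'rb') as f:
        assert np.fromfile(f, np.float32, count=1)[0] == 202021.25  # magic number
        w = int(np.fromfile(f, np.int32, count=1)[0])
        h = int(np.fromfile(f, np.int32, count=1)[0])
        return np.fromfile(f, np.float32, count=2 * h * w).reshape(h, w, 2)

# Hypothetical paths to a Sintel ground truth flow and its invalid-pixel mask
vecs = read_flo('sintel/training/flow/alley_1/frame_0001.flo')
invalid = cv2.imread('sintel/training/invalid/alley_1/frame_0001.png', 0)

# Sintel flow maps the pixels of frame t1 forward: "source" reference.
# Invalid pixels (non-zero in the mask image) are excluded via the mask.
flow = of.Flow(vecs, ref='s', mask=invalid == 0)

# The valid area is now tracked automatically through further operations
flow_half = flow.resize(0.5)
print(flow_half.mask.shape)   # halved spatial dimensions
```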

The main support channels for oflibnumpy and oflibpytorch are their respective GitHub issue pages. Users are also welcome to contact the first author of this paper via email to ask for support or report issues.