PyDDA: A Pythonic Direct Data Assimilation Framework for Wind Retrievals

This software assimilates data from an arbitrary number of weather radars together with other spatial wind fields (eg numerical weather forecasting model data) in order to retrieve high resolution three dimensional wind fields. PyDDA uses NumPy and SciPy’s optimization techniques combined with the Python Atmospheric Radiation Measurement (ARM) Radar Toolkit (Py-ART) in order to create wind fields using the 3D variational technique (3DVAR). PyDDA is hosted and distributed on GitHub at https://github.com/ openradar/PyDDA. PyDDA has the potential to be used by the atmospheric science community to develop high resolution wind retrievals from radar networks. These retrievals can be used for the evaluation of numerical weather forecasting models and plume modelling. This paper shows how wind fields from 2 NEXt generation RADar (NEXRAD) WSR-88D radars and the High Resolution Rapid Refresh can be assimilated together using PyDDA to create a high resolution wind field inside Hurricane Florence.


Introduction
High resolution 3D wind retrievals have a wide range of practical applications.For example, 3D wind retrievals are frequently used to gain insight on the kinematic processes inside thunderstorms, hurricanes, and tornadoes (i.e.[1,2]).Furthermore, 3D wind fields are critical for dispersion modelling [3] and for predicting potential hazards to infrastructure.These wind retrievals are commonly created from data from networks of scanning weather radars such as those from the NEXt generation RADar (NEXRAD) network of WSR-88Ds installed by the National Oceanic and Atmospheric Adminstration (NOAA) all over the United States that can detect the velocity of particles in the direction of the radar beam over a volume, commonly called the radial or Doppler velocity [4,5].The NEXRAD network samples volumes of approximately 250 km by 250 km by 20 km every 5 to 10 minutes.While other wind measurements such as profilers and anemometers provide the most accurate wind measurements, they measure a much more limited volume than a scanning radar being limited to the column and single point in space respectively.Numerical weather forecasting models provide 3D winds with a greater volume coverage than the NEXRAD network, but they are affected by uncertainties in the model prediction and assimilation and generally provide data at coarser spatial and temporal scales than wind observations made by scanning radars.Therefore a solution for integrating wind observations from many different sources operating at differing spatial and temporal scales has the best capability of providing a complete picture of the spatial and temporal evolution of the 3D wind field.
Since each sensor and model produces winds at differing spatial and temporal scales, retrieving the 3D winds from them is a nontrivial task.For a single radar, deriving a 3D wind field amounts to solving one equation with three unknowns since only the wind velocity in the direction relative to the radar beam is known.Therefore the problem of deriving 3D winds from one radar requires additional constraints.For more than one radar, we can increase the number of equations corresponding to each radar.However, even in the multi-radar scenario, there are measurement uncertainties related to the differing sampling strategies and sensitivities of each radar.Therefore extra assumptions must be made to retrieve the 3D wind field from velocities recorded network of radars and produced by models.Two frameworks have typically been used to retrieve the 3D winds from a network of radars.One is the framework that imposes a strong constraint by integrating the mass continuity equation from the surface to the top of the atmosphere [6,7].The other, is the 3D variational (3DVAR) framework that minimizes a sum of cost functions related to radar measurements, equations of motion, balloon based profiles of wind measurements and high resolution weather forecasting models [8,9,10,11,12].The 3DVAR framework has the advantage in that it is easier to integrate observations from extra sensors and models than the strong constraint framework and is less sensitive to initial and boundary conditions [9,12,13].Therefore, the current standard for 3D wind retrievals from radars is the 3DVAR framework.
Past software applied to 3D wind retrievals includes Custom Editing and Display of Reduced Information (CEDRIC) [13] and MultiDop [9,11,14].CEDRIC is based off of the traditional mass continuity integration technique which, again, limits the ability of the user to account for measurement uncertainties than 3DVAR.While ground-breaking at the time, CEDRIC is difficult to use as it involves learning a separate scripting language just to use it.Multidop is a Python wrapper around a C program implementing the 3DVAR technique called Dual Doppler Analysis (DDA).Multidop bought 3DVAR winds to the open source community for the first time.However, it is based off of legacy code for the minimization of the cost function, can be difficult to compile, currently only supports up to 3 radars, is challenging to extend, and is not thread-safe for mass processing of winds on a multiple computational cores.Pythonic Direct Data Assimilation (PyDDA) was developed in order to make a fully Pythonic, open source solution for implementing the 3DVAR framework for 3D wind retrievals where data from an arbitrary number of radars and models can be added.PyDDA is fully written in Python and is built on standard packages in the Scientific Python ecosystem including NumPy [15], SciPy [16], Cartopy [17], and matplotlib [18] as well as the Python ARM Radar Toolkit [19].
In this paper, we will first introduce how the 3DVAR framework in PyDDA was implemented using the Scientific Python ecosystem and how contributors can expand PyDDA to include their data.After that, an example case of using the software to retrieve winds from NEXRAD observations and High Resolution Rapid Refresh (HRRR) model runs in Hurricane Florence is presented.Finally, details on how PyDDA is quality controlled are presented.

Implementation and architecture 3DVAR framework
Let v(x, y, z) = (u(x, y, z), v(x, y, z), w(x, y, z)) ∈ ℜ 3 be the analysis wind field over a Cartesian grid.The v(x, y, z) are defined over a discrete Cartesian grid of m by n by o points provided as (x ijk , y ijk , z ijk ) for i ∈ [0, m), j ∈ [0, n), and k ∈ [0, o).PyDDA derives v(x ijk , y ijk , z ijk ) for i ∈ [0, m), j ∈ [0, n) by finding the v(x ijk , y ijk , z ijk ) that minimizes the cost function in Equation (1).
with a gradient ∇J(v) that is the sum of the gradients C n ∇J n of each term in Equation 1.All of the J n , ∇J n terms are defined as in Tables 1 and 2. Detailed formulas of each function in Tables 1 and 2 are available in the source code of the cost_functions module and in [9,10,11].Equation ( 1) is written in Python using the NumPy and SciPy libraries.
The C n terms are user adjustable weighting coefficients for each constraint.Therefore, the importance of a particular measurement or model derived data point to the constraint can be adjusted.This is useful for accounting for the uncertainties in each constraint.For example, the retrieval can be more weakly constrained by model data that has higher uncertainty than the radar observations.Furthermore, since Equation ( 1) is a sum of cost functions and their gradients, adding more terms to Equation ( 1) is a simple way to expand this framework.All of the functions in Tables 1 and 2 are currently implemented in the cost_functions module of PyDDA.

Optimization problem
Finding the v(x ijk , y ijk , z ijk ) that minimizes J(v) involves the minimization of a cost function of the order of a million (3mno) dimensions, three dimensions for each (x ijk , y ijk , z ijk ) for i ∈ [0, m), j ∈ [0, n), and k ∈ [0, o).SciPy's implementation of the Limited Memory Broyden-Fletcher-Goldfarb-Shanno Bounded (L-BFGS-B) optimization technique is suited for Table 1: List of cost functions currently implemented in PyDDA.

Cost Function Symbol Routine
Total

Mass continuity J mass calculate_mass_continuity
Vertical vorticity minimizing such a cost function [16,20].The L-BFGS-B technique needs the cost function J(v), its gradient ∇J(v), the bounds of v(x ijk , y ijk , z ijk ) and initial guess of v(x ijk , y ijk , z ijk ) for i ∈ [0, m), j ∈ [0, n), and k ∈ [0, o), all of which are obtainable since J(v) is twice differentiable.L-BFGS-B is suited for minimizing J(v) as it is designed to conserve memory which is important when the dimensionality of J(v) is on the order of millions.Furthermore, since L-BFGS-B minimizes J(v) over a bounded domain, this ensures that convergence to a physically realistic solution is reached.
In addition, SciPy's implementation of L-BFGS-B takes advantage of as many cores that are present in a single machine allowing for fast convergence.The retrieval module contains the code that implements the 3DVAR technique using L-BFGS-B.In particular, the code is contained within the get_dd_ wind_field procedure which takes in input from an arbitrary number of Py-ART Grid objects representing an arbitrary number of radars and automatically enters in the optimal inputs to L-BFGS-B for the user.get_ dd_wind_field will run iterations of L-BFGS-B until convergence is reached.Convergence is reached when either |∇(J(v))|<10 -3 or when the maximum change in w over 10 iterations is less than 0.2 m s -1 .
Another component of the optimization problem is that the final v may be sensitive to the initial guess for v. Therefore, PyDDA has an initalization module that provides users with several options for initial guesses listed in Table 3.All of these functions return the initial v as a 3-tuple of NumPy arrays on the Py-ART Grid objects' grid.This standardized form for outputs in the initialization module allows for the addition of custom initializations.
Having both the initialization and retrieval contained within PyDDA reduces the wind retrieval technique to just a few lines of code as shown below.

Data visualization
In addition to calculation, PyDDA comes with a basic visualization module that supports the plotting of wind barbs and streamlines over gridded radar reflectivity fields for easy viewing of the results.visualization was written in Python and uses the matplotlib and cartopy libraries for visualization of the data.Figure 1 shows an example cross section through a thunderstorm sampled by 2 Doppler radars in Darwin, Australia.In addition to plotting over the analysis grid, plotting over geographic maps is supported through the use of the cartopy package [17], with an example streamline plot shown in Figure 2.

Software Quality control
Using the GitHub issue tracker, users can request for help with PyDDA as well as point out potential issues with the software.This provides a forum for contributors to make various improvements to PyDDA.When a contributor makes a modification to PyDDA, the contributor submits a pull request to the master branch of openradar/PyDDA on GitHub.For each pull request, pytest runs a suite of tests on each module of PyDDA to assure that the quality of the retrievals are maintained.In addition, whenever a user submits a pull request on GitHub, Travis CI builds PyDDA in the test environment and runs pytest whenever a pull request is made to the master branch of PyDDA.Travis CI runs these tests on Python 3.6, 3.7, and 3.8 environments on Linux.If the build from Travis CI fails, the contributor is asked to resolve the issues with the code until the continuous integration completes successfully.In addition, each pull request has to be approved by the main developer.Therefore, the main developer determines if the contributor should write a unit test for their added module.Furthermore, PyDDA is also released on the conda-forge repository which requires that Circle CI, Travis CI, and AppVeyor all run PyDDA's unit tests before the release is published on conda-forge.
Unit tests also need to validate the results of PyDDA against basic dynamical principles.For example, the unit tests that tests the get_dd_wind_field function will perform the retrieval on the example in Figure 2 against previous 3DVAR retrievals of this same dataset by [21].While there are differences between the retrieval used by [21] and in PyDDA, both [21] and PyDDA should show resolve a strong updraft (w > 1) in the location given in Figure 2b.Therefore, the test checks for the presence of this updraft.In order to test individual cost functions, properties of the wind field that would be expected from physical or mathematical principles are checked.For example, in order to test the J mass = ∇ • v, the test check to see if J mass is zero in a constant wind field, but negative in a convergent wind field and positive in a divergent wind field.

Support Documentation
Documentation is provided at http://openradarscience. org/PyDDA.It is automatically updated by Travis CI when pull request to the master branch is approved on GitHub.Sphinx automatically generates documentation from the docstrings at the top of each procedure that are written in reStructuredText.Travis CI, using doctr, will run Sphinx every time a pull request to the master branch is approved in order to update the documentation.Therefore, contributors must provide documentation in docstrings at the top of their procedures.Three examples on how to use PyDDA are provided in the documentation.Any user can download and run them to check if the PyDDA is working.

Issues
Developing wind retrievals from radars is typically a complicated task.Therefore, PyDDA has support methods available for users.The primary method of support is through PyDDA's issue tracker on GitHub.On the issue tracker, users are encouraged to ask questions as well as point out potential bugs in PyDDA.In addition, users can provide input for features they would like to see as well as develop their own contributions.The PyDDA developers will examine each issue and provide support.If the issue is related to a bug in PyDDA, either the developer or the contributor will work on resolving the bug.

Operating system
Windows, Mac OS X, and Linux
3. An Intel x86 compatible CPU with 4+ cores.In a project funded by the Department of Energy's Office of Energy Efficiency & Renewable Energy, PyDDA will produce wind fields for assessing the effects of damaging winds on the electric grid.In addition, 3D wind retrievals are vital to understanding thunderstorm dynamics so an open source solution for such retrievals has great potential to be used by the meteorology community.

A synergy of radar observations and models in Hurricane Florence
Using PyDDA, combining observations from high resolution forecasting models and radar observations to create a more complete picture of the 3D wind field inside a hurricane is simple.For this example, we will show how the winds inside Hurricane Florence can be retrieved using PyDDA and show how using radar and model data together demonstrates the improvement in capabilities that PyDDA provides compared to previous software.On 14 September 2018, Hurricane Florence was within range of 2 radars from the NEXRAD network: KMHX stationed in Newport, NC and KLTX stationed in Wilmington, NC.Both radars provided continuous 3D observations of radial velocities inside Hurricane Florence as it made landfall over North Carolina.In addition, the High Resolution Rapid Refresh (HRRR) model was run at every hour, providing model reanalysis of observational data at an hourly temporal resolution and a 3.7 km spatial resolution.The archived HRRR data that is supported by PyDDA can be downloaded from the University of Utah's HRRR archive [22,23].
As one can see, in Figure 3 we have an incomplete picture of the wind field inside Hurricane Florence as the 2 NEXRAD radars only provide partial coverage of the winds at 0.5 km altitude.Therefore, we also constrain against the HRRR calculated winds in Figure 4, which gives us the retrieval in Figure 5.As you can see, a much more complete picture of Hurricane Florence is provided, showing that Eastern North Carolina is indeed having winds much higher than the 10 m s -1 inferred by Figure 3 and also showing the cyclonic circulation that would be expected in a hurricane.This shows how the integration of data from multiple sources enhances the applicability and expandability of PyDDA compared to other software, as Multidop and CEDRIC did not support the integration of model data.

Expanding PyDDA
PyDDA was designed so that contributors can easily add cost functions and custom initializations to PyDDA.In particular, since Equation ( 1) is a sum of cost functions, adding terms to Equation ( 1) expands the framework.The cost_functions module contains all of the cost functions that are used by PyDDA.Expanding the code to incorporate more cost functions involves implementing the cost function, its gradient, and then adding the cost function to the J_function in the cost_ functions module and the gradient to grad_J in  the cost_functions module.Since a cost function will typically be the integral of a function over a grid, commonly called a functional, the gradients are derived by taking the functional derivative of the cost function.For more information on functional derivatives, see [24].See [10] for formulas for the gradients to the mass continuity and radar observational cost functions currently used in PyDDA.Since the functional derivatives and the cost functions have explicit forms, they are all implemented into PyDDA using NumPy [14] and SciPy [15].The constraints module can be expanded to include more functions that interpolate data from various datasets to the analysis grid.Currently, it has two examples from HRRR and WRF data.The get_dd_wind_field already is able to take in any number of models with any field name, so any function that interpolates a dataset can be added to the constraints module.
Another way that PyDDA can be expanded is by adding in custom initializations to the 3DVAR technique.Since Py-DDA takes in a 3-tuple of NumPy arrays representing v with the same shape as the analysis grid as an initialization, any function that returns this 3-tuple can be added to the initialization module.Example initializations consisting of a constant wind field, a sounding, and HRRR and WRF outputs are already in PyDDA.
The PyDDA development team is currently seeking collaborators for expanding the functionality of PyDDA.If a user needs help in expanding PyDDA to include their own sensor or model, they are encouraged to contact the main developer currently supporting PyDDA, Robert Jackson, at rjackson@anl.govfor collaboration on this effort.There is a contributor's guide in the documentation that shows the code style, code of conduct, license, and documentation standards that must be followed for each contribution.In addition, we have published a roadmap of the eventual goals of PyDDA, available at https://github.com/openradar/PyDDA/blob/master/ROADMAP.md that show what contributions to PyDDA are useful.

Figure 1 :
Figure 1: An example wind barb plot overlaid on radar reflectivity at 3.5 km altitude for winds retrieved in thunderstorms sampled by 2 radars over Darwin.Contours represent the presence of updrafts with given velocities at the 3.5 km height level.The area inside the two circles indicate where the wind retrieval is most reliable.

Figure 2 :( 3 )
Figure 2: (a) An example streamline plot overlaid on radar reflectivity at 3 km altitude observed from 2 radars over Darwin, Australia.Contours represent horizontal wind speed.(b) as (a), but zoomed into the region enclosed by the grey box.Contours represent vertical wind speed.

Figure 3 :
Figure 3: Winds at 0.5 km retrieved by PyDDA using only data from the 2 NEXRAD radars overlaid on a reflectivity mosaic generated from the two NEXRAD radars.Barbs are in m s -1 .

Figure 4 :
Figure 4: As Figure 3, but only using the HRRR model as a constraint.

Figure 5 :
Figure 5: As Figure 3, but using both the radar and HRRR winds as constraints.

Table 2 :
List of cost function gradients currently implemented in PyDDA.

Table 3 :
List of current initialization options in PyDDA.