Generalized Sampling in Julia

Generalized sampling is a numerically stable framework for obtaining reconstructions of signals in different bases and frames from their samples. In this paper we introduce a carefully documented toolbox for performing generalized sampling in Julia. Julia is a new language for technical computing with a focus on performance, which makes it ideally suited to handle the large-scale problems often encountered in generalized sampling. The toolbox provides specialized solutions for the setup of Fourier bases and wavelets. The performance of the toolbox is compared to existing implementations of generalized sampling in MATLAB.


Introduction
Generalized sampling [3,4] is a framework for estimating representations of functions in different bases and frames in a numerically stable manner. This paper documents a toolbox for performing generalized sampling in Julia [5]. Julia is a new language for technical computing with a focus on performance, which is essential for the large problems encountered in generalized sampling.
The theory of generalized sampling does not restrict the type of bases to consider, but the applications have focused on Fourier bases and multiscale representations like wavelets. Hence our software has specialized solutions for this particular set of bases.
The paper is organized as follows: In Section 2 and Section 3 we recap the generalized sampling framework, set the notation and discuss the prerequisites. In Sections 4 and 5 we introduce the algorithms in 1D and 2D, respectively. Section 6 is an introduction to the released software.

Generalized sampling
Mathematically, samples of a function f in a Hilbert space H with respect to a sample basis $\{s_n\}_{n\in\mathbb{N}}$ consist of inner products $\{\langle f, s_n\rangle\}_{n\in\mathbb{N}}$. In generalized sampling we want to use these samples to estimate the inner products $\{\langle f, r_n\rangle\}_{n\in\mathbb{N}}$, where $\{r_n\}_{n\in\mathbb{N}}$ is another basis for H. The basis $\{r_n\}_{n\in\mathbb{N}}$ is used for reconstructing f.
In practice we only consider a finite number of sampling and reconstruction functions, i.e., we have access to the samples
$$w^s_m = \langle f, s_m \rangle, \quad 1 \leq m \leq M.$$
From these samples we estimate the coefficients in the reconstruction basis,
$$\widetilde{w}^r_n \approx w^r_n = \langle f, r_n \rangle, \quad 1 \leq n \leq N, \qquad (1)$$
which are used to compute an approximation of f,
$$f_{N,M} = \sum_{n=1}^N \widetilde{w}^r_n r_n. \qquad (2)$$
The actual computation of the reconstruction coefficients $\widetilde{w}^r = \{\widetilde{w}^r_n\}_{n=1}^N$ is performed by solving a least squares problem. The infinite change of basis matrix between the sampling and reconstruction subspaces has $(i,j)$'th entry $\langle r_j, s_i \rangle$. We consider a finite $M \times N$ section of this matrix, denoted by T:
$$T = \big\{ \langle r_j, s_i \rangle \big\}_{1 \leq i \leq M,\ 1 \leq j \leq N}. \qquad (3)$$
The reconstruction coefficients are computed as the least squares solution
$$\widetilde{w}^r = \operatorname{argmin}_{x \in \mathbb{C}^N} \| T x - w^s \|_2. \qquad (4)$$
It is well known that the solution of (4) is $\widetilde{w}^r = T^\dagger w^s$, where $T^\dagger$ is the pseudo-inverse of T. However, for large matrices T it is not feasible to compute this solution analytically. In fact, for realistic sample sizes it may not even be possible to store the change of basis matrix (3). Therefore, in order to enjoy generalized sampling we need specialized algorithms for each pair of sampling and reconstruction bases. A popular algorithm for solving large least squares problems is conjugate gradients (see e.g. [8, p. 637]). To apply the conjugate gradients algorithm we need to be able to compute matrix-vector products with T and its adjoint $T^*$. The convergence rate of conjugate gradients (and similar algorithms) depends on the condition number of T. As mentioned earlier, the benefit of generalized sampling is that we have conditions that ensure good numerical properties of T. The key trick that ensures numerical stability is to let N < M, i.e., to have more samples than reconstruction coefficients; the relation between N and M is determined by the stable sampling rate. For the particular choice of sampling in Fourier space and reconstructing with wavelets the stable sampling rate is linear [4].
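To make the matrix-free requirement concrete, the least squares problem (4) can be solved with a CGLS-type iteration that touches T only through products with T and $T^*$. The sketch below is a minimal illustration under our own naming (cgls is not a function of the package, and T is an explicit matrix here, whereas the package applies it implicitly via fast transforms):

```julia
# Minimal CGLS sketch: solve min_x ||T*x - w^s||_2 using only
# matrix-vector products with T and its adjoint T'.
function cgls(T, b; maxiter=50, tol=1e-10)
    x = zeros(eltype(b), size(T, 2))
    r = b - T * x              # residual in the sample space
    s = T' * r                 # residual of the normal equations
    p = copy(s)
    γ = real(s' * s)           # real scalars keep the iteration complex-safe
    for _ in 1:maxiter
        q = T * p
        α = γ / real(q' * q)
        x += α * p
        r -= α * q
        s = T' * r
        γnew = real(s' * s)
        γnew < tol && break
        p = s + (γnew / γ) * p
        γ = γnew
    end
    return x
end
```

The same iteration applies verbatim when `T * p` and `T' * r` are replaced by fast implicit multiplications.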
Thus, in order to perform generalized sampling, i.e., compute the desired coefficients (1), we need to be able to perform multiplications with T and $T^*$. For the specific choice of sampling with Fourier bases and reconstructing in wavelet bases, this can be accomplished efficiently by using non-uniform fast Fourier transforms.

Notation
When sampling frequency responses we let $\{\xi_m\}_{m=1}^M$ denote the frequency locations in 1D and $\{\xi_m\}_{m=1}^M$, where $\xi_m = (\xi_{m_x}, \xi_{m_y})$, denote the frequency locations in 2D. With $H = L^2(\mathbb{R}^d)$ the entries of T are evaluations of the Fourier transforms of the reconstruction functions, i.e., $\langle r_j, s_i \rangle = \widehat{r}_j(\xi_i)$.

Non-uniform fast Fourier transform
To introduce the non-uniform discrete Fourier transform (NDFT) in dimension D, let $N_d$, $d = 1, \dots, D$, be even, positive integers, $N = (N_1, \dots, N_D)$ and
$$I_N = \prod_{d=1}^{D} \left\{ -\frac{N_d}{2}, \dots, \frac{N_d}{2} - 1 \right\}.$$
The set of sampling locations is denoted $\{\xi_m\}_{m=1}^M$. The NDFT of $x = \{x_k\}_{k \in I_N}$ is
$$y_m = \sum_{k \in I_N} x_k \exp(-2\pi i\, k \cdot \xi_m), \quad 1 \leq m \leq M. \qquad (5)$$
This can be written as a matrix multiplication $y = Fx$, where $F_{m,n} = \exp(-2\pi i\, k_n \cdot \xi_m)$ with a suitable ordering of the elements in $I_N$.
The adjoint NDFT is defined as multiplication with $F^*$, i.e., if $z = F^* y$, then
$$z_k = \sum_{m=1}^M y_m \exp(2\pi i\, k \cdot \xi_m), \quad k \in I_N.$$
In Julia we have access to an NFFT package (https://github.com/tknopp/NFFT.jl) for fast approximation of the NDFT and adjoint NDFT. The NFFT package is inspired by the C library documented in [11].
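For reference, the NDFT and its adjoint can be evaluated naively from the definitions. The sketch below (1D only, with index set $k = -N/2, \dots, N/2 - 1$, and our own function names) is a direct O(MN) implementation of the sums that the NFFT package approximates much faster:

```julia
# Naive 1D NDFT (5) and adjoint NDFT for reference; O(MN) complexity.
function ndft(x::AbstractVector, xi::AbstractVector)
    N = length(x)
    ks = -div(N, 2):div(N, 2)-1                  # index set I_N
    [sum(x[j] * cis(-2π * k * ξ) for (j, k) in enumerate(ks)) for ξ in xi]
end

function ndft_adjoint(y::AbstractVector, xi::AbstractVector, N::Integer)
    ks = -div(N, 2):div(N, 2)-1
    [sum(y[m] * cis(2π * k * ξ) for (m, ξ) in enumerate(xi)) for k in ks]
end
```

The pair satisfies the usual adjoint relation $\langle F x, y \rangle = \langle x, F^* y \rangle$, which is a convenient sanity check for any fast implementation.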
If the M sampling points are uniformly distributed, the NDFT (5) reduces to a DFT with appropriate phase shifts. For simplicity we consider the 1D situation with $\xi_m = \varepsilon m$, where $\varepsilon^{-1} \in \mathbb{N}$, and $\widetilde\xi_m = \xi_m / N$. With $\ell = k + 1 + N/2$, (5) becomes a standard DFT up to phase factors, so in this case y can be computed with an ordinary FFT.

Daubechies scaling functions
Let φ denote the scaling function of a multiresolution analysis (see e.g. [10]). The scaling function on scale J with translation k is defined as
$$\varphi_{J,k}(x) = 2^{J/2} \varphi(2^J x - k).$$
We know that $\operatorname{supp}(\varphi_{J,k}) = 2^{-J}\big(\operatorname{supp}(\varphi) + k\big)$. If φ is the Haar wavelet, then for all $J_0 \geq 0$ the functions $\varphi_{J_0,k}$, $k = -2^{J_0-1}, \dots, 2^{J_0-1} - 1$, are supported in $[-\frac12, \frac12]$. For Daubechies wavelets of higher orders, J needs to be large enough to ensure that $\operatorname{supp}(\varphi_{J,k}) \subseteq [-\frac12, \frac12]$, and the functions near the boundaries (i.e., with k near $-2^{J-1}$ or $2^{J-1}$) need further corrections. We use the boundary wavelets of [6] that have the same number of vanishing moments as the internal/non-boundary wavelets. With p vanishing moments there are p left boundary functions $\varphi^L_k$ and p right boundary functions $\varphi^R_k$, $k = 0, \dots, p-1$. In both cases k = 0 is the function closest to the associated edge, i.e., when traversing the functions from left to right the order is $\varphi^L_0, \dots, \varphi^L_{p-1}$ at the left edge and $\varphi^R_{p-1}, \dots, \varphi^R_0$ at the right edge. At scale J we define $\varphi^L_{J,k} = 2^{J/2}\varphi^L_k(2^J \cdot)$ and similarly for $\varphi^R_k$. Let $\tau_h$ denote the translation operator, $(\tau_h f)(x) = f(x - h)$. For a scaling function related to a Daubechies wavelet with p > 1 vanishing moments, $\operatorname{supp}(\varphi) = [-p+1, p]$, and for J with $2^J \geq p$ we let
$$V^{\text{int}}_J = \operatorname{span}\Big( \{\tau_{-1/2}\,\varphi^L_{J,k}\}_{k=0}^{p-1} \cup \{\varphi_{J,k}\}_{k=-2^{J-1}+p}^{2^{J-1}-p-1} \cup \{\tau_{1/2}\,\varphi^R_{J,k}\}_{k=0}^{p-1} \Big). \qquad (6)$$
For the Haar wavelet no boundary corrections are needed and $V^{\text{int}}_J$ consists of internal functions only. Then $\dim V^{\text{int}}_J = 2^J$.
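As a minimal illustration of the dilation and translation rule (restricted to the Haar scaling function, with the support convention $\operatorname{supp}(\varphi) = [0,1)$; the function names are ours), the functions $\varphi_{J,k}$ can be evaluated directly:

```julia
# Haar scaling function and its scaled translate phi_{J,k};
# phi_Jk has support of length 2^-J and unit L2-norm.
phi(x) = (0 <= x < 1) ? 1.0 : 0.0
phi_Jk(x, J, k) = 2.0^(J / 2) * phi(2.0^J * x - k)
```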

Generalized sampling in 1D
For a function $f \in L^2(\mathbb{R})$ we wish to compute an approximation of $f \cdot 1_{[-\frac12, \frac12]}$ with scaling functions from a single $V^{\text{int}}_J$ as defined in (6). Let $\chi = \{\xi_m\}_{m=1}^M$ denote the frequency locations where we obtain samples $y_m = \widehat f(\xi_m)$. The scale of the reconstruction is J and $N = 2^J$ is the number of reconstructed coefficients. The boundary scaling functions need to be translated and dilated to fit this reconstruction interval, i.e., on the left side we consider $\tau_{-1/2}\,\delta_{2^J}\,\varphi^L_k$ and on the right side we consider $\tau_{1/2}\,\delta_{2^J}\,\varphi^R_k$. Let $p \geq 1$ be the number of vanishing moments of the scaling function and $t_{m,n}$ denote the $(m,n)$'th entry of the change of basis matrix T. With the usual calculus for Fourier transforms, (11) and (12), we have for the internal scaling functions that
$$t_{m,n} = 2^{-J/2} \exp(-2\pi i\, 2^{-J} k_n \xi_m)\, \widehat\varphi(2^{-J}\xi_m).$$
Introduce the diagonal matrix
$$D = \operatorname{diag}\big( 2^{-J/2}\, \widehat\varphi(2^{-J}\xi_m),\ 1 \leq m \leq M \big).$$
Let L be the $M \times p$ matrix with entries $L_{m,n} = t_{m,n}$, R be the $M \times p$ matrix with entries $R_{m,n} = t_{m,N-p+1+n}$ and $I = D \cdot F$. With this notation we can write T as the block matrix $T = \begin{bmatrix} L & I & R \end{bmatrix}$.

Non-uniform sampling
With non-uniform sampling points we may have clusters and sparsely covered areas in the frequency domain. To compensate for this, the reconstructed coefficients are computed as the solution of a weighted least squares problem,
$$\widetilde{w}^r = \operatorname{argmin}_{x \in \mathbb{C}^N} \big\| Q^{1/2} (T x - w^s) \big\|_2, \qquad (8)$$
where $Q = \operatorname{diag}(\mu_m, 1 \leq m \leq M)$. From the solver's point of view, the only change from the ordinary least squares problem is that $w^s_m$ and $t_{m,n}$ are multiplied with $\sqrt{\mu_m}$ for all $n = 1, \dots, N$.
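From the solver's point of view the weighting is thus a simple preprocessing step. A minimal sketch (apply_weights is our own illustrative name; T is an explicit matrix here) scales the samples and the rows of T by $\sqrt{\mu_m}$ before calling an unweighted least squares solver:

```julia
# Apply sqrt-weights to the samples and to the rows of T, so that
# an unweighted least squares solver returns the weighted solution.
function apply_weights(T::AbstractMatrix, ws::AbstractVector, mu::AbstractVector)
    s = sqrt.(mu)
    return s .* T, s .* ws    # broadcasting scales row m by sqrt(mu[m])
end
```

For matrix-free solvers the same effect is obtained by scaling inside the multiplication routines for T and $T^*$.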
To determine the weights, we follow [1] and [2]. For simplicity, we focus on the case where the unknown function f is supported in $[-\frac12, \frac12]$. Here Ω is the set of sampling frequencies and $Y \subset \mathbb{R}$ is a closed, simply connected set.
With this notation, we sum up some of the results of [2]:

Theorem 1
Let Ω be a countable set of sampling frequencies with sufficiently small density¹ relative to $[-\frac12, \frac12]$, and let $\mu_\omega$ be the Lebesgue measure of the Voronoi region of $\omega \in \Omega$. Then the weighted exponentials $\{\sqrt{\mu_\omega}\, e^{2\pi i \omega \cdot}\}_{\omega \in \Omega}$ constitute a weighted Fourier frame.
¹ [2] refers to this as a density, but the lower the number, the more dense is the set.
For some Y and sufficiently large N, the above results lead to (8).

Generalized sampling in 2D
For $f \in L^2(\mathbb{R}^2)$ we wish to compute an approximation of $f \cdot 1_{[-\frac12, \frac12]^2}$. To this end we use tensor products of the functions in $V^{\text{int}}_J$ and obtain an approximation relative to $V^{\text{int}}_J \otimes V^{\text{int}}_J$. This way we obtain a similar block structure of the change of basis matrix. Indeed, if both dimensions are divided as in (7), the division of the 2D coefficients is
$$\begin{bmatrix} L_x L_y & L_x I_y & L_x R_y \\ I_x L_y & I_x I_y & I_x R_y \\ R_x L_y & R_x I_y & R_x R_y \end{bmatrix} \qquad (9)$$
For most applications the central part $I_x I_y$ will be dominant, and in certain cases one may even choose to disregard the edge scaling functions. An explicit example of (9) in the setup with one sample point and two left, two central, and two right scaling functions can be found in Appendix C. Just as in the 1D case, non-uniform sampling patterns require that we solve a weighted least squares problem (8) and introduce the notions of bandwidth and density; this is deferred to Appendix B.

The GeneralizedSampling package
This section introduces the basic use of the GeneralizedSampling package through examples and demonstrates its performance. The examples are shamelessly copied from Hansen et al.

Package overview
We have two goals with the GeneralizedSampling package: It should be fast and easy to use. We have therefore put effort into providing only a few necessary high-level functions and hiding the lower level details. The most important function is Freq2Wave that computes a representation of the change of basis matrix. Several built-in functions are overloaded to make the output of Freq2Wave behave like an ordinary matrix, including the backslash operator "\" for computing least squares solutions to T x = y. Currently, the least squares solution is computed with a conjugate gradient procedure.
A separate package, IntervalWavelets, has been developed to visualize the wavelet representations and is available at https://github.com/robertdj/IntervalWavelets.jl. The function of interest from IntervalWavelets is weval that evaluates a representation in the basis of V int J from (6).

Using the package
We begin with an example of how reconstruction is performed in 1D. These examples are also included in the package as scripts that are ready to run.
The Fourier transform $\widehat f$ of a function $f : \mathbb{R} \to \mathbb{R}$ is measured in the frequency domain at $\{\frac{n}{2}\}_{n=-64}^{63}$, i.e., we have access to $\{\widehat f(\frac{n}{2})\}_{n=-64}^{63}$. For convenience, points on a uniform grid with distance ε apart are available with the function grid. We wish to compute an approximation of f in the Haar basis at scale 6, i.e., with 64 Haar scaling functions.
In Julia, let fhat denote a vector with the values of the Fourier transform.
julia> using GeneralizedSampling
julia> xi = grid(128, 0.5)
julia> T = Freq2Wave(xi, "haar", 6)
julia> wcoef = T \ fhat

To evaluate the vector wcoef of coefficients for the Haar scaling functions, weval of the IntervalWavelets package is used. The wcoef vector has complex entries and weval only accepts real vectors. Furthermore, the resolution of the reconstruction must be specified: A general Daubechies wavelet can only be computed in the dyadic rationals, i.e., points of the form $k/2^R$ for $k \in \mathbb{Z}$ and $R \in \mathbb{N} \cup \{0\}$, where R is referred to as the resolution.
julia> using IntervalWavelets
julia> x, y = weval(real(wcoef), "haar", 10)

An example included in the package is the reconstruction of a truncated cosine (with inspiration from [7]). The result is seen in Fig. 1.
To reconstruct in a different Daubechies basis associated with a wavelet with p vanishing moments, two things must be changed in the above code:

julia> T = Freq2Wave(xi, "dbp", J)
julia> x, y = weval(real(wcoef), "dbp", 10)

The output of weval are vectors whose entries are the pairs $(x, f_{N,M}(x))$. In the example with the truncated cosine the higher order, continuous Daubechies scaling functions are not well suited to represent the discontinuity. As mentioned in Section 4.1 it may be of interest to compute reconstructions from non-uniform sampling points. In this situation the bandwidth must be supplied as a fourth parameter to Freq2Wave.
Reconstruction of 2D functions/images is performed in a very similar manner. The only difference is that the sampling locations xi must be a matrix with two columns. Remember when choosing the scale J that the number of scaling functions at scale J is $4^J$ instead of the $2^J$ in 1D; the matrices therefore grow rapidly with the scale.
As an example we consider reconstruction of a simulated brain made with the Matlab [12] software released along with [9] (available at http://bigwww.epfl.ch/algorithms/mriphantom). The reconstructed brain with the Daubechies 4 scaling functions is seen in Fig. 2.
These examples are released along with the code. To avoid having to compute the frequencies for the brain images we have saved these in a native Julia format. However, these files are quite large and are not released with the source code; instead they are available from one of the authors' website: http://people.math.aau.dk/~morteng.

Runtime and technical comparison
Julia is a dynamic language with a fast JIT compiler. A consequence is that a function is compiled the first time it is called, causing an overhead in terms of time and memory. All subsequent calls are, however, without this compiling overhead. The runtimes reported in this section are not for the first run. We compare with the Matlab implementation released along with [7] (available at http://www.damtp.cam.ac.uk/research/afha/code). In this connection two comparisons are relevant: computing the representation of the change of basis matrix (initialization/"init") and using this representation to compute the least squares solution (solution/"sol"). In both cases the initialization step is fast for Haar scaling functions: all Fourier transforms have simple, analytical expressions that are easily vectorized. For higher order Daubechies scaling functions all computations rely on iterative procedures.
In both packages the solution step is based on a conjugate gradient-like algorithm, where the computational cost is dominated by multiplication with the change of basis matrix and its adjoint. In Matlab the built-in lsqr function is used and in Julia a custom implementation of the conjugate gradient method is used.
In GeneralizedSampling we have relied on Julia's ability to write functions that modify their arguments in-place to drastically reduce the memory consumption in an iterative algorithm like conjugate gradients. Julia's @time macro makes it easy to estimate the memory allocation of a function. Matlab has no such documented features and we have therefore not included comparisons on memory usage.
Especially for fast runtimes it is not accurate to rely on timing a single run of a function (using e.g. tic and toc in Matlab). In Matlab the times are obtained with the built-in timeit function and in Julia we use the benchmark package BenchmarkTools available at https://github.com/JuliaCI/BenchmarkTools.jl.
For small problems the Julia and Matlab code are comparable, but for large problems the Julia code is significantly faster. The one exception is the "Uniform 2D" example: the Julia timing is using the general NFFT algorithm, whereas the Matlab timing is considering the special case where the standard FFT is applicable, as explained in Section 3.2.
Note that the "sol" times and number of iterations are not directly comparable, since [7] uses $L^2([0,1]^d)$ as the reconstruction space. But the time per iteration ("s/n") is comparable.

Availability of the package
The GeneralizedSampling package is open-source with an MIT license and available from the GitHub repository https://github.com/robertdj/GeneralizedSampling.jl. For an easy installation use the built-in package manager in Julia:

julia> Pkg.add("GeneralizedSampling")

A. Fourier transform of Daubechies wavelets
In the code and experiments we let $H = L^2(\mathbb{R})$ and define the Fourier transform of $f \in L^1(\mathbb{R})$ as
$$\widehat f(\xi) = \int_{\mathbb{R}} f(x) \exp(-2\pi i \xi x)\, dx.$$
Let $\delta_a$ and $\tau_h$ denote a dilation and translation operator, respectively:
$$(\delta_a f)(x) = a^{1/2} f(ax), \qquad (\tau_h f)(x) = f(x - h).$$
We have the following well-known relations between the Fourier operator and the dilation and translation operators:
$$\widehat{\tau_h f}(\xi) = \exp(-2\pi i h \xi)\, \widehat f(\xi), \qquad (11)$$
$$\widehat{\delta_a f}(\xi) = a^{-1/2}\, \widehat f(\xi / a). \qquad (12)$$
With the properties (11) and (12) we have for a general scaling function that
$$\widehat{\varphi_{J,k}}(\xi) = 2^{-J/2} \exp(-2\pi i\, 2^{-J} k \xi)\, \widehat\varphi(2^{-J}\xi). \qquad (13)$$
For the Haar wavelet basis we have an analytic expression for the Fourier transform of the scaling function:
$$\widehat\varphi(\xi) = \frac{1 - \exp(-2\pi i \xi)}{2\pi i \xi}.$$
A general Daubechies scaling function φ is defined by a filter $\{h_k\}_{k\in\mathbb{Z}}$ where only finitely many entries are non-zero. The associated low-pass filter, $m_0$, is defined as
$$m_0(\xi) = \sum_{k\in\mathbb{Z}} h_k \exp(-2\pi i k \xi).$$
The Fourier transform is computed in terms of the low-pass filter:
$$\widehat\varphi(\xi) = \prod_{j=1}^{\infty} m_0(2^{-j}\xi), \qquad (14)$$
see e.g. [10]. To ensure convergence of the product in (14), the filter coefficients must be scaled such that $m_0(0) = 1$. In the GeneralizedSampling package we use the filters provided in [6].
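Numerically, the infinite product (14) is evaluated by truncation. The following sketch (function names are ours) illustrates the idea for the Haar filter, where $m_0(\xi) = (1 + e^{-2\pi i \xi})/2$ satisfies $m_0(0) = 1$ and the exact transform is known for comparison:

```julia
# Haar low-pass filter with the normalization m0(0) = 1.
m0(ξ) = (1 + cis(-2π * ξ)) / 2

# Approximate the Fourier transform of the scaling function by
# the first `terms` factors of the infinite product (14).
phi_hat(ξ; terms=30) = prod(m0(ξ / 2^j) for j in 1:terms)
```

Since each additional factor differs from 1 by $O(2^{-j})$, a few dozen factors already give errors far below single precision.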

A.1. Fourier transform of boundary scaling functions
Computation of the Fourier transform of the boundary wavelets of [6] is described in [7] and repeated here for completeness. The left boundary scaling functions satisfy the following dilation equation:
$$\varphi^L_k(x) = \sqrt{2} \sum_{l} H^L_{k,l}\, \varphi^L_l(2x) + \sqrt{2} \sum_{m} h^L_{k,m}\, \varphi(2x - m). \qquad (15)$$
Applying the Fourier transform to this equation yields that
$$\widehat{\varphi^L_k}(\xi) = \frac{1}{\sqrt{2}} \sum_{l} H^L_{k,l}\, \widehat{\varphi^L_l}\Big(\frac{\xi}{2}\Big) + \frac{1}{\sqrt{2}} \sum_{m} h^L_{k,m} \exp(-\pi i m \xi)\, \widehat\varphi\Big(\frac{\xi}{2}\Big). \qquad (16)$$
These equations are collected in vector form by introducing the matrices $U^{\text{left}}$ and $V^{\text{left}}$ and the vector $v^{\text{left}}(\xi) = \{\widehat{\varphi^L_k}(\xi)\}_{k=0}^{p-1}$. With this notation (16) can be written as a linear system in $v^{\text{left}}$ that can be solved with respect to $v^{\text{left}}(\xi)$. The counterpart of (15) for the right boundary scaling functions is analogous. Introduce the matrices $U^{\text{right}}$ and $V^{\text{right}}$ completely analogously to $U^{\text{left}}$ and $V^{\text{left}}$, respectively, and let $v^{\text{right}}(\xi) = \{\widehat{\varphi^R_k}(\xi)\}_{k=0}^{p-1}$. With these notational counterparts, the computations above for the left boundary scaling functions can be copied for the right scaling functions.

A.2. Fourier transform in 2D
When considering two-dimensional wavelets we introduce the scaling function and the horizontal, vertical and diagonal wavelets as the tensor products
$$\Phi = \varphi \otimes \varphi, \qquad \Psi^h = \psi \otimes \varphi, \qquad \Psi^v = \varphi \otimes \psi, \qquad \Psi^d = \psi \otimes \psi.$$
When the scale is fixed, the translations are used to index the functions. The separable nature of these functions gives the identity
$$\widehat\Phi(\xi_x, \xi_y) = \widehat\varphi(\xi_x)\, \widehat\varphi(\xi_y).$$

B. Weights in non-uniform sampling
The bandwidth area is divided into the Voronoi tessellation induced by the sampling points, i.e., the Voronoi cell of point $\xi_i$ is
$$V_i = \{ x \in Y_K : \|x - \xi_i\|_2 \leq \|x - \xi_j\|_2 \ \text{for all } j \neq i \}. \qquad (19)$$
The weight $\mu_m$ of sampling point $\xi_m$ is then the area of $V_m$. Let B denote the collection of boundaries of the Voronoi cells. The density δ is defined as
$$\delta = \sup_{x \in Y_K} \min_{1 \leq m \leq M} \|x - \xi_m\|_2.$$
Since $Y_K$ is closed, δ is attained. Due to (19), each $x \in Y_K$ lies either in a unique Voronoi cell or in B, and δ is attained at a point in B, which is also a closed set. More precisely, as B is a union of straight line segments, the supremum is attained at one of the corners in B. Algorithm 1 summarizes this.
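In 1D the analogous computation is elementary: each Voronoi cell is an interval bounded by the midpoints between neighboring sampling points, clipped to the bandwidth interval $[-K, K]$. A small illustrative sketch under our own naming (the package itself handles the 2D case):

```julia
# 1D Voronoi weights on [-K, K]: the cell of a sampling point is
# bounded by the midpoints to its neighbors, clipped to [-K, K].
# Returns the weights in the order of the sorted sampling points.
function voronoi_weights(xi::AbstractVector{<:Real}, K::Real)
    ξ = sort(xi)
    M = length(ξ)
    edges = [-K; [(ξ[m] + ξ[m+1]) / 2 for m in 1:M-1]; K]
    return diff(edges)    # weight μ_m = length of the m'th cell
end
```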

C. Multiplication in 2D
In 2D we have the block structure for the reconstruction coefficients as considered in (9). We refer to the vectorized version of a matrix as the vector obtained by stacking the columns of the matrix. In the change of basis matrix the structure of a row is the vectorized form of the matrix in (9). However, both in the implementation and this section we consider multiplication of a change of basis matrix and this "matrix form" of the coefficients.
As an example, consider the hypothetical situation with 2 left, 2 internal and 2 right scaling functions. The matrix for a single frequency $\xi = (\xi_x, \xi_y)$ is then as follows: In the column-major ordering of matrices used in Julia, this orders the scaling functions first by the y-coordinate and then by the x-coordinate, which is consistent with the ordering used in the NFFT package.

C.1. Multiplication with T
The contributions to the product of the change of basis matrix with a matrix of the associated coefficients must be handled differently:
• The internal part $I_x I_y$.
• The "upper" and "lower" parts, $L_x I_y$ and $R_x I_y$.
• The "left" and "right" parts, the vertical blocks $[(L_x L_y)\ (I_x L_y)\ (R_x L_y)]^T$ and $[(L_x R_y)\ (I_x R_y)\ (R_x R_y)]^T$.
The internal part is handled by a 2D NFFT and for any reasonably large image this dominates the computation time. As a representative of the "left"/"right" parts, consider the contribution of the $\ell$'th column of X, $1 \leq \ell \leq p$, where the contribution to the m'th entry in the product is a sum recognized as the multiplication with a 1D change of basis matrix based on the x-coordinates of the samples, scaled by $\widehat{\varphi^L_{\ell-1}}(\xi_{m_y})$. Conclusively, the "left" contribution is based on 1D multiplication and a subsequent Hadamard product. For the "right" part, the only change is that the Hadamard product is with $\{\widehat{\varphi^R_{\ell-1}}(\xi_{m_y})\}_{m=1}^M$. As an example of the "upper"/"lower" parts, consider the $\ell$'th row of X, $1 \leq \ell \leq p$, where the contributing sum is a 1D NDFT of the row vector $\{x_{\ell,k}\}_{k=p+1}^{2^J-p}$, approximated with the NFFT. As in the "left"/"right" case, we must afterwards perform a Hadamard product.

C.2. Multiplication with T *
In the multiplication $X = T^* v$ the same block structure appears. The situation is therefore similar to that in Appendix C.1: the interior part of X is computed with a 2D adjoint NFFT and the "left"/"right" and "upper"/"lower" parts are computed with appropriate combinations of Hadamard products and 1D multiplications/adjoint NFFTs.