Abstract:
Kalman filtering and smoothing algorithms are used in many areas, including tracking and navigation, medical applications, and financial trend filtering. One of the basic assumptions required to apply the Kalman smoothing framework is that the error covariance matrices are known in advance. In this paper, we study a general class of inference problems where covariance matrices can depend functionally on unknown parameters. In the Kalman framework, this allows modeling situations where the covariance matrices depend on the state sequence being estimated. We present an extended formulation and a generalized Gauss-Newton (GGN) algorithm for inference in this context. When applied to dynamic systems inference, we show that the algorithm can be implemented to preserve the computational efficiency of the classic Kalman smoother. The new approach is illustrated with a synthetic numerical example.
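As a toy illustration of the GGN idea, the sketch below takes a generalized Gauss-Newton step for a scalar model whose measurement variance depends on the state: the covariance is frozen at the current iterate and a standard weighted Gauss-Newton update is taken. The model h, the variance sigma2, and all values are illustrative, not taken from the paper.

```python
import numpy as np

def h(x):            # illustrative measurement model
    return x ** 2

def dh(x):           # its Jacobian
    return 2.0 * x

def sigma2(x):       # state-dependent measurement variance
    return 1.0 + 0.1 * x ** 2

def ggn(y, x0, iters=50):
    # Generalized Gauss-Newton: freeze the covariance at the current
    # iterate x_k, then take a weighted Gauss-Newton step.
    x = x0
    for _ in range(iters):
        w = 1.0 / sigma2(x)                  # frozen inverse covariance
        J = dh(x)
        r = y - h(x)
        x = x + (J * w * r) / (J * w * J)    # weighted GN update
    return x

x_hat = ggn(y=4.0, x0=1.0)
print(x_hat)   # converges to 2, since h(2) = 4
```

In this scalar case the weight cancels in the update, but the same frozen-covariance structure carries over to the vector setting, where the weighting matters.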

Abstract:
Many inverse problems include nuisance parameters which, while not of direct interest, are required to recover primary parameters. Structure present in these problems allows efficient optimization strategies: a well-known example is variable projection, where nonlinear least squares problems which are linear in some parameters can be optimized very efficiently. In this paper, we extend the idea of projecting out a subset of the variables to a broad class of maximum likelihood (ML) and maximum a posteriori (MAP) problems with nuisance parameters, such as variance or degrees of freedom. As a result, we are able to incorporate nuisance parameter estimation into large-scale constrained and unconstrained inverse problem formulations. We apply the approach to a variety of problems, including estimation of unknown variance parameters in the Gaussian model, degree-of-freedom (d.o.f.) parameter estimation in the context of robust inverse problems, automatic calibration, and optimal experimental design. Using numerical examples, we demonstrate improvement in recovery of primary parameters for several large-scale inverse problems. The proposed approach is compatible with a wide variety of algorithms and formulations, and its implementation requires only minor modifications to existing algorithms.
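The variance case mentioned above admits a closed-form projection. As a minimal sketch (with an illustrative random Gaussian model, not data from the paper): the joint negative log likelihood in (x, s), with s the noise variance, is f(x, s) = (m/2) log s + ||Ax - b||^2 / (2s); for fixed x the minimizer is s = ||Ax - b||^2 / m, and substituting it back leaves a reduced problem in x alone.

```python
import numpy as np

# Project out the nuisance variance of a Gaussian model:
#   f(x, s) = m/2 * log(s) + ||A x - b||^2 / (2 s),  s = sigma^2.
# For fixed x the optimal variance is s = ||A x - b||^2 / m, so the
# reduced objective is m/2 * (log ||A x - b||^2 + const), and x can be
# recovered from the reduced problem alone (here: ordinary least squares).

rng = np.random.default_rng(0)
m, n = 50, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # reduced problem in x
s_hat = np.sum((A @ x_hat - b) ** 2) / m        # variance in closed form

print(x_hat, s_hat)
```

At (x_hat, s_hat) the stationarity conditions of the full joint likelihood hold, which is the point of the projection: the nuisance parameter costs nothing extra.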

Abstract:
We explore the use of stochastic optimization methods for seismic waveform inversion. The basic principle of such methods, which dates back to the 1950s, is to randomly draw a batch of realizations of a given misfit function. The ultimate goal of such an approach is to dramatically reduce the computational cost involved in evaluating the misfit. Following earlier work, we introduce stochasticity into the waveform inversion problem in a rigorous way via a technique called randomized trace estimation. We then review theoretical results that underlie recent developments in the use of stochastic methods for waveform inversion. We present numerical experiments to illustrate the behavior of different types of stochastic optimization methods and investigate the sensitivity to the batch size and the noise level in the data. We find that it is possible to reproduce results that are qualitatively similar to the solution of the full problem with modest batch sizes, even on noisy data. Each iteration of the corresponding stochastic methods requires an order of magnitude fewer PDE solves than a comparable deterministic method applied to the full problem, which may lead to an order of magnitude speedup for waveform inversion in practice.
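As a minimal sketch of randomized trace estimation in isolation (Hutchinson's estimator with Rademacher probes; the matrix and sizes below are illustrative), note that for random probes w with E[w w^T] = I, the expectation of w^T A w is trace(A), so a small batch of probes gives an unbiased estimate at the cost of a few matrix-vector products:

```python
import numpy as np

# Hutchinson's randomized trace estimator: average w^T A w over a
# small batch of Rademacher probes w.  Each probe costs one mat-vec,
# which is the mechanism that lets a full misfit evaluation be
# replaced by a cheap randomized batch.

rng = np.random.default_rng(1)
n, batch = 200, 30
A = rng.standard_normal((n, n))
A = A @ A.T                                    # symmetric test matrix

W = rng.choice([-1.0, 1.0], size=(n, batch))   # Rademacher probes
trace_est = np.mean(np.einsum('ij,ij->j', W, A @ W))

print(trace_est, np.trace(A))                  # estimate vs. exact trace
```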

1. Introduction
The use of simultaneous source data in seismic imaging has a long history. So far, simultaneous sources have been used to increase the efficiency of data acquisition [1, 2], migration [3, 4], and simulation [5–7]. Recently, the use of simultaneous source encoding has found its way into waveform inversion. Two key factors play a role in this development: (i) in 3D, one is forced to use modeling engines whose cost is proportional to the number of shots (as opposed to 2D frequency-domain methods, where one can reuse the LU factorization to cheaply model any number of shots), and (ii) the curse of dimensionality: the number of shots and the number of gridpoints grow by an order of magnitude.
The basic idea of replacing single-shot data by randomly combined “super shots” is intuitively pleasing and has led to several algorithms [8–11]. All of these aim at reducing the computational costs of full waveform inversion by reducing the number of PDE solves (i.e., the number of simulations). This reduction comes at the cost of introducing random crosstalk between the shots into the problem. It was observed by Krebs et al. [8] that it is beneficial to recombine the shots at every iteration to suppress the random crosstalk, and that the approach might be more sensitive to noise in the data. In this paper, we follow Haber et al. [12] and
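A minimal sketch of the identity underlying the super-shot idea, using a generic linear operator as a stand-in for the expensive modeling step (all names and sizes are illustrative): for a linear modeling operator F, sources Q (one column per shot), and data D, the full misfit equals the expectation of the one-simulation randomized misfit over weights w with E[w w^T] = I.

```python
import numpy as np

# With R = F Q - D, the full misfit is sum_i ||F q_i - d_i||^2 = tr(R^T R),
# while one randomized super shot gives ||F (Q w) - D w||^2 = w^T R^T R w,
# whose expectation over Rademacher w equals the full misfit.  One draw
# costs one simulation instead of one per shot.

rng = np.random.default_rng(2)
n_rec, n_src, n_par = 40, 25, 10
F = rng.standard_normal((n_rec, n_par))        # stand-in modeling operator
Q = rng.standard_normal((n_par, n_src))        # individual sources
D = F @ Q + 0.1 * rng.standard_normal((n_rec, n_src))  # noisy data

full = np.sum((F @ Q - D) ** 2)                # one solve per shot

draws = [np.sum((F @ (Q @ w) - D @ w) ** 2)    # one solve per draw
         for w in rng.choice([-1.0, 1.0], size=(5000, n_src))]
print(np.mean(draws), full)                    # sample mean approaches full misfit
```

Redrawing w each iteration, as Krebs et al. advocate, averages out the crosstalk term w^T R^T R w - tr(R^T R) over the course of the optimization.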

Abstract:
Regularization plays a key role in a variety of optimization formulations of inverse problems. A recurring theme in regularization approaches is the selection of regularization parameters, and their effect on the solution and on the optimal value of the optimization problem. The sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange multipliers. This paper characterizes the variational properties of the value functions for a broad class of convex formulations, which are not all covered by standard Lagrange multiplier theory. An inverse function theorem is given that links the value functions of different regularization formulations (not necessarily convex). These results have implications for the selection of regularization parameters, and the development of specialized algorithms. Numerical examples illustrate the theoretical results.
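A one-dimensional sketch of the link between the value function and the Lagrange multiplier (the problem below is illustrative, not from the paper): for v(tau) = min over |x| <= tau of (x - b)^2 with b > tau > 0, the constraint is active, the multiplier is lam = 2(b - tau), and the slope of the value function is v'(tau) = -lam.

```python
import numpy as np

# v(tau) = min_{|x| <= tau} (x - b)^2 with b > tau > 0 has solution
# x = tau, value (b - tau)^2, and multiplier lam = 2 (b - tau) for the
# constraint |x| <= tau; the sensitivity of the value function to the
# regularization parameter is v'(tau) = -lam.

b, tau, h = 3.0, 1.0, 1e-6

def v(t):
    x = np.clip(b, -t, t)          # projection solves this 1-D problem
    return (x - b) ** 2

lam = 2.0 * (b - tau)              # multiplier at the solution x = tau
dv = (v(tau + h) - v(tau - h)) / (2 * h)   # finite-difference slope

print(dv, -lam)                    # v'(tau) matches -lambda
```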

Abstract:
We consider an important class of signal processing problems where the signal of interest is known to be sparse, and can be recovered from data given auxiliary information about how the data was generated. For example, a sparse Green's function may be recovered from seismic experimental data using sparsity optimization when the source signature is known. Unfortunately, in practice this information is often missing, and must be recovered from data along with the signal using deconvolution techniques. In this paper, we present a novel methodology to simultaneously solve for the sparse signal and auxiliary parameters using a recently proposed variable projection technique. Our main contribution is to combine variable projection with sparsity promoting optimization, obtaining an efficient algorithm for large-scale sparse deconvolution problems. We demonstrate the algorithm on a seismic imaging example.
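As a minimal sketch of the projection idea in a blind sparse deconvolution setting (a Gaussian source pulse of unknown width, with the inner sparse problem solved by plain ISTA; all names and values are illustrative, and a coarse grid search over the kernel parameter stands in for the paper's variable projection algorithm): for each candidate width theta, solve the inner sparsity-promoting problem, then compare the projected objective values.

```python
import numpy as np

# Inner problem for fixed kernel parameter theta:
#   min_x 0.5 ||A(theta) x - b||^2 + lam ||x||_1      (solved by ISTA)
# Outer problem: pick theta minimizing the projected (post-solve) value.

rng = np.random.default_rng(3)
n = 128
t = np.arange(n)

def A(theta):                      # convolution with a Gaussian pulse
    k = np.exp(-0.5 * ((t - n // 2) / theta) ** 2)
    return np.array([np.roll(k, s - n // 2) for s in range(n)]).T

x_true = np.zeros(n); x_true[[30, 70, 100]] = [1.5, -2.0, 1.0]
theta_true = 3.0
b = A(theta_true) @ x_true + 0.01 * rng.standard_normal(n)

def ista(M, b, lam, iters=500):
    L = np.linalg.norm(M, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(M.shape[1])
    for _ in range(iters):
        g = x - (M.T @ (M @ x - b)) / L
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return x

lam = 0.05
def projected(theta):              # objective value after the inner solve
    x = ista(A(theta), b, lam)
    return 0.5 * np.sum((A(theta) @ x - b) ** 2) + lam * np.sum(np.abs(x))

thetas = np.linspace(1.5, 5.0, 8)
theta_hat = thetas[np.argmin([projected(th) for th in thetas])]
print(theta_hat)                   # grid minimizer near theta_true = 3
```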

Abstract:
We present a Kalman smoothing framework based on modeling errors using the heavy-tailed Student's t distribution, along with algorithms, convergence theory, an open-source general implementation, and several important applications. The computational effort per iteration grows linearly with the length of the time series, and all smoothers allow nonlinear process and measurement models. Robust smoothers form an important subclass within this framework; they work in situations where measurements are highly contaminated by noise or include data unexplained by the forward model. Highly robust smoothers are developed by modeling measurement errors using the Student's t distribution, and outperform the recently proposed L1-Laplace smoother in extreme situations with data containing 20% or more outliers. A second special application we consider in detail, developed by modeling the process noise using the Student's t distribution, allows the smoother to track sudden changes in the state. These features can be used separately or in tandem, and we present a general smoother algorithm and open-source implementation, together with a convergence analysis that covers a wide range of smoothers. A key ingredient of our approach is a technique to deal with the non-convexity of the Student's t loss function. Numerical results for linear and nonlinear models illustrate the performance of the new smoothers for robust and tracking applications, as well as for mixed problems that have both types of features.
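A minimal sketch of why the Student's t loss is robust: up to constants, the negative log density is rho(r) = (nu + 1)/2 * log(1 + r^2/nu), which grows only logarithmically, so the influence of a residual (the derivative of the loss) decays to zero for large r, while the least-squares influence r grows without bound. The value of nu below is illustrative.

```python
import numpy as np

# Influence function of the Student's t loss
#   rho(r) = (nu + 1)/2 * log(1 + r^2 / nu),
# whose derivative is (nu + 1) r / (nu + r^2): approximately linear
# (least-squares-like) for small residuals, redescending toward zero
# for large ones, so outliers are effectively down-weighted.

nu = 4.0
def t_influence(r):
    return (nu + 1.0) * r / (nu + r ** 2)   # d/dr of rho(r)

r = np.array([0.1, 1.0, 10.0, 100.0])
print(t_influence(r))   # small residuals behave like least squares,
print(r)                # large residuals are effectively ignored
```

The redescending shape is also what makes the loss non-convex, which is why a dedicated technique for the non-convexity is needed in the framework above.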

Abstract:
The classical approach to linear system identification is given by parametric Prediction Error Methods (PEM). In this context, model complexity is often unknown, so a model order selection step is needed to suitably trade off bias and variance. Recently, a different approach to linear system identification has been introduced, in which model order determination is avoided by using a regularized least squares framework. In particular, the penalty term on the impulse response is defined by so-called stable spline kernels, which embed information on regularity and BIBO stability, and depend on a small number of parameters that can be estimated from data. In this paper, we provide new nonsmooth formulations of the stable spline estimator. In particular, we consider linear system identification problems in a very broad context, where regularization functionals and data misfits can come from a rich set of piecewise linear quadratic functions. Moreover, our analysis includes polyhedral inequality constraints on the unknown impulse response. For any formulation in this class, we show that interior point methods can be used to solve the system identification problem, with complexity O(n^3) + O(mn^2) in each iteration, where n and m are the number of impulse response coefficients and measurements, respectively. The usefulness of the framework is illustrated via a numerical experiment where output measurements are contaminated by outliers.
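A minimal sketch of the smooth (least squares) baseline that the paper generalizes, using the first-order stable spline / TC kernel K[i, j] = alpha^{max(i, j)}, which encodes exponentially decaying (BIBO-stable) impulse responses. Sizes, the true system, and hyperparameters below are illustrative.

```python
import numpy as np

# Kernel-based impulse response estimation: solve
#   min_g ||y - Phi g||^2 + gamma * g' K^{-1} g
# via the closed form g = K Phi' (Phi K Phi' + gamma I)^{-1} y,
# where Phi holds lagged inputs and K is the TC / stable spline kernel.

rng = np.random.default_rng(4)
n, m = 50, 200                        # impulse response length, data length
g_true = 0.8 ** np.arange(n) * np.sin(0.5 * np.arange(n))

u = rng.standard_normal(m + n)        # input signal
Phi = np.array([u[t : t + n][::-1] for t in range(m)])  # lagged regressors
y = Phi @ g_true + 0.1 * rng.standard_normal(m)

alpha, gamma = 0.8, 0.1               # decay rate and regularization weight
idx = np.arange(1, n + 1)
K = alpha ** np.maximum.outer(idx, idx)   # TC kernel, encodes stable decay

g_hat = K @ Phi.T @ np.linalg.solve(Phi @ K @ Phi.T + gamma * np.eye(m), y)
print(np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true))  # small
```

The paper's contribution is to replace the quadratic misfit and penalty here with general piecewise linear quadratic choices (and polyhedral constraints), solved by interior point methods instead of a single linear solve.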

Abstract:
We introduce a class of quadratic support (QS) functions, many of which play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and Kalman smoothing. Well known examples include the l2, Huber, l1 and Vapnik losses. We build on a dual representation for QS functions using convex analysis, revealing the structure necessary for a QS function to be interpreted as the negative log of a probability density, and providing the foundation for statistical interpretation and analysis of QS loss functions. For a subclass of QS functions called piecewise linear quadratic (PLQ) penalties, we also develop efficient numerical estimation schemes. These components form a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for inference. In particular, for PLQ densities, interior point (IP) methods can be used. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaussian errors are assumed in the process and measurement models. The extended framework allows arbitrary PLQ densities to be used, and the proposed IP approach solves the generalized Kalman smoothing problem while maintaining the linear complexity in the size of the time series, just as in the Gaussian case. This extends the computational efficiency of classic algorithms to a much broader nonsmooth setting, and includes many recently proposed robust and sparse smoothers as special cases.
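The dual representation can be checked directly for the Huber penalty, one of the examples named above: rho_kappa(r) = sup over |u| <= kappa of (u r - u^2/2), whose inner maximizer is u = clip(r, -kappa, kappa), recovering the familiar piecewise form.

```python
import numpy as np

# Quadratic support (dual) representation of the Huber penalty:
#   rho_kappa(r) = sup_{|u| <= kappa} (u r - u^2 / 2).
# The concave inner problem is maximized at u = clip(r, -kappa, kappa),
# giving r^2/2 for |r| <= kappa and kappa|r| - kappa^2/2 beyond.

kappa = 1.0

def huber_dual(r):
    u = np.clip(r, -kappa, kappa)      # argmax of the inner problem
    return u * r - 0.5 * u ** 2

def huber_closed(r):
    return np.where(np.abs(r) <= kappa,
                    0.5 * r ** 2,
                    kappa * np.abs(r) - 0.5 * kappa ** 2)

r = np.linspace(-5, 5, 101)
print(np.max(np.abs(huber_dual(r) - huber_closed(r))))   # ~ 0
```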

Abstract:
In this paper, we present the optimization formulation of the Kalman filtering and smoothing problems, and use this perspective to develop a variety of extensions and applications. We first formulate classic Kalman smoothing as a least squares problem, highlight special structure, and show that the classic filtering and smoothing algorithms are equivalent to a particular algorithm for solving this problem. Once this equivalence is established, we present extensions of Kalman smoothing to systems with nonlinear process and measurement models, systems with linear and nonlinear inequality constraints, systems with outliers in the measurements or sudden changes in the state, and systems where the sparsity of the state sequence must be accounted for. All extensions preserve the computational efficiency of the classic algorithms, and most of the extensions are illustrated with numerical examples, which are part of an open source Kalman smoothing Matlab/Octave package.
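A minimal sketch of the least squares formulation for a linear Gaussian model x_{k+1} = G x_k + w_k, y_k = H x_k + v_k (scalar noise levels and a flat prior on the initial state; all sizes and values are illustrative). Stacking the process and measurement residuals gives min_x ||Ax - b||^2, whose normal equations are block tridiagonal, which is exactly the structure the classic smoother exploits to solve in O(N) time; here a dense solver stands in for that structured solve.

```python
import numpy as np

# Stack all residuals of a constant-velocity tracking model into one
# least squares problem and solve it.  Rows of A: one per measurement
# residual (y_k - H x_k)/sigma_meas, and one block per process residual
# (x_{k+1} - G x_k)/sigma_proc.

rng = np.random.default_rng(5)
N, d = 30, 2
G = np.array([[1.0, 0.1], [0.0, 1.0]])   # constant-velocity process
H = np.array([[1.0, 0.0]])               # position measurements

x = np.zeros((N, d))
for k in range(1, N):
    x[k] = G @ x[k - 1] + 0.1 * rng.standard_normal(d)
y = (x @ H.T).ravel() + 0.1 * rng.standard_normal(N)

A = np.zeros((N + (N - 1) * d, N * d))
b = np.zeros(N + (N - 1) * d)
for k in range(N):                        # measurement residuals
    A[k, k * d:(k + 1) * d] = H[0] / 0.1
    b[k] = y[k] / 0.1
for k in range(N - 1):                    # process residuals
    r = N + k * d
    A[r:r + d, k * d:(k + 1) * d] = -G / 0.1
    A[r:r + d, (k + 1) * d:(k + 2) * d] = np.eye(d) / 0.1

x_smooth, *_ = np.linalg.lstsq(A, b, rcond=None)
x_smooth = x_smooth.reshape(N, d)
print(np.linalg.norm(x_smooth[:, 0] - x[:, 0]) / np.linalg.norm(x[:, 0]))
```

The extensions in the paper (constraints, robust losses, sparsity) modify this objective while keeping the same block tridiagonal structure, which is how they preserve the efficiency of the classic algorithms.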