Abstract:
We give a new proof of Talagrand's transportation-cost inequality on Gaussian spaces. The proof combines the large deviation approach of Gozlan [Ann. Prob. (2009)] with the Borell-Sudakov-Tsirelson inequality. Several applications are discussed. First, we show how to deduce transportation-cost inequalities for the laws of diffusions driven by Gaussian processes, both in the additive and the multiplicative noise case. In the multiplicative case, the equation is understood in the rough paths sense and we use properties of the It\^o-Lyons map to deduce inequalities that improve existing results even in the Brownian motion case. Second, we present a general theorem which allows one to derive Gaussian tail estimates for functionals on spaces on which a $p$-transportation-cost inequality holds. In the Gaussian case, this result can be seen as a generalization of the ``generalized Fernique theorem'' of Friz and Oberhauser [Proc. Amer. Math. Soc. (2010)]. Applications to objects in rough path theory are given, such as solutions to rough differential equations and a counting process studied by Cass, Litterer and Lyons in [Ann. Prob. (2013)].
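For orientation, the classical finite-dimensional form of the inequality in question (stated here for the standard Gaussian measure; the paper itself works on abstract Gaussian spaces) reads:

```latex
% Talagrand's T_2 inequality for the standard Gaussian measure \gamma_n on \mathbb{R}^n:
% for every probability measure \nu on \mathbb{R}^n,
W_2(\nu, \gamma_n) \;\le\; \sqrt{2 \, H(\nu \mid \gamma_n)},
% where W_2 denotes the quadratic Wasserstein (transportation) distance and
% H(\nu \mid \gamma_n) = \int \log \frac{d\nu}{d\gamma_n} \, d\nu
% is the relative entropy (set to +\infty if \nu is not absolutely continuous).
```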

Abstract:
In this work we present Cutting Plane Inference (CPI), a Maximum A Posteriori (MAP) inference method for Statistical Relational Learning. Framed in terms of Markov Logic and inspired by the Cutting Plane Method, it can be seen as a meta-algorithm that instantiates small parts of a large and complex Markov Network and then solves these using a conventional MAP method. We evaluate CPI on two tasks, Semantic Role Labelling and Joint Entity Resolution, while plugging in two different MAP inference methods: the current method of choice for MAP inference in Markov Logic, MaxWalkSAT, and Integer Linear Programming. We observe that when used with CPI both methods are significantly faster than when used alone. In addition, CPI improves the accuracy of MaxWalkSAT and maintains the exactness of Integer Linear Programming.
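The cutting-plane loop described above can be sketched as follows (a minimal caricature with hypothetical names such as `violated` and `solve_map`, not the paper's implementation; it abstracts away Markov Logic grounding entirely): solve MAP on a partial network, add the ground factors the current solution violates, and repeat until nothing new is violated.

```python
def cutting_plane_inference(all_factors, violated, solve_map, init_state):
    """Cutting-plane-style meta-algorithm (illustrative sketch).

    all_factors: iterable of hashable ground-factor identifiers
    violated(factor, state) -> True if the factor is unsatisfied by `state`
    solve_map(active_factors, state) -> MAP state of the partial network
    """
    active, state = set(), init_state
    while True:
        # instantiate only the ground factors the current solution violates
        new = {f for f in all_factors if f not in active and violated(f, state)}
        if not new:          # nothing violated: the current solution is stable
            return state
        active |= new
        state = solve_map(active, state)


# Toy demo: "factor" i is violated unless bit i of the state is set,
# and the base solver simply satisfies every active factor.
def toy_solver(active, state):
    state = list(state)
    for f in active:
        state[f] = 1
    return state

result = cutting_plane_inference([0, 1, 2], lambda f, s: s[f] == 0,
                                 toy_solver, [0, 0, 0])  # -> [1, 1, 1]
```

The point of the loop is that `solve_map` only ever sees the currently active (violated) part of the network, which is what makes the base MAP method faster inside CPI than on the full grounding.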

Abstract:
Under the key assumption of finite $\rho$-variation, $\rho \in [1,2)$, of the covariance of the underlying Gaussian process, sharp a.s. convergence rates for approximations of Gaussian rough paths are established. When applied to Brownian resp. fractional Brownian motion (fBM), $\rho = 1$ resp. $\rho = 1/(2H)$, we recover and extend the respective results of [Hu--Nualart; Rough path analysis via fractional calculus; TAMS 361 (2009) 2689-2718] and [Deya--Neuenkirch--Tindel; A Milstein-type scheme without L\'evy area terms for SDEs driven by fractional Brownian motion; AIHP (2011)]. In particular, we establish an a.s. rate $k^{-(1/\rho - 1/2 - \epsilon)}$, for any $\epsilon > 0$, for Wong-Zakai and Milstein-type approximations with mesh-size $1/k$. When applied to fBM this answers a conjecture in the aforementioned references.

Abstract:
Integrability properties of (classical, linear, linear growth) rough differential equations (RDEs) are considered, the Jacobian of the RDE flow driven by Gaussian signals being a motivating example. We revisit and extend some recent ground-breaking work of Cass-Litterer-Lyons in this regard; as a by-product, we obtain a user-friendly "transitivity property" of such integrability estimates. We also consider rough integrals; as a novel application, uniform Weibull tail estimates for a class of (random) rough integrals are obtained. A concrete example arises from the stochastic heat equation, spatially mollified by hyper-viscosity, and we can recover (in fact: sharpen) a technical key result of [Hairer, Comm. Pure Appl. Math. 64, no. 11 (2011), 1547-1585].

Abstract:
We derive explicit distance bounds for Stratonovich iterated integrals along two Gaussian processes (also known as signatures of Gaussian rough paths) under regularity assumptions on their covariance functions. Similar estimates have been obtained recently in [Friz-Riedel, AIHP, to appear]. One advantage of our argument is that we obtain the bound for the third-level iterated integrals merely from the first two levels, and this reflects the intrinsic nature of rough paths. Our estimates are sharp when both covariance functions have finite 1-variation, which includes a large class of Gaussian processes. Two applications of our estimates are discussed. The first gives a.s. convergence rates for approximate solutions to rough differential equations driven by Gaussian processes. In the second example, we show how to recover the optimal time regularity for solutions of some rough SPDEs.

Abstract:
We give meaning to differential equations with a rough path term and a Brownian noise term as driving signals. Such differential equations, as well as the question of regularity of the solution map, arise naturally, and we discuss two applications: one revisits Clark's robustness problem in nonlinear filtering, the other is a Feynman--Kac type representation of linear RPDEs. En passant, we give a short and direct argument that implies integrability estimates for rough differential equations with Gaussian driving signals, which is of independent interest.

Abstract:
It is a well-known fact that finite $\rho$-variation of the covariance (in the 2D sense) of a general Gaussian process implies finite $\rho$-variation of Cameron-Martin paths. In the special case of fractional Brownian motion (think: $2H = 1/\rho$), in the rougher-than-Brownian regime, a sharper result holds thanks to a Besov-type embedding [Friz-Victoir, JFA, 2006]. In the present note we give a general result which closes this gap. We comment on the importance of this result for various applications.

Abstract:
We speed up marginal inference by ignoring factors that do not significantly contribute to overall accuracy. In order to pick a suitable subset of factors to ignore, we propose three schemes: minimizing the number of model factors under a bound on the KL divergence between pruned and full models; minimizing the KL divergence under a bound on factor count; and minimizing the weighted sum of KL divergence and factor count. All three problems are solved using an approximation of the KL divergence that can be calculated in terms of marginals computed on a simple seed graph. Applied to synthetic image denoising and to three different types of NLP parsing models, this technique performs marginal inference up to 11 times faster than loopy BP, with graph sizes reduced by up to 98%, at comparable error in marginals and parsing accuracy. We also show that minimizing the weighted sum of divergence and size is substantially faster than minimizing either of the other objectives based on the approximation to divergence presented here.
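Under the simplifying assumption that the approximate KL divergence decomposes additively over pruned factors (each factor's contribution computed from seed-graph marginals), the third objective admits a per-factor decision; the sketch below is illustrative, with hypothetical names, not the paper's implementation:

```python
def prune_factors(factors, kl_contrib, lam):
    """Minimize (approx. KL of pruned vs. full model) + lam * (number of kept factors),
    assuming the KL approximation is a sum of per-factor contributions.

    factors:    iterable of factor identifiers
    kl_contrib: dict factor -> approximate KL increase if that factor is pruned
    lam:        trade-off weight between divergence and model size
    """
    # Keeping factor f costs lam; pruning it costs kl_contrib[f].
    kept = [f for f in factors if kl_contrib[f] > lam]
    pruned = [f for f in factors if kl_contrib[f] <= lam]
    return kept, pruned
```

Sweeping the weight `lam` trades graph size against divergence, which is how the "up to 98% smaller graph at comparable error" regime would be located in this simplified picture.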

Abstract:
Belief Propagation has been widely used for marginal inference; however, it is slow on problems with large-domain variables and high-order factors. Previous work provides useful approximations to facilitate inference on such models, but lacks important anytime properties such as: 1) providing accurate and consistent marginals when stopped early, 2) improving the approximation when run longer, and 3) converging to the fixed point of BP. To this end, we propose a message passing algorithm that works on sparse (partially instantiated) domains and converges to consistent marginals using dynamic message scheduling. The algorithm grows the sparse domains incrementally, selecting the next value to add using prioritization schemes based on the gradients of the marginal inference objective. Our experiments demonstrate local anytime consistency and fast convergence, providing significant speedups over BP to obtain low-error marginals: up to 25 times on grid models, and up to 6 times on a real-world natural language processing task.
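The incremental domain growth can be sketched with a priority queue (a simplified illustration with a hypothetical API: priorities are computed once at the start here, whereas the actual algorithm rescores values via gradients of the inference objective and interleaves message passing):

```python
import heapq

def grow_sparse_domains(full_domains, priority, steps):
    """Greedy sparse-domain growth (illustrative sketch).

    full_domains: dict variable -> list of candidate values
    priority(var, value) -> float score, e.g. a gradient of the inference objective
    steps: number of values to activate across all variables
    """
    active = {var: set() for var in full_domains}
    heap = []
    for var, values in full_domains.items():
        for val in values:
            # heapq is a min-heap, so negate priorities to pop the largest first
            heapq.heappush(heap, (-priority(var, val), var, val))
    for _ in range(steps):
        if not heap:
            break
        _, var, val = heapq.heappop(heap)
        active[var].add(val)   # grow this variable's sparse domain
    return active
```

Stopping after any number of steps leaves a valid (if coarse) set of sparse domains, which is the shape of the anytime behaviour the abstract describes.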

Abstract:
Systematic use of the published results of randomized clinical trials is increasingly important in evidence-based medicine. In order to collate and analyze the results from potentially numerous trials, evidence tables are used to represent trials concerning a set of interventions of interest. An evidence table has columns for the patient group, for each of the interventions being compared, for the criterion for the comparison (e.g. the proportion who survived five years after treatment), and for each of the results. Currently, it is a labour-intensive activity to read each published paper and extract the information for each field in an evidence table. There have been some NLP studies investigating how some of the features from papers can be extracted, or at least the relevant sentences identified. However, there is no NLP system for the systematic extraction of each item of information required for an evidence table. We address this need with a combination of a maximum entropy classifier and integer linear programming. We use the latter to handle constraints on what is an acceptable classification of the features to be extracted. With experimental results, we demonstrate substantial advantages in using global constraints (such as requiring that the features describing the patient group and the interventions occur before the features describing the results of the comparison).
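The effect of such a global ordering constraint can be illustrated with a tiny brute-force stand-in for the integer linear program (hypothetical label names and scores; a real system would hand the classifier's scores and constraints to an ILP solver):

```python
from itertools import product

LABELS = ["patient-group", "intervention", "result"]

def best_labelling(scores):
    """Pick the highest-scoring label sequence subject to a global ordering
    constraint: once a sentence is labelled "result", all later sentences
    must be "result" too (patient group / intervention features come first).

    scores: list of dicts, one per sentence, mapping label -> classifier score.
    """
    best, best_val = None, float("-inf")
    for seq in product(LABELS, repeat=len(scores)):
        if any(a == "result" and b != "result" for a, b in zip(seq, seq[1:])):
            continue  # violates the ordering constraint
        val = sum(s[lab] for s, lab in zip(scores, seq))
        if val > best_val:
            best, best_val = list(seq), val
    return best
```

Without the constraint, each sentence would simply take its locally best label; the constraint can override a locally high-scoring but globally inconsistent choice, which is the advantage the experiments quantify.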