Abstract:
We present an unbiased and robust analysis method for power-law blinking statistics in the photoluminescence of single nano-emitters, allowing us to extract both the bright- and dark-state power-law exponents from the emitters' intensity autocorrelation functions. Unlike the widely used threshold method, our technique does not require discriminating between the emission levels of bright and dark states in the experimental intensity timetraces. We rely on the simultaneous recording of 450 emission timetraces of single CdSe/CdS core/shell quantum dots at a frame rate of 250 Hz with single-photon sensitivity. Under these conditions, our approach can determine ON and OFF power-law exponents with a precision of 3% from a comparison to numerical simulations, even for shot-noise-dominated emission signals with an average intensity below 1 photon per frame and per quantum dot. These capabilities pave the way for the unbiased, threshold-free determination of blinking power-law exponents at the microsecond timescale.
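The authors' extraction pipeline is not reproduced here, but the objects it operates on can be sketched. Below is a minimal Python simulation (all function names and parameter values are illustrative assumptions, not taken from the paper) that generates a two-state emission timetrace with power-law-distributed ON and OFF dwell times and computes its normalized intensity autocorrelation, the quantity from which the exponents are extracted.

```python
import numpy as np

def power_law_dwell(alpha, t_min, rng):
    """Sample one dwell time from p(t) ~ t^(-alpha) for t >= t_min (Pareto)."""
    u = 1.0 - rng.random()  # uniform in (0, 1]
    return t_min * u ** (-1.0 / (alpha - 1.0))

def blinking_trace(n_frames, alpha_on, alpha_off, rng, t_min=1.0):
    """Binary ON/OFF emission trace (1 = bright, 0 = dark), in frame units."""
    trace = np.zeros(n_frames)
    t, bright = 0, True
    while t < n_frames:
        alpha = alpha_on if bright else alpha_off
        d = int(np.ceil(power_law_dwell(alpha, t_min, rng)))
        if bright:
            trace[t:t + d] = 1.0
        t += d
        bright = not bright
    return trace

def autocorrelation(x, max_lag):
    """Normalized intensity autocorrelation g(tau) = <I(t) I(t+tau)> / <I>^2."""
    norm = x.mean() ** 2
    g = [np.mean(x * x) / norm]
    g += [np.mean(x[:-lag] * x[lag:]) / norm for lag in range(1, max_lag)]
    return np.array(g)

rng = np.random.default_rng(0)
trace = blinking_trace(100_000, alpha_on=1.5, alpha_off=1.6, rng=rng)
g = autocorrelation(trace, 200)  # slow, power-law-like decay encodes the exponents
```

In the threshold-free spirit of the abstract, nothing here labels individual frames as bright or dark before computing `g`; the exponents would be inferred by comparing `g` to simulated curves.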

Abstract:
We investigate the asymptotic behavior of Bayesian posterior distributions under independent and identically distributed ($i.i.d.$) misspecified models. More specifically, we study the concentration of the posterior distribution on neighborhoods of $f^{\star}$, the density that is closest in the Kullback--Leibler sense to the true model $f_0$. We note, through examples, the need for assumptions beyond the usual Kullback--Leibler support assumption. We then investigate consistency with respect to a general metric under three assumptions, each based on a notion of divergence measure, and then apply these to a weighted $L_1$-metric in convex models and non-convex models. Although a few results on this topic are available, we believe that these are somewhat inaccessible due, in part, to the technicalities and the subtle differences compared to the more familiar well-specified model case. One of our goals is to make some of the available results, especially that of , more accessible. Unlike their paper, our approach does not require construction of test sequences. We also discuss a preliminary extension of the $i.i.d.$ results to the independent but not identically distributed ($i.n.i.d.$) case.

Abstract:
Let $V = < p_{ij}(x)e^{\la_ix}, i=1,...,n, j=1, ..., N_i >$ be a space of quasi-polynomials of dimension $N=N_1+...+N_n$. Define the regularized fundamental operator of $V$ as the polynomial differential operator $D = \sum_{i=0}^N A_{N-i}(x)\p^i$ annihilating $V$ and such that its leading coefficient $A_0$ is a polynomial of the minimal possible degree. We construct a space of quasi-polynomials $U = < q_{ab}(u)e^{z_au} >$ whose regularized fundamental operator is the differential operator $\sum_{i=0}^N u^i A_{N-i}(\partial_u)$. The space $U$ is constructed from $V$ by a suitable integral transform. Our integral transform corresponds to the bispectral involution on the space of rational solutions (vanishing at infinity) to the KP hierarchy, see \cite{W}. As a corollary of the properties of the integral transform we obtain a correspondence between critical points of the two master functions associated with the $(\glN,\glM)$ dual Gaudin models as well as between the corresponding Bethe vectors.

Abstract:
Model selection is of fundamental importance to high dimensional modeling featured in many contemporary applications. Classical principles of model selection include the Kullback-Leibler divergence principle and the Bayesian principle, which lead to the Akaike information criterion and Bayesian information criterion when models are correctly specified. Yet model misspecification is unavoidable when we have no knowledge of the true model or when we have the correct family of distributions but miss some true predictor. In this paper, we propose a family of semi-Bayesian principles for model selection in misspecified models, which combine the strengths of the two well-known principles. We derive asymptotic expansions of the semi-Bayesian principles in misspecified generalized linear models, which give the new semi-Bayesian information criteria (SIC). A specific form of SIC admits a natural decomposition into the negative maximum quasi-log-likelihood, a penalty on model dimensionality, and a penalty on model misspecification directly. Numerical studies demonstrate the advantage of the newly proposed SIC methodology for model selection in both correctly specified and misspecified models.
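The SIC family itself is not reproduced here, but the idea of a criterion whose penalty grows with misspecification can be illustrated with the classical Takeuchi-style sandwich penalty $\mathrm{tr}(J^{-1}K)$ for a Gaussian linear model (a sketch under assumed forms, not the authors' SIC; for simplicity only the regression coefficients enter the sandwich):

```python
import numpy as np

def tic_linear(X, y):
    """Takeuchi-style information criterion for a Gaussian linear model:
    -2 * max log-likelihood + 2 * tr(J^{-1} K). Under correct specification
    tr(J^{-1} K) is close to the parameter count (AIC's penalty); under
    misspecification it also reflects the score/Hessian mismatch."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    # J: average negative Hessian in beta; K: average outer product of scores
    J = X.T @ X / (n * sigma2)
    scores = X * (resid / sigma2)[:, None]
    K = scores.T @ scores / n
    penalty = np.trace(np.linalg.solve(J, K))
    return -2.0 * loglik + 2.0 * penalty
```

Like the specific SIC form described above, the value decomposes into a (quasi-)log-likelihood term and penalties on dimension and on misspecification, the latter two combined here in the trace term.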

Abstract:
Functional magnetic resonance imaging (fMRI) is one of the most widely used tools to study the neural underpinnings of human cognition. Standard analysis of fMRI data relies on a general linear model (GLM) approach to separate stimulus induced signals from noise. Crucially, this approach relies on a number of assumptions about the data which, for inferences to be valid, must be met. The current paper reviews the GLM approach to analysis of fMRI time-series, focusing in particular on the degree to which such data abides by the assumptions of the GLM framework, and on the methods that have been developed to correct for any violation of those assumptions. Rather than biasing estimates of effect size, the major consequence of non-conformity to the assumptions is to introduce bias into estimates of the variance, thus affecting test statistics, power, and false positive rates. Furthermore, this bias can have pervasive effects on both individual subject and group-level statistics, potentially yielding qualitatively different results across replications, especially after the thresholding procedures commonly used for inference-making.
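The GLM approach described above can be sketched in a few lines: build a design matrix by convolving the stimulus train with a hemodynamic response function, then estimate effects by ordinary least squares. This is an illustrative toy (HRF parameters, scan count, and noise level are assumed, not from the paper); the naive variance estimate it returns is exactly the quantity that becomes biased when the noise is autocorrelated.

```python
import numpy as np
from math import gamma

def double_gamma_hrf(t, a1=6.0, a2=16.0, ratio=1.0 / 6.0):
    """Canonical double-gamma HRF (unit dispersion; parameter values illustrative)."""
    t = np.asarray(t, dtype=float)
    h = (t ** (a1 - 1) * np.exp(-t) / gamma(a1)
         - ratio * t ** (a2 - 1) * np.exp(-t) / gamma(a2))
    h[t < 0] = 0.0
    return h

def glm_fit(y, X):
    """OLS estimates and the naive (white-noise) variance of beta-hat.
    With autocorrelated fMRI noise this variance estimate is biased,
    affecting test statistics, power, and false positive rates."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    var_beta = sigma2 * np.linalg.inv(X.T @ X).diagonal()
    return beta, var_beta

# Toy design: boxcar stimulus train convolved with the HRF, plus an intercept
tr, n_scans = 2.0, 200
onsets = np.zeros(n_scans)
onsets[10::40] = 1.0  # one stimulus every 80 s
regressor = np.convolve(onsets, double_gamma_hrf(np.arange(0, 32, tr)))[:n_scans]
X = np.column_stack([np.ones(n_scans), regressor])

# Synthetic data with known effect size; real fMRI noise would be autocorrelated
y = 3.0 + 2.0 * regressor + 0.05 * np.random.default_rng(0).normal(size=n_scans)
beta, var_beta = glm_fit(y, X)
```

With white noise, `var_beta` is approximately unbiased; adding serially correlated noise to `y` would leave `beta` roughly unbiased but distort `var_beta`, which is the failure mode the review focuses on.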

Abstract:
We consider high-dimensional inference when the assumed linear model is misspecified. We describe some correct interpretations and corresponding sufficient assumptions for valid asymptotic inference of the model parameters, which still have a useful meaning when the model is misspecified. We largely focus on the de-sparsified Lasso procedure but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our results contribute to robustness considerations with respect to model misspecification.

Abstract:
Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM) or parsimony and use of covariates in low dimensions, and (2) regularization or sparsity for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regression problems. We point out several instances of the regression-based formulation. A notable case is in sparse estimation of a precision matrix or a Gaussian graphical model leading to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The former provides an unconstrained and statistically interpretable reparameterization, and guarantees the positive-definiteness of the estimated covariance matrix. It reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix such as banding, tapering and thresholding has desirable asymptotic properties and the sparse estimated covariance matrix is positive definite with probability tending to one for large samples and dimensions.
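The regression-based Cholesky formulation mentioned above can be sketched directly: given an a priori order of the variables, regress each variable on its predecessors; the negated coefficients fill a unit lower-triangular matrix $T$ and the residual variances a diagonal $D$, with $\Sigma = T^{-1} D\, T^{-\top}$. This is a minimal numpy illustration (function name is ours), positive definite by construction whenever the residual variances are positive.

```python
import numpy as np

def cholesky_regressions(X):
    """Modified Cholesky estimate of a covariance matrix: regress each
    variable on its predecessors (an a priori order among the variables
    is assumed), giving Sigma = T^{-1} D T^{-T} with T unit lower-triangular
    and D the diagonal of residual variances."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Xc[:, 0].var()
    for j in range(1, p):
        phi, _, _, _ = np.linalg.lstsq(Xc[:, :j], Xc[:, j], rcond=None)
        T[j, :j] = -phi
        resid = Xc[:, j] - Xc[:, :j] @ phi
        d[j] = resid.var()
    Tinv = np.linalg.inv(T)
    return Tinv @ np.diag(d) @ Tinv.T  # positive definite whenever d > 0
```

In this saturated (unregularized) form the construction exactly reproduces the sample covariance matrix; sparsity or shrinkage would instead be imposed on the regression coefficients in `T`, which are unconstrained.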

Abstract:
Misspecified models often provide useful information about the true data generating distribution. For example, if $y$ is a non-linear function of $x$, the least squares estimator $\widehat{\beta}$ is an estimate of $\beta$, the slope of the best linear approximation to the non-linear function. Motivated by problems in astronomy, we study how to incorporate observation measurement error variances into fitting parameters of misspecified models. Our asymptotic theory focuses on the particular case of linear regression, where weighted least squares procedures are often used to account for heteroskedasticity. We find that when the response is a non-linear function of the independent variable, the standard procedure of weighting by the inverse of the observation variances can be counter-productive. In particular, ordinary least squares may have lower asymptotic variance. We construct an adaptive estimator which has lower asymptotic variance than either OLS or standard WLS. We demonstrate our theory in a small simulation and apply these ideas to the problem of estimating the period of a periodic function using a sinusoidal model.
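The adaptive estimator itself is not reproduced here, but the core subtlety can be shown in a short simulation (a sketch under assumed parameter values, not the paper's setup): when the truth is non-linear, OLS and inverse-variance WLS do not even target the same slope, because the weights change which "best linear approximation" is being estimated.

```python
import numpy as np

def wls_slope(x, y, w=None):
    """Slope of a (weighted) least squares line fit with intercept."""
    if w is None:
        w = np.ones_like(x)
    X = np.column_stack([np.ones_like(x), x])
    A = X.T @ (X * w[:, None])
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)[1]

# Truth: y = x^2 (non-linear), with heteroskedastic measurement error whose
# standard deviation sigma(x) is known. For x uniform on [0, 1], the slope of
# the best unweighted linear approximation to x^2 is exactly 1.
rng = np.random.default_rng(2)
n_rep, n = 2000, 200
ols_slopes, wls_slopes = np.empty(n_rep), np.empty(n_rep)
for r in range(n_rep):
    x = rng.uniform(0.0, 1.0, size=n)
    sigma = 0.1 + 0.9 * x
    y = x ** 2 + sigma * rng.normal(size=n)
    ols_slopes[r] = wls_slope(x, y)
    wls_slopes[r] = wls_slope(x, y, w=1.0 / sigma ** 2)
# Inverse-variance weighting concentrates on small x, where x^2 is flatter,
# pulling the WLS slope well below the OLS target of 1.
```

The paper's variance comparison and adaptive weighting address precisely this tension between the estimand shift and the noise reduction that weighting provides.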

Abstract:
There is vast empirical evidence that given a set of assumptions on the real-world dynamics of an asset, the European options on this asset are not efficiently priced in options markets, giving rise to arbitrage opportunities. We study these opportunities in a generic stochastic volatility model and exhibit the strategies which maximize the arbitrage profit. In the case when the misspecified dynamics is a classical Black-Scholes one, we give a new interpretation of the classical butterfly and risk reversal contracts in terms of their (near) optimality for arbitrage strategies. Our results are illustrated by a numerical example including transaction costs.
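The butterfly and risk-reversal contracts mentioned above are standard option combinations; their expiry payoffs can be sketched as follows (strike values in the usage below are illustrative, and this says nothing about the paper's optimality analysis):

```python
import numpy as np

def call(s, k):
    """European call payoff at expiry."""
    return np.maximum(s - k, 0.0)

def put(s, k):
    """European put payoff at expiry."""
    return np.maximum(k - s, 0.0)

def butterfly(s, k, dk):
    """Long butterfly: long calls at k - dk and k + dk, short two calls at k.
    Pays most when the underlying finishes near k (a bet on low realized moves)."""
    return call(s, k - dk) - 2.0 * call(s, k) + call(s, k + dk)

def risk_reversal(s, k_put, k_call):
    """Risk reversal: long an out-of-the-money call, short an out-of-the-money put
    (a directional/skew position)."""
    return call(s, k_call) - put(s, k_put)
```

For example, a butterfly centered at 100 with wings at 90 and 110 pays its maximum of 10 when the underlying expires exactly at 100, and nothing outside the wings.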

Abstract:
Functional MRI (fMRI) experiments rely on precise characterization of the blood oxygen level dependent (BOLD) signal. As the spatial resolution of fMRI reaches the sub-millimeter range, the need for quantitative modelling of spatiotemporal properties of this hemodynamic signal has become pressing. Here, we find that a detailed physiologically-based model of spatiotemporal BOLD responses predicts traveling waves with velocities and spatial ranges in empirically observable ranges. Two measurable parameters, related to physiology, characterize these waves: wave velocity and damping rate. To test these predictions, high-resolution fMRI data are acquired from subjects viewing discrete visual stimuli. Predictions and experiment show strong agreement, in particular confirming BOLD waves propagating for at least 5–10 mm across the cortical surface at speeds of 2–12 mm s⁻¹. These observations enable fundamentally new approaches to fMRI analysis, crucial for fMRI data acquired at high spatial resolution.
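As an illustration only (a toy functional form with assumed parameter values within the reported ranges, not the paper's physiological model), a damped traveling wave characterized by the two parameters named above, velocity and damping rate, can be sketched:

```python
import numpy as np

def bold_wave(r_mm, t_s, v=5.0, gamma=1.0, sigma=2.0):
    """Toy damped traveling wave: a Gaussian front of width sigma (mm)
    propagating at velocity v (mm/s) while decaying at rate gamma (1/s).
    Parameter values are illustrative, not fitted to data."""
    return np.exp(-gamma * t_s) * np.exp(-((r_mm - v * t_s) ** 2) / (2.0 * sigma ** 2))

t = np.linspace(0.0, 6.0, 601)
response_10mm = bold_wave(10.0, t)            # response 10 mm from the stimulus site
t_peak = t[np.argmax(response_10mm)]          # peaks near r / v, shifted slightly
                                              # earlier by the damping
```

In such a model, the arrival time of the peak grows linearly with distance (slope 1/v) while the peak amplitude decays with distance through gamma, which is how the two parameters would be read off empirical responses.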