Abstract:
In the causal adjustment setting, variable selection techniques based on either the outcome or treatment allocation model can result in the omission of confounders or the inclusion of spurious variables in the propensity score. We propose a variable selection method based on a penalized likelihood which considers the response and treatment assignment models simultaneously. The proposed method facilitates confounder selection in high-dimensional settings. We show that under some conditions our method attains the oracle property. The selected variables are used to form a doubly robust regression estimator of the treatment effect. Simulation results are presented and economic growth data are analyzed.

Abstract:
In this paper we present an extension of population-based Markov chain Monte Carlo (MCMC) to the trans-dimensional case. One of the main challenges in MCMC-based inference is that of simulating from high-dimensional and trans-dimensional target measures. In such cases, MCMC methods may not adequately traverse the support of the target, and the simulation results will be unreliable. We develop population methods to deal with such problems, and give a result proving the uniform ergodicity of these population algorithms under mild assumptions. This result is used to demonstrate the superiority, in terms of convergence rate, of a population transition kernel over a reversible jump sampler for a Bayesian variable selection problem. We also give an example of a population algorithm for a Bayesian multivariate mixture model with an unknown number of components. This is applied to gene expression data of 1,000 data points in six dimensions, and it is demonstrated that our algorithm outperforms some competing Markov chain samplers.
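
The flavour of a population scheme can be sketched on a fixed-dimensional toy problem: several chains target tempered versions of a bimodal density, and exchange (swap) moves let mixing in the hot chains propagate to the cold one. This is only a hedged illustration of the population idea; the paper's trans-dimensional machinery is not shown, and the target, temperatures and function names here are invented for the example.

```python
import math
import random

def log_target(x):
    """Toy bimodal target: equal mixture of N(-3, 1) and N(3, 1)."""
    return math.log(0.5 * math.exp(-0.5 * (x + 3.0) ** 2)
                    + 0.5 * math.exp(-0.5 * (x - 3.0) ** 2))

def population_mcmc(n_iter, temps=(1.0, 0.5, 0.2), seed=1):
    """Population MCMC sketch: one random-walk chain per temperature,
    plus exchange moves between adjacent chains."""
    rng = random.Random(seed)
    xs = [0.0] * len(temps)
    samples = []
    for _ in range(n_iter):
        # within-chain Metropolis updates on the tempered targets
        for i, beta in enumerate(temps):
            prop = xs[i] + rng.gauss(0.0, 1.0)
            log_a = beta * (log_target(prop) - log_target(xs[i]))
            if rng.random() < math.exp(min(0.0, log_a)):
                xs[i] = prop
        # exchange move between a randomly chosen adjacent pair
        i = rng.randrange(len(temps) - 1)
        log_a = (temps[i] - temps[i + 1]) * (log_target(xs[i + 1]) - log_target(xs[i]))
        if rng.random() < math.exp(min(0.0, log_a)):
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
        samples.append(xs[0])  # keep the temperature-1 chain
    return samples
```

A single random-walk chain started at one mode of this target rarely crosses to the other; the tempered population visits both, which is the traversal problem the abstract describes.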

Abstract:
The estimation of parameters in the frequency spectrum of a seasonally persistent stationary stochastic process is addressed. For seasonal persistence associated with a pole in the spectrum located away from frequency zero, a new Whittle-type likelihood is developed that explicitly acknowledges the location of the pole. This Whittle likelihood is a large sample approximation to the distribution of the periodogram over a chosen grid of frequencies, and constitutes an approximation to the time-domain likelihood of the data, via the linear transformation of an inverse discrete Fourier transform combined with a demodulation. The new likelihood is straightforward to compute, and as will be demonstrated has good, yet non-standard, properties. The asymptotic behaviour of the proposed likelihood estimators is studied; in particular, $N$-consistency of the estimator of the spectral pole location is established. Large finite sample and asymptotic distributions of the score and observed Fisher information are given, and the corresponding distributions of the maximum likelihood estimators are deduced. A study of the small sample properties of the likelihood approximation is provided, and its superior performance to previously suggested methods is shown, as well as agreement with the developed distributional approximations.

Abstract:
This paper constructs a doubly robust estimator for continuous dose-response estimation. An outcome regression model is augmented with a set of inverse generalized propensity score covariates to correct for potential misspecification bias. From the augmented model we can obtain consistent estimates of average potential outcomes for distinct strata of the treatment. A polynomial regression is then fitted to these point estimates to derive a Taylor approximation to the continuous dose-response function. The bootstrap is used for variance estimation. Analytical results and simulations show that our approach can provide a good approximation to linear or nonlinear dose-response functions under various sources of misspecification of the outcome regression or propensity score models. Efficiency in finite samples is good relative to minimum variance consistent estimators.
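
The augment-then-smooth recipe can be sketched on a noise-free toy dataset. This is a hedged, minimal illustration rather than the paper's estimator: the confounder `x`, treatment model `t = 0.5*x + e`, outcome `y = 1 + t + 0.5*x` and all function names are invented, and the generalized propensity score (GPS) is taken as a normal density of the treatment given the confounder.

```python
import numpy as np

# hypothetical noise-free data: confounder x, treatment t, outcome y
x = np.repeat([-1.0, 0.0, 1.0], 3)
e = np.tile([-1.0, 0.0, 1.0], 3)
t = 0.5 * x + e
y = 1.0 + t + 0.5 * x

def gps(t_val, x_val, sigma):
    """Generalized propensity score: normal density of t given x."""
    r = t_val - 0.5 * x_val  # residual under the treatment model
    return np.exp(-0.5 * (r / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

sigma = e.std()
# outcome regression augmented with an inverse-GPS covariate
X = np.column_stack([np.ones_like(t), t, x, 1.0 / gps(t, x, sigma)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def apo(t0):
    """Average potential outcome at dose t0 (a stratum of the treatment)."""
    X0 = np.column_stack([np.ones_like(x), np.full_like(x, t0), x,
                          1.0 / gps(t0, x, sigma)])
    return (X0 @ beta).mean()

# polynomial (Taylor-type) approximation to the dose-response curve
ts = np.linspace(-1.0, 1.0, 5)
poly = np.polyfit(ts, [apo(t0) for t0 in ts], 2)
```

Here the outcome model is correct, so the inverse-GPS coefficient is essentially zero and the fitted polynomial recovers the linear dose-response exactly; the augmentation term earns its keep when one of the two models is misspecified.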

Abstract:
There is considerable interest in cell biology in determining whether, and to what extent, the spatial arrangement of nuclear objects affects nuclear function. A common approach to address this issue involves analyzing a collection of images produced using some form of fluorescence microscopy. We assume that these images have been successfully pre-processed and a spatial point pattern representation of the objects of interest within the nuclear boundary is available. Typically in these scenarios the number of objects per nucleus is low, which limits the ability of standard analysis procedures to demonstrate the existence of spatial preference in the pattern. There are broadly two common approaches to look for structure in these spatial point patterns: either the spatial point pattern for each image is analyzed individually, or a simple normalization is performed and the patterns are aggregated. In this paper we demonstrate, using synthetic spatial point patterns drawn from predefined point processes, how difficult it is to distinguish a pattern from complete spatial randomness using these techniques, and hence how easy it is to miss interesting spatial preferences in the arrangement of nuclear objects. The impact of this problem is also illustrated on data related to the configuration of PML nuclear bodies in mammalian fibroblast cells.
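
The per-image approach can be sketched as a Monte Carlo test against complete spatial randomness (CSR). This is a hedged, generic illustration rather than the paper's procedure: the test statistic (mean nearest-neighbour distance), the unit-square window and the example pattern are all choices made for this sketch. With a tightly clustered 50-point pattern the departure from CSR is obvious; with the handful of points typical of a single nucleus, the null distribution of the statistic is wide and the same test loses power.

```python
import math
import random

def mean_nn_dist(pts):
    """Mean nearest-neighbour distance of a point pattern."""
    total = 0.0
    for i, (xi, yi) in enumerate(pts):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(pts) if j != i)
    return total / len(pts)

def csr_test(pts, n_sim=199, seed=0):
    """Monte Carlo p-value for clustering versus CSR in the unit square."""
    rng = random.Random(seed)
    obs = mean_nn_dist(pts)
    n = len(pts)
    hits = sum(
        mean_nn_dist([(rng.random(), rng.random()) for _ in range(n)]) <= obs
        for _ in range(n_sim))
    return (1 + hits) / (1 + n_sim)

# a tightly clustered 50-point pattern: two short chains in opposite corners
clustered = [(0.05 + 0.002 * k, 0.05) for k in range(25)] + \
            [(0.95 - 0.002 * k, 0.95) for k in range(25)]
```

Running `csr_test(clustered)` yields a small p-value, whereas a 4- or 5-point pattern with the same degree of clustering frequently fails to reach significance, which is the low-count problem the abstract highlights.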

Abstract:
There are almost 1,300 entries for higher eukaryotes in the Nuclear Protein Database. The proteins' subcellular distribution patterns within interphase nuclei can be complex, ranging from diffuse to punctate or microspeckled, yet they all work together in a coordinated and controlled manner within the three-dimensional confines of the nuclear volume. In this review we describe recent advances in the use of quantitative methods to understand nuclear spatial organisation and discuss some of the practical applications resulting from this work.

Abstract:
Homologous recombination is an important operator in the evolution of biological organisms. However, there is still no clear, generally accepted understanding of why it exists and under what circumstances it is useful. In this paper we consider its utility in the context of an infinite-population haploid model with selection and homologous recombination. We define utility in terms of two metrics: the increase in frequency of fit genotypes, and the increase in average population fitness, relative to those associated with selection only. Explicitly, we exhaustively explore the eight-dimensional parameter space of a two-locus two-allele system, showing, as a function of the landscape and the initial population, that recombination is beneficial in terms of our metrics in two distinct regimes: a landscape-independent "search" regime, where recombination aids in the search for a fit genotype that is absent or at low frequency in the population; and a "modular" regime, associated with quasi-additive fitness landscapes with low epistasis, where recombination allows for the juxtaposition of fit "modules" or Building Blocks. Thus, we conclude that the ubiquity and utility of recombination is intimately associated with the existence of modularity in biological fitness landscapes.
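
The two-locus two-allele dynamics can be sketched as a deterministic recurrence on genotype frequencies: selection reweights by fitness, and recombination moves frequency mass according to the linkage disequilibrium. This is a minimal illustration of the "search" regime, not the paper's exhaustive eight-parameter exploration; the fitness landscape and initial population below are chosen for the example.

```python
import numpy as np

def generation(p, w, r):
    """One generation in an infinite haploid population: selection then
    recombination. Genotypes are indexed ab = 00, 01, 10, 11;
    r is the recombination rate."""
    p = p * w / np.dot(p, w)                   # selection
    D = p[0] * p[3] - p[1] * p[2]              # linkage disequilibrium
    return p + r * D * np.array([-1.0, 1.0, 1.0, -1.0])

# "search" regime: the fittest genotype 11 is absent from the population,
# but its constituent alleles are present in genotypes 01 and 10
w = np.array([1.0, 1.0, 1.0, 2.0])
p0 = np.array([0.5, 0.25, 0.25, 0.0])

def evolve(p, r, gens=100):
    for _ in range(gens):
        p = generation(p, w, r)
    return p
```

With `r = 0`, selection alone can never create genotype 11, so its frequency stays at zero; with `r > 0`, recombination assembles 11 from 01 and 10 and selection then sweeps it to high frequency.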

Abstract:
In this article we describe Bayesian nonparametric procedures for two-sample hypothesis testing. Namely, given two sets of samples $\mathbf{y}^{(1)} \stackrel{\mathrm{iid}}{\sim} F^{(1)}$ and $\mathbf{y}^{(2)} \stackrel{\mathrm{iid}}{\sim} F^{(2)}$, with $F^{(1)}, F^{(2)}$ unknown, we wish to evaluate the evidence for the null hypothesis $H_0: F^{(1)} \equiv F^{(2)}$ versus the alternative $H_1: F^{(1)} \neq F^{(2)}$. Our method is based upon a nonparametric P\'{o}lya tree prior centered either subjectively or using an empirical procedure. We show that the P\'{o}lya tree prior leads to an analytic expression for the marginal likelihood under the two hypotheses and hence an explicit measure of the probability of the null, $\mathrm{Pr}(H_0 \mid \{\mathbf{y}^{(1)}, \mathbf{y}^{(2)}\})$.
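
The analytic marginal likelihood can be sketched for a canonical Pólya tree on the unit interval with dyadic splits and Beta$(cj^2, cj^2)$ node priors at level $j$: each node contributes a ratio of Beta functions determined by the counts falling in its two children. This is a hedged sketch, not the authors' implementation; samples are assumed to be pre-mapped to $(0,1)$, and the truncation depth and concentration `c` are illustrative defaults.

```python
from math import lgamma

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def pt_log_marginal(x, lo=0.0, hi=1.0, level=1, max_level=8, c=1.0):
    """Log marginal likelihood of data x under a Polya tree on (lo, hi),
    truncated at max_level, with Beta(c*j^2, c*j^2) priors at level j."""
    if level > max_level or len(x) == 0:
        return 0.0
    mid = 0.5 * (lo + hi)
    left = [v for v in x if v < mid]
    right = [v for v in x if v >= mid]
    a = c * level ** 2
    node = log_beta(a + len(left), a + len(right)) - log_beta(a, a)
    return (node
            + pt_log_marginal(left, lo, mid, level + 1, max_level, c)
            + pt_log_marginal(right, mid, hi, level + 1, max_level, c))

def log_bf_h1_h0(y1, y2, **kw):
    """Log Bayes factor in favour of H1: F1 != F2 (separate trees)
    over H0: F1 = F2 (one tree for the pooled sample)."""
    return (pt_log_marginal(list(y1), **kw) + pt_log_marginal(list(y2), **kw)
            - pt_log_marginal(list(y1) + list(y2), **kw))
```

With equal prior weight on the hypotheses, $\mathrm{Pr}(H_0 \mid \text{data}) = 1/(1 + e^{\log \mathrm{BF}_{10}})$: well-separated samples give a positive log Bayes factor, while identical samples give a negative one.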

Abstract:
This paper outlines an approach to creating questions for a subject-based question bank for use in UK library schools. The authors outline a concept map for information science and describe how Bloom’s taxonomy can be adapted to the creation of higher-level questions than the commonly used simple recall type. Sample questions were created using the International Encyclopedia of Information and Library Science (IEILS) and subjects defined by staff at the Department of Information Science at Loughborough University. A role is suggested for the Learning and Teaching Support Network for Information and Computer Science (LTSN-ICS).

Abstract:
Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data, in the form of bathymetry and acoustic backscatter, in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. While such data sets have until recently been classified mainly by expert interpretation, more objective, faster and repeatable methods of seabed classification are clearly required. This study compares the performance of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets, were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features: i) the two primary features of bathymetry and backscatter, ii) a subset of the features chosen by a feature selection process, and iii) all of the input features. The predictive performance of the models was validated using a separate test set of ground-truth samples. The statistical significance of model performance relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) was tested to assess the benefits of using more sophisticated approaches. The best performing models were tree-based methods and Naive Bayes, which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features generally performed less well, highlighting the need for some means of feature selection.
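
The baseline and one of the evaluation metrics can be sketched in a few lines of NumPy. This is a hedged illustration, not the study's code: the toy feature values standing in for bathymetry and backscatter, and the two-class labels, are invented for the example.

```python
import numpy as np

def nn_predict(train_X, train_y, test_X):
    """1-Nearest-Neighbour class predictions under Euclidean distance."""
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    return train_y[np.argmin(d, axis=1)]

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    po = np.mean(y_true == y_pred)
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return (po - pe) / (1.0 - pe)

# hypothetical toy features (standing in for bathymetry, backscatter)
train_X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_y = np.array([0, 0, 1, 1])
test_X = np.array([[0.0, 0.5], [5.0, 5.5]])
```

Because accuracy can look flattering when one substrate class dominates the samples, reporting kappa alongside accuracy, as the study does, corrects for agreement expected by chance.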