Abstract:
We discuss the implementation of the nuclear model based on realistic nuclear spectral functions in the GENIE neutrino interaction generator. Besides improving on the Fermi gas description of the nuclear ground state, our scheme involves a new prescription for $Q^2$ selection, meant to efficiently enforce energy-momentum conservation. The results of our simulations, validated through comparison to electron scattering data, have been obtained for a variety of target nuclei, ranging from carbon to argon, and cover the kinematical region in which quasi-elastic scattering is the dominant reaction mechanism. We also analyse the influence of the adopted nuclear model on the determination of neutrino oscillation parameters.

Abstract:
In this article, the results of a series of muon flux measurements conducted at the Kimballton Underground Research Facility (KURF), Virginia, United States, are presented. The detector employed for these investigations is made of plastic scintillator bars read out by wavelength-shifting fibers and multianode photomultiplier tubes. Data were taken at several locations inside KURF, spanning rock overburden values from ~200 to 1450 m.w.e. From the extracted muon rates, an empirical formula was devised that estimates the muon flux inside the mine as a function of the overburden. The results are in good agreement with muon flux calculations based on analytical models and MUSIC.

Abstract:
Relationship-aware sequential pattern mining is the problem of mining frequent patterns in sequences in which the events of a sequence are mutually related by one or more concepts from respective hierarchical taxonomies, based on the type of the events. Additionally, the events themselves are also described by a certain number of taxonomical concepts. We present RaSP, an algorithm that mines relationship-aware patterns over such sequences; RaSP follows a two-stage approach. In the first stage it mines for frequent type patterns and {\em all} their occurrences within the different sequences. In the second stage it performs hierarchical mining, where for each frequent type pattern and its occurrences it mines for more specific frequent patterns in the lower levels of the taxonomies. We test RaSP on the real-world medical application that inspired its development, mining for frequent patterns of medical behavior in the antibiotic treatment of microbes, and show that it has very good computational performance given the complexity of the relationship-aware sequential pattern mining problem.
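The two-stage idea can be illustrated with a minimal sketch. The toy taxonomy, function names, and data below are hypothetical, not the authors' RaSP implementation: stage 1 counts patterns at the type level while recording all occurrence positions, and stage 2 re-examines only those recorded occurrences to find frequent specializations lower in the taxonomy.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: each concrete event maps to its parent concept.
TAXONOMY = {
    "amoxicillin": "penicillin", "oxacillin": "penicillin",
    "penicillin": "antibiotic", "vancomycin": "antibiotic",
}

def generalize(event):
    """Walk one level up the taxonomy."""
    return TAXONOMY.get(event)

def frequent_singletons(sequences, min_support):
    """Stage 1 (simplified to single-event patterns): count occurrences at
    the type (parent) level and keep the frequent types, remembering *all*
    positions (sequence id, position) where they occur."""
    occurrences = defaultdict(list)
    for sid, seq in enumerate(sequences):
        for pos, event in enumerate(seq):
            etype = generalize(event)
            if etype is not None:
                occurrences[etype].append((sid, pos))
    return {t: occ for t, occ in occurrences.items() if len(occ) >= min_support}

def specialize(sequences, occurrences, min_support):
    """Stage 2 (simplified): within the recorded occurrences of a frequent
    type pattern, mine for frequent, more specific events."""
    counts = defaultdict(list)
    for sid, pos in occurrences:
        counts[sequences[sid][pos]].append((sid, pos))
    return {e: occ for e, occ in counts.items() if len(occ) >= min_support}
```

Because stage 2 only revisits the occurrence lists produced by stage 1, no pass over the full sequence database is needed when descending the taxonomy.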

Abstract:
Feature selection is one of the most prominent learning tasks, especially in high-dimensional datasets in which the goal is to understand the mechanisms that underlie the learning dataset. However, most feature selection methods typically deliver just a flat set of relevant features and provide no further information on what kind of structures, e.g. feature groupings, might underlie that set. In this paper we propose a new learning paradigm whose goal is to uncover the structures that underlie the set of relevant features for a given learning problem. We uncover two types of feature sets: non-replaceable features, which contain important information about the target variable and cannot be replaced by other features, and functionally similar feature sets, which can be used interchangeably in learned models, given the presence of the non-replaceable features, with no change in predictive performance. To this end we propose a new learning algorithm that learns a number of disjoint models, using a model-disjointness regularization constraint together with a constraint on the predictive agreement of the disjoint models. We explore the behavior of our approach on a number of high-dimensional datasets and show that, as expected by their construction, the learned models satisfy a number of properties: model disjointness, high predictive agreement, and predictive performance similar to that of models learned on the full set of relevant features. The ability to structure the set of relevant features in this manner can become a valuable tool in different applications of scientific knowledge discovery.
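A crude sketch of the disjoint-models idea follows. All names and choices here are illustrative assumptions, not the paper's algorithm: disjointness is enforced greedily by construction (each model claims features the previous ones did not use) rather than by a regularization constraint, and agreement is measured as the mean correlation between the models' predictions.

```python
import numpy as np

def disjoint_models(X, y, n_models=2, feats_per_model=1):
    """Greedy sketch: rank the *remaining* features by absolute correlation
    with the target, claim the top ones for a new model, and fit ordinary
    least squares on that feature subset. Disjointness holds by construction."""
    remaining = list(range(X.shape[1]))
    models = []
    for _ in range(n_models):
        scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in remaining]
        order = np.argsort(scores)[::-1][:feats_per_model]
        chosen = [remaining[i] for i in order]
        w, *_ = np.linalg.lstsq(X[:, chosen], y, rcond=None)
        models.append((chosen, w))
        remaining = [j for j in remaining if j not in chosen]
    return models

def agreement(models, X):
    """Predictive agreement: mean pairwise correlation of model predictions."""
    preds = [X[:, f] @ w for f, w in models]
    corr = np.corrcoef(preds)
    k = len(preds)
    return (corr.sum() - k) / (k * (k - 1))
```

On data containing a near-duplicate of an informative feature, the second model picks up the duplicate, so the two disjoint models agree almost perfectly, mirroring the notion of functionally similar feature sets.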

Abstract:
Most metric learning algorithms, as well as Fisher's Discriminant Analysis (FDA), optimize some cost function of different measures of within- and between-class distances. On the other hand, Support Vector Machines (SVMs) and several Multiple Kernel Learning (MKL) algorithms are based on the SVM large margin theory. Recently, SVMs have been analyzed from a metric learning perspective, making it possible to draw connections between SVM and metric learning and to develop new algorithms that build on the strengths of each. Inspired by the metric learning interpretation of SVM, we develop here a new metric-learning-based SVM framework in which we incorporate metric learning concepts within SVM. We extend the optimization problem of SVM to include some measure of the within-class distance, and along the way we develop a new within-class distance measure which is appropriate for SVM. In addition, we adopt the same approach for MKL and show that it can also be formulated as a Mahalanobis metric learning problem. Our end result is a number of SVM/MKL algorithms that incorporate metric learning concepts. We experiment with them on a set of benchmark datasets and observe important predictive performance improvements.
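One way to picture the combination is a sketch along the following lines (an illustrative assumption, not the paper's formulation or its within-class distance measure): the usual SVM regularizer $\|w\|^2$ is augmented with the FDA-style within-class scatter of the projections, $w^\top S_w w$, and the resulting objective is minimized by subgradient descent.

```python
import numpy as np

def within_class_scatter(X, y):
    """Within-class scatter matrix S_w, as in FDA."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Xc = X[y == c]
        D = Xc - Xc.mean(axis=0)
        S += D.T @ D
    return S

def train_svm_with_scatter(X, y, lam=1.0, C=1.0, lr=0.01, epochs=500):
    """Subgradient sketch of an SVM whose regularizer also penalizes the
    within-class spread of the projections:
        w' (I + lam * S_w / n) w + C * mean hinge loss,   y in {-1, +1}."""
    n, d = X.shape
    R = np.eye(d) + lam * within_class_scatter(X, y) / n
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points inside the margin contribute a subgradient
        grad_w = 2 * R @ w - C * (y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -C * y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Setting `lam=0` recovers a plain linear SVM trained by subgradient descent, which makes the role of the added within-class term easy to isolate experimentally.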

Abstract:
Metric learning methods have been shown to perform well on different learning tasks. Many of them rely on target neighborhood relationships that are computed in the original feature space and remain fixed throughout learning. As a result, the learned metric reflects the original neighborhood relations. We propose a novel formulation of the metric learning problem in which, in addition to the metric, the target neighborhood relations are also learned in a two-step iterative approach. The new formulation can be seen as a generalization of many existing metric learning methods. The formulation includes a target neighbor assignment rule that assigns different numbers of neighbors to instances according to their quality; `high quality' instances get more neighbors. We experiment with two of its instantiations that correspond to the metric learning algorithms LMNN and MCML and compare it to other metric learning methods on a number of datasets. The experimental results show state-of-the-art performance and provide evidence that learning the neighborhood relations does improve predictive performance.
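The two-step iteration can be sketched as follows. This is a toy rendering under stated assumptions, not the paper's algorithm: the metric is a linear map $L$ (so $d(x,x')=\|L(x-x')\|$), target neighbors are simply the $k$ nearest same-class points under the current metric, and the metric update does gradient steps on an LMNN-style "pull" loss with the scale of $L$ fixed by renormalization.

```python
import numpy as np

def assign_targets(X, y, L, k=2):
    """Step 1: each point's k nearest same-class neighbors
    under the current metric d(x, x') = ||L(x - x')||."""
    Z = X @ L.T
    targets = []
    for i in range(len(X)):
        same = np.flatnonzero((y == y[i]) & (np.arange(len(X)) != i))
        d = np.linalg.norm(Z[same] - Z[i], axis=1)
        targets.append(same[np.argsort(d)[:k]])
    return targets

def update_metric(X, targets, L, lr=0.001, steps=100):
    """Step 2: gradient steps on the pull loss
    sum_i sum_{j in targets(i)} ||L(x_i - x_j)||^2, targets held fixed.
    L is renormalized each step so only the shape of the metric changes."""
    for _ in range(steps):
        G = np.zeros_like(L)
        for i, tjs in enumerate(targets):
            for j in tjs:
                diff = (X[i] - X[j])[None, :]
                G += 2 * (L @ diff.T) @ diff
        L = L - lr * G / len(X)
        L = L / np.linalg.norm(L)
    return L
```

Alternating the two steps lets the neighborhood relations adapt: on data with one informative and one noisy dimension, the learned $L$ progressively downweights the noisy dimension, and the re-assigned targets follow.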

Abstract:
Recommendation systems often rely on point-wise loss metrics such as the mean squared error. However, in real recommendation settings only a few items are presented to a user. This observation has recently encouraged the use of rank-based metrics. LambdaMART is the state-of-the-art learning-to-rank algorithm that relies on such a metric. Despite its success, it does not have a principled regularization mechanism, relying instead on empirical approaches to control model complexity, which leaves it prone to overfitting. Motivated by the fact that very often the users' and items' descriptions, as well as the preference behavior, can be well summarized by a small number of hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization (LambdaMART-MF), that learns a low-rank latent representation of users and items using gradient boosted trees. The algorithm factorizes LambdaMART by defining relevance scores as the inner product of the learned representations of the users and items. The low rank essentially acts as a model complexity controller; on top of it we propose additional regularizers that constrain the learned latent representations to reflect the user and item manifolds, as these are defined by their original feature-based descriptors and the preference behavior. Finally, we also propose a weighted variant of NDCG to reduce the penalty for similar items with a large rating discrepancy. We experiment on two very different recommendation datasets, meta-mining and movies-users, and evaluate the performance of LambdaMART-MF, with and without regularization, in the cold-start setting as well as in the simpler matrix completion setting. In both cases it significantly outperforms current state-of-the-art algorithms.
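The scoring side of this factorized setup is simple to state in code. The sketch below is illustrative only (it omits the gradient boosted trees that actually learn the factors and uses the standard, unweighted NDCG): relevance scores are inner products of low-rank user and item representations, and rankings induced by those scores are evaluated with NDCG@k.

```python
import numpy as np

def scores(U, V):
    """Relevance of every item for every user, as inner products of their
    low-rank latent representations (rank = U.shape[1] = V.shape[1])."""
    return U @ V.T

def ndcg_at_k(relevance, ranking, k=10):
    """Standard NDCG@k: gains come from the true relevance labels,
    discounts from the positions in the predicted ranking."""
    top = ranking[:k]
    discounts = np.log2(np.arange(2, len(top) + 2))
    dcg = np.sum((2.0 ** relevance[top] - 1) / discounts)
    ideal = np.sort(relevance)[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1) / np.log2(np.arange(2, len(ideal) + 2)))
    return dcg / idcg if idcg > 0 else 0.0
```

A ranking for user `u` is then `np.argsort(-scores(U, V)[u])`; the rank of the factors caps model complexity regardless of how expressive the learner of `U` and `V` is.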

Abstract:
We study the problem of learning local metrics for nearest neighbor classification. Most previous work on local metric learning learns a number of unrelated local metrics. While this "independence" approach delivers increased flexibility, its downside is a considerable risk of overfitting. We present a new parametric local metric learning method in which we learn a smooth metric matrix function over the data manifold. Using an approximation error bound of the metric matrix function, we learn local metrics as linear combinations of basis metrics defined on anchor points over different regions of the instance space. We constrain the metric matrix function by imposing manifold regularization on the linear combinations, which makes the learned metric matrix function vary smoothly along the geodesics of the data manifold. Our metric learning method has excellent performance both in terms of predictive power and scalability. We experimented with several large-scale classification problems, with tens of thousands of instances, and compared our method with several state-of-the-art metric learning methods, both global and local, as well as with SVM with automatic kernel selection, all of which it outperforms significantly.
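The anchor-point construction can be sketched directly. The weighting scheme below is an assumption for illustration (a Gaussian softmax over distances to the anchors, rather than the learned, manifold-regularized weights of the paper), but it shows the key structural point: the metric at $x$ is a convex combination of PSD basis metrics, hence itself PSD, and it varies smoothly with $x$.

```python
import numpy as np

def basis_weights(x, anchors, sigma=1.0):
    """Smoothly varying combination weights: closer anchors get larger
    weight (normalized Gaussian kernel over squared distances)."""
    d2 = np.sum((anchors - x) ** 2, axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return w / w.sum()

def local_metric(x, anchors, basis_metrics, sigma=1.0):
    """Metric matrix at x: a convex combination sum_b w_b(x) M_b of the
    PSD basis metrics M_b, so the result is PSD as well."""
    w = basis_weights(x, anchors, sigma)
    return np.tensordot(w, basis_metrics, axes=1)

def local_distance(x, z, anchors, basis_metrics, sigma=1.0):
    """Squared Mahalanobis distance under the metric evaluated at x."""
    M = local_metric(x, anchors, basis_metrics, sigma)
    d = x - z
    return float(d @ M @ d)
```

Near an anchor, the local metric collapses to that anchor's basis metric, so a basis metric that ignores a locally irrelevant dimension makes distances along that dimension vanish in its region.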

Abstract:
The Double Chooz experiment has determined the value of the neutrino oscillation parameter $\theta_{13}$ from an analysis of inverse beta decay interactions with neutron capture on hydrogen. This analysis uses a fiducial volume three times larger than that of the standard Double Chooz assessment, which is restricted to a region doped with gadolinium (Gd), yielding an exposure of 113.1 GW-ton-years. The data sample used in this analysis is distinct from that of the Gd analysis, and the systematic uncertainties are also largely independent, with some exceptions, such as the reactor neutrino flux prediction. A combined rate- and energy-dependent fit finds $\sin^2 2\theta_{13}=0.097\pm 0.034\,(\mathrm{stat.})\pm 0.034\,(\mathrm{syst.})$, excluding the no-oscillation hypothesis at $2.0\sigma$. This result is consistent with previous measurements of $\sin^2 2\theta_{13}$.

Abstract:
We present a search for Lorentz violation with 8249 candidate electron antineutrino events taken by the Double Chooz experiment in 227.9 live days of running. This analysis, featuring a search for a sidereal time dependence of the events, is the first test of Lorentz invariance using a reactor-based antineutrino source. No sidereal variation is present in the data and the disappearance results are consistent with sidereal time independent oscillations. Under the Standard-Model Extension (SME), we set the first limits on fourteen Lorentz violating coefficients associated with transitions between electron and tau flavor, and set two competitive limits associated with transitions between electron and muon flavor.