Abstract:
A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.

Abstract:
Fienberg convincingly demonstrates that Bayesian models and methods represent a powerful approach to squeezing illumination from data in public policy settings. However, no school of inference is without its weaknesses, and, in the face of the ambiguities, uncertainties, and poorly posed questions of the real world, perhaps we should not expect to find a formally correct inferential strategy that can be applied universally, whatever the nature of the question: we should not expect to be able to identify a "norm" approach. An analogy is drawn between George Box's "no models are right, but some are useful" and inferential systems [arXiv:1108.2177].

Abstract:
The papers in this collection are superb illustrations of the power of modern Bayesian methods. They give examples of problems which are well suited to being tackled using such methods, but one must not lose sight of the merits of having multiple different strategies and tools in one's inferential armoury.

Abstract:
The area under the ROC curve is widely used as a measure of performance of classification rules. However, it has recently been shown that the measure is fundamentally incoherent, in the sense that it treats the relative severities of misclassifications differently when different classifiers are used. To overcome this, Hand (2009) proposed the $H$ measure, which allows a researcher to fix the distribution of relative severities to a classifier-independent setting on a given problem. This note extends that discussion and proposes a modified standard distribution for the $H$ measure, the $\mathrm{Beta}(\pi_1+1,\pi_0+1)$ distribution, which better matches the requirements of researchers, in particular those faced with heavily unbalanced datasets. [Preprint submitted to Pattern Recognition Letters]
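The construction can be sketched numerically. The following is a minimal, illustrative implementation assuming one common loss convention (cost $c$ for misclassifying class 0, $1-c$ for class 1, with class priors $\pi_0, \pi_1$); the function name and the candidate-threshold scheme are the sketch's own choices, not taken from the paper. The Beta density is used unnormalised because its constant cancels in the ratio.

```python
import numpy as np

def h_measure(scores0, scores1, pi0=None, pi1=None, n_grid=1000):
    """Numerical sketch of the H measure with the proposed
    Beta(pi1 + 1, pi0 + 1) severity distribution over the cost c in (0, 1).

    scores0: classifier scores for class-0 cases (lower = more class-0-like)
    scores1: classifier scores for class-1 cases
    """
    n0, n1 = len(scores0), len(scores1)
    if pi0 is None:
        pi0, pi1 = n0 / (n0 + n1), n1 / (n0 + n1)

    # Candidate thresholds: -inf plus every observed score.
    thresholds = np.concatenate(
        [[-np.inf], np.sort(np.concatenate([scores0, scores1]))])
    # Empirical CDFs F0(t), F1(t) evaluated at each threshold.
    F0 = np.searchsorted(np.sort(scores0), thresholds, side="right") / n0
    F1 = np.searchsorted(np.sort(scores1), thresholds, side="right") / n1

    # Grid over the relative misclassification severity c in (0, 1).
    c = (np.arange(n_grid) + 0.5) / n_grid
    # Unnormalised Beta(pi1 + 1, pi0 + 1) density; the constant cancels below.
    w = c ** pi1 * (1 - c) ** pi0

    # Expected loss at threshold t and cost c (assumed convention):
    #   L(c, t) = c * pi0 * (1 - F0(t)) + (1 - c) * pi1 * F1(t)
    L = (c[:, None] * pi0 * (1 - F0[None, :])
         + (1 - c[:, None]) * pi1 * F1[None, :])
    L_min = L.min(axis=1)                          # best threshold per c
    L_triv = np.minimum(c * pi0, (1 - c) * pi1)    # best trivial classifier

    # H = 1 - (weighted minimum loss) / (weighted trivial loss).
    return 1 - (w * L_min).sum() / (w * L_triv).sum()
```

A perfectly separating classifier gives $H$ near 1, while scores carrying no information give $H$ near 0, and the weighting is fixed by the problem (through $\pi_0, \pi_1$) rather than by the classifier.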

Abstract:
The vast potential of the genomic insight offered by microarray technologies has led to their widespread use since they were introduced a decade ago. Application areas include gene function discovery, disease diagnosis, and inferring regulatory networks. Microarray experiments enable large-scale, high-throughput investigations of gene activity and have thus provided the data analyst with a distinctive, high-dimensional field of study. Many questions in this field relate to finding subgroups of data profiles which are very similar. A popular type of exploratory tool for finding subgroups is cluster analysis, and many different flavors of algorithms have been used and indeed tailored for microarray data. Cluster analysis, however, implies a partitioning of the entire data set, and this does not always match the objective. Sometimes pattern discovery or bump hunting tools are more appropriate. This paper reviews these various tools for finding interesting subgroups.
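The distinction the abstract draws can be made concrete with a toy sketch: cluster analysis forces every profile into some group, whereas a bump-hunting style tool flags only profiles lying in dense regions and leaves the rest unassigned. Both functions below are illustrative toys (a bare-bones k-means and a neighbour-count density rule), not any specific algorithm from the review.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: note that every profile is forced into some cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre, then recompute centres.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def bump_hunt(X, radius=1.0, min_neighbours=10):
    """Toy 'bump hunting': flag only profiles in dense regions, leaving the
    rest unassigned -- the key contrast with a full partition."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return (d < radius).sum(axis=1) - 1 >= min_neighbours
```

On data consisting of one tight subgroup plus scattered background, k-means labels all points, while the density rule picks out essentially only the subgroup.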

Abstract:
Star-galaxy classification is one of the most fundamental data-processing tasks in survey astronomy, and a critical starting point for the scientific exploitation of survey data. For bright sources this classification can be done with almost complete reliability, but for the numerous sources close to a survey's detection limit each image encodes only limited morphological information. In this regime, from which many of the new scientific discoveries are likely to come, it is vital to utilise all the available information about a source, both from multiple measurements and also prior knowledge about the star and galaxy populations. It is also more useful and realistic to provide classification probabilities than decisive classifications. All these desiderata can be met by adopting a Bayesian approach to star-galaxy classification, and we develop a very general formalism for doing so. An immediate implication of applying Bayes's theorem to this problem is that it is formally impossible to combine morphological measurements in different bands without using colour information as well; however, we develop several approximations that disregard colour information as much as possible. The resultant scheme is applied to data from the UKIRT Infrared Deep Sky Survey (UKIDSS), and tested by comparing the results to deep Sloan Digital Sky Survey (SDSS) Stripe 82 measurements of the same sources. The Bayesian classification probabilities obtained from the UKIDSS data agree well with the deep SDSS classifications both overall (a mismatch rate of 0.022, compared to 0.044 for the UKIDSS pipeline classifier) and close to the UKIDSS detection limit (a mismatch rate of 0.068 compared to 0.075 for the UKIDSS pipeline classifier). The Bayesian formalism developed here can be applied to improve the reliability of any star-galaxy classification schemes based on the measured values of morphology statistics alone.
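The core Bayesian step can be sketched as follows. This is a minimal illustration, not the paper's formalism: it treats per-band likelihoods as independent, which is precisely the approximation the abstract cautions about (it implicitly ignores colour information). The function name and inputs are the sketch's own.

```python
import math

def star_probability(band_loglikes, prior_star=0.5):
    """Turn per-band log-likelihoods of a source's morphology statistic under
    the star and galaxy populations into a classification probability via
    Bayes's theorem, assuming (approximate) independence across bands.

    band_loglikes: list of (log L(data_b | star), log L(data_b | galaxy)).
    prior_star: prior probability that the source is a star.
    """
    log_star = math.log(prior_star)
    log_gal = math.log(1 - prior_star)
    for ls, lg in band_loglikes:
        log_star += ls
        log_gal += lg
    # Normalise in log space for numerical stability.
    m = max(log_star, log_gal)
    ps, pg = math.exp(log_star - m), math.exp(log_gal - m)
    return ps / (ps + pg)
```

Returning this probability, rather than thresholding it into a decisive label, is exactly the behaviour the abstract argues is more useful near the detection limit.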

Abstract:
Learning the network structure of a large graph is computationally demanding, and dynamically monitoring the network over time for any changes in structure threatens to be more challenging still. This paper presents a two-stage method for anomaly detection in dynamic graphs: the first stage uses simple, conjugate Bayesian models for discrete time counting processes to track the pairwise links of all nodes in the graph to assess normality of behavior; the second stage applies standard network inference tools on a greatly reduced subset of potentially anomalous nodes. The utility of the method is demonstrated on simulated and real data sets.
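The first stage can be illustrated with one concrete choice of conjugate model for a single pairwise link's counts. The Gamma-Poisson pairing, the hyperparameters, and the p-value rule below are illustrative assumptions in the spirit of the abstract, not the paper's exact specification.

```python
import math

def gamma_poisson_surprise(history, new_count, a0=1.0, b0=1.0):
    """Stage-one sketch: a conjugate Gamma-Poisson model for one pairwise
    link's discrete-time counts. Returns the upper-tail posterior-predictive
    p-value of the newest count; small values mark the link as potentially
    anomalous, passing its nodes to the (more expensive) second stage.
    a0, b0 are illustrative Gamma(a0, b0) prior hyperparameters."""
    # Conjugate update: Gamma(a0 + sum of counts, b0 + number of intervals).
    a = a0 + sum(history)
    b = b0 + len(history)
    # Posterior predictive is negative binomial with r = a, p = b / (b + 1).
    r, p = a, b / (b + 1)
    # P(X >= new_count) = 1 - sum_{k < new_count} NB pmf(k),
    # with pmf(k) = Gamma(r+k) / (Gamma(r) k!) * p^r * (1-p)^k.
    cdf = 0.0
    for k in range(new_count):
        cdf += math.exp(math.lgamma(r + k) - math.lgamma(r)
                        - math.lgamma(k + 1)
                        + r * math.log(p) + k * math.log(1 - p))
    return max(0.0, 1.0 - cdf)
```

Because the update is conjugate, tracking every link requires only constant-time bookkeeping per interval, which is what makes screening the full graph feasible before running full network inference on the flagged subset.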

Abstract:
Classifying streaming data requires the development of methods which are computationally efficient and able to cope with changes in the underlying distribution of the stream, a phenomenon known in the literature as concept drift. We propose a new method for detecting concept drift which uses an Exponentially Weighted Moving Average (EWMA) chart to monitor the misclassification rate of a streaming classifier. Our approach is modular and can hence be run in parallel with any underlying classifier to provide an additional layer of concept drift detection. Moreover, our method is computationally efficient, with O(1) overhead, and works in a fully online manner with no need to store data points in memory. Unlike many existing approaches to concept drift detection, our method allows the rate of false positive detections to be controlled and kept constant over time.
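The monitoring scheme can be sketched as a small online detector. This is a minimal illustration of the EWMA idea: the control-limit constant `L` below is a fixed illustrative value, whereas the paper's contribution includes calibrating the limit so that the false positive rate stays constant over time; the warm-up length is also an assumption of the sketch.

```python
class EWMADriftDetector:
    """Sketch of EWMA-based drift detection on a 0/1 misclassification
    stream. All state is a handful of scalars, so each update is O(1)
    and no data points are stored."""

    def __init__(self, lam=0.2, L=3.0, warmup=30):
        self.lam, self.L, self.warmup = lam, L, warmup
        self.t = 0      # number of points seen so far
        self.p = 0.0    # running estimate of the error rate
        self.z = 0.0    # EWMA of the error stream

    def update(self, error):
        """error: 1 if the classifier misclassified this point, else 0.
        Returns True when an increase in the error rate is flagged."""
        x = float(error)
        self.t += 1
        self.p += (x - self.p) / self.t                  # O(1) mean update
        self.z = (1 - self.lam) * self.z + self.lam * x  # O(1) EWMA update
        # Variance of the EWMA under a stable Bernoulli(p) error stream.
        var = (self.lam / (2 - self.lam)
               * (1 - (1 - self.lam) ** (2 * self.t))
               * self.p * (1 - self.p))
        return self.t > self.warmup and self.z > self.p + self.L * var ** 0.5
```

Run in parallel with any classifier, the detector simply consumes that classifier's per-point error indicator, which is what makes the approach modular.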