Abstract:
Variance estimation for estimators of state, county, and school district quantities derived from the Census 2000 long form is discussed. The variance estimator must account for (1) uncertainty due to imputation, and (2) raking to census population controls. An imputation procedure that imputes more than one value for each missing item, using neighboring donors, is described, and the procedure with two nearest neighbors is applied to the Census long form. The Kim and Fuller [Biometrika 91 (2004) 559--578] method for variance estimation under fractional hot deck imputation is adapted for application to the long form data. Numerical results from the 2000 long form data are presented.
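
As a rough illustration of imputing more than one value per missing item from neighboring donors, the following minimal sketch replaces each missing value with the values of its two nearest respondents and splits the recipient's weight between them. The record layout (keys `x`, `y`, `w`), the single matching covariate, and the equal fractions are assumptions made for the example; this is not the Census procedure, and it does not cover raking or variance estimation.

```python
def fractional_nn_impute(records, n_donors=2):
    """Return (value, weight) pairs after fractional nearest-neighbor imputation.

    records: list of dicts with hypothetical keys
      'x' (matching covariate), 'y' (item value or None), 'w' (survey weight).
    Each missing 'y' is replaced by the values of the n_donors nearest
    respondents, each carrying an equal share of the recipient's weight
    (equal fractions are an assumption of this sketch).
    """
    donors = [r for r in records if r["y"] is not None]
    if len(donors) < n_donors:
        raise ValueError("not enough respondents to serve as donors")
    pairs = []
    for r in records:
        if r["y"] is not None:
            pairs.append((r["y"], r["w"]))
            continue
        # Rank respondents by distance on the covariate and keep the closest.
        nearest = sorted(donors, key=lambda d: abs(d["x"] - r["x"]))[:n_donors]
        for d in nearest:
            pairs.append((d["y"], r["w"] / n_donors))
    return pairs


def weighted_mean(pairs):
    """Weighted mean of the completed data set."""
    total_weight = sum(w for _, w in pairs)
    return sum(y * w for y, w in pairs) / total_weight
```

Because the fractions for each recipient sum to one, the total weight of the completed data set equals the total weight of the original records.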

Abstract:
The present paper deals with the optimisation of Nearest Neighbour Rule (NNR) classifiers via Genetic Algorithms. The methodology consists of implementing a Genetic Algorithm capable of searching the input feature space used by the NNR classifier. Results show that the approach is adequate for performing feature reduction while simultaneously improving the recognition rate. Practical examples show that it is possible to recognise Portuguese granites with a 100% recognition rate using only 3 morphological features (from an original set of 117 features), which makes the approach well suited for real-time applications. Moreover, the present method represents a robust strategy for understanding the nature of the images treated and their discriminant features. KEYWORDS: Feature Reduction, Genetic Algorithms, Nearest Neighbour Rule Classifiers (k-NNR).
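
The idea of searching the feature space with a Genetic Algorithm can be sketched as follows. Each individual is a bit mask over the features, and its fitness is the leave-one-out accuracy of a 1-NN classifier restricted to the selected features, minus a small penalty per feature. The fitness definition, the penalty, the truncation selection, the one-point crossover, and all parameter values are assumptions chosen for brevity, not the configuration used in the paper.

```python
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of 1-NN using only features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    hits = 0
    for i in range(len(X)):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((X[i][c] - X[j][c]) ** 2 for c in cols),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def ga_select_features(X, y, pop_size=20, generations=30,
                       p_mut=0.1, penalty=0.01, seed=0):
    """Return a feature mask found by a small elitist genetic algorithm.

    Requires at least two features and pop_size >= 4.
    """
    rng = random.Random(seed)
    n_feat = len(X[0])

    def fitness(mask):
        # Reward accuracy, lightly penalise the number of features kept.
        return loo_1nn_accuracy(X, y, mask) - penalty * sum(mask)

    pop = [tuple(rng.randint(0, 1) for _ in range(n_feat))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feat)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability p_mut.
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. The search remains heuristic, so the returned mask is not guaranteed to be optimal.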

Abstract:
We derive an asymptotic expansion for the excess risk (regret) of a weighted nearest-neighbour classifier. This allows us to find the asymptotically optimal vector of nonnegative weights, which has a rather simple form. We show that the ratio of the regret of this classifier to that of an unweighted k-nearest neighbour classifier depends asymptotically only on the dimension d of the feature vectors, and not on the underlying populations. The improvement is greatest when d=4, but thereafter decreases as $d\rightarrow\infty$. The popular bagged nearest neighbour classifier can also be regarded as a weighted nearest neighbour classifier, and we show that its corresponding weights are somewhat suboptimal when d is small (in particular, worse than those of the unweighted k-nearest neighbour classifier when d=1), but are close to optimal when d is large. Finally, we argue that improvements in the rate of convergence are possible under stronger smoothness assumptions, provided we allow negative weights. Our findings are supported by an empirical performance comparison on both simulated and real data sets.
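
For readers unfamiliar with the construction, a weighted nearest-neighbour classifier for two classes can be sketched as below: the training points are ranked by distance to the query, the i-th nearest point receives weight w_i, and class 1 is predicted when the weighted vote reaches half of the total weight. The function names and the example weights are illustrative assumptions; the asymptotically optimal weights derived in the paper are not reproduced here.

```python
import math


def weighted_nn_classify(train_x, train_y, query, weights):
    """Classify `query` as 0 or 1 by a weighted vote of its nearest neighbours.

    train_x: list of feature tuples; train_y: list of 0/1 labels.
    weights[i] is the weight of the (i+1)-th nearest neighbour; neighbours
    beyond len(weights) receive weight zero.
    """
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    vote = sum(w * train_y[i] for w, i in zip(weights, order))
    return 1 if vote >= 0.5 * sum(weights) else 0


def uniform_weights(k):
    """Weights of the unweighted k-nearest neighbour classifier."""
    return [1.0 / k] * k
```

With `uniform_weights(k)` this reduces to the usual k-nearest neighbour majority vote, so the same function covers both classifiers compared in the abstract.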

Abstract:
In the on-line nearest-neighbour graph (ONG), each point after the first in a sequence of points in R^d is joined by an edge to its nearest-neighbour amongst those points that precede it in the sequence. We study the large-sample asymptotic behaviour of the total power-weighted length of the ONG on uniform random points in (0,1)^d. In particular, for d=1 and weight exponent \alpha>1/2, the limiting distribution of the centred total weight is characterized by a distributional fixed-point equation. As an ancillary result, we give exact expressions for the expectation and variance of the standard nearest-neighbour (directed) graph on uniform random points in the unit interval.
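
The definition of the ONG translates directly into a short computation. The sketch below, with an assumed function name and a plain quadratic-time search, joins each point to its nearest predecessor and sums the edge lengths raised to the power \alpha.

```python
import math
import random


def ong_total_weight(points, alpha=1.0):
    """Total power-weighted edge length of the on-line nearest-neighbour graph.

    points: sequence of coordinate tuples, in arrival order.
    Each point after the first contributes (distance to its nearest
    predecessor) ** alpha.
    """
    total = 0.0
    for i in range(1, len(points)):
        nearest = min(math.dist(points[i], p) for p in points[:i])
        total += nearest ** alpha
    return total


# Example: 1000 uniform random points in (0,1)^2, seeded for reproducibility.
rng = random.Random(0)
sample = [(rng.random(), rng.random()) for _ in range(1000)]
sample_weight = ong_total_weight(sample, alpha=1.0)
```

The quadratic running time is adequate for small simulations; a spatial index would be needed for large n.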

Abstract:
The on-line nearest-neighbour graph on a sequence of $n$ uniform random points in $(0,1)^d$ ($d \in \N$) joins each point after the first to its nearest neighbour amongst its predecessors. For the total power-weighted edge-length of this graph, with weight exponent $\alpha \in (0,d/2]$, we prove $O(\max \{n^{1-(2\alpha/d)}, \log n \})$ upper bounds on the variance. On the other hand, we give an $n \to \infty$ large-sample convergence result for the total power-weighted edge-length when $\alpha > d/2$. We prove corresponding results when the underlying point set is a Poisson process of intensity $n$.

Abstract:
Marginal imputation, which consists of imputing each item requiring imputation separately, is often used in surveys. This type of imputation procedure leads to asymptotically unbiased estimators of simple parameters such as population totals (or means), but tends to distort relationships between variables. As a result, it generally leads to biased estimators of bivariate parameters such as coefficients of correlation or odds ratios. Household and social surveys typically collect categorical variables, for which missing values are usually handled by nearest-neighbour imputation or random hot-deck imputation. In this paper, we propose a simple random imputation procedure, closely related to random hot-deck imputation, which succeeds in preserving the relationship between categorical variables. Also, a fully efficient version of the latter procedure is proposed. A limited simulation study compares several estimation procedures in terms of relative bias and relative efficiency.

Abstract:
To overcome the problem of item nonresponse, random imputation methods are often used because they tend to preserve the distribution of the imputed variable. Among the random imputation methods, the random hot-deck has the interesting property of imputing observed values. A new random hot-deck imputation method is proposed. The key innovation of this method is that the selection of donors is viewed as a sampling problem and uses calibration and balanced sampling. This approach makes it possible to select donors such that if the auxiliary variables were imputed, their estimated totals would not change. As a consequence, very accurate and stable estimates of totals can be obtained. Moreover, the method is based on a nonparametric procedure. Donors are selected in neighborhoods of recipients. In this way, the missing value of a recipient is replaced with an observed value of a similar unit. This new approach is very flexible and can greatly improve the quality of estimates. Also, this method is unbiased under very different models and is thus resistant to model misspecification. Finally, the new method makes it possible to introduce edit rules while imputing.

Abstract:
This paper develops results for the next nearest neighbour Ising model on random graphs. Besides being an essential ingredient in classic models for frustrated systems, second neighbour interactions arise naturally in several applications such as the colour diversity problem and graphical games. We demonstrate ensembles of random graphs, including regular connectivity graphs, that have a periodic variation of free energy, with either the ratio of nearest to next nearest couplings, or the mean number of nearest neighbours. When the coupling ratio is an integer, paramagnetic phases can be found at zero temperature. This is shown to be related to the locked or unlocked nature of the interactions. For anti-ferromagnetic couplings, spin glass phases are demonstrated at low temperature. The interaction structure is formulated as a factor graph, and the solution on a tree is developed. The replica symmetric and energetic one-step replica symmetry breaking solutions are developed using the cavity method. We calculate within these frameworks the phase diagram and demonstrate the existence of dynamical transitions at zero temperature for cases of anti-ferromagnetic coupling on regular and inhomogeneous random graphs.

Abstract:
Advances in wireless technologies have led to the development of sensor nodes that are capable of sensing, processing, and transmitting. They collect large amounts of sensor data in a highly decentralized manner. Classification is an important task in data mining. In this paper, a Nearest Neighbour Classification technique is used to classify Wireless Sensor Network data. In our experimental investigation, the technique achieves a correct classification rate of 92.3%.