Abstract:
A guiding principle for data reduction in statistical inference is the sufficiency principle. This paper extends the classical sufficiency principle to decentralized inference, i.e., settings where data reduction must be achieved in a decentralized manner. We examine the notions of local and global sufficient statistics and the relationship between the two for decentralized inference under different observation models. We then consider the impact of quantization on decentralized data reduction, which is often needed when communications among sensors are subject to finite capacity constraints. The central question we ask is: if each node in a decentralized inference system has to summarize its data using a finite number of bits, is it still optimal to implement data reduction using global sufficient statistics prior to quantization? We show that the answer is negative using a simple example and proceed to identify conditions under which sufficiency-based data reduction followed by quantization is indeed optimal. These include the well-known case when the data at decentralized nodes are conditionally independent, as well as a class of problems with conditionally dependent observations that admit a conditional independence structure through the introduction of an appropriately chosen hidden variable.
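A minimal sketch of the conditionally independent case: assuming i.i.d. Gaussian observations with a known variance (a standard textbook setting, not the paper's general model), each node can reduce its data to the local sufficient statistic (sum, count), and the fusion center recovers exactly the centralized estimate from these local statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.5          # unknown mean to be estimated
sigma = 1.0          # known noise standard deviation

# Three nodes, each holding conditionally independent observations.
node_data = [theta + sigma * rng.standard_normal(n) for n in (50, 80, 30)]

# Local data reduction: each node keeps only its sufficient statistic (sum, count).
local_stats = [(x.sum(), x.size) for x in node_data]

# Fusion center: combine local statistics into the global sufficient statistic.
total_sum = sum(s for s, _ in local_stats)
total_n = sum(n for _, n in local_stats)
decentralized_estimate = total_sum / total_n

# Centralized benchmark: pool all raw data and estimate directly.
centralized_estimate = np.concatenate(node_data).mean()

assert np.isclose(decentralized_estimate, centralized_estimate)
```

Here no information relevant to the mean is lost by the local reduction; the abstract's point is that once each statistic must additionally be quantized to finitely many bits, this reduce-then-quantize order is no longer automatically optimal.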

Abstract:
The problem of estimating an unknown vector in a bandwidth-constrained wireless sensor network is considered. In such networks, sensor nodes make distributed observations of the unknown vector and collaborate with a fusion center to generate a final estimate. Due to power and communication bandwidth limitations, each sensor node must compress its data before transmitting it to the fusion center. In this paper, both centralized and decentralized estimation frameworks are developed. A closed-form solution for the centralized estimation framework is derived. The decentralized estimation problem is proven to be NP-hard, and a Gauss-Seidel algorithm to search for an optimal solution is proposed. Simulation results show the good performance of the proposed algorithms.

1. Introduction

Developments in microelectromechanical systems technology, wireless communications, and digital electronics have enabled the large-scale deployment of low-cost wireless sensor networks (WSNs) using small sensor nodes [1]. In such networks, the distributed sensors collaborate with a fusion center to jointly estimate the unknown parameter. If the fusion center receives all measurement data from all sensors directly and processes them in real time, the corresponding processing is known as centralized estimation, which has several serious drawbacks, including poor survivability and reliability as well as heavy communication and computational burdens. Because all sensors have limited battery power, their computation and communication capabilities are severely limited; decentralized estimation methods have therefore been widely discussed in recent years [2–6]. In the decentralized estimation framework, every sensor is also a subprocessor: it first preprocesses its measurements according to a criterion and then transmits the compressed local data to the fusion center.
Upon receiving the sensor messages, the fusion center combines them according to a fusion rule to generate the final result. In such networks, less information is transmitted, leading to a significant power-saving advantage, which is very important for WSNs. To minimize the communication cost, only a limited amount of information is allowed to be transmitted through the network; dimensionality reduction estimation methods have therefore attracted considerable attention [7–9]. The basic idea of the dimensionality reduction estimation strategy is to prefilter the high-dimensional observation vector by a linear transformation (matrix) to project the observation onto the subspace spanned by basis vectors and
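A minimal sketch of the linear dimensionality reduction idea, assuming a linear observation model x = Hθ + n and using a plain random projection for the compression matrix C (the surveyed methods optimize C, which this sketch does not attempt): only the compressed vector y = Cx is transmitted, and the fusion center estimates θ from it by least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, k = 20, 4, 8   # observation dim, parameter dim, compressed dim (k < m)

H = rng.standard_normal((m, p))                  # known observation matrix
theta = rng.standard_normal(p)                   # unknown parameter vector
x = H @ theta + 0.01 * rng.standard_normal(m)    # sensor observation

# Dimensionality reduction: project the m-dimensional observation onto a
# k-dimensional subspace via a compression matrix C (random here, for
# illustration only).
C = rng.standard_normal((k, m))
y = C @ x                                        # only these k values are sent

# Fusion center: least-squares estimate of theta from the compressed data.
theta_hat, *_ = np.linalg.lstsq(C @ H, y, rcond=None)

# Reconstruction error stays small since k >= p and the noise is low.
error = np.linalg.norm(theta - theta_hat)
```

The bandwidth saving is the ratio k/m: the sensor sends 8 numbers instead of 20, at the cost of some estimation accuracy governed by the choice of C.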

Abstract:
A model of mutation rate evolution for multiple loci under arbitrary selection is analyzed. Results are obtained using techniques from Karlin (1982) that overcome the weak selection constraints needed for tractability in prior studies of multilocus event models. A multivariate form of the reduction principle is found: reduction results at individual loci combine topologically to produce a surface of mutation rate alterations that are neutral for a new modifier allele. New mutation rates survive if and only if they fall below this surface - a generalization of the hyperplane found by Zhivotovsky et al. (1994) for a multilocus recombination modifier. Increases in mutation rates at some loci may evolve if compensated for by decreases at other loci. The strength of selection on the modifier scales in proportion to the number of germline cell divisions, and increases with the number of loci affected. Loci that do not make a difference to marginal fitnesses at equilibrium are not subject to the reduction principle, and under fine tuning of mutation rates would be expected to have higher mutation rates than loci in mutation-selection balance. Other results include the nonexistence of 'viability analogous, Hardy-Weinberg' modifier polymorphisms under multiplicative mutation, and the sufficiency of average transmission rates to encapsulate the effect of modifier polymorphisms on the transmission of loci under selection. A conjecture is offered regarding situations, like recombination in the presence of mutation, that exhibit departures from the reduction principle. Constraints for tractability are: tight linkage of all loci, initial fixation at the modifier locus, and mutation distributions comprising transition probabilities of reversible Markov chains.

Abstract:
While machine learning has proven to be a powerful data-driven solution to many real-life problems, its use in sensitive domains that involve human subjects has been limited due to privacy concerns. The cryptographic approach known as "differential privacy" offers provable privacy guarantees. In this paper we study learnability under Vapnik's general learning setting with a differential privacy constraint, and reveal some intricate relationships between privacy, stability, and learnability. In particular, we show that a problem is privately learnable \emph{if and only if} there is a private algorithm that asymptotically minimizes the empirical risk (AERM). This is rather surprising because for non-private learning, AERM alone is not sufficient for learnability. This result suggests that when searching for private learning algorithms, we can restrict the search to algorithms that are AERM. In light of this, we propose a conceptual procedure that always finds a universally consistent algorithm whenever the problem is learnable under the privacy constraint. We also propose a generic and practical algorithm and show that under very general conditions it privately learns a wide class of learning problems.
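A minimal sketch of a private AERM algorithm for one concrete problem (mean estimation of bounded data under squared loss), using standard output perturbation rather than the paper's generic algorithm: the empirical risk minimizer is the sample mean, and releasing it with Laplace noise scaled to its sensitivity gives epsilon-differential privacy while the excess empirical risk vanishes as n grows.

```python
import numpy as np

def private_mean(data, lo, hi, epsilon, rng):
    """epsilon-differentially private mean of data known to lie in [lo, hi].

    The empirical mean is the empirical risk minimizer for squared loss;
    adding Laplace noise scaled to its sensitivity (hi - lo) / n makes the
    released value differentially private (output perturbation).
    """
    data = np.clip(data, lo, hi)
    sensitivity = (hi - lo) / len(data)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return data.mean() + noise

rng = np.random.default_rng(2)
data = rng.uniform(0.0, 1.0, size=10_000)

estimate = private_mean(data, lo=0.0, hi=1.0, epsilon=1.0, rng=rng)
# With n = 10,000 the noise scale is 1e-4, so the private estimate is very
# close to the non-private empirical risk minimizer: AERM behavior.
gap = abs(estimate - data.mean())
```

The noise scale shrinks as 1/n, so this private algorithm asymptotically minimizes the empirical risk, illustrating the abstract's characterization of private learnability via private AERM.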

Abstract:
While the Internet was conceived as a decentralized network, the most widely used web applications today tend toward centralization. Control increasingly rests with centralized service providers who, as a consequence, have also amassed unprecedented amounts of data about the behaviors and personalities of individuals. Developers, regulators, and consumer advocates have looked to alternative decentralized architectures as the natural response to threats posed by these centralized services. The result has been a great variety of solutions that include personal data stores (PDS), infomediaries, Vendor Relationship Management (VRM) systems, and federated and distributed social networks. And yet, for all these efforts, decentralized personal data architectures have seen little adoption. This position paper attempts to account for these failures, challenging the accepted wisdom in the web community on the feasibility and desirability of these approaches. We start with a historical discussion of the development of various categories of decentralized personal data architectures. Then we survey the main ideas to illustrate the common themes among these efforts. We tease apart the design characteristics of these systems from the social values that they (are intended to) promote. We use this understanding to point out numerous drawbacks of the decentralization paradigm, some inherent and others incidental. We end with recommendations for designers of these systems for working towards goals that are achievable, but perhaps more limited in scope and ambition.

Abstract:
The digitization of medical data has been a sensitive topic. In modern times, laws such as HIPAA provide guidelines for electronic transactions in medical data to prevent attacks and fraudulent usage of private information. In our paper, we explore an architecture that uses hybrid computing with decentralized key management and show how it is suitable for preventing a special form of re-identification attack that we name the re-assembly attack. This architecture is able to use current infrastructure, from mobile phones to server certificates and cloud-based decentralized storage models, in an efficient way to provide a reliable model for the communication of medical data. We encompass entities including patients, doctors, insurance agents, emergency contacts, researchers, medical test laboratories, and technicians. This is a complete architecture that provides patients with a good level of privacy, secure communication, and more direct control.

Abstract:
This project addresses the issues associated with providing a decentralized data offloading service to HPC centers. HPC centers are high-performance computing centers that use parallel processing to run advanced applications more reliably and efficiently. The main concept of this project is the offloading of data from an HPC center to a destination site in decentralized mode. Decentralization makes it possible for the end user to retrieve the data even after the center logs out. This is achieved by moving the data from the center to scratch space; from scratch space the data is moved through intermediate storage nodes 1..n, and from the nth node the data is transferred to the destination site within a deadline. These techniques are implemented within a production job scheduler, which schedules the jobs, and the BitTorrent tool is used for data transfer in a decentralized environment. Thus the total offloading times are minimized, and data loss and offload delays are also prevented.

Abstract:
The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. Many types of statistical analyses used in practice rely on the assumption of multivariate normality (the Gaussian model). For these analyses, maintaining the mean vector and covariance matrix of the masked data to be the same as those of the original data implies that if the masked data is analyzed using these techniques, the results will be the same as those obtained using the original data. For numerical confidential data, a recently proposed perturbation method makes it possible to maintain the mean vector and covariance matrix of the masked data to be exactly the same as those of the original data. However, as currently proposed, the perturbed values from this method are considered synthetic because they are generated without considering the values of the confidential variables (and are based only on the non-confidential variables). Some researchers argue that synthetic data results in information loss. In this study, we provide a new methodology for generating non-synthetic perturbed data that maintains the mean vector and covariance matrix of the masked data exactly the same as those of the original data while offering a selectable degree of similarity between the original and perturbed data.
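A minimal sketch of exact moment matching, not the paper's specific method: whatever perturbation is applied, the masked data can afterwards be whitened with its own Cholesky factor and re-colored with the original data's, forcing the sample mean vector and covariance matrix to match exactly. Any Gaussian-model analysis that depends only on these two statistics then gives identical results on original and masked data.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([1.0, -2.0, 0.5],
                            [[2.0, 0.3, 0.1],
                             [0.3, 1.0, 0.2],
                             [0.1, 0.2, 1.5]], size=500)

# Step 1: perturb the confidential data (simple additive noise here; the
# cited method constructs a more careful, non-synthetic perturbation).
Y = X + 0.5 * rng.standard_normal(X.shape)

# Step 2: re-standardize Y so its sample mean vector and covariance matrix
# match those of X exactly: whiten with Y's Cholesky factor, re-color with X's.
Lx = np.linalg.cholesky(np.cov(X, rowvar=False))
Ly = np.linalg.cholesky(np.cov(Y, rowvar=False))
Z = (Y - Y.mean(axis=0)) @ np.linalg.inv(Ly).T @ Lx.T + X.mean(axis=0)

assert np.allclose(Z.mean(axis=0), X.mean(axis=0))
assert np.allclose(np.cov(Z, rowvar=False), np.cov(X, rowvar=False))
```

Because Z is derived from the perturbed Y record by record, each masked value remains tied to its original record, in the spirit of the non-synthetic perturbation discussed above.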

Abstract:
This paper deals with the analysis of critical observability and the design of observers for networks of Finite State Machines (FSMs). Critical observability is a property of FSMs that corresponds to the possibility of detecting immediately whether the current state of an FSM has reached a set of critical states modeling unsafe operations. This property is relevant in safety-critical applications, where the timely recovery from human errors and device failures is of primary importance in ensuring safety. A critical observer is an observer that detects on-line the occurrence of critical states. When a large-scale network of FSMs is considered, the construction of such an observer is prohibitive because of the large computational effort needed. In this paper we propose a decentralized architecture for critical observers of networks of FSMs, where on-line detection of critical states is performed by local critical observers, each associated with an FSM of the network. For the design of local observers, efficient algorithms based on on-the-fly techniques are provided. Further, we present results on model reduction of networks of FSMs, based on bisimulation equivalence preserving critical observability. The advantages of the proposed approach in terms of computational complexity are discussed and examples are offered.
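A minimal, hypothetical sketch of what a local critical observer does (a toy FSM and alarm rule of our own choosing, not the paper's construction): the observer tracks the set of states consistent with the observed event sequence and raises an alarm as soon as every consistent state is critical, i.e., as soon as the FSM has certainly entered an unsafe state.

```python
# Toy FSM: (state, event) -> next state. Partial transitions model events
# that are not enabled in a given state.
TRANSITIONS = {
    ("idle", "start"): "run",
    ("run", "fault"): "error",
    ("run", "stop"): "idle",
    ("error", "reset"): "idle",
}
CRITICAL = {"error"}       # states modeling unsafe operation

def observe(initial_states, events):
    """Yield (possible_states, alarm) after each observed event.

    The alarm fires when the set of states consistent with the observed
    sequence is non-empty and contained in the critical set.
    """
    current = set(initial_states)
    for e in events:
        current = {TRANSITIONS[(s, e)] for s in current if (s, e) in TRANSITIONS}
        yield current, bool(current) and current <= CRITICAL

trace = list(observe({"idle"}, ["start", "fault", "reset"]))
# The alarm fires immediately after "fault", when the only state consistent
# with the observations is the critical state "error".
```

In the decentralized architecture described above, one such observer would run per FSM of the network, each over its own local state space rather than over the (prohibitively large) product state space.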

Abstract:
This paper gives new foundations of quantum state reduction without appealing to the projection postulate for the probe measurement. For this purpose, the quantum Bayes principle is formulated as the most fundamental principle for determining the state of a quantum system, and the joint probability distribution for the outcomes of local successive measurements on a noninteracting entangled system is derived without assuming the projection postulate.