Guarding from Spurious Discoveries in High Dimension
Jianqing Fan,Wen-Xin Zhou
Statistics , 2015,
Abstract: Many data-mining and statistical machine learning algorithms have been developed to select a subset of covariates to associate with a response variable. Spurious discoveries can easily arise in high-dimensional data analysis because of the enormous number of possible selections. How can we know statistically that our discoveries are better than those arising by chance? In this paper, we define a measure of goodness of spurious fit, which shows how well a response variable can be fitted by an optimally selected subset of covariates under the null model, and propose a simple and effective LAMM algorithm to compute it. It coincides with the maximum spurious correlation for linear models and can be regarded as a generalized maximum spurious correlation. We derive the asymptotic distribution of this goodness of spurious fit for generalized linear models and $L_1$-regression. The asymptotic distribution depends on the sample size, ambient dimension, the number of variables used in the fit, and the covariance information. It can be consistently estimated by multiplier bootstrapping and used as a benchmark to guard against spurious discoveries. It can also be applied to model selection, which then considers only candidate models whose goodness of fit is better than that of spurious fits. The theory and method are illustrated by simulated examples and an application to the binary outcomes from German Neuroblastoma Trials.
New Insight into the Graphene Based Films Prepared from Carbon Fibers
Yan-Xiang Wang, Wen-Xin Fan, Guo-Li Wang, Min-Xia Ji
Materials Sciences and Applications (MSA) , 2011, DOI: 10.4236/msa.2011.27113
Abstract: In this work, ultrathin sections of longitudinal polyacrylonitrile (PAN) based T700 and T300 carbon fibers were prepared by ultramicrotomy, and promising graphene-based thin films were obtained in one step at ambient temperature. In the thin films prepared from T700 carbon fibers, the network-graphene planes of carbon atoms are partly straight and partly twisted, and the distance between the carbon atoms of the network-graphene plane decreases. The graphene in these films is more densely ordered, shows a preferred orientation along the drawing direction, and is more consistent between neighboring graphene-based planes. Moreover, the relative content of sp2-hybridized carbon atoms in the films prepared from T700 carbon fibers is higher. In other words, the graphene-based film prepared from carbon fibers is verified not to have the characteristic skin-core structure.
Are Discoveries Spurious? Distributions of Maximum Spurious Correlations and Their Applications
Jianqing Fan,Qi-Man Shao,Wen-Xin Zhou
Statistics , 2015,
Abstract: Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries made by such data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions on the exogeneity of covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable $Y$ with the best $s$ linear combinations of $p$ covariates $\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent. When the covariance matrix of $\mathbf{X}$ possesses the restricted eigenvalue property, we derive such distributions for both finite $s$ and diverging $s$, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of $\mathbf{X}$. Hence, we propose a multiplier bootstrap method to approximate the unknown distributions and establish the consistency of this simple bootstrap approach. The results are further extended to the situation where residuals are from regularized fits. Our approach is then applied to constructing the upper confidence limit for the maximum spurious correlation and to testing the exogeneity of covariates. The former provides a baseline for guarding against false discoveries due to data mining, and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated by numerical examples.
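The idea of a maximum spurious correlation can be illustrated with a small simulation. The sketch below is not the authors' method: it covers only the simplest case $s = 1$, assumes independent standard normal covariates and response, and uses plain Monte Carlo rather than the multiplier bootstrap described in the abstract. The function names `max_spurious_correlation` and `simulate` are hypothetical.

```python
import numpy as np

def max_spurious_correlation(n, p, rng):
    """Largest absolute sample correlation between Y and any single column
    of X when X (n x p) and Y (length n) are generated independently.
    Assumption for illustration: standard normal data, s = 1."""
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    Xc = X - X.mean(axis=0)          # center each covariate
    yc = y - y.mean()                # center the response
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return float(np.max(np.abs(corr)))

def simulate(n, p, reps=200, seed=0):
    """Monte Carlo draws of the maximum spurious correlation under the null."""
    rng = np.random.default_rng(seed)
    return np.array([max_spurious_correlation(n, p, rng) for _ in range(reps)])

if __name__ == "__main__":
    for p in (10, 100, 1000):
        draws = simulate(n=50, p=p)
        # The upper quantile acts as a simulated benchmark: an observed
        # correlation below it is not distinguishable from chance here.
        print(p, round(float(draws.mean()), 3), round(float(np.quantile(draws, 0.95)), 3))
```

With the sample size held fixed, the simulated maximum grows as $p$ increases, which is why a correlation that looks large in isolation may still be consistent with pure chance.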
Embedded microprocessor designed for VxWorks
QU Wen-xin,FAN Xiao-ya,HU Ying
Application Research of Computers (计算机应用研究) , 2007,
Abstract: This paper presents the design of an embedded microprocessor for VxWorks. It first analyzes the VxWorks operating system and introduces the design of the embedded SoC (system on chip) system. It then describes the architecture of the Longtium R2 microprocessor and discusses the SoC system in detail. Finally, it reports simulation and synthesis results.
Research Progress in Memory Technique of the Multi-Core and Multi-Thread Processor
QU Wen-Xin,FAN Xiao-Ya,HANG Sheng-Bing
Computer Science (计算机科学) , 2007,
Abstract: Multi-core and multi-thread techniques have become the trend in microprocessor research because they can improve microprocessor performance. At the same time, however, they put more pressure on the memory system, and memory access latency has become an obstacle to further performance gains. This paper first discusses some common architectures of multi-core and multi-thread processors, then reviews current research on such processors, and finally, on that basis, outlines possible directions for their memory techniques.
The Research into and Realization of the Scientific Feed Management System
REN Rong,BAO Wen-xin
Journal of Chongqing Normal University , 2009,
Abstract: To meet the need for rational, scientific farming of specialty livestock such as dairy cows, beef cattle, and meat sheep, the paper establishes a database of feed composition and ratios for specialty animals in Ningxia. The database is deployed on platforms shared across the whole region, such as feed formula service management and business processing platforms. Application results show that the system improves the scientific feed formulas for dairy cows and enhances the utilization of raw materials, thereby meeting the expected target of reducing cost and increasing benefit. The paper introduces the Islamic Livestock Norm Production and Quality Attestation System in the Electronic Agriculture Platform of West National Region and also discusses the design process and realization of the Science Feed Management System, which provides information technology support for raising farming management levels and livestock production efficiency.
Bose-Einstein condensation in an optical lattice
P. B. Blakie,Wen-Xin Wang
Physics , 2007, DOI: 10.1103/PhysRevA.76.053620
Abstract: In this paper we develop an analytic expression for the critical temperature for a gas of ideal bosons in a combined harmonic lattice potential, relevant to current experiments using optical lattices. We give corrections to the critical temperature arising from effective mass modifications of the low energy spectrum, finite size effects and excited band states. We compute the critical temperature using numerical methods and compare to our analytic result. We study condensation in an optical lattice over a wide parameter regime and demonstrate that the critical temperature can be increased or reduced relative to the purely harmonic case by adjusting the harmonic trap frequency. We show that a simple numerical procedure based on a piecewise analytic density of states provides an accurate prediction for the critical temperature.
Matrix Completion via Max-Norm Constrained Optimization
T. Tony Cai,Wen-Xin Zhou
Computer Science , 2013,
Abstract: Matrix completion has been well studied under the uniform sampling model and the trace-norm regularized methods perform well both theoretically and numerically in such a setting. However, the uniform sampling model is unrealistic for a range of applications and the standard trace-norm relaxation can behave very poorly when the sampling distribution is non-uniform. In this paper we propose and analyze a max-norm constrained empirical risk minimization method for noisy matrix completion under a general sampling model. The optimal rate of convergence is established under the Frobenius norm loss in the context of approximately low-rank matrix reconstruction. It is shown that the max-norm constrained method is minimax rate-optimal and it yields a unified and robust approximate recovery guarantee, with respect to the sampling distributions. The computational effectiveness of this method is also studied, based on a first-order algorithm for solving convex programs involving a max-norm constraint.
A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
T. Tony Cai,Wen-Xin Zhou
Statistics , 2013,
Abstract: We consider in this paper the problem of noisy 1-bit matrix completion under a general non-uniform sampling distribution using the max-norm as a convex relaxation for the rank. A max-norm constrained maximum likelihood estimate is introduced and studied. The rate of convergence for the estimate is obtained. Information-theoretical methods are used to establish a minimax lower bound under the general sampling model. The minimax upper and lower bounds together yield the optimal rate of convergence for the Frobenius norm loss. Computational algorithms and numerical performance are also discussed.
Distribution of Typical Organic Pollutants in Benthic Mussels from the Inshore Areas of Yellow Sea
LIU Wen-xin,HU Jing,CHEN Jiang-lin,FAN Yong-sheng,TAO Shu
Environmental Science (环境科学) , 2008,
Abstract: Based on the second national baseline survey on marine pollution, the concentration, distribution and potential ecological risk of typical organic pollutants in benthic mussels from the inshore areas of the Yellow Sea were determined. The results indicated that, at over 35% of the sampling sites, the tissue concentrations of petroleum hydrocarbons were higher than Category I (15000 ng/g) of the national marine biological quality standards, and at the sites near Dalian Bay the tissue concentrations even exceeded Category II (50000 ng/g). At a few sites located in Dalian Bay, Weihai and Jiaozhou Bay, relatively high concentrations of polycyclic aromatic hydrocarbons (PAHs) and phthalate esters (PAEs) occurred, while the concentrations of PAHs and PAEs in mussels at most other sites were low. The dominance of medium- and high-ring components indicated pyrolytic processes as the main source of local PAHs, and DBP and DEHP were the major constituents of the phthalate esters. At all sites, the tissue concentrations of PCBs were generally low (< 10 ng/g). The sites with DDT concentrations above Category I of the national quality standards were mainly situated on the southern coasts of the Yellow Sea, and the predominant fractions were the metabolites of DDT, i.e., DDD and DDE. p,p'-DDT was detected in mussel species at all the monitoring sites, especially at the sites close to Dalian Bay and Penglai (fraction > 50%), which indicated potential inputs from the neighboring areas. Accordingly, petroleum hydrocarbons and PAHs in mussels were high in the coastal areas of Dalian Bay, Weihai and Jiaozhou Bay, and tissue DDT levels were high at the sites near Jiaozhou Bay and Haizhou Bay, so these sea areas had higher ecological risk. New inputs of DDT in the inshore areas of Dalian Bay and Penglai may threaten the benthic environment.
