Abstract:
Information and uncertainty are closely related and extensively studied concepts in a number of scientific disciplines such as communication theory, probability theory, and statistics. Increasing the information arguably reduces the uncertainty about a given random quantity. Take the variance of a random variable as the measure of uncertainty. Given the information that its outcome lies in an interval, the uncertainty is expected to decrease when the interval shrinks. This proposition is not generally true. In this paper, we provide a necessary and sufficient condition for this proposition to hold when the random variable is absolutely continuous or integer valued. We also give a similar result on Shannon information.
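A small numerical illustration of why the proposition can fail, written as a minimal sketch: the three-point distribution and the helper name `conditional_variance` are assumptions chosen for illustration and do not come from the paper. The variance is computed conditionally on the outcome lying in an interval, and the interval is then shrunk.

```python
from fractions import Fraction


def conditional_variance(pmf, lo, hi):
    """Variance of X given lo <= X <= hi, for an integer-valued pmf (dict value -> prob)."""
    inside = {x: p for x, p in pmf.items() if lo <= x <= hi}
    total = sum(inside.values())
    if total == 0:
        raise ValueError("the interval has probability zero")
    mean = sum(x * p for x, p in inside.items()) / total
    second_moment = sum(x * x * p for x, p in inside.items()) / total
    return second_moment - mean ** 2


# Hypothetical integer-valued distribution with almost all mass at 0.
pmf = {0: Fraction(98, 100), 1: Fraction(1, 100), 2: Fraction(1, 100)}

wide = conditional_variance(pmf, 0, 2)    # condition on the interval [0, 2]
narrow = conditional_variance(pmf, 1, 2)  # condition on the smaller interval [1, 2]

print("variance on [0, 2]:", wide)
print("variance on [1, 2]:", narrow)
```

Here the conditional variance grows from 491/10000 to 1/4 when the interval shrinks from [0, 2] to [1, 2], because removing the dominant point at 0 leaves two equally likely outcomes.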

English sentences carry far more feature data and more complex pronunciation changes than isolated words, which aggravates known problems of the Hidden Markov Model (HMM), such as the computational complexity of the Viterbi algorithm and of evaluating Gaussian mixture distribution probabilities. This article explores a segment-mean algorithm for reducing the dimensionality of speech feature parameters, together with a clustering cross-grouping algorithm and an HMM grouping algorithm, all proposed for implementing a speaker-independent English sentence recognition system based on HMM and clustering. The experimental results show that, compared with a single HMM, the approach improves both the recognition rate and the recognition speed of the system.
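The abstract does not spell out the segment-mean algorithm, so the following is only a sketch of one plausible reading: the frame sequence is cut into a fixed number of contiguous segments and each segment is replaced by the mean of its feature vectors. The function name `segment_mean`, the equal-length split, and the toy data are all assumptions, not the authors' implementation.

```python
def segment_mean(frames, num_segments):
    """Reduce a list of feature vectors to num_segments vectors by averaging
    the frames inside each contiguous, nearly equal-length segment."""
    n = len(frames)
    if not 0 < num_segments <= n:
        raise ValueError("num_segments must be between 1 and the number of frames")
    dim = len(frames[0])
    reduced = []
    for k in range(num_segments):
        # Integer boundaries keep segments contiguous and non-empty.
        start = k * n // num_segments
        end = (k + 1) * n // num_segments
        segment = frames[start:end]
        reduced.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return reduced


# Toy input: four frames with two features each, reduced to two segments.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
reduced = segment_mean(frames, 2)
print(reduced)
```

Whatever the utterance length, the output has a fixed number of vectors, which is one way such a reduction could lower the cost of later Viterbi decoding.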

Abstract:
Multivariate normal mixtures provide a flexible model for high-dimensional data. They are widely used in statistical genetics, statistical finance, and other disciplines. Due to the unboundedness of the likelihood function, classical likelihood-based methods, which may have nice practical properties, are inconsistent. In this paper, we recommend a penalized likelihood method for estimating the mixing distribution. We show that the maximum penalized likelihood estimator is strongly consistent when the number of components has a known upper bound. We also explore a convenient EM-algorithm for computing the maximum penalized likelihood estimator. Extensive simulations are conducted to explore the effectiveness and the practical limitations of both the new method and the ratified maximum likelihood estimators. Guidelines are provided based on the simulation results.
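The unboundedness of the likelihood can be seen numerically. The sketch below makes simplifying assumptions that are not in the paper: a univariate two-component mixture with equal weights, made-up data, one component mean fixed at the first observation, and the other component held at unit variance. It shows only the divergence, not the penalized method itself.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, mu1, sigma1, mu2, sigma2, weight=0.5):
    """Log-likelihood of weight * N(mu1, sigma1^2) + (1 - weight) * N(mu2, sigma2^2)."""
    return sum(
        math.log(weight * normal_pdf(x, mu1, sigma1)
                 + (1.0 - weight) * normal_pdf(x, mu2, sigma2))
        for x in data
    )


# Hypothetical sample; the values are illustrative only.
data = [-1.2, -0.4, 0.3, 0.9, 2.1]
mu2 = sum(data) / len(data)

# Centre the first component on one observation and let its scale shrink.
sigmas = [1.0, 1e-2, 1e-4, 1e-8]
logliks = [mixture_loglik(data, data[0], s, mu2, 1.0) for s in sigmas]

for s, ll in zip(sigmas, logliks):
    print(f"sigma1 = {s:g}: log-likelihood = {ll:.3f}")
```

The log-likelihood keeps increasing as `sigma1` shrinks, so no finite maximizer exists without a penalty or a constraint on the component variance.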

Abstract:
Normal mixture distributions are arguably the most important mixture models, and also the most technically challenging. The likelihood function of the normal mixture model is unbounded based on a set of random samples, unless an artificial bound is placed on its component variance parameter. Moreover, the model is not strongly identifiable, so it is hard to differentiate between overdispersion caused by the presence of a mixture and that caused by a large variance, and it has infinite Fisher information with respect to mixing proportions. There has been extensive research on finite normal mixture models, but much of it addresses merely consistency of the point estimation or useful practical procedures, and many results require undesirable restrictions on the parameter space. We show that an EM-test for homogeneity is effective at overcoming many challenges in the context of finite normal mixtures. We find that the limiting distribution of the EM-test is a simple function of the $0.5\chi^2_0+0.5\chi^2_1$ and $\chi^2_1$ distributions when the mixing variances are equal but unknown, and the $\chi^2_2$ distribution when the variances are unequal and unknown. Simulations show that the limiting distributions approximate the finite sample distribution satisfactorily. Two genetic examples are used to illustrate the application of the EM-test.

Abstract:
Population quantiles and their functions are important parameters in many applications. For example, the lower quantiles often serve as crucial quality indices for forestry products. Given several independent samples from populations satisfying the density ratio model, we investigate the properties of empirical likelihood (EL) based inferences. The induced EL quantile estimators are shown to admit a Bahadur representation that leads to asymptotically valid confidence intervals for functions of quantiles. We rigorously prove that EL quantiles based on all the samples are more efficient than empirical quantiles based on individual samples. A simulation study shows that the EL quantiles and their functions have superior performance when the density ratio model assumption is satisfied and when it is mildly violated. An example is used to demonstrate the new method and the potential cost savings.
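For reference, the density ratio model is commonly written in the form below. The notation is assumed here rather than taken from the abstract: $G_0$ is a baseline distribution, $G_1, \dots, G_m$ are the other population distributions, $q(x)$ is a prespecified basis function, and $\alpha_k$, $\beta_k$ are unknown parameters.

$$
\mathrm{d}G_k(x) = \exp\{\alpha_k + \beta_k^{\top} q(x)\}\,\mathrm{d}G_0(x), \qquad k = 1, \dots, m.
$$

Under this form every sample carries information about the shared baseline $G_0$, which is what allows the samples to be pooled when estimating quantiles.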

Abstract:
Empirical likelihood is a popular nonparametric or semi-parametric statistical method with many nice statistical properties. Yet when the sample size is small, or the dimension of the accompanying estimating function is high, the application of the empirical likelihood method can be hindered by low precision of the chi-square approximation and by nonexistence of solutions to the estimating equations. In this paper, we show that the adjusted empirical likelihood is effective at addressing both problems. With a specific level of adjustment, the adjusted empirical likelihood achieves the high-order precision of the Bartlett correction, in addition to the advantage of a guaranteed solution to the estimating equations. Simulation results indicate that the confidence regions constructed by the adjusted empirical likelihood have coverage probabilities comparable to or substantially more accurate than the original empirical likelihood enhanced by the Bartlett correction.

Abstract:
Finite mixtures of von Mises distributions in both mean direction and concentration parameters are widely used in many disciplines, including astronomy, biology, ecology, geology and medicine. It is well known that the likelihood function is unbounded for any sample size. Hence, the ordinary maximum likelihood estimator (MLE) is not consistent. As with normal mixtures in both mean and variance parameters, this drawback of the MLE can be removed by adding a penalty function to the log-likelihood function or by putting constraints on the component concentration parameters (Chen et al., 2006 and Tan et al., 2006). In this paper, we prove that both the penalized maximum likelihood estimator (PMLE) and the constrained maximum likelihood estimator are asymptotically consistent and efficient. The finite sample performance of the penalized MLE and the constrained MLE is compared with that of the moment estimator (Spur and Koutbeiy, 1991) through simulations. The PMLE is found to have the best performance in terms of mean square error. A real data example is used to illustrate the proposed methods.

Abstract:
This paper presents a hypothesis testing method given independent samples from a number of connected populations. The method is motivated by a forestry project for monitoring change in the strength of lumber. Traditional practice has been built upon nonparametric methods which ignore the fact that these populations are connected. By pooling the information in multiple samples through a density ratio model, the proposed empirical likelihood method leads to more efficient inference and therefore reduces the cost in applications. The new test has a classical chi-square null limiting distribution. Its power function is obtained under a class of local alternatives. The local power is found to increase even when some underlying populations are unrelated to the hypothesis of interest. Simulation studies confirm that this test has better power properties than potential competitors, and is robust to model misspecification. An application example to lumber strength is included.

Abstract:
Selenium (Se) is a trace element required for normal body function. Supplementing the human diet with Se at the standard optimum amount prevents oxidative damage in cells and could be a viable way to prevent diseases related to DNA damage, including cancer, neurodegenerative diseases and aging. While the anticancer properties of Se have been linked to its ability to remove excess Reactive Oxygen Species (ROS) in cells, the underlying molecular mechanism remains unknown. Recent studies have shown that the removal of ROS alone cannot account for these properties. To understand their molecular basis, current research focuses on the metabolism of Se in the cell, especially Se-containing amino acids. Selenocysteine (Sec) is a novel amino acid and one of the selenium-containing compounds in the cell. It is essential for maintaining the integrity of its parent proteins, which include enzymes such as Glutathione Peroxidases (GPXs) and Thioredoxin Reductases (TrXs). We propose in this study that overproduction of Sec, via overexpression of the Selenocysteine synthase (SecS) gene together with Se supplementation, induces cell death in Prostate Carcinoma (PC-3) cells. Although the mechanism underlying the induction of cell death is unknown, we propose that it could be due to the random incorporation of Sec into proteins at high concentration, causing premature protein degradation and cell death. The outcome of this study suggests that increasing the concentration of intracellular Se-containing amino acids may have important clinical implications for the treatment of cancer.

Abstract:
The paper presents a novel pressure sensor based on a silicon nitride (SiNx) nanocoated long-period grating (LPG). The high-temperature, radio-frequency plasma-enhanced chemical-vapor-deposited (RF PECVD) SiNx nanocoating was applied to tune the sensitivity of the LPG to the external refractive index. The technique allows for deposition of good quality, hard and wear-resistant nanofilms as required for optical sensors. Thanks to the SiNx nanocoating it is possible to overcome a limitation of working in the external-refractive-index range, which for a bare fiber cannot be close to that of the cladding. The nanocoated LPG-based sensing structure we developed is functional in high-refractive-index liquids (nD > 1.46) such as oil or gasoline, with pressure sensitivity as high as when water is used as a working liquid. The nanocoating developed for this experiment not only has the highest refractive index ever achieved in LPGs (n > 2.2 at λ = 1,550 nm), but is also the thinnest (