The Turtleback Diagram for Conditional Probability
Donghui Yan, Gary E. Davis
Open Journal of Statistics (OJS), 2018, DOI: 10.4236/ojs.2018.84045
Abstract: We elaborate on an alternative to the usual tree diagram for representing conditional probability. We term the representation “turtleback diagram” for its resemblance to the pattern on turtle shells. Adopting the set-theoretic view of events and the sample space, the turtleback diagram uses elements from Venn diagrams (set intersection, complement and partition) for conditioning, with the additional notion that the area of a set indicates its probability, whereas a ratio of areas indicates a conditional probability. Once parts of the diagram are drawn and properly labeled, the calculation of conditional probability involves only simple arithmetic on the areas of the relevant sets. We discuss turtleback diagrams in relation to other visual representations of conditional probability, and detail several scenarios in which turtleback diagrams prove useful. By the equivalence of recursive space partition and the tree, the turtleback diagram is seen to be as expressive as the tree diagram for abstract concepts. We also provide empirical data on the use of turtleback diagrams with undergraduate students in elementary statistics or probability courses.
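The area arithmetic described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the four cell areas are hypothetical numbers, not data from the paper, and the helper name `conditional` is invented here.

```python
from fractions import Fraction

# Hypothetical areas (probabilities) of the four cells obtained by
# partitioning the sample space into B / not B and intersecting with A.
# These numbers are illustrative only.
areas = {
    ("A", "B"): Fraction(3, 20),
    ("A", "not B"): Fraction(1, 10),
    ("not A", "B"): Fraction(1, 4),
    ("not A", "not B"): Fraction(1, 2),
}


def conditional(areas, event, given):
    """Return P(event | given) as area(event and given) / area(given)."""
    # Area of the conditioning region: sum of all cells lying inside it.
    given_area = sum(a for (_, g), a in areas.items() if g == given)
    # Area of the part of the event that lies inside the conditioning region.
    joint_area = areas[(event, given)]
    return joint_area / given_area


p_a_given_b = conditional(areas, "A", "B")
print(p_a_given_b)
```

With these areas, the region B has area 3/20 + 1/4 = 2/5, so the ratio for A given B is (3/20) / (2/5) = 3/8.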
Cluster Forests
Donghui Yan, Aiyou Chen, Michael I. Jordan
Computer Science, 2011, DOI: 10.1016/j.csda.2013.04.010
Abstract: With inspiration from Random Forests (RF) in the context of classification, a new clustering ensemble method, Cluster Forests (CF), is proposed. Geometrically, CF randomly probes a high-dimensional data cloud to obtain "good local clusterings" and then aggregates via spectral clustering to obtain cluster assignments for the whole dataset. The search for good local clusterings is guided by a cluster quality measure kappa. CF progressively improves each local clustering in a fashion that resembles the tree growth in RF. Empirical studies on several real-world datasets under two different performance metrics show that CF compares favorably to its competitors. Theoretical analysis reveals that the kappa measure makes it possible to grow the local clustering in a desirable way: it is "noise-resistant". A closed-form expression is obtained for the mis-clustering rate of spectral clustering under a perturbation model, which yields new insights into some aspects of spectral clustering.
Clinical characteristics and risk factors of 27 liver failure patients complicated by invasive fungal infections
Longfeng Jiang, Jun Li, Yaping Han, Yuan Liu, Youde Yan, Nian Chen, Li Dong, Donghui Zhou, Ruiyun Wang
Health (Health), 2011, DOI: 10.4236/health.2011.31008
Abstract: To investigate the clinical features, risk factors and treatment outcomes in patients with liver failure complicated by invasive fungal infections, a retrospective analysis of the clinical data and related factors of 27 such patients was performed. These patients were admitted to our department from January 2007 to August 2009. Among them, Candida albicans accounted for 17 cases (54.84%) and Candida tropicalis for 4 cases (12.90%). Fungal infections of the respiratory tract and alimentary tract accounted for 58.06% and 11%, respectively. 81.25% of the patients had fever fluctuating from 37.4°C to 40°C, and 81.25% had elevated white blood cell counts. All had received broad-spectrum antibiotics, and some had received corticosteroids and undergone invasive medical procedures. Most patients deteriorated after invasive fungal infection. 21 patients received antifungal drug treatment, and the mortality rate was 63.00%. The likelihood of invasive fungal infection was found to be significantly increased in patients with liver failure. To prevent invasive fungal infection, liver failure should be treated promptly, antibiotics used appropriately, corticosteroids used cautiously or avoided, and invasive medical procedures reduced. Early detection and treatment of fungal infection are vital to decreasing the mortality rate.
Generation of pancreatic islet cells from human embryonic stem cells
DongHui Zhang, Wei Jiang, Yan Shi, HongKui Deng
Science China Life Sciences, 2009, DOI: 10.1007/s11427-009-0095-3
Abstract: Efficiently obtaining functional pancreatic islet cells derived from human embryonic stem (hES) cells not only offers great potential to solve the shortage of islet sources for type I diabetes cell therapy, but also benefits the study of human pancreas development and diabetes pathology. In 2001, hES cells were reported to have the capacity to generate insulin-producing cells by spontaneous differentiation in vitro. Since then, many strategies (such as overexpression of key transcription factors, delivery of key proteins for pancreatic development, co-transplantation of differentiated hES cells along with fetal pancreas, and stepwise differentiation by mimicking in vivo pancreatic development) have been employed to induce the differentiation of pancreatic islet cells from hES cells. Moreover, patient-specific induced pluripotent stem (iPS) cells can be generated by reprogramming somatic cells. iPS cells have characteristics similar to those of ES cells and offer a new cell source for type I diabetes cell therapy that reduces the risk of immunologic rejection. In this review, we summarize the recent progress made in the differentiation of hES and iPS cells into functional pancreatic islet cells and discuss the challenges for their future study.
Statistical methods for tissue array images - algorithmic scoring and co-training
Donghui Yan, Pei Wang, Michael Linden, Beatrice Knudsen, Timothy Randolph
Computer Science, 2011, DOI: 10.1214/12-AOAS543
Abstract: Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high throughput analysis tool, particularly for the validation of diagnostic and prognostic biomarkers. However, as study size grows, the manual evaluation of these assays becomes a prohibitive limitation; it vastly reduces throughput and greatly increases variability and expense. We propose an algorithm, Tissue Array Co-Occurrence Matrix Analysis (TACOMA), for quantifying cellular phenotypes based on textural regularity summarized by local inter-pixel relationships. The algorithm can be easily trained for any staining pattern, has no sensitive tuning parameters, and can report the salient pixels in an image that contribute to its score. Pathologists' input via informative training patches is an important aspect of the algorithm that allows training for any specific marker or cell type. With co-training, the error rate of TACOMA can be reduced substantially for a very small training sample (e.g., with size 30). We give theoretical insights into the success of co-training via thinning of the feature set in a high-dimensional setting when there is "sufficient" redundancy among the features. TACOMA is flexible, transparent and provides a scoring process that can be evaluated with clarity and confidence. In a study based on an estrogen receptor (ER) marker, we show that TACOMA is comparable to, or outperforms, pathologists' performance in terms of accuracy and repeatability.
Classification under Data Contamination with Application to Remote Sensing Image Mis-registration
Donghui Yan, Peng Gong, Aiyou Chen, Liheng Zhong
Computer Science, 2011
Abstract: This work is motivated by the problem of image mis-registration in remote sensing, and we are interested in determining the resulting loss in the accuracy of pattern classification. A statistical formulation is given in which we propose to use data contamination to model and understand the phenomenon of image mis-registration. This model is widely applicable to many other types of errors as well, for example, measurement errors and gross errors. The impact of data contamination on classification is studied under a statistical learning theoretical framework. A closed-form asymptotic bound is established for the resulting loss in classification accuracy, which is less than $\epsilon/(1-\epsilon)$ for data contamination of an amount of $\epsilon$. Our bound is sharper than similar bounds in the domain adaptation literature and, unlike such bounds, it applies to classifiers with an infinite Vapnik-Chervonenkis (VC) dimension. Extensive simulations have been conducted on both synthetic and real datasets under various types of data contamination, including label flipping, feature swapping and the replacement of feature values with data generated from a random source such as a Gaussian or Cauchy distribution. Our simulation results show that the bound we derive is fairly tight.
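To give a feel for the size of the quantity $\epsilon/(1-\epsilon)$ quoted in this abstract, the short sketch below evaluates it for a few contamination levels. The function name `contamination_bound` and the sample values of $\epsilon$ are assumptions made for illustration; the abstract does not state the precise conditions under which the asymptotic bound holds, so this only tabulates the expression itself.

```python
def contamination_bound(epsilon: float) -> float:
    """Evaluate epsilon / (1 - epsilon) for a contamination level in [0, 1)."""
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must lie in [0, 1)")
    return epsilon / (1.0 - epsilon)


# Illustrative contamination levels (not taken from the paper).
for eps in (0.01, 0.05, 0.10, 0.20):
    print(f"epsilon={eps:.2f}  epsilon/(1-epsilon)={contamination_bound(eps):.4f}")
```

For small $\epsilon$ the expression is close to $\epsilon$ itself, and it grows faster as the contamination level rises.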
Consistent Condom Use Increases the Colonization of Lactobacillus crispatus in the Vagina
Liyan Ma, Zhi Lv, Jianrong Su, Jianjie Wang, Donghui Yan, Jingjuan Wei, Shuang Pei
PLOS ONE, 2013, DOI: 10.1371/journal.pone.0070716
Abstract: Background: Non-hormonal contraception methods have been widely used, but their effects on colonization by vaginal lactobacilli remain unclear. Objective: To determine the association between non-hormonal contraception methods and vaginal lactobacilli on women’s reproductive health. Methods: The cross-sectional study included 164 healthy women aged 18–45 years. The subjects were divided into groups on the basis of the non-hormonal contraception method they used. At the postmenstrual visit (day 21 or 22 of the menstrual cycle), vaginal swabs were collected for determination of Nugent score, quantitative culture and real-time polymerase chain reaction (PCR) of vaginal lactobacilli. The prevalence, colony counts and 16S rRNA gene expression of the Lactobacillus strains were compared between the groups by Chi-square and ANOVA statistical analysis methods. Results: A Nugent score of 0–3 was more common in the condom group (93.1%) than in the group that used an intrauterine device (IUD) (75.4%) (p = 0.005). The prevalence of H2O2-producing Lactobacillus was significantly higher in the condom group (82.3%) than in the IUD group (68.2%) (p = 0.016). There was a significant difference in colony count (mean ± standard error (SE), log10 colony-forming units (CFU)/ml) of H2O2-producing Lactobacillus between condom users (7.81±0.14) and IUD users (6.54±0.14) (p = 0.000). The 16S rRNA gene expression (mean ± SE, log10 copies/ml) of Lactobacillus crispatus was significantly higher in the condom group (8.09±0.16) than in the IUD group (6.03±0.18) (p = 0.000). Conclusion: Consistent condom use increases the colonization of Lactobacillus crispatus in the vagina and may protect against both bacterial vaginosis (BV) and human immunodeficiency virus (HIV).
On the nodal line of the second eigenfunction of the Laplacian over some concave domains in $\mathbb{R}^2$
Donghui Yang
Mathematics, 2010
Abstract: In this paper we prove that the nodal line $N$ of the second eigenfunction of the Laplacian over some simply connected concave domain $\Omega$ in $\mathbb{R}^2$ must intersect the boundary $\partial\Omega$ at exactly two points.
A new compact class of open sets under Hausdorff distance and shape optimization
Donghui Yang
Mathematics, 2010
Abstract: In this paper we obtain a new class of open sets and prove that the class is compact under the Hausdorff distance. We then prove the existence of solutions to some shape optimization problems for elliptic equations.
