Abstract:
Linguistic structures exhibit a rich array of global phenomena; however, commonly used Markov models are unable to describe these phenomena adequately due to their strong locality assumptions. We propose a novel hierarchical model for structured prediction over sequences and trees that exploits global context by conditioning each generation decision on an unbounded context of prior decisions. This builds on the success of Markov models but, by not imposing a fixed context bound, better represents global phenomena. To facilitate learning of this large, unbounded model, we use a hierarchical Pitman-Yor process prior, which provides a recursive form of smoothing. We propose prediction algorithms based on A* search and Markov chain Monte Carlo sampling. Empirical results demonstrate the potential of our model relative to baseline finite-context Markov models on part-of-speech tagging and syntactic parsing.
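
The recursive smoothing idea can be illustrated with a much simpler cousin of the model: an interpolated back-off n-gram whose estimate at each context falls back to the next shorter context with a discount. This is a minimal sketch, not the paper's hierarchical Pitman-Yor model; the class name, the fixed discount, and the simplified type counts are all illustrative assumptions.

```python
from collections import defaultdict

# Toy illustration (not the paper's model): an n-gram predictor whose
# probability estimate recursively backs off to shorter contexts, in the
# spirit of hierarchical Pitman-Yor smoothing. `discount` plays the role
# of the PYP discount parameter; "type counts" are simplified to the
# number of distinct words observed after a context.
class BackoffModel:
    def __init__(self, order=3, discount=0.75):
        self.order, self.discount = order, discount
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> word -> count
        self.vocab = set()

    def train(self, words):
        for i, w in enumerate(words):
            self.vocab.add(w)
            for n in range(self.order):
                ctx = tuple(words[max(0, i - n):i])
                self.counts[ctx][w] += 1

    def prob(self, word, context):
        return self._prob(word, tuple(context[-(self.order - 1):]))

    def _prob(self, word, ctx):
        if not ctx:  # base case: interpolate with a uniform distribution
            base = 1.0 / max(len(self.vocab), 1)
        else:        # recursive case: back off to the shorter context
            base = self._prob(word, ctx[1:])
        c = self.counts.get(ctx, {})
        total = sum(c.values())
        if total == 0:
            return base
        d, types = self.discount, len(c)
        return max(c.get(word, 0) - d, 0.0) / total + (d * types / total) * base
```

Because the discounted mass at each level is redistributed through the shorter-context estimate, the probabilities at every context still sum to one over the vocabulary.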

Abstract:
Adolescence is a period of profound transformation. This study aimed to investigate the vulnerability and risks of adolescents with regard to STD/HIV/Aids in their various social contexts. An exploratory method with a cultural and ethnographic approach was used, carried out in an elementary and high school in Fortaleza, Ceará, between April and June 2007. Data were collected from 20 selected adolescents through the following instruments: observation, participant observation, a field diary, and focus groups. The students were divided into two focus groups according to gender, and five meetings of about 90 minutes each were held. Data were analyzed using Bardin's content analysis. The results show that social groups are essential in shaping adolescents' knowledge of STD/HIV/Aids; however, these spaces lack reflexive and constructive dialogue and are not fulfilling their proper roles. This problem, together with culturally rooted female submission, contributes to the risk and vulnerability to which adolescents are exposed. We conclude that society as a whole needs to reflect on these factors and recognize its role in controlling STD/HIV/Aids among adolescents and the general population, with a view to health promotion.

Abstract:
Let $(X,\mu)$ be a probability space, $G$ a countable amenable group and $(F_n)_n$ a left F\o lner sequence in $G$. This paper analyzes the non-conventional ergodic averages \[\frac{1}{|F_n|}\sum_{g \in F_n}\prod_{i=1}^d (f_i\circ T_1^g\cdots T_i^g)\] associated to a commuting tuple of $\mu$-preserving actions $T_1$, ..., $T_d:G\curvearrowright X$ and $f_1$, ..., $f_d \in L^\infty(\mu)$. We prove that these averages always converge in $\|\cdot\|_2$, and that they witness a multiple recurrence phenomenon when $f_1 = \ldots = f_d = 1_A$ for a non-negligible set $A\subseteq X$. This proves a conjecture of Bergelson, McCutcheon and Zhang. The proof relies on an adaptation from earlier works of the machinery of sated extensions.

Abstract:
Morphisms between (formal) contexts are certain pairs of maps, one between objects and one between attributes of the contexts in question. We study several classes of such morphisms and the connections between them. Among other things, we show that the category CLc of complete lattices with complete homomorphisms is (up to a natural isomorphism) a full reflective subcategory of the category of contexts with so-called conceptual morphisms; the reflector associates with each context its concept lattice. On the other hand, we obtain a dual adjunction between CLc and the category of contexts with so-called concept continuous morphisms. Suitable restrictions of the adjoint functors yield a categorical equivalence and a duality between purified contexts and doubly based lattices, and in particular, between reduced contexts and irreducibly bigenerated complete lattices. A central role is played by continuous maps between closure spaces and by adjoint maps between complete lattices.
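
The basic objects of the abstract, contexts and their concept lattices, can be made concrete with a tiny formal concept analysis computation: the two derivation operators and a brute-force enumeration of all formal concepts. The context and helper names below are illustrative assumptions, not drawn from the paper.

```python
# A minimal formal-concept computation for a small context, as a concrete
# backdrop for the abstract's notions (context, concept lattice). The
# toy context below is illustrative, not from the paper.

OBJECTS = {"duck", "frog", "dog"}
ATTRS = {"flies", "swims", "barks"}
INCIDENCE = {("duck", "flies"), ("duck", "swims"),
             ("frog", "swims"), ("dog", "barks")}

def intent(objs):
    """Attributes shared by every object in `objs` (derivation operator ')."""
    return {a for a in ATTRS if all((o, a) in INCIDENCE for o in objs)}

def extent(attrs):
    """Objects possessing every attribute in `attrs` (the dual derivation)."""
    return {o for o in OBJECTS if all((o, a) in INCIDENCE for a in attrs)}

def concepts():
    """All formal concepts (A, B) with A' = B and B' = A, by brute force."""
    found = []
    for bits in range(2 ** len(OBJECTS)):
        objs = {o for i, o in enumerate(sorted(OBJECTS)) if bits >> i & 1}
        b = intent(objs)
        if extent(b) == objs:          # objs is closed, so (objs, b) is a concept
            found.append((objs, b))
    return found
```

Ordered by inclusion of extents, these concepts form the complete lattice that the reflector of the abstract associates with a context.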

Abstract:
LightGBM is an open-source, distributed, high-performance gradient boosting framework developed by Microsoft. Its advantages include fast training speed, high parallel efficiency, and the ability to handle large-scale data. Based on the open Taiwan credit card data set, five data mining methods are compared in this paper: logistic regression, SVM, neural network, XGBoost, and LightGBM. The results show that LightGBM achieves the best AUC, F1-score, and prediction accuracy, with XGBoost second. This indicates that LightGBM and XGBoost perform well in the prediction of categorical response variables and have good application value in the big data era.
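
The comparison above is scored with AUC and F1. As a concrete reference for those metrics, here is a minimal, dependency-free computation of both from predicted scores; the example labels and scores are illustrative, not the paper's data.

```python
# AUC via the rank (Mann-Whitney) formulation: the fraction of
# positive/negative pairs in which the positive example scores higher,
# counting ties as half a win.
def auc(y_true, scores):
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# F1-score: harmonic mean of precision and recall on hard predictions.
def f1(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

AUC is threshold-free and so suits ranking-style credit scoring, while F1 depends on the chosen classification threshold; reporting both, as the paper does, covers both views.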

Abstract:
In this study, several radial basis function networks are compared with respect to their approximation ability in time series forecasting problems. Optimal values for the tested parameters are obtained through computer simulation runs. The effects of width selection in Gaussian kernels, of the number of neurons in the hidden layer, and of the choice of kernel function are investigated.
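
The quantities tuned in the study, kernel width and hidden-layer size, can be made concrete with a minimal Gaussian RBF network. The centers, weights, and width below are illustrative assumptions, not values from the study.

```python
import math

# One hidden neuron of a Gaussian RBF network: activation decays with
# distance from the neuron's center, at a rate set by `width`.
def gaussian_rbf(x, center, width):
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

# Network output: a weighted sum of hidden-layer activations. The number
# of (center, weight) pairs is the number of hidden neurons.
def rbf_forecast(x, centers, weights, width):
    return sum(w * gaussian_rbf(x, c, width) for c, w in zip(centers, weights))
```

A narrow width makes each neuron respond only near its center (risking overfitting between centers), while a wide width smooths responses across the input range, which is exactly the trade-off the width-selection experiments probe.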

Abstract:
MARS uses the mouse, rat, dog, opossum, chicken, and frog genome sequences as pairwise informant sources for Twinscan and combines the resulting transcript predictions into genes based on coding (CDS) region overlap. Based on the EGASP assessment, MARS is one of the more accurate dual-genome prediction programs. Compared to the GENCODE annotation, we find that predictive sensitivity increases, while specificity decreases, as more informant species are used. MARS correctly predicts alternatively spliced transcripts for 11 of the 236 multi-exon GENCODE genes that are alternatively spliced in the coding region of their transcripts. For these genes, a total of 24 correct transcripts are predicted.

The MARS algorithm is able to predict alternatively spliced transcripts without the use of expressed sequence information, although the number of loci in which multiple predicted transcripts match multiple alternatively spliced transcripts in the GENCODE annotation is relatively small.

Accurate prediction of protein-coding genes in mammals remains a challenging and active area of research [1]. In the past decade, the most important advance in de novo gene prediction came with the initial availability of extensive human and mouse genomic sequences. Several gene prediction algorithms were introduced at that time that improved gene prediction by using the specific patterns of evolutionary conservation that are indicative of protein-coding genes [2-4].

All of the dual-genome (category 4) gene finders participating in EGASP rely on alignments to one or more informant genome sequences. For predicting human genes, dual-genome gene prediction algorithms most often use the mouse genome sequence as a source of evolutionary conservation information. This was originally a consequence of the early availability, with respect to other mammals, of the mouse genome sequence [5-8]. However, as additional genomes were sequenced, it became apparent that the evolutionary divergence between human and
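
The combining step described above, grouping transcript predictions into genes by CDS overlap, can be sketched as a simple interval-clustering pass. This is an illustrative reconstruction of that step, not MARS code; the coordinates and tuple format are assumptions.

```python
# Group transcript predictions into gene clusters by coding-region
# overlap: transcripts whose CDS intervals overlap (directly or through
# a chain of overlaps) land in the same cluster. Illustrative only.
def cluster_by_overlap(transcripts):
    """`transcripts` is a list of (name, cds_start, cds_end) tuples."""
    genes = []
    for name, start, end in sorted(transcripts, key=lambda t: t[1]):
        if genes and start <= genes[-1]["end"]:     # overlaps current cluster
            genes[-1]["transcripts"].append(name)
            genes[-1]["end"] = max(genes[-1]["end"], end)
        else:                                       # start a new gene cluster
            genes.append({"transcripts": [name], "end": end})
    return [g["transcripts"] for g in genes]
```

Sorting by start coordinate makes a single left-to-right sweep sufficient, since any transcript overlapping an earlier cluster must start before that cluster's running right end.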

Abstract:
Objective: We examined whether a panel of SNPs, systematically selected from genome-wide association studies (GWAS), could improve risk prediction of coronary heart disease (CHD) over and above conventional risk factors. These SNPs have already demonstrated reproducible associations with CHD; here we examined their use in long-term risk prediction. Study Design and Setting: SNPs identified from meta-analyses of GWAS of CHD were tested in 840 men and women aged 55–75 from the Edinburgh Artery Study, a prospective, population-based study with 15 years of follow-up. Cox proportional hazards models were used to evaluate the addition of SNPs to conventional risk factors in the prediction of CHD risk. CHD was classified as myocardial infarction (MI), coronary intervention (angioplasty or coronary artery bypass surgery), angina, and/or unspecified ischaemic heart disease as a cause of death; additional analyses were limited to MI or coronary intervention. Model performance was assessed by changes in discrimination and net reclassification improvement (NRI). Results: There were significant improvements on addition of 27 SNPs to conventional risk factors for prediction of CHD (NRI of 54%, P<0.001; C-index 0.671 to 0.740, P = 0.001), as well as of MI or coronary intervention (NRI of 44%, P<0.001; C-index 0.717 to 0.750, P = 0.256). ROC curves showed that the addition of SNPs improved discrimination most where the sensitivity of conventional risk factors was low for prediction of MI or coronary intervention. Conclusion: There was significant improvement in risk prediction of CHD over 15 years when SNPs identified from GWAS were added to conventional risk factors. This effect may be particularly useful for identifying individuals with a low prognostic index who are in fact at greater risk of disease than indicated by conventional risk factors alone.
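
The headline gain above is reported as net reclassification improvement. For reference, here is a minimal, dependency-free NRI computation from old- and new-model risk categories (the sum of the event and non-event components); the example categories are illustrative, not the study's data.

```python
# Net reclassification improvement: among subjects with events, upward
# reclassification by the new model is credited and downward penalized;
# among subjects without events, the reverse. NRI is the sum of the two
# net proportions. Categories are ordinal risk strata (e.g. 0 = low).
def nri(old_cat, new_cat, events):
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for old, new, event in zip(old_cat, new_cat, events):
        if event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

An NRI of 0.54 (54%), as in the Results, means the net proportions of correctly reclassified events and non-events sum to 0.54.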

Abstract:
Nine different small SCFGs were implemented to explore the tradeoffs between model complexity and prediction accuracy. Each model was tested for single-sequence structure prediction accuracy on a benchmark set of RNA secondary structures.

Four SCFG designs had prediction accuracies near the performance of current energy minimization programs. One of these designs, introduced by Knudsen and Hein in their PFOLD algorithm, has only 21 free parameters and is significantly simpler than the others.

Many RNAs conserve a base-paired secondary structure that is important to their function [1,2]. Accurate RNA secondary structure predictions help in understanding an RNA's function, in identifying novel functional RNAs in genome sequences, and in recognizing evolutionarily related RNAs in other organisms. Most RNA secondary structure prediction algorithms are based on energy minimization [2-7]. Alternatively, probabilistic modeling approaches using stochastic context-free grammars (SCFGs) can be used [8-10]. A potential advantage of a probabilistic modeling approach is that it is more readily extended to include other sources of statistical information that constrain a structure prediction.

For example, an outstanding problem is consensus RNA secondary structure prediction for a small number of structurally homologous RNA sequences. Comparative sequence analysis is probably the most powerful source of information for RNA structure prediction [11-14]. Homologous RNAs tend to conserve a common base-paired secondary structure, and conserved base-pairing interactions are revealed by compensatory mutations in multiple RNA sequence alignments [12,15-19]. Comparative sequence analysis is extremely reliable and has produced strikingly accurate RNA structure predictions [14,20], but one is usually not blessed with the large number of sequences (nor the time and human expertise) that a purely comparative approach requires. There is a need for automated approaches that combine evolutionary
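
Both the energy minimization and SCFG approaches discussed above rest on the same nested dynamic programming over base pairs. As a concrete, much simpler relative of both, here is the classic Nussinov algorithm, which maximizes the number of complementary base pairs; it is an illustration of single-sequence structure prediction by dynamic programming, not one of the nine SCFGs tested.

```python
# Nussinov dynamic program: best[i][j] holds the maximum number of
# nested base pairs in seq[i..j], enforcing a minimum hairpin loop of
# `min_loop` unpaired bases. Canonical plus G-U wobble pairs allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov_pairs(seq, min_loop=3):
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            val = best[i][j - 1]                 # case 1: base j unpaired
            for k in range(i, j - min_loop):     # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    val = max(val, left + 1 + best[k + 1][j - 1])
            best[i][j] = val
    return best[0][n - 1]
```

An SCFG replaces this max-pairs objective with log-probabilities from grammar rules (and energy minimization with free-energy terms), but the same O(n^3) recursion over subsequences underlies all three.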