Abstract:
Next-generation sequencing (NGS) analysis, open-source software, cloud computing and wiki-style genomics were among the hot topics and discussions at the recent Genome Informatics meeting at the Wellcome Trust Genome Campus, Cambridge, UK. Here we summarize some highlights of the meeting.

Comparison of related genomes can generate a wealth of knowledge about genome evolution and function. Recent advances in NGS technologies have greatly increased the scale and scope with which we can interrogate novel genomes and uncover genetic variation. However, for variation detection and statistical analysis, there are false positive errors for various reasons, notably incompleteness of reference genomes, read mapping errors or limitations, and sequencing-induced features. Benjamin Dickins (Penn State University, University Park, USA) discussed an approach to estimate polymorphism accuracy from NGS data by deeply sequencing a small plasmid genome and comparing it with Sanger sequencing.

Elliott Margulies (National Human Genome Research Institute, Bethesda, USA) gave an enticing presentation on this topic entitled 'Analysis of identical twins' genomes reveals sources of false-positive variation detection'. With 55× and 50× depth of read coverage from each twin's sample, they initially identified 83,538 discordant genotype calls across 97.6% of the human reference genome. Through inspection of a random set of discordantly genotyped positions, he revealed that a majority occurred in regions with poorly aligned reads. Margulies noted that he would be highly suspicious of genotype calls in regions with high coverage but with low mapping scores. When these events were filtered out, the number dropped to 13,140, a reduction of 84%. By then further introducing other filtering mechanisms, such as incorrect alignments of short reads across indels, Q20 (99% confidence) evidence in the other twin and 10% allele frequency, Margulies' final number of discordant genotype calls was only in the r

Abstract:
Advances in genome sequencing are providing unprecedented resolution of rare and private variants. However, methods that assess the effect of these variants have relied predominantly on information within coding sequences. Assessing their impact in non-coding sequences remains a significant contemporary challenge. In this review, we highlight the role of regulatory variation as causative agents and modifiers of monogenic disorders. We further discuss how advances in functional genomics are now providing new opportunities to assess the impact of rare non-coding variants and their role in disease.

Abstract:
We introduce the notion of an ACF space, that is, a space for which a generalized version of M. Riesz's theorem for conjugate functions with values in the Banach space is bounded. We use transference to prove that spaces for which the Hilbert transform is bounded, i.e., $X\in\text{HT}$, are ACF spaces. We then show that Bourgain's proof of $X\in\text{HT}\implies X\in\text{UMD}$ is a consequence of this result.

Abstract:
Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 Genomes Project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 Genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.
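
To make the allele-specific expression idea concrete, the following is a minimal sketch of one common way to flag allelic imbalance at a heterozygous site: an exact two-sided binomial test of the reference and alternate read counts against an expected 50:50 ratio. This is an illustrative assumption, not the method used in the study; the function name `ase_binomial_pvalue` and the example counts are hypothetical.

```python
from math import comb


def ase_binomial_pvalue(ref_count, alt_count):
    """Exact two-sided binomial p-value for allelic imbalance.

    Null hypothesis (an assumption of this sketch): each read covering a
    heterozygous site comes from either allele with probability 0.5.
    """
    if ref_count < 0 or alt_count < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads, so no evidence of imbalance
    observed_imbalance = abs(ref_count - alt_count)
    # Under p = 0.5 the distribution is symmetric, so the two-sided p-value
    # sums every outcome at least as imbalanced as the observed one.
    tail = sum(
        comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= observed_imbalance
    )
    return tail / 2 ** n


if __name__ == "__main__":
    # Hypothetical read counts at three heterozygous sites.
    for ref, alt in [(5, 5), (8, 2), (10, 0)]:
        print(ref, alt, ase_binomial_pvalue(ref, alt))
```

In practice, reference mapping bias can shift the expected ratio away from 0.5, so the null proportion would normally be estimated from the data rather than fixed as it is here.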

Abstract:
Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires carefully accounting for factors that may introduce systematic, confounding variability in the expression measurements, resulting in spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches, and using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors, and performs as well as or better than existing approaches while having a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.
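
The simplest instance of a residual approach can be sketched as follows: regress one gene's expression values on a single known covariate (for example a batch indicator) by ordinary least squares and keep the residuals. This is not HCP and does not estimate hidden factors; it only illustrates, under that stated simplification, what "removing the effect of a known factor" means. The function name `residualize` and the example values are hypothetical.

```python
def residualize(expression, covariate):
    """Remove the linear effect of one known covariate from expression values.

    Fits expression = intercept + slope * covariate by ordinary least squares
    and returns the residuals, one per sample.
    """
    n = len(expression)
    if n == 0 or n != len(covariate):
        raise ValueError("expression and covariate must be non-empty and equal length")
    mean_x = sum(covariate) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, expression))
    # A constant covariate carries no information; only the mean is removed.
    slope = sxy / sxx if sxx > 0 else 0.0
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(covariate, expression)]


if __name__ == "__main__":
    # Hypothetical expression values for one gene across four samples,
    # with a known batch label per sample.
    expression = [1.0, 4.0, 2.0, 8.0]
    batch = [0.0, 1.0, 0.0, 1.0]
    print(residualize(expression, batch))
```

The residuals are uncorrelated with the covariate by construction, so any downstream association (such as a cis-eQTL test) is no longer driven by that known factor.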

Abstract:
RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2x75 bp and 2x262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcripts. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.

Abstract:
This paper gives another version of results due to Raugel and Sell, and similar results due to Moise, Temam and Ziane, that state the following: the solution of the Navier-Stokes equation on a thin three-dimensional domain with periodic boundary conditions has global regularity, as long as there is some control on the size of the initial data and the forcing term, where the control is larger than that obtainable via ``small data'' estimates. The approach taken is to consider the three-dimensional equation as a perturbation of the equation when the vector field does not depend upon the coordinate in the thin direction.

Abstract:
Suppose that $H(q,p)$ is a Hamiltonian on a manifold $M$, and $\tilde L(q,\dot q)$, the Rayleigh dissipation function, satisfies the same hypotheses as a Lagrangian on the manifold $M$. We provide a Hamiltonian framework that gives the equation $\dot q = \frac{\partial H}{\partial p}(q,p), \quad \dot p = - \frac{\partial H}{\partial q}(q,p) - \frac{\partial \tilde L}{\partial \dot q}(q,\dot q)$. The method is to embed $M$ into a larger framework where the motion drives a wave equation on the negative half line, where the energy in the wave represents heat being carried away from the motion. We obtain a version of N\"other's Theorem that is valid for dissipative systems. We also show that this framework fits the widely held view of how Hamiltonian dynamics can lead to the "arrow of time."
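
As a concrete illustration of the equations above (an assumed example, not taken from the paper), take $M = \mathbb{R}$, $H(q,p) = \frac{p^2}{2m} + V(q)$ and the quadratic Rayleigh function $\tilde L(q,\dot q) = \frac{1}{2} c\,\dot q^2$ with $c > 0$; the symbols $m$, $V$ and $c$ are introduced here only for this sketch. The equations then reduce to the familiar damped oscillator, and the energy $H$ decreases along solutions:

$$
\dot q = \frac{p}{m}, \qquad \dot p = -V'(q) - c\,\dot q
\quad\Longrightarrow\quad m\,\ddot q + c\,\dot q + V'(q) = 0,
$$

$$
\frac{d}{dt} H(q,p)
= \frac{\partial H}{\partial q}\,\dot q + \frac{\partial H}{\partial p}\,\dot p
= -\dot q\,\frac{\partial \tilde L}{\partial \dot q}
= -c\,\dot q^2 \le 0.
$$

The second line uses only $\dot q = \partial H/\partial p$ and the stated equation for $\dot p$, so the identity $\frac{d}{dt}H = -\dot q\,\partial \tilde L/\partial \dot q$ holds for any $H$ and $\tilde L$ of the form described, not just this example.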

Abstract:
We consider an equation similar to the Navier-Stokes equation. We show that there is initial data that exists in every Triebel-Lizorkin or Besov space (and hence in every Lebesgue and Sobolev space), such that after a finite time, the solution is in no Triebel-Lizorkin or Besov space (and hence in no Lebesgue or Sobolev space). The purpose is to show the limitations of the so-called semigroup method for the Navier-Stokes equation. We also consider the possibility of existence of solutions with initial data in the Besov space $\dot B^{-1,\infty}_\infty$. We give initial data in this space for which there is no reasonable solution for the Navier-Stokes-like equation.

Abstract:
We obtain logarithmic improvements for conditions for regularity of the Navier-Stokes equation, similar to those of Prodi-Serrin or Beale-Kato-Majda. Some of the proofs make use of a stochastic approach involving Feynman-Kac-like inequalities. As part of our methods, we give a different approach to a priori estimates of Foias, Guillope and Temam.
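
For orientation, the classical conditions referred to above are usually stated as follows for a solution $u$ of the three-dimensional Navier-Stokes equation on a time interval $[0,T]$; these are the commonly quoted forms, and the logarithmically improved versions obtained in the paper are not reproduced here. The Prodi-Serrin condition is the first line and the Beale-Kato-Majda condition, in terms of the vorticity $\omega$, is the second; either is sufficient for regularity up to time $T$:

$$
u \in L^p\bigl(0,T;\, L^q\bigr), \qquad \frac{2}{p} + \frac{3}{q} \le 1, \quad 3 < q \le \infty,
$$

$$
\int_0^T \|\omega(t)\|_{L^\infty}\,dt < \infty, \qquad \omega = \nabla \times u.
$$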