Genome evolution of novel influenza A (H1N1) viruses in humans
Zheng Kou, SongNian Hu, TianXian Li
Chinese Science Bulletin , 2009, DOI: 10.1007/s11434-009-0412-z
Abstract: The A (H1N1) influenza epidemic emerged in North America in April 2009 and rapidly spread to three other continents (Europe, Asia, and Africa), with the pandemic alert level raised to phase 5. By May 13, the A (H1N1) virus had spread to 33 countries and regions, with 5,728 laboratory-confirmed cases, including 61 deaths. From the IRV and EpiFluDB databases, 425 A (H1N1) virus sequences were obtained and subjected to sequence comparison and evolutionary analysis. The results showed that the currently predominant A (H1N1) virus is a triple-reassortant influenza A virus: (i) the HA, NA, MP, NP, and NS segments originated from swine influenza viruses, PB2 and PA from avian influenza viruses, and PB1 from human influenza viruses; (ii) among the swine-origin segments, HA, NP, and NS originated from classical swine influenza viruses of the H1N1 subtype, whereas NA and MP originated from avian-origin swine influenza viruses of the H1N1 subtype; (iii) the A (H1N1) virus underwent no significant mutation during the epidemic spread, and no reassortment of the viral genome was detected. The representative strains used for sequence analysis (A/California/04/2009 (H1N1) and A/Mexico/4486/2009 (H1N1)) came from both the USA and Mexico, a relatively wide geographic range, which supports the reliability of the analysis.
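The claim of "no significant mutation" rests on comparing aligned sequences across strains. The abstract does not name the distance measure used; a minimal sketch of one common choice, the uncorrected p-distance, is shown below, assuming sequences are already aligned and that gaps ('-') and ambiguous bases ('N') are skipped. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gap or ambiguous columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def max_pairwise_distance(alignment: dict) -> float:
    """Largest p-distance over all pairs of sequences in an alignment dict."""
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    return max(p_distance(alignment[x], alignment[y])
               for x, y in combinations(alignment, 2))
```

A uniformly small maximum pairwise distance across strains from different regions would be consistent with limited divergence during spread; a real analysis would use full-length segments and a phylogenetic model rather than raw p-distances.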
Thousands of Novel Transcripts Identified in Mouse Cerebrum, Testis, and ES Cells Based on ribo-minus RNA Sequencing
Wanfei Liu,Shuhui Song,Songnian Hu
Frontiers in Genetics , 2011, DOI: 10.3389/fgene.2011.00093
Abstract: High-throughput next-generation sequencing technologies provide an excellent opportunity to detect low-abundance transcripts that may not be identifiable by previously available techniques. Here, we report the discovery of thousands of novel transcripts (mostly non-coding RNAs) expressed in mouse cerebrum, testis, and embryonic stem (ES) cells, through an in-depth analysis of ribo-minus RNA-seq (rmRNA-seq) data. These transcripts show significant associations with transcriptional start and elongation signals. Upstream of these transcripts, we observed significant enrichment of histone marks (histone H3 lysine 4 trimethylation, H3K4me3), RNAPII binding sites, and cap analysis of gene expression (CAGE) tags that mark transcriptional start sites. Along the length of these transcripts, we also observed enrichment of histone H3 lysine 36 trimethylation (H3K36me3). Moreover, these transcripts show strong purifying selection in their genomic loci, exonic sequences, and promoter regions, implying functional constraints on their evolution. These results define a collection of novel transcripts in the mouse genome and indicate their potential functions in mouse tissues and cells.
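Testing for marks "upstream" of a transcript depends on strand: upstream of a minus-strand transcript lies at higher genomic coordinates. The sketch below counts the fraction of transcripts with at least one mark position (e.g. an H3K4me3 peak summit or a CAGE tag) within a fixed window upstream of the TSS. The 1,000 bp window and all names are hypothetical choices, not the authors' parameters.

```python
from bisect import bisect_left, bisect_right


def upstream_interval(tss, strand, window):
    """Closed interval covering `window` bp upstream of a TSS, strand-aware."""
    if strand == "+":
        return tss - window, tss
    if strand == "-":
        return tss, tss + window
    raise ValueError("strand must be '+' or '-'")


def has_mark_upstream(tss, strand, sorted_marks, window=1000):
    """True if any position in the sorted list falls in the upstream interval."""
    lo, hi = upstream_interval(tss, strand, window)
    return bisect_right(sorted_marks, hi) > bisect_left(sorted_marks, lo)


def fraction_marked(transcripts, marks_by_chrom, window=1000):
    """Fraction of (chrom, tss, strand) records with an upstream mark.

    marks_by_chrom maps a chromosome name to a sorted list of positions.
    """
    if not transcripts:
        return 0.0
    hits = sum(has_mark_upstream(tss, strand, marks_by_chrom.get(chrom, []), window)
               for chrom, tss, strand in transcripts)
    return hits / len(transcripts)
```

Enrichment would then be judged by comparing this fraction against a background set (for example, randomly placed pseudo-TSSs), which the sketch leaves out.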
Differential gene expression in an elite hybrid rice cultivar (Oryza sativa, L) and its parental lines based on SAGE data
Shuhui Song, Hongzhu Qu, Chen Chen, Songnian Hu, Jun Yu
BMC Plant Biology , 2007, DOI: 10.1186/1471-2229-7-49
Abstract: Using an improved strategy of tag-to-gene mapping and two recently annotated genome assemblies (93-11 and PA64s), we identified 10,268 additional high-quality tags, bringing the total, together with our previous results, to 20,595. We further found that 8.5% and 5.9% of physically mapped genes are differentially expressed within the triad (in at least one of the three stages) at P < 0.05 and P < 0.01, respectively. These genes fall into 12 major gene expression patterns; among them, 406 up-regulated and 469 down-regulated genes (P < 0.05) were observed. Functional annotation of the identified genes indicates that the genes up-regulated in the hybrid (some of which encode known enzymes) are mostly related to enhanced carbon assimilation in leaves and roots. In addition, we detected a group of up-regulated genes related to male sterility and 442 down-regulated genes related to signal transduction and protein processing, which may be responsible for rice heterosis. We improved the tag-to-gene mapping strategy by combining information from transcript sequences and rice genome annotation, and obtained a more comprehensive view of the genes related to rice heterosis. The candidate heterosis-related genes identified among different genotypes provide a new avenue for exploring the molecular mechanism underlying heterosis. Heterosis is defined as the advantageous quantitative and qualitative traits of offspring over their parents, and the exploitation of heterosis has been a major practice for increasing the productivity of plants and animals [1]. Considerable effort has been invested in unraveling the genetic basis of heterosis in rice (Oryza sativa, L), which has been explained mainly by mechanisms such as dominance [2] and epistasis [3].
Although many investigators favored one hypothesis over another, biological mechanisms of rice heterosis may not be fully characterized based on genetic approaches alone, especially based on classical genetic concep
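The abstract reports differential expression at P < 0.05 and P < 0.01 without naming the test. One test widely used for comparing a SAGE tag's counts between two libraries is that of Audic and Claverie (1997), sketched below as an illustrative substitute; it is an assumption that this matches the authors' method. Here `x` and `y` are the tag counts in libraries 1 and 2, and `n1`, `n2` are the libraries' total tag counts.

```python
from math import exp, lgamma, log


def audic_claverie_p(x: int, y: int, n1: int, n2: int) -> float:
    """P(y tags in library 2 | x tags in library 1), Audic-Claverie form.

    p(y|x) = r**y * (x+y)! / (x! * y! * (1+r)**(x+y+1)), with r = n2/n1,
    evaluated in log space to avoid overflow for large counts.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("library sizes must be positive")
    r = n2 / n1
    log_p = (y * log(r) + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
             - (x + y + 1) * log(1 + r))
    return exp(log_p)


def upper_tail_p(x: int, y: int, n1: int, n2: int) -> float:
    """One-sided P(Y >= y | x): evidence for higher expression in library 2."""
    lower = sum(audic_claverie_p(x, k, n1, n2) for k in range(y))
    return max(0.0, 1.0 - lower)  # clamp tiny negative rounding error
```

For very small tail probabilities, summing the upper tail directly would be more numerically accurate than `1 - lower`; applying the test to a hybrid-versus-parent triad also requires a multiple-comparison strategy that this sketch does not address.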
On the molecular mechanism of GC content variation among eubacterial genomes
Hao Wu, Zhang Zhang, Songnian Hu, Jun Yu
Biology Direct , 2012, DOI: 10.1186/1745-6150-7-2
Abstract: Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether these changes are directly related to particular environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends become more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Our studies provide several lines of evidence indicating that the DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2) play dominant roles in determining GC variability. Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide comprehensive insight into the mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years. This paper was reviewed by Nicolas Galtier, Adam Eyre-Walker, and Eugene Koonin. As one of the key parameters of genome sequences, genomic GC content, which is confined to between 25% and 75%, has been investigated for over half a century [1-3]. Several essential questions need to be addressed concerning GC content and its variability. First, how does it vary: randomly, gene-centrically, species-specifically, under regulation, or under selection?
Second, at what level does GC content vary: replication, transcription
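Genomic GC content is simply the fraction of G and C among unambiguous bases. A minimal sketch computing it genome-wide and in sliding windows is given below; excluding ambiguous bases from the denominator, and the window and step sizes, are illustrative assumptions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); case-insensitive."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return gc / (gc + at)


def gc_profile(seq: str, window: int = 1000, step: int = 500):
    """(start, GC fraction) for each full window; windows with no A/C/G/T are skipped."""
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window].upper()
        if any(base in "ACGT" for base in chunk):
            profile.append((start, gc_content(chunk)))
    return profile
```

The genome-wide value is what places a species within the 25% to 75% range, while the windowed profile shows how GC content varies along a single chromosome.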
Effectiveness of 10 polymorphic microsatellite markers for parentage and pedigree analysis in plateau pika (Ochotona curzoniae)
Kexin Li, Jianing Geng, Jiapeng Qu, Yanming Zhang, Songnian Hu
BMC Genetics , 2010, DOI: 10.1186/1471-2156-11-101
Abstract: The error in parentage assignment using a combination of these 10 loci was very low, as indicated by their power of discrimination (0.803 - 0.932), power of exclusion (0.351 - 0.887), and a combined probability of exclusion in parentage assignment of 99.999%. All offspring in a family could be assigned to their biological mother, and their father or other relatives could also be identified. This set of markers therefore provides a powerful and efficient tool for parentage assignment and other population analyses in the plateau pika. Plateau pikas (Ochotona curzoniae) are small lagomorphs that inhabit the high alpine grasslands of the Tibetan plateau of China. They live in cohesive families and occupy burrow systems. Plateau pikas exhibit monogamous, polygynous, polyandrous, and promiscuous mating systems [1]. Approximately 57.8% of pikas exhibit philopatry, and dispersal movements are extremely restricted, although some dispersal may occur to ensure spatial separation of kin that might otherwise mate [2]. Inbreeding would be expected to occur under these circumstances. Dominant males monopolize mating in order to maximise reproductive fitness and minimise inbreeding depression. Previous methods for determining the level of inbreeding and how it affects the population depended mainly on direct observation, owing to the lack of molecular tools. Although family group behaviors have been described through observation in the plateau pika [2-5], details of family structure lack corroborative molecular evidence. In some breeding systems, such as lekking, polygyny, polyandry, and cooperative breeding, it may be impossible to determine parentage from direct observation [6]. Therefore, molecular tools such as microsatellite markers are necessary to obtain genetic information about family structure, social behavior, and dispersal.
Microsatellite markers, also called short tandem repeats (STRs), are ideal molecular markers for various genetic studies because they are highly p
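Per-locus values are usually combined under the assumption that loci are independent (unlinked and in Hardy-Weinberg equilibrium), giving CPE = 1 − ∏(1 − PE_i), and likewise for discrimination power. The sketch below shows that arithmetic; it does not estimate the per-locus PE from allele frequencies, and the example inputs are hypothetical rather than the study's values.

```python
from math import prod


def combined_exclusion(pe_values):
    """Combined probability of exclusion, assuming independent loci.

    CPE = 1 - prod(1 - PE_i); an empty list gives 0.0.
    """
    return 1.0 - prod(1.0 - pe for pe in pe_values)


def power_of_discrimination(genotype_freqs):
    """Per-locus PD: 1 minus the sum of squared genotype frequencies."""
    return 1.0 - sum(f * f for f in genotype_freqs)
```

For instance, ten hypothetical loci each with PE = 0.5 would give `combined_exclusion([0.5] * 10)` = 1 − 0.5^10 ≈ 0.999, showing how moderately informative loci compound into a high combined exclusion power.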
An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS FLX sequencing platform
Tongwu Zhang, Xiaowei Zhang, Songnian Hu, Jun Yu
Plant Methods , 2011, DOI: 10.1186/1746-4811-7-38
Abstract: We report a new, rapid procedure for plant chloroplast and mitochondrial genome sequencing and assembly using the Roche/454 GS FLX platform. Plant cells can contain multiple copies of the organellar genomes, and there is a significant correlation between the depth of sequence reads in contigs and the copy number of the genome. Without isolating organellar DNA from the mixture of nuclear and organellar DNA before sequencing, we retrospectively extracted assembled contigs of either chloroplast or mitochondrial sequences from the whole genome shotgun data. Moreover, the contig connection graph property of Newbler (a platform-specific sequence assembler) ensures an efficient final assembly. Using this procedure, we assembled both the chloroplast and mitochondrial genomes of a resurrection plant, Boea hygrometrica, with high fidelity. We also present information and a minimal sequence dataset as a reference for the assembly of other plant organellar genomes. Organellar genomes are widely used in evolutionary and population genetics studies. The plastid genome contains many essential genes, especially those required for photosynthesis. Information from multiple plastid genomes harbors suites of characters that transcend the green plant branch in the tree of life [1]. There are multiple copies of the organellar genomes in plant cells; e.g. plant leaf cells often contain 400 to 1,600 copies of the plastid genome [2]. In angiosperms, most chloroplast (cp) genomes are circular DNA molecules ranging from 120 to 160 kb. They have a quadripartite organization, consisting of two copies of inverted repeats (IRs) of 20-28 kb in size, which divide the rest of the genome into a large single-copy region (LSC; 80-90 kb) and a small single-copy region (SSC; 16-27 kb) [3]. Plants have larger and more complex mitochondrial (mt) genomes than other unicellular and multicellular eukaryotes.
Mitochondrial genomes, especially those in seed plants, are exceptionally varied in size and structure,
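Because organellar genomes are present in many copies, their contigs should show much higher read depth than nuclear contigs. The sketch below flags high-copy candidates by depth relative to the median contig depth, assuming the median approximates nuclear depth (true when nuclear contigs dominate). The `Contig` record, the 10-fold cutoff, and the 500 bp minimum length are hypothetical choices; separating chloroplast from mitochondrial candidates would additionally need similarity searches against known organellar genomes.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Contig:
    name: str
    length: int
    mean_depth: float


def high_copy_contigs(contigs, fold=10.0, min_length=500):
    """Contigs at least `fold` times deeper than the median contig depth.

    The median is treated as a proxy for nuclear (single-copy) depth.
    """
    if not contigs:
        return []
    baseline = median(c.mean_depth for c in contigs)
    return [c for c in contigs
            if c.length >= min_length and c.mean_depth >= fold * baseline]
```

A length-weighted baseline, or a depth histogram with a visible high-depth peak, would be more robust than a single median cutoff on real assemblies.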
Analysis of porcine MHC expression profile
Jiang Fanbo, Chen Chen, Deng Yajun, Yu Jun, Hu Songnian
Chinese Science Bulletin , 2005, DOI: 10.1007/BF02897382
Abstract: The porcine major histocompatibility complex (MHC, also named swine leukocyte antigen, SLA) is associated not only with immune response and disease susceptibility, but also with some reproductive and production traits such as growth rate and carcass composition. To date, no systematic study of the SLA expression profile has been reported. To characterize SLA expression comprehensively and deepen our understanding of its function, we outlined the expression profile of SLA in 51 tissues of Landrace by analyzing a large number of ESTs produced by the “Sino-Danish Porcine Genome Project”. In addition, we compared the SLA expression profile in several tissues from different developmental stages and from another breed (Erhualian). The results show that: (i) classical SLA genes are highly expressed in immune tissues and the middle part of the intestine; (ii) although SLA-3 is an SLA Ia gene, its expression abundance and pattern are quite different from those of the other two SLA Ia genes. The same phenomenon is seen in HLA-C expression, suggesting that the two genes may function similarly and have undergone convergent evolution; (iii) except in the jejunum, the antigen-presenting genes are more highly expressed in Erhualian than in Landrace. This difference might be associated with the higher resistance of Erhualian to adverse conditions (including pathogens) and the higher growth rate of Landrace.
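EST-based ("digital") expression profiling compares genes across tissues by EST counts normalized to each tissue library's size. A minimal sketch follows, assuming each EST has already been assigned to a gene and that abundance is expressed per 10,000 ESTs; the scaling constant, function name, and toy inputs are hypothetical.

```python
from collections import Counter, defaultdict


def est_abundance(est_hits, library_sizes, per=10_000):
    """Normalized EST abundance per tissue and gene.

    est_hits: iterable of (tissue, gene) pairs, one per gene-assigned EST.
    library_sizes: total ESTs sequenced per tissue (including unassigned ones).
    Returns {tissue: {gene: ESTs per `per` ESTs in that tissue}}.
    """
    counts = Counter(est_hits)
    table = defaultdict(dict)
    for (tissue, gene), n in counts.items():
        table[tissue][gene] = n * per / library_sizes[tissue]
    return dict(table)
```

Normalizing by the full library size, not just by ESTs assigned to the genes of interest, keeps values comparable between tissues whose libraries were sequenced to different depths.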
Transposable-Element Associated Small RNAs in Bombyx mori Genome
Yimei Cai, Qing Zhou, Caixia Yu, Xumin Wang, Songnian Hu, Jun Yu, Xiaomin Yu
PLOS ONE , 2012, DOI: 10.1371/journal.pone.0036599
Abstract: Small RNAs are a group of regulatory RNA molecules that control gene expression at the transcriptional or post-transcriptional level in eukaryotes. The genome of the silkworm, Bombyx mori L., harbors abundant repetitive sequences derived from families of retrotransposons and transposons, which together constitute almost half of the genome and provide ample resources for the biogenesis of the three major small RNA families. We systematically discovered transposable-element (TE)-associated small RNAs in the B. mori genome using a deep RNA-sequencing strategy, and this effort yielded 182, 788, and 4,990 TE-associated small RNAs in the miRNA, siRNA, and piRNA species, respectively. Our analysis suggests that the three small RNA species preferentially associate with different TEs to create sequence and functional diversity, and we also show evidence that a single Bombyx non-LTR retrotransposon, bm1645, contributes substantially to the generation of TE-associated small RNAs. The fact that bm1645-associated small RNAs partially overlap with each other suggests that this element may be modulated by different mechanisms to generate different products with diverse functions. Taken together, these discoveries expand the small RNA pool in the B. mori genome and provide new knowledge on the diversity and functional significance of TE-associated small RNAs.
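Associating small RNAs with TEs is, at its core, an interval-overlap problem between mapped read coordinates and TE annotations. The sketch below counts, for each TE family, how many small RNAs overlap at least one of its copies, assuming half-open (start, end) coordinates and counting a read once per family even if it hits several copies. Names and toy coordinates are hypothetical; the simple per-chromosome scan is quadratic, so genome-scale data would call for an interval tree or sorted sweep.

```python
from collections import Counter, defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def te_family_counts(small_rnas, te_annotations):
    """Count small RNAs overlapping each TE family.

    small_rnas: iterable of (chrom, start, end).
    te_annotations: iterable of (chrom, start, end, family).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, family in te_annotations:
        by_chrom[chrom].append((start, end, family))
    counts = Counter()
    for chrom, start, end in small_rnas:
        families = {fam for s, e, fam in by_chrom.get(chrom, ())
                    if overlaps(start, end, s, e)}
        counts.update(families)  # each read counted once per family
    return counts
```

Running this separately on miRNA, siRNA, and piRNA sets would show whether each species favors different TE families, the kind of comparison the abstract describes.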
The Bryopsis hypnoides Plastid Genome: Multimeric Forms and Complete Nucleotide Sequence
Fang Lü, Wei Xü, Chao Tian, Guangce Wang, Jiangfeng Niu, Guanghua Pan, Songnian Hu
PLOS ONE , 2012, DOI: 10.1371/journal.pone.0014663
Abstract: Bryopsis hypnoides Lamouroux is a siphonous green alga, and its extruded protoplasm can aggregate spontaneously in seawater and develop into mature individuals. The chloroplast of B. hypnoides is the biggest organelle in the cell and shows strong autonomy. To better understand this organelle, we sequenced and analyzed the chloroplast genome of this green alga.
Gene and Genome Parameters of Mammalian Liver Circadian Genes (LCGs)
Gang Wu, Jiang Zhu, Fuhong He, Weiwei Wang, Songnian Hu, Jun Yu
PLOS ONE , 2012, DOI: 10.1371/journal.pone.0046961
Abstract: The mammalian circadian system controls various physiological processes and behavioral responses by regulating thousands of circadian genes with rhythmic expression. In this study, we redefined circadian-regulated genes based on published results in the mouse liver and compared them with other gene groups defined relative to circadian regulation, especially the non-circadian-regulated genes expressed in liver, at multiple molecular levels from gene position to protein expression, based on integrative analyses of different datasets from the literature. In the intra-tissue analysis, the liver circadian genes (LCGs) show unique features compared with other gene groups. First, LCGs in general have fewer neighboring genes and are longer in both genomic and 3′-UTR length but shorter in CDS (coding sequence) length. Second, LCGs have higher mRNA and protein abundance, higher temporal expression variation, and shorter mRNA half-lives. Third, more than 60% of LCGs form major co-expression clusters centered in four temporal windows: dawn, day, dusk, and night. In addition, larger and smaller LCGs are expressed mainly in the day and night temporal windows, respectively, and we believe that LCGs are well partitioned into the gene expression regulatory network in a way that takes advantage of gene size, expression constraint, and chromosomal architecture. In the inter-tissue analysis, more than half of LCGs are ubiquitously expressed in multiple tissues but show rhythmic expression in only one or a limited number of tissues. LCGs show at least three-fold lower expression variation across the temporal windows than among different tissues, suggesting that the temporal expression variation regulated by the circadian system is relatively subtle compared with the tissue expression variation formed during development.
Taken together, we suggest that the circadian system selects gene parameters in a cost-effective way to improve tissue-specific functions by adapting to temporal variations in the environment over evolutionary time scales.
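Assigning a rhythmic gene to a temporal window requires estimating when its expression peaks. The abstract does not say how this was done; the sketch below substitutes a cosinor fit, y = M + a·cos(ωt) + b·sin(ωt), and takes the peak time as atan2(b, a)/ω. The closed-form coefficients are exact least-squares estimates only for evenly spaced samples (at least three per cycle) spanning whole cycles, which is an assumption. The four 6-hour windows centered on ZT0 (dawn), ZT6 (day), ZT12 (dusk), and ZT18 (night) are hypothetical boundaries, not the authors' definitions.

```python
import math

WINDOWS = ("dawn", "day", "dusk", "night")  # hypothetical 6-h windows


def peak_phase(times, values, period=24.0):
    """Cosinor fit returning (mesor, amplitude, peak time in [0, period)).

    Assumes evenly spaced samples, >= 3 per cycle, covering whole cycles,
    so the cosine and sine regressors are orthogonal.
    """
    n = len(times)
    if n < 3 or n != len(values):
        raise ValueError("need >= 3 paired time points")
    w = 2 * math.pi / period
    mesor = sum(values) / n
    a = 2 / n * sum(y * math.cos(w * t) for t, y in zip(times, values))
    b = 2 / n * sum(y * math.sin(w * t) for t, y in zip(times, values))
    peak = (math.atan2(b, a) / w) % period
    if peak >= period:  # float rounding can yield exactly `period`
        peak = 0.0
    return mesor, math.hypot(a, b), peak


def temporal_window(peak_hour):
    """Map a peak time (ZT hours) to a 6-h window centered on ZT0/6/12/18."""
    return WINDOWS[int(((peak_hour + 3) % 24) // 6)]
```

Genes whose fitted peaks land in the same window could then be grouped as candidate co-expression clusters, although clustering on full expression profiles, as co-expression analysis usually does, captures more than peak time alone.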
