Similar articles
 20 similar articles found (search time: 31 ms)
1.
Brucella is a facultative intracellular bacterium belonging to the class Alphaproteobacteria. It causes the zoonotic disease brucellosis in a wide range of animals. Brucella species are highly conserved at the nucleotide level. Here, we employed a comparative genomics approach to examine the role of homologous recombination and positive selection in the evolution of Brucella. For the analysis, we selected 19 complete genomes from 8 species of Brucella. Among the 1599 predicted core-genome genes, 24 showed signals of recombination, but no significant breakpoint was found. The analysis revealed that recombination events are infrequent and that their impact on the evolution of Brucella is negligible. This supports the view that Brucella evolved clonally. On the other hand, 56 genes (3.5 % of the core genome) showed signals of positive selection. The results suggest that natural selection plays an important role in the evolution of Brucella. Some of the genes responsible for the pathogenesis of Brucella were found to be positively selected, presumably due to their role in avoidance of the host immune system.

2.
Bulbophyllum is the largest genus in Orchidaceae and has a pantropical distribution. Owing to its extensive diversification, it is considered one of the most taxonomically and phylogenetically complex taxa. The diversification pattern and evolutionary adaptation of chloroplast genomes are poorly understood in this species-rich genus, and suitable molecular markers are necessary for species determination and phylogenetic analysis. A natural Asian section, Macrocaulia, was selected to estimate the interspecific divergence of chloroplast genomes in this study. Here, we sequenced the complete chloroplast genomes of four Bulbophyllum species, including three species from section Macrocaulia. The four chloroplast genomes had a typical quadripartite structure, with genome sizes ranging from 156,182 to 158,524 bp. The chloroplast genomes included 113 unique genes encoding 79 proteins, 30 tRNAs and 4 rRNAs. Comparison of the four chloroplast genomes showed that the three species from section Macrocaulia had similar structure and gene content, and shared a number of indels, which mainly contribute to its monophyly. In addition, the level of interspecific divergence was high. Several exclusive indels and polymorphic SSR loci might be used for taxonomic identification and for determining interspecific polymorphisms. A total of 20 intergenic regions and three coding genes from the most variable hotspot regions were proposed as candidate molecular markers for future studies of phylogenetic relationships at different taxonomic levels and of species divergence in Bulbophyllum. All chloroplast genes in the four Bulbophyllum species were under purifying selection, while 13 sites within six genes exhibited site-specific selection. Whole chloroplast genome phylogenetic analyses based on Maximum Likelihood, Bayesian and Parsimony methods all supported the monophyly of section Macrocaulia and of the genus Bulbophyllum.
Our findings provide valuable molecular markers for accurately identifying species, clarifying taxonomy, and resolving the phylogeny and evolution of the genus Bulbophyllum. The molecular markers developed in this study will also contribute to further research on the conservation of Bulbophyllum species.

3.
Phenotypic behavior of a group of organisms can be studied using a range of molecular evolutionary tools that help to determine evolutionary relationships. Traditionally, a gene or a set of gene sequences was used for generating phylogenetic trees. Incomplete evolutionary information in a few selected genes causes problems in phylogenetic tree construction. Whole genomes are used as a remedy. The task now is to identify suitable parameters for extracting the hidden information from whole genome sequences that truly represents evolutionary information. In this study we explored a random anchor (a stretch of 100 nucleotides) based approach (ABWGP) for finding the distance between any two genomes, and used the distance estimates to compute evolutionary trees. A number of strains and species of Mycobacteria were used for this study. Anchor-derived parameters, such as cumulative normalized score, anchor order and indels, were computed in a pair-wise manner, and the scores were used to compute distance/phylogenetic trees. The strength of branching was determined by bootstrap analysis. The terminal branches are clearly discernible using the distance estimates described here. In general, different measures gave similar trees, except the trees based on indels. Overall, the tree topology reflected the known biology of the organisms. This was also true for different strains of Escherichia coli. A new whole genome-based approach has been described here for studying evolutionary relationships among bacterial strains and species.

4.
We present an annotation pipeline that accurately predicts exon–intron structures and protein-coding sequences (CDSs) on the basis of full-length cDNAs (FLcDNAs). This annotation pipeline was used to identify genes in 10 plant genomes. In particular, we show that interspecies mapping of FLcDNAs to genomes is of great value in fully utilizing FLcDNA resources whose availability is limited to several species. Because low sequence conservation at 5′- and 3′-ends of FLcDNAs between different species tends to result in truncated CDSs, we developed an improved algorithm to identify complete CDSs by the extension of both ends of truncated CDSs. Interspecies mapping of 71 801 monocot FLcDNAs to the Oryza sativa genome led to the detection of 22 142 protein-coding regions. Moreover, in comparing two mapping programs and three ab initio prediction programs, we found that our pipeline was more capable of identifying complete CDSs. As demonstrated by monocot interspecies mapping, in which nucleotide identity between FLcDNAs and the genome was ∼80%, the resultant inferred CDSs were sufficiently accurate. Finally, we applied both inter- and intraspecies mapping to 10 monocot and dicot genomes and identified genes in 210 551 loci. Interspecies mapping of FLcDNAs is expected to effectively predict genes and CDSs in newly sequenced genomes.

5.
Q Xu, G Xiong, P Li, F He, Y Huang, K Wang, Z Li, J Hua. PLoS ONE 2012, 7(8): e37128

Background

Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although the donor species of allotetraploid cotton have been intensively studied, and it is generally accepted that the parents were A- and D-genome-containing species, sequence comparison of Gossypium chloroplast genomes is still of interest for understanding the mechanisms underlying the evolution of Gossypium allotetraploids. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time.

Methodology/Principal Findings

The size of 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar having >98% sequence identity. They encoded the same set of 112 unique genes which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43–0.68 million years ago (MYA).

Conclusion

Despite a high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes show a preference for 5-bp indels, and 1–3-bp indels are mainly attributed to SSR polymorphisms. This study supports the view that the common ancestor of diploid A-genome species in Gossypium is the maternal source of extant allotetraploid species and that allotetraploids have a monophyletic origin. G. hirsutum AD1 lineages have experienced more sequence variations than other allotetraploids in intergenic regions. The available complete nucleotide sequences of 12 Gossypium chloroplast genomes should facilitate studies to uncover the molecular mechanisms of compartmental co-evolution and speciation of Gossypium allotetraploids.

6.
To explore the mitochondrial genes of the Cruciferae family, the mitochondrial genome of Raphanus sativus (sat) was sequenced and annotated. The circular mitochondrial genome of sat is 239,723 bp and includes 33 protein-coding genes, three rRNA genes and 17 tRNA genes. The mitochondrial genome also contains a pair of large repeat sequences 5.9 kb in length, which may mediate genome reorganization into two sub-genomic circles, with predicted sizes of 124.8 kb and 115.0 kb, respectively. Furthermore, gene evolution of mitochondrial genomes within the Cruciferae family was analyzed using the sat mitochondrial type (mitotype), together with six other reported mitotypes. The cruciferous mitochondrial genomes have maintained almost the same set of functional genes. Compared with Cycas taitungensis (a representative gymnosperm), the mitochondrial genomes of the Cruciferae have lost nine protein-coding genes and seven mitochondrial-like tRNA genes, but acquired six chloroplast-like tRNAs. Among the Cruciferae, to maintain the same set of genes that are necessary for mitochondrial function, the exons of the genes have changed at the lowest rates, as indicated by the numbers of single nucleotide polymorphisms. The open reading frames (ORFs) of unknown function in the cruciferous genomes are not conserved. Evolutionary events, such as mutations, genome reorganizations and sequence insertions or deletions (indels), have resulted in the non-conserved ORFs in the cruciferous mitochondrial genomes, which are becoming significantly different among mitotypes. This work represents the first phylogenetic explanation of the evolution of genes of known function in the Cruciferae family. It revealed significant variation in ORFs and the causes of such variation.

7.
8.

Background  

Brucella species are Gram-negative, facultative intracellular bacteria that cause brucellosis in humans and animals. Sequences of four Brucella genomes have been published, and various Brucella gene and genome data and analysis resources exist. A web gateway to integrate these resources will greatly facilitate Brucella research. Brucella genome data in current databases is largely derived from computational analysis without experimental validation typically found in peer-reviewed publications. It is partially due to the lack of a literature mining and curation system able to efficiently incorporate the large amount of literature data into genome annotation. It is further hypothesized that literature-based Brucella gene annotation would increase understanding of complicated Brucella pathogenesis mechanisms.

9.
Insertions and deletions (indels) in protein-coding genes are important sources of genetic variation. Their role in creating new proteins may be especially important after gene duplication. However, little is known about how indels affect the divergence of duplicate genes. We here study thousands of duplicate genes in five fish (teleost) species with completely sequenced genomes. The ancestor of these species has been subject to a fish-specific genome duplication (FSGD) event that occurred approximately 350 Ma. We find that duplicate genes contain at least 25% more indels than single-copy genes. These indels accumulated preferentially in the first 40 my after the FSGD. A lack of widespread asymmetric indel accumulation indicates that both members of a duplicate gene pair typically experience relaxed selection. Strikingly, we observe a 30-80% excess of deletions over insertions that is consistent for indels of various lengths and across the five genomes. We also find that indels preferentially accumulate inside loop regions of protein secondary structure and in regions where amino acids are exposed to solvent. We show that duplicate genes with high indel density also show high DNA sequence divergence. Indel density, but not amino acid divergence, can explain a large proportion of the tertiary structure divergence between proteins encoded by duplicate genes. Our observations are consistent across all five fish species. Taken together, they suggest a general pattern of duplicate gene evolution in which indels are important driving forces of evolutionary change.

10.
Comparative genomic approaches are useful in identifying molecular differences between organisms. Currently available methods fail to identify small changes in genomes, such as expansion of short repetitive motifs, and to analyse divergent sequences. In this report, we describe an anchor-based whole genome comparison (ABWGC) method. ABWGC is based on random sampling of anchor sequences from one genome, followed by analysis of sampled and homologous regions from the target genome. The method was applied to compare two strains of Mycobacterium tuberculosis, CDC1551 and H37Rv. ABWGC was able to identify a total of 104 indels, including 20 expansions of short repetitive sequences and five recombination events. It included 18 new unidentified genomic differences. ABWGC also identified 188 SNPs, including eight new ones. The method was also used to compare M. tuberculosis H37Rv and M. avium genomes. ABWGC was able to correctly pick 1002 additional indels (size > 100 nt) between the two organisms in contrast to MUMmer, a popular tool for comparative genomics. ABWGC was able to correctly identify repeat expansions and indels in a set of simulated sequences. The study also revealed an important role of small repeat expansion in the evolution of M. tuberculosis strains.
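
The random anchor sampling step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the ABWGC implementation: the function names `sample_anchors` and `locate_anchors`, the toy sequences, and the use of exact string matching in place of a homology search are all assumptions made for illustration. Only the 100-nucleotide anchor length is taken from the related abstract in item 3.

```python
import random


def sample_anchors(genome, anchor_len=100, n_anchors=20, seed=0):
    """Return sorted (start, sequence) pairs for randomly placed anchors."""
    rng = random.Random(seed)
    max_start = len(genome) - anchor_len
    if max_start < 0:
        return []  # genome is shorter than a single anchor
    # A set removes duplicate start positions drawn by chance.
    starts = sorted({rng.randint(0, max_start) for _ in range(n_anchors)})
    return [(s, genome[s:s + anchor_len]) for s in starts]


def locate_anchors(anchors, target):
    """Map each anchor start to its first exact match in target (-1 if none).

    Exact matching is a simplifying assumption; a real comparison would
    search for homologous (similar, not identical) regions.
    """
    return {start: target.find(seq) for start, seq in anchors}


# Toy data, invented for illustration: a random "genome" and a copy
# with 200 nucleotides deleted.
rng = random.Random(1)
genome_a = "".join(rng.choice("ACGT") for _ in range(2000))
genome_b = genome_a[:500] + genome_a[700:]

hits = locate_anchors(sample_anchors(genome_a), genome_b)
missing = [start for start, pos in hits.items() if pos == -1]
print(f"{len(missing)} of {len(hits)} anchors have no exact match in genome_b")
```

Anchors that fail to match, or that match at a shifted position, mark candidate regions for closer inspection; how those regions are then aligned and scored is outside the scope of this sketch.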

11.
12.

Background and Aims

It is known that the miniature inverted-repeat terminal element (MITE) preferentially inserts into low-copy-number sequences or genic regions. Characterization of the second largest subunit of low-copy nuclear RNA polymerase II (RPB2) has indicated that MITE and indels have shaped the homoeologous RPB2 loci in the St and H genomes of Elymus species in Triticeae. The aims of this study were to determine if there is MITE in the RPB2 gene in Hordeum genomes, and to compare the gene evolution of RPB2 with other diploid Triticeae species. The sequences were used to reconstruct the phylogeny of the genus Hordeum.

Methods

RPB2 regions from all diploid species of Hordeum, one tetraploid species (H. brevisubulatum) and ten accessions of diploid Triticeae species were amplified and sequenced. Parsimony analysis of the DNA dataset was performed in order to reveal the phylogeny of Hordeum species.

Key Results

MITE was detected in the Xu genome. A 27–36 bp indel sequence was found in the I and Xu genome, but deleted in the Xa and some H genome species. Interestingly, the indel length in H genomes corresponds well to their geographical distribution. Phylogenetic analysis of the RPB2 sequences positioned the H and Xa genome in one monophyletic group. The I and Xu genomes are distinctly separated from the H and Xa ones. The RPB2 data also separated all New World H genome species except H. patagonicum ssp. patagonicum from the Old World H genome species.

Conclusions

MITE and large indels have shaped the RPB2 loci between the Xu and H, I and Xa genomes. The phylogenetic analysis of the RPB2 sequences confirmed the monophyly of Hordeum. The maximum-parsimony analysis demonstrated the four genomes to be subdivided into two groups.
Key words: Molecular evolution, RPB2, Hordeum, transposable element, phylogeny

13.
Recognizing the pseudogenes in bacterial genomes   Total citations: 9, self-citations: 0, citations by others: 9
Pseudogenes are now known to be a regular feature of bacterial genomes and are found in particularly high numbers within the genomes of recently emerged bacterial pathogens. As most pseudogenes are recognized by sequence alignments, we use newly available genomic sequences to identify the pseudogenes in 11 genomes from 4 bacterial genera, each of which contains at least 1 human pathogen. The numbers of pseudogenes range from 27 in Staphylococcus aureus MW2 to 337 in Yersinia pestis CO92 (i.e. 1–8% of the annotated genes in the genome). Most pseudogenes are formed by small frameshifting indels, but because stop codons are A + T-rich, the two low-G + C Gram-positive taxa (Streptococcus and Staphylococcus) have relatively high fractions of pseudogenes generated by nonsense mutations when compared with more G + C-rich genomes. Over half of the pseudogenes are produced from genes whose original functions were annotated as ‘hypothetical’ or ‘unknown’; however, several broadly distributed genes involved in nucleotide processing, repair or replication have become pseudogenes in one of the sequenced Vibrio vulnificus genomes. Although many of our comparisons involved closely related strains with broadly overlapping gene inventories, each genome contains a largely unique set of pseudogenes, suggesting that pseudogenes are formed and eliminated relatively rapidly from most bacterial genomes.

14.
15.
16.
Brucella species include important zoonotic pathogens that have a substantial impact on both agriculture and human health throughout the world. Brucellae are thought of as “stealth pathogens” that escape recognition by the host innate immune response, modulate the acquired immune response, and evade intracellular destruction. We analyzed the genome sequences of members of the family Brucellaceae to assess its evolutionary history from likely free-living soil-based progenitors into highly successful intracellular pathogens. Phylogenetic analysis split the genus into two groups: recently identified and early-dividing “atypical” strains and a highly conserved “classical” core clade containing the major pathogenic species. Lateral gene transfer events brought unique genomic regions into Brucella that differentiated them from Ochrobactrum and allowed the stepwise acquisition of virulence factors that include a type IV secretion system, a perosamine-based O antigen, and systems for sequestering metal ions that are absent in progenitors. Subsequent radiation within the core Brucella resulted in lineages that appear to have evolved within their preferred mammalian hosts, restricting their virulence to become stealth pathogens capable of causing long-term chronic infections.

17.
Brucella species are facultative intracellular pathogenic α-Proteobacteria that can cause brucellosis in humans and domestic animals. The clinical and veterinary importance of the bacteria has led to well established studies on the molecular mechanisms of Brucella infection of host organisms. However, to date, no genome-wide study has scanned for genes related to the host specificity of Brucella spp. The majority of bacterial genes related to specific environmental adaptations such as host specificity are well-known to have evolved under positive selection pressure. We thus detected signals of positive selection for individual orthologous genes among Brucella genomes and identified genes related to host specificity. We first determined orthologous sets from seven completely sequenced Brucella genomes using the Reciprocal Best Hits (RBH). A maximum likelihood analysis based on the branch-site test was accomplished to examine the presence of positive selection signals, which was subsequently confirmed by phylogenetic analysis. Consequently, 12 out of 2,033 orthologous genes were positively selected by specific Brucella lineages, each of which belongs to a particular animal host. Extensive literature reviews revealed that half of these computationally identified genes are indeed involved in Brucella host specificity. We expect that this genome-wide approach based on positive selection may be reliably used to screen for genes related to environmental adaptation of a particular species and that it will provide a set of appropriate candidate genes.
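
The Reciprocal Best Hits (RBH) criterion named in this abstract can be sketched in a few lines of Python. This is a minimal illustration of the general RBH idea, not the authors' pipeline: it assumes the best hit of each gene in the other genome has already been computed (for example from a sequence similarity search), and the function name `reciprocal_best_hits` and the gene identifiers are hypothetical.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    best_a_to_b maps each gene of genome A to its best hit in genome B;
    best_b_to_a is the same table in the opposite direction.
    """
    pairs = []
    for gene_a, gene_b in best_a_to_b.items():
        # Keep the pair only if the best hit points back to the same gene.
        if best_b_to_a.get(gene_b) == gene_a:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)


# Hypothetical best-hit tables; gene names are invented for illustration.
a_to_b = {"a1": "b1", "a2": "b2", "a3": "b2"}
b_to_a = {"b1": "a1", "b2": "a3", "b3": "a2"}

print(reciprocal_best_hits(a_to_b, b_to_a))
# → [('a1', 'b1'), ('a3', 'b2')]
```

Here `a2` is dropped because its best hit `b2` points back to `a3` instead. Extending pairwise RBH results to ortholog sets across seven genomes, as the abstract describes, needs an additional consistency step that this sketch does not attempt.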

18.

Background

Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes.

Methodology/Principal Findings

We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes.

Conclusion

The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

19.
Comparative chloroplast genome analyses are mostly carried out at lower taxonomic levels, such as the family and genus levels. At higher taxonomic levels, chloroplast genomes are generally used to reconstruct phylogenies. However, little attention has been paid to chloroplast genome evolution within orders. Here, we present the chloroplast genome of Sedum sarmentosum and take advantage of several available (or elucidated) chloroplast genomes to examine the evolution of chloroplast genomes in Saxifragales. The chloroplast genome of S. sarmentosum is 150,448 bp long and includes 82,212 bp of a large single-copy (LSC) region, 16,670 bp of a small single-copy (SSC) region, and a pair of 25,783 bp sequences of inverted repeats (IRs). The genome contains 131 unique genes, 18 of which are duplicated within the IRs. Based on a comparative analysis of chloroplast genomes from four representative Saxifragales families, we observed two gene losses and two pseudogenes in Paeonia obovata, and the loss of an intron was detected in the rps16 gene of Penthorum chinense. Comparisons among the 72 common protein-coding genes confirmed that the chloroplast genomes of S. sarmentosum and Paeonia obovata exhibit accelerated sequence evolution. Furthermore, a strong correlation was observed between the rates of genome evolution and genome size. The detected genome size variations are predominantly caused by the length of intergenic spacers, rather than losses of genes and introns, gene pseudogenization or IR expansion or contraction. The genome sizes of these species are negatively correlated with nucleotide substitution rates. Species with shorter life cycles tend to exhibit shorter chloroplast genomes than those with longer life cycles.

20.
Pseudomonas aeruginosa is an opportunistic bacterial pathogen able to thrive in highly diverse ecological niches and to infect compromised patients. Its genome exhibits a mosaic structure composed of a core genome into which accessory genes are inserted en bloc at specific sites. The size and the content of the core genome are open for debate as their estimation depends on the set of genomes considered and the pipeline of gene detection and clustering. Here, we redefined the size and the content of the core genome of P. aeruginosa from fully re-analyzed genomes of 17 reference strains. After the optimization of gene detection and clustering parameters, the core genome was defined at 5,233 orthologs, which represented ~88% of the average genome. Extrapolation indicated that our panel was suitable to estimate the core genome that will remain constant even if new genomes are added. The core genome contained resistance determinants to the major antibiotic families as well as most metabolic, respiratory, and virulence genes. Although some virulence genes were accessory, they often related to conserved biological functions. Long-standing prophage elements were subjected to genetic drift and eventually displayed a G+C content as high as that of the core genome. This contrasts with the low G+C content of highly conserved ribosomal genes. The conservation of metabolic and respiratory genes could guarantee the ability of the species to thrive on a variety of carbon sources for energy in aerobiosis and anaerobiosis. Virtually all the strains, of environmental or clinical origin, have the complete toolkit to become resistant to the major antipseudomonal compounds and possess basic pathogenic mechanisms to infect humans. The knowledge of the genes shared by the majority of the P. aeruginosa isolates is a prerequisite for designing effective therapeutics to combat the wide variety of human infections.
