Similar Documents
1.
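Sequencing and analysis of the complete mitochondrial genome of the Burmese tortoise (Indotestudo elongata)   Cited 4 times (self-citations: 0; by others: 4)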
Zhang Ying  Nie Liuwang  Song Jiaolian 《Acta Zoologica Sinica》2007,53(1):151-158
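Using the mitochondrial genome sequences of closely related species as references, we designed 17 pairs of specific primers and obtained the complete mitochondrial genome sequence of the Burmese tortoise from Guangxi, China, by LD-PCR, PCR and sequencing, and then analysed the features of the genome and the positions of its genes. The mitochondrial genome of the Burmese tortoise is 16,813 bp long, with a base composition of 35.30% A, 26.47% T, 12.09% G and 26.14% C, and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and one non-coding control region (D-loop). The lengths and positions of its genes are similar to those of typical vertebrates, and its protein-coding regions and rRNA genes are highly homologous to those of other vertebrates, indicating that the turtle mitochondrial genome is highly conserved in evolution. The sequence has been deposited in GenBank under accession number DQ656607. Together with the published mitochondrial genomes of 16 other turtle species in GenBank, we also examined the phylogenetic relationships among turtle families.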
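G+C content is simply the sum of the G and C percentages, so it can be read off the reported composition directly. The sketch below (function names are illustrative, not from the paper) computes base composition from a raw sequence and G+C content from the percentages given above; it assumes ambiguous symbols such as N should be left out of the denominator.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, T, G and C in a DNA sequence (other symbols ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A/T/G/C")
    return {b: 100.0 * counts[b] / total for b in "ATGC"}


def gc_percent(composition: dict) -> float:
    """G+C content (%) from a base-composition dictionary."""
    return composition["G"] + composition["C"]


# Percentages reported in the abstract above
tortoise = {"A": 35.30, "T": 26.47, "G": 12.09, "C": 26.14}
print(f"G+C = {gc_percent(tortoise):.2f}%")  # G+C = 38.23%
```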

2.
Analysis of the evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. For the species Staphylococcus aureus, whole-genome sequences have been decoded for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce mechanisms of genome rearrangements that have formed each of them. We first compared strains N315 and Mu50, which make one of the most closely related strain pairs, at the single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and type I restriction-modification (RM) genes on the genomic islands. Then we reconstructed rearrangement events responsible for these polymorphisms, in the paralogous genes and the others, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer sequences for homologous recombination generating the change in the repeat number. These sequences were conserved among the repeated paralogous units likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed the importance of recombination involving long conserved sequences in the evolution of paralogous genes in the genome.

3.
RNAmmer: consistent and rapid annotation of ribosomal RNA genes   Cited 7 times (self-citations: 0; by others: 7)
The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.
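The abstract says RNAmmer scores sequences with hidden Markov models. As a generic illustration only (this is not RNAmmer's model, and every state name and probability below is made up), the forward algorithm computes the probability of a sequence under an HMM by summing over all hidden state paths:

```python
def forward(seq, states, start_p, trans_p, emit_p):
    """Probability of `seq` under an HMM, summed over all hidden state paths."""
    # alpha[s]: probability of emitting the prefix seen so far and being in state s
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model: background DNA vs. a GC-rich "rRNA-like" state
states = ("bg", "rrna")
start_p = {"bg": 0.9, "rrna": 0.1}
trans_p = {"bg": {"bg": 0.95, "rrna": 0.05}, "rrna": {"bg": 0.1, "rrna": 0.9}}
emit_p = {
    "bg": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "rrna": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3},
}
print(forward("GGCA", states, start_p, trans_p, emit_p))
```

Real rRNA predictors use much larger profile models and log-space arithmetic to avoid underflow on long sequences; the plain probabilities here are only suitable for short toy inputs.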

4.
We determined the complete nucleotide sequence of the chloroplast genome of the leptosporangiate fern, Adiantum capillus-veneris L. (Pteridaceae). The circular genome is 150,568 bp, with a large single-copy region (LSC) of 82,282 bp, a small single-copy region (SSC) of 21,392 bp and inverted repeats (IR) of 23,447 bp each. We compared the sequence to other published chloroplast genomes to infer the location of putative genes. When the IR is considered only once, we assigned 118 genes, of which 85 encode proteins, 29 encode tRNAs and 4 encode rRNAs. Four protein-coding genes, all four rRNA genes and six tRNA genes occur in the IR. Most (57) putative protein-coding genes appear to start with an ATG codon, but we also detected five other possible start codons, some of which suggest tRNA editing. We also found 26 apparent stop codons in 18 putative genes, also suggestive of RNA editing. We found all but one of the tRNA genes necessary to encode the complete repertoire required for translation. The missing trnK gene appears to have been disrupted by a large inversion, relative to other published chloroplast genomes. We detected several structural rearrangements that may provide useful information for phylogenetic studies.
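In the quadripartite chloroplast genome, the total length is the two single-copy regions plus both copies of the inverted repeat. A quick consistency check of the reported region sizes, for this fern and for the Arabidopsis genome in entry 7 below (the helper name is illustrative):

```python
def quadripartite_total(lsc: int, ssc: int, ir: int) -> int:
    """Genome length = LSC + SSC + two copies of the inverted repeat (IR)."""
    return lsc + ssc + 2 * ir


# Adiantum capillus-veneris (this entry)
assert quadripartite_total(lsc=82_282, ssc=21_392, ir=23_447) == 150_568
# Arabidopsis thaliana (entry 7 below)
assert quadripartite_total(lsc=84_170, ssc=17_780, ir=26_264) == 154_478
```

Both sets of published region sizes add up exactly to the stated genome lengths.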

5.
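Objective: To obtain the mitochondrial genome sequence of the Chinese hamster and provide molecular data for models of mitochondrial disease. Methods: Using the mitochondrial genome sequences of closely related species as references, we designed 27 pairs of specific primers, obtained the complete mitochondrial genome sequence of the Chinese hamster by TD-PCR and sequencing, and analysed the features of the genome and the positions of its genes. Together with the published mitochondrial genomes of 5 other rodent species in GenBank, we also examined the phylogenetic relationships among rodent families. Results: The Chinese hamster mitochondrial genome is 16,283 bp long, with a base composition of 33.53% A, 30.50% T, 12.98% G and 22.80% C, and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and one non-coding control region. The Chinese hamster is most closely related to the golden hamster. Conclusion: The lengths and positions of the Chinese hamster mitochondrial genes are similar to those of typical rodents, and its protein-coding regions and rRNA genes are highly homologous to those of other rodents, indicating that the mitochondrial genome is highly conserved in evolution. The molecular phylogenetic tree of the 5 species agrees with their traditional taxonomic positions.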

6.
We present an annotation pipeline that accurately predicts exon–intron structures and protein-coding sequences (CDSs) on the basis of full-length cDNAs (FLcDNAs). This annotation pipeline was used to identify genes in 10 plant genomes. In particular, we show that interspecies mapping of FLcDNAs to genomes is of great value in fully utilizing FLcDNA resources whose availability is limited to several species. Because low sequence conservation at 5′- and 3′-ends of FLcDNAs between different species tends to result in truncated CDSs, we developed an improved algorithm to identify complete CDSs by the extension of both ends of truncated CDSs. Interspecies mapping of 71 801 monocot FLcDNAs to the Oryza sativa genome led to the detection of 22 142 protein-coding regions. Moreover, in comparing two mapping programs and three ab initio prediction programs, we found that our pipeline was more capable of identifying complete CDSs. As demonstrated by monocot interspecies mapping, in which nucleotide identity between FLcDNAs and the genome was ∼80%, the resultant inferred CDSs were sufficiently accurate. Finally, we applied both inter- and intraspecies mapping to 10 monocot and dicot genomes and identified genes in 210 551 loci. Interspecies mapping of FLcDNAs is expected to effectively predict genes and CDSs in newly sequenced genomes.
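The abstract does not spell out its extension algorithm. The sketch below is a hypothetical, simplified version of the general idea for a forward-strand CDS: walk upstream in frame to the most upstream ATG before an in-frame stop, and walk downstream to the first in-frame stop codon. The function name and these rules are assumptions, not the authors' method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extend_cds(genome: str, start: int, end: int) -> tuple:
    """Extend a truncated forward-strand CDS [start, end) to a full ORF.

    5' side: step upstream one codon at a time (same frame) and keep the most
    upstream ATG seen before an in-frame stop codon or the sequence start.
    3' side: step downstream to the first in-frame stop codon and include it.
    A side is left unchanged when no ATG / stop codon is found.
    """
    if (end - start) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    genome = genome.upper()

    new_start, pos = start, start
    while pos >= 3:
        pos -= 3
        codon = genome[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            new_start = pos

    new_end, pos = end, end
    if genome[end - 3:end] not in STOP_CODONS:  # CDS not already terminated
        while pos + 3 <= len(genome):
            codon = genome[pos:pos + 3]
            pos += 3
            if codon in STOP_CODONS:
                new_end = pos
                break
    return new_start, new_end


# Truncated CDS "AAAGGG" at [6, 12) inside CCC ATG AAA GGG TTT TAA CC
print(extend_cds("CCCATGAAAGGGTTTTAACC", 6, 12))  # (3, 18)
```

Choosing the most upstream ATG yields the longest ORF; a real pipeline would also weigh alignment evidence and the reverse strand.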

7.
Complete structure of the chloroplast genome of Arabidopsis thaliana.   Cited 7 times (self-citations: 0; by others: 7)
The complete nucleotide sequence of the chloroplast genome of Arabidopsis thaliana has been determined. The genome is a circular DNA molecule of 154,478 bp containing a pair of inverted repeats of 26,264 bp, which are separated by small and large single-copy regions of 17,780 bp and 84,170 bp, respectively. A total of 87 potential protein-coding genes, including 8 genes duplicated in the inverted repeat regions, 4 ribosomal RNA genes and 37 tRNA genes (30 gene species) representing 20 amino acid species were assigned to the genome on the basis of similarity to the chloroplast genes previously reported for other species. The translated amino acid sequences of the respective potential protein-coding genes showed 63.9% to 100% sequence similarity to those of the corresponding genes in the chloroplast genome of Nicotiana tabacum, indicating significant diversity in chloroplast genes between the two dicot plants. The sequence data and gene information are available on the World Wide Web database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/.

8.
9.
Rickettsia are endosymbionts of arthropods, some of which are vectored to vertebrates where they cause disease. Recently, it has been found that some Rickettsia strains harbour conjugative plasmids and others encode some conjugative machinery within the bacterial genome. We investigated the distribution of these conjugation genes in a phylogenetically diverse collection of Rickettsia isolated from arthropods. We found that these genes are common throughout the genus and, in stark contrast to other genes in the genome, conjugation genes are frequently horizontally transmitted between strains. There is no evidence to suggest that these genes are preferentially transferred between phylogenetically related strains, which is surprising given that closely related strains infect similar host species. In addition to detecting patterns of horizontal transmission between diverse Rickettsia species, these findings have implications for the evolution of pathogenicity, the evolution of Rickettsia genomes and the genetic manipulation of intracellular bacteria.

10.
The availability of a large number of complete genome sequences raises the question of how many genes are essential for cellular life. Trying to reconstruct the core of the protein-coding gene set for a hypothetical minimal bacterial cell, we have performed a computational comparative analysis of eight bacterial genomes. Six of the analyzed genomes are very small due to a dramatic genome size reduction process, while the other two, corresponding to free-living relatives, are larger. The available data from several systematic experimental approaches to define all the essential genes in some completely sequenced bacterial genomes were also considered, and a reconstruction of a minimal metabolic machinery necessary to sustain life was carried out. The proposed minimal genome contains 206 protein-coding genes with all the genetic information necessary for self-maintenance and reproduction in the presence of a full complement of essential nutrients and in the absence of environmental stress. The main features of such a minimal gene set, as well as the metabolic functions that must be present in the hypothetical minimal cell, are discussed.

11.

Background

Pseudomonas aeruginosa is an important opportunistic pathogen responsible for many infections in hospitalized and immunocompromised patients. Previous reports estimated that approximately 10% of its 6.6 Mbp genome varies from strain to strain and is therefore referred to as “accessory genome”. Elements within the accessory genome of P. aeruginosa have been associated with differences in virulence and antibiotic resistance. As whole genome sequencing of bacterial strains becomes more widespread and cost-effective, methods to quickly and reliably identify accessory genomic elements in newly sequenced P. aeruginosa genomes will be needed.

Results

We developed a bioinformatic method for identifying the accessory genome of P. aeruginosa. First, the core genome was determined based on sequence conserved among the completed genomes of twelve reference strains using Spine, a software program developed for this purpose. The core genome was 5.84 Mbp in size and contained 5,316 coding sequences. We then developed an in silico genome subtraction program named AGEnt to filter out core genomic sequences from P. aeruginosa whole genomes to identify accessory genomic sequences of these reference strains. This analysis determined that the accessory genome of P. aeruginosa ranged from 6.9% to 18.0% of the total genome, was enriched for genes associated with mobile elements, and consisted mostly of genes with unknown or unclear function. Using these genomes, we showed that AGEnt performed well compared to other publicly available programs designed to detect accessory genomic elements. We then demonstrated the utility of the AGEnt program by applying it to the draft genomes of two previously unsequenced P. aeruginosa strains, PA99 and PA103.

Conclusions

The P. aeruginosa genome is rich in accessory genetic material. The AGEnt program accurately identified the accessory genomes of newly sequenced P. aeruginosa strains, even when draft genomes were used. As P. aeruginosa genomes become available at an increasingly rapid pace, this program will be useful in cataloging the expanding accessory genome of this bacterium and in discerning correlations between phenotype and accessory genome makeup. The combination of Spine and AGEnt should be useful in defining the accessory genomes of other bacterial species as well.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-737) contains supplementary material, which is available to authorized users.
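Spine and AGEnt work on nucleotide sequence, but the core/accessory split they produce can be illustrated at the level of gene-family presence/absence. The sketch below is that simplified analogue (not the Spine or AGEnt algorithm): the core is what every genome shares, and each genome's accessory set is what remains after subtracting the core. Strain and family names are hypothetical.

```python
def core_and_accessory(genomes: dict) -> tuple:
    """Split gene families into a core set (present in every genome) and,
    per genome, the accessory families (present there but not in the core)."""
    if not genomes:
        raise ValueError("need at least one genome")
    core = set.intersection(*genomes.values())
    accessory = {name: families - core for name, families in genomes.items()}
    return core, accessory


# Hypothetical gene-family content of three strains
genomes = {
    "strain1": {"f1", "f2", "f3", "f4"},
    "strain2": {"f1", "f2", "f3", "f5"},
    "strain3": {"f1", "f2", "f3", "f5", "f6"},
}
core, accessory = core_and_accessory(genomes)
print(sorted(core))                  # ['f1', 'f2', 'f3']
print(sorted(accessory["strain3"]))  # ['f5', 'f6']
```

Note that the core shrinks as more reference genomes are added, which is one reason the choice of reference strains matters.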

12.
The Horizontal Gene Transfer DataBase (HGT-DB) is a genomic database that includes statistical parameters such as G+C content, codon and amino-acid usage, as well as information about which genes deviate in these parameters, for complete prokaryotic genomes. Under the hypothesis that genes from distantly related species have different nucleotide compositions, these deviated genes may have been acquired by horizontal gene transfer. The current version of the database contains 88 bacterial and archaeal complete genomes, including multiple chromosomes and strains. For each genome, the database provides statistical parameters for all the genes, as well as averages and standard deviations of G+C content, codon usage, relative synonymous codon usage and amino-acid content. It also provides information about correspondence analyses of the codon usage, plus lists of extraneous groups of genes in terms of G+C content and lists of putatively acquired genes. With this information, researchers can explore the G+C content and codon usage of a gene when they find incongruities in sequence-based phylogenetic trees. A search engine that allows searches for gene names or keywords for a specific organism is also available. HGT-DB is freely accessible at http://www.fut.es/~debb/HGT.
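The deviation idea described above can be sketched as a z-score test: compute each gene's G+C content, then flag genes lying more than some number of standard deviations from the genome mean. The 1.5 SD cut-off below is an assumption for illustration, not necessarily the threshold HGT-DB uses, and the gene names are hypothetical.

```python
from statistics import mean, stdev


def gc_content(seq: str) -> float:
    """G+C content (%) of a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def flag_gc_outliers(genes: dict, z_cut: float = 1.5) -> list:
    """Names of genes whose G+C% lies more than z_cut standard deviations
    from the mean G+C% of all genes in the genome (needs >= 2 genes)."""
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    values = list(gc.values())
    mu, sd = mean(values), stdev(values)
    return sorted(name for name, value in gc.items() if abs(value - mu) > z_cut * sd)


# Toy genome: nine genes at 50% G+C and one GC-rich gene
genes = {f"gene{i}": "ATGC" * 10 for i in range(9)}
genes["island1"] = "GC" * 20
print(flag_gc_outliers(genes))  # ['island1']
```

As the abstract cautions, compositional deviation only suggests horizontal transfer; it is a hypothesis to check against phylogenetic evidence.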

13.
Kanaya S  Kinouchi M  Abe T  Kudo Y  Yamada Y  Nishi T  Mori H  Ikemura T 《Gene》2001,276(1-2):89-99
With increases in the amounts of available DNA sequence data, it has become increasingly important to develop tools for comprehensive systematic analysis and comparison of species-specific characteristics of protein-coding sequences for a wide variety of genomes. In the present study, we used a novel neural-network algorithm, a self-organizing map (SOM), to efficiently and comprehensively analyze codon usage in approximately 60,000 genes from 29 bacterial species simultaneously. This SOM makes it possible to cluster and visualize genes of individual species separately at a much higher resolution than can be obtained with principal component analysis. The organization of the SOM can be explained by the genome G+C% and tRNA compositions of the individual species. We used SOM to examine codon usage heterogeneity in the E. coli O157 genome, which contains 'O157-unique segments' (O-islands), and showed that SOM is a powerful tool for characterization of horizontally transferred genes.
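A SOM clusters genes by comparing fixed-length numeric vectors. One plausible input, assumed here since the abstract does not give the exact representation, is each gene's relative frequency of the 64 codons. The sketch builds only those vectors; the SOM training itself is not shown.

```python
from itertools import product

# All 64 codons in a fixed order, so every gene maps to a vector of equal length
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]


def codon_usage(cds: str) -> list:
    """Relative frequency of each of the 64 codons in an in-frame CDS.
    Codons with non-ACGT symbols and any trailing partial codon are skipped."""
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


vec = codon_usage("ATGATGTAA")
print(len(vec), vec[CODONS.index("ATG")])  # 64 0.6666666666666666
```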

14.
MOTIVATION: The recent outbreak of severe acute respiratory syndrome (SARS) caused by SARS coronavirus (SARS-CoV) has necessitated an in-depth molecular understanding of the virus to identify new drug targets. The availability of complete genome sequences of several strains of SARS virus makes it possible to identify protein-coding genes and define their functions. A computational approach to identifying protein-coding genes and their putative functions will help in designing experimental protocols. RESULTS: In this paper, a novel analysis of the SARS genome using the gene prediction method GeneDecipher, developed in our laboratory, is presented. Each of the 18 newly sequenced SARS-CoV genomes has been analyzed using GeneDecipher. In addition to polyprotein 1ab(1), polyprotein 1a and the four genes coding for the major structural proteins spike (S), small envelope (E), membrane (M) and nucleocapsid (N), six to eight additional proteins have been predicted depending upon the strain analyzed. Their lengths range between 61 and 274 amino acids. Our method also suggests that polyprotein 1ab, polyprotein 1a, S, M and N are proteins of viral origin and the others are of prokaryotic origin. Putative functions of all predicted protein-coding genes have been suggested using conserved peptides present in their open reading frames. AVAILABILITY: Detailed results of GeneDecipher analysis of all the 18 strains of SARS-CoV genomes are available at http://www.igib.res.in/sarsanalysis.html
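GeneDecipher's internals are not described here. As a generic baseline for the same task, a simple open-reading-frame scan finds ATG...stop stretches in each forward frame and keeps those above a minimum length. The function name and the `min_aa` default are illustrative assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 50) -> list:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    start/end are 0-based and end-exclusive; the interval includes the stop
    codon. Only ORFs encoding at least min_aa amino acids are kept.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_aa:  # amino acids, stop excluded
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


seq = "ATG" + "GCT" * 60 + "TAA"  # encodes a 61-residue peptide
print(find_orfs(seq, min_aa=61))  # [(0, 0, 186)]
```

A scan like this over-predicts short ORFs; dedicated gene finders add compositional or homology evidence, and the reverse strand would be scanned via the reverse complement.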

15.
16.
The genome sequencing of H37Rv strain of Mycobacterium tuberculosis was completed in 1998 followed by the whole genome sequencing of a clinical isolate, CDC1551 in 2002. Since then, the genomic sequences of a number of other strains have become available making it one of the better studied pathogenic bacterial species at the genomic level. However, annotation of its genome remains challenging because of high GC content and dissimilarity to other model prokaryotes. To this end, we carried out an in-depth proteogenomic analysis of the M. tuberculosis H37Rv strain using Fourier transform mass spectrometry with high resolution at both MS and tandem MS levels. In all, we identified 3176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count. In addition to protein database search, we carried out a genome database search, which led to identification of ~250 novel peptides. Based on these novel genome search-specific peptides, we discovered 41 novel protein coding genes in the H37Rv genome. Using peptide evidence and alternative gene prediction tools, we also corrected 79 gene models. Finally, mass spectrometric data from N terminus-derived peptides confirmed 727 existing annotations for translational start sites while correcting those for 33 proteins. We report creation of a high confidence set of protein coding regions in Mycobacterium tuberculosis genome obtained by high resolution tandem mass-spectrometry at both precursor and fragment detection steps for the first time. This proteogenomic approach should be generally applicable to other organisms whose genomes have already been sequenced for obtaining a more accurate catalogue of protein-coding genes.
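A database search matches measured spectra against peptides predicted from candidate protein sequences (here, also from the translated genome). The digestion step can be sketched with the common trypsin rule, cleave after K or R unless the next residue is P, assuming no missed cleavages; the study's actual enzyme settings and the `min_len` default are not given in the abstract and are assumptions.

```python
def tryptic_digest(protein: str, min_len: int = 7) -> list:
    """In silico trypsin digest: cut after K/R unless followed by P.
    No missed cleavages; peptides shorter than min_len are discarded."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return [p for p in peptides if len(p) >= min_len]


print(tryptic_digest("AKPGRCCK", min_len=1))  # ['AKPGR', 'CCK']
```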

17.
Li Hao  Yang Dongxu  Wen Linran  Zheng Wei  Guo Feng 《Acta Microbiologica Sinica》2021,61(9):2921-2933
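[Objective] To identify and correct the overestimation of contamination in metagenome-assembled genomes caused by fragmented marker genes. [Methods] Using simulated data constructed from complete genomes of pure cultures, we analysed the effect of fragmented genes on genome quality assessment and set correction parameters. Based on taxonomic annotation against the nr database, we determined whether two fragmented marker genes (i.e., a fragmented gene pair) derive from the same marker gene, and recalculated contamination after removing the redundant fragments. [Results] Simulated fragmentation of complete pure-culture genomes showed that the more fragmented a genome is, the higher its estimated contamination; the same phenomenon was seen in draft microbial genomes obtained by binning. Our correction pipeline brought the contamination of the simulated fragmented data back to the level of the complete genomes. After correcting 760 draft genomes from gut and soil metagenomes with contamination greater than 0, the contamination of nearly half of them decreased, and for 43 genomes it fell to 0. [Conclusion] Our pipeline can partly correct the overestimation of contamination caused by fragmented genes, improve the usability of binned draft genomes, and be applied to the increasingly needed quality assessment of metagenome-derived genomes.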
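Why fragmentation inflates contamination can be seen with a simplified single-copy-marker estimate: every extra copy of a marker counts as contamination, so one gene split across two contigs looks like two copies. The sketch below is not the formula of any particular tool; it assumes each marker hit is labelled with a taxon and flagged as a fragment, and merges fragments of the same marker that share a taxon, loosely following the idea in the abstract. All names and data are hypothetical.

```python
from collections import defaultdict


def contamination(hits, n_markers: int, merge_fragments: bool = True) -> float:
    """Simplified contamination (%) from single-copy marker hits.

    hits: iterable of (marker, taxon, is_fragment). Extra copies beyond one
    per marker count as contamination. With merge_fragments, fragmented hits
    of the same marker with the same taxon count as a single copy.
    """
    copies = defaultdict(int)
    fragment_taxa = defaultdict(set)
    for marker, taxon, is_fragment in hits:
        if merge_fragments and is_fragment:
            fragment_taxa[marker].add(taxon)
        else:
            copies[marker] += 1
    for marker, taxa in fragment_taxa.items():
        copies[marker] += len(taxa)  # one copy per taxon among fragments
    extra = sum(c - 1 for c in copies.values() if c > 1)
    return 100.0 * extra / n_markers


hits = [
    ("m1", "Bacteroides", True), ("m1", "Bacteroides", True),  # one split gene
    ("m2", "Bacteroides", False),
    ("m3", "Bacteroides", False), ("m3", "Escherichia", False),  # real duplicate
]
print(contamination(hits, 4, merge_fragments=False))  # 50.0
print(contamination(hits, 4))                         # 25.0
```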

18.
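Sequence analysis and phylogeny of the complete mitochondrial genome of Pelteobagrus vachelli   Cited 3 times (self-citations: 0; by others: 3)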
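Bagrid catfishes (Bagridae) are species-rich and similar in appearance, which makes morphological classification difficult. To accumulate basic data for phylogenetic studies of Bagridae and of Siluriformes as a whole, we designed primers covering the whole mitochondrial genome based on the mitochondrial genomes of closely related species and amplified the complete mitochondrial genome of Pelteobagrus vachelli with 16 primer pairs. The PCR products were cloned into plasmids and sequenced, yielding the complete sequence, which is 16,527 bp long and contains 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes and one non-coding control region. The genome structure and gene order of P. vachelli are identical to those of all siluriform fishes published so far. Sequence analysis showed high homology with other siluriform genera and species, highest with Pseudobagrus (91%). Using the complete mitochondrial genomes of 9 species in 6 genera and 4 families of Siluriformes, plus 3 outgroups, we examined the phylogeny of Bagridae and its position within Siluriformes at the mitogenome level. The results show that the bagrids P. vachelli, Pelteobagrus fulvidraco, Pelteobagrus nitidus and Pseudobagrus tokiensis form a monophyletic group; Pseudobagrus is closely related to Pelteobagrus; and within Pelteobagrus, P. vachelli is more closely related to P. nitidus than to P. fulvidraco.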
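Homology figures such as the 91% above are usually percent identity over an alignment, but definitions differ in how gaps are treated and the abstract does not say which was used. A minimal sketch, assuming gap-containing columns are excluded:

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two aligned sequences, ignoring columns with a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aln_a.upper(), aln_b.upper()) if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no gap-free columns")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT-A", "ACGTTC"))  # 80.0
```

Counting gap columns as mismatches instead would lower the value, so identities from different studies are only comparable under the same definition.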

19.
Chudin  Eugene  Walker  Randal  Kosaka  Alan  Wu  Sue X  Rabert  Douglas  Chang  Thomas K  Kreder  Dirk E 《Genome biology》2002,4(1):1-10

Background

The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods.

Results

We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes.

Conclusion

The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher accuracy in base-pairs and overall higher coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome (BAC) clone regions and the data are freely accessible. Perhaps most important, by combining both assemblies, we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes and some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable publicly available online CSEdb will expedite new discoveries through comparative genomics.

20.
Sequences of the complete protein-coding portions of the mitochondrial (mt) genome were analysed for 6 species of cestodes (including hydatid tapeworms and the pork tapeworm) and 5 species of trematodes (blood flukes and liver- and lung-flukes). A near-complete sequence was also available for an additional trematode (the blood fluke Schistosoma malayensis). All of these parasites belong to a large flatworm taxon named the Neodermata. Considerable variation was found in the base composition of the protein-coding genes among these neodermatans. This variation was reflected in statistically-significant differences in numbers of each inferred amino acid between many pairs of species. Both convergence and divergence in nucleotide, and hence amino acid, composition was noted among groups within the Neodermata. Considerable variation in skew (unequal representation of complementary bases on the same strand) was found among the species studied. A pattern is thus emerging of diversity in the mt genome in neodermatans that may cast light on evolution of mt genomes generally.
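Strand skew is commonly quantified as AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), each ranging from −1 to 1, with 0 meaning the complementary bases are equally frequent on the strand. The abstract does not state which skew measure the authors used; this sketch assumes these standard definitions.

```python
def skews(seq: str) -> tuple:
    """Return (AT skew, GC skew) of one strand; 0.0 when a pair is absent."""
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


print(skews("AAATGGGC"))  # (0.5, 0.5)
```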


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417