Similar articles
20 similar articles found.
1.
The Organelle Genome Database Project (GOBASE).   Cited by: 2 (self-citations: 1; citations by others: 1)
The taxonomically broad organelle genome database (GOBASE) organizes and integrates diverse data related to organelles (mitochondria and chloroplasts). The current version of GOBASE focuses on the mitochondrial subset of data and contains molecular sequences, RNA secondary structures and genetic maps, as well as taxonomic information for all eukaryotic species represented. The database has been designed so that complex biological queries, especially ones posed in a comparative genomics context, are supported. GOBASE has been implemented as a relational database with a web-based user interface (http://megasun.bch.umontreal.ca/gobase/gobase.html). Custom software tools have been written in-house to assist in the population of the database, data validation, nomenclature standardization and front-end design. The database is fully operational and publicly accessible via the World Wide Web, allowing interactive browsing, sophisticated searching and easy downloading of data.

2.
GOBASE is a relational database containing integrated sequence, RNA secondary structure and biochemical and taxonomic information about organelles. GOBASE release 6 (summer 2002) contains over 130 000 mitochondrial sequences, an increase of 37% over the previous release, and more than 30 000 chloroplast sequences in a new auxiliary database. To handle this flood of new data, we have designed and implemented GOpop, a Java system for population and verification of the database. We have also implemented a more powerful and flexible user interface using the PHP programming language. http://megasun.bch.umontreal.ca/gobase/gobase.html.

3.
Mamit-tRNA (http://mamit-tRNA.u-strasbg.fr), a database for mammalian mitochondrial genomes, has been developed for deciphering structural features of mammalian mitochondrial tRNAs and as a helpful tool in the context of human diseases linked to point mutations in mitochondrial tRNA genes. To accommodate the rapidly growing availability of fully sequenced mammalian mitochondrial genomes, Mamit-tRNA has implemented a relational database, and all annotated tRNA genes have been curated and aligned manually. System administrative tools have been integrated to improve efficiency and to allow real-time update (from GenBank Database at NCBI) of available mammalian mitochondrial genomes. More than 3000 tRNA gene sequences from 150 organisms are classified into 22 families according to the amino acid specificity as defined by the anticodon triplets and organized according to phylogeny. Each sequence is displayed linearly with color codes indicating secondary structural domains and can be converted into a printable two-dimensional (2D) cloverleaf structure. Consensus and typical 2D structures can be extracted for any combination of primary sequences within a given tRNA specificity on the basis of phylogenetic relationships or on the basis of structural peculiarities. Mamit-tRNA further displays static individual 2D structures of human mitochondrial tRNA genes with location of polymorphisms and pathology-related point mutations. The site also offers a table allowing for an easy conversion of human mitochondrial genome nucleotide numbering into conventional tRNA numbering. The database is expected to facilitate exploration of structure/function relationships of mitochondrial tRNAs and to assist clinicians in the context of pathology-related mutation assignments.

4.
Two families of fungal mitochondrial introns that include all known sequences have been recognized. These families are now extended to incorporate a plant mitochondrial intron and several introns in chloroplast- and nuclear-encoded rRNA and tRNA precursors. Members of the same family share distinctive sequence stretches and a number of potential RNA secondary structures that would bring these stretches and the intron-exon junctions into relatively close proximity. Using several of these introns which have been extensively studied by either biochemical or genetic means, an attempt is made to integrate the available data into a common picture.

5.
There are more than 200 completed genomes and over 1 million nonredundant sequences in public repositories. Although the structural data are more sparse (approximately 13,000 nonredundant structures solved to date), several powerful sequence-based methodologies now allow these structures to be mapped onto related regions in a significant proportion of genome sequences. We review a number of publicly available strategies for providing structural annotations for genome sequences, and we describe the protocol adopted to provide CATH structural annotations for completed genomes. In particular, we assess the performance of several sequence-based protocols employing Hidden Markov model (HMM) technologies for superfamily recognition, including a new approach (SAMOSA [sequence augmented models of structure alignments]) that exploits multiple structural alignments from the CATH domain structure database when building the models. Using a data set of remote homologs detected by structure comparison and manually validated in CATH, a single-seed HMM library was able to recognize 76% of the data set. Including the SAMOSA models in the HMM library showed little gain in homolog recognition, although a slight improvement in alignment quality was observed for very remote homologs. However, using an expanded 1D-HMM library, CATH-ISL increased the coverage to 86%. The single-seed HMM library has been used to annotate the protein sequences of 120 genomes from all three major kingdoms, allowing up to 70% of the genes or partial genes to be assigned to CATH superfamilies. It has also been used to recruit sequences from Swiss-Prot and TrEMBL into CATH domain superfamilies, expanding the CATH database eightfold.

6.
We analyzed the current status (as of the end of August 2008) of human mitochondrial genomes deposited in GenBank, amounting to 5140 complete or coding-region sequences, in order to present an overall picture of the diversity present in the mitochondrial DNA of the global human population. To perform this task, we developed mtDNA-GeneSyn, a computer tool that identifies and exhaustively classifies the diversity present in large genetic data sets. The diversity observed in the 5140 human mitochondrial genomes was compared with all possible transitions and transversions from the standard human mitochondrial reference genome. This comparison showed that tRNA and rRNA secondary structures have a large effect in limiting the diversity of the human mitochondrial sequences, whereas for the protein-coding genes there is a bias toward less variation at the second codon positions. The analysis of the observed amino acid variations showed a tolerance of variations that convert between the amino acids V, I, A, M, and T. This defines a group of amino acids with similar chemical properties that can interconvert by a single transition.
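As an illustration of the comparison described in this abstract, the following minimal Python sketch enumerates every possible single-base change from a reference sequence and labels each as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion. It is not the mtDNA-GeneSyn implementation, whose internals the abstract does not describe; the function names `substitution_type` and `all_possible_substitutions` and the toy reference string are assumptions made for this example.

```python
# Illustrative sketch only; names and the toy input are hypothetical.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected single bases from A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternative base are identical")
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def all_possible_substitutions(reference: str):
    """Yield (position, ref, alt, type) for every single-base change (1-based positions)."""
    for pos, ref in enumerate(reference.upper(), start=1):
        if ref not in "ACGT":
            continue  # skip ambiguity codes such as N
        for alt in "ACGT":
            if alt != ref:
                yield pos, ref, alt, substitution_type(ref, alt)


if __name__ == "__main__":
    for row in all_possible_substitutions("ACGT"):
        print(row)
```

Each position contributes one possible transition and two possible transversions, so observed variants can be tallied against this expected set class by class.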

7.
New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.

8.

Background

Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes.

Results

We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic analyses of all 13 mitochondrial protein-coding gene sequences consistently yield trees that place pseudoscorpions as sister to acariform mites.

Conclusion

The well-supported phylogenetic placement of pseudoscorpions as sister to Acariformes differs from some previous analyses based on morphology. However, these two lineages share multiple molecular evolutionary traits, including substantial mitochondrial genome rearrangements, extensive nucleotide substitution, and loss of helices in their inferred tRNA and rRNA structures.
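The CA and GT strand biases discussed in this abstract can be illustrated with the commonly used AT-skew, (A - T)/(A + T), and GC-skew, (G - C)/(G + C), statistics. The abstract does not state how the authors quantified bias, so the Python sketch below is only one plausible way to label a strand; the function names, the "mixed" label, and the toy sequences are assumptions.

```python
# Illustrative sketch only; not the authors' method. Names are hypothetical.
from collections import Counter


def strand_skews(seq: str) -> tuple[float, float]:
    """Return (AT skew, GC skew) for one strand of a nucleotide sequence."""
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    if a + t == 0 or g + c == 0:
        raise ValueError("sequence needs at least one A/T and one G/C base")
    at_skew = (a - t) / (a + t)  # positive: more A than T
    gc_skew = (g - c) / (g + c)  # negative: more C than G
    return at_skew, gc_skew


def bias_label(seq: str) -> str:
    """Label a strand as 'CA'-biased, 'GT'-biased, or 'mixed'."""
    at_skew, gc_skew = strand_skews(seq)
    if at_skew > 0 and gc_skew < 0:
        return "CA"
    if at_skew < 0 and gc_skew > 0:
        return "GT"
    return "mixed"


if __name__ == "__main__":
    print(strand_skews("AAACCCGT"), bias_label("AAACCCGT"))
```

In practice the input would be the major coding strand of a complete mitochondrial genome rather than a short toy string.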

9.
F Michel, A Jacquier, B Dujon. Biochimie, 1982, 64(10): 867-881
The complete sequences of nine Saccharomyces cerevisiae mitochondrial introns, six of which carry long open reading frames, have already been published. We have recently determined the sequence of an intron in the large ribosomal mitochondrial RNA of Kluyveromyces thermotolerans (Jacquier et al., in preparation), which we found to be closely related to its S. cerevisiae counterpart. This latter result prompted us to undertake a systematic search for possible homologous elements in the other available sequences with the help of an original computer program. A previously unsuspected wealth of evolutionarily conserved sequences and secondary structures was thus uncovered. At least seven of the available sequences may be folded up into elaborate secondary structure models, the cores of which are nearly identical. These models bring the exon-intron junctions into relatively close spatial proximity and loop out either all or most of the sequences in open reading frames, when present. These results and their possible implications with respect to the mechanism of splicing are discussed in the light of available genetic and biochemical data.

10.
The cloverleaf secondary structure of transfer RNA (tRNA) is highly conserved across all forms of life. Here, we provide sequence data and inferred secondary structures for all tRNA genes from 8 new arachnid mitochondrial genomes, including representatives from 6 orders. These data show remarkable reductions in tRNA gene sequences, indicating that T-arms are missing from many of the 22 tRNAs in the genomes of 4 out of 7 orders of arachnids. Additionally, all opisthothele spiders possess some tRNA genes that lack sequences that could form well-paired aminoacyl acceptor stems. We trace the evolution of T-arm loss onto phylogenies of arachnids and show that a genome-wide propensity to lose sequences that encode canonical cloverleaf structures likely evolved multiple times within arachnids. Mapping of structural characters also shows that certain tRNA genes appear more evolutionarily prone to lose the sequence coding for the T-arm and that once a T-arm is lost, it is not regained. We use tRNA structural data to construct a phylogeny of arachnids and find high bootstrap support for a clade that is not supported in phylogenies that are based on more traditional morphological characters. Together, our data demonstrate variability in structural evolution among different tRNAs as well as evidence for parallel evolution of the loss of sequence coding for tRNA arms within an ancient and diverse group of animals.

11.
5S ribosomal RNA (5S rRNA) is a universal component of ribosomes, and the corresponding gene is easily identified in archaeal, bacterial and nuclear genome sequences. However, organelle gene homologs (rrn5) appear to be absent from most mitochondrial and several chloroplast genomes. Here, we re-examine the distribution of organelle rrn5 by building mitochondrion- and plastid-specific covariance models (CMs) with which we screened organelle genome sequences. We not only recover all organelle rrn5 genes annotated in GenBank records, but also identify more than 50 previously unrecognized homologs in mitochondrial genomes of various stramenopiles, red algae, cryptomonads, malawimonads and apusozoans, and surprisingly, in the apicoplast (highly derived plastid) genomes of the coccidian pathogens Toxoplasma gondii and Eimeria tenella. Comparative modeling of RNA secondary structure reveals that mitochondrial 5S rRNAs from brown algae adopt a permuted triskelion shape that has not been seen elsewhere. Expression of the newly predicted rrn5 genes is confirmed experimentally in 10 instances, based on our own and published RNA-Seq data. This study establishes that 5S rRNA, particularly in mitochondria, has a much broader taxonomic distribution and a much larger structural variability than previously thought. The newly developed CMs will be made available via the Rfam database and the MFannot organelle genome annotator.

12.
The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes including those of many plants. A specialized plant query page allows maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI's genomic resources as well as links to organism-specific Web pages beyond NCBI.

13.
Database on the structure of large ribosomal subunit RNA.   Cited by: 3 (self-citations: 0; citations by others: 3)
Our database on large ribosomal subunit RNA contained 334 sequences in July, 1995. All sequences in the database are aligned, taking into account secondary structure. The aligned sequences are provided, together with incorporated secondary structure information, in several computer-readable formats. These data can easily be obtained through the World Wide Web. The files in the database are also available via anonymous ftp.

14.
Searching for IRES   Cited by: 13 (self-citations: 3; citations by others: 10)

15.
Small RNA database.   Cited by: 2 (self-citations: 0; citations by others: 2)
The small RNA database is a compilation of all the small RNA sequences available to date, including nuclear, nucleolar, cytoplasmic and mitochondrial small RNAs from eukaryotic organisms and small RNAs from prokaryotic cells as well as viruses. Currently, about 600 small RNA sequences are in our database. It also gives the sources of individual RNAs and their GenBank accession numbers. The small RNA database can be accessed through the World Wide Web (WWW). Our URL is http://mbcr.bcm.tmc.edu/smallRNA/smallrna.html. The new small RNA sequences published since our last compilation are listed in this paper.

16.
17.
Interval-based distance function for identifying RNA structure candidates   Cited by: 1 (self-citations: 0; citations by others: 1)
Many clustering approaches have been developed for biological data analysis; however, applying traditional clustering algorithms to RNA structure data remains challenging because of the complex secondary structures involved. One of the most critical issues in cluster analysis is the development of appropriate distance measures in high-dimensional space. Traditional distance measures focus on scale issues but ignore the correlation between two values. This article develops a novel interval-based distance (Hausdorff) measure for computing the similarity between characterized structures. Three relationships are considered: perfect match, partially overlapped and non-overlapped. Finally, we demonstrate the methods by analyzing a data set of RNA secondary structures from the Rfam database.
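To make the idea concrete, here is a minimal Python sketch that assumes each structural element is represented as a closed interval of sequence positions (start, end). For two closed intervals [a, b] and [c, d], the Hausdorff distance reduces to max(|a - c|, |b - d|). The abstract does not give the article's exact formulation, so the function names, the interval representation, and the handling of nested intervals are assumptions for illustration only.

```python
# Illustrative sketch only; not the article's implementation. Names are hypothetical.
Interval = tuple[int, int]  # (start, end) with start <= end


def hausdorff(x: Interval, y: Interval) -> int:
    """Hausdorff distance between two closed intervals on the number line."""
    (a, b), (c, d) = x, y
    if a > b or c > d:
        raise ValueError("each interval must satisfy start <= end")
    return max(abs(a - c), abs(b - d))


def relationship(x: Interval, y: Interval) -> str:
    """Classify two intervals into the three relationships named in the abstract."""
    (a, b), (c, d) = x, y
    if (a, b) == (c, d):
        return "perfect match"
    if b < c or d < a:
        return "non-overlapped"
    # Assumption: nested intervals are treated as partially overlapped here.
    return "partially overlapped"


if __name__ == "__main__":
    stem_a, stem_b = (1, 5), (3, 9)
    print(hausdorff(stem_a, stem_b), relationship(stem_a, stem_b))
```

A pairwise matrix of such distances over the elements of two structures could then feed a standard clustering routine.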

18.
J M Burke. Gene, 1988, 73(2): 273-294
In vivo and in vitro genetic techniques have been widely used to investigate the structure-function relationships and requirements for splicing of group-I introns. Analyses of group-I introns from extremely diverse genetic systems, including fungal mitochondria, protozoan nuclei, and bacteriophages, have yielded results which are complementary and highly consistent. In vivo genetic studies of fungal mitochondrial systems have served to identify cis-acting sequences within mitochondrial introns, and trans-acting protein products of mitochondrial and nuclear genes which are important for splicing, and to show that some mitochondrial introns are mobile genetic elements. In vitro genetic studies of the self-splicing intron within the Tetrahymena thermophila nuclear large ribosomal RNA precursor (Tetrahymena LSU intron) have been used to examine essential and nonessential RNA sequences and structures in RNA-catalyzed splicing. In vivo and in vitro genetic analysis of the intron within the bacteriophage T4 td gene has permitted the detailed examination of mutant phenotypes by analyzing splicing in vivo and self-splicing in vitro. The genetic studies combined with phylogenetic analysis of intron structure based on comparative nucleotide sequence data [Cech 73 (1988) 259-271] and with biochemical data obtained from in vitro splicing experiments have resulted in significant advances in understanding the biology and chemistry of group-I introns.

19.
A large amount of raw biological sequence data has been generated by the human genome project and related efforts. Understanding the structural information encoded by biological sequences is important for learning their biochemical functions but remains a fundamental challenge. Recent interest in RNA regulation has resulted in a rapid growth of deposited RNA secondary structures in various databases. However, the functional classification and characterization of RNA structures have only been partially addressed. This article aims to introduce a novel interval-based distance metric for structure-based RNA function assignment. The characterization of RNA structures relies on distance vectors learned from a collection of predicted structures. The distance measure considers intersection, disjointness, and inclusion between intervals. A set of RNA pseudoknotted structures with known function is applied, and the function of the query structure is determined by measuring structure similarity. This not only offers sequence distance criteria to measure the similarity of secondary structures but also aids the functional classification of RNA structures with pseudoknots.

20.
GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i.e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the sheer volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language.
