Similar literature
 20 similar articles found; search took 31 ms
1.
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous super-position (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 'orphans' (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. 
The database, with its web-interfaced search and dendrogram generation tools, can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.

2.
The Los Alamos hepatitis C sequence database   Cited by: 6 (self-citations: 0; other citations: 6)
MOTIVATION: The hepatitis C virus (HCV) is a significant threat to public health worldwide. The virus is highly variable and evolves rapidly, making it an elusive target for the immune system and for vaccine and drug design. At present, some 30 000 HCV sequences have been published. A central website that provides annotated sequences and analysis tools will be helpful to HCV scientists worldwide. RESULTS: The HCV sequence database collects and annotates sequence data and provides them to the public via a website that contains a user-friendly search interface and a large number of sequence analysis tools, based on the model of the highly regarded Los Alamos HIV database. The HCV sequence database was officially launched in September 2003. Since then, its usage has steadily increased and is now at an average of approximately 280 visits per day from distinct IP addresses. AVAILABILITY: The HCV website can be accessed via http://hcv.lanl.gov and http://hcv-db.org.

3.
Sequencing ribosomal RNA (rRNA) genes is currently the method of choice for phylogenetic reconstruction, nucleic acid based detection and quantification of microbial diversity. The ARB software suite with its corresponding rRNA datasets has been accepted by researchers worldwide as a standard tool for large scale rRNA analysis. However, the rapid increase of publicly available rRNA sequence data has recently hampered the maintenance of comprehensive and curated rRNA knowledge databases. A new system, SILVA (from Latin silva, forest), was implemented to provide a central comprehensive web resource for up to date, quality controlled databases of aligned rRNA sequences from the Bacteria, Archaea and Eukarya domains. All sequences are checked for anomalies, carry a rich set of sequence associated contextual information, have multiple taxonomic classifications, and the latest validly described nomenclature. Furthermore, two precompiled sequence datasets compatible with ARB are offered for download on the SILVA website: (i) the reference (Ref) datasets, comprising only high quality, nearly full length sequences suitable for in-depth phylogenetic analysis and probe design and (ii) the comprehensive Parc datasets with all publicly available rRNA sequences longer than 300 nucleotides suitable for biodiversity analyses. The latest publicly available database release 91 (August 2007) hosts 547 521 sequences split into 461 823 small subunit and 85 689 large subunit rRNAs.

4.
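Taxonomic identification of seven Amanita strains based on morphological characteristics and ITS sequences   Cited by: 7 (self-citations: 0; other citations: 7)
Seven Amanita strains collected from the Lishui area of Zhejiang Province were used as study material. After a preliminary identification based on morphological characteristics, the rDNA ITS regions of the seven strains were cloned and sequenced, and their sequence features were compared. The ITS sequences were then searched against the GenBank nucleotide sequence database for homologous sequences, and the ITS sequences of the nine most similar species retrieved from GenBank were analysed phylogenetically together with those of the seven Amanita strains. The results showed that ITS-based molecular identification of the three strains f6, f9 and f49 supported the morphological identification, whereas molecular identification of f5 did not. Strain f8 is an undetermined species within Amanita; f66 is also an undetermined species within Amanita and is closely related to three species, Amanita fulva, A. atrofusca and A. orientifulva; f7 is only distantly related to the other six strains. These results suggest that molecular ITS sequence analysis cannot serve on its own as a reliable basis for the taxonomic identification of macrofungi, but it can serve as a supporting reference for identification based on traditional morphology.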

5.
Abstract

Molecular sequence data have become prominent tools for phylogenetic relationship inference, particularly useful in the analysis of highly diverse taxonomic orders. Ribosomal RNA sequences provide markers that can be used in the study of phylogeny, because their function and structure have been conserved to a large extent throughout the evolutionary history of organisms. These sequences are inferred from cloned or enzymatically amplified gene sequences, or determined by direct RNA sequencing. The first step in the phylogenetic interpretation of nucleic acid sequence variations requires proper alignment of corresponding sequences from various organisms. Best alignment based on similarity criteria is greatly reinforced, in the case of ribosomal RNAs, by secondary structure homologies. Distance matrix methods to infer evolutionary trees are based on the assumption that the phylogenetic distance between each pair of organisms is proportional to the number of nucleotide substitution events. Computed tree inference methods usually take into consideration the possibility of unequal mutation rates among lineages. Divergence times can be estimated on the tree, provided that at least one lineage has been dated by fossil records. We have utilized this approach based on ribosomal RNA sequence comparison to investigate the phylogenetic relationship between dinoflagellates and other eukaryote protists, and to refine controversial phylogenies of the class Dinophyceae.
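As a rough illustration of the distance matrix idea in this abstract, the following Python sketch computes pairwise distances from an alignment. It is not the authors' procedure: the Jukes-Cantor correction is swapped in here as one common way to turn observed differences into an estimated number of substitutions, and the function names `p_distance`, `jukes_cantor` and `distance_matrix` are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites among aligned positions without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Map every ordered pair of names in the alignment to a corrected distance."""
    names = list(alignment)
    return {
        (x, y): jukes_cantor(p_distance(alignment[x], alignment[y]))
        for x in names
        for y in names
    }
```

The correction matters because repeated substitutions at the same site hide earlier changes, so the raw proportion of differences underestimates the true number of events for distant pairs.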

6.

Background  

The availability of high-resolution RNA crystal structures for the 30S and 50S ribosomal subunits, and the subsequent validation of comparative secondary structure models, have prompted biologists to use the three-dimensional structure of ribosomal RNA (rRNA) to evaluate sequence alignments of rRNA genes. Furthermore, the secondary and tertiary structural features of rRNA are highly useful and have been successfully employed in designing rRNA-targeted oligonucleotide probes intended for in situ hybridization experiments. RNA3D, a program that combines sequence alignment information with the three-dimensional structure of rRNA, was developed. Its integration into the ARB software package, which is used extensively by the scientific community for phylogenetic analysis and molecular probe design, has substantially extended the functionality of the ARB software suite with a 3D environment.

7.
We describe PRIMROSE, a computer program for identifying 16S rRNA probes and PCR primers for use as phylogenetic and ecological tools in the identification and enumeration of bacteria. PRIMROSE is designed to use data from the Ribosomal Database Project (RDP) to find potentially useful oligonucleotides with up to two degenerate positions. The taxonomic range of these, and other existing oligonucleotides, can then be explored, allowing for the rapid identification of suitable oligonucleotides. PRIMROSE includes features to allow user-defined sequence databases to be used. An in silico trial of the program using the RDP database identified oligonucleotides that described their target taxa with a degree of accuracy far greater than that of equivalent currently used oligonucleotides. We identify oligonucleotides for subdivisions of the Proteobacteria and for the Cytophaga–Flexibacter–Bacteroides (CFB) division. These oligonucleotides describe up to 94.7% of their target taxon with fewer than 50 non-target hits, and the authors recommend that they be investigated further. A comparison with PROBE DESIGN within the ARB software package shows that PRIMROSE is capable of identifying oligonucleotides with a higher specificity. PRIMROSE has an intuitive graphical user interface and runs on the Microsoft Windows 95/NT/2000 operating systems. It is open source and is freely available from the authors.
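The notion of an oligonucleotide "describing" a target taxon with few non-target hits can be sketched in Python as below. This is only an illustration under stated assumptions, not PRIMROSE's actual algorithm: degenerate positions are expanded with the standard IUPAC nucleotide codes, matching is exact with no mismatches allowed, and the names `hits` and `coverage` are hypothetical.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def hits(oligo, sequence):
    """True if the (possibly degenerate) oligo matches anywhere in the sequence."""
    oligo = oligo.upper()
    sequence = sequence.upper().replace("U", "T")  # treat RNA as DNA
    for start in range(len(sequence) - len(oligo) + 1):
        if all(sequence[start + i] in IUPAC[base] for i, base in enumerate(oligo)):
            return True
    return False


def coverage(oligo, sequences, targets):
    """Return (fraction of target sequences hit, number of non-target hits)."""
    target_hits = sum(1 for name in targets if hits(oligo, sequences[name]))
    non_target_hits = sum(
        1 for name in sequences
        if name not in targets and hits(oligo, sequences[name])
    )
    return target_hits / len(targets), non_target_hits
```

For instance, an oligonucleotide that hits 947 of 1000 target sequences and 40 sequences outside the taxon would give `(0.947, 40)` under this definition.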

8.
9.
Bacterial phylogeny based on 16S and 23S rRNA sequence analysis   Cited by: 28 (self-citations: 0; other citations: 28)
Abstract: Molecular phylogeny increasingly supports the understanding of organismal relationships and provides the basis for the classification of microorganisms according to their natural affiliations. Comparative sequence analysis of ribosomal RNAs or the corresponding genes currently is the most widely used approach for the reconstruction of microbial phylogeny. The highly and less conserved primary and higher order structure elements of rRNAs document the history of microbial evolution and are informative for definite phylogenetic levels. An optimal alignment of the primary structures and a careful data selection are prerequisites for reliable phylogenetic conclusions. rRNA based phylogenetic trees can be reconstructed and the significance of their topologies evaluated by applying distance, maximum parsimony and maximum likelihood methods of phylogeny inference in comparison, and by fortuitous or directed resampling of the data set. Phylogenetic trees based on almost equivalent data sets of bacterial 23S and 16S rRNAs are in good agreement and their overall topologies are supported by alternative phylogenetic markers such as elongation factors and ATPase subunits. Besides their phylogenetic information content, the differently conserved primary structure regions of rRNAs provide target sites for specific hybridization probes which have been proven to be powerful tools for the identification of microbes on the basis of their phylogenetic relationships.

10.
We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen oriented user interface, and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to automatically annotate printouts and translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 X 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described.

11.
T-REX (tree and reticulogram reconstruction) is an application to reconstruct phylogenetic trees and reticulation networks from distance matrices. The application includes a number of tree fitting methods like NJ, UNJ or ADDTREE which have been very popular in phylogenetic analysis. At the same time, the software comprises several new methods of phylogenetic analysis such as: tree reconstruction using weights, tree inference from incomplete distance matrices or modeling a reticulation network for a collection of objects or species. T-REX also allows the user to visualize obtained tree or network structures using Hierarchical, Radial or Axial types of tree drawing and manipulate them interactively. AVAILABILITY: T-REX is a freeware package available online at: http://www.fas.umontreal.ca/biol/casgrain/en/labo/t-rex
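To make the tree-fitting step concrete, here is a minimal Python sketch of the textbook neighbor-joining (NJ) procedure, one of the methods named above. It is not T-REX's code: the function name `neighbor_joining`, the nested-dictionary input format and the internal labels `node0`, `node1`, ... are assumptions made for illustration, and the input labels are assumed not to clash with those internal labels.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree from a symmetric distance matrix.

    names: list of taxon labels
    dist:  dict of dicts, dist[a][b] = distance between a and b (a != b)
    Returns a list of (node, neighbour, branch_length) edges.
    """
    d = {a: dict(dist[a]) for a in names}  # working copy of the distances
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - total(a) - total(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = f"node{counter}"  # hypothetical label for the new internal node
        counter += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, new, len_a))
        edges.append((b, new, len_b))
        # Distances from the new node to every remaining node.
        d[new] = {}
        for c in nodes:
            if c in (a, b):
                continue
            d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges
```

When the input distances are exactly additive on some tree, this procedure recovers that tree and its branch lengths; with real data the result is an approximation, which is one reason packages offer several alternative fitting methods.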

12.
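Sequence analysis extracts the biological information contained in simple sequences and is the foundation of bioinformatic analysis. Phylogenetic trees built by analysing sequence differences among biological macromolecules provide a visual representation of the evolutionary relationships among species. The MATLAB 7.x Bioinformatics Toolbox includes several dedicated analysis tools with graphical user interfaces; these tools are highly interactive and easy to use. With them, users can not only inspect and analyse gene sequences and carry out the corresponding amino acid sequence analysis, but also build phylogenetic trees quickly and conveniently. Even users who cannot program can carry out sequence analysis and phylogenetic analysis, which greatly improves the efficiency of the work. This paper describes in detail the use of the sequence analysis tool Seqtool and the phylogenetic analysis tool Phytreetool for sequence analysis and phylogenetic tree construction; all operations are quick and convenient, and the results are highly visual.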

13.
The signing authors together with the journal Systematic and Applied Microbiology (SAM) have started an ambitious project that has been conceived to provide a useful tool especially for the scientific microbial taxonomist community. The aim of what we have called "The All-Species Living Tree" is to reconstruct a single 16S rRNA tree harboring all sequenced type strains of the hitherto classified species of Archaea and Bacteria. This tree is to be regularly updated by adding the species with validly published names that appear monthly in the Validation and Notification lists of the International Journal of Systematic and Evolutionary Microbiology. For this purpose, the SAM executive editors, together with the responsible teams of the ARB, SILVA, and LPSN projects (www.arb-home.de, www.arb-silva.de, and www.bacterio.cict.fr, respectively), have prepared a 16S rRNA database containing over 6700 sequences, each of which represents a single type strain of a classified species up to 31 December 2007. The selection of sequences had to be undertaken manually due to a high error rate in the names and information fields provided for the publicly deposited entries. In addition, from among the often occurring multiple entries for a single type strain, the best-quality sequence was selected for the project. The living tree database that SAM now provides contains corrected entries and the best-quality sequences with a manually checked alignment. The tree reconstruction has been performed by using the maximum likelihood algorithm RAxML. The tree provided in the first release is a result of the calculation of a single dataset containing 9975 single entries, 6728 corresponding to type strain gene sequences, as well as 3247 additional high-quality sequences to give robustness to the reconstruction. Trees are dynamic structures that change on the basis of the quality and availability of the data used for their calculation. 
Therefore, the addition of new type strain sequences in subsequent releases may help to resolve certain branching orders that appear ambiguous in this first release. On the web sites www.elsevier.de/syapm and www.arb-silva.de/living-tree, the All-Species Living Tree team will release a regularly updated database compatible with the ARB software environment containing the whole 16S rRNA dataset used to reconstruct "The All-Species Living Tree". As a result, the latest reconstructed phylogeny will be provided. In addition to the ARB file, a readable multi-FASTA universal sequence editor file with the complete alignment will be provided for those not using ARB. There is also a complete set of supplementary tables and figures illustrating the selection procedure and its outcome. It is expected that the All-Species Living Tree will help to improve future classification efforts by simplifying the selection of the correct type strain sequences. For queries, information updates, and remarks on the dataset or tree reconstructions shown, a contact email address has been created (living-tree@arb-silva.de). This provides an entry point for anyone from the scientific community to provide additional input for the construction and improvement of the first tree compiling all sequenced type strains of all prokaryotic species for which names had been validly published.

14.
We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout (https://www.bioconductor.org/packages/ggtreeExtra). The package supports more data types and visualization methods than other tools. It supports using the grammar of graphics syntax to present data on a tree with richly annotated layers and allows evolutionary statistics inferred by commonly used software to be integrated and visualized with external data. ggtreeExtra is a universal tool for tree data visualization. It extends the applications of the phylogenetic tree in different disciplines by making more domain-specific data available for visualization and interpretation in an evolutionary context.

15.
apTreeshape: statistical analysis of phylogenetic tree shape   Cited by: 3 (self-citations: 0; other citations: 3)
apTreeshape is an R package dedicated to simulation and analysis of phylogenetic tree topologies using statistical imbalance measures. It is a companion library of the R package 'ape', which provides additional functions for reading, plotting, manipulating phylogenetic trees and for connecting to public phylogenetic tree databases. One strength of the package is to include appropriate corrections of classical shape statistics as well as new tests based on the statistical theory of likelihood ratios.
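For readers unfamiliar with imbalance measures, the Python sketch below computes two classical shape statistics, the Colless and Sackin indices, on a rooted binary tree. It is an illustration only and does not reproduce the apTreeshape API (which is in R): the nested 2-tuple tree encoding and the function names `colless` and `sackin` are assumptions.

```python
def colless(tree):
    """Return (leaf_count, colless_index) for a binary tree of nested 2-tuples.

    The Colless index sums, over all internal nodes, the absolute difference
    between the number of leaves in the left and right subtrees.
    """
    if not isinstance(tree, tuple):
        return 1, 0  # a leaf contributes one tip and no imbalance
    left, right = tree
    n_left, c_left = colless(left)
    n_right, c_right = colless(right)
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def sackin(tree, depth=0):
    """Sackin index: the sum of the depths of all leaves."""
    if not isinstance(tree, tuple):
        return depth
    return sum(sackin(child, depth + 1) for child in tree)


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(colless(balanced)[1], colless(caterpillar)[1])
print(sackin(balanced), sackin(caterpillar))
```

Both statistics grow as the tree becomes more ladder-like, so the fully unbalanced four-tip tree scores higher than the symmetric one on each measure.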

16.
The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. As ITS2 research mainly focused on the very variable ITS2 sequence, it confined this marker to low-level phylogenetics only. However, the combination of the ITS2 sequence and its highly conserved secondary structure improves the phylogenetic resolution [1] and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation [2-8]. The ITS2 Database [9] presents an exhaustive dataset of internal transcribed spacer 2 sequences from NCBI GenBank [11] accurately reannotated [10]. Following an annotation by profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, it is tested whether a minimum energy based fold [12] (direct fold) results in a correct, four helix conformation. If this is not the case, the structure is predicted by homology modeling [13]. In homology modeling, an already known secondary structure is transferred to another ITS2 sequence, whose secondary structure was not able to fold correctly in a direct fold. The ITS2 Database is not only a database for storage and retrieval of ITS2 sequence-structures. It also provides several tools to process your own ITS2 sequences, including annotation, structural prediction, motif detection and BLAST [14] search on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE [15,16] and ProfDistS [17] for multiple sequence-structure alignment calculation and Neighbor Joining [18] tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure. In a nutshell, this workbench simplifies first phylogenetic analyses to only a few mouse-clicks, while additionally providing tools and data for comprehensive large-scale analyses.

17.
Networks of evolving genotypes can be constructed from the worldwide time-resolved genotyping of pathogens like influenza viruses. Such genotype networks are graphs where neighbouring vertices (viral strains) differ in a single nucleotide or amino acid. A rich trove of network analysis methods can help understand the evolutionary dynamics reflected in the structure of these networks. Here, I analyse a genotype network comprising hundreds of influenza A (H3N2) haemagglutinin genes. The network is rife with cycles that reflect non-random parallel or convergent (homoplastic) evolution. These cycles also show patterns of sequence change characteristic for strong and local evolutionary constraints, positive selection and mutation-limited evolution. Such cycles would not be visible on a phylogenetic tree, illustrating that genotype network analysis can complement phylogenetic analyses. The network also shows a distinct modular or community structure that reflects temporal more than spatial proximity of viral strains, where lowly connected bridge strains connect different modules. These and other organizational patterns illustrate that genotype networks can help us study evolution in action at an unprecedented level of resolution.
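The definition of a genotype network given above (vertices are sequences, edges join sequences differing at a single position) is simple enough to sketch in Python. This is an illustrative reconstruction, not the author's code: the function names are hypothetical, sequences are assumed to be aligned and of equal length, and cycles are counted via the cycle rank E - V + C (edges minus vertices plus connected components), which is one possible way to quantify how far the network departs from a tree.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def genotype_network(genotypes):
    """Adjacency sets linking distinct genotypes that differ at exactly one site."""
    unique = sorted(set(genotypes))
    graph = {g: set() for g in unique}
    for a, b in combinations(unique, 2):
        if hamming(a, b) == 1:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cycle_rank(graph):
    """Number of independent cycles: edges - vertices + connected components."""
    edges = sum(len(neighbours) for neighbours in graph.values()) // 2
    seen = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph[node] - seen)
    return edges - len(graph) + components
```

In a toy example with genotypes AA, AC, CA and CC, the genotype CC can be reached from AA by two mutations in either order, which closes a four-vertex cycle; a tree, by construction, has a cycle rank of zero and cannot show this.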

18.
Aromatic prenyltransferases transfer prenyl moieties onto aromatic acceptor molecules, catalyzing an electrophilic substitution of the aromatic ring under formation of carbon–carbon bonds. They give rise to an astounding diversity of primary and secondary metabolites in plants, fungi and bacteria. This review describes a recently discovered family of aromatic prenyltransferases. The structure of these enzymes shows a type of β/α fold with antiparallel β strands. Due to the α-β-β-α architecture of this fold, this group of enzymes was designated as ABBA prenyltransferases. They lack the (N/D)DxxD motif which is characteristic for many other prenyltransferases. At present, 14 genes with sequence similarity to ABBA prenyltransferases can be identified in the database. A phylogenetic analysis of these genes separates them into two clades. One of them comprises the 4-hydroxyphenylpyruvate 3-dimethylallyltransferases CloQ and NovQ involved in aminocoumarin antibiotic biosynthesis in Streptomyces strains, as well as four genes of unknown function from fungal genomes. The other clade comprises genes involved in the biosynthesis of prenylated naphthoquinones and prenylated phenazines in different streptomycetes. ABBA prenyltransferases are soluble biocatalysts which can easily be obtained as homogeneous proteins in significant amounts. Their substrates are accommodated in a surprisingly spacious central cavity which explains their promiscuity for different aromatic substrates. Therefore, the enzymes of this family represent attractive tools for the chemoenzymatic synthesis of bioactive molecules.

19.
The GenBank database contains essentially all of the nucleotide sequence data generated for published molecular systematic studies, but for the majority of taxa these data remain sparse. GenBank has value for phylogenetic methods that leverage data–mining and rapidly improving computational methods, but the limits imposed by the sparse structure of the data are not well understood. Here we present a tree representing 13,093 land plant genera—an estimated 80% of extant plant diversity—to illustrate the potential of public sequence data for broad phylogenetic inference in plants, and we explore the limits to inference imposed by the structure of these data using theoretical foundations from phylogenetic data decisiveness. We find that despite very high levels of missing data (over 96%), the present data retain the potential to inform over 86.3% of all possible phylogenetic relationships. Most of these relationships, however, are informed by small amounts of data—approximately half are informed by fewer than four loci, and more than 99% are informed by fewer than fifteen. We also apply an information theoretic measure of branch support to assess the strength of phylogenetic signal in the data, revealing many poorly supported branches concentrated near the tips of the tree, where data are sparse and the limiting effects of this sparseness are stronger. We argue that limits to phylogenetic inference and signal imposed by low data coverage may pose significant challenges for comprehensive phylogenetic inference at the species level. Computational requirements provide additional limits for large reconstructions, but these may be overcome by methodological advances, whereas insufficient data coverage can only be remedied by additional sampling effort. 
We conclude that public databases have exceptional value for modern systematics and evolutionary biology, and that a continued emphasis on expanding taxonomic and genomic coverage will play a critical role in developing these resources to their full potential.

20.
Identification of ectomycorrhizal (ECM) fungi is often achieved through comparisons of ribosomal DNA internal transcribed spacer (ITS) sequences with accessioned sequences deposited in public databases. A major problem encountered is that annotation of the sequences in these databases is not always complete or trustworthy. In order to overcome this deficiency, we report on UNITE, an open-access database. UNITE comprises well annotated fungal ITS sequences from well defined herbarium specimens that include full herbarium reference identification data, collector/source and ecological data. At present UNITE contains 758 ITS sequences from 455 species and 67 genera of ECM fungi. UNITE can be searched by taxon name, via sequence similarity using blastn, and via phylogenetic sequence identification using galaxie. Following implementation, galaxie performs a phylogenetic analysis of the query sequence after alignment either to pre-existing generic alignments, or to matches retrieved from a blast search on the UNITE data. It should be noted that the current version of UNITE is dedicated to the reliable identification of ECM fungi. The UNITE database is accessible through the URL http://unite.zbi.ee
