Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
We developed novel programs for displaying and analyzing the transmembrane alpha-helical segments (TMSs) in the aligned sequences of homologous integral membrane proteins. TMS_ALIGN predicts the positions of putative TMSs in multiply aligned protein sequences and graphically shows the TMSs in the alignment. TMS_SPLIT (1) predicts the positions of TMSs for each sequence, (2) allows a user to select proteins with a specified number of TMSs, and (3) splits the sequences into groups of TMSs of equal numbers. TMS_CUT works like TMS_SPLIT, but it can cut sequences with any combination of TMSs. The BASS program similarly allows comparison of protein repeat elements, equivalent to TMS_SPLIT plus IntraCompare (IC), but it provides the comparison data expressed in BLAST E values. These programs, together with the IntraCompare program, facilitate the identification of repeat sequences in integral membrane proteins. They also facilitate the estimation of protein topology and the determination of evolutionary pathways.

2.
MOTIVATION: Evolutionary relationships of proteins have long been derived from the alignment of protein sequences. But from the view of function, most restraints of evolutionary divergence operate at the level of tertiary structure. It has been demonstrated that quantitative measures of dissimilarity in families of structurally similar proteins can be applied to the construction of trees from a comparison of their three-dimensional structures. However, no convenient tool is publicly available to carry out such analyses. RESULTS: We developed STRUCLA (STRUcture CLAssification), a WWW tool for generation of trees based on evolutionary distances inferred from protein structures according to various methods. The server takes as input a list of PDB files or the initial alignment of protein coordinates provided by the user (for instance, exported from SWISS PDB VIEWER). The user specifies the distance cutoff and selects the distance measures. The server returns a series of unrooted trees in the NEXUS format and corresponding distance matrices, as well as a consensus tree. The results can be used as an alternative and a complement to the fixed hierarchy of current protein structure databases. They can also complement sequence-based phylogenetic analysis in the 'twilight zone of homology', where amino acid sequences are too diverged to provide reliable relationships.

3.
The design of synthetic genes   Total citations: 1, self-citations: 1, citations by others: 0
Computer programs are described that aid in the design of synthetic genes coding for proteins that are targets of a research program in site-directed mutagenesis. These programs "reverse-translate" protein sequences into general nucleic acid sequences (those where codons have not yet been selected), map restriction sites into general DNA sequences, identify points in the synthetic gene where unique restriction sites can be introduced, and assist in the design of genes coding for hybrids and evolutionary intermediates between homologous proteins. Application of these programs therefore facilitates the use of modular mutagenesis to create variants of proteins, and the implementation of evolutionary guidance as a strategy for selecting mutants.
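
The "reverse-translation" step in this abstract can be illustrated with a minimal Python sketch. It is not the authors' program: it assumes the standard genetic code, and the names `reverse_translate` and `count_encodings` are hypothetical. A "general" nucleic acid sequence is represented here as one list of synonymous codons per residue, so that no codon has been selected yet.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invert the table: amino acid -> all codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def reverse_translate(protein):
    """Return, for each residue, the list of codons that could encode it."""
    return [SYNONYMS[aa] for aa in protein.upper()]


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the protein."""
    return prod(len(codons) for codons in reverse_translate(protein))


if __name__ == "__main__":
    print(reverse_translate("MW"))   # each residue has a single codon
    print(count_encodings("MLS"))    # 1 * 6 * 6 possible encodings
```

Keeping every synonymous codon open is what leaves room for the later steps the abstract mentions, such as choosing codons that introduce a unique restriction site.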

4.
HCVDB   Total citations: 2, self-citations: 0, citations by others: 2
To date, more than 30 000 hepatitis C virus (HCV) sequences have been deposited in the generalist databases DNA Data Bank of Japan (DDBJ), EMBL Nucleotide Sequence Database (EMBL) and GenBank. The main difficulties with HCV sequences in these databases are their retrieval, annotation and analyses. To help HCV researchers face the increasing needs of HCV sequence analyses, we developed a specialised database of computer-annotated HCV sequences, called HCVDB. HCVDB is re-built every month from an up-to-date EMBL database by an automated process. HCVDB provides key data about the HCV sequences (e.g. genotype, genomic region, protein names and functions, known 3-dimensional structures) and ensures consistency of the annotations, which enables reliable keyword queries. The database is highly integrated with sequence and structure analysis tools and the SRS (LION bioscience) keywords query system. Thus, any user can extract subsets of sequences matching particular criteria or enter their own sequences and analyse them with various bioinformatics programs available on the same server. AVAILABILITY: HCVDB is available from http://hepatitis.ibcp.fr.

5.
Reconstructing the evolutionary history of protein sequences will provide a better understanding of divergence mechanisms of protein superfamilies and their functions. Long-term protein evolution often includes dynamic changes such as insertion, deletion, and domain shuffling. Such dynamic changes make reconstructing protein sequence evolution difficult and affect the accuracy of molecular evolutionary methods, such as multiple alignments and phylogenetic methods. Unfortunately, currently available simulation methods are not sufficiently flexible and do not allow biologically realistic dynamic protein sequence evolution. We introduce a new method, indel-Seq-Gen (iSG), that can simulate realistic evolutionary processes of protein sequences with insertions and deletions (indels). Unlike other simulation methods, iSG allows the user to simulate multiple subsequences according to different evolutionary parameters, which is necessary for generating realistic protein families with multiple domains. iSG tracks all evolutionary events including indels and outputs the "true" multiple alignment of the simulated sequences. iSG can also generate a larger sequence space by allowing the use of multiple related root sequences. With all these functions, iSG can be used to test the accuracy of, for example, multiple alignment methods, phylogenetic methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein family classification methods. We empirically evaluated the performance of iSG against currently available methods by simulating the evolution of the G protein-coupled receptor and lipocalin protein families. We examined their true multiple alignments, reconstruction of the transmembrane regions and beta-strands, and the results of similarity search against a protein database using the simulated sequences. We also presented an example of using iSG for examining how phylogenetic reconstruction is affected by high indel rates.

6.
Each amino acid in a protein is considered to be an individual, mutable characteristic of the species from which the protein is extracted. For a branching tree representing the evolutionary history of the known sequences in different species, our computer programs use majority logic and parsimony of mutations to determine the most likely ancestral amino acid for each position of the protein at each node of the tree. The number of mutations necessary between the ancestral and present species is summed for each branch and the entire tree. The programs then move branches to make many different configurations, from which we select the one with the minimum number of mutations as the most likely evolutionary history. We used this method to elucidate primate phylogeny from sequences of fibrinopeptides, carbonic anhydrase, and the hemoglobin beta, delta and alpha chains. All available sequences indicate that the early Pongidae had diverged into two lines before the divergence of an ancestor for the human line alone. We have constructed some probable ancestral sequences at major points during primate evolution and have developed tentative trees showing the order of divergences and evolutionary distances among primate groups. Further questions on primate evolution could be answered in the future by the determination of the appropriate sequences.
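
The idea of counting the mutations a tree requires and preferring the configuration with the fewest can be sketched with Fitch's small-parsimony algorithm. This is a stand-in technique chosen for illustration, not a reconstruction of the authors' programs; the function names, the nested-tuple tree encoding and the toy two-residue sequences are all assumptions.

```python
def fitch(tree, states):
    """Fitch small parsimony for one alignment column.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping leaf name -> residue at this column
    Returns (candidate ancestral residues at this node, minimum mutations).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one residue: no extra mutation.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one mutation.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Sum the minimum mutation counts over all alignment columns."""
    n_columns = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_columns)
    )


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    alignment = {"human": "AV", "chimp": "AV", "gorilla": "AL", "orang": "GL"}
    candidates = [
        (("human", "chimp"), ("gorilla", "orang")),
        (("human", "gorilla"), ("chimp", "orang")),
    ]
    best = min(candidates, key=lambda t: tree_length(t, alignment))
    print(best, tree_length(best, alignment))
```

Scoring each candidate configuration and keeping the minimum mirrors the branch-moving search described in the abstract, although a real search would explore far more topologies than the two listed here.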

7.
ProteoMix is a suite of JAVA programs for identifying, annotating and predicting regions of interest in large sets of amino acid sequences, according to systematic and consistent criteria. It is based on two concepts: (1) the integration of results from different sequence analysis tools increases the prediction reliability; and (2) the integration protocol is critical and needs to be easily adaptable in a case-by-case manner. ProteoMix was designed to analyze simultaneously multiple protein sequences using several bioinformatics tools, merge the results of the analyses using logical functions and display them on an integrated viewer. In addition, new sequences can be added seamlessly to an analysis performed on an initial set of sequences. ProteoMix has a modular design, and bioinformatics tools are run on remote servers accessed using the Internet Simple Object Access Protocol (SOAP), ensuring the swift implementation of additional tools. ProteoMix has a user-friendly interactive graphical user interface environment and runs on PCs with Microsoft OS. AVAILABILITY: ProteoMix is freely available for academic users at http://bio.gsc.riken.jp/ProteoMix/

8.

Background  

The availability of complete genomic sequences for hundreds of organisms promises to make obtaining genome-wide estimates of substitution rates, selective constraints and other molecular evolution variables of interest an increasingly important approach to addressing broad evolutionary questions. Two of the programs most widely used for this purpose are codeml and baseml, parts of the PAML (Phylogenetic Analysis by Maximum Likelihood) suite. A significant drawback of these programs is their lack of a graphical user interface, which can limit their user base and considerably reduce their efficiency.

9.
A web-based resource, Microbial Community Analysis (MiCA), has been developed to facilitate studies on microbial community ecology that use analyses of terminal-restriction fragment length polymorphisms (T-RFLP) of 16S and 18S rRNA genes. MiCA provides an intuitive web interface to access two specialized programs and a specially formatted database of 16S ribosomal RNA sequences. The first program performs virtual polymerase chain reaction (PCR) amplification of rRNA genes and restriction of the amplicons using primer sequences and restriction enzymes chosen by the user. This program, in silico PCR and Restriction (ISPaR), uses a binary encoding of DNA sequences to rapidly scan large numbers of sequences in databases searching for primer annealing and restriction sites while permitting the user to specify the number of mismatches in primer sequences. ISPaR supports multiple digests with up to three enzymes. The number of base pairs between the 5′ and 3′ primers and the proximal restriction sites can be reported, printed, or exported in various formats. The second program, APLAUS, infers a plausible community structure(s) based on T-RFLP data supplied by a user. APLAUS estimates the relative abundances of populations and reports a listing of phylotypes that are consistent with the empirical data. MiCA is accessible at .
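
The abstract does not say how ISPaR encodes DNA, so the following Python sketch shows only one plausible reading of "binary encoding": each base becomes a 4-bit mask (one bit per nucleotide), IUPAC ambiguity codes become the OR of their bases, and a primer position anneals when the bitwise AND of the two masks is non-zero. The function names and the example sequences are hypothetical.

```python
# One bit per nucleotide; ambiguity codes are unions of those bits.
CODE = {
    "A": 1, "C": 2, "G": 4, "T": 8,
    "R": 5, "Y": 10, "S": 6, "W": 9, "K": 12, "M": 3,
    "B": 14, "D": 13, "H": 11, "V": 7, "N": 15,
}


def encode(seq):
    """Encode a DNA string as a list of 4-bit masks."""
    return [CODE[base] for base in seq.upper()]


def find_sites(template, primer, max_mismatches=0):
    """Return (start, mismatches) for every window within the mismatch limit."""
    t, p = encode(template), encode(primer)
    hits = []
    for start in range(len(t) - len(p) + 1):
        window = t[start:start + len(p)]
        # A position mismatches when the two masks share no nucleotide bit.
        mismatches = sum(1 for a, b in zip(window, p) if (a & b) == 0)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    template = "GGAATTCCAGAATTTC"
    print(find_sites(template, "GAATTC"))                    # exact matches only
    print(find_sites(template, "GAATTC", max_mismatches=1))  # tolerate one mismatch
```

The same scan serves for restriction sites (with `max_mismatches=0`) and for primers, where the user-specified mismatch tolerance applies. A production scanner would pack several masks into one machine word to compare many positions per operation, which this sketch does not attempt.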

10.
We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen-oriented user interface, and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to automatically annotate printouts and translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 × 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described.

11.
The genomic signatures of positive selection and evolutionary constraints can be detected by analyses of nucleotide sequences. One of the most widely used programs for this purpose is CodeML, part of the PAML package. Although a number of bioinformatics tools have been developed to facilitate the use of CodeML, these have various limitations. Here, we present a wrapper tool named EasyCodeML that provides a user-friendly graphical interface for using CodeML. EasyCodeML has a custom running mode in which parameters can be adjusted to meet different requirements. It also offers a preset running mode in which an evolutionary analysis pipeline and publication-quality tables can be exported by a single click. EasyCodeML allows visualized, interactive tree labelling, which greatly simplifies the use of the branch, branch-site, and clade models of selection. The program allows comparison of major codon-based models for analyses of selection. EasyCodeML is a stand-alone package that is supported in Windows, Mac, and Linux operating systems, and is freely available at https://github.com/BioEasy/EasyCodeML .

12.
Substitution matrices have been useful for sequence alignment and protein sequence comparisons. The BLOSUM series of matrices, which had been derived from a database of alignments of protein blocks, improved the accuracy of alignments previously obtained from the PAM-type matrices estimated from only closely related sequences. Although BLOSUM matrices are scoring matrices now widely used for protein sequence alignments, they do not describe an evolutionary model. BLOSUM matrices do not permit the estimation of the actual number of amino acid substitutions between sequences by correcting for multiple hits. The method presented here uses the Blocks database of protein alignments, along with the additivity of evolutionary distances, to approximate the amino acid substitution probabilities as a function of actual evolutionary distance. The PMB (Probability Matrix from Blocks) defines a new evolutionary model for protein evolution that can be used for evolutionary analyses of protein sequences. Our model is directly derived from, and thus compatible with, the BLOSUM matrices. The model has the additional advantage of being easily implemented.

13.
Translational control directed by the eukaryotic translation initiation factor 2 alpha-subunit (eIF2alpha) kinase GCN2 is important for coordinating gene expression programs in response to nutritional deprivation. The GCN2 stress response, conserved from yeast to mammals, is critical for resistance to nutritional deficiencies and for the control of feeding behaviors in rodents. The mouse protein IMPACT has sequence similarities to the yeast YIH1 protein, an inhibitor of GCN2. YIH1 competes with GCN2 for binding to a positive regulator, GCN1. Here, we present evidence that IMPACT is the functional counterpart of YIH1. Overexpression of IMPACT in yeast lowered both basal and amino acid starvation-induced levels of phosphorylated eIF2alpha, as described for YIH1 (31). Overexpression of IMPACT in mouse embryonic fibroblasts inhibited phosphorylation of eIF2alpha by GCN2 under leucine starvation conditions, abolishing expression of its downstream target genes, ATF4 (CREB-2) and CHOP (GADD153). IMPACT bound to the minimal yeast GCN1 segment required for interaction with yeast GCN2 and YIH1 and to native mouse GCN1. At the protein level, IMPACT was detected mainly in the brain. IMPACT was found to be abundant in the majority of hypothalamic neurons. Scattered neurons expressing this protein at higher levels were detected in other regions such as the hippocampus and piriform cortex. The abundance of IMPACT correlated inversely with phosphorylated eIF2alpha levels in different brain areas. These results suggest that IMPACT ensures constant high levels of translation and low levels of ATF4 and CHOP in specific neuronal cells under amino acid starvation conditions.

14.
Sequence-based homology studies play an important role in evolutionary tracing and classification of proteins. Various methods are available to analyze biological sequence information. However, with the advent of the proteomics era, there is a growing demand for analysis of huge amounts of biological sequence information, and it has become necessary to have programs that provide speedy analysis. ISHAN has been developed as a homology analysis package, built on various sequence analysis tools, viz. FASTA, ALIGN, CLUSTALW, PHYLIP and CODONW (for DNA sequences). This JAVA application offers the user a choice of analysis tools. For testing, ISHAN was applied to perform phylogenetic analysis for sets of Caspase 3 DNA sequences and NF-kappaB p105 amino acid sequences. By integrating several tools, it has made analysis much faster and reduced manual intervention.

15.
ACNUC is a database structure and retrieval software for use with either the GenBank or EMBL nucleic acid sequence data collections. The nucleotide and textual data furnished by both collections are each restructured into a database that allows sequence retrieval on a multi-criterion basis. The main selection criteria are: species (or higher order taxon), keyword, reference, journal, author, and organelle; all logical combinations of these criteria can be used. Direct access to sequence regions that code for a specific product (protein, tRNA or rRNA) is provided. A versatile extraction procedure copies selected sequences, or fragments of them, from the database to user files suitable to be analysed by user-supplied application programs. A detailed help mechanism is provided to aid the user at any time during the retrieval session. All software has been written in FORTRAN 77, which guarantees a high degree of transportability to minicomputers or mainframes. Received on May 1, 1985; accepted on June 13, 1985

16.
Computer programs that can be used for the design of synthetic genes and that are run on an Apple Macintosh computer are described. These programs determine nucleic acid sequences encoding amino acid sequences. They select DNA sequences based on codon usage as specified by the user, and determine the placement of base changes that can be used to create restriction enzyme sites without altering the amino acid sequence. A new algorithm for finding restriction sites by translating the restriction endonuclease target sequence in all three reading frames and then searching the given peptide or protein amino acid sequence with these short restriction enzyme peptide sequences is described. Examples are given for the creation of synthetic DNA sequences for the bovine prethrombin-2 and ribonuclease A genes. Received on October 18, 1988; accepted on December 9, 1988
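
The restriction-site search described in this abstract can be approximated as follows. This is a hedged sketch, not the published algorithm: it assumes the standard genetic code, pads the recognition sequence with N in each of the three frames, and treats a protein window as a potential site when every residue can be encoded by a codon containing the required bases. The names `site_patterns` and `find_potential_sites` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_residues(codon):
    """Amino acids encodable by a codon that may contain N (any base)."""
    options = [BASES if base == "N" else base for base in codon]
    return {CODON_TO_AA["".join(c)] for c in product(*options)} - {"*"}


def site_patterns(site):
    """Translate the site in three frames into per-position residue sets."""
    patterns = []
    for frame in range(3):
        padded = "N" * frame + site
        padded += "N" * (-len(padded) % 3)   # pad to a whole number of codons
        codons = [padded[i:i + 3] for i in range(0, len(padded), 3)]
        patterns.append([codon_residues(c) for c in codons])
    return patterns


def find_potential_sites(protein, site):
    """Return (frame, protein index) pairs where the site could be encoded."""
    hits = []
    for frame, pattern in enumerate(site_patterns(site)):
        for start in range(len(protein) - len(pattern) + 1):
            window = protein[start:start + len(pattern)]
            if all(res in allowed for res, allowed in zip(window, pattern)):
                hits.append((frame, start))
    return hits


if __name__ == "__main__":
    # EcoRI recognition sequence GAATTC against an invented peptide.
    print(find_potential_sites("MKEFRIL", "GAATTC"))
```

Here `frame` is the number of padding bases placed before the site, so frame 0 means the site starts on a codon boundary. Searching at the peptide level avoids enumerating every possible DNA encoding of the protein.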

17.
Phylomat: an automated protein motif analysis tool for phylogenomics   Total citations: 2, self-citations: 0, citations by others: 2
Recent progress in genomics, proteomics, and bioinformatics enables unprecedented opportunities to examine the evolutionary history of molecular, cellular, and developmental pathways through phylogenomics. Accordingly, we have developed a motif analysis tool for phylogenomics (Phylomat, http://alg.ncsa.uiuc.edu/pmat) that scans predicted proteome sets for proteins containing highly conserved amino acid motifs or domains for in silico analysis of the evolutionary history of these motifs/domains. Phylomat enables the user to download results as full protein or extracted motif/domain sequences from each protein. Tables containing the percent distribution of a motif/domain in organisms normalized to proteome size are displayed. Phylomat can also align the set of full protein or extracted motif/domain sequences and predict a neighbor-joining tree from relative sequence similarity. Together, Phylomat serves as a user-friendly data-mining tool for the phylogenomic analysis of conserved sequence motifs/domains in annotated proteomes from the three domains of life.

18.
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding.
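
The query-against-references comparison can be sketched as below. This is an illustrative approximation rather than TaxI itself: it assumes a simple Needleman-Wunsch global alignment with unit scores and an uncorrected p-distance over ungapped columns, whereas the abstract mentions several models of sequence evolution. All function names and sequences are made up.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a, b):
    """Fraction of differing sites among aligned, ungapped columns."""
    x, y = global_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    return sum(p != q for p, q in pairs) / len(pairs)


def rank_references(query, references):
    """Sort references by divergence from the query, closest first."""
    return sorted((p_distance(query, seq), name) for name, seq in references.items())


if __name__ == "__main__":
    refs = {"sp1": "ACGACGT", "sp2": "ACGTTCGT"}
    print(rank_references("ACGTACGT", refs))
```

Because each reference is aligned to the query separately, an indel in one reference (as in `sp1`) does not disturb the comparison with any other, which is the property the abstract relies on for indel-rich rRNA sequences.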

19.
MEGA2: molecular evolutionary genetics analysis software.   Total citations: 201, self-citations: 0, citations by others: 201
We have developed a new software package, Molecular Evolutionary Genetics Analysis version 2 (MEGA2), for exploring and analyzing aligned DNA or protein sequences from an evolutionary perspective. MEGA2 vastly extends the capabilities of MEGA version 1 by: (1) facilitating analyses of large datasets; (2) enabling creation and analyses of groups of sequences; (3) enabling specification of domains and genes; (4) expanding the repertoire of statistical methods for molecular evolutionary studies; and (5) adding new modules for visual representation of input data and output results on the Microsoft Windows platform. AVAILABILITY: http://www.megasoftware.net. CONTACT: s.kumar@asu.edu

20.
AIMS: Partial genetic characterization of several chromosomal regions on 35 16SrI-B phytoplasma strains maintained in periwinkle and collected in different geographical areas from plants of diverse species. METHODS AND RESULTS: Genes coding for ribosomal protein rpL22, elongation factor EF-Tu and random cloned sequences amplified with primers AY19p/m, G35p/m and BB88F1/R1 after RFLP analyses showed a high degree of polymorphism among the strains studied. The ribosomal protein (rp) subgroups B and K, and an undescribed subgroup designated N, were identified. Amplicons obtained with primers AY19p/m and BB88F1/R1 revealed a high and a low degree of polymorphism, respectively. CONCLUSIONS: A probable spacer role could be attributed to the AY19p/m sequence and a possible coding function to the BB88F1/R1 sequence. No relationship was found among genetic polymorphisms, identified by statistical analyses, and epidemiological or biological parameters. SIGNIFICANCE AND IMPACT OF THE STUDY: The analyses of five different genomic sequences of the 35 strains belonging to subgroup 16SrI-B allowed a finer distinction among them, confirming that the polymorphism level of 16S rDNA is too low to be adopted as a unique parameter for classification.
