Similar articles
 Found 20 similar articles; search took 0 ms
1.
2.

Background  

Given the pace and complexity of completed genomic sequences, the need to align longer sequences has become more urgent, and many more tools have been developed as a result. In the initial stage of genomic sequence analysis, a biologist usually faces two questions: how to choose the best tool to align the sequences of interest, and how to analyze and visualize the alignment results. A further question is whether poorly aligned regions produced by a tool are truly non-homologous or merely artifacts of an inappropriate alignment tool or scoring system. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been proposed, they may not serve as a standard for most biologists, because the poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows simultaneous cross-comparison of the alignment results obtained by different tools could help a biologist evaluate their correctness and accuracy.

3.
The alignment of Escherichia coli citrate synthase to pig heart citrate synthase and the multiple alignment of the known sequences of the citrate synthase family of enzymes have been performed using six different amino acid similarity scoring matrices and a large range of gap penalty ratios for insertions and deletions of amino acids. The alignment studies have been performed as the first step in a project aimed at homology modelling E. coli citrate synthase (a hexamer) from pig heart citrate synthase (a dimer) in a molecular modelling approach to the study of multi-subunit enzymes. The effects of several important variables in producing realistic alignments have been investigated. The difference between multiple alignment of the family of enzymes and simple pairwise alignment of the pig heart and E. coli proteins was explored. The effect of initial separate multiple alignments of the most highly related or most homologous species of the family of enzymes upon a subsequent pairwise alignment between species was evaluated. The value of 'fingerprinting' certain residues to bias the alignment in favour of matching those residues, as well as the worth of the computerized approach compared to an intuitive alignment technique, were assessed.

4.
5.
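Clustal W: software for protein and nucleic acid sequence analysis   Total citations: 2, self-citations: 1, citations by others: 2
Sequence analysis of proteins and nucleic acids plays an important role in modern biology and bioinformatics, and new algorithms and software appear constantly. This article introduces ClustalW, a completely free multiple sequence alignment program that runs on a PC. It can not only perform multiple sequence alignment of proteins and nucleic acids and analyze the similarity relationships among sequences, but also draw phylogenetic trees. Thanks to its flexible input and output formats, convenient parameter setting and selection, detailed online help, and good portability, ClustalW has been widely used in protein and nucleic acid sequence analysis.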

6.
Computational methods such as sequence alignment and motif construction are useful in grouping related proteins into families, as well as helping to annotate new proteins of unknown function. These methods identify conserved amino acids in protein sequences, but cannot determine the specific functional or structural roles of conserved amino acids without additional study. In this work, we present 3MATRIX (http://3matrix.stanford.edu) and 3MOTIF (http://3motif.stanford.edu), a web-based sequence motif visualization system that displays sequence motif information in its appropriate three-dimensional (3D) context. This system is flexible in that users can enter sequences, keywords, structures or sequence motifs to generate visualizations. In 3MOTIF, users can search using discrete sequence motifs such as PROSITE patterns, eMOTIFs, or any other regular expression-like motif. Similarly, 3MATRIX accepts an eMATRIX position-specific scoring matrix, or will convert a multiple sequence alignment block into an eMATRIX for visualization. Each query motif is used to search the protein structure database for matches, in which the motif is then visually highlighted in three dimensions. Important properties of motifs such as sequence conservation and solvent accessible surface area are also displayed in the visualizations, using carefully chosen color shading schemes.
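
To illustrate what a "regular expression-like motif" search involves, here is a minimal Python sketch that translates a simplified subset of PROSITE pattern syntax into a regular expression and scans a sequence with it. It is not the 3MOTIF implementation, which the abstract does not describe. The function names `prosite_to_regex` and `find_motif` and the toy sequence are assumptions made for this example.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Handled here: '-' separators, 'x' wildcards, [ABC] allowed sets,
    {ABC} forbidden sets, (n) and (n,m) repeats, '<' and '>' anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def find_motif(pattern: str, sequence: str):
    """Return (0-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

# Toy sequence (made up) scanned with the PROSITE N-glycosylation pattern.
hits = find_motif("N-{P}-[ST]-{P}", "MKNGSAANPSA")
```

The match positions returned by such a scan are what a viewer would then map onto residues of a 3D structure for highlighting.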

7.

Background  

There have been many algorithms and software programs implemented for the inference of multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown due to the incomplete knowledge of the evolutionary history of the sequences, making it difficult to gauge the relative accuracy of the programs.

8.
We have identified four new types of short conserved sequence domains in homing endonucleases and related proteins. These domains are modular, appearing in various combinations. One domain includes a motif known by structure as a novel sequence-specific DNA-binding helix. Sequence similarity shows two other domains to be new types of helix-turn-helix DNA-binding domains. We term the new domains nuclease-associated modular DNA-binding domains (NUMODs).

9.
The most popular way of comparing the performance of multiple sequence alignment programs is to use empirical testing on sets of test sequences. Several such test sets now exist, each with potential strengths and weaknesses. We apply several different alignment packages to 6 benchmark datasets, and compare their relative performances. HOMSTRAD, a collection of alignments of homologous proteins, is regularly used as a benchmark for sequence alignment though it is not designed as such, and lacks annotation of reliable regions within the alignment. We introduce this annotation into HOMSTRAD using protein structural superposition. Results on each database show that method performance is dependent on the input sequences. Alignment benchmarks are regularly used in combination to measure performance across a spectrum of alignment problems. Through combining benchmarks, it is possible to detect whether a program has been over-optimised for a single dataset, or alignment problem type.

10.

Background  

Alignment and comparison of related genome sequences is a powerful method to identify regions likely to contain functional elements. Such analyses are data intensive, requiring the inclusion of genomic multiple sequence alignments, sequence annotations, and scores describing regional attributes of columns in the alignment. Visualization and browsing of results can be difficult, and there are currently limited software options for performing this task.

11.
Conserved segments in DNA or protein sequences are strong candidates for functional elements and thus appropriate methods for computing them need to be developed and compared. We describe five methods and computer programs for finding highly conserved blocks within previously computed multiple alignments, primarily for DNA sequences. Two of the methods are already in common use; these are based on good column agreement and high information content. Three additional methods find blocks with minimal evolutionary change, blocks that differ in at most k positions per row from a known center sequence and blocks that differ in at most k positions per row from a center sequence that is unknown a priori. The center sequence in the latter two methods is a way to model potential binding sites for known or unknown proteins in DNA sequences. The efficacy of each method was evaluated by analysis of three extensively analyzed regulatory regions in mammalian beta-globin gene clusters and the control region of bacterial arabinose operons. Although all five methods have quite different theoretical underpinnings, they produce rather similar results on these data sets when their parameters are adjusted to best approximate the experimental data. The optimal parameters for the method based on information content varied little for different regulatory regions of the beta-globin gene cluster and hence may be extrapolated to many other regulatory regions. The programs based on maximum allowed mismatches per row have simple parameters whose values can be chosen a priori and thus they may be more useful than the other methods when calibration against known functional sites is not available.
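
Two of the ideas named above, information content per column and a bound of k mismatches per row against a known center sequence, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' programs. The threshold, the minimum block length, the choice to reject gapped columns, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

def column_information(column: str, alphabet_size: int = 4) -> float:
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(column).values())
    return math.log2(alphabet_size) - entropy

def conserved_blocks(alignment, threshold=1.5, min_length=3):
    """Return half-open (start, end) column ranges in which every column
    is gap-free and reaches the information-content threshold."""
    width = len(alignment[0])
    columns = ["".join(row[i] for row in alignment) for i in range(width)]
    blocks, start = [], None
    for i, col in enumerate(columns + [None]):   # sentinel closes the last block
        good = (col is not None and "-" not in col
                and column_information(col) >= threshold)
        if good and start is None:
            start = i
        elif not good and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    return blocks

def rows_within_k(block_rows, center: str, k: int) -> bool:
    """True if every row differs from the center sequence in at most k positions."""
    return all(sum(a != b for a, b in zip(row, center)) <= k
               for row in block_rows)

# Toy DNA alignment (made up): column 4 is fully variable.
toy = ["ACGTACGA", "ACGTTCGA", "ACGTGCGA", "ACGTCCGA"]
blocks = conserved_blocks(toy)
```

In this sketch the information-content method needs a calibrated threshold, whereas `rows_within_k` only needs the integer k, which mirrors the abstract's remark that the mismatch-based programs have parameters that can be chosen a priori.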

12.
Garg A, Kaur H, Raghava GP. Proteins 2005, 61(2): 318-324
The present study is an attempt to develop a neural network-based method for predicting the real value of solvent accessibility from the sequence using evolutionary information in the form of multiple sequence alignment. In this method, two feed-forward networks with a single hidden layer have been trained with standard back-propagation as a learning algorithm. The Pearson's correlation coefficient increases from 0.53 to 0.63, and mean absolute error decreases from 18.2% to 16% when multiple-sequence alignment obtained from PSI-BLAST is used as input instead of a single sequence. The performance of the method further improves from a correlation coefficient of 0.63 to 0.67 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields a mean absolute error value of 15.2% between the experimental and predicted values, when tested on two different nonhomologous and nonredundant datasets of varying sizes. The method consists of two steps: (1) in the first step, a sequence-to-structure network is trained with the multiple alignment profiles in the form of PSI-BLAST-generated position-specific scoring matrices, and (2) in the second step, the output obtained from the first network and PSIPRED-predicted secondary structure information is used as an input to the second structure-to-structure network. Based on the present study, a server SARpred (http://www.imtech.res.in/raghava/sarpred/) has been developed that predicts the real value of solvent accessibility of residues for a given protein sequence. We have also evaluated the performance of SARpred on 47 proteins used in CASP6 and achieved a correlation coefficient of 0.68 and a mean absolute error (MAE) of 15.9% between predicted and observed values.
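
The two evaluation measures quoted above, Pearson's correlation coefficient and mean absolute error, can be computed as in the following minimal Python sketch. The function names and the toy accessibility values are assumptions made for illustration and are not data from the study.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def mean_absolute_error(x, y):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Toy relative solvent accessibility values in percent (made up).
observed = [10.0, 20.0, 30.0]
predicted = [15.0, 25.0, 35.0]
r = pearson(observed, predicted)
mae = mean_absolute_error(observed, predicted)
```

The toy data show why both measures are reported together: a constant offset leaves the correlation perfect while the mean absolute error stays at 5 percentage points.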

13.
The spliceosomal small nuclear ribonucleoproteins (snRNPs) U1, U2, U4/U6 and U5 share eight proteins B', B, D1, D2, D3, E, F and G which form the structural core of the snRNPs. This class of common proteins plays an essential role in the biogenesis of the snRNPs. In addition, these proteins represent the major targets for the so-called anti-Sm auto-antibodies which are diagnostic for systemic lupus erythematosus (SLE). We have characterized the proteins F and G from HeLa cells by cDNA cloning, and, thus, all human Sm protein sequences are now available for comparison. Similar to the D, B/B' and E proteins, the F and G proteins do not possess any of the known RNA binding motifs, suggesting that other types of RNA-protein interactions occur in the snRNP core. Strikingly, the eight human Sm proteins possess mutual homology in two regions, 32 and 14 amino acids long, that we term Sm motifs 1 and 2. The Sm motifs are evolutionarily highly conserved in all of the putative homologues of the human Sm proteins identified in the data base. These results suggest that the Sm proteins may have arisen from a single common ancestor. Several hypothetical proteins, mainly of plant origin, that clearly contain the conserved Sm motifs but exhibit only comparatively low overall homology to one of the human Sm proteins, were identified in the data base. This suggests that the Sm motifs may also be shared by non-spliceosomal proteins. Further, we provide experimental evidence that the Sm motifs are involved, at least in part, in Sm protein-protein interactions. Specifically, we show by co-immunoprecipitation analyses of in vitro translated B' and D3 that the Sm motifs are essential for complex formation between B' and D3. 
Our finding that the Sm proteins share conserved sequence motifs may help to explain the frequent occurrence in patient sera of anti-Sm antibodies that cross-react with multiple Sm proteins and may ultimately further our understanding of how the snRNPs act as auto-antigens and immunogens in SLE.

14.
Alignment of sequences is an important routine in various areas of science, notably molecular biology. Multiple sequence alignment is a computationally hard optimization problem which involves the consideration of different possible alignments in order to find an optimal one, given a measure of goodness of alignments. Dynamic programming algorithms are generally well suited for the search of optimal alignments, but are constrained by unwieldy space requirements for large numbers of sequences. Carrillo and Lipman devised a method that helps to reduce the search space for an optimal alignment under a sum-of-pairs measure using bounds on the scores of its pairwise projections. In this paper, we generalize Carrillo and Lipman bounds and demonstrate a novel approach for finding optimal sum-of-pairs multiple alignments that allows incremental pruning of the optimal alignment search space. This approach can result in a drastic pruning of the final search space polytope (where we search for the optimal alignment) when compared to Carrillo and Lipman's approach and hence allows many runs that are not feasible with the original method.
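
For readers unfamiliar with the sum-of-pairs measure, the following Python sketch scores a given multiple alignment as the sum of the scores of its pairwise projections. It does not implement the Carrillo and Lipman bounds or the pruning approach of the paper. The scoring values (match +1, mismatch -1, gap -2, gap against gap 0) and the function names are assumptions chosen for the example.

```python
from itertools import combinations

def pair_score(a: str, b: str, match=1, mismatch=-1, gap=-2) -> int:
    """Score one aligned pair of symbols under a simple, assumed scheme."""
    if a == "-" and b == "-":
        return 0                      # gap against gap contributes nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs(alignment, **scoring) -> int:
    """Sum the pairwise projection scores over all pairs of rows."""
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(pair_score(a, b, **scoring) for a, b in zip(row_a, row_b))
    return total

# Toy alignment (made up): three rows, one gap in the last row.
score = sum_of_pairs(["ACGT", "ACGT", "AC-T"])
```

Because the total decomposes over pairs of rows, a bound on each pairwise projection constrains the whole alignment, which is the property that bounds of the Carrillo and Lipman kind exploit.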

15.
CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.

16.
SNUFER is a software tool for the automatic localization of single nucleotide polymorphisms (SNPs) and the generation of tables for presenting them. After input of a FASTA file containing the sequences to be analyzed, a multiple sequence alignment is generated using ClustalW run inside SNUFER. The ClustalW output file is then used to generate a table which displays the SNPs detected in the aligned sequences and their degree of similarity. This table can be exported to Microsoft Word, Microsoft Excel or as a single text file, permitting further editing for publication. The software was written using Delphi 7 for programming and FireBird 2.0 for sequence database management. It is freely available for noncommercial use and can be downloaded from http://www.heranza.com.br/bioinformatica2.htm.
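
The core step described above, reading an alignment and listing the columns where the sequences differ, can be sketched as follows. SNUFER itself is written in Delphi and its internals are not given in the abstract, so this Python sketch is only an illustration. The function name `find_snps`, the decision to skip columns containing gaps, and the toy sequences are assumptions.

```python
def find_snps(aligned: dict):
    """Return (1-based position, {name: base}) for each variable column.

    `aligned` maps sequence names to aligned sequences of equal length.
    Columns containing a gap are skipped, since they represent indels
    and not substitutions (an assumption of this sketch).
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    snps = []
    for i in range(length):
        column = {name: aligned[name][i] for name in names}
        bases = set(column.values())
        if "-" in bases:
            continue
        if len(bases) > 1:
            snps.append((i + 1, column))
    return snps

# Toy aligned sequences (made up): one substitution, one trailing gap.
toy = {"s1": "ACGTAC", "s2": "ACCTAC", "s3": "ACGTA-"}
snp_table = find_snps(toy)
```

Each entry of `snp_table` corresponds to one row of a SNP table, ready to be written out as tab-separated text.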

17.
In the present study, an attempt has been made to develop a method for predicting gamma-turns in proteins. First, we have implemented the commonly used statistical and machine-learning techniques in the field of protein structure prediction, for the prediction of gamma-turns. All the methods have been trained and tested on a set of 320 nonhomologous protein chains by a fivefold cross-validation technique. It has been observed that the performance of all methods is very poor, having a Matthews correlation coefficient (MCC)

18.
19.
Wrabl JO, Grishin NV. Proteins 2004, 54(1): 71-87
An algorithm was developed to locally optimize gaps from the FSSP database. Over 2 million gaps were identified from all versus all FSSP structure comparisons, and datasets of non-identical gaps and flanking regions comprising between 90,000 and 135,000 sequence fragments were extracted for statistical analysis. Relative to background frequencies, gaps were enriched in residue types with small side chains and high turn propensity (D, G, N, P, S), and were depleted in residue types with hydrophobic side chains (C, F, I, L, V, W, Y). In contrast, regions flanking a gap exhibited opposite trends in amino acid frequencies, i.e., enrichment in hydrophobic residues and a high degree of secondary structure. Log-odds scores of residue type as a function of position in or around a gap were derived from the statistics. Three simple experiments demonstrated that these scores contained significant predictive information. First, regions where gaps were observed in single sequences taken from HOMSTRAD structure-based multiple sequence alignments generally scored higher than regions where gaps were not observed. Second, given the correct pairwise-aligned cores, the actual positions of gaps could be reproduced from sequence more accurately using the structurally-derived statistics than by using random pairwise alignments. Finally, revision of the Clustal-W residue-specific gap opening parameters with this new information improved the agreement of Clustal-W alignments with the structure-based alignments. At least three applications for these results are envisioned: improvement of gap penalties in pairwise (or multiple) sequence alignment, prediction of regions of single sequences likely (or unlikely) to contain indels, and more accurate placement of gaps in automated pairwise structure alignment.
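
A log-odds score of the kind mentioned above compares how often a residue type is observed in some context (for example, next to a gap) with its background frequency. The Python sketch below shows the basic calculation under stated assumptions: base-2 logarithms, an add-one pseudocount, and a made-up two-letter background. None of these details are taken from the paper, and the function name is hypothetical.

```python
import math
from collections import Counter

def log_odds(observed: str, background: dict, pseudocount: float = 1.0) -> dict:
    """Return log2(observed frequency / background frequency) per residue type.

    `observed` is a string of residues seen in the context of interest;
    `background` maps each residue type to its background frequency.
    The pseudocount avoids log(0) for residue types that were never observed.
    """
    counts = Counter(observed)
    total = len(observed) + pseudocount * len(background)
    return {
        aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
        for aa in background
    }

# Toy example (made up): G is over-represented and L under-represented.
scores = log_odds("GGGGGL", {"G": 0.5, "L": 0.5})
```

A positive score marks enrichment relative to background and a negative score marks depletion, matching the way the abstract describes small, turn-forming residues in gaps versus hydrophobic residues.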

20.