Similar articles
20 similar articles found; search took 0 ms
1.
Information about the three-dimensional structure or function of a newly determined protein sequence can be obtained if the protein is found to contain a characterized motif or pattern of residues. Recently a database (PROSITE) has been established that contains 337 known motifs encoded as a list of allowed residue types at specific positions along the sequence. PROMOT is a FORTRAN computer program that takes a protein sequence and examines whether it contains any of the motifs in PROSITE. The program also extends the definitions of patterns beyond those used in PROSITE to provide a simple, yet flexible, method to scan either a PROSITE or a user-defined pattern against a protein sequence database. Received on October 17, 1990; accepted on November 15, 1990
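The idea of scanning a sequence against a motif "encoded as a list of allowed residue types at specific positions" can be illustrated with a minimal Python sketch. It is not PROMOT (which is a FORTRAN program and is not reproduced here); it assumes the standard PROSITE pattern syntax (`-` between positions, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence termini). The function names `prosite_to_regex` and `scan` are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


def scan(pattern: str, sequence: str):
    """Return (0-based start, matched text) for every match, overlaps included."""
    # The lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-{P}-[ST]-{P} is the N-glycosylation site pattern; the sequence is made up.
    print(scan("N-{P}-[ST]-{P}.", "AANGSAANPTA"))
```

A user-defined pattern goes through the same `scan` call; scanning a whole database would simply loop over its sequences.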

2.
Direct cloning of large genomic sequences   Total citations: 1 (self-citations: 0; citations by others: 1)

3.
MOTIVATION: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282-283; Bioinformatics, 18, 77-82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to protein sequence clustering. Here we present several new programs that use the same algorithm: cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database; and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on popular sequence comparison and database search tools such as BLAST.
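The abstract does not spell out the clustering algorithm, so the following Python sketch rests on an assumption: that clustering proceeds greedily, taking sequences from longest to shortest and assigning each to the first existing representative it resembles closely enough. The similarity measure here (shared 3-mers) is a crude stand-in chosen for brevity, not the identity calculation of cd-hit, and the names `similarity` and `greedy_cluster` are hypothetical.

```python
def kmers(seq: str, k: int = 3) -> set:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Fraction of the smaller k-mer set that is shared (stand-in for identity)."""
    ka, kb = kmers(a, k), kmers(b, k)
    smaller = ka if len(ka) <= len(kb) else kb
    if not smaller:
        return 0.0
    return len(ka & kb) / len(smaller)


def greedy_cluster(sequences, threshold: float = 0.9):
    """Greedy incremental clustering; each cluster's first item is its representative."""
    clusters = []
    for seq in sorted(sequences, key=len, reverse=True):  # longest first
        for cluster in clusters:
            if similarity(cluster[0], seq) >= threshold:
                cluster.append(seq)   # joins the first representative it matches
                break
        else:
            clusters.append([seq])    # no match: seq becomes a new representative
    return clusters


if __name__ == "__main__":
    demo = ["MKTAYIAKQRQISFVKSHFSR", "GGGGGGGGGGWWWWWWWWWW", "MKTAYIAKQRQISFVKSHFSRQ"]
    for cluster in greedy_cluster(demo):
        print(cluster)
```

Under the same assumption, a two-dataset comparison in the style of cd-hit-2d would fix the representatives from the first dataset and only test sequences from the second against them.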

4.
GMAP: a genomic mapping and alignment program for mRNA and EST sequences   Total citations: 13 (self-citations: 0; citations by others: 13)
MOTIVATION: We introduce GMAP, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. Methodology underlying the program includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich DP for splice site detection, and microexon identification with statistical significance testing. RESULTS: On a set of human messenger RNAs with random mutations at rates of 1% and 3%, GMAP identified all splice sites accurately in over 99.3% of the sequences, which was one-tenth the error rate of existing programs. On a large set of human expressed sequence tags, GMAP provided higher-quality alignments more often than blat did. On a set of Arabidopsis cDNAs, GMAP performed comparably with GeneSeqer. In these experiments, GMAP demonstrated a several-fold increase in speed over existing programs. AVAILABILITY: Source code for gmap and associated programs is available at http://www.gene.com/share/gmap SUPPLEMENTARY INFORMATION: http://www.gene.com/share/gmap.

5.
This work presents a method to compare local clusters of interacting residues as observed in a known three-dimensional protein structure with corresponding clusters inferred from homologous protein sequences, assuming conserved protein folding. For this purpose the local environment of a selected residue in a known protein structure is defined as the ensemble of amino acids in contact with it in the folded state. Using a multiple sequence alignment to identify corresponding residues in homologous proteins, a detailed comparison can be performed between the local environment of a selected amino acid in the template protein structure and the expected local environments at the sets of equivalent residues, derived from the aligned protein sequences. The comparison makes it possible to detect conserved local features such as hydrogen bonding or complementarity in residue substitution. A global measure of environmental similarity is also defined, to search for conserved amino acid clusters subject to functional or structural constraints. The proposed approach is useful for investigating protein function as well as for site-directed mutagenesis experiments, where appropriate amino acid substitutions can be suggested by observing naturally occurring protein variants.

6.
SUMMARY: In the segment-by-segment approach to sequence alignment, pairwise and multiple alignments are generated by comparing gap-free segments of the sequences under study. This method is particularly efficient in detecting local homologies, and it has been used to identify functional regions in large genomic sequences. Herein, an algorithm is outlined that calculates optimal pairwise segment-by-segment alignments in essentially linear space. AVAILABILITY: The program is available at the Bielefeld Bioinformatics Server (BiBiServ) at http://bibiserv.techfak.uni-bielefeld.de/dialign/

7.
Single-stranded DNA or RNA libraries used in SELEX experiments usually include primer-annealing sequences for PCR amplification. In genomic SELEX, these fixed sequences may form base pairs with the central genomic fragments and interfere with the binding of target molecules to the genomic sequences. In this study, a method has been developed to circumvent these artificial effects. Primer-annealing sequences are removed from the genomic library before selection with the target protein and are then regenerated to allow amplification of the selected genomic fragments. A key step in the regeneration of primer-annealing sequences is to employ thermal cycles of hybridization-extension, using the sequences from unselected pools as templates. The genomic library was derived from the bacteriophage fd, and the gene 5 protein (g5p) from the phage was used as a target protein. After four rounds of primer-free genomic SELEX, most cloned sequences overlapped at a segment within gene 6 of the viral genome. This sequence segment was pyrimidine-rich and contained no stable secondary structures. Compared with a neighboring genomic fragment, a representative sequence from the family of selected sequences had about 23-fold higher g5p-binding affinity. Results from primer-free genomic SELEX were compared with the results from two other genomic SELEX protocols.

8.
9.

10.

Background

Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method.

Results

Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure.

Conclusion

We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.
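The first step of the anchored-alignment approach described above, selecting a chain of strong local similarities, can be sketched in Python as a small dynamic program. This is a generic quadratic-time chaining routine written for illustration and is not the CHAOS algorithm itself; the `Anchor` fields and the function name `best_chain` are hypothetical, and each local similarity is assumed to be a gap-free segment pair with positive length.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    start_a: int   # start of the segment in sequence A
    start_b: int   # start of the segment in sequence B
    length: int    # gap-free segment length (assumed > 0)
    score: float   # similarity score of the segment pair


def best_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Highest-scoring chain of anchors that are collinear in both sequences."""
    ordered = sorted(anchors, key=lambda h: (h.start_a, h.start_b))
    if not ordered:
        return []
    best = [h.score for h in ordered]   # best chain score ending at each anchor
    prev = [-1] * len(ordered)          # back-pointers for chain recovery
    for j, cur in enumerate(ordered):
        for i in range(j):
            pre = ordered[i]
            # pre must end before cur starts, in both sequences
            follows = (pre.start_a + pre.length <= cur.start_a
                       and pre.start_b + pre.length <= cur.start_b)
            if follows and best[i] + cur.score > best[j]:
                best[j] = best[i] + cur.score
                prev[j] = i
    end = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    hits = [Anchor(0, 0, 10, 10), Anchor(20, 50, 10, 8),
            Anchor(15, 15, 10, 9), Anchor(40, 40, 10, 7)]
    print(best_chain(hits))
```

In the second step, only the regions lying between consecutive anchors of the returned chain would be passed to the slower, more sensitive aligner, which is where the running-time saving comes from.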

11.
We present an extension of the program SIMCOAL, which allows for simulation of the genomic diversity of samples drawn from a set of populations with arbitrary patterns of migrations and complex demographic histories, including bottlenecks and various modes of demographic expansion. The main additions to the previous version include the possibility of arbitrary and heterogeneous recombination rates between adjacent loci and multiple coalescent events per generation, allowing for the simulation of very large samples and recombining genomic regions, together with the simulation of single nucleotide polymorphism data with frequency ascertainment bias. AVAILABILITY: http://cmpg.unibe.ch/software/simcoal2/.

12.
Carbon distribution is responsible for the stability and structure of proteins. The arrangement of carbon along the protein sequence depends on how the amino acids are organized and is guided by mRNAs. An atomic-level revision is important for understanding these codes. This will ultimately help in the identification of disorders and suggest mutations. For this purpose a carbon distribution analysis program has been developed. This program captures the hydrophobic, hydrophilic and disordered regions in a protein. The program gives accurate results. The calculations are precise and sensitive to single amino acid resolution. This program is intended to help in mutational studies leading to protein stabilisation.

13.
We report the latest release (version 1.6) of the CATH protein domains database (http://www.biochem.ucl.ac.uk/bsm/cath). This is a hierarchical classification of 18 577 domains into evolutionary families and structural groupings. We have identified 1028 homologous superfamilies in which the proteins have both structural, and sequence or functional similarity. These can be further clustered into 672 fold groups and 35 distinct architectures. Recent developments of the database include the generation of 3D templates for recognising structural relatives in each fold group, which has led to significant improvements in the speed and accuracy of updating the database and also means that less manual validation is required. We also report the establishment of the CATH-PFDB (Protein Family Database), which associates 1D sequences with the 3D homologous superfamilies. Sequences showing identifiable homology to entries in CATH have been extracted from GenBank using PSI-BLAST. A CATH-PSIBLAST server has been established, which allows you to scan a new sequence against the database. The CATH Dictionary of Homologous Superfamilies (DHS), which contains validated multiple structural alignments annotated with consensus functional information for evolutionary protein superfamilies, has been updated to include annotations associated with sequence relatives identified in GenBank. The DHS is a powerful tool for considering the variation of functional properties within a given CATH superfamily and in deciding what functional properties may be reliably inherited by a newly identified relative.

14.
I describe a computer program which can align a large number of nucleic acid sequences with one another. The program uses a heuristic, iterative algorithm which has been tested extensively, and is found to produce useful alignments of a variety of sequence families. The algorithm is fast enough to be practical for the analysis of large numbers of sequences, and is implemented in a program which contains a variety of other functions to facilitate the analysis of the aligned result.

15.
Nishizawa M, Nishizawa K. Proteins, 1999, 37(2): 284-292
We showed previously that the use of arginine versus lysine residues in eukaryote proteins is correlated positively with local GC content of the genome within approximately 50 residues. Cumulative analyses show that the tendency for self-clustering (or repetitive use) generally is the case for all types of amino acids except for certain hydrophobic types. The degree to which each of the amino acids is used recurrently is weak for ancient proteins (or protein domains), those that are conserved through both eukaryotes and prokaryotes, but strong for modern proteins, which are unique to organisms of particular phyla. These findings support the idea that repetitiveness occurs due to a propensity of genomic DNA to cause tandem genomic duplication. A protein sequence with high repetitiveness tends to be unique in the homology search, which may indicate the weaker constraints and, hence, more arbitrary use of amino acids. Simulation analyses suggest that tandem gene duplications on a very small scale (1 or 2 codons) are an important causal factor in maintaining repetitiveness in the presence of concomitant occurrence of substitutive point mutation. For yeast proteins, approximately 1.3 duplication events per 1,000 residues on average are likely to occur, whereas 10 events of substitution mutation occur. It also is suggested that duplication enhances the probability of occurrence of some peptide motifs, such as those found in zinc fingers and segments with extreme physicochemical characteristics, and, thus, that local repetitiveness is a genomic factor influencing the evolution of eukaryote proteins.

16.
A multiple alignment program for protein sequences   Total citations: 1 (self-citations: 0; citations by others: 1)
A program for the multiple alignment of protein sequences is presented. The program is an extension of the fast alignment program by Wilbur et al. (1984) into higher dimensions. The use of hash procedures on fragments of the protein sequences increases the speed of calculation. Thereby we also take into account fragments which are present in some, but not in all, sequences considered. The results of some multiple alignments are given. Received on September 11, 1986; accepted on March 18, 1987
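The use of hashing on sequence fragments can be illustrated with a short Python sketch. It only shows the general idea of a fragment (k-tuple) hash table shared across several sequences, including fragments found in some but not all of them; it is not the program described above, and the fragment length `k`, the `min_sequences` cut-off and the function names are assumptions made for the example.

```python
from collections import defaultdict


def index_fragments(sequences, k: int = 3):
    """Hash table: fragment -> {sequence index -> list of start positions}."""
    table = defaultdict(lambda: defaultdict(list))
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            table[seq[pos:pos + k]][seq_id].append(pos)
    return table


def shared_fragments(sequences, k: int = 3, min_sequences: int = 2):
    """Fragments occurring in at least min_sequences of the input sequences."""
    table = index_fragments(sequences, k)
    return {frag: dict(hits)
            for frag, hits in table.items()
            if len(hits) >= min_sequences}


if __name__ == "__main__":
    seqs = ["ACDEFG", "XCDEY", "QQQFGH"]
    # "CDE" occurs in the first two sequences but not in the third.
    print(shared_fragments(seqs))
```

Each lookup in the table takes roughly constant time, so candidate matching positions are found without comparing every pair of positions directly; such shared fragments could then seed the alignment proper.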

17.
A program, BIOSITE, providing for the interactive visual comparison of aligned homologous amino-acid sequences is presented, including an example of its application. The program allows for two types of comparison sequence to be generated: an ‘identity’ sequence and a ‘difference’ sequence. These may be used on subsets of sequences and in further comparisons to identify candidate sites involved in a distinct functional property. The program should prove a useful tool for biologists engaged in understanding sequence-function relationships.

18.
Tandem repeats finder: a program to analyze DNA sequences.   Total citations: 63 (self-citations: 3; citations by others: 63)
A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm's speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human beta T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.
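To make the definition of a tandem repeat concrete, here is a deliberately naive Python sketch that finds exact tandem repeats only. It is far simpler than the algorithm described above, which handles approximate copies with indels and uses statistical recognition criteria; the function name and the `max_period` and `min_copies` parameters are hypothetical.

```python
def exact_tandem_repeats(seq: str, max_period: int = 6, min_copies: int = 2):
    """Return (start, unit, copies) for exact tandem repeats, scanning left to right."""
    results = []
    i = 0
    while i < len(seq):
        best = None  # (unit, period, copies) covering the most bases from position i
        for period in range(1, max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:
                break  # not enough sequence left for this period
            copies = 1
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            covered = copies * period
            if copies >= min_copies and (best is None or covered > best[1] * best[2]):
                best = (unit, period, copies)
        if best:
            unit, period, copies = best
            results.append((i, unit, copies))
            i += period * copies  # continue after the repeat
        else:
            i += 1
    return results


if __name__ == "__main__":
    print(exact_tandem_repeats("GGCATCATCATTT", min_copies=3))
```

Like the published method, the sketch needs neither the pattern nor the pattern size as input, but a single substitution inside a copy ends the repeat here, which is exactly the limitation that modeling percent identity and indels is meant to remove.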

19.
LINKER: a program to generate linker sequences for fusion proteins   Total citations: 9 (self-citations: 0; citations by others: 9)
The construction of functional fusion proteins often requires a linker sequence that adopts an extended conformation to allow for maximal flexibility. Linker sequences are generally selected based on intuition. Without a reliable selection criterion, the design of such linkers is often difficult, particularly in situations where longer linker sequences are required. Here we describe a program called LINKER which can automatically generate a set of linker sequences that are known to adopt extended conformations as determined by X-ray crystallography and NMR. The only required input to the program is the desired linker sequence length. The program is specifically designed to assist in fusion protein construction. A number of optional input parameters have been incorporated so that users are able to enhance sequence selection based on specific applications. The program output simply contains a set of sequences with a specified length. This program should be a useful tool in both the biotechnology industry and biomedical research. It can be accessed through the Web page http://www.fccc.edu/research/labs/feng/linker.html.

20.

Background  

The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes.
