Similar articles
 Found 20 similar articles; search took 31 ms
1.
Automated assembly of protein blocks for database searching.   Cited by: 52 (self-citations: 7, citations by others: 45)
A system is described for finding and assembling the most highly conserved regions of related proteins for database searching. First, an automated version of Smith's algorithm for finding motifs is used for sensitive detection of multiple local alignments. Next, the local alignments are converted to blocks and the best set of non-overlapping blocks is determined. When the automated system was applied successively to all 437 groups of related proteins in the PROSITE catalog, 1764 blocks resulted; these could be used for very sensitive searches of sequence databases. Each block was calibrated by searching the SWISS-PROT database to obtain a measure of the chance distribution of matches, and the calibrated blocks were concatenated into a database that could itself be searched. Examples are provided in which distant relationships are detected either using a set of blocks to search a sequence database or using sequences to search the database of blocks. The practical use of the blocks database is demonstrated by detecting previously unknown relationships between oxidoreductases and by evaluating a proposed relationship between HIV Vif protein and thiol proteases.
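
The abstract does not say how the "best set of non-overlapping blocks" is chosen. As a rough illustration only, and not the authors' procedure, the selection can be viewed as weighted interval scheduling over a single sequence. The tuple layout (start, end, score), the integer coordinates, and the function name below are assumptions made for this sketch.

```python
from bisect import bisect_right


def best_nonoverlapping(blocks):
    """Pick the highest-scoring set of non-overlapping blocks.

    blocks: list of (start, end, score) with integer, inclusive coordinates.
    This tuple layout is a hypothetical one chosen for the sketch.
    Returns (total_score, chosen_blocks_in_order).
    """
    blocks = sorted(blocks, key=lambda b: b[1])  # sort by end position
    ends = [b[1] for b in blocks]
    best = [0.0] * (len(blocks) + 1)  # best[i]: best score using first i blocks
    take = [False] * len(blocks)      # whether block i is used in best[i + 1]
    prev = [0] * len(blocks)          # count of earlier blocks compatible with block i

    for i, (start, _end, score) in enumerate(blocks):
        # Earlier blocks that end strictly before this one starts.
        p = bisect_right(ends, start - 1, 0, i)
        prev[i] = p
        with_block = best[p] + score
        if with_block > best[i]:
            best[i + 1] = with_block
            take[i] = True
        else:
            best[i + 1] = best[i]

    # Walk back through the table to recover the chosen blocks.
    chosen = []
    i = len(blocks)
    while i > 0:
        if take[i - 1]:
            chosen.append(blocks[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[len(blocks)], chosen


candidates = [(1, 10, 5.0), (8, 20, 7.0), (21, 30, 4.0), (12, 18, 3.0)]
total, chosen = best_nonoverlapping(candidates)
print(total, chosen)
```

Sorting by end position lets a binary search find the compatible prefix for each block, so the whole selection runs in O(n log n) time.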

2.
Sequence motifs specific for cytosine methyltransferases   Cited by: 2 (self-citations: 0, citations by others: 2)
J Pósfai, A S Bhagwat, R J Roberts. Gene, 1988, 74(1): 261-265
Using a new alignment method, the sequences of 13 m5C methyltransferases (MTases) have been examined. Five extremely well-conserved blocks of sequence have been detected and have been used as fixed points for the alignment of the 13 sequences. Following this initial alignment, five further blocks of similarity have been identified to give a total of ten recognizable blocks of sequence homology that are all arranged in a common order. The structures of these MTases consist of a variable-length N-terminal arm followed by eight well-conserved blocks each separated by small variable-length regions. A large variable-length segment of 90 to 270 amino acids (aa) then follows. After this are two blocks, and a variable-length C-terminal segment completes the sequence. Within the final alignment, 20 aa in the protein sequences, and 86 nucleotides in the nucleotide sequences are invariant. The strongest conservation is found in proximity to a suspected functional site that contains the dipeptide proline-cysteine. Consensus patterns can be defined for the five best conserved blocks and, when used as search motifs, are able to clearly distinguish between the m5C MTases and all other identified proteins in the PIR database. This suggests they may be of use in identifying putative MTases among protein sequences of unknown function.

3.
An extracellular poly-α-L-guluronate lyase from Klebsiella aerogenes degrades those blocks from alginate which contain both mannuronic and guluronic acid residues (poly-MG blocks) to a mixture of oligosaccharides. From an analysis of these products, it is concluded that poly-MG blocks do not have a strictly alternating sequence of the two uronic acid residues. Enzymic degradation of various samples of algal alginate to leave the poly-M blocks intact has shown that these blocks have a uniform chain-length, estimated at 24 residues.

4.
5.
Haspel N, Tsai CJ, Wolfson H, Nussinov R. Proteins, 2003, 51(2): 203-215
We have previously presented a building block folding model. The model postulates that protein folding is a hierarchical top-down process. The basic unit from which a fold is constructed, referred to as a hydrophobic folding unit, is the outcome of combinatorial assembly of a set of "building blocks." Results obtained by the computational cutting procedure yield fragments that are in agreement with those obtained experimentally by limited proteolysis. Here we show that as expected, proteins from the same family give very similar building blocks. However, different proteins can also give building blocks that are similar in structure. In such cases the building blocks differ in sequence, stability, contacts with other building blocks, and in their 3D locations in the protein structure. This result, which we have repeatedly observed in many cases, leads us to conclude that while a building block is influenced by its environment, nevertheless, it can be viewed as a stand-alone unit. For small-sized building blocks existing in multiple conformations, interactions with sister building blocks in the protein will increase the population time of the native conformer. With this conclusion in hand, it is possible to develop an algorithm that predicts the building block assignment of a protein sequence whose structure is unknown. Toward this goal, we have created sequentially nonredundant databases of building block sequences. A protein sequence can be aligned against these, in order to be matched to a set of potential building blocks.

6.
By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive C(alpha) ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (alpha/beta protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling.
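
The accuracy reported when "keeping the first four predicted protein blocks at each site" is a top-k measure: a site counts as correct if the true block appears among the k highest-ranked predictions. The sketch below shows only that bookkeeping, not the Bayesian predictor itself. The function name, the single-letter block labels, and the toy rankings are all made up for illustration.

```python
def top_k_accuracy(ranked_predictions, true_blocks, k):
    """Fraction of sites whose true block is among the first k ranked predictions.

    ranked_predictions: one list per site, ordered from most to least probable.
    true_blocks: the observed block label at each site.
    """
    if len(ranked_predictions) != len(true_blocks):
        raise ValueError("one ranked list is required per site")
    if not true_blocks:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_blocks)
        if truth in ranked[:k]
    )
    return hits / len(true_blocks)


# Toy data: four sites, three ranked candidates each (labels are arbitrary).
ranked = [["m", "d", "f"], ["a", "m", "c"], ["d", "m", "k"], ["f", "k", "m"]]
truth = ["m", "m", "m", "m"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, truth, k))
```

Raising k can only keep or increase the accuracy, which is why the abstract's figures rise as more candidate blocks are retained per site.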

7.
We have amplified, by the polymerase chain reaction, and have sequenced the D-loop region of the mitochondrial DNA from the sperm whale (Physeter macrocephalus). The sperm whale D-loop was aligned with D-loop sequences from four other cetaceans (Commerson's dolphin, orca, fin whale, and minke whale) and an out-group (cow). This alignment showed the sperm whale sequence to be larger than that of other cetaceans. In addition, some sequence blocks were highly conserved among all six species, suggesting roles in the functioning of mitochondrial DNA. Other blocks that were previously reported to be well conserved among cetaceans showed little sequence conservation with the sperm whale D-loop, which argues against the functional importance of these sequence blocks in cetaceans.

8.
The pyr-4 gene of Neurospora crassa encodes orotidine-5'-phosphate decarboxylase, which catalyses the sixth step in the pyrimidine biosynthetic pathway. The complete nucleotide sequence of a 1.8-kb genomic fragment containing the pyr-4 gene has been determined. Using transposon mutagenesis, the coding region has been identified, and the amino acid (aa) sequence deduced. Comparison of the pyr-4 aa sequence with URA3, the equivalent gene of Saccharomyces cerevisiae, showed extensive blocks of homology, with non-homologous sequences between these blocks being generally much longer in Neurospora than in yeast. Computer-predicted protein secondary structure of pyr-4 and URA3 was conserved within equivalent blocks. Upstream sequences of pyr-4 were compared with other sequenced Neurospora genes and possible promoter sequences identified.

9.
E Ohtsuka, Z Tozuka, S Iwai, M Ikehara. Nucleic Acids Research, 1982, 10(20): 6235-6241
A new condensing reagent, 1-(2,4,6-triisopropylbenzenesulfonyl)-5-(pyridin-2-yl)tetrazolide (TPSPy), was found to give one diastereoisomer of dinucleoside monophosphate aryl esters. Several oligodeoxynucleotide blocks were prepared using this reagent. A heptadecanucleotide, dTATCCCTTGCGGTGATA, which had the same sequence as the lambda cro binding DNA sequence, was synthesized by condensing mono-, tri- and dodecanucleotide blocks using this reagent on a polystyrene support.

10.
11.
An automatic procedure is proposed to identify, from the protein sequence database, conserved amino acid patterns (or sequence motifs) that are exclusive to a group of functionally related proteins. This procedure is applied to the PIR database and a dictionary of sequence motifs that relate to specific superfamilies is constructed. The motifs have a practical relevance in identifying the membership of specific superfamilies without the need to perform sequence database searches in 20% of newly determined sequences. The sequence motifs identified represent functionally important sites on protein molecules. When multiple blocks exist in a single motif they are often close together in the 3-D structure. Furthermore, occasionally these motif blocks were found to be split by introns when the correlation with exon structures was examined.

12.
The solid-phase peptide synthesis of a reportedly inaccessible peptide sequence of chaperonin 60.1 (195-219) is described using oxazolidine containing dipeptide building blocks ('pseudo-proline' dipeptide units). Two attempts at the synthesis of the chaperonin 60.1 sequence are outlined using one and two pseudo-proline units, respectively, and these results are compared with the outcome of an ordinary stepwise (double) coupling procedure. The only successful synthesis is that combining two pseudo-proline building blocks.

13.
Three sequence blocks of 10–12 bp are conserved in sequence and order 5' to putative start codons of several higher-plant mitochondrial genes. At least 25 examples were found, primarily associated with coxII, atp6, and orf25, in monocotyledons and dicotyledons. The proximal block can be 9 bp from start codons, and the three blocks generally occur within 100 bp 5' of start codons. In three examples 5' termini of the blocks represent recombination breakpoints, resulting in conservation of the blocks in resultant configurations. The two proximal blocks can form a secondary structure motif. The occurrence of the blocks near start codons, and conserved sequence and order, is consistent with a possible role in translation initiation or regulation.

14.
Yuri Motorin. Gene, 1996, 170(2): 289-290
Five blocks of significant differences exist between two published sequences of the cDNA encoding human valyl-tRNA synthetase (GenBank X59303 and M98326). By comparison with the partial sequence of rat valyl-tRNA synthetase (GenBank M98327) the correct sequence can be deduced for two such blocks. The possible origin of the diversity for the two sequences is discussed.

15.
The Blocks Database World Wide Web (http://www.blocks.fhcrc.org) and Email (blocks@blocks.fhcrc.org) servers provide tools for the detection and analysis of protein homology based on alignment blocks representing conserved regions of proteins. During the past year, searching has been augmented by supplementation of the Blocks Database with blocks from the Prints Database, for a total of 4754 blocks from 1163 families. Blocks from both the Blocks and Prints Databases and blocks that are constructed from sequences submitted to Block Maker can be used for blocks-versus-blocks searching of these databases with LAMA, and for viewing logos and bootstrap trees. Sensitive searches of up-to-date protein sequence databanks are carried out via direct links to the MAST server using position-specific scoring matrices and to the BLAST and PSI-BLAST servers using consensus-embedded sequence queries. Utilizing the trypsin family to evaluate performance, we illustrate the superiority of blocks-based tools over expert pairwise searching or Hidden Markov Models.

16.
The nucleotide sequence coding for human angiogenin has been deduced from the published amino acid sequence with the use of codons preferentially utilized in highly expressed E. coli genes. It was divided into forty-three oligonucleotides, which were synthesized on an automatic gene assembler and then joined by DNA ligase into three double-stranded blocks. The blocks were subsequently cloned and ligated in M13mp8 phage, and the resultant 389-bp DNA sequence coding for human angiogenin was analysed by the chain-terminator sequencing technique.

17.
A workbench for multiple alignment construction and analysis   Cited by: 126 (self-citations: 0, citations by others: 126)
Multiple sequence alignment can be a useful technique for studying molecular evolution, as well as for analyzing relationships between structure or function and primary sequence. We have developed for this purpose an interactive program, MACAW (Multiple Alignment Construction and Analysis Workbench), that allows the user to construct multiple alignments by locating, analyzing, editing, and combining "blocks" of aligned sequence segments. MACAW incorporates several novel features. (1) Regions of local similarity are located by a new search algorithm that avoids many of the limitations of previous techniques. (2) The statistical significance of blocks of similarity is evaluated using a recently developed mathematical theory. (3) Candidate blocks may be evaluated for potential inclusion in a multiple alignment using a variety of visualization tools. (4) A user interface permits each block to be edited by moving its boundaries or by eliminating particular segments, and blocks may be linked to form a composite multiple alignment. No completely automatic program is likely to deal effectively with all the complexities of the multiple alignment problem; by combining a powerful similarity search algorithm with flexible editing, analysis and display tools, MACAW allows the alignment strategy to be tailored to the problem at hand.

18.
The role of recombination and mutation in 16S-23S rDNA spacer rearrangements   Cited by: 25 (self-citations: 0, citations by others: 25)
Gürtler V. Gene, 1999, 238(1): 241-252
The intragenomic heterogeneity of the bacterial intergenic (16S-23S rDNA) spacer region (ISR) was analysed from the following species in which sequences for the complete rRNA operon (rrn) set have been determined (rrn number): Enterococcus faecalis (6) and E. faecium (6), Bacillus subtilis (10), Staphylococcus aureus (9), Vibrio cholerae (4), Haemophilus influenzae (6) and Escherichia coli (7). It was found that some spacer sequence blocks were highly conserved between operons of a genome, whereas the presence of others was variable. When these variations were analysed using the program PLATO and partial likelihood phylogenies determined by DNAml for each operon set, three regions showed significant (Z>3.3) spatial variation [Region I was 78-184 nt long (2.14.4) possibly due to recombination or selection. Within Region I, there was sequence block variation in all operon sets [some operons contained tRNA genes (tRNAala, tRNAile or tRNAglu), whereas others had sequence blocks such as VS2 (S. aureus) or rsl (E. coli)]. Analysis of the ISR sequence from E. faecalis and E. faecium showed that there was more interspecies than intraspecies variation (both in DNA sequence and in the presence or absence of blocks). Dot matrix analysis of the sequence blocks in the nine rrn ISRs from S. aureus showed that there was significant homology between VS2 and VS5/VS6. Furthermore, repeat motifs with only A or T were present in higher copy numbers in VS5/VS6 than in VS2. Since these sequence blocks (VS2 and VS5-VS6) are related, intragenic evolution resulting in AT expansion may have occurred between these two regions. A model is proposed that postulates a role for recombination and AT-expansion in intra-genomic ISR variations. This process may represent a general mechanism of concerted evolution for bacterial ISR rearrangements.
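
The abstract mentions dot matrix analysis without giving parameters. For readers unfamiliar with the technique, a minimal sketch follows: every window of one sequence is compared with every window of the other, and a dot is recorded where enough positions match. The window size, the match threshold, the function name, and the toy sequences are assumptions for illustration and are not taken from the paper.

```python
def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Return (i, j) pairs where windows starting at seq_a[i] and seq_b[j] agree.

    A dot is recorded when at least min_matches of the window positions
    are identical. Runs of dots along a diagonal indicate similar regions.
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


# Toy sequences: the shared "TTAC" shows up as two dots on one diagonal.
print(dot_matrix("GATTACA", "TTAC"))
```

Lowering min_matches below the window size tolerates mismatches at the cost of more background dots.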

19.
20.
The 3' untranslated (UT) sequences of the genomic RNAs of five geographic variants of the alphavirus Ross River virus (RRV) were determined and compared with the 3' UT sequence of RRV T48, the prototype strain. Part of the 3' UT region of Getah virus, a close serological relative of RRV, was also sequenced. The RRV 3' UT region varies markedly in length between variants. Large deletions or insertions, sequence rearrangements and single nucleotide substitutions are observed. A sequence tract of 49 to 58 nucleotides, which is repeated as four blocks in the RRV T48 3' UT region, occurs only once in the 3' UT region of one RRV strain (NB5092), indicating that the existence of repeat sequence blocks is not essential for RRV replication. However, the precise sequence of the 3' proximal copy of the repeat block and its position relative to the poly(A) tail were identical in all RRV isolates examined, suggesting that it has an important role in RRV replication. Nucleotide substitutions between RRV variants are distributed non-randomly along the length of the 3' UT region. The sequence of 120 to 130 nucleotides adjacent to the poly(A) tail is strongly conserved. Getah virus RNA contains three repeat sequence blocks in the 3' UT region. These are similar in sequence to those in RRV RNA but differ in their arrangement. Homology between the RRV and Getah 3' UT sequences is greatest in the 3' proximal repeat sequence block that shows three differences in 49 nucleotides. The 3' proximal repeat in Getah RNA occurs at the same position, relative to the poly(A) tail, as in all RRV variants. The RRV and Getah virus 3' UT sequences show extensive homology in the region between the 3' proximal repeat and the poly(A) tail but, apart from the repeat blocks themselves, they show no significant homology elsewhere.
