Similar articles
Found 20 similar articles; search took 15 ms.
1.
High-throughput macromolecular structure determination is essential in structural genomics because the amount of available sequence information far exceeds the number of available 3D structures. ACORN, a freely available program in the CCP4 suite, is a comprehensive and efficient program for phasing in protein structure determination when atomic-resolution data are available. ACORN, combined with the automatic model-building program ARP/wARP and the refinement program REFMAC, is a suitable combination for high-throughput structural genomics. ACORN can also be run with secondary structural elements such as helices and sheets as inputs when high-resolution data are available. Where ACORN phasing is not sufficient for building the protein model, the fragments (incomplete model/dummy atoms) can be used again as a starting input. Iterative ACORN is shown to work efficiently in the subsequent model-building stages for congerin (PDB ID: 1is3) and catalase (PDB ID: 1gwe), for which models are available.

2.
In this paper we describe a method for the statistical reconstruction of a large DNA sequence from a set of sequenced fragments. We assume that the fragments have been assembled and address the problem of determining the degree to which the reconstructed sequence is free from errors, i.e., its accuracy. A consensus distribution is derived from the assembled fragment configuration based upon the rates of sequencing errors in the individual fragments. The consensus distribution can be used to find a minimally redundant consensus sequence that meets a prespecified confidence level, either base by base or across any region of the sequence. A likelihood-based procedure for the estimation of the sequencing error rates, which utilizes an iterative EM algorithm, is described. Prior knowledge of the error rates is easily incorporated into the estimation procedure. The methods are applied to a set of assembled sequence fragments from the human G6PD locus. We close the paper with a brief discussion of the relevance and practical implications of this work.
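The idea of a per-base consensus distribution can be illustrated with a small sketch. It is not the paper's model: it assumes a uniform prior over the four bases, independent fragments, and errors that substitute any of the three other bases with equal probability. The function names and the 0.99 default are hypothetical.

```python
from math import prod

BASES = "ACGT"

def column_posterior(observed, error_rates):
    """Posterior over the true base for one column of the assembled fragments.

    observed    -- the base each fragment shows at this position, e.g. "AAC"
    error_rates -- per-fragment error rate, in the same order as `observed`
    Assumes a uniform prior and uniform substitution errors (illustrative only).
    """
    weights = {}
    for base in BASES:
        weights[base] = prod(
            (1.0 - e) if obs == base else e / 3.0
            for obs, e in zip(observed, error_rates)
        )
    total = sum(weights.values())  # zero only if error rates of 0 contradict each other
    return {base: w / total for base, w in weights.items()}

def call_base(observed, error_rates, confidence=0.99):
    """Return the most probable base, or "N" if it misses the confidence level."""
    post = column_posterior(observed, error_rates)
    best = max(post, key=post.get)
    return (best if post[best] >= confidence else "N"), post[best]
```

Under these assumptions, three agreeing reads with a 1% error rate give a confident call, while two disagreeing reads of equal quality fall below the threshold and yield "N".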

3.
Understanding, and ultimately predicting, how a 1-D protein chain reaches its native 3-D fold has been one of the most challenging problems during the last few decades. Data increasingly indicate that protein folding is a hierarchical process. Hence, the question arises as to whether we can use the hierarchical concept to reduce the practically intractable computational times. For such a scheme to work, the first step is to cut the protein sequence into fragments that form local minima on the polypeptide chain. The conformations of such fragments in solution are likely to be similar to those when the fragments are embedded in the native fold, although alternate conformations may be favored during the mutual stabilization in the combinatorial assembly process. Two elements are needed for such cutting: (1) a library of (clustered) fragments derived from known protein structures and (2) an assignment algorithm that selects optimal combinations to "cover" the protein sequence. The next two steps in hierarchical folding schemes, not addressed here, are the combinatorial assembly of the fragments and finally, optimization of the obtained conformations. Here, we address the first step in a hierarchical protein-folding scheme. The input is a target protein sequence and a library of fragments created by clustering building blocks that were generated by cutting all protein structures. The output is a set of cutout fragments. We briefly outline a graph theoretic algorithm that automatically assigns building blocks to the target sequence, and we describe a sample of the results we have obtained.

4.
Najafabadi HS, Saberi A, Torabi N, Chamankhah M. BioTechniques 2008, 44(4): 519-20, 522, 524-6
This work introduces minimum accumulative degeneracy, a variant of the degenerate primer design problem, which is particularly useful when a large number of sequences are to be covered by a restricted number of primers. A primer set designed on a minimum accumulative degeneracy basis especially helps to reduce nonspecific PCR amplification of undesired DNA fragments, as fewer primer species are present in the PCR. A Boltzmann machine, called the MAD-DPD Boltzmann machine, is designed to solve the minimum accumulative degeneracy degenerate primer design problem. This algorithm shows great flexibility, as it can be set either to solve the problem with strict fidelity to covering all input sequences or to exclude some input sequences if doing so results in less degenerate primers. This Boltzmann machine is successfully applied to designing a new set of primers for amplification of antibody variable fragments from mouse spleen cells, which theoretically covers more diverse antibody sequences than currently available primers. The MAD-DPD Boltzmann machine is available online at bioinf.cs.ipm.ir/download/MAD_DPD08172007.zip.
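To make the quantities in this abstract concrete, the sketch below computes the degeneracy of a primer written with IUPAC ambiguity codes and checks whether a primer set covers a list of target sites. It is not the MAD-DPD Boltzmann machine. Treating "accumulative degeneracy" as the sum of the individual primer degeneracies is an assumption that the abstract does not confirm, and all function names are hypothetical.

```python
# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct plain sequences a degenerate primer represents."""
    result = 1
    for code in primer:
        result *= len(IUPAC[code])
    return result

def matches(primer, site):
    """True if every base of `site` is allowed by the code at that position."""
    return len(primer) == len(site) and all(
        base in IUPAC[code] for code, base in zip(primer, site)
    )

def accumulative_degeneracy(primers):
    """Assumed definition: the sum of the degeneracies over the primer set."""
    return sum(degeneracy(p) for p in primers)

def covers_all(primers, sites):
    """True if every target site is matched by at least one primer."""
    return all(any(matches(p, s) for p in primers) for s in sites)
```

For example, the primer `AYG` stands for two primer species (`ACG` and `ATG`), so it covers both of those sites at a degeneracy of 2.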

5.
A new method based on neural networks to cluster proteins into families is described. The network is trained with the Kohonen unsupervised learning algorithm, using matrix pattern representations of the protein sequences as inputs. The components (x, y) of these 20×20 matrix patterns are the normalized frequencies of all pairs xy of amino acids in each sequence. We investigate the influence of different learning parameters on the final topological maps obtained with a learning set of ten proteins belonging to three established families. In all cases, except those where the synaptic vectors remain nearly unchanged during learning, the ten proteins are correctly classified into the expected families. The classification by the trained network of mutated or incomplete sequences of the learned proteins is also analysed. The neural network gives a correct classification for a sequence mutated in 21.5%±7% of its amino acids and for fragments representing 7.5%±3% of the original sequence. Similar results were obtained with a learning set of 32 proteins belonging to 15 families. These results show that a neural network can be trained following the Kohonen algorithm to obtain topological maps of protein sequences, where related proteins are finally associated with the same winner neuron or with neighboring ones, and that the trained network can be applied to rapidly classify new sequences. This approach opens new possibilities for finding rapid and efficient algorithms to organize and search for homologies in the whole protein database.
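The 20×20 input pattern described above can be sketched as follows. The abstract does not say whether the pairs are adjacent residues or how the counts are normalized; this sketch assumes overlapping adjacent pairs divided by the total number of pairs, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def dipeptide_matrix(sequence):
    """Return a 20x20 list of lists; entry [x][y] is the fraction of adjacent
    residue pairs in `sequence` that are x followed by y.

    Pairs containing a non-standard letter are skipped (an assumption).
    """
    counts = [[0.0] * 20 for _ in range(20)]
    total = 0
    for x, y in zip(sequence, sequence[1:]):
        if x in INDEX and y in INDEX:
            counts[INDEX[x]][INDEX[y]] += 1
            total += 1
    if total == 0:
        return counts  # too short or no valid pairs: all zeros
    return [[c / total for c in row] for row in counts]
```

Flattening this matrix gives a 400-component vector of the kind that could be presented to a Kohonen map; the network itself is not reproduced here.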

6.
Summary: We discuss the identification of multiple input, multiple output, discrete-time bilinear state space systems. We consider two identification problems. In the first case, the input to the system is a measurable white noise sequence. We show that it is possible to identify the system by solving a nonlinear optimization problem. The number of parameters in this optimization problem can be reduced by exploiting the principle of separable least squares. A subspace-based algorithm can be used to generate initial estimates for this nonlinear identification procedure. In the second case, the input to the system is not measurable. This makes it a much more difficult identification problem than the case with known inputs. At present, we can only solve this problem for a certain class of single input, single output bilinear state space systems, namely bilinear systems in phase variable form.

7.
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.

8.
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision-induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model when analyzing triply charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
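The Naive model described in this abstract can be sketched as a simple enumeration: every peptide bond is cleaved, and each resulting b and y fragment is predicted at every charge below the precursor charge. This is an illustration only, not Basophile and not any search engine's actual code. Masses are omitted, the function name is hypothetical, and allowing charge 1 for a singly charged precursor is an assumption.

```python
def naive_fragments(peptide, precursor_charge):
    """Enumerate (ion label, fragment sequence, charge) under the Naive model.

    b ions are N-terminal fragments and y ions are C-terminal fragments.
    Charges run from 1 up to precursor_charge - 1; a singly charged precursor
    is assumed to give singly charged fragments.
    """
    max_charge = max(1, precursor_charge - 1)
    fragments = []
    for i in range(1, len(peptide)):  # one cleavage per peptide bond
        for z in range(1, max_charge + 1):
            fragments.append((f"b{i}", peptide[:i], z))
            fragments.append((f"y{len(peptide) - i}", peptide[i:], z))
    return fragments
```

For a triply charged seven-residue peptide this yields 6 bonds × 2 ion types × 2 charges = 24 predicted fragments, which shows how quickly the prediction list grows with precursor charge.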

9.
Polyacrylamide gel electrophoresis of DNA fragments obtained by the polymerase chain reaction using Taq polymerase revealed the presence of multiple fragments shorter than the expected product. These abortive extension products were observed even when analysis by agarose gel electrophoresis showed only a single band. The production of prematurely terminated fragments can be exploited for the sequencing of PCR products if phosphorothioate groups are incorporated base specifically during the reaction in the presence of two oligonucleotide primers, one of which is 5'-32P-labeled. The addition of snake venom phosphodiesterase to the reaction mixture after completion of the amplification cycles digests each fragment from the 3'-end to a phosphorothioate group so that the sequence can be read by polyacrylamide gel electrophoresis.

10.
One of the main problems in constructing synthetic genes is incorrect hybridisation between the oligonucleotides. The problem is resolved if the sequence of each oligonucleotide uniquely defines its position in the assembled gene. This can be accomplished by judiciously partitioning the dsDNA sequence into fragments. We describe a program for designing such gene assemblies. For a given DNA sequence and the approximate locations of the oligonucleotide boundaries, it generates all sets of protruding ends that share the smallest homology.

11.
The Gibbs sampling method has been widely used for sequence analysis after it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then, numerous variants of the original idea have emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison-based gene prediction methods that make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison-based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.

12.
Duplex DNA fragments differing by single base substitutions can be separated by electrophoresis in denaturing gradient polyacrylamide gels, but only substitutions in a restricted part of the molecule lead to a separation (1). In an effort to circumvent this problem, we demonstrated that the melting properties and electrophoretic behavior of a 135 base pair DNA fragment containing a beta-globin promoter are changed by attaching a GC-rich sequence, called a 'GC-clamp' (2). We predicted that these changes should make it possible to resolve most, if not all, single base substitutions within fragments attached to the clamp. To test this possibility we examined the effect of several different single base substitutions on the electrophoretic behavior of the beta-globin promoter fragment in denaturing gradient gels. We find that the GC-clamp allows the separation of fragments containing substitutions throughout the promoter fragment. Many of these substitutions do not lead to a separation when the fragment is not attached to the clamp. Theoretical calculations and analysis of a large number of different mutations indicate that approximately 95% of all possible single base substitutions should be separable when attached to a GC-clamp.

13.
DNA-encoded chemical libraries are large collections of small organic molecules, individually coupled to DNA fragments that serve as amplifiable identification bar codes. The isolation of specific binders requires a quantitative analysis of the distribution of DNA fragments in the library before and after capture on an immobilized target protein of interest. Here, we show how Illumina sequencing can be applied to the analysis of DNA-encoded chemical libraries, yielding over 10 million DNA sequence tags per flow-lane. The technology can be used in a multiplex format, allowing the encoding and subsequent sequencing of multiple selections in the same experiment. The sequence distributions in DNA-encoded chemical library selections were found to be similar to the ones obtained using 454 technology, thus reinforcing the concept that DNA sequencing is an appropriate avenue for the decoding of library selections. The large number of sequences obtained with the Illumina method now enables the study of very large DNA-encoded chemical libraries (>500,000 compounds) and reduces decoding costs.

14.
SUMMARY: Multiple sequence alignment is the NP-hard problem of aligning three or more DNA or amino acid sequences in an optimal way so as to match as many characters as possible from the set of sequences. The popular sequence alignment program ClustalW uses the classical method of approximating a sequence alignment, by first computing a distance matrix and then constructing a guide tree to show the evolutionary relationship of the sequences. We show that parallelizing the ClustalW algorithm can result in significant speedup. We used a cluster of workstations using C and message passing interface for our implementation. Experimental results show that speedup of over 5.5 on six processors is obtainable for most inputs. AVAILABILITY: The software is available upon request from the second author.

15.
Sequence specificity of curved DNA (total citations: 16; self-citations: 0; citations by others: 16)
S Diekmann. FEBS Letters 1986, 195(1-2): 53-56
Anomalously slow migration of DNA fragments on polyacrylamide gels is interpreted as resulting from curvature of the DNA fragment. Different models have been suggested to explain DNA curvature. In this work a number of DNA fragments were synthesized, cloned, and electrophoretically characterized to distinguish between these models. Strong anomaly of migration is found for sequence stretches (dA)n repeated in phase with the helix turn with n at least 4. For n smaller than 4 only negligible anomaly is observed. The results contradict the purine-clash hypothesis. The data can be explained by assuming longer stretches of As to be in a B'-form, and that tilt of this structure might be the reason for its curvature.

16.
Dong Q, Wang X, Lin L. Proteins 2008, 72(1): 353-366
In recent years, protein structure prediction using local structure information has made great progress. In this study, a novel and effective method is developed to predict the local structure and the folding fragments of proteins. First, the proteins with known structures are split into fragments. Second, these fragments, represented by dihedrals, are clustered to produce the building blocks (BBs). Third, an efficient machine learning method is used to predict the local structures of proteins from sequence profiles. Finally, a bi-gram model, trained by an iterated algorithm, is introduced to simulate the interactions of these BBs. For test proteins, the building-block lattice is constructed, which contains all the folding fragments of the proteins. The local structures and the optimal fragments are then obtained by the dynamic programming algorithm. The experiment is performed on a subset of the PDB database with sequence identity less than 25%. The results show that the performance of the method is better than the method that uses only sequence information. When multiple paths are returned, the average classification accuracy of local structures is 72.27% and the average prediction accuracy of local structures is 67.72%, which is a significant improvement in comparison with previous studies. The method can predict not only the local structures but also the folding fragments of proteins. This work is helpful for the ab initio protein structure prediction and especially, the understanding of the folding process of proteins.

17.
An Eulerian path approach to global multiple alignment for DNA sequences (total citations: 3; self-citations: 0; citations by others: 3)
With the rapid increase in the dataset of genome sequences, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available methods. Our motivation comes from the Eulerian method for fragment assembly in DNA sequencing that transforms all DNA fragments into a de Bruijn graph and then reduces sequence assembly to a Eulerian path problem. The paper focuses on global multiple alignment of DNA sequences, where entire sequences are aligned into one configuration. Our main result is an algorithm with almost linear computational speed with respect to the total size (number of letters) of sequences to be aligned. Five hundred simulated sequences (averaging 500 bases per sequence and as low as 70% pairwise identity) have been aligned within three minutes on a personal computer, and the quality of alignment is satisfactory. As a result, accurate and simultaneous alignment of thousands of long sequences within a reasonable amount of time becomes possible. Data from an Arabidopsis sequencing project is used to demonstrate the performance.
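The Eulerian fragment-assembly idea that motivates this paper can be sketched in a few lines. This is a toy illustration of the de Bruijn graph construction and an Eulerian walk over it, not the paper's alignment algorithm. It assumes error-free fragments, keeps each distinct k-mer once, and does not check that an Eulerian path actually exists. All names are hypothetical.

```python
from collections import defaultdict

def de_bruijn_edges(fragments, k):
    """Build a de Bruijn graph: each distinct k-mer is one edge from its
    (k-1)-letter prefix to its (k-1)-letter suffix."""
    kmers = set()
    for frag in fragments:
        for i in range(len(frag) - k + 1):
            kmers.add(frag[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer-style walk that uses every edge once (assumes a path exists
    and that the graph has at least one edge)."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg.get(n, 0) == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]

def spell(path):
    """Reconstruct the sequence spelled by a path of (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])
```

With the overlapping fragments `ACGTC` and `GTCAA` and k = 3, the graph is a single chain and the walk spells the merged sequence `ACGTCAA`.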

18.
A complete simulation of the selection process can be constructed using a population of self-replicating finite-state automata. The entire population is challenged with a repeating sequence of inputs, and those individuals that are best able to recognize the input sequence are allowed to replicate most rapidly. Replication proceeds with imperfect fidelity, so that under the constraint of constant total population size, a quasispecies distribution of error copies is obtained. The operation of this simulation provides an essential representation of an evolving system. When the input sequence is altered, the structure of the existing population is destabilized, and a new quasispecies distribution emerges. The ability of the system to respond to changes in the input and the structure of the quasispecies distribution are shown to be critically dependent on the fidelity of replication.

19.
20.
Structural genomics projects envision almost routine protein structure determinations, which are currently imaginable only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequences that can fold into native structures, even when isolated from the rest of the protein. Such segments are autonomously folding units (AFU) and have sizes suitable for fast structural analyses. Here, we propose to expand an intuitive procedure often employed for identifying biologically important domains into an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as a combination of independent domains conserved among diverse organisms. We have thus developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem of recognizing rectangular-shaped elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set composed of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFU in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases. Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide efficient support to structural genomics projects.
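The geometric step described above, finding rectangular elevations in a plot of the number of similar sequences per residue, can be approximated with a simple threshold scan. This is a sketch under stated assumptions rather than the authors' program: hits are given as 1-based inclusive (start, end) ranges on the query, and the `min_hits` and `min_length` thresholds are hypothetical parameters.

```python
def coverage_profile(query_length, hits):
    """Count, for each query residue, how many hits cover it.

    hits -- iterable of (start, end) query coordinates, 1-based and inclusive,
            such as those that could be parsed from a BLAST report.
    """
    diff = [0] * (query_length + 2)
    for start, end in hits:
        diff[start] += 1      # coverage rises at the start of a hit
        diff[end + 1] -= 1    # and falls just past its end
    profile, running = [], 0
    for pos in range(1, query_length + 1):
        running += diff[pos]
        profile.append(running)
    return profile

def elevated_segments(profile, min_hits, min_length):
    """Return (start, end) ranges where coverage stays at or above `min_hits`
    for at least `min_length` consecutive residues."""
    segments, start = [], None
    for pos, count in enumerate(profile, start=1):
        if count >= min_hits and start is None:
            start = pos
        elif count < min_hits and start is not None:
            if pos - start >= min_length:
                segments.append((start, pos - 1))
            start = None
    if start is not None and len(profile) - start + 1 >= min_length:
        segments.append((start, len(profile)))
    return segments
```

Each returned range is a candidate segment; whether it corresponds to an autonomously folding unit would still have to be judged by the criteria the authors describe.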
