Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Primer design for large scale sequencing. Cited 10 times in total (self-citations: 4; citations by others: 6)
We have developed PRIDE, a primer design program that automatically designs primers in single contigs or whole sequencing projects to extend the already known sequence and to double strand single-stranded regions. The program is fully integrated into the Staden package (GAP4) and accessible with a graphical user interface. PRIDE uses a fuzzy logic-based system to calculate primer qualities. The computational performance of PRIDE is enhanced by using suffix trees to store the huge amount of data being produced. A test set of 110 sequencing primers and 11 PCR primer pairs has been designed on genomic templates, cDNAs and sequences containing repetitive elements to analyze PRIDE's success rate. The high performance of PRIDE, combined with its minimal requirement of user interaction and its fast algorithm, makes this program useful for the large scale design of primers, especially in large sequencing projects.

2.
J M Claverie, Genomics, 1992, 12(4):838-841
The search for significant local similarities with known protein sequences is a powerful method for interpreting anonymous cDNA sequences or locating coding exons within genomic DNA sequences at a stage where the average contig size is still very small. The BLASTx program, implemented on the National Center for Biotechnology Information server, allows a sensitive search of all putative translations of a nucleotide query sequence against all known proteins in a matter of seconds. From an analysis of the current databases, I report a set of protein sequences exhibiting high local similarity to Alu repeat or vector sequences. These entries can lead to misleading interpretations of similarity searches. During the course of this study, the protease of a human spumaretrovirus was found to have integrated the 3' end half of the U2 snRNA.
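As a rough illustration of what "all putative translations of a nucleotide query sequence" means, the sketch below translates a DNA string in all six reading frames using the standard genetic code. It is not BLASTx itself and performs no database search; the function names `translate` and `six_frame_translations` are hypothetical, and the input is assumed to contain only the letters A, C, G and T.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translations(dna: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3 (reverse complement)."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
```

Each of the six strings would then be compared against a protein database; `*` marks a stop codon.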

3.
4.
The challenge of similarity search in massive DNA sequence databases has inspired major changes in BLAST-style alignment tools, which accelerate search by inspecting only pairs of sequences sharing a common short "seed," or pattern of matching residues. Some of these changes raise the possibility of improving search performance by probing sequence pairs with several distinct seeds, any one of which is sufficient for a seed match. However, designing a set of seeds to maximize their combined sensitivity to biologically meaningful sequence alignments is computationally difficult, even given recent advances in designing single seeds. This work describes algorithmic improvements to seed design that address the problem of designing a set of n seeds to be used simultaneously. We give a new local search method to optimize the sensitivity of seed sets. The method relies on efficient incremental computation of the probability that an alignment contains a match to a seed π, given that it has already failed to match any of the seeds in a set Π. We demonstrate experimentally that multi-seed designs, even with relatively few seeds, can be significantly more sensitive than even optimized single-seed designs.
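To make the notion of "several distinct seeds, any one of which is sufficient for a seed match" concrete, here is a minimal sketch. It is not the authors' seed-design method: it only tests whether an ungapped alignment, encoded as a string of match/mismatch flags, is hit by any seed in a set. The encoding, the example seeds and the function names `seed_hits` and `any_seed_hits` are assumptions made for illustration.

```python
def seed_hits(alignment: str, seed: str) -> bool:
    """Return True if `seed` matches at some offset of `alignment`.

    alignment: string over {'1', '0'}; '1' is a matching column, '0' a mismatch.
    seed: string over {'1', '*'}; '1' must align to a match, '*' is "don't care".
    """
    required = [i for i, symbol in enumerate(seed) if symbol == "1"]
    for start in range(len(alignment) - len(seed) + 1):
        if all(alignment[start + i] == "1" for i in required):
            return True
    return False


def any_seed_hits(alignment: str, seeds: list) -> bool:
    """A multi-seed match: at least one seed in the set hits the alignment."""
    return any(seed_hits(alignment, seed) for seed in seeds)


# Hypothetical alignment with a mismatch at every third column.
alignment = "1101101101"
contiguous_seed = "111"   # three consecutive matches required
spaced_seed = "11*1"      # same weight (3), one "don't care" position

hit_contiguous = seed_hits(alignment, contiguous_seed)
hit_spaced = seed_hits(alignment, spaced_seed)
hit_either = any_seed_hits(alignment, [contiguous_seed, spaced_seed])
```

In this example the contiguous seed misses the alignment while the spaced seed hits it, so the two-seed set detects an alignment that the contiguous seed alone would not.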

5.
MOTIVATION: Advances in DNA microarray technology and computational methods have unlocked new opportunities to identify 'DNA fingerprints', i.e. oligonucleotide sequences that uniquely identify a specific genome. We present an integrated approach for the computational identification of DNA fingerprints for design of microarray-based pathogen diagnostic assays. We provide a quantifiable definition of a DNA fingerprint stated from both a computational and an experimental point of view, and the analytical proof that all in silico fingerprints satisfying the stated definition are found using our approach. RESULTS: The presented computational approach is implemented in an integrated high-performance computing (HPC) software tool for oligonucleotide fingerprint identification termed TOFI. We employed TOFI to identify in silico DNA fingerprints for several bacteria and plasmid sequences, which were then experimentally evaluated as potential probes for microarray-based diagnostic assays. Results and analysis of approximately 150 in silico DNA fingerprints for Yersinia pestis and 250 fingerprints for Francisella tularensis are presented. AVAILABILITY: The implemented algorithm is available upon request.

6.
7.
A core task in computational structural biology is the search of conformational space for low energy configurations of a biological macromolecule. Because conformational space has a very high dimensionality, the most successful search methods integrate some form of prior knowledge into a general sampling algorithm to reduce the effective dimensionality. However, integrating multiple types of constraints can be challenging. To streamline the incorporation of diverse constraints, we developed the Broker: an extension of the Rosetta macromolecular modeling suite that can express a wide range of protocols using constraints by combining small, independent modules, each of which implements a different set of constraints. We demonstrate expressiveness of the Broker through several code vignettes. The framework enables rapid protocol development in both biomolecular design and structural modeling tasks and thus is an important step towards exposing the rich functionality of Rosetta’s core libraries to a growing community of users addressing a diverse set of tasks in computational biology.

8.
9.
The search for significant local similarities with known protein sequences is a powerful method for interpreting anonymous cDNA sequences or locating coding exons within genomic DNA sequences at a stage where the average contig size is still very small. The BLASTx program, implemented on the National Center for Biotechnology Information server, allows a sensitive search of all putative translations of a nucleotide query sequence against all known proteins in a matter of seconds. From an analysis of the current databases, I report a set of protein sequences exhibiting high local similarity to Alu repeat or vector sequences. These entries can lead to misleading interpretations of similarity searches. During the course of this study, the protease of a human spumaretrovirus was found to have integrated the 3′ end half of the U2 snRNA.

10.
11.
Microsatellites, such as (TG)n, found at random throughout the genome or as 3' extensions of Alu sequences, are being increasingly used as genetic markers because of their pluriallelic character. The search for polymorphic microsatellites is time consuming, however, as it is necessary to sequence clones containing the microsatellite sequences in order to design specific PCR primers before testing for polymorphism, which does not always occur. We propose here a new approach to generate polymorphic markers, based on the amplification of microsatellites at the 3' end of Alu sequences, without the need for cloning or sequencing steps.

12.
A self-consistent theory is presented that can be used to estimate the number and composition of sequences satisfying a predetermined set of constraints. The theory is formulated so as to examine the features of sequences having a particular value of $\Delta = E_f - \langle E \rangle_u$, where $E_f$ is the energy of sequences when in a target structure and $\langle E \rangle_u$ is an average energy of non-target structures. The theory yields the probabilities $w_i(\alpha)$ that each position $i$ in the sequence is occupied by a particular monomer type $\alpha$. The theory is applied to a simple lattice model of proteins. Excellent agreement is observed between the theory and the results of exact enumerations. The theory provides a quantitative framework for the design and interpretation of combinatorial experiments involving proteins, where a library of amino acid sequences is searched for sequences that fold to a desired structure.

13.
SUMMARY: We have developed U-PRIMER, a primer design program, to compute a minimal primer set (MPS) for any given set of DNA sequences. The U-PRIMER algorithm uses automatic variable fixing and automatic redundant constraint elimination to tackle the binary integer programming problem associated with the MPS selection problem. The program has been tested successfully with 32 adipocyte development-related genes and 9 TB-specific genes to obtain their respective MPSs. AVAILABILITY: A free copy of U-PRIMER implemented in C++ programming language is available from http://www.u-vision-biotech.com
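The MPS selection problem can be read as a covering problem: pick as few primers as possible so that every target sequence is covered by at least one chosen primer. The sketch below swaps in a simple greedy set-cover heuristic purely for illustration. It is not the U-PRIMER algorithm, it does not solve the binary integer program, and it does not guarantee a minimal set. The primer names, gene names and the function `greedy_primer_set` are hypothetical.

```python
def greedy_primer_set(coverage: dict, targets: set) -> list:
    """Greedily choose primers until every target sequence is covered.

    coverage: maps a primer name to the set of target sequences it can prime.
    targets: the set of sequences that must each be covered by some primer.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Pick the primer covering the most still-uncovered targets;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            raise ValueError(f"no primer covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical candidate primers and the genes each one can amplify.
coverage = {
    "p1": {"g1", "g2", "g3"},
    "p2": {"g3", "g4"},
    "p3": {"g4", "g5"},
    "p4": {"g1"},
}
targets = {"g1", "g2", "g3", "g4", "g5"}

primer_set = greedy_primer_set(coverage, targets)
```

Here the greedy choice selects `p1` first (three genes) and then `p3` (the remaining two). An exact integer programming formulation, as named in the abstract, would be needed to prove that a selected set is minimal in general.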

14.
Primer approximation multiplex PCR (PAMP) is a new experimental protocol for efficiently assaying structural variation in genomes. PAMP is particularly suited to cancer genomes where the precise breakpoints of alterations such as deletions or translocations vary between patients. The design of PCR primer sets for PAMP is challenging because a large number of primer pairs are required to detect alterations in the hundreds of kilobases range that can occur in cancer. These sets of primers must achieve high coverage of the region of interest, while avoiding primer dimers and satisfying the physico-chemical constraints of good PCR primers. We describe a natural formulation of these constraints as a combinatorial optimization problem. We show that the PAMP primer design problem is NP-hard, and design algorithms based on simulated annealing and integer programming that provide good solutions to this problem in practice. The algorithms are applied to a test region around the known CDKN2A deletion and show excellent results even in a 1:49 mixture of mutated:wild-type cells. We use these test results to help set design parameters for larger problems. We can achieve near-optimal designs for regions close to 1 Mb.

15.
A Windows program for metabolic engineering analysis and experimental design has been developed. A graphical user interface enables the pictorial, "on-screen" construction of a metabolic network. Once a model is composed, balance equations are automatically generated. Model construction, modification and information exchange between different users are thus considerably simplified. For a given model, the program can then be used to predict all the extreme point flux distributions that optimize an objective function while satisfying balances and constraints by using a depth-first search strategy. One can also find the minimum reaction set that satisfies different conditions. Based on the identified flux distributions or linear combinations, the user can simulate the NMR and GC/MS spectra of selected signal molecules. Alternatively, spectra vectorization allows for the automated optimization of labeling experiments that are intended to distinguish between different, yet plausible flux extreme point distributions. The example provided entails predicting the flux distributions associated with deleting pyruvate kinase and designing 13C NMR experiments that can maximally discriminate between the flux distributions.

16.
17.
Linkage mapping strategies for complex disorders have evolved under a variety of constraints. Some of these constraints reflect the nature of complex disorders and are manifest in limitations on the kinds of data that can be collected, while others were (at least historically) strictly computational. This paper focuses on how computational issues have impacted the design of studies on complex disorders and, conversely, how our study designs have influenced the computational issues that have been addressed. We now have unprecedented computational resources, but also face unprecedented computational and methodological challenges as we move from the linkage mapping of genes influencing susceptibility to complex disorders toward the identification of the actual variation affecting susceptibility to these disorders. The near-term computational and methodological issues we must address will be profoundly influenced by the study designs of the recent past. But future study designs, as well as our investments in computational and methodological research, ought to be developed considering the computational and informatics resources we now have at hand.

18.
We demonstrate a modeling and computational framework that allows for rapid screening of thousands of potential network designs for particular dynamic behavior. To illustrate this capability we consider the problem of hysteresis, a prerequisite for construction of robust bistable switches and hence a cornerstone for construction of more complex synthetic circuits. We evaluate and rank most three-node networks according to their ability to robustly exhibit hysteresis, where robustness is measured with respect to parameters over multiple dynamic phenotypes. Focusing on the highest ranked networks, we demonstrate how additional robustness and design constraints can be applied. We compare our results to more traditional methods based on specific parameterization of ordinary differential equation models and demonstrate a strong qualitative match at a small fraction of the computational cost.

19.
Despite significant successes in structure‐based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest‐energy structures and sequences are found. DEE/A*‐based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap‐free list of low‐energy protein conformations, which is necessary for ensemble‐based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*‐based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs. Proteins 2015; 83:1859–1877. © 2015 Wiley Periodicals, Inc.

20.
Nature has evolved a vast repertoire of structures and functions based on an ordered, orchestrated, protein building-blocks assembly. For decades these sophisticated materials have been studied, mimicked, and repurposed, yet recently, computational protein engineering methods provided an alternative route: creating protein materials de-novo, surpassing evolutionary constraints and optimized for specific tasks. We highlight two areas of research that fundamentally accelerate design of structurally well-defined programmable protein materials. First, implementations of hierarchical assembly and geometric sampling (docking) strategies to create designable backbones under pre-specified symmetry constraints. Second, progress in protein–protein interfaces and sequence design methods, using Rosetta, that drive programmable supramolecular assemblies. These approaches have proven effective in generating diverse protein assemblies in 0-, 1-, 2-, and 3-dimensional architectures (constituting single or multiple components), and as part of a synthetic or a biological system. We expect these methods shall transform the toolbox of protein designers developing next generation synthetic and biological materials.
