共查询到20条相似文献,搜索用时 0 毫秒
1.
We present anO(R logP) time,O(M+P
2
) space algorithm for searching a restriction map withM sites for the best matches to a shorter map withP sites, whereR, the number of matching site pairs, is bounded byMP. As first proposed by Watermanet al. (1984,Nucl. Acids Res.
12, 237–242) the objective function used to score matches is additive in the number of unaligned sites and the discrepancies
in the distances between adjacent aligned sites. Our algorithm is basically a sparse dynamic programming computation in which
“candidate lists” are used to model the future contribution of all previously computed entries to those yet to be computed.
A simple modification to the algorithm computes the distance between two restriction maps withM andN sites, respectively, inO(MN(logM+logN)) time.
This author’s work was supported in part by National Library of Medicine Grant R01-LM4960.
This author’s work was supported in part by National Library of Medicine Grant R01-LM5110. 相似文献
2.
N N Sokolov N V Anike?cheva A B Fitsner O T Samko E B Khoroshutina 《Zhurnal mikrobiologii, epidemiologii, i immunobiologii》1988,(9):86-89
The search for restrictases in 154 strains belonging to 104 species of 32 genera of microorganisms has been carried out by the method of rapid toluene assay. In 10 strains the activity of endonucleases specifically fragmenting the DNA of phage lambda in the presence of Mg2+ ions has been detected. Restrictases Pae I and Pae II formed by two Pseudomonas aeruginosa strains have been identified as the true isoschizomers of restriction endonucleases Sph I and Sma I respectively. The results of the screening of restrictase-producing strains indicate that the production of restrictases is widely spread among microorganisms of the genus Bacillus. 相似文献
3.
Geer LY Markey SP Kowalak JA Wagner L Xu M Maynard DM Yang X Shi W Bryant SH 《Journal of proteome research》2004,3(5):958-964
Large numbers of MS/MS peptide spectra generated in proteomics experiments require efficient, sensitive and specific algorithms for peptide identification. In the Open Mass Spectrometry Search Algorithm (OMSSA), specificity is calculated by a classic probability score using an explicit model for matching experimental spectra to sequences. At default thresholds, OMSSA matches more spectra from a standard protein cocktail than a comparable algorithm. OMSSA is designed to be faster than published algorithms in searching large MS/MS datasets. 相似文献
4.
An algorithm for searching restriction maps 总被引:1,自引:0,他引:1
This paper presents an algorithm thai searches a DNA restrictionenzyme map for regions that approximately match a shorter 'probe'map. Both the map and the probe consist of a sequence of address-enzymepairs denoting restriction sites, and the algorithm penalizesa potential match for undetected or missing sites and for discrepanciesin the distance between adjacent sites. The algorithm was designedspecifically for comparing relatively short DNA sequences witha long restriction map, a problem that will become increasingcommon as large physical maps are generated. The algorithm hasbeen used to extract information from a restriction map of theentire Escherichia coli genome.
Received on October 28, 1989; accepted on February 2, 1990 相似文献
5.
In this paper we examine the distance geometry (DG) algorithm in the form used to determine the structure of proteins. We focus on three aspects of the algorithm: bound smoothing with the triangle inequality, the random selection of distances within the bounds, and the number of distances needed to specify a structure. Computational experiments are performed using simulated and real data for basic pancreatic trypsin inhibitor (BPTI) from nmr and crystallographic measurements. We find that the upper bounds determined by bound smoothing to be a linear function of the true crystal distance. A simple model that describes the results obtained with randomly selected trial distances is proposed. Using this representation of the trial distances, we show that BPTI DG structures are more compact than the true crystal structure. We also show that the DG-generated structures no longer resemble test structures when the number of these interresidue distance constraints is less than the number of degrees of freedom of the protein backbone. While the actual model will be sensitive the way distances are chosen, our conclusions are likely to apply to other versions of the DG algorithm. 相似文献
6.
The main methodological approaches to the search of new restriction endonucleases are reviewed. These methods include obtaining acellular extracts by ultrasonic desintegration of microbial cells, osmotic shock effects, the effects of organic solvents, mechanical disruption of bacterial cells, biphase division after Albertson and others. The resolving power of any method discussed depends mainly on the level of restriction endonuclease activity, the presence of nonspecific endonucleases in the biomass, the presence of exonucleases and the taxonomy of the used microorganisms. 相似文献
7.
Accelerated off-target search algorithm for siRNA 总被引:7,自引:0,他引:7
MOTIVATION: Designing highly effective short interfering RNA (siRNA) sequences with maximum target-specificity for mammalian RNA interference (RNAi) is one of the hottest topics in molecular biology. The relationship between siRNA sequences and RNAi activity has been studied extensively to establish rules for selecting highly effective sequences. However, there is a pressing need to compute siRNA sequences that minimize off-target silencing effects efficiently and to match any non-targeted sequences with mismatches. RESULTS: The enumeration of potential cross-hybridization candidates is non-trivial, because siRNA sequences are short, ca. 19 nt in length, and at least three mismatches with non-targets are required. With at least three mismatches, there are typically four or five contiguous matches, so that a BLAST search frequently overlooks off-target candidates. By contrast, existing accurate approaches are expensive to execute; thus we need to develop an accurate, efficient algorithm that uses seed hashing, the pigeonhole principle, and combinatorics to identify mismatch patterns. Tests show that our method can list potential cross-hybridization candidates for any siRNA sequence of selected human gene rapidly, outperforming traditional methods by orders of magnitude in terms of computational performance. AVAILABILITY: http://design.RNAi.jp CONTACT: yamada@cb.k.u-tokyo.ac.jp. 相似文献
8.
Optimizing selection of restriction enzymes in the search for DNA variants 总被引:7,自引:1,他引:7
下载免费PDF全文

E M Wijsman 《Nucleic acids research》1984,12(23):9209-9226
A model is developed for predicting the relative efficiencies of different enzymes for detecting DNA variants when such variants are the result of single base-pair changes. 71 enzymes are analyzed for this ability in human DNA. Their relative ranked efficiencies are influenced by the sizes of the probes used, and the size of the smallest detectable fragment produced. 相似文献
9.
Kyungtaek Lim Kazunori D. Yamada Martin C. Frith Kentaro Tomii 《Journal of structural and functional genomics》2016,17(4):147-154
Protein database search for public databases is a fundamental step in the target selection of proteins in structural and functional genomics and also for inferring protein structure, function, and evolution. Most database search methods employ amino acid substitution matrices to score amino acid pairs. The choice of substitution matrix strongly affects homology detection performance. We earlier proposed a substitution matrix named MIQS that was optimized for distant protein homology search. Herein we further evaluate MIQS in combination with LAST, a heuristic and fast database search tool with a tunable sensitivity parameter m, where larger m denotes higher sensitivity. Results show that MIQS substantially improves the homology detection and alignment quality performance of LAST across diverse m parameters. Against a protein database consisting of approximately 15 million sequences, LAST with m?=?105 achieves better homology detection performance than BLASTP, and completes the search 20 times faster. Compared to the most sensitive existing methods being used today, CS-BLAST and SSEARCH, LAST with MIQS and m?=?106 shows comparable homology detection performance at 2.0 and 3.9 times greater speed, respectively. Results demonstrate that MIQS-powered LAST is a time-efficient method for sensitive and accurate homology search. 相似文献
10.
The prediction of protein conformation from its amino-acid sequence is one of the most prominent problems in computational biology. But it is NP-hard. Here, we focus on an abstraction widely studied of this problem, the two-dimensional hydrophobic-polar protein folding problem (2D HP PFP). Mathematical optimal model of free energy of protein is established. Native conformations are often sought using stochastic sampling methods, but which are slow. The elastic net (EN) algorithm is one of fast deterministic methods as travelling salesman problem (TSP) strategies. However, it cannot be applied directly to protein folding problem, because of fundamental differences in the two types of problems. In this paper, how the 2D HP protein folding problem can be framed in terms of TSP is shown. Combination of the modified elastic net algorithm and novel local search method is adopted to solve this problem. To our knowledge, this is the first application of EN algorithm to 2D HP model. The results indicate that our approach can find more optimal conformations and is simple to implement, computationally efficient and fast. 相似文献
11.
Searching a database for a local alignment to a query under a typical scoring scheme, such as PAM120 or BLOSUM62 with affine gap costs, is a computation that has resisted algorithmic improvement due to its basis in dynamic programming and the weak nature of the signals being searched for. In a query preprocessing step, a set of tables can be built that permit one to (a) eliminate a large fraction of the dynamic programming matrix from consideration and (b) to compute several steps of the remainder with a single table lookup. While this result is not an asymptotic improvement over the original Smith-Waterman algorithm, its complexity is characterized in terms of some sparse features of the matrix and it yields the fastest software implementation to date for such searches. 相似文献
12.
13.
Finding the near-native structure of a protein is one of the most important open problems in structural biology and biological physics. The problem becomes dramatically more difficult when a given protein has no regular secondary structure or it does not show a fold similar to structures already known. This situation occurs frequently when we need to predict the tertiary structure of small molecules, called peptides. In this research work, we propose a new ab initio algorithm, the generalized pattern search algorithm, based on the well-known class of Search-and-Poll algorithms. We performed an extensive set of simulations over a well-known set of 44 peptides to investigate the robustness and reliability of the proposed algorithm, and we compared the peptide conformation with a state-of-the-art algorithm for peptide structure prediction known as PEPstr. In particular, we tested the algorithm on the instances proposed by the originators of PEPstr, to validate the proposed algorithm; the experimental results confirm that the generalized pattern search algorithm outperforms PEPstr by 21.17% in terms of average root mean-square deviation, RMSD Cα. 相似文献
14.
ABSTRACT: BACKGROUND: Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. RESULTS: This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, a HBSA-based ensemble classifier is constructed using majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. CONCLUSIONS: It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network. 相似文献
15.
A novel combination of optimization methods (Genetic Algorithm with Distance Geometry) has been developed and shown to find near-optimal solutions to a set of imposed structural constraints. With this modelling tool (GADGET), the fold-space of a variety of small zinc-binding proteins was investigated under the constraints required to form a zinc-binding site (or pair of sites). Analysis of the results concentrated on the ring-finger domain as the "classic" zinc-finger domains were too constrained to provide much topological variety, whilst the TFIIH domain (which has large unstructured loops) did not behave well. The intermediate ring-finger domain, however, was found to adopt a variety of different folds, many of which had near-optimal scores under the fitness function employed in GADGET (forming good secondary structures and zinc-coordination). Although the native fold was dominant amongst the solutions, the discovery of good alternate folds shows that even the eight residues constrained to form two zinc-binding sites was not sufficient to uniquely determine the native fold. Despite this, the fold-space of 48 theoretically possible folds was greatly reduced with just six topologies found in significant numbers. 相似文献
16.
Background
Identification of protein complexes in large interaction networks is crucial to understand principles of cellular organization and predict protein functions, which is one of the most important issues in the post-genomic era. Each protein might be subordinate multiple protein complexes in the real protein-protein interaction networks. Identifying overlapping protein complexes from protein-protein interaction networks is a considerable research topic.Result
As an effective algorithm in identifying overlapping module structures, clique percolation method (CPM) has a wide range of application in social networks and biological networks. However, the recognition accuracy of algorithm CPM is lowly. Furthermore, algorithm CPM is unfit to identifying protein complexes with meso-scale when it applied in protein-protein interaction networks. In this paper, we propose a new topological model by extending the definition of k-clique community of algorithm CPM and introduced distance restriction, and develop a novel algorithm called CP-DR based on the new topological model for identifying protein complexes. In this new algorithm, the protein complex size is restricted by distance constraint to conquer the shortcomings of algorithm CPM. The algorithm CP-DR is applied to the protein interaction network of Sacchromyces cerevisiae and identifies many well known complexes.Conclusion
The proposed algorithm CP-DR based on clique percolation and distance restriction makes it possible to identify dense subgraphs in protein interaction networks, a large number of which correspond to known protein complexes. Compared to algorithm CPM, algorithm CP-DR has more outstanding performance.17.
We have reconstructed, from experimental approximately 2 nm resolution X-ray solution scattering profiles, the corresponding shapes and sizes of myoglobin, troponin C, spermadhesin PSP-I/PSP-II, chymotrypsinogen A, superoxide dismutase, ovalbumin, tubulin, nitrite reductase, catalase, the structural change of troponin C upon dissociation of the two high affinity Ca(2+), and the solution model structure of a tandem pair of fibronectin type III cytoplasmic domains of integrin alpha6beta4 before determination of its crystal structure. To this purpose we have designed a new genetic algorithm which gradually explores a discrete search space and evolves convergent models made of several hundred beads (down to 0.3 nm radius) best fitting the scattering profile upon Debye calculation, without geometrical constraints or penalty for loose beads. This is a procedure of effective numerical transformation of the one-dimensional scattering profiles into three-dimensional model structures. The number of beads in models is correlated with the protein molecular mass (with one exception). The shape and approximate dimensions of each protein have been retrieved by a set of ten solution models, essentially superimposable with the available crystal structures. 相似文献
18.
In many species genes move over limited distances, such that genetic differences among populations or individuals are expected to increase as a function of geographical distance. In other species, however, genes may move any distance over a single generation time, such that no increase of genetic differences is expected to occur with distance. Patterns of gene dispersal have been assessed typically using this theoretical property. In this study, this classical approach based on a Mantel test was compared to a new method using individual assignment to reveal contrasts in dispersal patterns between 15 populations of brook charr Salvelinus fontinalis and 10 populations of Atlantic salmon Salmo salar sampled in eastern Canada, where both species co-occur naturally. Based on the Mantel test, we found evidence for neither an increase of genetic differences with distance in either species nor a significant contrast between them. The individual-based method, in contrast, revealed that individual assignment in both species was non random, being significantly biased toward geographically proximate locations. Furthermore, brook charr were on average assigned to a closer river than were salmon, according to a priori expectations based on the dispersal behaviour of the two species. We thus propose that individual assignment methods might be a promising and more powerful alternative to Mantel tests when isolation by distance cannot be postulated a priori. 相似文献
19.
This paper describes a generic algorithm for finding restrictionsites within DNA sequences. The genericity ofthe algorithm is made possible through the use of set theory.Basic elements of DNA sequences, i.e. nucleotides (bases), arerepresented in sets, and DNA sequences, whether specific, ambiguousor even protein-coding, are represented as sequences of thosesets. The set intersection operation demonstrates its abilityto perform pattern-matching correctly on various DNA sequences.The performance analysis showed that the degree of complexityof the pattern matching is reduced from exponential to linear.An example is given to show the actual and potential restrictionsites, derived by the generic algorithm, in the DNA sequencetemplate coding for a synthetic calmodulin. Received on October 2, 1990; accepted on December 18, 1990 相似文献
20.
Jotun J. Hein 《Bulletin of mathematical biology》1989,51(5):597-603
In this article the question of reconstructing a phylogeny from additive distance data is addressed. Previous algorithms used the complete distance matrix of then OTUs (Operational Taxonomic Unit), that corresponds to the tips of the tree. This usedO(n 2) computing time. It is shown that this is wasteful for biologically reasonable trees. If the tree has internal nodes with degrees that are bounded onO(n*log(n)) algorithm is possible. It is also shown if the nodes can have unbounded degrees the problem hasn 2 as lower bound. 相似文献