首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
NMR resonance assignment is one of the key steps in solving an NMR protein structure. The assignment process links resonance peaks to individual residues of the target protein sequence, providing the prerequisite for establishing intra- and inter-residue spatial relationships between atoms. The assignment process is tedious and time-consuming, which could take many weeks. Though there exist a number of computer programs to assist the assignment process, many NMR labs are still doing the assignments manually to ensure quality. This paper presents a new computational method based on the combination of a suite of algorithms for automating the assignment process, particularly the process of backbone resonance peak assignment. We formulate the assignment problem as a constrained weighted bipartite matching problem. While the problem, in the most general situation, is NP-hard, we present an efficient solution based on a branch-and-bound algorithm with effective bounding techniques using two recently introduced approximation algorithms. We also devise a greedy filtering algorithm for reducing the search space. Our experimental results on 70 instances of (pseudo) real NMR data derived from 14 proteins demonstrate that the new solution runs much faster than a recently introduced (exhaustive) two-layer algorithm and recovers more correct peak assignments than the two-layer algorithm. Our result demonstrates that integrating different algorithms can achieve a good tradeoff between backbone assignment accuracy and computation time.  相似文献   

2.
Shah SB  Sahinidis NV 《PloS one》2012,7(5):e37493
Protein structure alignment is the problem of determining an assignment between the amino-acid residues of two given proteins in a way that maximizes a measure of similarity between the two superimposed protein structures. By identifying geometric similarities, structure alignment algorithms provide critical insights into protein functional similarities. Existing structure alignment tools adopt a two-stage approach to structure alignment by decoupling and iterating between the assignment evaluation and structure superposition problems. We introduce a novel approach, SAS-Pro, which addresses the assignment evaluation and structure superposition simultaneously by formulating the alignment problem as a single bilevel optimization problem. The new formulation does not require the sequentiality constraints, thus generalizing the scope of the alignment methodology to include non-sequential protein alignments. We employ derivative-free optimization methodologies for searching for the global optimum of the highly nonlinear and non-differentiable RMSD function encountered in the proposed model. Alignments obtained with SAS-Pro have better RMSD values and larger lengths than those obtained from other alignment tools. For non-sequential alignment problems, SAS-Pro leads to alignments with high degree of similarity with known reference alignments. The source code of SAS-Pro is available for download at http://eudoxus.cheme.cmu.edu/saspro/SAS-Pro.html.  相似文献   

3.
We often need to learn how to move based on a single performance measure that reflects the overall success of our movements. However, movements have many properties, such as their trajectories, speeds and timing of end-points, thus the brain needs to decide which properties of movements should be improved; it needs to solve the credit assignment problem. Currently, little is known about how humans solve credit assignment problems in the context of reinforcement learning. Here we tested how human participants solve such problems during a trajectory-learning task. Without an explicitly-defined target movement, participants made hand reaches and received monetary rewards as feedback on a trial-by-trial basis. The curvature and direction of the attempted reach trajectories determined the monetary rewards received in a manner that can be manipulated experimentally. Based on the history of action-reward pairs, participants quickly solved the credit assignment problem and learned the implicit payoff function. A Bayesian credit-assignment model with built-in forgetting accurately predicts their trial-by-trial learning.  相似文献   

4.
Assignment of orthologous genes via genome rearrangement   总被引:1,自引:0,他引:1  
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same families. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at a genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving scenario under genome rearrangement. First, the problem is formulated as that of computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared to a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows a marginally better performance in terms of sensitivity on the real data set because it is able to identify several correct orthologous pairs that are missed by INPARANOID. The simulation results demonstrate that SOAR, in general, performs better than the iterated exemplar algorithm in terms of computing the reversal distance and assigning correct orthologs.  相似文献   

5.
MOTIVATION: Protein sequence alignment plays a critical role in computational biology as it is an integral part in many analysis tasks designed to solve problems in comparative genomics, structure and function prediction, and homology modeling. METHODS: We have developed novel sequence alignment algorithms that compute the alignment between a pair of sequences based on short fixed- or variable-length high-scoring subsequences. Our algorithms build the alignments by repeatedly selecting the highest scoring pairs of subsequences and using them to construct small portions of the final alignment. We utilize PSI-BLAST generated sequence profiles and employ a profile-to-profile scoring scheme derived from PICASSO. RESULTS: We evaluated the performance of the computed alignments on two recently published benchmark datasets and compared them against the alignments computed by existing state-of-the-art dynamic programming-based profile-to-profile local and global sequence alignment algorithms. Our results show that the new algorithms achieve alignments that are comparable with or better than those achieved by existing algorithms. Moreover, our results also showed that these algorithms can be used to provide better information as to which of the aligned positions are more reliable--a critical piece of information for comparative modeling applications.  相似文献   

6.
Protein NMR peak assignment refers to the process of assigning a group of "spin systems" obtained experimentally to a protein sequence of amino acids. The automation of this process is still an unsolved and challenging problem in NMR protein structure determination. Recently, protein NMR peak assignment has been formulated as an interval scheduling problem (ISP), where a protein sequence P of amino acids is viewed as a discrete time interval I (the amino acids on P one-to-one correspond to the time units of I), each subset S of spin systems that are known to originate from consecutive amino acids from P is viewed as a "job" j(s), the preference of assigning S to a subsequence P of consecutive amino acids on P is viewed as the profit of executing job j(s) in the subinterval of I corresponding to P, and the goal is to maximize the total profit of executing the jobs (on a single machine) during I. The interval scheduling problem is max SNP-hard in general; but in the real practice of protein NMR peak assignment, each job j(s) usually requires at most 10 consecutive time units, and typically the jobs that require one or two consecutive time units are the most difficult to assign/schedule. In order to solve these most difficult assignments, we present an efficient 13/7-approximation algorithm for the special case of the interval scheduling problem where each job takes one or two consecutive time units. Combining this algorithm with a greedy filtering strategy for handling long jobs (i.e., jobs that need more than two consecutive time units), we obtain a new efficient heuristic for protein NMR peak assignment. Our experimental study shows that the new heuristic produces the best peak assignment in most of the cases, compared with the NMR peak assignment algorithms in the recent literature. The above algorithm is also the first approximation algorithm for a nontrivial case of the well-known interval scheduling problem that breaks the ratio 2 barrier.  相似文献   

7.
Flexible transfer lines or mixed-model assembly lines are capable of diversified small-lot production due to negligible switch-over costs. With these lines, it is possible to implement just-in-time (JIT) production, which involves producing only the necessary parts in the necessary quantities at the necessary times. The problem of sequencing flexible transfer lines according to the JIT philosophy can be formulated as a nonlinear integer programming problem. Heuristic algorithms to solve the problem have appeared in the literature. In this paper, we show that the problem can be explicitly reduced to an assignment problem. Thus, we provide an efficient algorithm for an optimal solution to the JIT sequencing problem.  相似文献   

8.
Path matching and graph matching in biological networks.   总被引:2,自引:0,他引:2  
We develop algorithms for the following path matching and graph matching problems: (i) given a query path p and a graph G, find a path p' that is most similar to p in G; (ii) given a query graph G (0) and a graph G, find a graph G (0)' that is most similar to G (0) in G. In these problems, p and G (0) represent a given substructure of interest to a biologist, and G represents a large network in which the biologist desires to find a related substructure. These algorithms allow the study of common substructures in biological networks in order to understand how these networks evolve both within and between organisms. We reduce the path matching problem to finding a longest weighted path in a directed acyclic graph and show that the problem of finding top k suboptimal paths can be solved in polynomial time. This is in contrast with most previous approaches that used exponential time algorithms to find simple paths which are practical only when the paths are short. We reduce the graph matching problem to finding highest scoring subgraphs in a graph and give an exact algorithm to solve the problem when the query graph G (0) is of moderate size. This eliminates the need for less accurate heuristic or randomized algorithms.We show that our algorithms are able to extract biologically meaningful pathways from protein interaction networks in the DIP database and metabolic networks in the KEGG database. Software programs implementing these techniques (PathMatch and GraphMatch) are available at http://faculty.cs.tamu.edu/shsze/pathmatch and http://faculty.cs.tamu.edu/shsze/graphmatch.  相似文献   

9.
An efficient rank based approach for closest string and closest substring   总被引:1,自引:0,他引:1  
Dinu LP  Ionescu R 《PloS one》2012,7(6):e37576
This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results.  相似文献   

10.
Gene chromosomal assignment can be realized not only by somatic hybrid panels but also by spot-blot hybridization or polymerase chain reaction (PCR) of flow-sorted chromosomes. We propose a swine chromosome assignment strategy by PCR amplification on pooled chromosomal DNA, which allows assignment despite possible chromosomal contamination during sorting. Each pool contains three different chromosomes, each chromosome being present in one or two pools. We present concordant results obtained for eight markers already mapped to different swine chromosomes and we assign the somatostatin gene to chromosome 13, a new marker in the pig genome.  相似文献   

11.
MOTIVATION: Hidden Markov models (HMMs) and generalized HMMs been successfully applied to many problems, but the standard Viterbi algorithm for computing the most probable interpretation of an input sequence (known as decoding) requires memory proportional to the length of the sequence, which can be prohibitive. Existing approaches to reducing memory usage either sacrifice optimality or trade increased running time for reduced memory. RESULTS: We developed two novel decoding algorithms, Treeterbi and Parallel Treeterbi, and implemented them in the TWINSCAN/N-SCAN gene-prediction system. The worst case asymptotic space and time are the same as for standard Viterbi, but in practice, Treeterbi optimally decodes arbitrarily long sequences with generalized HMMs in bounded memory without increasing running time. Parallel Treeterbi uses the same ideas to split optimal decoding across processors, dividing latency to completion by approximately the number of available processors with constant average overhead per processor. Using these algorithms, we were able to optimally decode all human chromosomes with N-SCAN, which increased its accuracy relative to heuristic solutions. We also implemented Treeterbi for Pairagon, our pair HMM based cDNA-to-genome aligner. AVAILABILITY: The TWINSCAN/N-SCAN/PAIRAGON open source software package is available from http://genes.cse.wustl.edu.  相似文献   

12.
We present a new approach capable of assigning charge states to peptides based on both their intact mass spectrum and their fragmentation mass spectrum. More specifically, our approach aims at fully exploiting available information to improve correct charge assignment rate. This is achieved by using information provided by the fragmentation spectrum extensively. For low-resolution spectra, charge assignment based on fragmentation mass spectrum is better than charge assignment based on intact peptide signal only. We introduce two methods that allow to integrate information contributing to successful peptide charge state assignment. We demonstrate the performance of our algorithms on large ion trap data sets. The application of these algorithms to large-scale proteomics projects can save significant computation time and have a positive impact on identification false positive rates.  相似文献   

13.
A PCR primer sequence is called degenerate if some of its positions have several possible bases. The degeneracy of the primer is the number of unique sequence combinations it contains. We study the problem of designing a pair of primers with prescribed degeneracy that match a maximum number of given input sequences. Such problems occur when studying a family of genes that is known only in part, or is known in a related species. We prove that various simplified versions of the problem are hard, show the polynomiality of some restricted cases, and develop approximation algorithms for one variant. Based on these algorithms, we implemented a program called HYDEN for designing highly-degenerate primers for a set of genomic sequences. We report on the success of the program in an experimental scheme for identifying all human olfactory receptor (OR) genes. In that project, HYDEN was used to design primers with degeneracies up to 10(10) that amplified with high specificity many novel genes of that family, tripling the number of OR genes known at the time.  相似文献   

14.
Current Particle Swarm Optimization (PSO) algorithms do not address problems with unknown dimensions, which arise in many applications that would benefit from the use of PSO. In this paper, we propose a new algorithm, called Dimension Adaptive Particle Swarm Optimization (DA-PSO) that can address problems with any number of dimensions. We also propose and compare three other PSO-based methods with DA-PSO. We apply our algorithms to solve the Weibull mixture model density estimation problem as an illustration. DA-PSO achieves better objective function values than other PSO-based algorithms on four simulated datasets and a real dataset. We also compare DA-PSO with the recursive Expectation-Maximization (EM) estimator, which is a non-PSO-based method, obtaining again very good results.  相似文献   

15.
In this study, we address the meta-task scheduling problem in heterogeneous computing (HC) systems, which is to find a task assignment that minimizes the schedule length of a meta-task composed of several independent tasks with no data dependencies. The fact that the meta-task scheduling problem in HC systems is NP-hard has motivated the development of many heuristic scheduling algorithms. These heuristic algorithms, however, neglect the stochastic nature of task execution times in an attempt to minimize a deterministic objective function, which is the maximum of the expected values of machine loads. Contrary to existing heuristics, we account for this stochastic nature by modeling task execution times as random variables. We, then, formulate a stochastic scheduling problem where the objective is to minimize the expected value of the maximum of machine loads. We prove that this new objective is underestimated by the deterministic objective function and that an optimal task assignment obtained with respect to the deterministic objective function could be inefficient in a real computing platform. In order to solve the stochastic scheduling problem posed, we develop a genetic algorithm based scheduling heuristic. Our extensive simulation studies show that the proposed genetic algorithm can produce better task assignments as compared to existing heuristics. Specifically, we observe a performance improvement on the relative cost heuristic (M.-Y. Wu and W. Shu, A high-performance mapping algorithm for heterogeneous computing systems, in: Int. Parallel and Distributed Processing Symposium, San Francisco, CA, April 2001) by up to 61%.  相似文献   

16.
There are several instances in which quantitative trait locus (QTL) mapping experiments have been independently carried out for similar traits in different laboratories. We develop a permutation test of the correspondence between the test statistics obtained from genome-wide QTL scans in two such experiments to test whether the same QTLs are segregating in the experimental pair. In simulations, we show that the permutation test has the desired properties if chromosomes are of equal length, but bias can occur if chromosomes are of unequal length, a problem connected with autocorrelation of test statistic values. We apply the test to data from three recent mouse body weight QTL mapping experiments. The results from the test are non-significant, and imply a lack of overall concordance between the QTLs that were segregating in these experiments.  相似文献   

17.
We have developed an approach for simultaneous structure calculation and automatic Nuclear Overhauser Effect (NOE) assignment to solve nuclear magnetic resonance (NMR) structures from unassigned NOESY data. The approach, autoNOE-Rosetta, integrates Resolution Adapted Structural RECombination (RASREC) Rosetta NMR calculations with algorithms for automatic NOE assignment. The method was applied to two proteins in the 15–20 kDa size range for which both, NMR and X-ray data, is available. The autoNOE-Rosetta calculations converge for both proteins and yield accurate structures with an RMSD of 1.9 Å to the X-ray reference structures. The method greatly expands the radius of convergence for automatic NOE assignment, and should be broadly useful for NMR structure determination.  相似文献   

18.
Given a set of evolutionary trees on a same set of taxa, the maximum agreement subtree problem (MAST), respectively, maximum compatible tree problem (MCT), consists of finding a largest subset of taxa such that all input trees restricted to these taxa are isomorphic, respectively compatible. These problems have several applications in phylogenetics such as the computation of a consensus of phylogenies obtained from different data sets, the identification of species subjected to horizontal gene transfers and, more recently, the inference of supertrees, e.g., Trees Of Life. We provide two linear time algorithms to check the isomorphism, respectively, compatibility, of a set of trees or otherwise identify a conflict between the trees with respect to the relative location of a small subset of taxa. Then, we use these algorithms as subroutines to solve MAST and MCT on rooted or unrooted trees of unbounded degree. More precisely, we give exact fixed-parameter tractable algorithms, whose running time is uniformly polynomial when the number of taxa on which the trees disagree is bounded. The improves on a known result for MAST and proves fixed-parameter tractability for MCT.  相似文献   

19.
A PCR primer sequence is called degenerate if some of its positions have several possible bases. The degeneracy of the primer is the number of unique sequence combinations it contains. We study the problem of designing a pair of primers with prescribed degeneracy that match a maximum number of given input sequences. Such problems occur when studying a family of genes that is known only in part, or is known in a related species. We prove that various simplified versions of the problem are hard, show the polynomiality of some restricted cases, and develop approximation algorithms for one variant. Based on these algorithms, we implemented a program called HYDEN for designing highly degenerate primers for a set of genomic sequences. We report on the success of the program in several applications, one of which is an experimental scheme for identifying all human olfactory receptor (OR) genes. In that project, HYDEN was used to design primers with degeneracies up to 10(10) that amplified with high specificity many novel genes of that family, tripling the number of OR genes known at the time.  相似文献   

20.
Early flexible manufacturing system (FMS) production planning models exhibited a variety of planning objectives; typically, these objectives were independent of the overall production environment. More recently, some researchers have proposed hierarchical production planning and scheduling models for FMS. In this article, we examine production planning of FMS in a material requirements planning (MRP) environment. We propose a hierarchical structure that integrates FMS production planning into a closed-loop MRP system. This structure gives rise to the FMS/MRP rough-cut capacity planning (FMRCP) problem, the FMS/MRP grouping and loading (FMGL) problem, and the FMS/MRP detailed scheduling problem. We examine the FMRCP and FMGL problems in detail and present mathematical programming models for each of these problems. In particular, the FMRCP problem is modeled as a generalized assignment problem (GAP), and a GAP-based heuristic procedure is defined for the problem. We define a two-phase heuristic for the FMGL problem and present computational experience with both heuristics. The FMRCP heuristic is shown to solve problems that exhibit a dependent-demand relation within the FMS and with FMS capacity utilization as high as 99 percent. The FMGL heuristic requires very little CPU time and obtains solutions to the test problems that are on average within 1.5 percent of a theoretical lower bound. This FMS/MRP production planning framework, together with the resulting models, constitutes an important step in the integration of FMS technology with MRP production planning. The hierarchical planning mechanism directly provides for system-level MRP planning priorities to induce appropriate production planning and control objectives on the FMS while simultaneously allowing for necessary feedback from the FMS. Moreover, by demonstrating the tractability of the FMRCP and FMGL problems, this research establishes the necessary groundwork upon which to explore systemwide issues pertaining to the coordination of the hierarchical structure.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号