Similar literature
 20 similar documents found (search time: 20 ms)
1.
We analyse optimal and heuristic place prioritization algorithms for biodiversity conservation area network design which can use probabilistic data on the distribution of surrogates for biodiversity. We show how an Expected Surrogate Set Covering Problem (ESSCP) and a Maximal Expected Surrogate Covering Problem (MESCP) can be linearized for computationally efficient solution. For the ESSCP, we study the performance of two optimization software packages (XPRESS and CPLEX) and five heuristic algorithms based on traditional measures of complementarity and rarity as well as the Shannon and Simpson indices of α‐diversity which are being used in this context for the first time. On small artificial data sets the optimal place prioritization algorithms often produced more economical solutions than the heuristic algorithms, though not always ones guaranteed to be optimal. However, with large data sets, the optimal algorithms often required long computation times and produced no better results than heuristic ones. Thus there is generally little reason to prefer optimal to heuristic algorithms with probabilistic data sets.
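To make the idea of heuristic place prioritization with probabilistic data concrete, here is a minimal greedy sketch. It assumes one plausible reading of expected coverage, namely that a surrogate counts as covered once the summed occurrence probabilities over the selected sites reach a target. The function name, the data layout, and the target value are illustrative assumptions; this is not one of the five heuristics studied in the abstract.

```python
def greedy_expected_cover(prob, target=1.0):
    """Greedily pick sites until every surrogate's expected coverage reaches target.

    prob[site][surrogate] is the (assumed) probability that the surrogate
    occurs at the site. Returns the chosen sites in selection order.
    """
    surrogates = {s for row in prob.values() for s in row}
    shortfall = {s: target for s in surrogates}  # expectation still missing
    remaining = set(prob)
    chosen = []

    def gain(site):
        # Expected coverage this site adds toward the unmet targets.
        return sum(min(p, shortfall[s]) for s, p in prob[site].items())

    while remaining and any(v > 0 for v in shortfall.values()):
        best = max(sorted(remaining), key=gain)  # ties go to the first name
        if gain(best) <= 0:
            break  # no remaining site helps; targets cannot be met
        chosen.append(best)
        remaining.remove(best)
        for s, p in prob[best].items():
            shortfall[s] = max(0.0, shortfall[s] - p)
    return chosen


sites = {
    "A": {"x": 1.0, "y": 0.5},
    "B": {"y": 0.5},
    "C": {"x": 0.5, "y": 0.5},
}
print(greedy_expected_cover(sites))
```

A greedy pass like this is fast but carries no optimality guarantee, which is the trade-off the abstract examines.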

2.
A challenging task in computational biology is the reconstruction of genomic sequences of extinct ancestors, given the phylogenetic tree and the sequences at the leaves. This task is best solved by calculating the most likely estimate of the ancestral sequences, along with the most likely edge lengths. We deal with this problem and also the variant in which the phylogenetic tree, in addition to the ancestral sequences, needs to be estimated. The latter problem is known to be NP-hard, while the computational complexity of the former is unknown. Currently, all algorithms for solving these problems are heuristics without performance guarantees. The biological importance of these problems calls for developing better algorithms with guarantees of finding either optimal or approximate solutions. We develop approximation, fixed-parameter tractable (FPT), and fast heuristic algorithms for two variants of the problem: when the phylogenetic tree is known and when it is unknown. The approximation algorithm guarantees a solution with a log-likelihood ratio of 2 relative to the optimal solution. The FPT algorithm has a running time which is polynomial in the length of the sequences and exponential in the number of taxa. This makes it useful for calculating the optimal solution for small trees. Moreover, we combine the approximation algorithm and the FPT algorithm into an algorithm with an arbitrarily good approximation guarantee (PTAS). We tested our algorithms on both synthetic and biological data. In particular, we used the FPT algorithm for computing the most likely ancestral mitochondrial genomes of Hominidae (the great apes), thereby answering an interesting biological question. Moreover, we show how the approximation algorithms find good solutions for reconstructing the ancestral genomes for a set of lentiviruses (relatives of HIV). Supplementary material of this work is available at www.nada.kth.se/~isaac/publications/aml/aml.html.

3.
4.
A Boolean network (BN) is a mathematical model of genetic networks. We propose several algorithms for control of singleton attractors in BNs. We theoretically estimate the average-case time complexities of the proposed algorithms, and confirm them by computer experiments. The results suggest the importance of gene ordering. In particular, setting internal nodes ahead yields shorter computation times than setting external nodes ahead in various types of algorithms. We also present a heuristic algorithm that does not seek the optimal solution but runs in less time than the exact algorithms.
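For readers unfamiliar with the term, a singleton attractor is a network state that the update rules map to itself. The sketch below enumerates every state of a tiny, made-up three-gene network and keeps the fixed points. The update rules and function names are illustrative assumptions; this brute-force search is not one of the control algorithms proposed in the abstract.

```python
from itertools import product

# Hypothetical 3-gene Boolean network (not taken from the paper): each rule
# maps the current state (x0, x1, x2) to the next value of one gene.
RULES = [
    lambda s: s[1] and s[2],  # x0' = x1 AND x2
    lambda s: s[0] or s[1],   # x1' = x0 OR x1
    lambda s: s[2],           # x2' = x2
]


def step(state):
    """Apply all update rules synchronously to one state."""
    return tuple(bool(rule(state)) for rule in RULES)


def singleton_attractors(n_genes):
    """Return every state that is mapped to itself (exhaustive over 2**n states)."""
    return [s for s in product([False, True], repeat=n_genes) if step(s) == s]


for state in singleton_attractors(len(RULES)):
    print(state)
```

Exhaustive enumeration grows as 2^n in the number of genes, which is why average-case analysis and node ordering matter for larger networks.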

5.
A procedure is presented for the automatic determination of the amino acid sequence of peptides by processing data obtained from mass spectrometry analysis. This is a basic and relevant problem in the field of proteomics. Furthermore, it has an even higher conceptual and applicative interest in peptide research, as well as in other connected fields. The analysis does not rely on known protein databases, but on the computation of all amino acid sequences compatible with the given spectral data. By formulating a mathematical model for such combinatorial problems, the structural limitations of known methods are overcome, and efficient solution algorithms can be developed. The results are very encouraging both from the accuracy and computational points of view.

6.
The design of a protein folding approximation algorithm is not straightforward even when a simplified model is used. The folding problem is a combinatorial problem, where approximation and heuristic algorithms are usually used to find near-optimal folds of protein primary structures. Approximation algorithms provide guarantees on the distance to the optimal solution. The folding approximation approach proposed here depends on two-dimensional cellular automata (CA) to fold proteins presented in a well-studied simplified model called the hydrophobic–hydrophilic model. Cellular automata are discrete computational models that rely on local rules to produce some overall global behavior. One-third and one-fourth approximation algorithms choose a subset of the hydrophobic amino acids to form H–H contacts. Those algorithms start by finding a point at which to fold the protein sequence into two sides, where one side ignores H’s at even positions and the other side ignores H’s at odd positions. In addition, blocks or groups of amino acids fold the same way according to a predefined normal form. We intend to improve approximation algorithms by considering all hydrophobic amino acids and folding based on the local neighborhood instead of using normal forms. The CA does not assume a fixed folding point. The proposed approach guarantees a one-half approximation minus the H–H endpoints. This guaranteed lower bound applies to short sequences only. This is proved, as the core and the folds of the protein will have two identical sides for all short sequences.
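The quantity these approximation algorithms try to maximize is the number of H–H contacts in a fold. A minimal sketch of that score on a two-dimensional square lattice follows, assuming the usual convention that a contact is a pair of H residues on neighbouring lattice points that are not consecutive in the chain. The function name and input layout are illustrative assumptions, and the cellular automaton itself is not reproduced here.

```python
def hh_contacts(sequence, coords):
    """Count H-H contacts of a fold in the 2D hydrophobic-hydrophilic model.

    sequence: string over {"H", "P"}.
    coords:   list of (x, y) lattice points, one per residue, in chain order.
    The fold is assumed to be a valid self-avoiding walk; only the
    self-avoidance part is checked here.
    """
    if len(sequence) != len(coords) or len(set(coords)) != len(coords):
        raise ValueError("need one distinct lattice point per residue")
    index_at = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is seen once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hh_contacts("HPPH", square))
```

An approximation guarantee such as one-half is stated relative to the maximum of this count over all valid folds of the sequence.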

7.

Background

Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. The (real, noisy) de novo peptide sequencing problem therefore has a noisy input: a few of the prefix and suffix masses of the peptide are missing and a few other masses are given in addition. For this setting, we ask for an amino acid string that explains the given masses as accurately as possible.
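The "trivial linear scan" mentioned above can be sketched as a running sum over residue masses. The table below holds nominal (integer) residue masses for a handful of amino acids only, and the function name is an illustrative assumption; a real spectrum would use high-precision masses and account for ion types.

```python
from itertools import accumulate

# Nominal residue masses in Da for a few amino acids (illustrative subset).
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def prefix_suffix_masses(peptide):
    """Return the masses of all proper prefixes and proper suffixes.

    One pass from each end of the string; the full peptide mass is left out.
    """
    masses = [NOMINAL_MASS[aa] for aa in peptide]
    prefixes = list(accumulate(masses))[:-1]
    suffixes = list(accumulate(reversed(masses)))[:-1]
    return prefixes, suffixes


print(prefix_suffix_masses("GAS"))
```

De novo sequencing inverts this map: given a noisy subset of these masses plus spurious ones, recover the string.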

Results

Past approaches interpreted accuracy by searching for a string that explains as many masses as possible. We feel, however, that it is not only bad to not explain a mass that appears, but also to explain a mass that does not appear. We propose to minimize the symmetric difference between the set of given masses and the set of masses that the string explains. For this new optimization problem, we propose an efficient algorithm that computes both the best and the k best solutions. Proof-of-concept experiments on measurements of synthesized peptides show that our approach leads to better results compared to finding a string that explains as many given masses as possible.
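The difference between the two objectives can be shown with plain sets. The sketch below assumes exact matching of integer masses and uses made-up numbers; the function names are illustrative, and a real implementation would match masses within a measurement tolerance.

```python
def explained_count(given, explained):
    """Older objective: number of given masses that the candidate explains."""
    return len(set(given) & set(explained))


def symmetric_difference_size(given, explained):
    """Objective described above: unexplained given masses plus explained
    masses that were never observed."""
    return len(set(given) ^ set(explained))


given = {57, 128, 200}
candidate_a = {57, 128}                 # explains two peaks, predicts nothing extra
candidate_b = {57, 128, 150, 170, 190}  # explains the same two, plus three unseen masses

for name, cand in (("a", candidate_a), ("b", candidate_b)):
    print(name, explained_count(given, cand), symmetric_difference_size(given, cand))
```

Both candidates tie under the counting objective, while the symmetric difference prefers the candidate that does not predict masses absent from the measurement.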

Conclusions

We conclude that considering the symmetric difference as optimization goal can improve the identification rates for de novo peptide sequencing. A preliminary version of this work has been presented at WABI 2016.

8.
Molecular biologists strive to infer evolutionary relationships from quantitative macromolecular comparisons obtained by immunological, DNA hybridization, electrophoretic or amino acid sequencing techniques. The problem is to find unrooted phylogenies that best approximate a given dissimilarity matrix according to a goodness-of-fit measure, for example the least-squares-fit criterion or Farris's f statistic. Computational costs of known algorithms guaranteeing optimal solutions to these problems increase exponentially with problem size; practical computational considerations limit the algorithms to analyzing small problems. It is established here that problems of phylogenetic inference based on the least-squares-fit criterion and the f statistic are NP-complete and thus are so difficult computationally that efficient optimal algorithms are unlikely to exist for them. The Natural Sciences and Engineering Research Council of Canada partially supported this research through an individual operating grant (A4142) to W.H.E. Day.

9.
Humans stand out from other animals in that they are able to explicitly report on the reliability of their internal operations. This ability, which is known as metacognition, is typically studied by asking people to report their confidence in the correctness of some decision. However, the computations underlying confidence reports remain unclear. In this paper, we present a fully Bayesian method for directly comparing models of confidence. Using a visual two-interval forced-choice task, we tested whether confidence reports reflect heuristic computations (e.g. the magnitude of sensory data) or Bayes optimal ones (i.e. how likely a decision is to be correct given the sensory data). In a standard design in which subjects were first asked to make a decision, and only then gave their confidence, subjects were mostly Bayes optimal. In contrast, in a less-commonly used design in which subjects indicated their confidence and decision simultaneously, they were roughly equally likely to use the Bayes optimal strategy or to use a heuristic but suboptimal strategy. Our results suggest that, while people’s confidence reports can reflect Bayes optimal computations, even a small unusual twist or additional element of complexity can prevent optimality.

10.
MOTIVATION: Deciphering the location of gene duplications and multiple gene duplication episodes on the Tree of Life is fundamental to understanding the way gene families and genomes evolve. The multiple gene duplication problem provides a framework for placing gene duplication events onto nodes of a given species tree, and detecting episodes of multiple gene duplication. One version of the multiple gene duplication problem was defined by Guigó et al. in 1996. Several heuristic solutions have since been proposed for this problem, but no exact algorithms were known. RESULTS: In this article we solve this longstanding open problem by providing the first exact and efficient solution. We also demonstrate the improvement offered by our algorithm over the best heuristic approaches, by applying it to several simulated as well as empirical datasets.

11.
In addition to the well‐established sense‐antisense complementarity abundantly present in the nucleic acid world and serving as a basic principle of the specific double‐helical structure of DNA, production of mRNA, and genetic code‐based biosynthesis of proteins, sense‐antisense complementarity is also present in proteins, where sense and antisense peptides were shown to interact with each other with increased probability. In nucleic acids, sense‐antisense complementarity is achieved via the Watson‐Crick complementarity of the base pairs or nucleotide pairing. In proteins, the complementarity between sense and antisense peptides depends on a specific hydropathic pattern, where codons for hydrophilic and hydrophobic amino acids in a sense peptide are complemented by the codons for hydrophobic and hydrophilic amino acids in its antisense counterpart. We show here that, in addition to this pattern of complementary hydrophobicity, sense and antisense peptides are characterized by complementary order‐disorder patterns and show complementarity in the sequence distribution of their disorder‐based interaction sites. We also discuss how this order‐disorder complementarity can be related to protein evolution.

12.
Here we report that prioritizing sites in order of rarity-weighted richness (RWR) is a simple, reliable way to identify sites that represent all species in the fewest number of sites (minimum set problem) or to identify sites that represent the largest number of species within a given number of sites (maximum coverage problem). We compared the number of species represented in sites prioritized by RWR to numbers of species represented in sites prioritized by the Zonation software package for 11 datasets in which the size of individual planning units (sites) ranged from <1 ha to 2,500 km². On average, RWR solutions were more efficient than Zonation solutions. Integer programming remains the only guaranteed way to find an optimal solution, and heuristic algorithms remain superior for conservation prioritizations that consider compactness and multiple near-optimal solutions in addition to species representation. But because RWR can be implemented easily and quickly in R or a spreadsheet, it is an attractive alternative to integer programming or heuristic algorithms in some conservation prioritization contexts.
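The abstract does not give the RWR formula. The sketch below assumes the commonly used definition, in which a site's score is the sum, over the species it contains, of one divided by the number of sites where that species occurs. The function names and the toy data are illustrative assumptions.

```python
from collections import Counter


def rarity_weighted_richness(site_species):
    """Score each site as the sum of 1 / (number of sites holding the species).

    site_species: dict mapping a site name to the set of species found there.
    """
    occupancy = Counter(sp for species in site_species.values() for sp in species)
    return {
        site: sum(1.0 / occupancy[sp] for sp in species)
        for site, species in site_species.items()
    }


def prioritize(site_species):
    """Rank sites from highest to lowest RWR, breaking ties by site name."""
    scores = rarity_weighted_richness(site_species)
    return sorted(scores, key=lambda site: (-scores[site], site))


survey = {"A": {"a", "b"}, "B": {"b", "c", "d"}, "C": {"d"}}
print(rarity_weighted_richness(survey))
print(prioritize(survey))
```

Because the score is a single pass over a presence table, the same calculation is easy to reproduce in R or a spreadsheet, as the abstract notes.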

13.
An efficient rank based approach for closest string and closest substring   (Total citations: 1; self-citations: 0; citations by others: 1)
Dinu LP, Ionescu R. PLoS ONE, 2012, 7(6): e37576
This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results.

14.
T cell recognition of the peptide–MHC complex initiates a cascade of immunological events necessary for immune responses. Accurate T-cell epitope prediction is an important part of vaccine design. Development of predictive algorithms based on sequence profiles requires a very large amount of experimental data on peptide binding to major histocompatibility complex (MHC) molecules. Here we used an inverse folding approach to study the peptide specificity of an MHC Class-I molecule with the aim of obtaining a better differentiation between binding and nonbinding sequences. Overlapping peptides, spanning the entire protein sequence, are threaded through the backbone coordinates of a known peptide fold in the MHC groove, and their interaction energies are evaluated using statistical pairwise contact potentials. We used the Miyazawa & Jernigan and Betancourt & Thirumalai tables for pairwise contact potentials, and two distance criteria (Nearest atom ≫ 4.0 Å & C-beta ≫ 7.0 Å) for ranking the peptides in ascending order according to their energy values; in most cases, known antigenic peptides are highly ranked. The predictions from threading improved when multiple templates and an average scoring scheme were used. In general, when structural information about a protein-peptide complex is available, the current application of the threading approach can be used to screen a large library of peptides for selection of the best binders to the target protein. The proposed scheme may significantly reduce the number of peptides to be tested in the wet laboratory for epitope-based vaccine design.

15.
Studies of the assembly of the hexapeptide Acetyl-Trp-Leu5 (AcWL5) into β-sheets in membranes have provided insights into membrane protein folding. Yet, the exact structure of the oligomer in the lipid bilayer is unknown. Here we use neutron diffraction to study the disposition of the peptides in bilayers. We find that pairs of adjacent deuterium-labeled leucines have no well-defined peak or dip in the transmembrane distribution profiles, indicative of heterogeneity in the depth of membrane insertion. At the same time, the monomeric homolog AcWL4 exhibits a homogeneous, well-defined, interfacial location in neutron diffraction experiments. Thus, although the bilayer location of monomeric AcWL4 is determined by hydrophobicity matching or complementarity within the bilayer, the AcWL5 molecules in the oligomer are positioned at different depths within the bilayer because they assemble into a staggered transmembrane β-sheet. The AcWL5 assembly is dominated by protein-protein interactions rather than hydrophobic complementarity. These results have implications for the structure and folding of proteins in their native membrane environment and highlight the importance of the interplay between hydrophobic complementarity and protein-protein interactions in determining the structure of membrane proteins.

16.
Human intestinal mucins are high molecular weight glycoproteins which protect and lubricate the epithelium of the gastrointestinal tract. In cases of malignant disease, mucins are abnormally expressed, overproduced or underglycosylated. This feature may enable the mucins to serve as tumour markers. The MUC2 mucin largely consists of a variable number of tandem repeats of a 23 amino acid sequence, ¹PTTTPITTTTTVTPTPTPTGTQT²³. In this study we have localised the minimal and the optimal epitope within this region by the previously developed protein core specific 996 monoclonal antibody using synthetic peptides. Several overlapping and truncated peptides related to the tandem repeat unit have been prepared by solid-phase methodology. Other mucin peptides were synthesised on the tips of polyethylene pins, and these remained C-terminally attached to the pins for comparative investigations. The interaction of the 996 monoclonal antibody with the synthetic peptides was studied either in solution by competition RIA or on immobilised peptides by indirect ELISA experiments. These experiments show that the minimal epitope recognised by the 996 antibody is the Ac-¹⁹TGTQ²² (IC50 = 3100 μM in solution). For the optimal 996 antibody binding in solution the ¹⁶PTPTGTQ²² heptapeptide (IC50 = 3 μM) is required. © 1998 European Peptide Society and John Wiley & Sons, Ltd.

17.
Cyanobacteria and particularly Microcystis sp. (Chroococcales) are known to produce a multitude of peptide metabolites. Here we report on the mass spectral analysis of cyanobacterial peptides in individual colonies of Microcystis sp. collected in a drinking water reservoir. A total number of more than 90 cyanopeptides could be detected, 61 of which could be identified either as known peptides or new structural variants of known peptide classes. For 18 new peptides flat structures are proposed. New congeners differed from known ones mainly in chlorination (aeruginosins), methylation (microginins), or amino acid sequences (cyanopeptolins). The high number of peptides and especially the new peptides underline the capability of Microcystis strains as producers of a high diversity of potentially bioactive compounds.

18.
Cell-penetrating peptides (CPPs) are an attractive tool for delivering membrane-impermeable compounds, including anionic biomacromolecules such as DNA and RNA, into living cells. Amphipathic helical peptides composed of hydrophobic amino acids and cationic amino acids are typical CPPs. In the current study, we designed amphipathic helical 12-mer peptides containing α,α-disubstituted α-amino acids (dAAs), which are known to stabilize peptide secondary structures. The dominant secondary structures of peptides in aqueous solution differed according to the introduced dAAs. Peptides containing hydrophobic dAAs and adopting a helical structure exhibited a good cell-penetrating ability. As an application of amphipathic helical peptides, small interfering RNA (siRNA) delivery into living human hepatoma cells was investigated. One of the peptides, containing the dAA dipropylglycine, formed stable complexes with siRNA at a zeta-potential and size appropriate for intracellular siRNA delivery. This peptide showed effective RNA interference efficiency at short peptide length and low concentrations of peptide and siRNA. These findings will be helpful for the design of amphipathic helical CPPs for intracellular siRNA delivery.

19.
The combination of docking algorithms with NMR data has been developed extensively for the studies of protein-ligand interactions. However, to extend this development for the studies of protein-protein interactions, the intermolecular NOE constraints, which are needed, are more difficult to access. In the present work, we describe a new approach that combines an ab initio docking calculation and the mapping of an interaction site using chemical shift variation analysis. The cytochrome c553-ferredoxin complex is used as a model of numerous electron-transfer complexes. The ¹⁵N-labeling of both molecules has been obtained, and the mapping of the interacting site on each partner, respectively, has been done using HSQC experiments. ¹H and ¹⁵N chemical shift analysis defines the area of both molecules involved in the recognition interface. Models of the complex were generated by an ab initio docking software, the BiGGER program (bimolecular complex generation with global evaluation and ranking). This program generates a population of protein-protein docked geometries ranked by a scoring function, combining relevant stabilization parameters such as geometric complementarity surfaces, electrostatic interactions, desolvation energy, and pairwise affinities of amino acid side chains. We have implemented a new module that includes experimental input (here, NMR mapping of the interacting site) as a filter to select the accurate models. Final structures were energy minimized using the X-PLOR software and then analyzed. The best solution has an interface area (1037.4 Å²) falling close to the range of generally observed recognition interfaces, with a distance of 10.0 Å between the redox centers.

20.
Similarity problems intensively investigated in computational molecular biology have the following two stringology models: find the longest string included in any string of a given finite language, and find the shortest string including every string of a given finite language. These two problems are exemplified by the two well-known pairs of problems, the longest common subsequence (or substring) problem and the shortest common supersequence (or superstring) problem.

In this paper we consider opposite problems connected with string non-inclusion relations: find the shortest string included in no string of a given finite language and find the longest string including no string of a given finite language. The predicate “string α is not included in string β” is interpreted either as “α is not a subsequence of β” or as “α is not a substring of β”. The main purpose is to determine the complexity status of the non-similarity problems. Using graph approaches, we present NP-hardness proofs for the first interpretation and polynomial-time algorithms for the second one. Special cases of the problems, and related issues are discussed.
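The two interpretations of inclusion, and the first of the two non-inclusion problems, can be illustrated with a small brute-force search. The function names are illustrative assumptions, and the enumeration below is only a sketch for tiny inputs; it is not the graph-based method of the paper.

```python
from itertools import product


def is_subsequence(a, b):
    """True if a can be obtained from b by deleting characters."""
    remaining = iter(b)
    return all(ch in remaining for ch in a)  # each search resumes where the last stopped


def is_substring(a, b):
    """True if a occurs in b as a contiguous block."""
    return a in b


def shortest_non_included(language, alphabet, included=is_substring):
    """Return a shortest string over alphabet that is included in no word.

    Candidates are tried by increasing length, so the first hit is shortest.
    The loop ends because no candidate longer than the longest word can be
    included in any word.
    """
    if not alphabet:
        raise ValueError("alphabet must not be empty")
    length = 0
    while True:
        for letters in product(sorted(alphabet), repeat=length):
            candidate = "".join(letters)
            if not any(included(candidate, word) for word in language):
                return candidate
        length += 1


print(shortest_non_included(["abab"], "ab", included=is_substring))
print(shortest_non_included(["abab"], "ab", included=is_subsequence))
```

On the same input the two interpretations give answers of different lengths, which hints at why their complexity status can differ.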

