Similar articles
20 similar articles found
1.
Haplotyping as perfect phylogeny: a direct approach.   (Cited by 4: 0 self-citations, 4 external)
A full haplotype map of the human genome will prove extremely valuable, as it will be used in large-scale screens of populations to associate specific haplotypes with specific complex, genetically influenced diseases. A haplotype map project has been announced by NIH. The biological key to that project is the surprising fact that some human genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). In this paper we explore the algorithmic implications of the no-recombination-in-long-blocks observation for the problem of inferring haplotypes in populations. This assumption, together with the standard population-genetic assumption of infinite sites, motivates a model of haplotype evolution in which the haplotypes in a population are assumed to evolve along a coalescent, which as a rooted tree is a perfect phylogeny. We consider the following algorithmic problem, called the perfect phylogeny haplotyping (PPH) problem, introduced by Gusfield (2002): given n genotypes of length m each, does there exist a set of at most 2n haplotypes such that each genotype is generated by a pair of haplotypes from this set, and such that this set can be derived on a perfect phylogeny? The approach taken by Gusfield (2002) reduces the problem to established, deep results and algorithms from matroid and graph theory. Although that reduction is quite simple and the resulting algorithm nearly optimal in speed, taken as a whole the approach is quite involved and, in particular, challenging to program. Moreover, anyone wishing to fully establish, by reading the existing literature, the correctness of the entire algorithm would need to read several deep and difficult papers in graph and matroid theory.
However, as stated by Gusfield (2002), many simplifications are possible, and the list of "future work" in that paper began with the task of developing a simpler, more direct, yet still efficient algorithm. This paper accomplishes that goal for both the rooted and unrooted PPH problems. It establishes a simple, easy-to-program, O(nm²)-time algorithm that determines whether there is a PPH solution for the input genotypes and produces a linear-space data structure to represent all of the solutions. The approach allows complete, self-contained proofs. In addition to algorithmic simplicity, the approach here makes the representation of all solutions more intuitive than in Gusfield (2002), and it resolves another goal from that paper, namely, proving a nontrivial upper bound on the number of PPH solutions, showing that this number is vastly smaller than the number of haplotype solutions (each solution being a set of n pairs of haplotypes that can generate the genotypes) when the perfect phylogeny requirement is not imposed.
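The perfect-phylogeny condition that drives the algorithm has a compact classical statement on haplotype matrices, the four-gamete test: a binary matrix admits a perfect phylogeny iff no pair of columns contains all four gametes 00, 01, 10, 11. A minimal sketch (illustrative only; the paper's algorithm works on unresolved genotypes, not haplotypes):

```python
def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test: a binary haplotype matrix admits a perfect
    phylogeny iff no pair of sites exhibits all four gametes."""
    m = len(haplotypes[0])
    for i in range(m):
        for j in range(i + 1, m):
            gametes = {(h[i], h[j]) for h in haplotypes}
            if len(gametes) == 4:
                return False
    return True

# admits_perfect_phylogeny(["0000", "0110", "1001"])    -> True
# admits_perfect_phylogeny(["00", "01", "10", "11"])    -> False
```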

2.
The problem of inferring haplotype phase from a population of genotypes has received a lot of attention recently. This is partly due to the observation that there are many regions of human genomic DNA where genetic recombination is rare (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). A Haplotype Map project has been announced by NIH to identify and characterize populations in terms of these haplotypes. Recently, Gusfield introduced the perfect phylogeny haplotyping (PPH) problem as an algorithmic implication of the no-recombination-in-long-blocks observation, together with the standard population-genetic assumption of infinite sites. Gusfield's solution, based on matroid theory, was followed by direct Θ(nm²) solutions that use simpler techniques (Bafna et al., 2003; Eskin et al., 2003) and also bound the number of solutions to the PPH problem. In this short note, we address two questions that were left open. First, can the algorithms of Bafna et al. (2003) and Eskin et al. (2003) be sped up to O(nm + m²) time, which would imply an O(nm) time bound for the PPH problem? Second, if there are multiple solutions, can we find one that is most parsimonious in terms of the number of distinct haplotypes? We give reductions that suggest the answer to both questions is "no." For the first problem, we show that computing the output of the first step (in either method) is equivalent to Boolean matrix multiplication. Therefore, the best bound we can presently achieve is O(nm^(ω−1)), where ω ≤ 2.52 is the exponent of matrix multiplication. Thus, any linear-time solution to the PPH problem likely requires a different approach. For the second problem, that of computing a PPH solution minimizing the number of distinct haplotypes, we show that the problem is NP-hard using a reduction from Vertex Cover (Garey and Johnson, 1979).
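The link to Boolean matrix multiplication can be made concrete on a toy version of the first step: tabulating, for every pair of sites, whether some genotype is heterozygous at both. Computed naively this is O(nm²); the table is exactly the Boolean product BᵀB of the n×m heterozygosity indicator matrix B, which is why beating O(nm²) for this step would yield fast Boolean matrix multiplication. A sketch under the usual 0/1/2 genotype coding (2 = heterozygous; illustrative, not the papers' exact first step):

```python
def het_pair_table(genotypes):
    """pair[i][j] is True iff some genotype is heterozygous (coded 2)
    at both sites i and j -- equivalently, entry (i, j) of the Boolean
    product B^T B with B[k][i] = 1 iff genotype k is heterozygous at i."""
    m = len(genotypes[0])
    pair = [[False] * m for _ in range(m)]
    for g in genotypes:
        het = [i for i in range(m) if g[i] == 2]
        for i in het:          # O(m^2) per genotype in the worst case
            for j in het:
                pair[i][j] = True
    return pair
```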

3.
The Pure Parsimony Haplotyping (PPH) problem is an NP-hard combinatorial optimization problem that consists of finding the minimum number of haplotypes necessary to explain a given set of genotypes. PPH has attracted growing attention in recent years due to its importance in the analysis of fine-scale genetic data. Its application fields range from mapping complex disease genes to inferring population histories, by way of drug design, functional genomics, and pharmacogenetics. In this article we investigate, for the first time, a recent version of PPH called the Pure Parsimony Haplotyping problem under Uncertain Data (PPH-UD). This version arises mainly when the input genotypes are not accurate, i.e., when some single nucleotide polymorphisms are missing or affected by errors. We propose an exact approach to solving PPH-UD based on an extended version of the class representative model of Catanzaro et al. [1] for PPH, currently the state-of-the-art integer programming model for PPH. The model is efficient, accurate, compact, polynomial-sized, easy to implement, solvable with any mixed integer programming solver, and usable in all those cases for which the parsimony criterion is well suited for haplotype estimation.
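The pure-parsimony objective itself is easy to state in code. The sketch below is a brute-force stand-in for the integer programming model, usable only on tiny complete-data instances (sites coded 0/1 for homozygous and 2 for heterozygous):

```python
from itertools import product, combinations

def explains(h1, h2, g):
    """A haplotype pair explains a genotype: at homozygous sites (0/1)
    both haplotypes carry that allele; at heterozygous sites (2) they
    must differ."""
    return all((x == y == s) if s in (0, 1) else (x != y)
               for x, y, s in zip(h1, h2, g))

def min_haplotypes(genotypes):
    """Smallest number of distinct haplotypes explaining all genotypes
    (the pure-parsimony objective) -- exhaustive, exponential time."""
    m = len(genotypes[0])
    haps = list(product((0, 1), repeat=m))
    for k in range(1, 2 * len(genotypes) + 1):
        for subset in combinations(haps, k):
            if all(any(explains(h1, h2, g)
                       for h1 in subset for h2 in subset)
                   for g in genotypes):
                return k
    return None
```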

4.
A physico-mathematical model of the gating machinery of single ionic channels in biological membranes has been developed. In this model, gating particles are subjected to: (i) a deterministic friction force responsible for interactions of gating particles with the surrounding solution; (ii) a deterministic potential force depending on the structure and conformational state of the channel pore (the latter is controlled by the transmembrane voltage V and regulates the motion of particles overcoming potential barriers on going from the closed (open) to the open (closed) state of the channel); (iii) a deterministic force responsible for interactions of water molecules with hydrophobic sites in the channel pore; and, finally, (iv) a stochastic thermal fluctuation force. The model affords an adequate approximation of experimental data.
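A reduced sketch of such a gating model: overdamped Langevin dynamics of a gating particle, keeping forces (i), (ii), and (iv) and omitting the water-hydrophobic term (iii). The quartic bistable potential U(x) = (x² − 1)², with wells standing for the closed and open states, is an assumed stand-in for the paper's unspecified potential profile:

```python
import math, random

def simulate_gating(steps=50000, dt=1e-3, gamma=1.0, kT=0.3, seed=1):
    """Euler-Maruyama integration of overdamped Langevin dynamics,
    dx = -(dU/dx)/gamma * dt + sqrt(2*kT*dt/gamma) * N(0, 1),
    in the bistable potential U(x) = (x^2 - 1)^2.  Returns the number
    of barrier crossings (closed <-> open transitions)."""
    rng = random.Random(seed)
    x = -1.0                              # start in the "closed" well
    crossings, prev_side = 0, False
    for _ in range(steps):
        force = -4 * x * (x * x - 1)      # -dU/dx
        noise = math.sqrt(2 * kT * dt / gamma) * rng.gauss(0, 1)
        x += force * dt / gamma + noise
        side = x > 0
        if side != prev_side:
            crossings += 1
            prev_side = side
    return crossings
```

Raising kT (stronger thermal force iv) increases the crossing rate, mirroring the model's interplay of barriers and fluctuations.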

5.
Phosphonopyruvate (P-pyr) hydrolase (PPH), a member of the phosphoenolpyruvate (PEP) mutase/isocitrate lyase (PEPM/ICL) superfamily, hydrolyzes P-pyr and shares the highest sequence identity and functional similarity with PEPM. Recombinant PPH from Variovorax sp. Pal2 was expressed in Escherichia coli and purified to homogeneity. Analytical gel filtration indicated that the protein exists in solution predominantly as a tetramer. The PPH pH-rate profile indicates maximal activity over a broad pH range. The steady-state kinetic constants, determined for a rapid-equilibrium ordered kinetic mechanism with Mg2+ binding first (Kd = 140 ± 40 μM), are kcat = 105 ± 2 s⁻¹ and a P-pyr Km of 5 ± 1 μM. PEP (a slow substrate, kcat = 2 × 10⁻⁴ s⁻¹), oxalate, and sulfopyruvate are competitive inhibitors, with Ki values of 2.0 ± 0.1 mM, 17 ± 1 μM, and 210 ± 10 μM, respectively. Three PPH crystal structures have been determined: the ligand-free enzyme, the enzyme bound to Mg2+ and oxalate (inhibitor), and the enzyme bound to Mg2+ and P-pyr (substrate). The complex with the inhibitor was obtained by cocrystallization, whereas that with the substrate was obtained by briefly soaking crystals of the ligand-free enzyme with P-pyr prior to flash cooling. The PPH structure resembles that of the other members of the PEPM/ICL superfamily and is most similar to the functionally related enzyme, PEPM. Each monomer of the dimer of dimers exhibits an (α/β)₈ barrel fold with the eighth helix swapped between the two molecules of the dimer. Both P-pyr and oxalate are anchored to the active site by Mg2+. The loop capping the active site is disordered in all three structures, in contrast to PEPM, where the equivalent loop adopts an open or disordered conformation in the unbound state but sequesters the inhibitor from solvent in the bound state. Crystal packing may have favored the open conformation of PPH even when the enzyme was cocrystallized with the oxalate inhibitor. Structure alignment of PPH with other superfamily members revealed two pairs of invariant or conservatively replaced residues that anchor the flexible gating loop. The proposed PPH catalytic mechanism is analogous to that of PEPM but includes activation of a water nucleophile by the loop residue Thr118.
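The reported constants drop straight into the standard rate law for competitive inhibition; a quick numeric check (enzyme concentration normalized to 1, values taken from the abstract):

```python
def pph_rate(s, i=0.0, kcat=105.0, km=5e-6, ki=17e-6, e=1.0):
    """Michaelis-Menten rate with a competitive inhibitor:
    v = kcat*E*[S] / (Km*(1 + [I]/Ki) + [S]).
    Defaults: kcat = 105 s^-1, Km(P-pyr) = 5 uM, Ki(oxalate) = 17 uM."""
    return kcat * e * s / (km * (1 + i / ki) + s)

# At [S] = Km the uninhibited rate is half-maximal (52.5 per unit E);
# oxalate at [I] = Ki doubles the apparent Km, dropping v to kcat/3.
```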

6.
A fundamental problem arising in evolutionary molecular biology is to discover the locations of gene duplications and of multiple gene duplication episodes on the basis of phylogenetic information. Solutions to the MULTIPLE GENE DUPLICATION problems provide useful clues for placing gene duplication events onto the locations of a species tree and for exposing multiple gene duplication episodes. In this paper, we study two variations of the MULTIPLE GENE DUPLICATION problems: the EPISODE-CLUSTERING (EC) problem and the MINIMUM EPISODES (ME) problem. For the EC problem, we improve the results of Burleigh et al. with an optimal linear-time algorithm. For the ME problem, building on the algorithm presented by Bansal and Eulenstein, we propose an optimal linear-time algorithm.
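Gene duplications are usually located with the classical LCA mapping that such algorithms build on: each gene-tree node maps to the lowest common ancestor of its leaf species in the species tree, and a node is a duplication iff it maps to the same species-tree node as one of its children. A minimal sketch (binary trees as nested tuples, species tree as parent/depth dicts; not the linear-time episode algorithms themselves):

```python
def duplication_nodes(gene_tree, parent, depth):
    """Return the gene-tree nodes flagged as duplications by the LCA
    mapping into a species tree given by parent and depth dicts."""
    def lca(a, b):
        while a != b:
            if depth[a] < depth[b]:
                b = parent[b]
            else:
                a = parent[a]
        return a

    dups = []
    def walk(node):
        if not isinstance(node, tuple):
            return node                     # leaf: its species label
        left, right = walk(node[0]), walk(node[1])
        m = lca(left, right)
        if m in (left, right):              # maps onto a child's image
            dups.append(node)
        return m
    walk(gene_tree)
    return dups
```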

7.
Inferring haplotype data from genotype data is a crucial step in linking SNPs to human diseases. Given n genotypes over m SNP sites, the haplotype inference (HI) problem deals with finding a set of haplotypes such that each given genotype can be formed by combining a pair of haplotypes from the set. The perfect phylogeny haplotyping (PPH) problem is one of the many computational approaches to the HI problem. Though it was conjectured that the complexity of the PPH problem was O(nm), the complexity of all the solutions presented until recently was O(nm²). In this paper, we make full use of the column ordering presented earlier and show that there must be certain interdependencies among the pairwise relationships between SNP sites in order for the given genotypes to admit a perfect phylogeny. Based on these interdependencies, we introduce the FlexTree (flexible tree) data structure, which represents all the pairwise relationships in O(m) space. The FlexTree data structure provides a compact representation of all the perfect phylogenies for the given set of genotypes. We also introduce an ordering of the genotypes that allows the genotypes to be added to the FlexTree sequentially. The column ordering, the FlexTree data structure, and the row ordering we introduce make the O(nm) OPPH algorithm possible. We present results on simulated data which demonstrate that the OPPH algorithm performs quite impressively when compared to the previous algorithms. The OPPH algorithm is one of the first O(nm) algorithms presented for the PPH problem.

8.
In this paper, it is shown that for a class of reaction networks, the discrete stochastic nature of the reacting species and reactions results in qualitative and quantitative differences between the mean of exact stochastic simulations and the prediction of the corresponding deterministic system. The differences are independent of the number of molecules of each species in the system under consideration. These reaction networks are open systems of chemical reactions with no zero-order reaction rates. They are characterized by at least two stationary points: a nonzero stable point and an unstable trivial solution (stability based on a linear stability analysis of the deterministic system). Starting from a nonzero initial condition, the deterministic system never reaches the zero stationary point, due to its unstable nature. In contrast, the result presented here proves that this zero state is a stable stationary state of the discrete stochastic system, and that all other finite states have zero probability of existence at large times. This result generalizes previous theoretical studies and simulations of specific systems and provides a theoretical basis for analyzing a class of systems that exhibit such inconsistent behavior. It has implications for the simulation of infection, apoptosis, and population kinetics, as it can be shown that for certain models the stochastic simulations will always yield predictions for the mean behavior that differ from those of the deterministic simulations.
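The phenomenon can be reproduced with a Gillespie simulation of a hypothetical minimal member of this class: birth X → 2X with propensity k₁x and a death channel with propensity k₂x² (rate-law form, so the zero state is reachable and absorbing). The deterministic ODE dx/dt = k₁x − k₂x² has an unstable zero state and a stable point at x = k₁/k₂, yet every stochastic trajectory is eventually absorbed at zero:

```python
import random

def gillespie(x0=5, k1=1.0, k2=0.2, t_max=200.0, seed=None):
    """Stochastic simulation algorithm for birth (propensity k1*x) and
    death (propensity k2*x**2).  Deterministically, 0 is unstable and
    k1/k2 = 5 is stable; the discrete chain is absorbed at x = 0."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    while t < t_max:
        a_birth, a_death = k1 * x, k2 * x * x
        a_total = a_birth + a_death
        if a_total == 0:
            break                      # absorbed at the zero state
        t += rng.expovariate(a_total)
        x += 1 if rng.random() < a_birth / a_total else -1
    return x
```

Averaging many runs, the SSA mean drifts toward zero while the ODE mean sits at k₁/k₂, which is exactly the discrepancy the paper formalizes.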

9.
The dynamic optimization (open-loop optimal control) of non-linear bioprocesses is considered in this contribution. These processes can be described by sets of non-linear differential and algebraic equations (DAEs), usually subject to constraints on the state and control variables. A review of the available solution techniques for this class of problems is presented, highlighting the numerical difficulties arising from the non-linear, constrained, and often discontinuous nature of these systems. To surmount these difficulties, we present several alternative stochastic and hybrid techniques based on the control vector parameterization (CVP) approach. CVP is a direct method that transforms the original problem into a non-linear programming (NLP) problem, which must then be solved by a suitably efficient and robust solver. In particular, a hybrid technique uses a first, global optimization phase followed by a fast second phase based on a local deterministic method, so it can handle the nonconvexity of many of these NLPs. The efficiency and robustness of these techniques are illustrated by solving several challenging case studies regarding the optimal control of fed-batch bioreactors and other bioprocesses. To evaluate their advantages fairly, a careful and critical comparison with several other direct approaches is provided. The results indicate that the two-phase hybrid approach offers the best compromise between robustness and efficiency.
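The CVP idea in miniature: discretize the control into piecewise-constant segments, integrate the dynamics, and hand the resulting NLP to an optimizer. The toy problem below (dx/dt = u − x, maximize x(T) − ½∫u² dt) and the random-search stand-in for the stochastic global phase are assumptions for illustration, not the paper's case studies or solvers:

```python
import random

def cvp_optimize(segments=5, t_final=1.0, steps=200, iters=3000, seed=0):
    """Control vector parameterization sketch: u(t) is piecewise
    constant on `segments` intervals; the ODE dx/dt = u - x (x(0) = 0)
    is Euler-integrated; the NLP objective x(T) - 0.5*int(u^2) dt is
    maximized by plain random search over u in [0, 2]."""
    rng = random.Random(seed)
    dt = t_final / steps

    def objective(u):
        x, cost = 0.0, 0.0
        for i in range(steps):
            ui = u[i * segments // steps]   # active control segment
            cost += 0.5 * ui * ui * dt
            x += (ui - x) * dt
        return x - cost

    best_u, best_j = None, float("-inf")
    for _ in range(iters):
        u = [rng.uniform(0.0, 2.0) for _ in range(segments)]
        j = objective(u)
        if j > best_j:
            best_u, best_j = u, j
    return best_u, best_j
```

In the hybrid scheme, the best random candidate would then seed a local deterministic refinement; here the analytic optimum of the toy problem is about 0.216.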

10.
Even though individual-based models (IBMs) have become very popular in ecology during the last decade, there have been few attempts to implement behavioural aspects in IBMs, partly due to a lack of appropriate techniques. Behavioural and life-history aspects can be implemented in IBMs through adaptive models based on genetic algorithms and neural networks (individual-based neural-network genetic algorithm, ING). To investigate the precision of the adaptation process, we present three cases where solutions can be found by optimisation. These cases include a state-dependent patch selection problem, a simple game between predators and prey, and a more complex vertical migration scenario for a planktivorous fish. In each case, the optimal solution is calculated and compared with the solution achieved using ING. The results show that the ING method finds optimal or close-to-optimal solutions for the problems presented. In addition, it has a wider range of potential application areas than conventional techniques in behavioural modelling. The method is especially well suited to complex problems where other methods fail to provide answers. This revised version was published online in July 2006 with corrections to the Cover Date.
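The genetic-algorithm half of the ING scheme can be sketched in a few lines; the neural-network encoding of behaviour is omitted, and the one-max fitness function is a placeholder for a behavioural objective:

```python
import random

def ga_maximize(fitness, n_bits=16, pop_size=40, gens=60, seed=0):
    """Bare-bones genetic algorithm: binary tournament selection,
    one-point crossover, bit-flip mutation.  In ING the bit string
    would encode neural-network weights scored inside the IBM."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]

    def pick():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(gens):
        nxt = []
        for _ in range(pop_size):
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.05:
                i = rng.randrange(n_bits)
                child[i] ^= 1              # flip one bit
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = ga_maximize(lambda bits: sum(bits))   # toy "one-max" problem
```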

11.
The optimal discounted present value of an exploited population under constant-effort harvesting in an environment with random disasters and bonanzas is investigated. The deterministic component of growth is density-independent (also called Malthusian or exponential). The disasters and bonanzas are random, occurring at the event times of a Poisson process. The density-independent properties of the model and the constant-effort open-loop policy lead to an exact solution for the expected present value. The optimal expected present value is compared with those of deterministic models with and without deterministic-type jumps. Both deterministic and random jumps can have a significant influence on the optimal present value. However, the effort needed to achieve the optimum is not sensitive to variations in the total jump frequency or in the discount rate. The averaged random-jump model is much easier to apply than the deterministic-jump model. Bonanzas can have much more of an effect on the present value than disasters, given similar jump rates.
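For multiplicative Poisson jumps on Malthusian growth, the expected present value has a simple closed form that a crude Euler Monte Carlo reproduces; all parameter values below are hypothetical, chosen only so the discounted integral converges:

```python
import math, random

def pv_closed(x0=1.0, r=0.05, qE=0.5, disc=0.1, lam=0.2, jump=-0.3):
    """Expected present value of harvest qE*X(t) under
    dX = (r - qE) X dt with Poisson(lam) jumps X -> (1 + jump) X:
    PV = qE*x0 / (disc - r + qE - lam*jump), denominator > 0."""
    return qE * x0 / (disc - r + qE - lam * jump)

def pv_mc(x0=1.0, r=0.05, qE=0.5, disc=0.1, lam=0.2, jump=-0.3,
          t_max=20.0, dt=0.01, runs=300, seed=0):
    """Euler Monte Carlo check of the closed form."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x, t, pv = x0, 0.0, 0.0
        while t < t_max:
            pv += qE * x * math.exp(-disc * t) * dt
            x += (r - qE) * x * dt
            if rng.random() < lam * dt:    # jump event this step
                x *= 1 + jump
            t += dt
        total += pv
    return total / runs
```

With jump < 0 (disasters) the jump term lam*jump lowers the effective growth rate and hence the present value, matching the abstract's qualitative conclusion.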

12.
Millstein [Biol. Philos. 17 (2002) 33] correctly identifies a serious problem with the view that natural selection and random drift are not conceptually distinct. She offers a solution to this problem purely in terms of differences between the processes of selection and drift. I show that this solution does not work: it leaves the vast majority of real biological cases uncategorized. However, I do think there is a solution to the problem she raises, and I offer it here. My solution depends on solving the biological analogue of the reference-class problem in probability theory, and on the reality of individual fitnesses.

13.
Background: Accurate estimation of blood loss is central to prompt diagnosis and management of post-partum hemorrhage (PPH), which remains a leading cause of maternal mortality in low-resource countries. In such settings, blood loss is often estimated visually and subjectively by attending health workers, owing to inconsistent availability of laboratory infrastructure. We evaluated the diagnostic accuracy of weighed blood loss (WBL) versus changes in peri-partum hemoglobin to detect PPH.
Methods: Data for this analysis were collected as part of a randomized controlled trial comparing oxytocin with misoprostol for PPH (NCT01866241). Blood samples for complete blood count were drawn on admission and again prior to hospital discharge or before blood transfusion. During delivery, women were placed on drapes and had pre-weighed sanitary towels placed around their perineum. Blood was then drained into a calibrated container, and the sanitary towels were added to estimate WBL, with each gram of blood counted as one milliliter. Sensitivity, specificity, and negative and positive predictive values (PPVs) were calculated at various blood-volume-loss and time combinations, and we fit receiver operating characteristic curves using blood loss at 1, 2, and 24 hours against a reference standard of a hemoglobin decrease of >10%.
Results: A total of 1,140 women were enrolled in the study, of whom 258 (22.6%) developed PPH, defined as a hemoglobin drop of >10%, and 262 (23.0%) had WBL ≥500 mL. WBL generally had poor sensitivity for detection of PPH (<75% for most volume-time combinations). In contrast, the specificity of WBL was high, with blood loss ≥500 mL at 1 h and ≥750 mL at any time point excluding PPH in over 97% of women. As such, WBL has a high PPV (>85%) in high-prevalence settings when WBL exceeds 750 mL.
Conclusion: WBL has poor sensitivity but high specificity compared with laboratory-based methods of PPH diagnosis. These characteristics correspond to a high PPV in areas with high PPH prevalence. Although WBL is not useful for excluding PPH, this low-cost, simple, and reproducible method is promising as a way to identify significant PPH in settings where quantifiable red-cell indices are unavailable.
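The prevalence dependence of the PPV follows directly from Bayes' rule; the operating point below (sensitivity 70%, specificity 97%) is an illustrative value consistent with the ranges quoted in the abstract, not the study's exact 2×2 table:

```python
def ppv(sens, spec, prev):
    """Positive predictive value from Bayes' rule:
    PPV = sens*prev / (sens*prev + (1 - spec)*(1 - prev))."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

# At the study's 22.6% PPH prevalence, even 70% sensitivity with 97%
# specificity yields a PPV above 85%; at 1% prevalence the same test
# would have a PPV below 25%.
```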

14.
5H-Pyridophenoxazin-5-one (PPH), a new anticancer iminoquinone, is able to inhibit a large number of lymphoblastoid and solid-tumor-derived cells at submicromolar concentrations. Molecular modeling calculations indicated that this compound might intercalate into the DNA double strand, which was also supported by nuclear magnetic resonance studies. Since free radicals arising from anticancer quinonic drugs have been proposed to be key species responsible for DNA cleavage, we aimed to intercept and identify free radicals generated from PPH under bioreductive conditions. The first and second monoelectronic reduction potentials of PPH were measured by cyclic voltammetry: the reduction potential of PPH is compatible with its reduction by compounds such as NADH, suggesting that reduction of PPH may play a role in its cytotoxicity. The radical anion PPH•− was detected by electron paramagnetic resonance (EPR) spectroscopy, and its identification was supported by DFT calculations. EPR experiments in the presence of the spin traps 5,5-dimethylpyrroline N-oxide and 5-(diethoxyphosphoryl)-5-methylpyrroline N-oxide suggested the occurrence of an electron transfer between the radical anion of the drug and oxygen, resulting in the formation of the superoxide anion (O2•−). The enthalpy of the reaction of PPH•− with O2 was determined both in the gas phase and in solution at the B3LYP/6-31+G level using the isodensity PCM method, and the overall process in dimethyl sulfoxide was predicted to be slightly exothermic. We propose that the monoelectronic reduction of PPH in the proximity of DNA may eventually lead to radicals that could cause considerable damage to DNA, thus accounting for the high cytotoxic activity of the drug. Indeed, a comet assay (alkaline single-cell electrophoresis) showed that PPH causes free-radical-induced DNA damage.

15.
Monte Carlo refers to the statistical methods used to solve stochastic or deterministic physical or mathematical problems. The use of this technique for deterministic problems is indicated whenever the problem has no analytical solution, when such a solution is very difficult to obtain, or when numerical methods would require an excessive amount of CPU time. This paper describes the application of Monte Carlo methods in a computer program, called ISODOSE, which calculates isodose curves around linear radioactive sources for use in brachytherapy treatment planning. Brachytherapy is a special form of cancer treatment in which the radioactive source is placed near or inside the tumor in order to produce necrosis of the cancerous tissue. The program is written in C, and its results are compared to similar data obtained from a commercially available program at Hospital do Cancer de Recife (Brazil), a radiation-therapy institute. The use of Monte Carlo techniques in the ISODOSE program allows isodose curves to be plotted around linear sources, and it is especially precise near the ends of the source (singularities), taking less time than would be possible with deterministic methods. The ISODOSE program can be used for brachytherapy planning in small clinics in Recife and adjacent areas.
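The flavor of the method: average an inverse-square kernel over source positions sampled along the line and compare with the closed form. This toy kernel ignores attenuation and anisotropy, so it sketches the technique rather than ISODOSE itself:

```python
import math, random

def dose_mc(x, y, half_len=1.0, samples=100000, seed=0):
    """Monte Carlo estimate of the mean inverse-square kernel at (x, y)
    from a uniform line source on [-half_len, half_len] of the x-axis."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        s = rng.uniform(-half_len, half_len)   # random source point
        total += 1.0 / ((x - s) ** 2 + y ** 2)
    return total / samples

def dose_exact(x, y, half_len=1.0):
    """Closed form of the same average:
    (atan((L - x)/y) + atan((L + x)/y)) / (2*L*y)."""
    L = half_len
    return (math.atan((L - x) / y) + math.atan((L + x) / y)) / (2 * L * y)
```

Note the analytic form stays finite on-axis only for y > 0; the Monte Carlo estimator handles awkward geometries where no such closed form exists, which is the program's selling point near the source ends.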

16.
Since neo-Darwinism arose from the work of Darwin and Mendel, evolution by natural selection has been seen as contingent and historical, defined by an a posteriori selection process with no a priori laws that explain why evolution on Earth has taken the direction of the major evolutionary trends and transitions rather than any other. Recently, however, major life-history trends and transitions have been explained as inevitable consequences of a deterministic selection that unfolds from the energetic state of the organism and the density-dependent competitive interactions that arise from self-replication in limited environments. I describe differences and similarities between the historical and deterministic selection processes, illustrate the concepts using life-history models of large body masses and limited reproductive rates, review life-history evolution with a wider focus on major evolutionary transitions, and propose that biotic evolution is driven by a universal natural selection in which the long-term evolution of fitness-related traits is determined mainly by deterministic selection, while contingency is important predominantly for neutral traits. Given suitable environmental conditions, it is shown that selection by energetic state and density-dependent competitive interactions unfolds into higher-level selection for life-history transitions from simple asexually reproducing self-replicators to large-bodied organisms with senescence and sexual reproduction between males and females, and in some cases to the fully evolved eusocial colony with thousands of offspring workers. This defines an evolutionary arrow of time for open thermodynamic systems with a constant inflow of energy, predicting similar routes for long-term evolution on similar planets.

17.
A simple stochastic model of DNA renaturation kinetics in the presence and absence of cooperativity has been developed [the corresponding deterministic models were treated explicitly in our previous work: Biochem Biophys Res Commun 293 (2002) 870-873]. The theoretical mean and variance of the number of bases in single-stranded DNA (ssDNA), which is of course a random variable, have been calculated and compared with the experimental values. The results showed that only the cooperative model correctly predicted the time t(m) at which the variance becomes maximal, whereas the non-cooperative model overestimated it, thus establishing the validity of the cooperative model. Some applications of this cooperative theory to problems concerning the central dogma of molecular biology, PCR, etc. are also discussed.
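In the non-cooperative case the calculation is elementary: if each of n bases pairs independently at rate k, the ssDNA base count is binomial with p(t) = e^(−kt), so the variance n·p(1 − p) peaks at t(m) = ln 2 / k, where p = 1/2. A sketch of this baseline (the paper's point is that the experimentally observed t(m) instead matches the cooperative model, which this independent-sites formula cannot capture):

```python
import math

def ss_mean_var(n, k, t):
    """Non-cooperative renaturation baseline: each of n bases pairs
    independently at rate k, so the ssDNA count is Binomial(n, p(t))
    with p(t) = exp(-k*t); returns (mean, variance) = (n*p, n*p*(1-p))."""
    p = math.exp(-k * t)
    return n * p, n * p * (1 - p)

# The variance is maximal at t_m = log(2)/k, where p = 1/2 and
# var = n/4; it falls off on both sides of t_m.
```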

18.
Sorting permutations by transpositions is an important problem in genome rearrangements. A transposition is a rearrangement operation in which a segment is cut out of the permutation and pasted in a different location. The complexity of this problem is still open, and improving on the best known 1.5-approximation algorithm had been an open problem for a decade. In this paper, we provide a 1.375-approximation algorithm for sorting by transpositions. The algorithm is based on a new upper bound on the diameter of 3-permutations. In addition, we present some new results regarding the transposition diameter: we improve the lower bound for the transposition diameter of the symmetric group and determine the exact transposition diameter of simple permutations.
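The operation and the classical breakpoint lower bound (a transposition removes at most three breakpoints, so the distance is at least b(π)/3) are easy to state in code:

```python
def transpose(perm, i, j, k):
    """Genome-rearrangement transposition: cut perm[i:j] and paste it
    just before index k (requires i < j <= k)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def breakpoints(perm):
    """Count adjacencies that are not consecutive, with sentinels 0 and
    n+1; ceil(b/3) lower-bounds the transposition distance."""
    p = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(p, p[1:]) if b - a != 1)

# [3, 1, 2] has 3 breakpoints, and indeed one transposition sorts it:
# transpose([3, 1, 2], 0, 1, 3) -> [1, 2, 3]
```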

19.
BACKGROUND: Several deterministic and stochastic combinatorial optimization algorithms have been applied to computational protein design and homology modeling. As structural targets increase in size, however, it has become necessary to find more powerful methods to address the increased combinatorial complexity. RESULTS: We present a new deterministic combinatorial search algorithm called 'Branch-and-Terminate' (B&T), which is derived from the Branch-and-Bound search method. The B&T approach is based on the construction of an efficient but very restrictive bounding expression, which is used for the search of a combinatorial tree representing the protein system. The bounding expression is used both to determine the optimal organization of the tree and to perform a highly effective pruning procedure named 'termination'. For some calculations, the B&T method rivals the current deterministic standard, dead-end elimination (DEE), sometimes finding the solution up to 21 times faster. A more significant feature of the B&T algorithm is that it provides an efficient way to complete the optimization of problems that have been partially reduced by a DEE algorithm. CONCLUSIONS: The B&T algorithm is an effective optimization algorithm when used alone. Moreover, it can increase the problem-size limit of amino acid side-chain placement calculations, such as protein design, by completing DEE optimizations that reach the point at which the DEE criteria become inefficient. Together, the two algorithms make it possible to find solutions to problems that are intractable by either algorithm alone.
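A bare-bones branch-and-bound for side-chain placement conveys the setting; the bound used here is just the accumulated energy (valid only under the assumption that all energy terms are nonnegative), which is far weaker than the paper's B&T bounding expression:

```python
def bb_min_energy(self_e, pair_e):
    """Pick one rotamer per position minimizing total energy.
    self_e[p][r]: self energy of rotamer r at position p.
    pair_e[(p, q)][rp][rq]: pairwise energy for p < q.
    Depth-first branch and bound; a branch is pruned ('terminated')
    when its partial energy already meets the incumbent, which is a
    valid bound if all energies are >= 0."""
    n = len(self_e)
    best = [float("inf")]

    def search(p, chosen, acc):
        if acc >= best[0]:
            return                          # prune this branch
        if p == n:
            best[0] = acc                   # new incumbent
            return
        for r in range(len(self_e[p])):
            e = acc + self_e[p][r] + sum(
                pair_e[(q, p)][chosen[q]][r] for q in range(p))
            search(p + 1, chosen + [r], e)

    search(0, [], 0.0)
    return best[0]
```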

20.
We have cloned three genes for protein phosphatases in the yeast Saccharomyces cerevisiae. Two of the genes, PPH21 and PPH22, encode highly similar proteins that are homologs of the mammalian protein phosphatase 2A (PP2A), while the third gene, PPH3, encodes a new PP2A-related protein. Disruption of either PPH21 or PPH22 alone had no effect, but spores disrupted for both genes produced very small colonies with few surviving cells. We conclude that PP2A performs an important function in yeast cells. A disruption of the third gene, PPH3, did not in itself affect growth, but it completely prevented growth of spores disrupted for both PPH21 and PPH22. Thus, PPH3 provides some PP2A-complementing activity that allows limited growth of PP2A-deficient cells. Strains were constructed in which we could study the phenotypes caused by either excess PP2A or total PP2A depletion. We found that the level of PP2A activity has dramatic effects on cell shape. PP2A-depleted cells develop an abnormal pear-shaped morphology that is particularly pronounced in the growing bud. In contrast, overexpression of PP2A produces more elongated cells, and high-level overexpression causes a balloon-like phenotype with huge swollen cells filled with large vacuoles.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号