Similar articles
 20 similar articles found (search time: 31 ms)
1.
In many research disciplines, hypothesis tests are applied to evaluate whether findings are statistically significant or could be explained by chance. The Wilcoxon–Mann–Whitney (WMW) test is among the most popular hypothesis tests in medicine and life science for analyzing whether two groups of samples are equally distributed. This nonparametric statistical homogeneity test is commonly applied in molecular diagnosis. Generally, solving the WMW test exactly takes a high combinatorial effort for large sample cohorts containing a significant number of ties. Hence, the P value is frequently approximated by a normal distribution. We developed EDISON-WMW, a new approach to calculate the exact permutation of the two-tailed unpaired WMW test without any corrections required and allowing for ties. The method relies on dynamic programming to solve the combinatorial problem of the WMW test efficiently. Beyond a straightforward implementation of the algorithm, we presented different optimization strategies and developed a parallel solution. Using our program, the exact P value for large cohorts containing more than 1000 samples with ties can be calculated within minutes. We demonstrate the performance of this novel approach on randomly generated data, benchmark it against 13 other commonly applied approaches, and moreover evaluate molecular biomarkers for lung carcinoma and chronic obstructive pulmonary disease (COPD). We found that approximated P values were generally higher than the exact solution provided by EDISON-WMW. Importantly, the algorithm can also be applied to high-throughput omics datasets, where hundreds or thousands of features are included. To provide easy access to the multi-threaded version of EDISON-WMW, a web-based solution of our algorithm is freely available at http://www.ccb.uni-saarland.de/software/wtest/.
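The idea of an exact, tie-tolerant permutation P value computed by dynamic programming can be illustrated with a minimal sketch. This is not the EDISON-WMW implementation: the function names `doubled_midranks` and `exact_wmw_two_sided` are hypothetical, and the sketch assumes the two-sided P value is the fraction of group assignments whose rank sum lies at least as far from its expectation as the observed one. Midranks are doubled so that tied ranks stay integers.

```python
from collections import defaultdict
from fractions import Fraction


def doubled_midranks(values):
    """Return 2 * midrank for each value, so tied ranks stay integers."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j are tied and share the mean of ranks i+1..j+1.
        for k in range(i, j + 1):
            ranks[order[k]] = (i + 1) + (j + 1)
        i = j + 1
    return ranks


def exact_wmw_two_sided(x, y):
    """Exact two-sided permutation P value of the rank-sum statistic (illustrative)."""
    ranks = doubled_midranks(list(x) + list(y))
    n1, n = len(x), len(x) + len(y)
    observed = sum(ranks[:n1])
    # counts[j][s] = number of ways to pick j pooled samples with doubled rank sum s.
    counts = [defaultdict(int) for _ in range(n1 + 1)]
    counts[0][0] = 1
    for r in ranks:
        # Go downwards in j so that each sample is used at most once.
        for j in range(n1, 0, -1):
            for s, c in list(counts[j - 1].items()):
                counts[j][s + r] += c
    center = n1 * (n + 1)  # expected doubled rank sum of the first group
    cutoff = abs(observed - center)
    extreme = sum(c for s, c in counts[n1].items() if abs(s - center) >= cutoff)
    return Fraction(extreme, sum(counts[n1].values()))
```

For example, `exact_wmw_two_sided([1, 2, 3], [4, 5, 6])` counts 2 extreme assignments out of 20. The table grows with the number of distinct rank sums, which is why the abstract speaks of optimization strategies and parallelization for cohorts of more than 1000 samples.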

2.
 Two results are presented for problems involving alleles with a continuous range of effects. The first result is a simple yet highly accurate numerical method that determines the equilibrium distribution of allelic effects, moments of this distribution, and the mutational load. The numerical method is explicitly applied to the mutation-selection balance problem of stabilising selection. The second result is an exact solution for the distribution of allelic effects under weak stabilising selection for a particular distribution of mutant effects. The exact solution is shown to yield a distribution of allelic effects that, depending on the mutation rate, interpolates between the "House of Cards" approximation and the Gaussian approximation. The exact solution is also used to test the accuracy of the numerical method. Received: 7 November 2001; revised version: 5 September 2002; published online: 18 December 2002. Key words or phrases: Continuum of alleles; Numerical solution; Exact solution; Mutation selection balance; Stabilising selection.

3.
We report a novel approach for inversion of large random matrices in massive Multiple-Input Multiple-Output (MIMO) systems. It is based on the concept of inverse vectors, in which an inverse vector is defined for each column of the principal matrix. Such an inverse vector has to satisfy two constraints. Firstly, it has to be in the null-space of all the remaining columns. We call this the null-space problem. Secondly, it has to form a projection of value equal to one in the direction of the selected column. We term this the normalization problem. The process essentially decomposes the inversion problem and distributes it over columns. Each column can be thought of as a node in a network or a particle in a swarm seeking its own solution, the inverse vector, which lightens the computational load on it. Another benefit of this approach is its applicability to all three cases pertaining to a linear system: the fully-determined, the over-determined, and the under-determined case. It eliminates the need to form the generalized inverse for the last two cases by providing a new way to solve the least squares problem and the Moore-Penrose pseudoinverse problem. The approach makes no assumption regarding the size, structure or sparsity of the matrix. This makes it fully applicable to the large random matrices, much in vogue, that arise in massive MIMO systems. Also, the null-space problem opens the door for the many methods available in the literature for null-space computation to enter the realm of matrix inversion. There is even the flexibility of finding an exact or approximate inverse depending on the null-space method employed. We employ Householder's null-space method for the exact solution and present a complete exposition of the new approach. A detailed comparison with well-established matrix inversion methods in the literature is also given.

4.
Haplotype reconstruction from SNP fragments by minimum error correction   (Total citations: 5; self-citations: 0; citations by others: 5)
MOTIVATION: Haplotype reconstruction based on aligned single nucleotide polymorphism (SNP) fragments aims to infer a pair of haplotypes from localized polymorphism data gathered through short genome fragment assembly. An important computational model of this problem is the minimum error correction (MEC) model, which has been discussed in several publications. The model retrieves a pair of haplotypes by correcting the minimum number of SNPs in the given genome fragments coming from an individual's DNA. RESULTS: In the first part of this paper, an exact algorithm for the MEC model is presented. Owing to the NP-hardness of the MEC model, we also design a genetic algorithm (GA). The designed GA is intended to solve large problems and has very good performance. The strengths and weaknesses of the MEC model are shown using experimental results on real data and simulation data. In the second part of this paper, to improve the MEC model for haplotype reconstruction, a new computational model is proposed, which simultaneously employs genotype information of an individual in the process of SNP correction, and is called MEC with genotype information (MEC/GI for short). Computational results on extensive datasets show that the new model has much higher accuracy in haplotype reconstruction than the pure MEC model.
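To make the MEC objective concrete, here is a minimal brute-force sketch. It is neither the paper's exact algorithm nor its GA; it simply enumerates every pair of binary haplotypes, so it is only usable for a handful of SNP sites. The fragment encoding (strings over `0`, `1`, and `-` for an uncovered site) and the names `mismatches` and `mec_brute_force` are assumptions made for illustration.

```python
from itertools import product


def mismatches(fragment, haplotype):
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)


def mec_brute_force(fragments):
    """Return (cost, h1, h2) minimizing the number of SNP corrections.

    Each fragment is assigned to whichever haplotype needs fewer corrections;
    the cost is the total number of corrections over all fragments.
    """
    sites = len(fragments[0])
    candidates = ["".join(bits) for bits in product("01", repeat=sites)]
    best = None
    for i, h1 in enumerate(candidates):
        for h2 in candidates[i:]:
            cost = sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)
            if best is None or cost < best[0]:
                best = (cost, h1, h2)
    return best
```

The search visits on the order of 4^m haplotype pairs for m sites, which illustrates why a heuristic such as a GA becomes attractive for larger instances of this NP-hard model.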

5.
In limited-view computed tomography reconstruction, iterative image reconstruction with sparsity-exploiting methods inspired by compressive sensing, such as total variation (TV) minimization, potentially allows large reductions in sampling requirements. However, quantifying this claim is non-trivial because the reduction in sampling achieved by a sparsity-exploiting method is ill-defined. In this paper, the sampling condition for exact reconstruction in the limited-view problem is studied by verifying the uniqueness of the solution of the TV minimization model. Uniqueness is tested by solving a convex optimization problem derived from the necessary and sufficient condition for solution uniqueness. Through this method, the sampling number sufficient for exact reconstruction is quantified for any fixed phantom and fixed geometrical parameters in the limited-view problem. This paper thus provides a reference for quantifying the sampling condition. Three phantoms are tested to study the sampling condition of limited-view exact reconstruction. The experimental results show the quantified sampling number and indicate that an object can still be accurately reconstructed as the scanning range becomes narrower, provided that the sampling number is increased. The increased sampling compensates for the deficiency in projection angle. However, a lower bound on the scanning range is presented for each of the three phantoms: once the projection angular range is narrowed to this extent, an exact reconstruction cannot be obtained no matter how much the sampling is increased.

6.

We study fixation probabilities for the Moran stochastic process for the evolution of a population with three or more types of individuals and frequency-dependent fitnesses. Contrary to the case of populations with two types of individuals, in which fixation probabilities may be calculated by an exact formula, here we must solve a large system of linear equations. We first show that this system always has a unique solution. Other results are upper and lower bounds for the fixation probabilities obtained by coupling the Moran process with three strategies with birth–death processes with only two strategies. We also apply our bounds to the problem of evolution of cooperation in a population with three types of individuals already studied in a deterministic setting by Núñez Rodríguez and Neves (J Math Biol 73:1665–1690, 2016). We argue that cooperators will be fixated in the population with probability arbitrarily close to 1 for a large region of initial conditions and large enough population sizes.
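For context, the exact two-type formula that the abstract contrasts with the three-type case can be sketched as follows. This is the standard birth-death result, not the paper's linear system or its bounds; the function name `fixation_probability` and the callables `fitness_a(i)` and `fitness_b(i)` (fitness of each type when `i` individuals of type A are present) are assumptions for illustration.

```python
def fixation_probability(pop_size, start, fitness_a, fitness_b):
    """Probability that type A fixates, starting from `start` A individuals.

    Uses the standard two-type Moran (birth-death) formula
        pi_start = S(start - 1) / S(pop_size - 1),
    where S(k) = 1 + sum_{j=1..k} prod_{i=1..j} fitness_b(i) / fitness_a(i).
    """
    if start <= 0:
        return 0.0
    partial = [1.0]  # partial[k] holds S(k)
    running_product = 1.0
    for i in range(1, pop_size):
        running_product *= fitness_b(i) / fitness_a(i)
        partial.append(partial[-1] + running_product)
    return partial[start - 1] / partial[pop_size - 1]
```

With equal fitnesses this reduces to `start / pop_size`, and with a constant relative fitness r for type A and a single initial mutant it reduces to (1 - 1/r) / (1 - r^(-N)). No comparably simple closed form is available for three or more types, which is the setting the abstract addresses.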


7.
Chaperones/heat shock proteins (HSPs) of the HSP90 and HSP70 families show elevated levels in proliferating mammalian cells and a cell cycle-dependent expression. They transiently associate with key molecules of the cell cycle control system such as Cdk4, Wee-1, pRb, p53, p27/Kip1 and are involved in the nuclear localization of regulatory proteins. They also associate with viral oncoproteins such as SV40 super T, large T and small t antigen, polyoma large and middle S antigen and Epstein-Barr virus nuclear antigen. This association is based on a J-domain in the viral proteins and may assist their targeting to the pRb/E2F complex. Small HSPs and their state of phosphorylation and oligomerization also seem to be involved in proliferation and differentiation. Chaperones/HSPs thus play important roles within cell cycle processes. Their exact functioning, however, is still a matter of discussion. HSP90 in particular, but also HSP70 and other chaperones associate with proteins of the mitogen-activated signal cascade, particularly with the Src kinase, with tyrosine receptor kinases, with Raf and the MAP-kinase activating kinase (MEK). This apparently serves the folding and translocation of these proteins, but possibly also the formation of large immobilized complexes of signal transducing molecules (scaffolding function).

8.
MOTIVATION: Deciphering the location of gene duplications and multiple gene duplication episodes on the Tree of Life is fundamental to understanding the way gene families and genomes evolve. The multiple gene duplication problem provides a framework for placing gene duplication events onto nodes of a given species tree, and detecting episodes of multiple gene duplication. One version of the multiple gene duplication problem was defined by Guigó et al. in 1996. Several heuristic solutions have since been proposed for this problem, but no exact algorithms were known. RESULTS: In this article we solve this longstanding open problem by providing the first exact and efficient solution. We also demonstrate the improvement offered by our algorithm over the best heuristic approaches, by applying it to several simulated as well as empirical datasets.

9.
In the morphogenesis of double stranded DNA phages, a precursor protein shell empty of DNA is first assembled and then filled with DNA. The assembly of the correctly dimensioned precursor shell (procapsid) of Salmonella bacteriophage P22 requires the interaction of some 420 coat protein subunits with approximately 200 scaffolding protein subunits to form a double shelled particle with the scaffolding protein on the inside. In the course of DNA packaging, all of the scaffolding protein subunits exit from the procapsid and participate in further rounds of procapsid assembly (King and Casjens. 1974. Nature (Lond.). 251:112-119). To study the mechanism of shell assembly we have purified the coat and scaffolding protein subunits by selective dissociation of isolated procapsids. Both proteins can be obtained as soluble subunits in Tris buffer at near neutral pH. The coat protein sedimented in sucrose gradients as a roughly spherical monomer, while the scaffolding protein sedimented as if it were an elongated monomer. When the two proteins were mixed together in 1.5 M guanidine hydrochloride and dialyzed back to buffer at room temperature, procapsids formed which were very similar in morphology, sedimentation behavior, and protein composition to procapsids formed in vivo. Incubation of either protein alone under the same conditions did not yield any large structures. We interpret these results to mean that the assembly of the shell involves a switching of both proteins from their nonaggregating to their aggregating forms through their mutual interaction. The results are discussed in terms of the general problem of self-regulated assembly and the control of protein polymerization in morphogenesis.

10.
All living systems depend on transformations of elements between different states. In particular, the transformation of dead organic matter in the soil (SOM) by decomposers (microbes) releases elements incorporated in SOM and makes the elements available anew to plants. A major problem in analysing and describing this process is that SOM, as the result of the decomposer activity, is a mixture of a very large number of molecules with widely differing chemical and physical properties. The continuous-quality equation (CQE) is a general equation describing this complexity by assigning a continuous-quality variable to each carbon atom in SOM. The use of CQE has been impeded by its complicated mathematics. Here, we show by deriving exact solutions that, at least for some specific cases, there exist solutions to CQE. These exact solutions show that previous approximations have overestimated the rate at which litter decomposes and as a consequence underestimated steady-state SOM amounts. The exact and approximate solutions also differ with respect to the parameter space in which they yield finite steady-state SOM amounts. The latter point is important because temperature is one of the parameters and climatic change may move the solution from a region of the parameter space with infinite steady-state SOM to a region of finite steady-state SOM, with potentially large changes in soil carbon stores. We also show that the solution satisfies the Chapman-Kolmogorov theorem. The importance of this is that it provides efficient algorithms for numerical solutions.

11.
Many applications of data partitioning (clustering) have been well studied in bioinformatics. Consider, for instance, a set N of organisms (elements) based on DNA marker data. A partition divides all elements in N into two or more disjoint clusters that cover all elements, where a cluster contains a non-empty subset of N. Different partitioning algorithms may produce different partitions. Computing the distance between two or more partitions and finding their consensus partition (also called consensus clustering) are important and interesting problems that arise frequently in bioinformatics and data mining, in which different distance functions may be considered in different partition algorithms. In this article, we discuss the k partition-distance problem. Given a set of elements N with k partitions of N, the k partition-distance problem is to delete the minimum number of elements from each partition such that all remaining partitions become identical. This problem is NP-complete for general $k > 2$ partitions, and no algorithms are known at present. We design the first known heuristic and approximation algorithms with performance ratio 2 to solve the k partition-distance problem in $O(k \cdot \rho \cdot |N|)$ time, where ρ is the maximum number of clusters of these k partitions and |N| is the number of elements in N. We also present the first known exact algorithm in $O(\ell \cdot 2^{\ell} \cdot k^{2} \cdot |N|^{2})$ time, where ℓ is the partition-distance of the optimal solution for this problem. Performances of our exact and approximation algorithms in testing the random data with actual sets of organisms based on DNA markers are compared and discussed. Experimental results reveal that our algorithms can improve the computational speed of the exact algorithm for the two partition-distance problem in practice if the maximum number of elements per cluster is less than ρ. From both theoretical and computational points of view, our solutions are at most twice the partition-distance of the optimal solution. A website offering the interactive service of solving the k partition-distance problem using our and previous algorithms is available (see http://mail.tmue.edu.tw/~yhchen/KPDP.html).
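The two-partition case of the distance defined above can be illustrated with a small sketch. It relies on the well-known reduction to an assignment problem: the distance equals |N| minus the largest total overlap obtainable by matching clusters of one partition to clusters of the other. This is not any of the paper's k-partition algorithms; the name `partition_distance` is hypothetical, and the brute-force matching over permutations is only practical for a few clusters.

```python
from itertools import permutations


def partition_distance(p, q):
    """Minimum number of elements to delete so that partitions p and q coincide.

    p and q are lists of disjoint sets covering the same elements.
    """
    elements = set().union(*p)
    if elements != set().union(*q):
        raise ValueError("both partitions must cover the same elements")
    # Pad the shorter partition with empty clusters so a perfect matching exists.
    size = max(len(p), len(q))
    p = list(p) + [set()] * (size - len(p))
    q = list(q) + [set()] * (size - len(q))
    overlap = [[len(a & b) for b in q] for a in p]
    # Brute-force maximum-weight matching between clusters (illustration only).
    kept = max(
        sum(overlap[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )
    return len(elements) - kept
```

For instance, `partition_distance([{1, 2, 3}, {4, 5}], [{1, 2}, {3, 4, 5}])` is 1, because deleting element 3 makes both partitions equal to {1, 2}, {4, 5}. A polynomial-time assignment solver could replace the permutation loop for larger inputs.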

12.
Many statistical methods and programs are available to compute the significance of a given DNA pattern in a genome sequence. In this paper, after outlining the mathematical background of this problem, we present SPA (Statistic for PAtterns), an expert system with a simple web interface designed to be applied to two of these methods (large deviation approximations and exact computations using simple recurrences). A few results are presented, leading to a comparison between the two methods and to a simple decision rule in the choice of that to be used. Finally, future developments of SPA are discussed. This tool is available at the following address: http://stat.genopole.cnrs.fr/SPA/.

13.
MOTIVATION: There is a pressing need for improved proteomic screening methods allowing for earlier diagnosis of disease, systematic monitoring of physiological responses and the uncovering of fundamental mechanisms of drug action. The combined platform of LC-MS (Liquid-Chromatography-Mass-Spectrometry) has shown promise in moving toward a solution in these areas. In this paper we present a technique for discovering differences in protein signal between two classes of samples of LC-MS serum proteomic data without use of tandem mass spectrometry, gels or labeling. This method works on data from a lower-precision MS instrument, the type routinely used by and available to the community at large today. We test our technique on a controlled (spike-in) but realistic (serum biomarker discovery) experiment which is therefore verifiable. We also develop a new method for helping to assess the difficulty of a given spike-in problem. Lastly, we show that the problem of class prediction, sometimes mistaken as a solution to biomarker discovery, is actually a much simpler problem. RESULTS: Using precision-recall curves with experimentally extracted ground truth, we show that (1) our technique has good performance using seven replicates from each class, (2) performance degrades with decreasing number of replicates, (3) the signal that we are teasing out is not trivially available (i.e. the differences are not so large that the task is easy). Lastly, we easily obtain perfect classification results for data in which the problem of extracting differences does not produce absolutely perfect results. This emphasizes the different nature of the two problems and also their relative difficulties. AVAILABILITY: Our data are publicly available as a benchmark for further studies of this nature at http://www.cs.toronto.edu/~jenn/LCMS

14.
In this paper, a randomized numerical approach is used to obtain approximate solutions for a class of nonlinear Fredholm integral equations of the second kind. The proposed approach contains two steps. In the first step, we define a discretized form of the integral equation by quadrature formula methods; under some conditions on the kernel of the integral equation, the solution of this discretized form converges to the exact solution of the integral equation. We then convert the problem to an optimal control problem by introducing an artificial control function. In the second step, the solution of the discretized form is approximated by a kind of Monte Carlo (MC) random search algorithm. Finally, some examples are given to show the efficiency of the proposed approach.

15.

Background  

Searching for proteins that contain similar substructures is an important task in structural biology. The exact solution of most formulations of this problem, including a recently published method based on tableaux, is too slow for practical use in scanning a large database.

16.
The diamondback moth, Plutella xylostella (L.) (Lepidoptera: Plutellidae), is one of the most economically significant pests of canola, Brassica napus L., in Ardabil region, Iran. Use of host plant resistance integrated with biocontrol agents such as Diadegma majale (Gravenhorst) (Hymenoptera: Ichneumonidae) is an essential component of integrated management of P. xylostella. In this study, we investigated the parasitism by D. majale on six selected cultivars of canola under field conditions and preference and performance of the parasitoid on P. xylostella larvae under laboratory conditions. In field experiments, the highest larval density of P. xylostella was observed on Zarfam during 2008 and 2009. Larval densities were not significantly different among Opera, Hyola 401, Okapi, and Option 500 and Elite in 2008, but the lowest larval density was observed on Opera in 2009. No significant differences were observed among the rate of parasitized larvae on tested cultivars in 2008, while in 2009 the parasitism rate was significantly higher on Opera than on Zarfam. In free-choice situations, the percentage of parasitized larvae was significantly highest on Opera (88.7%) and lowest on Zarfam (62.95%). Developmental time from egg to adult, body mass, length of forewings and hindwings, length of hind femur and hind tibia of D. majale females reared on larvae of P. xylostella fed on Opera did not differ from other cultivars. Our results suggest that cultivation of Opera integrated with D. majale could provide effective and sustainable management of P. xylostella in the region.

17.
We describe an efficient algorithm for determining exactly the minimum number of sires consistent with the multi-locus genotypes of a mother and her progeny. We consider cases where a simple exhaustive search through all possible sets of sires is impossible in practice because it would take too long to complete. Our algorithm for solving this combinatorial optimization problem avoids visiting large parts of search space that would not result in a solution with fewer sires. This improvement is of particular importance when the number of allelic types in the progeny array is large and when the minimum number of sires is expected to be large. Precisely in such cases, it is important to know the minimum number of sires: this number gives an exact bound on the most likely number of sires estimated by a random search algorithm in a parameter region where it may be difficult to determine whether it has converged. We apply our algorithm to data from the marine snail, Littorina saxatilis.

18.
For precise boundary conditions of biological relevance, it is proved that the steadily propagating plane-wave solution to the Fisher equation requires the unique (eigenvalue) velocity of advance $2(Df)^{1/2}$, where D is the diffusivity of the mutant species and f is the frequency of selection in favor of the mutant. This rigorous result shows that a so-called "wrong equation", i.e. one which differs from Fisher's by a term that is seemingly inconsequential for certain initial conditions, cannot be employed readily to obtain approximate solutions to Fisher's, for the two equations will often have qualitatively different manifolds of exact solutions. It is noted that the Fisher equation itself may be inappropriate in certain biological contexts owing to the manifest instability of the lower-concentration uniform equilibrium state (UES). Depicting the persistence of a mutant-deficient spatial pocket, an exact steady-state solution to the Fisher equation is presented. As an alternative and perhaps more faithful model equation for the propagation of certain species properties through a homogeneous population, we consider a reaction-diffusion equation that features a cubic-polynomial rate expression in the species concentration, with two stable UES and one intermediate unstable UES. This equation admits a remarkably simple exact analytical solution to the steadily propagating plane-wave eigenvalue problem. In the latter solution, the sign of the eigenvelocity is such that the wave propagates to yield the "preferred" stable UES (namely, the one further removed from the unstable intermediate UES) at all spatial points as t → ∞. The cubic-polynomial equation also admits an exact steady-state solution for a mutant-deficient or mutant-isolated spatial pocket. Finally, the perpetuating growth of a mutant population from an arbitrary localized initial distribution, a mathematical problem analogous to that for ignition in laminar flame theory, is studied by applying differential inequality analysis, and rigorous sufficient conditions for extinction are derived here.

19.
Many prokaryotic and eukaryotic double-stranded DNA viruses use a scaffolding protein to assemble their capsid. Assembly of the double-stranded DNA bacteriophage P22 procapsids requires the interaction of 415 molecules of coat protein and 60-300 molecules of scaffolding protein. Although the 303-amino-acid scaffolding protein is essential for proper assembly of procapsids, little is known about its structure beyond an NMR structure of the extreme C-terminus, which is known to interact with coat protein. Deletion mutagenesis indicates that other regions of scaffolding protein are involved in interactions with coat protein and other capsid proteins. Single-cysteine and double-cysteine variants of scaffolding protein were generated for use in fluorescence resonance energy transfer and cross-linking experiments designed to probe the conformation of scaffolding protein in solution and within procapsids. We showed that the N-terminus and the C-terminus are proximate in solution, and that the middle of the protein is near the N-terminus but not accessible to the C-terminus. In procapsids, the N-terminus was no longer accessible to the C-terminus, indicating that there is a conformational change in scaffolding protein upon assembly. In addition, our data are consistent with a model where scaffolding protein dimers are positioned parallel with one another with the associated C-termini.

20.
MOTIVATION: Side-chain positioning is a central component of homology modeling and protein design. In a common formulation of the problem, the backbone is fixed, side-chain conformations come from a rotamer library, and a pairwise energy function is optimized. It is NP-complete to find even a reasonable approximate solution to this problem. We seek to put this hardness result into practical context. RESULTS: We present an integer linear programming (ILP) formulation of side-chain positioning that allows us to tackle large problem sizes. We relax the integrality constraint to give a polynomial-time linear programming (LP) heuristic. We apply LP to position side chains on native and homologous backbones and to choose side chains for protein design. Surprisingly, when positioning side chains on native and homologous backbones, optimal solutions using a simple, biologically relevant energy function can usually be found using LP. On the other hand, the design problem often cannot be solved using LP directly; however, optimal solutions for large instances can still be found using the computationally more expensive ILP procedure. While different energy functions also affect the difficulty of the problem, the LP/ILP approach is able to find optimal solutions. Our analysis is the first large-scale demonstration that LP-based approaches are highly effective in finding optimal (and successive near-optimal) solutions for the side-chain positioning problem.
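The objective behind the formulation described above (one rotamer per residue, minimizing self energies plus pairwise energies) can be stated in a few lines. The sketch below is a plain exhaustive search, not the paper's ILP or LP relaxation; the name `best_rotamer_assignment` and the data layout (`self_energy[i][r]` and `pair_energy[(i, j)][r][s]`) are assumptions chosen for illustration.

```python
from itertools import product


def best_rotamer_assignment(self_energy, pair_energy):
    """Exhaustively minimize total energy over one rotamer choice per residue.

    self_energy[i][r]         : energy of rotamer r at residue i
    pair_energy[(i, j)][r][s] : interaction energy of rotamer r at residue i
                                with rotamer s at residue j
    Returns (minimum energy, tuple of chosen rotamer indices).
    """
    best_cost, best_choice = None, None
    rotamer_ranges = [range(len(energies)) for energies in self_energy]
    for choice in product(*rotamer_ranges):
        cost = sum(self_energy[i][r] for i, r in enumerate(choice))
        cost += sum(
            table[choice[i]][choice[j]] for (i, j), table in pair_energy.items()
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice
```

The number of assignments is the product of the rotamer counts per residue, so this only works for toy inputs. An ILP formulation of this kind typically encodes the same objective with 0/1 variables for rotamer choices and rotamer pairs, and the LP heuristic in the abstract relaxes the integrality of such variables.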
