Found 20 similar articles (search time: 15 ms)
1.
On the minimum efficiency of least squares (cited by: 7 total, 0 self-citations, 7 by others)
2.
Prediction and the efficiency of least squares (cited by: 1 total, 0 self-citations, 1 by others)
3.
Aleksandra Czarna Rafael Sanjuán Fernando González-Candelas Borys Wróbel 《BMC evolutionary biology》2006,6(1):105-13
Background
The least squares (LS) method for constructing confidence sets of trees is closely related to LS tree building methods, in which the goodness of fit of the distances measured on the tree (patristic distances) to the observed distances between taxa is the criterion used for selecting the best topology. The generalized LS (GLS) method for topology testing is often frustrated by the computational difficulties in calculating the covariance matrix and its inverse, which in practice requires approximations. The weighted LS (WLS) allows for a more efficient albeit approximate calculation of the test statistic by ignoring the covariances between the distances.
4.
5.
Methods of least squares and SIRT in reconstruction. (cited by: 1 total, 0 self-citations, 1 by others)
In this paper we show that a particular version of the Simultaneous Iterative Reconstruction Technique (SIRT) proposed by Gilbert in 1972 strongly resembles the Richardson least-squares algorithm. By adopting the adjustable parameters of the general Richardson algorithm, we have been able to produce generalized SIRT algorithms with improved convergence. A particular generalization of the SIRT algorithm, GSIRT, has an adjustable parameter σ and the starting picture ρ0 as input. A value for σ and a weighted back-projection for ρ0 produce a stable algorithm. We call the SIRT-like algorithms for the solution of the weighted least-squares problems LSIRT and present two such algorithms, LSIRT1 and LSIRT2, which have definite computational advantages over SIRT and GSIRT. We have tested these methods on mathematically simulated phantoms and find that the new SIRT methods converge faster than Gilbert's SIRT but are more sensitive to noise present in the data. However, the faster convergence rates allow termination before the noise contribution degrades the reconstructed image excessively.
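The Richardson least-squares iteration mentioned in this abstract can be illustrated with a minimal sketch. This is the generic textbook update x ← x + σ·Aᵀ(b − A·x), not the GSIRT or LSIRT variants of the paper; the function name, the toy system and the reading of σ as a fixed step size are assumptions made for illustration only.

```python
def richardson_least_squares(A, b, sigma, x0=None, iterations=500):
    """Generic Richardson iteration for min ||A x - b||^2.

    A is a dense matrix given as a list of rows, b the measurement vector,
    sigma a fixed step size (hypothetical stand-in for the adjustable
    parameter), x0 an optional starting estimate.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = list(x0) if x0 is not None else [0.0] * n_cols
    for _ in range(iterations):
        # Residual in measurement space: r = b - A x
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n_cols))
             for i in range(n_rows)]
        # Back-project the residual (A^T r) and take a step of size sigma
        for j in range(n_cols):
            x[j] += sigma * sum(A[i][j] * r[i] for i in range(n_rows))
    return x


# Toy overdetermined system: three "projections" of a two-pixel picture.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.5]
x = richardson_least_squares(A, b, sigma=0.3)
```

The iteration converges only when σ is below 2 divided by the largest eigenvalue of AᵀA (here 3, so σ must stay under about 0.67); a good starting estimate shortens the run but does not change the limit for a full-rank system.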
6.
HARTLEY HO 《Biometrika》1948,35(PTS 1-2):32-45
7.
This paper deals with the synthesis of information from different studies when there is lack of independence in some of the contrasts to be combined. This problem can arise in several different situations in both case-control studies and clinical trials. For efficient estimation we appeal to the method of generalized least squares to estimate the summary effect and its standard error. This method requires estimates of the covariances between those contrasts that are not independent. Although it is not possible to estimate the covariance between effects that have been adjusted for confounding factors, we present a method for finding upper and lower bounds for this covariance. In the simplest discussion homogeneity of the relative risks is assumed, but the method is then extended to allow for heterogeneity in an overall estimate. We then illustrate the method with several examples from an analysis in which case-control studies of cervical cancer and oral contraceptive use are synthesized.
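A generalized least squares summary of correlated estimates, as described in this abstract, might be sketched as follows. The pooled estimate is (1ᵀV⁻¹y)/(1ᵀV⁻¹1) with variance 1/(1ᵀV⁻¹1), which is the standard GLS result for a common mean. The numbers, the covariance bounds and the function names are hypothetical and are not taken from the paper.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def gls_summary(estimates, cov):
    """Pooled effect and standard error for estimates with covariance matrix cov."""
    z = solve(cov, [1.0] * len(estimates))   # z = V^{-1} 1 (V is symmetric)
    total = sum(z)                            # 1^T V^{-1} 1
    pooled = sum(zi * yi for zi, yi in zip(z, estimates)) / total
    return pooled, (1.0 / total) ** 0.5


# Two hypothetical log relative risks with variances 0.04 and 0.09.
y = [0.5, 0.8]
# Evaluate the summary at an assumed lower (0.0) and upper (0.03) covariance bound.
pooled_lo, se_lo = gls_summary(y, [[0.04, 0.0], [0.0, 0.09]])
pooled_hi, se_hi = gls_summary(y, [[0.04, 0.03], [0.03, 0.09]])
```

Running the summary at both covariance bounds brackets the pooled effect and its standard error when the covariance itself cannot be estimated.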
8.
9.
We modified the phylogenetic program MrBayes 3.1.2 to incorporate the compound Dirichlet priors for branch lengths proposed recently by Rannala, Zhu, and Yang (2012. Tail paradox, partial identifiability and influential priors in Bayesian branch length inference. Mol. Biol. Evol. 29:325-335.) as a solution to the problem of branch-length overestimation in Bayesian phylogenetic inference. The compound Dirichlet prior specifies a fairly diffuse prior on the tree length (the sum of branch lengths) and uses a Dirichlet distribution to partition the tree length into branch lengths. Six problematic data sets originally analyzed by Brown, Hedtke, Lemmon, and Lemmon (2010. When trees grow too long: investigating the causes of highly inaccurate Bayesian branch-length estimates. Syst. Biol. 59:145-161) are reanalyzed using the modified version of MrBayes to investigate properties of Bayesian branch-length estimation using the new priors. While the default exponential priors for branch lengths produced extremely long trees, the compound Dirichlet priors produced posterior estimates that are much closer to the maximum likelihood estimates. Furthermore, the posterior tree lengths were quite robust to changes in the parameter values in the compound Dirichlet priors, for example, when the prior mean of tree length changed over several orders of magnitude. Our results suggest that the compound Dirichlet priors may be useful for correcting branch-length overestimation in phylogenetic analyses of empirical data sets.
10.
Background
The conventional superposition methods use an ordinary least squares (LS) fit for structural comparison of two different conformations of the same protein. The main problem of the LS fit is that it is sensitive to outliers, i.e. large displacements of the original structures superimposed.
Results
To overcome this problem, we present a new algorithm to overlap two protein conformations by their atomic coordinates using a robust statistics technique: least median of squares (LMS). In order to effectively approximate the LMS optimization, the forward search technique is utilized. Our algorithm can automatically detect and superimpose the rigid core regions of two conformations with small or large displacements. In contrast, most existing superposition techniques strongly depend on the initial LS estimate for the entire atom sets of proteins. They may fail on structural superposition of two conformations with large displacements. The presented LMS fit can be considered as an alternative and complementary tool for structural superposition.
Conclusion
The proposed algorithm is robust and does not require any prior knowledge of the flexible regions. Furthermore, we show that the LMS fit can be extended to multiple level superposition between two conformations with several rigid domains. Our fit tool has produced successful superpositions when applied to proteins for which two conformations are known. The binary executable program for Windows platform, tested examples, and database are available from https://engineering.purdue.edu/PRECISE/LMSfit.
11.
12.
Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. A small number of badly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling as missing). From these tests, we conclude that our LSimpute methods produce estimates that consistently are more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and the estimate errors using the EMimpute methods are compared with those our novel methods produce. The results indicate that on average, the estimates from our best performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
13.
MOTIVATION: Gene association/interaction networks provide vast amounts of information about essential processes inside the cell. A complete picture of gene-gene associations/interactions would open new horizons for biologists, ranging from pure appreciation to successful manipulation of biological pathways for therapeutic purposes. Therefore, identification of important biological complexes whose members (genes and their protein products) interact with each other is of prime importance. Numerous experimental methods exist but, for the most part, they are costly and labor intensive. Computational techniques, such as the one proposed in this work, provide a quick 'budget' solution that can be used as a screening tool before more expensive techniques are attempted. Here, we introduce a novel computational method based on the partial least squares (PLS) regression technique for reconstruction of genetic networks from microarray data. RESULTS: The proposed PLS method is shown to be an effective screening procedure for the detection of gene-gene interactions from microarray data. Both simulated and real microarray experiments show that the PLS-based approach is superior to its competitors both in terms of performance and applicability. AVAILABILITY: R code is available from the supplementary web-site whose URL is given below.
14.
Pulsed-laser photoacoustics is a technique which measures photoinduced enthalpic and volumetric changes on the nano- and microsecond timescales. Analysis of photoacoustic data generally requires deconvolution for a sum of exponentials, a procedure which has been developed extensively in the field of time-resolved fluorescence decay. Initial efforts to adapt an iterative nonlinear least squares computer program, utilizing the Marquardt algorithm, from the fluorescence field to photoacoustics indicated that significant modifications were needed. The major problem arises from the wide range of transient decay times which must be addressed by the photoacoustic technique. We describe an alternative approach to numerical convolution with exponential decays, developed to overcome the problems. Instead of using an approximation method (Simpson's rule) for evaluating the convolution integral, we construct a continuous instrumental response function by quadratic fitting of the discrete data and evaluate the convolution integral directly, without approximations. The success and limitations of this quadratic-fit convolution program are then demonstrated using simulated data. Finally, the program is applied to the analysis of experimental data to compare the resolution capabilities of two commercially available transducers. The advantages of a broadband, heavily damped transducer are shown for a standard organic photochemical system, the quenching of the triplet state of benzophenone by 2,5-dimethyl-2,4-hexadiene.
15.
Recent improvements in measuring and data acquisition techniques have increased greatly the precision of chemical relaxation data. This has necessitated more accurate methods for data analysis in cases of complex relaxation spectra, as is often observed in biochemical systems. We have developed and applied a method capable of decomposing a wave form containing up to three overlapping exponentials. The method is based upon a nonlinear least squares algorithm. Analysis of the method shows that when it is applied to a two exponential function where the signal-to-noise ratio, ΔS/N, is fifty, using 100 data points, the resulting four coefficients (two amplitudes and two relaxation times) are each accurate to within 5–10 % over a wide range of conditions, i.e., relative amplitudes from 10–90 % and ratios of relaxation times of 2.5 or greater. The influence of the number of data points and of random noise is as expected from statistical theory. Applications of the curve fitting methods to experimental temperature-jump data from oxygen binding to hemoglobin and hemocyanin yield internally consistent results. The values of the root-mean-square of the residuals of the fit approach those expected on the basis of the experimental signal-to-noise ratios.
16.
MOTIVATION: Genomic DNA copy number alterations are characteristic of many human diseases including cancer. Various techniques and platforms have been proposed to allow researchers to partition the whole genome into segments where copy numbers change between contiguous segments, and subsequently to quantify DNA copy number alterations. In this paper, we incorporate the spatial dependence of DNA copy number data into a regression model and formalize the detection of DNA copy number alterations as a penalized least squares regression problem. In addition, we use a stationary bootstrap approach to estimate the statistical significance and false discovery rate. RESULTS: The proposed method is studied by simulations and illustrated by an application to an extensively analyzed dataset in the literature. The results show that the proposed method can correctly detect the numbers and locations of the true breakpoints while appropriately controlling the false positives. AVAILABILITY: http://bioinformatics.med.yale.edu/DNACopyNumber CONTACT: hongyu.zhao@yale.edu SUPPLEMENTARY INFORMATION: http://bioinformatics.med.yale.edu/DNACopyNumber.
17.
In our article, only a set of random positions of missing values was used for each dataset. However, imputation methods may
18.
Missing value estimation for DNA microarray gene expression data: local least squares imputation (cited by: 9 total, 0 self-citations, 9 by others)
MOTIVATION: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed since many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local similarity structures in the data as well as the least squares optimization process. RESULTS: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen by k-nearest neighbors or k coherent genes that have large absolute values of Pearson correlation coefficients. A non-parametric missing value estimation method for LLSimpute is designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data. AVAILABILITY: The software is available at http://www.cs.umn.edu/~hskim/tools.html CONTACT: hpark@cs.umn.edu
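The idea of representing a target gene as a linear combination of its most similar genes can be sketched roughly as below. This is only an illustration under simplifying assumptions: one missing value, neighbours ranked by Euclidean distance, a fixed k no larger than the number of observed columns, and linearly independent neighbours. It is not the LLSimpute software, and the function names and the toy data are hypothetical.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def local_ls_impute(target, complete_genes, missing_col, k):
    """Estimate target[missing_col] from the k nearest complete genes."""
    known = [j for j in range(len(target)) if j != missing_col]

    def distance(row):
        # Squared Euclidean distance over the columns observed in the target
        return sum((row[j] - target[j]) ** 2 for j in known)

    neighbours = sorted(complete_genes, key=distance)[:k]
    # Normal equations for min || sum_i x_i * neighbour_i - target || on known columns
    gram = [[sum(a[j] * b[j] for j in known) for b in neighbours]
            for a in neighbours]
    rhs = [sum(a[j] * target[j] for j in known) for a in neighbours]
    coeffs = solve(gram, rhs)
    # Apply the same coefficients to the neighbours' values in the missing column
    return sum(c * row[missing_col] for c, row in zip(coeffs, neighbours))


genes = [
    [2.0, 0.0, 2.0, 4.0],
    [0.0, 1.0, 1.0, 3.0],
    [9.0, 9.0, 9.0, 9.0],
]
target = [1.0, 2.0, 3.0, None]   # last value is missing
estimate = local_ls_impute(target, genes, missing_col=3, k=2)
```

In the toy data the target equals 0.5 times the first gene plus 2 times the second on the observed columns, so the fit is exact and the imputed value follows the same combination.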
19.
Tellinghuisen J 《Analytical biochemistry》2005,343(1):106-115
The method of generalized least squares (GLS) is used to assess the variance function for isothermal titration calorimetry (ITC) data collected for the 1:1 complexation of Ba²⁺ with 18-crown-6 ether. In the GLS method, the least squares (LS) residuals from the data fit are themselves fitted to a variance function, with iterative adjustment of the weighting function in the data analysis to produce consistency. The data are treated in a pooled fashion, providing 321 fitted residuals from 35 data sets in the final analysis. Heteroscedasticity (nonconstant variance) is clearly indicated. Data error terms proportional to q_i and q_i/v are well defined statistically, where q_i is the heat from the ith injection of titrant and v is the injected volume. The statistical significance of the variance function parameters is confirmed through Monte Carlo calculations that mimic the actual data set. For the data in question, which fall mostly in the range of q_i = 100-2000 µcal, the contributions to the data variance from the terms in q_i² typically exceed the background constant term for q_i > 300 µcal and v < 10 µl. Conversely, this means that in reactions with q_i much less than this, heteroscedasticity is not a significant problem. Accordingly, in such cases the standard unweighted fitting procedures provide reliable results for the key parameters, K and ΔH°, and their statistical errors. These results also support an important earlier finding: in most ITC work on 1:1 binding processes, the optimal number of injections is 7-10, which is a factor of 3 smaller than the current norm. For high-q reactions, where weighting is needed for optimal LS analysis, tips are given for using the weighting option in the commercial software commonly employed to process ITC data.
20.
A program for least squares analysis of reassociation and hybridization data. (cited by: 27 total, 11 self-citations, 27 by others)
A computer program is described for the rapid calculation of least squares solutions for data fitted to different functions normally used in reassociation and hybridization kinetic measurements. The equations for the fraction not reacted as a function of C₀t follow: first order, exp(-kC₀t); second order, (1 + kC₀t)^(-1); variable order, (1 + kC₀t)^(-n); approximate fraction of DNA sequence remaining single stranded, (1 + kC₀t)^(-0.44); and a function describing the pairing of tracer when the rate constant for the tracer (k) is distinct from the driver rate constant (kd): (formula: see text). Several components may be used for most of these functional forms. The standard deviations of the individual parameters at the solutions are calculated.
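The single-component functional forms listed in this abstract are simple to evaluate, and the second-order form can be fitted by a linearised least squares step, since 1/f − 1 = k·C₀t is a straight line through the origin. The sketch below is an illustration only: the function names are hypothetical, the tracer/driver form is omitted because its formula is not given here, and the linearised fit is a shortcut that reweights the errors rather than the procedure of the program described.

```python
import math


def first_order(k, c0t):
    """Fraction not reacted, first order: exp(-k * C0t)."""
    return math.exp(-k * c0t)


def variable_order(k, c0t, n):
    """Fraction not reacted, variable order: (1 + k * C0t) ** -n."""
    return (1.0 + k * c0t) ** (-n)


def second_order(k, c0t):
    """Second order is the variable-order form with n = 1."""
    return variable_order(k, c0t, 1.0)


def single_stranded(k, c0t):
    """Approximate fraction of sequence remaining single stranded (n = 0.44)."""
    return variable_order(k, c0t, 0.44)


def fit_second_order_rate(c0t_values, fractions):
    """Linearised least squares estimate of k from 1/f - 1 = k * C0t."""
    y = [1.0 / f - 1.0 for f in fractions]
    numerator = sum(x * yi for x, yi in zip(c0t_values, y))
    denominator = sum(x * x for x in c0t_values)
    return numerator / denominator


# Synthetic, noise-free data generated with an assumed rate constant of 0.25.
c0t_values = [0.1, 1.0, 10.0, 100.0]
fractions = [second_order(0.25, c) for c in c0t_values]
k_hat = fit_second_order_rate(c0t_values, fractions)
```

With noisy measurements the transformation 1/f − 1 inflates the errors at large C₀t, so a direct nonlinear fit with parameter standard deviations, as the abstract describes, would generally be preferred.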