Found 20 similar articles (search time: 15 ms)
1.
Aleksandra Czarna, Rafael Sanjuán, Fernando González-Candelas, Borys Wróbel 《BMC evolutionary biology》2006,6(1):105-13
Background
The least squares (LS) method for constructing confidence sets of trees is closely related to LS tree building methods, in which the goodness of fit of the distances measured on the tree (patristic distances) to the observed distances between taxa is the criterion used for selecting the best topology. The generalized LS (GLS) method for topology testing is often frustrated by the computational difficulties in calculating the covariance matrix and its inverse, which in practice requires approximations. The weighted LS (WLS) allows for a more efficient albeit approximate calculation of the test statistic by ignoring the covariances between the distances.
2.
HARTLEY HO 《Biometrika》1948,35(PTS 1-2):32-45
3.
4.
Missing value estimation for DNA microarray gene expression data: local least squares imputation (cited 9 times: 0 self-citations, 9 by others)
MOTIVATION: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed since many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local similarity structures in the data as well as the least squares optimization process. RESULTS: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen by k-nearest neighbors or k coherent genes that have large absolute values of Pearson correlation coefficients. A non-parametric missing value estimation method for LLSimpute is designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data. AVAILABILITY: The software is available at http://www.cs.umn.edu/~hskim/tools.html CONTACT: hpark@cs.umn.edu
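The idea in the abstract above (write the target gene as a linear combination of its k most correlated genes, then reuse the fitted coefficients at the missing positions) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' LLSimpute software: the function names `pearson`, `solve`, and `lls_impute` are hypothetical, the candidate genes are assumed to be complete, k is assumed to be no larger than the number of observed columns with linearly independent neighbors, and the automatic k-value estimator is not included.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def solve(M, y):
    """Solve the square system M x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def lls_impute(target, complete_genes, k):
    """Fill the None entries of `target` using the k most correlated complete genes."""
    obs = [j for j, v in enumerate(target) if v is not None]
    mis = [j for j, v in enumerate(target) if v is None]
    w = [target[j] for j in obs]
    # Rank candidate genes by |Pearson r| over the columns observed in the target.
    ranked = sorted(complete_genes,
                    key=lambda g: -abs(pearson([g[j] for j in obs], w)))
    nbrs = ranked[:k]
    # Normal equations for min_x || sum_i x_i * nbr_i[obs] - w ||^2.
    G = [[sum(a[j] * b[j] for j in obs) for b in nbrs] for a in nbrs]
    rhs = [sum(a[j] * target[j] for j in obs) for a in nbrs]
    x = solve(G, rhs)
    filled = list(target)
    for j in mis:
        # Apply the fitted coefficients to the neighbors' values at the missing column.
        filled[j] = sum(xi * g[j] for xi, g in zip(x, nbrs))
    return filled

# Toy data (made up for illustration): the target equals 2*gene0 + 1*gene1.
genes = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 0.0, 5.0], [0.0, 1.0, 0.0, 1.0]]
filled = lls_impute([4.0, 5.0, 6.0, None], genes, k=2)
```

On this toy input the hidden value is 13, and the sketch recovers it up to rounding because the target is an exact combination of the two selected genes. Solving the normal equations keeps the example short; a production implementation would more likely use a pseudoinverse or QR factorization for numerical stability.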
5.
In our article, only a set of random positions of missing values was used for each dataset. However, imputation methods may
6.
Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. A small number of badly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling as missing). From these tests, we conclude that our LSimpute methods produce estimates that are consistently more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and the estimate errors using the EMimpute methods are compared with those produced by our novel methods.
The results indicate that on average, the estimates from our best performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
7.
Tellinghuisen J 《Analytical biochemistry》2005,343(1):106-115
The method of generalized least squares (GLS) is used to assess the variance function for isothermal titration calorimetry (ITC) data collected for the 1:1 complexation of Ba²⁺ with 18-crown-6 ether. In the GLS method, the least squares (LS) residuals from the data fit are themselves fitted to a variance function, with iterative adjustment of the weighting function in the data analysis to produce consistency. The data are treated in a pooled fashion, providing 321 fitted residuals from 35 data sets in the final analysis. Heteroscedasticity (nonconstant variance) is clearly indicated. Data error terms proportional to q_i and q_i/v are well defined statistically, where q_i is the heat from the ith injection of titrant and v is the injected volume. The statistical significance of the variance function parameters is confirmed through Monte Carlo calculations that mimic the actual data set. For the data in question, which fall mostly in the range of q_i = 100-2000 μcal, the contributions to the data variance from the terms in q_i^2 typically exceed the background constant term for q_i > 300 μcal and v < 10 μl. Conversely, this means that in reactions with q_i much less than this, heteroscedasticity is not a significant problem. Accordingly, in such cases the standard unweighted fitting procedures provide reliable results for the key parameters, K and ΔH°, and their statistical errors. These results also support an important earlier finding: in most ITC work on 1:1 binding processes, the optimal number of injections is 7-10, which is a factor of 3 smaller than the current norm. For high-q reactions, where weighting is needed for optimal LS analysis, tips are given for using the weighting option in the commercial software commonly employed to process ITC data.
8.
Prediction and the efficiency of least squares (cited 1 time: 0 self-citations, 1 by others)
9.
Robust estimation in pulse fluorometry. A study of the method of moments and least squares.
I Isenberg 《Biophysical journal》1983,43(2):141-148
Most laboratories use least-squares iterative reconvolution (LSIR) as a routine method for estimating decay parameters in pulse fluorometric data. It is shown here, however, that LSIR is very sensitive to small amounts of error in the data whenever two decays become too close to one another, or whenever analyses of three decays are attempted. In such cases, inferior methods of estimating integrals, small zero point shifts, or small errors in the measured exciting light will result in failures of least squares, where the method of moments, with moment index displacement and lambda invariance testing, will succeed. The method of moments is therefore robust with respect to such errors while least squares is not.
10.
On the minimum efficiency of least squares (cited 7 times: 0 self-citations, 7 by others)
11.
The quality of protein function predictions relies on appropriate training of protein classification methods. Performance of these methods can be affected when only a limited number of protein samples are available, which is often the case in divergent protein families. Whereas profile hidden Markov models and PSI-BLAST presented significant performance decrease in such cases, alignment-free partial least-squares classifiers performed consistently better even when used to identify short fragmented sequences.
12.
For the usual full rank univariate least squares regression model $y = XB + e$, $E(e) = 0$, $E(ee') = A$, the equality of the estimates occurs when $B - B^* = (X'A^{-1}X)^{-1}X'A^{-1}y - (X'X)^{-1}X'y = 0$. A necessary and sufficient condition for this equality is that $A$ has some $N - k + 1$ roots equal, where $N$ is the rank of $A$ and $k$ is the rank of $X$.
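As a quick check on the two estimators compared in the abstract above, the simplest case in which they coincide is an error covariance proportional to the identity. The scalar $\sigma^2$ is an assumed symbol that the abstract does not use, and this special case is only an illustration, not the general condition the abstract states.

```latex
$$
A = \sigma^2 I
\;\Longrightarrow\;
(X'A^{-1}X)^{-1}X'A^{-1}y
= (\sigma^{-2}X'X)^{-1}\,\sigma^{-2}X'y
= (X'X)^{-1}X'y .
$$
```

The factors $\sigma^{-2}$ cancel, so the weighted estimate reduces to the ordinary least squares estimate and the difference in the abstract is exactly zero.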
13.
Methods of least squares and SIRT in reconstruction. (cited 1 time: 0 self-citations, 1 by others)
In this paper we show that a particular version of the Simultaneous Iterative Reconstruction Technique (SIRT) proposed by Gilbert in 1972 strongly resembles the Richardson least-squares algorithm. By adopting the adjustable parameters of the general Richardson algorithm, we have been able to produce generalized SIRT algorithms with improved convergence. A particular generalization of the SIRT algorithm, GSIRT, has an adjustable parameter σ and the starting picture ρ0 as input. A value for σ and a weighted back-projection for ρ0 produce a stable algorithm. We call the SIRT-like algorithms for the solution of the weighted least-squares problems LSIRT and present two such algorithms, LSIRT1 and LSIRT2, which have definite computational advantages over SIRT and GSIRT. We have tested these methods on mathematically simulated phantoms and find that the new SIRT methods converge faster than Gilbert's SIRT but are more sensitive to noise present in the data. However, the faster convergence rates allow termination before the noise contribution degrades the reconstructed image excessively.
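For readers unfamiliar with the Richardson least-squares iteration mentioned in the abstract above, a minimal sketch follows. It implements only the basic unweighted update x ← x + σ·Aᵀ(b − A·x) with a fixed step σ and a starting vector, which play roles analogous to the abstract's σ and ρ0. It is not GSIRT, LSIRT1, or LSIRT2, and the function name `richardson_ls` and the toy system are assumptions made for illustration.

```python
def richardson_ls(A, b, sigma, x0, n_iter):
    """Basic Richardson iteration for min ||A x - b||^2.

    Converges when 0 < sigma < 2 / (largest eigenvalue of A^T A).
    """
    x = list(x0)
    rows, cols = len(A), len(x)
    for _ in range(n_iter):
        # Residual r = b - A x ("measured minus re-projected" data).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]
        # Back-project the residual: g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(rows)) for j in range(cols)]
        # Take a step of size sigma along the back-projected residual.
        x = [x[j] + sigma * g[j] for j in range(cols)]
    return x

# Toy 3x2 system (made up for illustration) whose exact solution is [1, 2].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
x = richardson_ls(A, b, sigma=0.5, x0=[0.0, 0.0], n_iter=60)
```

For this toy matrix the eigenvalues of AᵀA are 1 and 3, so any σ below 2/3 converges; with σ = 0.5 the error halves at every step. Stopping the loop early is the simplest way to mimic the early termination the abstract describes for noisy data.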
14.
Microarray gene expression data often contain multiple missing values for various reasons. However, most gene expression data analysis algorithms require complete expression data. Therefore, accurate estimation of the missing values is critical to further data analysis. In this paper, an Iterated Local Least Squares Imputation (ILLSimpute) method is proposed for estimating missing values. The ILLSimpute method has two unique features. First, it does not fix a common number of coherent genes for target genes for estimation purposes, but defines coherent genes as those within a distance threshold of the target genes. Second, estimated values in one iteration are used for missing value estimation in the next iteration, and the method terminates after a certain number of iterations or when the imputed values converge. Experimental results on six real microarray datasets showed that the ILLSimpute method performed at least as well as, and most of the time much better than, five most recent imputation methods.
15.
On inconsistency of the neighbor-joining, least squares, and minimum evolution estimation when substitution processes are incorrectly modeled (cited 4 times: 0 self-citations, 4 by others)
Using analytical methods, we show that under a variety of model misspecifications, Neighbor-Joining, minimum evolution, and least squares estimation procedures are statistically inconsistent. Failure to correctly account for differing rates-across-sites processes, failure to correctly model rate matrix parameters, and failure to adjust for parallel rates-across-sites changes (a rates-across-subtrees process) are all shown to lead to a "long branch attraction" form of inconsistency. In addition, failure to account for rates-across-sites processes is also shown to result in underestimation of evolutionary distances for a wide variety of substitution models, generalizing an earlier analytical result for the Jukes-Cantor model reported in Golding and a similar bias result for the GTR or REV model in Kelly and Rice (1996). Although standard rates-across-sites models can be employed in many of these cases to restore consistency, current models cannot account for other kinds of misspecification. We examine an idealized but biologically relevant case, where parallel changes in rates at sites across subtrees are shown to give rise to inconsistency. This changing rates-across-subtrees type of model misspecification cannot be adjusted for with conventional methods or without carefully considering the rate variation in the larger tree. The results are presented for four-taxon trees, but the expectation is that they have implications for larger trees as well. To illustrate this, a simulated 42-taxon example is given in which the microsporidia, an enigmatic group of eukaryotes, are incorrectly placed at the archaebacteria-eukaryotes split because of incorrectly specified pairwise distances. The analytical nature of the results lends insight into the reasons that long branch attraction tends to be a common form of inconsistency and the reasons that other forms of inconsistency like "long branches repel" can arise in some settings.
In many of the cases of inconsistency presented, a particular incorrect topology is estimated with probability converging to one, the implication being that measures of uncertainty like bootstrap support will be unable to detect that there is a problem with the estimation. The focus is on distance methods, but previous simulation results suggest that the zones of inconsistency for distance methods contain the zones of inconsistency for maximum likelihood methods as well.
16.
Multitrait least squares for quantitative trait loci detection (cited 20 times: 0 self-citations, 20 by others)
A multiple-trait QTL mapping method using least squares is described. It is presented as an extension of a single-trait method for use with three-generation, outbred pedigrees. The multiple-trait framework allows formal testing of whether the same QTL affects more than one trait (i.e., a pleiotropic QTL) or whether more than one linked QTL are segregating. Several approaches to the testing procedure are presented and their suitability discussed. The performance of the method is investigated by simulation. As previously found, multitrait analyses increase the power to detect a pleiotropic QTL and the precision of its location estimate. With enough information, discrimination between alternative genetic models is possible.
17.
18.
19.
20.
Internal forces in the human body can be estimated from measured movements and external forces using inverse dynamic analysis. Here we present a general method of analysis which makes optimal use of all available data, and allows the use of inverse dynamic analysis in cases where external force data is incomplete. The method was evaluated for the analysis of running on a partially instrumented treadmill. It was found that results correlate well with those of a conventional analysis where all external forces are known.