Similar Documents
20 similar documents found.
1.
Sibship reconstruction from genetic data with typing errors (total citations: 13; self-citations: 0; cited by others: 13)
Wang J. Genetics, 2004, 166(4): 1963-1979
Likelihood methods have been developed to partition individuals in a sample into full-sib and half-sib families using genetic marker data without parental information. They invariably make the critical assumption that marker data are free of genotyping errors and mutations and are thus completely reliable in inferring sibships. Unfortunately, however, this assumption is rarely tenable for virtually all kinds of genetic markers in practical use and, if violated, can severely bias sibship estimates as shown by simulations in this article. I propose a new likelihood method with simple and robust models of typing error incorporated into it. Simulations show that the new method can be used to infer full- and half-sibships accurately from marker data with a high error rate and to identify typing errors at each locus in each reconstructed sib family. The new method also improves previous ones by adopting a fresh iterative procedure for updating allele frequencies with reconstructed sibships taken into account, by allowing for the use of parental information, and by using efficient algorithms for calculating the likelihood function and searching for the maximum-likelihood configuration. It is tested extensively on simulated data with a varying number of marker loci, different rates of typing errors, and various sample sizes and family structures and applied to two empirical data sets to demonstrate its usefulness.
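As a sketch of the kind of typing-error model the abstract describes (an illustrative form in Python, not Wang's exact parameterization): the likelihood of an observed genotype mixes a correctly recorded genotype with a random mistype, and the unobserved true genotype is then summed out of the sibship likelihood.

```python
def genotype_lik(obs, true, e, geno_freq):
    # With probability 1 - e the genotype is recorded correctly; with
    # probability e it is replaced by a random genotype drawn from the
    # population genotype frequencies (a deliberately simple error model).
    return (1.0 - e) * (obs == true) + e * geno_freq[obs]

# In a sibship likelihood the unobserved true genotype g is summed out:
#   P(obs | parents) = sum over g of P(g | parents) * genotype_lik(obs, g, e, f)
geno_freq = {("A", "A"): 0.25, ("A", "B"): 0.50, ("B", "B"): 0.25}
print(genotype_lik(("A", "B"), ("A", "A"), e=0.05, geno_freq=geno_freq))
```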

2.
In isothermal titration calorimetry (ITC), the two main sources of random (statistical) error are associated with the extraction of the heat q from the measured temperature changes and with the delivery of metered volumes of titrant. The former leads to uncertainty that is approximately constant and the latter to uncertainty that is proportional to q. The role of these errors in the analysis of ITC data by nonlinear least squares is examined for the case of 1:1 binding, M + X ⇌ MX. The standard errors in the key parameters, the equilibrium constant K° and the enthalpy ΔH°, are assessed from the variance-covariance matrix computed for exactly fitting data. Monte Carlo calculations confirm that these "exact" estimates will normally suffice and show further that neglect of weights in the nonlinear fitting can result in significant loss of efficiency. The effects of the titrant volume error are strongly dependent on assumptions about the nature of this error: If it is random in the integral volume instead of the differential volume, correlated least-squares is required for proper analysis, and the parameter standard errors decrease with increasing number of titration steps rather than increase.
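The two error sources described above translate directly into a weighted nonlinear least-squares fit. A minimal sketch in Python with simulated data; the parameter values, names, and the simplified heat model (no cell-volume displacement) are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def bound(Xt, Mt, K):
    # [MX] from the 1:1 mass-action/mass-balance quadratic for M + X <=> MX
    b = Mt + Xt + 1.0 / K
    return 0.5 * (b - np.sqrt(b * b - 4.0 * Mt * Xt))

def heats(Xt, K, dH, Mt=1e-5, V=1.4e-3):
    # Differential heat per injection (J); cell-volume displacement ignored
    Q = dH * V * bound(Xt, Mt, K)
    return np.diff(np.concatenate(([0.0], Q)))

rng = np.random.default_rng(1)
Xt = np.cumsum(np.full(25, 1e-6))               # total titrant (M) after each shot
q = heats(Xt, 1e6, -40e3)                       # "true" K = 1e6 /M, dH = -40 kJ/mol
sigma = np.sqrt((2e-7) ** 2 + (0.01 * q) ** 2)  # constant + proportional error in q
q_obs = q + rng.normal(0.0, sigma)

# Weighted fit; dropping `sigma` reproduces the "neglect of weights" case.
popt, pcov = curve_fit(heats, Xt, q_obs, p0=(1e5, -30e3), sigma=sigma,
                       absolute_sigma=True, bounds=([1e2, -1e6], [1e9, 0.0]))
print(popt, np.sqrt(np.diag(pcov)))             # K, dH and their standard errors
```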

3.
In quantitative biology, observed data are fitted to a model that captures the essence of the system under investigation in order to obtain estimates of the parameters of the model, as well as their standard errors and interactions. The fitting is best done by the method of maximum likelihood, though least-squares fits are often used as an approximation because the calculations are perceived to be simpler. Here Brian Williams and Chris Dye argue that the method of maximum likelihood is generally preferable to least squares, giving the best estimates of the parameters for data with any given error distribution, and that the calculations are no more difficult than for least-squares fitting. They offer a relatively simple explanation of the method and describe its implementation using examples from leishmaniasis epidemiology.
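A compact illustration of the argument, using Poisson-distributed counts as the example error distribution (simulated data, not the authors' epidemiological examples): the maximum-likelihood fit minimizes the Poisson negative log-likelihood, while ordinary least squares silently assumes constant normal errors.

```python
import numpy as np
from scipy.optimize import minimize, curve_fit

def mean_counts(t, n0, r):
    return n0 * np.exp(-r * t)

rng = np.random.default_rng(0)
t = np.arange(20.0)
y = rng.poisson(mean_counts(t, 100.0, 0.25))    # Poisson counts, not normal

# Maximum likelihood: minimize the Poisson negative log-likelihood
# (parameterized via log n0 so the mean stays positive during the search).
def nll(p):
    log_n0, r = p
    log_mu = log_n0 - r * t
    return np.sum(np.exp(log_mu) - y * log_mu)

fit = minimize(nll, x0=(np.log(50.0), 0.1), method="Nelder-Mead")
n0_mle, r_mle = np.exp(fit.x[0]), fit.x[1]

# Ordinary least squares for comparison: implicitly assumes constant
# normal errors, which Poisson counts violate.
(n0_ls, r_ls), _ = curve_fit(mean_counts, t, y, p0=(50.0, 0.1))
print((n0_mle, r_mle), (n0_ls, r_ls))
```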

4.
Johnson PC, Haydon DT. Genetics, 2007, 175(2): 827-842
The importance of quantifying and accounting for stochastic genotyping errors when analyzing microsatellite data is increasingly being recognized. This awareness is motivating the development of data analysis methods that not only take errors into consideration but also recognize the difference between two distinct classes of error, allelic dropout and false alleles. Currently, methods to estimate rates of allelic dropout and false alleles depend upon the availability of error-free reference genotypes or reliable pedigree data, which are often not available. We have developed a maximum-likelihood-based method for estimating these error rates from a single replication of a sample of genotypes. Simulations show it to be both accurate and robust to modest violations of its underlying assumptions. We have applied the method to estimating error rates in two microsatellite data sets. It is implemented in a computer program, Pedant, which estimates allelic dropout and false allele error rates with 95% confidence regions from microsatellite genotype data and performs power analysis. Pedant is freely available at http://www.stats.gla.ac.uk/~paulj/pedant.html.

5.
Vasco DA. Genetics, 2008, 179(2): 951-963
The estimation of ancestral and current effective population sizes in expanding populations is a fundamental problem in population genetics. Recently it has become possible to scan entire genomes of several individuals within a population. These genomic data sets can be used to estimate basic population parameters such as the effective population size and population growth rate. Full-data-likelihood methods potentially offer a powerful statistical framework for inferring population genetic parameters. However, for large data sets, computationally intensive methods based upon full-likelihood estimates may encounter difficulties. First, the computational method may be prohibitively slow or difficult to implement for large data. Second, estimation bias may markedly affect the accuracy and reliability of parameter estimates, as suggested from past work on coalescent methods. To address these problems, a fast and computationally efficient least-squares method for estimating population parameters from genomic data is presented here. Instead of modeling genomic data using a full likelihood, this new approach uses an analogous function, in which the full data are replaced with a vector of summary statistics. Furthermore, these least-squares estimators may show significantly less estimation bias for growth rate and genetic diversity than a corresponding maximum-likelihood estimator for the same coalescent process. The least-squares statistics also scale up to genome-sized data sets with many nucleotides and loci. These results demonstrate that least-squares statistics will likely prove useful for nonlinear parameter estimation when the underlying population genomic processes have complex evolutionary dynamics involving interactions between mutation, selection, demography, and recombination.
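A toy version of the summary-statistics idea, assuming a constant-size neutral model rather than the paper's growth model: the full sequence data are replaced by the site-frequency spectrum, whose neutral expectation E[xi_i] = theta/i is then fitted by least squares.

```python
import numpy as np
from scipy.optimize import least_squares

# Summary statistic: the unfolded site-frequency spectrum xi_1..xi_{n-1}.
# Under the standard neutral coalescent E[xi_i] = theta / i, so theta can
# be fitted by least squares on the summary vector instead of via a full
# likelihood over the sequence data.
n = 20                                  # sampled chromosomes
i = np.arange(1, n)

rng = np.random.default_rng(3)
theta_true = 8.0
sfs_obs = rng.poisson(theta_true / i)   # stands in for an observed SFS

fit = least_squares(lambda p: sfs_obs - p[0] / i, x0=[1.0])
print("theta-hat:", fit.x[0])
```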

6.
Patterns that resemble strongly skewed size distributions are frequently observed in ecology. A typical example is the distribution of tree stem diameters. Empirical tests of ecological theories predicting their parameters have been conducted, but the results are difficult to interpret because the statistical methods that are applied to fit such decaying size distributions vary. In addition, binning of field data as well as measurement errors might potentially bias parameter estimates. Here, we compare three different methods for parameter estimation – the common maximum likelihood estimation (MLE) and two modified types of MLE correcting for binning of observations or random measurement errors. We test whether three typical frequency distributions, namely the power-law, negative exponential and Weibull distributions, can be precisely identified, and how parameter estimates are biased when observations are additionally either binned or contain measurement error. We show that uncorrected MLE already loses the ability to discern functional form and parameters at relatively small levels of uncertainties. The modified MLE methods that consider such uncertainties (either binning or measurement error) are comparatively much more robust. We conclude that it is important to reduce binning of observations, if possible, and to quantify observation accuracy in empirical studies for fitting strongly skewed size distributions. In general, modified MLE methods that correct binning or measurement errors can be applied to ensure reliable results.
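A sketch of the binning correction for the power-law case (illustrative Python, not the authors' code): the uncorrected MLE treats each observation as its bin midpoint, while the corrected MLE maximizes the likelihood of the bin counts using the probability mass of each whole bin.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Power law f(x) = ((alpha-1)/xmin) * (x/xmin)^(-alpha), x >= xmin.
rng = np.random.default_rng(7)
xmin, alpha_true = 5.0, 2.5
x = xmin * (1.0 - rng.random(5000)) ** (-1.0 / (alpha_true - 1.0))  # inverse CDF

edges = np.arange(xmin, 500.0, 5.0)          # binned "field" measurements
counts, _ = np.histogram(x, bins=edges)
mids = 0.5 * (edges[:-1] + edges[1:])

def nll_midpoint(a):
    # Uncorrected MLE: pretend each tree sits at its bin midpoint.
    logpdf = np.log(a - 1.0) - np.log(xmin) - a * np.log(mids / xmin)
    return -np.sum(counts * logpdf)

def nll_binned(a):
    # Corrected MLE: use the probability mass of each whole bin,
    # P(lo <= X < hi) = (lo/xmin)^(1-a) - (hi/xmin)^(1-a).
    p = (edges[:-1] / xmin) ** (1.0 - a) - (edges[1:] / xmin) ** (1.0 - a)
    return -np.sum(counts * np.log(p))

for f in (nll_midpoint, nll_binned):
    est = minimize_scalar(f, bounds=(1.1, 5.0), method="bounded").x
    print(f.__name__, round(est, 3))
```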

7.
Several recent reports have addressed the problem of estimating the response slope from repeated measurements of paired data when both stimulus and response variables are subject to biological variability. These earlier approaches suffer from several drawbacks: useful information about the relationships between the error components in a closed-loop system is not fully utilized; the response intercept cannot be directly estimated; and the normalization procedure required in some methods may fail under certain circumstances. This paper proposes a new, general method of simultaneously estimating the response slope and intercept from corrupted stimulus-response data when the errors in both variables are specifically related by the system structure. A direct extension of the least-squares approach, this method [directed least squares (DLS)] reduces to ordinary least-squares methods when either of the measured variables is error free and to the reduced-major-axis (RMA) method of Kermack and Haldane (Biometrika 37: 30-41, 1950) when the magnitudes of the normalized errors are equal. The DLS estimators are scale invariant, statistically unbiased and always attain the minimum variance. With simple modifications, the method is also applicable to paired data. If, however, the relation between error components is uncertain, then the RMA method is optimal, i.e., having the least possible asymptotic bias and variance. These results are illustrated by using various types of closed-loop respiratory response data.
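For reference, the RMA special case mentioned above is a two-line computation; a minimal sketch with generic names and simulated data:

```python
import numpy as np

def rma(x, y):
    # Reduced-major-axis slope and intercept: the equal-normalized-errors
    # special case that the DLS estimator reduces to.
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    return slope, np.mean(y) - slope * np.mean(x)

rng = np.random.default_rng(2)
stim = rng.normal(40.0, 5.0, 50)                  # latent stimulus
x = stim + rng.normal(0.0, 2.0, 50)               # stimulus measured with error
y = 1.0 + 2.0 * stim + rng.normal(0.0, 4.0, 50)   # response measured with error

print(rma(x, y))   # OLS of y on x would be attenuated toward zero here
```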

8.
A modification of a method of Gardner, which employs Fourier-transform techniques, is used to obtain initial estimates for the number of terms and values of the parameters for data which are represented by a sum of exponential terms. New experimental methods have increased both the amount and accuracy of data from radiopharmaceutical experiments. This in turn allows one to devise specific numerical methods that utilize the better data. The inherent difficulties of fitting exponentials to data, which is an ill-posed problem, cannot be overcome by any method. However, we show that the present accuracy of Fourier methods may be extended by our numerical methods applied to the improved data sets. In many cases the method yields accurate estimates for the parameters; these estimates then are to be used as initial estimates for a nonlinear least-squares analysis of the problem.
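A sketch of the final refinement stage only: the Fourier-transform step that produces the initial estimates is beyond a short example, so here the starting values are simply assumed given and polished by nonlinear least squares.

```python
import numpy as np
from scipy.optimize import curve_fit

def biexp(t, a1, k1, a2, k2):
    return a1 * np.exp(-k1 * t) + a2 * np.exp(-k2 * t)

rng = np.random.default_rng(5)
t = np.linspace(0.0, 10.0, 200)
y = biexp(t, 5.0, 2.0, 1.0, 0.2) + rng.normal(0.0, 0.02, t.size)

# p0 stands in for the Gardner/Fourier initial estimates; because fitting
# exponential sums is ill-posed, the refinement depends heavily on them.
popt, pcov = curve_fit(biexp, t, y, p0=(4.0, 1.5, 0.8, 0.3))
print(popt, np.sqrt(np.diag(pcov)))
```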

9.
A statistical model is proposed for the analysis of errors in microarray experiments and is employed in the analysis and development of a combined normalisation regime. Through analysis of the model and two-dye microarray data sets, this study found the following. The systematic error introduced by microarray experiments mainly involves spot intensity-dependent, feature-specific and spot position-dependent contributions. It is difficult to remove all these errors effectively without a suitable combined normalisation operation. Adaptive normalisation using a suitable regression technique is more effective in removing spot intensity-related dye bias than self-normalisation, while regional normalisation (block normalisation) is an effective way to correct spot position-dependent errors. However, dye-flip replicates are necessary to remove feature-specific errors, and also allow the analyst to identify the experimentally introduced dye bias contained in non-self-self data sets. In this case, the bias present in the data sets may include both experimentally introduced dye bias and the biological difference between two samples. Self-normalisation is capable of removing dye bias without identifying the nature of that bias. The performance of adaptive normalisation, on the other hand, depends on its ability to correctly identify the dye bias. If adaptive normalisation is combined with an effective dye bias identification method then there is no systematic difference between the outcomes of the two methods.
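A minimal sketch of the adaptive (intensity-dependent) normalisation step on simulated two-channel data, assuming a lowess smoother for the regression; applying the same step within each print block would give the regional normalisation discussed above.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(4)
level = rng.lognormal(8.0, 1.0, 2000)            # true spot expression level
lA = np.log2(level)
G = level * 2.0 ** rng.normal(0.0, 0.15, 2000)   # green channel
R = (level * 2.0 ** rng.normal(0.0, 0.15, 2000)
     * 2.0 ** (0.15 * (lA - lA.mean())))         # red channel with intensity-dependent dye bias

M = np.log2(R / G)                               # log-ratio
A = 0.5 * np.log2(R * G)                         # mean log-intensity

# Adaptive normalisation: estimate the intensity trend of M with lowess
# and subtract it, removing the spot-intensity-dependent dye bias.
trend = lowess(M, A, frac=0.4, return_sorted=False)
M_norm = M - trend
print(abs(M).mean(), abs(M_norm).mean())
```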

10.
A convenient method for evaluation of biochemical reaction rate coefficients and their uncertainties is described. The motivation for developing this method was the complexity of existing statistical methods for analysis of biochemical rate equations, as well as the shortcomings of linear approaches, such as Lineweaver-Burk plots. The nonlinear least-squares method provides accurate estimates of the rate coefficients and their uncertainties from experimental data. Linearized methods that involve inversion of data are unreliable since several important assumptions of linear regression are violated. Furthermore, when linearized methods are used, there is no basis for calculation of the uncertainties in the rate coefficients. Uncertainty estimates are crucial to studies involving comparisons of rates for different organisms or environmental conditions. The spreadsheet method uses weighted least-squares analysis to determine the best-fit values of the rate coefficients for the integrated Monod equation. Although the integrated Monod equation is an implicit expression of substrate concentration, weighted least-squares analysis can be employed to calculate approximate differences in substrate concentration between model predictions and data. An iterative search routine in a spreadsheet program is utilized to search for the best-fit values of the coefficients by minimizing the sum of squared weighted errors. The uncertainties in the best-fit values of the rate coefficients are calculated by an approximate method that can also be implemented in a spreadsheet. The uncertainty method can be used to calculate single-parameter (coefficient) confidence intervals, degrees of correlation between parameters, and joint confidence regions for two or more parameters. Example sets of calculations are presented for acetate utilization by a methanogenic mixed culture and trichloroethylene cometabolism by a methane-oxidizing mixed culture. An additional advantage of application of this method to the integrated Monod equation compared with application of linearized methods is the economy of obtaining rate coefficients from a single batch experiment or a few batch experiments rather than having to obtain large numbers of initial rate measurements. However, when initial rate measurements are used, this method can still be used with greater reliability than linearized approaches.
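A sketch of the same weighted least-squares scheme with scipy standing in for the spreadsheet's iterative search routine; the constant-biomass form of the integrated Monod equation and all numbers are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.optimize import brentq, minimize

# Integrated Monod equation (biomass growth neglected for brevity):
#   Ks*ln(S0/S) + (S0 - S) = k*X0*t
# It is implicit in the substrate concentration S, so predicted values come
# from root-finding and the weighted sum of squared errors is minimized by
# a search routine.
S0, X0 = 10.0, 0.5                         # initial substrate and biomass (mg/L)

def predict_S(times, k, Ks):
    out = []
    for t in np.atleast_1d(times):
        f = lambda S, t=t: Ks * np.log(S0 / S) + (S0 - S) - k * X0 * t
        lo = 1e-12
        out.append(brentq(f, lo, S0) if f(lo) > 0 else lo)  # else: exhausted
    return np.array(out)

rng = np.random.default_rng(6)
t_obs = np.linspace(0.5, 12.0, 10)
S_obs = predict_S(t_obs, 2.0, 3.0) + rng.normal(0.0, 0.1, 10)

def wsse(logp):                            # log-parameters keep k, Ks > 0
    k, Ks = np.exp(logp)
    return np.sum((1.0 / 0.1 ** 2) * (S_obs - predict_S(t_obs, k, Ks)) ** 2)

fit = minimize(wsse, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
print(np.exp(fit.x))                       # best-fit k, Ks
```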

11.
Aim Public land survey records are commonly used to reconstruct historical forest structure over large landscapes. Reconstruction studies have been criticized for using absolute measures of forest attributes, such as density and basal area, because of potential selection bias by surveyors and unknown measurement error. Current methods to identify bias are based upon statistical techniques whose assumptions may be violated for survey data. Our goals were to identify and directly estimate common sources of bias and error, and to test the accuracy of statistical methods to identify them. Location Forests in the western USA: Mogollon Plateau, Arizona; Blue Mountains, Oregon; Front Range, Colorado. Methods We quantified both selection bias and measurement error for survey data in three ponderosa pine landscapes by directly comparing measurements of bearing trees in survey notes with remeasurements of bearing trees at survey corners (384 corners and 812 trees evaluated). Results Selection bias was low in all areas and there was little variability among surveyors. Surveyors selected the closest tree to the corner 95% to 98% of the time, and hence bias may have limited impacts on reconstruction studies. Bourdo’s methods were able to successfully detect presence or absence of bias most of the time, but do not measure the rate of bias. Recording and omission errors were common but highly variable among surveyors. Measurements for bearing trees made by surveyors were generally accurate. Most bearings were less than 5° in error and most distances were within 5% of our remeasurements. Many, but not all, surveyors in the western USA probably estimated diameter of bearing trees at stump height (0.3 m). These estimates deviated from reconstructed diameters by a mean absolute error of 7.0 to 10.6 cm. Main conclusions Direct comparison of survey data at relocated corners is the only method that can determine if bias and error are meaningful. Data from relocated trees show that biased selection of trees is not likely to be an important source of error. Many surveyor errors would have no impact on reconstruction studies, but omission errors have the potential to have a large impact on results. We suggest how to reduce potential errors through data screening.

12.
Marques TA. Biometrics, 2004, 60(3): 757-763
Line transect sampling is one of the most widely used methods for animal abundance assessment. Standard estimation methods assume certain detection on the transect, no animal movement, and no measurement errors. Failure of the assumptions can cause substantial bias. In this work, the effect of measurement error on line transect estimators is investigated. Based on considerations of the process generating the errors, a multiplicative error model is presented and a simple way of correcting estimates based on knowledge of the error distribution is proposed. Using beta models for the error distribution, the effect of errors and of the proposed correction is assessed by simulation. Adequate confidence intervals for the corrected estimates are obtained using a bootstrap variance estimate for the correction and the delta method. As noted by Chen (1998, Biometrics 54, 899-908), even unbiased estimators of the distances might lead to biased density estimators, depending on the actual error distribution. In contrast with the findings of Chen, who used an additive model, unbiased estimation of distances under a multiplicative model leads to overestimation of density. Some error distributions result in observed distance distributions that make efficient estimation impossible, by removing the shoulder present in the original detection function. This indicates the need to improve field methods to reduce measurement error. An application of the new methods to a real data set is presented.
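A short Monte Carlo sketch of why unbiased multiplicative errors inflate density: the line-transect density estimator is proportional to the pdf of detection distances at zero, and for recorded distances Y = XU with E[U] = 1, f_Y(0) = f_X(0) * E[1/U] > f_X(0) by Jensen's inequality. (This illustrates the mechanism only, not the paper's correction method.)

```python
import numpy as np

rng = np.random.default_rng(8)
sigma, n = 10.0, 200_000

x = np.abs(rng.normal(0.0, sigma, n))    # true perpendicular distances
u = 2.0 * rng.beta(5.0, 5.0, n)          # multiplicative error with E[u] = 1
y = x * u                                # recorded distances

def f0(d, h=0.5):
    # crude estimate of the distance pdf at zero; density-proportional
    return np.mean(d < h) / h

# Ratio of density estimates with and without error: approximately E[1/U] > 1,
# i.e. density is overestimated even though the distances are unbiased.
print(f0(y) / f0(x), np.mean(1.0 / u))
```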

13.
Computational methods for the estimation of stoichiometric association constants for multiple-ligand binding systems are currently based on non-linear least-squares regression analysis. These computational methods require sophisticated, iterative algorithms to assure convergence to a solution, as well as initial parameter and error estimates. A simple procedure, called lambda-invariance testing (LIT), was developed that provides a single-pass (non-iterative) estimation of stoichiometric association constants. The LIT method was applied to simulated binding data containing Gaussian error and to real data drawn from the literature. This method provided parameter estimates essentially equivalent to those obtained by least-squares regression analysis, with no initial parameter or error estimates required. Received on May 26, 1987; accepted on July 27, 1987

14.
15.
Closer scrutiny has been accorded a recently reported procedure for characterizing weak protein dimerization by sedimentation equilibrium (INVEQ) in which the equilibrium distribution is analyzed as a dependence of radial distance on solute concentration rather than of solute concentration on radial distance. By demonstrating theoretically that the fundamental parameter derived from the analysis is simply the difference between the dimerization constant and the osmotic second virial coefficient for monomer-monomer interaction, this investigation refutes the original claim that independent estimates of these two parameters can be obtained by nonlinear curve fitting of the sedimentation equilibrium distribution. This criticism also applies to conventional analyses of sedimentation distributions by the commonly employed Beckman Origin and NONLIN software. Numerically simulated distributions are then analyzed to demonstrate limitations of the procedure and also to indicate a means of improving the reliability of the returned estimate of the dimerization constant. These features are illustrated by applying the original and revised analytical procedures to a sedimentation equilibrium distribution for alpha-chymotrypsin (pH 4.0, I 0.05 M).

16.
For many species, breeding population size is an important metric for assessing population status. A variety of simple methods are often used to estimate this metric for ground-nesting birds that nest in open habitats (e.g., beaches, riverine sandbars). The error and bias associated with estimates derived using these methods vary in relation to differing monitoring intensities and detection rates. However, these errors and biases are often difficult to obtain, poorly understood, and largely unreported. A method was developed to estimate the number of breeding pairs using counts of nests and broods from monitoring data where multiple surveys were made throughout a single breeding season (breeding pair estimator; BPE). The BPE method was compared to two commonly used estimation methods using simulated data from an individual-based model that allowed for the comparison of biases and accuracy. The BPE method underestimated the number of breeding pairs, but generally performed better than the other two commonly used methods when detection rates were low and monitoring frequency was high. As detection rates and time between surveys increased, the maximum nest and brood count method performed similarly to the BPE. The BPE was compared to four commonly used methods to estimate breeding pairs for empirically derived data sets on the Platte River. Based on our simulated data, we expect our BPE to be closest to the true number of breeding pairs as compared to other methods. The methods tested resulted in substantially different estimates of the numbers of breeding pairs; however, coefficients from trend analyses were not statistically different. When data from multiple nest and brood surveys are available, the BPE appears to result in reasonably precise estimates of numbers of breeding pairs. Regardless of the estimation method, investigators are encouraged to acknowledge whether the method employed is likely to over- or underestimate breeding pairs. This study provides a means to recognize the potential biases in breeding pair estimates.

17.
The problem of comparing and pooling experimentally independent estimates of a parameter such as a Michaelis constant (K) has been treated as a simple analysis of variance of "within" and "between" set deviations from the fitted variable (v). As applied to assessing the reproducibility of multiple estimates of the same K, this is identical to the procedure of Duggleby (Anal. Biochem. 189, 84-87, 1990). However, the theory developed here shows that applying Duggleby's procedure to the comparison of two experiments (each consisting of multiple data sets) depends critically on the assumption of equal errors within and between the individual sets, i.e., that F(vb, vw) = s²wv/s²bv is close to 1. Application of the method when this is not the case will underestimate the common error (s²rv), overestimate its associated degrees of freedom (vr = vb + vw), and may suggest apparently significant differences where there are none. The theory also shows that this situation is an instance of the Fisher-Behrens problem and shows how Welch's solution can be applied. This gives the between-set error s²bv as the corrected estimate of the common error and the corrected degrees of freedom as a simple function of vb, vw, and F(vb, vw). When the nine prephenate dehydratase data sets which originally showed three apparently significant differences were reanalyzed in this way, all the variations in K were found to be within the range of the experimental error.
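The two-sample form of Welch's solution referred to above, with the Welch-Satterthwaite approximate degrees of freedom (generic notation, not the paper's):

```python
import numpy as np

def welch(m1, s1, n1, m2, s2, n2):
    # Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    # two means with unequal variances (the Fisher-Behrens situation).
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# e.g. comparing two pooled estimates of a Michaelis constant K:
print(welch(4.2, 0.6, 5, 5.1, 1.4, 6))
```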

18.
1. Birth and death rates of natural cladoceran populations cannot be measured directly. Estimates of these population parameters must be calculated using methods that make assumptions about the form of population growth. These methods generally assume that the population has a stable age distribution.
2. To assess the effect of variable age distributions, we tested six egg ratio methods for estimating birth and death rates with data from thirty-seven laboratory populations of Daphnia pulicaria. The populations were grown under constant conditions, but the initial age distributions and egg ratios of the populations varied. Actual death rates were virtually zero, so the difference between the estimated and actual death rates measured the error in both birth and death rate estimates.
3. The results demonstrate that unstable population structures may produce large errors in the birth and death rates estimated by any of these methods. Among the methods tested, Taylor and Slatkin's formula and Paloheimo's formula were most reliable for the experimental data.
4. Further analyses of three of the methods were made using computer simulations of growth of age-structured populations with initially unstable age distributions. These analyses show that the time interval between sampling strongly influences the reliability of birth and death rate estimates. At a sampling interval of 2.5 days (equal to the duration of the egg stage), Paloheimo's formula was most accurate. At longer intervals (7.5–10 days), Taylor and Slatkin's formula, which includes information on population structure, was most accurate.
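Paloheimo's formula, referred to in points 3 and 4, is compact enough to state directly; a sketch with illustrative numbers:

```python
import numpy as np

def paloheimo(E, D, N1, N2, dt):
    # E: eggs per individual (egg ratio); D: egg development time;
    # N1, N2: population sizes at the start and end of the interval dt.
    b = np.log(E + 1.0) / D          # instantaneous birth rate
    r = np.log(N2 / N1) / dt         # realized rate of increase
    return b, r, b - r               # death rate d = b - r

print(paloheimo(E=0.8, D=2.5, N1=100.0, N2=160.0, dt=2.5))
```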

19.
The developmental mechanisms behind developmental instability (DI) are only poorly understood. Nevertheless, fluctuating asymmetry (FA) is often used as a surrogate for DI. Based on statistical arguments it is often assumed that individual levels of FA are only weakly associated with the underlying DI. Patterns in FA therefore need to be interpreted with caution, and should ideally be transformed into patterns in DI. In order to achieve that, assumptions about the distribution of developmental errors must be made. Current models assume that errors during development are additive and independent such that they yield a normal distribution. The observation that the distribution of FA is often leptokurtic has been interpreted as evidence for between-individual variation in DI. This approach has led to unrealistically high estimates of between-individual variation in DI, and potentially incorrect interpretations of patterns in FA, especially at the individual level. Recently, it has been suggested that the high estimates of variation in DI may be biased upward because developmental errors are log-normal or gamma distributed and/or because the measurement resolution of FA is low. A proper estimation of the amount (and shape) of heterogeneity in DI is crucial for the interpretation of patterns in FA and their transformation into patterns in DI. Yet incorrect model assumptions may yield misleading inferences. We therefore develop a statistical model to evaluate the sensitivity of results under the normal error model against the two alternative distributions, as well as to investigate the importance of low measurement resolution. An analysis of simulated and empirical data sets indicated that bias due to misspecification of the developmental error distribution can be substantial, yet did not appear to reduce estimates of variation in DI in empirical data sets to a large extent. Effects of low measurement resolution were negligible. The importance of these results is discussed in the context of the interpretation of patterns in FA.

20.
Guolo A. Biometrics, 2008, 64(4): 1207-1214
We investigate the use of prospective likelihood methods to analyze retrospective case-control data where some of the covariates are measured with error. We show that prospective methods can be applied and the case-control sampling scheme can be ignored if one adequately models the distribution of the error-prone covariates in the case-control sampling scheme. Indeed, subject to this, the prospective likelihood methods result in consistent estimates, and standard errors based on the information matrix are asymptotically correct. However, the distribution of such covariates is not the same in the population and under case-control sampling, dictating the need to model the distribution flexibly. In this article, we illustrate the general principle by modeling the distribution of the continuous error-prone covariates using the skew-normal distribution. The performance of the method is evaluated through simulation studies, which show satisfactory results in terms of bias and coverage. Finally, the method is applied to the analysis of two data sets which refer, respectively, to a cholesterol study and a study on breast cancer.
