Similar Literature
20 similar articles found (search time: 15 ms)
1.
Pedigree data can be evaluated, and subsequently corrected, by analysis of the distribution of genetic markers, taking account of the possibility of mistyping. Using a model of pedigree error developed previously, we obtained maximum likelihood estimates of error parameters in pedigree data from Tokelau. Posterior probabilities for the possible true relationships in each family, conditional on the putative relationships and the marker data, are calculated using the parameter estimates. These probabilities are used as a basis for discriminating between pedigree error and genetic marker errors in families where inconsistencies have been observed. When applied to the Tokelau data and compared with the results of retyping inconsistent families, these statistical procedures are able to discriminate between pedigree and marker error, with approximately 90% accuracy, for families with two or more offspring. The large proportion of inconsistencies inferred to be due to marker error (61%) indicates the importance of discriminating between error sources when judging the reliability of putative relationship data. Application of our model of pedigree error has proved to be an efficient way of determining and subsequently correcting sources of error in extensive pedigree data collected in large surveys.

2.

Background  

Increasingly researchers are turning to the use of haplotype analysis as a tool in population studies, the investigation of linkage disequilibrium, and candidate gene analysis. When the phase of the data is unknown, computational methods, in particular those employing the Expectation-Maximisation (EM) algorithm, are frequently used for estimating the phase and frequency of the underlying haplotypes. These methods have proved very successful, predicting the phase-known frequencies from data for which the phase is unknown with a high degree of accuracy. Recently there has been much speculation as to the effect of unknown, or missing allelic data – a common phenomenon even with modern automated DNA analysis techniques – on the performance of EM-based methods. To this end an EM-based program, modified to accommodate missing data, has been developed, incorporating non-parametric bootstrapping for the calculation of accurate confidence intervals.
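To illustrate the EM approach this abstract describes, here is a minimal two-locus sketch in Python. It handles only the classic phase ambiguity of double heterozygotes, not the missing-allele extension or the bootstrap confidence intervals; the function name and the genotype-count layout are our own illustrative assumptions.

```python
import numpy as np

def em_haplotype_freqs(n_counts, n_iter=200):
    """EM estimate of two-locus haplotype frequencies from unphased
    genotype counts.  n_counts[i][j] = number of individuals carrying
    i copies of allele A at locus 1 and j copies of allele B at locus 2.
    Only the double heterozygote (1, 1) has ambiguous phase."""
    n = np.asarray(n_counts, float)
    N = n.sum()
    # haplotypes: 0 = AB, 1 = Ab, 2 = aB, 3 = ab; start from uniform freqs
    p = np.full(4, 0.25)
    for _ in range(n_iter):
        # haplotype counts contributed by phase-unambiguous genotypes
        c = np.zeros(4)
        c[0] = 2 * n[2, 2] + n[2, 1] + n[1, 2]
        c[1] = 2 * n[2, 0] + n[2, 1] + n[1, 0]
        c[2] = 2 * n[0, 2] + n[0, 1] + n[1, 2]
        c[3] = 2 * n[0, 0] + n[0, 1] + n[1, 0]
        # E-step: split double heterozygotes between cis (AB/ab)
        # and trans (Ab/aB) phase in proportion to current frequencies
        cis, trans = p[0] * p[3], p[1] * p[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        c[0] += n[1, 1] * w
        c[3] += n[1, 1] * w
        c[1] += n[1, 1] * (1 - w)
        c[2] += n[1, 1] * (1 - w)
        # M-step: renormalise expected counts to frequencies
        p = c / (2 * N)
    return p
```

With strong cis linkage disequilibrium in the data, the algorithm drives the trans-phase frequencies toward zero within a handful of iterations.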

3.
G F Reed  R B McHugh 《Biometrics》1979,35(2):473-478
Diagnostic error in large scale screening programs for dental caries is frequent, as when teeth initially classified as carious are later diagnosed as never having been affected by caries. This paper presents a general formulation of diagnostic error in dental caries screening. The basic parameter of the formulation is the caries incidence rate. The general formulation of this paper permits an explicit comparison, in a common notation, of two specific models of diagnostic error--one due to Carlos and Senning, the other to Lu. Each model gives rise to a consistent estimator of the incidence rate. The distributional properties of these estimators had not previously been examined because of the dependence of teeth in the same mouth with respect to caries experience. Using the present formulation, the sampling scheme may be regarded as a one-stage cluster sample with mouths as clusters. This approach accounts for intracluster dependence thus permitting the derivation of an estimator of the relevant covariance matrix and a confidence interval for the incidence rate.
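The one-stage cluster view of the sampling scheme can be sketched as a ratio estimator with a between-mouth variance term. This is a generic cluster-robust sketch, not the Carlos-Senning or Lu model; the function name and interface are invented for illustration.

```python
import numpy as np

def cluster_incidence_ci(events, teeth, z=1.96):
    """Caries incidence rate per tooth with a cluster-robust CI.
    Mouths are the clusters: events[i] = newly carious teeth in mouth i,
    teeth[i] = teeth at risk in mouth i."""
    y = np.asarray(events, float)
    x = np.asarray(teeth, float)
    m = len(y)
    rate = y.sum() / x.sum()            # ratio estimator of the rate
    resid = y - rate * x                # per-cluster residuals
    # between-cluster variance of the ratio estimator
    var = m / (m - 1) * (resid ** 2).sum() / x.sum() ** 2
    se = np.sqrt(var)
    return rate, (rate - z * se, rate + z * se)
```

Because the residuals are taken at the mouth level, correlated caries experience among teeth in the same mouth inflates the interval, unlike a naive per-tooth binomial calculation.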

4.
The power of QTL mapping by a mixed-model approach has been studied for hybrid crops but remains unknown in self-pollinated crops. Our objective was to evaluate the usefulness of mixed-model QTL mapping in the context of a breeding program for a self-pollinated crop. Specifically, we simulated a soybean (Glycine max L. Merr.) breeding program and applied a mixed-model approach that comprised three steps: variance component estimation, single-marker analyses, and multiple-marker analysis. Average power to detect QTL ranged from <1 to 47% depending on the significance level (0.01 or 0.0001), number of QTL (20 or 80), heritability of the trait (0.40 or 0.70), population size (600 or 1,200 inbreds), and number of markers (300 or 600). The corresponding false discovery rate ranged from 2 to 43%. Larger populations, higher heritability, and fewer QTL controlling the trait led to a substantial increase in power and to a reduction in the false discovery rate and bias. A stringent significance level reduced both the power and false discovery rate. There was greater power to detect major QTL than minor QTL. Power was higher and the false discovery rate was lower in hybrid crops than in self-pollinated crops. We conclude that mixed-model QTL mapping is useful for gene discovery in plant breeding programs of self-pollinated crops.
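A toy version of the power and false-discovery-rate calculation might look as follows: simulate an inbred population, scan markers one at a time, and tabulate detections. This is a plain single-marker regression scan under our own assumed settings, not the authors' three-step mixed-model pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

def power_fdr(n=600, n_markers=300, n_qtl=20, h2=0.7, z_crit=2.576):
    """Simulate inbred lines with n_qtl of the markers acting as QTLs,
    run a single-marker scan, and report power and FDR."""
    X = rng.choice([-1.0, 1.0], size=(n, n_markers))    # marker genotypes
    qtl = rng.choice(n_markers, n_qtl, replace=False)   # which markers are QTLs
    beta = rng.normal(0.0, 1.0, n_qtl)                  # QTL effect sizes
    g = X[:, qtl] @ beta
    # scale environmental noise to hit the target heritability
    e = rng.normal(0.0, np.sqrt(g.var() * (1 - h2) / h2), n)
    y = g + e
    # single-marker correlation tests (normal critical value ~ alpha = 0.01)
    Xc = X - X.mean(0)
    yc = y - y.mean()
    r = Xc.T @ yc / np.sqrt((Xc ** 2).sum(0) * (yc ** 2).sum())
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    hits = np.abs(t) > z_crit
    truth = np.zeros(n_markers, bool)
    truth[qtl] = True
    power = hits[truth].mean()
    fdr = hits[~truth].sum() / max(hits.sum(), 1)
    return power, fdr
```

As in the study, raising heritability or lowering the number of QTLs in this sketch increases power, while a stricter critical value lowers both power and FDR.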

5.
A general expression for the likelihood of a set of phenotypic observations on a randomly sampled pedigree, suitable for a wide variety of genetic models, has been previously modified to allow for independent ascertainments via probands. In this paper, further allowance is made for the fact that a pedigree usually contains some individuals who, whatever their phenotype, could never be probands, and we derive the limiting form of the likelihood appropriate for single ascertainment. The case when the sampling frame is ill-defined is discussed, and suggestions are made for how to proceed in such a case.

6.
This paper investigates marker-assisted introgression of a major gene into an outbred line, where identification of the introgressed gene is incomplete because marker alleles are not unique to the base populations (the same marker allele can occur in both the donor and the recipient population). Those markers are used to identify the introgressed allele as well as the background genotype. The effect of using those markers as if they were completely informative on the retention of the introgressed allele was examined over five generations of backcrossing, using either a single marker or a marker bracket for different starting frequencies of the marker alleles. Results were calculated using both a deterministic approach, where selection is only for the desired allele, and a stochastic approach, where selection is also on background genotype. When marker allele frequencies in the donor and recipient population diverged from 1 and 0 (using a diallelic marker), the ability to retain the desired allele rapidly declined. Marker brackets performed notably better than single markers. If selection on background marker genotype was applied, the desired allele could be lost even more quickly than expected at random: when an allele common in the donor line is observed at the locus identifying the introgressed allele but is surrounded at the background marker loci by alleles common in the recipient line, it is far more likely to stem from the recipient line (where it occurs at low frequency) than to have descended from the donor line, which would require a double recombination. Marker brackets again performed better. Preselection against marker homozygotes (which produce uninformative gametes) gave a slightly better retention of the introgressed allele.
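The retention mechanism described above can be illustrated with a small Monte Carlo backcross in which the "donor" marker allele also segregates in the recipient line. The simulation design (a single marker, one selected carrier per generation) is our own simplification, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(4)

def retention(f_recip, c, n_gen=5, n_off=8, n_rep=2000):
    """Fraction of backcross lines still carrying the donor target allele
    after n_gen generations of marker-based selection.
    f_recip: frequency of the 'donor' marker allele in the recipient line
    (0 means the marker is fully informative); c: recombination fraction
    between marker and target locus."""
    kept = 0
    for _ in range(n_rep):
        # each haplotype is (target allele, marker allele); start at the F1
        carrier = [(1, 1), (0, 1 if rng.random() < f_recip else 0)]
        for _ in range(n_gen):
            offspring = []
            for _ in range(n_off):
                # transmit haplotype a, recombining the marker with prob c
                a, b = carrier if rng.random() < 0.5 else carrier[::-1]
                gam = (a[0], b[1]) if rng.random() < c else a
                recip = (0, 1 if rng.random() < f_recip else 0)
                offspring.append([gam, recip])
            # select an offspring that shows the donor marker allele;
            # fall back to an arbitrary offspring if none does
            carriers = [o for o in offspring if o[0][1] == 1 or o[1][1] == 1]
            carrier = carriers[0] if carriers else offspring[0]
        if carrier[0][0] == 1 or carrier[1][0] == 1:
            kept += 1
    return kept / n_rep
```

Running this with a shared marker allele (f_recip > 0) shows the rapid loss of the target allele the abstract describes, because selection increasingly picks up the recipient's copy of the marker allele.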

7.
8.
S Chen  C Cox 《Biometrics》1992,48(2):593-598
We consider a regression to the mean problem with a very large sample for the first measurement and a relatively small subsample for the second measurement, selected on the basis of the initial measurement. This is a situation that often occurs in screening trials. We propose to estimate the unselected population mean and variance from the first measurement in the larger sample. Using these estimates, the correlation between the two measurements, as well as an effect of treatment, can be estimated in simple and explicit form. Under the condition that the size of the subsample is of a smaller order, the new estimators for all four parameters are as asymptotically efficient as the usual maximum likelihood estimators. Tests based on this new approach are also discussed. An illustration from a cholesterol screening study is included.
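Under a bivariate normal model with selection on the first screen, explicit estimators of this kind have roughly the following form. The parameterization (treatment entering as a mean shift, equal variances) is our own illustrative assumption, not necessarily the authors' exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

def rtm_estimates(x_all, x_sub, y_sub):
    """Explicit regression-to-the-mean estimators: mu and sigma^2 come
    from the full first-screen sample; rho and the treatment effect
    delta come from an OLS fit of the second measurement on the first
    within the selected subsample, using
        E[Y | X = x] = mu + delta + rho * (x - mu)."""
    mu = x_all.mean()
    sigma2 = x_all.var(ddof=1)
    slope, intercept = np.polyfit(x_sub, y_sub, 1)
    rho = slope                          # slope of Y on X estimates rho
    delta = intercept + rho * mu - mu    # solve the intercept for delta
    return mu, sigma2, rho, delta
```

The key point of the approach survives in the sketch: selection on X biases the subsample means, but the regression slope and intercept within the selected group still identify the correlation and the shift once the unselected mean is known.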

9.
A method is developed for identifying measurement errors and estimating fermentation states in the presence of unidentified reactant or product. Unlike conventional approaches using elemental balances, this method employs an empirically determined basis, which can tolerate unidentified reaction species. The essence of this approach is derived from the concept of reaction subspace and the technique of singular value decomposition. It is shown that the subspace determined via singular value decomposition of multiple experimental data provides an empirical basis for identifying measurement errors. The same approach is applied to fermentation state estimation. Via the formulation of the reaction subspace, the sensitivity of state estimates to measurement errors is quantified in terms of a dimensionless quantity, maximum error gain (MEG). It is shown that using the empirically determined subspace, one can circumvent the problem of unidentified reaction species, meanwhile reducing the sensitivity of the estimates.
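The subspace idea can be sketched in a few lines: take the SVD of rate vectors from multiple runs, keep the leading right singular vectors as an empirical basis, and flag measurements that sit far from that subspace. This omits the paper's maximum-error-gain analysis and state estimation; function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_subspace(D, r):
    """Empirical reaction subspace from multiple experiments via SVD.
    Rows of D are measured rate vectors from separate runs; the top r
    right singular vectors span the subspace."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:r]                       # (r, n_species) orthonormal basis

def residual_norm(m, V):
    """Distance of a measurement from the empirical subspace; a large
    value flags a gross measurement error."""
    return np.linalg.norm(m - V.T @ (V @ m))
```

Because the basis is estimated from data rather than from elemental balances, an unidentified species simply becomes one more direction inside the fitted subspace, which is the tolerance property the abstract emphasizes.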

10.
We investigate the effects of measurement error on the estimation of nonparametric variance functions. We show that either ignoring measurement error or direct application of the simulation extrapolation (SIMEX) method leads to inconsistent estimators. Nevertheless, the direct SIMEX method can reduce bias relative to a naive estimator. We further propose a permutation SIMEX method that leads to consistent estimators in theory. The performance of both SIMEX methods depends on approximations to the exact extrapolants. Simulations show that both SIMEX methods perform better than ignoring measurement error. The methodology is illustrated using microarray data from colon cancer patients.
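A generic SIMEX sketch, applied here to a linear regression slope rather than the article's nonparametric variance functions: re-estimate under extra noise added at several levels lambda, fit a quadratic extrapolant in lambda, and evaluate it at lambda = -1.

```python
import numpy as np

rng = np.random.default_rng(3)

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=200):
    """SIMEX-corrected OLS slope of y on a covariate observed as
    w = x + u, with known measurement-error s.d. sigma_u."""
    def slope(x):
        c = np.cov(x, y)
        return c[0, 1] / c[0, 0]
    lams = [0.0] + list(lambdas)
    est = []
    for lam in lams:
        if lam == 0.0:
            est.append(slope(w))                 # naive estimate
        else:
            # average the slope over B re-simulations with added noise
            extra = sigma_u * np.sqrt(lam)
            sims = [slope(w + rng.normal(0.0, extra, len(w)))
                    for _ in range(B)]
            est.append(np.mean(sims))
    coef = np.polyfit(lams, est, 2)              # quadratic extrapolant
    return np.polyval(coef, -1.0)                # extrapolate to lambda = -1
```

As the abstract notes, the quadratic is only an approximation to the exact extrapolant, so the corrected estimate reduces, but does not fully remove, the attenuation bias.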

11.
12.
This study examined the method of simultaneous estimation of recombination frequency and parameters for a qualitative trait locus and compared the results with those of standard methods of linkage analysis. With both approaches we were able to detect linkage of an incompletely penetrant qualitative trait to highly polymorphic markers with recombination frequencies in the range of .00-.05. Our results suggest that detecting linkage at larger recombination frequencies may require larger data sets or large high-density families. When applied to all families without regard to informativeness of the family structure for linkage, analyses of simulated data could detect no advantage of simultaneous estimation over more traditional and much less time-consuming methods, either in detecting linkage, estimating frequency, refining estimates of parameters for the qualitative trait locus, or avoiding false evidence for linkage. However, the method of sampling affected results.

13.
Luo Y  Lin S 《Biometrics》2003,59(2):393-401
Genetic marker data have been increasingly incorporated into segregation analysis, as combined segregation and linkage analysis has been performed more frequently. In this article, we study the extent of information gains from incorporating marker data in segregation analysis, a topic that has not been investigated rigorously. Specifically, the current study investigates the influence of marker data on genetic model parameter estimation. A variance matrix criterion (as the inverse of the Fisher information matrix) and a relative entropy criterion (a measure of flatness of the expected log-likelihood surface) are used to quantify the information gains. Our results indicate that substantial information gain can be achieved with the incorporation of marker data. The amount of variance reduction increases as the heterozygosity of the linked marker increases and as the trait gets closer to the linked marker(s). Incorporation of marker data in larger pedigrees also yields greater information gains under both criteria. The effect of pedigree structure is also studied.

14.
BACKGROUND: In human pedigree data age at disease occurrence frequently is missing and is imputed using various methods. However, little is known about the performance of these methods when applied to families. In particular, there is little information about the level of agreement between imputed and actual values of temporal data and their effects on inferences. METHODS: We performed two evaluations of five imputation methods used to generate complete data for repositories to be shared by many investigators. Two of the methods are mean substitution methods, two are regression methods and one is a multiple imputation method based on one of the regression methods. To evaluate the methods, we randomly deleted the years of disease diagnosis of some men in a sample of pedigrees ascertained as part of a prostate cancer study. In the first evaluation, we used the five methods to impute the missing diagnosis years and evaluated agreement between imputed and actual values. In the second evaluation, we compared agreement between regression coefficients estimated using imputed diagnosis years with those estimated using the actual years. RESULTS/CONCLUSIONS: For both evaluations, we found optimal or near-optimal performance from a regression method that imputes a man's diagnosis year based on the year of birth and year of last observation of all affected men with complete data. The multiple imputation analogue of this method also performed well.
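The best-performing regression method might be sketched as follows: fit diagnosis year on year of birth and year of last observation among men with complete data, then predict the missing values. The variable names and the linear form are our own illustrative assumptions.

```python
import numpy as np

def regression_impute(birth, last_obs, diag):
    """Impute missing diagnosis years (NaN entries of diag) from a
    regression on year of birth and year of last observation, fitted
    to the complete cases."""
    birth = np.asarray(birth, float)
    last_obs = np.asarray(last_obs, float)
    diag = np.asarray(diag, float)
    obs = ~np.isnan(diag)                       # complete cases
    X = np.column_stack([np.ones(len(birth)), birth, last_obs])
    beta, *_ = np.linalg.lstsq(X[obs], diag[obs], rcond=None)
    out = diag.copy()
    out[~obs] = X[~obs] @ beta                  # predict the missing years
    return out
```

A multiple-imputation analogue, as mentioned in the abstract, would repeat the prediction step with residual noise added and with coefficients drawn from their sampling distribution.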

15.
The integrated squared error estimation of parameters
HEATHCOTE  C. R. 《Biometrika》1977,64(2):255-264

16.
17.
In many medical studies, markers are contingent on recurrent events and the cumulative markers are usually of interest. However, the recurrent event process is often interrupted by a dependent terminal event, such as death. In this article, we propose a joint modeling approach for analyzing marker data with informative recurrent and terminal events. This approach introduces a shared frailty to specify the explicit dependence structure among the markers, the recurrent, and terminal events. Estimation procedures are developed for the model parameters and the degree of dependence, and a prediction of the covariate‐specific cumulative markers is provided. The finite sample performance of the proposed estimators is examined through simulation studies. An application to a medical cost study of chronic heart failure patients from the University of Virginia Health System is illustrated.

18.
Most quantitative trait locus (QTL) mapping studies in plants have used designed mapping populations. As an alternative to traditional QTL mapping, in silico mapping via a mixed-model approach simultaneously exploits phenotypic, genotypic, and pedigree data already available in breeding programs. The statistical power of this in silico mapping method, however, remains unknown. Our objective was to evaluate the power of in silico mapping via a mixed-model approach in hybrid crops. We used maize (Zea mays L.) as a model species to study, by computer simulation, the influence of number of QTLs (20 or 80), heritability (0.40 or 0.70), number of markers (200 or 400), and sample size (600 or 2,400 hybrids). We found that the average power to detect QTLs ranged from 0.11 to 0.59 for a significance level of α = 0.01, and from 0.01 to 0.47 for α = 0.0001. The false discovery rate ranged from 0.22 to 0.74 for α = 0.01, and from 0.05 to 0.46 for α = 0.0001. As with designed mapping experiments, a large sample size, high marker density, high heritability, and small number of QTLs led to the highest power for in silico mapping via a mixed-model approach. The power to detect QTLs with large effects was greater than the power to detect QTLs with small effects. We conclude that gene discovery in hybrid crops can be initiated by in silico mapping. Finding an acceptable compromise, however, between the power to detect QTLs and the proportion of false QTLs would be necessary.

19.
Gene mapping and genetic epidemiology require large-scale computation of likelihoods based on human pedigree data. Although computation of such likelihoods has become increasingly sophisticated, fast calculations are still impeded by complex pedigree structures, by models with many underlying loci and by missing observations on key family members. The current paper introduces a new method of array factorization that substantially accelerates linkage calculations with large numbers of markers. This method is not limited to nuclear families or to families with complete phenotyping. Vectorization and parallelization are two general-purpose hardware techniques for accelerating computations. These techniques can assist in the rapid calculation of genetic likelihoods. We describe our experience using both of these methods with the existing program MENDEL. A vectorized version of MENDEL was run on an IBM 3090 supercomputer. A parallelized version of MENDEL was run on parallel machines of different architectures and on a network of workstations. Applying these revised versions of MENDEL to two challenging linkage problems yields substantial improvements in computational speed.

20.
Dupuis JA  Joachim J 《Biometrics》2006,62(3):706-712
We consider the problem of estimating the number of species of an animal community. It is assumed that it is possible to draw up a list of species liable to be present in this community. Data are collected from quadrat sampling. Models considered in this article separate the assumptions related to the experimental protocol and those related to the spatial distribution of species in the quadrats. Our parameterization enables us to incorporate prior information on the presence, detectability, and spatial density of species. Moreover, we elaborate procedures to build the prior distributions on these parameters from information furnished by external data. A simulation study is carried out to examine the influence of different priors on the performances of our estimator. We illustrate our approach by estimating the number of nesting bird species in a forest.

