Similar articles
Found 20 similar articles (search time: 265 ms)
1.
Misspecification of relationships and of genotype data can cause problems in linkage analyses based on genome-scan data. Previous reports have focused on pairwise relationships and a simple error model. This article considers the increased information available from the joint analysis of trios of individuals, integrating this analysis with an error model that allows for the most common genotyping errors. Given observed marker phenotypes in a genome scan, computational methods are outlined both for likelihoods of relationships and for the posterior probabilities of underlying genotypes. The methods are applied to examples from two real data sets: one has been previously well analyzed, and, hence, Mendelian inconsistencies have been removed; the other typifies the pedigree and genotype errors encountered in the initial analyses of a study. It is demonstrated that the coupling of relationship inference and error detection is quite effective, that the error model is computationally practical, and that data on a third relative can often clarify relationships.

2.
Prior to performance of linkage analysis, elimination of all Mendelian inconsistencies in the pedigree data is essential. Often, identification of erroneous genotypes by visual inspection can be very difficult and time consuming. In fact, sometimes the errors are not recognized until the stage of running linkage-analysis software. The effort then required to find the erroneous genotypes and to cross-reference pedigree and marker data that may have been recoded and renumbered can be not only tedious but also quite daunting, in the case of very large pedigrees. We have implemented four error-checking algorithms in a new computer program, PedCheck, which will assist researchers in identifying all Mendelian inconsistencies in pedigree data and will provide them with useful and detailed diagnostic information to help resolve the errors. Our program, which uses many of the algorithms implemented in VITESSE, handles large data sets quickly and efficiently, accepts a variety of input formats, and offers various error-checking algorithms that match the subtlety of the pedigree error. These algorithms range from simple parent-offspring-compatibility checks to a single-locus likelihood-based statistic that identifies and ranks the individuals most likely to be in error. We use various real data sets to illustrate the power and effectiveness of our program.
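
The simplest kind of check mentioned in this abstract, a parent-offspring compatibility test at a single locus, can be sketched as follows. This is only an illustration of the general idea, not PedCheck's implementation; the function names and the data layout (dictionaries of allele pairs, trios as ID tuples) are assumptions made for the example.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's unordered genotype can be formed by taking
    one allele from the father and one allele from the mother."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((paternal, maternal))) == child_sorted
        for paternal, maternal in product(father, mother)
    )


def find_inconsistent_trios(genotypes, trios):
    """Return the child IDs of trios that violate Mendelian inheritance.

    genotypes: dict mapping individual ID -> (allele, allele)   (hypothetical layout)
    trios:     list of (father_id, mother_id, child_id) tuples
    Trios in which any member is untyped are skipped, not flagged.
    """
    flagged = []
    for father_id, mother_id, child_id in trios:
        ids = (father_id, mother_id, child_id)
        if any(i not in genotypes for i in ids):
            continue
        father, mother, child = (genotypes[i] for i in ids)
        if not is_mendelian_consistent(father, mother, child):
            flagged.append(child_id)
    return flagged
```

A check of this kind only finds errors that produce an outright incompatibility within a trio; subtler errors need likelihood-based statistics such as those the abstract describes.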

3.
The pedigree and genotype data from the Framingham Heart Study were examined for errors. Errors in 21 of 329 pedigrees were detected with the program PREST, and of these the errors in 16 pedigrees were resolved. Genotyping errors were then detected with SIMWALK2. Five Mendelian errors were found following the pedigree corrections. Double-recombinant errors were more common, with 142 being detected at mistyping probabilities of 0.25 or greater.

4.
Johnson PC, Haydon DT. Genetics 2007, 175(2):827-842
The importance of quantifying and accounting for stochastic genotyping errors when analyzing microsatellite data is increasingly being recognized. This awareness is motivating the development of data analysis methods that not only take errors into consideration but also recognize the difference between two distinct classes of error, allelic dropout and false alleles. Currently methods to estimate rates of allelic dropout and false alleles depend upon the availability of error-free reference genotypes or reliable pedigree data, which are often not available. We have developed a maximum-likelihood-based method for estimating these error rates from a single replication of a sample of genotypes. Simulations show it to be both accurate and robust to modest violations of its underlying assumptions. We have applied the method to estimating error rates in two microsatellite data sets. It is implemented in a computer program, Pedant, which estimates allelic dropout and false allele error rates with 95% confidence regions from microsatellite genotype data and performs power analysis. Pedant is freely available at http://www.stats.gla.ac.uk/~paulj/pedant.html.

5.
Inferring the haplotypes of the members of a pedigree from their genotypes has been extensively studied. However, most studies do not consider genotyping errors and de novo mutations. In this paper, we study how to infer haplotypes from genotype data that may contain genotyping errors, de novo mutations, and missing alleles. We assume that there are no recombinants in the genotype data, which is usually true for tightly linked markers. We introduce a combinatorial optimization problem, called haplotype configuration with mutations and errors (HCME), which calls for haplotype configurations consistent with the given genotypes that incur no recombinants and require the minimum number of mutations and errors. HCME is NP-hard. To solve the problem, we propose a heuristic algorithm, the core of which is an integer linear program (ILP) using the system of linear equations over Galois field GF(2). Our algorithm can detect and locate genotyping errors that cannot be detected by simply checking the Mendelian law of inheritance. The algorithm also offers error correction in genotypes/haplotypes rather than just detecting inconsistencies and deleting the involved loci. Our experimental results show that the algorithm can infer haplotypes with a very high accuracy and recover 65%-94% of genotyping errors depending on the pedigree topology.

6.
Sun L, Wilder K, McPeek MS. Human Heredity 2002, 54(2):99-110
Accurate information on the relationships among individuals in a study is critical for valid linkage analysis. We extend the MLLR, EIBD, AIBS and IBS tests for detection of misspecified relationships to a broader range of relative pairs, and we improve the two-stage screening procedure for analyzing large data sets. We have developed software, PREST, which calculates the test statistics and performs the corresponding hypothesis tests for relationship misclassification in general outbred pedigrees. When a potential pedigree error is detected, our companion program, ALTERTEST, can be used to determine which relationships are compatible with the genotype data. Both programs are now freely available on the web.

7.
A simulation module is built into the software package COLONY to simulate marker genotype data of individuals with a predefined parentage and sibship structure. The simulated data can then be used to compare the accuracy, robustness and computational efficiency of different methods for sibship and parentage reconstruction, to examine the impact of different parameter options in a software package on its accuracy and computational efficiency, and to assess the information sufficiency of a given set of markers for a sibship and parentage analysis. This computer note describes the method used for simulating genotype data with a pedigree and its possible applications. The method can quickly generate genotype data for a one‐ or two‐generation pedigree of virtually any complexity with up to 30k offspring, at up to 30k codominant or dominant loci with an arbitrary degree of linkage and a user‐defined mistyping rate. The data can be fed directly into the COLONY program for analysis by three sibship and parentage reconstruction methods and can also be imported into other programs such as Excel and R. With slight modification, the data can be analysed by other relationship analysis software.

8.
In a genetic analysis of a polymorphic system, differences between the observed type of an individual and that expected from the parental types can arise either from an incorrect model or from pedigree errors. Such pedigree errors can cause severe difficulties in studies of the mode of inheritance of a novel polymorphic system. A method is proposed which overcomes the problem by including sire and dam error rates explicitly in the genetic model. The error rates are estimated by maximum likelihood, and likelihood ratio tests used to compare different models or estimates from different data sets. The proposals are applied to a study of the inheritance of the bovine serum AmI amylases.

9.
A user-friendly Hypercard interface for human linkage analysis (Cited by: 3; self-citations: 0; citations by others: 3)
The availability of a large number of highly informative genetic markers has made human linkage analysis faster and easier to perform. However, current linkage analysis software does not provide an organizational database into which a large body of linkage data can be easily stored and manipulated. This manual entry and editing of linkage data is often time consuming and prone to typing errors. In addition, the large number of alleles in many of these markers must be reduced in order to perform linkage analysis with multiple loci across large genetic distances. This reduction in allele number is often difficult and confusing, especially in large pedigrees. We have taken advantage of the Macintosh-based Hypercard program to develop an interface with which linkage data can be easily stored, retrieved and edited. For each family, the components of the pedigree, including ID numbers, sex and affection status, only need to be entered once. The program (Linkage Interface) retrieves this information each time the data from a new polymorphic marker is entered. Linkage Interface has flexible editing capabilities that allow the user to change any portion of the pedigree, including the addition or deletion of family members, without affecting previously entered genotype data. Linkage Interface can also analyze both the pedigree and marker data and will detect any inconsistencies in inheritance patterns. In addition, the program can reduce the number of alleles for a polymorphic marker. Linkage Interface will then compare the ‘reduced’ data to the original marker data and assists in maintaining all informative meioses by pointing out which meioses have become non-informative. Once polymorphic marker data are entered, the pedigree data, including the marker genotypes, are easily exported to a text file. This text file can be transferred to an IBM-compatible computer for direct use with DOS-based linkage programs.

10.
Error detection for genetic data, using likelihood methods. (Cited by: 6; self-citations: 3; citations by others: 3)
As genetic maps become denser, the effect of laboratory typing errors becomes more serious. We review a general method for detecting errors in pedigree genotyping data that is a variant of the likelihood-ratio test statistic. It pinpoints individuals and loci with relatively unlikely genotypes. Power and significance studies using Monte Carlo methods are shown by using simulated data with pedigree structures similar to the CEPH pedigrees and a larger experimental pedigree used in the study of idiopathic dilated cardiomyopathy (DCM). The studies show the index detects errors for small values of theta with high power and an acceptable false positive rate. The method was also used to check for errors in DCM laboratory pedigree data and to estimate the error rate in CEPH-chromosome 6 data. The errors flagged by our method in the DCM pedigree were confirmed by the laboratory. The results are consistent with estimated false-positive and false-negative rates obtained using simulation.

11.
Studies of the quantitative genetics of natural populations have contributed greatly to evolutionary biology in recent years. However, while pedigree data required are often uncertain (i.e. incomplete and partly erroneous) and limited, means to evaluate the effects of such uncertainties have not been developed. We have therefore developed a general framework for power and sensitivity analyses of such studies. We propose that researchers first generate a set of pedigree data that they wish to use in a quantitative genetic study, as well as data regarding errors that occur in that pedigree. This pedigree is then permuted using the data regarding errors to generate hypothetical 'true' and 'assumed' pedigrees that differ so as to mimic pedigree errors that might occur in the study system under consideration. Phenotypic data are then simulated across the true pedigree (according to user-defined genetic and environmental covariance structures), before being analysed with standard quantitative genetic techniques in conjunction with the 'assumed' pedigree data. To illustrate this approach, we conducted power and sensitivity analyses in a well-known study of Soay sheep (Ovis aries). We found that, although the estimation of simple genetic (co)variance structures is fairly robust to pedigree errors, some potentially serious biases were detected under more complex scenarios involving maternal effects. Power analyses also showed that this study system provides high power to detect heritabilities as low as about 0.09. Given this range of results, we suggest that such power and sensitivity analyses could greatly complement empirical studies, and we provide the computer program PEDANTICS to aid in their application.

12.
Almudevar A. Biometrics 2001, 57(3):757-763
The problem of assessing the variability in pedigree reconstruction using DNA markers is considered for the special case of single generation samples with no parents present. Error in pedigree reconstruction is measured through a metric imposed on the space of partitions of the individuals into family groups. A confidence set can therefore be taken to be a neighborhood of a point estimate, analogous to the estimation of a parameter in Euclidean space. The coverage probability is estimated using bootstrap techniques. Although the distributional properties of the sample depend on the population genotype frequencies, these are in practice usually unknown. Confidence sets conditioned on a statistic approximately sufficient for these frequencies are compared with confidence sets obtained by substituting frequency estimates directly into the sampling distribution. In two simulation studies, the difference is found to be of some consequence.

13.
Inference of haplotypes is important in genetic epidemiology studies. However, all large genotype data sets have errors due to the use of inexpensive genotyping machines that are fallible and shortcomings in genotyping scoring softwares, which can have an enormous impact on haplotype inference. In this article, we propose two novel strategies to reduce the impact induced by genotyping errors in haplotype inference. The first method makes use of double sampling. For each individual, the “GenoSpectrum” that consists of all possible genotypes and their corresponding likelihoods are computed. The second method is a genotype clustering algorithm based on multi‐genotyping data, which also assigns a “GenoSpectrum” for each individual. We then describe two hybrid EM algorithms (called DS‐EM and MG‐EM) that perform haplotype inference based on “GenoSpectrum” of each individual obtained by double sampling and multi‐genotyping data. Both simulated data sets and a quasi real‐data set demonstrate that our proposed methods perform well in different situations and outperform the conventional EM algorithm and the HMM algorithm proposed by Sun, Greenwood, and Neal (2007, Genetic Epidemiology 31 , 937–948) when the genotype data sets have errors.

14.
This paper introduces a likelihood method of estimating ethnic admixture that uses individuals, pedigrees, or a combination of individuals and pedigrees. For each founder of a pedigree, admixture proportions are calculated by conditioning on the pedigree-wide genotypes at all ancestry-informative markers. These estimates are then propagated down the pedigree to the nonfounders by a simple averaging process. The large-sample standard errors of the founders' proportions can be similarly transformed into standard errors for the admixture proportions of the descendants. These standard errors are smaller than the corresponding standard errors when each individual is treated independently. Both hard and soft information on a founder's ancestry can be accommodated in this scheme, which has been implemented in the genetic software package Mendel. The utility of the method is demonstrated on simulated data and a real data example involving Mexican families of mixed Amerindian and Spanish ancestry.
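
The propagation step described in this abstract can be illustrated with a small sketch. It assumes that "simple averaging" means each nonfounder's admixture proportions are the mean of its two parents' proportions; the abstract does not spell this out, and the function name and data layout below are hypothetical rather than taken from Mendel.

```python
def propagate_admixture(founder_props, parents, order):
    """Propagate admixture proportions from founders to nonfounders.

    founder_props: dict founder ID -> list of ancestry proportions (summing to 1)
    parents:       dict nonfounder ID -> (father ID, mother ID)
    order:         nonfounder IDs listed so that parents precede their children
    Assumption: a child's proportions are the average of its parents' proportions.
    """
    props = {ind: list(p) for ind, p in founder_props.items()}
    for child in order:
        father, mother = parents[child]
        props[child] = [
            (pf + pm) / 2.0 for pf, pm in zip(props[father], props[mother])
        ]
    return props
```

For example, with founders A = [1.0, 0.0], B = [0.0, 1.0] and C = [1.0, 0.0], a child D of A and B receives [0.5, 0.5], and a child E of D and C receives [0.75, 0.25].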

15.
With the widespread availability of SNP genotype data, there is great interest in analyzing pedigree haplotype data. Intermarker linkage disequilibrium for microsatellite markers is usually low due to their physical distance; however, for dense maps of SNP markers, there can be strong linkage disequilibrium between marker loci. Linkage analysis (parametric and nonparametric) and family-based association studies are currently being carried out using dense maps of SNP marker loci. Monte Carlo methods are often used for both linkage and association studies; however, to date there are no programs available which can generate haplotype and/or genotype data consisting of a large number of loci for pedigree structures. SimPed is a program that quickly generates haplotype and/or genotype data for pedigrees of virtually any size and complexity. Marker data either in linkage disequilibrium or equilibrium can be generated for greater than 20,000 diallelic or multiallelic marker loci. Haplotypes and/or genotypes are generated for pedigree structures using specified genetic map distances and haplotype and/or allele frequencies. The simulated data generated by SimPed is useful for a variety of purposes, including evaluating methods that estimate haplotype frequencies for pedigree data, evaluating type I error due to intermarker linkage disequilibrium and estimating empirical p values for linkage and family-based association studies.

16.
Pedigree reconstruction using genotypic markers has become an important tool for the study of natural populations. The nonstandard nature of the underlying statistical problems has led to the necessity of developing specialized statistical and computational methods. In this article, a new version of pedigree reconstruction tools (PRT 2.0) is presented. The software implements algorithms proposed in Almudevar & Field (Journal of Agricultural Biological and Environmental Statistics, 4, 1999, 136) and Almudevar (Biometrics, 57, 2001a, 757) for the reconstruction of single generation sibling groups (SG). A wider range of enumeration algorithms is included, permitting improved computational performance. In particular, an iterative version of the algorithm designed for larger samples is included in a fully automated form. The new version also includes expanded simulation utilities, as well as extensive reporting, including half-sibling compatibility, parental genotype estimates and flagging of potential genotype errors. A number of alternative algorithms are described and demonstrated. A comparative discussion of the underlying methodologies is presented. Although important aspects of this problem remain open, we argue that a number of methodologies including maximum likelihood estimation (COLONY 1.2 and 2.0) and the set cover formulation (KINALYZER) exhibit undesirable properties in the sibling reconstruction problem. There is considerable evidence that large sets of individuals not genetically excluded as siblings can be inferred to be a true sibling group, but it is also true that unrelated individuals may be genetically compatible with a true sibling group by chance. Such individuals may be identified on a statistical basis. PRT 2.0, based on these sound statistical principles, is able to efficiently match or exceed the highest reported accuracy rates, particularly for larger SG. 
The new version is available at http://www.urmc.rochester.edu/biostat/people/faculty/almudevar.cfm.

17.
The purpose of this work is the development of a family-based association test that allows for random genotyping errors and missing data and makes use of information on affected and unaffected pedigree members. We derive the conditional likelihood functions of the general nuclear family for the following scenarios: complete parental genotype data and no genotyping errors; only one genotyped parent and no genotyping errors; no parental genotype data and no genotyping errors; and no parental genotype data with genotyping errors. We find maximum likelihood estimates of the marker locus parameters, including the penetrances and population genotype frequencies under the null hypothesis that all penetrance values are equal and under the alternative hypothesis. We then compute the likelihood ratio test. We perform simulations to assess the adequacy of the central chi-square distribution approximation when the null hypothesis is true. We also perform simulations to compare the power of the TDT and this likelihood-based method. Finally, we apply our method to 23 SNPs genotyped in nuclear families from a recently published study of idiopathic scoliosis (IS). Our simulations suggest that this likelihood ratio test statistic follows a central chi-square distribution with 1 degree of freedom under the null hypothesis, even in the presence of missing data and genotyping errors. The power comparison shows that this likelihood ratio test is more powerful than the original TDT for the simulations considered. For the IS data, the marker rs7843033 shows the most significant evidence for our method (p = 0.0003), which is consistent with a previous report, which found rs7843033 to be the 2nd most significant TDTae p value among a set of 23 SNPs.
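
As a small illustration of the final step in this abstract, the sketch below turns two maximized log-likelihoods into a likelihood ratio statistic and a p-value from the central chi-square distribution with 1 degree of freedom. It does not reproduce the paper's conditional likelihoods; the function name and the example log-likelihood values are hypothetical. The p-value uses the identity that the upper tail of a 1-df chi-square at x equals erfc(sqrt(x / 2)).

```python
import math


def likelihood_ratio_test(loglik_null, loglik_alt):
    """Return (statistic, p_value) for a 1-degree-of-freedom likelihood ratio test.

    loglik_null: maximized log-likelihood under the null hypothesis
    loglik_alt:  maximized log-likelihood under the alternative hypothesis
    """
    # The statistic is twice the log-likelihood difference; clamp tiny negative
    # values that can arise from numerical optimization error.
    statistic = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

With hypothetical log-likelihoods of -100.0 (null) and -98.0 (alternative), the statistic is 4.0 and the p-value is roughly 0.046.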

18.
Hao K, Li C, Rosenow C, Hung Wong W. Genomics 2004, 84(4):623-630
Currently, most analytical methods assume all observed genotypes are correct; however, it is clear that errors may reduce statistical power or bias inference in genetic studies. We propose procedures for estimating error rate in genetic analysis and apply them to study the GeneChip Mapping 10K array, which is a technology that has recently become available and allows researchers to survey over 10,000 SNPs in a single assay. We employed a strategy to estimate the genotype error rate in pedigree data. First, the "dose-response" reference curve between error rate and the observable error number were derived by simulation, conditional on given pedigree structures and genotypes. Second, the error rate was estimated by calibrating the number of observed errors in real data to the reference curve. We evaluated the performance of this method by simulation study and applied it to a data set of 30 pedigrees genotyped using the GeneChip Mapping 10K array. This method performed favorably in all scenarios we surveyed. The dose-response reference curve was monotone and almost linear with a large slope. The method was able to estimate accurately the error rate under various pedigree structures and error models and under heterogeneous error rates. Using this method, we found that the average genotyping error rate of the GeneChip Mapping 10K array was about 0.1%. Our method provides a quick and unbiased solution to address the genotype error rate in pedigree data. It behaves well in a wide range of settings and can be easily applied in other genetic projects. The robust estimation of genotyping error rate allows us to estimate power and sample size and conduct unbiased genetic tests. The GeneChip Mapping 10K array has a low overall error rate, which is consistent with the results obtained from alternative genotyping assays.
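
The calibration step in this abstract, reading an error rate off a monotone reference curve given an observed error count, might look like the sketch below. It assumes the curve is already available as simulated (error rate, expected observable error count) pairs and uses piecewise-linear interpolation, which is an illustrative choice and not necessarily what the authors did. The function name and all numbers in the example are hypothetical.

```python
def estimate_error_rate(reference_curve, observed_errors):
    """Invert a monotone reference curve by piecewise-linear interpolation.

    reference_curve: list of (error_rate, expected_observable_errors) pairs,
                     assumed monotone increasing in both coordinates
    observed_errors: number of errors observed in the real data
    """
    points = sorted(reference_curve)
    for (rate_lo, count_lo), (rate_hi, count_hi) in zip(points, points[1:]):
        if count_lo <= observed_errors <= count_hi:
            if count_hi == count_lo:
                return rate_lo  # flat segment: report its lower error rate
            fraction = (observed_errors - count_lo) / (count_hi - count_lo)
            return rate_lo + fraction * (rate_hi - rate_lo)
    raise ValueError("observed count lies outside the reference curve")
```

With a made-up curve of (0.0, 0.0), (0.001, 10.0), (0.002, 20.0), an observed count of 15 maps to an estimated error rate of about 0.0015.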

19.
Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies, and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because of their variance between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.

20.

Background

Here we present two new computer tools, PREMIM and EMIM, for the estimation of parental and child genetic effects, based on genotype data from a variety of different child-parent configurations. PREMIM allows the extraction of child-parent genotype data from standard-format pedigree data files, while EMIM uses the extracted genotype data to perform subsequent statistical analysis. The use of genotype data from the parents as well as from the child in question allows the estimation of complex genetic effects such as maternal genotype effects, maternal-foetal interactions and parent-of-origin (imprinting) effects. These effects are estimated by EMIM, incorporating chosen assumptions such as Hardy-Weinberg equilibrium or exchangeability of parental matings as required.

Results

In application to simulated data, we show that the inference provided by EMIM is essentially equivalent to that provided by alternative (competing) software packages such as MENDEL and LEM. However, PREMIM and EMIM (used in combination) considerably outperform MENDEL and LEM in terms of speed and ease of execution.

Conclusions

Together, EMIM and PREMIM provide easy-to-use command-line tools for the analysis of pedigree data, giving unbiased estimates of parental and child genotype relative risks.
