Similar literature
 Found 20 similar articles; search took 26 ms
1.
Pedigree reconstruction using genotypic markers has become an important tool for the study of natural populations. The nonstandard nature of the underlying statistical problems has led to the necessity of developing specialized statistical and computational methods. In this article, a new version of pedigree reconstruction tools (PRT 2.0) is presented. The software implements algorithms proposed in Almudevar & Field (Journal of Agricultural Biological and Environmental Statistics, 4, 1999, 136) and Almudevar (Biometrics, 57, 2001a, 757) for the reconstruction of single generation sibling groups (SG). A wider range of enumeration algorithms is included, permitting improved computational performance. In particular, an iterative version of the algorithm designed for larger samples is included in a fully automated form. The new version also includes expanded simulation utilities, as well as extensive reporting, including half-sibling compatibility, parental genotype estimates and flagging of potential genotype errors. A number of alternative algorithms are described and demonstrated. A comparative discussion of the underlying methodologies is presented. Although important aspects of this problem remain open, we argue that a number of methodologies including maximum likelihood estimation (COLONY 1.2 and 2.0) and the set cover formulation (KINALYZER) exhibit undesirable properties in the sibling reconstruction problem. There is considerable evidence that large sets of individuals not genetically excluded as siblings can be inferred to be a true sibling group, but it is also true that unrelated individuals may be genetically compatible with a true sibling group by chance. Such individuals may be identified on a statistical basis. PRT 2.0, based on these sound statistical principles, is able to efficiently match or exceed the highest reported accuracy rates, particularly for larger SG. 
The new version is available at http://www.urmc.rochester.edu/biostat/people/faculty/almudevar.cfm.

2.
MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers and they can be used to impute missing data and check genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members will be excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition and ligation technique is used to estimate haplotype frequencies based on the inferred haplotype configurations through the first two steps. Only compatible haplotype configurations with haplotypes having frequencies greater than a threshold are retained. RESULTS: We test the effectiveness and the efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees. In this case, almost all of the families have one unique haplotype configuration. 
In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.
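
To make the size of the search space concrete, here is a minimal sketch of the haplotype pairs compatible with one individual's unphased genotype, before any pedigree rules are applied. It is not HAPLORE's rule set, which the abstract does not spell out; the function name and the integer allele coding are assumptions made for illustration.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele, allele) tuples, one per locus (hypothetical
    integer allele codes). Each heterozygous locus can be phased two ways,
    so h heterozygous loci give 2**(h - 1) distinct pairs (one pair if h == 0).
    """
    pairs = set()
    # Try both orientations at every locus, then drop mirror-image duplicates.
    for phase in product(*[((a, b), (b, a)) for a, b in genotype]):
        hap1 = tuple(p[0] for p in phase)
        hap2 = tuple(p[1] for p in phase)
        pairs.add(tuple(sorted([hap1, hap2])))
    return pairs

# Two heterozygous loci and one homozygous locus.
example_pairs = compatible_haplotype_pairs([(1, 2), (1, 2), (3, 3)])
```

Logic rules and elimination steps of the kind described above would then prune this set using the genotypes of relatives.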

3.
A number of procedures have been developed that allow the genetic parameters of natural populations to be estimated using relationship information inferred from marker data rather than known pedigrees. Three published approaches are available: the regression, pair‐wise likelihood and Markov Chain Monte Carlo (MCMC) sib‐ship reconstruction methods. These were applied to body weight and molecular data collected from the Soay sheep population of St. Kilda, which has a previously determined pedigree. The regression and pair‐wise likelihood approaches do not specify an exact pedigree and yielded unreliable heritability estimates that were sensitive to alteration of the fixed effects. The MCMC method, which specifies a pedigree prior to heritability estimation, yielded results closer to those determined using the known pedigree. In populations of low average relationship, such as the Soay sheep population, determination of a reliable pedigree is more useful than indirect approaches that do not specify a pedigree.

4.
For wildlife populations, it is often difficult to determine biological parameters that indicate breeding patterns and population mixing, but knowledge of these parameters is essential for effective management. A pedigree encodes the relationship between individuals and can provide insight into the dynamics of a population over its recent history. Here, we present a method for the reconstruction of pedigrees for wild populations of animals that live long enough to breed multiple times over their lifetime and that have complex or unknown generational structures. Reconstruction was based on microsatellite genotype data along with ancillary biological information: sex and observed body size class as an indicator of relative age of individuals within the population. Using body size‐class data to infer relative age has not been considered previously in wildlife genealogy and provides a marked improvement in accuracy of pedigree reconstruction. Body size‐class data are particularly useful for wild populations because it is much easier to collect noninvasively than absolute age data. This new pedigree reconstruction system, PR‐genie, performs reconstruction using maximum likelihood with optimization driven by the cross‐entropy method. We demonstrated pedigree reconstruction performance on simulated populations (comparing reconstructed pedigrees to known true pedigrees) over a wide range of population parameters and under assortative and intergenerational mating schema. Reconstruction accuracy increased with the presence of size‐class data and as the amount and quality of genetic data increased. We provide recommendations as to the amount and quality of data necessary to provide insight into detailed familial relationships in a wildlife population using this pedigree reconstruction technique.

5.
6.
The calculation of multipoint likelihoods of pedigree data is crucial for extracting the full available information needed for both parametric and nonparametric linkage analysis. Recent mathematical advances in both the Elston-Stewart and Lander-Green algorithms for computing exact multipoint likelihoods of pedigree data have enabled researchers to analyze data sets containing more markers and more individuals both faster and more efficiently. This paper presents novel algorithms that further extend the computational boundary of the Elston-Stewart algorithm. They have been implemented into the software package VITESSE v. 2 and are shown to be several orders of magnitude faster than the original implementation of the Elston-Stewart algorithm in VITESSE v. 1 on a variety of real pedigree data. VITESSE v. 2 was faster by a factor ranging from 168 to over 1,700 on these data sets, thus making a qualitative difference in the analysis. The main algorithm is based on the faster computation of the conditional probability of a component nuclear family within the pedigree by summing over the joint genotypes of the children instead of the parents as done in the VITESSE v. 1. This change in summation allows the parent-child transmission part of the calculation to be not only computed for each parent separately, but also for each locus separately by using inheritance vectors as is done in the Lander-Green algorithm. Computing both of these separately can lead to substantial computational savings. The use of inheritance vectors in the nuclear family calculation represents a partial synthesis of the techniques of the Lander-Green algorithm into the Elston-Stewart algorithm. In addition, the technique of local set recoding is introduced to further reduce the complexity of the nuclear family computation. These new algorithms, however, are not universally faster on all types of pedigree data compared to the method implemented in VITESSE v. 1 of summing over the parents. 
Therefore, a hybrid algorithm is introduced which combines the strength of both summation methods by using a numerical heuristic to decide which of the two to use for a given nuclear family within the pedigree and is shown to be faster than either method on its own. Finally, this paper discusses various complexity issues regarding both the Elston-Stewart and Lander-Green algorithms and possible future directions of further synthesis.

7.
The problem of ascertainment in segregation analysis arises when families are selected for study through ascertainment of affected individuals. In this case, ascertainment must be corrected for in data analysis. However, methods for ascertainment correction are not available for many common sampling schemes, e.g., sequential sampling of extended pedigrees (except in the case of "single" selection). Concerns about whether ascertainment correction is even required for large pedigrees, about whether and how multiple probands in the same pedigree can be taken into account properly, and about how to apply sequential sampling strategies have occupied many investigators in recent years. We address these concerns by reconsidering a central issue, namely, how to handle pedigree structure (including size). We introduce a new distinction, between sampling in such a way that observed pedigree structure does not depend on which pedigree members are probands (proband-independent [PI] sampling) and sampling in such a way that observed pedigree structure does depend on who are the probands (proband-dependent [PD] sampling). This distinction corresponds roughly (but not exactly) to the distinction between fixed-structure and sequential sampling. We show that conditioning on observed pedigree structure in ascertained data sets obtained under PD sampling is not in general correct (with the exception of "single" selection), while PI sampling of pedigree structures larger than simple sibships is generally not possible. Yet, in practice one has little choice but to condition on observed pedigree structure. We conclude that the problem of genetic modeling in ascertained data sets is, in most situations, literally intractable. We recommend that future efforts focus on the development of robust approximate approaches to the problem.

8.
A procedure is described for estimating allele frequencies in indigenous populations of Siberia using phenotype data not only for pure-blood representatives of the ethnic groups examined, but also for the descendants of mixed marriages. Implementation of the method requires reconstruction of the pedigree structure for the sample examined. Including data on descendants of mixed marriages in the analysis increases the information content of the sample and decreases the variance of the estimates obtained. The advantages of the method are illustrated using the example of the Tundra Nentsy, for whom it was shown that the variance of the estimated blood group allele frequencies can be reduced by a factor of approximately 1.5.

9.
This study used simulations and a known two-generation pedigree of chinook salmon (Oncorhynchus tshawytscha) to evaluate the effect of full sibs of parents on pedigree reconstruction. Parentage analysis was conducted on 100 parent pair-offspring relationships from pedigrees with unrelated (simulation) and related (chinook salmon) candidate parents. Parentage assignment success for the chinook salmon was lower than in the simulated populations. For example, the six most variable loci (mean H_E = 0.87) provided a mean of 97% unambiguous assignments in the simulated population and 67% unambiguous assignments for the chinook salmon. Estimates of the pairwise relatedness coefficient (r_xy) for most nonexcluded false parents and true parents of chinook salmon offspring exceeded 0.50. These results support the conclusion that closely related candidate parents decrease the power of genetic markers for pedigree reconstruction based on exclusion. Ambiguous parentage may be resolved using single parent- and parent pair-offspring likelihood analysis; however, these methods should be used with caution and they are not replacements for using more loci when many candidate parents are full sibs.
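
Exclusion-based parentage assignment, as discussed in the abstract above, can be sketched in a few lines. This is an illustrative sketch only: it assumes codominant markers, no mutation and no genotyping error, and the function names and toy genotypes are hypothetical rather than taken from the study.

```python
from itertools import product

def locus_compatible(offspring, mother, father):
    """True if the offspring genotype can be formed from one maternal
    and one paternal allele at a single codominant locus."""
    a, b = offspring
    return (a in mother and b in father) or (b in mother and a in father)

def nonexcluded_pairs(offspring, mothers, fathers):
    """Return (mother_id, father_id) pairs not excluded at any locus.

    Genotypes are dicts mapping locus name -> (allele, allele).
    """
    kept = []
    for (m_id, m), (f_id, f) in product(mothers.items(), fathers.items()):
        if all(locus_compatible(offspring[loc], m[loc], f[loc]) for loc in offspring):
            kept.append((m_id, f_id))
    return kept

# Toy data: F1 and F2 share alleles, as full sibs often do.
offspring = {"L1": (1, 2), "L2": (3, 3)}
mothers = {"M1": {"L1": (1, 4), "L2": (3, 5)}, "M2": {"L1": (1, 1), "L2": (5, 5)}}
fathers = {"F1": {"L1": (2, 2), "L2": (3, 6)}, "F2": {"L1": (2, 4), "L2": (3, 3)}}
pairs = nonexcluded_pairs(offspring, mothers, fathers)
```

In this toy case two parent pairs remain, which is the kind of ambiguity that related candidate parents produce when too few loci are typed.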

10.
There are several measures available to describe the genetic variability of populations. The average inbreeding coefficient of a population based on pedigree information is a frequently chosen option. Due to the developments in molecular genetics it is also possible to calculate inbreeding coefficients based on genetic marker information. A simulation study was carried out involving ten sires and 50 dams. The animals were mated over a period of 20 discrete generations. The population size was kept constant. Different situations with regard to the level of polymorphism and initial allele frequencies and mating scheme (random mating, avoidance of full sib mating, avoidance of full sib and half sib mating) were considered. Pedigree inbreeding coefficients of the last generation using full pedigree or 10, 5 and 2 generations of the pedigree were calculated. Marker inbreeding coefficients based on different sets of microsatellite loci were also investigated. Under random mating, pedigree-inbreeding coefficients are clearly more closely related to true autozygosity (i.e., the actual proportion of loci with alleles identical by descent) than marker-inbreeding coefficients. If mating is not random, the demands on the quality and quantity of pedigree records increase. Greater attention must be paid to the correct parentage of the animals.
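
For readers unfamiliar with pedigree inbreeding coefficients, the sketch below computes them with the standard recursive kinship definition (an individual's inbreeding coefficient equals the kinship of its parents). It is not the simulation code of the study; the function name and the requirement that parents are listed before their offspring are assumptions of this example.

```python
from functools import lru_cache

def inbreeding_coefficients(pedigree):
    """pedigree: dict id -> (sire, dam), None for unknown parents.

    Assumes dict insertion order lists parents before their offspring.
    Unknown parents are treated as unrelated, non-inbred founders.
    """
    order = {ind: i for i, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0
        if a == b:
            sire, dam = pedigree[a]
            return 0.5 * (1.0 + kinship(sire, dam))
        if order[a] < order[b]:
            a, b = b, a  # always expand the later-born individual
        sire, dam = pedigree[a]
        return 0.5 * (kinship(sire, b) + kinship(dam, b))

    return {ind: kinship(*pedigree[ind]) for ind in pedigree}

# Full-sib mating: E is the offspring of full sibs C and D.
ped = {"A": (None, None), "B": (None, None),
       "C": ("A", "B"), "D": ("A", "B"), "E": ("C", "D")}
F = inbreeding_coefficients(ped)
```

Truncating the pedigree to fewer generations, as in the comparison above, corresponds to setting the parents of older ancestors to None, which can only lower the computed coefficients.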

11.
Pairwise analysis of HinfI/33.6 DNA fingerprints from a total of 153 Irish greyhounds of known pedigree was used to determine band-share estimates of unrelated, first-degree and second-degree relationships. Forty-eight unrelated Irish greyhounds were used to determine allele frequencies for three single-locus minisatellites and, following a preliminary screen, for eight of the most polymorphic tetranucleotide microsatellites from a panel of 15. The results indicated that both band-share estimates by DNA fingerprinting and microsatellite allele frequencies are highly effective in resolving parentage in this greyhound population, while single-locus minisatellites showed limited polymorphism and could not be used alone for routine parentage testing in this breed. The present study also demonstrated that, to obtain optimal resolution of parentage, sample sets of known pedigree status are required to determine the band-share distribution and/or microsatellite allele frequencies.
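
A band-share estimate between two fingerprints is commonly computed as twice the number of shared bands divided by the total number of bands scored in both lanes. The abstract does not state which formula was used, so the sketch below assumes that common definition; the function name and the band labels are hypothetical.

```python
def band_share(bands_a, bands_b):
    """Band-sharing coefficient S = 2 * n_ab / (n_a + n_b).

    bands_a, bands_b: iterables of band identifiers (for example, binned
    fragment sizes) scored in two individuals.
    """
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        raise ValueError("at least one band must be scored")
    return 2 * len(a & b) / (len(a) + len(b))

# Two lanes of four bands each that share two bands.
s = band_share([1, 2, 3, 4], [3, 4, 5, 6])
```

Reference distributions of this value for unrelated, first-degree and second-degree pairs would come from animals of known pedigree, as the abstract notes.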

12.
Minimum-recombinant haplotyping in pedigrees   Total citations: 15, self-citations: 0, citations by others: 15
This article presents a six-rule algorithm for the reconstruction of multiple minimum-recombinant haplotype configurations in pedigrees. The algorithm has three major features: First, it allows exhaustive search of all possible haplotype configurations under the criterion that there are minimum recombinants between markers. Second, its computational requirement is on the order of O(J^2 L^3) in current implementation, where J is the family size and L is the number of marker loci under analysis. Third, it applies to various pedigree structures, with and without consanguinity relationship, and allows missing alleles to be imputed, during the haplotyping process, from their identical-by-descent copies. Haplotyping examples are provided using both published and simulated data sets.

13.
We have completed a genome scan of a 12-generation, 3,400-member pedigree with schizophrenia. Samples from 210 individuals were collected from the pedigree. We performed an "affecteds-only" genome-scan analysis using 43 members of the pedigree. The affected individuals included 29 patients with schizophrenia, 10 with schizoaffective disorders, and 4 with psychosis not otherwise specified. Two sets of white-European allele frequencies were used: one from a Swedish control population (46 unrelated individuals) and one from the pedigree (210 individuals). All analyses pointed to the same region: D6S264, located at 6q25.2, showed a maximum LOD score of 3.45 when allele frequencies in the Swedish control population were used, compared with a maximum LOD score of 2.59 when the pedigree's allele frequencies were used. We analyzed additional markers in the 6q25 region and found a maximum LOD score of 6.6 with marker D6S253, as well as a 6-cM haplotype (markers D6S253-D6S264) that segregated, after 12 generations, with the majority of the affected individuals. Multipoint analysis was performed with the markers in the 6q25 region, and a maximum LOD score of 7.7 was obtained. To evaluate the significance of the genome scan, we simulated the complete analysis under the assumption of no linkage. The results showed that a LOD score >2.2 should be considered as suggestive of linkage, whereas a LOD score >3.7 should be considered as significant. These results suggest that a common ancestral region was inherited by the affected individuals in this large pedigree.
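
As a reminder of what a LOD score measures, the sketch below evaluates the textbook two-point LOD for phase-known meioses: the log10 ratio of the likelihood at recombination fraction theta to the likelihood under free recombination (theta = 0.5). It is far simpler than the pedigree analysis in the abstract; the function name, the counts and the grid of theta values are illustrative assumptions.

```python
import math

def lod(theta, recombinants, meioses):
    """Two-point LOD score for phase-known data.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n ),
    with k recombinants among n informative meioses and 0 < theta <= 0.5.
    """
    k, n = recombinants, meioses
    likelihood = theta ** k * (1.0 - theta) ** (n - k)
    return math.log10(likelihood / 0.5 ** n)

# Maximise over a grid of recombination fractions (2 recombinants in 10 meioses).
grid = [i / 100 for i in range(1, 51)]
best_theta = max(grid, key=lambda t: lod(t, 2, 10))
max_lod = lod(best_theta, 2, 10)
```

Empirical thresholds such as the simulated 2.2 and 3.7 above play the role of cut-offs applied to the maximised score.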

14.
Although analytical procedures for multiple marker risk estimation are now well established, we still lack a unified optimal procedure for deciding which family members to examine and which markers to use. Towards this goal, the application of conditional risk distributions is developed, along with a suggested statistic for judging the utility of a marker. The conditional risk distribution depends on what knowledge has already been obtained about the pedigree, and indicates the expected outcome of risk estimates after another marker is examined. Population genetic aspects including haplotype frequencies, linkage disequilibrium, family size and pedigree structure and the statistical confidence in the linkage map all influence the optimal strategy for multiple marker risk estimation.

15.
Paternity inference using highly polymorphic codominant markers is becoming common in the study of natural populations. However, multiple males are often found to be genetically compatible with each offspring tested, even when the probability of excluding an unrelated male is high. While various methods exist for evaluating the likelihood of paternity of each nonexcluded male, interpreting these likelihoods has hitherto been difficult, and no method takes account of the incomplete sampling and error-prone genetic data typical of large-scale studies of natural systems. We derive likelihood ratios for paternity inference with codominant markers taking account of typing error, and define a statistic Δ for resolving paternity. Using allele frequencies from the study population in question, a simulation program generates criteria for Δ that permit assignment of paternity to the most likely male with a known level of statistical confidence. The simulation takes account of the number of candidate males, the proportion of males that are sampled and gaps and errors in genetic data. We explore the potentially confounding effect of relatives and show that the method is robust to their presence under commonly encountered conditions. The method is demonstrated using genetic data from the intensively studied red deer (Cervus elaphus) population on the island of Rum, Scotland. The Windows-based computer program, CERVUS, described in this study is available from the authors. CERVUS can be used to calculate allele frequencies, run simulations and perform parentage analysis using data from all types of codominant markers.
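
The abstract does not define Δ. The sketch below assumes it is the gap between the two highest log-likelihood-ratio (LOD) scores among candidate males, so that a large gap favours assigning paternity to the top candidate. The function name, the candidate identifiers and the scores are hypothetical, and this is not the CERVUS implementation.

```python
def delta_statistic(lod_scores):
    """Return (best_candidate, delta) from a dict of candidate -> LOD score.

    Assumption: delta is the difference between the highest and the
    second-highest score; with a single candidate it is that candidate's score.
    """
    if not lod_scores:
        raise ValueError("no candidate males supplied")
    ranked = sorted(lod_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best = ranked[0]
    if len(ranked) == 1:
        return best_id, best
    return best_id, best - ranked[1][1]

best_male, delta = delta_statistic({"m1": 3.2, "m2": 1.1, "m3": -0.5})
```

A critical value for the statistic would come from simulation under the study's own allele frequencies, sampling proportion and error rate, as the abstract describes.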

16.
Construction of simultaneous confidence sets for several effective doses currently relies on inverting the Scheffé type simultaneous confidence band, which is known to be conservative. We develop novel methodology to make the simultaneous coverage closer to its nominal level, for both two‐sided and one‐sided simultaneous confidence sets. Our approach is shown to be considerably less conservative than the current method, and is illustrated with an example on modeling the effect of smoking status and serum triglyceride level on the probability of the recurrence of a myocardial infarction.

17.
Pedigrees, depicting genealogical relationships between individuals, are important in several research areas. Molecular markers allow inference of pedigrees in wild species where relationship information is impossible to collect by observation. Marker data are analysed statistically using methods based on Mendelian inheritance rules. There are numerous computer programs available to conduct pedigree analysis, but most software is inflexible, both in terms of assumptions and data requirements. Most methods only accommodate monogamous diploid species using codominant markers without genotyping error. In addition, most commonly used methods use pairwise comparisons rather than a full-pedigree likelihood approach, which considers the likelihood of the entire pedigree structure and allows the simultaneous inference of parentage and sibship. Here, we describe COLONY, a computer program implementing full-pedigree likelihood methods to simultaneously infer sibship and parentage among individuals using multilocus genotype data. COLONY can be used for both diploid and haplodiploid species; it can use dominant and codominant markers, and can accommodate, and estimate, genotyping error at each locus. In addition, COLONY can carry out these inferences for both monoecious and dioecious species. The program is available as a Microsoft Windows version, which includes a graphical user interface, and a Macintosh version, which uses an R-based interface.

18.
Multipoint linkage analysis is a powerful method for mapping a rare disease gene on the human gene map despite limited genotype and pedigree data. However, there is no standard procedure for determining a confidence interval for gene location by using multipoint linkage analysis. A genetic counselor needs to know the confidence interval for gene location in order to determine the uncertainty of risk estimates provided to a consultant on the basis of DNA studies. We describe a resampling, or "bootstrap," method for deriving an approximate confidence interval for gene location on the basis of data from a single pedigree. This method was used to define an approximate confidence interval for the location of a gene causing nonsyndromal X-linked mental retardation in a single pedigree. The approach seemed robust in that similar confidence intervals were derived by using different resampling protocols. Quantitative bounds for the confidence interval were dependent on the genetic map chosen. Once an approximate confidence interval for gene location was determined for this pedigree, it was possible to use multipoint risk analysis to estimate risk intervals for women of unknown carrier status. Despite the limited genotype data, the combination of the resampling method and multipoint risk analysis had a dramatic impact on the genetic advice available to consultants.
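
The general shape of a percentile bootstrap interval can be sketched as follows. The abstract does not give the resampling protocol or the location estimator, so this example resamples generic units with replacement and uses the sample mean as a stand-in estimator; every name and number in it is an assumption.

```python
import random

def bootstrap_interval(units, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval.

    units: list of resampling units (in a linkage setting these might be
    informative meioses); estimator: function mapping a resampled list to
    a number (a stand-in for a multipoint location estimate).
    """
    rng = random.Random(seed)
    estimates = sorted(
        estimator([rng.choice(units) for _ in units]) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int((1.0 - tail) * (n_boot - 1))]
    return lower, upper

def mean(values):
    return sum(values) / len(values)

data = [12.0, 15.5, 9.8, 14.1, 11.3, 13.7, 10.9, 16.2]
lower, upper = bootstrap_interval(data, mean)
```

Swapping in a different resampling unit is one way to check robustness, in the spirit of the comparison of resampling protocols mentioned above.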

19.
Several programs are currently available for the detection of genotyping error that may or may not be Mendelianly inconsistent. However, no systematic study exists that evaluates their performance under varying pedigree structures and sizes, marker spacing, and allele frequencies. Our simulation study compares four multipoint methods: Merlin, Mendel4, SimWalk2, and Sibmed. We look at empirical thresholds, power, and false-positive rates on 7 small pedigree structures that included sibships with and without genotyped parents, and a three-generation pedigree, using 11 microsatellite markers with 3 different map spacings. Simulated data includes 5,000 replicates of each pedigree structure and marker map, with random genotyping errors in about 4% of the middle marker's genotypes. We found that the default thresholds used by these programs provide low power (47-72%). Power is improved more by adding genotyped siblings than by using more closely spaced markers. Some mistyping methods are sensitive to the frequencies of the observed alleles. Siblings of mistyped individuals have elevated false-positive rates, as do markers close to the mistyped marker. We conclude that thresholds should be decided based on the pedigree and marker data and that greater focus should be placed on modeling genotyping error when computing likelihoods, rather than on detecting and eliminating genotyping errors.

20.
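Reconstructing phylogenetic relationships is very important for research on angiosperm classification and evolution. For a long time, studies of angiosperm phylogeny have mostly used plastid genes, mitochondrial genes, or a small number of conserved single-copy nuclear genes. This study collected nuclear gene sets for 88 angiosperm species (covering 58 orders) from annotated genomes or transcriptomes. By clustering homologous genes and removing paralogs, 5,993 one-to-one orthologous gene families were obtained (that is, each gene family contains at most one sequence per species and covers at least 50 species). Using DNA or amino acid sequences from gene sets of various sizes, a total of 20 phylogenetic trees were built with the concatenation and coalescence methods. A comparison of these trees shows that, although most results support the relationships among the major angiosperm lineages described in APG IV [(eudicots, monocots), magnoliids], the relationships among orders within the eudicots differ from APG IV in one major respect: Santalales and Caryophyllales are recovered as the sister group of the rosids. Based on these trees, the divergence times of the angiosperm orders were estimated. The results indicate that angiosperms originated 237.78 million years ago (95% confidence interval 202.6 to 278.08), consistent with the mainstream view of 225 to 240 million years ago. These results offer a feasible strategy for building phylogenetic trees, one that allows more genes to be used while computing faster.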

