Similar Articles
20 similar articles found (search time: 31 ms)
1.
In a population intended for breeding and selection, questions of interest relative to a specific segregating QTL are the variance it generates in the population, and the number and effects of its alleles. One approach to address these questions is to extract several inbreds from the population and use them to generate multiple mapping families. Given random sampling of parents, sampling strategy may be an important factor determining the power of the analysis and its accuracy in estimating QTL variance and allelic number. We describe appropriate multiple-family QTL mapping methodology and apply it to simulated data sets to determine optimal sampling strategies in terms of family number versus family size. Genomes were simulated with seven chromosomes, on which 107 markers and six QTL were distributed. The total heritability was 0.60. Two to ten alleles were segregating at each QTL. Sampling strategies ranged from sampling two inbreds and generating a single family of 600 progeny to sampling 40 inbreds and generating 40 families of 15 progeny each. Strategies involving only one to five families were subject to variation due to the sampling of inbred parents. For QTL where more than two alleles were segregating, these strategies did not sample QTL alleles representative of the original population. Conversely, strategies involving 30 or more parents were subject to variation due to sampling of QTL genotypes within the small families obtained. Given these constraints, greatest QTL detection power was obtained for strategies involving five to ten mapping families. The most accurate estimation of the variance generated by the QTL, however, was obtained with strategies involving 20 or more families. Finally, strategies with an intermediate number of families best estimated the number of QTL alleles. We conclude that no overall optimal sampling strategy exists but that the strategy adopted must depend on the objective. Communicated by P. Langridge

2.
An introduction to two hypothesis models in meta-analysis, with a worked example
郑凤英  彭少麟 《生态科学》2004,23(4):292-294
Meta-analysis is a statistical method for synthesizing the results of multiple independent experiments on the same topic, and is considered the best quantitative synthesis method available to date. A meta-analysis begins by stating a statistical hypothesis; depending on that hypothesis, a meta-analysis follows either a fixed-effect model or a random-effects model. The former assumes that a set of comparable studies in the same group shares a single true effect size, with sampling error accounting for the differences among the observed effect sizes; the latter assumes random between-study variation, so the studies do not share a single true effect size. This paper introduces the computational methods for meta-analysis under both models and illustrates them with a worked example.
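The two pooled estimates described above can be sketched numerically. This is a minimal pure-Python illustration, not the paper's code; the DerSimonian-Laird estimator is one standard choice for the between-study variance in the random-effects model:

```python
def pooled_effects(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects pooled estimates.

    effects  : per-study effect sizes
    variances: per-study sampling variances
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q measures between-study heterogeneity beyond sampling error
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                    # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]          # re-weight under random effects
    random_eff = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return fixed, random_eff, tau2
```

When the studies are homogeneous (tau2 = 0) the two estimates coincide; heterogeneity inflates tau2 and pulls the random-effects estimate toward the unweighted mean of the study effects.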

3.
A major QTL for P uptake had previously been mapped to a 13-cM marker interval on the long arm of chromosome 12. To map that major QTL with higher precision and certainty, a secondary mapping population was developed by backcrossing a near-isogenic line containing the QTL from the donor parent to the recurrent parent of low P uptake. Two different mapping strategies were followed in this study. A conventional QTL mapping approach was based on individual F2 RFLP data and the phenotypic evaluation of family means in the F3. The second strategy employed a substitution-mapping approach. Phenotypic and marker data were obtained for 160 F3 individuals of six highly informative families that differed in the size of donor chromosomal segments in the region of the putative QTL. QTL mapping showed that close to 80% of the variation between families was due to a single QTL, hereafter referred to as Pup1 (Phosphorus uptake 1). Pup1 was placed in a 3-cM interval flanked by markers S14025 and S13126, which is within 1 cM of the position identified in the original QTL mapping experiment. Other chromosomal regions and epistatic effects were not significant. Substitution mapping revealed that Pup1 co-segregated with marker S13126 and that the flanking markers, S14025 and S13752, were outside the interval containing Pup1. The two mapping strategies therefore yielded almost identical results and, in combining the advantages of both, Pup1 could be mapped with high certainty. The QTL mapping approach showed that the phenotypic variation between families was due to only one QTL without any additional epistatic interactions, whereas the advantage of substitution mapping was to place clearly defined borders around the QTL.

4.
In QTL analysis of non-normally distributed phenotypes, non-parametric approaches have been proposed as an alternative to the use of parametric tests on mathematically transformed data. The non-parametric interval mapping test uses random ranking to deal with ties. Another approach is to assign to each tied individual the average of the tied ranks (midranks). This approach is implemented and compared to the random ranking approach in terms of statistical power and accuracy of the QTL position. Non-normal phenotypes such as bacteria counts showing high numbers of zeros are simulated (0-80% zeros). We show that, for low proportions of zeros, the power estimates are similar but, for high proportions of zeros, the midrank approach is superior to the random ranking approach. For example, with a QTL accounting for 8% of the total phenotypic variance, a gain from 8% to 11% of power can be obtained. Furthermore, the accuracy of the estimated QTL location is increased when using midranks. Therefore, if non-parametric interval mapping is chosen, the midrank approach should be preferred. This test might be especially relevant for the analysis of disease resistance phenotypes such as those observed when mapping QTLs for resistance to infectious diseases.
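The two tie-handling schemes compared above are easy to state in code. A minimal pure-Python sketch, illustrative rather than the paper's implementation (`scipy.stats.rankdata` with `method='average'` computes the same midranks):

```python
import random

def midranks(values):
    """Rank observations 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of observations tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2.0          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def random_ranks(values, seed=0):
    """Break ties at random instead: each tied value gets a distinct rank."""
    rng = random.Random(seed)
    keys = [(v, rng.random()) for v in values]
    order = sorted(range(len(values)), key=lambda i: keys[i])
    ranks = [0] * len(values)
    for r, idx in enumerate(order, start=1):
        ranks[idx] = r
    return ranks
```

With many zeros, `midranks` gives every zero the same midrank, while `random_ranks` scatters the zeros over distinct ranks, injecting noise into the rank statistic; this is the source of the power difference the abstract reports.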

5.
Strategies for genetic mapping of categorical traits
Shaoqi Rao  Xia Li 《Genetica》2000,109(3):183-197
The search for efficient and powerful statistical methods and optimal mapping strategies for categorical traits under various experimental designs continues to be one of the main tasks in genetic mapping studies. Methodologies for genetic mapping of categorical traits can generally be classified into two groups, linear and non-linear models. We develop a method based on a threshold model, termed the mixture threshold model, to handle ordinal (or binary) data from multiple families. Monte Carlo simulations are used to compare the statistical efficiencies and properties of the proposed non-linear model with those of a linear model for genetic mapping of categorical traits using multiple families. The mixture threshold model has notably higher statistical power than linear models. There may be an optimal sampling strategy (family size vs number of families) in which genetic mapping reaches its maximal power and minimal estimation errors. A single large-sibship family does not necessarily produce the maximal power for detection of quantitative trait loci (QTL), due to genetic sampling of QTL alleles. The QTL allelic model has a marked impact on the efficiency of genetic mapping of categorical traits in terms of statistical power and QTL parameter estimation. Compared with a fixed number of QTL alleles (two or four), the model with an infinite number of QTL alleles and normally distributed allelic effects results in loss of statistical power. The results imply that inbred designs (e.g. F2 or four-way crosses) with a few QTL alleles segregating, or outbred populations with a reduced number of QTL alleles (e.g. by selection), are desirable in genetic mapping of categorical traits using data from multiple families. This revised version was published online in July 2006 with corrections to the Cover Date.

6.
It is a challenging issue to map Quantitative Trait Loci (QTL) underlying complex discrete traits, which usually show discontinuous distributions and carry less information, using conventional statistical methods. The Bayesian-Markov chain Monte Carlo (Bayesian-MCMC) approach is the key procedure in mapping QTL for complex binary traits: it provides a complete posterior distribution for the QTL parameters using all prior information. As a consequence, Bayesian estimates of all variables of interest can be obtained straightforwardly from their posterior samples simulated by the MCMC algorithm. In our study, the utility of Bayesian-MCMC is demonstrated using several simulated animal outbred full-sib families with different family structures for a complex binary trait underlain by both a QTL and polygenes. Under the identity-by-descent-based variance-component random model, three MCMC samplers (Gibbs sampling, the Metropolis algorithm, and reversible-jump MCMC) were implemented to generate the joint posterior distribution of all unknowns, from which the QTL parameters were obtained by Bayesian inference. The results showed that the Bayesian-MCMC approach works well and is robust under different family structures and QTL effects. As family size increases and the number of families decreases, the accuracy of the parameter estimates improves. When the true QTL has a small effect, an outbred-population experimental design with large family sizes is the optimal mapping strategy.
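Of the three samplers mentioned, random-walk Metropolis is the simplest building block. A minimal, self-contained sketch on a one-dimensional posterior, illustrative only and not the study's variance-component implementation:

```python
import math
import random

def metropolis(log_post, x0, n=20000, step=1.0, seed=3):
    """Random-walk Metropolis: draw from a 1-D distribution given only
    its unnormalized log density.

    log_post: callable returning log of the (unnormalized) posterior density
    x0      : starting value of the chain
    """
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n):
        prop = x + rng.gauss(0.0, step)      # symmetric Gaussian proposal
        lp_prop = log_post(prop)
        # accept with probability min(1, posterior ratio)
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples
```

Gibbs sampling replaces the accept/reject step with exact draws from full conditional distributions, and reversible-jump MCMC extends the acceptance rule to moves that change the model dimension (e.g. adding or dropping a QTL).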

8.
The identification of recessive disease-causing genes by homozygosity mapping is often restricted by lack of suitable consanguineous families. To overcome these limitations, we apply homozygosity mapping to single affected individuals from outbred populations. In 72 individuals of 54 kindreds ascertained worldwide with known homozygous mutations in 13 different recessive disease genes, we performed total genome homozygosity mapping using 250,000 SNP arrays. Likelihood ratio Z-scores (ZLR) were plotted across the genome to detect ZLR peaks that reflect segments of homozygosity by descent, which may harbor the mutated gene. In 93% of cases, the causative gene was positioned within a consistent ZLR peak of homozygosity. The number of peaks reflected the degree of inbreeding. We demonstrate that disease-causing homozygous mutations can be detected in single cases from outbred populations within a single ZLR peak of homozygosity as short as 2 Mb, containing an average of only 16 candidate genes. As many specialty clinics have access to cohorts of individuals from outbred populations, and as our approach will result in smaller genetic candidate regions, the new strategy of homozygosity mapping in single outbred individuals will strongly accelerate the discovery of novel recessive disease genes.
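The signal behind the ZLR peaks, long stretches of homozygous SNP calls, can be illustrated with a toy run detector. This is a sketch only; the actual method computes a likelihood-ratio score that tolerates genotyping error and accounts for allele frequencies:

```python
def homozygous_runs(calls, min_len=3):
    """Return (start, end) index pairs of runs of homozygous genotype calls.

    calls: list of two-character genotype strings, e.g. 'AA', 'AB', 'BB'.
    Runs shorter than min_len markers are discarded, a crude stand-in for
    requiring a minimum segment length (e.g. 2 Mb) in a real analysis.
    """
    runs, start = [], None
    for i, g in enumerate(calls):
        if g[0] == g[1]:                      # homozygous call extends a run
            if start is None:
                start = i
        else:                                 # heterozygous call ends the run
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(calls) - start >= min_len:
        runs.append((start, len(calls) - 1))
    return runs
```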

9.
10.
Seventy to 75 sons of each of six Holstein sires were assayed for genotypes at a number of microsatellite loci spanning Chromosomes (Chrs) 1 and 6. The number of informative loci varied from three to eight on each chromosome in different sire families. Linkage order and map distance for microsatellite loci were estimated using CRI-MAP. Estimates of QTL effect and location were made by using a least squares interval mapping approach based on daughter yield deviations of sons for 305-d milk, fat, protein yield, and fat and protein percentage. Thresholds for statistical significance of QTL effects were determined from interval mapping of 10,000 random permutations of the data across the bull sire families and within each sire family separately. Across-sire analyses indicated a significant QTL for fat and protein yield, and fat percentage on Chr 1, and QTL effects on milk yield and protein percentage that might represent one or two QTL on Chr 6. Analyses within each sire family indicated significant QTL effects in five sire families, with one sire possibly being heterozygous for two QTLs. Statistically significant estimates of QTL effects on breeding value ranged from 340 to 640 kg of milk, from 15.6 to 28.4 kg of fat, and 14.4 to 17.6 kg of protein. Received: 19 November 1999 / Accepted: 31 August 2000
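The permutation procedure used to set the significance thresholds can be sketched generically. The data here are hypothetical, and a simple mean-difference statistic stands in for the least-squares interval-mapping statistic:

```python
import random

def permutation_threshold(markers, phenotypes, n_perm=200, alpha=0.05, seed=42):
    """Genome-wide significance threshold by permutation.

    markers: list of per-marker genotype lists (0/1), aligned with phenotypes.
    The phenotype vector is shuffled to break any marker-trait association,
    and the maximum statistic over all markers is recorded per shuffle; the
    (1 - alpha) quantile of these maxima is the genome-wide threshold.
    """
    def max_stat(y):
        best = 0.0
        for g in markers:
            a = [p for gi, p in zip(g, y) if gi == 1]
            b = [p for gi, p in zip(g, y) if gi == 0]
            if a and b:
                best = max(best, abs(sum(a) / len(a) - sum(b) / len(b)))
        return best

    rng = random.Random(seed)
    y = list(phenotypes)
    null = sorted(max_stat(rng.sample(y, len(y))) for _ in range(n_perm))
    return null[min(n_perm - 1, int((1 - alpha) * n_perm))]
```

A marker whose statistic on the unshuffled data exceeds this threshold is significant at genome-wide level alpha; taking the maximum over markers within each shuffle is what controls for the multiple testing across the genome.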

11.
Statistical methods for expression quantitative trait loci (eQTL) mapping

12.
Comparison of biometrical models for joint linkage association mapping
Joint linkage association mapping (JLAM) combines the advantages of linkage mapping and association mapping, and is a powerful tool to dissect the genetic architecture of complex traits. The main goal of this study was to use a cross-validation strategy, resample model averaging and empirical data analyses to compare seven different biometrical models for JLAM with regard to the correction for population structure and the quantitative trait loci (QTL) detection power. Three linear models and four linear mixed models with different approaches to control for population stratification were evaluated. Models A, B and C were linear models with either cofactors (Model-A), or cofactors and a population effect (Model-B), or a model in which the cofactors and the single-nucleotide polymorphism effect were modeled as nested within population (Model-C). The mixed models, D, E, F and G, included a random population effect (Model-D), or a random population effect with defined variance structure (Model-E), a kinship matrix defining the degree of relatedness among the genotypes (Model-F), or a kinship matrix and principal coordinates (Model-G). The tested models were conceptually different and were also found to differ in terms of power to detect QTL. Model-B, with the cofactors and a population effect, effectively controlled population structure and possessed a high predictive power. The varying allele-substitution effects in different populations suggest that a promising strategy for JLAM is to use Model-B for the detection of QTL and then to estimate their effects by applying Model-C.

13.
Case-control disease-marker association studies are often used in the search for variants that predispose to complex diseases. One approach to increasing the power of these studies is to enrich the case sample for individuals likely to be affected because of genetic factors. In this article, we compare three case-selection strategies that use allele-sharing information with the standard strategy that selects a single individual from each family at random. In affected sibship samples, we show that, by carefully selecting sibships and/or individuals on the basis of allele sharing, we can increase the frequency of disease-associated alleles in the case sample. When these cases are compared with unrelated controls, the difference in the frequency of the disease-associated allele is therefore also increased. We find that, by choosing the affected sib who shows the most evidence for pairwise allele sharing with the other affected sibs in families, the test statistic is increased by >20%, on average, for additive models with modest genotype relative risks. In addition, we find that the per-genotype information associated with the allele sharing-based strategies is increased compared with that associated with random selection of a sib for genotyping. Even though we select sibs on the basis of a nonparametric statistic, the additional gain for selection based on the unknown underlying mode of inheritance is minimal. We show that these properties hold even when the power to detect linkage to a region in the entire sample is negligible. This approach can be extended to more-general pedigree structures and quantitative traits.

14.
DNA pooling is a potential methodology for detecting genetic loci with small effects that contribute to complex diseases and quantitative traits. This is accomplished by rapid preliminary screening of the genome for allelic association with the most common class of polymorphic markers, short tandem repeats. The methodology assumes a common founder for the linked disease locus of interest and searches for a region of a chromosome shared between affected individuals. The general theory of DNA pooling relies on observed differences in the allelic distribution between pools from affected and unaffected individuals, including a reduction in the number of alleles in the affected pool, which indicates sharing of a chromosomal region. The power of the statistic for association-based linkage mapping can be determined using two recently developed strategies: first, by measuring differences in the allelic image patterns produced by two DNA pools of extreme character and, second, by measuring total allele-content differences between two pools containing large numbers of DNA samples. These strategies have been used effectively to identify shared chromosomal regions for linkage studies and to investigate candidate disease loci for fine-structure gene mapping using allelic association. This paper outlines the use of DNA pooling as a tool to locate complex disease loci, statistical methods for accurate estimation of allele frequencies from DNA pools, and its advantages, drawbacks and significance in association-based linkage mapping with pooled DNA samples.
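The core comparison, allele frequencies between affected and unaffected pools, reduces to a 2x2 allele-count test. A minimal sketch with illustrative counts, not data from the paper:

```python
def pool_chi2(case_a, case_total, control_a, control_total):
    """Chi-square statistic (1 df, no continuity correction) for an
    allele-count difference between a case pool and a control pool.

    case_a / control_a: counts of the allele of interest in each pool;
    *_total: total allele counts (2 x number of chromosomes pooled).
    """
    a, b = case_a, case_total - case_a
    c, d = control_a, control_total - control_a
    n = a + b + c + d
    # standard shortcut formula for a 2x2 table
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den
```

In practice, allele counts estimated from pooled DNA carry extra measurement error relative to individual genotyping, so the nominal chi-square distribution is only an approximation; the strategies cited in the abstract address exactly this.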

15.
Estimations of genetic parameters of wood traits based on reduced sample populations are widely reported in the literature, but few investigations have considered the consequences of these small populations on the precision of parameter estimates. The purpose of this study was to determine an optimal strategy for sampling subgroups, by varying either the number of families or the number of individuals (trees) per family, and by verifying the accuracy of certain genetic parameters (across-trials analysis). To achieve this, simulations were conducted using random resampling without replacement (k = 1,000 per pair of varying factors) on datasets containing 10-year total height of two coniferous species (Larix laricina and Picea mariana), as well as pilodyn measurements of wood density evaluated on a 26-year-old population of P. mariana. SAS® 9.2 Macro Language and Procedures were used to estimate confidence intervals of several genetic parameters with different reduced samplings. Simulation results show that reducing the number of trees per family per site had more impact on the magnitude and precision of genetic parameter estimates than reducing the number of families, especially for half-sib heritability and type B genetic correlations for height and wood density. A priori determination of an optimal subsampling strategy to evaluate the accuracy of genetic parameters should become common practice before assessing wood traits, in tree breeding studies or when planning juvenile retrospective progeny trials for forest tree species.
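The resampling scheme, drawing subsets of families and of trees within families without replacement and recomputing the statistic k times, can be sketched generically. The data here are hypothetical, and a simple grand mean stands in for the heritability and correlation estimates the study computed with SAS:

```python
import random

def subsample_ci(family_values, n_families, n_per_family, k=1000, seed=7):
    """95% percentile interval of a statistic under repeated subsampling
    without replacement: draw n_families families, then n_per_family
    trees within each, and recompute the statistic k times.

    family_values: dict mapping family id -> list of per-tree measurements.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(k):
        fams = rng.sample(list(family_values), n_families)       # families, no replacement
        vals = [v for f in fams
                for v in rng.sample(family_values[f], n_per_family)]  # trees, no replacement
        stats.append(sum(vals) / len(vals))
    stats.sort()
    return stats[int(0.025 * k)], stats[int(0.975 * k)]
```

Comparing interval widths while varying `n_families` against `n_per_family` is the kind of trade-off the study examined: whichever reduction widens the interval more is the costlier one to cut.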

16.
Marginal tests based on individual SNPs are routinely used in genetic association studies. Studies have shown that haplotype-based methods may provide more power in disease mapping than methods based on single markers when, for example, multiple disease-susceptibility variants occur within the same gene. A limitation of haplotype-based methods is that the number of parameters increases exponentially with the number of SNPs, inducing a commensurate increase in the degrees of freedom and weakening the power to detect associations. To address this limitation, we introduce a hierarchical linkage disequilibrium model for disease mapping, based on a reparametrization of the multinomial haplotype distribution, where every parameter corresponds to the cumulant of each possible subset of a set of loci. This hierarchy in the parameters enables us to employ flexible testing strategies over a range of parameter sets: from standard single-SNP analyses through full haplotype distribution tests, reducing degrees of freedom and increasing the power to detect associations. We show via extensive simulations that our approach maintains the type I error at nominal level and has increased power under many realistic scenarios, as compared to single-SNP and standard haplotype-based studies. To evaluate the performance of our proposed methodology in real data, we analyze genome-wide data from the Wellcome Trust Case-Control Consortium.

17.
We consider three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection: (1) a suite of fast likelihood-based "counting methods" that employ either a single most likely ancestral reconstruction, weighting across all possible ancestral reconstructions, or sampling from ancestral reconstructions; (2) a random effects likelihood (REL) approach, which models variation in nonsynonymous and synonymous rates across sites according to a predefined distribution, with the selection pressure at an individual site inferred using an empirical Bayes approach; and (3) a fixed effects likelihood (FEL) method that directly estimates nonsynonymous and synonymous substitution rates at each site. All three methods incorporate flexible models of nucleotide substitution bias and variation in both nonsynonymous and synonymous substitution rates across sites, facilitating the comparison between the methods. We demonstrate that the results obtained using these approaches show broad agreement in levels of Type I and Type II error and in estimates of substitution rates. Counting methods are well suited for large alignments, for which there is high power to detect positive and negative selection, but appear to underestimate the substitution rate. A REL approach, which is more computationally intensive than counting methods, has higher power than counting methods to detect selection in data sets of intermediate size but may suffer from higher rates of false positives for small data sets. A FEL approach appears to capture the pattern of rate variation better than counting methods or random effects models, does not suffer from as many false positives as random effects models for data sets comprising few sequences, and can be efficiently parallelized. 
Our results suggest that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets. We demonstrate our methods on sequence data from the human immunodeficiency virus type 1 env and pol genes and simulated alignments.

18.
In this paper, different strategies to test for association in samples with related individuals designed for linkage studies are compared. Because no independent controls are available, a family-based association test and case-control tests corrected for the presence of related individuals in which unaffected relatives are used as controls were tested. When unrelated controls are available, additional strategies including selection of a single case per family considering either all families or a subset of linked families, are also considered. Analyses are performed on the simulated dataset, blind to the answers. The case-control test corrected for the presence of related individuals is the most powerful strategy to detect three loci associated with the disease under study. Using a correction factor for the case-control test performed conditional on the marker information rather than unconditional does not impact the power significantly.

19.
N Yi  S Xu 《Genetics》1999,153(2):1029-1040
Mapping quantitative trait loci (QTL) for complex binary traits is more challenging than for normally distributed traits due to the nonlinear relationship between the observed phenotype and unobservable genetic effects, especially when the mapping population contains multiple outbred families. Because the number of alleles of a QTL depends on the number of founders in an outbred population, it is more appropriate to treat the effect of each allele as a random variable so that a single variance rather than individual allelic effects is estimated and tested. Such a method is called the random model approach. In this study, we develop the random model approach of QTL mapping for binary traits in outbred populations. An EM-algorithm with a Fisher-scoring algorithm embedded in each E-step is adopted here to estimate the genetic variances. A simple Monte Carlo integration technique is used here to calculate the likelihood-ratio test statistic. For the first time we show that QTL of complex binary traits in an outbred population can be scanned along a chromosome for their positions, estimated for their explained variances, and tested for their statistical significance. Application of the method is illustrated using a set of simulated data.

20.
We develop an approach for the exploratory analysis of gene expression data, based upon blind source separation techniques. This approach exploits higher-order statistics to identify a linear model for (logarithms of) expression profiles, described as linear combinations of "independent sources." As a result, it yields "elementary expression patterns" (the "sources"), which may be interpreted as potential regulation pathways. Further analysis of the so-obtained sources show that they are generally characterized by a small number of specific coexpressed or antiexpressed genes. In addition, the projections of the expression profiles onto the estimated sources often provides significant clustering of conditions. The algorithm relies on a large number of runs of "independent component analysis" with random initializations, followed by a search of "consensus sources." It then provides estimates for independent sources, together with an assessment of their robustness. The results obtained on two datasets (namely, breast cancer data and Bacillus subtilis sulfur metabolism data) show that some of the obtained gene families correspond to well known families of coregulated genes, which validates the proposed approach.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司) | 京ICP备09084417号