Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from the Genetics Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets impacted the overall information content, the power to detect trait loci, and the number of false positive results. For the simulated data we used SNP maps with density of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the marker sets from Illumina and Affymetrix to create a map with average density of 0.25 cM and then, using a sub-sample of these markers, created maps with density of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with increased map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in ability to detect trait loci. Additionally, as map density increased there was only a slight increase in the number of false positive results when there was linkage disequilibrium (LD) between markers. The presence of LD between markers may have led to an increased number of false positive regions but no clear relationship between regions of high LD and locations of false positive linkage signals was observed.

2.
A recent approach for gene mapping based on confidence set inference (CSI) promises several advantages, including avoidance of corrections for multiple tests, availability of confidence intervals with known statistical properties, and sufficient localizations of disease genes. This paper proposes an extended CSI procedure that can handle markers with incomplete polymorphism, thereby increasing the applicability of the set of CSI methods in practical situations. Simulation studies show that the new procedure retains the main advantages of the original CSI. Although it generally requires more data to achieve a similar power, this increase is moderate for markers with 80% heterozygosity or higher. We also investigate the effects of relative risk estimates and disease models. Our analyses show that perturbation from actual relative risks or multilocus disease models generally leads to reduction in power or inflation in type I error, as expected. Nevertheless, for certain classes of two-locus disease models, CSI can still perform well, with reasonably high actual coverage probabilities for at least one of the disease loci. Application of CSI to the data provided by the Genetic Analysis Workshop 13 yields encouraging results, as they compare favorably to those obtained from GENEHUNTER using its NPL sib-pair method.

3.
A simulation study was performed to investigate the effects of missing values, typing errors and distorted segregation ratios in molecular marker data on the construction of genetic linkage maps, and to compare the performance of three locus-ordering criteria (weighted least squares, maximum likelihood and minimum sum of adjacent recombination fractions) in the presence of such effects. The study was based upon three linkage groups of 10 loci at 2, 6, and 10 cM spacings simulated from a doubled-haploid population of size 150. The performance of each criterion was assessed using the number of replicates with correctly estimated orders, the mean rank correlation between the estimated and the true order and the mean total map length. Bootstrap samples from replicates in the maximum likelihood analysis produced a measure of confidence in the estimated locus order. Missing values and/or typing errors in the data reduce the proportion of correctly ordered maps, and this problem worsens as the distances between loci decrease. The maximum likelihood criterion is most successful at ordering loci correctly, but gives estimated map lengths that are substantially inflated when typing errors are present. The presence of missing values in the data produces shorter map lengths for more widely spaced markers, especially under the weighted least-squares criterion. Overall, the presence of segregation distortion has little effect on this population.

4.
Previous studies have noted that the estimated positions of a large proportion of mapped quantitative trait loci (QTLs) coincide with marker locations and have suggested that this indicates a bias in the mapping methodology. In this study we predict the expected proportion of QTLs with positions estimated to be at the location of a marker and further examine the problem using simulated data. The results show that the higher proportion of putative QTLs estimated to be at marker positions compared with non-marker positions is an expected consequence of the estimation methods. The study initially focused on a single interval with no QTLs and was extended to include multiple intervals and QTLs of large effect. Further, the study demonstrated that the larger proportion of estimated QTL positions at the location of markers was not unique to linear regression mapping. Maximum likelihood produced similar results, although the accumulation of positional estimates at outermost markers was reduced when regions outside the linkage group were also considered. The bias towards marker positions is greatest under the null hypothesis of no QTLs or when QTL effects are small. This study discusses the impact the findings could have on the calculation of thresholds and confidence intervals produced by bootstrap methods.

5.
We use the Genetic Analysis Workshop 14 simulated data to explore the effectiveness of a two-stage strategy for mapping complex disease loci consisting of an initial genome scan with confidence interval construction for gene location, followed by fine mapping with family-based tests of association on a dense set of single-nucleotide polymorphisms. We considered four types of intervals: the 1-LOD interval, a basic percentile bootstrap confidence interval based on the position of the maximum Zlr score, and asymptotic and bootstrap confidence intervals based on a generalized estimating equations method. For fine mapping we considered two family-based tests of association: a test based on a likelihood ratio statistic and a transmission-disequilibrium-type test implemented in the software FBAT. In two of the simulation replicates, we found that the bootstrap confidence intervals based on the peak Zlr and the 1-LOD support interval always contained the true disease loci and that the likelihood ratio test provided further strong confirmatory evidence of the presence of disease loci in these regions.

6.
Yang Y, Ott J. Human Heredity, 2002, 53(4): 227-236
In genome-wide screens of genetic marker loci, non-mendelian inheritance of a marker is taken to indicate its vicinity to a disease locus. Heritable complex traits are thought to be under the influence of multiple possibly interacting susceptibility loci yet the most frequently used methods of linkage and association analysis focus on one susceptibility locus at a time. Here we introduce log-linear models for the joint analysis of multiple marker loci and interaction effects between them. Our approach focuses on affected sib pair data and identical by descent (IBD) allele sharing values observed on them. For each heterozygous parent, the IBD values at linked markers represent a sequence of dependent binary variables. We develop log-linear models for the joint distribution of these IBD values. An independence log-linear model is proposed to model the marginal means and the neighboring interaction model is advocated to account for associations between adjacent markers. Under the assumption of conditional independence, likelihood methods are applied to simulated data containing one or two susceptibility loci. It is shown that the neighboring interaction log-linear model is more efficient than the independence model, and incorporating interaction in the two-locus analysis provides increased power and accuracy for mapping of the trait loci.

7.
A whole-genome scan using single marker association was used to detect chromosome regions associated with seven female fertility traits in Finnish Ayrshire dairy cattle. The phenotypic data consisted of de-regressed estimated breeding values for 340 bulls which were estimated using a single trait model. Genotypes were obtained with the Illumina BovineSNP50 panel and a total of 35 630 informative, high-quality single nucleotide polymorphism (SNP) markers were used. The association analysis was performed using a mixed-model approach which fitted a fixed effect for each SNP and a random polygenic effect. We detected eleven genome-wide significant associations on eight different chromosomes. With at least chromosome-wise significance after Bonferroni correction, sixteen SNPs on nine chromosomes showed significant associations with one or more fertility traits. The results confirmed quantitative trait loci on three chromosomes (1, 2 and 20) for fertility traits previously reported for the same breed and one on chromosome four previously detected in Holstein cattle.
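
As a rough illustration of the Bonferroni correction mentioned in this abstract, the sketch below divides a nominal significance level by the number of tests. The nominal level of 0.05 and the per-chromosome SNP count are assumptions made for illustration; the abstract reports only the genome-wide total of 35 630 SNPs and does not state the thresholds actually used.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if a single-SNP p-value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


ALPHA = 0.05          # assumed nominal level, not stated in the abstract
N_SNPS_GENOME = 35_630  # genome-wide SNP count reported in the abstract
N_SNPS_CHROM = 1_500    # hypothetical SNP count on one chromosome

genome_wide = bonferroni_threshold(ALPHA, N_SNPS_GENOME)
chromosome_wise = bonferroni_threshold(ALPHA, N_SNPS_CHROM)
print(f"genome-wide threshold:     {genome_wide:.3e}")
print(f"chromosome-wise threshold: {chromosome_wise:.3e}")
```

Because a chromosome carries fewer SNPs than the whole genome, the chromosome-wise threshold is less stringent, which is consistent with more SNPs reaching chromosome-wise than genome-wide significance.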

8.
Goldringer I, Bataillon T. Genetics, 2004, 168(1): 563-568
The effective population size (Ne) is frequently estimated using temporal changes in allele frequencies at neutral markers. Such temporal changes in allele frequencies are usually estimated from the standardized variance in allele frequencies (Fc). We simulate Wright-Fisher populations to generate expected distributions of Fc and of F̄c (Fc averaged over several loci). We explore the adjustment of these simulated Fc distributions to a chi-square distribution and evaluate the resulting precision on the estimation of Ne for various scenarios. Next, we outline a procedure to test for the homogeneity of the individual Fc across loci and identify markers exhibiting extreme Fc-values compared to the rest of the genome. Such loci are likely to be in genomic areas undergoing selection, driving Fc to values greater (or smaller) than expected under drift alone. Our procedure assigns a P-value to each locus under the null hypothesis (drift is homogeneous throughout the genome) and simultaneously controls the rate of false positives among loci declared as departing significantly from the null. The procedure is illustrated using two published data sets: (i) an experimental wheat population subject to natural selection and (ii) a maize population undergoing recurrent selection.

9.
To study genetic loci influencing obesity in nuclear families with type 2 diabetes, we performed a genome-wide screen with 325 microsatellite markers that had an average spacing of 11 cM and a mean heterozygosity of ~75% covering all 22 autosomes. Genotype data were obtained from 562 individuals from 178 families from the Breda Study Cohort. These families were determined to have at least two members with type 2 diabetes. As a measure of obesity, the BMI of each diabetes patient was determined. The genotypes were analyzed using variance components (VCs) analysis implemented in GENEHUNTER 2 to determine quantitative trait loci influencing BMI. The VC analysis revealed two genomic regions showing VC logarithm of odds (LOD) scores ≥1.0 on chromosome 1 and chromosome 11. The regions of interest on both chromosomes were further investigated by fine-mapping with additional markers, resulting in a VC LOD score of 1.5 on chromosome 1q and a VC LOD of 2.4 on chromosome 11q. The locus on chromosome 1 has been implicated previously in diabetes. The locus on chromosome 11 has been implicated previously in diabetes and obesity. Our study to determine linkage for BMI confirms the presence of quantitative trait loci influencing obesity in subjects with type 2 diabetes on chromosomes 1q31-q42 and 11q14-q24.

10.
We recently described a method for linkage disequilibrium (LD) mapping, using cladistic analysis of phased single-nucleotide polymorphism (SNP) haplotypes in a logistic regression framework. However, haplotypes are often not available and cannot be deduced with certainty from the unphased genotypes. One possible two-stage approach is to infer the phase of multilocus genotype data and analyze the resulting haplotypes as if known. Here, haplotypes are inferred using the expectation-maximization (EM) algorithm and the best-guess phase assignment for each individual analyzed. However, inferring haplotypes from phase-unknown data is prone to error and this should be taken into account in the subsequent analysis. An alternative approach is to analyze the phase-unknown multilocus genotypes themselves. Here we present a generalization of the method for phase-known haplotype data to the case of unphased SNP genotypes. Our approach is designed for high-density SNP data, so we opted to analyze the simulated dataset. The marker spacing in the initial screen was too large for our method to be effective, so we used the answers provided to request further data in regions around the disease loci and in null regions. Power to detect the disease loci, accuracy in localizing the true site of the locus, and false-positive error rates are reported for the inferred-haplotype and unphased genotype methods. For these data, analyzing inferred haplotypes outperforms analysis of genotypes. As expected, our results suggest that when there is little or no LD between a disease locus and the flanking region, there will be no chance of detecting it unless the disease variant itself is genotyped.

11.
Admixture between populations originating on different continents can be exploited to detect disease susceptibility loci at which risk alleles are distributed differentially between these populations. We first examine the statistical power and mapping resolution of this approach in the limiting situation in which gamete admixture and locus ancestry are measured without uncertainty. We show that, for a rare disease, the most efficient design is to study affected individuals only. In a typical African American population (two-way admixture proportions 0.8/0.2, ancestry crossover rate 2 per 100 cM), a study of 800 affected individuals has 90% power to detect at P values <10⁻⁵ a locus that generates a risk ratio of 2 between populations, with an expected mapping resolution (size of 95% confidence region for the position of the locus) of 4 cM. In practice, to infer locus ancestry from marker data requires Bayesian computationally intensive methods, as implemented in the program ADMIXMAP. Affected-only study designs require strong prior information on the frequencies of each allele given locus ancestry. We show how data from unadmixed and admixed populations can be combined to estimate these ancestry-specific allele frequencies within the admixed population under study, allowing for variation between allele frequencies in unadmixed and admixed populations. Using simulated data based on the genetic structure of the African American population, we show that 60% of information can be extracted in a test for linkage using markers with an ancestry information content of 36% at 3-cM spacing. As in classic linkage studies, the most efficient strategy is to use markers at a moderate density for an initial genome search and then to saturate regions of putative linkage with additional markers, to extract nearly all information about locus ancestry.

12.
We present a maximum likelihood method for mapping quantitative trait loci that uses linkage disequilibrium information from single and multiple markers. We made paired comparisons between analyses using a single marker, two markers and six markers. We also compared the method to single marker regression analysis under several scenarios using simulated data. In general, our method outperformed regression (smaller mean square error and confidence intervals of location estimate) for quantitative trait loci with dominance effects. In addition, the method provides estimates of the frequency and additive and dominance effects of the quantitative trait locus.

13.
OBJECTIVES: The rarity of familial neuroblastoma (NB) has allowed only a few linkage studies, most of which did not show any evidence of linkage to regions involved in somatic alterations or to genes implicated in other neurocristopathies seldom associated with NB. We screened a highly informative family with recurrent NB by genome-wide linkage analysis aimed at identifying chromosomal regions for NB predisposing genes. METHODS: A genome-wide screen was performed using 382 microsatellite markers. Multipoint model-based linkage analysis was carried out under a dominant mode of inheritance for the disease using the 'affected only' approach. RESULTS: Our analysis identified two haplotypes co-segregating with the disease on chromosomes 2p and 12p, and yielded maximum lod-score values of 3.01 (p < 0.0001) for markers on both intervals. CONCLUSIONS: Evidence of linkage was reported at 16p in North American families, whereas our studies excluded this interval and indicated other loci for disease predisposition, thus confirming the remarkable genetic heterogeneity of NB. These results suggest an oligogenic inheritance in NB involving more loci in genetic determination of the disease.

14.
Ionita I, Lo SH. Human Heredity, 2005, 60(4): 227-240
OBJECTIVE: The conventional affected sib pair methods evaluate the linkage information at a locus by considering only marginal information. We describe a multilocus linkage method that uses both the marginal information and information derived from the possible interactions among several disease loci, thereby increasing the significance of loci with modest effects. METHODS: Our method is based on a statistic that quantifies the linkage information contained in a set of markers. By a marker selection-reduction process, we screen a set of polymorphisms and select a few that seem linked to disease. RESULTS: We test our approach on genome scan data for inflammatory bowel disease (InfBD) and on simulated data. On real data we detect 6 of the 8 known InfBD loci; on simulated data we obtain improvements in power of up to 40% compared to a conventional single-locus method. CONCLUSION: Our extensive simulations and the results on real data show that our method is in general more powerful than single-locus methods in detecting disease loci responsible for complex traits. A further advantage of our approach is that it can be extended to make use of both the linkage and the linkage disequilibrium between disease loci and nearby markers.

15.
Putative prostate cancer susceptibility loci have recently been identified by genetic linkage analysis on chromosomes 1q24-25 (HPC1), 1q44.2-43 (PCaP), and Xq27-28 (HPCX). In order to estimate the genetic linkage in Icelandic prostate cancer families, we genotyped 241 samples from 87 families with eleven markers in the HPC1 region, six markers at PCaP, and eight at HPCX. Concurrently, we assessed allelic imbalance at the HPC1 and PCaP loci in selected tumors from the patients. For each of the candidate regions, the combined parametric and non-parametric LOD scores were strongly negative. Evidence for linkage allowing for genetic heterogeneity was also insignificant for all the regions. The results were negative irrespective of whether calculations were performed for the whole material or for a selected set of early age at onset families. The prevalence of allelic imbalance was relatively low in both the HPC1 (0%-9%) and PCaP (5%-20%) regions and was not elevated in tumors from positively linked families. Our studies indicate that the putative cancer susceptibility genes at chromosomes 1q24-25, 1q44.2-43, and Xq27-28 are unlikely to contribute significantly to hereditary prostate cancer in Iceland and that selective loss of the HPC1 and PCaP loci is a relatively rare somatic event in prostate cancers.

16.
Single nucleotide polymorphisms (SNPs), or biallelic markers, are popular in genetic linkage studies due to their abundance in the genome, stability, and ease of scoring. We determined the 'information ratio' (IR) of closely spaced SNPs in simulated nuclear families and affected sib pairs (ASPs). (The IR is the ratio of actual average maximum lod score to the maximum lod score attainable if the marker were fully informative.) The nuclear families included parental information, whereas the ASPs did not. We analyzed these SNPs in two ways: (1) using multipoint analysis, and (2) treating the SNPs as 'composite markers' (i.e., haplotypes, as assigned by GENEHUNTER). We also calculated the IR of a single microsatellite marker with multiple alleles and compared it with the IR from the SNPs. For each set of input conditions, we simulated 1000 nuclear families, of 2, 3, 4, or 5 children each, as well as 1000 ASPs. We generated SNP marker data for strings of k = 1, 2, 3, 5, 7, and 10 SNP loci, with no recombination (theta = 0) and no linkage disequilibrium among the SNPs. The MAF (minor allele frequency) was either 0.5 or 0.25, and allele frequencies were the same for all k loci in any analysis. We also generated marker data for one single-locus microsatellite marker, with m = 3, 4, 5, 6, 7, and 9 equally frequent alleles. In all simulations, the disease was fully penetrant dominant, and there was no recombination or linkage disequilibrium among markers or between marker and disease. When multipoint analysis was used, we found that 5-7 closely spaced SNPs were usually enough to yield an IR of approximately 100%, for nuclear families of any size. However, for the ASPs, even 7-10 SNPs yielded an IR of only 70-80%. A microsatellite with 9 equally frequent alleles yielded about the same IR (86-88%) as a string of 4-5 SNPs, in nuclear families. SNPs analyzed as 'composite markers' performed worse, due to the inherent ambiguity of SNP haplotyping.
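
The information ratio defined in this abstract can be sketched as a simple quotient. The function name and the lod-score values below are hypothetical and chosen only to show the arithmetic; they are not taken from the study.

```python
from statistics import mean


def information_ratio(observed_max_lods, max_attainable_lod):
    """IR = average observed maximum lod score / maximum lod score
    attainable with a fully informative marker."""
    if max_attainable_lod <= 0:
        raise ValueError("max_attainable_lod must be positive")
    if not observed_max_lods:
        raise ValueError("need at least one observed lod score")
    return mean(observed_max_lods) / max_attainable_lod


# Hypothetical maximum lod scores from four simulation replicates
simulated_max_lods = [2.40, 2.70, 2.55, 2.61]
# Hypothetical maximum lod score for a fully informative marker
fully_informative_lod = 3.0

ir = information_ratio(simulated_max_lods, fully_informative_lod)
print(f"IR = {ir:.1%}")
```

An IR close to 100% means that the marker set extracts nearly all of the linkage information that a fully informative marker would provide.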

17.
Genomewide scans for mapping loci have proved to be extremely powerful and popular. We present a semiparametric method of mapping a quantitative-trait locus (QTL) or QTLs with the use of sib-pair data generated from a two-stage genomic scan. In a two-stage genomic scan, either the entire genome or a large portion of the genome is saturated with low-density markers at the first stage. At the second stage, the intervals that are identified as probable locations of the trait loci, by means of analysis of data from the first stage, are then saturated with higher-density markers. These data are then analyzed for fine mapping of the loci. Our statistical strategy for analysis of data from the first stage is a low-stringency method based on the rank correlation of squared trait-difference values of the sib pairs and the estimated identity-by-descent scores at the marker loci. We suggest the use of a low-stringency method at the first stage, to save on computational time and to avoid missing any marker interval that may contain the trait loci. For analysis of data from the second stage, we have developed a high-stringency nonparametric-regression approach, using the kernel-smoothing technique. Through extensive simulations, we show that this approach is more powerful than is a currently used method for mapping QTLs by use of sib pairs, particularly in the presence of dominance and epistatic effects at the trait loci.

18.
Genomewide Scan of Multiple Sclerosis in Finnish Multiplex Families
Multiple sclerosis (MS) is a neurological, demyelinating disorder with a putative autoimmune etiology. It is thought to be a multifactorial disease with a complex mode of inheritance. Here we report the results of a two-stage genomewide scan for loci predisposing to MS. The first stage of the screen, with a low-resolution map, was performed in a selection of 16 pedigrees collected from an isolated Finnish population. Multipoint, non-parametric linkage analysis of the 328 markers did not reveal statistically significant results. However, 10 slightly interesting regions (P = .1-.15) emerged, including our previous findings of the HLA complex on 6p21 and a putative locus on 5p14-p12. Eight of these novel regions were further analyzed by use of denser marker maps, in the second stage of the scan. For the chromosomal regions 4cen, 11tel, and 17q, the statistical significance increased, but not conclusively; for 2q32 and 10q21, the statistical significance did not change. Accordingly, genotyping of the high-density markers in these regions was performed, and the data were analyzed by use of two-point, parametric linkage analysis using the complete pedigree information of the 21 Finnish multiplex families. We detected suggestive evidence for a predisposing locus on chromosomal region 17q22-q24. Several markers on 17q22-q24 yielded positive LOD scores, with the maximum LOD score (Zmax) occurring with D17S807 (Zmax = 2.8, theta = .04; dominant model). Interestingly, a suggestive linkage between MS and the markers on 17q22-q24 was also revealed by a recent genomewide scan in MS families from the United Kingdom.

19.
Detection of tandem duplications and implications for linkage analysis.
The first demonstration of an autosomal dominant human disease caused by segmental trisomy came in 1991 for Charcot-Marie-Tooth disease type 1A (CMT1A). For this disorder, the segmental trisomy is due to a large tandem duplication of 1.5 Mb of DNA located on chromosome 17p11.2-p12. The search for the CMT1A disease gene was misdirected and impeded because some chromosome 17 genetic markers that are linked to CMT1A lie within this duplication. To better understand how such a duplication might affect genetic analyses in the context of disease gene mapping, we studied the effects of marker duplication on transmission probabilities of marker alleles, on linkage analysis of an autosomal dominant disease, and on tests of linkage homogeneity. We demonstrate that the undetected presence of a duplication distorts transmission ratios, hampers fine localization of the disease gene, and increases false evidence of linkage heterogeneity. In addition, we devised a likelihood-based method for detecting the presence of a tandemly duplicated marker when one is suspected. We tested our methods through computer simulations and on CMT1A pedigrees genotyped at several chromosome 17 markers. On the simulated data, our method detected 96% of duplicated markers (with a false-positive rate of 5%). On the CMT1A data our method successfully identified two of three loci that are duplicated (with no false positives). This method could be used to identify duplicated markers in other regions of the genome and could be used to delineate the extent of duplications similar to that involved in CMT1A.

20.

Background

With dense genotyping, many choices exist for methods to detect quantitative trait loci (QTL) in livestock populations. However, no across-species study has been conducted on the performance of different methods using real data. We compared three methods that correct for relatedness either implicitly or explicitly: linkage and linkage disequilibrium haplotype-based analysis (LDLA), efficient mixed-model association (EMMA) analysis, and Bayesian whole-genome regression (BayesC). We analyzed one chromosome in each of five datasets (dairy cattle, beef cattle, sheep, horses, and pigs) using real genotypes based on dense single nucleotide polymorphisms and phenotypes. The P values corrected for multiple testing or Bayes factors greater than 150 were considered to be significant. To complete the real data study, we also simulated quantitative trait loci (QTL) for the same datasets based on the real genotypes. Several scenarios were chosen, with different QTL effects and linkage disequilibrium patterns. A pseudo-null statistical distribution was chosen to make the significance thresholds comparable across methods.

Results

For the real data, the three methods generally agreed within 1 or 2 cM for the locations of QTL regions and disagreed when no signals were significant (e.g. in pigs). For certain datasets, LDLA had more significant signals than EMMA or BayesC, but they were concentrated around the same peaks. Therefore, the three methods detected approximately the same number of QTL regions. For the simulated data, LDLA was slightly less powerful and accurate than either EMMA or BayesC but this depended strongly on how thresholds were set in the simulations.

Conclusions

All three methods performed similarly for real and simulated data. No method was clearly superior across all datasets or for any particular dataset. For computational efficiency and ease of interpretation, EMMA is recommended, but using more than one method is suggested.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0087-7) contains supplementary material, which is available to authorized users.
