Similar literature
 20 similar documents found.
1.
The abundance of the bath sponge Spongia agaricina has decreased drastically in recent years, and it is now considered an endangered species under Annex 3 of the Bern and Barcelona Conventions. We describe eight microsatellite markers and present data on their allelic variation and utility as high-resolution genetic markers. We analyzed 36 individuals from two populations and found that the number of alleles per locus ranged from 1 to 7. Observed heterozygosity ranged from 0 to 0.72. We found deviations from Hardy–Weinberg expectations for some loci, and we detected null alleles exclusively at those loci. Also, distributions of allele frequencies differed significantly between the two populations, making the markers suitable for population genetic analyses.

2.
Khon Kaen Province in northeast Thailand is known as a hot spot for opisthorchiasis in Southeast Asia. Preliminary allozyme and mitochondrial DNA haplotype data from within one endemic district in this Province (Ban Phai), indicated substantial genetic variability within Opisthorchis viverrini. Here, we used microsatellite DNA analyses to examine the genetic diversity and population structure of O. viverrini from four geographically close localities in Khon Kaen Province. Genotyping based on 12 microsatellite loci yielded a mean number of alleles per locus that ranged from 2.83 to 3.7 with an expected heterozygosity in Hardy–Weinberg equilibrium of 0.44–0.56. Assessment of population structure by pairwise FST analysis showed inter-population differentiation (P<0.05) which indicates population substructuring between these localities. Unique alleles were found in three of four localities with the highest number observed per locality being three. Our results highlight the existence of genetic diversity and population substructuring in O. viverrini over a small spatial scale which is similar to that found at a larger scale. This provides the basis for the investigation of the role of parasite genetic diversity and differentiation in transmission dynamics and control of O. viverrini.  相似文献   

3.
The most important decision faced by large-scale studies, such as those presently encountered in human genetics, is to distinguish tests that are true positives from those that are not. In the context of genetics, this entails identifying the genetic markers that actually underlie medically relevant phenotypes among the vast number of markers typically interrogated in genome-wide studies. A critical part of these decisions relies on the appropriate statistical assessment of data obtained from tests across numerous markers. Several methods have been developed to aid with such analyses, with family-wise approaches, such as the Bonferroni and Dunn–Šidák corrections, being popular. Conditions that motivate the use of family-wise corrections are explored. Although simple to implement, these approaches have one major limitation: they assume that p-values are i.i.d. uniformly distributed under the null hypothesis. However, several factors may violate this assumption in genome-wide studies, including confounding by population stratification, the presence of related individuals, the correlational structure among genetic markers, and the use of limiting distributions for test statistics. Even after adjustment for such effects, the distribution of p-values can substantially depart from a uniform distribution under the null hypothesis. In this work, I present a decision theory for the use of family-wise corrections for multiplicity and a generalization of the Dunn–Šidák correction that relaxes the assumption of uniformly distributed null p-values. The independence assumption is also relaxed and handled through calculating the effective number of independent tests. I also explicitly show the relationship between order statistics and family-wise correction procedures. This generalization may be applicable to multiplicity problems outside of genomics.
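
For orientation, the two classical family-wise corrections named in this abstract can be sketched as follows. This is a minimal illustration of the textbook Bonferroni and Dunn–Šidák per-test thresholds under the independent, uniform-null assumption that the abstract criticizes; it is not the generalized correction the author proposes, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance level that bounds the family-wise error rate by alpha."""
    return alpha / m


def sidak_threshold(alpha, m):
    """Per-test level giving a family-wise error rate of exactly alpha
    when the m null p-values are independent and uniform."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def familywise_error(per_test_alpha, m):
    """Probability of at least one false positive among m independent true nulls."""
    return 1.0 - (1.0 - per_test_alpha) ** m


if __name__ == "__main__":
    alpha, m = 0.05, 1000  # assumed example values
    for name, fn in (("Bonferroni", bonferroni_threshold), ("Sidak", sidak_threshold)):
        t = fn(alpha, m)
        print(name, t, familywise_error(t, m))
```

The Šidák threshold is never smaller than the Bonferroni threshold, so it is slightly less conservative when its independence assumption holds.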

4.
Genome-wide association studies (GWAS) have identified thousands of genetic variants that are associated with complex traits. However, a stringent significance threshold is required to identify robust genetic associations. Leveraging relevant auxiliary covariates has the potential to boost statistical power to exceed the significance threshold. In particular, abundant pleiotropy and the non-random distribution of SNPs across various functional categories suggest that leveraging GWAS test statistics from related traits and/or functional genomic data may boost GWAS discovery. While type 1 error rate control has become standard in GWAS, control of the false discovery rate can be a more powerful approach. The conditional false discovery rate (cFDR) extends the standard FDR framework by conditioning on auxiliary data to call significant associations, but current implementations are restricted to auxiliary data satisfying specific parametric distributions, typically GWAS p-values for related traits. We relax these distributional assumptions, enabling an extension of the cFDR framework that supports auxiliary covariates from arbitrary continuous distributions (“Flexible cFDR”). Our method can be applied iteratively, thereby supporting multi-dimensional covariate data. Through simulations we show that Flexible cFDR increases sensitivity whilst controlling FDR after one or several iterations. We further demonstrate its practical potential through application to an asthma GWAS, leveraging various functional genomic data to find additional genetic associations for asthma, which we validate in the larger, independent, UK Biobank data resource.

5.
Multiple testing (MT) with false discovery rate (FDR) control has been widely conducted in the “discrete paradigm” where p-values have discrete and heterogeneous null distributions. However, in this scenario existing FDR procedures often lose some power and may yield unreliable inference, and for this scenario there does not seem to be an FDR procedure that partitions hypotheses into groups, employs data-adaptive weights and is nonasymptotically conservative. We propose a weighted p-value-based FDR procedure, “weighted FDR (wFDR) procedure” for short, for MT in the discrete paradigm that efficiently adapts to both heterogeneity and discreteness of p-value distributions. We theoretically justify the nonasymptotic conservativeness of the wFDR procedure under independence, and show via simulation studies that, for MT based on p-values of binomial test or Fisher's exact test, it is more powerful than six other procedures. The wFDR procedure is applied to two examples based on discrete data, a drug safety study, and a differential methylation study, where it makes more discoveries than two existing methods.  相似文献   
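
As background for the FDR control discussed in this abstract, the standard unweighted Benjamini–Hochberg step-up procedure can be sketched as below. This is the usual baseline only, not the weighted wFDR procedure the authors propose; the function name and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at nominal FDR level q."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= q * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, q=0.05))
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank-specific threshold, as long as a larger p-value further down the ordering meets its threshold.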

6.
It is widely acknowledged that genome-wide association studies (GWAS) of complex human disease fail to explain a large portion of heritability, primarily due to lack of statistical power—a problem that is exacerbated when seeking detection of interactions of multiple genomic loci. An untapped source of information that is already widely available, and that is expected to grow in coming years, is population samples. Such samples contain genetic marker data for additional individuals, but not their relevant phenotypes. In this article we develop a highly efficient testing framework based on a constrained maximum-likelihood estimate in a case–control–population setting. We leverage the available population data and optional modeling assumptions, such as Hardy–Weinberg equilibrium (HWE) in the population and linkage equilibrium (LE) between distal loci, to substantially improve power of association and interaction tests. We demonstrate, via simulation and application to actual GWAS data sets, that our approach is substantially more powerful and robust than standard testing approaches that ignore or make naive use of the population sample. We report several novel and credible pairwise interactions, in bipolar disorder, coronary artery disease, Crohn’s disease, and rheumatoid arthritis.  相似文献   

7.
A key challenge in genomics is to identify genetic variants that distinguish patients with different survival time following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. This is because the two populations determined by a genetic variant may have very different sizes, and because the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data where the standard log-rank test leads to many false positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the p-value of the log-rank statistic under an exact distribution that is appropriate for populations of any size. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported p-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens to hundreds of likely false positive associations as more significant than these known associations.

8.
G Dharmarajan, J C Beasley, O E Rhodes Jr. Heredity, 2011, 106(2): 253-260
Population genetics is increasingly being used to study the biology of parasites at the scales of both the host (infrapopulation, IP) and host population (component population, CP). In this study we tested three mechanistic hypotheses that could explain deviations from Hardy–Weinberg equilibrium (HWE) expectations due to heterozygote deficits (HDs) at the CP scale in raccoon ticks (Ixodes texanus; n=718) collected from raccoons (Procyon lotor; n=91) and genotyped at 11 microsatellite loci. These hypotheses were presence of technical issues (for example, null alleles), hierarchical structure (for example, host demography) and cryptic structure (for example, kin structure). Although statistical support for null alleles existed, their presence would also be expected to lead to an underestimation in levels of relatedness, and thus kin structure. However, we found the opposite pattern: significant HD at the IP scale being more likely in CPs with significant vs non-significant levels of kin structure. Our analyses revealed that pooling of kin groups could lead to highly variable levels of FIS among loci, a pattern usually suggestive of null alleles. We used Monte–Carlo (MC) simulations to show that the existence of subdivided breeding groups and high variance in individual reproductive success could adequately explain deviations from HWE in I. texanus. Thus, our results indicate that biological factors can lead to patterns that have usually been interpreted as technical issues (for example, null alleles), and that it is important to take such factors into consideration because loci deviating from HWE likely reflect the effects of real biological processes.  相似文献   

9.
The marsh fritillary (Euphydryas aurinia) is a critically endangered butterfly species in Denmark known to be particularly vulnerable to habitat fragmentation due to its poor dispersal capacity. We identified and genotyped 318 novel SNP loci across 273 individuals obtained from 10 small and fragmented populations in Denmark using a genotyping-by-sequencing (GBS) approach to investigate its population genetic structure. Our results showed clear genetic substructuring and highly significant population differentiation based on genetic divergence (FST) among the 10 populations. The populations grouped into three overall clusters, and due to further substructuring among these, it was possible to clearly distinguish six clusters in total. We found highly significant deviations from Hardy–Weinberg equilibrium due to heterozygote deficiency within every population investigated, which indicates substructuring and/or inbreeding (due to mating among closely related individuals). The stringent filtering procedure that we applied to our genotype quality could have led to overestimation of the heterozygote deficiency and the degree of substructuring of our clusters, but it allows relative comparisons of the genetic parameters among clusters. Genetic divergence increased significantly with geographic distance, suggesting limited gene flow at spatial scales comparable to the dispersal distance of individual butterflies and strong isolation by distance. Altogether, our results clearly indicate that the marsh fritillary populations are genetically isolated. Further, our results highlight that the relevant spatial scale for conservation of rare, low-mobility species may be smaller than previously anticipated.

10.
Gilbert PB, Wu C, Jobes DV. Biometrics, 2008, 64(1): 198-207
Summary. Consider a placebo-controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify positions at which the amino acids in infected vaccine recipient sequences either (A) are more divergent from the reference amino acid than the amino acids in infected placebo recipient sequences or (B) have a different frequency distribution than the placebo sequences, irrespective of a reference amino acid. We consider t-test-type statistics for problem A and Euclidean, Mahalanobis, and Kullback–Leibler-type statistics for problem B. The test statistics incorporate weights to reflect biological information contained in different amino acid positions and mismatches. Position-specific p-values are obtained by approximating the null distribution of the statistics either by a permutation procedure or by a nonparametric estimation. A permutation method is used to estimate a cut-off p-value to control the per comparison error rate at a prespecified level. The methods are examined in simulations and are applied to two HIV examples. The methods for problem B address the general problem of comparing discrete frequency distributions between groups in a high-dimensional data setting.

11.
We consider the problem of testing a statistical hypothesis where the scientifically meaningful test statistic is a function of latent variables. In particular, we consider detection of genetic linkage, where the latent variables are patterns of inheritance at specific genome locations. Introduced by Geyer & Meeden (2005), fuzzy p-values are random variables, described by their probability distributions, that are interpreted as p-values. For latent variable problems, we introduce the notion of a fuzzy p-value as having the conditional distribution of the latent p-value given the observed data, where the latent p-value is the random variable that would be the p-value if the latent variables were observed. The fuzzy p-value provides an exact test using two sets of simulations of the latent variables under the null hypothesis, one unconditional and the other conditional on the observed data. It provides not only an expression of the strength of the evidence against the null hypothesis but also an expression of the uncertainty in that expression owing to lack of knowledge of the latent variables. We illustrate these features with an example of simulated data mimicking a real example of the detection of genetic linkage.

12.
Genetic variants within the endothelin-1 gene (EDN1) have been associated with several cardiovascular diseases and may act as genetic prognostic markers. Here, we explored the overall relevance of EDN1 polymorphisms for long-term survival in patients undergoing on-pump cardiac surgery. A prospectively collected cohort of 455 Caucasian patients who underwent cardiac surgery with cardiopulmonary bypass was followed up for 5 years. The obtained genotypes and inferred haplotypes were analyzed for their associations with the five-year mortality rate (primary endpoint). The EDN1 T-1370G and K198N genotype distributions did not deviate from Hardy–Weinberg equilibrium and the major allele frequencies were 83% and 77%, respectively. The cardiovascular risk factors were equally distributed in terms of the different genotypes and haplotypes associated with the two polymorphisms. The five-year mortality rate did not differ among the different EDN1 T-1370G and K198N genotypes and haplotypes. Haplotype analysis revealed that carriers of the G-T (compound EDN1 T-1370G G/K198N T) haplotype had a higher cardiac index than did non-carriers (p = 0.0008); however, this difference did not reach significance after adjusting for multiple testing. The results indicate that common variations in EDN1 do not act as prognostic markers for long-term survival in patients undergoing on-pump cardiac surgery.  相似文献   

13.
Jon Wakefield. Biometrics, 2010, 66(1): 257-265
Summary. Testing for Hardy–Weinberg equilibrium is ubiquitous and has traditionally been carried out via frequentist approaches. However, the discreteness of the sample space means that uniformity of p-values under the null cannot be assumed, with enumeration of all possible counts, conditional on the minor allele count, offering a computationally expensive way of p-value calibration. In addition, the interpretation of the subsequent p-values and the choice of significance threshold depend critically on sample size, because equilibrium will always be rejected at conventional levels with large sample sizes. We argue for a Bayesian approach using both Bayes factors and the examination of posterior distributions. We describe simple conjugate approaches, and methods based on importance sampling Monte Carlo. The former are convenient because they yield closed-form expressions for Bayes factors, which allow their application to a large number of single nucleotide polymorphisms (SNPs), in particular in genome-wide contexts. We also describe straightforward direct sampling methods for examining posterior distributions of parameters of interest. For large numbers of alleles at a locus we resort to Markov chain Monte Carlo. We discuss a number of possibilities for prior specification, and apply the suggested methods to a number of real datasets.

14.
Meta-analysis of genetic data must account for differences among studies, including study designs, markers genotyped, and covariates. The effects of genetic variants may also differ from population to population, i.e., there may be heterogeneity. Thus, meta-analysis that combines data from multiple studies is difficult, and novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT, and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies.

15.
Testing of Hardy–Weinberg proportions (HWP) with asymptotic goodness-of-fit tests is problematic when the contingency table of observed genotype counts has sparse cells or the sample size is low, and exact procedures are to be preferred. Exact p-values can be (1) calculated via computationally demanding enumeration methods or (2) approximated via simulation methods. Our objective was to develop a new algorithm for exact tests of HWP with multiple alleles on the basis of conditional probabilities of genotype arrays, which is faster than existing algorithms. We derived an algorithm for calculating the exact permutation significance value without enumerating all genotype arrays having the same allele counts as the observed one. The algorithm can be used for testing HWP by (1) summation of the conditional probabilities of occurrence of genotype arrays with smaller probability than the observed one, and (2) comparison of the sum with a nominal Type I error rate α. Application to published experimental data from seven maize populations showed that the exact test is computationally feasible and reduces the number of enumerated genotype count matrices by about 30% compared with previously published algorithms.
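
The testing rule described in this abstract (sum the conditional probabilities of all genotype arrays that are no more probable than the observed one, then compare the sum with α) can be illustrated for the simplest case of a single locus with two alleles. The sketch below uses full enumeration of heterozygote counts, so it is the classical biallelic exact test and not the faster multi-allele algorithm the authors derive; the function names are hypothetical.

```python
from math import exp, lgamma, log


def _log_factorial(n):
    """Natural log of n! computed via the log-gamma function."""
    return lgamma(n + 1)


def het_probability(n_het, n_a, n):
    """Conditional probability of n_het heterozygotes among n individuals,
    given n_a copies of allele A, under Hardy-Weinberg proportions."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_het) // 2
    n_bb = (n_b - n_het) // 2
    log_p = (
        _log_factorial(n) - _log_factorial(n_aa) - _log_factorial(n_het)
        - _log_factorial(n_bb) + n_het * log(2)
        + _log_factorial(n_a) + _log_factorial(n_b) - _log_factorial(2 * n)
    )
    return exp(log_p)


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact p-value: total probability of all heterozygote counts that are
    no more probable than the observed count, given the allele counts."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a
    # The heterozygote count always has the same parity as the allele count.
    probs = {h: het_probability(h, n_a, n) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    # Small relative tolerance so that ties with the observed probability are included.
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    print(hwe_exact_pvalue(5, 0, 5))     # no heterozygotes at all: strong deficit
    print(hwe_exact_pvalue(25, 50, 25))  # counts close to Hardy-Weinberg proportions
```

A locus would be flagged as deviating from HWP when the returned value falls below the chosen α.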

16.
Agaricus bisporus is a popular edible mushroom that is cultivated worldwide. Due to its secondary homothallic nature, cultivated A. bisporus strains have low genetic diversity, and breeding novel strains is challenging. The aim of this study was to investigate the genetic diversity and population structure of globally collected A. bisporus strains using simple sequence repeat (SSR) markers. Agaricus bisporus strains were divided based on genetic distance-based groups and model-based subpopulations. The major allele frequency (MAF), number of genotypes (NG), number of alleles (NA), observed heterozygosity (HO), expected heterozygosity (HE), and polymorphic information content (PIC) were calculated, and genetic distance, population structure, genetic differentiation, and Hardy–Weinberg equilibrium (HWE) were assessed. Strains were divided into two groups by distance-based analysis and into three subpopulations by model-based analysis. Strains in subpopulations POP A and POP B were included in Group I, and strains in subpopulation POP C were included in Group II. Genetic differentiation between strains was 99%. Marker AB-gSSR-1057 in Group II and subpopulation POP C was confirmed to be in HWE. These results will enhance A. bisporus breeding programs and support the protection of genetic resources.  相似文献   

17.
Enrichment analysis of gene sets is a popular approach that provides a functional interpretation of genome-wide expression data. Existing tests are affected by inter-gene correlations, resulting in a high Type I error. The most widely used test, Gene Set Enrichment Analysis, relies on computationally intensive permutations of sample labels to generate a null distribution that preserves gene–gene correlations. A more recent approach, CAMERA, attempts to correct for these correlations by estimating a variance inflation factor directly from the data. Although these methods generate P-values for detecting gene set activity, they are unable to produce confidence intervals or allow for post hoc comparisons. We have developed a new computational framework for Quantitative Set Analysis of Gene Expression (QuSAGE). QuSAGE accounts for inter-gene correlations, improves the estimation of the variance inflation factor and, rather than evaluating the deviation from a null hypothesis with a P-value, it quantifies gene-set activity with a complete probability density function. From this probability density function, P-values and confidence intervals can be extracted and post hoc analysis can be carried out while maintaining statistical traceability. Compared with Gene Set Enrichment Analysis and CAMERA, QuSAGE exhibits better sensitivity and specificity on real data profiling the response to interferon therapy (in chronic Hepatitis C virus patients) and Influenza A virus infection. QuSAGE is available as an R package, which includes the core functions for the method as well as functions to plot and visualize the results.  相似文献   

18.
There is increasing evidence that pleiotropy, the association of multiple traits with the same genetic variants/loci, is a very common phenomenon. Cross-phenotype association tests are often used to jointly analyze multiple traits from a genome-wide association study (GWAS). The underlying methods, however, are often designed to test the global null hypothesis that there is no association of a genetic variant with any of the traits, the rejection of which does not implicate pleiotropy. In this article, we propose a new statistical approach, PLACO, for specifically detecting pleiotropic loci between two traits by considering an underlying composite null hypothesis that a variant is associated with none or only one of the traits. We propose testing the null hypothesis based on the product of the Z-statistics of the genetic variants across two studies and derive a null distribution of the test statistic in the form of a mixture distribution that allows for fractions of variants to be associated with none or only one of the traits. We borrow approaches from the statistical literature on mediation analysis that allow asymptotic approximation of the null distribution avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate that the proposed method can maintain type I error and can achieve major power gain over alternative simpler methods that are typically used for testing pleiotropy. PLACO allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. Application of PLACO to publicly available summary data from two large case-control GWAS of Type 2 Diabetes and of Prostate Cancer implicated a number of novel shared genetic regions: 3q23 (ZBTB38), 6q25.3 (RGS17), 9p22.1 (HAUS6), 9p13.3 (UBAP2), 11p11.2 (RAPSN), 14q12 (AKAP6), 15q15 (KNL1) and 18q23 (ZNF236).  相似文献   

19.
With the development of next-generation sequencing technology, there is a great demand for powerful statistical methods to detect rare variants (minor allele frequencies (MAFs)<1%) associated with diseases. Testing for each variant site individually is known to be underpowered, and therefore many methods have been proposed to test for the association of a group of variants with phenotypes, by pooling signals of the variants in a chromosomal region. However, this pooling strategy inevitably leads to the inclusion of a large proportion of neutral variants, which may compromise the power of association tests. To address this issue, we extend the -MidP method (Cheung et al., 2012, Genet Epidemiol 36: 675–685) and propose an approach (named ‘adaptive combination of P-values for rare variant association testing’, abbreviated as ‘ADA’) that adaptively combines per-site P-values with the weights based on MAFs. Before combining P-values, we first imposed a truncation threshold upon the per-site P-values, to guard against the noise caused by the inclusion of neutral variants. This ADA method is shown to outperform popular burden tests and non-burden tests under many scenarios. ADA is recommended for next-generation sequencing data analysis where many neutral variants may be included in a functional region.  相似文献   

20.
C Li, Y Sun, H W Huang, C H Cannon. Heredity, 2014, 113(6): 533-541
Given predicted rapid climate change, an understanding of how environmental factors affect genetic diversity in natural populations is important. Future selection pressures are inherently unpredictable, so forest management policies should both maintain overall diversity and identify genetic markers associated with the environmental factors expected to change most rapidly, like temperature and rainfall. In this study, we genotyped 648 individuals in 28 populations of Castanopsis fargesii (Fagaceae) using 32 expressed sequence tag (EST)-derived microsatellite markers. After removing six loci that departed from Hardy–Weinberg equilibrium, we measured genetic variation and population structure and identified candidate loci putatively under selection by temperature and precipitation. We found that C. fargesii populations possessed high genetic diversity and moderate differentiation among them, indicating predominant outcrossing and few restrictions to gene flow. These patterns reduce the possible impact of stochastic effects or the influence of genetic isolation. Clear footprints of divergent selection at four loci were discovered. Frequencies of five alleles at these loci were strongly correlated with environmental factors, particularly extremes in precipitation. These alleles varied from being near fixation at one end of the gradient to being completely absent at the other. Our study species is an important forest tree in the subtropical regions of China and could have a major role in future management and reforestation plans. Our results demonstrate that gene flow is widespread and abundant in natural populations, maintaining high diversity, while diversifying selection is acting on specific genomic regions.
