Similar Documents
1.
Haplotype-based methods have become increasingly popular in the last decade because shared lengths in haplotypes can be used for disease localization. In this contribution, we propose a novel linkage-based haplotype-sharing approach for quantitative traits based on the class of Mantel statistics, which is closely related to the weighted pair-wise correlation statistic. Because these statistics are known to be liberal, we propose a permutation test to evaluate significance. We applied the Mantel statistic to the autosomal data from the genome-wide scan of the Collaborative Study on the Genetics of Alcoholism with the Affymetrix Genotype 10 K array that was provided for Genetic Analysis Workshop 14. Four regions, on chromosomes 4, 8, 16, and 20, showed p-values less than 0.005, with a minimum p-value of < 0.0001 on chromosome 16 (tsc0520638 at 72.8 cM). Three of these four regions, on chromosomes 4, 16, and 20, have been reported previously at Genetic Analysis Workshop 11.
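To make the statistic in item 1 more concrete, here is a minimal Python sketch, assuming the pairwise shared haplotype lengths have already been computed and stored in a symmetric matrix. It sums shared length times the mean-corrected phenotype cross-product over all haplotype pairs, then obtains a one-sided p-value by permuting phenotypes across haplotypes, in the spirit of the permutation test the abstract proposes. The function names and toy data are hypothetical; this is not the authors' implementation.

```python
import random


def sharing_statistic(shared, y):
    """Mantel-type statistic: sum over pairs i < j of
    shared[i][j] * (y[i] - mean(y)) * (y[j] - mean(y))."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += shared[i][j] * dev[i] * dev[j]
    return total


def permutation_pvalue(shared, y, n_perm=999, seed=1):
    """One-sided p-value: how often a random reassignment of phenotypes to
    haplotypes gives a statistic at least as large as the observed one."""
    rng = random.Random(seed)
    observed = sharing_statistic(shared, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if sharing_statistic(shared, y_perm) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): haplotypes 0-3 share long segments with each other,
# as do haplotypes 4-7, and the trait values follow the same grouping.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
shared = [[10.0 if gi == gj else 1.0 for gj in groups] for gi in groups]
y = [5.0, 5.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0]
stat, p = permutation_pvalue(shared, y)
```

Permuting phenotypes while holding the sharing matrix fixed keeps the correlation structure among haplotype pairs intact, which is why a permutation reference distribution is preferred here over a normal approximation.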

2.
A simple method for the spectral analysis of multispecies microfossil data through time or stratigraphic level is presented. The method is based on the Mantel correlogram, allowing any ecological similarity measure to be used. The method can therefore be applied to binary (presence-absence) data as well as raw or normalized species counts. In contrast with spectral analysis of univariate ordination scores, this approach does not explicitly discard information. The method, referred to as the Mantel periodogram, is exemplified with a data set from the literature, demonstrating several astronomically forced periodicities in microfaunal data from the Plio-Pleistocene.
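The abstract does not spell out how the Mantel periodogram itself is computed, so the sketch below shows only the Mantel correlogram it builds on, assuming presence-absence data, evenly spaced samples, and the Jaccard distance (any similarity measure could be swapped in). For each stratigraphic lag, the Mantel statistic is the correlation between the faunal distance matrix and a model matrix marking the sample pairs at that lag; the sign is flipped so that similar assemblages at a given lag give positive values. Function names and toy data are illustrative assumptions.

```python
import math


def jaccard_distance(a, b):
    """1 - |species in both samples| / |species in either sample|."""
    both = sum(1 for u, v in zip(a, b) if u and v)
    either = sum(1 for u, v in zip(a, b) if u or v)
    return 0.0 if either == 0 else 1.0 - both / either


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_correlogram(samples, max_lag):
    """Mantel statistic for each lag 1..max_lag between evenly spaced samples.
    Requires max_lag <= len(samples) - 2 so each model matrix has 0s and 1s."""
    n = len(samples)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dist = [jaccard_distance(samples[i], samples[j]) for i, j in pairs]
    result = []
    for lag in range(1, max_lag + 1):
        model = [1.0 if j - i == lag else 0.0 for i, j in pairs]
        # Negate: low distance at this lag (similar fauna) -> positive value.
        result.append((lag, -pearson(dist, model)))
    return result


# Two hypothetical assemblages alternating level by level (period of 2 levels).
a, b = [1, 1, 0, 0], [0, 0, 1, 1]
levels = [a, b, a, b, a, b, a, b]
correlogram = mantel_correlogram(levels, max_lag=3)
```

With this alternating toy sequence, odd lags pair dissimilar assemblages and even lags pair identical ones, so the correlogram oscillates in sign; a periodogram approach would look for such regularities across many trial periods.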

3.
Plant species distributions are generally thought to be chiefly under environmental control, although they may be affected by disturbance events or dispersion properties of the species. The relative importance of these different factors is not easy to evaluate because they often share common spatial patterns, such that an inextricable network of relationships occurs between plant distributions, environmental conditions, disturbance events and endogenous factors such as propagule dispersion. In this paper we propose a method for untangling the common spatial component from the relationship between environmental conditions and the distribution of tree species. Using partial Mantel tests and path analysis, we test models of relationships between these data sets. Results show that in our study area, spatial patterns of species associated with hydric conditions remain largely correlated with environmental conditions. However, mesic sites show more complex forest covers, in which a significant spatial component persists when environmental variation is statistically controlled for. This remaining spatial variability suggests that other factors possessing spatial structure partly explain species distributions.
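A partial Mantel statistic of the kind named in item 3 can be sketched as the partial correlation between the off-diagonal elements of two distance matrices while holding a third (for example, geographic distance) constant. This minimal Python version follows the classical partial-correlation formula; the permutation test of significance and the path analysis are left out, and the function names and toy data are hypothetical.

```python
import math


def upper_triangle(m):
    """Entries above the diagonal, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_mantel_r(a, b, c):
    """Correlation between matrices a and b after controlling for matrix c:
    r_ab.c = (r_ab - r_ac * r_bc) / sqrt((1 - r_ac**2) * (1 - r_bc**2))."""
    xa, xb, xc = upper_triangle(a), upper_triangle(b), upper_triangle(c)
    r_ab, r_ac, r_bc = pearson(xa, xb), pearson(xa, xc), pearson(xb, xc)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Toy example: 5 sites along a transect with one environmental variable.
x_pos = [0.0, 1.0, 2.0, 3.0, 4.0]
moisture = [0.2, 0.9, 0.4, 0.8, 0.1]
geo = [[abs(p - q) for q in x_pos] for p in x_pos]
env = [[abs(p - q) for q in moisture] for p in moisture]
# Hypothetical species distances built as an exact linear mix of env and geo,
# so once geo is controlled for, species and env should correlate perfectly.
species = [[0.8 * env[i][j] + 0.05 * geo[i][j] for j in range(5)] for i in range(5)]
r_species_env_given_geo = partial_mantel_r(species, env, geo)
```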

4.
OBJECTIVE: The potential value of haplotypes has attracted widespread interest in the mapping of complex traits. Haplotype sharing methods take the linkage disequilibrium information between multiple markers into account, and may have good power to detect predisposing genes. We present a new approach based on Mantel statistics for space-time clustering, developed to improve the power of haplotype sharing analysis for gene mapping in complex disease. METHODS: The new statistic correlates genetic similarity and phenotypic similarity across pairs of haplotypes for case-only and case-control studies. The genetic similarity is measured as the shared length between haplotypes around a putative disease locus. The phenotypic similarity is measured as the mean-corrected cross-product based on the respective phenotypes. We analyzed two tests for statistical significance with respect to type I error: (1) assuming asymptotic normality, and (2) using a Monte Carlo permutation procedure. The results were compared to the χ² test for association based on 3-marker haplotypes. RESULTS: The type I error rates showed that the Mantel statistics evaluated with the permutation procedure yielded pointwise valid tests. The approach based on the assumption of asymptotic normality was seriously liberal. CONCLUSION: Power comparisons showed that the Mantel statistics were better than or equal to the χ² test for all simulated disease models.
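The genetic similarity in item 4 is the shared length between two haplotypes around a putative disease locus. A minimal Python sketch, assuming haplotypes are lists of alleles at ordered markers with known map positions: extend left and right from the locus marker while the alleles agree, and report the distance between the outermost matching markers. Under this simple convention a run consisting of the locus marker alone has length zero; published definitions differ (for example, extending halfway to the first discordant marker), so treat this as one illustrative choice with hypothetical names.

```python
def shared_length(h1, h2, positions, locus):
    """Length of the uninterrupted run of identical alleles between haplotypes
    h1 and h2 that contains marker index `locus`, measured as the distance
    between the outermost matching markers (e.g. in cM). Returns 0.0 if the
    haplotypes differ at the locus marker itself."""
    if h1[locus] != h2[locus]:
        return 0.0
    left = locus
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = locus
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return positions[right] - positions[left]


def similarity_matrix(haplotypes, positions, locus):
    """Pairwise shared lengths around the putative locus."""
    n = len(haplotypes)
    return [[shared_length(haplotypes[i], haplotypes[j], positions, locus)
             for j in range(n)] for i in range(n)]


positions = [0.0, 1.0, 2.5, 4.0, 6.0]  # hypothetical marker map (cM)
h1 = [0, 1, 1, 0, 1]
h2 = [1, 1, 1, 0, 0]
# The haplotypes agree at markers 1-3, so the run spans 4.0 - 1.0 = 3.0 cM.
length_at_marker_2 = shared_length(h1, h2, positions, locus=2)
```

A matrix like the one from `similarity_matrix` is what the Mantel statistic would then correlate with the phenotypic cross-products.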

5.
The Mantel test is widely used to test the linear or monotonic independence of the elements in two distance matrices. It is one of the few appropriate tests when the hypothesis under study can only be formulated in terms of distances; this is often the case with genetic data. In particular, the Mantel test has been widely used to test for spatial relationship between genetic data and spatial layout of the sampling locations. We describe the domain of application of the Mantel test and derived forms. Formula development demonstrates that the sum-of-squares (SS) partitioned in Mantel tests and regression on distance matrices differs from the SS partitioned in linear correlation, regression and canonical analysis. Numerical simulations show that in tests of significance of the relationship between simple variables and multivariate data tables, the power of linear correlation, regression and canonical analysis is far greater than that of the Mantel test and derived forms, meaning that the former methods are much more likely than the latter to detect a relationship when one is present in the data. Examples of difference in power are given for the detection of spatial gradients. Furthermore, the Mantel test does not correctly estimate the proportion of the original data variation explained by spatial structures. The Mantel test should not be used as a general method for the investigation of linear relationships or spatial structures in univariate or multivariate data. Its use should be restricted to tests of hypotheses that can only be formulated in terms of distances.
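Because item 5's argument turns on what the Mantel test actually computes, a minimal sketch may help. The statistic is a Pearson correlation over the n(n-1)/2 pairwise distances rather than over the original variables, which is the root of the sum-of-squares difference the authors describe, and significance comes from permuting the rows and columns of one matrix together. The function names and the toy spatial and genetic matrices below are hypothetical.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(m, order=None):
    """Entries above the diagonal, optionally after relabelling rows and
    columns by the permutation `order`."""
    n = len(m)
    idx = list(range(n)) if order is None else order
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(d1, d2, n_perm=999, seed=0):
    """Mantel r (Pearson correlation of the off-diagonal entries) with a
    one-sided permutation p-value. Rows and columns of d2 are permuted
    together, so each permuted matrix is still a valid distance matrix."""
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    rng = random.Random(seed)
    order = list(range(len(d2)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(x, upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical example: genetic distances that track spatial distances.
coords = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
spatial = [[abs(a - b) for b in coords] for a in coords]
genetic = [[0.5 * abs(a - b) + (0.1 if a != b else 0.0) for b in coords]
           for a in coords]
r, p = mantel_test(spatial, genetic)
```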

6.
Methods for genetic linkage analysis using trisomies.
Certain genetic disorders are rare in the general population, but more common in individuals with specific trisomies. Examples of this include leukemia and duodenal atresia in trisomy 21. This paper presents a linkage analysis method for using trisomic individuals to map genes for such traits. It is based on a very general gene-specific dosage model that posits that the trait is caused by specific effects of different alleles at one or a few loci and that duplicate copies of "susceptibility" alleles inherited from the nondisjoining parent give increased likelihood of having the trait. Our mapping method is similar to identity-by-descent-based mapping methods using affected relative pairs and also to methods for mapping recessive traits using inbred individuals by looking for markers with greater than expected homozygosity by descent. In the trisomy case, one would take trisomic individuals and look for markers with greater than expected homozygosity in the chromosomes inherited from the nondisjoining parent. We present statistical methods for performing such a linkage analysis, including a test for linkage to a marker, a method for estimating the distance from the marker to the trait gene, a confidence interval for that distance, and methods for computing power and sample sizes. We also resolve some practical issues involved in implementing the methods, including how to use partially informative markers and how to test candidate genes.

7.
A total of 141 vascular plant species were recorded from 20 quadrats on five isolated granite outcrops in southern Western Australia. The distributions of common plants, rare plants, trophic guilds, and the complete data were analysed with respect to substrate (sheet granite or scree), location (granite outcrop), and a broader classification of locations into two western and three eastern outcrops. Most of the observed correlations were in accord with a hypothesis of environmental control, which predicts distributions reflecting spatially autocorrelated environmental parameters. However, plant guilds were very strongly correlated with substrate and relatively weakly correlated with location, a result that can be explained by coexistence through geographic replacement and contingent exclusion. Rare plants were more likely than common plants to grow on either substrate. This contrasts with the supposition that rare plants have limited environmental tolerances, but it may have been caused by spatial or temporal mass effects. The Mantel test and its recent extensions allow non-exclusive, competing hypotheses to be distinguished.

8.
Correct classification and prediction of tumor cells is essential for a successful diagnosis and reliable future treatment. In this study, we used genetic algorithms for feature selection and proposed silhouette statistics as a discriminant function to distinguish between six subtypes of pediatric acute lymphoblastic leukemia, using microarray data on the expression of thousands of genes. Our methods achieved better classification accuracy than previously published methods and identified a set of genes that effectively discriminates the subtypes of pediatric acute lymphoblastic leukemia. Furthermore, silhouette statistics, which measure classification quality both through a graphical display and through the average silhouette width, proved feasible and novel for more difficult multiclass tumor prediction problems.
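The silhouette statistics mentioned in item 8 have a compact definition, which the following minimal Python sketch computes with Euclidean distance: for each sample, a is its mean distance to the other members of its own class, b is its smallest mean distance to the members of any other class, and s = (b - a) / max(a, b). The average silhouette width summarizes how well a labeling separates the data. The paper embeds this in genetic-algorithm feature selection as a discriminant; that part is not reproduced, and the names and toy profiles are hypothetical.

```python
import math


def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for each point, using
    Euclidean distance. Needs at least two classes; singletons get 0."""

    def dist(p, q):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))

    classes = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(dist(p, points[j]) for j, lab in enumerate(labels) if lab == c)
            / labels.count(c)
            for c in classes
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths


def average_silhouette(points, labels):
    w = silhouette_widths(points, labels)
    return sum(w) / len(w)


# Two well-separated groups of (hypothetical) two-gene expression profiles.
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = average_silhouette(profiles, [0, 0, 1, 1])  # labels match the groups
bad = average_silhouette(profiles, [0, 1, 0, 1])   # labels cut across them
```

Widths near 1 indicate samples that sit well inside their class, while negative widths flag samples closer to another class, which is what makes the average width usable as a quality score for a candidate gene subset.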

9.
10.
Ueki M, Cordell HJ. PLoS Genetics, 2012, 8(4): e1002625
Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new "joint effects" statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. 
We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result.

11.
The statistics of bulk segregant analysis using next generation sequencing
We describe a statistical framework for QTL mapping using bulk segregant analysis (BSA) based on high throughput, short-read sequencing. Our proposed approach is based on a smoothed version of the standard G statistic, and takes into account variation in allele frequency estimates due to sampling of segregants to form bulks as well as variation introduced during the sequencing of bulks. Using simulation, we explore the impact of key experimental variables such as bulk size and sequencing coverage on the ability to detect QTLs. Counterintuitively, we find that relatively large bulks maximize the power to detect QTLs even though this implies weaker selection and less extreme allele frequency differences. Our simulation studies suggest that with large bulks and sufficient sequencing depth, the methods we propose can be used to detect even weak effect QTLs and we demonstrate the utility of this framework by application to a BSA experiment in the budding yeast Saccharomyces cerevisiae.
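The "standard G statistic" in item 11 is the likelihood-ratio statistic for a contingency table; for BSA it can be applied per SNP to the 2x2 table of reference and alternative read counts in the two bulks. The minimal Python sketch below computes it and then smooths it along the chromosome with a tricube-weighted local average, as one plausible choice of smoother. The window width, function names, and read counts are assumptions, and the paper's treatment of sampling and sequencing variance is not reproduced.

```python
import math


def g_statistic(a_ref, a_alt, b_ref, b_alt):
    """G = 2 * sum(O * ln(O / E)) for the 2x2 table of read counts
    (rows: bulk A, bulk B; columns: reference allele, alternative allele)."""
    table = [[a_ref, a_alt], [b_ref, b_alt]]
    total = a_ref + a_alt + b_ref + b_alt
    row_sums = [a_ref + a_alt, b_ref + b_alt]
    col_sums = [a_ref + b_ref, a_alt + b_alt]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # a zero cell contributes 0 (0 * ln 0 -> 0)
                expected = row_sums[i] * col_sums[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g


def tricube_smooth(positions, values, window):
    """Tricube-weighted local average of `values` around each position;
    sites farther than `window` away get zero weight."""
    smoothed = []
    for x0 in positions:
        num = den = 0.0
        for x, v in zip(positions, values):
            d = abs(x - x0) / window
            if d < 1.0:
                w = (1.0 - d ** 3) ** 3
                num += w * v
                den += w
        smoothed.append(num / den)  # den > 0: the site itself has weight 1
    return smoothed


# Hypothetical read counts per SNP: (bulk A ref, bulk A alt, bulk B ref, bulk B alt).
snps = [(52, 48, 49, 51), (70, 30, 35, 65), (90, 10, 10, 90), (68, 32, 30, 70)]
pos = [1000, 5000, 9000, 13000]
g_raw = [g_statistic(*counts) for counts in snps]
g_smooth = tricube_smooth(pos, g_raw, window=10000)
```

Smoothing borrows strength from neighbouring SNPs, which damps the noise from low-coverage sites while keeping the broad peak expected around a QTL.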

12.
The hypothesis that genomic regions rich in non-protein-coding RNAs (ncRNAs) can be identified using local variations in single-base and dinucleotide statistics has been investigated. (G+C)%, (G-C)% difference, (A-T)% difference and dinucleotide-frequency statistics were compared among seven classes of ncRNAs and three genomes. Significant variations were observed in (G+C)% and, in Methanococcus jannaschii, in the frequency of the dinucleotide 'CG'. Screening programs based on these two base-composition statistics were developed. With (G+C)% screening alone, a 1% fraction of the M.jannaschii genome containing all 44 known transfer RNAs, ribosomal RNAs and signal recognition particle RNAs could be identified. When (G+C)% combined with CG dinucleotide-frequency screening was used, 43 of the 44 known M.jannaschii structural ncRNAs were again identified, while the number of presumably false hits overlapping a known or putative protein-coding gene was reduced from 15 to 6. In addition, 19 candidate ncRNAs were identified including one with significant homology to several known archaeal RNaseP RNAs.
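The base-composition screen in item 12 can be sketched as a sliding window that reports windows whose (G+C)% reaches a threshold, together with the frequency of the CG dinucleotide that the authors used as a second filter. This minimal Python version is an illustration under assumed settings: the window size, step, threshold, and toy sequence are hypothetical, and the paper's actual scoring rules are not reproduced.

```python
def gc_percent(seq):
    """(G+C)% of a DNA string."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cg_frequency(seq):
    """Fraction of adjacent base pairs (dinucleotides) that are 'CG'."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    return sum(1 for i in range(n_pairs) if seq[i:i + 2] == "CG") / n_pairs


def screen(genome, window, step, gc_threshold):
    """Slide a window along the genome and report (start, GC%, CG frequency)
    for windows whose (G+C)% reaches the threshold."""
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        w = genome[start:start + window]
        gc = gc_percent(w)
        if gc >= gc_threshold:
            hits.append((start, gc, cg_frequency(w)))
    return hits


# Toy AT-rich sequence with one GC-rich stretch (hypothetical).
genome = "AT" * 50 + "GC" * 25 + "AT" * 50
candidates = screen(genome, window=50, step=25, gc_threshold=60.0)
```

A combined screen could then keep only candidates whose CG frequency also exceeds some cutoff, mirroring the two-statistic filter that reduced false hits in the abstract.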

13.
We explore the estimation of uncertainty in evolutionary parameters using a recently devised approach for resampling entire additive genetic variance-covariance matrices (G). Large-sample theory shows that maximum-likelihood estimates (including restricted maximum likelihood, REML) asymptotically have a multivariate normal distribution, with covariance matrix derived from the inverse of the information matrix, and mean equal to the estimated G. This suggests that sampling estimates of G from this distribution can be used to assess the variability of estimates of G, and of functions of G. We refer to this as the REML-MVN method. This has been implemented in the mixed-model program WOMBAT. Estimates of sampling variances from REML-MVN were compared to those from the parametric bootstrap and from a Bayesian Markov chain Monte Carlo (MCMC) approach (implemented in the R package MCMCglmm). We apply each approach to evolvability statistics previously estimated for a large, 20-dimensional data set for Drosophila wings. REML-MVN and MCMC sampling variances are close to those estimated with the parametric bootstrap. Both slightly underestimate the error in the best-estimated aspects of the G matrix. REML analysis supports the previous conclusion that the G matrix for this population is full rank. REML-MVN is computationally very efficient, making it an attractive alternative to both data resampling and MCMC approaches to assessing confidence in parameters of evolutionary interest.
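The REML-MVN idea in item 13 amounts to drawing parameter vectors from a multivariate normal centred on the estimates, with covariance equal to the inverse information matrix, and looking at the spread of any function of G across the draws. The minimal Python sketch below assumes G is parameterized by its distinct elements (g11, g12, g22) and uses made-up values for the estimate and its sampling covariance; WOMBAT's actual parameterization and output are not reproduced. The example function, average evolvability taken as trace(G) divided by the number of traits, is linear, so its sampling SD can also be worked out exactly as 0.5 * sqrt(0.04 + 0.09), about 0.18, which gives a check on the sampler; for non-linear functions such as eigenvalues or correlations, the draws are what make the method useful.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L times its transpose equal to a (a positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def mvn_draws(mean, cov, n_draws, seed=0):
    """Draws from N(mean, cov) as mean + L z with z ~ N(0, I)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    k = len(mean)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        draws.append([mean[i] + sum(L[i][j] * z[j] for j in range(i + 1))
                      for i in range(k)])
    return draws


def reml_mvn_se(theta_hat, sampling_cov, func, n_draws=5000, seed=0):
    """Sampling SD of func(theta) across MVN draws centred on the estimate."""
    values = [func(t) for t in mvn_draws(theta_hat, sampling_cov, n_draws, seed)]
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


# Hypothetical 2-trait G matrix as (g11, g12, g22) and a made-up inverse
# information matrix serving as its sampling covariance.
theta_hat = [1.0, 0.3, 2.0]
inv_info = [[0.04, 0.01, 0.00],
            [0.01, 0.02, 0.00],
            [0.00, 0.00, 0.09]]


def mean_evolvability(t):
    # Average evolvability taken here as trace(G) / number of traits.
    return (t[0] + t[2]) / 2.0


se = reml_mvn_se(theta_hat, inv_info, mean_evolvability)
```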

14.
Estimating effects of parental and sibling genotypes (indirect genetic effects) can provide insight into how the family environment influences phenotypic variation. There is growing molecular genetic evidence for effects of parental phenotypes on their offspring (e.g. parental educational attainment), but the extent to which siblings affect each other is currently unclear. Here we used data from samples of unrelated individuals, without (singletons) and with biological full-siblings (non-singletons), to investigate and estimate sibling effects. Indirect genetic effects of siblings increase (or decrease) the covariance between genetic variation and a phenotype. It follows that differences in genetic association estimates between singletons and non-singletons could indicate indirect genetic effects of siblings if there is no heterogeneity in other sources of genetic association between singletons and non-singletons. We used UK Biobank data to estimate polygenic score (PGS) associations for height, BMI and educational attainment in self-reported singletons (N = 50,143) and non-singletons (N = 328,549). The educational attainment PGS association estimate was 12% larger (95% C.I. 3%, 21%) in the non-singleton sample than in the singleton sample, but the height and BMI PGS associations were consistent. Birth order data suggested that the difference in educational attainment PGS associations was driven by individuals with older siblings rather than firstborns. The relationship between number of siblings and educational attainment PGS associations was non-linear; PGS associations were 24% smaller in individuals with 6 or more siblings compared to the rest of the sample (95% C.I. 11%, 38%). We estimate that a 1 SD increase in sibling educational attainment PGS corresponds to a 0.025 year increase in the index individual’s years in schooling (95% C.I. 0.013, 0.036). 
Our results suggest that older siblings may influence the educational attainment of younger siblings, adding to the growing evidence that effects of the environment on phenotypic variation partially reflect social effects of germline genetic variation in relatives.
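Figures such as "12% larger" in item 14 are relative differences between two association estimates, here the slope of the phenotype on the polygenic score in non-singletons versus singletons. A minimal Python sketch of that arithmetic using plain least-squares slopes, with hypothetical toy data; the study's covariates, standardization, and confidence intervals are not reproduced.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def relative_difference(x_ref, y_ref, x_cmp, y_cmp):
    """(slope_cmp - slope_ref) / slope_ref, e.g. 0.12 for a 12% larger slope."""
    b_ref = ols_slope(x_ref, y_ref)
    b_cmp = ols_slope(x_cmp, y_cmp)
    return (b_cmp - b_ref) / b_ref


# Hypothetical PGS values and outcomes in a reference and a comparison sample.
pgs = [0.0, 1.0, 2.0, 3.0]
outcome_ref = [0.0, 2.0, 4.0, 6.0]
outcome_cmp = [0.0, 2.24, 4.48, 6.72]
diff = relative_difference(pgs, outcome_ref, pgs, outcome_cmp)
```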

15.
The aim of genetic mapping is to locate the loci responsible for specific traits such as complex diseases. These traits are normally caused by mutations at multiple loci with unknown locations and interactions. In this work, we model the biological system that relates DNA polymorphisms to complex traits as a linear mixing process. Given this model, we propose a new fine-scale genetic mapping method based on independent component analysis. The proposed method outputs both independent groups of associated SNPs and the specific SNPs associated with the phenotype. It is applied to a clinical data set for schizophrenia with 368 individuals and 42 SNPs, and to a simulation study that investigates its performance in more depth. The results demonstrate the novel characteristics of the proposed method compared with other genetic mapping methods. Finally, we study the robustness of the proposed method to missing genotype values and limited sample sizes.

16.
In data analysis involving the proportional-hazards regression model due to Cox (1972, Journal of the Royal Statistical Society, Series B 34, 187-220), the test criteria commonly used for assessing the partial contribution to survival of subsets of concomitant variables are the classical likelihood ratio (LR) and Wald statistics. This paper presents an investigation of three other test criteria with potentially major computational advantages over the classical tests, especially for stepwise variable selection in moderate to large data sets. The alternative criteria considered are Rao's efficient score statistic and two other score statistics. Under the Cox model, the performance of these tests is examined empirically and compared with the performance of the LR and Wald statistics. Rao's test performs comparably to the LR test in all the cases considered. The performance of the other criteria is competitive in many cases. The use of these statistics is illustrated in a study of coronary artery disease.
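Part of the computational appeal of Rao's score statistic in item 16 is that it is evaluated at the null value β = 0, so no iterative model fit is needed. A minimal Python sketch for a single covariate, assuming no tied event times: the score U sums, over events, the difference between the failing subject's covariate and the mean covariate in the risk set, and the information I sums the risk-set variances; U²/I is referred to a χ² distribution with one degree of freedom. For a binary covariate this reduces to the log-rank statistic. The function name and data are hypothetical, and the multi-covariate and tied-time cases are not handled.

```python
def cox_score_test(times, events, x):
    """Rao score statistic for H0: beta = 0 in a Cox model with a single
    covariate x, assuming no tied event times.
    U = sum over events of (x_i - mean of x in the risk set)
    I = sum over events of (variance of x in the risk set)
    Returns (U, I, U**2 / I)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    u = 0.0
    info = 0.0
    for pos, i in enumerate(order):
        if not events[i]:
            continue  # censored observations only leave the risk set
        at_risk = [x[j] for j in order[pos:]]  # subjects with time >= times[i]
        n = len(at_risk)
        mean = sum(at_risk) / n
        u += x[i] - mean
        info += sum(v * v for v in at_risk) / n - mean * mean
    return u, info, u * u / info


# Hypothetical data: subjects with x = 1 fail first.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 1, 1]
x = [1.0, 1.0, 0.0, 0.0]
# U = 7/6, I = 17/36, so the statistic is 49/17 (about 2.88).
U, I, stat = cox_score_test(times, events, x)
```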

17.
Robbins LG. Genetics, 2000, 154(1): 13-26
Graduate school programs in genetics have become so full that courses in statistics have often been eliminated. In addition, typical introductory statistics courses for the "statistics user" rather than the nascent statistician are laden with methods for analysis of measured variables while genetic data are most often discrete numbers. These courses are often seen by students and genetics professors alike as largely irrelevant cookbook courses. The powerful methods of likelihood analysis, although commonly employed in human genetics, are much less often used in other areas of genetics, even though current computational tools make this approach readily accessible. This article introduces the MLIKELY.PAS computer program and the logic of do-it-yourself maximum-likelihood statistics. The program itself, course materials, and expanded discussions of some examples that are only summarized here are available at http://www.unisi.it/ricerca/dip/bio_evol/sitomlikely/mlikely.html.
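In the same do-it-yourself spirit, though in Python rather than the article's Pascal program, the sketch below estimates a backcross recombination fraction by numerically maximizing the binomial log-likelihood with golden-section search, then reports a LOD score against free recombination (r = 0.5). The example and its counts are hypothetical and not taken from the article; the closed-form estimate R/n = 0.2 offers a check on the numerical maximizer.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c, d = b - g * (b - a), a + g * (b - a)
    return (a + b) / 2.0


def backcross_loglik(r, recombinants, n):
    """Binomial log-likelihood of recombination fraction r (constant dropped)."""
    return recombinants * math.log(r) + (n - recombinants) * math.log(1.0 - r)


recombinants, n = 20, 100  # hypothetical counts
r_hat = golden_section_max(lambda r: backcross_loglik(r, recombinants, n), 1e-9, 0.5)
# LOD = log10 of the likelihood ratio for r_hat versus free recombination.
lod = (backcross_loglik(r_hat, recombinants, n)
       - backcross_loglik(0.5, recombinants, n)) / math.log(10.0)
```

The same pattern (write down the likelihood, maximize it numerically, compare against a constrained value) carries over to models without a closed-form estimate, which is the main argument for the do-it-yourself approach.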

18.
The genetic structure and relationships among 10 Spanish dog breeds have been studied by using F statistics. Data came from 21 structural genic loci that code for blood-soluble proteins and enzymes detected by electrophoresis. Of the 21 loci, 11 were found to be polymorphic. The study was done at three levels of hierarchical differentiation: ancestral trunks, breeds, and subpopulations. The deficit of heterozygotes was estimated at the subpopulation, breed, and ancestral trunk levels, with values of 4.0%, 6.5%, and 11.2%, respectively. In the whole population, the deficit of heterozygotes was about 17%. The proportion of genetic variability attributable to differences between subpopulations, breeds, and ancestral trunks was estimated to be 14.2%, 9.9%, and 6.9%, respectively. The dendrogram, obtained by using values of genic differentiation (FST) as a measure of the genetic distance among populations, is topologically identical to the one obtained using Nei's index of distance, which indicates a high correlation (r = .99) between both distances. These breed groupings, however, differ from the grouping obtained from historical, archeological, and morphological data.
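The heterozygote deficits and differentiation proportions in item 18 are F statistics. A minimal two-level Python sketch in Nei's heterozygosity form, assuming equally weighted subpopulations and a single locus: F_IS compares observed heterozygosity with the within-subpopulation expectation H_S, F_ST compares H_S with the total-population expectation H_T, and F_IT compares observed heterozygosity with H_T, so that (1 - F_IT) = (1 - F_IS)(1 - F_ST). The study's three-level hierarchy (subpopulations within breeds within ancestral trunks) extends the same idea; the numbers below are hypothetical.

```python
def expected_het(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def f_statistics(h_obs, subpop_freqs):
    """Nei-style F statistics for one locus from the mean observed
    heterozygosity h_obs and allele frequencies in each subpopulation
    (subpopulations weighted equally)."""
    k = len(subpop_freqs)
    h_s = sum(expected_het(f) for f in subpop_freqs) / k
    n_alleles = len(subpop_freqs[0])
    pooled = [sum(f[a] for f in subpop_freqs) / k for a in range(n_alleles)]
    h_t = expected_het(pooled)
    f_is = (h_s - h_obs) / h_s  # heterozygote deficit within subpopulations
    f_st = (h_t - h_s) / h_t    # differentiation among subpopulations
    f_it = (h_t - h_obs) / h_t  # heterozygote deficit in the total population
    return f_is, f_st, f_it


# Two hypothetical subpopulations at a biallelic locus.
f_is, f_st, f_it = f_statistics(0.30, [[0.5, 0.5], [0.9, 0.1]])
```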

19.
A robust analysis of comparative genomic microarray data is critical for meaningful genomic comparison studies. In this paper, we compare our method (implemented in a new software tool, GENCOM, freely available at ) with three commonly used analysis methods: GACK (freely available at ), and an empirical cut-off of a twofold difference between fluorescence intensities applied either after LOWESS normalization or after AVERAGE normalization, in which each fluorescence intensity is divided by the average fluorescence intensity of the entire data set. Each method was tested using data sets from real experiments with prior knowledge of conserved and divergent genes. GENCOM and GACK were superior when a high proportion of genes were divergent. GENCOM was the most suitable method for the data set in which the relationship between the fluorescence intensities was not linear. GENCOM proved robust in the analysis of all the data sets tested.
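The simplest of the baseline methods in item 19, AVERAGE normalization followed by a twofold cut-off, can be sketched in a few lines of Python: divide each channel by its own mean intensity, then flag genes whose normalized test/reference ratio differs from 1 by at least a factor of two in either direction (|log2 ratio| ≥ 1). The function names and intensities are hypothetical; LOWESS normalization, GACK, and GENCOM are not reproduced.

```python
import math


def average_normalize(intensities):
    """AVERAGE normalization: divide each intensity by the data-set mean."""
    mean = sum(intensities) / len(intensities)
    return [v / mean for v in intensities]


def twofold_flags(test, reference):
    """Flag genes whose normalized test/reference ratio differs from 1 by at
    least twofold in either direction (|log2 ratio| >= 1)."""
    t = average_normalize(test)
    r = average_normalize(reference)
    return [abs(math.log2(a / b)) >= 1.0 for a, b in zip(t, r)]


# Hypothetical channel intensities for four genes.
flags = twofold_flags([100.0, 100.0, 100.0, 25.0], [100.0, 100.0, 100.0, 100.0])
```

A fixed cut-off like this ignores how the spread of ratios depends on intensity and on the fraction of divergent genes, which is consistent with the abstract's finding that it lagged behind GENCOM and GACK when many genes were divergent.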

20.
In this paper, correlation of the pixels comprising a microarray spot is investigated. Subsequently, correlation statistics, namely, Pearson correlation and Spearman rank correlation, are used to segment the foreground and background intensity of microarray spots. The performance of correlation-based segmentation is compared to clustering-based (PAM, k-means) and seeded-region growing techniques (SPOT). It is shown that correlation-based segmentation is useful in flagging poorly hybridized spots, thus minimizing false-positives. The present study also raises the intriguing question of whether a change in correlation can be an indicator of differential gene expression.
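The two correlation statistics named in item 20 can be computed on the pixel intensities of a spot, for example between its two fluorescence channels. The minimal Python sketch below implements Pearson correlation and Spearman rank correlation (Pearson on average ranks, so tied values share a rank). How the paper turns these correlations into a foreground/background segmentation is not reproduced, and the pixel values are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pixel intensities from the two channels of one spot,
# including one saturated pixel in the second channel.
red = [1.0, 2.0, 3.0, 4.0]
green = [10.0, 20.0, 30.0, 1000.0]
r_pearson = pearson(red, green)
r_spearman = spearman(red, green)
```

The rank-based statistic is unaffected by the single extreme pixel while Pearson correlation is pulled below 1, one reason to compare both when judging spot quality.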


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417