Similar articles
 Found 20 similar articles; search took 15 ms
1.
Both intra- and interspecific genomic comparisons have revealed local similarities in the level and frequency of mutational variation, as well as in patterns of gene expression. This autocorrelation between measurements leads to violations of assumptions of independence in many statistical methods, resulting in misleading and incorrect inferences. Here I show that autocorrelation can be due to many factors and is present across the genome. Using a one-dimensional spatial stochastic model, I further show how previous results can be employed to correct for autocorrelation along chromosomes in population and comparative genomics research. When multiple hypothesis tests are autocorrelated, I demonstrate that a simple correction can lead to increased power in statistical inference. I present a preliminary analysis of population genomic data from Drosophila simulans to show the ubiquity of autocorrelation and applicability of the methods proposed here.
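
As a minimal sketch of what "autocorrelation along chromosomes" means in practice, the snippet below computes the sample autocorrelation of a per-window statistic at a chosen lag. It is not the spatial stochastic model described in the abstract; the function name `lag_autocorrelation` and the toy input values are assumptions made for illustration.

```python
from statistics import fmean


def lag_autocorrelation(values, lag=1):
    """Sample autocorrelation of a sequence of window statistics at a given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = fmean(values)
    # Total sum of squared deviations (the lag-0 term).
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("values are constant; autocorrelation is undefined")
    # Sum of products of deviations for windows that are `lag` positions apart.
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom


# Hypothetical per-window values ordered by chromosomal position.
smooth_trend = [1, 2, 3, 4, 5, 6, 7, 8]
print(lag_autocorrelation(smooth_trend, lag=1))
```

A value well above zero at short lags suggests that neighbouring windows are not independent, which is the situation the abstract warns about.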

2.
3.
4.
Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected. At short evolutionary distances, the number of comparative genomes required also scales inversely with distance. These scaling behaviors provide some intuition for future comparative genome sequencing needs, such as the proposed use of “phylogenetic shadowing” methods using closely related comparative genomes, and the feasibility of high-resolution detection of small conserved features.

5.
MOTIVATION: At a recent meeting, the wavelet transform was depicted as a small child kicking back at its father, the Fourier transform. Wavelets are more efficient and faster than Fourier methods in capturing the essence of data. Nowadays there is a growing interest in using wavelets in the analysis of biological sequences and molecular biology-related signals. RESULTS: This review is intended to summarize the potential of state of the art wavelets, and in particular wavelet statistical methodology, in different areas of molecular biology: genome sequence, protein structure and microarray data analysis. I conclude by discussing the use of wavelets in modeling biological structures.

6.
A number of statistical methods are widely used to describe allelic variation at specific genetic loci and its implication on the evolutionary history of these loci. Although the methods were developed primarily to study allelic variation at loci that are virtually always present in the genome, they are often applied to data of gene content variation (i.e., presence/absence of multiple homologous genes) at the killer cell immunoglobulin-like receptor (KIR) gene cluster. In this paper, we discuss methodological issues involved in the analysis of gene content variation data in the KIR region and also its covariation with polymorphism at the human leukocyte antigen class I loci, which encode ligands for KIR. A comparison of several statistical methods and measures (gene frequency, haplotype frequency, and linkage disequilibrium estimation) using the Centre d’Etude du Polymorphisme Humain data will be provided using KIR haplotypes that have been determined by segregation analysis, noting the strengths and weaknesses of the methods when only the presence/absence data is considered. Finally, application of these methods to a set of globally distributed populations is described (see Single et al., Nat Genet 39:1114–1119, 2007) in order to illustrate the challenges faced when inferring the joint effects of natural selection and demographic history on these immune-related genes.

7.
The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12–13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
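
Two of the multiple-test corrections named in this abstract can be sketched briefly. The code below is an illustration only: the abstract does not say which false discovery rate procedure was used, so the Benjamini-Hochberg step-up rule is an assumption here, and the function names and p-values are hypothetical. The permutation approach is not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up rule (assumed here as the FDR procedure)."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject every test whose rank is at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten SNP-expression association tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(p)), sum(benjamini_hochberg(p)))
```

On these toy values the FDR rule rejects more tests than Bonferroni, which reflects the usual trade-off: Bonferroni controls the chance of any false positive, while FDR control tolerates a bounded proportion of them.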

8.
Regional association analysis is a new statistical method which simultaneously considers all variants in a selected genome region. This method was created for the analysis of rare genetic variants, whose genotypes are determined by exome or genome sequencing. The gene is usually considered as a region. It was also proposed to use a regional analysis for testing of the association between a complex trait and a set of common variants genotyped by the panels developed for genome-wide association analysis. In this case, overlapping genome regions (sliding windows) are usually considered as a region. Since the size of such regions can be rather large, there is a risk of overestimation (inflation) of the test statistic and an increase in the type I error. In this work, the effect of the size of the region on the type I error was studied for traits with different heritability. The results of simulation experiments demonstrated that the physical size of the region, but not the number of genetic variants in it, is a limiting factor. The higher the trait heritability, the more the type I error differs from the declared value. The analysis of a large number of real traits confirmed these conclusions. These results need to be taken into account when interpreting regional association analyses conducted on large regions using common genetic variants.

9.
Spatial analysis of two-species interactions   Total citations: 10, self-citations: 0, citations by others: 10
Mark Andersen 《Oecologia》1992,91(1):134-140
Summary In this paper, I present and discuss some methods for the analysis of univariate and bivariate spatial point pattern data. Examples of such data in ecology include x-y coordinates of organisms in mapped field plots. I illustrate the methods with analyses of data from mapped field plots on Mount St. Helens, Washington state, USA. The statistical methods I emphasize are graphical methods that rely on analysis of distances between organisms. Hypothesis testing for methods like these is easily done using Monte Carlo methods, which I also discuss. For both univariate and bivariate analyses, I find that second-order methods such as K-function plots are often preferable to first-order methods (i.e., QQ-plots). However, for multivariate analyses, these second-order methods are more sensitive to small sample sizes than first-order analyses.
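
The combination of a K-function with Monte Carlo testing can be sketched for the univariate case as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the plot is taken to be the unit square, no edge correction is applied, the null model is complete spatial randomness, and the function names and point coordinates are hypothetical.

```python
import math
import random


def k_function(points, r, area=1.0):
    """Naive Ripley's K estimate at distance r (no edge correction)."""
    n = len(points)
    # Count ordered pairs of distinct points lying within distance r of each other.
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r
    )
    return area * pairs / (n * (n - 1))


def monte_carlo_p_value(points, r, n_sim=199, seed=0):
    """One-sided Monte Carlo p-value for clustering on the unit square."""
    rng = random.Random(seed)
    observed = k_function(points, r)
    n = len(points)
    at_least = 0
    for _ in range(n_sim):
        # Simulate the null model: n points placed uniformly at random.
        sim = [(rng.random(), rng.random()) for _ in range(n)]
        if k_function(sim, r) >= observed:
            at_least += 1
    # Include the observed pattern itself in the reference set.
    return (at_least + 1) / (n_sim + 1)


# Hypothetical, strongly clustered pattern of 20 organisms.
clustered = [(0.5 + 0.001 * i, 0.5) for i in range(20)]
print(k_function(clustered, 0.05), monte_carlo_p_value(clustered, 0.05))
```

A small p-value indicates more close pairs than random placement would produce. A real analysis would evaluate K over a range of distances and correct for plot edges.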

10.
MOTIVATION: Novel methods, both molecular and statistical, are urgently needed to take advantage of recent advances in biotechnology and the human genome project for disease diagnosis and prognosis. Mass spectrometry (MS) holds great promise for biomarker identification and genome-wide protein profiling. It has been demonstrated in the literature that biomarkers can be identified to distinguish normal individuals from cancer patients using MS data. Such progress is especially exciting for the detection of early-stage ovarian cancer patients. Although various statistical methods have been utilized to identify biomarkers from MS data, there has been no systematic comparison among these approaches in their relative ability to analyze MS data. RESULTS: We compare the performance of several classes of statistical methods for the classification of cancer based on MS spectra. These methods include: linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbor classifier, bagging and boosting classification trees, support vector machine, and random forest (RF). The methods are applied to ovarian cancer and control serum samples from the National Ovarian Cancer Early Detection Program clinic at Northwestern University Hospital. We found that RF outperforms other methods in the analysis of MS data.

11.
MOTIVATION: Array comparative genomic hybridization (CGH) allows detection and mapping of copy number of DNA segments. A challenge is to make inferences about the copy number structure of the genome. Several statistical methods have been proposed to determine genomic segments with different copy number levels. However, to date, no comprehensive comparison of various characteristics of these methods exists. Moreover, the segmentation results have not been utilized in downstream analyses. RESULTS: We describe a comparison of three popular and publicly available methods for the analysis of array CGH data and we demonstrate how segmentation results may be utilized in downstream analyses such as testing and classification, yielding higher power and prediction accuracy. Since the methods operate on individual chromosomes, we also propose a novel procedure for merging segments across the genome, which results in an interpretable set of copy number levels and thus facilitates identification of copy number alterations in each genome. AVAILABILITY: http://www.bioconductor.org

12.
Maximum likelihood and Bayesian approaches are presented for analyzing hierarchical statistical models of natural selection operating on DNA polymorphism within a panmictic population. For analyzing Bayesian models, we present Markov chain Monte-Carlo (MCMC) methods for sampling from the joint posterior distribution of parameters. For frequentist analysis, an Expectation-Maximization (EM) algorithm is presented for finding the maximum likelihood estimate of the genome wide mean and variance in selection intensity among classes of mutations. The framework presented here provides an ideal setting for modeling mutations dispersed through the genome and, in particular, for the analysis of how natural selection operates on different classes of single nucleotide polymorphisms (SNPs).

13.
Methods for the analysis of chromatin immunoprecipitation sequencing (ChIP-seq) data start by aligning the short reads to a reference genome. While often successful, they are not appropriate for cases where a reference genome is not available. Here we develop methods for de novo analysis of ChIP-seq data. Our methods combine de novo assembly with statistical tests enabling motif discovery without the use of a reference genome. We validate the performance of our method using human and mouse data. Analysis of fly data indicates that our method outperforms alignment based methods that utilize closely related species.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0756-4) contains supplementary material, which is available to authorized users.

14.
The Illumina Infinium HumanMethylation450 BeadChip has emerged as one of the most popular platforms for genome wide profiling of DNA methylation. While the technology is widespread, systematic technical biases are believed to be present in the data. For example, this array incorporates two different chemical assays, i.e., Type I and Type II probes, which exhibit different technical characteristics and potentially complicate the computational and statistical analysis. Several normalization methods have been introduced recently to adjust for possible biases. However, there is considerable debate within the field on which normalization procedure should be used and indeed whether normalization is even necessary. Yet despite the importance of the question, there has been little comprehensive comparison of normalization methods. We sought to systematically compare several popular normalization approaches using the Norwegian Mother and Child Cohort Study (MoBa) methylation data set and the technical replicates analyzed with it as a case study. We assessed both the reproducibility between technical replicates following normalization and the effect of normalization on association analysis. Results indicate that the raw data are already highly reproducible; some normalization approaches can slightly improve reproducibility, but other normalization approaches may introduce more variability into the data. Results also suggest that differences in association analysis after applying different normalizations are not large when the signal is strong, but when the signal is more modest, different normalizations can yield very different numbers of findings that meet a weaker statistical significance threshold. Overall, our work provides useful, objective assessment of the effectiveness of key normalization methods.

15.
《Epigenetics》2013,8(2):318-329
The Illumina Infinium HumanMethylation450 BeadChip has emerged as one of the most popular platforms for genome wide profiling of DNA methylation. While the technology is widespread, systematic technical biases are believed to be present in the data. For example, this array incorporates two different chemical assays, i.e., Type I and Type II probes, which exhibit different technical characteristics and potentially complicate the computational and statistical analysis. Several normalization methods have been introduced recently to adjust for possible biases. However, there is considerable debate within the field on which normalization procedure should be used and indeed whether normalization is even necessary. Yet despite the importance of the question, there has been little comprehensive comparison of normalization methods. We sought to systematically compare several popular normalization approaches using the Norwegian Mother and Child Cohort Study (MoBa) methylation data set and the technical replicates analyzed with it as a case study. We assessed both the reproducibility between technical replicates following normalization and the effect of normalization on association analysis. Results indicate that the raw data are already highly reproducible; some normalization approaches can slightly improve reproducibility, but other normalization approaches may introduce more variability into the data. Results also suggest that differences in association analysis after applying different normalizations are not large when the signal is strong, but when the signal is more modest, different normalizations can yield very different numbers of findings that meet a weaker statistical significance threshold. Overall, our work provides useful, objective assessment of the effectiveness of key normalization methods.

16.
Phylogenetic methods for the analysis of species data are widely used in evolutionary studies. However, preliminary data transformations and data reduction procedures (such as a size‐correction and principal components analysis, PCA) are often performed without first correcting for nonindependence among the observations for species. In the present short comment and attached R and MATLAB code, I provide an overview of statistically correct procedures for phylogenetic size‐correction and PCA. I also show that ignoring phylogeny in preliminary transformations can result in significantly elevated variance and type I error in our statistical estimators, even if subsequent analysis of the transformed data is performed using phylogenetic methods. This means that ignoring phylogeny during preliminary data transformations can possibly lead to spurious results in phylogenetic statistical analyses of species data.

17.
18.
Using ANOVA to analyze microarray data   Total citations: 6, self-citations: 0, citations by others: 6
Churchill GA 《BioTechniques》2004,37(2):173-5, 177
ANOVA provides a general approach to the analysis of single and multiple factor experiments on both one- and two-color microarray platforms. Mixed model ANOVA is important because in many microarray experiments there are multiple sources of variation that must be taken into consideration when constructing tests for differential expression of a gene. The genome is large, and the signals of expression change can be small, so we must rely on rigorous statistical methods to distinguish signal from noise. We apply statistical tests to ensure that we are not just making up stories based on seeing patterns where there may be none.
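
The simplest member of the ANOVA family can be sketched as below: a one-way, fixed-effects F statistic for a single gene measured under several conditions. This is a deliberate simplification of the mixed-model ANOVA the abstract discusses, and the function name `one_way_f_statistic` and the expression values are hypothetical.

```python
from statistics import fmean


def one_way_f_statistic(groups):
    """F statistic of a one-way fixed-effects ANOVA over a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and some within-group replication")
    grand_mean = fmean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical log-expression values for one gene under three conditions.
expression = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_f_statistic(expression))
```

A large F suggests that between-condition differences exceed what replicate noise would explain. A mixed model would add random effects, such as array or dye, that this sketch ignores.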

19.
Multiple Trait Analysis of Genetic Mapping for Quantitative Trait Loci   Total citations: 49, self-citations: 2, citations by others: 47
C. Jiang  Z. B. Zeng 《Genetics》1995,140(3):1111-1127
We present in this paper models and statistical methods for performing multiple trait analysis on mapping quantitative trait loci (QTL) based on the composite interval mapping method. By taking into account the correlated structure of multiple traits, this joint analysis has several advantages, compared with separate analyses, for mapping QTL, including the expected improvement on the statistical power of the test for QTL and on the precision of parameter estimation. Also this joint analysis provides formal procedures to test a number of biologically interesting hypotheses concerning the nature of genetic correlations between different traits. Among the testing procedures considered are those for joint mapping, pleiotropy, QTL by environment interaction, and pleiotropy vs. close linkage. The test of pleiotropy (one pleiotropic QTL at a genome position) vs. close linkage (multiple nearby nonpleiotropic QTL) can have important implications for our understanding of the nature of genetic correlations between different traits in certain regions of a genome and also for practical applications in animal and plant breeding because one of the major goals in breeding is to break unfavorable linkage. Results of extensive simulation studies are presented to illustrate various properties of the analyses.

20.
Behavioural studies are commonly plagued with data that violate the assumptions of parametric statistics. Consequently, classic nonparametric methods (e.g. rank tests) and novel distribution-free methods (e.g. randomization tests) have been used to a great extent by behaviourists. However, the robustness of such methods in terms of statistical power and type I error has seldom been evaluated. This probably reflects the fact that empirical methods, such as Monte Carlo approaches, are required to assess these concerns. In this study we show that analytical methods cannot always be used to evaluate the robustness of statistical tests, but rather Monte Carlo approaches must be employed. We detail empirical protocols for estimating power and type I error rates for parametric, nonparametric and randomization methods, and demonstrate their application for an analysis of variance and a regression/correlation analysis design. Together, this study provides a framework from which behaviourists can compare the reliability of different methods for data analysis, serving as a basis for selecting the most appropriate statistical test given the characteristics of data at hand. Copyright 2001 The Association for the Study of Animal Behaviour.
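
A Monte Carlo protocol for estimating a type I error rate can be sketched as follows. It is an illustration under assumptions, not the paper's protocol: the test examined is a two-sample randomization test on the difference in means, the null data are drawn from a skewed exponential distribution, and all function names, sample sizes and simulation counts are hypothetical choices.

```python
import random
from statistics import fmean


def permutation_p_value(a, b, rng, n_perm=199):
    """Two-sided randomization test on the absolute difference in group means."""
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Reassign observations to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return (hits + 1) / (n_perm + 1)


def estimate_type_i_error(n_datasets=200, n_per_group=10, alpha=0.05, seed=1):
    """Fraction of null datasets in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_datasets):
        # Both groups come from the same skewed distribution, so the null is true.
        a = [rng.expovariate(1.0) for _ in range(n_per_group)]
        b = [rng.expovariate(1.0) for _ in range(n_per_group)]
        if permutation_p_value(a, b, rng) <= alpha:
            rejections += 1
    return rejections / n_datasets


print(estimate_type_i_error())
```

Power can be estimated with the same loop by shifting one group by a chosen effect size and recording how often the test rejects.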
