Similar Articles
2.

Background

When conducting multiple hypothesis tests, it is important to control false positives, commonly through the False Discovery Rate (FDR), the expected proportion of false positives among the rejected hypotheses. However, there is a tradeoff between controlling the FDR and maximizing power. Several methods, such as the q-value method, have been proposed to estimate the proportion of true null hypotheses among the tested hypotheses and to use this estimate in controlling the FDR. These methods usually assume that the test statistics are independent (or only weakly correlated). However, many types of data, for example microarray data, often contain large-scale correlation structures. Our objective was to develop methods that control the FDR while maintaining greater power in highly correlated datasets by improving the estimation of the proportion of null hypotheses.

Results

We showed that when strong correlation exists among the data, which is common in microarray datasets, the estimate of the proportion of null hypotheses can be highly variable, resulting in highly variable FDR. We therefore developed a re-sampling strategy that reduces this variation by breaking the correlations between gene expression values, and then takes the upper quartile of the re-sampling estimates as a conservative choice to obtain strong control of the FDR.
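The abstract does not name the estimator or the re-sampling scheme. As a hedged sketch, the code below assumes a Storey-type λ-based estimator of the null proportion π0 (with a hypothetical default λ = 0.5) and takes p-value sets that some user-supplied re-sampling procedure has already produced; it illustrates only the final step of taking the upper quartile of the per-resample estimates.

```python
from statistics import quantiles


def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Null p-values are Uniform(0, 1), so roughly m * pi0 * (1 - lam) of the
    m p-values should exceed lam when alternatives have small p-values.
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def conservative_pi0(resampled_pvalue_sets, lam=0.5):
    """Upper quartile of per-resample pi0 estimates (a conservative choice).

    resampled_pvalue_sets: list of p-value lists, one per resample. How the
    resamples break gene-gene correlation is assumed to be handled elsewhere.
    Requires at least two resamples.
    """
    estimates = [storey_pi0(ps, lam) for ps in resampled_pvalue_sets]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    return quantiles(estimates, n=4, method="inclusive")[2]
```

Taking Q3 rather than the median pushes the π0 estimate upward, which makes a downstream FDR procedure more conservative, in line with the trade of a little power for tighter FDR control reported below.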

Conclusion

In simulation studies and perturbations of actual microarray datasets, our method, compared with competing methods such as q-value, produced slightly biased estimates of the proportion of null hypotheses but with lower mean squared error. When selecting genes at the same nominal FDR level, our method achieved on average a significantly lower false discovery rate in exchange for a minor reduction in power.

3.

Background

The current versions of reference genome assemblies still contain gaps represented by stretches of Ns. Since high-throughput sequencing reads cannot be mapped to these gap regions, the regions are depleted of experimental data. Moreover, several technology platforms assay only a targeted portion of the genomic sequence, so features in the unassayed portion cannot be detected in those experiments. We refer to all such regions as inaccessible regions, and hypothesize that ignoring them in the null model may increase false findings in statistical tests of colocalization of genomic features.

Results

Our exploratory analyses confirm that the genomic regions in public genomic tracks overlap very little with assembly gaps in the human reference genomes (hg19 and hg38), and the small overlap that does occur is confined to the beginning and end portions of the gap regions. We then simulated a set of synthetic tracks that match the properties of real genomic tracks while nullifying any true association between them. This allowed us to test our hypothesis that failing to avoid inaccessible regions (as represented by assembly gaps) in the null model results in spurious inflation of statistical significance. We contrasted the distributions of test statistics and p-values of Monte Carlo-based permutation tests that either avoided or did not avoid assembly gaps in the null model when testing colocalization between a pair of tracks. Tests that did not account for assembly gaps in the null model produced a distribution of the test statistic shifted to the right and a distribution of p-values shifted to the left, indicating inflated significance. The level of inflation was similar in hg19 and hg38, even though assembly gaps cover a smaller proportion of the latter reference genome.
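A minimal sketch of the contrast described above, under assumptions the abstract does not state: tracks are lists of half-open (start, end) intervals on a single coordinate axis, the test statistic is total base-pair overlap, and the null model relocates one track's intervals uniformly at random. Passing the whole genome as `allowed` mimics a null model that ignores assembly gaps; passing only accessible regions avoids them. All function names are hypothetical.

```python
import random


def overlap_bp(track_a, track_b):
    """Total base pairs shared by two lists of half-open (start, end) intervals."""
    total = 0
    for s1, e1 in track_a:
        for s2, e2 in track_b:
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def random_placement(lengths, allowed, rng):
    """Place intervals of the given lengths uniformly at random inside `allowed`.

    Each allowed region is weighted by its number of valid start positions.
    Assumes every length fits in at least one allowed region; placed intervals
    may overlap one another in this simple sketch.
    """
    placed = []
    for length in lengths:
        regions = [(s, e) for s, e in allowed if e - s >= length]
        weights = [e - s - length + 1 for s, e in regions]
        s, e = rng.choices(regions, weights=weights)[0]
        start = rng.randint(s, e - length)  # randint is inclusive at both ends
        placed.append((start, start + length))
    return placed


def permutation_pvalue(track_a, track_b, allowed, n_perm=1000, seed=0):
    """One-sided Monte Carlo p-value for excess overlap between two tracks.

    Track A is repeatedly relocated within `allowed` while track B stays fixed.
    """
    rng = random.Random(seed)
    observed = overlap_bp(track_a, track_b)
    lengths = [e - s for s, e in track_a]
    hits = 0
    for _ in range(n_perm):
        if overlap_bp(random_placement(lengths, allowed, rng), track_b) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Comparing `permutation_pvalue(a, b, [(0, genome_length)])` with `permutation_pvalue(a, b, accessible)` on the same tracks shows the mechanism: if track B never falls in gaps, a gap-ignoring null puts some of A's intervals where they cannot overlap B, which lowers the null overlaps and therefore the p-value.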

Conclusion

We provide empirical evidence that inaccessible regions, even when covering only a few percent of the genome, can lead to a substantial number of false findings if not accounted for in statistical colocalization analysis.

4.

Background

Taxonomy, or biological systematics, is the basic scientific discipline of biology: it postulates hypotheses of identity and relationships on which all other natural sciences dealing with organisms rely. However, the scientific contributions of taxonomists have been largely neglected, because publications that use species names do not cite the authority on which those names are based.

Discussion

The consequences of this neglect are reduced recognition of the importance of taxonomy, which in turn results in diminished funding, lower interest from journals in publishing taxonomic research, and fewer young scientists entering the field. This has led to the so-called taxonomic impediment at a time when biodiversity studies are of critical importance. Here we emphasize a practical and obvious solution to this dilemma. We propose that whenever a species name is used, the author(s) of the species hypothesis be included and the original literature source cited, including taxonomic revisions and identification literature: nothing more than what is done for every other hypothesis or assumption included in a scientific publication. In addition, we propose that journals publishing primarily taxonomic studies should be indexed in ISI℠.

Summary

The proposal outlined above would make the true contribution of taxonomists visible within the scientific community, give funding agencies a more accurate assessment of the impact and importance of taxonomy, and help recruit young scientists into the field, thus helping to alleviate the taxonomic impediment. It would also make much of the biological literature more robust by reducing taxonomic uncertainty.

5.

Background

The availability of whole-genome sequences for reconstructing the tree of life could enable phylogenomic hypotheses to be developed in ways not previously possible. A significant bottleneck in genome-scale analyses of the tree of life is the time required to manually curate genomic data into multi-gene phylogenetic matrices.

Results

To keep pace with the exponentially growing volume of molecular data in the genomic era, we have developed an automated technique, ASAP (Automated Simultaneous Analysis Phylogenetics), to assemble these multigene/multispecies matrices and to evaluate the significance of individual genes within the context of a given phylogenetic hypothesis.
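ASAP's internals are not described here. As an illustration only, the sketch below shows the generic supermatrix step that such tools automate: concatenating per-gene alignments into one multigene, multispecies matrix, padding taxa that lack a gene with `?`, and recording each gene's column range as a partition. The input layout (a dict of gene name to taxon-to-sequence dicts) and the function name are assumptions.

```python
def concatenate_alignments(gene_alignments, missing="?"):
    """Build a supermatrix from per-gene alignments.

    gene_alignments: dict mapping gene name -> dict mapping taxon -> aligned
    sequence (all sequences within one gene must have equal length).
    Returns (matrix, partitions): matrix maps taxon -> concatenated sequence;
    partitions lists (gene, first_column, last_column), 1-based inclusive.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    next_col = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {gene!r} is empty or not a rectangular alignment")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa without data for this gene get a block of missing characters.
            pieces[taxon].append(aln.get(taxon, missing * width))
        partitions.append((gene, next_col, next_col + width - 1))
        next_col += width
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions
```

The partition list is what a downstream analysis would need to score genes individually within a given phylogenetic hypothesis.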

Conclusion

Applications of ASAP may enable scientists to re-evaluate species relationships and to develop new phylogenomic hypotheses based on genome-scale data.

6.


We examine the Tree of Life (TOL) as an evolutionary hypothesis and a heuristic. The original TOL hypothesis has failed but a new "statistical TOL hypothesis" is promising. The TOL heuristic usefully organizes data without positing fundamental evolutionary truth.

Reviewers

This article was reviewed by W. Ford Doolittle, Nicholas Galtier and Christophe Malaterre.

7.

Background

Molecular and epidemiological evidence demonstrates that altered gene expression and single nucleotide polymorphisms (SNPs) in the apoptotic pathway are linked to many cancers. Yet few studies have examined how variant apoptotic genes interact and jointly modify prostate cancer (PCA) outcomes. An exhaustive assessment of all possible two-, three- and four-way gene-gene interactions is computationally burdensome, and this statistical conundrum stems from the prohibitive amount of data needed to account for multiple hypothesis testing.

Methods

To address this issue, we systematically prioritized and evaluated individual effects and complex interactions among 172 apoptotic SNPs in relation to PCA risk and aggressive disease (i.e., Gleason score ≥ 7 and tumor stage III/IV). Single and joint modifying effects on PCA outcomes among European-American men were analyzed using statistical epistasis networks coupled with multi-factor dimensionality reduction (SEN-guided MDR). The case-control study included 1,175 incident PCA cases and 1,111 controls from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. A subset analysis of PCA cases included 688 aggressive and 488 non-aggressive cases. SNP profiles were obtained from the NCI Cancer Genetic Markers of Susceptibility (CGEMS) data portal. Main effects were assessed using logistic regression (LR) models. Before modeling interactions, SEN was used to pre-process our genetic data; it used network science to reduce our analysis from > 36 million to < 13,000 SNP interactions. Interactions were visualized, evaluated, and validated using entropy-based MDR. All parametric and non-parametric models were adjusted for age, family history of PCA, and multiple hypothesis testing.
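The "> 36 million" figure is consistent with counting every two-, three- and four-way combination of the 172 SNPs; the abstract does not show this arithmetic, so the calculation below is our reading of where the number comes from.

```python
from math import comb

n_snps = 172
# Number of distinct k-SNP combinations for two-, three- and four-way interactions.
counts = {k: comb(n_snps, k) for k in (2, 3, 4)}
total = sum(counts.values())
print(counts)  # {2: 14706, 3: 833340, 4: 35208615}
print(total)   # 36056661
```

Almost all of the burden comes from the four-way terms, which is why a pre-filtering step such as SEN is needed before exhaustive MDR.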

Results

Following LR modeling, eleven and thirteen sequence variants were associated with PCA risk and aggressive disease, respectively. However, none of these markers remained significant after we adjusted for multiple comparisons. Nevertheless, using SEN-guided MDR we detected a modest synergistic interaction between AKT3 rs2125230 and PRKCQ rs571715 in relation to disease aggressiveness (p = 0.011).

Conclusions

In summary, entropy-based SEN-guided MDR facilitated the logical prioritization and evaluation of apoptotic SNPs in relation to aggressive PCA. The suggestive AKT3-PRKCQ interaction with aggressive PCA requires further validation in independent observational studies.

8.

Background

Although expression microarrays have become a standard tool used by biologists, analysis of data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms, and labs may involve complicated data processing, and inferring relationships between genes remains difficult.

Results

StarNet 2 is a new web-based tool for post hoc visual analysis of correlations derived from expression microarray data. StarNet 2 helps users discover putative gene regulatory networks in a variety of species (human, rat, mouse, chicken, zebrafish, Drosophila, C. elegans, S. cerevisiae, Arabidopsis and rice) by graphing networks of genes that are closely co-expressed across a large, heterogeneous set of preselected microarray experiments. For each represented organism, raw microarray data were retrieved from NCBI's Gene Expression Omnibus for a selected Affymetrix platform, and Pearson correlation coefficients were computed for all pairs of expression profiles measured on that platform. These precompiled results were stored in a MySQL database and supplemented with additional data retrieved from NCBI. A web-based interface allows user-specified queries of the database centered on a gene of interest. The result of a query includes graphs of correlation networks, graphs of known interactions involving the genes and gene products present in the correlation networks, and initial statistical analyses. Two analyses can be run in parallel to compare networks, facilitated by the new HeatSeeker module.
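To make the query step concrete, here is a minimal sketch of a correlation network centered on a query gene: compute Pearson correlations, keep genes whose correlation with the query reaches a cutoff, then connect any pair of retained genes that also meets the cutoff. The cutoff value, the use of positive correlation only, the in-memory dict of profiles, and the function names are assumptions; the tool itself serves precomputed correlations from a database.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_network(profiles, query, r_min=0.8):
    """Edges (gene1, gene2, r) among the query gene and its close co-expressers."""
    neighbors = [g for g in profiles
                 if g != query and pearson(profiles[query], profiles[g]) >= r_min]
    nodes = [query] + neighbors
    edges = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= r_min:
                edges.append((a, b, round(r, 3)))
    return edges
```

Precomputing all pairwise coefficients, as the abstract describes, turns each query into a lookup instead of repeating these loops.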

Conclusion

StarNet 2 is a useful tool for developing new hypotheses about regulatory relationships between genes and gene products, and covers 10 species. Interpretation of the correlation networks is supported by a database of previously documented interactions, a test for enrichment of Gene Ontology terms, and heat maps of correlation distances that can be used to compare two networks. The list of genes in a StarNet network may be useful as a set of candidate genes for inferring causal networks. The tool is freely available at http://vanburenlab.medicine.tamhsc.edu/starnet2.html and does not require user registration.

9.

Key message

Identification of a functional SNP in HvLox-1 associated with barley lipoxygenase activity, and development of an allele-specific marker for it.

Abstract

Improving the flavor stability of beer is one of the main objectives in breeding malting barley, and lipoxygenase-1 (LOX-1) is a key enzyme controlling this trait. In this study, a modified LOX activity assay was used to screen for null LOX-1 mutants. Four barley landraces with no detectable LOX-1 activity were identified among 1,083 barley germplasm accessions from China. The genomic sequence diversity of the HvLox-1 gene in these four null LOX-1 Chinese landraces was compared with that of a further 76 accessions. A total of 104 nucleotide polymorphisms were found, comprising 83 single-nucleotide polymorphisms (SNPs), 7 multiple-nucleotide polymorphisms, and 14 insertions and deletions. Most notably, we found a rare C/G mutation (SNP-61) in the second intron that led to null LOX-1 activity through an altered splicing acceptor site. In addition, an allele-specific polymerase chain reaction marker was developed for genotyping SNP-61, which could be used in malting barley breeding programs aimed at improving beer quality.

10.

Background

De Winter and Happee [1] examined whether a science based on selective publishing of significant results can accurately estimate population effects, and whether this is even more effective than a science in which all results are published (i.e., a science without publication bias). Based on their simulation study, they concluded that “selective publishing yields a more accurate meta-analytic estimation of the true effect than publishing everything, (and that) publishing nonreplicable results while placing null results in the file drawer can be beneficial for the scientific collective” (p.4).

Methods and Findings

Using their scenario with a small to medium population effect size, we show that publishing everything is more effective for the scientific collective than selective publishing of significant results. Additionally, we examined a scenario with a null effect, which provides a more dramatic illustration of the superiority of publishing everything over selective publishing.
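The disagreement can be explored with a toy simulation. The sketch below is not De Winter and Happee's code. It assumes that each study reports a standardized effect drawn from a normal distribution with standard error sqrt(2/n) (a large-sample approximation for two groups of size n), that selective publishing keeps only results significant in the positive direction (z > 1.96), and that the meta-analytic estimate is the unweighted mean, which equals the fixed-effect estimate when all studies share the same n.

```python
import random
from math import sqrt


def simulate_meta(true_d, n_per_group=50, n_studies=2000, seed=1):
    """Return (publish-everything estimate, selective-publishing estimate)."""
    rng = random.Random(seed)
    # Large-sample standard error of a standardized mean difference near d = 0.
    se = sqrt(2.0 / n_per_group)
    all_estimates, published = [], []
    for _ in range(n_studies):
        d_hat = rng.gauss(true_d, se)
        all_estimates.append(d_hat)
        if d_hat / se > 1.96:  # significant in the positive direction
            published.append(d_hat)
    everything = sum(all_estimates) / len(all_estimates)
    selective = sum(published) / len(published) if published else float("nan")
    return everything, selective


for d in (0.0, 0.3):
    print(d, simulate_meta(d))
```

With a true effect of zero, publishing everything gives a pooled estimate near zero, whereas every selectively published study exceeds 1.96 × SE by construction, so their mean is biased away from zero.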

Conclusion

Publishing everything is more effective than only reporting significant outcomes.

11.
Park Jun Young, Wu Chong, Pan Wei. BMC Genetics, 2018, 19(1): 68-43

Background

We propose a gene-level association test that accounts for individual relatedness and population structure in pedigree data within the framework of linear mixed models (LMMs). Our method data-adaptively combines the results of a class of score-based tests and requires fitting only a single null model (under the null hypothesis) for the whole genome, making it computationally efficient.
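The abstract does not name the class of score-based tests. One well-known adaptive family, assumed here purely for illustration, is the sum of powered score (SPU) tests, combined by taking the minimum p-value over powers γ (the adaptive SPU, or aSPU, test). The sketch takes an observed gene-level score vector and null score vectors as given; how the null draws are generated (for example, by simulating from the single fitted null LMM) is an assumption and is not implemented here.

```python
from bisect import bisect_left


def spu_stat(u, gamma):
    """|SPU(gamma)| = |sum_j U_j**gamma|; gamma='inf' gives max_j |U_j|."""
    if gamma == "inf":
        return max(abs(x) for x in u)
    return abs(sum(x ** gamma for x in u))


def aspu_pvalue(u_obs, u_null, gammas=(1, 2, 4, 8, "inf")):
    """Adaptive SPU p-value from an observed score vector and null score draws.

    u_null: list of B score vectors drawn under the null hypothesis.
    """
    B = len(u_null)
    min_p_obs = 1.0
    min_p_null = [1.0] * B
    for g in gammas:
        null_stats = [spu_stat(u, g) for u in u_null]
        ordered = sorted(null_stats)
        t_obs = spu_stat(u_obs, g)
        # Observed p-value: share of null statistics at least as extreme.
        n_ge = B - bisect_left(ordered, t_obs)
        min_p_obs = min(min_p_obs, (n_ge + 1) / (B + 1))
        # Each null draw's p-value relative to the other B - 1 draws.
        for b, t in enumerate(null_stats):
            p_b = (B - bisect_left(ordered, t)) / B
            min_p_null[b] = min(min_p_null[b], p_b)
    # The aSPU statistic is the minimum p-value over gammas; calibrate it
    # against the null draws' own minimum p-values.
    n_le = sum(1 for p in min_p_null if p <= min_p_obs)
    return (n_le + 1) / (B + 1)
```

Small γ favors genes with many weak, same-direction effects, while large γ (and the max) favors a few strong SNP effects; taking the minimum p-value over γ is what makes the test data-adaptive.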

Results

We applied our approach to test for association with the ratio of post- to pre-treatment high-density lipoprotein (HDL) in the GAW20 data. Using an LMM similar to that of Aslibekyan et al. (PLoS One, 7:48663, 2012), our method identified 2 nearly significant genes (APOA5 and ZNF259) near rs964184, whereas neither the other gene-level tests nor the standard test of each individual single-nucleotide polymorphism (SNP) detected any significant gene in a genome-wide scan.

Conclusions

Gene-level association testing can complement SNP-level association testing, and our method is adaptive and efficient compared with several existing gene-level association tests.

12.

Background

The combination of erlotinib and bevacizumab is a promising regimen in advanced non-squamous non-small-cell lung cancer (NSCLC). We are conducting a single-arm phase II trial to evaluate the efficacy and safety of this regimen as second- or third-line chemotherapy.

Methods

Key eligibility criteria were histologically or cytologically confirmed non-squamous NSCLC; stage III/IV or recurrent disease not indicated for radical chemoradiation; one or two prior chemotherapy regimens; age 20 years or older; and a performance status of 2 or less. The primary endpoint is objective response rate. The secondary endpoints include overall survival, progression-free survival, disease control rate and incidence of adverse events. This trial plans to accrue 80 patients based on a two-stage design using a binomial distribution, with a response rate of 35% under the alternative hypothesis and a threshold response rate of 20% under the null hypothesis. A subset analysis according to EGFR mutation status is planned.
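The abstract gives the hypothesized response rates (20% under the null, 35% under the alternative) and the accrual target of 80, but not the stage boundaries. The sketch below computes the operating characteristics of a generic two-stage binomial rule: stop after stage 1 if at most r1 of n1 patients respond, and call the regimen promising if more than r of n respond in total. The boundaries in the usage lines are hypothetical placeholders, not the trial's.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def two_stage_oc(p, n1, r1, n, r):
    """Operating characteristics of a two-stage single-arm binomial design.

    Stage 1 enrolls n1 patients; the trial stops early if responses <= r1.
    Otherwise n - n1 more are enrolled, and the regimen is declared promising
    if total responses > r. Returns (P(early stop), P(declared promising)).
    """
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    n2 = n - n1
    promising = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = max(r - x1 + 1, 0)  # stage-2 responses required to exceed r
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        promising += binom_pmf(x1, n1, p) * tail
    return pet, promising


# Hypothetical boundaries for illustration only (not the trial's).
pet0, alpha = two_stage_oc(0.20, n1=30, r1=6, n=80, r=21)
_, power = two_stage_oc(0.35, n1=30, r1=6, n=80, r=21)
print(f"P(early stop | p0) = {pet0:.3f}, type I error = {alpha:.3f}, power = {power:.3f}")
```

Evaluating the same rule at the null (20%) and alternative (35%) response rates gives its type I error and power; a design search would scan candidate (n1, r1, r) values until both meet their targets.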

Discussion

We have presented the design of a single-arm phase II trial to evaluate the efficacy and safety of the combination of bevacizumab and erlotinib in patients with advanced non-squamous NSCLC. In particular, we are interested in determining whether this regimen merits further development and whether prospective patient selection based on EGFR mutation status is necessary in future trials.

Trial registration

This trial was registered at the UMIN Clinical Trials Registry as UMIN000004255 (http://www.umin.ac.jp/ctr/index.htm).

13.

Background and aims

Seasonally flooded South American savannas harbor different kinds of mound-field landscapes of largely unknown origin. A recent study used soil carbon-isotope depth profiles and other proxies to infer vegetation history in murundu landscapes in Brazil. Results suggested that differential erosion, not building-up processes (e.g., termite mounds), produced mounds. We tested this approach to inferring mound origin in a mound-field landscape in French Guiana.

Methods

We examined carbon-isotope depth profiles of soil organic matter, phytolith profiles and contemporary vegetation composition in mounds and inter-mounds.
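Carbon-isotope depth profiles are commonly read with a two-end-member linear mixing model, which converts the δ13C of soil organic matter into an approximate fraction of C4-derived carbon. The sketch below assumes that model and typical end-member values of about -27‰ for C3 and -13‰ for C4 vegetation; the abstract does not report which values, if any, the study used.

```python
def c4_fraction(delta_sample, delta_c3=-27.0, delta_c4=-13.0):
    """Fraction of soil organic carbon derived from C4 plants.

    Two-end-member linear mixing:
        delta_sample = f * delta_c4 + (1 - f) * delta_c3
    End-member defaults are assumed typical values (per mil, VPDB).
    The result is clipped to [0, 1].
    """
    f = (delta_sample - delta_c3) / (delta_c4 - delta_c3)
    return min(1.0, max(0.0, f))
```

For example, a δ13C of -20‰ lies halfway between the assumed end members, giving f = 0.5. When C3 and C4 plants are distributed only subtly differently between habitats, as reported here, such estimates differ little between mounds and inter-mounds, which is why independent proxies such as phytoliths help.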

Results

Relative abundance of C3 and C4 plants across habitats was very different from that in murundu landscapes; C3 plants were better represented in inter-mounds than on mounds. Habitat differences in C3/C4 distribution were subtler than in murundu landscapes, limiting inference of vegetation history based on carbon isotopes. Still, carbon-isotope and phytolith depth profiles gave similar pictures of vegetation history, both favoring a building-up hypothesis and corroborating other evidence that these mounds are vestiges of ancient agricultural raised fields.

Conclusions

Carbon-isotope depth profiles alone are unlikely to be adequate for deciphering the origin of mound-field landscapes from vegetation history in seasonally flooded savannas. Including data on current vegetation and phytoliths makes inferences more robust.

14.

Background

There is consensus that several bariatric surgery procedures produce a rapid improvement in glucose homeostasis in obese diabetic patients, an improvement that appears uncorrelated with the degree of eventual weight loss after surgery. Several hypotheses have been proposed to account for these results; among them, the anti-incretin, ghrelin and lower-intestinal dumping hypotheses have been discussed in the literature. Since no clear-cut experimental results are yet available to confirm or disprove any of these hypotheses, in the present work we built a mathematical model of the glucose-insulin-incretin system capable of expressing these three postulated mechanisms. The model was populated with critically evaluated parameter values from the literature, and simulations under the three scenarios were compared.

Results

The modeling results seem to indicate that the suppression of ghrelin release is unlikely to determine major changes in short-term glucose control. The possible existence of an anti-incretin hormone would be supported if an experimental increase of GIP concentrations were evident post-surgery. Given that, on the contrary, collected evidence suggests that GIP concentrations decrease post-surgery, the lower-intestinal dumping hypothesis would seem to describe the mechanism most likely to produce the observed normalization of Type 2 Diabetes Mellitus (T2DM) after bariatric surgery.

Conclusions

The proposed model can help discriminate among competing hypotheses in a context where definitive data are not available and mechanisms are still not clear.

16.

Key message

Genetic and phenotypic analysis of two complementary maize panels revealed substantial variation in biomass yield. Flowering and biomass QTL were discovered by association mapping in both panels.

Abstract

The high whole-plant biomass productivity of maize makes it a potential source of energy for animal feed and biofuel production. The variability and genetic determinism of traits related to biomass are poorly known. We analyzed two highly diverse panels of Dent and Flint lines representing complementary heterotic groups for Northern Europe. They were genotyped with the 50k SNP array and phenotyped as hybrids (crossed to a tester from the complementary pool) in a western European network of field trials for traits related to flowering time, plant height, and biomass. The molecular information proved to be a powerful tool for uncovering different levels of structure and relatedness in both panels. This study revealed substantial variation and potential genetic progress for biomass production, even at constant precocity. Association mapping was performed by combining genotypes and phenotypes in a mixed model with a random polygenic effect. This allowed the detection of significant associations, confirming height and flowering time quantitative trait loci (QTL) reported in the literature. Biomass yield QTL were detected in both panels but were unstable across environments. An alternative kinship estimator based only on markers unlinked to the tested SNP increased the number of significant associations by around 40% while satisfactorily controlling the false positive rate. This study provides insights into the variability and genetic architecture of biomass-related traits in Flint and Dent lines, and suggests that these two pools have substantial potential for breeding high-biomass-yielding hybrid varieties.
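The abstract does not specify the kinship estimator or what counts as "unlinked". As a hedged sketch, the code below assumes a VanRaden-style genomic relationship matrix, K = ZZ' / (2 Σ p(1 - p)) with genotypes coded 0/1/2, and treats markers on chromosomes other than the tested SNP's as unlinked (a leave-one-chromosome-out rule). Function names are hypothetical.

```python
def unlinked_markers(chromosomes, test_index):
    """Indices of markers on chromosomes other than the tested SNP's."""
    test_chrom = chromosomes[test_index]
    return [j for j, c in enumerate(chromosomes) if c != test_chrom]


def vanraden_kinship(genotypes, keep):
    """Genomic relationship matrix ZZ' / (2 * sum p(1-p)) over markers in `keep`.

    genotypes: one list per individual of 0/1/2 allele counts per marker.
    """
    n = len(genotypes)
    freqs = [sum(g[j] for g in genotypes) / (2 * n) for j in keep]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all retained markers are monomorphic")
    # Center each genotype by twice the allele frequency.
    z = [[g[j] - 2 * p for j, p in zip(keep, freqs)] for g in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / denom for k in range(n)]
            for i in range(n)]
```

Excluding markers linked to the tested SNP keeps its own effect out of the random polygenic term, so the mixed model does not partly absorb the association it is trying to detect.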

17.

Background

In designing genome-wide association (GWA) studies, it is important to calculate statistical power. General power calculation procedures for quantitative measures often require summary statistics of distributions, such as the mean and variance. In genetic studies, however, the effect size for quantitative traits is traditionally expressed as heritability, the proportion of phenotypic variance in the population that can be attributed to genetic variants among individuals. Heritability is hard to transform into such summary statistics, so general power calculation procedures cannot be used directly in GWA studies. Appropriate statistical methods and a user-friendly software package addressing this problem would therefore be welcome.

Results

This paper presents GWAPower, a software package for statistical power calculation in GWA studies of quantitative traits, where the genetic effect is defined as heritability. Based on several popular one-degree-of-freedom genetic models, the method avoids the need to specify the non-centrality parameter of the F-distribution under the alternative hypothesis, so heritability information can be used directly without approximation. In GWAPower, the power calculation can easily be adjusted to account for covariates and linkage disequilibrium. An example illustrates GWAPower, followed by a discussion.
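To show how heritability can feed a power calculation directly, here is a rough stand-in, not GWAPower's exact F-distribution computation. It assumes the common non-centrality choice λ = n·h²/(1 − h²) for a single variant explaining a fraction h² of phenotypic variance, and a large-sample normal approximation to the 1-df test; the default genome-wide α = 5e-8 is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def gwa_power(n, h2, alpha=5e-8):
    """Approximate power of a 1-df test for a variant explaining h2 of variance.

    Uses non-centrality lambda = n * h2 / (1 - h2) and a normal approximation
    to the 1-df test, not the exact F-distribution.
    """
    std = NormalDist()
    z_crit = -std.inv_cdf(alpha / 2)  # two-sided critical value
    mu = sqrt(n * h2 / (1.0 - h2))
    # Probability the shifted statistic lands in either rejection tail.
    return std.cdf(mu - z_crit) + std.cdf(-z_crit - mu)


for n in (1000, 5000, 20000):
    print(n, round(gwa_power(n, h2=0.005), 3))
```

With h² = 0 the function returns the significance level itself, a quick sanity check that the approximation is calibrated under the null.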

Conclusions

GWAPower is a user-friendly, free software package for calculating statistical power based on heritability in GWA studies of quantitative traits. The software is freely available at: http://dl.dropbox.com/u/10502931/GWAPower.zip

18.

Background

When natural hybridization occurs at sites where the hybridizing species differ in abundance, the pollen load delivered to the rare species should come predominantly from the common species. Previous authors have therefore proposed a hypothesis on the direction of hybridization: interspecific hybrids are more likely to have a female parent from the rare species and a male parent from the common species. We test this hypothesis using data on plant hybridization from our own experiments and from the literature.

Results

By examining the maternally inherited chloroplast DNA in 6 cases of F1 hybridization from four plant genera, we infer unidirectional hybridization in most cases. However, in all 5 cases where the relative abundance of the parental species deviates from parity, hybridization runs predominantly in the direction opposite to the one predicted from numerical abundance alone.

Conclusion

Our results show that the observed direction of hybridization is almost always opposite to the direction predicted from the relative abundance of the hybridizing species. Several alternative hypotheses, including unidirectional postmating isolation and reinforcement of premating isolation, are discussed.

19.
Our reply to the commentary on cladistics by Cronquist (1987) addresses four issues:
  1. the application of scientific principles in systematics;
  2. the recognition that the analysis of pattern is a vital precursor to any consideration of evolutionary process. A priori judgements of evolutionary process are unnecessary for the generation of informative systematic hypotheses which are chosen for their ability to explain the patterns of character distributions rather than for compatibility with any particular preconceived ideas about evolution;
  3. the argument that phenetic concepts such as overall similarity, grades, gaps, and degree of divergence, if included in methods of phylogenetic inference, will give erroneous results. Paraphyletic and polyphyletic groups must, consequently, be rejected from systematics since they have no rational empirical basis for recognition;
  4. the fact that many of the problems of phylogenetic analysis attributed by Cronquist to cladistics are common to all systematic methods but that these can be dealt with by the application of such principles as parsimony, synapomorphy, and strict monophyly.

20.

Background

Massively-parallel sequencing (MPS) technologies create challenges for informed consent of research participants given the enormous scale of the data and the wide range of potential results.

Discussion

We propose that the consent process in these studies be based on whether they use MPS to test a hypothesis or to generate hypotheses. To demonstrate the differences in these approaches to informed consent, we describe the consent processes for two MPS studies. The purpose of our hypothesis-testing study is to elucidate the etiology of rare phenotypes using MPS. The purpose of our hypothesis-generating study is to test the feasibility of using MPS to generate clinical hypotheses, and to approach the return of results as an experimental manipulation. Issues to consider in both designs include: volume and nature of the potential results, primary versus secondary results, return of individual results, duty to warn, length of interaction, target population, and privacy and confidentiality.

Summary

The categorization of MPS studies as hypothesis-testing versus hypothesis-generating can help to clarify the issue of so-called incidental or secondary results for the consent process, and aid the communication of the research goals to study participants.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417