Similar Articles
20 similar articles found (search time: 15 ms)
1.
Wang J. Genetics 2012, 191(1):183-194
Quite a few methods have been proposed to infer sibship and parentage among individuals from their multilocus marker genotypes. They are all based on Mendelian laws, either qualitatively (exclusion methods) or quantitatively (likelihood methods), have different optimization criteria, and use different algorithms in searching for the optimal solution. The full-likelihood method assigns sibship and parentage relationships among all sampled individuals jointly. It is by far the most accurate method, but is computationally prohibitive for large data sets with many individuals and many loci. In this article I propose a new likelihood-based method that is computationally efficient enough to handle large data sets. The method uses the sum of the log likelihoods of pairwise relationships in a configuration as the score to measure its plausibility, where the log likelihoods of pairwise relationships are calculated only once and stored for repeated use. By analyzing several empirical and many simulated data sets, I show that the new method is more accurate than pairwise-likelihood and exclusion-based methods, but slightly less accurate than the full-likelihood method. However, the new method is computationally much more efficient than the full-likelihood method; when both sexes are polygamous and markers have genotyping errors, it can be several orders of magnitude faster. The new method can handle large samples with thousands of individuals, with the number of markers limited only by computer memory.
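A minimal Python sketch of the scoring idea (our construction, not the author's code): pairwise log likelihoods are computed once, cached, and each candidate configuration is then scored by summing the cached values over all pairs.

import itertools

def score_configuration(partition, pair_loglik):
    """Score a sibship partition as the sum of pairwise log likelihoods.

    partition: list of sets of individual IDs (putative full-sib families).
    pair_loglik: dict mapping (frozenset({i, j}), relationship) to a log
    likelihood; these values are computed once from the marker genotypes
    and reused for every candidate configuration.
    """
    individuals = [ind for family in partition for ind in family]
    score = 0.0
    for i, j in itertools.combinations(individuals, 2):
        same = any(i in family and j in family for family in partition)
        rel = "fullsib" if same else "unrelated"
        score += pair_loglik[(frozenset((i, j)), rel)]
    return score

# Toy cache for three individuals (log-likelihood values are illustrative only).
cache = {
    (frozenset(("a", "b")), "fullsib"): -2.1,
    (frozenset(("a", "b")), "unrelated"): -3.0,
    (frozenset(("a", "c")), "fullsib"): -2.8,
    (frozenset(("a", "c")), "unrelated"): -2.2,
    (frozenset(("b", "c")), "fullsib"): -2.9,
    (frozenset(("b", "c")), "unrelated"): -2.3,
}
print(score_configuration([{"a", "b"}, {"c"}], cache))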

2.
Meta-analysis is a statistical methodology for combining information from diverse sources so that a more reliable and efficient conclusion can be reached. It can be conducted either by synthesizing study-level summary statistics or by drawing inference from an overarching model for individual participant data (IPD), if available. The latter is often viewed as the “gold standard.” For random-effects models, however, it is not fully understood whether the use of IPD actually gains efficiency over summary statistics. In this paper, we examine the relative efficiency of the two methods under a general likelihood inference setting. We show theoretically and numerically that summary-statistics-based analysis is at most as efficient as IPD analysis, provided that the random effects follow a Gaussian distribution and maximum likelihood estimation is used to obtain the summary statistics. More specifically, (i) the two methods are equivalent in an asymptotic sense; and (ii) summary-statistics-based inference can incur an appreciable loss of efficiency if the sample sizes are not sufficiently large. Our results are established under the assumption that the between-study heterogeneity parameter remains constant regardless of the sample sizes, which differs from a previous study. Our findings are confirmed by analyses of simulated data sets and a real-world study of alcohol interventions.
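For orientation, the Gaussian random-effects model underlying this comparison can be written in standard form (our notation, not necessarily the authors'):

\hat{\theta}_i = \theta + b_i + \varepsilon_i, \qquad b_i \sim N(0, \tau^2), \qquad \varepsilon_i \sim N(0, \sigma_i^2),

where \hat{\theta}_i is the summary estimate from study i, \tau^2 is the between-study heterogeneity (held fixed in the paper's asymptotics), and \sigma_i^2 is the within-study variance, which shrinks as the study's sample size grows.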

3.
Basik M, Mousses S, Trent J. BioTechniques 2003, 35(3):580-2, 584, 586 passim
New technologies have greatly increased scientists' ability to investigate the complex molecular interactions that occur in cancer development and to identify genetic alterations and drug targets. However, these new capabilities have not accelerated drug development; rather, they may be contributing to increased research and development costs, because the large number of new drug targets discovered through genomics must each be investigated in detail to characterize their putative functional involvement in the disease process. One solution to this bottleneck in functional analysis is the use of high-throughput technologies to produce efficient processes that can rapidly handle the flood of information at every stage of disease research. This review examines the use of new and emerging DNA, tissue, and live-cell transfection microarray technologies that can be used to discover, validate, and translate information resulting from the completion of the Human Genome Project.

4.
Group testing is frequently used to reduce the costs of screening large numbers of individuals for infectious diseases or other binary characteristics in low-prevalence settings. In many applications, the goals include both identifying individuals as positive or negative and estimating the probability of positivity. The identification aspect leads to additional tests, known as “retests,” being performed beyond those for the initial groups of individuals. In this paper, we investigate how regression models can be fit to estimate the probability of positivity while also incorporating the extra information from these retests. We present simulation evidence showing that significant gains in efficiency occur by incorporating retesting information, and we further examine which testing protocols are the most efficient to use. Our investigations also demonstrate that some group testing protocols can actually lead to more efficient estimates than individual testing when diagnostic tests are imperfect. The proposed methods are applied retrospectively to chlamydia screening data from the Infertility Prevention Project. We demonstrate that significant cost savings could occur through the use of particular group testing protocols.
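As background for how pooling enters such models, the probability that a pool of k independent specimens tests positive, given individual positivity probabilities p_1, ..., p_k, assay sensitivity S_e, and specificity S_p, takes the standard form (our notation; the paper's likelihood additionally incorporates the retest outcomes):

P(\text{pool positive}) = S_e\left(1 - \prod_{i=1}^{k}(1 - p_i)\right) + (1 - S_p)\prod_{i=1}^{k}(1 - p_i),

with p_i = g^{-1}(\mathbf{x}_i'\boldsymbol{\beta}) under a binary regression model with link function g.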

5.
A common goal in ecology and wildlife management is to determine the causes of variation in population dynamics over long periods of time and across large spatial scales. Several statistical challenges must nevertheless be overcome to make appropriate inference about spatio-temporal variation in population dynamics, such as autocorrelation among data points, excess zeros, and observation error in count data. To address these issues, many scientists and statisticians have recommended the use of Bayesian hierarchical models. Unfortunately, hierarchical statistical models remain somewhat difficult to use, because of the quantitative background needed to implement them and the computational demands of estimating parameters with Markov chain Monte Carlo algorithms. Fortunately, new tools have recently been developed that make it more feasible for wildlife biologists to fit sophisticated hierarchical Bayesian models (i.e., Integrated Nested Laplace Approximation, ‘INLA’). We present a case study using two important game species in North America, the lesser and greater scaup, to demonstrate how INLA can be used to estimate the parameters of a hierarchical model that decouples observation error from process variation and accounts for unknown sources of excess zeros as well as spatial and temporal dependence in the data. Ultimately, our goal is to make unbiased inference about spatial variation in population trends over time.
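A generic sketch of such a model, assuming a zero-inflated Poisson process layer and a binomial observation layer (our notation, not the authors' exact specification), is:

y_{st} \mid N_{st} \sim \mathrm{Binomial}(N_{st}, p), \qquad N_{st} \sim \mathrm{ZIP}(\lambda_{st}, \pi), \qquad \log \lambda_{st} = \mathbf{x}_{st}'\boldsymbol{\beta} + u_s + v_t,

where y_{st} is the count observed at site s in year t, p is the detection probability, \pi the excess-zero probability, and u_s and v_t spatially and temporally structured random effects; INLA exploits the latent Gaussian structure of u and v to approximate the posteriors without MCMC.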

6.
7.
To address the global extinction crisis, both efficient use of existing conservation funding and new sources of funding are vital. Private sponsorship of charismatic ‘flagship’ species conservation represents an important source of new funding, but has been criticized as inefficient. However, the ancillary benefits of privately sponsored flagship species conservation via actions benefiting other species have not been quantified, nor have the benefits of incorporating such sponsorship into objective prioritization protocols. Here, we use a comprehensive dataset of conservation actions for the 700 most threatened species in New Zealand to examine the potential biodiversity gains from national private flagship species sponsorship programmes. We find that private funding for flagship species can clearly result in additional species and phylogenetic diversity conserved, via conservation actions shared with other species. When private flagship species funding is incorporated into a prioritization protocol to preferentially sponsor shared actions, expected gains can be more than doubled. However, these gains are consistently smaller than the expected gains in a hypothetical scenario where private funding could be optimally allocated among all threatened species. We recommend integrating private sponsorship of flagship species into objective prioritization protocols to sponsor efficient actions that maximize biodiversity gains, or, wherever possible, encouraging private donations for broader biodiversity goals.

8.
The genetic basis of complex diseases is expected to be highly heterogeneous, with complex interactions among multiple disease loci and environmental factors. Because interactions among large numbers of genetic loci are inherently high-dimensional, efficient statistical approaches for handling such high-order epistatic complexity have not been well developed. In this article, we introduce a new approach for testing genetic epistasis at multiple loci using an entropy-based statistic in a case-only design. The entropy-based statistic asymptotically follows a χ2 distribution. Computer simulations show that the entropy-based approach has better control of type I error and higher power than the standard χ2 test. Motivated by a schizophrenia data set, we propose a method for measuring and testing the relative entropy of a clinical phenotype, through which one can test the contribution of multiple disease loci, or their interaction, to a clinical phenotype. A sequential forward selection procedure is proposed to construct a genetic interaction network, illustrated through a tree-based diagram. The network information clearly shows the relative importance of a set of genetic loci for a clinical phenotype. To show the utility of the new entropy-based approach, we apply it to two real data sets: a schizophrenia data set and a published malaria data set. Our approach provides a fast and testable framework for studying genetic epistasis in a case-only design.
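The link between entropy-based statistics and the χ2 distribution can be illustrated with mutual information: twice the sample size times the plug-in mutual information of a two-locus genotype table is the G statistic, which is asymptotically χ2 distributed under independence. A minimal Python sketch of this general idea (illustrative counts; not the paper's exact statistic):

import numpy as np
from scipy.stats import chi2

def entropy_interaction_test(table):
    """table: counts of cases cross-classified by genotypes at two loci.
    Returns (G, p): G = 2N * mutual information, asymptotically chi-square
    with (r-1)(c-1) degrees of freedom under no interaction."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p = table / n
    pr = p.sum(axis=1, keepdims=True)   # row (locus 1) marginals
    pc = p.sum(axis=0, keepdims=True)   # column (locus 2) marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p / (pr * pc)), 0.0)
    g = 2.0 * n * terms.sum()
    df = (table.shape[0] - 1) * (table.shape[1] - 1)
    return g, chi2.sf(g, df)

# Toy 3x3 genotype table for two biallelic loci (counts are illustrative only).
counts = [[30, 20, 10], [25, 40, 15], [10, 18, 32]]
print(entropy_interaction_test(counts))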

9.
More accurate and precise phenotyping strategies are necessary to empower high-resolution linkage mapping and genome-wide association studies and to train genomic selection models in plant improvement. Within this framework, the objective of modern phenotyping is to increase the accuracy, precision, and throughput of phenotypic estimation at all levels of biological organization while reducing costs and minimizing labor through automation, remote sensing, improved data integration, and experimental design. Much like the efforts to optimize genotyping during the 1980s and 1990s, designing effective phenotyping initiatives today requires multi-faceted collaborations among biologists, computer scientists, statisticians, and engineers. Robust phenotyping systems are needed to characterize the full suite of genetic factors that contribute to quantitative phenotypic variation across cells, organs, and tissues; developmental stages; years; environments; species; and research programs. Next-generation phenotyping generates significantly more data than before and requires novel data management, access, and storage systems; increased use of ontologies to facilitate data integration; and new statistical tools for enhancing experimental design and extracting biologically meaningful signal from environmental and experimental noise. To ensure relevance, the implementation of efficient and informative phenotyping experiments also requires familiarity with diverse germplasm resources, population structures, and target populations of environments. Today, phenotyping is quickly emerging as the major operational bottleneck limiting the power of genetic analysis and genomic prediction. The challenge for the next generation of quantitative geneticists and plant breeders is not only to understand the genetic basis of complex trait variation, but also to use that knowledge to efficiently synthesize twenty-first-century crop varieties.

10.
Over recent years, many statisticians and researchers have highlighted that statistical inference would benefit from a better use and understanding of hypothesis testing, p-values, and statistical significance. We highlight three recommendations in the context of the biochemical sciences. First recommendation: to improve the biological interpretation of biochemical data, do not use p-values (or similar test statistics) as thresholded values to select biomolecules. Second recommendation: to improve comparison among studies and to achieve robust knowledge, report data completely. Third recommendation: report statistical analyses completely, with exact numbers (not asterisks or inequalities). Owing to the high number of variables, better use of statistics is of special importance in omics studies.

11.
The establishment of cause and effect relationships is a fundamental objective of scientific research. Many lines of evidence can be used to make cause–effect inferences. When statistical data are involved, alternative explanations for the statistical relationship need to be ruled out. These include chance (apparent patterns due to random factors), confounding effects (a relationship between two variables because they are each associated with an unmeasured third variable), and sampling bias (effects due to preexisting properties of compared groups). The gold standard for managing these issues is a controlled randomized experiment. In disciplines such as biological anthropology, where controlled experiments are not possible for many research questions, causal inferences are made from observational data. Methods that statisticians recommend for this difficult objective have not been widely adopted in the biological anthropology literature. Issues involved in using statistics to make valid causal inferences from observational data are discussed.

12.
Empirical evidence supporting the commonality of gene × gene interactions, coupled with frequent failure to replicate results from previous association studies, has prompted statisticians to develop methods to handle this important subject. Nonparametric methods have generated intense interest because of their capacity to handle high-dimensional data. Genome-wide association analysis of large-scale SNP data is challenging mathematically and computationally. In this paper, we describe major issues and questions arising from this challenge, along with methodological implications. Data reduction and pattern recognition methods seem to be the new frontiers in efforts to detect gene × gene interactions comprehensively. Currently, there is no single method that is recognized as the 'best' for detecting, characterizing, and interpreting gene × gene interactions. Instead, a combination of approaches, with the aim of balancing their specific strengths, may be the optimal way to investigate gene × gene interactions in human data.

13.
Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

14.
Generally speaking, ecologists are interested in explaining ecological relationships, describing patterns and processes, and making spatial or temporal predictions. These tasks can be accomplished by modeling the relationship between output values (responses) and a set of features (explanatory variables). Modeling ecological data is challenging, however, because both response and predictor variables may be continuous or discrete. The ecological relationships to be explained are typically nonlinear, and the explanatory variables often interact in complex ways. Missing values in response and explanatory variables are not uncommon, and outliers frequently occur in ecological data. Moreover, ecologists usually want models that are both easy to build and easy to interpret. A variety of statistical methods are typically used to address the distinct ecological problems arising in these diverse settings, including (multiple) logistic regression, linear models, survival models, analysis of variance, and so on. Random forests are a single, effective method that can handle all of these problems: they can perform classification, clustering, regression, and survival analysis; assess variable importance; detect outliers; and impute missing data. Given these algorithmic strengths, this paper summarizes applications of random forests in ecology, outlines the modeling process, and demonstrates the method's main features through a case study of modeling the distribution of Yunnan pine (Pinus yunnanensis). By introducing the general terminology, concepts, and modeling ideas of random forests, we aim to help readers grasp the essentials of applying the method; random forests can be expected to see wider application and further development in ecological research.
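A minimal Python sketch of the workflow described here, using scikit-learn on hypothetical presence/absence data in the spirit of the Yunnan pine case study (the data and predictors are invented for illustration):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical species-distribution data: rows are sites, columns are
# environmental predictors (e.g., temperature, precipitation, elevation).
X = rng.normal(size=(500, 3))
# Toy presence/absence response with a nonlinear signal in two predictors.
y = ((X[:, 0] > 0) & (X[:, 1] ** 2 < 1)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_train, y_train)
print("OOB accuracy:", rf.oob_score_)          # internal out-of-bag estimate
print("Test accuracy:", rf.score(X_test, y_test))
print("Variable importances:", rf.feature_importances_)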

15.

Background  

The biomedical community is developing new methods of data analysis to more efficiently process the massive data sets produced by microarray experiments. Systematic and global mathematical approaches that can be readily applied to a large number of experimental designs are fundamental for correctly handling these otherwise overwhelming data sets.

16.
Shortreed and Ertefaie introduced a clever propensity score variable selection approach for estimating average causal effects: the outcome-adaptive lasso (OAL). OAL aims to select desirable covariates (confounders and predictors of outcome) to build an unbiased and statistically efficient propensity score estimator. By design, a potential limitation of OAL is how it handles collinearity, which is often encountered in high-dimensional data; as seen in Shortreed and Ertefaie, OAL's performance degrades as the correlation between covariates increases. In this note, we propose the generalized OAL (GOAL), which combines the strengths of the adaptively weighted L1 penalty and the elastic net to better handle the selection of correlated covariates. We propose two versions of GOAL that differ in their algorithms. We compared OAL and GOAL in simulation scenarios that mimic those examined by Shortreed and Ertefaie. Although all approaches performed equivalently with independent covariates, both GOAL versions outperformed OAL in low and high dimensions with correlated covariates.
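The general shape of a penalty combining adaptive L1 weights with an elastic-net ridge term, which GOAL builds on, can be sketched as (our notation, not necessarily the paper's exact criterion):

\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left\{ -\ell(\boldsymbol{\beta}) + \lambda_1 \sum_{j} \hat{w}_j \lvert \beta_j \rvert + \lambda_2 \sum_{j} \beta_j^2 \right\},

where the adaptive weights \hat{w}_j are constructed from outcome-model coefficients so that confounders and predictors of outcome are penalized lightly, and the quadratic term stabilizes selection among highly correlated covariates.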

17.
We present methods for imputing data for ungenotyped markers and for inferring haplotype phase in large data sets of unrelated individuals and parent-offspring trios. Our methods make use of known haplotype phase when it is available, and our methods are computationally efficient so that the full information in large reference panels with thousands of individuals is utilized. We demonstrate that substantial gains in imputation accuracy accrue with increasingly large reference panel sizes, particularly when imputing low-frequency variants, and that unphased reference panels can provide highly accurate genotype imputation. We place our methodology in a unified framework that enables the simultaneous use of unphased and phased data from trios and unrelated individuals in a single analysis. For unrelated individuals, our imputation methods produce well-calibrated posterior genotype probabilities and highly accurate allele-frequency estimates. For trios, our haplotype-inference method is four orders of magnitude faster than the gold-standard PHASE program and has excellent accuracy. Our methods enable genotype imputation to be performed with unphased trio or unrelated reference panels, thus accounting for haplotype-phase uncertainty in the reference panel. We present a useful measure of imputation accuracy, allelic R2, and show that this measure can be estimated accurately from posterior genotype probabilities. Our methods are implemented in version 3.0 of the BEAGLE software package.
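One way to read allelic R2 is as the squared correlation between the true allele dosage and the dosage implied by the posterior genotype probabilities. The sketch below computes that quantity directly against a known truth (our simplification: BEAGLE estimates the measure from the posterior probabilities alone, without access to the truth):

import numpy as np

def allelic_r2(true_dosage, posterior_probs):
    """true_dosage: array of 0/1/2 allele counts per individual.
    posterior_probs: (n, 3) array of posterior probabilities for genotypes 0/1/2.
    Returns the squared Pearson correlation between true and expected dosage."""
    expected = posterior_probs @ np.array([0.0, 1.0, 2.0])
    r = np.corrcoef(true_dosage, expected)[0, 1]
    return r ** 2

# Toy example with three individuals (probabilities are illustrative only).
probs = np.array([[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.2, 0.8]])
print(allelic_r2(np.array([0, 1, 2]), probs))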

18.

Background  

The use of current high-throughput genetic, genomic, and post-genomic data leads to the simultaneous evaluation of a large number of statistical hypotheses and, consequently, to the multiple-testing problem. As an alternative to the overly conservative family-wise error rate (FWER), the false discovery rate (FDR) has emerged over the last ten years as more appropriate for handling this problem. One drawback of the FDR, however, is that it refers to a rejection region as a whole, attributing the same value to statistics close to the boundary of the region and to those far from it. As a result, the local FDR has recently been proposed to quantify the probability that a given null hypothesis is true.
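For orientation, the two quantities can be contrasted with the standard definitions (our notation):

\mathrm{FDR} = E\left[\frac{V}{\max(R, 1)}\right], \qquad \mathrm{fdr}(z) = \frac{\pi_0 f_0(z)}{f(z)},

where V is the number of true null hypotheses rejected, R the total number of rejections, \pi_0 the proportion of true nulls, f_0 the null density of the statistic, and f its mixture density; the local FDR thus depends on the observed value z itself rather than on the rejection region as a whole.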

19.
A stepwise algorithm for finding minimum evolution trees
A stepwise algorithm for reconstructing minimum evolution (ME) trees from evolutionary distance data is proposed. In each step, a taxon that potentially has a neighbor (another taxon connected to it through a single interior node) is chosen, and its true neighbor is then searched for iteratively. For m taxa, at most (m-1)!/2 trees are examined, and the tree with the minimum sum of branch lengths (S) is chosen as the final tree. This algorithm provides simple strategies for restricting the tree space searched and allows efficient ways of dynamically computing the ordinary least squares estimates of S for the topologies examined. Using computer simulation, we found that the efficiency of the ME method in recovering the correct tree is similar to that of the neighbor-joining method (Saitou and Nei 1987). A more exhaustive search is unlikely to improve the efficiency of the ME method in finding the correct tree, because the correct tree is almost always included in the tree space searched by this stepwise algorithm. The new algorithm also finds trees whose S values may not be significantly different from that of the ME tree when the correct tree contains very small interior branches or when the pairwise distance estimates have large sampling errors. These topologies form a set of plausible alternatives to the ME tree and can be compared with each other using statistical tests based on the minimum evolution principle. The new algorithm makes it possible to use the ME method for large data sets.
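A minimal numpy sketch of the ordinary least squares step for a single topology (our construction, not the paper's implementation): branch lengths are the OLS regression of pairwise distances on the branch-incidence matrix of the topology, and S is their sum.

import numpy as np

# Unrooted 4-taxon topology ((1,2),(3,4)): branches e1..e4 external, e5 internal.
# Rows: pairs (1,2),(1,3),(1,4),(2,3),(2,4),(3,4); a 1 marks each branch lying
# on the path between the pair.
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
], dtype=float)
d = np.array([0.30, 0.45, 0.50, 0.49, 0.54, 0.35])  # toy pairwise distances

b, *_ = np.linalg.lstsq(A, d, rcond=None)  # OLS branch-length estimates
S = b.sum()                                # minimum-evolution score of topology
print(b, S)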

20.
Commonly used methods for inferring phylogenies were designed before the emergence of high-throughput sequencing and generally cannot accommodate the challenges associated with noisy, diploid sequencing data. In many applications, diploid genomes are still treated as haploid through the use of ambiguity characters, while the uncertainty in genotype calling that arises from the sequencing technology is ignored. To address this problem, we describe two new probabilistic approaches for estimating genetic distances, distAngsd-geno and distAngsd-nuc, both implemented in a software suite named distAngsd. These methods are specifically designed for next-generation sequencing data, utilize the full information in the data, and take uncertainty in genotype calling into account. Through extensive simulations, we show that these new methods are markedly more accurate and have more stable statistical behavior than other currently available methods for estimating genetic distances, even for very low-depth data with high error rates.
