Similar Literature
1.
King R, Brooks SP, Coulson T. Biometrics, 2008, 64(4): 1187-1195
SUMMARY: We consider the issue of analyzing complex ecological data in the presence of covariate information and model uncertainty. Several issues can arise when analyzing such data, not least the need to account for missing covariate values. This is most acutely observed in the presence of time-varying covariates. We consider mark-recapture-recovery data, where the corresponding recapture probabilities are less than unity, so that individuals are not always observed at each capture event. This often leads to a large amount of missing time-varying individual covariate information, because the covariate cannot usually be recorded if an individual is not observed. In addition, we address the problem of model selection over these covariates with missing data. We consider a Bayesian approach, in which we are able to deal with large amounts of missing data by essentially treating the missing values as auxiliary variables. This approach also allows a quantitative comparison of different models via posterior model probabilities, obtained via the reversible jump Markov chain Monte Carlo algorithm. To demonstrate this approach, we analyze data relating to Soay sheep, which pose several statistical challenges in fully describing the intricacies of the system.
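A reversible jump sampler moves between models as well as within them, so a posterior model probability can be estimated as the share of post-burn-in iterations the chain spends in each model. The sketch below shows only that final summary step. It assumes the sampler has already produced a trace of model labels (here, hypothetical tuples of included covariates). The RJMCMC moves themselves and the updating of missing covariates as auxiliary variables are not shown.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def posterior_model_probabilities(
    model_trace: Sequence[Hashable], burn_in: int = 0
) -> Dict[Hashable, float]:
    """Estimate posterior model probabilities from an RJMCMC run.

    model_trace[t] identifies the model occupied at iteration t, e.g. a
    tuple naming the covariates included. The estimate for each model is
    the fraction of post-burn-in iterations the sampler spent in it.
    """
    kept = list(model_trace[burn_in:])
    if not kept:
        raise ValueError("no iterations left after burn-in")
    counts = Counter(kept)
    return {model: c / len(kept) for model, c in counts.items()}


# Hypothetical trace of covariate sets visited by the sampler.
trace = [("weight",), ("weight",), ("weight", "age"), ("weight",)]
probs = posterior_model_probabilities(trace)
print(probs[("weight",)])  # 0.75
```

In a real analysis the trace would be far longer. Convergence should also be checked before these frequencies are read as probabilities.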

2.
Giribet, G. 2010. A new dimension in combining data? The use of morphology and phylogenomic data in metazoan systematics. Acta Zoologica (Stockholm) 91: 11–19. Animal phylogenies have traditionally been inferred using character-state information derived from the observation of a diverse array of morphological and anatomical features, but the incorporation of molecular data into the toolkit of phylogenetic characters has drastically shifted the way researchers infer phylogenies. A main reason for this is the ease with which molecular data can be obtained compared with, e.g., traditional histological and microscopical techniques. Researchers now routinely use genomic data (whole genomes or Expressed Sequence Tags) to reconstruct relationships among animal phyla, but the amount of morphological data available to study the same phylogenetic patterns has not grown accordingly. Given the disparity between the amounts of molecular and morphological data, some authors have questioned entire morphological programs. In this review I discuss issues related to the combinability of genomic and morphological data and the informativeness of each set of characters. I conclude with a discussion of how morphology could be made scalable by utilizing new techniques that allow non‐intrusive examination of large numbers of preserved museum specimens. Morphology should therefore remain a strong field in evolutionary and comparative biology. It continues to provide information for inferring phylogenetic patterns, it is an important complement to the patterns derived from molecular data, and it is the common nexus that allows fossil taxa to be studied alongside large molecular data sets.

3.
Mandal S, Qin J, Pfeiffer RM. Biometrics, 2023, 79(3): 1701-1712
We propose and study a simple and innovative non-parametric approach to estimating the age-at-onset distribution of a disease from a cross-sectional sample of the population that includes individuals with prevalent disease. First, we estimate the joint distribution of two event times: the age at disease onset and the survival time after disease onset. We account for the fact that individuals had to be alive at the time of the study by conditioning on their survival until the age at sampling. We propose a computationally efficient expectation–maximization (EM) algorithm and derive the asymptotic properties of the resulting estimates. From these joint probabilities, we then obtain non-parametric estimates of the age-at-onset distribution by marginalizing over the time from disease onset to death. The method accommodates categorical covariates and can be used to obtain unbiased estimates of the covariate distribution in the source population. Simulations show that our method performs well in finite samples, even under large amounts of truncation for prevalent cases. We apply the proposed method to data from female participants in the Washington Ashkenazi Study to estimate the age-at-onset distribution of breast cancer associated with carrying BRCA1 or BRCA2 mutations.
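The marginalization step above can be sketched on a discrete grid. The sketch assumes the EM fit has produced a table of joint probabilities over onset-age bins (rows) and post-onset survival bins (columns) that sums to one. The grid and the numbers below are hypothetical. Summing each row over survival gives the age-at-onset probability mass function, and a cumulative sum gives its distribution function. The EM algorithm and the conditioning on survival to the sampling age are not reproduced here.

```python
import numpy as np


def onset_distribution(joint):
    """Marginalize a joint pmf of (onset-age bin, post-onset survival bin).

    joint[i, j] is the estimated probability that onset falls in age bin i
    and survival after onset falls in duration bin j. Summing over j gives
    the age-at-onset pmf; its cumulative sum gives the distribution function.
    """
    joint = np.asarray(joint, dtype=float)
    if (joint < 0).any() or not np.isclose(joint.sum(), 1.0):
        raise ValueError("joint must be a nonnegative pmf summing to 1")
    pmf = joint.sum(axis=1)  # marginalize over survival after onset
    cdf = np.cumsum(pmf)     # cumulative age-at-onset distribution
    return pmf, cdf


# Hypothetical 3 age bins x 2 survival bins, standing in for an EM fit.
joint = [[0.10, 0.05],
         [0.20, 0.15],
         [0.30, 0.20]]
pmf, cdf = onset_distribution(joint)
# pmf is about (0.15, 0.35, 0.50); cdf is about (0.15, 0.50, 1.00)
```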

4.
Use of historical data and real-world evidence holds great potential to improve the efficiency of clinical trials. One major challenge is to borrow information effectively from historical data while maintaining a reasonable type I error rate and minimal bias. We propose the elastic prior approach to address this challenge. Unlike existing approaches, this approach proactively controls the behavior of information borrowing and type I errors. It does so by incorporating the well-known concept of a clinically significant difference through an elastic function, defined as a monotonic function of a congruence measure between historical data and trial data. The elastic function is constructed to satisfy a set of prespecified criteria, so that the resulting prior borrows information strongly when historical and trial data are congruent but refrains from borrowing when they are incongruent. The elastic prior approach has the desirable property of being information-borrowing consistent: it asymptotically controls the type I error at the nominal value regardless of whether the historical data are congruent with the trial data. Our simulation study of finite-sample characteristics confirms that, compared with existing methods, the elastic prior has better type I error control and yields competitive or higher power. The proposed approach is applicable to binary, continuous, and survival endpoints.
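For a continuous endpoint, the mechanism described above can be sketched as three steps. Compute a congruence statistic, map it through a decreasing elastic function g, and inflate the prior variance by 1/g. All concrete choices below are assumptions for illustration, not the authors' specification:

- the statistic is a squared standardized mean difference;
- g is a logistic function of log T with hypothetical parameters a and b, where b > 0;
- the prior is normal with variance var_hist / (n_hist * g).

In the paper, the elastic function is calibrated from prespecified criteria tied to the clinically significant difference. That calibration is not shown.

```python
import math


def congruence_statistic(mean_hist, var_hist, n_hist, mean_trial, var_trial, n_trial):
    """Squared standardized difference of means (larger = less congruent)."""
    se2 = var_hist / n_hist + var_trial / n_trial
    return (mean_hist - mean_trial) ** 2 / se2


def elastic_weight(t, a, b):
    """Assumed logistic elastic function g(T) = 1 / (1 + exp(a + b*log T)).

    With b > 0, g decreases from 1 (full borrowing) toward 0 (no borrowing).
    """
    if b <= 0:
        raise ValueError("b must be positive so that g decreases in T")
    if t <= 0:
        return 1.0  # perfect congruence: log T -> -inf, so g -> 1
    z = a + b * math.log(t)
    if z >= 0:  # numerically stable evaluation of 1 / (1 + e^z)
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def elastic_normal_prior(mean_hist, var_hist, n_hist, g):
    """Normal prior on the historical mean with variance inflated by 1/g."""
    if g <= 0:
        return mean_hist, math.inf  # no borrowing at all
    return mean_hist, var_hist / (n_hist * g)


t = congruence_statistic(1.0, 4.0, 100, 0.0, 4.0, 100)  # 12.5
g = elastic_weight(t, a=-2.0, b=1.5)                     # hypothetical a, b
prior_mean, prior_var = elastic_normal_prior(1.0, 4.0, 100, g)
```

The larger the discrepancy T, the smaller g and the flatter the prior. This is the qualitative behavior the abstract describes.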

5.
In 1942, Ingold documented an ecologically defined group of fungi, aquatic hyphomycetes, on autumn-shed leaves decaying in streams. They were shown to be vital intermediaries between the nutritionally poor leaf substratum and leaf-eating invertebrates. Research has subsequently emphasized functional aspects such as leaf decomposition and nutritional conditioning by fungi. Structural aspects (community composition) have attracted less attention, partly because of the difficulty of identifying fungal mycelia in situ. Extraction, amplification (PCR, qPCR) and characterization of DNA and RNA, and more recently of proteins, allow much greater insight into the presence of fungal taxa, their metabolic status (dead, dormant or active), and their potential and actual participation in decomposition processes. This approach can yield huge amounts of data, and major challenges today are the development and application of suitable bioinformatics techniques. The complexity of data collection and evaluation favours interdisciplinary teams of researchers. Fungi are major players in most ecosystems and are increasingly affected by human impacts. Changing land use, eutrophication/pollution and climate change are among the major factors affecting the diversity and ecological functions of aquatic hyphomycetes.

6.
In this article, we address a missing data problem that occurs in transplant survival studies. Recipients of organ transplants are followed up from transplantation and their survival times recorded, together with various explanatory variables. Because data collection procedures differ across centers or over time, a particular explanatory variable (or set of variables) may be recorded only for certain recipients, leaving this variable missing for a substantial number of records. The variable may also turn out to be an important predictor of survival, so it is important to handle this missing-by-design problem appropriately. The consensus in the literature is to handle this problem with complete case analysis. The missing data are assumed to arise under a missing-at-random mechanism for which complete case analysis gives consistent estimates; specifically, the missing values can reasonably be assumed to be unrelated to survival time. In this article, we investigate the potential of multiple imputation to handle this problem in a study of survival after kidney transplantation, and show that it comprehensively outperforms complete case analysis on a range of measures. This is a particularly important finding in the medical context, as imputing large amounts of missing data is often viewed with scepticism.
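Any multiple-imputation analysis like the one above ends by pooling the m completed-data results. The standard way to do this is Rubin's rules. The pooled estimate is the mean of the m estimates. The total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below assumes each imputed data set has already been analyzed, for example with a Cox model yielding a log hazard ratio and its squared standard error. The imputation model and survival fits are not shown, and the numbers in the usage line are hypothetical.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules.

    estimates[i], variances[i]: point estimate and squared standard error
    from the analysis of imputed data set i (e.g. a log hazard ratio).
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b      # Rubin's total variance
    return q_bar, total


q, t = rubin_combine([0.50, 0.55, 0.45], [0.010, 0.012, 0.011])
# q is about 0.50; t is about 0.011 + (4/3) * 0.0025, i.e. about 0.01433
```

The between-imputation term is what complete case analysis lacks. It reflects the extra uncertainty from not knowing the missing values.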

7.
Proteomic expression patterns derived from mass spectrometry have been put forward as potential biomarkers for the early diagnosis of cancer and other diseases. This approach has generated much excitement and has led to a large number of new experiments and vast amounts of new data. The data, derived at great expense, can be of very little value if careful attention is not paid to the experimental design and analysis. Using examples from surface-enhanced laser desorption/ionisation time-of-flight (SELDI-TOF) and matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) experiments, we describe several experimental design issues that can corrupt a dataset. Fortunately, the problems we identify can be avoided if attention is paid to potential sources of bias before the experiment is run. With an appropriate experimental design, proteomics technology can be a useful tool for discovering important information relating protein expression to disease.

8.
Molecular and functional profiling of cancer cell lines is subject to laboratory‐specific experimental practices and data analysis protocols. The current challenge, therefore, is how to make integrated use of the omics profiles of cancer cell lines for reliable biological discoveries. Here, we carried out a systematic analysis of nine data modalities using a meta‐analysis of 53 omics studies from 12 research laboratories covering 2,018 cell lines. To account for the relatively low consistency observed for certain data modalities, we developed a robust data integration approach that identifies reproducible signals shared among multiple data modalities and studies. We demonstrated the power of the integrative analyses by identifying a novel driver gene, ECHDC1, whose tumor‐suppressive role was validated both in breast cancer cells and in patient tumors. The multi‐modal meta‐analysis approach also identified synthetic lethal partners of cancer drivers, including a co‐dependency of PTEN‐deficient endometrial cancer cells on RNA helicases.

9.
Assessment of risk to infants and children from the ingestion of contaminants in water is an important component of the analysis of possible environmental hazards. Children and infants represent a sensitive life stage because exposure to contaminants in early life can have developmental and long-lasting adverse effects. Children and infants tend to ingest relatively large amounts of water on a bodyweight-adjusted basis, especially those fed in early life with formula that is reconstituted or diluted with water. This article presents statistical estimates of the amounts of community water ingested by formula-fed infants. The estimates are based on nationwide sample survey data that allow identification of respondents who consume formula and of the amounts of water they ingest. Included are specific estimates of the amounts of community water ingested in formula. Estimates of total community water ingestion by children and infants who consume formula can be especially useful in exposure assessment, since these children represent a highly exposed population. For example, mean community water ingestion by infants 1 to 3 months of age who consume formula is 627 mL/day (136 mL/kg/day), and the 95th percentile is 1096 mL/day (290 mL/kg/day).
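Population means and upper percentiles like those quoted above are typically computed from weighted survey records. The sketch below assumes that each respondent carries a survey weight. It defines the weighted percentile as the smallest value whose cumulative weight share reaches q, which is one of several conventions in use. This is not the authors' estimation procedure, and all numbers are hypothetical. In this sketch, a mL/kg/day figure would come from applying the same functions to per-respondent ratios of intake to body weight.

```python
def weighted_mean(values, weights):
    """Survey-weighted mean of per-respondent values."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall


# Hypothetical daily intakes (mL/day) with equal survey weights.
intakes = [500, 600, 700, 1200]
weights = [1, 1, 1, 1]
print(weighted_mean(intakes, weights))              # 750.0
print(weighted_percentile(intakes, weights, 0.95))  # 1200
```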

10.
The current global challenges that threaten biodiversity are immense and rapidly growing. These biodiversity challenges demand approaches that meld bioinformatics, large-scale phylogeny reconstruction, use of digitized specimen data, and complex post-tree analyses (e.g. niche modeling, niche diversification, and other ecological analyses). Recent developments in phylogenetics, coupled with emerging cyberinfrastructure and new data sources, provide unparalleled opportunities for mobilizing and integrating massive amounts of biological data, driving the discovery of complex patterns and new hypotheses for further study. These developments are not trivial, because the biodiversity data now being collected and analyzed on a global scale are inherently complex. The ongoing integration and maturation of the biodiversity tools discussed here is transforming biodiversity science, enabling what we broadly term “next-generation” investigations in systematics, ecology, and evolution (i.e., “biodiversity science”). New training that integrates domain knowledge in biodiversity with data science skills is also needed to accelerate research in these areas. Integrative biodiversity science is crucial to the future of global biodiversity. We cannot simply react to continued threats to biodiversity. Instead, through an integrative, multifaceted, big-data approach, researchers can now make biodiversity projections that provide crucial data not only for scientists but also for the public, land managers, policy makers, urban planners, and agriculture.

11.
Biodiversity data sharing and publication: progress and recommendations
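Biodiversity research, conservation practice, natural resource management, and scientific decision-making increasingly depend on the sharing and integration of large amounts of data. Calls for data sharing, and data-sharing practices, are becoming more common, yet many scientists still actively or passively decline to share their data. In practice there are cognitive and technical barriers to data sharing. Scientists may be unwilling to share data, worry about competition from peers, feel that the rewards are insufficient, be unfamiliar with the relevant data repositories, lack easy-to-use data submission tools, or lack sufficient time and funding. The key to solving these problems and improving the culture of sharing is to ensure that those who share data receive appropriate rewards (e.g., data citation). Peer-reviewed data publication is considered to provide an incentive for scientists who produce, manage, and share data, and also to promote data reuse effectively. Data publication, as one means of data sharing, has therefore recently attracted considerable attention, and journals dedicated to publishing data papers have appeared in the biodiversity field. Within the data-paper model, joint data policies adopted by data repositories and scientific journals may be a more feasible way to promote data sharing. This paper summarizes progress in data sharing and publication. It discusses the extent to which data papers can promote data sharing and the relationship between data sharing and data publication, and offers the following recommendations: (1) individual scientists should actively practice data sharing; (2) DOIs should be used to resolve issues of data ownership and data citation; (3) scientific journals and data repositories should jointly adopt more reasonable and rigorous data archiving policies; (4) funding agencies and research institutions should play a more important role in data sharing.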

12.
We develop an approach, based on multiple imputation, that uses auxiliary variables to recover information from censored observations in survival analysis. We apply the approach to data from an AIDS clinical trial comparing ZDV and placebo, in which CD4 count is the time-dependent auxiliary variable. To facilitate imputation, a joint model is developed for the data, which includes a hierarchical change-point model for CD4 counts and a time-dependent proportional hazards model for the time to AIDS. Markov chain Monte Carlo methods are used to multiply impute event times for censored cases. The augmented data are then analyzed and the results combined using standard multiple-imputation techniques. Compared with simply analyzing the observed data, our multiple-imputation approach leads to a small change in the estimated effect of ZDV and smaller estimated standard errors. A sensitivity analysis suggests that the qualitative findings are reproducible under a variety of imputation models. A simulation study indicates that the approach can yield improved efficiency over standard analyses and partial corrections for dependent censoring. An issue that arises with our approach, however, is whether the analysis of primary interest and the imputation model are compatible.
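The core imputation step, drawing event times for censored cases, can be illustrated under a much simpler model than the joint CD4/proportional-hazards model described above. The sketch assumes an exponential event-time distribution. By memorylessness, a subject censored at time c has T | T > c distributed as c + Exp(rate). To mimic proper imputation, each completed data set uses its own rate. The rates play the role of posterior draws and are hypothetical here, as are the example times. Covariates, the auxiliary CD4 process and the MCMC are all omitted.

```python
import random
from typing import List, Sequence


def impute_once(times: Sequence[float], events: Sequence[int],
                rate: float, rng: random.Random) -> List[float]:
    """One completed data set: keep observed event times and replace each
    censoring time c by a draw from T | T > c under an exponential model."""
    completed = []
    for t, observed in zip(times, events):
        if observed:
            completed.append(t)
        else:
            completed.append(t + rng.expovariate(rate))  # memoryless residual
    return completed


def multiply_impute(times, events, rate_draws, seed=0):
    """One completed data set per (hypothetical) posterior draw of the rate."""
    rng = random.Random(seed)
    return [impute_once(times, events, rate, rng) for rate in rate_draws]


# Subject 2 is censored at 5.0; the others had observed events.
datasets = multiply_impute([2.0, 5.0, 3.0], [1, 0, 1], rate_draws=[0.10, 0.20, 0.15])
```

Each completed data set would then be analyzed and the results pooled with Rubin's rules, as the abstract describes.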

13.
14.
We present a novel decomposition of nonnegative functional count data that draws on concepts from nonnegative matrix factorization. Our decomposition, which we refer to as NARFD (nonnegative and regularized function decomposition), enables the study of patterns in variation across subjects in a highly interpretable manner. Prototypic modes of variation are estimated directly on the observed scale of the data, are local, and are transparently added together to reconstruct observed functions. This contrasts with generalized functional principal component analysis, an alternative approach that estimates functional principal components on a transformed scale, produces components that typically vary across the entire functional domain, and reconstructs observations using complex patterns of cancellation and multiplication of functional principal components. NARFD is implemented using an alternating minimization algorithm, and we evaluate our approach in simulations. We apply NARFD to an accelerometer dataset comprising observations of physical activity for healthy older Americans.
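The NMF idea that NARFD builds on can be shown with plain nonnegative matrix factorization. This is not NARFD itself, which adds regularization and is tailored to functional count data. The sketch uses the classic Lee–Seung multiplicative updates, which alternate between the two factors to reduce the squared Frobenius error ||V - WH||². Each row of V is one subject's function on a grid. Rows of H are nonnegative prototype curves, and each subject is reconstructed by adding them with nonnegative weights from W. The grid size, k and iteration count are arbitrary choices.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for V ~= W @ H with W, H >= 0.

    V: (subjects x grid points) nonnegative matrix. Returns W (subject
    weights) and H (k prototype curves). eps guards against division by 0.
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("V must be nonnegative")
    rng = np.random.default_rng(seed)
    n, p = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, p)) + eps
    for _ in range(n_iter):
        # Alternate: update prototypes with W fixed, then weights with H fixed.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical data: 6 subjects observed on 10 grid points, true rank 2.
gen = np.random.default_rng(1)
V = gen.random((6, 2)) @ gen.random((2, 10))
W, H = nmf(V, k=2)
reconstruction = W @ H  # sum of nonnegative, additively combined prototypes
```

Because nothing can cancel, each prototype contributes only where it is nonzero. This is the interpretability argument the abstract makes against GFPCA.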

15.
There are several major obstacles to using motion analysis as an aid to clinical decision making. These include the difficulty of comprehending large amounts of both corroborating and conflicting information, the subjectivity of data interpretation, the need for visualization, and the quantitative comparison of temporal waveform data. This paper seeks to overcome these obstacles by applying a hybrid approach to the analysis of motion analysis data using principal component analysis (PCA), the Dempster-Shafer (DS) theory of evidence and simplex plots. Specifically, the approach is used to characterise the differences between osteoarthritic (OA) and normal (NL) knee function data and to produce a hierarchy of the variables that are most discriminatory in the classification process. The results of the hybrid approach are compared with those of artificial neural network analyses.
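In a DS classifier, evidence from several variables is fused with Dempster's rule of combination. The sketch implements only that rule for the frame {OA, NL}, with focal elements written as frozensets. How each variable's PCA-derived evidence is converted into a mass function is not shown, and the masses in the example are hypothetical. Mass placed on {OA, NL} expresses ignorance. Products of masses on disjoint sets count as conflict K and are removed by renormalizing with 1 - K.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over the same frame.

    m1, m2 map focal elements (frozensets of hypotheses) to masses summing
    to 1. Products of masses whose focal elements intersect go to the
    intersection; products with an empty intersection are conflict (K) and
    are removed by renormalizing with 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {focal: mass / norm for focal, mass in combined.items()}


OA, NL = frozenset({"OA"}), frozenset({"NL"})
THETA = OA | NL                  # ignorance: "either OA or NL"
m_var1 = {OA: 0.6, THETA: 0.4}   # hypothetical evidence from one variable
m_var2 = {NL: 0.3, THETA: 0.7}   # hypothetical evidence from another
combined = dempster_combine(m_var1, m_var2)
# K = 0.6 * 0.3 = 0.18, so m({OA}) = 0.42 / 0.82, roughly 0.512
```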

16.
Albert PS. Biometrics, 2000, 56(2): 602-608
Binary longitudinal data are often collected in clinical trials when the interest is in assessing the effect of a treatment over time. Our application is a recent study of opiate addiction that examined the effect of a new treatment on repeated urine tests used to assess opiate use over an extended follow-up. Drug addiction is episodic, and a new treatment may affect various features of the opiate-use process, such as the proportion of positive urine tests over follow-up and the time to the first positive test. Complications in this trial were the large amount of dropout and intermittent missing data and the large number of observations on each subject. We develop a transitional model for longitudinal binary data subject to nonignorable missing data and propose an EM algorithm for parameter estimation. We use the transitional model to derive summary measures of the opiate-use process that can be compared across treatment groups to assess the treatment effect. Through analyses and simulations, we show the importance of properly accounting for the missing data mechanism when assessing the treatment effect in our example.
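Summary measures like the two named above can be derived in closed form when the transitional model is first-order. The sketch assumes logit P(Y_t = 1 | Y_{t-1} = y) = beta0 + beta_prev·y + beta_trt·treated. The long-run proportion of positive tests is then the stationary probability p01 / (p01 + p10) of a two-state Markov chain. Starting from a negative test, the number of visits until the first positive is geometric with mean 1/p01. This simplified form ignores the nonignorable missing-data mechanism that the paper handles with EM, and the coefficients are hypothetical.

```python
import math


def expit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def transition_summaries(beta0, beta_prev, beta_trt, treated):
    """Summaries of a first-order transitional logistic model (sketch)."""
    x = beta_trt * treated
    p01 = expit(beta0 + x)                 # P(positive | previous negative)
    p11 = expit(beta0 + beta_prev + x)     # P(positive | previous positive)
    p10 = 1.0 - p11                        # P(negative | previous positive)
    long_run_positive = p01 / (p01 + p10)  # stationary share of positive tests
    mean_visits_to_positive = 1.0 / p01    # geometric waiting time from negative
    return long_run_positive, mean_visits_to_positive


# Hypothetical coefficients: compare treated and control arms.
control = transition_summaries(-1.0, 2.0, -0.5, treated=0)
treated = transition_summaries(-1.0, 2.0, -0.5, treated=1)
```

With a negative treatment coefficient, the treated arm has a lower long-run proportion of positive tests and a longer expected wait before the first positive.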

17.
Harrington ED, Jensen LJ, Bork P. FEBS Letters, 2008, 582(8): 1251-1258
Continuing improvements in DNA sequencing technologies are providing us with vast amounts of genomic data from an ever-widening range of organisms. The resulting challenge for bioinformatics is to interpret this deluge of data and place it back into its biological context. Biological networks provide a conceptual framework with which we can describe part of this context, namely the different interactions that occur between the molecular components of a cell. Here, we review the computational methods available to predict biological networks from genomic sequence data and discuss how they relate to high-throughput experimental methods.

18.
This paper develops an approach to protein backbone NMR assignment that effectively assigns large proteins while using limited sets of triple-resonance experiments. Our approach handles proteins with large fractions of missing data and many ambiguous pairs of pseudoresidues, and provides a statistical assessment of confidence in global and position-specific assignments. The approach is tested on an extensive set of experimental and synthetic data of up to 723 residues, with match tolerances of up to 0.5 ppm for Cα and Cβ resonance types. The tests show that the approach is particularly helpful when the data contain experimental noise and require large match tolerances. The keys to the approach are an empirical Bayesian probability model that rigorously accounts for uncertainty in the data at all stages of the analysis, and a hybrid stochastic tree-based search algorithm that effectively explores the large space of possible assignments.
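Match tolerance enters the assignment problem when deciding which pseudoresidue can follow which. The sketch makes a simple deterministic assumption: spin system b may follow a if every resonance type measured both in a's own shifts and in b's "previous residue" shifts agrees within tol ppm. Types missing from either side are skipped, but at least one shared type is required. This only builds a hypothetical candidate-successor graph. It is not the paper's empirical Bayesian model or its tree-based search. The class name, the "CA"/"CB" labels and the shift values are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpinSystem:
    """A pseudoresidue: shifts for residue i and for its predecessor i-1."""
    name: str
    intra: Dict[str, float]  # e.g. {"CA": 56.1, "CB": 30.2}
    prev: Dict[str, float]   # shifts attributed to residue i-1


def can_follow(a: SpinSystem, b: SpinSystem, tol: float) -> bool:
    """True if all resonance types shared by a.intra and b.prev agree within tol."""
    shared = a.intra.keys() & b.prev.keys()
    if not shared:
        return False  # assumption: no shared evidence means no candidate edge
    return all(abs(a.intra[r] - b.prev[r]) <= tol for r in shared)


def candidate_successors(systems: List[SpinSystem], tol: float = 0.5) -> Dict[str, List[str]]:
    """Map each spin system to the systems that could follow it in sequence."""
    return {
        a.name: [b.name for b in systems if b is not a and can_follow(a, b, tol)]
        for a in systems
    }


A = SpinSystem("A", {"CA": 56.1, "CB": 30.2}, {"CA": 60.0})
B = SpinSystem("B", {"CA": 54.0}, {"CA": 56.3, "CB": 30.0})
C = SpinSystem("C", {"CA": 62.0, "CB": 40.0}, {"CA": 58.0})
edges = candidate_successors([A, B, C], tol=0.5)  # A -> B is the only edge
```

Larger tolerances add edges and hence ambiguity. That is the regime in which the abstract reports a statistical treatment to be most helpful.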

19.
Aim: Site occupancy probabilities of target species are commonly used in various ecological studies, e.g. to monitor the current status of and trends in biodiversity. Detection error introduces bias into the estimators of site occupancy. Existing methods for estimating occupancy probability in the presence of detection error use replicate surveys. These methods assume population closure (i.e. the site occupancy status remains constant across surveys) and independence between surveys. We present an approach for estimating site occupancy probability in the presence of detection error that requires only a single survey and does not require the assumptions of population closure or independence. In place of the closure assumption, this method requires covariates that affect detection and occupancy. Methods: A penalized maximum-likelihood method was used to estimate the parameters. Estimability of the parameters was checked using data cloning. Parametric bootstrapping was used to compute confidence intervals. Important findings: The single-survey approach facilitates analysis of historical datasets where replicate surveys are unavailable, situations where replicate surveys are expensive to conduct, and cases where the assumptions of closure or independence are not met. This method saves significant amounts of time, energy and money in ecological surveys without sacrificing statistical validity. Further, we show that occupancy and habitat suitability are not synonymous and suggest a method to estimate habitat suitability using single-survey data.
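With a single survey, a site is recorded as occupied only if it is occupied and the species is detected. This gives the probability psi_i · p_i, where occupancy psi and detection p are tied to different covariates. The sketch assumes logistic links, X for occupancy covariates and Z for detection covariates, and writes the resulting Bernoulli log-likelihood. The penalty term used in the paper, the data-cloning check and the parametric bootstrap are not reproduced. In practice this function (minus some penalty) would be maximized with a numerical optimizer. The covariate values below are hypothetical.

```python
import numpy as np


def expit(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def single_survey_loglik(beta, alpha, X, Z, y):
    """Log-likelihood of single-survey detection/non-detection data.

    beta: occupancy coefficients for design matrix X (sites x covariates).
    alpha: detection coefficients for design matrix Z.
    y: 1 if the species was recorded at the site, else 0.
    """
    psi = expit(X @ beta)                   # occupancy probability at each site
    p = expit(Z @ alpha)                    # detection probability given occupancy
    q = np.clip(psi * p, 1e-12, 1 - 1e-12)  # P(species recorded at site)
    y = np.asarray(y, dtype=float)
    return float(np.sum(y * np.log(q) + (1 - y) * np.log1p(-q)))


# Hypothetical: intercept + habitat covariate for occupancy,
# intercept + observer-effort covariate for detection.
X = np.array([[1.0, 0.2], [1.0, -1.0], [1.0, 1.5]])
Z = np.array([[1.0, 0.5], [1.0, 0.0], [1.0, -0.3]])
ll = single_survey_loglik(np.array([0.1, 0.8]), np.array([0.3, 1.2]), X, Z, [1, 0, 1])
```

The psi_i · p_i product is why the abstract stresses covariates. Without distinct covariates for occupancy and detection, only their product is identifiable.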

20.
Photo-identification is a commonly used non-invasive technique that has been profitably employed in biological studies over the years. It rests on the assumption that a single individual can be recognized in multiple photos captured at different times by exploiting its unique, visible physical features, such as marks, notches or other distinctive traits. Photo-identification is therefore used to infer the spatial and temporal distributions and population dynamics of wild species. It provides especially valuable information when the species under investigation is classified as data deficient. Furthermore, the technological improvements of recent decades and the wide availability of devices with powerful computing capabilities are driving research towards the common goal of enriching bio-ecological studies with innovative computer science approaches. In this scenario, computer vision plays a fundamental role, as it can successfully assist researchers in the analysis of large amounts of data. The aim of this paper is to provide a computer vision approach to the photo-identification of Risso's dolphin, exploiting specific visual cues with a feature-based approach that relies on the SIFT and SURF feature detectors. The experiments were conducted on image data acquired in the Gulf of Taranto from 2013 to 2017. They compared the performance of SIFT and SURF with each other and with the state-of-the-art software DARWIN. The experiments demonstrated the effectiveness of the proposed approach and suggested that it would be suitable for large-scale studies. In conclusion, this paper presents an innovative computer vision application for the identification of unknown Risso's dolphin individuals that relies on a feature-based automated approach.
The results suggest that the proposed approach can efficiently assist researchers in the photo-identification of large amounts of data collected in such a challenging domain.
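Feature-based photo-identification usually compares two images by matching their keypoint descriptors. SIFT descriptors are 128-dimensional and SURF descriptors 64-dimensional. The sketch shows one common matching step, Lowe's nearest-neighbour ratio test. A match is accepted only when the closest descriptor in the other image is clearly closer than the second closest. The abstract does not say which matching or scoring rule the paper uses, so treat this as a generic illustration. The 0.75 threshold and the toy 2-D "descriptors" are assumptions. In a real pipeline the descriptors would come from a SIFT or SURF detector, and the match count could rank catalogue photos against a query fin.

```python
import numpy as np


def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match descriptors of image A to image B with Lowe's ratio test.

    desc_a: (n_a, d) array, desc_b: (n_b, d) array with n_b >= 2.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    A = np.asarray(desc_a, dtype=float)
    B = np.asarray(desc_b, dtype=float)
    if len(B) < 2:
        raise ValueError("need at least two descriptors in image B")
    # Pairwise Euclidean distances, shape (n_a, n_b).
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]
        if row[nearest] < ratio * row[second]:  # distinctive match only
            matches.append((i, int(nearest)))
    return matches


query = [[0.0, 0.0], [5.0, 5.0]]
candidate = [[0.1, 0.0], [10.0, 10.0], [5.0, 5.2]]
print(ratio_test_matches(query, candidate))  # [(0, 0), (1, 2)]
```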
