Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Summary. Missing data, measurement error, and misclassification are three important problems in many research fields, such as epidemiological studies. It is well known that missing data and measurement error in covariates may lead to biased estimation. Misclassification may be considered a special type of measurement error for categorical data. Nevertheless, we treat misclassification as a different problem from measurement error because the statistical models for them are different. Indeed, in the literature, methods for these three problems were generally proposed separately, given that the statistical modeling for each is very different. The problem is more challenging in a longitudinal study with nonignorable missing data. In this article, we consider estimation in generalized linear models under these three incomplete data models. We propose a general approach based on expected estimating equations (EEEs) to solve these three incomplete data problems in a unified fashion. This EEE approach can be easily implemented and its asymptotic covariance can be obtained by sandwich estimation. Intensive simulation studies are performed under various incomplete data settings. The proposed method is applied to a longitudinal study of oral bone density in relation to body bone density.

2.
With the development of functional genomics research, large-scale proteomics studies are now widespread, presenting significant challenges for data storage, exchange, and analysis. Here we present the Integrated Proteomics Exploring Database (IPED) as a platform for managing proteomics experimental data (both process and result data). IPED is based on the schema of the Proteome Experimental Data Repository (PEDRo), and complies with the General Proteomics Standard (GPS) drafted by the Proteomics Standards Committee of the Human Proteome Organization. In our work, we developed three components for the IPED platform: the IPED client editor, IPED server software, and IPED web interface. The client editor collects experimental data and generates an extensible markup language (XML) data file compliant with PEDRo and GPS; the server software parses the XML data file and loads information into a core database; and the web interface displays experimental results, to provide a convenient graphic representation of data. Given software convenience and data abundance, IPED is a powerful platform for data exchange and presents an important resource for the proteomics community. In its current release, IPED is available at http://www.biosino.org/iped2.

3.
Fitting bent lines to data, with applications to allometry   Total citations: 3, self-citations: 0, citations by others: 3
Change-point models, in which a linear or non-linear relation is generalized by allowing it to change at a point not fixed in advance, are of growing importance in allometric and other types of modeling. Frequently, the change-point is picked "by eye" and separate regressions are run for each resultant subdomain. This procedure is deficient, however, for the following reasons: first, a repeatable and objective procedure for estimating the change-point has not been used; second, the subsequent analysis usually does not take into account the fact that the change-point is estimated from the data; and last, the usually desirable requirement of continuity at the change-point is ignored. This paper describes various methods for jointly estimating linear relations and the intervening change-point from the data. In the simplest case, with normal errors and a linear relation of one variable upon another, this amounts to fitting a "bent line" via least squares techniques. In addition, tests and graphical diagnostics for the presence of change-points are presented. An example is given where a change-point and slopes are estimated for the relation of running speed with size among land mammals. In the past, these data have been fit with a straight line or a parabola. It is shown here that superior fit and interpretability are achieved using a change-point model.
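
As a rough illustration of the "bent line" idea in the abstract above, the sketch below fits a continuous two-segment line by ordinary least squares, scanning a user-supplied grid of candidate change-points and keeping the one with the smallest residual sum of squares. The function name `fit_bent_line`, the grid search, and the hinge parameterization are assumptions made for illustration; the abstract does not say which estimation procedure the paper actually uses.

```python
import numpy as np

def fit_bent_line(x, y, candidates):
    """Least-squares fit of a continuous two-segment ("bent") line.

    Assumed model: y = b0 + b1*x + b2*max(x - c, 0).
    The slope is b1 to the left of c and b1 + b2 to the right,
    and the hinge term keeps the fit continuous at c.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for c in candidates:
        # Design matrix: intercept, x, and the hinge term at candidate c.
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(c), coef)
    rss, c, coef = best
    return {
        "change_point": c,
        "intercept": float(coef[0]),
        "slope_left": float(coef[1]),
        "slope_right": float(coef[1] + coef[2]),
        "rss": rss,
    }
```

Because the change-point is chosen from the data, standard errors taken from the final regression alone would understate the uncertainty, which is the second deficiency the abstract lists.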

4.
5.
6.
7.

Background  

Amongst the most commonly used molecular markers for plant phylogenetic studies are the nuclear ribosomal internal transcribed spacers (ITS). Intra-individual variability of these multicopy regions is a very common phenomenon in plants, the causes of which are debated in the literature. Phylogenetic reconstruction under these conditions is inherently difficult. Our approach is to consider this problem as a special case of the general biological question of how to infer the characteristics of hosts (represented here by plant individuals) from features of their associates (represented here by cloned sequences).

8.
A microarray experiment includes many steps, and each one of them may introduce systematic variation. For a sound analysis, the systematic bias must be identified and removed before the data are analyzed. Based on the M-A dependency observed by Dudoit et al. (2002), we suggest that, instead of using the lowess normalization, a new normalization method called ANCOVA be used for dealing with genes with replicates. Simulation studies have shown that the performance of the suggested ANCOVA method is superior to any of the available approaches with regard to the Fisher's Z score and concordance rate. We used microarray data from bladder cancer to illustrate the application of our approach. The edge the ANCOVA method has over the existing normalization approaches is further confirmed through real-time PCR.
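
For readers unfamiliar with the "M-A dependency" mentioned above, the sketch below computes the conventional M (log ratio) and A (mean log intensity) values for a two-channel array. It shows only this standard transformation; it is not the ANCOVA normalization proposed in the abstract, whose details are not given here. The function name `ma_values` and the `red`/`green` arguments are illustrative assumptions.

```python
import numpy as np

def ma_values(red, green):
    """Return (M, A) for two-channel intensities.

    M = log2(R) - log2(G)          (log ratio)
    A = (log2(R) + log2(G)) / 2    (mean log intensity)
    Intensities are assumed to be strictly positive.
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    log_r = np.log2(red)
    log_g = np.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)
```

A systematic trend of M against A in a scatter plot is the kind of intensity-dependent bias that normalization methods aim to remove.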

9.
10.
There are certain major obstacles to using motion analysis as an aid to clinical decision making. These include: the difficulty in comprehending large amounts of both corroborating and conflicting information; the subjectivity of data interpretation; the need for visualization; and the quantitative comparison of temporal waveform data. This paper seeks to overcome these obstacles by applying a hybrid approach to the analysis of motion analysis data using principal component analysis (PCA), the Dempster-Shafer (DS) theory of evidence and simplex plots. Specifically, the approach is used to characterise the differences between osteoarthritic (OA) and normal (NL) knee function data and to produce a hierarchy of those variables that are most discriminatory in the classification process. Comparisons of the results obtained with the hybrid approach are made with results from artificial neural network analyses.

11.
Boundaries, data and conservation   Total citations: 1, self-citations: 1, citations by others: 0

12.
13.
SUMMARY: The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. Availability and implementation: RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.

14.
15.

Background

Translating a known metabolic network into a dynamic model requires reasonable guesses of all enzyme parameters. In Bayesian parameter estimation, model parameters are described by a posterior probability distribution, which scores the potential parameter sets, showing how well each of them agrees with the data and with the prior assumptions made.

Results

We compute posterior distributions of kinetic parameters within a Bayesian framework, based on integration of kinetic, thermodynamic, metabolic, and proteomic data. The structure of the metabolic system (i.e., stoichiometries and enzyme regulation) needs to be known, and the reactions are modelled by convenience kinetics with thermodynamically independent parameters. The parameter posterior is computed in two separate steps: a first posterior summarises the available data on enzyme kinetic parameters; an improved second posterior is obtained by integrating metabolic fluxes, concentrations, and enzyme concentrations for one or more steady states. The data can be heterogeneous, incomplete, and uncertain, and the posterior is approximated by a multivariate log-normal distribution. We apply the method to a model of the threonine synthesis pathway: the integration of metabolic data has little effect on the marginal posterior distributions of individual model parameters. Nevertheless, it leads to strong correlations between the parameters in the joint posterior distribution, which greatly improve the model predictions in the subsequent Monte Carlo simulations.

Conclusion

We present a standardised method to translate metabolic networks into dynamic models. To determine the model parameters, evidence from various experimental data is combined and weighted using Bayesian parameter estimation. The resulting posterior parameter distribution describes a statistical ensemble of parameter sets; the parameter variances and correlations can account for missing knowledge, measurement uncertainties, or biological variability. The posterior distribution can be used to sample model instances and to obtain probabilistic statements about the model's dynamic behaviour.
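
The abstract above describes a posterior approximated by a multivariate log-normal distribution from which model instances can be sampled. A minimal sketch of that sampling step follows, assuming the posterior is summarised by a mean vector and covariance matrix on the log scale. The function name `sample_lognormal_parameters` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_lognormal_parameters(mu_log, cov_log, n_samples, seed=0):
    """Draw parameter sets from a multivariate log-normal distribution.

    mu_log  : mean vector of the log-parameters (assumed given)
    cov_log : covariance matrix of the log-parameters (assumed given)
    Each returned row is one strictly positive parameter set.
    """
    rng = np.random.default_rng(seed)
    # Sample on the log scale, where the distribution is Gaussian,
    # then exponentiate to return to the original parameter scale.
    log_samples = rng.multivariate_normal(mu_log, cov_log, size=n_samples)
    return np.exp(log_samples)
```

Sampling jointly rather than from each marginal separately preserves the parameter correlations, which the abstract identifies as the main benefit of integrating the metabolic data.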

16.
Mathematical models are an essential tool in systems biology, linking the behaviour of a system to the interactions between its components. Parameters in empirical mathematical models must be determined using experimental data, a process called regression. Because experimental data are noisy and incomplete, diagnostics that test the structural identifiability and validity of models and the significance and determinability of their parameters are needed to ensure that the proposed models are supported by the available data.

17.
PROTICdb is a web-based application, mainly designed to store and analyze plant proteome data obtained by two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry (MS). The purposes of PROTICdb are (i) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements, and (ii) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of post-translational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs of image analysis and MS identification software, or by filling web forms. 2-D PAGE annotated maps can be displayed, queried, and compared through a graphical interface. Links to external databases are also available. Quantitative data can be easily exported in a tabulated format for statistical analyses. PROTICdb is based on the Oracle or the PostgreSQL Database Management System and is freely available upon request at the following URL: http://moulon.inra.fr/bioinfo/PROTICdb.

18.
19.
Cover-abundance estimates are commonly employed in phytosociological investigations to record the performance of species. Because the coded values are on an ordinal scale of measure, various authors have suggested that some transformation is necessary before such values can be used for classification and ordination. However, it is not clear that transformation is a sufficient treatment, and it would seem preferable to use ordinal data directly. In this paper we examine such direct use of partial rankings and show that several dissimilarity measures can be defined for this case without invoking any transformations. They include dissimilarity measures associated with various rank correlation measures and with distances between strings; all the measures are variant forms of Hausdorff's interset distance. Certain other kinds of data, such as those employing dominant and subdominant species and the dry-weight-rank estimation of biomass, are also on an ordinal scale and could be analysed using similar techniques. To illustrate the approach, a string dissimilarity measure is used to analyse a set of data from Slovakian grasslands which appear to reflect a simple gradient. The original data were recorded with 10 classes of performance and are analysed using hierarchical and nondeterministic, overlapping, classifications.

20.
Analysis of doubly-censored survival data, with application to AIDS   Total citations: 5, self-citations: 0, citations by others: 5
This paper proposes nonparametric and weakly structured parametric methods for analyzing survival data in which both the time origin and the failure event can be right- or interval-censored. Such data arise in clinical investigations of the human immunodeficiency virus (HIV) when the infection and clinical status of patients are observed only at several time points. The proposed methods generalize the self-consistency algorithm proposed by Turnbull (1976, Journal of the Royal Statistical Society, Series B 38, 290-295) for singly-censored univariate data, and are illustrated with the results from a study of hemophiliacs who were infected with HIV by contaminated blood factor.
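
To make the self-consistency idea concrete, the sketch below iterates a self-consistency (EM-type) update for the simpler singly-censored, interval-censored case that the abstract attributes to Turnbull. It is not the doubly-censored generalization the paper proposes. The function name `self_consistency`, the half-open interval convention `(left, right]`, and the user-supplied grid of candidate mass points are simplifying assumptions for illustration.

```python
import numpy as np

def self_consistency(intervals, support, n_iter=500, tol=1e-10):
    """Estimate probability masses on `support` from interval-censored data.

    intervals : list of (left, right]; the event is known only to lie
                inside the interval (assumed convention)
    support   : candidate mass points (an assumed, user-chosen grid)
    """
    support = np.asarray(support, dtype=float)
    # alpha[i, j] = 1 if support point j is compatible with observation i.
    alpha = np.array(
        [[(left < s) and (s <= right) for s in support]
         for left, right in intervals],
        dtype=float,
    )
    if np.any(alpha.sum(axis=1) == 0):
        raise ValueError("every interval must contain a support point")
    p = np.full(len(support), 1.0 / len(support))
    for _ in range(n_iter):
        # Conditional probability that observation i sits at point j,
        # given the current masses; averaging over i gives the new masses.
        weighted = alpha * p
        posterior = weighted / weighted.sum(axis=1, keepdims=True)
        new_p = posterior.mean(axis=0)
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p
```

The fixed point of this update satisfies the self-consistency equations: each mass equals the average conditional probability, over observations, of the event falling at that point.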
