Similar articles
1.
As an alternative to parsimony analyses, stochastic models have been proposed (Lewis, 2001; Nylander et al., 2004) for morphological characters, so that maximum likelihood or Bayesian analyses may be used for phylogenetic inference. A key feature of these models is that they account for ascertainment bias, in that only varying, or parsimony-informative, characters are observed. However, statistical consistency of such model-based inference requires that the model parameters be identifiable from the joint distribution they entail, and this issue has not been addressed. Here we prove that parameters for several such models, with finite state spaces of arbitrary size, are identifiable, provided the tree has at least eight leaves. If the tree topology is already known, then seven leaves suffice for identifiability of the numerical parameters. The method of proof involves first inferring a full distribution of both parsimony-informative and non-informative pattern joint probabilities from the parsimony-informative ones, using phylogenetic invariants. The failure of identifiability of the tree parameter for four-taxon trees is also investigated.

2.
Currently, the bottom-up approach is the most popular for characterizing protein samples by mass spectrometry. This is mainly attributed to the fact that the bottom-up approach has been successfully optimized for high-throughput studies. However, the bottom-up approach is associated with a number of challenges, such as loss of linkage information between peptides. Previous publications have addressed some of these problems, which are commonly referred to as protein inference. Nevertheless, all previous publications on the subject are oversimplified and do not represent the full complexity of the proteins identified. To this end, we present here SIR (spectra based isoform resolver), which uses a novel, transparent and systematic approach for organizing and presenting identified proteins based on peptide spectra assignments. The algorithm groups peptides and proteins into five evidence groups and calculates sixteen parameters for each identified protein that are useful for cases where deterministic protein inference is the goal. The novel approach has been incorporated into SIR, which is a user-friendly tool only concerned with protein inference based on imports of Mascot search results. In addition, SIR has two visualization tools that facilitate further exploration of the protein inference problem.

3.
Fitting Cox's proportional hazards models from survey data   (Cited by: 4; self-citations: 0; citations by others: 4)
BINDER  DAVID A. 《Biometrika》1992,79(1):139-147

4.
5.
In this work, the commonly used algorithms for mass spectrometry-based protein identification, Mascot, MS-Fit, ProFound and SEQUEST, were studied with respect to the selectivity and sensitivity of their searches. The influence of various search parameters was also investigated. Approximately 6600 searches were performed using different search engines with several search parameters to establish a statistical basis. The applied mass spectrometric data set was chosen from a current proteome study. The huge amount of data could only be handled with computational assistance. We present a software solution for fully automated triggering of several peptide mass fingerprinting (PMF) and peptide fragmentation fingerprinting (PFF) algorithms. The development of this high-throughput method made an intensive evaluation based on data acquired in a typical proteome project possible. Previous evaluations of PMF and PFF algorithms were mainly based on simulations.

6.
For large data sets, it can be difficult or impossible to fit models with random effects using standard algorithms due to memory limitations or high computational burdens. In addition, it would be advantageous to use the abundant information to relax assumptions, such as normality of random effects. Motivated by data from an epidemiologic study of childhood growth, we propose a 2-stage method for fitting semiparametric random effects models to longitudinal data with many subjects. In the first stage, we use a multivariate clustering method to identify G

7.
Fitting regression models to case-control data by maximum likelihood   (Cited by: 3; self-citations: 0; citations by others: 3)
SCOTT  A. J.; WILD  C. J. 《Biometrika》1997,84(1):57-71

8.
Populations that are structured into small local patches are a common feature of ecological and epidemiological systems. Models describing this structure are often referred to as metapopulation models in ecology or household models in epidemiology. Small local populations are subject to demographic stochasticity. Theoretical studies of household disease models without resistant stages (SIS models) have shown that local stochasticity can be ignored for between patch disease transmission if the number of connected patches is large. In that case the distribution of the number of infected individuals per household reaches a stationary distribution described by a birth-death process with a constant immigration term. Here we show how this result, in conjunction with the balancing condition for birth-death processes, provides a framework to estimate demographic parameters from a frequency distribution of local population sizes. The parameter estimation framework is applicable to estimate parameters of disease transmission models as well as metapopulation models.
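The balancing condition mentioned in this abstract can be illustrated with a minimal Python sketch. The rate forms used here (a within-household infection rate `beta`, a recovery rate `gamma`, and a constant immigration term `eps`) and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
def sis_household_rates(size, beta, gamma, eps):
    """Hypothetical SIS household rates (assumed form, not from the abstract).

    births[i] is the rate of moving from i to i + 1 infected individuals;
    deaths[i] is the rate of moving from i + 1 to i infected individuals.
    """
    births = [(size - i) * (beta * i + eps) for i in range(size)]
    deaths = [gamma * (i + 1) for i in range(size)]
    return births, deaths


def stationary_distribution(births, deaths):
    """Solve the detailed-balance recursion pi[i] * births[i] = pi[i + 1] * deaths[i]."""
    weights = [1.0]
    for birth, death in zip(births, deaths):
        weights.append(weights[-1] * birth / death)
    total = sum(weights)
    return [w / total for w in weights]


births, deaths = sis_household_rates(size=4, beta=0.5, gamma=1.0, eps=0.2)
pi = stationary_distribution(births, deaths)
```

Under detailed balance, the ratio of successive stationary probabilities equals `births[i] / deaths[i]`, so ratios of observed frequencies of adjacent local population sizes carry information about the rate parameters.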

9.
Satten GA  Sternberg MR 《Biometrics》1999,55(2):507-513
In a semi-Markov model, the hazard of making a transition between stages depends on the time spent in the current stage but is independent of time spent in other stages. If the initiation time (time of entry into the network) is not known for some persons and if transition time data are interval censored (i.e., if transition times are not known exactly but are known only to have occurred in some interval), then the length of time these persons spent in any stage is not known. We show how a semi-Markov model can still be fit to interval-censored data with missing initiation times. For the special case of models in which all persons enter the network at the same initial stage and proceed through the same succession of stages to a unique absorbing stage, we present discrete-time nonparametric maximum likelihood estimators of the waiting-time distributions for this type of data.

10.
We analysed the roles and distribution of metal ions in enzymatic catalysis using available public databases and our new resource Metal-MACiE. In Metal-MACiE, a database of metal-based reaction mechanisms, 116 entries covering 21% of the metal-dependent enzymes and 70% of the types of enzyme-catalysed chemical transformations are annotated according to metal function. We used Metal-MACiE to assess the functions performed by metals in biological catalysis and the relative frequencies of different metals in different roles, which can be related to their individual chemical properties and availability in the environment. The overall picture emerging from the overview of Metal-MACiE is that redox-inert metal ions are used in enzymes to stabilize negative charges and to activate substrates by virtue of their Lewis acid properties, whereas redox-active metal ions can be used both as Lewis acids and as redox centres. Magnesium and zinc are by far the most common ions of the first type, while calcium is relatively less used. Magnesium, however, is most often bound to phosphate groups of substrates and interacts with the enzyme only transiently, whereas the other metals are stably bound to the enzyme. The most common metal of the second type is iron, which is prevalent in the catalysis of redox reactions, followed by manganese, cobalt, molybdenum, copper and nickel. The control of the reactivity of redox-active metal ions may involve their association with organic cofactors to form stable units. This occurs sometimes for iron and nickel, and quite often for cobalt and molybdenum.

11.
Summary. Frailty models are widely used to model clustered survival data. Classical ways to fit frailty models are likelihood-based. We propose an alternative approach in which the original problem of "fitting a frailty model" is reformulated into the problem of "fitting a linear mixed model" using model transformation. We show that the transformation idea also works for multivariate proportional odds models and for multivariate additive risks models. It therefore bridges segregated methodologies as it provides a general way to fit conditional models for multivariate survival data by using mixed models methodology. To study the specific features of the proposed method we focus on frailty models. Based on a simulation study, we show that the proposed method provides a good and simple alternative for fitting frailty models for data sets with a sufficiently large number of clusters and moderate to large sample sizes within covariate-level subgroups in the clusters. The proposed method is applied to data from 27 randomized trials in advanced colorectal cancer, which are available through the Meta-Analysis Group in Cancer.

12.
13.
Chemical and biological data from more than 5,000 lakes in 20 European countries have been compiled into databases within the EU project REBECCA. The project’s purpose was to provide scientific support for implementation of the EU Water Framework Directive (WFD). The databases contain the biological elements phytoplankton, macrophytes, macroinvertebrates and fish, together with relevant chemistry data and station information. The common database strategy has enabled project partners to perform analyses of chemical–biological relationships and to describe reference conditions for large geographic regions in Europe. This strategy has obvious benefits compared with single-country analyses: results will be more representative for larger European regions, and the statistical power and precision will be larger. The high number of samples within some regions has also enabled analysis of type-specific relationships for several lake types. These results are essential for the intercalibration of ecological assessment systems for lakes, as required by the WFD. However, the common database approach has also involved costs and limitations. The data process has been resource-demanding, and the requirements for a flexible database structure have made it less user-friendly for project partners. Moreover, there are considerable heterogeneities among datasets from different countries regarding sampling methods and taxonomic precision; this may reduce comparability of the data and increase the uncertainty of the results. This article gives an overview of the contents and functions of the REBECCA Lakes databases, and of our experiences from constructing and using the databases. We conclude with recommendations for compilation of environmental data for future international projects.

14.
15.
《Mycoscience》2014,55(6):456-461
Two new species, Melanoleuca leucopoda and M. porphyropoda, are described based on collections made from Shenyang City, Liaoning Province, China. Melanoleuca leucopoda is mainly characterized by its whitish stipe with fibrils and oblong spores with elongated warts. Melanoleuca porphyropoda differs from all other Melanoleuca species in lacking cystidia and in having decurrent gills and a purplish stipe. The sequences of internal transcribed spacer regions (ITS1-5.8S-ITS2) of Melanoleuca species were analyzed and the results indicated that two new species clustered into two clades and differed from the other species of the genus. The combination of morphological and molecular data confirmed that the two fungi are new species. The morphological similarity of the new species to other species of Melanoleuca and the systematic position of the two species based on molecular data are also discussed.

16.
Xia D  Ghali F  Gaskell SJ  O'Cualain R  Sims PF  Jones AR 《Proteomics》2012,12(12):1912-1916
The development of ion mobility (IM) MS instruments has the capability to provide an added dimension to peptide analysis pipelines in proteomics, but, as yet, there are few software tools available for analysing such data. IM can be used to provide additional separation of parent ions or product ions following fragmentation. In this work, we have created a set of software tools that are capable of converting three dimensional IM data generated from analysis of fragment ions into a variety of formats used in proteomics. We demonstrate that IM can be used to calculate the charge state of a fragment ion, demonstrating the potential to improve peptide identification by excluding non-informative ions from a database search. We also provide preliminary evidence of structural differences between b and y ions for certain peptide sequences but not others. All software tools and data sets are made available in the public domain at http://code.google.com/p/ion-mobility-ms-tools/.

17.
The use of diploid sequence markers is still challenging despite the good quality of the information they provide. There is a problem common to all sequencing approaches [traditional cloning and sequencing of PCR amplicons as well as next-generation sequencing (NGS)]: when no variation is found within the sequences from a given individual, homozygosity can never be asserted with certainty. As a consequence, sequence data from diploid markers are mostly analysed at the population (not the individual) level, particularly in animal studies. This study aims to contribute to solving this problem. Using Bayes' theorem and the binomial law, useful results are derived, among which: (i) the number of sequence reads per individual (or sequencing depth) which is required to ensure, at a given probability threshold, that some heterozygotes are not considered erroneously as homozygotes, as a function of the observed heterozygosity (H(o)) of the locus in the population; (ii) a way of estimating H(o) from low coverage NGS data; (iii) a way of testing the null hypothesis that a genetic marker corresponds to a single and diploid locus, in the absence of data from controlled crosses; (iv) strategies for characterizing sequence genotypes in populations minimizing the average number of sequence reads per individual; (v) a rationale to decide which are the variations that one needs to consider along the sequence, as a function of the sequencing depth affordable, the level of polymorphism desired and the risk of sequencing error. For traditional sequencing technology, optimal strategies appear surprisingly different from the usual empirical ones. The average number of sequence reads required to obtain 99% of fully determined genotypes never exceeds six, this value corresponding to the worst situation when H(o) equals 0.6. This threshold value of H(o) is strikingly stable when the tolerated proportion of nonfully resolved genotypes varies in a reasonable range. These results do not rely on the Hardy-Weinberg equilibrium assumption or on diallelism of nucleotidic sites.
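The kind of Bayes-plus-binomial reasoning described in this abstract can be sketched in Python as follows. This is a simplified illustration and not the authors' derivation: it assumes that each read from a heterozygote shows either allele with probability 1/2, that there is no sequencing error, and that the prior probability of heterozygosity equals the observed heterozygosity. The function names and the threshold parameter `alpha` are hypothetical.

```python
def posterior_het_given_identical(n_reads, h_obs):
    """P(individual is heterozygous | all n_reads reads show the same allele)."""
    # A heterozygote yields n identical reads with probability 2 * (1/2)**n.
    p_identical_if_het = 0.5 ** (n_reads - 1)
    numerator = h_obs * p_identical_if_het
    # A homozygote always yields identical reads (no sequencing error assumed).
    return numerator / (numerator + (1.0 - h_obs))


def min_reads(h_obs, alpha=0.01, max_reads=1000):
    """Smallest read count for which the posterior above drops to alpha or below."""
    if not 0.0 <= h_obs < 1.0:
        raise ValueError("h_obs must satisfy 0 <= h_obs < 1")
    for n_reads in range(1, max_reads + 1):
        if posterior_het_given_identical(n_reads, h_obs) <= alpha:
            return n_reads
    raise ValueError("threshold not reached within max_reads")


depth_needed = min_reads(h_obs=0.5, alpha=0.01)
```

This sketch answers a narrower question than the abstract's results (i) to (v): it gives a fixed per-individual depth under the stated assumptions, whereas the abstract reports average read numbers under optimized strategies.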

18.
19.
Population multiple components is a statistical tool useful for the analysis of time-dependent hybrid data. With a small number of parameters, it is possible to model and to predict the periodic behavior of a population. In this article, we propose two methods to compare rhythmometric parameters obtained by multiple component analysis among populations. The first is a parametric method based on the usual statistical techniques for comparison of mean vectors in multivariate normal populations. The method, through MANOVA analysis, allows comparison of the MESOR and amplitude-acrophase pair of each component among two or more populations. The second is a nonparametric method, based on bootstrap techniques, to compare parameters from two populations. This test allows one to compare the MESOR, the amplitude, and the acrophase of each fitted component, as well as the global amplitude, orthophase, and bathyphase estimated when all fitted components are harmonics of a fundamental period. The idea is to calculate a confidence interval for the difference of the parameters of interest. If this interval does not contain zero, it can be concluded that the parameters from the two models are different with high probability. An estimate of the p-value for the corresponding test can also be calculated. Both methods are illustrated with an example, based on clinical data. The nonparametric test can also be applied to paired data, a special situation of great interest in practice. By the use of similar bootstrap techniques, we illustrate how to construct confidence intervals for any rhythmometric parameter estimated from population multiple components models, including the orthophase, bathyphase, and global amplitude. These tests for comparison of parameters among populations are a needed tool when modeling the nonsinusoidal rhythmic behavior of hybrid data by population multiple component analysis.
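The bootstrap idea described in this abstract (build a confidence interval for the difference of a parameter between two populations and check whether it contains zero) can be sketched in Python. This is a generic percentile bootstrap on per-individual parameter estimates, not the authors' procedure; the function name and the example MESOR values are hypothetical.

```python
import random
import statistics


def bootstrap_diff_ci(sample_a, sample_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(sample_a) - mean(sample_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample individuals with replacement within each population.
        resampled_a = rng.choices(sample_a, k=len(sample_a))
        resampled_b = rng.choices(sample_b, k=len(sample_b))
        diffs.append(statistics.mean(resampled_a) - statistics.mean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-individual MESOR estimates for two populations.
mesor_a = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
mesor_b = [15.0, 14.5, 15.5, 16.0, 14.8, 15.2]
lower, upper = bootstrap_diff_ci(mesor_a, mesor_b)
differs = not (lower <= 0.0 <= upper)
```

The same resampling loop could be applied to amplitude or acrophase estimates, although circular quantities such as acrophase would need a difference measure that respects the period.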

20.