Similar Documents
20 similar documents found (search time: 15 ms)
1.
Malin is a software package for the analysis of eukaryotic gene structure evolution. It provides a graphical user interface for various tasks commonly used to infer the evolution of exon-intron structure in protein-coding orthologs. Implemented tasks include the identification of conserved homologous intron sites in protein alignments, as well as the estimation of ancestral intron content, lineage-specific intron losses and gains. Estimates are computed either with parsimony, or with a probabilistic model that incorporates rate variation across lineages and intron sites. Availability: Malin is available as a stand-alone Java application, as well as an application bundle for Mac OS X, at the website http://www.iro.umontreal.ca/~csuros/introns/malin/. The software is distributed under a BSD-style license.

2.
3.
Large sample theory of semiparametric models based on maximum likelihood estimation (MLE) with a shape constraint on the nonparametric component is well studied. Relatively less attention has been paid to the computational aspect of semiparametric MLE. Computing the semiparametric MLE with existing approaches such as the expectation‐maximization (EM) algorithm can be computationally prohibitive when the missing rate is high. In this paper, we propose a computational framework for semiparametric MLE based on an inexact block coordinate ascent (BCA) algorithm. We show theoretically that the proposed algorithm converges. This computational framework can be applied to a wide range of data with different structures, such as panel count data, interval‐censored data, and degradation data, among others. Simulation studies demonstrate favorable performance compared with existing algorithms in terms of accuracy and speed. Two data sets are used to illustrate the proposed computational method. We further implement the proposed computational method in the R package BCA1SG, available on CRAN.
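
As a generic illustration of block coordinate ascent (not the BCA1SG implementation, and using exact rather than inexact block updates), the Python sketch below maximizes a toy concave function by alternately optimizing one block of variables while the other is held fixed. The objective f(x, y) = -x² - y² - xy + 3x + 3y and the function name are hypothetical; its unique maximum is at x = y = 1.

```python
def block_coordinate_ascent(tol=1e-10, max_iter=1000):
    """Maximize the toy concave objective f(x, y) = -x**2 - y**2 - x*y + 3*x + 3*y
    by alternating exact maximization over x (y fixed) and over y (x fixed)."""
    x, y = 0.0, 0.0
    for _ in range(max_iter):
        x_new = (3.0 - y) / 2.0      # solves df/dx = -2x - y + 3 = 0 for fixed y
        y_new = (3.0 - x_new) / 2.0  # solves df/dy = -2y - x + 3 = 0 for the updated x
        converged = abs(x_new - x) < tol and abs(y_new - y) < tol
        x, y = x_new, y_new
        if converged:
            break
    return x, y
```

Each block update can only increase f, which is the basic reason such schemes converge for concave objectives; an inexact variant would replace the closed-form updates with a few ascent steps per block.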

4.
5.
This paper considers questions of standard error and questions of bias in the maximum likelihood estimation of parameters associated with an HLA-linked disease. It is shown that a considerable reduction in standard error is possible using data on population prevalence and parental disease status, if available. Comparison is made with standard errors arising in the shared haplotypes method. The biases considered relate to misspecification of the ascertainment scheme, to incorrect assumptions about parameter values, to the possibility that affected parents have lower fitness than unaffected parents, and to the possibility of within-family correlation of penetrance values due to the effects of a common environment.

6.
7.
8.
Selected distributional properties of the maximum likelihood estimators of three familial correlations (parental, parent-offspring, filial), and of their z-transformations, were investigated numerically for nuclear families with variable sibship size. The investigation was based on six different sets of the three correlations and four different sample sizes, defining 24 sampling conditions, each replicated 1,000 times. The distributional properties of the correlation estimator were found to depend on the magnitude of the correlations even in large samples, although approximate normality is achieved locally. Fisher's z-transformation, used here only in its interclass form, reduces skewness, stabilizes the variance, and brings the distribution close to normality already in small samples, except for the filial correlation (where it may be deemed inappropriate) in smaller samples. For both the correlation estimator and its z-transformation, the (estimated) relative efficiency was high (better than 90% in most sampling conditions), suggesting that the estimated minimum variance bound is a satisfactory estimator of the sampling variance. It is concluded that maximum likelihood estimation of familial correlations under variable sibship size is feasible and, when prudently applied, especially in the form of their z-transformations, provides an appropriate method for the analysis of family studies.
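
For reference, Fisher's z-transformation is z = artanh(r). The Python sketch below applies it to an ordinary Pearson correlation from n independent pairs, using the textbook standard error 1/√(n − 3) on the z scale; that standard error is an assumption for this simple case and is not the variable-sibship estimator studied in the paper. The function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient: z = artanh(r)."""
    return math.atanh(r)

def z_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation from n
    independent pairs: build a symmetric interval on the z scale with
    standard error 1/sqrt(n - 3), then map it back with tanh."""
    z = fisher_z(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

Because the interval is symmetric on the z scale, it becomes asymmetric on the correlation scale (for r = 0.8 and n = 103 it extends further below r than above it), which reflects the skewness of the raw estimator that the transformation is meant to remove.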

9.
Sequencing of eukaryotic genomes allows one to address major evolutionary problems, such as the evolution of gene structure. We compared the intron positions in 684 orthologous gene sets from 8 complete genomes of animals, plants, fungi, and protists and constructed parsimonious scenarios of evolution of the exon-intron structure for the respective genes. Approximately one-third of the introns in the malaria parasite Plasmodium falciparum are shared with at least one crown group eukaryote; this number indicates that these introns have been conserved through >1.5 billion years of evolution that separate Plasmodium from the crown group. Paradoxically, humans share many more introns with the plant Arabidopsis thaliana than with the fly or nematode. The inferred evolutionary scenario holds that the common ancestor of Plasmodium and the crown group and, especially, the common ancestor of animals, plants, and fungi had numerous introns. Most of these ancestral introns, which are retained in the genomes of vertebrates and plants, have been lost in fungi, nematodes, arthropods, and probably Plasmodium. In addition, numerous introns have been inserted into vertebrate and plant genes, whereas, in other lineages, intron gain was much less prominent.
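
To make "parsimonious scenario" concrete, the Python sketch below runs Fitch's small-parsimony algorithm on a single intron site coded as present (1) or absent (0) at the leaves of a rooted binary tree, returning the minimum number of gain/loss events. This is a swapped-in textbook method that weights gains and losses equally; the scenarios in the cited study may use a different weighting, and the tree encoding (nested 2-tuples) is an assumption.

```python
def fitch(tree):
    """Fitch small parsimony for one binary character.
    Leaves are ints (0 = intron absent, 1 = intron present); internal nodes are
    2-tuples of subtrees. Returns (candidate state set, minimum number of changes)."""
    if isinstance(tree, int):
        return {tree}, 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    common = left_set & right_set
    if common:
        # Children can agree: no extra gain or loss needed at this node
        return common, left_cost + right_cost
    # Children disagree: one gain or loss must occur on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1
```

For example, `fitch(((1, 1), (0, 0)))` needs one event, whereas `fitch(((1, 0), (1, 0)))` needs two, because the intron's presence is scattered across both clades.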

10.
We present a new likelihood method for detecting constrained evolution at synonymous sites and other forms of nonneutral evolution in putative pseudogenes. The model is applicable whenever the DNA sequence is available from a protein-coding functional gene, a pseudogene derived from the protein-coding gene, and an orthologous functional copy of the gene. Two nested likelihood ratio tests are developed to test the hypotheses that (1) the putative pseudogene has equal rates of silent and replacement substitutions; and (2) the rate of synonymous substitution in the functional gene equals the rate of substitution in the pseudogene. The method is applied to a data set containing 74 human processed-pseudogene loci, 25 mouse processed-pseudogene loci, and 22 rat processed-pseudogene loci. Using the informatics resources of the Human Genome Project, we localized 67 of the human-pseudogene pairs in the genome and estimated the GC content of a large surrounding genomic region for each. We find that, for pseudogenes located in regions with GC content similar to that of their functional paralogs, the assumption of equal rates of silent and replacement site evolution in the pseudogene is upheld; in these cases, the rate of silent site evolution in the functional genes is approximately 70% of the rate of evolution in the pseudogene. On the other hand, for pseudogenes located in genomic regions with much lower GC content than their functional gene, we see a sharp increase in the rate of silent site substitutions, leading to a high rejection rate for the pseudogene equality likelihood ratio test.
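
A likelihood ratio test between nested models compares 2(ℓ_alt − ℓ_null) with a chi-square distribution whose degrees of freedom equal the number of constraints imposed by the null. The Python sketch below assumes a single equality constraint (1 degree of freedom), as both hypotheses listed above appear to impose; the log-likelihood values would come from fitting the two models, which is not shown, and the function name is hypothetical.

```python
import math

def lrt_chi2_df1(loglik_null, loglik_alt):
    """Likelihood ratio statistic 2*(l_alt - l_null) and its asymptotic p-value,
    assuming the null imposes exactly one constraint (chi-square with 1 df)."""
    stat = 2.0 * (loglik_alt - loglik_null)
    # For 1 df, P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0)) if stat > 0 else 1.0
    return stat, p_value
```

A log-likelihood gain of about 1.92 units gives a statistic near 3.84 and a p-value of about 0.05, the usual rejection threshold.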

11.

Background  

The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, however, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility.

12.
A database called eukaryotic intron database (EID) was developed based on the data from GenBank. Studies on the statistical characteristics of EID show that there were 103,848 genes, 478,484 introns, and 582,332 exons, with an average of 4.61 introns and 5.61 exons per gene. Introns of 40–120 nt in length were abundant in the database. Results of the statistical analysis on the data from nine model species showed that in eukaryotes, higher species do not necessarily have more introns or exons in a gene than lower species. Furthermore, characteristics of EID, such as intron phase, distribution of different splice sites, and the relationship between genome size and intron proportion or intron density, have been studied.

13.
A database called eukaryotic intron database (EID) was developed based on the data from GenBank. Studies on the statistical characteristics of EID show that there were 103,848 genes, 478,484 introns, and 582,332 exons, with an average of 4.61 introns and 5.61 exons per gene. Introns of 40–120 nt in length were abundant in the database. Results of the statistical analysis on the data from nine model species showed that in eukaryotes, higher species do not necessarily have more introns or exons in a gene than lower species. Furthermore, characteristics of EID, such as intron phase, distribution of different splice sites, and the relationship between genome size and intron proportion or intron density, have been studied. (Translated from Acta Scientiarum Naturalium Universitatis Sunyatseni [Journal of Sun Yat-sen University], 2005, 44(6): 79–82.)
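
The per-gene averages follow directly from the reported totals, and the totals also satisfy exons = introns + genes, consistent with each gene having one more exon than it has introns. The short Python check below recomputes both; it uses only the numbers given in the abstract.

```python
# Totals reported for EID in the abstract
genes, introns, exons = 103_848, 478_484, 582_332

introns_per_gene = introns / genes
exons_per_gene = exons / genes

# Consistent with every gene contributing exactly one more exon than introns
assert exons == introns + genes

print(f"{introns_per_gene:.2f} introns/gene, {exons_per_gene:.2f} exons/gene")
# prints "4.61 introns/gene, 5.61 exons/gene"
```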

14.
The accelerated degradation test is commonly used to predict the stability of a biological standard during long-term storage at low temperatures. The analysis is complicated by the fact that the standard generally defines its own unit of activity, so only relative rates of degradation at different temperatures can be observed. A series of Monte Carlo simulation studies is described, carried out to investigate the accuracy and precision of estimates based on the statistical method; the results are useful in assessing the extent to which the size and design of the accelerated degradation test influence the precision of the estimated low-temperature degradation rate.

15.
16.
Researchers in observational survival analysis are interested not only in estimating survival curves nonparametrically but also in drawing statistical inference about the parameter. We consider right-censored failure time data, where we observe n independent and identically distributed observations of a vector random variable consisting of baseline covariates, a binary treatment at baseline, a survival time subject to right censoring, and the censoring indicator. We allow the baseline covariates to affect the treatment and censoring, so an estimator that ignores covariate information would be inconsistent. The goal is to use these data to estimate the counterfactual average survival curve of the population if all subjects were assigned the same treatment at baseline. Existing observational survival analysis methods do not yield monotone survival curve estimators; this is undesirable, and such methods may lose efficiency by not using prior knowledge of the estimand to constrain the shape of the estimator. In this paper, we present a one-step Targeted Maximum Likelihood Estimator (TMLE) for estimating the counterfactual average survival curve. We show that this new TMLE can be executed via recursion in small local updates. We demonstrate the finite-sample performance of this one-step TMLE in simulations and in an application to monoclonal gammopathy data.

17.
We consider two-stage sampling designs, including so-called nested case-control studies, in which one takes a random sample from a target population and completes measurements on each subject in the first stage. The second stage involves drawing a subsample from the original sample and collecting additional data on the subsample. This data structure can be viewed as a missing-data structure on the full-data structure collected in the second stage of the study. Methods for analyzing two-stage designs include parametric maximum likelihood estimation and estimating equation methodology. We propose an inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE) for two-stage sampling designs and present simulation studies featuring this estimator.
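
The inverse-probability-weighting idea underlying IPCW can be seen in its simplest form: each second-stage subject is weighted by 1/π, where π is its known probability of being subsampled, so the subsample stands in for the full first-stage sample. The Python sketch below computes a weighted (Hájek-type) mean of an outcome measured only in the second stage; it illustrates the weighting only, not the targeted maximum likelihood step, and the function and argument names are hypothetical.

```python
def ipw_mean(y, selected, pi):
    """Inverse-probability-weighted (Hajek) mean of y over the first-stage sample.
    y: outcomes (read only where selected; may be None elsewhere),
    selected: 0/1 second-stage inclusion flags,
    pi: known second-stage sampling probabilities for every first-stage subject."""
    num = sum(yi / p for yi, s, p in zip(y, selected, pi) if s)
    den = sum(1.0 / p for s, p in zip(selected, pi) if s)
    return num / den
```

For instance, `ipw_mean([1.0, None, 3.0, 4.0], [1, 0, 1, 1], [0.5, 0.5, 1.0, 1.0])` returns 2.25: the first subject, sampled with probability 0.5, also represents the unsampled second subject.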

18.
Recent experimental imaging techniques are able to tag and count molecular populations in a living cell. From these data, mathematical models are inferred and calibrated. If small populations are present, discrete-state stochastic models are widely used to describe the discreteness and randomness of molecular interactions. Based on time-series data of the molecular populations, the corresponding stochastic reaction rate constants can be estimated. This procedure is computationally very challenging, since the underlying stochastic process has to be solved for different parameters in order to obtain optimal estimates. Here, we focus on the maximum likelihood method and estimate rate constants, initial populations, and parameters representing measurement errors.
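
The abstract targets time-series (snapshot) data, where the likelihood requires solving the underlying stochastic process. As a much simpler illustration of maximum likelihood for a stochastic rate constant, the Python sketch below assumes the complete reaction path is observed for a single degradation reaction X → ∅ with propensity c·x. In that special case the log-likelihood is r·log c − c∫x(t)dt plus terms free of c, so the MLE is ĉ = r / ∫x(t)dt, where r is the number of reaction events. The reaction, function names, and complete-path assumption are illustrative and are not taken from the paper.

```python
import random

def simulate_decay(x0, c, seed=0):
    """Gillespie simulation of X -> 0 with propensity c * x.
    Returns event times and the molecule count after each event."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [0.0], [x0]
    while x > 0:
        t += rng.expovariate(c * x)  # waiting time ~ Exponential(total propensity)
        x -= 1
        times.append(t)
        counts.append(x)
    return times, counts

def mle_decay_rate(times, counts):
    """Closed-form MLE of c from a fully observed path:
    number of events divided by the time integral of x(t)."""
    events = counts[0] - counts[-1]
    # x(t) is piecewise constant: counts[i] holds on [times[i], times[i + 1])
    integral = sum(counts[i] * (times[i + 1] - times[i]) for i in range(len(times) - 1))
    return events / integral
```

With snapshot data the integral of x(t) is unobserved, which is why the general problem requires solving the stochastic process for many candidate parameters.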

19.
A simple and efficient algorithm is presented for finding a maximum likelihood pedigree using microsatellite (STR) genotype information on a complete sample of related individuals. The computational complexity of the algorithm is at worst O(n³·2ⁿ), where n is the number of individuals. Thus it is possible to exhaustively search the space of all pedigrees of up to thirty individuals for one that maximizes the likelihood. A priori age and sex information can be used if available, but is not essential. The algorithm is applied in a simulation study, and to some real data on humans.
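
To give a sense of scale for the worst-case bound, the Python snippet below evaluates n³·2ⁿ for a few sample sizes, ignoring constant factors (an assumption, since the abstract states only the asymptotic order). At n = 30 the bound is about 2.9 × 10¹³ elementary steps, which indicates why exhaustive search becomes impractical not far beyond thirty individuals.

```python
def worst_case_steps(n):
    """Worst-case bound n**3 * 2**n from the abstract, ignoring constant factors."""
    return n ** 3 * 2 ** n

for n in (10, 20, 30):
    print(f"n = {n:2d}: about {worst_case_steps(n):.2e} steps")
```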

20.
A general maximum likelihood discriminant (total citations: 3; self-citations: 0; citations by others: 3)
N E Day, D F Kerridge. Biometrics, 1967, 23(2): 313–323
