Similar Articles
20 similar articles found (search time: 31 ms)
1.
We present a method for automatic full-precision alignment of the images in a tomographic tilt series. Full-precision automatic alignment of cryo-electron microscopy images has so far remained difficult because of the limited electron dose and low image contrast. These lead to a poor signal-to-noise ratio (SNR) in the images, which causes automatic feature trackers to make errors even when high-contrast gold particles are used as fiducial features. To enable fully automatic alignment for full-precision reconstructions, we frame the problem probabilistically as finding the most likely particle tracks given a set of noisy images, using contextual information to make the solution more robust to the noise in each image. To solve this maximum likelihood problem, we use Markov random fields (MRF) to establish the correspondence of features across images and robust optimization to estimate the projection model. The resulting algorithm, Robust Alignment and Projection Estimation for Tomographic Reconstruction (RAPTOR), has not needed any manual intervention on the difficult datasets we have tried, and has provided sub-pixel alignment as good as manual alignment by an expert user. We are able to automatically map complete and partial marker trajectories and thus obtain highly accurate image alignment. Our method has been applied to challenging low-SNR cryo-electron tomographic datasets from intact bacterial cells, as well as to several plastic-section and X-ray datasets.
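The "robust optimization" step can be illustrated with a much simpler stand-in. The sketch below is not RAPTOR: it assumes marker correspondences between two tilt images are already known (RAPTOR obtains them with an MRF) and that the projection model is a pure 2D translation. It uses the per-axis median of the marker displacements, which tolerates a minority of wrongly matched markers. The function names `robust_shift` and `inliers` and the `tol` threshold are hypothetical.

```python
from statistics import median


def robust_shift(ref_pts, mov_pts):
    """Estimate the (dx, dy) translation that maps ref_pts onto mov_pts.

    Assumes ref_pts[i] and mov_pts[i] are already paired. The per-axis median
    of the displacements is unaffected by a minority of mismatched pairs,
    whereas a mean would be pulled toward them.
    """
    if len(ref_pts) != len(mov_pts) or not ref_pts:
        raise ValueError("need equally long, non-empty point lists")
    dxs = [m[0] - r[0] for r, m in zip(ref_pts, mov_pts)]
    dys = [m[1] - r[1] for r, m in zip(ref_pts, mov_pts)]
    return median(dxs), median(dys)


def inliers(ref_pts, mov_pts, shift, tol=1.0):
    """Indices of pairs whose residual after applying the shift is <= tol pixels."""
    dx, dy = shift
    keep = []
    for i, (r, m) in enumerate(zip(ref_pts, mov_pts)):
        residual = ((m[0] - r[0] - dx) ** 2 + (m[1] - r[1] - dy) ** 2) ** 0.5
        if residual <= tol:
            keep.append(i)
    return keep
```

For example, with five markers shifted by (3.5, -2.0) and one of them mismatched, the median still returns (3.5, -2.0) and `inliers` flags the mismatched pair for removal.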

2.
In solution NMR spectroscopy, the residual dipolar coupling (RDC) is invaluable for improving both the precision and the accuracy of NMR structures during structural refinement. RDCs also offer the potential to determine protein structures de novo. These procedures are effective only when an accurate estimate of the alignment tensor is already available. Here we present a top-down approach to RDC data analysis, starting from the secondary structure elements and finishing at the residue level, to obtain a better estimate of the alignment tensor. Using only the RDCs from N–H bonds of residues in α-helices and CA–CO bonds in β-strands, we determine the offset and the approximate amplitude of the RDC modulation curve for each secondary structure element, which are then used as targets for global minimization. The alignment order parameters and the orientation of the major principal axis of each individual helix or strand with respect to the alignment frame can be determined in each of the eight octants of a sphere. A subsequent minimization against the RDCs of all residues within the helix or strand segment can then be carried out with the alignment order parameters fixed, to improve the accuracy of the orientation. For the helical protein Bax, the three components of the alignment order, A_xx, A_yy and A_zz, can be determined with this method to within 2.3% on average of the values calculated from the available atomic coordinates. For the β-sheet protein ubiquitin, they agree to within 8.5% on average. The larger discrepancy in the β-strand parameters stems from both the diversity of β-sheet structures and the lower precision of CA–CO RDCs. This top-down approach is a robust method for alignment tensor estimation and also holds promise for deriving a protein's topological fold from limited sets of RDCs.
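For contrast with the top-down approach, the sketch below shows the conventional bottom-up estimate that it aims to improve on: a linear least-squares fit of the alignment tensor when bond orientations are already known. It assumes the common form D = D_max · vᵀSv for a unit bond vector v, where S is the symmetric, traceless alignment (Saupe) tensor; prefactor and sign conventions vary between papers, and the `d_max` value in the usage test is arbitrary. Because S is traceless, it has five free components, so at least five non-degenerate bond orientations are needed. Diagonalizing the fitted S would give principal values comparable to A_xx, A_yy and A_zz (not shown).

```python
def design_row(v):
    """Coefficients of (Syy, Szz, Sxy, Sxz, Syz) in v^T S v, using Sxx = -Syy - Szz."""
    x, y, z = v
    return [y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z]


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: need >= 5 non-degenerate bond vectors")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_alignment_tensor(vectors, rdcs, d_max):
    """Least-squares fit of the traceless tensor S via the normal equations."""
    rows = [design_row(v) for v in vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * d / d_max for r, d in zip(rows, rdcs)) for i in range(5)]
    syy, szz, sxy, sxz, syz = solve_linear(ata, atb)
    sxx = -syy - szz  # tracelessness
    return [[sxx, sxy, sxz], [sxy, syy, syz], [sxz, syz, szz]]


def predict_rdc(s, v, d_max):
    """Back-calculate the RDC of unit vector v from tensor s."""
    return d_max * sum(v[i] * s[i][j] * v[j] for i in range(3) for j in range(3))
```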

3.
The evolutionary history of living African amphibians remains poorly understood. This study estimates the phylogeny within the frog genera Arthroleptis and Cardioglossa using approximately 2400 bases of mtDNA sequence data (12S, tRNA-Valine, and 16S genes) from half of the described species. Analyses are conducted using parsimony, maximum likelihood, and Bayesian methods. The effect of alignment on phylogeny estimation is explored by separately analyzing alignments generated with different gap costs and a consensus alignment. The consensus alignment results in species paraphyly, low nodal support, and incongruence with the results from the other alignments, which produced largely similar results. Most nodes in the phylogeny are highly supported, yet several topologies are inconsistent with previous hypotheses. The monophyly of Cardioglossa and of the miniature species previously assigned to Schoutedenella was further examined using Templeton and Shimodaira–Hasegawa tests. Cardioglossa monophyly is rejected, and C. aureoli is transferred to Arthroleptis. These tests do not reject Schoutedenella monophyly, but this hypothesis receives no support from nonparametric bootstrapping or Bayesian posterior probabilities. This phylogeny provides a framework for reconstructing historical biogeography and analyzing the evolution of body size and life history. Direct development and miniaturization appear at the base of the Arthroleptis phylogeny, concomitant with a range expansion from Central Africa to most of sub-Saharan Africa.
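Nonparametric bootstrap support, mentioned above, starts from pseudo-replicate alignments built by resampling alignment columns with replacement; a tree is then inferred from each replicate and the fraction of trees containing a clade is its support. The sketch below covers only the resampling step (tree inference is omitted). The function name and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate by resampling alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (all the same length).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}
```

Usage: `rng = random.Random(1)` followed by `[bootstrap_alignment(aln, rng) for _ in range(1000)]` yields 1000 replicates; seeding the generator keeps the analysis reproducible.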

4.
Previous debate about statistical variation in inferred phylogenies has focused on procedures for estimating evolutionary relationships from aligned sequences. Morrison and Ellis¹ have recently drawn attention to additional variation attributable to the alignment procedure used and have suggested that it may be highly significant. This raises doubts about our ability to infer reliable phylogenies. Although the concerns may be less serious than their analyses at first imply, Morrison and Ellis¹ have performed a useful service in reminding us that accurate sequence alignment is a crucial part of molecular phylogenetics. BioEssays 20:287–290, 1998. © 1998 John Wiley & Sons, Inc.

5.
An earlier analysis of the trnL intron in the Colletieae (Rhamnaceae) showed polyphyly of the genus Discaria. This polyphyly is supported only by an AT-rich region of ambiguous alignment within the trnL intron, so it depends on extracting the information in that region correctly. Ambiguously aligned regions are commonly excluded from phylogenetic analysis. In the present study we asked whether random or noisy data could generate a pattern like the one found in the ambiguously aligned AT-rich region. When subjected to a sensitivity analysis using direct optimization, the original pattern was resistant to changes in alignment parameter costs. Artificially generated random or noisy data gave well-resolved trees, but these were extremely sensitive to changes in parameter costs. However, information from additional data, such as conserved regions, restricts the influence of random data. We suggest that the information in ambiguously aligned regions need not be dismissed, provided that an appropriate method that finds all possible optimal alignments is used to extract it. In addition to commonly used support measures, some information on robustness to changes in alignment parameter costs is needed to draw the most reliable conclusions.
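The idea of a sensitivity analysis over alignment parameter costs can be shown on a single pair of sequences. The sketch below is a simplified stand-in, not direct optimization (which aligns sequences on a tree): it re-runs a Needleman–Wunsch global alignment with a linear gap cost under a grid of gap costs and reports which optimal alignment each cost produces. Integer scores are used so the traceback comparisons are exact; `nw_align`, `gap_sensitivity` and the scoring defaults are hypothetical choices.

```python
def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap cost.

    Returns (score, aligned_a, aligned_b). On ties the traceback prefers a
    diagonal step, then a gap in b, then a gap in a.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def gap_sensitivity(a, b, gap_costs):
    """Map each (positive) gap cost to the optimal alignment found under it."""
    return {g: nw_align(a, b, gap=-g)[1:] for g in gap_costs}
```

For "AC" against "CA", a gap cost of 1 favours a gapped alignment ("-AC" / "CA-") while a gap cost of 2 favours two mismatches ("AC" / "CA"), so conclusions drawn from that region would not be robust to the gap cost.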

6.

Background  

Current molecular phylogenetic studies of Lepidoptera and most other arthropods are predominantly based on mitochondrial genes and a limited number of nuclear genes. The nuclear genes, however, generally do not provide sufficient information for young radiations. ITS2, which has proven to be an excellent nuclear marker for similarly aged radiations in other organisms such as fungi and plants, is rarely used for phylogeny estimation in arthropods, although universal primers exist. This is partly due to difficulties in aligning ITS2 sequences of more distant taxa. The present study uses ITS2 secondary structure information to elucidate the phylogeny of a species-rich young radiation of arthropods, the butterfly subgenus Agrodiaetus. One aim is to evaluate how efficiently ITS2 resolves the phylogeny of the subgenus compared with COI, the most important mitochondrial marker in arthropods. Furthermore, we assess the use of compensatory base changes in ITS2 for species delimitation and discuss the prospects of ITS2 as a nuclear marker for barcoding studies.
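A compensatory base change (CBC) is usually defined as a change of both nucleotides of a base pair in a helix while pairing is preserved (e.g. G–C to A–U); if only one side changes and pairing is kept (e.g. G–C to G–U), it is a hemi-CBC. The sketch below counts both, assuming the two sequences are aligned to a shared secondary structure given as a hypothetical list of 0-based paired positions, and treating Watson–Crick pairs plus G–U wobbles as valid pairs. Dedicated tools may use slightly different rules.

```python
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair_changes(seq1, seq2, helix):
    """Count (CBCs, hemi-CBCs) between two structure-aligned sequences.

    helix: iterable of (i, j) index pairs that are paired in the common structure.
    DNA input is accepted; T is read as U.
    """
    s1 = seq1.upper().replace("T", "U")
    s2 = seq2.upper().replace("T", "U")
    cbc = hemi = 0
    for i, j in helix:
        p1, p2 = (s1[i], s1[j]), (s2[i], s2[j])
        # skip unchanged pairs and positions where either sequence is unpaired
        if p1 == p2 or p1 not in VALID_PAIRS or p2 not in VALID_PAIRS:
            continue
        if p1[0] != p2[0] and p1[1] != p2[1]:
            cbc += 1   # both sides changed, pairing kept
        else:
            hemi += 1  # one side changed, pairing kept
    return cbc, hemi
```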

7.
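Aboveground biomass estimation models for typical steppe on the Loess Plateau   Total citations: 3 (self-citations: 2; citations by others: 1)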
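To find an effective method for estimating grassland aboveground biomass and to estimate the aboveground biomass of typical steppe on the Loess Plateau accurately, field surveys were conducted at both the individual-plant level and the population level in mid-August 2014, when the aboveground biomass of the typical steppe reached its maximum. Using the composite factor of plant height (H) and cover (C), C×H, as the independent variable, aboveground biomass estimation models were built by regression analysis, and their accuracy was evaluated by leave-one-out cross-validation. The accuracy of the models built at the individual-plant and population levels was further compared using correction coefficients and by comparing estimated with measured total community biomass. The results show that, for typical steppe grassland on the Loess Plateau, linear and power functions fit biomass better at both the individual-plant and the population level. Validation showed that at the individual-plant level, estimated and measured biomass were well correlated for every species, all significantly (P<0.05), with r values all greater than 0.6, total relative error (RS) all below 10%, and mean absolute relative error (RMA, average absolute value of relative error) all below 30%; measured and estimated total biomass were close, and the correction coefficients were all close to 1. At the population level, although the correlations between estimated and measured biomass were significant (P<0.05) for every species, RMA exceeded 30% for most species, RS (total relative error) exceeded 10% for all species, estimated total biomass was always greater than measured total biomass, and the correction coefficients all deviated from 1. This indicates that, for typical steppe on the Loess Plateau, species biomass estimation models built at the individual-plant level are more accurate than those built at the population level.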
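The validation workflow (fit W on X = C×H, predict each observation from a model refitted without it, then summarize errors) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the power model is fitted by least squares on log-log scale without bias correction (the study may have used nonlinear regression), and RS and RMA are defined here relative to the observed values, since the abstract does not give their formulas. All function names are hypothetical.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def fit_linear(xs, ys):
    a, b = ols(xs, ys)
    return lambda x: a + b * x


def fit_power(xs, ys):
    # W = a * X**b fitted as ln W = ln a + b ln X (assumption: no bias correction)
    ln_a, b = ols([math.log(x) for x in xs], [math.log(y) for y in ys])
    a = math.exp(ln_a)
    return lambda x: a * x ** b


def leave_one_out(xs, ys, fit):
    """Predict each observation from a model fitted to all the others."""
    preds = []
    for k in range(len(xs)):
        model = fit(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])
        preds.append(model(xs[k]))
    return preds


def rs_percent(obs, est):
    """Total relative error (%), assumed as (sum est - sum obs) / sum obs."""
    return 100 * (sum(est) - sum(obs)) / sum(obs)


def rma_percent(obs, est):
    """Mean absolute relative error (%), assumed relative to observed values."""
    return 100 * sum(abs(e - o) / o for o, e in zip(obs, est)) / len(obs)
```

Usage: `preds = leave_one_out(ch_values, biomass, fit_power)` followed by `rma_percent(biomass, preds)`, where `ch_values` and `biomass` are hypothetical per-plant C×H and dry-weight lists.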

8.
Given a sequence A and a regular expression R, the approximate regular expression matching problem is to find a sequence matching R whose optimal alignment with A is the highest scoring of all such sequences. This paper develops an algorithm to solve the problem in time O(MN), where M and N are the lengths of A and R. Thus, the time requirement is asymptotically no worse than for the simpler problem of aligning two fixed sequences. Our method is superior to an earlier algorithm by Wagner and Seiferas in several ways. First, it treats real-valued costs, in addition to integer costs, with no loss of asymptotic efficiency. Second, it requires only O(N) space to deliver just the score of the best alignment. Finally, its structure permits implementation techniques that make it extremely fast in practice. We extend the method to accommodate gap penalties, as required for typical applications in molecular biology, and further refine it to search for substrings of A that strongly align with a sequence in R, as required for typical data base searches. We also show how to deliver an optimal alignment between A and R in only O(N + log M) space using O(MN log M) time. Finally, an O(MN(M + N) + N² log N) time algorithm is presented for alignment scoring schemes where the cost of a gap is an arbitrary increasing function of its length.
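The space bound for score-only computation rests on a standard observation that already holds for the simpler baseline problem mentioned above, aligning two fixed sequences: each row of the dynamic-programming table depends only on the previous row, so keeping two rows gives O(N) space with unchanged O(MN) time. The sketch below shows only that fixed-sequence baseline; it does not implement the regular-expression algorithm, which runs the dynamic program over an automaton built from R. The scoring defaults and function name are hypothetical.

```python
def alignment_score_linear_space(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of fixed sequences a and b in O(len(b)) space.

    Only the previous and the current DP rows are kept, so the alignment
    itself cannot be traced back; only its score is returned.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```

Recovering the alignment itself in small space needs an additional divide-and-conquer step, which is where the paper's extra log factors come from.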

9.
Recent algorithms for parametric alignment (Waterman et al., 1992, Proc. Natl Acad. Sci. USA 89, 6090–6093; Gusfield et al., 1992, Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms) find optimal scores for all penalty parameters, for both global and local sequence alignment. This paper reviews those techniques. In the main part of the paper, dynamic programming methods are then used to compute ensemble alignments, finding all alignment scores for all parameters. Both global and local ensemble alignments are studied, and parametric alignment is used to compute near-optimal ensemble alignments.
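One way to see how "all alignment scores for all parameters" can be obtained by dynamic programming is to note that, under a simple scoring scheme (match +1, mismatch −μ, gap column −δ), the score of a global alignment depends only on its number of mismatches x and gap columns g; the number of matches follows from the sequence lengths. The sketch below, an illustration of this idea rather than the paper's algorithm, counts every global alignment by (x, g); the optimal or near-optimal score for any (μ, δ) can then be read off without re-aligning. The number of distinct keys per cell is polynomial, although the counts themselves grow exponentially. Names and the scoring model are assumptions.

```python
from collections import Counter


def alignment_spectrum(a, b):
    """Count all global alignments of a and b by (mismatches, gap columns)."""
    n, m = len(a), len(b)
    dp = [[Counter() for _ in range(m + 1)] for _ in range(n + 1)]
    dp[0][0][(0, 0)] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            cell = dp[i][j]
            if i > 0 and j > 0:  # aligned pair (match or mismatch)
                mm = 0 if a[i - 1] == b[j - 1] else 1
                for (x, g), c in dp[i - 1][j - 1].items():
                    cell[(x + mm, g)] += c
            if i > 0:            # a[i-1] against a gap
                for (x, g), c in dp[i - 1][j].items():
                    cell[(x, g + 1)] += c
            if j > 0:            # b[j-1] against a gap
                for (x, g), c in dp[i][j - 1].items():
                    cell[(x, g + 1)] += c
    return dp[n][m]


def best_score(spectrum, n, m, mismatch_penalty, gap_penalty):
    """Optimal score for one parameter setting, read from the spectrum."""
    best = None
    for x, g in spectrum:
        matches = (n + m - g) // 2 - x  # aligned pairs = (n + m - g) / 2
        s = matches - mismatch_penalty * x - gap_penalty * g
        best = s if best is None else max(best, s)
    return best
```

As a check, the total count for two sequences of length 2 is 13, the number of global alignment paths through a 2×2 grid (a Delannoy number).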

10.

Background  

Jumping alignments have recently been proposed as a strategy to search a given multiple sequence alignment A against a database. Instead of comparing a database sequence S to the multiple alignment or profile as a whole, S is compared and aligned to individual sequences from A. Within this alignment, S can jump between different sequences from A, so different parts of S can be aligned to different sequences from the input multiple alignment. This approach is particularly useful for dealing with recombination events.
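A minimal version of the jumping idea can be written as a global dynamic program with one layer per sequence of A: stepping along a column may continue in the same sequence for free or switch ("jump") to another sequence at a penalty. The sketch below makes several simplifying assumptions that the original method may not share: the rows of A are gapless and of equal length, jumps happen only at aligned (match or mismatch) columns, gaps use a linear cost, and the score values are arbitrary. The function name is hypothetical.

```python
def jumping_alignment_score(s, rows, match=2, mismatch=-1, gap=-2, jump=-3):
    """Global score of s against the columns of `rows`, allowing jumps between rows."""
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows must be non-empty, gapless and of equal length")
    n, ncols, nrows = len(s), len(rows[0]), len(rows)
    # dp[r][i][k]: best score of s[:i] against columns[:k], currently in row r
    dp = [[[0] * (ncols + 1) for _ in range(n + 1)] for _ in range(nrows)]
    for r in range(nrows):
        for i in range(n + 1):
            dp[r][i][0] = i * gap
        for k in range(ncols + 1):
            dp[r][0][k] = k * gap
    for i in range(1, n + 1):
        for k in range(1, ncols + 1):
            for r in range(nrows):
                # arrive from any row; pay the jump penalty if the row changes
                prev = max(dp[q][i - 1][k - 1] + (0 if q == r else jump)
                           for q in range(nrows))
                sub = match if s[i - 1] == rows[r][k - 1] else mismatch
                dp[r][i][k] = max(prev + sub,
                                  dp[r][i - 1][k] + gap,
                                  dp[r][i][k - 1] + gap)
    return max(dp[r][n][ncols] for r in range(nrows))
```

For a "recombinant" query whose first half matches one row and whose second half matches another, a single jump scores far better than aligning to either row alone; with a prohibitive jump penalty the score falls back to the best single-row alignment.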

11.
  1. When we collect the growth curves of many individuals, we often observe orderly variation among the curves rather than a completely random mixture. Small individuals may share similar growth curves that differ from those of large individuals, with the curves changing gradually from small to large individuals. It has been recognized that if all the growth curves are identical after standardization by their asymptotes (an anamorphic growth curve set), the set can be estimated from nonchronological data; otherwise, that is, if the standardized curves are not identical (a polymorphic growth curve set), such estimation is not feasible. However, because a given set of growth curves determines the variation in the observed data, it may be possible to estimate polymorphic growth curve sets from nonchronological data.
  2. In this study, we developed an estimation method by deriving the likelihood function for polymorphic growth curve sets. The method involves simple maximum likelihood estimation. Weighted nonlinear regression and least-squares fitting after log-transformation of anamorphic growth curve sets are included as special cases.
  3. The growth curve sets of the height of cypress (Chamaecyparis obtusa) and larch (Larix kaempferi) trees were estimated. Model selection using AIC and the likelihood ratio test showed that the growth curve set for cypress was polymorphic, whereas that for larch was anamorphic. The improved fit of the polymorphic model for cypress results from resolving underdispersion (less dispersion in the real data than the model predicts).
  4. The likelihood function for model estimation depends not only on the distribution type of the asymptotes but also on the definition of the growth curve set. These factors may need to be considered even when environmental explanatory variables and random effects are introduced.

12.
Weibull models are fitted to synthetic life table data by applying weighted least squares analysis to log-log functions constructed from appropriate underlying contingency tables. The resulting estimates and test statistics are therefore based on the linearized minimum modified X₁² criterion and have satisfactory properties in moderately large samples. The basic methodology is illustrated with an example that is bivariate in the sense of involving two simultaneous, but non-competing, vital events. For this situation, estimation of Weibull model parameters is described for both marginal and certain conditional distributions, either individually or jointly.
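The log-log linearization works because a Weibull survival function S(t) = exp(−(t/λ)^k) satisfies ln(−ln S(t)) = k ln t − k ln λ, a straight line in ln t. The sketch below fits that line by weighted least squares. It is a simplification: in the paper the weights come from the estimated covariance structure of the contingency-table estimates, whereas here they are user-supplied placeholders (a diagonal weighting). Survival proportions must lie strictly between 0 and 1 and times must be positive. The function name is hypothetical.

```python
import math


def fit_weibull_loglog(times, survival, weights=None):
    """Fit S(t) = exp(-(t/scale)**shape) by weighted least squares on ln(-ln S).

    Returns (shape, scale). Slope = shape; intercept = -shape * ln(scale).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(s)) for s in survival]
    w = weights or [1.0] * len(xs)
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    slope = (sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
             / sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs)))
    intercept = my - slope * mx
    shape = slope
    scale = math.exp(-intercept / shape)
    return shape, scale
```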

13.
Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for analyzing large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern and it has been criticized for its local optimization. More accurate existing software, however, requires sequence alignment or complex calculations, which are time-consuming for large data sets during data preprocessing or the search stage. It is therefore imperative to develop a practical program for both accurate and scalable species identification in DNA barcoding. In this context, we present VIP Barcoding: user-friendly software with a graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method screens a reference database to reduce the search space. The alignment-based K2P distance nearest-neighbour method is then applied to the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than Blastn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that this platform can handle both large-scale and multilocus barcoding data accurately and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/.
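The two-stage design can be sketched as a cheap alignment-free screen followed by an exact distance on the survivors. The second stage below uses the standard Kimura two-parameter distance, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), with P and Q the proportions of transitions and transversions. The first stage is a simplification: plain k-mer frequency vectors compared by cosine similarity, whereas the composition vector method additionally subtracts a Markov-model background. The sketch also assumes the query and references are already aligned for the K2P step. Function names and parameters (`k`, `keep`) are hypothetical.

```python
import math
from collections import Counter

PURINES = set("AG")


def k2p_distance(s1, s2):
    """Kimura 2-parameter distance between aligned DNA sequences (gaps/ambiguities skipped)."""
    transitions = transversions = sites = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # saturated: distance undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def kmer_profile(seq, k=3):
    seq = seq.upper().replace("-", "")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(c1, c2):
    dot = sum(v * c2[key] for key, v in c1.items())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def identify(query, references, k=3, keep=10):
    """Stage 1: keep the `keep` most k-mer-similar references.
    Stage 2: return the K2P nearest neighbour among them."""
    qprof = kmer_profile(query, k)
    ranked = sorted(references,
                    key=lambda name: -cosine(qprof, kmer_profile(references[name], k)))
    return min(ranked[:keep], key=lambda name: k2p_distance(query, references[name]))
```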

14.

Background  

Determining specific positions to align in advance (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually, to include biological expertise, or automatically, usually through pairwise similarity searches. Multiple local similarities are expected to be more adequate, as they are more biologically relevant. However, even good multiple local similarities can prove incompatible with the ordering of an alignment.
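The ordering constraint can be made concrete for the pairwise case: two anchors can coexist in one alignment only if their positions increase together in both sequences, so crossing anchors are incompatible. Choosing a maximum-weight compatible subset is then a chaining problem. The sketch below solves it with a simple O(n²) dynamic program; it is a generic illustration rather than the method of the paper, and for multiple local similarities the same test applies in every sequence at once. The function name and the (pos1, pos2, weight) tuple format are hypothetical.

```python
def best_anchor_chain(anchors):
    """Select a maximum-weight set of mutually compatible pairwise anchors.

    anchors: list of (pos1, pos2, weight). Two anchors are compatible when one
    lies strictly before the other in both sequences.
    Returns (chain, total_weight), with the chain ordered by position.
    """
    order = sorted(range(len(anchors)), key=lambda i: (anchors[i][0], anchors[i][1]))
    best, prev = {}, {}
    for idx, i in enumerate(order):
        p1, p2, w = anchors[i]
        best[i], prev[i] = w, None
        for j in order[:idx]:
            q1, q2, _ = anchors[j]
            if q1 < p1 and q2 < p2 and best[j] + w > best[i]:
                best[i], prev[i] = best[j] + w, j
    if not best:
        return [], 0
    end = max(best, key=best.get)
    total = best[end]
    chain = []
    while end is not None:
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1], total
```

For anchors (1,1,5), (2,3,2), (3,2,4) and (5,5,1), the anchors (2,3) and (3,2) cross, and the chain keeps the heavier one: [(1,1,5), (3,2,4), (5,5,1)] with total weight 10.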

15.
In this work we address the problem of robustly identifying the unknown parameters of a cell population dynamics model from experimental data on the kinetics of cells labelled with a fluorescent marker that defines the division age of the cell. The model is formulated as a first-order hyperbolic PDE for the distribution of cells with respect to the structure variable x (or z), the intensity level (or the log10-transformed intensity level) of the marker. The parameters of the model are the rate functions of cell division, death and label decay, and the label dilution factor. We develop a computational approach to identifying the model parameters, with a particular focus on the cell birth rate α(z) as a function of the marker intensity, assuming that the other model parameters are scalars to be estimated. To solve the inverse problem numerically, we parameterize α(z) and apply a maximum likelihood approach. The parameterization is based on cubic Hermite splines defined on a coarse mesh whose nodes are either equally spaced and fixed a priori or determined in the parameter estimation procedure. Multiple minima indicate that the inverse problem is ill-posed. To treat the ill-posed problem, we apply Tikhonov regularization, with the regularization parameter determined by the discrepancy principle. We show that the solution of the regularized parameter estimation problem is consistent with the data set to within the noise level of the measurements.
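The pairing of Tikhonov regularization with the discrepancy principle can be shown on a linear toy problem. The sketch below is only an analogue of the paper's setting (which is a nonlinear, PDE-constrained likelihood problem): it minimizes ‖Ax − b‖² + λ‖x‖² and, following the discrepancy principle, decreases λ along a geometric grid until the residual first drops to τ·δ, where δ is the known noise level and τ > 1 is a safety factor. Since the residual grows monotonically with λ, this approximates the largest admissible λ. The grid, τ, and all names are assumptions.

```python
import math


def gauss_solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def tikhonov_solve(a, b, lam):
    """Solve (A^T A + lam I) x = A^T b."""
    n = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * bi for row, bi in zip(a, b)) for i in range(n)]
    return gauss_solve(ata, atb)


def residual_norm(a, x, b):
    return math.sqrt(sum((sum(aij * xj for aij, xj in zip(row, x)) - bi) ** 2
                         for row, bi in zip(a, b)))


def discrepancy_lambda(a, b, noise_norm, tau=1.1, lam0=1e3, factor=0.5, lam_min=1e-12):
    """Largest lam on a geometric grid with ||A x_lam - b|| <= tau * noise_norm."""
    lam = lam0
    while lam > lam_min:
        x = tikhonov_solve(a, b, lam)
        if residual_norm(a, x, b) <= tau * noise_norm:
            return lam, x
        lam *= factor
    raise ValueError("discrepancy level not reached; check noise_norm")
```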

16.
The eukaryotic cyto-/nucleoplasmic 70-kDa heat-shock protein (HSP70) has homologues in the endoplasmic reticulum as well as in bacteria, mitochondria and plastids. We selected a representative subset of the many sequenced stress-70 family members that covers all known branches of the protein family, and we calculated and manually improved an alignment. Here we present the consensus sequence of the aligned proteins and putative nuclear localization signals (NLS) in the eukaryotic HSP70 homologues. The phylogenetic relationships of the stress-70 family members were estimated using different computational methods. We present a phylogenetic tree containing all known stress-70 subfamilies and demonstrate the usefulness of stress-70 protein sequences for estimating inter-taxon phylogeny. Correspondence to: S.A. Reusing
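The abstract does not say how its consensus sequence was derived. A common approach, assumed here, is a majority rule per alignment column: report the most frequent residue if it exceeds a threshold fraction of the sequences, otherwise an ambiguity symbol, and drop columns dominated by gaps. The threshold, symbols, and function name are hypothetical choices.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, gap="-", ambiguous="X"):
    """Majority-rule consensus of equal-length aligned protein sequences.

    A column contributes its most frequent residue when that residue occurs in
    more than `threshold` of the sequences, `ambiguous` otherwise; columns whose
    majority symbol is a gap are omitted.
    """
    if not aligned or any(len(s) != len(aligned[0]) for s in aligned):
        raise ValueError("sequences must be non-empty and equally long")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) <= threshold:
            out.append(ambiguous)
        elif residue != gap:
            out.append(residue)
    return "".join(out)
```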

17.
Access to weak alignment media has fuelled the development of methods for efficiently and accurately measuring residual dipolar couplings (RDCs) in NMR spectroscopy. Among the wealth of approaches for determining one-bond scalar and RDC constants, only J-modulated and J-evolved techniques retain maximum resolution in the presence of differential relaxation. In this article, several J-evolved experiments are examined with respect to the minimum linewidth achievable in the J dimension, using the peptide PA4 and the 80-amino-acid protein Saposin C as model systems. With the JE-N-BIRD_{d,X}-HSQC experiment, the average full width at half height could be reduced to approximately 5 Hz for the protein, which allows otherwise unresolved peaks to be resolved through the active (J+D) coupling. Since RDCs can generally be scaled through the choice of alignment medium and alignment strength, the technique introduced here provides an effective recourse when chemical shift differences alone are insufficient to discriminate signals. In favorable cases, even secondary structure elements can be distinguished.

18.
We present a statistical method, and its accompanying algorithms, for selecting a mathematical model of the gating mechanism of an ion channel and for estimating the parameters of this model. The method assumes a hidden Markov model that incorporates filtering, colored noise and state-dependent white excess noise in the recorded data. Model selection and parameter estimation are performed via a Bayesian approach using Markov chain Monte Carlo. The method is illustrated by its application to single-channel recordings of the K⁺ outward rectifier in barley leaf. Acknowledgement: The authors thank Sake Vogelzang, Bert van Duijn and Bert de Boer for their helpful advice and useful comments and suggestions.
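At the core of any likelihood-based treatment of a hidden Markov model, including MCMC samplers, is the forward algorithm, which sums over all hidden state paths in time linear in the record length. The sketch below is a deliberately reduced version: discrete-time states with Gaussian emissions whose means and standard deviations depend on the state (a crude stand-in for state-dependent excess noise). It omits the filtering and colored noise that the described model includes. The alpha vector is rescaled at each step to avoid underflow, and the log of the scale factors accumulates the log-likelihood. All names are hypothetical.

```python
import math


def gaussian_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def hmm_log_likelihood(obs, trans, init, means, sds):
    """Scaled forward algorithm for an HMM with Gaussian emissions.

    trans[r][s]: probability of moving from state r to state s (rows sum to 1).
    init[s]: initial state probabilities. means/sds: per-state emission parameters.
    """
    n = len(init)
    alpha = [init[s] * gaussian_pdf(obs[0], means[s], sds[s]) for s in range(n)]
    c = sum(alpha)
    loglik = math.log(c)
    alpha = [a / c for a in alpha]
    for y in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n))
                 * gaussian_pdf(y, means[s], sds[s]) for s in range(n)]
        c = sum(alpha)
        loglik += math.log(c)
        alpha = [a / c for a in alpha]
    return loglik
```

In a Bayesian analysis, a function like this would be evaluated for each proposed parameter set inside the Markov chain Monte Carlo sampler.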

19.
Making sense of score statistics for sequence alignments   Total citations: 1 (self-citations: 0; citations by others: 1)
The search for similarity between two biological sequences lies at the core of many applications in bioinformatics. This paper aims to highlight a few of the principles that should be kept in mind when evaluating the statistical significance of alignments between sequences. The extreme value distribution, which in most cases describes the distribution of alignment scores between a query and a database, is introduced first. The effects of the similarity matrix and gap penalty values on the score distribution are then examined, and it is shown that the alignment statistics can undergo an abrupt phase transition. A few types of random sequence databases used to estimate statistical significance are presented, and the statistics employed by the BLAST, FASTA and PRSS programs are compared. Finally, the different strategies used to assess the statistical significance of matches produced by profiles and hidden Markov models are presented.
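For local alignment scores, the extreme value (Karlin–Altschul) statistics give the expected number of chance hits with score at least S between a query of length m and a database of total length n as E = K·m·n·e^(−λS), and the probability of at least one such hit as P = 1 − e^(−E). The normalized bit score S′ = (λS − ln K)/ln 2 makes E = m·n·2^(−S′). The sketch below computes these quantities. λ and K depend on the scoring matrix and gap penalties; the values used in the test are illustrative only, and edge-effect corrections applied by real search tools are omitted.

```python
import math


def evalue(score, m, n, lam, k):
    """Expected number of chance local alignments scoring >= score."""
    return k * m * n * math.exp(-lam * score)


def pvalue(score, m, n, lam, k):
    """Probability of at least one chance alignment scoring >= score."""
    return 1.0 - math.exp(-evalue(score, m, n, lam, k))


def bit_score(score, lam, k):
    """Scoring-system-independent score: E = m * n * 2**(-bit_score)."""
    return (lam * score - math.log(k)) / math.log(2)
```

Because P ≈ E when E is small, E-values below about 0.01 can be read almost directly as probabilities.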

20.
Missing outcomes and irregularly timed multivariate longitudinal data frequently occur in clinical trials and biomedical studies. The multivariate t linear mixed model (MtLMM) has been shown to be a robust approach to modeling multi-outcome continuous repeated measures in the presence of outliers or heavy-tailed noise. This paper presents a framework for fitting the MtLMM with an arbitrary missing data pattern across multiple outcome variables recorded at irregular occasions. To address the serial correlation among the within-subject errors, a damped exponential correlation structure is considered in the model. Under the missing-at-random mechanism, an efficient alternating expectation–conditional maximization (AECM) algorithm is used to estimate the parameters and impute the missing values. Techniques for estimating the random effects and predicting future responses are also investigated. Applications to an HIV-AIDS study and a pregnancy study involving multivariate longitudinal data with missing outcomes, together with a simulation study, highlight the superiority of MtLMMs in providing more adequate estimation, imputation and prediction.
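The damped exponential correlation (DEC) structure is commonly written as corr(e_j, e_k) = φ₁^(|t_j − t_k|^φ₂) with 0 ≤ φ₁ < 1 and φ₂ ≥ 0, which handles irregular measurement times directly. φ₂ = 0 gives compound symmetry and φ₂ = 1 a continuous-time AR(1). The sketch below builds this matrix for one subject's times; it assumes this standard parameterization and does not show how the paper combines it with the between-outcome covariance or scales it by error variances. The function name is hypothetical.

```python
def dec_correlation(times, phi1, phi2):
    """Damped exponential correlation matrix for one subject's measurement times.

    Off-diagonal entries are phi1 ** (|t_j - t_k| ** phi2); the diagonal is 1.
    """
    if not (0 <= phi1 < 1) or phi2 < 0:
        raise ValueError("require 0 <= phi1 < 1 and phi2 >= 0")
    n = len(times)
    corr = [[1.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                corr[j][k] = phi1 ** (abs(times[j] - times[k]) ** phi2)
    return corr
```

For times 0, 1 and 3 with φ₁ = 0.5 and φ₂ = 1, the correlations decay with the time gap: 0.5 for a gap of 1, 0.25 for a gap of 2, and 0.125 for a gap of 3.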


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417