Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
Gap costs for multiple sequence alignment   Total citations: 6 (self-citations: 0; citations by others: 6)
Standard methods for aligning pairs of biological sequences charge for the most common mutations, which are substitutions, deletions and insertions. Because a single mutation may insert or delete several nucleotides, gap costs that are not directly proportional to gap length are usually the most effective. How to extend such gap costs to alignments of three or more sequences is not immediately obvious, and a variety of approaches have been taken. This paper argues that, since gap and substitution costs together specify optimal alignments, they should be defined using a common rationale. Specifically, a new definition of gap costs for multiple alignments is proposed and compared with previous ones. Since the new definition links a multiple alignment's cost to that of its pairwise projections, it allows knowledge gained about two-sequence alignments to bear on the multiple alignment problem. Also, such linkage is a key element of recent algorithms that have rendered practical the simultaneous alignment of as many as six sequences.
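The idea of tying a multiple alignment's cost to its pairwise projections can be illustrated with a small sketch. This is not the paper's definition, which the abstract does not spell out; it is a plain sum-of-pairs scheme in which each pair of rows is projected (columns where both rows hold a gap are dropped) and scored with affine gap costs. The function names, the gap symbol and all cost values are assumptions chosen for illustration.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def project_pair(row_a, row_b):
    """Pairwise projection: drop columns in which both rows hold a gap."""
    return [(x, y) for x, y in zip(row_a, row_b) if not (x == GAP and y == GAP)]


def pairwise_cost(cols, mismatch=1.0, gap_open=2.0, gap_extend=1.0):
    """Affine cost of a projected pairwise alignment (hypothetical cost values)."""
    cost = 0.0
    prev = None  # kind of the previous column: "sub", "gap_a" or "gap_b"
    for x, y in cols:
        state = "gap_a" if x == GAP else "gap_b" if y == GAP else "sub"
        if state == "sub":
            cost += 0.0 if x == y else mismatch
        else:
            # Charge the opening cost only when a new gap starts.
            cost += gap_extend + (gap_open if state != prev else 0.0)
        prev = state
    return cost


def sum_of_pairs_cost(alignment):
    """Sum the projected pairwise costs over all pairs of rows."""
    return sum(pairwise_cost(project_pair(a, b)) for a, b in combinations(alignment, 2))


print(sum_of_pairs_cost(["AC-GT", "AC-GT", "ACTGT"]))
```

In this toy alignment the first two rows project onto an identical, gap-free pair, so only the two projections involving the third row contribute a gap.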

2.
The outcome of a phylogenetic analysis based on DNA sequence data is highly dependent on the homology-assignment step and may vary with alignment parameter costs. Robustness to changes in parameter costs is therefore a desired quality of a data set because the final conclusions will be less dependent on selecting a precise optimal cost set. Here, node stability is explored in relationship to separate versus combined analysis in three different data sets, all including several data partitions. Robustness to changes in cost sets is measured as the number of successive changes that can be made in a given cost set before a specific clade is lost. The changes are in all cases base change cost, gap penalties, and adding/removing/changing affine gap costs. When combining data partitions, the number of clades that appear in the entire parameter space is not remarkably increased; in some cases this number even decreased. However, when combining data partitions the trees from cost sets including affine gap costs were always more similar than the trees from cost sets without affine gap costs. This was not the case when the data partitions were analyzed independently. When data sets were combined, 80% of the clades found under cost sets including affine gap costs resisted at least one change to the cost set.

3.
Optimizing substitution matrices by separating score distributions   Total citations: 1 (self-citations: 0; citations by others: 1)
MOTIVATION: Homology search is one of the most fundamental tools in Bioinformatics. Typical alignment algorithms use substitution matrices and gap costs. Thus, the improvement of substitution matrices increases accuracy of homology searches. Generally, substitution matrices are derived from aligned sequences whose relationships are known, and gap costs are determined by trial and error. To discriminate relationships more clearly, we are encouraged to optimize the substitution matrices from statistical viewpoints using both positive and negative examples utilizing Bayesian decision theory. RESULTS: Using Cluster of Orthologous Group (COG) database, we optimized substitution matrices. The classification accuracy of the obtained matrix is better than that of conventional substitution matrices to COG database. It also achieves good performance in classifying with other databases.

4.

Background  

Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has led several scientists to propose that logarithmic gap costs, G(k) = a + c ln k, are more biologically realistic than affine gap costs, G(k) = a + bk, for sequence alignment. Since quick and efficient affine costs are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough to merit their use over the faster affine gap costs.
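The two gap-cost forms named above can be compared directly. The following is a minimal sketch that only evaluates the formulas G(k) = a + bk and G(k) = a + c ln k; the function names and the parameter values a, b and c are hypothetical and are not taken from the paper.

```python
import math


def affine_gap_cost(k, a, b):
    """Affine cost G(k) = a + b*k for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return a + b * k


def logarithmic_gap_cost(k, a, c):
    """Logarithmic cost G(k) = a + c*ln(k) for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return a + c * math.log(k)


# Hypothetical parameters, chosen only to show how the two curves diverge.
for k in (1, 2, 10, 100):
    print(k, affine_gap_cost(k, a=4.0, b=1.0), round(logarithmic_gap_cost(k, a=4.0, c=1.0), 2))
```

With equal coefficients, the logarithmic cost grows far more slowly than the affine cost as the gap lengthens, which is why it penalizes long indels less severely.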

5.
Based on the observation that a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap, and a further length-dependent penalty. From structural or multiple alignments of distantly related proteins, it has been observed that conserved residues frequently fall into ungapped blocks separated by relatively nonconserved regions. To take advantage of this structure, a simple generalization of affine gap costs is proposed that allows nonconserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs is shown empirically to follow an extreme value distribution. Examples are presented for which generalized affine gap costs yield superior alignments from the standpoints both of statistical significance and of alignment accuracy. Guidelines for selecting generalized affine gap costs are discussed, as is their possible application to multiple alignment. Proteins 32:88–96, 1998. Published 1998 Wiley-Liss, Inc.
  • 1 This article is a US government work and, as such, is in the public domain in the United States of America.

    6.
    The performances of five global multiple-sequence alignment programs (CLUSTAL W, Divide and Conquer, Malign, PileUp, and TreeAlign) were evaluated using part of the animal mitochondrial small subunit (12S) rRNA molecule. Conserved sequence motifs derived from an alignment based on secondary structural information were used to score how well each program aligned a data set of five vertebrate and five invertebrate taxa over a range of parameter values. All of the programs could align the motifs with reasonable accuracy for at least one set of parameter conditions, although if the whole sequence was considered, similarity to the structural alignment was only 25%-34%. Use of small gap costs generally gave more accurate results, although Malign and TreeAlign generated longer alignments when gap costs were low. The programs differed in the consistency of the alignments when gap cost was varied; CLUSTAL W, Divide and Conquer, and TreeAlign were the most accurate and robust, while PileUp performed poorly as gap cost values increased, and the accuracy of Malign fluctuated. Default settings for the programs did not give the best results, and attempting to select similar parameter values in different programs did not always result in more similar alignments. Poor alignment of even well-conserved motifs can occur if these are near sites with insertions or deletions. Since there is no a priori way to determine gap costs and because such costs can vary over the gene, alignment of rRNA sequences, particularly the less well conserved regions, should be treated carefully and aided by secondary structure and conserved motifs. Some motifs are single bases and so are often invisible to alignment programs. Our tests involved the most conserved regions of the 12S rRNA gene, and alignment of less well conserved regions will be more problematical. 
None of the alignments we examined produced a fully resolved phylogeny for the data set, indicating that this portion of 12S rRNA is insufficient for resolution of distant evolutionary relationships.

    7.
    Exploring a large number of parameter sets in sensitivity analyses of direct optimization parsimony can be costly in terms of time and computing resources, and there is little a priori guidance available for reasonable limits to these search parameters. For this reason, we sought a general‐purpose upper limit for gap costs in the direct optimization program POY to streamline this process. To test the performance of POY as gap costs increase, we simulated data onto a pre‐set topology using a GTR + I + G model modified to include gaps by adding them according to a negative‐binomial model. Gaps were then removed and the data were analysed in POY at increasing gap costs. Increasing gap costs consistently resulted in reduced phylogenetic accuracy across trees of different relative branch lengths. Decoupling gap insertion and gap extension costs recovered a fraction of the accuracy lost by having both high gap insertion and gap extension costs, but only in trees with long internal nodes. To determine whether loss of phylogenetic accuracy was node‐specific, we designed a small dataset with a constrained node, where all possible combinations of cost substitution and different percentages of gap versus nucleotide changes were explored. These analyses showed that the effects of gap insertion and extension are node‐specific, and the minimum threshold for convergence on gap‐supported nodes is similar to the threshold for accuracy loss found in the larger simulated datasets. Subsequent analyses of empirical data revealed that a similar pattern of loss with gap cost increase can occur with ribosomal genes (18S, 28S, 16S and 12S) but this pattern was not seen in the intron data (myoglobin II) examined. 
In conjunction with previously published congruence‐based studies, the results suggest that POY sensitivity analyses can be streamlined and made more accurate if gap insertion and extension costs follow, as a guideline, a limit of four times the highest base‐transformation cost. © The Willi Hennig Society 2008.

    8.
    Tumour development is a process resulting from the disturbance of various cellular functions including cell proliferation, adhesion and motility. While the role of these cell parameters in tumour promotion and progression has been widely recognized, the mechanisms that influence gap junctional coupling during tumorigenesis remain elusive. Neoplastic cells usually display decreased levels of connexin expression and/or gap junctional coupling. Thus, impaired intercellular communication via gap junctions may facilitate the release of a potentially neoplastic cell from the controlling regime of the surrounding tissue, leading to tumour promotion. However, recent data indicates that metastatic tumour cell lines are often characterized by relatively high levels of connexin expression and gap junctional coupling. This review outlines current knowledge on the role of connexins in tumorigenesis and the possible mechanisms of the interference of gap junctional coupling with the processes of tumour invasion and metastasis. Paper authored by participants of the international conference: XXXIV Winter School of the Faculty of Biochemistry, Biophysics and Biotechnology of Jagiellonian University, Zakopane, March 7–11, 2007, “The Cell and Its Environment”. Publication costs were covered by the organisers of this meeting.

    9.
    Indels in DNA sequences frequently affect more than a single nucleotide, creating problems for alignment, character coding and phylogenetic analysis. However, the size and frequency of multiple‐residue indels are not usually tested, and with popular alignment packages their reconstruction is indirectly achieved by reducing the affine (gap extension) cost. We explored the length distribution of indels in intron sequences of the gene Mp20 by modifying the gap opening and gap extension costs. Given a “known” tree for the study group, global homology levels were greatest under low gap cost, with gap extension costs of roughly 0.4‐fold the opening cost. Different approaches to gap coding and weighting suggested that taxonomic congruence was correlated with high frequencies of multiple‐position indels, with a maximum indel length of 2–5 bp and few indels above 15 bp, but also including a proportion of indels > 100 bp. Only a small minority of indels could be reconstructed as single‐position indels. Consequently, tree topologies improved when homologous multinucleotide indels were recoded as binary characters which are otherwise highly homoplastic and weighted characters in single‐position coding. In tree‐generating alignment procedures as implemented in POY, where gap penalty determines the character weight during tree search, the problem of assigning inappropriately high weight to multiple‐residue indels could partly be overcome by setting the extension costs to about 0.4‐fold lower than gap opening costs. We conclude that multiple consecutive gap positions are not independent characters and hence methods for parsimony reconstruction of long indels are required. Finally, we also observed a general lack of correlation between taxonomic and character congruence, demonstrating the difficulties of applying congruence criteria to decide among competing alignments. 
This highlights the value of recent model‐based alignment procedures which can implement the statistical distributions of indel size classes, and do not rely on potentially circular strategies for optimizing overall congruence. © The Willi Hennig Society 2006.

    10.
    MOTIVATION: No general theory guides the selection of gap penalties for local sequence alignment. We empirically determined the most effective gap penalties for protein sequence similarity searches with substitution matrices over a range of target evolutionary distances from 20 to 200 Point Accepted Mutations (PAMs). RESULTS: We embedded real and simulated homologs of protein sequences into a database and searched the database to determine the gap penalties that produced the best statistical significance for the distant homologs. The most effective penalty for the first residue in a gap (q+r) changes as a function of evolutionary distance, while the gap extension penalty for additional residues (r) does not. For these data, the optimal gap penalties for a given matrix scaled in 1/3 bit units (e.g. BLOSUM50, PAM200) are q=25-0.1 * (target PAM distance), r=5. Our results provide an empirical basis for selection of gap penalties and demonstrate how optimal gap penalties behave as a function of the target evolutionary distance of the substitution matrix. These gap penalties can improve expectation values by at least one order of magnitude when searching with short sequences, and improve the alignment of proteins containing short sequences repeated in tandem.
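    The empirical rule quoted above, q = 25 - 0.1 * (target PAM distance) with r = 5 for matrices scaled in 1/3 bit units, can be written as a small helper. This is a sketch of the stated formula only; the function names are hypothetical, and restricting the input to the 20 to 200 PAM range studied is an assumption made here to avoid extrapolating the rule.

```python
def gap_penalties(target_pam):
    """Return (q, r) in 1/3-bit units from the rule q = 25 - 0.1 * PAM, r = 5."""
    if not 20 <= target_pam <= 200:
        # Assumption: refuse to extrapolate beyond the range the study covered.
        raise ValueError("rule was fitted for target distances of 20 to 200 PAMs")
    q = 25 - 0.1 * target_pam
    r = 5
    return q, r


def gap_penalty(length, target_pam):
    """Total penalty for a gap: q + r for the first residue, r for each further one."""
    q, r = gap_penalties(target_pam)
    return q + r * length


for pam in (20, 120, 200):
    q, r = gap_penalties(pam)
    print(pam, q, r, gap_penalty(1, pam))
```

    Only q moves with the target distance, so the penalty for the first gapped residue falls as the matrix targets more distant homologs while each additional residue keeps the same price.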

    11.
    Achieving sufficient connectivity between populations is essential for persistence, but costs of dispersal may select against individual traits or behaviours that, if present, would improve connectivity. Existing dispersal models tend to ignore the multitude of risks to individuals: while many assess the effect of mortality costs, there is also a risk of failing to find new habitat, especially when the entire inhabitable area remains both small and fragmented. There are few known rules governing whether individuals evolve to disperse more, or less, than what is ideal for population connectivity and persistence. Here we aim to fill this gap, while also noting that evolution might not only produce suboptimal dispersal behaviour: it also influences individual heterogeneity in dispersal. Intuitively, we might expect heterogeneity to improve connectivity, as some individuals will travel far. However, we show that this is only true if dispersal distances on average are quite short; heterogeneity can also lead to reduced connectivity because it can reduce the proportion of the most profitable (‘safest’) intermediate dispersal distances. In general, our results show that conditions typically associated with conservation concerns (small and fragmented habitats inhabited by a species with a low birth rate) are also ones that are most likely to lead to suboptimal dispersal traits. This prompts the question of assisted dispersal in cases of urgent conservation concern.

    12.
    We advance the hypothesis that women are as competitive as men once the incentive for winning includes factors that matter to women. Allowing winners an opportunity to share some of their winnings with the low performers has gendered consequences for competitive behavior. We ground our work in an evolutionary framework in which winning competitions brings asymmetric benefits and costs to men and women. In the new environment, the potential to share some of the rewards from competition with others may afford women the benefit of reaping competitive gains without incurring some of its potential costs. An experiment (N = 438 in an online convenience sample of U.S. adults) supports our hypothesis: a 26% gender gap in performance vanishes once a sharing option is included to an otherwise identical winner-take-all incentive scheme. Besides providing a novel experiment that challenges the paradigm that women are not as motivated to compete as men, our work proposes some suggestions for policy: including socially-oriented rewards to contracts may offer a novel tool to close the persistent labor market gender gap.

    13.
    In this article, our objective is to introduce economics as a tool for the planning, prioritization, and evaluation of restoration projects. Studies that develop economic estimates of public values for ecological restoration employ methods that may be unfamiliar to practitioners. We hope to address this knowledge gap by describing economic concepts in the context of ecological restoration. We have summarized the most common methods for estimating the costs and benefits of restoration projects as well as frameworks for decision analysis and prioritization. These methods are illustrated in a review of the literature as it applies to terrestrial restoration in the United States, with examples of applications of methods to projects. Our hope is that practitioners will consider collaborating with economists to help ensure that restoration costs and benefits are identified and understood.

    14.
    Although the phylogeny of centipedes has found ample agreement based on morphology, recent analyses incorporating molecular data show major conflict at resolving the deepest nodes in the centipede tree. While some genes support the classical (morphological) hypothesis, others suggest an alternative tree in which the relictual order Craterostigmomorpha, restricted to Tasmania and New Zealand, is resolved as the sister group to all other centipedes. We combined all available data including seven genes (totalling more than 8 kb of genetic information) and 153 morphological characters for 24 centipedes, and conducted a sensitivity analysis to evaluate where the conflict resides. Our data showed that the classical hypothesis is obtained primarily when nuclear ribosomal genes exert dominance in the character data matrix (at high gap costs), while the alternative tree is obtained when protein-encoding genes account for most of the cladogram length (at low gap costs). In this particular case, the addition of genetic data does not produce a more stable hypothesis for deep centipede relationships than when analysing certain genes independently, but the overall conflict in the data can be clearly detected via a sensitivity analysis, and support and stability of shallow nodes increase as data are added.

    15.
    Optimal sequence alignment using affine gap costs   Total citations: 27 (self-citations: 0; citations by others: 27)
    When comparing two biological sequences, it is often desirable for a gap to be assigned a cost not directly proportional to its length. If affine gap costs are employed, in other words if opening a gap costs v and each null in the gap costs u, the algorithm of Gotoh (1982, J. molec. Biol. 162, 705) finds the minimum cost of aligning two sequences in order MN steps. Gotoh's algorithm attempts to find only one from among possibly many optimal (minimum-cost) alignments, but does not always succeed. This paper provides an example for which this part of Gotoh's algorithm fails and describes an algorithm that finds all and only the optimal alignments. This modification of Gotoh's algorithm still requires order MN steps. A more precise form of path graph than previously used is needed to represent accurately all optimal alignments for affine gap costs.
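    The order MN cost computation with affine gap costs can be sketched with the widely used three-table dynamic programme. This is a generic textbook-style formulation, not the modified algorithm of the paper: it returns only the minimum cost and does not enumerate the optimal alignments. A gap of length k is charged v + u*k as in the abstract; the function name, the mismatch cost and the default values of v and u are assumptions.

```python
def gotoh_min_cost(s, t, v=2.0, u=1.0, mismatch=1.0):
    """Minimum cost of globally aligning s and t; a gap of length k costs v + u*k."""
    inf = float("inf")
    m, n = len(s), len(t)
    # M: last column pairs s[i-1] with t[j-1]
    # X: last column pairs s[i-1] with a null (gap in t)
    # Y: last column pairs a null with t[j-1] (gap in s)
    M = [[inf] * (n + 1) for _ in range(m + 1)]
    X = [[inf] * (n + 1) for _ in range(m + 1)]
    Y = [[inf] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0.0
    for i in range(1, m + 1):
        X[i][0] = v + u * i  # s[:i] aligned entirely against nulls
    for j in range(1, n + 1):
        Y[0][j] = v + u * j  # t[:j] aligned entirely against nulls
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Extending an open gap costs u; opening a new one costs v + u.
            X[i][j] = u + min(X[i - 1][j], v + M[i - 1][j], v + Y[i - 1][j])
            Y[i][j] = u + min(Y[i][j - 1], v + M[i][j - 1], v + X[i][j - 1])
    return min(M[m][n], X[m][n], Y[m][n])


print(gotoh_min_cost("ACCGT", "AGT"))
```

    Each of the (M+1)(N+1) cells is filled in constant time, which gives the order MN step count; the tables are kept whole here for readability, although two rows per table would suffice if only the cost is needed.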

    16.
    Bioinformatics software for biologists in the genomics era   Total citations: 1 (self-citations: 0; citations by others: 1)

    17.
    The response of Japanese beech (Fagus japonica Maxim.) sprouts to canopy gaps in natural beech forest in central Japan was studied using two contrasting gaps in which tree-ring chronologies of regenerating stems were analyzed. The gaps were created by uprooting of a single Quercus mongolica var. grosseserrata stem (diameter: 50 cm; gap size: 40 m²; 23 years old) and by concurrent uprootings of four F. japonica stools (gap size: 180 m²; 30 years old). Japanese beech sprouts emerged before and after the gap formation and dominated stem populations in both gaps. In gaps, growth of F. japonica sprouts was equal to or lower than growth of stems of seed origin, but most sprouts (F. japonica, Acer mono var. marmoratum) appeared a few years before emergence of seedlings. The small gap created by single stem fall was dominated by some beech sprouts from stools adjacent to the gap. The multiple gap was not closed by beech sprouts from stools surrounding the gap, but some dominant beech stems were resprouts from the uprooted beech stools. The existence of a sprout bank under the canopy may play an important role in the closing process of gaps in natural Japanese beech forest.

    18.
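    General characteristics and disturbance regime of canopy gaps in the dark coniferous forest of Changbai Mountain   Total citations: 17 (self-citations: 0; citations by others: 17)
    Yang Xiu (杨修). Acta Ecologica Sinica (生态学报), 2002, 22(11): 1825-1831
    The general characteristics and disturbance regime of canopy gaps in the dark coniferous forest of Changbai Mountain were studied. The linear density of gaps was 21.15 gaps/km; expanded gaps occupied 29.45% of the forest area and canopy gaps 15.81%. The annual disturbance frequency of canopy gaps was 0.24%, corresponding to a disturbance rotation period of about 416.7 years. Canopy gaps ranged from 17.9 to 340.3 m^2 in size, with gaps smaller than 100 m^2 being the most numerous, and had a mean area of 93.60 m^2. Expanded gaps ranged from 43.6 to 482.3 m^2, with most falling between 50 and 200 m^2, and had a mean area of 174.34 m^2. Windthrow was the main mode of gap formation. Most gaps were formed by 2 to 6 gap-maker trees, most often by 3; gaps formed by a single tree or by 7 or more trees were rare. Most gaps had formed 10 to 40 years earlier, especially 20 to 30 years earlier, and few dated from other periods. The gap makers were mostly Abies nephrolepis, larch and Picea jezoensis. Main-canopy trees 10 to 30 cm in diameter class and 25 to 30 m tall were the most likely to form gaps. Stand composition and the mode and intensity of gap disturbance varied with elevation.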
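    The rotation period reported above is consistent with taking the reciprocal of the annual disturbance frequency, which is the usual convention for gap turnover. A minimal check, assuming that convention (the abstract does not state how the figure was derived; the function name is hypothetical):

```python
def rotation_period_years(annual_disturbance_fraction):
    """Years needed to disturb an area equal to the whole stand at a constant annual rate."""
    if annual_disturbance_fraction <= 0:
        raise ValueError("annual disturbance fraction must be positive")
    return 1.0 / annual_disturbance_fraction


# 0.24% of the canopy area disturbed per year, as reported in the abstract.
print(round(rotation_period_years(0.24 / 100), 1))
```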

    19.
    Dental caries is one of the most common diseases of childhood. The aim of this study was to compare the cost of providing the Scotland-wide nursery toothbrushing programme with associated National Health Service (NHS) cost savings from improvements in the dental health of five-year-old children: through avoided dental extractions, fillings and potential treatments for decay.

    Methods

    Estimated costs of the nursery toothbrushing programme in 2011/12 were requested from all Scottish Health Boards. Unit costs of a filled, extracted and decayed primary tooth were calculated using verifiable sources of information. Total costs associated with dental treatments were estimated for the period from 1999/00 to 2009/10. These costs were based on the unit costs above and using the data of the National Dental Inspection Programme and then extrapolated to the population level. Expected cost savings were calculated for each of the subsequent years in comparison with the 2001/02 dental treatment costs. Population standardised analysis of hypothetical cohorts of 1000 children per deprivation category was performed.

    Results

    The estimated cost of the nursery toothbrushing programme in Scotland was £1,762,621 per year. The estimated cost of dental treatments in the baseline year 2001/02 was £8,766,297, while in 2009/10 it was £4,035,200. In 2002/03 the costs of dental treatments increased by £213,380 (2.4%). In the following years the costs decreased dramatically with the estimated annual savings ranging from £1,217,255 in 2003/04 (13.9% of costs in 2001/02) to £4,731,097 in 2009/10 (54.0%). Population standardised analysis by deprivation groups showed that the largest decrease in modelled costs was for the most deprived cohort of children.

    Conclusions

    The NHS costs associated with the dental treatments for five-year-old children decreased over time. In the eighth year of the toothbrushing programme the expected savings were more than two and a half times the costs of the programme implementation.
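    The headline figures in the Results can be reproduced from the three reported totals. The sketch below only redoes that arithmetic; the function and variable names are hypothetical, and the values are those quoted in the abstract.

```python
def programme_savings(baseline_cost, later_cost, programme_cost):
    """Return (savings, savings as a share of baseline, savings per pound of programme cost)."""
    savings = baseline_cost - later_cost
    return savings, savings / baseline_cost, savings / programme_cost


# Reported totals in pounds: 2001/02 baseline, 2009/10 treatment costs, annual programme cost.
savings, share, ratio = programme_savings(8_766_297, 4_035_200, 1_762_621)
print(f"savings: £{savings:,}")
print(f"share of baseline: {share:.1%}")
print(f"savings / programme cost: {ratio:.2f}")
```

    The savings-to-cost ratio comes out a little above 2.5, matching the statement that expected savings in 2009/10 were more than two and a half times the cost of running the programme.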

    20.
    Summary: The early development of notochord cells may be divided into three phases according to the quantitative evaluation of gap junctions: late gastrula, neurula and from tailbud to tadpole. In late gastrula, the percentage of the area of gap junctions to total membrane is 0.054% and most of the gap junctions are small in size. During the stages of neurulation, the ratios of gap junctions to total membrane area increase and remain high (0.106–0.181%), and the majority of the gap junctions are of medium and large size. The high ratios of gap junctions to membrane area during neurulation suggest that intercellular communication via gap junctions is important during this period. In the stages from tailbud to tadpole the ratios decrease and drop drastically to 0.001% and most of the gap junctions found are small in size. It is in the last phase that gap junctions of altered configuration appear.
