Similar articles
20 similar articles found (search time: 15 ms)
1.
Summary Three measures of sequence dissimilarity have been compared on a computer-generated model system in which substitutions in random sequences were made at randomly selected sites and the replacement character was chosen at random from the set of characters different from the original occupant of the site. The three measures were the conventional mismatch count between aligned sequences (AMC = m) and two measures not requiring prior sequence alignment. The latter two measures were the squared Euclidean distance between vectors of counts of t-tuples (t = 1–6) of characters in the two sequences (multiplet distribution distances, or MDD = d) and counts of characters not covered by word structures of statistically significant length common to the two sequences (common long words, or CLW = SIB, SIS, or SAB). Average MDD distances were found to be two times average mismatch counts in the simulated sequences for all values of t from 1 to 6 and all degrees of substitution, from one per sequence to so many as to produce, effectively, random sequences. This simple relation held independently of sequence length and of sequence composition. The relation was confirmed by exact results on small model systems and by formal asymptotic results in the limit of so few substitutions that no double hits occur and in the limit of two random sequences. The coefficient of variation for MDD distances was greater than that for mismatch counts for singlets, but both measures approached the same low value for sextets. Needleman-Wunsch alignment produced incorrect mismatch counts at higher degrees of substitution. The model satisfied the conditions for the derivation of the Jukes-Cantor asymptotic adjustment, but its application produced increasingly bad results with increasing degrees of substitution, in accord with earlier results on model and natural sequences. This was a consequence of the sensitivity of the adjustment to error in the observations, which increases with the degree of substitution. Average CLW distances for a variety of common word structures were more or less parallel to MDD distances for appropriately long t-tuples. These results on model systems supported the validity of the two dissimilarity measures not requiring sequence alignment that was found in earlier work on natural sequences (Blaisdell 1989).
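The multiplet distribution distance described in this abstract can be sketched in a few lines. This is only an illustration, assuming overlapping t-tuples and raw counts; the function names `tuple_counts`, `mdd`, and `mismatch_count` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def tuple_counts(seq: str, t: int) -> Counter:
    """Count t-tuples in seq (assumption: overlapping windows, step 1)."""
    return Counter(seq[i:i + t] for i in range(len(seq) - t + 1))


def mdd(a: str, b: str, t: int = 1) -> int:
    """Squared Euclidean distance between the t-tuple count vectors of a and b."""
    ca, cb = tuple_counts(a, t), tuple_counts(b, t)
    # A Counter returns 0 for tuples that are absent from one sequence.
    return sum((ca[k] - cb[k]) ** 2 for k in ca.keys() | cb.keys())


def mismatch_count(a: str, b: str) -> int:
    """Mismatch count between two already aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


original = "ACGT"
mutated = "ACGA"  # one substitution at the last site
print(mdd(original, mutated, t=1), mismatch_count(original, mutated))
```

For this single substitution the singlet distance is 2 and the mismatch count is 1, which matches the factor of two reported for the simulations. The sketch does not attempt to reproduce the paper's averages over many random sequences.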

2.
Summary Various measures of sequence dissimilarity have been evaluated by how well the additive least squares estimation of edges (branch lengths) of an unrooted evolutionary tree fit the observed pairwise dissimilarity measures and by how consistent the trees are for different data sets derived from the same set of sequences. This evaluation provided sensitive discrimination among dissimilarity measures and among possible trees. Dissimilarity measures not requiring prior sequence alignment did about as well as did the traditional mismatch counts requiring prior sequence alignment. Application of Jukes-Cantor correction to singlet mismatch counts worsened the results. Measures not requiring alignment had the advantage of being applicable to sequences too different to be critically alignable. Two different measures of pairwise dissimilarity not requiring alignment have been used: (1) multiplet distribution distance (MDD), the square of the Euclidean distance between vectors of the fractions of base singlets (or doublets, or triplets, or…) in the respective sequences, and (2) complements of long words (CLW), the count of bases not occurring in significantly long common words. MDD was applicable to sequences more different than was CLW (noncoding), but the latter often gave better results where both measures were available (coding). MDD results were improved by using longer multiplets and, if the sequences were coding, by using the larger amino acid and codon alphabets rather than the nucleotide alphabet. The additive least squares method could be used to provide a reasonable consensus of different trees for the same set of species (or related genes).
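One plausible reason a Jukes-Cantor correction can worsen results is that it amplifies observational error as the mismatch fraction grows. The sketch below uses the standard Jukes-Cantor formula for a four-letter alphabet, d = -(3/4) ln(1 - 4p/3); the abstract itself does not state the formula, and the function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Standard Jukes-Cantor distance for an observed mismatch fraction p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_sensitivity(p: float) -> float:
    """Derivative dd/dp: how strongly an error in p propagates into d."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return 1.0 / (1.0 - 4.0 * p / 3.0)


for p in (0.1, 0.3, 0.5, 0.7):
    print(f"p={p:.1f}  d={jukes_cantor(p):.3f}  dd/dp={jc_sensitivity(p):.2f}")
```

The derivative grows without bound as p approaches 0.75, so a small error in a large observed mismatch fraction becomes a large error in the corrected distance.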

3.
Recent algorithms for parametric alignment (Waterman et al., 1992, Natl Acad. Sci. USA 89, 6090–6093; Gusfield et al., 1992, Proceedings of the Third Annual ACM-SIAM Discrete Algorithms) find optimal scores for all penalty parameters, both for global and local sequence alignment. This paper reviews those techniques. Then, in the main part of the paper, dynamic programming methods are used to compute ensemble alignment, finding all alignment scores for all parameters. Both global and local ensemble alignments are studied, and parametric alignment is used to compute near-optimal ensemble alignments.

4.
The classic algorithms of Needleman-Wunsch and Smith-Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces that are suitable for Needleman-Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. As the number of alignment programs applied on a whole genome scale continues to increase, so does the disagreement in their results. The alignments produced by different programs vary greatly, especially in non-coding regions of eukaryotic genomes where the biologically correct alignment is hard to find. Parametric alignment is one possible remedy. This methodology resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. This alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters.

5.
Unialgal isolates of Gelidium latifolium from northern Spain and western Norway were compared with respect to specific growth rate when kept under different combinations of light (20, 50, 100, 200, 300 µmol m⁻² s⁻¹) and temperature (17, 20, 24, 28, 31 °C). The Norwegian isolate grew almost twice as fast as the Spanish isolate under all combinations tested. Maximum growth rates for the Norwegian and Spanish isolates were 6.71% d⁻¹ and 3.64% d⁻¹, respectively. The results show the existence of ecotypes and the importance of inoculum selection in the development of a mass cultivation system for Gelidium.

6.
The G domain and domain II in the crystal structure of Thermus thermophilus elongation factor G (EF-G) were compared with the homologous domains in Thermus aquaticus elongation factor Tu (EF-Tu). Sequence alignment derived from the structural superposition was used to define conserved sequence elements in domain II. These elements and previously known conserved sequence elements in the G domain were used to guide the alignment of the sequences of Sulfolobus acidocaldarius elongation factor 2, human elongation factor 2, and Escherichia coli initiation factor 2 and release factor 3 to the aligned sequences of EF-G and EF-Tu. This alignment, which deviates from previously published alignments, has evolutionary implications and leads to alternative interpretations of biochemical data concerning the interaction of elongation factors with the α-sarcin/ricin region of the ribosome. A single conserved sequence motif in domain II was identified and used to further characterize the GTPase subfamily of translation factors and related proteins. It was shown that the motif is found in most if not all the members of the family. Apparently, the common characteristic of these GTPases is an extensive consensus structural unit that possibly accounts for a similar interaction with the ribosome and is composed of two domains homologous to the G domain and domain II in EF-Tu and EF-G.

7.

Background

The increasing abundance of neuromorphological data provides both the opportunity and the challenge to compare massive numbers of neurons from a wide diversity of sources efficiently and effectively. We implemented a modified global alignment algorithm representing axonal and dendritic bifurcations as strings of characters. Sequence alignment quantifies neuronal similarity by identifying branch-level correspondences between trees.

Results

The space generated from pairwise similarities is capable of classifying neuronal arbor types as well as, or better than, traditional topological metrics. Unsupervised cluster analysis produces groups that significantly correspond with known cell classes for axons, dendrites, and pyramidal apical dendrites. Furthermore, the distinguishing consensus topology generated by multiple sequence alignment of a group of neurons reveals their shared branching blueprint. Interestingly, the axons of dendritic-targeting interneurons in the rodent cortex associate with pyramidal axons but stand apart from the (more topologically symmetric) axons of perisomatic-targeting interneurons.

Conclusions

Global pairwise and multiple sequence alignment of neurite topologies enables detailed comparison of neurites and identification of conserved topological features in alignment-defined clusters. The methods presented also provide a framework for incorporation of additional branch-level morphological features. Moreover, comparison of multiple alignment with motif analysis shows that the two techniques provide complementary information respectively revealing global and local features.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0605-1) contains supplementary material, which is available to authorized users.

8.
Influence of benthic organisms on solute transport in lake sediments (cited by: 2; self-citations: 2; citations by others: 0)
Increased inputs of nutrients into the waters of Lake Okeechobee have raised concern that the lake is becoming hypereutrophic. One aspect in understanding the overall cycling and dynamics of the nutrients in the system is the effect of benthic organisms on solute transport. Various diffusional models have been used to approximate the effect of benthic organisms on solute transport within sediments using diffusion coefficient values which represent the pooled contributions of molecular diffusion ($D_s$) and enhanced solute mixing due to macrobenthos activity ($D_i$). The objective of this study was to investigate the impact of benthic activity on solute transport by measuring $D_s$ (i.e., no benthic activity) and an apparent-dispersion or mixing coefficient $D_m$ (i.e., with benthic activity) for the four major sediment types of Lake Okeechobee, Florida, using a reservoir technique. This method involved monitoring the depletion of a conservative tracer (tritiated water) from the overlying water (reservoir) resulting from transport into sediments, using disturbed sediments repacked in cores (3.2 cm diam.) and undisturbed cores (3.2 to 12 cm diam.) obtained from the lake. Additional estimates of $D_m$ and $D_s$ were also obtained by measuring tracer concentration profiles in the sediment cores at the end of a specified diffusion period. Molecular diffusion coefficients ($D_s$) measured for repacked cores of sand, littoral, mud and peat sediments ranged from 0.90 to 1.29 cm² d⁻¹, and estimates of $D_s$ were slightly higher in undisturbed cores without benthic organisms. $D_m$ values for undisturbed cores of mud, sand and littoral sediments having macrobenthic populations ranged from 2.09 to 24.78 cm² d⁻¹, values that were 1.6 to 15 times higher than those in sediments without benthic activity. Undisturbed cores of varying diameter from mud sediments had similar estimates of $D_m$ for tritium; however, the undisturbed cores with larger diameters from littoral sediments yielded larger estimates of $D_m$, reflecting the heterogeneity of benthic population densities and activity. Therefore, $D_s$ estimates may not adequately represent transport processes for mud, sand and littoral sediments of Lake Okeechobee; hence careful consideration should be given to the role of benthic organisms in the overall transport of solutes across the sediment-water interface. A contribution of the Florida Agricultural Experiment Station Journal Series No. R-01150.

9.
The size distributions of deletions, insertions, and indels (i.e., insertions or deletions) were studied, using 78 human processed pseudogenes and other published data sets. The following results were obtained: (1) Deletions occur more frequently than do insertions in sequence evolution; none of the pseudogenes studied shows significantly more insertions than deletions. (2) Empirically, the size distributions of deletions, insertions, and indels can be described well by a power law, i.e., $f_k = C k^{-b}$, where $f_k$ is the frequency of deletion, insertion, or indel with gap length $k$, $b$ is the power parameter, and $C$ is the normalization factor. (3) The estimates of $b$ for deletions and insertions from the same data set are approximately equal to each other, indicating that the size distributions for deletions and insertions are approximately identical. (4) The variation in the estimates of $b$ among various data sets is small, indicating that the effect of local structure exists but only plays a secondary role in the size distribution of deletions and insertions. (5) The linear gap penalty, which is most commonly used in sequence alignment, is not supported by our analysis; rather, the power law for the size distribution of indels suggests that an appropriate gap penalty is $w_k = a + b \ln k$, where $a$ is the gap creation cost and $b \ln k$ is the gap extension cost. (6) The higher frequency of deletion over insertion suggests that the gap creation cost of insertion ($a_i$) should be larger than that of deletion ($a_d$); that is, $a_i - a_d = \ln R$, where $R$ is the frequency ratio of deletions to insertions. Correspondence to: W.-H. Li
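The logarithmic gap penalty proposed in this abstract follows from taking the negative logarithm of a gap-length frequency that decays as a power of k. A minimal sketch, assuming purely illustrative values for `a`, `b`, and the per-residue extension cost (none of them come from the paper), compares it with a penalty that grows linearly in gap length:

```python
import math


def log_gap_penalty(k: int, a: float, b: float) -> float:
    """Penalty w_k = a + b * ln(k) for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return a + b * math.log(k)


def linear_gap_penalty(k: int, a: float, e: float) -> float:
    """Comparison penalty that grows linearly with gap length (assumed form)."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return a + e * (k - 1)


A, B, E = 3.0, 1.5, 1.5  # hypothetical parameter values for illustration
for k in (1, 2, 5, 10, 50):
    print(k, round(log_gap_penalty(k, A, B), 2), linear_gap_penalty(k, A, E))
```

Both penalties agree for a gap of length 1, but the logarithmic form charges far less for long gaps, which is what a heavy-tailed length distribution implies.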

10.
MOTIVATION: Although pairwise sequence alignment is essential in comparative genomic sequence analysis, it has proven difficult to precisely determine the gap penalties for a given pair of sequences. A common practice is to employ default penalty values. However, there are a number of problems associated with using gap penalties. First, alignment results can vary depending on the gap penalties, making it difficult to explore appropriate parameters. Second, the statistical significance of an alignment score is typically based on a theoretical model of non-gapped alignments, which may be misleading. Finally, there is no way to control the number of gaps for a given pair of sequences, even if the number of gaps is known in advance. RESULTS: In this paper, we develop and evaluate the performance of an alignment technique that allows the researcher to set the number of allowable gaps a priori, rather than using gap penalties. We compare this approach with the Smith-Waterman and Needleman-Wunsch techniques on a set of structurally aligned protein sequences. We demonstrate that this approach outperforms the other techniques, especially for short sequences (56-133 residues) with low similarity (<25%). Further, by employing a statistical measure, we show that it can be used to assess the quality of the alignment in relation to the true alignment with the associated optimal number of gaps. AVAILABILITY: The implementation of the described methods, SANK_AL, is available at http://cbbc.murdoch.edu.au/ CONTACT: matthew@cbbc.murdoch.edu.au.

11.
Nute, Michael; Warnow, Tandy. BMC Genomics, 2016, 17(10): 764-144

Background

Multiple sequence alignment is an important task in bioinformatics, and alignments of large datasets containing hundreds or thousands of sequences are increasingly of interest. While many alignment methods exist, the most accurate alignments are likely to be based on stochastic models where sequences evolve down a tree with substitutions, insertions, and deletions. While some methods have been developed to estimate alignments under these stochastic models, only the Bayesian method BAli-Phy has been able to run on even moderately large datasets, containing 100 or so sequences. A technique to extend BAli-Phy to enable alignments of thousands of sequences could potentially improve alignment and phylogenetic tree accuracy on large-scale data beyond the best-known methods today.

Results

We use simulated data with up to 10,000 sequences representing a variety of model conditions, including some that are significantly divergent from the statistical models used in BAli-Phy and elsewhere. We give a method for incorporating BAli-Phy into PASTA and UPP, two strategies for enabling alignment methods to scale to large datasets, and give alignment and tree accuracy results measured against the ground truth from simulations. Comparable results are also given for other methods capable of aligning this many sequences.

Conclusions

Extensions of BAli-Phy using PASTA and UPP produce significantly more accurate alignments and phylogenetic trees than the current leading methods.

12.
Growth responses of Pithophora oedogonia (Mont.) Wittr. and Spirogyra sp. to nine combinations of temperature (15°, 25°, and 35°C) and photon flux rate (50, 100, and 500 μmol·m⁻²·s⁻¹) were determined using a three-factorial design. Maximum growth rates were measured at 35°C and 500 μmol·m⁻²·s⁻¹ for P. oedogonia (0.247 d⁻¹) and at 25°C and 500 μmol·m⁻²·s⁻¹ for Spirogyra sp. (0.224 d⁻¹). Growth rates of P. oedogonia were strongly inhibited at 15°C (average decrease = 89% of maximum rate), indicating that this species is warm stenothermal. Growth rates of Spirogyra sp. were only moderately inhibited at 15° and 35°C (average decrease = 36 and 30%, respectively), suggesting that this species is eurythermal over the temperature range employed. Photon flux rate had a greater influence on growth of Spirogyra sp. (31% reduction at 50 μmol·m⁻²·s⁻¹ and 25°C) than it did on growth of P. oedogonia (16% reduction at 50 μmol·m⁻²·s⁻¹ and 35°C). Spirogyra sp. also exhibited much greater adjustments to its content of chlorophyll a (0.22–3.34 μg·mg fwt⁻¹) than did P. oedogonia (1.35–3.08 μg·mg fwt⁻¹). The chlorophyll a content of Spirogyra sp. increased in response to both reductions in photon flux rate and high temperatures (35°C). Observed species differences are discussed with respect to in situ patterns of seasonal abundance in Surrey Lake, Indiana, the effect of algal mat anatomy on the internal light environment, and the process of acclimation to changes in temperature and irradiance conditions.

13.
Optimal sequence alignment allowing for long gaps (cited by: 7; self-citations: 0; citations by others: 7)
A new algorithm for optimal sequence alignment allowing for long insertions and deletions is developed. The algorithm requires O((L+C)MN) computational steps, O(LN) primary memory and O(MN) secondary memory storage, where M and N (M ≥ N) are sequence lengths, L (typically L ≤ 3) is the number of segments specifying the gap weighting function, and C is a constant. We have also modified our earlier traceback algorithm so that it finds all and only the optimal alignments in a compact form of a directed graph. The current versions accept a set of aligned sequences as input, which facilitates multiple sequence alignment by some iterative procedures. Dedicated to Professor Akiyoshi Wada on the occasion of his 60th birthday.

14.
Summary Sequence homologies among 34 chloroplast-type ferredoxins were examined using a computer program that quantitatively evaluates the extent of sequence similarity as a correlation coefficient. The resultant alignment contains six gaps representing insertions or deletions of some residues, all of which are located such that they precisely preserve the domains of structural fragments as determined by crystallographic data on Spirulina platensis ferredoxin. In the search for any total correlation between the chloroplast-type and 27 bacterial ferredoxins, 1891 comparison matrices prepared for possible combinations indicated that the bacterial basal sequence of 55 residues has been conserved evolutionarily in the chloroplast-type sequences corresponding to residue positions 36–90 of Spirulina platensis ferredoxin. In addition, the bacterial connector sequence region was found to be conserved. These findings strongly suggest that the bacterial and chloroplast-type ferredoxins descended from a common ancestor, and branched off after the bacterial gene duplication, whereas the chloroplast-type ferredoxins originally were generated by duplicating the already duplicated bacterial gene, i.e., by double-duplication.

15.

Background  

Multiple sequence alignment is fundamental. Exponential growth in computation time appears to be inevitable when an optimal alignment is required for many sequences. Exact costs of optimum alignments are therefore rarely computed. Consequently much effort has been invested in algorithms for alignment that are heuristic, or explore a restricted class of solutions. These give an upper bound on the alignment cost, but it is equally important to determine the quality of the solution obtained. In the absence of an optimal alignment with which to compare, lower bounds may be calculated to assess the quality of the alignment. As more effort is invested in improving upper bounds (alignment algorithms), it is therefore important to improve lower bounds as well. Although numerous cost metrics can be used to determine the quality of an alignment, many are based on sum-of-pairs (SP) measures and their generalizations.
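A sum-of-pairs cost can be illustrated with a small sketch. The unit costs below (mismatch 1, gap 1, gap against gap 0) and the function names are assumptions chosen for illustration; the abstract does not specify a particular cost scheme.

```python
from itertools import combinations


def pair_cost(x: str, y: str, mismatch: int = 1, gap: int = 1) -> int:
    """Cost of one aligned column entry pair; '-' denotes a gap."""
    if x == y:
        return 0  # identical residues, or gap against gap
    if x == "-" or y == "-":
        return gap
    return mismatch


def sp_cost(rows: list[str]) -> int:
    """Sum-of-pairs cost: column costs summed over every pair of rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_cost(r[c], s[c])
        for r, s in combinations(rows, 2)
        for c in range(len(r))
    )


alignment = ["AC-", "A-G", "ACG"]
print(sp_cost(alignment))
```

Any alignment produced by a heuristic gives an upper bound of this kind on the optimal SP cost, which is why a matching lower bound is needed to judge how far from optimal it might be.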

16.
A local algorithm for DNA sequence alignment with inversions (cited by: 1; self-citations: 0; citations by others: 1)
A dynamic programming algorithm to find all optimal alignments of DNA subsequences is described. The alignments use not only substitutions, insertions and deletions of nucleotides but also inversions (reversed complements) of substrings of the sequences. The inversion alignments themselves contain substitutions, insertions and deletions of nucleotides. We study the problem of alignment with non-intersecting inversions. To provide a computationally efficient algorithm we restrict candidate inversions to the K highest scoring inversions. An algorithm to find the J best non-intersecting alignments with inversions is also described. The new algorithm is applied to the regions of mitochondrial DNA of Drosophila yakuba and mouse coding for URF6 and cytochrome b, and the inversion of the URF6 gene is found. The open problem of intersecting inversions is discussed.
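The inversion operation itself, taking the reversed complement of a DNA substring, is simple to state in code. The sketch below shows only that operation, not the alignment algorithm; the helper names are hypothetical.

```python
# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reversed complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def invert_segment(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] by its reversed complement (end exclusive)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


print(reverse_complement("AAGC"))
print(invert_segment("TTAAGCTT", 2, 6))
```

An alignment with inversions scores a segment of one sequence against the reversed complement of a segment of the other, so a helper like this would be applied to candidate substrings before scoring.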

17.
Using site-directed mutagenesis, we investigated the roles of Ile66 and Ala107 of d-psicose 3-epimerase from Agrobacterium tumefaciens in binding O6 of its true substrate, d-fructose. When Ile66 was substituted with alanine, glycine, cysteine, leucine, phenylalanine, tryptophan, tyrosine or valine, all the mutants dramatically increased the $K_m$ for d-tagatose but slightly decreased the $K_m$ for d-fructose, indicating that Ile66 is involved in substrate recognition. When Ala107 was substituted by either isoleucine or valine, the substituted mutants had lower thermostability than the wild-type enzyme whereas the proline-substituted mutant had higher thermostability. Thus, Ala107 is involved in enzyme stability.

18.
Bayesian adaptive sequence alignment algorithms (cited by: 3; self-citations: 1; citations by others: 2)
The selection of a scoring matrix and gap penalty parameters continues to be an important problem in sequence alignment. We describe here an algorithm, the 'Bayes block aligner', which bypasses this requirement. Instead of requiring a fixed set of parameter settings, this algorithm returns the Bayesian posterior probability for the number of gaps and for the scoring matrices in any series of interest. Furthermore, instead of returning the single best alignment for the chosen parameter settings, this algorithm returns the posterior distribution of all alignments considering the full range of gapping and scoring matrices selected, weighing each in proportion to its probability based on the data. We compared the Bayes aligner with the popular Smith-Waterman algorithm with parameter settings from the literature which had been optimized for the identification of structural neighbors, and found that the Bayes aligner correctly identified more structural neighbors. In a detailed examination of the alignment of a pair of kinase and a pair of GTPase sequences, we illustrate the algorithm's potential to identify subsequences that are conserved to different degrees. In addition, this example shows that the Bayes aligner returns an alignment-free assessment of the distance between a pair of sequences.

19.
This paper presents a dynamic programming algorithm for aligning two sequences when the alignment is constrained to lie between two arbitrary boundary lines in the dynamic programming matrix. For affine gap penalties, the algorithm requires only O(F) computation time and O(M+N) space, where F is the area of the feasible region and M and N are the sequence lengths. The result extends to concave gap penalties, with somewhat increased time and space bounds. K.-M. C. and W. M. were supported in part by grant R01 LM05110 from the National Library of Medicine. R. C. H. was supported by PHS grant R01 DK27635.
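The idea of restricting dynamic programming to a feasible region can be sketched with a much simpler special case than the one in the paper. The code below is an assumption-laden illustration: it uses a fixed-width diagonal band instead of arbitrary boundary lines, a linear gap penalty instead of affine penalties, and it returns only the score. The function name and scoring values are hypothetical.

```python
def banded_global_score(a: str, b: str, band: int,
                        match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score using only cells (i, j) with |i - j| <= band."""
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    neg_inf = float("-inf")
    # Row 0: leading gaps, restricted to the columns inside the band.
    prev = {j: j * gap for j in range(min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[j] = i * gap  # column 0: leading gaps only
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                prev.get(j - 1, neg_inf) + sub,  # diagonal: (mis)match
                prev.get(j, neg_inf) + gap,      # from above: gap in b
                cur.get(j - 1, neg_inf) + gap,   # from the left: gap in a
            )
        prev = cur  # only two rows are kept in memory
    return prev[m]


print(banded_global_score("ACGT", "AGT", band=1))
```

Only cells inside the band are visited, so the work is proportional to the area of the feasible region and not to the full matrix, which is the property the abstract states as O(F).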

20.
The fate of contaminant carbon was monitored during aerobic biodegradation in the presence of a mixed indigenous microbial consortium in order to calibrate a microbial-growth-based biokinetic model. The methodology simultaneously monitored mineralization, substrate depletion and microbial population evolution in biomass extract spiked with ¹⁴C-labeled hexadecane. Hexadecane depletion and hexadecane-degrader population were monitored using sacrificed microcosms by centrifuging the extract so that the supernatant and the residue contained residual hexadecane and microbial population, respectively. This methodology allowed verification of the carbon mass balance (average ¹⁴C-carbon recovery of 90.33 ± 1.62% for biotic microcosms) and calibration of a biokinetic model. Four biokinetic parameters and three yield coefficients were identified (Haldane kinetic parameters: $\mu_S$ = 1.3639 d⁻¹, $K_s$ = 0.4295 mg-C, $K_I$ = 6.6457 mg-C; decay kinetic parameter: $\mu_d$ = 1.3.102 d⁻¹; substrate/biomass, carbon dioxide/biomass during growth and carbon dioxide/biomass during decay yield coefficients: $Y_s$ = 1.5948 mg-C/mg-C, $Y_{Pg}$ = 0.4554 mg-C/mg-C, $Y_{Pd}$ = 1.3263 mg-C/mg-C) and compared with the literature data. The methodology can facilitate the identification of biodegradation models by decoupling the intrinsic ability of microorganisms to degrade contaminant from restrictions imposed by limiting conditions.
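The Haldane parameters reported in this abstract can be plugged into the standard Haldane (substrate-inhibition) rate law, μ(S) = μ_max S / (K_s + S + S²/K_I). The abstract does not give the exact form of the authors' model, so treating the first reported parameter as the maximum specific growth rate is an assumption, as are the function names below.

```python
import math

# Values quoted in the abstract; their roles in the equation are assumed.
MU_MAX = 1.3639  # d^-1
K_S = 0.4295     # mg-C, half-saturation constant
K_I = 6.6457     # mg-C, inhibition constant


def haldane_rate(s: float, mu_max: float = MU_MAX,
                 k_s: float = K_S, k_i: float = K_I) -> float:
    """Specific growth rate under the standard Haldane inhibition law."""
    if s < 0:
        raise ValueError("substrate amount must be non-negative")
    return mu_max * s / (k_s + s + s * s / k_i)


def peak_substrate(k_s: float = K_S, k_i: float = K_I) -> float:
    """Substrate level that maximizes the Haldane rate: sqrt(K_s * K_I)."""
    return math.sqrt(k_s * k_i)


s_star = peak_substrate()
print(f"rate peaks near S = {s_star:.3f} mg-C")
```

Below the peak the rate rises with substrate, and above it the inhibition term dominates and the rate falls, which is the behaviour that distinguishes Haldane kinetics from simple Monod kinetics.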
