Similar literature
 Found 20 similar articles; search took 15 ms
1.
A new measure (CL) of spatial/structural landscape complexity is developed in this paper, based on the Levenshtein algorithm used in Computer Science and Bioinformatics for string comparisons. The Levenshtein distance (or edit distance) between two strings of symbols is the minimum number of replacements, deletions and insertions necessary to convert one string into the other. In this paper, it is shown how this measure can be applied to raster landscape maps of any size or shape. Calculations and applications are shown on model and real landscapes. The main advantages of this measure for structural (spatial) landscape analysis are the following: it is easily applicable; it can be compared to its maximum value (depending on the grid resolution); it can be used to compare structural/spatial complexities between landscapes; it is applicable to raster landscape maps of any shape; and it can be used to calculate changes in landscape complexity over time. At the level of ecological practice, it may aid in landscape monitoring, management and planning, by identifying areas of higher structural landscape complexity, which may deserve greater attention in the process of landscape conservation.
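The edit distance described in this abstract can be sketched as follows. This is a minimal illustration of the standard Levenshtein distance only; it is not the CL measure itself, whose definition the abstract does not give. The function name and the idea of encoding each raster row as a string of land-cover class codes are assumptions made for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-symbol replacements, deletions and
    insertions needed to turn sequence a into sequence b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # replace x by y, or match
        prev = curr
    return prev[-1]


# Hypothetical raster rows: F = forest, A = agriculture, W = water.
row_1 = "FFAAW"
row_2 = "FAAWW"
print(levenshtein(row_1, row_2))
```

Because the function accepts any two sequences, rows of different lengths (as in a map of irregular shape) can be compared directly.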

2.
Qian B, Goldstein RA 《Proteins》2001,45(1):102-104
Protein sequence alignment has become a widely used method in the study of newly sequenced proteins. Most sequence alignment methods use an affine gap penalty to assign scores to insertions and deletions. Although affine gap penalties represent the relative ease of extending a gap compared with initializing a gap, they are still an obvious oversimplification of the real processes that occur during sequence evolution. To improve the efficiency of sequence alignment methods and to obtain a better understanding of the process of sequence evolution, we wanted to find a more accurate model of insertions and deletions in homologous proteins. In this work, we extract the probability of a gap occurrence and the resulting gap length distribution in distantly related proteins (sequence identity < 25%) using alignments based on their common structures. We observe a distribution of gaps that can be fitted with a multiexponential with four distinct components. The results suggest new approaches to modeling insertions and deletions in sequence alignments.
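For readers unfamiliar with the two gap models contrasted here, the sketch below shows an affine gap penalty next to a multiexponential length distribution. The default penalty values and all names are hypothetical, and the abstract does not report the fitted weights or decay lengths, so the multiexponential takes them as caller-supplied arguments.

```python
import math


def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Affine cost of a gap of the given length: one opening charge plus a
    smaller charge for each further position.  The default values are
    illustrative assumptions, not taken from the paper."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


def multiexponential(length, components):
    """Sum of exponentially decaying terms, one per (weight, decay_length)
    pair.  The paper reports four components; their values are not given in
    the abstract, so they must be supplied by the caller."""
    return sum(weight * math.exp(-length / decay) for weight, decay in components)


print(affine_gap_penalty(1), affine_gap_penalty(5))
```

Some programs define the affine cost as `gap_open + gap_extend * length` instead; the two conventions differ only by a constant shift of the opening charge.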

3.
Aita T, Husimi Y, Nishigaki K 《Bio Systems》2011,106(2-3):67-75
To measure the similarity or dissimilarity between two given biological sequences, several papers have proposed metrics based on the "word-composition vector". The essence of these metrics is as follows. First, we count the appearance frequencies of all the K-tuple words throughout each of two given sequences. Then, the two given sequences are transformed into their respective word-composition vectors. Next, a distance metric, for example the angle between the two vectors, is calculated. A significant issue is to determine the optimal word size K. With a mathematical model of mutational events (including substitutions, insertions, deletions and duplications) that occur in sequences, we analyzed how the angle between the composition vectors depends on the mutational events. We also considered the optimal word size (=resolution) from our original approach. Our results were verified by computational experiments using artificially generated sequences, amino acid sequences of hemoglobin and nucleotide sequences of 16S ribosomal RNA.
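The three steps listed in this abstract (count K-tuple words, build the composition vectors, take the angle between them) can be sketched as below. The function names are hypothetical, and raw counts are used as vector entries; the paper may normalize or correct the counts differently.

```python
import math
from collections import Counter


def word_composition(seq, k):
    """Count every overlapping K-tuple word in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def composition_angle(seq_a, seq_b, k):
    """Angle in radians between the word-composition vectors of two sequences."""
    va, vb = word_composition(seq_a, k), word_composition(seq_b, k)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must contain at least one word of size k")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


print(composition_angle("ACGTACGTAC", "ACGTTCGTAC", 2))
```

Running the function for several values of `k` on the same pair of sequences is a simple way to see why the choice of word size matters.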

4.
An online tool named GEN-SNiP that identifies variations in a set of test DNA sequences with respect to a standard reference sequence has been developed and deployed successfully. The tool generates a list of substitutions, insertions and deletions for each test sequence, determined relative to the reference sequence. In the key batch mode feature, the tool allows multiple sequences to be compared and contrasted even when small insertions and deletions are present, with results sent to the user via email. Other distinguishing features of the tool are the grouping of continuous deletions or insertions in the test sequence into a single entity for better output handling, and the display of the alignment of the test and reference sequences and of the input sequence. The tool has been reported as unique in recent literature.

5.
Nucleotide insertions and deletions (indels) are responsible for gaps in sequence alignments. Indels are one of the major sources of evolutionary change at the molecular level. We have examined the patterns of insertions and deletions in 19 mammalian genomes, and found that deletion events are more common than insertions in these genomes. The numbers of both insertions and deletions decrease rapidly as the gap length increases, and single-nucleotide indels are the most frequent of all indel events. The frequencies of both insertions and deletions can be described well by a power law. Key Words: Insertion, deletion, gap, indel, mammalian genome.

6.
Sequence comparison is one of the major tasks in bioinformatics, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations. There are several similarity/dissimilarity measures for sequence comparison, but challenges remain. This paper presents a binomial model-based measure to analyze biological sequences. With the help of a random indicator, the occurrence of a word at any position of a sequence can be regarded as a Bernoulli random variable, and the distribution of the sum of the word occurrences is well known to be binomial. By using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested by extensive experiments including classification of HEV genotypes and phylogenetic analysis, and further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed measure based on the binomial model is more efficient.
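The two ingredients named in this abstract, a recursively computed binomial probability and the relative entropy, can be illustrated as below. This is only a sketch of the standard textbook recursion and the standard Kullback-Leibler definition; the abstract does not specify how the authors combine them into their measure, and all names here are assumptions.

```python
import math


def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p), using the
    recursion P(k + 1) = P(k) * (n - k) / (k + 1) * p / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    probs = [(1 - p) ** n]
    ratio = p / (1 - p)
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * ratio)
    return probs


def relative_entropy(dist_p, dist_q):
    """Relative entropy (Kullback-Leibler divergence) of dist_p from dist_q,
    in nats.  Assumes dist_q is positive wherever dist_p is."""
    return sum(a * math.log(a / b) for a, b in zip(dist_p, dist_q) if a > 0)


# Hypothetical example: a word with per-position probability 0.5 versus 0.25
# over four positions.
print(relative_entropy(binomial_pmf(4, 0.5), binomial_pmf(4, 0.25)))
```

The recursion avoids computing large factorials for each count, which matters when the number of positions is in the thousands.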

7.
《Journal of molecular biology》2019,431(12):2320-2330
Short insertions and deletions (InDels) are a common type of mutation found in nature and a useful source of variation in protein engineering. InDel events have important consequences in protein evolution, often opening new pathways for adaptation. However, much less is known about the effects of InDels compared to point mutations and amino acid substitutions. In particular, deep mutagenesis studies on the distribution of fitness effects of mutations have focused almost exclusively on amino acid substitutions. Here, we present a near-comprehensive analysis of the fitness effects of single amino acid InDels in TEM-1 β-lactamase. While we found InDels to be largely deleterious, partially overlapping deletion-tolerant and insertion-tolerant regions were observed throughout the protein, especially in unstructured regions and at the end of helices. The signal sequence of TEM-1 tolerated InDels more than the mature protein. Most regions of the protein tolerated insertions more than deletions, but a few regions tolerated deletions more than insertions. We examined the relationship between InDel tolerance and a variety of measures to help understand its origin. These measures included evolutionary variation in β-lactamases, secondary structure identity, tolerance to amino acid substitutions, solvent accessibility, and side-chain weighted contact number. We found secondary structure, weighted contact number, and evolutionary variation in class A β-lactamases to be somewhat predictive of InDel fitness effects.

8.
We have developed a computer-based method to identify candidate single nucleotide polymorphisms (SNPs) and small insertions/deletions from expressed sequence tag data. Using a redundancy-based approach, valid SNPs are distinguished from erroneous sequence by their representation multiple times in an alignment of sequence reads. A second measure of validity was also calculated based on the cosegregation of the SNP pattern between multiple SNP loci in an alignment. The utility of this method was demonstrated by applying it to 102,551 maize (Zea mays) expressed sequence tag sequences. A total of 14,832 candidate polymorphisms were identified with an SNP redundancy score of two or greater. Segregation of these SNPs with haplotype indicates that candidate SNPs with high redundancy and cosegregation confidence scores are likely to represent true SNPs. This was confirmed by validation of 264 candidate SNPs from 27 loci, with a range of redundancy and cosegregation scores, in four inbred maize lines. The SNP transition/transversion ratio and insertion/deletion size frequencies correspond to those observed by direct sequencing methods of SNP discovery and suggest that the majority of predicted SNPs and insertions/deletions identified using this approach represent true genetic variation in maize.

9.
As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search of causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.

10.
MOTIVATION: A quantitative study of molecular evolutionary events such as substitutions, insertions and deletions from closely related genomes requires (1) an accurate multiple sequence alignment program and (2) a method to annotate the insertions and deletions that explain the 'gaps' in the alignment. Although the former requirement has been extensively addressed, the latter problem has received little attention, especially in a comprehensive probabilistic framework. RESULTS: Here, we present Indelign, a program that uses a probabilistic evolutionary model to compute the most likely scenario of insertions and deletions consistent with an input multiple alignment. It is also capable of modifying the given alignment so as to obtain a better agreement with the evolutionary model. We find close to optimal performance and substantial improvement over alternative methods, in tests of Indelign on synthetic data. We use Indelign to analyze regulatory sequences in Drosophila, and find an excess of insertions over deletions, which is different from what has been reported for neutral sequences. AVAILABILITY: The Indelign program may be downloaded from the website http://veda.cs.uiuc.edu/indelign/ SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.

11.
The sequence requirements for splicing of the Tetrahymena pre-rRNA have been examined by altering the rRNA gene to produce versions that contain insertions and deletions within the intervening sequence (IVS). The altered genes were transcribed and the RNA tested for self-splicing in vitro. A number of insertions (8-54 nucleotides) at three locations had no effect on self-splicing activity. Two of these insertions, located at a site 5 nucleotides preceding the 3'-end of the IVS, did not alter the choice of the 3' splice site. Thus the 3' splice site is not chosen by its distance from a fixed point within the IVS. Analysis of deletions constructed at two sites revealed two structures, a hairpin loop and a stem-loop, that are entirely dispensable for IVS excision in vitro. Three other regions were found to be necessary. The regions that are important for self-splicing are not restricted to the conserved sequence elements that define this class of intervening sequences. The requirement for structures within the IVS for pre-rRNA splicing is in sharp contrast to the very limited role of IVS structure in nuclear pre-mRNA splicing.

12.
Amino acid similarity often needs to be considered in DNA sequence comparison to elucidate gene functions. We propose a Smith-Waterman-like algorithm which considers amino acid similarity and insertions/deletions in sequences at the DNA level and at the protein level in a hybrid manner. The algorithm is applied to cDNA sequences of Oryza sativa and those of Arabidopsis thaliana. The results are compared with the results of application of NCBI's tblastx program (which compares the sequences in the BLAST manner after translation). It is shown that the present algorithm is very helpful in discovering nucleotide insertions/deletions originating from experimental errors as well as amino acid insertions/deletions due to evolutionary reasons.
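As background for the "Smith-Waterman-like" algorithm mentioned here, the sketch below computes the score of the best local alignment with the plain Smith-Waterman recurrence and a linear gap cost. It is not the hybrid DNA/protein algorithm proposed in the paper; the scoring values and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.
    Scoring values are illustrative defaults, not taken from the paper."""
    best = 0
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if x == y else mismatch)
            score = max(0,                   # start a new local alignment
                        diagonal,            # align x with y
                        prev[j] + gap,       # gap in b (x unpaired)
                        curr[j - 1] + gap)   # gap in a (y unpaired)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# One inserted nucleotide in the second sequence costs a single gap.
print(smith_waterman_score("ACGTACGT", "ACGTTACGT"))
```

Resetting negative scores to zero is what makes the alignment local: a poorly matching stretch ends the current alignment instead of dragging its score down.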

13.
Although oligonucleotide probes complementary to single nucleotide substitutions are commonly used in microarray-based screens for genetic variation, little is known about the hybridization properties of probes complementary to small insertions and deletions. It is necessary to define the hybridization properties of these latter probes in order to improve the specificity and sensitivity of oligonucleotide microarray-based mutational analysis of disease-related genes. Here, we compare and contrast the hybridization properties of oligonucleotide microarrays consisting of 25mer probes complementary to all possible single nucleotide substitutions and insertions, and one and two base deletions in the 9168 bp coding region of the ATM (ataxia telangiectasia mutated) gene. Over 68 different dye-labeled single-stranded nucleic acid targets representing all ATM coding exons were applied to these microarrays. We assess hybridization specificity by comparing the relative hybridization signals from probes perfectly matched to ATM sequences to those containing mismatches. Probes complementary to two base substitutions displayed the highest average specificity followed by those complementary to single base substitutions, single base deletions and single base insertions. In all cases, hybridization specificity was strongly influenced by sequence context and possible intra- and intermolecular probe and/or target structure. Furthermore, single nucleotide substitution probes displayed the most consistent hybridization specificity data followed by single base deletions, two base deletions and single nucleotide insertions. Overall, these studies provide valuable empirical data that can be used to more accurately model the hybridization properties of insertion and deletion probes and improve the design and interpretation of oligonucleotide microarray-based resequencing and mutational analysis.

14.
The nucleotide sequence of 16S rDNA from Euglena gracilis chloroplasts has been determined, representing the first complete sequence of an algal chloroplast rRNA gene. The structural part of the 16S rRNA gene has 1491 nucleotides according to a comparative analysis of our sequencing results with the published 5'- and 3'-terminal "T1-oligonucleotides" from 16S rRNA from E. gracilis. Alignment with 16S rDNA from Zea mays chloroplasts and E. coli reveals 80% and 72% sequence homology, respectively. Two deletions of 9 and 23 nucleotides are found which are identical in size and position with deletions observed in 16S rDNA of maize and tobacco chloroplasts and which seem to be characteristic for all chloroplast rRNA species. We also find insertions and deletions in E. gracilis not seen in 16S rDNA of higher plant chloroplasts. The 16S rRNA sequence of E. gracilis chloroplasts can be folded by base pairing according to the general 16S rRNA secondary structure model.

15.
Nam K, Ellegren H 《PLoS genetics》2012,8(5):e1002680
Selective and/or neutral processes may govern variation in DNA content and, ultimately, genome size. The observation in several organisms of a negative correlation between recombination rate and intron size could be compatible with a neutral model in which recombination is mutagenic for length changes. We used whole-genome data on small insertions and deletions within transposable elements from chicken and zebra finch to demonstrate clear links between recombination rate and a number of attributes of reduced DNA content. Recombination rate was negatively correlated with the length of introns, transposable elements, and intergenic spacer and with the rate of short insertions. Importantly, it was positively correlated with gene density, the rate of short deletions, the deletion bias, and the net change in sequence length. All these observations point at a pattern of more condensed genome structure in regions of high recombination. Based on the observed rates of small insertions and deletions and assuming that these rates are representative for the whole genome, we estimate that the genome of the most recent common ancestor of birds and lizards has lost nearly 20% of its DNA content up until the present. Expansion of transposable elements can counteract the effect of deletions in an equilibrium mutation model; however, since the activity of transposable elements has been low in the avian lineage, the deletion bias is likely to have had a significant effect on genome size evolution in dinosaurs and birds, contributing to the maintenance of a small genome. We also demonstrate that most of the observed correlations between recombination rate and genome contraction parameters are seen in the human genome, including for segregating indel polymorphisms. Our data are compatible with a neutral model in which recombination drives vertebrate genome size evolution and gives no direct support for a role of natural selection in this process.

16.
To study the mechanisms for local evolutionary changes in DNA sequences involving slippage-type insertions and deletions, an alignment approach is explored that can consider the posterior probabilities of alignment models. Various patterns of insertion and deletion that can link the ancestor and descendant sequences are proposed and evaluated by simulation and compared by the Markov chain Monte Carlo (MCMC) method. Analyses of pseudogenes reveal that the introduction of the parameters that control the probability of slippage-type events markedly augments the probability of the observed sequence evolution, arguing that a cryptic involvement of slippage occurrences is manifested as insertions and deletions of short nucleotide segments. Strikingly, approximately 80% of insertions in human pseudogenes and approximately 50% of insertions in murid pseudogenes are likely to be caused by the slippage-mediated process, as represented by BC in ABCD --> ABCBCD. We suggest that, in both humans and murids, even very short repetitive motifs, such as CAGCAG, CACACA, and CCCC, have approximately 10- to 15-fold susceptibility to insertions and deletions, compared to nonrepetitive sequences. Our protocol, namely, indel-MCMC, thus seems to be a reasonable approach for statistical analyses of the early phase of microsatellite evolution.
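The slippage pattern ABCD --> ABCBCD quoted in this abstract can be made concrete with a small sketch. It only shows what a tandem (slippage-type) insertion looks like and how one might recognize it once the inserted segment is known; it is not the indel-MCMC protocol, and both function names are hypothetical.

```python
def slip_insert(seq, start, length):
    """Duplicate seq[start:start + length] in tandem, as in ABCD -> ABCBCD."""
    unit = seq[start:start + length]
    return seq[:start + length] + unit + seq[start + length:]


def is_tandem_copy(seq, pos, length):
    """True if the segment seq[pos:pos + length] repeats the segment lying
    directly before it or directly after it."""
    if length <= 0:
        return False
    unit = seq[pos:pos + length]
    if len(unit) < length:
        return False  # segment runs past the end of the sequence
    before = seq[pos - length:pos] if pos >= length else ""
    after = seq[pos + length:pos + 2 * length]
    return unit == before or unit == after


descendant = slip_insert("ABCD", 1, 2)
print(descendant, is_tandem_copy(descendant, 3, 2))
```

An insertion that fails this check, such as XY in ABXYCD, would count as a non-slippage insertion under this simplified criterion.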

17.
While compiling genetic linkage maps in several plant species based upon restriction fragment length polymorphisms (RFLPs), it was noted that the incidence of polymorphism differs among species. The basis of this disparity was investigated in this study by examining the nucleotide sequence at homologous loci among distinct cultivars within two species which exhibit considerably different levels of RFLPs. Using the polymerase chain reaction, homologous regions from different cultivars were first amplified and the nucleotide sequences of the products were determined. Four genomic regions of seven maize cultivars and three genomic regions of eight melon cultivars were examined to compare the respective levels of sequence variation between the two species. Levels of variation for both base substitutions and insertions/deletions varied widely among the maize sequences and between maize and melon for base substitutions. Estimates of theta (a measure of polymorphism) ranged from 0 to 0.002 in melon and from 0.006 to 0.040 for base substitutions and from 0.002 to 0.023 for insertions/deletions in maize. Critical value tests and chi-squared tests suggested that in maize the underlying processes generating and maintaining neutral mutations differ among the regions. The results not only suggest that several mechanisms are necessary to explain the variation seen in these two species, but also point to some basic dissimilarities in the organization and maintenance of the genomes of different plant species.

18.
Insertions and deletions of nucleotides in the genes encoding the variable domains of antibodies are natural components of the hypermutation process, which may expand the available repertoire of hypervariable loop lengths and conformations. Although insertion of amino acids has also been utilized in antibody engineering, little is known about the functional consequences of such modifications. To investigate this further, we have introduced single-codon insertions and deletions as well as more complex modifications in the complementarity-determining regions of human antibody fragments with different specificities. Our results demonstrate that single amino acid insertions and deletions are generally well tolerated and permit production of stably folded proteins, often with retained antigen recognition, despite the fact that the thus modified loops carry amino acids that are disallowed at key residue positions in canonical loops of the corresponding length or are of a length not associated with a known canonical structure. We have thus shown that single-codon insertions and deletions can efficiently be utilized to expand structure and sequence space of the antigen-binding site beyond what is encoded by the germline gene repertoire.

19.
20.