
1.

Using Triplet Periodicity of Nucleotide Sequences for Finding Potential Reading Frame Shifts in Genes

F.E. Frenkel E.V. Korotkov 《DNA research》2009,16(2):105-114

We introduce a novel approach for the detection of possible mutations leading to a reading frame (RF) shift in a gene. Deletions and insertions in DNA coding regions are significant events for genes because an RF shift modifies an extensive region of the amino acid sequence coded by a gene. The suggested method is based on the phenomenon of triplet periodicity (TP) in coding regions of genes and its relative resistance to substitutions in the DNA sequence. We attempted to extend 326 933 regions of continuous TP found in genes from the KEGG databank by considering possible insertions and deletions. In total, we found 824 genes where such an extension was possible and statistically significant. We then generated amino acid sequences according to the active (KEGG's) and hypothetically ancient RFs in order to find confirmation of a shift at the protein level. Consequently, 64 sequences have protein similarities only for the ancient RF, 176 only for the active RF, 3 for both, and 581 have no protein similarity at all. We aimed to establish a lower bound for the number of genes in which a shift between RF and TP is possible. Further ways to increase the number of revealed RF shifts are discussed.
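As a rough illustration of what triplet periodicity means (this is not the authors' procedure), the sketch below tabulates how often each nucleotide occurs at codon positions 0, 1, and 2 and scores the dependence between nucleotide and position with mutual information. The function names and the choice of mutual information as the statistic are assumptions made for this example.

```python
import math

def triplet_matrix(seq, phase=0):
    """Count each nucleotide at positions 0, 1, 2 of consecutive triplets,
    starting the first triplet at index `phase`."""
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for i, base in enumerate(seq[phase:]):
        if base in counts:  # skip ambiguous symbols such as N
            counts[base][i % 3] += 1
    return counts

def mutual_information(counts):
    """Mutual information (bits) between nucleotide and triplet position.
    Zero means no periodicity signal; larger means stronger periodicity."""
    total = sum(sum(row) for row in counts.values())
    if total == 0:
        return 0.0
    col_sums = [sum(counts[b][j] for b in counts) for j in range(3)]
    mi = 0.0
    for row in counts.values():
        row_sum = sum(row)
        for j, n in enumerate(row):
            if n:
                mi += (n / total) * math.log2(n * total / (row_sum * col_sums[j]))
    return mi

periodic = "ATG" * 50      # perfectly periodic toy sequence
homopolymer = "A" * 150    # no dependence on position
print(mutual_information(triplet_matrix(periodic)))
print(mutual_information(triplet_matrix(homopolymer)))
```

A perfectly periodic sequence reaches the maximum of log2(3) bits, while a sequence whose composition does not depend on position scores zero.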

2.

Predicting the Functional Effect of Amino Acid Substitutions and Indels

Yongwook Choi Gregory E. Sims Sean Murphy Jason R. Miller Agnes P. Chan 《PloS one》2012,7(10)

As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search of causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
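The idea of scoring a variant by the change in alignment similarity can be sketched as follows. This is a toy illustration only, not the PROVEAN implementation: the global alignment with unit match/mismatch scores and a linear gap penalty, the single homolog, and the function names `nw_score` and `delta_score` are all assumptions chosen to keep the example short.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty
    (toy scoring scheme, assumed for illustration)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def delta_score(query, variant, homolog):
    """Change in similarity to the homolog after introducing the variation.
    Negative values mean the variant is less similar to the homolog."""
    return nw_score(variant, homolog) - nw_score(query, homolog)

query = "MKTAYIAK"
homolog = "MKTAYIAK"
print(delta_score(query, "MKTWYIAK", homolog))  # single substitution A4W
print(delta_score(query, "MKTYIAK", homolog))   # in-frame deletion of A4
```

Because the score is computed on whole aligned sequences, the same function handles substitutions, in-frame insertions, and in-frame deletions without special cases.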

3.

Isolation of cDNA clones for human erythrocyte membrane sialoglycoproteins alpha and delta.

C G Tate M J Tanner 《The Biochemical journal》1988,254(3):743-750

We have isolated almost full-length cDNA clones corresponding to human erythrocyte membrane sialoglycoproteins alpha (glycophorin A) and delta (glycophorin B). The predicted amino acid sequence of delta differs at two amino acid residues from the sequence determined by peptide sequencing. The sialoglycoprotein delta clone we have isolated contains an interrupting sequence within the region that gives rise to the cleaved N-terminal leader sequence for the protein and represents a product that is unlikely to be inserted into the erythrocyte membrane. Comparison of the cDNA sequences of alpha and delta shows very strong homology at the DNA level within the coding regions. The two mRNA sequences are closely related and differ by a number of clearly defined insertions and deletions.

4.

Alternative pathways of methyl methanesulfonate-induced mutagenesis in Escherichia coli

Ewa Sledziewska-Gójska Celina Janion 《Molecular & general genetics : MGG》1989,216(1):126-131

5.

Developing a mathematical method to search for latent periodicity in protein amino-acid sequences with deletions and insertions

E. V. Korotkov M. A. Korotkova 《Biophysics》2015,60(6):876-885

A mathematical method has been developed to search for latent periodicity in protein amino-acid and other symbolic sequences using dynamic programming and random matrices. The method allows the detection of latent periodicity with insertions and deletions at positions that are unknown beforehand. The developed method has been applied to search for periodicity in the amino-acid sequences of several proteins and in the euro/dollar exchange rate since 2001. The presence of a long period with insertions and deletions in amino-acid sequences is shown. A period length of seven amino acids is observed in proteins that contain supercoiled regions (a coiled-coil structure), as well as period lengths of six, five, or more amino acids. Period lengths of 6 and 7 days, as well as 24 and 25 h, are observed in the analyzed financial time series; note that this periodicity is detectable only when insertions and deletions are taken into account. The causes that underlie the occurrence of latent periodicity with insertions and deletions in amino-acid sequences and financial time series are discussed.

6.

Fold change in evolution of protein structures

Grishin NV 《Journal of structural biology》2001,134(2-3):167-185

Typically, protein spatial structures are more conserved in evolution than amino acid sequences. However, the recent explosion of sequence and structure information accompanied by the development of powerful computational methods led to the accumulation of examples of homologous proteins with globally distinct structures. Significant sequence conservation, local structural resemblance, and functional similarity strongly indicate evolutionary relationships between these proteins despite pronounced structural differences at the fold level. Several mechanisms such as insertions/deletions/substitutions, circular permutations, and rearrangements in beta-sheet topologies account for the majority of detected structural irregularities. The existence of evolutionarily related proteins that possess different folds brings new challenges to the homology modeling techniques and the structure classification strategies and offers new opportunities for protein design in experimental studies.

7.

Molecular analysis of hybrid dysgenesis-induced derivatives of a P-element allele at the vg locus.

J A Williams S S Pappu J B Bell 《Molecular and cellular biology》1988,8(4):1489-1497

8.

The polygalacturonases of Aspergillus niger are encoded by a family of diverged genes.

H J Bussink F P Buxton B A Fraaye L H de Graaff J Visser 《European journal of biochemistry》1992,208(1):83-90

Aspergillus niger produces several polygalacturonases that, with other enzymes, are involved in the degradation of pectin. One of the two previously characterized genes coding for the abundant polygalacturonases I and II (PGI and PGII) found in a commercial pectinase preparation was used as a probe to isolate five more genes by screening a genomic DNA library in phage lambda EMBL4 using conditions of moderate stringency. The products of these genes were detected in the culture medium of Aspergillus nidulans transformants on the basis of activity measurements and Western-blot analysis using a polyclonal antibody raised against PGI. These transformants were, with one exception, constructed using phage DNA. A. nidulans transformants secreted high amounts of PGI and PGII in comparison to the previously characterized A. niger transformants and a novel polygalacturonase (PGC) was produced at high levels by A. nidulans transformed with the subcloned pgaC gene. This gene was sequenced and the protein-coding region was found to be interrupted by three introns; the different intron/exon organization of the three sequenced A. niger polygalacturonase genes can be explained by the gain or loss of two single introns. The pgaC gene encodes a putative 383-amino-acid prepro-protein that is cleaved after a pair of basic amino acids and shows approximately 60% amino acid sequence similarity to the other polygalacturonases in the mature protein. The N-terminal amino acid sequences of the A. niger polygalacturonases display characteristic amino acid insertions or deletions that are also observed in polygalacturonases of phytopathogenic fungi. In the upstream regions of the A. niger polygalacturonase genes, a sequence of ten conserved nucleotides comprising a CCAAT sequence was found, which is likely to represent a binding site for a regulatory protein as it shows a high similarity to the yeast CYC1 upstream activation site recognized by the HAP2/3/4 activation complex.

9.

Latent periodicity of protein families, identified with the indel-aware algorithm

Laskin AA Skryabin KG Korotkov EV 《Journal of proteome research》2007,6(2):862-868

Latent amino acid repeats seem to be widespread in genetic sequences and to reflect their structure, function, and evolution. We have recently identified latent periodicity in more than 150 protein families including protein kinases and various nucleotide-binding proteins. The latent repeats in these families were correlated to their structure and evolution. However, a majority of known protein families were not identified with our latent periodicity search algorithm. The main presumable reason for this was the inability of our techniques to identify periodicities interspersed with insertions and deletions. We designed a new latent periodicity search algorithm, which is capable of taking into account insertions and deletions. As a result, we identified many novel cases of latent periodicity peculiar to protein families. Possible origins of the periodic structure of these families are discussed. In summary, we presume that latent periodicity is present in a substantial portion of known protein families. The latent periodicity matrices and the results of Swiss-Prot scans are available from http://bioinf.narod.ru/del/.

10.

An alternative to the accepted phylogeny of purple bacteria based on 16S rRNA: analyses of the amino acid sequences of cytochromes C2 and C556 from Rhodobacter (Rhodovulum) sulfidophilus

Ambler RP Meyer TE Bartsch RG Cusanovich MA 《Archives of biochemistry and biophysics》2001,388(1):25-33

It is becoming increasingly apparent from complete genome sequences that 16S rRNA data, as currently interpreted, does not provide an unambiguous picture of bacterial phylogeny. In contrast, we have found that analysis of insertions and deletions in the amino acid sequences of cytochrome c2 has some advantages in establishing relationships and that this approach may have broad utility in acquiring a better understanding of bacterial relationships. The amino acid sequences of cytochromes c2 and c556 have been determined in whole or in part from four strains of Rhodobacter sulfidophilus. The cytochrome c2 contains three- and eight-residue insertions as well as a single-residue deletion in common with the large cytochromes c2 but in contrast to the small cytochromes c2 and mitochondrial cytochromes. In addition, the Rb. sulfidophilus protein shares a rare six- to seven-residue insertion with other Rhodobacter cytochromes c2. The cytochrome c556 is a low-spin class II cytochrome c homologous to the greater family of cytochromes c', which are usually high-spin. The similarity of cytochrome c556 to other species of class II cytochromes is consistent with the relationships deduced from comparisons of cytochromes c2. Thus, our results do not support placement of Rb. sulfidophilus in a separate genus, Rhodovulum, which was proposed primarily on the basis of 16S rRNA sequences. Instead, the Rhodobacter cytochromes c2 are distinct from those of other genera and species of purple bacteria and show a different pattern of relationships among species than reported for 16S rRNA.

11.

SIFT Indel: Predictions for the Functional Effects of Amino Acid Insertions/Deletions in Proteins

Jing Hu Pauline C. Ng 《PloS one》2013,8(10)

Indels in the coding regions of a gene can either cause frameshifts or amino acid insertions/deletions. Frameshifting indels are indels that have a length that is not divisible by 3 and subsequently cause frameshifts. Indels that have a length divisible by 3 cause amino acid insertions/deletions or block substitutions; we call these 3n indels. The new amino acid changes resulting from 3n indels could potentially affect protein function. Therefore, we construct a SIFT Indel prediction algorithm for 3n indels which achieves 82% accuracy, 81% sensitivity, 82% specificity, 82% precision, 0.63 MCC, and 0.87 AUC by 10-fold cross-validation. We have previously published a prediction algorithm for frameshifting indels. The rules for the prediction of 3n indels are different from the rules for the prediction of frameshifting indels and reflect the biological differences of these two different types of variations. SIFT Indel was applied to human 3n indels from the 1000 Genomes Project and the Exome Sequencing Project. We found that common variants are less likely to be deleterious than rare variants. The SIFT indel prediction algorithm for 3n indels is available at http://sift-dna.org/
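The distinction between frameshifting and 3n indels, and the Matthews correlation coefficient (MCC) quoted among the performance figures, can be illustrated with a short sketch. It is not part of SIFT Indel: the VCF-style `ref`/`alt` allele representation and the function names are assumptions for this example, and the confusion-matrix counts in the usage lines are made up.

```python
import math

def classify_indel(ref, alt):
    """Classify a coding indel by the net change in length between the
    reference and alternate alleles (both given as nucleotide strings)."""
    diff = abs(len(alt) - len(ref))
    if diff == 0:
        return "not an indel"
    return "3n indel" if diff % 3 == 0 else "frameshifting indel"

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(classify_indel("A", "ACGT"))   # three inserted bases
print(classify_indel("ACG", "A"))    # two deleted bases
print(mcc(tp=40, tn=40, fp=10, fn=10))
```

MCC ranges from -1 to 1 and stays informative when the two classes are of unequal size, which is one reason it is often reported alongside accuracy.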

12.

Evolutionary rates of insertion and deletion in noncoding nucleotide sequences of primates

Saitou N; Ueda S 《Molecular biology and evolution》1994,11(3):504-512

Insertions and deletions are responsible for gaps in aligned nucleotide sequences, but they have usually been ignored when the number of nucleotide substitutions was estimated. We compared six sets of nuclear and mitochondrial noncoding DNA sequences of primates and obtained estimates of the evolutionary rate of insertion and deletion. The maximum-parsimony principle was applied to locate insertions and deletions on a given phylogenetic tree. Deletions were about twice as frequent as insertions for nuclear DNA, and single-nucleotide insertions and deletions were the most frequent of all events. The rate of insertion and deletion was found to be rather constant among branches of the phylogenetic tree, and the rate (approximately 2.0/kb/Myr) for mitochondrial DNA was found to be much higher than that (approximately 0.2/kb/Myr) for nuclear DNA. The rates of nucleotide substitution were about 10 times higher than the rate of insertion and deletion for both nuclear and mitochondrial DNA.

13.

Crossassociation: A method of comparing protein sequences

M. J. Sackin 《Biochemical genetics》1971,5(3):287-313

Crossassociation is a computer method of comparing protein sequences. It can help detect amino acid matches, deletions, insertions, and other similarities which would be hard to detect by eye. The method is to slide the sequences past each other one step at a time and to count the number of amino acids that match. At each overlap position, the program prints the percentage match and statistical significance measures of the matching. The null hypothesis for significance is the random arrangement of amino acids in the proportions found in the sequences under study. For most protein pairs, the expected proportion of matches is about 1/14. The method includes computation of three overall similarity measures between sequences which should have use in both evolutionary and taxonomic studies. The use of the method has been tested with actual and hypothetical sequences. Problems of recovering evolutionary relationships by this and related methods are discussed.
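The sliding comparison described in this abstract can be sketched in a few lines. This is a minimal reconstruction from the description only: the function name and the tuple layout are assumptions, and the statistical significance measures and overall similarity measures of the original program are not reproduced.

```python
def crossassociation(a, b):
    """Slide sequence b past sequence a one step at a time.

    Returns a list of (offset, overlap_length, matches, percent_match),
    where offset is the position in a that lines up with b[0]
    (negative when b overhangs the start of a)."""
    results = []
    for offset in range(-(len(b) - 1), len(a)):
        start_a = max(0, offset)
        start_b = max(0, -offset)
        overlap = min(len(a) - start_a, len(b) - start_b)
        matches = sum(a[start_a + k] == b[start_b + k] for k in range(overlap))
        results.append((offset, overlap, matches, 100.0 * matches / overlap))
    return results

rows = crossassociation("ACDEFG", "DEF")
for offset, overlap, matches, percent in rows:
    print(f"offset {offset:+d}: {matches}/{overlap} match ({percent:.0f}%)")
best = max(rows, key=lambda r: r[2])  # overlap position with the most matches
```

A peak in the match count at a nonzero offset is the kind of signal that hints at an insertion or deletion between two otherwise similar sequences.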

14.

ZFX has a gene structure similar to ZFY, the putative human sex determinant, and escapes X inactivation

A Schneider-Gädicke P Beer-Romero L G Brown R Nussbaum D C Page 《Cell》1989,57(7):1247-1258

15.

Specificity of Tn5 insertions into a 36-bp DNA sequence repeated in tandem seven times

James R. Lupski Perry Gershon Luiz S. Ozaki G. Nigel Godson 《Gene》1984,30(1-3):99-106

The target junction sequences of six independent Tn5 insertions into a 36-bp tandemly repeated DNA segment have been determined. In all instances Tn5 preferentially inserts near one end of the tandem repeat, but in four out of six cases the insertion is between different nucleotides. The target sequence shares some similarity (8 out of 11 bp) with the ends of Tn5. All six insertions are accompanied by duplication of 9 bp of target DNA. The data imply that, even though Tn5 appears to insert randomly on a macro scale, at the nucleotide sequence level insertion into target DNA, which has limited similarity to the Tn5 end reactive sequences, may be a preferred event.

16.

A nonlinear measure of subalignment similarity and its significance levels

Stephen F. Altschul Bruce W. Erickson 《Bulletin of mathematical biology》1986,48(5-6):617-632

A new measure of subalignment similarity is introduced. Specifically, similarity s(l, c) is defined as the logarithm to the base p of the probability of finding c or fewer mismatches in a subalignment of length l, where p is the probability of a match. Previous algorithms cannot use this measure to find locally optimal subalignments because, unlike Needleman-Wunsch and Sellers similarities, this measure is nonlinear. A new pattern recognition algorithm is described for finding all locally optimal subalignments of two nucleotide sequences. The DD algorithm can use s(l, c) or any other reasonable similarity function to assess the relative interest of subalignments. The DD algorithm searches only the diagonal graph, which lacks insertions and deletions. This search strategy greatly decreases the computation time and does not require an arbitrary choice of gap cost. The paths of the resulting DD graph usually draw attention to likely locations for insertions and deletions. A heuristic formula is derived for estimating significance levels for s(l, c) in the context of the lengths of the two aligned sequences. The DD algorithm has been used to find interesting subalignments between the nucleotide sequences for human and murine interleukin 2.
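The definition of s(l, c) can be made concrete under one added assumption that the abstract does not state explicitly: positions match independently with probability p, so the number of mismatches in a subalignment of length l is binomial. Under that assumption, s(l, c) = log_p P(X <= c) with X ~ Binomial(l, 1 - p). The function name and the default p = 0.25 (four equally likely nucleotides) are choices made for this sketch.

```python
import math

def subalignment_similarity(l, c, p=0.25):
    """s(l, c): log base p of the probability of c or fewer mismatches in a
    gap-free subalignment of length l, assuming independent positions that
    match with probability p (binomial model, assumed here)."""
    q = 1.0 - p
    prob = sum(math.comb(l, k) * q**k * p**(l - k) for k in range(c + 1))
    return math.log(prob) / math.log(p)

print(subalignment_similarity(10, 0))   # perfect match of length 10
print(subalignment_similarity(10, 1))   # one mismatch allowed
print(subalignment_similarity(10, 10))  # any number of mismatches
```

With no mismatches the probability is p^l, so the score equals the length l; each tolerated mismatch lowers the score by a different amount, which is the nonlinearity the abstract refers to.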

17.

Classification analysis of a latent dinucleotide periodicity of plant genomes

A. A. Shelenkov K. G. Skryabin E. V. Korotkov 《Russian Journal of Genetics》2008,44(1):101-114

The information decomposition (ID) method has been used to search for dinucleotide periodicities, including latent ones, in plant genomes. In nucleotide sequences of genomes of various plants from the GenBank database, 14 766 sequences with a periodicity of two nucleotides have been found at a high level of statistical significance. Classification of the periodicity matrices of the detected DNA sequences has yielded 141 classes of dinucleotide periodicity. Since ID does not detect periodicities with nucleotide deletions or insertions, modified profile analysis (MPA) has been applied to the obtained classes to reveal DNA sequences with dinucleotide periodicities containing nucleotide deletions and insertions. Combined use of ID and MPA has permitted the detection of 80 396 DNA sequences with dinucleotide periodicities in the genomes of various plants. The biological role of dinucleotide periodicity in the detected sequences is discussed.

18.

Functional consequences of insertions and deletions in the complementarity-determining regions of human antibodies

Lantto J Ohlin M 《The Journal of biological chemistry》2002,277(47):45108-45114

Insertions and deletions of nucleotides in the genes encoding the variable domains of antibodies are natural components of the hypermutation process, which may expand the available repertoire of hypervariable loop lengths and conformations. Although insertion of amino acids has also been utilized in antibody engineering, little is known about the functional consequences of such modifications. To investigate this further, we have introduced single-codon insertions and deletions as well as more complex modifications in the complementarity-determining regions of human antibody fragments with different specificities. Our results demonstrate that single amino acid insertions and deletions are generally well tolerated and permit production of stably folded proteins, often with retained antigen recognition, despite the fact that the thus modified loops carry amino acids that are disallowed at key residue positions in canonical loops of the corresponding length or are of a length not associated with a known canonical structure. We have thus shown that single-codon insertions and deletions can efficiently be utilized to expand structure and sequence space of the antigen-binding site beyond what is encoded by the germline gene repertoire.

19.

An Approach for Searching Insertions in Bacterial Genes Leading to the Phase Shift of Triplet Periodicity

Maria A. Korotkova Nikolay A. Kudryashov Eugene V. Korotkov 《Genomics, Proteomics & Bioinformatics (English edition)》2011,(Z2):158-170

The concept of the phase shift of triplet periodicity (TP) was used to search for potential DNA insertions in genes from 17 bacterial genomes. A mathematical algorithm for detection of these insertions has been developed. This approach can detect potential insertions and deletions with lengths that are not multiples of three bases, especially insertions of relatively large DNA fragments (>100 bases). A new similarity measure between triplet matrices was employed to improve the sensitivity for detecting the TP phase shift. Sequences of 17,220 bacterial genes, each consisting of more than 1,200 bases, were analyzed, and the presence of a TP phase shift has been shown in ~16% of analyzed genes (2,809 genes), which is about 4 times more than that detected in our previous work. We propose that shifts of the TP phase may indicate shifts of the reading frame in genes after insertions of DNA fragments with lengths that are not multiples of three bases. A relationship between the phase shifts of TP and the frame shifts in genes is discussed.
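A toy version of the phase-shift idea is sketched below. It builds triplet count matrices for the parts of a sequence before and after a candidate split point and asks which cyclic shift of the codon positions makes them most alike. Cosine similarity stands in for the paper's similarity measure, which the abstract does not specify; the function names, the fixed split point, and the test sequence are likewise assumptions for illustration.

```python
import math

def position_counts(seq, start, end):
    """Nucleotide counts by global position modulo 3 within seq[start:end]."""
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for i in range(start, end):
        if seq[i] in counts:
            counts[seq[i]][i % 3] += 1
    return counts

def shifted_cosine(m1, m2, shift):
    """Cosine similarity between m1 and m2 after rotating m2's columns."""
    dot = n1 = n2 = 0.0
    for base in "ACGT":
        for j in range(3):
            x = m1[base][j]
            y = m2[base][(j + shift) % 3]
            dot += x * y
            n1 += x * x
            n2 += y * y
    return dot / math.sqrt(n1 * n2) if n1 and n2 else 0.0

def best_phase_shift(seq, split):
    """Shift (0, 1, or 2) that best aligns the triplet matrices of the two
    parts; a nonzero result suggests a phase shift near `split`."""
    left = position_counts(seq, 0, split)
    right = position_counts(seq, split, len(seq))
    return max(range(3), key=lambda d: shifted_cosine(left, right, d))

intact = "ATG" * 60
with_insertion = "ATG" * 30 + "C" + "ATG" * 30  # one extra base at index 90
print(best_phase_shift(intact, 90))
print(best_phase_shift(with_insertion, 91))
```

A real search would scan many split points and assess statistical significance; this sketch only shows how an insertion whose length is not a multiple of three rotates the triplet matrix downstream of it.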

20.

A mathematical consideration of the word-composition vector method in comparison of biological sequences

Aita T Husimi Y Nishigaki K 《Bio Systems》2011,106(2-3):67-75

To measure the similarity or dissimilarity between two given biological sequences, several papers proposed metrics based on the "word-composition vector". The essence of these metrics is as follows. First, we count the appearance frequencies of all the K-tuple words throughout each of two given sequences. Then, the two given sequences are transformed into their respective word-composition vectors. Next, the distance metrics, for example the angle between the two vectors, are calculated. A significant issue is to determine the optimal word size K. With a mathematical model of mutational events (including substitutions, insertions, deletions and duplications) that occur in sequences, we analyzed how the angle between the composition vectors depends on the mutational events. We also considered the optimal word size (=resolution) from our original approach. Our results were verified by computational experiments using artificially generated sequences, amino acid sequences of hemoglobin and nucleotide sequences of 16S ribosomal RNA.
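The three steps listed in this abstract (count K-tuple words, form composition vectors, compute the angle) can be sketched directly. The function names are hypothetical, raw counts are used without any normalization or background correction, and the example sequences are invented.

```python
import math
from collections import Counter

def word_composition(seq, k):
    """Counts of all overlapping K-tuple words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def angle_between(seq_a, seq_b, k):
    """Angle in radians between the word-composition vectors of two
    sequences: 0 for identical composition, pi/2 for no shared words."""
    va, vb = word_composition(seq_a, k), word_composition(seq_b, k)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(n * n for n in va.values()))
    norm_b = math.sqrt(sum(n * n for n in vb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must contain at least one K-tuple")
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard rounding
    return math.acos(cosine)

original = "ACGTACGTTGCAACGT"
mutated = "ACGTACGATGCAACGT"  # one substitution
for k in (1, 2, 3):
    print(k, round(angle_between(original, mutated, k), 3))
```

Running the loop over several values of K shows the trade-off the abstract alludes to: larger words are more sensitive to a single mutation, because one changed position alters up to K overlapping words.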