Found 20 similar articles (search time: 15 ms)
1.
2.
Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences (total citations: 16; self-citations: 0; citations by others: 16)
We have examined oligopeptides with lengths ranging from 2 to 11 residues in protein sequences that show no obvious evolutionary relationship. All sequences in the Protein Identification Resource database were carefully classified by sensitive homology searches into superfamilies to obtain unbiased oligopeptide counts. The results, contrary to previous studies, show clear prejudices in protein sequences. The oligopeptide preferences were used to help decide the significance of sequence homologies and to improve the more general methods for detecting protein coding regions within nucleotide sequences.
3.
4.
Liu H, Yang J, Ling JG, Chou KC 《Biochemical and biophysical research communications》2005,338(2):1005-1011
Functioning as an "address tag" or "zip code" that guides nascent proteins (newly synthesized proteins in the cytosol) to wherever they are needed, signal peptides (also called targeting signals or signal sequences) have become a crucial tool in finding new drugs or reprogramming cells for gene therapy. To effectively and timely use such a tool, however, the first important thing is to develop an automated method for quickly and accurately identifying the signal peptide for a given nascent protein. With the avalanche of new protein sequences generated in the post-genomic era, the challenge has become even more urgent and critical. In this paper, five statistical rulers were derived via performing a mutual information analysis. By combining these statistical rulers, a new prediction algorithm was established and high success prediction rates were observed. The new algorithm may play a complementary role to the existing algorithms in this area. It is anticipated that the mutual information approach introduced here may be very useful for studying many other sequence-coupling problems in molecular biology as well.
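The abstract above does not spell out its five statistical rulers, so the following is only a minimal sketch of the underlying quantity: the mutual information between two positions in a set of equal-length sequences. The function name and the toy sequences are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(seqs, i, j):
    """Mutual information (in bits) between the residues at positions i and j.

    seqs is a list of equal-length strings; frequencies are plain
    maximum-likelihood estimates with no pseudocounts or small-sample correction.
    """
    n = len(seqs)
    freq_i = Counter(s[i] for s in seqs)
    freq_j = Counter(s[j] for s in seqs)
    freq_ij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Hypothetical toy data: two fully coupled positions versus two independent ones.
coupled = ["AL", "AL", "GV", "GV"]
independent = ["AL", "AV", "GL", "GV"]
print(mutual_information(coupled, 0, 1))
print(mutual_information(independent, 0, 1))
```

With real data the estimate is biased upwards for small samples, so some correction or a shuffling-based null distribution would normally be needed before the values are interpreted.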
5.
The potential for obtaining a true mass spectrometric protein identification result depends on the choice of algorithm as well as on experimental factors that influence the information content in the mass spectrometric data. Current methods can never prove definitively that a result is true, but an appropriate choice of algorithm can provide a measure of the statistical risk that a result is false, i.e., the statistical significance. We recently demonstrated an algorithm, Probity, which assigns the statistical significance to each result. For any choice of algorithm, the difficulty of obtaining statistically significant results depends on the number of protein sequences in the sequence collection searched. By simulations of random protein identifications and using the Probity algorithm, we here demonstrate explicitly how the statistical significance depends on the number of sequences searched. We also provide an example on how the practitioner's choice of taxonomic constraints influences the statistical significance.
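The dependence on collection size described above can be illustrated with a generic multiple-comparison calculation. This is not the Probity algorithm, whose details are not given in the abstract. It is a sketch that assumes each sequence in the collection is an independent chance to produce a random match with the same per-sequence probability; the function name is hypothetical.

```python
import math


def prob_any_false_hit(p_single, n_sequences):
    """Probability that at least one of n independent random comparisons
    scores as well as the observed hit, given per-comparison probability p_single.

    Computed as 1 - (1 - p)^n using log1p/expm1 to stay accurate for tiny p.
    """
    if not 0.0 <= p_single < 1.0:
        raise ValueError("p_single must be in [0, 1)")
    return -math.expm1(n_sequences * math.log1p(-p_single))


# The same raw match probability becomes less significant as the collection grows,
# which is why a taxonomic restriction (fewer sequences) can raise significance.
for n in (1_000, 100_000, 10_000_000):
    print(n, prob_any_false_hit(1e-6, n))
```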
6.
BlastAlign uses NCBI blastn to build a multiple nucleotide alignment and is intended for use with sequences that have large indels or are otherwise difficult to align globally. The program builds a matrix representing regions of homology along the sequences, from which it selects the 'most representative' sequence and then extracts the blastn query-anchored multiple alignment for this sequence. The matrix is printed and allows subgroups to be identified visually and an option allows other sequences to be used as the 'most representative'. The program contains elements of both Perl and Python and will run on UNIX (including Mac OSX) and DOS. An additional Perl program BlastAlignP uses tblastn to align nucleotide sequences to a single amino acid sequence, thus allowing an open reading frame to be maintained in the resulting multiple alignment. AVAILABILITY: It is freely available at http://www.bio.ic.ac.uk/research/belshaw/BlastAlign.tar and at http://evolve.zoo.ox.ac.uk/software/blastalign.
7.
8.
MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences (total citations: 20; self-citations: 0; citations by others: 20)
The Molecular Evolutionary Genetics Analysis (MEGA) software is a desktop application designed for comparative analysis of homologous gene sequences either from multigene families or from different species with a special emphasis on inferring evolutionary relationships and patterns of DNA and protein evolution. In addition to the tools for statistical analysis of data, MEGA provides many convenient facilities for the assembly of sequence data sets from files or web-based repositories, and it includes tools for visual presentation of the results obtained in the form of interactive phylogenetic trees and evolutionary distance matrices. Here we discuss the motivation, design principles and priorities that have shaped the development of MEGA. We also discuss how MEGA might evolve in the future to assist researchers in their growing need to analyze large data sets using new computational methods.
9.
10.
DNA sequences of 56 human genes for which information on both exons and introns was available were examined. The variance in G+C content among genes is estimated and shown to be substantial. There is a high correlation in G+C content between exons and introns within the same gene. The dinucleotide frequencies of introns are similar to those of intergenic spacer regions and are in reasonable agreement with predictions from substitution rates estimated from pseudogenes, except that the observed deficiency of TA doublets is not predicted. Duplicated bases also show a frequency greater than the expectation under independence. There is marked variability among genes in the frequency of the doublet CG relative to its expectation under independence. This variation is evolutionarily conserved and is correlated with the G+C content. Pseudogenes behave as if they are in a low-G+C, CG-deficient part of the genome, although the genes from which they arose are variable in these respects.
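The two quantities this abstract relies on, G+C content and the frequency of a doublet relative to its expectation under independence, can be sketched as follows. The function names and the example sequence are assumptions made for illustration; the abstract does not describe the authors' exact estimators.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def doublet_obs_exp(seq, doublet):
    """Observed doublet frequency divided by its expectation under independence.

    The expectation is the product of the two single-base frequencies.
    Overlapping windows are counted, so doublets such as 'AA' are handled correctly.
    """
    seq = seq.upper()
    doublet = doublet.upper()
    n = len(seq)
    hits = sum(1 for k in range(n - 1) if seq[k:k + 2] == doublet)
    observed = hits / (n - 1)
    expected = (seq.count(doublet[0]) / n) * (seq.count(doublet[1]) / n)
    if expected == 0:
        raise ValueError("doublet contains a base absent from the sequence")
    return observed / expected


# Hypothetical fragment: a ratio below 1 indicates a deficiency of the doublet.
fragment = "ATGCATTAGGCTAACCTGAGTCA"
print(gc_content(fragment), doublet_obs_exp(fragment, "CG"))
```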
11.
DNAFSMiner: a web-based software toolbox to recognize two types of functional sites in DNA sequences
SUMMARY: DNAFSMiner (DNA Functional Sites Miner) is a web-based software toolbox to recognize functional sites in nucleic acid sequences. Currently in this toolbox, we provide two software: TIS Miner and Poly(A) Signal Miner. The TIS Miner can be used to predict translation initiation sites in vertebrate DNA/mRNA/cDNA sequences, and the Poly(A) Signal Miner can be used to predict polyadenylation [poly(A)] signals in human DNA sequences. The prediction results are better than those by literature methods on two benchmark applications. This good performance is mainly attributable to our unique learning method. DNAFSMiner is available free of charge for academic and non-profit organizations. AVAILABILITY: http://research.i2r.a-star.edu.sg/DNAFSMiner/ CONTACT: huiqing@i2r.a-star.edu.sg.
12.
13.
14.
15.
This paper presents a simple program for interactive searching for nucleotide sequences that may code for the helix-turn-helix, zinc finger or leucine zipper motifs in proteins. The helix-turn-helix motifs are predicted using the recently published method of Dodd and Egan, while zinc fingers and leucine zippers are searched for by our original methods. DNABIND is shown to detect all four known helix-turn-helix motifs in bacteriophage lambda genes and both zinc fingers of the ADR1 gene of yeast.
16.
Thollesson M 《Bioinformatics (Oxford, England)》2004,20(3):416-418
LDDist is a Perl module implemented in C++ that allows the user to calculate LogDet pair-wise genetic distances for amino acid as well as nucleotide sequence data. It can handle site-to-site rate variation by treating a proportion of the sites as invariant and/or by assigning sites to different, presumably homogeneous, rate categories. The rate-class assignments and invariant proportion can be set explicitly, or estimated by the program; the latter using either of two different capture-recapture methods. The assignment to rate categories in lieu of a phylogeny can be done using Shannon-Wiener index as a crude token for relative rate.
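The Shannon-Wiener index mentioned above can be computed per alignment column as sketched below. This is written in Python for illustration and is not LDDist's own code; how LDDist maps index values onto rate categories is not stated in the abstract, so the sketch stops at the per-site values. The function names and the toy alignment are assumptions.

```python
from collections import Counter
from math import log


def shannon_index(column):
    """Shannon-Wiener index H = -sum(p * ln p) over the states in one column."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * log(c / n) for c in counts.values())


def site_indices(alignment):
    """Per-site index for an alignment given as a list of equal-length strings."""
    return [shannon_index(col) for col in zip(*alignment)]


# Hypothetical alignment: the first column is invariant (H = 0),
# the last column is maximally variable for four sequences (H = ln 4).
alignment = ["AACA", "AACC", "AGCG", "AGTT"]
print(site_indices(alignment))
```

Higher values indicate more variable sites, which is the sense in which the index can stand in for relative rate when no phylogeny is available.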
17.
A detailed summary of the content and composition of total proteins, RNA and DNA in 57 yeast-like microorganisms is presented. On the basis of the correlation between the content of amino acids in proteins the studied strains could be divided into 8 groups that differ not only in the content of proteins and amino acids but also in the RNA and DNA content. According to this characteristic the groups of Basidiomycetes and Ascomycetes could be discriminated. The present study should also serve as orientation for the screening of strains suitable for the production of fodder yeasts.
18.
P A Pevzner, M Yu Borodovsky, A A Mironov 《Journal of biomolecular structure & dynamics》1989,6(5):1013-1026
Mathematical models of the generation of genetic texts appeared simultaneously with the first sequencing of DNA. They are used to establish functional and evolutionary relations between genetic texts, to predict the number and distribution of specific sites in a sequence and to identify "meaningful" words. The present paper deals with two problems: 1) The significance of deviations from the mean statistical characteristics in a genetic text. Anyone who has addressed himself to the statistical analysis of sequenced DNA is familiar with the question: what deviations from the expected frequencies of occurrence of particular words testify to the "biological" significance of those words? We propose a formula for the variance of the number of a word's occurrences in the text, with allowance for word overlaps, making it possible to assess the significance of the deviations from the expected statistical characteristics. 2) A new method for predicting the frequencies of occurrence of particular words in a genetic text using the statistical characteristics of "spaced" L-grams. The method can be used for predicting the number of restriction sites in human DNA and in planning experiments on the physical mapping and sequencing of the human genome.
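The abstract does not reproduce the paper's formula, so the sketch below uses the standard result for the simplest case, a text of independent letters with fixed probabilities, which may differ from the model used in the paper. For a word of length m with probability p and N = n - m + 1 start positions, the count has mean N*p and variance N*p*(1 - p) + 2 * sum over shifts d = 1..m-1 of (N - d) * (e_d * p * q_d - p^2), where e_d is 1 if the word overlaps itself at shift d and q_d is the probability of its last d letters. The function name is hypothetical.

```python
from math import prod


def word_count_stats(word, text_length, probs):
    """Mean and variance of the number of (overlapping) occurrences of word
    in a random text of independent letters with probabilities probs."""
    m = len(word)
    positions = text_length - m + 1
    if positions < 1:
        raise ValueError("text is shorter than the word")
    p = prod(probs[ch] for ch in word)
    mean = positions * p
    var = positions * p * (1.0 - p)
    for d in range(1, m):
        pairs = positions - d  # number of start-position pairs at distance d
        if pairs <= 0:
            break
        # The word can occur at both positions only if it overlaps itself at shift d.
        self_overlap = word[d:] == word[:m - d]
        joint = p * prod(probs[ch] for ch in word[m - d:]) if self_overlap else 0.0
        var += 2 * pairs * (joint - p * p)
    return mean, var


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# A self-overlapping word has a larger variance than a non-overlapping one
# of the same length and probability.
print(word_count_stats("AAAAAA", 10_000, uniform))
print(word_count_stats("GAATTC", 10_000, uniform))
```

An observed count can then be compared with the mean in units of the square root of the variance, which is one way to judge whether a deviation is large enough to be of interest.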
19.
Background
The functional annotation of proteins relies on published information concerning their close and remote homologues in sequence databases. Evidence for remote sequence similarity can be further strengthened by a similar biological background of the query sequence and identified database sequences. However, few tools exist so far that provide a means to include functional information in sequence database searches.
20.
Natural selection of protein structural and functional properties: a single nucleotide polymorphism perspective
