Found 20 similar documents; search took 0 ms
1.
Consensus sequences based on plurality rule (total citations: 1; self-citations: 0; citations by others: 1)
We apply concepts of social choice theory, in particular those concerning median and plurality rules, to investigate the problem
of finding a consensus of aligned molecular sequences. Our model of consensus permits consensus elements at each aligned position
to denote ambiguity codes if several alternatives are equally preferred candidates for consensus. Our results concern plurality
rules, which are median rules, are characterized by the Condorcet properties, and are efficient to calculate. Our approach is
axiomatic.
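As a rough illustration of the plurality idea described in this abstract, the sketch below picks the most frequent base in each aligned column and falls back to an ambiguity code when several bases tie. It is not the authors' axiomatic formulation: the function name `plurality_consensus` is hypothetical, the input is assumed to contain only A, C, G and T (no gaps), and the standard IUPAC nucleotide codes are assumed as the ambiguity alphabet.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of tied bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def plurality_consensus(aligned):
    """Return one symbol per column: the most frequent base, or the
    ambiguity code covering all bases tied for most frequent.

    Assumption: sequences are gap-free and use only A, C, G, T.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(column)
        top = max(counts.values())
        winners = frozenset(b for b, n in counts.items() if n == top)
        consensus.append(IUPAC[winners])
    return "".join(consensus)
```

For example, `plurality_consensus(["ACGT", "ACGA", "AGGA", "ATCT"])` yields `"ACGW"`: the last column is split evenly between A and T, so the tie is reported as W.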
2.
3.
Threshold consensus methods for molecular sequences. (total citations: 1; self-citations: 0; citations by others: 1)
We introduce a parameterized threshold consensus method (th chi) for molecular sequences which is based on a majority-rule voting principle. In contrast to other frequency-based methods, the th chi method uses a single criterion to return ambiguity codes of different lengths. We derive basic features of the method and establish that it returns at most two ambiguity codes at any position of the consensus sequence. We bound from below the size of the frequency gap that exists when the th chi method returns an ambiguity code. Using such properties, we compare the th chi method to other consensus methods for molecular sequences which are defined in terms of threshold or gap criteria.
4.
Arabidopsis consensus intron sequences (total citations: 7; self-citations: 0; citations by others: 7)
We have analysed 998 Arabidopsis intron sequences in the EMBL database. All Arabidopsis introns adhere to the GU...AG rule, with the exception of 1% of introns with GC at their 5′ ends. Virtually all of the introns contained a putative branchpoint sequence (YUNAN) 18 to 60 nt upstream of the 3′ splice site. Although a polypyrimidine tract was much less apparent than in vertebrate introns, the most common nucleotide in the region upstream of the 3′ splice site was uridine. Consensus sequences for 5′ and 3′ splice sites and branchpoint sequences for Arabidopsis introns are presented.
5.
Plant chitinase consensus sequences (total citations: 6; self-citations: 0; citations by others: 6)
Eighty-six plant chitinase sequences from 29 different species and one hybrid were obtained from the on-line GenBank nucleotide
database. These sequences were grouped into five gene families based on previously published guidelines (Meins et al., 1994),
and the amino-acid and nucleotide sequences of each gene family were aligned. Consensus amino-acid and nucleotide sequences
were derived for each gene family based on the alignments. The consensus sequences were analyzed to determine their amino-acid
composition, hydropathy profiles, and codon usage.
6.
Kitson DH, Badretdinov A, Zhu ZY, Velikanov M, Edwards DJ, Olszewski K, Szalma S, Yan L. Briefings in bioinformatics, 2002, 3(1):32-44
To maximise the assignment of function of the proteins encoded by a genome and to aid the search for novel drug targets, there is an emerging need for sensitive methods of predicting protein function on a genome-wide basis. GeneAtlas is an automated, high-throughput pipeline for the prediction of protein structure and function using sequence similarity detection, homology modelling and fold recognition methods. GeneAtlas is described in detail here. To test GeneAtlas, a 'virtual' genome was used, a subset of PDB structures from the SCOP database, in which the functional relationships are known. GeneAtlas detects additional relationships by building 3D models in comparison with the sequence searching method PSI-BLAST. Functionally related proteins with sequence identity below the twilight zone can be recognised correctly.
7.
Consensus methods are recognized as valuable tools for data analysis, especially when some sort of data aggregation is desired. Although consensus methods for sequences play a vital role in molecular biology, researchers pay little heed to the features and limitations of such methods, and so there are risks that criteria for constructing consensus sequences will be misused or misunderstood. To understand better the issues involved, we conducted a critical comparison of nine consensus methods for sequences, of which eight were used in papers appearing in this journal. We report the results of that comparison, and we make recommendations which we hope will assist researchers when they must select particular consensus methods for particular applications.
8.
Although molecular biologists often calculate consensus sequences from aligned DNA or protein sequences, relatively little is known about the properties of many of the consensus methods being used. Consequently, we wrote a program, CONSENSUS, to analyze and compare methods of calculating a consensus result (a base, an ambiguity code or a subset of codes) at a position in an aligned set of molecular sequences. The program supports alphabets of up to four symbols (e.g. (R, Y) or A, C, G, T). The program's output makes it suitable for exploratory data analysis or for selecting values of thresholds or confidence levels in consensus methods having such parameters.
9.
10.
Kim JH, Waterman MS, Li LM. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 2007, 4(1):88-97
If the origins of fragments are known in genome sequencing projects, it is straightforward to reconstruct diploid consensus sequences. In reality, however, this is not true. Although there are proposed methods to reconstruct haplotypes from genome sequencing projects, an accuracy assessment is required to evaluate the confidence of the estimated diploid consensus sequences. In this paper, we define the confidence score of diploid consensus sequences. It requires the calculation of the likelihood of an assembly. To calculate the likelihood, we propose a linear time algorithm with respect to the number of polymorphic sites. The likelihood calculation and confidence score are used for further improvements of haplotype estimation in two directions. One direction is that low-scored phases are disconnected. The other direction is that, instead of using nominal frequency 1/2, the haplotype frequency is estimated to reflect the actual contribution of each haplotype. Our method was evaluated on the simulated data whose polymorphism rate (1.2 percent) was based on Ciona intestinalis. As a result, the high accuracy of our algorithm was indicated: The true positive rate of the haplotype estimation was greater than 97 percent.
11.
Interpreting cDNA sequences: Some insights from studies on translation (total citations: 36; self-citations: 0; citations by others: 36)
M. Kozak. Mammalian genome, 1996, 7(8):563-574
This review discusses some rules for assessing the completeness of a cDNA sequence and identifying the start site for translation.
Features commonly invoked—such as an ATG codon in a favorable context for initiation, or the presence of an upstream in-frame
terminator codon, or the prediction of a signal peptide-like sequence at the amino terminus—have some validity; but examples
drawn from the literature illustrate limitations to each of these criteria. The best advice is to inspect a cDNA sequence
not only for these positive features but also for the absence of certain negative indicators. Three specific warning signs
are discussed and documented: (i) The presence of numerous ATG codons upstream from the presumptive start site for translation
often indicates an aberration (sometimes a retained intron) at the 5′ end of the cDNA. (ii) Even one strong, upstream, out-of-frame
ATG codon poses a problem if the reading frame set by the upstream ATG overlaps the presumptive start of the major open reading
frame. Many cDNAs that display this arrangement turn out to be incomplete; that is, the out-of-frame ATG codon is within,
rather than upstream from, the protein coding domain. (iii) A very weak context at the putative start site for translation
often means that the cDNA lacks the authentic initiator codon. In addition to presenting some criteria that may aid in recognizing
incomplete cDNA sequences, the review includes some advice for using in vitro translation systems for the expression of cDNAs.
Some unresolved questions about translational regulation are discussed by way of illustrating the importance of verifying
mRNA structures before making deductions about translation.
Received: 24 April 1996 / Accepted: May 1996
12.
A graphical method is presented for displaying the patterns in a set of aligned sequences. The characters representing the sequence are stacked on top of each other for each position in the aligned sequences. The height of each letter is made proportional to its frequency, and the letters are sorted so the most common one is on top. The height of the entire stack is then adjusted to signify the information content of the sequences at that position. From these 'sequence logos', one can determine not only the consensus sequence but also the relative frequency of bases and the information content (measured in bits) at every position in a site or sequence. The logo displays both significant residues and subtle sequence patterns.
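The stacking rule in this abstract can be sketched numerically. In the version below, each column's stack height is its information content in bits, and each letter's height is its frequency times that stack height. The function name `logo_heights` is hypothetical, and the information content is assumed to be the maximum entropy (log2 of the alphabet size) minus the observed Shannon entropy. The small-sample correction used in published sequence logos is deliberately omitted, so this is an approximation rather than the authors' exact procedure.

```python
import math
from collections import Counter


def logo_heights(aligned, alphabet="ACGT"):
    """For each aligned column return (information_bits, heights).

    heights maps each observed symbol to frequency * information_bits and is
    ordered from least to most frequent, i.e. bottom to top of the stack.
    Assumptions: every column contains at least one alphabet symbol, and no
    small-sample correction is applied.
    """
    max_bits = math.log2(len(alphabet))
    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = sum(counts[s] for s in alphabet)
        freqs = {s: counts[s] / total for s in alphabet}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        info = max_bits - entropy
        ordered = sorted(freqs.items(), key=lambda kv: kv[1])
        heights = {s: p * info for s, p in ordered if p > 0}
        result.append((info, heights))
    return result
```

A fully conserved DNA column carries 2 bits and is drawn as a single letter of height 2. A column with frequencies 0.5, 0.25 and 0.25 carries 0.5 bits, so its most common letter has height 0.25.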
13.
Inferring consensus structure from nucleic acid sequences (total citations: 4; self-citations: 0; citations by others: 4)
This paper presents an unsupervised inference method for determining the higher-order structure from sequence data. The method is general, but in this paper it is applied to nucleic acid sequences in determining the secondary (2-D) and tertiary (3-D) structure of the macromolecule. The method evaluates position-position interdependence of the sequence using an information measure known as expected mutual information. The expected mutual information is calculated for each pair of positions and the chi-square test is used to screen statistically significant position pairs. In the calculation of expected mutual information, an unbiased probability estimator is used to overcome the problem associated with zero observation in conserved sites. A selection criterion based on known structural constraints of the strongest interdependent position pairs is applied yielding position pairs most indicative of secondary and tertiary interactions. The method has been tested using tRNA and 5S rRNA sequences with very good results.
Received on July 20, 1990; accepted on January 15, 1991
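The pairwise measure described above can be illustrated with a plain mutual-information calculation over pairs of alignment columns. This is a simplified sketch, not the paper's method: it uses raw observed frequencies rather than the unbiased probability estimator the abstract mentions, it omits the chi-square screening step, and the function names `mutual_information` and `column_pair_mi` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two equal-length alignment columns,
    estimated from raw observed frequencies (an assumption of this sketch)."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi


def column_pair_mi(aligned):
    """Map every column pair (i, j) with i < j to its mutual information."""
    cols = list(zip(*aligned))
    return {
        (i, j): mutual_information(cols[i], cols[j])
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    }
```

Two columns that covary perfectly, such as the two sides of a base pair seen in four different forms, score log2(4) = 2 bits, while any pair involving a fully conserved column scores 0. The second case is why the abstract singles out conserved sites as a problem for the estimator.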
14.
15.
Codon reading patterns in Drosophila melanogaster mitochondria based on their tRNA sequences: a unique wobble rule in animal mitochondria. (total citations: 4; self-citations: 1; citations by others: 4)
K Tomita, T Ueda, S Ishiwa, P F Crain, J A McCloskey, K Watanabe. Nucleic acids research, 1999, 27(21):4291-4297
Mitochondrial (mt) tRNA(Trp), tRNA(Ile), tRNA(Met), tRNA(Ser)GCU, tRNA(Asn) and tRNA(Lys) were purified from Drosophila melanogaster (fruit fly) and their nucleotide sequences were determined. tRNA(Lys) corresponding to both AAA and AAG lysine codons was found to contain the anticodon CUU, C34 at the wobble position being unmodified. tRNA(Met) corresponding to both AUA and AUG methionine codons was found to contain 5-formylcytidine (f(5)C) at the wobble position, although the extent of modification is partial. These results suggest that both C and f(5)C as the wobble bases at the anticodon first position (position 34) can recognize A at the codon third position (position 3) in the fruit fly mt translation system. tRNA(Ser)GCU corresponding to AGU, AGC and AGA serine codons was found to contain unmodified G at the anticodon wobble position, suggesting the utilization of an unconventional G34-A3 base pair during translation. When these tRNA anticodon sequences are compared with those of other animal counterparts, it is concluded that either unmodified C or G at the wobble position can recognize A at the codon third position and that modification from A to t(6)A at position 37, 3'-adjacent to the anticodon, seems to be important for tRNA possessing C34 to recognize A3 in the mRNA in the fruit fly mt translation system.
16.
Defining the consensus sequences of E. coli promoter elements by random selection. (total citations: 7; self-citations: 2; citations by others: 7)
The consensus sequence of E. coli promoter elements was determined by the method of random selection. A large collection of hybrid molecules was produced in which random-sequence oligonucleotides were cloned in place of a wild-type promoter element, and functional -10 and -35 E. coli promoter elements were obtained by a genetic selection involving the expression of a structural gene. The DNA sequences and relative levels of function for -10 and -35 elements were determined. The consensus sequences determined by this approach are very similar to those determined by comparing DNA sequences of naturally occurring E. coli promoters. However, no strong correlation is observed between similarity to the consensus and relative level of function. The results are considered in terms of E. coli promoter function and of the general applicability of the random selection method.
17.
18.
Keith JM, Adams P, Bryant D, Kroese DP, Mitchelson KR, Cochran DA, Lala GH. Bioinformatics (Oxford, England), 2002, 18(11):1494-1499
MOTIVATION: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. RESULTS: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
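The objective described in this abstract, a sequence minimising the sum of pairwise distances to the inputs, can be sketched with a generic simulated-annealing loop. Everything specific here is an assumption rather than the paper's design: unit-cost edit distance as the distance, single-character substitutions, insertions and deletions as the move set, a linear cooling schedule, and the hypothetical names `edit_distance`, `total_distance`, `mutate` and `anneal_consensus`.

```python
import math
import random


def edit_distance(a, b):
    """Unit-cost Levenshtein distance (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def total_distance(candidate, seqs):
    """Objective: sum of distances from the candidate to every input."""
    return sum(edit_distance(candidate, s) for s in seqs)


def mutate(seq, alphabet, rng):
    """Apply one random substitution, insertion or deletion."""
    op = rng.choice(("sub", "ins", "del"))
    if op == "del" and len(seq) > 1:
        i = rng.randrange(len(seq))
        return seq[:i] + seq[i + 1:]
    if op == "ins":
        i = rng.randrange(len(seq) + 1)
        return seq[:i] + rng.choice(alphabet) + seq[i:]
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(alphabet) + seq[i + 1:]


def anneal_consensus(seqs, alphabet="ACGT", steps=2000, t0=2.0, seed=0):
    """Return (best_sequence, best_cost); seqs[0] must be non-empty."""
    rng = random.Random(seed)
    current = seqs[0]
    cost = total_distance(current, seqs)
    best, best_cost = current, cost
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-9  # linear cooling (assumed)
        cand = mutate(current, alphabet, rng)
        c = total_distance(cand, seqs)
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if c <= cost or rng.random() < math.exp((cost - c) / temp):
            current, cost = cand, c
            if cost < best_cost:
                best, best_cost = current, cost
    return best, best_cost
```

Because the best candidate seen so far is tracked separately from the current state, the returned cost never exceeds that of the starting sequence, even when the walk accepts uphill moves along the way.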
19.
The covariance between two estimators of Simpson's index of diversity, D, obtained from cumulated samples at two stages of the sampling process is derived. A stopping rule based on a comparison of successively obtained estimates of D is developed for use in determining adequate sample size for sampling in stages. The results of a numerical study to compare the effect on the estimation of diversity using the stopping rule with one stage sampling are included.
20.
TBP flanking sequences: asymmetry of binding, long-range effects and consensus sequences (total citations: 2; self-citations: 2; citations by others: 0)
We carried out in vitro selection experiments to systematically probe the effects of TATA-box flanking sequences on its interaction with the TATA-box binding protein (TBP). This study validates our previous hypothesis that the effect of the flanking sequences on TBP/TATA-box interactions is much more significant when the TATA box has a context-dependent DNA structure. Several interesting observations, with implications for protein–DNA interactions in general, came out of this study. (i) Selected sequences are selection-method specific and TATA-box dependent. (ii) The variability in binding stability as a function of the flanking sequences for (T-A)4 boxes is as large as the variability in binding stability as a function of the core TATA box itself. Thus, for (T-A)4 boxes the flanking sequences completely dominate and determine the binding interaction. (iii) Binding stabilities of all but one of the individual selected sequences of the (T-A)4 form are significantly higher than that of their mononucleotide-based consensus sequence. (iv) Even though the (T-A)4 sequence is symmetric, the flanking sequence pattern is asymmetric. We propose that the plasticity of (T-A)n sequences increases the number of conformationally distinct TATA boxes without the need to extend the TBP contact region beyond the eight-base-pair-long TATA box.