Similar articles
20 similar articles found
1.
Targeted gene walking polymerase chain reaction. (Cited by: 26; self-citations: 3; citations by others: 23)
We describe a modification of a polymerase chain reaction method called 'targeted gene walking' that can be used for the amplification of unknown DNA sequences adjacent to a short stretch of known sequence by using the combination of a single, targeted sequence specific PCR primer with a second, nonspecific 'walking' primer. This technique can replace conventional cloning and screening methods with a single step PCR protocol to greatly expedite the isolation of sequences either upstream or downstream from a known sequence. A number of potential applications are discussed, including its utility as an alternative to cloning and screening for new genes or cDNAs, as a method for searching for polymorphic sites, restriction endonuclease or regulatory regions, and its adaptation to rapidly sequence DNA of lengthy unknown regions that are contiguous to known genes.

2.
MOTIVATION: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. RESULTS: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
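The idea in this abstract, searching directly for a sequence that minimises the summed distance to all inputs, can be illustrated with a small sketch. This is not the authors' implementation: the distance measure (plain Levenshtein distance), the move set (single substitutions, insertions and deletions), the geometric cooling schedule and every function and parameter name below are assumptions chosen for illustration.

```python
import math
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def total_distance(candidate: str, seqs) -> int:
    """Objective: sum of distances from the candidate to every input sequence."""
    return sum(edit_distance(candidate, s) for s in seqs)


def mutate(seq: str, alphabet: str, rng: random.Random) -> str:
    """Return a copy of seq with one random substitution, insertion or deletion."""
    op = rng.choice(("sub", "ins", "del"))
    if op == "ins" or not seq:
        pos = rng.randrange(len(seq) + 1)
        return seq[:pos] + rng.choice(alphabet) + seq[pos:]
    pos = rng.randrange(len(seq))
    if op == "sub":
        return seq[:pos] + rng.choice(alphabet) + seq[pos + 1:]
    return seq[:pos] + seq[pos + 1:]


def anneal_consensus(seqs, alphabet="ACGT", steps=2000,
                     t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over candidate sequences (hypothetical parameters)."""
    rng = random.Random(seed)
    current = seqs[0]                      # start from one of the inputs
    cost = total_distance(current, seqs)
    best, best_cost = current, cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        cand = mutate(current, alphabet, rng)
        cand_cost = total_distance(cand, seqs)
        delta = cand_cost - cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = current, cost
    return best, best_cost
```

Each objective evaluation here costs one distance computation per input sequence, each quadratic in sequence length, which is consistent with the scaling the abstract reports; a practical version would presumably update the distances incrementally rather than recompute them after every move.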

3.
MOTIVATION: For large-scale structural assignment to sequences, as in computational structural genomics, a fast yet sensitive sequence search procedure is essential. A new approach using intermediate sequences was tested as a shortcut to iterative multiple sequence search methods such as PSI-BLAST. RESULTS: A library containing potential intermediate sequences for proteins of known structure (PDB-ISL) was constructed. The sequences in the library were collected from a large sequence database using the sequences of the domains of proteins of known structure as the query sequences and the program PSI-BLAST. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using pairwise sequence comparison methods to find homologues in PDB-ISL. Searches of PDB-ISL were calibrated, and the number of correct matches found at a given error rate was the same as that found by PSI-BLAST. The advantage of this library is that it uses pairwise sequence comparison methods, such as FASTA or BLAST2, and can, therefore, be searched easily and, in many cases, much more quickly than an iterative multiple sequence comparison method. The procedure is roughly 20 times faster than PSI-BLAST for small genomes and several hundred times for large genomes. AVAILABILITY: Sequences can be submitted to the PDB-ISL servers at http://stash.mrc-lmb.cam.ac.uk/PDB_ISL/ or http://cyrah.ebi.ac.uk:1111/Serv/PDB_ISL/ and can be downloaded from ftp://ftp.ebi.ac.uk/pub/contrib/jong/PDB_ISL/ CONTACT: sat@mrc-lmb.cam.ac.uk and jong@ebi.ac.uk

4.
Analysis of 16S rRNA sequences retrieved as cDNA (16S rcDNA) from the Octopus Spring cyanobacterial mat has permitted phylogenetic characterization of some uncultivated community members, expanding our knowledge of diversity within this microbial community. Two new cyanobacterial 16S rRNA sequences were discovered, raising to four the number of cyanobacterial sequence types known to occur in the mat. None of the sequences found is that of the cultivated thermophilic cyanobacterium Synechococcus lividus. A new 16S rRNA sequence characteristic of green nonsulfur bacteria and their relatives was discovered, raising to two the number of such sequences known to exist in the mat. Both are unique among the 16S rRNA sequences of cultivated members of this group, including an Octopus Spring isolate of Chloroflexus aurantiacus and Heliothrix oregonensis, whose sequences we report herein. Two spirochete-like 16S rRNA sequences were discovered. One can be placed in the leptospira subdivision of the spirochete group, but the other has such a loose affiliation with the spirochete group that it might actually belong to an as yet unrecognized subdivision or even to a new eubacterial line of descent.

6.
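Computer-based sequence analysis of nucleic acids and proteins is a relatively recent development in molecular biology, and it is increasingly used to study the large body of accumulated sequence data. A protein functional domain is a structural domain within a protein molecule that folds independently into a defined structure and carries out a specific function; all molecules sharing the same class of functional domain are collectively termed a protein superfamily. Through analysis of the immunoglobulin (Ig) superfamily and the sequences of its functional domains, this paper establishes a method for detecting protein functional domains by pattern-matching analysis of the residue composition of the domain's conserved segments. The method first identifies the conserved segments of a given class of functional domain by multiple sequence alignment, then statistically analyses the amino acid residue composition at each position of the known conserved segments, and finally calculates the statistical significance of the residue composition of a query sequence by matching it against these statistics, thereby determining whether the functional domain is present. The advantage of the method is that it can not only detect known molecules that possess a given class of functional domain but may also discover new molecules containing that domain, allowing their function to be inferred.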
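The three steps described in this abstract (collect aligned conserved segments, tabulate residue composition per position, score a query against those statistics) can be sketched as follows. The abstract does not state which statistic the authors use, so the log-odds scoring against a uniform background, the pseudocount, the shuffle-based significance estimate and all names below are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(segments, pseudocount=1.0):
    """Per-position log-odds of each residue versus a uniform background.

    `segments` are ungapped, equal-length conserved segments taken from a
    multiple sequence alignment.
    """
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("aligned segments must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in segments)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, query):
    """Slide the profile along the query; return (best score, start index)."""
    width = len(profile)
    if len(query) < width:
        raise ValueError("query is shorter than the profile")
    scored = [
        # Residues outside the 20-letter alphabet contribute a neutral 0.0.
        (sum(col.get(query[i + k], 0.0) for k, col in enumerate(profile)), i)
        for i in range(len(query) - width + 1)
    ]
    return max(scored)


def empirical_p_value(profile, query, shuffles=200, seed=0):
    """Fraction of shuffled queries whose best window scores at least as well."""
    rng = random.Random(seed)
    observed, _ = best_window(profile, query)
    residues = list(query)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(residues)
        if best_window(profile, "".join(residues))[0] >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (shuffles + 1)
```

Shuffling the query preserves its overall composition, so the resulting estimate reflects how unusual the positional arrangement of residues is rather than the composition alone; whether that matches the significance measure used in the paper is not stated in the abstract.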

7.
DNA binding sites: representation and discovery (Cited by: 60; self-citations: 0; citations by others: 60)
The purpose of this article is to provide a brief history of the development and application of computer algorithms for the analysis and prediction of DNA binding sites. This problem can be conveniently divided into two subproblems. The first is, given a collection of known binding sites, develop a representation of those sites that can be used to search new sequences and reliably predict where additional binding sites occur. The second is, given a set of sequences known to contain binding sites for a common factor, but not knowing where the sites are, discover the location of the sites in each sequence and a representation for the specificity of the protein.

8.
Ten new wheat γ-gliadin gene sequences are reported and an analysis of γ-gliadin gene family structure is carried out using all known γ-gliadin sequences. The new sequences comprise four genomic clones with significantly more flanking DNA than previously reported, and six cDNA clones from a wheat endosperm EST project. Analysis of extended flanking DNA from the genomic clones indicates the limits of conservation of γ-gliadin DNA sequence that are similar to those previously found with other gliadin and glutenin genes and that are theorized to define the DNA sequence necessary for gene control. Most of the flanking DNA is not homologous to any reported DNA sequence, and one flanking region contains the first MITE-like (miniature inverted transposable element) DNA sequence associated with gliadin genes. About a quarter of the encoded polypeptides would contain a free cysteine residue – an observation that may relate to reports that at least some gliadins can participate in wheat endosperm glutenin polymer formation. The new sequences represent both genes closely related to those previously reported and a new sub-class of γ-gliadins.

10.
A multiple sequence alignment algorithm is described that uses a dynamic programming-based pattern construction method to align a set of homologous sequences based on their common pattern of conserved sequence elements. This pattern-induced multi-sequence alignment (PIMA) algorithm can employ secondary-structure dependent gap penalties for use in comparative modelling of new sequences when the three-dimensional structure of one or more members of the same family is known. We show that the use of secondary structure information can significantly improve the accuracy of aligning structure boundaries in a set of homologous sequences even when the structure of only one member of the family is known.

11.
An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.

12.
Transmembrane helices are the most readily predictable secondary structure components of proteins. They can be predicted to a high degree of accuracy in a variety of ways. Many of these methods compare new sequence data with the sequence characteristics of known transmembrane domains. However, the known transmembrane sequences are not necessarily representative of a particular organism. We attempt to demonstrate that parameters optimized for the known transmembrane domains are far from optimal when predicting transmembrane regions in a given genome. In particular, we have tested the effect of nucleotide bias upon the composition and hence the prediction characteristics of transmembrane helices. Our analysis shows that nucleotide bias of a genome has a strong and predictable influence upon the occurrences of several of the most important hydrophobic amino acids found within transmembrane helices. Thus, we show that nucleotide bias should be taken into account when determining putative transmembrane domains from sequence data.

13.
C Sander, R Schneider. Proteins, 1991, 9(1): 56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.

14.
Alignment of protein sequences by their profiles (Cited by: 7; self-citations: 0; citations by others: 7)
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.

16.
High-efficiency thermal asymmetric interlaced (HE-TAIL) PCR is a modified thermal asymmetric interlaced (TAIL) method for finding unknown genomic DNA sequences adjacent to known sequences in GC-rich plant DNA. Necessary modifications to obtain high-efficiency amplification of flanking sequences are the inclusion of 2 control reactions during tertiary cycling and the design of long gene-specific primers, which can be used during single-step annealing-extension PCR. The modified protocol is suitable to walk from short known sequences, such as sequence-tagged sites (STS), expressed sequence tags (EST), or short exon sequences, and enables researchers to clone full-length open reading frames (ORFs) without library screening. Moreover, the HE-TAIL method can be used to identify DNA sequences flanking T-DNA insertions or to isolate promoter regions. Although individual steps are limited to about 4 kb, multiple steps can be done to walk upstream or downstream of known regions.

17.
A sequence-dependent property of any long polynucleotide can be related to the properties of a limited number of other polynucleotides. The properties of the constituent sequence subunits need not be individually known. Only the properties of a set of linearly independent sequence combinations in polymers need be known. The number of independent sequences depends on the number of neighboring bases (or base pairs) contributing to the property and on the number of different bases (or base pairs) allowed. General formulae are derived. If only nearest neighbors contribute to a property, there are 13 linearly independent single-strand sequences with four different bases and there are 8 linearly independent double-strand sequences with two different base pairs. The study of independent sequences in polymers should be especially useful with double-stranded polynucleotides. It should be possible by this polymer approach to estimate nearest-neighbor frequencies from the circular dichroism of double strands and also to investigate the relation between solution conformation and double-strand sequence.

18.
MOTIVATION: Overlapping gene coding sequences (CDSs) are particularly common in viruses but also occur in more complex genomes. Detecting such genes with conventional gene-finding algorithms can be difficult for several reasons. If an overlapping CDS is on the same read-strand as a known CDS, then there may not be a distinct promoter or mRNA. Furthermore, the constraints imposed by double-coding can result in atypical codon biases. However, these same constraints lead to particular mutation patterns that may be detectable in sequence alignments. RESULTS: In this paper, we investigate several statistics for detecting double-coding sequences with pairwise alignments, including a new maximum-likelihood method. We also develop a model for double-coding sequence evolution. Using simulated sequences generated with the model, we characterize the distribution of each statistic as a function of sequence composition, length, divergence time and double-coding frame. Using these results, we develop several algorithms for detecting overlapping CDSs. The algorithms were tested on known overlapping CDSs and other overlapping open reading frames (ORFs) in the hepatitis B virus (HBV), Escherichia coli and Salmonella typhimurium genomes. The algorithms should prove useful for detecting novel overlapping genes, especially short coding ORFs in viruses. AVAILABILITY: Programs may be obtained from the authors. SUPPLEMENTARY INFORMATION: http://biochem.otago.ac.nz/double.html.

20.
Polyoma virus. The early region and its T-antigens. (Cited by: 12; self-citations: 2; citations by others: 10)
The DNA sequence of the early coding region of polyoma virus is presented. It consists of 2739 nucleotides. The sequence predicts that more than one reading frame can be used to code for the three known polyoma virus early proteins (designated small, middle and large T-antigens). From the DNA sequence, the 'splicing' signals used in the processing of viral RNA to functional messenger RNAs can be predicted, as well as the sizes and sequences of the three proteins. Other unusual aspects of the DNA sequence are noted. Comparisons are made between the DNA sequences and the predicted amino acid sequences of the respective large T-antigens of polyoma virus and the related virus Simian Virus (SV) 40.
