Similar literature
Found 20 similar documents (search time: 15 ms)
1.
SUMMARY: NdPASA is a web server specifically designed to optimize sequence alignment between distantly related proteins. The program integrates structure information of the template sequence into a global alignment algorithm by employing neighbor-dependent propensities of amino acids as a unique parameter for alignment. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. NdPASA is most effective in aligning homologous proteins sharing a low percentage of sequence identity. The server is designed to aid homologous protein structure modeling. A PSI-BLAST search engine was implemented to help users identify the template candidates that are most appropriate for modeling the query sequences.

2.
Sequences in public databases may contain a number of sequencing errors. A double binomial model describing the distribution of indel-excluded similarity coefficients (S) among repeatedly sequenced 16S rRNA was previously developed, and it produced a confidence interval of S useful for testing sequence identity among sequences of 400-bp length. We characterized patterns in sequencing errors found in nearly complete 16S rRNA sequences of Vibrionaceae as highly variable in reported sequence length and containing a small number of indels. To accommodate these characteristics, a simple binomial model for the distribution of the similarity coefficient (H) that included indels was derived from the double binomial model for S. The model showed good fit to empirical data. By using either a pre-determined or a bootstrap-estimated standard probability of base matching, we were able to use the exact binomial test to determine the relative level of sequencing error for a given pair of duplicated sequences. A limitation of the method is the requirement that duplicated sequences for the same template sequence be paired, but this can be overcome by using only conserved regions of 16S rRNA sequences and pairing a given sequence with its highest scoring BLAST search hit from the nr database of GenBank.
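
The exact binomial test mentioned in this abstract can be illustrated with a minimal sketch. It assumes two pre-aligned sequences of equal length, counts gap positions as mismatches (so indels lower the score), and uses a standard base-matching probability of 0.99; the function names, the example sequences and that probability are illustrative assumptions, not values from the paper.

```python
from math import comb

def similarity(a: str, b: str) -> tuple[int, int]:
    """Count matching positions between two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as matches, so indels lower the score.
    Returns (matches, aligned_length).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, len(a)

def binom_lower_tail(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical pair of duplicated sequences (one gap, one substitution).
matches, n = similarity("ACGTACGTAC", "ACGTAC-TAA")

# One-sided p-value: how likely is it to see this few matches (or fewer)
# if each base matched independently with the assumed probability 0.99?
p_value = binom_lower_tail(matches, n, p=0.99)
print(matches, n, p_value)
```

A small p-value suggests that the pair contains more sequencing error than the assumed standard matching probability allows for.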

3.
We investigated the effects of ssDNA template sequence on both primer synthesis and NTP hydrolysis by herpes simplex virus 1 helicase-primase. Primer synthesis was found to be profoundly dependent upon template sequence. Although not absolutely required, an important sequence feature for significant production of longer primers (beyond four nucleotides in length) is a deoxyguanylate-pyrimidine-pyrimidine (3'-G-pyr-pyr-5') triplet in the template. The deoxyguanylate serves both to direct primase to initiate synthesis opposite the adjacent pyrimidine and to dramatically increase primer length. While primase synthesized significantly more long primers on those templates containing a G-pyr-pyr triplet, the enzyme still synthesized massive quantities of di- and trinucleotides on many templates containing this sequence. Varying the sequences around the G-pyr-pyr recognition sequence dramatically altered both the rate of primer synthesis and the fraction of primers longer than four nucleotides, indicating that primase must interact with both the G-pyr-pyr and flanking sequences in the template. In contrast to the large effects that varying the template sequence had on primase activity, ssDNA-dependent NTPase activity was essentially unaffected by changes in template sequence, including the presence or absence of the G-pyr-pyr trinucleotide. In addition to hydrolyzing NTPs, the NTPase could also hydrolyze the 5'-terminal phosphate from newly synthesized primers.
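
As a small illustration of the 3'-G-pyr-pyr-5' feature described above, the sketch below scans a DNA template for that triplet. It assumes the template string is written in the 3'→5' direction and that "pyrimidine" means C or T; the function names and the example sequence are hypothetical.

```python
import re

# G followed by two pyrimidines (C or T); the lookahead allows overlapping hits.
_G_PYR_PYR = re.compile(r"(?=G[CT][CT])")

def find_g_pyr_pyr(template_3to5: str) -> list[int]:
    """Return 0-based start positions of G-pyr-pyr triplets in a template
    written in the 3'->5' direction."""
    return [m.start() for m in _G_PYR_PYR.finditer(template_3to5.upper())]

def find_g_pyr_pyr_from_5to3(template_5to3: str) -> list[int]:
    """Same search for a template written 5'->3': reverse the string first
    (reverse only, not complement), then report positions on the reversed string."""
    return find_g_pyr_pyr(template_5to3[::-1])

hits = find_g_pyr_pyr("AAGCTAGTTA")
print(hits)
```

Finding the triplet only flags a candidate initiation site; according to the abstract, the flanking sequence also strongly affects primer synthesis.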

4.
Wang J  Feng JA 《Proteins》2005,58(3):628-637
Sequence alignment has become one of the essential bioinformatics tools in biomedical research. Existing sequence alignment methods can produce reliable alignments for homologous proteins sharing a high percentage of sequence identity. The performance of these methods deteriorates sharply for sequence pairs sharing less than 25% sequence identity. We report here a new method, NdPASA, for pairwise sequence alignment. This method employs neighbor-dependent propensities of amino acids as a unique parameter for alignment. The values of neighbor-dependent propensity measure the preference of an amino acid pair for adopting a particular secondary structure conformation. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. Using superpositions of homologous proteins derived from the PSI-BLAST analysis and the Structural Classification of Proteins (SCOP) classification of a nonredundant Protein Data Bank (PDB) database as a gold standard, we show that NdPASA has improved pairwise alignment. Statistical analyses of the performance of NdPASA indicate that the introduction of sequence patterns of secondary structure derived from neighbor-dependent sequence analysis clearly improves alignment performance for sequence pairs sharing less than 20% sequence identity. For sequence pairs sharing 13-21% sequence identity, NdPASA improves the accuracy of alignment over the conventional global alignment (GA) algorithm using the BLOSUM62 matrix by an average of 8.6%. NdPASA is most effective for aligning query sequences with template sequences whose structure is known. NdPASA can be accessed online at http://astro.temple.edu/feng/Servers/BioinformaticServers.htm.
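
The general idea of adding a structure-dependent term to a global alignment score can be sketched as follows. This is not the NdPASA implementation: it is a plain Needleman-Wunsch score computation with a simple match/mismatch scheme standing in for BLOSUM62, plus a bonus looked up from a hypothetical `propensity` table keyed by the template's secondary-structure state and the query residue pair. All names, scores and table values are assumptions made for illustration.

```python
def global_align_score(query, template, template_ss, propensity,
                       match=2, mismatch=-1, gap=-2, weight=1.0):
    """Needleman-Wunsch global alignment score with an added
    secondary-structure term.

    template_ss holds one structure letter per template residue (e.g. 'H', 'E', 'C').
    propensity maps (structure letter, query residue pair) -> bonus; it is a
    made-up stand-in for neighbor-dependent propensities.
    """
    n, m = len(query), len(template)
    if len(template_ss) != m:
        raise ValueError("template_ss must have one state per template residue")

    # dp[i][j] = best score for aligning query[:i] with template[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            base = match if query[i - 1] == template[j - 1] else mismatch
            # Query residue plus its right-hand neighbor (one residue at the end).
            pair = query[i - 1:i + 1]
            bonus = weight * propensity.get((template_ss[j - 1], pair), 0.0)
            dp[i][j] = max(dp[i - 1][j - 1] + base + bonus,  # align the residues
                           dp[i - 1][j] + gap,               # gap in template
                           dp[i][j - 1] + gap)               # gap in query
    return dp[n][m]

# Hypothetical propensity: the pair 'AL' is favoured in helical ('H') positions.
toy_table = {("H", "AL"): 0.5}
with_bonus = global_align_score("AL", "AL", "HH", toy_table)
without_bonus = global_align_score("AL", "AL", "HH", {})
print(with_bonus, without_bonus)
```

Setting `weight` to zero recovers the conventional global alignment score, which makes it easy to compare the two scoring schemes on the same sequence pair.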

5.
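Interactions between spliced-out introns and their corresponding mRNA sequences play a very important role in the regulation of gene expression. Using ribonucleoprotein gene sequences from 27 species, we applied Smith-Waterman local alignment to obtain the best-matching fragments between exon junction sequences and their corresponding intron sequences, and analysed the distribution of matching frequency along the exon junction sequences and the sequence features of the matched fragments. In some lower eukaryotes the matching frequency in the EJC-binding region was markedly lower than in other regions, and in all species the sequence composition of the EJC-binding region showed relatively low structural order. The average length and pairing-rate distribution of the best-matching fragments were the same as the binding characteristics of siRNA and miRNA. We infer that the EJC and introns both compete and cooperate when binding to exon sequences, and that the middle portion of the intron sequence plays an important role in the regulation of gene expression.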
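
For readers unfamiliar with Smith-Waterman local alignment, the method named in this abstract, here is a minimal sketch that returns the best local score and the matched fragments. The scoring values (match 2, mismatch -1, linear gap -2) and the example sequences are assumptions; the abstract does not state the scoring scheme that was used.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, fragment_of_a, fragment_of_b) for the best
    local alignment of a and b, using a linear gap penalty."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are floored at zero so alignments can restart.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    frag_a, frag_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            frag_a.append(a[i - 1]); frag_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            frag_a.append(a[i - 1]); frag_b.append("-"); i -= 1
        else:
            frag_a.append("-"); frag_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(frag_a)), "".join(reversed(frag_b))

result = smith_waterman("TTACGTAA", "GGACGTCC")
print(result)
```

To look for base-pairing between an intron and an exon junction sequence rather than sequence identity, one of the two inputs would first be converted to its reverse complement; that step is omitted here.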

6.
7.
Schneider and Davison [Schneider, S.M., Davison, M., 2005. Demarcated response sequences and the generalised matching law. Behav. Proc. 70, 51–61] showed that the generalised matching law applied to concurrent free-operant two-response sequences. When sufficient temporal spacing is required between the responses, however, neither the response-level nor the sequence-level forms of the generalised matching law provide good fits. An alternative “two-stage sensitivity” model with fewer free parameters features two types of sensitivity to the reinforcement contingencies on sequences. When temporal spacing between the responses is long, the “response distribution sensitivity” parameter describes sensitivity only of the individual responses to the sequence-level contingencies. At a threshold level, this sensitivity reaches a maximum. When spacing is shorter than threshold, the “response order sensitivity” parameter reflects a new sensitivity to the order of the responses within sequences. As this sensitivity approaches its maximum, sequence matching is achieved. For both stages, a changeover parameter describes bias against sequences that require changeovers between the two responses. The model fit data ranging from near-response matching with long minimum inter-response times (IRTs) to sequence matching with no minimum IRTs, using two species and a variety of sequence reinforcer distributions. Rats differed from pigeons in achieving sequence matching only at a nonzero minimum IRT. In a comparison based on pigeon data with no minimum IRT, the two-stage sensitivity model was more efficient than the generalised matching law according to the Akaike criterion. The logic of the model suggests a new way of understanding the mechanisms underlying behavioural units.
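
For reference, the generalised matching law and the Akaike criterion named in this abstract are usually written as below. The symbols are conventional choices rather than notation taken from the abstract: $B_1, B_2$ are the response (or sequence) rates on the two alternatives, $R_1, R_2$ the obtained reinforcer rates, $a$ the sensitivity, $b$ the bias, $k$ the number of free parameters and $\hat{L}$ the maximised likelihood. The two-stage sensitivity model itself is not specified in the abstract and is not reproduced here.

```latex
\log\frac{B_1}{B_2} = a\,\log\frac{R_1}{R_2} + \log b
\qquad\qquad
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

A model with a lower AIC is preferred, which is how a model with fewer free parameters can be judged more efficient even when its raw fit is similar.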

8.
Expressed sequence tags (ESTs) represent 500-1000-bp-long sequences corresponding to mRNAs derived from different sources (cell lines, tissues, etc.). The human EST database contains over 8,000,000 sequences, with over 4,000,000,000 total nucleotides. RNA molecules are transcribed from a genomic DNA template; therefore, all ESTs should match corresponding genomes. Nevertheless, we have found in the human EST database approximately 11,000 ESTs not matching sequences in the human genome database. The presence of "trash" ESTs (TESTs) in the EST database could result from DNA or RNA contamination of the laboratory equipment, tissues, or cell lines. TESTs could also represent sequences from unidentified human genes or from species inhabiting the human body. Here, we attempt to identify the sources of human EST database contaminations. In particular, we discuss systematic contamination of the mammalian EST databases with sequences of plants.

9.
In the attempt to explore complex bacterial communities of environmental samples, primers hybridizing to phylogenetically highly conserved regions of 16S rRNA genes are widely used, but differential amplification is a recognized problem. The biases associated with preferential amplification of multitemplate PCR were investigated using 'universal' bacteria-specific primers, focusing on the effect of primer mismatch, annealing temperature and PCR cycle number. The distortion of the template-to-product ratio was measured using predefined template mixtures and environmental samples by terminal restriction fragment length polymorphism analysis. When a 1:1 genomic DNA template mixture of two strains was used, primer mismatches inherent in the 63F primer presented a serious bias, showing preferential amplification of the template containing the perfectly matching sequence. The extent of the preferential amplification showed an almost exponential relation with increasing annealing temperature from 47 to 61 °C. No negative effect of the various annealing temperatures was observed with the 27F primer, which has no mismatches with the target sequences. The number of PCR cycles had little influence on the template-to-product ratios. As a result of additional tests on environmental samples, the use of a low annealing temperature is recommended in order to significantly reduce preferential amplification while maintaining the specificity of PCR.

10.
The RNA world hypothesis states that the early evolution of life went through a stage where RNA served as genome and as catalyst. The replication of RNA world organisms would have been facilitated by ribozymes that catalyze RNA polymerization. To recapitulate an RNA world in the laboratory, a series of RNA polymerase ribozymes was developed previously. However, these ribozymes have a polymerization efficiency that is too low for self-replication, and the most efficient ribozymes prefer one specific template sequence. The limiting factor for polymerization efficiency is the weak sequence-independent binding to its primer/template substrate. Most of the known polymerase ribozymes bind an RNA heptanucleotide to form the P2 duplex on the ribozyme. By modifying this heptanucleotide, we were able to significantly increase polymerization efficiency. Truncations at the 3'-terminus of this heptanucleotide increased full-length primer extension 10-fold on a specific template sequence. In contrast, polymerization on several different template sequences was improved dramatically by replacing the RNA heptanucleotide with DNA oligomers containing randomized sequences of 15 nt. The presence of G and T in the random sequences was sufficient for this effect, with an optimal composition of 60% G and 40% T. Our results indicate that these DNA sequences function by establishing many weak and nonspecific base-pairing interactions to the single-stranded portion of the template. Such low-specificity interactions could have had important functions in an RNA world.

11.
Page RD 《Nucleic acids research》2000,28(20):3839-3845
Comparative analysis is the preferred method of inferring RNA secondary structure, but its use requires considerable expertise and manual effort. As the importance of secondary structure for accurate sequence alignment and phylogenetic analysis becomes increasingly realised, the need for secondary structure models for diverse taxonomic groups becomes more pressing. The number of available structures bears little relation to the relative diversity or importance of the different taxonomic groups. Insects, for example, comprise the largest group of animals and yet are very poorly represented in secondary structure databases. This paper explores the utility of maximum weighted matching (MWM) to help automate the process of comparative analysis by inferring secondary structure for insect mitochondrial small subunit (12S) rRNA sequences. By combining information on correlated changes in substitutions and helix dot plots, MWM can rapidly generate plausible models of secondary structure. These models can be further refined using standard comparative techniques. This paper presents a secondary structure model for insect 12S rRNA based on an alignment of 225 insect sequences and an alignment for 16 exemplar insect sequences. This alignment is used as a template for a web server that automatically generates secondary structures for insect sequences.

12.
This article describes a method for determining whether a particular nucleic acid sequence is present in a sample and for discriminating between any two nucleic acid sequences if such sequences differ only by a single nucleotide. The method entails extension of a novel two-component primer on templates that may or may not include a target nucleic acid sequence. The 3′ portion of the primer is complementary to a portion of the template adjacent to the target sequence (for example, the polymorphic nucleotide). The 5′ portion of the primer is complementary to a different preselected nucleic acid sequence. Extension of the 3′ portion of the primer with a labeled deoxynucleoside triphosphate yields a labeled extension product, but only if the template includes the target sequence. The presence of such a labeled primer-extension product is detected by hybridization of the 5′ portion to the preselected sequence. The preselected sequence is immobilized on a solid support. The method has been applied to genotyping individuals for the two-allele polymorphism of the human tyrosinase gene.

13.
Background: Annotation of sequences that share little similarity to sequences of known function remains a major obstacle in genome annotation. Some of the best methods of detecting remote relationships between protein sequences are based on matching sequence profiles. We analyse the superfamily specific performance of sequence profile-profile matching. Our benchmark consists of a set of 16 protein superfamilies that are highly diverse at the sequence level. We relate the performance to the number of sequences in the profiles, the profile diversity and the extent of structural conservation in the superfamily.

14.
In vitro studies have demonstrated that linear duplex, protein-free DNA molecules containing an inverted terminal repeat (ITR) sequence of the PRD1 genome at one end can undergo replication by a protein-primed mechanism. No DNA replication was observed when the ITR sequence was deleted or was not exposed at the terminus of the template DNA. We have determined the minimal origin of replication by analyzing the template activity of various deletion derivatives. Our results showed that the terminal 20 base-pairs of the ITR are required for efficient in vitro DNA replication. We have found that, within the minimal replication origin region, there are complementary sequences. A site-specific mutagenesis analysis showed that most of the point mutations in the complementary sequences markedly reduced the template activity. Analyses of the results obtained with synthetic oligonucleotides have revealed that the specificity of the replication origin is strand specific and that, even on a single-stranded template, a particular DNA sequence including a 3'-terminal C residue is required for the initiation of PRD1 DNA replication in vitro.

15.
Wu HY  Brian DA 《Journal of virology》2007,81(7):3206-3215
Coronaviruses have a positive-strand RNA genome and replicate through the use of a 3' nested set of subgenomic mRNAs each possessing a leader (65 to 90 nucleotides [nt] in length, depending on the viral species) identical to and derived from the genomic leader. One widely supported model for leader acquisition states that a template switch takes place during the generation of negative-strand antileader-containing templates used subsequently for subgenomic mRNA synthesis. In this process, the switch is largely driven by canonical heptameric donor sequences at intergenic sites on the genome that match an acceptor sequence at the 3' end of the genomic leader. With experimentally placed 22-nt-long donor sequences within a bovine coronavirus defective interfering (DI) RNA we have shown that matching sites occurring anywhere within a 65-nt-wide 5'-proximal genomic acceptor hot spot (nt 33 through 97) can be used for production of templates for subgenomic mRNA synthesis from the DI RNA. Here we report that with the same experimental approach, template switches can be induced in trans from an internal site in the DI RNA to the negative-strand antigenome of the helper virus. For these, a 3'-proximal 89-nt acceptor hot spot on the viral antigenome (nt 35 through 123), largely complementary to that described above, was found. Molecules resulting from these switches were not templates for subgenomic mRNA synthesis but, rather, ambisense chimeras potentially exceeding the viral genome in length. The results suggest the existence of a coronavirus 5'-proximal partially double-stranded template switch-facilitating structure of discrete width that contains both the viral genome and antigenome.

16.
TACO is a template library that implements higher-order parallel operations on distributed object sets by means of reusable topology classes and C++ function templates. In this paper we discuss an experimental application that exploits TACO's distributed object groups and collective operations for computing the similarity between groups of molecular sequences, a computationally intensive core problem in molecular biology research. In particular, we show how TACO's distributed collections can be conveniently combined with well-known concepts found in the C++ standard template library (STL) to solve matching and sorting problems effectively on distributed hardware platforms. The resulting implementation is concise and gives excellent parallel performance on PC and workstation clusters.

17.
Topology fingerprint approach to the inverse protein folding problem.   Cited by: 19 (self-citations: 0, citations by others: 19)
We describe the most general solution to date of the problem of matching globular protein sequences to the appropriate three-dimensional structures. The screening template, against which sequences are tested, is provided by a protein "structural fingerprint" library based on the contact map and the buried/exposed pattern of residues. Then, a lattice Monte Carlo algorithm validates or dismisses the stability of the proposed fold. Examples of known structural similarities between proteins having weakly related or unrelated sequences, such as the globins and phycocyanins, the eight-member alpha/beta fold of triose phosphate isomerase and even a close structural equivalence between azurin and immunoglobulins, are found.

18.
19.
Next generation sequencing technologies, like ultra-deep pyrosequencing (UDPS), allow detailed investigation of complex populations, like RNA viruses, but their utility is limited by errors introduced during sample preparation and sequencing. By tagging each individual cDNA molecule with barcodes, referred to as Primer IDs, before PCR and sequencing, these errors could theoretically be removed. Here we evaluated the Primer ID methodology on 257,846 UDPS reads generated from an HIV-1 SG3Δenv plasmid clone and plasma samples from three HIV-infected patients. The Primer ID consisted of 11 randomized nucleotides (4,194,304 combinations) in the primer for cDNA synthesis, which introduced a unique sequence tag into each cDNA molecule. Consensus template sequences were constructed for reads with Primer IDs that were observed three or more times. Despite high numbers of input template molecules, the number of consensus template sequences was low. With 10,000 input molecules for the clone, as few as 97 consensus template sequences were obtained due to a highly skewed frequency of resampling. Furthermore, the number of sequenced templates was overestimated due to PCR errors in the Primer IDs. Finally, some consensus template sequences were erroneous due to hotspots for UDPS errors. The Primer ID methodology has the potential to provide highly accurate deep sequencing. However, it is important to be aware that there are remaining challenges with the methodology. In particular, it is important to find ways to obtain a more even frequency of resampling of template molecules as well as to identify and remove artefactual consensus template sequences that have been generated by PCR errors in the Primer IDs.
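
The consensus step described in this abstract can be sketched roughly as follows. The sketch checks the stated number of Primer ID combinations (4 bases at 11 positions), groups reads by Primer ID, keeps IDs observed at least three times and takes a per-position majority vote. The read format, the function name and the example data are hypothetical, and the sketch assumes reads within a group are already aligned to each other.

```python
from collections import Counter, defaultdict

# 11 randomized positions, 4 possible bases each.
N_PRIMER_IDS = 4 ** 11

def consensus_templates(reads, min_count=3):
    """Build one consensus sequence per Primer ID.

    reads: iterable of (primer_id, sequence) pairs.
    Only Primer IDs seen at least min_count times are kept; the consensus
    is a per-position majority vote over the shortest read length in the group.
    """
    groups = defaultdict(list)
    for primer_id, seq in reads:
        groups[primer_id].append(seq)

    consensus = {}
    for primer_id, seqs in groups.items():
        if len(seqs) < min_count:
            continue  # too few resampled reads to correct errors
        length = min(len(s) for s in seqs)
        consensus[primer_id] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus

# Hypothetical reads: the first ID is seen three times (one read has an error),
# the second only twice, so it is discarded.
example_reads = [
    ("ACGTACGTACG", "ACGT"),
    ("ACGTACGTACG", "ACGT"),
    ("ACGTACGTACG", "ACTT"),
    ("TTTTTTTTTTT", "GGGG"),
    ("TTTTTTTTTTT", "GGGG"),
]
result = consensus_templates(example_reads)
print(N_PRIMER_IDS, result)
```

This simple grouping treats every distinct Primer ID as a distinct template, so it shares the weakness noted in the abstract: a PCR error inside the Primer ID itself creates a spurious extra group.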

20.
MOTIVATION: The number of known protein sequences is about a thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog could be identified. The key starting point in classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases, and these errors are the main causes of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling, which does not require the implicit alignment - the model building phase explores geometric, evolutionary and physical properties of a template (or templates). RESULTS: The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced representation search engine, CABS, to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could be easily combined with other efficient modeling tools such as Rosetta, UNRES and others.
