Similar literature
Found 20 similar articles (search time: 859 ms)
1.
SUMMARY: We describe an extension to the Homologous Structure Alignment Database (HOMSTRAD; Mizuguchi et al., Protein Sci., 7, 2469-2471, 1998a) to include homologous sequences derived from the protein families database Pfam (Bateman et al., Nucleic Acids Res., 28, 263-266, 2000). HOMSTRAD is integrated with the server FUGUE (Shi et al., submitted, 2001) for recognition and alignment of homologues, benefitting from the combination of abundant sequence information and accurate structure-based alignments. AVAILABILITY: The HOMSTRAD database is available at: http://www-cryst.bioc.cam.ac.uk/homstrad/. Query sequences can be submitted to the homology recognition/alignment server FUGUE at: http://www-cryst.bioc.cam.ac.uk/fugue/.

2.
SUMMARY: The DBAli database includes approximately 35000 alignments of pairs of protein structures from SCOP (Lo Conte et al., Nucleic Acids Res., 28, 257-259, 2000) and CE (Shindyalov and Bourne, Protein Eng., 11, 739-747, 1998). DBAli is linked to several resources, including Compare3D (Shindyalov and Bourne, http://www.sdsc.edu/pb/software.htm, 1999) and ModView (Ilyin and Sali, http://guitar.rockefeller.edu/ModView/, 2001) for visualizing sequence alignments and structure superpositions. A flexible search of DBAli by protein sequence and structure properties allows construction of subsets of alignments suitable for a number of applications, such as benchmarking of sequence-sequence and sequence-structure alignment methods under a variety of conditions. AVAILABILITY: http://guitar.rockefeller.edu/DBAli/

3.
The score statistics of a recently introduced 'hybrid alignment' algorithm are studied in detail numerically. An extensive survey across the 2216 models of protein domains contained in the Pfam v5.4 database (Bateman et al., Nucleic Acids Res., 28, 263-266, 2000) verifies the theoretical predictions: for the position-specific scoring functions used in the Pfam models, the score statistics of hybrid alignment obey the Gumbel distribution, with the key Gumbel parameter lambda taking on the asymptotic value 1 universally for all models. Thus, the use of hybrid alignment eliminates the time-consuming computer simulations normally needed to assign p-values to alignment scores, freeing the users to experiment with different scoring parameters and functions. The performance of the hybrid algorithm in detecting sequence homology is also studied. For protein sequences from the SCOP database (Murzin et al., J. Mol. Biol., 247, 536-540, 1995) using uniform scoring functions, the performance is found to be comparable to the best of the existing methods. Preliminary results using the PfamA database suggest that the hybrid algorithm achieves performance similar to that of existing methods for position-specific scoring systems as well. Hybrid alignment is thereby established as a high performance alignment algorithm with well-characterized, universal statistics.
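As a rough illustration of why a universal lambda removes the need for simulations in entry 3, the sketch below evaluates a Gumbel tail probability directly from a score. It assumes the common extreme-value form P(S >= s) = 1 - exp(-K e^(-lambda s)); the prefactor K (here `k`, a hypothetical parameter that would absorb the search-space size) and the function name are illustrative assumptions, not taken from the abstract.

```python
import math

def gumbel_p_value(score: float, lam: float = 1.0, k: float = 1.0) -> float:
    """Tail probability P(S >= score) under a Gumbel distribution.

    lam: Gumbel decay parameter (1.0 is the asymptotic value quoted in the abstract).
    k:   hypothetical prefactor standing in for search-space size (an assumption).
    """
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision when x is tiny.
    return -math.expm1(-k * math.exp(-lam * score))

if __name__ == "__main__":
    for s in (5.0, 10.0, 20.0):
        print(f"score={s:5.1f}  p={gumbel_p_value(s):.3e}")
```

With lambda fixed, only the prefactor would need to be known for a given search, which is the practical benefit the abstract points to.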

4.
Detection of functional modules from protein interaction networks   (cited by: 4; self-citations: 0; citations by others: 4)

5.
A database comprising all ligand-binding sites of known structure aligned with all related protein sequences and structures is described. Currently, the database contains approximately 50000 ligand-binding sites for small molecules found in the Protein Data Bank (PDB). The structure-structure alignments are obtained by the Combinatorial Extension (CE) program (Shindyalov and Bourne, Protein Eng., 11, 739-747, 1998) and sequence-structure alignments are extracted from the ModBase database of comparative protein structure models for all known protein sequences (Sanchez et al., Nucleic Acids Res., 28, 250-253, 2000). It is possible to search for binding sites in LigBase by a variety of criteria. LigBase reports summarize ligand data including relevant structural information from the PDB file, such as ligand type and size, and contain links to all related protein sequences in the TrEMBL database. Residues in the binding sites are graphically depicted for comparison with other structurally defined family members. LigBase provides a resource for the analysis of families of related binding sites.

6.
Where differences have been reported between tumor and normal mitochondrial DNA (mtDNA), they have generally involved limited modifications of the genome (Taira et al., Nucleic Acids Res. 11:1635, 1983; Shay and Werbin, Mutat. Res. 186:149, 1987). However, Corral et al. (Nucleic Acids Res. 16:10935, 1988; 17:5191, 1989) observed recombination between cytochrome oxidase subunit I (COI) and NADH dehydrogenase subunit 6 (ND6), two genes normally on opposite sides of the circular mitochondrial genome. In rat hepatoma mtDNA, COI and ND6 were reported to be separated by only 230 base pairs (Corral et al., 1988, 1989). We have performed RFLP analysis on mtDNA from normal rat livers and rat hepatomas, using COI and ND6 probes. Additional experiments compared end-labeled DNA fragments produced by EcoRI and HindIII digestion of mtDNA. These studies failed to provide any evidence for genetic recombination in rat hepatoma mtDNA, even in the same cell line used by Corral et al. Rather, they support the conclusion that mtDNA from tumor and normal tissues exhibits a low degree of heterogeneity.

7.
MUSCLE: multiple sequence alignment with high accuracy and high throughput   (cited by: 32; self-citations: 0; citations by others: 32)
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
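The "fast distance estimation using kmer counting" mentioned in entry 7 can be illustrated with a minimal sketch. This is not MUSCLE's actual distance measure, which the abstract does not specify; it is a generic alignment-free distance based on the fraction of shared k-mers, and the function names and default k = 3 are assumptions chosen for illustration.

```python
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Return 1 - (shared k-mers / k-mers in the shorter sequence).

    An illustrative alignment-free distance, not MUSCLE's own formula.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k residues")
    # Counter '&' is multiset intersection: the minimum count of each k-mer.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom

if __name__ == "__main__":
    print(kmer_distance("ACDEFG", "ACDEFW"))
```

Because no alignment is computed, a distance of this kind costs time roughly linear in sequence length per pair, which is what makes it attractive as a first pass over thousands of sequences.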

8.
The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77-80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73-76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10-14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332-333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45-48] and can be accessed at http://genome-www.stanford.edu/microarray.

9.
The NMR structure of a 31mer RNA constituting a functionally important domain of the catalytic RNase P RNA from Escherichia coli is reported. Severe spectral overlaps of the proton resonances in the natural 31mer RNA (1) were successfully tackled by unique spectral simplifications found in the partially-deuterated 31mer RNA analogue (2) incorporating deuterated cytidines [C5 (>95 atom % 2H), C2' (>97 atom % 2H), C3' (>97 atom % 2H), C4' (>65 atom % 2H) and C5' (>97 atom % 2H)] [for the 'NMR-window' concept see: Földesi,A. et al. (1992) Tetrahedron, 48, 9033; Földesi,A. et al. (1993) J. Biochem. Biophys. Methods, 26, 1; Yamakage,S.-I. et al. (1993) Nucleic Acids Res., 21, 5005; Agback,P. et al. (1994) Nucleic Acids Res., 22, 1404; Földesi,A. et al. (1995) Tetrahedron, 51, 10065; Földesi,A. et al. (1996) Nucleic Acids Res., 24, 1187-1194]. 175 resonances have been assigned out of a total of 235 non-exchangeable proton resonances in (1) in an unprecedented manner in the absence of 13C and 15N labelling. The assignment of 41 of the 175 resonances was accomplished with the help of the deuterated analogue (2). The two stems in the 31mer RNA adopt an A-type RNA conformation and the base-stacking continues from stem I into the beginning of loop I. Long distance cross-strand NOEs showed a structured conformation at the junction between stem I and loop I. The loop I-stem II junction is less ordered and shows structural perturbation at and around the G11-C22 base pair.

10.
A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980), (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980) and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence, it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition.

11.
Thermodynamic parameters for duplex formation were determined from CD melting curves for r(GGACGAGUCC)2 and d(GGACGAGTCC)2, both of which form two consecutive ‘sheared’ A:G base pairs at the center [Katahira et al. (1993) Nucleic Acids Res. 21, 5418–5424; Katahira et al. (1994) Nucleic Acids Res. 22, 2752–2759]. The parameters were also determined for r(GGACUAGUCC)2 and d(GGACTAGTCC)2, where the A:G mismatches are replaced by Watson-Crick A:U(T) base pairs. Thermodynamic properties for duplex formation are compared between the sheared and the Watson-Crick base pairs, and between RNA and DNA. Differences in thermodynamic stability are analyzed and discussed in terms of enthalpy and entropy changes. The characteristic features in CD spectra of RNA and DNA containing the sheared A:G base pairs are also reported.

12.
The DNA sequence spanning coordinates 9.9 to 16.4 kilobases of the lactose transposon Tn951 (Cornelis et al., Mol. Gen. Genet. 160:215-224, 1978) constitutes a transposable element by itself. Unlike Tn951 (Cornelis et al., Mol. Gen. Genet. 184:241-248, 1981), this element, called Tn2501, transposes in the absence of any other transposon. Transposition of Tn2501 proceeds through transient cointegration and duplicates 5 base pairs of host DNA. Tn2501 is flanked by nearly perfect inverted repeats (44 of 48), related to the inverted repeats of Tn21 (Zheng et al., Nucleic Acids Res. 9:6265-6278, 1982). Unlike Tn21, Tn2501 does not confer mercury resistance.

13.
The sequences of the genes coding for M.CviBIII (from virus NC-1A which infects a eukaryotic alga) [Narva et al., Nucleic Acids Res. 15 (1987) 9807-9823] and M.TaqI (from the bacterium Thermus aquaticus) [Slatko et al., Nucleic Acids Res. 15 (1987) 9781-9796] have been determined recently. Both enzymes methylate adenine in the sequence TCGA. We have compared the predicted amino acid sequences of these two methyltransferases (MTases) with each other and with ten other N6 A-MTases and find regions of similarity. M.CviBIII and M.TaqI were most closely related, followed by M.PaeR7, whose recognition sequence (CTCGAG) contains the M.TaqI/M.CviBIII recognition sequence TCGA, and M.PstI, whose recognition sequence is CTGCAG. All of the N6-MTases contain the sequence Asp/Asn-Pro-Pro-Tyr (B-P-P-Y) referred to by Hattman et al. [J. Bacteriol. 164 (1985) 932-937] as region IV. The predicted secondary structure of this region forms a finger-like structure ('beta finger') containing a beta-pleated sheet (...XXXB), two beta-turns (P-P) followed by another beta-pleated sheet [Y/FXXX...].

14.
SUMMARY: 3MOTIF is a web application that visually maps conserved sequence motifs onto three-dimensional protein structures in the Protein Data Bank (PDB; Berman et al., Nucleic Acids Res., 28, 235-242, 2000). Important properties of motifs such as conservation strength and solvent accessible surface area at each position are visually represented on the structure using a variety of color shading schemes. Users can manipulate the displayed motifs using the freely available Chime plugin. AVAILABILITY: http://motif.stanford.edu/3motif/

15.
Aurintricarboxylic acid (ATA) is a well-known inhibitor of RNA and DNA modifying enzymes and was suggested as a potent RNase inhibitor for preparation of RNA (Hallick et al., 1977, Nucleic Acids Res. 4, 3055-3064). We show that ATA is a very useful stain for detecting RNA on Northern blots and slot blots although it did not fully protect purified RNA in concentrated solution against RNase A.

16.
A new version of the RDP (Ribosomal Database Project).   (cited by: 69; self-citations: 0; citations by others: 69)
The Ribosomal Database Project (RDP-II), previously described by Maidak et al. [Nucleic Acids Res. (1997), 25, 109-111], is now hosted by the Center for Microbial Ecology at Michigan State University. RDP-II is a curated database that offers ribosomal RNA (rRNA) nucleotide sequence data in aligned and unaligned forms, analysis services, and associated computer programs. During the past two years, data alignments have been updated and now include >9700 small subunit rRNA sequences. The recent development of an ObjectStore database will provide more rapid updating of data, better data accuracy and increased user access. RDP-II includes phylogenetically ordered alignments of rRNA sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software programs for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (ftp.cme.msu.edu) and WWW (http://www.cme.msu.edu/RDP). The WWW server provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for possible chimeric rRNA sequences, automated alignment, and a suggested placement of an unknown sequence on an existing phylogenetic tree. Additional utilities also exist at RDP-II, including distance matrix, T-RFLP, and a Java-based viewer of the phylogenetic trees that can be used to create subtrees.

17.
Empirical models of substitution are often used in protein sequence analysis because the large alphabet of amino acids requires that many parameters be estimated in all but the simplest parametric models. When information about structure is used in the analysis of substitutions in structured RNA, a similar situation occurs. The number of parameters necessary to adequately describe the substitution process increases in order to model the substitution of paired bases. We have developed a method to obtain substitution rate matrices empirically from RNA alignments that include structural information in the form of base pairs. Our data consisted of alignments from the European Ribosomal RNA Database of Bacterial and Eukaryotic Small Subunit and Large Subunit Ribosomal RNA (Wuyts et al. 2001. Nucleic Acids Res. 29:175-177; Wuyts et al. 2002. Nucleic Acids Res. 30:183-185). Using secondary structural information, we converted each sequence in the alignments into a sequence over a 20-symbol code: one symbol for each of the four individual bases, and one symbol for each of the 16 ordered pairs. Substitutions in the coded sequences are defined in the natural way, as observed changes between two sequences at any particular site. For given ranges (windows) of sequence divergence, we obtained substitution frequency matrices for the coded sequences. Using a technique originally developed for modeling amino acid substitutions (Veerassamy, Smith, and Tillier. 2003. J. Comput. Biol. 10:997-1010), we were able to estimate the actual evolutionary distance for each window. The actual evolutionary distances were used to derive instantaneous rate matrices, and from these we selected a universal rate matrix. The universal rate matrices were incorporated into the Phylip Software package (Felsenstein 2002. http://evolution.genetics.washington.edu/phylip.html), and we analyzed the ribosomal RNA alignments using both distance and maximum likelihood methods.
The empirical substitution models performed well on simulated data, and produced reasonable evolutionary trees for 16S ribosomal RNA sequences from sequenced Bacterial genomes. Empirical models have the advantage of being easily implemented, and the fact that the code consists of 20 symbols makes the models easily incorporated into existing programs for protein sequence analysis. In addition, the models are useful for simulating the evolution of RNA sequence and structure simultaneously.
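The 20-symbol recoding described in entry 17 (four single bases plus 16 ordered pairs) can be sketched as follows. The abstract does not say how the structure is supplied or whether a pair is written at one or both of its positions, so this sketch assumes dot-bracket notation as input and emits an ordered-pair symbol at every paired site; the function names are hypothetical.

```python
BASES = "ACGU"
# 4 unpaired-base symbols + 16 ordered-pair symbols = 20 symbols in total.
ALPHABET = list(BASES) + [a + b for a in BASES for b in BASES]

def pair_table(structure: str) -> dict:
    """Map each paired position to its partner, from dot-bracket notation."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return partner

def encode(seq: str, structure: str) -> list:
    """Recode an RNA sequence over the 20-symbol alphabet (assumed convention)."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    partner = pair_table(structure)
    # Paired site: this base followed by its partner's base; unpaired: the base alone.
    return [seq[i] + seq[partner[i]] if i in partner else seq[i]
            for i in range(len(seq))]

if __name__ == "__main__":
    print(encode("GCAUGC", "((..))"))
```

Because the recoded alphabet has exactly 20 symbols, sequences encoded this way could in principle be passed to software written for amino acid data, which is the convenience the abstract highlights.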

18.
We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies the above operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the greatest improvement in the objective score will be accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package using the Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce a globally optimal alignment, the tests indicate that EGMA is able to build high-quality alignments consistently, compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.

19.
Photochemical alterations following ultraviolet irradiation of the alternating copolymer d(GT)n.d(CA)n were studied. We found that in solution conditions which produced circular dichroism spectra compatible with B-form or A-form DNA, no interstrand cross-linking or photoproduct formation could be demonstrated. Zimmer et al. (Zimmer, C., Tymen, S., Marck, C., and Guschlbaumer, W. (1982) Nucleic Acids Res. 10, 1081-1091) and Vorlickova et al. (Vorlickova, M., Kypr, J., Sotkrova, S., Sponar, J. (1982) Nucleic Acids Res. 10, 1071-1080) have reported a number of solution conditions which produce a structural transition of this polymer characterized by a negative deviation of the circular dichroism spectrum in the region of 280 nm. The nature of this transition has not yet been elucidated. Following ultraviolet irradiation of d(GT)n.d(CA)n under two conditions which produce this transition (manganese solution or ethanol plus trace salts solution), we found ultraviolet dose-dependent interstrand cross-linking as well as dose-dependent formation of a thymine-containing photoproduct. Interstrand cross-linking is demonstrated by two criteria: increase in polymer size as detected by alkaline agarose gel electrophoresis, and generation of intermediate density material in alkaline cesium sulfate isopycnic gradients. The thymine-containing photoproduct was demonstrated by thin layer chromatography of acid hydrolysates of the polymer. The photoproduct is at least partially photoreversible. These findings suggest that the geometry of the alternative conformation is such that pyrimidines from different strands are closely approximated, allowing for photodimerization.

20.
MOTIVATION: Partial order alignment (POA) has been proposed as a new approach to multiple sequence alignment (MSA), which can be combined with existing methods such as progressive alignment. This is important for addressing problems both in the original version of POA (such as order sensitivity) and in standard progressive alignment programs (such as information loss in complex alignments, especially surrounding gap regions). RESULTS: We have developed a new Partial Order-Partial Order alignment algorithm that optimally aligns a pair of MSAs and which therefore can be applied directly to progressive alignment methods such as CLUSTAL. Using this algorithm, we show the combined Progressive POA alignment method yields results comparable with the best available MSA programs (CLUSTALW, DIALIGN2, T-COFFEE) but is far faster. For example, depending on the level of sequence similarity, aligning 1000 sequences, each 500 amino acids long, took 15 min (at 90% average identity) to 44 min (at 30% identity) on a standard PC. For large alignments, Progressive POA was 10-30 times faster than the fastest of the three previous methods (CLUSTALW). These data suggest that POA-based methods can scale to much larger alignment problems than possible for previous methods. AVAILABILITY: The POA source code is available at http://www.bioinformatics.ucla.edu/poa

