Similar literature
Found 20 similar articles; search took 31 ms
1.
Lee D, Grant A, Marsden RL, Orengo C. Proteins, 2005, 59(3): 603-615
Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244) and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior such that the largest 2,000 domain families can be mapped to approximately 70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While approximately 50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (< 10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g. COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structural genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/.
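
The complete linkage step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the PFscape implementation: the input format (a dictionary of pairwise fractional identities), the function name `complete_linkage`, and the threshold value are all assumptions chosen for illustration. The point it shows is that, under complete linkage, two clusters merge only if their least similar pair still meets the identity threshold.

```python
def complete_linkage(identity, threshold):
    """Greedy complete-linkage clustering on pairwise sequence identity.

    identity: dict mapping frozenset({a, b}) -> fractional identity
              (hypothetical input format; missing pairs count as 0.0).
    threshold: minimum identity every cross-cluster pair must reach to merge.
    """
    items = sorted({x for pair in identity for x in pair})
    clusters = [[x] for x in items]

    def linkage(c1, c2):
        # Complete linkage: cluster similarity is that of the LEAST similar pair.
        return min(identity.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)

    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = linkage(clusters[i], clusters[j])
                if score >= threshold and (best is None or score > best[0]):
                    best = (score, i, j)
        if best is None:
            return clusters  # no pair of clusters can be merged any more
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


# Hypothetical identities between three sequences A, B and C.
example = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.40,
    frozenset(("B", "C")): 0.85,
}
result = complete_linkage(example, threshold=0.80)
```

In this toy input C is 85% identical to B but only 40% identical to A, so it stays outside the A/B cluster; single linkage would have merged all three.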

2.
The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertain to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77-80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73-76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10-14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332-333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45-48] and can be accessed at http://genome-www.stanford.edu/microarray.

3.
SUMMARY: We describe an extension to the Homologous Structure Alignment Database (HOMSTRAD; Mizuguchi et al., Protein Sci., 7, 2469-2471, 1998a) to include homologous sequences derived from the protein families database Pfam (Bateman et al., Nucleic Acids Res., 28, 263-266, 2000). HOMSTRAD is integrated with the server FUGUE (Shi et al., submitted, 2001) for recognition and alignment of homologues, benefitting from the combination of abundant sequence information and accurate structure-based alignments. AVAILABILITY: The HOMSTRAD database is available at: http://www-cryst.bioc.cam.ac.uk/homstrad/. Query sequences can be submitted to the homology recognition/alignment server FUGUE at: http://www-cryst.bioc.cam.ac.uk/fugue/.

4.
MOTIVATION: We review proposed syntheses of probabilistic sequence alignment, profiling and phylogeny. We develop a multiple alignment algorithm for Bayesian inference in the links model proposed by Thorne et al. (1991, J. Mol. Evol., 33, 114-124). The algorithm, described in detail in Section 3, samples from and/or maximizes the posterior distribution over multiple alignments for any number of DNA or protein sequences, conditioned on a phylogenetic tree. The individual sampling and maximization steps of the algorithm require no more computational resources than pairwise alignment. METHODS: We present a software implementation (Handel) of our algorithm and report test results on (i) simulated data sets and (ii) the structurally informed protein alignments of BAliBASE (Thompson et al., 1999, Nucleic Acids Res., 27, 2682-2690). RESULTS: We find that the mean sum-of-pairs score (a measure of residue-pair correspondence) for the BAliBASE alignments is only 13% lower for Handel than for CLUSTALW (Thompson et al., 1994, Nucleic Acids Res., 22, 4673-4680), despite the relative simplicity of the links model (CLUSTALW uses affine gap scores and increased penalties for indels in hydrophobic regions). With reference to these benchmarks, we discuss potential improvements to the links model and implications for Bayesian multiple alignment and phylogenetic profiling. AVAILABILITY: The source code to Handel is freely distributed on the Internet at http://www.biowiki.org/Handel under the terms of the GNU Public License (GPL, 2000, http://www.fsf.org./copyleft/gpl.html).
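
The sum-of-pairs score cited in the abstract above is described only as "a measure of residue-pair correspondence". A minimal sketch of one common reading follows: the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. The function names, the gap character, and the toy sequences are assumptions for illustration, and the exact scoring used in the Handel benchmarks may differ in detail.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    alignment: list of equal-length gapped strings, one per sequence.
    Each residue is identified as (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_index, char in enumerate(column):
            if char != gap:
                present.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Toy example: the test alignment misplaces the final G of the first sequence.
reference = ["AC-G", "ACTG"]
test = ["ACG-", "ACTG"]
score = sum_of_pairs_score(test, reference)
```

Here the reference aligns three residue pairs and the test alignment recovers two of them, so the score is two thirds.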

5.
SUMMARY: 3MOTIF is a web application that visually maps conserved sequence motifs onto three-dimensional protein structures in the Protein Data Bank (PDB; Berman et al., Nucleic Acids Res., 28, 235-242, 2000). Important properties of motifs such as conservation strength and solvent accessible surface area at each position are visually represented on the structure using a variety of color shading schemes. Users can manipulate the displayed motifs using the freely available Chime plugin. AVAILABILITY: http://motif.stanford.edu/3motif/

6.
Where differences have been reported between tumor and normal mitochondrial DNA (mtDNA), they have generally involved limited modifications of the genome (Taira et al., Nucleic Acids Res. 11:1635, 1983; Shay and Werbin, Mutat. Res. 186:149, 1987). However, Corral et al. (Nucleic Acids Res. 16:10935, 1988; 17:5191, 1989) observed recombination between cytochrome oxidase subunit I (COI) and NADH dehydrogenase subunit 6 (ND6), two genes normally on opposite sides of the circular mitochondrial genome. In rat hepatoma mtDNA, COI and ND6 were reported to be separated by only 230 base pairs (Corral et al., 1988, 1989). We have performed RFLP analysis on mtDNA from normal rat livers and rat hepatomas, using COI and ND6 probes. Additional experiments compared end-labeled DNA fragments produced by EcoRI and HindIII digestion of mtDNA. These studies failed to provide any evidence for genetic recombination in rat hepatoma mtDNA, even in the same cell line used by Corral et al. Rather, they support the conclusion that mtDNA from tumor and normal tissues exhibits a low degree of heterogeneity.

7.
We recently reported on the use of 1,2,4-dithiazolidine-3,5-dione (DtsNH) and 3-ethoxy-1,2,4-dithiazoline-5-one (EDITH) as effective sulfurizing reagents for the preparation of phosphorothioate-containing oligodeoxyribonucleotides [Xu et al. (1996) Nucleic Acids Res., 24, 1602-1607]. One challenge in automated solid-phase synthesis of phosphorothioate-containing RNA is to develop sulfurization reagents that are effective in the presence of bulky 2'-OH protecting groups. The present study demonstrates that EDITH is exceedingly effective at low concentrations (0.05 M) and short reaction times (2 min) for the automated synthesis of oligoribonucleotides.

8.
MOTIVATION: The process of determining the functional sequence content of an organism is confounded by several factors. Large protein coding sequences are relatively easy to find by statistical methods. Smaller proteins, however, may escape detection because their size falls below some arbitrary researcher-defined minimum cutoff, or because a promoter or translational start cannot be precisely defined (Delcher et al., Nucleic Acids Res., 27, 4636-4641, 1999). Promoter and regulatory sequences themselves are difficult to define due to a significant amount of allowable sequence variation, as well as a probable lack of any completely accurate whole-organismal gene catalogs to date. Finally, certain genes coding functional RNAs may have insufficient structural or sequence constraints to be detectable by normal sequence structure/pattern searching methods (Eddy and Rivas, Bioinformatics, 16, 583-605, 2000). In those cases where multiple closely related organisms have been sequenced, additional information may be used in the investigation of sequence content, namely the possible conservation of functional sequences between the organisms. We present a method that uses this conservation to detect genes and other potentially functional sequences that may be missed by standard ORF-calling, RNA finding, and pattern matching software. The tricross programs produce a multi-way cross comparison of three sets of sequences, determine which are conserved in all three sets, and produce a graphical representation (Virtual Reality Modelling Language, VRML; ISO/IEC 14772-1:1997, VDC, 1997) as well as alignments of all sequence triples found. The software can also be applied to a pair of sequence sets, though the noise in the results increases.
RESULTS: Tricross has been used to examine the intergenic-sequence content of the three archaeal Pyrococcus genomes to determine the most highly related sequences remaining between the annotated protein and RNA coding sequences. Set to relatively stringent similarity requirements for the search, tricross found 101 intergenic sequences conserved among the three organisms. Interestingly, 29 of these appear to contain members of a family of small RNA molecules (Kiss-Laszlo et al., EMBO J., 17, 797-807, 1998) only recently discovered in the Archaea (Armbruster, OSU, Diss., 1988; Omer et al., Science, 288, 517-522, 2000; Gaspin et al., J. Mol. Biol., 297, 895-906, 2000). While some of the remaining 72 appear to be individual highly conserved promoter sequences, others have no currently known biological significance. Although originally developed to facilitate the examination of intergenic sequences, none of the tricross logic is inherently specific to intergenic sequences. The software can also be applied to gene sequences, and has been used to produce inter-genomic gene order dot-plots for Haemophilus influenzae (Fleischmann et al., Science, 269, 496-512, 1995) versus H. ducreyi (unpublished data), and Neisseria meningitidis Z2491 (serogroup A) (Parkhill et al., Nature, 404, 502-506, 2000) versus Neisseria meningitidis Z58 (serogroup B) (Tettelin et al., Science, 287, 1809-1815, 2000) versus Neisseria gonorrhoeae (Lewis et al., http://micro-gen.ouhsc.edu/, 2000). AVAILABILITY: The tricross software package is available from http://www.biosci.ohio-state.edu/~ray/bioinformatics/tricross.html. CONTACT: ray@biosci.ohio-state.edu; daniels.7@osu.edu; munsonr@pediatrics.ohio-state.edu SUPPLEMENTARY INFORMATION: Additional data from the cross-genomic comparisons examined in the discussion section are linked from http://www.biosci.ohio-state.edu/~ray/bioinformatics/tricross.html.
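
The idea of keeping only what is conserved across all three sequence sets can be sketched in a few lines. This is a deliberately simplified stand-in, not the tricross algorithm: it uses exact k-mer matching, whereas the abstract describes similarity searches with adjustable stringency. The function names, the value of `k`, and the toy sequences are all hypothetical.

```python
def kmers(sequences, k):
    """Collect every length-k substring found in a collection of sequences."""
    found = set()
    for seq in sequences:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            found.add(seq[start:start + k])
    return found


def conserved_in_all(set_a, set_b, set_c, k=8):
    """Return k-mers that occur in at least one sequence of each of the three sets."""
    return kmers(set_a, k) & kmers(set_b, k) & kmers(set_c, k)


# Hypothetical intergenic fragments from three related genomes.
genome_a = ["AAACGTACGTTT"]
genome_b = ["GGACGTACGTCC"]
genome_c = ["TTACGTACGTAA"]
shared = conserved_in_all(genome_a, genome_b, genome_c, k=8)
```

Requiring presence in all three sets is what suppresses chance matches; dropping one set, as the abstract notes for the two-set case, lets more noise through.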

9.
The germ cell nuclear factor (GCNF)   Total citations: 1, self-citations: 0, citations by others: 1
The germ cell nuclear factor (GCNF), which is also known as RTR (retinoid receptor-related testis-associated receptor), is a member of the nuclear receptor superfamily. As a natural ligand remains to be discovered, GCNF is referred to as an orphan receptor. Owing to GCNF's unique features and its distant relation to any other known nuclear receptor, it has been classified as the only member of the subgroup six and designated NR6A1 by the Receptor Nomenclature Committee (Duarte et al., 2002: Nucleic Acids Res 30: 364-368). To date, GCNF has been cloned from distinct vertebrate species, including zebrafish, Xenopus laevis, mouse, rat, and human. Cloning and characterization of the gene, domain organization and DNA binding properties of the protein, as well as the differential expression of mRNA splice variants or the protein during development and in the adult animal have been comprehensively reviewed by others (Greschik and Schüle, 1998: J Mol Med 76:800-810; Cooney et al., 1999: Am Zool 39:796-806). In this minireview I focus on the pleiotropic function of GCNF in embryogenesis and germ cell differentiation, and discuss novel concepts about its putative role in neurogenesis.

10.
The NMR structure of a 31mer RNA constituting a functionally important domain of the catalytic RNase P RNA from Escherichia coli is reported. Severe spectral overlaps of the proton resonances in the natural 31mer RNA (1) were successfully tackled by unique spectral simplifications found in the partially-deuterated 31mer RNA analogue (2) incorporating deuterated cytidines [C5 (>95 atom % 2H), C2' (>97 atom % 2H), C3' (>97 atom % 2H), C4' (>65 atom % 2H) and C5' (>97 atom % 2H)] [for the 'NMR-window' concept see: Földesi,A. et al. (1992) Tetrahedron, 48, 9033; Földesi,A. et al. (1993) J. Biochem. Biophys. Methods, 26, 1; Yamakage,S.-I. et al. (1993) Nucleic Acids Res., 21, 5005; Agback,P. et al. (1994) Nucleic Acids Res., 22, 1404; Földesi,A. et al. (1995) Tetrahedron, 51, 10065; Földesi,A. et al. (1996) Nucleic Acids Res., 24, 1187-1194]. Of a total of 235 non-exchangeable proton resonances in (1), 175 have been assigned in an unprecedented manner in the absence of 13C and 15N labelling. The assignment of 41 of these 175 resonances was accomplished with the help of the deuterated analogue (2). The two stems in the 31mer RNA adopt an A-type RNA conformation and the base-stacking continues from stem I into the beginning of loop I. Long distance cross-strand NOEs showed a structured conformation at the junction between stem I and loop I. The loop I-stem II junction is less ordered and shows structural perturbation at and around the G11-C22 base pair.

11.
A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980), (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980) and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence, it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition.

12.
The DNA sequence spanning coordinates 9.9 to 16.4 kilobases of the lactose transposon Tn951 (Cornelis et al., Mol. Gen. Genet. 160:215-224, 1978) constitutes a transposable element by itself. Unlike Tn951 (Cornelis et al., Mol. Gen. Genet. 184:241-248, 1981), this element, called Tn2501, transposes in the absence of any other transposon. Transposition of Tn2501 proceeds through transient cointegration and duplicates 5 base pairs of host DNA. Tn2501 is flanked by nearly perfect inverted repeats (44 of 48), related to the inverted repeats of Tn21 (Zheng et al., Nucleic Acids Res. 9:6265-6278, 1982). Unlike Tn21, Tn2501 does not confer mercury resistance.

13.
14.
The sequences of the genes coding for M.CviBIII (from virus NC-1A which infects a eukaryotic alga) [Narva et al., Nucleic Acids Res. 15 (1987) 9807-9823] and M.TaqI (from the bacterium Thermus aquaticus) [Slatko et al., Nucleic Acids Res. 15 (1987) 9781-9796] have been determined recently. Both enzymes methylate adenine in the sequence TCGA. We have compared the predicted amino acid sequences of these two methyltransferases (MTases) with each other and with ten other N6 A-MTases and find regions of similarity. M.CviBIII and M.TaqI were most closely related, followed by M.PaeR7, whose recognition sequence (CTCGAG) contains the M.TaqI/M.CviBIII recognition sequence TCGA, and M.PstI, whose recognition sequence is CTGCAG. All of the N6-MTases contain the sequence Asp/Asn-Pro-Pro-Tyr (B-P-P-Y) referred to by Hattman et al. [J. Bacteriol. 164 (1985) 932-937] as region IV. The predicted secondary structure of this region forms a finger-like structure ('beta finger') containing a beta-pleated sheet (...XXXB), two beta-turns (P-P) followed by another beta-pleated sheet [Y/FXXX...].
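
The two sequence observations in the abstract above (a conserved Asp/Asn-Pro-Pro-Tyr motif, and one recognition site nested inside another) lend themselves to a short illustration. The sketch below is an assumption-laden example rather than the authors' method: the pattern `[DN]PP[YF]` admits Phe in the last position only because the abstract's "[Y/FXXX...]" notation suggests it, and the protein fragment is invented.

```python
import re

# B-P-P-Y motif: Asp or Asn, two prolines, then Tyr.
# Allowing Phe in the last position is an assumption based on "[Y/FXXX...]".
MOTIF = re.compile(r"[DN]PP[YF]")


def find_motif(protein_sequence):
    """Return (0-based start, matched text) for each motif occurrence."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(protein_sequence.upper())]


def site_contains(longer_site, shorter_site):
    """True if one recognition sequence contains another as a substring."""
    return shorter_site.upper() in longer_site.upper()


# Invented protein fragment, used only to exercise the pattern.
hits = find_motif("MKTDPPYAANPPFG")
nested = site_contains("CTCGAG", "TCGA")      # M.PaeR7 site vs. TCGA
not_nested = site_contains("CTGCAG", "TCGA")  # M.PstI site vs. TCGA
```

The substring check reproduces the abstract's remark that CTCGAG contains TCGA while CTGCAG does not.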

15.
Structure of a mouse histone-encoding gene cluster   Total citations: 5, self-citations: 0, citations by others: 5

16.
Abstract

Thermodynamic parameters for duplex formation were determined from CD melting curves for r(GGACGAGUCC)2 and d(GGACGAGTCC)2, both of which form two consecutive ‘sheared’ A:G base pairs at the center [Katahira et al. (1993) Nucleic Acids Res. 21, 5418–5424; Katahira et al. (1994) Nucleic Acids Res. 22, 2752–2759]. The parameters were also determined for r(GGACUAGUCC)2 and d(GGACTAGTCC)2, where the A:G mismatches are replaced by Watson-Crick A:U(T) base pairs. Thermodynamic properties for duplex formation are compared between the sheared and the Watson-Crick base pairs, and between RNA and DNA. Differences in thermodynamic stability are analyzed and discussed in terms of enthalpy and entropy changes. The characteristic features in CD spectra of RNA and DNA containing the sheared A:G base pairs are also reported.
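
How enthalpy and entropy changes translate into duplex stability can be made concrete with the standard relations ΔG = ΔH − TΔS and, for a two-state self-complementary duplex, Tm = ΔH / (ΔS + R ln C_T). The sketch below assumes that two-state model; the abstract does not say how the melting curves were fitted, and the numerical inputs are invented placeholders, not values from the study.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol*K)


def gibbs_free_energy(delta_h_kcal, delta_s_cal, temperature_k=310.15):
    """Return deltaG in kcal/mol from deltaH (kcal/mol) and deltaS (cal/(mol*K))."""
    return delta_h_kcal - temperature_k * delta_s_cal / 1000.0


def melting_temperature(delta_h_kcal, delta_s_cal, total_strand_conc_m):
    """Two-state melting temperature (K) of a self-complementary duplex.

    Uses Tm = deltaH / (deltaS + R * ln(C_T)), with C_T the total strand
    concentration in mol/L. The two-state model is an assumption here.
    """
    return delta_h_kcal * 1000.0 / (
        delta_s_cal + R_CAL * math.log(total_strand_conc_m)
    )


# Placeholder parameters for illustration only (not data from the abstract).
dg_37 = gibbs_free_energy(-80.0, -220.0)        # kcal/mol at 37 degrees C
tm = melting_temperature(-80.0, -220.0, 1e-4)   # kelvin at 100 micromolar strands
```

Comparing such values between the sheared A:G duplexes and their Watson-Crick counterparts is the kind of analysis the abstract describes, with the enthalpic and entropic terms examined separately.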


17.
Pfam family DUF1023 consists entirely of uncharacterized proteins generated by sequencing the genomes of Actinobacteria (Bateman A., et al., Nucleic Acids Res. 2004;32 Database issue:D138-141). Utilizing sequence similarity detection methods, we infer homology between DUF1023 and alpha/beta hydrolases. DUF1023 proteins conserve the core secondary structures of the alpha/beta hydrolase fold, and share catalytic machinery similar to that of alpha/beta hydrolases. We predict the spatial structure of DUF1023 proteins and deduce that they function as hydrolases utilizing a catalytic Ser-His-Asp triad with the serine as a nucleophile.

18.
SUMMARY: The DBAli database includes approximately 35000 alignments of pairs of protein structures from SCOP (Lo Conte et al., Nucleic Acids Res., 28, 257-259, 2000) and CE (Shindyalov and Bourne, Protein Eng., 11, 739-747, 1998). DBAli is linked to several resources, including Compare3D (Shindyalov and Bourne, http://www.sdsc.edu/pb/software.htm, 1999) and ModView (Ilyin and Sali, http://guitar.rockefeller.edu/ModView/, 2001) for visualizing sequence alignments and structure superpositions. A flexible search of DBAli by protein sequence and structure properties allows construction of subsets of alignments suitable for a number of applications, such as benchmarking of sequence-sequence and sequence-structure alignment methods under a variety of conditions. AVAILABILITY: http://guitar.rockefeller.edu/DBAli/

19.
The locations of replication pause sites in the simian virus 40 minichromosome, which were determined by sizing cloned fragments of nascent DNA (Zannis-Hadjopoulos et al., J. Mol. Biol. 165:599-607, 1983), were compared with the positions of simian virus 40 nucleosomes in the genome, as obtained by sequence-directed mapping (G. Mengeritsky and E. N. Trifonov, Nucleic Acids Res. 11:3833-3851, 1983; Mengeritsky and Trifonov, Cell Biophys. 6:1-8, 1984). A clear correlation between these two maps is demonstrated, suggesting that nucleosomes hinder propagation of the replication forks.

20.
Aurintricarboxylic acid (ATA) is a well-known inhibitor of RNA and DNA modifying enzymes and was suggested as a potent RNase inhibitor for preparation of RNA (Hallick et al., 1977, Nucleic Acids Res. 4, 3055-3064). We show that ATA is a very useful stain for detecting RNA on Northern blots and slot blots, although it did not fully protect purified RNA in concentrated solution against RNase A.
