Similar literature
20 similar articles found.
1.
The CATH database of protein structures contains approximately 18000 domains organized according to their (C)lass, (A)rchitecture, (T)opology and (H)omologous superfamily. Relationships between evolutionarily related structures (homologues) within the database have been used to test the sensitivity of various sequence search methods in order to identify relatives in GenBank and other sequence databases. Subsequent application of the most sensitive and efficient algorithms, gapped BLAST and the profile-based method Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST), could be used to assign structural data to between 22 and 36% of microbial genomes in order to improve functional annotation and enhance understanding of biological mechanism. However, on a cautionary note, an analysis of functional conservation within fold groups and homologous superfamilies in the CATH database revealed that, whilst function was conserved in nearly 55% of enzyme families, function had diverged considerably in some highly populated families. In these families, functional properties should be inherited far more cautiously and the probable effects of substitutions in key functional residues carefully assessed.

2.
An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.
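The assignment rule in this abstract (a sequence joins a superfamily when it shows a clear relationship to at least one structure-derived sequence in that superfamily) can be sketched roughly as follows. This is only an illustration, not the DomainFinder implementation: the function name, the input layout and the E-value cut-off of 1e-3 are all assumptions, since the abstract says only that "conservative thresholds" were used.

```python
from collections import defaultdict

def assign_superfamilies(hits, pdb_superfamily, evalue_cutoff=1e-3):
    """Assign query sequences to superfamilies from search hits.

    hits            -- iterable of (query_id, pdb_seq_id, evalue) tuples,
                       e.g. parsed from a profile search (hypothetical layout)
    pdb_superfamily -- dict mapping a PDB-derived sequence id to its superfamily
    evalue_cutoff   -- assumed threshold; not taken from the abstract
    """
    assigned = defaultdict(set)
    for query_id, pdb_id, evalue in hits:
        superfamily = pdb_superfamily.get(pdb_id)
        # One sufficiently strong hit to any member is enough for assignment.
        if superfamily is not None and evalue <= evalue_cutoff:
            assigned[query_id].add(superfamily)
    return dict(assigned)
```

Weaker hits that fail the cut-off are simply dropped here; the abstract describes keeping such putative relationships in separate neighbour tables instead.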

3.
Journal of Mathematical Biology - Numerous data analysis and data mining techniques require that data be embedded in a Euclidean space. When faced with symbolic datasets, particularly biological...

4.
Accompanying the discovery of an increasing number of proteins, there is the need to provide functional annotation that is both highly accurate and consistent. The Gene Ontology (GO) provides consistent annotation in a computer-readable and usable form; hence, GO annotation (GOA) has been assigned to a large number of protein sequences based on direct experimental evidence and through inference determined by sequence homology. Here we show that this annotation can be extended and corrected for cases where protein structures are available. Specifically, using the Combinatorial Extension (CE) algorithm for structure comparison, we extend the protein annotation currently provided by GOA at the European Bioinformatics Institute (EBI) to further describe the contents of the Protein Data Bank (PDB). Specific cases of biologically interesting annotations derived by this method are given. Given that the relationship between sequence, structure, and function is complicated, we explore the impact of this relationship on assigning GOA. The effect of superfolds (folds with many functions) is considered, as are, by comparison to the Structural Classification of Proteins (SCOP), the individual effects of family, superfamily, and fold.

5.
Summary: An interactive dot-matrix program for the MacOS was designed that allows comparison of DNA to protein sequences using nested 3-frame translations. Availability: Shareware, available at http://copan.bioz.unibas.ch/software/ Contact: burglin@ubaclu.unibas.ch
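As a rough illustration of comparing DNA to protein through frame translations, the sketch below translates the three forward reading frames with the standard genetic code and records a dot wherever a translated residue equals a protein residue. It is not the program described above: the function names are hypothetical, no window or score filtering is applied, and the abstract's "nested" translation scheme is not reproduced.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_frames(dna):
    """Return the translations of the three forward reading frames."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        # Codons with ambiguous bases are translated as 'X'.
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames

def dot_matrix(dna, protein):
    """List (frame, codon_index, protein_index) for every identical residue."""
    protein = protein.upper()
    dots = []
    for frame, peptide in enumerate(translate_frames(dna)):
        for i, aa in enumerate(peptide):
            for j, residue in enumerate(protein):
                if aa == residue:
                    dots.append((frame, i, j))
    return dots
```

A practical dot-matrix tool would normally score a sliding window rather than single residues, which suppresses most of the random matches this unfiltered version reports.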

6.
Mammalian genes are characterized by relatively small exons surrounded by variable lengths of intronic sequence. Sequences similar to the splice signals that define the 5' and 3' boundaries of these exons are also present in abundance throughout the surrounding introns. What causes the real sites to be distinguished from the multitude of pseudosites in pre-mRNA is unclear. Much progress has been made in defining additional sequence elements that enhance the use of particular sites. Less work has been done on sequences that repress the use of particular splice sites. To find additional examples of sequences that inhibit splicing, we searched human genomic DNA libraries for sequences that would inhibit the inclusion of a constitutively spliced exon. Genetic selection experiments suggested that such sequences were common, and we subsequently tested randomly chosen restriction fragments of about 100 bp. When inserted into the central exon of a three-exon minigene, about one in three inhibited inclusion, revealing a high frequency of inhibitory elements in human DNA. In contrast, only 1 in 27 Escherichia coli DNA fragments was inhibitory. Several previously identified silencing elements derived from alternatively spliced exons functioned weakly in this constitutively spliced exon. In contrast, a high-affinity site for U2AF65 strongly inhibited exon inclusion. Together, our results suggest that splicing occurs in a background of repression and, since many of our inhibitors contain splice-like signals, we suggest that repression of some pseudosites may occur through an inhibitory arrangement of these sites.

7.
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.

8.
Inconsistencies in Neanderthal genomic DNA sequences
Wall JD, Kim SK. PLoS Genetics 2007, 3(10): 1862-1866
Two recently published papers describe nuclear DNA sequences that were obtained from the same Neanderthal fossil. Our reanalyses of the data from these studies show that they are not consistent with each other and point to serious problems with the data quality in one of the studies, possibly due to modern human DNA contaminants and/or a high rate of sequencing errors.

9.
Protocols are presented for preparing DNA from a genomic library in λ phage and for synthesizing genomic fragments using PCR with nested vector- and gene-specific primers and linker-primers. Library DNA, isolated from E. coli liquid lysates by a simple protocol, is used as template in PCR following a commercial protocol. The method produces library DNA sufficient for several hundred PCRs, incorporates nested primers to reduce nonspecific product formation, and enables the synthesis of linker-containing DNA fragments containing selected restriction sites to simplify subsequent cloning. The isolation of 5′ upstream sequences of three different Arabidopsis genes by this method is described.

10.
11.
Direct cloning of large genomic sequences

12.
Assigning functions to nucleolar structures

13.
Single-stranded DNA or RNA libraries used in SELEX experiments usually include primer-annealing sequences for PCR amplification. In genomic SELEX, these fixed sequences may form base pairs with the central genomic fragments and interfere with the binding of target molecules to the genomic sequences. In this study, a method has been developed to circumvent these artificial effects. Primer-annealing sequences are removed from the genomic library before selection with the target protein and are then regenerated to allow amplification of the selected genomic fragments. A key step in the regeneration of primer-annealing sequences is to employ thermal cycles of hybridization-extension, using the sequences from unselected pools as templates. The genomic library was derived from the bacteriophage fd, and the gene 5 protein (g5p) from the phage was used as a target protein. After four rounds of primer-free genomic SELEX, most cloned sequences overlapped at a segment within gene 6 of the viral genome. This sequence segment was pyrimidine-rich and contained no stable secondary structures. Compared with a neighboring genomic fragment, a representative sequence from the family of selected sequences had about 23-fold higher g5p-binding affinity. Results from primer-free genomic SELEX were compared with the results from two other genomic SELEX protocols.

14.
We describe a simple and rapid method for the isolation of specific genomic DNA sequences recognized by DNA-binding proteins. This procedure consists of four steps: (1) restriction enzyme digestion and size fractionation of genomic DNA; (2) DNA-protein binding using the gel mobility-shift assay; (3) ligation of isolated DNA fragments followed by transformation of Escherichia coli; and (4) screening of recombinant clones for inserts containing specific DNA-protein binding sequences. We have used this protocol to isolate human DNA sequences, 100-200 bp in size, that are recognized by both partially purified and affinity purified proteins. Unlike other procedures designed to identify genomic target sequences, the method described does not require polymerase chain reaction or successive immunoprecipitations.

15.
Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire.

16.
Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified using sequence features and discriminant analysis.
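One of the features named above, dinucleotide relative abundance, is commonly defined as the odds ratio rho_xy = f_xy / (f_x * f_y), where f_xy is the frequency of the dinucleotide xy and f_x, f_y are the frequencies of its two bases. The sketch below computes this on a single strand. It is a minimal illustration under that assumed definition; the abstract does not say which variant the study used (for example, the strand-symmetrized form), and the function name is hypothetical.

```python
from collections import Counter

def dinucleotide_relative_abundance(seq):
    """Return {dinucleotide: rho} with rho_xy = f_xy / (f_x * f_y).

    Computed on the given strand only, using overlapping dinucleotides.
    Sequences shorter than two bases yield an empty dict.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = len(seq) - 1
    rho = {}
    for pair, count in di.items():
        fx = mono[pair[0]] / n_mono
        fy = mono[pair[1]] / n_mono
        rho[pair] = (count / n_di) / (fx * fy)
    return rho
```

Values near 1 mean the dinucleotide occurs about as often as its base composition predicts; the resulting vector of ratios could then serve as input to principal component or discriminant analysis.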

17.
Liberles DA, Wayne ML. Genome Biology 2002, 3(6): reviews1018.1-reviews10184
As more gene and genomic sequences from an increasing assortment of species become available, new pictures of evolution are emerging. Improved methods can pinpoint where positive and negative selection act in individual codons in specific genes on specific branches of phylogenetic trees. Positive selection appears to be important in the interaction between genotype, protein structure, function, and organismal phenotype.

18.
An original tetrahedral representation of the Genetic Code (GC) that better describes its structure, degeneration and evolution trends is defined. The possibility of reducing the dimension of the representation by projecting the GC tetrahedron onto an adequately oriented plane is also analyzed, leading to some equivalent complex representations of the GC. On these bases, optimal symbolic-to-digital mappings of the linear nucleic acid strands into real or complex genomic signals are derived at nucleotide, codon and amino acid levels. By converting the sequences of nucleotides and polypeptides into digital genomic signals, this approach offers the possibility of using a large variety of signal processing methods for their handling and analysis. It is also shown that some essential features of the nucleotide sequences can be better extracted using this representation. Specifically, the paper reports for the first time the existence of a global helicoidal wrapping of the complex representations of the bases along DNA sequences, a large-scale trend of genomic signals. New tools for genomic signal analysis, including the use of phase, aggregated phase, unwrapped phase, sequence path, stem representation of components' relative frequencies, as well as analysis of the transitions, are introduced at the nucleotide, codon and amino acid levels, and in a multiresolution approach.
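To make the idea of a complex genomic signal concrete, the sketch below maps each base to a corner of a square in the complex plane and derives two of the quantities the abstract lists: an aggregated (cumulated) phase and an unwrapped phase. The particular base-to-corner assignment is an assumption chosen for illustration, since the abstract does not give the mapping values, and the function names are hypothetical.

```python
import cmath
import math

# Assumed mapping of bases to the corners of a square in the complex plane.
BASE_TO_COMPLEX = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}

def genomic_signal(seq):
    """Convert a DNA string into a list of complex samples."""
    return [BASE_TO_COMPLEX[base] for base in seq.upper()]

def cumulated_phase(signal):
    """Running sum of the phase (in radians) of each sample."""
    total, out = 0.0, []
    for z in signal:
        total += cmath.phase(z)
        out.append(total)
    return out

def unwrapped_phase(signal):
    """Phase sequence with 2*pi added or removed so that consecutive
    values never jump by more than pi."""
    out, prev = [], None
    for z in signal:
        phase = cmath.phase(z)
        if prev is not None:
            while phase - prev > math.pi:
                phase -= 2 * math.pi
            while phase - prev <= -math.pi:
                phase += 2 * math.pi
        out.append(phase)
        prev = phase
    return out
```

A steady drift of the unwrapped phase along a long sequence is the kind of large-scale trend that a signal-based view can expose and that a plain symbol string hides.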

19.
Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80–90% accurate in jackknife testing experiments for bacteria and 90–99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.

20.
Minerdi D., Bianciotto V., Bonfante P. Plant and Soil 2002, 244(1-2): 211-219
Arbuscular mycorrhizal (AM) fungi have been successful in time and space thanks to a long co-evolution with their host plants. In addition to this well-known interaction, they also associate with bacteria that reside in the fungal cytoplasm. The chapter mostly focuses on endosymbionts belonging to the genus Burkholderia and found in many species of Gigasporaceae. We have used morphological and genetic approaches to investigate these intracellular microorganisms. Some genes related to metabolism, cell colonization events and nitrogen fixation have been characterized and suggest a potential role in the nutritional exchanges between endobacteria, fungi and plants.
