Similar articles
20 similar articles found.
1.
The CATH database of protein structures contains approximately 18,000 domains organized according to their (C)lass, (A)rchitecture, (T)opology and (H)omologous superfamily. Relationships between evolutionarily related structures (homologues) within the database have been used to test the sensitivity of various sequence search methods in order to identify relatives in GenBank and other sequence databases. Subsequent application of the most sensitive and efficient algorithms, gapped BLAST and the profile-based method Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST), could be used to assign structural data to between 22% and 36% of microbial genomes in order to improve functional annotation and enhance understanding of biological mechanism. However, on a cautionary note, an analysis of functional conservation within fold groups and homologous superfamilies in the CATH database revealed that, whilst function was conserved in nearly 55% of enzyme families, function had diverged considerably in some highly populated families. In these families, functional properties should be inherited far more cautiously and the probable effects of substitutions in key functional residues carefully assessed.

2.
Journal of Mathematical Biology - Numerous data analysis and data mining techniques require that data be embedded in a Euclidean space. When faced with symbolic datasets, particularly biological...

3.
Summary: An interactive dot-matrix program for MacOS was designed that allows comparison of DNA to protein sequences using nested 3-frame translations. Availability: Shareware, available at http://copan.bioz.unibas.ch/software/ Contact: burglin@ubaclu.unibas.ch
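As an illustration of the idea only, the following is a minimal Python sketch of a DNA-versus-protein dot matrix built from three forward-frame translations. It is not the program described in the abstract: the function names `translate` and `dot_matrix` are hypothetical, the standard genetic code is assumed, dots mark exact residue identity with no scoring window, and the "nested" layout of the original program is not reproduced.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (assumption:
# the standard table is appropriate for the sequences being compared).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate `dna` in forward reading frame 0, 1 or 2.

    Codons containing characters outside TCAG are rendered as 'X'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def dot_matrix(dna, protein):
    """Return dots as (nucleotide_position, protein_position, frame).

    A dot is placed wherever a translated codon equals a protein residue.
    """
    protein = protein.upper()
    dots = []
    for frame in range(3):
        peptide = translate(dna, frame)
        for k, aa in enumerate(peptide):
            for j, residue in enumerate(protein):
                if aa == residue:
                    dots.append((frame + 3 * k, j, frame))
    return dots


if __name__ == "__main__":
    for dot in dot_matrix("ATGGCCAAA", "MAK"):
        print(dot)
```

A diagonal run of dots within one frame suggests that the DNA encodes a stretch of the protein in that frame; a practical tool would typically add a sliding window and a substitution matrix to suppress noise.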

4.
Direct cloning of large genomic sequences   Cited by: 1 (self-citations: 0; citations by others: 1)

5.
6.
Mammalian genes are characterized by relatively small exons surrounded by variable lengths of intronic sequence. Sequences similar to the splice signals that define the 5' and 3' boundaries of these exons are also present in abundance throughout the surrounding introns. What causes the real sites to be distinguished from the multitude of pseudosites in pre-mRNA is unclear. Much progress has been made in defining additional sequence elements that enhance the use of particular sites. Less work has been done on sequences that repress the use of particular splice sites. To find additional examples of sequences that inhibit splicing, we searched human genomic DNA libraries for sequences that would inhibit the inclusion of a constitutively spliced exon. Genetic selection experiments suggested that such sequences were common, and we subsequently tested randomly chosen restriction fragments of about 100 bp. When inserted into the central exon of a three-exon minigene, about one in three inhibited inclusion, revealing a high frequency of inhibitory elements in human DNA. In contrast, only 1 in 27 Escherichia coli DNA fragments was inhibitory. Several previously identified silencing elements derived from alternatively spliced exons functioned weakly in this constitutively spliced exon. In contrast, a high-affinity site for U2AF65 strongly inhibited exon inclusion. Together, our results suggest that splicing occurs in a background of repression and, since many of our inhibitors contain splice-like signals, we suggest that repression of some pseudosites may occur through an inhibitory arrangement of these sites.

7.
Inconsistencies in Neanderthal genomic DNA sequences   Cited by: 1 (self-citations: 0; citations by others: 1)
Wall JD, Kim SK. PLoS Genetics 2007, 3(10):1862-1866
Two recently published papers describe nuclear DNA sequences that were obtained from the same Neanderthal fossil. Our reanalyses of the data from these studies show that they are not consistent with each other and point to serious problems with the data quality in one of the studies, possibly due to modern human DNA contaminants and/or a high rate of sequencing errors.

8.
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.

9.
Single-stranded DNA or RNA libraries used in SELEX experiments usually include primer-annealing sequences for PCR amplification. In genomic SELEX, these fixed sequences may form base pairs with the central genomic fragments and interfere with the binding of target molecules to the genomic sequences. In this study, a method has been developed to circumvent these artificial effects. Primer-annealing sequences are removed from the genomic library before selection with the target protein and are then regenerated to allow amplification of the selected genomic fragments. A key step in the regeneration of primer-annealing sequences is to employ thermal cycles of hybridization-extension, using the sequences from unselected pools as templates. The genomic library was derived from the bacteriophage fd, and the gene 5 protein (g5p) from the phage was used as a target protein. After four rounds of primer-free genomic SELEX, most cloned sequences overlapped at a segment within gene 6 of the viral genome. This sequence segment was pyrimidine-rich and contained no stable secondary structures. Compared with a neighboring genomic fragment, a representative sequence from the family of selected sequences had about 23-fold higher g5p-binding affinity. Results from primer-free genomic SELEX were compared with the results from two other genomic SELEX protocols.

10.
We describe a simple and rapid method for the isolation of specific genomic DNA sequences recognized by DNA-binding proteins. This procedure consists of four steps: (1) restriction enzyme digestion and size fractionation of genomic DNA; (2) DNA-protein binding using the gel mobility-shift assay; (3) ligation of isolated DNA fragments followed by transformation of Escherichia coli; and (4) screening of recombinant clones for inserts containing specific DNA-protein binding sequences. We have used this protocol to isolate human DNA sequences, 100-200 bp in size, that are recognized by both partially purified and affinity purified proteins. Unlike other procedures designed to identify genomic target sequences, the method described does not require polymerase chain reaction or successive immunoprecipitations.

11.
Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire.

12.
Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified based on sequence features and discriminant analysis.
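One of the alignment-free features named above, dinucleotide relative abundance, can be sketched in Python as follows. This assumes the common odds-ratio definition rho_xy = f_xy / (f_x * f_y), where f denotes an observed frequency; the abstract does not state which exact variant the authors used (for example, whether frequencies were symmetrized over both strands), and the function name is hypothetical. The input is assumed to contain only A, C, G and T.

```python
from collections import Counter
from itertools import product


def dinucleotide_relative_abundance(seq):
    """Return {dinucleotide: rho} with rho_xy = f_xy / (f_x * f_y).

    Assumes `seq` contains only A, C, G and T. A value is NaN when one
    of the two bases does not occur in the sequence at all.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))  # overlapping pairs
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx = mono[x] / n
        fy = mono[y] / n
        fxy = di[x + y] / (n - 1)
        rho[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return rho


if __name__ == "__main__":
    profile = dinucleotide_relative_abundance("ACGTACGT")
    for pair in ("AC", "CG", "TA", "AA"):
        print(pair, round(profile[pair], 3))
```

The resulting 16 values form one feature vector per region; vectors of this kind could then be passed to principal component analysis or a discriminant classifier, as the abstract describes.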

13.
Liberles DA, Wayne ML. Genome Biology 2002, 3(6):reviews1018.1-reviews1018.4
As more gene and genomic sequences from an increasing assortment of species become available, new pictures of evolution are emerging. Improved methods can pinpoint where positive and negative selection act in individual codons in specific genes on specific branches of phylogenetic trees. Positive selection appears to be important in the interaction between genotype, protein structure, function, and organismal phenotype.

14.
Minerdi D, Bianciotto V, Bonfante P. Plant and Soil 2002, 244(1-2):211-219
Arbuscular mycorrhizal (AM) fungi have been successful in time and space thanks to a long co-evolution with their host plants. In addition to this well-known interaction, they also associate with bacteria that reside in the fungal cytoplasm. The chapter mostly focuses on endosymbionts belonging to the genus Burkholderia and found in many species of Gigasporaceae. We have used morphological and genetic approaches to investigate these intracellular microorganisms. Some genes related to metabolism, cell colonization events and nitrogen fixation have been characterized and suggest a potential role in the nutritional exchanges between endobacteria, fungi and plants.

15.
Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80–90% accurate in jackknife testing experiments for bacteria and 90–99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.

16.
Che D, Hasan MS, Wang H, Fazekas J, Huang J, Liu Q. Bioinformation 2011, 7(6):311-314
Genomic islands (GIs) are genomic regions that were originally transferred from other organisms. The detection of genomic islands in genomes can lead to many applications in industrial, medical and environmental contexts. Existing computational tools for GI detection suffer from either low recall or low precision, leaving room for improvement. In this paper, we report the development of our Ensemble algorithm for Genomic Island Detection (EGID). EGID takes the prediction results of existing computational tools, filters them, and generates consensus predictions. Performance comparisons between our ensemble algorithm and existing programs have shown that our ensemble algorithm is better than any other program. EGID was implemented in Java, and was compiled and executed on Linux operating systems. EGID is freely available at http://www5.esu.edu/cpsc/bioinfo/software/EGID.
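The abstract does not specify how EGID filters and combines predictions, so the Python sketch below is only one plausible way to build a consensus from several tools: per-position majority voting over predicted intervals. The function names, the half-open 0-based coordinate convention and the `min_votes` threshold are all assumptions made for illustration, not part of EGID.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open intervals from one tool."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def consensus_islands(predictions, genome_length, min_votes=2):
    """Return regions predicted by at least `min_votes` tools.

    `predictions` holds one list of (start, end) intervals per tool,
    half-open and 0-based, with end <= genome_length. Each tool gets
    at most one vote per position.
    """
    diff = [0] * (genome_length + 1)  # difference array of vote changes
    for tool_intervals in predictions:
        for start, end in merge_intervals(tool_intervals):
            diff[start] += 1
            diff[end] -= 1

    islands = []
    votes = 0
    island_start = None
    for pos in range(genome_length + 1):
        votes += diff[pos]
        if votes >= min_votes and island_start is None:
            island_start = pos
        elif votes < min_votes and island_start is not None:
            islands.append((island_start, pos))
            island_start = None
    return islands


if __name__ == "__main__":
    tool_outputs = [
        [(10, 50), (40, 60)],  # hypothetical tool 1
        [(30, 70)],            # hypothetical tool 2
        [(100, 120)],          # hypothetical tool 3
    ]
    print(consensus_islands(tool_outputs, genome_length=200))
```

Raising `min_votes` trades recall for precision: requiring agreement from more tools keeps only the regions that several independent predictors support.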

17.
SUMMARY: In the segment-by-segment approach to sequence alignment, pairwise and multiple alignments are generated by comparing gap-free segments of the sequences under study. This method is particularly efficient in detecting local homologies, and it has been used to identify functional regions in large genomic sequences. Herein, an algorithm is outlined that calculates optimal pairwise segment-by-segment alignments in essentially linear space. AVAILABILITY: The program is available at the Bielefeld Bioinformatics Server (BiBiServ) at http://bibiserv.techfak.uni-bielefeld.de/dialign/

18.
Eukaryotic genomes contain many endogenous retroviral sequences (ERVs). ERVs are often severely mutated and therefore difficult to detect. A platform-independent (Java) program package, RetroTector (ReTe), was constructed. It has three basic modules: (i) detection of candidate long terminal repeats (LTRs), (ii) detection of chains of conserved retroviral motifs fulfilling distance constraints and (iii) attempted reconstruction of original retroviral protein sequences, combining alignment, codon statistics and properties of protein ends. Other features are prediction of additional open reading frames, automated database collection, graphical presentation and automatic classification. ReTe favors elements >1000 bp long due to its dependence on the order of and distances between retroviral fragments. It detects single or low-copy-number elements. ReTe assigned a 'retroviral' score of 890-2827 to 10 exogenous retroviruses from seven genera, and accurately predicted their genes. In a simulated model, ReTe was robust against mutational decay. The human genome was analyzed in 1-2 days on a Linux cluster. Retroviral sequences were detected in divergent vertebrate genomes. Most ReTe-detected chains were coincident with RepeatMasker output and the HERVd database. ReTe did not report most of the evolutionarily old HERV-L related and MalR sequences, and is not yet tailored for single LTR detection. Nevertheless, ReTe rationally detects and annotates many retroviral sequences.

19.
The resources available from Arabidopsis thaliana for interpreting functional attributes of wheat ESTs are reviewed. A focus for the review is a comparison between wheat EST sequences, generated from developing endosperm tissue, and the complete genomic sequence from Arabidopsis. The available information indicates that not only can tentative annotations be assigned to many wheat genes but also putative or unknown Arabidopsis gene annotations can be improved by comparative genomics.

20.
Analysis of genomic sequences by Chaos Game Representation   Cited by: 4 (self-citations: 0; citations by others: 4)
MOTIVATION: Chaos Game Representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to find the coordinates for their position in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates such that distance between positions measures similarity between the corresponding sequences. The possibility of using the latter property to identify succession schemes has been entirely overlooked in previous studies, which raises the possibility that CGR may be upgraded from a mere representation technique to a sequence modeling tool. RESULTS: The distribution of positions in the CGR plane was shown to be a generalization of Markov chain probability tables that accommodates non-integer orders. Therefore, Markov models are particular cases of CGR models rather than the reverse, as currently accepted. In addition, the CGR generalization has both practical (computational efficiency) and fundamental (scale independence) advantages. These results are illustrated by using Escherichia coli K-12 as a test data-set, in particular, the genes thrA, thrB and thrC of the threonine operon.
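The iterative mapping and the two properties mentioned above (recoverability of the source sequence, and proximity of sequences that share a recent history) can be illustrated with a short Python sketch. It assumes the conventional layout with A, C, G and T at the corners of the unit square and a starting point at the centre; the corner assignment and the function names are illustrative choices, not taken from the paper.

```python
# Assumed corner layout: A bottom-left, C top-left, G top-right, T bottom-right.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
CORNER_TO_BASE = {corner: base for base, corner in CORNERS.items()}


def cgr_points(seq):
    """Map a DNA sequence to CGR coordinates, one point per base.

    Each point lies halfway between the previous point and the corner
    of the current base. Characters other than A, C, G, T are skipped.
    """
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_decode(point, length):
    """Recover the last `length` bases that led to a single CGR point."""
    x, y = point
    bases = []
    for _ in range(length):
        # The quadrant containing the point identifies the most recent base.
        cx = 1.0 if x >= 0.5 else 0.0
        cy = 1.0 if y >= 0.5 else 0.0
        bases.append(CORNER_TO_BASE[(cx, cy)])
        x, y = 2 * x - cx, 2 * y - cy  # undo one halving step
    return "".join(reversed(bases))


if __name__ == "__main__":
    last = cgr_points("ACGT")[-1]
    print(last, cgr_decode(last, 4))
```

Because every step halves the distance to the previous point, two sequences that share their last k bases end within 2^-k of each other in each coordinate. That is the sense in which distance in the CGR plane reflects shared sequence context, and counting points per sub-square of side 2^-k gives the frequencies of k-letter words, which is the link to Markov chain tables that the abstract refers to.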
