Similar articles
 Found 20 similar articles
1.
ESTHER (for esterases, alpha/beta hydrolase enzymes and relatives) is a database of sequences phylogenetically related to cholinesterases. These sequences define a homogeneous group of enzymes (carboxylesterases, lipases and hormone-sensitive lipases) sharing a similar structure of a central beta-sheet surrounded by alpha-helices. Among these proteins a wide range of functions can be found (hydrolases, adhesion molecules, hormone precursors). The purpose of ESTHER is to facilitate comparison of the structures and functions of members of the family. Since the last release, new features have been added to the server. A BLAST comparison tool allows sequence homology searches within the database sequences. New sections are available: kinetics and inhibitors of cholinesterases, fasciculin-acetylcholinesterase interaction and a gene structure review. The mutation analysis compilation has been improved with three-dimensional images. A mailing list has been created.

2.
It is standard practice, whenever a researcher finds a new gene, to search databases for genes that have a similar sequence. It is not standard practice, whenever a researcher finds a new gene, to search for genes that have similar expression (co-expression). Failure to perform co-expression searches has led to incorrect conclusions about the likely function of new genes, and has led to wasted laboratory attempts to confirm functions incorrectly predicted. We present here the example of Glia Maturation Factor gamma (GMF-gamma). Despite its name, it has not been shown to participate in glia maturation. It is a gene of unknown function that is similar in sequence to GMF-beta. The sequence homology and chromosomal location led to an unsuccessful search for GMF-gamma mutations in glioma. We examined GMF-gamma expression in 1432 human cDNA libraries. Highest expression occurs in phagocytic, antigen-presenting and other hematopoietic cells. We found GMF-gamma mRNA in almost every tissue examined, with expre

3.
Genomic resources available to researchers studying phytopathogenic fungi are limited. Here, we briefly review the genomic and bioinformatic resources available and the current status of fungal genomics. We also describe a relational database containing sequences of expressed sequence tags (ESTs) from three phytopathogenic fungi, Blumeria graminis, Magnaporthe grisea, and Mycosphaerella graminicola, and the methods and underlying principles required for its construction. The database contains significant annotation for each EST sequence and is accessible at http://cogeme.ex.ac.uk. An easy-to-use interface allows the user to identify gene sequences by using simple text queries or homology searches. New querying functions and large sequence sets from a variety of phytopathogenic species will be incorporated in due course.

4.
The study of the genome of Schistosoma mansoni, one of the etiologic agents of human schistosomiasis, is essential for a better understanding of the biology and development of this parasite. In order to get an overview of all S. mansoni catalogued gene sequences, we performed a clustering analysis of the parasite mRNA sequences available in public databases. This was done using the PHRAP and CAP3 software. The consensus sequences, generated after the alignment of cluster constituent sequences, allowed the identification by database homology searches of the most expressed genes in the worm. We analyzed these genes and looked for a correlation between their high expression and parasite metabolism and biology. We observed that the majority of these genes are related to the maintenance of basic cell functions, encoding products related to the cytoskeleton, intracellular transport and energy metabolism. Evidence is presented here that genes for aerobic energy metabolism are expressed in all the developmental stages analyzed. Some of the most expressed genes could not be identified by homology searches and may have some specific functions in the parasite.

5.
The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.

6.

Background  

TILLING (Targeting Induced Local Lesions IN Genomes) is a powerful tool for reverse genetics, combining traditional chemical mutagenesis with high-throughput PCR-based mutation detection to discover induced mutations that alter protein function. The most popular mutation detection method for TILLING is a mismatch cleavage assay using the endonuclease CelI. For this method, locus-specific PCR is essential. Most wheat genes are present as three similar sequences with high homology in exons and low homology in introns. Locus-specific primers can usually be designed in introns. However, it is sometimes difficult to design locus-specific PCR primers in a conserved region with high homology among the three homoeologous genes, or in a gene lacking introns, or if information on introns is not available. Here we describe a mutation detection method which combines High Resolution Melting (HRM) analysis of mixed PCR amplicons containing three homoeologous gene fragments and sequence analysis using Mutation Surveyor® software, aimed at simultaneous detection of mutations in three homoeologous genes.

7.
Ro-associated Y RNAs in metazoans: evolution and diversification   Cited by: 2 (self-citations: 0, citations by others: 2)
The Y genes encode small noncoding RNAs whose functions remain elusive, whose numbers vary between species, and whose major property is to be bound by the Ro60 protein (or its ortholog in other species). To better understand the evolution of the Y gene family, we performed a homology search in 27 different genomes along with a structural search using Y RNA specific motifs. These searches confirmed that Y RNAs are well conserved in the animal kingdom and resulted in the detection of several new Y RNA genes, including the first Y RNAs in insects and a second Y RNA detected in Caenorhabditis elegans. Unexpectedly, Y5 genes were retrieved almost as frequently as Y1 and Y3 genes and, consequently, are not the result of a relatively recent appearance, as is generally believed. Investigation of the organization of the Y genes demonstrated that the synteny was conserved among species. Interestingly, it revealed the presence of six putative "fossil" Y genes, all of which were Y4 and Y5 related. Sequence analysis led to inference of the ancestral sequences for all Y RNAs. In addition, the evolution of existing Y RNAs was deduced for many families, orders and classes. Moreover, a consensus sequence and secondary structure for each Y species was determined. Further evolutionary insight was obtained from the analysis of several thousand Y retropseudogenes among various species. Taken together, these results confirm the rich and diversified evolutionary history of Y RNAs.

8.
The increasing number of whole genomic sequences of microorganisms has increased the complexity of genome-wide annotation and gene sequence comparison among multiple microorganisms. To address this problem, we have developed nWayComp software that compares DNA and protein sequences of phylogenetically-related microorganisms. This package integrates a series of bioinformatics tools such as BLAST, ClustalW, ALIGN, PHYLIP and PRIMER3 for sequence comparison. It searches for homologous sequences among multiple organisms and identifies genes that are unique to a particular organism. The homologous gene sets are then ranked in descending order of sequence similarity. For each set of homologous sequences, a table of sequence identity among homologous genes along with sequence variations such as SNPs and INDELs is developed, and a phylogenetic tree is constructed. In addition, a common set of primers that can amplify all the homologous sequences is generated. The nWayComp package provides users with a quick and convenient tool to compare genomic sequences among multiple organisms at the whole-genome level.

9.
We determined the nucleotide sequences of 64 TAC (transformation-competent artificial chromosome) clones selected from genomic libraries of Lotus japonicus accession Miyakojima MG-20 based on the sequence information of expressed sequence tags (ESTs), cDNAs, genes and DNA markers from L. japonicus and other legumes. The length of the DNA regions sequenced in this study was 6,370,255 bp, and the total length of the L. japonicus genome sequenced so far is 32,537,698 bp together with the nucleotide sequences of 256 TAC clones previously reported. Five hundred forty-eight potential protein-encoding genes with known or predicted functions, 127 gene segments and 224 pseudogenes were assigned to the newly sequenced regions by computer prediction and similarity searches against the sequences in protein and EST databases. Based on the nucleotide sequences of the clones, simple sequence repeat length polymorphism (SSLP) or derived cleaved amplified polymorphic sequence (dCAPS) markers were generated, and each clone was genetically localized onto the linkage map of two accessions of L. japonicus, MG-20 and Gifu B-129. The sequence data, gene information and mapping information are available through the World Wide Web at http://www.kazusa.or.jp/lotus/.

10.
MOTIVATION: Sequence alignment techniques have been developed into extremely powerful tools for identifying the folding families and function of proteins in newly sequenced genomes. For a sufficiently low sequence identity it is necessary to incorporate additional structural information to positively detect homologous proteins. We have carried out an extensive analysis of the effectiveness of incorporating secondary structure information directly into the alignments for fold recognition and identification of distant protein homologs. A secondary structure similarity matrix based on a database of three-dimensionally aligned proteins was first constructed. An iterative application of dynamic programming was used which incorporates linear combinations of amino acid and secondary structure sequence similarity scores. Initially, only primary sequence information is used. Subsequently contributions from secondary structure are phased in and new homologous proteins are positively identified if their scores are consistent with the predetermined error rate. RESULTS: We used the SCOP40 database, where only PDB sequences that have 40% homology or less are included, to calibrate homology detection by the combined amino acid and secondary structure sequence alignments. Combining predicted secondary structure with sequence information results in an 8-15% increase in homology detection within SCOP40 relative to the pairwise alignments using only amino acid sequence data at an error rate of 0.01 errors per query; a 35% increase is observed when the actual secondary structure sequences are used. Incorporating predicted secondary structure information in the analysis of six small genomes yields an improvement in the homology detection of approximately 20% over SSEARCH pairwise alignments, but no improvement in the total number of homologs detected over PSI-BLAST, at an error rate of 0.01 errors per query. 
However, because the pairwise alignments based on combinations of amino acid and secondary structure similarity are different from those produced by PSI-BLAST and the error rates can be calibrated, it is possible to combine the results of both searches. An additional 25% relative improvement in the number of genes identified at an error rate of 0.01 is observed when the data is pooled in this way. Similarly for the SCOP40 dataset, PSI-BLAST detected 15% of all possible homologs, whereas the pooled results increased the total number of homologs detected to 19%. These results are compared with recent reports of homology detection using sequence profiling methods. AVAILABILITY: Secondary structure alignment homepage at http://lutece.rutgers.edu/ssas CONTACT: anders@rutchem.rutgers.edu; ronlevy@lutece.rutgers.edu Supplementary Information: Genome sequence/structure alignment results at http://lutece.rutgers.edu/ss_fold_predictions.
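
The idea in this abstract of scoring an alignment with a linear combination of amino acid and secondary structure similarity can be illustrated with a small sketch. This is not the authors' implementation: the match/mismatch values, the linear gap penalty, the weight `w`, and the function names are all assumptions chosen for illustration, and a plain global (Needleman-Wunsch style) dynamic program stands in for whatever alignment procedure and similarity matrices the paper actually used.

```python
def combined_score(aa1, aa2, ss1, ss2, w):
    """Linear combination of residue and secondary-structure similarity.

    Assumed toy scoring: +1 for a match, -1 for a mismatch, on both tracks.
    w = 0 uses amino acids only; w = 1 uses secondary structure only.
    """
    aa = 1.0 if aa1 == aa2 else -1.0
    ss = 1.0 if ss1 == ss2 else -1.0
    return (1.0 - w) * aa + w * ss


def align_score(seq1, ss1, seq2, ss2, w=0.3, gap=-2.0):
    """Global alignment score by dynamic programming (linear gap penalty)."""
    if len(seq1) != len(ss1) or len(seq2) != len(ss2):
        raise ValueError("each sequence needs a same-length structure string")
    n, m = len(seq1), len(seq2)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + combined_score(
                seq1[i - 1], seq2[j - 1], ss1[i - 1], ss2[j - 1], w
            )
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]
```

Starting with `w=0.0` and then raising `w` in later passes would loosely mirror the "phasing in" of secondary structure described above; how the weight was scheduled and how scores were calibrated against an error rate is not specified in the abstract.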

11.
Myosins play an important role in various developmental processes in plants. We have identified 14 myosin genes in the rice (Oryza sativa cv. Nipponbare) genome using sequence information available in public databases. Phylogenetic analysis of these sequences with other plant and non-plant myosins revealed that two of the predicted sequences belonged to class VIII and the others to class XI. All of these genes were distributed on seven chromosomes in the rice genome. Domain searches on these sequences indicated that a typical rice myosin consisted of Myosin_N, head domain, neck (IQ motifs), tail, and dilute (DIL) domain. Based on the sequence information obtained from predicted myosins, we isolated and sequenced two full-length cDNAs, OsMyoVIIIA and OsMyoXIE, representing each of the two classes of myosins. These two cDNAs isolated from different organs existed in isoforms due to differential splicing and showed minor differences from the predicted myosin in exon organization. Of the 14 myosin genes, 11 were expressed in three major organs (leaves, panicles, and roots), among which three myosins exhibited different expression levels. On the other hand, three of the total myosin sequences showed organ-specific expression. The existence of different myosin genes and their isoforms in different organs or tissues indicates the diversity of myosin functions in rice.

12.
Using the sequence information of expressed sequence tags (ESTs), cDNAs and genes from Lotus japonicus and other legumes, 73 TAC (transformation-competent artificial chromosome) clones were selected from a genomic library of L. japonicus accession MG-20, and their nucleotide sequences were determined. The length of the DNA sequenced in this study was 7,455,959 bp, and the total length of the DNA regions sequenced so far is 26,167,443 bp together with the nucleotide sequences of 183 TAC clones previously reported. By similarity searches against the sequences in protein and EST databases and prediction by computer programs, a total of 699 potential protein-encoding genes with known or predicted functions, 163 gene segments and 267 pseudogenes were assigned to the newly sequenced regions. Based on the nucleotide sequences of the clones, simple sequence repeat length polymorphism (SSLP) or derived cleaved amplified polymorphic sequence (dCAPS) markers were generated, and each clone was located onto the linkage map of two accessions of L. japonicus, Gifu B-129 and Miyakojima MG-20. The sequence data, gene information and mapping information are available through the World Wide Web at http://www.kazusa.or.jp/lotus/.

13.
Whatever else they should share, strains of bacteria assigned to the same species should have house-keeping genes that are similar in sequence. Single gene sequences (or rRNA gene sequences) have very few informative sites to resolve the strains of closely related species, and relationships among similar species may be confounded by interspecies recombination. A more promising approach (multilocus sequence analysis, MLSA) is to concatenate the sequences of multiple house-keeping loci and to observe the patterns of clustering among large populations of strains of closely related named bacterial species. Recent studies have shown that large populations can be resolved into non-overlapping sequence clusters that agree well with species assigned by the standard microbiological methods. The use of clustering patterns to inform the division of closely related populations into species has many advantages for poorly studied bacteria (or to re-evaluate well-studied species), as it provides a way of recognizing natural discontinuities in the distribution of similar genotypes. Clustering patterns can be used by expert groups as the basis of a pragmatic approach to assigning species, taking into account whatever additional data are available (e.g. similarities in ecology, phenotype and gene content). The development of large MLSA Internet databases provides the ability to assign new strains to previously defined species clusters and an electronic taxonomy. The advantages and problems in using sequence clusters as the basis of species assignments are discussed.

14.
BLAST (Basic Local Alignment Search Tool) searches against DNA and protein sequence databases have become an indispensable tool for biomedical research. The proliferation of the genome sequencing projects is steadily increasing the fraction of genome-derived sequences in the public databases and their importance as a public resource. We report here the availability of Genomic BLAST, a novel graphical tool for simplifying BLAST searches against complete and unfinished genome sequences. This tool allows the user to compare the query sequence against a virtual database of DNA and/or protein sequences from a selected group of organisms with finished or unfinished genomes. The organisms for such a database can be selected using either a graphic taxonomy-based tree or an alphabetical list of organism-specific sequences. The first option is designed to help explore the evolutionary relationships among organisms within a certain taxonomy group when performing BLAST searches. The use of an alphabetical list allows the user to perform a more elaborate set of selections, assembling any given number of organism-specific databases from unfinished or complete genomes. This tool, available at the NCBI web site http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/genom_table_cgi, currently provides access to over 170 bacterial and archaeal genomes and over 40 eukaryotic genomes.

15.
Summary: Ubiquitin is ubiquitous in all eukaryotes and its amino acid sequence shows extreme conservation. Ubiquitin genes comprise direct repeats of the ubiquitin coding unit with no spacers. The nucleotide sequences coding for 13 ubiquitin genes from 11 species reported so far have been compiled and analyzed. The G+C content of the codon third base reveals a positive linear correlation with the genome G+C content of the corresponding species. The slope strongly suggests that the overall G+C content of codons of polyubiquitin genes clearly reflects the genome G+C content by AT/GC substitutions at the codon third position. The G+C content of the ubiquitin codon third base also shows a positive linear correlation with the overall G+C content of coding regions of compiled genes, indicating that the codon choices among synonymous codons reflect the average codon usage pattern of the corresponding species. On the other hand, the monoubiquitin gene, which is different from the polyubiquitin gene in gene organization, gene expression, and function of the encoded protein, shows a different codon usage pattern compared with that of the polyubiquitin gene. From comparisons of the levels of synonymous substitutions among ubiquitin repeats and the homology of the amino acid sequence of the tail of monomeric ubiquitin genes, we propose that the molecular evolution of ubiquitin genes occurred as follows: Plural primitive ubiquitin sequences were dispersed on the genome in ancestral eukaryotes. Some of them, situated in a particular environment, fused with the tail sequence to produce monomeric ubiquitin genes that were maintained across species. After divergence of species, polyubiquitin genes were formed by duplication of the other primitive ubiquitin sequences on different chromosomes. Differences in the environments in which ubiquitin genes are embedded reflect the differences in codon choice and in gene expression pattern between poly- and monomeric ubiquitin genes.
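
The quantity compared in this abstract, the G+C content at the third base of each codon, can be computed as in the following minimal sketch. The function names are illustrative, and the sketch assumes the input is an in-frame coding sequence starting at the first base of a codon; any trailing partial codon is ignored.

```python
def gc_content(seq):
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """G+C fraction at codon third positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop a trailing partial codon
    third_bases = cds[2:usable:3]      # bases at positions 3, 6, 9, ...
    if not third_bases:
        raise ValueError("sequence shorter than one codon")
    return gc_content(third_bases)
```

Computing `gc3_content` for each gene and `gc_content` for the corresponding genome or coding regions gives the pairs of values whose linear correlation the abstract describes.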

16.
H. Jörnvall. FEBS Letters, 1999, 456(1): 85-88
Motifer is a software tool able to find directly in nucleotide databases very distant homologues to an amino acid query sequence. It focuses searches on a specific amino acid pattern, scoring the matching and intervening residues as specified by the user. The program has been developed for searching databases of expressed sequence tags (ESTs), but it is also well suited to search genomic sequences. The query sequence can be a variable pattern with alternative amino acids or gaps and the sequences searched can contain introns or sequencing errors with accompanying frame shifts. Other features include options to generate a searchable output, set the maximal sequencing error frequency, limit searches to given species, or exclude already known matches. Motifer can find sequence homologues that other search algorithms would deem unrelated or would not find because of sequencing errors or a too large number of other homologues. The ability of Motifer to find relatives to a given sequence is exemplified by searches for members of the transforming growth factor-beta family and for proteins containing a WW-domain. The functions aimed at enhancing EST searches are illustrated by the 'in silico' cloning of a novel cytochrome P450 enzyme.

17.
18.
Histone and histone fold sequences and structures: a database.   Cited by: 4 (self-citations: 3, citations by others: 1)
A database of aligned histone protein sequences has been constructed based on the results of homology searches of the major public sequence databases. In addition, sequences of proteins identified as containing the histone fold motif and structures of all known histone and histone fold proteins have been included in the current release. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, and links to the Entrez integrated information retrieval system at the National Center for Biotechnology Information (NCBI). The database currently contains over 1000 protein sequences. All sequences and alignments in this database are available through the World Wide Web at: http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES/.

19.
To accumulate information on the coding sequences (CDSs) of unidentified genes, we have conducted a sequencing project of human long cDNA clones. Both end sequences of approximately 10,000 cDNA clones from two size-fractionated human spleen cDNA libraries (average sizes of 4.5 kb and 5.6 kb) were determined by single-pass sequencing to select cDNAs with unidentified sequences. We herein present the entire sequences of 81 cDNA clones, most of which were selected by two approaches based on their protein-coding potential in silico: fifty-eight cDNA clones were selected as those having protein-coding potential at the 5'-end of single-pass sequences by applying the GeneMark analysis; and 20 cDNA clones were selected as those expected to encode proteins larger than 100 amino acid residues by analysis of the human genome sequences flanked by both end sequences of the cDNAs using the GENSCAN gene prediction program. In addition to these newly identified cDNAs, three cDNA clones were isolated by colony hybridization experiments using probes corresponding to known gene sequences, since these cDNAs are likely to contain considerable amounts of new information regarding the genes already annotated. The sequence data indicated that the average sizes of the inserts and corresponding CDSs of cDNA clones analyzed here were 5.0 kb and 2.0 kb (670 amino acid residues), respectively. From the results of homology and motif searches against the public databases, functional categories of the 29 predicted gene products could be assigned; 86% of these predicted gene products (25 gene products) were classified into proteins relating to cell signaling/communication, nucleic acid management, and cell structure/motility.

20.
To evaluate the importance of the surrounding nucleotide sequence in the selection of a splice site for mRNA, we have carried out computer studies of eukaryotic protein genes whose entire nucleotide sequences were available. A splice site-like sequence that has a significant homology to the consensus splice junction sequences is frequently found within an intron and exon. It is found that the higher the homology of a candidate donor site sequence to the nine-nucleotide consensus sequence, the higher is its probability of being a donor site. For most of the donors, the stability of presumed base-pairing with U1-RNA is higher than that of donor-like sequences, if any, in the adjacent exon and intron. However, homology of a candidate acceptor sequence to the 15-nucleotide consensus is a poor criterion of an acceptor site. The presence of a sequence that could serve as a branch-point 18 to 37 nucleotides before an acceptor does not seem to be critical in distinguishing it from an acceptor-like sequence. For genes of human, rat, mouse and chicken, respectively, nucleotide frequencies around splice junctions of many genes have been calculated. They seem to be different at some positions around a donor site from species to species. The acceptors for these vertebrates have longer pyrimidine-rich regions than the previous consensus sequence. The newly derived nucleotide frequencies were used as the standard to calculate the weighted homology score of a candidate splice site sequence in a gene of the four species. This weighted homology score of the 40 to 60-nucleotide intron-exon sequence is a much better criterion of an acceptor. These results suggest that the most important signal in the selection of a splice site resides in the surrounding nucleotide sequence. It is also suggested that the surrounding nucleotide sequence alone is not generally sufficient for the selection.
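
One way to picture a frequency-weighted homology score of the kind mentioned in this abstract is sketched below. The abstract does not give the formula, so this is an assumption: per-position nucleotide frequencies are derived from a set of aligned known sites, and a candidate is scored by summing the frequency of its base at each position, normalised by the best achievable sum. The function names and the example sequences in the usage note are hypothetical, not data from the study.

```python
from collections import Counter


def position_frequencies(sites):
    """Per-position nucleotide frequencies from equal-length aligned sites."""
    if not sites:
        raise ValueError("no sites given")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        freqs.append({b: counts[b] / len(sites) for b in "ACGT"})
    return freqs


def weighted_homology_score(candidate, freqs):
    """Score in [0, 1]: summed frequency of observed bases / best possible sum."""
    if len(candidate) != len(freqs):
        raise ValueError("candidate length must match the frequency table")
    observed = sum(f.get(b, 0.0) for b, f in zip(candidate.upper(), freqs))
    best = sum(max(f.values()) for f in freqs)
    return observed / best
```

For example, with the toy sites `["GTA", "GTA", "GTG", "GCA"]`, the candidate `"GTA"` carries the most frequent base at every position and scores highest, while `"GCG"` scores lower because two of its bases are rare at their positions.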

