Similar literature
Found 20 similar articles; search took 93 ms.
1.
Expressed sequence tag projects have so far produced over 400 000 partial gene sequences from more than 30 nematode species, and the full genomic sequences of selected nematodes are being determined. In addition, functional analyses in the model nematode Caenorhabditis elegans have addressed the role of almost all genes predicted by the genome sequence. This recent explosion in the amount of available nematode DNA sequences, coupled with new gene function data, provides an unprecedented opportunity to identify pre-validated drug targets through efficient mining of nematode genomic databases. This article describes the various information sources available and strategies that can expedite this process.

2.
Expressed sequence tags (ESTs) are randomly sequenced cDNA clones. Currently, nearly 3 million human and 2 million mouse ESTs provide valuable resources that enable researchers to investigate the products of gene expression. The EST databases have proven to be useful tools for detecting homologous genes, for exon mapping, revealing differential splicing, etc. With the increasing availability of large amounts of poorly characterised eukaryotic (notably human) genomic sequence, ESTs have now become a vital tool for gene identification, sometimes yielding the only unambiguous evidence for the existence of a gene expression product. However, BLAST-based Web servers available to the general user have not kept pace with these developments and do not provide appropriate tools for querying EST databases with large highly spliced genes, often spanning 50 000-100 000 bases or more. Here we describe Gene2EST (http://woody.embl-heidelberg.de/gene2est/), a server that brings together a set of tools enabling efficient retrieval of ESTs matching large DNA queries and their subsequent analysis. RepeatMasker is used to mask dispersed repetitive sequences (such as Alu elements) in the query, BLAST2 for searching EST databases and Artemis for graphical display of the findings. Gene2EST combines these components into a Web resource targeted at the researcher who wishes to study one or a few genes to a high level of detail.

3.

Background  

The rapid increase in the amount of protein and DNA sequence information available has become almost overwhelming to researchers. So much information is now accessible that high-quality functional gene analysis and categorization has become a major goal for many laboratories. To aid in this categorization, there is a need for non-commercial software that can both align sequences and calculate pairwise levels of similarity/identity.
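
As an illustration of the pairwise identity calculation mentioned in this abstract, here is a minimal Python sketch. It is not the software the abstract refers to; the function name `pairwise_identity` and the convention of ignoring columns that contain a gap are assumptions made for this example. A similarity score would additionally need a substitution matrix or residue grouping, which is not shown.

```python
def pairwise_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: both inputs are rows of the same alignment, so they have
    equal length and use `gap` as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where both sequences have a residue
    matches = 0   # columns where those residues are identical
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1

    # An alignment with no comparable columns is reported as 0% identity.
    return 100.0 * matches / compared if compared else 0.0
```

For example, `pairwise_identity("ACGT-A", "ACCTTA")` compares five ungapped columns, four of which match. Other denominators (alignment length, shorter sequence length) are also in common use, so the convention should be stated whenever identities are reported.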

4.
A robust bioinformatics capability is widely acknowledged as central to realizing the promises of toxicogenomics. Successful application of toxicogenomic approaches, such as DNA microarrays, relies on appropriate data management, the ability to extract knowledge from massive amounts of data and the availability of functional information for data interpretation. At the FDA's National Center for Toxicological Research (NCTR), we are developing public microarray data management and analysis software called ArrayTrack. ArrayTrack supports the Minimum Information About a Microarray Experiment (MIAME) standard for storing both microarray data and the experiment parameters associated with a toxicogenomics study. A quality control mechanism is implemented to assure the fidelity of entered expression data. ArrayTrack also provides a rich collection of functional information about genes, proteins and pathways drawn from various public biological databases to facilitate data interpretation. In addition, several data analysis and visualization tools are available with ArrayTrack, and more tools will be available in the next released version. Importantly, gene expression data, functional information and analysis methods are fully integrated so that the data analysis and interpretation process is simplified and enhanced. ArrayTrack is publicly available online, and prospective users can also request a local installation version by contacting the authors.

5.
Web Tools for Rice Transcriptome Analyses   (total citations: 1; self-citations: 0; citations by others: 1)
Gene expression databases provide profiling data for the expression of thousands of genes to researchers worldwide. Oligonucleotide microarray technology is a useful tool that has been employed to produce gene expression profiles in most species. In rice, there are five genome-wide DNA microarray platforms: NSF 45K, BGI/Yale 60K, Affymetrix, Agilent Rice 44K, and NimbleGen 390K. Presently, more than 1,700 hybridizations of microarray gene expression data are available from public microarray repositories such as the NCBI Gene Expression Omnibus and ArrayExpress at EBI. Public gene expression data require further processing or reformatting before they can be used in other applications or analyses. Web-based databases for expression meta-analyses are useful for guiding researchers in designing relevant research schemes. In this review, we summarize various databases for expression meta-analyses of rice genes and web tools for further applications, such as the development of co-expression networks or functional gene networks.

6.
Next‐generation technologies generate an overwhelming amount of gene sequence data. Efficient annotation tools are required to make these data amenable to functional genomics analyses. The Mercator pipeline automatically assigns functional terms to protein or nucleotide sequences. It uses the MapMan ‘BIN’ ontology, which is tailored for functional annotation of plant ‘omics’ data. The classification procedure performs parallel sequence searches against reference databases, compiles the results and computes the most likely MapMan BINs for each query. In the current version, the pipeline relies on manually curated reference classifications originating from the three reference organisms (Arabidopsis, Chlamydomonas, rice), various other plant species that have a reviewed SwissProt annotation, and more than 2000 protein domain and family profiles at InterPro, CDD and KOG. Functional annotations predicted by Mercator achieve accuracies above 90% when benchmarked against manual annotation. In addition to mapping files for direct use in the visualization software MapMan, Mercator provides graphical overview charts, detailed annotation information in a convenient web browser interface and a MapMan‐to‐GO translation table to export results as GO terms. Mercator is available free of charge via http://mapman.gabipd.org/web/guest/app/Mercator.

7.
8.
PURPOSE OF REVIEW: The identification of regulatory polymorphisms has become a key problem in human genetics. In the past few years there has been a conceptual change in the way in which regulatory single-nucleotide polymorphisms are studied. We review the new approaches and discuss how gene expression studies can contribute to a better knowledge of the genetics of common diseases. RECENT FINDINGS: New techniques for the association of single-nucleotide polymorphisms with changes in gene expression have been recently developed. This, together with a more comprehensive use of the old in-vitro methods, has produced a great amount of genetic information. When added to current databases, it will help to design better tools for the detection of regulatory single-nucleotide polymorphisms. SUMMARY: The identification of functional regulatory single-nucleotide polymorphisms cannot be done by the simple inspection of DNA sequence. In-vivo techniques, based on primer-extension, and the more recently developed 'haploChIP' allow the association of gene variants with changes in gene expression. Gene expression analysis by conventional in-vitro techniques is the only way to identify the functional consequences of regulatory single-nucleotide polymorphisms. The amount of information produced in the last few years will help to refine the tools for the future analysis of regulatory gene variants.

9.
Gene identification in novel eukaryotic genomes by self-training algorithm   (total citations: 8; self-citations: 0; citations by others: 8)
Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, the genomic organization of novel eukaryotic genomes is diverse, and ab initio gene finding tools tuned for previously studied species are rarely suitable for efficacious gene hunting in the DNA sequences of a new genome. Gene identification methods based on mapping cDNA and expressed sequence tags (ESTs) to genomic DNA, or those using alignments to closely related genomes, rely on the existence of abundant cDNA and EST data and/or the availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes for estimating gene model parameters. In practice, none of these types of data may be available in sufficient amounts until rather late stages of sequencing a novel genome. Nevertheless, we have shown that gene finding in eukaryotic genomes can be carried out in parallel with statistical model estimation directly from as yet anonymous genomic DNA. The suggested method of running gene prediction in parallel with model parameter estimation follows the path of iterative Viterbi training. Rounds of labeling the genomic sequence into coding and non-coding regions are followed by rounds of model parameter estimation. Several dynamically changing restrictions on the possible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could redirect the iteration process away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods in which supervised model training precedes the gene prediction step. Several novel genomes have been analyzed and biologically interesting findings are discussed.
Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.

10.
MOTIVATION: Sequence databases represent an enormous resource of phylogenetic information, but there is a lack of tools for accessing that information in order to assess the amount of evolutionary information in these databases that may be suitable for phylogenetic reconstruction and for identifying areas of the taxonomy that are under-represented for specific gene sequences. RESULTS: We have developed TreeGeneBrowser which allows inspection and evaluation of gene sequence data for phylogenetic reconstruction. This program improves the efficiency of identification of genes that may be useful for particular phylogenetic studies and identifies taxa and taxonomic branches that are under-represented in sequence databases.

11.
Fungi comprise a large monophyletic group of uni- and multicellular eukaryotic organisms in which many species are of economic or medical importance. Fungal genomes are variable in size (13–42 Mb), and multicellular species support true spatial and temporal cell-type-specific regulation of gene expression. In a 38.8-kb Aspergillus nidulans contiguous genomic DNA region, a transposable element and 12 potential genes were identified, 7 similar to genes in other organisms. This observation is consistent with the prediction that multicellular ascomycetous fungi harbor 8000–9000 genes in a 36-Mb average genome. Thus, the genomic DNA sequence of filamentous fungi will provide substantial amounts of genetic and functional information that is not available in yeast, for the human and other metazoan minimal gene complement.

12.
The development of tools for the analysis of global gene expression is vital for the optimal exploitation of the data on parasite genomes that are now being generated in abundance. Recent advances in two-dimensional electrophoresis (2-DE), mass spectrometry and bioinformatics have greatly enhanced the possibilities for mapping and characterisation of protein populations. We have employed these developments in a proteomics approach for the analysis of proteins expressed in the tachyzoite stage of Toxoplasma gondii. Over 1000 polypeptides were reproducibly separated by high-resolution 2-DE using the pH ranges 4-7 and 6-11. Further separations using narrow range gels suggest that at least 3000-4000 polypeptides should be resolvable by 2-DE using multiple single pH unit gels. Mass spectrometry was used to characterise a variety of protein spots on the 2-DE gels. Peptide mass fingerprints, acquired by matrix-assisted laser desorption/ionisation (MALDI) mass spectrometry, enabled unambiguous protein identifications to be made where full gene sequence information was available. However, interpretation of peptide mass fingerprint data using the T. gondii expressed sequence tag (EST) database was less reliable. Peptide fragmentation data, acquired by post-source decay mass spectrometry, proved a more successful strategy for the putative identification of proteins using the T. gondii EST database and protein databases from other organisms. In some instances, several protein spots appeared to be encoded by the same gene, indicating that post-translational modification and/or alternative splicing events may be a common feature of functional gene expression in T. gondii. The data demonstrate that proteomic analyses are now viable for T. gondii and other protozoa for which there are good EST databases, even in the absence of complete genome sequence. Moreover, proteomics is of great value in interpreting and annotating EST databases.

13.
To understand how protein segments are inserted and deleted during divergent evolution, a set of pairwise alignments containing exactly one gap, and therefore arising from the first insertion-deletion (indel) event in the time separating the homologs, was examined. The alignments showed that "structure breaking" amino acids (PGDNS) were preferred within and flanking gapped regions, as were two residues with hydrophilic side-chains (QE) that frequently occur at the surface of protein folds. Conversely, hydrophobic residues (FMILYVW) occurred infrequently within and flanking the gapped region. These preferences are modestly different in protein pairs separated by an episode of adaptive evolution than in pairs diverging under strong functional constraints. Surprisingly, regions near an indel have not evolved more rapidly than the sequence pair overall, showing no evidence that an indel event must be compensated by local amino acid replacement. The gap lengths are best approximated by a Zipfian distribution, with the probability of a gap of length L decreasing as a function of L^(-1.8). These features are largely independent of the length of the gap and the extent of divergence (measured by both silent and non-silent sequence changes) separating the two proteins. Surprisingly, amino acid repeats were discovered in more than a third of the polypeptide segments in and around the gap. These correspond to repeats in the DNA sequence. This suggests that a signature of the mechanism by which indels occur in the DNA sequence remains in the encoded protein sequences. These data suggest specific tools to score gap placement in an alignment. They also suggest tools that distinguish true indels from gaps created by mistaken gene finding, including under-predicted and over-predicted introns.
By providing mechanisms to identify errors, the tools will enhance the value of genome sequence databases in support of integrated paleogenomics strategies used to extract functional information in a post-genomic environment.
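
The Zipfian gap-length model described in this abstract can be sketched numerically. The Python example below is an illustration only: the function name `zipf_gap_probabilities` and the truncation at a maximum gap length (needed to normalise the probabilities over a finite range) are assumptions for this example, and only the exponent 1.8 comes from the abstract.

```python
def zipf_gap_probabilities(max_length: int, exponent: float = 1.8) -> list[float]:
    """Return P(L) for L = 1..max_length, with P(L) proportional to L**(-exponent).

    Assumption: the distribution is truncated at `max_length` so that the
    probabilities sum to 1 over a finite range.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")

    weights = [length ** -exponent for length in range(1, max_length + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    probabilities = zipf_gap_probabilities(50)
    for length in (1, 2, 5, 10):
        print(f"P(L={length}) = {probabilities[length - 1]:.4f}")
```

Under this model a gap of length 2 is 2^(-1.8), or roughly 0.29 times, as likely as a gap of length 1, whatever truncation length is chosen. This is the kind of length dependence a gap-scoring tool could use when penalising gaps.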

14.
The applications of functional genomics, proteomics and informatics to cancer research have yielded a tremendous amount of information, which is growing all the time. Much of this information is available publicly on the Internet and ranges from general information about different cancers from a patient or clinical viewpoint, through to databases suitable for cancer researchers of all backgrounds, to very specific sites dedicated to individual genes or molecules. A simple search for 'cancer' from a typical Web browser search engine yields more than half a million hits; an even more specific search for 'leukaemia' (>40 000 hits) or 'p53' (>5700 hits) yields far too many hits to allow one to identify particular sites of interest. This review aims to provide a brief guide to some of the resources and databases that can be used as springboards to home in rapidly on information relevant to many fields of cancer research. As such, this article will not focus on a single website but hopes to illustrate some of the ways that postgenomic biology is revolutionizing cancer research. It will cover genomics and proteomics approaches that have been applied to studying global expression patterns in cancers, in addition to providing links ranging from general information about cancer to specific cancer gene mutation databases.

15.
As gene annotation databases continue to evolve and improve, it has become feasible to incorporate the functional and pathway information about genes available in these databases into the analysis of gene expression data, for a better understanding of the underlying mechanisms. A few methods have been proposed in the literature to formally convert individual gene results into gene function results. In this paper, we compare the various methods, propose and examine some new ones, and offer a structured approach to incorporating gene function or pathway information into the analysis of expression data. We study the performance of the various methods and also compare them on real data, using a case study from the toxicogenomics area. Our results show that the approaches based on gene function scores yield a different, and functionally more interpretable, array of genes than methods that rely solely on individual gene scores. They also suggest that functional class scoring methods perform better and more consistently than overrepresentation analysis and distributional score methods.
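
Overrepresentation analysis, one of the method families compared in this abstract, is often formulated as a one-sided hypergeometric test. The Python sketch below shows that formulation only. The abstract does not say which test the authors used, so the choice of test, the function name and the parameter names are all assumptions.

```python
from math import comb


def overrepresentation_p_value(
    population: int, category_size: int, selected: int, overlap: int
) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    population    -- number of genes measured on the array
    category_size -- number of those genes annotated to the functional category
    selected      -- number of genes called significant
    overlap       -- number of significant genes that are in the category
    """
    upper = min(category_size, selected)
    total = comb(population, selected)
    # Sum the probability of every overlap at least as large as the one observed.
    tail = sum(
        comb(category_size, k) * comb(population - category_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A small p-value indicates that the category contains more significant genes than random selection would be expected to give. Unlike functional class scoring, this approach depends on a significance cutoff for individual genes, which is one reason the two families can give different results.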

16.
The bacterium Escherichia coli has been widely employed in studies of eukaryotic DNA repair genes. Several eukaryotic genes have been cloned by functional complementation of mutant lineages of E. coli. We examined the similarities and differences among bacterial and eukaryotic DNA repair systems. Based on these data, we examined tools used for gene cloning and functional studies of DNA repair in eukaryotes, using this bacterial system as a model.

17.
The huge databases resulting from whole-genome sequencing provide new information that is likely to extend the scope of, and thus change the approach to, functional assignment of the putative open reading frames annotated by whole-genome sequence analyses. This is mainly achieved through easy, one-step identification of putative genes using genomics or proteomics tools. A major remaining challenge in biotechnology is to translate this information into better ways to screen or select a gene as a representative sequence. Further attempts to mine related whole genes or partial DNA fragments from the whole-genome treasure, and then to incorporate these sequences into a representative template, will result in the use of genetic information that can be translated into functional proteins or allow the generation of new lineages as a valuable pool. Such screens enable rapid biochemical analysis and easy isolation of the target activity, thereby accelerating the screening of novel enzymes from the expanded library of related sequences. Information-based PCR amplification of whole genes and reconstitution of functional DNA fragments will provide a platform for expanding the functional space of potential enzymes, especially when mixed genomes and metagenomes are used as gene resources.

18.
Genomics, 2021, 113(6): 3635-3643
16S rRNA gene amplicon sequencing is a popular technique that provides accurate characterization of microbial taxonomic abundances but does not provide any functional information. Several tools are available to predict functional profiles from 16S rRNA gene sequence data, using different genome databases and approaches. As the variable regions of a partially sequenced 16S rRNA gene cannot resolve taxonomy accurately beyond the genus level, these tools may give inflated results. Here, we developed ‘MicFunPred’, which uses a novel approach to derive imputed metagenomes based on a set of core genes only, thereby minimizing false-positive predictions. On simulated datasets, MicFunPred showed the lowest False Positive Rate (FPR) with a mean Spearman's correlation of 0.89 (SD = 0.03), while on seven real datasets the mean correlation was 0.75 (SD = 0.08). MicFunPred was found to be faster, with low computational requirements, and performed better than or comparably to other tools.

19.
As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics Institute (EBI) hosts six core databases, which store information on DNA sequences (EMBL-Bank), protein sequences (SWISS-PROT and TrEMBL), protein structure (MSD), whole genomes (Ensembl) and gene expression (ArrayExpress). But just as a cell would be useless if it couldn't transcribe DNA or translate RNA, our resources would be compromised if each existed in isolation. We have therefore developed a range of tools that not only facilitate the deposition and retrieval of biological information, but also allow users to carry out searches that reflect the interconnectedness of biological information. The EBI's databases and tools are all available on our website at www.ebi.ac.uk.

20.
In the past decade there has been an increase in the number of completely sequenced genomes due to the race of multibillion-dollar genome-sequencing projects. The enormous volume of biological sequence data thus flooding into the sequence databases necessitates the development of efficient tools for comparative genome sequence analysis. The information deduced by such analysis has various applications, viz. structural and functional annotation of novel genes and proteins, finding gene order in the genome, gene fusion studies, constructing metabolic pathways etc. Such study also proves invaluable for pharmaceutical industries, for example in in silico drug target identification and new drug discovery. There are various sequence analysis tools available for mining such useful information, of which the FASTA and Smith-Waterman algorithms are widely used. However, analyzing large datasets of genome sequences using the above codes seems to be impractical on uniprocessor machines. Hence there is a need for improving the performance of the above popular sequence analysis tools on parallel cluster computers. The performance of the Smith-Waterman (SSEARCH) and FASTA programs was studied on PARAM 10000, a parallel cluster of workstations designed and developed in-house. The FASTA and SSEARCH programs, which are available from the University of Virginia, were ported to PARAM and optimized. In this era of high performance computing, where the paradigm is shifting from conventional supercomputers to cost-effective general-purpose clusters of workstations and PCs, this study is highly relevant. Good performance of sequence analysis tools on a cluster of workstations was demonstrated, which is important for accelerating identification of novel genes and drug targets by screening large databases.
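
To illustrate why Smith-Waterman searches are computationally demanding, here is a minimal Python sketch of the local-alignment scoring recurrence. It is not the SSEARCH implementation: the function name, the simple match/mismatch scores and the linear gap penalty are assumptions for this example, whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> int:
    """Best local-alignment score between `a` and `b` (score only, no traceback).

    Assumptions: fixed match/mismatch scores and a linear gap penalty.
    """
    previous = [0] * (len(b) + 1)  # scores for row i - 1
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                                # start a new local alignment
                previous[j - 1] + substitution,   # align a[i-1] with b[j-1]
                previous[j] + gap,                # gap in b
                current[j - 1] + gap,             # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best
```

Each query-subject comparison fills a table of len(a) x len(b) cells, so the cost of a database search grows with the product of the query length and the total database length. Comparisons against different database sequences are independent of one another, which makes the workload a natural fit for distribution across cluster nodes.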
