Similar literature
 20 similar articles found (search time: 31 ms)
1.
Exon discovery by genomic sequence alignment   Total citations: 5 (self-citations: 0, by others: 5)
MOTIVATION: During evolution, functional regions in genomic sequences tend to be more highly conserved than randomly mutating 'junk DNA', so local sequence similarity often indicates biological functionality. This fact can be used to identify functional elements in large eukaryotic DNA sequences by cross-species sequence comparison. In recent years, several gene-prediction methods have been proposed that work by comparing anonymous genomic sequences, for example from human and mouse. The main advantage of these methods is that they are based on simple and generally applicable measures of (local) sequence similarity; unlike standard gene-finding approaches, they do not depend on species-specific training data or on the presence of cognate genes in databases. Like all comparative sequence-analysis methods, the new comparative gene-finding approaches rely critically on the quality of the underlying sequence alignments. RESULTS: Herein, we describe a new implementation of the sequence-alignment program DIALIGN that has been developed for alignment of large genomic sequences. We compare our method to the alignment programs PipMaker, WABA and BLAST and we show that local similarities identified by these programs are highly correlated with protein-coding regions. In our test runs, PipMaker was the most sensitive method while DIALIGN was the most specific. AVAILABILITY: The program is downloadable from the DIALIGN home page at http://bibiserv.techfak.uni-bielefeld.de/dialign/.

2.
While genome sequencing efforts reveal the basic building blocks of life, a genome sequence alone is insufficient for elucidating biological function. Genome annotation, the process of identifying genes and assigning function to each gene in a genome sequence, provides the means to elucidate biological function from sequence. Current state-of-the-art high-throughput genome annotation uses a combination of comparative (sequence similarity data) and non-comparative (ab initio gene prediction algorithms) methods to identify protein-coding genes in genome sequences. Because approaches used to validate the presence of predicted protein-coding genes are typically based on expressed RNA sequences, they cannot independently and unequivocally determine whether a predicted protein-coding gene is translated into a protein. With the ability to directly measure peptides arising from expressed proteins, high-throughput liquid chromatography-tandem mass spectrometry-based proteomics approaches can be used to verify coding regions of a genomic sequence. Here, we highlight several ways in which high-throughput tandem mass spectrometry-based proteomics can improve the quality of genome annotations and suggest that it could be efficiently applied during the gene calling process so that the improvements are propagated through the subsequent functional annotation process.

3.
Gene structure conservation aids similarity based gene prediction   Total citations: 4 (self-citations: 1, by others: 3)
One of the primary tasks in deciphering the functional contents of a newly sequenced genome is the identification of its protein-coding genes. Existing computational methods for gene prediction include ab initio methods, which use the DNA sequence itself as the only source of information; comparative methods, which use multiple genomic sequences; and similarity-based methods, which employ the cDNA or protein sequences of related genes to aid the gene prediction. We present here an algorithm implemented in a computer program called Projector which combines comparative and similarity approaches. Projector employs similarity information at the genomic DNA level by directly using known genes annotated on one DNA sequence to predict the corresponding related genes on another DNA sequence. It therefore makes explicit use of the conservation of the exon–intron structure between two related genes in addition to the similarity of their encoded amino acid sequences. We evaluate the performance of Projector by comparing it with the program Genewise on a test set of 491 pairs of independently confirmed mouse and human genes. It is more accurate than Genewise for genes whose proteins are <80% identical, and is suitable for use in a combined gene prediction system where other methods identify well-conserved and non-conserved genes, and pseudogenes.

4.
The accurate prediction of higher eukaryotic gene structures and regulatory elements directly from genomic sequences is an important early step in the understanding of newly assembled contigs and finished genomes. As more new genomes are sequenced, comparative approaches are becoming increasingly practical and valuable for predicting genes and regulatory elements. We demonstrate the effectiveness of a comparative method called pattern filtering; it utilizes synteny between two or more genomic segments for the annotation of genomic sequences. Pattern filtering optimally detects the signatures of conserved functional elements despite the stochastic noise inherent in evolutionary processes, allowing more accurate annotation of gene models. We anticipate that pattern filtering will facilitate sequence annotation and the discovery of new functional elements by the genetics and genomics communities.

5.
The availability of genomic resources has already had a tremendous impact on biomedical research. In this review, we describe how whole genome sequence and high-throughput functional genomics projects have facilitated the identification and characterization of important genes in lipid metabolism and disease. We review key approaches and lipid genes identified in the first years of this century and discuss how genomic resources are likely to streamline gene identification and functional characterization in the future.

6.
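Comparative genomics research based on orthologous sequences   Total citations: 2 (self-citations: 0, by others: 2)
Orthologous sequences have similar or even identical functions and similar regulatory pathways in different species, and play similar or even identical roles. Moreover, the vast majority of core biological functions are carried out by a considerable number of orthologous genes, which makes orthologs the most reliable choice for the functional annotation and analysis of genome sequences. Given these distinctive biological properties, comparative genomics research using orthologous sequences is bound to provide clues for detecting the emergence, expression, and loss of important functional genes during the evolution of different organisms. This article reviews the basic properties of orthologous genes, the relationship between orthologous sequences and comparative genomics, and the methods and current status of comparative genomics research using orthologous sequences. Keywords: orthologs; comparative genomics; biological properties; databases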

7.
Many bacterial pathogens promote infection and cause disease by directly injecting into host cells proteins that manipulate eukaryotic cellular processes. Identification of these translocated proteins is essential to understanding pathogenesis. Yet, their identification remains limited. This, in part, is due to their general sequence uniqueness, which confounds homology-based identification by comparative genomic methods. In addition, their absence often does not result in phenotypes in virulence assays, which limits functional genetic screens. Translocated proteins have been observed to confer toxic phenotypes when expressed in the yeast Saccharomyces cerevisiae. This observation suggests that yeast growth inhibition can be used as an indicator of protein translocation in functional genomic screens. However, limited information is available regarding the behavior of non-translocated proteins in yeast. We developed a semi-automated quantitative assay to monitor the growth of hundreds of yeast strains in parallel. We observed that expression of half of the 19 Shigella translocated proteins tested, but almost none of the 20 non-translocated Shigella proteins or approximately 1,000 Francisella tularensis proteins, significantly inhibited yeast growth. Not only does this study establish that yeast growth inhibition is a sensitive and specific indicator of translocated proteins, but we also identified a new substrate of the Shigella type III secretion system (TTSS), IpaJ, previously missed by other experimental approaches. In those cases where the mechanisms of action of the translocated proteins are known, significant yeast growth inhibition correlated with the targeting of conserved cellular processes. By providing positive rather than negative indication of activity, our assay complements existing approaches for identification of translocated proteins. In addition, because this assay only requires genomic DNA, it is particularly valuable for studying pathogens that are difficult to genetically manipulate or dangerous to culture.

8.
New technologies based on DNA microarrays and comparative genomics hold great promise for providing the background biological information necessary for effective coral reef conservation and management. Microarray analysis has been used in a wide range of applications across the biological sciences, most frequently to examine simultaneous changes in the expression of large numbers of genes in response to experimental manipulation or environmental variation. Other applications of microarray methods include the assessment of divergence in gene sequences between species and the identification of fast-evolving genes. Arrays are presently available for only a limited range of species, but with appropriate controls they can be used for related species, thus avoiding the considerable costs associated with development of a system de novo. Arrays are in use or preparation to study stress responses, early development, and symbiosis in Acropora and Montastraea. Ongoing projects on several corals are making available large numbers of expressed gene sequences, enabling the identification of candidate genes for studies on gamete specificity, allorecognition and symbiont interactions. Over the next few years, microarray and comparative genomic approaches are likely to assume increasingly important and widespread use to study many aspects of the biology of coral reef organisms. Application of these genomic approaches to enhance our understanding of genetic and physiological correlates during stress, environmental disturbance and disease bears direct relevance to the conservation of coral reef ecosystems. S. Forêt and K.S. Kassahn contributed equally.

9.
The Xylella fastidiosa comparative genomic database is a scientific resource that aims to provide a user-friendly interface for accessing high-quality manually curated genomic annotation and comparative sequence analysis, as well as for identifying and mapping prophage-like elements, a marked feature of Xylella genomes. Here we describe a database and tools for exploring the biology of this important plant pathogen. The hallmarks of this database are the high-quality genomic annotation, the functional and comparative genomic analysis, and the identification and mapping of prophage-like elements. It is available from the web site http://www.xylella.lncc.br.

10.
The study of complex biological questions through comparative proteomics is becoming increasingly attractive to plant biologists as the rapidly expanding plant genomic and expressed sequence tag databases provide improved opportunities for protein identification. This review focuses on practical issues associated with comparative proteomic analysis, including the challenges of effective protein extraction and separation from plant tissues, the pros and cons of two-dimensional gel-based analysis and the problems of identifying proteins from species that are not recognized models for functional genomic studies. Specific points are illustrated using data from an ongoing study of the tomato and pepper fruit proteomes.

11.
With the continuing accomplishments of the human genome project, high-throughput strategies to identify DNA sequences that are important in mammalian gene regulation are becoming increasingly feasible. In contrast to the historic, labour-intensive, wet-laboratory methods for identifying regulatory sequences, many modern approaches are heavily focused on the computational analysis of large genomic data sets. Data from inter-species genomic sequence comparisons and genome-wide expression profiling, integrated with various computational tools, are poised to contribute to the decoding of genomic sequence and to the identification of those sequences that orchestrate gene regulation. In this review, we highlight several genomic approaches that are being used to identify regulatory sequences in mammalian genomes.

12.
We developed an algorithm named GEAR (genomic enrichment analysis of regional DNA copy number changes) for functional interpretation of genome-wide DNA copy number changes identified by array-based comparative genomic hybridization. GEAR selects two types of chromosomal alterations with potential biological relevance, i.e. recurrent and phenotype-specific alterations. It then performs functional enrichment analysis using a priori selected functional gene sets to identify primary and clinical genomic signatures. The genomic signatures identified by GEAR represent functionally coordinated genomic changes, which can provide clues to the underlying molecular mechanisms related to the phenotypes of interest. GEAR can help identify key molecular functions that are activated or repressed in tumor genomes, leading to an improved understanding of tumor biology. AVAILABILITY: GEAR software is available with an online manual at the website http://www.systemsbiology.co.kr/GEAR/.

13.
Cancer is a genetic disease that results from a variety of genomic alterations. Identification of some of these causal genetic events has enabled the development of targeted therapeutics and spurred efforts to discover the key genes that drive cancer formation. Rapidly improving sequencing and genotyping technology continues to generate increasingly large datasets that require analytical methods to identify functional alterations that deserve additional investigation. This review examines statistical and computational approaches for the identification of functional changes among sets of single-nucleotide substitutions. Frequency-based methods identify the most highly mutated genes in large-scale cancer sequencing efforts while bioinformatics approaches are effective for independent evaluation of both non-synonymous mutations and polymorphisms. We also review current knowledge and tools that can be utilized for analysis of alterations in non-protein-coding genomic sequence.

14.
Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has aided comparative genomic studies by providing the rare genomic changes and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity within Carnivora. Publication of the whole-genome sequences of domestic dog, domestic cat, and giant panda serves as a valuable resource for comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as their composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics.

15.
Unicellular algae serve as models for the study and discovery of metabolic pathways, for the functional dissection of cell biological processes such as organellar division and cell motility, and for the identification of novel genes and gene functions. The recent completion of several algal genome sequences and expressed sequence tag collections and the establishment of nuclear and organellar transformation methods have opened the way for functional genomics approaches using algal model systems. The thermo-acidophilic unicellular red alga Galdieria sulphuraria represents a particularly interesting species for a genomics approach owing to its extraordinary metabolic versatility, such as heterotrophic and mixotrophic growth on more than 50 different carbon sources, and its adaptation to hot acidic environments. However, the ab initio prediction of genes required for unknown metabolic pathways from genome sequences is not trivial. A compelling strategy for gene identification is the comparison of similarly sized genomes of related organisms with different physiologies. Using this approach, candidate genes were identified that are critical to the metabolic versatility of Galdieria. Expressed sequence tags and high-throughput genomic sequence reads covering >70% of the G. sulphuraria genome were compared to the genome of the unicellular, obligate photoautotrophic red alga Cyanidioschyzon merolae. More than 30% of the Galdieria sequences did not relate to any of the Cyanidioschyzon genes. A closer inspection of these sequences revealed a large number of membrane transporters and enzymes of carbohydrate metabolism that are unique to Galdieria. Based on these data, it is proposed that genes involved in the uptake of reduced carbon compounds and enzymes involved in their metabolism are crucial to the metabolic flexibility of G. sulphuraria.

16.
About 40% of the proteins encoded in eukaryotic genomes are proteins of unknown function (PUFs). Their functional characterization remains one of the main challenges in modern biology. In this study we identified the PUF-encoding genes from Arabidopsis (Arabidopsis thaliana) using a combination of sequence similarity, domain-based, and empirical approaches. Large-scale gene expression analyses of 1,310 publicly available Affymetrix chips were performed to associate the identified PUF genes with regulatory networks and biological processes of known function. To generate quality results, the study was restricted to expression sets with replicated samples. First, genome-wide clustering and gene function enrichment analysis of clusters allowed us to associate 1,541 PUF genes with tightly coexpressed genes for proteins of known function (PKFs). Over 70% of them could be assigned to more specific biological process annotations than the ones available in the current Gene Ontology release. The most highly overrepresented functional categories in the obtained clusters were ribosome assembly, photosynthesis, and cell wall pathways. Interestingly, the majority of the PUF genes appeared to be controlled by the same regulatory networks as most PKF genes, because clusters enriched in PUF genes were extremely rare. Second, large-scale analysis of differentially expressed genes was applied to identify a comprehensive set of abiotic stress-response genes. This analysis resulted in the identification of 269 PKF and 104 PUF genes that responded to a wide variety of abiotic stresses, whereas 608 PKF and 206 PUF genes responded predominantly to specific stress treatments. The provided coexpression and differentially expressed gene data represent an important resource for guiding future functional characterization experiments of PUF and PKF genes. Finally, the public Plant Gene Expression Database (http://bioweb.ucr.edu/PED) was developed as part of this project to provide efficient access and mining tools for the vast gene expression data of this study.

17.
A vast majority of the burden from neglected tropical diseases results from helminth infections (nematodes and platyhelminths). Parasitic helminths infect over 2 billion people, exerting a high collective burden that rivals high-mortality conditions such as AIDS or malaria, and cause devastation to crops and livestock. The challenges to improving control of parasitic helminth infections are manifold, and no single category of approaches will meet them all. New information such as helminth genomics, functional genomics and proteomics, coupled with innovative bioinformatic approaches, provides fundamental molecular information about these parasites, accelerating both basic research and the development of effective diagnostics, vaccines and new drugs. To facilitate such studies we have developed an online resource, HelmCoP (Helminth Control and Prevention), built by integrating functional, structural and comparative genomic data from plant, animal and human helminths, to enable researchers to develop strategies for drug, vaccine and pesticide prioritization, while also providing a useful comparative genomics platform. HelmCoP encompasses genomic data from several hosts, including model organisms, along with a comprehensive suite of structural and functional annotations, to assist in comparative analyses and to study host-parasite interactions. The HelmCoP interface, with a sophisticated query engine as a backbone, allows users to search for multi-factorial combinations of properties and serves readily accessible information that will assist in the identification of various genes of interest. HelmCoP is publicly available at: http://www.nematode.net/helmcop.html.

18.
During mammalian evolution, complex systems of epigenetic gene regulation have been established: epigenetic mechanisms control tissue-specific gene expression, X chromosome inactivation in females and genomic imprinting. When DNA sequence conservation in imprinted genes is studied, it becomes evident that the evolution of gene function and the evolution of epigenetic gene regulation are tightly connected. Furthermore, comparative studies allow the identification of DNA sequence features that distinguish imprinted genes from biallelically expressed genes. Among these features are CpG islands, tandem repeats and retrotransposed elements that are known to play major roles in epigenetic gene regulation. Currently, more and more genetic and epigenetic data sets are becoming available. In the future, such data sets will provide the basis for more complex investigations of epigenetic variation in human populations. An exciting topic in this area will be the genetic and epigenetic variability of imprinted genes and its impact on human disease.

19.
The knowledge of complete sequences of different organisms is dramatically changing the landscape of biological research and pharmaceutical development. We are experiencing a transition from a trial-and-error approach in traditional biological research and natural product drug discovery to a systematic operation in genomics and target-specific drug design and selection. Small, cell-permeable and target-specific chemical ligands are particularly useful in systematic genomic approaches to study biological questions. On the other hand, genomic sequence information and comparative and structural genomics, when combined with cutting-edge technologies in synthetic chemistry and ligand screening/identification, provide a powerful way to produce target-specific and/or function-specific chemical ligands and drugs. Chemical genomics, or chemogenomics, is a new term that describes the development of target-specific chemical ligands and the use of such chemical ligands to globally study gene and protein functions. We anticipate that chemical genomics will play a critical role in the genomic age of biological research and drug discovery.

20.
