Similar articles
1.
While genome sequencing efforts reveal the basic building blocks of life, a genome sequence alone is insufficient for elucidating biological function. Genome annotation, the process of identifying genes and assigning function to each gene in a genome sequence, provides the means to elucidate biological function from sequence. Current state-of-the-art high-throughput genome annotation uses a combination of comparative (sequence similarity data) and non-comparative (ab initio gene prediction algorithms) methods to identify protein-coding genes in genome sequences. Because approaches used to validate the presence of predicted protein-coding genes are typically based on expressed RNA sequences, they cannot independently and unequivocally determine whether a predicted protein-coding gene is translated into a protein. With the ability to directly measure peptides arising from expressed proteins, high-throughput liquid chromatography-tandem mass spectrometry-based proteomics approaches can be used to verify coding regions of a genomic sequence. Here, we highlight several ways in which high-throughput tandem mass spectrometry-based proteomics can improve the quality of genome annotations and suggest that it could be efficiently applied during the gene calling process so that the improvements are propagated through the subsequent functional annotation process.

2.
Complex genomic libraries are increasingly being used to retrieve complete genes, operons or large genomic fragments directly from environmental samples, without the need to cultivate the respective microorganisms. We report on the construction of three large-insert fosmid libraries in total covering 3 Gbp of community DNA from two different soil samples, a sandy ecosystem and a mixed forest soil. In a fosmid end sequencing approach including 5376 sequence tags of approximately 700 bp length, we show that mostly bacterial and, to a much lesser extent, archaeal and eukaryotic genome fragments (approximately 1% each) have been captured in our libraries. The diversity of putative protein-encoding genes, as reflected by their distribution into different COG clusters, was comparable to that encoded in complete genomes of cultivated microorganisms. A huge variety of genomic fragments has been captured in our libraries, as seen by comparison with sequences in the public databases and by the large variation in G+C contents. We dissect differences between the libraries, which relate to the different ecosystems analysed and to biases introduced by different DNA preparations. Furthermore, a range of taxonomic marker genes (other than 16S rRNA) has been identified that allows the assignment of genome fragments to specific lineages. The complete sequences of two genome fragments identified as being affiliated with Archaea, based on a gene encoding a CDC48 homologue and a thermosome subunit, respectively, are presented and discussed. We thereby extend the genomic information of uncultivated crenarchaeota from soil and offer hints to specific metabolic traits present in this group.

3.
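Sun Min, Chen Tianyu, Feng Hong. Microbiology China (《微生物学通报》), 2021, 48(5): 1648-1661
[Background] Radiation-resistant microorganisms are an important class of extremophile resources and are of great significance for studying resistance mechanisms and for environmental protection. [Objective] To analyze, from genomic and transcriptomic perspectives, the genetic background of resistance in the radiation-resistant Micrococcus luteus V017 and its transcriptomic response to irradiation. [Methods] The genome of strain V017 was sequenced on the PacBio platform, and comparative genomics was used to analyze strain V01...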

5.
The African trypanosome genome
The haploid nuclear genome of the African trypanosome, Trypanosoma brucei, is about 35 Mb and varies in size among different trypanosome isolates by as much as 25%. The nuclear DNA of this diploid organism is distributed among three size classes of chromosomes: the megabase chromosomes of which there are at least 11 pairs ranging from 1 Mb to more than 6 Mb (numbered I-XI from smallest to largest); several intermediate chromosomes of 200-900 kb and uncertain ploidy; and about 100 linear minichromosomes of 50-150 kb. Size differences of as much as four-fold can occur, both between the two homologues of a megabase chromosome pair in a specific trypanosome isolate and among chromosome pairs in different isolates. The genomic DNA sequences determined to date indicate that about 50% of the genome is coding sequence. The chromosomal telomeres possess TTAGGG repeats and many, if not all, of the telomeres of the megabase and intermediate chromosomes are linked to expression sites for genes encoding variant surface glycoproteins (VSGs). The minichromosomes serve as repositories for VSG genes since some but not all of their telomeres are linked to unexpressed VSG genes. A gene discovery program, based on sequencing the ends of cloned genomic DNA fragments, has generated more than 20 Mb of discontinuous single-pass genomic sequence data during the past year, and the complete sequences of chromosomes I and II (about 1 Mb each) in T. brucei GUTat 10.1 are currently being determined. It is anticipated that the entire genomic sequence of this organism will be known in a few years. Analysis of a test microarray of 400 cDNAs and small random genomic DNA fragments probed with RNAs from two developmental stages of T. brucei demonstrates that the microarray technology can be used to identify batteries of genes differentially expressed during the various life cycle stages of this parasite.

7.

Background  

With genome sequencing becoming more and more affordable, environmental shotgun sequencing of the microorganisms present in an environment generates a challenging amount of sequence data for the scientific community. These sequence data enable the diversity of the microbial world and the metabolic pathways within an environment to be investigated, a previously unthinkable achievement when using traditional approaches. DNA sequence data assembled from extracts of 0.8 μm filtered Sargasso seawater unveiled an unprecedented glimpse of marine prokaryotic diversity and gene content. Serendipitously, many sequences representing picoeukaryotes (cell size <2 μm) were also present within this dataset. We investigated the picoeukaryotic diversity of this database by searching sequences containing homologs of eight nuclear anchor genes that are well conserved throughout the eukaryotic lineage, as well as one chloroplastic and one mitochondrial gene.

8.
The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high‐throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving the knowledge about the pine gene space. Interspecific capture was also evaluated: over 98% of all probes designed from P. taeda that were efficient in sequence capture were also suitable for analysis of the related species Pinus elliottii Engelm.

10.
Expressed sequence tag projects have currently produced over 400 000 partial gene sequences from more than 30 nematode species and the full genomic sequences of selected nematodes are being determined. In addition, functional analyses in the model nematode Caenorhabditis elegans have addressed the role of almost all genes predicted by the genome sequence. This recent explosion in the amount of available nematode DNA sequences, coupled with new gene function data, provides an unprecedented opportunity to identify pre-validated drug targets through efficient mining of nematode genomic databases. This article describes the various information sources available and strategies that can expedite this process.

11.
The increasing number of whole genomic sequences of microorganisms has increased the complexity of genome-wide annotation and gene sequence comparison among multiple microorganisms. To address this problem, we have developed nWayComp software that compares DNA and protein sequences of phylogenetically-related microorganisms. This package integrates a series of bioinformatics tools such as BLAST, ClustalW, ALIGN, PHYLIP and PRIMER3 for sequence comparison. It searches for homologous sequences among multiple organisms and identifies genes that are unique to a particular organism. The homologous gene sets are then ranked in descending order of sequence similarity. For each set of homologous sequences, a table of sequence identity among homologous genes along with sequence variations such as SNPs and INDELS is developed, and a phylogenetic tree is constructed. In addition, a common set of primers that can amplify all the homologous sequences is generated. The nWayComp package provides users with a quick and convenient tool to compare genomic sequences among multiple organisms at the whole-genome level.
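
The ranking step described in this abstract (ordering homologous gene sets by sequence similarity) can be illustrated with a minimal Python sketch. This is not nWayComp's code: the function names, the use of mean pairwise percent identity as the similarity score, and the toy sequences are all assumptions made for illustration, and the input is assumed to be already aligned (for example by ClustalW).

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Columns that are gaps ("-") in both sequences are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    same = sum(x == y for x, y in cols)
    return 100.0 * same / len(cols)


def mean_identity(aligned: dict) -> float:
    """Mean pairwise percent identity within one homologous set."""
    pairs = list(combinations(aligned.values(), 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


def rank_sets(sets: dict) -> list:
    """Names of homologous sets, most similar set first (hypothetical helper)."""
    return sorted(sets, key=lambda name: mean_identity(sets[name]), reverse=True)


# Toy, made-up alignments keyed by organism name.
example_sets = {
    "geneB": {"org1": "ATGC", "org2": "TTGA"},
    "geneA": {"org1": "ATGC", "org2": "ATGC", "org3": "ATGA"},
}
print(rank_sets(example_sets))
```

A mean over all pairs is only one possible score; a real pipeline might instead rank by the lowest pairwise identity or by BLAST bit score.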

12.
As a first step towards understanding the molecular mechanisms through which the expression of the gene (OAT) encoding ornithine aminotransferase (OAT) is regulated in a tissue-specific manner, we have used a near full length OAT cDNA to isolate related sequences from a rat genomic DNA library. Twenty-one unique clones representing five contigs and spanning approximately 140 kb of genomic DNA were isolated and characterized. From these clones we have identified a single expressed OAT gene and three processed pseudogenes. The comparison of the EcoRI, BamHI, and HindIII fragments contained within these genomic clones with those detected in total genomic DNA by the cDNA probe suggests that essentially all of the OAT-related sequences in the rat genome have been isolated. Thus, the tissue-specific regulation of OAT gene expression appears to be effected through a single expressed gene. Data are presented which suggest that the OAT-1, OAT-2, and OAT-3 pseudogenes arose approximately 28.5, 7.3, and 25.1 Myr ago, respectively. Mutation rates are presented for each codon position of the expressed rat and human OAT genes. The region of the rat genome flanking the boundary of the OAT-3 pseudogene is of additional interest as it shares considerable identity to sequences contained within expressed genes and flanking other processed pseudogenes.

18.
With the development of genome sequencing, more whole genomes of microorganisms have been completed, and many methods have been introduced to reconstruct the phylogenetic tree of those microorganisms from information extracted from the whole genomes, by transforming or mapping the whole genome sequences into other forms that can describe evolutionary distance in a new way. We think it might be possible that there exists information buried in the whole genome and transferred along lineage, which remains stable and is more essential than sequence conservation of individual genes or the arrangement of some genes of a selected set. We need to find one measurement that can involve as many phylogenetic features as possible that are beyond the genome sequence itself. We converted each genome sequence of the microorganisms into another linear sequence to represent the functional structure of the sequence, used a new information function to calculate the discrepancy of sequences and obtain one distance matrix of the genomes, and built one phylogenetic tree with a neighbor joining method. The resulting tree shows that the major lineages are consistent with the result based on their 16S rRNA sequences. Our method discovered one phylogenetic feature derived from the genome sequences and the encoded genes that can rebuild the phylogenetic tree correctly. The mapping of one genome sequence to its new form representing the relative positions of the functional genes provides a new way to measure the phylogenetic relationships, and with a more specific classification of gene functions the result could be more sensitive.
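
The abstract does not specify its information function, so the sketch below covers only the final stage it names: turning a distance matrix into a tree by neighbor joining. It implements a single standard neighbor-joining step (pair selection by the Q criterion, branch lengths, and the reduced distance table). The function names and the five-taxon distance values are illustrative assumptions, not data from the paper.

```python
from itertools import combinations


def to_table(names, matrix):
    """Turn a symmetric matrix into a {frozenset({a, b}): distance} table."""
    return {
        frozenset((names[i], names[j])): matrix[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


def nj_step(names, dist):
    """One neighbor-joining step for three or more taxa.

    Returns the joined pair, its two branch lengths, the new list of node
    names, and the reduced distance table.
    """
    n = len(names)
    total = {a: sum(dist[frozenset((a, b))] for b in names if b != a) for a in names}

    def q(a, b):
        # Q criterion: the pair with the smallest value is joined.
        return (n - 2) * dist[frozenset((a, b))] - total[a] - total[b]

    a, b = min(combinations(names, 2), key=lambda pair: q(*pair))
    d_ab = dist[frozenset((a, b))]
    len_a = 0.5 * d_ab + (total[a] - total[b]) / (2 * (n - 2))
    len_b = d_ab - len_a

    node = f"({a}:{len_a:g},{b}:{len_b:g})"  # Newick-style label for the new node
    rest = [x for x in names if x not in (a, b)]
    new_dist = {k: v for k, v in dist.items() if a not in k and b not in k}
    for x in rest:
        new_dist[frozenset((node, x))] = 0.5 * (
            dist[frozenset((a, x))] + dist[frozenset((b, x))] - d_ab
        )
    return (a, b), (len_a, len_b), rest + [node], new_dist


# Illustrative distances only (not taken from the abstract).
taxa = ["a", "b", "c", "d", "e"]
demo = to_table(taxa, [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
])
pair, lengths, new_names, new_table = nj_step(taxa, demo)
print(pair, lengths, new_names)
```

Calling `nj_step` repeatedly until two nodes remain, then connecting them with their remaining distance, yields the full unrooted tree. An established implementation such as PHYLIP's `neighbor` program would be the safer choice in practice.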

19.
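Although expressed sequence tags (ESTs, sequences of mRNA fragments) have been widely used for efficient gene discovery and to complement genome annotation, they have recently, in combination with real-time quantitative reverse transcription PCR (qRT-PCR), also begun to be applied in phylogenetics, transcript profiling and proteomics. By analyzing gene expression levels in the reproductive tissues of the woody oil plant Jatropha (J. curcas), it is expected that some genes related to lipid synthesis may be found. These research ...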

20.
Whereas genomics describes the study of the genome, mainly represented by its gene expression on the DNA or RNA level, the term proteomics denotes the study of the proteome, which is the protein complement encoded by the genome. In recent years, the number of proteomic experiments increased tremendously. While all fields of proteomics have made major technological advances, the biggest step was seen in bioinformatics. Biological information management relies on sequence and structure databases and powerful software tools to translate experimental results into meaningful biological hypotheses and answers. In this resource article, I provide a collection of databases and software available on the Internet that are useful to interpret genomic and proteomic data. The article is a toolbox for researchers who have genomic or proteomic datasets and need to put their findings into a biological context.

