Similar articles
Found 20 similar articles (search time: 31 ms)
1.
The whole genome approach enables the characterization of all components of any given biological pathway. Moreover, it can help to uncover all the metabolic routes for any molecule. Here we have used the genome of Drosophila melanogaster to search for enzymes involved in the metabolism of fucosylated glycans. Our results suggest that in the fruit fly GDP-fucose, the donor for fucosyltransferase reactions, is formed exclusively via the de novo pathway from GDP-mannose through enzymatic reactions catalyzed by GDP-D-mannose 4,6-dehydratase (GMD) and GDP-4-keto-6-deoxy-D-mannose 3,5-epimerase/4-reductase (GMER, also known as FX in man). The Drosophila genome does not have orthologs for the salvage pathway enzymes, i.e. fucokinase and GDP-fucose pyrophosphorylase synthesizing GDP-fucose from fucose. In addition we identified two novel fucosyltransferases predicted to catalyze alpha1,3- and alpha1,6-specific linkages to the GlcNAc residues on glycans. No genes with the capacity to encode alpha1,2-specific fucosyltransferases were found. We also identified two novel genes coding for O-fucosyltransferases and a gene responsible for a fucosidase enzyme in the Drosophila genome. Finally, using the Drosophila CG4435 gene, we identified two novel human genes putatively coding for fucosyltransferases. This work can serve as a basis for further whole-genome approaches in mapping all possible glycosylation pathways and as a basic analysis leading to subsequent experimental studies to verify the predictions made in this work.

2.
3.
Recent advances in gene structure prediction   Total citations: 9 (self-citations: 0, citations by others: 9)
De novo gene predictors are programs that predict the exon-intron structures of genes using the sequences of one or more genomes as their only input. In the past two years, dual-genome de novo predictors, which exploit local rates and patterns of mutation inferred from alignments between two genomes, have led to significant improvements in accuracy. Systems that exploit more than two genomes simultaneously have only recently begun to appear and are not yet competitive on practical tasks, but offer the greatest hope for near-term improvements. Dual-genome de novo prediction for compact eukaryotic genomes such as those of Arabidopsis thaliana and Caenorhabditis elegans is already quite accurate. Although mammalian gene prediction lags behind in accuracy, it is yielding ever more useful results. Coupled with significant improvements in pseudogene detection methods, which have eliminated many false positives, we have reached the point where de novo gene predictions are being used as hypotheses to drive experimental annotation via systematic RT-PCR and sequencing.

4.
Repetitive elements may comprise over two-thirds of the human genome   Total citations: 1 (self-citations: 0, citations by others: 1)
Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo "clouds"). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%-69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (~25 bp). Because short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed "element-specific" P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using them we identified ~100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.
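
The idea of grouping high-abundance oligonucleotides that are close in sequence space can be illustrated with a small sketch. This is not the P-clouds implementation: the one-substitution neighborhood, the single round of expansion, and the names `kmer_counts`, `neighbors`, `build_clouds`, `core_min`, and `outer_min` are all assumptions made for illustration only.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(kmer, alphabet="ACGT"):
    """Yield every k-mer that differs from kmer by exactly one substitution."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def build_clouds(counts, core_min, outer_min):
    """Group abundant k-mers into clouds around high-count cores.

    core_min and outer_min are hypothetical abundance thresholds:
    a core needs at least core_min copies, and a neighbor joins a
    cloud if it has at least outer_min copies.
    """
    assigned = set()
    clouds = []
    # Visit candidate cores from most to least abundant.
    for kmer, n in counts.most_common():
        if n < core_min:
            break
        if kmer in assigned:
            continue
        cloud = {kmer}
        assigned.add(kmer)
        for nb in neighbors(kmer):
            if nb not in assigned and counts.get(nb, 0) >= outer_min:
                cloud.add(nb)
                assigned.add(nb)
        clouds.append(cloud)
    return clouds
```

In a sketch like this, a genomic window would be flagged as repeat-derived when enough of its k-mers fall inside some cloud; how such windows are scored and merged is left out here.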

5.
Grimes BR, Monaco ZL. Chromosoma, 2005, 114(4): 230-241
At the gene therapy session of the ICCXV Chromosome Conference (2004), recent advances in the construction of engineered chromosomes and de novo human artificial chromosomes were presented. The long-term aims of these studies are to develop vectors as tools for studying genome and chromosome function and for delivering genes into cells for therapeutic applications. There are two primary advantages of chromosome-based vector systems over most conventional vectors for gene delivery. First, the transferred DNA can be stably maintained without the risks associated with insertion, and second, large DNA segments encompassing genes and their regulatory elements can be introduced, leading to more reliable transgene expression. There is clearly a need for safe and effective gene transfer vectors to correct genetic defects. Among the topics discussed at the gene therapy session and the main focus of this review are requirements for de novo human artificial chromosome formation, assembly of chromatin on de novo human artificial chromosomes, advances in vector construction, and chromosome transfer to cells and animals.

6.
With the acquisition of complete genome sequences from several animals, there is renewed interest in the pattern of genome evolution on our own lineage. One key question is whether gene number increased during chordate or vertebrate evolution. It is argued here that comparing the total number of genes between a fly, a nematode and human is not appropriate to address this question. Extensive gene loss after duplication is one complication; another is the problem of comparing taxa that are phylogenetically very distant. Amphioxus and tunicates are more appropriate animals for comparison to vertebrates. Comparisons of clustered homeobox genes, where gene loss can be identified, reveal a one-to-four mode of evolution for Hox and ParaHox genes. Analyses of other gene families in amphioxus and vertebrates confirm that gene duplication was very widespread on the vertebrate lineage. These data confirm that vertebrates have more genes than their closest invertebrate relatives, acquired through gene duplication. Abbreviations: IHGSC, International Human Genome Sequencing Consortium; TCESC, The C. elegans Sequencing Consortium.

7.
8.
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

9.
Evolutionary innovation relies partially on changes in gene regulation. While a growing body of evidence demonstrates that such innovation is generated by functional changes or translocation of regulatory elements via mobile genetic elements, the de novo generation of enhancers from non-regulatory/non-mobile sequences has, to our knowledge, not previously been demonstrated. Here we show evidence for the de novo genesis of enhancers in vertebrates. For this, we took advantage of the massive gene loss following the last whole genome duplication in teleosts to systematically identify regions that have lost their coding capacity but retain sequence conservation with mammals. We found that these regions show enhancer activity while the orthologous coding regions have no regulatory activity. These results demonstrate that these enhancers have been de novo generated in fish. By revealing that minor changes in non-regulatory sequences are sufficient to generate new enhancers, our study highlights an important playground for creating new regulatory variability and evolutionary innovation.

10.
Accurate protein identification in large-scale proteomics experiments relies upon a detailed, accurate protein catalogue, which is derived from predictions of open reading frames based on genome sequence data. Integration of mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and corresponding short-read nucleotide sequence data. In this study, we have benchmarked several strategies for increasing microbial peptide spectral matching in metaproteomic datasets using protein predictions generated from matched metagenomic sequences from the same human fecal samples. Additionally, we investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation), and de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that high mass accuracy peptide measurements searched against non-assembled reads from DNA sequencing of the same samples significantly increased identifiable proteins without sacrificing accuracy.
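
A high-mass-accuracy filter of the kind mentioned above is commonly expressed as a parts-per-million (ppm) tolerance on the difference between observed and theoretical peptide mass. The sketch below shows only that arithmetic; the function names and the tolerance values are assumptions, and the abstract does not state which thresholds the authors used.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tolerance_ppm):
    """True if the absolute ppm error is at or below tolerance_ppm."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


def keep_matches(matches, tolerance_ppm):
    """Keep (observed, theoretical) mass pairs that pass the ppm filter.

    tolerance_ppm is a hypothetical cutoff chosen by the caller.
    """
    return [
        (obs, theo)
        for obs, theo in matches
        if within_tolerance(obs, theo, tolerance_ppm)
    ]
```

For example, an observed mass of 1000.004 against a theoretical mass of 1000.0 is about 4 ppm off and would pass a 5 ppm cutoff, whereas 1000.01 (about 10 ppm) would not.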

11.
12.
13.
14.
Many essential aspects of genome function, including gene expression and chromosome segregation, are mediated throughout development and differentiation by changes in the chromatin state. Along with genomic signals encoded in the DNA, epigenetic processes regulate heritable gene expression patterns. Genomic signals such as enhancers, silencers, and repetitive DNA, while required for the establishment of alternative chromatin states, have an unclear role in epigenetic processes that underlie the persistence of chromatin states throughout development. Here, we demonstrate in fission yeast that the maintenance and inheritance of ectopic heterochromatin domains are independent of the genomic sequences necessary for their de novo establishment. We find that both structural heterochromatin and gene silencing can be stably maintained over an ~10-kb domain for up to hundreds of cell divisions in the absence of genomic sequences required for heterochromatin establishment, demonstrating the long-term persistence and stability of this chromatin state. The de novo heterochromatin, despite the absence of nucleation sequences, is also stably inherited through meiosis. Together, these studies provide evidence for chromatin-dependent, epigenetic control of gene silencing that is heritable, stable, and self-sustaining, even in the absence of the originating genomic signals.

15.
16.
'Disease-causing' mutations do not cause disease in all individuals. One possible important reason for this is that the outcome of a mutation can depend upon other genetic variants in a genome. These epistatic interactions between mutations occur both within and between molecules, and studies in model organisms show that they are extremely prevalent. However, epistatic interactions are still poorly understood at the molecular level, and consequently difficult to predict de novo. Here I provide an overview of our current understanding of the molecular mechanisms that can cause epistasis, and areas where more research is needed. A more complete understanding of epistasis will be vital for making accurate predictions about the phenotypes of individuals.

17.
REGANOR     
With >1,000 prokaryotic genome sequencing projects ongoing or already finished, comprehensive comparative analysis of the gene content of these genomes has become viable. To allow for a meaningful comparative analysis, gene prediction of the various genomes should be as accurate as possible. It is clear that improving the state of genome annotation requires automated gene identification methods to cope with the influence of artifacts, such as genomic GC content. There is currently still room for improvement in the state of annotations. We present a web server and a database of high-quality gene predictions. The web server is a resource for gene identification in prokaryote genome sequences. It implements our previously described, accurate gene finding method REGANOR. We also provide novel gene predictions for 241 complete, or almost complete, prokaryotic genomes. We demonstrate how this resource can easily be utilised to identify promising candidates for currently missing genes from genome annotations with several examples. All data sets are available online. AVAILABILITY: The gene finding server is accessible via https://www.cebitec.uni-bielefeld.de/groups/brf/software/reganor/cgi-bin/reganor_upload.cgi. The server software is available with the GenDB genome annotation system (version 2.2.1 onwards) under the GNU general public license. The software can be downloaded from https://sourceforge.net/projects/gendb/. More information on installing GenDB and REGANOR and the system requirements can be found on the GenDB project page http://www.cebitec.uni-bielefeld.de/groups/brf/software/wiki/GenDBWiki/AdministratorDocumentation/GenDBInstallation

18.
The methylation pattern of the germ line-transmitted Moloney leukemia proviral genome was analyzed in DNA of sperm, of day-12 and day-17 embryos, and of adult mice from six different Mov substrains. At day 12 of gestation, all 50 testable CpG sites in the individual viral genomes as well as sites in flanking host sequences were highly methylated. Some sites were unmethylated in sperm, indicating de novo methylation of unique DNA sequences during normal mouse development. At subsequent stages of development, specific CpG sites which were localized exclusively in the 5' and 3' enhancer regions of the long terminal repeat became progressively demethylated in all six proviruses. The extent of enhancer demethylation, however, was tissue specific and strongly affected by the chromosomal position of the respective proviral genome. This position-dependent demethylation of enhancer sequences was not accompanied by a similar change within the flanking host sequences, which remained virtually unchanged. Our results indicate that viral enhancer sequences, but not other sequences in the M-MuLV genome, may have an intrinsic ability to interact with cellular proteins, which can perturb the interaction of the methylase with DNA. Demethylation of enhancer sequences is not sufficient for gene expression but may be a necessary event which enables the enhancer to respond to developmental signals which ultimately lead to gene activation.

19.
A central challenge of synthetic biology is to enable the growth of living systems using parts that are not derived from nature, but designed and synthesized in the laboratory. As an initial step toward achieving this goal, we probed the ability of a collection of >10^6 de novo designed proteins to provide biological functions necessary to sustain cell growth. Our collection of proteins was drawn from a combinatorial library of 102-residue sequences, designed by binary patterning of polar and nonpolar residues to fold into stable 4-helix bundles. We probed the capacity of proteins from this library to function in vivo by testing their abilities to rescue 27 different knockout strains of Escherichia coli, each deleted for a conditionally essential gene. Four different strains (ΔserB, ΔgltA, ΔilvA, and Δfes) were rescued by specific sequences from our library. Further experiments demonstrated that a strain simultaneously deleted for all four genes was rescued by co-expression of four novel sequences. Thus, cells deleted for ~0.1% of the E. coli genome (and ~1% of the genes required for growth under nutrient-poor conditions) can be sustained by sequences designed de novo.
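
Binary patterning fixes, for each position, only whether the residue is polar or nonpolar, and lets the exact amino acid vary. A minimal sketch of sampling from such a pattern follows. The residue pools, the short example pattern, and the function names are assumptions for illustration; the abstract does not give the actual 102-residue pattern or the residue sets used.

```python
import random

# Assumed residue pools (one-letter codes); not taken from the abstract.
POLAR = "DEHKNQ"
NONPOLAR = "FILMV"
POOLS = {"P": POLAR, "N": NONPOLAR}


def sample_sequence(pattern, rng):
    """Fill each 'P' with a random polar residue and each 'N' with a nonpolar one."""
    return "".join(rng.choice(POOLS[symbol]) for symbol in pattern)


def library_size(pattern):
    """Number of distinct sequences consistent with the pattern."""
    size = 1
    for symbol in pattern:
        size *= len(POOLS[symbol])
    return size


if __name__ == "__main__":
    # Hypothetical 7-position pattern, far shorter than a real design.
    example_pattern = "PNPPNNP"
    print(sample_sequence(example_pattern, random.Random(0)))
    print(library_size(example_pattern))
```

Even a short pattern admits a large number of sequences, which is why a combinatorial library built this way can be sampled rather than enumerated.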

20.
Completion of the human genome reference (HGR) marked the beginning of the genomics era, yet despite its utility, its universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads failing to map within the HGR. Sequences failing to map generally represent 2–5% of total reads, which may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual and then combines assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence unincorporated in GRCh38. Of these, 30,879 contigs are represented in multiple individuals, with ~40% showing high sequence complexity. Genomic coordinates were generated for 99.9%, with 52.5% exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments, and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR, but more importantly our data highlight that with this method low coverage (~10–20×) next-generation sequencing can still be used to identify novel unmapped sequences to explore biological functions contributing to human phenotypic variation, disease and functionality for personal genomic medicine.
