Similar articles
 20 similar articles found (search time: 15 ms)
1.
Reconstructing human origins in the genomic era   Total citations: 7, self-citations: 0, citations by others: 7
Analyses of recently acquired genomic sequence data are leading to important insights into the early evolution of anatomically modern humans, as well as into the more recent demographic processes that accompanied the global radiation of Homo sapiens. Some of the new results contradict early, but still influential, conclusions that were based on analyses of gene trees from mitochondrial DNA and Y-chromosome sequences. In this review, we discuss the different genetic and statistical methods that are available for studying human population history, and identify the most plausible models of human evolution that can accommodate the contrasting patterns observed at different loci throughout the genome.

2.
Technologies for the study of gene and protein expression in Plasmodium   Total citations: 1, self-citations: 0, citations by others: 1
With the imminent completion of the genome sequences of several species of Plasmodium, attention is now turning to the exploitation of these genomic sequence data for vaccine, drug and diagnostic development. Several technologies have been developed over the past decade to assist in the determination of gene and protein expression on a global scale. Of these, DNA microarrays, novel high-throughput proteomic technologies and recombinational cloning technologies are lowering the barrier to functional genomic studies in Plasmodium. Of equal importance is the capacity to manipulate, store, retrieve and analyse the tremendous quantity of data generated from these genomic studies. This paper will address the use of these technologies as well as some of the computational tools that will be ultimately required to adequately study gene and protein expression in Plasmodium.

3.
Dinucleotide usage is known to vary in the genomes of organisms. The dinucleotide usage profiles or genome signatures are similar for sequence samples taken from the same genome, but are different for taxonomically distant species. This concept of genome signatures has been used to study several organisms including viruses, to elucidate the signatures of evolutionary processes at the genome level. Genome signatures assume greater importance in the case of host–pathogen interactions, where molecular interactions between the two species take place continuously, and can influence their genomic composition. In this study, analyses of whole genome sequences of HIV-1 subtype B, a retrovirus that caused the global AIDS pandemic, have been carried out to analyse the variation in genome signatures of the virus from 1983 to 2007. We show statistically significant temporal variations in some dinucleotide patterns highlighting the selective evolution of the dinucleotide profiles of HIV-1 subtype B, possibly a consequence of host specific selection.
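As a rough illustration of what a dinucleotide usage profile can look like in practice, the sketch below computes, for each of the 16 dinucleotides, the ratio of its observed frequency to the product of its two mononucleotide frequencies. This is a simplified single-strand odds ratio chosen for illustration; the abstract does not state the exact formula the study used, and the function name `dinucleotide_signature` is hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_signature(seq):
    """Return {dinucleotide: f_xy / (f_x * f_y)} for a DNA sequence.

    Assumption: a plain single-strand odds ratio; published genome
    signatures often use strand-symmetrized frequencies instead.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    # Overlapping dinucleotides: positions 0-1, 1-2, 2-3, ...
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = len(seq) - 1
    signature = {}
    for x, y in product("ACGT", repeat=2):
        fx = mono[x] / n_mono
        fy = mono[y] / n_mono
        fxy = di[x + y] / n_di
        # A base absent from the sequence gives an undefined ratio; report 0.0.
        signature[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return signature
```

Values near 1 suggest a dinucleotide occurs about as often as its base composition predicts; values well below or above 1 suggest under- or over-representation.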

4.
ABSTRACT: BACKGROUND: Combinations of histone variants and modifications, conceptually representing a histone code, have been proposed to play a significant role in gene regulation and developmental processes in complex organisms. While various mechanisms have been implicated in establishing and maintaining epigenetic patterns at specific locations in the genome, they are generally believed to be independent of primary DNA sequence on a more global scale. RESULTS: To address this systematically in the case of the human genome, we have analyzed primary DNA sequences underlying 19 different methylated histones in human primary T-cells. We report that sequence alone can accurately predict the location of most of these histone marks genome-wide in this cell type. Furthermore, the sequence features responsible for such predictions are distinct for different groups of histone marks. CONCLUSIONS: These findings support the existence of a genomic code for histone modification associated with gene expression and chromatin programming, and they suggest that the mechanisms responsible for global histone modifications may interpret genomic sequence in various ways.

5.
GENOME: a rapid coalescent-based whole genome simulator   Total citations: 1, self-citations: 0, citations by others: 1
Summary: GENOME proposes a rapid coalescent-based approach to simulate whole genome data. In addition to features of standard coalescent simulators, the program allows for recombination rates to vary along the genome and for flexible population histories. Within small regions, we have evaluated samples simulated by GENOME to verify that GENOME provides the expected LD patterns and frequency spectra. The program can be used to study the sampling properties of any statistic for a whole genome study. Availability: The program and C++ source code are available online at http://www.sph.umich.edu/csg/liang/genome/ Contact: lianglim{at}umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Martin Bishop

6.

Background  

Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program.

7.
Despite the current wealth of sequencing data, one‐third of all biochemically characterized metabolic enzymes lack a corresponding gene or protein sequence, and as such can be considered orphan enzymes. They represent a major gap between our molecular and biochemical knowledge, and consequently are not amenable to modern systemic analyses. As 555 of these orphan enzymes have metabolic pathway neighbours, we developed a global framework that utilizes the pathway and (meta)genomic neighbour information to assign candidate sequences to orphan enzymes. For 131 orphan enzymes (37% of those for which (meta)genomic neighbours are available), we associate sequences to them using scoring parameters with an estimated accuracy of 70%, implying functional annotation of 16 345 gene sequences in numerous (meta)genomes. As a case in point, two of these candidate sequences were experimentally validated to encode the predicted activity. In addition, we augmented the currently available genome‐scale metabolic models with these new sequence–function associations and were able to expand the models by on average 8%, with a considerable change in the flux connectivity patterns and improved essentiality prediction.

8.
9.
AMIGene: Annotation of MIcrobial Genes   Total citations: 11, self-citations: 0, citations by others: 11
AMIGene (Annotation of MIcrobial Genes) is an application for automatically identifying the most likely coding sequences (CDSs) in a large contig or a complete bacterial genome sequence. The first step in AMIGene is dedicated to the construction of Markov models that fit the input genomic data (i.e. the gene model), followed by the combination of well-known gene-finding methods and a heuristic approach for the selection of the most likely CDSs. The web interface allows the user to select one or several gene models applied to the analysis of the input sequence by the AMIGene program and to visualize the list of predicted CDSs graphically and in a downloadable text format. The AMIGene web site is accessible at the following address: http://www.genoscope.cns.fr/agc/tools/amigene/index.html (Contact: sbocs@genoscope.cns.fr).

10.
With the rapid increase in production of genetic data from new sequencing technologies, a myriad of new ways to study genomic patterns in nonmodel organisms are currently possible. Because genome assembly still remains a complicated procedure, and because the functional role of much of the genome is unclear, focusing on SNP genotyping from expressed sequences provides a cost‐effective way to reduce complexity while still retaining functionally relevant information. This review summarizes current methods, identifies ways that using expressed sequence data benefits population genomic inference and explores how current practitioners evaluate and overcome challenges that are commonly encountered. We focus particularly on the additional power of functional analysis provided by expressed sequence data and how these analyses push beyond allele pattern data available from nonfunction genomic approaches. The massive data sets generated by these approaches create opportunities and problems as well – especially false positives. We discuss methods available to validate results from expressed SNP genotyping assays, new approaches that sidestep use of mRNA and review follow‐up experiments that can focus on evolutionary mechanisms acting across the genome.

11.
Functional genomic approaches, such as proteomics, greatly enhance the value of genome sequences by providing a global level assessment of which genes are expressed, when genes are expressed and at what cellular levels gene products are synthesized. With over 1000 complete genome sequences of different microorganisms available, and DNA sequencing for environmental samples (metagenomes) producing vast amounts of gene sequence data, there is a real opportunity and a clear need to generate associated functional genomic data to learn about the source microorganisms. In contrast to the technological advances that have led to the accelerated rate and ease at which DNA sequence data can be generated, mass spectrometry based proteomics remains a technically sophisticated and exacting science. In recognition of the need to make proteomics more accessible to a growing number of environmental microbiologists so that the 'functional genomics gap' may be bridged, this review strives to demystify proteomic technologies and describe ways in which they have been applied, and more importantly, can be applied to study the physiology and ecology of extremophiles.

12.
SUMMARY: Although whole-genome sequences have been analysed for the presence of anomalous DNA, no dedicated application is currently available to analyse the composition of individual sequence entries, for instance those derived by experimental techniques, such as subtractive hybridization. Since genomic dinucleotide frequency values are conserved between related species, a representative genome sequence can often be found to score for anomalous sequence composition for many of these putative horizontally transferred sequences. We developed the application deltarho-web, which enables the determination of the differences between the dinucleotide composition of an input sequence and that of a selected genome in a size-dependent manner. A feature allowing batch comparisons is included as well. In addition, deltarho-web allows the analysis of the dinucleotide composition of complete genomes. This provides complementary information for the identification of large anomalous gene clusters.
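To make the idea of a difference between two dinucleotide compositions concrete, the sketch below takes two profiles, each already available as a mapping from dinucleotide to a relative-abundance value, and returns their mean absolute difference over the 16 dinucleotides. This is one common way to score compositional dissimilarity; whether deltarho-web computes exactly this quantity, and how it handles the size dependence mentioned in the abstract, is not stated there, so the function `signature_distance` and the example values are purely illustrative.

```python
from itertools import product

# The 16 possible dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def signature_distance(sig_a, sig_b):
    """Mean absolute difference between two dinucleotide profiles.

    Both arguments map every dinucleotide ("AA" ... "TT") to a number.
    """
    total = sum(abs(sig_a[d] - sig_b[d]) for d in DINUCLEOTIDES)
    return total / len(DINUCLEOTIDES)


# Hypothetical profiles: a flat one, and one with CG under-represented.
flat = {d: 1.0 for d in DINUCLEOTIDES}
cg_depleted = dict(flat, CG=0.2)
distance = signature_distance(flat, cg_depleted)
```

A larger distance between an input sequence and a reference genome would hint that the sequence is compositionally atypical for that genome, which is the kind of signal used when screening for putative horizontally transferred DNA.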

13.

Background

A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project.

Results

AUGUSTUS can be used as an ab initio program, that is, as a program that uses only one single genomic sequence as input information. In addition, it is able to combine information from the genomic sequence under study with external hints from various sources of information. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs AUGUSTUS predicted significantly more genes correctly than any other ab initio program. At the same time it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS could be further improved when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, was taken into account.

Conclusion

AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover it is very flexible because it can take information from several sources simultaneously into consideration.

14.
15.
MOTIVATION: One of the major features of genomic DNA sequences, distinguishing them from texts in most spoken or artificial languages, is their high repetitiveness. Variation in the repetitiveness of genomic texts reflects the presence and density of different biologically important messages. Thus, deviation from an expected number of repeats in both directions indicates a possible presence of a biological signal. Linguistic complexity corresponds to repetitiveness of a genomic text, and potential regulatory sites may be discovered through construction of typical patterns of complexity distribution. RESULTS: We developed software for fast calculation of linguistic sequence complexity of DNA sequences. Our program utilizes suffix trees to compute the number of subwords present in genomic sequences, thereby allowing calculation of linguistic complexity in time linear in genome size. The measure of linguistic complexity was applied to the complete genome of Haemophilus influenzae. Maps of complexity along the entire genome were obtained using sliding windows of 40, 100, and 2000 nucleotides. This approach provided an efficient way to detect simple sequence repeats in this genome. In addition, local profiles of complexity distribution around the starts of translation were constructed for 21 complete prokaryotic genomes. We hypothesize that complexity profiles correspond to evolutionary relationships between organisms. We found principal differences in profiles of the GC-rich and other (non-GC-rich) genomes. We also found characteristic differences in profiles of AT genomes, which probably reflect individual species variations in translational regulation. AVAILABILITY: The program is available upon request from Alexander Bolshoy or at http://csweb.haifa.ac.il/library/#complex.
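A small sketch may help show what counting subwords in a window means. The code below is a naive version, not the suffix-tree method the abstract describes: it enumerates substrings with Python sets, so its cost grows roughly quadratically with window size and it is only suitable for short windows. It assumes linguistic complexity is defined as the number of distinct subwords divided by the maximum number possible for that length and alphabet; the function names are hypothetical.

```python
def linguistic_complexity(seq, alphabet_size=4):
    """Distinct subwords observed / maximum possible, over all word lengths."""
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    observed = 0
    possible = 0
    for k in range(1, n + 1):
        # Distinct words of length k actually present in the sequence.
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        # No more than alphabet_size**k words exist, and no more than
        # n - k + 1 positions are available to hold them.
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible


def complexity_profile(seq, window, step=1):
    """Complexity of each full window of the given size along the sequence."""
    return [
        linguistic_complexity(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]
```

For example, `complexity_profile(genome, window=40, step=10)` would give a coarse map in which low values flag repetitive stretches such as simple sequence repeats; `genome` here stands for any DNA string supplied by the reader.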

16.
17.
Maize nuclear DNA sequences capable of promoting the autonomous replication of plasmids in yeast were isolated by ligating Eco RI-digested fragments into yeast vectors unable to replicate autonomously. Three such autonomously replicating sequences (ARS), representing two families of highly repeated sequences within the maize genome, were isolated and characterized. Each repetitive family shows hybridization patterns on a Southern blot characteristic of a dispersed sequence. Unlike most repetitive sequences in maize, both ARS families have a constant copy number and characteristic genomic hybridization pattern in the inbred lines examined. Larger genome clones with sequence homology to the ARS-containing elements were selected from a lambda library of maize genomic DNA. There was typically only one copy of an ARS-homologous sequence on each 12–15 kb genomic fragment.

18.
There are four sequenced and publicly available plant genomes to date. With many more slated for completion, one challenge will be to use comparative genomic methods to detect novel evolutionary patterns in plant genomes. This research requires sequence alignment algorithms to detect regions of similarity within and among genomes. However, different alignment algorithms are optimized for identifying different types of homologous sequences. This review focuses on plant genome evolution and provides a tutorial for using several sequence alignment algorithms and visualization tools to detect useful patterns of conservation: conserved non-coding sequences, false positive noise, subfunctionalization, synteny, annotation errors, inversions and local duplications. Our tutorial encourages the reader to experiment online with the reviewed tools as a companion to the text.

19.
Soybean is believed to be a diploidized tetraploid generated from an allotetraploid ancestor. In this study, we used hypomethylated genomic DNA as a source of probes to investigate the genomic structure and methylation patterns of duplicated sequences. Forty-five genomic clones from Phaseolus vulgaris and 664 genomic clones from Glycine max were used to examine the duplicated regions in the soybean genome. Southern analysis of genomic DNA using probes from both sources revealed that greater than 15% of the hypomethylated genomic regions were only present once in the soybean genome. The remaining ca. 85% of the hypomethylated regions comprise duplicated or middle repetitive DNA sequences. If only the ratio of single to duplicate probe patterns is considered, it appears that 25% of the single-copy sequences have been lost. By using a subset of probes that only detected duplicated sequences, we examined the methylation status of the homeologous genomes with the restriction enzymes MspI and HpaII. We found that in all cases both copies of these regions were hypomethylated, although there were examples of low-level methylation. It appears that duplicate sequences are being eliminated in the diploidization process. Our data reveal no evidence that duplicated sequences are being silenced by inactivation correlated with methylation patterns.

20.
Using DNA sequence data from multiple genes (often from more than one genome compartment) to reconstruct phylogenetic relationships has become routine. Augmenting this approach with genomic structural characters (e.g., intron gain and loss, changes in gene order) as these data become available from comparative studies already has provided critical insight into some long-standing questions about the evolution of land plants. Here we report on the presence of a group II intron located in the mitochondrial atp1 gene of leptosporangiate and marattioid ferns. Primary sequence data for the atp1 gene are newly reported for 27 taxa, and results are presented from maximum likelihood-based phylogenetic analyses using Bayesian inference for 34 land plants in three data sets: (1) single-gene mitochondrial atp1 (exon+intron sequences); (2) five combined genes (mitochondrial atp1 [exon only]; plastid rbcL, atpB, rps4; nuclear SSU rDNA); and (3) same five combined genes plus morphology. All our phylogenetic analyses corroborate results from previous fern studies that used plastid and nuclear sequence data: the monophyly of euphyllophytes, as well as of monilophytes; whisk ferns (Psilotidae) sister to ophioglossoid ferns (Ophioglossidae); horsetails (Equisetopsida) sister to marattioid ferns (Marattiidae), which together are sister to the monophyletic leptosporangiate ferns. In contrast to the results from the primary sequence data, the genomic structural data (atp1 intron distribution pattern) would seem to suggest that leptosporangiate and marattioid ferns are monophyletic, and together they are the sister group to horsetails--a topology that is rarely reconstructed using primary sequence data.
