Similar articles
 20 similar articles found (search time: 31 ms)
1.
Transposable elements (TEs) have been identified in every organism in which they have been sought. The sequencing of large genomes, such as the human genome and those of Drosophila, Arabidopsis, and Caenorhabditis, has also shown that they are a major constituent of these genomes, accounting for 15% of the genome of Drosophila, 45% of the human genome, and more than 70% in some plants and amphibians. Compared with the 1% of genomic DNA dedicated to protein-coding sequences in the human genome, this has prompted various researchers to suggest that the TEs and the other repetitive sequences that constitute the so-called "noncoding DNA" are where the most stimulating discoveries will be made in the future (Bromham, 2002). We are therefore getting further and further from the original idea that this DNA was simply "junk DNA" that owed its presence in the genome entirely to its capacity for selfish transposition. Our understanding of the structures of TEs, their distribution along the genomes, their sequence and insertion polymorphisms within genomes and within and between populations and species, their impact on genes and on the regulatory mechanisms of genetic expression, their effects on exon shuffling and other phenomena that reshape the genome, and their impact on genome size has increased dramatically in recent years. This leads to a more general picture of the impact of TEs on genomes, though many copies are still mainly selfish or junk DNA. In this review we focus mainly on discoveries made in Drosophila, but we also use information about other genomes when this helps to elucidate the general processes involved in the organization, plasticity, and evolution of genomes.

2.
Grover D, Kannan K, Brahmachari SK, Mukerji M. Genetica 2005, 124(2-3): 273-289
Elucidation of the complete nucleotide sequence of the human genome has revealed that coding sequences, which store the information needed to synthesize functional proteins, occupy only 2% of the genome. The remaining 98%, barring a few regulatory sequences, has been referred to as non-functional or junk DNA and consists of many kinds of repeat elements. In fact, the human genome is the most repeat-rich genome sequenced so far, with more than half of it occupied by such sequences. Determining the significance of these repeats in the human genome has become the focus of many studies all over the world, especially after genome sequencing did not reveal any significant difference in coding regions between lower eukaryotes and humans. In this article, we focus on Alu repeats, which are primate-specific elements with many interesting biological properties. Moreover, these are the repeats with the highest copy number in the human genome. We highlight different facets of their interaction with the genome and changing paradigms regarding their role in genome organization.

3.
4.
In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genomewide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes. (1) We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change. (2) We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes. (3) We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based on genomewide conservation patterns of known motifs, we developed three conservation criteria that we used to discover novel motifs. We used an enumeration approach to select strongly conserved motif cores, which we extended and collapsed into a small number of candidate regulatory motifs. These include most previously known regulatory motifs as well as several noteworthy novel motifs. The majority of discovered motifs are enriched in functionally related genes, allowing us to infer a candidate function for novel motifs. Our results demonstrate the power of comparative genomics to further our understanding of any species. Our methods are validated by the extensive experimental knowledge in yeast and will be invaluable in the study of complex genomes like that of the human.
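The idea behind point (2), that real protein-coding genes tend to keep their reading frame across related species, can be illustrated with a small sketch. This is a simplified toy version written for illustration, not the authors' actual test: the function name `reading_frame_conservation`, the use of `-` as the gap character, and the scoring rule (best of three frame offsets over gap-free columns) are all assumptions.

```python
def reading_frame_conservation(aligned_ref: str, aligned_other: str) -> float:
    """Fraction of aligned nucleotides that stay in the same codon frame.

    Both inputs are rows of one pairwise alignment (equal length, '-' = gap).
    Hypothetical simplification: the second sequence is tried at frame
    offsets 0, 1 and 2, and the best-scoring offset is reported.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")

    in_frame = [0, 0, 0]  # in-frame column counts for each offset
    aligned = 0           # columns where neither sequence has a gap
    i = j = 0             # ungapped positions in ref / other

    for a, b in zip(aligned_ref, aligned_other):
        if a != "-" and b != "-":
            aligned += 1
            for offset in range(3):
                if i % 3 == (j + offset) % 3:
                    in_frame[offset] += 1
        if a != "-":
            i += 1
        if b != "-":
            j += 1

    return max(in_frame) / aligned if aligned else 0.0


if __name__ == "__main__":
    # A 3-base gap keeps the frame; a 1-base gap shifts everything after it.
    print(reading_frame_conservation("ATGAAACCCGGG", "ATG---CCCGGG"))
    print(reading_frame_conservation("ATGAAACCCGGG", "ATGAAA-CCGGG"))
```

Gaps whose lengths are multiples of three leave the score high, while frame-shifting gaps lower it, which is the kind of signal a conservation-based gene test could threshold on.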

5.
With the development of genome sequencing, more whole genomes of microorganisms have been completed, and many methods have been introduced to reconstruct the phylogenetic tree of those microorganisms with information extracted from the whole genomes, through various ways of transforming or mapping the whole genome sequences into other forms that can describe evolutionary distance in a new way. We think it possible that there exists information buried in the whole genome and transferred along lineages, which remains stable and is more essential than the sequence conservation of individual genes or the arrangement of some genes of a selected set. We need to find one measurement that can involve as many phylogenetic features as possible that are beyond the genome sequence itself. We converted each genome sequence of the microorganisms into another linear sequence to represent the functional structure of the sequence, used a new information function to calculate the discrepancy between sequences and obtain a distance matrix of the genomes, and built a phylogenetic tree with a neighbor-joining method. The resulting tree shows that the major lineages are consistent with the result based on their 16S rRNA sequences. Our method discovered one phylogenetic feature derived from the genome sequences and the encoded genes that can rebuild the phylogenetic tree correctly. The mapping of one genome sequence to its new form representing the relative positions of the functional genes provides a new way to measure phylogenetic relationships, and with a more specific classification of gene functions the result could be more sensitive.

6.
Although sequences containing regulatory elements located close to protein-coding genes are often only weakly conserved during evolution, comparisons of rodent genomes have implied that these sequences are subject to some selective constraints. Evolutionary conservation is particularly apparent upstream of coding sequences and in first introns, regions that are enriched for regulatory elements. By comparing the human and chimpanzee genomes, we show here that there is almost no evidence for conservation in these regions in hominids. Furthermore, we show that gene expression is diverging more rapidly in hominids than in murids per unit of neutral sequence divergence. By combining data on polymorphism levels in human noncoding DNA and the corresponding human–chimpanzee divergence, we show that the proportion of adaptive substitutions in these regions in hominids is very low. It therefore seems likely that the lack of conservation and increased rate of gene expression divergence are caused by a reduction in the effectiveness of natural selection against deleterious mutations because of the low effective population sizes of hominids. This has resulted in the accumulation of a large number of deleterious mutations in sequences containing gene control elements and hence a widespread degradation of the genome during the evolution of humans and chimpanzees.
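Combining polymorphism and divergence counts to estimate a proportion of adaptive substitutions is commonly done with a McDonald-Kreitman-style estimator, alpha = 1 - (D_neu x P_sel) / (D_sel x P_neu). The sketch below shows that arithmetic only. Whether the abstract's authors used exactly this form is an assumption, and the function name, parameter names, and counts are hypothetical illustration values, not data from the paper.

```python
def adaptive_fraction(d_sel: float, d_neu: float,
                      p_sel: float, p_neu: float) -> float:
    """McDonald-Kreitman-style estimate of the adaptive substitution fraction.

    d_sel, d_neu: between-species divergence counts in the putatively
                  selected class (e.g. regulatory regions) and in a
                  neutral reference class.
    p_sel, p_neu: within-species polymorphism counts in the same classes.

    alpha = 1 - (d_neu * p_sel) / (d_sel * p_neu)
    """
    if d_sel == 0 or p_neu == 0:
        raise ValueError("d_sel and p_neu must be non-zero")
    return 1.0 - (d_neu * p_sel) / (d_sel * p_neu)


if __name__ == "__main__":
    # Made-up counts: divergence is enriched relative to polymorphism in
    # the selected class, so alpha comes out positive.
    print(adaptive_fraction(d_sel=40, d_neu=100, p_sel=20, p_neu=100))
```

An estimate near zero, or a negative one, is what a low proportion of adaptive substitutions looks like under this estimator. Segregating slightly deleterious polymorphisms are known to bias it downward.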

7.
8.
Human genetic information is encoded in two genomes: nuclear and mitochondrial. Both reflect the molecular evolution of humans, from the beginning of life (about 4.5 billion years ago) to the origin of the species Homo sapiens about 100,000 years ago. For this reason the human genome contains some features that are common to different groups of organisms and some that are unique to Homo sapiens. The 3.2 × 10^9 base pairs of the human nuclear genome are packed into 23 chromosomes of different sizes. The smallest, chromosome 21, contains 5 × 10^7 base pairs, while the largest, chromosome 1, contains 2.63 × 10^8 base pairs. Although the nucleotide sequence of all chromosomes has been established, the organisation of the nuclear genome still raises questions; for example, the exact number of genes encoded by the human genome is still unknown, with estimates ranging from 30 to 150 thousand genes. Coding sequences represent a few percent of the human nuclear genome. The majority of the genome consists of repetitive sequences (about 50%) and noncoding unique sequences. This part of the genome is frequently, and wrongly, called "junk DNA". The distribution of genes on chromosomes is irregular: DNA fragments with a low percentage of GC pairs encode fewer genes than fragments with a high percentage of GC pairs.

9.
We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The sequence element was usually found once or twice in a gene, either in an intron or in the 5' or 3' flanking regions. It did not share any similarities with known short interspersed nucleotide elements (SINEs) or presently known gene regulatory elements. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. It is possible that the element may have a cis-acting functional role in the genome.

10.
11.
12.
One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans, as is evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved “chunks.” Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.

13.
Genes that show complex tissue-specific and temporal control by regulatory elements located outside their promoters present a considerable challenge to identify the sequences involved. The rapid accumulation of genomic sequence information for a number of species has enabled a comparative phylogenetic approach to find important regulatory elements. For some genes, which show a similar pattern of expression in humans and rodents, genomic sequence information for these two species may be sufficient. Others, such as the cystic fibrosis transmembrane conductance regulator (CFTR) gene, show significant divergence in expression patterns between mouse and human, necessitating phylogenetic approaches involving additional species. The ovine CFTR gene has a temporal and spatial expression pattern that is very similar to that of human CFTR. Comparative genomic sequence analysis of ovine and human CFTR identified high levels of homology between the core elements in several potential regulatory elements defined as DNase I hypersensitive sites in human CFTR. These data provide a case for the power of an artiodactyl genome to contribute to the understanding of human genetic disease.

14.
15.
16.
17.
18.
Evolutionarily conserved non-coding genomic sequences represent a potentially rich source for the discovery of gene regulatory regions. Since these elements are subject to stabilizing selection they evolve much more slowly than adjacent non-functional DNA. These so-called phylogenetic footprints can be detected by comparison of the sequences surrounding orthologous genes in different species. Therefore the loss of phylogenetic footprints as well as the acquisition of conserved non-coding sequences in some lineages, but not in others, can provide evidence for the evolutionary modification of cis-regulatory elements. We introduce here a statistical model of footprint evolution that allows us to estimate the loss of sequence conservation that can be attributed to gene loss and other structural reasons. This approach to studying the pattern of cis-regulatory element evolution, however, requires the comparison of relatively long sequences from many species. We have therefore developed an efficient software tool for the identification of corresponding footprints in long sequences from multiple species. We apply this novel method to the published sequences of HoxA clusters of shark, human, and the duplicated zebrafish and Takifugu clusters as well as the published HoxB cluster sequences. We find that there is a massive loss of sequence conservation in the intergenic region of the HoxA clusters, consistent with the finding in [Chiu et al., PNAS 99 (2002) 5492]. The loss of conservation after cluster duplication is more extensive than expected from structural reasons. This suggests that binding site turnover and/or adaptive modification may also contribute to the loss of sequence conservation.

19.
20.

Background  

Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of the public EST sequences, trace or quality files are lacking, which makes detection of reliable SNPs even more difficult because it has to rely on sequence comparisons only.
