Similar literature
20 similar articles found.
1.
There is mounting evidence for the correlation between the gene expression pattern and sequence divergence. However, little is known about the relationship between the gene expression pattern and polymorphism. We compiled the gene expression, polymorphism, and divergence data from the public databases of the human genome. The ratios of nonsynonymous (A) to synonymous (S) substitutions in polymorphism and divergence in the human genome were strongly influenced by the expression pattern and breadth of genes and showed strong correlations. Among the tissues we analyzed, the brain-expressed genes have the smallest and the liver-expressed genes have the largest proportion of amino acid changes both in polymorphism and divergence. The analysis implies that negative selection is the primary factor affecting expression-dependent gene evolution and the prevalent but nonuniform distribution of slightly deleterious mutations in the genome. Although the genes under relaxed negative selection evolved faster than the other genes, these genes are even more liable to slightly deleterious mutations in the population. On the other hand, nonneutral mutations in the highly conserved genes, such as brain-expressed and housekeeping genes, are largely deleterious and eliminated before they enter the population.

2.
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease.

3.

Background

Understanding how DNA sequence polymorphism relates to variation in gene expression is essential to connecting genotypic differences with phenotypic differences among individuals. Addressing this question requires linking population genomic data with gene expression variation.

Results

Using whole genome expression data and recent light shotgun genome sequencing of six Drosophila simulans genotypes, we assessed the relationship between expression variation in males and females and nucleotide polymorphism across thousands of loci. By examining sequence polymorphism in gene features, such as untranslated regions and introns, we find that genes showing greater variation in gene expression between genotypes also have higher levels of sequence polymorphism in many gene features. Accordingly, X-linked genes, which have lower sequence polymorphism levels than autosomal genes, also show less expression variation than autosomal genes. We also find that sex-specifically expressed genes show higher local levels of polymorphism and divergence than both sex-biased and unbiased genes, and that they appear to have simpler regulatory regions.

Conclusion

The gene-feature-based analyses and the X-to-autosome comparisons suggest that sequence polymorphism in cis-acting elements is an important determinant of expression variation. However, this relationship varies among the different categories of sex-biased expression, and trans factors might contribute more to male-specific gene expression than cis effects. Our analysis of sex-specific gene expression also shows that female-specific genes have been overlooked in analyses that only point to male-biased genes as having unusual patterns of evolution and that studies of sexually dimorphic traits need to recognize that the relationship between genetic and expression variation at these traits is different from the genome as a whole.

4.
The current pace of the generation of sequence data requires the development of software tools that can rapidly provide full annotation of the data. We have developed a new method for rapid sequence comparison using the exact match algorithm without repeat masking. As a demonstration, we have identified all perfect simple tandem repeats (STRs) within the draft sequence of the human genome. The STR elements (chromosome, position, length and repeat subunit) have been placed into a relational database. Repeat flanking sequence is also publicly accessible at http://grid.abcc.ncifcrf.gov. To illustrate the utility of this complete set of STR elements, we documented the increased density of potentially polymorphic markers throughout the genome. The new STR markers may be useful in disease association studies because so many STR elements manifest multiallelic polymorphism. Also, because triplet repeat expansions are important for human disease etiology, we identified trinucleotide repeats that exist within exons of known genes. This resulted in a list that includes all 14 genes known to undergo polynucleotide expansion, and 48 additional candidates. Several of these are non-polyglutamine triplet repeats. Other examinations of the STR database demonstrated repeats spanning splice junctions and identified SNPs within repeat elements.
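The abstract above does not spell out its exact-match algorithm, so the following is only a minimal sketch of what detecting perfect STRs by exact string comparison could look like. The function name `find_perfect_strs` and the parameters `max_unit` and `min_copies` are hypothetical choices made for illustration, and the naive scan shown here is not the authors' method.

```python
def find_perfect_strs(seq, max_unit=6, min_copies=3):
    """Return (position, length, repeat_unit) for each perfect tandem repeat.

    Illustrative assumptions: repeat units of 1..max_unit bases, and at
    least min_copies exact consecutive copies. Positions are 0-based.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this size
            copies = 1
            # Count exact, back-to-back copies of the candidate unit.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            length = copies * k
            # Keep the longest run; ties go to the shorter unit tried first.
            if copies >= min_copies and (best is None or length > best[1]):
                best = (i, length, unit)
        if best:
            hits.append(best)
            i += best[1]  # resume scanning after the repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(find_perfect_strs("GGCACACACATTCAGCAGCAGCAGTT"))
    # [(2, 8, 'CA'), (12, 12, 'CAG')]
```

Each tuple mirrors the position, length, and repeat subunit fields mentioned in the abstract; a chromosome column would be added by whatever code iterates over the sequences.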

5.
6.
7.
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).

8.
9.
Human gene catalogs are fundamental to the study of human biology and medicine. But they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. There has been much research on how polymorphism affects previously identified genes, but no research has been done on how it affects gene identification itself. We computationally predict protein-coding genes in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome. We systematically test the effect of known polymorphisms with this procedure. Polymorphisms can not only disrupt ORFs, they can also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference, whose protein-coding status is supported by homology to known proteins. On average 10% of these genes are located in the genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.
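The idea that a single polymorphism can create a long ORF absent from the reference can be illustrated with a toy example. This is a hedged sketch only: the helper names `longest_orf` and `apply_snp`, the forward-strand-only scan, and the made-up sequence are all assumptions for illustration, and the abstract does not give the authors' actual length threshold or pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def longest_orf(mrna):
    """Return (start, end) of the longest ATG..stop ORF, or None.

    Illustrative assumption: only the three forward reading frames are
    scanned, since the input is taken to be an mRNA sequence.
    """
    mrna = mrna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # close the ORF and look for the next ATG
    return best


def apply_snp(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


if __name__ == "__main__":
    reference = "ATGAAATAGCCCGGGTTTTGA"       # hypothetical toy transcript
    variant = apply_snp(reference, 6, "C")    # TAG -> CAG removes an early stop
    print(longest_orf(reference))  # (0, 9)
    print(longest_orf(variant))    # (0, 21)
```

In this toy case the variant allele removes a premature stop codon, so the longest ORF grows from 9 to 21 bases; the reverse substitution would disrupt the longer ORF.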

10.
Human mitochondrial DNA (mtDNA) is a nonrecombining genome that codes for 13 subunits of the mitochondrial oxidative phosphorylation system, 2 rRNAs, and 22 tRNAs. Mutations have accumulated sequentially in mtDNA lineages that diverged tens of thousands of years ago. The genes in mtDNA are subject to different functional constraints and are therefore expected to evolve at different rates, but the rank order of these rates should be the same in all lineages of a phylogeny. Previous studies have indicated, however, that specific regions of mtDNA may have experienced different histories of selection in different lineages, possibly because of lineage-specific interactions or environmental factors such as climate. We report here on a survey for lineage-specific patterns of nucleotide polymorphism in human mtDNA. We calculated molecular polymorphism indices and neutrality tests for classes of functional sites and genes in 837 human mtDNA sequences, compared the results between continent-specific mtDNA lineages, and used two sliding window methods to identify differences in the patterns of polymorphism between haplogroups. A general correlation between nucleotide position and the level of nucleotide polymorphism was identified in the coding region of the mitochondrial genome. Nucleotide diversity in the protein-coding sequence of mtDNA was generally not much higher than that found for many genes in nuclear DNA. A comparison of nonsynonymous/synonymous rate ratios in the 13 protein-coding genes suggested differences in the relative levels of selection between haplogroups, including the European haplogroup clusters. Interestingly, a segment of the MTND5 gene was found to be almost void of segregating sites and nonsynonymous mutations in haplogroup J, which has been associated with susceptibility to certain complex diseases. 
Our results suggest that there are haplogroup-specific differences in the intensity of selection against particular regions of the mitochondrial genome, indicating that some mutations may be non-neutral within specific phylogenetic lineages but neutral within others.
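The abstract above mentions sliding window methods without naming them, so the following is only a generic sketch of one common approach: computing nucleotide diversity (the mean number of pairwise differences per site) in windows along a set of aligned sequences. The function names, the window and step parameters, and the toy alignment are assumptions for illustration, not the study's procedure.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    pairs = list(combinations(seqs, 2))
    n_sites = len(seqs[0])
    diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in pairs
    )
    return diffs / (len(pairs) * n_sites)


def sliding_diversity(seqs, window, step):
    """Return [(window_start, diversity)] for windows along the alignment."""
    n_sites = len(seqs[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, n_sites - window + 1, step)
    ]


if __name__ == "__main__":
    alignment = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]  # hypothetical toy data
    for start, pi in sliding_diversity(alignment, window=4, step=4):
        print(start, round(pi, 4))
```

Comparing such window profiles between groups of sequences (for example, between haplogroups) is one way to look for regions with unusually low or high polymorphism; gaps and ambiguous bases are ignored here for brevity.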

11.
Microsatellites, or tandem simple sequence repeats (SSRs), have become one of the most popular molecular markers in genome mapping because of their abundance across genomes and because of their high levels of polymorphism. However, information on which genes surround or flank them has remained very limited for most SSRs, especially in livestock species. In this study, an in silico comparative mapping approach was developed to link porcine SSRs to known genome regions by identifying their human orthologs. From a total of 1321 porcine microsatellites used in this study, 228 were found to have blocks in alignment with human genomic sequences. These 228 SSRs span about 1459 cM of the porcine genome, but with uneven distributions, ranging from 2 on SSC12 to 24 on SSC14. Linking these porcine SSRs to the known genome regions in the human genome also revealed 16 new putative synteny groups between these two species. Fifteen SSRs on SSC3 with identified human orthologs were typed on a pig-hamster radiation hybrid (RH) panel and used in a joint analysis with 80 known gene markers previously mapped on SSC3 using the same panel. The analysis revealed that they were all highly linked to either one or both adjacent markers. These results indicated that assigning the porcine SSRs to known genome regions by identifying their human orthologs is a reliable approach. The process will provide a foundation for positional cloning of causative genes for economically important traits.

12.
Fly, 2013, 7(3): 129-132
Upon completion of sequencing the Drosophila genome, it was estimated that 61% of human disease-associated genes had sequence homologs in flies, and in some diseases such as cancer, the number was as high as 68%. We now know that as many as 75% of the genes associated with genetic disease have counterparts in Drosophila. Using better tools for mutation detection, association studies and whole genome analysis, the number of human genes associated with genetic disease is steadily increasing. These detection efforts are outpacing the ability to assign function and understand the underlying cause of the disease at the molecular level. Drosophila models can therefore advance human disease research in a number of ways by: establishing the normal role of these gene products during development, elucidating the mechanism underlying disease pathology, and even identifying candidate therapeutic agents for the treatment of human disease.

At the 49th Annual Drosophila Research Conference in San Diego this year, a number of labs presented their exciting findings on Drosophila models of human disease in both platform presentations and poster sessions. Here we can only briefly review some of these developments, and we apologize that we do not have the time or space to review all of the findings presented which use Drosophila to understand human disease etiology.

13.
Pfleger CM, Reiter LT. Fly, 2008, 2(3): 129-132
Upon completion of sequencing the Drosophila genome, it was estimated that 61% of human disease-associated genes had sequence homologs in flies, and in some diseases such as cancer, the number was as high as 68%. We now know that as many as 75% of the genes associated with genetic disease have counterparts in Drosophila. Using better tools for mutation detection, association studies and whole genome analysis, the number of human genes associated with genetic disease is steadily increasing. These detection efforts are outpacing the ability to assign function and understand the underlying cause of the disease at the molecular level. Drosophila models can therefore advance human disease research in a number of ways by: establishing the normal role of these gene products during development, elucidating the mechanism underlying disease pathology, and even identifying candidate therapeutic agents for the treatment of human disease. At the 49th Annual Drosophila Research Conference in San Diego this year, a number of labs presented their exciting findings on Drosophila models of human disease in both platform presentations and poster sessions. Here we can only briefly review some of these developments, and we apologize that we do not have the time or space to review all of the findings presented which use Drosophila to understand human disease etiology.

14.
15.
Approximately 50% of the predicted protein-coding genes of the Trypanosoma cruzi CL Brener strain are annotated as hypothetical or conserved hypothetical proteins. To further characterize these genes, we generated 1161 open-reading frame expressed sequence tags (ORESTES) from the mammalian stages of the VL10 human strain. Sequence clustering resulted in 435 clusters, consisting of 339 singletons and 96 contigs. Significant matches to the T. cruzi predicted gene database were found for ~94% of contigs and ~69% of singletons. These included genes encoding surface proteins, known to be intensely expressed in the parasite mammalian stages and implicated in host cell invasion and/or immune evasion mechanisms. Among 151 contigs and singletons with similarity to predicted hypothetical protein-coding genes and conserved hypothetical protein-coding genes, 83% showed no match with T. cruzi EST and/or proteome databases. These ORESTES are the first experimental evidence that the corresponding genes are in fact transcribed. Sequences with no significant match were searched against several T. cruzi and National Center for Biotechnology Information non-redundant sequence databases. The ORESTES analysis indicated that 124 predicted conserved hypothetical protein-coding genes and 27 predicted hypothetical protein-coding genes annotated in the CL Brener genome are transcribed in the VL10 mammalian stages. Six ORESTES annotated as hypothetical protein-coding genes showing no match to EST and/or proteome databases were confirmed by Northern blot in VL10. The generation of this set of ORESTES complements the T. cruzi genome annotation and suggests new stage-regulated genes encoding hypothetical proteins.

16.
Identifying genomic locations that have experienced selective sweeps is an important first step toward understanding the molecular basis of adaptive evolution. Using statistical methods that account for the confounding effects of population demography, recombination rate variation, and single-nucleotide polymorphism ascertainment, while also providing fine-scale estimates of the position of the selected site, we analyzed a genomic dataset of 1.2 million human single-nucleotide polymorphisms genotyped in African-American, European-American, and Chinese samples. We identify 101 regions of the human genome with very strong evidence (p < 10⁻⁵) of a recent selective sweep and where our estimate of the position of the selective sweep falls within 100 kb of a known gene. Within these regions, genes of biological interest include genes in pigmentation pathways, components of the dystrophin protein complex, clusters of olfactory receptors, genes involved in nervous system development and function, immune system genes, and heat shock genes. We also observe consistent evidence of selective sweeps in centromeric regions. In general, we find that recent adaptation is strikingly pervasive in the human genome, with as much as 10% of the genome affected by linkage to a selective sweep.

17.
18.
19.
We present a method for rapid isolation of flanking regions from amplified fragment length polymorphism (AFLP) fragments based on thermal asymmetric interlaced (TAIL)-PCR, in which one sequence-specific primer and one degenerate primer derived from a conserved motif found in homologs of the known sequence were used. The final result showed this to be a simple and efficient strategy, especially for short known sequences containing coding regions. Moreover, this protocol was especially useful for species with little available genome information, such as Hong Kong kumquat (Fortunella hindsii), since most of their genes have known homologs in other species such as Arabidopsis and rice.

20.
Application of single nucleotide polymorphisms (SNPs) is revolutionizing human bio-medical research. However, discovery of polymorphisms in low polymorphic species is still a challenging and costly endeavor, despite widespread availability of Sanger sequencing technology. We present CRoPS as a novel approach for polymorphism discovery by combining the power of reproducible genome complexity reduction of AFLP with Genome Sequencer (GS) 20/GS FLX next-generation sequencing technology. With CRoPS, hundreds of thousands of sequence reads derived from complexity-reduced genome sequences of two or more samples are processed and mined for SNPs using a fully automated bioinformatics pipeline. We show that over 75% of putative maize SNPs discovered using CRoPS are successfully converted to SNPWave assays, confirming them to be true SNPs derived from unique (single-copy) genome sequences. By using CRoPS, polymorphism discovery will become affordable in organisms with high levels of repetitive DNA in the genome and/or low levels of polymorphism in the (breeding) germplasm without the need for prior sequence information.
