Similar Articles
1.
Array-based comparative genomic hybridization (aCGH) is a molecular cytogenetic technique used to detect and map DNA copy number alterations. aCGH can interrogate the entire genome at a previously unattainable high resolution and has directly led to the recent appreciation of a novel class of genomic variation: copy number variation (CNV) in mammalian genomes. All forms of DNA variation/polymorphism are important for studying the basis of phenotypic diversity among individuals. CNV research is still in its infancy, requiring careful collation and annotation of accumulating CNV data that will undoubtedly be useful for accurate interpretation of genomic imbalances identified during cancer research.

2.
Serine hydroxymethyltransferase (SHMT) catalyzes the transfer of a β-carbon from serine to tetrahydrofolate to form glycine and 5,10-methylene-tetrahydrofolate. This reaction plays an important role in neurotransmitter synthesis and metabolism. We set out to resequence SHMT1 and SHMT2, followed by functional genomic studies. We identified 87 and 60 polymorphisms in SHMT1 and SHMT2, respectively. We observed no significant functional effect of the 13 non-synonymous single-nucleotide polymorphisms (SNPs) in these genes, either on catalytic activity or on protein quantity. We imputed additional variants across the two genes using '1000 Genomes' data and identified 14 variants that were significantly associated (p<1.0E-10) with SHMT1 messenger RNA expression in lymphoblastoid cell lines. Many of these SNPs were also significantly correlated with basal SHMT1 protein expression in 268 human liver biopsy samples. Reporter gene assays suggested that the SHMT1 promoter SNP rs669340 contributed to this variation. Finally, SHMT1 and SHMT2 expression levels were significantly correlated with those of other Folate and Methionine Cycle genes at both the messenger RNA and protein levels. These experiments represent a comprehensive study of SHMT1 and SHMT2 gene sequence variation and its functional implications. In addition, we obtained preliminary indications that these genes may be co-regulated with other Folate and Methionine Cycle genes.

3.
Increases in throughput and decreases in costs have facilitated large-scale metabolomics studies, that is, the simultaneous measurement of large numbers of biochemical components in biological samples. Initial large-scale studies focused on biomarker discovery for disease or disease progression and helped to clarify the biochemical pathways underlying disease. The first population-based studies that combined metabolomics and genome-wide association studies (mGWAS) have increased our understanding of the (genetic) regulation of biochemical conversions. Measurements of metabolites as intermediate phenotypes are a potentially very powerful approach to uncovering how genetic variation affects disease susceptibility and progression. However, we still face many hurdles in the interpretation of mGWAS data. Due to the composite nature of many metabolites, single enzymes may affect the levels of multiple metabolites and, conversely, levels of single metabolites may be affected by multiple enzymes. Here, we provide a global review of the current status of mGWAS. We specifically discuss the application of prior biological knowledge present in databases to the interpretation of mGWAS results and discuss the potential of mathematical models. As the technology for detecting metabolites and measuring genetic variation continues to improve, it is clear that comprehensive systems-biology-based approaches are required to further our insight into the association between genes, metabolites and disease. This article is part of a Special Issue entitled: From Genome to Function.

4.
The discovery of an abundance of copy number variants (CNVs; gains and losses of DNA sequences >1 kb) and other structural variants in the human genome is influencing the way research and diagnostic analyses are designed and interpreted. As such, comprehensive databases with the most relevant information will be critical for fully understanding the results and will have an impact on a diverse range of disciplines, from molecular biology to clinical genetics. Here, we describe the development of bioinformatics resources to facilitate these studies. The Database of Genomic Variants (http://projects.tcag.ca/variation/) is a comprehensive catalogue of structural variation in the human genome. The database currently contains 1,267 regions reported to contain copy number variation or inversions in apparently healthy human cases. We describe the current contents of the database and how it can serve as a resource for interpretation of array comparative genomic hybridization (array CGH) and other DNA copy imbalance data. We also present the structure of the database, which was built using a new data modeling methodology termed Cross-Referenced Tables (XRT). This is a generic and easy-to-use platform that is strong in handling textual data and complex relationships. Web-based presentation tools have been built that allow XRT data to be published to the web immediately, along with rapid sharing of files with other databases and genome browsers. We also describe a novel tool named eFISH (electronic fluorescence in situ hybridization) (http://projects.tcag.ca/efish/), a BLAST-based program developed to facilitate the choice of appropriate clones for FISH and CGH experiments, as well as the interpretation of results in which genomic DNA probes are used in hybridization-based experiments.

5.
The abundance of different SSU rRNA (“16S”) gene sequences in environmental samples is widely used in studies of microbial ecology as a measure of microbial community structure and diversity. However, the genomic copy number of the 16S gene varies greatly – from one in many species to up to 15 in some bacteria and to hundreds in some microbial eukaryotes. As a result of this variation the relative abundance of 16S genes in environmental samples can be attributed both to variation in the relative abundance of different organisms, and to variation in genomic 16S copy number among those organisms. Despite this fact, many studies assume that the abundance of 16S gene sequences is a surrogate measure of the relative abundance of the organisms containing those sequences. Here we present a method that uses data on sequences and genomic copy number of 16S genes along with phylogenetic placement and ancestral state estimation to estimate organismal abundances from environmental DNA sequence data. We use theory and simulations to demonstrate that 16S genomic copy number can be accurately estimated from the short reads typically obtained from high-throughput environmental sequencing of the 16S gene, and that organismal abundances in microbial communities are more strongly correlated with estimated abundances obtained from our method than with gene abundances. We re-analyze several published empirical data sets and demonstrate that the use of gene abundance versus estimated organismal abundance can lead to different inferences about community diversity and structure and the identity of the dominant taxa in microbial communities. Our approach will allow microbial ecologists to make more accurate inferences about microbial diversity and abundance based on 16S sequence data.
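The core correction described above can be illustrated with one step: divide each taxon's 16S read count by its genomic 16S copy number before computing relative abundances. This is a minimal sketch that assumes the copy numbers are already known; in the described method they are estimated by phylogenetic placement and ancestral state estimation, which is not shown here. The function names and the two-taxon counts are hypothetical.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Naive estimate: treat 16S gene counts as a proxy for organism counts."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def estimate_organismal_abundance(gene_counts: Dict[str, int],
                                  copy_numbers: Dict[str, float]) -> Dict[str, float]:
    """Divide each taxon's 16S count by its (assumed known) 16S copy number,
    then renormalize so the corrected abundances sum to 1."""
    corrected = {}
    for taxon, count in gene_counts.items():
        cn = copy_numbers[taxon]
        if cn <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        corrected[taxon] = count / cn
    return relative_abundance(corrected)


if __name__ == "__main__":
    reads = {"taxon_A": 700, "taxon_B": 300}   # hypothetical 16S read counts
    copies = {"taxon_A": 7.0, "taxon_B": 1.0}  # hypothetical 16S copy numbers
    print(relative_abundance(reads))                     # {'taxon_A': 0.7, 'taxon_B': 0.3}
    print(estimate_organismal_abundance(reads, copies))  # {'taxon_A': 0.25, 'taxon_B': 0.75}
```

In this toy case the apparently dominant taxon by gene abundance becomes the minority taxon once copy number is accounted for, which is the kind of change in inferred dominant taxa the abstract reports.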

6.
Adjusting the focus on human variation (times cited: 36; self-citations: 0; cited by others: 36)

7.
8.
A novel hypothesis-free multivariate screening methodology for the study of human exercise metabolism in blood serum is presented. Serum gas chromatography/time-of-flight mass spectrometry (GC/TOFMS) data were processed using hierarchical multivariate curve resolution (H-MCR), and orthogonal partial least-squares discriminant analysis (OPLS-DA) was used to model the systematic variation related to the acute effect of strenuous exercise. Potential metabolic biomarkers were identified using database comparisons. Extensive validation was carried out, including predictive H-MCR, 7-fold full cross-validation and predictions for the OPLS-DA model, variable permutation for highlighting interesting metabolites, and pairwise t tests for examining the significance of metabolites. The concentration changes of potential biomarkers were verified in the raw GC/TOFMS data. In total, 420 potential metabolites were resolved in the serum samples. On the basis of the relative concentrations of the 420 resolved metabolites, a valid multivariate model for the difference between pre- and post-exercise subjects was obtained. A total of 34 metabolites were highlighted as potential biomarkers, all statistically significant (p < 8.1E-05). As an example, two potential markers were identified as glycerol and asparagine. The concentration changes for these two metabolites were also verified in the raw GC/TOFMS data. The strategy was shown to facilitate interpretation and validation of metabolic interactions in human serum, as well as to reveal the identity of potential markers for known or novel mechanisms of human exercise physiology. This multivariate approach to metabolism studies can help increase understanding of the underlying integrative biology and uncover new mechanistic explanations in exercise physiology.
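Among the validation steps listed above are pairwise t tests on individual metabolites. Assuming these are paired tests (the same subjects measured pre- and post-exercise) on one resolved metabolite's relative concentration, the test statistic can be sketched with the standard library alone. Turning t into a p-value requires the t distribution with n - 1 degrees of freedom (for example, `scipy.stats.ttest_rel` returns both); the function name below is hypothetical and this is not the authors' code.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(pre: Sequence[float], post: Sequence[float]) -> float:
    """Paired t statistic for one metabolite.

    d_i = post_i - pre_i for each subject; t = mean(d) / (sd(d) / sqrt(n)),
    with n - 1 degrees of freedom.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one value per subject")
    if len(pre) < 2:
        raise ValueError("at least two subjects are required")
    diffs = [b - a for a, b in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical relative concentrations for four subjects.
    pre = [1.0, 2.0, 3.0, 4.0]
    post = [2.0, 4.0, 4.0, 6.0]
    print(paired_t_statistic(pre, post))
```

Running one such test per metabolite means many comparisons, which is why a stringent per-metabolite threshold like the one quoted in the abstract is appropriate.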

9.
Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of powerful genome technologies capable of producing sequence data in large quantities (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. The application of genomic methods to evolutionary biology is a challenge, in part because gene segments from different organisms are manipulated separately, requiring individual purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) may consist of combining DNA samples prior to cloning and sequencing, followed by computational reconstruction of the original sequences. This approach will allow the full benefit of automated protocols developed by genome projects to be realized; taxon sampling levels can easily increase to thousands for targeted genomes and genomic regions. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. In particular, it will be possible to make accurate estimates of normal evolution in the context of constant structural and functional constraints (i.e., site-specific substitution probabilities), along with accurate estimates of changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and changes in selective constraints. These estimates can then be used to understand and predict the effects of protein structure and function on sequence evolution and to predict unknown details of protein structure, function, and functional divergence.
In order to demonstrate the practicality of these ideas and the potential benefit for functional genomic analysis, we describe a pilot project we are conducting to simultaneously sequence large numbers of vertebrate mitochondrial genomes.

10.
Understanding the patterns of genetic variation within and among populations is a central problem in population and evolutionary genetics. We examine this question in the acorn barnacle, Semibalanus balanoides, in which the allozyme loci Mpi and Gpi have been implicated in balancing selection due to varying selective pressures at different spatial scales. We review the patterns of genetic variation at the Mpi locus, compare them to levels of population differentiation at mtDNA and microsatellites, and place these data in the context of genome-wide variation from high-throughput sequencing of population samples spanning the North Atlantic. Despite considerable geographic variation in the patterns of selection at the Mpi allozyme, this locus shows rather low levels of population differentiation at ecological and trans-oceanic scales (F_ST ~ 5%). Pooled population sequencing was performed on samples from Rhode Island (RI), Maine (ME), and Southwold, England (UK). Analysis of more than 650 million reads identified approximately 335,000 high-quality SNPs in 19 million base pairs of the S. balanoides genome. Much variation is shared across the Atlantic, but there are significant examples of strong population differentiation among samples from RI, ME, and UK. An F_ST outlier screen of more than 22,000 contigs provided a genome-wide context for interpreting earlier studies on allozymes, mtDNA, and microsatellites. F_ST values for allozymes, mtDNA and microsatellites are close to the genome-wide average for random SNPs, with the exception of the trans-Atlantic F_ST for mtDNA. The majority of F_ST outliers were unique to individual pairs of populations, but some genes show shared patterns of excess differentiation. These data indicate that gene flow is high, that selection is strong on a subset of genes, and that a variety of genes are experiencing diversifying selection at large spatial scales. This survey of polymorphism in S. balanoides provides a number of genomic tools that promise to make this species a powerful model for ecological genomics of the rocky intertidal.
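To make the F_ST comparisons above concrete, here is a minimal sketch of the textbook Wright-style estimator for a biallelic SNP, F_ST = (H_T - H_S) / H_T, computed from population allele frequencies, together with a ratio-of-averages genome-wide value and a simple threshold-based outlier flag. The abstract does not state which estimator was used; pooled sequencing data normally also need corrections for pool size and read depth, which this sketch omits. All function names and the threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def heterozygosities(freqs: Sequence[float]) -> Tuple[float, float]:
    """Return (H_S, H_T) for one biallelic SNP.

    freqs holds the frequency of one allele in each population (equal weights).
    H_S = mean within-population expected heterozygosity, 2p(1 - p).
    H_T = expected heterozygosity at the mean allele frequency.
    """
    hs = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    ht = 2 * p_bar * (1 - p_bar)
    return hs, ht


def fst(freqs: Sequence[float]) -> float:
    """Per-SNP F_ST = (H_T - H_S) / H_T; returns 0.0 for monomorphic sites (a convention)."""
    hs, ht = heterozygosities(freqs)
    if ht == 0:
        return 0.0
    return (ht - hs) / ht


def genome_wide_fst(snps: Sequence[Sequence[float]]) -> float:
    """Ratio of averages: sum(H_T - H_S) / sum(H_T) over all SNPs."""
    num = den = 0.0
    for freqs in snps:
        hs, ht = heterozygosities(freqs)
        num += ht - hs
        den += ht
    return num / den if den > 0 else 0.0


def fst_outliers(snps: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices of SNPs whose F_ST exceeds a (hypothetical) outlier threshold."""
    return [i for i, freqs in enumerate(snps) if fst(freqs) > threshold]


if __name__ == "__main__":
    # Hypothetical allele frequencies in two populations.
    snps = [[0.5, 0.5], [0.6, 0.4], [1.0, 0.0]]
    print([round(fst(s), 3) for s in snps])
    print(fst_outliers(snps, threshold=0.5))
```

Averaging numerators and denominators separately (rather than averaging per-SNP ratios) keeps low-diversity sites from dominating the genome-wide value.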

11.
Victor Guryev. FEBS Letters, 2009, 583(11): 1668-837
Rapid advances in DNA sequencing improve existing techniques and enable new approaches in genetics and functional genomics, bringing about unprecedented coverage, resolution and sensitivity. Enhanced toolsets can facilitate the untangling of connections between genomic variation, environmental factors and phenotypic effects, providing novel opportunities, but may also pose challenges in data interpretation, especially in highly heterogeneous human populations. Laboratory rodent strains, however, offer a variety of tailored model systems with controlled genetic backgrounds, facilitating complex genotype/phenotype relationship studies. In this review we discuss the advent of massively parallel sequencing, its methodological advantage for molecular analysis in model organisms and the expectation of increased understanding of biologically relevant consequences of human genetic variation.

12.

Background

The detection and functional characterization of genomic structural variations are important for understanding the landscape of genetic variation in the chicken. A recently recognized aspect of genomic structural variation, called copy number variation (CNV), is gaining interest in chicken genomic studies. The aim of the present study was to investigate the pattern and functional characterization of CNVs in five characteristic chicken breeds, which will be important for future studies associating phenotype with chicken genome architecture.

Results

Using a commercial 385 K array-based comparative genomic hybridization (aCGH) genome array, we performed CNV discovery using 10 chicken samples from four local Chinese breeds and the French breed Houdan chicken. The female Anka broiler was used as a reference. A total of 281 copy number variation regions (CNVRs) were identified, covering 12.8 Mb of polymorphic sequence, or 1.07% of the entire chicken genome. Functional annotation of the CNVRs indicated that these regions completely or partially overlapped with 231 genes and 1032 quantitative trait loci, suggesting that these CNVs have important functions and might be promising resources for exploring differences among breeds. In addition, we employed quantitative PCR (qPCR) to further validate several copy-number-variable genes with important functions, such as prolactin receptor, endothelin 3 (EDN3), suppressor of cytokine signaling 2 and CD8a molecule, and the results suggested that EDN3 might be a molecular marker for selection of dark skin color in poultry production. Moreover, we identified a new CNVR (chr24: 3484617–3512275), encoding the sortilin-related receptor gene, with copy number changes only in black-bone chickens.

Conclusions

Here, we report a genome-wide analysis of the CNVs in five chicken breeds using aCGH. The association between EDN3 and melanoblast proliferation was further confirmed using qPCR. These results provide additional information for understanding genomic variation and related phenotypic characteristics.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-934) contains supplementary material, which is available to authorized users.
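The abstract reports CNVRs and their total length as a share of the genome but does not say how individual CNV calls were combined. A common convention, assumed here rather than taken from the study, is to pool calls across samples and merge overlapping intervals on the same chromosome into one region, then sum region lengths. The sketch uses 1-based inclusive coordinates; the function names, coordinates and genome size are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end), 1-based inclusive coordinates (an assumption).
Interval = Tuple[str, int, int]


def merge_cnv_calls(calls: List[Interval]) -> List[Interval]:
    """Merge overlapping CNV calls (pooled across samples) into CNV regions."""
    regions: List[Interval] = []
    for chrom, start, end in sorted(calls):  # sorts by chromosome, then start
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Overlaps the current region on the same chromosome: extend it.
            _, reg_start, reg_end = regions[-1]
            regions[-1] = (chrom, reg_start, max(reg_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


def covered_bp(regions: List[Interval]) -> int:
    """Total number of bases covered by non-overlapping regions."""
    return sum(end - start + 1 for _, start, end in regions)


def percent_of_genome(regions: List[Interval], genome_size_bp: int) -> float:
    """Share of the genome (in percent) covered by the regions."""
    return 100.0 * covered_bp(regions) / genome_size_bp


if __name__ == "__main__":
    calls = [("chrA", 150, 300), ("chrA", 100, 200), ("chrB", 50, 80)]
    regions = merge_cnv_calls(calls)
    print(regions)  # [('chrA', 100, 300), ('chrB', 50, 80)]
    print(covered_bp(regions), percent_of_genome(regions, genome_size_bp=1000))
```

Merging before summing matters: adding raw call lengths would double-count bases where calls from different samples overlap.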

13.
Background

Short-read resequencing of genomes produces abundant information on the genetic variation of individuals. Because these variants are so numerous, they are rarely exhaustively validated. Furthermore, low levels of undetected variant miscalling will have a systematic and disproportionate impact on the interpretation of individual genome sequence information, especially if these miscalls are also carried through into reference databases of genomic variation.

Results

We find that sequence variation from short-read sequence data is subject to recurrent-yet-intermittent miscalling that occurs in a sequence-intrinsic manner and is very sensitive to sequence read length. The miscalls arise from difficulties in aligning short reads to redundant genomic regions, where the rate of sequencing error approaches the sequence diversity between redundant regions. We find the resulting miscalled variants to be sensitive to small sequence variations between genomes, so they are often intrinsic to an individual, pedigree, strain or human ethnic group. In human exome sequences, we identify 2–300 recurrent false positive variants per individual, almost all of which are present in public databases of human genomic variation. From the exomes of non-reference strains of inbred mice, we identify 3–5000 recurrent false positive variants per mouse, with the number increasing with greater distance between an individual mouse strain and the reference C57BL6 mouse genome. We show that recurrently miscalled variants may be reproduced for a given genome from repeated simulation rounds of read resampling, realignment and recalling. As such, it is possible to identify more than two-thirds of false positive variation from only ten rounds of simulation.

Conclusion

Identification and removal of recurrent false positive variants from specific individual variant sets will improve overall data quality. Variant miscalls are highly sequence-intrinsic and are often specific to an individual, pedigree or ethnicity. Further, read length is a strong determinant of whether given false variants will be called for any given genome, which has profound significance for cohort studies that pool datasets collected and sequenced at different points in time.
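The bookkeeping half of the approach above can be sketched simply: count in how many simulation rounds each variant is called, flag those called in at least a chosen number of rounds as recurrent false positives, and remove them from the observed call set. This sketch assumes each round's calls come from reads simulated such that any call is, by construction, a false positive; the resampling, realignment and recalling steps themselves are not shown. The variant key, function names and `min_rounds` cutoff are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def recurrent_false_positives(rounds: Iterable[Iterable[Variant]],
                              min_rounds: int) -> Set[Variant]:
    """Variants called in at least `min_rounds` simulation rounds."""
    counts: Counter = Counter()
    for calls in rounds:
        counts.update(set(calls))  # count each variant at most once per round
    return {v for v, n in counts.items() if n >= min_rounds}


def remove_flagged(observed: Iterable[Variant], flagged: Set[Variant]) -> List[Variant]:
    """Drop flagged recurrent false positives from an observed call set."""
    return [v for v in observed if v not in flagged]


if __name__ == "__main__":
    v1 = ("chr1", 100, "A", "G")
    v2 = ("chr1", 200, "C", "T")
    v3 = ("chr2", 50, "G", "A")
    simulated = [[v1, v2], [v1], [v1, v3]]  # calls from three hypothetical rounds
    flagged = recurrent_false_positives(simulated, min_rounds=2)
    print(flagged)
    print(remove_flagged([v1, v2, v3], flagged))
```

Requiring recurrence across rounds separates miscalls driven by the sequence context from one-off errors; the abstract reports that ten rounds already recover more than two-thirds of false positives.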


14.
15.
Recent pangenome studies have revealed that a large fraction of the gene content within a species exhibits presence–absence variation (PAV). However, coding regions alone provide an incomplete assessment of functional genomic sequence variation at the species level. Little to no attention has been paid to noncoding regulatory regions in pangenome studies, even though these sequences directly modulate gene expression and phenotype. To uncover regulatory genetic variation, we generated chromosome-scale genome assemblies for thirty Arabidopsis thaliana accessions from multiple distinct habitats and characterized species-level variation in Conserved Noncoding Sequences (CNS). Our analyses uncovered not only PAV and positional variation (PosV) but also showed that diversity in CNS is nonrandom, with variants shared across different accessions. Using evolutionary analyses and chromatin accessibility data, we provide further evidence supporting roles for conserved and variable CNS in gene regulation. Additionally, our data suggest that transposable elements contribute to CNS variation. Characterizing species-level diversity in all functional genomic sequences may later uncover previously unknown mechanistic links between genotype and phenotype.

16.
17.
Domestication and breeding have reshaped the genomic architecture of chicken, but the retention and loss of genomic elements during these evolutionary processes remain unclear. We present the first chicken pan-genome, constructed using 664 individuals, which identified approximately 66.5 Mb of additional sequence that is absent from the reference genome (GRCg6a). The constructed pan-genome encodes 20,491 predicted protein-coding genes, with conserved genes showing higher expression levels than dispensable genes. Presence/absence variation (PAV) analyses demonstrated that gene PAV in chicken was shaped by selection, genetic drift, and hybridization. PAV-based genome-wide association studies identified numerous candidate mutations related to growth, carcass composition, meat quality, or physiological traits. Among them, we report a deletion in the promoter region of IGF2BP1 that affects chicken body size, which is supported by functional studies and additional samples. This is the first report of the causal variant underlying the repeatedly reported chicken body-size quantitative trait locus on chromosome 27. Therefore, the chicken pan-genome is a useful resource for biological discovery and breeding. It improves our understanding of chicken genome diversity and provides materials to unveil the evolutionary history of chicken domestication.

18.
We report a genetic recombination map for Sorghum of 2512 loci spaced at an average interval of 0.4 cM (approximately 300 kb), based on 2050 RFLP probes, including 865 heterologous probes that foster comparative genomics of Saccharum (sugarcane), Zea (maize), Oryza (rice), Pennisetum (millet, buffelgrass), the Triticeae (wheat, barley, oat, rye), and Arabidopsis. Mapped loci identify 61.5% of the recombination events in this progeny set and reveal strong positive crossover interference acting across intervals of

19.
We have recently reported the construction of a nuclear magnetic resonance (NMR)-based metabonomics study platform, Automics. To examine the application of Automics to transgenic plants, we performed metabolic fingerprinting analysis, i.e., 1H NMR spectroscopy and multivariate analysis, on wild-type and transgenic Arabidopsis. We found that it was possible to distinguish wild-type from four transgenic plants by PLS-DA following application of orthogonal signal correction (OSC). The scores plot following OSC clearly demonstrates significant variation between the transgenic and non-transgenic groups, suggesting that the metabolic changes between wild-type and transgenic lines are possibly associated with the transgenic event. We also found that the major contributing metabolites were specific amino acids (i.e., threonine and alanine), which could correspond to the insertion of the selectable marker BAR gene in the transgenic plants. Our data suggest that NMR-based metabonomics is an efficient method for distinguishing fingerprint differences between wild-type and transgenic plants and could potentially be applied to the biosafety assessment of transgenic plants.

20.
The distributed nature of biological knowledge poses a major challenge to the interpretation of genome-scale datasets, including those derived from microarray and proteomic studies. This report describes DAVID, a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries. Lists of gene or protein identifiers are rapidly annotated and summarized according to shared categorical data for Gene Ontology, protein domain, and biochemical pathway membership. DAVID assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.
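The abstract describes summarizing a gene list by shared annotation categories but does not give the statistics. A common way to judge whether a category is over-represented in a list is a one-sided hypergeometric (Fisher's exact) test; DAVID itself reports a modified Fisher exact p-value (the EASE score), which this sketch does not reproduce. The function names and annotation dictionary are hypothetical, and every gene is assumed to belong to a background of `background_size` genes.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation."""
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def summarize_categories(gene_list: Iterable[str],
                         annotations: Dict[str, Set[str]],
                         background_size: int) -> List[Tuple[str, int, float]]:
    """Return (category, hits, p-value) for categories with at least one hit,
    sorted from most to least significant."""
    genes = set(gene_list)
    n = len(genes)
    rows = []
    for term, members in annotations.items():
        hits = genes & members
        if hits:
            p = hypergeom_upper_tail(len(hits), n, len(members), background_size)
            rows.append((term, len(hits), p))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    annotations = {  # hypothetical category -> annotated genes
        "pathway_X": {"g1", "g2", "g3"},
        "domain_Y": {"g4", "g9"},
    }
    print(summarize_categories(["g1", "g2", "g4"], annotations, background_size=100))
```

Comparing the hit count against the category's size in the whole background, rather than reporting raw counts, is what keeps large, generic categories from always appearing at the top of the summary.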


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417