Similar articles
20 similar articles found
1.
Copy-number variants (CNVs) can reach appreciable frequencies in the human population, and recent discoveries have shown that several of these copy-number polymorphisms (CNPs) are associated with human diseases, including lupus, psoriasis, Crohn disease, and obesity. Despite new advances, significant biases remain in terms of CNP discovery and genotyping. We developed a method based on single-channel intensity data and benchmarked against copy numbers determined from sequencing read depth to successfully obtain CNP genotypes for 1495 CNPs from 487 human DNA samples of diverse ethnic backgrounds. This microarray contained CNPs in segmental duplication-rich regions and insertions of sequences not represented in the reference genome assembly or on standard SNP microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that biallelic CNPs show greater stratification when compared to frequency-matched SNPs (p = 0.0026). Although biallelic CNPs show a strong correlation of copy number with flanking SNP genotypes, the majority of multicopy CNPs do not (40% with r > 0.8). We selected a subset of CNPs for further characterization in 1876 additional samples from 62 populations; this revealed striking population-differentiated structural variants in genes of clinical significance such as OCLN, a tight junction protein involved in hepatitis C viral entry. Our microarray design allows these variants to be rapidly tested for disease association and our results suggest that CNPs (especially those that cannot be imputed from SNP genotypes) might have contributed disproportionately to human diversity and selection.

2.
The killer-cell immunoglobulin-like receptor (KIR) complex on chromosome 19 encodes receptors that modulate the activity of natural killer cells, and variation in these genes has been linked to infectious and autoimmune disease, as well as having bearing on pregnancy and transplant outcomes. The medical relevance and high variability of KIR genes make short-read sequencing an attractive technology for interrogating the region, providing a high-throughput, high-fidelity sequencing method that is cost-effective. However, because this gene complex is characterized by extensive nucleotide polymorphism, structural variation including gene fusions and deletions, and a high level of homology between genes, its interrogation at high resolution has been thwarted by bioinformatic challenges, with most studies limited to examining presence or absence of specific genes. Here, we present the PING (Pushing Immunogenetics to the Next Generation) pipeline, which incorporates empirical data, novel alignment strategies and a custom alignment processing workflow to enable high-throughput KIR sequence analysis from short-read data. PING provides KIR gene copy number classification functionality for all KIR genes through use of a comprehensive alignment reference. The gene copy number determined per individual enables an innovative genotype determination workflow using genotype-matched references. Together, these methods address the challenges imposed by the structural complexity and overall homology of the KIR complex. To determine copy number and genotype determination accuracy, we applied PING to European and African validation cohorts and a synthetic dataset. PING demonstrated exceptional copy number determination performance across all datasets and robust genotype determination performance. Finally, an investigation into discordant genotypes for the synthetic dataset provides insight into misaligned reads, advancing our understanding of the interpretation of short-read sequencing data in complex genomic regions. PING promises to support a new era of studies of KIR polymorphism, delivering high-resolution KIR genotypes that are highly accurate, enabling high-quality, high-throughput KIR genotyping for disease and population studies.

3.
PURPOSE OF REVIEW: This review examines the role of copy number variation in the human genome as a newly recognized determinant of lipoprotein and metabolic phenotypes. RECENT FINDINGS: Much of the recent progress defining the molecular basis of lipoprotein disorders has been the result of studying genomic DNA at the single nucleotide level, for instance with nucleotide sequence analysis or genotyping to detect single nucleotide polymorphisms. Focus on single nucleotides, however, fails to capture the complete spectrum of potential genetic variability. Recent genome-wide mapping studies have demonstrated the surprising ubiquity of large-scale copy number variations in apparently healthy people, adding to the complexity of the 'normal' genome, but also emphasizing this form of genetic variation as a potential disease mechanism. The application of this understanding to the genetics of lipoprotein disorders has been rapid. For instance, the use of novel techniques to detect copy number variations, such as multiplex ligation-dependent probe amplification, has revealed many additional causative mutations in the low-density lipoprotein receptor gene in patients with familial hypercholesterolemia. SUMMARY: Copy number variations thus represent a new level of genomic variation that is both an important mechanism of monogenic lipoprotein disorders and a potential contributor to common complex lipoprotein and metabolic phenotypes in the general population.

4.
Whole genome amplification by multiple displacement amplification (MDA) offers investigators using precious genomic DNA samples a high fidelity method for amplifying nanogram quantities of DNA several thousandfold. This becomes especially important for the modern-day genomics researcher, who more and more commonly is applying today's genome scanning technologies to patient cohort samples collected years ago that are irrecoverable and invariably in short supply. We present evidence here that MDA-prepared genomic DNA includes artifacts of chromosomal copy number that resemble copy number polymorphisms (CNPs) upon analysis of the DNA on the Affymetrix 10K GeneChip. The study of CNPs in both health and disease is a rapidly growing area of research; however, our current understanding of the relevance of CNPs is incomplete. Our data indicate that utilization of whole genome-amplified samples for analysis heavily reliant on accurate copy number retention could be confounded if the genomic DNA sample was subjected to MDA. We recommend that small amounts of patient cohort DNA stocks be set aside and not subjected to whole genome amplification in order to facilitate the unbiased determination of chromosomal copy numbers when desired.

5.
YP Zhang  FY Deng  TL Yang  F Zhang  XD Chen  H Shen  XZ Zhu  Q Tian  HW Deng 《PloS one》2012,7(9):e44292

Introduction

Human height is a highly heritable trait considered as an important factor for health. There has been limited success in identifying the genetic factors underlying height variation. We aim to identify sequence variants associated with adult height by a genome-wide association study of copy number variants (CNVs) in Chinese.

Methods

Genome-wide CNV association analyses of height variation were conducted in 1,625 unrelated Chinese adults and in sex-specific subgroups. Height was measured with a stadiometer. The Affymetrix SNP6.0 genotyping platform was used to identify copy number polymorphisms (CNPs). We constructed a genomic map containing 1,009 CNPs in Chinese individuals and performed a genome-wide association study of CNPs with height.

Results

We detected 10 significant association signals for height (p<0.05) in the whole population, and 9 and 11 association signals in the Chinese female and male populations, respectively. A copy number polymorphism (CNP12587, chr18:54081842-54086942, p = 2.41×10⁻⁴) was found to be significantly associated with height variation in Chinese females even after strict Bonferroni correction (p = 0.048). Confirmatory real-time PCR experiments lent further support for CNV validation. Compared to female subjects with two copies of the CNP, carriers of three copies had an average of 8.1% decrease in height. An important candidate gene, ubiquitin-protein ligase NEDD4-like (NEDD4L), was detected in this region; it plays important roles in bone metabolism by binding to bone formation regulators.

Conclusions

Our findings suggest important genetic variants underlying height variation in the Chinese population.

6.
Comparisons of human genomes show that more base pairs are altered as a result of structural variation - including copy number variation - than as a result of point mutations. Here we review advances and challenges in the discovery and genotyping of structural variation. The recent application of massively parallel sequencing methods has complemented microarray-based methods and has led to an exponential increase in the discovery of smaller structural-variation events. Some global discovery biases remain, but the integration of experimental and computational approaches is proving fruitful for accurate characterization of the copy, content and structure of variable regions. We argue that the long-term goal should be routine, cost-effective and high quality de novo assembly of human genomes to comprehensively assess all classes of structural variation.

7.
MOTIVATION: Array comparative genomic hybridization (aCGH) is a pervasive technique used to identify chromosomal aberrations in human diseases, including cancer. Aberrations are defined as regions of increased or decreased DNA copy number, relative to a normal sample. Accurately identifying the locations of these aberrations has many important medical applications. Unfortunately, the observed copy number changes are often corrupted by various sources of noise, making the boundaries hard to detect. One popular current technique uses hidden Markov models (HMMs) to divide the signal into regions of constant copy number called segments; a subsequent classification phase labels each segment as a gain, a loss or neutral. Unfortunately, standard HMMs are sensitive to outliers, causing over-segmentation, where segments erroneously span very short regions. RESULTS: We propose a simple modification that makes the HMM robust to such outliers. More importantly, this modification allows us to exploit prior knowledge about the likely location of "outliers", which are often due to copy number polymorphisms (CNPs). By "explaining away" these outliers with prior knowledge about the locations of CNPs, we can focus attention on the more clinically relevant aberrated regions. We show significant improvements over the current state of the art technique (DNAcopy with MergeLevels) on previously published data from mantle cell lymphoma cell lines, and on published benchmark synthetic data augmented with outliers. AVAILABILITY: Source code written in Matlab is available from http://www.cs.ubc.ca/~sshah/acgh.

8.
SUMMARY: We present a tool for control-free copy number alteration (CNA) detection using deep-sequencing data, particularly useful for cancer studies. The tool deals with two frequent problems in the analysis of cancer deep-sequencing data: absence of control sample and possible polyploidy of cancer cells. FREEC (control-FREE Copy number caller) automatically normalizes and segments copy number profiles (CNPs) and calls CNAs. If ploidy is known, FREEC assigns absolute copy number to each predicted CNA. To normalize raw CNPs, the user can provide a control dataset if available; otherwise GC content is used. We demonstrate that for Illumina single-end, mate-pair or paired-end sequencing, GC-content normalization provides smooth profiles that can be further segmented and analyzed in order to predict CNAs. AVAILABILITY: Source code and sample data are available at http://bioinfo-out.curie.fr/projects/freec/.
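
The GC-content normalization mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not describe how FREEC models the GC dependence, so the approach below (dividing each window's read count by the median count of windows with similar GC content) is only an assumed stand-in, and the function name `gc_normalize` and its parameters are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_fractions, bin_percent=5):
    """Scale each window's read count by the median count of its GC bin.

    Illustrative assumption only: this median-per-bin scheme is not taken
    from the FREEC abstract, which only states that GC content is used.
    """
    if len(read_counts) != len(gc_fractions):
        raise ValueError("read_counts and gc_fractions must have equal length")

    def gc_bin(gc):
        # Work in integer percent to avoid floating-point binning surprises.
        return int(round(gc * 100)) // bin_percent

    # Group raw counts by GC bin.
    counts_by_bin = defaultdict(list)
    for count, gc in zip(read_counts, gc_fractions):
        counts_by_bin[gc_bin(gc)].append(count)

    bin_medians = {b: median(c) for b, c in counts_by_bin.items()}

    # A ratio near 1 means the window has typical depth for its GC content.
    normalized = []
    for count, gc in zip(read_counts, gc_fractions):
        m = bin_medians[gc_bin(gc)]
        normalized.append(count / m if m > 0 else float("nan"))
    return normalized


if __name__ == "__main__":
    counts = [100, 110, 200, 50, 55, 100]
    gc = [0.40, 0.41, 0.42, 0.60, 0.61, 0.62]
    print(gc_normalize(counts, gc))
```

In this toy input the two GC bins have very different raw depths, yet after normalization the typical window in each bin has a ratio of 1, so windows with elevated ratios stand out regardless of GC content.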

9.
The success of genome-wide association (GWA) studies for the detection of sequence variation affecting complex traits in humans has spurred interest in the use of large-scale high-density single nucleotide polymorphism (SNP) genotyping for the identification of quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for the development of a custom genotyping assay interrogating 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with median interval of 37 kb and maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNP are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm allows an efficient approach for the development of high-density genotyping platforms in species having full or even moderate quality draft sequence. Aspects of the approach can be exploited in species which lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.

10.

Background

Single nucleotide polymorphism (SNP) arrays are important tools widely used for genotyping and copy number estimation. This technology utilizes the specific affinity of fragmented DNA for binding to surface-attached oligonucleotide DNA probes. We analyze the variability of the probe signals of Affymetrix GeneChip SNP arrays as a function of the probe sequence to identify relevant sequence motifs which potentially cause systematic biases of genotyping and copy number estimates.

Methodology/Principal Findings

The probe design of GeneChip SNP arrays enables us to disentangle different sources of intensity modulations such as the number of mismatches per duplex, matched and mismatched base pairings including nearest and next-nearest neighbors and their position along the probe sequence. The effect of probe sequence was estimated in terms of triple-motifs with central matches and mismatches which include all 256 combinations of possible base pairings. The probe/target interactions on the chip can be decomposed into nearest neighbor contributions which correlate well with free energy terms of DNA/DNA-interactions in solution. The effect of mismatches is about twice as large as that of canonical pairings. Runs of guanines (G) and the particular type of mismatched pairings formed in cross-allelic probe/target duplexes constitute sources of systematic biases of the probe signals with consequences for genotyping and copy number estimates. The poly-G effect seems to be related to the crowded arrangement of probes which facilitates complex formation of neighboring probes with at minimum three adjacent G's in their sequence.

Conclusions

The applied method of “triple-averaging” represents a model-free approach to estimate the mean intensity contributions of different sequence motifs which can be applied in calibration algorithms to correct signal values for sequence effects. Rules for appropriate sequence corrections are suggested.

11.
With the completion of the Human Genome Project and the International HapMap Project, and the publication of copy number variation in the human genome, a great number of accurate, rapid, and cost-effective technologies for SNP analysis have been developed, promoting the research of complex diseases. This article presents a review of widely used genotyping techniques, and the progress and prospect in the study of complex diseases in terms of the projects and achievements of Chinese National Human Genome Center at Shanghai (CHGC...

12.
Genomic copy number alteration and allelic imbalance are distinct features of cancer cells, and recent advances in the genotyping technology have greatly boosted the research in the cancer genome. However, the complicated nature of tumor usually hampers the dissection of the SNP arrays. In this study, we describe a bioinformatic tool, named GIANT, for genome-wide identification of somatic aberrations from paired normal-tumor samples measured with SNP arrays. By efficiently incorporating genotype information of matched normal sample, it accurately detects different types of aberrations in cancer genome, even for aneuploid tumor samples with severe normal cell contamination. Furthermore, it allows for discovery of recurrent aberrations with critical biological properties in tumorigenesis by using statistical significance test. We demonstrate the superior performance of the proposed method on various datasets including tumor replicate pairs, simulated SNP arrays and dilution series of normal-cancer cell lines. Results show that GIANT has the potential to detect the genomic aberration even when the cancer cell proportion is as low as 5∼10%. Application on a large number of paired tumor samples delivers a genome-wide profile of the statistical significance of the various aberrations, including amplification, deletion and LOH. We believe that GIANT represents a powerful bioinformatic tool for interpreting the complex genomic aberration, and thus assisting both academic study and the clinical treatment of cancer.

13.
Conventional marker-based genotyping platforms are widely available, but not without their limitations. In this context, we developed Sequence-Based Genotyping (SBG), a technology for simultaneous marker discovery and co-dominant scoring, using next-generation sequencing. SBG offers users several advantages including a generic sample preparation method, a highly robust genome complexity reduction strategy to facilitate de novo marker discovery across entire genomes, and a uniform bioinformatics workflow strategy to achieve genotyping goals tailored to individual species, regardless of the availability of a reference sequence. The most distinguishing features of this technology are the ability to genotype any population structure, regardless of whether parental data is included, and the ability to co-dominantly score SNP markers segregating in populations. To demonstrate the capabilities of SBG, we performed marker discovery and genotyping in Arabidopsis thaliana and lettuce, two plant species of diverse genetic complexity and backgrounds. Initially we obtained 1,409 SNPs for arabidopsis, and 5,583 SNPs for lettuce. Further filtering of the SNP dataset produced over 1,000 high quality SNP markers for each species. We obtained a genotyping rate of 201.2 genotypes/SNP and 58.3 genotypes/SNP for arabidopsis (n = 222 samples) and lettuce (n = 87 samples), respectively. Linkage mapping using these SNPs resulted in stable map configurations. We have therefore shown that the SBG approach presented provides users with the utmost flexibility in garnering high quality markers that can be directly used for genotyping and downstream applications. Until advances and costs allow for routine whole-genome sequencing of populations, we expect that sequence-based genotyping technologies such as SBG will be essential for genotyping of model and non-model genomes alike.

14.
Next-generation sequencing (NGS) approaches are widely used in genome-wide genetic marker discovery and genotyping. However, current NGS approaches are not easy to apply to general outbred populations (human and some major farm animals) for SNP identification because of the high level of heterogeneity and phase ambiguity in the haplotype. Here, we report a new method for SNP genotyping, called genotyping by genome reducing and sequencing (GGRS), to genotype outbred species. Through an improved procedure for library preparation and a marker discovery and genotyping pipeline, the GGRS approach can genotype outbred species cost-effectively and with high reproducibility. We also evaluated the efficiency and accuracy of our approach for high-density SNP discovery and genotyping in a large-genome species, the pig (2.8 Gb), for which more than 70,000 single nucleotide polymorphisms (SNPs) can be identified for an expenditure of only $80 (USD)/sample.

15.
Learning disability (LD) is a very common, lifelong and disabling condition, affecting about 3% of the population. Despite this, it is only over the past 10-15 years that major progress has been made towards understanding the origins of LD. In particular, genetics driven advances in technology have led to the unequivocal demonstration of the importance of genome imbalance in the aetiology of idiopathic LD (ILD). In this review we provide an overview of these advances, discussing technologies such as multi-telomere FISH and array CGH that have already emerged as well as new approaches that show diagnostic potential for the future. The advances to date have highlighted new considerations such as copy number polymorphisms (CNPs) that can complicate the interpretation of genome imbalance and its relevance to ILD. More importantly though, they have provided a remarkable approximately 15-20% improvement in diagnostic capability as well as facilitating genotype/phenotype correlations and providing new avenues for the identification and understanding of genes involved in neurocognitive function.

16.
In species delimitation, a formidable goal in the discipline of systematic biology, we identify and describe species morphologically and ecologically based on phenotypic data. Efficient genotyping technologies produce genetic and genomic data with relative ease, which promotes species discovery and validation using genotype data. For the last two decades, we have seen the development of species delimitation methods based on genetic distances and phylogenetic trees using genotype data. However, speciation processes via evolutionary relationship among species were mostly divorced from species delimitation. Recent approaches to drawing species boundaries use multi-locus sequence data to account for evolutionary processes including speciation and gene flow. They allow us to learn jointly about speciation and species delimitation, leveraging computational and statistical techniques developed in population genetics and phylogenetics. Here, we review the recent progress in the development of species delimitation using genotype data and discuss the future outlook for the research of developing species delimitation methods.

17.
SNP analysis to dissect human traits   Cited by: 5 (self-citations: 0, citations by others: 5)
The analysis of complex human diseases has been spurred by the number of published genomic sequence variants - many identified in the course of sequencing the human genome. But, to be useful for genetic analysis, variants have to be mapped accurately, their frequencies in various populations determined, and automated high-throughput assay techniques developed. Recently proposed methods address these issues: the use of 'reduced representation shotgun' methods for more efficient detection of single nucleotide polymorphisms (SNPs), the employment of high-throughput genotyping techniques, the development of SNP maps that incorporate information about linkage disequilibrium, and the use of SNPs in identifying susceptibility genes for common illnesses.

18.
Evaluating Quantitative Variation in the Genome of ZEA MAYS   Cited by: 7 (self-citations: 2, citations by others: 5)
Genomic diversity within the species Zea mays has been examined by measuring the variation in the repetitive component of the nuclear genome among North American inbred lines and varieties. This was done by preparing a set of clones of repetitive maize sequences that differ in function, molecular arrangement and multiplicity and then using these as probes for quantitative hybridization to DNA from various maize genotypes. The comparison showed that the majority of repeated sequences are markedly variable in copy number among the ten maize strains tested. The clone sample contained the rDNA and 5S genes, the major repeat of the chromosome knobs, sequences functioning as origins of DNA replication in yeast (ARS sequences) and randomly cloned sequences of unknown function and chromosomal location. The sequences ranged in reiteration frequency from 200 to greater than 10^5 copies and included both tandemly arrayed and dispersed repeats. The copy numbers were measured by hybridizing labeled cloned sequences to aliquots of high molecular weight genomic DNA that were applied to nitrocellulose filters through a slotted template (slot blotting). The hybridization signal on an autoradiogram occurred in a narrow band that could be scored reliably with a densitometer. This provided a rapid method of determining the abundance of particular repeated sequences in individual plants and plant populations. Using this technique, we found that the copy number of repeated sequences of all types generally varied among the strains by two- to threefold, although at least one sequence showed no detectable variation. In contrast to the variability found between strains, individuals within an inbred line or variety were found to be indistinguishable in terms of specific sequence multiplicity. Each genotype has a different pattern of copy numbers for the set of repeated sequence clones, and this pattern is characteristic of all individuals of a particular genotype. The data also show that the copy number of each sequence varies independently. No strains had uniformly high or low copy numbers for the entire set of probes.

19.
To date, microarray-based genotyping of large, complex plant genomes has been complicated by the need to perform genome complexity reduction to obtain sufficiently strong hybridization signals. Genome complexity reduction techniques are, however, tedious and can introduce unwanted variables into genotyping assays. Here, we report a microarray-based genotyping technology for complex genomes (such as the 2.3 Gb maize genome) that does not require genome complexity reduction prior to hybridization. Approximately 200,000 long oligonucleotide probes were identified as being polymorphic between the inbred parents of a mapping population and used to genotype two recombinant inbred lines. While multiple hybridization replicates provided ~97% accuracy, even a single replicate provided ~95% accuracy. Genotyping accuracy was further increased to >99% by utilizing information from adjacent probes. This microarray-based method provides a simple, high-density genotyping approach for large, complex genomes.
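
One way to picture how information from adjacent probes could raise genotyping accuracy is a sliding-window majority vote over probe calls ordered by genomic position. The abstract does not say which method the authors used, so the sketch below is only an assumed illustration; the function name `smooth_calls`, the window size, and the parental labels "A" and "B" are hypothetical.

```python
from collections import Counter


def smooth_calls(calls, half_window=2):
    """Replace each probe call with the majority call among its neighbors.

    Assumption for illustration: in a recombinant inbred line, neighboring
    probes usually share parental origin, so an isolated discordant call is
    more likely an error than a true double crossover.
    """
    smoothed = []
    for i, original in enumerate(calls):
        lo = max(0, i - half_window)
        hi = min(len(calls), i + half_window + 1)
        # Votes always come from the raw calls, never from smoothed output.
        votes = Counter(calls[lo:hi])
        top_call, top_count = votes.most_common(1)[0]
        # Keep the original call when it ties with the most common call.
        if votes[original] == top_count:
            smoothed.append(original)
        else:
            smoothed.append(top_call)
    return smoothed


if __name__ == "__main__":
    raw = list("AAABAAABBBBB")  # one isolated "B" inside an "A" block
    print("".join(smooth_calls(raw)))
```

With the toy input the isolated "B" at the fourth probe is overruled by its neighbors, while the genuine transition from "A" to "B" further along is preserved.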

20.
Copy number variation (CNV) is implicated in important traits in multiple crop plants, but can be challenging to genotype using conventional methods. The Rhg1 locus of soybean, which confers resistance to soybean cyst nematode (SCN), is a CNV of multiple 31.2-kb genomic units each containing four genes. Reliable, high-throughput methods to quantify Rhg1 and other CNVs for selective breeding were developed. The CNV genotyping assay described here uses a homeologous gene copy within the paleopolyploid soybean genome to provide the internal control for a single-tube TaqMan copy number assay. Using this assay, CNV in breeding populations can be tracked with high precision. We also show that extensive CNV exists within Fayette, a released, inbred SCN-resistant soybean cultivar with a high copy number at Rhg1 derived from a single donor parent. Copy number at Rhg1 is therefore unstable within a released variety over a relatively small number of generations. Using this assay to select for individuals with altered copy number, plants were obtained with both increased copy number and increased SCN resistance relative to control plants. Thus, CNV genotyping technologies can be used as a new type of marker-assisted selection to select for desirable traits in breeding populations, and to control for undesirable variation within cultivars.
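
For a single-tube assay with an internal control, copy number is commonly estimated from the difference in threshold cycles (Ct) between the target and the control. The abstract does not give the calculation the authors used, so the sketch below applies the generic comparative-Ct approximation as an assumption; the function name `copy_number_from_ct`, its parameters, and the example Ct values are hypothetical.

```python
def copy_number_from_ct(ct_target, ct_control, control_copies=1.0, efficiency=2.0):
    """Estimate target copy number relative to an internal control.

    Assumptions (not taken from the abstract): both amplicons amplify with
    the same per-cycle efficiency (2.0 means perfect doubling), and the
    control locus is present at a known copy number, `control_copies`.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1.0")
    # A target that crosses the threshold earlier (lower Ct) started with
    # more template, so a negative delta Ct means more copies.
    delta_ct = ct_target - ct_control
    return control_copies * efficiency ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: target crosses threshold 3 cycles earlier.
    print(copy_number_from_ct(ct_target=22.0, ct_control=25.0))
```

Under these assumptions, a target reaching threshold three cycles before a single-copy control corresponds to roughly eight copies per control copy; real assays would typically calibrate efficiency and use replicate wells.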
