Similar literature
 20 similar articles found (search time: 62 ms)
1.
India represents an amazing confluence of geographically, linguistically and socially disparate ethnic populations (Indian Genome Variation Consortium, J Genet 87:3–20, 2008). Understanding the genetic diversity of the Indian population remains a daunting task. In this paper we present a detailed analysis of genomic variations (high-depth coverage (~30×) using the Illumina HiSeq 2000 platform) from three healthy Indian male individuals, each belonging to one of three geographically delineated regions and linguistic phyla, viz. the high-altitude region of Ladakh (Tibeto-Burman linguistic phylum), the sub-mountainous region of Kumaun (Indo-European linguistic phylum) and the sea-level region of Telangana (Dravidian linguistic phylum), to probe the extent of genetic diversity in our population. The sequencing analysis provided high-quality data (~95% of the total reads aligned to the human reference genome for each sample) and very good alignment quality (>80% of the filtered mapped reads had a quality score of 60). A total of 4.3, 3.7 and 4.3 million single nucleotide variations were identified in the high-altitude, sub-mountainous and sea-level genomes, respectively, by comparison with the human reference genome. Approximately 17.3%, 18.2% and 17.4% of the variants were unique to the respective genomes. The study identified many novel variations in the three diverse genomes (132,970 in the Ladakh, 112,317 in the Kumaun and 128,881 in the Telangana individual) and is an important resource for creating a baseline and a comprehensive catalogue of human genomic variation across India as well as the Asian continent.

2.
3.

Background

The 1000 Genomes Project paved the way for sequencing diverse human populations. New genome projects are being established to sequence underrepresented populations, helping in understanding human genetic diversity. The Kuwait Genome Project is an initiative to sequence individual genomes from the three subgroups of the Kuwaiti population, namely Saudi Arabian tribe, "tent-dwelling" Bedouin, and Persian, which attribute their ancestry to different regions of the Arabian Peninsula and to modern-day Iran (West Asia). These subgroups are in line with settlement history and are confirmed by genetic studies. In this work, we report the whole genome sequence of a Kuwaiti native from the Persian subgroup at >37X coverage.

Results

We document 3,573,824 SNPs, 404,090 insertions/deletions, and 11,138 structural variations. Of the reported SNPs and indels, 85,939 are novel. We identify 295 'loss-of-function' and 2,314 'deleterious' coding variants, some of which carry homozygous genotypes in the sequenced genome; the associated phenotypes include pharmacogenomic traits such as greater triglyceride-lowering ability with fenofibrate treatment and the requirement of a high warfarin dosage to elicit an anticoagulation response. 6,328 non-coding SNPs associate with 811 phenotype traits: in congruence with the participant's medical history of Type 2 diabetes and β-Thalassemia, and the family's history of migraine, 72 (of 159 known) Type 2 diabetes, 3 (of 4) β-Thalassemia, and 76 (of 169) migraine variants are seen in the genome. Intergenome comparisons based on shared disease-causing variants position the sequenced genome between Asian and European genomes, in congruence with the geographical location of the region. In comparison, bead arrays perform better than sequencing platforms at correctly calling genotypes in low-coverage regions of the sequenced genome; however, a novel SNP or indel near the genotype-calling position can lead to false calls with bead arrays.

Conclusions

We report, for the first time, a reference genome resource for the population of Persian ancestry. The resource provides a starting point for designing large-scale genetic studies in the Peninsula, including Kuwait, and in the Persian population. Such efforts on populations under-represented in global genome variation surveys help augment current knowledge of human genome diversity.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1233-x) contains supplementary material, which is available to authorized users.

4.
Next-generation sequencing allows access to a large quantity of genomic data. In plants, several studies have used whole chloroplast genome sequences for inferring phylogeography or phylogeny. Even though the chloroplast is a haploid organelle, NGS plastome data have identified a nonnegligible number of intra-individual polymorphic SNPs. Such observations could have several causes, such as sequencing errors, the presence of heteroplasmy or the transfer of chloroplast sequences into the nuclear and mitochondrial genomes. The occurrence of allelic diversity has important practical impacts on the identification of diversity and the analysis of chloroplast data and, beyond that, raises significant evolutionary questions. In this study, we show that the observed intra-individual polymorphism of chloroplast sequence data is probably the result of plastid DNA transferred into the mitochondrial and/or the nuclear genomes. We further assess the error rates of nine different bioinformatics pipelines for SNP and genotype calling, using SNPs identified by Sanger sequencing. Specific pipelines are adequate to deal with this issue, optimizing both specificity and sensitivity. Our results will allow proper use of whole chloroplast NGS sequences and better handling of NGS chloroplast sequence diversity.

5.
BACKGROUND: The genetic background of the cynomolgus macaque (Macaca fascicularis) is made complex by the high genetic diversity, population structure, and gene introgression from the closely related rhesus macaque (Macaca mulatta). Herein we report the whole-genome sequence of a Malaysian cynomolgus macaque male with more than 40-fold coverage, which was determined using a resequencing method based on the Indian rhesus macaque genome. RESULTS: We identified approximately 9.7 million single nucleotide variants (SNVs) between the Malaysian cynomolgus and the Indian rhesus macaque genomes. Compared with humans, a smaller nonsynonymous/synonymous SNV ratio in the cynomolgus macaque suggests more effective removal of slightly deleterious mutations. Comparison of two cynomolgus (Malaysian and Vietnamese) and two rhesus (Indian and Chinese) macaque genomes, including previously published macaque genomes, suggests that Indochinese cynomolgus macaques have been more affected by gene introgression from rhesus macaques. We further identified 60 nonsynonymous SNVs that completely differentiated the cynomolgus and rhesus macaque genomes, and that could be important candidate variants for determining species-specific responses to drugs and pathogens. The demographic inference using the genome sequence data revealed that Malaysian cynomolgus macaques have experienced at least three population bottlenecks. CONCLUSIONS: This list of whole-genome SNVs will be useful for many future applications, such as an array-based genotyping system for macaque individuals. High-quality whole-genome sequencing of the cynomolgus macaque genome may aid studies on finding genetic differences that are responsible for phenotypic diversity in macaques and may help control genetic backgrounds among individuals.

6.
Babesia gibsoni, the causative agent of canine piroplasmosis, is a tick-borne intraerythrocytic protozoan parasite predominantly reported in Asian countries. The present study aimed at genotypic characterization of B. gibsoni isolates prevalent in dogs in Kerala, a southern state of India. Blood samples were collected from 272 dogs in Kerala and B. gibsoni infection was detected by microscopy and polymerase chain reaction (PCR). Molecular confirmation of B. gibsoni parasites was carried out by 18S rRNA nested-PCR, followed by sequencing. Nested-PCR detected a higher percentage of dogs positive for B. gibsoni infection (40.44%) than microscopy (15.81%). Genetic characterization of B. gibsoni isolates (n = 11) prevalent in dogs in the state of Kerala was carried out by PCR amplification and sequencing of the 855 bp thrombospondin-related adhesive protein (TRAP) gene fragment. Phylogenetic analysis of the B. gibsoni TRAP (BgTRAP) gene revealed that B. gibsoni isolates from Kerala formed a distinct cluster with the isolates from north India and Bangladesh, away from other East Asian isolates. Nucleotide analysis of the tandem repeats of the BgTRAP gene showed considerable genetic variation among Indian isolates that was shared by B. gibsoni isolates of Bangladesh but not by the isolates of East Asian countries. The results of the present study further confirmed that B. gibsoni parasites in a distinct genetic clade are endemic in dogs in India and Bangladesh. However, more elaborate studies are required for a better understanding of the genetic diversity of B. gibsoni.

7.
Single nucleotide polymorphisms (SNPs) are the most abundant DNA markers in plant genomes. In this study, based on 54,465 SNPs between the genomes of two Indica varieties, Minghui 63 (MH63) and Zhenshan 97 (ZS97), and an additional 20,705 SNPs between the MH63 and Nipponbare genomes, we identified and confirmed 1,633 well-distributed SNPs by PCR and Sanger sequencing. From these, a set of 372 SNPs was further selected to analyze the patterns of genetic diversity in 300 representative rice inbred lines from 22 rice-growing countries worldwide. Using this set of SNPs, we were able to uncover the well-known Indica-Japonica subspecific differentiation and geographic differentiations within Indica and Japonica. Furthermore, our SNP results revealed some common and contrasting patterns of haplotype diversity along different rice chromosomes in the Indica and Japonica accessions, which suggest that different evolutionary forces possibly acted in specific regions of the rice genome during domestication and evolution of rice. Our results demonstrated that this set of SNPs can be used as anchor SNPs for large-scale genotyping in rice molecular breeding research involving Indica-Japonica and Indica-Indica crosses.

8.
9.
The prevalence of different H. pylori genotypes in various geographical regions indicates region-specific adaptations during the course of evolution. Complete genomes of H. pylori from countries with high infection burdens, such as India, have not yet been described. Herein we present genome sequences of two H. pylori strains, NAB47 and NAD1, from India. In this report, we briefly mention the sequencing and finishing approaches, genome assembly with downstream statistics, and important features of the two draft genomes, including their phylogenetic status. We believe that these genome sequences and the comparative genomics emanating thereupon will help us to clearly understand the ancestry and biology of the Indian H. pylori genotypes, and this will be helpful in solving the so-called Indian enigma, by which high infection rates do not corroborate the minuscule number of serious outcomes observed, including gastric cancer.

10.

Background

Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertions/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions.

Results

We have combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detect 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, 0.1% by SNPs and approximately 0.3% by inversions. The structural variants impact 4,867 genes, and >24% of structural variants would not be imputed by SNP-association.

Conclusions

Our results indicate that a large number of structural variants have gone unreported in the individual genomes published to date. The significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate that they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies.

11.
With the expansion of next-generation sequencing technology and advanced bioinformatics, there has been a rapid growth of genome sequencing projects. However, while this technology enables the rapid and cost-effective assembly of draft genomes, the quality of these assemblies usually falls short of gold standard genome assemblies produced using the more traditional BAC by BAC and Sanger sequencing approaches. Assembly validation is often performed by the physical anchoring of genetically mapped markers, but this is prone to errors and the resolution is usually low, especially towards centromeric regions where recombination is limited. New approaches are required to validate reference genome assemblies. The ability to isolate individual chromosomes combined with next-generation sequencing permits the validation of genome assemblies at the chromosome level. We demonstrate this approach by the assessment of the recently published chickpea kabuli and desi genomes. While previous genetic analysis suggests that these genomes should be very similar, a comparison of their chromosome sizes and published assemblies highlights significant differences. Our chromosomal genomics analysis highlights short defined regions that appear to have been misassembled in the kabuli genome and identifies large-scale misassembly in the draft desi genome. The integration of chromosomal genomics tools within genome sequencing projects has the potential to significantly improve the construction and validation of genome assemblies. The approach could be applied both for new genome assemblies as well as published assemblies, and complements currently applied genome assembly strategies.

12.
Estimating genetic diversity and inferring the evolutionary history of Plasmodium falciparum could be helpful in understanding the origin and spread of virulent and drug-resistant forms of the malaria pathogen and therefore contribute to malaria control programmes. Genetic diversity of the whole mitochondrial (mt) genome of P. falciparum sampled across the major distribution ranges has been reported, but no Indian P. falciparum isolate had been analysed so far, even though P. falciparum malaria is highly endemic in India. We have sequenced the whole mt genome of 44 Indian field isolates and utilized a published data set of 96 genome sequences to present global genetic diversity and to revisit the evolutionary history of P. falciparum. Indian P. falciparum presents high genetic diversity with several characteristics of ancestral populations and shares many genetic features with African and, to some extent, Papua New Guinean (PNG) isolates. Similar to African isolates, Indian P. falciparum populations have maintained a high effective population size and undergone rapid expansion in the past, with the oldest time to the most recent common ancestor (TMRCA). Interestingly, one of the four single nucleotide polymorphisms (SNPs) that differentiate P. falciparum from P. falciparum-like isolates (infecting non-human primates in Africa) was found to be segregating in five Indian P. falciparum isolates. This SNP was in tight linkage with two other novel SNPs that were found exclusively in these five Indian isolates. On the whole, the results of the mt genome sequence analyses of Indian isolates add to the current understanding of the evolutionary history of P. falciparum.

13.
In the present study we describe the sequencing and annotated analysis of the individual genome of an Estonian. Using SOLiD technology we generated 2,449,441,916 50-bp reads. Bioscope version 1.3 was used for mapping and pairing of reads to the NCBI human genome reference (build 36, hg18). Bioscope also enables annotation of the results of variant (tertiary) analysis. On average, 75.5% of reads were mapped, giving a total coverage of 107.72 Gb and a mean fold coverage of 34.6. We found 3,482,975 SNPs, of which 352,492 were novel. 21,222 SNPs were in coding regions: 10,649 were synonymous SNPs, 10,360 were nonsynonymous missense SNPs, 155 were nonsynonymous nonsense SNPs and 58 were nonsynonymous frameshifts. We identified 219 CNVs with a total base pair coverage of 37,326,300 bp and 87,451 large insertion/deletion polymorphisms covering 10,152,256 bp of the genome. In addition, we found 285,864 small insertion/deletion polymorphisms, of which 133,969 were novel. Finally, we identified 53 inversions, 19 overlapped genes and 2 overlapped exons. Interestingly, we found a region in chromosome 6 to be enriched in coding SNPs and CNVs. This study confirms previous findings that our genomes are more complex and variable than previously thought. Therefore, sequencing of personal genomes followed by annotation would improve the analysis of the heritability of phenotypes and our understanding of the functions of the genome.
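The figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, and the "implied reference length" simply divides the reported total coverage by the reported mean fold coverage, which is an assumption about how the two numbers relate rather than the authors' stated method.

```python
# Coding-region SNP counts as reported in the abstract.
coding_snps = {
    "synonymous": 10_649,
    "nonsynonymous missense": 10_360,
    "nonsynonymous nonsense": 155,
    "nonsynonymous frameshift": 58,
}
REPORTED_CODING_TOTAL = 21_222  # "21,222 SNPs were in coding regions"


def category_shares(counts):
    """Return the total count and each category's share of that total."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


total, shares = category_shares(coding_snps)
print(f"sum of categories: {total} (reported: {REPORTED_CODING_TOTAL})")
for name, share in shares.items():
    print(f"  {name}: {share:.2%}")

# Assumption: mean fold coverage = total covered bases / reference length,
# so the reference length implied by the two reported figures is:
TOTAL_COVERAGE_BP = 107.72e9  # 107.72 Gb
MEAN_FOLD_COVERAGE = 34.6
implied_genome_bp = TOTAL_COVERAGE_BP / MEAN_FOLD_COVERAGE
print(f"implied reference length: {implied_genome_bp / 1e9:.2f} Gb")
```

The four categories add up exactly to the reported coding total, and the implied reference length is on the order of 3 Gb, as expected for a human genome.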

14.
Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise breakpoints, and in contrast to other methods, can resolve complex rearrangements. In total, we identified 277,243 SVs ranging in length from 1-23 kb. Validation using computational and experimental methods suggests that we achieve overall <6% false-positive rate and <10% false-negative rate in genomic regions that can be assembled, which outperforms other methods. Analysis of the SVs in the genomes of 106 individuals sequenced as part of the 1000 Genomes Project suggests that SVs account for a greater fraction of the diversity between individuals than do single-nucleotide polymorphisms (SNPs). These findings demonstrate that whole-genome de novo assembly is a feasible approach to deriving more comprehensive maps of genetic variation.

15.
Whole genome sequencing studies are essential to obtain a comprehensive understanding of the vast pattern of human genomic variations. Here we report the results of a high-coverage whole genome sequencing study of 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions, and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many of the low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, and the majority of these (∼96%) have a minor allele frequency of less than 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants that were predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all 44 genomes, a total of 182 genes were "knocked out" in at least one individual genome, among which 46 genes were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked out" in general populations. Gene ontology analysis suggested that these commonly "knocked-out" genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and diseases.

16.
Comparative analyses of complete chloroplast (cp) DNA sequences within a species may provide clues to understand the population dynamics and colonization histories of plant species. Equisetum arvense (Equisetaceae) is a widely distributed fern species in northeastern Asia, Europe, and North America. The complete cp DNA sequences from Asian and American E. arvense individuals were compared in this study. The Asian E. arvense cp genome was 583 bp shorter than that of the American E. arvense. In total, 159 indels were observed between the two individuals, most of which were concentrated in the hypervariable trnY-trnE intergenic spacer (IGS) in the large single-copy (LSC) region of the cp genome. This IGS region held a series of 19 bp repeating units. Differences in the number of 19 bp repeat units were responsible for 78% of the total length difference between the two cp genomes. Furthermore, only other closely related species of Equisetum also show the hypervariable nature of the trnY-trnE IGS. By contrast, only a single indel was observed in the gene coding regions: the ycf1 gene showed a 24 bp difference between the two continental individuals due to a single tandem-repeat indel. A total of 165 single-nucleotide polymorphisms (SNPs) were recorded between the two cp genomes. Of these, 52 SNPs (31.5%) were distributed in coding regions, 13 SNPs (7.9%) were in introns, and 100 SNPs (60.6%) were in intergenic spacers (IGS). The overall difference between the Asian and American E. arvense cp genomes was 0.12%. Despite the relatively high genetic diversity between Asian and American E. arvense, the two populations are recognized as a single species based on their high morphological similarity. This indicates that the two regional populations have been in morphological stasis.
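The SNP breakdown quoted in this abstract can be reproduced from the raw counts. The following is a minimal sketch; the dictionary and function names are illustrative only and do not come from the paper.

```python
# SNP counts by genomic context, as reported in the abstract.
snp_counts = {
    "coding regions": 52,
    "introns": 13,
    "intergenic spacers": 100,
}
REPORTED_TOTAL = 165


def percentages(counts):
    """Convert raw counts to percentages of their sum, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


snp_percent = percentages(snp_counts)
for name, pct in snp_percent.items():
    print(f"{name}: {snp_counts[name]} SNPs ({pct}%)")
```

The three categories sum to the reported 165 SNPs, and the rounded shares match the 31.5%, 7.9% and 60.6% given in the text.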

17.
Kumar Deepender, Chhokar Vinod, Sheoran Sonia, Singh Rajender, Sharma Pradeep, Jaiswal Sarika, Iquebal M. A., Jaiswar Akanksha, Jaisri J., Angadi U. B., Rai Anil, Singh G. P., Kumar Dinesh, Tiwari Ratan. Molecular Biology Reports, 2020, 47(1): 293-306

Genetic diversity is crucial for successful adaptation and sustained improvement in crops. India is bestowed with diverse agro-climatic conditions, which makes it rich in wheat germplasm adapted to various niches. The germplasm repository consists of local landraces, trait-specific genetic stocks including introgressions from wild relatives, exotic collections, released varieties, and improved germplasm. Characterization of genetic diversity is done using morpho-physiological characters as well as by analyzing variations at the DNA level. However, there are not many reports on array-based high-throughput SNP markers with genome-wide coverage employed in Indian spring wheat germplasm. Amongst wheat SNP arrays, the 35K Axiom Wheat Breeder's Array has the highest SNP polymorphism efficiency suitable for genetic mapping and genetic diversity characterization. Therefore, genotyping was done using the 35K array in 483 wheat genotypes, resulting in 14,650 quality-filtered SNPs that were distributed across the B (~50%), A (~39%), and D (~10%) genomes. The total genetic distance coverage was 4477.85 cM, with 3.27 SNP/cM and 0.49 cM/SNP as average marker density and average inter-marker distance, respectively. The PIC ranged from 0.09 to 0.38 with an average of 0.29 across genomes. Population structure and Principal Coordinate Analysis resulted in two subpopulations (SP1 and SP2). The analysis of molecular variance revealed genetic variation of 2% among and 98% within subpopulations, indicating high gene flow between SP1 and SP2. The subpopulation SP2 showed a high level of genetic diversity based on genetic diversity indices, viz. Shannon's information index (I) = 0.648, expected heterozygosity (He) = 0.456 and unbiased expected heterozygosity (uHe) = 0.456. To the best of our knowledge, this study is the first to include the largest set of Indian wheat genotypes studied exclusively for genetic diversity.
These findings may serve as a potential source for the identification of uncharacterized QTL/gene using genome wide association studies and marker assisted selection in wheat breeding programs.
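The average marker density quoted in this abstract follows directly from the SNP count and map length. The sketch below uses hypothetical names and is only a back-of-the-envelope check. Note that the plain reciprocal of the density comes out near 0.31 cM per SNP, not the 0.49 cM/SNP reported as the average inter-marker distance; the abstract does not say how that figure was derived, so it was presumably obtained differently.

```python
# Figures reported in the abstract.
N_SNPS = 14_650          # quality-filtered SNPs
MAP_LENGTH_CM = 4477.85  # total genetic distance covered, in cM


def marker_density(n_markers, map_length_cm):
    """Average number of markers per centimorgan."""
    return n_markers / map_length_cm


def mean_spacing(n_markers, map_length_cm):
    """Map length divided by marker count (simple reciprocal of density)."""
    return map_length_cm / n_markers


density = marker_density(N_SNPS, MAP_LENGTH_CM)
spacing = mean_spacing(N_SNPS, MAP_LENGTH_CM)
print(f"density: {density:.2f} SNP/cM")
print(f"reciprocal spacing: {spacing:.2f} cM/SNP")
```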


18.
Advances in high-throughput sequencing have promoted the collection of reference genomes and genome-wide diversity. However, the assessment of genomic variation among populations has hitherto mainly been surveyed through single-nucleotide polymorphisms (SNPs) and largely ignored the often major fraction of genomes represented by transposable elements (TEs). Despite accumulating evidence supporting the evolutionary significance of TEs, comprehensive surveys remain scarce. Here, we sequenced the full genomes of 304 individuals of Arabis alpina sampled from four nearby natural populations to genotype SNPs as well as polymorphic long terminal repeat retrotransposons (polymorphic TEs; i.e., presence/absence of TE insertions at specific loci). We identified 291,396 SNPs and 20,548 polymorphic TEs, comparing their contributions to genomic diversity and divergence across populations. Few SNPs were shared among populations and overall showed high population-specific variation, whereas most polymorphic TEs segregated among populations. The genomic context of these two classes of variants further highlighted candidate adaptive loci having a putative impact on functional genes. In particular, 4.96% of the SNPs were identified as nonsynonymous or affecting start/stop codons. In contrast, 43% of the polymorphic TEs were present next to Arabis genes enriched in functional categories related to the regulation of reproduction and responses to biotic as well as abiotic stresses. This unprecedented data set, mapping variation gained from SNPs and complementary polymorphic TEs within and among populations, will serve as a rich resource for addressing microevolutionary processes shaping genome variation.

19.
20.
Date palm is a very important crop in western Asia and northern Africa, and it is the oldest domesticated fruit tree with archaeological records dating back 5000 years. The huge economic value of this crop has generated considerable interest in breeding programs to enhance production of dates. One of the major limitations of these efforts is the uncertainty regarding the number of date palm cultivars, which are currently based on fruit shape, size, color, and taste. Whole mitochondrial and plastid genome sequences were utilized to examine single nucleotide polymorphisms (SNPs) of date palms to evaluate the efficacy of this approach for molecular characterization of cultivars. Mitochondrial and plastid genomes of nine Saudi Arabian cultivars were sequenced. For each species about 60 million 100 bp paired-end reads were generated from total genomic DNA using the Illumina HiSeq 2000 platform. For each cultivar, sequences were aligned separately to the published date palm plastid and mitochondrial reference genomes, and SNPs were identified. The results identified cultivar-specific SNPs for eight of the nine cultivars. Two previous SNP analyses of mitochondrial and plastid genomes identified substantial intra-cultivar ( = intra-varietal) polymorphisms in organellar genomes but these studies did not properly take into account the fact that nearly half of the plastid genome has been integrated into the mitochondrial genome. Filtering all sequencing reads that mapped to both organellar genomes nearly eliminated mitochondrial heteroplasmy but all plastid SNPs remained heteroplasmic. This investigation provides valuable insights into how to deal with interorganellar DNA transfer in performing SNP analyses from total genomic DNA. The results confirm recent suggestions that plastid heteroplasmy is much more common than previously thought. 
Finally, low levels of sequence variation in plastid and mitochondrial genomes argue for using nuclear SNPs for molecular characterization of date palm cultivars.
