Similar literature
1.
The Phosphorus uptake 1 (Pup1) locus was identified as a major quantitative trait locus (QTL) for tolerance of phosphorus deficiency in rice. Near-isogenic lines with the Pup1 region from the tolerant donor parent Kasalath typically show threefold higher phosphorus uptake and grain yield in phosphorus-deficient field trials than the intolerant parent Nipponbare. In this study, we report the fine mapping of the Pup1 locus to the long arm of chromosome 12 (15.31–15.47 Mb). Genes in the region were initially identified on the basis of the Nipponbare reference genome, but this did not reveal any obvious candidate genes related to phosphorus uptake. Kasalath BAC clones were therefore sequenced and revealed a 278-kbp sequence significantly different from the syntenic regions in Nipponbare (145 kb) and in the indica reference genome of 93-11 (742 kbp). Size differences are caused by large insertions or deletions (INDELs) and by an exceptionally large number of retrotransposon and transposon-related elements (TEs) present in all three sequences (45%–54%). About 46 kb of the Kasalath sequence did not align with the entire Nipponbare genome, and only three Nipponbare genes (fatty acid α-dioxygenase, dirigent protein and aspartic proteinase) are highly conserved in Kasalath. Two Nipponbare genes (expressed proteins) might have evolved by at least three TE integrations in an ancestor gene that is still present in Kasalath. Several predicted Kasalath genes are novel or unknown genes that are mainly located within INDEL regions. Our results highlight the importance of sequencing QTL regions in the respective donor parent, as important genes might not be present in the current reference genomes.

2.

Background

Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.

Results

This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.

Conclusions

In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1333-7) contains supplementary material, which is available to authorized users.

3.
Single-nucleotide polymorphisms (SNPs) and insertion–deletions (INDELs) are currently important classes of genetic markers for major crop species. In this study, methods for developing SNP markers in rapeseed (Brassica napus L.) and their in silico mapping and use for genotyping are demonstrated. For the development of SNP and INDEL markers, 181 fragments from 121 different gene sequences spanning 86 kb were examined. A combination of different selection methods (genome-specific amplification, hetero-duplex analysis and sequence analysis) allowed the detection of 18 singular fragments that showed a total of 87 SNPs and 6 INDELs between 6 different rapeseed varieties. The average frequency of sequence polymorphism was estimated to be one SNP every 247 bp and one INDEL every 3,583 bp. Most SNPs and INDELs were found in non-coding regions. Polymorphism information content values for SNP markers ranged between 0.02 and 0.50 in a set of 86 varieties. Using comparative genetics data for B. napus and Arabidopsis thaliana, an allocation of SNP markers to linkage groups in rapeseed was achieved: a unique location was determined for seven gene sequences; two and three possible locations were found for six and four sequences, respectively. The results demonstrate the usefulness of existing genomic resources for SNP discovery in rapeseed.
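
As a rough illustration of the polymorphism information content (PIC) range quoted in the abstract above, the sketch below computes two commonly used quantities from allele frequencies. The abstract does not say which formula was used, so both are shown; the function names and the example frequencies are assumptions made for illustration only.

```python
def gene_diversity(freqs):
    """Simplified PIC (expected heterozygosity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """PIC as defined by Botstein et al. (1980):
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return gene_diversity(freqs) - cross


# Hypothetical biallelic SNPs: a balanced one and a rare-allele one.
balanced = [0.5, 0.5]
rare = [0.99, 0.01]
print(gene_diversity(balanced), pic_botstein(balanced))
print(gene_diversity(rare), pic_botstein(rare))
```

For a biallelic marker the simplified form cannot exceed 0.50 and the Botstein form cannot exceed 0.375, so an upper bound of 0.50 would be consistent with the simplified form, although that is an inference and not something the abstract states.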

4.
Developed recently, high resolution melting (HRM) analysis is an efficient, accurate and inexpensive method for distinguishing DNA polymorphisms. HRM has been used to identify mutations in human genes, and to detect SNPs, INDELs and microsatellites in plants. However, its capacity to discriminate DNA variants in the context of complex haplotypes involving INDEL as well as SNP variants has not been examined until now. In this study, we genotyped an almond (Prunus dulcis (Mill.) D. A. Webb, syn. Prunus amygdalus Batsch) pseudo-testcross mapping population that showed segregation of complex haplotypes associated with the CYP79D16 promoter sequence. The 175 bp region in question included a 7 bp INDEL and 3 SNPs, and manifested as three different haplotypes in the parents. Thus, with one homozygous and one heterozygous parent, two relevant genotypes were identified in the mapping population. Although the population displayed monomorphism with respect to the INDEL and one of the SNPs, HRM was sufficiently sensitive to distinguish genotypes on the basis of the two informative SNPs, and the resulting data were used to map CYP79D16 to linkage group 6 of the almond genome. Thus the capacity of HRM to resolve genotypes arising from complex haplotypes has been demonstrated, and this has important implications for the design of efficient HRM markers for various genetic applications including mapping, population studies and biodiversity analyses.

5.
A "gene-island" sequencing strategy has been developed that expedites the targeted acquisition of orthologous gene sequences from related species for comparative genome analysis. A 152-kb bacterial artificial chromosome (BAC) clone from sorghum (Sorghum bicolor) encoding phytochrome A (PHYA) was fully sequenced, revealing 16 open reading frames with a gene density similar to many regions of the rice (Oryza sativa) genome. The sequences of genes in the orthologous region of the maize (Zea mays) and rice genomes were obtained using the gene-island sequencing method. BAC clones containing the orthologous maize and rice PHYA genes were identified, sheared, subcloned, and probed with the sorghum PHYA-containing BAC DNA. Sequence analysis revealed that approximately 75% of the cross-hybridizing subclones contained sequences orthologous to those within the sorghum PHYA BAC and less than 25% contained repetitive and/or BAC vector DNA sequences. The complete sequence of four genes, including up to 1 kb of their promoter regions, was identified in the maize PHYA BAC. Nine orthologous gene sequences were identified in the rice PHYA BAC. Sequence comparison of the orthologous sorghum and maize genes aided in the identification of exons and conserved regulatory sequences flanking each open reading frame. Within genomic regions where micro-colinearity of genes is absolutely conserved, gene-island sequencing is a particularly useful tool for comparative analysis of genomes between related species.

6.
High resolution melting analysis of almond SNPs derived from ESTs (total citations: 4; self-citations: 1; citations by others: 3)
High resolution melting (HRM) curve analysis is a recent advance for the detection of SNPs. The technique measures temperature-induced strand separation of short PCR amplicons, and is able to detect variation as small as a one-base difference between samples. It has been applied to the analysis and scanning of mutations in genes causing human diseases. In plant species, the use of this approach has been limited. We applied HRM analysis to almond SNP discovery and genotyping based on the predicted SNP information derived from the almond and peach EST database. Putative SNPs were screened from almond and peach EST contigs by HRM analysis against 25 almond cultivars. All 4 classes of SNPs, INDELs and microsatellites were discriminated, and the HRM profiles of 17 amplicons were established. The PCR amplicons containing single, double and multiple SNPs produced distinctive HRM profiles. Additionally, different genotypes of INDEL and microsatellite variations were also characterised by HRM analysis. By sequencing the PCR products, 100 SNPs were validated or revealed in the HRM amplicons and their flanking regions. The results showed that the average frequency of SNPs was 1:114 bp in the genic regions, and the transition to transversion ratio was 1.16:1. Rare allele frequencies of the SNPs varied from 0.02 to 0.5, and the polymorphic information contents of the SNPs ranged from 0.04 to 0.53, with an average of 0.31. HRM has been demonstrated to be a fast, low-cost, and efficient approach for SNP discovery and genotyping, in particular for species without much genomic information, such as almond.
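
The transition to transversion ratio reported in the abstract above can be illustrated with a short sketch. It classifies each substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine or the reverse) and divides the two counts. The function names and the example SNP list are hypothetical and are not data from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in snps:
        counts[classify_snp(ref, alt)] += 1
    if counts["transversion"] == 0:
        raise ValueError("no transversions; ratio is undefined")
    return counts["transition"] / counts["transversion"]


# Hypothetical SNP list: two transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(example))
```

There are twice as many possible transversions as transitions, so under uniformly random substitution the expected ratio would be 0.5; a value above 1 therefore indicates an excess of transitions.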

7.
The availability of a comprehensive set of resources including an entire annotated reference genome, sequenced alternative accessions, and a multitude of marker systems makes Arabidopsis thaliana an ideal platform for genetic mapping. PCR markers based on INsertions/DELetions (INDELs) are currently the most frequently used polymorphisms. For the most commonly used mapping combination, Columbia×Landsberg erecta (Col-0×Ler-0), the Cereon polymorphism database is a valuable resource for the generation of polymorphic markers. However, because the number of markers available in public databases for accessions other than Col-0 and Ler-0 is extremely low, mapping using other accessions is far from straightforward. This issue arose while cloning mutations in the Wassilewskija (Ws-4) background. In this work, approaches are described for marker generation in Ws-4×Col-0. Complementary strategies were employed to generate 229 INDEL markers. Firstly, existing Col-0/Ler-0 Cereon predicted polymorphisms were mined for transferability to Ws-4. Secondly, Ws-0 ecotype Illumina sequence data were analyzed to identify INDELs that could be used for the development of PCR-based markers for Col-0 and Ws-4. Finally, shotgun sequencing allowed the identification of INDELs directly between Col-0 and Ws-4. The polymorphism of the 229 markers was assessed in seven widely used Arabidopsis accessions, and PCR markers that allow a clear distinction between the diverged Ws-0 and Ws-4 accessions are detailed. The utility of the markers was demonstrated by mapping more than 35 mutations in a Col-0×Ws-4 combination, an example of which is presented here. The potential contribution of next generation sequencing technologies to more traditional map-based cloning is discussed.

8.
Insertions/deletions (INDELs), an abundant type of length polymorphism in plant genomes, combine the characteristics of both simple sequence repeats (SSRs) and single-nucleotide polymorphisms (SNPs), and thus can be developed as desirable molecular markers for genetic studies and crop breeding. There has been no large-scale characterization of INDEL variation in Brassica napus yet. In this study, we identified a total of 538,691 INDELs in the size range of 1–10 bp by aligning whole-genome re-sequencing data of 23 B. napus inbred lines (ILs) to the B. napus genome sequence of ‘Darmor-bzh’. Of these, 104,190 INDELs were uniquely mapped on the pseudochromosomes of the reference genome. A set of 595 unique INDELs of 2–5 bp in length was selected for experimental validation in the 23 ILs. Of these INDELs, 530 (89.01%) produced a single PCR product and were single locus. A total of 523 (87.9%) INDELs were found to be polymorphic among the 23 ILs. A genetic linkage map containing 108 single-locus INDELs and 89 anchor SSR markers was constructed using 188 recombinant ILs. The majority of INDEL markers on the linkage map showed consistency with the pseudochromosomes of the B. napus cultivar ‘Darmor-bzh’. The INDEL variations and markers reported here will be valuable resources for future genetic studies and molecular breeding in oilseed rape.

9.
Zhang Naixin, Zhang Yujuan, Yu Guo, Chen Bin. Acta Entomologica Sinica (《昆虫学报》), 2013, 56(4): 398-407
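This study examines the structural features of dipteran mitochondrial genomes and designs universal primers for sequencing them, to provide a reference for future research on dipteran mitochondrial genomes. Using comparative genomics and bioinformatics, we analyzed the structural features, base composition, and conserved regions of 26 completely sequenced dipteran mitochondrial genomes and, on that basis, designed universal primers for sequencing dipteran genomes. The results show that dipteran mitochondrial genomes are 14 503–19 517 bp long and structurally conserved. Each contains 37 genes (13 protein-coding genes, 22 tRNA genes, and 2 rRNA genes) plus a non-coding region (the AT-rich region) whose length varies greatly. Gene order within the genome is stable; apart from a few genes, it matches that of Drosophila melanogaster. Base composition is uneven: AT content ranges from 72.59% to 85.15%, and base usage is biased toward A and C. The nucleotide and amino acid sequences are conserved across the whole genome, and 11 conserved regions were identified. Within these conserved regions, 26 pairs of universal primers for sequencing dipteran mitochondrial genomes were designed, each amplifying a target fragment of no more than 1 200 bp. Applying this primer set to sequence the complete mitochondrial genome of the onion fly, Delia antiqua, showed it to be efficient and fit for purpose.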
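
The base-composition figures in the abstract above (AT content and a usage bias toward A and C) can be illustrated with a minimal sketch. The abstract does not state how the bias was measured; the AT-skew and GC-skew formulas below are a common convention and are assumed here, as are the function names and the toy sequence.

```python
from collections import Counter


def base_counts(seq):
    """Count unambiguous bases only; N and other IUPAC codes are ignored."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    if not counts:
        raise ValueError("sequence contains no A/C/G/T bases")
    return counts


def at_content(seq):
    """AT content as a percentage of unambiguous bases."""
    c = base_counts(seq)
    return 100.0 * (c["A"] + c["T"]) / sum(c.values())


def skews(seq):
    """Return (AT skew, GC skew) = ((A-T)/(A+T), (G-C)/(G+C))."""
    c = base_counts(seq)
    if c["A"] + c["T"] == 0 or c["G"] + c["C"] == 0:
        raise ValueError("skew is undefined when A+T or G+C is zero")
    at_skew = (c["A"] - c["T"]) / (c["A"] + c["T"])
    gc_skew = (c["G"] - c["C"]) / (c["G"] + c["C"])
    return at_skew, gc_skew


# Toy sequence, not real mitochondrial data.
toy = "AAATGCC"
print(at_content(toy), skews(toy))
```

Under this convention, a preference for A over T gives a positive AT skew and a preference for C over G gives a negative GC skew, which is one way to express a bias toward A and C on the analyzed strand.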

10.
The emergence of next-generation sequencing technologies allowed access to the vast amounts of information contained in the human genome. This information has contributed to the understanding of individual and population-based variability and improved the understanding of the evolutionary history of different human groups. However, the genome of a representative of the Amerindian populations had not been previously sequenced. Thus, the genome of an individual from a South American tribe was completely sequenced to further the understanding of the genetic variability of Amerindians. A total of 36.8 giga base pairs (Gbp) were sequenced and aligned with the human genome. These Gbp corresponded to 95.92% of the human genome with an estimated miscall rate of 0.0035 per sequenced bp. The data obtained from the alignment were used for SNP (single-nucleotide polymorphism) and INDEL (insertion-deletion) calling, which resulted in the identification of 502,017 polymorphisms, of which 32,275 were potentially new high-confidence SNPs and 33,795 were new INDELs specific to South Native American populations. The authenticity of the sample as a member of the South Native American populations was confirmed through the analysis of the uniparental (maternal and paternal) lineages. The autosomal comparison distinguished the investigated sample from other continental populations and revealed a close relation to Eastern Asian populations and Aboriginal Australians. Although the findings did not rule out the classical model of the settlement of the Americas, they brought new insights into the understanding of human population history. The present study indicates a remarkable genetic variability in human populations that must still be identified, and contributes to the understanding of the genetic variability of South Native American populations and of human population history.

11.
Genetic improvement is important for the poultry industry, contributing to increased efficiency of meat production and quality. Because breast muscle is the most valuable part of the chicken carcass, knowledge of polymorphisms influencing this trait can help breeding programs. Therefore, the complete genome of 18 chickens from two different experimental lines (broiler and layer) from EMBRAPA was sequenced, and SNPs and INDELs were detected in a QTL region for breast muscle deposition on chicken chromosome 2 between microsatellite markers MCW0185 and MCW0264 (105 849–112 649 kb). Initially, 94 674 unique SNPs and 10 448 unique INDELs were identified in the target region. After quality filtration, 77% of the SNPs (85 765) and 60% of the INDELs (7828) were retained. The studied region contains 66 genes, and functional annotation of the filtered variants identified 517 SNPs and three INDELs in exonic regions. Of these, 357 SNPs were classified as synonymous, 153 as non‐synonymous, three as stopgain, four INDELs as frameshift and three INDELs as non‐frameshift. These exonic mutations were identified in 37 of the 66 genes from the target region, three of which are related to muscle development (DTNA, RB1CC1 and MOS). Fifteen non‐tolerated SNPs were detected in several genes (MEP1B, PRKDC, NSMAF, TRAPPC8, SDR16C5, CHD7, ST18 and RB1CC1). These loss‐of‐function and exonic variants present in genes related to muscle development can be considered candidate variants for further studies in chickens. Further association studies should be performed with these candidate mutations as should validation in commercial populations to allow a better explanation of QTL effects.

12.
Structure of the rat L-type pyruvate kinase gene (total citations: 10; self-citations: 0; citations by others: 10)

13.
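The full-length cDNA sequences of the trypsin (Try) and amylase (Amy) genes from the hepatopancreas of mandarin fish (Siniperca chuatsi) were cloned by RT-PCR and RACE. The mandarin fish Try cDNA is 896 bp long, with an open reading frame (ORF) of 744 bp encoding 247 amino acids. Sequence homology analysis showed that the amino acid sequence of mandarin fish Try shares 81.4%, 75.3%, 74.5%, and 71.4% homology with zebrafish (Danio rerio), African clawed frog (Xenopus laevis), and mouse Try and human TRY, respectively. The mandarin fish Amy cDNA is 1 647 bp long, with an ORF of 1 539 bp encoding 512 amino acids. Mandarin fish Amy shares 79.7%, 75.4%, 71.9%, and 70.9% amino acid sequence homology with zebrafish, African clawed frog, and mouse Amy and human AMY, respectively. PCR on mandarin fish genomic DNA also yielded the complete genomic DNA sequences of the Try, Amy, and pepsinogen (Pep) genes. Sequence analysis showed that the Try gene consists of 4 introns and 5 exons and is 1 362 bp long; the Amy gene consists of 8 introns and 9 exons and is 4 267 bp long; and the Pep gene consists of 8 introns and 9 exons and is 4 032 bp long, a structure similar to that of the corresponding genes in other vertebrates. Using the Genome Walker method, 5′-flanking sequences of 1 189 bp, 413 bp, and 527 bp were cloned for the Try, Amy, and Pep genes, respectively, together with a 704-bp 3′-flanking sequence of the Pep gene, and software prediction identified multiple regulatory elements in them that may regulate expression. The cloning of the complete genomic sequences of mandarin fish Try, Amy, and Pep, together with the analysis of their sequence, structure, and molecular phylogeny, provides a basis for further study of the physiological functions and expression regulation of genes related to digestion and metabolism in fish.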

14.
A collection of 9,990 single-pass nuclear genomic sequences, corresponding to 5 Mb of tomato DNA, was obtained using a methylation filtration (MF) strategy and reduced to 7,053 unique undermethylated genomic islands (UGIs) distributed as follows: (1) 59% non-coding sequences; (2) 28% coding sequences; (3) 12% transposons, 96% of which are class I retroelements; and (4) 1% organellar sequences integrated into the nuclear genome over the past approximately 100 million years. A more detailed analysis of coding UGIs indicates that the unmethylated portion of tomato genes extends as far as 676 bp upstream and 766 bp downstream of coding regions, with an average of 174 and 171 bp, respectively. Based on the analysis of the UGI copy distribution, the undermethylated portion of the tomato genome is determined to account for the majority of the unmethylated genes in the genome and is estimated to constitute 61±15 Mb of DNA (~5% of the entire genome), which is significantly less than the 220 Mb estimated for gene-rich euchromatic arms of the tomato genome. This result indicates that, while most genes reside in the euchromatin, a significant portion of euchromatin is methylated in the intergenic spacer regions. Implications of the results for sequencing the genome of tomato and other solanaceous species are discussed.

15.
Brandström M, Ellegren H. Genetics, 2007, 176(3): 1691-1701
It is increasingly recognized that insertions and deletions (indels) are an important source of genetic as well as phenotypic divergence and diversity. We analyzed length polymorphisms identified through partial (0.25x) shotgun sequencing of three breeds of domestic chicken made by the International Chicken Polymorphism Map Consortium. A data set of 140,484 short indel polymorphisms in unique DNA was identified after filtering for microsatellite structures. There was a significant excess of tandem duplicates at indel sites, with deletions of a duplicate motif outnumbering the generation of duplicates through insertion. Indel density was lower in microchromosomes than in macrochromosomes, in the Z chromosome than in autosomes, and in 100 bp of upstream sequence, 5'-UTR, and first introns than in intergenic DNA and in other introns. Indel density was highly correlated with single nucleotide polymorphism (SNP) density. The mean density of indels in pairwise sequence comparisons was 1.9 × 10⁻⁴ indel events/bp, approximately 5% of the density of SNPs segregating in the chicken genome. The great majority of indels involved a limited number of nucleotides (median 1 bp), with A-rich motifs being overrepresented at indel sites. The overrepresentation of deletions at tandem duplicates indicates that replication slippage in duplicate sequences is a common mechanism behind indel mutation. The correlation between indel and SNP density indicates common effects of mutation and/or selection on the occurrence of indels and point mutations.

16.
17.
Chen D, Zhang W, Zhu ZD, Huang Y, Wang P, Zhou BB, Yang XN, Xiao HS, Zhang QH. Hereditas (Beijing) (《遗传》), 2010, 32(12): 1296-1303
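This study aimed to establish a method for constructing genomic target-sequence capture libraries and to combine it with second-generation sequencing for deep sequencing of candidate gene regions. Using Agilent's eArray online platform, 120-base capture probes (baits) were designed for 2,414,977 bp of genomic sequence covering 11,824 exons of 1,250 genes and prepared as a SureSelect in-solution target capture reagent. Genomic DNA from two human samples was sheared by sonication, end-repaired and phosphorylated, and ligated to SOLiD adaptors. DNA fragments of 150–200 bp were recovered and hybridized with the target probes to capture the target sequences. After water-in-oil emulsion PCR amplification and magnetic-bead separation and enrichment, the libraries were loaded onto the SOLiD sequencing system, either for library quality evaluation by workflow analysis (WFA) or for a full sequencing run. In total, 46,509 capture probes were designed and synthesized for the 11,147 gene exon fragments covered and prepared as a SureSelect kit. The probes effectively captured and enriched the target fragments of genomic DNA, and quantitative PCR showed up to 29-fold enrichment. WFA indicated that the libraries were suitable for full sequencing on the SOLiD instrument. Sequencing showed that reads in the target regions accounted for 70% of all valid reads, with coverage above 200× throughout. These results show that the SureSelect workflow established here for genomic target capture, enrichment, and sequencing library construction is feasible and can be used directly for sequencing on the SOLiD platform.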

18.
Distribution of 1000 sequenced T-DNA tags in the Arabidopsis genome (total citations: 6; self-citations: 0; citations by others: 6)
Induction of knockout mutations by T-DNA insertion mutagenesis is widely used in studies of plant gene functions. To assess the efficiency of this genetic approach, we have sequenced PCR-amplified junctions of 1000 T-DNA insertions and analysed their distribution in the Arabidopsis genome. Map positions of 973 tags could be determined unequivocally, indicating that the majority of T-DNA insertions landed in chromosomal domains of high gene density. Only 4.7% of insertions were found in interspersed, centromeric, telomeric and rDNA repeats, whereas 0.6% of sequenced tags identified chromosomally integrated segments of organellar DNAs. 35.4% of T-DNAs were localized in intervals flanked by ATG and stop codons of predicted genes, showing a distribution of 62.2% in exons and 37.8% in introns. The frequency of T-DNA tags in coding and intergenic regions showed a good correlation with the predicted size distribution of these sequences in the genome. However, the frequency of T-DNA insertions in 3'- and 5'-regulatory regions of genes, corresponding to 300 bp intervals 3' downstream of stop and 5' upstream of ATG codons, was 1.7- to 2.3-fold higher than in any similar interval elsewhere in the genome. The additive frequency of insertions in 5'-regulatory regions and coding domains provided an estimate for the mutation rate, suggesting that 47.8% of mapped T-DNA tags induced knockout mutations in Arabidopsis.

19.
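Rhus chinensis is an economically important tree species that supplies raw materials for medicine and industrial dyes. It is highly tolerant of drought, cold, and salt, and grows in temperate, warm-temperate, and subtropical regions. This study is the first to sequence and assemble the R. chinensis chloroplast genome de novo. The chloroplast genome is 159,082 bp long and has the typical quadripartite structure, with two single-copy regions separated by a pair of inverted repeats. The large single-copy (LSC) and small single-copy (SSC) regions are 85,394 bp and 18,663 bp long, respectively. The chloroplast genome encodes 126 genes in total: 88 protein-coding genes, 8 rRNA genes, and 30 tRNA genes. Gene-coding regions make up 61.97% of the chloroplast genome sequence. Only 8 genes contain introns; each has a single intron except ycf3, which has two. A total of 755 SSR loci were found in the chloroplast genome. The SSRs consist mainly of dinucleotide and mononucleotide repeats, which account for 60% (453) and 28.74% (217), respectively. Cluster analysis showed that Anacardiaceae is closest to R. chinensis, followed by Aceraceae and Sapindaceae. This study provides a molecular basis for the classification of R. chinensis. As the first report on the R. chinensis chloroplast genome, it is important for understanding the species' photosynthesis, evolution, and chloroplast transgenic engineering.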
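
The SSR survey summarized in the abstract above can be illustrated with a simple regular-expression scan. This is only a sketch: the abstract does not name the tool or the minimum repeat thresholds used, so the thresholds in `MIN_REPEATS`, the function names, and the toy sequence are all assumptions. Because matches are non-overlapping and a non-primitive motif such as "AA" is discarded after matching, an SSR that starts inside such a discarded match can be missed.

```python
import re

# Assumed thresholds: motif length -> minimum number of repeats.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence: a 10-base A run and an (AT)5 repeat.
toy = "GG" + "A" * 10 + "C" + "AT" * 5 + "G"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

Counting the hits by motif length would give the kind of mononucleotide and dinucleotide proportions quoted in the abstract, although the exact numbers depend on the thresholds chosen.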

20.
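A chicken infectious anemia virus (CAV) strain, C14, was isolated from a flock of 14-day-old diseased chickens showing growth retardation on a commercial broiler farm in Shandong. Infection of 1-day-old SPF chickens with C14 suppressed the antibody response to avian influenza virus (AIV) and acted synergistically with reticuloendotheliosis virus (REV) in immunosuppression. Three partially overlapping fragments of the C14 genome were amplified by PCR, each was cloned into a T vector and sequenced, and the sequences were assembled into the complete genome. The CAV-C14 genome is 2298 bp long and contains three mutually overlapping open reading frames and one regulatory region. Comparison of C14 with published CAV reference strain genomes from China and abroad showed homology of 97.2%–99.2%. Sequence comparison showed that several known motifs in the CAV non-coding region associated with replication and transcriptional regulation are highly conserved. All three coding genes, VP1, VP2, and VP3, show some degree of variation, with VP1 the most variable, and the amino acid sequence variations of the three proteins among strains are unrelated to one another.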
