Similar literature
1.
2.
A recombinant M13 clone (O42) containing a 65 b.p. cDNA fragment from human fetal liver mRNA coding for glyceraldehyde-3-phosphate dehydrogenase (G3PD) has been identified and used to isolate a recombinant clone, pG1, from a full-length human adult liver cDNA library. pG1 has been subcloned in M13 phage and completely sequenced with the chain terminator method. Besides the coding region of 1008 b.p., the cDNA sequence includes 60 nucleotides at the 5'-end and 204 nucleotides at the 3'-end up to the polyA tail. Hybridization of pG1 to human liver total RNA shows only one band, about the size of the pG1 cDNA. A much stronger hybridization signal was observed using RNA derived from human hepatocarcinoma and kidney carcinoma cell lines. Sequence homology between clone O42 and the homologous region of clone pG1 is 86%. On the other hand, homology among the translated sequences and the known human muscle protein sequence ranges between 77 and 90%; these data demonstrate the existence of more than one gene coding for G3PD. Southern blots of human DNA digested with several restriction enzymes also indicate that several homologous sequences are present in the human genome.

3.

Background

Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research and increasingly the clinical use of sequence data is likely to remain focused on the protein coding exome. We set out to quantify and understand how WGS compares with the targeted capture and sequencing of the exome (exome-seq), for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome targeted regions.

Results

We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that have been subject to both deep exome and whole genome sequencing. The scoring of detection sensitivity was based on sequence down sampling and reference to a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of sequence read coverage and reduced biases in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS only requires a mean of 14 reads. Known disease causing mutations are not biased towards easy or hard to sequence areas of the genome for either exome-seq or WGS.
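
The sensitivity scoring described above can be illustrated with a minimal sketch. It assumes that SNP calls are represented as (chromosome, position, alternate allele) tuples and that sensitivity is the fraction of gold-standard calls recovered by a given call set. The function name and the toy data are hypothetical and are not taken from the study.

```python
def detection_sensitivity(called, gold_standard):
    """Return the fraction of gold-standard SNPs present in the call set."""
    gold = set(gold_standard)
    if not gold:
        raise ValueError("gold-standard set is empty")
    return len(gold & set(called)) / len(gold)


# Toy data (hypothetical): four gold-standard SNPs for one sample.
gold = [("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G"), ("chr2", 900, "C")]

# Calls made from a heavily down-sampled read set.
calls_low_depth = [("chr1", 100, "A"), ("chr2", 40, "G")]

# Calls made at higher depth; the extra call is a false positive,
# which does not affect sensitivity.
calls_high_depth = gold[:3] + [("chr3", 5, "T")]

print(detection_sensitivity(calls_low_depth, gold))   # 0.5
print(detection_sensitivity(calls_high_depth, gold))  # 0.75
```

Repeating the calculation over a series of down-sampled depths would give a sensitivity curve from which a depth threshold (such as the depth needed for 95% sensitivity) could be read off.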

Conclusions

From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers benefits in uniformity of read coverage and more balanced allele ratio calls, both of which can in most cases be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection due to technical difficulties or probe failures. As WGS is intrinsically richer data that can provide insight into polymorphisms outside coding regions and reveal genomic rearrangements, it is likely to progressively replace exome-seq for many applications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-247) contains supplementary material, which is available to authorized users.

4.
The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and as yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high-throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving knowledge of the pine gene space. Interspecific capture was also evaluated: over 98% of the probes designed from P. taeda that were efficient in sequence capture were also suitable for analysis of the related species Pinus elliottii Engelm.

5.
Mastitis is an infectious disease of the mammary gland that leads to reduced milk production and changes in milk composition. Complement component C3 plays a major role as a central molecule of the complement cascade and is involved in the killing of microorganisms, either directly or in cooperation with phagocytic cells. C3 cDNAs were isolated from Egyptian buffalo and cattle, sequenced and characterized. The C3 cDNA sequences of buffalo and cattle consist of 5025 and 5019 bp, respectively. Buffalo and cattle C3 cDNAs share 99% sequence identity with each other. The 4986 bp open reading frame in buffalo encodes a putative protein of 1661 amino acids, as in cattle, and includes all the functional domains. Further, analysis of the C3 cDNA sequences detected six novel single-nucleotide polymorphisms (SNPs) in buffalo and three novel SNPs in cattle. Association analysis of the detected SNPs with milk somatic cell score, used as an indicator of mastitis, revealed that the most significant association in buffalo was with the C>A substitution (ss: 1752816097) in exon 27, whereas in cattle it was with the C>T substitution (ss: 1752816085) in exon 12. Our findings provide preliminary information about the contribution of C3 polymorphisms to mastitis resistance in buffalo and cattle.
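
The relationship between the reported open reading frame length and the protein length can be checked with a short calculation. This is a minimal sketch that assumes the quoted ORF length includes the stop codon; the function name is hypothetical.

```python
def protein_length_from_orf(orf_bp):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    # Each codon is three bases; the final codon is the stop codon
    # and does not add an amino acid.
    return orf_bp // 3 - 1


print(protein_length_from_orf(4986))  # 1661
```

Under that assumption, the 4986 bp ORF corresponds to 1662 codons, that is, 1661 amino acids plus a stop codon, which agrees with the protein length given in the abstract.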

6.
Targeted sequencing is a cost-efficient way to obtain answers to biological questions in many projects, but choosing an enrichment method can be difficult. In this study we compared two hybridization methods for target enrichment for massively parallel sequencing and single nucleotide polymorphism (SNP) discovery, namely Nimblegen sequence capture arrays and the SureSelect liquid-based hybrid capture system. We prepared sequencing libraries from three HapMap samples using both methods, sequenced the libraries on the Illumina Genome Analyzer, mapped the sequencing reads back to the genome, and called variants in the sequences. In the SureSelect libraries, 74-75% of the sequence reads originated from the targeted region, compared with 41-67% in the Nimblegen libraries. We could sequence up to 99.9% and 99.5% of the regions targeted by capture probes from the SureSelect libraries and from the Nimblegen libraries, respectively. The Nimblegen probes covered 0.6 Mb more of the original 3.1 Mb target region than the SureSelect probes. In each sample, we called more SNPs and detected more novel SNPs from the libraries that were prepared using the Nimblegen method. Thus the Nimblegen method gave better results when judged by the number of SNPs called, but this came at the cost of more over-sampling.

7.
Over the next few years, the efficient use of next-generation sequencing (NGS) in human genetics research will depend heavily upon effective mechanisms for the selective enrichment of genomic regions of interest. Recently, comprehensive exome capture arrays have become available for targeting approximately 33 Mb or ∼180,000 coding exons across the human genome. Selective genomic enrichment of the human exome offers an attractive option for new experimental designs aiming to quickly identify potential disease-associated genetic variants, especially in family-based studies. We have evaluated a 2.1 M feature human exome capture array on eight individuals from a three-generation family pedigree. We were able to cover up to 98% of the targeted bases at a long-read sequence read depth of ≥3 and 86% at a read depth of ≥10, and over 50% of all targets were covered with ≥20 reads. We identified up to 14,284 SNPs and small indels per individual exome, with up to 1,679 of these representing putative novel polymorphisms. Applying the conservative genotype calling approach HCDiff, the average rate of detection of a variant allele based on Illumina 1 M BeadChips genotypes was 95.2% at ≥10x sequence coverage. Further, we propose an advantageous genotype calling strategy for low-coverage targets that empirically determines cut-off thresholds at a given coverage depth based on existing genotype data. Application of this method detected >99% of SNPs covered ≥8x. Our results offer guidance for “real-world” applications in human genetics and provide further evidence that microarray-based exome capture is an efficient and reliable method to enrich for chromosomal regions of interest in next-generation sequencing experiments.
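
Coverage figures of the kind quoted above (fraction of targeted bases at read depth ≥3, ≥10 and ≥20) can be derived from per-base depth values. The following is a minimal sketch, assuming the depths are available as a plain list of integers, one per targeted base; the function name and the toy depths are hypothetical and do not reproduce the study's data.

```python
def breadth_at_depth(depths, thresholds=(3, 10, 20)):
    """Fraction of targeted bases covered at or above each depth threshold."""
    total = len(depths)
    if total == 0:
        raise ValueError("no target bases supplied")
    return {t: sum(d >= t for d in depths) / total for t in thresholds}


# Toy per-base depths for ten targeted positions (hypothetical).
depths = [0, 2, 3, 5, 9, 10, 12, 20, 25, 40]

print(breadth_at_depth(depths))  # {3: 0.8, 10: 0.5, 20: 0.3}
```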

8.

Background

The domestic pig (Sus scrofa) is both an important livestock species and a model for biomedical research. Exome sequencing has accelerated identification of protein-coding variants underlying phenotypic traits in human and mouse. We aimed to develop and validate a similar resource for the pig.

Results

We developed probe sets to capture pig exonic sequences based upon the current Ensembl pig gene annotation supplemented with mapped expressed sequence tags (ESTs) and demonstrated proof-of-principle capture and sequencing of the pig exome in 96 pigs, encompassing 24 capture experiments. For most of the samples at least 10x sequence coverage was achieved for more than 90% of the target bases. Bioinformatic analysis of the data revealed over 236,000 high confidence predicted SNPs and over 28,000 predicted indels.

Conclusions

We have achieved coverage statistics similar to those seen with commercially available human and mouse exome kits. Exome capture in pigs provides a tool to identify coding region variation associated with production traits, including loss of function mutations which may explain embryonic and neonatal losses, and to improve genomic assemblies in the vicinity of protein coding genes in the pig.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-550) contains supplementary material, which is available to authorized users.

9.
A cDNA containing the complete coding sequence of rabbit brown adipose tissue uncoupling protein (UCP) was isolated and sequenced. The coding region is 80.6% identical to rat UCP cDNA and the protein is about 86% identical to the rat and hamster proteins. Despite the presence of two AATAAA polyadenylation consensus sequences in rabbit UCP cDNA, only one rabbit UCP mRNA was detected, indicating that only the 3'-downstream signal is used, in contrast to rat and mouse, where both are used.

10.
Next generation DNA sequencing (NGS) technologies have revolutionized the pace at which whole genome and exome sequences can be generated. However, despite these advances, many of the methods for targeted resequencing, such as the generation of high-depth exome sequences, are limited by the relatively large amounts of starting DNA that are normally required. This is particularly pertinent in tumour analysis, as tumour biopsies often yield submicrogram quantities of DNA, especially when tumours are microdissected prior to analysis. Here, we present a method for exome capture and resequencing using as little as 50 ng of starting DNA. The sequencing libraries produced by this minimal starting amount (MSA-Cap) method yield datasets that are comparable to those from standard amount (SA) whole exome libraries, which use three micrograms of starting DNA. This method, which can be performed in most laboratories using commonly available reagents, has the potential to enhance large scale profiling efforts such as the resequencing of tumour exomes.

11.
Most of the 160 million river buffalo in the world are in Asia, where they are used extensively, both as a food source and for draught power. Only recently have investigations begun exploring the buffalo genome for variation that might influence the health and productivity of these economically important animals. This paper describes the sequence variability of the toll-like receptor 5 (TLR5) gene, which recognizes bacterial flagellin and is a key player in the immune system. TLR5 comprises a single exon that is 2577 bp long and encodes 858 amino acids. We examined single-nucleotide polymorphisms (SNPs) located within the coding region. Overall, 17 SNPs were discovered, seven of which are non-synonymous. Our study population yielded four different haplotypes. We examined predicted protein domain structure and found that river buffalo, swamp buffalo, and African Forest buffalo shared the same protein domain structure and are more similar to each other than they are to cattle and American bison, which are similar to each other. PolyPhen 2 analysis revealed one amino acid substitution in the river buffalo population with potential functional significance.

13.
Cattle and water buffalo belong to the same subfamily, Bovinae, and share chromosome banding and gene order homology. In this study, we used the genome-wide Illumina BovineSNP50 BeadChip to analyze 91 DNA samples from three breeds of water buffalo (Nili-Ravi, Murrah and their crossbreds with local Guangxi buffalo in China) in order to assess the genetic divergence between cattle and water buffalo through a large single nucleotide polymorphism (SNP) transferability study at the whole-genome level; we also performed association analysis of functional traits in water buffalo. A total of 40,766 (75.5%) bovine SNPs were found in the water buffalo genome, but 49,936 (92.5%) had only one allele, and finally 935 were identified as polymorphic and useful for association analysis in water buffalo. Therefore, the genome sequences of water buffalo and cattle share a high level of homology, but the polymorphic status of the bovine SNPs varies between these two species. The different patterns of mutations between species may be associated with their phenotypic divergence due to genome evolution. Among the 935 bovine SNPs, we identified 9 and 7 SNPs significantly associated with fertility and milk production traits in water buffalo, respectively. However, further work with larger sample sizes is needed to verify these candidate SNPs for water buffalo.

14.
In this study, attempts have been made to identify and characterize the water buffalo (Bubalus bubalis) mammary derived growth inhibitor (MDGI) gene, isolated from a mammary gland cDNA library of lactating buffalo. The complete MDGI cDNA was 698 nucleotides long, consisting of 61 nucleotides in the 5′ UTR, a coding region of 402 nucleotides, and 235 nucleotides representing the 3′ UTR. Comparison of nucleotide and deduced amino acid sequence data with that of MDGI/fatty acid binding protein (FABP) of other species shows three buffalo-specific nucleotide changes, while seven nucleotide changes were common to cattle and buffalo. Buffalo and cattle MDGI had 100% amino acid sequence similarity and shared three amino acid changes, 34 (Ala-Gly), 109 (Leu-Met), and 132 (Glu-Gln), as compared to other species. Comparison with FABPs reported from other cattle tissues revealed the highest amino acid sequence similarity with FABP-heart (100%) and the lowest with FABP-liver (20.5%). Phylogenetic analysis revealed cattle MDGI to be closest to buffalo, while mouse MDGI was distantly placed, whereas among the different tissue-derived FABPs of cattle, FABP-heart was closest to and FABP-epidermis most distantly placed from buffalo MDGI. This report also differs from the earlier findings that MDGI is intermediate between FABP-heart and FABP-adipose.

16.
The genetic architecture of ischemic stroke is complex and is likely to include rare or low frequency variants with high penetrance and large effect sizes. Such variants are likely to provide important insights into disease pathogenesis compared to common variants with small effect sizes. Because a significant portion of human functional variation may derive from the protein-coding portion of genes, we undertook a pilot study to identify variation across the human exome (i.e., the coding exons across the entire human genome) in 10 ischemic stroke cases. Our efforts focused on evaluating the feasibility and identifying the difficulties of this type of research as it applies to ischemic stroke. The cases included 8 African-Americans and 2 Caucasians selected on the basis of similar stroke subtypes and by implementing a case selection algorithm that emphasized the genetic contribution to stroke risk. Following construction of paired-end sequencing libraries, all predicted human exons in each sample were captured and sequenced. Sequencing generated an average of 25.5 million read pairs (75 bp×2) and 3.8 Gbp per sample. After passing quality filters, screening the exomes against dbSNP demonstrated an average of 2839 novel SNPs among African-Americans and 1105 among Caucasians. In an aggregate analysis, 48 genes were found to have at least one rare variant across all stroke cases. One gene, CSN3, identified by screening our prior GWAS results in conjunction with our exome results, was found to contain an interesting coding polymorphism as well as excess rare variation compared with the other genes evaluated. In conclusion, while rare coding variants may predispose to ischemic stroke, this has yet to be definitively proven. Our study demonstrates the complexities of such research and highlights that while exome data can be obtained, the optimal analytical methods have yet to be determined.
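
The reported per-sample yield follows from the read-pair count and read length. As a minimal sketch, assuming every read in a pair is the full 75 bp and no bases are trimmed (the function name is hypothetical):

```python
def sequenced_bases(read_pairs, read_length, reads_per_pair=2):
    """Total bases produced by a paired-end run with fixed-length reads."""
    return read_pairs * reads_per_pair * read_length


total = sequenced_bases(25_500_000, 75)
print(total / 1e9)  # 3.825 (Gbp)
```

This gives about 3.8 Gbp, consistent with the average yield stated in the abstract.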

17.
The freshwater pufferfish Tetraodon nigroviridis is a model organism for studying the evolution of genomes and gene functions, but its mitochondrial genome (mtDNA) sequence is still not available. We determined the complete nucleotide sequence of its mtDNA using shotgun sequencing. The T. nigroviridis mtDNA was 16,462 bp long and contained 13 protein coding genes, 22 tRNAs, 2 rRNAs and a major non-coding region. The gene order was identical to the common type of vertebrate mtDNA, whereas the G + C content of the sense strand was 46.9%, much higher than in most other fish species. One hundred and three SNPs were detected in the control region of the mtDNA of 35 individuals; the majority of these SNPs were located at the 5' end of the control region. A phylogenetic study including 21 fish species was performed on concatenated amino acid sequences of 12 protein coding genes and revealed that T. nigroviridis clustered with Fugu rubripes. The complete mtDNA sequence and the SNPs in its control region will be useful in studying fish evolution, in differentiating Tetraodon species and in analyzing genetic diversity within T. nigroviridis.

18.
Background  LRP5 is known to have an important relationship with bone density and a variety of other biological processes. Mapping to human chromosome 11q13.2, LRP5 shows considerable evolutionary conservation. Orthologs of this gene exist in many species, although human LRP5 has not previously been compared with that of non-human primates.
Methods  We reported the complementary DNA (cDNA) sequence and deduced amino acid sequence for baboon LRP5, and compared the baboon and human sequences. cDNA sequences for 21 baboons were examined to identify single-nucleotide polymorphisms (SNPs).
Results  Sequences of coding regions in human and baboon LRP5 showed 97–99% homology. Twenty-five SNPs were identified in the coding region of baboon LRP5.
Conclusion  The observed degree of coding sequence homology in LRP5 led us to expect that the baboon may serve as a useful model for future research into the role(s) of this gene in primate metabolic diseases.

19.

Background

Techniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.

Results

We developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays.
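
The allele balance comparison described above can be sketched as follows. This minimal example assumes that each heterozygous site is summarized by a (reference reads, variant reads) pair and that allele balance is expressed as the reference read fraction; the function name and the toy counts are hypothetical and are not the study's data.

```python
def mean_reference_fraction(het_sites):
    """Mean fraction of reference-supporting reads over heterozygous sites.

    het_sites is an iterable of (ref_reads, alt_reads) pairs; sites with
    no reads are skipped.
    """
    fractions = [ref / (ref + alt) for ref, alt in het_sites if ref + alt > 0]
    if not fractions:
        raise ValueError("no heterozygous sites with coverage")
    return sum(fractions) / len(fractions)


# Toy read counts at four heterozygous positions (hypothetical).
sites = [(12, 8), (15, 15), (9, 6), (20, 20)]

print(round(mean_reference_fraction(sites), 3))  # 0.55
```

A mean above 0.5 would correspond to the tendency noted in the text, namely more reference bases than variant bases at heterozygous positions.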

Conclusions

Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons.

20.
The present study was carried out to identify and annotate genome-wide SNPs in the Murrah buffalo genome. A total of 21.2 million raw reads from 4 pooled female Murrah buffalo samples were obtained using restriction enzyme digestion followed by sequencing with the Illumina HiSeq 2000. After quality filtration, the reads were aligned to the Murrah buffalo genome (ICAR-NBAGR) and the Water buffalo genome (UMD_CASPUR_WB_2.0), which resulted in 99.37% and 99.67% of the reads aligning, respectively. A total of 130,688 high quality SNPs along with 35,110 indels were identified versus the Murrah buffalo genome. Similarly, 219,856 high quality SNPs along with 15,201 indels were identified versus the Water buffalo genome. We report 483 SNPs in 66 genes affecting milk production, 436 SNPs in 38 genes affecting fertility and 559 SNPs in 72 genes affecting other major traits. The average genome coverage was 13.4% and 14.8% versus the Murrah and Water buffalo genomes, respectively.
