Found 20 similar articles (search time: 0 ms)
1.
Martina Mijušković Stuart M. Brown Zuojian Tang Cory R. Lindsay Efstratios Efstathiadis Ludovic Deriano David B. Roth 《PloS one》2012,7(10)
Defining the architecture of a specific cancer genome, including its structural variants, is essential for understanding tumor biology, mechanisms of oncogenesis, and for designing effective personalized therapies. Short read paired-end sequencing is currently the most sensitive method for detecting somatic mutations that arise during tumor development. However, mapping structural variants using this method leads to a large number of false positive calls, mostly due to the repetitive nature of the genome and the difficulty of assigning correct mapping positions to short reads. This study describes a method to efficiently identify large tumor-specific deletions, inversions, duplications and translocations from low coverage data using SVDetect or BreakDancer software and a set of novel filtering procedures designed to reduce false positive calls. Applying our method to a spontaneous T cell lymphoma arising in a core RAG2/p53-deficient mouse, we identified 40 validated tumor-specific structural rearrangements supported by as few as 2 independent read pairs.
2.
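Adjacent inverted-repeat DNA fragments tend to form intrastrand secondary structures and are therefore difficult sequencing templates. This study addresses the problem of sequencing the inverted-repeat inserts of RNAi vectors, laying the groundwork for sequence verification of such vectors. An RNAi vector expressing tandem inverted-repeat fragments of wheat TaATG2 was constructed by conventional molecular cloning, and two strategies were designed to sequence-verify vectors that had been preliminarily identified by colony PCR: in the first, the intact vector plasmid was used as the sequencing template; in the second, the vector was first digested with restriction enzymes to excise one of the inverted-repeat fragments, and the linearized vector retaining the other fragment was then sequenced. The results showed that the first strategy was affected by the intrastrand secondary structure formed by the tandem inverted repeats: the sequencing signal decayed or gave mixed peaks at the inverted-repeat fragments, and the sequence could not be read. The second strategy eliminated the interference between the two inverted-repeat fragments; the fragment retained on the vector gave a clear sequencing signal and an accurate sequence. With the approach of excising one fragment by restriction digestion before sequencing, two digestions and two sequencing runs are enough to determine the sequences of both inverted-repeat fragments on the vector separately, and thus to confirm that the vector was constructed correctly.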
3.
Anastasia S. Khodakova Renee J. Smith Leigh Burgoyne Damien Abarno Adrian Linacre 《PloS one》2014,9(8)
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.
4.
Daniel Aguirre de Cárcer Stuart E. Denman Chris McSweeney Mark Morrison 《Applied and environmental microbiology》2011,77(17):6310-6312
The use and validation of a strategy that allows a universal set of bar-coded sequencing primers to be appended to an amplified PCR product is described. The strategy allows a modular approach, in that the same bar code can be used with two or more target-specific primer sets, even simultaneously.
5.
6.
Philip C. Zuzarte Robert E. Denroche Gordon Fehringer Hagit Katzov-Eckert Rayjean J. Hung John D. McPherson 《PloS one》2014,9(4)
We describe a method for pooling and sequencing DNA from a large number of individual samples while preserving information regarding sample identity. DNA from 576 individuals was arranged into four 12 row by 12 column matrices and then pooled by row and by column resulting in 96 total pools with 12 individuals in each pool. Pooling of DNA was carried out in a two-dimensional fashion, such that DNA from each individual is present in exactly one row pool and exactly one column pool. By considering the variants observed in the rows and columns of a matrix we are able to trace rare variants back to the specific individuals that carry them. The pooled DNA samples were enriched over a 250 kb region previously identified by GWAS to significantly predispose individuals to lung cancer. All 96 pools (12 row and 12 column pools from 4 matrices) were barcoded and sequenced on an Illumina HiSeq 2000 instrument with an average depth of coverage greater than 4,000×. Verification based on Ion PGM sequencing confirmed the presence of 91.4% of confidently classified SNVs assayed. In this way, each individual sample is sequenced in multiple pools providing more accurate variant calling than a single pool or a multiplexed approach. This provides a powerful method for rare variant detection in regions of interest at a reduced cost to the researcher.
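The row-and-column tracing described in this abstract can be illustrated with a minimal sketch. It assumes a single 12 × 12 matrix with 0-based (row, column) positions; the function names `pools_with_variant` and `candidate_carriers` are hypothetical and are not taken from the paper, and real data would also need variant calling and error handling that are not shown here.

```python
from itertools import product


def pools_with_variant(carriers):
    """Simulate which row and column pools show a variant, given carrier positions."""
    row_hits = {row for row, _ in carriers}
    col_hits = {col for _, col in carriers}
    return row_hits, col_hits


def candidate_carriers(row_hits, col_hits):
    """Return every (row, column) cell consistent with the observed pool hits."""
    return sorted(product(row_hits, col_hits))


# One carrier: the row pool and column pool intersect in exactly one cell.
single = candidate_carriers(*pools_with_variant([(3, 7)]))

# Two carriers in different rows and columns: four cells are consistent,
# so the assignment is ambiguous without further evidence.
double = candidate_carriers(*pools_with_variant([(1, 2), (4, 5)]))

print(single)
print(double)
```

The second case suggests why this kind of design suits rare variants: a variant carried by one individual per matrix resolves to a single cell, while variants with several carriers leave more than one consistent assignment.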
7.
Mutant screens have proven powerful for genetic dissection of a myriad of biological processes, but subsequent identification and isolation of the causative mutations are usually complex and time consuming. We have made the process easier by establishing a novel strategy that employs whole-genome sequencing to simultaneously map and identify mutations without the need for any prior genetic mapping.

The challenges posed by the identification of a causal mutation in a mutant of interest have in effect restricted the use of forward genetics to those organisms benefiting from a solid genetic toolbox. Whole-genome sequencing (WGS) is promising to revolutionize the way phenotypic traits are assigned to genes. However, current strategies to identify causal mutations using WGS require first the identification of an approximate genomic location containing the mutation of interest (Sarin et al. 2008; Smith et al. 2008; Srivatsan et al. 2008; Blumenstiel et al. 2009; Irvine et al. 2009). This is because genomes contain many natural sequence variations (Denver et al. 2004; Hillier et al. 2008; Sarin et al. 2010), which, along with mutagen-induced ones, complicate the identification of the causal mutation when an approximate genomic location has not been previously identified. Mapping has previously been achieved with time-consuming and laborious techniques that, in addition, rely on an organism's single-nucleotide polymorphism (SNP) map and established variant strains. For example, traditional SNP-based mapping (Wicks et al. 2001; Davis et al. 2005) has previously been used in Caenorhabditis elegans to narrow down the genomic region containing the mutation of interest, prior to conducting WGS (Sarin et al. 2008). In Arabidopsis, simultaneous SNP mapping and mutation identification has been achieved with WGS, but this requires the generation of a mapping population of up to 500 F2 progeny to identify only one allele (Schneeberger et al. 2009).
This is a challenging prospect for many model systems. Indeed, if the mutant phenotype is subtle, the isolation of such numbers of recombinants is very tedious. Furthermore, it is not applicable in those organisms where a mapping population cannot be generated, simply because of a lack of intercrossable variants or because of life cycles (parasitic organisms, for example) that would make it extremely difficult to follow and isolate many recombinant individuals.

Here, we describe a strategy to simultaneously and rapidly locate and identify multiple mutations from a mutagenesis screen with WGS that circumvents these limitations. This powerful and straightforward method directly uses mutagen-induced nucleotide changes that are linked to the causal mutation to identify its specific genomic location, thus negating the construction of genetic mapping populations and subsequent mapping.

Treatment of organisms with a chemical mutagen induces nucleotide changes throughout the genome. Following mutagenesis, backcrossing or outcrossing of the mutagenized organism to unmutagenized counterparts is performed to eliminate mutagen-induced mutations (Figure 1A; supporting information, File S2). The phenotype-causing mutation remains as only backcrossed individuals showing the phenotype of interest are retained. In addition, mutagen-induced nucleotide changes that are genetically linked to the causal mutation and physically surround it on the chromosome will remain, in contrast to unlinked nucleotide changes (Figure 1A). As a result of this genetic linkage, a high-density cluster of typical mutagen-induced variants is visualized from sequence data obtained by WGS, which is positioned around the causal mutation.
By locating such high-density regions, one maps the approximate genomic location of the causal mutation and subsequently identifies the affected gene within this region.

Figure 1. Mapping mutations on the basis of density of mutagen-induced DNA damage across the genome. (A) Visual representation of our WGS cloning strategy. Mutagen treatment induces point mutations throughout the genome (red asterisks). Backcrossing to the original unmutated parent strain removes much of the mutagen-induced nucleotide changes except for the causal mutation (green asterisk) and those genetically linked to it. WGS sequencing can be used to detect canonical mutagen-induced point mutations, thus revealing a physical position for the causal mutation. Shared background variants (yellow crosses) are filtered out from WGS data by comparing the sequences of mutants sequenced side-by-side, revealing a high-density variant cluster in only one genomic region. Importantly, genomic sequences of mutants derived from the same starting strain must be compared, to allow subtraction of nucleotide variants that are common to this particular strain, through sequence comparison. (B) Physical map of total nucleotide variations per megabase across the genome compared to the wild-type reference genome for each mutant (fp6, fp9, and fp12) after WGS. (C) After sequence quality filtering, subtraction of common variants between the 3 mutants, and filtering out noncanonical EMS nucleotide changes, high-density variant peaks are obtained in one genomic location for each mutant (red boxes). Steps 1 and 3 are essential for clear visualization of the high-density peaks whereas step 2 improves visualization. (D) Close-up of variants on chromosome III for fp6. Within this peak we identified only 6 candidate mutations that could potentially affect a protein sequence. We confirmed that the missense mutation in egl-5 was the causal mutation (Figure S2).
For fp9 and fp12 we identified only 10 (9 missense and 1 3′-UTR) and 4 (2 premature stop and 2 missense) candidate mutations, respectively, within each mutant's EMS-based mapped region. Thus, our method consistently allowed precise mapping in 3 different mutants to a region small enough to contain only a handful of candidate mutations.

As a proof-of-principle, we simultaneously mapped and sequenced the causal mutations of multiple C. elegans mutants isolated from an EMS mutagenesis screen using this strategy. The mutagenesis screen itself was undertaken to identify genes that controlled the reprogramming of a single cell called Y into another cell called PDA during C. elegans development (Jarriault et al. 2008). After EMS treatment, three distinct mutant alleles (fp6, fp9, and fp12) were backcrossed to the original unmutagenized strain 4–6×. It is important to note that a backcrossing or outcrossing step is necessary for the analysis of mutants obtained from all mutagenesis screens, irrespective of the type of mutant identification strategy used or the type of mutagen or organism used (and, as such, does not represent an extra step introduced by our method). The mutants then underwent WGS side-by-side (Table S1, Table S2, Figure S1, and File S2). After alignment to the wild-type N2 reference genome using MAQgene software (Bigelow et al. 2009), the sequencing data obtained for each mutant were compared, and we subtracted common nucleotide variants that were shared between at least two of our three mutants (File S1). These shared variants, which are very unlikely to be either the causal mutation or EMS-induced mutations from the screen itself, represent strain differences between the N2 used to generate the reference genome and the PS3662 strain used here for mutagenesis. Note that this step eliminated ∼2000 point mutations as potential candidates for our causal mutation.
This result strongly emphasizes the advantage of conducting WGS on two or more mutants side-by-side, as reference genomes may contain many nucleotide variations when compared to organisms sequenced from the laboratory (Denver et al. 2004; Hillier et al. 2008; Sarin et al. 2010; this study) and as such would confound mutation identification.

To identify EMS-induced changes linked to the causal mutation and expose its location, we looked only at variants that matched the canonical EMS-induced G/C > A/T transitions (Drake and Baltz 1976), revealing localized peaks of high-density variation on a single chromosome for each mutant (Figure 1, B and C). These peaks correspond to regions of high mutagen-induced damage that were not removed during backcrossing and therefore are most likely genetically linked to the causal mutation. We therefore focused our attention on these physical regions to identify candidate mutations within them. We localized fp6 to a 4.29-Mb region on chromosome III, fp9 to a 7.11-Mb region on chromosome X, and fp12 to a 1.28-Mb region on a different part of chromosome X (Figure 1C).

As a proof of principle, we further examined the nucleotide changes present in the interval to which fp6 was linked. Taking into consideration all variant types (point mutations and indels), we identified only six candidate mutations that potentially affected a gene's function (Figure 1D and Table S3). One of these, affecting the egl-5/hox gene, lies almost perfectly in the middle of the predicted EMS-based mapped region. We confirmed the existence of the mutation in egl-5 by manual resequencing. Both egl-5 targeted RNAi and noncomplementation with the egl-5(n945) null allele confirmed that fp6 affected egl-5 and caused the Y-to-PDA reprogramming defect (Figure S2). fp9 and fp12 each map to distinct regions on chromosome X that also contain only a handful of candidate mutations (10 and 4, respectively) (Figure 1C).
Thus, our method consistently allowed precise mapping in 3 different mutants to a region small enough to contain only a handful of candidate mutations and subsequent identification of the causal mutation.

We calculated that comparison of WGS data for only two mutants of the same mutagenesis screen is sufficient to localize and sequence the causal mutation (Table S4). Thirteen times sequence coverage has been found to be sufficient to identify a mutation in a pre-SNP mapped C. elegans mutant (Shen et al. 2008). Here, we tested the sequence coverage necessary to perform simultaneous mapping and mutant identification using our strategy and found that 13× was more than enough (Table S4). In addition, by performing longer reads and/or paired-end sequencing, our method can be scaled up to bigger genomes or allow multiple mutant sequencing on each flow cell lane [e.g., using multiplex WGS (Cronn et al. 2008)]. Furthermore, because direct sequence comparison is ultimately made between two mutants sequenced side-by-side, the quality of an organism's reference genome (which is used only for alignment purposes) does not have a bearing on the mapping or mutant identification outcome. Moreover, recent advances in de novo alignment of short reads generated from next generation sequencing platforms (Li et al. 2010; Nowrousian et al. 2010; Webb and Rosenthal 2010; Young et al. 2010) suggest that a reference genome may not even be required to perform mutagen-based mapping and mutant identification with WGS. We predict that technical advances in these areas will make it possible to perform mutagenesis screens on any nonsequenced and genetically uncharacterized organism and use our strategy to quickly identify the causal mutation of an interesting mutant.
We found that all of the minimal requirements tested here were more than adequate to use our mapping strategy. Therefore, it is possible that fewer backcrosses and less sequencing coverage may suffice than is shown here. For example, for genomes with a similar size to C. elegans (∼100 Mb), this method can easily be scaled up by sequencing eight mutants per flow cell. As for any WGS experiments, total cost depends on genome size.

By eliminating any prior work except for back/outcrossing, a necessary step for any mutant characterization, our simple and quick strategy provides a significant saving of time and labor as the time needed to map and identify a candidate causal mutation is trimmed down to the sequencing time (currently 7 days) and sequence analysis time (<1 day, see Table 1).
TABLE 1
Summary of WGS cloning strategy
 | Conditions used | Minimal requirements tested |
---|---|---|
Backcrossing | 4–6× | 4× enough |
No. of mutants sequenced | 3 | 2 enough |
Sequencing of mutant | 2× flow cell lanes, paired-end reads (57mer) | 1× flow cell lane enough, single-end reads (57mer) enough |
Average sequence coverage | 52.2–55.3× | 13.6× enough |
Advantages | | |
Any SNP or genetic map information is not necessary | | |
No prior wet lab work necessary: generation of a recombinant mapping population is not necessary | | |
Multiple alleles identified at once | | |
Amenable to scaling up: can be equally used for bigger genomes | | |
Fast: 7 days sequencing, 12 hr MAQGene alignment, and 1 hr mapping | | |
Modest sequence coverage requirements limit cost | | |
Reference genome sequence quality is not important and may not even be necessary | | |
Very straightforward without any specialized software | | |
Requirement | | |
Species must be amenable to mutagenesis and backcrossing | | |
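The filtering steps in the article above (subtracting variants shared between mutants, keeping only canonical EMS transitions, and counting what remains per megabase) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: variants are assumed to be `(chromosome, position, ref, alt)` tuples already called against the reference, the function names are hypothetical, the example coordinates are invented, and sequence quality filtering is omitted.

```python
from collections import Counter

# Canonical EMS-induced changes: G/C > A/T transitions.
EMS_CANONICAL = {("G", "A"), ("C", "T")}


def unique_variants(calls_by_mutant):
    """Drop variants seen in two or more mutants (likely background strain differences)."""
    seen = Counter(v for calls in calls_by_mutant.values() for v in set(calls))
    return {
        mutant: [v for v in calls if seen[v] == 1]
        for mutant, calls in calls_by_mutant.items()
    }


def ems_density(variants, bin_size=1_000_000):
    """Count canonical EMS transitions per (chromosome, megabase bin)."""
    counts = Counter()
    for chrom, pos, ref, alt in variants:
        if (ref, alt) in EMS_CANONICAL:
            counts[(chrom, pos // bin_size)] += 1
    return counts


# Invented toy calls for two mutants; ("X", 42, "G", "A") is shared background.
calls = {
    "fp6": [("III", 5_200_000, "G", "A"), ("III", 5_900_000, "C", "T"),
            ("I", 100, "A", "G"), ("X", 42, "G", "A")],
    "fp9": [("X", 42, "G", "A"), ("X", 9_000_000, "C", "T")],
}
filtered = unique_variants(calls)
density_fp6 = ems_density(filtered["fp6"])
peak_fp6 = max(density_fp6, key=density_fp6.get)
print(peak_fp6, density_fp6[peak_fp6])
```

In this toy input the densest bin for fp6 falls on chromosome III; with real data the peak would then be inspected for candidate mutations affecting gene function.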
8.
9.
Effects of GC Content and Mutational Pressure on the Lengths of Exons and Coding Sequences (cited 6 times in total: 0 self-citations, 6 citations by others)
It has been hypothesized that the length of an exon tends to increase with the GC content because stop codons are AT-rich and should occur less frequently in GC-rich exons. This prediction assumes that mutation pressure plays a significant role in the occurrence and distribution of stop codons. However, the prediction is applicable not to all exons, but only to the last coding exon of a gene and to single-exon CDS sequences. We classified exons in multiexon genes in eight eukaryotic species into three groups (the first exon, the internal exons, and the last exon) and computed the Spearman correlation between the exon length and the percentage GC (%GC) for each of the three groups. In only five of the species studied is the correlation for the last coding exon greater than that for the first or internal exons. For the single-exon CDS sequences, the correlation between CDS length and %GC is mostly negative. Thus, eukaryotic genomes do not support the predicted relationship between exon length and %GC. In prokaryotic genomes, CDS length and %GC are positively correlated in each of the 68 completely sequenced prokaryotic genomes in GenBank with genomic GC contents varying from 25 to 68%, except for the wall-less Mycoplasma genitalium and the syphilis pathogen Treponema pallidum. Moreover, the average CDS length and the genomic GC content are also positively correlated. After correcting for genome size, the partial correlation between the average CDS length and the genomic GC content is 0.3217 (p < 0.025).
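For readers unfamiliar with the statistic used in this abstract, the Spearman correlation is the Pearson correlation computed on ranks. The following is a small self-contained sketch; the helper names are hypothetical, tied values receive average ranks, and the exon lengths and %GC values are invented for illustration rather than taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: exon lengths (bp) and %GC for five hypothetical exons.
exon_lengths = [120, 340, 95, 510, 220]
percent_gc = [41.0, 55.5, 38.2, 60.1, 47.3]
rho = spearman(exon_lengths, percent_gc)
print(rho)
```

Because the statistic depends only on rank order, any monotonically increasing relationship gives a value of 1 regardless of how nonlinear it is.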
10.
Behrang Barekatain Dariush Khezrimotlagh Mohd Aizaini Maarof Hamid Reza Ghaeini Shaharuddin Salleh Alfonso Ariza Quintana Behzad Akbari Alicia Triviño Cabrera 《PloS one》2013,8(8)
In recent years, Random Network Coding (RNC) has emerged as a promising solution for efficient Peer-to-Peer (P2P) video multicasting over the Internet, probably because RNC noticeably increases the error resiliency and throughput of the network. However, the high transmission overhead that arises from sending a large coefficients vector as a header has been the most important challenge of RNC. Moreover, because the Gauss-Jordan elimination method is employed, considerable computational complexity can be imposed on peers when decoding the encoded blocks and checking linear dependency among the coefficients vectors. To address these challenges, this study introduces MATIN, a random network coding based framework for efficient P2P video streaming. MATIN includes a novel coefficients matrix generation method that ensures there is no linear dependency in the generated coefficients matrix. Using the proposed framework, each peer encapsulates one instead of n coefficients entries into the generated encoded packet, which results in very low transmission overhead. It is also possible to obtain the inverted coefficients matrix using a small number of simple arithmetic operations, so peers sustain very low computational complexity. As a result, MATIN permits random network coding to be more efficient in P2P video streaming systems. Simulation results obtained using OMNET++ show that it substantially outperforms the RNC that uses the Gauss-Jordan elimination method, providing better video quality on peers in terms of four important performance metrics: video distortion, dependency distortion, end-to-end delay and initial startup delay.
11.
Due to the growth of interest in single-cell genomics, computational methods for distinguishing true variants from artifacts are highly desirable. While special attention has been paid to false positives in variant or mutation calling from single-cell sequencing data, an equally important but often neglected issue is that of false negatives derived from allele dropout during the amplification of single cell genomes. In this paper, we propose a simple strategy to reduce the false negatives in single-cell sequencing data analysis. Simulation results show that this method is highly reliable, with an error rate of 4.94×10⁻⁵, which is orders of magnitude lower than the expected false negative rate (~34%) estimated from a single-cell exome dataset, though the method is limited by the low SNP density in the human genome. We applied this method to analyze the exome data of a few dozen single tumor cells generated in previous studies, and extracted cell specific mutation information for a small set of sites. Interestingly, we found that there are difficulties in using the classical clonal model of tumor cell growth to explain the mutation patterns observed in some tumor cells.
12.
13.
14.
15.
16.
Cloning, Sequencing, and Disruption of the Bacillus subtilis psd Gene Coding for Phosphatidylserine Decarboxylase (cited 2 times in total: 0 self-citations, 2 citations by others)
Kouji Matsumoto Masahiro Okada Yuko Horikoshi Hiroshi Matsuzaki Tsutomu Kishi Mitsuhiro Itaya Isao Shibuya 《Journal of bacteriology》1998,180(1):100-106
The psd gene of Bacillus subtilis Marburg, encoding phosphatidylserine decarboxylase, has been cloned and sequenced. It encodes a polypeptide of 263 amino acid residues (deduced molecular weight of 29,689) and is located just downstream of pss, the structural gene for phosphatidylserine synthase that catalyzes the preceding reaction in phosphatidylethanolamine synthesis (M. Okada, H. Matsuzaki, I. Shibuya, and K. Matsumoto, J. Bacteriol. 176:7456–7461, 1994). Introduction of a plasmid containing the psd gene into temperature-sensitive Escherichia coli psd-2 mutant cells allowed growth at otherwise restrictive temperature. Phosphatidylserine was not detected in the psd-2 mutant cells harboring the plasmid; it accumulated in the mutant up to 29% of the total phospholipids without the plasmid. An enzyme activity that catalyzes decarboxylation of 14C-labeled phosphatidylserine to form phosphatidylethanolamine was detected in E. coli psd-2 cells harboring a Bacillus psd plasmid. E. coli cells harboring the psd plasmid, the expression of which was under the control of the T7φ10 promoter, produced proteins of 32 and 29 kDa upon induction. A pulse-labeling experiment suggested that the 32-kDa protein is the primary translation product and is processed into the 29-kDa protein. The psd gene, together with pss, was located by Southern hybridization to the 238- to 306-kb SfiI-NotI fragment of the chromosome. A B. subtilis strain harboring an interrupted psd allele, psd1::neo, was constructed. The null psd mutant contained no phosphatidylethanolamine and accumulated phosphatidylserine. It grew well without supplementation of divalent cations which are essential for the E. coli pssA null mutant lacking phosphatidylethanolamine. In both the B. subtilis null pss and psd mutants, glucosyldiacylglycerol content increased two- to fourfold. The results suggest that the lack of phosphatidylethanolamine in the B. subtilis membrane may be compensated for by the increases in the contents of glucosyldiacylglycerols by an unknown mechanism.
17.
18.
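Using the bifunctional plasmid pGPB14, which carries an erythromycin resistance gene and an ampicillin resistance gene lacking its promoter and signal peptide sequence, as a probe vector, promoter-signal peptide sequences of Bacillus subtilis were cloned and the cloned fragments were sequenced. B. subtilis chromosomal DNA was digested with Sau3A, ligated to BamHI-digested pGPB14, and transformed into Escherichia coli C600, and transformants resistant to both ampicillin and erythromycin were selected. Recombinant plasmids were extracted from the doubly resistant transformants, and restriction analysis showed that the cloned DNA fragments ranged from 0.27 to 1.5 kb. The DNA sequences of 10 cloned fragments were determined by Sanger's dideoxy chain-termination method; all of the cloned fragments contained a promoter, a ribosome binding site and a signal peptide sequence. The cloned fragments restored the ampicillin-resistant phenotype in both E. coli and B. subtilis. β-Lactamase activity assays showed that in E. coli the enzyme activity accumulated mainly in the periplasmic space, whereas in B. subtilis it was mainly secreted outside the cell.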
19.
20.
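Using RT-PCR and TAIL-PCR, the complete coding region of the genome of a Batai virus isolated in China (strain YN92-4) was sequenced and analyzed for the first time. The results show that the genome of strain YN92-4 consists of three segments, S, M and L, of 947, 4,371 and 6,860 nucleotides, respectively. The S segment encodes a nucleocapsid protein of 234 amino acid residues and a nonstructural protein of 102 amino acid residues, the M segment encodes a precursor protein of 1,435 amino acid residues, and the L segment encodes an RNA polymerase of 2,239 amino acid residues. Comparison of the complete coding sequences with those of Batai virus isolates from other countries showed that YN92-4 shares the highest nucleotide (amino acid) identity with the Japanese bovine serum isolate (strain ON-7/B/01) in the S and M segments, at 97.7% (100%) and 95.7% (98%), respectively. Because this is the first study of the nucleotide sequence of the Batai virus L segment, no reference sequence was available in international gene databases; the L segment of the Chinese isolate was therefore compared with that of Bunyamwera virus, the representative virus of the same serogroup, giving nucleotide and amino acid identities of 73.5% and 81.6%, respectively. Phylogenetic analysis showed that the genome segments of strain YN92-4 and those of other Batai virus isolates form independent sub-branches within their respective branches. This study suggests that the Batai virus strain YN92-4 isolated in China has not undergone gene reassortment (...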