Similar articles
20 similar articles found.
1.

Background

Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore, an algorithm that reliably classifies variants would be helpful for retrospective exploratory analyses. Contamination of tumor samples with normal cells results in differences in expected allelic fractions of germline and somatic variants, which can be exploited to accurately infer genotypes after adjusting for local copy number. However, existing algorithms for determining tumor purity, ploidy and copy number are not designed for unmatched short read sequencing data.
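The allelic-fraction argument above can be made concrete with a small worked example. The sketch below is not PureCN's implementation; it only illustrates the standard mixture reasoning, assuming a diploid normal genome, a single tumor clone, and a heterozygous germline variant. The function name and its parameters are hypothetical.

```python
def expected_allelic_fraction(purity, tumor_copies, variant_copies, germline):
    """Expected fraction of reads carrying the variant allele.

    purity         -- fraction of tumor cells in the sample (0..1)
    tumor_copies   -- total local copy number in tumor cells
    variant_copies -- number of tumor copies that carry the variant
    germline       -- True for a heterozygous germline variant, which is
                      also present on one of the two copies in normal cells
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= variant_copies <= tumor_copies:
        raise ValueError("variant_copies must be between 0 and tumor_copies")
    # Assumption: contaminating normal cells are diploid.
    normal_variant_copies = 1 if germline else 0
    variant_dna = purity * variant_copies + (1.0 - purity) * normal_variant_copies
    total_dna = purity * tumor_copies + (1.0 - purity) * 2
    if total_dna == 0:
        raise ValueError("no DNA at this locus (pure tumor with zero copies)")
    return variant_dna / total_dna


if __name__ == "__main__":
    # 60% pure tumor, copy-neutral locus, variant on one tumor copy.
    somatic = expected_allelic_fraction(0.6, 2, 1, germline=False)
    inherited = expected_allelic_fraction(0.6, 2, 1, germline=True)
    print(f"somatic: {somatic:.2f}, germline: {inherited:.2f}")
```

In this toy setting the somatic variant is expected at a lower allelic fraction than the germline one, which is the difference the abstract proposes to exploit once purity and local copy number have been estimated.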

Results

We describe a methodology and corresponding open source software for estimating tumor purity, copy number, loss of heterozygosity (LOH), and contamination, and for classification of single nucleotide variants (SNVs) by somatic status and clonality. This R package, PureCN, is optimized for targeted short read sequencing data, integrates well with standard somatic variant detection pipelines, and has support for matched and unmatched tumor samples. Accuracy is demonstrated on simulated data and on real whole exome sequencing data.

Conclusions

Our algorithm provides accurate estimates of tumor purity and ploidy, even if matched normal samples are not available. This in turn allows accurate classification of SNVs. The software is provided as open source (Artistic License 2.0) R/Bioconductor package PureCN (http://bioconductor.org/packages/PureCN/).

2.

Background

The Ahringer C. elegans RNAi feeding library, prepared by cloning genomic DNA fragments, has been widely used in genome-wide analysis of gene function. However, the library has not been thoroughly validated by direct sequencing, and there are potential errors, including: 1) mis-annotation (a clone with a retired gene name should be remapped to the actual target gene); 2) nonspecific PCR amplification; 3) cross-RNAi; 4) handling errors such as sample loading mistakes.

Results

Here we performed a reliability analysis on the Ahringer C. elegans RNAi feeding library, which contains 16,256 bacterial strains, using a bioinformatics approach. Results demonstrated that most (98.3%) of the bacterial strains in the library are reliable. However, we also found that 2,851 (17.54%) bacterial strains need to be re-annotated even though they are reliable. Most of these strains are clones carrying retired gene names. In addition, 28 strains are classified as unreliable and 226 strains as marginal because they probably express unrelated double-stranded RNAs (dsRNAs). The accuracy of the prediction was further confirmed by direct sequencing analysis of 496 bacterial strains. Finally, a freely accessible database named CelRNAi (http://biocompute.bmi.ac.cn/CelRNAi/) was developed as a complementary resource for the feeding RNAi library, providing the predicted information on all bacterial strains. Moreover, users may submit direct sequencing results or other annotations for the bacterial strains, which will be integrated into the CelRNAi database to improve the accuracy of the library. In addition, we provide five candidate primer sets for each of the unreliable and marginal bacterial strains so that users can construct an alternative vector for their own RNAi studies.

Conclusions

Because of the potential unreliability of the Ahringer C. elegans RNAi feeding library, we strongly suggest that users examine the reliability information for the bacterial strains in the CelRNAi database before performing RNAi experiments and again during post-experiment analysis.

3.
Background

Polycystic kidney disease (PKD) is an autosomal recessive disorder resulting from mutations in the PKHD1 gene on chromosome 6 (6p12), a large gene spanning 470 kb of genomic DNA.

Objective

The aim of the present study was to report newly identified mutations in the PKHD1 gene in two Iranian families with PKD.

Materials and Methods

Genetic alterations in a 3-month-old boy and a 27-year-old woman with PKD were evaluated using whole-exome sequencing. PCR direct sequencing was performed to analyse co-segregation of the variants with the disease in the family. Finally, the molecular function of the identified novel mutations was evaluated in silico.

Results

In the 3-month-old boy, a novel homozygous frameshift mutation was detected in the PKHD1 gene, which can cause PKD. Moreover, we identified three novel heterozygous missense mutations in the ATIC, VPS13B, and TP53RK genes. In the 27-year-old woman, who had a history of two recurrent abortions and two infant deaths in the early weeks of life due to metabolic and/or renal disease, we detected a novel missense mutation in the PKHD1 gene and a novel mutation in the ETFDH gene.

Conclusion

In general, we have identified two novel mutations in the PKHD1 gene. These molecular findings can help accurately correlate genotype and phenotype in families with such disease in order to reduce patient births through preoperative genetic diagnosis or better management of disorders.

4.

Background

Inherited cardiac conduction diseases (CCD) are rare but are caused by mutations in a myriad of genes. Recently, whole-exome sequencing has successfully led to the identification of causal mutations for rare monogenic Mendelian diseases.

Objective

To investigate the genetic background of a family affected by inherited CCD.

Methods and Results

We used whole-exome sequencing to study a Chinese family with multiple family members affected by CCD. Using the pedigree information, we proposed a heterozygous missense mutation (c.G695T, Gly232Val) in the lamin A/C (LMNA) gene as a candidate mutation for susceptibility to CCD in this family. The mutation is novel and is expected to affect the conformation of the coiled-coil rod domain of LMNA according to a structural model prediction. Its pathogenicity in lamina instability was further verified by expressing the mutation in a cellular model.

Conclusions

Our results suggest that whole-exome sequencing is a feasible approach to identifying the candidate genes underlying inherited conduction diseases.

5.
6.
New earthworm samples from Cyprus are assessed and discussed. A re-evaluation of specimens previously assigned to the Southern Alpine species Perelia nematogena (Rosa, 1903) revealed two independent species: Perelia phoebea (Cognetti, 1913), originally described from Rhodes Island (Greece), and an undescribed species, Perelia makrisi sp. n. The new species is also similar to the Levantine Pe. galileana Csuzdi & Pavlíček, 2005 and corroborates the hypothesis that the autochthonous earthworm fauna of Cyprus is of Levantine origin.

http://www.zoobank.org/urn:lsid:zoobank.org:pub:FD1996DC-2FFC-42D5-A1D2-005B50E6FC64

7.
Two new harvestmen species of the family Phalangiidae, Rilaena caucasica sp. n. and Rilaena silhavyi sp. n., are diagnosed, illustrated, and described from the Caucasus region. Comparative illustrations of the related Rilaena anatolica (Roewer, 1956), R. atrolutea (Roewer, 1915) and R. kelbajarica Snegovaya & Pkhakadze, 2014 are given.

http://www.zoobank.org/urn:lsid:zoobank.org:pub:7B29FD94-45A2-4E32-A41E-3276E016410B

8.

Background

Double minute chromosomes are circular fragments of DNA whose presence is associated with the onset of certain cancers. Double minutes are lethal, as they are highly amplified and typically contain oncogenes. Locating double minutes can supplement the process of cancer diagnosis, and it can help to identify therapeutic targets. However, there is currently a dearth of computational methods available to identify double minutes. We propose a computational framework for the identification of double minute chromosomes using next-generation sequencing data. Our framework integrates predictions from algorithms that detect DNA copy number variants with predictions from algorithms that locate genomic structural variants. This information is used by a graph-based algorithm to predict the presence of double minute chromosomes.
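One way to picture the graph-based step is as a search for cycles among amplified segments. The sketch below is an assumption-laden illustration, not the authors' algorithm: it treats each amplified segment as a node, treats each structural-variant junction as a directed edge from one segment to the next, and reports simple cycles as candidate circular structures. Breakpoint coordinates and strand orientation are ignored, and all names are hypothetical.

```python
def find_circular_amplicons(amplified_segments, junctions):
    """Return simple cycles of amplified segments linked by junctions.

    amplified_segments -- iterable of segment identifiers flagged as
                          amplified by a copy number caller
    junctions          -- iterable of (from_segment, to_segment) pairs, each
                          meaning a structural variant joins the end of
                          from_segment to the start of to_segment
    """
    amplified = set(amplified_segments)
    graph = {segment: set() for segment in amplified}
    for src, dst in junctions:
        # Keep only junctions whose two ends both lie in amplified segments.
        if src in amplified and dst in amplified:
            graph[src].add(dst)

    cycles = []

    def extend(start, node, path, on_path):
        for nxt in sorted(graph[node]):
            if nxt == start:
                cycles.append(list(path))  # closed a cycle back to start
            elif nxt > start and nxt not in on_path:
                # Only visit nodes ordered after start, so each cycle is
                # reported once, beginning at its smallest segment.
                path.append(nxt)
                on_path.add(nxt)
                extend(start, nxt, path, on_path)
                path.pop()
                on_path.remove(nxt)

    for start in sorted(amplified):
        extend(start, start, [start], {start})
    return cycles


if __name__ == "__main__":
    segments = ["A", "B", "C", "D"]
    links = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
    print(find_circular_amplicons(segments, links))
```

In the toy input, segments A, B and C close a loop while D hangs off it, so only the A, B, C cycle is reported as a candidate.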

Results

Using a previously published copy number variant algorithm and two structural variation prediction algorithms, we implemented our framework and tested it on a dataset consisting of simulated double minute chromosomes. Our approach uncovered double minutes with high accuracy, demonstrating its plausibility.

Conclusions

Although we only tested the framework with three programs (RDXplorer, BreakDancer, Delly), it can be extended to incorporate results from programs that (1) detect amplified copy number and (2) detect genomic structural variants such as deletions, translocations, inversions, and tandem repeats. The software that implements the framework can be accessed here: https://github.com/mhayes20/DMFinder

9.
Whole-genome sequencing of tumor tissue has the potential to provide comprehensive characterization of genomic alterations in tumor samples. We present Patchwork, a new bioinformatic tool for allele-specific copy number analysis using whole-genome sequencing data. Patchwork can be used to determine the copy number of homologous sequences throughout the genome, even in aneuploid samples with moderate sequence coverage and tumor cell content. No prior knowledge of average ploidy or tumor cell content is required. Patchwork is freely available as an R package, installable via R-Forge (http://patchwork.r-forge.r-project.org/).

10.
The Glacidorbidae, a family restricted to the Gondwanan realm (Tasmania, southeastern and southwestern Australia, and southern Argentina and Chile), previously included five genera with 20 identified species; 19 of them are Australian, with one genus and species, Gondwanorbis magallanicus (Meier-Brook & Smith, 1976), from South America. Here we describe two new species of Gondwanorbis: Gondwanorbis fueguensis n. sp. from the freshwater gastropod province of Southern Patagonia (Argentina) and Gondwanorbis tricarinatus n. sp. from Chile, and a new genus and species from the freshwater gastropod province of Northern Patagonia (Argentina), Patagonorbis nahuelhuapensis n. gen. and n. sp.

http://www.zoobank.org/urn:lsid:zoobank.org:pub:62EA0972-3AEF-4188-8E6D-F10895CE2BEF

11.

Background  

There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost effective alternatives are important.

12.

Background

Genomic deletions, inversions, and other rearrangements known collectively as structural variations (SVs) are implicated in many human disorders. Technologies for sequencing DNA provide a potentially rich source of information in which to detect breakpoints of structural variations at base-pair resolution. However, accurate prediction of SVs remains challenging, and existing informatics tools predict rearrangements with significant rates of false positives or negatives.

Results

To address this challenge, we developed ‘Structural Variation detection by STAck and Tail’ (SV-STAT) which implements a novel scoring metric. The software uses this statistic to quantify evidence for structural variation in genomic regions suspected of harboring rearrangements. To demonstrate SV-STAT, we used targeted and genome-wide approaches. First, we applied a custom capture array followed by Roche/454 and SV-STAT to three pediatric B-lineage acute lymphoblastic leukemias, identifying five structural variations joining known and novel breakpoint regions. Next, we detected SVs genome-wide in paired-end Illumina data collected from additional tumor samples. SV-STAT showed predictive accuracy as high as or higher than leading alternatives. The software is freely available under the terms of the GNU General Public License version 3 at https://gitorious.org/svstat/svstat.

Conclusions

SV-STAT works across multiple sequencing chemistries, paired and single-end technologies, targeted or whole-genome strategies, and it complements existing SV-detection software. The method is a significant advance towards accurate detection and genotyping of genomic rearrangements from DNA sequencing data.

13.
MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads from various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using the coverage depth obtained by aligning the reads to a catalogue of reference sequences, which is built into the pipeline and contains prevalent microbial genomes and genes of human gut microbiota. The resulting metagenomic composition vectors are processed by a statistical analysis and visualization module that provides methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org), allowing their comparative analysis together with user samples. MALINA is freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL and Python, with all major browsers supported.

14.
Somatic mosaicism refers to the existence of somatic mutations in a fraction of somatic cells in a single biological sample. Its importance has mainly been discussed in theory, although experimental work has started to emerge linking somatic mosaicism to disease diagnosis. Through novel statistical modeling of paired-end DNA-sequencing data using blood-derived DNA from healthy donors as well as DNA from tumor samples, we present an ultra-fast computational pipeline, LocHap, that searches for multiple single nucleotide variants (SNVs) that are scaffolded by the same reads. We refer to scaffolded SNVs as local haplotypes (LH). When an LH exhibits more than two genotypes, we call it a local haplotype variant (LHV). The presence of LHVs is considered evidence of somatic mosaicism because a genetically homogeneous cell population will not harbor LHVs. Applying LocHap to whole-genome and whole-exome sequence data in DNA from normal blood and tumor samples, we find widespread LHVs across the genome. Importantly, we find more LHVs in tumor samples than in normal samples, and more in older adults than in younger ones. We confirm the existence of LHVs and somatic mosaicism by validation studies in normal blood samples. LocHap is publicly available at http://www.compgenome.org/lochap.
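The definition of a local haplotype variant can be illustrated with a few lines of code. This is a simplified sketch rather than LocHap's statistical model: it counts the distinct allele combinations seen on reads that cover every SNV in a group and flags the group when more than two combinations are observed. The read representation and the `min_reads` support threshold are assumptions introduced here to keep the example small.

```python
from collections import Counter


def local_haplotype_counts(reads, snv_positions):
    """Count allele combinations on reads that cover all given SNV positions.

    reads         -- iterable of dicts mapping genomic position -> base call
    snv_positions -- ordered list of SNV positions forming one local haplotype
    """
    counts = Counter()
    for read in reads:
        # Only reads spanning every SNV can scaffold the local haplotype.
        if all(pos in read for pos in snv_positions):
            counts[tuple(read[pos] for pos in snv_positions)] += 1
    return counts


def is_local_haplotype_variant(counts, min_reads=2):
    """True if more than two haplotypes have at least min_reads supporting reads.

    min_reads is a hypothetical guard against sequencing errors; a real
    caller would use a proper statistical model instead of a fixed cutoff.
    """
    supported = [hap for hap, n in counts.items() if n >= min_reads]
    return len(supported) > 2


if __name__ == "__main__":
    toy_reads = [
        {100: "A", 150: "C"}, {100: "A", 150: "C"},
        {100: "G", 150: "T"}, {100: "G", 150: "T"},
        {100: "A", 150: "T"}, {100: "A", 150: "T"},
        {100: "A"},  # does not cover both SNVs, so it is ignored
    ]
    counts = local_haplotype_counts(toy_reads, [100, 150])
    print(dict(counts), is_local_haplotype_variant(counts))
```

A diploid, genetically homogeneous sample can contribute at most two haplotypes per locus, so a third well-supported combination is what marks the toy group as a candidate LHV.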

15.

Background

The processing and analysis of the large scale data generated by next-generation sequencing (NGS) experiments is challenging and is a burgeoning area of new methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calling of these tools and compare their relative accuracy to determine which data processing pipeline is optimal.

Results

We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing on a total of 700 variants or array genotyping data on a total of 9,935 single-nucleotide polymorphisms. A head-to-head comparison showed that the Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved to be crucial to accurate variant calling. The GATK HaplotypeCaller algorithm for variant calling outperformed the UnifiedGenotyper algorithm. We also showed a relationship between mapping quality, read depth and allele balance, and SNV call accuracy. However, if best practices are used in data processing, then additional filtering based on these metrics provides little gain, and accuracies of >99% are achievable.
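The positive predictive value quoted above is the share of variant calls that the gold standard confirms. A minimal sketch of that calculation follows; the variant tuples are made-up toy data, not values from the study, and the function name is hypothetical.

```python
def positive_predictive_value(calls, confirmed):
    """PPV = true positives / (true positives + false positives).

    calls     -- variant calls that were checked against the gold standard
    confirmed -- variants that the gold standard (e.g. Sanger) confirmed
    """
    calls = set(calls)
    if not calls:
        raise ValueError("no variant calls to evaluate")
    true_positives = len(calls & set(confirmed))
    # Every evaluated call is either a true or a false positive.
    return true_positives / len(calls)


if __name__ == "__main__":
    # Toy variants as (chromosome, position, ref, alt); purely illustrative.
    calls = {("1", 1000, "A", "G"), ("1", 2000, "C", "T"),
             ("2", 500, "G", "A"), ("3", 42, "T", "C")}
    confirmed = {("1", 1000, "A", "G"), ("1", 2000, "C", "T"),
                 ("2", 500, "G", "A")}
    print(f"PPV = {positive_predictive_value(calls, confirmed):.2%}")
```

Because the denominator is the set of calls made, PPV says nothing about variants a caller missed, which is why it is usually reported alongside sensitivity.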

Conclusions

Our findings will help to determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our code is freely available at http://metamoodics.org/wes.

16.
Tyrosine phosphorylation is rare, representing only about 0.5% of phosphorylations in the cell under basal conditions. While mitogenic tyrosine kinase signaling has been extensively explored, the role of phosphotyrosine signaling across the cell cycle and in particular during mitosis is poorly understood.

Two recent, independent studies tackled this question from different angles to reveal exciting new insights into the role of this modification during cell division. Caron et al. (2016) exploited mitotic phosphoproteomics data sets to determine the extent of mitotic tyrosine phosphorylation, and St-Denis et al. (2016) identified protein tyrosine phosphatases from all subfamilies as regulators of mitotic progression or spindle formation. These studies collectively revealed that tyrosine phosphorylation may play a more prominent and active role in mitotic progression than previously appreciated.


17.
West syndrome, which is narrowly defined as infantile spasms that occur in clusters and hypsarrhythmia on EEG, is the most common early-onset epileptic encephalopathy (EOEE). Patients with West syndrome may have clear etiologies, including perinatal events, infections, gross chromosomal abnormalities, or cases followed by other EOEEs. However, the genetic etiology of most cases of West syndrome remains unexplained. DNA from 18 patients with unexplained West syndrome was subjected to microarray-based comparative genomic hybridization (array CGH), followed by trio-based whole-exome sequencing in 14 unsolved families. We identified candidate pathogenic variants in 50% of the patients (n = 9/18). The array CGH revealed candidate pathogenic copy number variations in four cases (22%, 4/18), including an Xq28 duplication, a 16p11.2 deletion, a 16p13.1 deletion and a 19p13.2 deletion disrupting CACNA1A. Whole-exome sequencing identified candidate mutations in known epilepsy genes in five cases (36%, 5/14). Three candidate de novo mutations were identified in three cases, with two mutations occurring in two new candidate genes (NR2F1 and CACNA2D1) (21%, 3/14). Hemizygous candidate mutations in ALG13 and BRWD3 were identified in the other two cases (14%, 2/14). Evaluating a panel of 67 known EOEE genes failed to identify significant mutations. Despite the heterogeneity of unexplained West syndrome, the combination of array CGH and whole-exome sequencing is an effective means of evaluating the genetic background in unexplained West syndrome. We provide additional evidence for NR2F1 as a causative gene and for CACNA2D1 and BRWD3 as candidate genes for West syndrome.

18.
Whole-genome sequencing (WGS) of organisms displaying a specific mutant phenotype is a powerful approach to identify the genetic determinants of a plethora of biological processes. We have previously validated the feasibility of this approach by identifying a point-mutated locus responsible for a specific phenotype, observed in an ethyl methanesulfonate (EMS)-mutagenized Caenorhabditis elegans strain. Here we describe the genome-wide mutational profile of 17 EMS-mutagenized genomes as assessed with a bioinformatic pipeline, called MAQGene. Surprisingly, we find that while outcrossing mutagenized strains does reduce the total number of mutations, a striking mutational load is still observed even in outcrossed strains. Such genetic complexity has to be taken into account when establishing a causative relationship between genotype and phenotype. Even though unintentional, the 17 sequenced strains described here provide a resource of allelic variants in almost 1000 genes, including 62 premature stop codons, which represent candidate knockout alleles that will be of further use for the C. elegans community to study gene function.

Inducing molecular lesions in a genome is an effective approach to interrogate the genome for its functional elements. Molecular lesions can be induced using a variety of methods. Because of their efficiency and their ability to generate alleles with various different alterations in gene activity (e.g., amorphic, antimorphic, hypomorphic, and hypermorphic), chemical mutagens, such as ethyl methanesulfonate (EMS), are frequently used in genetic mutant screens (Anderson 1995). However, due to mutagen efficiency, a mutant animal selected for a single-locus phenotype invariably contains EMS-induced “background mutations” in its genome. Experimenters try to minimize the potential impact of background mutations through outcrossing to animals with a wild-type genome. Yet no full snapshots of genome sequences right after EMS mutagenesis and after outcrossing have so far been provided to illustrate the extent of background mutations and the extent to which they can indeed be eliminated.

Another caveat of using base-changing chemical mutagens is the relative difficulty associated with identifying the phenotype-causing molecular lesion. In multicellular genetic model organisms, mutant identification involves time-consuming positional cloning approaches, usually involving breeding with genetically marked strains that allow pinpointing of the location of a molecular lesion. Even with rapid, SNP-based mapping approaches in animals with short generation times, such as Caenorhabditis elegans, substantial time hurdles, particularly in the final, fine-mapping stages, still exist. Conceptually similar problems in defining the location of a molecular lesion are encountered by human geneticists who attempt to identify disease-causing genetic lesions.

Whole-genome sequencing (WGS) is beginning to emerge as an efficient and cost-effective tool to shortcut time-consuming mapping and positional cloning efforts (Hobert 2010). The sequencing of an entire genome and its ensuing comparison to a wild-type reference genome can potentially directly pinpoint the molecular lesion that results in the mutant phenotype the animal has been selected for. Proof-of-concept studies in bacteria, yeast, plants, worms, and flies have validated the applicability of this approach (Sarin et al. 2008; Smith et al. 2008; Srivatsan et al. 2008; Blumenstiel et al. 2009; Irvine et al. 2009; Flowers et al. 2010).

Present-day deep sequencing platforms used for WGS generate relatively short sequence reads, thereby posing the bioinformatic challenge to align those reads to a reference genome. We previously described a software pipeline, MAQGene, which is based on the standard alignment program MAQ (Li et al. 2008) and facilitates this bioinformatic step by providing the end user with an extensively curated list of sequence variants from a WGS run of a mutated genome compared to a reference genome (Bigelow et al. 2009). This pipeline can be used for well-annotated, assembled genomes, such as C. elegans or Drosophila. In this article, we describe that this pipeline can identify not only point mutations but also deletions. We then use this pipeline to analyze a total of 17 EMS-mutagenized genomes. We find that EMS-mutagenized genomes carry a significant mutational load including presumptive loss-of-function alleles in several protein-coding genes that can lead to synthetic genetic interactions, one of which we describe here in more detail. We show that outcrossing to wild-type animals can lighten the mutational load; however, a substantial number of sequence variants are also introduced during outcrossing. Even though background mutations uncovered by WGS may complicate the interpretation of mutant phenotypes, they do provide a potentially useful source for functional studies of the affected genes.

19.
20.

Background

The recent availability of whole-exome sequencing has opened new possibilities for the evaluation of individuals with genetically undiagnosed intellectual disability.

Results

We report two affected siblings, offspring of first-cousin parents, with intellectual disability, hypotonia, short stature, growth hormone deficiency, and delayed bone age. All members of the nuclear family were genotyped, and exome sequencing was performed in one of the affected individuals. We used an in-house algorithm (CATCH v1.1) that combines homozygosity mapping with exome sequencing results and provides a list of candidate variants. One identified novel homozygous missense variant in KALRN (NM_003947.4:c.3644C>A: p.(Thr1215Lys)) was predicted to be pathogenic by all pathogenicity prediction software used (SIFT, PolyPhen, Mutation Taster). KALRN encodes the protein kalirin, which is a GTP-exchange factor protein with a reported role in cytoskeletal remodeling and dendritic spine formation in neurons. It is known that mice with ablation of Kalrn exhibit age-dependent functional deficits and behavioral phenotypes.
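The idea of combining homozygosity mapping with exome results can be sketched as a simple filter. The code below is not the in-house CATCH algorithm, whose details are not given here; it is a hypothetical illustration that keeps only homozygous exome variants falling inside regions of homozygosity shared by the affected siblings. The data layout and all names are assumptions.

```python
def candidate_variants(variants, homozygous_regions):
    """Keep homozygous variants located inside shared homozygous regions.

    variants           -- list of dicts with keys 'chrom', 'pos', 'genotype'
                          ('hom' or 'het') and any further annotations
    homozygous_regions -- list of (chrom, start, end) tuples, end inclusive,
                          assumed to come from homozygosity mapping
    """
    def in_region(variant):
        return any(
            chrom == variant["chrom"] and start <= variant["pos"] <= end
            for chrom, start, end in homozygous_regions
        )

    return [v for v in variants if v["genotype"] == "hom" and in_region(v)]


if __name__ == "__main__":
    # Toy coordinates, purely illustrative.
    regions = [("3", 1_000, 5_000)]
    exome = [
        {"chrom": "3", "pos": 2_500, "genotype": "hom", "gene": "GENE_A"},
        {"chrom": "3", "pos": 2_600, "genotype": "het", "gene": "GENE_B"},
        {"chrom": "7", "pos": 2_500, "genotype": "hom", "gene": "GENE_C"},
    ]
    print([v["gene"] for v in candidate_variants(exome, regions)])
```

For a recessive phenotype in a consanguineous family this kind of intersection shrinks the candidate list before pathogenicity predictors are applied; a real pipeline would add frequency and functional filters as well.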

Conclusion

Exome sequencing provided initial evidence linking KALRN to monogenic intellectual disability in man, and we propose that KALRN is the causative gene for the autosomal recessive phenotype in this family.


Copyright © Beijing Qinyun Technology Development Co., Ltd. ICP registration: 京ICP备09084417号