Similar Articles
20 similar articles found (search time: 15 ms)
1.
Paired-end sequencing is emerging as a key technique for assessing genome rearrangements and structural variation on a genome-wide scale. This technique is particularly useful for detecting copy-neutral rearrangements, such as inversions and translocations, which are common in cancer and can produce novel fusion genes. We address the question of how much sequencing is required to detect rearrangement breakpoints and to localize them precisely using both theoretical models and simulation. We derive a formula for the probability that a fusion gene exists in a cancer genome given a collection of paired-end sequences from this genome. We use this formula to compute fusion gene probabilities in several breast cancer samples, and we find that we are able to accurately predict fusion genes in these samples with a relatively small number of fragments of large size. We further demonstrate how the ability to detect fusion genes depends on the distribution of gene lengths, and we evaluate how different parameters of a sequencing strategy impact breakpoint detection, breakpoint localization, and fusion gene detection, even in the presence of errors that suggest false rearrangements. These results will be useful in calibrating future cancer sequencing efforts, particularly large-scale studies of many cancer genomes that are enabled by next-generation sequencing technologies.
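The standard Poisson (Lander-Waterman-style) clone-coverage approximation gives a rough sense of the trade-off between fragment count and fragment size. This sketch is not the fusion-gene probability derived in the paper. It assumes fragment positions are uniform over a genome of size G and ignores mapping errors and false rearrangements. It asks only how likely it is that at least k fragments span one fixed breakpoint. The genome size and fragment counts are illustrative assumptions.

```python
import math


def spanning_coverage(n_fragments: int, fragment_len: int, genome_size: int) -> float:
    """Expected number of fragments whose span covers one fixed breakpoint (N*L/G)."""
    return n_fragments * fragment_len / genome_size


def prob_detect(n_fragments: int, fragment_len: int, genome_size: int,
                min_support: int = 1) -> float:
    """P(at least `min_support` fragments span the breakpoint), Poisson approximation."""
    c = spanning_coverage(n_fragments, fragment_len, genome_size)
    # Probability of observing fewer than `min_support` spanning fragments.
    p_fewer = sum(math.exp(-c) * c**i / math.factorial(i) for i in range(min_support))
    return 1.0 - p_fewer


G = 3_000_000_000  # approximate haploid human genome size (bp), an assumption
# Same number of fragments, two hypothetical fragment sizes.
for length in (500, 40_000):
    print(length, round(prob_detect(1_000_000, length, G, min_support=2), 4))
```

The expected number of spanning fragments grows as N*L/G. Lengthening fragments therefore raises detection probability without sequencing more fragments.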

2.
Theoretical and practical advances in genome halving (total citations: 4; self-citations: 0; citations by others: 4)
MOTIVATION: Duplication of an organism's entire genome is a rare but spectacular event, enabling the rapid emergence of multiple new gene functions. Over time, the parallel linkage of duplicated genes across chromosomes may be disrupted by reciprocal translocations, while the intra-chromosomal order of genes may be shuffled by inversions and transpositions. Some duplicate genes may evolve unrecognizably or be deleted. As a consequence, the only detectable signature of an ancient duplication event in a modern genome may be the presence of various chromosomal segments containing parallel paralogous genes, with each segment appearing exactly twice in the genome. The problem of reconstructing the linkage structure of an ancestral genome before duplication is known as genome halving with unordered chromosomes. RESULTS: In this paper, we derive a new upper bound on the genome halving distance that is tighter than the best known, and a new lower bound that is almost always tighter than the best known. We also define the notion of genome halving diameter, and obtain both upper and lower bounds for it. Our tighter bounds on genome halving distance yield a new algorithm for reconstructing an ancestral duplicated genome. We implemented this algorithm in a software package, GenomeHalving, and tested it on the yeast genome. We identified a sequence of translocations for halving the yeast genome that is shorter than was previously conjectured to be possible.

3.
Genomics, 2022, 114(2): 110264
Cancer is one of the major causes of human death each year. In recent years, cancer identification and classification using machine learning have gained momentum due to the availability of high-throughput sequencing data. With RNA-seq, cancer research is advancing rapidly, and new insights into cancer and its treatment are coming to light. In this paper, we propose PanClassif, a method that needs only a few effective genes to detect cancer from RNA-seq data and that improves performance across a wide range of machine learning classifiers. We used 22 cancer types from The Cancer Genome Atlas (TCGA), comprising 8287 cancer samples and 680 normal samples. First, PanClassif applies k-nearest neighbour (k-NN) smoothing to the samples to handle noise in the data. Effective genes are then selected by an ANOVA-based test. To balance the training data, PanClassif applies an oversampling method, SMOTE. We performed comprehensive experiments on the datasets using several classification algorithms. Experimental results show that PanClassif outperforms existing state-of-the-art methods. It also performs consistently on two single-cell RNA-seq datasets taken from Gene Expression Omnibus (GEO). PanClassif improves the performance of a wide variety of classifiers for both binary cancer prediction and multi-class cancer classification. PanClassif is available as a Python package (https://pypi.org/project/panclassif/). All the source code and materials of PanClassif are available at https://github.com/Zwei-inc/panclassif.
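The first two PanClassif steps, k-NN smoothing followed by ANOVA-based gene selection, can be sketched in plain Python. This is an illustrative reimplementation, not the panclassif package's API. The smoothing rule (averaging each sample with its k nearest neighbours by Euclidean distance), the toy data and the function names are all hypothetical. The SMOTE oversampling step is omitted; in Python it is commonly done with the `SMOTE` class from imbalanced-learn.

```python
import math
from statistics import mean


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_smooth(samples, k):
    """Replace each sample (a list of gene values) by the mean of itself and its k nearest neighbours."""
    smoothed = []
    for i, s in enumerate(samples):
        # Rank all other samples by Euclidean distance to sample i.
        others = sorted((euclidean(s, t), j) for j, t in enumerate(samples) if j != i)
        group = [s] + [samples[j] for _, j in others[:k]]
        # Average gene by gene across the neighbourhood.
        smoothed.append([mean(col) for col in zip(*group)])
    return smoothed


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single gene across the label groups (needs >= 2 groups)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in groups.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    if ss_within == 0:
        # Perfect separation (or a constant gene): avoid dividing by zero.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def top_genes(samples, labels, n_top):
    """Column indices of the n_top genes with the largest F statistic."""
    scores = [anova_f(list(col), labels) for col in zip(*samples)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n_top]


# Toy data: 4 samples x 3 genes (hypothetical values).
X = [[5.0, 1.0, 0.2], [5.5, 1.1, 0.1], [1.0, 1.0, 0.3], [1.2, 0.9, 0.2]]
y = ["tumor", "tumor", "normal", "normal"]
print(knn_smooth(X, k=1)[0])
print(top_genes(X, y, n_top=1))
```

On real data, scikit-learn's `f_classif` computes the same one-way ANOVA F statistic on a whole matrix. The loop version here just keeps the arithmetic visible.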

4.
The majority of constitutional reciprocal translocations appear to be unique rearrangements arising from independent events. However, a small number of translocations are recurrent, most significantly the t(11;22)(q23;q11). Among large series of translocations there may be multiple independently ascertained cases with the same cytogenetic breakpoints. Some of these could represent additional recurrent rearrangements. Alternatively, they could be identical by descent (IBD) or have subtly different breakpoints when examined at higher resolution. We have used molecular breakpoint mapping and haplotyping to determine the origin of three pairs of reciprocal constitutional translocations, each with the same cytogenetic breakpoints. FISH mapping showed one pair to have different breakpoints and thus to be distinct rearrangements. Another pair of translocations was IBD, with identical breakpoint intervals and highly conserved haplotypes on the derived chromosomes. The third pair, t(4;11)(p16.2;p15.4), had the same breakpoint intervals by aCGH and fosmid mapping but very different haplotypes, and therefore represents a novel recurrent translocation. Unlike the t(11;22)(q23;q11), the formation of the t(4;11)(p16.2;p15.4) may have involved segmental duplications and sequence homology at the breakpoints. Additional examples of recurrent translocations could be identified if resources were available to study more translocations using the approaches described here. However, like the t(4;11)(p16.2;p15.4), such translocations are likely to be rare, with the t(11;22) remaining the only common recurrent constitutional reciprocal translocation.

5.
Over the past few decades, knowledge of the genetic defects causing mental retardation has dramatically increased. In this review, we discuss the importance of balanced chromosomal translocations in the identification of genes responsible for mental retardation. We present a database-search guided overview of balanced translocations identified in patients with mental retardation. We divide these into four categories: (1) balanced translocations that helped to identify a causative gene within a contiguous gene syndrome, (2) balanced translocations that led to the identification of a mental retardation gene confirmed by independent methods, (3) balanced translocations disrupting candidate genes that have not been confirmed by independent methods and (4) balanced translocations not reported to disrupt protein coding sequences. It can safely be concluded that balanced translocations have been instrumental in the identification of multiple genes involved in mental retardation. In addition, many more candidate genes were identified with a suspected but as yet unconfirmed role in mental retardation. Some balanced translocations do not disrupt a protein coding gene. In light of recent findings concerning ncRNAs and ultra-conserved regions, such cases may be worth further investigation, as they could lead to the discovery of novel disease mechanisms.

6.
7.
Chromosomal translocations in cancer (total citations: 1; self-citations: 0; citations by others: 1)
Genetic alterations in DNA can lead to cancer when they occur in proto-oncogenes, tumor suppressor genes, DNA repair genes, etc. Examples of such alterations include deletions, inversions and chromosomal translocations. Among these rearrangements, chromosomal translocations are considered the primary cause of many cancers, including lymphoma, leukemia and some solid tumors. In certain cases, chromosomal translocations can fuse genes or bring genes close to enhancer or promoter elements, leading to their altered expression. Moreover, chromosomal translocations are used as markers for cancer diagnosis and therapy. In the first part of this review, we summarize the well-studied chromosomal translocations in cancer. The mechanism of formation of most of these translocations is still unclear, and in the second part we discuss recent advances in this area of research.

8.
Identifying cancer driver genes and pathways among all somatic mutations detected in a cohort of tumors is a key challenge in cancer genomics. Traditionally, this is done by prioritizing genes according to the recurrence of the alterations they bear. However, this approach has some known limitations. The background mutation rate is difficult to estimate correctly, and driver genes that are mutated at low recurrence cannot be identified. Here we present a novel approach, Oncodrive-fm, to detect candidate cancer drivers that does not rely on recurrence. First, we hypothesized that any bias toward the accumulation of variants with high functional impact observed in a gene or group of genes may be an indication of positive selection and can thus be used to detect candidate driver genes or gene modules. Next, we developed a method to measure this bias (FM bias) and applied it to three datasets of tumor somatic variants. As a proof of concept of our hypothesis, we show that most of the highly recurrent and well-known cancer genes exhibit a clear FM bias. Moreover, this novel approach avoids some known limitations of recurrence-based approaches, and can successfully identify lowly recurrent candidate cancer drivers.
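FM bias compares the functional impact of a gene's variants with what would be expected from variants drawn at random. A simple sampling test illustrates this idea. This is a simplified sketch, not Oncodrive-fm itself. It assumes a single hypothetical functional-impact score per variant, whereas the published method combines several predictors. It also assumes a one-sided test on the mean score and a background made of all variants in the cohort.

```python
import random
from statistics import mean


def fm_bias_pvalue(gene_scores, background_scores, n_draws=10_000, seed=0):
    """Empirical one-sided p-value that a gene's variants have a higher mean
    functional-impact score than random same-size sets of background variants."""
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(background_scores, k)  # k background variants, without replacement
        if mean(draw) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


background = [i / 100 for i in range(100)]  # hypothetical impact scores in [0, 1)
gene_high = [0.95, 0.97, 0.99, 0.90, 0.92]   # hypothetical gene with high-impact variants
gene_typical = [0.40, 0.50, 0.60, 0.45, 0.55]
print(fm_bias_pvalue(gene_high, background, n_draws=2000))
print(fm_bias_pvalue(gene_typical, background, n_draws=2000))
```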

9.
10.
A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations, which facilitates the identification of targetable genes and new drugs. Current approaches are primarily based on the mutation frequencies of single genes. They lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large-scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes that interact with highly recurrently mutated genes but have been ignored by conventional single-gene-based approaches. Using VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
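VarWalker selects interactors with Random Walk with Restart (RWR). A minimal RWR on an unweighted protein-protein interaction graph iterates p_{t+1} = r p_0 + (1 - r) W p_t until convergence. Here p_0 puts equal mass on the seed (mutated) genes, and W spreads each node's mass evenly over its neighbours. The restart probability r = 0.5, the tolerance and the toy network below are assumptions for illustration. The sketch does not reproduce VarWalker's generalized additive model or its interactor optimization.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk that jumps back to
    `seeds` with probability `restart` at every step.

    adjacency: dict mapping every node to a list of its neighbours (undirected graph).
    seeds: set of starting nodes, e.g. the genes mutated in one sample.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        # Each node passes (1 - restart) of its mass, split evenly, to its neighbours.
        for n in nodes:
            if adjacency[n]:
                share = (1 - restart) * p[n] / len(adjacency[n])
                for m in adjacency[n]:
                    nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Toy interaction network with hypothetical gene names.
network = {
    "G1": ["G2", "G3"], "G2": ["G1"], "G3": ["G1", "G4"],
    "G4": ["G3"], "G5": ["G6"], "G6": ["G5"],
}
scores = random_walk_with_restart(network, seeds={"G1"})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 4))
```

Genes close to the seeds receive higher scores. Genes in a disconnected component (G5, G6) receive none, which is what lets RWR pick out "close interactors".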

11.
The cytogenetic evaluation of hematologic disease can confirm a diagnosis, determine treatment options, and provide prognostic information to the patient. Among the potential cytogenetic aberrations that can be identified are certain balanced translocations with recurrent breakpoints that provide disease classification and define the sites of disease-causing or disease-promoting genes. In this review, we discuss the importance of balanced translocation identification, the methods traditionally used to identify balanced translocations in the cytogenetics laboratory, and the application of new methodologies such as next generation (NextGen) sequencing and array-based translocation identification through a linear amplification application. These new technologies have the potential to identify all currently known diagnostically and prognostically important rearrangements as well as novel alterations that may provide new therapeutic targets to enhance treatment of hematologic disease.

12.
Despite the known relevance of genomic structural variants to pathogen behavior, cancer, development, and evolution, certain repeat-based structural variants may evade detection by existing high-throughput techniques. Here, we present ruler arrays, a technique to detect genomic structural variants including insertions and deletions (indels), duplications, and translocations. A ruler array exploits the processivity of DNA polymerase to measure physical distances between defined genomic sequences regardless of the intervening sequence. The method combines a sample preparation protocol, tiling genomic microarrays, and a new computational analysis. The analysis of ruler array data from two genomic samples enables the identification of structural variation between the samples. In an empirical test on two closely related haploid strains of yeast, ruler arrays detected 78% of the structural variants larger than 100 bp.

13.
Chromosomal translocations are characteristic of hematopoietic neoplasias and can lead to unregulated oncogene expression or the fusion of genes to yield novel functions. In recent years, different lymphoma/leukemia-associated rearrangements have been detected in healthy individuals. In this study, we used inverse PCR to screen peripheral lymphocytes from 100 healthy individuals for the presence of MLL (Mixed Lineage Leukemia) translocations. Forty-nine percent of the probands showed MLL rearrangements. Sequence analysis showed that these rearrangements were specific for MLL translocations that corresponded to t(4;11)(q21;q23) (66%) and t(9;11) (20%). However, RT-PCR failed to detect any expression of t(4;11)(q21;q23) in our population. We suggest that 11q23 rearrangements in peripheral lymphocytes from normal individuals may result from exposure to endogenous or exogenous DNA-damaging agents. In practical terms, the high susceptibility of the MLL gene to chemically-induced damage suggests that monitoring the aberrations associated with this gene in peripheral lymphocytes may be a sensitive assay for assessing genomic instability in individuals exposed to genotoxic stress.

14.
Genomic aberrations recurrent in a particular cancer type can be important prognostic markers for tumor progression. Typically in early tumorigenesis, cells incur a breakdown of the DNA replication machinery that results in an accumulation of genomic aberrations in the form of duplications, deletions, translocations, and other genomic alterations. Microarray methods allow for finer mapping of these aberrations than has previously been possible; however, data processing and analysis methods have not taken full advantage of this higher resolution. Attention has primarily been given to analysis at the single-sample level, where multiple adjacent probes are necessarily used as replicates for the local region containing their target sequences. However, regions of concordant aberration can be short enough to be detected by only one, or very few, array elements. We describe a method called Multiple Sample Analysis for assessing the significance of concordant genomic aberrations across multiple experiments that does not require a priori definition of aberration calls for each sample. When multiple samples represent a class, our method exploits the replication across samples to detect concordant aberrations at much higher resolution than current single-sample approaches. Additionally, this method provides a meaningful approach to addressing population-based questions, such as determining important regions for a cancer subtype of interest or determining regions of copy number variation in a population. Multiple Sample Analysis also provides single-sample aberration calls in the locations of significant concordance, producing high-resolution calls per sample in concordant regions. 
The approach is demonstrated on a dataset representing a challenging but important resource: breast tumors that have been formalin-fixed, paraffin-embedded, archived, and subsequently UV-laser capture microdissected and hybridized to two-channel BAC arrays using an amplification protocol. We demonstrate accurate detection on simulated data and on real datasets involving known regions of aberration within subtypes of breast cancer, at a resolution consistent with that of the array. Similarly, we apply our method to previously published datasets, including a 250K SNP array dataset, and verify known results as well as detect novel regions of concordant aberration. The algorithm has been fully implemented and tested and is freely available as a Java application at http://www.cbil.upenn.edu/MSA.
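Per-probe concordance across samples can be tested without first calling aberrations in each sample. One generic way is a permutation test in which each sample's profile is cyclically shifted by an independent random offset. This keeps each sample's own segment structure but breaks cross-sample alignment. The sketch below shows that generic approach, not the published Multiple Sample Analysis algorithm. It assumes gains are positive log-ratios (losses can be tested by negating the data) and uses the maximum null statistic over probes to control family-wise error.

```python
import random


def concordance_pvalues(profiles, n_perm=1000, seed=0):
    """Per-probe empirical p-values for concordant gain across samples.

    profiles: list of equal-length lists of log-ratios, one list per sample.
    """
    rng = random.Random(seed)
    n_probes = len(profiles[0])
    observed = [sum(p[i] for p in profiles) for i in range(n_probes)]
    exceed = [0] * n_probes
    for _ in range(n_perm):
        # Cyclically shift every sample by its own random offset.
        shifted = []
        for p in profiles:
            k = rng.randrange(n_probes)
            shifted.append(p[k:] + p[:k])
        null = [sum(p[i] for p in shifted) for i in range(n_probes)]
        # Compare each probe with the null maximum over all probes (max-statistic).
        null_max = max(null)
        for i in range(n_probes):
            if null_max >= observed[i]:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Toy data: 8 samples x 50 probes; every sample gains probe 20 (hypothetical).
profiles = [[0.0] * 50 for _ in range(8)]
for p in profiles:
    p[20] = 1.0
pv = concordance_pvalues(profiles)
print(pv[20], pv[0])
```

A single probe shared by every sample is flagged even though no individual sample has adjacent probes to support a call. This is the single-element resolution the abstract argues for.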

15.
As next-generation sequencing continues to have an expanding presence in the clinic, there is a need to identify the most cost-effective and robust strategy for detecting copy number changes and translocations in tumor genomes. We hypothesized that performing shallow whole genome sequencing (WGS) of 900–1000-bp inserts (long insert WGS, LI-WGS) improves our ability to detect these events, compared with shallow WGS of 300–400-bp inserts. A priori analyses show that LI-WGS requires less sequencing than short insert WGS to achieve a target physical coverage. They also show that LI-WGS requires less sequence coverage to detect a heterozygous event with a power of 0.99. We thus developed an LI-WGS library preparation protocol based on Illumina’s WGS library preparation protocol and illustrate the feasibility of performing LI-WGS. We additionally applied LI-WGS to three separate tumor/normal DNA pairs collected from patients diagnosed with different cancers. This demonstrates the use of LI-WGS on actual patient samples to identify somatic copy number alterations and translocations. With the evolution of sequencing technologies and bioinformatics analyses, we show that modifications to current approaches may improve our ability to interrogate cancer genomes.
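Two standard relations reproduce the a priori argument; they are stated here as assumptions. For N fragments of insert length I sequenced as paired-end reads of length r, sequence coverage is C_seq = 2Nr/G and physical coverage is C_phys = NI/G, so C_phys = C_seq * I / (2r). For a heterozygous event only one homolog carries the breakpoint, so the number of spanning pairs is modeled as Poisson(C_phys/2). The read length, the insert sizes (mid-points of the two stated ranges) and the minimum of two supporting pairs are hypothetical choices.

```python
import itertools
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))


def physical_coverage(seq_coverage, insert_len, read_len):
    """Convert paired-end sequence coverage (2 reads per fragment) to physical coverage."""
    return seq_coverage * insert_len / (2 * read_len)


def min_sequence_coverage(insert_len, read_len, min_support=2, power=0.99, step=0.01):
    """Smallest sequence coverage (on a `step` grid) at which a heterozygous
    breakpoint is spanned by >= min_support read pairs with probability >= power."""
    for i in itertools.count(1):
        cov = i * step
        # Only the rearranged homolog contributes spanning pairs, hence the / 2.
        lam = physical_coverage(cov, insert_len, read_len) / 2
        if poisson_sf(min_support, lam) >= power:
            return cov


# Hypothetical settings: 2 x 100 bp reads, inserts at the middle of each stated range.
for insert in (350, 950):
    print(insert, round(min_sequence_coverage(insert, 100), 2))
```

Under these assumptions, the ~950 bp library reaches 0.99 power at roughly 2.8x sequence coverage, versus roughly 7.6x for the ~350 bp library. The difference arises because each read pair spans about 2.7 times as much sequence.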

16.
17.
ABSTRACT: BACKGROUND: The selection of the reference used to scale the data in a copy number analysis is of paramount importance for achieving accurate estimates. Usually this reference is generated using control samples included in the study. However, these control samples are not always available, and in these cases an artificial reference must be generated. Generating this signal properly is crucial in terms of both noise and bias. We propose NSA (Normality Search Algorithm), a scaling method that works with and without control samples. It is based on the assumption that genomic regions enriched in SNPs with identical copy numbers in both alleles are likely to be normal. These normal regions are predicted for each sample individually and used to calculate the final reference signal. NSA can be applied to any CN data regardless of the microarray technology and preprocessing method. It also finds an optimal weighting of the samples, minimizing possible batch effects. RESULTS: Five human datasets (a subset of HapMap samples, Glioblastoma Multiforme (GBM), Ovarian, Prostate and Lung Cancer experiments) have been analyzed. Using only tumoral samples, NSA is shown to remove the bias in the copy number estimation and to reduce the noise. It therefore increases the ability to detect copy number aberrations (CNAs). These improvements also allow NSA to detect recurrent aberrations more accurately than other state-of-the-art methods. CONCLUSIONS: NSA provides a robust and accurate reference for scaling probe signal data to CN values without the need for control samples. It minimizes the problems of bias, noise and batch effects in the estimation of CNs. Therefore, the NSA scaling approach detects recurrent CNAs better than current methods. The automatic selection of references makes it useful for bulk analysis of many GEO or ArrayExpress experiments. No parser needs to be developed to find the normal samples or possible batches within the data.
The method is available in the open-source R package NSA, which is an add-on to the aroma.cn framework: http://www.aroma-project.org/addons.
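A small sketch can illustrate the normality assumption behind NSA. Regions where heterozygous SNPs show balanced B-allele frequencies (near 0.5) are treated as copy number 2, and their median intensity becomes the scaling reference. This is a hypothetical Python illustration, not the NSA R package or its aroma.cn interface. The B-allele-frequency thresholds, the 0.8 balance cut-off and the region structure are assumptions, and the sketch does not implement NSA's multi-sample weighting for batch effects. A balanced gain (for example two copies of each allele) would also pass this filter.

```python
from statistics import median


def allelic_balance_fraction(bafs, tol=0.1):
    """Fraction of heterozygous-looking SNPs whose B-allele frequency is within tol of 0.5."""
    het = [b for b in bafs if 0.1 < b < 0.9]  # crude filter: drop homozygous calls near 0 or 1
    if not het:
        return 0.0
    return sum(abs(b - 0.5) <= tol for b in het) / len(het)


def scale_to_copy_number(regions, min_balance=0.8):
    """regions: list of (probe_intensities, snp_bafs) for one sample.

    Regions dominated by allele-balanced SNPs are assumed to be normal (CN = 2);
    their median intensity is the reference used to rescale every region.
    """
    normal = [x for ints, bafs in regions
              if allelic_balance_fraction(bafs) >= min_balance for x in ints]
    if not normal:
        raise ValueError("no region passed the allelic-balance filter")
    ref = median(normal)
    return [[2 * x / ref for x in ints] for ints, _ in regions]


# Two hypothetical regions: one balanced (likely normal), one with 1:2 allelic imbalance.
regions = [
    ([100.0, 102.0, 98.0], [0.50, 0.48, 0.52, 0.0, 1.0]),
    ([150.0, 152.0, 148.0], [0.33, 0.67, 0.35, 0.66]),
]
print(scale_to_copy_number(regions))
```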

18.
19.
A major challenge in cancer genomics is uncovering genes with an active role in tumorigenesis from a potentially large pool of mutated genes across patient samples. Here we focus on the interactions that proteins make with nucleic acids, small molecules, ions and peptides. We show that residues involved in these interactions are more frequently affected by mutations in large-scale cancer genomic data than other residues are. We leverage this observation to predict genes that play a functionally important role in cancers by introducing a computational pipeline (http://canbind.princeton.edu). The pipeline maps large-scale cancer exome data across patients onto protein structures and automatically extracts proteins with an enriched number of mutations affecting their nucleic acid, small molecule, ion or peptide binding sites. Using this computational approach, we show that many previously known genes implicated in cancers are enriched in mutations within the binding sites of their encoded proteins. Our approach focuses on functionally relevant portions of proteins, specifically those known to be involved in molecular interactions. It is therefore particularly well suited to detecting infrequent mutations that may nonetheless be important in cancer, and should help expand our functional understanding of the genomic landscape of cancer.
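A one-sided binomial test per protein makes the enrichment idea concrete. If mutations fell uniformly along a protein of length L with B binding-site residues, each mutation would hit a binding site with probability B/L. This null model and test are assumptions for illustration and may differ from the statistic used by the canbind pipeline. The protein, binding residues and mutation positions below are hypothetical.

```python
from math import comb


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binding_site_enrichment(mutated_positions, binding_positions, protein_length):
    """Count mutations falling on binding-site residues and return a one-sided
    binomial p-value under the null that mutations land uniformly along the protein."""
    binding = set(binding_positions)
    p = len(binding) / protein_length
    n = len(mutated_positions)
    k = sum(pos in binding for pos in mutated_positions)
    return k, n, binomial_sf(k, n, p)


# Hypothetical protein of 100 residues whose residues 1-10 form a binding site.
k, n, pval = binding_site_enrichment([2, 3, 5, 7, 9, 50], range(1, 11), 100)
print(k, n, pval)
```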

20.
PURPOSE: Relapsed/refractory pediatric cancers have a poor prognosis; however, their genomic patterns remain unknown. To investigate the genetic mechanisms of tumor relapse and therapy resistance, we characterized genomic alterations in diagnostic and relapsed lesions in patients with relapsed/refractory pediatric solid tumors using targeted deep sequencing. PATIENTS AND METHODS: A targeted sequencing panel covering the exons of 381 cancer genes was used to characterize 19 paired diagnostic and relapsed samples from patients with relapsed/refractory pediatric solid tumors. RESULTS: The mean coverage for all samples was 930.6× (SD = 213.8). Among the 381 genes, 173 single nucleotide variations (SNVs)/insertion-deletions (InDels), 100 copy number alterations, and 1 structural variation were detected. A total of 72.6% of SNVs in primary tumors were also found in recurrent lesions, and 27.2% of SNVs in recurrent tumors were newly acquired. Among SNVs/InDels detected only in recurrent lesions, 71% had a low variant allele fraction (<10%). Patients were classified into three categories based on the mutation patterns after cancer treatment. A significant association between the major mutation patterns and clinical outcome was observed. Patients whose relapsed tumor had fewer mutations than the diagnostic sample tended to be older, had longer progression-free survival, and achieved complete remission after relapse. In contrast, patients whose genetic profile contained only concordant mutations, without any change, had the worst outcome. CONCLUSIONS: We characterized genomic changes in recurrent pediatric solid tumors. These findings could help to understand the biology of relapsed childhood cancer and to develop personalized treatments based on each patient's genetic profile.
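The abstract reports three paired-sample comparisons: variants retained from diagnosis, variants newly occurring at relapse, and the share of relapse-only variants below 10% VAF. These reduce to set operations on variant calls. The sketch assumes each sample's calls are already filtered and keyed by a hypothetical variant identifier. It does not reproduce the study's three-category patient classification, whose exact rules are not given here.

```python
def compare_lesions(primary, relapse, low_vaf=0.10):
    """primary, relapse: dicts mapping a variant key to its variant allele fraction (VAF)."""
    shared = primary.keys() & relapse.keys()
    new = relapse.keys() - primary.keys()
    return {
        # Share of diagnostic-sample variants still present at relapse.
        "retained_fraction": len(shared) / len(primary) if primary else 0.0,
        # Share of relapse variants absent from the diagnostic sample.
        "new_fraction": len(new) / len(relapse) if relapse else 0.0,
        # Share of relapse-only variants below the VAF cut-off.
        "new_low_vaf_fraction": sum(relapse[v] < low_vaf for v in new) / len(new) if new else 0.0,
        # Number of diagnostic-sample variants not seen at relapse.
        "lost": len(primary.keys() - relapse.keys()),
    }


# Hypothetical variant keys and VAFs.
primary = {"var_a": 0.40, "var_b": 0.30, "var_c": 0.20, "var_d": 0.50}
relapse = {"var_a": 0.45, "var_b": 0.25, "var_e": 0.05, "var_f": 0.30}
print(compare_lesions(primary, relapse))
```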


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417