1.

Latent Factor Analysis to Discover Pathway-Associated Putative Segmental Aneuploidies in Human Cancers

Joseph E. Lucas Hsiu-Ni Kung Jen-Tsan A. Chi 《PLoS computational biology》2010,6(9)

Tumor microenvironmental stresses, such as hypoxia and lactic acidosis, play important roles in tumor progression. Although gene signatures reflecting the influence of these stresses are powerful approaches to link expression with phenotypes, they do not fully reflect the complexity of human cancers. Here, we describe the use of latent factor models to further dissect the stress gene signatures in a breast cancer expression dataset. The genes in these latent factors are coordinately expressed in tumors and depict distinct, interacting components of the biological processes. The genes in several latent factors are highly enriched in chromosomal locations. When these factors are analyzed in independent datasets with gene expression and array CGH data, the expression values of these factors are highly correlated with copy number alterations (CNAs) of the corresponding BAC clones in both the cell lines and tumors. Therefore, variation in the expression of these pathway-associated factors is at least partially caused by variation in gene dosage and CNAs among breast cancers. We have also found the expression of two latent factors without any chromosomal enrichment is highly associated with 12q CNA, likely an instance of “trans”-variations in which CNA leads to the variations in gene expression outside of the CNA region. In addition, we have found that factor 26 (1q CNA) is negatively correlated with HIF-1α protein and hypoxia pathways in breast tumors and cell lines. This agrees with, and for the first time links, known good prognosis associated with both a low hypoxia signature and the presence of CNA in this region. Taken together, these results suggest the possibility that tumor segmental aneuploidy makes significant contributions to variation in the lactic acidosis/hypoxia gene signatures in human cancers and demonstrate that latent factor analysis is a powerful means to uncover such a linkage.

2.

Genetic Background May Contribute to PAM50 Gene Expression Breast Cancer Subtype Assignments

Ying Hu Ling Bai Thomas Geiger Natalie Goldberger Renard C. Walker Jeffery E. Green Lalage M. Wakefield Kent W. Hunter 《PloS one》2013,8(8)

3.

De novo mutational signature discovery in tumor genomes using SparseSignatures

Avantika Lal Keli Liu Robert Tibshirani Arend Sidow Daniele Ramazzotti 《PLoS computational biology》2021,17(6)

Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates a user-specified background signature, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using a variety of standard metrics. We then apply SparseSignatures to whole genome sequences of pancreatic and breast tumors, discovering well-differentiated signatures that are linked to known mutagenic mechanisms and are strongly associated with patient clinical features.

4.

Genetic and Epigenetic Somatic Alterations in Head and Neck Squamous Cell Carcinomas Are Globally Coordinated but Not Locally Targeted

Graham M. Poage Brock C. Christensen E. Andres Houseman Michael D. McClean John K. Wiencke Marshall R. Posner John R. Clark Heather H. Nelson Carmen J. Marsit Karl T. Kelsey 《PloS one》2010,5(3)

Background

Solid tumors, including head and neck squamous cell carcinomas (HNSCC), arise as a result of genetic and epigenetic alterations in a sustained stress environment. Little work has been done that simultaneously examines the spectrum of both types of changes in human tumors on a genome-wide scale and results so far have been limited and mixed. Since it has been hypothesized that epigenetic alterations may act by providing the second carcinogenic hit in gene silencing, we sought to identify genome-wide DNA copy number alterations and CpG dinucleotide methylation events and examine the global/local relationships between these types of alterations in HNSCC.

Methodology/Principal Findings

We have extended a prior analysis of 1,413 cancer-associated loci for epigenetic changes in HNSCC by integrating DNA copy number alterations, measured at 500,000 polymorphic loci, in a case series of 19 primary HNSCC tumors. We have previously demonstrated that local copy number does not bias methylation measurements in this array platform. Importantly, we found that the global pattern of copy number alterations in these tumors was significantly associated with tumor methylation profiles (p<0.002). However at the local level, gene promoter regions did not exhibit a correlation between copy number and methylation (lowest q = 0.3), and the spectrum of genes affected by each type of alteration was unique.

Conclusion/Significance

This work, using a novel and robust statistical approach, demonstrates that, although a “second hit” mechanism is not likely the predominant mode of action for epigenetic dysregulation in cancer, the patterns of methylation events are associated with the patterns of allele loss. Our work further highlights the utility of integrative genomics approaches in exploring the driving somatic alterations in solid tumors.

5.

A Simple Model-Based Approach to Inferring and Visualizing Cancer Mutation Signatures

Yuichi Shiraishi Georg Tremmel Satoru Miyano Matthew Stephens 《PLoS genetics》2015,11(12)

6.

Digital Genotyping of Macrosatellites and Multicopy Genes Reveals Novel Biological Functions Associated with Copy Number Variation of Large Tandem Repeats

Manisha Brahmachary Audrey Guilmatre Javier Quilez Dan Hasson Christelle Borel Peter Warburton Andrew J. Sharp 《PLoS genetics》2014,10(6)

Tandem repeats are common in eukaryotic genomes, but due to difficulties in assaying them remain poorly studied. Here, we demonstrate the utility of Nanostring technology as a targeted approach to perform accurate measurement of tandem repeats even at extremely high copy number, and apply this technology to genotype 165 HapMap samples from three different populations and five species of non-human primates. We observed extreme variability in copy number of tandemly repeated genes, with many loci showing 5–10 fold variation in copy number among humans. Many of these loci show hallmarks of genome assembly errors, and the true copy number of many large tandem repeats is significantly under-represented even in the high quality ‘finished’ human reference assembly. Importantly, we demonstrate that most large tandem repeat variations are not tagged by nearby SNPs, and are therefore essentially invisible to SNP-based GWAS approaches. Using association analysis we identify many cis correlations of large tandem repeat variants with nearby gene expression and DNA methylation levels, indicating that variations of tandem repeat length are associated with functional effects on the local genomic environment. This includes an example where expansion of a macrosatellite repeat is associated with increased DNA methylation and suppression of nearby gene expression, suggesting a mechanism termed “repeat induced gene silencing”, which has previously been observed only in transgenic organisms. We also observed multiple signatures consistent with altered selective pressures at tandemly repeated loci, suggesting important biological functions. Our studies show that tandemly repeated loci represent a highly variable fraction of the genome that have been systematically ignored by most previous studies, copy number variation of which can exert functionally significant effects. 
We suggest that future studies of tandem repeat loci will lead to many novel insights into their role in modulating both genomic and phenotypic diversity.

7.

Network Signatures of Survival in Glioblastoma Multiforme

Vishal N. Patel Giridharan Gokulrangan Salim A. Chowdhury Yanwen Chen Andrew E. Sloan Mehmet Koyutürk Jill Barnholtz-Sloan Mark R. Chance 《PLoS computational biology》2013,9(9)

To determine a molecular basis for prognostic differences in glioblastoma multiforme (GBM), we employed a combinatorial network analysis framework to exhaustively search for molecular patterns in protein-protein interaction (PPI) networks. We identified a dysregulated molecular signature distinguishing short-term (survival<225 days) from long-term (survival>635 days) survivors of GBM using whole genome expression data from The Cancer Genome Atlas (TCGA). A 50-gene subnetwork signature achieved 80% prediction accuracy when tested against an independent gene expression dataset. Functional annotations for the subnetwork signature included “protein kinase cascade,” “IκB kinase/NFκB cascade,” and “regulation of programmed cell death” – all of which were not significant in signatures of existing subtypes. Finally, we used label-free proteomics to examine how our subnetwork signature predicted protein level expression differences in an independent GBM cohort of 16 patients. We found that the genes discovered using network biology had a higher probability of dysregulated protein expression than either genes exhibiting individual differential expression or genes derived from known GBM subtypes. In particular, the long-term survivor subtype was characterized by increased protein expression of DNM1 and MAPK1 and decreased expression of HSPA9, PSMD3, and CANX. Overall, we demonstrate that the combinatorial analysis of gene expression data constrained by PPIs outlines an approach for the discovery of robust and translatable molecular signatures in GBM.

8.

Whole-genome reconstruction and mutational signatures in gastric cancer

《Genome biology》2012,13(12):R115

Background

Gastric cancer is the second highest cause of global cancer mortality. To explore the complete repertoire of somatic alterations in gastric cancer, we combined massively parallel short read and DNA paired-end tag sequencing to present the first whole-genome analysis of two gastric adenocarcinomas, one with chromosomal instability and the other with microsatellite instability.

Results

Integrative analysis and de novo assemblies revealed the architecture of a wild-type KRAS amplification, a common driver event in gastric cancer. We discovered three distinct mutational signatures in gastric cancer: against a genome-wide backdrop of oxidative and microsatellite instability-related mutational signatures, we identified the first exome-specific mutational signature. Further characterization of the impact of these signatures by combining sequencing data from 40 complete gastric cancer exomes and targeted screening of an additional 94 independent gastric tumors uncovered ACVR2A, RPL22 and LMAN1 as recurrently mutated genes in microsatellite instability-positive gastric cancer and PAPPA as a recurrently mutated gene in TP53 wild-type gastric cancer.

Conclusions

These results highlight how whole-genome cancer sequencing can uncover information relevant to tissue-specific carcinogenesis that would otherwise be missed from exome-sequencing data.

9.

Mutation Discovery in Regions of Segmental Cancer Genome Amplifications with CoNAn-SNV: A Mixture Model for Next Generation Sequencing of Tumors

A Crisan R Goya G Ha J Ding LM Prentice A Oloumi J Senz T Zeng K Tse A Delaney MA Marra DG Huntsman M Hirst S Aparicio S Shah 《PloS one》2012,7(8):e41551

Next generation sequencing has now enabled a cost-effective enumeration of the full mutational complement of a tumor genome, in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of SNVs that overlap copy number alterations. The method is based on modelling the notion that genomic regions of segmental duplication and amplification induce an extended genotype space where a subset of genotypes will exhibit heavily skewed allelic distributions in SNVs (and therefore render them undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models where the number of mixtures for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it is able to discover 21 experimentally revalidated somatic non-synonymous mutations that were not detected using copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain where increased sensitivity was conferred. Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.
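
The idea of a Binomial mixture whose number of components depends on the copy number state can be illustrated with a small sketch. This is not the CoNAn-SNV implementation: the function names, the uniform prior over genotypes, and the fixed `error` rate are all assumptions made here for illustration. Each genotype with k variant copies out of `copy_number` is given one Binomial component with success probability k / copy_number.

```python
from math import comb


def genotype_posteriors(alt_reads, depth, copy_number, error=0.01):
    """Posterior over the number of variant copies (0..copy_number).

    Hypothetical illustration: one Binomial component per genotype,
    uniform prior over genotypes, fixed sequencing error rate.
    """
    weights = []
    for k in range(copy_number + 1):
        # Expected variant-allele fraction, clamped away from 0 and 1
        # so that sequencing errors never get zero probability.
        p = min(max(k / copy_number, error), 1.0 - error)
        likelihood = (
            comb(depth, alt_reads) * p**alt_reads * (1 - p) ** (depth - alt_reads)
        )
        weights.append(likelihood)  # uniform prior, so weight = likelihood
    total = sum(weights)
    return [w / total for w in weights]


def prob_variant(alt_reads, depth, copy_number, error=0.01):
    """Probability that at least one copy carries the variant allele."""
    return 1.0 - genotype_posteriors(alt_reads, depth, copy_number, error)[0]
```

Under these assumptions, a skewed count such as 6 variant reads out of 40 is ambiguous when diploidy is assumed (`copy_number=2`) but is called with high confidence when the locus is annotated as five copies (`copy_number=5`), which is the kind of sensitivity gain in amplified regions that the abstract describes.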

10.

The mutational burden of acral melanoma revealed by whole‐genome sequencing and comparative analysis

Gordon Stamp J. Meirion Thomas Andrew Hayes Dirk Strauss Mike Gavrielides Wei Xing Martin Gore James Larkin Richard Marais 《Pigment cell & melanoma research》2014,27(5):835-838

Acral melanoma is a subtype of melanoma with distinct epidemiological, clinical and mutational profiles. To define the genomic alterations in acral melanoma, we conducted whole‐genome sequencing and SNP array analysis of five metastatic tumours and their matched normal genomes. We identified the somatic mutations, copy number alterations and structural variants in these tumours and combined our data with published studies to identify recurrently mutated genes likely to be the drivers of acral melanomagenesis. We compared and contrasted the genomic landscapes of acral, mucosal, uveal and common cutaneous melanoma to reveal the distinctive mutational characteristics of each subtype.

11.

Integrated Multiple “-omics” Data Reveal Subtypes of Hepatocellular Carcinoma

Gang Liu Chuanpeng Dong Lei Liu 《PloS one》2016,11(11)

Hepatocellular carcinoma is one of the most heterogeneous cancers, as reflected by its multiple grades and difficulty to subtype. In this study, we integrated copy number variation, DNA methylation, mRNA, and miRNA data with the developed “cluster of cluster” method and classified 256 HCC samples from TCGA (The Cancer Genome Atlas) into five major subgroups (S1-S5). We observed that this classification was associated with specific mutations and protein expression, and we detected that each subgroup had distinct molecular signatures. The subclasses were associated not only with survival but also with clinical observations. S1 was characterized by bulk amplification on 8q24, TP53 mutation, low lipid metabolism, highly expressed onco-proteins, attenuated tumor suppressor proteins and a worse survival rate. S2 and S3 were characterized by telomere hypomethylation and a low expression of TERT and DNMT1/3B. Compared to S2, S3 was associated with less copy number variation and some good prognosis biomarkers, including CRP and CYP2E1. In contrast, the mutation rate of CTNNB1 was higher in S3. S4 was associated with bulk amplification and various molecular characteristics at different biological levels. In summary, we classified the HCC samples into five subgroups using multiple “-omics” data. Each subgroup had a distinct survival rate and molecular signature, which may provide information about the pathogenesis of subtypes in HCC.

12.

Inferring Adaptive Codon Preference to Understand Sources of Selection Shaping Codon Usage Bias

Janaina Lima de Oliveira Atahualpa Castillo Morales Laurence D Hurst Araxi O Urrutia Christopher R L Thompson Jason B Wolf 《Molecular biology and evolution》2021,38(8):3247

Alternative synonymous codons are often used at unequal frequencies. Classically, studies of such codon usage bias (CUB) attempted to separate the impact of neutral from selective forces by assuming that deviations from a predicted neutral equilibrium capture selection. However, GC-biased gene conversion (gBGC) can also cause deviation from a neutral null. Alternatively, selection has been inferred from CUB in highly expressed genes, but the accuracy of this approach has not been extensively tested, and gBGC can interfere with such extrapolations (e.g., if expression and gene conversion rates covary). It is therefore critical to examine deviations from a mutational null in a species with no gBGC. To achieve this goal, we implement such an analysis in the highly AT rich genome of Dictyostelium discoideum, where we find no evidence of gBGC. We infer neutral CUB under mutational equilibrium to quantify “adaptive codon preference,” a nontautologous genome wide quantitative measure of the relative selection strength driving CUB. We observe signatures of purifying selection consistent with selection favoring adaptive codon preference. Preferred codons are not GC rich, underscoring the independence from gBGC. Expression-associated “preference” largely matches adaptive codon preference but does not wholly capture the influence of selection shaping patterns across all genes, suggesting selective constraints associated specifically with high expression. We observe patterns consistent with effects on mRNA translation and stability shaping adaptive codon preference. Thus, our approach to quantifying adaptive codon preference provides a framework for inferring the sources of selection that shape CUB across different contexts within the genome.

13.

High-depth sequencing of over 750 genes supports linear progression of primary tumors and metastases in most patients with liver-limited metastatic colorectal cancer

Iain Beehuat Tan Simeen Malik Kalpana Ramnarayanan John R McPherson Dan Liang Ho Yuka Suzuki Sarah Boonhsui Ng Su Yan Kiat Hon Lim Dennis Koh Chew Min Hoe Chung Yip Chan Rachel Ten Brian KP Goh Alexander YF Chung Joanna Tan Cheryl Xueli Chan Su Ting Tay Lezhava Alexander Niranjan Nagarajan Axel M Hillmer Choon Leong Tang Clarinda Chua Bin Tean Teh Steve Rozen Patrick Tan 《Genome biology》2015,16(1)

Background

Colorectal cancer with metastases limited to the liver (liver-limited mCRC) is a distinct clinical subset characterized by possible cure with surgery. We performed high-depth sequencing of over 750 cancer-associated genes and copy number profiling in matched primary, metastasis and normal tissues to characterize genomic progression in 18 patients with liver-limited mCRC.

Results

High depth Illumina sequencing and use of three different variant callers enable comprehensive and accurate identification of somatic variants down to 2.5% variant allele frequency. We identify a median of 11 somatic single nucleotide variants (SNVs) per tumor. Across patients, a median of 79.3% of somatic SNVs present in the primary are present in the metastasis and 81.7% of all alterations present in the metastasis are present in the primary. Private alterations are found at lower allele frequencies; a different mutational signature characterized shared and private variants, suggesting distinct mutational processes. Using B-allele frequencies of heterozygous germline SNPs and copy number profiling, we find that broad regions of allelic imbalance and focal copy number changes, respectively, are generally shared between the primary tumor and metastasis.

Conclusions

Our analyses point to high genomic concordance of primary tumor and metastasis, with a thick common trunk and smaller genomic branches in general support of the linear progression model in most patients with liver-limited mCRC. More extensive studies are warranted to further characterize genomic progression in this important clinical population.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0589-1) contains supplementary material, which is available to authorized users.

14.

The dChip survival analysis module for microarray data

Samir B Amin Parantu K Shah Aimin Yan Sophia Adamia Stéphane Minvielle Hervé Avet-Loiseau Nikhil C Munshi Cheng Li 《BMC bioinformatics》2011,12(1):72

Background

Genome-wide expression signatures are emerging as potential markers for overall survival and disease recurrence risk, as evidenced by the recent commercialization of gene expression based biomarkers in breast cancer. Similar predictions have recently been carried out using genome-wide copy number alterations and microRNAs. Existing software packages for microarray data analysis provide functions to define expression-based survival gene signatures. However, there is no software that can perform survival analysis using SNP array data or draw survival curves interactively for expression-based sample clusters.

15.

Polycomb repressive complex 2 epigenomic signature defines age-associated hypermethylation and gene expression changes

Mikhail G Dozmorov 《Epigenetics》2015,10(6):484-495

16.

Digital karyotyping: an update of its applications in cancer

Salani R Chang CL Cope L Wang TL 《Molecular diagnosis & therapy》2006,10(4):231-237

DNA copy number alterations, including entire chromosomal changes and small interstitial DNA amplifications and deletions, characterize the development of cancer. These changes usually affect the expression of target genes and subsequently the function of the target proteins. Since the completion of the human genome project, the capacity to comprehensively analyze the human cancer genome has expanded significantly. Techniques such as digital karyotyping have been developed to allow for the detection of DNA copy number alterations in cancer at the whole-genome scale. When compared with conventional methods such as spectral karyotyping, representational difference analysis, comparative genomic hybridization (CGH), or the more recent array CGH, digital karyotyping provides an evaluation of copy number of genetic material at higher resolution. Digital karyotyping has therefore promised to enhance our understanding of the cancer genome. This article provides an overview of digital karyotyping including the principle of the technology and its applications in identifying potential oncogenes and tumor suppressor genes.

17.

Bortezomib Resistance Can Be Reversed by Induced Expression of Plasma Cell Maturation Markers in a Mouse In Vitro Model of Multiple Myeloma

Holly A. F. Stessman Aatif Mansoor Fenghuang Zhan Michael A. Linden Brian Van Ness Linda B. Baughn 《PloS one》2013,8(10)

Multiple myeloma (MM), the second most common hematopoietic malignancy, remains an incurable plasma cell (PC) neoplasm. While the proteasome inhibitor bortezomib (Bz) has increased patient survival, resistance represents a major treatment obstacle as most patients ultimately relapse, becoming refractory to additional Bz therapy. Current tests fail to detect emerging resistance; by the time patients acquire resistance, their prognosis is often poor. To establish immunophenotypic signatures that predict Bz sensitivity, we utilized Bz-sensitive and -resistant cell lines derived from tumors of the Bcl-X_L/Myc mouse model of PC malignancy. We identified significantly reduced expression of two markers (CD93, CD69) in “acquired” (Bz-selected) resistant cells. Using this phenotypic signature, we isolated a subpopulation of cells from a drug-naïve, Bz-sensitive culture that displayed “innate” resistance to Bz. Although these genes were identified as biomarkers, they may indicate a mechanism for Bz-resistance through the loss of PC maturation which may be induced and/or selected by Bz. Significantly, induction of PC maturation in both “acquired” and “innate” resistant cells restored Bz sensitivity, suggesting a novel therapeutic approach for reversing Bz resistance in refractory MM.

18.

Detection and characterization of horizontal transfers in prokaryotes using genomic signature (total citations: 6; self-citations: 0; citations by others: 6)

Dufraigne C Fertil B Lespinats S Giron A Deschavanne P 《Nucleic acids research》2005,33(1):e6

Horizontal DNA transfer is an important factor of evolution and participates in biological diversity. Unfortunately, the location and length of horizontal transfers (HTs) are known for very few species. The usage of short oligonucleotides in a sequence (the so-called genomic signature) has been shown to be species-specific even in DNA fragments as short as 1 kb. The genomic signature is therefore proposed as a tool to detect HTs. Since DNA transfers originate from species with a signature different from those of the recipient species, the analysis of local variations of signature along the recipient genome may allow for detecting exogenous DNA. The strategy consists of (i) scanning the genome with a sliding window and calculating the corresponding local signature, (ii) evaluating its deviation from the signature of the whole genome, and (iii) looking for similar signatures in a database of genomic signatures. A total of 22 prokaryote genomes are analyzed in this way. It has been observed that atypical regions make up ~6% of each genome on average. Most of the claimed HTs as well as new ones are detected. The origin of putative DNA transfers is looked for among ~12000 species. Donor species are proposed and sometimes strongly suggested, considering similarity of signatures. Among the species studied, Bacillus subtilis, Haemophilus influenzae and Escherichia coli are investigated by many authors and give the opportunity to perform a thorough comparison of most of the bioinformatics methods used to detect HTs.
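
Steps (i) and (ii) of the strategy described in this abstract can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the function names, the default word length, window size and step, and the use of Euclidean distance as the deviation measure are choices made here and are not taken from the paper. Step (iii), the database lookup, is omitted.

```python
from collections import Counter
from math import sqrt


def signature(seq, k=4):
    """Relative frequency of each overlapping k-letter word in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def deviation(local, genome):
    """Euclidean distance between two signatures (an assumed metric)."""
    words = set(local) | set(genome)
    return sqrt(sum((local.get(w, 0.0) - genome.get(w, 0.0)) ** 2 for w in words))


def scan(genome_seq, window=1000, step=500, k=4):
    """Return (start, deviation) for every window along the genome."""
    whole = signature(genome_seq, k)
    return [
        (start, deviation(signature(genome_seq[start:start + window], k), whole))
        for start in range(0, len(genome_seq) - window + 1, step)
    ]
```

Windows whose deviation is unusually large relative to the rest of the genome would be the candidates for atypical, possibly transferred regions. How "unusually large" is defined (for example, a percentile cut-off) is left open here.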

19.

Evolutionary Signatures amongst Disease Genes Permit Novel Methods for Gene Prioritization and Construction of Informative Gene-Based Networks

Nolan Priedigkeit Nicholas Wolfe Nathan L. Clark 《PLoS genetics》2015,11(2)

Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC), is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases possess an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting “disease map” network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.

20.

Examination of Genome Homogeneity in Prokaryotes Using Genomic Signatures

Jon Bohlin Eystein Skjerve 《PloS one》2009,4(12)

Background

DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a “genomic signature.” The genomic signatures can be used to phylogenetically classify organisms from arbitrary sampled DNA. Genomic signatures can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of the genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th order Markov model) as well as genomic signatures normalized by smaller DNA words (1st and 2nd order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted, with intra-genomic signature variance as the response variable, to a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies, as predictors.

Principal Findings

Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.

Conclusions

Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.