首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 171 毫秒
1.

Background

The correct taxonomic assignment of bacterial genomes is a primary and challenging task. With the availability of whole genome sequences, the gene content based approaches appear promising in inferring the bacterial taxonomy. The complete genome sequencing of a bacterial genome often reveals a substantial number of unique genes present only in that genome which can be used for its taxonomic classification.

Results

In this study, we have proposed a comprehensive method which uses the taxon-specific genes for the correct taxonomic assignment of existing and new bacterial genomes. The taxon-specific genes identified at each taxonomic rank have been successfully used for the taxonomic classification of 2,342 genomes present in the NCBI genomes, 36 newly sequenced genomes, and 17 genomes for which the complete taxonomy is not yet known. This approach has been implemented for the development of a tool ‘Microtaxi’ which can be used for the taxonomic assignment of complete bacterial genomes.

Conclusion

The taxon-specific gene based approach provides an alternate valuable methodology to carry out the taxonomic classification of newly sequenced or existing bacterial genomes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1542-0) contains supplementary material, which is available to authorized users.  相似文献   

2.

Background

One of the goals of genomics is to identify the genetic loci responsible for variation in phenotypic traits. The completion of the tomato genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of genetic variation present in the tomato genome. Like many self-pollinated crops, cultivated tomato accessions show a low molecular but high phenotypic diversity. Here we describe the whole-genome resequencing of eight accessions (four cherry-type and four large fruited lines) chosen to represent a large range of intra-specific variability and the identification and annotation of novel polymorphisms.

Results

The eight genomes were sequenced using the GAII Illumina platform. Comparison of the sequences with the reference genome yielded more than 4 million single nucleotide polymorphisms (SNPs). This number varied from 80,000 to 1.5 million according to the accessions. Almost 128,000 InDels were detected. The distribution of SNPs and InDels across and within chromosomes was highly heterogeneous revealing introgressions from wild species and the mosaic structure of the genomes of the cherry tomato accessions. In-depth annotation of the polymorphisms identified more than 16,000 unique non-synonymous SNPs. In addition 1,686 putative copy-number variations (CNVs) were identified.

Conclusions

This study represents the first whole genome resequencing experiment in cultivated tomato. Substantial genetic differences exist between the sequenced tomato accessions and the reference sequence. The heterogeneous distribution of the polymorphisms may be related to introgressions that occurred during domestication or breeding. The annotated SNPs, InDels and CNVs identified in this resequencing study will serve as useful genetic tools, and as candidate polymorphisms in the search for phenotype-altering DNA variations.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-791) contains supplementary material, which is available to authorized users.  相似文献   

3.
4.

Background

Ultra high throughput sequencing (UHTS) technologies find an important application in targeted resequencing of candidate genes or of genomic intervals from genetic association studies. Despite the extraordinary power of these new methods, they are still rarely used in routine analysis of human genomic variants, in part because of the absence of specific standard procedures. The aim of this work is to provide human molecular geneticists with a tool to evaluate the best UHTS methodology for efficiently detecting DNA changes, from common SNPs to rare mutations.

Methodology/Principal Findings

We tested the three most widespread UHTS platforms (Roche/454 GS FLX Titanium, Illumina/Solexa Genome Analyzer II and Applied Biosystems/SOLiD System 3) on a well-studied region of the human genome containing many polymorphisms and a very rare heterozygous mutation located within an intronic repetitive DNA element. We identify the qualities and the limitations of each platform and describe some peculiarities of UHTS in resequencing projects.

Conclusions/Significance

When appropriate filtering and mapping procedures are applied UHTS technology can be safely and efficiently used as a tool for targeted human DNA variations detection. Unless particular and platform-dependent characteristics are needed for specific projects, the most relevant parameter to consider in mainstream human genome resequencing procedures is the cost per sequenced base-pair associated to each machine.  相似文献   

5.

Background

In the post genome era, a major goal of biology is the identification of specific roles for individual genes. We report a new genomic tool for gene characterization, the UCLA Gene Expression Tool (UGET).

Results

Celsius, the largest co-normalized microarray dataset of Affymetrix based gene expression, was used to calculate the correlation between all possible gene pairs on all platforms, and generate stored indexes in a web searchable format. The size of Celsius makes UGET a powerful gene characterization tool. Using a small seed list of known cartilage-selective genes, UGET extended the list of known genes by identifying 32 new highly cartilage-selective genes. Of these, 7 of 10 tested were validated by qPCR including the novel cartilage-specific genes SDK2 and FLJ41170. In addition, we retrospectively tested UGET and other gene expression based prioritization tools to identify disease-causing genes within known linkage intervals. We first demonstrated this utility with UGET using genetically heterogeneous disorders such as Joubert syndrome, microcephaly, neuropsychiatric disorders and type 2 limb girdle muscular dystrophy (LGMD2) and then compared UGET to other gene expression based prioritization programs which use small but discrete and well annotated datasets. Finally, we observed a significantly higher gene correlation shared between genes in disease networks associated with similar complex or Mendelian disorders.

Discussion

UGET is an invaluable resource for a geneticist that permits the rapid inclusion of expression criteria from one to hundreds of genes in genomic intervals linked to disease. By using thousands of arrays UGET annotates and prioritizes genes better than other tools especially with rare tissue disorders or complex multi-tissue biological processes. This information can be critical in prioritization of candidate genes for sequence analysis.  相似文献   

6.

Background

Colorectal cancer is a major contributor to cancer morbidity and mortality. Tandem repeat instability and its effect on cancer phenotypes remain so far poorly studied on a genome-wide scale.

Results

Here we analyze the genomes of 35 colorectal tumors and their matched normal (healthy) tissues for two types of tandem repeat instability, de-novo repeat gain or loss and repeat copy number variation. Specifically, we study for the first time genome-wide repeat instability in the promoters and exons of 18,439 genes, and examine the association of repeat instability with genome-scale gene expression levels. We find that tumors with a microsatellite instable (MSI) phenotype are enriched in genes with repeat instability, and that tumor genomes have significantly more genes with repeat instability compared to healthy tissues. Genes in tumor genomes with repeat instability in their promoters are significantly less expressed and show slightly higher levels of methylation. Genes in well-studied cancer-associated signaling pathways also contain significantly more unstable repeats in tumor genomes. Genes with such unstable repeats in the tumor-suppressor p53 pathway have lower expression levels, whereas genes with repeat instability in the MAPK and Wnt signaling pathways are expressed at higher levels, consistent with the oncogenic role they play in cancer.

Conclusions

Our results suggest that repeat instability in gene promoters and associated differential gene expression may play an important role in colorectal tumors, which is a first step towards the development of more effective molecular diagnostic approaches centered on repeat instability.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1902-9) contains supplementary material, which is available to authorized users.  相似文献   

7.
Kao CF  Fang YS  Zhao Z  Kuo PH 《PloS one》2011,6(4):e18696

Background

Large scale and individual genetic studies have suggested numerous susceptible genes for depression in the past decade without conclusive results. There is a strong need to review and integrate multi-dimensional data for follow up validation. The present study aimed to apply prioritization procedures to build-up an evidence-based candidate genes dataset for depression.

Methods

Depression candidate genes were collected in human and animal studies across various data resources. Each gene was scored according to its magnitude of evidence related to depression and was multiplied by a source-specific weight to form a combined score measure. All genes were evaluated through a prioritization system to obtain an optimal weight matrix to rank their relative importance with depression using the combined scores. The resulting candidate gene list for depression (DEPgenes) was further evaluated by a genome-wide association (GWA) dataset and microarray gene expression in human tissues.

Results

A total of 5,055 candidate genes (4,850 genes from human and 387 genes from animal studies with 182 being overlapped) were included from seven data sources. Through the prioritization procedures, we identified 169 DEPgenes, which exhibited high chance to be associated with depression in GWA dataset (Wilcoxon rank-sum test, p = 0.00005). Additionally, the DEPgenes had a higher percentage to express in human brain or nerve related tissues than non-DEPgenes, supporting the neurotransmitter and neuroplasticity theories in depression.

Conclusions

With comprehensive data collection and curation and an application of integrative approach, we successfully generated DEPgenes through an effective gene prioritization system. The prioritized DEPgenes are promising for future biological experiments or replication efforts to discoverthe underlying molecular mechanisms for depression.  相似文献   

8.

Background

In the honeybee Apis mellifera, the bacterial gut community is consistently colonized by eight distinct phylotypes of bacteria. Managed bee colonies are of considerable economic interest and it is therefore important to elucidate the diversity and role of this microbiota in the honeybee. In this study, we have sequenced the genomes of eleven strains of lactobacilli and bifidobacteria isolated from the honey crop of the honeybee A. mellifera.

Results

Single gene phylogenies confirmed that the isolated strains represent the diversity of lactobacilli and bifidobacteria in the gut, as previously identified by 16S rRNA gene sequencing. Core genome phylogenies of the lactobacilli and bifidobacteria further indicated extensive divergence between strains classified as the same phylotype. Phylotype-specific protein families included unique surface proteins. Within phylotypes, we found a remarkably high level of gene content diversity. Carbohydrate metabolism and transport functions contributed up to 45% of the accessory genes, with some genomes having a higher content of genes encoding phosphotransferase systems for the uptake of carbohydrates than any previously sequenced genome. These genes were often located in highly variable genomic segments that also contained genes for enzymes involved in the degradation and modification of sugar residues. Strain-specific gene clusters for the biosynthesis of exopolysaccharides were identified in two phylotypes. The dynamics of these segments contrasted with low recombination frequencies and conserved gene order structures for the core genes. Hits for CRISPR spacers were almost exclusively found within phylotypes, suggesting that the phylotypes are associated with distinct phage populations.

Conclusions

The honeybee gut microbiota has been described as consisting of a modest number of phylotypes; however, the genomes sequenced in the current study demonstrated a very high level of gene content diversity within all three described phylotypes of lactobacilli and bifidobacteria, particularly in terms of metabolic functions and surface structures, where many features were strain-specific. Together, these results indicate niche differentiation within phylotypes, suggesting that the honeybee gut microbiota is more complex than previously thought.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1476-6) contains supplementary material, which is available to authorized users.  相似文献   

9.
10.

Background

DNA methylation is associated with aberrant gene expression in cancer, and has been shown to correlate with therapeutic response and disease prognosis in some types of cancer. We sought to investigate the biological significance of DNA methylation in lung cancer.

Results

We integrated the gene expression profiles and data of gene promoter methylation for a large panel of non-small cell lung cancer cell lines, and identified 578 candidate genes with expression levels that were inversely correlated to the degree of DNA methylation. We found these candidate genes to be differentially methylated in normal lung tissue versus non-small cell lung cancer tumors, and segregated by histologic and tumor subtypes. We used gene set enrichment analysis of the genes ranked by the degree of correlation between gene expression and DNA methylation to identify gene sets involved in cellular migration and metastasis. Our unsupervised hierarchical clustering of the candidate genes segregated cell lines according to the epithelial-to-mesenchymal transition phenotype. Genes related to the epithelial-to-mesenchymal transition, such as AXL, ESRP1, HoxB4, and SPINT1/2, were among the nearly 20% of the candidate genes that were differentially methylated between epithelial and mesenchymal cells. Greater numbers of genes were methylated in the mesenchymal cells and their expressions were upregulated by 5-azacytidine treatment. Methylation of the candidate genes was associated with erlotinib resistance in wild-type EGFR cell lines. The expression profiles of the candidate genes were associated with 8-week disease control in patients with wild-type EGFR who had unresectable non-small cell lung cancer treated with erlotinib, but not in patients treated with sorafenib.

Conclusions

Our results demonstrate that the underlying biology of genes regulated by DNA methylation may have predictive value in lung cancer that can be exploited therapeutically.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1079) contains supplementary material, which is available to authorized users.  相似文献   

11.

Background

Alzheimer’s disease (AD) is one of the leading genetically complex and heterogeneous disorder that is influenced by both genetic and environmental factors. The underlying risk factors remain largely unclear for this heterogeneous disorder. In recent years, high throughput methodologies, such as genome-wide linkage analysis (GWL), genome-wide association (GWA) studies, and genome-wide expression profiling (GWE), have led to the identification of several candidate genes associated with AD. However, due to lack of consistency within their findings, an integrative approach is warranted. Here, we have designed a rank based gene prioritization approach involving convergent analysis of multi-dimensional data and protein-protein interaction (PPI) network modelling.

Results

Our approach employs integration of three different AD datasets- GWL,GWA and GWE to identify overlapping candidate genes ranked using a novel cumulative rank score (SR) based method followed by prioritization using clusters derived from PPI network. SR for each gene is calculated by addition of rank assigned to individual gene based on either p value or score in three datasets. This analysis yielded 108 plausible AD genes. Network modelling by creating PPI using proteins encoded by these genes and their direct interactors resulted in a layered network of 640 proteins. Clustering of these proteins further helped us in identifying 6 significant clusters with 7 proteins (EGFR, ACTB, CDC2, IRAK1, APOE, ABCA1 and AMPH) forming the central hub nodes. Functional annotation of 108 genes revealed their role in several biological activities such as neurogenesis, regulation of MAP kinase activity, response to calcium ion, endocytosis paralleling the AD specific attributes. Finally, 3 potential biochemical biomarkers were found from the overlap of 108 AD proteins with proteins from CSF and plasma proteome. EGFR and ACTB were found to be the two most significant AD risk genes.

Conclusions

With the assumption that common genetic signals obtained from different methodological platforms might serve as robust AD risk markers than candidates identified using single dimension approach, here we demonstrated an integrated genomic convergence approach for disease candidate gene prioritization from heterogeneous data sources linked to AD.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-199) contains supplementary material, which is available to authorized users.  相似文献   

12.

Background

Genome-wide association studies (GWASs) and global profiling of gene expression (microarrays) are two major technological breakthroughs that allow hypothesis-free identification of candidate genes associated with tumorigenesis. It is not obvious whether there is a consistency between the candidate genes identified by GWAS (GWAS genes) and those identified by profiling gene expression (microarray genes).

Methodology/Principal Findings

We used the Cancer Genetic Markers Susceptibility database to retrieve single nucleotide polymorphisms from candidate genes for prostate cancer. In addition, we conducted a large meta-analysis of gene expression data in normal prostate and prostate tumor tissue. We identified 13,905 genes that were interrogated by both GWASs and microarrays. On the basis of P values from GWASs, we selected 1,649 most significantly associated genes for functional annotation by the Database for Annotation, Visualization and Integrated Discovery. We also conducted functional annotation analysis using same number of the top genes identified in the meta-analysis of the gene expression data. We found that genes involved in cell adhesion were overrepresented among both the GWAS and microarray genes.

Conclusions/Significance

We conclude that the results of these analyses suggest that combining GWAS and microarray data would be a more effective approach than analyzing individual datasets and can help to refine the identification of candidate genes and functions associated with tumor development.  相似文献   

13.

Background

Stochastic fluctuations in the protein turnover underlie the random emergence of neural precursor cells from initially homogenous cell population. If stochastic alteration of the levels in signal transduction networks is sufficient to spontaneously alter a phenotype, can it cause a sporadic chronic disease as well – including cancer?

Methods

Expression in >80 disease-free tissue environments was measured using Affymetrix microarray platform comprising 54675 probe-sets. Steps were taken to suppress the technical noise inherent to microarray experiment. Next, the integrated expression and expression variability data were aligned with the mechanistic data covering major human chronic diseases.

Results

Measured as class average, variability of expression of disease associated genes measured in health was higher than variability of random genes for all chronic pathologies. Anti-cancer FDA approved targets were displaying much higher variability as a class compared to random genes. Same held for magnitude of gene expression. The genes known to participate in multiple chronic disorders demonstrated the highest variability. Disease-related gene categories displayed on average more intricate regulation of biological function vs random reference, were enriched in adaptive and transient functions as well as positive feedback relationships.

Conclusions

A possible causative link can be suggested between normal (healthy) state gene expression variation and inception of major human pathologies, including cancer. Study of variability profiles may lead to novel diagnostic methods, therapies and better drug target prioritization. The results of the study suggest the need to advance personalized therapy development.  相似文献   

14.

Background

Accurate detection of characteristic proteins secreted by colon cancer tumor cells in biological fluids could serve as a biomarker for the disease. The aim of the present study was to identify and validate new serum biomarkers and demonstrate their potential usefulness for early diagnosis of colon cancer.

Methods

The study was organized in three sequential phases: 1) biomarker discovery, 2) technical and biological validation, and 3) proof of concept to test the potential clinical use of selected biomarkers. A prioritized subset of the differentially-expressed genes between tissue types (50 colon mucosa from cancer-free individuals and 100 normal-tumor pairs from colon cancer patients) was validated and further tested in a series of serum samples from 80 colon cancer cases, 23 patients with adenoma and 77 cancer-free controls.

Results

In the discovery phase, 505 unique candidate biomarkers were identified, with highly significant results and high capacity to discriminate between the different tissue types. After a subsequent prioritization, all tested genes (N = 23) were successfully validated in tissue, and one of them, COL10A1, showed relevant differences in serum protein levels between controls, patients with adenoma (p = 0.0083) and colon cancer cases (p = 3.2e-6).

Conclusion

We present a sequential process for the identification and further validation of biomarkers for early detection of colon cancer that identifies COL10A1 protein levels in serum as a potential diagnostic candidate to detect both adenoma lesions and tumor.

Impact

The use of a cheap serum test for colon cancer screening should improve its participation rates and contribute to decrease the burden of this disease.  相似文献   

15.

Introduction

Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research.

Methods

We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome.

Results

Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect.

Conclusions

Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects.  相似文献   

16.

Purpose

FKBP51, (FKBP5), is a negative regulator of Akt. Variability in FKBP5 expression level is a major factor contributing to variation in response to chemotherapeutic agents including gemcitabine, a first line treatment for pancreatic cancer. Genetic variation in FKBP5 could influence its function and, ultimately, treatment response of pancreatic cancer.

Experimental Design

We set out to comprehensively study the role of genetic variation in FKBP5 identified by Next Generation DNA resequencing on response to gemcitabine treatment of pancreatic cancer by utilizing both tumor and germline DNA samples from 43 pancreatic cancer patients, including 19 paired normal-tumor samples. Next, genotype-phenotype association studies were performed with overall survival as well as with FKBP5 gene expression in tumor using the same samples in which resequencing had been performed, followed by functional genomics studies.

Results

In-depth resequencing identified 404 FKBP5 single nucleotide polymorphisms (SNPs) in normal and tumor DNA. SNPs with the strongest associations with survival or FKBP5 expression were subjected to functional genomic study. Electromobility shift assay showed that the rs73748206 “A(T)” SNP altered DNA-protein binding patterns, consistent with significantly increased reporter gene activity, possibly through its increased binding to Glucocorticoid Receptor (GR). The effect of rs73748206 was confirmed on the basis of its association with FKBP5 expression by affecting the binding to GR in lymphoblastoid cell lines derived from the same patients for whom DNA was used for resequencing.

Conclusion

This comprehensive FKBP5 resequencing study provides insights into the role of genetic variation in variation of gemcitabine response.  相似文献   

17.
Valor LM  Grant SG 《PloS one》2007,2(12):e1303

Background

Gene expression profiling using microarrays is a powerful technology widely used to study regulatory networks. Profiling of mRNA levels in mutant organisms has the potential to identify genes regulated by the mutated protein.

Methodology/Principle Findings

Using tissues from multiple lines of knockout mice we have examined genome-wide changes in gene expression. We report that a significant proportion of changed genes were found near the targeted gene.

Conclusions/Significance

The apparent clustering of these genes was explained by the presence of flanking DNA from the parental ES cell. We provide recommendations for the analysis and reporting of microarray data from knockout mice  相似文献   

18.

Background

Recently, a number of studies have performed genome or exome sequencing of hepatocellular carcinoma (HCC) and identified hundreds or even thousands of mutations in protein-coding genes. However, these studies have only focused on a limited number of candidate genes, and many important mutation resources remain to be explored.

Principal Findings

In this study, we integrated mutation data obtained from various sources and performed pathway and network analysis. We identified 113 pathways that were significantly mutated in HCC samples and found that the mutated genes included in these pathways contained high percentages of known cancer genes, and damaging genes and also demonstrated high conservation scores, indicating their important roles in liver tumorigenesis. Five classes of pathways that were mutated most frequently included (a) proliferation and apoptosis related pathways, (b) tumor microenvironment related pathways, (c) neural signaling related pathways, (d) metabolic related pathways, and (e) circadian related pathways. Network analysis further revealed that the mutated genes with the highest betweenness coefficients, such as the well-known cancer genes TP53, CTNNB1 and recently identified novel mutated genes GNAL and the ADCY family, may play key roles in these significantly mutated pathways. Finally, we highlight several key genes (e.g., RPS6KA3 and PCLO) and pathways (e.g., axon guidance) in which the mutations were associated with clinical features.

Conclusions

Our workflow illustrates the increased statistical power of integrating multiple studies of the same subject, which can provide biological insights that would otherwise be masked under individual sample sets. This type of bioinformatics approach is consistent with the necessity of making the best use of the ever increasing data provided in valuable databases, such as TCGA, to enhance the speed of deciphering human cancers.  相似文献   

19.

Introduction

Heredity is estimated to cause at least 20% of colorectal cancer. The hereditary nonpolyposis colorectal cancer subset is divided into Lynch syndrome and familial colorectal cancer type X (FCCTX) based on presence of mismatch repair (MMR) gene defects.

Purpose

We addressed the gene expression signatures in colorectal cancer linked to Lynch syndrome and FCCTX with the aim to identify candidate genes and to map signaling pathways relevant in hereditary colorectal carcinogenesis.

Experimental design

The 18 k whole-genome c-DNA-mediated annealing, selection, extension, and ligation (WG-DASL) assay was applied to 123 colorectal cancers, including 39 Lynch syndrome tumors and 37 FCCTX tumors. Target genes were technically validated using real-time quantitative RT-PCR (qRT-PCR) and the expression signature was validated in independent datasets.

Results

Colorectal cancers linked to Lynch syndrome and FCCTX showed distinct gene expression profiles, which by significance analysis of microarrays (SAM) differed by 2188 genes. Functional pathways involved were related to G-protein coupled receptor signaling, oxidative phosphorylation, and cell cycle function and mitosis. qRT-PCR verified altered expression of the selected genes NDUFA9, AXIN2, MYC, DNA2 and H2AFZ. Application of the 2188-gene signature to independent datasets showed strong correlation to MMR status.

Conclusion

Distinct genetic profiles and deregulation of different canonical pathways apply to Lynch syndrome and FCCTX and key targets herein may be relevant to pursue for refined diagnostic and therapeutic strategies in hereditary colorectal cancer.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号