Similar literature
 Found 20 similar articles; search took 15 ms
1.
“Big” molecules such as proteins and genes still continue to capture the imagination of most biologists, biochemists and bioinformaticians. “Small” molecules, on the other hand, are the molecules that most biologists, biochemists and bioinformaticians prefer to ignore. However, it is becoming increasingly apparent that small molecules such as amino acids, lipids and sugars play a far more important role in all aspects of disease etiology and disease treatment than we realized. This particular chapter focuses on an emerging field of bioinformatics called “chemical bioinformatics” – a discipline that has evolved to help address the blended chemical and molecular biological needs of toxicogenomics, pharmacogenomics, metabolomics and systems biology. In the following pages we will cover several topics related to chemical bioinformatics. First, a brief overview of some of the most important or useful chemical bioinformatic resources will be given. Second, a more detailed overview will be given on those particular resources that allow researchers to connect small molecules to diseases. This section will focus on describing a number of recently developed databases or knowledgebases that explicitly relate small molecules – either as the treatment, symptom or cause – to disease. Finally a short discussion will be provided on newly emerging software tools that exploit these databases as a means to discover new biomarkers or even new treatments for disease.

What to Learn in This Chapter

  • The meaning of chemical bioinformatics
  • Strengths and limitations of existing chemical bioinformatic databases
  • Using databases to learn about the cause and treatment of diseases
  • The Small Molecule Pathway Database (SMPDB)
  • The Human Metabolome Database (HMDB)
  • DrugBank
  • The Toxin and Toxin-Target Database (T3DB)
  • PolySearch and Metabolite Set Enrichment Analysis
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

2.
Differences between individual human genomes, or between human and cancer genomes, range in scale from single nucleotide variants (SNVs) through intermediate and large-scale duplications, deletions, and rearrangements of genomic segments. The latter class, called structural variants (SVs), has received considerable attention in the past several years as a previously underappreciated source of variation in human genomes. Much of this recent attention is the result of the availability of higher-resolution technologies for measuring these variants, including both microarray-based techniques and, more recently, high-throughput DNA sequencing. We describe the genomic technologies and computational techniques currently used to measure SVs, focusing on applications in human and cancer genomics.

What to Learn in This Chapter

  • Current knowledge about the prevalence of structural variation in human and cancer genomes.
  • Strategies for using microarray and high-throughput DNA sequencing technologies to measure structural variation.
  • Computational techniques to detect structural variants from DNA sequencing data.
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.
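One common sequencing-based strategy is to look for read pairs whose mapped insert size departs from the library's typical value. The following is a minimal sketch of that idea, not a method taken from the chapter: the function name, the median/MAD threshold, and the insert sizes are all hypothetical.

```python
from statistics import median

def classify_pairs(insert_sizes, n_mads=5.0):
    """Flag read pairs whose mapped insert size is an outlier.

    Pairs mapping much farther apart than expected hint at a deletion in
    the sample; pairs mapping much closer together hint at an insertion.
    The median/MAD cutoff is an illustrative assumption.
    """
    med = median(insert_sizes)
    # Median absolute deviation; fall back to 1.0 if all sizes are identical.
    mad = median([abs(x - med) for x in insert_sizes]) or 1.0
    calls = []
    for i, size in enumerate(insert_sizes):
        if size > med + n_mads * mad:
            calls.append((i, "deletion-like"))
        elif size < med - n_mads * mad:
            calls.append((i, "insertion-like"))
    return calls

# Hypothetical insert sizes (bp) for ten read pairs.
sizes = [300, 310, 295, 305, 290, 1500, 300, 120, 308, 298]
print(classify_pairs(sizes))
```

Real callers combine this signal with read depth, split reads, and clustering of supporting pairs before reporting a variant.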

3.

Introduction

Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research.

Methods

We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome.

Results

Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect.

Conclusions

Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects.
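The central moments analysis mentioned above can be illustrated with a short sketch. It computes sample skewness and excess kurtosis from the second, third, and fourth central moments; both are zero for a normal distribution. The function names and toy values are assumptions for illustration, not the authors' code or data.

```python
def central_moment(xs, k):
    """k-th central moment of a sample (population form, divides by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

# Hypothetical expression values with one extreme measurement.
values = [1.0, 1.0, 1.0, 1.0, 10.0]
print(skewness(values), excess_kurtosis(values))
```

A formal test of normality would additionally compare these statistics against their sampling distributions.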

4.

Background

Ocimum sanctum L. (O. tenuiflorum), family Lamiaceae, is an important component of the Indian tradition of medicine as well as of culture around the world, and hence is known as “Holy basil” in India. This plant is mentioned in the ancient texts of Ayurveda as an “elixir of life” (life saving) herb and has been worshipped for over 3000 years due to its healing properties. Although it is used in various ailments, validation of molecules for differential activities is yet to be fully analyzed, as about 80 % of the patents on this plant are on extracts or the plant parts, and mainly focussed on essential oil components. With a view to understanding the full metabolic potential of this plant, whole nuclear and chloroplast genomes were sequenced for the first time, combining the sequence data from four libraries and three NGS platforms.

Results

The saturated draft assembly of the genome was about 386 Mb, along with the plastid genome of 142,245 bp, turning out to be the smallest in Lamiaceae. In addition to SSR markers, 136 proteins were identified as homologous to five important plant genomes. Pathway analysis indicated an abundance of phenylpropanoids in O. sanctum. Phylogenetic analysis for chloroplast proteome placed Salvia miltiorrhiza as the nearest neighbor. Comparison of the chemical compounds and genes availability in O. sanctum and S. miltiorrhiza indicated the potential for the discovery of new active molecules.

Conclusion

The genome sequence and annotation of O. sanctum provides new insights into the function of genes and the medicinal nature of the metabolites synthesized in this plant. This information is highly beneficial for mining biosynthetic pathways for important metabolites in related species.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1640-z) contains supplementary material, which is available to authorized users.

5.

Background

Pseudomonas aeruginosa is an important opportunistic pathogen responsible for many infections in hospitalized and immunocompromised patients. Previous reports estimated that approximately 10% of its 6.6 Mbp genome varies from strain to strain and is therefore referred to as “accessory genome”. Elements within the accessory genome of P. aeruginosa have been associated with differences in virulence and antibiotic resistance. As whole genome sequencing of bacterial strains becomes more widespread and cost-effective, methods to quickly and reliably identify accessory genomic elements in newly sequenced P. aeruginosa genomes will be needed.

Results

We developed a bioinformatic method for identifying the accessory genome of P. aeruginosa. First, the core genome was determined based on sequence conserved among the completed genomes of twelve reference strains using Spine, a software program developed for this purpose. The core genome was 5.84 Mbp in size and contained 5,316 coding sequences. We then developed an in silico genome subtraction program named AGEnt to filter out core genomic sequences from P. aeruginosa whole genomes to identify accessory genomic sequences of these reference strains. This analysis determined that the accessory genome of P. aeruginosa ranged from 6.9% to 18.0% of the total genome, was enriched for genes associated with mobile elements, and consisted mostly of genes with unknown or unclear function. Using these genomes, we showed that AGEnt performed well compared to other publicly available programs designed to detect accessory genomic elements. We then demonstrated the utility of the AGEnt program by applying it to the draft genomes of two previously unsequenced P. aeruginosa strains, PA99 and PA103.
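The idea of in silico genome subtraction can be sketched with a toy example. The abstract does not describe how AGEnt works internally, so the code below swaps in a simple exact k-mer match as a stand-in: any position of the query genome covered by a k-mer that also occurs in the core sequence is treated as core, and the remainder is reported as accessory. Function names, the value of k, and the sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def accessory_fraction(genome, core, k=8):
    """Fraction of genome positions not covered by any k-mer shared with core."""
    core_kmers = kmers(core, k)
    covered = [False] * len(genome)
    for i in range(len(genome) - k + 1):
        if genome[i:i + k] in core_kmers:
            for j in range(i, i + k):
                covered[j] = True
    return 1 - sum(covered) / len(genome)

# Toy sequences: a 20-bp "core" followed by a 10-bp strain-specific insert.
core = "ACGTACGGTCAGGCTTAACG"
genome = core + "TTTTTTTTTT"
print(accessory_fraction(genome, core))
```

Exact matching ignores sequence divergence between strains, which is why real tools rely on alignment with identity thresholds instead.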

Conclusions

The P. aeruginosa genome is rich in accessory genetic material. The AGEnt program accurately identified the accessory genomes of newly sequenced P. aeruginosa strains, even when draft genomes were used. As P. aeruginosa genomes become available at an increasingly rapid pace, this program will be useful in cataloging the expanding accessory genome of this bacterium and in discerning correlations between phenotype and accessory genome makeup. The combination of Spine and AGEnt should be useful in defining the accessory genomes of other bacterial species as well.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-737) contains supplementary material, which is available to authorized users.

6.

Background

Insertion sequences (ISs) are approximately 1 kbp long “jumping” genes found in prokaryotes. ISs encode the protein Transposase, which facilitates the excision and reinsertion of ISs in genomes, making these sequences a type of class I (“cut-and-paste”) Mobile Genetic Elements. ISs are proposed to be involved in the reductive evolution of symbiotic prokaryotes. Our previous sequencing of the genome of the cyanobacterium ‘Nostoc azollae’ 0708, living in a tight perpetual symbiotic association with a plant (the water fern Azolla), revealed the presence of an eroding genome, with a high number of insertion sequences (ISs) together with an unprecedented large proportion of pseudogenes. To investigate the role of ISs in the reductive evolution of ‘Nostoc azollae’ 0708, and potentially in the formation of pseudogenes, a bioinformatic investigation of the IS identities and positions in 47 cyanobacterial genomes was conducted. To widen the scope, the IS contents were analysed qualitatively and quantitatively in 20 other genomes representing both free-living and symbiotic bacteria.

Results

Insertion Sequences were not randomly distributed in the bacterial genomes and were found to transpose short distances from their original location (“local hopping”) and pseudogenes were enriched in the vicinity of IS elements. In general, symbiotic organisms showed higher densities of IS elements and pseudogenes than non-symbiotic bacteria. A total of 1108 distinct repeated sequences over 500 bp were identified in the 67 genomes investigated. In the genome of ‘Nostoc azollae’ 0708, IS elements were apparent at 970 locations (14.3%), with 428 being full-length. Morphologically complex cyanobacteria with large genomes showed higher frequencies of IS elements, irrespective of life style.

Conclusions

The apparent co-location of IS elements and pseudogenes found in prokaryotic genomes implies earlier IS transpositions into genes. As transpositions tend to be local rather than genome-wide, this likely explains the proximity between IS elements and pseudogenes. These findings suggest that ISs facilitate reductive evolution, for instance in the symbiotic cyanobacterium ‘Nostoc azollae’ 0708 and in other obligate prokaryotic symbionts.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1386-7) contains supplementary material, which is available to authorized users.

7.

Background

Segmental duplication is widely held to be an important mode of genome growth and evolution. Yet how this would affect the global structure of genomes has been little discussed.

Methods/Principal Findings

Here, we show that equivalent length, or L_e, a quantity determined by the variance of the fluctuating part of the distribution of the k-mer frequencies in a genome, characterizes the latter's global structure. We computed the L_e values of 865 complete chromosomes and found that they have nearly universal (but k-dependent) values. The differences between the L_e of a chromosome and those of its coding and non-coding parts were found to be slight.

Conclusions

We verified that these non-trivial results are natural consequences of a genome growth model characterized by random segmental duplication and random point mutation, but not of any model whose dominant growth mechanism is not segmental duplication. Our study also indicates that genomes have a nearly universal cumulative “point” mutation density of about 0.73 mutations per site that is compatible with the relatively low mutation rates of (15)10/site/Mya previously determined by sequence comparison for the human and E. coli genomes.
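The raw ingredient of this analysis, the spread of k-mer frequencies in a sequence, can be sketched as follows. The abstract does not give the formula for the equivalent length, so this example stops at counting k-mers and computing the variance of the counts relative to the squared mean; the function names and the toy sequence are assumptions.

```python
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Occurrence count of every possible DNA k-mer, including absent ones."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]

def relative_variance(counts):
    """Variance of the counts divided by the squared mean count."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return var / mean ** 2

# Toy sequence; real analyses use whole chromosomes and several values of k.
print(relative_variance(kmer_counts("AACG", 1)))
```

A sequence grown by repeated segmental duplication would be expected to show a wider spread of k-mer counts than a random sequence of the same length and base composition.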

8.
Actinobacteria in the genus Cellulomonas are the only known and reported cellulolytic facultative anaerobes. To better understand the cellulolytic strategy employed by these bacteria, we sequenced the genome of Cellulomonas fimi ATCC 484T. For comparative purposes, we also sequenced the genome of the aerobic cellulolytic “Cellvibrio gilvus” ATCC 13127T. An initial analysis of these genomes using phylogenetic and whole-genome comparison revealed that “Cellvibrio gilvus” belongs to the genus Cellulomonas. We thus propose to assign “Cellvibrio gilvus” to the genus Cellulomonas. A comparative genomics analysis between these two Cellulomonas genome sequences and the recently completed genome for Cellulomonas flavigena ATCC 482T showed that these cellulomonads do not encode cellulosomes but appear to degrade cellulose by secreting multi-domain glycoside hydrolases. Despite the minimal number of carbohydrate-active enzymes encoded by these genomes, as compared to other known cellulolytic organisms, these bacteria were found to be proficient at degrading and utilizing a diverse set of carbohydrates, including crystalline cellulose. Moreover, they also encode proteins required for the fermentation of hexose and xylose sugars into products such as ethanol. Finally, we found relatively few significant differences between the predicted carbohydrate-active enzymes encoded by these Cellulomonas genomes, in contrast to previous studies reporting differences in physiological approaches for carbohydrate degradation. Our sequencing and analysis of these genomes shed light on the mechanism through which these facultative anaerobes degrade cellulose, suggesting that the sequenced cellulomonads use secreted, multidomain enzymes to degrade cellulose in a way that is distinct from known anaerobic cellulolytic strategies.

9.

Background

Ecologists are collecting extensive data concerning movements of animals in marine ecosystems. Such data need to be analysed with valid statistical methods to yield meaningful conclusions.

Principal Findings

We demonstrate methodological issues in two recent studies that reached similar conclusions concerning movements of marine animals (Nature 451:1098; Science 332:1551). The first study analysed vertical movement data to conclude that diverse marine predators (Atlantic cod, basking sharks, bigeye tuna, leatherback turtles and Magellanic penguins) exhibited “Lévy-walk-like behaviour”, close to a hypothesised optimal foraging strategy. By reproducing the original results for the bigeye tuna data, we show that the likelihood of tested models was calculated from residuals of regression fits (an incorrect method), rather than from the likelihood equations of the actual probability distributions being tested. This resulted in erroneous Akaike Information Criteria, and the testing of models that do not correspond to valid probability distributions. We demonstrate how this led to overwhelming support for a model that has no biological justification and that is statistically spurious because its probability density function goes negative. Re-analysis of the bigeye tuna data, using standard likelihood methods, overturns the original result and conclusion for that data set. The second study observed Lévy walk movement patterns by mussels. We demonstrate several issues concerning the likelihood calculations (including the aforementioned residuals issue). Re-analysis of the data rejects the original Lévy walk conclusion.

Conclusions

We consequently question the claimed existence of scaling laws of the search behaviour of marine predators and mussels, since such conclusions were reached using incorrect methods. We discourage the suggested potential use of “Lévy-like walks” when modelling consequences of fishing and climate change, and caution that any resulting advice to managers of marine ecosystems would be problematic. For reproducibility and future work we provide R source code for all calculations.
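The distinction drawn above, computing likelihoods from the probability distributions themselves rather than from regression residuals, can be sketched briefly. The authors provide R code; the Python below is an independent illustration, not a port of it. It fits an exponential model and an unbounded power-law (Lévy-like) model to step lengths by maximum likelihood and compares their Akaike Information Criteria. The function names, the choice of these two models, and the synthetic data are assumptions.

```python
import math

def aic_exponential(xs, xmin):
    """AIC for pdf f(x) = lam * exp(-lam * (x - xmin)), x >= xmin."""
    n = len(xs)
    total = sum(x - xmin for x in xs)
    lam = n / total                      # maximum-likelihood estimate
    loglik = n * math.log(lam) - lam * total
    return 2 * 1 - 2 * loglik            # one fitted parameter

def aic_power_law(xs, xmin):
    """AIC for pdf f(x) = (mu - 1) * xmin**(mu - 1) * x**(-mu), x >= xmin."""
    n = len(xs)
    sum_log = sum(math.log(x / xmin) for x in xs)
    mu = 1 + n / sum_log                 # maximum-likelihood estimate
    loglik = n * math.log(mu - 1) - n * math.log(xmin) - mu * sum_log
    return 2 * 1 - 2 * loglik            # one fitted parameter

def power_law_quantiles(n, mu, xmin):
    """Deterministic stand-in for power-law samples (inverse-CDF quantiles)."""
    return [xmin * (1 - (i + 0.5) / n) ** (-1 / (mu - 1)) for i in range(n)]

steps = power_law_quantiles(200, mu=2.0, xmin=1.0)
# The model with the lower AIC is the better supported of the two.
print(aic_power_law(steps, 1.0) < aic_exponential(steps, 1.0))
```

Because both models are proper probability densities evaluated on the same data, their AIC values are directly comparable, which is not the case for values derived from regression residuals.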

10.

Background

“Evolution Canyon” (ECI) at Lower Nahal Oren, Mount Carmel, Israel, is an optimal natural microscale model for unraveling evolution in action highlighting the basic evolutionary processes of adaptation and speciation. A major model organism in ECI is wild emmer, Triticum dicoccoides, the progenitor of cultivated wheat, which displays dramatic interslope adaptive and speciational divergence on the tropical-xeric “African” slope (AS) and the temperate-mesic “European” slope (ES), separated on average by 250 m.

Methods

We examined 278 simple sequence repeats (SSRs) and the phenotypic diversity of resistance to powdery mildew between the opposite slopes. Furthermore, 18 phenotypes on the AS and 20 phenotypes on the ES were inoculated with both Bgt E09 and a mixture of powdery mildew races.

Results

In the genetic diversity experiment, very little intra-slope polymorphism was identified in the accessions from either the AS or the ES. By contrast, 148 pairs of SSR primers (53.23%) amplified polymorphic products between the phenotypes of AS and ES. There are some differences between the two wild emmer wheat genomes and the inter-slope SSR polymorphic products between genome A and B. Interestingly, all wild emmer types growing on the south-facing slope (SFS=AS) were susceptible to a composite of Blumeria graminis, while the ones growing on the north-facing slope (NFS=ES) were highly resistant to Blumeria graminis at both seedling and adult stages.

Conclusion/Significance

Remarkable inter-slope evolutionary divergent processes occur in wild emmer wheat, T. dicoccoides, at ECI, despite the short average distance of 250 meters. The AS, a dry and hot slope, did not develop resistance to powdery mildew, whereas the ES, a cool and humid slope, did develop resistance since the disease stress was strong there. This is a remarkable demonstration in host-pathogen interaction of how resistance develops when stress causes an adaptive result at a micro-scale distance.

11.
12.
13.

Background

Graphical representation of data is one of the most easily comprehended forms of explanation. The current study describes a simple visualization tool which may allow greater understanding of medical and epidemiological data.

Method

We propose a simple tool for visualization of data, known as a “quilt plot”, that provides an alternative to presenting large volumes of data as frequency tables. Data from the Australian Needle and Syringe Program survey are used to illustrate “quilt plots”.

Conclusion

Visualization of large volumes of data using “quilt plots” enhances interpretation of medical and epidemiological data. Such intuitive presentations are particularly useful for the rapid assessment of problems in the data which cannot be readily identified by manual review. We recommend that, where possible, “quilt plots” be used along with traditional quantitative assessments of the data as an explanatory data analysis tool.

14.

Background

Bacteriophages that infect the opportunistic pathogen Pseudomonas aeruginosa have been classified into several groups. One of them, which includes temperate phage particles with icosahedral heads and long flexible tails, bears genomes whose architecture and replication mechanism, but not their nucleotide sequences, are like those of coliphage Mu. By comparing the genomic sequences of this group of P. aeruginosa phages one could draw conclusions about their ontogeny and evolution.

Results

Two newly isolated Mu-like phages of P. aeruginosa are described and their genomes sequenced and compared with those available in the public data banks. The genome sequences of the two phages are similar to each other and to those of a group of P. aeruginosa transposable phages. Comparing twelve of these genomes revealed a common genomic architecture in the group. Each phage genome had numerous genes with homologues in all the other genomes and a set of variable genes specific for each genome. The first group, which comprised most of the genes with assigned functions, was named “core genome”, and the second group, containing mostly short ORFs without assigned functions was called “accessory genome”. Like in other phage groups, variable genes are confined to specific regions in the genome.

Conclusion

Based on the known and inferred functions for some of the variable genes of the phages analyzed here, they appear to confer selective advantages for the phage survival under particular host conditions. We speculate that phages have developed a mechanism for horizontally acquiring genes to incorporate them at specific loci in the genome that help phage adaptation to the selective pressures imposed by the host.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1146) contains supplementary material, which is available to authorized users.

15.

Background

Published data on the association between the PSCA rs2294008 polymorphism and cancer risk have yielded inconclusive results. To determine the relationship and to precisely assess the effect size estimate of the association, we performed a meta-analysis.

Methods

We searched published literature in the Embase and PubMed databases using the search terms “PSCA”, “prostate stem cell antigen”, “variants”, “polymorphism”, “polymorphisms”, and “cancer”. A total of 21 eligible articles were retrieved, with 27,197 cancer cases and 48,237 controls.

Results

On the whole, we found the association between PSCA rs2294008 polymorphism and cancer risk was statistically significant: TT vs CC: OR = 1.18, 95% CI, 1.10 to 1.27; TT + CT vs CC: OR = 1.08, 95% CI, 1.05 to 1.10; TT vs CT + CC: OR = 1.14, 95% CI, 1.07 to 1.21; T vs C: OR = 1.10, 95% CI, 1.06 to 1.14; CT vs CC: OR = 1.10, 95% CI, 1.06 to 1.13. Stratified analyses in cancer type and ethnicity showed similar results.

Conclusions

Based on the statistical evidence, we conclude that the rs2294008 polymorphism of the PSCA gene is likely to play a role in carcinogenesis, especially in gastric cancer and bladder cancer.
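Odds ratios with 95% confidence intervals, as reported above, are built from 2×2 tables of cases and controls. The sketch below shows the single-study calculation with a Wald interval on the log scale; it does not reproduce the pooling step of the meta-analysis, and the function name and counts are hypothetical rather than taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a: cases with the risk genotype      b: cases without it
    c: controls with the risk genotype   d: controls without it
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical genotype counts for one study.
print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level for that study.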

16.
17.
Genome-wide association studies (GWAS) aim to discover genetic factors underlying phenotypic traits. The large number of genetic factors poses both computational and statistical challenges. Various computational approaches have been developed for large-scale GWAS. In this chapter, we will discuss several widely used computational approaches in GWAS. The following topics will be covered: (1) An introduction to the background of GWAS. (2) The existing computational approaches that are widely used in GWAS. This will cover single-locus, epistasis detection, and machine learning methods that have been recently developed in the biology, statistics, and computer science communities. This part will be the main focus of this chapter. (3) The limitations of current approaches and future directions.

What to Learn in This Chapter

  • The background of genome-wide association studies (GWAS).
  • The existing computational approaches that are widely used in GWAS. This will cover single-locus, epistasis detection, and machine learning methods.
  • The limitations of current approaches and future directions.
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.
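A single-locus test, the simplest of the approaches listed above, can be sketched as a chi-square test on allele counts in cases and controls. This is a generic illustration rather than code from the chapter; the function names and counts are hypothetical.

```python
import math

def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square statistic (1 df) for a 2x2 table of allele counts."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    return numerator / denominator

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical allele counts at one SNP: allele A vs allele B.
chi2 = allelic_chi_square(60, 40, 40, 60)
print(chi2, p_value_1df(chi2))
```

In a genome-wide scan this test is repeated for every SNP, so the resulting p-values need a multiple-testing correction before any locus is declared significant.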

18.
Untranslated gene regions (UTRs) play an important role in controlling gene expression. 3′-UTRs are primarily targeted by microRNA (miRNA) molecules that form complex gene regulatory networks. Cancer genomes are replete with non-coding mutations, many of which are connected to changes in tumor gene expression that accompany the development of cancer and are associated with resistance to therapy. Therefore, variants that arise in 3′-UTRs during cancer progression should be analysed to predict their phenotypic effect on gene expression, e.g., by evaluating their impact on miRNA target sites. Here, we analyze 3′-UTR variants in the DICER1 and DROSHA genes in the context of myelodysplastic syndrome (MDS) development. The key features of this analysis include an assessment of both “canonical” and “non-canonical” types of mRNA-miRNA binding and tissue-specific profiling of miRNA interactions with wild-type and mutated genes. As a result, we obtained a list of DICER1 and DROSHA variants likely to alter miRNA sites and, therefore, potentially leading to the observed tissue-specific gene downregulation. All identified variants have low population frequency, consistent with their potential association with pathology progression.

19.
20.
Geographic partitioning is postulated to foster divergence of Helicobacter pylori populations as an adaptive response to local differences in predominant host physiology. H. pylori's ability to establish persistent infection despite host inflammatory responses likely involves active management of host defenses using bacterial proteins that may themselves be targets for adaptive evolution. Sequenced H. pylori genomes encode a family of eight or nine secreted proteins containing repeat motifs that are characteristic of the eukaryotic Sel1 regulatory protein, whereas the related Campylobacter and Wolinella genomes each contain only one or two such “Sel1-like repeat” (SLR) genes (“slr genes”). Signatures of positive selection (ratio of nonsynonymous to synonymous mutations, dN/dS = ω > 1) were evident in the evolutionary history of H. pylori slr gene family expansion. Sequence analysis of six of these slr genes (hp0160, hp0211, hp0235, hp0519, hp0628, and hp1117) from representative East Asian, European, and African H. pylori strains revealed that all but hp0628 had undergone positive selection, with different amino acids often selected in different regions. Most striking was a divergence of Japanese and Korean alleles of hp0519, with Japanese alleles having undergone particularly strong positive selection (ωJ > 25), whereas alleles of other genes from these populations were intermingled. Homology-based structural modeling localized most residues under positive selection to SLR protein surfaces. Rapid evolution of certain slr genes in specific H. pylori lineages suggests a model of adaptive change driven by selection for fine-tuning of host responses, and facilitated by geographic isolation. Characterization of such local adaptations should help elucidate how H. pylori manages persistent infection, and potentially lead to interventions tailored to diverse human populations.
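The distinction between synonymous and nonsynonymous changes that underlies ω can be illustrated with a small sketch. It only tallies codon differences between two aligned coding sequences under the standard genetic code; it is not a dN/dS estimator, since real methods normalize by the number of synonymous and nonsynonymous sites and correct for multiple substitutions. The function name and the sequences are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def count_codon_changes(seq1, seq2):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    synonymous = nonsynonymous = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        codon1, codon2 = seq1[i:i + 3], seq2[i:i + 3]
        if codon1 == codon2:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous

# Toy alleles: CTT -> CTC keeps leucine, AAA -> AGA changes lysine to arginine.
print(count_codon_changes("ATGCTTAAA", "ATGCTCAGA"))
```

An excess of nonsynonymous over synonymous change, after proper normalization, is what a value of ω above 1 expresses.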


Copyright©北京勤云科技发展有限公司  京ICP备09084417号