Similar articles
 20 similar articles found
1.
2.
3.

Background

Most biological processes are influenced by protein post-translational modifications (PTMs). Identifying novel PTM sites in different organisms, including humans and model organisms, has expedited our understanding of key signal transduction mechanisms. However, with increasing availability of deep, quantitative datasets in diverse species, there is a growing need for tools to facilitate cross-species comparison of PTM data. This is particularly important because functionally important modification sites are more likely to be evolutionarily conserved; yet cross-species comparison of PTMs is difficult since they often lie in structurally disordered protein domains. Current tools that address this can only map known PTMs between species based on known orthologous phosphosites, and do not enable the cross-species mapping of newly identified modification sites. Here, we addressed this by developing a web-based software tool, PhosphOrtholog (www.phosphortholog.com), that accurately maps protein modification sites between different species. This facilitates the comparison of datasets derived from multiple species, and should be a valuable tool for the proteomics community.

Results

Here we describe PhosphOrtholog, a web-based application for mapping known and novel orthologous PTM sites from experimental data obtained from different species. PhosphOrtholog is the only generic and automated tool that enables cross-species comparison of large-scale PTM datasets without relying on existing PTM databases. This is achieved through pairwise sequence alignment of orthologous protein residues. To demonstrate its utility, we apply it to two sets of human and rat muscle phosphoproteomes generated following insulin and exercise stimulation, respectively, and to one publicly available mouse phosphoproteome generated following cellular stress, revealing high mapping and coverage efficiency. Although coverage statistics are dataset dependent, PhosphOrtholog more than doubled the number of cross-species mapped sites in all our example data sets compared with those recovered using existing resources such as PhosphoSitePlus.
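The idea of mapping a modification site through a pairwise alignment can be illustrated with a minimal sketch. This is not PhosphOrtholog's implementation: the scoring scheme, the function names (global_align, map_site) and the toy sequences are all assumptions made for illustration, and a real tool would use a substitution matrix and affine gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a toy scoring scheme (assumed values).

    Returns the two sequences as gapped strings of equal length.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def map_site(src_seq, dst_seq, src_pos):
    """Map a 1-based residue position in src_seq onto dst_seq.

    Returns the 1-based position in dst_seq, or None when the site aligns
    to a gap or to a different residue (hypothetical conservation rule).
    """
    aln_src, aln_dst = global_align(src_seq, dst_seq)
    i = j = 0
    for x, y in zip(aln_src, aln_dst):
        if x != "-":
            i += 1
        if y != "-":
            j += 1
        if x != "-" and i == src_pos:
            return j if y == x else None
    return None


# Toy orthologue pair: the second sequence lacks one residue and has one substitution.
human = "MKTAYSPEEL"
rat = "MKAYSPEDL"
mapped = map_site(human, rat, 6)  # serine at position 6 of the first sequence
```

In this toy case the serine shifts by one position because of the upstream deletion, which is exactly the situation in which a position-by-position lookup would fail and an alignment-based mapping helps.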

Conclusions

PhosphOrtholog is the first tool that enables mapping of thousands of novel and known protein phosphorylation sites across species, accessible through an easy-to-use web interface. Identification of conserved PTMs across species from large-scale experimental data increases our knowledgebase of functional PTM sites. Moreover, PhosphOrtholog is generic, being applicable to other PTM datasets such as acetylation, ubiquitination and methylation.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1820-x) contains supplementary material, which is available to authorized users.

4.

Background

Whole genome sequences (WGS) have proliferated as sequencing technology continues to improve and costs decline. While many WGS of model or domestic organisms have been produced, a growing number of non-model species are also being sequenced. In the absence of a reference, construction of a genome sequence necessitates de novo assembly, which may be beyond the ability of many labs due to the large volumes of raw sequence data and extensive bioinformatics required. In contrast, the presence of a reference WGS allows for alignment, which is more tractable than assembly. Recent work has highlighted that the reference need not come from the same species, potentially enabling a wide array of species WGS to be constructed using cross-species alignment. Here we report on the creation of a draft WGS from a single bighorn sheep (Ovis canadensis) using alignment to the closely related domestic sheep (Ovis aries).

Results

Two sequencing libraries on SOLiD platforms yielded over 865 million reads, and combined alignment to the domestic sheep reference resulted in a nearly complete sequence (95% coverage of the reference) at an average of 12x read depth (104 SD). From this we discovered over 15 million variants and annotated them relative to the domestic sheep reference. We then conducted an enrichment analysis of those SNPs showing fixed differences between the reference and sequenced individual and found significant differences in a number of gene ontology (GO) terms, including those associated with reproduction, muscle properties, and bone deposition.

Conclusion

Our results demonstrate that cross-species alignment enables the creation of novel WGS for non-model organisms. The bighorn sheep WGS will provide a resource for future resequencing studies or comparative genomics.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1618-x) contains supplementary material, which is available to authorized users.

5.
Methods for the analysis of chromatin immunoprecipitation sequencing (ChIP-seq) data start by aligning the short reads to a reference genome. While often successful, they are not appropriate for cases where a reference genome is not available. Here we develop methods for de novo analysis of ChIP-seq data. Our methods combine de novo assembly with statistical tests enabling motif discovery without the use of a reference genome. We validate the performance of our method using human and mouse data. Analysis of fly data indicates that our method outperforms alignment based methods that utilize closely related species.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0756-4) contains supplementary material, which is available to authorized users.

6.

Background

While the gargantuan multi-nation effort of sequencing T. aestivum gets close to completion, the annotation process for the vast number of wheat genes and proteins is in its infancy. Previous experimental studies carried out on model plant organisms such as A. thaliana and O. sativa provide a plethora of gene annotations that can be used as potential starting points for wheat gene annotations, provided that solid cross-species gene-to-gene and protein-to-protein correspondences are available.

Results

DNA and protein sequences and corresponding annotations for T. aestivum and 9 other plant species were collected from Ensembl Plants release 22 and curated. Cliques of predicted 1-to-1 orthologs were identified and an annotation enrichment model was defined based on existing gene-GO term associations and phylogenetic relationships among wheat and 9 other plant species. A total of 13 cliques of size 10 were identified, which represent putative functionally equivalent genes and proteins in the 10 plant species. Eighty-five new and more specific GO terms were associated with wheat genes in the 13 cliques of size 10, which represents a 65% increase compared with the 130 previously known GO terms. Similar expression patterns for 4 genes from Arabidopsis, barley, maize and rice in cliques of size 10 provide experimental evidence to support our model. Overall, based on cliques of size equal to or larger than 3, our model enriched the existing gene-GO term associations for 7,838 (8%) wheat genes, of which 2,139 had no previous annotation.
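To make the notion of an ortholog clique concrete, the following sketch treats predicted 1-to-1 ortholog pairs as edges of a graph and enumerates maximal cliques with a basic Bron-Kerbosch recursion. It is only an illustration under assumptions: the gene identifiers are invented, and the abstract does not say which clique-finding procedure the authors used.

```python
from collections import defaultdict


def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch without pivoting).

    adj maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical 1-to-1 ortholog predictions, written as (species, gene) pairs.
pairs = [
    (("wheat", "W1"), ("rice", "R1")),
    (("wheat", "W1"), ("barley", "B1")),
    (("rice", "R1"), ("barley", "B1")),
    (("wheat", "W2"), ("rice", "R2")),  # pair only, no third species agrees
]

adj = defaultdict(set)
for a, b in pairs:
    adj[a].add(b)
    adj[b].add(a)

# Keep cliques of size 3 or more, mirroring the size threshold quoted above.
supported = [c for c in maximal_cliques(adj) if len(c) >= 3]
```

Every gene in a clique is a predicted ortholog of every other member, so annotations could in principle be shared within it; the W2/R2 pair is dropped because no third species supports it.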

Conclusions

Our novel comparative genomics approach enriches existing T. aestivum gene annotations based on cliques of predicted 1-to-1 orthologs, phylogenetic relationships and existing gene ontologies from 9 other plant species.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1496-2) contains supplementary material, which is available to authorized users.

7.

Background

Despite having predominately deleterious fitness effects, transposable elements (TEs) are major constituents of eukaryote genomes in general and of plant genomes in particular. Although the proportion of the genome made up of TEs varies at least four-fold across plants, the relative importance of the evolutionary forces shaping variation in TE abundance and distributions across taxa remains unclear. Under several theoretical models, mating system plays an important role in governing the evolutionary dynamics of TEs. Here, we use the recently sequenced Capsella rubella reference genome and short-read whole genome sequencing of multiple individuals to quantify abundance, genome distributions, and population frequencies of TEs in three recently diverged species of differing mating system, two self-compatible species (C. rubella and C. orientalis) and their self-incompatible outcrossing relative, C. grandiflora.

Results

We detect different dynamics of TE evolution in our two self-compatible species; C. rubella shows a small increase in transposon copy number, while C. orientalis shows a substantial decrease relative to C. grandiflora. The direction of this change in copy number is genome wide and consistent across transposon classes. For insertions near genes, however, we detect the highest abundances in C. grandiflora. Finally, we also find differences in the population frequency distributions across the three species.

Conclusion

Overall, our results suggest that the evolution of selfing may have different effects on TE evolution on a short and on a long timescale. Moreover, cross-species comparisons of transposon abundance are sensitive to reference genome bias, and efforts to control for this bias are key when making comparisons across species.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-602) contains supplementary material, which is available to authorized users.

8.

Background

The application of phages is a promising tool to reduce the number of Campylobacter along the food chain. Besides the efficacy against a broad range of strains, phages have to be safe in terms of their genomes. Thus far, no genes with pathogenic potential (e.g., genes encoding virulence factors) have been detected in Campylobacter phages. However, preliminary studies suggested that the genomes of group II phages may be diverse and prone to genomic rearrangements.

Results

We determined and analysed the genomic sequence (182,761 bp) of group II phage CP21 that is closely related to the already characterized group II phages CP220 and CPt10. The genomes of these phages are comprised of four modules separated by very similar repeat regions, some of which harbour open reading frames (ORFs). However, the arrangement of the modules and the location of some ORFs on the genomes are different in CP21 and in CP220/CPt10. In this work, a PCR system was established to study the modular genome organization of other group II phages, demonstrating that they belong to different subgroups of the CP220-like virus genus, the prototypes of which are CP21 and CP220. The subgroups revealed different restriction patterns and, interestingly enough, also distinct host specificities, tail fiber proteins and tRNA genes. We additionally analysed the genome of group II phage vB_CcoM-IBB_35 (IBB_35), for which to date only five individual contigs could be determined. We show that the contigs represent modules linked by long repeat regions enclosing some not yet identified ORFs (e.g., for a head completion protein). The data suggest that IBB_35 is a member of the CP220 subgroup.

Conclusion

Campylobacter group II phages are diverse regarding their genome organization. Since all hitherto characterized group II phages contain numerous genes for transposases and homing endonucleases as well as similar repeat regions, it cannot be excluded that these phages are genetically unstable. To answer this question, further experiments and sequencing of more group II phages should be performed.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1837-1) contains supplementary material, which is available to authorized users.

9.

Background

Microsatellite loci have high mutation rates and thus are indicative of mutational processes within the genome. By concentrating on the symbiotic and aposymbiotic cnidarians, we investigated if microsatellite abundances follow a phylogenetic or ecological pattern. Individuals from eight species were shotgun sequenced using 454 GS-FLX Titanium technology. Sequences from the three available cnidarian genomes (Nematostella vectensis, Hydra magnipapillata and Acropora digitifera) were added to the analysis for a total of eleven species representing two classes, three subclasses and eight orders within the phylum Cnidaria.

Results

Trinucleotide and tetranucleotide repeats were the most abundant motifs, followed by hexa- and dinucleotides. Pentanucleotides were the least abundant motif in the data set. Hierarchical clustering and log likelihood ratio tests revealed a weak relationship between phylogeny and microsatellite content. Further, comparisons between cnidaria harboring intracellular dinoflagellates and those that do not show that microsatellite coverage is higher in the latter group.
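Counting microsatellites by motif length can be sketched with regular expressions. The thresholds below (motifs of 2 to 6 bases, at least four tandem copies) and the function names are assumptions for illustration; the abstract does not state the detection criteria used in the study.

```python
import re
from collections import Counter


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def count_microsatellites(seq, min_len=2, max_len=6, min_repeats=4):
    """Count perfect tandem repeats per motif length in an upper-case DNA string."""
    counts = Counter()
    for k in range(min_len, max_len + 1):
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                counts[k] += 1
    return counts


# Toy sequence: one dinucleotide repeat, one trinucleotide repeat, one homopolymer run.
toy = "GG" + "AC" * 5 + "TTG" + "CAG" * 4 + "T" + "A" * 12
counts = count_microsatellites(toy)
```

The primitive-motif check prevents the homopolymer run from being reported as a di- or trinucleotide repeat, which would otherwise inflate the counts for longer motif classes.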

Conclusions

Our results support previous studies that found tri- and tetranucleotides to be the most abundant motifs in invertebrates. Differences in microsatellite coverage and composition between symbiotic and non-symbiotic cnidaria suggest the presence/absence of dinoflagellates might place restrictions on the host genome.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-939) contains supplementary material, which is available to authorized users.

10.
11.

Background

Cellular organelles with genomes of their own (e.g. plastids and mitochondria) can pass genetic sequences to other organellar genomes within the cell in many species across the eukaryote phylogeny. The extent of the occurrence of these organellar-derived inserted sequences (odins) is still unknown, but if not accounted for in genomic and phylogenetic studies, they can be a source of error. However, if correctly identified, these inserted sequences can be used for evolutionary and comparative genomic studies. Although such insertions can be detected using various laboratory and bioinformatic strategies, there is currently no straightforward way to apply them as a standard organellar genome assembly on next-generation sequencing data. Furthermore, most current methods for identification of such insertions are unsuitable for use on non-model organisms or ancient DNA datasets.

Results

We present a bioinformatic method that uses phasing algorithms to reconstruct both source and inserted organelle sequences. The method was tested in different shotgun and organellar-enriched DNA high-throughput sequencing (HTS) datasets from ancient and modern samples. Specifically, we used datasets from lions (Panthera leo ssp. and Panthera leo leo) to characterize insertions from mitochondrial origin, and from common grapevine (Vitis vinifera) and bugle (Ajuga reptans) to characterize insertions derived from plastid genomes. Comparison of the results against other available organelle genome assembly methods demonstrated that our new method provides an improvement in the sequence assembly.

Conclusion

Using datasets from a wide range of species and different levels of complexity, we showed that our novel bioinformatic method based on phasing algorithms can be used to achieve the following two goals: i) reference-guided assembly of chloroplast/mitochondrial genomes from HTS data and ii) identification and simultaneous assembly of odins. This method represents the first application of haplotype phasing for automatic detection of odins and reference-based organellar genome assembly.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0682-1) contains supplementary material, which is available to authorized users.

12.

Background

Carbohydrate metabolism is a key feature of vascular plant architecture, and is of particular importance in large woody species, where lignocellulosic biomass is responsible for bearing the bulk of the stem and crown. Since Carbohydrate Active enZymes (CAZymes) in plants are responsible for the synthesis, modification and degradation of carbohydrate biopolymers, the differences in gene copy number and regulation between woody and herbaceous species have been highlighted previously. There are still many unanswered questions about the role of CAZymes in land plant evolution and the formation of wood, a strong carbohydrate sink.

Results

Here, twenty-two publicly available plant genomes were used to characterize the frequency, diversity and complexity of CAZymes in plants. We find that a conserved suite of CAZymes is a feature of land plant evolution, with similar diversity and complexity regardless of growth habit and form. In addition, we compared the diversity and levels of CAZyme gene expression during wood formation in trees using mRNA-seq data from two distantly related angiosperm tree species, Eucalyptus grandis and Populus trichocarpa, highlighting the major CAZyme classes involved in xylogenesis and lignocellulosic biomass production.

Conclusions

The CAZyme domain ratio across embryophytes is maintained, and the diversity of CAZyme domains is similar in all land plants, regardless of woody habit. The stoichiometric conservation of gene expression in woody and non-woody tissues of Eucalyptus and Populus is indicative of gene balance preservation.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1571-8) contains supplementary material, which is available to authorized users.

13.

Background

Terminal restriction fragment length polymorphism (T-RFLP) analysis is a DNA-fingerprinting method that can be used for comparisons of the microbial community composition in a large number of samples. There is no consensus on how T-RFLP data should be treated and analyzed before comparisons between samples are made, and several different approaches have been proposed in the literature. The analysis of T-RFLP data can be cumbersome and time-consuming, and for large datasets manual data analysis is not feasible. The currently available tools for automated T-RFLP analysis, although valuable, offer little flexibility, and few, if any, options regarding what methods to use. To enable comparisons and combinations of different data treatment methods an analysis template and an extensive collection of macros for T-RFLP data analysis using Microsoft Excel were developed.

Results

The Tools for T-RFLP data analysis template provides procedures for the analysis of large T-RFLP datasets including application of a noise baseline threshold and setting of the analysis range, normalization and alignment of replicate profiles, generation of consensus profiles, normalization and alignment of consensus profiles and final analysis of the samples including calculation of association coefficients and diversity index. The procedures are designed so that in all analysis steps, from the initial preparation of the data to the final comparison of the samples, there are various different options available. The parameters regarding analysis range, noise baseline, T-RF alignment and generation of consensus profiles are all given by the user and several different methods are available for normalization of the T-RF profiles. In each step, the user can also choose to base the calculations on either peak height data or peak area data.
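The processing chain described above (noise threshold, normalization, association coefficient, diversity index) can be sketched in a few lines. The template itself is a set of Microsoft Excel macros; the Python below is only an illustration, and the specific choices (a 1% relative noise cut-off, Bray-Curtis similarity, the Shannon index) are assumptions, since the abstract does not name the coefficients it offers.

```python
import math


def normalize(profile, noise_fraction=0.01):
    """Drop peaks below noise_fraction of the total signal, then rescale to sum to 1.

    profile maps T-RF size (bp) to peak height or peak area.
    """
    total = sum(profile.values())
    kept = {size: h for size, h in profile.items() if h / total >= noise_fraction}
    kept_total = sum(kept.values())
    return {size: h / kept_total for size, h in kept.items()}


def bray_curtis_similarity(p, q):
    """Bray-Curtis similarity: 2 * shared abundance / total abundance."""
    shared = sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))
    return 2.0 * shared / (sum(p.values()) + sum(q.values()))


def shannon_index(p):
    """Shannon diversity H' = -sum(p_i * ln p_i) over relative abundances."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)


# Hypothetical peak heights for two samples, keyed by fragment size.
sample_a = normalize({100: 500, 150: 495, 200: 5})   # the 200 bp peak falls below the cut-off
sample_b = normalize({100: 300, 150: 300, 250: 400})
similarity = bray_curtis_similarity(sample_a, sample_b)
diversity_a = shannon_index(sample_a)
```

Because the noise cut-off and the normalization method both change which peaks survive, the resulting similarity and diversity values depend on those choices, which is the motivation the authors give for keeping every step configurable.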

Conclusions

The Tools for T-RFLP data analysis template enables an objective and flexible analysis of large T-RFLP datasets in a widely used spreadsheet application.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0361-7) contains supplementary material, which is available to authorized users.

14.
15.

Background

The growing wealth of publicly available gene expression data has made systemic studies of how genes interact in a cell more feasible. Liquid association (LA) describes the extent to which coexpression of two genes may vary based on the expression level of a third gene (the controller gene). However, genome-wide application has been difficult and resource-intensive. We propose a new screening algorithm for more efficient processing of LA estimation on a genome-wide scale and apply it to a Saccharomyces cerevisiae data set.
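For orientation, a commonly used estimator of liquid association is the average three-way product of standardized profiles, with the controller gene first mapped to normal scores. The sketch below shows only that basic estimator under those assumptions; it is not the screening algorithm proposed here nor the fastLiquidAssociation API, and the helper names are hypothetical. Ties in the controller values are not handled.

```python
from statistics import NormalDist, mean, pstdev


def normal_scores(values):
    """Rank-based normal score transform: rank r of n maps to Phi^-1(r / (n + 1))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    nd = NormalDist()
    for rank, i in enumerate(order, start=1):
        scores[i] = nd.inv_cdf(rank / (n + 1))
    return scores


def standardize(values):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def liquid_association(x, y, z):
    """Estimate LA(X, Y | Z) as the mean of X * Y * Z after standardization."""
    xs, ys, zs = standardize(x), standardize(y), normal_scores(z)
    return mean([a * b * c for a, b, c in zip(xs, ys, zs)])


# Toy data: x and y are anti-correlated when z is low and correlated when z is high.
z = list(range(1, 21))
x = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]
y = [-v if i < 10 else v for i, v in enumerate(x)]
la_switching = liquid_association(x, y, z)   # coexpression changes with z
la_constant = liquid_association(x, x, z)    # coexpression does not depend on z
```

Evaluating this for every triplet of genes grows cubically with the number of genes, which is why an exhaustive genome-wide search is resource-intensive and a screening step is attractive.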

Results

On a test subset of the data, the fast screening algorithm achieved >99.8% agreement with the exhaustive search of LA values, while reducing run time by 81–93%. Using a well-known yeast cell-cycle data set with 6,178 genes, we identified triplet combinations with significantly large LA values. In an exploratory gene set enrichment analysis, the top terms for the controller genes in these triplets with large LA values are involved in some of the most fundamental processes in yeast, such as energy regulation, transportation, and sporulation.

Conclusion

In summary, we propose a novel, efficient algorithm to explore LA on a genome-wide scale and use it to identify triplets of interest in cell cycle pathways in a yeast data set. A software package named fastLiquidAssociation that implements the algorithm is available through http://www.bioconductor.org.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0371-5) contains supplementary material, which is available to authorized users.

16.
17.

Background

The dense phytoplankton blooms that characterize productive regions and seasons in the oceans are dominated, from high to low latitudes and from coast line to open ocean, by comparatively few, often cosmopolitan species of diatoms. These key dominant species may undergo dramatic changes due to global climate change.

Results

In order to identify molecular stress-indicators for the ubiquitous diatom species Skeletonema marinoi, we tested stress-related genes in different environmental conditions (i.e. nutrient starvation/depletion, CO2-enrichment and combined effects of these stressors) using RT-qPCR. The data show that these stressors impact algal growth rate, inducing early aging and profound changes in expression levels of the genes of interest.

Conclusions

Most analyzed genes (e.g. antioxidant-related and aldehyde dehydrogenases) were strongly down-regulated, which may indicate a strategy to avoid unnecessary over-investment in their respective proteins. By contrast, key genes were activated (e.g. HSPs, GOX), which may allow the diatom species to better cope with adverse conditions. We propose the use of this panel of genes as early bio-indicators of environmental stress factors in a changing ocean.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1574-5) contains supplementary material, which is available to authorized users.

18.
19.
20.