Similar articles
Found 20 similar articles (search time: 218 ms)
1.
The spectrum of mutations discovered in cancer genomes can be explained by the activity of a few elementary mutational processes. We present a novel probabilistic method, EMu, to infer the mutational signatures of these processes from a collection of sequenced tumors. EMu naturally incorporates the tumor-specific opportunity for different mutation types according to sequence composition. Applying EMu to breast cancer data, we derive detailed maps of the activity of each process, both genome-wide and within specific local regions of the genome. Our work provides new opportunities to study the mutational processes underlying cancer development. EMu is available at http://www.sanger.ac.uk/resources/software/emu/.

2.
3.
In France, Bacillus anthracis subgroup B2 strains do not metabolize starch or glycogen but can use gluconate, whereas subgroup A1 strains show the inverse pattern. Functional genetic analysis revealed that mutations in the amyS and gntK genes encoding an alpha-amylase and a gluconate kinase, respectively, were responsible for these phenotypes.

Bacillus anthracis, the etiological agent of anthrax, is a gram-positive, aerobic soil bacterium. Multilocus variable-number tandem repeat analysis of a collection of French isolates shows that the main B. anthracis groups described worldwide, A (subgroup A1) and B (subgroup B2), are represented (1, 2). Subgroup B2 isolates are the most common isolates in France and are found particularly in southern mountain regions, but they are extremely rare elsewhere in the world. Biochemical characterization of French isolates indicates that subgroup A1 and B2 strains have different carbohydrate utilization patterns (P. Vaissaire, A. Fouet, K. L. Smith, C. Keys, C. Le Doujet, P. Sylvestre, M. Levy, P. Keim, and M. Mock, presented at the 5th International Conference on Anthrax and 3rd International Workshop on the Molecular Biology of Bacillus cereus, B. anthracis and B. thuringiensis, 30 March to 3 April 2003, Nice, France). French subgroup A1 strains metabolize starch and glycogen but not gluconate, and the inverse is true for subgroup B2 strains. The genomes of several B. anthracis strains are available on the NCBI website (http://www.ncbi.nlm.nih.gov/), and two of these strains, Ames and CNEVA, are representative of groups A and B, respectively. We compared the genomic sequences of Ames and CNEVA to identify mutations that may affect metabolic activities involved in the phenotypic differences.

The KEGG pathway database (http://www.genome.jp/kegg/pathway.html) was used to select enzyme activities involved in the metabolic pathways for starch, glycogen, and gluconate.
BLAST analysis of the corresponding open reading frame in the Ames (subgroup A3) and CNEVA (subgroup B2) genomes was then used to identify the selected genes that were interrupted or mutated. The functions and localizations of these open reading frames were then investigated with the Pfam (http://pfam.sanger.ac.uk/), CDD (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml), SMART (http://smart.embl-heidelberg.de/), SignalP (http://www.cbs.dtu.dk/services/SignalP/), and TMHMM (http://www.cbs.dtu.dk/services/TMHMM-2.0/) search programs. A single-base deletion in the amyS gene (BA3551) encoding an alpha-amylase linked to starch and glycogen metabolism was found in the CNEVA genome. The wild-type AmyS protein contains 513 amino acids, and its predicted molecular mass is 58.4 kDa. In subgroup B2, there is a frameshift due to deletion of an adenosine in the 7th position of the nucleotide sequence that leads to a premature stop codon in the 13th position. In the Ames genome, a single-base substitution was found in the gntK gene (BA0162) encoding a gluconate kinase linked to gluconate metabolism. The predicted wild-type GntK protein contains 511 amino acids, and its predicted molecular mass is 56.7 kDa. The mutation identified is a cytosine-to-adenosine substitution at position 530 of the nucleotide sequence that leads to a premature stop codon at amino acid position 176. We confirmed the presence of these two mutations in the other B. anthracis subgroup genomes accessible in the NCBI unfinished microbial genome database and sequenced 12 isolates with various genotypes belonging to subgroups A1 and B2 (6 isolates in each subgroup) originating from outbreaks that occurred in different regions of France over the last 15 years. These analyses revealed that the deletion in amyS is restricted to strains belonging to group B subgroups, whereas the substitution in gntK is restricted to strains belonging to group A subgroups. 
The mutations identified in amyS and gntK both result in premature stop codons that lead to a loss of the enzymatic activities and may thus account for the observed phenotypic differences between subgroup A1 and B2 strains. We therefore focused on these two genes and used French strains 9602R and RA3R belonging to subgroups A1 and B2, respectively, for further analysis.
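The effect of a single-base deletion on the reading frame can be illustrated with a minimal Python sketch. The sequence below is an invented toy example, not the real amyS sequence, and the helper names `first_stop_codon` and `delete_base` are hypothetical; the sketch only shows how deleting the base at position 7 can shift the frame so that a stop codon appears much earlier than in the wild type.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 1-based index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def delete_base(cds, position):
    """Return cds with the base at the given 1-based position removed."""
    return cds[:position - 1] + cds[position:]


# Invented toy coding sequence (assumption, for illustration only):
# ATG AAA ATG ACG TTC GGT TAA
wild_type = "ATGAAAATGACGTTCGGTTAA"

# Deleting the 7th nucleotide shifts every downstream codon by one base:
# ATG AAA TGA ...
mutant = delete_base(wild_type, 7)

wild_type_stop = first_stop_codon(wild_type)
mutant_stop = first_stop_codon(mutant)
```

In this toy case the only stop codon of the wild type is the terminal one, whereas the frameshifted sequence acquires a stop codon a few codons after the start, which is the same kind of truncation described for the B2 strains.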

4.
5.
PDBsum1 is a standalone set of programs to perform the same structural analyses as provided by the PDBsum web server (https://www.ebi.ac.uk/pdbsum). The server has pages for every entry in the Protein Data Bank (PDB) and can also process user‐uploaded PDB files, returning a password‐protected set of pages that are retained for around 3 months. The standalone version described here allows for in‐house processing and indefinite retention of the results. All data files and images are pre‐generated, rather than generated on the fly as in the web version, so they can be easily accessed. The program runs on Linux, Windows, and Mac operating systems and is freely available for academic use at https://www.ebi.ac.uk/thornton-srv/software/PDBsum1.

6.

Background

Phylogenetic-based classification of M. tuberculosis and other bacterial genomes is a core analysis for studying evolutionary hypotheses, disease outbreaks and transmission events. Whole genome sequencing is providing new insights into the genomic variation underlying intra- and inter-strain diversity, thereby assisting with the classification and molecular barcoding of the bacteria. One roadblock to strain investigation is the lack of user-interactive solutions to interrogate and visualise variation within a phylogenetic tree setting.

Results

We have developed a web-based tool called PhyTB (http://pathogenseq.lshtm.ac.uk/phytblive/index.php) to assist phylogenetic tree visualisation and identification of M. tuberculosis clade-informative polymorphisms. Variant Call Format files can be uploaded to determine a sample's position within the tree. A map view summarises the geographical distribution of alleles and strain-types. The utility of PhyTB is demonstrated on sequence data from 1,601 M. tuberculosis isolates.

Conclusion

PhyTB contextualises M. tuberculosis genomic variation within epidemiological, geographical and phylogenetic settings. Further tool utility is possible by incorporating large variants and phenotypic data (e.g. drug-resistance profiles), and an assessment of genotype-phenotype associations. Source code is available to develop similar websites for other organisms (http://sourceforge.net/projects/phylotrack).

7.
Dictyostelium is an attractive model system for the study of mechanisms basic to cellular function or complex multicellular developmental processes. Recent advances in Dictyostelium genomics have generated a wide spectrum of resources. However, much of the current genomic sequence information is still not available through GenBank or related databases. Thus, many investigators are unaware that extensive sequence data from Dictyostelium has been compiled, or of how to access it. Here, we discuss progress in Dictyostelium genomics and gene annotation, and highlight the primary portals for sequence access, manipulation and analysis (http://genome.imb-jena.de/dictyostelium/; http://dictygenome.bcm.tmc.edu/; http://www.sanger.ac.uk/Projects/D_discoideum/; http://www.csm.biol.tsukuba.ac.jp/cDNAproject.html).

8.
9.
The Hawaiian strain (CB4856) of Caenorhabditis elegans is one of the most divergent from the canonical laboratory strain N2 and has been widely used in developmental, population, and evolutionary studies. To enhance the utility of the strain, we have generated a draft sequence of the CB4856 genome, exploiting a variety of resources and strategies. When compared against the N2 reference, the CB4856 genome has 327,050 single nucleotide variants (SNVs) and 79,529 insertion–deletion events that result in a total of 3.3 Mb of N2 sequence missing from CB4856 and 1.4 Mb of sequence present in CB4856 but not present in N2. As previously reported, the density of SNVs varies along the chromosomes, with the arms of chromosomes showing greater average variation than the centers. In addition, we find 61 regions totaling 2.8 Mb, distributed across all six chromosomes, which have a greatly elevated SNV density, ranging from 2 to 16% SNVs. A survey of other wild isolates shows that the two alternative haplotypes for each region are widely distributed, suggesting they have been maintained by balancing selection over long evolutionary times. These divergent regions contain an abundance of genes from large rapidly evolving families encoding F-box, MATH, BATH, seven-transmembrane G-coupled receptors, and nuclear hormone receptors, suggesting that they provide selective advantages in natural environments. The draft sequence makes available a comprehensive catalog of sequence differences between the CB4856 and N2 strains that will facilitate the molecular dissection of their phenotypic differences. Our work also emphasizes the importance of going beyond simple alignment of reads to a reference genome when assessing differences between genomes.
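As a rough illustration of what "SNV density" means here, the sketch below counts SNVs in fixed-size windows and flags windows whose density reaches a threshold. It is only a sketch under stated assumptions: the window size, the 2% default threshold, the toy positions, and the function names are all hypothetical and do not reproduce the method used for the CB4856 analysis.

```python
from collections import Counter


def snv_density(positions, window):
    """Map each window index to the fraction of its bases that are SNVs.

    positions: 1-based SNV coordinates on one chromosome.
    window: window length in bases (hypothetical parameter).
    """
    counts = Counter((p - 1) // window for p in positions)
    return {w: n / window for w, n in counts.items()}


def divergent_windows(positions, window, threshold=0.02):
    """Return sorted indices of windows with SNV density >= threshold."""
    density = snv_density(positions, window)
    return sorted(w for w, d in density.items() if d >= threshold)


# Toy data (invented): SNV coordinates on a 300-base stretch.
toy_positions = [5, 10, 20, 150, 230, 240, 250, 260]
toy_density = snv_density(toy_positions, window=100)
toy_divergent = divergent_windows(toy_positions, window=100)
```

A fixed window is the simplest possible choice; delimiting real divergent regions would also require handling window boundaries and merging adjacent flagged windows.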

10.
One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as in an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida. The software is released under the British Columbia Cancer Agency License (BCCA), and is free for academic use.
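The general idea of splitting indexing and alignment into subtasks can be sketched in a few lines of Python. This is not DIDA's implementation: the round-robin partitioning, the exact-match k-mer "aligner", and all function names are assumptions chosen to keep the example small, and the partitions are processed in a plain loop rather than on separate compute nodes. Queries are assumed to be at least k bases long.

```python
from collections import defaultdict


def partition_targets(targets, n_parts):
    """Split a {name: sequence} dict into n_parts partitions, round-robin."""
    parts = [dict() for _ in range(n_parts)]
    for i, (name, seq) in enumerate(sorted(targets.items())):
        parts[i % n_parts][name] = seq
    return parts


def build_index(partition, k):
    """Map every k-mer to the (target name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in partition.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def align(query, index, partition, k):
    """Seed with the query's first k-mer, then verify an exact full match."""
    hits = []
    for name, pos in index.get(query[:k], []):
        if partition[name][pos:pos + len(query)] == query:
            hits.append((name, pos))
    return hits


def distributed_align(queries, targets, n_parts, k):
    """Index and align per partition, then merge the per-partition hits."""
    merged = {q: [] for q in queries}
    # Each loop iteration stands in for one subtask on one compute node.
    for part in partition_targets(targets, n_parts):
        index = build_index(part, k)
        for q in queries:
            merged[q].extend(align(q, index, part, k))
    return {q: sorted(hits) for q, hits in merged.items()}


# Toy contigs and queries (invented for illustration).
toy_targets = {
    "contig1": "ACGTACGTGA",
    "contig2": "TTGACCGTAC",
    "contig3": "GGGACGTGAT",
}
toy_hits = distributed_align(["ACGTGA", "CCGTAC"], toy_targets, n_parts=2, k=4)
```

Because each partition builds its own small index, no single node has to hold an index of the whole target set; the merge step is what distinguishes this from simply running independent jobs and never combining their results.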

11.
12.
13.

Background

Assembling genes from next-generation sequencing data is not only time consuming but computationally difficult, particularly for taxa without a closely related reference genome. Assembling even a draft genome using de novo approaches can take days, even on a powerful computer, and these assemblies typically require data from a variety of genomic libraries. Here we describe software that will alleviate these issues by rapidly assembling genes from distantly related taxa using a single library of paired-end reads: aTRAM, automated Target Restricted Assembly Method. The aTRAM pipeline uses a reference sequence, BLAST, and an iterative approach to target and locally assemble the genes of interest.

Results

Our results demonstrate that aTRAM rapidly assembles genes across distantly related taxa. In comparative tests with a closely related taxon, aTRAM assembled the same sequence as reference-based and de novo approaches taking on average < 1 min per gene. As a test case with divergent sequences, we assembled >1,000 genes from six taxa ranging from 25 – 110 million years divergent from the reference taxon. The gene recovery was between 97 – 99% from each taxon.

Conclusions

aTRAM can quickly assemble genes across distantly-related taxa, obviating the need for draft genome assembly of all taxa of interest. Because aTRAM uses a targeted approach, loci can be assembled in minutes depending on the size of the target. Our results suggest that this software will be useful in rapidly assembling genes for phylogenomic projects covering a wide taxonomic range, as well as other applications. The software is freely available at http://www.github.com/juliema/aTRAM.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0515-2) contains supplementary material, which is available to authorized users.

14.
Two large-scale phenotyping efforts, the European Mouse Disease Clinic (EUMODIC) and the Wellcome Trust Sanger Institute Mouse Genetics Project (SANGER-MGP), started during the late 2000s with the aim to deliver a comprehensive assessment of phenotypes or to screen for robust indicators of diseases in mouse mutants. They both took advantage of available mouse mutant lines but predominantly of the embryonic stem (ES) cells resources derived from the European Conditional Mouse Mutagenesis programme (EUCOMM) and the Knockout Mouse Project (KOMP) to produce and study 799 mouse models that were systematically analysed with a comprehensive set of physiological and behavioural paradigms. They captured more than 400 variables and an additional panel of metadata describing the conditions of the tests. All the data are now available through EuroPhenome database (www.europhenome.org) and the WTSI mouse portal (http://www.sanger.ac.uk/mouseportal/), and the corresponding mouse lines are available through the European Mouse Mutant Archive (EMMA), the International Knockout Mouse Consortium (IKMC), or the Knockout Mouse Project (KOMP) Repository. Overall conclusions from both studies converged, with at least one phenotype scored in at least 80% of the mutant lines. In addition, 57% of the lines were viable, 13% subviable, 30% embryonic lethal, and 7% displayed fertility impairments. These efforts provide an important underpinning for a future global programme that will undertake the complete functional annotation of the mammalian genome in the mouse model.

15.
The PDBsum web server provides structural analyses of the entries in the Protein Data Bank (PDB). Two recent additions are described here. The first is the detailed analysis of the SARS‐CoV‐2 virus protein structures in the PDB. These include the variants of concern, which are shown both on the sequences and 3D structures of the proteins. The second addition is the inclusion of the available AlphaFold models for human proteins. The pages allow a search of the protein against existing structures in the PDB via the Sequence Annotated by Structure (SAS) server, so one can easily compare the predicted model against experimentally determined structures. The server is freely accessible to all at http://www.ebi.ac.uk/pdbsum.

16.
Summary: DNAPlotter is an interactive Java application for generating circular and linear representations of genomes. Making use of the Artemis libraries to provide a user-friendly method of loading in sequence files (EMBL, GenBank, GFF) as well as data from relational databases, it filters features of interest to display on separate user-definable tracks. It can be used to produce publication quality images for papers or web pages. Availability: DNAPlotter is freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/circular/ Contact: artemis{at}sanger.ac.uk Associate Editor: John Quackenbush

17.
18.
A model to predict the population density of verotoxigenic Escherichia coli (VTEC) throughout the elaboration and storage of fermented raw-meat sausages (FRMS) was developed. Probabilistic and kinetic measurement data sets collected from publicly available resources were completed with new measurements when required and used to quantify the dependence of VTEC growth and inactivation on the temperature, pH, water activity (aw), and concentration of lactic acid. Predictions were compared with observations in VTEC-contaminated FRMS manufactured in a pilot plant. Slight differences in the reduction of VTEC were predicted according to the fermentation temperature, 24 or 34°C, with greater inactivation at the higher temperature. The greatest reduction was observed during storage at high temperatures. A population decrease greater than 6 decimal logarithmic units was observed after 66 days of storage at 25°C, while a reduction of only ca. 1 logarithmic unit was detected at 12°C. The performance of our model and other modeling approaches was evaluated throughout the processing of dry and semidry FRMS. The greatest inactivation of VTEC was predicted in dry FRMS with long drying periods, while the smallest reduction was predicted in semidry FRMS with short drying periods. The model is implemented in a computing tool, E. coli SafeFerment (EcSF), freely available from http://www.ifr.ac.uk/safety/EcoliSafeFerment. EcSF integrates growth, probability of growth, and thermal and nonthermal inactivation models to predict the VTEC concentration throughout FRMS manufacturing and storage under constant or fluctuating environmental conditions.

19.
DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number tandem repeats (VNTRs) are polymorphic TR loci in which the number of pattern copies is variable. In this paper we describe VNTRseek, our software for discovery of minisatellite VNTRs (pattern size ≥ 7 nucleotides) using whole genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference and its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/.
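The copy-number comparison at the heart of this idea can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not VNTRseek's algorithm: copies are matched exactly (real repeat copies may be approximate), the read is assumed to span the whole repeat array including flanking sequence, and the function names and toy sequences are hypothetical.

```python
def tandem_copies(seq, pattern):
    """Return the longest run of back-to-back exact copies of pattern in seq."""
    best = 0
    step = len(pattern)
    for start in range(len(seq)):
        copies = 0
        pos = start
        # Walk forward one pattern length at a time while copies keep matching.
        while seq.startswith(pattern, pos):
            copies += 1
            pos += step
        best = max(best, copies)
    return best


def copy_number_change(reference, read, pattern):
    """Copies in the read minus copies in the reference (0 means no change)."""
    return tandem_copies(read, pattern) - tandem_copies(reference, pattern)


# Toy data (invented): a 7-nt pattern with 3 reference copies and 5 read copies.
toy_pattern = "ACGTTGA"
toy_reference = "GG" + toy_pattern * 3 + "CC"
toy_read = "GG" + toy_pattern * 5 + "CC"
toy_change = copy_number_change(toy_reference, toy_read, toy_pattern)
```

A nonzero difference marks the locus as a putative VNTR in this toy setting; a practical tool would additionally need approximate matching and support from multiple reads before calling a variant.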

20.
Rhizobium leguminosarum bv. trifolii SRDI943 (strain syn. V2-2) is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from a root nodule of Trifolium michelianum Savi cv. Paradana that had been grown in soil collected from a mixed pasture in Victoria, Australia. This isolate was found to have a broad clover host range but was sub-optimal for nitrogen fixation with T. subterraneum (fixing 20-54% of reference inoculant strain WSM1325) and was found to be totally ineffective with the clover species T. polymorphum and T. pratense. Here we describe the features of R. leguminosarum bv. trifolii strain SRDI943, together with genome sequence information and annotation. The 7,412,387 bp high-quality-draft genome is arranged into 5 scaffolds of 5 contigs, contains 7,317 protein-coding genes and 89 RNA-only encoding genes, and is one of 100 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.
