Similar Articles
1.
Large-scale genome projects require the analysis of large amounts of raw data, often by applying a chain of biology-based programs. Many of these programs are difficult to operate because they are non-integrated, command-line driven, and platform-dependent. The problem is compounded when many data files are involved, which makes navigation and status tracking difficult. To show how this problem can be addressed, we created a platform-independent Web front end that integrates a set of programs used in a genomic project analyzing gene function by transposon mutagenesis in Saccharomyces cerevisiae. These programs help define a large number of transposon insertion events in the yeast genome, identifying both the precise insertion site and any open reading frames potentially disrupted by each insertion. Our Web interface supports this analysis in three ways. First, it allows each analysis program to be launched against multiple directories of data files. Second, it lets the user view, download, and upload files generated by the programs. Third, it indicates which data directories have been processed by each program. Although designed specifically for this project, the interface exemplifies a general approach to integrating independent software programs into an efficient protocol for large-scale genomic data processing.
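The status-tracking part of such a front end can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes each analysis program is a command-line tool that takes a data directory as its last argument, and it records the directories each program has processed in a JSON manifest. The function names and the manifest format are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def load_status(manifest: Path) -> dict[str, list[str]]:
    """Map each program name to the directories it has already processed."""
    if manifest.exists():
        return json.loads(manifest.read_text())
    return {}


def run_program(name: str, argv: list[str], directories: list[Path],
                manifest: Path) -> dict[str, list[str]]:
    """Run `argv` once per directory, appending the directory path as the
    last argument, and record successful runs in the JSON manifest.

    Directories already recorded for `name` are skipped.
    """
    status = load_status(manifest)
    done = set(status.get(name, []))
    for directory in directories:
        key = str(directory)
        if key in done:
            continue  # already processed by this program
        result = subprocess.run([*argv, key], capture_output=True, text=True)
        if result.returncode == 0:
            done.add(key)  # only successful runs count as processed
    status[name] = sorted(done)
    manifest.write_text(json.dumps(status, indent=2))
    return status
```

Re-running `run_program` with the same manifest skips directories already recorded for that program, which is one simple way to show which sets of data directories each program has processed.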

2.
Computational prediction of the origin of replication is a challenging problem of immense interest to biologists. Several methods have been proposed for identifying the replication origin in various classes of organisms. However, these methods have limited applicability because the replication mechanism differs between organisms. We propose a correlation measure and show that it correctly predicts the origin of replication in most bacterial genomes. When applied to Methanocaldococcus jannaschii, the Plasmodium falciparum apicoplast and the Nicotiana tabacum plastid, this correlation-based method correctly predicts the origin of replication, whereas the commonly used GC skew measure fails. This correlation-based measure is therefore a novel and promising tool for predicting the origin of replication in a wide range of organisms. It could have important implications not only for a deeper understanding of the replication machinery in higher organisms but also for drug discovery.
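For reference, the conventional GC skew approach that the abstract uses as a baseline can be sketched as follows. This is not the authors' correlation measure, which the abstract does not define. It is a minimal sketch that computes (G - C)/(G + C) in non-overlapping windows and takes the minimum of the cumulative skew as the predicted origin, a common heuristic for bacterial chromosomes. The window size and function names are illustrative assumptions.

```python
def gc_skew_windows(seq: str, window: int) -> list[float]:
    """GC skew (G - C) / (G + C) in consecutive, non-overlapping windows."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def predict_origin_by_gc_skew(seq: str, window: int) -> int:
    """Boundary after the window where the cumulative GC skew is lowest.

    On many bacterial chromosomes the skew changes from negative to positive
    near the origin, so the cumulative minimum is a common origin heuristic.
    The genome is treated as circular, so a boundary at the end maps to 0.
    """
    cumulative, total = [], 0.0
    for skew in gc_skew_windows(seq, window):
        total += skew
        cumulative.append(total)
    if not cumulative:
        raise ValueError("sequence is shorter than one window")
    lowest = min(range(len(cumulative)), key=cumulative.__getitem__)
    return ((lowest + 1) * window) % len(seq)
```

For a toy sequence of 40 C followed by 40 G and a window of 10, the predicted boundary is position 40. Organisms whose skew pattern is weak or irregular are exactly where such a heuristic can fail.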

3.
4.
5.

Background

Metarhizium anisopliae is an important fungal biocontrol agent of insect pests of agricultural crops. Genomics can aid the successful commercialization of biopesticides by identifying key genes that differentiate closely related species, by enabling the selection of virulent microbial isolates that are amenable to industrial-scale production and formulation, and by reducing phenotypic variability. The genome of Metarhizium isolate ARSEF23 was recently published as a model for M. anisopliae; however, phylogenetic analysis has since reclassified this isolate as M. robertsii. We present a new annotated genome sequence of M. anisopliae (isolate Ma69) and a whole-genome comparison with M. robertsii (ARSEF23) and M. acridum (CQMa 102).

Results

Whole-genome analysis of M. anisopliae indicates significant macrosynteny with M. robertsii, but with some large genomic inversions. The M. anisopliae genome shares lower sequence homology with M. acridum. Although the alignments are co-linear overall, the M. acridum genome is not contiguous enough to observe macrosynteny conclusively. Mating-type gene analysis revealed both MAT1-1 and MAT1-2 genes in M. anisopliae, suggesting putative homothallism despite the absence of a known teleomorph, in contrast with the putatively heterothallic M. acridum isolate CQMa 102 (MAT1-2) and M. robertsii isolate ARSEF23 (altered MAT1-1). Repetitive DNA and RIP analysis revealed that M. acridum has twice the repetitive content of the other two species and that M. anisopliae is five times more RIP-affected than M. robertsii. We also present an initial bioinformatic survey of candidate pathogenicity genes in M. anisopliae.
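Dinucleotide ratios are a common way to quantify RIP. The sketch below computes the widely used product index TpA/ApT and substrate index (CpA + TpG)/(ApC + GpT); the abstract does not say which indices the authors used, so treat this as an illustrative assumption. The thresholds in `looks_rip_affected` (product >= 0.89, substrate <= 0.75) are the commonly cited ones and should be checked against the RIP literature before use.

```python
import math
from collections import Counter


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division: NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def rip_indices(seq: str) -> dict[str, float]:
    """Dinucleotide-based RIP indices for one sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    product = _ratio(counts["TA"], counts["AT"])  # TpA / ApT
    substrate = _ratio(counts["CA"] + counts["TG"],
                       counts["AC"] + counts["GT"])  # (CpA + TpG) / (ApC + GpT)
    return {"product": product, "substrate": substrate,
            "composite": product - substrate}


def looks_rip_affected(indices: dict[str, float]) -> bool:
    """Commonly cited thresholds; NaN comparisons evaluate to False."""
    return indices["product"] >= 0.89 and indices["substrate"] <= 0.75
```

In practice such indices are usually computed over sliding windows or per repeat copy rather than over a whole genome at once.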

Conclusions

The annotated genome of M. anisopliae is an important resource for identifying virulence genes specific to M. anisopliae and for developing species- and strain-specific assays. New insight into possible homothallism and the degree of RIP has important implications for the development of M. anisopliae as a biopesticide, as it may indicate greater inherent diversity in this species than in the other species. This could present opportunities to select isolates with unique combinations of pathogenicity factors, or it may point to instability in the species, a negative attribute in a biopesticide.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-660) contains supplementary material, which is available to authorized users.

6.
Helicobacter hepaticus is an important pathogen in laboratory mice and induces liver tumors and gastrointestinal disease in susceptible mouse strains. In this study, a miniset of 36 cosmid clones from a genomic library of H. hepaticus was ordered and grouped into four large contigs representing approximately 1 Mb of the H. hepaticus genome using PCR, DNA sequencing, Southern and dot-blot hybridization and pulsed-field gel electrophoresis. From terminal sequences of 200-300 nucleotides obtained from 38 cosmid clones, 56 coding regions were predicted, of which 51 had orthologs in the public databases and five appeared to be unique to H. hepaticus. Of these 51 genes, 36 have orthologs in Helicobacter pylori and 25 display the highest sequence similarity to H. pylori. However, the chromosomal positions of these genes are not conserved between the two helicobacters. In addition, 10 H. hepaticus genes had the highest sequence similarity to orthologs in Campylobacter jejuni. The GC content of a randomly selected 21-kb H. hepaticus genomic sequence was 35.8%, which approximates the average of H. pylori (39%) and C. jejuni (30.6%). These results demonstrate that (1) H. hepaticus is more closely related to H. pylori than to C. jejuni, and (2) significant genomic alterations exist between H. hepaticus and H. pylori, including gene organization, protein sequences and GC content, probably due in part to specific adaptation to distinct ecological niches.
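GC content is the percentage of G and C among the bases of a sequence. A minimal sketch, assuming ambiguous symbols such as N are simply excluded from the denominator; for comparison, the midpoint of the two reference values quoted in the abstract is (39 + 30.6)/2 = 34.8%.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100.0 * gc / acgt


# Midpoint of the H. pylori (39%) and C. jejuni (30.6%) values.
midpoint = (39 + 30.6) / 2
```

Excluding ambiguous bases keeps poorly sequenced regions from artificially lowering the estimate.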

7.
Quantitative proteomics based on isobaric tags plays an important role in proteomic investigation. In this paper, we present automated software, named IQuant, that integrates a post-processing tool for protein identification with advanced statistical algorithms to process the MS/MS signals generated from peptides labeled with isobaric tags, with the aim of proteomic quantification. IQuant, which can be freely downloaded at http://sourceforge.net/projects/iquant/, can be run from a graphical user interface or a command-line interface and works on both Windows and Linux systems.

8.
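As a practical and efficient breeding technique, genome shuffling can break through the barriers between microbial species and genera even when the genetic background is poorly understood and no tractable genetic system is available. By accelerating artificial directed evolution through multiple recursive rounds of protoplast fusion, it has been widely applied in microbial strain improvement and in the development and industrialization of metabolites. In the post-genomic era, the rapid development of omics and bioinformatics has made genome shuffling an important link among the various microbial breeding methods, offering an opportunity to explore the complex metabolic networks and global regulatory mechanisms of microorganisms in depth and to regulate microorganisms and direct their evolution more precisely. This review systematically surveys recent applications of genome shuffling in microbial strain breeding, discusses in detail the omics studies carried out around it, and looks ahead to the combined use of genome shuffling with emerging technologies such as omics, bioinformatics and synthetic biology.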

9.
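Using multidrug-resistant Klebsiella pneumoniae clinically isolated at several hospitals in Shanghai as the host strain, we isolated a K. pneumoniae phage, KP002, from sewage collected in different environments. Electron microscopy showed a tailed phage with a head diameter of about 70 nm, a tail length of about 80 nm and a tail width of about 20 nm. Characterization of its biological properties showed that the phage retained high activity at pH 3-9 and 4-50 °C; more than 95% of the phage adsorbed within 6 min; the latent period was 10 min and the burst period 50 min; and the burst size was 172 pfu/cell. These results indicate that the phage tolerates a wide range of pH and temperature. Whole-genome sequencing showed that its genome is a circular double-stranded DNA of 47,173 bp with a GC content of 48%. This study obtained a virulent phage, KP002, against drug-resistant K. pneumoniae with broad pH and temperature tolerance, providing a new approach toward establishing a phage library against drug-resistant K. pneumoniae for treating clinical infections caused by multidrug-resistant bacteria.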

10.
We have isolated a new, extremely thermophilic, fast-growing Geobacillus strain that can efficiently utilize xylose, glucose, mannose and galactose for cell growth. When grown aerobically at 72 °C, Geobacillus LC300 has a growth rate of 2.15 h−1 on glucose and 1.52 h−1 on xylose (doubling time less than 30 min). The corresponding specific glucose and xylose utilization rates are 5.55 g/g/h and 5.24 g/g/h, respectively. Geobacillus LC300 thus grows three times faster than E. coli on glucose and xylose, and its specific xylose utilization rate is three times higher than that of the best metabolically engineered organism to date. To gain more insight into the metabolism of Geobacillus LC300, its genome was sequenced using PacBio's RS II single-molecule real-time (SMRT) sequencing platform and annotated using the RAST server. Based on the genome annotation and the measured biomass composition, a core metabolic network model was constructed. To further demonstrate the biotechnological potential of this organism, Geobacillus LC300 was grown to high cell densities in a fed-batch culture, in which the cells maintained a high xylose utilization rate under low dissolved oxygen concentrations. All of these characteristics make Geobacillus LC300 an attractive host for future metabolic engineering and biotechnology applications.
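The doubling-time statement can be checked with the standard relation for exponential growth, assuming the reported values are specific growth rates $\mu$ (with $\ln 2 \approx 0.693$):

$$
t_d = \frac{\ln 2}{\mu}, \qquad
t_d^{\mathrm{glc}} = \frac{0.693}{2.15\ \mathrm{h}^{-1}} \approx 0.322\ \mathrm{h} \approx 19.3\ \mathrm{min}, \qquad
t_d^{\mathrm{xyl}} = \frac{0.693}{1.52\ \mathrm{h}^{-1}} \approx 0.456\ \mathrm{h} \approx 27.4\ \mathrm{min}
$$

Both values are below 30 min, consistent with the figure given in the abstract.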

11.
Glycosyltransferases comprise highly divergent groups of enzymes that play a central role in the synthesis of complex glycans. Because the repertoire of glycosyltransferases in a genome determines the range of glycans that can be synthesized, and because increasing amounts of genome sequence data are now available, it is essential to examine these enzymes across organisms to explore the possible structures and functions of glycoconjugates. In this study, we systematically investigated 36 eukaryotic genomes and obtained 3426 glycosyltransferase homologs for the biosynthesis of major glycans, classified into 53 families based on sequence similarity. The families were further grouped into six functional categories based on biosynthetic pathways, which revealed characteristic patterns among organism groups in the degree of conservation and in the number of paralogs. The results also revealed a strong correlation between the number of glycosyltransferases and the number of coding genes in each genome. We then predicted, from the combination of glycosyltransferase families present, the ability of each organism to synthesize major glycan structures, including N-glycan precursors and GPI anchors. The predictions indicate that not only parasitic protists but also some algae are likely to synthesize smaller structures than those known to be conserved among a wide range of eukaryotes. Finally, we discuss the functions of two large families, sialyltransferases and β4-glycosyltransferases, by classifying them more finely into subfamilies. Our findings suggest that the universality and diversity of glycans originate from two modes of glycosyltransferase family evolution: conserved families with few paralogs and diverged families with many paralogs.

12.
We have developed a tandem mass spectrometry (MS/MS) data analysis program for confirming the sequences of chemically modified oligonucleotides. The method is based on the analysis of deconvoluted MS/MS data for fragment ions from three charge states and the comparison of these data against a set of computer-generated masses from expected fragmentation patterns. The algorithm compares the experimental masses not only against the fragment set predicted for the expected sequence but also against a wider test set covering all next-neighbor position switches of the original sequence and all pairwise swaps of nucleosides, which in synthesis would result in molecules with masses within a preset mass tolerance. The algorithm can identify incorrect sequences that would not be distinguished by identity testing with electrospray ionization mass spectrometry. The method has been tested with permutations of the two 21-mer single strands of a chemically modified short interfering RNA containing 2′-O-methyl and phosphorothioate linkages. For both strands, challenge sequences were synthesized and tested on the premise that they were the original sequences. The algorithm correctly reported the locations of next-neighbor position switches and nucleoside swaps. The results confirm that the approach is useful for MS/MS-based identity test methods for synthetic oligonucleotides.
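The idea of testing an expected sequence against swapped challenge sequences can be illustrated with a simplified model. This sketch is not the authors' algorithm: it uses a single prefix-fragment ladder instead of deconvoluted fragment ions from three charge states, and the residue masses are placeholders (roughly those of unmodified ribonucleotide residues), not the masses of the 2′-O-methyl or phosphorothioate-modified residues. It shows why a swap that leaves the intact mass unchanged can still be detected at the fragment level.

```python
from itertools import combinations

# Placeholder residue masses (Da), roughly unmodified RNA nucleotide residues.
# A real identity test would use the masses of the modified residues present.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}


def swap(seq: str, i: int, j: int) -> str:
    """Return seq with positions i and j exchanged."""
    s = list(seq)
    s[i], s[j] = s[j], s[i]
    return "".join(s)


def challenge_set(seq: str) -> set[str]:
    """All distinct sequences reachable by one pairwise swap.

    Next-neighbour switches (j == i + 1) are a subset of pairwise swaps.
    Swaps of identical residues are skipped because they change nothing.
    """
    return {swap(seq, i, j)
            for i, j in combinations(range(len(seq)), 2)
            if seq[i] != seq[j]}


def prefix_ladder(seq: str) -> list[float]:
    """Cumulative residue masses: a simplified 5'-fragment ladder."""
    ladder, total = [], 0.0
    for base in seq:
        total += RESIDUE_MASS[base]
        ladder.append(round(total, 2))
    return ladder


def distinguishing_fragments(expected: str, candidate: str,
                             tol: float = 0.5) -> list[int]:
    """Fragment lengths whose prefix masses differ by more than tol."""
    pairs = zip(prefix_ladder(expected), prefix_ladder(candidate))
    return [k + 1 for k, (a, b) in enumerate(pairs) if abs(a - b) > tol]
```

For example, swapping positions 1 and 2 of `"ACGU"` gives `"AGCU"`, which has the same composition and therefore the same total mass, yet `distinguishing_fragments("ACGU", "AGCU")` returns `[2]`: only the two-residue fragment reveals the swap.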

13.
We present an overview of the gene content and organization of the mitochondrial genome of Dictyostelium discoideum. The mitochondrial genome consists of 55,564 bp with an A + T content of 72.6%. The identified genes include those for two ribosomal RNAs (rnl and rns), 18 tRNAs, ten subunits of the NADH dehydrogenase complex (nad1, 2, 3, 4, 4L, 5, 6, 7, 9 and 11), apocytochrome b (cytb), three subunits of cytochrome oxidase (cox1/2 and 3), four subunits of the ATP synthase complex (atp1, 6, 8 and 9), 15 ribosomal proteins, and five other ORFs, excluding intronic ORFs. Notable features of D. discoideum mtDNA include the following. (1) All genes are encoded on the same strand of the DNA, and the universal genetic code is used. (2) The cox1 gene has no termination codon and is fused to the downstream cox2 gene. The 13 genes for ribosomal proteins and four ORF genes form a 15.4-kb cluster with several gene overlaps. (3) The number of tRNAs encoded in the genome is not sufficient to support mitochondrial protein synthesis. (4) In total, five group I introns reside in rnl and cox1/2, and three of those in cox1/2 contain four free-standing ORFs. We compare the genome with other sequenced mitochondrial genomes, particularly that of Acanthamoeba castellanii. Received: 5 July 1999 / Accepted: 17 January 2000

14.
Storage of sequence data is a major concern because the amount of data generated at many locations is growing exponentially. Techniques that store data using compression algorithms are therefore needed. Here we describe an optimal storage algorithm (OPTSDNA) for storing large numbers of DNA sequences of varying length, and we provide a performance analysis of OPTSDNA within a distributed bioinformatics computing system for the analysis of DNA sequences. OPTSDNA was used to store DNA sequences of various sizes, from very small to very large, in a database, and both the storage size and the response time were measured. The efficiency and performance of the algorithm (in terms of storage size, expressed as a percentage) are high compared with other known sequential approaches.
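The abstract does not describe how OPTSDNA encodes sequences. As a generic illustration of compression-based DNA storage (explicitly a swapped-in technique, not OPTSDNA), the sketch below packs four A/C/G/T bases into one byte, reducing storage to a quarter of one-byte-per-base text. It assumes the input contains only A, C, G and T; ambiguity codes such as N would need separate handling, and the sequence length must be stored alongside the packed bytes.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        byte = 0
        for offset, base in enumerate(seq[start:start + 4]):
            # Raises KeyError for symbols outside A/C/G/T (e.g. N).
            byte |= CODE[base.upper()] << (2 * offset)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed data."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )
```

A 9-base sequence packs into 3 bytes instead of 9; the saving approaches 75% for long sequences.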

15.
In mass spectrometry‐based proteomics, most conventional search engines match spectral data to sequence databases. These search databases thus play a crucial role in the identification process. While search engines can derive peptides in silico from protein sequences, this is usually limited to standard digestion algorithms. Customized search databases that provide detailed control over the search space can vastly outperform such standard strategies, especially in gel‐free proteomics experiments. Here we present Database on Demand, an easy‐to‐use web tool that can quickly produce a wide variety of customized search databases.
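Standard in silico digestion, which the abstract contrasts with customized search databases, typically applies a rule such as trypsin cleavage after K or R except before P. The minimal sketch below exposes two user-controlled parameters, missed cleavages and minimum peptide length, to show the kind of control a customized database adds; it is an illustration under these simplifying assumptions, not Database on Demand's implementation.

```python
import re


def tryptic_peptides(protein: str, missed_cleavages: int = 0,
                     min_length: int = 1) -> list[str]:
    """Simplified trypsin digest: cleave after K or R unless the next residue is P."""
    # Zero-width split points (supported by re.split since Python 3.7).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` following pieces onto pieces[i].
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptide = "".join(pieces[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides
```

For `"AKPGRSTKLR"`, the K before P is not cleaved, so the fully specific digest is `["AKPGR", "STK", "LR"]`; allowing one missed cleavage adds `"AKPGRSTK"` and `"STKLR"`.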

16.

Background

Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high-throughput shotgun sequencing is a promising approach to characterize their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned.

Results

We built a reference genome catalog suitable for short-read metagenomic analysis using a low-cost sequencing strategy. We selected 142 bacteria isolated from dairy products, belonging to 137 different species and 67 genera, and succeeded in reconstructing draft genomes for 117 of them at a standard or high quality level, including isolates from the genera Kluyvera, Luteococcus and Marinilactibacillus, which were still missing from public databases. To demonstrate the potential of this catalog, we analysed the microbial composition of the surface of two smear cheeses and one blue-veined cheese, and showed that a significant part of the microbiota of these traditional cheeses consisted of microorganisms newly sequenced in our study.

Conclusions

Our study provides data that, combined with publicly available genome references, represent the most extensive catalog to date of cheese-associated bacteria. Using this extended dairy catalog, we revealed the presence in traditional cheeses of dominant microorganisms that were not deliberately inoculated, mainly Gram-negative bacteria such as Pseudoalteromonas haloplanktis or Psychrobacter immobilis, which may contribute to the characteristics of cheese produced by traditional methods.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1101) contains supplementary material, which is available to authorized users.

17.
The Lupinus luteus genome contains a highly repetitive sequence fraction named the EcoRI family. Two EcoRI-family fragments, 1071 and 1079 base pairs long, were cloned, sequenced and compared. Analysis of their internal sequence organization revealed a number of short direct repeats, which are postulated to be involved in the formation of the EcoRI-family fragments. Evidence is presented for a dispersed genomic organization of the EcoRI-family fragments.
Abbreviations: AluI, BspRI, EcoRI, Mbo, PstI: restriction nucleases; bp: base pair; G, A, T, C: deoxynucleotides dGMP, dAMP, dTMP and dCMP; pBR322 and pUC18: plasmids used as cloning vehicles.
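Short direct repeats of the kind mentioned in this abstract can be located with a simple k-mer index. This is a minimal sketch, not the method used in the study: it finds only exact, possibly overlapping, repeats, and the repeat length `k` and minimum copy number are illustrative parameters.

```python
from collections import defaultdict


def direct_repeats(seq: str, k: int, min_copies: int = 2) -> dict[str, list[int]]:
    """Map each k-mer occurring at least `min_copies` times to its 0-based starts."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items()
            if len(starts) >= min_copies}
```

For example, `direct_repeats("GAATTCGAATTC", 6)` reports the 6-mer `"GAATTC"` at positions 0 and 6. Scanning several values of `k` gives a quick overview of repeat structure before a more careful alignment-based analysis.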

18.
Every day, tens of thousands of sequence search and sequence alignment queries are submitted to webservers. The capitalized word “BLAST” has become a verb describing the act of performing a sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, formulating the query on the most frequently used webservers is difficult. Some servers support queries written as regular expressions, but most users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows multiple-choice queries to be formed by simply drawing residues to their positions; if more than one residue is drawn to the same position, the residues are stacked on the user interface, indicating the multiple choice at that position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes it possible to form queries requiring not just specific amino acids at given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org.
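The multiple-choice query described in this abstract corresponds to a regular expression with one character class per position. The sketch below builds such a pattern and reports overlapping matches; the residue groups are hypothetical illustrations, not the PSA server's own definitions, and the function names are assumptions.

```python
import re

# Illustrative residue groups (hypothetical; not the PSA server's definitions).
HYDROPHOBIC = "AVLIMFW"
POLAR = "STNQ"


def build_pattern(positions: list[str]) -> str:
    """One character class per position, e.g. ['AV', 'ST'] -> '[AV][ST]'."""
    return "".join(f"[{re.escape(allowed)}]" for allowed in positions)


def find_motifs(seq: str, positions: list[str]) -> list[tuple[int, str]]:
    """Return (start, match) for every hit, including overlapping ones."""
    # A lookahead makes each match zero-width, so overlapping hits are kept.
    pattern = re.compile(f"(?=({build_pattern(positions)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]
```

Searching `"MKLVSTNGA"` for two hydrophobic followed by three polar residues, `find_motifs("MKLVSTNGA", [HYDROPHOBIC] * 2 + [POLAR] * 3)` returns `[(2, "LVSTN")]`. This is the syntax a graphical interface can hide from the user.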

19.
The sequencing of the cloned Locusta migratoria mitochondrial genome has been completed. The sequence is 15,722 bp in length and contains 75.3% A+T, the lowest value among the five insect mitochondrial sequences determined so far. The protein-coding genes have a similar A+T content (74.1%) but are distinguished by a high cytosine content at the third codon position. The gene content and organization are the same as in Drosophila yakuba except for a rearrangement of the two tRNA genes tRNAlys and tRNAasp. The A+T-rich region has a lower A+T content than in other insects, largely because of the presence of two G+C-rich 155-bp repetitive sequences at the 5′ end of this region and the beginning of the adjacent small rRNA gene. The large and small rRNA genes are 1,314 and 827 bp long, respectively, and both sequences can be folded into secondary structures similar to those previously predicted for Drosophila. The tRNA genes have also been modeled, and they closely resemble the dipteran tRNAs, with all anticodons apparently conserved between the two species. The protein-coding nucleotide sequences of the locust DNA were compared with the homologous sequences of five other arthropods (Drosophila yakuba, Anopheles quadrimaculatus, Anopheles gambiae, Apis mellifera, and Artemia franciscana). The amino acid composition of the encoded proteins in Locusta is similar to that of Drosophila, with a Dayhoff distance twice the distance between the fruit fly and the mosquitoes. A phylogenetic analysis revealed the locust genes to be more similar to those of the dipterans than to those of the honeybee at both the nucleotide and amino acid levels. A comparative analysis of tRNA gene orders, using crustacean mtDNAs as outgroups, supported this.
This high level of divergence in the Apis genome has been noted elsewhere and is possibly an effect of directional mutation pressure having resulted in an accelerated pattern of sequence evolution. If the general assumption that the Holometabola are monophyletic holds, then these results emphasize the difficulties of reconstructing phylogenies that include lineages with variable substitution rates and base composition biases. The need to exercise caution in using information about tRNA gene orders in phylogenetic analysis is also illustrated. However, if the honeybee sequence is excluded, the correspondence between the other five arthropod sequences supports the findings of previous studies which have endorsed the use of mtDNA sequences for studies of phylogeny at deep levels of taxonomy when mutation rates are equivalent. Correspondence to: P.K. Flook

20.
Vampirovibrio chlorellavorus is recognized as a pathogen of commercially relevant Chlorella species. Algal infection and total loss of productivity (biomass) often occur when susceptible algal hosts are cultivated in outdoor open pond systems. The pathogenic life cycle of this bacterium has been inferred from laboratory and field observations and corroborated in part by genomic analyses of two Arizona isolates recovered from an open algal reactor. V. chlorellavorus predation has been reported under geographically and environmentally diverse conditions. Genomic analyses of these and additional field isolates are expected to reveal new information about the extent of ecological diversity and the genes involved in host-pathogen interactions. The draft genome sequences of two isolates of the predatory V. chlorellavorus (Cyanobacteria; Ca. Melainabacteria) from an outdoor cultivation system in the Arizona Sonoran Desert were assembled and annotated. The genomes were sequenced and analyzed to identify genes (proteins) predicted to be involved in predation, infection, and cell death of Chlorella host species prioritized for biofuel production at sites identified as highly suitable for algal production in the southwestern USA. Genomic analyses identified several predicted genes encoding secreted proteins that are potentially involved in pathogenicity, and at least three apparently complete sets of virulence (Vir) genes characteristic of the VirB-VirD type system, encoding the canonical VirB1-11 and VirD4 proteins, respectively. Additional predicted protein functions suggest involvement in quorum sensing and motility. The genomes of two previously uncharacterized V. chlorellavorus isolates reveal nucleotide- and protein-level divergence from each other and from a previously sequenced V. chlorellavorus genome.
This new knowledge will enhance the fundamental understanding of trans‐kingdom interactions between a unique cosmopolitan cyanobacterial pathogen and its green microalgal host, of broad interest as a source of harvestable biomass for biofuels or bioproducts.
