Similar articles
1.
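欧竑宇 (Ou Hongyu) 《微生物学通报》 (Microbiology China) 2013,40(10):1909-1919
With advances in DNA sequencing technology, 12 Streptomyces genomes have been sequenced to date. Faced with this mass of omics data, bioinformatics methods are urgently needed to mine these important microbial resources deeply and at scale, so that the mining of Streptomyces resources and the release of their metabolic potential can reinforce each other. Focusing on two common problems in the comparative analysis of Streptomyces genomes, namely the identification and functional analysis of strain-specific genomic islands and of secondary metabolite biosynthetic gene clusters, this article collects some commonly used, recently developed bioinformatics tools and secondary databases. Their use is briefly illustrated with three examples: delineating the core region and the two arms of the Streptomyces chromosome, identifying genomic islands in Streptomyces coelicolor and Streptomyces lividans, and identifying the megaplasmid of Streptomyces cattleya. In addition, we briefly describe our group's attempts to identify actinomycete-type integrative and conjugative elements and to develop a new tool for predicting thiopeptide biosynthetic gene clusters. Bioinformatics tools and secondary databases play an important role in the comparative analysis of Streptomyces genomes: they can quickly focus research on the mobile genetic elements and secondary metabolite biosynthetic gene clusters of a given strain, help determine the corresponding strain-specific phenotypes, and help elucidate the biosynthesis and regulatory mechanisms of novel compounds.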

2.
Next‐generation sequencing allows access to a large quantity of genomic data. In plants, several studies have used whole chloroplast genome sequences for inferring phylogeography or phylogeny. Even though the chloroplast is a haploid organelle, NGS plastome data identified a nonnegligible number of intra‐individual polymorphic SNPs. Such observations could have several causes, such as sequencing errors, the presence of heteroplasmy or the transfer of chloroplast sequences into the nuclear and mitochondrial genomes. The occurrence of allelic diversity has important practical implications for the identification of diversity, for the analysis of chloroplast data and, beyond that, for significant evolutionary questions. In this study, we show that the observed intra‐individual polymorphism of chloroplast sequence data is probably the result of plastid DNA transferred into the mitochondrial and/or the nuclear genomes. We further assess the error rates of nine different bioinformatics pipelines for SNP and genotype calling, using SNPs identified by Sanger sequencing. Specific pipelines are adequate to deal with this issue, optimizing both specificity and sensitivity. Our results will allow proper use of whole chloroplast NGS sequences and better handling of NGS chloroplast sequence diversity.

3.
4.
5.
Whereas genomics describes the study of the genome, mainly represented by its gene expression on the DNA or RNA level, the term proteomics denotes the study of the proteome, which is the protein complement encoded by the genome. In recent years, the number of proteomic experiments has increased tremendously. While all fields of proteomics have made major technological advances, the biggest step was seen in bioinformatics. Biological information management relies on sequence and structure databases and powerful software tools to translate experimental results into meaningful biological hypotheses and answers. In this resource article, I provide a collection of databases and software available on the Internet that are useful to interpret genomic and proteomic data. The article is a toolbox for researchers who have genomic or proteomic datasets and need to put their findings into a biological context.

6.
Heterodera glycines, the soybean cyst nematode (SCN), is a damaging agricultural pest that could be effectively managed if critical phenotypes, such as virulence and host range, could be understood. While SCN is amenable to genetic analysis, lack of DNA sequence data prevents the use of such methods to study this pathogen. Fortunately, new methods of DNA sequencing that produce large amounts of data and permit whole genome comparative analyses have become available. In this study, 400 million bases of genomic DNA sequence were collected from two inbred biotypes of SCN using 454 micro-bead DNA sequencing. Comparisons to a BAC, sequenced by Sanger sequencing, showed that the micro-bead sequences could identify low and high copy number regions within the BAC. Potential single nucleotide polymorphisms (SNPs) between the two SCN biotypes were identified by comparing the two sets of sequences. Selected resequencing revealed that up to 84% of the SNPs were correct. We conclude that the quality of the micro-bead sequence data was sufficient for de novo SNP identification and should be applicable to organisms with similar genome sizes and complexities. The SNPs identified will be an important starting point in associating phenotypes with specific regions of the SCN genome.

7.
Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome data sets from natural populations of this species have been published in recent years. A major challenge is the integration of disparate data sets, often generated using different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic variant caller (SNAPE-pooled). We use this pipeline to generate the largest data repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in >20 countries on four continents. Several of these locations have been sampled at different seasons across multiple years. This data set, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental metadata. A web-based genome browser and web portal provide easy access to the SNP data set. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource which can be easily extended via future efforts for an even more extensive cosmopolitan data set. Our resource will enable population geneticists to analyze spatiotemporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.
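As a rough illustration of what a heuristic allele-frequency estimate from Pool-Seq read counts can look like, the sketch below divides alternative-allele reads by total depth and applies simple cutoffs. It is not the PoolSNP implementation: the `PooledSite` type, the function name and all three thresholds are hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PooledSite:
    """Read counts at one biallelic site in one pooled sample (hypothetical type)."""
    ref_count: int
    alt_count: int


def alt_allele_frequency(
    site: PooledSite,
    min_depth: int = 10,         # assumed cutoff, not taken from the abstract
    min_alt_count: int = 2,      # assumed cutoff, not taken from the abstract
    min_alt_freq: float = 0.01,  # assumed cutoff, not taken from the abstract
) -> Optional[float]:
    """Return the alternative-allele frequency, or None if coverage is too low.

    Sites whose alternative allele fails the count or frequency cutoff are
    treated as monomorphic (frequency 0.0), on the assumption that such
    rare observations are more likely sequencing errors than real variants.
    """
    depth = site.ref_count + site.alt_count
    if depth < min_depth:
        return None
    freq = site.alt_count / depth
    if site.alt_count < min_alt_count or freq < min_alt_freq:
        return 0.0
    return freq
```

For example, a site with 90 reference and 10 alternative reads yields a frequency of 0.1, while a site covered by only six reads is reported as missing rather than as a frequency.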

8.
RRE allows the extraction of non-coding regions surrounding a coding sequence [i.e. gene upstream region, 5'-untranslated region (5'-UTR), introns, 3'-UTR, downstream region] from annotated genomic datasets available at NCBI. AVAILABILITY: RRE parser and web-based interface are accessible at http://www.bioinformatica.unito.it/bioinformatics/rre/rre.html

9.
10.
• Premise of the study: Genome survey sequences (GSS) from massively parallel sequencing have potential to provide large, cost-effective data sets for phylogenetic inference, replace single gene or spacer regions as DNA barcodes, and provide a plethora of data for other comparative molecular evolution studies. Here we report on the application of this method to estimating the molecular phylogeny of core Asparagales, investigating plastid gene losses, assembling complete plastid genomes, and determining the type and quality of assembled genomic data attainable from Illumina 80-120-bp reads. • Methods: We sequenced total genomic DNA from samples in two lineages of monocotyledonous plants, Poaceae and Asparagales, on the Illumina platform in a multiplex arrangement. We compared reference-based assemblies to de novo contigs, evaluated consistency of assemblies resulting from use of various reference sequences, and assessed our methods to obtain sequence assemblies in nonmodel taxa. • Key results: Our method returned reliable, robust organellar and nrDNA sequences in a variety of plant lineages. High quality assemblies are not dependent on genome size, amount of plastid present in the total genomic DNA template, or relatedness of available reference sequences for assembly. Phylogenetic results revealed familial and subfamilial relationships within Asparagales with high bootstrap support, although the monotypic genus Aphyllanthes was placed with only moderate confidence. • Conclusions: The well-supported molecular phylogeny provides evidence for delineation of subfamilies within core Asparagales. With advances in technology and bioinformatics tools, the use of massively parallel sequencing will continue to become easier and more affordable for phylogenomic and molecular evolutionary biology investigations.

11.
Reassociation kinetics and flow cytometry data indicate that ixodid tick genomes are large, relative to most arthropods, containing ≥10^9 base pairs. The molecular basis for this is unknown. We have identified a novel small interspersed element with features of a tRNA-derived SINE, designated Ruka, in genomic sequences of Rhipicephalus appendiculatus and Boophilus (Rhipicephalus) microplus ticks. The SINE was also identified in expressed sequence tag (EST) databases derived from several tissues in four species of ixodid ticks, namely R. appendiculatus, B. (R.) microplus, Amblyomma variegatum and also the more distantly related Ixodes scapularis. Secondary structure predictions indicated that Ruka could adopt a tRNA structure that was, atypically, most similar to a serine tRNA. By extrapolation the frequency of occurrence in the randomly selected BAC clone sequences is consistent with approximately 65,000 copies of Ruka in the R. appendiculatus genome. Real time PCR analyses on genomic DNA indicate copy numbers for specific Ruka subsets between 5800 and 38,000. Several putative conserved Ruka insertion sites were identified in EST sequences of three ixodid tick species based on the flanking sequences associated with the SINEs, indicating that some Ruka transpositions probably occurred prior to speciation within the metastriate division of the Ixodidae. The data strongly suggest that Class I transposable elements form a significant component of tick genomes and may partially account for the large genome sizes observed.
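The extrapolation mentioned in this abstract can be written as a simple density scaling. This is only a sketch of the arithmetic: the symbols $n_{\text{BAC}}$ (Ruka copies observed in the sampled BAC sequence), $L_{\text{BAC}}$ (total length of that sequence) and $G$ (genome size) are introduced here for illustration, and the abstract reports neither $n_{\text{BAC}}$ nor $L_{\text{BAC}}$.

```latex
\[
  N_{\text{Ruka}} \;\approx\; n_{\text{BAC}} \cdot \frac{G}{L_{\text{BAC}}}
\]
```

If $G$ is taken as $10^9$ bp, which is an assumption since the abstract gives only a lower bound, the reported estimate of about 65,000 copies corresponds to roughly one copy per 15 kb ($10^9 / 65{,}000 \approx 15{,}400$ bp).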

12.
Open source and open data have been driving forces in bioinformatics in the past. However, privacy concerns may soon change the landscape, limiting future access to important data sets, including personal genomics data. Here we survey this situation in some detail, describing, in particular, how the large scale of the data from personal genomic sequencing makes it especially hard to share data, exacerbating the privacy problem. We also go over various aspects of genomic privacy: first, there is basic identifiability of subjects having their genome sequenced. However, even for individuals who have consented to be identified, there is the prospect of very detailed future characterization of their genotype, which, unanticipated at the time of their consent, may be more personal and invasive than the release of their medical records. We go over various computational strategies for dealing with the issue of genomic privacy. One can "slice" and reformat datasets to allow them to be partially shared while securing the most private variants. This is particularly applicable to functional genomics information, which can be largely processed without variant information. For handling the most private data there are a number of legal and technological approaches: for example, modifying the informed consent procedure to acknowledge that privacy cannot be guaranteed, and/or employing a secure cloud computing environment. Cloud computing in particular may allow access to the data in a more controlled fashion than the current practice of downloading and computing on large datasets. Furthermore, it may be particularly advantageous for small labs, given that the burden of many privacy issues falls disproportionately on them in comparison to large corporations and genome centers. Finally, we discuss how education of future genetics researchers will be important, with curriculums emphasizing privacy and data security. However, teaching personal genomics with identifiable subjects in the university setting will, in turn, create additional privacy issues and social conundrums.

13.
14.
15.
Next-generation sequencing technologies are revolutionizing the field of phylogenetics by making available genome scale data for a fraction of the cost of traditional targeted sequencing. One challenge will be to make use of these genomic level data without necessarily resorting to full-scale genome assembly and annotation, which is often time and labor intensive. Here we describe a technique, the Target Restricted Assembly Method (TRAM), in which the typical process of genome assembly and annotation is in essence reversed. Protein sequences of phylogenetically useful genes from a species within the group of interest are used as targets in tblastn searches of a data set from a lane of Illumina reads for a related species. Resulting blast hits are then assembled locally into contigs and these contigs are then aligned against the reference “cDNA” sequence to remove portions of the sequences that include introns. We illustrate the Target Restricted Assembly Method using genomic scale datasets for 20 species of lice (Insecta: Psocodea) to produce a test phylogenetic data set of 10 nuclear protein coding gene sequences. Given the advantages of using DNA instead of RNA, this technique is very cost effective and feasible given current technologies.

16.
The EpiGRAPH web service enables biologists to uncover hidden associations in vertebrate genome and epigenome datasets. Users can upload sets of genomic regions and EpiGRAPH will test multiple attributes (including DNA sequence, chromatin structure, epigenetic modifications and evolutionary conservation) for enrichment or depletion among these regions. Furthermore, EpiGRAPH learns to predictively identify similar genomic regions. This paper demonstrates EpiGRAPH's practical utility in a case study on monoallelic gene expression and describes its novel approach to reproducible bioinformatic analysis.

17.
Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hitherto hidden and under-studied parts of the genome.

18.
DNA sequencing: bench to bedside and beyond   Total citations: 4 (self-citations: 1, citations by others: 3)
Fifteen years elapsed between the discovery of the double helix (1953) and the first DNA sequencing (1968). Modern DNA sequencing began in 1977, with development of the chemical method of Maxam and Gilbert and the dideoxy method of Sanger, Nicklen and Coulson, and with the first complete DNA sequence (phage ϕX174), which demonstrated that sequence could give profound insights into genetic organization. Incremental improvements allowed sequencing of molecules >200 kb (human cytomegalovirus) leading to an avalanche of data that demanded computational analysis and spawned the field of bioinformatics. The US Human Genome Project spurred sequencing activity. By 1992 the first ‘sequencing factory’ was established, and others soon followed. The first complete cellular genome sequences, from bacteria, appeared in 1995 and other eubacterial, archaebacterial and eukaryotic genomes were soon sequenced. Competition between the public Human Genome Project and Celera Genomics produced working drafts of the human genome sequence, published in 2001, but refinement and analysis of the human genome sequence will continue for the foreseeable future. New ‘massively parallel’ sequencing methods are greatly increasing sequencing capacity, but further innovations are needed to achieve the ‘thousand dollar genome’ that many feel is prerequisite to personalized genomic medicine. These advances will also allow new approaches to a variety of problems in biology, evolution and the environment.

19.
Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

20.
KS Lee  RN Kim  BH Yoon  DS Kim  SH Choi  DW Kim  SH Nam  A Kim  A Kang  KH Park  JE Jung  SH Chae  HS Park 《Bioinformation》2012,8(11):532-534
Recently, next generation sequencing (NGS) technologies have led to a revolutionary increase in sequencing speed and cost-efficacy. Consequently, a vast number of contigs from many recently sequenced bacterial genomes remain to be accurately mapped and annotated, requiring the development of more convenient bioinformatics programs. In this paper, we present a newly developed web-based bioinformatics program, Bacterial Genome Mapper, which is suitable for mapping and annotating contigs that have been assembled from bacterial genome sequence raw data. By constructing a multiple alignment map between target contig sequences and two reference bacterial genome sequences, this program also provides very useful comparative genomics analysis of draft bacterial genomes. AVAILABILITY: The database is available for free at http://mbgm.kribb.re.kr.
