Similar articles
1.
Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, represented by chromosome, position and alleles, are encoded into 32 bits, half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately within the same archive, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples on germline and somatic variants to document how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license.
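To make the idea of a 32-bit variant encoding concrete, here is a minimal Python sketch. The bit layout is an assumption chosen for illustration (20 bits of position within a chunk, 2 bits each for the reference and alternate allele lengths, 8 bits for up to four packed bases); it is not echtvar's actual format. The names `encode_variant` and `decode_variant` are hypothetical, and the chromosome is assumed to be tracked outside the code, for example one table per chromosome.

```python
BASES = "ACGT"      # 2 bits per base
CHUNK_BITS = 20     # assumed: positions are grouped into chunks of 2**20 bases


def encode_variant(pos, ref, alt):
    """Pack a short variant into (chunk, 32-bit code); None if it does not fit.

    Hypothetical layout, high to low bits:
    [20 bits offset in chunk][2 bits len(ref)][2 bits len(alt)][8 bits bases]
    """
    alleles = ref + alt
    fits = 1 <= len(ref) <= 3 and 1 <= len(alt) <= 3 and len(alleles) <= 4
    if not fits or any(base not in BASES for base in alleles):
        return None  # long or ambiguous alleles would need a side table
    packed = 0
    for base in alleles:
        packed = (packed << 2) | BASES.index(base)
    offset = pos & ((1 << CHUNK_BITS) - 1)
    code = (offset << 12) | (len(ref) << 10) | (len(alt) << 8) | packed
    return pos >> CHUNK_BITS, code


def decode_variant(chunk, code):
    """Invert encode_variant, returning (pos, ref, alt)."""
    offset = code >> 12
    ref_len = (code >> 10) & 3
    alt_len = (code >> 8) & 3
    packed = code & 0xFF
    n = ref_len + alt_len
    alleles = "".join(
        BASES[(packed >> (2 * (n - 1 - i))) & 3] for i in range(n)
    )
    return (chunk << CHUNK_BITS) | offset, alleles[:ref_len], alleles[ref_len:]
```

In this sketch, variants whose alleles do not fit return `None`, which mirrors the abstract's statement that most, not all, variants fit into 32 bits.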

2.

Background

Brassica rapa is an economically important crop species. During its long breeding history, a large number of morphotypes have been generated, including leafy vegetables such as Chinese cabbage and pakchoi, turnip tuber crops and oil crops.

Results

To investigate the genetic variation underlying this morphological variation, we re-sequenced, assembled and annotated the genomes of two B. rapa subspecies, a turnip crop (turnip) and a rapid-cycling genotype. We then analysed the two resulting genomes together with the Chinese cabbage Chiifu reference genome to obtain an impression of the B. rapa pan-genome. The number of genes with protein-coding changes between the three genotypes was lower than that among different accessions of Arabidopsis thaliana, which can be explained by the smaller effective population size of B. rapa due to its domestication. Based on orthology to a number of non-brassica species, we estimated the date of divergence among the three B. rapa morphotypes at approximately 250,000 YA, far predating Brassica domestication (5,000-10,000 YA).

Conclusions

By analysing genes unique to turnip we found evidence for copy number differences in peroxidases, pointing to a role for the phenylpropanoid biosynthesis pathway in the generation of morphological variation. The estimated date of divergence among three B. rapa morphotypes implies that prior to domestication there was already considerable divergence among B. rapa genotypes. Our study thus provides two new B. rapa reference genomes, delivers a set of computer tools to analyse the resulting pan-genome and uses these to shed light on genetic drivers behind the rich morphological variation found in B. rapa.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-250) contains supplementary material, which is available to authorized users.

3.
Despite the great potential of single nucleotide polymorphism (SNP) markers in evolutionary studies, in particular for inferring population genetic parameters, SNP analysis has almost exclusively been limited to humans and ‘genomic model’ organisms, due to the lack of available sequence data in non-model organisms. Here, we describe a rapid and cost-effective method to isolate candidate SNPs in non-model organisms. This SNP isolation strategy consists essentially of the direct sequencing of amplified fragment length polymorphism bands. In a first application of this method, 10 unique DNA fragments that contained 24 SNPs were discovered in 11.11 kb of sequenced genomic DNA of a non-model species, the brown trout (Salmo trutta).

4.
Sequencing of microbial genomes is important because microbes carry antibiotic and pathogenic activities. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still in progress. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). Draft genomes from ongoing genome projects, as contigs or scaffolds, can be submitted to our Web server, which provides functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project.

5.
Although the rhesus macaque (Macaca mulatta) is commonly used for biomedical research and is becoming a preferred model for translational medicine, quantification of genome-wide variation has been slow to follow the publication of the genome in 2007. Here we report the properties of 4040 single nucleotide polymorphisms discovered and validated in Chinese and Indian rhesus macaques from captive breeding colonies in the United States. Frequency-matched measures of linkage disequilibrium were much greater in the Indian sample. Although the majority of polymorphisms were shared between the two populations, rare alleles were over twice as common in the Chinese sample. Indian rhesus had higher rates of heterozygosity, as well as previously undetected substructure, potentially due to admixture from Burma in wild populations and demographic events post-captivity.

6.
A submersible microbial fuel cell (SBMFC) was developed as a biosensor for in situ and real-time monitoring of dissolved oxygen (DO) in environmental waters. Domestic wastewater was utilized as the sole fuel for powering the sensor. The sensor performance was first examined with tap water at varying DO levels. With an external resistance of 1000 Ω, the current density produced by the sensor (5.6 ± 0.5 to 462.2 ± 0.5 mA/m²) increased linearly with DO level up to 8.8 ± 0.3 mg/L (regression coefficient, R² = 0.9912), while the maximum response time for each measurement was less than 4 min. The current density showed a different response to DO levels when different external resistances were applied, but a linear relationship was always observed. Investigation of the sensor performance at different substrate concentrations indicates that the organic matter contained in the domestic wastewater was sufficient to power the sensing activities. The sensor's ability was further explored under different environmental conditions (e.g. pH, temperature, conductivity, and alternative electron acceptor), and the results indicated that a calibration would be required before field application. Lastly, the sensor was tested with different environmental waters and the results showed no significant difference (p > 0.05) from those measured by a DO meter. The simple, compact SBMFC sensor showed promising potential for direct, inexpensive and rapid DO monitoring in various environmental waters.
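The linear relationship between current density and DO suggests a simple calibration workflow: fit a line to known DO levels, then invert it to read DO from a measured current density. The Python sketch below illustrates this with ordinary least squares. The function names `fit_line` and `do_from_current` are hypothetical, and any calibration points passed in would have to come from the user's own measurements; none are taken from the study.

```python
def fit_line(do_levels, current_densities):
    """Least-squares fit of current density = slope * DO + intercept."""
    n = len(do_levels)
    if n < 2 or n != len(current_densities):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(do_levels) / n
    mean_y = sum(current_densities) / n
    sxx = sum((x - mean_x) ** 2 for x in do_levels)
    if sxx == 0:
        raise ValueError("calibration DO levels must not all be equal")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(do_levels, current_densities)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def do_from_current(current_density, slope, intercept):
    """Invert the calibration line to estimate DO (mg/L) from mA/m²."""
    return (current_density - intercept) / slope
```

Because the abstract reports that the response changes with external resistance and environmental conditions, a fit like this would presumably need to be repeated for each deployment setting.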

8.
SUMMARY: Genomic Analysis and Rapid Biological ANnotation (GARBAN) is a new tool that provides an integrated framework to simultaneously analyze and compare multiple data sets derived from microarray or proteomic experiments. It carries out automated classifications of genes or proteins according to the criteria of the Gene Ontology Consortium at a level of depth defined by the user. Additionally, it performs clustering analysis of all sets based on functional categories or on differential expression levels. GARBAN also provides graphical representations of the biological pathways in which all the genes/proteins participate. AVAILABILITY: http://garban.tecnun.es.

9.
Research projects often involve determining the relative position of genomic coordinates, intervals, single nucleotide variations (SNVs), insertions, deletions and translocations with respect to genes and their potential impact on protein translation. Due to the tremendous increase in throughput brought by the use of next-generation sequencing, investigators are routinely faced with the need to annotate very large datasets. We present Segtor, a tool to annotate large sets of genomic coordinates, intervals, SNVs, indels and translocations. Rather than storing genomic features in a database management system, our tool uses segment trees built from the start and end coordinates of the features the user wishes to use. The software also produces annotation statistics to allow users to visualize how many coordinates were found within various portions of genes. Our system can currently be made to work with any species available on the UCSC Genome Browser. Segtor is a suitable tool for groups, especially those with limited access to programmers or with an interest in analyzing large numbers of individual genomes, who wish to determine the relative position of very large sets of mapped reads and subsequently annotate observed mutations between the reads and the reference. Segtor (http://lbbc.inca.gov.br/segtor/) is an open-source tool that can be freely downloaded for non-profit use. We also provide a web interface for testing purposes.
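As an illustration of how a segment tree built from feature start and end coordinates can answer "which features contain this position?", here is a compact Python sketch. It is not Segtor's implementation. The class name `FeatureSegmentTree`, the half-open `[start, end)` coordinate convention, and the feature names are all assumptions made for the example.

```python
from bisect import bisect_right


class FeatureSegmentTree:
    """Segment tree over the elementary intervals between feature endpoints."""

    def __init__(self, features):
        # features: iterable of (start, end, name), half-open [start, end)
        features = list(features)
        self.xs = sorted({c for s, e, _ in features for c in (s, e)})
        self.n = max(len(self.xs) - 1, 1)          # number of leaves
        self.nodes = [[] for _ in range(2 * self.n)]
        index = {x: i for i, x in enumerate(self.xs)}
        for start, end, name in features:
            # Store the name on O(log n) nodes that exactly cover the feature.
            lo, hi = index[start] + self.n, index[end] + self.n
            while lo < hi:
                if lo & 1:
                    self.nodes[lo].append(name)
                    lo += 1
                if hi & 1:
                    hi -= 1
                    self.nodes[hi].append(name)
                lo >>= 1
                hi >>= 1

    def query(self, pos):
        """Return the sorted names of all features containing pos."""
        leaf = bisect_right(self.xs, pos) - 1
        if leaf < 0 or leaf >= len(self.xs) - 1:
            return []                               # outside every feature
        node = leaf + self.n
        hits = []
        while node >= 1:                            # walk from leaf to root
            hits.extend(self.nodes[node])
            node >>= 1
        return sorted(hits)
```

Each lookup costs one binary search plus a walk up the tree, so annotating a large set of coordinates needs no database round trips, which is the trade-off the abstract describes.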

10.
BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers who need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.

11.
For many years, the regulation of protein structure and function by phosphorylation and dephosphorylation was considered a relatively recent invention that arose independently in each phylogenetic domain. Over time, however, incidents of apparent domain trespass involving the presence of 'eukaryotic' protein kinases or protein phosphatases in prokaryotic organisms were reported with increasing frequency. Today, genomics has provided the means to examine the phylogenetic distribution of 'eukaryotic' protein kinases and protein phosphatases in a comprehensive and systematic manner. The results of these genome searches challenge previous conceptions concerning the origins and evolution of this versatile regulatory mechanism.

12.
SUMMARY: The interpretation of genome-wide association results is confounded by linkage disequilibrium between nearby alleles. We have developed a flexible bioinformatics query tool for single-nucleotide polymorphisms (SNPs) to identify and to annotate nearby SNPs in linkage disequilibrium (proxies) based on HapMap. By offering functionality to generate graphical plots for these data, the SNAP server will facilitate interpretation and comparison of genome-wide association study results, and the design of fine-mapping experiments (by delineating genomic regions harboring associated variants and their proxies). AVAILABILITY: SNAP server is available at http://www.broad.mit.edu/mpg/snap/.
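Proxy SNPs are usually defined by a pairwise linkage disequilibrium measure such as r². The Python sketch below computes r² between two biallelic SNPs from phased haplotypes using the standard definition, r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB. It is a generic illustration and does not describe how the SNAP server computes or stores its values. The function name `ld_r_squared` and the 0/1 haplotype coding are assumptions.

```python
def ld_r_squared(haplotypes):
    """r² between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per phased chromosome, where each
    value is 1 if the chromosome carries the chosen allele and 0 otherwise.
    Returns None when either SNP is monomorphic (r² is undefined).
    """
    n = len(haplotypes)
    if n == 0:
        return None
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b          # disequilibrium coefficient D
    return d * d / denom
```

A proxy search would then keep nearby SNPs whose r² with the query SNP exceeds a user-chosen threshold.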

13.
The huge variation in the genomic guanine plus cytosine content (GC%) among prokaryotes has been explained by two mutually exclusive hypotheses, namely, selectionist and neutralist. The former proposals have in common the assumption that this feature is a form of adaptation to some ecological or physiological condition. On the other hand, the neutralist interpretation states that the variations are due only to different mutational biases. Since all of the traits that have been proposed by the selectionists either appeared to be limited to certain genera or were invalidated by the availability of more data, they cannot be considered as a selective force influencing the genomic GC% across all prokaryotes. In this report we show that aerobic prokaryotes display a significant increment in genomic GC% in relation to anaerobic ones. This is the first time that a link between a metabolic character and GC% has been found, independently of phylogenetic relationships and with a statistically significant amount of data.

14.
MOTIVATION: Recombination can be a prevailing driver in shaping genome evolution. RAT (Recombination Analysis Tool) is a Java-based tool for investigating recombination events in any number of aligned sequences (protein or DNA) of any length (short viral sequences to full genomes). It is an uncomplicated and intuitive application and allows the user to view only the regions of sequence alignments they are interested in. RESULTS: RAT was applied to viral sequences. Its utility was demonstrated through the detection of a known recombinant of HIV and a detailed analysis of Noroviruses, the most common cause of viral gastroenteritis in humans. AVAILABILITY: RAT, along with a user's guide, is freely available from http://jic-bioinfo.bbsrc.ac.uk/bioinformatics-research/staff/graham_etherington/RAT.htm.

15.
IslandPath: aiding detection of genomic islands in prokaryotes (total citations: 11; self-citations: 0; citations by others: 11)
Genomic islands (clusters of genes of potential horizontal origin in a prokaryotic genome) are frequently associated with a particular adaptation of a microbe that is of medical, agricultural or environmental importance, such as antibiotic resistance, pathogen virulence, or metal resistance. While many sequence features associated with such islands have been adopted separately in applications for analysis of genomic islands, including pathogenicity islands, there is no single application that integrates multiple features for island detection. IslandPath is a network service which incorporates multiple DNA signals and genome annotation features into a graphical display of a bacterial or archaeal genome, to aid the detection of genomic islands. AVAILABILITY: This application is available at http://www.pathogenomics.sfu.ca/islandpath and the source code is freely available, under GNU public licence, from the authors. SUPPLEMENTARY INFORMATION: An online help file, which includes analyses of the utility of IslandPath, can be found at http://www.pathogenomics.sfu.ca/islandpath/current/islandhelp.html

19.
Gene, 1997, 194(2): 273-276
This report describes the amplification of upstream genomic sequences using the polymerase chain reaction (PCR) based solely on downstream DNA information from a cDNA clone. In this novel and rapid technique, genomic DNA (gDNA) is first incubated with a restriction enzyme that recognizes a site within the 5′ end of a gene, followed by denaturation and polyadenylation of its free 3′ ends with terminal transferase. The modified gDNA is then used as template for PCR using a gene-specific primer complementary to a sequence in the 3′ end of its cDNA and an anchored deoxyoligothymidine primer. A second round of PCR is then performed with a second, nested gene-specific primer and the anchor sequence primer. The resulting PCR product is cloned and its sequence determined. Three independent plant genomic clones were isolated using this method that exhibited complete sequence identity to their cDNAs and to the primers used in the amplification.
