Similar literature
 20 similar articles found
1.

Background

Understanding the taxonomic composition of a sample, whether from patient, food or environment, is important to several types of studies including pathogen diagnostics, epidemiological studies, biodiversity analysis and food quality regulation. With the decreasing costs of sequencing, metagenomic data is quickly becoming the preferred type of data for such analysis.

Results

Rapidly defining the taxonomic composition (both taxonomic profile and relative frequency) in a metagenomic sequence dataset is challenging because mapping millions of sequence reads from a metagenomic study to a non-redundant nucleotide database such as the NCBI non-redundant nucleotide database (nt) is computationally intensive. We have developed a robust subsampling-based algorithm implemented in a tool called CensuScope meant to take a ‘sneak peek’ into the population distribution and estimate taxonomic composition as if a census was taken of the metagenomic landscape. CensuScope is a rapid and accurate metagenome taxonomic profiling tool that randomly extracts a small number of reads (based on user input) and maps them to NCBI’s nt database. This process is repeated multiple times to ascertain the taxonomic composition that is found in the majority of the iterations, thereby providing a robust estimate of the population and measures of the accuracy of the results.
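The subsample-and-repeat idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CensuScope's implementation: the function name `census` is hypothetical, and each read is assumed to already carry a taxon label, whereas the real tool obtains that label by mapping the sampled reads to nt.

```python
import random
from collections import Counter

def census(read_taxa, sample_size, iterations, seed=0):
    """Estimate taxon frequencies by repeated random subsampling.

    read_taxa: one taxon label per read (assumed precomputed for this sketch;
    a real pipeline would align each sampled read against a database).
    sample_size must not exceed len(read_taxa).
    """
    rng = random.Random(seed)
    totals = Counter()   # summed per-iteration frequencies per taxon
    seen_in = Counter()  # number of iterations in which each taxon appeared
    for _ in range(iterations):
        sample = rng.sample(read_taxa, sample_size)
        for taxon, n in Counter(sample).items():
            totals[taxon] += n / sample_size
            seen_in[taxon] += 1
    # Keep taxa found in a majority of iterations; report their mean frequency.
    return {
        taxon: totals[taxon] / iterations
        for taxon in totals
        if seen_in[taxon] > iterations / 2
    }

# Toy population: 70% taxon A, 29% taxon B, 1% taxon C.
reads = ["A"] * 700 + ["B"] * 290 + ["C"] * 10
profile = census(reads, sample_size=250, iterations=50)
```

Requiring a taxon to appear in more than half of the iterations is one simple reading of "found in the majority of the iterations"; other acceptance rules are possible.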

Conclusion

CensuScope can be run on a laptop or on a high-performance computer. Based on our analysis we are able to provide some recommendations in terms of the number of sequence reads to analyze and the number of iterations to use. For example, to quantify taxonomic groups present in the sample at a level of 1% or higher, a subsampling size of 250 random reads with 50 iterations yields a statistical power of >99%. Windows and UNIX versions of CensuScope are available for download at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=censuscope. CensuScope is also available through the High-performance Integrated Virtual Environment (HIVE) and can be used in conjunction with other HIVE analysis and visualization tools.
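A back-of-the-envelope check helps to see why small subsamples can suffice. The sketch below is not the authors' power calculation; it assumes reads are drawn independently and asks only how likely it is that a taxon at a given abundance shows up at least once. The function name `detection_probability` is hypothetical.

```python
def detection_probability(abundance, sample_size, iterations=1):
    """Chance of drawing at least one read from a taxon with the given
    relative abundance, assuming independent draws (a simplification)."""
    reads_drawn = sample_size * iterations
    return 1.0 - (1.0 - abundance) ** reads_drawn

# One iteration of 250 reads for a taxon at 1% abundance: roughly 0.92.
single = detection_probability(0.01, 250)
# Pooled over 50 iterations (12,500 reads) the miss probability is negligible.
pooled = detection_probability(0.01, 250, iterations=50)
```

Under this simplified model a single iteration already detects a 1% taxon about 92% of the time, which is consistent in spirit with the recommendation quoted above, although the published figure rests on the authors' own analysis.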

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-918) contains supplementary material, which is available to authorized users.

2.

Background

When studying the genetics of a human trait, we typically have to manage both genome-wide and targeted genotype data. There can be overlap of both people and markers from different genotyping experiments; the overlap can introduce several kinds of problems. Most times the overlapping genotypes are the same, but sometimes they are different. Occasionally, the lab will return genotypes using a different allele labeling scheme (for example 1/2 vs A/C). Sometimes, the genotype for a person/marker index is unreliable or missing. Further, over time some markers are merged and bad samples are re-run under a different sample name. We need a consistent picture of the subset of data we have chosen to work with even though there might possibly be conflicting measurements from multiple data sources.
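The overlap problems listed above (different allele labels, missing calls, disagreeing calls) can be made concrete with a small sketch. It is not dbVOR's schema or algorithm: the function `merge_genotypes`, the dictionary layout and the policy of setting conflicting calls to missing are all assumptions chosen for illustration.

```python
def merge_genotypes(sources, allele_maps=None):
    """Merge genotype calls from several experiments.

    sources: list of dicts {(person, marker): (allele1, allele2) or None}.
    allele_maps: optional {marker: {"1": "A", "2": "C"}} used to translate
    numeric allele labels into nucleotide labels before comparison.
    Returns (merged, conflicts); conflicting calls are set to None.
    """
    allele_maps = allele_maps or {}
    merged, conflicts = {}, set()
    for source in sources:
        for (person, marker), genotype in source.items():
            if genotype is None:  # missing call: nothing to merge
                continue
            mapping = allele_maps.get(marker, {})
            # Normalise labels and allele order so A/C equals C/A equals 2/1.
            norm = tuple(sorted(mapping.get(a, a) for a in genotype))
            key = (person, marker)
            if key in merged and merged[key] != norm:
                conflicts.add(key)
            merged.setdefault(key, norm)
    for key in conflicts:  # assumed policy: treat disagreements as missing
        merged[key] = None
    return merged, conflicts

gwas = {("p1", "rs1"): ("A", "C"), ("p2", "rs1"): None}
targeted = {("p1", "rs1"): ("2", "1"), ("p2", "rs1"): ("1", "1"),
            ("p1", "rs2"): ("G", "G")}
rerun = {("p1", "rs2"): ("G", "T")}
merged, conflicts = merge_genotypes(
    [gwas, targeted, rerun], allele_maps={"rs1": {"1": "A", "2": "C"}})
```

Here the 1/2-coded call for p1 at rs1 agrees with the A/C call once translated, the missing call for p2 is filled from the second source, and the two disagreeing calls at rs2 are reported as a conflict.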

Results

We have developed the dbVOR database, which is designed to hold data efficiently for both genome-wide and targeted experiments. The data are indexed for fast retrieval by person and marker. In addition, we store pedigree and phenotype data for our subjects. The dbVOR database allows us to select subsets of the data by several different criteria and to merge their results into a coherent and consistent whole. Data may be filtered by: family, person, trait value, markers, chromosomes, and chromosome ranges. The results can be presented in columnar, Mega2, or PLINK format.

Conclusions

dbVOR serves our needs well. It is freely available from https://watson.hgen.pitt.edu/register. Documentation for dbVOR can be found at https://watson.hgen.pitt.edu/register/docs/dbvor.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0505-4) contains supplementary material, which is available to authorized users.

3.

Background

Exome sequencing allows researchers to study the human genome in unprecedented detail. Among the many types of variants detectable through exome sequencing, one of the most overlooked is the internal exon deletion (IED): the absence of consecutive exons in a gene. Such deletions have potentially significant biological meaning, and they are often too short to be considered copy number variation. Therefore, there is a need for efficient detection of such deletions using exome sequencing data.

Results

We present ExonDel, a tool specially designed to detect homozygous exon deletions efficiently. We tested ExonDel on exome sequencing data generated from 16 breast cancer cell lines and identified both novel and known IEDs. Subsequently, we verified our findings using RNAseq and PCR technologies. Further comparisons with multiple sequencing-based CNV tools showed that ExonDel is capable of detecting unique IEDs not found by other CNV tools.
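To illustrate what a homozygous internal exon deletion looks like in coverage data, the following sketch flags runs of internal exons with near-zero read depth that are flanked by well-covered exons. It is a simplified illustration, not ExonDel's method; the function name and the `low` and `high` depth cutoffs are hypothetical.

```python
def find_internal_exon_deletions(exon_depths, low=1.0, high=10.0):
    """Return (first, last) index ranges of consecutive internal exons whose
    mean depth is <= low while both flanking exons have depth >= high.

    exon_depths: mean read depth per exon of one gene, in transcript order.
    """
    hits = []
    n = len(exon_depths)
    i = 1  # the first and last exons are never "internal"
    while i < n - 1:
        if exon_depths[i] <= low:
            j = i
            while j + 1 < n and exon_depths[j + 1] <= low:
                j += 1  # extend the run of uncovered exons
            covered_left = exon_depths[i - 1] >= high
            covered_right = j < n - 1 and exon_depths[j + 1] >= high
            if covered_left and covered_right:
                hits.append((i, j))
            i = j + 1
        else:
            i += 1
    return hits

# Exons 2 and 3 (0-based) are uncovered while their neighbours are not.
depths = [40, 35, 0, 0.5, 42, 38]
deleted = find_internal_exon_deletions(depths)  # [(2, 3)]
```

Requiring covered flanks is what separates an internal deletion from a gene that is simply not captured or is deleted as a whole.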

Conclusions

ExonDel is an efficient way to screen for novel and known IEDs using exome sequencing data. ExonDel and its source code can be downloaded freely at https://github.com/slzhao/ExonDel.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-332) contains supplementary material, which is available to authorized users.

4.

Background

Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging.

Results

To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms.
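The idea of computing pairwise distances directly from unassembled reads can be sketched with shared k-mers. The formula used below, d = -(1/k) ln(n_shared / n_total) with n_total taken as the smaller k-mer set, is an assumption for illustration and should be checked against the AAF publication. The sketch also ignores reverse complements, sequencing-error filtering and coverage corrections.

```python
import math

def kmers(reads, k):
    """Set of all k-mers occurring in a collection of reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}

def kmer_distance(reads_a, reads_b, k):
    """Alignment-free distance from the fraction of shared k-mers."""
    set_a, set_b = kmers(reads_a, k), kmers(reads_b, k)
    shared = len(set_a & set_b)
    if shared == 0:
        return math.inf  # nothing in common at this k
    total = min(len(set_a), len(set_b))
    return -math.log(shared / total) / k

species_a = ["ACGTACGTTAGC", "TTAGCCGATAGG"]
species_b = ["ACGTACGTTAGC", "TTAGCCGTTAGG"]  # one substitution in read 2
d_ab = kmer_distance(species_a, species_b, k=5)
```

A matrix of such distances over all pairs of samples could then be passed to any distance-based tree-building method.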

Conclusion

Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users.

5.
6.

Background

Comparing and aligning genomes is a key step in analyzing closely related genomes. Despite the development of many genome aligners in the last 15 years, the problem is not yet fully resolved, even when aligning closely related bacterial genomes of the same species. In addition, no procedures are available to assess the quality of genome alignments or to compare genome aligners.

Results

We designed an original method for pairwise genome alignment, named YOC, which employs a highly sensitive similarity detection method together with a recent collinear chaining strategy that allows overlaps. YOC improves the reliability of collinear genome alignments, while preserving or even improving sensitivity. We also propose an original qualitative evaluation criterion for measuring the relevance of genome alignments. We used this criterion to compare and benchmark YOC with five recent genome aligners on large bacterial genome datasets, and showed it is suitable for identifying the specificities and the potential flaws of their underlying strategies.
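Collinear chaining, the second phase mentioned above, selects a highest-scoring subset of local similarities (anchors) that appear in the same order in both genomes. The sketch below is a textbook quadratic dynamic program that forbids overlaps, so it is deliberately simpler than the overlap-tolerant strategy YOC uses; the function name and the anchor layout are hypothetical.

```python
def best_collinear_chain(anchors):
    """Pick the heaviest chain of non-overlapping, collinear anchors.

    anchors: list of (a_start, a_end, b_start, b_end) half-open intervals,
    all on the forward strand. An anchor's weight is its length in genome A.
    """
    order = sorted(anchors)
    n = len(order)
    if n == 0:
        return []
    best = [a_end - a_start for a_start, a_end, _, _ in order]
    prev = [-1] * n
    for i in range(n):
        for j in range(i):
            # j may precede i only if it ends before i starts in both genomes
            if order[j][1] <= order[i][0] and order[j][3] <= order[i][2]:
                candidate = best[j] + (order[i][1] - order[i][0])
                if candidate > best[i]:
                    best[i], prev[i] = candidate, j
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:  # walk the back-pointers from the best chain end
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]

anchors = [(0, 100, 0, 100), (120, 200, 300, 380),
           (150, 400, 120, 370), (420, 500, 400, 480)]
chain = best_collinear_chain(anchors)
```

In this toy input the short anchor at (120, 200) is dropped because keeping the longer, collinear anchor at (150, 400) yields a heavier chain.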

Conclusions

The YOC prototype is available at https://github.com/ruricaru/YOC. It has several advantages over existing genome aligners: (1) it is based on a simplified two-phase alignment strategy, (2) it is easy to parameterize, and (3) it produces reliable genome alignments, which are easier to analyze and to use.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0530-3) contains supplementary material, which is available to authorized users.

7.
8.

Background

Searching for the orthologs of a given protein or DNA sequence is one of the most important and most commonly used bioinformatics methods in biology. Programs like BLAST or the orthology search engine Inparanoid can be used to find orthologs when the similarity between two sequences is sufficiently high. However, they fail when the level of conservation is low. The detection of remotely conserved proteins often involves sophisticated manual intervention that is difficult to automate.

Results

Here, we introduce morFeus, a search program to find remotely conserved orthologs. Based on relaxed sequence similarity searches, morFeus selects sequences based on the similarity of their alignments to the query, tests for orthology by iterative reciprocal BLAST searches and calculates a network score for the resulting network of orthologs that is a measure of orthology independent of the E-value. Detecting remotely conserved orthologs of a protein using morFeus thus requires no manual intervention. We demonstrate the performance of morFeus by comparing it to state-of-the-art orthology resources and methods. We provide an example of remotely conserved orthologs, which were experimentally shown to be functionally equivalent in the respective organisms and therefore meet the criteria of the orthology-function conjecture.
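The reciprocal search test at the core of this kind of orthology detection can be shown on precomputed scores. The sketch covers only a single reciprocal-best-hit check between two proteomes; morFeus's iterative searches, alignment-similarity selection and network score are not reproduced, and the function name and input layout are hypothetical.

```python
def reciprocal_best_hits(hits_ab, hits_ba):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A.

    hits_ab: {protein_in_A: {protein_in_B: score}} from searching A against B.
    hits_ba: the same structure for the reverse search.
    """
    def best(hits):
        return {query: max(scores, key=scores.get)
                for query, scores in hits.items() if scores}

    best_ab, best_ba = best(hits_ab), best(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Toy bit scores; identifiers are made up for the example.
a_vs_b = {"a1": {"b1": 210.0, "b2": 55.0}, "a2": {"b2": 48.0}}
b_vs_a = {"b1": {"a1": 205.0}, "b2": {"a1": 60.0, "a2": 47.0}}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)  # [("a1", "b1")]
```

The pair (a2, b2) is rejected because b2's best hit in A is a1, which is exactly the asymmetry a reciprocal test is meant to catch.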

Conclusions

Based on our results, we conclude that morFeus is a powerful and specific search method for detecting remotely conserved orthologs. morFeus is freely available at http://bio.biochem.mpg.de/morfeus/. Its source code is available from Sourceforge.net (https://sourceforge.net/p/morfeus/).

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-263) contains supplementary material, which is available to authorized users.

9.
10.

Background

While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment “gaps” – uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads. Even though there are several tools for closing gaps, they do not easily scale up to processing billion base pair genomes.

Results

Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate how it scales to successfully close 50.8 % and 13.8 % of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively – a feat that is not possible with other leading tools with the breadth of data used in our study.
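The combination of a Bloom filter and an implicit de Bruijn graph can be sketched as follows. This is a toy illustration of the general technique, not Sealer's code: the class `BloomFilter`, the function `close_gap`, the SHA-256 based hashing and the depth-first search with a length cap are all assumptions, and reverse complements and sequencing errors are ignored.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: membership tests may give false positives."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def close_gap(bloom, left_flank, right_flank, k, max_len=50):
    """Walk the implicit de Bruijn graph from the last k-mer of the left
    flank to the first k-mer of the right flank.

    Returns the bridging sequence, or None if no path of at most max_len
    bases is found. Graph edges are discovered by querying the filter.
    """
    start, goal = left_flank[-k:], right_flank[:k]
    stack = [(start, "")]
    while stack:
        kmer, path = stack.pop()
        if kmer == goal and len(path) >= k:
            return path[:-k]  # drop the k bases that belong to the right flank
        if len(path) >= max_len + k:
            continue
        for base in "ACGT":
            successor = kmer[1:] + base
            if successor in bloom:
                stack.append((successor, path + base))
    return None

# Load the k-mers of a toy "read set", then close a 4-base gap.
genome = "ACGTTGCAAGGCTTACCGATAG"
k = 5
bloom = BloomFilter()
for i in range(len(genome) - k + 1):
    bloom.add(genome[i:i + k])
filled = close_gap(bloom, genome[:8], genome[12:], k)
```

Because the filter stores only bits rather than the k-mers themselves, memory use is independent of k, which is the property that makes this representation attractive for very large genomes.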

Conclusion

Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including that of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0663-4) contains supplementary material, which is available to authorized users.

11.

Background

New treatments need to be evaluated in real-world clinical practice to account for co-morbidities, adherence and polypharmacy.

Methods

Patients with chronic obstructive pulmonary disease (COPD), ≥40 years old, with exacerbation in the previous 3 years are randomised 1:1 to once-daily fluticasone furoate 100 μg/vilanterol 25 μg in a novel dry-powder inhaler versus continuing their existing therapy. The primary endpoint is the mean annual rate of COPD exacerbations; an electronic medical record allows real-time collection and monitoring of endpoint and safety data.

Conclusions

The Salford Lung Study is the world’s first pragmatic randomised controlled trial of a pre-licensed medication in COPD.

Trial registration

ClinicalTrials.gov identifier NCT01551758.

12.
13.

Background

First pass methods based on BLAST match are commonly used as an initial step to separate the different phylogenetic histories of genes in microbial genomes, and target putative horizontal gene transfer (HGT) events. This will continue to be necessary given the rapid growth of genomic data and the technical difficulties in conducting large-scale explicit phylogenetic analyses. However, these methods often produce misleading results due to their inability to resolve indirect phylogenetic links and their vulnerability to stochastic events.

Results

A new computational method of rapid, exhaustive and genome-wide detection of HGT was developed, featuring the systematic analysis of BLAST hit distribution patterns in the context of a priori defined hierarchical evolutionary categories. Genes that fall beyond a series of statistically determined thresholds are identified as not adhering to the typical vertical history of the organisms in question, but instead having a putative horizontal origin. Tests on simulated genomic data suggest that this approach effectively targets atypically distributed genes that are highly likely to be HGT-derived, and exhibits robust performance compared to conventional BLAST-based approaches. This method was further tested on real genomic datasets, including Rickettsia genomes, and was compared to previous studies. Results show consistency with currently employed categories of HGT prediction methods. In-depth analysis of both simulated and real genomic data suggests that the method is notably insensitive to stochastic events such as gene loss, rate variation and database error, which are common challenges to the current methodology. An automated pipeline was created to implement this approach and was made publicly available at: https://github.com/DittmarLab/HGTector. The program is versatile, easily deployed, and has a low requirement for computational resources.
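The hit-distribution idea can be illustrated with a much reduced sketch: sum the BLAST scores a gene receives from a "close" group of taxa and from all other ("distal") taxa, then flag genes with unusually little close support but substantial distal support. The two-group split, the function name and the fixed cutoffs are assumptions for illustration only; the published method derives its thresholds statistically from the genome-wide distribution.

```python
def flag_hgt_candidates(gene_hits, close_taxa, close_cutoff, distal_cutoff):
    """Flag genes whose hits are weak among close relatives but strong
    among distant organisms.

    gene_hits: {gene: [(taxon, bit_score), ...]} with self hits excluded.
    close_taxa: set of taxa treated as the vertically related group.
    """
    flagged = []
    for gene, hits in gene_hits.items():
        close = sum(score for taxon, score in hits if taxon in close_taxa)
        distal = sum(score for taxon, score in hits if taxon not in close_taxa)
        if close < close_cutoff and distal >= distal_cutoff:
            flagged.append(gene)
    return sorted(flagged)

# Made-up taxa and scores, purely for demonstration.
hits = {
    "geneA": [("relative1", 300.0), ("relative2", 280.0), ("outsider1", 90.0)],
    "geneB": [("outsider1", 350.0), ("outsider2", 340.0)],
    "geneC": [("relative1", 20.0)],
}
candidates = flag_hgt_candidates(
    hits, close_taxa={"relative1", "relative2"},
    close_cutoff=100.0, distal_cutoff=200.0)  # ["geneB"]
```

Requiring strong distal support as well as weak close support keeps poorly conserved genes such as geneC from being flagged merely because they have few hits overall.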

Conclusions

HGTector is an effective tool for initial or standalone large-scale discovery of candidate HGT-derived genes.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-717) contains supplementary material, which is available to authorized users.

14.

Background

Multifactor dimensionality reduction (MDR) is widely used to analyze interactions of genes to determine the complex relationship between diseases and polymorphisms in humans. However, the astronomical number of high-order combinations makes MDR a highly time-consuming process which can be difficult to implement for multiple tests to identify more complex interactions between genes. This study proposes a new framework, named fast MDR (FMDR), which is a greedy search strategy based on the joint effect property.
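For readers unfamiliar with MDR, the scoring step that makes exhaustive search expensive can be sketched as follows: each multi-locus genotype cell is labelled high or low risk from its case/control ratio, and a SNP combination is scored by how well those labels classify the subjects. This illustrates standard MDR only, not the FMDR greedy strategy; the function name is hypothetical and the data are assumed to contain both cases and controls.

```python
from collections import defaultdict

def mdr_accuracy(genotypes, labels, snps):
    """Classification accuracy of one SNP combination under MDR labelling.

    genotypes: list of per-subject genotype tuples (codes 0, 1 or 2).
    labels: 1 for case, 0 for control (both classes must be present).
    snps: indices of the SNPs that form the combination.
    """
    cases = sum(labels)
    controls = len(labels) - cases
    threshold = cases / controls  # overall case:control ratio
    cells = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for genotype, label in zip(genotypes, labels):
        cells[tuple(genotype[i] for i in snps)][label] += 1
    correct = 0
    for n_control, n_case in cells.values():
        high_risk = n_control == 0 or n_case / n_control >= threshold
        correct += n_case if high_risk else n_control
    return correct / len(labels)

# Pure interaction: disease status is the XOR of two SNPs.
genotypes = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [a ^ b for a, b in genotypes]
pair_score = mdr_accuracy(genotypes, labels, snps=(0, 1))  # 1.0
single_score = mdr_accuracy(genotypes, labels, snps=(0,))  # 0.5
```

Evaluating this score for every combination of a given order is what grows combinatorially with the number of SNPs, which is the cost a greedy search aims to avoid.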

Results

Six models with different minor allele frequencies (MAFs) and different sample sizes were used to generate the six simulation data sets. A real data set was obtained from the mitochondrial D-loop of chronic dialysis patients. Comparison of results from the simulation data and real data sets showed that FMDR identified significant gene–gene interaction with less computational complexity than the MDR in high-order interaction analysis.

Conclusion

FMDR reduces the computational load that MDR incurs when analyzing high-order SNP combinations and can be used to evaluate the relative effects of each individual SNP on disease susceptibility. FMDR is freely available at http://bioinfo.kmu.edu.tw/FMDR.rar.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1717-8) contains supplementary material, which is available to authorized users.

15.

Background

DAVID is the most popular tool for interpreting the large lists of genes/proteins typically produced in high-throughput experiments. However, the DAVID website becomes difficult to use when analyzing multiple gene lists, because it does not provide an adequate visualization tool to show and compare multiple enrichment results in a concise and informative manner.

Result

We implemented a new R-based graphical tool, BACA (Bubble chArt to Compare Annotations), which uses the DAVID web service for cross-comparing enrichment analysis results derived from multiple large gene lists. BACA is implemented in R and is freely available at the CRAN repository (http://cran.r-project.org/web/packages/BACA/).

Conclusion

The package BACA allows R users to combine multiple annotation charts into one output graph, bypassing the DAVID website.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0477-4) contains supplementary material, which is available to authorized users.

16.

Background

Cigarette smoking is the most important risk factor for Chronic Obstructive Pulmonary Disease (COPD). Only a subgroup of smokers develops COPD and it is unclear why these individuals are more susceptible to the detrimental effects of cigarette smoking. The risk of developing COPD is known to be higher in individuals with familial aggregation of COPD. This study aimed to investigate whether acute systemic and local immune responses to cigarette smoke differ between individuals who are susceptible and those who are not susceptible to developing COPD, at both young (18-40 years) and old (40-75 years) age.

Methods

All participants smoked three cigarettes in one hour. Changes in inflammatory markers in peripheral blood (at 0 and 3 hours) and in bronchial biopsies (at 0 and 24 hours) were investigated. Acute effects of smoking were analyzed within and between susceptible and non-susceptible individuals, and by multiple regression analysis.

Results

Young susceptible individuals showed significantly higher increases in the expression of FcγRII (CD32) in its active forms (A17 and A27) on neutrophils after smoking (p = 0.016 and 0.028 respectively), independently of age, smoking status and expression of the respective markers at baseline. Smoking had no significant effect on mediators in blood or inflammatory cell counts in bronchial biopsies. In the old group, acute effects of smoking were comparable between healthy controls and COPD patients.

Conclusions

We show for the first time that COPD susceptibility at young age associates with an increased systemic innate immune response to cigarette smoking. This suggests a role of systemic inflammation in the early induction phase of COPD.

Trial registration

ClinicalTrials.gov: NCT00807469

Electronic supplementary material

The online version of this article (doi:10.1186/s12931-014-0121-2) contains supplementary material, which is available to authorized users.

17.
18.

Background

Chronic bronchitis (CB) is one of the classic phenotypes of COPD. The aims of our study were to investigate genetic variants associated with COPD subjects with CB relative to smokers with normal spirometry, and to assess for genetic differences between subjects with CB and without CB within the COPD population.

Methods

We analyzed data from current and former smokers from three cohorts: the COPDGene Study; GenKOLS (Bergen, Norway); and the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE). CB was defined as having a cough productive of phlegm on most days for at least 3 consecutive months per year for at least 2 consecutive years. CB COPD cases were defined as having both CB and at least moderate COPD based on spirometry. Our primary analysis used smokers with normal spirometry as controls; secondary analysis was performed using COPD subjects without CB as controls. Genotyping was performed on Illumina platforms; results were summarized using fixed-effect meta-analysis.
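Fixed-effect meta-analysis, as used to combine the three cohorts, is commonly implemented as an inverse-variance weighted average of per-cohort effect estimates. The sketch below shows that standard calculation on made-up numbers; it assumes log odds ratios and their standard errors as inputs, does not use the study's data, and the function name is hypothetical.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect estimate.

    betas: per-cohort effect sizes (for example log odds ratios).
    standard_errors: their standard errors.
    Returns (pooled_beta, pooled_se, z_score).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se

# Illustrative values only: three cohorts with equal precision.
beta, se, z = fixed_effect_meta([0.60, 0.70, 0.65], [0.20, 0.20, 0.20])
pooled_odds_ratio = math.exp(beta)
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is driven mainly by the most precisely measured cohort.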

Results

For CB COPD relative to smoking controls, we identified a new genome-wide significant locus on chromosome 11p15.5 (rs34391416, OR = 1.93, P = 4.99 × 10⁻⁸) as well as significant associations of known COPD SNPs within FAM13A. In addition, a GWAS of CB relative to those without CB within COPD subjects showed suggestive evidence for association on 1q23.3 (rs114931935, OR = 1.88, P = 4.99 × 10⁻⁷).

Conclusions

We found genome-wide significant associations with CB COPD on 4q22.1 (FAM13A) and 11p15.5 (EFCAB4A, CHID1 and AP2A2), and a locus associated with CB within COPD subjects on 1q23.3 (RPL31P11 and ATF6). This study provides further evidence that genetic variants may contribute to phenotypic heterogeneity of COPD.

Trial registration

ClinicalTrials.gov NCT00608764, NCT00292552

Electronic supplementary material

The online version of this article (doi:10.1186/s12931-014-0113-2) contains supplementary material, which is available to authorized users.

19.
20.

Background

Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis.

Results

To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to the continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system, MendeLIMS, is easily implemented with open source tools and is highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies.

Conclusions

We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-290) contains supplementary material, which is available to authorized users.
