Similar literature
 20 similar articles found (search time: 15 ms)
1.

Background

The goal of haplotype assembly is to infer haplotypes of an individual from a mixture of sequenced chromosome fragments. Limited lengths of paired-end sequencing reads and inserts render haplotype assembly computationally challenging; in fact, most of the problem formulations are known to be NP-hard. Dimensions (and, therefore, difficulty) of the haplotype assembly problems keep increasing as the sequencing technology advances and the lengths of reads and inserts grow. The computational challenges are even more pronounced in the case of polyploid haplotypes, whose assembly is considerably more difficult than in the case of diploids. Fast, accurate, and scalable methods for haplotype assembly of diploid and polyploid organisms are needed.

Results

We develop a novel framework for diploid/polyploid haplotype assembly from high-throughput sequencing data. The method formulates the haplotype assembly problem as a semi-definite program and exploits its special structure – namely, the low rank of the underlying solution – to solve it rapidly and with high accuracy. The developed framework is applicable to both diploid and polyploid species. The code for SDhaP is freely available at https://sourceforge.net/projects/sdhap.

Conclusion

Extensive benchmarking tests on both real and simulated data show that the proposed algorithms outperform several well-known haplotype assembly methods in terms of either accuracy or speed or both. Useful recommendations for coverages needed to achieve near-optimal solutions are also provided.

2.

Motivation

16S rDNA hypervariable tag sequencing has become the de facto method for accessing microbial diversity. Illumina paired-end sequencing, which produces two separate reads for each DNA fragment, has become the platform of choice for this application. However, when the two reads do not overlap, existing computational pipelines analyze data from each read separately and underutilize the information contained in the paired-end reads.

Results

We created a workflow known as Illinois Mayo Taxon Organization from RNA Dataset Operations (IM-TORNADO) for processing non-overlapping reads while retaining maximal information content. Using synthetic mock datasets, we show that the use of both reads produced answers with greater correlation to those from full length 16S rDNA when looking at taxonomy, phylogeny, and beta-diversity.

Availability and Implementation

IM-TORNADO is freely available at http://sourceforge.net/projects/imtornado and produces BIOM format output for cross compatibility with other pipelines such as QIIME, mothur, and phyloseq.

3.

Background

By examining the genotype calls generated by the 1000 Genomes Project we discovered that the human reference genome GRCh37 contains almost 20,000 loci in which the reference allele has never been observed in healthy individuals and around 70,000 loci in which it has been observed only in the heterozygous state.

Results

We show that a large fraction of these rare reference allele (RRA) loci belong to coding, functional and regulatory elements of the genome and could be linked to rare Mendelian disorders as well as cancer. We also demonstrate that classical germline and somatic variant calling tools are not capable of recognizing the rare allele when present in these loci. To overcome such limitations, we developed a novel tool, named RAREVATOR, that is able to identify and call the rare allele in these genomic positions. By using a small cancer dataset we compared our tool with two state-of-the-art callers and we found that RAREVATOR identified more than 1,500 germline and 22 somatic RRA variants that were missed by the two methods and that belong to significantly mutated pathways.

Conclusions

These results show that, to date, the investigation of around 100,000 loci of the human genome has been missed by re-sequencing experiments based on the GRCh37 assembly and that our tool can fill the gap left by other methods. Moreover, the investigation of the latest version of the human reference genome, GRCh38, showed that although the GRC corrected almost all insertions and a small part of SNVs and deletions, a large number of functionally relevant RRAs still remain unchanged. For this reason, future resequencing experiments based on GRCh38 will also benefit from RAREVATOR analysis results. RAREVATOR is freely available at http://sourceforge.net/projects/rarevator.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1481-9) contains supplementary material, which is available to authorized users.

4.

Background

Next generation sequencing platforms have greatly reduced sequencing costs, leading to the production of unprecedented amounts of sequence data. BWA is one of the most popular alignment tools due to its relatively high accuracy. However, mapping reads using BWA is still the most time consuming step in sequence analysis. Increasing mapping efficiency would allow the community to better cope with ever expanding volumes of sequence data.

Results

We designed a new program, CGAP-align, that achieves a performance improvement over BWA without sacrificing recall or precision. This is accomplished through the use of Suffix Tarray, a novel data structure combining elements of Suffix Array and Suffix Tree. We also utilize a tighter lower bound estimation for the number of mismatches in a read, allowing for more effective pruning during inexact mapping. Evaluation of both simulated and real data suggests that CGAP-align consistently outperforms the current version of BWA and can achieve over twice its speed under certain conditions, all while obtaining nearly identical results.

Conclusion

CGAP-align is a new time efficient read alignment tool that extends and improves BWA. The increase in alignment speed will be of critical assistance to all sequence-based research and medicine. CGAP-align is freely available to the academic community at http://sourceforge.net/p/cgap-align under the GNU General Public License (GPL).

5.

Background

Searching the orthologs of a given protein or DNA sequence is one of the most important and most commonly used Bioinformatics methods in Biology. Programs like BLAST or the orthology search engine Inparanoid can be used to find orthologs when the similarity between two sequences is sufficiently high. They however fail when the level of conservation is low. The detection of remotely conserved proteins oftentimes involves sophisticated manual intervention that is difficult to automate.

Results

Here, we introduce morFeus, a search program to find remotely conserved orthologs. Based on relaxed sequence similarity searches, morFeus selects sequences based on the similarity of their alignments to the query, tests for orthology by iterative reciprocal BLAST searches and calculates a network score for the resulting network of orthologs that is a measure of orthology independent of the E-value. Detecting remotely conserved orthologs of a protein using morFeus thus requires no manual intervention. We demonstrate the performance of morFeus by comparing it to state-of-the-art orthology resources and methods. We provide an example of remotely conserved orthologs, which were experimentally shown to be functionally equivalent in the respective organisms and therefore meet the criteria of the orthology-function conjecture.
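The reciprocal-search idea mentioned above can be illustrated with a minimal sketch of a single reciprocal-best-hit test. This is not morFeus's implementation: the iterative searches, alignment clustering and network score are not reproduced, and the function name and the input layout (query ID mapped to a list of `(subject_id, bit_score)` tuples, as might be parsed from tabular BLAST output) are assumptions made for illustration.

```python
def reciprocal_best_hits(a_to_b, b_to_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a.

    Each argument maps a query ID to a list of (subject_id, bit_score) hits
    (hypothetical layout, e.g. parsed from tabular BLAST output).
    """
    def best(hits):
        # Highest-scoring subject ID, or None when the query has no hits.
        return max(hits, key=lambda h: h[1])[0] if hits else None

    pairs = []
    for a, hits in a_to_b.items():
        b = best(hits)
        # Keep the pair only if the reverse search points back to a.
        if b is not None and best(b_to_a.get(b, [])) == a:
            pairs.append((a, b))
    return sorted(pairs)


forward = {"a1": [("b1", 50.0), ("b2", 30.0)], "a2": [("b2", 40.0)]}
reverse = {"b1": [("a1", 48.0)], "b2": [("a1", 35.0), ("a2", 33.0)]}
print(reciprocal_best_hits(forward, reverse))
```

In this toy input only `a1` and `b1` point at each other; `a2` selects `b2`, but `b2` prefers `a1`, so that pair is rejected.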

Conclusions

Based on our results, we conclude that morFeus is a powerful and specific search method for detecting remotely conserved orthologs. morFeus is freely available at http://bio.biochem.mpg.de/morfeus/. Its source code is available from Sourceforge.net (https://sourceforge.net/p/morfeus/).

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-263) contains supplementary material, which is available to authorized users.

6.

Background

Small RNA sequencing is commonly used to identify novel miRNAs and to determine their expression levels in plants. There are several miRNA identification tools for animals such as miRDeep, miRDeep2 and miRDeep*. miRDeep-P was developed to identify plant miRNA using miRDeep’s probabilistic model of miRNA biogenesis, but it depends on several third party tools and lacks a user-friendly interface. The objective of our miRPlant program is to predict novel plant miRNA, while providing a user-friendly interface with improved accuracy of prediction.

Result

We have developed a user-friendly plant miRNA prediction tool called miRPlant. We show using 16 plant miRNA datasets from four different plant species that miRPlant has at least a 10% improvement in accuracy compared to miRDeep-P, which is the most popular plant miRNA prediction tool. Furthermore, miRPlant uses a Graphical User Interface for data input and output, and identified miRNA are shown with all RNAseq reads in a hairpin diagram.

Conclusions

We have developed miRPlant which extends miRDeep* to various plant species by adopting suitable strategies to identify hairpin excision regions and hairpin structure filtering for plants. miRPlant does not require any third party tools such as mapping or RNA secondary structure prediction tools. miRPlant is also the first plant miRNA prediction tool that dynamically plots miRNA hairpin structure with small reads for identified novel miRNAs. This feature will enable biologists to visualize novel pre-miRNA structure and the location of small RNA reads relative to the hairpin. Moreover, miRPlant can be easily used by biologists with limited bioinformatics skills. miRPlant and its manual are freely available at http://www.australianprostatecentre.org/research/software/mirplant or http://sourceforge.net/projects/mirplant/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-275) contains supplementary material, which is available to authorized users.

7.

Background

Using whole exome sequencing to predict aberrations in tumours is a cost-effective alternative to whole genome sequencing; however, it is predominantly used for variant detection and is infrequently utilised for detection of somatic copy number variation.

Results

We propose a new method to infer copy number and genotypes using whole exome data from paired tumour/normal samples. Our algorithm uses two Hidden Markov Models to predict copy number and genotypes and computationally resolves polyploidy/aneuploidy, normal cell contamination and signal baseline shift. Our method explicitly detects chromosome-arm-level events, which are commonly found in tumour samples. The methods are combined into a package named ADTEx (Aberration Detection in Tumour Exome). We applied our algorithm to a cohort of 17 in-house generated and 18 TCGA paired ovarian cancer/normal exomes and evaluated the performance by comparing against the copy number variations and genotypes predicted using Affymetrix SNP 6.0 data of the same samples. Further, we carried out a comparison study to show that ADTEx outperformed its competitors in terms of precision and F-measure.

Conclusions

Our proposed method, ADTEx, uses both depth of coverage ratios and B allele frequencies calculated from whole exome sequencing data, to predict copy number variations along with their genotypes. ADTEx is implemented as a user friendly software package using Python and R statistical language. Source code and sample data are freely available under GNU license (GPLv3) at http://adtex.sourceforge.net/.
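The two input signals named above, depth of coverage ratios and B allele frequencies, can be sketched as follows. This shows only the conventional definitions of the two quantities; ADTEx's own normalisation and its Hidden Markov Models are not reproduced, and the function names and the library-size normalisation are assumptions made for illustration.

```python
import math


def depth_log2_ratio(tumour_depth, normal_depth, tumour_total, normal_total):
    """log2 of the tumour/normal depth ratio for one exon.

    Depths are divided by each sample's total depth first (an assumed,
    simple library-size normalisation). Returns None when either exon
    depth is not positive, because the ratio is then undefined.
    """
    if tumour_depth <= 0 or normal_depth <= 0:
        return None
    ratio = (tumour_depth / tumour_total) / (normal_depth / normal_total)
    return math.log2(ratio)


def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous SNP that carry the B (alternate) allele."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else None


print(depth_log2_ratio(200, 100, 1000, 1000))
print(b_allele_frequency(30, 10))
```

A log2 ratio near 0 with a B allele frequency near 0.5 is what one would expect from an unaltered diploid region; departures in either signal are what a segmentation model would then have to explain.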

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-732) contains supplementary material, which is available to authorized users.

8.
9.

Background

A typical affinity purification coupled to mass spectrometry (AP-MS) experiment includes the purification of a target protein (bait) using an antibody and subsequent mass spectrometry analysis of all proteins co-purifying with the bait (aka prey proteins). Like any other systems biology approach, AP-MS experiments generate a lot of data and visualization has been challenging, especially when integrating AP-MS experiments with orthogonal datasets.

Results

We present Circular Interaction Graph for Proteomics (CIG-P), which generates circular diagrams for visually appealing final representation of AP-MS data. Through a Java based GUI, the user inputs experimental and reference data as files in CSV format. The resulting circular representation can be manipulated live within the GUI before exporting the diagram as a vector graphic in PDF format. The strength of CIG-P is the ability to integrate orthogonal datasets with each other, e.g. affinity purification data of kinase PRPF4B in relation to the functional components of the spliceosome. Further, various AP-MS experiments can be compared to each other.

Conclusions

CIG-P helps to present AP-MS data to a wider audience, and we envision that the tool will find other applications too, e.g. kinase-substrate relationships as a function of perturbation. CIG-P is available at: http://sourceforge.net/projects/cig-p/

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-344) contains supplementary material, which is available to authorized users.

10.

Background

Phylogenetic-based classification of M. tuberculosis and other bacterial genomes is a core analysis for studying evolutionary hypotheses, disease outbreaks and transmission events. Whole genome sequencing is providing new insights into the genomic variation underlying intra- and inter-strain diversity, thereby assisting with the classification and molecular barcoding of the bacteria. One roadblock to strain investigation is the lack of user-interactive solutions to interrogate and visualise variation within a phylogenetic tree setting.

Results

We have developed a web-based tool called PhyTB (http://pathogenseq.lshtm.ac.uk/phytblive/index.php) to assist phylogenetic tree visualisation and identification of M. tuberculosis clade-informative polymorphisms. Variant Call Format files can be uploaded to determine a sample's position within the tree. A map view summarises the geographical distribution of alleles and strain-types. The utility of PhyTB is demonstrated on sequence data from 1,601 M. tuberculosis isolates.

Conclusion

PhyTB contextualises M. tuberculosis genomic variation within epidemiological, geographical and phylogenetic settings. Further tool utility is possible by incorporating large variants and phenotypic data (e.g. drug-resistance profiles), and an assessment of genotype-phenotype associations. Source code is available to develop similar websites for other organisms (http://sourceforge.net/projects/phylotrack).

11.

Background

The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens.

Results

Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity.

Conclusions

Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-262) contains supplementary material, which is available to authorized users.

12.

Background

There is a need for biomarkers to better characterise individuals with COPD and to aid with the development of therapeutic interventions. A panel of putative blood biomarkers was assessed in a subgroup of the Evaluation of COPD Longitudinally to Identify Surrogate Endpoints (ECLIPSE) cohort.

Methods

Thirty-four blood biomarkers were assessed in 201 subjects with COPD, 37 ex-smoker controls with normal lung function and 37 healthy non-smokers selected from the ECLIPSE cohort. Biomarker repeatability was assessed using baseline and 3-month samples. Intergroup comparisons were made using analysis of variance, repeatability was assessed through Bland-Altman plots, and correlations between biomarkers and clinical characteristics were assessed using Spearman correlation coefficients.
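For readers unfamiliar with the repeatability analysis named above, the numbers behind a Bland-Altman plot can be sketched as below. This is the textbook calculation (mean difference and 95% limits of agreement at 1.96 standard deviations), not the study's analysis code; the function name and the example values are invented for illustration.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences; the limits of agreement
    are bias -/+ 1.96 times the sample standard deviation of the differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Invented baseline and 3-month values for one biomarker.
baseline = [10.0, 12.0, 14.0, 16.0]
month_3 = [9.0, 13.0, 13.0, 17.0]
print(bland_altman_limits(baseline, month_3))
```

Narrow limits around a bias close to zero indicate a repeatable measurement; wide limits indicate large within-subject variability between the two visits.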

Results

Fifteen biomarkers were significantly different in individuals with COPD when compared to former or non-smoker controls. Some biomarkers, including tumor necrosis factor-α and interferon-γ, were measurable in only a minority of subjects whilst others such as C-reactive protein showed wide variability over the 3-month replication period. Fibrinogen was the most repeatable biomarker and exhibited a weak correlation with 6-minute walk distance, exacerbation rate, BODE index and MRC dyspnoea score in COPD subjects. 33% (66/201) of the COPD subjects reported at least 1 exacerbation over the 3-month study with 18% (36/201) reporting the exacerbation within 30 days of the 3-month visit. CRP, fibrinogen, interleukin-6 and surfactant protein-D were significantly elevated in those COPD subjects with exacerbations within 30 days of the 3-month visit compared with those individuals that did not exacerbate or whose exacerbations had resolved.

Conclusions

Only a few of the biomarkers assessed may be useful in diagnosis or management of COPD where the diagnosis is based on airflow obstruction (GOLD). Further analysis of more promising biomarkers may reveal utility in subsets of patients. Fibrinogen in particular has emerged as a potentially useful biomarker from this cohort and requires further investigation.

Trial Registration

SCO104960, clinicaltrials.gov identifier NCT00292552

13.

Background

Hydrogen/deuterium exchange (HDX) coupled to mass spectrometry permits analysis of structure, dynamics, and molecular interactions of proteins. HDX mass spectrometry is confounded by deuterium exchange-associated peaks overlapping with peaks of heavy, natural abundance isotopes, such as carbon-13. Recent studies demonstrated that high-performance mass spectrometers could resolve isotopic fine structure and eliminate this peak overlap, allowing direct detection and quantification of deuterium incorporation.

Results

Here, we present a graphical tool that allows for a rapid and automated estimation of deuterium incorporation from a spectrum with isotopic fine structure. Given a peptide sequence (or elemental formula) and charge state, the mass-to-charge ratios of deuterium-associated peaks of the specified ion are determined. Intensities of peaks in an experimental mass spectrum within bins corresponding to these values are used to determine the distribution of deuterium incorporated. A theoretical spectrum can then be calculated based on the estimated distribution of deuterium exchange to confirm interpretation of the spectrum. Deuterium incorporation can also be detected for ion signals without a priori specification of an elemental formula, permitting detection of exchange in complex samples of unidentified material such as natural organic matter. A tool is also incorporated into QUDeX-MS to help in assigning ion signals from peptides arising from enzymatic digestion of proteins. MATLAB-deployable and standalone versions are available for academic use at qudex-ms.sourceforge.net and agarlabs.com.
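The first step described above, locating the mass-to-charge ratios of deuterium-associated peaks for a given ion, can be sketched as follows. This is not QUDeX-MS code (the tool itself is distributed as MATLAB-deployable and standalone versions); it assumes positive ions of the form [M + zH]z+ whose charge carriers are ordinary protons, takes the monoisotopic mass as an input rather than deriving it from a sequence, and uses a hypothetical function name.

```python
PROTON_MASS = 1.007276467      # Da
DEUTERIUM_SHIFT = 1.006276746  # Da, m(2H) - m(1H)
C13_SHIFT = 1.003354835        # Da, m(13C) - m(12C)


def deuterated_mz(monoisotopic_mass, charge, max_deuterium):
    """m/z values for an ion carrying 0..max_deuterium exchanged deuterium atoms.

    Assumes a positive ion [M + zH]z+ (an illustrative simplification).
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return [
        (monoisotopic_mass + n * DEUTERIUM_SHIFT + charge * PROTON_MASS) / charge
        for n in range(max_deuterium + 1)
    ]


peaks = deuterated_mz(1000.0, charge=2, max_deuterium=3)
# Gap that must be resolved to separate a +1 deuterium peak from a +1 carbon-13 peak.
fine_structure_gap = (DEUTERIUM_SHIFT - C13_SHIFT) / 2
print(peaks, fine_structure_gap)
```

The deuterium and carbon-13 mass shifts differ by only about 0.0029 Da, which is why the peak overlap described in the Background can be removed only at very high resolving power.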

Conclusion

Isotopic fine structure HDX-MS offers the potential to increase sequence coverage of proteins being analyzed through mass accuracy and deconvolution of overlapping ion signals. As previously demonstrated, however, the data analysis workflow for HDX-MS data with resolved isotopic fine structure is distinct. We hope that QUDeX-MS will aid in the adoption of isotopic fine structure HDX-MS by providing an intuitive workflow and interface for data analysis.

14.

Background

The assembly of viral or endosymbiont genomes from Next Generation Sequencing (NGS) data is often hampered by the predominant abundance of reads originating from the host organism. These reads increase the memory and CPU time usage of the assembler and can lead to misassemblies.

Results

We developed RAMBO-K (Read Assignment Method Based On K-mers), a tool which allows rapid and sensitive removal of unwanted host sequences from NGS datasets. Reaching a speed of 10 Megabases/s on 4 CPU cores and a standard hard drive, RAMBO-K is faster than any tool we tested, while showing a consistently high sensitivity and specificity across different datasets.
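The abstract does not describe RAMBO-K's algorithm beyond its reliance on k-mers, so the sketch below is only a generic illustration of k-mer-based read assignment and should not be read as RAMBO-K's method. It scores a read by the summed log ratio of smoothed k-mer frequencies in two reference sets; the function names, the add-one smoothing and the toy sequences are all assumptions.

```python
import math
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer in a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log_odds(read, counts_a, counts_b, k, pseudocount=1.0):
    """Positive when the read's k-mers are more typical of set A, negative for set B."""
    vocabulary = 4 ** k  # number of possible DNA k-mers, used for smoothing
    total_a = sum(counts_a.values()) + pseudocount * vocabulary
    total_b = sum(counts_b.values()) + pseudocount * vocabulary
    score = 0.0
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        p_a = (counts_a[kmer] + pseudocount) / total_a
        p_b = (counts_b[kmer] + pseudocount) / total_b
        score += math.log(p_a / p_b)
    return score


host = kmer_counts(["AAAAAAAAAA"], k=3)    # toy "host" reference
target = kmer_counts(["GCGCGCGCGC"], k=3)  # toy "target" reference
print(log_odds("AAAAA", host, target, k=3) > 0)
print(log_odds("GCGCG", host, target, k=3) < 0)
```

Reads could then be kept or discarded by thresholding the score; where to place that threshold is a sensitivity versus specificity trade-off that would have to be tuned on real data.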

Conclusions

RAMBO-K rapidly and reliably separates reads from different species without data preprocessing. It is suitable as a straightforward standard solution for workflows dealing with mixed datasets. Binaries and source code (java and python) are available from http://sourceforge.net/projects/rambok/.

15.

Background

Cigarette smoking is the most important risk factor for Chronic Obstructive Pulmonary Disease (COPD). Only a subgroup of smokers develops COPD and it is unclear why these individuals are more susceptible to the detrimental effects of cigarette smoking. The risk of developing COPD is known to be higher in individuals with familial aggregation of COPD. This study aimed to investigate whether acute systemic and local immune responses to cigarette smoke differentiate between individuals who are susceptible and those who are non-susceptible to developing COPD, at both young (18-40 years) and old (40-75 years) age.

Methods

All participants smoked three cigarettes in one hour. Changes in inflammatory markers in peripheral blood (at 0 and 3 hours) and in bronchial biopsies (at 0 and 24 hours) were investigated. Acute effects of smoking were analyzed within and between susceptible and non-susceptible individuals, and by multiple regression analysis.

Results

Young susceptible individuals showed significantly higher increases in the expression of FcγRII (CD32) in its active forms (A17 and A27) on neutrophils after smoking (p = 0.016 and 0.028 respectively), independently of age, smoking status and expression of the respective markers at baseline. Smoking had no significant effect on mediators in blood or inflammatory cell counts in bronchial biopsies. In the old group, acute effects of smoking were comparable between healthy controls and COPD patients.

Conclusions

We show for the first time that COPD susceptibility at young age associates with an increased systemic innate immune response to cigarette smoking. This suggests a role of systemic inflammation in the early induction phase of COPD.

Trial registration

Clinicaltrials.gov: NCT00807469

Electronic supplementary material

The online version of this article (doi:10.1186/s12931-014-0121-2) contains supplementary material, which is available to authorized users.

16.
Mobile elements are major drivers in changing genomic architecture and can cause disease. The detection of mobile elements is hindered due to the low mappability of their highly repetitive sequences. We have developed an algorithm, called Mobster, to detect non-reference mobile element insertions in next generation sequencing data from both whole genome and whole exome studies. Mobster uses discordant read pairs and clipped reads in combination with consensus sequences of known active mobile elements. Mobster has a low false discovery rate and high recall rate for both L1 and Alu elements. Mobster is available at http://sourceforge.net/projects/mobster.
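As a rough illustration of what "discordant read pairs" means in this context, the sketch below flags a pair whose mates map to different chromosomes or whose separation falls outside the expected insert-size range. It is not Mobster's implementation: read orientation, clipped reads and the comparison against mobile element consensus sequences are left out, and the function name and the three-standard-deviation cut-off are assumptions.

```python
def is_discordant(chrom1, pos1, chrom2, pos2, mean_insert, sd_insert, n_sd=3.0):
    """True when a read pair does not look like a normal paired-end fragment.

    A pair is flagged if the mates map to different chromosomes, or if their
    separation deviates from the expected insert size by more than n_sd
    standard deviations. Orientation is ignored in this simplified sketch.
    """
    if chrom1 != chrom2:
        return True
    separation = abs(pos2 - pos1)
    return abs(separation - mean_insert) > n_sd * sd_insert


print(is_discordant("chr1", 1000, "chr1", 1300, mean_insert=300, sd_insert=30))
print(is_discordant("chr1", 1000, "chr5", 1300, mean_insert=300, sd_insert=30))
```

A detector would then look for clusters of such pairs in which one mate anchors uniquely in the reference genome, which is where the repetitive, poorly mappable mate becomes informative.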

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0488-x) contains supplementary material, which is available to authorized users.

17.

Background

Sleep deprivation and obesity are associated with neurocognitive impairments. Effects of sleep deprivation and obesity on cognition are unknown, and the cognitive long-term effects of improvement of sleep have not been prospectively assessed in short sleeping, obese individuals.

Objective

To characterize neurocognitive functions and assess their reversibility.

Design

Prospective cohort study.

Setting

Tertiary Referral Research Clinical Center.

Patients

A cohort of 121 short-sleeping (<6.5 h/night) obese (BMI 30–55 kg/m²) men and pre-menopausal women.

Intervention

Sleep extension (468±88 days) with life-style modifications.

Measurements

Neurocognitive functions, sleep quality and sleep duration.

Results

At baseline, 44% of the individuals had an impaired global deficit score (t-score 0–39). Impaired global deficit score was associated with worse subjective sleep quality (p = 0.02), and lower urinary dopamine levels (p = 0.001). Memory was impaired in 33%; attention in 35%; motor skills in 42%; and executive function in 51% of individuals. At the final evaluation (N = 74), subjective sleep quality improved by 24% (p<0.001), self-reported sleep duration increased by 11% by questionnaires (p<0.001) and by 4% by diaries (p = 0.04), and daytime sleepiness tended to improve (p = 0.10). Global cognitive function and attention improved by 7% and 10%, respectively (both p = 0.001), and memory and executive functions tended to improve (p = 0.07 and p = 0.06). Serum cortisol increased by 17% (p = 0.02). In a multivariate mixed model, subjective sleep quality and sleep efficiency, urinary free cortisol and dopamine and plasma total ghrelin accounted for 1/5 of the variability in global cognitive function.

Limitations

Drop-out rate.

Conclusions

Chronically sleep-deprived obese individuals exhibit substantial neurocognitive deficits that are partially reversible upon improvement of sleep in a non-pharmacological way. These findings have clinical implications for large segments of the US population.

Trial registration

www.ClinicalTrials.gov NCT00261898. NIDDK protocol 06-DK-0036

18.

Background

Internet support groups (ISGs) are popular, particularly among people with depression, but there is little high quality evidence concerning their effectiveness.

Aim

The study aimed to evaluate the efficacy of an ISG for reducing depressive symptoms among community members when used alone and in combination with an automated Internet-based psychotherapy training program.

Method

Volunteers with elevated psychological distress were identified using a community-based screening postal survey. Participants were randomised to one of four 12-week conditions: depression Internet Support Group (ISG), automated depression Internet Training Program (ITP), combination of the two (ITP+ISG), or a control website with delayed access to e-couch at 6 months. Assessments were conducted at baseline, post-intervention, 6 and 12 months.

Results

There was no change in depressive symptoms relative to control after 3 months of exposure to the ISG. However, both the ISG alone and the combined ISG+ITP group showed significantly greater reduction in depressive symptoms at 6 and 12 months follow-up than the control group. The ITP program was effective relative to control at post-intervention but not at 6 months.

Conclusions

ISGs for depression are promising and warrant further empirical investigation.

Trial Registration

Controlled-Trials.com ISRCTN65657330

19.
20.