Similar documents
2.

Background

As sequencing technologies advance, ever more sequence variants are becoming available for investigation. Different classes of variants in the human genome have been identified, including single-nucleotide substitutions, insertions and deletions, and large structural variations such as duplications and deletions. Insertion and deletion (indel) variants make up a major proportion of human genetic variation. However, little is known about their effects on humans. This gap in understanding is largely due to a lack of both biological data and computational resources.

Results

This paper presents a new indel functional prediction method HMMvar based on HMM profiles, which capture the conservation information in sequences. The results demonstrate that a scoring strategy based on HMM profiles can achieve good performance in identifying deleterious or neutral variants for different data sets, and can predict the protein functional effects of both single and multiple mutations.

Conclusions

This paper proposes a quantitative prediction method, HMMvar, to predict the effect of genetic variation using hidden Markov models. The HMM-based pipeline implementing HMMvar is freely available at https://bioinformatics.cs.vt.edu/zhanglab/hmm.

3.

Background

Virus genome sequences, generated in ever-higher volumes, can provide new scientific insights and inform our responses to epidemics and outbreaks. To facilitate interpretation, such data must be organised and processed within scalable computing resources that encapsulate virology expertise. GLUE (Genes Linked by Underlying Evolution) is a data-centric bioinformatics environment for building such resources. The GLUE core data schema organises sequence data along evolutionary lines, capturing not only nucleotide data but associated items such as alignments, genotype definitions, genome annotations and motifs. Its flexible design emphasises applicability to different viruses and to diverse needs within research, clinical or public health contexts.

Results

HCV-GLUE is a case study GLUE resource for hepatitis C virus (HCV). It includes an interactive public web application providing sequence analysis in the form of a maximum-likelihood-based genotyping method, antiviral resistance detection and graphical sequence visualisation. HCV sequence data from GenBank is categorised and stored in a large-scale sequence alignment which is accessible via web-based queries. Whereas this web resource provides a range of basic functionality, the underlying GLUE project can also be downloaded and extended by bioinformaticians addressing more advanced questions.

Conclusion

GLUE can be used to rapidly develop virus sequence data resources with public health, research and clinical applications. This streamlined approach, with its focus on reuse, will help realise the full value of virus sequence data.

4.

Background

Large-scale sequence studies requiring BLAST-based analysis produce huge amounts of data to be parsed. BLAST parsers are available, but they are often missing some important features, such as keeping all information from the raw BLAST output, allowing direct access to single results, and performing logical operations over them.

Findings

We implemented BlaSTorage, a Python package that parses multiple BLAST results and returns them in a purpose-built object-database format. Unlike other BLAST parsers, BlaSTorage retains and stores all parts of BLAST results, including alignments, without loss of information; a complete API provides access to all the data components.
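BlaSTorage's own API is not reproduced here. As a rough illustration of keeping every field of each hit and running logical operations over results, the following Python sketch parses BLAST+ tabular output (`-outfmt 6`, default 12 columns) into records grouped by query and intersects sets of subjects. The function names are hypothetical, and tabular output, unlike the full reports BlaSTorage is described as storing, does not include the alignments themselves.

```python
from collections import defaultdict
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6) and a type for each.
COLUMNS = [("qseqid", str), ("sseqid", str), ("pident", float), ("length", int),
           ("mismatch", int), ("gapopen", int), ("qstart", int), ("qend", int),
           ("sstart", int), ("send", int), ("evalue", float), ("bitscore", float)]


@dataclass(frozen=True)
class Hit:
    """One tabular BLAST hit; field names follow the -outfmt 6 column names."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Group hits by query ID (hypothetical helper, not the BlaSTorage API)."""
    by_query = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 comment lines
        cols = line.split("\t")
        if len(cols) < len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(cols)}")
        values = {name: cast(v) for (name, cast), v in zip(COLUMNS, cols)}
        by_query[values["qseqid"]].append(Hit(**values))
    return dict(by_query)


def subjects(hits, max_evalue=1e-5):
    """Set of subject IDs hit at or below an E-value cutoff."""
    return {h.sseqid for h in hits if h.evalue <= max_evalue}
```

Set operations then answer questions such as which subjects two queries share, e.g. `subjects(res["q1"]) & subjects(res["q2"])`.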

Conclusions

BlaSTorage achieves speeds comparable to those of more basic parsers written in compiled languages such as C++, and can be easily integrated into web applications or software pipelines.

5.
Nam-phuong Nguyen, Michael Nute, Siavash Mirarab, Tandy Warnow. BMC Genomics, 2016, 17(10): 765-100

Background

Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics.

Results

We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy.
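The ensemble idea can be illustrated with a deliberately simplified scorer. In this Python sketch each family is represented by several profiles, a query's family score is the best score over that family's ensemble members, and the query is assigned to the highest-scoring family. HIPPI itself builds and scores profile HMMs with HMMER; here, as a stated substitution, each "profile" is an ungapped table of per-column log-odds scores, and `profile_from_seq` is a hypothetical helper for building toy profiles.

```python
from typing import Dict, List

# Toy stand-in for a profile HMM: one dict of residue -> log-odds score per column.
Profile = List[Dict[str, float]]


def profile_from_seq(seq: str, match: float = 2.0) -> Profile:
    """Hypothetical helper: a profile that rewards exact matches to `seq`."""
    return [{residue: match} for residue in seq]


def profile_score(profile: Profile, query: str, unseen: float = -2.0) -> float:
    """Best ungapped score over all offsets, allowing partial overlap so that
    fragmentary queries can still be scored."""
    n, m = len(query), len(profile)
    best = float("-inf")
    for offset in range(-(m - 1), n):
        total, aligned = 0.0, 0
        for j, column in enumerate(profile):
            i = offset + j
            if 0 <= i < n:
                total += column.get(query[i], unseen)
                aligned += 1
        if aligned:
            best = max(best, total)
    return best


def classify(query: str, families: Dict[str, List[Profile]]) -> str:
    """Assign the family whose best-scoring ensemble member scores highest."""
    def family_score(name: str) -> float:
        return max(profile_score(p, query) for p in families[name])
    return max(families, key=family_score)
```

Taking the maximum over ensemble members lets one family be represented by several narrower profiles (for example, one per subfamily) instead of a single profile averaged over divergent sequences.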

Conclusion

HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp.

6.

Background

Evolutionary conservation of RNA secondary structure is a typical feature of many functional non-coding RNAs. Since almost all of the available methods used for prediction and annotation of non-coding RNA genes rely on this evolutionary signature, accurate measures for structural conservation are essential.

Results

We systematically assessed the ability of various measures to detect conserved RNA structures in multiple sequence alignments. We tested three existing and eight novel strategies that are based on metrics of folding energies, metrics of single optimal structure predictions, and metrics of structure ensembles. We find that the folding-energy-based SCI score used in the RNAz program and a simple base-pair distance metric are by far the most accurate. More complex metrics, such as tree editing, do not improve performance. A variant of the SCI performed particularly well on highly conserved alignments and is thus a viable alternative when little evolutionary information is available. Surprisingly, ensemble-based methods, which in principle could benefit from the additional information contained in sub-optimal structures, perform particularly poorly. As a general trend, methods that include a consensus structure prediction outperformed equivalent methods that consider only pairwise comparisons.
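The two best-performing measures named above can each be written in a few lines. The following Python is a minimal illustration, not the code evaluated in the study: `bp_distance` counts base pairs present in exactly one of two dot-bracket structures of equal length (structures for aligned sequences are assumed to have already been mapped to alignment columns), and `sci` computes the structure conservation index as the consensus folding energy of the alignment (for example from RNAalifold) divided by the mean minimum free energy of the individual sequences. How the study aggregates pairwise distances over a whole alignment is not specified here.

```python
def base_pairs(dot_bracket: str) -> set:
    """Return the set of (i, j) base pairs in a dot-bracket string (0-based)."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return len(base_pairs(s1) ^ base_pairs(s2))


def sci(consensus_energy: float, single_energies: list) -> float:
    """Structure conservation index: consensus energy / mean single-sequence MFE.

    Values near 1 suggest the sequences share a conserved structure; values
    near 0 suggest they do not."""
    mean = sum(single_energies) / len(single_energies)
    return consensus_energy / mean
```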

Conclusion

Structural conservation can be measured accurately with relatively simple and intuitive metrics. These metrics could form the basis of future RNA gene finders, which face new challenges such as finding lineage-specific structures or detecting misaligned sequences.

7.

Background

Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate.

Results

We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/.
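The three reported statistics can be computed from per-residue confusion counts. The sketch below assumes that CC is the Matthews correlation coefficient and that specificity means TN/(TN+FP); some interface-prediction papers instead use "specificity" for TP/(TP+FP), so precision is reported alongside. Residues are identified by hypothetical integer indices.

```python
import math


def confusion(predicted: set, actual: set, n_residues: int):
    """Per-residue counts for predicted vs. true interface residue sets."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = n_residues - tp - fp - fn
    return tp, fp, fn, tn


def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """CC (assumed to be Matthews correlation), sensitivity, specificity, precision."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"cc": cc, "sensitivity": sens, "specificity": spec, "precision": prec}
```

For example, predicting residues {1, 2, 3} as interface in a 10-residue protein whose true interface is {2, 3, 4} gives TP = 2, FP = 1, FN = 1, TN = 6.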

Conclusions

Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.

8.

Background

The identification of copy number aberration in the human genome is an important area in cancer research. We develop a model for determining genomic copy numbers using high-density single nucleotide polymorphism genotyping microarrays. The method is based on a Bayesian spatial normal mixture model with an unknown number of components corresponding to true copy numbers. A reversible jump Markov chain Monte Carlo algorithm is used to implement the model and perform posterior inference.
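At the heart of a normal mixture for copy number is the posterior probability that an observation came from each component. The following minimal Python sketch evaluates that posterior for one probe's log2 ratio, assuming the number of components and their weights, means and standard deviations are already known. The method described here instead treats the number of components as unknown, estimates everything by reversible jump MCMC, and includes spatial dependence; the sketch attempts none of that.

```python
import math


def log_normal_pdf(x: float, mu: float, sd: float) -> float:
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def posterior_copy_number(x: float, components) -> dict:
    """Posterior P(copy number | x) under a normal mixture with FIXED,
    user-supplied components given as (copy_number, weight, mean, sd).
    No spatial term and no reversible jump step are included."""
    log_joint = {cn: math.log(w) + log_normal_pdf(x, mu, sd)
                 for cn, w, mu, sd in components}
    top = max(log_joint.values())  # log-sum-exp trick avoids underflow
    unnorm = {cn: math.exp(v - top) for cn, v in log_joint.items()}
    total = sum(unnorm.values())
    return {cn: v / total for cn, v in unnorm.items()}
```

With illustrative components `[(1, 1/3, -1.0, 0.2), (2, 1/3, 0.0, 0.2), (3, 1/3, 0.585, 0.2)]` (means roughly log2(1/2), log2(2/2) and log2(3/2)), a log2 ratio of 0.0 is assigned mostly to copy number 2.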

Results

The performance of the algorithm is examined on both simulated and real cancer data, and it is compared with the popular CNAG algorithm for copy number detection.

Conclusions

We demonstrate that our Bayesian mixture model performs at least as well as the hidden Markov model based CNAG algorithm and in certain cases does better. One of the added advantages of our method is the flexibility of modeling normal cell contamination in tumor samples.

10.

Aims

The objective of this study was to determine the relative importance of transpirational pull, Se speciation, sulfate and species on Se accumulation by plants, in order to determine which of these factors must be considered in the future development of models to predict Se accumulation by plants.

Methods

Seedlings of durum wheat (Triticum turgidum L. var durum cv ‘Kyle’) and spring canola (Brassica napus L. var Hyola 401) were grown hydroponically and exposed to SeO₄²⁻ (selenate) with or without SO₄²⁻ (sulfate), or to HSeO₃⁻ (biselenite), under different transpiration regimes altered through ‘low’ (~50%) or ‘high’ (~78%) relative humidity (RH). Plants were harvested after 0, 8, 16, or 24 h of exposure, digested, and analyzed for Se by GFAAS.

Results

Accumulation and distribution of Se by plants depend on plant species, Se speciation in the nutrient solution, SO₄²⁻ competition, and transpiration regime. Canola accumulated and translocated more Se than wheat. In both wheat and canola, the greatest accumulation and translocation of Se occurred when plants were exposed to SeO₄²⁻ without SO₄²⁻, compared with solutions of SeO₄²⁻ with SO₄²⁻ or of HSeO₃⁻. In wheat, increased transpiration enhanced Se accumulation and translocation more for plants exposed to SeO₄²⁻ with SO₄²⁻ than for plants exposed to SeO₄²⁻ without SO₄²⁻ or to HSeO₃⁻. In canola, by contrast, increased transpiration increased the translocation of Se to shoots more under HSeO₃⁻ exposure than under any other treatment.

Conclusions

Overall, our results suggest that plant species is the most important factor influencing Se accumulation and translocation, but that these endpoints can be modified by climate and by specific soil Se or S content. Models to predict Se accumulation by plants must consider all of these factors to accurately represent the mechanisms of uptake and translocation.

11.

Background

Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate.

Results

We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software.
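The shift-and-score step can be sketched as follows. This is an illustration under stated assumptions, not SFESA's implementation: an alignment is a dict from template to query residue indices, a "block" is a list of template indices (SFESA derives blocks from template secondary structure elements), and `score` is any caller-supplied function, a stand-in for SFESA's combined profile-profile and contact-energy score.

```python
from typing import Callable, Dict, List, Tuple

# Pairwise alignment as a map: template residue index -> query residue index.
Alignment = Dict[int, int]


def shift_block(aln: Alignment, block: List[int], shift: int, q_len: int) -> Alignment:
    """Move the query positions aligned to one template block by `shift`."""
    new = dict(aln)
    for t in block:
        if t in aln:
            q = aln[t] + shift
            if 0 <= q < q_len:
                new[t] = q
            else:
                del new[t]  # residues shifted past either end become unaligned
    return new


def is_colinear(aln: Alignment) -> bool:
    """A valid alignment keeps query positions strictly increasing."""
    qs = [aln[t] for t in sorted(aln)]
    return all(a < b for a, b in zip(qs, qs[1:]))


def refine_block(aln: Alignment, block: List[int], q_len: int,
                 score: Callable[[Alignment], float],
                 max_shift: int = 3) -> Tuple[Alignment, float]:
    """Try every local shift of one block and keep the best-scoring variant."""
    best_aln, best_score = aln, score(aln)
    for s in range(-max_shift, max_shift + 1):
        if s == 0:
            continue
        candidate = shift_block(aln, block, s, q_len)
        if is_colinear(candidate):
            cand_score = score(candidate)
            if cand_score > best_score:
                best_aln, best_score = candidate, cand_score
    return best_aln, best_score
```

As a toy usage, aligning template `"ACDEFG"` to query `"XACDEFG"` position-by-position and scoring by identity, `refine_block` finds that a shift of +1 gives the best score. A combined score could be passed as, for instance, `lambda a: seq(a) + w * contact(a)`, where `seq`, `contact` and `w` are hypothetical.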

Conclusions

SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.

14.

Background

More than one million terms from biomedical ontologies and controlled vocabularies are available through the Ontology Lookup Service (OLS). Although OLS provides ample possibility for querying and browsing terms, the visualization of parts of the ontology graphs is rather limited and inflexible.

Results

We created the OLSVis web application, a visualiser for browsing all ontologies available in the OLS database. OLSVis shows customisable subgraphs of the OLS ontologies. Subgraphs are animated via a real-time force-based layout algorithm that is fully interactive: each time the user makes a change, e.g. browsing to a new term or hiding, adding, or dragging terms, the algorithm reorganises the graph smoothly and only as much as necessary. This ensures an optimal viewing experience, because successive screen layouts are not grossly altered and users can easily navigate through the graph. URL: http://ols.wordvis.com
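Force-based layouts of this kind are commonly built on the Fruchterman-Reingold model: all nodes repel, edges attract, and nodes move a bounded distance per iteration. The following Python sketch is a generic version, not the OLSVis algorithm (whose details are not given here); it illustrates why reusing the previous positions when the user adds or hides a term keeps successive layouts similar. It uses a fixed step size instead of a cooling schedule, and every edge endpoint must appear in `nodes`.

```python
import math
import random


def force_layout(nodes, edges, pos=None, iterations=50, k=1.0, step=0.1, seed=0):
    """Fruchterman-Reingold-style layout sketch (not the OLSVis code).

    Passing the previous frame's positions as `pos` keeps existing nodes where
    they were and caps every move at `step`, so the layout changes smoothly."""
    rng = random.Random(seed)
    pos = dict(pos or {})
    for n in nodes:
        if n not in pos:  # only newly shown terms get a random start position
            pos[n] = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: force k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along edges: force d^2 / k.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node along its net force, at most `step` per iteration.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                s = min(d, step) / d
                pos[n] = (pos[n][0] + dx * s, pos[n][1] + dy * s)
    return {n: pos[n] for n in nodes}
```

Calling the function again with the previous result as `pos` and a small `iterations` value after each user action moves every existing node by at most `iterations * step`, which is one way to keep successive layouts close.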

Conclusions

The OLSVis web application provides a user-friendly tool to visualise ontologies from the OLS repository. It broadens the possibilities to investigate and select ontology subgraphs through a smooth visualisation method.

16.
Improved method for predicting linear B-cell epitopes   Total citations: 2, self-citations: 0, citations by others: 2

Background

B-cell epitopes are the sites on molecules that are recognized by antibodies of the immune system. Knowledge of B-cell epitopes may be used in the design of vaccines and diagnostic tests. It is therefore of interest to develop improved methods for predicting B-cell epitopes. In this paper, we describe an improved method for predicting linear B-cell epitopes.

Results

To do this, we constructed three data sets of proteins annotated with linear B-cell epitopes: one collected from the literature, one extracted from the AntiJen database, and one of epitopes in HIV proteins collected from the Los Alamos HIV database. The methods were validated without bias by testing them on data sets on which they had been neither trained nor optimized. We measured performance non-parametrically by constructing ROC curves.
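The area under a ROC curve (AUC) can be estimated non-parametrically as the probability that a randomly chosen epitope residue receives a higher score than a randomly chosen non-epitope residue (the Mann-Whitney statistic), with ties counted as one half. The following sketch assumes per-residue prediction scores and binary labels (1 = annotated epitope residue); it is quadratic in the number of residues and meant only to illustrate the measure.

```python
def roc_auc(scores, labels):
    """Non-parametric AUC: P(score of a positive > score of a negative),
    counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of epitope from non-epitope residues, which makes the measure independent of any particular score threshold.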

Conclusion

The best single method for predicting linear B-cell epitopes is the hidden Markov model. Combining the hidden Markov model with one of the best propensity scale methods, we obtained the BepiPred method. When tested on the validation data set this method performs significantly better than any of the other methods tested. The server and data sets are publicly available at http://www.cbs.dtu.dk/services/BepiPred.

17.

Aims and background

The ability of plants to suppress soil nitrification by releasing nitrification inhibitors from their roots is termed ‘biological nitrification inhibition’ (BNI). We previously reported that sorghum roots release more BNI activity when grown with NH₄⁺, but not with NO₃⁻, as the N source. BNI release also requires a rhizosphere pH below 5.0; above this, BNI release declined, with nearly 80% of BNI activity lost at pH >7.0. This study aims to understand the functional relationships among NH₄⁺ uptake, rhizosphere pH and plasma membrane H⁺-ATPase (PM H⁺-ATPase) activity in regulating the release of BNIs (biological nitrification inhibitors) from sorghum roots.

Methods

Sorghum was grown hydroponically, and root exudates were collected from intact plants using a pH-stat system to separate the effects of the secondary acidification caused by NH₄⁺ uptake on BNI release. A recombinant luminescent Nitrosomonas europaea bioassay was used to determine BNI activity. Root plasma membranes were isolated using a two-phase partitioning system, and hydrolytic H⁺-ATPase activity was determined. A split-root system was used to study localized responses of BNI release by sorghum to NH₄⁺, an H⁺-ATPase stimulator (fusicoccin) or an H⁺-ATPase inhibitor (vanadate).

Results

The presence of NH₄⁺ in the rhizosphere stimulated H⁺-ATPase activity and enhanced the release of BNIs from sorghum roots. Fusicoccin, which stimulates H⁺-ATPase activity, also stimulated BNI release in the absence of NH₄⁺; vanadate, which suppresses H⁺-ATPase activity, also suppressed the release of BNIs. Rhizosphere NH₄⁺ levels in the range of 0-1.0 mM positively influenced BNI release and root H⁺-ATPase activity, indicating a close relationship between BNI release and root H⁺-ATPase activity, with a possible involvement of carrier-mediated transport in the release of BNIs in sorghum.

Conclusion

Our results suggest that NH₄⁺ uptake, PM H⁺-ATPase activity, and rhizosphere acidification are functionally interconnected with BNI release in sorghum. Such knowledge is critical for understanding why BNI function is more effective in light-textured, mildly acidic soils than in other soil types.

18.

Background

The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus improving MSA accuracy and identifying potential errors in MSAs is important for a wide range of post-genomic research. We present a novel method called MergeAlign which constructs consensus MSAs from multiple independent MSAs and assigns an alignment precision score to each column.

Results

Using conventional benchmark tests we demonstrate that on average MergeAlign MSAs are more accurate than MSAs generated using any single matrix of sequence substitution. We show that MergeAlign column scores are related to alignment precision and hence provide an ab initio method of estimating alignment precision in the absence of curated reference MSAs. Using two novel and independent alignment performance tests that utilise a large set of orthologous gene families we demonstrate that increasing MSA performance leads to an increase in the performance of downstream phylogenetic analyses.
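One simple way to quantify how well a column is supported is the fraction of input MSAs that contain exactly the same column, where a column is identified by which residue of each sequence it holds (or a gap). The Python sketch below computes that fraction. It illustrates the idea of column support only, assumes all input MSAs align the same sequences in the same row order, and does not reproduce MergeAlign's consensus-building procedure or its exact scoring.

```python
from typing import List, Optional, Tuple

# A column as residue indices per row (None = gap), so that columns can be
# compared across alignments that place gaps differently.
Column = Tuple[Optional[int], ...]


def columns(msa: List[str]) -> List[Column]:
    """Convert an MSA (equal-length gapped rows) into columns of residue indices."""
    counters = [0] * len(msa)
    cols = []
    for c in range(len(msa[0])):
        col = []
        for r, row in enumerate(msa):
            if row[c] == "-":
                col.append(None)
            else:
                col.append(counters[r])
                counters[r] += 1
        cols.append(tuple(col))
    return cols


def column_support(target: List[str], alignments: List[List[str]]) -> List[float]:
    """Fraction of input MSAs that contain each column of `target`."""
    sets = [set(columns(a)) for a in alignments]
    return [sum(col in s for s in sets) / len(sets) for col in columns(target)]
```

For example, if two of three input alignments of `ACG` and `AG` place the `C` opposite a gap, that column receives support 2/3, while the column aligning the two leading `A` residues, shared by all three, receives 1.0.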

Conclusion

Using multiple tests of alignment performance we demonstrate that this novel method has broad general application in biological research.

19.

Background

With advances in DNA re-sequencing methods and next-generation parallel sequencing approaches, genomic efforts to define and analyze the sequence variability among individuals within a species have increased greatly. For highly polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that helps biologists, who often have limited programming experience, to track, edit, display, and export multiple individual sequence alignments. To fill this need, we have developed a novel DNA alignment editor.

Results

We have developed a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments, with functions for input, editing, and output of sequence alignments. Color-coding of nucleotide identity and display of the associated quality scores aid the manual alignment editing process. DNAAlignEditor works as a client/server tool with two main components: a relational database that collects the processed alignments, and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users connected concurrently.

Conclusion

We anticipate that this software will be of general interest to biologists and population geneticists for editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism.

20.

Background and aims

The direct measurement of denitrification dynamics and its product fractions is important for parameterizing process-oriented models of nitrogen cycling in various soils. The aims of this study were to (a) directly measure, in the laboratory, the denitrification potential and the fractions of nitrogenous gases produced by the process, (b) investigate the effects of nitrate (NO₃⁻) concentration on emissions of denitrification gases, and (c) test the hypothesis that denitrification can be a major pathway of nitrous oxide (N2O) and nitric oxide (NO) production in calcic cambisols under conditions, common in agricultural land, of simultaneously sufficient carbon and nitrogen substrates and anaerobiosis.

Methods

Using the helium atmosphere (with or without oxygen) gas-flow-soil-core technique in the laboratory, we directly measured the denitrification potential of a silty clay calcic cambisol and the production of nitrogen gas (N2), N2O and NO during denitrification at seven NO₃⁻ concentrations (ranging from 10 to 250 mg N kg⁻¹ dry soil) and an almost constant initial dissolved organic carbon concentration (300 mg C kg⁻¹ dry soil).

Results

Almost all the soil NO₃⁻ was consumed during anaerobic incubation, with 80–88 % of the consumed NO₃⁻ recovered by measuring nitrogenous gases. The results showed that increases in initial NO₃⁻ concentration significantly enhanced the denitrification potential and the emissions of N2 and N2O as products of this process. Despite the wide range of initial NO₃⁻ concentrations, the ratios of N2, N2O and NO products to denitrification potential showed much narrower ranges of 51–78 % for N2, 14–36 % for N2O and 5–22 % for NO.
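The product ratios and gas recovery reported above are simple proportions. The sketch below assumes that the denitrification potential equals the total N emitted as N2, N2O and NO, and that all quantities share one unit (for example mg N kg⁻¹ dry soil); the numbers in the usage note are hypothetical, not data from the study.

```python
def product_fractions(n2: float, n2o: float, no: float) -> dict:
    """Share of each gas in total gaseous N.

    Assumes denitrification potential = N2 + N2O + NO, all in the same unit."""
    total = n2 + n2o + no
    if total <= 0:
        raise ValueError("no gaseous N measured")
    return {"N2": n2 / total, "N2O": n2o / total, "NO": no / total}


def recovery(n2: float, n2o: float, no: float, nitrate_consumed: float) -> float:
    """Fraction of consumed NO3- recovered as measured gaseous N."""
    if nitrate_consumed <= 0:
        raise ValueError("nitrate_consumed must be positive")
    return (n2 + n2o + no) / nitrate_consumed
```

With hypothetical emissions of 60, 30 and 10 mg N kg⁻¹ as N2, N2O and NO, the product fractions are 0.6, 0.3 and 0.1; if 120 mg N kg⁻¹ of NO₃⁻ were consumed, the recovery would be 100/120, about 83 %.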

Conclusions

These results strongly support the above hypothesis and provide parameters for simulating the effects of variable soil NO₃⁻ concentrations on the denitrification process, as needed for biogeochemical models.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417