Similar articles (20 found)
1.
The advent of next-generation sequencing technologies has greatly advanced the field of metagenomics, which studies genetic material recovered directly from an environment. Characterizing the genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantified through homology searches of sequence reads against known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, which impairs accurate estimation of the relative abundance of the genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After obtaining the estimated proportion of reads generated by each genome, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability, taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach to taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/hji403/MetaR.htm.
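The idea in the abstract above, estimating per-genome read proportions first and then assigning each read using both its alignment score and the estimated abundance, can be illustrated with a small mixture-model sketch. This is not TAMER's actual implementation (which is in R): the use of an EM iteration, the function names `estimate_proportions` and `assign_read`, and the input layout (one row of alignment-based likelihoods per read, zero where a read has no hit) are all assumptions made for illustration.

```python
def estimate_proportions(likelihoods, n_iter=200, tol=1e-10):
    """Estimate genome proportions by EM (illustrative assumption, not TAMER itself).

    likelihoods[i][j] is a hypothetical alignment-based likelihood of read i
    under candidate genome j, and 0.0 where the read has no hit to that genome.
    """
    n_genomes = len(likelihoods[0])
    props = [1.0 / n_genomes] * n_genomes  # start from uniform abundances
    for _ in range(n_iter):
        # E-step: share each read among genomes in proportion to abundance * likelihood.
        resp_sums = [0.0] * n_genomes
        for row in likelihoods:
            weights = [p * l for p, l in zip(props, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # a read with no hits carries no information
            for j, w in enumerate(weights):
                resp_sums[j] += w / total
        # M-step: new proportions are the normalized expected read counts.
        norm = sum(resp_sums)
        new_props = [s / norm for s in resp_sums]
        converged = max(abs(a - b) for a, b in zip(props, new_props)) < tol
        props = new_props
        if converged:
            break
    return props


def assign_read(row, props):
    """Return (index of most probable genome, posterior probabilities) for one read."""
    weights = [p * l for p, l in zip(props, row)]
    total = sum(weights)
    if total == 0.0:
        return None, []  # the read has no candidate genome
    post = [w / total for w in weights]
    return max(range(len(post)), key=post.__getitem__), post


if __name__ == "__main__":
    # Two reads unique to genome 0, one ambiguous read, one read unique to genome 1.
    reads = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    props = estimate_proportions(reads)
    print(props, assign_read([1.0, 1.0], props))
```

In this toy input the ambiguous read is not pushed to a higher taxonomic rank. It is assigned to the more abundant genome, with a posterior probability that reflects the estimated abundances.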

2.
We describe and analyze a periodically-forced difference equation model for malaria in mosquitoes that captures the effects of seasonality and allows the mosquitoes to feed on a heterogeneous population of hosts. We numerically show the existence of a unique globally asymptotically stable periodic orbit and calculate periodic orbits of field-measurable quantities that measure malaria transmission. We integrate this model with an individual-based stochastic simulation model for malaria in humans to compare the effects of insecticide-treated nets (ITNs) and indoor residual spraying (IRS) in reducing malaria transmission, prevalence, and incidence. We show that ITNs are more effective than IRS in reducing transmission and prevalence though IRS would achieve its maximal effects within 2 years while ITNs would need two mass distribution campaigns over several years to do so. Furthermore, the combination of both interventions is more effective than either intervention alone. However, although these interventions reduce transmission and prevalence, they can lead to increased clinical malaria; and all three malaria indicators return to preintervention levels within 3 years after the interventions are withdrawn.

3.
Malaria is currently one of the world's major health problems. About half a million deaths are recorded every year. In Portugal, malaria cases were significantly high until the end of the 1950s but the disease was considered eliminated in 1973. In the past few years, endemic malaria cases have been recorded in some European countries. With the increasing human mobility from countries with endemic malaria to Portugal, there is concern about the resurgence of this disease in the country. Here, we model and map the risk of malaria transmission for mainland Portugal, considering 3 different scenarios of existing imported infections. This risk assessment resulted from entomological studies on An. atroparvus, the only known mosquito capable of transmitting malaria in the study area. We used the malariogenic potential (determined by receptivity, infectivity and vulnerability) applied over geospatial data sets to estimate spatial variation in malaria risk. The results suggest that the risk exists, and the hotspots are concentrated in the northeast region of the country and in the upper and lower Alentejo regions.

4.
5.
6.
The standard framework for ecological risk assessment does not explicitly address multiple activities. Although this has not prevented its use for assessments of risks from multiple agents, the routine assessment of complex programs or of multiple agents acting on a site, watershed or region would be aided by use of a framework that is designed for that purpose. The framework proposed in this paper is modular with respect to the individual activities which makes the assessment more manageable and more efficient when the same activities are addressed in multiple programs or at multiple sites. It explicitly allows for analysis of indirect effects in terms of causal chains. It includes links to other risk assessments for which changes in ecological conditions are the hazardous agent. For example, changes in ecological condition may create risks to agricultural economies or to the cultural resource values of a site. Finally, the framework includes a standard approach to estimating the combined effects of the multiple agents acting on a receptor.

7.
Heterogeneous networked clusters are being increasingly used as platforms for resource-intensive parallel and distributed applications. The fundamental underlying idea is to provide large amounts of processing capacity over extended periods of time by harnessing the idle and available resources on the network in an opportunistic manner. In this paper we present the design, implementation and evaluation of a framework that uses JavaSpaces to support this type of opportunistic adaptive parallel/distributed computing over networked clusters in a non-intrusive manner. The framework targets applications exhibiting coarse grained parallelism and has three key features: (1) portability across heterogeneous platforms, (2) minimal configuration overheads for participating nodes, and (3) automated system state monitoring (using SNMP) to ensure non-intrusive behavior. Experimental results presented in this paper demonstrate that for applications that can be broken into coarse-grained, relatively independent tasks, the opportunistic adaptive parallel computing framework can provide performance gains. Furthermore, the results indicate that monitoring and reacting to the current system state minimizes the intrusiveness of the framework.

8.
9.
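A mathematical method for mapping QTLs with morphological markers   (cited 3 times: 0 self-citations, 3 by others)
Based on the theoretical distribution, in the testcross generation, of two sex-linked morphological markers located on the Z chromosome of the silkworm and of a quantitative trait that differs at one pair of major genes and is assumed to be linked to those markers, this paper establishes a mathematical method for mapping QTLs with morphological markers, the frequency-distribution area method. It also gives the corresponding χ² statistics for testing the segregation ratio of the pair of major genes in the testcross generation and for testing whether they are linked to the morphological markers. The mapping method is also applicable to QTL mapping with morphological markers under non-sex-linked inheritance. Compared with the maximum likelihood method for single-marker mapping, the two-marker mapping performed by our method can reveal crossover interference in recombination between the QTL and the morphological markers, and the mapping results are not affected by environmental effects acting on the quantitative trait.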

10.
Clustering is a prevalent means of analyzing single cell RNA sequencing (scRNA-seq) data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.

11.
Annotation of protein functions plays an important role in understanding life at the molecular level. High-throughput sequencing produces massive numbers of raw protein sequences and only about 1% of them have been manually annotated with functions. Experimental annotation of functions is expensive and time-consuming, and does not keep up with the rapid growth in the number of sequences. This motivates the development of computational approaches that predict protein functions. A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence- and network-derived information. More precisely, DeepFunc uses a long and sparse binary vector to encode information concerning domains, families, and motifs collected from the InterPro tool that is associated with the input protein sequence. This vector is processed with two neural layers to obtain a low-dimensional vector which is combined with topological information extracted from protein–protein interactions (PPIs) and functional linkages. The combined information is processed by a deep neural network that predicts protein functions. DeepFunc is empirically and comparatively tested on a benchmark testing dataset and the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset. The experimental results demonstrate that DeepFunc outperforms current methods on the testing dataset and that it secures the highest Fmax = 0.54 and AUC = 0.94 on the CAFA3 dataset.

12.
The Systemwide Initiative on Malaria and Agriculture (SIMA) is an initiative of international agricultural research centers to promote research and capacity building on the links between malaria and agriculture and to validate innovative interventions that would strengthen and complement existing malaria-control strategies in clearly defined settings. Knowledge regarding the nature and dynamics of agroecosystems is particularly needed for the purpose of developing appropriate farmer-managed preventive measures against malaria. SIMA research aims to make use of new and existing information on biomedical and socioeconomic determinants of malaria risks in formulating and evaluating the feasibility of integrated strategies. The initiative is especially interested and proactive in promoting and facilitating transdisciplinary and participatory research in relation to malaria. The convening institute for SIMA is the International Water Management Institute at its Africa Regional Office in Pretoria, South Africa. This article outlines SIMA's objectives and scope of activities and also highlights achievements, challenges, and opportunities for future collaboration.

13.
In some cross-sectional studies of chronic disease, data consist of the age at examination, whether the disease was present at the exam, and recall of the age at first diagnosis. This article describes a flexible parametric approach for combining current status and age at first diagnosis data. We assume that the log odds of onset by a given age and of detection by a given age conditional on onset by that age are nondecreasing functions of time plus linear combinations of covariates. Piecewise linear models are used to characterize changes across time in the baseline odds. Methods are described for accommodating informatively missing current status data and inferences based on the age-specific incidence of disease prior to a landmark event (e.g., puberty, menopause). Our formulation enables straightforward maximum likelihood estimation without requiring restrictive parametric or Markov assumptions. The methods are applied to data from a study of uterine fibroids.

14.
The impacts of sediment contaminants can be evaluated by different lines of evidence, including toxicity tests and ecological community studies. Responses from 10 different toxicity assays/tests were combined to arrive at a “site score.” We employed a relatively simple summary measure, pooled P-values, in which we quantify a potential decrement in response in a contaminated site relative to nominally clean reference sites. The response-specific P-values were defined relative to a “null” distribution of responses in reference sites, and were then pooled using standard meta-analytic methods. Ecological community data were also evaluated using an analogous strategy. A distribution of distances of the reference sites from the centroid of the reference sites was obtained. The distance of each test site from the centroid of the reference sites was then calculated, and the proportion of reference distances that exceed the test site distance was used to define an empirical P-value for that test site. A plot of the toxicity P-value versus the community P-value was used to identify sites based on both alteration in community structure and toxicity, that is, by weight-of-evidence. This approach provides a useful strategy for examining multiple lines of evidence that should be accessible to the broader scientific community. The use of a large collection of reference sites to empirically define P-values is appealing in that parametric distribution assumptions are avoided, although this does come at the cost of assuming the reference sites provide an appropriate comparison group for test sites.
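The two calculations described in the abstract above can be sketched in a few lines. The empirical P-value follows the abstract directly (the proportion of reference-site distances that exceed the test-site distance). The pooling step is an assumption: the abstract says only "standard meta-analytic methods", so Fisher's combined probability test is used here as one common choice, and the function names and numbers are hypothetical.

```python
import math


def empirical_p_value(reference_distances, test_distance):
    """Proportion of reference-site distances that exceed the test-site distance."""
    exceed = sum(1 for d in reference_distances if d > test_distance)
    return exceed / len(reference_distances)


def fisher_pooled_p(p_values):
    """Pool P-values with Fisher's method (an assumed choice of pooling rule).

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    All P-values must be strictly greater than zero.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical distances of reference sites from the reference centroid.
    reference = [1.0, 2.0, 3.0, 4.0]
    print(empirical_p_value(reference, 2.5))
    # Hypothetical response-specific P-values from several toxicity tests.
    print(fisher_pooled_p([0.04, 0.20, 0.01]))
```

Because the empirical P-value is a proportion of a finite reference set, it can be exactly zero. Such a value would need to be floored (for example at 1 divided by the number of reference sites) before it is passed to a log-based pooling rule.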

15.
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium. Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn's disease; and one novel association implicating a gene involved in apoptosis pathways in rheumatoid arthritis. We provide software for applying our PUMA analysis framework.

16.

Objective

The purpose of this study is to provide an optimized method to reconstruct the structure of the upper airway (UA) based on magnetic resonance imaging (MRI) that can faithfully show the anatomical structure with a smooth surface without artificial modifications.

Methods

MRI was performed on the head and neck of a healthy young male participant in the axial, coronal and sagittal planes to acquire images of the UA. The level set method was used to segment the boundary of the UA. The boundaries in the three scanning planes were registered according to the positions of crossing points and anatomical characteristics using a Matlab program. Finally, the three-dimensional (3D) NURBS (Non-Uniform Rational B-Splines) surface of the UA was constructed using the registered boundaries in all three different planes.

Results

A smooth 3D structure of the UA was constructed, which captured the anatomical features from the three anatomical planes, particularly the location of the anterior wall of the nasopharynx. The volume and area of every cross section of the UA can be calculated from the constructed 3D model of UA.

Conclusions

A complete scheme of reconstruction of the UA was proposed, which can be used to measure and evaluate the 3D upper airway accurately.

17.

Background

Although rapid diagnostic tests (RDTs) have practical advantages over light microscopy (LM) and good sensitivity in severe falciparum malaria in Africa, their utility where severe non-falciparum malaria occurs is unknown. LM, RDTs and polymerase chain reaction (PCR)-based methods have limitations, and thus conventional comparative malaria diagnostic studies employ imperfect gold standards. We assessed whether, using Bayesian latent class models (LCMs) which do not require a reference method, RDTs could safely direct initial anti-infective therapy in severely ill children from an area of hyperendemic transmission of both Plasmodium falciparum and P. vivax.

Methods and Findings

We studied 797 Papua New Guinean children hospitalized with well-characterized severe illness for whom LM, RDT and nested PCR (nPCR) results were available. For any severe malaria, the estimated prevalence was 47.5% with RDTs exhibiting similar sensitivity and negative predictive value (NPV) to nPCR (≥96.0%). LM was the least sensitive test (87.4%) and had the lowest NPV (89.7%), but had the highest specificity (99.1%) and positive predictive value (98.9%). For severe falciparum malaria (prevalence 42.9%), the findings were similar. For non-falciparum severe malaria (prevalence 6.9%), no test had the WHO-recommended sensitivity and specificity of >95% and >90%, respectively. RDTs were the least sensitive (69.6%) and had the lowest NPV (96.7%).

Conclusions

RDTs appear to be a valuable point-of-care test that is at least equivalent to LM in diagnosing severe falciparum malaria in this epidemiologic situation. None of the tests had the required sensitivity/specificity for severe non-falciparum malaria but the number of false-negative RDTs in this group was small.

18.
19.
Predicting protein-coding genes remains a significant challenge. Although a variety of computational programs that commonly use machine learning methods have emerged, the accuracy of their predictions remains low when they are applied to large genomic sequences. Moreover, computational gene finding in newly sequenced genomes is especially difficult because no training set of abundant validated genes is available. Here we present a new gene-finding program, SCGPred, which improves the accuracy of prediction by combining multiple sources of evidence. SCGPred can run in a supervised mode for previously well-studied genomes and in an unsupervised mode for novel genomes. In tests on datasets composed of large DNA sequences from human and from the novel genome of Ustilago maydis, SCGPred shows a significant improvement over popular ab initio gene predictors. We also demonstrate that SCGPred can significantly improve prediction in novel genomes by combining several foreign gene finders with similarity alignments, an approach that is superior to other unsupervised methods. Therefore, SCGPred can serve as an alternative gene-finding tool for newly sequenced eukaryotic genomes. The program is freely available at http://bio.scu.edu.cn/SCGPred/.

20.
Epitope-based vaccines (EVs) have a wide range of applications: from therapeutic to prophylactic approaches, from infectious diseases to cancer. The development of an EV is based on the knowledge of target-specific antigens from which immunogenic peptides, so-called epitopes, are derived. Such epitopes form the key components of the EV. Due to regulatory, economic, and practical concerns the number of epitopes that can be included in an EV is limited. Furthermore, as the major histocompatibility complex (MHC) binding these epitopes is highly polymorphic, every patient possesses a set of MHC class I and class II molecules of differing specificities. A peptide combination effective for one person can thus be completely ineffective for another. This renders the optimal selection of these epitopes an important and interesting optimization problem. In this work we present a mathematical framework based on integer linear programming (ILP) that allows the formulation of various flavors of the vaccine design problem and the efficient identification of optimal sets of epitopes. Out of a user-defined set of predicted or experimentally determined epitopes, the framework selects the set with the maximum likelihood of eliciting a broad and potent immune response. Our ILP approach allows an elegant and flexible formulation of numerous variants of the EV design problem. In order to demonstrate this, we show how common immunological requirements for a good EV (e.g., coverage of epitopes from each antigen, coverage of all MHC alleles in a set, or avoidance of epitopes with high mutation rates) can be translated into constraints or modifications of the objective function within the ILP framework. An implementation of the algorithm outperforms a simple greedy strategy as well as a previously suggested evolutionary algorithm and has runtimes on the order of seconds for typical problem sizes.
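The selection problem in the abstract above is a 0/1 choice per epitope: maximize a total score subject to a limit on the number of epitopes, plus optional requirements such as covering every MHC allele in a set. The sketch below states that problem and solves a toy instance by exhaustive enumeration. It is a stand-in for illustration only, not the authors' ILP formulation or solver; the additive score, the function name `select_epitopes`, and all data are hypothetical.

```python
from itertools import combinations


def select_epitopes(immunogenicity, allele_coverage, k, required_alleles=()):
    """Pick exactly k epitopes with maximal total score by brute force.

    immunogenicity: dict mapping epitope name -> hypothetical score.
    allele_coverage: dict mapping epitope name -> set of MHC alleles it binds.
    required_alleles: alleles that the chosen set must cover (a constraint).
    Enumeration is only feasible for small inputs; an ILP solver would be
    used for realistic problem sizes.
    """
    required = set(required_alleles)
    best_set, best_score = None, float("-inf")
    for subset in combinations(sorted(immunogenicity), k):
        covered = set().union(*(allele_coverage[e] for e in subset))
        if not required <= covered:
            continue  # constraint violated: some required allele is not covered
        score = sum(immunogenicity[e] for e in subset)
        if score > best_score:
            best_set, best_score = subset, score
    return best_set, best_score


if __name__ == "__main__":
    scores = {"e1": 3.0, "e2": 2.5, "e3": 1.0}
    coverage = {"e1": {"A"}, "e2": {"A"}, "e3": {"B"}}
    print(select_epitopes(scores, coverage, k=2))
    print(select_epitopes(scores, coverage, k=2, required_alleles={"A", "B"}))
```

In the toy data, adding the allele-coverage requirement changes the optimum from the two highest-scoring epitopes to a set that trades some score for coverage, which is the kind of constraint the abstract describes adding within the ILP framework.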

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号