Similar articles
20 similar articles found.
1.

Background

Superbubbles are distinctive subgraphs in directed graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact that they are connected to their host graph by a single entrance and a single exit vertex, which allows them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore important for the processing of HTS data. Superbubbles can be identified within the strongly connected components of the input digraph after transforming them into directed acyclic graphs. The algorithm by Sung et al. (IEEE ACM Trans Comput Biol Bioinform 12:770–777, 2015) achieves this task in \(\mathcal{O}(m \log m)\) time. The extraction of superbubbles from the transformed components was later improved by Brankovic et al. (Theor Comput Sci 609:374–383, 2016), resulting in an overall \(\mathcal{O}(m+n)\)-time algorithm.

Results

A re-analysis of the mathematical structure of superbubbles showed that the construction of auxiliary DAGs from the strongly connected components in the work of Sung et al. missed some details, which can lead to the reporting of false positive superbubbles. We propose an alternative, even simpler auxiliary graph that solves the problem and retains the linear running time for general digraphs. Furthermore, we describe a simpler, space-efficient \(\mathcal{O}(m+n)\)-time algorithm for detecting superbubbles in DAGs that uses only simple data structures.

Implementation

We present a reference implementation of the algorithm that accepts many commonly used formats for the input graph and provides convenient access to the improved algorithm. It is available at https://github.com/Fabianexe/Superbubble.

2.

Background

The basic RNA secondary structure prediction problem, or single sequence folding (SSF) problem, was solved 35 years ago by a now well-known \(O(n^3)\)-time dynamic programming method. Recently, three methodologies (Valiant, Four-Russians, and Sparsification) have been applied to speed up RNA secondary structure prediction. The Sparsification method exploits two properties of the input: the number Z of subsequences whose endpoints belong to the optimal folding set, and the maximum number of base pairs L. These sparsity properties satisfy \(0 \le L \le n / 2\) and \(n \le Z \le n^2 / 2\), and the method reduces the algorithmic running time to O(LZ). The Four-Russians method, in contrast, tabulates partial results.

Results

In this paper, we explore three different algorithmic speedups. First, we expand and reformulate the single sequence folding Four-Russians \(\Theta\left(\frac{n^3}{\log^2 n}\right)\)-time algorithm to use an on-demand lookup table. Second, we create a framework that combines the fastest Sparsification method with the new, fastest on-demand Four-Russians method. This combined method has a worst-case running time of \(O(\tilde{L}\tilde{Z})\), where \(\frac{L}{\log n} \le \tilde{L} \le \min\left(L, \frac{n}{\log n}\right)\) and \(\frac{Z}{\log n} \le \tilde{Z} \le \min\left(Z, \frac{n^2}{\log n}\right)\). Third, we update the Four-Russians formulation to achieve an on-demand \(O(n^2/\log^2 n)\)-time parallel algorithm. This then leads to an asymptotic speedup of \(O(\tilde{L}\tilde{Z_j})\), where \(\frac{Z_j}{\log n} \le \tilde{Z_j} \le \min\left(Z_j, \frac{n}{\log n}\right)\) and \(Z_j\) is the number of subsequences with endpoint j belonging to the optimal folding set.

Conclusions

The on-demand formulation not only removes all extraneous computation and allows us to incorporate more realistic scoring schemes, but also lets us take advantage of the sparsity properties. Through asymptotic analysis and empirical testing on the base-pair maximization variant and a more biologically informative scoring scheme, we show that this Sparse Four-Russians framework achieves a speedup on every problem instance that is asymptotically never worse, and empirically better, than that achieved by the minimum of the two methods alone.
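
For orientation, the base-pair maximization variant mentioned above is commonly solved by the classic cubic-time dynamic program, which is the baseline that Sparsification and Four-Russians aim to accelerate. The Python sketch below shows only that baseline, not the Sparse Four-Russians method. The function name `max_base_pairs`, the `min_loop` parameter, and the set of allowed pairs (Watson-Crick plus G-U wobble) are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=0):
    """Maximum number of non-crossing base pairs in `seq` (cubic-time DP).

    Illustrative baseline only. `min_loop` is a hypothetical parameter:
    positions k < j may pair only if j - k > min_loop.
    """
    # Assumed pairing rule: Watson-Crick pairs plus G-U wobble pairs.
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]          # case 1: j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]
```

The innermost loop over k is the source of the cubic running time; it is this scan over split points that the two speedup techniques described above reduce.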

3.

Background

Suffix arrays, augmented by additional data structures, allow many string processing problems to be solved efficiently. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory.

Results

In this article we present and analyze \(\mathsf {eGSA}\) [introduced in CPM (External memory generalized suffix and \(\mathsf {LCP}\) arrays construction. In: Proceedings of CPM. pp 201–10, 2013)], the first external memory algorithm to construct generalized suffix arrays augmented with the longest common prefix array for a string collection. Our algorithm relies on a combination of buffers, induced sorting and a heap to avoid direct string comparisons. We performed experiments that covered different aspects of our algorithm, including running time, efficiency, external memory access, internal phases and the influence of different optimization strategies. On real datasets of size up to 24 GB and using 2 GB of internal memory, \(\mathsf {eGSA}\) showed a competitive performance when compared to \(\mathsf {eSAIS}\) and \(\mathsf {SAscan}\), which are efficient algorithms for a single string according to the related literature. We also show the effect of disk caching managed by the operating system on our algorithm.

Conclusions

The proposed algorithm was validated through performance tests using real datasets from different domains, in various combinations, and showed competitive performance. Our algorithm can also construct the generalized Burrows-Wheeler transform of a string collection at no additional cost other than the output time.
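
To make the output of such a construction concrete, the following Python sketch builds a generalized suffix array and its LCP array naively and entirely in memory. It is not \(\mathsf{eGSA}\): it uses no buffers, induced sorting, or heap, and it would not scale to the dataset sizes discussed above. The function name `generalized_suffix_array` and the convention that the end of a string sorts before any character are assumptions made for illustration.

```python
def generalized_suffix_array(strings):
    """Naive in-memory generalized suffix array with LCP array.

    Returns (suffixes, lcp) where
      suffixes[k] = (string_index, offset) of the k-th smallest suffix,
      lcp[k]      = length of the longest common prefix of the suffixes
                    at ranks k-1 and k (lcp[0] is 0).
    Equal suffixes from different strings are ordered by string index.
    """
    # Materialise every suffix; this is quadratic in space, which is
    # exactly what a real construction algorithm avoids.
    entries = [(s[i:], d, i)
               for d, s in enumerate(strings)
               for i in range(len(s))]
    entries.sort()
    suffixes = [(d, i) for _, d, i in entries]

    lcp = [0] * len(entries)
    for k in range(1, len(entries)):
        a, b = entries[k - 1][0], entries[k][0]
        h = 0
        while h < len(a) and h < len(b) and a[h] == b[h]:
            h += 1
        lcp[k] = h
    return suffixes, lcp
```

For the single string "banana" this yields the offsets 5, 3, 1, 0, 4, 2, matching the ordinary suffix array of that string.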

4.
Community N-mixture abundance models for replicated counts provide a powerful and novel framework for drawing inferences related to species abundance within communities subject to imperfect detection. To assess the performance of these models, and to compare them to related community occupancy models in situations with marginal information, we used simulation to examine the effects of mean abundance (\(\bar{\lambda}\): 0.1, 0.5, 1, 5), detection probability (\(\bar{p}\): 0.1, 0.2, 0.5), number of sampling sites (\(n_{\text{site}}\): 10, 20, 40), and number of visits (\(n_{\text{visit}}\): 2, 3, 4) on the bias and precision of species-level parameters (mean abundance and covariate effect) and a community-level parameter (species richness). Bias and imprecision of estimates decreased when any of the four variables (\(\bar{\lambda}\), \(\bar{p}\), \(n_{\text{site}}\), \(n_{\text{visit}}\)) increased. Detection probability \(\bar{p}\) was most important for the estimates of mean abundance, while \(\bar{\lambda}\) was most influential for covariate effect and species richness estimates. For all parameters, increasing \(n_{\text{site}}\) was more beneficial than increasing \(n_{\text{visit}}\). Minimal conditions for obtaining adequate performance of community abundance models were \(n_{\text{site}} \ge 20\), \(\bar{p} \ge 0.2\), and \(\bar{\lambda} \ge 0.5\). At lower abundance, the performance of community abundance and community occupancy models as species richness estimators was comparable. We then used additive partitioning analysis to reveal that raw species counts can overestimate β diversity both of species richness and the Shannon index, while community abundance models yielded better estimates. Community N-mixture abundance models thus have great potential for use in community ecology or conservation applications, provided that replicated counts are available.

5.
The present study aimed to investigate the association of the g.313A>G and g.341C>T polymorphisms of GSTP1 with coronary artery disease (CAD) in a subgroup of the north Indian population. In this case–control study, CAD patients (\(n = 200\)) and age-matched, sex-matched and ethnicity-matched healthy controls (\(n = 200\)) were genotyped for polymorphisms in GSTP1 using the polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) method. The genotype distributions of the g.313A>G and g.341C>T polymorphisms of the GSTP1 gene were significantly different between cases and controls (\(P = 0.005\) and 0.024, respectively). Binary logistic regression analysis showed a significant association of the A/G (odds ratio (OR): 1.6, 95% CI: 1.08–2.49, \(P = 0.020\)) and G/G (OR: 3.1, 95% CI: 1.41–6.71, \(P = 0.005\)) genotypes of GSTP1 g.313A>G, and of the C/T (OR: 5.8, 95% CI: 1.26–26.34, \(P = 0.024\)) genotype of GSTP1 g.341C>T, with CAD. The A/G and G/G genotypes of g.313A>G together with the C/T genotype of g.341C>T conferred a 6.5-fold increased risk for CAD (OR: 6.5, 95% CI: 1.37–31.27, \(P = 0.018\)). Moreover, the recessive model of GSTP1 g.313A>G was the best-fitting inheritance model for predicting the susceptibility gene effect (OR: 2.3, 95% CI: 1.11–4.92, \(P = 0.020\)). In conclusion, statistically significant associations of the GSTP1 g.313A>G (A/G, G/G) and g.341C>T (C/T) genotypes with CAD were observed.
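
The odds ratios and 95% confidence intervals quoted above come from binary logistic regression. As a rough illustration of how such figures are read, the Python sketch below computes an unadjusted odds ratio with a Woolf (log-scale) confidence interval from a 2×2 table. This is a simpler technique than the regression used in the study, and the function name `odds_ratio_ci` and all genotype counts in the usage line are hypothetical, not taken from the article.

```python
import math


def odds_ratio_ci(case_carrier, case_noncarrier,
                  control_carrier, control_noncarrier, z=1.96):
    """Unadjusted odds ratio with a Woolf confidence interval.

    All four counts must be positive. z = 1.96 gives an approximate
    95% interval. Returns (odds_ratio, lower, upper).
    """
    odds_ratio = (case_carrier * control_noncarrier) / (
        case_noncarrier * control_carrier)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / case_carrier + 1 / case_noncarrier
                   + 1 / control_carrier + 1 / control_noncarrier)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not data from the study):
# 20 of 100 cases and 10 of 100 controls carry the genotype.
example = odds_ratio_ci(20, 80, 10, 90)
```

An interval whose lower bound lies above 1, as in every interval reported in the abstract, indicates an association that is significant at the corresponding level.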

6.
7.
We are interested in characterizing synchronization transitions of bursting neurons in the frequency domain. The instantaneous population firing rate (IPFR) R(t), which is obtained directly from the raster plot of neural spikes, is often used as a realistic collective quantity describing population activity in both computational and experimental neuroscience. For the case of spiking neurons, a realistic time-domain order parameter based on R(t) was introduced in our recent work to characterize the spike synchronization transition. Unlike the case of spiking neurons, the IPFR R(t) of bursting neurons exhibits population behaviors on both the slow bursting and the fast spiking timescales. For our purpose, we decompose the IPFR R(t) into the instantaneous population bursting rate \(R_b(t)\) (describing the bursting behavior) and the instantaneous population spike rate \(R_s(t)\) (describing the spiking behavior) via frequency filtering, and extend the realistic order parameter to the case of bursting neurons. We thus develop frequency-domain bursting and spiking order parameters, which are simply the bursting and spiking “coherence factors” \(\beta_b\) and \(\beta_s\) of the bursting and spiking peaks in the power spectral densities of \(R_b\) and \(R_s\) (i.e., the “signal-to-noise” ratio of the spectral peak height to its relative width). By calculating \(\beta_b\) and \(\beta_s\), we obtain the bursting and spiking thresholds beyond which the burst and spike synchronizations break up, respectively. Explicit examples show that the frequency-domain bursting and spiking order parameters can be used effectively to characterize the bursting and the spiking transitions, respectively.

8.
In this paper, we propose a novel multi-objective ant colony optimizer (called iMOACO\(_{\mathbb {R}}\)) for continuous search spaces, which is based on ACO\(_{\mathbb {R}}\) and the R2 performance indicator. iMOACO\(_{\mathbb {R}}\) is the first multi-objective ant colony optimizer (MOACO) specifically designed to tackle continuous many-objective optimization problems (i.e., multi-objective optimization problems having four or more objectives). Our proposed iMOACO\(_{\mathbb {R}}\) is compared to three state-of-the-art multi-objective evolutionary algorithms (NSGA-III, MOEA/D and SMS-EMOA) and a MOACO algorithm called MOACO\(_{\mathbb {R}}\) using standard test problems and performance indicators taken from the specialized literature. Our experimental results indicate that iMOACO\(_{\mathbb {R}}\) is very competitive with respect to NSGA-III and MOEA/D and it is able to outperform SMS-EMOA and MOACO\(_{\mathbb {R}}\) in most of the test problems adopted.

9.

Background

Recently, Marcus et al. (Bioinformatics 30:3476–83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an \(O(n\log g)\) time algorithm called splitMEM that constructs this graph directly (i.e., without using the uncompressed de Bruijn graph) based on a suffix tree, where n is the total length of the genomes and g is the length of the longest genome. Baier et al. (Bioinformatics 32:497–504, 2016) improved their result.

Results

In this paper, we propose a new space-efficient representation of the compressed de Bruijn graph that adds the possibility to search for a pattern (e.g. an allele—a variant form of a gene) within the pan-genome. The ability to search within the pan-genome graph is of utmost importance and is a design goal of pan-genome data structures.

10.
11.
We study the effect of changes in flow speed on competition among an arbitrary number of species living in advective environments, such as streams and rivers. We begin with a spatial Lotka–Volterra model described by n reaction–diffusion–advection equations with Danckwerts boundary conditions. Using the dominant eigenvalue \(\lambda \le 0\) of the diffusion–advection operator subject to the boundary conditions, we reduce the model to a system of ordinary differential equations. We impose a “transitive arrangement” of the competitors in terms of their interspecific coefficients and growth rates, which means that in the absence of advection, we have the following situation: for all \(1\le i<j\le n\), species i out-competes species j, while species j has a higher intrinsic growth rate than species i. Changing the advection speed in the original spatial model corresponds to changing the value of \(\lambda\) in the spatially implicit model. Considering the cases of odd and even n separately, we obtain explicit intervals of the values of \(\lambda\) that allow all n species to be present in the habitat (coexistence intervals). Stability of this equilibrium is shown for \(n\le 4\).

12.
The pentatricopeptide repeat (PPR) gene family plays an essential role in the regulation of plant growth and organelle gene expression. Some PPR genes are related to fertility restoration in plants, but no detailed information is available for Gossypium. In the present study, we identified 482 and 433 PPR homologues in the Gossypium raimondii (\(\mathrm{D}_5\)) and G. arboreum (\(\mathrm{A}_2\)) genomes, respectively. Most PPR homologues were evenly distributed across the chromosomes. In an evolutionary analysis of PPR genes from the G. raimondii (\(\mathrm{D}_5\)), G. arboreum (\(\mathrm{A}_2\)) and G. hirsutum genomes, eight PPR genes clustered together with fertility-restoring genes of other species. Most cotton PPR genes had no introns, a high proportion of \(\alpha\)-helix and the classical tertiary structure of PPR proteins. Based on bioinformatics analyses, the eight PPR genes were predicted to be targeted to the mitochondrion and to encode typical P subfamily proteins with protein binding activity and a function in organelle RNA metabolism. RNA-seq and quantitative real-time PCR (qRT-PCR) analyses further showed that two PPR candidate genes, Gorai.005G0470 (\(\mathrm{D}_5\)) and Cotton_A_08373 (\(\mathrm{A}_2\)), were upregulated in the fertile line relative to the sterile line. These results provide new insights into PPR gene evolution in Gossypium.

13.
14.

Background

In this work, we present a new coarse-grained representation of RNA dynamics. It is based on adjacency matrices and their interaction patterns obtained from molecular dynamics simulations. RNA molecules are well suited to this representation because their composition is mainly modular and assessable from the secondary structure alone. These interactions can be represented as adjacency matrices of k nucleotides. Based on those, we define transitions between states as changes in the adjacency matrices, which form Markovian dynamics. The intense computational demand of deriving the transition probability matrices prompted us to develop StreAM-\(T_g\), a stream-based algorithm for generating such Markov models of k-vertex adjacency matrices representing the RNA.
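
The idea of treating adjacency matrices as Markov states can be sketched in a few lines of Python. The code below is not StreAM-\(T_g\) and makes no attempt at stream processing; it simply encodes the adjacency matrix of a chosen vertex set in each frame as a state, counts transitions between consecutive frames, and normalizes each row. The function names `adjacency_state` and `transition_probabilities`, and the representation of each frame as a list of undirected edges, are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def adjacency_state(edges, vertices):
    """Encode the adjacency matrix induced on `vertices` as a tuple.

    The tuple holds one 0/1 flag per unordered vertex pair, listed in
    sorted order. Edges touching other vertices are ignored.
    """
    edge_set = {frozenset(e) for e in edges}
    ordered = sorted(vertices)
    return tuple(
        int(frozenset((u, v)) in edge_set)
        for a, u in enumerate(ordered)
        for v in ordered[a + 1:]
    )


def transition_probabilities(frames, vertices):
    """Estimate state-to-state transition probabilities.

    `frames` is a sequence of edge lists, one per time step
    (hypothetical input format). Returns a nested dict
    {source_state: {target_state: probability}}.
    """
    states = [adjacency_state(edges, vertices) for edges in frames]
    counts = defaultdict(Counter)
    for src, dst in zip(states, states[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: c / sum(row.values()) for dst, c in row.items()}
        for src, row in counts.items()
    }
```

With k vertices there are up to \(2^{k(k-1)/2}\) distinct states, which gives a sense of why deriving these matrices for many vertex sets is computationally demanding.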

Results

We benchmark StreAM-\(T_g\) (a) on random and RNA unit sphere dynamic graphs and (b) for the robustness of our method against different parameters. Moreover, we address a riboswitch design problem by applying StreAM-\(T_g\) to six long-term molecular dynamics simulations of a synthetic tetracycline-dependent riboswitch (500 ns) in combination with five different antibiotics.

Conclusions

The proposed algorithm performs well on large simulated as well as real-world dynamic graphs. Additionally, StreAM-\(T_g\) provides insights into nucleotide-based RNA dynamics in comparison to conventional metrics such as the root-mean-square fluctuation. In light of experimental data, our results show important design opportunities for the riboswitch.

15.
Animal behavior is flexible, and the same individual can exhibit variable expressions under equivalent ecological situations (i.e., within-individual behavioral variation). This study examines the evolution of within-individual behavioral variation using an individual-based model. A common predation scenario is considered in which a predator spends a period h handling and consuming a captured prey. The model assumes the handling time of the predator to be a random variable. The average and within-individual variance of handling time are described by \(\mu_h\) and \(\sigma_h^2\), respectively, where each individual has its own unique \(\mu_h\) and \(\sigma_h^2\). Using a genetic algorithm, the evolution of \(\sigma_h^2\) is traced. The results show that natural selection acts on both \(\mu_h\) and \(\sigma_h^2\), and that the optimal behavioral variation depends on the density of prey. In particular, individuals with high behavioral variance \(\sigma_h^2\) are more likely to be selected when prey density is low. Individual-based modeling can be a useful tool for studying the ultimate significance of within-individual behavioral variation and for generating empirically testable predictions. The mechanisms of the evolution of within-individual behavioral variation and their ecological implications are discussed.

16.
Age-related macular degeneration (AMD) is a common cause of blindness in the elderly. Caucasian patients are predominantly affected by the dry form of AMD, whereas Japanese patients predominantly have the wet form of AMD and/or polypoidal choroidal vasculopathy (PCV). Although genetic association in the 10q26 (ARMS2/HTRA1) region has been established in many ethnic groups for dry-type AMD, typical wet-type AMD, and PCV, the contribution of the 1q32 (CFH) region seems to differ among these groups. Here we show that a single nucleotide polymorphism (SNP) in the ARMS2/HTRA1 locus is associated in the whole genome with Japanese typical wet-type AMD (rs10490924: , OR = 4.16) and PCV (rs10490924: , OR = 2.72), followed by CFH (rs800292: , OR = 2.08; , OR = 2.00), which differs from previous studies in Caucasian populations. Moreover, a SNP (rs2241394) in the complement component C3 gene showed significant association with PCV (, OR = 3.47). We conclude that dry-type AMD, typical wet-type AMD, and PCV have both common and distinct genetic risks that become apparent when comparing Japanese versus Caucasian populations.

Electronic supplementary material

The online version of this article (doi:10.1007/s12177-009-9047-1) contains supplementary material, which is available to authorized users.

17.
18.
Previous genome-wide association studies (GWAS) and meta-analyses have identified several genes/loci in the major histocompatibility complex region that are consistently associated with rheumatoid arthritis (RA) in different ethnic populations. Given the genetic heterogeneity of the disease, it is necessary to replicate these susceptibility loci in other populations. Here, we investigated the association of two SNPs from the human leukocyte antigen (HLA) region, rs13192471 and rs6457617, with the risk of RA in a Tunisian population. These SNPs were previously identified as having a strong RA association signal in several GWAS. A case–control sample composed of 142 RA patients and 123 healthy controls was analysed. Genotyping of rs13192471 and rs6457617 was carried out by real-time PCR using the TaqMan allelic discrimination assay. A trend towards significant association with susceptibility to RA was found for the rs6457617 TT genotype (\(P = 0.04\), \(p_{c} = 0.08\), OR = 1.73). Moreover, in multivariable analysis, the combination rs6457617*TT–HLA-DRB1*\(04^{+}\) increased the risk of RA (OR = 2.38), which suggests a gene–gene interaction between rs6457617, located within HLA-DQB1, and HLA-DRB1. Additionally, haplotypic analysis highlighted a significant association of the rs6457617*T–HLA-DRB1*\(04^{+}\) haplotype with susceptibility to RA (\(P = 0.018\), \(p_{c} = 0.036\), OR = 1.72). Evidence of association with rs6457617 was subsequently shown in the antiCCP\(^{+}\) subgroup for both the T allele and the TT genotype (\(P = 0.01\), \(p_{c} = 0.03\), OR = 1.66 and \(P = 0.008\), \(p_{c} = 0.024\), OR = 1.28, respectively). However, no association of the rs13192471 polymorphism with susceptibility to or severity of RA was shown. This study suggests the involvement of the rs6457617 locus as a risk variant for susceptibility to and severity of RA in the Tunisian population. It also highlights the gene–gene interaction between HLA-DQB1 and HLA-DRB1.

19.
During the early phase of the cell cycle, the eukaryotic genome is organized into chromosome territories. The geometry of the interface between any two chromosomes remains a matter of debate and may have important functional consequences. The Interchromosomal Network model (introduced by Branco and Pombo) proposes that territories intermingle along their periphery. In order to partially quantify this concept, we investigate here the probability that two chromosomes form an unsplittable link. We use the uniform random polygon (URP) as a crude model for chromosome territories, and we model the interchromosomal network as the common spatial region of two overlapping uniform random polygons. This simple model allows us to derive some rigorous mathematical results as well as to perform computer simulations easily. We find that the probability that a uniform random polygon of length n that partially overlaps a fixed polygon forms an unsplittable link with it is bounded below by \(1-O\left(\frac{1}{\sqrt{n}}\right)\). We use numerical simulations to estimate the dependence of the linking probability of two uniform random polygons (of lengths n and m, respectively) on the amount of overlap. The degree of overlap is described by a parameter \(\epsilon \in [0,1]\) such that \(\epsilon = 0\) indicates no overlap and \(\epsilon = 1\) indicates total overlap. We propose that this dependence may be modeled as \(f(\epsilon, m, n) = 1 - \frac{a(\epsilon)}{b(\epsilon)\sqrt{mn} + c(\epsilon)}\). Numerical evidence shows that this model works well when \(\epsilon\) is relatively large (\(\epsilon \ge 0.5\)). We then use these results to model the data published by Branco and Pombo and observe that, for the amount of overlap observed experimentally, the URPs have a non-zero probability of forming an unsplittable link.

20.
Cathepsin E-A-like, also known as ‘similar to nothepsin’, is a new member of the aspartic protease family that may take part in the processing of egg yolk macromolecules, because it was identified in chicken egg yolk. Previous studies have suggested that the expression of cathepsin E-A-like increases gradually during sexual maturation of pullets, but the exact regulatory mechanism is poorly understood. In this study, to gain insight into the function and regulatory mechanism of the gene in egg-laying hens, we cloned the cathepsin E-A-like gene and evaluated its evolutionary origin using both phylogenetic and syntenic methods. The mode of gene expression regulation was analysed by stimulating juvenile hens with \(17\beta\)-estradiol and chicken embryo hepatocytes with \(17\beta\)-estradiol combined with oestrogen receptor antagonists including MPP, ICI 182,780 and tamoxifen. Our results showed that cathepsin E-A-like is orthologous to nothepsin, which is present in birds but not in mammals. The expression of cathepsin E-A-like increased significantly in a dose-dependent manner after the juvenile hens were treated with \(17\beta\)-estradiol (\(P < 0.05\)). Compared with the \(17\beta\)-estradiol treatment group, the expression of cathepsin E-A-like was not significantly changed when the hepatocytes were treated with \(17\beta\)-estradiol combined with MPP (\(P < 0.05\)). In contrast, compared with the group treated with \(17\beta\)-estradiol combined with MPP, the expression of cathepsin E-A-like was significantly downregulated when the hepatocytes were treated with \(17\beta\)-estradiol combined with tamoxifen or ICI 182,780 (\(P < 0.05\)). These results demonstrate that cathepsin E-A-like shares the same evolutionary origin as nothepsin. The expression of cathepsin E-A-like is regulated by oestrogen, and the regulatory effect is predominantly mediated through ER-\(\beta\) in the chicken liver.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号