Similar articles
 20 similar articles found (search time: 46 ms)
1.

Background

Cancer is an evolutionary process characterized by the accumulation of somatic mutations in a population of cells that form a tumor. One frequent type of mutation is the copy number aberration, which alters the number of copies of a genomic region. The number of copies of each position along a chromosome constitutes the chromosome’s copy-number profile. Understanding how such profiles evolve in cancer can assist in both diagnosis and prognosis.

Results

We model the evolution of a tumor by segmental deletions and amplifications, and gauge distance from profile \(\mathbf {a}\) to \(\mathbf {b}\) by the minimum number of events needed to transform \(\mathbf {a}\) into \(\mathbf {b}\). Given two profiles, our first problem aims to find a parental profile that minimizes the sum of distances to its children. Given k profiles, the second, more general problem, seeks a phylogenetic tree, whose k leaves are labeled by the k given profiles and whose internal vertices are labeled by ancestral profiles such that the sum of edge distances is minimum.

Conclusions

For the former problem we give a pseudo-polynomial dynamic programming algorithm that is linear in the profile length, and an integer linear program formulation. For the latter problem we show it is NP-hard and give an integer linear program formulation that scales to practical problem instance sizes. We assess the efficiency and quality of our algorithms on simulated instances.

2.

Background

The basic RNA secondary structure prediction problem or single sequence folding problem (SSF) was solved 35 years ago by a now well-known \(O(n^3)\)-time dynamic programming method. Recently three methodologies (Valiant, Four-Russians, and Sparsification) have been applied to speed up RNA secondary structure prediction. The sparsification method exploits two properties of the input: the number Z of subsequences whose endpoints belong to the optimal folding set, and the maximum number of base pairs L. These sparsity properties satisfy \(0 \le L \le n / 2\) and \(n \le Z \le n^2 / 2\), and the method reduces the algorithmic running time to O(LZ). The Four-Russians method, by contrast, tabulates partial results.
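The \(O(n^3)\)-time dynamic program mentioned above can be illustrated on the base-pair maximization variant. The following is a minimal sketch, not the authors' implementation: the function name `max_base_pairs`, the set of allowed pairs (Watson-Crick plus G-U wobble) and the `min_loop` parameter are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=0):
    """Maximum number of nested base pairs in seq (cubic-time dynamic program).

    Assumptions for this sketch: Watson-Crick and G-U wobble pairs are allowed,
    and min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + 1 + min_loop, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in pairs:
                    inside = best[i + 1][k - 1] if k - 1 >= i + 1 else 0
                    after = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + inside + after)
            best[i][j] = score
    return best[0][n - 1]
```

The three nested loops over `span`, `i` and `k` are the source of the cubic running time that the Four-Russians and Sparsification techniques aim to reduce.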

Results

In this paper, we explore three different algorithmic speedups. First, we expand and reformulate the single sequence folding Four-Russians \(\Theta \left(\frac{n^3}{\log ^2 n}\right)\)-time algorithm to utilize an on-demand lookup table. Second, we create a framework that combines the fastest Sparsification and new fastest on-demand Four-Russians methods. This combined method has a worst-case running time of \(O(\tilde{L}\tilde{Z})\), where \(\frac{L}{\log n} \le \tilde{L}\le \min\left(L,\frac{n}{\log n}\right)\) and \(\frac{Z}{\log n}\le \tilde{Z} \le \min\left(Z,\frac{n^2}{\log n}\right)\). Third, we update the Four-Russians formulation to achieve an on-demand \(O( n^2/ \log ^2n )\)-time parallel algorithm. This then leads to an asymptotic speedup of \(O(\tilde{L}\tilde{Z_j})\), where \(\frac{Z_j}{\log n}\le \tilde{Z_j} \le \min\left(Z_j,\frac{n}{\log n}\right)\) and \(Z_j\) is the number of subsequences with the endpoint j belonging to the optimal folding set.

Conclusions

The on-demand formulation not only removes all extraneous computation and allows us to incorporate more realistic scoring schemes, but also lets us take advantage of the sparsity properties. Through asymptotic analysis and empirical testing on the base-pair maximization variant and a more biologically informative scoring scheme, we show that this Sparse Four-Russians framework achieves a speedup on every problem instance that is asymptotically never worse, and empirically better, than that achieved by the minimum of the two methods alone.

3.

Background

Suffix arrays, augmented by additional data structures, allow solving efficiently many string processing problems. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory.

Results

In this article we present and analyze \(\mathsf {eGSA}\) [introduced in CPM (External memory generalized suffix and \(\mathsf {LCP}\) arrays construction. In: Proceedings of CPM. pp 201–10, 2013)], the first external memory algorithm to construct generalized suffix arrays augmented with the longest common prefix array for a string collection. Our algorithm relies on a combination of buffers, induced sorting and a heap to avoid direct string comparisons. We performed experiments that covered different aspects of our algorithm, including running time, efficiency, external memory access, internal phases and the influence of different optimization strategies. On real datasets of size up to 24 GB and using 2 GB of internal memory, \(\mathsf {eGSA}\) showed a competitive performance when compared to \(\mathsf {eSAIS}\) and \(\mathsf {SAscan}\), which are efficient algorithms for a single string according to the related literature. We also show the effect of disk caching managed by the operating system on our algorithm.
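To make the target data structure concrete, the sketch below builds a generalized suffix array and its longest common prefix (LCP) array for a small collection by direct sorting in memory. It only illustrates what the output looks like; it does not reflect \(\mathsf {eGSA}\) itself, which works in external memory and avoids direct string comparisons. The function name and the tie-breaking rule (equal suffixes ordered by string index) are assumptions.

```python
def generalized_suffix_array(strings):
    """Naive in-memory construction of a generalized suffix array with LCP.

    Returns (suffixes, lcp), where suffixes is a list of
    (string index, start position) pairs in lexicographic order and
    lcp[r] is the length of the longest common prefix of the suffixes
    ranked r - 1 and r (lcp[0] is 0).
    """
    suffixes = [(s, i) for s, text in enumerate(strings) for i in range(len(text))]
    # Assumed tie-break: identical suffixes are ordered by string index.
    suffixes.sort(key=lambda p: (strings[p[0]][p[1]:], p[0]))

    def common_prefix(a, b):
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        return n

    lcp = [0] * len(suffixes)
    for r in range(1, len(suffixes)):
        (s1, i1), (s2, i2) = suffixes[r - 1], suffixes[r]
        lcp[r] = common_prefix(strings[s1][i1:], strings[s2][i2:])
    return suffixes, lcp
```

For the collection `["ab", "b"]` this yields the suffix order `(0, 0)`, `(0, 1)`, `(1, 0)` with LCP values `0, 0, 1`. Sorting materialized suffixes is far too slow and memory-hungry for inputs in the gigabyte range, which is the gap an external memory algorithm addresses.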

Conclusions

The proposed algorithm was validated through performance tests using real datasets from different domains, in various combinations, and showed a competitive performance. Our algorithm can also construct the generalized Burrows-Wheeler transform of a string collection with no additional cost except for the output time.

4.

Introduction

The Elongator complex, comprising six subunits (Elp1p-Elp6p), is required for formation of 5-carbamoylmethyl (ncm5) and 5-methoxycarbonylmethyl (mcm5) side chains on wobble uridines in 11 out of 42 tRNA species in Saccharomyces cerevisiae. Loss of these side chains reduces the efficiency of tRNA decoding during translation, resulting in pleiotropic phenotypes. Overexpression of hypomodified \( {\text {tRNA}_{{\rm s^{2} {\rm UUU}}}^{{\rm Lys}} , {\rm tRNA}_{{\rm s^{2} {\rm UUG}}}^{{\rm Gln }} \;{\rm and}\;{\rm tRNA}_{{\rm s^{2} {\rm UUC}}}^{{\rm Glu}}} \), which in wild-type strains are modified with mcm5s2U, partially suppresses phenotypes of an elp3Δ strain.

Objectives

To identify metabolic alterations in an elp3Δ strain and elucidate whether these metabolic alterations are suppressed by overexpression of hypomodified \( {\text {tRNA}_{{\rm s^{2} {\rm UUU}}}^{{\rm Lys}} , {\rm tRNA}_{{\rm s^{2} {\rm UUG}}}^{{\rm Gln }} \;{\rm and}\;{\rm tRNA}_{{\rm s^{2} {\rm UUC}}}^{{\rm Glu}}} \).

Method

Metabolic profiles were obtained using untargeted GC-TOF-MS of a temperature-sensitive elp3Δ strain carrying either an empty low-copy vector, an empty high-copy vector, a low-copy vector harboring the wild-type ELP3 gene, or a high-copy vector overexpressing \( {\text {tRNA}_{{\rm s^{2} {\rm UUU}}}^{{\rm Lys}} , {\rm tRNA}_{{\rm s^{2} {\rm UUG}}}^{{\rm Gln }} \;{\rm and}\;{\rm tRNA}_{{\rm s^{2} {\rm UUC}}}^{{\rm Glu}}} \). The temperature sensitive elp3Δ strain derivatives were cultivated at permissive (30 °C) or semi-permissive (34 °C) growth conditions.

Results

Culturing an elp3Δ strain at 30 or 34 °C resulted in altered metabolism of 36 and 46 %, respectively, of all metabolites detected when compared to an elp3Δ strain carrying the wild-type ELP3 gene. Overexpression of hypomodified \( {\text {tRNA}_{{\rm s^{2} {\rm UUU}}}^{{\rm Lys}} , {\rm tRNA}_{{\rm s^{2} {\rm UUG}}}^{{\rm Gln }} \;{\rm and}\;{\rm tRNA}_{{\rm s^{2} {\rm UUC}}}^{{\rm Glu}}} \) suppressed a subset of the metabolic alterations observed in the elp3Δ strain.

Conclusion

Our results suggest that the presence of ncm5- and mcm5-side chains on wobble uridines in tRNA is important for metabolic homeostasis.

5.
Zeng Chao, Hamada Michiaki. BMC Genomics, 2018, 19(10):906-49

Background

With the increasing number of annotated long noncoding RNAs (lncRNAs) from the genome, researchers are continually updating their understanding of lncRNAs. Recently, thousands of lncRNAs have been reported to be associated with ribosomes in mammals. However, their biological functions or mechanisms are still unclear.

Results

In this study, we investigated the sequence features involved in the ribosomal association of lncRNAs. We extracted ninety-nine sequence features corresponding to different biological mechanisms (i.e., RNA splicing, putative ORF, k-mer frequency, RNA modification, RNA secondary structure, and repeat element). An \(\mathcal{L}_1\)-regularized logistic regression model was applied to screen these features. Finally, we obtained fifteen and nine important features for the ribosomal association of human and mouse lncRNAs, respectively.

Conclusion

To our knowledge, this is the first study to characterize ribosome-associated lncRNAs and ribosome-free lncRNAs from the perspective of sequence features. These sequence features that were identified in this study may shed light on the biological mechanism of the ribosomal association and provide important clues for functional analysis of lncRNAs.

6.

Main conclusion

Starch granule size distributions in plant tissues, when determined in high resolution and specified properly as a frequency function, could provide useful information on granule formation and growth.

Abstract

To better understand genetic control of physical properties of starch granules, we attempted a new approach to analyze developmental and genotypic effects on morphology and size distributions of starch granules in sweetpotato storage roots. Starch granules in sweetpotatoes exhibited low sphericity, many shapes that appeared to be independent of genotypes or developmental stages, and non-randomly distributed sizes. Granule size distributions of sweetpotato starches were determined in high resolution as differential volume-percentage distributions of volume-equivalent spherical diameters, rigorously curve-fitted to be lognormal, and specified using their geometric means \(\bar{x}^{*}\) and multiplicative standard deviations \(s^{*}\) in a \(\bar{x}^{*} \times\!/\, s^{*}\) (multiply/divide) form. The scale (\(\bar{x}^{*}\)) and shape (\(s^{*}\)) of these distributions were independently variable, ranging from 14.02 to 19.36 μm and 1.403 to 1.567, respectively, among 22 cultivars/clones. The shape (\(s^{*}\)) of granule lognormal volume-size distributions of sweetpotato starch was found to be highly significantly and inversely correlated with apparent amylose content. More importantly, granule lognormal volume-size distributions of starches in developing sweetpotatoes displayed the same self-preserving kinetics, i.e., preserving the shape but shifting upward the scale, as those of particles undergoing agglomeration, which strongly indicated involvement of agglomeration in the formation and growth of starch granules. Furthermore, QTL analysis of a segregating null allele at one of three homoeologous starch synthase II loci in a reciprocal-cross population, which was identified through profiling starch granule-bound proteins in sweetpotatoes of diverse genotypes, showed that the locus is a QTL modulating the scale of granule volume-size distributions of starch in sweetpotatoes.
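The "multiply/divide" specification of a lognormal distribution can be illustrated with a short calculation. The sketch below computes a geometric mean \(\bar{x}^{*}\) and multiplicative standard deviation \(s^{*}\) from diameters with optional weights (for example, volume percentages per size bin). The function name, the weighting scheme and the use of the population variance are assumptions for illustration; the study itself obtained these parameters by curve fitting.

```python
import math

def lognormal_summary(diameters, weights=None):
    """Return (x_star, s_star, interval) for positive diameters.

    x_star is the geometric mean, s_star the multiplicative standard
    deviation, and interval is (x_star / s_star, x_star * s_star).
    Assumption: weights, if given, are non-negative bin weights such as
    volume percentages; unweighted data use equal weights.
    """
    if weights is None:
        weights = [1.0] * len(diameters)
    total = sum(weights)
    logs = [math.log(d) for d in diameters]
    mu = sum(w * l for w, l in zip(weights, logs)) / total
    var = sum(w * (l - mu) ** 2 for w, l in zip(weights, logs)) / total
    x_star = math.exp(mu)
    s_star = math.exp(math.sqrt(var))
    return x_star, s_star, (x_star / s_star, x_star * s_star)
```

For a lognormal distribution, the interval from \(\bar{x}^{*}/s^{*}\) to \(\bar{x}^{*} \cdot s^{*}\) covers roughly 68% of the distribution, which is why the scale and shape can be reported and compared independently.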

7.
We developed a dynamic model of a rat proximal convoluted tubule cell in order to investigate cell volume regulation mechanisms in this nephron segment. We examined whether regulatory volume decrease (RVD), which follows exposure to a hyposmotic peritubular solution, can be achieved solely via stimulation of basolateral \(\hbox{K}^+\) and \(\hbox{Cl}^-\) channels and \(\hbox{Na}^+\)-\(\hbox{HCO}_3^-\) cotransporters. We also determined whether regulatory volume increase (RVI), which follows exposure to a hyperosmotic peritubular solution under certain conditions, may be accomplished by activating basolateral \(\hbox{Na}^+\)/\(\hbox{H}^+\) exchangers. Model predictions were in good agreement with experimental observations in mouse proximal tubule cells assuming that a 10% increase in cell volume induces a fourfold increase in the expression of basolateral \(\hbox{K}^+\) and \(\hbox{Cl}^-\) channels and \(\hbox{Na}^+\)-\(\hbox{HCO}_3^-\) cotransporters. Our results also suggest that in response to a hyposmotic challenge and subsequent cell swelling, \(\hbox{Na}^+\)-\(\hbox{HCO}_3^-\) cotransporters are more efficient than basolateral \(\hbox{K}^+\) and \(\hbox{Cl}^-\) channels at lowering intracellular osmolality and reducing cell volume. Moreover, both RVD and RVI are predicted to stabilize net transcellular \(\hbox{Na}^+\) reabsorption, that is, to limit the net \(\hbox{Na}^+\) flux decrease during a hyposmotic challenge or the net \(\hbox{Na}^+\) flux increase during a hyperosmotic challenge.

8.

Introduction

To aid the development of better algorithms for \(^1\)H NMR data analysis, such as alignment or peak-fitting, it is important to characterise and model chemical shift changes caused by variation in pH. The number of protonation sites, a key parameter in the theoretical relationship between pH and chemical shift, is traditionally estimated from the molecular structure, which is often unknown in untargeted metabolomics applications.

Objective

We aim to use observed NMR chemical shift titration data to estimate the number of protonation sites for a range of urinary metabolites.

Methods

A pool of urine from healthy subjects was titrated in the range pH 2–12, standard \(^1\)H NMR spectra were acquired and positions of 51 peaks (corresponding to 32 identified metabolites) were recorded. A theoretical model of chemical shift was fit to the data using a Bayesian statistical framework, using model selection procedures in a Markov Chain Monte Carlo algorithm to estimate the number of protonation sites for each molecule.
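For orientation, the standard textbook relationship for a single protonation site under fast exchange is shown below; the abstract does not state the exact model used, so this form and its symbols (\(\delta_{\mathrm{HA}}\) and \(\delta_{\mathrm{A}}\) for the limiting shifts of the protonated and deprotonated forms) are assumptions for illustration.

```latex
\[
  \delta_{\mathrm{obs}}(\mathrm{pH})
  = \frac{\delta_{\mathrm{HA}} + \delta_{\mathrm{A}}\, 10^{\,\mathrm{pH} - \mathrm{p}K_a}}
         {1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}}
\]
```

At pH well below \(\mathrm{p}K_a\) the observed shift approaches \(\delta_{\mathrm{HA}}\), and well above it approaches \(\delta_{\mathrm{A}}\). Each additional protonation site adds another sigmoidal transition, so the number of transitions visible in a titration curve is what a model selection procedure can estimate.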

Results

The estimated number of protonation sites was found to be correct for 41 out of 51 peaks. In some cases, the number of sites was incorrectly estimated, due to very close pKa values or a limited amount of data in the required pH range.

Conclusions

Given appropriate data, it is possible to estimate the number of protonation sites for many metabolites typically observed in \(^1\)H NMR metabolomics without knowledge of the molecular structure. This approach may be a valuable resource for the development of future automated metabolite alignment, annotation and peak fitting algorithms.

9.
Computational modelling has received increasing attention as a means to investigate multi-scale coupled problems in micro-heterogeneous biological structures such as cells. In the current study, we investigated for a single cell the effects of (1) different cell-substrate attachment and (2) different substrate modulus \(E_\mathrm{s}\) on intracellular deformations. A fibroblast was geometrically reconstructed from confocal micrographs. Finite element models of the cell on a planar substrate were developed. Intracellular deformations due to a substrate stretch of \(\lambda =1.1\) were assessed for: (1) cell-substrate attachment implemented as full basal contact (FC) and 124 focal adhesions (FA), respectively, with \(E_\mathrm{s} = 140\) kPa and (2) \(E_\mathrm{s} = 10\), 140, 1000, and 10,000 kPa, respectively, with FA attachment. The largest strains in cytosol, nucleus and cell membrane were higher for FC (\(1.35\times 10^{-2}\), \(0.235\times 10^{-2}\) and \(0.6\times 10^{-2}\)) than for FA attachment (\(0.0952\times 10^{-2}\), \(0.0472\times 10^{-2}\) and \(0.05\times 10^{-2}\)). For increasing \(E_\mathrm{s}\), the largest maximum principal strain was \(4.4\times 10^{-4}\), \(5\times 10^{-4}\), \(5.3\times 10^{-4}\) and \(5.3\times 10^{-4}\) in the membrane, \(9.5\times 10^{-4}\), \(1.1\times 10^{-4}\), \(1.2\times 10^{-3}\) and \(1.2\times 10^{-3}\) in the cytosol, and \(4.5\times 10^{-4}\), \(5.3\times 10^{-4}\), \(5.7\times 10^{-4}\) and \(5.7\times 10^{-4}\) in the nucleus. The results show (1) the importance of representing FA in cell models and (2) higher cellular mechanical sensitivity to substrate stiffness changes in the range of cell stiffness. The latter indicates that matching substrate stiffness to cell stiffness, and moderate variation of the former, is very effective for controlled variation of cell deformation.
The developed methodology is useful for parametric studies on cellular mechanics to obtain quantitative data of subcellular strains and stresses that cannot easily be measured experimentally.

10.

Background

In this work, we present a new coarse-grained representation of RNA dynamics. It is based on adjacency matrices and their interaction patterns obtained from molecular dynamics simulations. RNA molecules are well-suited for this representation due to their composition, which is mainly modular and assessable by the secondary structure alone. These interactions can be represented as adjacency matrices of k nucleotides. Based on those, we define transitions between states as changes in the adjacency matrices, which form Markovian dynamics. The intense computational demand for deriving the transition probability matrices prompted us to develop StreAM-\(T_g\), a stream-based algorithm for generating such Markov models of k-vertex adjacency matrices representing the RNA.

Results

We benchmark StreAM-\(T_g\) (a) for random and RNA unit sphere dynamic graphs and (b) for the robustness of our method against different parameters. Moreover, we address a riboswitch design problem by applying StreAM-\(T_g\) to six long-term molecular dynamics simulations of a synthetic tetracycline-dependent riboswitch (500 ns) in combination with five different antibiotics.

Conclusions

The proposed algorithm performs well on large simulated as well as real world dynamic graphs. Additionally, StreAM-\(T_g\) provides insights into nucleotide based RNA dynamics in comparison to conventional metrics like the root-mean square fluctuation. In the light of experimental data our results show important design opportunities for the riboswitch.

11.

Background

Patterns with wildcards in specified positions, namely spaced seeds, are increasingly used instead of k-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require computing the hash of each position in the input sequences with respect to a given spaced seed, or to multiple spaced seeds. While the hashing of k-mers can be rapidly computed by exploiting the large overlap between consecutive k-mers, spaced seed hashing is usually computed from scratch for each position in the input sequence, thus resulting in slower processing.

Results

The method proposed in this paper, fast spaced-seed hashing (FSH), exploits the similarity of the hash values of spaced seeds computed at adjacent positions in the input sequence. In our experiments we compute the hash for each position of metagenomics reads from several datasets, with respect to different spaced seeds. We also propose a generalized version of the algorithm for the simultaneous computation of multiple spaced seed hashes. In the experiments, our algorithm computes the hash values of spaced seeds with a speedup, with respect to the traditional approach, of between 1.6\(\times\) and 5.3\(\times\), depending on the structure of the spaced seed.
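The "traditional approach" that serves as the baseline can be sketched as follows: every window is hashed from scratch using only the positions where the seed has a `1`. This is a minimal illustration, not the FSH algorithm; the 2-bit nucleotide encoding, the bit order and the function name are assumptions.

```python
# Assumed 2-bit encoding of nucleotides (hypothetical, for illustration only).
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}

def spaced_seed_hashes(sequence, seed):
    """Hash every window of sequence under a spaced seed, from scratch.

    seed is a string over {'0', '1'}; '1' marks a position that contributes
    to the hash and '0' marks a wildcard position that is ignored.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    hashes = []
    for start in range(len(sequence) - len(seed) + 1):
        h = 0
        for offset in care:
            # Append the 2-bit code of each non-wildcard position.
            h = (h << 2) | ENCODING[sequence[start + offset]]
        hashes.append(h)
    return hashes
```

For example, `spaced_seed_hashes("ACGT", "101")` returns `[2, 7]`. Each window costs time proportional to the number of `1` positions in the seed, whereas for a contiguous k-mer the next hash can be derived from the previous one in constant time; reusing work between adjacent positions of a spaced seed is the idea the abstract describes.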

Conclusions

Spaced seed hashing is a routine task for several bioinformatics applications. FSH performs this task efficiently and raises the question of whether other hashing schemes can be exploited to further improve the speedup. This has the potential for major impact in the field, making spaced seed applications not only accurate, but also faster and more efficient.

Availability

The software FSH is freely available for academic use at: https://bitbucket.org/samu661/fsh/overview.

12.

Key message

Using landraces for broadening the genetic base of elite maize germplasm is hampered by heterogeneity and high genetic load. Production of DH line libraries can help to overcome these problems.

Abstract

Landraces of maize (Zea mays L.) represent a huge reservoir of genetic diversity largely untapped by breeders. Genetic heterogeneity and a high genetic load hamper their use in hybrid breeding. Production of doubled haploid line libraries (DHL) by the in vivo haploid induction method promises to overcome these problems. To test this hypothesis, we compared the line per se performance of 389 doubled haploid (DH) lines across six DHL produced from European flint landraces with that of four flint founder lines (FFL) and 53 elite flint lines (EFL) for 16 agronomic traits evaluated in four locations. The genotypic variance (\(\sigma _{G}^{2}\)) within DHL was generally much larger than that among DHL and exceeded also \(\sigma _{G}^{2}\) of the EFL. For most traits, the means and \(\sigma _{G}^{2}\) differed considerably among the DHL, resulting in different expected selection gains. Mean grain yield of the EFL was 25 and 62% higher than for the FFL and DHL, respectively, indicating considerable breeding progress in the EFL and a remnant genetic load in the DHL. Usefulness of the best 20% lines was for individual DHL comparable to the EFL and grain yield (GY) in the top lines from both groups was similar. Our results corroborate the tremendous potential of landraces for broadening the narrow genetic base of elite germplasm. To make best use of these “gold reserves”, we propose a multi-stage selection approach with optimal allocation of resources to (1) choose the most promising landraces for DHL production and (2) identify the top DH lines for further breeding.

13.
Aberrant NSD2 methyltransferase activity is implicated as the oncogenic driver in multiple myeloma, suggesting opportunities for novel therapeutic intervention. The methyltransferase activity of NSD2 resides in its catalytic SET domain, which is conserved among most lysine methyltransferases. Here we report the backbone \(\hbox {H}^{\mathrm{N}}\), N, C\(^{\prime }\), \(\hbox {C}^\alpha\) and side-chain \(\hbox {C}^\beta\) assignments of a 25 kDa NSD2 SET domain construct, spanning residues 991–1203. A chemical shift analysis of C\(^{\prime }\), \(\hbox {C}^\alpha\) and \(\hbox {C}^\beta\) resonances predicts a secondary structural pattern that is in agreement with homology models.

14.
15.

Background

Mathematical modeling is a powerful tool to analyze, and ultimately design, biochemical networks. However, the estimation of the parameters that appear in biochemical models is a significant challenge. Parameter estimation typically involves expensive function evaluations and noisy data, making it difficult to quickly obtain optimal solutions. Further, biochemical models often have many local extrema, which further complicates parameter estimation. To address these challenges, we developed Dynamic Optimization with Particle Swarms (DOPS), a novel hybrid meta-heuristic that combines multi-swarm particle swarm optimization with dynamically dimensioned search (DDS). DOPS uses a multi-swarm particle swarm optimization technique to generate candidate solution vectors, the best of which is then greedily updated using dynamically dimensioned search.

Results

We tested DOPS using classic optimization test functions, biochemical benchmark problems and real-world biochemical models. We performed \(\mathcal {T}\) = 25 trials with \(\mathcal {N}\) = 4000 function evaluations per trial, and compared the performance of DOPS with other commonly used meta-heuristics such as differential evolution (DE), simulated annealing (SA) and dynamically dimensioned search (DDS). On average, DOPS outperformed other common meta-heuristics on the optimization test functions, benchmark problems and a real-world model of the human coagulation cascade.

Conclusions

DOPS is a promising meta-heuristic approach for the estimation of biochemical model parameters in relatively few function evaluations. DOPS source code is available for download under an MIT license at http://www.varnerlab.org.

16.
We develop a mathematical model of a salivary gland acinar cell with the objective of investigating the role of two \(\mathrm{Cl}^-/\mathrm{HCO}_3^-\) exchangers from the solute carrier family 4 (Slc4), Ae2 (Slc4a2) and Ae4 (Slc4a9), in fluid secretion. Water transport in this type of cell is predominantly driven by \(\mathrm{Cl}^-\) movement. Here, a basolateral \(\mathrm{Na}^+/ \mathrm{K}^+\) adenosine triphosphatase pump (NaK-ATPase) and a \(\mathrm{Na}^+\)-\(\mathrm{K}^+\)-\(2 \mathrm{Cl}^-\) cotransporter (Nkcc1) are primarily responsible for concentrating the intracellular space with \(\mathrm{Cl}^-\) well above its equilibrium potential. Gustatory and olfactory stimuli induce the release of \(\mathrm{Ca}^{2+}\) ions from the internal stores of acinar cells, which triggers saliva secretion. \(\mathrm{Ca}^{2+}\)-dependent \(\mathrm{Cl}^-\) and \(\mathrm{K}^+\) channels promote ion secretion into the luminal space, thus creating an osmotic gradient that promotes water movement in the secretory direction. The current model for saliva secretion proposes that \(\mathrm{Cl}^-/ \mathrm{HCO}_3^-\) anion exchangers (Ae), coupled with a basolateral \(\mathrm{Na}^+/\hbox {proton}\) (\(\hbox {H}^+\)) (Nhe1) antiporter, regulate intracellular pH and act as a secondary \(\mathrm{Cl}^-\) uptake mechanism (Nauntofte in Am J Physiol Gastrointest Liver Physiol 263(6):G823–G837, 1992; Melvin et al. in Annu Rev Physiol 67:445–469, 2005.  https://doi.org/10.1146/annurev.physiol.67.041703.084745). Recent studies demonstrated that Ae4-deficient mice exhibit an approximate \(30\%\) decrease in gland salivation (Peña-Münzenmayer et al. in J Biol Chem 290(17):10677–10688, 2015). Surprisingly, the same study revealed that absence of Ae2 does not impair salivation, as previously suggested. These results seem to indicate that Ae4 may be responsible for the majority of the secondary \(\mathrm{Cl}^-\) uptake and thus a key mechanism for saliva secretion.
Here, by using ‘in-silico’ Ae2 and Ae4 knockout simulations, we produced mathematical support for such controversial findings. Our results suggest that the exchanger’s cotransport of monovalent cations is likely to be important in establishing the osmotic gradient necessary for optimal transepithelial fluid movement.

17.

Background

Superbubbles are distinctive subgraphs in directed graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact that they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of importance for the processing of HTS data. Superbubbles can be identified within the strongly connected components of the input digraph after transforming them into directed acyclic graphs. The algorithm by Sung et al. (IEEE ACM Trans Comput Biol Bioinform 12:770–777, 2015) achieves this task in \(\mathcal{O}(m \log m)\)-time. The extraction of superbubbles from the transformed components was later improved by Brankovic et al. (Theor Comput Sci 609:374–383, 2016), resulting in an overall \(\mathcal {O}(m+n)\)-time algorithm.

Results

A re-analysis of the mathematical structure of superbubbles showed that the construction of auxiliary DAGs from the strongly connected components in the work of Sung et al. missed some details that can lead to the reporting of false positive superbubbles. We propose an alternative, even simpler auxiliary graph that solves the problem and retains the linear running time for general digraphs. Furthermore, we describe a simpler, space-efficient \(\mathcal {O}(m+n)\)-time algorithm for detecting superbubbles in DAGs that uses only simple data structures.

Implementation

We present a reference implementation of the algorithm that accepts many commonly used formats for the input graph and provides convenient access to the improved algorithm. https://github.com/Fabianexe/Superbubble.

18.

Background

One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements, that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form \(\pi =(\pi _1 \pi _2 \ldots \pi _n)\), and in that case we can consider without loss of generality that one of them is the identity permutation \(\iota _n =(1 2 \ldots n)\), and that we just need to sort the other (i.e., transform it into \(\iota _n\)). The most studied genome rearrangement events are reversals, where a segment of the genome is reversed and reincorporated at the same location; and transpositions, where two consecutive segments are exchanged. Many variants, e.g., combining different types of (possibly constrained) rearrangements, have been proposed in the literature. One of them considers that the number of genes involved, in a reversal or a transposition, is never greater than two, which is known as the problem of sorting by super short operations (or SSOs).
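The operations described above can be made concrete with a small sketch on linear signed permutations stored as Python lists. The function names and index conventions are assumptions for illustration, and the circular case is not handled here. A super short reversal reverses one or two adjacent genes and flips their signs; a super short transposition exchanges two adjacent genes and leaves signs unchanged.

```python
def super_short_reversal(perm, i, length):
    """Reverse `length` (1 or 2) genes starting at index i, flipping signs."""
    if length not in (1, 2):
        raise ValueError("a super short reversal involves at most two genes")
    if i < 0 or i + length > len(perm):
        raise IndexError("segment lies outside the permutation")
    segment = [-g for g in reversed(perm[i:i + length])]
    return perm[:i] + segment + perm[i + length:]

def super_short_transposition(perm, i):
    """Exchange the adjacent genes at indices i and i + 1; signs are kept."""
    if i < 0 or i + 1 >= len(perm):
        raise IndexError("segment lies outside the permutation")
    return perm[:i] + [perm[i + 1], perm[i]] + perm[i + 2:]
```

For instance, `super_short_reversal([2, -1, 3], 0, 2)` gives `[1, -2, 3]`, and a further reversal of the single gene at index 1 yields the identity `[1, 2, 3]`, so this permutation can be sorted with two super short operations.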

Results and conclusions

All problems considering SSOs in permutations have been shown to be in \(\mathsf {P}\), except for one, namely sorting signed circular permutations by super short reversals and super short transpositions. Here we fill this gap by introducing a new graph structure called cyclic permutation graph and providing a series of intermediate results, which allows us to design a polynomial algorithm for sorting signed circular permutations by super short reversals and super short transpositions.

19.
Previous genome-wide association studies (GWAS) and meta-analyses have enumerated several genes/loci in the major histocompatibility complex region that are consistently associated with rheumatoid arthritis (RA) in different ethnic populations. Given the genetic heterogeneity of the disease, it is necessary to replicate these susceptibility loci in other populations. Here, we investigate the association of two SNPs, rs13192471 and rs6457617, from the human leukocyte antigen (HLA) region with the risk of RA in a Tunisian population. These SNPs were previously identified as having a strong RA association signal in several GWAS. A case–control sample composed of 142 RA patients and 123 healthy controls was analysed. Genotyping of rs13192471 and rs6457617 was carried out by real-time PCR using the TaqMan allelic discrimination assay. A trend towards significant association was found for the rs6457617 TT genotype with susceptibility to RA (\(P = 0.04\), \(p_{c} = 0.08\), \(\hbox {OR} = 1.73\)). Moreover, in multivariable analysis, the combination rs6457617*TT–HLA-DRB1*\(04^{+}\) increased the risk of RA (\(\hbox {OR} = 2.38\)), which suggests a gene–gene interaction between rs6457617, located within HLA-DQB1, and HLA-DRB1. Additionally, haplotypic analysis highlighted a significant association of the rs6457617*T–HLA-DRB1*\(04^{+}\) haplotype with susceptibility to RA (\(P = 0.018\), \(p_{c} = 0.036\), \(\hbox {OR} = 1.72\)). Evidence of association was subsequently shown in the \(\hbox {antiCCP}^{+}\) subgroup for rs6457617, both for the T allele and the TT genotype (\(P = 0.01\), \(p_{c} = 0.03\), \(\hbox {OR} = 1.66\) and \(P = 0.008\), \(p_{c} = 0.024\), \(\hbox {OR} = 1.28\), respectively). However, no association was shown for the rs13192471 polymorphism with susceptibility to or severity of RA. This study suggests the involvement of the rs6457617 locus as a risk variant for susceptibility to and severity of RA in the Tunisian population.
Secondly, it highlights the gene–gene interaction between HLA-DQB1 and HLA-DRB1.

20.
We prove almost sure exponential stability for the disease-free equilibrium of a stochastic differential equations model of an SIR epidemic with vaccination. The model allows for vertical transmission. The stochastic perturbation is associated with the force of infection and is such that the total population size remains constant in time. We prove almost sure positivity of solutions. The main result concerns especially the smaller values of the diffusion parameter, and describes the stability in terms of an analogue \(\mathcal{R}_\sigma\) of the basic reproduction number \(\mathcal{R}_0\) of the underlying deterministic model, with \(\mathcal{R}_\sigma \le \mathcal{R}_0\). We prove that the disease-free equilibrium is almost surely exponentially stable if \(\mathcal{R}_\sigma <1\).
