Similar literature
1.

Background

As Next-Generation Sequencing data become available, existing hardware environments do not provide sufficient storage space and computational power to store and process the data because of their enormous size. This is, and will remain, a problem that researchers working on genetic data encounter every day. Options for compressing and storing such data include general-purpose compression software and the PBAT/PLINK binary format. However, these currently available methods either do not offer sufficient compression rates or require a great amount of CPU time for decompression and loading every time the data are accessed.

Results

Here, we propose a novel and simple algorithm for storing such sequencing data. We show that the compression factor of the algorithm ranges from 16 to several hundred, which potentially allows SNP data of hundreds of gigabytes to be stored in hundreds of megabytes. We provide a C++ implementation of the algorithm, which supports direct loading and parallel loading of the compressed format without requiring extra time for decompression. By applying the algorithm to simulated and real datasets, we show that it gives a greater compression rate than the commonly used compression methods and that the data-loading process takes less time. In addition, the C++ library provides direct data-retrieval functions, which allow the compressed information to be accessed easily by other C++ programs.

Conclusions

The SpeedGene algorithm enables the storage and analysis of next-generation sequencing data in current hardware environments, making system upgrades unnecessary.
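The abstract above does not describe the SpeedGene encoding itself, so the following is only a generic sketch of one common idea behind compact genotype storage: packing each SNP genotype into 2 bits and reading single values straight from the packed bytes. The function names and the `MISSING` code are hypothetical and are not taken from the SpeedGene library.

```python
# Illustrative only: generic 2-bit packing of SNP genotypes.
# This is NOT the SpeedGene format; all names here are hypothetical.
MISSING = 3  # assumed code for a missing genotype


def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2 or MISSING) four to a byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, MISSING):
            raise ValueError(f"invalid genotype code: {g}")
        # Each genotype occupies 2 bits at offset 2 * (i mod 4) in its byte.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def get_genotype(packed, i):
    """Read genotype i directly from the packed bytes, without unpacking the rest."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


genotypes = [0, 1, 2, 3, 2, 2, 0, 1, 1]
packed = pack_genotypes(genotypes)
restored = [get_genotype(packed, i) for i in range(len(genotypes))]
```

Because any genotype can be read with one shift and one mask, a layout of this kind needs no separate decompression pass before the data are used.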

2.
3.

Background  

The conventional superposition methods use an ordinary least squares (LS) fit for structural comparison of two different conformations of the same protein. The main problem of the LS fit is that it is sensitive to outliers, i.e. large displacements of the original structures superimposed.
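The outlier sensitivity of an LS fit can be illustrated with a small sketch. It assumes a simplified 2D setting, where the optimal rotation has a closed form, rather than the 3D protein case, and the coordinates are made up for illustration. One displaced point is enough to spread error over every other point.

```python
import math


def ls_superpose_2d(mobile, target):
    """Least-squares rigid fit (translation + rotation) of mobile onto target in 2D."""
    n = len(mobile)
    mcx = sum(p[0] for p in mobile) / n
    mcy = sum(p[1] for p in mobile) / n
    tcx = sum(p[0] for p in target) / n
    tcy = sum(p[1] for p in target) / n
    dot = cross = 0.0
    for (mx, my), (tx, ty) in zip(mobile, target):
        ax, ay = mx - mcx, my - mcy
        bx, by = tx - tcx, ty - tcy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # Angle that minimises the sum of squared distances after centring.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * (mx - mcx) - s * (my - mcy) + tcx,
             s * (mx - mcx) + c * (my - mcy) + tcy) for mx, my in mobile]


def deviations(fitted, target):
    """Per-point distances between fitted and target coordinates."""
    return [math.hypot(fx - tx, fy - ty) for (fx, fy), (tx, ty) in zip(fitted, target)]


target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]

# Case 1: the same shape, merely rotated by 30 degrees and shifted.
c30, s30 = math.cos(math.radians(30)), math.sin(math.radians(30))
moved = [(c30 * x - s30 * y + 5.0, s30 * x + c30 * y - 2.0) for x, y in target]
clean_dev = deviations(ls_superpose_2d(moved, target), target)

# Case 2: identical except that the last point is displaced (the outlier).
with_outlier = target[:4] + [(4.0, 3.0)]
outlier_dev = deviations(ls_superpose_2d(with_outlier, target), target)
```

In the second case the four unchanged points no longer superimpose exactly, because the fit tilts the whole structure to reduce the large squared error of the single outlier.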

4.
A new software package, RASPA, for simulating adsorption and diffusion of molecules in flexible nanoporous materials is presented. The code implements the latest state-of-the-art algorithms for molecular dynamics and Monte Carlo (MC) in various ensembles including symplectic/measure-preserving integrators, Ewald summation, configurational-bias MC, continuous fractional component MC, reactive MC and Baker's minimisation. We show example applications of RASPA in computing coexistence properties, adsorption isotherms for single and multiple components, self- and collective diffusivities, reaction systems and visualisation. The software is released under the GNU General Public License.

5.
Models of fibers and capillaries in cross sections of muscle were used to quantify the relationships between diffusion distances and tissue capillarity. The fibers were constructed as square and hexagonal arrays, and the placement of capillaries around the perimeters of the fibers ordered them in similar arrays. Diffusion distances were measured as the percent cumulative frequency of fiber area within a given distance of a capillary when capillary-to-fiber ratio was increased from 0.5 to 4.0. Equations fitted to the data make it possible to estimate diffusion distances in muscle and to correlate changes in diffusion distances with fiber growth, capillary growth, and the geometrical arrangement of capillaries in the muscle bed.
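The measurement described above, the cumulative fraction of fiber area lying within a given distance of a capillary, can be sketched numerically. The layout below (a unit-square fiber with capillaries at its corners, optionally also at its edge midpoints) and the grid-sampling approach are assumptions made for illustration; they are not the authors' model or fitted equations.

```python
import math


def area_fraction_within(capillaries, distance, side=1.0, n=200):
    """Fraction of a square fiber's area within `distance` of any capillary.

    The fiber is sampled on an n x n grid of cell midpoints.
    """
    step = side / n
    inside = 0
    for i in range(n):
        x = (i + 0.5) * step
        for j in range(n):
            y = (j + 0.5) * step
            nearest = min(math.hypot(x - cx, y - cy) for cx, cy in capillaries)
            if nearest <= distance:
                inside += 1
    return inside / (n * n)


# Hypothetical layouts on a unit-square fiber.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
edge_midpoints = [(0.5, 0.0), (0.5, 1.0), (0.0, 0.5), (1.0, 0.5)]

frac_corners = area_fraction_within(corners, 0.5)
frac_sparse = area_fraction_within(corners, 0.3)
frac_dense = area_fraction_within(corners + edge_midpoints, 0.3)
```

For the corner-only layout and a distance of half the fiber side, the covered region is four quarter-circles, so the sampled fraction should come out close to π/4. Adding capillaries at the edge midpoints raises the fraction covered at any given distance.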

6.
7.
1. Tissue capillarity in muscle was modelled as square-ordered arrays with capillary-to-fiber ratios (C/F) from 0.5 to 'infinity'. 2. C/F up to two had marked effects on diffusion distances, but C/F above this value had only slight effects on average distances and almost no effect on maximal distances. 3. Capillary growth during normal maturation results in C/F around two. Thus, capillary growth in adult muscle may not be an adaptive mechanism for reducing diffusion distances.

8.
Differential equations are derived whose solution gives the cross-sectional shape of a flexible tube as a function of the transmural pressure. These equations are solved digitally to produce a series of closed curves, each curve representing the shape of a cross section for a particular set of conditions. These are then applied to the case of systemic arteries, pulmonary arteries, and large veins. The results predict that systemic arteries must always be circular, even when the internal and external pressures are equal. In veins, a small positive internal pressure causes them to become circular, regardless of their initial state, with negligible stretching. Further increases in internal pressure cause the area of the cross section to increase due only to stretching, the shape remaining essentially circular. With pulmonary arteries, known to be noncircular, changes in the cross-sectional area result from a combination of stretching and changes of shape. Presented at the Society for Mathematical Biology Meeting, University of Pennsylvania, Philadelphia, August 19–21, 1976.

9.
Summary: Tissue capillarity and diffusion distances were determined for red and white skeletal muscles of adult birds ranging in mass from 10.8 to 6200 g. In addition, literature values for capillarity and diffusion distances in skeletal muscles of mammals were incorporated into the data set. Muscle mass was closely coupled to body mass. However, no significant allometric relations were found for any of the other variables measured. Number of capillaries per fiber was not correlated with cross sectional area of individual muscle fibers. Thus, capillary density decreased in a hyperbolic manner against fiber area and diffusion distance decreased in a hyperbolic manner against the number of capillaries per muscle fiber. Red muscles had significantly higher numbers of capillaries per fiber and significantly shorter diffusion distances than did white muscles. The patterns for tissue capillarity and diffusion distances in avian muscle reported here are similar to values reported previously for mammalian muscles. In both taxonomic groups capillarity and diffusion distances are independent of body mass. In addition, diffusion distances are characteristic of capillaries distributed in random arrays through the muscle cross section.
Abbreviations: ALD, muscle anterior latissimus dorsi; CD, numerical density of capillaries in muscle cross section; C/F, number of capillaries per individual muscle fiber; FCSA, fiber cross sectional area; GST, muscle gastrocnemius; LGST, lateral head of muscle gastrocnemius; MGST, medial head of muscle gastrocnemius; MM, muscle mass; PLD, muscle posterior latissimus dorsi.

10.
Finding optimal three-dimensional molecular configurations based on a limited amount of experimental and/or theoretical data requires efficient nonlinear optimization algorithms. Optimization methods must be able to find atomic configurations that are close to the absolute, or global, minimum error and also satisfy known physical constraints such as minimum separation distances between atoms (based on van der Waals interactions). The most difficult obstacles in these types of problems are that 1) using a limited amount of input data leads to many possible local optima and 2) introducing physical constraints, such as minimum separation distances, helps to limit the search space but often makes convergence to a global minimum more difficult. We introduce a constrained global optimization algorithm that is robust and efficient in yielding near-optimal three-dimensional configurations that are guaranteed to satisfy known separation constraints. The algorithm uses an atom-based approach that reduces the dimensionality and allows for tractable enforcement of constraints while maintaining good global convergence properties. We evaluate the new optimization algorithm using synthetic data from the yeast phenylalanine tRNA and several proteins, all with known crystal structure taken from the Protein Data Bank. We compare the results to commonly applied optimization methods, such as distance geometry, simulated annealing, continuation, and smoothing. We show that, compared to other optimization approaches, our algorithm is able to combine sparse input data with physical constraints in an efficient manner to yield structures with lower root mean squared deviation.
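Two quantities named in this abstract, the root mean squared deviation (RMSD) between structures and the minimum separation constraint between atoms, are simple to state in code. The sketch below covers only these two checks and not the optimization algorithm itself. The function names, coordinates and the 2.0 threshold are hypothetical, and the RMSD here assumes the two structures are already superimposed.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered 3D coordinate lists (no superposition is done)."""
    squared = sum((p - q) ** 2
                  for u, v in zip(coords_a, coords_b)
                  for p, q in zip(u, v))
    return math.sqrt(squared / len(coords_a))


def separation_violations(coords, min_sep):
    """Return index pairs of atoms that are closer than min_sep."""
    return [(i, j)
            for (i, u), (j, v) in combinations(enumerate(coords), 2)
            if math.dist(u, v) < min_sep]


# Hypothetical three-atom configuration.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
reference = [(x, y + 1.0, z) for x, y, z in model]  # every atom shifted by 1 along y

model_rmsd = rmsd(model, reference)
violations = separation_violations(model, min_sep=2.0)
```

A constrained method of the kind described would only accept configurations for which the violation list is empty, whereas RMSD against the known crystal structure is used afterwards to judge how close the result is.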

11.
12.
13.
Pyrene-containing compounds are commonly used in a number of fluorescence-based applications because they can form excited-state dimers (excimers) by stacking interaction between excited-state and ground-state monomers. Their usefulness arises from the facts that excimer formation requires close proximity between the pyrenes and that the excimer emission spectrum is very different from that of the monomers. One of many applications is to assess proximity between specific sites of macromolecules labeled with pyrenes. This has been done using pyrene maleimide, a reagent that reacts with reduced thiols of cysteines, but its use for structural studies of proteins has been rather limited. This is because the introduction of two cysteines at sufficiently close distance from each other to obtain excimer fluorescence upon labeling with pyrene maleimide requires detailed knowledge of the protein structure or extensive site-directed mutagenesis trials. We synthesized and tested a new compound with a 4-carbon methylene linker placed between the maleimide and the pyrene (pyrene-4-maleimide), with the aim of increasing the sampling distance for excimer formation and making the use of excimer fluorescence simpler and more widespread. We tested the new compound on thiol-modified oligonucleotides and showed that it can detect proximity between thiols beyond the reach of pyrene maleimide. Based on its spectroscopic and chemical properties, we suggest that pyrene-4-maleimide is an excellent probe to assess proximities between cysteines in proteins and thiols in other macromolecules, as well as to follow conformational changes.

14.
A direct comparison of the metric matrix distance geometry and restrained molecular dynamics methods for determining three-dimensional structures of proteins on the basis of interproton distances is presented using crambin as a model system. It is shown that both methods reproduce the overall features of the secondary and tertiary structure (shape and polypeptide fold). The region of conformational space sampled by the converged structures generated by the two methods is similar in size, and in both cases the converged structures are distributed about mean structures which are closer to the X-ray structure than any of the individual structures. The restrained molecular dynamics structures are superior to those obtained from distance geometry as regards local backbone conformation, side chain positions and non-bonding energies.

15.
In order to examine the effects of various methods for tissue preparation on ultrastructural analyses, and hence standardize reported values, six commonly used fixatives were examined for their quantitative effect on muscle fibre size and capillary dimensions. Both the composition and osmolarity of fixatives affected structural indices significantly, producing a range of values of similar magnitude to that presented in reports of structural adaptations. When comparing data from different studies, therefore, it is essential to establish that dissimilar values reflect different tissue composition, rather than methodologies. The method of choice for quantitative analysis of intracellular diffusion pathways uses a combined aldehyde fixative with a metabolic poison, and an isotonic buffer as vehicle.

16.
17.
Zavodszky MI, Lei M, Thorpe MF, Day AR, Kuhn LA. Proteins, 2004, 57(2): 243-261
We describe a new method for modeling protein and ligand main-chain flexibility, and show its ability to model flexible molecular recognition. The goal is to sample the full conformational space, including large-scale motions that typically cannot be reached in molecular dynamics simulations due to the computational intensity, as well as conformations that have not been observed yet by crystallography or NMR. A secondary goal is to assess the degree of flexibility consistent with protein-ligand recognition. Flexibility analysis of the target protein is performed using the graph-theoretic algorithm FIRST, which also identifies coupled networks of covalent and noncovalent bonds within the protein. The available conformations of the flexible regions are then explored with ROCK by random-walk sampling of the rotatable bonds. ROCK explores correlated motions by only sampling dihedral angles that preserve the coupled bond networks in the protein and generates conformers with good stereochemistry, without using a computationally expensive potential function. A representative set of the conformational ensemble generated this way can be used as targets for docking with SLIDE, which handles the flexibility of protein and ligand side-chains. The realism of this protein main-chain conformational sampling is assessed by comparison with time-resolved NMR studies of cyclophilin A motions. ROCK is also effective for modeling the flexibility of large cyclic and polycyclic ligands, as demonstrated for cyclosporin and zearalenol. The use of this combined approach to perform docking with main-chain flexibility is illustrated for the cyclophilin A-cyclosporin complex and the estrogen receptor in complex with zearalenol, while addressing the question of how much flexibility is allowed without hindering molecular recognition.

18.
19.
20.