Similar articles
 20 similar articles found
1.

Background

Patterns with wildcards in specified positions, namely spaced seeds, are increasingly used instead of k-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require computing the hash of each position in the input sequences with respect to a given spaced seed, or to multiple spaced seeds. While the hashing of k-mers can be computed rapidly by exploiting the large overlap between consecutive k-mers, spaced seed hashing is usually computed from scratch for each position in the input sequence, resulting in slower processing.

Results

The method proposed in this paper, fast spaced-seed hashing (FSH), exploits the similarity of the hash values of spaced seeds computed at adjacent positions in the input sequence. In our experiments we compute the hash for each position of metagenomics reads from several datasets, with respect to different spaced seeds. We also propose a generalized version of the algorithm for the simultaneous computation of multiple spaced seed hashes. In the experiments, our algorithm computes the hash values of spaced seeds with a speedup, with respect to the traditional approach, of between 1.6× and 5.3×, depending on the structure of the spaced seed.
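
As an illustration of what is being hashed, the following minimal sketch computes a spaced-seed hash from scratch at every position, i.e. the traditional approach that the abstract compares against. The 2-bit nucleotide encoding, the example seed and the function names are assumptions made for illustration; this is not the FSH algorithm itself, which additionally reuses parts of hashes computed at earlier positions.

```python
# Assumed 2-bit encoding of nucleotides (illustrative, not taken from the paper).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hash(seq, pos, seed):
    """Hash the window starting at pos, using only positions where seed is '1'.

    Positions where seed is '0' are wildcards and do not affect the hash.
    """
    h = 0
    shift = 0
    for offset, flag in enumerate(seed):
        if flag == "1":
            h |= ENCODE[seq[pos + offset]] << shift
            shift += 2
    return h


def hash_all_positions(seq, seed):
    """Naive approach: recompute the hash from scratch for every window."""
    return [spaced_seed_hash(seq, i, seed)
            for i in range(len(seq) - len(seed) + 1)]
```

For example, `hash_all_positions("ACGTACGT", "1101")` yields one hash per window of length 4, and changing a base under the wildcard position leaves the corresponding hash unchanged.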

Conclusions

Spaced seed hashing is a routine task for several bioinformatics applications. FSH performs this task efficiently and raises the question of whether other hashing schemes can be exploited to further improve the speedup. This has the potential for major impact in the field, making spaced seed applications not only accurate, but also faster and more efficient.

Availability

The software FSH is freely available for academic use at: https://bitbucket.org/samu661/fsh/overview.

2.

Background

NGS data contain many machine-induced errors. The most advanced methods for error correction depend heavily on the selection of solid k-mers. A solid k-mer is a k-mer that occurs frequently in NGS reads. The other k-mers are called weak k-mers. A solid k-mer is unlikely to contain errors, while a weak k-mer most likely contains errors. An intensively investigated problem is to find a good frequency cutoff f0 to balance the numbers of solid and weak k-mers. Once the cutoff is determined, a more challenging but less-studied problem is to: (i) remove a small subset of solid k-mers that are likely to contain errors, and (ii) add a small subset of weak k-mers that are likely to contain no errors to the remaining set of solid k-mers. Identification of these two subsets of k-mers can improve the correction performance.

Results

We propose to use a Gamma distribution to model the frequencies of erroneous k-mers and a mixture of Gaussian distributions to model correct k-mers, and combine them to determine f0. To identify the two special subsets of k-mers, we use the z-score of a k-mer, which measures the number of standard deviations by which the k-mer’s frequency differs from the mean. These statistically solid k-mers are then used to construct a Bloom filter for error correction. Our method is markedly superior to the state-of-the-art methods, tested on both real and synthetic NGS data sets.
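
The z-score idea can be sketched as follows. The abstract does not say which population the mean and standard deviation are taken over, so this sketch assumes, purely for illustration, that they are computed over the frequencies of all observed k-mers; the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_zscores(counts):
    """z = (frequency - mean) / standard deviation.

    Assumption: mean and deviation are taken over all observed k-mers.
    """
    freqs = list(counts.values())
    mu = mean(freqs)
    sigma = pstdev(freqs)
    if sigma == 0:
        # All k-mers are equally frequent; no k-mer stands out.
        return {kmer: 0.0 for kmer in counts}
    return {kmer: (freq - mu) / sigma for kmer, freq in counts.items()}
```

A k-mer with a clearly positive z-score occurs more often than is typical for the data set, which is the kind of signal the abstract uses to refine the solid and weak sets.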

Conclusion

The z-score is adequate to distinguish solid k-mers from weak k-mers, and is particularly useful for pinpointing solid k-mers that have very low frequency. Applying the z-score to k-mers can markedly improve the error correction accuracy.

3.

Background

A basic task in bioinformatics is the counting of k-mers in genome sequences. Existing k-mer counting tools are most often optimized for small k < 32 and suffer from excessive memory resource consumption or degrading performance for large k. However, given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important.

Results

We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k ≥ 32. Our software is the result of an intensive process of algorithm engineering. It implements a two-step approach. In the first step, genome reads are loaded from disk and redistributed to temporary files. In a second step, the k-mers of each temporary file are counted via a hash table approach. In addition to its basic functionality, Gerbil can optionally use GPUs to accelerate the counting step. In a set of experiments with real-world genome data sets, we show that Gerbil is able to efficiently support both small and large k.
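
The two-step approach described above can be sketched in a few lines. In this illustration the temporary files are replaced by in-memory buckets, and k-mers are assigned to buckets by a CRC32 hash; both choices are assumptions, since the abstract does not specify how reads are redistributed.

```python
from collections import Counter, defaultdict
import zlib


def partition_kmers(reads, k, num_buckets):
    """Step 1: distribute k-mers into buckets (stand-ins for temporary files)."""
    buckets = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # crc32 is deterministic across runs, unlike Python's built-in str hash.
            buckets[zlib.crc32(kmer.encode()) % num_buckets].append(kmer)
    return buckets


def count_buckets(buckets):
    """Step 2: count the k-mers of each bucket with a hash table."""
    total = Counter()
    for kmers in buckets.values():
        total.update(Counter(kmers))
    return total
```

Because identical k-mers always land in the same bucket, each bucket can be counted independently, which keeps the working set of the counting step small.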

Conclusions

While Gerbil’s performance is comparable to existing state-of-the-art open source k-mer counting tools for small k < 32, it vastly outperforms its competitors for large k, thereby enabling new applications which require large values of k.

4.

Background

Advances in biotechnology have changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems ranging from ocean and soil to the human body. Several researchers are interested in metagenomics because it provides an insight into the complex biodiversity across several environments. Clinicians are using metagenomics to determine the role played by collections of microbial organisms within the human body with respect to human health, wellness and disease.

Results

We have developed an efficient and scalable species richness estimation algorithm that uses locality sensitive hashing (LSH). Our algorithm achieves efficiency by approximating the pairwise sequence comparison operations using hashing, and also incorporates a criterion that matches fixed-length, gapless subsequences to improve the quality of sequence comparisons. We use an LSH-based similarity function to cluster similar sequences into individual groups, called operational taxonomic units (OTUs). We also compute different species diversity/richness metrics by utilizing OTU assignment results to further extend our analysis.
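
To make the hashing-based approximation of pairwise comparison concrete, here is a minimal sketch using MinHash over fixed-length, gapless subsequences (k-mers). MinHash is one common locality sensitive hashing scheme; whether the paper uses this particular family is not stated in the abstract, so treat the scheme, the parameters and the function names as assumptions.

```python
import zlib


def kmer_set(seq, k):
    """All fixed-length, gapless subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k=4, num_hashes=64):
    """One minimum per salted hash function; requires len(seq) >= k."""
    kmers = kmer_set(seq, k)
    signature = []
    for seed in range(num_hashes):
        # Salting each k-mer with the seed simulates a family of hash functions.
        signature.append(
            min(zlib.crc32(f"{seed}:{kmer}".encode()) for kmer in kmers)
        )
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature entries, an estimate of Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

Sequences whose estimated similarity exceeds a chosen threshold could then be grouped into the same OTU without aligning every pair.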

Conclusion

The algorithm is evaluated on synthetic samples and eight targeted 16S rRNA metagenome samples taken from seawater. We compare the performance of our algorithm with several competing diversity estimation algorithms. We show the benefits of our approach with respect to computational runtime and meaningful OTU assignments. We also demonstrate the practical significance of the developed algorithm by comparing bacterial diversity and structure across different skin locations.

5.

Aims

Soil fungal pathogens can result in the failure of seedling establishment, but the effects of fungicide applications on seed/seedling survival have differed among studies. We assumed that the variation may relate to seed dormancy/germination characteristics and hypothesized that nondormant germinating seeds are more likely to be killed by fungal pathogens than dormant seeds.

Methods

Dormant and nondormant seeds of Stipa bungeana and Lespedeza davurica were inoculated with a pathogenic fungus Fusarium tricinctum under laboratory and field conditions. The outcomes of seed/seedling fate and other parameters were evaluated.

Results

In the laboratory, nondormant seeds inoculated with F. tricinctum developed white tufts of mycelium on the radicles of germinating seeds, causing them to die quickly, but dormant seeds remained intact. In contrast, in the field, inoculation with F. tricinctum did not cause higher mortality of nondormant than dormant seeds, but it resulted in higher percentages of seedling death before emergence from the soil than in the controls.

Conclusions

Our results suggest that dormancy protects seeds from being attacked by some pathogens by preventing germination, but the protection is lost once germination has commenced. Further study involving various plant species with more seeds is needed to assess the generality of this pathogen-seed interaction hypothesis.

6.

Introduction

Mass spectrometry imaging (MSI) experiments result in complex multi-dimensional datasets, which require specialist data analysis tools.

Objectives

We have developed massPix—an R package for analysing and interpreting data from MSI of lipids in tissue.

Methods

massPix produces single ion images, performs multivariate statistics and provides putative lipid annotations based on accurate mass matching against generated lipid libraries.
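
Accurate mass matching can be illustrated with a short sketch. massPix itself is an R package; Python is used here only for illustration. The library entries, their m/z values and the 5 ppm tolerance are hypothetical placeholders, not values from the paper.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error of an observed peak in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def annotate(observed_mz, library, tolerance_ppm=5.0):
    """Return library entries whose m/z lies within the ppm tolerance."""
    return [
        name
        for name, mz in library.items()
        if abs(ppm_error(observed_mz, mz)) <= tolerance_ppm
    ]


# Hypothetical library: placeholder names and m/z values.
EXAMPLE_LIBRARY = {"lipid_A": 760.5851, "lipid_B": 782.5670}
```

An observed peak can match zero, one or several library entries, which is why such annotations are described as putative.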

Results

Classification of tissue regions with high spectral similarity can be carried out by principal components analysis (PCA) or k-means clustering.

Conclusion

massPix is an open-source tool for the analysis and statistical interpretation of MSI data, and is particularly useful for lipidomics applications.

7.

Background

Mapping medical terms to standardized UMLS concepts is a basic step for leveraging biomedical texts in data management and analysis. However, available methods and tools have major limitations in handling queries over the UMLS Metathesaurus that contain inaccurate query terms, which frequently appear in real world applications.

Methods

To provide a practical solution for this task, we propose a layered dynamic programming mapping (LDPMap) approach, which can efficiently handle these queries. LDPMap uses indexing and two layers of dynamic programming techniques to efficiently map a biomedical term to a UMLS concept.
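
The abstract does not detail LDPMap's two layers, so the sketch below shows only the underlying idea of using dynamic programming to tolerate inaccurate query terms: a standard edit-distance computation followed by a nearest-term lookup. The term list and function names are hypothetical, and no indexing is performed.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_match(query, terms):
    """Return the term closest to the query, ignoring case."""
    return min(terms, key=lambda t: edit_distance(query.lower(), t.lower()))
```

For instance, a misspelled query such as "diabetis" is one substitution away from "diabetes" and would be mapped to it. A linear scan like this is far slower than an indexed method on a vocabulary the size of the UMLS Metathesaurus.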

Results

Our empirical study shows that LDPMap achieves much faster query speeds than LCS. In comparison to the UMLS Metathesaurus Browser and MetaMap, LDPMap is much more effective in querying the UMLS Metathesaurus for inaccurately spelled medical terms, long medical terms, and medical terms with special characters.

Conclusions

These results demonstrate that LDPMap is an efficient and effective method for mapping medical terms to the UMLS Metathesaurus.

8.

Objective

To investigate the oil body protein and function in seeds of mature seagrass, Thalassia hemprichii.

Results

Seeds of mature seagrass T. hemprichii stained with the fluorescent probe BODIPY showed the presence of oil bodies in intracellular cells. Triacylglycerol was the major lipid class in the seeds. Protein extracted from seagrass seeds was subjected to immunological cross-recognition with land plant seed oil body proteins, such as oleosin and caleosin, resulting in no cross-reactivity. An oleosin-like gene was found in seagrass seeds. Next generation sequencing and sequence alignment indicated that the deduced seagrass seed oleosin-like protein has a central hydrophobic domain responsible for its anchoring onto the surface of oil bodies. Phylogenetic analysis showed that the oleosin-like protein was evolutionarily closer to pollen oleosin than to seed oleosins.

Conclusion

Oil body proteins found in seagrass seeds represent a distinct class of land seed oil body proteins.

9.

Background and aims

We characterized fungal endophytes of seeds of invasive, non-native Phragmites from three sites in the Great Lakes region to determine if fungal symbiosis could contribute to invasiveness through its effects on seed germination and seedling growth.

Methods

Field-collected seeds were surface sterilized and plated on agar to culture endophytes for ITS sequencing. Prevalence of specific endophytes from germinated and non-germinated seeds, and from seedlings, was compared.

Results

One-third of 740 seeds yielded endophyte isolates. Fifteen taxa were identified, with Alternaria sp. representing 54% of all isolates, followed by Phoma sp. (21%) and Penicillium corylophilum (12%). Overall germination of seeds producing an isolate (36%) was significantly higher than that of seeds not producing an isolate (20%). Penicillium in particular was strongly associated with increased germination of seeds from one site. Sixty-three isolates and 11 taxa were also obtained from 30 seedlings, where Phoma, Penicillium and Alternaria, respectively, were most prevalent. There was a significant effect of isolating an endophyte from the seed on seedling growth.

Conclusions

These results suggest that many endophyte taxa are transmitted in seeds and can increase seed germination and seedling growth of invasive Phragmites. The role of fungal endophytes in host establishment, growth and invasiveness in nature requires further research.

10.

Aims

The pseudo-cereal quinoa has an outstanding nutritional value. Seed germination is unusually fast, and plant tolerance to salt stress exceptionally high. Seemingly all seeds harbor bacterial endophytes. This work examines mitogen-activated protein kinase (MAPK) activities during early development. It evaluates possible contribution of endophytes to rapid germination and plant robustness.

Methods

MAPK activities were monitored in water- and NaCl-imbibed seeds over a 4-h-period using an immunoblot-based approach. Cellulolytic and pectinolytic abilities of bacteria were assessed biochemically, and cellular movement, biofilm, elicitor and antimicrobial compound synthesis genes sequenced. GyrA-based, cultivation-independent studies provided first insight into endophyte diversity.

Results

Quinoa seeds and seedlings exhibit remarkably complex and dynamic MAPK activity profiles. Depending on seed origin, variances exist in MAPK patterns and probably also in endophyte assemblages. Mucilage-degrading activities enable endophytes to colonize seed surfaces of a non-host species, chia, without apparent adverse effects.

Conclusions

Owing to their motility, cell wall-loosening and elicitor-generating abilities, quinoa endophytes have the potential to drive cell expansion, move across cell walls, generate damage-associated molecular patterns and activate MAPKs in their host. Bacteria may thus facilitate rapid germination and confer a primed state directly upon seed rehydration. Transfer into non-native crops appears both desirable and feasible.

11.

Introduction

In plant metabolomics, metabolite contents are often normalized by sample weight. However, accurate weighing of very small samples, such as individual Arabidopsis thaliana seeds (approximately 20 µg), is difficult, which may lead to irreproducible results.

Objectives

We aimed to establish alternative normalization methods for seed-grain-based comparative metabolomics of A. thaliana.

Methods

Arabidopsis thaliana seeds were assumed to have a prolate spheroid shape. Using a microscope image of each seed, the lengths of major and minor axes were measured by fitting a projected 2-dimensional shape of each seed as an ellipse. Metabolic profiles of individual diploid or tetraploid A. thaliana seeds were measured by our highly sensitive protocol (“widely targeted metabolomics”) that uses liquid chromatography coupled with tandem quadrupole mass spectrometry. Mass spectrometric analysis of 1 µL of solution extract identified more than 100 metabolites. The data were normalized by various seed-size measures, including seed volume (single-grain-based analysis). For comparison, metabolites were extracted from 4 mg of diploid and tetraploid A. thaliana seeds and their metabolic profiles were analyzed by normalization of weight (weight-based analysis).
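
Under the prolate spheroid assumption, seed volume follows from the two fitted axes as V = (4/3)·π·a·b², where a and b are the semi-major and semi-minor axes. The sketch below applies this formula and divides a metabolite signal by the resulting volume; the function names are illustrative, and the example values in the usage note are hypothetical.

```python
import math


def prolate_spheroid_volume(major_axis, minor_axis):
    """Volume from full axis lengths: V = 4/3 * pi * a * b^2 with semi-axes a, b."""
    a = major_axis / 2.0
    b = minor_axis / 2.0
    return 4.0 / 3.0 * math.pi * a * b * b


def normalize_by_volume(intensity, major_axis, minor_axis):
    """Metabolite signal per unit seed volume."""
    return intensity / prolate_spheroid_volume(major_axis, minor_axis)
```

For a hypothetical seed measuring 0.5 mm by 0.3 mm, `prolate_spheroid_volume(0.5, 0.3)` gives the volume in mm³, and signals from seeds of different sizes become comparable after division by that volume.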

Results

A small number of metabolites showed statistically significant differences in the single-grain-based analysis compared to weight-based analysis. A total of 17 metabolites showed statistically different accumulation between ploidy types with similar fold changes in both analyses.

Conclusion

Seed-size measures obtained by microscopic imaging were useful for data normalization. Single-grain-based analysis enables evaluation of metabolism of each seed and elucidates the metabolic profiles of precious bioresources by using small amounts of samples.

12.

Background

The reconstruction of ancestral genomes must deal with the problem of resolution, necessarily involving a trade-off between trying to identify genomic details and being overwhelmed by noise at higher resolutions.

Results

We use the median reconstruction at the synteny block level, of the ancestral genome of the order Gentianales, based on coffee, Rhazya stricta and grape, to exemplify the effects of resolution (granularity) on comparative genomic analyses.

Conclusions

We show how decreased resolution blurs the differences between evolving genomes, with respect to rate, mutational process and other characteristics.

13.

Introduction

Seed germination is inherently related to seed metabolism, which changes throughout its maturation, desiccation and germination processes. The metabolite content of a seed and its ability to germinate are determined by underlying genetic architecture and environmental effects during development.

Objective

This study aimed to assess an integrative approach to explore the genetics modulating seed metabolism in different developmental stages and the link between seed metabolic and germination traits.

Methods

We have utilized gas chromatography-time-of-flight/mass spectrometry (GC-TOF/MS) metabolite profiling to characterize tomato seeds during dry and imbibed stages. We describe, for the first time in tomato, the use of a so-called generalized genetical genomics (GGG) model to study the interaction between genetics, environment and seed metabolism using 100 tomato recombinant inbred lines (RILs) derived from a cross between Solanum lycopersicum and Solanum pimpinellifolium.

Results

QTLs were found for over two-thirds of the metabolites within several QTL hotspots. The transition from dry to 6 h imbibed seeds was associated with programmed metabolic switches. Significant correlations varied among individual metabolites and the obtained clusters were significantly enriched for metabolites involved in specific biochemical pathways.

Conclusions

Extensive genetic variation in metabolite abundance was uncovered. Numerous identified genetic regions that coordinate groups of metabolites were detected and these will contain plausible candidate genes. The combined analysis of germination phenotypes and metabolite profiles provides a strong indication for the hypothesis that metabolic composition is related to germination phenotypes and thus to seed performance.

14.

Introduction

Collecting feces is easy. It offers direct access to endogenous and microbial metabolites.

Objectives

In a context of lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.

Methods

The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.

Results

A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a filtration step (10 kDa) was developed.

Conclusion

The workflow generated repeatable and informative fingerprints for robust metabolome characterization.

15.

Background and aims

Plant breeding activities shape the rhizosphere microbiome but less is known about the relationship of both with the seed microbiome. We analyzed the composition of bacterial communities of seeds and rhizospheres of Styrian oil pumpkin genotypes in comparison to bulk soil to elucidate specific microbial signatures to support a concept involving plant-microbe interactions in breeding strategies.

Methods

The seed and rhizosphere microbiomes of 14 genotypes of oilseed pumpkin and relatives were analyzed using a 16S rRNA gene amplicon sequencing approach, which was assessed by bioinformatics and statistical methods.

Results

All analyzed microhabitats were characterized by diverse bacterial communities, but the relative proportions of phyla and the overall diversity were different. Seed microbiomes were characterized by the lowest diversity and by dominant members of Enterobacteriaceae, including potential pathogens (Erwinia, Pectobacterium). Potential plant-beneficial bacteria like Lysobacter, Paenibacillus and Lactococcus contributed to the microbial communities in significant abundances. Interestingly, strong genotype-specific microbiomes were detected for seeds but not for the rhizospheres.

Conclusions

Our study indicates a strong impact of the Cucurbita pepo genotype on the composition of the seed microbiome. This should be considered in breeding of new cultivars that are more capable of exploiting beneficial indigenous microbial communities.

16.

Background

Practical applications for data analysis may require combining multiple databases belonging to different owners, such as health centers. The analysis should be performed without violating the privacy of either the centers themselves or the patients whose records these centers store. To avoid biased analysis results, it may be important to remove duplicate records among the centers, so that each patient’s data is taken into account only once. This task is very closely related to privacy-preserving record linkage.

Methods

This paper presents a solution to privacy-preserving deduplication among records of several databases using secure multiparty computation. It is built upon one of the fastest practical secure multiparty computation platforms, called Sharemind.
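
As background on secure multiparty computation, the sketch below shows additive secret sharing, a common building block in which a value is split into random shares that individually reveal nothing. It illustrates only this building block, not the paper's deduplication protocol or the Sharemind API; the ring size, party count and function names are assumptions chosen for illustration.

```python
import secrets

# Assumed ring size for the shares (illustrative choice).
MODULUS = 2 ** 32


def share(value, parties=3):
    """Split value into random additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % MODULUS


def add_shared(shares_a, shares_b):
    """Each party adds its own two shares locally; no values are revealed."""
    return [(a + b) % MODULUS for a, b in zip(shares_a, shares_b)]
```

Addition needs no communication between parties, whereas comparisons such as the equality tests required for deduplication need dedicated interactive protocols, which is where most of the running time of such systems is typically spent.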

Results

The tests on ca. 10 million records of simulated databases with 1000 health centers of 10,000 records each show that the computation is feasible in practice. The expected running time of the experiment is ca. 30 min for computing servers connected over a 100 Mbit/s WAN, the expected error of the results is 2^-40, and no errors were detected for the particular test set that we used for our benchmarks.

Conclusions

The solution is ready for practical use. It has well-defined security properties, implied by the properties of the Sharemind platform. The solution assumes that exact matching of records is required; a possible direction for future research is extending it to approximate matching.

17.

Aims

Seeds are vectors of a diversified microbiota including plant pathogens. To better understand transmission of common bacterial blight (CBB) agents to bean seeds, we analyzed the role of non-pathogenic xanthomonads in seed transmission efficiency and investigated the location of Xanthomonas citri pv. fuscans (Xcf) in seeds and plantlets.

Methods

Competition between CBB and non-pathogenic (NP) strains was initially assessed in vitro and then extended in planta to monitor the impact of co-inoculation on Xcf seed transmission. Moreover, the location of Xcf strains in seeds and seedlings was visualized using a combination of a gfp-tagged strain and DOPE-FISH/CSLM.

Results

Whereas CBB agent growth was inhibited in vitro by some seed-borne non-pathogenic xanthomonad strains, these strains did not transmit efficiently to seed through the floral pathway and did not affect Xcf seed transmission. Xcf cells were observed entering the seed through vascular elements and parenchyma of the funiculus, but also through the micropyle and testa. Xcf cells were also observed among other bacteria on radicle surfaces, especially the tip, in cotyledons, and in plumules.

Conclusions

CBB agents are more efficient than non-pathogenic xanthomonads in using the floral route to colonize seeds. CBB agents are located within different niches in the seed tissues up to the embryonic axis.

18.

Introduction

Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.

Objectives

(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.

Methods

A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.

Results

Journals that support data sharing are not necessarily those with the most papers associated with open metabolomics data.

Conclusion

Further efforts are required to improve data sharing in metabolomics.

19.

Background

In recent years, the visualization of biomagnetic measurement data by so-called pseudo current density maps or Hosaka-Cohen (HC) transformations has become popular.

Methods

The physical basis of these intuitive maps is clarified by means of analytically solvable problems.
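
For orientation, the HC transformation is commonly written as c = (∂Bz/∂y) e_x − (∂Bz/∂x) e_y, i.e. the in-plane gradient of the measured normal field component rotated by 90 degrees. The sketch below evaluates this with central differences on a regular grid; the grid layout, the spacing parameters and the function name are assumptions for illustration and are not taken from the paper.

```python
def hosaka_cohen(bz, dx=1.0, dy=1.0):
    """Pseudo current density arrows at interior grid points.

    bz[row][col] holds the normal field component; rows run along y and
    columns along x (an assumed layout). Returns {(row, col): (cx, cy)}.
    """
    rows, cols = len(bz), len(bz[0])
    arrows = {}
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences approximate the partial derivatives.
            dbz_dx = (bz[r][c + 1] - bz[r][c - 1]) / (2 * dx)
            dbz_dy = (bz[r + 1][c] - bz[r - 1][c]) / (2 * dy)
            arrows[(r, c)] = (dbz_dy, -dbz_dx)
    return arrows
```

For a field that increases linearly along x, every arrow points in the negative y direction, perpendicular to the gradient, which is what gives these maps their current-like appearance.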

Results

Examples in magnetocardiography, magnetoencephalography and magnetoneurography demonstrate the usefulness of this method.

Conclusion

Hardware realizations of the HC-transformation and some similar transformations are discussed which could advantageously support cross-platform comparability of biomagnetic measurements.

20.

Aims

Maintaining variation in germination response provides a selective advantage by spreading risk during recruitment. In fire-prone regions, physically dormant (PY) species vary their response to dormancy-breaking fire-related heat cues at the intra-population level. However, little is known about physiologically dormant (PD) species, which respond to smoke cues. These contrasting dormancy types reflect different evolutionary developmental pathways, and we considered whether intra-population variation in germination of Boronia floribunda (PD) occurs in response to smoke.

Methods

Seeds were collected from individual plants. We assessed germination magnitude and rate of seeds from each individual in response to a single aerosol smoke treatment, and three concentrations of smoke water, using replicate seed lots in temperature-controlled incubators.

Results

The magnitude and onset of germination differed significantly among individuals in response to the same smoke treatment. Seeds from different individuals varied in their sensitivity to smoke water concentration, with some responding to very low doses, and others obligated to high doses.

Conclusions

Variation in germination response to smoke highlights a mechanism by which PD species spread risk, by allowing some seeds to emerge quickly, while others remain dormant in the soil seed bank. The similarity to heat-cued variation displayed by PY species suggests that this could represent a convergent functional response.
