20 similar documents found; search took 8 ms
1.
Many methods and tools are available for preprocessing high-throughput RNA sequencing data and detecting differential expression.
3.
The Composite Link Model is a generalization of the generalized linear model in which the expected values of observed counts are constructed as sums of generalized linear components. Combined with penalized likelihood, it provides a powerful and elegant way to estimate haplotype probabilities from observed genotypes. Uncertain ("fuzzy") genotypes, such as those resulting from AFLP scores, can be handled by adding an extra layer to the model. We describe the model and the estimation algorithm, and apply it to a data set of accurate human single nucleotide polymorphism (SNP) genotypes and to a data set of fuzzy tomato AFLP scores.
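The "sum of generalized linear components" can be made concrete with a minimal sketch, assuming two biallelic SNPs, random pairing of haplotypes, and a softmax link from a linear predictor to haplotype probabilities (all illustrative choices, not taken from the abstract). Each unphased genotype's expected probability is a sum over the haplotype pairs that produce it; the penalized-likelihood fit and the extra "fuzzy" AFLP layer are not shown.

```python
import math
from itertools import product

# Hypothetical setup: two biallelic SNPs, allele "1" = minor allele
HAPLOTYPES = ["00", "01", "10", "11"]

def haplotype_probs(beta):
    """Map a linear predictor beta to haplotype probabilities (softmax / multinomial logit)."""
    w = [math.exp(b) for b in beta]
    total = sum(w)
    return [x / total for x in w]

def genotype(h1, h2):
    """Unphased genotype: minor-allele count at each SNP."""
    return tuple(int(a) + int(b) for a, b in zip(h1, h2))

def expected_genotype_probs(p):
    """Composite link: each genotype probability is a SUM over all ordered
    haplotype pairs that yield that genotype (random pairing assumed)."""
    probs = {}
    for (i, h1), (j, h2) in product(enumerate(HAPLOTYPES), repeat=2):
        g = genotype(h1, h2)
        probs[g] = probs.get(g, 0.0) + p[i] * p[j]
    return probs

p = haplotype_probs([0.0, -1.0, -1.0, 0.5])
g = expected_genotype_probs(p)
# The double heterozygote (1, 1) is phase-ambiguous: it sums the 00/11 and 01/10 pairs
print(round(g[(1, 1)], 4))
```

The double heterozygote is exactly where a plain GLM breaks down: its expected count is not a single linear component but the sum of two, which is what the composite link accommodates.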
5.
A large number of RNA-sequencing studies set out to predict mutations, splice junctions or fusion RNAs. We propose a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from the analysis of RNA-seq reads. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in each single read. CRAC increases precision compared with existing tools, reaching 99.5% for splice junctions, without losing sensitivity. Importantly, CRAC's predictions improve with read length. In cancer libraries, CRAC recovered 74% of validated fusion RNAs and predicted novel recurrent chimeric junctions. CRAC is available at http://crac.gforge.inria.fr.
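The per-read k-mer profiling idea can be sketched minimally as follows, assuming a toy random genome and a plain Python dict as the k-mer index (CRAC's actual index and its coverage-based support profile are not reproduced here). A substitution in a read makes every k-mer overlapping it absent from the genome, so it shows up as a "break": a run of consecutive missing k-mers no longer than k.

```python
import random

def kmer_index(genome, k):
    """Hypothetical index: k-mer -> list of genomic start positions."""
    idx = {}
    for i in range(len(genome) - k + 1):
        idx.setdefault(genome[i:i + k], []).append(i)
    return idx

def find_breaks(read, idx, k):
    """Return (first, last) k-mer offsets of each run of read k-mers absent from the genome."""
    profile = [read[i:i + k] in idx for i in range(len(read) - k + 1)]
    breaks, start = [], None
    for i, hit in enumerate(profile):
        if not hit and start is None:
            start = i                      # a run of missing k-mers begins
        elif hit and start is not None:
            breaks.append((start, i - 1))  # the run ends just before this hit
            start = None
    if start is not None:
        breaks.append((start, len(profile) - 1))
    return breaks

random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(2000))
k = 12
idx = kmer_index(genome, k)
read = genome[500:600]
m = 50  # position of a simulated substitution in the read
mutant = read[:m] + ("A" if read[m] != "A" else "C") + read[m + 1:]
print(find_breaks(read, idx, k))  # []
print(find_breaks(mutant, idx, k))
```

Distinguishing a substitution from an indel or a splice/chimeric junction would additionally need the break length and the genomic locations of the k-mers flanking the break, which this sketch does not attempt.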
6.
Accurate mapping of spliced RNA-Seq reads to genomic DNA is known to be a challenging problem. Despite significant efforts invested in developing efficient algorithms, with the human genome as the primary focus, the best solution is still not known. A recently introduced tool, TrueSight, has demonstrated better performance than earlier algorithms such as TopHat and MapSplice. To improve the detection of splice junctions, TrueSight uses information on statistical patterns of nucleotide ordering in intronic and exonic DNA. This line of research led to yet another new algorithm, UnSplicer, designed for eukaryotic species with compact genomes, where functional alternative splicing is likely to be dominated by splicing noise. Genome-specific parameters of the new algorithm are generated by GeneMark-ES, an ab initio gene prediction algorithm based on unsupervised training. UnSplicer shares several components with TrueSight; the difference lies in the training strategy and the classification algorithm. We tested UnSplicer on RNA-Seq data sets of Arabidopsis thaliana, Caenorhabditis elegans, Cryptococcus neoformans and Drosophila melanogaster, and showed that the splice junctions inferred by UnSplicer are in better agreement with the knowledge accumulated on these well-studied genomes than predictions made by earlier tools.
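One common way to exploit "statistical patterns of nucleotide ordering" is a log-odds score between two fixed-order Markov chains, one trained on intronic and one on exonic sequence. The sketch below assumes that approach (second order, add-one pseudocounts, toy training strings); it is an illustration of the general idea, not the TrueSight or UnSplicer classifier, and the GeneMark-ES training step is not modelled.

```python
import math
from collections import defaultdict

def train_markov(seqs, order=2, pseudo=1.0):
    """Estimate P(next base | previous `order` bases) with pseudocounts."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1
    model = {}
    for ctx, nxt in counts.items():
        total = sum(nxt.values()) + 4 * pseudo
        model[ctx] = {b: (nxt.get(b, 0.0) + pseudo) / total for b in "ACGT"}
    return model

def log_prob(seq, model, order=2):
    """Log-likelihood of seq; unseen contexts fall back to a uniform 0.25."""
    lp = 0.0
    for i in range(order, len(seq)):
        probs = model.get(seq[i - order:i])
        lp += math.log(probs[seq[i]] if probs else 0.25)
    return lp

def intron_score(seq, intron_model, exon_model, order=2):
    """Positive values favour the intronic model, negative the exonic one."""
    return log_prob(seq, intron_model, order) - log_prob(seq, exon_model, order)

# Toy training data (hypothetical): AT-rich "introns", GC-rich "exons"
intron_model = train_markov(["AATTAATTAATTAATT"])
exon_model = train_markov(["GGCCGGCCGGCCGGCC"])
print(intron_score("AATTAATT", intron_model, exon_model) > 0)  # True
```

In a junction classifier, such scores would be computed on the sequence flanking each candidate junction (putative intron vs. putative exon sides) and combined with other features.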
11.
Background: The identification of inversions of DNA segments shorter than the read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging with next-generation sequencing reads. MIs are acknowledged to be an important form of genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads. Results: The MID algorithm is based on a dynamic programming path-finding approach. What distinguishes MID from other variant detection tools is that it can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability on low-coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp. Conclusions: To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low-coverage data, which makes it suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs in the 1KGP data that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE). Our tool is therefore expected to be useful for studying MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID.
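What a micro-inversion looks like inside a read can be illustrated with a deliberately simplified brute-force check, assuming the read is already placed gaplessly against a reference window of the same length and carries a single inversion. This is not MID's dynamic-programming path-finding (which handles multiple breakpoints in unmapped reads); it only searches for a segment whose read sequence equals the reverse complement of the reference while both flanks match exactly.

```python
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(COMP)[::-1]

def find_micro_inversions(read, ref, min_len=4):
    """Return (start, end) segments where read[start:end] is the inverted reference."""
    assert len(read) == len(ref), "sketch assumes gapless, same-length placement"
    n = len(read)
    hits = []
    for i in range(n):
        for j in range(i + min_len, n + 1):
            inner = read[i:j]
            if (inner != ref[i:j]                    # segment differs from reference
                    and inner == revcomp(ref[i:j])   # ...but matches its inversion
                    and read[:i] == ref[:i]          # left flank intact
                    and read[j:] == ref[j:]):        # right flank intact
                hits.append((i, j))
    return hits

ref = "ACGTACGATTCCGGATCA"
read = ref[:6] + revcomp(ref[6:12]) + ref[12:]  # simulate a 6-bp inversion
hits = find_micro_inversions(read, ref)
print(hits)
```

The cubic scan is fine for a toy read but would be replaced in practice by an alignment-style recurrence that allows switching between forward and inverted orientation, which is the role the dynamic-programming formulation plays.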
12.
Three oligonucleotide probes complementary to specific DNA sequences of the six human globin genes (epsilon, G gamma, A gamma, psi beta, delta, beta) were synthesized. The oligonucleotides were used either singly or in combination as hybridization probes to determine the haplotype of the human beta-globin gene cluster employing the four conventionally used restriction endonucleases HincII, HindIII, AvaII, and BamHI, in addition to HpaI. Polymorphism in the epsilon- and psi beta-genes (HincII) can be simultaneously determined with a single probe mixture. One of the probes complementary to both the psi beta- and gamma-genes is useful for determining both HindIII and HincII polymorphisms. The advantages of these probes relative to conventional cDNA probes are discussed.
13.
Nitric oxide (NO) plays a critical role in a number of physiological processes and is produced in mammalian cells by nitric oxide synthase (NOS) isozymes. Because of the diverse functions of NO, pharmaceutical interventions that seek to abrogate the adverse effects of excess NOS activity must not interfere with the normal regulation of NO levels in the body. A method has been developed to control NOS enzyme activity using the localized photochemical release of a caged, isoform-specific NOS inhibitor. The caged form of an iNOS inhibitor was synthesized and tested for photosensitivity and potency. UV and multiphoton uncaging were verified using a hemoglobin-based assay. IC50 values were determined for the inhibitor (70 ± 11 nM), the caged inhibitor (1098 ± 172 nM), the UV-uncaged inhibitor (67 ± 26 nM) and the multiphoton-uncaged inhibitor (73 ± 11 nM). UV irradiation of the caged inhibitor resulted in an 86% reduction in iNOS activity after 5 min. Multiphoton uncaging had an apparent first-order time constant of 0.007 ± 0.001 min⁻¹. A therapeutic range exists, at a 3- to 7-fold molar excess of inhibitor over enzyme, over which the full dynamic range of inhibition can be exploited.
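The reported numbers can be put in perspective with a small worked calculation. It uses only the IC50 means from the abstract and assumes that the "time constant" of 0.007 min⁻¹ (given in units of inverse time) behaves as a first-order rate constant k, so that the uncaged fraction is 1 − exp(−kt) and the half-time is ln 2 / k; that interpretation is an assumption.

```python
import math

# Mean IC50 values from the abstract, in nM
ic50_nM = {"inhibitor": 70, "caged": 1098, "uv_uncaged": 67, "mp_uncaged": 73}

# How much weaker the caged compound is than the free inhibitor
fold_caging = ic50_nM["caged"] / ic50_nM["inhibitor"]

# Assumption: 0.007 min^-1 is treated as a first-order rate constant
k_per_min = 0.007
half_time_min = math.log(2) / k_per_min

def fraction_uncaged(t_min, k=k_per_min):
    """First-order kinetics: fraction of caged inhibitor released after t minutes."""
    return 1.0 - math.exp(-k * t_min)

print(round(fold_caging, 1))   # 15.7
print(round(half_time_min))    # 99
```

Under this reading, caging weakens the inhibitor roughly 16-fold, and multiphoton irradiation would need on the order of 100 min to release half of the caged compound.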
14.
MOTIVATION: A typical metagenome dataset generated using a 454 pyrosequencing platform consists of short reads sampled from the collective genome of a microbial community. The amount of sequence in such datasets is usually insufficient for assembly, and traditional gene prediction cannot be applied to unassembled short reads. As a result, analysis of such datasets usually involves comparing the relative abundances of various protein families. This requires assigning individual reads to protein families, which is hindered by the fact that short reads contain only a fragment, usually a small one, of a protein. RESULTS: We considered the assignment of pyrosequencing reads to protein families directly, using RPS-BLAST against the COG and Pfam databases, and indirectly, via proxygenes identified using BLASTx searches against protein sequence databases. Using simulated metagenome datasets as benchmarks, we show that the proxygene method is more accurate than direct assignment. We also introduce a clustering method that significantly reduces the size of a metagenome dataset while maintaining a faithful representation of its functional and taxonomic content.
15.
Background: Long-read sequencing technologies were launched a few years ago and, in contrast with short-read sequencing technologies, they offered the promise of solving assembly problems for large and complex genomes. Moreover, by providing long-range information, they could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use in most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small, low-cost single-molecule nanopore sequencer that offers the possibility of sequencing long DNA fragments. Results: The assembly of long reads generated with the Oxford Nanopore MinION® instrument is challenging, as existing assemblers were not designed to handle long reads with error rates close to 30%. Here, we present a hybrid approach developed to take advantage of data generated with the MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1, and applied our method to obtain a highly contiguous (single-contig) and accurate genome assembly, even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy generated NaS (Nanopore Synthetic-long) reads of up to 60 kb that aligned entirely and without error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads. Conclusions: We describe the NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, based ideally on 20x and 50x of NaS and Illumina reads respectively, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time, even in small facilities.
Moreover, we demonstrated that although Oxford Nanopore is a relatively new sequencing technology, currently with a high error rate, it is already useful for generating high-quality genome assemblies. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1519-z) contains supplementary material, which is available to authorized users.
17.
Background: Circular RNA (circRNA) is a type of noncoding RNA that forms a covalently closed continuous loop. Like long noncoding RNA (lncRNA), circRNA can act as a microRNA (miRNA) ‘sponge’ to regulate gene expression, and its abnormal expression is associated with diseases such as atherosclerosis, nervous system disorders and cancer. So far, there have been no systematic studies of circRNA abundance and expression profiles in human adult and fetal tissues. Results: We explored circRNA expression profiles using RNA-seq data for six normal tissues sampled in both adults and fetuses (colon, heart, kidney, liver, lung and stomach) and four normal gland tissues (adrenal gland, mammary gland, pancreas and thyroid gland). A total of 8120, 25,933 and 14,433 circRNAs were detected with at least two supporting junction reads in adult, fetal and gland tissues, respectively. Of these, 3092, 14,241 and 6879 circRNAs were novel compared with published results. In each adult tissue type, we found at least 1000 circRNAs, of which 36.97–50.04% were tissue-specific. We report 33 circRNAs that were ubiquitously expressed in all the adult tissues examined. To further explore the potential “housekeeping” function of these circRNAs, we constructed a circRNA-miRNA-mRNA regulatory network containing 17 circRNAs, 22 miRNAs and 90 mRNAs. Furthermore, we found that both the abundance and the relative expression level of circRNAs were higher in fetal tissue than in adult tissue. The number of circRNAs in gland tissues, especially in the mammary gland (9665 circRNA candidates), was higher than in other adult tissues (1160–3777). Conclusions: We systematically investigated circRNA expression in a variety of human adult and fetal tissues. The different circRNA expression levels we observed between adult and fetal tissues suggest that circRNAs might function in a tissue-specific and development-specific fashion. Analysis of the circRNA-miRNA-mRNA network provided potential targets of circRNAs.
The high expression level of circRNAs in the mammary gland might be attributed to its rich innervation.
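The detection and classification steps described above can be sketched as simple set operations. The sketch assumes a hypothetical table of back-splice junction read counts per circRNA and tissue, calls a circRNA "detected" in a tissue at two or more supporting reads (the threshold from the abstract), and defines "tissue-specific" as detected in exactly one tissue and "ubiquitous" as detected in all of them; those two definitions, and the toy counts, are assumptions. The novel fractions use the totals reported in the abstract.

```python
MIN_JUNCTION_READS = 2  # "at least two supporting junction reads"

def detected(counts_by_circ, min_reads=MIN_JUNCTION_READS):
    """circRNA -> set of tissues where it has >= min_reads junction reads."""
    return {c: {t for t, n in tissues.items() if n >= min_reads}
            for c, tissues in counts_by_circ.items()}

def classify(det, all_tissues):
    """Assumed definitions: specific = exactly one tissue; ubiquitous = every tissue."""
    specific = {c: next(iter(ts)) for c, ts in det.items() if len(ts) == 1}
    ubiquitous = {c for c, ts in det.items() if ts == set(all_tissues)}
    return specific, ubiquitous

# Hypothetical junction read counts
counts = {
    "circA": {"heart": 5, "liver": 3, "lung": 2},
    "circB": {"heart": 4, "liver": 1, "lung": 0},
    "circC": {"heart": 1, "liver": 1, "lung": 1},
}
specific, ubiquitous = classify(detected(counts), ["heart", "liver", "lung"])
print(specific, ubiquitous)

# Fraction of detected circRNAs reported as novel (totals from the abstract)
novel_fraction = {
    "adult": 3092 / 8120,
    "fetal": 14241 / 25933,
    "gland": 6879 / 14433,
}
for group, f in novel_fraction.items():
    print(group, f"{100 * f:.1f}%")
```

By this arithmetic, roughly 38%, 55% and 48% of the adult, fetal and gland circRNAs, respectively, were novel.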
19.
A complementary DNA encoding a new bovine tryptase isoform (here named BLT) was cloned from lung tissue and sequenced. Sequence analysis indicates the presence of a 26-amino-acid prepro-sequence and a 245-amino-acid catalytic domain. BLT differs at six residues from the previously characterized tryptase from bovine liver capsule (BLCT), with the most significant difference residing in the primary specificity S1 pocket. In BLT, the canonical residues Asp-Ser are present at positions 188-189, whereas in BLCT these positions are occupied by Asn-Phe. This finding was confirmed by mass fingerprinting of the peptide mixture obtained upon in-gel tryptic digestion of BLT. Gel-filtration analysis of the purified protein shows that BLT is probably tetrameric, like the previously identified tryptases from other species, with the monomer migrating as multiple bands of 35-40 kDa in SDS/PAGE. As expected, the catalytic properties of the two bovine tryptases differ: the specificity constants (kcat/Km) measured with model substrates are 10- to 60-fold higher for BLT. The tissue-specific expression of the two tryptases was evaluated at the RNA level by analysis of their different restriction patterns. Only BLT was found to be expressed in lung, whereas only BLCT is present in liver capsule. Both isoforms are present in similar amounts in heart and spleen. Analysis of the two gene sequences reveals several recognition sequences in the promoter regions and suggests a role for hormones in governing the tissue-specific expression of bovine tryptases.
20.
OBJECTIVE: The potential value of haplotypes has attracted widespread interest in the mapping of complex traits. Haplotype sharing methods take into account the linkage disequilibrium information between multiple markers and may have good power to detect predisposing genes. We present a new approach based on Mantel statistics for space-time clustering, developed to improve the power of haplotype sharing analysis for gene mapping in complex disease. METHODS: The new statistic correlates genetic similarity and phenotypic similarity across pairs of haplotypes for case-only and case-control studies. Genetic similarity is measured as the shared length between haplotypes around a putative disease locus. Phenotypic similarity is measured as the mean-corrected cross-product of the respective phenotypes. We analyzed two tests for statistical significance with respect to type I error: (1) assuming asymptotic normality, and (2) using a Monte Carlo permutation procedure. The results were compared with the χ² test for association based on 3-marker haplotypes. RESULTS: With the permutation procedure, the Mantel statistics yielded pointwise valid tests in terms of type I error rates. The approach based on the assumption of asymptotic normality was seriously liberal. CONCLUSION: Power comparisons showed that the Mantel statistics performed better than or equal to the χ² test for all simulated disease models.
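The statistic described in METHODS can be sketched as a sum over haplotype pairs of (genetic similarity) × (mean-corrected phenotype cross-product), with significance from a Monte Carlo permutation of phenotypes. Assumptions for illustration: haplotypes are 0/1 marker strings, shared length is counted in markers (not physical distance) as the run of identical markers spanning the locus, the test is one-sided, and the p-value uses the (b + 1)/(N + 1) convention; the toy data are hypothetical.

```python
import random

def shared_length(h1, h2, locus):
    """Number of consecutive identical markers spanning `locus` (0 if they differ there)."""
    if h1[locus] != h2[locus]:
        return 0
    left = locus
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = locus
    while right + 1 < len(h1) and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1

def mantel(haplotypes, phenotypes, locus):
    """Sum over pairs i < j of shared_length * (y_i - mean) * (y_j - mean)."""
    mean = sum(phenotypes) / len(phenotypes)
    y = [v - mean for v in phenotypes]
    n = len(haplotypes)
    return sum(shared_length(haplotypes[i], haplotypes[j], locus) * y[i] * y[j]
               for i in range(n) for j in range(i + 1, n))

def permutation_p(haplotypes, phenotypes, locus, n_perm=999, seed=1):
    """One-sided Monte Carlo p-value: shuffle phenotypes, keep haplotypes fixed."""
    rng = random.Random(seed)
    observed = mantel(haplotypes, phenotypes, locus)
    perm = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if mantel(haplotypes, perm, locus) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical data: cases (1) share a long segment around marker 2
haplotypes = ["111100", "111101", "111110", "011100",
              "000011", "100010", "010001", "001011"]
phenotypes = [1, 1, 1, 1, 0, 0, 0, 0]
observed, p_value = permutation_p(haplotypes, phenotypes, locus=2)
print(observed, p_value)
```

Pairs of cases (both above the mean) and pairs of controls (both below) contribute positive cross-products, so long shared segments among phenotypically similar haplotypes push the statistic up; permuting phenotypes breaks that link and gives the null distribution without relying on asymptotic normality.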