首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
The availability of genomic sequences of many organisms has opened new challenges in many aspects particularly in terms of genome analysis. Sequence extraction is a vital step and many tools have been developed to solve this issue. These tools are available publically but have limitations with reference to the sequence extraction, length of the sequence to be extracted, organism specificity and lack of user friendly interface. We have developed a java based software package having three modules which can be used independently or sequentially. The tool efficiently extracts sequences from large datasets with few simple steps. It can efficiently extract multiple sequences of any desired length from a genome of any organism. The results are crosschecked by published data.

Availability

URL 1: http://ww3.comsats.edu.pk/bio/ResearchProjects.aspxURL 2: http://ww3.comsats.edu.pk/bio/SequenceManeuverer.aspx  相似文献   

3.
4.
The advent of next-generation sequencing technologies has greatly promoted the field of metagenomics which studies genetic material recovered directly from an environment. Characterization of genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantitated through homology searches of sequence reads with known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, thereby impacting on accurate estimates of relative abundance of multiple genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After obtaining the estimated proportion of reads generated by each genome, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability by taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach of taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/hji403/MetaR.htm.  相似文献   

5.
6.
Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA but, no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell-line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6–96.8% precision and 91.6–95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MD-MB231 breast cancer cell-line delineated variant heterogeneity among the single-cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.  相似文献   

7.
Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. For the sake of their specialized phenotypic traits, many researches have been devoted to examine the evolution of bats. Until now, some whole genome sequences of bats have been assembled and annotated, however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological communities, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data of bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using efficient searching engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool was also provided to facilitate online phylogeny study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of the bat evolution and be advantageous to the bat sequences analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.  相似文献   

8.
9.

Background

Personal genome assembly is a critical process when studying tumor genomes and other highly divergent sequences. The accuracy of downstream analyses, such as RNA-seq and ChIP-seq, can be greatly enhanced by using personal genomic sequences rather than standard references. Unfortunately, reads sequenced from these types of samples often have a heterogeneous mix of various subpopulations with different variants, making assembly extremely difficult using existing assembly tools. To address these challenges, we developed SHEAR (Sample Heterogeneity Estimation and Assembly by Reference; http://vk.cs.umn.edu/SHEAR), a tool that predicts SVs, accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis.

Results

By making use of structural variant detection algorithms, SHEAR offers improved performance in the form of a stronger ability to handle difficult structural variant types and better computational efficiency. We compare against the lead competing approach using a variety of simulated scenarios as well as real tumor cell line data with known heterogeneous variants. SHEAR is shown to successfully estimate heterogeneity percentages in both cases, and demonstrates an improved efficiency and better ability to handle tandem duplications.

Conclusion

SHEAR allows for accurate and efficient SV detection and personal genomic sequence generation. It is also able to account for heterogeneous sequencing samples, such as from tumor tissue, by estimating the subpopulation percentage for each heterogeneous variant.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-84) contains supplementary material, which is available to authorized users.  相似文献   

10.
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a ‘variants reduction’ protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.  相似文献   

11.
12.
13.
Tumor samples are typically heterogeneous, containing admixture by normal, non-cancerous cells and one or more subpopulations of cancerous cells. Whole-genome sequencing of a tumor sample yields reads from this mixture, but does not directly reveal the cell of origin for each read. We introduce THetA (Tumor Heterogeneity Analysis), an algorithm that infers the most likely collection of genomes and their proportions in a sample, for the case where copy number aberrations distinguish subpopulations. THetA successfully estimates normal admixture and recovers clonal and subclonal copy number aberrations in real and simulated sequencing data. THetA is available at http://compbio.cs.brown.edu/software/.  相似文献   

14.
TP Lu  CY Lee  MH Tsai  YC Chiu  CK Hsiao  LC Lai  EY Chuang 《PloS one》2012,7(8):e42390

Background

Many prediction tools for microRNA (miRNA) targets have been developed, but inconsistent predictions were observed across multiple algorithms, which can make further analysis difficult. Moreover, the nomenclature of human miRNAs changes rapidly. To address these issues, we developed a web-based system, miRSystem, for converting queried miRNAs to the latest annotation and predicting the function of miRNA by integrating miRNA target gene prediction and function/pathway analyses.

Results

First, queried miRNA IDs were converted to the latest annotated version to prevent potential conflicts resulting from multiple aliases. Next, by combining seven algorithms and two validated databases, potential gene targets of miRNAs and their functions were predicted based on the consistency across independent algorithms and observed/expected ratios. Lastly, five pathway databases were included to characterize the enriched pathways of target genes through bootstrap approaches. Based on the enriched pathways of target genes, the functions of queried miRNAs could be predicted.

Conclusions

MiRSystem is a user-friendly tool for predicting the target genes and their associated pathways for many miRNAs simultaneously. The web server and the documentation are freely available at http://mirsystem.cgm.ntu.edu.tw/.  相似文献   

15.

Background

Copy number variants (CNVs), including deletions, amplifications, and other rearrangements, are common in human and cancer genomes. Copy number data from array comparative genome hybridization (aCGH) and next-generation DNA sequencing is widely used to measure copy number variants. Comparison of copy number data from multiple individuals reveals recurrent variants. Typically, the interior of a recurrent CNV is examined for genes or other loci associated with a phenotype. However, in some cases, such as gene truncations and fusion genes, the target of variant lies at the boundary of the variant.

Results

We introduce Neighborhood Breakpoint Conservation (NBC), an algorithm for identifying rearrangement breakpoints that are highly conserved at the same locus in multiple individuals. NBC detects recurrent breakpoints at varying levels of resolution, including breakpoints whose location is exactly conserved and breakpoints whose location varies within a gene. NBC also identifies pairs of recurrent breakpoints such as those that result from fusion genes. We apply NBC to aCGH data from 36 primary prostate tumors and identify 12 novel rearrangements, one of which is the well-known TMPRSS2-ERG fusion gene. We also apply NBC to 227 glioblastoma tumors and predict 93 novel rearrangements which we further classify as gene truncations, germline structural variants, and fusion genes. A number of these variants involve the protein phosphatase PTPN12 suggesting that deregulation of PTPN12, via a variety of rearrangements, is common in glioblastoma.

Conclusions

We demonstrate that NBC is useful for detection of recurrent breakpoints resulting from copy number variants or other structural variants, and in particular identifies recurrent breakpoints that result in gene truncations or fusion genes. Software is available at http://http.//cs.brown.edu/people/braphael/software.html.  相似文献   

16.
Background to the debate: Several studies have found disparities in the outcome of medical procedures across different hospitals—better outcomes have been associated with higher procedure volume. An Institute of Medicine workshop found such a “volume–outcome relationship” for two types of cancer surgery: resection of the pancreas and esophagus (http://www.iom.edu/?id=31508). This debate examines whether physicians have an ethical obligation to inform patients of hospital outcome disparities for these cancers.  相似文献   

17.
We performed a genome-level computational study of sequence and structure similarity, the latter using crystal structures and models, of the proteases of Homo sapiens and the human parasite Trypanosoma brucei. Using sequence and structure similarity networks to summarize the results, we constructed global views that show visually the relative abundance and variety of proteases in the degradome landscapes of these two species, and provide insights into evolutionary relationships between proteases. The results also indicate how broadly these sequence sets are covered by three-dimensional structures. These views facilitate cross-species comparisons and offer clues for drug design from knowledge about the sequences and structures of potential drug targets and their homologs. Two protease groups (“M32” and “C51”) that are very different in sequence from human proteases are examined in structural detail, illustrating the application of this global approach in mining new pathogen genomes for potential drug targets. Based on our analyses, a human ACE2 inhibitor was selected for experimental testing on one of these parasite proteases, TbM32, and was shown to inhibit it. These sequence and structure data, along with interactive versions of the protein similarity networks generated in this study, are available at http://babbittlab.ucsf.edu/resources.html.  相似文献   

18.
19.
Repetitive sequences are biologically and clinically important because they can influence traits and disease, but repeats are challenging to analyse using short-read sequencing technology. We present a tool for genotyping microsatellite repeats called RepeatSeq, which uses Bayesian model selection guided by an empirically derived error model that incorporates sequence and read properties. Next, we apply RepeatSeq to high-coverage genomes from the 1000 Genomes Project to evaluate performance and accuracy. The software uses common formats, such as VCF, for compatibility with existing genome analysis pipelines. Source code and binaries are available at http://github.com/adaptivegenome/repeatseq.  相似文献   

20.
The innate immune system is an ancient component of host defense. Since innate immunity pathways are well conserved throughout many eukaryotes, immune genes in model animals can be used to putatively identify homologous genes in newly sequenced genomes of non-model organisms. With the initiation of the “i5k” project, which aims to sequence 5,000 insect genomes by 2016, many novel insect genomes will soon become publicly available, yet few annotation resources are currently available for insects. Thus, we developed an online tool called the Insect Innate Immunity Database (IIID) to provide an open access resource for insect immunity and comparative biology research (http://www.vanderbilt.edu/IIID). The database provides users with simple exploratory tools to search the immune repertoires of five insect models (including Nasonia), spanning three orders, for specific immunity genes or genes within a particular immunity pathway. As a proof of principle, we used an initial database with only four insect models to annotate potential immune genes in the parasitoid wasp genus Nasonia. Results specify 306 putative immune genes in the genomes of N. vitripennis and its two sister species N. giraulti and N. longicornis. Of these genes, 146 were not found in previous annotations of Nasonia immunity genes. Combining these newly identified immune genes with those in previous annotations, Nasonia possess 489 putative immunity genes, the largest immune repertoire found in insects to date. While these computational predictions need to be complemented with functional studies, the IIID database can help initiate and augment annotations of the immune system in the plethora of insect genomes that will soon become available.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号