Found 20 similar articles (search time: 545 ms)
1.
Jonah D. Hocum Logan R. Battrell Ryan Maynard Jennifer E. Adair Brian C. Beard David J. Rawlings Hans-Peter Kiem Daniel G. Miller Grant D. Trobridge 《BMC bioinformatics》2015,16(1)
Background
Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens.
Results
We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads.
Conclusions
VISA offers a simple web interface to upload sequence files, and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0653-6) contains supplementary material, which is available to authorized users.
2.
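The filtering steps described in the VISA abstract (entry 1) can be illustrated with a minimal Python sketch. It is not VISA's actual implementation. The `Alignment` record, the `min_score` threshold, and the rule that a tie between the two best alignment scores marks a read as non-unique are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one candidate alignment of a provirus-containing read."""
    read_id: str
    chrom: str
    pos: int       # genomic coordinate of the vector-genome junction
    strand: str
    score: int     # alignment score; higher is better


def unique_integration_sites(alignments, min_score=30):
    """Return {(chrom, pos, strand): supporting read count} for uniquely placed reads."""
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln.read_id].append(aln)

    sites = defaultdict(int)
    for hits in by_read.values():
        hits.sort(key=lambda a: a.score, reverse=True)
        best = hits[0]
        if best.score < min_score:
            continue  # assumed quality cutoff
        if len(hits) > 1 and hits[1].score == best.score:
            continue  # tie for best score: read has no unique location
        sites[(best.chrom, best.pos, best.strand)] += 1
    return dict(sites)


example = [
    Alignment("r1", "chr1", 100, "+", 50),
    Alignment("r2", "chr2", 500, "-", 45),
    Alignment("r2", "chr7", 900, "-", 45),   # equally good second hit
    Alignment("r3", "chr1", 100, "+", 40),
    Alignment("r3", "chr5", 300, "+", 20),   # clearly worse second hit
    Alignment("r4", "chr3", 700, "+", 10),   # below the cutoff
]
sites = unique_integration_sites(example)
```

Here `r1` and `r3` support the same site, `r2` is dropped as ambiguous, and `r4` is dropped for its low score.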
3.
4.
Background
The CCCTC-binding factor (CTCF) is a highly conserved insulator protein that plays various roles in many cellular processes. CTCF is one of the main architectural proteins in higher eukaryotes and, in combination with other architectural proteins and regulators, also shapes the three-dimensional organization of a genome. Experiments show that CTCF partially remains associated with chromatin during mitosis. However, the role of CTCF in the maintenance and propagation of genome architectures throughout the cell cycle remains elusive.
Results
We performed a comprehensive bioinformatics analysis on public datasets of Drosophila CTCF (dCTCF). We characterized dCTCF-binding sites according to their occupancy status during the cell cycle, and identified three classes: interphase-mitosis-common (IM), interphase-only (IO) and mitosis-only (MO) sites. Integrated function analysis showed that dCTCF-binding sites of different classes might be involved in different biological processes, and that IM sites were more conserved and more intensely bound. dCTCF-binding sites of the same class preferentially localized closer to each other, and were highly enriched at chromatin syntenic and topologically associating domain boundaries.
Conclusions
Our results revealed different functions of dCTCF during the cell cycle and suggested that dCTCF might contribute to the establishment of the three-dimensional architecture of the Drosophila genome by maintaining local chromatin compartments throughout the whole cell cycle.
Electronic supplementary material
The online version of this article (doi:10.1186/s40659-015-0019-6) contains supplementary material, which is available to authorized users.
5.
Background
Transposable elements constitute an important part of the genome and are essential in adaptive mechanisms. Transposition events associated with phenotypic changes occur naturally or are induced in insertional mutant populations. Transposon mutagenesis results in multiple random insertions, and recovering most or all of the insertions is critical for forward genetic studies. Using genomic next-generation sequencing data and an appropriate bioinformatics tool, it is possible to accurately identify transposon insertion sites, which could provide candidate causal mutations for desired phenotypes for further functional validation.
Results
We developed a novel bioinformatics tool, ITIS (Identification of Transposon Insertion Sites), for localizing transposon insertion sites within a genome. It takes next-generation genome re-sequencing data (NGS data), the transposon sequence, and the reference genome sequence as input, and generates a list of highly reliable candidate insertion sites as well as zygosity information for each insertion. Using a simulated dataset and a case study based on an insertional mutant line from Medicago truncatula, we showed that ITIS performed better in terms of sensitivity and specificity than other similar algorithms such as RelocaTE, RetroSeq, TEMP and TIF. With the case study data, we demonstrated the efficiency of ITIS by validating the presence and zygosity of predicted insertion sites of the Tnt1 transposon within a complex plant system, M. truncatula.
Conclusion
This study showed that ITIS is a robust and powerful tool for forward genetic studies in identifying transposable element insertions causing phenotypes. ITIS is suitable for various systems such as cell culture, bacteria, yeast, insect, mammal and plant.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0507-2) contains supplementary material, which is available to authorized users.
6.
Beth L. Dumont 《BMC genomics》2015,16(1)
Background
Interlocus gene conversion (IGC) is a recombination-based mechanism that results in the unidirectional transfer of short stretches of sequence between paralogous loci. Although IGC is a well-established mechanism of human disease, the extent to which this mutagenic process has shaped overall patterns of segregating variation in multi-copy regions of the human genome remains unknown. One expected manifestation of IGC in population genomic data is the presence of one-to-one paralogous SNPs that segregate identical alleles.
Results
Here, I use SNP genotype calls from the low-coverage phase 3 release of the 1000 Genomes Project to identify 15,790 parallel, shared SNPs in duplicated regions of the human genome. My approach for identifying these sites accounts for the potential redundancy of short read mapping in multi-copy genomic regions, thereby effectively eliminating false positive SNP calls arising from paralogous sequence variation. I demonstrate that independent mutation events to identical nucleotides at paralogous sites are not a significant source of shared polymorphisms in the human genome, consistent with the interpretation that these sites are the outcome of historical IGC events. These putative signals of IGC are enriched in genomic contexts previously associated with non-allelic homologous recombination, including clear signals in gene families that form tandem intra-chromosomal clusters.
Conclusions
Taken together, my analyses implicate IGC, not point mutation, as the mechanism generating at least 2.7% of single nucleotide variants in duplicated regions of the human genome.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1681-3) contains supplementary material, which is available to authorized users.
7.
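The signal described in the IGC abstract (entry 6), one-to-one paralogous SNPs that segregate identical alleles, can be illustrated with a small Python sketch. It shows only the comparison step and does not reproduce the paper's correction for redundant short-read mapping. The input layout (a dictionary of allele sets keyed by position, plus a list of paralogous position pairs) and all names are assumptions.

```python
def shared_paralogous_snps(snps, paralog_pairs):
    """Return paralogous position pairs where both sites are SNPs with identical alleles.

    snps: dict mapping (chrom, pos) -> frozenset of segregating alleles (hypothetical layout)
    paralog_pairs: iterable of ((chrom, pos), (chrom, pos)) one-to-one paralogous positions
    """
    shared = []
    for pos_a, pos_b in paralog_pairs:
        alleles_a = snps.get(pos_a)
        alleles_b = snps.get(pos_b)
        # Both positions must be polymorphic and carry the same allele set.
        if alleles_a is not None and alleles_a == alleles_b:
            shared.append((pos_a, pos_b))
    return shared


snps = {
    ("chr1", 100): frozenset("AG"),
    ("chr1", 900): frozenset("AG"),
    ("chr2", 50): frozenset("CT"),
    ("chr2", 70): frozenset("CG"),
}
pairs = [
    (("chr1", 100), ("chr1", 900)),  # same alleles at both copies
    (("chr2", 50), ("chr2", 70)),    # both polymorphic, different alleles
    (("chr3", 1), ("chr3", 2)),      # neither site is a SNP
]
shared = shared_paralogous_snps(snps, pairs)
```

Only the first pair qualifies as a shared SNP. A real analysis would also need to rule out mis-mapped reads, as the abstract notes.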
8.
9.
James Murphy Jochen Klumpp Jennifer Mahony Mary O’Connell-Motherway Arjen Nauta Douwe van Sinderen 《BMC genomics》2014,15(1)
Background
So-called 936-type phages are among the most frequently isolated phages in dairy facilities utilising Lactococcus lactis starter cultures. Despite extensive efforts to control phage proliferation and decades of research, these phages continue to negatively impact cheese production in terms of final product quality and, consequently, monetary return.
Results
Whole genome sequencing and in silico analysis of three 936-type phage genomes identified several putative (orphan) methyltransferase (MTase)-encoding genes located within the packaging and replication regions of the genome. Utilising SMRT sequencing, methylome analysis was performed on all three phages, allowing the identification of adenine modifications consistent with N-6 methyladenine sequence methylation, which in some cases could be attributed to these phage-encoded MTases. Heterologous gene expression revealed that M.Phi145I/M.Phi93I and M.Phi93DAM, encoded by genes located within the packaging module, provide protection against the restriction enzymes HphI and DpnII, respectively, representing the first functional MTases identified in members of 936-type phages.
Conclusions
SMRT sequencing technology enabled the identification of the target motifs of MTases encoded by the genomes of three lytic 936-type phages, and these MTases represent the first functional MTases identified in this species of phage. The presence of these MTase-encoding genes on 936-type phage genomes is assumed to represent an adaptive response to circumvent host-encoded restriction-modification systems, thereby increasing the fitness of the phages in a dynamic dairy environment.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-831) contains supplementary material, which is available to authorized users.
10.
11.
12.
Ja-Rang Lee Chang Pyo Hong Jae-Woo Moon Yi-Deun Jung Dae-Soo Kim Tae-Hyung Kim Jeong-An Gim Jin-Han Bae Yuri Choi Jungwoo Eo Yun-Jeong Kwon Sanghoon Song Junsu Ko Young Mok Yang Hak-Kyo Lee Kyung-Do Park Kung Ahn Kyoung-Tag Do Hong-Seok Ha Kyudong Han Joo Mi Yi Hee-Jae Cha Byung-Wook Cho Jong Bhak Heui-Soo Kim 《BMC genomics》2014,15(1)
13.
Background
The correct taxonomic assignment of bacterial genomes is a primary and challenging task. With the availability of whole genome sequences, gene content-based approaches appear promising for inferring bacterial taxonomy. The complete sequencing of a bacterial genome often reveals a substantial number of unique genes present only in that genome, which can be used for its taxonomic classification.
Results
In this study, we have proposed a comprehensive method which uses taxon-specific genes for the correct taxonomic assignment of existing and new bacterial genomes. The taxon-specific genes identified at each taxonomic rank have been successfully used for the taxonomic classification of 2,342 genomes present in the NCBI genomes, 36 newly sequenced genomes, and 17 genomes for which the complete taxonomy is not yet known. This approach has been implemented in a tool, ‘Microtaxi’, which can be used for the taxonomic assignment of complete bacterial genomes.
Conclusion
The taxon-specific gene-based approach provides a valuable alternative methodology for the taxonomic classification of newly sequenced or existing bacterial genomes.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1542-0) contains supplementary material, which is available to authorized users.
14.
Dario Copetti Jianwei Zhang Moaine El Baidouri Dongying Gao Jun Wang Elena Barghini Rosa M. Cossu Angelina Angelova Carlos E. Maldonado L. Stefan Roffler Hajime Ohyanagi Thomas Wicker Chuanzhu Fan Andrea Zuccolo Mingsheng Chen Antonio Costa de Oliveira Bin Han Robert Henry Yue-ie Hsing Nori Kurata Wen Wang Scott A. Jackson Olivier Panaud Rod A. Wing 《BMC genomics》2015,16(1)
Background
Comparative evolutionary analysis of whole genomes requires not only accurate annotation of gene space, but also proper annotation of the repetitive fraction, which is often the largest component of most, if not all, genomes larger than 50 kb in size.
Results
Here we present the Rice TE database (RiTE-db) - a genus-wide collection of transposable elements and repeated sequences across 11 diploid species of the genus Oryza and the closely related out-group Leersia perrieri. The database consists of more than 170,000 entries divided into three main types: (i) a classified and curated set of publicly available repeated sequences; (ii) a set of consensus assemblies of highly repetitive sequences obtained from genome sequencing surveys of 12 species; and (iii) a set of full-length TEs, identified and extracted from 12 whole genome assemblies.
Conclusions
This is the first report of a repeat dataset that spans the majority of repeat variability within an entire genus, and one that includes complete elements as well as unassembled repeats. The database allows sequence browsing, downloading, and similarity searches. Because of the strategy adopted, the RiTE-db opens a new path to unprecedented direct comparative studies that span the entire nuclear repeat content of 15 million years of Oryza diversity.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1762-3) contains supplementary material, which is available to authorized users.
15.
Background
Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging.
Results
To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms.
Conclusion
Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users.
16.
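For the AAF abstract (entry 15), the idea of computing pairwise distances directly from unassembled reads can be illustrated with a small k-mer sketch in Python. The abstract does not state the distance formula, so the one used here, D = -(1/k) ln(n_s / n_t), with n_s the number of shared k-mers and n_t the smaller of the two k-mer set sizes, is an assumption. The function names are hypothetical, and the sketch ignores reverse complements and sequencing-error filtering.

```python
import math


def kmers(reads, k):
    """Collect the set of all k-mers that occur in a list of read strings."""
    found = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            found.add(read[i:i + k])
    return found


def kmer_distance(reads_a, reads_b, k):
    """Alignment-free distance between two read sets based on shared k-mers."""
    a, b = kmers(reads_a, k), kmers(reads_b, k)
    shared = len(a & b)
    total = min(len(a), len(b))   # assumed normalisation
    if shared == 0:
        return math.inf           # no shared k-mers: distance is undefined, report infinity
    return -math.log(shared / total) / k


d_same = kmer_distance(["ACGTACGT"], ["ACGTACGT"], 3)
d_close = kmer_distance(["ACGTA"], ["ACGTT"], 3)   # 2 of 3 k-mers shared
d_far = kmer_distance(["AAAA"], ["CCCC"], 3)
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method.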
Anuj Srivastava Vivek M Philip Ian Greenstein Lucy B Rowe Mary Barter Cathleen Lutz Laura G Reinholdt 《BMC genomics》2014,15(1)
Background
Transgenesis by random integration of a transgene into the genome of a zygote has become a reliable and powerful method for the creation of new mouse strains that express exogenous genes, including human disease genes, tissue-specific reporter genes or genes that allow for tissue-specific recombination. Nearly 6,500 transgenic alleles have been created by random integration in embryos over the last 30 years, but for the vast majority of these strains, the transgene insertion sites remain uncharacterized.
Results
To obtain a complete understanding of how insertion sites might contribute to phenotypic outcomes, to manage transgenic strains more cost-effectively, and to fully understand mechanisms of instability in transgene expression, we’ve developed a methodology and a scoring scheme for transgene insertion site discovery using high-throughput sequencing data.
Conclusions
Similar to other molecular approaches to transgene insertion site discovery, high-throughput sequencing of standard paired-end libraries is hindered by low signal-to-noise ratios. This problem is exacerbated when the transgene consists of sequences that are also present in the host genome. We’ve found that high-throughput sequencing data from mate-pair libraries are more informative than data from standard paired-end libraries. We also show examples of the genomic regions that harbor transgenes, which have in common a preponderance of repetitive sequences.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-367) contains supplementary material, which is available to authorized users.
17.
18.
19.
Lisa Klasson Nikhil Kumar Robin Bromley Karsten Sieber Melissa Flowers Sandra H Ott Luke J Tallon Siv G E Andersson Julie C Dunning Hotopp 《BMC genomics》2014,15(1)