Similar literature
1.
The rapid development of high-throughput sequencing technologies has dramatically reduced the money and time required for de novo genome sequencing and genome resequencing projects, and new genome sequences are released every week. The resulting plethora of updated genome assemblies creates a need for version-dependent annotation files and other compatible public datasets for downstream analysis. To handle these tasks efficiently, we developed the reference-based genome assembly and annotation tool (RGAAT), a flexible toolkit for resequencing-based consensus building and annotation update. On the four DNA-seq datasets tested in this study, RGAAT detects sequence variants with precision, specificity, and sensitivity comparable to GATK and with higher precision and specificity than Freebayes and SAMtools. RGAAT can also identify sequence variants based on cross-cultivar or cross-version genomic alignments. Unlike GATK and SAMtools/BCFtools, RGAAT builds the consensus sequence by taking into account the true allele frequency. Finally, RGAAT generates a coordinate conversion file between the reference and query genomes using sequence variants and supports annotation file transfer. Compared to the rapid annotation transfer tool (RATT), RGAAT performs better at annotation transfer between different genome assemblies, strains, and species. In addition, RGAAT can be used for genome modification, genome comparison, and coordinate conversion. RGAAT is available at no cost at https://sourceforge.net/projects/rgaat/ and https://github.com/wushyer/RGAAT_v2.

2.
The dynamic activity of transposable elements (TEs) contributes to the vast diversity of genome size and architecture among plants. Here, we examined the genomic distribution and transposition activity of long terminal repeat retrotransposons (LTR-RTs) in Arabidopsis thaliana (Ath) and three of its relatives in Brassicaceae: Arabidopsis lyrata (Aly), Eutrema salsugineum (Esa), and Schrenkiella parvula (Spa). Our analyses revealed distinct evolutionary dynamics of Gypsy retrotransposons, which reflect the different patterns of genome size change in the four species over the past million years. The rate of Gypsy transposition in Aly is approximately five times that in Ath and Esa, suggesting an expanding Aly genome. Gypsy insertions in Esa are strictly confined to pericentromeric heterochromatin and associated with dramatic centromere expansion. In contrast, Gypsy insertions in Spa have been largely suppressed over the last million years, likely as a result of a combination of an inherent molecular mechanism of preferential DNA removal and purifying selection at Gypsy elements. Additionally, species-specific clades of Gypsy elements shaped the distinct genome architectures of Aly and Esa.

3.
Genome reannotation aims for complete and accurate characterization of gene models and is thus of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to use these valuable data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome based on integration of large-scale RNA-seq data and release a new annotation system, IC4R-2.0. In general, IC4R-2.0 significantly improves the completeness of gene structure, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that, compared to previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to the massive RNA-seq data applied in genome annotation. Consequently, we incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and accordingly update IC4R by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, equipped with the improved annotations, holds great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.

4.
Mutated genes are rarely shared between cancer patients, even within the same pathological type, so it has been very challenging to interpret genome sequencing data and to predict clinical outcomes. PIK3CA is one of a few genes whose mutations are relatively common in tumors. For example, more than 46.6% of luminal A breast cancer samples have PIK3CA mutated, whereas only 35.5% of all breast cancer samples contain PIK3CA mutations. To understand the function of PIK3CA mutations in luminal A breast cancer, we applied our recently proposed Cancer Hallmark Network Framework to investigate the network motifs in the PIK3CA-mutated luminal A tumors. We found that more than 70% of the PIK3CA-mutated luminal A tumors contain a positive regulatory loop in which a master regulator (PDGF-D), a second regulator (FLT1), and an output node (SHC1) work together. Importantly, we found that luminal A breast cancer patients whose tumors harbor both the PIK3CA mutation and this positive regulatory loop have significantly longer survival than those whose tumors harbor the PIK3CA mutation only. These findings suggest that the molecular mechanism underlying PIK3CA mutations in luminal A patients can involve a positive regulatory loop, and furthermore that this loop (PDGF-D/FLT1/SHC1) has predictive power for the survival of PIK3CA-mutated luminal A patients.

5.
6.
We offer a guide to de novo genome assembly using sequence data generated by the Illumina platform for biologists working with fungi or other organisms whose genomes are less than 100 Mb in size. The guide requires no familiarity with sequencing assembly technology or associated computer programs. It defines commonly used terms in genome sequencing and assembly; provides examples of assembling short-read genome sequence data for four strains of the fungus Grosmannia clavigera using four assembly programs; gives examples of protocols and software; and presents a commented flowchart that extends from DNA preparation for submission to a sequencing center, through to processing and assembly of the raw sequence reads using freely available operating systems and software.

7.
The Genome Warehouse (GWH) is a public repository housing genome assembly data for a wide range of species and delivering a series of web services for genome data submission, storage, release, and sharing. As one of the core resources in the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GWH accepts both full and partial (chloroplast, mitochondrion, and plasmid) genome sequences with different assembly levels, as well as updates of existing genome assemblies. For each assembly, GWH collects detailed genome-related metadata on the biological project, biological sample, and genome assembly, in addition to the genome sequence and annotation. To archive high-quality genome sequences and annotations, GWH is equipped with a uniform and standardized procedure for quality control. Besides basic browse and search functionalities, all released genome sequences and annotations can be visualized with JBrowse. As of May 21, 2021, GWH had received 19,124 direct submissions covering 1108 species and had released 8772 of them. Collectively, GWH serves as an important resource for genome-scale data management and provides free and publicly accessible data to support research activities throughout the world. GWH is publicly accessible at https://ngdc.cncb.ac.cn/gwh.

8.
Alfalfa (Medicago sativa L.) is the most important legume forage crop worldwide, with high nutritional value and yield. For a long time, alfalfa breeding was hampered by a lack of reliable information on the autotetraploid genome and of molecular markers linked to important agronomic traits. Here we report the de novo assembly of the allele-aware chromosome-level genome of Zhongmu-4, a cultivar widely cultivated in China, and a comprehensive database of genomic variations based on resequencing of...

9.
The oral cavity of each person is home to hundreds of bacterial species. While taxa for oral diseases have been studied using culture-based characterization as well as amplicon sequencing, metagenomic and genomic information remains scarce compared to the fecal microbiome. Here, using metagenomic shotgun data for 3346 oral metagenomic samples together with 808 published samples, we obtain 56,213 metagenome-assembled genomes (MAGs), and more than 64% of the 3589 species-level genome bins (SGBs) contai...

10.
Wild castor grows in the high-altitude tropical desert of the African Plateau, a region known for high ultraviolet radiation, strong light, and extremely dry conditions. To investigate the potential genetic basis of adaptation to both highland and tropical deserts, we generated a chromosome-level genome sequence assembly of the wild castor accession WT05, with a genome size of 316 Mb, a scaffold N50 of 31.93 Mb, and a contig N50 of 8.96 Mb. Compared with cultivated castor and other Euphorbiac...

11.
12.
Ceratocystis fimbriata sensu lato represents a complex of cryptic and commonly plant pathogenic species that are morphologically similar. Species in this complex have been described using morphological characteristics, intersterility tests and phylogenetics. Microsatellite markers have been useful to study the population structure and origin of some species in the complex. In this study we sequenced the genome of C. fimbriata. This provided an opportunity to mine the genome for microsatellites, to develop new microsatellite markers, and to map previously developed markers onto the genome. Over 6000 microsatellites were identified in the genome and their abundance and distribution were determined. Ceratocystis fimbriata has a medium level of microsatellite density and a slightly smaller genome when compared with other fungi for which similar microsatellite analyses have been performed. This is the first report of a microsatellite analysis conducted on a genome sequence of a fungal species in the order Microascales. Forty-seven microsatellite markers have been published for population genetic studies, of which 35 could be mapped onto the C. fimbriata genome sequence. We developed an additional ten microsatellite markers within putative genes to differentiate between species in the C. fimbriata s.l. complex. These markers were used to distinguish between 12 species in the complex.
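
As an illustration of what mining a genome for microsatellites can look like, here is a minimal Python sketch that scans a sequence for perfect tandem repeats with a regular expression. The abstract does not say which software or repeat thresholds the authors used; the function names and the minimum repeat counts below are assumptions chosen only for the example.

```python
import re

# Minimum repeat counts per motif length (assumed thresholds, not from the abstract).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ACAC')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR, sorted by start."""
    sequence = sequence.upper()
    hits = []
    for size, minimum in min_repeats.items():
        # One motif of the given size followed by at least (minimum - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, minimum - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs such as 'AA' or 'ACAC' that are covered by a shorter size.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


print(find_ssrs("GG" + "AC" * 7 + "TTG"))  # [(2, 'AC', 7)]
```

The sketch only finds perfect repeats on one strand; dedicated SSR tools typically also handle imperfect and compound repeats.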

13.

Background

With the development of several new technologies using synthetic biology, it is possible to engineer genetically intractable organisms, including Mycoplasma mycoides subspecies capri (Mmc), by cloning the intact bacterial genome in yeast, using the host yeast’s genetic tools to modify the cloned genome, and subsequently transplanting the modified genome into a recipient cell to obtain mutant cells encoded by the modified genome. The recently described tandem repeat coupled with endonuclease cleavage (TREC) method has been successfully used to generate seamless deletions and point mutations in the mycoplasma genome using the yeast DNA repair machinery. However, attempts to knock in genes have in some cases encountered a high background of transformation due to maintenance of unwanted circularization of the transforming DNA, which contains possible autonomously replicating sequence (ARS) activity. To overcome this issue, we incorporated a split marker system into the TREC method, enabling seamless gene knock-in with high efficiency. The modified method is called TREC-assisted gene knock-in (TREC-IN). Since a gene to be knocked in is delivered by a truncated non-functional marker, the background caused by an incomplete integration is essentially eliminated.

Results

In this paper, we demonstrate applications of the TREC-IN method in gene complementation and genome minimization studies in Mmc. In the first example, the Mmc dnaA gene was seamlessly replaced, with high efficiency and low background, by an orthologous gene that shares a high degree of identity at the nucleotide level with the original Mmc gene. In the minimization example, we deleted a cluster of non-essential genes and reinserted into the genome an essential gene that had been located in the middle of that cluster, again with low backgrounds of transformation and high efficiency.

Conclusion

Although we have demonstrated the feasibility of TREC-IN only in gene complementation and genome minimization studies in Mmc, its applicability ranges widely. This method proves to be a valuable genetic tool that can be extended to genomic engineering in other genetically intractable organisms, where it may be used to elucidate specific metabolic pathways and in rational vaccine design.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1180) contains supplementary material, which is available to authorized users.

14.
The Synthetic Yeast Genome Project (Sc2.0) aims to build 16 designer yeast chromosomes and combine them into a single yeast cell. To date one synthetic chromosome, synIII, and one synthetic chromosome arm, synIXR, have been constructed and their in vivo function validated in the absence of the corresponding wild type chromosomes. An important design feature of Sc2.0 chromosomes is the introduction of PCRTags, which are short, re-coded sequences within open reading frames (ORFs) that enable differentiation of synthetic chromosomes from their wild type counterparts. PCRTag primers anneal selectively to either synthetic or wild type chromosomes and the presence/absence of each type of DNA can be tested using a simple PCR assay. The standard readout of the PCRTag assay is to assess presence/absence of amplicons by agarose gel electrophoresis. However, with an average PCRTag amplicon density of one per 1.5 kb and a genome size of ~12 Mb, the completed Sc2.0 genome will encode roughly 8,000 PCRTags. To improve throughput, we have developed a real time PCR-based detection assay for PCRTag genotyping that we call qPCRTag analysis. The workflow specifies 500 nl reactions in a 1,536 multiwell plate, allowing us to test up to 768 PCRTags with both synthetic and wild type primer pairs in a single experiment.
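
The PCRTag figures quoted in this abstract follow from simple arithmetic, sketched below in Python. The genome size, tag density, and plate size come from the abstract; the function names are hypothetical, and the plate count at the end is an extrapolation that assumes every well holds one genotyping reaction, with no controls or replicates.

```python
GENOME_SIZE_BP = 12_000_000   # ~12 Mb, from the abstract
BP_PER_PCRTAG = 1_500         # one PCRTag amplicon per 1.5 kb, from the abstract
WELLS_PER_PLATE = 1_536       # from the abstract
PRIMER_PAIRS_PER_TAG = 2      # one synthetic and one wild type primer pair


def expected_pcrtags(genome_bp, bp_per_tag):
    """Approximate number of PCRTags encoded in the genome."""
    return genome_bp // bp_per_tag


def tags_per_plate(wells, pairs_per_tag):
    """PCRTags testable on one plate when each tag needs one well per primer pair."""
    return wells // pairs_per_tag


def plates_needed(n_tags, per_plate):
    """Ceiling division; assumes no wells are reserved for controls (assumption)."""
    return -(-n_tags // per_plate)


tags = expected_pcrtags(GENOME_SIZE_BP, BP_PER_PCRTAG)
per_plate = tags_per_plate(WELLS_PER_PLATE, PRIMER_PAIRS_PER_TAG)
print(tags, per_plate, plates_needed(tags, per_plate))  # 8000 768 11
```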

15.
16.
Artemia is an industrially important genus used in aquaculture as a nutritious diet for fish and as an aquatic model organism for toxicity tests. However, despite the significance of Artemia, genomic research remains incomplete and knowledge of its genomic characteristics is insufficient. In particular, Artemia franciscana of North America has been widely used in fisheries of other continents, resulting in invasion of native species. Therefore, studies on population genetics and molecular marker development as well as morphological analyses are required to investigate its population structure and to discriminate closely related species. Here, we used the Illumina Hi-Seq platform to estimate the genomic characteristics of A. franciscana through genome survey sequencing (GSS). Further, simple sequence repeat (SSR) loci were identified for microsatellite marker development. The predicted genome size was ∼867 Mb using K-mer (a sequence of k characters in a string) analysis (K = 17), and heterozygosity and duplication rates were 0.655% and 0.809%, respectively. A total of 421,467 SSRs were identified from the genome survey assembly, most of which were dinucleotide motifs with a frequency of 77.22%. The present study will be a useful basis for genomic and genetic research on A. franciscana.
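
K-mer based genome size estimation is commonly summarized by the relation: genome size ≈ total number of k-mers in the reads divided by the k-mer depth at the main peak of the k-mer frequency histogram. The Python sketch below illustrates that relation on toy data. It is not the pipeline used in the study, which the abstract does not name; the function names and the `min_depth` error filter are assumptions, and reverse complements and heterozygous peaks are ignored for simplicity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(kmer_counts, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    k-mers seen fewer than min_depth times are treated as sequencing errors
    and dropped; this cutoff is an assumption of the sketch.
    """
    histogram = Counter(kmer_counts.values())  # depth -> number of distinct k-mers
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=lambda depth: usable[depth])
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy data: a 20 bp "genome" sequenced five times without errors.
genome = "ACGTTGCAAGCTTAGGCATC"
reads = [genome] * 5
print(estimate_genome_size(count_kmers(reads, k=4)))  # 17.0
```

On this toy input the estimate equals the number of distinct 4-mers in the sequence (20 - 4 + 1 = 17), which approaches the true genome length when the genome is much longer than k.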

17.
The human liver fluke, Opisthorchis viverrini, has been categorized as a class one carcinogenic organism because of its strong association with cholangiocarcinoma, a bile duct cancer with a high incidence in northeastern Thailand. The lack of a genome database for this parasite has limited studies aimed at understanding the basic molecular biology of this carcinogenic liver fluke. Determining the genome size is an initial step before full genome sequencing. In this study, we applied absolute quantitative real-time polymerase chain reaction for this purpose. Our results indicate that the genome size of O. viverrini is 75.95 Mb, or a C value of 0.083. Knowing the O. viverrini genome size is useful for estimating the sequence coverage and cost of whole genome sequencing of the parasite using next-generation sequencing technologies.
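
The two figures in this abstract can be related by converting base pairs to DNA mass. The abstract does not state the conversion it used; the Python sketch below assumes an average molar mass of 660 g/mol per base pair, a common approximation, and reads the C value in picograms. Under that assumption 75.95 Mb corresponds to about 0.083 pg. The other widely used convention, 1 pg ≈ 978 Mb, would give roughly 0.078 pg instead.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
BP_MOLAR_MASS_G = 660.0         # g/mol per base pair (assumed average, not from the abstract)
PICOGRAMS_PER_GRAM = 1e12


def genome_mass_pg(genome_size_bp, bp_molar_mass=BP_MOLAR_MASS_G):
    """Mass in picograms of one haploid genome copy of the given size."""
    return genome_size_bp * bp_molar_mass / AVOGADRO * PICOGRAMS_PER_GRAM


print(round(genome_mass_pg(75.95e6), 3))  # 0.083
```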

18.
The genome sequence of Manduca sexta was recently determined using 454 technology. Cufflinks and MAKER2 were used to establish gene models in the genome assembly based on the RNA-Seq data and other species' sequences. Aided by the extensive RNA-Seq data from 50 tissue samples at various life stages, annotators around the world (including the present authors) have manually confirmed and improved a small percentage of the models over months of effort. While such collaborative efforts are highly commendable, many of the predicted genes still have problems which may hamper future research on this insect species. As a biochemical model representing lepidopteran pests, M. sexta has been used extensively to study insect physiological processes for over five decades. In this work, we assembled Manduca datasets Cufflinks 3.0, Trinity 4.0, and Oases 4.0 to assist the manual annotation efforts and development of Official Gene Set (OGS) 2.0. To further improve annotation quality, we developed methods to evaluate gene models in the MAKER2, Cufflinks, Oases and Trinity assemblies and selected the best ones to constitute MCOT 1.0 after thorough crosschecking. MCOT 1.0 has 18,089 genes encoding 31,666 proteins: 32.8% match OGS 2.0 models perfectly or near perfectly, 11,747 differ considerably, and 29.5% are absent in OGS 2.0. Future automation of this process is anticipated to greatly reduce human efforts in generating comprehensive, reliable models of structural genes in other genome projects where extensive RNA-Seq data are available.

19.
Proteins responsible for the integrity of the genome are frequently used as targets in drug therapies against various diseases. Inhibitors of these proteins are also important for studying the pathways of genome integrity maintenance. A prominent example is Ugi, a well-known cross-species inhibitor protein of the enzyme uracil-DNA glycosylase, which is responsible for uracil excision from DNA. Here, we report that a Staphylococcus pathogenicity island repressor protein called StlSaPIbov1 (Stl) exhibits potent dUTPase inhibition in Mycobacteria. To our knowledge, this is the first indication of a cross-species inhibitor protein for any dUTPase. We demonstrate that the Staphylococcus aureus Stl and the Mycobacterium tuberculosis dUTPase form a stable complex and that in this complex, the enzymatic activity of dUTPase is strongly inhibited. We also found that expression of the Stl protein in Mycobacterium smegmatis led to highly increased cellular dUTP levels, an effect in agreement with its dUTPase inhibitory role. In addition, Stl expression in M. smegmatis drastically decreased colony-forming ability, indicating significant perturbation of the phenotype. Therefore, we propose that Stl can be considered a cross-species dUTPase inhibitor and may be used as an important reagent in dUTPase inhibition experiments in vitro/in situ or in vivo.

20.