Similar articles
 20 similar articles found (search time: 31 ms)
1.
Mining long noncoding RNA in livestock   Total citations: 2 (self-citations: 0, citations by others: 2)

2.
3.
4.
Grapevine is an important perennial fruit to the wine industry, and has implications for the health industry with some causative agents proven to reduce heart disease. Since the sequencing and assembly of grapevine cultivar Pinot Noir, several studies have contributed to its genome annotation. This new study further contributes toward genome annotation efforts by conducting a proteogenomics analysis using the latest genome annotation from CRIBI, legacy proteomics dataset from cultivar Cabernet Sauvignon and a large RNA‐seq dataset. A total of 341 novel annotation events are identified consisting of five frame‐shifts, 37 translated UTRs, 15 exon boundaries, one novel splice, nine novel exons, 159 gene boundaries, 112 reverse strands, and one novel gene event in 213 genes and 323 proteins. From this proteogenomics evidence, the Augustus gene prediction tool predicted 52 novel and revised genes (54 protein isoforms), 11 genes of which are associated with key traits such as stress tolerance and floral and fruity wine characteristics. This study also highlights a likely over‐assembly with the genome, particularly on chromosome 7.

5.
As an increasing number of plant genome sequences become available, it is clear that gene content varies between individuals, and the challenge arises to predict the gene content of a species. However, genome comparison is often confounded by variation in assembly and annotation. Differentiating between true gene absence and variation in assembly or annotation is essential for the accurate identification of conserved and variable genes in a species. Here, we present the de novo assembly of the Brassica napus cultivar Tapidor and comparison with an improved assembly of the B. napus cultivar Darmor‐bzh. Both cultivars were annotated using the same method to allow comparison of gene content. We identified genes unique to each cultivar and differentiated these from artefacts due to variation in the assembly and annotation. We demonstrate that using a common annotation pipeline can result in different gene predictions, even for closely related cultivars, and that repeat regions which collapse during assembly impact whole genome comparison. After accounting for differences in assembly and annotation, we demonstrate that the genome of Darmor‐bzh contains a greater number of genes than the genome of Tapidor. Our results are the first step towards comparison of the true differences between B. napus genomes and highlight the potential sources of error in future production of a B. napus pangenome.

6.
7.
The 1.5 Gbp/2C genome of pedunculate oak (Quercus robur) has been sequenced. A strategy was established for dealing with the challenges imposed by the sequencing of such a large, complex and highly heterozygous genome by a whole‐genome shotgun (WGS) approach, without the use of costly and time‐consuming methods, such as fosmid or BAC clone‐based hierarchical sequencing methods. The sequencing strategy combined short and long reads. Over 49 million reads provided by Roche 454 GS‐FLX technology were assembled into contigs and combined with shorter Illumina sequence reads from paired‐end and mate‐pair libraries of different insert sizes, to build scaffolds. Errors were corrected and gaps filled with Illumina paired‐end reads and contaminants detected, resulting in a total of 17 910 scaffolds (>2 kb) corresponding to 1.34 Gb. Fifty per cent of the assembly was accounted for by 1468 scaffolds (N50 of 260 kb). Initial comparison with the phylogenetically related Prunus persica gene model indicated that genes for 84.6% of the proteins present in peach (mean protein coverage of 90.5%) were present in our assembly. The second and third steps in this project are genome annotation and the assignment of scaffolds to the oak genetic linkage map. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the oak genome data have been released into public sequence repositories in advance of publication. In this presubmission paper, the oak genome consortium describes its principal lines of work and future directions for analyses of the nature, function and evolution of the oak genome.
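
The N50 statistic quoted in this abstract (half of the assembly contained in scaffolds of at least that length) can be illustrated with a minimal sketch. The function name `n50` and the toy scaffold lengths are assumptions made for illustration only; they are not taken from the oak assembly or from any tool used by the authors.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    make up at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kb (illustrative values only).
toy_scaffolds_kb = [400, 300, 260, 150, 100, 50]
print(n50(toy_scaffolds_kb))  # 400 + 300 = 700 >= 630, so this prints 300
```

Counting how many scaffolds are consumed before the loop returns would give the companion figure reported in the abstract (the number of scaffolds accounting for half of the assembly).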

8.
The domestic dog serves as an excellent model to investigate the genetic basis of disease. More than 400 heritable traits analogous to human diseases have been described in dogs. To further canine medical genetics research, we established the Dog Biomedical Variant Database Consortium (DBVDC) and present a comprehensive list of functionally annotated genome variants that were identified with whole genome sequencing of 582 dogs from 126 breeds and eight wolves. The genomes used in the study have a minimum coverage of 10× and an average coverage of ~24×. In total, we identified 23 133 692 single‐nucleotide variants (SNVs) and 10 048 038 short indels, including 93% undescribed variants. On average, each individual dog genome carried ~4.1 million single‐nucleotide and ~1.4 million short‐indel variants with respect to the reference genome assembly. About 2% of the variants were located in coding regions of annotated genes and loci. Variant effect classification showed 247 141 SNVs and 99 562 short indels having moderate or high impact on 11 267 protein‐coding genes. On average, each genome contained heterozygous loss‐of‐function variants in 30 potentially embryonic lethal genes and 97 genes associated with developmental disorders. More than 50 inherited disorders and traits have been unravelled using the DBVDC variant catalogue, enabling genetic testing for breeding and diagnostics. This resource of annotated variants and their corresponding genotype frequencies constitutes a highly useful tool for the identification of potential variants causative for rare inherited disorders in dogs.

9.
10.
11.
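Shi Jisen, Wang Zhanjun, Chen Jinhui 《遗传》2012,34(2):145-156
In recent years, whole-genome sequencing results for plants have been appearing in rapid succession, and whole-genome sequencing of woody plants is also moving ahead quickly. However, because woody plants usually have large genomes with complex structure, sequencing, post-sequencing assembly, annotation and functional analysis all pose considerable difficulties. Budgets for genome sequencing and analysis are also under considerable pressure. It is therefore necessary to analyse and compare research progress and outstanding problems in this area in order to improve the efficiency of whole-genome research in forest trees. After comparing the three generations of sequencing technology developed so far (Sanger sequencing, sequencing by synthesis and single-molecule sequencing), this article selects four woody plants with published genomes (poplar, grapevine, papaya and apple) and reviews the research background, sequencing results, progress in applications and remaining problems of their whole-genome sequencing. It also discusses the preparatory work needed before future whole-genome sequencing of woody plants (choice of material, construction of genetic and linkage maps, choice of sequencing technology), as well as the bioinformatic analysis and application of whole-genome sequencing results.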

12.
Thanks to a dramatic reduction in sequencing costs followed by a rapid development of bioinformatics tools, genome assembly and annotation have become accessible to many researchers in recent years. Among tetrapods, birds have genomes that display many features that facilitate their assembly and annotation, such as small genome size, low number of repeats and highly conserved genomic structure. However, we found that high genomic heterozygosity could have a great impact on the quality of the genome assembly of the thick‐billed murre (Uria lomvia), an arctic colonial seabird. In this study, we tested the performance of three genome assemblers, Ray/SSPACE, SOAPdenovo2 and Platanus, in assembling the highly heterozygous genome of the thick‐billed murre. Our results show that Platanus, an assembler specifically designed for heterozygous genomes, outperforms the other two approaches and produces a highly contiguous (N50 = 15.8 Mb) and complete genome assembly (93% presence of genes from the Benchmarking Universal Single Copy Ortholog [BUSCO] gene set). Additionally, we annotated the thick‐billed murre genome using a homology‐based approach that takes advantage of the genomic resources available for birds and other taxa. Our study will be useful for those researchers who are approaching assembly and annotation of highly heterozygous genomes, or genomes of species of conservation concern, and/or who have limited financial resources.

13.
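Progress in whole-genome sequencing of woody plants   Total citations: 4 (self-citations: 0, citations by others: 4)
Shi JS  Wang ZJ  Chen JH 《遗传》2012,34(2):145-156
In recent years, whole-genome sequencing results for plants have been appearing in rapid succession, and whole-genome sequencing of woody plants is also moving ahead quickly. However, because woody plants usually have large genomes with complex structure, sequencing, post-sequencing assembly, annotation and functional analysis all pose considerable difficulties. Budgets for genome sequencing and analysis are also under considerable pressure. It is therefore necessary to analyse and compare research progress and outstanding problems in this area in order to improve the efficiency of whole-genome research in forest trees. After comparing the three generations of sequencing technology developed so far (Sanger sequencing, sequencing by synthesis and single-molecule sequencing), this article selects four woody plants with published genomes (poplar, grapevine, papaya and apple) and reviews the research background, sequencing results, progress in applications and remaining problems of their whole-genome sequencing. It also discusses the preparatory work needed before future whole-genome sequencing of woody plants (choice of material, construction of genetic and linkage maps, choice of sequencing technology), as well as the bioinformatic analysis and application of whole-genome sequencing results.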

14.
We carried out a comprehensive genomic analysis of porcine copy number variants (CNVs) based on whole‐genome SNP genotyping data and provided new measures of genomic diversity (number, length and distribution of CNV events) for a highly inbred strain (the Guadyerbas strain). This strain represents one of the most ancient surviving populations of the Iberian breed, and it is currently in serious danger of extinction. CNV detection was conducted on the complete Guadyerbas population, adjusted for genomic waves, and used strict quality criteria, pedigree information and the latest porcine genome annotation. The analysis led to the detection of 65 CNV regions (CNVRs). These regions cover 0.33% of the autosomal genome of this particular strain. Twenty‐nine of these CNVRs were identified here for the first time. The relatively low number of detected CNVRs is in line with the low variability and high inbreeding estimated previously for this Iberian strain using pedigree, microsatellite or SNP data. A comparison across different porcine studies has revealed that more than half of these regions overlap with previously identified CNVRs or multicopy regions. Also, a preliminary analysis of CNV detection using whole‐genome sequence data for four Guadyerbas pigs showed overlapping for 16 of the CNVRs, supporting their reliability. Some of the identified CNVRs contain relevant functional genes (e.g., the SCD and USP15 genes), which are worth being further investigated because of their importance in determining the quality of Iberian pig products. The CNVR data generated could be useful for improving the porcine genome annotation.

15.
16.
Gene duplications and gene losses are major determinants of genome evolution and phenotypic diversity. The frequency of gene turnover (gene gains and gene losses combined) is known to vary between organisms. Comparative genomic analyses of gene families can highlight such variation; however, estimates of gene turnover may be biased when using highly fragmented genome assemblies resulting in poor gene annotations. Here, we address potential biases introduced by gene annotation errors in estimates of gene turnover frequencies in a dataset including both well‐annotated angiosperm genomes and the incomplete gene sets of four Pinaceae, including two pine species, Norway spruce and Douglas‐fir. We show that Pinaceae experienced higher gene turnover rates than angiosperm lineages lacking recent whole‐genome duplications. This finding is robust to both known major issues in Pinaceae gene sets: missing gene models and erroneous annotation of pseudogenes. A separate analysis limited to the four Pinaceae gene sets pointed to an accelerated gene turnover rate in pines compared with Norway spruce and Douglas‐fir. Our results indicate that gene turnover significantly contributes to genome variation and possibly to speciation in Pinaceae, particularly in pines. Moreover, these findings indicate that reliable estimates of gene turnover frequencies can be discerned in incomplete and potentially inaccurate gene sets. Because gymnosperms are known to exhibit low overall substitution rates compared with angiosperms, our results suggest that the rate of single‐base pair mutations is uncoupled from the rate of large DNA duplications and deletions associated with gene turnover in Pinaceae.

17.
After sequencing the human and mouse genomes, the annotation of these sequences with biological functions is an important challenge in genomic research. A major tool to analyse gene function on the organismal level is the analysis of mutant phenotypes. Because of its genetic and physiological similarity to man, the mouse has become the model organism of choice for the study of genetic diseases. In addition, there is at the moment no other vertebrate for which versatile techniques to manipulate the genome are as well developed. Several mouse mutagenesis projects have provided the proof-of-principle that a systematic and comprehensive mutagenesis of every gene in the mammalian genome will be feasible. An exhaustive functional annotation of the mammalian genome can only be achieved in a combination of phenotype- and gene-driven approaches in large- and small-scale academic and private projects. Major challenges will be to develop standardised phenotyping protocols for the clinical and pathological characterisation of mouse mutants, the improvement of mutation detection methods and the dissemination of resources and data. Beyond gene annotation, it will be necessary to understand how gene functions are integrated into the complex network of regulatory interactions in the cell.

18.
With the development of high-throughput sequencing and single-cell genomics technologies, many uncultured bacterial communities have been dissected by combining these two techniques. In particular, by leveraging single-cell genomics and metagenomics simultaneously, researchers can greatly improve the efficiency and accuracy of obtaining whole-genome information from complex microbial communities, which allows us not only to identify microbes but also to link function to species, identify subspecies variation and study host-virus interactions, among other applications. Here, we review recent developments in single-cell metagenomics and the challenges that need to be addressed, including potential contamination, uneven sequence coverage, sequence chimeras, genome assembly and annotation. With the development of sequencing and computational methods, single-cell metagenomics will undoubtedly broaden its application in various microbiome studies.

19.
20.
Casuarina equisetifolia (C. equisetifolia), a conifer‐like angiosperm with resistance to typhoons and stress tolerance, is mainly cultivated in the coastal areas of Australasia. These traits make C. equisetifolia a valuable model to study secondary growth‐associated genes and stress‐tolerance traits. However, the genome sequence is unavailable and therefore wood‐associated growth rate and stress resistance at the molecular level are largely unexplored. We therefore constructed a high‐quality draft genome sequence of C. equisetifolia by a combination of Illumina second‐generation sequencing reads and Pacific Biosciences single‐molecule real‐time (SMRT) long reads to advance the investigation of this species. Here, we report the genome assembly, which contains approximately 300 megabases (Mb) and has a scaffold N50 of 1.06 Mb. Additionally, gene annotation, assisted by a combination of prediction and RNA‐seq data, generated 29 827 annotated protein‐coding genes and 1983 non‐coding genes. Furthermore, we found that repetitive sequences account for one‐third of the genome assembly. We also constructed a genome‐wide map of DNA modifications, including two novel forms, N6‐methyladenine (6mA) and N4‐methylcytosine (4mC), at single‐nucleotide resolution using SMRT sequencing. Interestingly, we found that 17% of 6mA‐modified genes and 15% of 4mC‐modified genes also showed alternative splicing events. Finally, we investigated cellulose‐, hemicellulose‐ and lignin‐related genes, which were associated with secondary growth and contained different DNA modifications. The high‐quality genome sequence and annotation of C. equisetifolia in this study provide a valuable resource to strengthen our understanding of the diverse traits of trees.
