Similar articles
 Found 20 similar articles; search took 390 ms
1.
Candidate gene identification is typically labour intensive, involving laboratory experiments required to corroborate or disprove any hypothesis for a nominated candidate gene being considered the causative gene. The traditional approach to reduce the number of candidate genes entails fine-mapping studies using markers and pedigrees. Gene prioritization establishes the ranking of candidate genes based on their relevance to the biological process of interest, from which the most promising genes can be selected for further analysis. To date, many computational methods have focused on the prediction of candidate genes by analysis of their inherent sequence characteristics and similarity with respect to known disease genes, as well as their functional annotation. In the last decade, several computational tools for prioritizing candidate genes have been proposed. A large number of them are web-based tools, while others are standalone applications that install and run locally. This review attempts to take a close look at gene prioritization criteria, as well as candidate gene prioritization algorithms, and thus provide a comprehensive synopsis of the subject matter.

2.
Complete structure of the chloroplast genome of Arabidopsis thaliana.   Total citations: 7, self-citations: 0, citations by others: 7
The complete nucleotide sequence of the chloroplast genome of Arabidopsis thaliana has been determined. The genome is a circular DNA of 154,478 bp containing a pair of inverted repeats of 26,264 bp, which are separated by small and large single copy regions of 17,780 bp and 84,170 bp, respectively. A total of 87 potential protein-coding genes including 8 genes duplicated in the inverted repeat regions, 4 ribosomal RNA genes and 37 tRNA genes (30 gene species) representing 20 amino acid species were assigned to the genome on the basis of similarity to the chloroplast genes previously reported for other species. The translated amino acid sequences from respective potential protein-coding genes showed 63.9% to 100% sequence similarity to those of the corresponding genes in the chloroplast genome of Nicotiana tabacum, indicating the occurrence of significant diversity in the chloroplast genes between two dicot plants. The sequence data and gene information are available on the World Wide Web database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/.
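
The region lengths quoted in this abstract can be cross-checked with a little arithmetic: a circular chloroplast genome with two inverted repeat copies should total twice the repeat length plus the two single copy regions. The following is only a minimal sketch of that check; the function and variable names are illustrative and do not come from the paper.

```python
# Region lengths (bp) as quoted in the abstract above.
INVERTED_REPEAT = 26_264      # length of each of the two inverted repeat copies
SMALL_SINGLE_COPY = 17_780
LARGE_SINGLE_COPY = 84_170
REPORTED_TOTAL = 154_478


def quadripartite_length(ir: int, ssc: int, lsc: int) -> int:
    """Total length of a circular genome made of two IR copies, one SSC and one LSC."""
    return 2 * ir + ssc + lsc


computed_total = quadripartite_length(INVERTED_REPEAT, SMALL_SINGLE_COPY, LARGE_SINGLE_COPY)
# Prints the computed length and whether it matches the reported genome size.
print(computed_total, computed_total == REPORTED_TOTAL)
```

The same check can be applied to any genome described with this inverted-repeat layout by substituting its region lengths.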

3.
Familial adult myoclonus epilepsy (FAME) is a rare autosomal dominant disorder characterized by adult onset, involuntary muscle jerks, cortical myoclonus and occasional seizures. FAME is genetically heterogeneous with more than 70 families reported worldwide and five potential disease loci. The efforts to identify potential causal variants have been unsuccessful in all but three families. To date, linkage analysis has been the main approach to find and narrow FAME critical regions. We propose an alternative method, pedigree-free identity-by-descent (IBD) mapping, that infers regions of the genome between individuals that have been inherited from a common ancestor. IBD mapping provides an alternative to linkage analysis in the presence of allelic and locus heterogeneity by detecting clusters of individuals who share a common allele. Following IBD mapping, gene prioritization based on gene co-expression analysis can be used to identify the most promising candidate genes. We performed an IBD analysis using high-density single nucleotide polymorphism (SNP) array data followed by gene prioritization on a FAME cohort of ten European families and one Australian/New Zealander family, eight of which had known disease loci. By identifying IBD regions common to multiple families, we were able to narrow the FAME2 locus to a 9.78 megabase interval within 2p11.2–q11.2. We provide additional evidence of a founder effect in four Italian families and allelic heterogeneity with at least four distinct founders responsible for FAME at the FAME2 locus. In addition, we suggest candidate disease genes using gene prioritization based on gene co-expression analysis.

4.
Species evolutionary relationships have traditionally been defined by sequence similarities of phylogenetic marker molecules, recently followed by whole-genome phylogenies based on gene order, average ortholog similarity or gene content. Here, we introduce genome conservation, a novel metric of evolutionary distances between species that simultaneously takes into account both gene content and sequence similarity at the whole-genome level. Genome conservation represents a robust distance measure, as demonstrated by accurate phylogenetic reconstructions. The genome conservation matrix for all presently sequenced organisms exhibits a remarkable ability to define evolutionary relationships across all taxonomic ranges. An assessment of taxonomic ranks with genome conservation shows that certain ranks are inadequately described and raises the possibility of a more precise and quantitative taxonomy in the future. All phylogenetic reconstructions are available at the genome phylogeny server: .

5.
The Rice TOGO Browser is an online public resource designed to facilitate integration and visualization of mapping data of bacterial artificial chromosome (BAC)/P1-derived artificial chromosome (PAC) clones, genes, restriction fragment length polymorphism (RFLP)/simple sequence repeat (SSR) markers and phenotype data represented as quantitative trait loci (QTLs) onto the genome sequence, and to provide a platform for more efficient utilization of genome information from the point of view of applied genomics as well as functional genomics. Three search options, namely keyword search, region search and trait search, generate various types of data in a user-friendly interface with three distinct viewers, a chromosome viewer, an integrated map viewer and a sequence viewer, thereby providing the opportunity to view the position of genes and/or QTLs at the chromosomal level and to retrieve any sequence information in a user-defined genome region. Furthermore, the gene list, marker list and genome sequence in a specified region delineated by RFLP/SSR markers and any sequences designed as primers can be viewed and downloaded to support forward genetics approaches. An additional feature of this database is the graphical viewer for BLAST search to reveal information not only for regions with significant sequence similarity but also for regions adjacent to those with similarity but with no hits between sequences. An easy-to-use and intuitive user interface can help a wide range of users in retrieving integrated mapping information including agronomically important traits on the rice genome sequence. The database can be accessed at http://agri-trait.dna.affrc.go.jp/.

6.
The use of DNA sequence-based comparative genomics for evolutionary studies and for transferring information from model species to related large-genome species has revolutionized molecular genetics and breeding strategies for improving those crops. Comparative sequence analysis methods can be used to cross-reference genes between species maps, enhance the resolution of comparative maps, study patterns of gene evolution, identify conserved regions of the genomes, and facilitate interspecies gene cloning. In this study, 5,780 Triticeae ESTs that have been physically mapped using wheat (Triticum aestivum L.) deletion lines and segregating populations were compared using NCBI BLASTN to the first draft of the public rice (Oryza sativa L.) genome sequence data from 3,280 ordered BAC/PAC clones. A rice genome view of the homoeologous wheat genome locations based on sequence analysis shows general similarity to the previously published comparative maps based on Southern analysis of RFLP. For most rice chromosomes there is a preponderance of wheat genes from one or two wheat chromosomes. The physical locations of non-conserved regions were not consistent across rice chromosomes. Some wheat ESTs with multiple wheat genome locations are associated with the non-conserved regions of similarity between rice and wheat. The inverse view, showing the relationship between the wheat deletion map and rice genomic sequence, revealed the breakdown of gene content and order at the resolution conferred by the physical chromosome deletions in the wheat genome. An average of 35% of the putative single copy genes that were mapped to the most conserved bins matched rice chromosomes other than the one that was most similar. This suggests that there has been an abundance of rearrangements, insertions, deletions, and duplications eroding the wheat-rice genome relationship that may complicate the use of rice as a model for cross-species transfer of information in non-conserved regions.

7.
We have determined that Borrelia burgdorferi strain B31 MI carries 21 extrachromosomal DNA elements, the largest number known for any bacterium. Among these are 12 linear and nine circular plasmids, whose sequences total 610 694 bp. We report here the nucleotide sequence of three linear and seven circular plasmids (comprising 290 546 bp) in this infectious isolate. This completes the genome sequencing project for this organism; its genome size is 1 521 419 bp (plus about 2000 bp of undetermined telomeric sequences). Analysis of the sequence implies that there has been extensive and sometimes rather recent DNA rearrangement among a number of the linear plasmids. Many of these events appear to have been mediated by recombinational processes that formed duplications. These many regions of similarity are reflected in the fact that most plasmid genes are members of one of the genome's 161 paralogous gene families; 107 of these gene families, which vary in size from two to 41 members, contain at least one plasmid gene. These rearrangements appear to have contributed to a surprisingly large number of apparently non-functional pseudogenes, a very unusual feature for a prokaryotic genome. The presence of these damaged genes suggests that some of the plasmids may be in a period of rapid evolution. The sequence predicts 535 plasmid genes ≥300 bp in length that may be intact and 167 apparently mutationally damaged and/or unexpressed genes (pseudogenes). The large majority, over 90%, of genes on these plasmids have no convincing similarity to genes outside Borrelia, suggesting that they perform specialized functions.

8.
Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.

9.
Multiple copies of a given ribosomal RNA gene family undergo concerted evolution such that sequences of all gene copies are virtually identical within a species although they diverge normally between species. In eukaryotes, gene conversion and unequal crossing over are the proposed mechanisms for concerted evolution of tandemly repeated sequences, whereas dispersed genes are homogenized by gene conversion. However, the homogenization mechanisms for multiple-copy, normally dispersed, prokaryotic rRNA genes are not well understood. Here we compared the sequences of multiple paralogous rRNA genes within a genome in 12 prokaryotic organisms that have multiple copies of the rRNA genes. Within a genome, putative sequence conversion tracts were found throughout the entire length of each individual rRNA gene and its immediate flanks. Individual conversion events convert only a short sequence tract, and the conversion partners can be any paralogous genes within the genome. Interestingly, the genic sequences undergo much slower divergence than their flanking sequences. Moreover, genomic context and operon organization do not affect rRNA gene homogenization. Thus, gene conversion underlies concerted evolution of bacterial rRNA genes, which normally occurs within genic sequences, and homogenization of flanking regions may result from co-conversion with the genic sequence. Received: 31 March 2000 / Accepted: 15 June 2000

10.
The complete genomic sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7, which grows optimally at 80 °C, at low pH, and under aerobic conditions, has been determined by the whole genome shotgun method with slight modifications. The genome was 2,694,756 bp long and the G + C content was 32.8%. The following RNA-coding genes were identified: a single 16S-23S rRNA cluster, one 5S rRNA gene and 46 tRNA genes (including 24 intron-containing tRNA genes). The repetitive sequences identified were SR-type repetitive sequences, long dispersed-type repetitive sequences and Tn-like repetitive elements. The genome contained 2826 potential protein-coding regions (open reading frames, ORFs). By similarity search against public databases, 911 (32.2%) ORFs were related to functionally assigned genes, 921 (32.6%) were related to conserved ORFs of unknown function, 145 (5.1%) contained some motifs, and the remaining 849 (30.0%) did not show any significant similarity to the registered sequences. The ORFs with functional assignments included the candidate genes involved in sulfide metabolism, the TCA cycle and the respiratory chain. Sequence comparison provided evidence suggesting the integration of plasmid, rearrangement of genomic structure, and duplication of genomic regions that may be responsible for the larger genomic size of the S. tokodaii strain7 genome. The genome contained eukaryote-type genes which were not identified in other archaea and lacked the CCA sequence in the tRNA genes. The result suggests that this strain is closer to eukaryotes among the archaea strains so far sequenced. The data presented in this paper are also available on the internet homepage (http://www.bio.nite.go.jp/E-home/genome_list-e.html/).
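
The ORF breakdown in this abstract is easy to sanity-check: the four categories should add up to the 2826 predicted ORFs, and each percentage should follow from its count. The snippet below is a small sketch of that arithmetic only; the category labels are paraphrased and the helper name is made up for illustration.

```python
# ORF counts per similarity-search category, as quoted in the abstract above.
ORF_COUNTS = {
    "functionally assigned": 911,
    "conserved, unknown function": 921,
    "motif only": 145,
    "no significant similarity": 849,
}
TOTAL_ORFS = 2826


def category_percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Share of the total for each category, in percent, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# The categories should partition the full ORF set.
print(sum(ORF_COUNTS.values()) == TOTAL_ORFS)
for name, pct in category_percentages(ORF_COUNTS, TOTAL_ORFS).items():
    print(f"{name}: {pct}%")
```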

11.
12.
Measuring in a quantitative, statistical sense the degree to which structural and functional information can be "transferred" between pairs of related protein sequences at various levels of similarity is an essential prerequisite for robust genome annotation. To this end, we performed pairwise sequence, structure and function comparisons on approximately 30,000 pairs of protein domains with known structure and function. Our domain pairs, which are constructed according to the SCOP fold classification, range in similarity from just sharing a fold, to being nearly identical. Our results show that traditional scores for sequence and structure similarity have the same basic exponential relationship as observed previously, with structural divergence, measured in RMS, being exponentially related to sequence divergence, measured in percent identity. However, as the scale of our survey is much larger than any previous investigations, our results have greater statistical weight and precision. We have been able to express the relationship of sequence and structure similarity using more "modern scores," such as Smith-Waterman alignment scores and probabilistic P-values for both sequence and structure comparison. These modern scores address some of the problems with traditional scores, such as determining a conserved core and correcting for length dependency; they enable us to phrase the sequence-structure relationship in more precise and accurate terms. We found that the basic exponential sequence-structure relationship is very general: the same essential relationship is found in the different secondary-structure classes and is evident in all the scoring schemes. To relate function to sequence and structure we assigned various levels of functional similarity to the domain pairs, based on a simple functional classification scheme. 
This scheme was constructed by combining and augmenting annotations in the enzyme and fly functional classifications and comparing subsets of these to the Escherichia coli and yeast classifications. We found sigmoidal relationships between similarity in function and sequence, with clear thresholds for different levels of functional conservation. For pairs of domains that share the same fold, precise function appears to be conserved down to approximately 40% sequence identity, whereas broad functional class is conserved to approximately 25%. Interestingly, percent identity is more effective at quantifying functional conservation than the more modern scores (e.g. P-values). Results of all the pairwise comparisons and our combined functional classification scheme for protein structures can be accessed from a web database at http://bioinfo.mbb.yale.edu/align. Copyright 2000 Academic Press.

13.
The nucleotide sequence of the gene coding for tRNA(Lys) and its flanking regions from the rapeseed mitochondrial genome are presented and compared with other known tRNA(Lys) genes from plant mitochondria. This tRNA sequence can be folded into the standard cloverleaf structure model. Also, this tRNA sequence shows less similarity with its chloroplast counterparts and therefore appears to be a 'native' mitochondrial tRNA.

14.
Complete structure of the chloroplast genome of a legume, Lotus japonicus.   Total citations: 4, self-citations: 0, citations by others: 4
The nucleotide sequence of the entire chloroplast genome (150,519 bp) of a legume, Lotus japonicus, has been determined. The circular double-stranded DNA contains a pair of inverted repeats of 25,156 bp which are separated by a small and a large single copy region of 18,271 bp and 81,936 bp, respectively. A total of 84 predicted protein-coding genes including 7 genes duplicated in the inverted repeat regions, 4 ribosomal RNA genes and 37 tRNA genes (30 gene species) representing 20 amino acid species were assigned on the genome based on similarity to genes previously identified in other chloroplasts. All the predicted genes were conserved among dicot plants except that rpl22, a gene encoding chloroplast ribosomal protein CL22, was missing in L. japonicus. Inversion of a 51-kb segment spanning rbcL to rps16 (positions 5161-56,176) in the large single copy region was observed in the chloroplast genome of L. japonicus. The sequence data and gene information are available on our World Wide Web database at http://www.kazusa.or.jp/en/plant/database.html.

15.
16.
17.
The complete nucleotide sequence of the mulberry (Morus indica cv. K2) chloroplast genome (158,484 bp) has been determined using a combination of long PCR and shotgun-based approaches. This is the third angiosperm tree species whose plastome sequence has been completely deciphered. The circular double-stranded molecule comprises two identical inverted repeats (25,678 bp each) separating a large and a small single-copy region of 87,386 bp and 19,742 bp, respectively. A total of 83 protein-coding genes including five genes duplicated in the inverted repeat regions, eight ribosomal RNA genes and 37 tRNA genes (30 gene species) representing 20 amino acids, were assigned on the basis of homology to predicted genes from other chloroplast genomes. The mulberry plastome lacks the genes infA, sprA, and rpl21 and contains two pseudogenes, ycf15 and ycf68. Comparative analysis, based on sequence similarity, both at the gene and genome level, indicates Morus to be closer to Cucumis and Lotus, phylogenetically. However, at genome level, inclusion of non-coding regions brings it closer to Eucalyptus, followed by Cucumis. This may reflect differential selection pressure operating on the genic and intergenic regions of the chloroplast genome. Electronic supplementary material: Supplementary material is available in the online version of this article and is accessible for authorized users. Communicated by Y. Tsumura

18.
Human gene catalogs are fundamental to the study of human biology and medicine. But they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. There has been much research on how polymorphism affects previously identified genes, but no research has been done on how it affects gene identification itself. We computationally predict protein-coding genes in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome. We systematically test the effect of known polymorphisms with this procedure. Polymorphisms can not only disrupt ORFs, they can also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference, whose protein-coding status is supported by homology to known proteins. On average 10% of these genes are located in the genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.
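
To make the idea concrete, here is a toy sketch of how a single-nucleotide polymorphism can lengthen the longest ORF found in a transcript. It is not the authors' pipeline, which the abstract does not describe in detail: it assumes a simple ATG-to-stop scan of the three forward frames, and the sequences, the SNP position and the helper names are all invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Longest ATG-to-stop ORF (stop codon included) in the three forward frames."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def apply_snp(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


# Made-up reference transcript: ATG GCT TGA GCT AAA TAG CC
reference = "ATGGCTTGAGCTAAATAGCC"
# Hypothetical SNP turning the first stop codon TGA into the sense codon TCA.
variant = apply_snp(reference, 7, "C")

print(len(longest_orf(reference)), longest_orf(reference))
print(len(longest_orf(variant)), longest_orf(variant))
```

In this toy case the variant allele removes an early stop codon, so the ORF runs on to the next in-frame stop; a real analysis would also need a minimum-length threshold and handling of the reverse strand, both left out here.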

19.
ABSTRACT: BACKGROUND: The plant mitochondrial genome has unique features such as large size, frequent recombination and incorporation of foreign DNA. Cytoplasmic male sterility (CMS) is caused by rearrangement of the mitochondrial genome, and a novel chimeric open reading frame (ORF) created by shuffling of endogenous sequences is often responsible for CMS. The Ogura-type male-sterile cytoplasm is one of the most extensively studied cytoplasms in Brassicaceae. Although the gene orf138 has been isolated as a determinant of Ogura-type CMS, no sequence homologous to orf138 has been found in public databases. Therefore, how the orf138 sequence was created is a mystery. In this study, we determined the complete nucleotide sequences of two radish mitochondrial genomes, namely, the Ogura- and normal-type genomes, and analyzed them to reveal the origin of the gene orf138. RESULTS: The Ogura- and normal-type mitochondrial genomes were assembled into 258,426-bp and 244,036-bp circular sequences, respectively. The normal-type mitochondrial genome contained 33 protein-coding and three rRNA genes, which are well conserved with the reported mitochondrial genome of rapeseed. The Ogura-type genome contained the same genes and an additional atp9. As for tRNA, the normal type contained 17 tRNAs, while the Ogura type contained 17 tRNAs and one additional trnfM. The gene orf138 was specific to the Ogura-type mitochondrial genome, and no sequence homologous to it was found in the normal-type genome. Comparative analysis of the two genomes revealed that the radish mitochondrial genome consists of 11 syntenic regions (length >3 kb, similarity >99.9%). Short repeats and overlapping repeats present at the edges of syntenic regions were shown to be involved in recombination events during evolution that interconvert the two types of mitochondrial genome. 
The Ogura-type mitochondrial genome has four unique regions (2,803 bp, 1,601 bp, 451 bp and 15,255 bp in size) that are non-syntenic to the normal-type genome, and the gene orf138 was found to be located at the edge of the largest unique region. BLAST analysis performed to assign the unique regions showed that about 80% of the region was covered by short sequences homologous to the mitochondrial sequences of normal-type radish or other reported Brassicaceae species, although no homology was found for the remaining 20% of sequences. CONCLUSIONS: The Ogura-type mitochondrial genome was highly rearranged compared with the normal-type genome by recombination through one large repeat and multiple short repeats. The rearrangement has produced four unique regions in the Ogura-type mitochondrial genome, and most of the unique regions are composed of known Brassicaceae mitochondrial sequences. This suggests that the regions unique to the Ogura-type genome were generated by integration and shuffling of pre-existing mitochondrial sequences during the evolution of Brassicaceae, and novel genes such as orf138 could have been created by the shuffling process of the mitochondrial genome.

20.