Similar articles
Found 20 similar articles (search time: 15 ms)
1.

Background  

The sequencing of the human genome has enabled us to access a comprehensive list of genes (both experimental and predicted) for further analysis. While a majority of the approximately 30000 known and predicted human coding genes are characterized and have been assigned at least one function, there remains a fair number of genes (about 12000) for which no annotation has been made. The recent sequencing of other genomes has provided us with a huge amount of auxiliary sequence data which could help in the characterization of the human genes. Clustering these sequences into families is one of the first steps to perform comparative studies across several genomes.

2.
The olfactory receptor (OR) gene cluster on human chromosome 17p13.3 was subjected to mixed shotgun automated DNA sequencing. The resulting 412 kb of genomic sequence include 17 OR coding regions, 6 of which are pseudogenes. Six of the coding regions were discovered only upon genomic sequencing, while the others were previously reported as partial sequences. A comparison of DNA sequences in the vicinity of the OR coding regions revealed a common gene structure with an intronless coding region and at least one upstream noncoding exon. Potential gene control regions including specific pyrimidine:purine tracts and Olf-1 sites have been identified. One of the pseudogenes apparently has evolved into a CpG island. Four extensive CpG islands can be discerned within the cluster, not coupled to specific OR genes. The cluster is flanked at its telomeric end by an unidentified open reading frame (C17orf2) with no significant similarity to any known protein. A high proportion of the cluster sequence (about 60%) belongs to various families of interspersed repetitive elements, with a clear predominance of LINE repeats. The OR genes in the cluster belong to two families and seven subfamilies, which show a relatively high degree of intermixing along the cluster, in seemingly random orientations. This genomic organization may be best accounted for by a complex series of evolutionary events.

3.
The vertebrate olfactory receptor (OR) subgenome harbors the largest known gene family, which has been expanded by the need to provide recognition capacity for millions of potential odorants. We implemented an automated procedure to identify all OR coding regions from published sequences. This led us to the identification of 831 OR coding regions (including pseudogenes) from 24 vertebrate species. The resulting dataset was subjected to neighbor-joining phylogenetic analysis and classified into 32 distinct families, 14 of which include only genes from tetrapodan species (Class II ORs). We also report here the first identification of OR sequences from a marsupial (koala) and a monotreme (platypus). Analysis of these OR sequences suggests that the ancestral mammal had a small OR repertoire, which expanded independently in all three mammalian subclasses. Classification of "fish-like" (Class I) ORs indicates that some of these ancient ORs were maintained and even expanded in mammals. A nomenclature system for the OR gene superfamily is proposed, based on a divergence evolutionary model. The nomenclature consists of the root symbol "OR", followed by a family numeral, subfamily letter(s), and a numeral representing the individual gene within the subfamily. For example, OR3A1 is an OR gene of family 3, subfamily A, and OR7E12P is an OR pseudogene of family 7, subfamily E. The symbol is to be preceded by a species indicator. We have assigned the proposed nomenclature symbols for all 330 human OR genes in the database. A WWW tool for automated name assignment is provided.
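The symbol layout described in this abstract (root, family numeral, subfamily letters, gene numeral) can be illustrated with a small parser. This is only a sketch: the names `ORSymbol` and `parse_or_symbol` are hypothetical, the reading of a trailing "P" as a pseudogene marker is inferred from the OR7E12P example, and the species indicator that is meant to precede the symbol is not handled.

```python
import re
from typing import NamedTuple, Optional


class ORSymbol(NamedTuple):
    """Parsed parts of an OR gene symbol (illustrative type, not from the paper)."""
    family: int
    subfamily: str
    member: int
    pseudogene: bool


# Root "OR", family numeral, subfamily letter(s), gene numeral, optional "P".
# Treating the trailing "P" as a pseudogene flag is an assumption based on
# the OR7E12P example in the abstract.
_OR_PATTERN = re.compile(r"^OR(\d+)([A-Z]+)(\d+)(P?)$")


def parse_or_symbol(symbol: str) -> Optional[ORSymbol]:
    """Return the parsed symbol, or None if it does not match the layout."""
    match = _OR_PATTERN.match(symbol)
    if match is None:
        return None
    family, subfamily, member, suffix = match.groups()
    return ORSymbol(int(family), subfamily, int(member), suffix == "P")
```

With the two examples from the abstract, `parse_or_symbol("OR3A1")` yields family 3, subfamily A, and `parse_or_symbol("OR7E12P")` yields family 7, subfamily E, flagged as a pseudogene.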

4.
5.
Niimura Y, Nei M. PLoS ONE 2007, 2(8): e708
Odor perception in mammals is mediated by a large multigene family of olfactory receptor (OR) genes. The number of OR genes varies extensively among different species of mammals, and most species have a substantial number of pseudogenes. To gain some insight into the evolutionary dynamics of mammalian OR genes, we identified the entire set of OR genes in platypuses, opossums, cows, dogs, rats, and macaques and studied the evolutionary change of the genes together with those of humans and mice. We found that platypuses and primates have <400 functional OR genes while the other species have 800-1,200 functional OR genes. We then estimated the numbers of gains and losses of OR genes for each branch of the phylogenetic tree of mammals. This analysis showed that (i) gene expansion occurred in the placental lineage each time after it diverged from monotremes and from marsupials and (ii) hundreds of gains and losses of OR genes have occurred in an order-specific manner, making the gene repertoires highly variable among different orders. It appears that the number of OR genes is determined primarily by the functional requirement for each species, but once the number reaches the required level, it fluctuates by random duplication and deletion of genes. This fluctuation seems to have been aided by the stochastic nature of OR gene expression.

6.
The olfactory receptor (OR) subgenome harbors the largest known gene family in mammals, disposed in clusters on numerous chromosomes. One of the best characterized OR clusters, located at human chromosome 17p13.3, has previously been studied by us in human and in other primates, revealing a conserved set of 17 OR genes. Here, we report the identification of a syntenic OR cluster in the mouse and the partial DNA sequence of many of its OR genes. A probe for the mouse M5 gene, orthologous to one of the OR genes in the human cluster (OR17-25), was used to isolate six PAC clones, all mapping by in situ hybridization to mouse chromosome 11B3-11B5, a region of shared synteny with human chromosome 17p13.3. Thirteen mouse OR sequences amplified and sequenced from these PACs allowed us to construct a putative physical map of the OR gene cluster at the mouse Olfr1 locus. Several points of evidence, including a strong similarity in subfamily composition and at least four cases of gene orthology, suggest that the mouse Olfr1 and the human 17p13.3 clusters are orthologous. A detailed comparison of the OR sequences within the two clusters helps trace their independent evolutionary history in the two species. Two types of evolutionary scenarios are discerned: cases of "true orthologous genes" in which high sequence similarity suggests a shared conserved function, as opposed to instances in which orthologous genes may have undergone independent diversification in the realm of "free reign" repertoire expansion.

7.
With ∼1000 genes, the odorant receptor (OR) gene repertoire is the largest gene family in the mouse genome. Here we have established a 129/Sv BAC contig for mouse OR gene cluster 7 (Olfr7) on Chromosome (Chr) 9. The assembled ∼2-Mb contig consists of 75 BACs and may contain as many as 100 OR genes, or ∼10% of the mouse repertoire. Facilitated by the lack of introns in the coding region, we have determined the nucleotide sequence of 37 full-length, 2 partial, and 3 pseudo coding regions. These 42 OR genes and 3 additional OR genes previously mapped to the mouse Olfr7 cluster can be organized into 13 classes based on OR probe cross-hybridizations with 129/Sv mouse genomic DNA. OR genes belonging to the same class tend to be located next to each other within the cluster. Comparison of published full-length mouse and rat OR coding sequences with those identified here shows that the Olfr7 OR genes are highly related to each other, clustering on two major branches of an unrooted phylogenetic tree. Eight ORs contain an unusual NXC sequon at the amino-terminal extracellular domain that may represent a novel N-linked glycosylation site. The BAC contig presented here provides the substrate for sequencing of the cluster. Received: 27 June 2000 / Accepted: 17 August 2000

8.
The olfactory receptor (OR) multigene family is widely distributed in the human genome. We characterize here a new cluster of four OR genes (HGMW-approved symbols OR7E20P, OR7E6P, OR7E21P, and OR7E22P) on human chromosome 3p13 that is contained in an approximately 250-kb region. This region has been physically mapped, and a 106-kb portion containing the OR genes has been sequenced. All the OR sequences are disrupted by frameshifts and stop codons and appear to have arisen through local duplications. A myosin light chain kinase pseudogene (HGMW-approved symbol MYLKP) lies at one end of the OR gene cluster. Sequences spanning the entire region are also present at 3q13-q21, the site of the functional MYLK gene. This region duplicated locally before the divergence of primates, and the two paralogous copies were later separated to sites on either side of the centromere. This study increases our understanding of the evolution of the human genome. The 3p13 cluster is the first example of a tandem array of OR pseudogenes, and duplications of such clusters may account for the accumulation of a large number of pseudogenes in the human genome.

9.
Human bone marrow stromal cells (HBMSC) are pluripotent cells with the potential to differentiate into osteoblasts, chondrocytes, myelosupportive stroma, and marrow adipocytes. We used high-throughput DNA sequencing analysis to generate 4258 single-pass sequencing reactions (known as expressed sequence tags, or ESTs) obtained from the 5' (97) and 3' (4161) ends of human cDNA clones from a HBMSC cDNA library. Our goal was to obtain tag sequences from the maximum number of possible genes and to deposit them in the publicly accessible database for ESTs (dbEST of the National Center for Biotechnology Information). Comparisons of our EST sequencing data with nonredundant human mRNA and protein databases showed that the ESTs represent 1860 gene clusters. The EST sequencing data analysis showed 60 novel genes found only in this cDNA library after BLAST analysis against 3.0 million ESTs in NCBI's dbEST database. The BLAST search also showed the identified ESTs that have close homology to known genes, which suggests that these may be newly recognized members of known gene families. The gene expression profile of this cell type is revealed by analyzing both the frequency with which a message is encountered and the functional categorization of expressed sequences. Comparing an EST sequence with the human genomic sequence database enables assignment of an EST to a specific chromosomal region (a process called digital gene localization) and often enables immediate partial determination of intron/exon boundaries within the genomic structure. It is expected that high-throughput EST sequencing and data mining analysis will greatly promote our understanding of gene expression in these cells and of growth and development of the skeleton.

10.
We developed a novel efficient scheme, DEFOG (for "deciphering families of genes"), for determining sequences of numerous genes from a family of interest. The scheme provides a powerful means to obtain a gene family composition in species for which high-throughput genomic sequencing data are not available. DEFOG uses two key procedures. The first is a novel algorithm for designing highly degenerate primers based on a set of known genes from the family of interest. These primers are used in PCR reactions to amplify the members of the gene family. The second combines oligofingerprinting of the cloned PCR products with clustering of the clones based on their fingerprints. By selecting members from each cluster, a low-redundancy clone subset is chosen for sequencing. We applied the scheme to the human olfactory receptor (OR) genes. OR genes constitute the largest gene superfamily in the human genome, as well as in the genomes of other vertebrate species. DEFOG almost tripled the size of the initial repertoire of human ORs in a single experiment, and only 7% of the PCR clones had to be sequenced. Extremely high degeneracies, reaching over a billion combinations of distinct PCR primer pairs, proved to be very effective and yielded only 0.4% nonspecific products.
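To make the notion of primer degeneracy concrete, the sketch below counts how many distinct sequences a degenerate primer written in standard IUPAC nucleotide codes stands for, and multiplies the counts for a forward and reverse primer. It does not reproduce the DEFOG primer-design algorithm, which the abstract does not describe; the function names are hypothetical, and reading "combinations of distinct PCR primer pairs" as the product of the two degeneracies is an assumption.

```python
# Number of concrete bases each IUPAC nucleotide code can represent.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Count the distinct sequences encoded by one degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_DEGENERACY[base]  # KeyError on a non-IUPAC symbol
    return total


def pair_degeneracy(forward: str, reverse: str) -> int:
    """Count distinct forward/reverse combinations (assumed to be the product)."""
    return primer_degeneracy(forward) * primer_degeneracy(reverse)
```

Because the count multiplies position by position, fifteen fully degenerate "N" positions split across a primer pair already give 4^15, about 1.07 billion combinations, which is the order of magnitude the abstract reports.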

11.
He K, Ye Q, Zhu Y, Chen H, Wan QH, Fang SG. Gene 2012, 507(1): 74-78
The Chinese alligator (Alligator sinensis) is a rare and endangered species endemic to China. To better understand the genomic structure of the Chinese alligator, a highly redundant bacterial artificial chromosome (BAC) library was constructed. This library consists of 216,238 clones with an average insert size of about 90 kb, indicating that the library contains 6.8-fold genome equivalents. Subsequently, we constructed a 516 kb contig map for the Chinese alligator olfactory receptor (OR) genes, which spans nine BAC clones, and subjected the BACs to full sequencing. The sequence analysis revealed that this contig contained 16 functional OR genes and demonstrated that the nine BACs constituting the contig overlapped correctly, proving the usability of this genome library. As a result, this BAC library could provide a useful platform for physical mapping, genome sequencing or complex analysis of targeted genomic regions for this rare species.
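The coverage figure in this abstract follows from simple arithmetic: genome equivalents are the total cloned sequence divided by the genome size. The sketch below works backwards from the three reported numbers to the genome size they imply. The abstract itself does not state a genome size, so the roughly 2.9 Gb obtained here is an implied value, not a reported one, and the function names are hypothetical.

```python
def genome_equivalents(n_clones: int, mean_insert_bp: float, genome_size_bp: float) -> float:
    """Library coverage: total cloned bases divided by the genome size."""
    return n_clones * mean_insert_bp / genome_size_bp


def implied_genome_size_bp(n_clones: int, mean_insert_bp: float, coverage: float) -> float:
    """Genome size that a reported coverage implies (inverse of the above)."""
    return n_clones * mean_insert_bp / coverage


# Figures quoted in the abstract.
N_CLONES = 216_238
MEAN_INSERT_BP = 90_000      # "about 90 kb"
REPORTED_COVERAGE = 6.8      # "6.8-fold genome equivalents"

total_insert_bp = N_CLONES * MEAN_INSERT_BP
implied_size = implied_genome_size_bp(N_CLONES, MEAN_INSERT_BP, REPORTED_COVERAGE)
```

The library holds about 19.5 Gb of cloned sequence in total, so a 6.8-fold coverage corresponds to a genome of roughly 2.9 Gb.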

12.
Isolation and characterization of human thioredoxin-encoding genes   Total citations: 7 (self-citations: 0, citations by others: 7)
Tonissen KF, Wells JR. Gene 1991, 102(2): 221-228
Thioredoxin (Trx) has recently been demonstrated to be an essential component of the early pregnancy factor activity of pregnancy serum. Here, we report the structure and sequence of human Trx-encoding genes (Trx) by analysis of genomic clones. The Trx gene extends over 13 kb and consists of five exons encoding a 12-kDa protein. A 700-bp fragment upstream from the start codon functions as a promoter when inserted in front of a human growth hormone-encoding reporter gene in tissue-culture cells. This promoter region is very G + C rich and does not contain a classical TATA or CCAAT box, but has three consensus sequences for high-affinity Sp1 binding. Southern analysis demonstrated the presence of several Trx genes in the human genome. The number includes at least one inactive copy as shown by the isolation and sequencing of an inactive pseudogene.

13.
14.
15.
Unravelling cell wall formation in the woody dicot stem   Total citations: 20 (self-citations: 0, citations by others: 20)
Populus is presented as a model system for the study of wood formation (xylogenesis). The formation of wood (secondary xylem) is an ordered developmental process involving cell division, cell expansion, secondary wall deposition, lignification and programmed cell death. Because wood is formed in a variable environment and subject to developmental control, xylem cells are produced that differ in size, shape, cell wall structure, texture and composition. Hormones mediate some of the variability observed and control the process of xylogenesis. High-resolution analysis of auxin distribution across cambial region tissues, combined with the analysis of transgenic plants with modified auxin distribution, suggests that auxin provides positional information for the exit of cells from the meristem and probably also for the duration of cell expansion. Poplar sequencing projects have provided access to genes involved in cell wall formation. Genes involved in the biosynthesis of the carbohydrate skeleton of the cell wall are briefly reviewed. Most progress has been made in characterizing pectin methyl esterases that modify pectins in the cambial region. Specific expression patterns have also been found for expansins, xyloglucan endotransglycosylases and cellulose synthases, pointing to their role in wood cell wall formation and modification. Finally, by studying transgenic plants modified in various steps of the monolignol biosynthetic pathway and by localizing the expression of various enzymes, new insight into the lignin biosynthesis in planta has been gained.

16.
Opsin gene sequences were first reported in the 1980s. The goal of that research was to test the hypothesis that human opsins were members of a single gene family and that variation in human color vision was mediated by mutations in these genes. While the new data supported both hypotheses, the greatest contribution of this work was, arguably, that it provided the data necessary for PCR-based surveys in a diversity of other species. Such studies, and recent whole genome sequencing projects, have uncovered exceptionally large opsin gene repertoires in ray-finned fishes (taxon, Actinopterygii). Guppies and zebrafish, for example, have 10 visual opsin genes each. Here we review the duplication and divergence events that have generated these gene collections. Phylogenetic analyses revealed that large opsin gene repertoires in fish have been generated by gene duplication and divergence events that span the age of the ray-finned fishes. Data from whole genome sequencing projects and from large-insert clones show that tandem duplication is the primary mode of opsin gene family expansion in fishes. In some instances gene conversion between tandem duplicates has obscured evolutionary relationships among genes and generated unique key-site haplotypes. We mapped amino acid substitutions at so-called key-sites onto phylogenies and this exposed many examples of convergence. We found that dN/dS values were higher on the branches of our trees that followed gene duplication than on branches that followed speciation events, suggesting that duplication relaxes constraints on opsin sequence evolution. Though the focus of the review is opsin sequence evolution, we also note that there are few clear connections between opsin gene repertoires and variation in spectral environment, morphological traits, or life history traits.

17.
18.
Somatic transposon mutagenesis in mice is an efficient strategy to investigate the genetic mechanisms of tumorigenesis. The identification of tumor driving transposon insertions traditionally requires the generation of large tumor cohorts to obtain information about common insertion sites. Tumor driving insertions are also characterized by their clonal expansion in tumor tissue, a phenomenon that is facilitated by the slow and evolving transformation process of transposon mutagenesis. We describe here an improved approach for the detection of tumor driving insertions that assesses the clonal expansion of insertions by quantifying the relative proportion of sequence reads obtained in individual tumors. To this end, we have developed a protocol for insertion site sequencing that utilizes acoustic shearing of tumor DNA and Illumina sequencing. We analyzed various solid tumors generated by PiggyBac mutagenesis and for each tumor >10^6 reads corresponding to >10^4 insertion sites were obtained. In each tumor, 9 to 25 insertions stood out by their enriched sequence read frequencies when compared to frequencies obtained from tail DNA controls. These enriched insertions are potential clonally expanded tumor driving insertions, and thus identify candidate cancer genes. The candidate cancer genes of our study comprised many established cancer genes, but also novel candidate genes such as Mastermind-like1 (Mamld1) and Diacylglycerolkinase delta (Dgkd). We show that clonal expansion analysis by high-throughput sequencing is a robust approach for the identification of candidate cancer genes in insertional mutagenesis screens on the level of individual tumors.
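The core idea of this abstract, scoring each insertion by its share of the reads in a tumor and comparing that share with a tail DNA control, can be sketched as follows. This is not the authors' protocol or statistics: the function names, the fold-change cutoff `min_fold`, and the small `pseudo_fraction` used to avoid division by zero are all hypothetical choices made for illustration.

```python
from typing import Dict


def read_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-insertion read counts into fractions of all reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n / total for site, n in counts.items()}


def enriched_insertions(
    tumor_counts: Dict[str, int],
    control_counts: Dict[str, int],
    min_fold: float = 10.0,        # hypothetical cutoff, not from the paper
    pseudo_fraction: float = 1e-6, # hypothetical floor for unseen sites
) -> Dict[str, float]:
    """Return insertions whose tumor read share exceeds the control share."""
    tumor = read_fractions(tumor_counts)
    control = read_fractions(control_counts)
    hits = {}
    for site, fraction in tumor.items():
        background = control.get(site, 0.0) + pseudo_fraction
        fold = fraction / background
        if fold >= min_fold:
            hits[site] = fold
    return hits
```

In a toy case where one insertion holds 90% of tumor reads but only 10% of control reads, it is reported at a cutoff of 5, while insertions whose share drops in the tumor are not.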

19.
20.
Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes) that, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint junction analysis based CNV-analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low-coverage. The assessed copy-number genotypes were highly concordant with our performed qPCR experiments (Pearson correlation coefficient 0.94), and with the published results of two microarray platforms (95-99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the CNV-enriched >800 olfactory receptor (OR) human gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ~15% and ~20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies. 
CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing.
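The link between depth-of-coverage and an integer copy-number genotype can be shown with a deliberately naive calculation: scale the depth at a locus by the depth expected for two copies and round. This is not CopySeq's statistical framework, which the abstract does not detail, and it ignores the paired-end and breakpoint evidence the method can incorporate. The function name and the `max_copies` cap are hypothetical.

```python
def copy_number_genotype(region_depth: float, diploid_depth: float, max_copies: int = 10) -> int:
    """Naive integer copy number from read depth, assuming depth scales
    linearly with copy number and that diploid_depth corresponds to 2 copies."""
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")
    estimate = 2.0 * region_depth / diploid_depth
    # Round to the nearest integer (Python rounds exact halves to the even
    # integer) and clamp to the range 0..max_copies.
    return max(0, min(max_copies, round(estimate)))
```

With a genome-wide diploid depth of 30x, a locus covered at about 15x is called as one copy (a heterozygous deletion), a locus with no reads as zero copies (a homozygous deletion), and a locus at about 133x as nine copies.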
