Similar articles
20 similar articles found.
1.
Genome phylogenetic analysis based on extended gene contents   Total citations: 1, self-citations: 0, citations by others: 1
With the rapid growth of whole-genome data, whole-genome approaches such as gene content have become popular for inferring genome phylogeny, including the tree of life. However, the underlying model of genome evolution is unclear, and the proposed (ad hoc) genome distance measures may violate additivity. In this article, we formulate a stochastic framework for genome evolution, which provides a basis for defining an additive genome distance. However, we show that it is difficult to use typical gene content data, i.e., the presence or absence of gene families across genomes, to estimate the genome distance. We solve this problem by introducing the concept of extended gene content: the status of a gene family in a given genome can be absence, presence as a single copy, or presence as duplicates, and these states can be used to estimate the genome distance and infer phylogeny. Computer simulation shows that the new tree-making method is efficient, consistent, and fairly robust. An analysis of 35 complete microbial genomes demonstrates that the method is useful not only for studying the universal tree of life but also for exploring the evolutionary patterns of genomes.
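As a rough illustration of the extended gene content encoding, the sketch below maps gene-family copy counts to three states (0 = absent, 1 = single copy, 2 = duplicated) and compares genomes by the fraction of families whose state differs. The genome names and copy counts are hypothetical, and the simple mismatch proportion is only a stand-in: it is not the additive, model-based genome distance defined in the paper.

```python
from itertools import combinations

# Hypothetical copy counts per gene family (illustrative, not data from the study).
copy_counts = {
    "genomeA": {"fam1": 0, "fam2": 1, "fam3": 3, "fam4": 1},
    "genomeB": {"fam1": 1, "fam2": 1, "fam3": 2, "fam4": 0},
    "genomeC": {"fam1": 0, "fam2": 2, "fam3": 1, "fam4": 1},
}

def extended_state(count: int) -> int:
    """Map a copy count to 0 = absent, 1 = single copy, 2 = duplicated."""
    return min(count, 2)

def state_profile(counts: dict, families: list) -> list:
    """Three-state (extended gene content) profile of one genome."""
    return [extended_state(counts.get(f, 0)) for f in families]

def presence_profile(counts: dict, families: list) -> list:
    """Classic presence/absence profile, for comparison."""
    return [1 if counts.get(f, 0) > 0 else 0 for f in families]

def mismatch_distance(p: list, q: list) -> float:
    """Fraction of families whose state differs (a simple p-distance,
    NOT the paper's model-based additive distance)."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

families = sorted({f for c in copy_counts.values() for f in c})
profiles = {g: state_profile(c, families) for g, c in copy_counts.items()}
for g1, g2 in combinations(sorted(profiles), 2):
    print(g1, g2, mismatch_distance(profiles[g1], profiles[g2]))
```

Under presence/absence coding, genomeA and genomeC look identical (both lack fam1 and carry fam2 to fam4); the three-state coding separates them, which is the extra signal extended gene content is meant to capture.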

2.
3.
We used a unique combination of techniques to sequence the first complete chloroplast genome of a lycophyte, Huperzia lucidula. This plant belongs to a significant clade hypothesized to represent the sister group to all other vascular plants. We used fluorescence-activated cell sorting (FACS) to isolate the organelles, rolling circle amplification (RCA) to amplify the genome, and shotgun sequencing to 8× depth of coverage to obtain the complete chloroplast genome sequence. The genome is 154,373 bp, containing inverted repeats of 15,314 bp each, a large single-copy region of 104,088 bp, and a small single-copy region of 19,657 bp. Gene order is more similar to those of mosses, liverworts, and hornworts than to those of other vascular plants. For example, the Huperzia chloroplast genome possesses the bryophyte gene order for a previously characterized 30 kb inversion, thus supporting the hypothesis that lycophytes are sister to all other extant vascular plants. The lycophyte chloroplast genome data also enable a better reconstruction of the basal tracheophyte genome, which is useful for inferring relationships among bryophyte lineages. Several unique characters are observed in Huperzia, such as the movement of the gene ndhF from the small single-copy region into the inverted repeat. We present several analyses of evolutionary relationships among land plants using nucleotide data, inferred amino acid sequences, and comparisons of gene arrangements from chloroplast genomes. The results, while still tentative pending the large number of chloroplast genomes from other key lineages that are soon to be sequenced, are intriguing in themselves and contribute to a growing comparative database of genomic and morphological data across the green plants.
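The reported region lengths can be checked against the total genome size. A quadripartite plastome consists of the large single-copy region, one inverted repeat, the small single-copy region and the second inverted repeat, so the inverted repeat length is counted twice. A minimal arithmetic check using only the numbers given in the abstract:

```python
# Region lengths (bp) reported for the Huperzia lucidula chloroplast genome.
ir = 15_314    # each of the two inverted repeats (IRa, IRb)
lsc = 104_088  # large single-copy region
ssc = 19_657   # small single-copy region

# Quadripartite layout LSC + IRb + SSC + IRa: the IR contributes twice.
total = lsc + 2 * ir + ssc
print(total)  # 154373, matching the reported genome size
```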

4.
Copy number alterations (CNAs) can be observed in most cancer patients. Several oncogenes and tumor suppressor genes with CNAs have been identified in different kinds of tumors. However, a systematic survey of CNA-affected functions is still lacking. By employing systems biology approaches, instead of examining individual genes, we directly identified functional hotspots on the human genome. A total of 838 hotspots with 540 enriched Gene Ontology functions were identified. Seventy-six aCGH array data sets from hepatocellular carcinoma (HCC) tumors were used in this study. A total of 150 regions putatively affected by CNAs, together with the functions they encode, were identified. Our results indicate that two immune-related hotspots had copy number alterations in most patients, and our data suggest that these immune-related regions might be involved in HCC oncogenesis. We also identified 39 hotspots whose copy number status was associated with patient survival. Our data imply that copy number alterations in these regions may contribute to dysregulation of the encoded functions. These results further demonstrate that our method enables researchers to survey the biological functions affected by CNAs and to construct regulatory hypotheses at the pathway and functional levels.
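The abstract does not state which statistic was used to call a Gene Ontology function "enriched" in a hotspot. A common choice, assumed here purely for illustration, is a one-sided hypergeometric test: with N background genes, K of them annotated to a GO term, and a hotspot window of n genes of which k carry the term, the p-value is P(X ≥ k). The function name and the numbers in the usage line are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes annotated to the GO term,
    n: genes inside the hotspot window, k: annotated genes inside the window.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical hotspot: 30 genes, 5 of them annotated to a term that covers
# 50 of 20,000 background genes.
print(hypergeom_enrichment_p(k=5, n=30, K=50, N=20_000))
```

In a genome-wide scan over many windows and terms, these p-values would still need a multiple-testing correction.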

5.
Enhancers, which are non-coding genomic sequences, play a key role in activating gene expression. They have been widely identified in the human genome. The pig is an important biomedical model for human health, but few studies have explored enhancers in the pig genome. Human enhancer information may be useful for identifying enhancers in the pig genome. In addition, the genetic background of pig traits could be useful for annotating human enhancers and diseases. Thus, to further study enhancers and their potential roles in humans and pigs, we developed a public database, ETph (Enhancers and their Targets in pig and human). ETph integrates information on human enhancers, pig putative enhancers, target genes, pig QTL terms, human diseases, GO terms and KEGG pathways. A total of 25 182 enhancers were identified in the pig genome using human homologous sequence information. Of these, 6232 high-confidence enhancers were used to build ETph. ETph provides a convenient platform to search, browse and download data. Moreover, a web-based analytical tool was designed to visualize networks and topology graphs among pig putative enhancers, target genes, pig QTL traits and human diseases. ETph may provide a useful tool for researchers investigating the genetic background of pig traits and human diseases. ETph is freely accessible at http://klab.sjtu.edu.cn/enhancer/ .

6.
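Transgenic cloning technology, which introduces exogenous genes into the host chromosomal genome so that they integrate stably and are transmitted to offspring, has been widely applied in theoretical research on gene expression and regulation, the establishment of animal models of human genetic diseases, the production of pharmaceutical proteins, disease-resistance breeding, and research on organs for human transplantation. The research and application of transgenic animals has become one of the most active and practically valuable directions in the life sciences of the 21st century; in particular, as bioreactors and as sources of organs for human medicine, their economic value and social benefits will be immeasurable. Based on an extensive review of recent domestic and international literature, this article focuses on transgenic animal cloning and provides a comprehensive discussion and analysis of its main techniques, including microinjection, nuclear transfer, gene targeting and the preparation of eukaryotic BAC expression vectors, as well as its applications in xenotransplantation and bioreactor construction. The advantages and disadvantages of each transgenic technique are also described, with the aim of providing a theoretical basis and technical support for research on transgenic animal cloning.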

7.
Wang Q, Xu J, Guo Z, Li X 《生物信息学》2003,1(1):33-36
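Gene chips (microarrays) can measure gene expression levels rapidly, in parallel and with high throughput, making them a powerful tool for functional genomics research. To meet the routine data-analysis needs of microarray experiments, we designed and developed a preliminary bioinformatics analysis platform for gene expression profiles. It consists of the standalone software IDKA (Information Digger for Experiments of microArray) and the web application WebGEA (WEB GeneChip Expression Analysis), which respectively let users run a standalone program or submit data over the Internet to a server program to carry out data mining and analysis tasks. The platform has been applied successfully and is a convenient tool for routine microarray data analysis.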

8.
Carels N 《FEBS letters》2005,579(18):3867-3871
Previous investigations by Southern hybridization of cDNA with compositional DNA fractions showed that most maize genes are located in DNA fragments within a narrow GC range, and that the corresponding gene space is GC-richer than the region of the genome where zein genes are found. Here, we revisited the maize gene space using new data from the maize genome sequencing initiative. We found that the maize gene space itself comprises two compositional compartments, a GC-poor and a GC-rich one, characterized by different distributions of Opie and Huck retrotransposons. The GC-rich compartment tends to be richer in GC-rich genes than the GC-poor compartment. However, the gene space compartmentalization of maize is much simpler than that of human.
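To make the notion of compositional compartments concrete, the sketch below computes the GC fraction of a sequence (ignoring N and other ambiguity codes) and assigns it to a GC-poor or GC-rich bin. The 0.45 threshold and the function names are hypothetical; the abstract does not give the boundary used to separate the two maize compartments.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases; NaN if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt

def assign_compartment(seq: str, threshold: float = 0.45) -> str:
    """Label a sequence GC-rich or GC-poor relative to a hypothetical cutoff."""
    return "GC-rich" if gc_content(seq) >= threshold else "GC-poor"

print(assign_compartment("ATGCGCGCNNAT"))  # 6 of 10 unambiguous bases are G/C
```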

9.
The human liver fluke, Opisthorchis viverrini, has been classified as a Group 1 carcinogen because of its strong association with cholangiocarcinoma, a bile duct cancer with a high incidence in northeastern Thailand. The lack of a genome database for this parasite has limited studies aimed at understanding the basic molecular biology of this carcinogenic liver fluke. Determining the genome size is an initial step before full genome sequencing. In this study, we applied absolute quantitative real-time polymerase chain reaction (qPCR) for this purpose. Our results indicate that the genome size of O. viverrini is 75.95 Mb (C value 0.083). This genome size information is useful for estimating sequence coverage and the cost of sequencing the parasite's whole genome with next-generation sequencing technologies.
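Absolute qPCR estimates genome size by relating a known mass of template DNA to the measured copy number of a single-copy gene: each haploid genome carries one copy, so the mass per haploid genome is mass / copies, and dividing by the mean mass of a base pair gives the length in bp. The sketch below assumes an average of 650 g/mol per double-stranded base pair (some protocols use 660) and uses hypothetical input values, not the measurements from the study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
BP_MOLAR_MASS = 650.0     # assumed g/mol per double-stranded base pair

def genome_size_bp(template_ng: float, single_copy_gene_copies: float) -> float:
    """Haploid genome size (bp) from template mass and qPCR copy number."""
    grams_per_genome = template_ng * 1e-9 / single_copy_gene_copies
    return grams_per_genome * AVOGADRO / BP_MOLAR_MASS

# Hypothetical run: 1 ng of genomic DNA yields 12,000 copies of a single-copy gene.
print(f"{genome_size_bp(1.0, 12_000) / 1e6:.2f} Mb")  # 77.21 Mb
```

Because the copy number is counted per single-copy locus, the result corresponds to the haploid (1C) genome even when the template comes from diploid cells.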

10.
Semple C 《Genome biology》2000,1(4):reviews2001.1-reviews2001.5
Much is expected of the draft human genome sequence, and yet there is no central resource to host the plethora of sequence and mapping information available. Consequently, finding the most useful and reliable human genome data and resources currently available on the web can be challenging, but is not impossible.

11.
Dynamic models of gene expression and classification   Total citations: 3, self-citations: 0, citations by others: 3
Powerful new methods, such as expression profiling with cDNA arrays, have been used to monitor changes in gene expression levels resulting from a variety of metabolic, xenobiotic or pathogenic challenges. This potentially vast quantity of data enables, in principle, the dissection of the complex genetic networks that control the patterns and rhythms of gene expression in the cell. Here we present a general approach to developing dynamic models for analyzing time series of whole-genome expression. In this approach, a self-consistent calculation is performed that involves both linear and non-linear response terms for interrelating gene expression levels. This calculation uses singular value decomposition (SVD) not as a statistical tool but as a means of inverting noisy and near-singular matrices. The linear transition matrix determined from this calculation can be used to calculate the underlying network reflected in the data. This suggests a direct method of classifying genes according to their place in the resulting network. In addition to providing a means to model such a large multivariate system, this approach can be used to reduce the dimensionality of the problem in a rational and consistent way, and to suppress the strong noise amplification effects often encountered with expression profile data. Non-linear and higher-order Markov behavior of the network is also determined in this self-consistent method. For yeast data sets, we calculate the Markov matrix and the gene classes based on the linear Markov network. These results compare favorably with previously used methods such as cluster analysis. Our dynamic method appears to provide a broad and general framework for data analysis and modeling of gene expression arrays.
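The core linear step described above can be sketched as follows: stack the expression vectors x(t) as columns of a genes × time matrix, then solve x(t+1) ≈ M x(t) for M by inverting the lagged matrix with a truncated SVD, discarding singular values below a relative tolerance so that noise in near-singular directions is not amplified. This covers only the linear Markov term; the non-linear and higher-order terms of the self-consistent calculation are omitted, and the tolerance value, function name and simulated data are assumptions.

```python
import numpy as np

def fit_transition_matrix(X: np.ndarray, rel_tol: float = 1e-3) -> np.ndarray:
    """Least-squares M with X[:, t+1] ≈ M @ X[:, t], via a truncated-SVD pseudo-inverse.

    X has shape (genes, time points). Singular values below rel_tol * largest
    are dropped before inversion.
    """
    X0, X1 = X[:, :-1], X[:, 1:]
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    keep = s > rel_tol * s[0]
    # Pseudo-inverse of X0 built from the retained singular triplets only.
    X0_pinv = (Vt[keep].T / s[keep]) @ U[:, keep].T
    return X1 @ X0_pinv

# Simulated 2-gene series driven by a known transition matrix plus noise.
rng = np.random.default_rng(0)
M_true = np.array([[0.5, 0.2], [-0.3, 0.4]])
X = np.zeros((2, 2000))
X[:, 0] = rng.normal(size=2)
for t in range(1999):
    X[:, t + 1] = M_true @ X[:, t] + rng.normal(size=2)
print(np.round(fit_transition_matrix(X), 2))
```

Raising `rel_tol` trades fit for stability: fewer singular directions are inverted, which is one way to reduce dimensionality in the spirit the abstract describes.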

12.

Background

Interlocus gene conversion (IGC) is a recombination-based mechanism that results in the unidirectional transfer of short stretches of sequence between paralogous loci. Although IGC is a well-established mechanism of human disease, the extent to which this mutagenic process has shaped overall patterns of segregating variation in multi-copy regions of the human genome remains unknown. One expected manifestation of IGC in population genomic data is the presence of one-to-one paralogous SNPs that segregate identical alleles.

Results

Here, I use SNP genotype calls from the low-coverage phase 3 release of the 1000 Genomes Project to identify 15,790 parallel, shared SNPs in duplicated regions of the human genome. My approach for identifying these sites accounts for the potential redundancy of short read mapping in multi-copy genomic regions, thereby effectively eliminating false positive SNP calls arising from paralogous sequence variation. I demonstrate that independent mutation events to identical nucleotides at paralogous sites are not a significant source of shared polymorphisms in the human genome, consistent with the interpretation that these sites are the outcome of historical IGC events. These putative signals of IGC are enriched in genomic contexts previously associated with non-allelic homologous recombination, including clear signals in gene families that form tandem intra-chromosomal clusters.

Conclusions

Taken together, my analyses implicate IGC, not point mutation, as the mechanism generating at least 2.7% of single nucleotide variants in duplicated regions of the human genome.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1681-3) contains supplementary material, which is available to authorized users.
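One way to picture the core filtering step for shared SNPs (not the author's actual pipeline) is to take pairs of one-to-one paralogous positions and keep those where both positions are polymorphic with the same two alleles; when the paralogs lie in opposite orientation, the alleles at one copy must be complemented before comparison. The coordinates, allele calls and function name below are hypothetical, and the correction for short-read mapping redundancy described in the abstract is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def shared_snps(paralog_pairs, snp_alleles):
    """Return paralogous site pairs that segregate identical alleles.

    paralog_pairs: iterable of (site_a, site_b, same_orientation)
    snp_alleles: dict mapping a site to its (ref, alt) alleles
    """
    shared = []
    for site_a, site_b, same_orientation in paralog_pairs:
        if site_a not in snp_alleles or site_b not in snp_alleles:
            continue  # at least one copy is not polymorphic
        alleles_a = set(snp_alleles[site_a])
        alleles_b = set(snp_alleles[site_b])
        if not same_orientation:
            alleles_b = {a.translate(COMPLEMENT) for a in alleles_b}
        if alleles_a == alleles_b:
            shared.append((site_a, site_b))
    return shared

snps = {
    ("chr1", 100): ("A", "G"), ("chr1", 5100): ("A", "G"),
    ("chr2", 200): ("C", "T"), ("chr2", 9000): ("A", "G"),
    ("chr3", 50): ("C", "T"), ("chr3", 800): ("G", "A"),
}
pairs = [
    (("chr1", 100), ("chr1", 5100), True),   # same orientation, same alleles
    (("chr2", 200), ("chr2", 9000), False),  # inverted copy: A/G complements to T/C
    (("chr3", 50), ("chr3", 800), True),     # different alleles
    (("chr4", 10), ("chr4", 700), True),     # neither site is a called SNP
]
print(shared_snps(pairs, snps))
```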

13.
Introns have been found to enhance almost every step of gene expression except mRNA stability. By analyzing previously published genome-wide data on mRNA stability, we found that human intron-containing genes have more stable mRNAs than intronless genes, and that the Arabidopsis thaliana genes with the most unstable mRNAs have fewer introns than other genes in the genome. After controlling for mRNA length, mRNA stability was still positively correlated with intron number in human intron-containing genes. In the yeast Saccharomyces cerevisiae, however, two different datasets on mRNA half-life gave conflicting results. Components of messenger ribonucleoprotein particles (mRNPs) recruited during intron splicing may be retained in cytoplasmic mRNPs and act as signals of mRNA stability, or simply as insulators that protect the mRNA from degradation.
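Controlling for mRNA length can be expressed as a partial correlation between intron number and mRNA stability given length. The sketch below applies the standard first-order partial correlation formula to Pearson coefficients; the abstract does not say which correlation measure the authors used (a rank-based version would apply the same formula to Spearman coefficients), and the variable names and toy data are hypothetical.

```python
import numpy as np

def partial_corr(x, y, z) -> float:
    """Pearson correlation of x and y after removing the linear effect of z."""
    r = np.corrcoef(np.vstack([x, y, z]))
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Toy data: intron number and half-life both track length, but not each other.
rng = np.random.default_rng(1)
mrna_length = rng.normal(size=5000)
intron_number = mrna_length + rng.normal(size=5000)
half_life = mrna_length + rng.normal(size=5000)
print(round(np.corrcoef(intron_number, half_life)[0, 1], 2))         # expected to be close to 0.5
print(round(partial_corr(intron_number, half_life, mrna_length), 2))  # expected to be close to 0
```

In this toy setup the raw correlation is driven entirely by length, so it largely disappears once length is controlled for; the abstract reports that in human intron-containing genes the positive correlation persisted.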

14.
Adeno-associated virus (AAV) vectors have a limited capacity for packaging DNA. To insert both a therapeutic gene and a selectable marker gene into the same AAV vector efficiently, we developed a novel dicistronic AAV vector containing a 230-base-pair (bp) internal ribosome entry site (IRES) element derived from the hepatitis C virus (HCV) genome and a 420 bp blasticidin S-resistance gene (bsr) as a small selectable marker in the second cistron. The 650 bp HCV IRES-bsr construct was placed downstream of the 3′ end of the luciferase gene (Luc) under the control of the human cytomegalovirus (CMV) promoter. This dicistronic gene conferred blasticidin S resistance on 293 cells in addition to luciferase activity, as shown not only by transfection but also by transduction with AAV vectors. The dicistronic AAV vector harbouring HCV IRES-bsr can express a therapeutic gene of up to 3.6 kilobases (kb), including promoter/enhancer elements, together with a selectable marker gene. If a selectable marker gene is not necessary, this vector can accommodate two different therapeutic genes more easily than a vector containing the EMCV IRES. The dicistronic AAV vector described here is useful for expressing many kinds of cDNA together with a selectable marker.

15.
Comprehensive analysis of keratin gene clusters in humans and rodents   Total citations: 1, self-citations: 0, citations by others: 1
Here, we present a comparative analysis of the two keratin (K) gene clusters in the genomes of man, mouse and rat. Overall, there is remarkable but not perfect synteny among the clusters of the three mammalian species. The human type I keratin gene cluster consists of 27 genes and 4 pseudogenes, all in the same orientation. It is interrupted by a domain of multiple genes encoding keratin-associated proteins (KAPs). Cytokeratin, hair and inner root sheath keratin genes are grouped together in small subclusters, indicating that evolution occurred by duplication events. At the end of the rodent type I gene cluster, a novel gene related to K14 and K17 was identified, which has become a pseudogene in humans. The human type II cluster consists of 27 genes and 5 pseudogenes, most of which are arranged in the same orientation. Of the 26 type II murine keratin genes now known, expression of two new genes was detected by RT-PCR. Kb20, the first gene in the cluster, was detected in lung tissue. Kb39, a new ortholog of K1, is expressed in certain stratified epithelia. It represents a candidate gene for those hyperkeratotic skin syndromes in which no K1 mutations have been identified so far. Most remarkably, the human K3 gene, which causes Meesmann's corneal dystrophy when mutated, lacks a counterpart in the mouse genome. While the human genome has 138 pseudogenes related to K8 and K18, the mouse and rat genomes contain only 4 and 6 such pseudogenes, respectively. Our results also provide the basis for a unified keratin nomenclature and for future functional studies.

16.
Arthropod mitochondrial genomes and phylogenetic reconstruction   Total citations: 10, self-citations: 0, citations by others: 10
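Comparative study of mt genomes is one of the effective means of investigating arthropod phylogeny. Gene arrangements and DNA sequences can provide useful information for reconstructing arthropod phylogeny. To date, complete mt genome sequences have been determined for 44 arthropod species. This paper summarizes the basic features of arthropod mt genomes, their gene order, and the occurrence and mechanisms of gene rearrangement, and briefly reviews studies of arthropod phylogeny based on mt genomes.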
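Gene-order comparisons of the kind summarized above are often scored with a breakpoint distance: the number of gene adjacencies in one genome that are not adjacencies in the other. The sketch below is a simplified, unsigned version for linear orders with identical gene sets; real mitochondrial genomes are circular and genes sit on either strand, so signed and circular variants would be needed in practice. The gene orders are toy examples.

```python
def adjacencies(order):
    """Unordered pairs of neighbouring genes in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def breakpoint_distance(order_a, order_b):
    """Adjacencies of order_a broken in order_b (unsigned, linear)."""
    if set(order_a) != set(order_b):
        raise ValueError("gene orders must contain the same genes")
    return len(adjacencies(order_a) - adjacencies(order_b))

# Toy orders: order_b swaps atp8 and atp6 relative to order_a.
order_a = ["cox1", "cox2", "atp8", "atp6", "cox3"]
order_b = ["cox1", "cox2", "atp6", "atp8", "cox3"]
print(breakpoint_distance(order_a, order_b))  # 2
```

The swap keeps the atp8/atp6 adjacency but breaks cox2-atp8 and atp6-cox3, giving two breakpoints.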

17.
18.
Genome content analysis has been used as a source of phylogenetic information in large prokaryotic tree-of-life studies. Recently, the sequencing of many eukaryotic genomes has allowed genome content analysis to be applied to these organisms as well. In this communication we examine the utility of genome content analysis for recovering phylogenetic patterns in several eukaryotic groups. By constructing multiple matrices with different E-value cutoffs, we examine the effect of altering the E-value cutoff on five eukaryotic genome data sets. Our analysis indicates that the E-value cutoff used as a criterion in constructing the genome content matrix is a critical factor in both the accuracy and the information content of the analysis. Strikingly, genome content by itself is not a reliable or accurate source of characters for phylogenetic analysis of the taxa in the five data sets we analyzed. We discuss two problems, small genome attraction and genome duplications, as contributing to the rather poor performance of genome content data in recovering eukaryotic phylogeny.
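The role of the E-value cutoff shows up directly in how a genome content matrix is built: a gene family is scored as present in a genome only if its best hit there passes the cutoff, so tightening or loosening the cutoff flips cells of the matrix. A minimal sketch, assuming best-hit E-values have already been computed (for example with BLAST); the genome and family names and the E-values are hypothetical.

```python
def content_matrix(best_hits, genomes, families, cutoff):
    """Presence (1) / absence (0) matrix from best-hit E-values at a given cutoff.

    best_hits: iterable of (genome, family, evalue) tuples.
    """
    present = {(g, f) for g, f, evalue in best_hits if evalue <= cutoff}
    return {g: [int((g, f) in present) for f in families] for g in genomes}

hits = [
    ("yeast", "famA", 1e-50), ("yeast", "famB", 1e-4),
    ("worm", "famA", 1e-30), ("worm", "famB", 1e-12),
    ("fly", "famB", 0.5),
]
genomes, families = ["yeast", "worm", "fly"], ["famA", "famB"]
for cutoff in (1e-10, 1e-3):
    print(cutoff, content_matrix(hits, genomes, families, cutoff))
```

Here the yeast famB cell switches from absent to present as the cutoff is relaxed from 1e-10 to 1e-3, which changes the characters a downstream tree search would see.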

19.
Sequencing family DNA samples provides an attractive alternative to population-based designs for identifying rare variants associated with human disease, because causal variants are enriched in pedigrees. Previous studies showed that genotype calling accuracy can be improved by modeling family relatedness, compared with standard calling algorithms. Current family-based variant calling methods use sequencing data at single variants and ignore identity-by-descent (IBD) sharing along the genome. In this study we describe a new computational framework to accurately estimate IBD sharing from sequencing data and to use the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we showed that IBD can be reliably estimated across the genome, even at very low coverage (e.g. 2X), and that genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for low-frequency variants, especially at low to intermediate coverage (e.g. 10X to 20X), making our approach effective for studying rare variants through cost-effective whole-genome sequencing of pedigrees. We hope that our tool will be useful to the research community for identifying rare variants for human disease through family-based sequencing.
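The benefit of IBD sharing can be illustrated with genotype likelihoods at a single biallelic site. Each individual's likelihood for 0, 1 or 2 alternate alleles comes from its reference and alternate read counts under a per-read error rate; if two relatives are IBD2 at that position (they carry the same two haplotypes), they must share one genotype, so their likelihoods can be multiplied before applying the prior. This is a simplified sketch, not the authors' framework: it assumes a known IBD2 state, a Hardy-Weinberg prior with a hypothetical allele frequency and a 1% error rate, and it ignores IBD1 segments, which would require summing over the shared and unshared haplotypes.

```python
def genotype_likelihoods(ref_reads: int, alt_reads: int, err: float = 0.01):
    """P(reads | g) for g = 0, 1, 2 alternate alleles (binomial constant dropped)."""
    likelihoods = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err  # chance a read shows ALT
        likelihoods.append(p_alt**alt_reads * (1 - p_alt) ** ref_reads)
    return likelihoods

def genotype_posterior(likelihood_sets, alt_freq: float = 0.1):
    """Posterior over one genotype shared by all individuals in likelihood_sets."""
    q = alt_freq
    joint = [(1 - q) ** 2, 2 * q * (1 - q), q**2]  # Hardy-Weinberg prior
    for likelihoods in likelihood_sets:
        joint = [j * l for j, l in zip(joint, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# About 2X coverage: relative 1 shows 2 REF reads; relative 2 shows 1 REF + 1 ALT.
rel1 = genotype_likelihoods(2, 0)
rel2 = genotype_likelihoods(1, 1)
alone = genotype_posterior([rel1])
pooled = genotype_posterior([rel1, rel2])  # valid only if the pair is IBD2 here
print([round(p, 3) for p in alone], [round(p, 3) for p in pooled])
```

With only two reads, relative 1 alone looks homozygous reference; pooling with the IBD2 relative's reads shifts the most probable shared genotype to heterozygous.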

20.
R C Levitt 《Genomics》1991,11(2):484-489
In this review we present preliminary evidence for a new class of polymorphism that may be used in a systematic way to map cDNAs efficiently and to expedite the construction of a high-resolution genetic map of the human genome. Ultimately, transcribed 3' untranslated polymorphisms will warrant further study because they should be widely distributed throughout the genome within transcribed sequences, and they can be readily identified as a result of cDNA cloning and sequencing. Furthermore, these markers should be universally available on the basis of the sequence data and highly useful in linkage analyses.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417