Similar literature
20 similar documents found (search time: 31 ms)
1.
As whole genome sequences continue to expand in number and complexity, effective methods for comparing and categorizing both genes and species represented within extremely large datasets are required. Methods introduced to date have generally utilized incomplete and likely insufficient subsets of the available data. We have developed an accurate and efficient method for producing robust gene and species phylogenies using very large whole genome protein datasets. This method relies on multidimensional protein vector definitions supplied by the singular value decomposition (SVD) of a large sparse data matrix in which each protein is uniquely represented as a vector of overlapping tetrapeptide frequencies. Quantitative pairwise estimates of species similarity were obtained by summing the protein vectors to form species vectors, then determining the cosines of the angles between species vectors. Evolutionary trees produced using this method confirmed many accepted prokaryotic relationships. However, several unconventional relationships were also noted. In addition, we demonstrate that many of the SVD-derived right basis vectors represent particular conserved protein families, while many of the corresponding left basis vectors describe conserved motifs within these families as sets of correlated peptides (copeps). This analysis represents the most detailed simultaneous comparison of prokaryotic genes and species available to date.
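The pipeline described in this abstract (tetrapeptide counts, SVD, summed protein vectors, cosines between species vectors) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the dense NumPy matrix, the use of raw counts as "frequencies", and the `rank` truncation parameter are all assumptions made for the example, and a real whole-genome dataset would need a sparse matrix and a sparse SVD solver.

```python
import numpy as np


def tetrapeptide_counts(seq):
    """Count overlapping 4-residue windows in one protein sequence."""
    counts = {}
    for i in range(len(seq) - 3):
        pep = seq[i:i + 4]
        counts[pep] = counts.get(pep, 0) + 1
    return counts


def protein_matrix(proteins):
    """Build a (tetrapeptides x proteins) count matrix over observed peptides only."""
    per_protein = [tetrapeptide_counts(p) for p in proteins]
    vocab = sorted({pep for c in per_protein for pep in c})
    row = {pep: i for i, pep in enumerate(vocab)}
    A = np.zeros((len(vocab), len(proteins)))
    for j, c in enumerate(per_protein):
        for pep, n in c.items():
            A[row[pep], j] = n
    return A, vocab


def species_cosines(species_to_proteins, rank):
    """Return species names and the matrix of cosines between species vectors.

    Hypothetical helper: protein vectors are the columns of S @ Vt truncated
    to `rank` singular values; species vectors are sums of protein vectors.
    """
    names = list(species_to_proteins)
    proteins, owner = [], []
    for s_idx, name in enumerate(names):
        for p in species_to_proteins[name]:
            proteins.append(p)
            owner.append(s_idx)
    A, _ = protein_matrix(proteins)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(rank, len(s))
    protein_vecs = (s[:k, None] * Vt[:k]).T  # one row per protein
    species_vecs = np.zeros((len(names), k))
    for vec, s_idx in zip(protein_vecs, owner):
        species_vecs[s_idx] += vec
    # Assumes every species has a non-zero vector.
    unit = species_vecs / np.linalg.norm(species_vecs, axis=1, keepdims=True)
    return names, unit @ unit.T
```

In this orientation the columns of `U` (left basis vectors) live in peptide space and the rows of `Vt` (right basis vectors) live in protein space, which matches the roles the abstract assigns to them. A distance such as `1 - cosine` could then be passed to any tree-building routine.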

2.
MOTIVATION: Most molecular phylogenies are based on sequence alignments. Consequently, they fail to account for modes of sequence evolution that involve frequent insertions or deletions. Here we present a method for generating accurate gene and species phylogenies from whole genome sequence that makes use of short character string matches not placed within explicit alignments. In this work, the singular value decomposition of a sparse tetrapeptide frequency matrix is used to represent the proteins of organisms uniquely and precisely as vectors in a high-dimensional space. Vectors of this kind can be used to calculate pairwise distance values based on the angle separating the vectors, and the resulting distance values can be used to generate phylogenetic trees. Protein trees so derived can be examined directly for homologous sequences. Alternatively, vectors defining each of the proteins within an organism can be summed to provide a vector representation of the organism, which is then used to generate species trees. RESULTS: Using a large mitochondrial genome dataset, we have produced species trees that are largely in agreement with previously published trees based on the analysis of identical datasets using different methods. These trees also agree well with currently accepted phylogenetic theory. In principle, our method could be used to compare much larger bacterial or nuclear genomes in full molecular detail, ultimately allowing accurate gene and species relationships to be derived from a comprehensive comparison of complete genomes. In contrast to phylogenetic methods based on alignments, sequences that evolve by relative insertion or deletion would tend to remain recognizably similar.

3.
ABSTRACT: BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviate from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.
CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.

4.
A Gateway-based platform for multigene plant transformation (cited 2 times in total: 0 self-citations, 2 citations by others)
The post-genomic era offers unrivalled opportunities for genetic manipulation of polygenic traits, multiple traits, and multiple gene products. However, remaining technical hurdles make the manipulation of multiple genes in plants difficult. Here we describe a Gateway-based vector system to enable multiple transgenes to be directly linked or fused. The vector system consists of a destination vector and two special attL-flanked entry vectors each containing an attR cassette incompatible with the attL. By multiple rounds of LR recombination reactions, which we call MultiRound Gateway, multiple transgenes can be delivered sequentially and indefinitely into the Gateway-compatible destination vector through alternate use of the two special entry vectors. In our proof-of-principle experiments we have used this vector system to construct a plant transformation vector containing seven functional DNA fragments, including a screening marker gene, two reporter genes and four matrix attachment region sequences. This system provides a platform for fully realizing the potential of plant genetic manipulation. Electronic Supplementary Material: Supplementary material is available to authorised users in the online version of this article.

5.
The current challenge, now that two plant genomes have been sequenced, is to assign a function to the increasing number of predicted genes. In Arabidopsis, approximately 55% of genes can be assigned a putative function; however, fewer than 8% of these have been assigned a function by direct experimental evidence. To identify these functions, many genes will have to undergo comprehensive analyses, which will include the production of chimeric transgenes for constitutive or inducible ectopic expression, for antisense or dominant negative expression, for subcellular localization studies, for promoter analysis, and for gene complementation studies. The production of such transgenes is often hampered by laborious conventional cloning technology that relies on restriction digestion and ligation. With the aim of providing tools for high throughput gene analysis, we have produced a Gateway-compatible Agrobacterium sp. binary vector system that facilitates fast and reliable DNA cloning. This collection of vectors is freely available, for noncommercial purposes, and can be used for the ectopic expression of genes either constitutively or inducibly. The vectors can be used for the expression of protein fusions to the Aequorea victoria green fluorescent protein and to the beta-glucuronidase protein so that the subcellular localization of a protein can be identified. They can also be used to generate promoter-reporter constructs and to facilitate efficient cloning of genomic DNA fragments for complementation experiments. All vectors were derived from pCambia T-DNA cloning vectors, with the exception of a chemically inducible vector, for Agrobacterium sp.-mediated transformation of a wide range of plant species.

6.
7.
Insertional mutagenesis is a technique often used to inactivate genes in Streptococcus pneumoniae. Using conventional vectors, a 5' segment of the targeted gene remains under the control of the gene's authentic promoter following gene disruption. Thus, expression of a functional peptide, and the consequent misinterpretation of results, cannot be excluded. To circumvent this problem, we have developed a plasmid for insertional mutagenesis based on the tmRNA-tagging system of S. pneumoniae which ensures that any protein expressed after gene disruption is degraded. Insertional mutagenesis using this vector results in the targeted gene being tagged with a tmRNA-derived sequence coding for a proteolysis tag. Here we show that the translation product of a gene tagged by this method is not detectable by Western blotting, suggesting that the protein was degraded. This modified vector allows total inactivation of genes with a reliability that cannot be achieved by conventional vectors for insertional mutagenesis. This approach can be applied to other bacterial species.

8.
Here, a new theory of molecular phylogeny is developed in a multidimensional vector space (MVS). The molecular evolution is represented as a successive splitting of branch vectors in the MVS. The end points of these vectors are the extant species and indicate the specific directions reflected by their individual histories of evolution in the past. This representation makes it possible to infer the phylogeny (evolutionary histories) from the spatial positions of the end points. Search vectors are introduced to draw out the groups of species distributed around them. These groups are classified according to the nearby order of branches with them. A law of physics is applied to determine the species positions in the MVS. The species are regarded as the particles moving in time according to the equation of motion, finally falling into the lowest-energy state in spite of their randomly distributed initial condition. This falling into the ground state results in the construction of an MVS in which the relative distances between two particles are equal to the substitution distances. The species positions are obtained prior to the phylogeny inference. Therefore, as the number of species increases, the species vectors can be more specific in an MVS of a larger size, such that the vector analysis gives a more stable and reliable topology. The efficacy of the present method was examined by using computer simulations of molecular evolution in which all the branch- and end-point sequences of the trees are known in advance. In the phylogeny inference from the end points with 100 multiple data sets, the present method consistently reconstructed the correct topologies, in contrast to standard methods. In applications to 185 vertebrates in the alpha-hemoglobin, the vector analysis drew out the two lineage groups of birds and mammals. A core member of the mammalian radiation appeared at the base of the mammalian lineage. 
Squamates were isolated from the bird lineage to compose the outgroup, while the other living reptilians were directly coupled with birds without forming any sister groups. This result is in contrast to the morphological phylogeny and is also different from those of recent molecular analyses.

9.
10.
There is considerable interest in the use of bacteriophage vectors for mammalian cell gene transfer applications, due to their stability, excellent safety profile and inexpensive mass production. However, to date, phage vectors have been plagued by mediocre performance as gene transfer agents. This may reflect the complexity of the viral infection process in mammalian cells and the need to refine each step of this process in order to arrive at an optimal, phage-based gene transfer system. Therefore, a flexible system was designed that allowed for the introduction of multiple modifications on the surface of bacteriophage lambda. Using this novel method, multiple peptides were displayed simultaneously from both the phage head and tail. Surface head display of a ubiquitinylation motif greatly increased the efficiency of phage-mediated gene transfer in a murine macrophage cell line. Gene transfer was further increased when this peptide was displayed in combination with a tail-displayed CD40-binding motif. Overall, this work provides a novel system that can be used to rationally improve bacteriophage gene transfer vectors and shows it may be possible to enhance the efficiency of phage-mediated gene transfer by targeting and optimizing multiple steps within the viral infection pathway.

11.
The experimental control of gene expression in specific tissues or cells at defined time points is a useful tool for the analysis of gene function. GAL4/VP16-UAS enhancer trap lines can be used to selectively express genes in specific tissues or cells, and an ethanol-inducible system can help to control the time of expression. In this study, the combination of the two methods allowed the successful regulation of gene expression in both time and space. For this purpose, a binary vector, 962-UAS::GUS, was constructed in which the ALCR activator and β-glucuronidase (GUS) reporter gene were placed under the control of upstream activator sequence (UAS) elements and the alcA response element, respectively. Three different GAL4/VP16-UAS enhancer trap lines of Arabidopsis were transformed, resulting in transgenic plants in which GUS activity was detected only on ethanol induction and exclusively in the predicted tissues of the enhancer trap lines. As a library of different enhancer trap lines with distinct green fluorescent protein (GFP) patterns exists, transformation with a similar vector, in which GUS is replaced by another gene, would enable the control of the time and place of transgene expression. We have constructed two vectors for easy cloning of the gene of interest, one with a polylinker site and one that is compatible with the GATEWAY™ vector conversion system. The method can be extended to other species when enhancer trap lines become available.

12.
Yau SS, Yu C, He R. DNA and Cell Biology 2008, 27(5):241-250
Graphical representation of gene sequences provides a simple way of viewing, sorting, and comparing various gene structures. Here we first report a two-dimensional graphical representation for protein sequences. With this method, we constructed the moment vectors for protein sequences, and mathematically proved that the correspondence between moment vectors and protein sequences is one-to-one. Therefore, each protein sequence can be represented as a point in a map, which we call protein map, and cluster analysis can be used for comparison between the points. Sixty-six proteins from five protein families were analyzed using this method. Our data showed that for proteins in the same family, their corresponding points in the map are close to each other. We also illustrate the efficiency of this approach by performing an extensive cluster analysis of the protein kinase C family. These results indicate that this protein map could be used to mathematically specify the similarity of two proteins and predict properties of an unknown protein based on its amino acid sequence.

13.
The primary receptor, the coxsackievirus and adenovirus receptor (CAR), and the secondary receptor, αv integrins, are the tropism determinants of adenovirus (Ad) type 5. Inhibition of the interaction of both the fiber with CAR and the penton base with the αv integrin appears to be crucial to the development of targeted Ad vectors, which specifically transduce a given cell population. In this study, we developed Ad vectors with ablation of both CAR and αv integrin binding by mutating the fiber knob and the RGD motif of the penton base. We also replaced the fiber shaft domain with that derived from Ad type 35. High transduction efficiency in the mouse liver was suppressed approximately 130- to 270-fold by intravenous administration of the double-mutant Ad vectors, which mutated two domains each of the fiber knob and shaft and the RGD motif of the penton base compared with those of conventional Ad vectors (type 5). Most significantly, the triple-mutant Ad vector containing the fiber knob with ablation of CAR binding ability, the fiber shaft of Ad type 35, and the penton base with a deletion of the RGD motif mediated a >30,000-fold lower level of mouse liver transduction than the conventional Ad vectors. This triple-mutant Ad vector also mediated reduced transduction in other organs (the spleen, kidney, heart, and lung). Viral DNA analysis showed that systemically delivered triple-mutant Ad vector was primarily taken up by liver nonparenchymal cells and that most viral DNAs were easily degraded, resulting in little gene expression in the liver. These results suggest that the fiber knob, fiber shaft, and RGD motif of the penton base each plays an important role in Ad vector-mediated transduction to the mouse liver and that the triple-mutant Ad vector exhibits little tropism to any organs and appears to be a fundamental vector for targeted Ad vectors.

14.
Inferring species phylogenies is an important part of understanding molecular evolution. Even so, it is well known that an accurate phylogenetic tree reconstruction for a single gene does not always necessarily correspond to the species phylogeny. One commonly accepted strategy to cope with this problem is to sequence many genes; the way in which to analyze the resulting collection of genes is somewhat more contentious. Supermatrix and supertree methods can be used, although these can suppress conflicts arising from true differences in the gene trees caused by processes such as lineage sorting, horizontal gene transfer, or gene duplication and loss. In 2004, Huson et al. (IEEE/ACM Trans. Comput. Biol. Bioinformatics 1:151-158) presented the Z-closure method that can circumvent this problem by generating a supernetwork as opposed to a supertree. Here we present an alternative way for generating supernetworks called Q-imputation. In particular, we describe a method that uses quartet information to add missing taxa into gene trees. The resulting trees are subsequently used to generate consensus networks, networks that generalize strict and majority-rule consensus trees. Through simulations and application to real data sets, we compare Q-imputation to the matrix representation with parsimony (MRP) supertree method and Z-closure, and demonstrate that it provides a useful complementary tool.

15.
Liu L, Yu L. Systematic Biology 2011, 60(5):661-667
In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated by the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is defined as the average number of internodes between two species across gene trees, that is, average gene-tree internode distance. The distance method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating topologies of species trees, whereas the Bayesian coalescence-based method, BEST, outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can take unrooted gene trees to infer species trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies in which data sets often contain missing loci for some individuals.
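The distance matrix this abstract describes can be sketched as below. The sketch makes several assumptions that are not in the source: each unrooted gene tree is given as an adjacency dictionary, every leaf is a species with exactly one neighbour, one individual is sampled per species, and "number of internodes" between two leaves is read as the number of internal nodes on the path joining them (path length in edges minus one). The function names are hypothetical, and the neighbor-joining step itself is left to any standard NJ implementation.

```python
from collections import deque


def edge_distances_from(adj, start):
    """Breadth-first search: number of edges from `start` to every node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_internode_distance(gene_trees):
    """Average, over the gene trees containing both species, of the number
    of internal nodes separating each pair of leaves."""
    totals, counts = {}, {}
    for adj in gene_trees:
        leaves = sorted(n for n, nbs in adj.items() if len(nbs) == 1)
        for a in leaves:
            dist = edge_distances_from(adj, a)
            for b in leaves:
                if a < b:
                    key = (a, b)
                    totals[key] = totals.get(key, 0) + dist[b] - 1
                    counts[key] = counts.get(key, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}
```

Because each pair is averaged only over the trees in which both species appear, a gene tree with a missing taxon simply contributes nothing for pairs involving that taxon, which is one plausible way to accommodate the missing loci the abstract mentions.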

16.
Increasing the efficiency of gene transfer using non-viral vectors, which have the potential to be safe and economical, would improve upon available options for gene therapy. We previously reported that the third EGF motif of the extracellular matrix protein Del1 (E3) increases the transfection efficiency of non-viral vector methods. Here, we asked if E3 could increase the in vivo transfection efficiency of a polyplex-based approach. To test this, cDNA encoding a heat-stable alkaline phosphatase (AP) was first injected intravenously into mice along with recombinant E3. After 24 h, exogenous AP activity in serum was measured. We found that the introduction of E3 resulted in 50% more AP activity as compared to the control. We next tested transfection into a tumour explant of SCCKN cells, an oral carcinoma-derived cell line. To do this, a cDNA encoding yellow fluorescent protein was locally injected into a tumour explant, followed by local injection of recombinant E3. Use of E3 increased the number of transfected cells to 2.5 times that of the control. Histochemical staining revealed that E3 induced apoptosis in a tumour explant. The data suggest that E3 might be a useful tool for cancer gene therapy using non-viral vectors.

17.
The avian adenovirus CELO is being developed as a gene transfer tool. Using homologous recombination in Escherichia coli, the CELO genome was screened for regions that could be deleted and would tolerate the insertion of a marker gene (luciferase or enhanced green fluorescent protein). For each mutant genome, the production of viable virus able to deliver the transgene to target cells was monitored. A series of mutants in the genome identified a set of open reading frames that could be deleted but which must be supplied in trans for virus replication. A region of the genome which is dispensable for viral replication and allows the insertion of an expression cassette was identified and a vector based on this mutation was evaluated as a gene delivery reagent. Transduction of avian cells occurs at 10- to 100-fold greater efficiency (per virus particle) than with an adenovirus type 5 (Ad5)-based vector carrying the same expression cassette. Most important for gene transfer applications, the CELO vector transduced mammalian cells as efficiently as an Ad5 vector. The CELO vector is exceptionally stable, can be grown inexpensively in chicken embryos, and provides a useful alternative to Ad5-based vectors.

18.
Under a coalescent model for within-species evolution, gene trees may differ from species trees to such an extent that the gene tree topology most likely to evolve along the branches of a species tree can disagree with the species tree topology. Gene tree topologies that are more likely to be produced than the topology that matches that of the species tree are termed anomalous, and the region of branch-length space that gives rise to anomalous gene trees (AGTs) is the anomaly zone. We examine the occurrence of anomalous gene trees for the case of five taxa, the smallest number of taxa for which every species tree topology has a nonempty anomaly zone. Considering all sets of branch lengths that give rise to anomalous gene trees, the largest value possible for the smallest branch length in the species tree is greater in the five-taxon case (0.1934 coalescent time units) than in the previously studied case of four taxa (0.1568). The five-taxon case demonstrates the existence of three phenomena that do not occur in the four-taxon case. First, anomalous gene trees can have the same unlabeled topology as the species tree. Second, the anomaly zone does not necessarily enclose a ball centered at the origin in branch-length space, in which all branches are short. Third, as a branch length increases, it is possible for the number of AGTs to increase rather than decrease or remain constant. These results, which help to describe how the properties of anomalous gene trees increase in complexity as the number of taxa increases, will be useful in formulating strategies for evading the problem of anomalous gene trees during species tree inference from multilocus data.

19.
The singular value decomposition (SVD) provides a method for decomposing a molecular dynamics trajectory into fundamental modes of atomic motion. The right singular vectors are projections of the protein conformations onto these modes showing the protein motion in a generalized low-dimensional basis. Statistical analysis of the right singular vectors can be used to classify discrete configurational substates in the protein. The configuration space portraits formed from the right singular vectors can also be used to visualize complex high-dimensional motion and to examine the extent of configuration space sampling by the simulation. © 1995 Wiley-Liss, Inc.
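As a rough illustration of the decomposition this abstract describes, the sketch below arranges a trajectory as a (coordinates x frames) matrix so that the left singular vectors are modes of atomic motion and the right singular vectors give each frame's projection onto those modes. The array layout, the function name, and the subtraction of the time-averaged structure before the SVD are assumptions made for the example and are not details taken from the abstract.

```python
import numpy as np


def trajectory_modes(coords):
    """Decompose a trajectory of shape (n_frames, n_atoms, 3).

    Returns (modes, singular_values, projections):
      modes        columns are displacement patterns over all 3N coordinates
      projections  rows of Vt; entry [k, t] is frame t projected on mode k
    """
    n_frames = coords.shape[0]
    X = coords.reshape(n_frames, -1).T          # (3N, n_frames)
    X = X - X.mean(axis=1, keepdims=True)       # remove the average structure
    modes, singular_values, projections = np.linalg.svd(X, full_matrices=False)
    return modes, singular_values, projections
```

Plotting the first two or three rows of `projections` against each other would give the kind of low-dimensional configuration space portrait the abstract refers to.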

20.
Nilsson B, Abrahmsén L, Uhlén M. The EMBO Journal 1985, 4(4):1075-1080
Two improved plasmid vectors, containing the gene coding for staphylococcal protein A and adapted for gene fusions, have been constructed. These vectors allow fusion of any gene to the protein A moiety, giving fusion proteins which can be purified in a one-step procedure by IgG affinity chromatography. One vector, pRIT2, is designed for temperature-inducible expression of intracellular fusion proteins in Escherichia coli and the other, pRIT5, is a shuttle vector designed for secretion. The latter gives a periplasmic fusion protein in E. coli and an extracellular protein in Gram-positive hosts such as Staphylococcus aureus. The usefulness of these vectors is exemplified by fusion of the protein A gene and the E. coli genes encoding the enzymes beta-galactosidase and alkaline phosphatase. High amounts of intact fusion protein are produced which can be immobilized on IgG-Sepharose in high yield (95-100%) without loss of enzymatic activity. Efficient secretion in both E. coli and S. aureus was obtained for the alkaline phosphatase hybrid, in contrast to beta-galactosidase, which was only expressed efficiently using the intracellular system. More than 80% of the protein A alkaline-phosphatase hybrid protein can be eluted from IgG affinity columns without loss of enzymatic activity.

