Similar literature
 20 similar articles found; search took 562 ms
1.
2.
3.
Microarray data quality analysis: lessons from the AFGC project   Total citations: 10, self-citations: 0, citations by others: 10
Genome-wide expression profiling with DNA microarrays has provided, and will continue to provide, a great deal of data to the plant scientific community. However, reliability concerns have required the development of data quality tests for common systematic biases. Fortunately, most large-scale systematic biases are detectable and some are correctable by normalization. Technical replication experiments and statistical surveys indicate that these biases vary widely in severity and appearance. As a result, no single normalization or correction method currently available is able to address all the issues. However, careful sequence selection, array design, experimental design and experimental annotation can substantially improve the quality and biological relevance of microarray data. In this review, we discuss these issues with reference to examples from the Arabidopsis Functional Genomics Consortium (AFGC) microarray project.

4.
5.
To begin biochemical and molecular studies on the biosynthesis of the type II arabinogalactan chains on arabinogalactan-proteins (AGPs), we adopted a bioinformatic approach to identify and systematically characterise the putative galactosyltransferases (GalTs) responsible for synthesizing the beta-(1,3)-Gal linkage from CAZy GT-family-31 from Arabidopsis thaliana. These analyses confirmed that 20 members of the GT-31 family contained domains/motifs typical of biochemically characterised beta-(1,3)-GTs from mammalian systems. Microarray data confirm that members of this family are expressed throughout all tissues making them likely candidates for the assembly of the ubiquitously found AGPs. One member, At1g77810, was selected for further analysis including location studies that confirmed its presence in the Golgi and preliminary enzyme substrate specificity studies that demonstrated beta-(1,3)-GalT activity. This bioinformatic/molecular study of CAZy GT-family-31 was validated by the recent report of Strasser et al. (Plant Cell 19:2278-2292, 2007) that another member of this family (At1g26810; GALT1) encodes a beta-(1,3)-GalT involved in the biosynthesis of the Lewis a epitope of N-glycans in Arabidopsis thaliana.

6.
7.
Jung KH, Lee J, Dardick C, Seo YS, Cao P, Canlas P, Phetsom J, Xu X, Ouyang S, An K, Cho YJ, Lee GC, Lee Y, An G, Ronald PC. PLoS Genetics, 2008, 4(8): e1000164
Functional redundancy limits detailed analysis of genes in many organisms. Here, we report a method to efficiently overcome this obstacle by combining gene expression data with analysis of gene-indexed mutants. Using a rice NSF45K oligo-microarray to compare 2-week-old light- and dark-grown rice leaf tissue, we identified 365 genes that showed significant 8-fold or greater induction in the light relative to dark conditions. We then screened collections of rice T-DNA insertional mutants to identify rice lines with mutations in the strongly light-induced genes. From this analysis, we identified 74 different lines comprising two independent mutant lines for each of 37 light-induced genes. This list was further refined by mining gene expression data to exclude genes that had potential functional redundancy due to co-expressed family members (12 genes) and genes that had inconsistent light responses across other publicly available microarray datasets (five genes). We next characterized the phenotypes of rice lines carrying mutations in ten of the remaining candidate genes and then carried out co-expression analysis associated with these genes. This analysis effectively provided candidate functions for two genes of previously unknown function and for one gene not directly linked to the tested biochemical pathways. These data demonstrate the efficiency of combining gene family-based expression profiles with analyses of insertional mutants to identify novel genes and their functions, even among members of multi-gene families.
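The fold-change cut-off and the candidate-list arithmetic in this abstract can be sketched as follows. This is a minimal illustration only: the function name `light_induced`, the gene names, and the intensity values are hypothetical, and the statistical significance test the authors mention is not modelled here.

```python
import math


def light_induced(expression, min_fold=8.0):
    """Return genes whose light/dark intensity ratio is at least min_fold.

    `expression` maps a gene name to a (light, dark) intensity pair.
    The comparison is done on the log2 scale, as is common for microarrays.
    """
    threshold = math.log2(min_fold)
    return sorted(
        gene
        for gene, (light, dark) in expression.items()
        if light > 0 and dark > 0 and math.log2(light / dark) >= threshold
    )


# Hypothetical intensities (light, dark); these are not data from the study.
example = {
    "geneA": (1600.0, 100.0),  # 16-fold induction
    "geneB": (700.0, 100.0),   # 7-fold induction, below the cut-off
    "geneC": (800.0, 100.0),   # exactly 8-fold induction
}

# Counts quoted in the abstract: 37 genes with two independent mutant lines,
# minus 12 with co-expressed family members, minus 5 with inconsistent
# light responses in other datasets.
remaining_candidates = 37 - 12 - 5
```

The subtraction leaves 20 candidate genes, of which the abstract says ten were characterized phenotypically.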

8.
Small auxin-up RNAs (SAURs) are the early auxin-responsive genes represented by a large multigene family in plants. Here, we identified 79 SAUR gene family members from maize (Zea mays subsp. mays) by a reiterative database search and manual annotation. Phylogenetic analysis indicated that the SAUR proteins from Arabidopsis, rice, sorghum, and maize had divided into 16 groups. These genes were non-randomly distributed across the maize chromosomes, and segmental duplication and tandem duplication contributed to the expansion of the maize SAUR gene family. Synteny analysis established orthology relationships and functional linkages between SAUR genes in maize and sorghum genomes. We also found that the auxin-responsive elements were conserved in the upstream sequences of maize SAUR members. Selection analyses identified some significant site-specific constraints acting on most SAUR paralogs. Expression profiles based on microarray data have provided insights into the possible functional divergence among members of the SAUR gene family. Quantitative real-time PCR analysis indicated that some of the 10 randomly selected ZmSAUR genes could be induced at least in maize shoot or root tissue tested. The results reveal a comprehensive overview of the maize SAUR gene family and may pave the way for deciphering their function during plant development.

9.
Small auxin-up RNAs (SAURs) are the early auxin-responsive genes represented by a large multigene family in plants. Here, we identified 79 SAUR gene family members from maize (Zea mays subsp. mays) by a reiterative database search and manual annotation. Phylogenetic analysis indicated that the SAUR proteins from Arabidopsis, rice, sorghum, and maize had divided into 16 groups. These genes were non-randomly distributed across the maize chromosomes, and segmental duplication and tandem duplication contributed to the expansion of the maize SAUR gene family. Synteny analysis established orthology relationships and functional linkages between SAUR genes in maize and sorghum genomes. We also found that the auxin-responsive elements were conserved in the upstream sequences of maize SAUR members. Selection analyses identified some significant site-specific constraints acting on most SAUR paralogs. Expression profiles based on microarray data have provided insights into the possible functional divergence among members of the SAUR gene family. Quantitative real-time PCR analysis indicated that some of the 10 randomly selected ZmSAUR genes could be induced at least in maize shoot or root tissue tested. The results reveal a comprehensive overview of the maize SAUR gene family and may pave the way for deciphering their function during plant development.

10.
Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall proteins that function in diverse aspects of plant growth and development. This superfamily consists of three members: hyperglycosylated arabinogalactan proteins (AGPs), moderately glycosylated extensins (EXTs), and lightly glycosylated proline-rich proteins (PRPs). Hybrid and chimeric versions of HRGP molecules also exist. In order to “mine” genomic databases for HRGPs and to facilitate and guide research in the field, the BIO OHIO software program was developed; it identifies and classifies AGPs, EXTs, PRPs, hybrid HRGPs, and chimeric HRGPs from proteins predicted from DNA sequence data. This bioinformatics program is based on searching for biased amino acid compositions and for particular protein motifs associated with known HRGPs. HRGPs identified by the program are subsequently analyzed to elucidate the following: (1) repeating amino acid sequences, (2) signal peptide and glycosylphosphatidylinositol lipid anchor addition sequences, (3) similar HRGPs via Basic Local Alignment Search Tool, (4) expression patterns of their genes, (5) other HRGPs, glycosyl transferase, prolyl 4-hydroxylase, and peroxidase genes coexpressed with their genes, and (6) gene structure and whether genetic mutants exist in their genes. The program was used to identify and classify 166 HRGPs from Arabidopsis (Arabidopsis thaliana) as follows: 85 AGPs (including classical AGPs, lysine-rich AGPs, arabinogalactan peptides, fasciclin-like AGPs, plastocyanin AGPs, and other chimeric AGPs), 59 EXTs (including SP5 EXTs, SP5/SP4 EXTs, SP4 EXTs, SP4/SP3 EXTs, a SP3 EXT, “short” EXTs, leucine-rich repeat-EXTs, proline-rich extensin-like receptor kinases, and other chimeric EXTs), 18 PRPs (including PRPs and chimeric PRPs), and four AGP/EXT hybrid HRGPs.
The genomics era has produced vast amounts of biological data that await examination. In order to “mine” such data effectively, a bioinformatics approach can be utilized to identify genes of interest, subject them to various in silico analyses, and extract relevant biological information on them from various public databases. Examination of such data produces novel insights with respect to the genes in question and can be used to facilitate and guide further research in the field. Such is the case here, where bioinformatics tools were developed to identify, classify, and analyze members of the Hyp-rich glycoprotein (HRGP) superfamily encoded by the Arabidopsis (Arabidopsis thaliana) genome.
HRGPs are a superfamily of plant cell wall proteins that are subdivided into three families, arabinogalactan proteins (AGPs), extensins (EXTs), and Pro-rich proteins (PRPs), and have been extensively reviewed (Showalter, 1993; Kieliszewski and Lamport, 1994; Nothnagel, 1997; Cassab, 1998; José-Estanyol and Puigdomènech, 2000; Seifert and Roberts, 2007). However, it has become increasingly clear that the HRGP superfamily is perhaps better represented as a spectrum of molecules ranging from the highly glycosylated AGPs to the moderately glycosylated EXTs and finally to the lightly glycosylated PRPs. Moreover, hybrid HRGPs, composed of HRGP modules from different families, and chimeric HRGPs, composed of one or more HRGP modules within a non-HRGP protein, also can be considered part of the HRGP superfamily. Given that many HRGPs are composed of repetitive protein sequences, particularly the EXTs and PRPs, and many have low sequence similarity to one another, particularly the AGPs, BLAST searches typically identify only a few closely related family members and do not represent a particularly effective means to identify members of the HRGP superfamily in a comprehensive manner.
Building upon the work of Schultz et al. (2002) that focused on the AGP family, a new bioinformatics software program, BIO OHIO, developed at Ohio University, makes it possible to search all 28,952 proteins encoded by the Arabidopsis genome and identify putative HRGP genes. Two distinct types of searches are possible with this program. First, the program can search for biased amino acid compositions in the genome-encoded protein sequences. For example, classical AGPs can be identified by their biased amino acid compositions of greater than 50% Pro (P), Ala (A), Ser (S), and Thr (T), as indicated by greater than 50% PAST. Similarly, arabinogalactan peptides (AG peptides) are identified by biased amino acid compositions of greater than 35% PAST, but the protein (i.e. peptide) must also be between 50 and 90 amino acids in length. Likewise, PRPs can be identified by a biased amino acid composition of greater than 45% PVKCYT. Second, the program can search for specific amino acid motifs that are commonly found in known HRGPs. For example, SP4 pentapeptide and SP3 tetrapeptide motifs are associated with EXTs, a fasciclin H1 motif is found in fasciclin-like AGPs (FLAs), and PPVX(K/T) (where X is any amino acid) and KKPCPP motifs are found in several known PRPs (Fowler et al., 1999). In addition to searching for HRGPs, the program can analyze proteins identified by a search. For example, the program checks for potential signal peptide sequences and glycosylphosphatidylinositol (GPI) plasma membrane anchor addition sequences, both of which are associated with HRGPs (Showalter, 1993, 2001; Youl et al., 1998; Sherrier et al., 1999; Svetek et al., 1999). Moreover, the program can identify repeated amino acid sequences within the sequence and has the ability to search for biased amino acid compositions within a sliding window of user-defined size, making it possible to identify HRGP domains within a protein sequence.
Here, we report on the use of this bioinformatics program in identifying, classifying, and analyzing members of the HRGP superfamily (i.e. AGPs, EXTs, PRPs, hybrid HRGPs, and chimeric HRGPs) in the genetic model plant Arabidopsis. An overview of this bioinformatics approach is presented in Figure 1. In addition, public databases and programs were accessed and utilized to extract relevant biological information on these HRGPs in terms of their expression patterns, most similar sequences via BLAST analysis, available genetic mutants, and coexpressed HRGP, glycosyl transferase (GT), prolyl 4-hydroxylase (P4H), and peroxidase genes in Arabidopsis. This information provides new insight into the HRGP superfamily and can be used by researchers to facilitate and guide further research in the field. Moreover, the bioinformatics tools developed here can be readily applied to protein sequences from other species to analyze their HRGPs or, for that matter, any given protein family by altering the input parameters.
Figure 1. Bioinformatics workflow diagram summarizing the identification, classification, and analysis of HRGPs (AGPs, EXTs, and PRPs) in Arabidopsis. Classical AGPs were defined as containing greater than 50% PAST coupled with the presence of AP, PA, SP, and TP repeats distributed throughout the protein, Lys-rich AGPs were a subgroup of classical AGPs that included a Lys-rich domain, and chimeric AGPs were defined as containing greater than 50% PAST coupled with the localized distribution of AP, PA, SP, and TP repeats. AG peptides were defined to be 50 to 90 amino acids in length and containing greater than 35% PAST coupled with the presence of AP, PA, SP, and TP repeats distributed throughout the peptide. FLAs were defined as having a fasciclin domain coupled with the localized distribution of AP, PA, SP, and TP repeats. Extensins were defined as containing two or more SP3 or SP4 repeats coupled with the distribution of such repeats throughout the protein; chimeric extensins were similarly identified but were distinguished from the extensins by the localized distribution of such repeats in the protein; and short extensins were defined to be less than 200 amino acids in length coupled with the extensin definition. PRPs were identified as containing greater than 45% PVKCYT or two or more KKPCPP or PVX(K/T) repeats coupled with the distribution of such repeats and/or PPV throughout the protein. Chimeric PRPs were similarly identified but were distinguished from PRPs by the localized distribution of such repeats in the protein. Hybrid HRGPs (i.e. AGP/EXT hybrids) were defined as containing two or more repeat units used to identify AGPs, extensins, or PRPs. The presence of a signal peptide was used to provide added support for the identification of an HRGP but was not used in an absolute fashion. Similarly, the presence of a GPI anchor addition sequence was used to provide added support for the identification of classical AGPs and AG peptides, which are known to contain such sequences. BLAST searches were also used to provide some support to our classification if the query sequence showed similarity to other members of an HRGP subfamily. Note that some AGPs, particularly chimeric AGPs, and PRPs were identified from an Arabidopsis database annotation search and that two chimeric extensins were identified from the primary literature as noted in the text.
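The composition-based searches described in this excerpt can be illustrated with a short sketch. It is not the BIO OHIO program: the function names are hypothetical, only the percentage and length thresholds are taken from the text, and the repeat, motif, signal peptide, and GPI anchor checks that the authors combine with composition are omitted.

```python
# Illustrative sketch only. Thresholds (50% PAST, 35% PAST with a length of
# 50 to 90 residues, 45% PVKCYT) come from the text; all names are assumed.

def composition(seq: str, residues: str) -> float:
    """Fraction of positions in `seq` occupied by any residue in `residues`."""
    if not seq:
        return 0.0
    wanted = set(residues)
    return sum(1 for aa in seq if aa in wanted) / len(seq)


def classify_by_composition(seq: str) -> list:
    """Return the candidate HRGP classes suggested by amino acid bias alone."""
    hits = []
    past = composition(seq, "PAST")
    if past > 0.50:
        hits.append("classical AGP candidate")
    if past > 0.35 and 50 <= len(seq) <= 90:
        hits.append("AG peptide candidate")
    if composition(seq, "PVKCYT") > 0.45:
        hits.append("PRP candidate")
    return hits


def biased_windows(seq: str, residues: str, window: int, threshold: float) -> list:
    """Start indices of windows of size `window` whose bias exceeds `threshold`."""
    return [
        start
        for start in range(len(seq) - window + 1)
        if composition(seq[start:start + window], residues) > threshold
    ]
```

Because P and T count toward both PAST and PVKCYT, a single sequence can satisfy more than one composition rule, which suggests why the workflow also relies on repeat distribution and motifs to separate the classes.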

11.
Arabinogalactan proteins (AGPs) are extracellular proteoglycans implicated in plant growth and development. We searched for classical AGPs in Arabidopsis by identifying expressed sequence tags based on the conserved domain structure of the predicted protein backbone. To confirm that these genes encoded bona fide AGPs, we purified native AGPs and then deglycosylated and deblocked them for N-terminal protein sequencing. In total, we identified 15 genes encoding the protein backbones of classical AGPs, including genes for AG peptides, which are AGPs with very short backbones (10 to 13 amino acid residues). Seven of the AGPs were verified as AGPs by protein sequencing. A gene encoding a putative cell adhesion molecule with AGP-like domains was also identified. This work provides a firm foundation for beginning functional analysis by using a genetic approach.

12.
13.
Large-scale analysis of the GRAS gene family in Arabidopsis thaliana   Total citations: 2, self-citations: 0, citations by others: 2

14.
NIMA-related kinases (Neks) are a family of serine/threonine kinases that have been linked to cell-cycle regulation in fungi and mammals. Information regarding the function of Neks in plants is very limited. We screened the three plant species that have had their genomes sequenced in an attempt to improve our understanding of their role in plants. We retrieved seven members in Arabidopsis thaliana, nine in Populus trichocarpa and six in Oryza sativa. Phylogenetic analysis showed that plant Neks are closely related to each other and contain paralogous genes. Moreover, their chromosome distribution and their exon-intron structure revealed that the present-day plant Nek family was derived from a single representative followed by large segmental duplication events. Functional expression analyses in the three species relied on RT-qPCR in poplar and publicly available microarray data for Arabidopsis and rice. Although plant Neks are present in every organ analyzed, their expression profiles suggest their involvement in plant development processes. Furthermore, we showed that PNek1, a member of the poplar family, is expressed at sites of free auxin synthesis and is specifically involved during the vascularization process.

15.
The plant cell wall is of supermolecular architecture, and is composed of various types of heterogeneous polymers. A few thousand enzymes and structural proteins are directly involved in the construction processes, and in the functional aspects of the dynamic architecture in Arabidopsis thaliana. Most of these proteins are encoded by multigene families, and most members within each family share significant similarities in structural features, but often exhibit differing expression profiles and physiological functions. Thus, for the molecular dissection of cell wall dynamics, it is necessary to distinguish individual members within a family of proteins. As a first step towards characterizing the processes involved in cell wall dynamics, we have manufactured a gene-specific 70-mer oligo microarray that consists of 765 genes classified into 30 putative families of proteins that are implicated in the cell wall dynamics of Arabidopsis. By using this array system, we identified several sets of genes that exhibit organ preferential expression profiles. We also identified gene sets that are expressed differentially at certain specific growth stages of the Arabidopsis inflorescence stem. Our results indicate that there is a division of roles among family members within each of the putative cell wall-related gene families.

16.
BACKGROUND: The fasciclin-like arabinogalactan-proteins (FLAs) are an enigmatic class of 21 members within the larger family of arabinogalactan-proteins (AGPs) in Arabidopsis thaliana. Located at the cell surface, in the cell wall/plasma membrane, they are implicated in many developmental roles yet their function remains largely undefined. Fasciclin (FAS) domains are putative cell-adhesion domains found in extracellular matrix proteins of organisms from all kingdoms, but the juxtaposition of FAS domains with highly glycosylated AGP domains is unique to plants. Recent studies have started to elucidate the role of FLAs in Arabidopsis development. FLAs containing a single FAS domain are important for the integrity and elasticity of the plant cell wall matrix (FLA11 and FLA12) and FLA3 is involved in microspore development. FLA4/SOS5 with two FAS domains and two AGP domains has a role in maintaining proper cell expansion under salt stressed conditions. The role of other FLAs remains to be uncovered. METHOD/PRINCIPAL FINDINGS: Here we describe the characterisation of a T-DNA insertion mutant in the FLA1 gene (At5g55730). Under standard growth conditions fla1-1 mutants have no obvious phenotype. Based on gene expression studies, a putative role for FLA1 in callus induction was investigated and revealed that fla1-1 has a reduced ability to regenerate shoots in an in vitro shoot-induction assay. Analysis of FLA1p:GUS reporter lines shows that FLA1 is expressed in several tissues including stomata, trichomes, the vasculature of leaves, the primary root tip and in lateral roots near the junction of the primary root. CONCLUSION: The results of the developmental expression of FLA1 and characterisation of the fla1 mutant support a role for FLA1 in the early events of lateral root development and shoot development in tissue culture, prior to cell-type specification.

17.
Identifying genes involved in complex neuropsychiatric disorders through classic human genetic approaches has proven difficult. To overcome that barrier, we have developed a translational approach called Convergent Functional Genomics (CFG), which cross-matches animal model microarray gene expression data with human genetic linkage data as well as human postmortem brain data and biological role data, as a Bayesian way of cross-validating findings and reducing uncertainty. Our approach produces a short list of high probability candidate genes out of the hundreds of genes changed in microarray datasets and the hundreds of genes present in a linkage peak chromosomal area. These genes can then be prioritized, pursued, and validated in an individual fashion using: (1) human candidate gene association studies and (2) cell culture and mouse transgenic models. Further bioinformatics analysis of groups of genes identified through CFG leads to insights into pathways and mechanisms that may be involved in the pathophysiology of the illness studied. This simple but powerful approach is likely generalizable to other complex, non-neuropsychiatric disorders, for which good animal models, as well as good human genetic linkage datasets and human target tissue gene expression datasets exist.

18.
The comparison of gene expression profiles among DNA microarray experiments enables the identification of unknown relationships among experiments to uncover the underlying biological relationships. Despite the ongoing accumulation of data in public databases, detecting biological correlations among gene expression profiles from multiple laboratories on a large scale remains difficult. Here, we applied a module (sets of genes working in the same biological action)-based correlation analysis in combination with a network analysis to Arabidopsis data and developed a 'module-based correlation network' (MCN) which represents relationships among DNA microarray experiments on a large scale. We developed a Web-based data analysis tool, 'AtCAST' (Arabidopsis thaliana: DNA Microarray Correlation Analysis Tool), which enables browsing of an MCN or mining of users' microarray data by mapping the data into an MCN. AtCAST can help researchers to find novel connections among DNA microarray experiments, which in turn will help to build new hypotheses to uncover physiological mechanisms or gene functions in Arabidopsis.

19.
Summary: In recent times, new members of the insulin/relaxin peptide superfamily have been identified by both differential cloning strategies and bioinformatic searching of the EST databases. We have used the public and Celera Genomics databases to search for novel members of this peptide family. No new members of the insulin/relaxin family were identified, although the human (H3) and mouse (M3) relaxin 3 genes that we recently discovered in the Celera Genomics database were identified in the public database. We were able to confirm that there are no mouse equivalents of human INSL-4 or human gene 1 relaxin. Hence, as the two human relaxin genes (H1 and H2) are localized together with INSL6 and INSL4 on chromosome 9, it is probable that INSL4 and H1 relaxin are the result of a gene duplication which did not occur in non-primates. The discovery of a full relaxin 3 sequence in a new Zebrafish brain EST library, which retains a high homology in both A and B chain peptide sequence with the H3 peptide, indicates that this novel peptide has important conserved functions.

20.