Similar articles
20 similar articles found
1.
S Engen. Biometrics 1975, 31(1): 201-208
A taxonomic group will frequently have a large number of species with small abundances. When a sample is drawn at random from this group, one is therefore faced with the problem that a large proportion of the species will not be discovered. A general definition of quantitative measures of "sample coverage" is proposed, and the problem of statistical inference is considered for two special cases, (1) the actual total relative abundance of those species that are represented in the sample, and (2) their relative contribution to the information index of diversity. The analysis is based on an extended version of the negative binomial species frequency model. The results are tabulated.
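The two coverage measures named in this abstract can be illustrated with a small sketch. It assumes the true abundances are known (which is exactly what is not available in practice, hence the inference problem), and it takes the "information index of diversity" to be the Shannon index; the community, the species names, and both function names are hypothetical.

```python
import math
import random

def sample_coverage(abundances, sample):
    """Total relative abundance of the species that appear in the sample."""
    total = sum(abundances.values())
    return sum(abundances[s] for s in set(sample)) / total

def information_coverage(abundances, sample):
    """Share of the Shannon index contributed by the observed species.

    Assumption: the information index is H = -sum(p * ln p).
    """
    total = sum(abundances.values())

    def term(count):
        p = count / total
        return -p * math.log(p)

    h_total = sum(term(n) for n in abundances.values())
    h_observed = sum(term(abundances[s]) for s in set(sample))
    return h_observed / h_total

# Hypothetical community: three common species and ten rare ones.
abundances = {"sp_a": 50, "sp_b": 30, "sp_c": 10}
abundances.update({f"rare_{i}": 1 for i in range(10)})

# Draw 20 individuals at random, in proportion to abundance.
rng = random.Random(0)
species = list(abundances)
weights = [abundances[s] for s in species]
sample = rng.choices(species, weights=weights, k=20)

coverage = sample_coverage(abundances, sample)
info = information_coverage(abundances, sample)
print(f"{len(set(sample))} of {len(species)} species observed")
print(f"abundance coverage = {coverage:.2f}, information coverage = {info:.2f}")
```

Because rare species carry little abundance but a comparatively larger share of the diversity index, the two measures will generally differ for the same sample.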

2.
MOTIVATION: Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied for gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering. RESULTS: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a new distance metric, which shrinks a gene expression-based distance towards 0 if and only if the two genes share a common gene function. A two-step procedure is used. First, the shrinkage distance metric is used in any distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the clustering results from the first step for the genes with known functions, the expression-based distance metric is used to cluster the remaining genes of unknown function, assigning each of them to either one of the clusters obtained in the first step or some new clusters. A simulation study and an application to gene function prediction for the yeast demonstrate the advantage of our proposal over the standard method.
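A minimal sketch of the shrinkage idea described above, not the authors' exact metric: it assumes Euclidean distance between expression profiles and a single hypothetical shrinkage factor `shrink` in [0, 1] that is applied only when two genes share at least one annotated function.

```python
import math

def expression_distance(x, y):
    """Euclidean distance between two expression profiles (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def shrinkage_distance(x, y, functions_x, functions_y, shrink=0.5):
    """Shrink the expression distance towards 0 only for genes sharing a function.

    `shrink` is a hypothetical tuning parameter: 0 leaves the distance
    unchanged, 1 collapses it to 0.
    """
    d = expression_distance(x, y)
    if set(functions_x) & set(functions_y):
        return (1.0 - shrink) * d
    return d

# Hypothetical profiles and annotations.
gene1, gene2 = [0.0, 0.0], [3.0, 4.0]
shared = shrinkage_distance(gene1, gene2, {"transport"}, {"transport", "kinase"})
unshared = shrinkage_distance(gene1, gene2, {"transport"}, {"kinase"})
print(shared, unshared)  # 2.5 5.0
```

The resulting pairwise distances could then be passed to any distance-based clustering routine for the first step of the two-step procedure.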

3.
S M Snapinn, J D Knoke. Biometrics 1989, 45(1): 289-299
Accurate estimation of misclassification rates in discriminant analysis with selection of variables by, for example, a stepwise algorithm, is complicated by the large optimistic bias inherent in standard estimators such as those obtained by the resubstitution method. Application of a bootstrap adjustment can reduce the bias of the resubstitution method; however, the bootstrap technique requires the variable selection procedure to be repeated many times and is therefore difficult to compute. In this paper we propose a smoothed estimator that requires relatively little computation and which, on the basis of a Monte Carlo sampling study, is found to perform generally at least as well as the bootstrap method.

4.
With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.

5.
Fourier-transform infrared (FT-IR) microspectroscopy was used in this study to identify yeasts. Cells were grown to microcolonies of 70 to 250 µm in diameter and transferred from the agar plate by replica stamping to an IR-transparent ZnSe carrier. IR spectra of the replicas on the carrier were recorded using an IR microscope coupled to an IR spectrometer, and identification was performed by comparison to reference spectra. The method was tested by using small model libraries comprising reference spectra of 45 strains from 9 genera and 13 species, recorded with both FT-IR microspectroscopy and FT-IR macrospectroscopy. The results show that identification by FT-IR microspectroscopy is equivalent to that achieved by FT-IR macrospectroscopy but the time-consuming isolation of the organisms prior to identification is not necessary. Therefore, this method also provides a rapid tool to analyze mixed populations. Furthermore, identification of 21 Debaryomyces hansenii and 9 Saccharomyces cerevisiae strains resulted in 92% correct identification at the strain level for S. cerevisiae and 91% for D. hansenii, which demonstrates that the resolution power of FT-IR microspectroscopy may also be used for yeast typing at the strain level.

6.
MOTIVATION: Classification of biological samples for diagnostic purposes is a difficult task because of the many decisions involved on the number, type and functional manipulations of the input variables. This study presents a generally applicable strategy for systematic formulation of optimal diagnostic indexes. To this end, we develop a novel set of computational tools by integrating regression optimization, stepwise variable selection and cross-validation algorithms. RESULTS: The proposed discrimination methodology was applied to plasma and tissue (liver) metabolic profiling data describing the time progression of liver dysfunction in a rat model of acute hepatic failure generated by d-galactosamine (GalN) injection. From the plasma data, our methodology identified seven (out of a total of 23) metabolites, and the corresponding transform functions, as the best inputs to the optimal diagnostic index. This index showed better time resolution and increased noise robustness compared with an existing metabolic index, Fischer's BCAA/AAA molar ratio, as well as indexes generated using other commonly used discriminant analysis tools. Comparison of plasma and liver indexes found two consensus metabolites, lactate and glucose, which implicate glycolysis and/or gluconeogenesis in mediating the metabolic effects of GalN.

7.
BACKGROUND: Analytical flow cytometry (AFC) provides rapid and accurate measurement of particles from heterogeneous populations. AFC has been used to classify and identify phytoplankton species, but most methods of discriminant analysis of resulting data have depended on normality assumptions and outcomes have been disappointing. METHODS AND RESULTS: In this study, we consider nonparametric methods based on density estimation. In addition to the familiar kernel method, methods based on wavelets are also implemented. Full five-dimensional wavelet estimation proves to be computationally prohibitive with current workstation power, so we employ projection pursuit for reduction of dimensionality. AFC typically produces very large samples, so we also investigate data simplification through binning. Further modifications to the discrimination strategy are suggested by specific features of phytoplankton data, namely, a hierarchical group structure, the possible presence of many groups, and the likelihood of encountering an aberrant group in a test sample. CONCLUSIONS: We apply all the resultant procedures to appropriate subsets of a very large data set, demonstrate their efficacy, and compare their error rates with those of more conventional methods. We further show that incorporation of the specific features of phytoplankton data into the analysis leads to improved results and provides a general framework for analysis of such data.

8.
9.
Biclustering algorithms for biological data analysis: a survey   Total citations: 7, self-citations: 0, citations by others: 7
A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.

10.

Background  

Cluster analysis is an important technique for the exploratory analysis of biological data. Such data is often high-dimensional, inherently noisy and contains outliers. This makes clustering challenging. Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications.

11.
Although it is clear that osteoporosis is associated with a reduction in bone mass and a fragile skeleton, it is not understood whether the chemical composition of osteoporotic bone is different from normal bone. In this study, cynomolgus monkeys (Macaca fascicularis) were administered fluorochrome labels at one and two years after ovariectomy (Ovx) or Sham ovariectomy (intact), that were taken up into newly remodeled bone. Using fluorescence-assisted synchrotron infrared microspectroscopy, the chemical composition of bone from intact versus Ovx monkeys has been compared. Results from overall composition distributions (labeled + non-labeled bone) reveal similar carbonate/protein and phosphate/protein ratios, but increased acid phosphate content and different collagen structure in the Ovx animals. Analysis of the fluorochrome-labeled bone indicates similar degrees of mineralization in bone remodeled after one year, but decreased mineralization in Ovx bone remodeled two years after surgery. Thus, bone from monkeys with osteoporosis can be characterized as having abnormal collagen structure and reduced rates of mineralization. Coupled with factors such as trabecular architecture and bone shape and size, these ultrastructural factors may play a contributing role in the increased bone fragility in osteoporosis.

12.
MOTIVATION: The public web-based biological database infrastructure is a source of both wonder and worry. Users delight in the ever increasing amounts of information available; database administrators and curators worry about long-term financial support. An earlier study of 153 biological databases (Ellis and Kalumbi, Nature Biotechnol., 16, 1323-1324, 1998) determined that near future (1-5 year) funding for over two-thirds of them was uncertain. More detailed data are required to determine the magnitude of the problem and offer possible solutions. METHODS: This study examines the finances and use statistics of a few of these organizations in more depth, and reviews several economic models that may help sustain them. RESULTS: Six organizations were studied. Their administrative overhead is fairly low; non-administrative personnel and computer-related costs account for 77% of expenses. One smaller, more specialized US database, in 1997, had 60% of total access from US domains; a majority (56%) of its US accesses came from commercial domains, although only 2% of the 153 databases originally studied received any industrial support. The most popular model used to gain industrial support is asymmetric pricing: preferentially charging the commercial users of a database. At least five biological databases have recently begun using this model. Advertising is another model which may be useful for the more general, more heavily used sites. Microcommerce has promise, especially for databases that do not attract advertisers, but needs further testing. The least income reported for any of the databases studied was $50,000/year; applying this rate to 400 biological databases (a lower limit of the number of such databases, many of which require far larger resources) would mean annual support need of at least $20 million. To obtain this level of support is challenging, yet failure to accept the challenge could be catastrophic. CONTACT: lynda@tc.umn.edu

13.
Fourier transform infrared (FT-IR) microspectroscopy is a powerful technique that can be used to collect infrared spectra from microscopic regions of tissue sections. The infrared spectra are evaluated to chemically characterize the absorbing molecules. This technique can be applied to normal or diseased tissues. In the latter case, FT-IR microspectroscopy can reveal chemical changes that are associated with discrete regions of lesion sites, which can provide insights into the chemical mechanisms of disease processes. In the present study, FT-IR microspectroscopy was used to analyze sections of retina from normal (pigmented) and albino rats. The outer segments of retinas from pigmented animals were found to have unusually strong absorption values for C=C-H unsaturation and carbonyl functional groups. Docosahexaenoic acid (DHA), a major constituent of lipids in the outer segments, also had particularly high absorption values for these functional groups, which suggests that it is responsible for those enhanced absorption values. Absorbance values for the unsaturation and carbonyl functional groups were substantially reduced in the outer segments of retinas from albino animals. This finding, together with data from other studies on light-induced oxidative events in the retina, indicates a loss of DHA by a light-induced mechanism in albino animals. The outer nuclear layer had strong absorbance values for H-C-OH and P=O functional groups, which is likely due to the sugar phosphate backbone of DNA. The outer and inner plexiform layers were found to contain greater concentrations of CH2 and C=O functional groups than the outer and inner nuclear layers, which is due to the high concentration of synaptic connections in the former layers. In summary, FT-IR microspectroscopy revealed a unique chemical profile in the outer segments compared to other retinal layers, and this profile was altered in albino animals.

14.
An infrared (ir) method to determine the secondary structure of proteins in solution using the amide I region of the spectrum has been devised. The method is based on the circular dichroism (CD) matrix method for secondary structure analysis given by Compton and Johnson (L. A. Compton and W. C. Johnson, 1986, Anal. Biochem. 155, 155-167). The infrared data matrix was constructed from the normalized Fourier transform infrared spectra from 1700 to 1600 cm⁻¹ of 17 commercially available proteins. The secondary structure matrix was constructed from the X-ray data of the seventeen proteins with secondary structure elements of helix, beta-sheet, beta-turn, and other (random). The CD and ir methods were compared by analyzing the proteins of the CD and ir databases as unknowns. Both methods produce similar results compared to structures obtained by X-ray crystallographic means with the CD slightly better for helix conformation, and the ir slightly better for beta-sheet. The relatively good ir analysis for concanavalin A and alpha-chymotrypsin indicates that the ir method is less affected by the presence of aromatic groups. The concentration of the protein and the cell path length need not be known for the ir analysis since the spectra can be normalized to the total ir intensity in the amide I region. The ir spectra for helix, beta-sheet, beta-turn, and other, as extracted from the data-base, agree with the literature band assignments. The ir data matrix and the inverse matrix necessary to analyze unknown proteins are presented.
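The remark that concentration and path length need not be known can be checked with a short sketch. Under the Beer-Lambert law, absorbance scales linearly with concentration times path length, so dividing each point by the total amide I intensity cancels that factor. The absorbance values and the function name below are hypothetical, and the sketch covers only the normalization step, not the matrix analysis itself.

```python
import math

def normalize_to_total_intensity(absorbances):
    """Divide each absorbance by the summed intensity over the region."""
    total = sum(absorbances)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    return [a / total for a in absorbances]

# Hypothetical amide I absorbances sampled between 1700 and 1600 cm^-1.
spectrum = [0.10, 0.25, 0.40, 0.20, 0.05]

# The same protein measured at three times the concentration x path length.
scaled_spectrum = [3.0 * a for a in spectrum]

normalized = normalize_to_total_intensity(spectrum)
normalized_scaled = normalize_to_total_intensity(scaled_spectrum)

same_shape = all(
    math.isclose(a, b) for a, b in zip(normalized, normalized_scaled)
)
print(same_shape)  # True
```

Only the band shape survives normalization, which is the quantity the secondary structure analysis relies on.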

15.
Identifying the major routes of disease transmission and reservoirs of infection is needed to increase our understanding of disease dynamics and improve disease control. Despite this, transmission events are rarely observed directly. Here we had the unique opportunity to study natural transmission of Bordetella bronchiseptica--a directly transmitted respiratory pathogen with a wide mammalian host range, including sporadic infection of humans--within a commercial rabbitry to evaluate the relative effects of sex and age on the transmission dynamics therein. We did this by developing an a priori set of hypotheses outlining how natural B. bronchiseptica infections may be transmitted between rabbits. We discriminated between these hypotheses by using force-of-infection estimates coupled with random effects binomial regression analysis of B. bronchiseptica age-prevalence data from within our rabbit population. Force-of-infection analysis allowed us to quantify the apparent prevalence of B. bronchiseptica while correcting for age structure. To determine whether transmission is largely within social groups (in this case litter), or from an external group, we used random-effect binomial regression to evaluate the importance of social mixing in disease spread. Between these two approaches our results support young weanlings--as opposed to, for example, breeder or maternal cohorts--as the age cohort primarily responsible for B. bronchiseptica transmission. Thus age-prevalence data, which is relatively easy to gather in clinical or agricultural settings, can be used to evaluate contact patterns and infer the likely age-cohort responsible for transmission of directly transmitted infections. These insights shed light on the dynamics of disease spread and allow an assessment to be made of the best methods for effective long-term disease control.

16.

Background

As an alternative to the frequently used "reference design" for two-channel microarrays, other designs have been proposed. These designs have been shown to be more profitable from a theoretical point of view (more replicates of the conditions of interest for the same number of arrays). However, the interpretation of the measurements is less straightforward and a reconstruction method is needed to convert the observed ratios into the genuine profile of interest (e.g. a time profile). The potential advantages of using these alternative designs thus largely depend on the success of the profile reconstruction. Therefore, we compared to what extent different linear models agree with each other in reconstructing expression ratios and corresponding time profiles from a complex design.

Results

On average the correlation between the estimated ratios was high, and all methods agreed with each other in predicting the same profile, especially for genes of which the expression profile showed a large variance across the different time points. Assessing the similarity in profile shape, it appears that, the more similar the underlying principles of the methods (model and input data), the more similar their results. Methods with a dye effect seemed more robust against array failure. The influence of a different normalization was not drastic and independent of the method used.

Conclusion

Including a dye effect such as in the methods lmbr_dye, anovaFix and anovaMix compensates for residual dye related inconsistencies in the data and renders the results more robust against array failure. Including random effects requires more parameters to be estimated and is only advised when a design is used with a sufficient number of replicates. Because of this, we believe lmbr_dye, anovaFix and anovaMix are most appropriate for practical use.

17.
MOTIVATION: Discriminant analysis is an effective tool for the classification of experimental units into groups. Here, we consider the typical problem of classifying subjects according to phenotypes via gene expression data and propose a method that incorporates variable selection into the inferential procedure, for the identification of the important biomarkers. To achieve this goal, we build upon a conjugate normal discriminant model, both linear and quadratic, and include a stochastic search variable selection procedure via an MCMC algorithm. Furthermore, we incorporate into the model prior information on the relationships among the genes as described by a gene-gene network. We use a Markov random field (MRF) prior to map the network connections among genes. Our prior model assumes that neighboring genes in the network are more likely to have a joint effect on the relevant biological processes. RESULTS: We use simulated data to assess performances of our method. In particular, we compare the MRF prior to a situation where independent Bernoulli priors are chosen for the individual predictors. We also illustrate the method on benchmark datasets for gene expression. Our simulation studies show that employing the MRF prior improves on selection accuracy. In real data applications, in addition to identifying markers and improving prediction accuracy, we show how the integration of existing biological knowledge into the prior model results in an increased ability to identify genes with strong discriminatory power and also aids the interpretation of the results.

18.
19.
Fragmentary human remains compromised by different types of inhumation, or physical insults such as explosions, fires, and mutilations may frustrate the use of traditional morphognostic sex determination methods. The basicranium is protected by a large soft tissue mass comprising muscle, tendon, and ligaments. As such, the occipital region may prove useful for sex identification in cases of significantly fragmented remains. The aims of this paper are to (1) evaluate sexual dimorphism in British cranial bases by manually recorded unilateral and bilateral condylar length and width as well as intercondylar measurements and (2) develop discriminant functions for sex determination for this cranial sample. The crania selected for this study are part of the 18th-19th century documented skeletal collection of St. Bride's Church, Fleet Street, London. Adult human skulls (n = 146; 75 male, 71 female) were measured to derive statistical functions. Results indicated that expression of sexual dimorphism in the occipital condylar region within the St. Bride's population is demonstrable but low. Crossvalidated classification accuracy ranged between 69.2 and 76.7%, and sex bias ranged from 0.3 to 9.7%. Therefore, the use of discriminant functions derived from occipital condyles, especially in British skeletal populations, should only be considered in cases of fragmented cranial bases when no other morphognostic or morphometric method can be utilized for sex determination.
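The two summary statistics reported above can be sketched as follows. The abstract does not define sex bias; this sketch assumes the common convention of the male correct-classification rate minus the female rate, in percentage points. The labels are invented for illustration and are not the St. Bride's data.

```python
def accuracy_and_sex_bias(true_sex, predicted_sex):
    """Return (overall accuracy %, sex bias in percentage points).

    Assumption: sex bias = % of males correctly classified minus
    % of females correctly classified.
    """
    totals = {"M": 0, "F": 0}
    correct = {"M": 0, "F": 0}
    for truth, guess in zip(true_sex, predicted_sex):
        totals[truth] += 1
        if truth == guess:
            correct[truth] += 1

    overall = 100.0 * (correct["M"] + correct["F"]) / (totals["M"] + totals["F"])
    male_rate = 100.0 * correct["M"] / totals["M"]
    female_rate = 100.0 * correct["F"] / totals["F"]
    return overall, male_rate - female_rate

# Hypothetical cross-validated predictions for eight skulls.
true_sex = ["M", "M", "M", "M", "F", "F", "F", "F"]
predicted_sex = ["M", "M", "M", "F", "F", "F", "M", "M"]

overall, bias = accuracy_and_sex_bias(true_sex, predicted_sex)
print(overall, bias)  # 62.5 25.0
```

A bias near zero means the function misclassifies males and females at similar rates, which matters as much as overall accuracy when the method is applied to unknown remains.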

20.
One important issue commonly encountered in the analysis of microarray data is to decide which and how many genes should be selected for further studies. For discriminant microarray data analyses based on statistical models, such as the logistic regression models, gene selection can be accomplished by a comparison of the maximum likelihood of the model given the real data, L(D|M), and the expected maximum likelihood of the model given an ensemble of surrogate data with randomly permuted label, L(D(0)|M). Typically, the computational burden for obtaining L(D(0)|M) is immense, often exceeding the limits of available computing resources by orders of magnitude. Here, we propose an approach that circumvents such heavy computations by mapping the simulation problem to an extreme-value problem. We present the derivation of an asymptotic distribution of the extreme-value as well as its mean, median, and variance. Using this distribution, we propose two gene selection criteria, and we apply them to two microarray datasets and three classification tasks for illustration.
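For orientation, the brute-force comparison that the abstract calls computationally immense can be sketched on toy data. This is not the authors' extreme-value approximation. It assumes each gene is dichotomized to 0/1, in which case a single-gene logistic regression with an intercept reproduces the class proportions within each expression level, so its maximized log-likelihood has a closed form. The data and function names are hypothetical.

```python
import math
import random

def max_loglik_binary_gene(gene, labels):
    """Maximized log-likelihood of a logistic model with one 0/1 predictor.

    With a binary predictor the fitted probabilities equal the observed
    class proportions in each predictor level.
    """
    loglik = 0.0
    for level in (0, 1):
        group = [y for g, y in zip(gene, labels) if g == level]
        n, k = len(group), sum(group)
        for count in (k, n - k):
            if count:
                loglik += count * math.log(count / n)
    return loglik

def best_loglik(genes, labels):
    """Largest single-gene maximized log-likelihood across all genes."""
    return max(max_loglik_binary_gene(g, labels) for g in genes)

def expected_surrogate_loglik(genes, labels, n_perm=200, seed=0):
    """Mean of best_loglik over surrogate data with permuted labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += best_loglik(genes, shuffled)
    return total / n_perm

# Hypothetical data: 12 samples, one informative gene, two noise genes.
labels = [0] * 6 + [1] * 6
genes = [
    [0] * 6 + [1] * 6,                      # tracks the labels exactly
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],   # unrelated to the labels
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1],   # unrelated to the labels
]

real = best_loglik(genes, labels)
surrogate = expected_surrogate_loglik(genes, labels)
print(f"log L(D|M) = {real:.3f}, mean log L(D(0)|M) = {surrogate:.3f}")
```

Even in this toy setting the surrogate estimate requires refitting every gene for every permutation, which is the cost the proposed extreme-value mapping is meant to avoid.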
