Dongwan HONG Jeehee YOON Jongkeun LEE Sanghyun PARK Jongil KIM
By converting the expression values of each sample into the corresponding rank values, the rank-based approach enables the direct integration of multiple microarray datasets produced by different laboratories and/or with different techniques. In this study, we verify through statistical analysis and experiments that informative genes can be extracted from multiple microarray datasets integrated by the rank-based approach (hereafter, integrated rank-based microarray data). First, after showing that a nonparametric technique can be used effectively as a scoring metric for rank-based microarray data, we prove that the scoring results from integrated rank-based microarray data are statistically significant. Next, through experimental comparisons, we show that the informative genes from integrated rank-based microarray data are statistically more significant than those from single-microarray data. In addition, by comparing the lists of informative genes extracted from experimental data, we show that the rank-based data integration method extracts more significant genes than either the z-score-based normalization technique or the rank products technique. Public cancer microarray data were used in our experiments, and the list of marker genes from the CGAP database was used for comparison with the extracted genes. The GO database and the GSEA method were also used to analyze the functions of the extracted genes.
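As a minimal sketch of the idea, the Python below rank-transforms each sample independently, pools the rank vectors from several datasets, and scores each gene with a Mann-Whitney U (Wilcoxon rank-sum) statistic. The abstract does not name its nonparametric scoring metric, so the rank-sum statistic, the |AUC - 0.5| score, the toy data, and all function names here are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def to_ranks(values: Sequence[float]) -> List[float]:
    """Rank one sample's expression values (1 = lowest); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def integrate_rank_based(datasets: List[List[List[float]]]) -> List[List[float]]:
    """Rank-transform every sample of every dataset and pool the samples.

    Assumes each sample lists values for the same genes in the same order.
    """
    return [to_ranks(sample) for dataset in datasets for sample in dataset]


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """U statistic of group_a versus group_b: a win counts 1, a tie counts 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def score_genes(samples: List[List[float]], labels: Sequence[int]) -> List[float]:
    """Score each gene by |AUC - 0.5|, with AUC = U / (n_a * n_b)."""
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 1]
        b = [s[g] for s, y in zip(samples, labels) if y == 0]
        auc = mann_whitney_u(a, b) / (len(a) * len(b))
        scores.append(abs(auc - 0.5))
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: two "labs" on very different scales, three genes,
    # each lab contributing one class-1 sample and one class-0 sample.
    lab1 = [[300.0, 20.0, 50.0], [10.0, 200.0, 50.0]]
    lab2 = [[0.9, 0.1, 0.3], [0.2, 0.5, 0.7]]
    pooled = integrate_rank_based([lab1, lab2])
    print(score_genes(pooled, [1, 0, 1, 0]))  # [0.5, 0.5, 0.25]
```

Because each sample is ranked on its own, the two toy labs can differ in scale by orders of magnitude and still be pooled directly; the scoring step then sees only within-sample ranks.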
Yoichi YAMADA Ken-ichi HIROTANI Kenji SATOU Ken-ichiro MURAMOTO
Microarray technology has been applied in various biological and medical research fields. A preliminary step in extracting any information from a microarray data set is to identify the genes that are differentially expressed between the microarray data. Identifying the differentially expressed genes and their commonly associated GO terms allows us to find stimulation-dependent or disease-related genes and biological events. However, identifying these deregulated GO terms with general approaches, including gene set enrichment analysis (GSEA), does not necessarily reveal the GO terms that are overrepresented in specific data within a microarray data set (i.e., data-specific GO terms). In this paper, we propose a statistical method to correctly identify data-specific GO terms, and we evaluate its usefulness by simulation using an actual microarray data set.
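The abstract does not spell out the proposed statistic. For reference only, the sketch below shows a conventional, general overrepresentation test for one GO term in one dataset's list of differentially expressed genes: a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). It is an assumed baseline of the "general approach" kind, not the authors' data-specific method, and the function name, parameters, and example numbers are hypothetical.

```python
from math import comb


def go_term_overrep_pvalue(k: int, n_selected: int, n_annotated: int, n_total: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    n_total:     genes in the background (e.g. all genes on the array)
    n_annotated: background genes annotated with the GO term
    n_selected:  genes in the list of interest (e.g. DEGs of one dataset)
    k:           genes in that list that carry the GO term
    """
    upper = min(n_selected, n_annotated)
    # Sum the probability mass of every overlap at least as large as k.
    hits = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_selected)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 genes on the array, 40 carry the GO term,
    # one dataset yields 50 DEGs, 8 of which carry the term.
    print(go_term_overrep_pvalue(8, 50, 40, 1000))
```

Applying such a test separately to each dataset gives one p-value per term per dataset; on its own it asks whether a term is enriched in a list, not whether it is specific to one dataset among several, which is the gap the abstract targets.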