Dang Hung TRAN Tu Bao HO Tho Hoan PHAM Kenji SATOU
MicroRNAs (miRNAs), one kind of functional noncoding RNA, form a class of endogenous RNAs that can play important regulatory roles in animals and plants by targeting transcripts for cleavage or translational repression. Both experimental and computational studies have shown that miRNAs are indeed involved in human cancer development and progression. However, which miRNAs contribute more information to distinguishing normal from tumor samples (tissues) is still undetermined. Recently, high-throughput microarray technology has been used as a powerful technique to measure the expression levels of miRNAs in cells. Analyzing these expression data allows us to determine the functional roles of miRNAs in living cells. In this paper, we present a computational method for (1) predicting tumor tissues using high-throughput miRNA expression profiles and (2) finding the informative miRNAs whose expression levels differ strongly in tumor tissues. To this end, we apply a support vector machine (SVM) based method to examine one recent miRNA expression dataset in depth. The experimental results show that the SVM-based method outperforms other supervised learning methods such as decision trees, Bayesian networks, and backpropagation neural networks. Furthermore, by using miRNA-target information and Gene Ontology annotations, we show that the informative miRNAs have strong evidence of being related to some types of human cancer, including breast, lung, and colon cancer.
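As a rough illustration of the SVM-based tissue classification mentioned in this abstract, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss. The abstract does not state the kernel, the feature selection procedure, or the software used, so everything here is an assumption: the function names `train_linear_svm` and `predict`, the hyperparameters, and the two-miRNA toy expression values are hypothetical and are not taken from the dataset.

```python
def train_linear_svm(samples, labels, lr=0.01, lam=0.01, epochs=2000):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    labels must be +1 (tumor) or -1 (normal).
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            # Regularization shrinks the weights on every step.
            weights = [w * (1.0 - lr * lam) for w in weights]
            # The hinge loss contributes only when the sample is inside the margin.
            if y * score < 1.0:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (tumor) or -1 (normal) for one expression profile."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Made-up expression levels for two hypothetical miRNAs (not from the dataset).
profiles = [[2.0, 0.5], [1.8, 0.4], [2.2, 0.6],
            [0.5, 0.5], [0.4, 0.6], [0.6, 0.4]]
tissue_labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_linear_svm(profiles, tissue_labels)
predictions = [predict(weights, bias, x) for x in profiles]
```

A real analysis would evaluate on held-out samples (for example by cross-validation) rather than on the training profiles, and would likely rely on an established SVM library instead of this hand-written loop.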
Akio NISHIKAWA Kenji SATOU Emiko FURUICHI Satoru KUHARA Kazuo USHIJIMA
Scientific database systems for the analysis of genes and proteins are becoming increasingly important. We have developed a deductive database system, PACADE, for analyzing the three-dimensional and secondary structures of proteins. In this paper, we describe the statistical data classification component of PACADE. We implemented the component for cluster analysis and discriminant analysis. In addition, we enhanced the aggregation function in order to calculate characteristic values that are useful for data classification. Using the cluster analysis function, proteins are classified into different types according to their structural characteristics. The results of these structural analysis experiments are also described.
Yoichi YAMADA Ken-ichi HIROTANI Kenji SATOU Ken-ichiro MURAMOTO
Microarray technology has been applied to various biological and medical research fields. A preliminary step in extracting information from a microarray data set is to identify genes that are differentially expressed between microarray data. Identifying the differentially expressed genes and their commonly associated GO terms allows us to find stimulation-dependent or disease-related genes and biological events. However, identifying these deregulated GO terms by general approaches, including gene set enrichment analysis (GSEA), does not necessarily yield the GO terms that are overrepresented in specific data within a microarray data set (i.e., data-specific GO terms). In this paper, we propose a statistical method to correctly identify the data-specific GO terms, and we evaluate its usefulness by simulation using an actual microarray data set.
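For readers unfamiliar with what "overrepresented GO terms" means in practice, the sketch below shows the generic hypergeometric (one-sided Fisher-style) overrepresentation test that is commonly used for this purpose. It is not the statistical method proposed in the paper, whose details are not given in the abstract. The function name `hypergeom_enrichment_p` and all gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, term_genes, selected_genes, overlap):
    """Probability of seeing at least `overlap` term-annotated genes when
    `selected_genes` genes are drawn without replacement from `total_genes`,
    of which `term_genes` carry the GO term."""
    max_overlap = min(term_genes, selected_genes)
    denominator = comb(total_genes, selected_genes)
    tail = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, selected_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator


# Hypothetical counts: 10,000 genes on the array, 200 annotated to one GO term,
# 300 differentially expressed genes, 15 of which carry the term.
p_value = hypergeom_enrichment_p(10_000, 200, 300, 15)
```

A small p-value suggests the term appears among the differentially expressed genes more often than chance would predict. In a real analysis the p-values for many GO terms would also need a multiple-testing correction.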