The search functionality is under construction.

Keyword Search Result

[Keyword] PIC(273hit)

21-40hit(273hit)

  • Improving Thai Word and Sentence Segmentation Using Linguistic Knowledge

    Rungsiman NARARATWONG  Natthawut KERTKEIDKACHORN  Nagul COOHAROJANANONE  Hitoshi OKADA  

     
    PAPER-Natural Language Processing

      Publicized:
    2018/09/07
      Vol:
    E101-D No:12
      Page(s):
    3218-3225

    Word boundary ambiguity in word segmentation has long been a fundamental challenge in Thai language processing. The Conditional Random Fields (CRF) model is among the best-known methods to have achieved remarkably accurate segmentation. Nevertheless, current advancements appear to have left the problem of compound words unaccounted for. Compound words lose their meaning or context once segmented. Hence, we introduce a dictionary-based word-merging algorithm, which merges all kinds of compound words. Our evaluation shows that the algorithm achieves highly accurate word segmentation while preserving compound words. Moreover, it can also restore some incorrectly segmented words. Another problem, involving a different word-chunking approach, is sentence boundary ambiguity. In tackling this problem, utilizing the part of speech (POS) of a segmented word has previously been found to help boost the accuracy of CRF-based sentence segmentation. However, not all segmented words can be tagged. Thus, we propose a POS-based word-splitting algorithm, which splits words in order to increase the number of POS tags. We found that with more identifiable POS tags, the CRF model performs better in segmenting sentences. To demonstrate the contributions of both methods, we experimented with three of their applications. With the word-merging algorithm, we found that intact compound words in the product of topic extraction can help to preserve their intended meanings, offering more precise information for human interpretation. The algorithm, together with the POS-based word-splitting algorithm, can also be used to amend word-level Thai-English translations. In addition, the word-splitting algorithm improves sentence segmentation, thus enhancing text summarization.
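
The abstract above does not spell out the merging procedure, so the following is only a minimal sketch of one plausible dictionary-based merge: a greedy longest-match pass over already-segmented tokens. The function name `merge_compounds`, the `max_span` limit, and the toy English tokens are assumptions made for illustration, not the authors' algorithm.

```python
def merge_compounds(tokens, dictionary, max_span=4):
    """Greedily merge adjacent tokens whose concatenation is a dictionary entry.

    Assumption: tokens are joined without a separator, as in Thai script,
    and the longest matching span wins.
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, shrinking down to two tokens.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in dictionary:
                merged.append(candidate)
                i += span
                break
        else:
            # No multi-token entry starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Toy example with English stand-ins for segmented Thai tokens.
segmented = ["rain", "coat", "is", "wet"]
compound_dictionary = {"raincoat"}
result = merge_compounds(segmented, compound_dictionary)
```

Trying longer spans first keeps a three-part compound from being merged as only its first two parts when both forms are in the dictionary.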

  • A Novel Recommendation Algorithm Incorporating Temporal Dynamics, Reviews and Item Correlation

    Ting WU  Yong FENG  JiaXing SANG  BaoHua QIANG  YaNan WANG  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2018/05/18
      Vol:
    E101-D No:8
      Page(s):
    2027-2034

    Recommender systems (RS) exploit user ratings on items and side information to make personalized recommendations. In order to recommend the right products to users, RS must accurately model the implicit preferences of each user and the properties of each product. In reality, both user preferences and item properties change dynamically over time, so treating the historical decisions of a user or the received comments of an item as static is inappropriate. In addition, the review text accompanying a rating score can help us to understand why a user likes or dislikes an item, so temporal dynamics and text information in reviews are important side information for recommender systems. Moreover, compared with the large number of available items, the number of items a user can buy is very limited, which is called the sparsity problem. Utilizing item correlation is a promising solution to this problem. Although well-known methods such as TimeSVD++, TopicMF and CoFactor partially take temporal dynamics, reviews and correlation into consideration, none of them combines all of this information for accurate recommendation. Therefore, in this paper we propose a novel combined model called TmRevCo, which is based on matrix factorization. Our model combines the dynamic user factor of TimeSVD++ with the hidden topic of each review text mined by the topic model of TopicMF through a new transformation function. Meanwhile, to support our five-scoring datasets, we use a more appropriate item correlation measure in CoFactor and associate the item factors of CoFactor with those of matrix factorization. Our model thus combines temporal dynamics, review information and item correlation simultaneously. Experimental results on three real-world datasets show that our proposed model leads to significant improvement compared with the baseline methods.

  • Retweeting Prediction Based on Social Hotspots and Dynamic Tensor Decomposition

    Qian LI  Xiaojuan LI  Bin WU  Yunpeng XIAO  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2018/01/30
      Vol:
    E101-D No:5
      Page(s):
    1380-1392

    In social networks, predicting user behavior under social hotspots can aid in understanding the development trend of a topic. In this paper, we propose a retweeting prediction method for social hotspots based on tensor decomposition, using user information, relationship and behavioral data. The method can be used to predict the behavior of users and analyze the evolution of topics. Firstly, we propose a tensor-based mechanism for mining user interaction, and then we propose that the tensor be used to solve the problem of inaccuracy that arises when calculating interaction intensity for sparse user interaction data. At the same time, we can analyze the influence of the following relationship on the interaction between users based on characteristics of the tensor in data space conversion and projection. Secondly, a time decay function is introduced for the tensor to further quantify the evolution of user behavior in current social hotspots. That function can be fit to the behavior of a user dynamically, and can also solve the problem of interaction between users with time decay. Finally, we invoke time slices and discretization of the topic life cycle and construct a user retweeting prediction model based on logistic regression. In this way, we can both explore the temporal characteristics of user behavior in social hotspots and solve the problem of uneven interaction behavior between users. Experiments show that the proposed method can effectively improve the accuracy of user behavior prediction and aid in understanding the development trend of a topic.

  • Sequential Bayesian Nonparametric Multimodal Topic Models for Video Data Analysis

    Jianfei XUE  Koji EGUCHI  

     
    PAPER

      Publicized:
    2018/01/18
      Vol:
    E101-D No:4
      Page(s):
    1079-1087

    Topic modeling is a well-known method that is widely applied not only to text data mining but also to multimedia data analysis, such as video data analysis. However, existing models cannot adequately handle time dependency and multimodal data modeling for video data, which generally contain image information and speech information. In this paper, we therefore propose a novel topic model, sequential symmetric correspondence hierarchical Dirichlet processes (Seq-Sym-cHDP), extended from sequential conditionally independent hierarchical Dirichlet processes (Seq-CI-HDP) and sequential correspondence hierarchical Dirichlet processes (Seq-cHDP), to improve the multimodal data modeling mechanism by controlling the pivot assignments with a latent variable. An inference scheme for Seq-Sym-cHDP based on a posterior representation sampler is also developed in this work. Finally, we demonstrate through experiments that our model outperforms other baseline models.

  • Stochastic Divergence Minimization for Biterm Topic Models

    Zhenghang CUI  Issei SATO  Masashi SUGIYAMA  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2017/12/20
      Vol:
    E101-D No:3
      Page(s):
    668-677

    With the emergence and thriving development of social networks, a huge number of short texts are accumulated and need to be processed. Inferring the latent topics of collected short texts is an essential task for understanding their hidden structure and predicting new content. A biterm topic model (BTM) was recently proposed for short texts to overcome the sparseness of document-level word co-occurrences by directly modeling the generation process of word pairs. Stochastic inference algorithms based on collapsed Gibbs sampling (CGS) and collapsed variational inference have been proposed for BTM. However, they either incur high computational complexity or rely on very crude estimation that does not preserve sufficient statistics. In this work, we develop a stochastic divergence minimization (SDM) inference algorithm for BTM to achieve better predictive likelihood in a scalable way. Experiments show that SDM-BTM trained on 30% of the data outperforms the best existing algorithm trained on the full data.
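
To make the "word pairs" that BTM models concrete, here is a minimal sketch of biterm extraction, assuming a biterm is any unordered pair of tokens co-occurring in one short text. The helper names `extract_biterms` and `count_biterms` and the toy corpus are hypothetical; the sketch covers only the data preparation step, not the SDM inference algorithm of the paper.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text."""
    # Sort each pair so that (a, b) and (b, a) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(corpus):
    """Aggregate biterm counts over a corpus of tokenized short texts."""
    counts = Counter()
    for tokens in corpus:
        counts.update(extract_biterms(tokens))
    return counts


# Toy corpus of two tokenized short texts.
corpus = [
    ["topic", "model", "short"],
    ["short", "text", "topic"],
]
biterm_counts = count_biterms(corpus)
```

Counting pairs over the whole corpus, rather than per document, is what lets the model sidestep the sparse word co-occurrence statistics of individual short texts.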

  • Transmission Property Analysis of Optically-Anisotropic Dielectric Multilayer for Thin Wide-Viewing-Angle Reflective Polarizer Open Access

    Kunihiko AKAHANE  Takahiro ISHINABE  Yosei SHIBATA  Hideo FUJIKAKE  

     
    INVITED PAPER

      Vol:
    E100-C No:11
      Page(s):
    998-1004

    We show that the light leakage that occurs in reflective polarizers at large angles of incidence can be suppressed by using anisotropic dielectric multilayers with larger refractive indices in the thickness direction, and that the interference-included 2×2 Jones matrix method is useful for investigating the optical propagation properties of the dielectric multilayers. The thickness of the reflective polarizer can also be reduced by optimizing the distribution of the multilayers in the stack while considering visual sensitivity. These results indicate that it is possible to realize a high-quality liquid crystal display with wide viewing angles and high light utilization efficiency.

  • Generating Questions for Inquiry-Based Learning of History in Elementary Schools by Using Stereoscopic 3D Images Open Access

    Takashi SHIBATA  Kazunori SATO  Ryohei IKEJIRI  

     
    INVITED PAPER

      Vol:
    E100-C No:11
      Page(s):
    1012-1020

    We conducted experimental classes in an elementary school to examine how the advantages of using stereoscopic 3D images could be applied in education. More specifically, we selected a unit of the Tumulus period in Japan for sixth-graders as the source of our 3D educational materials. This unit represents part of the coursework for the topic of Japanese history. The educational materials used in our study included stereoscopic 3D images for examining the stone chambers and Haniwa (i.e., terracotta clay figures) of the Tumulus period. The results of our experimental class showed that 3D educational materials helped students focus on specific parts in images such as attached objects of the Haniwa and also understand 3D spaces and concavo-convex shapes. The experimental class revealed that 3D educational materials also helped students come up with novel questions regarding attached objects of the Haniwa, and Haniwa's spatial balance and spatial alignment. The results suggest that the educational use of stereoscopic 3D images is worthwhile in that they lead to question and hypothesis generation and an inquiry-based learning approach to history.

  • Transient Analysis of Anisotropic Dielectrics and Ferromagnetic Materials Based on Unconditionally Stable Perfectly-Matched-Layer (PML) Complex-Envelope (CE) Finite-Difference Time-Domain (FDTD) Method

    Sang-Gyu HA  Jeahoon CHO  Kyung-Young JUNG  

     
    PAPER-Antennas and Propagation

      Publicized:
    2017/03/14
      Vol:
    E100-B No:10
      Page(s):
    1879-1883

    Anisotropic dielectrics and ferromagnetic materials are widely used in dispersion-engineered metamaterials. For example, nonreciprocal magnetic photonic crystals (MPhCs) are periodic structures whose unit cell is composed of two misaligned anisotropic dielectric layers and one ferromagnetic layer, and they have extraordinary characteristics such as wave slowdown and field amplitude increase. We develop an unconditionally stable complex-envelope alternating-direction-implicit finite-difference time-domain (CE-ADI-FDTD) method suitable for the transient analysis of anisotropic dielectrics and ferromagnetic materials. In the proposed algorithm, the perfectly-matched-layer (PML) is straightforwardly incorporated in Maxwell's curl equations. Numerical examples show that the proposed PML-CE-ADI-FDTD method can significantly reduce the CPU time for the transient analysis of anisotropic dielectrics and ferromagnetic materials while maintaining computational accuracy.

  • Occluded Appearance Modeling with Sample Weighting for Human Pose Estimation

    Yuki KAWANA  Norimichi UKITA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2017/07/06
      Vol:
    E100-D No:10
      Page(s):
    2627-2634

    This paper proposes a method for human pose estimation in still images. The proposed method achieves occlusion-aware appearance modeling. Appearance modeling with less accurate appearance data is problematic because it adversely affects the entire training process. The proposed method evaluates the effectiveness of mitigating the influence of occluded body parts in training sample images. In order to improve occlusion evaluation by a discriminatively-trained model, occlusion images are synthesized and employed with non-occlusion images for discriminative modeling. The score of this discriminative model is used for weighting each sample in the training process. Experimental results demonstrate that our approach improves the performance of human pose estimation in contrast to base models.

  • The Biterm Author Topic in the Sentences Model for E-Mail Analysis

    Xiuze ZHOU  Shunxiang WU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2017/04/25
      Vol:
    E100-D No:8
      Page(s):
    1852-1859

    E-mails, which vary in length, are a special form of text. The difference in the lengths of e-mails increases the difficulty of text analysis. To better analyze e-mail, our models must analyze not only long e-mails but also short e-mails. Unlike normal documents, short texts have some unique characteristics, such as data sparsity and ambiguity problems, making it difficult to obtain useful information from them. However, long text and short text cannot be analyzed in the same manner. Therefore, we have to analyze the characteristics of both. We present the Biterm Author Topic in the Sentences (BATS) model; it can discover relevant topics of a corpus and accurately capture the relationship between the topics and authors of e-mails. The Author Topic (AT) model learns from a single word in a document, while BATS is modeled on word co-occurrence in the entire corpus. We assume that all words in a single sentence are generated from the same topic. Accordingly, our method uses only word co-occurrence patterns at the sentence level, rather than the document or corpus level. Experiments on the Enron data set indicate that our proposed method achieves better performance on e-mails than the baseline methods. Moreover, our method analyzes long texts effectively and solves the data sparsity problems of short texts.

  • Joint Optimization of User Association and Inter-Cell Interference Coordination for Proportional Fair-Based System Throughput Maximization in Heterogeneous Cellular Networks

    Yoshitaka IKEDA  Shozo OKASAKA  Kenichi HIGUCHI  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2017/02/08
      Vol:
    E100-B No:8
      Page(s):
    1334-1342

    This paper proposes a proportional fair-based joint optimization method for user association and the bandwidth ratio of protected radio resources exclusively used by pico base stations (BSs) for inter-cell interference coordination (ICIC) in heterogeneous networks where low transmission-power pico BSs overlay a high transmission-power macro BS. The proposed method employs an iterative algorithm, in which the user association process for a given bandwidth ratio of protected radio resources and the bandwidth ratio control of protected radio resources for a given user association are applied alternately and repeatedly up to convergence. For user association, we use our previously reported decentralized iterative user association method based on the feedback information of each individual user assisted by a small amount of broadcast information from the respective BSs. Based on numerical results, we show that the proposed method adaptively achieves optimal user association and bandwidth ratio control of protected radio resources, which maximizes the geometric mean user throughput within the macrocell coverage area. The system throughput of the proposed method is compared to that for conventional approaches to show the performance gain.
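
The objective named in the abstract above, the geometric mean user throughput, can be illustrated with a short sketch. It relies only on the standard fact that maximizing the sum of log throughputs (the proportional fair criterion) is equivalent to maximizing their geometric mean. The function name and the throughput values are made up for illustration and do not come from the paper.

```python
import math


def geometric_mean_throughput(throughputs):
    """Geometric mean of per-user throughputs (all values must be positive)."""
    # exp(mean(log x)) is the geometric mean, so ranking allocations by it
    # matches ranking them by the proportional fair metric sum(log x).
    log_sum = sum(math.log(t) for t in throughputs)
    return math.exp(log_sum / len(throughputs))


# Two hypothetical allocations with the same total throughput (6 units).
balanced = [3.0, 3.0]
skewed = [1.0, 5.0]

gm_balanced = geometric_mean_throughput(balanced)
gm_skewed = geometric_mean_throughput(skewed)
```

Both allocations have the same arithmetic mean, but the geometric mean is higher for the balanced one, which is why this metric rewards fairness across users as well as total throughput.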

  • Double Directional Millimeter Wave Propagation Channel Measurement and Polarimetric Cluster Properties in Outdoor Urban Pico-cell Environment

    Karma WANGCHUK  Kento UMEKI  Tatsuki IWATA  Panawit HANPINITSAK  Minseok KIM  Kentaro SAITO  Jun-ichi TAKADA  

     
    PAPER-Antennas and Propagation

      Publicized:
    2017/01/16
      Vol:
    E100-B No:7
      Page(s):
    1133-1144

    To use millimeter wave bands in future cellular and outdoor wireless networks, understanding multipath cluster characteristics such as delay and angular spread for different polarizations is very important, in addition to knowing the path loss and other large-scale propagation parameters. This paper presents results from the analysis of wide-band full polarimetric double directional channel measurements at the millimeter wave band in a typical urban pico-cell environment. Only a limited number of multipath clusters, with gains ranging from -8dB to -26.8dB below the free space path loss and mainly due to single reflection, double reflection and diffraction, are seen under both line of sight (LOS) and obstructed LOS conditions. The cluster gain and scattering intensity showed strong dependence on polarization. The scattering intensities for ϑ-ϑ polarization were seen to be stronger than those for ϕ-ϕ polarization, on average 6.1dB, 5.6dB and 4.5dB higher for clusters due to single reflection, double reflection and scattering, respectively. In each cluster, the paths are highly concentrated in the delay domain, with a delay spread comparable to the delay resolution of 2.5ns irrespective of polarization. Unlike the scattering intensity, the angular spread of paths in each cluster did not show dependence on polarization. On the base station side, the average angular spreads in azimuth and in elevation were similar, with ≤3.3° spread in azimuth and ≤3.2° spread in elevation for ϑ-ϑ polarization. These spreads were slightly smaller than those observed for ϕ-ϕ polarization. On the mobile station side, the angular spread in azimuth was much higher than on the base station side. On average, an azimuth angular spread of ≤11.4° and an elevation angular spread of ≤5° are observed for ϑ-ϑ polarization. These spreads were slightly larger than those for ϕ-ϕ polarization. Knowing these characteristics will be vital for more accurate modeling of the channel, and for system and antenna design.

  • Relation Prediction in Multilingual Data Based on Multimodal Relational Topic Models

    Yosuke SAKATA  Koji EGUCHI  

     
    PAPER

      Publicized:
    2017/01/17
      Vol:
    E100-D No:4
      Page(s):
    741-749

    There are increasing demands for improved analysis of multimodal data that consist of multiple representations, such as multilingual documents and text-annotated images. One promising approach for analyzing such multimodal data is latent topic models. In this paper, we propose conditionally independent generalized relational topic models (CI-gRTM) for predicting unknown relations across different multiple representations of multimodal data. We developed CI-gRTM as a multimodal extension of discriminative relational topic models called generalized relational topic models (gRTM). We demonstrated through experiments with multilingual documents that CI-gRTM can more effectively predict both multilingual representations and relations between two different language representations compared with several state-of-the-art baseline models that can predict either multilingual representations or unimodal relations.

  • Inferring User Consumption Preferences from Social Media

    Yang LI  Jing JIANG  Ting LIU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2016/12/09
      Vol:
    E100-D No:3
      Page(s):
    537-545

    Social media has already become a new arena of our lives and involves different aspects of our social presence. Users' personal information and activities on social media presumably reveal their personal interests, which offers great opportunities for many e-commerce applications. In this paper, we propose a principled latent variable model to infer user consumption preferences at the category level (e.g., inferring what categories of products a user would like to buy). Our model naturally links users' published content and following relations on microblogs with their consumption behaviors on e-commerce websites. Experimental results show that our model significantly outperforms the state-of-the-art methods in inferring a new user's consumption preferences. Our model can also learn meaningful consumption-specific topics automatically.

  • Vehicle Classification under Different Feature Sets with a Single Anisotropic Magnetoresistive Sensor

    Chang XU  Yingguan WANG  Yunlong ZHAN  

     
    PAPER

      Vol:
    E100-A No:2
      Page(s):
    440-447

    This paper focuses on the development of a single portable roadside magnetic sensor for vehicle classification. The magnetic sensor is a kind of anisotropic magnetic device that does not need to be embedded in the roadway; the device is placed next to the roadway and measures traffic in the immediately adjacent lane. A novel feature extraction and comparison approach is presented for vehicle classification with a single magnetic sensor, which is based on four different feature sets extracted from the detected magnetic signal. Furthermore, vehicle classification has been achieved with three common classification algorithms: support vector machine, k-nearest neighbors and back-propagation neural network. Experimental results have demonstrated that the Peak-Peak feature set with the back-propagation neural network approach performs much better than the other approaches. In addition, the normalization technique has been shown to be effective.

  • Video Data Modeling Using Sequential Correspondence Hierarchical Dirichlet Processes

    Jianfei XUE  Koji EGUCHI  

     
    PAPER

      Publicized:
    2016/10/07
      Vol:
    E100-D No:1
      Page(s):
    33-41

    Video data mining based on topic models, an emerging technique, has recently become a very popular research topic. In this paper, we present a novel topic model named sequential correspondence hierarchical Dirichlet processes (Seq-cHDP) to learn the hidden structure within video data. The Seq-cHDP model can be regarded as an extended hierarchical Dirichlet processes (HDP) model containing two important features: one is the time-dependency mechanism that connects neighboring video frames on the basis of a time-dependent Markovian assumption, and the other is the correspondence mechanism that provides a solution for dealing with multimodal data such as the mixture of visual words and speech words extracted from video files. A cascaded Gibbs sampling method is applied to implement the inference task of Seq-cHDP. We present a comprehensive evaluation of Seq-cHDP through experimentation and finally demonstrate that Seq-cHDP outperforms other baseline models.

  • A Bipartite Graph-Based Ranking Approach to Query Subtopics Diversification Focused on Word Embedding Features

    Md Zia ULLAH  Masaki AONO  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2016/09/05
      Vol:
    E99-D No:12
      Page(s):
    3090-3100

    Web search queries are usually vague, ambiguous, or tend to have multiple intents. Users have different search intents while issuing the same query. Understanding the intents through mining subtopics underlying a query has gained much interest in recent years. Query suggestions provided by search engines hold some intents of the original query; however, suggested queries are often noisy and contain a group of alternative queries with similar meaning. Therefore, identifying the subtopics covering possible intents behind a query is a formidable task. Moreover, because both the query and subtopics are short in length, it is challenging to estimate the similarity between a pair of short texts and rank them accordingly. In this paper, we propose a method for mining and ranking subtopics where we introduce multiple semantic and content-aware features, a bipartite graph-based ranking (BGR) method, and a similarity function for short texts. Given a query, we aggregate the suggested queries from search engines as candidate subtopics and estimate their relevance to the given query based on word embedding and content-aware features by modeling a bipartite graph. To estimate the similarity between two short texts, we propose a Jensen-Shannon divergence based similarity function through the probability distributions of the terms in the top retrieved documents from a search engine. A diversified ranked list of subtopics covering possible intents of a query is assembled by balancing relevance and novelty. We experimented with and evaluated our method on the NTCIR-10 INTENT-2 and NTCIR-12 IMINE-2 subtopic mining test collections. Our proposed method outperforms the baselines, known related methods, and the official participants of the INTENT-2 and IMINE-2 competitions.
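
As a rough illustration of a Jensen-Shannon divergence based similarity between two term distributions, consider the sketch below. The divergence itself follows the standard definition; mapping it to a similarity as `1 - JSD` (with base-2 logarithms, so the divergence lies in [0, 1]) is an assumption made here, since the abstract does not give the exact form the authors use, and the two distributions are toy values over a shared vocabulary.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p[i] == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two aligned distributions."""
    # The mixture m is positive wherever p or q is, so KL against m is finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_similarity(p, q):
    """Assumed similarity mapping: 1 - JSD, in [0, 1] for base-2 logs."""
    return 1.0 - js_divergence(p, q)


# Toy term distributions over a shared four-term vocabulary.
terms_a = [0.5, 0.3, 0.2, 0.0]
terms_b = [0.4, 0.3, 0.2, 0.1]
similarity = js_similarity(terms_a, terms_b)
```

Unlike plain KL divergence, the Jensen-Shannon divergence is symmetric and stays finite when a term has zero probability in one of the two texts, which suits sparse distributions estimated from a handful of retrieved documents.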

  • Numerical Evaluation of Effect of Using UTM Grid Maps on Emergency Response Performance — A Case of Information-Processing Training at an Emergency Operation Center in Tagajo City, Miyagi Prefecture —

    Shosuke SATO  Rui NOUCHI  Fumihiko IMAMURA  

     
    LETTER

      Vol:
    E99-A No:8
      Page(s):
    1560-1566

    It is qualitatively considered that emergency information processing by using UTM grids is effective in generating COP (Common Operational Pictures). Here, we conducted a numerical evaluation based on emergency information-processing training to examine the efficiency of the use of UTM grid maps by staff at the Tagajo City Government office. The results of the demonstration experiment were as follows: 1) The time required for information propagation and mapping with UTM coordinates was less than that with address text consisting of area name and block number. 2) There was no measurable difference in subjective estimates of the training performance of participants with or without the use of UTM grids. 3) Fear of real emergency responses decreased among training participants using UTM grids. 4) Many of the negative free answers on a questionnaire evaluation of participants involved requests regarding the reliability and operability of UTM tools.

  • Automated Duplicate Bug Report Detection Using Multi-Factor Analysis

    Jie ZOU  Ling XU  Mengning YANG  Xiaohong ZHANG  Jun ZENG  Sachio HIROKAWA  

     
    PAPER-Software Engineering

      Publicized:
    2016/04/01
      Vol:
    E99-D No:7
      Page(s):
    1762-1775

    Bug reports expressed in natural language text are usually voluminous, ambiguous and poorly written, which makes duplicate bug report detection challenging. Current automatic duplicate bug report detection techniques have mainly focused on textual information and ignored some useful factors. To improve the detection accuracy, in this paper we propose a new approach called the LNG (LDA and N-gram) model, which takes advantage of the topic model LDA and the word-based model N-gram. The LNG model considers multiple factors, including textual information, semantic correlation, word order, contextual connections, and categorial information, that potentially affect the detection accuracy. In addition, the N-gram adopted in our LNG model is improved by modifying the similarity algorithm. The experiment is conducted on more than 230,000 real bug reports of the Eclipse project. In the evaluation, we propose a new evaluation metric, namely the exact-accuracy (EA) rate, which can be used to enhance the understanding of the performance of duplicate detection. The evaluation results show that the recall rate, precision rate, and EA rate of the proposed method are all higher than those obtained by treating them separately. Also, the recall rate is improved by 2.96%-10.53% compared to the state-of-the-art approach DBTM.

  • A Study on Single Polarization Guidance in Photonic Band Gap Fiber with Anisotropic Lattice of Circular Air Holes

    Kazuki ICHIKAWA  Zejun ZHANG  Yasuhide TSUJI  Masashi EGUCHI  

     
    PAPER

      Vol:
    E99-C No:7
      Page(s):
    774-779

    We propose a novel single polarization photonic band gap fiber (SP-PBGF) with an anisotropic air hole lattice in the core. A recently proposed SP-PBGF with an elliptical air hole lattice in the core can easily realize SP guidance by utilizing the large difference in cutoff frequency between the x- and y-polarized modes. In this paper, in order to achieve SP guidance based on the same principle as this PBGF, we utilize an anisotropic lattice of circular air holes instead of elliptical air holes to ease the fabrication difficulty. After investigating the influence of the structural parameters on SP guidance, we numerically demonstrate that the designed SP-PBGF has an SP operating band of 381 nm.
