
Keyword Search Result

[Keyword] CRF (12 hits)

Hits 1-12
  • Extracting Knowledge Entities from Sci-Tech Intelligence Resources Based on BiLSTM and Conditional Random Field

    Weizhi LIAO, Mingtong HUANG, Pan MA, Yu WANG

    PAPER

      Publicized: 2021/04/22
      Vol: E104-D, No: 8
      Page(s): 1214-1221

    There are many knowledge entities in sci-tech intelligence resources. Extracting these knowledge entities is of great importance for building knowledge networks, exploring the relationships between pieces of knowledge, and optimizing search engines. Many existing methods, which are mainly based on rules and traditional machine learning, require significant human involvement yet still suffer from unsatisfactory extraction accuracy. This paper proposes a novel approach for knowledge entity extraction based on BiLSTM and conditional random field (CRF). A BiLSTM neural network is used to obtain the context information of sentences, and a CRF is then employed to integrate global label information to obtain the optimal labels. This approach does not require manual feature construction and outperforms conventional methods. In the experiments presented in this paper, the titles and abstracts of 20,000 items in the existing sci-tech literature are processed, from which 50,243 items are used to build benchmark datasets. Based on these datasets, comparative experiments are conducted to evaluate the effectiveness of the proposed approach. Knowledge entities are extracted and corresponding knowledge networks are established, with further elaboration on the correlation between two different types of knowledge entities. The proposed research has the potential to improve the quality of sci-tech information services.
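The CRF layer's job of choosing the globally best label sequence is usually done with Viterbi decoding. The following is a minimal Python sketch, assuming the BiLSTM has already produced per-token emission scores and that a label-transition matrix has been learned; the function name, the O/B/I label indices and the toy scores are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring label path and its score for a linear-chain CRF.

    emissions[t][y]: score of label y at token t (e.g. BiLSTM outputs)
    transitions[a][b]: score of moving from label a to label b
    """
    num_labels = len(transitions)
    # best[y]: score of the best partial path ending in label y
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for y in range(num_labels):
            # Previous label that maximizes the path score into label y
            prev = max(range(num_labels), key=lambda a: best[a] + transitions[a][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label
    last = max(range(num_labels), key=lambda y: best[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Toy labels: 0 = O, 1 = B, 2 = I; the O -> I transition is heavily penalized
emissions = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions))  # ([1, 2], 1.5)
```

A per-token argmax would pick O then I here; the transition scores steer the decoder to B then I instead, which is the "global label information" the CRF contributes.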

  • Contextualized Character Embedding with Multi-Sequence LSTM for Automatic Word Segmentation

    Hyunyoung LEE, Seungshik KANG

    PAPER-Natural Language Processing

      Publicized: 2020/08/19
      Vol: E103-D, No: 11
      Page(s): 2371-2378

    Contextual information is a crucial factor in natural language processing tasks such as sequence labeling. Previous studies on contextualized embedding and word embedding have explored the context of word-level tokens in order to obtain useful features of languages. However, unlike in English, the fundamental task in East Asian languages involves character-level tokens. In this paper, we propose a contextualized character embedding method that uses n-gram multi-sequence information with long short-term memory (LSTM). We hypothesize that contextualized embeddings of multiple sequences help each other capture long-term contextual information such as the notion of spans and boundaries in segmentation. The analysis shows that the contextualized embedding of bigram character sequences encodes the notion of spans and boundaries for word segmentation better than that of unigram character sequences. We find that combining the contextualized embeddings from both unigram and bigram character sequences at the output layer of the LSTMs, rather than at the input layer, improves word segmentation performance. The comparison shows that our proposed method outperforms previous models.
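To make the unigram/bigram multi-sequence input concrete, here is a small Python sketch that builds two aligned character sequences for one sentence. Aligning each bigram with its second character and left-padding with a `<s>` marker are assumptions made for illustration; the paper's exact alignment and padding may differ.

```python
from typing import Dict, List


def char_ngram_sequences(sentence: str, pad: str = "<s>") -> Dict[str, List[str]]:
    """Build aligned unigram and bigram character sequences for one sentence.

    Position i of the bigram sequence holds characters i-1 and i, so both
    sequences have one entry per character and can feed parallel LSTMs.
    """
    chars = list(sentence)
    # Left-pad so the first character also gets a bigram
    bigrams = [(chars[i - 1] if i > 0 else pad) + chars[i] for i in range(len(chars))]
    return {"unigram": chars, "bigram": bigrams}


print(char_ngram_sequences("北京大学"))
# {'unigram': ['北', '京', '大', '学'], 'bigram': ['<s>北', '北京', '京大', '大学']}
```

Because both sequences have the same length, per-position outputs of the two LSTMs can be combined either at the input or at the output layer, which is the comparison the abstract reports.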

  • HDR Image Synthesis Using Visual Brightness Mapping and Local Surround-Based Image Fusion

    Sung-Hak LEE

    PAPER

      Vol: E102-C, No: 11
      Page(s): 802-809

    HDR (High Dynamic Range) image synthesis is a method for photographing scenes with a wide luminance range and reproducing images close to real visual scenes on an LDR (Low Dynamic Range) display. In general, HDR images are reproduced by capturing images at various camera exposures and synthesizing the tones of several images. In this paper, we propose an HDR image tone mapping method based on a visual brightness function using dual-exposure images, together with a synthesis algorithm based on the local surround. Compared with existing methods, the proposed algorithm reduces boundary errors and improves color balance. It also reduces the blurring and noise amplification caused by image mixing.
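The abstract does not specify the visual brightness function or the local-surround fusion, so the sketch below swaps in a generic and plainly different technique: per-pixel weighted averaging of two exposures with a Gaussian "well-exposedness" weight centred on mid-grey, as used in common exposure-fusion approaches. The function names and `sigma = 0.2` are assumptions, and pixel values are taken to be grey levels in [0, 1].

```python
import math
from typing import List, Sequence


def well_exposedness(value: float, sigma: float = 0.2) -> float:
    """Gaussian weight that favours grey levels near mid-grey (0.5)."""
    return math.exp(-((value - 0.5) ** 2) / (2 * sigma ** 2))


def fuse_two_exposures(short_exp: Sequence[float], long_exp: Sequence[float]) -> List[float]:
    """Per-pixel weighted average of a short and a long exposure (grey levels in [0, 1])."""
    fused = []
    for s, l in zip(short_exp, long_exp):
        ws, wl = well_exposedness(s), well_exposedness(l)
        # Weights are always positive, so the denominator is never zero
        fused.append((ws * s + wl * l) / (ws + wl))
    return fused


# Dark shadow pixel, mid-grey pixel, bright highlight pixel
print(fuse_two_exposures([0.05, 0.5, 0.9], [0.6, 0.5, 1.0]))
```

A pixel that is nearly black or saturated in one exposure gets little weight, so the fused value leans toward whichever exposure captured that region better. Plain per-pixel weighting like this is prone to exactly the boundary and blurring artifacts the paper's local-surround method targets.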

  • ORRIS: Throughput Optimization for Backscatter Link on Physical and MAC Layers

    Jumin ZHAO, Yanxia LI, Dengao LI, Hao WU, Biaokai ZHU

    PAPER-Multimedia Systems for Communications

      Publicized: 2019/04/05
      Vol: E102-B, No: 10
      Page(s): 2082-2090

    Unlike Radio Frequency Identification (RFID), emerging Computational RFID (CRFID) integrates the RF front-end and an MCU with multiple sensors. CRFIDs must transmit data while within the interrogator's range, so when tags move rapidly or the contact duration with the interrogator is limited, the sensor data collected by a CRFID must be transferred to the interrogator quickly. In this paper, we focus on throughput optimization for the backscatter link, take both the physical and medium access control (MAC) layers into consideration, and put forward a scheme called ORRIS. On the physical layer, we propose the Cluster Gather Degree (CGD) indicator, which measures the clustering degree of the signal in the IQ domain. CGD then serves as the criterion for adaptively adjusting the rate encoding mode and link frequency, thereby achieving adaptive rate transmission. On the MAC layer, based on the idea of asynchronous transfer, we use the number of clusters in the IQ domain to select the optimal Q value as far as possible, so as to achieve burst or bulk data transmission. Experiments and analyses in static and mobile scenarios show that our proposal achieves significantly higher mean throughput than BLINK or CARA, which demonstrates the effectiveness of our scheme.
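The MAC-layer step ends in choosing a Q value. In EPC Gen2-style inventory, each tag picks a reply slot in a frame of 2**Q slots (Q from 0 to 15), and framed slotted ALOHA throughput peaks when the number of slots roughly equals the number of contending tags. The Python sketch below assumes a tag-count estimate is already available; ORRIS derives its estimate from the number of clusters in the IQ domain, which is not reproduced here, and the name `choose_q` and the rounding rule are hypothetical.

```python
import math


def choose_q(estimated_tags: int, q_min: int = 0, q_max: int = 15) -> int:
    """Pick Q so that a frame of 2**Q slots roughly matches the tag count.

    Framed slotted ALOHA throughput peaks when slots ~= contending tags,
    so Q is log2(tags) rounded to the nearest integer, then clamped.
    """
    if estimated_tags <= 1:
        return q_min
    q = round(math.log2(estimated_tags))
    return max(q_min, min(q_max, q))


for n in (1, 10, 16, 100_000):
    print(n, choose_q(n))
```

Rounding in the log domain is one simple choice; a real reader would typically also adjust Q between rounds based on observed empty and collided slots.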

  • LSTM-CRF Models for Named Entity Recognition

    Changki LEE

    PAPER-Natural Language Processing

      Publicized: 2017/01/20
      Vol: E100-D, No: 4
      Page(s): 882-887

    Recurrent neural networks (RNNs) are powerful models for sequential data. RNNs that use long short-term memory (LSTM) cells have proven effective in handwriting recognition, language modeling, speech recognition, and language comprehension tasks. In this study, we propose LSTM conditional random fields (LSTM-CRF), an LSTM-based RNN model that uses output-label dependencies with transition features and a CRF-like sequence-level objective function. We also propose variations of the LSTM-CRF model using a gated recurrent unit (GRU) and a structurally constrained recurrent network (SCRN). Empirical results reveal that our proposed models attain state-of-the-art performance for named entity recognition.
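A CRF-like sequence-level objective can be illustrated with the standard linear-chain CRF negative log-likelihood: the log-partition function, computed by the forward algorithm over all label paths, minus the score of the gold path. This Python sketch assumes precomputed emission scores (for example from an LSTM) and omits start/end transitions; the paper's exact formulation may differ.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_nll(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    gold: Sequence[int],
) -> float:
    """Sequence-level CRF loss: log Z minus the score of the gold label path."""
    num_labels = len(transitions)
    # Score of the gold path: per-token emissions plus label-to-label transitions
    gold_score = emissions[0][gold[0]]
    for t in range(1, len(gold)):
        gold_score += transitions[gold[t - 1]][gold[t]] + emissions[t][gold[t]]
    # Forward algorithm: alpha[y] = log of summed exp-scores of all prefixes ending in y
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[a] + transitions[a][y] for a in range(num_labels)])
            + emissions[t][y]
            for y in range(num_labels)
        ]
    return logsumexp(alpha) - gold_score


# With all-zero scores, every one of the 2**3 paths is equally likely: loss = log 8
print(crf_nll([[0.0, 0.0]] * 3, [[0.0, 0.0], [0.0, 0.0]], [0, 1, 0]))
```

The loss is never negative, and it shrinks as the gold path's share of the probability mass grows; the transition matrix is where the output-label dependencies live.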

  • Fuzzy Matching of Semantic Class in Chinese Spoken Language Understanding

    Yanling LI, Qingwei ZHAO, Yonghong YAN

    PAPER-Natural Language Processing

      Vol: E96-D, No: 8
      Page(s): 1845-1852

    The semantic concept in an utterance is obtained by fuzzy matching methods to address problems in spoken language understanding (SLU) such as word variation induced by automatic speech recognition (ASR) or fields of key information that users leave out. A two-stage method is proposed. First, we adopt a conditional random field (CRF) to build probabilistic models that segment and label entity names in an input sentence. Second, fuzzy matching based on a similarity function is conducted between the named entities labeled by the CRF model and the reference characters of a dictionary. The experiments compare performance in terms of accuracy and processing speed. Among four similarity measures, Dice similarity and cosine similarity based on TF scores achieve better accuracy, reaching 93% or higher in F1-measure. In particular, the latter improves on q-gram and improved edit distance, two conventional methods for string fuzzy matching, by 8.8% and 9%, respectively.
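The second stage can be sketched as a nearest-dictionary-entry search. The abstract does not say which units are compared, so this Python sketch assumes characters: Dice is computed over character sets and cosine over raw character term-frequency (TF) vectors. The dictionary entries and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over the sets of characters in a and b."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def cosine_tf_similarity(a: str, b: str) -> float:
    """Cosine similarity of raw character term-frequency (TF) vectors."""
    ta, tb = Counter(a), Counter(b)
    dot = sum(count * tb[ch] for ch, count in ta.items())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def best_match(
    entity: str,
    dictionary: Sequence[str],
    sim: Callable[[str, str], float] = cosine_tf_similarity,
) -> str:
    """Return the dictionary entry most similar to the CRF-labeled entity."""
    return max(dictionary, key=lambda ref: sim(entity, ref))


# Hypothetical dictionary; the recognized entity has lost its last character
print(best_match("北京大", ["北京大学", "南京大学", "上海交通大学"]))  # 北京大学
```

Passing `sim=dice_similarity` swaps the measure without changing the search, which makes it easy to compare measures as the experiments do.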

  • A New Hybrid Method for Machine Transliteration

    Dong YANG, Paul DIXON, Sadaoki FURUI

    PAPER-Natural Language Processing

      Vol: E93-D, No: 12
      Page(s): 3377-3383

    This paper proposes a new hybrid method for machine transliteration. Our method is based on combining a newly proposed two-step conditional random field (CRF) method and the well-known joint source channel model (JSCM). The contributions of this paper are as follows: (1) A two-step CRF model for machine transliteration is proposed. The first CRF segments a character string of an input word into chunks and the second one converts each chunk into a character in the target language. (2) A joint optimization method of the two-step CRF model and a fast decoding algorithm are also proposed. Our experiments show that the joint optimization of the two-step CRF model works as well as or even better than the JSCM, and the fast decoding algorithm significantly decreases the decoding time. (3) A rapid development method based on a weighted finite state transducer (WFST) framework for the JSCM is proposed. (4) The combination of the proposed two-step CRF model and JSCM outperforms the state-of-the-art result in terms of top-1 accuracy.
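The data flow of the two-step CRF model can be shown with both CRF decisions replaced by hand-written stand-ins: step 1 emits B (begin chunk) / I (inside chunk) labels over the source characters, and step 2 maps each chunk to one target-language character. The label sequence and the tiny English-to-katakana table below are illustrative assumptions, not model outputs, and the joint optimization of the two steps is not modelled.

```python
from typing import Dict, List, Sequence


def chunks_from_labels(chars: str, labels: Sequence[str]) -> List[str]:
    """Step 1 output: group source characters using B (begin) / I (inside) labels."""
    chunks: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "B" or not chunks:
            chunks.append(ch)  # start a new chunk
        else:
            chunks[-1] += ch  # extend the current chunk
    return chunks


def convert_chunks(chunks: Sequence[str], table: Dict[str, str]) -> str:
    """Step 2: map each source chunk to one target-language character."""
    return "".join(table[chunk] for chunk in chunks)


# Hand-written stand-ins for the two CRFs' decisions
chunks = chunks_from_labels("smith", ["B", "B", "I", "B", "I"])
print(chunks)  # ['s', 'mi', 'th']
print(convert_chunks(chunks, {"s": "ス", "mi": "ミ", "th": "ス"}))  # スミス
```

Splitting the task this way keeps each step a plain sequence-labeling problem; errors in step 1 propagate to step 2, which is one motivation for optimizing the two jointly.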

  • Detecting New Words from Chinese Text Using Latent Semi-CRF Models

    Xiao SUN, Degen HUANG, Fuji REN

    PAPER-Natural Language Processing

      Vol: E93-D, No: 6
      Page(s): 1386-1393

    Chinese new words and their parts of speech (POS) are particularly problematic in Chinese natural language processing. With the fast development of the internet and information technology, it is impossible to obtain a complete system dictionary for Chinese natural language processing, as new words outside the basic system dictionary are constantly being created. A latent semi-CRF model, which combines the strengths of LDCRF (Latent-Dynamic Conditional Random Field) and semi-CRF, is proposed to detect new words together with their POS simultaneously, regardless of the type of new word, from Chinese text that has not been pre-segmented. Unlike the original semi-CRF, the LDCRF is applied to generate the candidate entities for training and testing the latent semi-CRF, which accelerates training and decreases the computation cost. The complexity of the latent semi-CRF can be further adjusted by tuning the number of hidden variables in the LDCRF and the number of candidate entities taken from the N-best outputs of the LDCRF. A new-word-generating framework is proposed for model training and testing, under which the definitions and distributions of new words conform to those found in real text. Specific features called "Global Fragment Information" are adopted for new word detection and POS tagging in model training and testing. The experimental results show that the proposed method is capable of detecting even low-frequency new words together with their POS tags. The proposed model is found to perform competitively with the state-of-the-art models presented.
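Semi-CRFs score whole segments rather than single characters, and decoding searches over segmentations. The Python sketch below shows semi-Markov Viterbi search for the best unlabeled segmentation; a full semi-CRF would also choose a label (such as a POS tag) per segment and add label-transition scores. Here `max_len` plays a pruning role comparable to the LDCRF candidate generation described above, and the scoring function is a toy assumption.

```python
from typing import Callable, List, Tuple


def semi_markov_viterbi(
    n: int, max_len: int, segment_score: Callable[[int, int], float]
) -> Tuple[List[Tuple[int, int]], float]:
    """Best split of positions [0, n) into segments of length <= max_len.

    segment_score(i, j) scores the segment covering positions i .. j-1.
    """
    best = [float("-inf")] * (n + 1)  # best[j]: best score of a split of [0, j)
    back = [0] * (n + 1)  # back[j]: start of the last segment in that split
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            score = best[i] + segment_score(i, j)
            if score > best[j]:
                best[j], back[j] = score, i
    # Recover the segments by walking the back-pointers from the end
    segments: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    segments.reverse()
    return segments, best[n]


def toy_score(i: int, j: int) -> float:
    """Hypothetical scores: two known two-character words, a bonus for single characters."""
    known = {(0, 2): 2.0, (2, 4): 2.0}
    if (i, j) in known:
        return known[(i, j)]
    return 0.5 if j - i == 1 else -1.0


# A four-character string split into two two-character words
print(semi_markov_viterbi(4, 2, toy_score))  # ([(0, 2), (2, 4)], 4.0)
```

The search costs O(n * max_len) calls to `segment_score`, which is why limiting the candidate segments matters for training and decoding speed.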

  • Improved Sequential Dependency Analysis Integrating Labeling-Based Sentence Boundary Detection

    Takanobu OBA, Takaaki HORI, Atsushi NAKAMURA

    PAPER-Natural Language Processing

      Vol: E93-D, No: 5
      Page(s): 1272-1281

    A dependency structure describes modification relationships between words or phrases and is recognized as an important element in semantic information analysis. Conventional approaches for extracting this dependency structure assume that the complete sentence is known before the analysis starts. For spontaneous speech data, however, this assumption does not necessarily hold, since sentence boundaries are not marked in the data. Although sentence boundaries can be detected before dependency analysis, this cascaded implementation is not suitable for online processing because it delays the responses of the application. To solve these problems, we proposed a sequential dependency analysis (SDA) method for online spontaneous speech processing, which enabled us to analyze incomplete sentences sequentially and detect sentence boundaries simultaneously. In this paper, we propose an improved SDA that integrates a labeling-based sentence boundary detection (SntBD) technique based on Conditional Random Fields (CRFs). In the new method, we use a CRF for soft decisions on sentence boundaries and combine it with SDA to retain its online framework. Since CRF-based SntBD yields better estimates of sentence boundaries, SDA can provide better results in which the dependency structure and sentence boundaries are consistent. Experimental results using spontaneous lecture speech from the Corpus of Spontaneous Japanese show that our improved SDA outperforms the original SDA in SntBD accuracy, providing better dependency analysis results.
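A CRF can supply soft sentence-boundary decisions as posterior probabilities rather than hard labels. This Python sketch computes per-word marginals with the forward-backward algorithm, assuming a two-label encoding where label 1 means "sentence boundary after this word"; the scores are placeholders, and how SDA consumes these probabilities is not reproduced.

```python
import math
from typing import List, Sequence


def _logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def boundary_posteriors(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[float]:
    """Marginal P(label 1 at t) for a linear-chain CRF, computed by forward-backward.

    Label 0 = no boundary, label 1 = sentence boundary after word t (assumed encoding).
    """
    T, K = len(emissions), len(transitions)
    # Forward pass: alpha[t][y] = log-sum of scores of prefixes ending in y at t
    alpha = [list(emissions[0])]
    for t in range(1, T):
        alpha.append([
            _logsumexp([alpha[t - 1][a] + transitions[a][y] for a in range(K)]) + emissions[t][y]
            for y in range(K)
        ])
    # Backward pass: beta[t][y] = log-sum of scores of suffixes after t, given y at t
    beta = [[0.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            _logsumexp([transitions[y][b] + emissions[t + 1][b] + beta[t + 1][b] for b in range(K)])
            for y in range(K)
        ]
    log_z = _logsumexp(alpha[-1])
    return [math.exp(alpha[t][1] + beta[t][1] - log_z) for t in range(T)]


# With zero transitions, each position reduces to a two-way softmax of its own scores
print(boundary_posteriors([[0.0, 0.0], [0.0, math.log(3)]], [[0.0, 0.0], [0.0, 0.0]]))
# approximately [0.5, 0.75]
```

Keeping the probabilities, rather than thresholding them immediately, lets a downstream analyzer weigh boundary evidence against its own dependency hypotheses.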

  • Experimental Study on a Two Phase Method for Biomedical Named Entity Recognition

    Seonho KIM, Juntae YOON

    PAPER-Natural Language Processing

      Vol: E90-D, No: 7
      Page(s): 1103-1110

    In this paper, we describe a two-phase method for biomedical named entity recognition consisting of term boundary detection and biomedical category labeling. Term boundary detection can be defined as the task of assigning label sequences to a given sentence, and biomedical category labeling can be viewed as a local classification problem that does not need knowledge of the labels of other named entities in the sentence. The advantage of dividing the recognition process into two phases is that we can measure the effectiveness of the models at each phase and separately select the appropriate model for each subtask. To obtain better performance in biomedical named entity recognition, we conducted comparative experiments using several learning methods at each phase. Moreover, the results of these machine-learning-based models are refined by rule-based postprocessing. We tested our methods on the JNLPBA 2004 shared task and the GENIA corpus.
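The two-phase split can be sketched as follows: phase 1 output (B/I/O boundary tags) is turned into term spans, and phase 2 classifies each span on its own, without looking at the labels of other terms. The keyword rule in `toy_classifier` is a hypothetical stand-in for the learned models the authors compare, and the sentence and tags are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def spans_from_bio(tags: Sequence[str]) -> List[Tuple[int, int]]:
    """Phase 1 output -> (start, end) spans, end exclusive, from B/I/O tags."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new term begins; close any term that was still open
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def label_terms(
    tokens: Sequence[str],
    tags: Sequence[str],
    classify: Callable[[Sequence[str]], str],
) -> List[Tuple[str, str]]:
    """Phase 2: label each detected term independently of the other terms."""
    return [(" ".join(tokens[s:e]), classify(tokens[s:e])) for s, e in spans_from_bio(tags)]


def toy_classifier(term_tokens: Sequence[str]) -> str:
    """Hypothetical keyword rule standing in for a trained local classifier."""
    return "cell_type" if term_tokens[-1].lower() in {"cell", "cells"} else "protein"


tokens = ["NF-kappa", "B", "activates", "human", "T", "cells"]
tags = ["B", "I", "O", "B", "I", "I"]
print(label_terms(tokens, tags, toy_classifier))
# [('NF-kappa B', 'protein'), ('human T cells', 'cell_type')]
```

Because `classify` sees one span at a time, either phase can be swapped for a different model without touching the other, which is the modularity the abstract highlights.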

  • Variable Spreading and Chip Repetition Factors (VSCRF)-CDMA in Reverse Link for Broadband Packet Wireless Access

    Yoshikazu GOTO, Teruo KAWAMURA, Hiroyuki ATARASHI, Mamoru SAWAHASHI

    PAPER

      Vol: E88-B, No: 2
      Page(s): 509-519

    This paper proposes Variable Spreading and Chip Repetition Factors (VSCRF)-Code Division Multiple Access (CDMA) broadband packet wireless access in the reverse link, which flexibly supports employing the same air interface in various radio environments, such as a cellular system with a multi-cell configuration and local areas such as very-small-cell, indoor, and isolated-cell environments. In VSCRF-CDMA, we propose two schemes: the first is a combination of time-domain spreading with an orthogonal code and chip repetition that achieves orthogonal multiple access in the frequency domain by utilizing a comb-shaped frequency spectrum, and the other is adaptive control of the spreading factor and chip repetition factor according to the cell configuration, the number of simultaneously accessing users, propagation channel conditions, and major radio link parameters. Simulation results show that the proposed VSCRF-CDMA with a spreading factor, SFD, of four and a chip repetition factor, CRF, of four improves the required average received signal energy per bit-to-noise power spectrum density ratio (Eb/N0) at an average packet error rate of 10^-2 by approximately 2.0 dB compared with DS-CDMA employing only SFD = 16, assuming four simultaneously accessing users in an exponentially decaying six-path Rayleigh fading channel with two-branch diversity reception.
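Chip repetition is what produces the comb-shaped spectrum: repeating a block of N chips CRF times makes the DFT vanish everywhere except at every CRF-th frequency bin, because the repeated copies add coherently only at those bins. The Python sketch below checks this numerically with a naive DFT; the chip block is arbitrary, and time-domain spreading with the orthogonal code is not modelled.

```python
import cmath
from typing import List, Sequence


def repeat_chips(block: Sequence[complex], crf: int) -> List[complex]:
    """Chip repetition: send the whole chip block crf times back to back."""
    return list(block) * crf


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def occupied_bins(x: Sequence[complex], tol: float = 1e-9) -> List[int]:
    """Indices of DFT bins whose magnitude exceeds tol."""
    return [k for k, value in enumerate(dft(x)) if abs(value) > tol]


# A 4-chip block repeated 4 times (CRF = 4): only every 4th bin is occupied
print(occupied_bins(repeat_chips([1, -1, 1, 1], 4)))  # [0, 4, 8, 12]
```

The unoccupied bins between the comb teeth are what leave room for other users' signals in the frequency domain.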

  • Circuit Analysis and Design of Low-Power CMOS Tapered Buffer

    Kuo-Hsing CHENG, Wei-Bin YANG

    PAPER-Electronic Circuits

      Vol: E86-C, No: 5
      Page(s): 850-858

    Reducing power dissipation and transient voltage drops in CMOS power distribution networks is important for high-speed deep-submicrometer CMOS integrated circuits. In this paper, three CMOS buffers based on the charge-transfer, split-path, and bootstrapped techniques are proposed to reduce power dissipation and the transient voltage drop in the power supply. First, an inverted-delay unit is used in the low-power inverted-delay-unit (LPID) CMOS buffer to eliminate the short-circuit current of the output stage. Second, the low-swing bootstrapped feedback-controlled split-path (LBFS) CMOS buffer is proposed to eliminate the short-circuit current of the output stage by using the feedback-controlled split-path method. The dynamic power dissipation of the LBFS CMOS buffer can be reduced by limiting the gate voltage swing of the output stage. Moreover, the propagation delay of the LBFS CMOS buffer is also reduced by the non-full-swing gate voltage of the output stage. Third, a charge-recovery scheme is used in the charge-transfer feedback-controlled 4-split-path (CRFS) CMOS buffer to recover charge and pull up the gate voltage of the output stage, reducing the power-delay product and power line noise. Based on HSPICE simulation results, the power-delay product and the transient voltage drop in the power supply of the three proposed CMOS buffers can be reduced by 20% to 40% compared with a conventional CMOS tapered buffer under various capacitive loads.
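The power-delay product (PDP) is average power multiplied by propagation delay, and the reductions quoted for these buffers are relative to a conventional tapered buffer. A small Python worked example with made-up numbers (not the paper's HSPICE figures) shows how such a percentage is computed; the function names are illustrative.

```python
def power_delay_product(avg_power_w: float, delay_s: float) -> float:
    """PDP = average power x propagation delay, in joules."""
    return avg_power_w * delay_s


def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of proposed vs baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Made-up illustrative numbers, not simulation results
conventional = power_delay_product(2.0e-3, 0.50e-9)  # 2.0 mW, 0.50 ns -> 1.0 pJ
proposed = power_delay_product(1.5e-3, 0.45e-9)  # 1.5 mW, 0.45 ns -> 0.675 pJ
print(f"{percent_reduction(conventional, proposed):.1f}%")  # 32.5%
```

Because PDP multiplies the two quantities, a buffer can improve it by lowering power, shortening delay, or both, which is why the LBFS design's reduced swing and reduced delay both count.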