IEICE global.ieice.org Site

Author Search Result

[Author] Rong HUANG(2hit)

1-2hit

Scene Character Detection and Recognition with Cooperative Multiple-Hypothesis Framework
Rong HUANG Palaiahnakote SHIVAKUMARA Yaokai FENG Seiichi UCHIDA

PAPER-Image Recognition, Computer Vision

Vol:
E96-D No:10
Page(s):
2235-2244
To handle the variety of scene characters, we propose a cooperative multiple-hypothesis framework which consists of an image operator set module, an Optical Character Recognition (OCR) module and an integration module. Multiple image operators activated by multiple parameters probe suspected character regions. The OCR module is then applied to each suspected region and returns multiple candidates with weight values for future integration. Without the aid of the heuristic rules which impose constraints on segmentation area, aspect ratio, color consistency, text line orientations, etc., the integration module automatically prunes the redundant detection/recognition and pads the missing detection/recognition. The proposed framework bridges the gap between scene character detection and recognition, in the sense that a practical OCR engine is effectively leveraged for result refinement. In addition, the proposed method achieves the detection and recognition at the character level, which enables dealing with special scenarios such as single character, text along arbitrary orientations or text along curves. We perform experiments on the benchmark ICDAR 2011 Robust Reading Competition dataset which includes a text localization task and a word recognition task. The quantitative results demonstrate that multiple hypotheses outperform a single hypothesis, and be comparable with state-of-the-art methods in terms of recall, precision, F-measure, character recognition rate, total edit distance and word recognition rate. Moreover, two additional experiments are conducted to confirm the simplicity of parameter setting in this proposal.
A CNN-Based Multi-Scale Pooling Strategy for Acoustic Scene Classification
Rong HUANG Yue XIE

LETTER-Speech and Hearing

Pubricized:
2023/10/17
Vol:
E107-D No:1
Page(s):
153-156
Acoustic scene classification (ASC) is a fundamental domain within the realm of artificial intelligence classification tasks. ASC-based tasks commonly employ models based on convolutional neural networks (CNNs) that utilize log-Mel spectrograms as input for gathering acoustic features. In this paper, we designed a CNN-based multi-scale pooling (MSP) strategy for ASC. The log-Mel spectrograms are utilized as the input to CNN, which is partitioned into four frequency axis segments. Furthermore, we devised four CNN channels to acquire inputs from distinct frequency ranges. The high-level features extracted from outputs in various frequency bands are integrated through frequency pyramid average pooling layers at multiple levels. Subsequently, a softmax classifier is employed to classify different scenes. Our study demonstrates that the implementation of our designed model leads to a significant enhancement in the model's performance, as evidenced by the testing of two acoustic datasets.

Author Search Result

[Author] Rong HUANG(2hit)

Scene Character Detection and Recognition with Cooperative Multiple-Hypothesis Framework

A CNN-Based Multi-Scale Pooling Strategy for Acoustic Scene Classification

Latest Issue

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles