IEICE global.ieice.org Site

Keyword Search Result

[Keyword] arc(1309hit)

441-460hit(1309hit)

Adaptive Tree Search Algorithm Based on Path Metric Ratio for MIMO Systems
Bong-seok KIM Kwonhue CHOI

PAPER-Wireless Communication Technologies

Vol:
E94-B No:4
Page(s):
997-1005
We propose new adaptive tree search algorithms for multiple-input multiple-output (MIMO) systems based on path metric comparison. With a fixed number of survivor paths, the correct path metric may be temporarily larger than the maximum path metric of the survivor paths under an ill-conditioned channel. There have also been adaptive path metric algorithms that control the number of survivor paths according to SNR. However, these algorithms cannot instantaneously adapt to the channel condition. The proposed algorithms accomplish dynamic adaptation based on the ratio of the two smallest path metrics, since the minimum is significantly smaller than the second minimum under good channel conditions and vice versa. The proposed algorithms are much less complex than the conventional noise variance-based adaptive tree search algorithms while keeping lower or similar error performance. We first apply the proposed adaptive tree search idea to K-best detection and then extend it to QRD-M MIMO detection.
A 530 Mpixels/s Intra Prediction Architecture for Ultra High Definition H.264/AVC Encoder
Gang HE Dajiang ZHOU Jinjia ZHOU Tianruo ZHANG Satoshi GOTO

PAPER

Vol:
E94-C No:4
Page(s):
419-427
Intra coding in H.264/AVC significantly enhances video compression efficiency. However, because of the high data dependency of intra prediction in H.264, both pipelining and parallel processing techniques are difficult to apply. Moreover, it is difficult to achieve high hardware utilization and throughput because of the long block/MB-level reconstruction loops. This paper proposes a high-performance intra prediction architecture that supports the H.264/AVC high profile. The proposed MB/block co-reordering avoids data dependency and improves pipeline utilization. Therefore, the timing constraint of real-time 4096×2160 encoding can be met with negligible quality loss. A 16×16 prediction engine and an 8×8 prediction engine work in parallel for prediction and coefficient generation. A reordering interlaced reconstruction is also designed for the fully pipelined architecture. It takes only 160 cycles to process one macroblock (MB). Hardware utilization of the prediction and reconstruction modules is almost 100%. Furthermore, a PE-reusable 8×8 intra predictor and a hybrid SAD & SATD mode decision are proposed to save hardware cost. The design is implemented in 90 nm CMOS technology with 113.2 k gates and can encode 4096×2160 video sequences at 60 fps at an operating frequency of 332 MHz.
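The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes 16×16 macroblocks, a 4096×2160 frame at 60 fps, and exactly 160 cycles per macroblock with no stalls; the function names are illustrative and do not come from the paper.

```python
# Cross-check of the throughput figures quoted in the abstract.
# Assumptions: 16x16 macroblocks, no pipeline stalls, one MB per 160 cycles.

MB_SIZE = 16  # macroblock edge length in pixels


def pixel_rate(width, height, fps):
    """Pixels that must be processed per second."""
    return width * height * fps


def required_clock_hz(width, height, fps, cycles_per_mb, mb_size=MB_SIZE):
    """Minimum clock frequency to process every macroblock in real time."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return mbs_per_frame * fps * cycles_per_mb


print(f"{pixel_rate(4096, 2160, 60) / 1e6:.1f} Mpixels/s")        # 530.8 Mpixels/s
print(f"{required_clock_hz(4096, 2160, 60, 160) / 1e6:.1f} MHz")  # 331.8 MHz
```

Under these assumptions the result agrees with the 530 Mpixels/s in the title and the 332 MHz operating frequency in the abstract.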
Optimal Pivot Selection Method Based on the Partition and the Pruning Effect for Metric Space Indexes
Hisashi KURASAWA Daiji FUKAGAWA Atsuhiro TAKASU Jun ADACHI

PAPER

Vol:
E94-D No:3
Page(s):
504-514
This paper proposes a new method to reduce the cost of nearest neighbor searches in metric spaces. Many similarity search indexes recursively divide a region into subregions by using pivots, and construct a tree-structured index. Most recently developed indexes focus on pruning objects and do not pay much attention to tree balancing. As a result, indexes with an imbalanced tree structure may be constructed, and search performance degrades. We propose a similarity search index called the Partitioning Capacity (PC) Tree. It selects the optimal pivot in terms of the PC, which quantifies the balance of the regions partitioned by a pivot as well as the estimated effectiveness of the search pruning by the pivot. As a result, PCTree reduces the search cost for various data distributions. We experimentally compared PCTree with four indexes using synthetic data and five real datasets. The experimental results show that the PCTree successfully reduces the search cost.
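As background for the pruning effect mentioned above, here is a minimal sketch of generic pivot-based pruning with the triangle inequality. It is not the PCTree algorithm and does not compute the PC measure, which the abstract does not define; the function name `range_search` and the one-dimensional data are hypothetical.

```python
# Generic pivot-based pruning for a range query in a metric space.
# Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o), so an object o can be
# skipped without computing d(q, o) whenever |d(q, p) - d(o, p)| > radius.

def range_search(objects, pivot, query, radius, dist):
    """Return (matches, number of query-object distances actually computed)."""
    # In a real index these pivot distances are precomputed at build time.
    pivot_dists = [(obj, dist(obj, pivot)) for obj in objects]
    d_query_pivot = dist(query, pivot)

    matches, computed = [], 0
    for obj, d_obj_pivot in pivot_dists:
        if abs(d_query_pivot - d_obj_pivot) > radius:
            continue  # pruned by the triangle inequality
        computed += 1
        if dist(query, obj) <= radius:
            matches.append(obj)
    return matches, computed


points = [0, 1, 2, 5, 9, 10]
matches, computed = range_search(points, pivot=0, query=9, radius=1.5,
                                 dist=lambda a, b: abs(a - b))
print(matches, computed)  # [9, 10] 2
```

In this toy case four of the six objects are discarded using only the stored pivot distances, which is the kind of saving a well-chosen pivot is meant to provide.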
Low Complexity Filter Architecture for ATSC Terrestrial Broadcasting DTV Systems
Yong-Kyu KIM Chang-Seok CHOI Hanho LEE

PAPER-VLSI Design Technology and CAD

Vol:
E94-A No:3
Page(s):
937-945
This paper presents a low-complexity partially folded architecture of a transposed FIR filter and a cubic B-spline interpolator for ATSC terrestrial broadcasting systems. By using a multiplexer, the proposed FIR filter and interpolator can provide high clock frequency and low hardware complexity. A binary representation method was used for designing the high-order FIR filter. Also, in order to compensate for the truncation error of the FIR filter outputs, a fixed-point range detection method was used. The proposed partially folded architecture was designed and implemented with 90-nm CMOS technology with a supply voltage of 1.1 V. The implementation results show that the proposed architectures have 12% and 16% less hardware complexity than the other kinds of architecture. Also, the filter and the interpolator operate at clock frequencies of 200 MHz and 385 MHz, respectively.
Page History Explorer: Visualizing and Comparing Page Histories
Adam JATOWT Yukiko KAWAI Katsumi TANAKA

PAPER

Vol:
E94-D No:3
Page(s):
564-577
Due to increased preservation efforts, large amounts of past Web data have been stored in Web archives and other archival repositories. Utilizing this data can offer certain benefits to users; for example, it can facilitate page understanding. In this paper, we propose a system for interactive exploration of page histories. We demonstrate an application called Page History Explorer (PHE) for summarizing and visualizing histories of Web pages. PHE portrays an overview of page evolution, characterizes a page's typical content over time, and lets users observe page histories from different viewpoints. In addition, it enables flexible comparison of histories of different pages.
Query Expansion and Text Mining for ChronoSeeker -- Search Engine for Future/Past Events --
Hideki KAWAI Adam JATOWT Katsumi TANAKA Kazuo KUNIEDA Keiji YAMADA

PAPER

Vol:
E94-D No:3
Page(s):
552-563
This paper introduces a future and past search engine, ChronoSeeker, which can help users to develop long-term strategies for their organizations. To provide on-demand searches, we tackled two technical issues: (1) organizing efficient event searches and (2) filtering out noise from search results. Our system employs query expansion with typical expressions related to event information, such as year expressions, temporal modifiers, and context terms, for efficient event searches. We use a machine-learning technique for noise filtering that classifies candidates into event information or non-event information, using heuristic features and lexical patterns derived from a text-mining approach. Our experiment revealed that filtering achieved an 85% F-measure, and that query expansion could collect dozens more events than searches without expansion.
A General Reverse Converter Architecture with Low Complexity and High Performance
Keivan NAVI Mohammad ESMAEILDOUST Amir SABBAGH MOLAHOSSEINI

PAPER-Computer System

Vol:
E94-D No:2
Page(s):
264-273
This paper presents a general architecture for designing efficient reverse converters based on the moduli set $\{2^{\alpha}, 2^{2\beta+1}-1, 2^{\beta}-1\}$, where $\beta < \alpha \le 2\beta$, by using a parallel implementation of the mixed-radix conversion (MRC) algorithm. The moduli set $\{2^{\alpha}, 2^{2\beta+1}-1, 2^{\beta}-1\}$ is free of moduli of the form $2^{k}+1$, which can result in an efficient arithmetic unit for the residue number system (RNS). The values of $\alpha$ and $\beta$ can be selected to provide the required dynamic range (DR) and also to adjust the desired balance between moduli bit-widths. The simple multiplicative inverses of the proposed moduli set, together with novel techniques to simplify the conversion equations, lead to a low-complexity and high-performance general reverse converter architecture that can be used to support different DRs. Moreover, due to the current importance of the 5n-bit DR moduli sets, we also introduce the moduli set $\{2^{2n}, 2^{2n+1}-1, 2^{n}-1\}$, which is a special case of the general set $\{2^{\alpha}, 2^{2\beta+1}-1, 2^{\beta}-1\}$ with $\alpha=2n$ and $\beta=n$. The converter for this special set is derived from the presented general architecture and is faster than the fastest state-of-the-art reverse converter designed for the 5n-bit DR moduli set $\{2^{2n}, 2^{2n+1}-1, 2^{n}-1\}$. Furthermore, theoretical and FPGA implementation results show that the proposed reverse converter for the moduli set $\{2^{2n}, 2^{2n+1}-1, 2^{n}-1\}$ yields a considerable improvement in conversion delay with lower hardware requirements compared to other works with similar DR.
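To make the reverse-conversion step concrete, the sketch below applies textbook sequential mixed-radix conversion to the moduli set {2^α, 2^(2β+1)-1, 2^β-1}. It is a software illustration only, not the parallel hardware architecture of the paper; the helper names are hypothetical, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
# Textbook mixed-radix conversion (MRC) for the three-moduli RNS
# {2^alpha, 2^(2*beta+1) - 1, 2^beta - 1} with beta < alpha <= 2*beta.
from math import gcd


def moduli_set(alpha, beta):
    """Build the moduli set and check the constraint stated in the abstract."""
    if not (beta < alpha <= 2 * beta):
        raise ValueError("requires beta < alpha <= 2*beta")
    return (2 ** alpha, 2 ** (2 * beta + 1) - 1, 2 ** beta - 1)


def to_residues(x, moduli):
    """Forward conversion: integer -> residues."""
    return tuple(x % m for m in moduli)


def mrc_reverse(residues, moduli):
    """Reverse conversion: residues -> integer in [0, product of moduli)."""
    digits = []
    for i, m in enumerate(moduli):
        t = residues[i]
        # Peel off earlier mixed-radix digits, dividing by each earlier modulus.
        for j in range(i):
            t = ((t - digits[j]) * pow(moduli[j], -1, m)) % m
        digits.append(t)
    # x = a1 + a2*m1 + a3*m1*m2
    x, weight = 0, 1
    for digit, m in zip(digits, moduli):
        x += digit * weight
        weight *= m
    return x


mods = moduli_set(alpha=4, beta=3)  # (16, 127, 7)
assert all(gcd(a, b) == 1 for a in mods for b in mods if a != b)
value = 12345
assert mrc_reverse(to_residues(value, mods), mods) == value
```

The round trip holds for every value below the dynamic range, which is the product of the three moduli (14224 for α=4, β=3).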
An All-Zero Block Mode Decision Algorithm for H.264/AVC Optimization
Chaoke PEI Li GAO Donghui WANG Chaohuan HOU

LETTER-Image Processing and Video Processing

Vol:
E94-D No:2
Page(s):
384-387
The H.264/AVC standard achieves significantly high coding efficiency if multiple block size Motion Estimation is adopted. However, the complexity of Motion Estimation and DCT is dramatically increased as a result. In previous work we proposed an early mode decision algorithm to control the complexity, based on all-zero-block detection at the 16×16 size. In this paper, we improve the algorithm. First, we propose to detect all-zero blocks at the 16×16, 8×8, and 4×4 sizes to simplify the mode decision process. Second, we define the thresholds that are used to terminate motion estimation and mode decision early for these sizes. Last, we present the complete proposed algorithm. Experiments show that about 77% of encoding time and 85% of motion estimation time can be saved on average, which is better than state-of-the-art approaches.
Improving Keyword Match for Semantic Search
Hangkyu KIM Chang-Sup PARK Yoon Joon LEE

LETTER-Artificial Intelligence, Data Mining

Vol:
E94-D No:2
Page(s):
375-378
Semantic search can be divided into three steps. Keyword matching, the first step, significantly impacts the search results, since the following steps are based on it. In this paper, we propose a keyword matching methodology that aggregates relevance scores of the related text to define the score of an object. Validity of the approach is shown by experiments performed with three public data sets and the detailed analysis of the results.
A Differential Cross-Correlation Cell Search Algorithm for IEEE 802.16e OFDMA Systems
Juinn-Horng DENG Jeng-Kuang HWANG Shu-Min LIAO

LETTER-Wireless Communication Technologies

Vol:
E94-B No:2
Page(s):
587-590
A differential cross-correlation cell ID identification algorithm is proposed for IEEE 802.16e OFDMA cellular systems. The cell ID represents the number of the preamble selected by the base station in downlink mode. First, we construct the downlink (DL) preamble structure and signal model with carrier frequency offset (CFO) and channel effects. Next, in order to achieve initial synchronization, a differential receiver with cross-correlation over all preamble patterns is proposed to search for the cell ID. Simulation results confirm that the proposed structure is suitable for ITU fading channels and outperforms the conventional cell search system.
Post-Routing Double-Via Insertion for X-Architecture Clock Tree Yield Improvement
Chia-Chun TSAI Chung-Chieh KUO Trong-Yen LEE

PAPER-VLSI Design Technology and CAD

Vol:
E94-A No:2
Page(s):
706-716
As VLSI manufacturing technology shrinks to 65 nm and below, reducing the yield loss induced by via failures is a critical issue in design for manufacturability (DFM). Semiconductor foundries highly recommend using the double-via insertion (DVI) method to improve the yield and reliability of designs. This work applies the DVI method in the post-routing stage of X-architecture clock routing to improve the double-via insertion rate. The proposed DVI-X algorithm constructs the bipartite graphs of the partitioned clock routing layout with single vias and redundant-via candidates (RVCs). Then, DVI-X applies the augmenting path approach associated with the construction of the maximal cliques to obtain the matching solution from the bipartite graphs. Experimental results on benchmarks show that DVI-X achieves a 3% higher double-via insertion rate and a 68% shorter running time than existing works. Moreover, because the inserted double vias affect the clock skew, a skew tuning technique is further applied to achieve zero skew.
An Instruction Mapping Scheme for FU Array Accelerator
Kazuhiro YOSHIMURA Takuya IWAKAMI Takashi NAKADA Jun YAO Hajime SHIMADA Yasuhiko NAKASHIMA

PAPER-Computer System

Vol:
E94-D No:2
Page(s):
286-297
Recently, we proposed using a Linear Array Pipeline Processor (LAPP) to improve energy efficiency for various workloads such as image processing while maintaining programmability by working on VLIW codes. In this paper, we propose an instruction mapping scheme for LAPP that fully exploits the array execution of functional units (FUs) and bypass networks by using a mapper to fit the VLIW codes onto the FUs. The mapping can be completed within multiple cycles during a data prefetch before the array execution of FUs. According to an HDL-based implementation, the hardware required for the mapping scheme is 84% of the cost introduced by a baseline method. In addition, the proposed mapper can further help to shrink the size of the array stage; our results show that their combination occupies 88% of the area of the baseline model.
Local Search with Probabilistic Modeling for Learning Multiple-Valued Logic Networks
Shangce GAO Qiping CAO Masahiro ISHII Zheng TANG

PAPER-Neural Networks and Bioengineering

Vol:
E94-A No:2
Page(s):
795-805
This paper proposes a probabilistic modeling learning algorithm for the local search approach to Multiple-Valued Logic (MVL) networks. The learning model (PMLS) has two phases: a local search (LS) phase and a probabilistic modeling (PM) phase. The LS performs searches by updating the parameters of the MVL network. It is equivalent to a gradient decrease of the error measures, and leads to a local minimum of error that represents a good solution to the problem. Once the LS is trapped in a local minimum, the PM phase attempts to generate a new starting point for the LS for further search. The further search is expected to be guided to a promising area by the probability model. Thus, the proposed algorithm can escape from local minima and search further for better results. We test the algorithm on many randomly generated MVL networks. Simulation results show that the proposed algorithm is better than other improved local search learning methods, such as stochastic dynamic local search (SDLS) and chaotic dynamic local search (CDLS).
Low-Complexity Algorithm for Log Likelihood Ratios in Coded MIMO Communications
Liming ZHENG Jooin WOO Kazuhiko FUKAWA Hiroshi SUZUKI Satoshi SUYAMA

PAPER-Wireless Communication Technologies

Vol:
E94-B No:1
Page(s):
183-193
This paper proposes a low-complexity algorithm to calculate log likelihood ratios (LLRs) of coded bits, which are necessary for channel decoding in coded MIMO mobile communications. Approximating an LLR requires finding a pair of transmitted signal candidates that maximize the log likelihood function under the constraint that a coded bit is equal to either one or zero. The proposed algorithm can find such a pair simultaneously, whereas conventional ones find them individually. Specifically, the proposed method searches for such candidates in the directions of noise enhancement, using MMSE detection as a starting point. First, an inverse matrix that the MMSE weight matrix includes is obtained, and then the power method derives eigenvectors of the inverse matrix as the directions of noise enhancement. With some eigenvectors, a one-dimensional search and hard decision are performed. From the resultant signals, the required transmitted signal candidates are selected on the basis of the log likelihood function. Computer simulations with 4×4 MIMO-OFDM, 16QAM, and convolutional codes (rate = 1/2, 2/3) demonstrate that the proposed algorithm requires only 1.0 dB more Eb/N0 than maximum likelihood detection (MLD) to achieve a packet error rate of $10^{-3}$, while reducing the complexity to about 0.2% of that of MLD.
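The power method referred to above can be sketched in a few lines. The example below is a generic power iteration on a small real symmetric matrix, chosen for readability; the paper applies it to an inverse matrix arising from the MMSE weights, which is complex-valued in general, and the function names here are hypothetical.

```python
# Generic power iteration: repeatedly multiply by the matrix and renormalize
# to converge toward the eigenvector of the largest-magnitude eigenvalue.
import math


def mat_vec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def power_method(matrix, iterations=100):
    """Return (dominant eigenvalue estimate, unit-norm eigenvector estimate)."""
    # Assumed start vector; it must not be orthogonal to the dominant eigenvector.
    vec = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    eigenvalue = sum(x * y for x, y in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


eigenvalue, eigenvector = power_method([[2.0, 1.0], [1.0, 2.0]])
print(round(eigenvalue, 6))  # 3.0
```

For this matrix the eigenvalues are 3 and 1, so the iteration settles on the direction (1, 1)/√2, which plays the role of a dominant direction in the sense used by the abstract.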
Geometry Splitting: An Acceleration Technique of Quadtree-Based Terrain Rendering Using GPU
Eun-Seok LEE Byeong-Seok SHIN

PAPER-Computer Graphics

Vol:
E94-D No:1
Page(s):
137-145
In terrain visualization, the quadtree is the most frequently used data structure for progressive mesh generation. The quadtree provides an efficient level of detail selection and view frustum culling. However, most applications using quadtrees are performed on the CPU, because the pointer and recursive operation in hierarchical data structure cannot be manipulated in a programmable rendering pipeline. We present a quadtree-based terrain rendering method for GPU (Graphics Processing Unit) execution that uses vertex splitting and triangle splitting. Vertex splitting supports a level of detail selection, and triangle splitting is used for crack removal. This method offers higher performance than previous CPU-based quadtree methods, without loss of image quality. We can then use the CPU for other computations while rendering the terrain using only the GPU.
Improving the Performance of the Hough Detector in Search Radars
Ali MOQISEH Mahdi HADAVI Mohammad M. NAYEBI

PAPER-Sensing

Vol:
E94-B No:1
Page(s):
273-281
In this paper, the inherent problem of the Hough transform when applied to search radars is considered. This problem makes the detection probability of a target depend on the length of the target line in the data space in addition to the received SNR from it. It is shown that this problem results in a non-uniform distribution of noise power in the parameter space. In other words, noise power in some regions of the parameter space is greater than in others. Therefore, the detection probability of the targets covered by these regions will decrease. Our solution is to modify the Hough detector to remove the problem. This modification uses non-uniform quantization in the parameter space based on the Maximum Entropy Quantization method. The details of implementing the modified Hough detector in a search radar are presented according to this quantization method. Then, it is shown that by using this method the detection performance of the target will not depend on its length in the data space. The performance of the modified Hough detector is also compared with the standard Hough detector by considering their probability of detection and probability of false alarm. This comparison shows the performance improvement of the modified detector.
Low-Overhead Architecture for Security Tag
Ryota SHIOYA Daewung KIM Kazuo HORIO Masahiro GOSHIMA Shuichi SAKAI

PAPER-Computer System

Vol:
E94-D No:1
Page(s):
69-78
A security-tagged architecture is one that applies tags to data and tracks data flow in order to detect attacks or information leakage. Previous studies using security-tagged architectures mostly focused on how to utilize tags, not on how the tags are implemented. A naive implementation of tags simply adds a tag field to every byte of the cache and the memory. Such a technique, however, results in a huge hardware overhead. This paper proposes a low-overhead tagged architecture. We achieve our goal by exploiting two properties of tags: non-uniformity and locality of reference. Our design uses a uniquely designed multi-level table and various cache-like structures, all of which help to exploit these properties. Under simulation, our method was able to limit the memory overhead to 0.685%, whereas a naive implementation suffered a 12.5% overhead.
How to Maximize the Potential of FPGA-Based DSPs for Modular Exponentiation
Daisuke SUZUKI Tsutomu MATSUMOTO

PAPER-Implementation

Vol:
E94-A No:1
Page(s):
211-222
This paper describes a modular exponentiation processing method and circuit architecture that can exhibit the maximum performance of FPGA resources. The proposed modular exponentiation architecture comprises three main techniques. The first is to improve the Montgomery multiplication algorithm in order to maximize the performance of the multiplication unit in an FPGA. The second is to balance and improve the circuit delay. The third is to ensure scalability of the circuit. Our architecture can perform fast operations using small-scale resources; in particular, it can complete a 512-bit modular exponentiation in as little as 0.26 ms with the smallest Virtex-4 FPGA, XC4VF12-10SF363. The number of SLICEs used is approximately 4200, which demonstrates the compactness of our design. Moreover, the scalability of our design also allows 1024-, 1536-, and 2048-bit modular exponentiations to be processed in the same circuit.
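For readers unfamiliar with Montgomery multiplication, the following sketch shows the textbook reduction and a square-and-multiply exponentiation built on it. It is not the improved variant or the DSP mapping from the paper; the function names are hypothetical, the modulus is assumed to be odd, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
# Textbook Montgomery multiplication and modular exponentiation (odd modulus).

def montgomery_setup(n):
    """Return (k, n_prime) with R = 2^k > n and n * n_prime == -1 (mod R)."""
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return k, n_prime


def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^-1 mod n for 0 <= a, b < n, using shifts and masks."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k  # exact division by R
    return u - n if u >= n else u


def mont_pow(base, exponent, n):
    """Compute base**exponent mod n with right-to-left square-and-multiply."""
    k, n_prime = montgomery_setup(n)
    r = 1 << k
    r2 = (r * r) % n
    x = mont_mul(base % n, r2, n, k, n_prime)  # base in Montgomery form
    acc = r % n                                # 1 in Montgomery form
    while exponent:
        if exponent & 1:
            acc = mont_mul(acc, x, n, k, n_prime)
        x = mont_mul(x, x, n, k, n_prime)
        exponent >>= 1
    return mont_mul(acc, 1, n, k, n_prime)     # leave Montgomery form


assert mont_pow(5, 117, 1000003) == pow(5, 117, 1000003)
```

The appeal for hardware is that the reduction needs only multiplications, additions, and shifts by a power of two, with no trial division by the modulus.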
A Low-Cost Continuous-Flow Parallel Memory-Based FFT Processor for UWB Applications
Chin-Long WEY Shin-Yo LIN Hsu-Sheng WANG Hung-Lieh CHEN Chun-Ming HUANG

PAPER-VLSI Design Technology and CAD

Vol:
E94-A No:1
Page(s):
315-323
In UWB systems, data symbols are transmitted and received continuously. The Fast Fourier Transform (FFT) processor must be able to seamlessly process input/output data. This paper presents the design and implementation of a continuous data flow parallel memory-based FFT (CF-PMBFFT) processor that does not use an input buffer for pre-loading the input data. The processor uses a memory space of two N-words and multiple processing elements (PEs) to achieve seamless data flow and meet the design requirement. The circuit has been fabricated in a TSMC 0.18 µm 1P6M CMOS process with a supply voltage of 1.8 V. Measurement results of the test chip show that the developed CF-PMBFFT processor occupies a core area of 1.97 mm² with a power consumption of 62.12 mW for a throughput rate of 528 MS/s.
How to Decide Selection Functions for Power Analysis: From the Viewpoint of Hardware Architecture of Block Ciphers
Daisuke SUZUKI Minoru SAEKI Koichi SHIMIZU Tsutomu MATSUMOTO

PAPER-Implementation

Vol:
E94-A No:1
Page(s):
200-210
In this paper we first demonstrate that effective selection functions in power analysis attacks change depending on the circuit architecture of a block cipher. We then conclude that the most resistant architecture on its own, in the case of the loop architecture, has two data registers with separate roles: one for storing the plaintext and ciphertext, and the other for storing intermediate values. There, the pre-whitening operation is placed at the output of the former register. The architecture allows the narrowest range of selection functions and thereby has resistance against ordinary CPA. Thus, we can easily defend against attacks by ordinary CPA at the architectural level, whereas we cannot defend against DPA in this way. Secondly, we propose a new technique called "self-templates" in order to raise the accuracy of evaluation of DPA-based attacks. Self-templates make it possible to differentiate meaningful selection functions for DPA-based attacks without any strong assumption as in the template attack. We also present the results of attacks on an AES co-processor on an ASIC and demonstrate the effectiveness of the proposed technique.
