IEICE global.ieice.org Site

Author Search Result

[Author] Takeshi YOSHIMURA(28hit)

1-20hit(28hit)

FOREWORD
Takeshi YOSHIMURA

FOREWORD

Vol:
E79-A No:12
Page(s):
2085-2085
An Engineering Change Orders Design Method Based on Patchwork-Like Partitioning for High Performance LSIs
Yuichi NAKAMURA Ko YOSHIKAWA Takeshi YOSHIMURA

PAPER-Logic Synthesis

Vol:
E88-A No:12
Page(s):
3351-3357
This paper describes a novel engineering change order (ECO) design method for large-scale, high performance LSIs, based on a patchwork-like partitioning technique. In conventional design methods, even when only small changes are made to the design after the placement and routing process, a whole re-layout must be done, and this is very time consuming. Using the proposed method, we can partition the design into several parts after logic synthesis. When design changes occur in HDL, only the parts related to the changes need to be redesigned. The netlist for the changed design remains almost the same as the original, except for the small changed parts. For partitioning, we used multiple-fan-out-points as partition borders. An experimental evaluation of our method showed that when a small change was made in the RTL description, the revised circuit part had only about 87 gates on average. This greatly reduces the re-layout time required for implementing an ECO. In actual commercial designs in which several design changes are required, it takes only one day to redesign.
Multiple-Reference Compression of RTP/UDP/IP Headers for Mobile Multimedia Communications
Takeshi YOSHIMURA Toshiro KAWAHARA Tomoyuki OHYA Minoru ETOH

PAPER

Vol:
E85-A No:7
Page(s):
1491-1500
In this paper, we propose an RTP/UDP/IP header compression method, Multiple-Reference Compression (MRC), designed for mobile multimedia communications. MRC calculates differences from multiple reference headers that have already been sent and inserts them into a compressed header. The receiver can decompress the compressed header as long as at least one of the reference headers has been correctly received and decompressed. MRC improves robustness against packet losses compared with CRTP, defined in IETF RFC 2508, and imposes less overhead and computational burden than robust header compression (ROHC), defined in RFC 3095. We also implemented MRC and other header compression algorithms on our mobile testbed and conducted multimedia streaming experiments over it. The results show that MRC offers the same level of packet loss rate as Legacy RTP for both audio and video streams, and provides better media quality than Legacy RTP and CRTP on error-prone radio links. Header compression that is robust against packet losses is expected to be a key technology for VoIP and multimedia streaming services over 3G and future mobile networks.
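
The multiple-reference idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it compresses a single integer header field (for example an RTP sequence number), and the function names `compress` and `decompress` and the `(reference_id, delta)` layout are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def compress(value: int, references: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Encode `value` as one delta per reference header.

    `references` holds (reference_id, reference_value) pairs for headers
    that were already sent (hypothetical layout, not from the paper).
    """
    return [(ref_id, value - ref_value) for ref_id, ref_value in references]


def decompress(deltas: List[Tuple[int, int]], received: Dict[int, int]) -> Optional[int]:
    """Recover the value from the first reference the receiver actually holds.

    `received` maps reference_id -> value for references that arrived and
    were decompressed correctly. Returns None if every reference was lost.
    """
    for ref_id, delta in deltas:
        if ref_id in received:
            return received[ref_id] + delta
    return None


# Sender: headers 1 and 2 (values 100 and 101) serve as references for value 103.
compressed = compress(103, [(1, 100), (2, 101)])

# Receiver lost header 1 but still holds header 2, so decoding succeeds.
recovered = decompress(compressed, {2: 101})
```

With a single reference, losing that one header would make the compressed header undecodable; carrying a delta per reference is what lets the receiver tolerate the loss in this sketch.
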
Mobility Overlap-Removal-Based Leakage Power and Register-Aware Scheduling in High-Level Synthesis
Nan WANG Song CHEN Wei ZHONG Nan LIU Takeshi YOSHIMURA

PAPER-VLSI Design Technology and CAD

Vol:
E97-A No:8
Page(s):
1709-1719
Scheduling is a key problem in high level synthesis, as the scheduling results affect most of the important design metrics. In this paper, we propose a novel scheduling method to simultaneously optimize the leakage power of functional units with dual-Vth techniques and the number of registers under given timing and resource constraints. The mobility overlaps between operations are removed to eliminate data dependencies, and a simulated-annealing-based method is introduced to explore the mobility overlap removal solution space. Given the overlap-free mobilities, the resource usage and register usage in each control step can be accurately estimated. Meanwhile, operations are scheduled so as to optimize the leakage power of functional units with minimal number of registers. Then, a set of operations is iteratively selected, reassigned as low-Vth, and rescheduled until the resource constraints are all satisfied. Experimental results show the efficiency of the proposed algorithm.
Unified Parameter Decoder Architecture for H.265/HEVC Motion Vector and Boundary Strength Decoding
Shihao WANG Dajiang ZHOU Jianbin ZHOU Takeshi YOSHIMURA Satoshi GOTO

PAPER

Vol:
E98-A No:7
Page(s):
1356-1365
In this paper, the VLSI architecture design of a unified motion vector (MV) and boundary strength (BS) parameter decoder (PDec) for an 8K UHDTV HEVC decoder is presented. The adoption of new coding tools in PDec, such as Advanced Motion Vector Prediction (AMVP), increases the VLSI hardware realization overhead and memory bandwidth requirement, especially for 8K UHDTV applications. We propose four techniques to address these challenges. Firstly, this work unifies the MV and BS parameter decoders for line buffer memory sharing. Secondly, to support high throughput, we propose a top-level CU-adaptive pipeline scheme that trades off implementation complexity against performance. Thirdly, a PDec process engine with optimizations is adopted for 43.2k area reduction. Finally, a PU-based coding scheme is proposed for 30% DRAM bandwidth reduction. In a 90nm process, our design costs 93.3k logic gates with a 23.0kB line buffer. The proposed architecture can support real-time decoding for 7680x4320@60fps applications at 249MHz in the worst case.
High Performance VLSI Architecture of H.265/HEVC Intra Prediction for 8K UHDTV Video Decoder
Jianbin ZHOU Dajiang ZHOU Shihao WANG Takeshi YOSHIMURA Satoshi GOTO

PAPER-High-Level Synthesis and System-Level Design

Vol:
E98-A No:12
Page(s):
2519-2527
8K Ultra High Definition Television (UHDTV) requires extremely high throughput for video decoding based on H.265. In H.265, intra coding could significantly enhance video compression efficiency, at the expense of an increased computational complexity compared with H.264. For intra prediction of 8K UHDTV real-time H.265 decoding, the joint complexity and throughput issue is more difficult to solve. Therefore, based on the divide-and-conquer strategy, we propose a new VLSI architecture in this paper, including two techniques, in order to achieve 8K UHDTV H.265 intra prediction decoding. The first technique is the LUT based Reference Sample Fetching Scheme (LUT-RSFS), reducing the number of reference samples in the worst case from 99 to 13. It further reduces the circuit area and enhances the performance. The second one is the Hybrid Block Reordering and Data Forwarding (HBRDF), minimizing the idle time and eliminating the dependency between TUs by creating 3 Data Forwarding paths. It achieves the hardware utilization of 94%. Our design is synthesized using Synopsys Design Compiler in 40nm process technology. It achieves an operation frequency of 260MHz, with a gate count of 217.8K for 8-bit design, and 251.1K for 10-bit design. The proposed VLSI architecture can support 4320p@120fps H.265 intra decoding (8-bit or 10-bit), with all 35 intra prediction modes and prediction unit sizes ranging from 4×4 to 64×64.
An Efficient Multi-Level Algorithm for 3D-IC TSV Assignment
Cong HAO Takeshi YOSHIMURA

PAPER-VLSI Design Technology and CAD

Vol:
E100-A No:3
Page(s):
776-784
The through-silicon via (TSV) assignment problem is one of the key design challenges of 3-D ICs and is crucial to wire length and signal delay. In this work, we formulate 3-D IC TSV assignment as an Integer Minimum Cost Multi Commodity (IMCMC) problem on an IMCMC network and propose a multi-level algorithm. It coarsens the IMCMC network level by level, applies a rough flow assignment on each level of the coarsened graph, and generates only promising edges to reduce the IMCMC network size. Benefiting from the multi-level structure, we propose a mixed single and multi commodity flow method to improve the TSV assignment solution quality. Moreover, given a TSV assignment, we propose an extended layer-by-layer algorithm to further optimize it. The experimental results demonstrate that our multi-level algorithm with mixed single and multi commodity flow achieves not only smaller wire length but also shorter runtime than existing works.
Framework and VLSI Architecture of Measurement-Domain Intra Prediction for Compressively Sensed Visual Contents
Jianbin ZHOU Dajiang ZHOU Li GUO Takeshi YOSHIMURA Satoshi GOTO

PAPER

Vol:
E100-A No:12
Page(s):
2869-2877
This paper presents a measurement-domain intra prediction coding framework that is compatible with compressive sensing (CS)-based image sensors. In this framework, we propose a low-complexity intra prediction algorithm that can be directly applied to measurements captured by the image sensor. We propose a structural random 0/1 measurement matrix that embeds the block boundary information, which can be extracted from the measurements for intra prediction. Furthermore, a low-cost Very Large Scale Integration (VLSI) architecture is implemented for the proposed framework by substituting shared adders and shifters for the matrix multiplication. The experimental results show that our proposed framework can compress the measurements and increase coding efficiency, with a 34.9% BD-rate reduction compared to the direct output of CS-based sensors. The VLSI architecture of the proposed framework is 9.1 K in area and achieves an 83% reduction in the size of memory bandwidth and storage for the line buffer. This could significantly reduce both the energy consumption and the communication bandwidth of wireless camera systems, which are expected to be massively deployed in the Internet of Things (IoT) era.
Approximate-DCT-Derived Measurement Matrices with Row-Operation-Based Measurement Compression and its VLSI Architecture for Compressed Sensing
Jianbin ZHOU Dajiang ZHOU Takeshi YOSHIMURA Satoshi GOTO

PAPER

Vol:
E101-C No:4
Page(s):
263-272
The compressed sensing based CMOS image sensor (CS-CIS) is a new generation of CMOS image sensor that significantly reduces power consumption. For CS-CIS, the image quality and the output data volume are two important concerns. In this paper, we first propose an algorithm to generate a series of deterministic ternary matrices, which improve the image quality, reduce the data volume, and are compatible with CS-CIS. The proposed matrices are derived from the approximate DCT and trimmed in 2D-zigzag order, thus preserving the energy compaction property of the DCT. Moreover, we propose matrix row operations adapted to the proposed matrix to further compress the data (measurements) without any image quality loss. Finally, a low-cost VLSI architecture for measurement compression with the proposed matrix row operations is implemented. Experimental results show that our proposed matrix significantly improves the coding efficiency, with a BD-PSNR increase of 4.2 dB compared with the random binary matrix used in the state-of-the-art CS-CIS. The proposed matrix row operations for measurement compression further increase the coding efficiency by 0.24 dB BD-PSNR (4.8% BD-rate reduction). The VLSI architecture is only 4.3 K gates in area and consumes 0.3 mW of power.
Floorplanning for High Utilization of Heterogeneous FPGAs
Nan LIU Song CHEN Takeshi YOSHIMURA

PAPER-VLSI Design Technology and CAD

Vol:
E95-A No:9
Page(s):
1529-1537
Heterogeneous resources such as configurable logic blocks (CLBs), multiplier blocks (MULs) and RAM blocks (RAMs), which include millions of logic gates, have been added to field programmable gate arrays (FPGAs). The fixed-outline floorplanning used by existing methods always has a large penalty term in the objective function to ensure that all the modules are placed in the specified chip region, which may greatly degrade the wirelength. This paper presents a three-phase floorplanning method for heterogeneous FPGAs. First, a non-slicing free-outline floorplanning method is used to optimize the wirelength; in this phase, however, the resource requirements of functional modules might not be satisfied. Second, a min-cost-max-flow algorithm is used to tune the assignment of CLBs to functional modules and to assign contiguous regions to each module so that all the functional modules satisfy their CLB requirements. Finally, the MULs and RAMs are allocated to modules by a network flow model. CLBs are the most numerous of all the resources, so utilizing them highly increases the FPGA density. The proposed method improves the utilization of CLBs; hence, much larger circuits can be mapped to the same FPGA chip. The results show that about 7–85% wirelength reduction is obtained, and CLB utilization is improved by about 25%.
Max-Flow Scheduling in High-Level Synthesis
Liangwei GE Song CHEN Kazutoshi WAKABAYASHI Takashi TAKENAKA Takeshi YOSHIMURA

PAPER-VLSI Design Technology and CAD

Vol:
E90-A No:9
Page(s):
1940-1948
Scheduling, an essential step in high-level synthesis, is an intractable process. Traditional heuristic scheduling methods usually search schedules directly in the entire solution space. In this paper, we propose the idea of searching within an intermediate solution space (ISS). We put forward a max-flow scheduling method that heuristically prunes the solution space into a specific ISS and finds the optimum of ISS in polynomial time. The proposed scheduling algorithm has some unique features, such as the correction of previous scheduling decisions in a later stage, the simultaneous scheduling of all the operations, and the optimization of more complicated objectives. Aided by the max-flow scheduling method, we implement the optimization of the IC power-ground integrity problem at the behavior level conveniently. Experiments on well-known benchmarks show that without requiring additional resources or prolonging schedule latency, the proposed scheduling method can find a schedule that draws current more stably from a supply, which mitigates the voltage fluctuation in the on-chip power distribution network.
Score Sequence Pair Problems of (r₁₁, r₁₂, r₂₂)-Tournaments--Determination of Realizability--
Masaya TAKAHASHI Takahiro WATANABE Takeshi YOSHIMURA

PAPER-Graph Algorithms

Vol:
E90-D No:2
Page(s):
440-448
Let G be any graph with property P (for example, general graph, directed graph, etc.) and let S be one or more nonnegative, non-decreasing integer sequences. The prescribed degree sequence problem is to determine whether there is a graph G having S as the prescribed sequence(s) of degrees or outdegrees of its vertices. Since the 1950s, P has attracted wide attention, and many extensions of it have been considered. Let P be the property satisfying the following (1) and (2): (1) G is a directed graph with two disjoint vertex sets A and B. (2) There are r₁₁ (r₂₂, respectively) directed edges between every pair of vertices in A (B), and r₁₂ directed edges between every pair consisting of a vertex in A and a vertex in B. Then G is called an (r₁₁, r₁₂, r₂₂)-tournament ("tournament", for short). The problem is called the score sequence pair problem of a "tournament" (realizable, for short). S is called a score sequence pair of a "tournament" if the answer to the problem is "yes." In this paper, we propose characterizations of a score sequence pair of a "tournament" and an algorithm for determining in linear time whether a pair of two integer sequences is realizable.
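
As a small illustration of what a realizability test looks like, the Python sketch below checks the classical single-set case only: an ordinary tournament with exactly one directed edge between every pair of vertices, using Landau's condition. It is not the algorithm of the paper above, which handles two vertex sets and general edge multiplicities; the function name is an assumption, and because the sketch sorts its input it runs in O(n log n) rather than linear time.

```python
from typing import Sequence


def is_tournament_score_sequence(scores: Sequence[int]) -> bool:
    """Landau's condition for an ordinary tournament on n vertices.

    A sequence of outdegrees is realizable if and only if, after sorting in
    non-decreasing order, the sum of the k smallest scores is at least
    k*(k-1)/2 for every k, with equality for k = n.
    """
    ordered = sorted(scores)
    n = len(ordered)
    prefix = 0
    for k, score in enumerate(ordered, start=1):
        prefix += score
        # Any k vertices play k*(k-1)/2 games among themselves,
        # so together they must account for at least that many wins.
        if prefix < k * (k - 1) // 2:
            return False
    # The total number of wins must equal the total number of games.
    return prefix == n * (n - 1) // 2


cycle = is_tournament_score_sequence([1, 1, 1])      # 3-cycle: each vertex wins once
impossible = is_tournament_score_sequence([0, 0, 3])  # two vertices cannot both lose to each other
```
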
Acoustic OFDM System and Performance Analysis
Hosei MATSUOKA Yusuke NAKASHIMA Takeshi YOSHIMURA

PAPER

Vol:
E91-A No:7
Page(s):
1652-1658
This paper presents a technology for short-range communications using sound waves, in which the modulated data signal can be transmitted in parallel with regular audio without significantly degrading the quality of the sound. The technology, which we call Acoustic OFDM, replaces the high frequency band of the audio signal with OFDM carriers, each of which is power-controlled according to the spectrum envelope of the original audio signal. It can provide data transmission of several hundred bps. The implemented Acoustic OFDM system enables the transmission of short text messages from loudspeakers to mobile devices at a distance of around 3 m.
FOREWORD
Winfried HAHN Takeshi YOSHIMURA

FOREWORD

Vol:
E76-A No:10
Page(s):
1615-1616
Redundant via Insertion: Removing Design Rule Conflicts and Balancing via Density
Song CHEN Jianwei SHEN Wei GUO Mei-Fang CHIANG Takeshi YOSHIMURA

PAPER-Physical Level Design

Vol:
E93-A No:12
Page(s):
2372-2379
The occurrence of via defects increases as feature sizes shrink in integrated circuit manufacturing. Redundant via insertion is an effective and recommended method to reduce the yield loss caused by via failures. In this paper, we introduce the redundant via allocation problem for layer partition-based redundant via insertion methods [1] and solve it using a genetic algorithm. At the same time, we use a convex-cost flow model to balance the via density, which helps satisfy the via density rules. The results of the layer partition-based model depend on the partition and processing order of the metal layers. Furthermore, even if we try all partitions and processing orders, we might miss the optimal solutions. By introducing the redundant via allocation problem on partitioning boundaries, we can avoid the sub-optimality of the original layer partition-based method. The experimental results show that the proposed method inserts 12 more redundant vias on average and that the via density balance can be greatly improved.
A Minimum Bandwidth Guaranteed Service Model and Its Implementation on Wireless Packet Scheduler
Mooryong JEONG Takeshi YOSHIMURA Hiroyuki MORIKAWA Tomonori AOYAMA

PAPER

Vol:
E85-A No:7
Page(s):
1463-1471
In this paper, we introduce the concept of a minimum bandwidth guaranteed service model for mobile multimedia. In this service model, service is defined in terms of the guaranteed minimum bandwidth and the residual service share. Each flow under this service model is guaranteed its minimum bandwidth and, if there is leftover bandwidth, is given more in proportion to its residual service share. The guaranteed minimum bandwidth ensures that a flow keeps a minimum tolerable quality regardless of the network load, while the leftover bandwidth enhances the quality of service according to the application's adaptivity and the user's interest. We show that the minimum bandwidth guaranteed service model can be implemented by a two-fold wireless packet scheduler consisting of a guaranteed scheduler and a sharing scheduler. The wireless channel condition of each flow is considered in scheduling so that wireless resources are distributed only to flows in a good channel state, improving total wireless link utilization. We evaluate the service model and the scheduling method by simulation and implementation.
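
The service definition in the abstract above (a guaranteed minimum plus a proportional split of any leftover bandwidth) can be sketched in Python as follows. This only illustrates the allocation arithmetic; it is not the paper's two-fold scheduler, it ignores per-flow wireless channel state, and the function name `allocate` and the `(min_bw, share)` tuple layout are assumptions.

```python
from typing import Dict, Tuple


def allocate(capacity: float, flows: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return the bandwidth granted to each flow.

    `flows` maps a flow name to (min_bw, share): the guaranteed minimum
    bandwidth and the residual service share (hypothetical layout).
    """
    guaranteed = sum(min_bw for min_bw, _ in flows.values())
    if guaranteed > capacity:
        raise ValueError("capacity cannot cover the guaranteed minimums")

    leftover = capacity - guaranteed
    total_share = sum(share for _, share in flows.values())

    result = {}
    for name, (min_bw, share) in flows.items():
        # Leftover bandwidth is divided in proportion to the residual share.
        extra = leftover * share / total_share if total_share > 0 else 0.0
        result[name] = min_bw + extra
    return result


# Two flows with equal minimums but a 1:2 residual share on a 10-unit link.
grants = allocate(10.0, {"audio": (2.0, 1.0), "video": (2.0, 2.0)})
```

Here 4 units are reserved for the minimums, and the remaining 6 units are split 2 to 4, so the flow with the larger share benefits more when the link is lightly loaded.
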
Mobile Broadcast Streaming Service and Protocols on Unidirectional Radio Channels
Takeshi YOSHIMURA Tomoyuki OHYA

PAPER-Multicast/Broadcast

Vol:
E87-B No:9
Page(s):
2596-2604
In this paper, we propose a set of broadcast streaming protocols designed for unidirectional radio channels. Considering the limited size and implementation overhead on a mobile terminal, the proposed protocol set is almost compliant with the current mobile streaming protocols, i.e., 3GPP PSS (Packet-switched Streaming Service), except that the proposed protocols are designed to work on a unidirectional downlink channel. This protocol set enables flexible layout rendering by SMIL (Synchronized Multimedia Integration Language) in combination with SDP (Session Description Protocol), and reliable and synchronized delivery of static media (including still images and text) by an RTP (Real-time Transport Protocol) carousel. We present a prototype of this protocol set and measure its performance in terms of video quality and waiting time for video presentation through a W-CDMA radio channel emulator and header compression nodes. From the experimental results, we show 1) the trade-off between video quality and waiting time, 2) the advantages and disadvantages of header compression, 3) the effectiveness of synchronized transmission of SDP, SMIL, and I-frames of video objects, and 4) the reliability of the RTP carousel. This protocol set is applicable to the 3G MBMS (Multimedia Broadcast/Multicast Service) streaming service.
Real-Time UHD Background Modelling with Mixed Selection Block Updates
Axel BEAUGENDRE Satoshi GOTO Takeshi YOSHIMURA

PAPER-IMAGE PROCESSING

Vol:
E100-A No:2
Page(s):
581-591
The vast majority of foreground detection methods require heavy hardware optimization to process standard-definition videos in real time. Indeed, those methods process the whole frame not only for the detection but also for the background modelling part, which makes them too resource-hungry (time, memory, etc.) to be applied to Ultra High Definition (UHD) videos. This paper presents a real-time background modelling method called Mixed Block Background Modelling (MBBM). It is a spatio-temporal approach that updates the background model by carefully selecting blocks in both a linear and a pseudo-random order and updating the corresponding block parts of the model. The two block selection orders make sure that every block will be updated. For foreground detection purposes, the method is combined with a foreground detection method designed for UHD videos, such as the Adaptive Block-Propagative Background Subtraction method. Experimental results show that the proposed MBBM can process 50 min of 4K UHD videos in less than 6 hours, while other methods are estimated to take from 8 days to more than 21 years. Compared to 10 state-of-the-art foreground detection methods, the proposed MBBM shows the best quality results, with an average global quality score of 0.597 (1 being the maximum) on a dataset of 4K UHDTV sequences containing various situations such as illumination variation. Finally, the processing time per pixel of the MBBM is the lowest of all compared methods, with an average of 3.18×10⁻⁸ s.
Content Delivery Network Architecture for Mobile Streaming Service Enabled by SMIL Modification
Takeshi YOSHIMURA Yoshifumi YONEMOTO Tomoyuki OHYA Minoru ETOH Susie WEE

PAPER-CDN Architecture

Vol:
E86-B No:6
Page(s):
1778-1787
In this paper, we present a CDN (Content Delivery Network) architecture for mobile streaming service in which content segmentation, request routing, pre-fetch scheduling, and session handoff are controlled by SMIL (Synchronized Multimedia Integration Language) modification. In this architecture, mobile clients simply follow modified SMIL files downloaded from a portal server; these modifications enable multimedia content to be delivered to the mobile clients from the best surrogates in the CDN. The key components of this architecture are 1) content segmentation with SMIL modification, 2) on-demand rewriting of URLs in SMIL, 3) pre-fetch scheduling based on timing information derived from SMIL, and 4) SMIL updates by SOAP (Simple Object Access Protocol) messaging for session handoffs due to client mobility. This architecture enhances streaming media quality for mobile clients while utilizing network resources efficiently and supporting client mobility in an integrated and practical way. The current status of our prototype on a mobile QoS testbed "MOBIQ" is also reported in this paper.
Lagrangian Relaxation Based Inter-Layer Signal Via Assignment for 3-D ICs
Song CHEN Liangwei GE Mei-Fang CHIANG Takeshi YOSHIMURA

PAPER

Vol:
E92-A No:4
Page(s):
1080-1087
Three-dimensional integrated circuits (3-D ICs), i.e., stacked dies, can alleviate the interconnect problem that comes with decreasing feature size and increasing integration density, and promise a solution to heterogeneous integration. The vertical connection, which is generally implemented by the through-the-silicon via, is a key technology for 3-D ICs. In this paper, given 3-D circuit placement or floorplan results with white space reserved between blocks for inter-layer interconnections, we propose methods for assigning inter-layer signal via locations. By introducing a grid structure on the chip, the inter-layer via assignment of two-layer chips can be optimally solved by a convex-cost max-flow formulation with signal via congestion optimized. For 3-D ICs with three or more layers, the inter-layer signal via assignment is modeled as an integral min-cost multi-commodity flow problem, which is solved by a heuristic method based on Lagrangian relaxation. Relaxing the capacity constraints in the grids, we transform the min-cost multi-commodity flow problem into a sequence of Lagrangian sub-problems, which are solved by finding a sequence of shortest paths. The complexity of solving a Lagrangian sub-problem is O(n_nt · n_g^2), where n_nt is the number of nets and n_g is the number of grids on one chip layer. The experimental results demonstrate the effectiveness of the method.
