Author Search Result

[Author] Takeshi YOSHIMURA(28hit)

1-20hit(28hit)

  • Content Delivery Network Architecture for Mobile Streaming Service Enabled by SMIL Modification

    Takeshi YOSHIMURA  Yoshifumi YONEMOTO  Tomoyuki OHYA  Minoru ETOH  Susie WEE  

     
    PAPER-CDN Architecture

      Vol:
    E86-B No:6
      Page(s):
    1778-1787

    In this paper, we present a CDN (Content Delivery Network) architecture for mobile streaming service in which content segmentation, request routing, pre-fetch scheduling, and session handoff are controlled by SMIL (Synchronized Multimedia Integration Language) modification. In this architecture, mobile clients simply follow modified SMIL files downloaded from a portal server; these modifications enable multimedia content to be delivered to the mobile clients from the best surrogates in the CDN. The key components of this architecture are 1) content segmentation with SMIL modification, 2) on-demand rewriting of URLs in SMIL, 3) pre-fetch scheduling based on timing information derived from SMIL, and 4) SMIL updates by SOAP (Simple Object Access Protocol) messaging for session handoffs due to client mobility. This architecture enhances streaming media quality for mobile clients while utilizing network resources efficiently and supporting client mobility in an integrated and practical way. The current status of our prototype on a mobile QoS testbed "MOBIQ" is also reported in this paper.

  • Lagrangian Relaxation Based Inter-Layer Signal Via Assignment for 3-D ICs

    Song CHEN  Liangwei GE  Mei-Fang CHIANG  Takeshi YOSHIMURA  

     
    PAPER

      Vol:
    E92-A No:4
      Page(s):
    1080-1087

    Three-dimensional integrated circuits (3-D ICs), i.e., stacked dies, can alleviate the interconnect problem coming with the decreasing feature size and increasing integration density, and promise a solution to heterogeneous integration. The vertical connection, which is generally implemented by the through-the-silicon via, is a key technology for 3-D ICs. In this paper, given 3-D circuit placement or floorplan results with white space reserved between blocks for inter-layer interconnections, we propose methods for assigning inter-layer signal via locations. Introducing a grid structure on the chip, the inter-layer via assignment of two-layer chips can be optimally solved by a convex-cost max-flow formulation with signal via congestion optimized. As for 3-D ICs with three or more layers, the inter-layer signal via assignment is modeled as an integral min-cost multi-commodity flow problem, which is solved by a heuristic method based on the Lagrangian relaxation. Relaxing the capacity constraints in the grids, we transform the min-cost multi-commodity flow problem into a sequence of Lagrangian sub-problems, which are solved by finding a sequence of shortest paths. The complexity of solving a Lagrangian sub-problem is O(n_nt n_g^2), where n_nt is the number of nets and n_g is the number of grids on one chip layer. The experimental results demonstrated the effectiveness of the method.

  • Logic Minimization for Large-Scale Networks Based on Multi-Signal Implications

    Masayuki YUGUCHI  Kazutoshi WAKABAYASHI  Takeshi YOSHIMURA  

     
    PAPER

      Vol:
    E82-A No:11
      Page(s):
    2390-2397

    This paper presents a novel implication-based method for logic minimization in large-scale, multi-level networks. It significantly reduces network size through repeated addition and removal of redundant subnetworks, utilizing multi-signal implications and relationships among these implications. These are handled on a transitive implication graph, proposed in this paper, which offers the practical use of implications for logic minimization. The proposed method holds great promise for the achievement of an interactive logic design environment for large-scale networks.

  • Data Transmission on AM Broadcast with Acoustic OFDM

    Yusuke NAKASHIMA  Hosei MATSUOKA  Takeshi YOSHIMURA  Hiroshi MIURA  Seiichi NAKAJIMA  Masanori MACHIDA  Gen-ichiro OHTA  

     
    PAPER

      Vol:
    E91-B No:10
      Page(s):
    3149-3156

    Data transmission over the audio link of an AM radio system is shown to be achievable by using Acoustic OFDM. We employ Acoustic OFDM to embed data in audio content that is then broadcast as AM radio signals. We tuned the parameters and performed experiments. Text data such as a URL can be delivered to mobile phones through the existing MF AM radio system and radios.

  • Floorplanning and Topology Synthesis for Application-Specific Network-on-Chips

    Wei ZHONG  Song CHEN  Bo HUANG  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1174-1184

    Application-Specific Network-on-Chips (ASNoCs) have been proposed as a more promising solution than regular NoCs to the global communication challenges for particular applications in nanoscale System-on-Chip (SoC) designs. In ASNoC design, one of the key challenges is to generate the most suitable and power-efficient NoC topology under the constraints of the application specification. In this work, we present a two-step floorplanning (TSF) algorithm, integrating topology synthesis into the floorplanning phase, to automate the synthesis of such ASNoC topologies. In the first floorplanning step, during the simulated annealing, we explore the optimal positions and clustering of cores and implement an incremental path allocation algorithm to predictively evaluate the power consumption of the generated NoC topology. In the second floorplanning step, we explore the optimal positions of switches and network interfaces on the floorplan. A power and timing aware path allocation algorithm is also integrated into this step to determine the connectivity across different switches. Experimental results on a variety of benchmarks show that our algorithm can produce greatly improved solutions over the latest works.

  • Resource-Aware Multi-Layer Floorplanning for Partially Reconfigurable FPGAs

    Nan LIU  Song CHEN  Takeshi YOSHIMURA  

     
    PAPER

      Vol:
    E96-C No:4
      Page(s):
    501-510

    Modern field programmable gate arrays (FPGAs) with heterogeneous resources are partially reconfigurable. Existing methods of reconfiguration-aware floorplanning have limitations with regard to homogeneous resources; they solve only a part of the reconfigurable problem. In this paper, first, a precise model for partially reconfigurable FPGAs is formulated, and then, a two-phase floorplanning approach is presented. In the proposed approach, resource distribution is taken into consideration at all times. In the first step, a resource-aware insertion-after-remove perturbation is devised on the basis of the multi-layer sequence pair constraint graphs, and resource-aware slack-based moves (RASBM) are made to satisfy resource requirements. In the second step, a resource-aware fixed-outline floorplanner is used, and RASBM are applied to pack the reconfigurable regions on the FPGAs. Experimental results show that the proposed approach is resource- and reconfiguration-aware, and facilitates stable floorplanning. In addition, it reduces the wire-length by 4–28% in the first step, and by 12% on average in the second step compared to the wire-length in previous approaches.

  • Hierarchical-Analysis-Based Fast Chip-Scale Power Estimation Method for Large and Complex LSIs

    Yuichi NAKAMURA  Takeshi YOSHIMURA  

     
    PAPER-Simulation and Verification

      Vol:
    E89-A No:12
      Page(s):
    3458-3463

    This paper presents a novel power estimation method for large and complex LSIs. The proposed method is based on simulation and is used for analyzing the ways in which chip-scale gate-level circuits, including processors and memory, are affected by gated-clock power reduction and the voltage drop due to electrical resistance. Chip-scale power estimation based on simulation patterns generally takes an enormous amount of time. In order to reduce the time to obtain accurate estimation results based on simulation patterns, we introduce three approaches: "partitioning of target LSIs and simulation pattern," "memory modeling," and "processor modeling." After placing and routing, the target LSIs are partitioned into hierarchical blocks, memory, and processors. The power consumption of each hierarchical block is calculated by using the partitioned patterns generated from chip-scale simulation patterns. The power consumption of the processor and memory blocks is estimated by a method considering the static power consumption and the LSI activity ratio. Experimental results for a commercial 0.18 µm-technology media processing chip show that the proposed method is 23 times faster than the conventional method without partitioning and that the results of the two methods are almost the same.

  • Leakage Power Aware Scheduling in High-Level Synthesis

    Nan WANG  Song CHEN  Cong HAO  Haoran ZHANG  Takeshi YOSHIMURA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E97-A No:4
      Page(s):
    940-951

    In this paper, we address the problem of scheduling operations into control steps with a dual threshold voltage (dual-Vth) technique, under timing and resource constraints. We present a two-stage algorithm for leakage power optimization. In the threshold voltage (Vth) assignment stage, the proposed algorithm first initializes all the operations to high-Vth, and then it iteratively shortens the critical path delay by reassigning the set of operations covering all the critical paths to low-Vth until the timing constraint is met. In the scheduling stage, a modified force-directed scheduling is implemented to schedule operations and to adjust threshold voltage assignments with a consideration of the resource constraints. To eliminate the potential resource constraint violations, the operations' threshold voltage adjustment problem is formulated as a “weighted interval scheduling” problem. The experimental results show that our proposed method performs better in both running time and leakage power reduction compared with MWIS [3].

  • Cluster Generation and Network Component Insertion for Topology Synthesis of Application-Specific Network-on-Chips

    Wei ZHONG  Takeshi YOSHIMURA  Bei YU  Song CHEN  Sheqin DONG  Satoshi GOTO  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    534-545

    Network-on-Chips (NoCs) have been proposed as a solution for addressing the global communication challenges in System-on-Chip (SoC) architectures that are implemented in nanoscale technologies. For the use of NoCs to be feasible in today's industrial designs, a custom-tailored, power-efficient NoC topology that satisfies the application characteristics is required. In this work, we present a design methodology that automates the synthesis of such application-specific NoC topologies. We present a method which integrates partitioning into the floorplanning phase to explore optimal clustering of cores during floorplanning with minimized link and switch power consumption. Based on the size of applications, we also present an Integer Linear Programming method and a heuristic method to place switches and network interfaces on the floorplan. Then, a power and timing aware path allocation algorithm is carried out to determine the connectivity across different switches. We perform experiments on several SoC benchmarks and present a comparison with the latest work. For small applications, the NoC topologies synthesized by our method show large improvements in power consumption (27.54%), hop-count (4%) and running time (66%) on average. For large applications, the synthesized topologies show large improvements in power consumption (31.77%), hop-count (29%) and running time (94.18%) on average.

  • Timing Optimization Methodology Based on Replacing Flip-Flops by Latches

    Ko YOSHIKAWA  Keisuke KANAMARU  Yasuhiko HAGIHARA  Shigeto INUI  Yuichi NAKAMURA  Takeshi YOSHIMURA  

     
    PAPER-Logic Synthesis

      Vol:
    E87-A No:12
      Page(s):
    3151-3158

    Latch-based circuits have advantages for timing and are widely used for high-speed custom circuits. ASIC design flows, however, are based on circuits with flip-flops. This paper describes a new timing optimization algorithm that replaces the flip-flops in high-end ASICs with latches without changing the functionality of the circuits. Timing is optimized by using a fixed-phase retiming that minimizes the impact of clock skew and jitter. A formal equivalence verification method that assures the logical correctness of the latch-replaced circuits is also proposed. Experimental results show that the optimization algorithm decreases the delay of benchmark circuits by as much as 17%.

  • FOREWORD

    Takeshi YOSHIMURA  

     
    FOREWORD

      Vol:
    E79-A No:12
      Page(s):
    2085-2085

  • An Engineering Change Orders Design Method Based on Patchwork-Like Partitioning for High Performance LSIs

    Yuichi NAKAMURA  Ko YOSHIKAWA  Takeshi YOSHIMURA  

     
    PAPER-Logic Synthesis

      Vol:
    E88-A No:12
      Page(s):
    3351-3357

    This paper describes a novel engineering change order (ECO) design method for large-scale, high performance LSIs, based on a patchwork-like partitioning technique. In conventional design methods, even when only small changes are made to the design after the placement and routing process, a whole re-layout must be done, and this is very time consuming. Using the proposed method, we can partition the design into several parts after logic synthesis. When design changes occur in HDL, only the parts related to the changes need to be redesigned. The netlist for the changed design remains almost the same as the original, except for the small changed parts. For partitioning, we used multiple-fan-out-points as partition borders. An experimental evaluation of our method showed that when a small change was made in the RTL description, the revised circuit part had only about 87 gates on average. This greatly reduces the re-layout time required for implementing an ECO. In actual commercial designs in which several design changes are required, it takes only one day to redesign.

  • Multiple-Reference Compression of RTP/UDP/IP Headers for Mobile Multimedia Communications

    Takeshi YOSHIMURA  Toshiro KAWAHARA  Tomoyuki OHYA  Minoru ETOH  

     
    PAPER

      Vol:
    E85-A No:7
      Page(s):
    1491-1500

    In this paper, we propose an RTP/UDP/IP header compression method, Multiple-Reference Compression (MRC), which is designed for mobile multimedia communications. MRC is a compression method that calculates differences from the multiple reference headers that have already been sent and inserts them into a compressed header. The receiver can decompress the compressed header as long as at least one of the reference headers is correctly received and decompressed. MRC improves robustness against packet losses compared with CRTP defined in IETF RFC2508, and imposes less overhead and computational burden than robust header compression (ROHC) defined in RFC3095. We also implemented MRC and other header compression algorithms on our mobile testbed, and conducted multimedia streaming experiments over the testbed. The results of the experiments show that MRC offers the same level of packet loss rate as Legacy RTP for both audio and video streams, and provides better media quality than Legacy RTP and CRTP on error-prone radio links. Header compression that is robust against packet losses is expected to be a key technology for VoIP and multimedia streaming services over 3G and future mobile networks.

  • Mobility Overlap-Removal-Based Leakage Power and Register-Aware Scheduling in High-Level Synthesis

    Nan WANG  Song CHEN  Wei ZHONG  Nan LIU  Takeshi YOSHIMURA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E97-A No:8
      Page(s):
    1709-1719

    Scheduling is a key problem in high level synthesis, as the scheduling results affect most of the important design metrics. In this paper, we propose a novel scheduling method to simultaneously optimize the leakage power of functional units with dual-Vth techniques and the number of registers under given timing and resource constraints. The mobility overlaps between operations are removed to eliminate data dependencies, and a simulated-annealing-based method is introduced to explore the mobility overlap removal solution space. Given the overlap-free mobilities, the resource usage and register usage in each control step can be accurately estimated. Meanwhile, operations are scheduled so as to optimize the leakage power of functional units with minimal number of registers. Then, a set of operations is iteratively selected, reassigned as low-Vth, and rescheduled until the resource constraints are all satisfied. Experimental results show the efficiency of the proposed algorithm.

  • Unified Parameter Decoder Architecture for H.265/HEVC Motion Vector and Boundary Strength Decoding

    Shihao WANG  Dajiang ZHOU  Jianbin ZHOU  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1356-1365

    In this paper, VLSI architecture design of unified motion vector (MV) and boundary strength (BS) parameter decoder (PDec) for 8K UHDTV HEVC decoder is presented. The adoption of new coding tools in PDec, such as Advanced Motion Vector Prediction (AMVP), increases the VLSI hardware realization overhead and memory bandwidth requirement, especially for 8K UHDTV application. We propose four techniques for these challenges. Firstly, this work unifies MV and BS parameter decoders for line buffer memory sharing. Secondly, to support high throughput, we propose the top-level CU-adaptive pipeline scheme by trading off between implementation complexity and performance. Thirdly, PDec process engine with optimizations is adopted for 43.2k area reduction. Finally, PU-based coding scheme is proposed for 30% DRAM bandwidth reduction. In 90nm process, our design costs 93.3k logic gates with 23.0kB line buffer. The proposed architecture can support real-time decoding for 7680x4320@60fps application at 249MHz in the worst case.

  • High Performance VLSI Architecture of H.265/HEVC Intra Prediction for 8K UHDTV Video Decoder

    Jianbin ZHOU  Dajiang ZHOU  Shihao WANG  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E98-A No:12
      Page(s):
    2519-2527

    8K Ultra High Definition Television (UHDTV) requires extremely high throughput for video decoding based on H.265. In H.265, intra coding could significantly enhance video compression efficiency, at the expense of an increased computational complexity compared with H.264. For intra prediction of 8K UHDTV real-time H.265 decoding, the joint complexity and throughput issue is more difficult to solve. Therefore, based on the divide-and-conquer strategy, we propose a new VLSI architecture in this paper, including two techniques, in order to achieve 8K UHDTV H.265 intra prediction decoding. The first technique is the LUT based Reference Sample Fetching Scheme (LUT-RSFS), reducing the number of reference samples in the worst case from 99 to 13. It further reduces the circuit area and enhances the performance. The second one is the Hybrid Block Reordering and Data Forwarding (HBRDF), minimizing the idle time and eliminating the dependency between TUs by creating 3 Data Forwarding paths. It achieves the hardware utilization of 94%. Our design is synthesized using Synopsys Design Compiler in 40nm process technology. It achieves an operation frequency of 260MHz, with a gate count of 217.8K for 8-bit design, and 251.1K for 10-bit design. The proposed VLSI architecture can support 4320p@120fps H.265 intra decoding (8-bit or 10-bit), with all 35 intra prediction modes and prediction unit sizes ranging from 4×4 to 64×64.

  • An Efficient Multi-Level Algorithm for 3D-IC TSV Assignment

    Cong HAO  Takeshi YOSHIMURA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E100-A No:3
      Page(s):
    776-784

    The through-silicon via (TSV) assignment problem is one of the key design challenges of 3-D ICs and is crucial to the wire length and signal delay. In this work we formulate the 3-D IC TSV assignment as an Integer Minimum Cost Multi Commodity (IMCMC) problem on an IMCMC network, and propose a multi-level algorithm. It coarsens the IMCMC network level by level, applies a rough flow assignment on each level of the coarsened graph, and generates only promising edges to reduce the IMCMC network size. Benefiting from the multi-level structure, we propose a mixed single and multi commodity flow method to improve the TSV assignment solution quality. Moreover, given a TSV assignment, we propose an extended layer by layer algorithm to further optimize the TSV assignment. The experimental results demonstrate that our multi-level algorithm with mixed single and multi commodity flow achieves not only smaller wire length but also shorter runtime compared to other existing works.

  • Framework and VLSI Architecture of Measurement-Domain Intra Prediction for Compressively Sensed Visual Contents

    Jianbin ZHOU  Dajiang ZHOU  Li GUO  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E100-A No:12
      Page(s):
    2869-2877

    This paper presents a measurement-domain intra prediction coding framework that is compatible with compressive sensing (CS)-based image sensors. In this framework, we propose a low-complexity intra prediction algorithm that can be directly applied to measurements captured by the image sensor. We propose a structural random 0/1 measurement matrix, embedding the block boundary information that can be extracted from the measurements for intra prediction. Furthermore, a low-cost Very Large Scale Integration (VLSI) architecture is implemented for the proposed framework, by substituting the matrix multiplication with shared adders and shifters. The experimental results show that our proposed framework can compress the measurements and increase coding efficiency, with 34.9% BD-rate reduction compared to the direct output of CS-based sensors. The VLSI architecture of the proposed framework is 9.1 K in area, and achieves an 83% reduction in size of memory bandwidth and storage for the line buffer. This could significantly reduce both the energy consumption and the communication bandwidth of wireless camera systems, which are expected to be massively deployed in the Internet of Things (IoT) era.

  • Approximate-DCT-Derived Measurement Matrices with Row-Operation-Based Measurement Compression and its VLSI Architecture for Compressed Sensing

    Jianbin ZHOU  Dajiang ZHOU  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E101-C No:4
      Page(s):
    263-272

    Compressed Sensing based CMOS image sensor (CS-CIS) is a new generation of CMOS image sensor that significantly reduces the power consumption. For CS-CIS, the image quality and the output data volume are two important concerns. In this paper, we first propose an algorithm to generate a series of deterministic and ternary matrices, which improve the image quality, reduce the data volume, and are compatible with CS-CIS. The proposed matrices are derived from the approximate DCT and trimmed in 2D-zigzag order, thus preserving the energy compaction property as DCT does. Moreover, we propose matrix row operations adapted to the proposed matrix to further compress data (measurements) without any image quality loss. Finally, a low-cost VLSI architecture of measurement compression with the proposed matrix row operations is implemented. Experimental results show that our proposed matrix significantly improves the coding efficiency, with a BD-PSNR increase of 4.2 dB compared with the random binary matrix used in the state-of-the-art CS-CIS. The proposed matrix row operations for measurement compression further increase the coding efficiency by 0.24 dB BD-PSNR (4.8% BD-rate reduction). The VLSI architecture is only 4.3 K gates in area and 0.3 mW in power consumption.

  • Floorplanning for High Utilization of Heterogeneous FPGAs

    Nan LIU  Song CHEN  Takeshi YOSHIMURA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E95-A No:9
      Page(s):
    1529-1537

    Heterogeneous resources such as configurable logic blocks (CLBs), multiplier blocks (MULs) and RAM blocks (RAMs), where millions of logic gates are included, have been added to field programmable gate arrays (FPGAs). The fixed-outline floorplanning used by the existing methods always has a big penalty item in the objective function to ensure all the modules are placed in the specified chip region, which may greatly degrade the wirelength. This paper presents a three-phase floorplanning method for heterogeneous FPGAs. First, a non-slicing free-outline floorplanning method is used to optimize the wirelength; however, in this phase, the satisfaction of resource requirements from functional modules might fail. Second, a min-cost-max-flow algorithm is used to tune the assignment of CLBs to functional modules, and assign contiguous regions to each module so that all the functional modules satisfy CLB requirements. Finally, the MULs and RAMs are allocated to modules by a network flow model. CLBs hold the maximum quantity among all the resources. Therefore, achieving a high utilization of them means an enhancement of the FPGA densities. The proposed method can improve the utilization of CLBs; hence, much larger circuits could be mapped to the same FPGA chip. The results show that about 7–85% wirelength reduction is obtained, and CLB utilization is improved by about 25%.
