Keyword Search Result

[Keyword] performance optimization (14 hits)

  • Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization

    Koichi SHIRAHATA  Amir HADERBACHE  Naoto FUKUMOTO  Kohta NAKASHIMA  

     
    BRIEF PAPER

      Publicized:
    2020/12/01
      Vol:
    E104-C No:6
      Page(s):
    257-260

    The scalability of distributed DNN training can be limited by the slowdown of specific processes due to unexpected hardware failures. We propose a dynamic process exclusion technique that maximizes training throughput. Our evaluation using 32 processes with ResNet-50 shows that the proposed technique reduces slowdown by 12.5% to 50% without accuracy loss by excluding the slow processes.
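    A minimal sketch of excluding slow processes, in Python. The abstract does not say how slow processes are detected, so the rule used here (exclude any rank whose last step took more than a hypothetical `threshold` times the median step time) and the function name `select_active_processes` are assumptions for illustration only.

```python
from statistics import median


def select_active_processes(step_times, threshold=1.5, min_active=1):
    """Return the ranks that stay in the next synchronization round.

    Hypothetical rule (not from the paper): a rank counts as a straggler
    when its last step time exceeds threshold * median step time.
    """
    med = median(step_times.values())
    active = [rank for rank, t in step_times.items() if t <= threshold * med]
    # Never exclude everyone: fall back to the fastest ranks.
    if len(active) < min_active:
        active = sorted(step_times, key=step_times.get)[:min_active]
    return sorted(active)


times = {0: 1.00, 1: 1.02, 2: 0.98, 3: 2.40}  # rank 3 slowed by a faulty node
print(select_active_processes(times))  # [0, 1, 2]
```

    In a real data-parallel job the excluded ranks would presumably also have to be dropped from the gradient all-reduce, with the average rescaled to the number of active ranks; that part is not shown.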

  • Cooperative Path Selection Framework for Effective Data Gathering in UAV-Aided Wireless Sensor Networks

    Sotheara SAY  Mohamad Erick ERNAWAN  Shigeru SHIMAMOTO  

     
    PAPER

      Vol:
    E99-B No:10
      Page(s):
    2156-2167

    Sensor networks are often used to understand underlying phenomena that are reflected in sensing data. In real-world applications, this understanding supports decision makers who need to access a disaster area or monitor a certain event regularly, so that necessary actions can be triggered in response to problems. Practitioners designing such systems must overcome difficulties caused by the practical limitations of the data and the fidelity of network conditions. This paper explores the design of a network solution for the data acquisition domain with the goal of increasing the efficiency of data gathering. An unmanned aerial vehicle (UAV) is introduced to address various real-world sensor network challenges such as limited resources, the lack of real-time representative data, and the mobility of a relay station. Towards this goal, we introduce a novel cooperative path selection framework to effectively collect data from multiple sensor sources. The framework consists of six main parts, ranging from system initialization to UAV data acquisition. The data acquired by the UAV can be used to increase situational awareness or as input to data manipulation that supports response efforts. We develop a system-based simulation that creates representative sensor networks and uses the UAV to collect data packets. Results obtained with the proposed framework are analyzed and compared with existing approaches to show the efficiency of the scheme.

  • A New Non-Uniform Weight-Updating Beamformer for LEO Satellite Communication

    Jie LIU  Zhuochen XIE  Huijie LIU  Zhengmin ZHANG  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:9
      Page(s):
    1708-1711

    In this paper, a new non-uniform weight-updating scheme for adaptive digital beamforming (DBF) is proposed. The key feature of the scheme is that it extends the effective working range of the beamformer and reduces computational complexity by introducing robust DBF based on worst-case performance optimization. The robust parameter for each weight update is chosen by analyzing the rate of change of the direction of arrival (DOA) of the desired signal in LEO satellite communication. Simulation results demonstrate the improved performance of the new Non-Uniform Weight-Updating Beamformer (NUWUB).

  • Time Performance Optimization and Resource Conflicts Resolution for Multiple Project Management

    Cong LIU  Jiujun CHENG  Yirui WANG  Shangce GAO  

     
    PAPER-Software Engineering

      Publicized:
    2015/12/04
      Vol:
    E99-D No:3
      Page(s):
    650-660

    Time performance optimization and resource conflict resolution are two important challenges in multiple project management. Compared with traditional project management, multi-project management usually suffers from limited resources and tight deadlines for finishing all concurrent projects. In this case, time performance optimization of global project management is badly needed. To the best of our knowledge, existing work seldom addresses the formal modeling and analysis of multi-project management with the aim of eliminating resource conflicts and optimizing project execution time. This work proposes such a method based on PRT-Net, a Petri net-based formalism tailored to projects constrained by resources and time. The detailed PRT-Net-based modeling approaches are first presented. Then, a resource conflict detection method and its corresponding algorithm are proposed. Next, priority criteria, including a key-activity priority strategy and a waiting-short priority strategy, are presented to resolve resource conflicts. Finally, we show how to construct a conflict-free PRT-Net by designing resource conflict resolution controllers. Experiments show that the proposed priority strategy makes the global execution time of multiple projects much shorter than when no strategy is used.
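    The paper detects resource conflicts on a Petri net model (PRT-Net), which is not reproduced here. As a simpler stand-in, the sketch below detects the same kind of conflict with an interval sweep: activities are hypothetical `(name, start, end, resource, amount)` tuples over half-open intervals, and a conflict is reported whenever concurrent demand for a resource exceeds its capacity. The function name `detect_conflicts` is an assumption.

```python
from collections import defaultdict


def detect_conflicts(activities, capacity):
    """Return (resource, time, demand) wherever demand exceeds capacity.

    activities: list of (name, start, end, resource, amount), interval [start, end).
    capacity:   dict resource -> available units.
    """
    events = defaultdict(list)
    for _name, start, end, res, amount in activities:
        events[res].append((start, amount))  # acquire
        events[res].append((end, -amount))   # release
    conflicts = []
    for res, evs in events.items():
        # Releases (negative deltas) sort before acquisitions at the same time,
        # so back-to-back activities are not counted as overlapping.
        evs.sort(key=lambda e: (e[0], e[1]))
        demand = 0
        for t, delta in evs:
            demand += delta
            if delta > 0 and demand > capacity[res]:
                conflicts.append((res, t, demand))
    return conflicts


acts = [
    ("A1", 0, 4, "crane", 1), ("B1", 2, 5, "crane", 1), ("A2", 4, 6, "crane", 1),
    ("B2", 1, 3, "welder", 1), ("A3", 2, 4, "welder", 1),
]
print(detect_conflicts(acts, {"crane": 1, "welder": 2}))
# [('crane', 2, 2), ('crane', 4, 2)]
```

    A resolution step, such as the key-activity or waiting-short priority strategies, would then decide which of the conflicting activities must wait; that step is not sketched.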

  • Design of Interpolated Pipeline ADC Using Low-Gain Open-Loop Amplifiers

    Hyunui LEE  Masaya MIYAHARA  Akira MATSUZAWA  

     
    PAPER

      Vol:
    E96-C No:6
      Page(s):
    838-849

    This paper describes the design of an interpolated pipeline analog-to-digital converter (ADC). Introducing the interpolation technique into the conventional pipeline topology makes it possible to realize an ADC with more than 10-bit resolution and a conversion rate of several hundred MS/s using low-gain open-loop amplifiers, without any multiplying digital-to-analog converter (MDAC) calibration. First, the linearity requirement of the amplifier is analyzed in relation to the reference range and the stage resolution. The noise characteristic is also discussed in terms of the amplifier's noise bandwidth and load capacitance. After that, sampling speed and SNR characteristics are examined for various amplifier currents. Next, the resolution of the pipeline stage is optimized with respect to power consumption. Through this analysis, reasonable amplifier parameters, such as transconductance, source degeneration resistance, and load capacitance, can be defined. The optimized operating speed and stage resolution for the interpolated pipeline ADC are also shown. The analysis in this paper is valuable for the design of both interpolated pipeline ADCs and other circuits that incorporate interpolation and amplifiers.

  • Performance-Driven Architectural Synthesis for Distributed Register-File Microarchitecture with Inter-Island Delay

    Juinn-Dar HUANG  Chia-I CHEN  Wan-Ling HSU  Yen-Ting LIN  Jing-Yang JOU  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E95-A No:2
      Page(s):
    559-566

    In the deep-submicron era, wire delay is becoming a bottleneck in the pursuit of higher system clock speeds. Several distributed register (DR) architectures have been proposed to cope with this problem by keeping most wires local. In this article, we propose the distributed register-file microarchitecture with inter-island delay (DRFM-IID). Although DRFM-IID is also a DR-based architecture, its delay model makes it more practical than the previously proposed DRFM. With inter-island delay taken into account, the synthesis task is inherently more complicated than without it, since uncertain interconnect latency is very likely to seriously affect overall system performance. We therefore also develop a performance-driven architectural synthesis framework targeting DRFM-IID. Several quality-of-result factors, such as the number of inter-island transfers, the timing criticality of transfers, and resource utilization balance, are used as guidance during architectural synthesis to obtain better optimization outcomes. The experimental results show that the latency and the number of inter-cluster transfers are reduced by 26.9% and 37.5% on average, respectively; the latter is commonly regarded as an indicator of the power consumption of on-chip communication.
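    The number of inter-island transfers, one of the quality factors named above, can be illustrated on a small data-flow graph. The sketch assumes a hypothetical representation (edges as `(producer, consumer)` pairs and a `binding` dict from operation to island) and the simplifying rule that a value needs one transfer per other island that consumes it; it is not the paper's synthesis framework.

```python
def count_inter_island_transfers(edges, binding):
    """Count values that must move between islands.

    A value produced on one island needs one transfer per *other* island
    that consumes it, however many consumers sit on that island.
    """
    needed = {
        (src, binding[dst])
        for src, dst in edges
        if binding[src] != binding[dst]
    }
    return len(needed)


edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("c", "f")]
binding = {"a": 0, "b": 1, "c": 0, "d": 0, "e": 1, "f": 1}
print(count_inter_island_transfers(edges, binding))  # 2
```

    Here `b` must be sent to island 0, and `c` is sent once to island 1 even though both `e` and `f` consume it there.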

  • Performance Optimization of Time Delay Estimation Based on Chirp Spread Spectrum Using ESPRIT

    Seong-Hyun JANG  Yeong-Sam KIM  Sang-Hoon YOON  Jong-Wha CHONG  

     
    LETTER-Sensing

      Vol:
    E94-B No:2
      Page(s):
    607-609

    In this letter, we analyze how the size of the observed data affects the performance of time delay estimation (TDE) in the chirp spread spectrum (CSS) system. By adjusting the size of the observed data, we reduce the effect of DC offsets, which would otherwise degrade CSS-based TDE, and thereby optimize TDE performance in the CSS system. Finally, we derive the optimal size of observed data for TDE in the CSS system.

  • DAC: A Device-Aware Cache Management Algorithm for Heterogeneous Mobile Storage Systems

    Young-Jin KIM  Jihong KIM  

     
    PAPER-System Programs

      Vol:
    E91-D No:12
      Page(s):
    2818-2833

    In recent years, heterogeneous devices have frequently been employed in mobile storage systems, because combining such devices can yield a synergistically useful storage solution that takes advantage of each device. One important design constraint in heterogeneous storage systems is mitigating the I/O performance degradation that stems from the different access times of different devices. However, there has not been much work on devising buffer cache management algorithms suited to this goal. This paper presents a novel buffer cache management algorithm that considers both per-device I/O cost and workload patterns in mobile computing systems with a heterogeneous storage pair of a hard disk and NAND flash memory. To minimize the total I/O cost under varying workload patterns, the proposed algorithm employs a dynamic cache partitioning technique across the devices and manages each partition according to request patterns and I/O types as well as temporal locality. Trace-based simulations show that the proposed algorithm significantly reduces the total I/O cost and flash write count compared with existing buffer cache algorithms on typical mobile traces.
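    The DAC algorithm itself is not specified in the abstract. The sketch below only illustrates the general notion of device-aware eviction under stated assumptions: one LRU partition per device, a hypothetical fixed `miss_cost` per device, and a victim taken from the partition whose misses are cheaper. Unlike the paper's dynamic partitioning, it ignores write costs and does not adapt to workload patterns; the class name `CostAwareCache` is invented for illustration.

```python
from collections import OrderedDict


class CostAwareCache:
    """Buffer cache with one LRU partition per device (hypothetical sketch).

    On a miss with a full cache, the victim is the LRU block of the non-empty
    partition whose device has the cheapest miss cost, so blocks that are
    expensive to re-read (e.g. from a hard disk) tend to stay cached longer.
    """

    def __init__(self, capacity, miss_cost):
        self.capacity = capacity
        self.miss_cost = miss_cost  # assumed cost per miss, e.g. {"hdd": 10.0, "flash": 1.0}
        self.parts = {dev: OrderedDict() for dev in miss_cost}
        self.total_cost = 0.0

    def _size(self):
        return sum(len(p) for p in self.parts.values())

    def access(self, device, block):
        """Return True on a hit, False on a miss (which adds the device's miss cost)."""
        part = self.parts[device]
        if block in part:
            part.move_to_end(block)  # hit: mark as most recently used
            return True
        self.total_cost += self.miss_cost[device]
        if self._size() >= self.capacity:
            victim_dev = min(
                (d for d, p in self.parts.items() if p),
                key=lambda d: self.miss_cost[d],
            )
            self.parts[victim_dev].popitem(last=False)  # evict that partition's LRU block
        part[block] = True
        return False


cache = CostAwareCache(2, {"hdd": 10.0, "flash": 1.0})
for dev, blk in [("hdd", 1), ("flash", 7), ("hdd", 2), ("hdd", 1), ("flash", 7)]:
    cache.access(dev, blk)
print(cache.total_cost)  # 22.0
```

    A pure cost rule like this can starve the cheaper device's partition, which is one reason an adaptive scheme that also weighs recency and request patterns is needed.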

  • Adaptive Beamforming with Robustness against Both Finite-Sample Effects and Steering Vector Mismatches

    Jing-Ran LIN  Qi-Cong PENG  Qi-Shan HUANG  

     
    PAPER-Digital Signal Processing

      Vol:
    E89-A No:9
      Page(s):
    2356-2362

    A novel approach to robust adaptive beamforming (RABF) is presented in this paper, aiming at robustness against both finite-sample effects and steering vector mismatches. It belongs to the class of diagonal loading approaches, with the loading level determined by worst-case performance optimization. The proposed approach, however, is distinguished by two points. (1) It takes finite-sample effects into account and applies worst-case performance optimization not only to the constraints but also to the objective of the constrained quadratic problem; hence it is referred to as joint worst-case RABF (JW-RABF). (2) After some approximations, it yields a simple closed-form solution for the optimal loading, revealing how different factors affect the loading. Compared with many existing methods in this field, the proposed approach achieves better robustness for small sample sizes as well as for steering vector mismatches. Moreover, it is less computationally demanding because it provides a simple closed-form solution for the optimal loading. Numerical examples confirm the effectiveness of the proposed approach.
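    For reference, a diagonally loaded minimum-variance (MVDR) beamformer computes w = (R + λI)^-1 a / (a^H (R + λI)^-1 a), where R is the sample covariance, a the presumed steering vector, and λ the loading level. The sketch below implements that standard formula in pure Python for a hypothetical 6-element, half-wavelength uniform linear array with one strong interferer. The loading level `loading` is a fixed user choice here, not the closed-form JW-RABF loading derived in the paper, and all scenario numbers are assumptions.

```python
import cmath
import math
import random


def solve(A, b):
    """Solve A x = b (small complex system) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def steering(n, theta_deg):
    """Steering vector of an n-element ULA with half-wavelength spacing."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * i * s) for i in range(n)]


def loaded_mvdr_weights(snapshots, a, loading):
    """w = (R + loading*I)^-1 a / (a^H (R + loading*I)^-1 a), R = sample covariance."""
    n, k = len(a), len(snapshots)
    R = [[sum(x[i] * x[j].conjugate() for x in snapshots) / k for j in range(n)]
         for i in range(n)]
    for i in range(n):
        R[i][i] += loading  # diagonal loading
    u = solve(R, a)
    denom = sum(a[i].conjugate() * u[i] for i in range(n))
    return [ui / denom for ui in u]


def response(w, a):
    """Array response w^H a."""
    return sum(wi.conjugate() * ai for wi, ai in zip(w, a))


# Assumed scenario: desired direction 0 deg, interferer (power 100) at 30 deg, unit noise.
rng = random.Random(0)
N, K = 6, 50
a_des, a_int = steering(N, 0.0), steering(N, 30.0)
snapshots = []
for _ in range(K):
    amp = 10.0 * complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
    noise = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(N)]
    snapshots.append([amp * a_int[i] + noise[i] for i in range(N)])

w = loaded_mvdr_weights(snapshots, a_des, loading=1.0)
print(abs(response(w, a_des)), abs(response(w, a_int)))
```

    By construction w^H a = 1 toward the presumed direction. A larger `loading` trades interference suppression for robustness to steering errors and finite-sample noise, which is the trade-off the loading level in the paper is chosen to balance.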

  • Increase in Delay Uncertainty by Performance Optimization

    Masanori HASHIMOTO  Hidetoshi ONODERA  

     
    LETTER-Timing Analysis

      Vol:
    E85-A No:12
      Page(s):
    2799-2802

    This paper discusses a statistical effect of performance optimization on the uncertainty in circuit delay. Performance optimization tends to balance the delays of the paths in a circuit; i.e., the delays of long paths are shortened and the delays of short paths are lengthened. In such path-balanced circuits, the uncertainty in circuit delay, caused by delay calculation error, manufacturing variability, fluctuation of operating conditions, etc., becomes worse because of a statistical characteristic of circuit delay. Thus, a highly optimized circuit may not satisfy its delay constraints. In this paper, we present examples in which path balancing increases the uncertainty in circuit delay, and we then raise the problem that performance optimization increases the statistically distributed circuit delay.
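    The statistical effect described above can be reproduced with a small Monte Carlo experiment. Circuit delay is modeled as the maximum over paths, each path being its nominal delay plus independent Gaussian noise; path independence, the delay values, and `sigma` are all assumptions (real paths share gates and are correlated). Both circuits have the same nominal critical delay of 10, yet the path-balanced one ends up with a larger mean and 99th-percentile delay.

```python
import random
import statistics


def circuit_delay_samples(nominal_delays, sigma, trials=20000, seed=1):
    """Sample circuit delay = max over paths of (nominal + N(0, sigma)), paths independent."""
    rng = random.Random(seed)
    return [
        max(d + rng.gauss(0.0, sigma) for d in nominal_delays)
        for _ in range(trials)
    ]


unbalanced = [10.0] + [8.0] * 9  # one critical path, nine with ample slack
balanced = [10.0] * 10           # after optimization every path is near-critical

for name, paths in (("unbalanced", unbalanced), ("balanced", balanced)):
    s = circuit_delay_samples(paths, sigma=0.5)
    p99 = statistics.quantiles(s, n=100)[98]  # 99th percentile
    print(name, round(statistics.mean(s), 2), round(p99, 2))
```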

  • A Performance Optimization Method by Gate Resizing Based on Statistical Static Timing Analysis

    Masanori HASHIMOTO  Hidetoshi ONODERA  

     
    PAPER-Performance Optimization

      Vol:
    E83-A No:12
      Page(s):
    2558-2568

    This paper discusses a gate resizing method for performance enhancement based on statistical static timing analysis. The proposed method focuses on timing uncertainties caused by local random fluctuation. Our method aims to remove both over-design and under-design of a circuit and to realize high-performance, high-reliability LSI design. The effectiveness of the method is examined on six benchmark circuits. We verify that our method can further reduce the delay of circuits that were already optimized for minimum delay without considering delay fluctuation.
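    The over-design that a statistical approach tries to remove can be seen on a single path. Assuming independent Gaussian gate delays (the model and numbers are illustrative, not the paper's), a corner-based estimate adds every gate's mean + 3σ, whereas a statistical estimate takes mean + 3σ of the summed delay, where variances add. The paper's gate resizing procedure is not reproduced.

```python
import math


def path_delay_corner(gates):
    """Worst-case corner: every gate at mean + 3 sigma simultaneously."""
    return sum(mu + 3 * sd for mu, sd in gates)


def path_delay_statistical(gates):
    """Mean + 3 sigma of the path sum, assuming independent Gaussian local fluctuation."""
    mu = sum(m for m, _ in gates)
    sd = math.sqrt(sum(s * s for _, s in gates))
    return mu + 3 * sd


gates = [(1.0, 0.1)] * 9  # nine identical gates: mean 1.0, sigma 0.1 (arbitrary units)
print(round(path_delay_corner(gates), 2))       # 11.7
print(round(path_delay_statistical(gates), 2))  # 9.9
```

    The gap (11.7 versus 9.9 here) grows with path depth, because independent variations partially cancel along a path while the corner estimate assumes they all align.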

  • A Performance Optimization Method for Pipelined ASIPs in Consideration of Clock Frequency

    Katsuya SHINOHARA  Norimasa OHTSUKI  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER

      Vol:
    E82-A No:11
      Page(s):
    2356-2365

    This paper proposes an ASIP performance optimization method that takes clock frequency into account. The performance of an instruction set processor can be measured by the execution time of an application program, which is the number of clock cycles needed to run the program divided by the applied clock frequency. Therefore, the clock frequency should also be tuned to maximize the performance of the processor under the given design constraints. Experimental results show that the proposed method determines an optimal combination of functional units (FUs) considering clock frequency.
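    The execution-time relation stated above (cycles divided by clock frequency) is enough to show why the configuration with the fewest cycles is not necessarily the fastest. The candidate configurations, their cycle counts, clock frequencies, and areas are invented for illustration, and the exhaustive selection under an area budget is a simplification, not the paper's optimization method.

```python
def exec_time_ms(c):
    # cycles / MHz gives microseconds; divide by 1000 for milliseconds.
    return c["cycles"] / c["f_mhz"] / 1000.0


def best_configuration(configs, max_area):
    """Among configs within the area budget, pick the one with the shortest execution time."""
    feasible = [c for c in configs if c["area"] <= max_area]
    return min(feasible, key=exec_time_ms)


# Hypothetical candidates: more FUs cut the cycle count but lengthen the critical path.
configs = [
    {"name": "1 ALU",        "cycles": 1_200_000, "f_mhz": 200.0, "area": 1.0},
    {"name": "2 ALUs",       "cycles":   800_000, "f_mhz": 160.0, "area": 1.6},
    {"name": "2 ALUs + MAC", "cycles":   600_000, "f_mhz": 100.0, "area": 2.5},
]
best = best_configuration(configs, max_area=2.0)
print(best["name"], exec_time_ms(best), "ms")  # 2 ALUs 5.0 ms
```

    Even with an unlimited area budget, the configuration with the fewest cycles ("2 ALUs + MAC", 6.0 ms) loses to "2 ALUs" (5.0 ms) because of its lower clock frequency.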

  • A Performance-Oriented Simultaneous Placement and Global Routing Algorithm for Transport-Processing FPGAs

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E80-A No:10
      Page(s):
    1795-1806

    In the layout design of transport-processing FPGAs, not only must routing congestion be kept small, but the circuits implemented on them must also operate at a higher frequency. This paper extends a previously proposed simultaneous placement and global routing algorithm for transport-processing FPGAs, whose objective is to minimize routing congestion, and proposes a new algorithm in which the length of each critical signal path (path length) is limited to a specified upper bound (path length constraint). The algorithm is based on hierarchical bipartitioning of layout regions and of the sets of LUTs (look-up tables) to be placed. In each bipartitioning step, the algorithm first searches for paths with tighter path length constraints by estimating their path lengths. Second, the algorithm performs the bipartitioning so that the path lengths of critical paths can be reduced. The algorithm is applied to transport-processing circuits and compared with conventional approaches. The results demonstrate that the algorithm satisfies the path length constraints for 11 out of 13 circuits, although it increases routing congestion by 20% on average. After detailed routing, it achieves 100% routing for all circuits and decreases circuit delay by 23% on average.

  • A Simultaneous Technology Mapping, Placement, and Global Routing Algorithm for FPGAs with Path Delay Constraints

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E79-A No:3
      Page(s):
    321-329

    In this paper, we propose a new FPGA design algorithm, Maple-opt, in which technology mapping, placement, and global routing are executed so that the delay of each critical signal path in an input circuit is within a specified upper bound imposed on it. The basic algorithm of Maple-opt is top-down hierarchical bipartitioning of regions. Technology mapping onto FPGA logic blocks, their placement, and global routing are determined simultaneously in each hierarchical step. This simultaneity leads to a less congested layout for routing. In addition, Maple-opt computes a lower bound on the delay of each path with a constraint value and dynamically determines critical paths in each hierarchical step based on the difference between the lower bound and the constraint value. Two delay reduction processes are executed for the critical paths: routing delay reduction and logic-block delay reduction. Routing delay reduction is realized by assigning each constrained path to one subregion when bipartitioning a region. Logic-block delay reduction is realized by mapping each constrained path onto fewer logic blocks. Experimental results for some benchmark circuits show its efficiency and effectiveness.
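    The critical-path test described in the last entry (compare each path's delay lower bound with its constraint) can be sketched as follows. The function name `critical_paths`, the `margin` parameter, and the example numbers are hypothetical; in Maple-opt the lower bounds would be recomputed at every hierarchical step, so the critical set changes as partitioning proceeds.

```python
def critical_paths(lower_bounds, constraints, margin=0.0):
    """Return path ids whose slack (constraint - lower bound) <= margin, tightest first.

    A negative slack means the constraint can no longer be met even in the best case.
    """
    slack = {p: constraints[p] - lower_bounds[p] for p in constraints}
    crit = [p for p, s in slack.items() if s <= margin]
    return sorted(crit, key=lambda p: slack[p])


lower = {"p1": 8.0, "p2": 5.0, "p3": 9.5}   # assumed delay lower bounds
limit = {"p1": 10.0, "p2": 12.0, "p3": 10.0}  # assumed path delay constraints
print(critical_paths(lower, limit, margin=2.0))  # ['p3', 'p1']
```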