IEICE global.ieice.org Site

Keyword Search Result

[Keyword] performance optimization(14hit)

1-14hit

Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization
Koichi SHIRAHATA Amir HADERBACHE Naoto FUKUMOTO Kohta NAKASHIMA

BRIEF PAPER

Publicized:
2020/12/01
Vol:
E104-C No:6
Page(s):
257-260
Scalability of distributed DNN training can be limited by slowdown of specific processes due to unexpected hardware failures. We propose a dynamic process exclusion technique so that training throughput is maximized. Our evaluation using 32 processes with ResNet-50 shows that our proposed technique reduces slowdown by 12.5% to 50% without accuracy loss through excluding the slow processes.
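The trade-off behind process exclusion can be illustrated with a small sketch. It is not the authors' implementation: it assumes fully synchronous data-parallel training in which each step lasts as long as the slowest active process, and the function name `choose_exclusions` and the per-process step times are hypothetical.

```python
def choose_exclusions(step_times, samples_per_process=1):
    """Return (excluded_indices, throughput) maximizing samples per second.

    Assumption: every synchronous step lasts as long as the slowest
    active process, so throughput = active * samples / max(step time).
    """
    # Rank processes from fastest to slowest.
    order = sorted(range(len(step_times)), key=lambda i: step_times[i])
    best_excluded, best_throughput = [], 0.0
    # Try keeping the k fastest processes for every k >= 1.
    for k in range(1, len(order) + 1):
        active = order[:k]
        slowest = step_times[active[-1]]
        throughput = k * samples_per_process / slowest
        # On ties, prefer keeping more processes active.
        if throughput >= best_throughput:
            best_excluded, best_throughput = sorted(order[k:]), throughput
    return best_excluded, best_throughput


# Hypothetical cluster: 31 healthy processes and one running at half speed.
times = [1.0] * 31 + [2.0]
excluded, throughput = choose_exclusions(times)
print(excluded, throughput)
```

In this toy setting, dropping the single slow process is worthwhile because the remaining 31 processes no longer wait for it at every step.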
Cooperative Path Selection Framework for Effective Data Gathering in UAV-Aided Wireless Sensor Networks
Sotheara SAY Mohamad Erick ERNAWAN Shigeru SHIMAMOTO

PAPER

Vol:
E99-B No:10
Page(s):
2156-2167
Sensor networks are often used to understand underlying phenomena that are reflected through sensing data. In real world applications, this understanding supports decision makers attempting to access a disaster area or monitor a certain event regularly and thus necessary actions can be triggered in response to the problems. Practitioners designing such systems must overcome difficulties due to the practical limitations of the data and the fidelity of a network condition. This paper explores the design of a network solution for the data acquisition domain with the goal of increasing the efficiency of data gathering efforts. An unmanned aerial vehicle (UAV) is introduced to address various real-world sensor network challenges such as limited resources, lack of real-time representative data, and mobility of a relay station. Towards this goal, we introduce a novel cooperative path selection framework to effectively collect data from multiple sensor sources. The framework consists of six main parts ranging from the system initialization to the UAV data acquisition. The UAV data acquisition is useful to increase situational awareness or used as inputs for data manipulation that support response efforts. We develop a system-based simulation that creates the representative sensor networks and uses the UAV for collecting data packets. Results using our proposed framework are analyzed and compared to existing approaches to show the efficiency of the scheme.
A New Non-Uniform Weight-Updating Beamformer for LEO Satellite Communication
Jie LIU Zhuochen XIE Huijie LIU Zhengmin ZHANG

LETTER-Digital Signal Processing

Vol:
E99-A No:9
Page(s):
1708-1711
In this paper, a new non-uniform weight-updating scheme for adaptive digital beamforming (DBF) is proposed. The unique feature of the letter is that the effective working range of the beamformer is extended and the computational complexity is reduced by introducing the robust DBF based on worst-case performance optimization. The robust parameter for each weight updating is chosen by analyzing the changing rate of the Direction of Arrival (DOA) of desired signal in LEO satellite communication. Simulation results demonstrate the improved performance of the new Non-Uniform Weight-Updating Beamformer (NUWUB).
Time Performance Optimization and Resource Conflicts Resolution for Multiple Project Management
Cong LIU Jiujun CHENG Yirui WANG Shangce GAO

PAPER-Software Engineering

Publicized:
2015/12/04
Vol:
E99-D No:3
Page(s):
650-660
Time performance optimization and resource conflict resolution are two important challenges in multiple project management contexts. Compared with traditional project management, multi-project management usually suffers from limited and insufficient resources and from tight, urgent deadlines for finishing all concurrent projects. In this case, time performance optimization of the global project management is badly needed. To the best of our knowledge, existing work seldom pays attention to the formal modeling and analysis of multi-project management in an effort to eliminate resource conflicts and optimize the project execution time. This work proposes such a method based on PRT-Net, which is a Petri net-based formalism tailored for a kind of project constrained by resource and time. The detailed modeling approaches based on PRT-Net are first presented. Then, a resource conflict detection method with a corresponding algorithm is proposed. Next, the priority criteria, including a key-activity priority strategy and a waiting-short priority strategy, are presented to resolve resource conflicts. Finally, we show how to construct a conflict-free PRT-Net by designing resource conflict resolution controllers. Through experiments, we show that our proposed priority strategy makes the execution time of global multiple projects much shorter than that obtained without using any strategy.
Design of Interpolated Pipeline ADC Using Low-Gain Open-Loop Amplifiers
Hyunui LEE Masaya MIYAHARA Akira MATSUZAWA

PAPER

Vol:
E96-C No:6
Page(s):
838-849
This paper describes the design of an interpolated pipeline analog-to-digital converter (ADC). By introducing the interpolation technique into the conventional pipeline topology, it becomes possible to realize an ADC with more than 10-bit resolution and a sampling rate of several hundred MS/s using low-gain open-loop amplifiers without any multiplying digital-to-analog converter (MDAC) calibration. In this paper, the linearity requirement of the amplifier is first analyzed in relation to the reference range and stage resolution. The noise characteristic is also discussed in terms of the amplifier's noise bandwidth and load capacitance. After that, the sampling speed and SNR characteristic are examined for various amplifier currents. Next, the resolution optimization of the pipeline stage is discussed based on the power consumption. Through the analysis, reasonable parameters for the amplifier can be defined, such as transconductance, source degeneration resistance and load capacitance. Also, the optimized operating speed and stage resolution for the interpolated pipeline ADC are shown. The analysis in this paper is valuable to both the design of interpolated pipeline ADCs and other circuits which incorporate interpolation and amplifiers.
Performance-Driven Architectural Synthesis for Distributed Register-File Microarchitecture with Inter-Island Delay
Juinn-Dar HUANG Chia-I CHEN Wan-Ling HSU Yen-Ting LIN Jing-Yang JOU

PAPER-VLSI Design Technology and CAD

Vol:
E95-A No:2
Page(s):
559-566
In the deep-submicron era, wire delay is becoming a bottleneck in the pursuit of higher system clock speed. Several distributed register (DR) architectures have been proposed to cope with this problem by keeping most wires local. In this article, we propose the distributed register-file microarchitecture with inter-island delay (DRFM-IID). Though DRFM-IID is also one of the DR-based architectures, it is considered more practical than the previously proposed DRFM in terms of its delay model. With such delay consideration, the synthesis task is inherently more complicated than the one without inter-island delay concern, since uncertain interconnect latency is very likely to seriously impact the whole system performance. Therefore, we also develop a performance-driven architectural synthesis framework targeting DRFM-IID. Several factors for evaluating the quality of results, such as the number of inter-island transfers, the timing-criticality of transfers, and resource utilization balancing, are adopted as guidance while performing architectural synthesis for better optimization outcomes. The experimental results show that the latency and the number of inter-cluster transfers can be reduced by 26.9% and 37.5% on average, respectively; the latter is commonly regarded as an indicator of the power consumption of on-chip communication.
Performance Optimization of Time Delay Estimation Based on Chirp Spread Spectrum Using ESPRIT
Seong-Hyun JANG Yeong-Sam KIM Sang-Hoon YOON Jong-Wha CHONG

LETTER-Sensing

Vol:
E94-B No:2
Page(s):
607-609
In this letter, we analyze the effect of the size of observed data on the performance of time delay estimation (TDE) in the chirp spread spectrum (CSS) system. By adjusting the size of observed data, we reduce the effect of DC offsets, which would otherwise degrade the performance of TDE based on CSS, and we optimize the performance of TDE in CSS system. Finally, we derive the optimal size of observed data of TDE in CSS system.
DAC: A Device-Aware Cache Management Algorithm for Heterogeneous Mobile Storage Systems
Young-Jin KIM Jihong KIM

PAPER-System Programs

Vol:
E91-D No:12
Page(s):
2818-2833
In recent years, heterogeneous devices have been employed frequently in mobile storage systems because a combination of such devices can supply a synergistically useful storage solution by taking advantage of each device. One important design constraint in heterogeneous storage systems is to mitigate I/O performance degradation stemming from the difference between access times of different devices. To this end, there has not been much work to devise proper buffer cache management algorithms. This paper presents a novel buffer cache management algorithm which considers both I/O cost per device and workload patterns in mobile computing systems with a heterogeneous storage pair of a hard disk and a NAND flash memory. In order to minimize the total I/O cost under varying workload patterns, the proposed algorithm employs a dynamic cache partitioning technique over different devices and manages each partition according to request patterns and I/O types along with the temporal locality. Trace-based simulations show that the proposed algorithm reduces the total I/O cost and flash write count significantly over the existing buffer cache algorithms on typical mobile traces.
Adaptive Beamforming with Robustness against Both Finite-Sample Effects and Steering Vector Mismatches
Jing-Ran LIN Qi-Cong PENG Qi-Shan HUANG

PAPER-Digital Signal Processing

Vol:
E89-A No:9
Page(s):
2356-2362
A novel approach of robust adaptive beamforming (RABF) is presented in this paper, aiming at robustness against both finite-sample effects and steering vector mismatches. It belongs to the class of diagonal loading approaches with the loading level determined based on worst-case performance optimization. The proposed approach, however, is distinguished by two points. (1) It takes finite-sample effects into account and applies worst-case performance optimization to not only the constraints, but also the objective of the constrained quadratic equation, for which it is referred to as joint worst-case RABF (JW-RABF). (2) It suggests a simple closed-form solution to the optimal loading after some approximations, revealing how different factors affect the loading. Compared with many existing methods in this field, the proposed one achieves better robustness in the case of small sample data size as well as steering vector mismatches. Moreover, it is less computationally demanding for presenting a simple closed-form solution to the optimal loading. Numerical examples confirm the effectiveness of the proposed approach.
Increase in Delay Uncertainty by Performance Optimization
Masanori HASHIMOTO Hidetoshi ONODERA

LETTER-Timing Analysis

Vol:
E85-A No:12
Page(s):
2799-2802
This paper discusses a statistical effect of performance optimization on uncertainty in circuit delay. Performance optimization has the effect of balancing the delay of each path in a circuit, i.e. the delay times of long paths are shortened and the delay times of short paths are lengthened. In these path-balanced circuits, the uncertainty in circuit delay, which is caused by delay calculation error, manufacturing variability, fluctuation of operating conditions, etc., becomes worse because of a statistical characteristic of circuit delay. Thus, a highly-optimized circuit may not satisfy delay constraints. In this paper, we demonstrate some examples in which uncertainty in circuit delay is increased by path-balancing, and we then raise the problem that performance optimization increases statistically-distributed circuit delay.
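The statistical effect described in this abstract can be reproduced with a toy Monte Carlo experiment. This is only a sketch under stated assumptions, not the authors' analysis: circuit delay is taken as the maximum over independent path delays, each path delay is its nominal value scaled by a Gaussian factor, and the function name `mean_critical_delay`, the path counts, and the 5% relative deviation are all hypothetical.

```python
import random


def mean_critical_delay(nominal_delays, sigma=0.05, trials=2000, seed=0):
    """Estimate the mean circuit delay (max over paths) by Monte Carlo.

    Assumption: each path delay is its nominal value scaled by an
    independent Gaussian factor with relative standard deviation sigma.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(d * (1.0 + rng.gauss(0.0, sigma)) for d in nominal_delays)
    return total / trials


# Both circuits have the same nominal critical delay of 1.0.
unbalanced = [1.0] + [0.6] * 19   # one long path, many short paths
balanced = [1.0] * 20             # every path tuned to the same delay

print(mean_critical_delay(unbalanced))
print(mean_critical_delay(balanced))
```

Although the nominal critical delay is identical, the balanced circuit shows a larger mean delay, because any one of its 20 near-critical paths can turn out to be the slowest.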
A Performance Optimization Method by Gate Resizing Based on Statistical Static Timing Analysis
Masanori HASHIMOTO Hidetoshi ONODERA

PAPER-Performance Optimization

Vol:
E83-A No:12
Page(s):
2558-2568
This paper discusses a gate resizing method for performance enhancement based on statistical static timing analysis. The proposed method focuses on timing uncertainties caused by local random fluctuation. Our method aims to remove both over-design and under-design of a circuit, and realize high-performance and high-reliability LSI design. The effectiveness of our method is examined by 6 benchmark circuits. We verify that our method can reduce the delay time further from the circuits optimized for minimizing the delay without the consideration of delay fluctuation.
A Performance Optimization Method for Pipelined ASIPs in Consideration of Clock Frequency
Katsuya SHINOHARA Norimasa OHTSUKI Yoshinori TAKEUCHI Masaharu IMAI

PAPER

Vol:
E82-A No:11
Page(s):
2356-2365
This paper proposes an ASIP performance optimization method taking clock frequency into account. The performance of an instruction set processor can be measured using the execution time of an application program, which can be determined by the clock cycles to perform the application program divided by the applied clock frequency. Therefore, the clock frequency should also be tuned in order to maximize the performance of the processor under the given design constraints. Experimental results show that the proposed method determines an optimal combination of FUs considering clock frequency.
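The relation stated in this abstract, execution time equals clock cycles divided by clock frequency, can be made concrete with a short sketch. The candidate configurations and their numbers are hypothetical and are not taken from the paper; they only illustrate why the configuration with the fewest cycles is not necessarily the fastest.

```python
def execution_time(cycles, clock_hz):
    """Execution time in seconds: clock cycles divided by clock frequency."""
    return cycles / clock_hz


# Hypothetical candidates: more functional units cut the cycle count
# but may lengthen the critical path and lower the achievable clock.
candidates = {
    "small":  {"cycles": 1_200_000, "clock_hz": 200e6},
    "medium": {"cycles": 900_000, "clock_hz": 180e6},
    "large":  {"cycles": 800_000, "clock_hz": 120e6},
}

# Pick the configuration with the shortest execution time.
best = min(candidates, key=lambda name: execution_time(**candidates[name]))
for name, cfg in candidates.items():
    print(name, execution_time(**cfg))
print("best:", best)
```

Here the "large" configuration needs the fewest cycles, yet "medium" finishes first once the lower clock frequency of "large" is taken into account.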
A Performance-Oriented Simultaneous Placement and Global Routing Algorithm for Transport-Processing FPGAs
Nozomu TOGAWA Masao SATO Tatsuo OHTSUKI

PAPER

Vol:
E80-A No:10
Page(s):
1795-1806
In layout design of transport-processing FPGAs, it is required not only that routing congestion is kept small but also that circuits implemented on them operate at a higher operating frequency. This paper extends the proposed simultaneous placement and global routing algorithm for transport-processing FPGAs, whose objective is to minimize routing congestion, and proposes a new algorithm in which the length of each critical signal path (path length) is limited within a specified upper bound imposed on it (path length constraint). The algorithm is based on hierarchical bipartitioning of layout regions and LUT (Look Up Table) sets to be placed. In each bipartitioning, the algorithm first searches for the paths with tighter path length constraints by estimating their path lengths. Second, the algorithm proceeds with the bipartitioning so that the path lengths of critical paths can be reduced. The algorithm is applied to transport-processing circuits and compared with conventional approaches. The results demonstrate that the algorithm satisfies the path length constraints for 11 out of 13 circuits, though it increases routing congestion by an average of 20%. After detailed routing, it achieves 100% routing for all the circuits and decreases circuit delay by an average of 23%.
A Simultaneous Technology Mapping, Placement, and Global Routing Algorithm for FPGAs with Path Delay Constraints
Nozomu TOGAWA Masao SATO Tatsuo OHTSUKI

PAPER

Vol:
E79-A No:3
Page(s):
321-329
In this paper, we propose a new FPGA design algorithm, Maple-opt, in which technology mapping, placement, and global routing are executed so that the delay of each critical signal path in an input circuit is within a specified upper bound imposed on it. The basic algorithm of Maple-opt is top-down hierarchical bi-partitioning of regions. Technology mapping onto logic-blocks of FPGAs, their placement, and global routing are determined simultaneously in each hierarchical process. This simultaneity leads to a less congested layout for routing. In addition, Maple-opt computes a lower bound of delay for each path with a constraint value and determines critical paths dynamically in each hierarchical process, based on the difference between the lower bound and the constraint value. Two delay reduction processes are executed for the critical paths; one is routing delay reduction and the other is logic-block delay reduction. Routing delay reduction is realized such that, when bi-partitioning a region, each constrained path is assigned to one subregion. Logic-block delay reduction is realized such that each constrained path is mapped onto fewer logic-blocks. Experimental results for some benchmark circuits show its efficiency and effectiveness.
