Hang Liu Fei Wu
Keiji GOTO Toru KAWANO Ryohei NAKAMURA
Takahiro SASAKI Yukihiro KAMIYA
Xiang XIONG Wen LI Xiaohua TAN Yusheng HU
Anton WIDARTA
Hiroshi OKADA Mao FUKINAKA Yoshiki AKIRA
Shun-ichiro Ohmi
Tohgo HOSODA Kazuyuki SAITO
Shohei Matsuhara Kazuyuki Saito Tomoyuki Tajima Aditya Rakhmadi Yoshiki Watanabe Nobuyoshi Takeshita
Koji Abe Mikiya Kuzutani Satoki Furuya Jose A. Piedra-Lorenzana Takeshi Hizawa Yasuhiko Ishikawa
Yihan ZHU Takashi OHSAWA
Shengbao YU Fanze MENG Yihan SHEN Yuzhu HAO Haigen ZHOU
Ryo KUMAGAI Ryosuke SUGA Tomoki UWANO
Jun SONODA Kazusa NAKAMICHI
Kaiji Owaki Yusuke Kanda Hideaki Kimura
Takuya FUJIMOTO
Yuji Wada
Fuyuki Kihara Chihiro Matsui Ken Takeuchi
Keito YUASA Michihiro IDE Sena KATO Kenichi OKADA Atsushi SHIRANE
Tomoo Ushio Yuuki Wada Syo Yoshida
Futoshi KUROKI
Jun FURUTA Shotaro SUGITANI Ryuichi NAKAJIMA Takafumi ITO Kazutoshi KOBAYASHI
Yuya Ichikawa Ayumu Yamada Naoko Misawa Chihiro Matsui Ken Takeuchi
Ayumu Yamada Zhiyuan Huang Naoko Misawa Chihiro Matsui Ken Takeuchi
Yoshinori ITOTAGAWA Koma ATSUMI Hikaru SEBE Daisuke KANEMOTO Tetsuya HIROSE
Hikaru SEBE Daisuke KANEMOTO Tetsuya HIROSE
Zhibo CAO Pengfei HAN Hongming LYU
Takuya SAKAMOTO Itsuki IWATA Toshiki MINAMI Takuya MATSUMOTO
Koji YAMANAKA Kazuhiro IYOMASA Takumi SUGITANI Eigo KUWATA Shintaro SHINJO
Minoru MIZUTANI Takashi OHIRA
Katsumi KAWAI Naoki SHINOHARA Tomohiko MITANI
Baku TAKAHARA Tomohiko MITANI Naoki SHINOHARA
Akihiko ISHIWATA Yasumasa NAKA Masaya TAMURA
Atsushi Fukuda Hiroto Yamamoto Junya Matsudaira Sumire Aoki Yasunori Suzuki
Ting DING Jiandong ZHU Jing YANG Xingmeng JIANG Chengcheng LIU
Fan Liu Zhewang Ma Masataka Ohira Dongchun Qiao Guosheng Pu Masaru Ichikawa
Ludovico MINATI
Minoru Fujishima
Hyunuk AHN Akito IGUCHI Keita MORIMOTO Yasuhide TSUJI
Kensei ITAYA Ryosuke OZAKI Tsuneki YAMASAKI
Akira KAWAHARA Jun SHIBAYAMA Kazuhiro FUJITA Junji YAMAUCHI Hisamatsu NAKANO
Seiya Kishimoto Ryoya Ogino Kenta Arase Shinichiro Ohnuki
Yasuo OHTERA
Tomohiro Kumaki Akihiko Hirata Tubasa Saijo Yuma Kawamoto Tadao Nagatsuma Osamu Kagaya
Haonan CHEN Akito IGUCHI Yasuhide TSUJI
Keiji GOTO Toru KAWANO Munetoshi IWAKIRI Tsubasa KAWAKAMI Kazuki NAKAZAWA
Tongxin YANG Toshinori SATO Tomoaki UKEZONO
Addition is a fundamental function in many error-tolerant applications. Approximate addition is considered to be an efficient technique for trading off energy against performance and accuracy. This paper proposes a carry-maskable adder whose accuracy can be configured at runtime. The proposed scheme can dynamically select the length of the carry propagation to satisfy the quality requirements flexibly. Compared with a conventional ripple carry adder and a conventional carry look-ahead adder, the proposed 16-bit adder reduced the power consumption by 54.1% and 57.5%, respectively, and the critical path delay by 72.5% and 54.2%, respectively. In addition, results from an image processing application indicate that the quality of processed images can be controlled by the proposed adder. Evaluation results for a 32-bit version demonstrate that the proposed adder scales well.
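The idea of a runtime-selectable carry propagation length can be illustrated with a small behavioral sketch. This is not the circuit proposed in the paper: the function name `carry_masked_add`, the parameter `masked_bits`, and the choice of a bitwise OR for the masked low-order bits are all assumptions made for illustration only.

```python
def carry_masked_add(a: int, b: int, width: int = 16, masked_bits: int = 0) -> int:
    """Behavioral model of an adder whose low-order carries can be masked.

    Assumption (not from the paper): the lowest `masked_bits` bits are
    approximated with a bitwise OR, so no carry is generated or propagated
    there, and no carry enters the exactly added upper part.
    """
    if not 0 <= masked_bits <= width:
        raise ValueError("masked_bits must be between 0 and width")

    full_mask = (1 << width) - 1
    low_mask = (1 << masked_bits) - 1
    a &= full_mask
    b &= full_mask

    # Approximate part: carry-free OR of the masked low-order bits.
    low = (a | b) & low_mask
    # Accurate part: ordinary addition of the remaining upper bits.
    high = ((a >> masked_bits) + (b >> masked_bits)) << masked_bits

    # Result wraps modulo 2**width, like a fixed-width hardware adder.
    return (high | low) & full_mask


if __name__ == "__main__":
    for k in (0, 4, 8):
        print(k, carry_masked_add(0x00FF, 0x0001, width=16, masked_bits=k))
```

With `masked_bits=0` the model behaves as an exact adder; increasing `masked_bits` shortens the carry chain at the cost of larger errors in the low-order bits.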
Ken NAKAMURA Daisuke KOBAYASHI Yuya OMORI Tatsuya OSAWA Takayuki ONISHI Koyo NITTA Hiroe IWASAKI
In this paper, we describe a novel low-delay 4K 120-fps real-time HEVC decoder with a parallel processing architecture that conforms to the HEVC main 4:2:2 10 profile. It supports the hierarchical temporal scalable streams required for Ultra High Definition high-frame-rate broadcasting and also supports low-delay and high-bitrate decoding for video transmission uses. To achieve this support, the decoding processes are parallelized and pipelined at the frame level, slice level, and coding tree unit row level. The proposed decoder was implemented on three FPGAs operated at 133 and 150 MHz, and it achieved 300-Mbps stream decoding and 37-msec end-to-end delay with our concurrently developed 4K 120-fps encoder.
Boma A. ADHI Tomoya KASHIMATA Ken TAKAHASHI Keiji KIMURA Hironori KASAHARA
The advancement of multicore technology has made it possible to integrate hundreds or even thousands of processor cores on a single chip. However, on larger-scale multicores, a hardware-based cache coherency mechanism becomes overwhelmingly complicated, hot, and expensive. Therefore, we propose a software coherence scheme managed by a parallelizing compiler for shared-memory multicore systems without a hardware cache coherence mechanism. Our proposed method is simple and efficient. It is built into the OSCAR automatic parallelizing compiler. The OSCAR compiler parallelizes coarse-grain tasks, analyzes stale data and line sharing in the program, and then solves those problems by simple program restructuring and data synchronization. Using our proposed method, we compiled 10 benchmark programs from SPEC2000, SPEC2006, NAS Parallel Benchmark (NPB), and MediaBench II. The compiled binaries were then run on the Renesas RP2, an 8-core SH-4A processor, and on a custom 8-core Altera Nios II system on an Altera Arria 10 FPGA. The cache coherence hardware on the RP2 processor is only available for up to 4 cores. The RP2's cache coherence hardware can also be turned off for a non-coherent cache mode. The Nios II multicore system does not have any hardware cache coherence mechanism; therefore, running a parallel program is difficult without compiler support. The proposed method performed as well as or better than the hardware cache coherence scheme while still producing the same correct results as the hardware coherence mechanism. This method allows a massive array of shared-memory CPU cores in an HPC setting or a simple non-coherent multicore embedded CPU to be easily programmed. For example, on the RP2 processor, the proposed software-controlled non-coherent-cache (NCC) method gave a 2.6 times speedup for SPEC 2000 “equake” with 4 cores against sequential execution, whereas MESI hardware coherence control with 4 cores gave only a 2.5 times speedup. Also, the software coherence control gave a 4.4 times speedup for 8 cores, for which no hardware coherence mechanism is available.
Yoshitake OKI Yuto ABE Kazuki YAMAMOTO Kohei YAMAMOTO Tomoya SHIRAKAWA Akimasa YOSHIDA Keiji KIMURA Hironori KASAHARA
In systems with multi-core processors, ranging from real-time embedded systems to high-performance systems, utilization of local memory has become an important factor for satisfying hard deadline constraints. However, challenges lie in efficiently managing the memory hierarchy, such as decomposing large data into small blocks that fit onto local memory and transferring blocks for reuse and replacement. To address this issue, this paper presents a compiler optimization method that automatically manages the local memory of multi-core processors. The method selects and maps multi-dimensional data onto software-specified memory blocks called Adjustable Blocks. These blocks are hierarchically divisible, with varying sizes defined by the features of the input application. Moreover, the method introduces mapping structures called Template Arrays to maintain the indices of the decomposed multi-dimensional data. The proposed work is implemented on the OSCAR automatic parallelizing compiler, and evaluations were performed on the Renesas RP2 8-core processor. Experimental results from NAS Parallel Benchmark, SPEC benchmark, and multimedia applications show the effectiveness of the method, with a maximum speed-up of 20.44 times for 8 cores utilizing local memory over single-core sequential versions that use off-chip memory.
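The general idea of decomposing multi-dimensional data into blocks that fit a fixed local memory can be sketched as follows. This is only a generic tiling illustration and does not reproduce the paper's Adjustable Blocks or Template Arrays; the function names `choose_block_shape` and `block_ranges`, the halving rule, and the capacity expressed in elements are all hypothetical.

```python
from itertools import product
from math import prod


def choose_block_shape(shape, capacity_elems):
    """Halve the largest dimension repeatedly until one block fits.

    Assumption: local memory capacity is given as a number of elements,
    and blocks are obtained by hierarchical halving of the full array.
    """
    block = list(shape)
    while prod(block) > capacity_elems:
        i = max(range(len(block)), key=lambda k: block[k])
        if block[i] == 1:
            raise ValueError("capacity too small for even a single element")
        block[i] = (block[i] + 1) // 2  # round up so odd sizes still shrink
    return tuple(block)


def block_ranges(shape, block):
    """Return, for every block, a tuple of (start, stop) ranges per dimension."""
    axes = [
        [(start, min(start + b, n)) for start in range(0, n, b)]
        for n, b in zip(shape, block)
    ]
    return list(product(*axes))


if __name__ == "__main__":
    shape = (64, 64)
    block = choose_block_shape(shape, capacity_elems=1024)
    print(block, len(block_ranges(shape, block)))
```

Each returned tuple of index ranges identifies one block that could be transferred to local memory, processed, and written back; a real compiler would also decide the transfer order and reuse, which this sketch leaves out.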
Hosang LEE Jawad YOUSAF Kwangho KIM Seongjin MUN Chanseok HWANG Wansoo NAH
This paper analyzes and compares two methods for estimating, at the circuit design stage, the electromagnetically coupled noise introduced to an antenna by nearby circuits. One method estimates the power spectrum, and the other estimates the active S11 parameter at the victim antenna; both use simulated standard S-parameters for the electromagnetic coupling in the circuit. They also need the assumed or measured excitation of the noise sources. To confirm the validity of the two methods, an evaluation board consisting of an antenna and noise sources was designed and fabricated, in which voltage-controlled oscillator (VCO) chips are placed as noise sources. The generated electromagnetic noise is transferred to the antenna via loop-shaped transmission lines, degrading the performance of the antenna. In this paper, detailed analysis procedures are described using the evaluation board, and it is shown that the two methods are equivalent to each other in terms of the induced voltages in the antenna. Finally, a procedure to estimate antenna performance degradation at the design stage is summarized.
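As a rough illustration of how standard S-parameters and port excitations combine at the victim antenna, the following sketch uses the commonly used definition of an active reflection coefficient, in which the outgoing wave at port 1 is b1 = Σk S1k·ak and the active S11 is b1/a1. Whether the paper uses exactly this formulation is an assumption; the function names and the example values are hypothetical and are not measured data.

```python
def outgoing_wave(s_row, a):
    """Outgoing wave b1 at port 1: sum over k of S_1k * a_k.

    s_row: complex S-parameters [S11, S12, ...] at one frequency.
    a:     complex incident waves [a1, a2, ...] at the same ports.
    """
    if len(s_row) != len(a):
        raise ValueError("s_row and a must have the same length")
    return sum(s * ak for s, ak in zip(s_row, a))


def active_s11(s_row, a):
    """Active S11 = b1 / a1 with all ports excited simultaneously."""
    if a[0] == 0:
        raise ZeroDivisionError("port 1 excitation a1 must be nonzero")
    return outgoing_wave(s_row, a) / a[0]


if __name__ == "__main__":
    # Illustrative numbers only: antenna at port 1, two noise sources.
    s_row = [0.1 + 0.0j, 0.02 + 0.0j, 0.03 + 0.0j]
    a = [1.0 + 0.0j, 0.5 + 0.0j, 0.5 + 0.0j]
    print(active_s11(s_row, a))
```

When only port 1 is excited, the active S11 reduces to the standard S11; the coupling terms S12, S13, and so on add the contribution of the noise sources.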
Kenshiro SATO Dondee NAVARRO Shinya SEKIZAKI Yoshifumi ZOKA Naoto YORINO Hans Jürgen MATTAUSCH Mitiko MIURA-MATTAUSCH
The degradation of the efficiency of a SiC-MOSFET-based DC-AC converter circuit due to aging of the electrically active devices is investigated. The newly developed compact aging model HiSIM_HSiC for high-voltage SiC-MOSFETs is used in the investigation. The model explicitly considers the carrier-trap-density increase in the solution of the Poisson equation. Measured converter characteristics during a 3-phase line-to-ground (3LG) fault are correctly reproduced by the model. It is verified that the MOSFETs experience additional stress due to the high biases occurring during the fault event, which translates to severe MOSFET aging. Simulation results predict a 0.5% reduction of converter efficiency due to a single 70-ms 3LG fault, which is equivalent to a year of operation under normal conditions, where no additional stress is applied. With the developed compact model, prediction of the efficiency degradation of the converter circuit under prolonged stress, for which measurements are difficult to obtain and typically not available, is also feasible.
Takamaru MATSUI Shouhei KIDERA
Here, we present a novel spectroscopic imaging method based on the boundary-extraction scheme for wide-beam terahertz (THz) three-dimensional imaging. Optical-lens-focusing systems for THz subsurface imaging generally require the depth of the object from the surface to be input beforehand to achieve the desired azimuth resolution. This limitation can be alleviated by incorporating a wide-beam THz transmitter into the synthetic aperture to automatically change the focusing depth in the post-signal processing. The range point migration (RPM) method has been demonstrated to have significant advantages in terms of imaging accuracy over the synthetic-aperture method. Moreover, in the RPM scheme, spectroscopic information can be easily associated with each scattering center. Thus, we propose an RPM-based terahertz spectroscopic imaging method. The finite-difference time-domain-based numerical analysis shows that the proposed algorithm provides accurate target boundary imaging associated with each frequency-dependent characteristic.