Yoshinobu HIGAMI Senling WANG Hiroshi TAKAHASHI Shin-ya KOBAYASHI Kewal K. SALUJA
In this paper, we propose a method to diagnose a bridging fault between a clock line and a gate signal line. Assuming that scan-based flush tests are applied, we perform fault simulation to deduce candidate faults. An analysis of the fault behavior reveals that faulty clock waveforms depend on the timing of the signal transition on the bridged gate signal line. In the fault simulation, a backward sensitized path tracing approach is introduced to calculate the timing of signal transitions. Experimental results show that the proposed method deduces candidate faults more accurately than our previous method.
Yutaka TAKAGI Takanori FUJISAWA Masaaki IKEHARA
In this paper, we propose a method for removing block noise which appears in JPEG (Joint Photographic Experts Group) encoded images. We iteratively perform 3D Wiener filtering and correction of the coefficients. In the Wiener filtering, we perform block matching for each patch in order to collect the patches that are highly similar to the reference patch. After Wiener filtering, the collected patches are returned to their original positions and aggregated. We compare the performance of the proposed method with that of some conventional methods, and show that the proposed method has excellent performance.
Tetsuki TANIGUCHI Yoshio KARASAWA
A massive multiple-input multiple-output (MIMO) communication system offers high-rate transmission and/or support of a large number of users by invoking the power of a large array antenna, but one of its problems is the heavy computational burden required for the design and signal processing. Assuming the utilization of a large array on the transmitter side and far fewer users than the maximum possible value, this paper first presents a subarray-based design approach for MIMO systems with a low computational load, taking into account efficient subarray grouping for the realization of higher performance; a large transmit array is first divided into subarrays based on channel gain or channel correlation, then block diagonalization is applied to each of them, and finally a large array weight is reconstructed by maximal ratio combining (MRC). In addition, the extension of the proposed method to a two-stage design is studied in order to support a larger number of users; in the process of reconstruction to a large array, subarrays are again divided into groups, and block diagonalization is applied to those subarray groups. Through computer simulations, it is shown that both the channel gain and correlation based grouping strategies are effective under certain conditions, and that the number of supported users can be increased by the two-stage design if a certain level of performance degradation is acceptable.
Hiroshi NISHIMOTO Akinori TAIRA Hiroki IURA Shigeru UCHIDA Akihiro OKAZAKI Atsushi OKAMURA
Massive multiple-input multiple-output (MIMO) technology is one of the key enablers in the fifth generation mobile communications (5G), in order to accommodate growing traffic demands and to utilize higher super high frequency (SHF) and extremely high frequency (EHF) bands. In this paper, we propose a novel transmit precoding named “nonlinear block multi-diagonalization (NL-BMD) precoding” for multiuser MIMO (MU-MIMO) downlink toward 5G. Our NL-BMD precoding strategy is composed of two essential techniques: block multi-diagonalization (BMD) and adjacent inter-user interference pre-cancellation (IUI-PC). First, as an extension of the conventional block diagonalization (BD) method, the linear BMD precoder for the desired user is computed to incorporate a predetermined number of interfering users, in order to ensure extra degrees of freedom at the transmit array even after null steering. Additionally, adjacent IUI-PC, as a nonlinear operation, is introduced to manage the residual interference partially allowed in BMD computation, with effectively reduced numerical complexity. Computer simulations reveal that the proposed NL-BMD precoding yields up to 67% performance improvement in average sum-rate spectral efficiency and enables large-capacity transmission regardless of the user distribution, compared with the conventional BD precoding.
Kai-Feng XIA Bin WU Tao XIONG Cheng-Ying CHEN
This paper presents a high-throughput sliding block Viterbi decoder for IEEE 802.11ac systems. A 64-state bidirectional sliding block Viterbi method is proposed to meet the speed requirement of the system. The decoder throughput reaches 640 Mbps, which can be further increased by adding block parallelism. Moreover, a modified add-compare-select (ACS) unit is designed to raise the working frequency. The modified ACS unit obtains a speed-up of nearly 26% compared to the conventional ACS unit, while the area overhead and power dissipation remain almost the same. The decoder is designed in a SMIC 0.13 µm technology; it occupies a core area of 1.96 mm² and consumes 105 mW, giving an energy efficiency of 0.1641 nJ/bit at a 1.2 V supply voltage.
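The reported energy efficiency follows directly from the power and throughput figures above. The short check below is only an arithmetic sketch; the function name `energy_per_bit_nj` is a hypothetical helper, not something taken from the paper.

```python
def energy_per_bit_nj(power_mw, throughput_mbps):
    """Energy per decoded bit in nJ/bit, from power (mW) and throughput (Mbps)."""
    power_w = power_mw * 1e-3            # mW -> W (J/s)
    throughput_bps = throughput_mbps * 1e6  # Mbps -> bit/s
    joules_per_bit = power_w / throughput_bps
    return joules_per_bit * 1e9          # J -> nJ


# Figures quoted in the abstract: 105 mW at 640 Mbps.
efficiency = energy_per_bit_nj(105, 640)
print(f"{efficiency:.4f} nJ/bit")
```

With the quoted 105 mW and 640 Mbps, the ratio is 0.1640625 nJ/bit, which rounds to the stated 0.1641 nJ/bit.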
Fumiyuki ADACHI Amnart BOONKAJAY Yuta SEKI Tomoyuki SAITO Shinya KUMAGAI Hiroyuki MIYAZAKI
In this paper, the recent advances in cooperative distributed antenna transmission (CDAT) are introduced for spatial diversity and multi-user spatial multiplexing in 5G mobile communications network. CDAT is an advanced version of the coordinated multi-point (CoMP) transmission. Space-time block coded transmit diversity (STBC-TD) for spatial diversity and minimum mean square error filtering combined with singular value decomposition (MMSE-SVD) for multi-user spatial multiplexing are described under the presence of co-channel interference from adjacent macro-cells. Blind selected mapping (blind SLM) which requires no side information transmission is introduced in order to suppress the increased peak-to-average signal power ratio (PAPR) of the transmit signals when CDAT is applied. Some computer simulation results are presented to confirm the effectiveness of CDAT techniques.
Ryuta KAWANO Hiroshi NAKAHARA Seiichi TADE Ikki FUJIWARA Hiroki MATSUTANI Michihiro KOIBUCHI Hideharu AMANO
Inter-switch networks for HPC systems and data-centers can be improved by applying random shortcut topologies with a reduced number of hops. With minimal routing in such networks, however, deadlock-freedom is not guaranteed. Multiple Virtual Channels (VCs) are commonly used to avoid this problem. However, previous works do not provide good trade-offs between the number of required VCs and the time and memory complexities of the algorithm. In this work, a novel and fast algorithm, named ACRO, is proposed to provide arbitrary routing functions with deadlock-freedom while consuming a small number of VCs. A heuristic approach to reducing VCs is implemented with a hash table, which improves the scalability of the algorithm compared with our previous work. Moreover, experimental results show that ACRO can reduce the average number of VCs by up to 63% when compared with a conventional algorithm that has the same time complexity. Furthermore, ACRO reduces the time complexity by a factor of O(|N|⋅log|N|) when compared with another conventional algorithm that requires almost the same number of VCs.
Mohamed TOLBA Ahmed ABDELKHALEK Amr M. YOUSSEF
Midori128 is a lightweight block cipher proposed at ASIACRYPT 2015 to achieve low energy consumption per bit. Currently, the best published impossible differential attack on Midori128 covers 10 rounds without the pre-whitening key. By exploiting the special structure of the S-boxes and the binary linear transformation layer in Midori128, we present impossible differential distinguishers that cover 7 full rounds including the mix column operations. Then, we exploit four of these distinguishers to launch a multiple impossible differential attack against 11 rounds of the cipher with the pre-whitening and post-whitening keys.
Yun WANG Makihiko KATSURAGI Kenichi OKADA Akira MATSUZAWA
This paper presents a 20-GHz differential push-push voltage controlled oscillator (VCO) for a 60-GHz frequency synthesizer. The 20-GHz VCO consists of a 10-GHz in-phase injection-coupled QVCO (IPIC-QVCO) with a tail filter and a differential-output push-push doubler for the 20-GHz output. The VCO is fabricated in 65-nm CMOS technology and achieves a tuning range of 3 GHz, from 17.5 GHz to 20.4 GHz, with a phase noise of -113.8 dBc/Hz at 1 MHz offset. The core oscillator consumes up to 71 mW, and an FoM of -180.2 dBc/Hz is achieved.
Yosuke OGASAWARA Ryuichi FUJIMOTO Tsuneo SUZUKI Kenichi SAMI
A novel spur cancelled clock generator (SCCG) capable of recovering RX sensitivity degradations caused by digital clocks in wireless SoCs is presented. Clock spurs that degrade RX sensitivities are canceled by applying the SCCG to digital circuits or ADCs. The SCCG is integrated into a Bluetooth Low Energy (BLE) SoC fabricated in a 65 nm CMOS process. A measured clock spur reduction of 34 dB and an RX sensitivity recovery of 5 dB are achieved by the proposed SCCG. The power consumption and occupied area of the SCCG are only 18 µW and 40 µm × 120 µm, respectively.
Masaya HASEGAWA Kazuki SAKASHITA Kousei UCHIKOSHI Shigeki HIROBAYASHI Tadanobu MISAWA
A digital image is often degraded by impulse noise that may occur during processes such as transmission. Impulse noise, also called salt-and-pepper noise, converts pixel values in the image into black (0) or white (255) at a random frequency. In this paper, we identify the details of pixels that have been damaged by impulse noise by analyzing the frequency of the noisy image using non-harmonic analysis (NHA). From experimental results, we can confirm that this method shows superior performance compared to the recent PSNR denoising method. In addition, we show that the proposed method is particularly superior in eliminating impulse noise in images with high noise rates.
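The noise model described above (pixels replaced by 0 or 255 at random) can be sketched as follows. This is only an illustration of the degradation, not of the NHA-based restoration; the function name `add_salt_and_pepper`, the `noise_rate` parameter, and the equal split between black and white are assumptions made for the example.

```python
import random


def add_salt_and_pepper(pixels, noise_rate, seed=None):
    """Return a copy of a 2-D list of 8-bit pixels corrupted by impulse noise.

    Each pixel is independently replaced, with probability `noise_rate`,
    by black (0) or white (255), chosen with equal probability (assumption).
    """
    rng = random.Random(seed)
    noisy = []
    for row in pixels:
        new_row = []
        for value in row:
            if rng.random() < noise_rate:
                new_row.append(rng.choice((0, 255)))  # damaged pixel
            else:
                new_row.append(value)                 # pixel left intact
        noisy.append(new_row)
    return noisy


image = [[100, 120, 140], [90, 110, 130]]
noisy_image = add_salt_and_pepper(image, noise_rate=0.5, seed=1)
```

A high `noise_rate` corresponds to the heavily corrupted images for which the abstract reports the largest gains.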
Seon Hwan KIM Ju Hee CHOI Jong Wook KWAK
In this letter, we propose a round-robin-based wear-leveling scheme (RRWL) for flash memory systems. RRWL uses a block erase table (BET), which is composed of a bit array and saves the erasure histories of blocks. BET can use one-to-one mode to increase the performance of wear leveling or one-to-many mode to reduce memory consumption. However, one-to-many mode decreases the accuracy of cold block information, which degrades the lifetime of flash memory. To solve this problem, RRWL consistently uses one-to-one mode based on a round-robin method to increase the accuracy of cold block identification, while keeping the memory size of BET as small as in one-to-many mode. Experiments show that RRWL increases the lifetime of flash memory by up to 47% and 14%, compared with BET and HaWL, respectively.
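The trade-off between the two BET modes can be illustrated with a small bit-array sketch. It models only the table itself, not the RRWL policy; the class name `BlockEraseTable` and the parameter `k` (one bit covers 2^k blocks, with k = 0 giving one-to-one mode) are assumptions made for this example.

```python
class BlockEraseTable:
    """Bit array recording which blocks have been erased (illustrative sketch).

    Each bit covers 2**k consecutive blocks: k = 0 is one-to-one mode,
    k > 0 is one-to-many mode (smaller table, coarser information).
    """

    def __init__(self, num_blocks, k=0):
        self.k = k
        self.num_bits = (num_blocks + (1 << k) - 1) >> k  # ceil(num_blocks / 2**k)
        self.bits = bytearray((self.num_bits + 7) // 8)

    def mark_erased(self, block):
        idx = block >> self.k
        self.bits[idx // 8] |= 1 << (idx % 8)

    def was_erased(self, block):
        idx = block >> self.k
        return bool(self.bits[idx // 8] & (1 << (idx % 8)))


one_to_one = BlockEraseTable(num_blocks=8, k=0)
one_to_many = BlockEraseTable(num_blocks=8, k=1)
one_to_one.mark_erased(2)
one_to_many.mark_erased(2)
print(one_to_one.was_erased(3), one_to_many.was_erased(3))
```

In one-to-many mode, erasing block 2 also flags its neighbor block 3, so a genuinely cold block can be misclassified; this is the loss of accuracy that the abstract attributes to one-to-many mode.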
We propose a Simulink model of a ring oscillator using saturating integrators. The oscillator's period is tuned via the saturation time of the integrators. Thus, timing jitters due to white and flicker noises are easily introduced into the model, enabling an efficient phase noise evaluation before transistor-level circuit design.
Hanxu YOU Lianqiang LI Jie ZHU
Compressive sensing (CS) theory has been widely used in synthetic aperture radar (SAR) imaging for its ability to reconstruct an image from a far smaller set of measurements than is generally considered necessary. However, block-based CS approaches in SAR imaging always produce visible boundaries between adjacent blocks, known as block artefacts. In this paper, we propose a weighted overlapped block-based compressive sensing (WOBCS) method to reduce the block artefacts and accomplish SAR imaging. It has two main characteristics: 1) the strategy of sensing small and recovering big and 2) an adaptive weighting technique among overlapped blocks. The proposed method is implemented with well-known CS recovery schemes such as orthogonal matching pursuit (OMP) and BCS-SPL. Promising results are demonstrated through several experiments.
Aravind THARAYIL NARAYANAN Wei DENG Dongsheng YANG Rui WU Kenichi OKADA Akira MATSUZAWA
An all-digital fully-synthesizable PVT-tolerant clock data recovery (CDR) architecture for wireline chip-to-chip interconnects is presented. The proposed architecture enables the co-synthesis of the CDR with the digital core. By eliminating the resource-hungry manual layout and interfacing steps, which are necessary for conventional CDR topologies, the design process and the time-to-market can be drastically improved. Besides, the proposed CDR architecture allows the majority of the sub-systems to be reused, which enables easy migration to different process nodes. The proposed CDR is also equipped with a self-calibration scheme for ensuring tolerance over PVT. The proposed fully-synthesizable CDR was implemented in 28 nm FDSOI. The system achieves a maximum data rate of 10.06 Gbps while consuming 16.1 mW from a 1 V power supply.
Jinwoo LEE Jae Woo SEO Kookrae CHO Pil Joong LEE Dae Hyun YUM
The Android pattern unlock is a widely adopted graphical password system that requires a user to draw a secret pattern connecting points arranged in a grid. The theoretical security of pattern unlock can be defined by the number of possible patterns. However, only upper bounds of the number of patterns have been known except for 3×3 and 4×4 grids for which the exact number of patterns was found by brute-force enumeration. In this letter, we present the first lower bound by computing the minimum number of visible points from each point in various subgrids.
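The brute-force enumeration mentioned above for small grids can be sketched as a depth-first search. This is a minimal illustration, not the letter's bounding technique; it assumes the usual Android rules that a pattern has at least four points, visits no point twice, and may pass over an intermediate grid point only if that point is already part of the pattern. The function name `count_patterns` is hypothetical.

```python
from math import gcd


def count_patterns(n=3, min_len=4):
    """Count unlock patterns on an n x n grid by exhaustive search."""
    points = [(r, c) for r in range(n) for c in range(n)]

    def blockers(a, b):
        # Grid points lying strictly between a and b on the segment a-b.
        dr, dc = b[0] - a[0], b[1] - a[1]
        g = gcd(abs(dr), abs(dc))
        return [(a[0] + dr // g * k, a[1] + dc // g * k) for k in range(1, g)]

    between = {(a, b): blockers(a, b) for a in points for b in points if a != b}

    def dfs(last, visited, length):
        total = 1 if length >= min_len else 0
        for p in points:
            if p in visited:
                continue
            # p is reachable only if every point in between is already used.
            if all(q in visited for q in between[(last, p)]):
                visited.add(p)
                total += dfs(p, visited, length + 1)
                visited.remove(p)
        return total

    return sum(dfs(p, {p}, 1) for p in points)


print(count_patterns(3))  # 389112 for the 3x3 grid under the assumed rules
```

The search space grows very quickly with the grid size, so this direct approach is only practical for small grids, which is why bounds are of interest for larger ones.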
Controlling synchrony as well as desynchrony in a network of neuronal oscillators has been one of the focus issues in nonlinear science and engineering. It is well known that spike stimuli injected commonly into multiple neurons can synchronize them if the strength of the common spike stimuli is high enough. Our recent study showed that this common spike-induced synchrony could be suppressed by introducing heterogeneity into the inhibitory connections through which the common spikes are transmitted. The aim of the present study is to apply this methodology to electronic neurons implemented as real physical hardware. Using an Axon-Hillock circuit that represents basic properties of the leaky integrate-and-fire (LIF) neuron, our experiment demonstrated that the method was quite effective for desynchronizing the neuron circuits. The experimental results are also in good agreement with the linear response theory that describes the input-output relationship of LIF neurons. Our method of suppressing neuronal synchrony should be of practical use for enhancing neural information processing as well as for improving pathological states of the brain.
The overdrive technique is widely used to eliminate motion blur in liquid-crystal displays (LCDs). However, this technique requires a large frame memory to store the previous frame. A reduction in the frame memory requires an image compression algorithm suitable for real-time data processing. In this paper, we present an algorithm based on multimode-color-conversion block truncation coding (MCC-BTC) to obtain a constant output bit rate and high overdrive performance. The MCC-BTC algorithm uses four compression methods, one of which is selected. The four compression modes either use the single-bitmap-generation method or the subsampling method for chrominance. As shown in the simulation results, the proposed algorithm improves the performance of both coding (up to 2.73dB) and overdrive (up to 2.61dB), and the visual quality is improved in comparison to other competing algorithms in literature.
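For readers unfamiliar with block truncation coding, the sketch below shows the classic single-channel BTC on which such schemes build: each block is reduced to a one-bit-per-pixel bitmap plus two reconstruction levels chosen to preserve the block mean and variance. It is not the MCC-BTC algorithm of the paper, and the function names `btc_encode` and `btc_decode` are assumptions made for illustration.

```python
from math import sqrt


def btc_encode(block):
    """Encode a flat list of pixel values as (bitmap, low, high)."""
    m = len(block)
    mean = sum(block) / m
    std = sqrt(sum((p - mean) ** 2 for p in block) / m)
    bitmap = [1 if p >= mean else 0 for p in block]
    q = sum(bitmap)  # number of pixels at or above the mean
    if q == m:       # flat block: every pixel equals the mean
        return bitmap, mean, mean
    low = mean - std * sqrt(q / (m - q))
    high = mean + std * sqrt((m - q) / q)
    return bitmap, low, high


def btc_decode(bitmap, low, high):
    """Rebuild the block from the bitmap and the two levels."""
    return [high if bit else low for bit in bitmap]


block = [10, 20, 30, 200]
decoded = btc_decode(*btc_encode(block))
```

Because every block costs a fixed number of bits (the bitmap plus two levels), the output bit rate is constant, which suits a frame memory of fixed size.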
Ting-Chou LU Ming-Dou KER Hsiao-Wen ZAN Jen-Chieh LIU Yu LEE
A multi-phase crystal-less clock generator (MPCLCG) with a process-voltage-temperature (PVT) calibration circuit is proposed. It operates at 192 MHz with 8-phase outputs, and is implemented in a 0.18 µm CMOS process for digital power management systems. A temperature calibration circuit is proposed to align the operating frequency under process and supply voltage variations. It occupies an area of 65 µm × 75 µm and consumes 1.1 mW from a 1.8 V power supply. The temperature coefficient (TC) is 69.5 ppm/°C from 0 to 100°C, and 2-point calibration is applied to calibrate PVT variation. The measured period jitter is 4.58 ps RMS and 34.55 ps peak-to-peak (P2P) at 192 MHz within 12.67k hits. At 192 MHz, it shows a 1-MHz-offset phase noise of -102 dBc/Hz. Phase-to-phase errors and duty cycle errors are less than 5.5% and 4.3%, respectively.
Wei-Kai CHENG Jui-Hung HUNG Yi-Hsuan CHIU
With the increasing complexity of chip design, reducing both power consumption and clock skew has become a crucial research topic in clock network synthesis. Among the various clock network synthesis approaches, a clock tree consumes less power than a clock mesh structure. In contrast, a clock mesh has a higher tolerance to process variation and hence satisfies the clock skew constraint more easily. An effective way to reduce the power consumption of a clock mesh network is to minimize the wire capacitance of stub wires. In addition, integrating clock gating and register clustering techniques into the clock mesh network can further reduce dynamic power consumption. In this paper, under both the enable timing constraint and the clock skew constraint, we propose a methodology to reduce the switching capacitance by non-uniform clock mesh synthesis, clock gate insertion and register clustering. Experimental results show that, in comparison with clock mesh synthesis and clock gating applied individually, our methodology efficiently improves both the clock skew and the switching capacitance.