Hasitha Muthumala WAIDYASOORIYA Masanori HARIYAMA Michitaka KAMEYAMA
Accelerator cores in low-power embedded processors have multiple on-chip memory modules to increase data access speed and to enable parallel data access. Using large functional units such as multipliers and dividers for addressing consumes considerable power and chip area. Therefore, recent low-power processors use small functional units such as adders and counters to reduce power and area. Such small functional units make it difficult to implement complex addressing patterns without duplicating data among multiple memory modules. This data duplication wastes memory capacity and significantly increases the data transfer time. This paper proposes a method to reduce memory duplication for window-based image processing, which is widely used in many applications. Evaluations using an accelerator core show that the proposed method reduces both the data amount and the data transfer time by more than 50%.
Chin-Long WEY Shin-Yo LIN Hsu-Sheng WANG Hung-Lieh CHEN Chun-Ming HUANG
In UWB systems, data symbols are transmitted and received continuously, so the Fast Fourier Transform (FFT) processor must be able to process input/output data seamlessly. This paper presents the design and implementation of a continuous data flow parallel memory-based FFT (CF-PMBFFT) processor that does not use an input buffer for pre-loading the input data. The processor uses a memory space of two N-words and multiple processing elements (PEs) to achieve seamless data flow and meet the design requirement. The circuit has been fabricated in a TSMC 0.18 µm 1P6M CMOS process with a supply voltage of 1.8 V. Measurement results of the test chip show that the developed CF-PMBFFT processor occupies a core area of 1.97 mm² with a power consumption of 62.12 mW at a throughput rate of 528 MS/s.
This letter proposes a dynamic phasor-based apparent impedance measuring method for a single-line-to-ground fault. Using the proposed method, the effects of the decaying DC components on the apparent impedance of a single-line-to-ground fault can be completely removed. Compared with previous works, the proposed method uses less computation to measure an accurate apparent impedance.
Lihong SHANG Mi ZHOU Yu HU Erfu YANG
Field programmable gate arrays (FPGAs) are widely used in reliability-critical systems because of their reconfigurability. However, with shrinking device feature sizes and increasing die areas, today's FPGAs can be severely affected by errors induced by electromigration and radiation. To improve the reliability of FPGA-based reconfigurable systems, this paper proposes a permanent fault recovery approach using a domain partition model. In the proposed approach, a fault-tolerant FPGA recovers from faults by reloading a proper configuration from a pool of multiple alternative configurations with overlaps. The overlaps are represented as a set of vectors in the domain partition model. To enhance the reliability, a procedure is also presented in which the set of vectors is heuristically filtered so that the corresponding small overlaps can be merged into larger ones. Experimental results on several benchmark circuits demonstrate the effectiveness of the proposed approach. Compared with previous approaches, the proposed approach increases the MTTF by up to 18.87%.
Katsuma ONO Kenya JIN'NO Toshimichi SAITO
This letter studies the application of the growing PSO to the design of DC-AC inverters. In this application, each particle corresponds to a set of circuit parameters and moves to solve a multi-objective problem involving the total harmonic distortion and the desired average power. The problem is described by a hybrid fitness consisting of an analog objective function, a criterion, and digital logic. The PSO has a growing structure and dynamic acceleration parameters. Basic numerical experiments confirm the efficiency of the algorithm.
Shingo MANDAI Taihei MOMMA Makoto IKEDA Kunihiro ASADA
This paper presents an architecture and a circuit design for readout address compression in a high-speed 3-D range-finding image sensor using the light-section method. We use a variable-length code modified to suit the 3-D range finder. The best compression rate achieved by the proposed technique is 33.3%. The worst and average compression rates are 56.4% and 42.4%, respectively, in simulations using measured sheet scans as examples. We also show measurement results of the fabricated image sensor with the address compression.
Tetsuo KIRIMOTO Takeshi AMISHIMA Atsushi OKAMURA
ICA (Independent Component Analysis) has a remarkable capability of separating mixtures of stochastic random signals. However, some applications, such as radar and communication systems, often require separating mixtures of deterministic signals, especially sinusoidal signals. One may ask whether ICA is effective for deterministic signals. In this paper, we analyze the basic performance of ICA that uses the fourth-order cumulant as a criterion of signal independence in separating mixtures of complex sinusoidal signals. We show theoretically that ICA can separate mixtures of deterministic sinusoidal signals. We then conduct computer simulations and radio experiments with a linear array antenna to confirm the theoretical result. We show that ICA successfully separates mixtures of sinusoidal signals whose frequency difference is less than the FFT resolution and whose DOA (Direction of Arrival) difference is less than the Rayleigh criterion.
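The fourth-order cumulant mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact criterion used in the paper, so the code assumes the common definition for a zero-mean complex signal, E|x|^4 - 2(E|x|^2)^2 - |E[x^2]|^2; the function name `fourth_order_cumulant` and the test signal are hypothetical.

```python
import cmath
import math


def fourth_order_cumulant(x):
    """Sample estimate of the fourth-order cumulant of a zero-mean complex sequence.

    Assumed definition (not taken from the paper):
        E|x|^4 - 2 * (E|x|^2)^2 - |E[x^2]|^2
    """
    n = len(x)
    m2 = sum(abs(v) ** 2 for v in x) / n   # E|x|^2
    m4 = sum(abs(v) ** 4 for v in x) / n   # E|x|^4
    p2 = sum(v * v for v in x) / n         # E[x^2]
    return m4 - 2 * m2 ** 2 - abs(p2) ** 2


# Unit-amplitude complex sinusoid covering an integer number of cycles.
num_samples = 1024
sinusoid = [cmath.exp(2j * math.pi * 50 * k / num_samples) for k in range(num_samples)]
kurt = fourth_order_cumulant(sinusoid)
```

Under this definition a unit-amplitude complex sinusoid has a nonzero cumulant, which is the property a cumulant-based independence criterion relies on; a Gaussian signal would give a value near zero.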
Karolina NURZYNSKA Mamoru KUBO Ken-ichiro MURAMOTO
This study presents three image processing systems for classifying snow particles into snowflakes and graupel. All of them are based on feature classification, and as a novelty all of them exploit multiple features. Each system is characterized by a different data flow. To compare their performance, we not only consider various features but also suggest different classifiers. The best results are achieved when the snowflake discrimination method is applied before a statistical classifier; the correct classification ratio in this case reaches 94%. In the other cases the best results are around 88%.
Xiao PENG Xiongxin ZHAO Zhixiang CHEN Fumiaki MAEHARA Satoshi GOTO
The permutation network plays an important role in reconfigurable QC-LDPC decoders for most modern wireless communication systems with multiple code rates and various code lengths. This paper presents a generic permutation network (GPN) for the reconfigurable QC-LDPC decoder. Compared with conventional permutation networks, the proposed network removes restrictions on the number of inputs, such as powers of 2 or other limited values, and can be optimized for any target application. Moreover, the proposed scheme greatly reduces the latency because it has fewer stages and an efficient control signal generation algorithm. In addition, the proposed network is highly parallel, which enables several groups of data to be cyclically shifted simultaneously. Synthesis results using a 90 nm technology demonstrate that this architecture can be implemented with a gate count of 18.3k for the WiMAX standard at a frequency of 600 MHz and 10.9k for the WiFi standard at a frequency of 800 MHz.
Xiaolei ZHU Yanfei CHEN Masaya KIBUNE Yasumoto TOMITA Takayuki HAMADA Hirotaka TAMURA Sanroku TSUKAMOTO Tadahiro KURODA
The accuracy of the comparator, which is often determined by its offset, is essential for the resolution of high-performance mixed-signal systems. Various design efforts have been made to cancel or calibrate the comparator offset caused by factors such as process variations, device thermal noise, and input-referred supply noise. However, it remains challenging to find a simple and effective method that cancels the offset with additional circuits without sacrificing power, speed, and area. This work explores a dynamic offset control technique that employs charge compensation by timing control. The charge injection and clock feed-through caused by the latch reset transistor are investigated. A simple method is proposed that generates an offset compensation voltage by placing two source-drain shorted transistors on each regenerative node, with timing control signals applied to their gates. The principle of the timing-based charge compensation approach for comparator offset control is then analyzed further. The analysis has been verified by fabricating a 65 nm CMOS 1.2 V 1 GHz comparator that occupies 25 × 65 µm² and consumes 380 µW. The offset control circuits account for 21% of the area and 12% of the power consumption of the whole comparator chip.
Nan QU Shingo YAMAGUCHI Qi-Wei GE
In this paper, we discuss the parallel degree of well-structured workflow nets (WF-nets for short). First, we define the parallel degree, PARAdeg, for WF-nets. Second, we show that computing the value of PARAdeg is intractable for acyclic well-structured WF-nets. Next, we construct two heuristic algorithms to compute the value. The first algorithm focuses on nest structure and the second on the longest path. Finally, we perform an experiment to compare the two algorithms. The first algorithm, based on nest structure, was more accurate than the second, based on the longest path, for most well-structured WF-nets; the second was more accurate only when the well-structured WF-nets were mainly composed of parallel structures.
The performance of kernel-based learning algorithms, such as SVM, depends heavily on the proper choice of the kernel parameter. It is desirable for kernel machines to work with an optimal kernel parameter that adapts well to the input data and the learning task. In this paper, we present a novel method for selecting the Gaussian kernel parameter by maximizing a class separability criterion that measures the data distribution in the kernel-induced feature space and is invariant under any non-singular linear transformation. The experimental results show that both the class separability of the data in the kernel-induced feature space and the classification performance of the SVM classifier are improved by using the optimal kernel parameter.
Yi TANG Junchen JIANG Xiaofei WANG Chengchen HU Bin LIU Zhijia CHEN
Multi-pattern matching is a key technique for implementing network security applications such as Network Intrusion Detection/Protection Systems (NIDS/NIPSes), where every packet is inspected against tens of thousands of predefined attack signatures written as regular expressions (regexes). To this end, the Deterministic Finite Automaton (DFA) is widely used for multi-regex matching, but existing DFA-based approaches claim high throughput at the expense of extremely high memory cost, and therefore cannot be employed in devices such as high-speed routers and embedded systems where the available memory is quite limited. In this paper, we propose a parallel DFA architecture called Parallel DFA (PDFA), which takes advantage of the large number of concurrent flows to increase the throughput with nearly no extra memory cost. The basic idea is to selectively store the underlying DFA in memory modules that can be accessed in parallel. To explore the potential parallelism, we study DFA-split schemes in depth from both the state and the transition points of view. The performance of our approach in both the average case and the worst case is analyzed, optimized, and evaluated numerically. The evaluation shows that we obtain an average speedup of 100 times compared with the traditional DFA-based matching approach.
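The idea of serving several concurrent flows from DFA states spread over parallel memory modules can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the PDFA architecture itself: the mapping of state s to module `s % num_modules`, the one-lookup-per-module-per-cycle rule, and all names (`build_dfa`, `run_flows`) are hypothetical.

```python
def build_dfa():
    """Toy DFA over {a, b} that accepts strings containing the substring 'ab'."""
    transitions = {
        0: {"a": 1, "b": 0},
        1: {"a": 1, "b": 2},
        2: {"a": 2, "b": 2},  # accepting state, absorbing
    }
    return transitions, {2}


def run_flows(transitions, flows, num_modules):
    """Advance every flow to the end of its input and count memory cycles.

    Assumption: state s is stored in module s % num_modules, and each module
    can serve at most one transition lookup per cycle.
    Returns (final_states, cycles).
    """
    states = [0] * len(flows)
    pos = [0] * len(flows)
    cycles = 0
    while any(p < len(f) for p, f in zip(pos, flows)):
        busy = set()  # modules already used in this cycle
        for i, flow in enumerate(flows):
            if pos[i] >= len(flow):
                continue  # this flow has consumed all its input
            module = states[i] % num_modules
            if module in busy:
                continue  # memory conflict: the flow waits one cycle
            busy.add(module)
            states[i] = transitions[states[i]][flow[pos[i]]]
            pos[i] += 1
        cycles += 1
    return states, cycles


transitions, accepting = build_dfa()
flows = ["ab", "ba", "bb"]
serial_states, serial_cycles = run_flows(transitions, flows, num_modules=1)
parallel_states, parallel_cycles = run_flows(transitions, flows, num_modules=3)
```

Both runs reach the same final states; only the cycle count changes. How much is gained depends on how often concurrent flows sit in states that share a module, which is why the way the DFA is split across modules matters.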
Tongsheng GENG Leibo LIU Shouyi YIN Min ZHU Shaojun WEI
This paper proposes approaches to HW/SW (Hardware/Software) partitioning and parallelization of the computing-intensive tasks of the H.264 HiP (High Profile) decoding algorithm on an embedded coarse-grained reconfigurable multimedia system called REMUS (REconfigurable MUltimedia System). Several techniques, such as MB (Macro-Block) based parallelization and unfixed sub-block operation, are used to speed up the decoding process and satisfy the requirements of real-time, high-quality H.264 applications. Tests show that the execution performance of MC (Motion Compensation), deblocking, and IDCT-IQ (Inverse Discrete Cosine Transform-Inverse Quantization) on REMUS is improved by 60%, 73%, and 88.5% in the typical case and by 60%, 69%, and 88.5% in the worst case, respectively, compared with that on XPP PACT (a commercial reconfigurable processor). Compared with ASIC solutions, the performance of MC is improved by 70% and 74% in the typical and worst cases, respectively, while that of deblocking remains the same. The performance of IDCT-IQ is improved by 17% in both the typical and worst cases. With the proposed techniques, H.264 HiP@Level 4 decoding at 1080p@30 fps can be achieved on REMUS at a working frequency of 200 MHz.
We present a new method that accurately represents the reflectance of metallic paints using a two-layer reflectance model with sampled microfacet distribution functions. We simplify the structure of metallic paints into two layers: a binder surface that follows a microfacet distribution and a sub-layer that also follows a facet distribution. In the sub-layer, the diffuse and specular reflectance represent color pigments and metallic flakes, respectively. We use an iterative method based on the principle of Gauss-Seidel relaxation that stably fits the measured data to our highly non-linear model. We optimize the model by handling the microfacet distribution terms in a piecewise linear non-parametric form to increase its degrees of freedom. The proposed model is validated by applying it to various metallic paints. The results show that our model fits better than the models used in other studies. Our model provides better accuracy because of its non-parametric terms, and its embedded analytical form makes it efficient for analyzing the characteristics of metallic paints. The non-parametric terms for the microfacet distribution require densely measured data, but not over the entire BRDF (bidirectional reflectance distribution function) domain, so our method can reduce the burden of data acquisition during measurement. It is especially efficient with a curved-sample based measurement system, which yields dense data in the microfacet domain from a single measurement.
Tetsunao MATSUTA Tomohiko UYEMATSU Ryutaroh MATSUMOTO
Low-density parity-check (LDPC) codes have become very popular in channel coding, since they can achieve performance close to that of maximum-likelihood (ML) decoding with complexity linear in the block length. Recently, Muramatsu et al. proposed a code using LDPC matrices for Slepian-Wolf source coding, and showed that their code can achieve any point in the achievable rate region of Slepian-Wolf source coding. However, since they employed ML decoding, their decoder needs to know the probability distribution of the source. Hence, it is an open problem whether there exists a universal code using LDPC matrices, where a universal code is one whose error probability vanishes as the block length tends to infinity for all sources whose achievable rate region contains the rate pair of the encoders. In this paper, we show the existence of a universal Slepian-Wolf source code using LDPC matrices for stationary memoryless sources.
Heru SUKOCO Yoshiaki HORI Hendrawan Kouichi SAKURAI
The distribution of streaming multicast and real-time audio/video applications in the Internet has been increasing quickly. These applications rarely use congestion control and do not fairly share the available network capacity with TCP-based applications such as HTTP, FTP, and email. Therefore, Internet communities will be threatened by the growth of non-TCP-based applications, which is likely to cause a significant increase in traffic congestion and starvation. This paper proposes a set of mechanisms, such as providing various data rates, background traffic, and various scenarios, to act friendly with TCP when sending multicast traffic. Using 8 simulation scenarios, we use 6 layered multicast transmissions with Pareto background traffic with a shape factor of 1.5 to evaluate performance metrics such as throughput, delay/latency, jitter, TCP friendliness, packet loss ratio, and convergence time. Our study shows that non-TCP traffic behaves fairly and respectfully toward the co-existing TCP-based applications that run on shared link transmissions, even with background traffic. Another result shows that the simulation has low throughput, varying jitter (0-10 ms), and a packet loss ratio > 3%. It was also difficult to reach convergence quickly when only non-TCP traffic was involved.
Kazuki CHIBA Masanori HAMAMURA
A novel peak-to-average power ratio (PAR) control algorithm for feedback-controlled multitone-hopping code-division multiple access (FC/MH-CDMA) signals is proposed. In FC/MH-CDMA, since each chip consists of plural tones, the energy consumption due to a large PAR is not negligible at the transmitter. The proposed PAR control algorithm iteratively constructs a time-frequency code that achieves a preset, target PAR under the condition that all signals are asynchronously transmitted. A PAR of 1 dB is shown to be achievable, and the bit-error rate performance is shown to be only slightly influenced if the target PAR is set to be larger than 3 dB. The influence of quantization is also discussed in terms of its application to limited feedback channels.
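To make the quantity being controlled concrete, the following sketch computes the PAR of a single chip built from several tones. It assumes a complex baseband signal with unit-amplitude tones on integer frequency indices and defines PAR as peak instantaneous power over average power; the abstract does not state the paper's exact definition, and the function name `par_db` and its parameters are hypothetical.

```python
import cmath
import math


def par_db(tone_indices, phases, num_samples=256):
    """Peak-to-average power ratio (in dB) of a sum of unit-amplitude tones.

    Assumption: complex baseband model, one period sampled at num_samples points.
    """
    power = []
    for n in range(num_samples):
        t = n / num_samples
        s = sum(
            cmath.exp(1j * (2 * math.pi * k * t + ph))
            for k, ph in zip(tone_indices, phases)
        )
        power.append(abs(s) ** 2)
    return 10 * math.log10(max(power) / (sum(power) / num_samples))


tones = [1, 2, 3, 4]
aligned = par_db(tones, [0.0, 0.0, 0.0, 0.0])             # all tones peak together
staggered = par_db(tones, [0.0, math.pi / 2, math.pi, 0.3])  # arbitrary phase choice
```

With aligned phases the four tones add coherently at t = 0, which gives the largest possible PAR for this tone set (10 log10 of the number of tones). Choosing the phases or tone assignment differently can only lower it, which is the degree of freedom a PAR control algorithm exploits.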
Shunsuke HORII Toshiyasu MATSUSHIMA Shigeichi HIRASAWA
Maximum likelihood (ML) decoding of linear block codes can be formulated as an integer linear programming (ILP) problem. Since this problem is NP-hard in general, there has been much research on algorithms that solve it approximately. One of the most popular algorithms is linear programming (LP) decoding, proposed by Feldman et al. LP decoding is based on LP relaxation, a method for approximately solving the ILP corresponding to the ML decoding problem. Advanced algorithms for solving ILPs (approximately or exactly) include the cutting-plane method and the branch-and-bound method. As applications of these methods, adaptive LP decoding and branch-and-bound decoding have been proposed by Taghavi et al. and Yang et al., respectively. Another method for solving ILPs is the branch-and-cut method, a hybrid of the cutting-plane and branch-and-bound methods. Although the branch-and-cut method is widely used to solve ILPs, it is not obvious that it works well for the ML decoding problem. In this paper, we show that the branch-and-cut method is indeed effective for the ML decoding problem. Furthermore, the branch-and-cut method consists of several technical components, and the performance of the algorithm depends on how these components are selected. Through numerical simulations, we examine the differences caused by the selection of these components and consider which scheme is most effective for the ML decoding problem.
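The ILP view of ML decoding can be sketched on a small code. The example below minimizes a linear cost, the sum of log-likelihood ratios over the positions where the candidate codeword is 1, by exhaustive search over all codewords. It is not LP decoding or branch-and-cut; it only shows the objective that those methods optimize. The (7,4) Hamming code, the binary symmetric channel with crossover probability `p`, and the function names are assumptions chosen for illustration.

```python
from itertools import product
import math

# Generator matrix of a (7,4) Hamming code in systematic form (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def codewords():
    """Enumerate all 16 codewords as tuples of bits."""
    for msg in product([0, 1], repeat=4):
        yield tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))


def ml_decode(received, p=0.1):
    """ML decoding over a binary symmetric channel by exhaustive search.

    The cost of codeword x is sum_i gamma_i * x_i, where gamma_i is the
    log-likelihood ratio log(P(y_i | 0) / P(y_i | 1)) of received bit y_i.
    """
    c = math.log((1 - p) / p)
    gamma = [c if y == 0 else -c for y in received]
    return min(codewords(), key=lambda x: sum(g * xi for g, xi in zip(gamma, x)))


sent = (1, 0, 1, 1, 0, 1, 0)        # codeword for message 1011
received = (1, 0, 0, 1, 0, 1, 0)    # third bit flipped by the channel
decoded = ml_decode(received)
```

Exhaustive search is only feasible for tiny codes, since the number of codewords grows exponentially with the message length. That is the motivation for relaxations and for exact methods such as branch-and-bound and branch-and-cut that avoid enumerating every codeword.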
Hiroyuki YOSHIDA Kosuke KAWAMOTO Yuma TANAKA Hitoshi KUBO Akihiko FUJII Masanori OZAKI
The authors describe a method to produce gold nanoparticle-dispersed liquid crystals by means of sputtering, and discuss how the presence of gold nanoparticles affects the electro-optic response of the host liquid crystal. The method exploits the fact that liquid crystals have low vapor pressures, which allow them to withstand the sputtering process; the target material is sputtered directly onto the liquid crystal in a reduced air pressure environment. The sample attained a red-brownish color after sputtering, but no aggregation was observed in samples kept in the liquid crystal phase. Polarization optical microscopy of the sample placed in a conventional sandwich cell revealed that the phase transition behaviour is affected by the presence of the nanoparticles: the onset of the nematic phase is observed in the form of bubble-like domains, whereas in the pure sample the nematic phase appears after the passing of a phase transition front. Transmission electron microscopy confirmed the presence of single nano-sized particles dispersed in the material without forming aggregates. The electro-optic properties of the nanoparticle-dispersed liquid crystal were investigated by measuring the threshold voltage of a twisted-nematic cell. The threshold voltage was found to depend on the frequency of the applied rectangular voltage, and at frequencies higher than 200 Hz the threshold became lower than that of the pure sample.