Sung-Kwan Youm Meejoung KIM Chul-Hee KANG
This paper considers the reliable multicast transport protocols used in hybrid networks that include wired and wireless networks and transparent proxy servers. We present four analytic performance models of two extreme reliable multicast transport protocols, sender-initiated and receiver-initiated, and supported and unsupported by transparent proxy servers are considered in each reliable multicast protocol. We analyze the throughputs of these four different models mathematically. Numerical results show that transparent proxy servers give good effects to overall performance. Furthermore, the receiver-initiated reliable multicast supported by transparent proxy servers gives better performances of total throughput than sender-initiated reliable multicast supported by transparent proxy servers. We provide efficiency criterion of the optimal number of transparent proxy servers for each protocol under varying wireless loss probabilities. Numerical results are verified by simulations.
Yutaka MURAKAMI Kiyotaka KOBAYASHI Takashi FUKAGAWA Masayuki ORIHASHI
We propose a likelihood detection scheme that utilizes ordering and decision of partial bits in MIMO spatial multiplexing systems. We compute BER performance of the proposed detection scheme under Rayleigh fading channels in a 33 MIMO spatial multiplexing system and compare it with BER performance using MLD only and detection utilizing ZF or MMSE only. In addition, the computational complexity of the proposed detection scheme is compared with that of MLD and detection utilizing ZF or MMSE. The results of our investigation show that the proposed detection is a scheme achieves both good BER performance and low computational complexity.
This letter describes unequal-power transmission for multiple-input and multiple-output (MIMO) systems with a parallel interference canceller (PIC) applied to a maximum likelihood detector (MLD) or complexity-reduced MLD at the receiver. Unequal-power transmission reduces the possibility that all substreams are incorrectly decoded. Canceling the correctly decoded substreams enables more reliable detection in the next stage. The simulation results demonstrated that unequal-power transmission improves the transmission performance of the PIC applied to MLDs or complexity-reduced MLDs, compared with equal-power transmission cases.
Recently, various decoding algorithms with Low Density Parity Check (LDPC) codes have been proposed. Most algorithms can be divided into a hard decision algorithm and a soft decision algorithm. The Weighted Bit Flipping (WBF) algorithm that is between a hard decision and a soft decision algorithms has been proposed. The Bootstrapped WBF and Modified WBF algorithms have been proposed to improve the error rate performance and decoding complexity of the WBF algorithm. In this letter, we apply the Bootstrap step to the Modified WBF algorithm. We show that the Bootstrapped modified WBF algorithm outperforms the WBF, Bootstrapped WBF, and Modified WBF algorithms. Moreover, we show that the Bootstrapped modified WBF algorithm has the lowest decoding complexity.
Yoichi INABA Tomonori SAITO Tomoaki OHTSUKI
The Reliability-Based Hybrid ARQ (RB-HARQ) scheme, which can be used with error correcting codes using soft-input soft-output (SISO) decoders such as convolutional codes and turbo codes has been proposed. In the RB-HARQ scheme, the error rate performance is improved by selecting the retransmission bits based on Log Likelihood Ratio (LLR) of each bit in the receiver. However, the receiver has to send the bit positions of retransmission bits to the transmitter. Therefore, the RB-HARQ scheme requires a great number of feedback bits. On the other hand, Low Density Parity Check (LDPC) codes are attracting a lot of interest, recently. Because LDPC codes can achieve near Shannon limit performance and be decoded easily compared to turbo code. In this paper, we evaluate the RB-HARQ scheme using LDPC code. Moreover, we propose a RB-HARQ scheme that requires a fewer feedback bits by utilizing a code structure of LDPC code. We refer to the scheme as the RB-HARQ (row base) scheme. We show that the RB-HARQ and RB-HARQ (row base) schemes using LDPC code have better error rate performance than the scheme without ARQ. We also show that the RB-HARQ (row base) scheme has a good trade-off between error rate performance and the number of feedback bits compared to the RB-HARQ scheme.
Temperature-tracking is becoming of paramount importance in modern electronic design automation tools. In this paper, we present a deterministic thermal placement algorithm for standard cell based layout which can lead to a smooth temperature distribution over the die. It is mainly based on Fiduccia-Mattheyses partition scheme and a former substrate thermal model that can convert the known temperature constraints into the corresponding power distribution constraints. Moreover, a kind of force-directed heuristic based on cells' power consumption is introduced in the above process. Experimental results demonstrate a comparatively uniform temperature distribution and show a reduction of the maximal temperature on the die.
Masayoshi NABESHIMA Kouji YATA
It is well known that TCP does not fully utilize the available bandwidth in fast long-distance networks. To solve this scalability problem, several high speed transport protocols have been proposed. They include HighSpeed TCP (HS-TCP), Scalable TCP (S-TCP), Binary increase control TCP (BIC-TCP), and H-TCP. These protocols increase (decrease) their window size more aggressively (slowly) compared to standard TCP (STD-TCP). This paper aims at evaluating and comparing these high speed transport protocols through computer simulations. We select six metrics that are important for high speed protocols; scalability, buffer requirement, TCP friendliness, TCP compatibility, RTT fairness, and responsiveness. Simulation scenarios are carefully designed to investigate the performance of these protocols in terms of the metrics. Results clarify that each high speed protocol successfully solves the problem of STD-TCP. In terms of the buffer requirement, S-TCP and BIC-TCP have better performance. For TCP friendliness and compatibility, HS-TCP and H-TCP offer better performance. For RTT fairness, BIC-TCP and H-TCP are superior. For responsiveness, HS-TCP and H-TCP are preferred. However, H-TCP achieves a high degree of fairness at the expense of the link utilization. Thus, we understand that all the proposed high speed transport protocols have their own shortcomings. Thus, much more research is needed on high speed transport protocols.
Kazunori SHIMIZU Tatsuyuki ISHIKAWA Nozomu TOGAWA Takeshi IKENAGA Satoshi GOTO
In this paper, we propose a partially-parallel LDPC decoder which achieves a high-efficiency message-passing schedule. The proposed LDPC decoder is characterized as follows: (i) The column operations follow the row operations in a pipelined architecture to ensure that the row and column operations are performed concurrently. (ii) The proposed parallel pipelined bit functional unit enables the column operation module to compute every message in each bit node which is updated by the row operations. These column operations can be performed without extending the single iterative decoding delay when the row and column operations are performed concurrently. Therefore, the proposed decoder performs the column operations more frequently in a single iterative decoding, and achieves a high-efficiency message-passing schedule within the limited decoding delay time. Hardware implementation on an FPGA and simulation results show that the proposed partially-parallel LDPC decoder improves the decoding throughput and bit error performance with a small hardware overhead.
CheolHong KIM SungWoo CHUNG ChuShik JHON
Energy efficiency of cache memories is crucial in designing embedded processors. Reducing energy consumption in the instruction cache is especially important, since the instruction cache consumes a significant portion of total processor energy. This paper proposes a new instruction cache architecture, named Partitioned Instruction Cache (PI-Cache), for reducing dynamic energy consumption in the instruction cache by partitioning it to smaller (less power-consuming) sub-caches. When the proposed PI-Cache is accessed, only one sub-cache is accessed by utilizing the temporal/spatial locality of applications. In the meantime, other sub-caches are not accessed, leading to dynamic energy reduction. The PI-Cache also reduces dynamic energy consumption by eliminating the energy consumed in tag lookup and comparison. Moreover, the performance gap between the conventional instruction cache and the proposed PI-Cache becomes little when the physical cache access time is considered. We evaluated the energy efficiency by running a cycle accurate simulator, SimpleScalar, with power parameters obtained from CACTI. Simulation results show that the PI-Cache improves the energy-delay product by 20%-54% compared to the conventional direct-mapped instruction cache.
Nuo ZHANG Jianming LU Takashi YAHAGI
In this study, we propose a robust approach for blind source separation (BSS) by using radial basis function networks (RBFNs) and higher-order statistics (HOS). The RBFN is employed to estimate the inverse of a hypothetical complicated mixing procedure. It transforms the observed signals into high-dimensional space, in which one can simply separate the transformed signals by using a cost function. Recently, Tan et al. proposed a nonlinear BSS method, in which higher-order moments between source signals and observations are matched in the cost function. However, it has a strict restriction that it requires the higher-order statistics of sources to be known. We propose a cost function that consists of higher-order cumulants and the second-order moment of signals to remove the constraint. The proposed approach has the capacity of not only recovering the complicated mixed signals, but also reducing noise from observed signals. Simulation results demonstrate the validity of the proposed approach. Moreover, a result of application to X-ray image separation also shows its practical applicability.
Md. Khademul Islam MOLLA Keikichi HIROSE Nobuaki MINEMATSU
The Hilbert transformation together with empirical mode decomposition (EMD) produces Hilbert spectrum (HS) which is a fine-resolution time-frequency representation of any nonlinear and non-stationary signal. The EMD decomposes the mixture signal into some oscillatory components each one is called intrinsic mode function (IMF). Some modification of the conventional EMD is proposed here. The instantaneous frequency of every real valued IMF component is computed with Hilbert transformation. The HS is constructed by arranging the instantaneous frequency spectra of IMF components. The HS of the mixture signal is decomposed into subspaces corresponding to the component sources. The decomposition is performed by applying independent component analysis (ICA) and Kulback-Leibler divergence based K-means clustering on the selected number of bases derived from HS of the mixture. The time domain source signals are assembled by applying some post processing on the subspaces. We have produced experimental results using the proposed separation technique.
Hong WANG Gang QIU Deng-Guo FENG Guo-Zhen XIAO
In PKC'01, Tzeng et al. proposed two robust forward-secure signature schemes with proactive security: one is an efficient scheme, but it requires a manager; the other scheme is a new construction based on distributed multiplication procedures. In this paper, we point out their new distributed multiplication procedure is not secure, thus making the whole new construction insecure. Finally, we present an improved forward-secure signature scheme without a manager.
Jehyuk RYU Sungho YUN Kyungjin SONG Jundong CHO Jongmoo CHOI Sukhan LEE
This paper introduces the hardware platform of the structured light processing based on depth imaging to perform a 3D modeling of cluttered workspace for home service robots. We have discovered that the degradation of precision and robustness comes mainly from the overlapping of multiple codes in the signal received at a camera pixel. Considering the criticality of separating the overlapped codes to precision and robustness, we proposed a novel signal separation code, referred to here as "Hierarchically Orthogonal Code (HOC)," for depth imaging. The proposed HOC algorithm was implemented by using hardware platform which applies the Xilinx XC2V6000 FPGA to perform a real time 3D modeling and the invisible IR (Infrared) pattern lights to eliminate any inconveniences for the home environment. The experimental results have shown that the proposed HOC algorithm significantly enhances the robustness and precision in depth imaging, compared to the best known conventional approaches. Furthermore, after we processed the HOC algorithm implemented on our hardware platform, the results showed that it required 34 ms of time to generate one 3D image. This processing time is about 24 times faster than the same implementation of HOC algorithm using software, and the real-time processing is realized.
Shigeki MATSUDA Takatoshi JITSUHIRO Konstantin MARKOV Satoshi NAKAMURA
In this paper, we describe a parallel decoding-based ASR system developed of ATR that is robust to noise type, SNR and speaking style. It is difficult to recognize speech affected by various factors, especially when an ASR system contains only a single acoustic model. One solution is to employ multiple acoustic models, one model for each different condition. Even though the robustness of each acoustic model is limited, the whole ASR system can handle various conditions appropriately. In our system, there are two recognition sub-systems which use different features such as MFCC and Differential MFCC (DMFCC). Each sub-system has several acoustic models depending on SNR, speaker gender and speaking style, and during recognition each acoustic model is adapted by fast noise adaptation. From each sub-system, one hypothesis is selected based on posterior probability. The final recognition result is obtained by combining the best hypotheses from the two sub-systems. On the AURORA-2J task used widely for the evaluation of noise robustness, our system achieved higher recognition performance than a system which contains only a single model. Also, our system was tested using normal and hyper-articulated speech contaminated by several background noises, and exhibited high robustness to noise and speaking styles.
Takashi MORIMOTO Hidekazu ADACHI Osamu KIRIYAMA Tetsushi KOIDE Hans Jurgen MATTAUSCH
This letter presents a boundary-active-only (BAO) power reduction technique for cell-network-based region-growing video segmentation. The key approach is an adaptive situation-dependent power switching of each network cell, namely only cells at the boundary of currently grown regions are activated, and all the other cells are kept in low-power stand-by mode. The effectiveness of the proposed technique is experimentally confirmed with CMOS test-chips having small-scale cell networks of up to 4133 cells, where an average of only 1.7% of the cells remains active after application of the proposed approach. About 85% power reduction is thus achievable without sacrificing real-time processing.
Hui QIN Tsutomu SASAO Yukihiro IGUCHI
This paper addresses a pipelined partial rolling (PPR) architecture for the AES encryption. The key technique is the PPR architecture. With the proposed architecture on the Altera Stratix FPGA, two PPR implementations achieve 6.45 Gbps throughput and 12.78 Gbps throughput, respectively. Compared with the unrolling implementation that achieves a throughput of 22.75 Gbps on the same FPGA, the two PPR implementations improve the memory efficiency (i.e., throughput divided by the size of memory for core) by 13.4% and 12.3%, respectively, and reduce the amount of the memory by 75% and 50%, respectively. Also, the PPR implementation has a up to 9.83% higher memory efficiency than the fastest previous FPGA implementation known to date. In terms of resource efficiency (i.e., throughput divided by the equivalent logic element or slice), one PPR implementation offers almost the same as the rolling implementation, and the other PPR implementation offers a medium value between the rolling implementation and the unrolling implementation that has the highest resource efficiency. However, the two PPR implementations can be implemented on the minimum-sized Stratix FPGA while the unrolling implementation cannot. The PPR architecture fills the gap between unrolling and rolling architectures and is suitable for small and medium-sized FPGAs.
Shingo KUROIWA Yoshiyuki UMEDA Satoru TSUGE Fuji REN
In this paper, we propose a distributed speaker recognition method using a nonparametric speaker model and Earth Mover's Distance (EMD). In distributed speaker recognition, the quantized feature vectors are sent to a server. The Gaussian mixture model (GMM), the traditional method used for speaker recognition, is trained using the maximum likelihood approach. However, it is difficult to fit continuous density functions to quantized data. To overcome this problem, the proposed method represents each speaker model with a speaker-dependent VQ code histogram designed by registered feature vectors and directly calculates the distance between the histograms of speaker models and testing quantized feature vectors. To measure the distance between each speaker model and testing data, we use EMD which can calculate the distance between histograms with different bins. We conducted text-independent speaker identification experiments using the proposed method. Compared to results using the traditional GMM, the proposed method yielded relative error reductions of 32% for quantized data.
This paper introduces a modified particle swarm optimizer (PSO) called the Multi-Species Particle Swarm Optimizer (MSPSO) for locating all the global minima of multi-modal functions. MSPSO extend the original PSO by dividing the particle swarm spatially into a multiple cluster called a species in a multi-dimensional search space. Each species explores a different area of the search space and tries to find out the global or local optima of that area. We test our MSPSO for several multi-modal functions with multiple global optima. Our MSPSO can successfully locate all the global optima of all the test functions, and in particular, can locate all 18 global optima of the two-dimensional Shubert function. We also examined how the performance of MSPSO depends on various algorithm parameters.
Tetsuji OGAWA Tetsunori KOBAYASHI
A discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation dependent behaviors in both observations and state transitions. In the formulation of the previous PHMM, we used a common structure for all models. However, it is expected that the optimal structure which gives the best performance differs from category to category. In this paper, we designed a new structure optimization method in which the dependence of the states and the observations of PHMM are optimally defined according to each model using the weighted likelihood-ratio maximization (WLRM) criterion. The WLRM criterion gives high discriminability between the correct category and the incorrect categories. Therefore it gives model structures with good discriminative performance. We define the model structure combination which satisfy the WLRM criterion for any possible structure combinations as the optimal structures. A genetic algorithm is also applied to the adequate approximation of a full search. With results of continuous lecture talk speech recognition, the effectiveness of the proposed structure optimization is shown: it reduced the word errors compared to HMM and PHMM with a common structure for all models.
Masakiyo FUJIMOTO Satoshi NAKAMURA
This paper addresses a speech recognition problem in non-stationary noise environments: the estimation of noise sequences. To solve this problem, we present a particle filter-based sequential noise estimation method for front-end processing of speech recognition in noise. In the proposed method, a noise sequence is estimated in three stages: a sequential importance sampling step, a residual resampling step, and finally a Markov chain Monte Carlo step with Metropolis-Hastings sampling. The estimated noise sequence is used in the MMSE-based clean speech estimation. We also introduce Polyak averaging and feedback into a state transition process for particle filtering. In the evaluation results, we observed that the proposed method improves speech recognition accuracy in the results of non-stationary noise environments a noise compensation method with stationary noise assumptions.