Yibo FAN Takeshi IKENAGA Satoshi GOTO
With the increase in key length used in public-key cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication has become a bottleneck. This paper proposes a high-speed Montgomery multiplier design. First, a modified scalable high-radix Montgomery algorithm is proposed to reduce the critical path. Second, a high-radix clock-saving dataflow is proposed to support high-radix operation and a one-clock-cycle delay in the dataflow. Finally, a hardware-reused architecture is proposed to reduce the hardware cost, and a parallel radix-16 data path design is proposed to increase the speed. Implementation results with the HHNEC 0.25 µm standard cell library show that the Montgomery multiplier has a total cost of 130 KGates and a clock frequency of 180 MHz, and achieves a throughput of 352 kbps for 1024-bit RSA encryption. The design is suitable for high-speed RSA or ECC encryption/decryption. Being scalable, it supports encryption/decryption with any key length up to the size of the on-chip memory.
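As a point of reference, the textbook radix-2^k Montgomery multiplication that such hardware accelerates can be modeled in a few lines of Python. This is a minimal sketch, not the authors' modified scalable algorithm: it consumes one k-bit digit of `a` per iteration (k = 4 corresponds to radix 16), and the function names `montgomery_multiply` and `montgomery_exp` are hypothetical.

```python
def montgomery_multiply(a, b, n, k=4):
    """Return a * b * R^(-1) mod n, with R = 2^(k * digits), for odd n > 1 and 0 <= a, b < n."""
    if n % 2 == 0 or n < 3:
        raise ValueError("modulus must be odd and greater than 1")
    radix = 1 << k
    mask = radix - 1
    digits = -(-n.bit_length() // k)        # number of radix-2^k digits needed for n
    n_prime = (-pow(n, -1, radix)) % radix  # n * n_prime == -1 (mod 2^k)
    t = 0
    for i in range(digits):
        a_i = (a >> (k * i)) & mask         # i-th radix-2^k digit of a
        t += a_i * b
        q = ((t & mask) * n_prime) & mask   # chosen so that t + q*n is divisible by 2^k
        t = (t + q * n) >> k                # exact division by the radix
    return t - n if t >= n else t           # t < 2n, so one final subtraction suffices


def montgomery_exp(base, exp, n, k=4):
    """Left-to-right square-and-multiply using Montgomery products (as in RSA)."""
    digits = -(-n.bit_length() // k)
    r = 1 << (k * digits)
    x = (base * r) % n                      # base converted to Montgomery form
    acc = r % n                             # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = montgomery_multiply(acc, acc, n, k)
        if bit == "1":
            acc = montgomery_multiply(acc, x, n, k)
    return montgomery_multiply(acc, 1, n, k)  # convert back out of Montgomery form
```

Raising the radix cuts the iteration count to about (bit length of n)/k, at the cost of wider digit multiplications in each step, which is the trade-off a radix-16 data path exploits.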
Yan ZHANG Masato UCHIDA Masato TSURU Yuji OIE
We present a TCP flow-level performance evaluation of error-rate-aware scheduling algorithms in Evolved UTRA and UTRAN networks. By introducing the error rate, which is the probability of transmission failure under a given wireless condition and instantaneous transmission rate, the transmission efficiency can be improved without sacrificing the balance between system performance and user fairness. Performance with and without error-rate awareness is compared under various TCP traffic models, user channel conditions, schedulers with different fairness constraints, and automatic repeat request (ARQ) types. The results indicate that error-rate awareness makes resource allocation more reasonable and effectively improves both system and individual performance, especially for users with poor channel conditions.
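One plausible way to make a scheduler error-rate aware is to weight each user's instantaneous rate by its success probability (1 minus the error rate) before applying the proportional fair (PF) rule. The sketch below assumes exactly that metric; the `User` fields and the `schedule` and `update_average` functions are hypothetical, and the schedulers evaluated in the paper may define the metric differently.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    inst_rate: float       # instantaneous achievable rate in the current slot
    error_rate: float      # probability that a transmission at inst_rate fails
    avg_throughput: float  # exponentially averaged past throughput (must be > 0)


def pf_metric(user, error_aware):
    """Proportional fair metric, optionally weighted by the success probability."""
    rate = user.inst_rate * (1.0 - user.error_rate) if error_aware else user.inst_rate
    return rate / user.avg_throughput


def schedule(users, error_aware=True):
    """Return the name of the user served in this slot."""
    return max(users, key=lambda u: pf_metric(u, error_aware)).name


def update_average(user, served_bits, tc=100.0):
    """PF averaging over a window of tc slots; served_bits is 0 for unscheduled users."""
    user.avg_throughput = (1.0 - 1.0 / tc) * user.avg_throughput + served_bits / tc
```

For example, with `User("A", 10.0, 0.5, 5.0)` and `User("B", 8.0, 0.05, 5.0)`, the plain PF rule serves A because of its higher raw rate, while the error-aware rule serves B, since half of A's transmissions would fail.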
Haeryong PARK Kilsoo CHUN Seungho AHN
Hwang-Lo-Lin proposed a user identification scheme [3], based on the Maurer-Yacobi scheme [6], that is suitable for the mobile environment, and argued that their scheme is secure against any attack. Contrary to this argument, Liu-Horng-Liu showed that the Hwang-Lo-Lin scheme is insecure against an attack mounted by an eavesdropper. However, Liu-Horng-Liu did not propose an improved version of the original identification scheme that is secure against their attack. In this paper, we propose an identification scheme that solves this problem, as well as a non-interactive public key distribution scheme.
Tomokazu YONEDA Kimihiko MASUDA Hideo FUJIWARA
This paper presents a power-constrained test scheduling method for multi-clock-domain SoCs consisting of cores that operate at different clock frequencies during test. In the proposed method, we use virtual TAM to bridge the frequency gaps between the cores and the ATE. Moreover, we present a technique that uses virtual TAM to reduce the power consumption of cores during test while their test time remains the same or increases only slightly. Experimental results show the effectiveness of the proposed method.
Masayuki ARAI Satoshi FUKUMOTO Kazuhiko IWASAKI
In this paper, we propose a test data reduction scheme that uses a broadcaster together with a bit-flipping circuit. The proposed scheme can reduce test data without degrading the fault coverage of ATPG, and without requiring or modifying any particular arrangement of the CUT. We theoretically analyze the test data size achieved by the proposed scheme. Numerical examples obtained from the analysis, together with experimental results, show that our scheme can effectively reduce test data if the care-bit rate is not too low relative to the number of scan chains. We also discuss a hybrid scheme combining random-pattern-based flipping and single-input-based flipping.
Fawnizu Azmadi HUSSIN Tomokazu YONEDA Alex ORAILOLU Hideo FUJIWARA
This paper proposes a test methodology for core-based testing of Systems-on-Chip (SoCs) that uses the functional bus as a test access mechanism. The functional bus serves as a transportation channel for the test stimuli and responses between a tester and the cores under test (CUTs). To enable test concurrency, local test buffers are added to all CUTs. To limit the buffer area overhead while minimizing the test application time, we propose a packet-based scheduling algorithm called PAcket Set Scheduling (PASS), which finds a complete packet delivery schedule under a given power constraint. Delivering test data in test packets, each consisting of a small number of bits, allows efficient sharing of the bus bandwidth with the help of an effective buffer-based test architecture. The experimental results show that the methodology is highly effective, especially for smaller bus widths, compared with previous approaches that do not use the functional bus.
Thomas Edison YU Tomokazu YONEDA Danella ZHAO Hideo FUJIWARA
The rapid advancement of VLSI technology has made it possible for chip designers and manufacturers to embed the components of a whole system onto a single chip, called a System-on-Chip (SoC). SoCs use pre-designed modules, called IP cores, which shorten design time and time-to-market. Furthermore, SoCs that operate with multiple clock domains under very low power requirements are used in the latest communications, networking and signal processing devices. As a result, the testing of SoCs and multi-clock-domain embedded cores under power constraints has been rapidly gaining importance. In this research, a novel method for designing power-aware test wrappers for embedded cores with multiple clock domains is presented. By effectively partitioning the various clock domains, we increase the solution space of possible test schedules for the core. Previous methods were limited to testing all the clock domains concurrently; we remove this limitation by using bandwidth conversion, multiple shift frequencies, and proper gating of the clock signals to control the shift activity of the various core logic elements. Combining these techniques gives greater flexibility when determining an optimal test schedule under very tight power constraints. Furthermore, since searching the entire expanded solution space of possible test schedules is computationally intensive, we propose a heuristic 3-D bin packing algorithm to determine the optimal wrapper architecture and test schedule while minimizing the test time under power and bandwidth constraints.
Hongbin SUO Ming LI Ping LU Yonghong YAN
Robust automatic language identification (LID) is the task of identifying the language of a short utterance spoken by an unknown speaker. The mainstream approaches include parallel phone recognition followed by language modeling (PPRLM), support vector machines (SVMs) and general Gaussian mixture models (GMMs). These systems use classifiers to map the cepstral features of spoken utterances into high-level scores. In this paper, to increase the dimension of the score vector and alleviate the inter-speaker variability within the same language, multiple data groups based on supervised speaker clustering are employed to generate discriminative language characterization score vectors (DLCSV). Back-end SVM classifiers are used to model the probability distribution of each target language in the DLCSV space. Finally, the output scores of the back-end classifiers are calibrated by a pair-wise posterior probability estimation (PPPE) algorithm. The proposed language identification frameworks are evaluated on the 2003 NIST Language Recognition Evaluation (LRE) databases, and the experiments show that the system described in this paper produces results comparable to those of existing systems. In particular, the SVM framework achieves an equal error rate (EER) of 4.0% in the 30-second task and outperforms state-of-the-art systems by more than 30% relative error reduction. In addition, the proposed PPRLM and GMM systems achieve EERs of 5.1% and 5.0%, respectively.
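The equal error rate (EER) quoted above is the operating point at which the miss rate and the false-alarm rate coincide. A minimal sketch of estimating it from detection scores, assuming that higher scores favor the target language and that sweeping the threshold over the observed scores (without interpolating between operating points) is precise enough; the function name `equal_error_rate` is hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the decision threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for th in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < th for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer
```

With target scores `[0.9, 0.8, 0.7, 0.4]` and non-target scores `[0.6, 0.3, 0.2, 0.1]`, the two error rates meet at 25% when the threshold is 0.6.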
Masayuki ARAI Satoshi FUKUMOTO Kazuhiko IWASAKI Tatsuru MATSUO Takahisa HIRAIDE Hideaki KONISHI Michiaki EMORI Takashi AIKYO
We developed a test data compression scheme for scan-based BIST, aiming to compress test stimuli and responses by more than 100 times. As the scan-BIST architecture, we adopt BIST-Aided Scan Test (BAST) and combine four techniques: the invert-and-shift operation, run-length compression, scan address partitioning, and LFSR pre-shifting. Our scheme achieved a 100x compression rate in environments where Xs do not occur, without reducing the fault coverage of the original ATPG vectors. Furthermore, we enhanced the masking logic to reduce the data needed for X-masking, so that test data is still compressed to 1/100 in practical environments where Xs occur. We applied our scheme to five real VLSI chips, and it compressed the test data by 100x for scan-based BIST.
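Run-length compression, one of the four techniques listed, replaces runs of identical bits with (bit, length) pairs. The generic sketch below is not the paper's encoding: it assumes each run is stored as one value bit plus a fixed-width count field, with runs longer than the field can hold split into several codewords; all function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(bits):
    """Collapse a bit string such as "0000011100" into (bit, run_length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


def run_length_decode(runs):
    """Inverse of run_length_encode."""
    return "".join(bit * count for bit, count in runs)


def encoded_size_bits(runs, count_width=8):
    """Encoded size if each codeword is 1 value bit plus a count_width-bit run length.
    Runs longer than the largest representable count are split into several codewords."""
    max_run = (1 << count_width) - 1
    codewords = sum(-(-count // max_run) for _, count in runs)  # ceil(count / max_run)
    return codewords * (1 + count_width)
```

With an 8-bit count field, a 1,001-bit stimulus made of 1,000 zeros followed by a single one costs 45 encoded bits, a ratio of about 22x; data with short, frequently alternating runs would instead expand.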
Min-Cheol HWANG Jun-Hyung KIM Chun-Su PARK Sung-Jea KO
Error concealment at a decoder is an efficient method of reducing the degradation of visual quality caused by channel errors. In this paper, we propose a novel spatio-temporal error concealment algorithm based on the recently introduced spatial-temporal fading (STF) scheme. Although STF performs well, several drawbacks, including blurring, remain in the concealed blocks. To alleviate these drawbacks, the proposed method uses hybrid approaches with adaptive weights. First, the boundary matching algorithm and decoder motion vector estimation, two well-known temporal error concealment methods, are adaptively combined so that each compensates for the weaknesses of the other. Then, an edge-preserving method is used to reduce the blurring caused by bilinear interpolation in spatial error concealment. Finally, the two results obtained by the hybrid spatial and temporal error concealment are blended pixel-wise with adaptive weights. Experimental results show that the proposed method outperforms conventional methods, including STF, in terms of both PSNR and subjective visual quality, while its computational complexity is similar to that of STF.
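The boundary matching algorithm (BMA) mentioned above picks, among candidate motion vectors, the one whose displaced reference block best continues the correctly received pixels surrounding the lost block. A minimal sketch using the sum of absolute differences along the four block edges, with frames given as lists of pixel rows; candidate generation, frame-border handling and the paper's adaptive weighting are omitted, and the function names are hypothetical.

```python
def boundary_sad(cur, ref, top, left, n, dy, dx):
    """Side-match distortion of the n x n reference block displaced by (dy, dx)."""
    r0, c0 = top + dy, left + dx  # top-left corner of the candidate block in ref
    sad = 0
    for j in range(n):
        sad += abs(ref[r0][c0 + j] - cur[top - 1][left + j])          # top edge vs row above
        sad += abs(ref[r0 + n - 1][c0 + j] - cur[top + n][left + j])  # bottom edge vs row below
    for i in range(n):
        sad += abs(ref[r0 + i][c0] - cur[top + i][left - 1])          # left edge vs column to the left
        sad += abs(ref[r0 + i][c0 + n - 1] - cur[top + i][left + n])  # right edge vs column to the right
    return sad


def boundary_matching(cur, ref, top, left, n, candidates):
    """Return the candidate motion vector (dy, dx) with the smallest side-match distortion."""
    return min(candidates, key=lambda mv: boundary_sad(cur, ref, top, left, n, mv[0], mv[1]))
```

Candidate vectors are usually taken from the motion vectors of neighboring macroblocks plus the zero vector; the lost block is then filled with the reference block selected by the winning vector.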
Amin SAEEDFAR Hiroyasu SATO Kunio SAWAYA
This paper presents different approaches for analyzing a thin-wire antenna in the presence of a de-ionized water box at different temperatures, which acts as a high-permittivity three-dimensional dielectric body. Continuing the authors' previous work, the coupled tensor-volume/line integral equations are first solved using a Galerkin-based moment method (MoM) with a combination of entire-domain and sub-domain basis functions, including three-dimensional polynomials of different degrees. Then, the accuracy of this MoM, specifically for a high-permittivity dielectric scatterer, is substantiated by comparing its numerical results with those of the FDTD method and with experimental data.
Youhua SHI Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
In this paper, we present a Design-for-Secure-Test (DFST) technique for pipelined AES that guarantees both security and test quality during testing. Unlike previous work, the proposed method keeps all secrets inside while providing high test quality and fault diagnosis capability. Furthermore, as additional benefits, the proposed DFST technique significantly reduces test application time, test data volume, and test generation effort.
Wei SUN Chen YU Xavier DEFAGO Yasushi INOGUCHI
The scheduling of real-time tasks with fault-tolerance requirements has been an important problem in multiprocessor systems. The primary-backup (PB) approach is often used as a fault-tolerant technique to guarantee task deadlines despite the presence of faults. In this paper, we propose a dynamic PB-based task scheduling approach in which an allocation parameter is used to search for available time slots for a newly arriving task, and previously scheduled tasks can be rescheduled when no time slot is available for the new task. To improve schedulability, we also propose an overloading strategy covering both PB overloading and backup-backup (BB) overloading. Our proposed task scheduling algorithm is compared with existing scheduling algorithms from the literature through simulation studies. The results show that the task rejection ratio of our real-time task scheduling algorithm is almost 50% lower than those of the compared algorithms.
Kazuya HARAGUCHI Mutsunori YAGIURA Endre BOROS Toshihide IBARAKI
We consider a data set in which each example is an n-dimensional Boolean vector labeled as true or false. A pattern is a co-occurrence of a particular value combination on a given subset of the variables. If a pattern appears frequently in the true examples and infrequently in the false examples, we consider it a good pattern. In this paper, we discuss the problem of determining the data size needed to remove "deceptive" good patterns: in a small data set, many good patterns may appear superficially, simply by chance, independently of the underlying structure. Our hypothesis is that, to remove such deceptive good patterns, the data set should contain more examples than the size at which a random data set contains few good patterns. We support this hypothesis with computational studies. We also derive a theoretical upper bound on the needed data size in view of our hypothesis.
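The counting behind this hypothesis can be sketched directly: enumerate every value assignment on `degree` variables and count those that cover many true examples and few false ones. The coverage thresholds `min_true_frac` and `max_false_frac` stand in for "frequently" and "infrequently" and are our assumption, as are the function names; the demo applies the count to purely random data of increasing size to show how many good patterns appear by chance.

```python
import random
from itertools import combinations, product


def _covers(x, idx, vals):
    """True if example x takes value vals[k] on variable idx[k] for every k."""
    return all(x[i] == v for i, v in zip(idx, vals))


def count_good_patterns(examples, labels, degree, min_true_frac, max_false_frac):
    """Count value assignments on `degree` variables that cover at least min_true_frac
    of the true examples and at most max_false_frac of the false examples."""
    trues = [x for x, y in zip(examples, labels) if y]
    falses = [x for x, y in zip(examples, labels) if not y]
    if not trues or not falses:
        raise ValueError("need at least one true and one false example")
    n = len(examples[0])
    good = 0
    for idx in combinations(range(n), degree):
        for vals in product((0, 1), repeat=degree):
            t = sum(_covers(x, idx, vals) for x in trues) / len(trues)
            f = sum(_covers(x, idx, vals) for x in falses) / len(falses)
            if t >= min_true_frac and f <= max_false_frac:
                good += 1
    return good


if __name__ == "__main__":
    rng = random.Random(0)
    n_vars = 8
    for m in (20, 50, 200, 800):
        # Purely random data: the features are independent of the labels.
        examples = [tuple(rng.randint(0, 1) for _ in range(n_vars)) for _ in range(m)]
        labels = [i % 2 == 0 for i in range(m)]
        print(m, count_good_patterns(examples, labels, 2, 0.4, 0.1))
```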
Toshiaki KAMADA Nobuaki MINEMATSU Takashi OSANAI Hisanori MAKINAE Masumi TANIMOTO
In forensic speaker verification of telephone speech, we may be asked to identify a speaker in a very noisy environment, unlike the conditions in general research. In such an environment, speech is first processed by clarifying it. However, a previous study of speaker verification from clarified speech did not yield satisfactory results. In this study, we experimented on speaker verification with clarification of speech in a noisy environment and examined the relationship between the improvement in acoustic quality and the speaker verification results. Moreover, experiments with realistic noise, such as a crime prevention alarm and power supply noise, were conducted to examine speaker verification accuracy in a realistic environment. We confirmed the validity of speaker verification with clarification of speech in a realistic noisy environment.
Xiang-Hui WEI Shen LI Yang SONG Satoshi GOTO
Motion estimation (ME) is a computation-intensive module in video coding systems. In MPEG-2 to H.264 transcoding, reusing the MPEG-2 motion vector (MV) as the search center in the H.264 encoder is a simple but effective technique for simplifying ME. However, directly using the MPEG-2 MV as the search center makes it difficult to apply data reuse methods in hardware, because the search windows of successive macroblocks (MBs) overlap irregularly. In this paper, we propose a search window reuse scheme for transcoding, especially for HDTV applications. By exploiting the similarity between neighboring MVs, the overlapping area of the search windows can be regularized. Experimental results show that our method achieves an average search window reuse rate of 93.1% on HDTV 720p sequences with almost no video quality degradation. Compared with transcoding without any data reuse scheme, the proposed method reduces the bandwidth to 40.6%.
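A toy model helps show why irregular MVs hurt data reuse. Assuming 16×16 macroblocks and a ±32-pixel search range (both hypothetical parameters), each search window is 80×80 pixels, and the reusable part is its overlap with the previous macroblock's window. The function names are illustrative and are not taken from the paper.

```python
def search_window(mb_x, mb_y, mv, search_range=32, mb_size=16):
    """Window (x0, y0, x1, y1), end-exclusive, centered on the MB displaced by mv = (dx, dy)."""
    cx = mb_x * mb_size + mv[0]
    cy = mb_y * mb_size + mv[1]
    return (cx - search_range, cy - search_range,
            cx + mb_size + search_range, cy + mb_size + search_range)


def reuse_rate(prev, cur):
    """Fraction of the current window already loaded for the previous window."""
    ox = max(0, min(prev[2], cur[2]) - max(prev[0], cur[0]))
    oy = max(0, min(prev[3], cur[3]) - max(prev[1], cur[1]))
    area = (cur[2] - cur[0]) * (cur[3] - cur[1])
    return ox * oy / area
```

With identical MVs, horizontally adjacent windows share 80% of their pixels; if the second macroblock's MV is (20, -12) while the first uses (0, 0), the overlap drops to about 47%, which is the kind of irregularity the proposed scheme regularizes.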
Huan SUN Xinyu WANG Xiaohu YOU
In this paper, a novel user scheduling algorithm for maximizing the sum-rate capacity of inhomogeneous networks is investigated. To exploit multi-user diversity while reducing the amount of feedback, a selective feedback scheme is adopted. An algorithm for determining its key parameter, the prescribed threshold, is proposed. Numerical simulations show that, with the proposed threshold, the selective feedback scheme in inhomogeneous networks still preserves most of the sum-rate capacity of the full feedback scheme, while the feedback load is significantly reduced.
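The effect of selective feedback can be explored with a short Monte Carlo sketch. It assumes Rayleigh fading (exponentially distributed instantaneous SNR around each user's own mean SNR, which is what makes the network inhomogeneous), a single common threshold, and a max-SNR scheduler; the paper's threshold-selection algorithm is not reproduced, and `simulate` is a hypothetical name.

```python
import math
import random


def simulate(avg_snrs, threshold, trials=20000, seed=1):
    """Return (full-feedback rate, selective-feedback rate, fraction of users feeding back)."""
    rng = random.Random(seed)
    full_rate = sel_rate = 0.0
    feedback_count = 0
    for _ in range(trials):
        # Rayleigh fading: instantaneous SNR is exponential with each user's mean SNR.
        snrs = [rng.expovariate(1.0 / m) for m in avg_snrs]
        full_rate += math.log2(1.0 + max(snrs))
        # Selective feedback: only users at or above the threshold report their SNR.
        fed_back = [s for s in snrs if s >= threshold]
        feedback_count += len(fed_back)
        if fed_back:  # if nobody reports, the slot is assumed to go unused
            sel_rate += math.log2(1.0 + max(fed_back))
    return (full_rate / trials, sel_rate / trials,
            feedback_count / (trials * len(avg_snrs)))
```

Whenever at least one user exceeds the threshold, the selected user is the same one full feedback would pick, so the only capacity lost comes from slots in which nobody reports; raising the threshold trades that loss against a lower feedback load.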
The N-dimensional (N-D) Hilbert curve is a one-to-one mapping between N-D space and one-dimensional (1-D) space. It is actively studied in digital image processing as a scan technique (Hilbert scan) because it preserves the spatial relationships of N-D patterns. Several Hilbert scan algorithms currently exist. However, these algorithms have two strict restrictions in implementation. First, recursive functions are used to generate the Hilbert curve, which makes the algorithms complex and computationally expensive. Second, all sides of the scanned region must have the same size, and that size must be a power of two, which greatly limits the application of the Hilbert scan. To remove these constraints and make the Hilbert scan suitable for general applications, this paper proposes a nonrecursive N-D Pseudo-Hilbert scan algorithm based on two look-up tables. The merit of the proposed algorithm is that it is much easier to implement than the original while preserving the original characteristics. The experimental results indicate that the Pseudo-Hilbert scan preserves point neighborhoods as much as possible and takes advantage of the high correlation between neighboring lattice points, and that it performs competitively in comparison with other common scan techniques. We believe that this novel scan technique will lead to many new applications in areas that can benefit from reducing the dimensionality of the problem.
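For comparison, the classic Hilbert scan can already be generated without recursion when the region is a 2^k × 2^k square. The sketch below is the well-known iterative index-to-coordinate conversion for the 2-D case; it is not the proposed look-up-table-based N-D Pseudo-Hilbert scan and still has the power-of-two restriction discussed above. The function names are illustrative.

```python
def _rotate(s, x, y, rx, ry):
    """Rotate/flip a quadrant of side s so that consecutive sub-curves join up."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n, d):
    """Map index d in [0, n*n) to (x, y) on the Hilbert curve of an n x n grid (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan(image):
    """Return the pixels of a square 2^k x 2^k image (list of rows) in Hilbert order."""
    n = len(image)
    if n & (n - 1) or any(len(row) != n for row in image):
        raise ValueError("image must be square with a power-of-two side")
    return [image[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]
```

Every pair of consecutive indices maps to grid neighbors, which is the locality property that makes the scan attractive; the Pseudo-Hilbert approach aims to keep that property for arbitrary region sizes.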
Ching-Ian SHIE Yi-Chyun CHIANG Jinq-Min LIN
This work presents a technique to enhance the performance of the conventional PMOS Colpitts VCO circuit. The technique adds an NMOS cross-coupled pair under the traditional differential Colpitts VCO to improve the oscillator startup condition and its efficiency. The analysis supports this viewpoint and provides a device-selection method to optimize the output power and phase noise. The new VCO can also be used to realize a QVCO circuit, because the coupling transistors can be placed in parallel with the transistors of the NMOS cross-coupled pair to achieve proper coupling between the individual VCOs. To verify the proposed design concept, two prototypes, a VCO and a QVCO operating at 2.4 GHz and fabricated in 0.25-µm CMOS technology, were designed and tested. The measurement results show that the VCO achieves a figure of merit (FOM) of about 180 dBc/Hz, and the phase noise of the QVCO is -116 dBc/Hz at a 1 MHz offset from the oscillation frequency.
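The FOM reported for the VCO is commonly computed as FOM = -L(Δf) + 20 log10(f0/Δf) - 10 log10(P_DC / 1 mW), where L(Δf) is the phase noise at offset Δf, f0 is the oscillation frequency and P_DC is the DC power consumption. Assuming this standard definition (the abstract does not state which one is used), the calculation is sketched below; `vco_fom` is a hypothetical name.

```python
import math


def vco_fom(phase_noise_dbc_hz, f0_hz, offset_hz, power_mw):
    """Oscillator figure of merit (larger is better) under the common definition above."""
    return (-phase_noise_dbc_hz
            + 20 * math.log10(f0_hz / offset_hz)
            - 10 * math.log10(power_mw / 1.0))  # DC power normalized to 1 mW
```

For instance, a hypothetical oscillator with -120 dBc/Hz at a 1 MHz offset from 2.4 GHz that dissipates 10 mW would score about 177.6; these numbers are illustrative and are not the measured values of the prototypes.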
This paper describes novel CMOS level-conversion flip-flops for use in low-power SoCs with clustered voltage scaling. These flip-flops feed their outputs directly into the front stage to support self-resetting and conditional operations. As a result, their simple structures avoid clock level shifting and redundant transitions, leading to substantial improvements in power and area. The comparison results indicate that, compared with conventional level-conversion flip-flops, the proposed flip-flops achieve power and area savings of up to 50% and 31%, respectively, with no speed degradation.