Kotaro TERADA Masao YANAGISAWA Nozomu TOGAWA
In deep-submicron era, interconnection delays are not negligible even in high-level synthesis and regular-distributed-register architectures (RDR architectures) have been proposed to cope with this problem. In this paper, we propose a high-level synthesis algorithm using operation chainings which reduces the overall latency targeting RDR architectures. Our algorithm consists of three steps: The first step enumerates candidate operations for chaining. The second step introduces maximal chaining distance (MCD), which gives the maximal allowable inter-island distance on RDR architecture between chaining candidate operations. The last step performs list-scheduling and binding simultaneously based on the results of the two preceding steps. Our algorithm enumerates feasible chaining candidates and selects the best ones for RDR architecture. Experimental results show that our proposed algorithm reduces the latency by up to 40.0% compared to the original approach, and by up to 25.0% compared to a conventional approach. Our algorithm also reduces the number of registers and the number of multiplexers compared to the conventional approaches in some cases.
Naruki KURATA Ryota SHIOYA Masahiro GOSHIMA Shuichi SAKAI
To eliminate CAMs from the load/store queues, several techniques to detect memory access order violation with hash filters composed of RAMs have been proposed. This paper proposes a technique with parallel counting Bloom filters (PCBF). A Bloom filter has extremely low false positive rates owing to multiple hash functions. Although some existing researches claim the use of Bloom filters, none of them make mention to multiple hash functions. This paper also addresses the problem relevant to the variety of access sizes of load/store instructions. The evaluation results show that our technique, with only 2720-bit Bloom filters, achieves a relative IPC of 99.0% while the area and power consumption are greatly reduced to 14.3% and 22.0% compared to a conventional model with CAMs. The filter is much smaller than usual branch predictors.
Takashi IMAGAWA Masayuki HIROMOTO Hiroyuki OCHI Takashi SATO
Time redundancy is sometimes an only option for enhancing circuit reliability when the circuit area is severely restricted. In this paper, a time-redundant error-correction scheme, which is particularly suitable for coarse-grained reconfigurable arrays (CGRAs), is proposed. It judges the correctness of the executions by comparing the results of two identical runs. Once a mismatch is found, the second run is terminated immediately to start the third run, under the assumption that the errors tend to persist in many applications, for selecting the correct result in the three runs. The circuit area and reliability of the proposed method is compared with a straightforward implementation of time-redundancy and a selective triple modular redundancy (TMR). A case study on a CGRA revealed that the area of the proposed method is 1% larger than that of the implementation for the selective TMR. The study also shows the proposed scheme is up to 2.6x more reliable than the full-TMR when the persistent error is predominant.
Dajiang LIU Shouyi YIN Leibo LIU Shaojun WEI
The coarse-grained reconfigurable architecture (CGRA) is a promising computing platform that provides both high performance and high power-efficiency. The computation-intensive portions of an application (e.g. loop nests) are often mapped onto CGRA for acceleration. However, mapping loop nests onto CGRA efficiently is quite a challenge due to the special characteristics of CGRA. To optimize the mapping of loop nests onto CGRA, this paper makes three contributions: i) Establishing a precise performance model of mapping loop nests onto CGRA, ii) Formulating the loop nests mapping as a nonlinear optimization problem based on polyhedral model, iii) Extracting an efficient heuristic algorithm and building a complete flow of mapping loop nests onto CGRA (PolyMAP). Experiment results on most kernels of the PolyBench and real-life applications show that our proposed approach can improve the performance of the kernels by 27% on average, as compared to the state-of-the-art methods. The runtime complexity of our approach is also acceptable.
Seung-Youl KIM Kyoung-Rok CHO Je-Hoon LEE
This paper presents a new parallel architecture of syndrome generator for a high-speed BCH (Bose-Chaudhuri-Hocquenghem) decoder. In particular, the proposed parallel syndrome generators are based on LFSR (linear feedback shift register) architecture to achieve high throughput without significant area overhead. From the experimental results, the proposed approach achieves 4.60 Gbps using 0.25-µm standard CMOS technology. This result is much faster than the conventional byte-wise GFM-based counterpart. The high throughputs are due to the well-tuned hardware implementation using unfolding transformation.
Satoshi NAGATA Yoshihisa KISHIYAMA Motohiro TANNO Kenichi HIGUCHI Mamoru SAWAHASHI
This paper presents the effect of transmit diversity on the initial and neighboring cell search time performance and the most appropriate transmit diversity scheme based on system-level simulations employing synchronization signals for the Long Term Evolution (LTE) downlink. The synchronization signals including the primary synchronization signal (PSS) and secondary synchronization signal (SSS) are the first physical channel that a set of user equipment (UE) acquires at the initial radio-link connection. The transmit diversity candidates assumed in the paper are Precoding Vector Switching (PVS), Cyclic Delay Diversity (CDD), Time Switched Transmit Diversity (TSTD), and Frequency Switched Transmit Diversity (FSTD), which are all suitable for simple blind detection at a UE. System-level simulation results show that transmit diversity is effective in improving the detection probabilities of the received PSS timing and PSS sequence in the first step and those of the SSS sequence and radio frame timing in the second step of the cell search process. We also show that PVS achieves fast cell search time performance of less than approximately 20ms at the location probability of 90% regardless of the inter-cell site distance up to 10km. Hence, we conclude that PVS is the best transmit diversity scheme for the synchronization signals from the viewpoint of decreasing the initial and neighboring cell search times.
Packet classification is a fundamental task in the control of network traffic, protection from cyber threats. Most layer 3 and higher network devices have a packet classification capability that determines whether to permit or discard incoming packets by comparing their headers with a set of rules. Although linear search is an intuitive implementation of packet classification, it is very inefficient. Srinivasan et al. proposed a novel lookup scheme using a hierarchical trie instead of linear search, which realizes faster packet classification with time complexity proportional to rule length rather than the number of rules. However, the hierarchical trie and its various improved algorithms allow only single prefix rules to be processed. Since it is necessary for layer 4 and higher packet classifications to deal with arbitrary bitmask rules in the hierarchical trie, we propose a run-based trie based on the hierarchical trie, but extended to deal with arbitrary bitmask rules. Our proposed algorithm achieves O((dW)2) query time and O(NdW) space complexity with N rules of length dW. The query time of our novel alrorithm doesn't depend on the number of rules. It solves the latency problem caused by increase of the rules in firewalls.
Fatemeh ABRISHAMIAN Katsumi MORISHITA
A novel method was developed to expand and adjust the bandwidth of long-period fiber gratings (LPFGs) as band-rejection filters. The band-rejection filters were constructed by concatenating two LPFGs with an appropriate space, that causes a $pi$-phase shift. The component LPFGs with the same period and the different numbers of periods are designed to have $-$3-dB transmission at wavelengths on both sides of a resonance wavelength symmetrically, and the transmission loss of the concatenated LPFGs peaks at the -3-dB transmission wavelengths. The rejection bandwidth was widened by changing the interval between the -3-dB transmission wavelengths. The concatenated LPFGs were simulated by using a transfer-matrix method based on a discrete coupling model, and were fabricated by a point-by-point arc discharge technique on the basis of the simulation results. It was demonstrated that the rejection bandwidth at 20-dB attenuation reached 26.6,nm and was 2.7 times broader than that of a single uniform LPFG.
Hong LIU Yang YANG Xiumei YANG Zhengmin ZHANG
Small cell networks have been promoted as an enabling solution to enhance indoor coverage and improve spectral efficiency. Users usually deploy small cells on-demand and pay no attention to global profile in residential areas or offices. The reduction of cell radius leads to dense deployment which brings intractable computation complexity for resource allocation. In this paper, we develop a semi-distributed resource allocation algorithm by dividing small cell networks into clusters with limited inter-cluster interference and selecting a reference cluster for interference estimation to reduce the coordination degree. Numerical results show that the proposed algorithm can maintain similar system performance while having low complexity and reduced information exchange overheads.
Juha PETÄJÄJÄRVI Heikki KARVONEN Konstantin MIKHAYLOV Aarno PÄRSSINEN Matti HÄMÄLÄINEN Jari IINATTI
This paper discusses the perspectives of using a wake-up receiver (WUR) in wireless body area network (WBAN) applications with event-driven data transfers. First we compare energy efficiency between the WUR-based and the duty-cycled medium access control protocol -based IEEE 802.15.6 compliant WBAN. Then, we review the architectures of state-of-the-art WURs and discuss their suitability for WBANs. The presented results clearly show that the radio frequency envelope detection based architecture features the lowest power consumption at a cost of sensitivity. The other architectures are capable of providing better sensitivity, but consume more power. Finally, we propose the design modification that enables using a WUR to receive the control commands beside the wake-up signals. The presented results reveal that use of this feature does not require complex modifications of the current architectures, but enables to improve energy efficiency and latency for small data blocks transfers.
Atsushi OOKA Shingo ATA Kazunari INOUE Masayuki MURATA
Content-centric networking (CCN) is an innovative network architecture that is being considered as a successor to the Internet. In recent years, CCN has received increasing attention from all over the world because its novel technologies (e.g., caching, multicast, aggregating requests) and communication based on names that act as addresses for content have the potential to resolve various problems facing the Internet. To implement these technologies, however, requires routers with performance far superior to that offered by today's Internet routers. Although many researchers have proposed various router components, such as caching and name lookup mechanisms, there are few router-level designs incorporating all the necessary components. The design and evaluation of a complete router is the primary contribution of this paper. We provide a concrete hardware design for a router model that uses three basic tables — forwarding information base (FIB), pending interest table (PIT), and content store (CS) — and incorporates two entities that we propose. One of these entities is the name lookup entity, which looks up a name address within a few cycles from content-addressable memory by use of a Bloom filter; the other is the interest count entity, which counts interest packets that require certain content and selects content worth caching. Our contributions are (1) presenting a proper algorithm for looking up and matching name addresses in CCN communication, (2) proposing a method to process CCN packets in a way that achieves high throughput and very low latency, and (3) demonstrating feasible performance and cost on the basis of a concrete hardware design using distributed content-addressable memory.
Ki-Seong LEE Byung-Woo HONG Youngmin KIM Jaeyeop AHN Chan-Gun LEE
Most previous approaches on comparing the results for software architecture recovery are designed to handle only flat decompositions. In this paper, we propose a novel distance called Split-Jaccard Distance of Hierarchical Decompositions. It extends the Jaccard coefficient and incorporates the concept of the splits of leaves in a hierarchical decomposition. We analyze the proposed distance and derive its properties, including the lower-bound and the metric space.
Shingo YOSHIZAWA Mai NOZAKI Hiroshi TANIMOTO
Due to increasing demand for machine-to-machine (M2M) communication, simultaneous connections for many terminals are requested for current wireless communication systems. Interleave division multiple access (IDMA) has superior multiuser detection performance and attains high data transmission efficiency in multiuser communications. This paper describes the VLSI implementation of an interference canceller for OFDM-IDMA systems. The conventional architecture decreases a throughput in pipeline processing due to wait time occurring in interleave and deinterleave memory units. The proposed architecture adopts dual-frame processing to solve the problem of the wait time and achieves a high utilization ratio in pipeline stage operation. In the implementation results, the proposed architecture has reduced circuit area and power consumption by 25% and 41% for BPSK demodulation and 33% and 44% for QPSK demodulation compared with the conventional architecture on the same throughput condition.
Chunyan HOU Chen CHEN Jinsong WANG Kai SHI
With the rise of component-based software development, its reliability has attracted much attention from both academic and industry communities. Component-based software development focuses on architecture design, and thus it is important for reliability analysis to emphasize software architecture. Existing approaches to architecture-based software reliability analysis don't model the usage profile explicitly, and they ignore the difference between the testing profile and the practical profile of components, which limits their applicability and accuracy. In response to these issues, a new reliability modeling and prediction approach is introduced. The approach considers reliability-related architecture factors by explicitly modeling the system usage profile, and transforms the testing profile into the practical usage profile of components by representing the profile with input sub-domains. Finally, the evaluation experiment shows the potential of the approach.
One of the fast approximate similarity search techniques is a binary hashing method that transforms a real-valued vector into a binary code. The similarity between two binary codes is measured by their Hamming distance. In this method, a hash table is often used when undertaking a constant-time similarity search. The number of accesses to the hash table, however, increases when the number of bits lengthens. In this paper, we consider a method that does not access data with a long Hamming radius by using multiple binary codes. Further, we attempt to integrate the proposed approach and the existing multi-index hashing (MIH) method to accelerate the performance of the similarity search in the Hamming space. Then, we propose a learning method of the binary hash functions for multiple binary codes. We conduct an experiment on similarity search utilizing a dataset of up to 50 million items and show that our proposed method achieves a faster similarity search than that possible with the conventional linear scan and hash table search.
Supacheep AMTADE Toshiyuki MIYAMOTO
A cloud system is defined as a large scale computer system that contains running high performance computers and responds to a large number of incoming tasks over the Internet. In this paper, we consider the problem to schedule computational jobs efficiently regarding system resource constraint and introduce a cuckoo search (CS) algorithm. Experimental results show that CS outperforms the genetic algorithm in terms of fitness value.
Up until now, the best public key encryption with multi-dimensional range query (PKMDRQ) scheme has two problems which need to be resolved. One is that the scheme is selectively secure. The other is that the time of decryption is long. To address these problems, we present a method of converting a predicate encryption supporting inner product (IPE) scheme into a PKMDRQ scheme. By taking advantage of this approach, an instance is also proposed. The comparison between the previous work and ours shows that our scheme is more efficient over the time complexity. Moreover, our scheme is adaptively secure.
Rui SHI Shouyi YIN Leibo LIU Qiongbing LIU Shuang LIANG Shaojun WEI
Video Up-scaling is a hotspot in TV display area; as an important brunch of Video Up-scaling, Texture-Based Video Up-scaling (TBVU) method shows great potential of hardware implementation. Coarse-grained Reconfigurable Architecture (CGRA) is a very promising processor; it is a parallel computing platform which provides high performance of hardware, high flexibility of software, and dynamical reconfiguration ability. In this paper we propose an implementation of TBVU on CGRA. We fully exploit the characters of TBVU and utilize several techniques to reduce memory I/O operation and total execution time. Experimental results show that our work can greatly reduce the I/O operation and the execution time compared with the non-optimized ones. We also compare our work with other platforms and find great advantage in execution time and resource utilization rate.
Information networks are an important infrastructure and their resources are shared by many users. In order to utilize their resources efficiently, they should be controlled to prevent synchronization of user traffic. In addition, fairness among users must be assured. This paper discusses the framework of transmission rate control based on chaos. There are two different characteristics that coexist in chaos. One is that the state in the future is extremely sensitive to the initial condition. This makes it impossible to predict the future state at a fine level of detail. The other is the structural stability of macroscopic dynamics. Even if the state is uncertain on the microscopic scale, state dynamics on the macroscopic scale are stable. This paper proposes a novel framework of distributed hierarchical control of transmission rate by interpreting the coexistence of chaos as microscopic fairness of users and macroscopic stable utilization of networks.
Zhangjie FU Xingming SUN Qi LIU Lu ZHOU Jiangang SHU
Cloud computing is becoming increasingly popular. A large number of data are outsourced to the cloud by data owners motivated to access the large-scale computing resources and economic savings. To protect data privacy, the sensitive data should be encrypted by the data owner before outsourcing, which makes the traditional and efficient plaintext keyword search technique useless. So how to design an efficient, in the two aspects of accuracy and efficiency, searchable encryption scheme over encrypted cloud data is a very challenging task. In this paper, for the first time, we propose a practical, efficient, and flexible searchable encryption scheme which supports both multi-keyword ranked search and parallel search. To support multi-keyword search and result relevance ranking, we adopt Vector Space Model (VSM) to build the searchable index to achieve accurate search results. To improve search efficiency, we design a tree-based index structure which supports parallel search to take advantage of the powerful computing capacity and resources of the cloud server. With our designed parallel search algorithm, the search efficiency is well improved. We propose two secure searchable encryption schemes to meet different privacy requirements in two threat models. Extensive experiments on the real-world dataset validate our analysis and show that our proposed solution is very efficient and effective in supporting multi-keyword ranked parallel searches.