Eunji PAK Sang-Hoon KIM Jaehyuk HUH Seungryoul MAENG
Although shared caches allow the dynamic allocation of limited cache capacity among cores, traditional LRU replacement policies often cannot prevent negative interference among cores. To address the contention problem in shared caches, cache partitioning and application scheduling techniques have been extensively studied. Partitioning explicitly determines the cache capacity for each core to maximize the overall throughput. Application scheduling by the operating system, on the other hand, groups the least interfering applications on each shared cache when a system has multiple shared caches. Although application scheduling can mitigate the contention problem without any extra hardware support, its effect can be limited under severe contention. This paper proposes a low-cost solution based on application scheduling with a simple cache insertion control. Instead of using a full hardware-based cache partitioning mechanism, the proposed technique relies mostly on application scheduling. It selectively applies LRU insertion in the shared caches, which can be added with negligible hardware changes to current commercial processor designs. For completeness of the cache interference evaluation, this paper examines all possible mixes from a set of applications, instead of using just a few selected mixes. The evaluation shows that the proposed technique mitigates the cache contention problem effectively, coming close to ideal scheduling and partitioning.
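The effect of selective LRU insertion described in the abstract above can be illustrated with a toy model of a single cache set. This is only a sketch under stated assumptions: the class `CacheSet`, the function `reuser_hits`, the 4-way set, and the two synthetic access patterns (a "reuser" with a 3-line working set and a "streamer" that never reuses a line) are all hypothetical and are not taken from the paper.

```python
class CacheSet:
    """One cache set with LRU replacement and a per-access insertion position."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []  # index 0 is the MRU position, the last index is LRU

    def access(self, tag, insert_at_lru=False):
        """Return True on a hit, False on a miss."""
        if tag in self.lines:
            # Hit: promote the line to the MRU position.
            self.lines.remove(tag)
            self.lines.insert(0, tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.pop()  # evict the line in the LRU position
        if insert_at_lru:
            self.lines.append(tag)     # new line is the next eviction victim
        else:
            self.lines.insert(0, tag)  # conventional MRU insertion
        return False


def reuser_hits(streamer_uses_lru_insertion, ways=4, rounds=50):
    """Count hits of a 3-line working set that shares a set with a streamer."""
    cache = CacheSet(ways)
    hits = 0
    next_stream_tag = 0
    for _ in range(rounds):
        for tag in ("a", "b", "c"):          # reuser: small reused working set
            hits += cache.access(tag)
        for _ in range(4):                   # streamer: lines never reused
            cache.access(("s", next_stream_tag), streamer_uses_lru_insertion)
            next_stream_tag += 1
    return hits
```

In this toy setting, inserting the streamer's lines at the MRU position flushes the reuser's working set every round, while inserting them at the LRU position confines the streamer to a single way.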
Kazuya TAKAHASHI Tatsuya MORI Yusuke HIROTA Hideki TODE Koso MURAKAMI
In recent years, real-time streaming has become widespread as a major service on the Internet. However, real-time streaming has a strict playback deadline. Application-level multicast using multiple distribution trees, known as a forest, is an effective approach for reducing delay and jitter. However, the failure or departure of nodes during forest-based multicast transfer can severely affect the performance of other nodes, and the multimedia data quality is degraded until the distribution trees are repaired. Speeding up recovery from isolation is therefore very important, especially in real-time streaming services. In this paper, we propose three methods for resolving this problem. The first is a random-based proactive method, “Randomized Forwarding”, which achieves rapid recovery from isolation through cooperation among distribution trees. Each node forwards the data it receives to the child nodes in its tree and then randomly transfers the data to other trees with a predetermined probability. The second is a reactive method that provides reliable isolation recovery with low overhead. In this method, an isolated node requests “Continuous Forwarding” from other nodes if it detects a problem with a parent node. Forwarding to the nearest nodes in the IP network keeps this method efficient. The third is a hybrid method that combines these two methods to achieve further performance improvements. We evaluated the performance of the proposed methods using computer simulations. The simulation results demonstrate that the proposed methods achieve isolation recovery and that the hybrid method is the most suitable for real-time streaming.
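The forwarding rule of the proactive method above can be sketched as follows. The abstract does not say how many extra copies are sent or how the target in another tree is chosen, so sending one copy to one uniformly chosen node is an assumption, as are the function name `forward_targets` and its parameters.

```python
import random


def forward_targets(children, other_tree_nodes, p, rng=random):
    """Return the nodes a node sends a received packet to.

    children         -- child nodes in the node's own distribution tree
    other_tree_nodes -- candidate nodes in other trees (hypothetical input)
    p                -- predetermined probability of cross-tree forwarding
    """
    targets = list(children)  # regular in-tree forwarding always happens
    if other_tree_nodes and rng.random() < p:
        # Assumption: one extra copy goes to one uniformly chosen node.
        targets.append(rng.choice(other_tree_nodes))
    return targets
```

With `p = 0` the function reduces to plain tree forwarding; larger values of `p` trade extra traffic for a higher chance that an isolated node still receives data from another tree.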
Takeshi USUI Kiyohide NAKAUCHI Yozo SHOJI Yoshinori KITATSUJI Hidetoshi YOKOTA Nozomu NISHINAGA
This paper proposes a session state migration architecture for flexible server consolidation. One of the technical challenges is how to split a session state from a connection and bind the session state to another connection on any server. Conventional server and client applications assume that a session state is statically bound to a connection once the connection has been established. By splitting the session state from the connection, the proposed architecture reduces the migration latency compared to an existing study. This paper classifies common procedures of session state migration for various services. The session state migration architecture enables service providers to conduct server maintenance at their own convenience and to reduce server energy consumption by consolidating servers. A simulation evaluating server consolidation reveals that session state migration reduces the number of servers needed to accommodate users, compared to virtual machine migration. This paper also presents an implementation of the session state migration architecture. Experimental results reveal that the impact of the proposed architecture on real-time applications is small.
String analysis is a static analysis of dynamically generated strings in a target program, and it is applied to check that web applications construct well-formed strings. The analysis constructs a finite state automaton that approximates the set of strings that a particular string variable may hold at a given program location at runtime. A drawback of string analysis is the imprecision of its results, which leads to false positives in well-formedness checkers. To address this imprecision, this paper proposes a technique that makes string analysis more precise with respect to input validation in web applications. The paper presents the improvement, which uses annotations that represent the screening of a set of possible strings, and an empirical evaluation of the improved analyzer on real-world web applications.
Seon-Man HWANG Yi-Jung JUNG Hyuk-Min KWON Jae-Hyung JANG Ho-Young KWAK Sung-Kyu KWON Seung-Yong SUNG Jong-Kwan SHIN Yi-Sun CHUNG Da-Soon LEE Hi-Deok LEE
In this paper, we propose a novel pnp BJT structure to improve the matching characteristics of bipolar junction transistors (BJTs) fabricated using a standard CMOS process. In terms of electrical characteristics, the collector current density Jc of the proposed structure (T2) is slightly greater than that of the conventional structure (T1), which contributes to a greater current gain β for the proposed structure. Although the matching characteristics of the collector current density of the proposed structure are almost the same as those of the conventional structure, the matching of the current gain is about 14.81% better, owing to the matching of the base current density, which is about 59.34% better in the proposed structure. Therefore, the proposed BJT structure is desirable for high-performance analog/digital mixed-signal applications.
Sritrusta SUKARIDHOTO Nobuo FUNABIKI Toru NAKANISHI Kan WATANABE Shigeto TAJIMA
As a flexible and cost-efficient scalable Internet access network, we studied architectures, protocols, and design optimizations of the Wireless Internet-access Mesh NETwork (WIMNET). WIMNET is composed of multiple access points (APs) connected through multihop wireless communications on IEEE 802.11 standards. The increasing popularity of real-time applications such as IP-phones and IP-TV means that they should be supported in WIMNET. However, the contention resolution mechanism using a random backoff-time in the CSMA/CA protocol of 802.11 standards is not sufficient for handling real-time traffic in multihop wireless communications. In this paper, we propose a Fixed Backoff-time Switching (FBS) method for the CSMA/CA protocol to improve the real-time traffic performance in WIMNET by giving the necessary activation chances to each link. We implement our proposal on the QualNet simulator, and verify its effectiveness through simulations on three network topologies with four scenarios.
Pingguo HUANG Yutaka ISHIBASHI
Multi-sensory communications with haptics have attracted a number of researchers in recent years. To provide such communication services with highly realistic sensations, researchers focus on quality of service (QoS) control, which keeps the quality as high as possible, and on quality of experience (QoE) assessment, which is carried out to investigate the influence on user perception and to verify the effectiveness of QoS control. In this paper, we report the present status of studies on multi-sensory communications with haptics. We divide the applications of these communications into applications in virtual environments and those in real environments, and we mainly describe collaborative work and competitive work in each type of environment. We also explain the QoS control applied to the applications and the QoE assessment carried out in them. Furthermore, we discuss the future directions of studies on multi-sensory communications.
Yousic LEE Jae-Dong LEE Taekeun PARK
In this letter, for offloading traffic to Wireless Local Area Network (WLAN) with transport layer mobility where WLAN service is intermittently available, we propose a novel scheme to freeze and melt the timeout handling procedure of SCTP. Simulation results show that the proposed scheme significantly improves the performance in terms of file transfer completion time.
This paper presents an algorithmic approach to acquiring the influencing relationships among users by discovering implicit influencing group structure from smartphone usage. The method assumes that a time series of users' application downloads and activations can be represented by individual inter-personal influence factors. To achieve better predictive performance and also to avoid over-fitting, a latent feature model is employed. The method tries to extract the latent structures by monitoring cross validating predictive performances on approximated influence matrices with reduced ranks, which are generated based on an initial influence matrix obtained from a training set. The method adopts Nonnegative Matrix Factorization (NMF) to reduce the influence matrix dimension and thus to extract the latent features. To validate and demonstrate its ability, about 160 university students voluntarily participated in a mobile application usage monitoring experiment. An empirical study on real collected data reveals that the influencing structure consisted of six influencing groups with two types of mutual influence, i.e. intra-group influence and inter-group influence. The results also highlight the importance of sparseness control on NMF for discovering latent influencing groups. The obtained influencing structure provides better predictive performance than state-of-the-art collaborative filtering methods as well as conventional methods such as user-based collaborative filtering techniques and simple popularity.
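The rank-reduction step in the abstract above can be illustrated with plain NMF. The sketch below uses the standard Lee-Seung multiplicative updates for the squared-error objective; it is not the authors' implementation, it omits the sparseness control that the abstract highlights, and the names `nmf`, `matmul`, `transpose`, and `squared_error` are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix V by W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [squared_error(v, w, h)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(squared_error(v, w, h))
    return w, h, errors
```

For a small block-structured influence matrix such as `[[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]`, calling `nmf(v, rank=2)` returns nonnegative factors whose rows and columns can be read as group memberships; choosing the rank by cross-validated predictive performance, as the abstract describes, would wrap this call in an outer loop.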
An instrument that can efficiently measure individual competency of IT applications (ICITA) is presented. It allows an organization to develop and manage the IT application capability of individuals working in an enterprise IT environment. The measurement items are generated from the definition and major components of individual competency of IT applications. The reliability and validity of the instrument construct are verified by factor and correlation analysis. A 15-item instrument is proposed to efficiently measure individual competency of IT applications and the instrument will contribute to the improved ICITA of human resources working in an enterprise IT environment.
Chul Bum KIM Doo Hyung WOO Hee Chul LEE
This paper presents a novel CMOS readout circuit for satellite infrared time delay and integration (TDI) arrays. An integrate-while-read method is adopted, and a dead-pixel-elimination circuit for solving a critical problem of the TDI scheme is integrated within a chip. In addition, an adaptive charge capacity control method is proposed to improve the signal-to-noise ratio (SNR) for low-temperature targets. The readout circuit was fabricated with a 0.35-µm CMOS process for a 500 × 4 mid-wavelength infrared (MWIR) HgCdTe detector array. Using the circuit, a 90% background-limited infrared photodetection (BLIP) is satisfied over a wide input range (∼200–330 K), and the SNR is improved by 11 dB for the target temperature of 200 K.
Daisuke KANEMOTO Toru IDO Kenji TANIGUCHI
A low-power, high-performance third-order delta-sigma modulator for audio applications, fabricated in a 0.18 µm CMOS process, is presented. The modulator achieves third-order noise shaping with only one opamp by using an opamp sharing technique. The opamp is shared among the three integrator stages through optimal operation timing, which makes use of the load capacitance differences between the three integrator stages. The designed modulator achieves a 101.1 dB signal-to-noise ratio (A-weighted) and a 101.5 dB dynamic range (A-weighted) with 7.5 mW power consumption from a 3.3 V supply. The die area is 1.27 mm². The fabricated delta-sigma modulator achieves the highest figure-of-merit among published high-performance low-power audio delta-sigma modulators.
Bo LIU Peng CAO Min ZHU Jun YANG Leibo LIU Shaojun WEI Longxing SHI
This paper presents a novel architecture design to optimize the reconfiguration process of a coarse-grained reconfigurable architecture (CGRA) called Reconfigurable Multimedia System II (REMUS-II). In REMUS-II, the tasks in multimedia applications are divided into two parts: computing-intensive tasks and control-intensive tasks. REMUS-II contains two Reconfigurable Processor Units (RPUs) for accelerating computing-intensive tasks and a Micro-Processor Unit (µPU) for accelerating control-intensive tasks. As a large-scale CGRA, REMUS-II can provide satisfactory solutions in terms of both efficiency and flexibility. This feature makes REMUS-II well suited for video processing, which demands high flexibility and involves a large number of computation tasks. To meet the high dynamic reconfiguration performance required by multimedia applications, the reconfiguration architecture of REMUS-II must be well designed. To optimize it, a hierarchical configuration storage structure and a 3-stage reconfiguration processing structure are proposed. Furthermore, several optimization methods for configuration reuse are introduced to further improve the performance of the reconfiguration process. These methods cover two aspects: the multi-target reconfiguration method and the configuration caching strategies. Experimental results show that the proposed reconfiguration architecture improves the performance of the reconfiguration process by a factor of 4. Based on RTL simulation, REMUS-II can support 1080p@32 fps H.264 HiP@Level4 and 1080p@40 fps High-level MPEG-2 stream decoding at a clock frequency of 200 MHz. The proposed REMUS-II system has been implemented in a TSMC 65 nm process. The die size is 23.7 mm² and the estimated on-chip dynamic power is 620 mW.
Jianfeng XU Koichi TAKAGI Shigeyuki SAKAZAWA
This paper presents a system for automatic generation of dancing animation that is synchronized with a piece of music by re-using motion capture data. The dancing motion is synthesized according to the rhythm and intensity features of the music. For this purpose, we propose a novel meta motion graph structure that embeds the necessary features, including both rhythm and intensity, and is constructed on the motion capture database beforehand. We consider two scenarios, non-streaming music and streaming music, which require global search and local search respectively. In the former case, once a piece of music is input, an efficient dynamic programming algorithm can be employed to globally search for a best path in the meta motion graph, where the objective function measures the quality of beat synchronization, intensity matching, and motion smoothness. In the latter case, the input music is stored in a buffer in a streaming mode, and an efficient search method is applied to a certain amount of music data in the buffer (called a segment) with the same objective function, resulting in a segment-based search approach. For streaming applications, we define an additional property in the meta motion graph to deal with the unpredictable future music, which guarantees that there is some motion to match the unknown remaining music. A user study with a total of 60 subjects demonstrates that our system outperforms the state-of-the-art techniques in both scenarios. Furthermore, our system greatly improves the synthesis speed (the maximal speedup is more than 500 times), which is essential for mobile applications. We have implemented our system on commercially available smartphones and confirmed that it works well on these devices.
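The global search in the non-streaming scenario above can be sketched as a Viterbi-style dynamic program over a small graph. Everything concrete here is an assumption made for illustration: the function `best_path`, the use of one intensity value per graph node and per music segment, and a cost made of an intensity mismatch plus a transition (smoothness) penalty. Beat synchronization and the meta motion graph itself are not modeled.

```python
def best_path(music_intensity, node_intensity, edges, smooth_cost):
    """Find the cheapest node sequence, one node per music segment.

    music_intensity -- list of target intensities, one per segment
    node_intensity  -- dict: node -> intensity of that motion node
    edges           -- dict: node -> iterable of allowed successor nodes
    smooth_cost     -- dict: (u, v) -> penalty for the transition u -> v
    """
    nodes = list(node_intensity)
    inf = float("inf")
    # Cost of starting in each node for the first segment.
    cost = {n: abs(node_intensity[n] - music_intensity[0]) for n in nodes}
    back = []  # one dict of back-pointers per later segment
    for target in music_intensity[1:]:
        new_cost, pointers = {}, {}
        for v in nodes:
            best_u, best = None, inf
            for u in nodes:
                if v in edges.get(u, ()):
                    c = cost[u] + smooth_cost.get((u, v), 0.0)
                    if c < best:
                        best_u, best = u, c
            new_cost[v] = best + abs(node_intensity[v] - target)
            pointers[v] = best_u
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest final node back to the first segment.
    end = min(cost, key=cost.get)
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], cost[end]
```

The running time is proportional to the number of segments times the square of the number of nodes, which is what makes an exhaustive global search over all paths unnecessary.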
Mingfu XUE Aiqun HU Chunlong HE
We propose a new security model based on the MLS policy to achieve better security in terms of confidentiality, integrity, and availability. First, it combines the BLP model and the Biba model through a two-dimensional independent adjustment of integrity and confidentiality, and the subject's access range is adjusted dynamically according to the security labels of related objects and the subject's access history. Second, the security level of a trusted subject is extended to separate write and read privilege ranges, following the principle of least privilege. Third, it adjusts the objects' security levels after confidential information is added, to prevent information disclosure. Fourth, it uses application-oriented logic to protect specific applications and avoid the degradation of security levels, so that certain applications can operate smoothly. Lastly, examples are presented to show the effectiveness and usability of the proposed model.
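As background for the combination described above, the textbook BLP and Biba rules can be checked independently on a two-dimensional label. The sketch below shows only these static baseline rules with hypothetical names (`Label`, `can_read`, `can_write`); it does not implement the dynamic adjustment, trusted-subject ranges, or application-oriented logic that the proposed model adds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    confidentiality: int  # higher value means more secret
    integrity: int        # higher value means more trustworthy


def can_read(subject, obj):
    """BLP simple security (no read up) and Biba simple integrity (no read down)."""
    return (subject.confidentiality >= obj.confidentiality
            and subject.integrity <= obj.integrity)


def can_write(subject, obj):
    """BLP *-property (no write down) and Biba *-integrity (no write up)."""
    return (subject.confidentiality <= obj.confidentiality
            and subject.integrity >= obj.integrity)
```

Applying both rule sets strictly leaves a subject very little it can both read and write, which is one reason models of this kind adjust access ranges dynamically.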
Nobuharu KAMI Teruyuki BABA Takashi YOSHIKAWA Hiroyuki MORIKAWA
We study the properties of information dissemination over location-aware gossiping networks leveraging location-based real-time communication applications. Gossiping is a promising method for quickly disseminating messages in a large-scale system, but in its application to information dissemination for location-aware applications, it is important to consider the network topology and patterns of spatial dissemination over the network in order to achieve effective delivery of messages to potentially interested users. To this end, we propose a continuous-space network model extended from Kleinberg's small-world model applicable to actual location-based applications. Analytical and simulation-based study shows that the proposed network achieves high dissemination efficiency resulting from geographically neutral dissemination patterns as well as selective dissemination to proximate users. We have designed a highly scalable location management method capable of promptly updating the network topology in response to node movement and have implemented a distributed simulator to perform dynamic target pursuit experiments as one example of applications that are the most sensitive to message forwarding delay. The experimental results show that the proposed network surpasses other types of networks in pursuit efficiency and achieves the desirable dissemination patterns.
Shouyi YIN Yang HU Zhen ZHANG Leibo LIU Shaojun WEI
Hybrid wired/wireless on-chip network is a promising communication architecture for multi-/many-core SoC. For application-specific SoC design, it is important to design a dedicated on-chip network architecture according to the application-specific nature. In this paper, we propose a heuristic wireless link allocation algorithm for creating hybrid on-chip network architecture. The algorithm can eliminate the performance bottleneck by replacing multi-hop wired paths by high-bandwidth single-hop long-range wireless links. The simulation results show that the hybrid on-chip network designed by our algorithm improves the performance in terms of both communication delay and energy consumption significantly.
Xinhai XU Xuejun YANG Yufei LIN
As supercomputers increase in size, the mean time between failures (MTBF) of a system becomes shorter, and the reliability problem of supercomputers becomes more and more serious. MPI is currently the de facto standard for building high-performance applications, and fault-tolerance methods for MPI remain an active research topic. However, due to the characteristics of MPI programs, most current checkpointing methods for MPI programs need to modify the MPI library (or even the operating system), or implement a complicated protocol that logs many messages. In this paper, we carry forward the idea of Application-Level Checkpointing (ALC). Based on the general fact that programmers are familiar with the communication characteristics of their applications, we have developed BC-ALC, a new portable blocking coordinated ALC for MPI programs. BC-ALC neither modifies the MPI library (or the operating system) nor logs any message. It implements coordination using only Barrier operations instead of a complicated protocol. Furthermore, to reduce the cost of fault tolerance, we reduce the synchronization range of the barrier and design WBC-ALC, a weak blocking coordinated ALC that uses group synchronization instead of global synchronization, based on the communication relationship between processes. We also propose a fault-tolerance framework developed on top of WBC-ALC and discuss an implementation of it. Experimental results on NPB3.3-MPI benchmarks validate BC-ALC and WBC-ALC, and show that, compared with BC-ALC, the average coordination time and the average backup time of a single checkpoint in WBC-ALC are reduced by 44.5% and 5.7% respectively.
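The idea of coordinating checkpoints with barriers alone can be illustrated with threads standing in for MPI processes. This is a sketch of blocking coordinated checkpointing in general, not of BC-ALC or WBC-ALC themselves: the function `run_with_checkpoints`, the in-memory checkpoint lists, and the dummy computation are hypothetical, and message passing between processes is not modeled.

```python
import copy
import threading


def run_with_checkpoints(num_workers=4, steps=6, checkpoint_every=3):
    """Run workers in lock-step checkpoints and return the saved states."""
    barrier = threading.Barrier(num_workers)
    checkpoints = [[] for _ in range(num_workers)]  # one list per worker

    def worker(rank):
        state = {"rank": rank, "step": 0, "value": rank}
        for step in range(1, steps + 1):
            state["step"] = step
            state["value"] += rank + 1  # stand-in for real computation
            if step % checkpoint_every == 0:
                barrier.wait()  # every worker has reached this step
                checkpoints[rank].append(copy.deepcopy(state))
                barrier.wait()  # nobody resumes until all have saved

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checkpoints
```

Because all workers save their state between the same pair of barriers, the set of saved states forms a consistent snapshot in this model. A group-based variant in the spirit of the weaker scheme would give each communicating group its own barrier instead of one global barrier.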
Souheil BEN AYED Fumio TERAOKA
The evolution of the Internet, the growth in the number of Internet users, and newly enabled technological capabilities place new requirements on the Future Internet. Building a better Internet calls for many feature improvements and raises many challenges, including secure roaming of data and services across multiple administrative domains. In this research, we propose a multi-domain access control infrastructure to authenticate and authorize roaming users through the use of the Diameter protocol and EAP. The Diameter protocol is an AAA protocol that solves the problems of previous AAA protocols such as RADIUS. The Diameter EAP Application is a Diameter application that extends the Diameter Base Protocol to support authentication using EAP. The contributions of this paper are: 1) the first implementation of the Diameter EAP Application, called DiamEAP, capable of practical authentication and authorization services in a multi-domain environment, 2) an extensible design that allows any new EAP method to be added as a loadable plugin without modifying the main part, and 3) the provision of an EAP-TLS plugin, EAP-TLS being one of the most secure EAP methods. The basic performance of the DiamEAP server was evaluated and tested in a real multi-domain environment where 200 users attempted to access the network using the EAP-TLS method during a 4-day event. In the evaluation, the processing time of DiamEAP using the EAP-TLS plugin for authentication of 10 requests is about 20 ms, while that for 400 requests/second is about 1.9 seconds. Evaluation and operation results show that DiamEAP is scalable and stable, with the ability to handle more than 600 authentication requests per second without any crashes. DiamEAP is supported by the AAA working group of the WIDE Project.
Xinning LIU Chen MEI Peng CAO Min ZHU Longxing SHI
This paper proposes a novel sub-architecture to optimize the data flow of REMUS-II (REconfigurable MUltimedia System 2), a dynamically reconfigurable coarse-grained architecture. REMUS-II consists of a µPU (Micro-Processor Unit) and two RPUs (Reconfigurable Processor Units), which are used to speed up control-intensive tasks and data-intensive tasks respectively. The parallel computing capability and flexibility of REMUS-II make it an excellent candidate for processing multimedia applications, which require a large number of memory accesses. In this paper, we specifically optimize the data flow to deal with these performance-limiting and energy-hungry memory accesses in order to meet the bandwidth requirement of parallel computing. The RPU internal memory can work in multiple modes, such as 2D-access mode and transformation mode, according to different multimedia access patterns. This novel design can improve the performance by up to 26% compared to traditional on-chip memory. Meanwhile, a block buffer is implemented to optimize the off-chip data flow by reducing off-chip memory accesses, by up to 43% compared to direct DDR access. Based on RTL simulation, REMUS-II can achieve 1080p@30 fps for H.264 High Profile@Level 4 and High Level MPEG2 at a 200 MHz clock frequency. REMUS-II is implemented in 23.7 mm² of silicon on a TSMC 65 nm logic process with a 400 MHz maximum working frequency.