Min CHOI Namgi KIM Seungryoul MAENG
In this paper, we describe a single system image (SSI) architecture for distributed systems. The SSI architecture is built from three components: single process space (SPS), process migration, and dynamic load balancing. These components attempt to share all available resources in the cluster among all executing processes, so that the distributed system operates like a single node with much more computing power. To this end, we first resolve broken pipe problems and bind errors on server sockets in process migration. Second, we realize SPS based on block process identifier (PID) allocation. Finally, we design and implement a dynamic load balancing scheme. The scheme exploits our novel metric, effective tasks, to distribute jobs effectively across a large distributed system. The experimental results show that these three components provide scalability, new functionality, and performance improvement in distributed systems.
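As a rough illustration of block PID allocation, the sketch below hands each node a disjoint block of PIDs so that PIDs stay unique across the cluster without per-process coordination. The abstract does not give the block size or the bookkeeping, so `BlockPidAllocator`, `grant_block`, `node_of`, and the block size of 1000 are all hypothetical.

```python
class BlockPidAllocator:
    """Hands out disjoint PID blocks to nodes (hypothetical sketch)."""

    def __init__(self, block_size=1000):  # block size is an assumed value
        self.block_size = block_size
        self.next_block = 0
        self.owner = {}  # block index -> node that was granted the block

    def grant_block(self, node):
        """Reserve the next free block for `node` and return its PID range."""
        block = self.next_block
        self.next_block += 1
        self.owner[block] = node
        start = block * self.block_size
        return range(start, start + self.block_size)

    def node_of(self, pid):
        """Return the node whose block contains `pid`."""
        return self.owner[pid // self.block_size]


alloc = BlockPidAllocator(block_size=1000)
a = alloc.grant_block("node-a")  # PIDs 0..999
b = alloc.grant_block("node-b")  # PIDs 1000..1999
```

Under these assumptions, the node that allocated a PID can be found by integer division alone, which is one plausible reason to allocate in blocks.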
YoungBae JANG SeungRyoul MAENG JungWan CHO
An active network has the advantage of being able to accept new protocols quickly and easily. The cluster-based active router can provide sufficient computing power for customized computations. In the router architecture, load balancing is achieved by the efficient distribution of packets. We present a packet distribution scheme according to estimated processing time.
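One simple way to distribute packets by estimated processing time is to send each packet to the node with the least estimated outstanding work. The sketch below is a minimal illustration under that assumption, not the scheme from the paper; `pick_node` and `distribute` are hypothetical names, and the backlog never drains because packet completion is not modeled.

```python
def pick_node(backlog, est_time):
    """Pick the node with the smallest estimated backlog and charge it."""
    node = min(range(len(backlog)), key=lambda i: backlog[i])
    backlog[node] += est_time
    return node


def distribute(est_times, num_nodes):
    """Assign packets, given their estimated processing times, to nodes."""
    backlog = [0.0] * num_nodes
    assignment = [pick_node(backlog, t) for t in est_times]
    return assignment, backlog


# One expensive packet followed by several cheap ones, on two nodes.
assignment, backlog = distribute([5, 1, 1, 1, 4], num_nodes=2)
```

Here the first node keeps only the expensive packet while the remaining packets go to the second node, whereas a round-robin distribution would ignore the cost difference.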
Modern microprocessors achieve high application performance at an acceptable level of power dissipation. In terms of the power-performance trade-off, the instruction window is particularly important. This is because enlarging the window size achieves high performance, but naive scaling of the conventional instruction window can severely increase complexity and power consumption. In this paper, we propose low-power instruction window techniques for contemporary microprocessors. First, the small reorder buffer (SROB) reduces power dissipation by deferred allocation and early release. The deferred allocation delays the SROB allocation of instructions until all their data dependencies are resolved. Then, the instructions are executed in program order and they are released faster from the SROB. This results in higher resource utilization and lower power consumption. Second, we replace a conventional issue queue with a direct lookup table (DLT) with an efficient tag translation technique. The translation scheme resolves the instruction dependency, especially for the case of one producer to multiple consumers. The efficiency of the translation scheme stems from the fact that the vast majority of instruction dependencies exist within a basic block. Experimental results show that our proposed design reduces the power consumption significantly for SPEC2000 benchmarks.
Eunji PAK Sang-Hoon KIM Jaehyuk HUH Seungryoul MAENG
Although shared caches allow the dynamic allocation of limited cache capacity among cores, traditional LRU replacement policies often cannot prevent negative interference among cores. To address the contention problem in shared caches, cache partitioning and application scheduling techniques have been extensively studied. Partitioning explicitly determines cache capacity for each core to maximize the overall throughput. On the other hand, application scheduling by operating systems groups the least interfering applications for each shared cache, when multiple shared caches exist in systems. Although application scheduling can mitigate the contention problem without any extra hardware support, its effect can be limited for some severe contentions. This paper proposes a low-cost solution based on application scheduling with a simple cache insertion control. Instead of using a full hardware-based cache partitioning mechanism, the proposed technique mostly relies on application scheduling. It selectively uses LRU insertion to the shared caches, which can be added with negligible hardware changes to current commercial processor designs. For the completeness of cache interference evaluation, this paper examines all possible mixes from a set of applications, instead of using just a few selected mixes. The evaluation shows that the proposed technique can mitigate the cache contention problem effectively, close to the ideal scheduling and partitioning.
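To make the effect of LRU insertion concrete, the sketch below models a single cache set in which a missing line is normally inserted at the MRU position, but can instead be inserted at the LRU position. This is a toy model, assuming a 4-way set and a streaming application with no reuse; `CacheSet` and `hits_after_streaming` are hypothetical names, and how the proposed technique selects which application uses LRU insertion is not modeled.

```python
class CacheSet:
    """One cache set kept as a recency list: index 0 is MRU, last is LRU."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []

    def access(self, tag, insert_at_lru=False):
        """Return True on a hit. On a miss, evict the LRU line if full."""
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.insert(0, tag)  # promote to MRU on a hit
            return True
        if len(self.lines) == self.ways:
            self.lines.pop()  # evict the LRU line
        if insert_at_lru:
            self.lines.append(tag)  # new line is the next eviction victim
        else:
            self.lines.insert(0, tag)  # conventional MRU insertion
        return False


def hits_after_streaming(insert_at_lru):
    """Count hits on a reused working set after a streaming burst."""
    s = CacheSet(ways=4)
    for tag in ("a", "b", "c"):  # working set of a cache-friendly app
        s.access(tag)
    for tag in ("s1", "s2", "s3", "s4"):  # streaming app, no reuse
        s.access(tag, insert_at_lru=insert_at_lru)
    return sum(s.access(tag) for tag in ("a", "b", "c"))
```

In this toy case, `hits_after_streaming(True)` keeps all three reused lines resident, while `hits_after_streaming(False)` lets the streaming lines evict them.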
EuiHoon JEONG Lillykutty JACOB SeungRyoul MAENG
In this paper, we propose a dynamic TDMA with priority-based request packet transmission scheme (D-TDMA/PRPTS), which applies a priority-based request packet transmission scheme instead of slotted ALOHA (S-ALOHA). D-TDMA/PRPTS can avoid collisions between voice request packets and data request packets and transmit voice request packets preferentially. This allows D-TDMA/PRPTS to enlarge the system capacity for voice users with SAD. We analyze voice packet dropping probability and channel utilization for voice traffic by using an appropriate Markov model. We also present simulation results to verify the analysis and to investigate data performance in the voice-data integrated scenario.
We propose an Edge-write architecture which performs eager update propagation for update requests for the corresponding secondary server, whereas it lazily propagates updates from other secondary servers. Our architecture resolves consistency problems caused by read/update decoupling in the conventional lazy update propagation-based system. It also improves overall scalability by alleviating the performance bottleneck at the primary server, at the cost of increased but bounded response time. Such relaxed consistency management enables a read request to choose whether to read the replicated data immediately or to refresh it. We use the age of a local data copy as the freshness factor so that a secondary server can make freshness control decisions independently. As a result, our freshness-controlled edge-write architecture benefits from adjusting the tradeoff between the response time and the correctness of data.
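The age-based freshness decision can be sketched as follows: a secondary server serves its local copy if the copy is young enough, and refreshes it otherwise. This is a minimal sketch under assumed names (`SecondaryCopy`, `max_age`, and the `refresh` callback are hypothetical); time is passed in explicitly so that the example is deterministic.

```python
class SecondaryCopy:
    """A replicated value held by a secondary server, with its fetch time."""

    def __init__(self, value, fetched_at):
        self.value = value
        self.fetched_at = fetched_at

    def read(self, now, max_age, refresh):
        """Serve the local copy unless it is older than `max_age`."""
        if now - self.fetched_at > max_age:
            self.value = refresh()  # pull a fresh value, e.g. from the primary
            self.fetched_at = now
        return self.value


copy = SecondaryCopy("v1", fetched_at=0)
fast = copy.read(now=5, max_age=10, refresh=lambda: "v2")    # young enough
fresh = copy.read(now=20, max_age=10, refresh=lambda: "v2")  # too old, refreshed
```

A smaller `max_age` favors correctness at the cost of response time, and a larger one does the opposite, which is the tradeoff the abstract describes.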
Ara KHIL Seungryoul MAENG Jung Wan CHO
The problem of non-preemptive scheduling of real-time periodic tasks with specified release times on a uniprocessor system is known to be NP-hard. In this paper we propose a new non-preemptive scheduling algorithm and a new static scheduling strategy which use the repetitiveness and the predictability of periodic tasks in order to improve the schedulability of real-time periodic tasks with specified release times. The proposed scheduling algorithm schedules periodic tasks by using a heuristic that precalculates whether scheduling the selected task would cause a task to miss a deadline when tasks are scheduled by the non-preemptive EDF algorithm. If so, it defers the scheduling of the selected task to avoid the precalculated deadline miss. Otherwise, it schedules the selected task in the same way as the non-preemptive EDF algorithm. Our scheduling algorithm can always find a feasible schedule for a set of periodic tasks with specified release times that is schedulable by the non-preemptive EDF algorithm. Our static scheduling strategy transforms the problem of non-preemptive scheduling for periodic tasks with specified release times into one with the same release time for all tasks. It suggests dividing the given problem into two subproblems, having a non-preemptive scheduling algorithm find two feasible subschedules for the two subproblems by forward or backward scheduling within specific time intervals, and then combining the two feasible subschedules into a complete feasible schedule for the given problem. We present the release times as a function of periods for efficient problem division. Finally, we show the schedulability improvements of our scheduling algorithm and scheduling strategy through simulation results.
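The sketch below shows only the baseline that the proposed algorithm builds on, non-preemptive EDF over jobs with release times, and not the deferral heuristic itself. The function name `np_edf`, the job tuple layout, and the two example jobs are assumptions made for illustration.

```python
def np_edf(jobs):
    """Non-preemptive EDF on one processor.

    jobs: list of (name, release, exec_time, deadline).
    Returns (schedule, missed), where schedule holds (name, start, end).
    """
    pending = list(jobs)
    t = 0
    schedule, missed = [], []
    while pending:
        ready = [j for j in pending if j[1] <= t]
        if not ready:
            t = min(j[1] for j in pending)  # idle until the next release
            continue
        job = min(ready, key=lambda j: j[3])  # earliest deadline first
        pending.remove(job)
        start = t
        t += job[2]  # run to completion, no preemption
        schedule.append((job[0], start, t))
        if t > job[3]:
            missed.append(job[0])
    return schedule, missed


# A is released first with a late deadline; B arrives at t=1 with a tight one.
schedule, missed = np_edf([("A", 0, 4, 10), ("B", 1, 2, 4)])
```

In this example non-preemptive EDF starts A at time 0 and B then misses its deadline, although idling until time 1, running B over [1, 3], and then A over [3, 7] would meet both deadlines. This is the kind of case in which deferring the selected task can help.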
Sangwon SEO Sangbae YUN Jaehong KIM Inkyo KIM Seongwook JIN Seungryoul MAENG
An increasing number of IoT devices are being introduced to the market in many industries, and the number of devices is expected to exceed billions in the near future. With this trend, many researchers have proposed new architectures to manage IoT devices, but the proposed architectures require a huge memory footprint and computation overhead to look up billions of devices. This paper proposes a hybrid hashing architecture called H-TLA to solve the problem from an architectural point of view, instead of modifying a hashing algorithm or designing a new one. We implemented a prototype system that shows about a 30% increase in performance while conserving uniformity. Therefore, we show an efficient architecture-level approach for addressing billions of devices.
Jinho SEOL Seongwook JIN Seungryoul MAENG
Even though cloud users want to keep their data on clouds secure, it is not easy to protect the data because cloud administrators could be malicious and the hypervisor could be compromised. To solve this problem, hardware-based memory isolation schemes have been proposed. However, the data in virtual storage are not protected by the memory isolation schemes, and thus a guest OS must encrypt the data. In this paper, we address the problems of the previous schemes and propose a hardware-based storage isolation scheme. The proposed scheme protects user data securely and improves performance.
Hyunku JEONG Seungryoul MAENG Youngsu CHAE
With the increasing deployment of mobile devices and the advent of broadband wireless access systems such as WiBro, mWiMAX, and HSDPA, an efficient IP mobility management protocol has become one of the most important technical issues for the successful deployment of broadband wireless data networking services. The IETF has proposed Mobile IPv6 (MIPv6) as the basic mobility management protocol for IPv6 networks. To enhance the performance of basic MIPv6, researchers have been actively working on the HMIPv6 and FMIPv6 protocols. In this paper, we propose a new mobility management protocol, HIMIPv6 (Highly Integrated MIPv6), which tightly integrates the hierarchical mobility management mechanism of HMIPv6 and the proactive handover support of FMIPv6 to enhance handover performance, especially in cellular networking environments with highly frequent handovers. We have performed an extensive simulation study using ns2, and the results show that the proposed HIMIPv6 outperforms FMIPv6 and HMIPv6. There is no packet loss or consequent service interruption caused by IP handover in HIMIPv6.
Jaegeuk KIM Jinho SEOL Seungryoul MAENG
This letter introduces a buffer management issue in designing SSDs for log-structured file systems (LFSs). We implemented a novel trace-driven SSD simulator in SystemC and simulated several SSD architectures with the NILFS2 trace. From the results, we give two major considerations related to buffer management as follows. (1) The write buffer is used as a buffer, not a cache, since all write requests are sequential in NILFS2. (2) For better performance, the main architectural factor is the bus bandwidth, but 332 MHz is enough. Instead, the read buffer plays a key role in performance improvement by caching data. Accordingly, an effective way to enhance SSDs is to design efficient read buffer management policies; one example is tracking the valid data zone in NILFS2, which can increase the data hit ratio in read buffers significantly.
Jongwoong HYUN Inbum JUNG Joonwon LEE Seungryoul MAENG
Recently, layer-4 (L4) switches have been widely used as load balancing front-end routers for Web server clusters. The typical L4 switch attempts to balance load among the servers by estimating load using the load metrics measured in the front-end and/or the servers. However, insufficient load metrics, measurement overhead, and feedback delay often cause misestimation of server load. This may incur significant dynamic load imbalance among the servers, particularly when the variation of requested content is high. In this paper, we propose a new content sniffer based load distribution strategy. By sniffing the requests being forwarded to the servers and by extracting load metrics from them, the L4 switch with our strategy estimates server load in a more timely and accurate manner without the help of back-end servers. Thus it can properly react to dynamic load imbalance among the servers under various workloads. Our experimental results demonstrate substantial performance improvements over other load balancing strategies used in the typical L4 switch.