1-3hit |
This research addresses a high-speed computation method for the Kleene star of the weighted adjacency matrix in a max-plus algebraic system. We focus on systems whose precedence constraints are represented by a directed acyclic graph and implement it on a Cell Broadband EngineTM (CBE) processor. Since the resulting matrix gives the longest travel times between two adjacent nodes, it is often utilized in scheduling problem solvers for a class of discrete event systems. This research, in particular, attempts to achieve a speedup by using two approaches: parallelization and SIMDization (Single Instruction, Multiple Data), both of which can be accomplished by a CBE processor. The former refers to a parallel computation using multiple cores, while the latter is a method whereby multiple elements are computed by a single instruction. Using the implementation on a Sony PlayStation 3TM equipped with a CBE processor, we found that the SIMDization is effective regardless of the system's size and the number of processor cores used. We also found that the scalability of using multiple cores is remarkable especially for systems with a large number of nodes. In a numerical experiment where the number of nodes is 2000, we achieved a speedup of 20 times compared with the method without the above techniques.
This paper presents effective and efficient implementation techniques for DMA buffer overflow elimination on the Cell Broadband EngineTM (Cell/B.E.) processor. In the Cell/B.E. programming model, application developers manually issue DMA commands to transfer data from the system memory to the local memories of the Cell/B.E. cores. Although this allows us to eliminate cache misses or cache invalidation overhead, it requires careful management of the buffer arrays for DMA in the application programs to prevent DMA buffer overflows. To guard against DMA buffer overflows, we introduced safe DMA handling functions for the applications to use. To improve and minimize the performance overhead of buffer overflow prevention, we used three different optimization techniques that take advantage of SIMD operations: branch-hint-based optimizations, jump-table-based optimizations and self-modifying-based optimizations. Our optimized implementation prevents all DMA buffer overflows with minimal performance overhead, only 2.93% average slowdown in comparison to code without the buffer overflow protection.
This paper presents a method for markerless human motion capture using a single camera. It uses tree-based filtering to efficiently propagate a probability distribution over poses of a 3D body model. The pose vectors and associated shapes are arranged in a tree, which is constructed by hierarchical pairwise clustering, in order to efficiently evaluate the likelihood in each frame. A new likelihood function based on silhouette matching is proposed that improves the pose estimation of thinner body parts, i.e. the limbs. The dynamic model takes self-occlusion into account by increasing the variance of occluded body-parts, thus allowing for recovery when the body part reappears. We present two applications of our method that work in real-time on a Cell Broadband EngineTM: a computer game and a virtual clothing application.