Chengxiang YIN Hongjun ZHANG Rui ZHANG Zilin ZENG Xiuli QI Yuntian FENG
The main idea of filter methods in feature selection is to construct a feature-assessing criterion and search for the feature subset that optimizes it. The primary principle in designing such a criterion is to capture the relevance between a feature subset and the class as precisely as possible. Computing this relevance directly is difficult because the computational complexity grows with the size of the feature subset. As a result, researchers adopt approximate strategies to measure relevance. Though these strategies work well in some applications, they suffer from three problems: parameter determination, the neglect of feature interaction information, and overestimation of some features. We propose a new feature selection algorithm that computes the mutual information between a feature subset and the class directly, without worsening the computational complexity, based on the computation of partitions. In light of the specific properties of mutual information and partitions, we propose a pruning rule and a stopping criterion to accelerate the search. To evaluate the effectiveness of the proposed algorithm, we compare it with five other algorithms in terms of the number of selected features and the classification accuracies on three classifiers. The results on six synthetic datasets show that our algorithm performs well in capturing interaction information. The results on thirteen real-world datasets show that our algorithm selects fewer yet better features.
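As a rough illustration of computing mutual information between a feature subset and the class from partitions, the sketch below groups samples into the blocks induced by the subset's values and evaluates I(S; C) from block counts. It assumes discrete features; the function names and the XOR toy data are hypothetical and are not taken from the paper, and the pruning rule and stopping criterion are not modeled.

```python
from collections import Counter
from math import log2


def partition_labels(rows, subset):
    """Label each sample with the block of the partition induced by `subset`."""
    return [tuple(row[i] for i in subset) for row in rows]


def mutual_information(rows, subset, classes):
    """I(S; C) in bits, estimated from counts of partition blocks and classes."""
    n = len(rows)
    blocks = partition_labels(rows, subset)
    joint = Counter(zip(blocks, classes))
    block_counts = Counter(blocks)
    class_counts = Counter(classes)
    mi = 0.0
    for (block, cls), n_bc in joint.items():
        mi += (n_bc / n) * log2(n_bc * n / (block_counts[block] * class_counts[cls]))
    return mi


# Toy XOR data (hypothetical): the class depends only on the pair of features.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
classes = [0, 1, 1, 0]
single = mutual_information(rows, [0], classes)
pair = mutual_information(rows, [0, 1], classes)
```

On this toy data a single feature carries 0 bits about the class while the pair carries 1 bit, which is the kind of interaction information that per-feature approximations miss.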
Kunihiro FUJIYOSHI Takahisa IMANO
A photo diode array (PDA) is the key semiconductor component in photo couplers and photo sensors, expected to produce a specified output voltage when the light is on. The PDA partitioning problem, which is to design a PDA, is as follows: given a die area and anode and cathode points, divide the area into N cells of identical area, connected in series from anode to cathode. In this paper, we first impose restrictions on the problem and reveal the underlying properties of the necessary and sufficient conditions for the existence of solutions when the restrictions are satisfied. Then, we propose a method to solve the problem using a recursive algorithm, which is guaranteed to obtain a solution in polynomial time.
Human activity prediction has become a prerequisite for service recommendation and anomaly detection systems in a smart space, including ambient assisted living (AAL) and activities of daily living (ADL). In this paper, we present a novel approach to predict the next-activity set in a multi-user smart space. Unlike the majority of previous studies, which consider single-user activity patterns, our study considers multi-user activities that occur with a large variety of patterns. Their complexity increases exponentially with the number of users. In a multi-user smart space, there are inevitably multiple next-activity candidates after multi-user activities occur. To solve the next-activity problem in a multi-user situation, we propose activity set prediction rather than single-activity prediction. We also propose activity sequence partitioning to reduce the complexity of the multi-user activity pattern. This divides an activity sequence into start, ongoing, and finish zones based on the tendency of activity occurrences. The majority of the activities in a multi-user environment occur at the beginning or end, rather than the middle, of an activity sequence. Furthermore, the types of activities typically occurring in each zone are sufficiently distinguishable. Exploiting these characteristics, we suggest a two-step procedure to predict the next-activity set utilizing a long short-term memory (LSTM) model. The first step identifies the zones to which current activities belong. In the next step, we construct three different LSTM models to predict the next-activity set in each zone. To evaluate the proposed approach, we experimented using a real dataset generated from our campus testbed. Our experiments confirmed the complexity reduction and high accuracy in the next-activity set prediction. Thus, the approach can be effectively utilized for various context-aware applications in a multi-user smart space.
This letter proposes a heuristic algorithm to select check variables, which are points of comparison for error detection, for soft-error tolerant datapaths. Our soft-error tolerance scheme is based on check-and-retry computation and an efficient resource management technique named speculative resource sharing (SRS). Starting with the smallest set of check variables, the proposed algorithm repeatedly adds new check variables one by one and finds the minimum-latency solution among the series of generated solutions. During the process, each new check variable is selected so that the opportunity for SRS is enlarged. Experimental results show that improvements in latency are achieved compared with the choice of the smallest set of check variables.
Koki ITO Kazushi KAWAMURA Yutaka TAMIYA Masao YANAGISAWA Nozomu TOGAWA
As seen in stream data processing, it is necessary to extract a particular data field from bulk data, for which we can use a field-data extractor. In particular, an (M,N)-field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input/output using multiplexers (MUXs). However, the number of required MUXs grows rapidly as the input/output byte widths increase. It is known that partitioning a MUX network reduces the number of MUXs. In this paper, we first consider a multi-layered MUX network, which is generated by repeatedly partitioning a MUX network into a collection of single-layered MUX networks. We show that the multi-layered MUX network is equivalent to a barrel shifter from which redundant MUXs and wires are removed, and we prove that its number of required MUXs is the smallest among MUX-network-partitioning based field-data extractors. Next, we propose a rotator-based MUX network for a field-data extractor, which is based on reading out particular data in an input register to a rotator. The byte width of the rotator is the same as that of its output register, and hence we no longer require any extra wires or MUXs. By rotating the input data appropriately, we finally obtain correctly ordered data in an output register. Experimental results show that a multi-layered MUX network reduces the number of gates required to construct a field-data extractor by up to 97.0% compared with a naive approach, and its delay is 1.8ns-2.3ns. A rotator-based MUX network with a control circuit also reduces the number of required gates by up to 97.3% compared with a naive approach, and its delay is 2.1ns-2.9ns.
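To make the extractor's behavior concrete, the following is a minimal software model, assuming a plain barrel-shifter structure in which layer k shifts the register by 2^k bytes when bit k of the offset is set. It models only the input/output behavior of an (M,N)-field-data extractor, not the gate-level network with redundant MUXs removed or the rotator-based design; the function names are hypothetical.

```python
def extract_naive(register, offset, n):
    """Reference behavior: read n consecutive bytes starting at `offset`."""
    return list(register[offset:offset + n])


def extract_layered(register, offset, n):
    """Barrel-shifter model: layer k shifts left by 2**k bytes if bit k of offset is 1."""
    data = list(register)
    k = 0
    while (1 << k) < len(data):
        if (offset >> k) & 1:
            shift = 1 << k
            # Shifted-out bytes are dropped; zero bytes fill the tail.
            data = data[shift:] + [0] * shift
        k += 1
    # After all layers, the requested field sits at the front of the register.
    return data[:n]


# An (8,2) extractor reading the field that starts at byte 3.
register = [10, 11, 12, 13, 14, 15, 16, 17]
field = extract_layered(register, 3, 2)
```

The model uses about log2(M) layers, each controlled by one bit of the offset, which is why a layered network needs far fewer selectors than wiring every input byte to every output byte. It assumes offset + n does not exceed M.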
This letter proposes a Non-uniform Cell-based Index (NCI) to enable clients to quickly process window queries in the wireless spatial data broadcast environment. To improve the access time, NCI reduces the probe wait time by equalizing the spacing between indexes, using non-uniformly partitioned cells of the data space. Through performance evaluation, we show that the proposed NCI outperforms the existing index schemes for window queries on spatial data with respect to access time.
In this paper, we present two classes of zero difference balanced (ZDB) functions, which are derived from difference balanced functions and a class of perfect ternary sequences, respectively. The proposed functions have parameters not covered in the literature, and can be used to design optimal constant composition codes and perfect difference systems of sets.
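For readers unfamiliar with the ZDB property, the sketch below checks it by brute force for a function on the cyclic group Z_n, using the standard definition: f is ZDB if the number of x with f(x + a) = f(x) is the same constant λ for every nonzero a. The helper name `zdb_lambda` and the example functions are illustrative assumptions and are not the constructions proposed in the paper.

```python
def zdb_lambda(f, n):
    """Return lambda if f on Z_n is zero difference balanced, otherwise None."""
    counts = {
        sum(1 for x in range(n) if f((x + a) % n) == f(x))
        for a in range(1, n)
    }
    # ZDB means every nonzero shift a yields the same number of solutions.
    return counts.pop() if len(counts) == 1 else None


# Squaring modulo an odd prime: (x + a)^2 = x^2 has exactly one solution x.
square_mod_7 = zdb_lambda(lambda x: (x * x) % 7, 7)

# Parity on Z_6 is not ZDB: shift 1 gives 0 solutions, shift 2 gives 6.
parity_mod_6 = zdb_lambda(lambda x: x % 2, 6)
```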
Shanqi PANG Ying WANG Jiao DU Wenju XU
Orthogonal arrays and orthogonal partitions have great significance in communications and coding theory. In this letter, by using a generalized orthogonal partition, Latin squares and orthogonal Latin squares, we present an iterative construction method of orthogonal arrays of strength t and orthogonal partitions. As an application of the method, more orthogonal arrays of strength t and orthogonal partitions can be constructed than with existing methods.
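As background for the link between orthogonal Latin squares and orthogonal arrays, the sketch below uses the classical textbook construction (not the iterative method of the letter): k mutually orthogonal Latin squares of order n yield an array with n^2 rows and k + 2 columns of strength 2. The function names and the order-3 example are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations, product


def oa_from_latin_squares(squares):
    """Rows (r, c, L1[r][c], L2[r][c], ...) built from Latin squares of order n."""
    n = len(squares[0])
    return [(r, c) + tuple(sq[r][c] for sq in squares)
            for r, c in product(range(n), repeat=2)]


def has_strength(rows, levels, t):
    """True if every t columns contain each t-tuple of levels equally often."""
    total = levels ** t
    if len(rows) % total != 0:
        return False
    expected = len(rows) // total
    for cols in combinations(range(len(rows[0])), t):
        counts = Counter(tuple(row[c] for c in cols) for row in rows)
        if len(counts) != total or any(v != expected for v in counts.values()):
            return False
    return True


# Two orthogonal Latin squares of order 3 over Z_3.
n = 3
square_a = [[(r + c) % n for c in range(n)] for r in range(n)]
square_b = [[(r + 2 * c) % n for c in range(n)] for r in range(n)]
array = oa_from_latin_squares([square_a, square_b])
```

The resulting 9-row, 4-column array has strength 2 but not strength 3, since 9 rows cannot cover all 27 triples of levels.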
Bin TANG Jianxin LUO Guiqiang NI Weiwei DUAN Yi GAO
This letter proposes a Light Space Partitioned Shadow Maps (LSPSMs) algorithm, which implements shadow rendering based on a novel partitioning scheme in light space. Instead of splitting the view frustum like traditional Z-partitioning methods, we split partitions from the projection of the refined view frustum in light space. The partitioning scheme is performed dual-directionally while limiting the wasted space. The number of partitions is determined dynamically according to the light and view directions. Experiments demonstrate that high-quality shadows can be rendered with high efficiency by our algorithm.
Lixin WANG Yutong LU Wei ZHANG Yan LEI
File system workloads are increasingly write-heavy. The growing capacity of RAM in modern nodes allows many reads to be satisfied from memory, while writes must be persisted to disk. Today's sophisticated local file systems like Ext4, XFS and Btrfs optimize for reads but suffer under workloads dominated by microdata (including metadata and tiny files). In this paper we present an LSM-tree-based file system, RFS, which aims to take advantage of the write optimization of the LSM-tree to provide enhanced microdata performance, while offering matching performance for large files. RFS incrementally partitions the namespace into several metadata columns on a per-directory basis, preserving disk locality for directories and reducing the write amplification of LSM-trees. A write-ordered log-structured layout is used to store small files efficiently, rather than embedding the contents of small files into inodes. We also propose an optimization of global bloom filters for efficient point lookups. Experiments show that our library version of RFS can handle microwrite-intensive workloads 2-10 times faster than existing solutions such as Ext4, Btrfs and XFS.
This letter proposes an Index based on Irregular Partition of data identifiers (IIP), to enable clients to quickly access multiple data items on a wireless broadcast channel. IIP improves the access time by reducing the index waiting time when clients access multiple data items, through the use of irregular partitioning of the identifier space of data items. Our performance evaluation shows that with respect to access time, the proposed IIP outperforms the existing index schemes supporting multiple data access.
Dun CAO Zhengbao LEI Baofeng JI Chunguo LI
We propose an exponent-based partitioning broadcast protocol (EPBP) to ensure the prompt dissemination of emergency messages (EMs) in vehicular networks. EPBP iteratively divides the communication range into segments of different widths. The width varies according to an exponential curve. The design makes the farther non-empty segment thinner, as a result of which the collision rate of candidates contending to be the relay node decreases and the one-hop message progress increases efficiently. In addition, we adjust the interval of back-off timers to avoid the spurious forwarding problem, and develop more accurate analytical models of the performance. Our simulation verifies these models and shows a significant performance improvement of EPBP over state-of-the-art protocols. EM dissemination can be 55.94% faster in dense vehicular networks, and the packet delivery ratio rises above 99.99%.
Shu TAJIMA Yusuke KAMEDA Ichiro MATSUDA Susumu ITOH
This paper proposes an efficient lossless coding scheme for color video in RGB 4:4:4 format. For the R signal that is encoded before the other signals at each frame, we employ a block-adaptive prediction technique originally developed for monochrome video. The prediction technique used for the remaining G and B signals is extended to exploit inter-color correlations as well as inter- and intra-frame ones. In both cases, multiple predictors are adaptively selected on a block-by-block basis. For the purpose of designing a set of predictors well suited to the local properties of video signals, we also explore an appropriate setting for the spatiotemporal partitioning of a video volume.
Koki ITO Kazushi KAWAMURA Yutaka TAMIYA Masao YANAGISAWA Nozomu TOGAWA
An (M,N)-field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input/output using a multiplexer (MUX) network. It is used in packet analysis and/or stream data processing for video/audio data. In this letter, we propose an efficient MUX network for an (M,N)-field-data extractor. By bi-partitioning a simple MUX network into an upper one and a lower one, we can theoretically reduce the number of required MUXs without increasing the MUX network depth. Experimental results show that we can reduce the gate count by up to 92% compared to a naive approach.
Yung-Hao LAI Yang-Lang CHANG Jyh-Perng FANG Lena CHANG Hirokazu KOBAYASHI
Through-silicon vias (TSVs) allow the stacking of dies into multilayer structures, and solve connection problems between neighboring tiers for three-dimensional (3D) integrated circuit (IC) technology. Several studies have investigated placement and routing in 3D ICs, but few have focused on circuit partitioning for 3D stacking. However, with the scaling trend of CMOS technology, the influence of the area of I/O pads, power/ground (P/G) pads, and TSVs should not be neglected in 3D partitioning technology. In this paper, we propose an iterative layer-aware partitioning algorithm called EX-iLap, which takes into account the area of I/O pads, P/G pads, and TSVs for area balancing and minimization of inter-tier interconnections in a 3D structure. Minimizing the quantity of TSVs reduces the total silicon die area, which is the main source of recurring costs during fabrication. Furthermore, estimations of the number of TSVs and the total area are somewhat imprecise if P/G TSVs are not taken into account. Therefore, we calculate the power consumption of each cell and estimate the number of P/G TSVs at each layer. Experimental results show that, after considering the power of interconnections and pads, our algorithm can reduce area overhead by ~39% and area standard deviation by ~69%, while increasing the quantity of TSVs by only 12%, as compared to the algorithm without considering the power of interconnections and pads.
Mengmeng ZHANG Heng ZHANG Zhi LIU
The new-generation video standard, High Efficiency Video Coding (HEVC), shows significantly improved efficiency relative to the previous standard, H.264. However, the quadtree-structured coding units (CUs), which are adopted in HEVC to improve compression efficiency, cause high computational complexity. In this study, a novel fast algorithm is proposed for CU partition in intra coding to reduce the computational complexity. A rough minimum-depth prediction method for the largest CU and an early termination method for CU partition based on the total coding bits of the current CU are employed. Many approaches have been proposed to reduce the encoding complexity of HEVC, but these methods do not use the total coding bits of the current CU as the main basis for judging CU complexity. Compared with the reference software HM16.6, the proposed algorithm reduces encoding time by 45% on average, with an approximately 1.1% increase in Bjøntegaard delta bit rate and a negligible peak signal-to-noise ratio loss.
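The idea of early termination in a quadtree CU partition can be sketched as follows. This is only an illustration under stated assumptions: `bits_of` is a hypothetical callback that estimates the coding bits of a CU, `threshold` is a hypothetical per-depth bit threshold, and the rule "stop splitting when the bits fall at or below the threshold" stands in for the paper's actual criterion, which the abstract does not specify. The minimum-depth prediction step is not modeled.

```python
def partition_cu(x, y, size, depth, bits_of, threshold, min_size=8):
    """Return the leaf CUs (x, y, size) of a quadtree with bit-based early termination."""
    # Stop at the smallest CU size, or when the CU is cheap enough to code as-is.
    if size <= min_size or bits_of(x, y, size) <= threshold(depth):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition_cu(x + dx, y + dy, half, depth + 1,
                                   bits_of, threshold, min_size)
    return leaves


# Hypothetical cost model: only the top-left 32x32 region is expensive to code.
def toy_bits(x, y, size):
    return size * size if (x < 32 and y < 32) else 0


leaves = partition_cu(0, 0, 64, 0, toy_bits, lambda depth: 100)
```

With this toy cost model the three flat 32x32 quadrants terminate immediately, so only the detailed quadrant is explored down to 8x8 CUs, which is where the time saving of an early termination rule comes from.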
Jiasen HUANG Junyan REN Wei LI
Sparse Matrix-Vector Multiplication (SpMxV) is widely used in many high-performance computing applications, including information retrieval, medical imaging, and economic modeling. To eliminate the overhead of zero padding in SpMxV, prior works have focused on partitioning a sparse matrix into row vector sets (RVS's) or sub-matrices. However, performance was still degraded by the sparsity pattern of the sparse matrix. In this letter, we propose a heuristic, called recursive merging, which uses a greedy approach to recursively merge the row vectors of nonzeros in a matrix into RVS's, such that each included set is ensured to be a locally optimal solution. For ten uneven benchmark matrices from the University of Florida Sparse Matrix Collection, our proposed partitioning algorithm is always identified as the method with the highest mean density (over 96%) and the lowest average relative difference (below 0.07%) over computing powers.
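To show what zero-padding overhead and density mean for a row vector set, the sketch below assumes that the rows in a set are padded to the length of the longest row, so density is the fraction of stored slots holding nonzeros. The greedy grouping shown is a generic illustration with a hypothetical `min_density` parameter; it is not the recursive merging heuristic of the letter.

```python
def density(row_lengths):
    """Fraction of nonzero slots when all rows are padded to the longest row."""
    return sum(row_lengths) / (len(row_lengths) * max(row_lengths))


def greedy_row_sets(nnz_per_row, min_density=0.9):
    """Group row indices, longest rows first, keeping each set's density high."""
    order = sorted(range(len(nnz_per_row)),
                   key=lambda i: nnz_per_row[i], reverse=True)
    sets, current = [], []
    for i in order:
        if nnz_per_row[i] == 0:
            continue  # empty rows need no storage
        candidate = current + [i]
        if current and density([nnz_per_row[j] for j in candidate]) < min_density:
            # Adding this row would require too much padding: start a new set.
            sets.append(current)
            current = [i]
        else:
            current = candidate
    if current:
        sets.append(current)
    return sets


# Nonzeros per row for a small uneven matrix (hypothetical values).
groups = greedy_row_sets([10, 9, 2, 2, 1, 0])
```

Padding all five nonempty rows to length 10 would give a density of 24/50, whereas grouping rows of similar length keeps every set at or above the chosen threshold.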
Akihiro SUDA Hideki TAKASE Kazuyoshi TAKAGI Naofumi TAKAGI
We propose a method for synthesizing nested loops into parallelized circuits by integrating polyhedral optimization, a state-of-the-art technique in the field of software, into high-level synthesis. Our method constructs circuits equipped with multiple processing elements (PEs), using information generated by the polyhedral optimizing compiler. Since multiple PEs cannot concurrently access the off-chip RAM, a method for constructing on-chip buffers is also proposed. Our buffering method reduces off-chip RAM access conflicts and further enables burst accesses and data reuse. In our experiments, the buffered circuits generated by our method are on average 8.2 times and at most 26.5 times faster than the sequential non-buffered ones, when each of the parallelized circuits is configured with eight PEs.
Gaoxing CHEN Lei SUN Zhenyu LIU Takeshi IKENAGA
High efficiency video coding (HEVC) is a video compression standard that outperforms its predecessor H.264/AVC by doubling the compression efficiency. To enhance the intra prediction accuracy, HEVC uses 35 intra prediction modes in the prediction units (PUs), with partition sizes ranging from 4 × 4 to 64 × 64. However, the manifold prediction modes dramatically increase the encoding complexity. This paper proposes a fast mode- and depth-decision algorithm based on edge detection and reconfiguration to alleviate the large computational complexity of intra prediction with trivial degradation in accuracy. For mode decision, we propose pixel gradient statistics (PGS) and mode refinement (MR). PGS uses pixel gradient information to assist in selecting the prediction mode after rough mode decision (RMD). MR uses the neighboring mode information to select the best PU mode (BPM). For depth decision, we propose a partition reconfiguration algorithm to replace the original partitioning order with a more reasonable structure, by using the smoothness of the coding unit as a criterion in deciding the prediction depth. Smoothness detection is based on the PGS result. Experimental results show that the proposed method saves about 41.50% of the original processing time with little degradation in the coding gain (BD bitrate increased by 0.66% and BDPSNR decreased by 0.060dB).
Because dielectrics between active layers have low thermal conductivities, there is a demand to reduce the temperature increase in three-dimensional integrated circuits (3D ICs). This paper demonstrates that, in the design of 3D ICs, different layer assignments often lead to different temperature increases. Based on this observation, we are motivated to perform temperature-aware layer assignment. Our work includes two parts. Firstly, an integer linear programming (ILP) approach that guarantees a minimum temperature increase is proposed. Secondly, a polynomial-time heuristic algorithm that reduces the temperature increase is proposed. Compared with the previous work, which does not take the temperature increase into account, the experimental results show that both our ILP approach and our heuristic algorithm produce a significant reduction in the temperature increase with a very small area overhead.