Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER Mitchell A. THORNTON Theodore W. MANIKAS
In the optimization of decision diagrams, variable reordering approaches are often used to minimize the number of nodes. However, such approaches are less effective for analysis of multi-state systems given by monotone structure functions. Thus, in this paper, we propose algorithms to minimize the number of edges in an edge-valued multi-valued decision diagram (EVMDD) for fast analysis of multi-state systems. The proposed algorithms minimize the number of edges by grouping multi-valued variables into larger-valued variables. By grouping multi-valued variables, we can reduce the number of nodes as well. To show the effectiveness of the proposed algorithms, we compare the proposed algorithms with conventional optimization algorithms based on a variable reordering approach. Experimental results show that the proposed algorithms reduce the number of edges by up to 15% and the number of nodes by up to 47%, compared to the conventional ones. This results in a speed-up of the analysis of multi-state systems by about three times.
This paper shows a method to represent interval functions by using head-tail expressions. The head-tail expressions represent greater-than GT(X:A) functions, less-than LT(X:B) functions, and interval functions IN0(X:A,B) more efficiently than sum-of-products expressions. Let n be the number of bits to represent the largest value in the interval (A,B). This paper proves that a head-tail expression (HT) represents an interval function with at most n words in a ternary content addressable memory (TCAM) realization. It also shows the average numbers of factors to represent interval functions by HTs for up to n=16, which were obtained by a computer simulation. It also conjectures that, for sufficiently large n, the average number of factors to represent n-variable interval functions by HTs is at most 2/3n-5/9. Experimental results also show that, for n≥10, to represent interval functions, HTs require at least 20% fewer factors than MSOPs, on the average.
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER
Numerical function generators (NFGs) realize arithmetic functions, such as ex,sin(πx), and , in hardware. They are used in applications where high-speed is essential, such as in digital signal or graphics applications. We introduce the edge-valued binary decision diagram (EVBDD) as a means of reducing the delay and memory requirements in NFGs. We also introduce a recursive segmentation algorithm, which divides the domain of the function to be realized into segments, where the given function is realized as a polynomial. This design reduces the size of the multiplier needed and thus reduces delay. It is also shown that an adder can be replaced by a set of 2-input AND gates, further reducing delay. We compare our results to NFGs designed with multi-terminal BDDs (MTBDDs). We show that EVBDDs yield a design that has, on the average, only 39% of the memory and 58% of the delay of NFGs designed using MTBDDs.
Munehiro MATSUURA Tsutomu SASAO
A multiple-output function can be represented by a binary decision diagram for characteristic function (BDD_for_CF). This paper presents a method to represent multiple-output incompletely specified functions using BDD_for_CFs. An algorithm to reduce the widths of BDD_for_CFs is presented. This method is useful for decomposition of incompletely specified multiple-output functions. Experimental results for radix converters, adders, a multiplier, and lists of English words show that this method is useful for the synthesis of LUT cascades. An implementation of English words list by LUT cascades and an auxiliary memory is also shown.
Hafiz Md. HASAN BABU Tsutomu SASAO
This paper proposes a method to construct smaller binary decision diagrams for characteristic functions (BDDs for CFs). A BDD for CF represents an n-input m-output function, and evaluates all the outputs in O(n+m) time. We derive an upper bound on the number of nodes of the BDD for CF of n-bit adders (adrn). We also compare complexities of BDDs for CFs with those of shared binary decision diagrams (SBDDs) and multi-terminal binary decision diagrams (MTBDDs). Our experimental results show: 1) BDDs for CFs are usually much smaller than MTBDDs; 2) for adrn and for some benchmark circuits, BDDs for CFs are the smallest among the three types of BDDs; and 3) the proposed method often produces smaller BDDs for CFs than an existing method.
Shinobu NAGAYAMA Tsutomu SASAO
In this paper, we propose a compact representation of logic functions using Multi-valued Decision Diagrams (MDDs) called heterogeneous MDDs. In a heterogeneous MDD, each variable may take a different domain. By partitioning binary input variables and representing each partition as a single multi-valued variable, we can produce a heterogeneous MDD with 16% smaller memory size than a Reduced Ordered Binary Decision Diagram (ROBDD), and with comparable memory size to Free Binary Decision Diagrams (FBDDs). And also, heterogeneous MDDs have shorter Average Path Length (APL) than ROBDDs and FBDDs. We minimized a large number of benchmark functions to show the compactness of heterogeneous MDDs.
Tsutomu SASAO Takashi MATSUBARA Katsufumi TSUJI Yoshiaki KOGA
A universal interconnection network implements arbitrary interconnections among n terminals. This paper considers a problem to realize such a network using contact switches. When n=2, it can be implemented with a single switch. The number of different connections among n terminals is given by the Bell number B(n). The Bell number shows the total number of methods to partition n distinct elements. For n=2, 3, 4, 5 and 6, the corresponding Bell numbers are 2, 5, 15, 52, and 203, respectively. This paper shows a method to realize an n terminal universal interconnection network with $rac {3}{8}(n^2-1)$ contact switches when n=2m+1≥5, and $rac {n}{8}(3n+2)$ contact switches, when n=2m≥6. Also, it shows that a lower bound on the number of contact switches to realize an n-terminal universal interconnection network is ⌈log 2B(n)⌉, where B(n) is the Bell number.
Hiroki NAKAHARA Tsutomu SASAO Munehiro MATSUURA
This paper shows a design method for a sequential circuit by using a Look-Up Table (LUT) ring. The method consists of two steps: The first step partitions the outputs into groups. The second step realizes them by LUT cascades, and allocates the cells of the cascades into the memory. The system automatically finds a fast implementation by maximally utilizing available memory. With the presented algorithm, we can easily design sequential circuits satisfying given specifications. The paper also compares the LUT ring with logic simulator to realize sequential circuits: the LUT ring is 25 to 237 times faster than a logic simulator that uses the same amount of memory.
Hafiz Md. HASAN BABU Tsutomu SASAO
In this paper, we propose a method to minimize multiple-valued decision diagrams (MDDs) for multiple-output functions. We consider the following: (1) a heuristic for encoding the 2-valued inputs; and (2) a heuristic for ordering the multiple-valued input variables based on sampling, where each sample is a group of outputs. We first generate a 4-valued input 2-valued multiple-output function from the given 2-valued input 2-valued functions. Then, we construct an MDD for each sample and find a good variable ordering. Finally, we generate a variable ordering from the orderings of MDDs representing the samples, and minimize the entire MDDs. Experimental results show that the proposed method is much faster, and for many benchmark functions, it produces MDDs with fewer nodes than sifting. Especially, the proposed method generates much smaller MDDs in a short time for benchmark functions when several 2-valued input variables are grouped to form multiple-valued variables.
Hafiz Md. HASAN BABU Tsutomu SASAO
This paper describes a method to represent m output functions using shared multi-terminal binary decision diagrams (SMTBDDs). The SMTBDD(k) consists of multi-terminal binary decision diagrams (MTBDDs), where each MTBDD represents k output functions. An SMTBDD(k) is the generalization of shared binary decision diagrams (SBDDs) and MTBDDs: for k=1, it is an SBDD, and for k=m, it is an MTBDD. The size of a BDD is the total number of nodes. The features of SMTBDD(k)s are: 1) they are often smaller than SBDDs or MTBDDs; and 2) they evaluate k outputs simultaneously. We also propose an algorithm for grouping output functions to reduce the size of SMTBDD(k)s. Experimental results show the compactness of SMTBDD(k)s. An SMTBDDmin denotes the smaller SMTBDD which is either an SMTBDD(2) or an SMTBDD(3) with fewer nodes. The average relative sizes for SBDDs, MTBDDs, and SMTBDDs are 1. 00, 152. 73, and 0. 80, respectively.
Yukihiro IGUCHI Tsutomu SASAO Munehiro MATSUURA
Three types of ternary decision diagrams (TDDs) are considered: AND -TDDs, EXOR-TDDs, and Kleene-TDDs. Kleene-TDDs are useful for logic simulation in the presence of unknown inputs. Let N(BDD:f), N(AND-TDD:f), and N(EXOR-TDD:f) be the number of non-terminal nodes in the BDD, the AND-TDD, and the EXOR-TDD for f, respectively. Let N(Kleene-TDD:) be the number of non-terminal nodes in the Kleene -TDD for , where is the regular ternary function corresponding to f. Then N(BDD:f) N(TDD:f). For parity functions, N(BDD:f)=N(AND-TDD:f)=N(EXOR-TDD:f)=N(Kleene-TDD:). For unate functions,N(BDD:f)=N(AND-TDD:f). The sizes of Kleene-TDDs are O(3n/n), and O(n3) for arbitrary functions, and symmetric functions, respectively. There exist a 2n-variable function, where Kleene-TDDs require O(n) nodes with the best order, while O(3n) nodes in the worst order.
Tsutomu SASAO Debatosh DEBNATH
A generalized Reed-Muller expression (GRM) is obtained by negating some of the literals in a positive polarity Reed-Muller expression (PPRM). There are at most 2(n2)^(n-1) different GRMs for an n-variable function. A minimum GRM is one with the fewest products. This paper presents certain properties and an exact minimization algorithm for GRMs. The minimization algorithm uses binary decision diagrams. Up to five variables, all the representative functions of NP-equivalence classes were generated and minimized. Tables compare the number of products necessary to represent four-and five-variable functions for four classes of expressions: PPRMs, FPRMs, GRMs and SOPs. GRMs require, on the average, fewer products than sum-of-products expressions (SOPs), and have easily testable realizations.
Hafiz Md. HASAN BABU Tsutomu SASAO
This paper considers methods to design multiple-output networks based on decision diagrams (DDs). TDM (time-division multiplexing) systems transmit several signals on a single line. These methods reduce: 1) hardware; 2) logic levels; and 3) pins. In the TDM realizations, we consider three types of DDs: shared binary decision digrams (SBDDs), shared multiple-valued decision diagrams (SMDDs), and shared multi-terminal multiple-valued decision diagrams (SMTMDDs). In the network, each non-terminal node of a DD is realized by a multiplexer (MUX). We propose heuristic algorithms to derive SMTMDDs from SBDDs. We compare the number of non-terminal nodes in SBDDs, SMDDs, and SMTMDDs. For nrm n, log n, and for many other benchmark functions, SMTMDD-based realizations are more economical than other ones, where nrm n is a (2n)-input (n1)-output function computing (X2+Y2)+0.5, log n is an n-input n-output function computing (2n1)log(x1)/nlog2, and a denotes the largest integer not greater than a.
Hiroki NAKAHARA Tsutomu SASAO Munehiro MATSUURA Yoshifumi KAWAMURA
The parallel branching program machine (PBM128) consists of 128 branching program machines (BMs) and a programmable interconnection. To represent logic functions on BMs, we use quaternary decision diagrams. To evaluate functions, we use 3-address quaternary branch instructions. We realized many benchmark functions on the PBM128, and compared its memory size, computation time, and power consumption with the Intel's Core2Duo microprocessor. The PBM128 requires approximately a quarter of the memory for the Core2Duo, and is 21.4-96.1 times faster than the Core2Duo. It dissipates a quarter of the power of the Core2Duo. Also, we realized packet filters such as an access controller and a firewall, and compared their performance with software on the Core2Duo. For these packet filters, the PBM128 requires approximately 17% of the memory for the Core2Duo, and is 21.3-23.7 times faster than the Core2Duo.
Hui QIN Tsutomu SASAO Yukihiro IGUCHI
This paper addresses a pipelined partial rolling (PPR) architecture for the AES encryption. The key technique is the PPR architecture. With the proposed architecture on the Altera Stratix FPGA, two PPR implementations achieve 6.45 Gbps throughput and 12.78 Gbps throughput, respectively. Compared with the unrolling implementation that achieves a throughput of 22.75 Gbps on the same FPGA, the two PPR implementations improve the memory efficiency (i.e., throughput divided by the size of memory for core) by 13.4% and 12.3%, respectively, and reduce the amount of the memory by 75% and 50%, respectively. Also, the PPR implementation has a up to 9.83% higher memory efficiency than the fastest previous FPGA implementation known to date. In terms of resource efficiency (i.e., throughput divided by the equivalent logic element or slice), one PPR implementation offers almost the same as the rolling implementation, and the other PPR implementation offers a medium value between the rolling implementation and the unrolling implementation that has the highest resource efficiency. However, the two PPR implementations can be implemented on the minimum-sized Stratix FPGA while the unrolling implementation cannot. The PPR architecture fills the gap between unrolling and rolling architectures and is suitable for small and medium-sized FPGAs.
Debatosh DEBNATH Tsutomu SASAO
This paper presents a design method for three-level programmable logic arrays (PLAs), which have input decoders and two-input EXOR gates at the outputs. The PLA realizes an EXOR of two sum-of-products expressions (EX-SOP) for multiple-valued input two-valued output functions. We developed an output phase optimization method for EX-SOPs where some outputs of the function are minimized in the complemented form and presented techniques to minimize EX-SOPs for adders by using an extension of Dubrova-Miller-Muzio's AOXMIN algorithm. The proposed algorithm produces solutions with a half products of AOXMIN-like algorithm in 250 times shorter time for large adders with two-valued inputs. We also proved that an n-bit adder with two-valued inputs requires at most 32n-2+7n-5 products in an EX-SOP while it is known that a sum-of-products expression (SOP) requires 62n-4n-5 products.
Hiroki NAKAHARA Tsutomu SASAO Munehiro MATSUURA
This paper shows a design method for a regular expression matching circuit based on a decomposed automaton. To implement a regular expression matching circuit, first, we convert a regular expression into a non-deterministic finite automaton (NFA). Then, to reduce the number of states, we convert the NFA into a merged-states non-deterministic finite automaton with unbounded string transition (MNFAU) using a greedy algorithm. Next, to realize it by a feasible amount of hardware, we decompose the MNFAU into a deterministic finite automaton (DFA) and an NFA. The DFA part is implemented by an off-chip memory and a simple sequencer, while the NFA part is implemented by a cascade of logic cells. Also, in this paper, we show that the MNFAU based implementation has lower area complexity than the DFA and the NFA based ones. Experiments using regular expressions form SNORT shows that, as for the embedded memory size per a character, the MNFAU is 17.17-148.70 times smaller than DFA methods. Also, as for the number of LCs (Logic Cells) per a character, the MNFAU is 1.56-5.12 times smaller than NFA methods. This paper describes detail of the MEMOCODE2010 HW/SW co-design contest for which we won the first place award.
Debatosh DEBNATH Tsutomu SASAO
Fixed polarity Reed-Muller expressions (FPRMs) exhibit several useful properties that make them suitable for many practical applications. This paper presents an exact minimization algorithm for FPRMs for incompletely specified functions. For an n-variable function with α unspecified minterms there are 2n+α distinct FPRMs, and a minimum FPRM is one with the fewest product terms. To find a minimum FPRM the algorithm requires to determine an assignment of the incompletely specified minterms. This is accomplished by using the concept of integer-valued functions in conjunction with an extended truth vector and a weight vector. The vectors help formulate the problem as an assignment of the variables of integer-valued functions, which are then efficiently manipulated by using multi-terminal binary decision diagrams for finding an assignment of the unspecified minterms. The effectiveness of the algorithm is demonstrated through experimental results for code converters, adders, and randomly generated functions.
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER
This paper presents an architecture and a synthesis method for compact numerical function generators (NFGs) for trigonometric, logarithmic, square root, reciprocal, and combinations of these functions. Our NFG partitions a given domain of the function into non-uniform segments using an LUT cascade, and approximates the given function by a quadratic polynomial for each segment. Thus, we can implement fast and compact NFGs for a wide range of functions. Experimental results show that: 1) our NFGs require, on average, only 4% of the memory needed by NFGs based on the linear approximation with non-uniform segmentation; 2) our NFG for 2x-1 requires only 22% of the memory needed by the NFG based on a 5th-order approximation with uniform segmentation; and 3) our NFGs achieve about 70% of the throughput of the existing table-based NFGs using only a few percent of the memory. Thus, our NFGs can be implemented with more compact FPGAs than needed for the existing NFGs. Our automatic synthesis system generates such compact NFGs quickly.
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER
Index generation functions model content-addressable memory, and are useful in virus detectors and routers. Linear decompositions yield simpler circuits that realize index generation functions. This paper proposes a balanced decision tree based heuristic to efficiently design linear decompositions for index generation functions. The proposed heuristic finds a good linear decomposition of an index generation function by using appropriate cost functions and a constraint to construct a balanced tree. Since the proposed heuristic is fast and requires a small amount of memory, it is applicable even to large index generation functions that cannot be solved in a reasonable time by existing heuristics. This paper shows time and space complexities of the proposed heuristic, and experimental results using some large examples to show its efficiency.