Mohammad ALIMUDDIN Hussein M. ALNUWEIRI
This paper proposes a number of simple, yet very effective, cell switching architectures that employ shared memory as a basic switching component. Employing small shared-memory switching has several major advantages. First, by taking advantage of commercially available memory technologies, ATM switch design can be simplified to determining a suitable shared-memory module size, and identifying a proper interconnection among the modules. In this way, switch architectures can be reusable and able to evolve as memory technology advances. Second, shared memory greatly enhances buffer space utilization, allows the implementation of flexible and fair buffer allocation policies for multiple services. The switch architectures presented in this paper offer a number of alternative shared buffering schemes including, shared output, input with shared output, and multistage shared buffering. The proposed architectures employ simple, self-routing, interconnection fabrics. We present several simulation results that demonstrate the superior performance of our switch architectures under uniform, bursty, and non-uniform (or hot-spot) input traffic.
King-Sun CHAN Sammy CHAN Kwan Lawrence YEUNG King-Tim KO Eric W. M. WONG
A large-scale modular multicast ATM switch based on a three-stage Clos network architecture is proposed and its performance is studied in this paper. The complexity of our proposed switch is NN if the switch size is NN. The first stage of the proposed multicast switch consists of n sorting modules, where n=N. Each sorting module has n inputs and n outputs and is responsible for traffic distribution. The second and third stages consist of modified Knockout switches which are responsible for packet replication and switching. Although it is a multipath network, cell sequence is preserved because only output buffers are used in this architecture. The proposed multicast switch has the following advantages: 1) it is modular and suitable for large scale deployment; 2) no dedicated copy network is required since copying and switching are performed simultaneously; 3) two-stage packet replication is used which gives a maximum fan-out of n2; 4) translation tables are distributed which gives manageable table sizes; 5) high throughput performance for both uniform and nonuniform input traffic; 6) self-routing scheme is used. The performance of the switch under uniform and non-uniform input traffic is studied and numerical examples demonstrate that the cell loss probability is significantly improved when the distribution network is used. In a particular example, it is shown that for the largest cell loss probability in the second stage to be less then 10-11, the knockout expander, with the use of the distribution network, needs only be larger than 6. On the other hand, without the distribution network, the knockout expander must be larger than 13.
While active researches have been continuously made on the ATM switch architectures and the QoS service guarantees, most of them have been treated independently in the past. In this paper, we first explain the architectural requirement on the ATM switches to implement the mechanism of QoS guarantees in the context of ATM congestion control. Then we discuss how a vital link between two should be built, and remaining problems are pointed out.
Jonathan TURNER Naoaki YAMANAKA
The rapid development of Asynchronous Transfer Mode technology in the last 10-15 years has stimulated renewed interest in the design and analysis of switching systems, leading to new ideas for system designs and new insights into the performance and evaluation of such systems. As ATM moves closer to realizing the vision of ubiquitous broadband ISDN services, the design of switching systems takes on growing importance. This paper seeks to clarify the key architectural issues for ATM switching system design and provides a survey of the current state-of-the-art.
Kazumaro AOKI Kunio KOBAYASHI Shiho MORIAI
This paper presents the results of the best differential characteristic search of FEAL. The search algorithm for the best differential characteristic (best linear expression) was already presented by Matsui, and improvements on this algorithm were presented by Moriai et al. We further improve the speed of the search algorithm. For example, the search time for the 7-round best differential characteristic of FEAL is reduced to about 10 minutes (Pentium/166 MHz), which is about 212. 6 times faster than Matsui's algorithm. Moreover, we determine all the best differential characteristics of FEAL for up to 32 rounds assuming all S-boxes are independent. As a result, we confirm that the N-round (7N32) best differential characteristic probability of FEAL is 2-2N, which was found by Biham. For N=6, we find 6-round differential characteristics with a greater probability, 2-11, than that previously discovered, 2-12.
Hideyuki WATANABE Shigeru KATAGIRI
In general cases of pattern recognition, a pattern to be recognized is first represented by a set of features and the measured values of the features are then classified. Finding features relevant to recognition is thus an important issue in recognizer design. As a fundamental design framework taht systematically enables one to realize such useful features, the Subspace Method (SM) has been extensively used in various recognition tasks. However, this promising methodological framework is still inadequate. The discriminative power of early versions was not very high. The training behavior of a recent discriminative version called the Learning Subspace Method has not been fully clarified due to its empirical definition, though its discriminative power has been improved. To alleviate this insufficiency, we propose in this paper a new discriminative SM algorithm based on the Minimum Classification Error/Generalized Probabilistic Descent method and show that the proposed algorithm achieves an optimal accurate recognition result, i.e., the (at least locally) minimum recognition error situation, in the probabilistic descent sense.
Hideo MAEJIMA Masahiro KAINAGA Kunio UCHIYAMA
This paper describes the design and architecture for a newly developed microprocessor suitable for consumer applications, which we call SuperH. To achieve both low-power and high-speed, the SuperH architecture includes 16-bit fixed length instruction code and several power saving features. The 16-bit fixed length instruction code makes the SuperH possible to achieve excellent code efficiency for the SPECint benchmarks when compared with conventional microcontrollers and RISC's for workstations and PC's. As a result, the SuperH provides almost the same code efficiency as that of 8-bit microcontrollers, and also achieves similar performance as that of RISC's with 32-bit fixed length instruction code. The SuperH also incorporates several power reduction techniques through the control of clock frequency and clock distribution. Thus, the 16-bit code format, power saving features, and other architectural innovations make the SuperH particularly proficient for portable multi-media applications.
Takashi OKUDA Osamu MATSUMOTO Toshio KUMAMOTO Masao ITO Hiroyuki MOMONO Takahiro MIKI Takeshi TOKUDA
This paper describes the 10-bit 50 MS/s pipelined CMOS A/D Converter using a "reference feed-forward architecture." In this architecture, reference voltage generated in a reference generator block and residual voltage from a DA/subtractor block are fed to the next stage. The reference generator block and DA/subtractor block are constructed using resistive-load, low-gain differential amplifiers. The high-gain, high-speed amplifiers consuming much power are not used. Therefore, the power consumption of this ADC is reduced. The gain matching of the reference voltage with the internal signal range is achieved through the introduction of the reference generator block having the same characteristics as a DA/subtractor block. Each offset voltage of the differential amplifier in the reference generator block and the DA/subtractor block is canceled by the offset cancellation technique, individually. In addition, the front-end sample/hold circuit is eliminated to reduce power consumption. Because of the introduction of high-speed comparators based on the source follower and latch circuit into the first stage A/D subconverter, analog bandwidth is not degraded. This ADC has been fabricated in double-polysilicon, double-metal, 0.5µm CMOS technology, and it operates at 50 MS/s with a 300-mW (Vdd=3.0 V) power consumption. The differential linearity error of less than +/-1 LSB is obtained.
Jianliang XU Katsushi INOUE Yue WANG Akira ITO
This paper first investigates a relationship between inkdot-depth and inkdot-size of inkdot two-way alternating Turing machines and pushdown automata with sublogarithmic space, and shows that there exists a language accepted by a strongly loglog n space-bounded alternating pushdown automaton with inkdot-depth 1, but not accepted by any weakly o (log n) space-bounded and d (n) inkdot-size bounded alternating Turing machine, for any function d (n) such that limn [d (n)log n/n1/2] = 0. In this paper, we also show that there exists an infinite space hierarchy among two-way alternating pushdown automata with sublogarithmic space.
This paper proposes a general expansion architecture for constructing large-scale multicast ATM switches with any type of small multicast switch, called the multicast Universal Multistage Interconnection Network (multicast UniMIN). The proposed architecture consists of a buffered distribution network that can perform cell routing and replication simultaneously, and a column of output switch modules (OSMs). The adoption of channel grouping and virtual first-in-first-out (FIFO) buffers results in high delay/throughput performance, and the distributed lookup table scheme for multicast addressing greatly reduces the size of a single lookup table. Analytical and simulation results show that high delay/throughput performance is obtained for both unicast and multicast traffic, and the proposed architecture yields an even better performance for multicast traffic than for unicast traffic. In addition, the multicast UniMIN switch has such good features as modular expandability, simple hardware, and no internal speed-up operation.
Tadao KASAMI Takuya KOUMOTO Toru FUJIWARA Hiroshi YAMAMOTO Yoshihisa DESAKI Shu LIN
Subtrellises for low-weight codewords of binary linear block codes have been recently used in a number of trellis-based decoding algorithms to achieve near-optimum or suboptimum error performance with a significant reduction in decoding complexity. An algorithm for purging a full code trellis to obtain a low-weight subtrellis has been proposed by H.T. Moorthy et al. This algorithm is effective for codes of short to medium lengths, however for long codes, it becomes very time consuming. This paper investigates the structure and complexity of low-weight subtrellises for binary linear block codes. A construction method for these subtrellises is presented. The state and branch complexities of low-weight subtrellises for Reed-Muller codes and some extended BCH codes are given. In addition, a recursive algorithm for searching the most likely codeword in low-weight subtrellis-based decoding algorithm is proposed. This recursive algorithm is more efficient than the conventional Viterbi algorithm.
Hiroyuki OCHI Yoko KAMIDOI Hideyuki KAWABATA
This paper proposes a new approach that makes it possible for every undergraduate student to perform experiments of developing a Ipipelined RISC processor within limited time available for the course. The approach consists of 4 steps. At the first step, every student implements by himself/herself a pipelined RISC processor which is based on a given, very simple model; it has separate buses for instruction and data memory ("Harvard architecture") to avoid structural hazard, while it completely ignores data control hazards to make implementation easy. Although it is such a "defective" processor, we can test its functionality by giving object code containing sufficient amount of NOP instructions to avoid hazards. At the second step, NOP instructions are deleted and behavior of the developed processor is observed carefully to understand data and control hazards. At the third step, benchmark problems are provided, and every student challenges to improve its performance. Finally every student is requested to present how he/she improved the processor. This paper also describes a new educational FPGA board ASAver.1 which is useful for experiments from introductory class to computer architecture/system class. As a feasibility study, a 16-bit pipelined RISC processor "ASAP-O" has been developed which has eight 16-bit general purpose registers, a 16-bit program counter, and a zero flag, with 10 essential instructions.
Dirk STROOBANDT Jan VAN CAMPENHOUT
In computer hardware there is a constant evolution towards smaller transistor sizes. At the same time, more and more transistors are placed on one chip. Both trends make the pin limitation problem worse. Scaling down chip sizes adds to the shortage of available pins while increasing the number of transistors per chip imposes a higher need for chip terminals. The use of three-dimensional systems would alleviate this pin limitation problem. In order to decide whether the benefits of such systems balance the higher processing costs, one must be able to characterize these benefits accurately. This can be done by estimating important layout properties of electronic designs, such as space requirements and interconnection length values. For a two-dimensional placement, Donath found an upper bound for the average interconnection length that follows the trends of experimentally obtained average lengths. Yet, this upper bound deviates from the experimentally obtained value by a factor of approximately 2 which is not sufficiently accurate for some applications. In this paper, we first extend Donath's technique to a three-dimensional placement. We then compute a significantly more accurate estimate by taking into account the inherent features of the optimal placement process.
In this paper, an efficient approach to the synthesis of CA (Cellular Architecture) -type FPGAs is presented. To exploit the array structure of cells in CA-type FPGAs, logic expressions called Maitra terms which can be mapped directly to the cell arrays are generated by using ETDDs (EXOR Ternary Decision Diagrams). Since a traversal of the ETDD is sufficient to generate a Maitra term which takes O (n) steps where n is the number of nodes in the ETDD, Maitra terms are generated very efficiently. The experiments show that the proposed method generates better results than existing methods.
Victor R.L. SHEN Tzer-Shyong CHEN Feipei LAI
A modified cryptographic key assignment scheme for the dynamic access control in a group-oriented user hierarchy is presented. In the partially ordered set (poset, for short) user hierarchy (GjGi) embedded in a group-oriented (t, n) threshold cryptosystem, the source group Gi has higher security clearance to access the information items held in the target group Gj. If a target group Gj has multipe paths reachable from a source group Gi, we must choose the least cost path to rapidly resolve the dynamic access control problem Furthermore, multiple threshold values are also considered in order to meet the different security requirements.
Abderrazak JEMAI Polen KISSION Ahmed Amine JERRAYA
The analysis of an architecture may provide statistic information on the use of the resources and on the execution time. Some of these information need just a static analysis. Others, such as the execution time, may need dynamic analysis. Moreover as the computation time of behavioral descriptions (control step time unit) and RTL ones (cycle based) may differ a lot, unexpected architectures may be generated by behavioral synthesis. Therefore means to debug the results of behavioral synthesis are required. This paper introduces a new approach to integrate an interactive simulator within a behavioral synthesis tool, thereby allowing concurrent synthesis and simulation. The simulator and the behavioral synthesis are based on the same model. This model allows to link the behavioral description and the architecture produced by synthesis. This paper also discusses an implementation of this concept resulting in a simulator, called AMIS. This tool assists the designer for understanding the results of behavioral synthesis and for architecture exploration. It may also be used to debug the behavioral specification.
Wen-Zen SHEN Jiing-Yuan LIN Jyh-Ming LU
In this paper, we present CB-Power, a hierarchical power analysis and characterization environment of cell-based CMOS circuits. The environment includes two parts, a cell characterization system for timing, input capacitance as well as power and a cell-based power estimation system. The characterization system can characterize basic, complex and transmission gates. During the characterization, input slew rate, output loading, capacitive feedthrough effect and the logic state dependence of nodes in a cell are all taken into account. The characterization methodology separates the power consumption of a cell into three components, e.g., capacitive feedthrough power, short-circuit power and dynamic power. With the characterization data, a cell-based power estimator (CBPE) embedded in Verilog-XL is used for estimating the power consumption of the gates in a circuit. CBPE is also a hierarchical power estimator. Macrocells such as flip-flops and adders are partitioned into primitive gates during power estimation. Experimental results on a set of MCNC benchmark circuits show that the power estimation based on our power modeling and characterization provides within 6% error of SPICE simulation on average while the CPU time consumed is more than two orders of magnitude less.
Nobutsugu FUJINO Takashi KIMOTO Ichiro IIDA
This paper describes a mobile information access system based on a multi-agent architecture. With the rapid progress of wireless data communications, mobile Internet access will be more and more popular. In mobile environments, user location plays an important role for information filtering and flexible communication service. In this paper, we propose a mobile information service system where a user with a handy terminal accesses Internet in an open air to look up map information and related town information. Each user information is managed by an independent agent process. And the agent provides each user with a personal service collaborating with other applications. A map-based information service example based on this architecture is also described.
Yuzo KOGA Choong Seon HONG Yutaka MATSUSHITA
In this paper, we propose a scalable service networking architecture as a TINA-like environment for providing flexibly various mobility services. The proposed architecture provides an environment that enables the advent of service providers and rapidly introduces multimedia applications, considering networks scalability. For supporting customized mobility services, this architecture adopts a new service component, which we call Omnipresent Personal Environment Manager (OpeMgr). In order to support mobile users who move between heterogeneous networks, for instance, between the TINA-like environment and the Internet environment, we propose a structure of a gateway. In addition, the proposed architecture uses the fixed and mobile agent approaches for supporting the user's mobility, and we evaluated their performances with comparing those approaches.
Victor R.L. SHEN Tzer-Shyong CHEN Feipei LAI
A novel cryptographic key assignment scheme for dynamic access control in a user hierarchy is presented. Based on Rabin's public key system and Chinese remainder theorem, each security class SCi is assigned a secret key Ki and some public parameters. In our scheme, a secret key is generated in a bottom-up manner so as to reduce the computation time for key generation and the storage size for public parameters. We also show that our proposed scheme is not only secure but also efficient.