Hiroaki AKUTSU Ko ARAI
Lanxi LIU Pengpeng YANG Suwen DU Sani M. ABDULLAHI
Xiaoguang TU Zhi HE Gui FU Jianhua LIU Mian ZHONG Chao ZHOU Xia LEI Juhang YIN Yi HUANG Yu WANG
Yingying LU Cheng LU Yuan ZONG Feng ZHOU Chuangao TANG
Jialong LI Takuto YAMAUCHI Takanori HIRANO Jinyu CAI Kenji TEI
Wei LEI Yue ZHANG Hanfeng XIE Zebin CHEN Zengping CHEN Weixing LI
David CLARINO Naoya ASADA Atsushi MATSUO Shigeru YAMASHITA
Takashi YOKOTA Kanemitsu OOTSU
Xiaokang Jin Benben Huang Hao Sheng Yao Wu
Tomoki MIYAMOTO
Ken WATANABE Katsuhide FUJITA
Masashi UNOKI Kai LI Anuwat CHAIWONGYEN Quoc-Huy NGUYEN Khalid ZAMAN
Takaharu TSUBOYAMA Ryota TAKAHASHI Motoi IWATA Koichi KISE
Chi ZHANG Li TAO Toshihiko YAMASAKI
Ann Jelyn TIEMPO Yong-Jin JEONG
Haruhisa KATO Yoshitaka KIDANI Kei KAWAMURA
Jiakun LI Jiajian LI Yanjun SHI Hui LIAN Haifan WU
Gyuyeong KIM
Hyun KWON Jun LEE
Fan LI Enze YANG Chao LI Shuoyan LIU Haodong WANG
Guangjin Ouyang Yong Guo Yu Lu Fang He
Yuyao LIU Qingyong LI Shi BAO Wen WANG
Cong PANG Ye NI Jia Ming CHENG Lin ZHOU Li ZHAO
Nikolay FEDOROV Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI
Yukasa MURAKAMI Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI
Kazuya KAKIZAKI Kazuto FUKUCHI Jun SAKUMA
Yitong WANG Htoo Htoo Sandi KYAW Kunihiro FUJIYOSHI Keiichi KANEKO
Waqas NAWAZ Muhammad UZAIR Kifayat ULLAH KHAN Iram FATIMA
Haeyoung Lee
Ji XI Pengxu JIANG Yue XIE Wei JIANG Hao DING
Weiwei JING Zhonghua LI
Sena LEE Chaeyoung KIM Hoorin PARK
Akira ITO Yoshiaki TAKAHASHI
Rindo NAKANISHI Yoshiaki TAKATA Hiroyuki SEKI
Chuzo IWAMOTO Ryo TAKAISHI
Chih-Ping Wang Duen-Ren Liu
Yuya TAKADA Rikuto MOCHIDA Miya NAKAJIMA Syun-suke KADOYA Daisuke SANO Tsuyoshi KATO
Yi Huo Yun Ge
Rikuto MOCHIDA Miya NAKAJIMA Haruki ONO Takahiro ANDO Tsuyoshi KATO
Koichi FUJII Tomomi MATSUI
Yaotong SONG Zhipeng LIU Zhiming ZHANG Jun TANG Zhenyu LEI Shangce GAO
Souhei TAKAGI Takuya KOJIMA Hideharu AMANO Morihiro KUGA Masahiro IIDA
Jun ZHOU Masaaki KONDO
Tetsuya MANABE Wataru UNUMA
Kazuyuki AMANO
Takumi SHIOTA Tonan KAMATA Ryuhei UEHARA
Hitoshi MURAKAMI Yutaro YAMAGUCHI
Jingjing Liu Chuanyang Liu Yiquan Wu Zuo Sun
Zhenglong YANG Weihao DENG Guozhong WANG Tao FAN Yixi LUO
Yoshiaki TAKATA Akira ONISHI Ryoma SENDA Hiroyuki SEKI
Dinesh DAULTANI Masayuki TANAKA Masatoshi OKUTOMI Kazuki ENDO
Kento KIMURA Tomohiro HARAMIISHI Kazuyuki AMANO Shin-ichi NAKANO
Ryotaro MITSUBOSHI Kohei HATANO Eiji TAKIMOTO
Genta INOUE Daiki OKONOGI Satoru JIMBO Thiem Van CHU Masato MOTOMURA Kazushi KAWAMURA
Hikaru USAMI Yusuke KAMEDA
Yinan YANG
Takumi INABA Takatsugu ONO Koji INOUE Satoshi KAWAKAMI
Fengshan ZHAO Qin LIU Takeshi IKENAGA
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI
Tomohiro KOBAYASHI Tomomi MATSUI
Shin-ichi NAKANO
Ming PAN
Hiroyuki EBARA Noriyuki FUKUYAMA Hideo NAKANO Yoshiro NAKANISHI
Roundness is one of the most important geometric measures for circular objects in the process of mechanical assembly. It is the amount of variation in a circular size which can be permitted. To compute roundness, the authors have already proposed an exact polynomial-time algorithm whose time complexity is O(n2). In this paper, we show that this roundness algorithm can be improved more efficiently, by introducing the deletion of the unnecessary points, in practical applications. In addition, the computational experience of this revised algorithm is also presented.
Xue-Hou TAN Tomio HIRATA Yasuyoshi INAGAKI
Recently much attention has been devoted to the problem of translating a set of geometrical objects in a given direction, one at a time, without allowing collisions between the objects. This paper studies the translation problem in three dimensions on a set of
Hideo KIKUCHI Takashi YUKAWA Kazumitsu MATSUZAWA Tsutomu ISHIKAWA
This paper discusses the design, implementation, and performance of a bus-connected multiprocessor, called Presto, for a Rete-based production system. To perform a match, which is a major phase of a production system, a Presto match scheme exploits the subnetworks that are separated by the top two-input nodes and the token flow control at these nodes. Since parallelism of a production system can only increase speed 10-fold, the aim is to do so efficiently on a low-cost, compact bus-connected multi-processor system without shared memory or cache memory. The Presto hardware consists of up to 10 processisng elements (PEs), each comprising a commercial microprocessor, 4 Mbytes of local memory, and two kinds of newly developed ASIC chips for memory control and bus control. Hierarchical system software is provided for developing interpreter programs. Measurement with 10 PEs shows that sample programs run 5-7 times faster.
Douglas E. MARQUARDT Hasan S. ALKHATIB
The problems of cache coherency in multiprocessor systems are directly related to their architectural structures. Small scale multiprocessor systems have focused on the use of bus based memory interconnection networks using centrally shared memory and a sequential consistency model for coherency. This has limited scalability to but a few tens of processors due to the limited bus bandwidth used for both coherency updates and memory traffic. Recently, large scale multiprocessor systems have been proposed that use general interconnection networks and distributed shared memory. These architectures have been proposed using weak consistency models and various directory map schemes to hide the overhead for coherency maintenance within the memory hieratchy, interconnection network or process context switch latencies. The coherency and memory traffic are still maintained over the same interconnection network. In this paper, we present the architecture of a new general purpose medium scale multiprocessor system. This Cache Coherent Multiprocessor System (C2MP), supports distributed shared memory using a general memory interconnection network for memory traffic and a separate bus based coherency interconnection network for coherency maintenance. Through the use of a special directory based coherency protocol and cache oriented distributed coherency controllers, direct cache-to-cache coherency maintenance is performed over the dedicated coherency bus. This minimizes coherency updates to only those processor nodes needing coherency maintenance. An aggressive sequential coherncy model is used, which reduces the hardware penalty to support an ideal sequential consistency programmers model. The system can scale up to 256-512 processors depending on the degree of shared data and is expected to have higher per processor utilization in this range than currently proposed medium and large scale multiprocessor systems. The C2MP system is analyzed utilizing a Generalized Timed Petri-Net model of a processor node. A stochastic model for internode interactions over the general memory interconnection network and coherency bus are used . The model of the proposed architecture is analyzed under steady-state conditions for varying system work load parameters.
Jeong Uk KIM Jae Moon LEE Myunghwan KIM
A tag-partitioned join algorithm is described. The algorithm partitions only one relation, while other partition-based algorithms partition both relations. It is performed as the joinable tuples of one relation are rearranged and some of them are duplicated according to the original sequence of the join attribute values of the other relation. To do this, the algorithm first finds the positions of all the tuples of the other relation which are joinable with each tuple of one relation, and then partitions joinable tuples of one relation into buckets by using the positions found. Final joining is performed on the partitioned relation and the other relation. We analyze and compare the performance of the algorithm with that of other partition-based join algorithms. The comparison shows that our method is better than other partition-based methods under the practical values of the analysis parameters.
Seoung Sup LEE Ha Ryoung OH June Hyoung KIM Won Ho CHUNG Myunghwan KIM
This paper presents a destributed algorithm that uses weak copy consistency to create mutual exclusion in a distributed computer system. The weak copy consistency is deduced from the uncertainty of state which occurs due to the finite and unpredictable communication delays in a distributed environment. Also the method correlates outdated state information to current state. The average number of messages to enter critical section in the algorithm is n/2 to n messages where n is the number of sites. We show that the algorithm achieves mutual exclusion and the fairness and liveness of the algorithm is proven. We study the performance of the algorithm by simulation technique.
The Batcher banyan network is well known as a non-blocking switching fabric. However, it is conflict free only when there is no packets for the same destination. To cope with the arbitrary combination of packets, an additional network or special control sequence which causes the increase of the hardware or performance degradation is required. A Batcher Double Omega network with Combining (BDOC) is an elegant solution of this problem. It consists of a Batcher sorter and two double sized Omega networks. Like in the Batcher banyan network, packets are sorted by the destination label in the Batcher sorter. In the first Omega network called the distributer, a packet is routed by a tag corresponding to the sum of the label at the output of the Batcher sorter and the destination label. In the second (Inverse) Omega network called the concentrator, the original destination label is used as the routing tag, and packets are routed without any conflict. The BDOC is useful for an interconnection network to connect processors and memory modules in multiprocessor. Unlike conventional multistage interconnection networks for multiprocessors, packets are transferred in a serial and synchronized manner. The simple structure of the switching element enables a high speed operation which reduces the latency caused by the serial communication. Using the pipelined circuit switching, the address and data packets share the same control signal, and the structure of the switching element is much simplified. Moreover, packets combining which avoids the hot spot contention is realized easily in the concentrator.
This paper addresses fault tolerance of a processor array that is reconfigurable by replacing faulty processors with spare processors. The fault tolerance of such a reconfigurable array depends on not only an algorithm for spare processor assignment but also the folloving factor of an organization of spare processors in the reconfigurable array: the number of spare processors; the number of processors that can be replaced by each spare processor; and how spare processors are connected with processors. We discuss a relationship between fault tolerance of reconfigurable arrays and their organizations of spare processors in terms of the smallest size of fatal sets and the reliability function. The smallest size of fatal sets is the smallest number of faulty processors for which the reconfigurable array cannot be failure-free as a processor array system no matter what reconfiguration is used. The reliability function is a function of time t whose value is the probability that the reconfigurable array is failure-free as a processor array system by time t when the best possible reconfiguration is used. First, we show that the larger smallest size of fatal sets a reconfigurable array has, the larger reliability function it has by some time. It suggests that it is important to maximize the smallest size of fatal sets in orer to improve the reliability function as well. Second, we present the best possible smallest size of fatal sets for n
A new class of (m2
The outputs of all gates in a circuit are assumed to be observable unber the highly observable condition, which is mainly based on the use of E-beam testers. When using the E-beam tester, it is desirable that the test set for a circuit is small and the test vectors in the test set can be applied in a successive and repetitive manner. For a combinational circuit, these requirements can be satisfied by modifying the circuit into a k-UCP circuit, which needs only a small number of tests for diagnosis. For a sequential circuit, however, even if the combinational portion has been modified into a k-UCP circuit, it is impossible that the test vectors for the combinational portion can always be applied in a successive and repetitive manner because of the existence of feedback loops. To solve this problem, the concept of k-UCP scan circuits is proposed in this paper. It is shown that the test vectors for the combinational portion in a k-UCP scan circuit can be applied in a successive and repetitive manner through a specially constructed scan-path. An efficient method of modifying a sequential circuit into a k-UCP scan circuit is also presented.
This paper proposes a Mean-Separated and Normalized Vector Quantizer with edge-Adaptive Feedback estimation and variable bit rates (AFMSN-VQ). The basic idea of the AFMSN-VQ is to estimate the statistical parameters of each coding block from its previous coded blocks and then use the estimated parameters to normalize the coding block prior to vector quantization. The edge-adaptive feedback estimator utilizes the interblock correlations of edge connectivity and gray level continuity to accurately estimate the mean and standard deviation of the coding block. The rate-variable VQ is to diminish distortion nonuniformity among image blocks of different activities and to improve the reconstruction quality of edges and contours to which the human vision is sensitive. Simulation results show that up to 2.7dB SNR gain of the AFMSN-VQ over the non-adaptive FMSN-VQ and up to 2.2dB over the 16
Teruhiko UKITA Satoshi KINOSHITA Kazuo SUMITA Hiroshi SANO Shin'ya AMANO
Resolving ambiguities in interpreting the user's utterances is one of the most fundamental problems in the development of a question-answering system. The process of disambiguating interpretations requires knowledge and inference functions on an objective task field. This paper describes a framework for understanding conversational language, using the multi-paradigm knowledge representation (
A theoretical conjecture on fractal dimensions of a dendrite distribution in neural networks is presented on the basis of the dendrite tree model. It is shown that the fractal dimensions obtained by the model are consistent with the recent experimental data.
Erkki OJA Hidemitsu OGAWA Jaroonsakdi WANGVIWATTANA
Principal Component Analysis (PCA) is a useful technique in feature extraction and data compression. It can be formulated as a statistical constrained maximization problem, whose solution is given by unit eigenvectors of the data covariance matrix. In a practical application like image compression, the problem can be solved numerically by a corresponding gradient ascent maximization algorithm. Such on-line algoritms can be good alternatives due to their parallelism and adaptivity to input data. The algorithms can be implemented in a local and homogeneous way in learning neural networks. One example is the Subspace Network. It is a regular layer of parallel artificial neurons with a learning rule that is completely homogeneous with respect to the neurons. However, due to the complete homogeneity, the learning rule does not converge to the unique basis given by the dominant eigenvectors, but any basis of this eigenvector subspace is possible. In many applications like data compression, the subspace is not sufficient but the actual eigenvectors or PCA coefficient vectors are needed. A new criterion, called the Weighted Subspace Criterion, is proposed, which makes a small symmetry-breaking change to the Subspace Criterion. Only the true eigenvectors are solutions. Making the corresponding change to the learning rule of the Subspace Network gives a modified learning rule, which can be still implemented on a homogeneous network architecture. In learning, the weight vectors will tend to the true eigenvectors.
Erkki OJA Hidemitsu OGAWA Jaroonsakdi WANGVIWATTANA
Artificial neurons and neural networks have been shown to perform Principal Component Analysis (PCA) when gradient ascent learning rules are used, which are related to the constrained maximization of statistical objective functions. Due to their parallelism and adaptivity to input data, such algorithms and their implementations in neural networks are potentially useful in feature extraction and data compression. In the companion paper(9), two such learning rules were derived from two criteria, the Subspace Criterion and the Weighted Subspace Criterion. It was shown that the only solutions to the latter problem are dominant eigenvectors of the data covariance matrix, which are the basis vectors of PCA. It was suggested by a simulation that the corresponding learning algorithm converges to these eigenvectors. A homogeneous neural network implementation was proposed for the algorithm. The learning algorithm is analyzed here in detail and it is shown that it can be approximated by a continuous-time differential equation that is obtained by averaging. It is shown that the asymptotically stable limits of this differntial equation are the eigenvectors. The neural network learning algorithm is further extended to a case in which each neuron has a sigmoidal nonlinear feedback activity function. Then no parameters specific to each neuron are needed, and the learning rule is fully homogeneous.