Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN
Yinan YANG
Myung-Hyun KIM Seungkwang LEE
Shuoyan LIU Chao LI Yuxin LIU Yanqiu WANG
Takumi INABA Takatsugu ONO Koji INOUE Satoshi KAWAKAMI
Martin LUKAC Saadat NURSULTAN Georgiy KRYLOV Oliver KESZOCZE Abilmansur RAKHMETTULAYEV Michitaka KAMEYAMA
Zheqing ZHANG Hao ZHOU Chuan LI Weiwei JIANG
Liu ZHANG Zilong WANG Yindong CHEN
Wenxia Bao An Lin Hua Huang Xianjun Yang Hemu Chen
Fengshan ZHAO Qin LIU Takeshi IKENAGA
Haruhiko KAIYA Shinpei OGATA Shinpei HAYASHI
Jiakai LI Jianyong DUAN Hao WANG Li HE Qing ZHANG
Yuxin HUANG Yuanlin YANG Enchang ZHU Yin LIANG Yantuan XIAN
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI
Na XING Lu LI Ye ZHANG Shiyi YANG
Zhe Wang Zhe-Ming Lu Hao Luo Yang-Ming Zheng
Rina TAGAMI Hiroki KOBAYASHI Shuichi AKIZUKI Manabu HASHIMOTO
Tomohiro KOBAYASHI Tomomi MATSUI
Shin-ichi NAKANO
Hongzhi XU Binlian ZHANG
Weizhi WANG Lei XIA Zhuo ZHANG Xiankai MENG
Yuka KO Katsuhito SUDOH Sakriani SAKTI Satoshi NAKAMURA
Rinka KAWANO Masaki KAWAMURA
Zhishuo ZHANG Chengxiang TAN Xueyan ZHAO Min YANG
Peng WANG Guifen CHEN Zhiyao SUN
Zeyuan JU Zhipeng LIU Yu GAO Haotian LI Qianhang DU Kota YOSHIKAWA Shangce GAO
Ji WU Ruoxi YU Kazuteru NAMBA
Hao WANG Yao Ma Jianyong Duan Li HE Xin Li
Shijie WANG Xuejiao HU Sheng LIU Ming LI Yang LI Sidan DU
Arata KANEKO Htoo Htoo Sandi KYAW Kunihiro FUJIYOSHI Keiichi KANEKO
Qi LIU Bo WANG Shihan TAN Shurong ZOU Wenyi GE
HanYu Zhang Tomoji Kishi
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER
Yoon Hak KIM
Takashi HIRAYAMA Rin SUZUKI Katsuhisa YAMANAKA Yasuaki NISHITANI
Yosuke IIJIMA Atsunori OKADA Yasushi YUMINAKA
Batnasan Luvaanjalba Elaine Yi-Ling Wu
KuanChao CHU Satoshi YAMAZAKI Hideki NAKAYAMA
Shenglei LI Haoran LUO Tengfei SHAO Reiko HISHIYAMA
Yasushi YUMINAKA Kazuharu NAKAJIMA Yosuke IIJIMA
Chunbo Liu Liyin Wang Zhikai Zhang Chunmiao Xiang Zhaojun Gu Zhi Wang Shuang Wang
Jia-ji JIANG Hai-bin WAN Hong-min SUN Tuan-fa QIN Zheng-qiang WANG
Yuhao LIU Zhenzhong CHU Lifei WEI
Ken ASANO Masanori NATSUI Takahiro HANYU
Shuto HASEGAWA Koichiro ENOMOTO Taeko MIZUTANI Yuri OKANO Takenori TANAKA Osamu SAKAI
Zhewei XU Mizuho IWAIHARA
Takao WAHO Akihisa KOYAMA Hitoshi HAYASHI
Taisei SAITO Kota ANDO Tetsuya ASAI
Shiyu YANG Tetsuya KANDA Daniel M. GERMAN Yoshiki HIGO
Tsutomu SASAO
Jiyeon LEE
Koichi MORIYAMA Akira OTSUKA
Hongliang FU Qianqian LI Huawei TAO Chunhua ZHU Yue XIE Ruxue GUO
Gao WANG Gaoli WANG Siwei SUN
Hua HUANG Yiwen SHAN Chuan LI Zhi WANG
Zhi LIU Heng WANG Yuan LI Hongyun LU Hongyuan JING Mengmeng ZHANG
Tomoyasu NAKANO Masataka GOTO
Hyebong CHOI Joel SHIN Jeongho KIM Samuel YOON Hyeonmin PARK Hyejin CHO Jiyoung JUNG
Xianglong LI Yuan LI Jieyuan ZHANG Xinhai XU Donghong LIU
Haoran LUO Tengfei SHAO Shenglei LI Reiko HISHIYAMA
Chang SUN Yitong LIU Hongwen YANG
Ji XI Yue XIE Pengxu JIANG Wei JIANG
Ming PAN
Hiroyuki EBARA Noriyuki FUKUYAMA Hideo NAKANO Yoshiro NAKANISHI
Roundness is one of the most important geometric measures for circular objects in the process of mechanical assembly. It is the amount of variation in a circular size which can be permitted. To compute roundness, the authors have already proposed an exact polynomial-time algorithm whose time complexity is O(n2). In this paper, we show that this roundness algorithm can be improved more efficiently, by introducing the deletion of the unnecessary points, in practical applications. In addition, the computational experience of this revised algorithm is also presented.
Xue-Hou TAN Tomio HIRATA Yasuyoshi INAGAKI
Recently much attention has been devoted to the problem of translating a set of geometrical objects in a given direction, one at a time, without allowing collisions between the objects. This paper studies the translation problem in three dimensions on a set of
Hideo KIKUCHI Takashi YUKAWA Kazumitsu MATSUZAWA Tsutomu ISHIKAWA
This paper discusses the design, implementation, and performance of a bus-connected multiprocessor, called Presto, for a Rete-based production system. To perform a match, which is a major phase of a production system, a Presto match scheme exploits the subnetworks that are separated by the top two-input nodes and the token flow control at these nodes. Since parallelism of a production system can only increase speed 10-fold, the aim is to do so efficiently on a low-cost, compact bus-connected multi-processor system without shared memory or cache memory. The Presto hardware consists of up to 10 processisng elements (PEs), each comprising a commercial microprocessor, 4 Mbytes of local memory, and two kinds of newly developed ASIC chips for memory control and bus control. Hierarchical system software is provided for developing interpreter programs. Measurement with 10 PEs shows that sample programs run 5-7 times faster.
Douglas E. MARQUARDT Hasan S. ALKHATIB
The problems of cache coherency in multiprocessor systems are directly related to their architectural structures. Small scale multiprocessor systems have focused on the use of bus based memory interconnection networks using centrally shared memory and a sequential consistency model for coherency. This has limited scalability to but a few tens of processors due to the limited bus bandwidth used for both coherency updates and memory traffic. Recently, large scale multiprocessor systems have been proposed that use general interconnection networks and distributed shared memory. These architectures have been proposed using weak consistency models and various directory map schemes to hide the overhead for coherency maintenance within the memory hieratchy, interconnection network or process context switch latencies. The coherency and memory traffic are still maintained over the same interconnection network. In this paper, we present the architecture of a new general purpose medium scale multiprocessor system. This Cache Coherent Multiprocessor System (C2MP), supports distributed shared memory using a general memory interconnection network for memory traffic and a separate bus based coherency interconnection network for coherency maintenance. Through the use of a special directory based coherency protocol and cache oriented distributed coherency controllers, direct cache-to-cache coherency maintenance is performed over the dedicated coherency bus. This minimizes coherency updates to only those processor nodes needing coherency maintenance. An aggressive sequential coherncy model is used, which reduces the hardware penalty to support an ideal sequential consistency programmers model. The system can scale up to 256-512 processors depending on the degree of shared data and is expected to have higher per processor utilization in this range than currently proposed medium and large scale multiprocessor systems. The C2MP system is analyzed utilizing a Generalized Timed Petri-Net model of a processor node. A stochastic model for internode interactions over the general memory interconnection network and coherency bus are used . The model of the proposed architecture is analyzed under steady-state conditions for varying system work load parameters.
Jeong Uk KIM Jae Moon LEE Myunghwan KIM
A tag-partitioned join algorithm is described. The algorithm partitions only one relation, while other partition-based algorithms partition both relations. It is performed as the joinable tuples of one relation are rearranged and some of them are duplicated according to the original sequence of the join attribute values of the other relation. To do this, the algorithm first finds the positions of all the tuples of the other relation which are joinable with each tuple of one relation, and then partitions joinable tuples of one relation into buckets by using the positions found. Final joining is performed on the partitioned relation and the other relation. We analyze and compare the performance of the algorithm with that of other partition-based join algorithms. The comparison shows that our method is better than other partition-based methods under the practical values of the analysis parameters.
Seoung Sup LEE Ha Ryoung OH June Hyoung KIM Won Ho CHUNG Myunghwan KIM
This paper presents a destributed algorithm that uses weak copy consistency to create mutual exclusion in a distributed computer system. The weak copy consistency is deduced from the uncertainty of state which occurs due to the finite and unpredictable communication delays in a distributed environment. Also the method correlates outdated state information to current state. The average number of messages to enter critical section in the algorithm is n/2 to n messages where n is the number of sites. We show that the algorithm achieves mutual exclusion and the fairness and liveness of the algorithm is proven. We study the performance of the algorithm by simulation technique.
The Batcher banyan network is well known as a non-blocking switching fabric. However, it is conflict free only when there is no packets for the same destination. To cope with the arbitrary combination of packets, an additional network or special control sequence which causes the increase of the hardware or performance degradation is required. A Batcher Double Omega network with Combining (BDOC) is an elegant solution of this problem. It consists of a Batcher sorter and two double sized Omega networks. Like in the Batcher banyan network, packets are sorted by the destination label in the Batcher sorter. In the first Omega network called the distributer, a packet is routed by a tag corresponding to the sum of the label at the output of the Batcher sorter and the destination label. In the second (Inverse) Omega network called the concentrator, the original destination label is used as the routing tag, and packets are routed without any conflict. The BDOC is useful for an interconnection network to connect processors and memory modules in multiprocessor. Unlike conventional multistage interconnection networks for multiprocessors, packets are transferred in a serial and synchronized manner. The simple structure of the switching element enables a high speed operation which reduces the latency caused by the serial communication. Using the pipelined circuit switching, the address and data packets share the same control signal, and the structure of the switching element is much simplified. Moreover, packets combining which avoids the hot spot contention is realized easily in the concentrator.
This paper addresses fault tolerance of a processor array that is reconfigurable by replacing faulty processors with spare processors. The fault tolerance of such a reconfigurable array depends on not only an algorithm for spare processor assignment but also the folloving factor of an organization of spare processors in the reconfigurable array: the number of spare processors; the number of processors that can be replaced by each spare processor; and how spare processors are connected with processors. We discuss a relationship between fault tolerance of reconfigurable arrays and their organizations of spare processors in terms of the smallest size of fatal sets and the reliability function. The smallest size of fatal sets is the smallest number of faulty processors for which the reconfigurable array cannot be failure-free as a processor array system no matter what reconfiguration is used. The reliability function is a function of time t whose value is the probability that the reconfigurable array is failure-free as a processor array system by time t when the best possible reconfiguration is used. First, we show that the larger smallest size of fatal sets a reconfigurable array has, the larger reliability function it has by some time. It suggests that it is important to maximize the smallest size of fatal sets in orer to improve the reliability function as well. Second, we present the best possible smallest size of fatal sets for n
A new class of (m2
The outputs of all gates in a circuit are assumed to be observable unber the highly observable condition, which is mainly based on the use of E-beam testers. When using the E-beam tester, it is desirable that the test set for a circuit is small and the test vectors in the test set can be applied in a successive and repetitive manner. For a combinational circuit, these requirements can be satisfied by modifying the circuit into a k-UCP circuit, which needs only a small number of tests for diagnosis. For a sequential circuit, however, even if the combinational portion has been modified into a k-UCP circuit, it is impossible that the test vectors for the combinational portion can always be applied in a successive and repetitive manner because of the existence of feedback loops. To solve this problem, the concept of k-UCP scan circuits is proposed in this paper. It is shown that the test vectors for the combinational portion in a k-UCP scan circuit can be applied in a successive and repetitive manner through a specially constructed scan-path. An efficient method of modifying a sequential circuit into a k-UCP scan circuit is also presented.
This paper proposes a Mean-Separated and Normalized Vector Quantizer with edge-Adaptive Feedback estimation and variable bit rates (AFMSN-VQ). The basic idea of the AFMSN-VQ is to estimate the statistical parameters of each coding block from its previous coded blocks and then use the estimated parameters to normalize the coding block prior to vector quantization. The edge-adaptive feedback estimator utilizes the interblock correlations of edge connectivity and gray level continuity to accurately estimate the mean and standard deviation of the coding block. The rate-variable VQ is to diminish distortion nonuniformity among image blocks of different activities and to improve the reconstruction quality of edges and contours to which the human vision is sensitive. Simulation results show that up to 2.7dB SNR gain of the AFMSN-VQ over the non-adaptive FMSN-VQ and up to 2.2dB over the 16
Teruhiko UKITA Satoshi KINOSHITA Kazuo SUMITA Hiroshi SANO Shin'ya AMANO
Resolving ambiguities in interpreting the user's utterances is one of the most fundamental problems in the development of a question-answering system. The process of disambiguating interpretations requires knowledge and inference functions on an objective task field. This paper describes a framework for understanding conversational language, using the multi-paradigm knowledge representation (
A theoretical conjecture on fractal dimensions of a dendrite distribution in neural networks is presented on the basis of the dendrite tree model. It is shown that the fractal dimensions obtained by the model are consistent with the recent experimental data.
Erkki OJA Hidemitsu OGAWA Jaroonsakdi WANGVIWATTANA
Principal Component Analysis (PCA) is a useful technique in feature extraction and data compression. It can be formulated as a statistical constrained maximization problem, whose solution is given by unit eigenvectors of the data covariance matrix. In a practical application like image compression, the problem can be solved numerically by a corresponding gradient ascent maximization algorithm. Such on-line algoritms can be good alternatives due to their parallelism and adaptivity to input data. The algorithms can be implemented in a local and homogeneous way in learning neural networks. One example is the Subspace Network. It is a regular layer of parallel artificial neurons with a learning rule that is completely homogeneous with respect to the neurons. However, due to the complete homogeneity, the learning rule does not converge to the unique basis given by the dominant eigenvectors, but any basis of this eigenvector subspace is possible. In many applications like data compression, the subspace is not sufficient but the actual eigenvectors or PCA coefficient vectors are needed. A new criterion, called the Weighted Subspace Criterion, is proposed, which makes a small symmetry-breaking change to the Subspace Criterion. Only the true eigenvectors are solutions. Making the corresponding change to the learning rule of the Subspace Network gives a modified learning rule, which can be still implemented on a homogeneous network architecture. In learning, the weight vectors will tend to the true eigenvectors.
Erkki OJA Hidemitsu OGAWA Jaroonsakdi WANGVIWATTANA
Artificial neurons and neural networks have been shown to perform Principal Component Analysis (PCA) when gradient ascent learning rules are used, which are related to the constrained maximization of statistical objective functions. Due to their parallelism and adaptivity to input data, such algorithms and their implementations in neural networks are potentially useful in feature extraction and data compression. In the companion paper(9), two such learning rules were derived from two criteria, the Subspace Criterion and the Weighted Subspace Criterion. It was shown that the only solutions to the latter problem are dominant eigenvectors of the data covariance matrix, which are the basis vectors of PCA. It was suggested by a simulation that the corresponding learning algorithm converges to these eigenvectors. A homogeneous neural network implementation was proposed for the algorithm. The learning algorithm is analyzed here in detail and it is shown that it can be approximated by a continuous-time differential equation that is obtained by averaging. It is shown that the asymptotically stable limits of this differntial equation are the eigenvectors. The neural network learning algorithm is further extended to a case in which each neuron has a sigmoidal nonlinear feedback activity function. Then no parameters specific to each neuron are needed, and the learning rule is fully homogeneous.