Yufei LIN Xuejun YANG Xinhai XU Xiaowei GUO
Scaling up the system size has been the common approach to achieving high performance in parallel computing. However, designing and implementing a large-scale parallel system can be very costly in terms of money and time. When building a target system, it is desirable to first build a smaller version using processing nodes with the same architecture as those in the target system. This allows us to achieve efficient and scalable prediction by using the smaller system to predict the performance of the target system. Such scalability prediction is critical because it enables system designers to evaluate different design alternatives so that a given performance goal can be achieved. As the de facto standard for writing parallel applications, MPI is widely used in large-scale parallel computing. By categorizing the discrete event simulation methods for MPI programs and analyzing the characteristics of scalability prediction, we propose a novel simulation method, called virtual-actual combined execution-driven (VACED) simulation, to achieve scalable prediction for MPI programs. The basic idea is to predict the execution time of an MPI program on a target machine by running it on a smaller system, predicting its communication time by virtual simulation and obtaining its sequential computation time by actual execution. We introduce a model for VACED simulation as well as the design and implementation of VACED-SIM, a lightweight simulator based on fine-grained activity and event definitions. We have validated our approach on a sub-system of Tianhe-1A. Our experimental results show that VACED-SIM exhibits higher accuracy and efficiency than MPI-SIM. In particular, for a target system with 1024 cores, the relative errors of VACED-SIM are less than 10% and the slowdowns are close to 1.
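To make the "virtual communication plus actual computation" split concrete, here is a deliberately coarse sketch. It assumes a bulk-synchronous program in which each step has a measured computation time per rank (the "actual" part) and a message size per rank whose cost is predicted by a simple latency/bandwidth model (the "virtual" part). The alpha-beta model and the function names `predict_step` and `predict_runtime` are illustrative assumptions; VACED-SIM itself uses fine-grained activity and event definitions that are not reproduced here.

```python
# Hypothetical sketch of the VACED idea: computation times come from actual
# execution on the small system; communication times come from a virtual model.
# The alpha-beta (latency/bandwidth) model is an assumption, not VACED-SIM's model.

def predict_step(comp_times, msg_bytes, alpha, beta):
    """Predicted duration of one bulk-synchronous step across all ranks.

    comp_times[r]: measured sequential computation time of rank r (seconds)
    msg_bytes[r]:  bytes rank r sends in this step (0 means no message)
    alpha:         assumed per-message latency (seconds)
    beta:          assumed bandwidth (bytes per second)
    """
    per_rank = []
    for t_comp, nbytes in zip(comp_times, msg_bytes):
        t_comm = alpha + nbytes / beta if nbytes > 0 else 0.0
        per_rank.append(t_comp + t_comm)
    # The step ends when the slowest rank finishes (barrier-style synchronization).
    return max(per_rank)


def predict_runtime(steps, alpha, beta):
    """Sum the predicted durations of a sequence of (comp_times, msg_bytes) steps."""
    return sum(predict_step(c, m, alpha, beta) for c, m in steps)


# Two steps on two ranks: rank 1 dominates the first step by computation,
# the second step by communication.
steps = [([1.0, 1.2], [1_000_000, 0]), ([0.5, 0.5], [0, 2_000_000_000])]
predicted = predict_runtime(steps, alpha=1e-6, beta=1e9)
```

The point of the split is that only the communication term needs a model of the (not yet built) target interconnect, while the computation term is measured on nodes that share the target's architecture.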
Xin WANG Filippos BALASIS Sugang XU Yoshiaki TANAKA
It is believed that wavelength switched optical network (WSON) technology is moving toward adoption in large-scale networks. Wavelength conversion and signal regeneration through reamplifying, reshaping, and retiming (3R) are beneficial for supporting the expansion of WSON. In many cases, these two functions can be technically integrated into a single shared physical component, namely the wavelength convertible 3R regenerator (WC3R). However, fully deploying such devices is infeasible due to their excessive cost. This motivates the investigation of the sparse placement of WC3Rs presented in this paper. A series of strategies is proposed based on knowledge of the network. Moreover, a novel adaptive routing and joint resource assignment algorithm is presented to provision lightpaths in WSON with sparsely placed WC3Rs. Extensive simulation trials are conducted under even and uneven distributions of WC3R resources. Each strategic feature is examined for its efficiency in lowering the blocking probability. The results reveal that a carefully designed sparse placement of WC3Rs can achieve performance comparable to that of the full WC3R placement scenario. Furthermore, the expenditure on WC3R deployment also depends on the type of WC3Rs used, characterized by their wavelength convertibility, i.e., fixed WC3R or tunable WC3R. This paper also investigates WSON from the perspective of cost and benefit by employing different types of WC3Rs in order to identify opportunities for more efficient WC3R investment.
Byeong-No KIM Chan-Ho HAN Kyu-Ik SOHNG
We propose a composite DCT basis line test signal to evaluate the video quality of a DTV encoder. The proposed composite test signal contains a frame index, a calibration square wave, and 7-field basis signals. The results show that the proposed method may be useful as an in-service video quality verifier that uses an ordinary oscilloscope instead of special equipment.
There is a relentless push for cost and size reduction in optical transmitters and receivers for fiber-optic links. Monolithically integrated optical chips in InP and Si may be a way to leap ahead of this trend. We discuss uses of integration technology to accomplish various telecommunications functions.
Satoshi TAOKA Daisuke TAKAFUJI Toshimasa WATANABE
A vertex cover of a given graph G = (V,E) is a subset N of V such that N contains at least one of u and v for every edge (u,v) of E. The minimum weight vertex cover problem (MWVC for short) is the problem of finding, for a given graph G = (V,E) with a weight w(v) for each vertex v of V, a vertex cover N such that the sum w(N) of w(v) over all v of N is minimum. In this paper, we consider MWVC with w(v) a positive integer for every v of V. We propose simple procedures for postprocessing the output of MWVC algorithms. Furthermore, five existing approximation algorithms, with and without the proposed procedures incorporated, are implemented and evaluated through computational experiments.
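As an illustration of "approximation algorithm plus postprocessing", the sketch below pairs one well-known approximation algorithm for MWVC, the local-ratio 2-approximation of Bar-Yehuda and Even, with one simple postprocessing step: removing a cover vertex whenever all of its neighbors are already in the cover. The abstract does not state which procedures the authors propose, so this redundancy-removal step is an illustrative assumption, not their method.

```python
def local_ratio_cover(n, edges, w):
    """Bar-Yehuda and Even local-ratio 2-approximation for weighted vertex cover."""
    residual = list(w)
    for u, v in edges:
        # Charge the same amount to both endpoints; one of them drops to zero.
        eps = min(residual[u], residual[v])
        residual[u] -= eps
        residual[v] -= eps
    # Every edge now has at least one endpoint with zero residual weight.
    return {x for x in range(n) if residual[x] == 0}


def remove_redundant(cover, edges, w):
    """Illustrative postprocessing: drop cover vertices whose neighbors are all covered."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cover = set(cover)
    # Try heavier vertices first so the largest savings are taken greedily.
    for v in sorted(cover, key=lambda x: -w[x]):
        if all(nb in cover for nb in adj.get(v, ())):
            cover.discard(v)  # every incident edge stays covered by a neighbor
    return cover


def is_cover(cover, edges):
    return all(u in cover or v in cover for u, v in edges)


# Path 0-1-2 with unit weights: the local-ratio pass returns {0, 1},
# and postprocessing removes vertex 0, leaving the optimal cover {1}.
edges = [(0, 1), (1, 2)]
w = [1, 1, 1]
raw = local_ratio_cover(3, edges, w)
post = remove_redundant(raw, edges, w)
```

Removing a vertex only when all its neighbors are in the cover keeps the result a valid cover and can never increase its weight, which is why such a step is safe to bolt onto any MWVC algorithm.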
The performance of cooperative spectrum sensing (CSS) is limited not only by imperfect sensing channels but also by imperfect reporting channels. To improve the transmission reliability of the reporting channels, an object based cooperative spectrum sensing scheme with best relay (Pe-BRCS) is proposed, in which the best relay is selected by minimizing the total reporting error probability to improve the sensing performance. Numerical results show that the Pe-BRCS scheme reduces the total reporting error probability and improves the sensing performance.
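The selection rule "pick the relay that minimizes the total reporting error probability" can be sketched under a simple model that is our assumption: each secondary user sends a one-bit decision to a decode-and-forward relay, the relay forwards it to the fusion center (FC), each hop flips the bit independently, and the report arrives wrong only if exactly one hop flips it. The names `p_sr` and `p_rf` are hypothetical.

```python
def reporting_error(p_su_relay, p_relay_fc):
    """Error probability of a binary report forwarded by a decode-and-forward relay.

    The report is wrong at the fusion center if exactly one of the two hops flips it.
    """
    return p_su_relay * (1 - p_relay_fc) + (1 - p_su_relay) * p_relay_fc


def select_best_relay(p_sr, p_rf):
    """p_sr[k][r]: error prob. SU k -> relay r; p_rf[r]: error prob. relay r -> FC.

    Returns (index of the best relay, its total reporting error probability).
    """
    totals = []
    for r in range(len(p_rf)):
        # Total reporting error probability summed over all secondary users.
        totals.append(sum(reporting_error(row[r], p_rf[r]) for row in p_sr))
    best = min(range(len(totals)), key=totals.__getitem__)
    return best, totals[best]


# Two SUs, two candidate relays.
best, total = select_best_relay([[0.1, 0.2], [0.1, 0.05]], [0.05, 0.1])
```

Here relay 0 gives 0.14 + 0.14 = 0.28 and relay 1 gives 0.26 + 0.14 = 0.40, so relay 0 is selected even though relay 1 has the better link for the second user.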
Wenxin YU Weichen WANG Minghui WANG Satoshi GOTO
Multi-view video can provide users with three-dimensional (3-D) and virtual reality perception through multiple viewing angles. In recent years, depth image-based rendering (DIBR) has been widely used to synthesize virtual view images in free viewpoint television (FTV) and 3-D video. To conceal the zero-region more accurately and improve the quality of a synthesized virtual view frame, this paper proposes an integrated hole-filling algorithm for view synthesis. The proposed algorithm contains five parts: an algorithm for distinguishing different regions, foreground and background boundary detection, texture image isophote detection, a textural and structural isophote prediction algorithm, and an in-painting algorithm with gradient priority order. Based on the texture isophote prediction with a geometrical principle and the in-painting algorithm with a gradient priority order, the boundary information of the foreground is considerably clearer, and the texture information in the zero-region can be concealed much more accurately than in previous works. Visual quality mainly depends on the distortion of the structural information. Experimental results indicate that the proposed algorithm considerably improves not only the objective quality of the virtual image but also its subjective quality. In particular, the algorithm preserves the boundary contours of the foreground objects as well as the textural and structural information.
Shenchuan LIU Masaaki FUJIYOSHI Hitoshi KIYA
This paper introduces amplitude-only images to image trading systems in which not only the copyright of images but also the privacy of consumers is protected. In the latest framework for image trading systems, an image is divided into an unrecognizable piece and a recognizable but distorted piece to simultaneously protect the privacy of a consumer and the copyright of the image. The proposed scheme uses amplitude-only images, which are completely unrecognizable, as the former piece, whereas conventional schemes leave recognizable parts in that piece, which degrades privacy protection performance. Moreover, the proposed scheme improves robustness against copyright violation regardless of the digital fingerprinting technique used, because an amplitude-only image is larger than the corresponding piece in the conventional scheme. In addition, because a phase-only image is used as the second piece in the proposed scheme, the consumer can confirm what he/she bought. Experimental results show the effectiveness of the proposed scheme.
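The two pieces can be illustrated with the 2-D DFT. In the sketch below, the amplitude-only image keeps every DFT magnitude but sets all phases to zero, while the phase-only image keeps every phase but sets all magnitudes to one. Multiplying the spectra of the two pieces restores the original spectrum exactly. The function names are hypothetical, and the fingerprinting and trading protocol of the paper are not modeled.

```python
import numpy as np


def split_image(img):
    """Split a real image into an amplitude-only and a phase-only image via the 2-D DFT."""
    F = np.fft.fft2(img)
    # Amplitude-only: keep |F|, zero every phase (|F| is real and symmetric, so the result is real).
    amplitude_only = np.real(np.fft.ifft2(np.abs(F)))
    # Phase-only: keep the phase, set every magnitude to one.
    phase_only = np.real(np.fft.ifft2(np.exp(1j * np.angle(F))))
    return amplitude_only, phase_only


def merge_pieces(amplitude_only, phase_only):
    """Recombine the pieces: |F| * exp(j * angle(F)) = F, then invert the DFT."""
    F = np.fft.fft2(amplitude_only) * np.fft.fft2(phase_only)
    return np.real(np.fft.ifft2(F))


rng = np.random.default_rng(0)
img = rng.random((16, 16))
amp_img, phase_img = split_image(img)
restored = merge_pieces(amp_img, phase_img)
```

Both pieces are real images of the same size as the original, which is consistent with the abstract's point that the amplitude-only piece offers more room for a fingerprint than a small unrecognizable region does.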
Dong Phuong DINH Fumiko HARADA Hiromitsu SHIMAKAWA
The paper proposes the PMD method for designing an introductory programming practice course plan that is inclusive for all learners and stable throughout a course. To achieve such a course plan, the method uses personas, each of which represents learners with similar motivation to study programming. The learning of the personas is directed toward the course goal by an enforcement resulting from the discipline, which integrates effective learning strategies with the affective components of the personas. Under this enforcement, services that facilitate and promote the learning of each persona can be decided based on the motivation components of each persona, the motivational effects of the services, and the cycle of self-efficacy. Applying the method to about 500 freshmen in a C programming practice course has shown that it is a successful approach to course design.
Kazuhito MATSUDA Go HASEGAWA Masayuki MURATA
Application-level routing, which chooses an end-to-end traffic route that relays through other end hosts, can improve user-perceived performance metrics such as end-to-end latency and available bandwidth. However, selfish route selection by each end user can decrease path performance due to overload caused by overlapping routes, and can increase the inter-ISP transit cost because more transit links are used than with native IP routing. In this paper, we first formally define an optimization problem for selecting application-level traffic routes with the aim of maximizing end-to-end network performance under a transit cost constraint. We then propose an application-level traffic routing method based on distributed simulated annealing to obtain good solutions to the problem. We evaluate the performance of the proposed method by assuming that PlanetLab nodes use application-level traffic routing. We show that the proposed routing method can considerably improve network performance without increasing transit cost. In particular, when end-to-end latency is used as the routing metric, the number of overloaded end-to-end paths can be reduced by about 65% compared with non-coordinated methods. We also demonstrate that the proposed method can react to dynamic changes in traffic demand and select appropriate routes.
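The annealing idea can be sketched in a centralized form, which is a simplification of the distributed method in the abstract. Each flow picks one candidate route (index 0 being the native IP route). Moves that exceed the transit-cost budget are rejected. The objective, a large penalty per overloaded link plus total latency, is our assumption standing in for the paper's performance metric, and every name below is hypothetical.

```python
import math
import random


def anneal_routes(candidates, capacity, budget, iters=5000, t0=10.0, alpha=0.999, seed=1):
    """Centralized simulated-annealing sketch for application-level route selection.

    candidates[f]: list of (latency, transit_cost, links) for flow f; index 0 is native IP.
    capacity[link]: unit flows a link carries before overload (missing means unlimited).
    budget: upper bound on total transit cost.
    """
    rng = random.Random(seed)

    def transit(ch):
        return sum(candidates[f][c][1] for f, c in enumerate(ch))

    def cost(ch):
        load, latency = {}, 0.0
        for f, c in enumerate(ch):
            lat, _, links = candidates[f][c]
            latency += lat
            for link in links:
                load[link] = load.get(link, 0) + 1
        overloaded = sum(1 for link, n in load.items() if n > capacity.get(link, math.inf))
        return 1000.0 * overloaded + latency  # heavy penalty per overloaded link

    choice = [0] * len(candidates)
    assert transit(choice) <= budget, "native routing must satisfy the budget"
    cur = cost(choice)
    best, best_cost = list(choice), cur
    t = t0
    for _ in range(iters):
        t = max(t * alpha, 1e-6)  # geometric cooling
        f = rng.randrange(len(candidates))
        trial = list(choice)
        trial[f] = rng.randrange(len(candidates[f]))
        if trial == choice or transit(trial) > budget:
            continue  # no move, or the move violates the transit-cost constraint
        c = cost(trial)
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if c <= cur or rng.random() < math.exp((cur - c) / t):
            choice, cur = trial, c
            if cur < best_cost:
                best, best_cost = list(choice), cur
    return best, best_cost


# Three flows share link A-B (capacity 1); each may detour via its own relay at transit cost 1.
flows = [[(10.0, 0, ["A-B"]), (15.0, 1, [f"A-R{f}", f"R{f}-B"])] for f in range(3)]
best, best_cost = anneal_routes(flows, {"A-B": 1}, budget=2)
```

With a budget of 2, at most two flows can detour, which is exactly enough to remove the overload. Accepting occasional worse moves lets the search pass through the intermediate state in which A-B is still overloaded.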
Isameldin Mohammed SULIMAN Janne J. LEHTOMÄKI Kenta UMEBAYASHI Marcos KATZ
It is well known that cognitive radio (CR) techniques have great potential for meeting future demands on scarce radio spectrum resources, for example, by enabling the use of spectrum bands that are temporarily not used by the primary users (PUs) licensed to operate on those bands. Spectrum sensing is a well-known CR technique for detecting those unused bands. However, spectrum sensing outcomes cannot be perfect: there will always be some misdetections and false alarms, which affect performance and thereby degrade the quality of service (QoS) of PUs. Continuous time Markov chain (CTMC) based modeling has been widely used in the literature to evaluate the performance of CR networks (CRNs). A major limitation of the available literature is that no single work models all the key factors and realistic elements, such as the effect of imperfect sensing and state dependent transition rates. In this paper, we present a CTMC based model for analyzing the performance of CRNs. The proposed model differs from existing models by accurately incorporating key elements such as fully state dependent transition rates, multi-channel support, handoff capability, and imperfect sensing. We derive formulas for the primary termination probability, secondary success probability, secondary blocking probability, secondary forced termination probability, and radio resource utilization. The results show that incorporating fully state dependent transition rates in the CTMC can significantly improve analysis accuracy, yielding a more realistic and accurate analytical model. The results of extensive Monte Carlo simulations confirm the validity of the proposed model.
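The basic CTMC workflow (build the generator Q, solve πQ = 0 with Σπ = 1, then read performance measures off π) can be sketched on a toy model. The model is our assumption and is far simpler than the one in the abstract: a single channel with states idle, PU busy, and SU busy, where a PU arrival preempts an active SU. It ignores multiple channels, handoff, and imperfect sensing.

```python
import numpy as np


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])  # balance equations plus normalization
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def single_channel_model(lam_p, lam_s, mu_p, mu_s):
    """Toy one-channel model: state 0 idle, 1 PU busy, 2 SU busy.

    A PU arrival while an SU transmits preempts the SU (2 -> 1), i.e., forced termination.
    """
    Q = np.array([
        [-(lam_p + lam_s), lam_p, lam_s],
        [mu_p, -mu_p, 0.0],
        [mu_s, lam_p, -(mu_s + lam_p)],
    ])
    pi = steady_state(Q)
    su_blocking = pi[1] + pi[2]               # an SU arrival finds the channel busy (PASTA)
    su_forced_term = lam_p / (lam_p + mu_s)   # a PU arrives before the SU session ends
    return pi, su_blocking, su_forced_term


pi, blocking, forced = single_channel_model(1.0, 1.0, 1.0, 1.0)
```

With all rates equal to 1, the balance equations give π = (1/3, 1/2, 1/6), so the SU blocking probability is 2/3 and the forced termination probability is 1/2. The full model in the abstract differs mainly in the size of the state space and in making the rates depend on the state.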
This paper presents an efficient algorithm for reporting all intersections among n given segments in the plane using a work space of arbitrarily given size. More exactly, given a parameter s between Ω(1) and O(n) specifying the size of the work space, the algorithm reports all segment intersections in roughly O(n^2/√s + K) time using O(s) words of O(log n) bits, where K is the total number of intersecting pairs. The time complexity can be improved to O((n^2/s) log s + K) when the input segments have only a limited number of distinct slopes.
Cyclic Redundancy Check (CRC) is a well-known error detection scheme used to detect corruption of digital content in digital networks and storage devices. Since it is a compute-intensive process that adversely affects performance, hardware acceleration using FPGAs has been tried, and satisfactory performance has been achieved. However, the recently expanded use of networks and storage systems requires support for a variety of CRC standards. Traditional hardware designs based on the LFSR (Linear Feedback Shift Register) tend to have a fixed structure without such flexibility. Here, a fully-adaptable CRC accelerator based on a table-based algorithm is proposed. The table-based algorithm is a flexible method commonly used in software implementations. It has rarely been implemented in hardware, since its operational speed is believed to be insufficient. However, by using a pipelined structure and making efficient use of the memory modules in FPGAs, the table-based fixed CRC accelerators were found to achieve better performance than traditional implementations. Based on this implementation, a fully-adaptable CRC accelerator that eliminates the need for many non-adaptable CRC implementations is proposed. The accelerator can process an arbitrary amount of input data and generate the CRC for any known CRC standard, with generator polynomials of up to 65 bits, at run time. Further, we modify the table generation algorithm to decrease its space complexity from O(nm) to O(n). On a Xilinx Virtex 6 LX550T board, the fully-adaptable accelerators occupy between 1% and 2% of the area to produce a maximum of 289.8 Gbps at 283.1 MHz if BRAM is deployed, or between 1.6% and 14% of the area for 418 Gbps at 408.9 MHz if the tables are implemented in logic. The proposed architecture allows throughput to be expanded further by increasing the number of input bits M processed at a time.
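For readers unfamiliar with the table-based algorithm, the sketch below shows the usual byte-wise software form. Its run-time parameters `width`, `poly`, `init`, and `xorout` follow common CRC catalogue conventions. This sketch makes several simplifying assumptions: it is MSB-first (non-reflected) only, requires a width of at least 8, and does not model the FPGA pipeline or the authors' reduced-space table generation. Each step consumes a whole byte with one table lookup instead of eight LFSR shifts, which is what makes the method attractive for wide, pipelined datapaths.

```python
def make_table(width, poly):
    """Build the 256-entry lookup table for an MSB-first (non-reflected) CRC."""
    top, mask = 1 << (width - 1), (1 << width) - 1
    table = []
    for byte in range(256):
        crc = byte << (width - 8)  # place the byte in the top 8 bits of the register
        for _ in range(8):
            # One LFSR step: shift left, and XOR in the polynomial if the top bit fell out.
            crc = ((crc << 1) ^ poly) & mask if crc & top else (crc << 1) & mask
        table.append(crc)
    return table


def crc_table_driven(data, width, poly, init, xorout):
    """Process one input byte per step with a table lookup instead of bit-serial shifts."""
    assert width >= 8, "this byte-wise sketch assumes width >= 8"
    mask = (1 << width) - 1
    table = make_table(width, poly)  # parameters chosen at run time, not fixed in hardware
    crc = init
    for byte in data:
        idx = ((crc >> (width - 8)) ^ byte) & 0xFF
        crc = ((crc << 8) & mask) ^ table[idx]
    return crc ^ xorout


check16 = crc_table_driven(b"123456789", 16, 0x1021, 0xFFFF, 0x0000)              # CRC-16/CCITT-FALSE
check32 = crc_table_driven(b"123456789", 32, 0x04C11DB7, 0xFFFFFFFF, 0xFFFFFFFF)  # CRC-32/BZIP2
```

Because Python integers are unbounded, the same code also runs for 64-bit registers (65-bit generator polynomials). Reflected standards such as the zlib CRC-32 would additionally need bit-reversed input and output handling.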
Naoya OKADA Yuichi NAKAMURA Shinji KIMURA
Nonvolatile flip-flops enable leakage power reduction in logic circuits and a quick return from standby mode. However, they have limited write endurance, and their power consumption for writing is larger than that of a conventional D flip-flop (DFF). For this reason, it is important to reduce the number of write operations. Write operations can be reduced by stopping the clock signal to synchronous flip-flops, because write operations are executed only when the clock is applied to the flip-flops. For such clock gating, a method that uses the Exclusive OR (XOR) of the current value and the new value as the control signal is well known. The XOR based method is effective, but there are several cases where write operations can be avoided even if the current value and the new value are different. This paper proposes a method to detect such unnecessary write operations based on state transition analysis, together with a write control method that saves power in nonvolatile flip-flops. In the method, redundant bits are detected to reduce the number of write operations: if the next state and the outputs do not depend on some current bit, that bit is redundant and need not be written. The method is based on Binary Decision Diagram (BDD) calculation. We construct write control circuits that stop the clock signal by converting the BDDs representing the set of states where write operations are unnecessary. The proposed method can be combined with the XOR based method to reduce the total number of write operations. We apply the combined method to several benchmark circuits and estimate power consumption with Synopsys NanoSim. On average, power consumption is reduced by 15.0% compared with the XOR based method alone.
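The redundant-bit rule can be checked by brute force on a toy state machine. This replaces the BDD computation with exhaustive enumeration, which only works for tiny examples. The FSM below is a hypothetical 2-bit example: bit 0 is a "valid" flag, bit 1 is a data bit, and the output is data AND valid. Whenever a changed bit can be left unwritten without altering the next state or the output for any input, its write is skipped. Otherwise, the XOR rule decides.

```python
import random


def next_fn(state, x):
    """Toy FSM: bit0 = valid flag (from input en), bit1 = data (from input d)."""
    en, d = x
    return (d << 1) | en


def out_fn(state, x):
    # Output is data AND valid, so the data bit only matters while valid is 1.
    return (state >> 1) & state & 1


INPUTS = [(en, d) for en in (0, 1) for d in (0, 1)]


def skip_is_safe(state, skip_mask):
    """Leaving the bits in skip_mask unwritten is safe if flipping them never changes
    the next state or the output for any input (brute force instead of BDDs)."""
    alt = state ^ skip_mask
    return all(next_fn(state, x) == next_fn(alt, x) and out_fn(state, x) == out_fn(alt, x)
               for x in INPUTS)


def write_step(stored, nxt, width=2):
    """XOR-based gating plus redundant-bit skipping; returns (new stored value, #writes)."""
    diff = stored ^ nxt  # XOR gating: only bits that change are write candidates
    skip = 0
    for i in range(width):
        bit = 1 << i
        if diff & bit and skip_is_safe(nxt, skip | bit):  # check skipped bits jointly
            skip |= bit
    write_mask = diff & ~skip
    new_stored = (stored & ~write_mask) | (nxt & write_mask)
    return new_stored, bin(write_mask).count("1")


def run(inputs, redundant_skip):
    stored, writes, outs = 0, 0, []
    for x in inputs:
        outs.append(out_fn(stored, x))
        nxt = next_fn(stored, x)
        if redundant_skip:
            stored, w = write_step(stored, nxt)
        else:  # plain XOR-based gating: write exactly the bits that change
            w = bin(stored ^ nxt).count("1")
            stored = nxt
        writes += w
    return outs, writes


rng = random.Random(0)
seq = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(200)]
ref_outs, xor_writes = run(seq, redundant_skip=False)
outs, writes = run(seq, redundant_skip=True)
```

Skipped bits are checked jointly, not one at a time, because two bits can each be individually redundant without being redundant together. In this toy machine, for example, the data bit is redundant when valid is 0 and the valid bit is redundant when the data bit is 0, but flipping both changes the output.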
Keisuke KATO Fumitaka ABE Kazuyuki WAKABAYASHI Chuan GAO Takafumi YAMADA Haruo KOBAYASHI Osamu KOBAYASHI Kiichi NIITSU
This paper describes algorithms for generating low intermodulation-distortion (IMD) two-tone sinewaves, for applications such as ADC testing for communications, using an arbitrary waveform generator (AWG) or a multi-bit ΣΔ DAC inside an SoC. The nonlinearity of the DAC generates distortion components, and we propose eight methods that precompensate for the IMD using DSP algorithms to produce low-IMD two-tone signals. Theoretical analysis, simulation, and experimental results all demonstrate the effectiveness of our approach.
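The abstract does not list the eight precompensation methods, so the sketch below uses one generic technique that is our assumption, not one of the authors' methods. It models the DAC as a memoryless cubic nonlinearity y = x + a3·x³ with a known a3 and applies first-order predistortion that subtracts a3·x³ before the DAC. Coherent sampling puts each tone in a single FFT bin, so the third-order products at 2f1 - f2 and 2f2 - f1 can be read directly from the spectrum.

```python
import numpy as np

N = 4096          # samples per record (coherent sampling: integer cycles per record)
K1, K2 = 50, 60   # tone bins; IM3 products land in bins 2*K1-K2 and 2*K2-K1
A = 0.5           # amplitude of each tone
A3 = 0.1          # assumed cubic coefficient of the DAC nonlinearity


def dac(x):
    """Hypothetical memoryless DAC nonlinearity y = x + a3 * x**3."""
    return x + A3 * x**3


def two_tone():
    n = np.arange(N)
    return A * (np.sin(2 * np.pi * K1 * n / N) + np.sin(2 * np.pi * K2 * n / N))


def im3_level(y):
    """Larger of the two IM3 amplitudes, read from the single-sided spectrum."""
    spec = np.abs(np.fft.rfft(y)) * 2 / N
    return max(spec[2 * K1 - K2], spec[2 * K2 - K1])


x = two_tone()
plain = im3_level(dac(x))
# First-order predistortion: subtract the cubic term the DAC is expected to add.
predistorted = im3_level(dac(x - A3 * x**3))
```

Without precompensation, each IM3 product has amplitude (3/4)·a3·A³. After predistortion, the cubic term cancels and only smaller fifth- and higher-order residues remain. A practical method must also estimate a3 (or a richer DAC model) from measurements.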
Shunichi TSUNODA Abu Hena Al MUKTADIR Eiji OKI
Smart OSPF (S-OSPF), a load-balancing, shortest-path-based routing scheme, was introduced to improve the routing performance of networks running OSPF, assuming that exact traffic demands are known. S-OSPF distributes traffic from a source node to its neighbor nodes; after reaching the neighbor nodes, traffic is routed according to the OSPF protocol. However, in practice, exact traffic demands are difficult to obtain, and distributing unequal amounts of traffic to multiple neighbor nodes requires complex functionality at the source. This paper investigates, for the first time, non-split S-OSPF under the hose model, in which only the total amount of traffic that each node injects into the network and the total amount of traffic that each node receives from the network are known, with the goal of minimizing the network congestion ratio (the maximum link utilization over all links). In non-split S-OSPF, traffic from a source node to a destination node is not split over multiple routes; in other words, it goes via only one neighbor node to the destination node. The routing decision under the hose model is formulated as an integer linear programming (ILP) problem. Since the ILP problem is difficult to solve in practical time, this paper proposes a heuristic algorithm. In the routing decision process, the proposed algorithm gives the highest priority to the node pair with the largest product of the total traffic injected by one node and the total traffic received by the other node, where both volumes are specified by the hose model, and lets a source node select the neighbor node that minimizes the network congestion ratio for the worst-case traffic condition specified by the hose model. The network congestion ratios of the non-split S-OSPF scheme are compared with those of the split S-OSPF and classical shortest path routing (SPR) schemes.
Numerical results show that the non-split S-OSPF scheme offers lower network congestion ratios than the classical SPR scheme and achieves network congestion ratios comparable to those of the split S-OSPF scheme for larger networks. To validate the non-split S-OSPF scheme experimentally on a testbed network, we develop prototypes of the non-split S-OSPF path computation server and the non-split S-OSPF router. The functionalities of these prototypes are demonstrated in a non-split S-OSPF network.
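The priority rule of the heuristic can be sketched on its own. For each ordered node pair (s, d), the key is alpha[s] · beta[d], where alpha is the hose-model bound on traffic injected by a node and beta is the bound on traffic received by a node (our names). Pairs are then processed from the largest key down. This sketch covers only the ordering step. The per-pair choice of the neighbor that minimizes the worst-case congestion ratio requires an optimization over the hose-model traffic set and is not shown.

```python
from itertools import permutations


def hose_priority_order(alpha, beta):
    """Order (source, destination) pairs by alpha[s] * beta[d], largest first.

    alpha[s]: hose-model bound on traffic node s injects into the network
    beta[d]:  hose-model bound on traffic node d receives from the network
    """
    pairs = permutations(sorted(alpha), 2)  # all ordered pairs with s != d
    # sorted() is stable, so equal-priority pairs keep their enumeration order.
    return sorted(pairs, key=lambda p: alpha[p[0]] * beta[p[1]], reverse=True)


order = hose_priority_order({"A": 3, "B": 1, "C": 2}, {"A": 1, "B": 4, "C": 2})
```

In this example, (A, B) with key 12 is routed first, followed by (C, B) with key 8 and (A, C) with key 6. Pairs with the heaviest potential demand therefore claim the least congested neighbor choices before lighter pairs.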
Junbo ZHANG Fuping PAN Bin DONG Qingwei ZHAO Yonghong YAN
In this paper, we present a novel method for automatic pronunciation quality assessment. Unlike the popular “Goodness of Pronunciation” (GOP) method, this method does not map the decoding confidence into a pronunciation quality score but directly differentiates utterances of different pronunciation quality. In this method, the student's utterance is decoded twice. The first decoding obtains the time boundaries of each phone in the utterance by forced alignment with a conventionally trained acoustic model (AM). The second decoding differentiates the pronunciation quality of each triphone using a specially trained AM, in which triphones of different pronunciation quality are trained as different units, and the model is trained discriminatively to maximize discrimination among triphones that share the same name but have different pronunciation quality scores. Because the decoding network in the second decoding includes triphones of different pronunciation quality, phone-level scores can be obtained directly from the decoding result. The phone-level scores are combined into sentence-level scores using the maximum entropy criterion. The experimental results show that scoring performance is significantly improved compared with the GOP method, especially at the sentence level.
Ki Sup HONG Sang Hoon LEE Lynn CHOI
Existing MANET routing protocols may not be efficient for mobile sensor networks (MSNs) since they generate too much control traffic by relying on flooding or route maintenance messages. Furthermore, the peer-to-peer communication patterns assumed in MANETs would exacerbate the traffic around sink nodes in MSNs. In this paper, we propose traffic adaptive routing (TAR) for MSNs; it reduces control packets by analyzing and predicting the source, volume, and patterns of both traffic and mobility. Through its analysis and prediction of mobility, TAR also copes with dynamic topology changes by carrying out a fast route recovery process. Our theoretical analysis shows that TAR can reduce unnecessary control packet flooding by 53% on average compared with AODV. We implement TAR in NS-2. Our experimental evaluation confirms that TAR not only improves network and energy performance for MSNs but also can be a practical routing solution for MANETs and WSNs compared with existing ad hoc routing protocols.
This paper presents a method for learning an overcomplete, nonnegative dictionary and for obtaining the corresponding coefficients so that a group of nonnegative signals can be sparsely represented by them. This is accomplished by posing the learning as a problem of nonnegative matrix factorization (NMF) with maximization of the incoherence of the dictionary and of the sparsity of the coefficients. By incorporating a dictionary-incoherence penalty and a sparsity penalty in the NMF formulation and then adopting a hierarchically alternating optimization strategy, we show that the problem can be cast as two sequential quadratic optimization problems. Each problem can be solved explicitly, so the whole problem can be solved efficiently, which leads to the proposed algorithm, sparse hierarchical alternating least squares (SHALS). The SHALS algorithm iteratively solves the two problems, which correspond to learning the dictionary and estimating the coefficients for reconstructing the signals. Numerical experiments demonstrate that the new algorithm performs better than the nonnegative K-SVD (NN-KSVD) algorithm and several other well-known algorithms, and that its computational cost is remarkably lower than that of the compared algorithms.
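The "explicitly solvable quadratic subproblem" structure can be illustrated with standard hierarchical ALS (HALS) for NMF, extended with an L1 sparsity penalty on the coefficients. This is a simplified stand-in, not SHALS itself: the dictionary-incoherence penalty of the abstract is omitted. Each column of W and each row of H is updated in closed form as a clipped least-squares solution, so every update exactly minimizes the objective with respect to that block.

```python
import numpy as np


def sparse_hals(X, k, lam=0.1, n_iter=100, seed=0, eps=1e-12):
    """HALS for NMF with an L1 penalty lam * sum(H) on the coefficients.

    Minimizes 0.5 * ||X - W H||_F^2 + lam * sum(H) subject to W >= 0, H >= 0.
    (No dictionary-incoherence term; that part of SHALS is not modeled.)
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Dictionary step: exact nonnegative update of one column of W at a time.
        XHt, HHt = X @ H.T, H @ H.T
        for j in range(k):
            W[:, j] = np.maximum(0.0, W[:, j] + (XHt[:, j] - W @ HHt[:, j]) / max(HHt[j, j], eps))
        # Coefficient step: exact update of one row of H, shrunk by the L1 penalty.
        WtX, WtW = W.T @ X, W.T @ W
        for j in range(k):
            H[j, :] = np.maximum(0.0, H[j, :] + (WtX[j, :] - WtW[j, :] @ H - lam) / max(WtW[j, j], eps))
    return W, H


def objective(X, W, H, lam):
    return 0.5 * np.linalg.norm(X - W @ H) ** 2 + lam * H.sum()


rng = np.random.default_rng(1)
X = rng.random((20, 5)) @ rng.random((5, 30))       # nonnegative, rank-5 data
W0, H0 = sparse_hals(X, 5, lam=0.01, n_iter=0)      # the random initialization
W, H = sparse_hals(X, 5, lam=0.01, n_iter=200)
```

Because each block update is an exact minimizer, the objective never increases from one sweep to the next. This property is what makes hierarchical alternating schemes cheap and stable compared with generic gradient methods.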
Min-Ho KA Aleksandr I. BASKAKOV Anatoliy A. KONONOV
A method for specifying weighting functions for a spaceborne/airborne interferometric synthetic aperture radar (SAR) sensor for Earth observation and environment monitoring is introduced. The method is based on designing an optimum mismatched filter that minimizes the total power in the sidelobes located outside a specified range region around the peak of the system point-target response (i.e., the impulse response function), under a constraint imposed on the peak value. It is shown that this method achieves an appreciable improvement in accuracy performance without degrading the range resolution.
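The design criterion in this abstract, minimizing total sidelobe power outside a region around the peak with the peak value held fixed, is a constrained least-squares problem with a closed-form Lagrange solution h = Q⁻¹a₀ / (a₀ᵀQ⁻¹a₀). Here Q is the sidelobe-energy matrix and a₀ is the row of the convolution matrix that produces the peak. The sketch below uses a Barker-13 code as a stand-in for the SAR transmit signal; that choice, the filter length, and all function names are assumptions made for illustration.

```python
import numpy as np

BARKER13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)


def conv_matrix(s, m):
    """Matrix A with A @ h == np.convolve(s, h) for a length-m filter h."""
    n = len(s)
    A = np.zeros((n + m - 1, m))
    for j in range(m):
        A[j:j + n, j] = s
    return A


def mismatched_filter(s, m, halfwidth=1):
    """Minimize sidelobe energy outside +/- halfwidth of the peak, with the peak fixed to 1."""
    A = conv_matrix(s, m)
    c = (len(s) + m - 2) // 2                      # index of the response peak
    side = np.r_[0:c - halfwidth, c + halfwidth + 1:A.shape[0]]
    As = A[side]
    Q = As.T @ As                                  # sidelobe-energy quadratic form
    a0 = A[c]
    h = np.linalg.solve(Q, a0)                     # Lagrange solution, up to scale
    return h / (a0 @ h)                            # enforce the peak constraint a0 @ h = 1


def sidelobe_energy(s, h, halfwidth=1):
    y = np.convolve(s, h)
    c = (len(s) + len(h) - 2) // 2
    mask = np.ones(len(y), dtype=bool)
    mask[c - halfwidth:c + halfwidth + 1] = False  # exclude the main-lobe region
    return float(np.sum(y[mask] ** 2))


h_mm = mismatched_filter(BARKER13, 39)             # longer, mismatched weighting
h_mf = BARKER13[::-1] / 13.0                       # matched filter with unit peak
```

The matched filter, zero-padded to the same length, is one feasible point of the problem, so the optimum can only lower the sidelobe energy at the same peak value. The cost is a small SNR loss, which is the trade-off a mismatched weighting function accepts in exchange for better sidelobe suppression.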