Weiqiang LIU Xiaohui CHEN Weidong WANG
This work investigates the cell range expansion (CRE) possible with time-domain multiplexing inter-cell interference coordination (TDM ICIC) in heterogeneous cellular networks (HCN). CRE is proposed to enable a user to connect to a picocell even when it is not the cell with the strongest received power. However, the users in the expanded region suffer severe interference from the macrocells. To alleviate the cross-tier interference, TDM ICIC is proposed to improve the SIR of pico users. In contrast to previous studies on CRE with TDM ICIC, which rely mostly on simulations, we give theoretical analysis results for different types of users in HCN with CRE and TDM ICIC under the Poisson Point Process (PPP) model, especially for the users in the expanded region of picocells. We analyze the outage probability and average ergodic rate based on the connect probability and statistical distance we obtain in advance. Furthermore, we analyze the optimal ratio of almost blank subframes (ABS) and bias factor of picocells in terms of the network fairness, which is useful in the parameter design of a two-tier HCN.
Hisashi IWAMOTO Yuji YANO Yasuto KURODA Koji YAMAMOTO Shingo ATA Kazunari INOUE
Network traffic keeps increasing due to the increasing popularity of video streaming services. Routers and switches in wire-line networks require guaranteed line rates as high as 20 Gbps as well as advanced quality of service (QoS). A hybrid SRAM and DRAM architecture was previously presented with the benefits of high speed and high density, but it requires complex memory management. As a result, it has hardly supported large numbers of queues, which are an effective approach to satisfying the QoS requirements. This paper proposes an intelligent memory management unit (MMU) based on the hybrid architecture, in which over 16k queues are integrated. The performance measured on the system board is zero packet loss under seamless traffic with 60–1.5 kByte packet lengths (deterministic manner). A notable feature of this architecture is that it eliminates the need for any premium memories; only low-cost commodity SRAMs and DRAMs are used. The intelligent MMU employs the head buffer architecture, which is suitable for supporting a large number of FIFO queues. An experimental board based on this architecture is embedded into a router system to evaluate the performance. Using 16k queues at 20 Gbps, zero packet loss is observed with 64-Byte to 1,500-Byte packet lengths.
Kotaro NEGISHI Hiroyuki UENOHARA
We have investigated the operational performance of an optical serial-to-parallel conversion scheme using a phase-shifted preamble handling optical packets formatted by differential phase shift keying (DPSK) for integrated optical serial-to-parallel converter (OSPC). The same architecture for on-off keyed signals, based on a transmitter-side preamble at the top of the packet and phase-shifted by π/2, which is then -π/2 phase-biased with a Mach-Zehnder delay interferometer (MZDI), is available for binary and differential PSK signals. The delay length of these signals is determined by the relative timing positions of the gated bit and a balanced receiver-side photodetector. We simulated the operational performance of this scheme and its tolerance against the degree of modulation and optical chirp, with our results showing that a phase shift of more than 0.94π is required in order to attain a suppression ratio in the OSPC output consistent with a bit error rate of less than 10⁻⁹ (based on the ratio of intensity of the extracted bit to the maximum peak intensity of the cancelled bits using a single-arm phase modulator). However, by using a Mach-Zehnder phase modulator, the modulation angle can be relaxed to about 0.36π. Experimental investigation of the OSPC showed that its functional tolerance with respect to the modulation angle agreed well with the simulated values. Finally, we performed optical label processing using the OSPC in conjunction with an address table, and our results confirmed the potential of the OSPC for use in label recognition.
Yufei LIN Xuejun YANG Xinhai XU Xiaowei GUO
Scaling up the system size has been the common approach to achieving high performance in parallel computing. However, designing and implementing a large-scale parallel system can be very costly in terms of money and time. When building a target system, it is desirable to initially build a smaller version by using the processing nodes with the same architecture as those in the target system. This allows us to achieve efficient and scalable prediction by using the smaller system to predict the performance of the target system. Such scalability prediction is critical because it enables system designers to evaluate different design alternatives so that a certain performance goal can be successfully achieved. As the de facto standard for writing parallel applications, MPI is widely used in large-scale parallel computing. By categorizing the discrete event simulation methods for MPI programs and analyzing the characteristics of scalability prediction, we propose a novel simulation method, called virtual-actual combined execution-driven (VACED) simulation, to achieve scalable prediction for MPI programs. The basic idea is to predict the execution time of an MPI program on a target machine by running it on a smaller system, so that we can predict its communication time by virtual simulation and obtain its sequential computation time by actual execution. We introduce a model for the VACED simulation as well as the design and implementation of VACED-SIM, a lightweight simulator based on fine-grained activity and event definitions. We have validated our approach on a sub-system of Tianhe-1A. Our experimental results show that VACED-SIM exhibits higher accuracy and efficiency than MPI-SIM. In particular, for a target system with 1024 cores, the relative errors of VACED-SIM are less than 10% and the slowdowns are close to 1.
Juntao GAO Jiajia LIU Xiaohong JIANG Osamu TAKAHASHI Norio SHIRATORI
The capacity of general mobile ad hoc networks (MANETs) remains largely unknown up to now, which significantly hinders the development and commercialization of such networks. Available throughput capacity studies of MANETs mainly focus on either the order sense capacity scaling laws, the exact throughput capacity under a specific algorithm, or the exact throughput capacity without a careful consideration of critical wireless interference and transmission range issues. In this paper, we explore the exact throughput capacity for a class of MANETs, where we adopt group-based scheduling to schedule simultaneous link transmissions for interference avoidance and allow the transmission range of each node to be adjusted. We first determine a general throughput capacity upper bound for the concerned MANETs, which holds for any feasible packet delivery algorithm in such networks. We then prove that the upper bound we determined is just the exact throughput capacity for this class of MANETs by showing that for any traffic input rate within the throughput capacity upper bound, there exists a corresponding two-hop relay algorithm to stabilize such networks. A closed-form upper bound for packet delay is further derived under any traffic input rate within the throughput capacity. Finally, based on the network capacity result, we examine the impacts of transmission range and node density upon network capacity.
In this paper, an area-efficient decoder architecture is proposed for the quasi-cyclic low-density parity check (QC-LDPC) codes specified in the IEEE 802.16e WiMAX standard. The decoder supports all the code rates and codeword lengths defined in the standard. In order to achieve low area and maximize hardware utilization, the decoder utilizes 4 decoding function units, which is the greatest common divisor of the expansion factors. In addition, the decoder adopts a novel scheduling scheme named stride scheduling, which stores the extrinsic messages in non-sequential order to replace the conventional complex flexible permutation network with simple small-sized cyclic shifters and also minimizes the number of memory accesses. To further minimize the complexity, the number of extrinsic memory instances for 24 block columns is reduced to 5 banks by identifying independent sets. All the memory instances used in the decoder are single-port memories, which cost less in area and price than dual-port ones. Finally, the decoding function units have a partially parallel structure so that the decoding throughput sufficiently exceeds the requirement of the WiMAX standard. The proposed decoder is synthesized with 49 K equivalent gates and 54,144 bits of memory, and the implementation occupies 0.40 mm² in a 65 nm CMOS technology.
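The choice of 4 decoding function units can be checked with a short calculation. The sketch below assumes the commonly cited 802.16e LDPC parameters, which are not stated in the abstract itself: codeword lengths from 576 to 2304 bits in steps of 96, each equal to 24 block columns times the expansion factor. The variable names are illustrative only.

```python
from functools import reduce
from math import gcd

# Assumed (not given in the abstract): 802.16e codeword lengths run
# from 576 to 2304 bits in steps of 96, and each base matrix has
# 24 block columns, so the expansion factor is z = n / 24.
BLOCK_COLUMNS = 24
codeword_lengths = list(range(576, 2304 + 1, 96))
expansion_factors = [n // BLOCK_COLUMNS for n in codeword_lengths]

# The greatest common divisor of all expansion factors divides every
# sub-block size evenly, so that many units are always fully used.
num_function_units = reduce(gcd, expansion_factors)

print(expansion_factors)
print(num_function_units)
```

Under these assumptions the expansion factors are 24, 28, ..., 96 and their greatest common divisor is 4, matching the number of function units quoted above.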
Soma SHIRAISHI Yaokai FENG Seiichi UCHIDA
This paper proposes a new part-based approach for skew estimation of document images. The proposed method first estimates skew angles on rather small areas, which are the local parts of characters, and subsequently determines the global skew angle by aggregating those local estimations. A local skew estimation on a part of a skewed character is performed by finding an identical part from prepared upright character images and calculating the angular difference. Specifically, a keypoint detector (e.g. SURF) is used to determine the local parts of characters, and once the parts are described as feature vectors, a nearest neighbor search is conducted in the instance database to identify the parts. Finally, a local skew estimation is acquired by calculating the difference of the dominant angles of brightness gradient of the parts. After the local skew estimation, the global skew angle is estimated by the majority voting of those local estimations, disregarding some noisy estimations. Our experiments have shown that the proposed method is more robust to short and sparse text lines and non-text backgrounds in document images compared to conventional methods.
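The final aggregation step can be illustrated with a minimal Python sketch. The abstract does not specify how the vote is binned or how noisy estimates are discarded, so the histogram with a fixed `bin_width` and the function name `global_skew` are assumptions made here for illustration; angle wrap-around is ignored.

```python
from collections import defaultdict

def global_skew(local_angles, bin_width=1.0):
    """Estimate a global skew angle (degrees) by majority voting.

    Hypothetical helper: local estimates are grouped into histogram
    bins of `bin_width` degrees, the most populated bin wins, and the
    mean of its members is returned. Estimates outside the winning
    bin are treated as noise and ignored.
    """
    if not local_angles:
        raise ValueError("at least one local estimate is required")
    bins = defaultdict(list)
    for angle in local_angles:
        bins[round(angle / bin_width)].append(angle)
    winner = max(bins.values(), key=len)
    return sum(winner) / len(winner)

# Four consistent local estimates near 5 degrees and three outliers.
estimates = [5.1, 4.9, 5.0, 5.2, -30.0, 12.4, 47.0]
print(global_skew(estimates))
```

Because only the winning bin contributes to the result, a few wrong part matches do not shift the estimate, which is the property the voting step relies on.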
Shusuke YOSHIMOTO Shunsuke OKUMURA Koji NII Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper presents an NMOS-centered 6T SRAM cell layout that reduces the neutron-induced multiple-cell-upset (MCU) SER on the same wordline. We implemented a 1-Mb SRAM macro in a 65-nm CMOS process and performed a neutron-accelerated irradiation test to evaluate the MCU SER. The proposed 6T SRAM macro improves the horizontal MCU SER by 67–98% compared with a general macro that has PMOS-centered 6T SRAM cells.
Eunji PAK Sang-Hoon KIM Jaehyuk HUH Seungryoul MAENG
Although shared caches allow the dynamic allocation of limited cache capacity among cores, traditional LRU replacement policies often cannot prevent negative interference among cores. To address the contention problem in shared caches, cache partitioning and application scheduling techniques have been extensively studied. Partitioning explicitly determines cache capacity for each core to maximize the overall throughput. On the other hand, application scheduling by operating systems groups the least interfering applications for each shared cache, when multiple shared caches exist in systems. Although application scheduling can mitigate the contention problem without any extra hardware support, its effect can be limited for some severe contentions. This paper proposes a low-cost solution based on application scheduling with a simple cache insertion control. Instead of using a full hardware-based cache partitioning mechanism, the proposed technique mostly relies on application scheduling. It selectively uses LRU insertion to the shared caches, which can be added with negligible hardware changes from the current commercial processor designs. For the completeness of cache interference evaluation, this paper examines all possible mixes from a set of applications, instead of using just a few selected mixes. The evaluation shows that the proposed technique can mitigate the cache contention problem effectively, close to the ideal scheduling and partitioning.
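The effect of LRU insertion can be seen in a toy model of a single cache set. This is a sketch of the general idea only: the class `CacheSet`, the 4-way set, and the decision to apply LRU insertion to a streaming access pattern are assumptions for illustration, and the paper's actual policy for selecting when to use LRU insertion is not reproduced here.

```python
class CacheSet:
    """One set of a set-associative cache with LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []  # index 0 is the MRU line, the last is the LRU line

    def access(self, tag, insert_at_lru=False):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.lines:
            # Hit: promote the line to the MRU position.
            self.lines.remove(tag)
            self.lines.insert(0, tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.pop()  # evict the current LRU line
        if insert_at_lru:
            self.lines.append(tag)     # new line is the next victim
        else:
            self.lines.insert(0, tag)  # conventional MRU insertion
        return False

def reuse_hits(insert_at_lru):
    """Reuse-heavy lines A, B, C share a 4-way set with a streaming access burst."""
    cache = CacheSet(ways=4)
    for tag in "ABC":
        cache.access(tag)
    for tag in ["X1", "X2", "X3", "X4"]:  # streaming lines, never reused
        cache.access(tag, insert_at_lru=insert_at_lru)
    return [cache.access(tag) for tag in "ABC"]

print(reuse_hits(insert_at_lru=False))
print(reuse_hits(insert_at_lru=True))
```

With conventional insertion the streaming lines push A, B, and C out of the set; with LRU insertion each streaming line replaces only the previous one, so the reused lines survive.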
Keita MOCHIZUKI Hiroshi ARUGA Hiromitsu ITAMOTO Keitaro YAMAGISHI Yuichiro HORIGUCHI Satoshi NISHIKAWA Ryota TAKEMURA Masaharu NAKAJI Atsushi SUGITATSU
We have succeeded in demonstrating a high-performance four-channel 25 Gb/s integrated receiver for 100 Gb/s Ethernet with built-in spatial Demux optics and an integrated PD array. All components that make up the Demux optics are attached to a prism. Owing to the shaping accuracy of the prism, the insertion loss was suppressed to 0.8 dB in a small size. The connection point of the package for high-speed electrical signals was improved to decrease the transmission loss. A compact package of 12 mm × 17 mm × 7 mm with a side-wall electrical connector has been achieved, which is compatible with assembly in the CFP2 form factor. We observed a sensitivity of -12.1 dBm in average power and a sensitivity power penalty due to crosstalk of less than 0.1 dB.
Fang YANG Keqian YAN Changyong PAN Jian SONG
Square root-raised-cosine (SRRC) filters are used in many systems for spectrum shaping, which leads to a high peak-to-average power ratio (PAPR). Nevertheless, some applications demand a low PAPR in terms of both the error performance and the strict restriction of the spectrum mask. In this letter, we propose a PAPR reduction method based on the modified active constellation extension for systems using SRRC filters. Results show that the proposed method substantially reduces the PAPR, and therefore it is applicable to satellite communications to improve the power efficiency at the transmitter.
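For reference, PAPR is the ratio of the peak instantaneous power of a signal to its mean power, usually quoted in dB. The sketch below only computes that quantity for a list of complex baseband samples; it does not implement the proposed active constellation extension method, and the function name `papr_db` is an assumption for illustration.

```python
import math

def papr_db(samples):
    """Peak-to-average power ratio of a sample sequence, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    if mean_power == 0:
        raise ValueError("PAPR is undefined for an all-zero signal")
    return 10.0 * math.log10(max(powers) / mean_power)

# A constant-envelope sequence has all its power spread evenly.
constant_envelope = [1 + 0j, 1j, -1 + 0j, -1j]
# A single pulse concentrates all the power in one sample.
single_pulse = [1.0, 0.0, 0.0, 0.0]

print(papr_db(constant_envelope))
print(papr_db(single_pulse))
```

The constant-envelope sequence gives 0 dB, while the single pulse gives about 6 dB, since its peak power is four times its mean power.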
Xianling WANG Xin ZHANG Hongwen YANG Dacheng YANG
This paper investigates the transmission capacity of open-loop spatial multiplexing with zero-forcing receivers in overlaid ad hoc networks. We first derive asymptotic closed-form expressions for the transmission capacity of two coexisting networks (a primary network vs. a secondary network). We then address a special case with equal numbers of transmit and receive antennas through exact analysis. Numerical results validate the accuracy of our expressions. Our findings show that the overall transmission capacity of coexisting networks will improve significantly over that of a single network if the primary network can tolerate a slight outage probability increase. This improvement can be further boosted if more streams are configured in the spatial multiplexing scheme; less improvement is achieved by placing more antennas at the receive side than the transmit side. However, when the stream number exceeds a certain limit, spatial multiplexing will have a negative effect on the overlaid network.
Ryo YAMAGUCHI Shouhei KIDERA Tetsuo KIRIMOTO
Ultra-wideband pulse radar is a promising technology for the imaging sensors of rescue robots operating in disaster scenarios, where optical sensors are not applicable because of thick smog or high-density gas. For the above application, while one promising ultra-wideband radar imaging algorithm for a target with arbitrary motion has already been proposed with a compact observation model, it is based on an ellipsoidal approximation of the target boundary, and is difficult to apply to complex target shapes. To tackle the above problem, this paper proposes a non-parametric and robust imaging algorithm for a target with arbitrary motion including rotation and translation being observed by multi-static radar, which is based on the matching of target boundary points obtained by the range points migration (RPM) algorithm extended to the multi-static radar model. To enhance the imaging accuracy in situations having lower signal-to-noise ratios, the proposed method also adopts an integration scheme for the obtained range points, the antenna location part of which is correctly compensated for the estimated target motion. Results from numerical simulations show that the proposed method accurately extracts the surface of a moving target, and estimates the motion of the target, without any target or motion model.
Akihiro FUJIMOTO Yusuke HIROTA Hideki TODE Koso MURAKAMI
To establish seamless and highly robust content distribution, we proposed the new concept of Inter-Stream Forward Error Correction (FEC), an efficient data recovery method leveraging several video streams. Our previous research showed that Inter-Stream FEC had significant recovery capability compared with the conventional FEC method under ideal modeling conditions and assumptions. In this paper, we design the Inter-Stream FEC architecture in detail with a view to practical application. The functional requirements for practical feasibility are investigated, such as simplicity and flexibility. Further, the investigation clarifies a challenging problem: the increase in processing delay created by the asynchronous arrival of packets. To solve this problem, we propose a pragmatic parity stream construction method. We implement and evaluate experimentally a prototype system with Inter-Stream FEC. The results demonstrate that the proposed system could achieve high recovery performance in our experimental environment.
Motoharu SASAKI Wataru YAMADA Naoki KITA Takatoshi SUGIYAMA
A new path loss model of interference between mobile terminals in a residential area is proposed. The model uses invertible formulas and considers the effects on path loss characteristics produced by paths having many corners or corners with various angles. Angular profile and height pattern measurements clarify three paths that are dominant in terms of their effect on the accurate modeling of path loss characteristics in residential areas: paths along a road, paths between houses, and over-roof propagation paths. Measurements taken in a residential area to verify the model's validity show that the model is able to predict path loss with greater accuracy than conventional models.
Takuya TOJO Hiroyuki KITADA Kimihide MATSUMOTO
Estimating the packet loss ratio of TCP transfers is essential for passively measuring the Quality of Service (QoS) of Internet traffic. However, only a few studies have been conducted on this issue. The Benko-Veres algorithm is one technique for estimating the packet loss ratio of two networks separated by a measurement point. However, this study shows that it leads to an estimation error of a few hundred percent in the particular environment where the packet loss probabilities between the two networks are asymmetrical. We propose a passive method for packet loss estimation that offers improved estimation accuracy by introducing classification conditions for the TCP retransmission timeout. An experiment shows that our proposed algorithm suppressed the maximum estimation error to less than 15%.
Tatsuya SAKANUSHI Jie HU Kou YAMADA
The simple repetitive control system proposed by Yamada et al. is a type of servomechanism for periodic reference inputs. This system follows a periodic reference input with a small steady-state error, even if there is periodic disturbance or uncertainty in the plant. In addition, simple repetitive control systems ensure that transfer functions from the periodic reference input to the output and from the disturbance to the output have finite numbers of poles. Yamada et al. clarified the parameterization of all stabilizing simple repetitive controllers. Recently, Yamada et al. proposed the parameterization of all stabilizing two-degrees-of-freedom (TDOF) simple repetitive controllers that can specify the input-output characteristic and the disturbance attenuation characteristic separately. However, when using the method of Yamada et al., it is complex to specify the low-pass filter in the internal model for the periodic reference input that specifies the frequency characteristics. This paper extends the results of Yamada et al. and proposes the parameterization of all stabilizing TDOF simple repetitive controllers with specified frequency characteristics in which the low-pass filter can be specified beforehand.
Fei LI Masaya MIYAHARA Akira MATSUZAWA
Recent attempts to directly combine CMOS pixel readout chips with modern gas detectors open the possibility to fully take advantage of gas detectors. Those conventional readout LSIs designed for hybrid semiconductor detectors show some issues when applied to gas detectors. Several new proposed readout LSIs can improve the time and the charge measurement precision. However, the widely used basic charge sensitive amplifier (CSA) has an almost fixed dynamic range. There is a trade-off between the charge measurement resolution and the detectable input charge range. This paper presents a method to apply the folding integration technique to a basic CSA. As a result, the detectable input charge dynamic range is expanded while maintaining all the key merits of a basic CSA. Although folding integration technique has already been successfully applied in CMOS image sensors, the working conditions and the signal characteristics are quite different for pixel readout LSIs for gas particle detectors. The related issues of the folding CSA for pixel readout LSIs, including the charge error due to finite gain of the preamplifier, the calibration method of charge error, and the dynamic range expanding efficiency, are addressed and analyzed. As a design example, this paper also demonstrates the application of the folding integration technique to a Qpix readout chip. This improves the charge measurement resolution and expands the detectable input dynamic range while maintaining all the key features. Calculations with SPICE simulations show that the dynamic range can be improved by 12 dB while the charge measurement resolution is improved by 10 times. The charge error during the folding operation can be corrected to less than 0.5%, which is sufficient for large input charge measurement.
Norifumi KAMIYA Yoichi HASHIMOTO Masahiro SHIGIHARA
In this paper, we present a novel class of long quasi-cyclic low-density parity-check (QC-LDPC) codes. Each of the codes in this class has a structure formed by concatenating single-parity-check codes and QC-LDPC codes of shorter lengths, which allows for efficient, high-throughput encoder/decoder implementations. Using a code in this class, we design a forward error correction (FEC) scheme for optical transmission systems and present its high-throughput encoder/decoder architecture. In order to demonstrate its feasibility, we implement the architecture on a field programmable gate array (FPGA) platform. We show by both FPGA-based simulations and measurements of an optical transmission system that the FEC scheme can achieve excellent error performance and that there is no significant performance degradation due to the constraint on its structure, while an efficient, high-throughput implementation remains feasible.
Given a binary image I and a threshold t, the size-thresholded binary image I(t) defined by I and t is the binary image obtained by removing all connected components consisting of at most t pixels. This paper presents space-efficient algorithms for computing a size-thresholded binary image for a binary image of n pixels, assuming that the image is stored in a read-only array with random access. The problem has two cases depending on how large the threshold t is: a relatively large threshold, where t = Ω(), and a relatively small threshold, where t = O(). In this paper, a new algorithmic framework for the problem is presented. From an algorithmic point of view, the problem can be solved in O() time and O() work space. We propose new algorithms for both cases which compute the size-thresholded binary image for any binary image of n pixels in O(n log n) time using only O() work space.
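To make the definition of I(t) concrete, here is a minimal Python sketch of the straightforward approach, which finds each connected component by breadth-first search and erases it if it has at most t pixels. It uses work space linear in the image size and is not the space-efficient algorithm of the paper. The function name `size_threshold`, the list-of-lists image representation, and the use of 4-connectivity are assumptions made for illustration.

```python
from collections import deque

def size_threshold(image, t):
    """Return a copy of a 0/1 image without components of at most t pixels.

    Assumes 4-connectivity; the input image is only read, never modified.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    result = [row[:] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Collect the whole component containing pixel (r, c).
            component = [(r, c)]
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        component.append((ny, nx))
                        queue.append((ny, nx))
            # Erase the component if it is too small.
            if len(component) <= t:
                for y, x in component:
                    result[y][x] = 0
    return result

image = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
for row in size_threshold(image, 2):
    print(row)
```

In this example the image has components of 3, 1, and 2 pixels, so with t = 2 only the 3-pixel component in the top-left corner remains.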