Honai UEOKA Takehiro SATO Eiji OKI
Multi-core fiber (MCF) is one of the promising space-division multiplexing technologies for increasing the capacity of optical networks. MCF-based networks face two challenges. One is inter-core crosstalk (XT), which degrades the quality of optical signals in neighboring fiber cores. The other is network protection against link failures, which can cause massive data loss. One way to protect against multiple link failures is to prepare physically separated links as a backup network. Probabilistic protection improves the efficiency of protection by allowing a certain probability of protection failure. Existing studies on backup network design with probabilistic protection do not target MCF-based networks, which raises problems such as protection failure due to inter-core XT and excessive consumption of optical resources. To address these problems, this paper proposes an XT-aware backup network design model for MCF-based optical path networks. The proposed model protects the network against probabilistic multiple link failures. We adopt probabilistic protection, which allows a certain probability of protection failure due to inter-core XT, and minimize the required number of links in the backup network. We present an algorithm to satisfy the probabilistic protection requirement and formulate the model as an integer linear programming problem. We also develop a heuristic approach to apply the proposed model to larger networks. Numerical results show that the proposed model requires fewer links than the dedicated allocation model, which provisions backup paths in the same manner as primary paths.
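As an informal illustration of the probabilistic protection idea described above, the following sketch checks whether a backup design that tolerates up to k simultaneous link failures keeps the protection-failure probability below an allowed threshold, assuming independent link failures with a common failure probability. The binomial model, parameter values, and function names are illustrative assumptions and do not reproduce the paper's ILP formulation or its XT constraints.

```python
from math import comb

def prob_exceeds_capacity(n_links, p_fail, k_protected):
    """Probability that more than k_protected of n_links links fail at once,
    assuming independent failures with probability p_fail (binomial tail)."""
    return sum(
        comb(n_links, i) * p_fail ** i * (1 - p_fail) ** (n_links - i)
        for i in range(k_protected + 1, n_links + 1)
    )

def satisfies_probabilistic_protection(n_links, p_fail, k_protected, epsilon):
    """True if the protection-failure probability stays within the allowed epsilon."""
    return prob_exceeds_capacity(n_links, p_fail, k_protected) <= epsilon

# Hypothetical example: 20 protected links, 1% failure probability each,
# backup resources sized for up to 2 simultaneous failures, epsilon = 1%.
print(prob_exceeds_capacity(20, 0.01, 2))                     # about 0.001
print(satisfies_probabilistic_protection(20, 0.01, 2, 0.01))  # True
```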
Masato YOSHIDA Kozo SATO Toshihiko HIROOKA Keisuke KASAI Masataka NAKAZAWA
We present detailed measurements and analysis of the guided acoustic wave Brillouin scattering (GAWBS)-induced depolarization noise in a multi-core fiber (MCF) used for digital coherent optical transmission. We first describe the GAWBS-induced depolarization noise in an uncoupled four-core fiber (4CF) with a 125μm cladding and compare the depolarization noise spectrum with that of a standard single-mode fiber (SSMF). We found that, unlike the center core, the off-center cores in the 4CF are dominantly affected by higher-order TRn,m modes rather than the TR2,m mode, and that the total power of the depolarization noise in the 4CF is almost the same as that in the SSMF. We also report measurement results for the GAWBS-induced depolarization noise in an uncoupled 19-core fiber with a 240μm cladding. The results indicate that the amounts of depolarization noise generated in the cores are almost identical. Finally, we evaluate the influence of GAWBS-induced polarization crosstalk (XT) on coherent QAM transmission. We found that the XT limits the achievable multiplicity of the QAM signal to 64 in a transoceanic transmission with an MCF.
Yuto SAGAE Takashi MATSUI Taiji SAKAMOTO Kazuhide NAKAJIMA
We propose an ultra-low inter-core crosstalk (XT) multi-core fiber (MCF) with a standard 125-μm cladding. We show the fiber design and fabrication results of an MCF housing four cores with a W-shaped index profile; it offers XT of less than -67dB/km over the whole C+L band, which enables 10,000-km transmission with a negligible XT penalty. We also observe a low average loss of 0.17dB/km at a wavelength of 1.55μm and other optical properties compatible with ITU-T G.654.B fiber. We further elucidate its good micro-bend resistance in terms of both loss and XT, confirming its applicability to high-density optical fiber cables. Finally, we show that the fabricated MCF is suitable for long-distance transmission by confirming that its XT noise performance corresponds to transmission distances of 10,000km or more.
The potential transmission capacity of a standard single-mode fiber peaks at around 100Tb/s owing to fiber nonlinearity and the bandwidth limitation of amplifiers. As the last frontier of multiplexing, space-division multiplexing (SDM) has been studied intensively in recent years. Although the deployment of such a novel fiber communication infrastructure is still some way off, basic research on SDM has been carried out extensively. A comprehensive review is therefore worthwhile at this time as a basis for further practical investigations.
Junyang ZHANG Yang GUO Xiao HU Rongzhen LI
In recent years, deep-learning-based image recognition, speech recognition, text translation, and other related applications have brought great convenience to people's lives. With the advent of the era of the internet of everything, running a computationally intensive deep learning algorithm on a resource-limited edge device is a major challenge. For an edge-oriented vector processor combined with a specific neural network model, we propose a new data layout method that places the input feature maps in DDR memory and rearranges the convolutional kernel parameters in the core memory bank. To address the difficulty of parallelizing two-dimensional matrix convolution, we propose a method that parallelizes the convolution calculation along the third dimension; by setting a vector register to zero as the initial value of the max pooling, the rectified linear unit (ReLU) activation function and the pooling operation are fused, reducing repeated accesses to intermediate data. On the basis of the single-core implementation, a multi-core implementation scheme of the Inception structure is proposed. Finally, based on the proposed vectorization method, we implement five neural network models, namely AlexNet, VGG16, VGG19, GoogLeNet, and ResNet18, and present performance statistics and analysis based on a CPU, a GTX 1080 Ti, and the FT2000. Experimental results show that the vector processor has better computing advantages than the CPU and GPU and can run large-scale neural network models in real time.
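The ReLU/pooling fusion mentioned above rests on the identity max(0, x1, ..., xn) = ReLU(max(x1, ..., xn)), so initializing the pooling accumulator (the vector register) to zero makes a separate ReLU pass unnecessary. The NumPy sketch below illustrates only this identity on a single core; it is not the vectorized FT2000 implementation.

```python
import numpy as np

def fused_relu_maxpool(feature_map: np.ndarray, pool: int = 2) -> np.ndarray:
    """Max pooling with the running maximum initialized to zero, which makes an
    explicit ReLU pass unnecessary: max(0, x1, ..., xn) == relu(max(x1, ..., xn))."""
    h, w = feature_map.shape
    out = np.zeros((h // pool, w // pool), dtype=feature_map.dtype)  # zero-initialized "register"
    for dy in range(pool):
        for dx in range(pool):
            window = feature_map[dy:h:pool, dx:w:pool][: h // pool, : w // pool]
            out = np.maximum(out, window)  # running max against the zero-initialized accumulator
    return out

x = np.array([[-3., 1.], [-2., -5.]])
print(fused_relu_maxpool(x))          # [[1.]]
print(np.maximum(x, 0.).max())        # 1.0, same value via separate ReLU then max pooling
```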
Chien-Hui LIAO Charles H.-P. WEN
Hotspots occur frequently in 3D multi-core processors (3D-MCPs), and they may adversely impact both the reliability and the lifetime of a system. We present a new thermally constrained task scheduler based on thermal-pattern-aware voltage assignment (TPAVA) to reduce hotspots in and optimize the performance of 3D-MCPs. By analyzing temperature profiles of different voltage assignments, TPAVA pre-emptively assigns different initial operating-voltage levels to cores to reduce temperature increases in 3D-MCPs. The proposed task scheduler consists of an online allocation strategy and a new voltage-scaling strategy. In particular, the proposed online allocation strategy uses the temperature-variation rates of the cores and takes into account two important thermal behaviors of 3D-MCPs, which allows it to effectively minimize occurrences of hotspots in both thermally homogeneous and heterogeneous 3D-MCPs. Furthermore, a new vertical-grouping voltage scaling (VGVS) strategy that considers thermal correlation in 3D-MCPs is used to handle thermal emergencies. Experimental results indicate that, compared to a previous online thermally constrained task scheduler, the proposed task scheduler can reduce hotspot occurrences by approximately 66% (71%) and improve throughput by approximately 8% (2%) in thermally homogeneous (heterogeneous) 3D-MCPs. These results indicate that the proposed task scheduler is an effective technique for suppressing hotspot occurrences and optimizing throughput for 3D-MCPs subject to thermal constraints.
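To make the role of temperature-variation rates in online allocation concrete, the following sketch picks the core whose projected temperature over a short horizon is lowest while respecting a thermal cap. The projection model, parameter values, and names are hypothetical simplifications, not the TPAVA-based scheduler itself.

```python
def pick_core(temps, rates, horizon, t_max):
    """Illustrative online allocation: project each core's temperature over a short
    horizon using its temperature-variation rate and choose the coolest projected core,
    skipping cores that would exceed the thermal constraint t_max."""
    candidates = []
    for core, (t, r) in enumerate(zip(temps, rates)):
        projected = t + r * horizon
        if projected < t_max:
            candidates.append((projected, core))
    if not candidates:
        return None  # defer the task: every core risks becoming a hotspot
    return min(candidates)[1]

# Example with hypothetical temperatures (deg C) and variation rates (deg C/s).
print(pick_core(temps=[62.0, 70.5, 58.0], rates=[0.8, 0.1, 1.5], horizon=5.0, t_max=75.0))  # 2
```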
Runzi ZHANG Jinlin WANG Yiqiang SHENG Xiao CHEN Xiaozhou YE
Cache affinity has been proved to have a great impact on the performance of packet processing applications on multi-core platforms. Flow-based packet scheduling can make the best of data cache affinity with flow-associated data and context structures. However, little work on packet scheduling algorithms has addressed instruction cache (I-Cache) affinity in the modified pipelining (MPL) architecture for multi-core systems. In this paper, we propose a protocol-aware packet scheduling (PAPS) algorithm aimed at maximizing I-Cache affinity at the protocol-dependent stages of the MPL architecture in the multi-protocol processing (MPP) scenario. The characteristics of applications in MPL are analyzed, and a mapping model is introduced to illustrate the procedure of MPP. In addition, a stage processing time model for MPL is presented based on an analysis of the multi-core cache hierarchy. PAPS is a flow-based packet scheduling algorithm that schedules flows in consideration of both the application-level protocol of each flow and load balancing. Experiments demonstrate that PAPS outperforms the Round Robin algorithm and the HRW-based algorithm for MPP applications. In particular, PAPS can eliminate all I-Cache misses at the protocol-dependent stages and reduce the average CPU cycle consumption per packet by more than 10% in comparison with HRW.
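The following sketch conveys the flavor of protocol-aware, flow-based scheduling: flows of the same application-level protocol are kept on a subset of cores (so the protocol-dependent code stays resident in those cores' I-Caches) and are balanced by load within that subset. The class, its load metric, and the protocol-to-core map are assumptions for illustration, not the PAPS algorithm as specified in the paper.

```python
from collections import defaultdict

class ProtocolAwareScheduler:
    """Illustrative flow-to-core mapping: bind each application-level protocol to a
    subset of cores and balance flows of that protocol across those cores by load."""

    def __init__(self, protocol_cores):
        self.protocol_cores = protocol_cores          # e.g. {"HTTP": [0, 1], "DNS": [2]}
        self.load = defaultdict(int)                  # packets assigned per core
        self.flow_table = {}                          # flow id -> core

    def assign(self, flow_id, protocol):
        if flow_id not in self.flow_table:            # keep flow affinity: same flow, same core
            cores = self.protocol_cores[protocol]
            self.flow_table[flow_id] = min(cores, key=lambda c: self.load[c])
        core = self.flow_table[flow_id]
        self.load[core] += 1
        return core

sched = ProtocolAwareScheduler({"HTTP": [0, 1], "DNS": [2]})
print(sched.assign(("10.0.0.1", 4321, 80), "HTTP"))   # 0
print(sched.assign(("10.0.0.2", 5555, 80), "HTTP"))   # 1 (load balancing)
print(sched.assign(("10.0.0.1", 4321, 80), "HTTP"))   # 0 (flow affinity preserved)
```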
The history of optical fiber and optical transmission technologies has been described in many publications. However, the history of the other technologies that support the physical layer of optical transmission has not been described in much detail. I would like to highlight those technologies in addition to optical fibers. This paper therefore describes the history of the development of optical-fiber-related technologies such as fusion splicers, optical fiber connectors, ribbon fiber, and passive components, based on the changes in optical fibers and optical fiber cables. Moreover, I describe technologies designed to support multi-core fibers, such as fan-in/fan-out devices.
Toshio MORIOKA Yoshinari AWAJI Yuichi MATSUSHIMA Takeshi KAMIYA
Research efforts initiated by the EXAT Initiative are described to realize Exabit/s optical communications, utilizing the 3M technologies, i.e. multi-core fiber, multi-mode control and multi-level modulation.
Hiroshi SAITO Masashi IMAI Tomohiro YONEDA
In this paper, we propose a redundant task allocation method for multi-core systems based on the Duplication with Temporary Triple-Modular Redundancy and Reconfiguration (DTTR) scheme. The proposed method determines the allocation of a given task graph onto a given multi-core system model based on task scheduling under given fault patterns. A fault pattern, as defined in this paper, consists of a set of faulty cores and a set of surviving cores. To optimize the average failure rate of the system, the task scheduling minimizes the execution time of the task graph while preserving the property of the DTTR scheme. In addition, we propose a method for selecting the fault patterns to be scheduled in order to reduce the task allocation time. In the experiments, we first evaluate the proposed fault-pattern selection method in terms of task allocation time. We then compare the average failure rate among the proposed method, a task allocation method that packs tasks into particular cores as much as possible, a task allocation method based on simulated annealing (SA), a task allocation method based on integer linear programming (ILP), and a task allocation method based on task scheduling that does not consider the property of the DTTR scheme. The experimental results show that task allocation by the proposed method achieves nearly the same average failure rate as the SA-based method with a shorter task allocation time.
In this paper, we exploit the MapReduce framework and other optimizations to improve the performance of hash join algorithms on multi-core CPUs, including the no-partition hash join and the partition hash join. We first implement the hash join algorithms with a shared-memory MapReduce model on multi-core CPUs, covering the partition, build, and probe phases. We then design an improved cuckoo hash table for our hash join, which consists of a cuckoo hash table and a chained hash table. Based on our implementation, we also propose two optimizations, one for the usage of SIMD instructions and the other for the partition phase. Through experimental results and analysis, we find that the partition hash join often outperforms the no-partition hash join and that our hash join algorithm is faster than previous work by an average of 30%.
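For readers unfamiliar with the partition hash join discussed above, the sketch below shows its three phases on (key, payload) tuples; a plain Python dict stands in for the paper's combined cuckoo/chained hash table, and the MapReduce, SIMD, and multi-core aspects are omitted.

```python
from collections import defaultdict

def partition_hash_join(r, s, n_partitions=4):
    """Minimal partitioned hash join: both inputs are split by hash(key) so each
    partition pair could be handled by a separate core; within a partition, a hash
    table is built on R and probed with S."""
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for key, payload in r:                       # partition phase
        r_parts[hash(key) % n_partitions].append((key, payload))
    for key, payload in s:
        s_parts[hash(key) % n_partitions].append((key, payload))

    results = []
    for p in range(n_partitions):
        table = defaultdict(list)
        for key, payload in r_parts[p]:          # build phase
            table[key].append(payload)
        for key, payload in s_parts[p]:          # probe phase
            for r_payload in table[key]:
                results.append((key, r_payload, payload))
    return results

print(partition_hash_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y"), (1, "z")]))
```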
Masatoshi TANAKA Masayoshi HACHIWAKA Hirokazu TANIGUCHI
Fan-in/fan-out devices are necessary for the construction of multi-core fiber communication systems. A fan-out device using a capillary is proposed and fabricated by connecting a tapered fiber bundle to a multi-core fiber. The tapered fiber bundle is elongated so that the core arrangement and the mode field diameter (MFD) of the single-core fibers agree with those of the multi-core fiber. Suppressing the change in MFD is necessary to reduce the coupling loss of the fan-out device. As the fiber bundle is elongated, the MFD first decreases until the core reaches a certain diameter and then begins to increase. We exploit this phenomenon to suppress the MFD change of the fan-out device. The average insertion loss at both ends of a multi-core fiber was approximately 1.6dB when the fabricated fan-in/fan-out devices were connected to the multi-core fiber.
Jun YAO Yasuhiko NAKASHIMA Naveen DEVISETTI Kazuhiro YOSHIMURA Takashi NAKADA
General-purpose many-core architectures (MCAs) such as GPGPUs have recently been used widely to continue performance scaling as the continuous increase in working frequency approaches manufacturing limits. However, both the general-purpose MCA and its building block, the general-purpose processor (GPP), lack the tuning capability to boost energy efficiency for individual applications, especially computation-intensive applications. As an alternative to the above MCA platforms, we propose in this paper our LAPP (Linear Array Pipeline) architecture, which adopts a special-purpose reconfigurable structure to achieve optimal MIPS/W while retaining backward binary compatibility, a feature missing in most special-purpose hardware. More specifically, we used a general-purpose VLIW processor interpreting a commercial VLIW ISA as the baseline frontend to provide backward binary compatibility. We also extended the functional unit (FU) stage into an FU array to form a reconfigurable backend that executes program hotspots efficiently by exploiting parallelism. The hardware modules in this general/special-purpose reconfigurable architecture are locally zoned into several groups so that low-power techniques suited to each module's hardware features can be applied. Our results show that, at comparable performance, the tightly coupled general/special-purpose hardware, based on a 180nm cell library, achieves 10.8 times the MIPS/W of an MCA with the same technology features. When a 65nm technology node is assumed, a similar 9.4x improvement in MIPS/W can be achieved by the LAPP without changing program binaries.
Hidehiko TAKARA Tetsuo TAKAHASHI Kazuhide NAKAJIMA Yutaka MIYAMOTO
The paper presents ultra-high-capacity transmission technologies based on multi-core space-division multiplexing. In order to realize high-capacity multi-core fiber (MCF) transmission, the investigation of low-crosstalk fiber and connection technologies is important, and high-density signal generation using multilevel modulation and crosstalk management are also key technologies. A 1Pb/s multi-core fiber transmission experiment using space-division multiplexing is also described.
Yukihiro TSUCHIDA Koichi MAEDA Ryuichi SUGIZAKI
We propose multi-core erbium-doped fiber amplifiers as next-generation optical amplifiers for space-division multiplexing technologies. Multi-core erbium-doped fiber amplifiers have been studied widely as a means of coping with the exponential growth of internet traffic in the backbone network. We consider two approaches to the excitation of erbium ions: one is a core-pumping scheme, and the other is a cladding-pumping scheme. For the core-pumping configuration, we evaluate its applicability to future ultra-long-haul networks. In addition, we demonstrate that the cladding-pumping configuration will enable reductions in power consumption, size, and cost because one multimode pumping laser diode can simultaneously excite several cores embedded in a common cladding and amplify the signals passing through the multi-core erbium-doped fiber cores.
Tetsuya HAYASHI Takashi SASAKI Eisuke SASAOKA
The stochastic behavior of inter-core crosstalk in multi-core fiber is discussed based on a theoretical model validated by measurements, and the effect of the crosstalk on the Q-factor in transmission systems using multi-core fiber is investigated theoretically. The measurements show that the crosstalk changes rapidly with wavelength and gradually with time, following a Gaussian distribution on the I-Q plane. Therefore, the behavior of the crosstalk as a noise may depend on the bandwidth of the signal light. If the bandwidth is adequately broad, the crosstalk may behave as a virtual additive white Gaussian noise on the I-Q plane, and the Q-penalty at a Q-factor of 9.8dB is less than 1dB when the statistical mean of the crosstalk from the other cores is less than -16.7dB for PDM-QPSK, -23.7dB for PDM-16QAM, and -29.9dB for PDM-64QAM. If the bandwidth is adequately narrow, the crosstalk may behave as a virtually static coupling that changes very gradually with time and depends heavily on the wavelength. To cope with a static crosstalk much higher than its statistical mean, a margin of several decibels from the mean crosstalk may be necessary to suppress the Q-penalty in the case of an adequately narrow bandwidth.
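A small worked example may help relate per-core crosstalk values to the thresholds quoted above: mean XT contributions from neighboring cores add in the linear power domain, and the aggregate can then be compared with the stated limits for a sub-1-dB Q-penalty. The per-core values below are hypothetical.

```python
import math

# XT thresholds (statistical mean of total crosstalk from the other cores) for
# a Q-penalty below 1 dB at Q = 9.8 dB, as stated in the abstract above.
XT_LIMIT_DB = {"PDM-QPSK": -16.7, "PDM-16QAM": -23.7, "PDM-64QAM": -29.9}

def total_crosstalk_db(per_core_xt_db):
    """Aggregate mean crosstalk contributions from neighboring cores by
    summing them in the linear power domain (a standard dB power sum)."""
    linear = sum(10 ** (xt / 10) for xt in per_core_xt_db)
    return 10 * math.log10(linear)

def penalty_below_1db(per_core_xt_db, fmt):
    return total_crosstalk_db(per_core_xt_db) <= XT_LIMIT_DB[fmt]

# Hypothetical example: three neighboring cores, each contributing -30 dB mean XT.
xt = [-30.0, -30.0, -30.0]
print(round(total_crosstalk_db(xt), 1))        # about -25.2 dB
print(penalty_below_1db(xt, "PDM-16QAM"))      # True  (-25.2 <= -23.7)
print(penalty_below_1db(xt, "PDM-64QAM"))      # False (-25.2 > -29.9)
```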
Donghai TIAN Xuanya LI Mo CHEN Changzhen HU
Heap buffer overflow has been studied extensively for many years, but it remains a severe threat to software security. Previous solutions suffer from limitations in that 1) some methods need to modify the target programs and 2) most methods impose considerable performance overhead. In this paper, we present iCruiser, an efficient heap buffer overflow monitoring system that uses multi-core technology. Our system is compatible with existing programs, and it can detect heap buffer overflows concurrently. Compared with the latest heap protection systems, our approach achieves stronger security guarantees. Experiments show that iCruiser can detect heap buffer overflow attacks effectively with little performance overhead.
Donghai TIAN Mo CHEN Changzhen HU Xuanya LI
As more and more software vulnerabilities are exposed, shellcode has become very popular in recent years. It is widely used by attackers to exploit vulnerabilities and then hijack a program's execution. Previous solutions suffer from limitations in that 1) some methods based on static analysis may fail to detect shellcode that uses obfuscation techniques and 2) other methods based on dynamic analysis impose considerable performance overhead. In this paper, we propose Lemo, an efficient shellcode detection system. Our system is compatible with commodity hardware and operating systems, which facilitates deployment. To improve the performance of our system, we make use of multi-core technology. The experiments show that our system can detect shellcode efficiently.
Nhat-Phuong TRAN Myungho LEE Sugwon HONG Seung-Jae LEE
Data encryption and decryption are common operations in network-based application programs that must offer security. In order to keep pace with the high data input rates of network-based applications such as multimedia data streaming, real-time processing of the data encryption/decryption is crucial. In this paper, we propose a new parallelization approach to improve the throughput of the de facto standard data encryption and decryption algorithm, AES-CTR (counter mode of AES). The new approach extends the size of the block encrypted at one time across unit block boundaries, effectively encrypting multiple unit blocks at the same time. This reduces the associated parallelization overheads, such as the number of procedure calls, the scheduling, and the synchronizations, compared with previous approaches, and thus leads to significant throughput improvements on a computing platform with a general-purpose multi-core processor and a graphics processing unit (GPU).
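The following sketch illustrates the chunked CTR parallelization idea: because each counter block is independent, a worker only needs the starting counter of its chunk, so several unit blocks can be encrypted per task and the per-call, scheduling, and synchronization overhead is amortized. A hash-based keystream stands in for the AES block cipher to keep the sketch self-contained; this is a conceptual illustration, not the paper's GPU implementation.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # unit block size in bytes

def keystream_block(key: bytes, counter: int) -> bytes:
    """Stand-in for the AES block cipher applied to the counter value;
    a hash is used here only to keep the sketch self-contained."""
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]

def ctr_chunk(key: bytes, data: bytes, base_counter: int) -> bytes:
    """Encrypt a chunk spanning several unit blocks in one task.
    CTR mode needs only the starting counter, so chunks are independent."""
    out = bytearray(len(data))
    for i in range(0, len(data), BLOCK):
        ks = keystream_block(key, base_counter + i // BLOCK)
        for j, b in enumerate(data[i:i + BLOCK]):
            out[i + j] = b ^ ks[j]
    return bytes(out)

def parallel_ctr(key: bytes, data: bytes, blocks_per_chunk: int = 4) -> bytes:
    """Split the message into multi-block chunks and encrypt them concurrently;
    fewer, larger tasks reduce call/scheduling/synchronization overhead."""
    chunk_bytes = blocks_per_chunk * BLOCK
    chunks = [(data[i:i + chunk_bytes], i // BLOCK)
              for i in range(0, len(data), chunk_bytes)]
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda c: ctr_chunk(key, c[0], c[1]), chunks)
    return b"".join(parts)

msg = b"multimedia stream payload" * 10
key = b"0123456789abcdef"
ct = parallel_ctr(key, msg)
assert parallel_ctr(key, ct) == msg   # CTR encryption and decryption are the same operation
```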
Eunji PAK Sang-Hoon KIM Jaehyuk HUH Seungryoul MAENG
Although shared caches allow the dynamic allocation of limited cache capacity among cores, traditional LRU replacement policies often cannot prevent negative interference among cores. To address the contention problem in shared caches, cache partitioning and application scheduling techniques have been studied extensively. Partitioning explicitly determines the cache capacity for each core to maximize overall throughput. On the other hand, application scheduling by the operating system groups the least interfering applications for each shared cache when multiple shared caches exist in a system. Although application scheduling can mitigate the contention problem without any extra hardware support, its effect can be limited under severe contention. This paper proposes a low-cost solution based on application scheduling with a simple cache insertion control. Instead of using a full hardware-based cache partitioning mechanism, the proposed technique mostly relies on application scheduling; it selectively uses LRU insertion in the shared caches, which can be added with negligible hardware changes to current commercial processor designs. For the completeness of the cache interference evaluation, this paper examines all possible mixes from a set of applications instead of using just a few selected mixes. The evaluation shows that the proposed technique can mitigate the cache contention problem effectively, coming close to ideal scheduling and partitioning.
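As a rough illustration of combining application scheduling with selective insertion control, the sketch below groups applications across shared caches by an assumed cache-intensity score and marks the most cache-intensive application of each group for LRU-position insertion. The scoring, the snake-order grouping, and the marking rule are illustrative assumptions, not the policy evaluated in the paper.

```python
def schedule_and_mark(apps, n_caches):
    """Group applications across shared caches so heavy and light applications are
    co-scheduled, then mark the heaviest application per cache for LRU-position
    insertion so its lines are evicted quickly and cannot crowd out its neighbors."""
    ranked = sorted(apps, key=lambda a: a[1], reverse=True)   # (name, intensity)
    groups = [[] for _ in range(n_caches)]
    for i, app in enumerate(ranked):
        # snake order 0,1,...,n-1,n-1,...,1,0 spreads the heavy applications out
        idx = i % (2 * n_caches)
        groups[idx if idx < n_caches else 2 * n_caches - 1 - idx].append(app)
    lru_insert = {g[0][0] for g in groups if g}               # heaviest app per cache
    return groups, lru_insert

# Hypothetical applications with assumed cache-intensity scores.
apps = [("mcf", 9.1), ("lbm", 8.4), ("gcc", 3.2), ("povray", 0.7)]
print(schedule_and_mark(apps, n_caches=2))
```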