Oleg A. MUKHANOV Dmitri KIRICHENKO Igor V. VERNIK Timur V. FILIPPOV Alexander KIRICHENKO Robert WEBBER Vladimir DOTSENKO Andrei TALALAEVSKII Jia Cao TANG Anubhav SAHU Pavel SHEVCHENKO Robert MILLER Steven B. KAPLAN Saad SARWANA Deepnarayan GUPTA
Digital superconductor electronics has been experiencing rapid maturation with the emergence of smaller-scale, lower-cost communications applications which became the major technology drivers. These applications are primarily in the area of wireless communications, radar, and surveillance as well as in imaging and sensor systems. In these areas, the fundamental advantages of superconductivity translate into system benefits through novel Digital-RF architectures with direct digitization of wide band, high frequency radio frequency (RF) signals. At the same time the availability of relatively small 4 K cryocoolers has lowered the foremost market barrier for cryogenically-cooled digital electronic systems. Recently, we have achieved a major breakthrough in the development, demonstration, and successful delivery of the cryocooled superconductor digital-RF receivers directly digitizing signals in a broad range from kilohertz to gigahertz. These essentially hybrid-technology systems combine a variety of superconductor and semiconductor technologies packaged with two-stage commercial cryocoolers: cryogenic Nb mixed-signal and digital circuits based on Rapid Single Flux Quantum (RSFQ) technology, room-temperature amplifiers, FPGA processing and control circuitry. The demonstrated cryocooled digital-RF systems are the world's first and fastest directly digitizing receivers operating with live satellite signals in X-band and performing signal acquisition in HF to L-band at ~30 GHz clock frequencies.
Soon-Woo LEE Young-Jin PARK Kwan-Ho KIM
In this paper, an energy-collection-based non-coherent IR-UWB receiver with low complexity and low power consumption is proposed for short-range data communication. The proposed receiver consists of an on-the-fly integrator, a 1-bit digital sampler, a pre-processor, and a digital symbol synchronizer. The on-the-fly integrator for energy collection and the 1-bit digital sampler reduce the complexity of the IR-UWB system. Furthermore, a simple digital filter in the pre-processing unit enhances the SNR and the robustness of the receiver against time-varying channels. Receiver complexity is further reduced by a simple symbol synchronization scheme that relies on rough time information about incoming pulses rather than exact timing information. The performance of the proposed receiver is simulated using the IEEE 802.15.4a channel model, and the algorithms are implemented and verified on an FPGA.
A new load-balanced channel sharing method (CSM), namely Heuristic Traffic Load Balanced (HTLB) CSM, is proposed for metro wavelength division multiple access (WDMA) networks. In particular, HTLB CSM is designed to be effective for pre-allocation-based medium access control (MAC) protocols by balancing the traffic loads corresponding to pre-assigned destinations per time slot. As a result, HTLB CSM is shown to provide lower time complexity than the well-known sub-optimal load-balanced CSM, MULTIFIT CSM. Furthermore, the Jain Index of HTLB CSM is shown to be higher and more consistent than those of MULTIFIT CSM and other pre-fixed CSMs under diverse traffic conditions.
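The Jain Index mentioned above is the standard fairness measure J = (sum of x_i)^2 / (n * sum of x_i^2), which equals 1 when all loads are equal and 1/n when a single channel carries everything. The following minimal sketch only illustrates that formula; the function name and the per-channel load values are hypothetical and are not taken from the paper.

```python
def jain_index(loads):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(loads)
    total = sum(loads)
    squares = sum(x * x for x in loads)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero load")
    return (total * total) / (n * squares)


# Hypothetical per-channel traffic loads (arbitrary units).
balanced = jain_index([10, 10, 10, 10])  # perfectly even sharing
skewed = jain_index([40, 0, 0, 0])       # one channel carries all traffic
print(balanced, skewed)
```

A channel sharing method that balances loads better pushes the index toward 1, which is the sense in which a higher and more consistent index is reported.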
A historical review of research activities in Taiwan on the properties of PECVD-grown SiOx is also included, to assess the performance of Si-nanocrystal-based MOSLEDs made from Si-rich SiOx films with embedded Si nanocrystals on conventional Si substrates. Surface nano-roughened Si substrates with interfacial Si nano-pyramids at the SiOx/Si interface are also reviewed; they enhance both the relaxation of total internal reflection induced by surface roughness and the carrier injection based on Fowler-Nordheim tunneling. These structures enable light emission and extraction from a metal-SiOx-Si MOSLED.
Yuki KOBAYASHI Murali JAYAPALA Praveen RAGHAVAN Francky CATTHOOR Masaharu IMAI
Clustering L0 buffers is effective for energy reduction in the instruction memory caches of embedded VLIW processors. However, the efficiency of the clustering depends on the schedule of the target application. To improve the energy efficiency of L0 clusters, operation shuffling has been proposed, which explores the assignment of operations for each cycle, generates various schedules, and evaluates them to find an energy-efficient schedule. This approach can find energy-efficient schedules, but it takes a long time to obtain the final result. In this paper, we propose a new method to directly generate an energy-efficient schedule without iterating operation shuffling. In the proposed method, a compiler schedules operations using the result of a single operation shuffling as a constraint. We propose several optimization algorithms to generate an energy-efficient schedule for a given L0 cluster organization. The proposed method drastically reduces the computational effort since it performs operation shuffling only once. The experimental results show that the proposed method achieves comparable energy reduction while significantly reducing the computational effort compared with conventional operation shuffling.
Tadayoshi HORITA Yuuji KATOU Itsuo TAKANAMI
This paper deals with redundant 3D mesh processor arrays using 1.5-track switches (1.5-TSs), considering track and switch faults together with processor faults. Four variants are defined based on the distribution of spare PEs; three of the variants have the same PE redundancy, but their fabrication-time costs differ. We investigate in detail how the reliability of the total system changes with the reliabilities of tracks and switches as well as PEs, and give concrete values of Mt and Ms for the case where the array reliability is almost the same regardless of the variant and for the case where it is not. Here, Mt is the ratio of the hardware complexity of a PE to that of a track, and Ms is the ratio of the hardware complexity of a PE to that of a contact point of a switch. Other results that provide an effective basis for the design of fault-tolerant 3D PE arrays using 1.5-TSs are also given.
Toshihide AJIKI Toyohiko ISHIHARA
We have derived the novel extended UTD (Uniform Geometrical Theory of Diffraction) solution and the novel modified UTD solution for the back scattering of an incident whispering gallery (WG) mode on the edge of a cylindrically curved conducting sheet. By comparing with the reference solution obtained from the integral representation of the scattered field by integrating numerically along the integration path, we have confirmed the validity and the utility of the novel asymptotic solutions proposed in the present study. It is shown that the extended UTD solution can be connected smoothly to the modified UTD solution on the geometrical boundary separating the edge-diffracted ray and the surface-diffracted ray.
This letter deals with blind multiuser detection based on the multi-channel linearly constrained constant modulus algorithm (MLCCMA) for asynchronous code division multiple access (CDMA) systems over frequency-selective Rayleigh fading channels. In conjunction with the decision-feedback generalized sidelobe canceller (DFGSC), we present an efficient approach to combat multiple access interference and intersymbol interference. Computer simulations confirm that the proposed MLCCMA-based DFGSC can significantly speed up convergence and improve the output performance.
Byeong-Seok SHIN Dong-Ryeol OH Daniel KANG
Because of its simplicity and intuitive approach, point-based rendering has been a very popular research area. Recent approaches have focused on hardware-accelerated techniques. By applying a deferred shading scheme, both high-quality images and high-performance rendering have been achieved. However, previous methods showed problems related to depth-based visibility computation. We propose an extended point-based rendering method using a visibility map. In our method we employ a distance-based visibility technique (replacing depth-based visibility), an averaged position map and an adaptive fragment processing scheme, resulting in more accurate and improved image quality, as well as improved rendering performance.
Fault tolerance is an important design issue in building a reliable mobile computing system. This paper considers checkpointing recovery services for a mobile computing system based on the ad-hoc network environment. Since the potential problems of this new environment are insufficient power and limited storage capacity, the proposed scheme tries to reduce both the frequency of disk accesses for saving recovery information and the amount of information saved for recovery. A brief simulation study has been performed, and the results show that the proposed scheme has advantages over existing checkpointing recovery schemes.
Takao FUJII Isao OHTA Tadashi KAWAI Yoshihiro KOKUBO
This paper presents several artificial coplanar waveguide structures with very slow phase velocity and their application to the design of compact 3-dB branch-line couplers. The slow-wave structure is constructed by periodically loading both series inductance and shunt capacitance. First, a basic miniature branch-line coupler is designed, and a considerable size reduction of about 1/4 is obtained. Next, a broadband design technique is described that uses open-circuited quarter-wavelength series stubs added at each port as a matching network. By reducing the size of the series stubs and branch-line sections, a very compact broadband coupler with good hybrid performance over a wide bandwidth of 31 percent or more is realized. The design concepts and procedures are verified both numerically and experimentally.
Kohei HOSOKAWA Katsunori TANAKA Yuichi NAKAMURA
FPGA-based hardware emulators are often used for the verification of LSI functions. They generally have dedicated external memories, such as SDRAMs, to compensate for the lack of memory capacity in FPGAs. In such a case, access between the FPGAs and the dedicated external memory may represent a major bottleneck with respect to emulation speed, since the dedicated external memory may have to emulate a large number of memory blocks. In this paper, we propose three methods, "Dynamic Clock Control (DCC)," "Memory Mapping Optimization (MMO)," and "Efficient Access Scheduling (EAS)," to avoid this bottleneck. DCC controls the emulation clock dynamically in accordance with the number of memory accesses within one emulation clock cycle. EAS optimizes the ordering of memory accesses to the dedicated external memory, and MMO optimizes the arrangement of the dedicated external memory addresses at which the respective memories are emulated. With these methods, emulation speed can be made 29.0 times faster, as evaluated in actual LSI emulations.
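To make the idea behind DCC concrete, the following toy model compares a fixed emulation clock, which must be slow enough for the worst-case number of external memory accesses in any cycle, with a clock whose period is stretched only in the cycles that actually need it. This is a sketch under stated assumptions, not the paper's implementation: the linear timing model, the function names, and all numbers are hypothetical.

```python
def static_clock_time(accesses_per_cycle, logic_ns, access_ns):
    """Total time when every cycle uses the worst-case clock period."""
    worst = max(accesses_per_cycle)
    period = logic_ns + worst * access_ns  # assumed linear timing model
    return period * len(accesses_per_cycle)


def dynamic_clock_time(accesses_per_cycle, logic_ns, access_ns):
    """Total time when each cycle is stretched only as far as it needs."""
    return sum(logic_ns + n * access_ns for n in accesses_per_cycle)


# Hypothetical trace: external-memory accesses in each emulation cycle.
trace = [0, 0, 8, 1, 0, 2, 0, 0]
static_ns = static_clock_time(trace, logic_ns=100, access_ns=50)
dynamic_ns = dynamic_clock_time(trace, logic_ns=100, access_ns=50)
print(static_ns, dynamic_ns, static_ns / dynamic_ns)
```

In this model the benefit grows when memory accesses are bursty, because a single heavy cycle no longer dictates the period of every other cycle.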
Takefumi MIYOSHI Nobuhiko SUGINO
For a coarse-grain dynamically reconfigurable processing unit cooperating with a general-purpose processor, a context selection method that can reduce the total execution cycles of a given program is proposed. The method evaluates context candidates from a given program in terms of the reduction in cycles obtained by exploiting parallel and pipelined execution on the reconfigurable processor. According to this evaluation measure, the method selects appropriate contexts for the dynamically reconfigurable processing unit. The proposed method is implemented within the framework of the COINS project. For several example programs, the generated code is evaluated by a software simulator in terms of execution cycles, and the results demonstrate the effectiveness of the proposed method.
Hiroaki TANAKA Yoshinori TAKEUCHI Keishi SAKANUSHI Masaharu IMAI Hiroki TAGAWA Yutaka OTA Nobu MATSUMOTO
SIMD instructions are often implemented in modern multimedia-oriented processors. Although SIMD instructions are useful for many digital signal processing applications, most compilers do not exploit them. The difficulty in utilizing SIMD instructions stems from data parallelism in registers: in assembly code generation, the positions of data in registers must be tracked. A technique for generating pack instructions, which pack or reorder data in registers, is essential for exploiting SIMD instructions. This paper presents a code generation technique for SIMD instructions with pack instructions. SIMD instructions are generated by finding and grouping the same operations in programs. After the SIMD instruction generation, pack instructions are generated. In the pack instruction generation, a Multi-valued Decision Diagram (MDD) is introduced to represent and manipulate sets of packed data. Experimental results show that the proposed code generation technique can generate assembly code with SIMD and pack instructions performing repacking of 8 packed data in registers for a RISC processor with a dual-issue coprocessor which supports SIMD and pack instructions. The proposed method achieved a speedup ratio of up to about 8.5 through the SIMD instructions and the multiple-issue mechanism of the target processor.
Because the leakage current of a digital circuit depends on the states of the circuit's logic gates, assigning a minimum leakage vector (MLV) to the primary inputs and the flip-flops' outputs of a circuit operating in sleep mode is a popular technique for leakage current reduction. In this paper, we propose a novel probability-based algorithm and technique that can rapidly find an MLV. Unlike most traditional techniques, which ignore the leakage current overhead of the newly added vector controller, our technique takes this overhead into account. Ignoring this overhead during solution space exploration may cause a non-optimal solution to be mistaken for an optimal one. Experimental results show that our heuristic algorithm can reduce the leakage current by up to 59.5% and can find the optimal solutions on most of the small MCNC benchmark circuits. Moreover, the CPU time required by our probability-based program is significantly less than that of a random search program.
Farhad MEHDIPOUR Hamid NOORI Morteza SAHEB ZAMANI Koji INOUE Kazuaki MURAKAMI
Extracting frequently executed (hot) portions of an application and executing their corresponding data flow graphs (DFGs) on a hardware accelerator yields speedup and energy savings for embedded systems comprising a base processor integrated with a tightly coupled accelerator. Extending DFGs to support control instructions, that is, using Control DFGs (CDFGs) instead of DFGs, allows a larger portion of the application code to be accelerated and hence yields more speedup and energy savings. In this paper, the motivations for extending DFGs to CDFGs and handling control instructions are introduced. In addition, basic requirements for an accelerator with conditional execution support are proposed. Then, two algorithms are presented for temporal partitioning of CDFGs that consider the architectural constraints of the target accelerator. To demonstrate the effectiveness of the proposed ideas, they are applied to the accelerator of a reconfigurable processor called AMBER. Experimental results confirm the effectiveness of covering control instructions and using CDFGs rather than DFGs in terms of performance and energy reduction.
Miyuki HANAOKA Makoto SHIMAMURA Kenji KONO
Exploiting layer7 context is an effective approach to improving the accuracy of detecting malicious messages in network intrusion detection/prevention systems (NIDS/NIPSs). Layer7 context enables us to inspect message formats and the order in which messages are exchanged. Unfortunately, layer7-aware NIDS/NIPSs pose crucial implementation issues because they require full TCP and IP reassembly without losing 1) complete prevention, 2) performance, 3) application transparency, or 4) transport transparency. Complete prevention means that the NIDS/NIPS should prevent malicious messages from reaching target applications. Application transparency means not requiring any modification or reconfiguration of server and client applications. Transport transparency means not disrupting the end-to-end semantics of TCP/IP. To the best of our knowledge, none of the existing approaches meets all of these requirements. We have developed an efficient mechanism for layer7-aware NIDS/NIPSs that does meet the above requirements. Our store-through mechanism does this by forwarding each out-of-order or IP-fragmented packet immediately after copying it, even if it has not yet been checked by an NIDS/NIPS sensor. Although the forwarded packet might turn out to be part of an attack message, the store-through mechanism can still defend against the attack by blocking one of the subsequent packets that contains another part of the attack message. Testing of a prototype in Linux kernel 2.4.30 demonstrated that the overhead of our mechanism is negligible compared with that of a simple IP forwarder, even in the presence of out-of-order and IP-fragmented packets. In addition, the experimental results suggest that the CPU and memory usage incurred by our store-through mechanism is not significant.
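The store-through idea described above can be illustrated with a small sketch: each packet is copied into a reassembly store and forwarded at once, and only the packet whose arrival completes a malicious byte sequence is blocked. This is a simplified illustration, not the authors' kernel implementation; the class name, the single byte-string signature, and the assumption of non-overlapping segments that start at stream offset 0 are all hypothetical.

```python
class StoreThroughSketch:
    """Toy model: copy every segment, forward it unless it completes an attack."""

    def __init__(self, signature: bytes):
        self.signature = signature
        self.segments = {}  # stream offset -> copied payload

    def _contiguous_prefix(self) -> bytes:
        # Reassemble only the part of the stream with no gaps so far.
        data = bytearray()
        offset = 0
        while offset in self.segments:
            chunk = self.segments[offset]
            data += chunk
            offset += len(chunk)
        return bytes(data)

    def on_packet(self, offset: int, payload: bytes) -> bool:
        """Return True to forward the packet, False to block it."""
        self.segments[offset] = bytes(payload)  # copy before deciding
        if self.signature in self._contiguous_prefix():
            del self.segments[offset]  # the receiver never gets this piece
            return False
        return True


sensor = StoreThroughSketch(b"ATTACK")
# The second half arrives first (out of order) and is forwarded unchecked;
# the first half then completes the signature and is blocked.
decisions = [sensor.on_packet(3, b"ACK"), sensor.on_packet(0, b"ATT")]
print(decisions)
```

Because the receiver's own TCP/IP stack cannot deliver the message to the application until every part has arrived, withholding one part is enough to keep the complete attack message from reaching the application in this model.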
Augusto FORONDA Yuhi HIGUCHI Chikara OHTA Masahiko YOSHIMOTO Yoji OKADA
IEEE 802.11e Medium Access Control (MAC) is a supplement to the IEEE 802.11 Wireless Local Area Network (WLAN) standard to support Quality of Service (QoS). The 802.11e MAC defines a new coordination function, namely the Hybrid Coordination Function (HCF), which takes the QoS requirements of flows into account and allocates Transmission Opportunities (TXOPs) to stations. Because it is based on the mean sending rate, the reference HCF scheduling algorithm proposed in this supplement cannot bound the delay of Variable Bit Rate (VBR) traffic. In this paper, we propose a new Connection Admission Control (CAC) and a scheduling algorithm that utilize the token bucket and a modified Latency-Rate (LR) scheduling algorithm to guarantee a bounded delay for HCF Controlled Channel Access (HCCA). The new Service Interval (SI) is calculated to optimize the number of stations accommodated and takes into account the delay bound and the token bucket parameters. We show that it is possible to obtain worst-case performance guarantees on delay. We first analyze the behavior of the new scheduler with a loss-free wireless channel model and then with a burst loss model, and we explain how this scheduler can be extended to a multi-rate scheme. Properties of the proposal are investigated both theoretically and using ns-2 simulations. We present a set of simulations with both Constant Bit Rate (CBR) and VBR flows and performance comparisons with the HCF scheduling algorithm. The results show that the delay upper bound can be achieved over a large range of network loads with bandwidth optimization.
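As background for the token bucket and Latency-Rate (LR) terms used above, the sketch below shows a plain token bucket conformance check together with the classical LR-server delay bound for a flow shaped by a token bucket of depth sigma and rate rho, namely D <= sigma/rho + theta, where theta is the server latency. It illustrates the textbook result only, not the modified LR scheduler or the SI computation proposed in the paper; the class name, function name, and all parameter values are hypothetical.

```python
class TokenBucket:
    """Token bucket with fill rate `rate_bps` and depth `depth_bits`."""

    def __init__(self, rate_bps, depth_bits):
        self.rate = rate_bps
        self.depth = depth_bits
        self.tokens = depth_bits  # bucket starts full
        self.last = 0.0

    def conforms(self, now, size_bits):
        # Refill for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bits <= self.tokens:
            self.tokens -= size_bits
            return True
        return False


def lr_delay_bound(depth_bits, rate_bps, latency_s):
    """Classical LR-server bound: sigma / rho + theta (in seconds)."""
    return depth_bits / rate_bps + latency_s


bucket = TokenBucket(rate_bps=1_000_000, depth_bits=12_000)
results = [
    bucket.conforms(0.000, 12_000),  # full bucket: burst is allowed
    bucket.conforms(0.001, 12_000),  # only about 1,000 tokens refilled
    bucket.conforms(0.020, 12_000),  # bucket has refilled to its depth
]
bound = lr_delay_bound(depth_bits=12_000, rate_bps=1_000_000, latency_s=0.020)
print(results, bound)
```

The bound shows why token bucket parameters matter for admission control: a deeper bucket (larger bursts) or a larger scheduler latency directly raises the worst-case delay.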
Xiaoming TAO Chao ZHANG Jianhua LU
Doppler diversity has been proven effective in combating the time variation caused by Doppler spread in single-carrier systems. However, directly applying Doppler diversity to Multi-Carrier Code Division Multiple Access (MC-CDMA) systems is not efficient because Inter-Carrier Interference (ICI) increases with the artificial frequency shifts in the diversity branches. In this paper, a novel Doppler diversity scheme for MC-CDMA using Three Zero Correlation Zones (T-ZCZ) sequences is proposed to further improve the performance of Doppler diversity. In particular, zero correlation zones are employed in the frequency domain to cancel the ICI caused by Doppler spread, which confirms the validity of the contribution to wideband wireless communications in high-speed mobile environments.
Yusuke ASAI Wenjie JIANG Takeshi ONIZAWA
This paper describes the experimental evaluation of a testbed with a simple decision-feedback channel tracking scheme for MIMO-OFDM systems. The channel tracking scheme periodically estimates the channel state matrix for each subcarrier from received signals and replicas of the transmitted signal. The estimated channel state matrices, which are obtained at mutually different timings, are combined based on maximum ratio combining and used for MIMO signal detection. The testbed was implemented on field programmable gate arrays (FPGAs) of 1/5 scale, which confirms the implementation feasibility of the channel tracking scheme. The packet error rate (PER) and mobility performance of the testbed were measured. The testbed employed a 2×2 MIMO channel, zero-forcing algorithm for MIMO signal detection, 16QAM for the subcarrier modulation scheme, and coding rate of 1/2. The proposed scheme suppressed the increase in the required SNR for PER of 10^-2 to less than 1 dB when the relative velocity between the transmitter and the receiver was less than 45 km/h assuming 5 GHz band operation. In addition, the proposed scheme offers 6.3% better throughput than the conventional scheme. The experimental results demonstrate that the channel tracking scheme implemented in the testbed effectively tracks the fluctuation of a MIMO channel.
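Zero-forcing detection, as named above, recovers the transmitted symbols on one subcarrier by applying the inverse of the estimated channel matrix to the received vector. The following is a minimal noiseless sketch for a two-transmit, two-receive channel; the function name, the channel coefficients, and the symbol values are hypothetical and are not taken from the testbed.

```python
def zero_forcing_2x2(h, y):
    """Solve y = H x for x, where H is a 2x2 complex channel matrix."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (nearly) singular")
    # Inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    x0 = (d * y[0] - b * y[1]) / det
    x1 = (-c * y[0] + a * y[1]) / det
    return [x0, x1]


# Hypothetical channel estimate for one subcarrier.
H = [[1 + 1j, 0.5 + 0j],
     [0.2j, 1 - 0.5j]]
# Two 16QAM-like symbols, one per transmit antenna.
sent = [3 + 1j, -1 - 3j]
# Noiseless received vector y = H x.
received = [H[0][0] * sent[0] + H[0][1] * sent[1],
            H[1][0] * sent[0] + H[1][1] * sent[1]]
detected = zero_forcing_2x2(H, received)
print(detected)
```

The sketch also shows why channel tracking matters: the detector inverts the estimated channel, so if the true channel has drifted since the estimate was taken, the recovered symbols are distorted even without noise.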