Yu WANG Tao LU Feng YAO Yuntao WU Yanduo ZHANG
In recent years, single face image super-resolution (SR) using deep neural networks has been well developed. However, most of the face images captured by a camera in a real scene show the same person from different views, and existing traditional multi-frame image SR requires alignment between images. Because multi-view face images contain texture information from different views, which can serve as effective prior information, the challenge is how to use this multi-view prior information to reconstruct frontal face images. To address these problems, we propose a novel face SR network based on multi-view face images, which focuses on obtaining more texture information from multi-view face images to help the reconstruction of frontal face images. In this network, we also propose a texture attention mechanism that transfers high-precision texture compensation information to the frontal face image to obtain better visual effects. We conduct subjective and objective evaluations, and the experimental results show the great potential of multi-view face image SR. The comparison with other state-of-the-art deep learning SR methods shows that the proposed method has excellent performance.
Akira KITAYAMA Goichi ONO Tadashi KISHIMOTO Hiroaki ITO Naohiro KOHMU
Reducing power consumption is crucial for edge devices using convolutional neural networks (CNNs). The zero-skipping approach for CNNs is a processing technique widely known for its relatively low power consumption and high speed. This approach stops multiplication and accumulation (MAC) when the multiplication results of the input data and weight are zero. However, this technique requires large logic circuits with around 5% overhead, and the average rate of MAC stopping is approximately 30%. In this paper, we propose a precise zero-skipping method that uses input data and simple logic circuits to stop multipliers and accumulators precisely. We also propose an active data-skipping method that further reduces power consumption at the cost of slightly degraded recognition accuracy. In this method, each multiplier and accumulator is also stopped when the input is a small value (e.g., 1, 2). We implemented the single shot multi-box detector 500 (SSD500) network model on a Xilinx ZU9 and applied our proposed techniques. We verified that operations were stopped at a rate of 49.1%, recognition accuracy was degraded by 0.29%, power consumption was reduced from 9.2 to 4.4 W (-52.3%), and circuit overhead was reduced from 5.1 to 2.7% (-45.9%). The proposed techniques were determined to be effective for lowering the power consumption of CNN-based edge devices such as FPGAs.
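The skipping idea in the abstract above can be illustrated in software. The following is a minimal sketch, not the authors' circuit: the function name `mac_with_skipping`, the `threshold` parameter, and the sample data are all hypothetical. A threshold of 0 models zero-skipping on the input data, and a threshold of 1 or 2 models active data-skipping, where small inputs are treated as if they were zero.

```python
def mac_with_skipping(inputs, weights, threshold=0):
    """Multiply-accumulate that skips inputs whose magnitude is <= threshold.

    Hypothetical software analogue of input-based skipping:
    threshold=0 skips only zero inputs; threshold>0 also skips small inputs,
    which changes the result slightly (an approximation).
    Returns (accumulated sum, fraction of operations skipped).
    """
    if not inputs:
        return 0, 0.0
    acc = 0
    skipped = 0
    for x, w in zip(inputs, weights):
        if abs(x) <= threshold:
            skipped += 1  # multiplier and accumulator stay idle for this pair
            continue
        acc += x * w
    return acc, skipped / len(inputs)


# Illustrative data only (not taken from the paper).
inputs = [0, 3, 1, 0, 5, 2, 0, 7]
weights = [4, -1, 2, 6, 1, -3, 2, 1]

exact = mac_with_skipping(inputs, weights, threshold=0)
approx = mac_with_skipping(inputs, weights, threshold=1)
```

With a threshold of 0 the accumulated value is identical to a plain MAC, while a larger threshold raises the skip rate and introduces a small error, which mirrors the trade-off between stopped operations and recognition accuracy described above.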
Haitong YANG Guangyou ZHOU Tingting HE Maoxi LI
Current approaches to semantic role classification usually first define a representation vector for a candidate role and then feed the vector into a deep neural network to perform classification. The representation vector contains lexicalization features such as word embeddings and lemma embeddings. From a linguistic perspective, the semantic role frame of a sentence is a joint structure with strong dependencies between arguments, which current deep SRL systems do not consider. Therefore, this paper proposes a global deep reranking model to exploit these strong dependencies. The evaluation experiments on the CoNLL 2009 shared tasks show that our system significantly outperforms a strong local system that does not consider role dependency relations.
A non-volatile memory (NVM) employing a magnetic tunnel junction (MTJ) has many strong points, such as read/write performance, high endurance, and operating-voltage compatibility with standard CMOS. However, it consumes a lot of energy when writing data. This becomes an obstacle when applying it to battery-operated mobile devices. To solve this problem, we propose an approach to augment the capability of the precision scaling technique for the write operation in NVM. Precision scaling is an approximate computing technique that reduces the bit width of data (i.e., precision) to save energy. When writing image data to NVM with precision scaling, the write energy and the image quality change according to the write time and the target bit range. We propose an energy-efficient approximate storing scheme for non-volatile flip-flops and a magnetic random-access memory (MRAM) that writes the data by optimizing the bit positions at which the data is split and the write time for each bit range. By using a statistical model, we obtained optimal values for the write time and the targeted bit range under the trade-off between write energy reduction and image quality degradation. Simulation results have demonstrated that by using these optimal values the write energy can be reduced by up to 50% while maintaining acceptable image quality. We also investigated in detail the relationship between the input images and the output image quality when using this approach. In addition, we evaluated the energy benefits when applying our approach to nine types of image processing including linear filters and edge detectors. Results showed that the write energy is reduced by a further 12.5% at the maximum.
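The basic precision scaling step can be sketched as follows. This is only the simplest form of the technique, assuming 8-bit unsigned pixels whose lowest bits are dropped entirely; it does not model the write time per bit range that the proposed scheme optimizes. The function names `scale_precision` and `psnr` and the sample pixels are hypothetical.

```python
import math


def scale_precision(pixels, dropped_bits):
    """Zero the lowest `dropped_bits` bits of each unsigned pixel value."""
    return [(p >> dropped_bits) << dropped_bits for p in pixels]


def psnr(original, approx, peak=255):
    """Peak signal-to-noise ratio in dB, a common image quality measure."""
    mse = sum((o - a) ** 2 for o, a in zip(original, approx)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Illustrative 8-bit pixel values (not taken from the paper).
pixels = [200, 13, 255, 128]
approx = scale_precision(pixels, dropped_bits=3)  # keep the upper 5 bits
quality_db = psnr(pixels, approx)
```

Dropping more bits means fewer bits to write but a lower PSNR, which is the kind of energy versus quality trade-off the abstract refers to.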
For the first stage of the multi-sensitive bucketization (MSB) method, the l-diversity grouping for multiple sensitive attributes is incomplete, causing more information loss. To solve this problem, we give the definitions of the l-diversity avoidance set for multiple sensitive attributes and the avoiding of a multiple dimensional bucket, and propose a complete l-diversity grouping (CLDG) algorithm for multiple sensitive attributes. Then, we improve the first stages of the MSB algorithms by applying the CLDG algorithm to them. The experimental results show that the grouping ratio of the improved first stages of the MSB algorithms is significantly higher than that of the original first stages of the MSB algorithms, decreasing the information loss of the published microdata.
Toshishige SHIMAMURA Hiroki MORIMURA
A new threshold circuit technique is proposed for a vibration sensing circuit that operates at a nanowatt power level. Sensing circuits that use sample-and-hold require a clock signal, and they consume power to generate that signal. When a Schmitt trigger circuit, which does not use a clock signal, is used instead, a sink current flows when thresholding the analog signal output. The requirements for millimeter-sized wireless sensor nodes are an average power on the order of a nanowatt and a signal transition time of less than 1 ms. To meet these requirements, our circuit limits the sink current with a nanoampere-level current source. The chattering caused by current limiting is suppressed by feeding back the change in output voltage to the limiting current. The increase in the signal transition time that is caused by current limiting is reduced by accelerating the discharge of the load capacitance. For a test chip fabricated in the 0.35-µm CMOS process, the proposed threshold circuits operate without chattering and the average powers are 0.7-3 nW. The signal transition times are estimated in a circuit simulation to be 65-97 µs. The proposed circuit has 1/150th the power-delay product with no time interval of the sensing operation under the condition that the time interval is 1 s. These results indicate that the proposed threshold circuits are suitable for vibration sensing in millimeter-sized wireless sensor nodes.
The purpose of this paper is to find an automated pricing algorithm that calculates the real cost of each product by considering the associated costs of the business. The methodology consists of two main stages: a brief semi-structured survey, and a mathematical calculation of the expenses, which are added to the original cost of the offered products and services. The output of this process is the minimum recommended selling price (MRSP) that the business should not go below, in order to increase the likelihood of generating profit and avoid unexpected losses. The contribution of this study lies in filling this gap by calculating the minimum recommended price automatically and assisting businesses in foreseeing future budgets. This contribution has a limitation: it cannot calculate the MRSP of products created in-house from raw materials. It calculates the MRSP only for products bought from a wholesaler to be sold by a retailer.
Hong-Li WANG Li-Li FAN Gang WANG Lin-Zhi SHEN
In the letter, two classes of optimal codebooks and asymptotically optimal codebooks in regard to the Levenshtein bound are presented, which are based on mutually unbiased bases (MUB) and approximately mutually unbiased bases (AMUB), respectively.
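As background for the abstract above, two orthonormal bases of an N-dimensional complex space are mutually unbiased when every inner product between a vector of one basis and a vector of the other has magnitude 1/sqrt(N). The sketch below is not the letter's construction; it only checks this property for the standard basis and the Fourier basis, a well-known mutually unbiased pair. The function names `mub_codebook` and `max_cross_correlation` are hypothetical.

```python
import cmath
import math


def mub_codebook(n):
    """Union of the standard basis and the normalized Fourier basis of C^n."""
    standard = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    fourier = [
        [cmath.exp(2j * cmath.pi * i * k / n) / math.sqrt(n) for i in range(n)]
        for k in range(n)
    ]
    return standard + fourier


def max_cross_correlation(codebook):
    """Largest |<u, v>| over all pairs of distinct codewords."""
    best = 0.0
    for a in range(len(codebook)):
        for b in range(a + 1, len(codebook)):
            inner = sum(x * y.conjugate() for x, y in zip(codebook[a], codebook[b]))
            best = max(best, abs(inner))
    return best


n = 4
codebook = mub_codebook(n)
i_max = max_cross_correlation(codebook)  # expected to be close to 1/sqrt(n)
```

Codewords within the same basis are orthogonal, so the maximum cross-correlation amplitude of this 2N-vector codebook is determined by the cross-basis pairs.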
Dongzhen WANG Daqing HUANG Cheng XU
Cooperative reconnaissance by two unmanned aerial vehicles (UAVs) equipped with airborne visual tracking platforms is a common practice for localizing a target. Apart from the random noise from sensors, the localization performance depends heavily on their cooperative trajectories. In our previous work, we proposed a cooperative trajectory generating method that proved better than an EKF-based method. In this letter, an improved online trajectory generating method is proposed to enhance the previous one. First, the least-squares estimation method proposed in our previous work is replaced with a geometric-optimization-based estimation method, which achieves better estimation performance; second, in the trajectory optimization phase, the position error caused by the estimation method is also considered, which further improves the optimization of the next waypoints of the two UAVs. The improved method is well suited to two-UAV trajectory planning for cooperative target localization, and the simulation results confirm that the improved method achieves clearly better localization performance than our previous method and the EKF-based method.
Shakhnaz AKHMEDOVA Vladimir STANOVOV Sophia VISHNEVSKAYA Chiori MIYAJIMA Yukihiro KAMIYA
This study focuses on the automated detection of a complex system operator's condition. As an example, a person's reaction while listening to music (or not listening at all) was determined. For this purpose, various well-known data mining tools as well as ones developed by the authors were used. More specifically, the following techniques were developed and applied to the mentioned problems: artificial neural networks and fuzzy rule-based classifiers. The neural networks were generated by two modifications of the Differential Evolution algorithm based on the NSGA and MOEA/D schemes, proposed for solving multi-objective optimization problems. Fuzzy logic systems were generated by the population-based algorithm called Co-Operation of Biology Related Algorithms, or COBRA. Before classification, each person's state was monitored; the databases for the problems described in this study were obtained by using non-contact Doppler sensors. Experimental results demonstrated that automatically generated neural networks and fuzzy rule-based classifiers can properly determine the human condition and reaction. Moreover, the proposed approaches outperformed alternative data mining tools. It was also established that fuzzy rule-based classifiers are more accurate and interpretable than neural networks. Thus, they can be used for solving more complex problems related to the automated detection of an operator's condition.
Cuffless blood pressure (BP) monitors are noninvasive devices that measure systolic and diastolic BP without an inflatable cuff. They are easy to use, safe, and relatively accurate for resting-state BP measurement. Although commercially available from online retailers, BP monitors must be approved or certified by medical regulatory bodies for clinical use. Cuffless BP monitoring devices also need to be approved; however, only the Institute of Electrical and Electronics Engineers (IEEE) certifies these devices. In this paper, the principles of cuffless BP monitors are described, and the current situation regarding BP monitor standards and approval for medical use is discussed.
Hideya SO Takafumi FUJITA Kento YOSHIZAWA Maiko NAYA Takashi SHIMIZU
This paper proposes a novel radio access scheme that uses duplicated transmission via multiple frequency channels to achieve mission critical Internet of Things (IoT) services requiring highly reliable wireless communications; the interference constraints that yield the required reliability are revealed. To achieve mission critical IoT services by wireless communication, it is necessary to improve reliability in addition to satisfying the required transmission delay time. Reliability is defined as the rate of packets that arrive without exceeding the desired transmission delay time. Traffic within the system itself and interference from other systems using the same frequency channel, as in unlicensed bands, degrade the reliability. One solution is a frequency/time diversity technique. However, such techniques may not achieve the required reliability because of the time taken to achieve correct reception. This paper proposes a novel scheme that transmits duplicate packets utilizing multiple wireless interfaces over multiple frequency channels. It also proposes a suppressed duplicate transmission (SDT) scheme, which prevents the wastage of radio resources. The proposed scheme achieves the same reliability as the conventional scheme but has higher tolerance against interference than retransmission. We evaluate the relationship between the reliability and the interference occupation time ratio, which is defined as the usage ratio of the frequency resources occupied by the other systems. We reveal the upper bound of the interference occupation time ratio for each frequency channel, which is needed if channel selection control is to achieve the required reliability.
Nobutaka SUZUKI Takuya OKADA Yeondae KWON
Cascading Style Sheets (CSS) is a popular language for describing the styles of XML documents as well as HTML documents. To resolve conflicts among CSS rules, CSS has a mechanism called specificity. For a DTD D and a CSS code R, due to specificity R may contain “unsatisfiable” rules under D, e.g., rules that are not applied to any element of any document valid for D. In this paper, we consider the problem of detecting unsatisfiable CSS rules under DTDs. We focus on CSS fragments in which descendant, child, adjacent sibling, and general sibling combinators are allowed. We show that the problem is coNP-hard in most cases, even if only one of the four combinators is allowed and under very restricted DTDs. We also show that the problem is in coNP or PSPACE depending on restrictions on DTDs and CSS. Finally, we present four conditions under which the problem can be solved in polynomial time.
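To illustrate how specificity can leave a rule without effect, the sketch below computes a simplified specificity triple (ID selectors, class selectors, type selectors) for selectors built from the four combinators mentioned above, and picks the winning rule among rules assumed to match the same element. It is not the paper's detection algorithm, it ignores attribute selectors, pseudo-classes, and `!important`, and the names `specificity` and `winning_rule` are hypothetical.

```python
import re


def specificity(selector):
    """Simplified CSS specificity as (ids, classes, types).

    Handles only type, class, and ID selectors joined by the descendant
    (space), child (>), adjacent sibling (+), and general sibling (~)
    combinators.
    """
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # A type selector is a name at the start or right after a combinator.
    types = len(re.findall(r"(?:^|[\s>+~])([a-zA-Z][\w-]*)", selector))
    return (ids, classes, types)


def winning_rule(rules):
    """Return the (selector, declaration) pair that applies to an element.

    `rules` lists rules in source order that all match the same element and
    set the same property; higher specificity wins, and ties go to the later
    rule.
    """
    index, rule = max(
        enumerate(rules), key=lambda pair: (specificity(pair[1][0]), pair[0])
    )
    return rule


rules = [("b", "color: blue"), ("a b", "color: red")]
applied = winning_rule(rules)
```

If a DTD forced every `b` element to appear inside an `a` element, the rule with selector `b` in this example would lose to `a b` on every element it matches, which is the flavor of unsatisfiability the abstract describes.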
As NAND flash-based storage has become established, the flash translation layer (FTL) has been in charge of mapping data addresses on NAND flash memory. FTLs have implemented various mapping schemes, and the amount of mapping data depends on the mapping level. However, the FTL must maintain mapping consistency irrespective of how much mapping data resides in the storage. Furthermore, the recovery cost caused by inconsistency needs to be considered for a faster storage reboot time. This letter proposes a novel method that enhances the consistency of a page-mapping level FTL running a legacy logging policy. Moreover, the recovery cost of page mappings also decreases. The novel method adopts a virtually-shrunk segment and deactivates page-mapping logs by assembling and storing the segments. In our previous study, this segment scheme already enabled embedded NAND flash-based storage to improve its response time. In addition to that improvement, the novel method maximizes the page-mapping consistency and therefore reduces the recovery cost compared with the legacy page-mapping FTL.
Lei YANG Tingxiao YANG Hiroki KIMURA Yuichiro YOSHIMURA Kumiko ARAI Taka-aki NAKADA Huiqin JIANG Toshiya NAKAGUCHI
In the medical field, detecting traumatic bleeding has always been a difficult task because of the small size and low contrast of the targets and the large number of images. In this work, we propose an automatic approach for detecting traumatic bleeding in contrast-enhanced CT images via deep CNNs, consisting of a segmentation process and a classification process. CT values of DICOM images are first extracted and processed with three different window settings. Small 3D patches are cropped from the processed images and segmented by a 3D CNN. The segmentation results are then converted to point cloud data and classified by a classifier. The proposed pre-processing approach enables the segmentation network to detect small, low-contrast targets and achieve high sensitivity. The additional classification network addresses the boundary problem and the short-sighted problem generated during the segmentation process to further decrease false positives. The proposed approach is tested with 3 CT cases containing 37 bleeding regions. A total of 34 bleeding regions are correctly detected, and the sensitivity reaches 91.89%. The average number of false positives per test case is 1678. The classification step removes 46.1% of the false positive predictions. The proposed method is shown to achieve high sensitivity and can serve as a reference for medical doctors.
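The window-setting step and the reported sensitivity can be sketched as follows. CT windowing clips Hounsfield unit (HU) values to a range defined by a window center and width and rescales them. The three (center, width) pairs in `WINDOWS` are hypothetical placeholders, since the abstract does not state the settings used, and the function names are assumptions as well.

```python
def apply_window(hu, center, width):
    """Clip a CT value (in HU) to a window and rescale it to [0, 1]."""
    lo = center - width / 2
    hi = center + width / 2
    clipped = min(max(hu, lo), hi)
    return (clipped - lo) / (hi - lo)


# Hypothetical (center, width) pairs; the paper's settings are not given here.
WINDOWS = [(40, 400), (60, 150), (300, 1500)]


def to_three_channels(hu):
    """Map one CT value to three channels, one per window setting."""
    return tuple(apply_window(hu, c, w) for c, w in WINDOWS)


# Sensitivity from the reported counts: 34 detected out of 37 bleeding regions.
sensitivity_percent = round(34 / 37 * 100, 2)
```

Stacking several windows gives the network views with different contrast ranges of the same voxel, and the last line reproduces the 91.89% sensitivity from the counts in the abstract.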
Kwangjin JEONG Masahiro YUKAWA
Multikernel adaptive filtering is an attractive nonlinear approach to online estimation/tracking tasks. Despite its potential advantages over its single-kernel counterpart, the use of inappropriately weighted kernels may result in a negligible performance gain. In this paper, we propose an efficient recursive kernel weighting technique for multikernel adaptive filtering to activate all the kernels. The proposed weights equalize the convergence rates of all the corresponding partial coefficient errors. The proposed weights are implemented via a certain metric design based on the weighting matrix. Numerical examples show, for synthetic and multiple real datasets, that the proposed technique performs better than manually tuned kernel weights and that it significantly outperforms the online multiple kernel regression algorithm.
In this paper, we present a novel portrait impression estimation method using nine pairs of semantic impression words: bitter-majestic, clear-pure, elegant-mysterious, gorgeous-mature, modern-intellectual, natural-mild, sporty-agile, sweet-sunny, and vivid-dynamic. In the first part of the study, we analyzed the relationship between the facial features in deformed portraits and the nine semantic impression word pairs over a large dataset, which we collected by a crowdsourcing process. In the second part, we leveraged the knowledge from the results of the analysis to develop a ranking network trained on the collected data and designed to estimate the semantic impression associated with a portrait. Our network demonstrated superior performance in impression estimation compared with current state-of-the-art methods.
Rintaro YANAGI Ren TOGO Takahiro OGAWA Miki HASEYAMA
Various cross-modal retrieval methods that can retrieve images related to a query sentence without text annotations have been proposed. Although a high level of retrieval performance is achieved by these methods, they have been developed for a single domain retrieval setting. When retrieval candidate images come from various domains, the retrieval performance of these methods might be decreased. To deal with this problem, we propose a new domain adaptive cross-modal retrieval method. By translating a modality and domains of a query and candidate images, our method can retrieve desired images accurately in a different domain retrieval setting. Experimental results for clipart and painting datasets showed that the proposed method has better retrieval performance than that of other conventional and state-of-the-art methods.
For 360-degree video streaming, a 360-degree video is divided temporally into segments (i.e., each a few seconds long). Each segment is divided spatially into multiple video tiles. In this paper, we propose a tile quality selection method for tile-based video streaming. The proposed method suppresses the spatial quality variation within the viewport caused by a change of the viewport region due to user head movement. In the proposed method, the client checks whether the difference in quality level between the viewport and the region around the viewport is large, and if so, reduces it when assigning quality levels. Simulation results indicate that when the segment length is long, quality variation can be suppressed without significantly reducing the perceived video quality (in terms of bitrate). In particular, the quality variation within the viewport can be greatly suppressed. Furthermore, we verify that the proposed method is effective in reducing quality variation within the viewport and across segments without changing the total download size.
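The gap check described above might look like the following sketch. It is only one possible reading of the abstract: quality levels are assumed to be integers, the parameter `max_gap` and the function name `reduce_quality_gap` are hypothetical, and the rule of alternately raising the surrounding level and lowering the viewport level is an assumption, not the paper's rule. The sketch does not track download size.

```python
def reduce_quality_gap(viewport_q, around_q, max_gap=1):
    """Shrink the quality-level gap between viewport tiles and nearby tiles.

    While the viewport level exceeds the surrounding level by more than
    `max_gap`, alternately raise the surrounding level and lower the
    viewport level (hypothetical rule for illustration).
    Returns the adjusted (viewport level, surrounding level).
    """
    raise_around_next = True
    while viewport_q - around_q > max_gap:
        if raise_around_next:
            around_q += 1
        else:
            viewport_q -= 1
        raise_around_next = not raise_around_next
    return viewport_q, around_q


adjusted = reduce_quality_gap(viewport_q=5, around_q=1, max_gap=1)
```

A smaller gap means that when the user's head moves and surrounding tiles enter the viewport, the quality seen inside the viewport varies less.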
Shiori YAMAGUCHI Keita HIRAI Takahiko HORIUCHI
In this study, we present a novel method for removing smoke from videos based on a single image sequence. Smoke is a significant artifact in images or videos because it can reduce the visibility in disaster scenes. Our proposed method for removing smoke involves two main processes: (1) the development of a smoke imaging model and (2) smoke removal using spatio-temporal pixel compensation. First, we model the optical phenomena in natural scenes including smoke, which is called a smoke imaging model. Our smoke imaging model is developed by extending conventional haze imaging models. We then remove the smoke from a video in a frame-by-frame manner based on the smoke imaging model. Next, we refine the appearance of the smoke-free video by spatio-temporal pixel compensation, where we align the smoke-free frames using the corresponding pixels. To obtain the corresponding pixels, we use SIFT and color features with distance constraints. Finally, in order to obtain a clear video, we refine the pixel values based on the spatio-temporal weightings of the corresponding pixels in the smoke-free frames. We used simulated and actual smoke videos in our validation experiments. The experimental results demonstrated that our method can obtain effective smoke removal results from dynamic scenes. We also quantitatively assessed our method based on a temporal coherence measure.
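For context, the conventional haze imaging model that the abstract says is extended expresses an observed intensity as I = J * t + A * (1 - t), where J is the scene radiance, t is the transmission, and A is the atmospheric light. The sketch below inverts this conventional model for a single pixel value. It is not the authors' smoke imaging model, whose extension is not described in the abstract, and the function names and the lower bound `t_min` are assumptions.

```python
def hazy_pixel(j, t, a):
    """Conventional haze model: observed intensity from radiance j,
    transmission t (0..1), and atmospheric light a."""
    return j * t + a * (1 - t)


def recover_pixel(i, t, a, t_min=0.1):
    """Invert the haze model for one pixel.

    The transmission is bounded below by `t_min` (an assumed safeguard)
    so that noise is not amplified where t is close to zero.
    """
    return (i - a) / max(t, t_min) + a


# Illustrative values only.
observed = hazy_pixel(j=0.2, t=0.5, a=0.9)
restored = recover_pixel(observed, t=0.5, a=0.9)
```

Applying such an inversion independently to each frame is what makes a later spatio-temporal refinement step useful, since per-frame estimates can flicker over time.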