
Keyword Search Results

[Keyword] MDP (18 hits)

Results 1-18 of 18
  • A POMDP-Based Approach to Assortment Optimization Problem for Vending Machine Open Access

    Gaku NEMOTO  Kunihiko HIRAISHI  

     
    PAPER-Mathematical Systems Science

    Publicized: 2023/09/05 | Vol: E107-A No:6 | Page(s): 909-918

    Assortment optimization is one of the main problems faced by retailers and has been widely studied. In this paper, we focus on vending machines, which present many characteristic issues to be considered. We first formulate an assortment optimization problem for vending machines, next propose a model that represents consumers' decision making, and then present a solution method based on a partially observable Markov decision process (POMDP). The problem involves incomplete state observation, stochastic consumer behavior, and policy decisions that maximize future expected rewards. Using computer simulation, we observe that sales increase compared to those of heuristic methods under the same conditions. Moreover, sales approach the theoretical upper bound.
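The Bayesian belief update at the heart of any POMDP solution method can be sketched generically; the transition and observation matrices below are toy placeholders, not the paper's vending-machine model:

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Standard POMDP belief update: b'(s') ∝ Z[a][s'][o] * Σ_s T[a][s][s'] * b(s).

    b: current belief over states, shape (S,)
    T: transition probabilities T[a][s][s'], shape (A, S, S)
    Z: observation probabilities Z[a][s'][o], shape (A, S, O)
    """
    predicted = b @ T[a]                  # predict: Σ_s b(s) * T[a][s][s']
    unnormalized = Z[a][:, o] * predicted # correct by observation likelihood
    return unnormalized / unnormalized.sum()

# Toy 2-state, 1-action, 2-observation example (hypothetical numbers)
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b = np.array([0.5, 0.5])
b_new = belief_update(b, a=0, o=0, T=T, Z=Z)
```

After observing o=0, which is more likely in state 0, the belief shifts toward state 0 while remaining a proper probability distribution.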

  • Deep Coalitional Q-Learning for Dynamic Coalition Formation in Edge Computing

    Shiyao DING  Donghui LIN  

     
    PAPER

    Publicized: 2021/12/14 | Vol: E105-D No:5 | Page(s): 864-872

    With the rapid growth of computation requirements in the Internet of Things, resource-limited edge servers usually need to cooperate to perform tasks. Most related studies assume a static cooperation approach, which may not suit the dynamic environment of edge computing. In this paper, we consider a dynamic cooperation approach that guides edge servers to form coalitions dynamically. This raises two issues: 1) how to guide them to form coalitions optimally, and 2) how to cope with the dynamic setting, where server statuses change as tasks are performed. The coalitional Markov decision process (CMDP) model proposed in our previous work can handle these issues well. However, its basic solution, coalitional Q-learning, cannot handle the large-scale problems that arise in edge computing when the number of tasks is large. In response, we propose a novel algorithm called deep coalitional Q-learning (DCQL). To sum up, we first formulate the dynamic cooperation problem of edge servers as a CMDP: each edge server is regarded as an agent, and the dynamic process is modeled as an MDP in which the agents observe the current state to form several coalitions. Each coalition takes an action that affects the environment, which correspondingly transitions to the next state, and the process repeats. We then propose DCQL, which incorporates a deep neural network and can therefore cope well with large-scale problems. DCQL guides the edge servers to form coalitions dynamically so as to optimize a given goal. Furthermore, we run experiments to verify our proposed algorithm's effectiveness in different settings.
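The tabular Q-learning step that DCQL generalizes with a deep network can be sketched as follows; the state and coalition-action names are hypothetical illustrations, not the paper's formulation:

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    In DCQL this table would be replaced by a deep network approximator."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Hypothetical coalition-formation actions for an edge server agent
Q = defaultdict(float)
actions = ["join_coalition_A", "join_coalition_B", "stay_alone"]
q_update(Q, "s0", "join_coalition_A", reward=1.0, next_state="s1", actions=actions)
```

Starting from an all-zero table, a single update with reward 1.0 moves the chosen state-action value to alpha * reward = 0.1.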

  • Improved Metric Function for AlphaSeq Algorithm to Design Ideal Complementary Codes for Multi-Carrier CDMA Systems

    Shucong TIAN  Meng YANG  Jianpeng WANG  Rui WANG  Avik R. ADHIKARY  

     
    LETTER-Communication Theory and Signals

    Publicized: 2021/11/15 | Vol: E105-A No:5 | Page(s): 901-905

    AlphaSeq is a new paradigm for designing sequences with desired properties based on deep reinforcement learning (DRL). In this work, we propose a new metric function and a new reward function to design an improved version of AlphaSeq. We show analytically, and also through numerical simulations, that the proposed algorithm can discover sequence sets with preferable properties faster than the previous algorithm.

  • Improving the Efficiency of a Reaction Attack on the QC-MDPC McEliece

    Thales BANDIERA PAIVA  Routo TERADA  

     
    PAPER-Cryptography and Information Security

    Vol: E101-A No:10 | Page(s): 1676-1686

    The QC-MDPC McEliece scheme was considered one of the most promising public key encryption schemes for efficient post-quantum secure encryption. As a variant of the McEliece scheme, it is based on the syndrome decoding problem, which is a hard problem from Coding Theory. Its key sizes are competitive with those of the widely used RSA cryptosystem, and it came with an apparently strong security reduction. For three years, the scheme did not suffer major threats, until the end of 2016, at Asiacrypt, when Guo, Johansson, and Stankovski presented a reaction attack on the QC-MDPC that exploits one aspect not considered in the security reduction: the probability of a decoding failure is lower when the secret key and the error used for encryption share certain properties. By recording the decoding failures, the attacker obtains information about the secret key and then uses the gathered information to reconstruct the key. Guo et al. presented an algorithm for key reconstruction in which we can point out two weaknesses. The first is that it cannot deal with partial information about the secret key, forcing the attacker to send a large number of decoding challenges. The second is that it does not scale well to higher security levels. To improve the attack, we propose a key reconstruction algorithm that runs faster than Guo et al.'s algorithm, even while using around 20% fewer interactions with the secret key holder than their algorithm, considering parameters suggested for 80 bits of security. It also has a lower asymptotic complexity, which makes it scale much better to higher security parameters. The algorithm can be parallelized straightforwardly, which is not the case for the one by Guo et al.

  • Energy-Aware Download Method in LTE Based Smartphone

    Jie REN  Ling GAO  Hai WANG  QuanLi GAO  ZheWen ZHANG  

     
    PAPER-Information Network

    Publicized: 2016/11/18 | Vol: E100-D No:2 | Page(s): 304-312

    Mobile traffic is experiencing tremendous growth, and this growing wave is undoubtedly increasing the use of the radio component of mobile devices, resulting in shorter battery lifetimes. In this paper, we present an Energy-Aware Download Method (EDM) based on the Markov Decision Process (MDP) to optimize the data download energy for mobile applications. Unlike previous download schemes in the literature, which pursue energy efficiency by simply delaying download requests and often lead to a poor user experience, our MDP model learns offline from a set of training download workloads for different user patterns. The model is then integrated into the mobile application to handle download requests at runtime, taking the current battery level, LTE reference signal received power (RSRP), reference signal signal-to-noise ratio (RSSNR), and task size as inputs of the decision process, and maximizing the reward, which reflects the expected battery life and user experience. We evaluate how the EDM can be used in the context of a real file-downloading application over the LTE network. On average, we obtain 20.3%, 15%, and 45% improvements in energy consumption, latency, and energy-delay trade-off performance, respectively, compared to the Android default download policy (Minimum Delay).
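An MDP policy of this kind is typically computed by dynamic programming. The generic value-iteration sketch below uses two toy states standing loosely for signal conditions and two actions (defer vs. download now); the transition and reward numbers are invented placeholders, not the paper's learned model:

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-8):
    """Generic MDP value iteration: V(s) = max_a [ R[a][s] + gamma * Σ_s' T[a][s][s'] V(s') ].

    T: transitions, shape (A, S, S); R: immediate rewards, shape (A, S).
    Returns the value function and the greedy policy."""
    A, S, _ = T.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (T @ V)          # action-values, shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy model: state 0 = poor signal, state 1 = good signal.
T = np.array([
    [[0.6, 0.4], [0.3, 0.7]],   # action 0 (defer): signal may drift
    [[1.0, 0.0], [0.0, 1.0]],   # action 1 (download now): state unchanged
])
R = np.array([
    [0.0, 0.0],                 # defer earns nothing
    [0.2, 1.0],                 # downloading in good signal is cheaper, so higher reward
])
V, policy = value_iteration(T, R)
```

With these numbers the greedy policy defers in the poor-signal state and downloads in the good-signal state, which mirrors the energy-aware intuition of delaying transfers until conditions improve.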

  • Energy-Efficient and Throughput Maximization Scheme for Sensor-Aided Cognitive Radio Networks

    Hiep VU-VAN  Insoo KOO  

     
    PAPER

    Vol: E98-B No:10 | Page(s): 1996-2003

    A cognitive radio user (CU) can get assistance from sensor nodes (SNs) to perform spectrum sensing. However, the SNs are often powered by a finite-capacity battery that can sustain their operation for only a short time. Therefore, energy efficiency of the SNs becomes a crucial problem. In this paper, an SN is considered to be a device with an energy harvester that can harvest energy from a non-radio-frequency (non-RF) energy source while performing other actions concurrently. In any one time slot, in order to maintain the required sensing accuracy of the CR network and to conserve energy in the SNs, only a small number of SNs are required to sense the primary user (PU) signal, and the other SNs are kept silent to save energy. For this purpose, we propose an algorithm that divides all SNs into groups, each of which can satisfy the required sensing accuracy of the network. In a time slot, each SN group can be assigned one of two actions: stay silent, or be active and perform sensing. The problem of determining the optimal action for all SN groups so as to maximize the throughput of the CR network is formulated in the framework of a partially observable Markov decision process (POMDP), in which the effect of the current time slot's action on the throughput of future time slots is considered. The solution to the problem, that is, the decision mode of the SN groups (i.e., active or silent), depends on the residual energy and the belief about the absence probability of the PU signal. Simulation results show that the proposed scheme can improve the energy efficiency of CR networks compared with other conventional schemes.

  • Dynamic Linear Bellman Combination of Optimal Policies for Solving New Tasks

    Takamitsu MATSUBARA  Takaya ASAKURA  Kenji SUGIMOTO  

     
    LETTER-Systems and Control

    Vol: E98-A No:10 | Page(s): 2187-2190

    In this letter, we propose Dynamic Linear Bellman Combination, which allows us to compose the optimal policies of Kullback-Leibler control for similar tasks with different terminal and non-terminal costs. Simulation results demonstrate the effectiveness of our method.

  • Optimal Channel-Sensing Scheme for Cognitive Radio Systems Based on Fuzzy Q-Learning

    Fereidoun H. PANAHI  Tomoaki OHTSUKI  

     
    PAPER

    Vol: E97-B No:2 | Page(s): 283-294

    In a cognitive radio (CR) network, the channel-sensing scheme used to detect the existence of a primary user (PU) directly affects the performance of both the CR and the PU. However, in practical systems, the CR is prone to sensing errors due to the inefficiency of the sensing scheme. This may cause primary-user interference and low system performance. In this paper, we present a learning-based scheme for channel sensing in CR networks. Specifically, we formulate the channel-sensing problem as a partially observable Markov decision process (POMDP), where the most likely channel state is derived by a learning process called Fuzzy Q-Learning (FQL). The optimal policy is derived by solving the problem. Simulation results show the effectiveness and efficiency of our proposed scheme.
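A common form of the Fuzzy Q-Learning update distributes the temporal-difference error over the fuzzy rules in proportion to their degree of activation; the sketch below shows that generic form and is not necessarily the paper's exact variant:

```python
import numpy as np

def fql_update(q, memberships, reward, next_value, alpha=0.1, gamma=0.9):
    """Generic Fuzzy Q-Learning step: the state's value is a membership-weighted
    sum of the rules' q-values, and the TD error is credited back to each rule
    in proportion to its activation."""
    value = np.dot(memberships, q)                  # fuzzy-inferred value of the state
    td_error = reward + gamma * next_value - value  # standard TD error
    return q + alpha * td_error * memberships       # membership-weighted credit

q = np.zeros(2)             # q-values of two fuzzy rules
mu = np.array([0.5, 0.5])   # degrees of activation for the current input
q_new = fql_update(q, mu, reward=1.0, next_value=0.0)
```

Starting from zero, a reward of 1.0 with equal activations nudges both rule values up by alpha * td_error * mu = 0.05 each.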

  • A POMDP Based Distributed Adaptive Opportunistic Spectrum Access Strategy for Cognitive Ad Hoc Networks

    Yichen WANG  Pinyi REN  Zhou SU  

     
    LETTER

    Vol: E94-B No:6 | Page(s): 1621-1624

    In this letter, we propose a Partially Observable Markov Decision Process (POMDP) based Distributed Adaptive Opportunistic Spectrum Access (DA-OSA) strategy for Cognitive Ad Hoc Networks (CAHNs). In each slot, the source and destination choose a set of channels to sense and then decide the transmission channels based on the sensing results. In order to maximize the throughput of each link, we use the theories of sequential decision and optimal stopping to determine the optimal sensing channel set. Moreover, we establish the myopic policy and exploit the monotonicity of the reward function, which can be used to reduce the complexity of the sequential decision.

  • Photomask Data Prioritization Based on VLSI Design Intent and Its Utilization for Mask Manufacturing

    Kokoro KATO  Masakazu ENDO  Tadao INOUE  Shigetoshi NAKATAKE  Masaki YAMABE  Sunao ISHIHARA  

     
    PAPER-Device and Circuit Modeling and Analysis

    Vol: E93-A No:12 | Page(s): 2424-2432

    The increase in the time required for data processing, mask drawing, and inspection of photomasks has led to a substantial increase in mask manufacturing cost, which has become one of the major challenges in the semiconductor industry. We have developed a data-flow process for mask manufacturing in which we refer to design-intent information in order to reduce the turnaround time (TAT) of mask manufacturing processes. We convert design-level information, "Design Intent (DI)," into priority information for mask manufacturing data, known as "Mask Data Rank (MDR)," so that we can identify and sort the importance of mask patterns from the viewpoint of the design side. As a result, we can reduce both mask writing time and mask inspection time. Our objective is to build an efficient data-flow conversion system from DI to MDR. In this paper, we introduce the idea of MDR and the software system that we built for DI extraction. We then show experimental results with actual chip data. Lastly, we discuss related issues and their solutions.

  • Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors

    Ngo Anh VIEN  SeungGwan LEE  TaeChoong CHUNG  

     
    PAPER

    Vol: E93-D No:2 | Page(s): 271-279

    In our previous work, we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov Decision Process (SMDP). We approximated the gradient of the average reward; a simulation-based algorithm, called GSMDP, was then proposed to estimate this approximate gradient using only a single sample path of the underlying Markov chain, and was proved to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the output of the algorithm and its asymptotic output, arises because the algorithm sees only a finite data sequence.

  • Constructing a Multilayered Boundary to Defend against Intrusive Anomalies

    Zonghua ZHANG  Hong SHEN  

     
    PAPER-Application Information Security

    Vol: E90-D No:2 | Page(s): 490-499

    We propose a model for constructing a multilayered boundary in an information system to defend against intrusive anomalies by correlating a number of parametric anomaly detectors. The model formulation is based on two observations. First, anomaly detectors differ in their detection coverage or blind spots. Second, operating environments of the anomaly detectors reveal different information about system anomalies. The correlation among observation-specific anomaly detectors is first formulated as a Partially Observable Markov Decision Process, and then a policy-gradient reinforcement learning algorithm is developed for an optimal cooperation search, with the practical objectives being broader overall detection coverage and fewer false alerts. A host-based experimental scenario is developed to illustrate the principle of the model and to demonstrate its performance.

  • CHQ: A Multi-Agent Reinforcement Learning Scheme for Partially Observable Markov Decision Processes

    Hiroshi OSADA  Satoshi FUJITA  

     
    PAPER-Artificial Intelligence and Cognitive Science

    Vol: E88-D No:5 | Page(s): 1004-1011

    In this paper, we propose a new reinforcement learning scheme called CHQ that can efficiently acquire appropriate policies under partially observable Markov decision processes (POMDPs) involving probabilistic state transitions, which frequently occur in multi-agent systems where each agent independently takes a probabilistic action based on a partial observation of the underlying environment. A key idea of CHQ is to extend the HQ-learning proposed by Wiering et al. in such a way that it can learn the activation order of the MDP subtasks as well as an appropriate policy under each MDP subtask. The effectiveness of the proposed scheme is experimentally evaluated. The experimental results imply that it can acquire a deterministic policy with a sufficiently high success rate, even if the given task is a POMDP with probabilistic state transitions.

  • Sliding Multiple Phase Differential Detection of Trellis-Coded MDPSK-OFDM

    Chong Il KIM  Zhengyuan XU  Han Jong KIM  

     
    PAPER

    Vol: E86-B No:5 | Page(s): 1591-1600

    In this paper, a Viterbi decoder containing new branch metrics based on the squared Euclidean distance with multiple-order phase differences is introduced in order to improve the bit error rate (BER) in differential detection of trellis-coded MDPSK-OFDM. The proposed Viterbi decoder is conceptually the same as the sliding multiple phase differential detection method, which uses branch metrics with multiple phase differences. We also describe the Viterbi algorithm that uses these branch metrics. Our study shows that such a Viterbi decoder improves BER performance without sacrificing bandwidth or power efficiency. The proposed algorithm can also be used with single-carrier modulation.

  • Labeling Q-Learning in POMDP Environments

    Haeyeon LEE  Hiroyuki KAMAYA  Kenichi ABE  

     
    PAPER-Biocybernetics, Neurocomputing

    Vol: E85-D No:9 | Page(s): 1425-1432

    This paper presents a new Reinforcement Learning (RL) method, called "Labeling Q-learning (LQ-learning)," to solve partially observable Markov decision process (POMDP) problems. Recently, hierarchical RL methods have been widely studied. However, they have the drawback that learning time and memory are consumed merely to maintain the hierarchical structure, even when it is not necessary. Our LQ-learning, in contrast, has no hierarchical structure, but adopts a new type of internal memory mechanism. Namely, in LQ-learning, the agent perceives the current state as a pair of an observation and its label, and can thereby more exactly distinguish states that look the same but are actually different. That is, at each step t, we define a new type of perception of the environment õt=(ot,θt), where ot is the conventional observation and θt is the label attached to ot. The classical RL algorithm is then used as if the pair (ot,θt) were a Markov state. The labeling is carried out by a Boolean variable, called "CHANGE," and a hash-like or mod function, called the Labeling Function (LF). To demonstrate the efficiency of LQ-learning, we apply it to maze problems in Grid-Worlds, which are used in many studies as simulated POMDP environments. Using LQ-learning, we can solve the maze problems without initial knowledge of the environments.
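One plausible reading of the labeling mechanism can be sketched as follows; the CHANGE condition and mod-based labeling function used here are assumptions for illustration, and the paper's exact definitions may differ:

```python
def labeled_perception(obs, prev_obs, prev_label, num_labels=4):
    """Sketch of an LQ-learning-style perception: a CHANGE flag fires when the
    observation differs from the previous one, and the label is then advanced
    by a mod function. The Q-table is keyed by the pair (observation, label),
    so two visits to identical-looking observations can carry different labels
    and thus be distinguished."""
    change = obs != prev_obs                                  # the "CHANGE" Boolean
    label = (prev_label + 1) % num_labels if change else prev_label
    return obs, label                                         # the pair (o_t, theta_t)

# Hypothetical maze trace: the observation changes, so the label advances.
state = labeled_perception("corridor", prev_obs="junction", prev_label=0)
```

A standard tabular Q-learning agent can then index its table by this (observation, label) pair instead of the raw observation alone.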

  • New Results on Selective Diversity Using Order Statistics for Reception of M-Ary DPSK and PSK Signals on Nakagami Fading Channels

    Changhwan KIM  Seyoung CHOI  Youngyearl HAN  

     
    LETTER-Image

    Vol: E84-A No:4 | Page(s): 1085-1089

    The performance of M-ary DPSK (MDPSK) and PSK (MPSK) systems using L-branch selection combining (SC) diversity reception in frequency-nonselective slow Nakagami fading channels is derived theoretically. For integer values of the Nakagami fading parameter m, a general formula for evaluating the symbol error rate (SER) of MDPSK signals in the independent-branch diversity system is obtained through numerical analysis of integral-form expressions. An exact closed-form SER expression for MPSK signals under the effect of SC diversity, evaluated via numerical integration, is also presented.

  • Performance Analysis of an ATM Multiplexer with a Resume Level Loaded with Homogeneous Bursty Sources

    Kwang-Chul LEE  Byung-Cheol SHIN  

     
    PAPER-Communication Systems and Transmission Equipment

    Vol: E81-B No:11 | Page(s): 2147-2156

    This paper investigates an ATM multiplexer with a resume level, which uses a selective cell-discarding strategy as priority control, loaded with burst input traffic modeled as a Markov-modulated deterministic process (MMDP). Assuming that the system is loaded with a superposition of several independent and homogeneous On-Off bursty sources with two priority classes, we obtain the cell loss probability of each priority class for an ATM multiplexer with a resume level. The performance analysis derived here includes, as special cases, a system without priority and one with a threshold level. From the numerical results, we compare the cell loss probability, the mean queue length, the mean queuing delay, the level-crossing rate, and the queue-length distribution at the embedded points for the case of a threshold level with those for the case of a resume level. By selecting an appropriate resume level, we can reduce the sensitive state change around the threshold level.

  • Performance of an ATM Multiplexer with Selective Cell Discarding for On-Off Bursty Traffics

    Sang Won MIN  Hae CHUNG  Chong Kwan UN  

     
    PAPER-Communication Networks and Service

    Vol: E78-B No:9 | Page(s): 1253-1261

    We study an asynchronous transfer mode (ATM) multiplexer with selective cell discarding (SCD), and propose as a new traffic parameter the ratio of high- and low-priority cell streams when a multiplexed cell stream is classified by the cell loss priority (CLP) field in the cell header. By applying loss-priority control, it is possible to increase the multiplexing gain. For the performance analysis, we assume that an on-off bursty traffic stream is described by several traffic parameters and the proposed priority ratio, and is approximately modeled by a Markov-modulated deterministic process (MMDP). Assuming that several independent and homogeneous on-off bursty traffic streams with priority discrimination are multiplexed, we present an analytical procedure for the cell loss probability of each priority level in statistical cell multiplexing with loss-priority control, and use the performance results for connection admission control (CAC). We also consider the effect of the proposed priority ratio. Although loss-priority control increases the statistical multiplexing gain, it is not appropriate for on-off bursty traffic to change the value of the high-priority ratio in order to obtain a larger multiplexing gain, since the admissible load is determined by the loss probability of the low-priority traffic in most cases, and values of the ratio in a certain range affect it only slightly.