Takahiro YUKIZANE Shin-ya OHI Eiji MIYANO Hideo HIROSE
In difficult classification problems where z-dimensional points with 0-1 responses must be divided into two groups but the data structure is messy, we try to find the denser regions for the favorable customers with response 1 instead of finding boundaries that separate the two groups. Such regions are called bumps, and finding their boundaries is called bump hunting. The main objective of this paper is to find the largest bump region under a specified ratio of the number of response-1 points to the total number of points. We may then obtain a trade-off curve between the number of response-1 points and the specified ratio. The decision tree method with the Gini index provides simply shaped boundaries for the bumps if the marginal density for response 1 has a rather simple or monotonic shape. Because searching for the optimal trees is computationally expensive owing to the NP-hardness of the problem, random search methods, e.g., a genetic algorithm adapted to the tree, are useful. Because many local maxima exist, unlike in ordinary genetic algorithm search results, extreme-value statistics are useful for estimating the global optimum number of captured points; this also guarantees the accuracy of the semi-optimal solution with simple descriptive rules. This combination of genetic algorithm search and extreme-value statistics is new. We apply the method to an artificial messy data case that mimics a real customer database and obtain a successful result. The reliability of the solution is also discussed.
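
The objective stated in this abstract (capture as many response-1 points as possible while keeping their ratio above a specified level) can be illustrated with a minimal Python sketch. It is not the authors' tree-based genetic algorithm: the candidate regions are simply handed in as axis-aligned boxes, and the names `gini_index`, `box_stats`, and `best_box` are hypothetical.

```python
def gini_index(n_ones, n_total):
    """Two-class Gini impurity 2 p (1 - p) of a region, with p the response-1 ratio."""
    p = n_ones / n_total
    return 2.0 * p * (1.0 - p)


def box_stats(points, responses, box):
    """Return (response-1 count, total count) of the points inside an axis-aligned box.

    box is a list of (low, high) pairs, one pair per dimension (an assumption
    of this sketch; the abstract only says the boundaries are simple-shaped).
    """
    ones = total = 0
    for x, y in zip(points, responses):
        if all(lo <= v <= hi for v, (lo, hi) in zip(x, box)):
            total += 1
            ones += y
    return ones, total


def best_box(points, responses, candidate_boxes, min_ratio):
    """Among candidates whose response-1 ratio is at least min_ratio,
    return the box that captures the most response-1 points."""
    best, best_ones = None, -1
    for box in candidate_boxes:
        ones, total = box_stats(points, responses, box)
        if total > 0 and ones / total >= min_ratio and ones > best_ones:
            best, best_ones = box, ones
    return best, best_ones
```

Calling `best_box` repeatedly with different `min_ratio` values would trace out a trade-off curve of the kind the abstract mentions; the search over candidate regions itself is where the genetic algorithm would come in.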
Hideo HIROSE Masakazu TOKUNAGA Takenori SAKUMURA Junaida SULAIMAN Herdianti DARWIS
Prediction of seasonal infectious disease spread is traditionally treated as a function of time. Typical methods are time series analysis such as ARIMA (autoregressive integrated moving average) or ANN (artificial neural networks). However, if we arrange the time series data in matrix form, e.g., with yearly magnitude in the rows and weekly trend in the columns, we may use a different method (a matrix approach) to predict the disease spread when seasonality is dominant. One such method is MD (matrix decomposition), which is used in recommendation systems; the other is IRT (item response theory), which is used in ability evaluation systems. In this paper, we apply these two methods to predict the spread of infectious gastroenteritis caused by norovirus in Japan, and compare the results with those obtained by two conventional forecasting methods, ARIMA and ANN. We have found that the matrix approach is simple and useful for predicting seasonal infectious disease spread.
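
The matrix view described in this abstract can be sketched in Python as a rank-one model, value[year][week] ≈ magnitude[year] × profile[week]. This is a deliberately simplified stand-in, not the MD or IRT method of the paper; the function names and the least-squares fit are assumptions made for illustration.

```python
def weekly_profile(past_years):
    """Average seasonal shape over complete past years (rows = years, columns = weeks)."""
    n_weeks = len(past_years[0])
    return [sum(year[w] for year in past_years) / len(past_years)
            for w in range(n_weeks)]


def year_magnitude(observed, profile):
    """Least-squares scale a minimising sum_w (observed[w] - a * profile[w]) ** 2
    over the weeks observed so far in the current year."""
    num = sum(x * b for x, b in zip(observed, profile))
    den = sum(b * b for b in profile[:len(observed)])
    return num / den


def predict_rest_of_year(past_years, observed):
    """Predict the unobserved weeks of the current year as magnitude * profile."""
    profile = weekly_profile(past_years)
    a = year_magnitude(observed, profile)
    return [a * b for b in profile[len(observed):]]
```

For example, with two past years of four weekly counts each, `predict_rest_of_year([[1, 2, 4, 2], [2, 4, 8, 4]], [3, 6])` scales the averaged seasonal shape to the two observed weeks and returns predictions for the remaining two.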
There are two main methods for pandemic simulations: the SEIR model and the MAS model. The SEIR model can simulate many homogeneous populations quickly using simple ordinary differential equations; however, it cannot accommodate many detailed conditions. The MAS (multi-agent simulation) model can handle detailed simulations under many kinds of initial and boundary conditions with simple social network models. However, its computing cost grows exponentially as the population size becomes larger, so large-scale simulations can hardly be realized unless supercomputers are available. By combining these two methods, we may perform large-scale pandemic simulations at lower cost. That is, the MAS model is used in the early stage of a pandemic simulation to determine appropriate parameters for the SEIR model, and the SEIR model is then run with the obtained parameters. To investigate the validity of this combined method, we first compare the simulation results of the SEIR model and the MAS model. The results of the MAS model and those of the SEIR model that uses the parameters obtained from the MAS simulation are found to be close to each other.
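
To make the "simple ordinary differential equations" of the SEIR model concrete, the following Python sketch integrates the standard four-compartment SEIR system with forward Euler steps. The parameter names `beta` (transmission rate), `sigma` (rate of leaving the exposed state), and `gamma` (recovery rate) are the usual textbook symbols and are assumptions here; in the combined method of the abstract, such values would come from the early-stage MAS run.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt):
    """Advance the S, E, I, R compartments by one forward-Euler step of size dt."""
    n = s + e + i + r
    new_exposed = beta * s * i / n   # flow S -> E
    new_infectious = sigma * e       # flow E -> I
    new_recovered = gamma * i        # flow I -> R
    return (
        s - dt * new_exposed,
        e + dt * (new_exposed - new_infectious),
        i + dt * (new_infectious - new_recovered),
        r + dt * new_recovered,
    )


def simulate_seir(s0, e0, i0, r0, beta, sigma, gamma, days, dt=0.1):
    """Return the daily (S, E, I, R) states, including the initial state."""
    state = (s0, e0, i0, r0)
    history = [state]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            state = seir_step(*state, beta, sigma, gamma, dt)
        history.append(state)
    return history
```

Forward Euler is chosen only for brevity; a production simulation would more likely use an adaptive ODE solver.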
For the extremely difficult problem of finding the maximum likelihood estimates in a specific mixture regression model, a combination of several optimization techniques is found to be useful. These techniques are the continuation method, the Newton-Raphson method, and the simplex method. The simplex method searches for an approximate solution over a wide range of the parameter space; a combination of the continuation method and the Newton-Raphson method then finds a more accurate solution. In this paper, this combined method is applied to find the maximum likelihood estimates in a Weibull-power-law type regression model.
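
The two-stage idea in this abstract (a wide approximate search followed by Newton-Raphson refinement) can be sketched in Python on a much simpler problem: the shape parameter of an ordinary two-parameter Weibull distribution. This is an illustration under stated assumptions only. A coarse grid search stands in for the simplex stage, the continuation method is omitted, and the model is not the Weibull-power-law mixture regression of the paper. All function names are hypothetical.

```python
import math


def shape_score(k, xs):
    """Profile-likelihood equation for the Weibull shape k; its root is the MLE."""
    logs = [math.log(x) for x in xs]
    powers = [x ** k for x in xs]
    s0 = sum(powers)
    s1 = sum(p * l for p, l in zip(powers, logs))
    return s1 / s0 - 1.0 / k - sum(logs) / len(xs)


def shape_score_derivative(k, xs):
    """Analytic derivative of shape_score with respect to k (always positive)."""
    logs = [math.log(x) for x in xs]
    powers = [x ** k for x in xs]
    s0 = sum(powers)
    s1 = sum(p * l for p, l in zip(powers, logs))
    s2 = sum(p * l * l for p, l in zip(powers, logs))
    return (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (k * k)


def fit_weibull(xs, grid=None, tol=1e-12, max_iter=50):
    """Return (shape, scale) maximum likelihood estimates for positive data xs."""
    grid = grid or [0.25 * j for j in range(1, 41)]
    # Stage 1: coarse, wide-range search for an approximate root.
    k = min(grid, key=lambda c: abs(shape_score(c, xs)))
    # Stage 2: Newton-Raphson refinement from that starting point.
    for _ in range(max_iter):
        step = shape_score(k, xs) / shape_score_derivative(k, xs)
        while k - step <= 0:  # keep the shape parameter positive
            step /= 2
        k -= step
        if abs(step) < tol:
            break
    scale = (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return k, scale
```

The design mirrors the division of labour described above: the first stage only needs to land near the solution, so that Newton-Raphson, which converges quickly but only locally, can finish the job.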