Takeshi YAGI Junichi MURAYAMA Takeo HARIU Sho TSUGAWA Hiroyuki OHSAKI Masayuki MURATA
We propose a method for determining the frequency for monitoring the activities of a malware download site used for malware attacks on websites. In recent years, there has been an increase in attacks exploiting vulnerabilities in web applications for infecting websites with malware and maliciously using those websites as attack platforms. One scheme for countering such attacks is to blacklist malware download sites and filter out access to them from user websites. However, a malware download site is often constructed through the use of an ordinary website that has been maliciously manipulated by an attacker. Once the malware has been deleted from the malware download site, this scheme must be able to unblacklist that site to prevent normal user websites from being falsely detected as malware download sites. However, if a malware download site is frequently monitored for the presence of malware, the attacker may sense this monitoring and relocate the malware to a different site. This means that an attack will not be detected until the newly generated malware download site is discovered. In response to these problems, we clarify the change in attack-detection accuracy caused by attacker behavior. This is done by modeling attacker behavior, specifying a state-transition model with respect to the blacklisting of a malware download site, and analyzing these models with synthetically generated attack patterns and attack patterns measured in an operational network. From this analysis, we derive the optimal monitoring frequency that maximizes the true detection rate while minimizing the false detection rate.
Takeshi YAGI Junichi MURAYAMA Takeo HARIU Hiroyuki OHSAKI
With the diffusion of web services caused by the appearance of a new architecture known as cloud computing, a large number of websites have been used by attackers as hopping sites to attack other websites and user terminals because many vulnerable websites are constructed and managed by unskilled users. To construct hopping sites, many attackers force victims to download malware by using vulnerabilities in web applications. To protect websites from these malware infection attacks, conventional methods, such as using anti-virus software, filter files from attackers using pattern files generated by analyzing conventional malware files collected by security vendors. In addition, certain anti-virus software uses a behavior blocking approach, which monitors malicious file activities and modifications. These methods can detect malware files that are already known. However, it is difficult to detect malware that is different from known malware. It is also difficult to define malware since legitimate software files can become malicious depending on the situation. We previously proposed an access filtering method based on communication opponents, which are other servers or terminals that connect with our web honeypots, of attacks collected by web honeypots, which collect malware infection attacks to websites by using actual vulnerable web applications. In this blacklist-based method, URLs or IP addresses, which are used in malware infection attacks collected by web honeypots, are listed in a blacklist, and accesses to and from websites are filtered based on the blacklist. To reveal the effects in an actual attack situation on the Internet, we evaluated the detection ratio of anti-virus software, our method, and a composite of both methods. Our evaluation revealed that anti-virus software detected approximately 50% of malware files, our method detected approximately 98% of attacks, and the composite of the two methods could detect approximately 99% of attacks.
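The blacklist-based filtering described in this abstract can be illustrated with a minimal sketch. The function names `build_blacklist` and `is_blocked`, the choice to match on both full URLs and host names, and the example addresses are assumptions made for illustration; the abstract does not specify how the authors implement matching.

```python
from urllib.parse import urlsplit


def build_blacklist(attack_urls):
    """Collect the URLs and hosts (names or IP addresses) seen in attacks.

    `attack_urls` stands in for the communication opponents that a web
    honeypot would record; the data format is an assumption.
    """
    urls, hosts = set(), set()
    for url in attack_urls:
        urls.add(url)
        host = urlsplit(url).hostname
        if host:
            hosts.add(host)
    return urls, hosts


def is_blocked(url, blacklist):
    """Return True when the URL or its host appears in the blacklist."""
    urls, hosts = blacklist
    return url in urls or urlsplit(url).hostname in hosts


blacklist = build_blacklist(["http://malware.example/drop.php",
                             "http://192.0.2.10/bot.txt"])
print(is_blocked("http://192.0.2.10/other.txt", blacklist))
print(is_blocked("http://benign.example/index.html", blacklist))
```

Matching on the host as well as the full URL is a design choice of this sketch: it blocks other files served from the same site, at the cost of more false detections if the site is an ordinary website that was compromised.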
Masaki KOHANA Shusuke OKAMOTO Atsuko IKEGAMI
This paper describes a near-optimal allocation method for web-based multi-player online role-playing games (MORPGs), which must be able to cope with a large number of users and a high frequency of user requests. Our previous work introduced a dynamic data reallocation method. It uses multiple web servers and divides the entire game world into small blocks. The ownership of each block is allocated to a web server and is reallocated to another web server according to user requests. Furthermore, this block allocation was formulated as a combinatorial optimization problem, and a simulation-based experiment with an exact algorithm showed that our system could perform 31% better than an ad-hoc approach. However, the exact algorithm takes too much time when the problem size is large. This paper proposes a meta-heuristic approach based on tabu search to solve the problem quickly. A simulation result shows that our tabu search algorithm can generate solutions whose average correctness is only 1% different from that of the exact algorithm. In addition, the average calculation time for 50 users on a system with five web servers is about 25.67 msec, whereas the exact algorithm takes about 162 msec. An evaluation of a web-based MORPG system with our tabu search shows that it could achieve a capacity of 420 users, compared with 320 for our previous system.
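As a rough illustration of how tabu search can assign blocks to web servers, the following sketch minimizes the load of the busiest server. The objective function, the per-block load values, and the parameters `iterations` and `tenure` are assumptions for illustration only; the abstract does not give the authors' actual formulation.

```python
from collections import deque


def max_load(assign, loads, n_servers):
    """Load of the busiest server under a block-to-server assignment."""
    totals = [0] * n_servers
    for block, server in enumerate(assign):
        totals[server] += loads[block]
    return max(totals)


def tabu_search(loads, n_servers, iterations=200, tenure=5):
    """Reassign one block per step, forbidding recently undone moves."""
    assign = [0] * len(loads)                 # start with all blocks on server 0
    best, best_cost = assign[:], max_load(assign, loads, n_servers)
    tabu = deque(maxlen=tenure)               # recent (block, server) pairs
    for _ in range(iterations):
        candidates = []
        for block in range(len(loads)):
            for server in range(n_servers):
                if server == assign[block]:
                    continue
                trial = assign[:]
                trial[block] = server
                cost = max_load(trial, loads, n_servers)
                # Aspiration: a tabu move is allowed only if it beats the best.
                if (block, server) in tabu and cost >= best_cost:
                    continue
                candidates.append((cost, block, server))
        if not candidates:
            break
        cost, block, server = min(candidates)
        tabu.append((block, assign[block]))   # do not move the block straight back
        assign[block] = server
        if cost < best_cost:
            best, best_cost = assign[:], cost
    return best, best_cost


best, cost = tabu_search(loads=[4, 3, 3, 2], n_servers=2)
print(best, cost)
```

The tabu list lets the search accept non-improving moves without immediately cycling back, which is the property that distinguishes it from plain hill climbing.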
Dung Duc NGUYEN Maike ERDMANN Tomoya TAKEYOSHI Gen HATTORI Kazunori MATSUMOTO Chihiro ONO
The abundance of information published on the Internet makes filtering of hazardous Web pages a difficult yet important task. Supervised learning methods such as Support Vector Machines (SVMs) can be used to identify hazardous Web content. However, scalability is a big challenge, especially if we have to train multiple classifiers, since different policies exist on what kind of information is hazardous. We therefore propose two different strategies to train multiple SVMs for personalized Web content filters. The first strategy identifies common data clusters and then performs optimization on these clusters in order to obtain good initial solutions for individual problems. This initialization shortens the path to the optimal solutions and reduces the training time on individual training sets. The second strategy is to train all SVMs simultaneously. We introduce an SMO-based kernel-biased heuristic that balances the reduction rate of individual objective functions and the computational cost of the kernel matrix. The heuristic relies primarily on the optimality conditions of all optimization problems and secondarily on the pre-calculated part of the whole kernel matrix. This strategy increases the amount of information sharing among learning tasks, thus reducing the number of kernel calculations and the training time. In our experiments on inconsistently labeled training examples, both strategies were able to predict hazardous Web pages accurately (> 91%) with a training time of only 26% and 18%, respectively, of that of normal sequential training.
The link structure of the Web is generally viewed as a webgraph. One of the main objectives of web structure mining is to find hidden communities on the Web based on the webgraph, and one of its approaches tries to enumerate substructures, each of which corresponds to a set of web pages of a community or its core. Research has shown that certain substructures can find sets of pages that are inherently irrelevant to communities. In this paper, we propose a model, which we call contracted webgraphs, where such substructures are contracted into single nodes to hide useless information. We then try structure mining iteratively on those contracted webgraphs since we can expect to find further hidden information once irrelevant information is eliminated. We also explore the structural properties of contracted webgraphs from the viewpoint of scale-freeness, and we observe that they exhibit novel and extreme self-similarities.
Young Seung LEE Seung Keun PARK
Electromagnetic power transmission through two cylinder-penetrated circular apertures in parallel conducting planes is studied. The Weber transform and superposition principle are used to represent the scattered field. A set of simultaneous equations for the modal coefficients is constituted based on the mode-matching and boundary conditions. The whole integration path is slightly deformed into a new one below the positive real axis so as not to pass through the pole singularities encountered on the original path, so that the integral is easily calculated by direct numerical quadrature. Computation shows the behavior of power transmission in terms of aperture geometry and wavelength. The presented scheme is very amenable to numerical evaluation and is useful for various electromagnetic scattering and antenna radiation analyses involving singularity problems.
Sila CHUNWIJITRA Arjulie JOHN BERENA Hitoshi OKADA Haruki UENO
In this paper, we propose a new online authoring tool for e-Learning system to meet the social demands for internationalized higher education. The tool includes two functions – an authoring function for creating video-based content by the instructor, and a viewing function for self-learning by students. In the authoring function, an instructor creates key markings onto the raw video stream to produce virtual video clips related to each slide. With key markings, some parts of the raw video stream can be easily skipped. The virtual video clips form an aggregated video stream that is used to synchronize with the slide presentation to create learning content. The synchronized content can be previewed immediately at the client computer prior to saving at the server. The aggregated video becomes the baseline for the viewing function. Based on aggregated video stream methodology, content editing requires only the changing of key markings without editing the raw video file. Furthermore, video and pointer synchronization is also proposed for enhancing the students' learning efficiency. In viewing function, video quality control and an adaptive video buffering method are implemented to support usage in various network environments. The total system is optimized to support cross-platform and cloud computing to break the limitation of various usages. The proposed method can provide simple authoring processes with clear user interface design for instructors, and help students utilize learning contents effectively and efficiently. In the user acceptance evaluation, most respondents agree with the usefulness, ease-of-use, and user satisfaction of the proposed system. The overall results show that the proposed authoring and viewing tools have higher user acceptance as a tool for e-Learning.
Mi-Young CHOI Chang-Joo MOON Doo-Kwon BAIK
The Semantic Web uses RDF/RDFS, which can enable a machine to understand web data without human interference. However, most web data is not available as RDF/RDFS documents because it is still stored in databases. It is therefore much more favorable to use data stored in databases to build the Semantic Web. This paper proposes an enhanced relational RDF/RDFS interoperable data model (ER2iDM) and a transformation procedure from the relational data model (RDM) to RDF/RDFS based on ER2iDM. The ER2iDM is a data model that plays the role of an intermediary between RDM and RDF/RDFS during the transformation procedure. The data and schema information in the database are migrated to the ER2iDM according to the proposed translation procedures without incurring loss of meaning of the entities, relationships, and data. The RDF/RDFS generation tool automatically produces an RDF/RDFS XML document from the ER2iDM. Unlike existing studies, the proposed ER2iDM and transformation procedure provide detailed guidelines for transformation from RDM to RDF/RDFS; therefore, we can more efficiently build up the Semantic Web using data stored in databases.
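To make the idea of turning relational data into RDF concrete, here is a minimal sketch that maps one table row to subject-predicate-object triples. This is a generic direct-style mapping, not the ER2iDM procedure, which the abstract does not detail; the function name `row_to_triples`, the base URI `http://example.org/`, and the sample table are all hypothetical.

```python
def row_to_triples(table, key, row, base="http://example.org/"):
    """Map one relational row to RDF-style triples.

    The row's primary key forms the subject URI, the table becomes the
    class, and every other column becomes a property with a literal value.
    """
    subject = f"<{base}{table}/{row[key]}>"
    triples = [(subject, "rdf:type", f"<{base}{table}>")]
    for column, value in row.items():
        if column != key:
            triples.append((subject, f"<{base}{table}#{column}>", f'"{value}"'))
    return triples


for triple in row_to_triples("employee", "id", {"id": 7, "name": "Kim", "dept": "R&D"}):
    print(" ".join(triple), ".")
```

A fuller transformation would also have to carry over schema information such as foreign keys, which is the part the intermediate model in the abstract is meant to preserve.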
Yuanbin HAN Shizhan CHEN Zhiyong FENG
This paper presents a novel topic modeling (TM) approach for discovering meaningful topics for Web APIs, which is a potential means of dimensionality reduction for efficient and effective classification, retrieval, organization, and management of numerous APIs. We explore the possibility of conducting TM on multi-labeled APIs by combining a supervised TM (known as Labeled LDA) with ontology. Experiments conducted on a real-world API data set show that the proposed method outperforms standard Labeled LDA with an average gain of 7.0% in the measured quality of the generated topics. In addition, we evaluate the similarity matching between topics generated by our method and by standard Labeled LDA, which demonstrates the significance of incorporating ontology.
The string analysis is a static analysis of dynamically generated strings in a target program, which is applied to check well-formed string construction in web applications. The string analysis constructs a finite state automaton that approximates a set of possible strings generated for a particular string variable at a program location at runtime. A drawback in the string analysis is imprecision in the analysis result, leading to false positives in the well-formedness checkers. To address the imprecision, this paper proposes an improvement technique of the string analysis to make it perform more precise analysis with respect to input validation in web applications. This paper presents the improvement by annotations representing screening of a set of possible strings, and empirical evaluation with experiments of the improved analyzer on real-world web applications.
With the successful adoption of link analysis techniques such as PageRank and web spam filtering, current web search engines support “navigational search” well. However, due to the use of a simple conjunctive Boolean filter in addition to the inappropriateness of user queries, such an engine does not necessarily support “informational search” well. Informational search would be better handled by a web search engine using an information retrieval model combined with enhancement techniques such as query expansion and relevance feedback. Moreover, the realization of such an engine requires a method to process the model efficiently. In this paper we propose a novel extension of an existing top-k query processing technique to improve search efficiency. We add to it a technique utilizing a simple data structure called a “term-document binary matrix,” resulting in more efficient evaluation of top-k queries even when the queries have been expanded. We show, on the basis of experimental evaluation using the TREC GOV2 data set and expanded versions of the evaluation queries attached to this data set, that the proposed method can speed up evaluation considerably compared with existing techniques, especially when the number of query terms gets larger.
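One way to picture a term-document binary matrix is as one bit row per term, where bit d is set when document d contains the term. The sketch below stores each row as a Python integer and ranks documents by the number of matching query terms. The scoring function and the names `build_binary_matrix` and `top_k` are assumptions for illustration; the abstract does not describe the authors' retrieval model or how they combine the matrix with top-k processing.

```python
import heapq


def build_binary_matrix(docs):
    """Map each term to an int whose bit d is set if document d contains it."""
    matrix = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            matrix[term] = matrix.get(term, 0) | (1 << doc_id)
    return matrix


def top_k(query_terms, matrix, n_docs, k):
    """Return up to k (score, doc_id) pairs, highest score first.

    The score is simply the count of query terms present in the document,
    a placeholder for a real weighted retrieval model.
    """
    rows = [matrix.get(term, 0) for term in query_terms]
    scored = []
    for doc_id in range(n_docs):
        score = sum((row >> doc_id) & 1 for row in rows)
        if score:
            scored.append((score, doc_id))
    # Ties are broken in favor of the smaller document id.
    return heapq.nlargest(k, scored, key=lambda s: (s[0], -s[1]))


docs = ["web search engine",
        "query expansion for web search",
        "antenna admittance"]
matrix = build_binary_matrix(docs)
print(top_k(["web", "search", "query"], matrix, len(docs), k=2))
```

Because each membership test is a single shift and mask, adding more terms to an expanded query adds only cheap bit operations per document in this sketch.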
Gang WANG Li ZHANG Yonggang HUANG Yan SUN
A key concern for service providers is how a web service can stand out among functionally similar services. QoS is a distinct and decisive factor in service selection among functionally similar services. Therefore, how to design services to meet customers' QoS requirements is an urgent problem for service providers. This paper proposes an approach using QFD (Quality Function Deployment), a quality methodology, to translate services' QoS requirements into service design attribute characteristics. Fuzzy sets are utilized to deal with subjective and vague assessments such as the importance of QoS properties. TCI (Technical Competitive Index) is defined to compare the technical competitive capacity of a web service with those of other functionally similar services in terms of QoS. Optimal target values of service design attributes are determined by a GA (Genetic Algorithm) in order to make the technical performance of the improved service higher than that of any rival service product with the lowest improvement effort. Finally, we evaluate candidate improvement solutions on cost-effectiveness. As the output of the QFD process, the optimization targets and the priority order of service design attributes can be used as an important basis for developing and improving service products.
The Linking Open Data (LOD) cloud is a collection of linked Resource Description Framework (RDF) data with over 31 billion RDF triples. Accessing linked data is a challenging task because each data set in the LOD cloud has a specific ontology schema, and familiarity with the ontology schema used is required in order to query various linked data sets. However, manually checking each data set is time-consuming, especially when many data sets from various domains are used. This difficulty can be overcome without user interaction by using an automatic method that integrates different ontology schema. In this paper, we propose a Mid-Ontology learning approach that can automatically construct a simple ontology, linking related ontology predicates (class or property) in different data sets. Our Mid-Ontology learning approach consists of three main phases: data collection, predicate grouping, and Mid-Ontology construction. Experiments show that our Mid-Ontology learning approach successfully integrates diverse ontology schema with a high quality, and effectively retrieves related information with the constructed Mid-Ontology.
Yukihiko SHIGESADA Shinsuke KOBAYASHI Noboru KOSHIZUKA Ken SAKAMURA
Context awareness is one of the ultimate goals of ubiquitous computing, and spatial information plays an important role in building context awareness. In this paper, we propose a new interoperable spatial information model, which is based on ucode relation (ucR) and Place Identifier (PI), for realizing ubiquitous spatial infrastructure. In addition, we propose a design environment for spatial information databases using our model. Our model is based on ucode and its relations. A ucode is a 128-bit number, and the number itself has no meaning. Hence, it is difficult to manage the relations between ucodes without using a tool. Our design environment makes it possible to describe the connections between ucodes visually and to manipulate data interactively using a map of the target space. To evaluate the proposed model and environment, we designed three spaces using our tool. In addition, we developed a web application using our spatial model. The evaluation showed that our model is effective and that our design environment is useful for developing spatial information based on our model.
The globalization of commerce has increased the importance of retrieving and updating complex and distributed information efficiently. Web services currently show the most promise for building distributed application systems, and model-driven architecture is a new approach to developing such applications. The expanding scale and complexity of enterprise information systems (EISs) under distributed computing environments has made sharing and exchanging data particularly challenging. Data services are applications tailored specifically for information-oriented tasks to deal with business service requirements, and are heavily dependent on the distributed architecture of consumer data processing. The implementation of a data service can eliminate inconsistency among various application systems in the exchange of data. This paper proposes a data-oriented model-driven development framework to deal with these issues, in which a platform independent model (PIM) is divided into a service model, a logic data model, and a service composition model. We also divide a platform specific model (PSM) into a physical data model and a data service model. In this development method, we define five meta-models and outline a set of rules governing the transformation from PIMs into PSMs. A code generator is also included to transform each PSM into application code. We include a case study to demonstrate the feasibility and merits of the proposed development framework.
Shinji KIKUCHI Yoshihiro KANNA Yohsuke ISOZAKI
In recent years, there has been an increasing demand for elemental services provided by independent firms that can be composed into new services. Currently, however, when it is difficult to maintain the required level of quality of a new composite web service, provisioning new computer resources at the data center is not always effective, especially in the area of performance for composite web service providers. Thus, a new approach might be required. This paper presents a new control method aiming to maintain the performance requirements for composite web services. Three aspects are applied in our method: first, the theory of constraints (TOC) proposed by E.M. Goldratt; second, an evaluation process in the non-linear feed-forward controlling method; and finally, multiple trials in applying policies with verification. In particular, we discuss the architectural and theoretical aspects of the method in detail, and show the insufficiency of combining the feedback controlling approach with TOC as a result of our evaluation.
Jaekwang KIM KwangHo YOON Jee-Hyong LEE
Clickstreams in users' navigation logs have various data which are related to users' web surfing. Those are visit counts, stay times, product types, etc. When we observe these data, we can divide clickstreams into sub-clickstreams so that the pages in a sub-clickstream share more contexts with each other than with the pages in other sub-clickstreams. In this paper, we propose a method which extracts more informative rules from clickstreams for web page recommendation based on genetic programming and association rules. First, we split clickstreams into sub-clickstreams by contexts for generating more informative rules. In order to split clickstreams in consideration of context, we extract six features from users' navigation logs. A set of split rules is generated by combining those features through genetic programming, and then informative rules for recommendation are extracted with the association rule mining algorithm. Through experiments, we verify that the proposed method is more effective than the other methods in various conditions.
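The final step in this abstract, extracting association rules from sub-clickstreams, can be sketched for the simplest case of single-page rules "A implies B". The splitting of clickstreams by genetic programming is not covered here; the function name `pair_rules`, the thresholds, and the sample sessions are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations


def pair_rules(sessions, min_support, min_confidence):
    """Find rules (a -> b) over pages that co-occur within a session.

    support    = fraction of sessions containing both a and b
    confidence = fraction of sessions containing a that also contain b
    """
    n = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = set(session)
        page_count.update(pages)
        pair_count.update(permutations(sorted(pages), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        confidence = count / page_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


sessions = [["home", "cart"],
            ["home", "cart", "pay"],
            ["home", "news"],
            ["cart", "pay"]]
print(pair_rules(sessions, min_support=0.5, min_confidence=0.9))
```

With these sample sessions, only the rule from "pay" to "cart" clears both thresholds, since every session that contains "pay" also contains "cart".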
Theerayut THONGKRAU Pattarachai LALITROJWONG
The development of ontology at the instance level requires the extraction of the terms defining the instances from various data sources. These instances are then linked to the concepts of the ontology, and relationships are created between these instances for the next step. However, before establishing links among data, ontology engineers must classify terms or instances from a web document into an ontology concept. The task of helping ontology engineers with this classification is called ontology population. Existing research is not suitable for ontology development applications because of problems such as long processing times and difficulty in analyzing large or noisy data sets. The OntoPop system introduces a methodology to solve these problems, which comprises two parts. First, we select meaningful features from syntactic relations, which can produce more significant features than other methods. Second, we differentiate feature meaning and reduce noise based on latent semantic analysis. Experimental evaluation demonstrates that OntoPop works well, achieving an accuracy of 49.64%, a learning accuracy of 76.93%, and an execution time of 5.46 seconds per instance.
Takamichi SAITO Kiyomi SEKIGUCHI Ryosuke HATSUGAI
While the Secure Sockets Layer or Transport Layer Security (SSL/TLS) is assumed to provide secure communications over the Internet, many web applications utilize basic or digest authentication of the Hypertext Transfer Protocol (HTTP) over SSL/TLS. That is, two different authentication schemes coexist in a single session. Since they are separated into different layers, they are not convenient for a web application. Moreover, this arrangement may also cause problems in establishing secure communication. We therefore provide a scheme for binding authentication between SSL/TLS and HTTP without modifying the SSL/TLS protocols or their implementations, and we show the effectiveness of our proposed scheme.
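For context on the HTTP side of this abstract, basic authentication sends a Base64 encoding of `user:password` in the `Authorization` header, independently of whatever authentication took place in the SSL/TLS handshake. The sketch below builds that header value; the function name `basic_auth_header` is hypothetical, and the code does not implement the binding scheme the authors propose.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP Authorization header for basic authentication."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("Aladdin", "open sesame"))
```

Since Base64 is an encoding rather than encryption, the credentials are protected only by the SSL/TLS layer underneath, which is why the relationship between the two layers matters.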
An infinitely long monopole antenna driven by a coaxial cable is revisited. The associated Weber transform and the mode-matching method are used to obtain simple simultaneous equations for the modal coefficients. Computations are performed to illustrate the behavior of current distribution and antenna admittance in terms of antenna geometries.