A Survey on Malicious Domains Detection through DNS Data Analysis
Three-dimensional data are increasingly prevalent across biomedical and social domains. Notable examples are gene-sample-time, individual-feature-time or node-node-time data, generally referred to as observation-attribute-context data. The unsupervised analysis of three-dimensional data can be pursued to discover putative biological modules, disease progression patterns or communities of individuals with coherent behavior, and is thus key to enhancing the understanding of complex biological, individual and societal systems. Although clustering can be applied to group observations, its potential is limited since observations in three-dimensional data domains are typically only meaningfully correlated on subspaces of the overall space. Biclustering tackles this challenge but disregards the third dimension of the data. In this context, triclustering -- the discovery of coherent subspaces within three-dimensional data -- has been widely researched to tackle these problems. Despite the diversity of contributions in this field, a structured view of the major requirements of this task, the allowed homogeneity criteria (including coherency, structure, quality, locality and orthonormality criteria) and the algorithmic approaches is still lacking. In this context, this work formalizes the triclustering task and its scope; introduces a taxonomy to categorize the contributions in the field; provides a comprehensive comparison of state-of-the-art triclustering algorithms according to their behavior and output; and lists relevant real-world applications.
Context, such as a user's search history, demographics, devices and surroundings, has become prevalent in various domains of information seeking and retrieval, such as mobile search, task-based search and social search. While evaluation is central to and has a long history in information retrieval, it faces the big challenge of designing an appropriate methodology that embeds the context into the evaluation settings. In this survey, we summarize, in a unified view, a wide range of recent progress in contextual information retrieval evaluation that leverages diverse context dimensions and uses different principles, methodologies and levels of measurement. More specifically, this survey aims to fill two main gaps in the literature: first, it provides a critical summary and comparison of existing contextual information retrieval evaluation methodologies and metrics according to a simple stratification model; second, it points out the impact of context dynamicity and data privacy on the evaluation design. Finally, we recommend promising research directions for future investigations.
Network Functions Virtualization (NFV) and Software-Defined Networking (SDN) are new paradigms towards open software and network hardware. While NFV aims at virtualizing network functions and deploying them onto general-purpose hardware, SDN makes networks programmable by separating the control and data planes. NFV and SDN are complementary technologies capable of providing one network solution. SDN can provide connectivity between Virtual Network Functions (VNFs) in a flexible and automated way, whereas NFV can use SDN as part of a service function chain. A great deal of studies propose NFV/SDN architectures in different environments. Researchers have been trying to address reliability, performance, and scalability problems using different architectural designs. This Systematic Literature Review (SLR) focuses on integrated NFV/SDN architectures and has the following goals: i) to investigate and provide an in-depth review of the state-of-the-art of NFV/SDN architectures, ii) to synthesize their architectural designs, and iii) to identify areas for further improvements. In a broad view, this SLR will encourage researchers to advance the current stage of development (i.e., the state-of-the-practice) of integrated NFV/SDN architectures, as well as shed some light on future research efforts and their challenges.
Stress is a major concern in daily life that imposes significant and growing health and economic costs on society every year. Stress and driving are a dangerous combination that can lead to life-threatening situations, as a large number of road traffic crashes occur every year due to driver stress. In addition, the rate of many general health issues caused by work-related chronic stress in drivers who work in public and private transport is greater than in many other occupational groups. Therefore, an early warning system for driver stress levels in cars is needed to continuously predict dangerous driving situations and proactively alert the driver, from the perspective of safe and comfortable driving. With recent developments in ambient intelligence, such as sensing technologies, pervasive devices, context recognition, and communications, it is becoming feasible to comfortably measure combinations of different sensed modalities to recognise driver stress automatically. This survey reviews the most recent research on automatic driver stress level detection based on different sensors and data. Different computational techniques that have been used in this domain for data analysis are investigated. The important methodological issues that hinder the implementation of such a system are discussed and future research directions are offered.
The world is becoming a more connected place, and the number of data sources such as social networks, online transactions, search engines and mobile devices is increasing even faster than had been predicted. A large percentage of this growing dataset exists in the form of graphs, and of unprecedented sizes. While today's data from social networks contain hundreds of millions of nodes connected by billions of edges, inter-connected data from globally-distributed sensors that form the Internet of Things (IoT) can cause this to grow exponentially larger. Big data tools designed for text and tuple analysis such as MapReduce cannot process large graphs efficiently. Therefore, distributed graph processing abstractions and systems have been developed to design iterative graph algorithms and process large graphs with better performance and scalability. These graph frameworks propose novel methods or extend previous methods for processing graph data. In this article, we propose a taxonomy of graph processing systems and map existing systems to this classification. This captures the diversity in programming and computation models, and runtime aspects of partitioning and communication, both for in-memory and distributed frameworks. Our effort helps to highlight key distinctions in architectural approaches, and identifies gaps for future research in scalable graph systems.
Reproducibility is widely considered to be an essential requirement of the scientific process. However, a number of serious concerns have been raised recently, questioning whether today's computational work is adequately reproducible. In principle, it should be possible to specify a computation in sufficient detail that anyone should be able to reproduce it exactly. But in practice, there are fundamental, technical, and social barriers to doing so. The many objectives and meanings of reproducibility are discussed within the context of scientific computing. Many technical barriers to reproducibility are described, extant approaches surveyed, and open areas of research are identified.
Recommender systems are one of the most successful applications of data mining and machine learning technology in practice, and significant technological advances have been made over the last two decades. Academic research in the field in the recent past was strongly fueled by the increasing availability of large datasets containing user-item rating matrices. Many of these works were therefore based on a problem abstraction where only one single user-item interaction is considered in the recommendation process. In many application domains, however, multiple user-item interactions of different types can be recorded over time, and a number of recent works have shown that this information can be used to build richer individual user models and to discover additional behavioral patterns that can be leveraged in the recommendation process. In this work we review existing approaches that consider information from such sequentially-ordered user-item interaction logs when recommending. In addition, we discuss problem settings where the sequence in which items can be recommended is subject to strict or weak ordering constraints. We propose a categorization of the corresponding recommendation tasks and goals, summarize existing algorithmic solutions, discuss methodological approaches when benchmarking what we call sequence-aware recommender systems, and outline open challenges in the area.
Malicious software still threatens users on a daily basis, and its evolution ranges from social-engineering-based bankers to advanced persistent threats (APTs). Recent research and discoveries have exposed us to a wide range of anti-analysis and evasion techniques, in-memory attacks such as Return-Oriented Programming (ROP), and system subversion, including of BIOS and hypervisors. This work presents a survey on techniques able to detect, mitigate and analyze these kinds of attacks, which require transparent and fine-grained environments as analysis resources. We cover current tools' limitations, such as not being fully transparent, and introduce systems and techniques to overcome and/or mitigate these constraints. The work presents approaches based on hypervisor introspection and System Management Mode (SMM) instrumentation, as well as some hardware-based ones. We also present some threats based on the same techniques. Our main goal is to give the reader a broader and more comprehensive understanding of recently-surfaced tools and techniques.
Software-Defined Networking (SDN) opened up new opportunities in networking with its concept of a control plane segregated from the data-forwarding hardware, which enables networks to be programmable, adjustable and reconfigurable dynamically. These characteristics can bring numerous benefits to cloud computing, where dynamic changes and reconfiguration are necessary given its on-demand usage pattern. Although researchers have studied utilizing SDN in cloud data centers, gaps still exist that can be explored further. In this paper, we propose a taxonomy to depict different aspects of SDN-enabled cloud data centers and explain each element in detail. A detailed survey of the studies utilizing SDN for cloud data centers is presented, specifically focusing on power optimization and SLA-aware resource management. We also present various simulation and modelling methods that have been developed for evaluating SDN-enabled cloud data centers. Finally, we analyze the gaps in current research and propose future directions.
Pointwise anomaly detection and change detection focus on the study of individual data instances; however, an emerging area of research involves groups or collections of observations. From applications in high-energy particle physics to healthcare collusion, group deviation detection techniques result in novel research discoveries, mitigation of risks, prevention of malicious collaborative activities and other interesting explanatory insights. In particular, static group anomaly detection is the process of identifying groups that are not consistent with regular group patterns, while dynamic group change detection assesses significant differences in the state of a group over a period of time. Since both group anomaly detection and group change detection share fundamental ideas, this survey paper provides a clearer and deeper understanding of group deviation detection research in static and dynamic situations.
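As a concrete illustration of the static case, a minimal (hypothetical) baseline aggregates each group of point observations into a feature vector and then scores groups by their distance from the typical group pattern; the data, the aggregate features (per-attribute mean and spread) and all constants below are illustrative, not a method from any specific surveyed work.

```python
import numpy as np

rng = np.random.default_rng(0)
# 20 regular groups of 50 two-dimensional points, plus one anomalous
# group with the same mean but a much larger spread.
groups = [rng.normal(0.0, 1.0, size=(50, 2)) for _ in range(20)]
groups.append(rng.normal(0.0, 3.0, size=(50, 2)))

# Aggregate each group into a feature vector: per-attribute mean and std.
feats = np.array([np.concatenate([g.mean(axis=0), g.std(axis=0)]) for g in groups])

# Standardize group features and rank groups by distance from the centroid.
z = (feats - feats.mean(axis=0)) / feats.std(axis=0)
scores = np.linalg.norm(z, axis=1)
print(int(scores.argmax()))  # index of the group scored most anomalous
```

Note that the anomalous group is invisible to pointwise detectors (each of its points is individually plausible); only the group-level aggregate reveals the deviation.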
It is essential to find new ways of enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems, under increasing market pressures. One possible solution for this challenge is to use a heterogeneous model-based approach where different teams can produce their conventional models and carry out their usual mono-disciplinary analysis, but in addition, the different models can be coupled for simulation (co-simulation), allowing the study of the global behavior of the system. Due to its potential, co-simulation is being studied in many different disciplines, but with limited sharing of findings. Our aim with this work is to summarize, bridge, and enhance future research in this multidisciplinary area. We provide an overview of co-simulation approaches, research challenges, and research opportunities, together with a detailed taxonomy covering different aspects of the state of the art of co-simulation and a classification of work from the past five years. The main research needs identified are: finding generic approaches for modular, stable and accurate coupling of simulation units; and expressing the adaptations required to ensure that the coupling is correct.
Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains, including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differ from relational data, for which computational approaches have been developed in the data mining community over multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data mining community. In this article we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data mining problem studied, we classify literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data mining problems in each of these categories.
Until not long ago, manually capturing and storing provenance from scientific experiments were constant concerns for scientists. With the advent of computational experiments (modeled as scientific workflows) and Scientific Workflow Management Systems, produced and consumed data, as well as the provenance of a given experiment, are automatically managed, so provenance capture and storage in such contexts are no longer a major concern. As with several existing big data problems, the bottom line is now how to analyze the large amounts of provenance data generated by workflow executions and how to extract useful knowledge from these data. In this context, this article surveys the current state of the art in provenance analytics by presenting the key initiatives that have been taken to support provenance data analysis. We also contribute by proposing a taxonomy to classify elements related to provenance analytics.
The huge increase in the number of digital music tracks has created a need for automated tools to extract useful information from those tracks. As this information has to be extracted from the contents of the music, the field is known as Content-Based Music Information Retrieval (CB-MIR). Since the last two decades have produced numerous research outcomes in the area of CB-MIR, there is a need to consolidate and critically analyze these findings to derive future research directions. In this survey article, various tasks of content-based music information retrieval and their applications are critically reviewed. In particular, the article focuses on eight MIR-related tasks: vocal/non-vocal segmentation, artist identification, genre classification, raga identification, query-by-humming (QBH), emotion recognition, instrument recognition and music clip annotation. The article elaborates on the signal processing techniques used to extract useful features for performing the specific tasks mentioned above and discusses their strengths as well as weaknesses. This paper also points to some general research issues in CB-MIR and probable approaches towards solutions that help in improving the efficiency of existing CB-MIR systems.
Positional data from small and mobile GPS receivers have become ubiquitous and allow for many new applications such as road traffic or vessel monitoring, as well as Location Based Services. To make these applications possible, in a setting where location information is more important than ever, streaming spatial data needs to be managed, mined and used intelligently. This paper provides an overview of previous work in this evolving research field and discusses different applications as well as common problems and solutions. The conclusion indicates promising directions for future research.
Register allocation (assigning variables to processor registers or memory) and instruction scheduling (reordering instructions to increase throughput) in a compiler are essential tasks for generating efficient assembly code. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can generate optimal code, can accurately capture trade-offs between conflicting decisions, and are more flexible, at the expense of increased compilation time. This survey reviews combinatorial optimization for register allocation and instruction scheduling. It focuses on integer programming, constraint programming, and partitioned Boolean quadratic programming as combinatorial techniques that are used in the area, are based on models, and can generate provably optimal code. A detailed, multidimensional classification of the surveyed approaches based on optimization technique, scope, model accuracy, and practical scalability enables a critical comparison of the approaches and highlights developments, trends, and challenges.
To match large ontologies, automatic matchers become an inevitable solution. However, for a tool to efficiently and accurately match these large ontologies, it must integrate additional techniques beyond the normal techniques for matching small and medium sized ontologies. This paper therefore provides a discussion of the techniques being applied by ontology matching tools to achieve scalability, giving insights on each strategy. In addition, we provide a review of the recent tools that employ each strategy.
The gap is widening between the processor clock speed of end-system architectures and network throughput capabilities. It is now physically possible to provide single-flow throughput of speeds up to 100 Gbps, and 400 Gbps will soon be possible. Most current research into high-speed data networking focuses on managing expanding network capabilities within datacenter Local-Area Networks (LANs) or efficiently multiplexing millions of relatively small flows through a Wide-Area Network (WAN). However, datacenter hyper-convergence places high-throughput networking workloads on general-purpose hardware, and distributed High-Performance Computing (HPC) applications require time-sensitive, high-throughput end-to-end flows (also referred to as elephant flows) to occur over WANs. For these applications, the bottleneck is often the end-system, and not the intervening network. Since the problem of the end-system bottleneck was uncovered, many techniques have been developed which address this mismatch with varying degrees of effectiveness. In this survey, we describe the most promising techniques, beginning with network architectures and NIC design, continuing with operating and end-system architectures, and concluding with clean-slate protocol design.
Software testing activities account for a considerable portion of systems development cost and, for this reason, many studies have sought to automate these activities. Test data generation has a high cost reduction potential (especially for complex domain systems), since it can decrease human effort. Although several studies have been published on this subject, review articles covering this topic usually focus only on specific domains. This article presents a systematic mapping aiming at providing a broad, albeit critical, overview of the literature on test data generation using genetic algorithms. The selected studies were categorized by the software testing technique (structural, functional or mutation testing) for which test data were generated and by the proposed modifications to genetic algorithms. The most used evaluation metrics and software testing techniques were identified. The results showed that genetic algorithms have been successfully applied to simple test data generation, but are rarely used to generate complex test data such as images, videos, sounds, and three-dimensional models. From these results, we discuss some challenges and opportunities for research in this area.
Context: Software development process measurement is essential to reach predictable performance and high capability processes. Software process measurement provides support for better understanding, evaluation, management and control of the development process, project and resulting product as well. Measurement enables organizations to recognize, improve, and predict their processes' quality and performance, which places organizations in a better position to make appropriate and informed decisions as early as possible during the development process. Objective: This study aims to understand the measurement of the software development process, to identify studies, to create a classification scheme based on the identified studies, and then to map such studies into the scheme so as to answer the research questions. Method: Systematic mapping is the selected research methodology for this project. Results: A total of 419 studies are included, and classified into four groups with respect to their focus and into three groups based on publishing date. Conclusion: Project effort and productivity are the attributes that have been measured most frequently, followed by process maturity in second place. GQM and CMMI are the main methods used in the studies, whereas Agile and Lean development and Small and Medium-Sized Enterprises are the most frequently identified research contexts.
Since the mid 1980s, researchers have been trying to use machine-learning-based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order in which to apply optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing numbers of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (i) selecting the best optimizations and (ii) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grained classification among different approaches and, finally, the influential papers of the field.
While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Thus, efficient computational methods for condensing and simplifying data are becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind and the challenges of graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.
Contemporary mobile devices are the result of an evolution process in which computational and networking capabilities have been continuously pushed to keep pace with constantly growing workload requirements. This has allowed devices such as smartphones and tablets to perform increasingly complex tasks, up to the point of efficiently replacing traditional options such as desktop computers and notebooks. However, these devices are more prone to theft, compromise, or exploitation for attacks and other malicious activity, mainly due to their portability and size. The need to investigate the aforementioned incidents resulted in the creation of the Mobile Forensics (MF) discipline. MF, a sub-domain of Digital Forensics (DF), specializes in extracting and processing evidence from mobile devices in such a way that attacking entities and actions are identified and traced. Beyond its primary research interest in accurate evidence acquisition from mobile devices, MF has recently expanded its scope to encompass organized and advanced evidence representation and analysis of entities' behavior. The current paper aims to present the research conducted within the MF ecosystem during the last six years. Moreover, it identifies the gaps and highlights the differences from past research directions. Lastly, it addresses challenges and open issues in the field.
In the recent past, deep learning methods have demonstrated remarkable success on supervised learning tasks in multiple domains, including computer vision, natural language processing and speech processing. In this paper, we investigate the impact of deep learning on the field of Biometrics, given its success in various other domains. Since Biometrics deals with identifying people by their characteristics, it involves mostly supervised learning and can leverage the success of deep learning in other related domains. In this paper, we survey 100 different approaches that explore deep learning for recognizing individuals using various biometric modalities. We find that most deep learning research in biometrics has been focused on face and speaker recognition. Based on inferences from these approaches, we discuss how deep learning methods can benefit the field of Biometrics and the potential gaps that deep learning approaches need to address for real-world biometric applications.
Autoscaling systems can reconfigure cloud-based services and applications, through various configurations of cloud software and provisions of hardware resources, to adapt to the changing environment at runtime. Such behaviour is the foundation for achieving elasticity in the modern cloud computing paradigm. Given the dynamic and uncertain nature of the shared cloud infrastructure, the cloud autoscaling system has been engineered as one of the most complex, sophisticated and intelligent artifacts created by humans, aiming to achieve self-aware, self-adaptive and dependable runtime scaling. Yet, existing Self-aware and Self-adaptive Cloud Autoscaling Systems (SSCAS) are not mature to a state where they can be reliably exploited in the cloud. In this article, we survey the state-of-the-art research studies on SSCAS and provide a comprehensive taxonomy for this field. We present a detailed analysis of the results and provide insights on the open challenges, as well as some of the promising solutions that are worth investigating in future work in this area of research. Our survey and taxonomy contribute to the fundamentals of engineering more intelligent autoscaling systems in the cloud.
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing CNN ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, an evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.
Computational creativity seeks to understand computational mechanisms that can be characterized as creative. Creation of new concepts is a central challenge for any creative system. In this paper, we outline different approaches to concept creation and then review conceptual representations relevant to concept creation. The conceptual representations are organized in accordance with two important perspectives on the distinctions between them. One distinction is between symbolic, spatial and connectionist representations. The other is between descriptive and procedural representations. These two distinctions are orthogonal. Additionally, conceptual representations used in particular creative domains, i.e. language, music, image and emotion, are reviewed separately. For each representation reviewed, we cover the inference it affords, the computational means of building it, and its application in concept creation.
Over the past decades, researchers have been proposing different Intrusion Detection approaches to deal with the increasing number and complexity of threats to computer systems. In this context, Random Forest models have been providing notable performance in applications in the realm of behaviour-based Intrusion Detection Systems. Specific properties of the Random Forest model are used to provide classification, feature selection and proximity metrics. This work provides a comprehensive review of the general basic concepts related to Intrusion Detection Systems, including taxonomies, attacks, data collection, modelling, evaluation metrics and commonly used methods. It also provides a survey of Random Forest based methods applied in this context, considering the particularities involved in these models. Finally, some open questions and challenges are posed, combined with possible directions to deal with them, which may guide future work in the area.
This survey presents multidimensional scaling (MDS) methods and their real-world applications. MDS is an increasingly popular exploratory, multivariate data analysis technique that tries to represent higher-dimensional data in a lower-dimensional space. The input data for MDS analysis are measures of the dissimilarity or similarity of the objects under observation. Once the MDS technique is applied to the measured dissimilarities or similarities, MDS produces a spatial map. In the spatial map, dissimilar objects are far apart while objects that are similar are placed close to each other. In this survey paper, MDS is described in a fairly comprehensive fashion by explaining the basic notions of classical MDS and how MDS can be helpful in analyzing multidimensional data. Later on, various MDS-based special models are described in a more mathematical way.
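The classical MDS procedure described above can be sketched in a few lines: double-center the squared dissimilarity matrix and take the top eigenvectors as coordinates of the spatial map. This is a minimal illustration with an invented toy input, not code from the survey.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed objects so that pairwise distances in the
    k-dimensional spatial map approximate the input dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # keep the top-k components
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Four objects spaced evenly along a line: dissimilarities 0, 1, 2, 3.
D = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
X = classical_mds(D, k=1)                    # 4 x 1 spatial map
Drec = np.abs(X[:, 0][:, None] - X[:, 0][None, :])
print(np.allclose(Drec, D))                  # → True: the map reproduces the input
```

Because the toy dissimilarities are exactly Euclidean, the one-dimensional map recovers them perfectly; for general dissimilarity data the map is only an approximation, which is where the more specialized MDS models come in.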
Cyber attacks are increasingly menacing businesses. Based on literature review and publicly available reports, this paper develops a comprehensive and systematic framework of the cybercrime business. A value chain model is constructed and used to describe 25 key value-added activities, which can be offered "as a service" for use in a cyber attack. Understanding the specialization, commercialization, and cooperation of services for cyber attacks helps to anticipate emerging cyber attack services. Finally, this framework can help to build a more cyber immune system by targeting cybercrime control-points and assigning defense responsibilities to encourage collaboration.
Monitoring the "physics" of control systems to detect attacks is a growing area of research. In its basic form a security monitor creates time-series models of sensor readings for an industrial control system and identifies anomalies in these measurements in order to identify potentially false control commands or false sensor readings. In this paper, we review previous work on physics-based anomaly detection based on a unified taxonomy that allows us to identify limitations and unexplored challenges, and propose new solutions.
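The basic monitor described above can be sketched with a deliberately trivial time-series model: track an exponentially weighted estimate of a sensor and flag readings whose residual exceeds a threshold. The smoothing factor, threshold, and sample values are illustrative assumptions, not taken from the survey:

```python
def detect_anomalies(readings, alpha=0.5, threshold=3.0):
    """Flag indices whose residual against a smoothed estimate is too large."""
    anomalies, estimate = [], readings[0]
    for i, r in enumerate(readings[1:], start=1):
        if abs(r - estimate) > threshold:
            anomalies.append(i)                # possible false sensor reading
        else:
            # only trusted readings update the model (a simple robustness trick)
            estimate = alpha * r + (1 - alpha) * estimate
    return anomalies

# a steady sensor around 10.0 with one spoofed spike at index 4
readings = [10.0, 10.1, 9.9, 10.2, 25.0, 10.0, 9.8]
flagged = detect_anomalies(readings)
```

Real physics-based monitors replace the smoothing step with models derived from the plant's dynamics; this sketch only shows the residual-thresholding skeleton they share.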
The goal of privacy metrics is to measure the degree of privacy enjoyed by users in a system and the amount of protection offered by privacy-enhancing technologies. In this way, privacy metrics contribute to improving user privacy in the digital world. The diversity and complexity of privacy metrics in the literature make an informed choice of metrics challenging. As a result, redundant new metrics are proposed frequently, and privacy studies are often incomparable. In this survey we alleviate these problems by structuring the landscape of privacy metrics. For this we explain and discuss a selection of over eighty privacy metrics and introduce a categorization based on the aspect of privacy they measure, their required inputs, and the type of data that needs protection. In addition, we present a method for choosing privacy metrics, based on eight questions that help identify the right metrics for a given scenario, and highlight topics where additional work on privacy metrics is needed. Our survey spans multiple privacy domains and can be understood as a general framework for privacy measurement.
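To make the notion of a privacy metric concrete, here is one classic example from the uncertainty family: the Shannon entropy of the attacker's probability distribution over candidate users, where higher entropy means more privacy. The sample distributions are hypothetical; the survey covers many more metric families:

```python
import math

def anonymity_entropy(probabilities):
    """Shannon entropy (bits) of the attacker's distribution over candidates.
    Higher values mean the attacker is less certain who the user is."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# attacker cannot distinguish 4 candidates at all: maximum uncertainty (2 bits)
uniform = anonymity_entropy([0.25, 0.25, 0.25, 0.25])
# attacker is almost sure it is one particular user: little privacy remains
skewed = anonymity_entropy([0.97, 0.01, 0.01, 0.01])
```

Different metrics in the survey's categorization take different inputs (e.g. adversary estimates, dataset properties) and quantify different aspects of privacy, which is exactly why a structured selection method is needed.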
The Internet of Things (IoT) envisions a world-wide, interconnected network of smart physical entities. These physical entities generate a large amount of data in operation, and as the IoT gains momentum in terms of deployment, the combined scale of those data seems destined to continue to grow. Increasingly, applications for the IoT involve analytics. Data analytics is the process of deriving knowledge from data, generating value such as actionable insights from them. This article reviews work in the IoT and big data analytics from the perspective of their utility in creating efficient, effective and innovative applications and services for a wide spectrum of domains. We review the broad vision for the IoT as it is shaped in various communities, examine the application of data analytics across IoT domains, provide a categorisation of analytic approaches and propose a layered taxonomy from IoT data to analytics. This taxonomy provides us with insights on the appropriateness of analytical techniques, which in turn shapes a survey of enabling technology and infrastructure for IoT analytics. Finally, we look at some tradeoffs for analytics in the IoT that can shape future research.
Activity recognition aims to provide accurate and timely information on people's activities by leveraging sensory data available in today's sensor-rich environments. Activity recognition has become an emerging field in the areas of pervasive and ubiquitous computing. A typical activity recognition technique processes data streams that originate from sensing platforms such as mobile sensors, on-body sensors, and/or ambient sensors. This paper surveys the two overlapping research areas of activity recognition and data stream mining. The perspective of this paper is to review the adaptation capabilities of activity recognition techniques in streaming environments. Broad categories of techniques are identified based on the different features in both data streams and activity recognition. The pros and cons of the algorithms in each category are analysed and possible directions for future research are indicated.
Dynamic and partial reconfiguration are key differentiating capabilities of field programmable gate arrays (FPGAs). While they have been studied extensively in academic literature, they find limited use in deployed systems. We review FPGA reconfiguration, looking at architectures built for the purpose, and the properties of modern commercial architectures. We then investigate design flows, and identify the key challenges in making reconfigurable FPGA systems easier to design. Finally, we look at applications where reconfiguration has found use, as well as proposing new areas where this capability places FPGAs in a unique position for adoption.
The size of Linked Data is growing fast, thus a Linked Data management system must be able to deal with increasing amounts of data. Even though physically handling Linked Data using a relational table is possible, querying a giant triple table becomes very costly due to the multiple nested joins required for typical queries. In addition, the heterogeneity of Linked Data poses entirely new challenges to database systems. This article provides a comprehensive study of the state of the art in storing and querying RDF data. In particular, we focus on data storage techniques, indexing strategies, and query execution mechanisms. In addition, we provide a classification of existing systems and approaches. We also provide an overview of the various benchmarking efforts in this context and discuss some of the open problems in this domain.
The Internet has undergone dramatic changes in the past 15 years, and now forms a global communication platform that billions of users rely on for their daily activities. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy, such as omnipotent governmental surveillance. As a result, public interest in systems for anonymous communication has drastically increased. In this work, we survey previous research on designing, developing, and deploying systems for anonymous communication. Our taxonomy and comparative assessment provide important insights about the differences between the existing classes of anonymous communication protocols.
We, humans, are able to identify other people even in voice disguise conditions. However, we are not immune to all voice changes when trying to identify people from voice. Likewise, automatic speaker recognition systems can also be deceived by voice imitation and other types of disguise. Taking into account the voice disguise classification into the combination of two different categories (deliberate/non-deliberate and electronic/non-electronic), this survey provides a literature review on the influence of voice disguise in the automatic speaker recognition task and the robustness of these systems to such voice changes. Additionally, the survey addresses existing applications dealing with voice disguise and analyses some issues for future research.
GPS-equipped devices such as smartphones have become prevalent in the past decade. They have fostered abundant location-based services in applications such as navigation and location-based social networking. Continuous spatial queries serve as a building block for many location-based services. An example of such queries is to continuously maintain the nearest customers for an Uber driver when she is driving. Processing such queries with high efficiency is crucial to the user experience, since real-time updates are required to the query result as the query or data objects are moving. A popular approach to address this efficiency issue is to use safe regions. A safe region is a region inside which an object can move arbitrarily without causing any changes to the query result. As long as the query object stays in its safe region, no query result update is required. This substantially reduces the frequency of query re-evaluation and query result update, and hence improves query efficiency. Safe regions have very interesting theoretical properties and are worth in-depth analysis. We provide a comparative study of the safe region based approaches. We describe how safe regions are defined and computed for different types of continuous spatial queries, and discuss possible further improvements.
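The safe-region idea described above can be illustrated for the simplest case, a continuous range query: an object that is currently in the result has the query disk itself as its safe region, and an object outside has the disk's complement, so re-evaluation is only needed when the boundary is crossed. A minimal sketch under those assumptions (the coordinates and helper names are our own):

```python
import math

def in_range(p, q, r):
    """Is point p within distance r of query point q?"""
    return math.dist(p, q) <= r

def needs_reevaluation(obj_pos, q, r, was_inside):
    """For a range query centred at q with radius r, an object's safe region is
    the query disk (if it was in the result) or its complement (if it was not).
    The result can only change when the object leaves its safe region."""
    return in_range(obj_pos, q, r) != was_inside

q, r = (0.0, 0.0), 5.0
obj = (1.0, 1.0)
was_inside = in_range(obj, q, r)   # True: the object starts in the result
moved_inside = (2.0, 2.0)          # still within the safe region: no update
moved_outside = (6.0, 0.0)         # crosses the boundary: result must change
```

Safe regions for k-nearest-neighbour and other continuous queries have more intricate shapes (e.g. Voronoi-cell-based regions), but the update-suppression principle is the same.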
Many networking research activities are dependent on the availability of network captures. Even outside academic research there is a need for sharing network captures, to cooperate on threat assessments or for debugging. However, most network captures cannot be shared due to privacy concerns. There have been many advances in the understanding of anonymisation and cryptographic methods, which have changed the perspective on the effectiveness of many anonymisation techniques. On the other hand these advances, combined with the increase of computational abilities, may have also made it feasible to perform anonymisation in real-time. This may make it easier to collect and distribute network captures, both for research and for other applications. This article surveys the literature over the period of 1998 -- 2015 on network traffic anonymisation techniques and implementations. The aim is to provide an overview of the current state of the art, and to highlight how advances in related fields have shed new light on anonymisation and pseudonymisation methodologies. The few currently maintained implementations are also reviewed. Lastly, we identify future research directions to enable easier sharing of network traffic, which in turn can enable new insights in network traffic analysis.
Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience.
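The core idea sketched above — represent inputs as symbols, collect a path constraint per branch, and ask a solver for a concrete witness — can be shown on a toy scale. This is a deliberately minimal illustration, not a real engine: the "solver" is a brute-force search over a small integer domain standing in for an SMT solver, and the toy program and its backdoor value are invented for the example:

```python
def toy_program_paths():
    """Paths through:  if x * 2 == 84: backdoor()  else: normal()
    Each path carries its accumulated branch constraint on symbolic input x."""
    return [
        ("backdoor", [lambda x: x * 2 == 84]),   # constraint: true branch taken
        ("normal",   [lambda x: x * 2 != 84]),   # constraint: false branch taken
    ]

def solve(constraints, domain=range(-1000, 1000)):
    """Stand-in for a constraint solver: brute-force a concrete witness."""
    for x in domain:
        if all(c(x) for c in constraints):
            return x
    return None  # path constraint unsatisfiable over this domain

# one concrete input per feasible path, as a symbolic executor would produce
witnesses = {label: solve(cs) for label, cs in toy_program_paths()}
```

A real symbolic executor builds these constraints automatically while exploring the program and discharges them to solvers such as Z3, which is what makes the technique scale beyond toy branches.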
Intrusion alert analysis is an attractive and active topic in the area of intrusion detection and prevention systems (IDPS). In recent decades, many research communities have been working in this field, so a large volume of research works has been published and various research areas have emerged. However, there has been no systematic and up-to-date review of research works within the field. The main objective of this paper is to produce a taxonomy of research fields in intrusion alert analysis and present a reference guide for researchers who want to enter this area. To this aim, a systematic mapping study (SMS) of 433 high-quality research works has been conducted. Using keyword clustering, ten different research topics are identified in the field of intrusion alert analysis, which can be classified into three broad groups: pre-processing, processing, and post-processing. A brief description is provided for these groups and their related topics, and some useful analyses are presented based on data extracted from the research works. The results show that the processing group contains most of the research works and has recently shifted toward heterogeneous correlation. The post-processing group is newer than the others and has recently gained attention from research communities and security administrators.