A Survey on Malicious Domains Detection through DNS Data Analysis
Three-dimensional data are increasingly prevalent across biomedical and social domains. Notable examples are gene-sample-time, individual-feature-time or node-node-time data, generally referred to as observation-attribute-context data. The unsupervised analysis of three-dimensional data can be pursued to discover putative biological modules, disease progression patterns or communities of individuals with coherent behavior, and is thus key to enhancing the understanding of complex biological, individual and societal systems. Although clustering can be applied to group observations, it is of limited use since observations in three-dimensional data domains are typically only meaningfully correlated on subspaces of the overall space. Biclustering tackles this challenge but disregards the third dimension of the data. In this context, triclustering -- the discovery of coherent subspaces within three-dimensional data -- has been widely researched to tackle these problems. Despite the diversity of contributions in this field, there is still no structured view of the major requirements of this task, the allowed homogeneity criteria (including coherency, structure, quality, locality and orthonormality criteria) or the algorithmic approaches. This work therefore formalizes the triclustering task and its scope; introduces a taxonomy to categorize contributions in the field; provides a comprehensive comparison of state-of-the-art triclustering algorithms according to their behavior and output; and lists relevant real-world applications.
Network Functions Virtualization (NFV) and Software-Defined Networking (SDN) are new paradigms towards open software and network hardware. While NFV aims at virtualizing network functions and deploying them onto general-purpose hardware, SDN makes networks programmable by separating the control and data planes. NFV and SDN are complementary technologies capable of providing one network solution: SDN can provide connectivity between Virtual Network Functions (VNFs) in a flexible and automated way, whereas NFV can use SDN as part of a service function chain. A great number of studies propose NFV/SDN architectures for different environments. Researchers have been trying to address reliability, performance, and scalability problems using different architectural designs. This Systematic Literature Review (SLR) focuses on integrated NFV/SDN architectures and has the following goals: i) to investigate and provide an in-depth review of the state of the art of NFV/SDN architectures, ii) to synthesize their architectural designs, and iii) to identify areas for further improvement. In a broad view, this SLR will encourage researchers to advance the current stage of development (i.e., the state of the practice) of integrated NFV/SDN architectures, as well as shed some light on future research efforts and their challenges.
Pilot-Job systems play an important role in supporting distributed scientific computing. With the increasing importance of task-level parallelism in high-performance computing, Pilot-Job systems are also witnessing adoption beyond traditional domains. Notwithstanding the growing impact on scientific research, there is no agreed-upon definition of a Pilot-Job system and no clear understanding of the underlying abstraction and paradigm. Pilot-Job implementations have proliferated with no shared best practices or open interfaces and little interoperability. Ultimately, this is hindering the realization of the full impact of Pilot-Jobs by limiting their robustness, portability, and maintainability. This paper offers a comprehensive analysis of Pilot-Job systems, critically assessing their motivations, evolution, properties, and implementation. The three main contributions of this paper are: (i) an analysis of the motivations and evolution of Pilot-Job systems; (ii) an outline of the Pilot abstraction, its distinguishing logical components and functionalities, its terminology, and its architecture pattern; and (iii) the description of core and auxiliary properties of Pilot-Job systems and the analysis of seven exemplar Pilot-Job implementations. Together, these contributions illustrate the Pilot paradigm, its generality, and how it helps to address some challenges in distributed scientific computing.
The main achievements of spatio-temporal modelling in the field of Geographic Information Science over the past three decades are surveyed. This article offers an overview of: (i) the origins and history of Temporal Geographic Information Systems (T-GIS); (ii) relevant spatio-temporal data models proposed; (iii) the evolution of spatio-temporal modelling trends; and (iv) an analysis of the future trends and developments in T-GIS. It also presents some current theories and concepts that have emerged from the research performed, as well as a summary of the current progress and the upcoming challenges and potential research directions for T-GIS. One relevant result of this survey is the proposed taxonomy of spatio-temporal modelling trends, which classifies 186 modelling proposals surveyed from more than 1400 articles.
Stress is a major concern in daily life that imposes significant and growing health and economic costs on society every year. Stress and driving are a dangerous combination which can lead to life-threatening situations, as a large number of road traffic crashes occur every year due to driver stress. In addition, the rate of many general health issues caused by work-related chronic stress is greater in drivers who work in public and private transport than in many other occupational groups. Therefore, an in-car early warning system for driver stress levels is needed to continuously predict dangerous driving situations and proactively alert the driver, from the perspective of safe and comfortable driving. With recent developments in ambient intelligence, such as sensing technologies, pervasive devices, context recognition, and communications, it is becoming feasible to comfortably measure combinations of different sensed modalities to recognise driver stress automatically. This survey reviews the most recent research on automatic driver stress level detection based on different sensors and data. The computational techniques which have been used in this domain for data analysis are investigated. The important methodological issues that hinder the implementation of such a system are discussed, and future research directions are offered.
Reproducibility is widely considered to be an essential requirement of the scientific process. However, a number of serious concerns have been raised recently, questioning whether today's computational work is adequately reproducible. In principle, it should be possible to specify a computation to sufficient detail that anyone should be able to reproduce it exactly. But in practice, there are fundamental, technical, and social barriers to doing so. The many objectives and meanings of reproducibility are discussed within the context of scientific computing. Many technical barriers to reproducibility are described, extant approaches surveyed, and open areas of research are identified.
Recommender systems are one of the most successful applications of data mining and machine learning technology in practice, and significant technological advances have been made over the last two decades. Academic research in the field in the recent past was strongly fueled by the increasing availability of large datasets containing user-item rating matrices. Many of these works were therefore based on a problem abstraction where only one single user-item interaction is considered in the recommendation process. In many application domains, however, multiple user-item interactions of different types can be recorded over time, and a number of recent works have shown that this information can be used to build richer individual user models and to discover additional behavioral patterns that can be leveraged in the recommendation process. In this work we review existing works that consider information from such sequentially-ordered user-item interaction logs when recommending. In addition, we discuss problem settings where the sequence in which items can be recommended is subject to strict or weak ordering constraints. We propose a categorization of the corresponding recommendation tasks and goals, summarize existing algorithmic solutions, discuss methodological approaches when benchmarking what we call sequence-aware recommender systems, and outline open challenges in the area.
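To make the sequence-aware setting concrete, the following is a minimal sketch (not taken from any surveyed work; the function names and toy sessions are invented for illustration) of one of the simplest sequential models: a first-order Markov chain that counts item-to-item transitions in the interaction logs and recommends the most frequent successors of the current item.

```python
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Count item-to-item transitions over sequentially ordered logs."""
    trans = defaultdict(Counter)
    for seq in sessions:
        for prev, nxt in zip(seq, seq[1:]):
            trans[prev][nxt] += 1
    return trans

def recommend_next(trans, current_item, k=3):
    """Rank candidate next items by transition frequency from current_item."""
    return [item for item, _ in trans[current_item].most_common(k)]

sessions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["b", "c"],
]
model = fit_transitions(sessions)
print(recommend_next(model, "b"))  # ['c', 'd']
```

Richer sequence-aware models extend this idea with longer contexts, recency weighting, or learned sequence representations.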
Software-Defined Networking (SDN) opened up new opportunities in networking with its concept of a control plane segregated from the data forwarding hardware, which enables networks to be programmable, adjustable and dynamically reconfigurable. These characteristics can bring numerous benefits to cloud computing, where dynamic changes and reconfiguration are necessary given its on-demand usage pattern. Although researchers have studied utilizing SDN in cloud data centers, gaps still exist that can be explored further. In this paper, we propose a taxonomy to depict different aspects of SDN-enabled cloud data centers and explain each element in detail. A detailed survey of studies utilizing SDN for cloud data centers is presented, focusing specifically on power optimization and SLA-aware resource management. We also present various simulation and modelling methods that have been developed for evaluating SDN-enabled cloud data centers. Finally, we analyze the gaps in current research and propose future directions.
It is essential to find new ways of enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems, under increasing market pressures. One possible solution to this challenge is a heterogeneous model-based approach in which different teams can produce their conventional models and carry out their usual mono-disciplinary analysis, but in which, additionally, the different models can be coupled for simulation (co-simulation), allowing the study of the global behavior of the system. Due to its potential, co-simulation is being studied in many different disciplines, but with limited sharing of findings. Our aim with this work is to summarize, bridge, and enhance future research in this multidisciplinary area. We provide an overview of co-simulation approaches, research challenges, and research opportunities, together with a detailed taxonomy covering different aspects of the state of the art of co-simulation and a classification of work from the past five years. The main research needs identified are: finding generic approaches for the modular, stable and accurate coupling of simulation units; and expressing the adaptations required to ensure that the coupling is correct.
Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differ from the relational data for which computational approaches have been developed in the data mining community over multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data mining community. In this article we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data mining problem studied, we classify the literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data mining problems in each of these categories.
Until not long ago, manually capturing and storing provenance from scientific experiments were constant concerns for scientists. With the advent of computational experiments (modeled as scientific workflows) and Scientific Workflow Management Systems, the data produced and consumed, as well as the provenance of a given experiment, are automatically managed, so provenance capture and storage in this context are no longer a major concern. As with several existing big data problems, the bottom line is now how to analyze the large amounts of provenance data generated by workflow executions and how to extract useful knowledge from this data. In this context, this article surveys the current state of the art in provenance analytics by presenting the key initiatives that have been taken to support provenance data analysis. We also contribute by proposing a taxonomy to classify elements related to provenance analytics.
The huge increase in the number of digital music tracks has created the need for automated tools to extract useful information from those tracks. As this information has to be extracted from the contents of the music, the field is known as Content-Based Music Information Retrieval (CB-MIR). Since considerable research has accumulated in CB-MIR over the past two decades, there is a need to consolidate and critically analyze the findings to identify future research directions. In this survey article, various content-based music information retrieval tasks and their applications are critically reviewed. In particular, the article focuses on eight MIR-related tasks: vocal/non-vocal segmentation, artist identification, genre classification, raga identification, query-by-humming (QBH), emotion recognition, instrument recognition and music clip annotation. The article elaborates the signal processing techniques used to extract useful features for the specific tasks mentioned above and discusses their strengths as well as their weaknesses. The paper also points to some general research issues in CB-MIR and probable approaches towards solutions that could help improve the efficiency of existing CB-MIR systems.
Positional data from small and mobile GPS receivers have become ubiquitous and allow for many new applications such as road traffic or vessel monitoring as well as Location-Based Services. To make these applications possible, for which location information is more important than ever, streaming spatial data needs to be managed, mined and used intelligently. This paper provides an overview of previous work in this evolving research field and discusses different applications as well as common problems and solutions. The conclusion indicates promising directions for future research.
The gap is widening between the processor clock speed of end-system architectures and network throughput capabilities. It is now physically possible to provide single-flow throughput of speeds up to 100 Gbps, and 400 Gbps will soon be possible. Most current research into high-speed data networking focuses on managing expanding network capabilities within datacenter Local-Area Networks (LANs) or efficiently multiplexing millions of relatively small flows through a Wide-Area Network (WAN). However, datacenter hyper-convergence places high-throughput networking workloads on general-purpose hardware, and distributed High-Performance Computing (HPC) applications require time-sensitive, high-throughput end-to-end flows (also referred to as elephant flows) to occur over WANs. For these applications, the bottleneck is often the end-system, and not the intervening network. Since the problem of the end-system bottleneck was uncovered, many techniques have been developed which address this mismatch with varying degrees of effectiveness. In this survey, we describe the most promising techniques, beginning with network architectures and NIC design, continuing with operating and end-system architectures, and concluding with clean-slate protocol design.
Networks are used to represent relationships between entities in many complex systems, spanning from online social networks to biological cell development and brain activity. These networks model relationships which present various challenges. In many cases, relationships between entities are unambiguously known: are two users friends in a social network? Do two researchers collaborate on a published paper? Do two road segments in a transportation system intersect? These relationships are unambiguous and directly observable in the system in question. In most cases, however, relationships between nodes are not directly observable and must be inferred: does one gene regulate the expression of another? Do two animals that physically co-locate have a social bond? Who infected whom in a disease outbreak? Existing approaches use specialized knowledge from their home domains to infer networks and to measure the goodness of the inferred network for a specific task. However, current research lacks a rigorous validation framework employing standard statistical validation. In this survey, we examine how network representations are learned from non-network data, the variety of questions and tasks addressed on these data across several domains, and validation strategies for measuring the inferred network's capability of answering questions about the original system of interest.
Software testing activities account for a considerable portion of systems development cost and, for this reason, many studies have sought to automate these activities. Test data generation has a high cost-reduction potential (especially for complex-domain systems), since it can decrease human effort. Although several studies have been published on this subject, review articles covering this topic usually focus only on specific domains. This article presents a systematic mapping aiming at providing a broad, albeit critical, overview of the literature on test data generation using genetic algorithms. The selected studies were categorized by the software testing technique (structural, functional or mutation testing) for which test data were generated and by the proposed modifications to genetic algorithms. The most used evaluation metrics and software testing techniques were identified. The results showed that genetic algorithms have been successfully applied to simple test data generation, but are rarely used to generate complex test data such as images, videos, sounds, and three-dimensional models. From these results, we discuss some challenges and opportunities for research in this area.
Web application providers have been migrating their applications to cloud data centers, attracted by the emerging cloud computing paradigm. One of the appealing features of the cloud is elasticity: it allows cloud users to acquire or release computing resources on demand, which enables web application providers to auto-scale the resources provisioned to their applications under dynamic workloads in order to minimize resource cost while satisfying Quality of Service (QoS) requirements. In this paper, we comprehensively analyze the challenges that remain in auto-scaling web applications in clouds and review the developments in this field. We present a taxonomy of auto-scaling systems according to the identified challenges and key properties. We analyze the surveyed works and map them to the taxonomy to identify the weaknesses in this field. Moreover, based on the analysis, we propose new directions for future work.
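As a minimal illustration of the elasticity decision described above, the sketch below implements a simple threshold-based scaling rule. The thresholds, bounds, and function name are assumptions chosen for illustration; the auto-scalers surveyed in this field use considerably more sophisticated policies (predictive models, QoS-aware controllers, cost optimizers).

```python
def autoscale(current_instances, cpu_utilization,
              upper=0.75, lower=0.30, min_n=1, max_n=10):
    """Return the new instance count for one scaling decision.

    Scale out when average CPU exceeds `upper`, scale in below `lower`;
    clamp to [min_n, max_n] so the fleet never empties or runs away.
    """
    if cpu_utilization > upper:
        return min(current_instances + 1, max_n)
    if cpu_utilization < lower:
        return max(current_instances - 1, min_n)
    return current_instances

print(autoscale(4, 0.90))  # 5: overloaded, acquire capacity
print(autoscale(4, 0.10))  # 3: underused, release capacity
print(autoscale(4, 0.50))  # 4: within band, hold steady
```

Even this toy rule exposes the central trade-off the surveyed systems navigate: aggressive thresholds minimize cost but risk QoS violations under bursty workloads, while conservative ones over-provision.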
Context: Software development process measurement is essential to reach predictable performance and high-capability processes. Software process measurement provides support for better understanding, evaluation, management and control of the development process and project, as well as of the resulting product. Measurement enables organizations to recognize, improve, and predict the quality and performance of their processes, which places organizations in a better position to make appropriate and informed decisions as early as possible during the development process. Objective: This study aims to understand the measurement of the software development process, to identify relevant studies, to create a classification scheme based on the identified studies, and then to map those studies into the scheme so as to answer the research questions. Method: Systematic mapping is the research methodology selected for this project. Results: A total of 419 studies are included and classified into four groups with respect to their focus, and into three groups based on publication date. Conclusion: Project effort and productivity are the attributes that have been measured most frequently, followed by process maturity in second place. GQM and CMMI are the main methods used in the studies, whereas Agile and Lean development and Small and Medium-Sized Enterprises are the most frequently identified research contexts.
Since the mid 1980s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (i) selecting the best optimizations and (ii) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.
The recent diversity of storage demands has revealed various shortcomings of traditional RDBMSs, which in turn has led to the emergence of a new trend of complementary non-relational data management solutions, known as NoSQL (Not only SQL). This survey mainly aims at presenting the work that has been conducted with regard to four closely related concepts of NoSQL stores: the data model, the consistency model, data partitioning and replication. For each concept, its different protocols, and for each protocol, its corresponding features, strengths and drawbacks are explained. Furthermore, various implementations of each protocol are exemplified and crystallized through a collection of representative academic and industrial NoSQL technologies. The rationale behind each design decision, along with some corresponding extensions and improvements, is discussed. Finally, we disclose some existing challenges in developing effective NoSQL stores, which need attention from the research community, application designers and architects.
While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Thus, efficient computational methods for condensing and simplifying data are becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind and the challenges of graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.
Contemporary mobile devices are the result of an evolution process in which computational and networking capabilities have been continuously pushed to keep pace with constantly growing workload requirements. This has allowed devices such as smartphones and tablets to perform increasingly complex tasks, up to the point of efficiently replacing traditional options such as desktop computers and notebooks. However, these devices are more prone to theft, compromise, and exploitation for attacks and other malicious activity, mainly due to their portability and size. The need to investigate such incidents resulted in the creation of the Mobile Forensics (MF) discipline. MF, a sub-domain of Digital Forensics (DF), is specialized in extracting and processing evidence from mobile devices in such a way that attacking entities and actions are identified and traced. Beyond its primary research interest in accurate evidence acquisition from mobile devices, MF has recently expanded its scope to encompass organized and advanced evidence representation and analysis of entities' behavior. The current paper presents the research conducted within the MF ecosystem during the last six years. Moreover, it identifies the gaps and highlights the differences from past research directions. Lastly, it addresses challenges and open issues in the field.
In the recent past, deep learning methods have demonstrated remarkable success on supervised learning tasks in multiple domains including computer vision, natural language processing and speech processing. In this paper, we investigate the impact of deep learning in the field of Biometrics, given its success in various other domains. Since Biometrics deals with identifying people by their characteristics, it mostly involves supervised learning and can leverage the success of deep learning in other related domains. In this paper, we survey 100 different approaches that explore deep learning for recognizing individuals using various biometric modalities. We find that most deep learning research in biometrics has focused on face and speaker recognition. Based on inferences from these approaches, we discuss how deep learning methods can benefit the field of Biometrics and the potential gaps that deep learning approaches need to address for real-world biometric applications.
Autoscaling systems can reconfigure cloud-based services and applications, through various configurations of cloud software and provisioning of hardware resources, to adapt to the changing environment at runtime. Such behaviour is the foundation for achieving elasticity in the modern cloud computing paradigm. Given the dynamic and uncertain nature of the shared cloud infrastructure, the cloud autoscaling system has been engineered as one of the most complex, sophisticated and intelligent artifacts created by humans, aiming to achieve self-aware, self-adaptive and dependable runtime scaling. Yet, existing Self-aware and Self-adaptive Cloud Autoscaling Systems (SSCAS) have not matured to a state where they can be reliably exploited in the cloud. In this article, we survey the state-of-the-art research on SSCAS and provide a comprehensive taxonomy for this field. We present a detailed analysis of the results and provide insights on the open challenges, as well as some of the promising solutions that are worth investigating in future work in this area of research. Our survey and taxonomy contribute to the fundamentals of engineering more intelligent autoscaling systems in the cloud.
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing CNN ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, an evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.
Computational creativity seeks to understand computational mechanisms that can be characterized as creative. Creation of new concepts is a central challenge for any creative system. In this paper, we outline different approaches to concept creation and then review conceptual representations relevant to concept creation. The conceptual representations are organized in accordance with two important perspectives on the distinctions between them. One distinction is between symbolic, spatial and connectionist representations. The other is between descriptive and procedural representations. These two distinctions are orthogonal. Additionally, conceptual representations used in particular creative domains, i.e. language, music, image and emotion, are reviewed separately. For each representation reviewed, we cover the inference it affords, the computational means of building it, and its application in concept creation.
Over the past decades, researchers have been proposing different Intrusion Detection approaches to deal with the increasing number and complexity of threats to computer systems. In this context, Random Forest models have provided notable performance in their application to behaviour-based Intrusion Detection Systems. Specificities of the Random Forest model are used to provide classification, feature selection and proximity metrics. This work provides a comprehensive review of the general basic concepts related to Intrusion Detection Systems, including taxonomies, attacks, data collection, modelling, evaluation metrics and commonly used methods. It also provides a survey of Random Forest based methods applied in this context, considering the particularities involved in these models. Finally, some open questions and challenges are posed, combined with possible directions to deal with them, which may guide future work in the area.
This survey presents multidimensional scaling (MDS) methods and their real-world applications. MDS is an exploratory, multivariate data analysis technique that is becoming increasingly popular; it tries to represent higher-dimensional data in a lower-dimensional space. The input to an MDS analysis is a measure of the dissimilarity or similarity of the objects under observation. When the MDS technique is applied to these measured dissimilarities or similarities, it produces a spatial map in which dissimilar objects are placed far apart while similar objects are placed close to each other. In this survey paper, MDS is described in a fairly comprehensive fashion by explaining the basic notions of classical MDS and how MDS can help to analyze multidimensional data. Various MDS-based special models are then described in a more mathematical way.
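The classical MDS mentioned above admits a compact sketch: double-center the squared dissimilarity matrix and take the top eigenvectors of the resulting Gram matrix. This is the standard construction; the function name and toy data below are illustrative only.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed n objects in k dimensions from an n x n
    symmetric dissimilarity matrix D with zero diagonal."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # pick the k largest
    scale = np.sqrt(np.clip(vals[idx], 0.0, None))
    return vecs[:, idx] * scale              # n x k spatial map

# Toy data: four points on a line, one unit apart.
pts = np.arange(4.0).reshape(-1, 1)
D = np.abs(pts - pts.T)
X = classical_mds(D, k=1)
# The recovered map preserves the original gaps (up to sign and shift).
print(np.abs(X[1] - X[0]))
```

For exact Euclidean dissimilarities this recovers the original configuration up to rotation, reflection and translation: dissimilar objects land far apart and similar ones close together, exactly the spatial-map behavior described above.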
Automatic machine-based Facial Expression Analysis (FEA) has witnessed substantial progress in the past few decades, motivated by its importance in psychology, security, health, entertainment and human-computer interaction. However, the vast majority of current studies are based on non-occluded faces collected in a controlled laboratory environment, and automatic expression recognition from partially occluded faces remains a largely unresolved problem, particularly in real-world scenarios. In recent years, increasing efforts have been directed at investigating techniques to handle partial occlusion for FEA. This survey provides a comprehensive review of the recent advances in dataset creation, algorithm development, and investigations of the effects of occlusion, which are crucial in system design and evaluation. It also outlines existing challenges in overcoming partial occlusion and discusses possible opportunities in advancing the technology. To the best of our knowledge, it is the first FEA survey dedicated to occlusion and intended to serve as a starting point for promoting future work.
The goal of privacy metrics is to measure the degree of privacy enjoyed by users in a system and the amount of protection offered by privacy-enhancing technologies. In this way, privacy metrics contribute to improving user privacy in the digital world. The diversity and complexity of privacy metrics in the literature make an informed choice of metrics challenging. As a result, redundant new metrics are proposed frequently, and privacy studies are often incomparable. In this survey we alleviate these problems by structuring the landscape of privacy metrics. For this we explain and discuss a selection of over eighty privacy metrics and introduce a categorization based on the aspect of privacy they measure, their required inputs, and the type of data that needs protection. In addition, we present a method for choosing privacy metrics based on eight questions that help identify the right privacy metrics for a given scenario, and highlight topics where additional work on privacy metrics is needed. Our survey spans multiple privacy domains and can be understood as a general framework for privacy measurement.
Activities of a clinical staff in healthcare environments must regularly be adapted to new treatment methods, medications and technologies. This constant evolution requires the monitoring of the workflow, or the sequence of actions from actors involved in a procedure, to ensure quality of medical services. In this context, recent advances in sensing technologies, including Real-time Location Systems (RTLS) and Computer Vision, enable high-precision tracking of actors and equipment. The current state of the art in healthcare workflow monitoring typically focuses on a single technology and does not discuss its integration with others. Such an integration can lead to better solutions to evaluate medical workflows. This study aims to fill this gap with a systematic literature review of sensors for capturing the workflow of healthcare environments. Its main scientific contribution is to identify both the current technologies used to track activities in a clinical environment and the gaps in their combination to achieve better results. We also propose a taxonomy to classify work regarding sensing technologies and methods. Our review did not identify proposals that combine data obtained from RTLS and Computer Vision sensors. We conclude that a multimodal analysis is more flexible and could yield better results.
While cloud computing has brought paradigm shifts to computing services, researchers and developers have also found problems inherent to its nature, such as bandwidth bottlenecks, communication overhead, and location blindness. The concept of fog/edge computing was therefore coined to extend services from the core in cloud data centers to the edge of the network. In recent years, many systems have been proposed to better serve ubiquitous smart devices closer to the user. This paper provides a complete and up-to-date review of edge-oriented computing systems by encapsulating relevant proposals on their architecture features, management approaches, and design objectives.
Activity recognition aims to provide accurate and timely information on people's activities by leveraging sensory data available in today's sensor-rich environments. Activity recognition has become an emerging field in the areas of pervasive and ubiquitous computing. A typical activity recognition technique processes data streams that evolve from sensing platforms such as mobile sensors, on-body sensors, and/or ambient sensors. This paper surveys the two overlapping research areas of activity recognition and data stream mining. The perspective of this paper is to review the adaptation capabilities of activity recognition techniques in streaming environments. Broad categories of techniques are identified based on the different features in both data streams and activity recognition. The pros and cons of the algorithms in each category are analysed and possible directions for future research are indicated.
Dynamic and partial reconfiguration are key differentiating capabilities of field programmable gate arrays (FPGAs). While they have been studied extensively in academic literature, they find limited use in deployed systems. We review FPGA reconfiguration, looking at architectures built for the purpose, and the properties of modern commercial architectures. We then investigate design flows, and identify the key challenges in making reconfigurable FPGA systems easier to design. Finally, we look at applications where reconfiguration has found use, as well as proposing new areas where this capability places FPGAs in a unique position for adoption.
The size of Linked Data is growing fast, so a Linked Data management system must be able to deal with increasing amounts of data. Even though physically handling Linked Data in a relational table is possible, querying a giant triple table becomes very costly due to the multiple nested joins required for typical queries. In addition, the heterogeneity of Linked Data poses entirely new challenges to database systems. This article provides a comprehensive study of the state of the art in storing and querying RDF data. In particular, we focus on data storage techniques, indexing strategies, and query execution mechanisms. In addition, we provide a classification of existing systems and approaches. We also provide an overview of the various benchmarking efforts in this context and discuss some of the open problems in this domain.
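As a hypothetical illustration of the join cost mentioned above (not drawn from the article; schema and data are made up), consider a single triple table in SQLite: even a two-pattern query already requires a self-join, and each additional triple pattern in the query adds another:

```python
import sqlite3

# One relational table holding all (subject, predicate, object) triples.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("alice",   "worksAt",   "acme"),
    ("acme",    "locatedIn", "berlin"),
    ("bob",     "worksAt",   "initech"),
    ("initech", "locatedIn", "austin"),
])

# The SPARQL pattern { ?x worksAt ?y . ?y locatedIn ?z }
# becomes a self-join of the triple table with itself:
rows = con.execute("""
    SELECT t1.s, t2.o
    FROM triples t1 JOIN triples t2 ON t1.o = t2.s
    WHERE t1.p = 'worksAt' AND t2.p = 'locatedIn'
""").fetchall()
# rows -> [('alice', 'berlin'), ('bob', 'austin')]
```

A query with n triple patterns needs n - 1 such self-joins over the same giant table, which is exactly why dedicated RDF storage and indexing schemes are studied.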
The Internet has undergone dramatic changes in the past 15 years, and now forms a global communication platform that billions of users rely on for their daily activities. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy, such as omnipotent governmental surveillance. As a result, public interest in systems for anonymous communication has drastically increased. In this work, we survey previous research on designing, developing, and deploying systems for anonymous communication. Our taxonomy and comparative assessment provide important insights about the differences between the existing classes of anonymous communication protocols.
We humans are able to identify other people even under voice disguise conditions. However, we are not immune to all voice changes when trying to identify people by voice. Likewise, automatic speaker recognition systems can also be deceived by voice imitation and other types of disguise. Taking into account the classification of voice disguise into the combination of two categories (deliberate/non-deliberate and electronic/non-electronic), this survey provides a literature review on the influence of voice disguise on the automatic speaker recognition task and the robustness of these systems to such voice changes. Additionally, the survey addresses existing applications dealing with voice disguise and analyses some issues for future research.
GPS-equipped devices such as smartphones have become prevalent in the past decade. They have fostered abundant location-based services in applications such as navigation and location-based social networking. Continuous spatial queries serve as a building block for many location-based services. An example of such queries is to continuously maintain the nearest customers for an Uber driver when she is driving. Processing such queries with high efficiency is crucial to the user experience, since real-time updates are required to the query result as the query or data objects are moving. A popular approach to address this efficiency issue is to use safe regions. A safe region is a region inside which an object can move arbitrarily without causing any changes to the query result. As long as the query object stays in its safe region, no query result update is required. This substantially reduces the frequency of query re-evaluation and query result update, and hence improves query efficiency. Safe regions have very interesting theoretical properties and are worth in-depth analysis. We provide a comparative study of the safe region based approaches. We describe how safe regions are defined and computed for different types of continuous spatial queries, and discuss possible further improvements.
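As an illustrative sketch (not taken from the survey; sites and trajectory are invented), the safe region for a continuous 1-NN query over static sites is the Voronoi cell of the current nearest site: the result can only change once the query point crosses a bisector, so the query is re-evaluated only then:

```python
import math

def nearest(sites, q):
    # full query evaluation: scan all sites
    return min(sites, key=lambda s: math.dist(s, q))

def in_safe_region(sites, current_nn, q):
    # inside the Voronoi cell of current_nn <=> current_nn is still nearest
    return all(math.dist(current_nn, q) <= math.dist(s, q) for s in sites)

sites = [(0, 0), (10, 0), (5, 8)]
nn = nearest(sites, (1, 1))          # initial result: (0, 0)
reevaluations = 0
for q in [(2, 1), (4, 1), (6, 1)]:   # query object moving right
    if not in_safe_region(sites, nn, q):
        nn = nearest(sites, q)       # re-evaluate only on leaving the region
        reevaluations += 1
# Only the step crossing the bisector between (0,0) and (10,0)
# triggers a re-evaluation; the other updates are filtered out.
```

The membership test costs one distance comparison per site, whereas a full re-evaluation may involve index traversal and a result update sent to the client, which is where the savings come from.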
Many networking research activities depend on the availability of network captures. Even outside academic research there is a need for sharing network captures, to cooperate on threat assessments or for debugging. However, most network captures cannot be shared due to privacy concerns. There have been many advances in the understanding of anonymisation and cryptographic methods, which have changed the perspective on the effectiveness of many anonymisation techniques. On the other hand, these advances, combined with the increase in computational power, may also have made it feasible to perform anonymisation in real time. This may make it easier to collect and distribute network captures, both for research and for other applications. This article surveys the literature over the period 1998--2015 on network traffic anonymisation techniques and implementations. The aim is to provide an overview of the current state of the art, and to highlight how advances in related fields have shed new light on anonymisation and pseudonymisation methodologies. The few currently maintained implementations are also reviewed. Lastly, we identify future research directions to enable easier sharing of network traffic, which in turn can enable new insights in network traffic analysis.
Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience.
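A toy sketch of the idea (purely illustrative; real engines represent inputs symbolically during execution and discharge path constraints to SMT solvers such as Z3) treats each path of a tiny program as a list of branch predicates over one integer input, then asks a naive brute-force "solver" for a concrete input that reaches the hidden backdoor path:

```python
def program_paths():
    # The program under test, written out as its paths' branch predicates:
    #   if x > 10:
    #       if x % 7 == 0:  -> "backdoor"
    #       else:           -> "ok-high"
    #   else:               -> "ok-low"
    yield ("backdoor", [lambda x: x > 10, lambda x: x % 7 == 0])
    yield ("ok-high",  [lambda x: x > 10, lambda x: x % 7 != 0])
    yield ("ok-low",   [lambda x: x <= 10])

def solve(constraints, domain=range(-100, 101)):
    # stand-in for a real constraint solver: search a small domain
    for x in domain:
        if all(c(x) for c in constraints):
            return x
    return None

# One concrete witness input per explored path, e.g. a value that
# hits the backdoor even though random testing would rarely find it.
inputs = {label: solve(cs) for label, cs in program_paths()}
```

Random testing over the same domain hits the backdoor with probability roughly 13/201 per trial here, but in realistic programs that probability is astronomically small, which is why constraint solving over path conditions is the essence of the technique.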
Intrusion alert analysis is an attractive and active topic in the area of intrusion detection and prevention systems (IDPS). In recent decades, many research communities have worked in this field, releasing a large volume of research works from which various research areas have emerged. However, there has been no systematic and up-to-date review of this body of work. The main objective of this paper is to derive a taxonomy of research fields in intrusion alert analysis and to present a reference guide for researchers who want to enter this area. To this aim, a systematic mapping study (SMS) of 433 high-quality research works has been conducted. Through keyword clustering, ten different research topics in the field of intrusion alert analysis are identified and classified into three broad groups: pre-processing, processing, and post-processing. A brief description is provided for these groups and their related topics, along with several useful analyses based on data extracted from the research works. The results show that the processing group contains most of the research works and has recently shifted toward heterogeneous correlation. The post-processing group is newer than the others and has recently received attention from research communities and security administrators.