Network Functions Virtualization (NFV) and Software-Defined Networking (SDN) are new paradigms moving towards open software and network hardware. While NFV aims at virtualizing network functions and deploying them onto general-purpose hardware, SDN makes networks programmable by separating the control and data planes. NFV and SDN are complementary technologies capable of providing one network solution: SDN can provide connectivity between Virtual Network Functions (VNFs) in a flexible and automated way, whereas NFV can use SDN as part of a service function chain. A great many studies have proposed NFV/SDN architectures for different environments, with researchers trying to address reliability, performance, and scalability problems through different architectural designs. This Systematic Literature Review (SLR) focuses on integrated NFV/SDN architectures and has the following goals: i) to investigate and provide an in-depth review of the state of the art in NFV/SDN architectures, ii) to synthesize their architectural designs, and iii) to identify areas for further improvement. In a broad view, this SLR will encourage researchers to advance the current stage of development (i.e., the state of the practice) of integrated NFV/SDN architectures, as well as shed some light on future research efforts and their challenges.
Pilot-Job systems play an important role in supporting distributed scientific computing. With the increasing importance of task-level parallelism in high-performance computing, Pilot-Job systems are also witnessing adoption beyond traditional domains. Notwithstanding their growing impact on scientific research, there is no agreed-upon definition of a Pilot-Job system and no clear understanding of the underlying abstraction and paradigm. Pilot-Job implementations have proliferated with no shared best practices or open interfaces and little interoperability. Ultimately, this is hindering the realization of the full impact of Pilot-Jobs by limiting their robustness, portability, and maintainability. This paper offers a comprehensive analysis of Pilot-Job systems, critically assessing their motivations, evolution, properties, and implementation. The three main contributions of this paper are: (i) an analysis of the motivations and evolution of Pilot-Job systems; (ii) an outline of the Pilot abstraction, its distinguishing logical components and functionalities, its terminology, and its architecture pattern; and (iii) a description of core and auxiliary properties of Pilot-Job systems and an analysis of seven exemplar Pilot-Job implementations. Together, these contributions illustrate the Pilot paradigm, its generality, and how it helps to address some challenges in distributed scientific computing.
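The core of the Pilot abstraction outlined above is multi-level scheduling: a pilot acquires resources once via the system scheduler, and application tasks are then late-bound onto it. The following minimal sketch illustrates the pattern in plain Python; the thread-based "pilots", the queue, and the task are all illustrative stand-ins for batch jobs and workload tasks.

```python
# Minimal sketch of the Pilot pattern: a "pilot" acquires a resource slot
# once (here simulated by a worker thread) and then pulls application tasks
# from its own queue, decoupling task scheduling from resource acquisition.
# All names are illustrative, not from any specific Pilot-Job system.
import queue
import threading

task_queue = queue.Queue()

def pilot(pilot_id):
    # In a real system this function would run inside a placeholder job
    # submitted to a batch scheduler; tasks are late-bound to the pilot.
    while True:
        task = task_queue.get()
        if task is None:          # sentinel: release the resource
            break
        print(f"pilot-{pilot_id} executing {task.__name__}")
        task()

def simulate():
    print("simulating")

# Acquire two "resource slots" once, then schedule many tasks onto them.
pilots = [threading.Thread(target=pilot, args=(i,)) for i in range(2)]
for p in pilots:
    p.start()
for _ in range(6):
    task_queue.put(simulate)
for _ in pilots:
    task_queue.put(None)
for p in pilots:
    p.join()
```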
This article is a tutorial resource for researchers who want to program a computer to perform a creative task but who lack an interdisciplinary background in the psychology, philosophy, and cognitive science of creativity. We summarize interdisciplinary perspectives on what creativity is, how it is detected in humans, and how a person could falsifiably evaluate whether or not a computer has been creative. We then survey how these perspectives have and have not been used in actual computational creativity research, as well as what new perspectives on creativity have arisen specifically from computer science, and we make recommendations for how they should be used by this field in the future.
The main achievements of spatio-temporal modelling in the field of Geographic Information Science over the past three decades are surveyed. This article offers an overview of: (i) the origins and history of Temporal Geographic Information Systems (T-GIS); (ii) relevant spatio-temporal data models proposed; (iii) the evolution of spatio-temporal modelling trends; and (iv) an analysis of the future trends and developments in T-GIS. It also presents some current theories and concepts that have emerged from the research performed, as well as a summary of the current progress and the upcoming challenges and potential research directions for T-GIS. One relevant result of this survey is the proposed taxonomy of spatio-temporal modelling trends, which classifies 186 modelling proposals surveyed from more than 1400 articles.
Stress is a major concern in daily life that imposes significant and growing health and economic costs on society every year. Stress and driving are a dangerous combination that can lead to life-threatening situations, as a large number of road traffic crashes occur every year due to driver stress. In addition, the rate of many general health issues caused by work-related chronic stress is greater among drivers who work in public and private transport than in many other occupational groups. Therefore, an in-car early warning system for driver stress levels is needed to continuously predict dangerous driving situations and proactively alert the driver, for both safety and driving comfort. With recent developments in ambient intelligence, such as sensing technologies, pervasive devices, context recognition, and communications, it is becoming feasible to comfortably measure combinations of different sensed modalities to recognise driver stress automatically. This survey reviews the most recent research in the domain of automatic driver stress level detection based on different sensors and data. The different computational techniques that have been used in this domain for data analysis are investigated. The important methodological issues that hinder the implementation of such a system are discussed, and future research directions are offered.
The rapid development of cloud computing promotes a wide deployment of data and computation outsourcing by resource-limited entities to cloud providers. On a pay-per-use basis, clients without enough computational power can easily outsource large-scale computational tasks to the cloud. Nonetheless, security and privacy are a major concern when customers' confidential or sensitive data is processed, and the output generated, in cloud environments that are not fully trusted. Recently, a number of publications have investigated and designed secure outsourcing schemes for different computational tasks. The aim of this survey is to systematize and present the cutting-edge technologies in this area. It starts by presenting security threats and requirements, followed by other factors that should be considered by secure computation outsourcing constructions. In an organized way, we then dwell on the existing secure computation outsourcing solutions for different computational tasks such as matrix computations and mathematical optimization, treating both the confidentiality of the data and the integrity of the result. Finally, we offer a discussion of the literature and provide a list of open challenges in the area.
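For the result-integrity side of such schemes, verification can be far cheaper than recomputation. As an illustrative, scheme-agnostic example (not a construction from the surveyed literature), Freivalds' algorithm checks an outsourced matrix product C = A·B in O(n^2) time per round with one-sided error at most 1/2:

```python
# Freivalds' algorithm: probabilistically verify that C == A @ B in O(n^2)
# per round, instead of recomputing the O(n^3) product. A classical example
# of checking the integrity of an outsourced matrix computation.
import numpy as np

def freivalds_check(A, B, C, rounds=20):
    n = C.shape[1]
    for _ in range(rounds):
        r = np.random.randint(0, 2, size=(n, 1))   # random 0/1 vector
        # A @ (B @ r) and C @ r are both O(n^2) to compute.
        if not np.array_equal(A @ (B @ r), C @ r):
            return False                            # cheating detected
    return True  # a wrong result slips through with prob. <= 2**-rounds

A = np.random.randint(0, 10, (50, 50))
B = np.random.randint(0, 10, (50, 50))
C = A @ B
assert freivalds_check(A, B, C)
C[0, 0] += 1                                        # tamper with the result
assert not freivalds_check(A, B, C)
```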
Reproducibility is widely considered to be an essential requirement of the scientific process. However, a number of serious concerns have been raised recently, questioning whether today's computational work is adequately reproducible. In principle, it should be possible to specify a computation in sufficient detail that anyone can reproduce it exactly. But in practice, there are fundamental, technical, and social barriers to doing so. The many objectives and meanings of reproducibility are discussed within the context of scientific computing. Many technical barriers to reproducibility are described, extant approaches are surveyed, and open areas of research are identified.
In the last few decades, Structure from Motion (SfM) and visual Simultaneous Localization and Mapping (visual SLAM) techniques have gained significant interest from both the computer vision and robotics communities. Many variants of these techniques have started to make an impact in a wide range of applications, including robot navigation and augmented reality. However, despite some remarkable results in these areas, most SfM and visual SLAM techniques operate on the assumption that the observed environment is static; when faced with moving objects, overall system accuracy can be jeopardized. In this paper, we present for the first time a survey of visual localization and 3D reconstruction techniques that are targeted towards operation in dynamic environments. We identify three main problems: how to perform reconstruction (robust visual SLAM); how to segment and track dynamic objects; and how to achieve joint motion segmentation and reconstruction. Based on this categorization, we provide a comprehensive taxonomy of existing approaches. Finally, the advantages and disadvantages of each solution class are critically discussed from the perspective of practicality and robustness.
It is essential to find new ways of enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems, under increasing market pressure. One possible solution to this challenge is to use a heterogeneous model-based approach in which different teams can produce their conventional models and carry out their usual mono-disciplinary analysis, but, in addition, the different models can be coupled for simulation (co-simulation), allowing the study of the global behavior of the system. Due to its potential, co-simulation is being studied in many different disciplines, but with limited sharing of findings. Our aim with this work is to summarize, bridge, and enhance future research in this multidisciplinary area. We provide an overview of co-simulation approaches, research challenges, and research opportunities, together with a detailed taxonomy covering different aspects of the state of the art of co-simulation and a classification of work from the past five years. The main research needs identified are: finding generic approaches for modular, stable, and accurate coupling of simulation units, and expressing the adaptations required to ensure that the coupling is correct.
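One recurring notion above is the coupling of simulation units by a master algorithm. The sketch below shows, under simplifying assumptions, a fixed-step Jacobi-type master for two illustrative units: each unit advances a macro step with its coupling inputs frozen, and outputs are exchanged only at step boundaries.

```python
# Hedged sketch of a fixed-step (Jacobi-type) co-simulation master: two
# simulation units advance independently over each macro step and exchange
# coupling values only at step boundaries. Units and names are illustrative.
# Example system: u' = -v, v' = u (an oscillator split into two units).
def unit_u(u, v_in, h):
    return u - h * v_in   # forward-Euler micro step inside unit 1

def unit_v(v, u_in, h):
    return v + h * u_in   # forward-Euler micro step inside unit 2

def master(u, v, h=0.01, steps=1000):
    for _ in range(steps):
        u_out, v_out = u, v         # sample outputs at the macro-step start
        u = unit_u(u, v_out, h)     # both units advance with frozen inputs
        v = unit_v(v, u_out, h)     # (Jacobi: no within-step iteration)
        yield u, v

for u, v in master(1.0, 0.0):
    pass
print(f"after 10s: u={u:.3f}, v={v:.3f}  (exact: cos(10), sin(10))")
```

The residual drift from the exact solution illustrates why "stable and accurate coupling" is listed above as a main research need: the coupling error depends on the macro step size and on how inputs are extrapolated.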
Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains, including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differs from relational data, for which computational approaches have been developed in the data mining community over multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data mining community. In this article we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss the different types of spatio-temporal data and the relevant data mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data mining problem studied, we classify the literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data mining problems in each of these categories.
Traffic load in any infrastructure-based 802.11 network is typically distributed unevenly between access points (APs), creating hotspots. This is due to the inherent nature of wireless LANs (WLANs), where stations are free to associate with any known AP they desire, and to the lack of control by the APs themselves. This imbalance creates a condition where affected APs in the network suffer traffic congestion while others are underutilised, leading to stations experiencing lower throughput and longer latency, and to the network operating below its potential capacity. To alleviate this problem, some form of load balancing is required to redistribute the workload among the available APs in the wireless network. This paper presents a survey of the various works on load balancing in infrastructure-based 802.11 wireless networks, covering the common methods including admission control, association management, cell breathing, and association control. Updates to the IEEE standards that support load balancing efforts are also presented. Finally, software-defined networks are investigated to determine the extent of control integration available to support managing and load balancing WLANs. Trends in load balancing research are also uncovered, indicating how the introduction of new wireless standards influences the amount of research.
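To make the association-control idea concrete, the following hedged sketch (illustrative names and data, not any standardized mechanism) assigns each station to the least-loaded AP in range rather than the strongest-signal AP:

```python
# Hedged sketch of association control: rather than letting each station
# associate with the strongest-signal AP (which creates hotspots), a
# controller assigns it to the least-loaded AP within range. Illustrative only.
def associate(stations_in_range, ap_load):
    """stations_in_range: {station: [reachable APs]}; ap_load: {ap: count}."""
    assignment = {}
    for station, candidates in stations_in_range.items():
        ap = min(candidates, key=lambda a: ap_load[a])  # least-loaded AP
        assignment[station] = ap
        ap_load[ap] += 1
    return assignment

ap_load = {"AP1": 0, "AP2": 0}
in_range = {"s1": ["AP1"], "s2": ["AP1", "AP2"], "s3": ["AP1", "AP2"]}
print(associate(in_range, ap_load))   # stations spread across both APs
```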
Until not long ago, manually capturing and storing provenance from scientific experiments were constant concerns for scientists. With the advent of computational experiments (modeled as scientific workflows) and Scientific Workflow Management Systems, the produced and consumed data, as well as the provenance of a given experiment, are automatically managed, so provenance capture and storage are no longer major concerns in this context. As with several existing big data problems, the bottom line is now how to analyze the large amounts of provenance data generated by workflow executions and how to extract useful knowledge from this data. In this context, this article surveys the current state of the art in provenance analytics by presenting the key initiatives that have been taken to support provenance data analysis. We also contribute by proposing a taxonomy to classify the elements related to provenance analytics.
The huge increase in the number of digital music tracks has created the need for automated tools to extract useful information from the contents of those tracks; this is known as Content-Based Music Information Retrieval (CB-MIR). Since substantial research outcomes have accumulated in the area of CB-MIR over the past two decades, there is a need to consolidate and critically analyze the research findings in order to identify future research directions. In this survey article, various tasks of content-based music information retrieval and their applications are critically reviewed. In particular, the article focuses on eight MIR-related tasks: vocal/non-vocal segmentation, artist identification, genre classification, raga identification, query-by-humming (QBH), emotion recognition, instrument recognition, and music clip annotation. The article elaborates on the signal processing techniques used to extract useful features for performing the specific tasks mentioned above and discusses their strengths as well as their weaknesses. It also points to some general research issues in CB-MIR and probable approaches to solutions that could help to improve the efficiency of existing CB-MIR systems.
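Many of the tasks above share a common signal processing front end: frame-level spectral features such as MFCCs, aggregated into a clip-level descriptor. A minimal sketch follows, assuming the third-party librosa library and an illustrative local file "clip.wav":

```python
# Hedged sketch: extracting MFCC features, a common front end for CB-MIR
# tasks such as genre classification or artist identification. Assumes the
# third-party librosa library and a local file "clip.wav" (illustrative path).
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=22050)          # mono waveform
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 coefficients/frame
# A simple clip-level descriptor: per-coefficient mean and std over frames.
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(features.shape)  # (26,)
```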
Positional data from small and mobile GPS receivers has become ubiquitous and allows for many new applications, such as road traffic or vessel monitoring, as well as Location-Based Services. To make these applications possible, in a setting where information on location is more important than ever, streaming spatial data needs to be managed, mined, and used intelligently. This paper provides an overview of previous work in this evolving research field and discusses different applications as well as common problems and solutions. The conclusion indicates promising directions for future research.
The gap is widening between the processor clock speed of end-system architectures and network throughput capabilities. It is now physically possible to provide single-flow throughput of speeds up to 100 Gbps, and 400 Gbps will soon be possible. Most current research into high-speed data networking focuses on managing expanding network capabilities within datacenter Local-Area Networks (LANs) or on efficiently multiplexing millions of relatively small flows through a Wide-Area Network (WAN). However, datacenter hyper-convergence places high-throughput networking workloads on general-purpose hardware, and distributed High-Performance Computing (HPC) applications require time-sensitive, high-throughput end-to-end flows (also referred to as elephant flows) over WANs. For these applications, the bottleneck is often the end-system, not the intervening network. Since the problem of the end-system bottleneck was uncovered, many techniques have been developed to address this mismatch, with varying degrees of effectiveness. In this survey, we describe the most promising techniques, beginning with network architectures and NIC design, continuing with operating-system and end-system architectures, and concluding with clean-slate protocol design.
As applications and operating systems become more complex, the last decade has seen the rise of many tracing tools all across the software stack. This paper presents a hands-on comparison of modern tracers on Linux systems, both in user space and in kernel space. The authors implement microbenchmarks that not only quantify the overhead of different tracers, but also sample fine-grained metrics that unveil insights into the tracers' internals and show the cause of each tracer's overhead. Internal design choices and implementation particularities are discussed, which helps in understanding the challenges of developing tracers. Furthermore, this analysis aims to help users choose and configure their tracers based on their specific requirements in order to reduce their overhead and get the most out of them.
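A microbenchmark in the same spirit can be sketched in a few lines: time an identical workload with and without a tracer attached and divide the difference by the number of events. The example below uses Python's sys.settrace hook as a stand-in for the Linux tracers the paper studies:

```python
# Microbenchmark sketch: time the same workload with and without a tracer
# attached to estimate per-event overhead. Uses Python's sys.settrace as an
# illustrative stand-in for Linux tracers such as LTTng or ftrace.
import sys
import time

events = 0
def tracer(frame, event, arg):
    global events
    events += 1          # a minimal probe: count events, record nothing
    return tracer

def workload(n=200_000):
    total = 0
    for i in range(n):
        total += i
    return total

t0 = time.perf_counter(); workload(); t1 = time.perf_counter()
sys.settrace(tracer)
t2 = time.perf_counter(); workload(); t3 = time.perf_counter()
sys.settrace(None)
print(f"untraced: {t1-t0:.4f}s  traced: {t3-t2:.4f}s  events: {events}")
print(f"approx. overhead per event: {(t3-t2-(t1-t0))/events*1e9:.0f} ns")
```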
Networks are used to represent relationships between entities in many complex systems, spanning from online social networks to biological cell development and brain activity. Modeling these relationships presents various challenges. In some cases, relationships between entities are unambiguously known: are two users friends in a social network? Do two researchers collaborate on a published paper? Do two road segments in a transportation system intersect? These relationships are unambiguous and directly observable in the system in question. In other cases, relationships between nodes are not directly observable and must be inferred: does one gene regulate the expression of another? Do two animals that physically co-locate have a social bond? Who infected whom in a disease outbreak? Existing approaches use specialized knowledge from their home domains to infer networks and to measure the goodness of the inferred network for a specific task. However, current research lacks a rigorous validation framework that employs standard statistical validation. In this survey, we examine how network representations are learned from non-network data, the variety of questions and tasks asked of these data across several domains, and validation strategies for measuring the inferred network's capability of answering questions about the original system of interest.
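A minimal sketch of one common inference recipe, under illustrative assumptions: correlate each pair of entities' observed time series and keep an edge wherever the correlation exceeds a threshold. Choosing and validating that threshold is precisely where the rigorous statistical validation called for above is needed:

```python
# Hedged sketch of one common network-inference recipe: correlate each pair
# of entities' activity time series and keep an edge where the correlation
# exceeds a threshold. The data and threshold are purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
signals = rng.normal(size=(5, 200))      # 5 entities, 200 observations each
signals[1] = signals[0] + 0.1 * rng.normal(size=200)  # couple entities 0 and 1

corr = np.corrcoef(signals)              # pairwise Pearson correlation
threshold = 0.5
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)
         if abs(corr[i, j]) > threshold]
print(edges)                             # expected: [(0, 1)]
```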
Software testing activities account for a considerable portion of systems development cost and, for this reason, many studies have sought to automate them. Test data generation has a high cost-reduction potential (especially for complex-domain systems), since it can decrease human effort. Although several studies have been published on this subject, review articles covering the topic usually focus only on specific domains. This article presents a systematic mapping aimed at providing a broad, albeit critical, overview of the literature on test data generation using genetic algorithms. The selected studies were categorized by the software testing technique (structural, functional, or mutation testing) for which test data were generated and by the proposed modifications to genetic algorithms. The most used evaluation metrics and software testing techniques were identified. The results show that genetic algorithms have been successfully applied to simple test data generation, but are rarely used to generate complex test data such as images, videos, sounds, and three-dimensional models. From these results, we discuss some challenges and opportunities for research in this area.
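As a hedged sketch of the basic technique, the genetic algorithm below evolves an integer input toward covering a single hard-to-hit branch, using the classic branch-distance fitness; the function under test and all parameters are illustrative:

```python
# Hedged sketch of GA-based test data generation for structural testing:
# evolve an integer input so that a hard-to-reach branch is covered, using
# the classic "branch distance" as fitness. The target branch is made up.
import random

def branch_distance(x):
    # Target branch: `if x == 3571:` -- distance is 0 when the branch is hit.
    return abs(x - 3571)

def evolve(pop_size=50, generations=200, lo=0, hi=10_000):
    pop = [random.randint(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=branch_distance)
        if branch_distance(pop[0]) == 0:
            return pop[0]                       # covering input found
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) // 2                # arithmetic crossover
            if random.random() < 0.2:           # mutation
                child = min(hi, max(lo, child + random.randint(-50, 50)))
            children.append(child)
        pop = parents + children
    return min(pop, key=branch_distance)

print(evolve())   # typically converges to 3571
```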
Web application providers have been migrating their applications to cloud data centers, attracted by the emerging cloud computing paradigm. One of the appealing features of the cloud is elasticity: it allows cloud users to acquire or release computing resources on demand, which enables web application providers to auto-scale the resources provisioned to their applications under dynamic workloads in order to minimize resource cost while satisfying Quality of Service (QoS) requirements. In this paper, we comprehensively analyze the challenges that remain in auto-scaling web applications in clouds and review the developments in this field. We present a taxonomy of auto-scaling systems according to the identified challenges and key properties. We analyze the surveyed works and map them to the taxonomy to identify the weaknesses in this field. Moreover, based on this analysis, we propose new directions for future work.
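The baseline that most surveyed auto-scalers refine is a simple threshold rule: scale out when utilisation crosses an upper bound, scale in when it drops below a lower one. A minimal sketch, with illustrative thresholds and workload trace:

```python
# Hedged sketch of rule-based threshold auto-scaling: scale out when
# utilisation exceeds an upper threshold, scale in below a lower one.
# Thresholds, limits, and the workload trace are all illustrative.
def autoscale(cpu_trace, n=2, upper=0.7, lower=0.3, n_min=1, n_max=10):
    history = []
    for cpu_total in cpu_trace:              # aggregate demand per interval
        utilisation = cpu_total / n
        if utilisation > upper and n < n_max:
            n += 1                           # acquire one more instance
        elif utilisation < lower and n > n_min:
            n -= 1                           # release an instance
        history.append(n)
    return history

demand = [0.8, 1.2, 2.0, 2.6, 2.2, 1.0, 0.5, 0.4]   # summed CPU demand
print(autoscale(demand))                     # instances grow, then shrink
```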
Context: Software development process measurement is essential to reach predictable performance and high-capability processes. Software process measurement provides support for better understanding, evaluation, management, and control of the development process and project, as well as of the resulting product. Measurement enables organizations to recognize, improve, and predict their processes' quality and performance, which places organizations in a better position to make appropriate and informed decisions as early as possible during the development process. Objective: This study aims to understand the measurement of the software development process, to identify relevant studies, to create a classification scheme based on the identified studies, and then to map such studies into the scheme so as to answer the research questions. Method: Systematic mapping is the selected research methodology for this project. Results: A total of 419 studies are included and classified into four groups with respect to their focus and into three groups based on publishing date. Conclusion: Project effort and productivity are the attributes that have been measured most frequently, followed by process maturity. GQM and CMMI are the main methods used in the studies, whereas Agile and Lean development and Small and Medium-Sized Enterprises are the most frequently identified research contexts.
The recent diversity of storage demands has revealed various shortcomings of traditional RDBMSs, which in turn has led to the emergence of a new trend of complementary non-relational data management solutions known as NoSQL (Not only SQL). This survey mainly aims at presenting the work that has been conducted with regard to four closely related concepts of NoSQL stores: the data model, the consistency model, data partitioning, and replication. For each concept, its different protocols are explained, and for each protocol, its corresponding features, strengths, and drawbacks. Furthermore, various implementations of each protocol are exemplified and crystallized through a collection of representative academic and industrial NoSQL technologies. The rationale behind each design decision, along with some corresponding extensions and improvements, is discussed. Finally, we disclose some existing challenges in developing effective NoSQL stores, which need attention from the research community, application designers, and architects.
Network-enabled sensing and actuation devices are key enablers for connecting real-world objects to the cyber world. The Internet of Things (IoT) uses these network-enabled devices and communication technologies to allow connectivity and integration of physical objects (Things) from the real world into the data-driven digital world (the Internet). Enormous amounts of dynamic IoT data are collected from Internet-connected devices. IoT data, however, often consists of multi-variate streams that are heterogeneous, sporadic, multi-modal, and spatio-temporal. IoT data can be disseminated at different granularities and can have diverse structures, types, and qualities. Dealing with the data deluge from heterogeneous IoT resources and services imposes challenges on the indexing, discovery, and ranking mechanisms needed to build applications that require on-line access and retrieval of IoT data. However, the existing IoT data indexing and discovery approaches are either complex (usually based on formal and logical methods) or centralised, which hinders their scalability. The primary objective of this paper is to provide a holistic overview of the state of the art in indexing, discovering, and ranking IoT data, including on-line analysis and fast responses to complex queries. The paper aims to pave the way for researchers to design, develop, implement, and evaluate techniques and approaches for future on-line large-scale distributed IoT applications and platforms.
While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Thus, efficient computational methods for condensing and simplifying data are becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind and the challenges of graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.
Contemporary mobile devices are the result of an evolutionary process in which computational and networking capabilities have been continuously pushed to keep pace with constantly growing workload requirements. This has allowed devices such as smartphones and tablets to perform increasingly complex tasks, up to the point of efficiently replacing traditional options such as desktop computers and notebooks. However, mainly due to their portability and size, these devices are more prone to theft, compromise, and exploitation for attacks and other malicious activity. The need to investigate such incidents resulted in the creation of the Mobile Forensics (MF) discipline. MF, a sub-domain of Digital Forensics (DF), specializes in extracting and processing evidence from mobile devices in such a way that attacking entities and actions are identified and traced. Beyond its primary research interest in accurate evidence acquisition from mobile devices, MF has recently expanded its scope to encompass organized and advanced evidence representation and the analysis of entities' behavior. The current paper presents the research conducted within the MF ecosystem during the last six years. Moreover, it identifies the gaps and highlights the differences from past research directions. Lastly, it addresses challenges and open issues in the field.
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing CNN ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, an evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.
Computational creativity seeks to understand computational mechanisms that can be characterized as creative. Creation of new concepts is a central challenge for any creative system. In this paper, we outline different approaches to concept creation and then review conceptual representations relevant to concept creation. The conceptual representations are organized in accordance with two important perspectives on the distinctions between them. One distinction is between symbolic, spatial and connectionist representations. The other is between descriptive and procedural representations. These two distinctions are orthogonal. Additionally, conceptual representations used in particular creative domains, i.e. language, music, image and emotion, are reviewed separately. For each representation reviewed, we cover the inference it affords, the computational means of building it, and its application in concept creation.
Over the past decades, researchers have proposed different intrusion detection approaches to deal with the increasing number and complexity of threats to computer systems. In this context, Random Forest models have provided notable performance in their applications within the realm of behaviour-based Intrusion Detection Systems. Specificities of the Random Forest model are used to provide classification, feature selection, and proximity metrics. This work provides a comprehensive review of the general basic concepts related to Intrusion Detection Systems, including taxonomies, attacks, data collection, modelling, evaluation metrics, and commonly used methods. It also provides a survey of Random Forest based methods applied in this context, considering the particularities involved in these models. Finally, some open questions and challenges are posed, together with possible directions to deal with them, which may guide future work in the area.
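A hedged sketch of two of the Random Forest uses mentioned above, classification and embedded feature ranking, on synthetic data standing in for connection features (assumes scikit-learn; the feature names are illustrative):

```python
# Hedged sketch: (i) behaviour classification (normal vs. attack) and
# (ii) embedded feature ranking via Random Forest, on synthetic data
# standing in for network-connection features. Assumes scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# 1000 "connections", 4 features; only the first two carry signal here.
X = rng.normal(size=(1000, 4))
y = ((X[:, 0] + X[:, 1]) > 1.0).astype(int)     # 1 = "attack", 0 = "normal"

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip(["duration", "bytes", "flag", "port"],
                     clf.feature_importances_):
    print(f"{name:8s} importance: {imp:.2f}")   # first two should dominate
```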
This survey presents multidimensional scaling (MDS) methods and their real-world applications. MDS is an increasingly popular exploratory, multivariate data analysis technique that represents higher-dimensional data in a lower-dimensional space. The input for an MDS analysis is the measured dissimilarity or similarity of the objects under observation. Applying MDS to these measurements results in a spatial map, in which dissimilar objects are far apart while similar objects are placed close to each other. In this survey paper, MDS is described in fairly comprehensive fashion by explaining the basic notions of classical MDS and how MDS can help to analyze multidimensional data. Various specialized MDS-based models are then described in a more mathematical way.
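The workflow just described can be made concrete in a few lines: feed a measured dissimilarity matrix to an MDS implementation and read off the spatial map. A minimal sketch, assuming scikit-learn and an illustrative four-object dissimilarity matrix:

```python
# Hedged sketch of the MDS workflow: feed a measured dissimilarity matrix to
# (metric) MDS and obtain a 2-D spatial map in which similar objects end up
# close together. Assumes scikit-learn; the objects and values are illustrative.
import numpy as np
from sklearn.manifold import MDS

objects = ["A", "B", "C", "D"]
# Symmetric dissimilarities: A and B are similar, D is far from everything.
D = np.array([[0.0, 1.0, 4.0, 9.0],
              [1.0, 0.0, 3.0, 8.0],
              [4.0, 3.0, 0.0, 7.0],
              [9.0, 8.0, 7.0, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)            # the 2-D "spatial map"
for name, (x, y) in zip(objects, coords):
    print(f"{name}: ({x:6.2f}, {y:6.2f})")
print(f"stress: {mds.stress_:.3f}")      # lower stress = better fit
```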
Automatic machine-based Facial Expression Analysis (FEA) has witnessed substantial progress in the past few decades, motivated by its importance in psychology, security, health, entertainment, and human-computer interaction. However, the vast majority of current studies are based on non-occluded faces collected in controlled laboratory environments, and automatic expression recognition from partially occluded faces remains a largely unresolved problem, particularly in real-world scenarios. In recent years, increasing efforts have been directed at investigating techniques to handle partial occlusion in FEA. This survey provides a comprehensive review of recent advances in dataset creation, algorithm development, and investigations of the effects of occlusion, which are crucial for system design and evaluation. It also outlines existing challenges in overcoming partial occlusion and discusses possible opportunities for advancing the technology. To the best of our knowledge, this is the first FEA survey dedicated to occlusion, and it is intended to serve as a starting point for promoting future work.
The goal of privacy metrics is to measure the degree of privacy enjoyed by users in a system and the amount of protection offered by privacy-enhancing technologies. In this way, privacy metrics contribute to improving user privacy in the digital world. The diversity and complexity of privacy metrics in the literature make an informed choice of metrics challenging. As a result, redundant new metrics are proposed frequently, and privacy studies are often incomparable. In this survey we alleviate these problems by structuring the landscape of privacy metrics. To this end, we explain and discuss a selection of over eighty privacy metrics and introduce a categorization based on the aspect of privacy they measure, their required inputs, and the type of data that needs protection. In addition, we present a method for choosing privacy metrics based on eight questions that help identify the right privacy metrics for a given scenario, and we highlight topics where additional work on privacy metrics is needed. Our survey spans multiple privacy domains and can be understood as a general framework for privacy measurement.
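As one concrete instance from this landscape, the entropy-based "degree of anonymity" (in the style of Serjantov and Danezis, and Diaz et al.) scores the adversary's probability distribution over candidate users; a minimal sketch with illustrative distributions:

```python
# One concrete example from the anonymity family of privacy metrics: the
# entropy-based "degree of anonymity" computed from the adversary's
# probability distribution over candidate users. Distributions illustrative.
import math

def degree_of_anonymity(probs):
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    max_entropy = math.log2(len(probs))     # uniform over the anonymity set
    return entropy / max_entropy            # 1 = perfect, 0 = fully identified

print(degree_of_anonymity([0.25, 0.25, 0.25, 0.25]))  # 1.0
print(degree_of_anonymity([0.7, 0.1, 0.1, 0.1]))      # ~0.68
```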
This article presents a comprehensive survey on parallel I/O, an important field for High-Performance Computing because of the historic gap between processing power and storage latencies, which causes application performance to be impaired when accessing or generating large amounts of data. As the available processing power and the amount of data increase, I/O remains a central issue for the scientific community. In this survey, we present background concepts from which everyone could benefit. Moreover, through a comprehensive study of publications from the most important conferences and journals in a five-year time window, we discuss the state of the art in I/O optimization approaches, access pattern extraction techniques, and performance modeling, in addition to general aspects of parallel I/O research. Through this approach, we aim at identifying the general characteristics of the field and the main current and future research topics.
Activities of clinical staff in healthcare environments must regularly be adapted to new treatment methods, medications, and technologies. This constant evolution requires the monitoring of the workflow, i.e., the sequence of actions of the actors involved in a procedure, to ensure the quality of medical services. In this context, recent advances in sensing technologies, including Real-Time Location Systems (RTLS) and Computer Vision, enable high-precision tracking of actors and equipment. The current state of the art in healthcare workflow monitoring typically focuses on a single technology and does not discuss its integration with others, although such integration can lead to better solutions for evaluating medical workflows. This study aims to fill the gap in the analysis of monitoring technologies with a systematic literature review about sensors for capturing the workflow of healthcare environments. Its main scientific contribution is to identify both the current technologies used to track activities in clinical environments and the gaps in their combination to achieve better results. We also propose a taxonomy to classify work regarding sensing technologies and methods. Our review did not identify proposals that combine data obtained from RTLS and Computer Vision sensors. We conclude that a multimodal analysis is more flexible and could yield better results.
While cloud computing has brought paradigm shifts to computing services, researchers and developers have also found some problems inherent to its nature, such as the bandwidth bottleneck, communication overhead, and location blindness. The concept of fog/edge computing was therefore coined to extend services from the core in cloud data centers to the edge of the network. In recent years, many systems have been proposed to better serve ubiquitous smart devices closer to the user. This paper provides a complete and up-to-date review of edge-oriented computing systems by encapsulating relevant proposals regarding their architectural features, management approaches, and design objectives.
Despite the increasing use of social media for information and news gathering, its unmoderated nature often leads to the emergence and spread of rumours, i.e. unverified pieces of information. At the same time, the openness of social media provides opportunities to study how users share and discuss rumours, and to explore how natural language processing and data mining techniques may be used to find ways of determining their veracity. In this survey we introduce and discuss two types of rumours that circulate on social media; long-standing rumours that circulate for long periods of time, and newly-emerging rumours spawned during fast-paced events such as breaking news, where unverified reports are often released piecemeal. We provide an overview of research into social media rumours with the ultimate goal of developing a rumour classification system that consists of four components: rumour detection, rumour tracking, rumour stance classification and rumour veracity classification. We delve into the approaches presented in the scientific literature for the development of each of these components. We summarise the efforts and achievements so far towards the development of rumour classification systems and conclude with suggestions for avenues for future research in social media mining for detection and resolution of rumours.
Activity recognition aims to provide accurate and opportune information on people's activities by leveraging the sensory data available in today's sensor-rich environments. Activity recognition has become an emerging field in the areas of pervasive and ubiquitous computing. A typical activity recognition technique processes data streams that flow from sensing platforms such as mobile sensors, on-body sensors, and/or ambient sensors. This paper surveys the two overlapping research areas of activity recognition and data stream mining. The perspective of this paper is to review the adaptation capabilities of activity recognition techniques in streaming environments. Broad categories of techniques are identified based on the different features of both data streams and activity recognition. The pros and cons of the algorithms in each category are analysed, and possible directions for future research are indicated.
The size of Linked Data is growing fast, so a Linked Data management system must be able to deal with increasing amounts of data. Even though physically handling Linked Data using a relational table is possible, querying a giant triple table becomes very costly due to the multiple nested joins required by typical queries. In addition, the heterogeneity of Linked Data poses entirely new challenges to database systems. This article provides a comprehensive study of the state of the art in storing and querying RDF data. In particular, we focus on data storage techniques, indexing strategies, and query execution mechanisms. In addition, we provide a classification of existing systems and approaches. We also provide an overview of the various benchmarking efforts in this context and discuss some of the open problems in this domain.
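The cost of the giant triple table can be seen directly: every additional triple pattern in a query adds another self-join. A minimal sketch with sqlite3 and illustrative toy data:

```python
# Hedged sketch making the "multiple nested joins" point concrete: even a
# two-pattern query over a single triples table (subject, predicate, object)
# requires one self-join -- each extra triple pattern adds another. Uses
# sqlite3 with toy RDF-like data; all names are illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("alice", "knows",   "bob"),
    ("alice", "worksAt", "acme"),
    ("bob",   "worksAt", "acme"),
])

# SPARQL-style query: ?x knows ?y . ?y worksAt ?org  ==> one self-join.
rows = db.execute("""
    SELECT t1.s, t1.o, t2.o
    FROM triples t1 JOIN triples t2 ON t1.o = t2.s
    WHERE t1.p = 'knows' AND t2.p = 'worksAt'
""").fetchall()
print(rows)   # [('alice', 'bob', 'acme')]
```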
Networks built to model real-world phenomena are characterised by some properties that have attracted the attention of the scientific community: (i) they are organised according to community structure, and (ii) their structure evolves with time. Many researchers have worked on methods that can efficiently unveil substructures in complex networks, giving birth to the field of community discovery. A novel and challenging problem has recently started capturing researchers' interest: the identification of evolving communities. To model the evolution of a system, dynamic networks can be used: nodes and edges are mutable, and their presence, or absence, deeply impacts the community structure. The aim of this survey is to present the distinctive features and challenges of dynamic community discovery and to propose a classification of published approaches. As a user manual, this work organizes state-of-the-art methodologies into a taxonomy based on their rationale and their specific instantiation. Given a desired definition of network dynamics, community characteristics, and analytical needs, this survey will support researchers in identifying the set of approaches that best fit their needs. The proposed classification could also help researchers to choose in which direction future research should be oriented.
The Internet has undergone dramatic changes in the past 15 years, and now forms a global communication platform that billions of users rely on for their daily activities. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy, such as omnipotent governmental surveillance. As a result, public interest in systems for anonymous communication has drastically increased. In this work, we survey previous research on designing, developing, and deploying systems for anonymous communication. Our taxonomy and comparative assessment provide important insights about the differences between the existing classes of anonymous communication protocols.
Many networking research activities depend on the availability of network captures. Even outside academic research there is a need for sharing network captures, to cooperate on threat assessments or for debugging. However, most network captures cannot be shared due to privacy concerns. There have been many advances in the understanding of anonymisation and cryptographic methods, which have changed the perspective on the effectiveness of many anonymisation techniques. On the other hand, these advances, combined with the increase in computational power, may have also made it feasible to perform anonymisation in real time. This may make it easier to collect and distribute network captures, both for research and for other applications. This article surveys the literature over the period 1998 to 2015 on network traffic anonymisation techniques and implementations. The aim is to provide an overview of the current state of the art and to highlight how advances in related fields have shed new light on anonymisation and pseudonymisation methodologies. The few currently maintained implementations are also reviewed. Lastly, we identify future research directions to enable easier sharing of network traffic, which in turn can enable new insights in network traffic analysis.
Information security systems, which protect networks and computers against cyber attacks, are becoming common due to increasing threats and government regulation. At the same time, the enormous amount of data gathered by information security systems poses a serious threat to the privacy of the people protected by those systems. To substantiate this threat, we survey common and novel information security technologies and analyze their potential for privacy invasion. We suggest a taxonomy for assessing the privacy risks of information security technologies, based on the level of data exposure, the level of identification of individual users, the sensitivity of the data, and the user's control over the monitoring, collection, and analysis of the data. We discuss our results in light of recent trends in information security and suggest several new directions for making these mechanisms more privacy-aware.
Network management and maintenance are time-consuming and often challenging tasks. With the emergent Software-Defined Networking paradigm, most of the focus has been directed to the evolution of control protocols and platforms, or to deployment problems. Although researchers and network operators consider network management a primary requirement, its development in SDN has apparently been set aside. This paper reports on the SDN architecture, introduces the concept of SDN tools, and surveys the state of the art in different aspects of network management with an emphasis on SDN. Because the SDN ecosystem lacks a standardized management framework, initiatives are diverse and scattered.
Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience.
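The essence of the technique can be conveyed in a few lines: represent the input as a symbol, encode the path condition of the branch of interest, and ask a solver for a satisfying concrete value. A minimal sketch, assuming the third-party z3-solver package and an illustrative guarded program:

```python
# Minimal flavour of symbolic execution: treat the input as a symbol and ask
# a constraint solver for a concrete value that drives execution down a
# hard-to-hit "backdoor" branch. Assumes the z3-solver package; the guarded
# program below is illustrative.
from z3 import Int, Solver, sat

# Program under test (random testing is unlikely to ever hit the branch):
#     def check(x):
#         if x * 3 + 7 == 1_000_000:
#             open_backdoor()
x = Int("x")                              # symbolic input, not a concrete one
s = Solver()
s.add(x * 3 + 7 == 1_000_000)             # path condition of the branch

if s.check() == sat:
    print("backdoor reachable with x =", s.model()[x])   # x = 333331
else:
    print("branch proven unreachable on this path")
```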
Modern cloud environments support a relatively high degree of automation in service provisioning, which allows cloud users to dynamically acquire the services required for deploying cloud applications. Cloud modeling languages (CMLs) have been proposed to address the diversity of features provided by today's cloud environments and to support different application scenarios, e.g., migrating existing applications to the cloud, developing new cloud applications, or optimizing them. There is, however, still much debate on what a CML is and which aspects of a cloud application and the target cloud environment should be modeled by a CML. Furthermore, the distinction between CMLs at a fine-grained level, exposing their modeling concepts, is rarely made. In this article, we investigate the diverse features currently provided by existing CMLs. We classify and compare them according to a common framework, with the goal of supporting cloud users in selecting the CML that fits the needs of their application scenario and setting. As a result, we point out not only the features for which existing CMLs already provide extensive support, but also those in which existing CMLs are deficient, thereby suggesting a research agenda for the future.
Intrusion alert analysis is an attractive and active topic in the area of intrusion detection and prevention systems (IDPS). In recent decades, many research communities have been working in this field, and a large volume of research has been published, from which various research areas have emerged. However, there has been no systematic and up-to-date review of the research within the field. The main objective of this paper is to derive a taxonomy of research fields in intrusion alert analysis and to present a reference guide for researchers who want to enter this area. To this end, a systematic mapping study (SMS) of 433 high-quality research works has been conducted. Using keyword clustering, ten different research topics are identified in the field of intrusion alert analysis, which can be classified into three broad groups: pre-processing, processing, and post-processing. A brief description is provided of these groups and their related topics, and some useful analyses are presented based on the data extracted from the research works. The results show that the processing group contains most of the research works and has recently moved toward heterogeneous correlation. Also, the post-processing group is newer than the others and has recently drawn the attention of research communities and security administrators.