In the big data era, much real-world data can be naturally represented as graphs, and many application domains can be modelled as graph processing. Graph processing, especially of large-scale graphs with billions or even hundreds of billions of vertices and edges, has attracted much attention in both industry and academia. Processing graphs at such scale remains a great challenge, and researchers continue to seek new solutions. Because of the massive parallelism and high memory access bandwidth of GPUs, utilizing GPUs to accelerate graph processing has proven to be a promising approach. This paper surveys the key issues of graph processing on GPUs, including data layout, memory access patterns, workload mapping and GPU-specific programming. In this paper, we summarize the state-of-the-art research on GPU-based graph processing, analyze the existing challenges in detail, and explore future research opportunities.
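As a concrete illustration of the data-layout issue, many GPU graph frameworks store graphs in compressed sparse row (CSR) form, which packs each vertex's neighbours contiguously so that GPU threads can read them with coalesced memory accesses. A minimal host-side sketch (the function name and representation are illustrative, not from any particular surveyed system):

```python
def to_csr(num_vertices, edges):
    """Pack a directed edge list into compressed sparse row (CSR) form:
    the neighbours of vertex v occupy col_idx[row_ptr[v]:row_ptr[v + 1]]."""
    # Count the out-degree of each vertex.
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    # Prefix-sum the degrees to get row offsets.
    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + degree[v]
    # Scatter destinations into the column-index array.
    col_idx = [0] * len(edges)
    fill = row_ptr[:-1]  # next free slot per vertex (the slice makes a copy)
    for src, dst in sorted(edges):  # sort for a deterministic neighbour order
        col_idx[fill[src]] = dst
        fill[src] += 1
    return row_ptr, col_idx
```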
We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected to other nodes by directed, labelled edges; and property graphs, where nodes and edges can have attributes. Next we discuss the two most basic graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed, and how such expressions can be combined with graph patterns. We also discuss a variety of semantics under which queries using the previous features can be evaluated, what effects the introduction of additional features and the selection of semantics have on complexity, as well as offering examples of said features in three modern languages that can be used to query graphs: SPARQL, Cypher and Gremlin. We conclude with a discussion of the importance of formalisation for graph query languages, as well as possible future directions in which such languages can be extended.
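To make the graph-pattern idea concrete, the toy sketch below evaluates a pattern with variable nodes against an edge-labelled graph by brute force, under homomorphism-based semantics. The representation and names are our own illustration, not taken from SPARQL, Cypher or Gremlin:

```python
from itertools import product

def match_pattern(graph_edges, pattern_edges):
    """Return all variable bindings under which every pattern edge
    (source, label, target) appears in the graph; pattern nodes whose
    names start with '?' are variables, all others are constants."""
    nodes = {n for s, _, t in graph_edges for n in (s, t)}
    variables = sorted({n for s, _, t in pattern_edges
                        for n in (s, t) if n.startswith("?")})
    edge_set = set(graph_edges)
    results = []
    # Brute force: try every assignment of graph nodes to variables.
    for assignment in product(sorted(nodes), repeat=len(variables)):
        mapping = dict(zip(variables, assignment))

        def resolve(n):
            return mapping.get(n, n)  # constants resolve to themselves

        if all((resolve(s), lbl, resolve(t)) in edge_set
               for s, lbl, t in pattern_edges):
            results.append(mapping)
    return results
```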
Automated Vehicle Classification (AVC) based on vision sensors has received active attention from researchers, due to heightened security concerns in Intelligent Transportation Systems. In this work, we propose a categorization of AVC studies based on the granularity of classification, namely Vehicle Type Recognition (VTR), Vehicle Make Recognition (VMR) and Vehicle Make and Model Recognition (VMMR). For each category of AVC systems, we present a comprehensive review and comparison of feature extraction, global representation, and classification techniques. The various datasets proposed over the years for AVC are also compared in light of the real-world challenges they represent, and those they do not. The major challenges involved in each category of AVC systems are presented, highlighting open problems in this area of research. Finally, we conclude by providing future directions of research in this area, paving the way towards efficient large-scale AVC systems. This survey will help researchers interested in the area to analyze works completed so far in each category of AVC, focusing on techniques proposed for each module, and to chalk out strategies to enhance state-of-the-art technology.
Stylometry, or the analysis of authorial writing style, relies on the assumption that this style is quantifiable and distinct. However, deriving a universal style representation has plagued researchers for nearly 200 years, resulting in several methods and tools to address various challenges, such as use of limited training samples for accurate author recognition. Research has since concentrated on the fine tuning of these techniques and the role of stylometry in preserving and/or exposing privacy and anonymity. This survey covers these methods with emphasis on stylometry-related sub-problems. Additionally, while previous surveys neglect to include adversarial stylometric techniques, methods specifically designed to counter authorship detection are discussed here. Many experimental models and databases are described, together with a discussion of the research approaches that employ each. Finally, several research challenges are presented, along with descriptions of various open-source and commercial software.
Making cities smarter helps improve city services and increase citizens' quality of life. Information and communication technologies (ICT) are fundamental for progressing towards smarter city environments. Smart City software platforms potentially support the development and integration of Smart City applications. However, the ICT community must overcome current significant technological and scientific challenges before these platforms can be widely used. This paper surveys the state-of-the-art in software platforms for Smart Cities. We analyzed 23 projects with respect to the most used enabling technologies, as well as functional and non-functional requirements, classifying them into four categories: Cyber-Physical Systems, Internet of Things, Big Data, and Cloud Computing. Based on these results, we derived a reference architecture to guide the development of next-generation software platforms for Smart Cities. Finally, we enumerated the most frequently cited open research challenges, and discussed future opportunities. This survey gives important references for helping application developers, city managers, system operators, end-users, and Smart City researchers to make project, investment, and research decisions.
The main achievements of spatio-temporal modelling in the field of Geographic Information Science over the past three decades are surveyed. This article offers an overview of: (i) the origins and history of Temporal Geographic Information Systems (T-GIS); (ii) relevant spatio-temporal data models proposed; (iii) the evolution of spatio-temporal modelling trends; and (iv) an analysis of the future trends and developments in T-GIS. It also presents some current theories and concepts that have emerged from the research performed, as well as a summary of the current progress and the upcoming challenges and potential research directions for T-GIS. One relevant result of this survey is the proposed taxonomy of spatio-temporal modelling trends, which classifies 186 modelling proposals surveyed from more than 1400 articles.
Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since it was first proposed, canonical correlation analysis has been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations have been considered to be non-linear, and when the dimensionality is too large for human interpretation. Applying linear algebra, this tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a hands-on tool for applying canonical correlation methods in data analysis.
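As a toy illustration of the CCA objective, the sketch below finds the first canonical correlation between a two-dimensional X and a one-dimensional Y by grid-searching over projection directions and maximizing the correlation of the projections. Real solvers use eigendecompositions of covariance matrices rather than a grid search; all names here are illustrative:

```python
import math

def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def first_canonical_correlation(X, y, steps=3600):
    """Grid-search the direction w(t) = (cos t, sin t) maximizing
    |corr(X w, y)|: the CCA objective for 2-D X and 1-D y."""
    best = 0.0
    for k in range(steps):
        # The half-plane suffices: w and -w give the same |correlation|.
        t = math.pi * k / steps
        proj = [x1 * math.cos(t) + x2 * math.sin(t) for x1, x2 in X]
        best = max(best, abs(pearson(proj, y)))
    return best
```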
Cyber risk management largely reduces to a race for information between defenders and attackers. Defenders can gain advantage in this race by sharing cyber risk information with each other. Yet, defenders often share less than is socially desirable, because sharing decisions are guided by selfish rather than altruistic reasons. A growing line of research studies these strategic aspects that drive defenders' sharing decisions. The present survey systematizes these works in a novel framework. It provides a consolidated understanding of defenders' strategies to privately or publicly share information, and enables us to distill trends in the literature and identify future research directions. The review also reveals that many theoretical works assume cyber risk information sharing to be beneficial, while corresponding empirical validations are missing.
The continuously increasing cost of the US healthcare system has received significant attention. Central to the ideas aimed at curbing this trend is the use of technology, in the form of the mandate to implement electronic health records (EHRs). EHRs consist of patient information such as demographics, medications, laboratory test results, diagnosis codes and procedures. Mining EHRs could lead to improvement in patient health management as EHRs contain detailed information related to disease prognosis for large patient populations. In this manuscript, we provide a structured and comprehensive overview of data mining techniques for modeling EHR data. We first provide a detailed understanding of the major application areas to which EHR mining has been applied and then discuss the nature of EHR data and its accompanying challenges. Next, we describe major approaches used for EHR mining, the metrics associated with EHRs, and the various study designs. With this foundation, we then provide a systematic and methodological organization of existing data mining techniques used to model EHRs and discuss ideas for future research.
Wearable computing is rapidly getting deployed in many commercial, medical and personal domains of day-to-day life. Wearable devices appear in various forms, shapes and sizes and facilitate a wide variety of applications in many domains of life. However, wearables raise unique security and privacy concerns. Wearables also hold the promise to help enhance the existing security, privacy and safety paradigms in unique ways while preserving systems usability. The contribution of this research literature survey is three-fold. First, as a background, we identify a wide range of existing as well as upcoming wearable devices and investigate their broad applications. Second, we provide an exposition of the security and privacy of wearable computing, studying dual aspects, i.e., both attacks and defenses. Third, we provide a comprehensive study of the potential security, privacy and safety enhancements to existing systems based on the emergence of wearable technology. Although several research works have emerged exploring different offensive and defensive uses of wearables, there is a lack of a broad and precise literature review systematizing all those security and privacy aspects and the underlying threat models. This research survey also analyzes current and emerging research trends, and provides directions for future research.
It is unlikely that a hacker can compromise sensitive data that is stored in encrypted form. However, when data is to be processed, it has to be decrypted, becoming vulnerable to attacks. Homomorphic encryption fixes this vulnerability by allowing one to compute directly on encrypted data. In this survey, both previous and current Somewhat Homomorphic Encryption (SHE) schemes are reviewed, and the more powerful and recent Fully Homomorphic Encryption (FHE) schemes are comprehensively studied. The concepts that support these schemes are presented, and their performance and security are analyzed from an engineering standpoint.
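A classic toy illustration of computing on encrypted data is the multiplicative homomorphism of textbook RSA: the product of two ciphertexts decrypts to the product of the plaintexts. The sketch below uses deliberately tiny, insecure parameters, and is only an illustration of the homomorphic property, not an SHE or FHE scheme:

```python
def rsa_keygen():
    """Textbook RSA with tiny, insecure demo parameters."""
    p, q = 61, 53
    n, phi = p * q, (p - 1) * (q - 1)
    e = 17
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def enc(pk, m):
    n, e = pk
    return pow(m, e, n)

def dec(sk, c):
    n, d = sk
    return pow(c, d, n)
```

Multiplying ciphertexts multiplies the underlying plaintexts, because m1^e * m2^e = (m1 * m2)^e mod n; this is the "compute on encrypted data" idea in its simplest form.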
The Experience Sampling Method (ESM) is used by scientists from various disciplines to gather insights into the intrapsychic elements of human life. Researchers have used the ESM in a wide variety of studies, with the method seeing increased popularity. Mobile technologies have enabled new possibilities for the use of the ESM, while simultaneously leading to new conceptual, methodological, and technological challenges. In this survey, we provide an overview of the history of the ESM, usage of this methodology in the computer science discipline, as well as its evolution over time. Next, we identify and discuss important considerations for ESM studies on mobile devices, and analyse the particular methodological parameters scientists should consider in their study design. We reflect on the existing tools that support the ESM methodology and discuss the future development of such tools. Finally, we discuss the effect of future technological developments on the use of the ESM and identify areas requiring further investigation.
The aim of this article is to provide an understanding of social networks as a useful addition to the standard tool-box of techniques used by system designers. To this end, we give examples of how data about social links have been collected and used in different application contexts. We develop a broad taxonomy-based overview of common properties of social networks, review how they might be used in different applications, and point out potential pitfalls where appropriate. We propose a framework distinguishing between two main types of social network-based user selection: personalised user selection, which identifies target users who may be relevant for a given source node, using the social network around the source as a context; and generic user selection, or group delimitation, which filters for a set of users who satisfy a set of application requirements based on their social properties. Using this framework, we survey applications of social networks in three typical kinds of application scenarios: recommender systems, content-sharing systems (e.g., P2P or video streaming), and systems which defend against users who abuse the system (e.g., spam or sybil attacks). In each case, we discuss potential directions for future research that involve using social network properties.
Firewalls are network security components that handle incoming and outgoing network traffic based on a set of rules. The process of correctly configuring a firewall is complicated and prone to error, and it worsens as the network complexity grows. A poorly configured firewall may result in major security threats; in the case of a network firewall, an organization's security could be endangered, and in the case of a personal firewall, an individual computer's security is threatened. A major reason for poorly configured firewalls, as pointed out in the literature, is usability issues. Our aim is to identify existing solutions that help professional and non-professional users to create and manage firewall configuration files, and to analyze the proposals with respect to usability. A systematic literature review with a focus on usability of firewall configuration is presented in the paper. Its main goal is to explore what has already been done in this field. In the primary selection procedure, 1,202 papers were retrieved and then screened. The secondary selection led us to 35 papers carefully chosen for further investigation, of which 14 papers were selected and summarized.
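At their core, the rule sets discussed here are typically evaluated first-match-wins: the first rule whose fields match the packet decides the action, with a default policy when nothing matches. A minimal sketch (the field names and the default-deny policy are illustrative assumptions, not from any particular firewall):

```python
def evaluate(rules, packet):
    """First-match firewall evaluation. Each rule is a (pattern, action)
    pair, where pattern maps field names to required values and '*'
    matches any value; falls through to a default-deny."""
    for pattern, action in rules:
        if all(v == "*" or packet.get(k) == v for k, v in pattern.items()):
            return action
    return "deny"
```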
Storage as a Service (StaaS) forms a critical component of cloud computing by offering the vision of a virtually infinite pool of storage resources. It supports a variety of cloud-based data store classes in terms of availability, scalability, ACID (Atomicity, Consistency, Isolation, Durability) properties, data models, and price options. Despite many open challenges within cloud-based data stores, application providers deploy geo-replicated data stores in order to obtain higher availability, lower response time, and more cost efficiency. The deployment of geo-replicated data stores is in its infancy and poses vital challenges for researchers. In this paper, we first discuss the key advantages and challenges of data-intensive applications deployed within and across cloud-based data stores. Then, we provide a comprehensive taxonomy that covers key aspects of cloud-based data stores: data model, data dispersion, data consistency, data transaction service, and data cost optimization. Finally, we map various cloud-based data store projects to our proposed taxonomy not only to validate the taxonomy but also to identify areas for future research.
Optical on-chip data transmission enabled by silicon photonics is widely considered a key technology to overcome the bandwidth and energy limitations of electrical interconnects. The possibility of utilizing optical links in the on-chip communication fabric has paved the way to a fascinating new research field - Optical Networks-on-Chip (ONoCs) - which has been gaining large interest in the community. Nanophotonic devices and materials, however, are still evolving, and dealing with optical data transmission on chip makes designers and researchers face a whole new set of obstacles and challenges. Designing efficient ONoCs is a challenging task and requires detailed knowledge, from on-chip traffic demands and patterns down to the physical layout and the implications of integrating both electronic and photonic devices. In this paper, we provide an exhaustive review of recent ONoC proposals, discuss their strengths and weaknesses, and outline outstanding research questions. Moreover, we discuss recent research efforts in key enabling technologies, such as on-chip and adaptive laser sources, automatic synthesis tools, and ring heating techniques, which are essential to enable a widespread commercial adoption of ONoCs in the future.
The expressiveness of programming languages is limited by their paradigms, which focus on solving abstraction problems without considering the expressiveness of abstractions described in natural language. Consequently, authors have developed tools for natural-language software development. This paper reviews many works, covering both tools that use natural language to some degree and domain-specific languages whose expressiveness approaches that of natural languages. The goal of the paper is to present a review and to highlight the problems that have been solved and those left aside. It also points out that no model-based naturalistic language has yet been reported.
Nano-crossbar arrays have emerged as a promising and viable technology to improve computing performance of electronic circuits beyond the limits of current CMOS. Arrays offer both structural efficiency with reconfiguration and prospective capability of integration with different technologies. However, certain problems need to be addressed, and the most important one is the prevailing occurrence of faults. Considering fault rate projections as high as 20%, much higher than those of CMOS, it is fair to expect sophisticated fault tolerance methods. The focus of this survey paper is the assessment and evaluation of these methods and related algorithms applied in logic mapping and configuration processes. As a start, we concisely explain reconfigurable nano-crossbar arrays with their fault characteristics and models. Following that, we demonstrate configuration techniques of the arrays in the presence of permanent faults and elaborate on two main fault tolerance methodologies, namely defect-unaware and defect-aware approaches, with a short review of their advantages and disadvantages. Next, we overview fault tolerance approaches for transient faults. In the experimental results section, we give detailed results of the algorithms regarding their strengths and weaknesses with a comprehensive yield, success rate, and runtime analysis. As a conclusion, we overview the proposed algorithms with future directions and upcoming challenges.
Feature selection has been proven to be effective and efficient in preparing high-dimensional data for data mining and machine learning problems. The objectives include: building simpler and more comprehensible models, improving data mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented substantial challenges and opportunities for feature selection algorithms. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. In particular, we revisit feature selection research from a data perspective, and review representative feature selection algorithms for generic data, structured data, heterogeneous data and streaming data. Methodologically, to emphasize the differences and similarities of most existing feature selection algorithms for generic data, we generally categorize them into four groups: similarity-based, information-theoretical, sparse-learning-based and statistical methods. To facilitate and promote the research in this community, we also present an open-source feature selection repository that consists of most of the popular feature selection algorithms (http://featureselection.asu.edu/), and we use it as an example to show how to evaluate feature selection algorithms. Finally, we discuss some open problems and challenges that deserve more attention in future research.
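As a minimal example of the similarity-based family, the sketch below ranks features by the absolute Pearson correlation of each column with the target, a simple univariate filter (function names are illustrative, not from the repository mentioned above):

```python
def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def rank_features(X, y):
    """Rank feature columns of X (list of rows) by |Pearson correlation|
    with the target y: a minimal similarity-based filter method."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        scores.append((abs(pearson(col, y)), j))
    # Highest-scoring (most correlated) feature indices first.
    return [j for _, j in sorted(scores, reverse=True)]
```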
Complex Event Recognition applications exhibit various types of uncertainty, ranging from incomplete and erroneous data streams to imperfect complex event patterns. We review Complex Event Recognition techniques that handle, to some extent, uncertainty. We examine techniques based on automata, probabilistic graphical models and first-order logic, which are the most common ones, and approaches based on Petri Nets and Grammars, which are less frequently used. A number of limitations are identified with respect to the employed languages, their probabilistic models and their performance, as compared to the purely deterministic cases. Based on those limitations, we highlight promising directions for future work.
Automatic sarcasm detection is the task of predicting sarcasm in text. This is a crucial step in sentiment analysis, considering the prevalence and challenges of sarcasm in sentiment-bearing text. Beginning with an approach that used speech-based features, automatic sarcasm detection has witnessed great interest from the sentiment analysis community. This paper is the first known compilation of past work in automatic sarcasm detection. We observe three milestones in the research so far: semi-supervised pattern extraction to identify implicit sentiment, use of hashtag-based supervision, and incorporation of context beyond the target text. In this paper, we describe datasets, approaches, trends and issues in sarcasm detection. We also discuss representative performance values, shared tasks and pointers to future work, as given in prior works. In terms of resources to understand the state-of-the-art, the survey presents several useful illustrations - most prominently, a table that summarizes past papers along different dimensions such as features, annotation techniques, data forms, etc.
Authenticated encryption (AE) has long been a vital operation in cryptography due to its ability to provide confidentiality, integrity and authenticity at the same time. Its use has soared in parallel with the widespread use of the Internet and has led to several new schemes. There have already been studies investigating software performance of various schemes. However, the same is yet to be done for hardware. In this paper, we present a comprehensive survey of hardware performance of the most commonly used authenticated encryption schemes in the literature. These schemes include the encrypt-then-MAC combination, block cipher based AE modes, relatively new authenticated encryption ciphers and the recently introduced permutation-based AE scheme. For completeness, we implemented each scheme with various standardized block ciphers and/or hash algorithms, and their lightweight versions. In our evaluation, we targeted minimizing the time-area product while maximizing the throughput on ASIC platforms. The 45nm NANGATE Open Cell Library was used for synthesis. In the results, we present area, speed, time-area product, throughput, and power figures for both standard and lightweight versions of each scheme. Finally, we provide an unbiased discussion on the impact of the structure and complexity of each scheme on hardware implementation, together with recommendations on hardware-friendly authenticated encryption scheme design.
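The encrypt-then-MAC composition mentioned above can be sketched in a few lines: encrypt first, then MAC the ciphertext, and on receipt verify the MAC before decrypting. For self-containment the "cipher" here is a toy SHA-256-based keystream rather than a standardized block cipher, so this illustrates only the composition, not a vetted implementation:

```python
import hashlib
import hmac
import os

def _keystream(key, nonce, length):
    """Toy counter-mode keystream from SHA-256 (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce
                              + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_then_mac(enc_key, mac_key, plaintext):
    """Encrypt first, then MAC the nonce and ciphertext with HMAC-SHA256."""
    nonce = os.urandom(16)
    ciphertext = bytes(p ^ k for p, k in
                       zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag

def decrypt_and_verify(enc_key, mac_key, nonce, ciphertext, tag):
    """Verify the MAC in constant time before touching the ciphertext."""
    expect = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in
                 zip(ciphertext, _keystream(enc_key, nonce, len(ciphertext))))
```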
Crowd-centric research is receiving increasingly more attention as data sets on crowd behavior are becoming readily available. We have come to a point that many of the models on pedestrian analytics introduced in the last decade, which have mostly not been validated, can now be tested using real-world data sets. In this survey we concentrate exclusively on automatically gathering such data sets, which we refer to as sensing the behavior of pedestrians. We roughly distinguish two approaches: one that requires users to explicitly use local applications and wearables, and one that scans the presence of handheld devices such as smartphones. We come to the conclusion that despite the numerous reports in popular media, relatively few groups have been looking into practical solutions for sensing pedestrian behavior. Moreover, we find that much work is still needed, in particular when it comes to combining privacy, transparency, scalability, and ease of deployment. We report on over 90 relevant articles and discuss and compare in detail 30 reports on sensing pedestrian behavior.
Geomagnetism has recently attracted considerable attention for indoor localization due to its pervasiveness and its independence from extra infrastructure. Its location signature has been observed to be temporally stable and spatially discernible for localization purposes. This survey investigates the recent challenges and advances in geomagnetism-based indoor localization using smartphones. We first study smartphone-based geomagnetism measurements. We then review recent efforts in database construction and computation reduction, followed by state-of-the-art schemes in localizing the target. For each category, we identify practical deployment challenges and compare related studies. Finally, we summarize future directions and provide guidelines for new researchers in this field.
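A common baseline among the localization schemes surveyed is fingerprint matching: compare a live magnetometer reading against a pre-built database of location signatures and return the nearest entry. A minimal sketch (the database layout and the Euclidean metric are illustrative assumptions, not a specific surveyed scheme):

```python
def locate(fingerprint_db, measurement):
    """Nearest-neighbour fingerprint matching: return the location whose
    stored magnetic fingerprint is closest (Euclidean) to the live reading."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # min over dict keys, scored by distance to the stored fingerprint.
    return min(fingerprint_db, key=lambda loc: dist(fingerprint_db[loc],
                                                    measurement))
```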
This article presents an annotated bibliography on automatic software repair. Automatic software repair consists of automatically finding a solution to software bugs, without human intervention. The uniqueness of this article is that it spans the research communities that contribute to this body of knowledge: software engineering, dependability, operating systems, programming languages and security. Furthermore, it provides a novel and structured overview of the diversity of bug oracles and repair operators used in the literature.
The Smart Home concept integrates smart applications into daily human life. In recent years, Smart Homes have faced increasing security and management challenges due to the low capacity of small sensors, multiple connections to the Internet for efficient applications (use of big data and cloud computing), and the heterogeneity of home systems, which forces inexpert users to configure devices and micro-systems. This paper presents current security and management approaches in Smart Homes and shows the good practices imposed on the market for developing secure systems in houses. Finally, we propose future solutions for efficiently and securely managing Smart Homes.
Vehicular networks and their associated technologies enable an extremely varied plethora of applications and therefore attract increasing attention from a wide audience. However, vehicular networks also face many challenges that arise mainly due to their dynamic and complex environment. Fuzzy Logic, known for its ability to deal with complexity and imprecision and to model non-deterministic problems, is a very promising technology for use in such a dynamic and complex context. This paper presents the first comprehensive survey of research on Fuzzy Logic approaches in the context of vehicular networks, and provides fundamental information which enables readers to design their own Fuzzy Logic systems in this context. As such, the paper describes the Fuzzy Logic concepts with emphasis on their implementation in vehicular networks, includes a classification and thorough analysis of the Fuzzy Logic-based solutions in vehicular networks, and discusses how Fuzzy Logic could empower the key research directions in 5G-enabled vehicular networks, the next generation of vehicular communications.
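The basic building block of such Fuzzy Logic systems is the membership function, which maps a crisp input (e.g., vehicle density) to a degree of membership in a linguistic term such as "medium". A minimal sketch of the common triangular membership function (parameter names and the example term are illustrative):

```python
def triangular(a, b, c):
    """Triangular fuzzy membership function with feet at a and c and
    peak at b; returns a function mapping a crisp value to [0, 1]."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0          # outside the support: no membership
        if x <= b:
            return (x - a) / (b - a)  # rising edge
        return (c - x) / (c - b)      # falling edge
    return mu
```

A rule such as "IF density is medium THEN delay is high" would combine membership degrees like these through fuzzy inference and defuzzification.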
This survey covers research on the topic of mixed criticality systems that has been published since Vestal's seminal paper in 2007. It covers the period up to and including July 2015. The survey is organised along the lines of the major research areas within this topic. These include single processor analysis (including job-based, task-based, fixed priority and EDF scheduling, shared resources and static and synchronous scheduling), multiprocessor analysis, realistic models, formal treatments, and systems issues. The survey also explores the relationship between research into mixed criticality systems and other topics such as fault tolerant scheduling, hierarchical scheduling, cyber physical systems, and probabilistic hard real-time systems. An appendix lists funded projects in the area of mixed criticality.
The presence construct, most commonly defined as the sense of "being there", has driven research and development of virtual environments (VEs) for decades. Despite this, there is no widespread agreement on how to define or operationalize this construct. The literature contains many different definitions of presence, and many proposed measures for it. This article reviews many of the definitions, measures, and models of presence from the literature. We also discuss several related constructs, including immersion, agency, transportation, and reality judgment. Finally, we present a meta-analysis of existing models of presence informed by Slater's Place Illusion and Plausibility Illusion constructs.
To program parallel systems efficiently and easily, a wide range of programming models has appeared, each making different choices concerning synchronization and communication between parallel entities. Among them, the actor model is based on loosely coupled parallel entities that communicate through asynchronous messages, using mailboxes. Some actor languages provide strong integration with object-oriented concepts; they are often called active object languages. This paper reviews four major actor and active object languages and compares them according to well-chosen dimensions that cover the programming paradigms and their implementation.
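The mailbox-based actor model described above can be sketched in a few lines: each actor owns a queue drained by a private thread, and send() is asynchronous. This is a minimal illustration of the model, not the runtime of any of the surveyed languages:

```python
import queue
import threading

class Actor:
    """Minimal actor: a mailbox drained by a private thread, processing
    asynchronous messages one at a time, in arrival order."""

    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous: enqueue the message and return immediately."""
        self.mailbox.put(message)

    def stop(self):
        """Send a poison pill and wait for the mailbox to drain."""
        self.mailbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:
                break
            self.handler(message)
```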
The task of quantification consists of providing an aggregate estimation (e.g. the class distribution in a classification problem) for unseen test sets, applying a model that is trained using a training set with a different data distribution. Several real-world applications demand this kind of method, which does not require predictions for individual examples and just focuses on obtaining accurate estimates at an aggregate level. During the past few years, several quantification methods have been proposed from different perspectives and with different goals. This paper presents a unified review of the main approaches with the aim of serving as an introductory tutorial for newcomers in the field.
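Two standard baselines make the task concrete: "classify and count" simply reports the fraction of positive predictions, while "adjusted count" corrects that raw estimate using the classifier's true and false positive rates measured on validation data. A minimal sketch (function names are illustrative):

```python
def classify_and_count(predictions):
    """Naive quantification: class prevalence estimated as the fraction
    of positive predictions (the 'classify and count' baseline)."""
    return sum(predictions) / len(predictions)

def adjusted_count(predictions, tpr, fpr):
    """Adjusted classify-and-count: correct the raw estimate with the
    classifier's true/false positive rates from validation data."""
    raw = classify_and_count(predictions)
    # Invert  raw = tpr * p + fpr * (1 - p)  for the true prevalence p.
    p = (raw - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid proportion
```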
Recently, multimedia researchers have added several so-called new media to the traditional multimedia components (e.g., olfaction, haptics and gustation). The inclusion of such stimuli in addition to traditional media components is typically labeled as multiple sensorial media or mulsemedia. Capturing multimedia user perceived Quality of Experience (QoE) is already non-trivial, and the addition of multiple sensorial media components increases this challenge. No standardized methodology exists to conduct subjective quality assessments of multiple sensorial media applications. To date, researchers have employed different aspects of audiovisual standards to assess user QoE of multiple sensorial media applications and thus, a fragmented approach exists. In this paper, the authors highlight issues researchers face from numerous perspectives, including the applicability (or lack thereof) of existing audio-visual standards to evaluate user QoE, the lack of result comparability due to varying approaches, the specific requirements of olfactory-based multiple sensorial media applications, and the novelty associated with these applications. Finally, based on the diverse approaches in the literature and the collective experience of the authors, this paper provides a tutorial and recommendations on the key steps to conduct olfactory-based multiple sensorial media QoE evaluation.
Modeling pedestrian dynamics and their implementation in a computer are challenging and important issues in the knowledge areas of transportation and computer simulation. The aim of this paper is to provide a bibliographic outlook so that the reader can quickly access the most relevant works related to this problem. We have used three main axes to organise the paper contents: pedestrian models, validation techniques and multiscale approaches. The backbone of the paper is the classification of existing pedestrian models; we have organised the works in the literature under five categories, according to the techniques used for the operational level in each pedestrian model. Then, the main existing validation methods, oriented to evaluate the behavioural quality of the simulation systems, are reviewed. Furthermore, we review the key issues that arise when facing multiscale pedestrian modeling, where we first focus on the behavioural scale (combinations of micro and macro pedestrian models) and second, on the scale size (from individuals to crowds). Finally, the paper concludes with a discussion of the contributions that different knowledge fields can make to this exciting area in the near future.
Digital advances have transformed the face of automatic music generation since its beginnings at the dawn of computing. Despite the many breakthroughs, issues such as the musical tasks targeted by different machines and the degree to which they succeed remain open questions. We present a functional taxonomy for music generation systems with reference to existing systems according to the purposes for which they were designed. The taxonomy also reveals the inter-relatedness among the systems. This design-centred approach contrasts with predominant methods-based surveys, and facilitates the identification of grand challenges so as to set the stage for new breakthroughs.
Distributed and multi-agent planning (MAP) is a relatively recent research field that combines technologies, algorithms and techniques developed by the Artificial Intelligence Planning and Multi-Agent Systems communities. While planning has been generally treated as a single-agent task, MAP generalizes this concept by considering multiple intelligent agents that work together to develop a course of action that satisfies the goals of the group. This paper reviews the most relevant approaches to MAP, including the solvers that took part in the 2015 Competition of Distributed and Multi-Agent Planning, and classifies them according to two key features of the solvers: distribution and coordination.