In the big data era, much real-world data can be naturally represented as graphs. Consequently, many application domains can be modelled as graph processing. Graph processing, especially the processing of large-scale graphs with billions or even hundreds of billions of vertices and edges, has attracted much attention in both industry and academia. It remains a great challenge to process such large-scale graphs, and researchers have been seeking new solutions. Because of the massive degree of parallelism and the high memory access bandwidth of GPUs, utilizing GPUs to accelerate graph processing proves to be a promising approach. This paper surveys the key issues of graph processing on GPUs, including data layout, memory access pattern, workload mapping and GPU-specific programming. In this paper, we summarize the state-of-the-art research on GPU-based graph processing, analyze the existing challenges in detail, and explore future research opportunities.
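To make the data-layout issue concrete, the sketch below shows the Compressed Sparse Row (CSR) format commonly used to store graphs for GPU processing; the toy graph and variable names here are our own illustrative assumptions, not from any specific surveyed system.

```python
# CSR layout for the toy directed graph: 0->1, 0->2, 1->2, 2->0.
# Two flat arrays replace per-vertex adjacency lists, giving the
# contiguous, coalesced memory accesses GPUs favor.
row_offsets = [0, 2, 3, 4]   # edges of vertex v live at indices row_offsets[v]..row_offsets[v+1]
col_indices = [1, 2, 2, 0]   # all adjacency lists flattened into one array

def neighbors(v):
    """Return the out-neighbors of vertex v from the CSR arrays."""
    return col_indices[row_offsets[v]:row_offsets[v + 1]]

print(neighbors(0))  # [1, 2]
```

On a GPU, threads assigned to consecutive vertices read consecutive ranges of `col_indices`, which is why CSR-style layouts recur throughout the data-layout literature this survey covers.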
Organizations are exposed to threats that increase the risk to their ICT systems, and assuring their protection is crucial, as reliance on information technology is a continuing challenge for both security experts and chief executives. To tackle these threats, decision makers should be provided with the information needed to understand and mitigate them. Risk assessment is a means of providing such information and facilitates the development of a security strategy. This paper addresses the problem of selecting an appropriate risk assessment method for assessing and managing information security risks by proposing a set of 17 criteria, grouped into four categories, for comparing such methods, and it provides a comparison of the 10 most popular methods based upon these criteria. Finally, the comparison presented in the paper can be used by organizations to determine which method is most suitable for their needs.
Network Functions Virtualization (NFV) and Software-Defined Networking (SDN) are new paradigms towards open software and network hardware. While NFV aims at virtualizing network functions and deploying them onto general purpose hardware, SDN makes networks programmable by separating the control and data planes. NFV and SDN are complementary technologies capable of providing one network solution: SDN can provide connectivity between Virtual Network Functions (VNFs) in a flexible and automated way, whereas NFV can use SDN as part of a service function chain. Many studies propose NFV/SDN architectures for different environments, and researchers have been trying to address reliability, performance, and scalability problems using different architectural designs. This Systematic Literature Review (SLR) focuses on integrated NFV/SDN architectures and has the following goals: i) to investigate and provide an in-depth review of the state-of-the-art of NFV/SDN architectures, ii) to synthesize their architectural designs, and iii) to identify areas for further improvement. In a broad view, this SLR will encourage researchers to advance the current stage of development (i.e., the state-of-the-practice) of integrated NFV/SDN architectures, as well as shed some light on future research efforts and their challenges.
Smartphone applications to support healthcare are proliferating. A growing and important subset of these apps supports emergency medical intervention, addressing a wide range of illness-related emergencies in order to speed the arrival of relevant treatment. The emergency response characteristics and strategies employed by these apps are the focus of this study, which results in an mHealth Emergency Strategy Index (MESI). While a growing body of knowledge focuses on the usability, safety and privacy aspects that characterize such apps, studies that map the various emergency intervention strategies and suggest criteria to evaluate their role as emergency agents are limited. We survey an extensive range of mHealth apps designed for emergency response, along with the related assessment literature, and present an index for mobile-based medical emergency intervention apps that can address the assessment needs of future mHealth apps.
This article is a tutorial resource for researchers who want to program a computer to perform a creative task, but who lack an interdisciplinary background in the psychology, philosophy, and cognitive science of creativity. We summarize interdisciplinary perspectives on what creativity is, how it is detected in humans, and how one would falsifiably evaluate whether or not a computer has been creative. We then survey how these perspectives have and have not been used in actual computational creativity research, as well as what new perspectives on creativity have arisen specifically from computer science, and we make recommendations for how they should be used by the field in the future.
The main achievements of spatio-temporal modelling in the field of Geographic Information Science over the past three decades are surveyed. This article offers an overview of: (i) the origins and history of Temporal Geographic Information Systems (T-GIS); (ii) relevant spatio-temporal data models proposed; (iii) the evolution of spatio-temporal modelling trends; and (iv) an analysis of the future trends and developments in T-GIS. It also presents some current theories and concepts that have emerged from the research performed, as well as a summary of the current progress and the upcoming challenges and potential research directions for T-GIS. One relevant result of this survey is the proposed taxonomy of spatio-temporal modelling trends, which classifies 186 modelling proposals surveyed from more than 1400 articles.
Online social networks (OSNs) are structures that help users interact, exchange, and propagate new ideas. The identification of the most influential users in OSNs is a significant process for accelerating information propagation, as in marketing applications, or for hindering the dissemination of unwanted content such as viruses, negative online behaviors, and rumors. This paper presents a detailed survey of influential user identification algorithms and their performance evaluation approaches in OSNs. The survey covers recent techniques, applications, and open research issues on influential user identification in OSNs.
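As a minimal baseline among the identification algorithms such surveys cover, the sketch below ranks users by degree centrality over a toy undirected graph; the edge list and user names are invented purely for illustration.

```python
from collections import Counter

# Degree centrality: the simplest influence measure is the number of
# connections a user has. Toy undirected friendship edges (illustrative).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]

# Count each endpoint once per edge to obtain vertex degrees.
degree = Counter(u for e in edges for u in e)
top = max(degree, key=degree.get)
print(top, degree[top])  # a 3 -- user "a" touches three edges
```

Real identification algorithms surveyed in this area (e.g., spread-based or centrality-based methods) refine exactly this ranking idea with richer structural and temporal information.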
The rapid development of cloud computing promotes a wide deployment of data and computation outsourcing by resource-limited entities to cloud providers. In a pay-per-use fashion, clients without enough computational power can easily outsource large-scale computational tasks to the cloud. Nonetheless, security and privacy are major concerns when customers' confidential or sensitive data is processed, and output generated, in not fully trusted cloud environments. Recently, a number of publications have investigated and designed secure outsourcing schemes for different computational tasks. The aim of this survey is to systematize and present the cutting-edge technologies in this area. It starts by presenting security threats and requirements, followed by other factors that should be considered by secure computation outsourcing constructions. In an organized way, we then dwell on the existing secure computation outsourcing solutions for different computational tasks such as matrix computations and mathematical optimization, treating both the confidentiality of data and the integrity of results. Finally, we offer a discussion of the literature and provide a list of open challenges in the area.
The continuously increasing cost of the US healthcare system has received significant attention. Central to the ideas aimed at curbing this trend is the use of technology, in the form of the mandate to implement electronic health records (EHRs). EHRs consist of patient information such as demographics, medications, laboratory test results, diagnosis codes and procedures. Mining EHRs could lead to improvement in patient health management as EHRs contain detailed information related to disease prognosis for large patient populations. In this manuscript, we provide a structured and comprehensive overview of data mining techniques for modeling EHR data. We first provide a detailed understanding of the major application areas to which EHR mining has been applied and then discuss the nature of EHR data and its accompanying challenges. Next, we describe major approaches used for EHR mining, the metrics associated with EHRs, and the various study designs. With this foundation, we then provide a systematic and methodological organization of existing data mining techniques used to model EHRs and discuss ideas for future research.
Despite the rapid growth of hardware capacity and the popularity of mobile devices, their limited battery and processing resources still fail to meet increasing user demands. Both conventional techniques and emerging approaches have been brought together to fill this gap between user demand and mobile devices' limited capacity. Cloud computing has become a rising topic in both business and academia in recent years as a way to eliminate this gap. Augmentation techniques such as computation outsourcing and service-oriented architectures have been proposed, and new challenges regarding these augmentation techniques, energy efficiency, etc., need to be studied. In this paper, we aim to provide a comprehensive taxonomy and survey of the existing techniques and frameworks for mobile cloud augmentation in terms of both computation and storage. Different from the existing taxonomies in this field, we focus on the techniques themselves, following the idea of realizing a complete mobile cloud computing system. The objective of this survey is to provide a guide to the available augmentation techniques that can be adopted in mobile cloud computing systems, as well as to supporting mechanisms such as decision making and fault tolerance policies for realizing reliable mobile cloud services.
Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains, including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differs from the relational data for which the data mining community has developed computational approaches over multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be addressed. Approaches for mining spatio-temporal data have been studied in the data mining community for over a decade. In this article we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss the different types of spatio-temporal data and the relevant data mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data mining problem studied, we classify the literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data mining problems in each of these categories.
Traffic load in an infrastructured 802.11 network is typically distributed unevenly between access points (APs), creating hotspots. This is due to the inherent nature of wireless LANs (WLANs), where stations are free to associate with any known AP they desire, and to the lack of control by the APs themselves. This imbalance creates a condition in which affected APs suffer traffic congestion while others are underutilised, leading to stations experiencing lower throughput and longer latency, and to the network operating below its potential capacity. To alleviate this problem, some form of load balancing is required to redistribute the workload amongst the other available APs in the wireless network. This paper presents a survey of the various works on load balancing in infrastructured 802.11 wireless networks, covering the common methods including admission control, association management, cell breathing and association control. Updates to the IEEE standards that support load balancing efforts are also presented. Furthermore, software-defined networks are investigated to determine the extent of control integration available to support managing and load balancing WLANs. Trends in load balancing research are also uncovered, indicating how the introduction of new wireless standards influences the amount of research.
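To illustrate the flavor of association control, here is a hypothetical sketch in which a station chooses the least-loaded AP among those with acceptable signal strength; the threshold, tuple layout and AP names are our own assumptions and are not taken from any IEEE standard or surveyed system.

```python
# Toy association-control policy: prefer the AP with the fewest
# associated stations, among APs whose RSSI meets a minimum.
def choose_ap(aps, min_rssi=-75):
    """aps: list of (ap_id, rssi_dbm, associated_stations) tuples."""
    candidates = [ap for ap in aps if ap[1] >= min_rssi]
    if not candidates:
        return None  # no AP within usable range
    return min(candidates, key=lambda ap: ap[2])[0]

aps = [("AP1", -40, 12), ("AP2", -60, 3), ("AP3", -80, 0)]
# AP3 is filtered out by its weak signal; AP2 beats AP1 on load.
print(choose_ap(aps))  # AP2
```

Real association-control schemes in the surveyed literature weigh richer metrics (airtime, throughput, fairness), but they share this basic load-versus-signal trade-off.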
The need to handle (process and store) massive amounts of data (Big Data) is a reality. In areas such as scientific experiments, social networks, credit card fraud detection, and financial analysis, massive amounts of information are generated and processed daily to extract valuable, summarized information. Due to their fast development cycle (i.e., lower development cost), enabled mainly by automatic memory management, and their rich community resources, managed object-oriented programming languages (such as Java) are the first choice for developing Big Data platforms (e.g., Cassandra, Spark) on which such Big Data applications are executed. However, automatic memory management comes at a cost, introduced by the Garbage Collector, which is responsible for collecting objects that are no longer being used. In this work, we study current Big Data platforms and their memory profiles to understand why classic algorithms (which are still the most common) are not appropriate, and we also analyze recently proposed and relevant memory management algorithms targeted at Big Data environments. We characterize the scalability of recent memory management algorithms in terms of throughput (improving the throughput of the application) and pause time (reducing the latency of the application) when compared to classic algorithms. We conclude our study by presenting a taxonomy of the described works.
Underwater wireless sensor networks (UWSNs) --- formed by underwater sensor nodes with sensing, processing, storage and underwater wireless communication capabilities --- will pave the way for a new era of underwater monitoring and actuation applications. UWSNs have become a fast-growing field. The envisioned landscape of applications enabled by UWSNs has tremendous potential to change the current reality, in which no more than 5% of the volume of the oceans has been explored. However, to enable large deployments of UWSNs, networking solutions for efficient underwater data collection need to be investigated and proposed. The suitable, autonomous and on-the-fly organization of the UWSN topology, through topology control algorithms, may mitigate undesired effects of underwater wireless communication and, consequently, improve networking services and protocols. In this paper, therefore, we highlight the potential of topology control for underwater sensor networks. We propose to classify topology control algorithms, based on the principal methodology used to change the network topology, into three major groups: power control, wireless interface mode management and mobility-assisted techniques. On the basis of the proposed classification, we survey the current state-of-the-art and present an in-depth discussion of topology control solutions designed for UWSNs.
As applications and operating systems are becoming more complex, the last decade has seen the rise of many tracing tools all across the software stack. This paper presents a hands-on comparison of modern tracers on Linux systems, both in user space and kernel space. The authors implement microbenchmarks that not only quantify the overhead of different tracers, but also sample fine-grained metrics that unveil insights into the tracers' internals and show the cause of each tracer's overhead. Internal design choices and implementation particularities are discussed, which helps to understand the challenges of developing tracers. Furthermore, this analysis aims to help users choose and configure their tracers based on their specific requirements in order to reduce their overhead and get the most out of them.
Networks are used to represent relationships between entities in many complex systems, spanning from online social networks to biological cell development and brain activity. These networks model relationships that present various challenges. In some cases, relationships between entities are unambiguously known: are two users friends in a social network? Do two researchers collaborate on a published paper? Do two road segments in a transportation system intersect? These relationships are unambiguous and directly observable in the system in question. In other cases, relationships between nodes are not directly observable and must be inferred: does one gene regulate the expression of another? Do two animals that physically co-locate have a social bond? Who infected whom in a disease outbreak? Existing approaches use specialized knowledge from their home domains to infer networks and to measure the goodness of the inferred network for a specific task. However, current research lacks a rigorous validation framework that employs standard statistical validation. In this survey, we examine how network representations are learned from non-network data, the variety of questions and tasks posed on these data across several domains, and validation strategies for measuring the inferred network's capability of answering questions about the original system of interest.
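One minimal, hypothetical instance of inferring a network from non-network data, in the spirit of the approaches surveyed here: connect two entities whose observation series are strongly correlated. The threshold, entity names and data below are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def infer_edges(series, threshold=0.9):
    """Add an edge between entities whose |correlation| clears the threshold."""
    names = list(series)
    return [(u, v) for i, u in enumerate(names) for v in names[i + 1:]
            if abs(pearson(series[u], series[v])) >= threshold]

# Toy gene-expression-style series: g2 tracks g1 exactly, g3 does not.
series = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
print(infer_edges(series))  # [('g1', 'g2')]
```

The validation question raised by the survey is exactly whether such an inferred edge list supports the downstream task it was built for, beyond the correlation heuristic itself.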
Approximate computing has recently gained research attention as a way to increase energy efficiency and/or performance by exploiting some applications' intrinsic error resiliency. However, little attention has been given to its potential for tackling the communication bottleneck, which remains one of the looming challenges for efficient parallelism. This paper explores the potential benefits of approximate computing for communication reduction by surveying four promising techniques for approximate communication: compression, relaxed synchronization, value prediction, and accelerators. The techniques are compared based on an evaluation framework composed of communication cost reduction, performance, energy reduction, application domain, overheads, and output degradation. The comparison shows that lossy link compression and approximate value prediction are good choices for reducing the communication bottleneck in bandwidth-constrained applications, while relaxed synchronization and approximate accelerators can achieve greater speedups on applications amenable to these techniques. Finally, this paper also includes several suggestions for future research on approximate communication techniques.
Web application providers have been migrating their applications to cloud data centers, attracted by the emerging cloud computing paradigm. One of the appealing features of cloud is elasticity: it allows cloud users to acquire or release computing resources on demand, which enables web application providers to auto-scale the resources provisioned to their applications under dynamic workloads in order to minimize resource cost while satisfying Quality of Service (QoS) requirements. In this paper, we comprehensively analyze the challenges that remain in auto-scaling web applications in clouds and review the developments in this field. We present a taxonomy of auto-scaling systems according to the identified challenges and key properties. We analyze the surveyed works and map them to the taxonomy to identify the weaknesses in this field. Moreover, based on this analysis, we propose new future directions.
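A minimal sketch of the simplest kind of auto-scaler covered by such taxonomies, a rule-based threshold policy; the thresholds, bounds and function names here are illustrative assumptions rather than any specific surveyed system.

```python
# Rule-based auto-scaling: add an instance when observed CPU utilisation
# exceeds an upper threshold, remove one when it falls below a lower
# threshold, subject to fixed minimum and maximum fleet sizes.
def scale_decision(cpu_util, instances, upper=0.8, lower=0.3,
                   min_instances=1, max_instances=10):
    """Return the new instance count for the observed CPU utilisation."""
    if cpu_util > upper and instances < max_instances:
        return instances + 1   # scale out under load
    if cpu_util < lower and instances > min_instances:
        return instances - 1   # scale in when idle
    return instances           # stay within the hysteresis band

print(scale_decision(0.9, 4))  # 5
print(scale_decision(0.2, 4))  # 3
print(scale_decision(0.5, 4))  # 4
```

The gap between the two thresholds acts as a hysteresis band that damps oscillation; the more sophisticated predictive and QoS-aware auto-scalers the survey classifies replace this reactive rule with workload forecasting.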
The recent diversity of storage demands has revealed various shortcomings of traditional RDBMSs, which in turn led to the emergence of a new trend of complementary non-relational data management solutions known as NoSQL (Not only SQL). This survey mainly aims at presenting the work that has been conducted with regard to four closely related concepts of NoSQL stores: data model, consistency model, data partitioning and replication. For each concept, its different protocols, and for each protocol, its corresponding features, strengths and drawbacks are explained. Furthermore, various implementations of each protocol are exemplified and crystallized through a collection of representative academic and industrial NoSQL technologies. The rationale behind each design decision, along with some corresponding extensions and improvements, is discussed. Finally, we disclose some existing challenges in developing effective NoSQL stores, which need attention from the research community, application designers and architects.
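As a sketch of one data partitioning protocol common across NoSQL stores, the following shows simplified consistent hashing (no virtual nodes or replication); the node names and hash choice are illustrative assumptions.

```python
import bisect
import hashlib

def h(key):
    """Map a string to a point on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Simplified consistent-hash ring: each key is owned by the first
    node clockwise from the key's hash position."""
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def owner(self, key):
        hashes = [hv for hv, _ in self.ring]
        i = bisect.bisect(hashes, h(key)) % len(self.ring)  # wrap around
        return self.ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))  # deterministic: always the same node for this key
```

The appeal for NoSQL partitioning is that adding or removing one node remaps only the keys in that node's arc of the ring, rather than rehashing the entire keyspace as modulo-based partitioning would.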
Recent Advancements in Event Processing
Network-enabled sensing and actuation devices are key enablers for connecting real-world objects to the cyber world. The Internet of Things (IoT) uses these network-enabled devices and communication technologies to allow the connectivity and integration of physical objects (Things) from the real world into the data-driven digital world (Internet). Enormous amounts of dynamic IoT data are collected from Internet-connected devices. IoT data, however, often consists of multi-variant streams that are heterogeneous, sporadic, multi-modal and spatio-temporal. IoT data can be disseminated with different granularities and have diverse structures, types and qualities. Dealing with the data deluge from heterogeneous IoT resources and services imposes challenges on the indexing, discovery and ranking mechanisms needed to build applications that require on-line access and retrieval of IoT data. However, existing IoT data indexing and discovery approaches are complex (usually based on formal and logical methods) or centralised, which hinders their scalability. The primary objective of this paper is to provide a holistic overview of the state-of-the-art on indexing, discovering and ranking IoT data. We discuss on-line analysis and fast responses to complex queries. The paper aims to pave the way for researchers to design, develop, implement and evaluate future techniques and approaches for on-line large-scale distributed IoT applications and platforms.
Shape-changing interfaces are physically tangible, interactive devices, surfaces or spaces. Over the last fifteen years, research has produced functional prototypes for many applications, and reviews have identified themes and possible future directions, but they have not yet looked at possible design- or application-based research. Here we gather this information together to provide a reference for designers and researchers wishing to build upon existing prototyping work, using a synthesis and discussion of existing shape-changing interface reviews and a comprehensive analysis and classification of 78 shape-changing interfaces. Eight categories of prototype are identified, alongside recommendations for the field.
Crowd-centric research is receiving increasingly more attention as data sets on crowd behavior become readily available. We have come to a point where many of the models on pedestrian analytics introduced in the last decade, which have mostly not been validated, can now be tested using real-world data sets. In this survey we concentrate exclusively on automatically gathering such data sets, which we refer to as sensing the behavior of pedestrians. We roughly distinguish two approaches: one that requires users to explicitly use local applications and wearables, and one that scans for the presence of handheld devices such as smartphones. We come to the conclusion that, despite the numerous reports in popular media, relatively few groups have been looking into practical solutions for sensing pedestrian behavior. Moreover, we find that much work is still needed, in particular when it comes to combining privacy, transparency, scalability, and ease of deployment. We report on over 90 relevant articles and discuss and compare in detail 30 reports on sensing pedestrian behavior.
The last decades have seen a growing interest and demand for collaborative systems and platforms. These systems and platforms aim to provide an environment in which users can collaboratively create, share and manage resources. While offering attractive opportunities for online collaboration and information sharing, they also open several security and privacy issues. This has attracted several research efforts towards the design and implementation of novel access control solutions that can handle the complexity introduced by collaboration. Despite these efforts, transition to practice has been hindered by the lack of maturity of the proposed solutions. The access control solutions typically adopted by commercial collaborative systems like online social network websites and collaborative editing platforms, are still rather rudimentary and do not provide users with a sufficient control over their resources. This survey examines the growing literature on access control for collaborative systems centered on communities, and identifies the main challenges to be addressed in order to facilitate the adoption of collaborative access control solutions in real-life settings. Based on the literature study, we delineate a roadmap for future research in the area of access control for community-centered collaborative systems.
Owing to the widespread adoption of GPS-enabled devices, such as smartphones and GPS navigation devices, more and more location information is being collected. Compared with traditional recommender systems (e.g., those of Amazon, Taobao and Dangdang), recommender systems built on location-based social networks (LBSNs) have received much attention. The former mine users' preferences through the relationship between users and items, e.g., online commodities, movies and music; based on these preferences, items in which users may be interested are recommended in order to help them find the items they may like. The latter add location as a new dimension, resulting in a three-dimensional relationship among users, locations and activities; based on this relationship, locations, activities and friends can be recommended to users. For example, users can check in at different locations on Facebook and Foursquare using their GPS-enabled devices, and these check-ins can be further used to analyze their preferences. In this paper, we review the objectives and state-of-the-art of LBSN recommender systems and indicate potential research directions.
High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing traditional scientific applications and analytics business services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from dedicated on-premise environments to shared public cloud platforms. Industry trends show hybrid environments are the natural path to getting the best of both on-premise and cloud resources---steady (and sensitive) workloads can run on on-premise resources, while peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, ranging from how to extract the best performance from an unknown underlying platform to what services are essential to ease its usage. Moreover, the discussion on the right pricing and contractual models to fit both small and large users is relevant for the sustainability of HPC clouds. This paper brings a survey and taxonomy of efforts in HPC cloud and a vision of what we believe lies ahead, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast-increasing wave of new HPC applications coming from big data and artificial intelligence.
Automatic machine-based Facial Expression Analysis (FEA) has witnessed substantial progress in the past few decades, motivated by its importance in psychology, security, health, entertainment and human computer interaction. However, the vast majority of current studies are based on non-occluded faces collected in controlled laboratory environments, and automatic expression recognition from partially occluded faces remains a largely unresolved problem, particularly in real-world scenarios. In recent years, increasing efforts have been directed at investigating techniques to handle partial occlusion for FEA. This survey provides a comprehensive review of the recent advances in dataset creation, algorithm development, and investigations of the effects of occlusion, which are crucial in system design and evaluation. It also outlines existing challenges in overcoming partial occlusion and discusses possible opportunities in advancing the technology. To the best of our knowledge, it is the first FEA survey dedicated to occlusion, and it is intended to serve as a starting point to promote future work.
The goal of privacy metrics is to measure the degree of privacy enjoyed by users in a system and the amount of protection offered by privacy-enhancing technologies. In this way, privacy metrics contribute to improving user privacy in the digital world. The diversity and complexity of privacy metrics in the literature make an informed choice of metrics challenging. As a result, redundant new metrics are proposed frequently, and privacy studies are often incomparable. In this survey we alleviate these problems by structuring the landscape of privacy metrics. To this end, we explain and discuss a selection of over eighty privacy metrics and introduce a categorization based on the aspect of privacy they measure, their required inputs, and the type of data that needs protection. In addition, we present a method for choosing privacy metrics based on eight questions that help identify the right privacy metrics for a given scenario, and we highlight topics where additional work on privacy metrics is needed. Our survey spans multiple privacy domains and can be understood as a general framework for privacy measurement.
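As an example of one classic family of metrics such a survey covers, the sketch below computes the entropy of the adversary's probability distribution over possible senders of a message: higher entropy means a larger effective anonymity set. The distributions shown are invented for illustration.

```python
import math

def anonymity_entropy(probs):
    """Shannon entropy (in bits) of the adversary's distribution over users."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25] * 4           # adversary cannot distinguish among 4 users
skewed = [0.7, 0.1, 0.1, 0.1]  # adversary strongly suspects one user

print(anonymity_entropy(uniform))                              # 2.0 bits (log2 of 4)
print(anonymity_entropy(skewed) < anonymity_entropy(uniform))  # True
```

This illustrates why the survey's categorization matters: the same scenario scores differently under entropy-based, anonymity-set-size, or error-probability metrics, so the right metric depends on what aspect of privacy is being measured.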
Metamorphic testing is an approach to both test case generation and test result verification. A central element is a set of metamorphic relations, which are necessary properties of the target function or algorithm in relation to multiple inputs and their expected outputs. Since its first publication, we have witnessed a rapidly increasing body of work examining metamorphic testing from various perspectives, including metamorphic relation identification, test case generation, integration with other software engineering techniques, and the validation and evaluation of software systems. In this paper, we review the current research of metamorphic testing and discuss the challenges yet to be addressed. We also present visions for further improvement of metamorphic testing and highlight opportunities for new research.
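To make the core idea concrete: a metamorphic relation lets us test a function without an exact oracle for each output. The toy sketch below checks the relation sin(x) = sin(π − x) over follow-up inputs; the function and tolerance are illustrative.

```python
import math

def check_sine_mr(xs, tol=1e-12):
    """Check the metamorphic relation sin(x) == sin(pi - x): we never
    need the exact expected value of sin(x), only that source and
    follow-up test cases agree."""
    return all(abs(math.sin(x) - math.sin(math.pi - x)) <= tol for x in xs)

print(check_sine_mr([0.1, 1.3, 2.7, 5.0]))  # True
```

A buggy implementation that violates the relation would be caught even though no single expected output was ever specified, which is precisely how metamorphic testing sidesteps the oracle problem.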
This article presents a comprehensive survey on parallel I/O, an important field for High Performance Computing because of the historic gap between processing power and storage latencies, which causes application performance to be impaired when accessing or generating large amounts of data. As the available processing power and the amount of data increase, I/O remains a central issue for the scientific community. In this survey, we present background concepts from which everyone can benefit. Moreover, through a comprehensive study of publications from the most important conferences and journals in a five-year time window, we discuss the state of the art of I/O optimization approaches, access pattern extraction techniques, and performance modeling, in addition to general aspects of parallel I/O research. Through this approach, we aim at identifying the general characteristics of the field and the main current and future research topics.
Robots are sophisticated machines that are susceptible to different types of faults. These faults must be detected and diagnosed in time to allow recovery and continuous operation. The field of Fault Detection and Diagnosis (FDD) has been studied for many years, yet the study of FDD for robotics is relatively new, and only a few surveys have been presented. These surveys have focused on traditional FDD approaches and how they may broadly apply to a generic type of robot. However, robotic systems can be identified by fundamental characteristics, which pose different constraints and requirements on FDD. In this paper, we aim to provide the reader with useful insights regarding the FDD approaches that best suit the different characteristics of robotic systems. We elaborate on the advantages of these approaches and the challenges they must face. We use two perspectives: (1) FDD from the perspective of the different characteristics of robotic systems, and (2) FDD from the perspective of the different approaches. Finally, we describe research opportunities. With these contributions, readers from both the FDD and robotics research communities are introduced to this subject.
Designing an optimal distributed database is an extremely complex process due to many factors, such as the large number of relations, data transmission costs, the number of network sites, communication costs between sites, and query response time. To achieve an optimal design, fragmentation, replication, and data allocation techniques are the key factors for providing high performance and supporting data access and sharing at different sites. It is worth noting, however, that these techniques are often treated separately and rarely processed together. Some studies sought only optimal allocation methods, regardless of how fragmentation is performed or replication is adopted. In contrast, others attempt to find the best fragmentation solution without considering how allocation would be performed. In this paper, most of the different fragmentation, replication, and allocation techniques in the contemporary literature, for both centralized and distributed databases, are extensively and precisely scrutinized. Furthermore, some of these techniques are presented as case studies of well-analyzed fragmentation and allocation models. These cases are cited as evidence that a well-designed distributed database can significantly reduce communication costs and response time, and substantially boost performance over centralized systems for geographically distributed sites.
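To make the interplay between fragmentation and allocation concrete, the following is a minimal illustrative sketch (not taken from any surveyed model): rows of a relation are horizontally fragmented by a predicate attribute, and each fragment is then greedily allocated to the site that queries it most often. The relation, sites, and access frequencies are hypothetical.

```python
# Hypothetical relation: each dict is a row of an "orders" table.
rows = [
    {"id": 1, "region": "EU", "amount": 100},
    {"id": 2, "region": "US", "amount": 250},
    {"id": 3, "region": "EU", "amount": 75},
]

def horizontal_fragment(rows, key):
    # Horizontal fragmentation: group rows by the value of one attribute.
    fragments = {}
    for r in rows:
        fragments.setdefault(r[key], []).append(r)
    return fragments

def allocate(fragments, access_counts):
    # access_counts[site][fragment] = query frequency at that site.
    # Greedy allocation: place each fragment at the site that accesses
    # it most often, to reduce cross-site communication.
    placement = {}
    for frag in fragments:
        placement[frag] = max(
            access_counts, key=lambda s: access_counts[s].get(frag, 0)
        )
    return placement

frags = horizontal_fragment(rows, "region")
placement = allocate(frags, {"paris":   {"EU": 90, "US": 5},
                             "newyork": {"EU": 10, "US": 80}})
```

This toy greedy rule ignores replication and storage constraints; real allocation models in the surveyed literature optimize far richer cost functions.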
While cloud computing has brought paradigm shifts to computing services, researchers and developers have also found problems inherent to its nature, such as bandwidth bottlenecks, communication overhead, and location blindness. The concept of fog/edge computing has therefore been coined to extend services from the core in cloud data centers to the edge of the network. In recent years, many systems have been proposed to better serve ubiquitous smart devices closer to the user. This paper provides a complete and up-to-date review of edge-oriented computing systems, encapsulating relevant proposals in terms of their architectural features, management approaches, and design objectives.
Despite the increasing use of social media for information and news gathering, its unmoderated nature often leads to the emergence and spread of rumours, i.e., unverified pieces of information. At the same time, the openness of social media provides opportunities to study how users share and discuss rumours, and to explore how natural language processing and data mining techniques may be used to find ways of determining their veracity. In this survey we introduce and discuss two types of rumours that circulate on social media: long-standing rumours that circulate for long periods of time, and newly emerging rumours spawned during fast-paced events such as breaking news, where unverified reports are often released piecemeal. We provide an overview of research into social media rumours with the ultimate goal of developing a rumour classification system that consists of four components: rumour detection, rumour tracking, rumour stance classification and rumour veracity classification. We delve into the approaches presented in the scientific literature for the development of each of these components. We summarise the efforts and achievements so far towards the development of rumour classification systems and conclude with suggestions for avenues for future research in social media mining for detection and resolution of rumours.
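The four-component architecture described above can be sketched as a pipeline. This is a hedged illustration only: each component here is a placeholder keyword heuristic standing in for the real models surveyed (detection, tracking, stance classification, veracity classification), and all posts and rules are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    stance: str = "commenting"  # filled in by stance classification

def detect_rumour(post):
    # Component 1, detection: flag unverified claims (placeholder rule).
    t = post.text.lower()
    return "reportedly" in t or "unconfirmed" in t

def track_rumour(stream, seed_keywords):
    # Component 2, tracking: collect posts discussing a detected rumour.
    return [p for p in stream
            if any(k in p.text.lower() for k in seed_keywords)]

def classify_stance(post):
    # Component 3, stance: supporting / denying / querying / commenting.
    t = post.text.lower()
    if "false" in t or "not true" in t:
        return "denying"
    if "confirmed" in t:
        return "supporting"
    if "?" in t:
        return "querying"
    return "commenting"

def classify_veracity(posts):
    # Component 4, veracity: aggregate stances into a verdict.
    denies = sum(p.stance == "denying" for p in posts)
    supports = sum(p.stance == "supporting" for p in posts)
    if supports > denies:
        return "true"
    if denies > supports:
        return "false"
    return "unverified"

stream = [
    Post("Reportedly, the bridge has collapsed"),
    Post("The bridge collapse is not true"),
    Post("Is the bridge really down?"),
]
rumours = [p for p in stream if detect_rumour(p)]
thread = track_rumour(stream, ["bridge"])
for p in thread:
    p.stance = classify_stance(p)
verdict = classify_veracity(thread)
```

In practice, each stage is a learned classifier rather than keyword matching, but the data flow between the four components is as shown.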
Activity recognition aims to provide accurate and timely information on people's activities by leveraging sensory data available in today's sensor-rich environments. Activity recognition has become an emerging field in the areas of pervasive and ubiquitous computing. A typical activity recognition technique processes data streams that evolve from sensing platforms such as mobile sensors, on-body sensors, and/or ambient sensors. This paper surveys the two overlapping research areas of activity recognition and data stream mining. The perspective of this paper is to review the adaptation capabilities of activity recognition techniques in streaming environments. Broad categories of techniques are identified based on the different features of both data streams and activity recognition. The pros and cons of the algorithms in each category are analysed, and possible directions for future research are indicated.
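The adaptation requirement mentioned above can be sketched with a deliberately simple model: an incremental nearest-centroid recogniser that updates its per-activity centroids as the stream flows, without storing past data. This is an illustrative assumption, not a technique attributed to the survey; feature vectors and activity labels are hypothetical.

```python
class StreamingActivityRecogniser:
    """Toy adaptive recogniser: one running-mean centroid per activity."""

    def __init__(self):
        self.centroids = {}  # activity -> running mean feature vector
        self.counts = {}     # activity -> number of samples seen

    def update(self, features, activity):
        # Incremental mean update: adapts to stream drift without
        # retaining historical samples.
        if activity not in self.centroids:
            self.centroids[activity] = list(features)
            self.counts[activity] = 1
            return
        self.counts[activity] += 1
        n = self.counts[activity]
        c = self.centroids[activity]
        for i, x in enumerate(features):
            c[i] += (x - c[i]) / n

    def predict(self, features):
        # Nearest centroid by squared Euclidean distance.
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(c, features))
        return min(self.centroids, key=lambda a: dist(self.centroids[a]))

rec = StreamingActivityRecogniser()
# Simulated accelerometer features: (mean magnitude, variance).
rec.update((1.0, 0.1), "sitting")
rec.update((1.1, 0.2), "sitting")
rec.update((2.5, 1.5), "walking")
```

Real stream-mining approaches surveyed in the paper handle concept drift, label scarcity, and evolving activity sets far more carefully; the point here is only the one-pass, constant-memory update pattern.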
In recent years, eye-tracking has been used by researchers in the field of programming education to analyse and understand tasks such as code comprehension, debugging, collaborative programming, traceability, and the comprehension of non-code programming representations. Eye-trackers are used to gain insight into the cognitive processes of programmers and programming techniques. In this paper, we perform a systematic literature review (SLR) of existing research using eye-tracking in computer programming. We identify, evaluate, and report 65 studies published between 1990 and 2015. Participants in these studies were mainly students and faculty members, with Java and UML representations being the most commonly used languages and notations. We also report on the range of eye-trackers and attention-tracking tools utilized in these studies and find that Tobii eye-trackers are the most popular among researchers. In this SLR, we report findings based on the materials, participant sample, and eye-tracking device used in each experiment.
Online judges are systems designed for the reliable evaluation of algorithm source code submitted by users, which is then compiled and tested in a homogeneous environment. Online judges are becoming popular in various applications; thus, we review the state of the art of these systems. We classify them according to their principal objectives into systems supporting the organization of competitive programming contests, systems enhancing education and recruitment processes, systems facilitating the solving of data mining challenges, online compilers, and development platforms integrated as components of other custom systems. Moreover, we present the Optil.io platform, which has been proposed for the solving of complex optimization problems. We also demonstrate the advantages of our system by analyzing the results of a competition conducted on the proposed platform. The competition proved that this platform, strengthened by crowdsourcing concepts, can be successfully applied to accurately and efficiently solve complex industrial- and science-driven challenges.
With the proliferation of online services and mobile technologies, the world has stepped into a multimedia big data era. Much research has been done in the multimedia area, targeting different aspects of big data analytics, such as the capture, storage, indexing, mining, and retrieval of multimedia big data. However, very few works survey the whole pipeline of multimedia big data analytics, including the management and analysis of large amounts of data, the challenges and opportunities, and the promising research directions. To serve this purpose, we present a comprehensive overview of the state-of-the-art research on multimedia big data analytics. We also aim to bridge the gap between multimedia challenges and big data solutions by presenting the current big data frameworks, their applications in multimedia analyses, the strengths and limitations of existing methods, and the potential future directions in multimedia big data analytics. To the best of our knowledge, this is the first survey that targets the most recent multimedia management techniques for very large-scale data and also presents the research studies and technologies advancing multimedia analyses in this big data era.
Networks built to model real-world phenomena are characterised by properties that have attracted the attention of the scientific community: (i) they are organised according to a community structure, and (ii) their structure evolves over time. Many researchers have worked on methods that can efficiently unveil substructures in complex networks, giving birth to the field of community discovery. A novel and challenging problem has recently captured researchers' interest: the identification of evolving communities. Dynamic networks can be used to model the evolution of a system: nodes and edges are mutable, and their presence, or absence, deeply impacts the community structure they compose. The aim of this survey is to present the distinctive features and challenges of dynamic community discovery and to propose a classification of published approaches. As a user manual, this work organizes state-of-the-art methodologies into a taxonomy based on their rationale and their specific instantiations. Given a desired definition of network dynamics, community characteristics, and analytical needs, this survey will support researchers in identifying the set of approaches that best fit their needs. The proposed classification could also help researchers choose the directions in which future research should be oriented.
We present a survey of multi-robot assembly applications and methods, and describe trends and general insights into the multi-robot assembly problem for industrial applications. We focus on fixtureless assembly strategies featuring two or more robotic systems. Such robotic systems include industrial robot arms, dexterous robotic hands, and autonomous mobile platforms, such as automated guided vehicles. In this survey, we identify the types of assemblies that are enabled by utilizing multiple robots, the algorithms that synchronize the motions of the robots to complete the assembly operations, and the metrics used to assess the quality and performance of the assemblies.
Crowdsourcing enables one to leverage the intelligence and wisdom of potentially large groups of individuals toward solving problems. Common problems approached with crowdsourcing are labeling images, translating or transcribing text, providing opinions or ideas, and the like: all tasks that computers are not good at or where they may even fail altogether. The introduction of humans into computations and/or everyday work, however, also poses critical, novel challenges in terms of quality control, as the crowd is typically composed of people with unknown and very diverse abilities, skills, interests, personal objectives, and technological resources. This survey studies quality in the context of crowdsourcing along several dimensions, so as to define and characterize it and to understand the current state of the art. Specifically, this survey derives a quality model for crowdsourcing tasks, identifies the methods and techniques that can be used to assess the attributes of the model, and the actions and strategies that help prevent and mitigate quality problems. An analysis of how these features are supported by the state of the art further identifies open issues and informs an outlook on promising future research directions.
Information security systems, which protect networks and computers against cyber attacks, are becoming common due to increasing threats and government regulation. At the same time, the enormous amount of data gathered by information security systems poses a serious threat to the privacy of the people protected by those systems. To substantiate this threat, we survey common and novel information security technologies and analyze them according to their potential for privacy invasion. We suggest a taxonomy for the privacy risk assessment of information security technologies, based on the level of data exposure, the level of identification of individual users, the sensitivity of the data, and the user's control over the monitoring, collection, and analysis of the data. We discuss our results in light of recent trends in information security and suggest several new directions for making these mechanisms more privacy-aware.
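As a hedged illustration of how such a four-dimensional taxonomy could be operationalised, the sketch below combines ratings along the four dimensions named above into a single risk score. The 0-3 rating scale, equal weights, and example technologies are assumptions for the example, not values from the survey.

```python
# The four taxonomy dimensions named in the abstract.
DIMENSIONS = ("data_exposure", "identification", "sensitivity", "user_control")

def privacy_risk(scores, weights=None):
    """scores: dict mapping each dimension to a 0-3 rating.

    user_control is inverted: more user control means less risk.
    Returns a weighted average risk score in [0, 3].
    """
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    total = 0.0
    for d in DIMENSIONS:
        value = scores[d]
        if d == "user_control":
            value = 3 - value  # invert: high control lowers risk
        total += weights[d] * value
    return total / sum(weights.values())

# Hypothetical ratings for two technologies.
dpi = {"data_exposure": 3, "identification": 3,
       "sensitivity": 3, "user_control": 0}   # deep packet inspection
fw = {"data_exposure": 1, "identification": 1,
      "sensitivity": 1, "user_control": 2}    # stateless firewall
```

Under these assumed ratings, deep packet inspection unsurprisingly scores as far more privacy-invasive than a stateless firewall; the value of the taxonomy lies in making such comparisons explicit and repeatable.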
Network management and maintenance are time-consuming and often challenging tasks. With the emergent Software-Defined Networking (SDN) paradigm, most of the focus has been directed to the evolution of control protocols and platforms, or to deployment problems. Although researchers and network operators consider network management a primary requirement, its development in SDN has apparently been set aside. This paper reports on the SDN architecture, introduces the concept of SDN tools, and surveys the state of the art in different aspects of network management, with emphasis on SDN. Because the SDN ecosystem lacks a standardized management framework, existing initiatives are diverse and scattered.
Context: Recent years have seen growing interest in open-ended interactive tools such as games. One of the most crucial factors in developing games is to model and predict individual behavior. Although model-based approaches have been considered the standard way to do this, their application is often extremely difficult due to the huge space of actions that games can create. For this reason, data-driven approaches have shown promise, in part because they are not completely reliant on expert knowledge. Objective: This study seeks to systematically review the existing research on the use of data-driven approaches in game player modeling. Method: We carefully surveyed a nine-year sample (2008-2016) of experimental studies on data-driven approaches in game player modeling and found 36 studies addressing four primary research questions. We analyzed and classified the questions, methods, and findings of these published works, and evaluated and drew conclusions from them based on non-statistical methods. Results: We found three primary avenues in which data-driven approaches have been studied in games research. In conclusion, we highlight critical future challenges in the area and offer directions for future study.
Modern cloud environments support a relatively high degree of automation in service provisioning, which allows cloud users to dynamically acquire the services required for deploying cloud applications. Cloud modeling languages (CMLs) have been proposed to address the diversity of features provided by today's cloud environments and to support different application scenarios, e.g., migrating existing applications to the cloud, developing new cloud applications, or optimizing them. There is, however, still much debate on what a CML is and which aspects of a cloud application and the target cloud environment should be modeled by a CML. Furthermore, the distinction between CMLs at a fine-grained level exposing their modeling concepts is rarely made. In this article, we investigate the diverse features currently provided by existing CMLs. We classify and compare them according to a common framework, with the goal of supporting cloud users in selecting the CML that best fits the needs of their application scenario and setting. As a result, we point out not only features for which existing CMLs already provide extensive support, but also areas in which existing CMLs are deficient, thereby suggesting a research agenda for the future.