Academic Plagiarism Detection: A Systematic Literature Review
Fairness assumptions are a valuable tool when reasoning about systems. In this paper, we classify several fairness properties found in the literature and argue that most of them are too restrictive for many applications. As an alternative, we introduce the concept of justness.
Gait is a biometric trait that can support user authentication, though it is classified as a "soft" trait due to its limited permanence and its sensitivity to specific conditions. The earliest research relies on computer vision, especially as applied in video surveillance. The traditional problems caused by illumination, perspective, occlusion, and noise have been inherited and are still actively investigated. More recently, the spread of wearable sensors able to capture the dynamics of the walking pattern through simpler 1D signals has spurred a different research line. In particular, the standard sensors embedded in mobile devices like smartphones, originally built in for utility, can be used to capture gait signals that allow identifying and re-identifying walking subjects. This capture modality solves some problems of computer vision-based techniques, but suffers from its own limitations. Related research is still at a less advanced stage than that of other biometric traits. However, the promising results achieved so far testify that it is worth continuing to investigate. The increasing accuracy of sensors, the ubiquitous presence of mobile devices, and the low cost of related techniques make this biometric trait attractive. This survey provides interested readers with a reasoned and systematic overview of problems, approaches, and available benchmarks.
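As a minimal illustration of the kind of 1D signal processing involved, the sketch below estimates the gait cycle length of a synthetic accelerometer magnitude signal via autocorrelation. The sampling rate, signal, and lag bounds are invented for the example; real signals are noisy and would need filtering first.

```python
import math

def autocorr(signal, lag):
    """Normalized autocorrelation of the signal at a given lag."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    den = sum(c * c for c in centered)
    return num / den if den else 0.0

def estimate_period(signal, min_lag, max_lag):
    """Lag (in samples) with the strongest self-similarity: one gait cycle."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorr(signal, lag))

# Hypothetical accelerometer magnitude at 50 Hz, one step every 25 samples.
fs = 50
sig = [math.sin(2 * math.pi * i / 25) for i in range(500)]
period = estimate_period(sig, min_lag=10, max_lag=100)
print(period, "samples =", period / fs, "s per step")  # 25 samples = 0.5 s
```

The recovered period (or features derived from it) would then feed a matcher for identification or re-identification.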
The recent advances in DNA sequencing technology, from first-generation sequencing (FGS) to third-generation sequencing (TGS), have constantly transformed the genome research landscape. Their data throughput is unprecedented, several-fold higher than that of past technologies. DNA sequencing technologies generate data that are big, sparse, and heterogeneous, which has driven the rapid development of various data protocols and bioinformatics tools. In this review, a historical snapshot of DNA sequencing is taken with an emphasis on data manipulation and tools. The technological history of DNA sequencing is described and reviewed in detail. To manipulate the sequencing data generated, different data protocols are introduced and reviewed. In particular, data compression methods are highlighted and discussed to give readers a practical, real-world perspective, which has been largely ignored by most existing reviews. A large variety of bioinformatics tools are also reviewed to help readers extract the most from their sequencing data in aspects such as sequencing quality control, genomic visualization, single nucleotide variant calling, INDEL calling, structural variation calling, and integrative analysis. At the end, we critically discuss the existing DNA sequencing technologies, their pitfalls, and potential solutions.
Today's Cyber-Physical Systems (CPS) face new cyber-attacks on a daily basis. Traditional cyber security approaches and intrusion detection systems are based on old threat knowledge and need to be updated daily to stand against new generations of cyber-threats. To update the threat-knowledge database, there is a need for proper management and processing of the generated data. In recent years, computing platforms based on representation learning methodologies have emerged as a useful resource for managing and exploiting the generated data to extract meaningful information. If properly utilized, these platforms can support strong intrusion prevention systems that protect CPS. In this survey, we first highlight various cyber-threats and the initiatives taken by international organizations. Then we discuss various computing platforms based on representation learning models to process the generated data. We also highlight popular data sets that can be used to train representation learning models. Recent efforts in the representation learning domain to protect CPS against cyber-threats are also discussed in detail. Finally, we highlight the limitations of, and research challenges in using, the available data sets and representation learning techniques designed for cyber security.
The wide proliferation of various wireless communication systems and wireless devices has led to the arrival of the big data era in large-scale wireless networks. Big data in large-scale wireless networks has the key features of wide variety, high volume, real-time velocity, and huge value, leading to unique research challenges that differ from those of existing computing systems. In this paper, we present a survey of the state-of-the-art big data analytics (BDA) approaches for large-scale wireless networks. In particular, we categorize the life cycle of BDA into four consecutive stages: Data Acquisition, Data Preprocessing, Data Storage, and Data Analysis. We then present a detailed survey of the technical solutions to the challenges in BDA for large-scale wireless networks according to each stage in the life cycle of BDA. In addition, we discuss open research issues and outline future directions in this promising area.
Contrary to using distant and centralized cloud data center resources, employing decentralized resources at the edge of a network for processing data closer to user devices, such as smartphones and tablets, is an upcoming computing paradigm, referred to as fog/edge computing. Fog/edge resources are typically resource-constrained, heterogeneous, and dynamic compared to the cloud, making resource management an important challenge that needs to be addressed. This article reviews publications dating as early as 1991, with 85% of them published between 2013 and 2018, to identify and classify the architectures, infrastructure, and underlying algorithms for managing resources in fog/edge computing.
Although computer hardware keeps getting more powerful in line with Moore's law, nothing stops end users from demanding a more immersive viewing experience in video streaming applications. 360° videos have become a popular video format because Head-Mounted Displays (HMDs) are now mass-produced. HMDs allow viewers to naturally navigate through 360° videos by rotating their heads or rolling their eyes. Streaming 360° videos over the best-effort Internet, however, imposes tremendous challenges because of the high-resolution (>8K) and short response-time (<100 ms) requirements. This survey presents the current literature related to 360° video streaming. We start from 360° video streaming systems built for real experiments, which demonstrate the practicality and efficiency of 360° video streaming. We then present the video and viewer datasets, which may be used to drive large-scale simulations. Different optimization tools in different stages of the 360° video streaming pipeline are discussed in detail. We also present various applications enabled by 360° video streaming. This is followed by a quick review of the off-the-shelf hardware available at the time of writing. Last, future research directions are highlighted.
Huffman's algorithm for computing minimum-redundancy prefix-free codes has almost legendary status in the computing disciplines. Its elegant blend of simplicity and applicability has made it a favorite example in algorithms courses, and as a result it is perhaps one of the most commonly implemented algorithmic techniques. This paper presents a tutorial on Huffman coding, and surveys some of the developments that have flowed as a consequence of Huffman's original discovery, including details of code calculation, and of encoding and decoding operations. We also survey related mechanisms, covering both arithmetic coding and the recently-developed asymmetric numeral systems approach; and briefly discuss other Huffman-coding variants, including length-limited coding and infinite codes.
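To make the code-calculation step concrete, here is a minimal Python sketch of Huffman's algorithm (an illustration, not the paper's own pseudocode): it repeatedly merges the two lowest-weight subtrees with a binary heap, extending the codewords of the merged symbols by one bit.

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build a minimum-redundancy prefix-free code from symbol frequencies."""
    # Heap entries: [weight, tie-breaker, {symbol: partial codeword}].
    heap = [[w, i, {sym: ""}] for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-symbol alphabet
        return {sym: "0" for sym in freqs}
    count = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # two lowest-weight subtrees
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], count, merged])
        count += 1
    return heap[0][2]

text = "abracadabra"
codes = huffman_codes(Counter(text))
encoded = "".join(codes[ch] for ch in text)
print(codes, len(encoded))
```

Because the code is prefix-free, decoding is a walk down the implicit tree, consuming one bit per edge until a codeword is matched.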
With the advent of the fog and edge computing paradigms, computation capabilities have been moved towards the edge of the network to support the requirements of highly demanding services. To ensure the quality of such services is still met in the event of user mobility, migrating services across different computing nodes becomes essential. Several studies have emerged recently to address service migration in different edge-centric research areas, including fog computing, multi-access edge computing (MEC), cloudlets, and vehicular clouds. Since existing surveys in this area either focus on VM migration in general or on migration in a single research field (e.g. MEC), the objective of this survey is to bring together studies from different, yet related, edge-centric research fields, while capturing the different facets they address. More specifically, we examine the diversity characterizing the landscape of migration scenarios at the edge, we present an objective-driven taxonomy of the literature, and we highlight contributions that focus instead on architectural design and implementation. Finally, we identify a list of gaps and research opportunities based on the current state of the literature. One such opportunity lies in joining efforts from both the networking and computing research communities to facilitate future research in this area.
In the ever-connected social networking era, terrorists exploit social media platforms via sophisticated approaches. To curb these activities, a rich collection of computational methods has been developed. This article surveys the use of social media by terrorists, followed by a temporal classification framework that overviews computational counter-measures at four major stages: inception of an attack, immediately before an attack, onset of an attack, and after an attack. The literature surveyed is organized around these four temporal stages, and the survey is summarized in a table listing the main technologies used at each stage relative to the time of the attack.
This survey focuses on intrusion detection systems (IDS) that leverage host-based data sources for detecting attacks on enterprise networks. The host-based IDS (HIDS) literature is organized by input data source, presenting targeted sub-surveys of HIDS research leveraging system logs, audit data, Windows Registry, file systems, and program analysis. While system calls are generally included in audit data, several publicly available system call datasets have spawned a flurry of IDS research on this topic, which merits a separate section. Similarly, a section surveying algorithmic developments that are applicable to HIDS but tested on network data sets is included, as this is a large and growing area of applicable literature. To accommodate current researchers, a supplementary section giving descriptions of publicly available datasets is included, outlining their characteristics and shortcomings when used for IDS evaluation. Related surveys are organized and described. All sections are accompanied by tables concisely organizing the literature and datasets discussed. Finally, challenges, trends, and broader observations are discussed throughout the survey and in the conclusion, along with future directions of IDS research.
Video summarization is the method of extracting keyframes or clips from a video to generate a synopsis of its content. In most practical applications, video is compressed before being stored or transmitted. Traditional techniques require the video to be fully decoded before it can be summarized, which is a tedious job. Instead, compressed-domain video processing can summarize videos by only partially decoding them. A classification and analysis of various summarization techniques is presented, with special focus on compressed-domain techniques, along with a discussion of machine learning-based techniques that can be applied to summarize videos.
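As a minimal illustration of keyframe-based summarization, the sketch below keeps a frame whenever its intensity histogram drifts far enough from the last keyframe. For simplicity it works on raw pixels; a compressed-domain method would apply the same idea to partially decoded features such as DC coefficients or motion vectors. The frames, bin count, and threshold are invented for the example.

```python
def histogram(frame, bins=8):
    """Coarse intensity histogram of a frame (pixel values in 0..255)."""
    h = [0] * bins
    for p in frame:
        h[p * bins // 256] += 1
    return h

def hist_distance(h1, h2):
    """L1 distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def select_keyframes(frames, threshold):
    """Keep a frame whenever it differs enough from the last keyframe."""
    keyframes = [0]
    last = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        h = histogram(frame)
        if hist_distance(h, last) > threshold:
            keyframes.append(i)
            last = h
    return keyframes

# Two synthetic "shots": five dark frames followed by five bright frames.
dark = [[20] * 64 for _ in range(5)]
bright = [[200] * 64 for _ in range(5)]
print(select_keyframes(dark + bright, threshold=32))  # -> [0, 5]
```

The selected indices mark one representative frame per shot, which is the essence of the synopsis the abstract describes.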
Stream processing handles continuous big data in memory on a process-upon-arrival basis, powering latency-critical applications such as fraud detection, algorithmic trading, and health surveillance. Though the development of streaming applications has been facilitated by a variety of Data Stream Management Systems (DSMSs), the problem of resource management and task scheduling is not automatically handled by the DSMS middleware and remains a heavy burden on application providers. As the advent of cloud computing has enabled customised deployment on rented resources, it is of great interest to investigate novel resource management mechanisms that host streaming systems in clouds, satisfying Quality of Service (QoS) requirements while minimising resource cost. In this paper, we introduce the hierarchical structure of a streaming system, define the scope of the resource management problem, and then present a comprehensive taxonomy of critical research topics such as resource provisioning, operator parallelisation, and task scheduling. We also review the existing works based on the proposed taxonomy, which enables a better comparison of the properties and method features of specific works. Finally, we identify open issues and research directions towards realising an automatic, QoS-aware resource management framework for deploying stream processing systems in distributed computing environments.
Video skimming, also known as dynamic video summarization, generates a temporally abridged version of a given video. Skimming can be achieved by identifying significant components in either uni-modal or multi-modal features extracted from the video. Being dynamic in nature, video skimming preserves temporal connectivity and thus allows a better understanding of the video from its summary. Given this clear advantage, video skimming has recently drawn the focus of many researchers, aided by the easy availability of the required computing resources. In this paper, we provide a comprehensive survey of video skimming, focusing on the substantial literature from the past decade. We present a taxonomy of video skimming approaches and discuss their evolution, highlighting key advances. We also provide a study of the components required for evaluating video skimming performance.
Motivation: In modern IT systems, the increasing demand for computational power is tightly coupled with ever-higher energy consumption. Traditionally, energy efficiency research has focused on reducing energy consumption at the hardware level. Nevertheless, software itself provides numerous opportunities for energy efficiency. Goal: Since energy efficiency for IT systems is a rising concern, it is important to investigate existing work in the area of energy-aware software development and identify open research challenges. Our aim is to point out limitations, features, and energy-performance trade-offs in software development and to provide insights on existing approaches, tools, and techniques for energy-efficient programming. Method: We analyze and categorize research work, mostly extracted from top-tier conferences and journals, with respect to the software development life cycle phases. Results: Our analysis shows that the majority of existing work concerns implementation and verification. The use of parallel and approximate programming, source code analyzers, efficient data structures, coding practices, and specific programming languages can significantly increase energy efficiency. Moreover, the utilization of energy monitoring tools and benchmarks can provide insights for developers and raise energy awareness during development.
Phase Change Memory (PCM) is an emerging memory technology capable of serving at the main memory level of the memory hierarchy, motivated by the poor scalability, considerable leakage power, and high cost per bit of DRAM. PCM is a resistive memory that stores data as resistance values. The wide resistance range of PCM allows storing multiple bits per cell (MLC) rather than a single bit per cell (SLC). Unfortunately, PCM cells suffer from a short lifetime: they can tolerate only a limited number of write operations, after which they tend to become permanently stuck at a constant value. Limited lifetime is a significant issue for PCM memory; hence, in recent years many studies have been conducted to prolong PCM lifetime. These schemes vary widely and are applied at different architectural levels. In this survey, we review the important works among such schemes in order to give insights to those starting research on PCMs.
Understanding people's expertise is not a trivial task, since it is time-consuming when executed manually. Automated approaches have become a topic of research in recent years in various scientific fields, such as information retrieval, databases, and machine learning. This article presents a survey on automated expertise retrieval, i.e., finding data linked to a person that describes their expertise, which enables tasks such as profiling or finding people with a certain expertise. A faceted taxonomy is introduced that covers many of the existing approaches and classifies them on the basis of features chosen from studying the state of the art. A list of open issues, with suggestions for future research topics, is introduced as well. It is hoped that our taxonomy and review of related work on expertise retrieval will be useful when analyzing different proposals and will allow a better understanding of existing work and a systematic classification of future work on the topic.
Adaptive Authentication allows a system to dynamically select the best mechanism(s) for authenticating a user depending on contextual factors, such as location, proximity to devices, and other attributes. Though this technology has the potential to change the current password-dominated authentication landscape, research to date has not led to practical solutions that carry over into our daily lives. Motivated to find out how to improve adaptive authentication design, we provide a structured survey of the existing literature and analyze it to derive future research directions.
Video description is the automatic generation of natural language sentences that describe the contents of a given video. It has applications in human-robot interaction, helping the visually impaired, and video subtitling. The past few years have seen a surge of research in this area due to the unprecedented success of deep learning in computer vision and natural language processing. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, calling for a comprehensive survey to focus research efforts in this flourishing new direction. This paper fills the gap by surveying the state-of-the-art approaches with a focus on deep learning models; comparing benchmark datasets in terms of their domains, number of classes, and repository size; and identifying the pros and cons of various evaluation metrics such as SPICE, CIDEr, ROUGE, BLEU, METEOR, and WMD. Classical video description approaches combined subject, object, and verb detection with template-based language models to generate sentences. However, the release of large datasets revealed that these methods cannot cope with the diversity of unconstrained open-domain videos. Classical approaches were followed by a very short era of statistical methods, which were soon replaced with deep learning, the current state of the art.
Interest in processing big data has increased rapidly to gain insights that can transform businesses, government policies and research outcomes. This has led to advancement in communication, programming and processing technologies, including Cloud computing services and technologies such as Hadoop, Spark and Storm. This trend also affects the needs of analytical applications, which are no longer monolithic but composed of several individual analytical steps running in the form of a workflow. These Big Data Workflows are vastly different in nature from traditional workflows. Researchers are currently facing the challenge of how to orchestrate and manage the execution of such workflows. In this paper, we discuss in detail orchestration requirements of these workflows as well as the challenges in achieving these requirements. We also survey current trends and research that supports orchestration of big data workflows and identify open research challenges to guide future developments in this area.
Systems deployed in regulated safety-critical domains (e.g., the medical, nuclear, and automotive domains) are often required to undergo a stringent safety assessment procedure, as prescribed by a certification body, in order to demonstrate their compliance with one or more certification standards. Assurance cases are an emerging way of communicating the safety, security, and dependability, as well as other properties, of safety-critical systems in a structured and comprehensive manner. The significant size and complexity of these documents, however, make the process of evaluating and assessing their validity a non-trivial task and an active area of research. For this reason, efforts have been made to develop and utilize software tools for the purpose of aiding developers and third-party assessors in assessing and analyzing assurance cases. This paper presents a survey of the various assurance case assessment features contained in ten assurance case software tools, all of which were identified and selected via a previously conducted systematic literature review. We describe the various assessment techniques implemented, discuss their strengths and weaknesses, and identify possible areas in need of further research.
A large number of published datasets (or sources) that follow the Linked Data principles are currently available, and this number grows rapidly. However, the major target of Linked Data, i.e., linking and integration, is not easy to achieve. In general, information integration is difficult because a) datasets are produced, kept, or managed by different organizations using different models, schemas, and formats, b) the same real-world entities or relationships are referred to with different URIs or names and in different natural languages, c) datasets usually contain complementary information, d) datasets can contain data that are erroneous, out-of-date, or conflicting, e) datasets even about the same domain may follow different conceptualizations of the domain, and f) everything can change (e.g., schemas, data) as time passes. This paper surveys the work that has been done in the area of Linked Data integration, identifies the main actors and use cases, analyzes and factorizes the integration process according to various dimensions, and discusses the methods that are used in each step. Emphasis is given to methods that can be used for integrating large numbers of datasets. Based on this analysis, the paper concludes with directions that are worth further research.
Cyber attacks on both databases and critical infrastructure have threatened public and private sectors. Meanwhile, ubiquitous tracking and wearable computing have infringed upon privacy. Advocates and engineers have recently proposed using defensive deception as a means to leverage the information asymmetry typically enjoyed by attackers as a tool for defenders. The term deception, however, has been employed broadly and with a variety of meanings. In this paper, we survey 24 articles from 2007-2017 that use game theory to model defensive deception for cybersecurity and privacy. Then we propose a taxonomy that defines six types of deception: perturbation, moving target defense, obfuscation, mixing, honey-x, and attacker engagement. These types are delineated by their incentive structures, agents, actions, and duration: precisely concepts captured by game theory. Our aims are to rigorously define types of defensive deception, to capture a snapshot of the state of the literature, to provide a menu of models which can be used for applied research, and to identify promising areas for future work. Our taxonomy provides a systematic foundation for understanding different types of defensive deception commonly encountered in cybersecurity and privacy.
The emerging field of autonomous UAV cinematography is introduced and surveyed, while connections with different UAV application domains are examined. Current industry practices are formalized by presenting a UAV shot type taxonomy composed of framing shot types, single-UAV camera motion types and multiple-UAV camera motion types. Visually pleasing combinations of framing shot types and camera motion types are identified, while the presented camera motion types are modelled geometrically and graded into distinct energy consumption classes and required technology levels for autonomous capture. Two specific strategies are prescribed, namely focal length compensation and multidrone compensation, for partially overcoming a number of issues arising in UAV live outdoor event coverage, deemed as the most complex UAV cinematography scenario. Finally, the shot types compatible with each compensation strategy are explicitly identified. Thus, this tutorial both familiarizes readers coming from different backgrounds with the topic in a structured manner, and lays necessary groundwork for future advancements.
Data mining is used to find meaningful information in vast expanses of data. With the advent of the Big Data concept, data mining has come to much more prominence. Discovering knowledge from gigantic volumes of data efficiently is a major concern, as resources are limited. Cloud computing plays a major role in such situations. Cloud data mining fuses the applicability of classical data mining with the promises of cloud computing to perform knowledge discovery from huge volumes of data efficiently. This paper presents the existing frameworks and algorithms for cloud data mining. The frameworks are compared with each other based on similarity, data mining task support, parallelism, distribution, fault tolerance, security, memory types, storage systems, and other criteria. Similarly, the algorithms are grouped on the basis of data mining techniques such as clustering, classification, and association rule mining. We also attempt to discuss and identify the major applications of cloud data mining, and we identify various taxonomies for cloud data mining frameworks and algorithms. This paper aims to give better insight into the present research realm and to direct future research towards efficient cloud data mining in future cloud systems.
Cloud pricing is an intricate issue because it crosses several domains of knowledge, such as cloud technologies, microeconomics, operations research, and even the philosophy of science. Many earlier works categorized pricing models in a random-walk manner, which inevitably leads to much confusion for decision makers. The goal of this paper is to provide a systematic view of pricing in a multidisciplinary way. We present a comprehensive taxonomy of cloud pricing that is underpinned by value theory. Our taxonomy is driven by three fundamental pricing strategies built on the categories of a 3-by-3 matrix, which can map out a total of 60 pricing models. We provide detailed descriptions of these model categories and highlight both their advantages and disadvantages. Moreover, we provide an extensive survey of pricing models proposed during the last decade. Based on the survey, we identify four trends of modeling and the overall direction of pricing, which is moving from intrinsic values per physical box to extrinsic values per serverless sandbox. We conclude that a hyper-converged architecture has emerged, supported by cloud orchestration, virtual machines, and serverless sandboxes. Finally, we outline four challenges for cloud pricing.
Data mining information about people is becoming increasingly important in the data-driven society of the 21st century. Unfortunately, sometimes there are real-world considerations that conflict with the goals of data mining; sometimes the privacy of the people being data mined needs to be considered. This necessitates that the output of data mining algorithms be modified to preserve privacy while simultaneously not ruining the predictive power of the outputted model. Differential privacy is a strong, enforceable definition of privacy that can be used in data mining algorithms, guaranteeing that nothing will be learned about the people in the data that could not already be discovered without their participation. In this survey, we focus on one particular data mining algorithm -- decision trees -- and how differential privacy interacts with each of the components that constitute decision tree algorithms. We analyze both greedy and random decision trees, and the conflicts that arise when trying to balance privacy requirements with the accuracy of the model.
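To make the interaction concrete, the core building block of most differentially private decision trees is releasing noisy counts (e.g., class counts at a leaf, or counts used to score candidate splits) via the Laplace mechanism. The sketch below shows that single component; the epsilon value and counts are arbitrary examples, not a full tree-induction algorithm.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) by inverse-transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def noisy_count(true_count, epsilon, rng):
    """A count query (sensitivity 1) released under epsilon-differential privacy."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
# A hypothetical class count at a decision-tree leaf, epsilon = 0.5 per query.
print(noisy_count(100, 0.5, rng))
```

Greedy trees spend privacy budget on every split-quality query, while random trees only need noisy counts at the leaves, which is precisely the accuracy-versus-privacy tension the survey analyzes.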
Orthogonal Moments provide an efficient mathematical framework for computer vision, image analysis, and pattern recognition. They are derived from families of polynomials that are mutually orthogonal under a suitable inner product. Orthogonal Moments are more efficient than non-orthogonal moments for image representation, offering minimal attribute redundancy, robustness to noise, and invariance to rotation, translation, and scaling. Orthogonal Moments can be either continuous or discrete. Prominent continuous moments are the Zernike, Pseudo-Zernike, Legendre, and Gaussian-Hermite moments. This paper provides a comprehensive and comparative review of continuous orthogonal moments along with their applications.
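As a concrete instance of a continuous orthogonal moment, the sketch below computes Legendre moments of a small grayscale image mapped onto [-1, 1] x [-1, 1], using the standard three-term recurrence for the Legendre polynomials. The midpoint discretization is one common approximation among several, and the image is synthetic.

```python
def legendre(n, x):
    """Legendre polynomial P_n(x) via the three-term recurrence."""
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(2, n + 1):
        p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
    return p1

def legendre_moment(image, p, q):
    """Order-(p, q) Legendre moment of an N x N image mapped onto [-1, 1]^2."""
    n = len(image)
    norm = (2 * p + 1) * (2 * q + 1) / 4.0
    dx = 2.0 / n
    total = 0.0
    for i, row in enumerate(image):
        y = -1.0 + (i + 0.5) * dx  # midpoint of pixel row i
        for j, val in enumerate(row):
            x = -1.0 + (j + 0.5) * dx
            total += legendre(p, x) * legendre(q, y) * val
    return norm * total * dx * dx

img = [[1.0] * 8 for _ in range(8)]     # a uniform synthetic image
print(legendre_moment(img, 0, 0))       # ~1 for a constant image of ones
print(legendre_moment(img, 1, 0))       # ~0: odd moment of a symmetric image
```

Because the basis is orthogonal, each moment captures independent information, which is the low-redundancy property the abstract highlights.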
Beyond 2014: Formal methods for attack tree-based security modeling
Performance-Aware Management of Cloud Resources: A Taxonomy and Future Directions
New Opportunities for Integrated Formal Methods
Metaprogramming is the process of writing computer programs that treat programs as data, enabling them to analyze or transform existing programs or generate new ones. While the concept of metaprogramming has existed for several decades, activities focusing on metaprogramming have been increasing rapidly over the past few years, with most languages offering some metaprogramming support and the amount of meta-code being developed growing exponentially. In this article, we introduce a taxonomy of metaprogramming languages and present a survey of metaprogramming languages and systems based on the taxonomy. Our classification is based on the metaprogramming model adopted by the language, the phase of the metaprogram evaluation, the metaprogram source location, and the relation between the metalanguage and the object language.
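As a toy example of treating programs as data, the Python sketch below parses a function's source into a syntax tree, folds constant additions, then compiles and runs the transformed program. It illustrates the analyze/transform/generate cycle in a single metalanguage; it is not a construct from any particular surveyed system.

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """Rewrite `<const> + <const>` subtrees into a single constant node."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first, bottom-up
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            return ast.copy_location(
                ast.Constant(node.left.value + node.right.value), node)
        return node

source = "def f():\n    return 1 + 2 + 3\n"
tree = ConstantFolder().visit(ast.parse(source))   # the program as data
ast.fix_missing_locations(tree)
namespace = {}
exec(compile(tree, "<meta>", "exec"), namespace)   # generate and load it
print(ast.unparse(tree))   # the transformed program as source text
print(namespace["f"]())    # -> 6
```

In the article's terms, this is a homogeneous setting (metalanguage and object language are both Python) with metaprogram evaluation at load time rather than during compilation.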
Semantic annotation is a crucial precondition for the Semantic Web and has long been a research topic across communities. Currently, the most promising results are achieved via manual or semi-supervised approaches, or hybrids of the two. There are already many surveys targeting semantic annotators that adopt manual or semi-supervised approaches; however, a comprehensive survey of unsupervised semantic annotation approaches is still missing. Supervised approaches require human intervention and training examples. Given the vast number of documents that need to be annotated, fully automated semantic annotation remains the ultimate goal. Though fully automatic semantic annotation is hard, many works pursue this goal. To better understand the state of the art of fully automatic approaches to semantic annotation, this paper investigates the literature and presents a classification of the approaches. In contrast to existing surveys, this paper focuses on fully automatic approaches. It helps readers understand the existing unsupervised approaches and gain insight into the state of the art.
Autonomous systems will soon be integrated into our lives in the form of delivery drones, driverless cars, and robots like Sophia. The level of automation implemented in these systems, from manually controlled to fully autonomous, depends on the autonomy approach chosen to design them. This paper reviews the historical evolution of autonomy, its approaches, and the current trends in related fields for building robust autonomous systems. Towards this goal, and given the increased number of cyberattacks, the security of these systems needs special attention from the research community. To gauge the extent to which research has been done in this area, we discuss the cybersecurity of these systems. It is essential to model a system from a security perspective, identify its vulnerabilities and threats, and then model the attacks. A survey in this direction explores the theoretical/analytical system and attack models that have been proposed over the years and identifies the research gaps that need to be addressed by the research community.
A character network is a graph extracted from a narrative, in which nodes represent characters and links correspond to interactions between them. A number of narrative-related problems can be addressed automatically through their analysis, such as summarization, classification, or role detection. Character networks are particularly relevant when considering works of fiction (novels, plays, movies, TV series, etc.), as their exploitation enables the development of information retrieval and recommendation systems. However, works of fiction possess specific properties that make these tasks harder. This survey aims at presenting and organizing the scientific literature related to the extraction of character networks from works of fiction, as well as their analysis. We first describe the extraction process in a generic way, and explain how its constituent steps are implemented in practice, depending on the medium of the narrative, the goal of the network analysis, and other factors. We then review the descriptive tools used to characterize character networks, with a focus on the way they are interpreted in this context. We illustrate the relevance of character networks by also providing a review of applications derived from their analysis. Finally, we identify the limitations of the existing approaches, and the most promising perspectives.
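The basic extraction step described above, turning co-occurrences into a weighted graph, can be illustrated with a minimal sketch. This is not any specific method from the surveyed literature; the function name and the scene-based interaction criterion are illustrative assumptions:

```python
from collections import Counter
from itertools import combinations

def build_character_network(scenes):
    """Build a weighted co-occurrence character network.

    Each scene is a list of character names; an edge links every pair of
    characters appearing in the same scene, weighted by how often they
    co-occur across scenes.
    """
    edges = Counter()
    for scene in scenes:
        # sorted() makes the edge key orientation-independent
        for a, b in combinations(sorted(set(scene)), 2):
            edges[(a, b)] += 1
    return edges

scenes = [
    ["Alice", "Bob"],
    ["Alice", "Bob", "Carol"],
    ["Carol", "Bob"],
]
network = build_character_network(scenes)
# network[("Alice", "Bob")] == 2
```

Real extraction pipelines must first solve the harder problems the survey discusses, such as character identification and unification across mentions.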
Large-scale software systems are currently designed as distributed entities and deployed in cloud data centers. To overcome the limitations inherent to this type of deployment, applications are increasingly being supplemented with components instantiated closer to the edges of networks, a paradigm known as edge computing. The problem of how to efficiently orchestrate combined edge-cloud applications is however incompletely understood, and a wide range of techniques for resource and application management are currently in use. This paper investigates the problem of reliable resource provisioning in joint edge-cloud environments and surveys technologies, mechanisms, and methods that can be used to improve the reliability of distributed applications in diverse and heterogeneous network environments. Due to the complexity of the problem, special emphasis is placed on solutions to the characterization, management, and control of complex distributed applications using machine learning approaches. The structure of the survey is oriented around a decomposition of the problem of reliable resource provisioning into three categories of techniques: workload characterization and prediction, component placement and system consolidation, and application elasticity and remediation. Survey results are presented along with a problem-oriented discussion of the state of the art. Finally, a summary of identified challenges and an outline of future research directions are presented to conclude the paper.
Workflow scheduling is one of the challenging issues in emerging distributed environments, where the focus is on satisfying various quality of service (QoS) constraints. The cloud receives applications in the form of workflows, each consisting of a set of interdependent tasks, to solve large-scale scientific or enterprise problems. Workflow scheduling in the cloud environment has been studied extensively over the years, and this paper provides a comprehensive review of the approaches. The paper analyses the characteristics of various workflow scheduling techniques and classifies them based on their objectives and execution model. In addition, recent technological developments and paradigms such as serverless computing and Fog computing are creating new requirements and opportunities for workflow scheduling in a distributed environment. Serverless infrastructures are mainly designed for processing background tasks such as Internet of Things (IoT), web, or event-driven applications. To address the ever-increasing demand for resources and to overcome the drawbacks of cloud-centric IoT, the Fog computing paradigm has been developed. The paper also discusses workflow scheduling in the context of these emerging trends of cloud computing.
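A workflow of interdependent tasks is typically modeled as a directed acyclic graph, and the first step of many schedulers is to order tasks by their dependencies. The following sketch groups tasks into dependency levels using Kahn's algorithm; the function name and the input format are illustrative assumptions, not a technique from any particular surveyed paper:

```python
def schedule_levels(tasks):
    """Group a workflow's tasks into dependency levels.

    `tasks` maps each task to the collection of tasks it depends on.
    All tasks within a level can run concurrently once every earlier
    level has finished. Raises if the workflow contains a cycle.
    """
    deps = {t: set(d) for t, d in tasks.items()}
    levels = []
    while deps:
        # tasks whose dependencies are all satisfied
        ready = sorted(t for t, d in deps.items() if not d)
        if not ready:
            raise ValueError("cycle detected in workflow")
        levels.append(ready)
        for t in ready:
            del deps[t]
        for d in deps.values():
            d -= set(ready)
    return levels

workflow = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
# schedule_levels(workflow) == [["a"], ["b", "c"], ["d"]]
```

Practical cloud schedulers additionally weigh QoS constraints such as deadlines, cost, and resource availability when mapping each level's tasks to machines.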
Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. Specifically, we present trends in DNN architectures and the resulting implications on parallelization strategies. We discuss the different types of concurrency in DNNs; synchronous and asynchronous stochastic optimization; distributed system architectures; communication schemes; and performance modeling. Based on these approaches, we extrapolate potential directions for parallelism in deep learning.
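The synchronous distributed optimization mentioned above can be illustrated with a minimal data-parallel sketch: the batch is sharded across workers, each computes a local gradient, and the size-weighted average (an all-reduce in real systems) drives one shared update. This is a generic illustration on a linear least-squares model, not any specific system from the survey; all names are assumptions:

```python
import numpy as np

def grad(w, X, y):
    # Gradient of the mean squared error 0.5 * ||Xw - y||^2 / n
    return X.T @ (X @ w - y) / len(y)

def sync_data_parallel_step(w, X, y, n_workers, lr):
    """One synchronous data-parallel SGD step.

    Shards the batch across workers, computes local gradients, and
    averages them (weighted by shard size) before a single update,
    so the result matches the full-batch gradient step exactly.
    """
    shards_X = np.array_split(X, n_workers)
    shards_y = np.array_split(y, n_workers)
    grads = [grad(w, Xi, yi) for Xi, yi in zip(shards_X, shards_y)]
    sizes = np.array([len(yi) for yi in shards_y])
    avg = sum(g * s for g, s in zip(grads, sizes)) / sizes.sum()
    return w - lr * avg
```

Asynchronous variants drop the barrier implied by the averaging step, letting each worker update a shared parameter store with possibly stale gradients.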
Food is essential for human life and fundamental to the human experience. Food-related study may support multifarious applications and services, such as guiding human behavior, improving human health, and understanding culinary culture. With the fast development of social networks, mobile networks, and the Internet of Things, people commonly upload, share, and record food images, recipes, cooking videos, and food diaries, leading to large-scale food data. Large-scale food data offer rich knowledge about food and can help tackle many central issues of human society. It is therefore timely to group these disparate food-related issues under a single umbrella: food computing. Food computing acquires and analyzes heterogeneous food data from disparate sources for the perception, recognition, retrieval, recommendation, and monitoring of food. In food computing, computational approaches are applied to address food-related issues in various fields. Both large-scale food data and recent breakthroughs in computer science are transforming the way we analyze food data, and a vast amount of research has consequently targeted different food-oriented tasks and applications. We formalize food computing, present a comprehensive overview of its emerging concepts, methods, and tasks, and summarize key challenges and future directions for the field.
With the widespread use of computing and mobile devices, authentication using biometrics has received greater attention. Although biometric systems usually provide good solutions, recognition performance tends to degrade over time due to changing conditions and aging of the biometric data, resulting in intra-class variability. Adaptive biometric systems, which adapt the biometric reference over time, have been proposed to deal with such intra-class variability. This paper provides the most up-to-date and complete discussion of adaptive biometric systems we are aware of, including formalization, terminology, the sources of variation that motivate adaptation, adaptation strategies, evaluation methodology, and open challenges.
The latest techniques in video motion magnification and relevant small motion analysis are surveyed. The main motion magnification techniques are discussed in chronological fashion, highlighting the inherent limitations of predecessor techniques in comparison with subsequent variants. The focus is then shifted to the specific stages within the motion magnification framework to discuss advancements that have been proposed in the literature, namely for spatial decomposition, and for emphasizing, representing and distinguishing different motion signals. The survey concludes with a treatment of different problems in varying application contexts that have benefitted from motion magnification and small motion analysis.
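The core idea behind Eulerian-style motion magnification, isolating small temporal variations and scaling them up, can be sketched on a 1D signal. This is a deliberately minimal illustration, not any specific technique from the surveyed literature; the function name, the moving-average baseline, and the parameters are assumptions:

```python
import numpy as np

def magnify_small_motion(signal, alpha, win=11):
    """Amplify small temporal variations in a 1D signal.

    A moving average estimates the slow baseline; deviations from it
    (the 'small motion') are scaled by alpha and added back, so the
    output equals baseline + alpha * (signal - baseline).
    """
    kernel = np.ones(win) / win
    baseline = np.convolve(signal, kernel, mode="same")
    return baseline + alpha * (signal - baseline)

t = np.linspace(0.0, 1.0, 200)
# large slow component plus a tiny fast oscillation to be magnified
s = np.sin(2 * np.pi * t) + 0.01 * np.sin(2 * np.pi * 30 * t)
magnified = magnify_small_motion(s, alpha=10.0)
```

Video techniques apply the same principle per pixel or per spatial sub-band, typically with proper temporal bandpass filters and phase-based representations rather than a simple moving average.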
Although malicious software (malware) has been around since the early days of computers, the sophistication and innovation of malware has increased over the years. In order to protect institutions and the public from malware attacks, malicious activity must be detected as early as possible. Analyzing a suspicious file by static or dynamic analysis methods can provide relevant and valuable information regarding a file's impact on the hosting system and help determine whether the file is malicious or not. Although dynamic analysis is more robust than static analysis, existing dynamic analysis tools and techniques are imperfect, and there is no single tool that can cover all aspects of malware behavior. Over the last seven years, the computing environment has changed dramatically, with new types of malware (ransomware, cryptominers), new analysis methods (volatile-memory forensics, side-channel analysis), new computing environments (cloud computing, IoT devices), and more. The goal of this survey is to provide a comprehensive and up-to-date overview of existing methods used to dynamically analyze malware, including a description of each method, its strengths and weaknesses, and its resilience against malware evasion techniques. In addition, we present prominent studies that applied machine learning methods to enhance dynamic analysis aimed at malware detection and categorization.
The machine learning community has been overwhelmed by a plethora of deep learning based approaches. Many challenging computer vision tasks, such as detection, localization, recognition, and segmentation of objects in unconstrained environments, are being efficiently addressed by various types of deep neural networks, including convolutional neural networks, recurrent networks, adversarial networks, and autoencoders. While there have been plenty of analytical studies of the object detection and recognition domain, many new deep learning techniques have surfaced for image segmentation. This paper approaches these various deep learning techniques for image segmentation from an analytical perspective. The main goal of this work is to provide an intuitive understanding of the major techniques that have made significant contributions to the image segmentation domain. Starting from some traditional image segmentation approaches, the paper progresses to describe the effect deep learning has had on the image segmentation domain. Thereafter, most of the major segmentation algorithms are logically categorized, with paragraphs dedicated to their unique contributions. With an ample amount of intuitive explanation, the reader is expected to gain an improved ability to visualize the internal dynamics of these processes.
The number of applications being developed that require access to knowledge about the real world has increased rapidly over the past two decades. Domain ontologies, which formalize the terms being used in a discipline, have become essential for research in areas such as Machine Learning, the Internet of Things, Robotics, and Natural Language Processing because they enable separate systems to exchange information. The quality of these domain ontologies, however, must be assured for meaningful communication. Assessing the quality of domain ontologies for their suitability to potential applications remains difficult, even though a variety of frameworks and metrics have been developed for doing so. This paper reviews domain ontology assessment efforts to highlight the work that has been carried out and to clarify the important issues that remain. These assessment efforts are classified into six distinct evaluation approaches, and the state of the art of each is described. Challenges associated with domain ontology assessment are outlined and recommendations made for future research and applications.
Autonomous robotic systems are complex, hybrid, and often safety-critical, which makes their formal specification and verification uniquely challenging. Though commonly used, testing and simulation alone are insufficient to ensure the correctness of, or provide sufficient evidence for the certification of, autonomous robotics. Formal methods for autonomous robotics have received some attention in the literature, but no resource provides a current overview. This paper systematically surveys the state of the art in formal specification and verification for autonomous robotics. Specifically, it identifies and categorises the challenges posed by, the formalisms aimed at, and the formal approaches for, the specification and verification of autonomous robotics.
In this survey, 105 papers related to interactive clustering were reviewed according to seven perspectives: (1) on what level the interaction happens, (2) what interactive operations are involved, (3) how user feedback is incorporated, (4) how interactive clustering is evaluated, (5) which data and (6) which clustering methods have been used, and (7) what challenges have been outlined. This paper serves as a comprehensive overview of the field, outlines the state of the art within the area, and identifies challenges and future research needs.
The skin offers exciting possibilities for human-computer interaction by enabling new types of input and feedback. We survey 42 research papers on interfaces that allow users to give input on their skin. Skin-based interfaces have developed rapidly over the past eight years but most work consists of individual prototypes, with limited overview of possibilities or identification of research directions. The purpose of this article is to synthesize what skin input is, which technologies can sense input on the skin, and how to give feedback to the user. We discuss challenges for research in each of these areas.
Coarse-grained reconfigurable architecture (CGRA) is attracting increasing interest in both academia and industry because of its excellent balance between energy efficiency and flexibility. As integrated circuit technology continues to scale down, CGRA has become a promising solution to the problem of the power wall. However, CGRA is still immature in terms of programmability, productivity, and adaptability. Therefore, a thorough review of CGRA architecture and design is undertaken to guide improvement. First, a novel multidimensional taxonomy is presented. Afterwards, all the major challenges CGRA has encountered and the corresponding state-of-the-art techniques are surveyed and analyzed. Finally, future development is discussed.
A systematic literature review is presented that surveyed the topic of cloud testing over the period 2012-2017. Cloud testing can refer either to testing cloud-based systems (testing of the cloud) or to leveraging the cloud for the purpose of testing (testing in the cloud): both approaches (and their combination into testing of the cloud in the cloud) have drawn research interest. An extensive paper search was conducted both by automated query of popular digital libraries and by snowballing, which resulted in the final selection of 147 primary studies. Over the course of the study, a classification framework was incrementally derived. The paper includes a quantitative analysis of the primary studies against this framework, as well as a discussion of their main highlights. We conclude that cloud testing is an active and variegated research field, although not all topics have so far received enough attention.
Document layout analysis (DLA) is a preprocessing step of document understanding systems, responsible for detecting and annotating a document's physical structure. DLA has several important applications such as document retrieval, content categorization, and text recognition. Although DLA started as a simple preprocessing phase of document understanding systems, it has eventually become a separate and challenging research topic. The ultimate objective of DLA is to ease the subsequent analysis/recognition phases by initially identifying a document's homogeneous blocks. The DLA process consists of several phases that may vary from one system to another depending on the types of document layouts and the analysis sub-objectives. In this regard, a universal DLA algorithm that fits all types of document layouts or all analysis objectives has not yet been developed. In this survey paper, we present a critical study of different document layout analysis techniques. It highlights the motivations for pursuing DLA, and comprehensively discusses the different phases of DLA algorithms based on a general framework formed as an outcome of reviewing the research in the field. The framework consists of preprocessing, layout analysis strategies, post-processing, and performance evaluation phases. Overall, the paper aims to deliver an essential baseline for pursuing further research in document layout analysis.
Malware still constitutes a major threat in the cybersecurity landscape, also due to the widespread use of infection vectors such as documents and other media formats. These infection vectors hide embedded malicious code to the victim users, thus facilitating the use of social engineering techniques to infect their machines. In the last decade, machine-learning algorithms provided an effective defense against such threats, being able to detect malware embedded in various infection vectors. However, the existence of an arms race in an adversarial setting like that of malware detection has recently questioned their appropriateness for this task. In this work, we focus on malware embedded in PDF files, as a representative case of how such an arms race can evolve. We first provide a comprehensive taxonomy of PDF malware attacks, and of the various learning-based detection systems that have been proposed to detect them. Then, we discuss more sophisticated attack algorithms that craft evasive PDF malware oriented to bypass such systems. We describe state-of-the-art mitigation techniques, highlighting that designing robust machine-learning algorithms remains a challenging open problem. We conclude the paper by providing a set of guidelines for designing more secure systems against the threat of adversarial malicious PDF files.
Recent advances in Internet of Things (IoT) have enabled myriad domains such as smart homes, personal monitoring devices, and enhanced manufacturing. IoT is now pervasive---new applications are being used in nearly every conceivable environment, which leads to the adoption of device-based interaction and automation. However, IoT has also raised issues about the security and privacy of these digitally augmented spaces. Program analysis is crucial in identifying those issues, yet the application and scope of program analysis in IoT remains largely unexplored by the technical community. In this paper, we study privacy and security issues in IoT that require program-analysis techniques with an emphasis on identified attacks against these systems and defenses implemented so far. Based on a study of five IoT programming platforms, we identify the key insights that result from research efforts in both the program analysis and security communities and relate the efficacy of program-analysis techniques to security and privacy issues. We conclude by studying recent IoT analysis systems and exploring their implementations. Through these explorations, we highlight key challenges and opportunities in calibrating for the environments in which IoT systems will be used.
In machine learning, data imbalance poses challenges for data analytics in almost all areas of real-world research. Raw primary data often suffer from a skewed distribution of one class over the other, as in computer vision, information security, marketing, and medical science. The goal of this paper is to present a comparative analysis of approaches from the data-preprocessing, algorithmic, and hybrid paradigms for contemporary imbalanced data analysis, and to compare them in light of different data distributions and application areas.
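The data-preprocessing paradigm mentioned above can be illustrated with its simplest representative, random oversampling, which rebalances classes by duplicating minority samples. This is a generic sketch, not the method of any particular work in the comparison; the function name and interface are assumptions:

```python
import random

def random_oversample(X, y, seed=0):
    """Balance a dataset by duplicating minority-class samples.

    Samples from each under-represented class are drawn with
    replacement until every class matches the majority-class count.
    """
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(v) for v in by_class.values())
    Xb, yb = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for xi in items + extra:
            Xb.append(xi)
            yb.append(label)
    return Xb, yb

Xb, yb = random_oversample([1, 2, 3, 4, 5], [0, 0, 0, 0, 1])
# yb now contains four samples of each class
```

More elaborate data-level techniques synthesize new minority points (e.g., SMOTE-style interpolation) rather than duplicating existing ones, while algorithmic approaches instead adjust the learner, for example via class-dependent misclassification costs.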
Extracting characteristics from the training datasets of classification problems has proven effective in a number of meta-analyses. Among them, measures of classification complexity can estimate the difficulty of separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the existing measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn be focused on the challenging characteristics of the problems. This paper surveys and analyzes measures that can be extracted from the training datasets in order to characterize the complexity of the respective classification problems. Their use in recent literature is also reviewed and discussed, allowing us to identify opportunities for future work in the area. Finally, a description is given of an R package named Extended Complexity Library (ECoL) that implements a set of complexity measures and is made publicly available.
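One of the simplest complexity measures of this kind is the maximum Fisher's discriminant ratio, commonly denoted F1: for each feature, the squared difference of class means divided by the sum of class variances, maximized over features. The sketch below is a minimal illustration of that common formulation (packages such as ECoL may report a transformed variant); the function name and zero-variance handling are assumptions:

```python
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Maximum Fisher's discriminant ratio for a binary problem.

    Higher values suggest that at least one feature separates the
    two classes well (i.e., a less complex problem).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    # guard against zero variance in a feature
    den = np.where(den == 0, np.finfo(float).eps, den)
    return float((num / den).max())
```

Boundary-oriented measures in the survey instead examine, for example, the fraction of points whose nearest neighbor belongs to another class, capturing complexity that per-feature statistics miss.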
Color is a powerful communication component everywhere, not only as part of the message and its meaning but also as a way of discriminating contents. However, 5% of the world population suffers from a visual impairment known as color vision deficiency (CVD), commonly called colorblindness, which constrains color perception. This condition alters the way color is perceived, compromising the reading and understanding of message contents. The issue becomes even more serious with the increasing availability of multimedia contents in computational environments, mainly on the web and other Internet resources, as well as with the growth of graphical software and tools. Aware of this problem, a significant number of research works related to the CVD condition have been described in the literature over the last two decades, in particular those aimed at improving the readability of contents via color enhancement, regardless of whether they include text, images, or both. This survey mainly addresses the state of the art in recoloring algorithms for still images, and identifies the current trends in color adaptation techniques for colorblind people.