NDSS Symposium 2021 Program
07:00 - 07:20
Wednesday Welcome, Awards
07:30 - 08:50
Session 5A: “Smart” Home
Chairs: Adwait Nadkarni, Syed Rafiul Hussain
- Christopher Lentzsch (Ruhr-Universität Bochum), Sheel Jayesh Shah (North Carolina State University), Benjamin Andow (Google), Martin Degeling (Ruhr-Universität Bochum), Anupam Das (North Carolina State University), William Enck (North Carolina State University)
Amazon's voice-based assistant, Alexa, enables users to directly interact with various web services through natural language dialogues. It provides developers with the option to create third-party applications (known as Skills) to run on top of Alexa. While such applications ease users' interaction with smart devices and bolster a number of additional services, they also raise security and privacy concerns due to the personal setting they operate in. This paper aims to perform a systematic analysis of the Alexa skill ecosystem. We perform the first large-scale analysis of Alexa skills, obtained from seven different skill stores totaling 90,194 unique skills. Our analysis reveals several limitations in the current skill vetting process. We show that not only can a malicious user publish a skill under any arbitrary developer/company name, but she can also make backend code changes after approval to coax users into revealing unwanted information. We next formalize the different skill-squatting techniques and evaluate their efficacy. We find that while certain approaches are more favorable than others, there is no substantial abuse of skill squatting in the real world. Lastly, we study the prevalence of privacy policies across different categories of skills, and more importantly the policy content of skills that use the Alexa permission model to access sensitive user data. We find that around 23.3% of such skills do not fully disclose the data types associated with the permissions requested. We conclude by providing some suggestions for strengthening the overall ecosystem and thereby enhancing transparency for end users.
- Wenbo Ding (Clemson University), Hongxin Hu (University at Buffalo), Long Cheng (Clemson University)
Internet of Things (IoT) platforms bring significant convenience for increased home automation. In particular, these platforms provide many new features for managing multiple IoT devices to control their physical surroundings. However, these features also bring new safety and security challenges. For example, an attacker can manipulate IoT devices to launch attacks through unexpected physical interactions. Unfortunately, little existing research investigates the physical interactions among IoT devices and their impact on IoT safety and security. In this paper, we propose a novel dynamic safety and security policy enforcement system called IoTSafe, which can capture and manage real physical interactions, considering contextual features, on smart home platforms. To identify real physical interactions of IoT devices, we present a runtime physical interaction discovery approach, which employs both static analysis and dynamic testing techniques to identify runtime physical interactions among IoT devices. In addition, IoTSafe generates physical and non-physical interaction paths and their context in a multi-app environment. Based on paths and context data, IoTSafe constructs physical models for temporal physical interactions, which can predict incoming risky situations and block unsafe device states accordingly. We implement a prototype of IoTSafe on the SmartThings platform. Our extensive evaluations demonstrate that IoTSafe effectively identifies 39 real physical interactions among 130 potential interactions in our experimental environment. IoTSafe also successfully predicts risky situations related to temporal physical interactions with nearly 96% accuracy and prevents highly risky conditions.
- Haotian Chi (Temple University), Qiang Zeng (University of South Carolina), Xiaojiang Du (Temple University), Lannan Luo (University of South Carolina)
Internet of Things (IoT) platforms enable users to deploy home automation applications. Meanwhile, privacy issues arise as large amounts of sensitive device data flow out to IoT platforms. Most of the data flowing out to a platform do not actually trigger automation actions, while homeowners currently have no control once devices are bound to the platform. We present PFirewall, a customizable data-flow control system to enhance the privacy of IoT platform users. PFirewall automatically generates data-minimization policies, which disclose only the minimum amount of data needed to fulfill automation. In addition, PFirewall provides interfaces for homeowners to customize individual privacy preferences by defining user-specified policies. To enforce these policies, PFirewall transparently intervenes in and mediates the communication between IoT devices and the platform, without modifying the platform, IoT devices, or hub. Evaluation results on four real-world testbeds show that PFirewall reduces IoT data sent to the platform by 97% without impairing home automation, and effectively mitigates user-activity inference/tracking attacks and other privacy risks.
- Guoming Zhang (Zhejiang University), Xiaoyu Ji (Zhejiang University), Xinfeng Li (Zhejiang University), Gang Qu (University of Maryland), Wenyuan Xu (Zhejiang University)
DolphinAttacks (i.e., inaudible voice commands) modulate audible voices over ultrasounds to inject malicious commands silently into voice assistants and manipulate controlled systems (e.g., doors or smart speakers). Eliminating DolphinAttacks is challenging, if possible at all, since it requires modifying the microphone hardware. In this paper, we design EarArray, a lightweight method that can not only detect such attacks but also identify the direction of attackers without requiring any extra hardware or hardware modification. Essentially, inaudible voice commands are modulated on ultrasounds, which inherently attenuate faster than audible sounds. By inspecting the command sound signals via the built-in multiple microphones on smart devices, EarArray is able to estimate the attenuation rate and thus detect the attacks. We propose a model of the propagation of audible sounds and ultrasounds from the sound source to a voice assistant, e.g., a smart speaker, and illustrate the underlying principle and its feasibility. We implemented EarArray using two specially designed microphone arrays, and our experiments show that EarArray can detect inaudible voice commands with an accuracy of 99% and recognize the direction of the attackers with an accuracy of 97.89%.
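The core signal-level intuition lends itself to a compact illustration. Below is a minimal numeric sketch of attenuation-rate detection in the spirit of EarArray, not the authors' implementation: the array geometry, decay constants, and detection threshold are all invented for the example.

```python
# Illustrative sketch: ultrasound-borne commands decay much faster across
# a microphone array than ordinary audible commands.
import numpy as np

def command_energy(signal):
    """Mean power of the recorded command segment at one microphone."""
    return np.mean(signal.astype(float) ** 2)

def attenuation_rate(mic_signals, mic_positions):
    """Fit log-energy vs. distance; a steeper negative slope = faster decay."""
    energies = np.array([command_energy(s) for s in mic_signals])
    slope, _ = np.polyfit(mic_positions, np.log(energies + 1e-12), 1)
    return slope

def is_inaudible_attack(mic_signals, mic_positions, threshold=-8.0):
    # Threshold is a made-up placeholder; a real system would calibrate it
    # per device and per frequency band.
    return attenuation_rate(mic_signals, mic_positions) < threshold

# Toy usage: four mics 2 cm apart, energies decaying quickly with distance.
rng = np.random.default_rng(0)
positions = np.array([0.00, 0.02, 0.04, 0.06])  # meters
signals = [rng.normal(scale=10 * np.exp(-20 * d), size=16000) for d in positions]
print(is_inaudible_attack(signals, positions))  # True for this fast decay
```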
07:30 - 08:50
Session 5B: Software Defenses
Chairs: Lucas Davi, Zhenkai Liang
- Ting Chen (University of Electronic Science and Technology of China), Rong Cao (University of Electronic Science and Technology of China), Ting Li (University of Electronic Science and Technology of China), Xiapu Luo (The Hong Kong Polytechnic University), Guofei Gu (Texas A&M University), Yufei Zhang (University of Electronic Science and Technology of China), Zhou Liao (University…
Smart contracts have become lucrative and profitable targets for attackers because they can hold large amounts of money. Although there are already many studies on discovering the vulnerabilities in smart contracts, they can neither guarantee discovering all vulnerabilities nor protect deployed smart contracts against attacks, because they rely on offline analysis. Recently, a few online protection approaches have appeared, but they only focus on specific attacks and cannot be easily extended to defend against other attacks. Developing a new online protection system for smart contracts from scratch is time-consuming and requires familiarity with the internals of the smart contract runtime, thus making it difficult to quickly implement and deploy mechanisms to defend against new attacks.
In this paper, we propose a novel generic runtime protection framework named SPA for smart contracts on any blockchain that supports the Ethereum virtual machine (EVM). SPA distinguishes itself from existing online protection approaches through its capability, efficiency, and compatibility. First, SPA empowers users to easily develop and deploy protection apps for defending against various attacks by separating information collection, attack detection, and reaction in a layered design. At the higher layer, SPA provides unified interfaces to develop protection apps against various attacks. At the lower layer, SPA instruments the EVM to collect all primitive information necessary to detect various attacks and constructs 11 kinds of structural information for the ease of developing protection apps.
Based on SPA, users can develop new protection apps in a few lines of code without modifying the EVM and easily deploy them to the blockchain. Second, SPA is efficient, because we design on-demand information retrieval to reduce the overhead of information collection and adopt dynamic linking to eliminate the overhead of inter-process communication (IPC). It allows users to develop protection apps in any programming language that can generate dynamic link libraries (DLLs). Third, since more and more blockchains adopt the EVM as their smart contract runtime, SPA can be easily migrated to such blockchains without modifying the protection apps. Based on SPA, we develop 8 protection apps to defend against attacks exploiting major vulnerabilities in smart contracts, and integrate SPA (including all protection apps) into 3 popular blockchains: Ethereum, Expanse, and Wanchain. Extensive experimental results demonstrate the effectiveness and efficiency of SPA and our protection apps.
- Min Zheng (Orion Security Lab, Alibaba Group), Xiaolong Bai (Orion Security Lab, Alibaba Group), Yajin Zhou (Zhejiang University), Chao Zhang (Institute for Network Science and Cyberspace, Tsinghua University), Fuping Qu (Orion Security Lab, Alibaba Group)
Apple devices (e.g., iPhone, MacBook, iPad, and Apple Watch) are high value targets for attackers. Although these devices use different operating systems (e.g., iOS, macOS, iPadOS, watchOS, and tvOS), they are all based on a hybrid kernel called XNU. Existing attacks demonstrated that vulnerabilities in XNU could be exploited to escalate privileges and jailbreak devices. To mitigate these threats, multiple security mechanisms have been deployed in the latest systems.
In this paper, we first perform a systematic assessment of the mitigations deployed by Apple, and demonstrate that most of them can be bypassed by corrupting a special type of kernel object, i.e., Mach port objects. We summarize this type of attack as (Mach) Port Object-Oriented Programming (POP). Accordingly, we define multiple attack primitives to launch the attack and demonstrate realistic scenarios to achieve full memory manipulation on recently released systems (i.e., iOS 13 and macOS 10.15). To defend against POP, we propose the Port Ultra-SHield (PUSH) system to reduce the number of unprotected Mach port objects. Specifically, PUSH automatically locates potential POP primitives and instruments related system calls to enforce the integrity of Mach port kernel objects. It does not require system modifications and only introduces 2% runtime overhead. The PUSH framework has been deployed on more than 40,000 macOS devices in a leading company. The evaluation of 18 public exploits and one zero-day exploit detected by our system demonstrates the effectiveness of PUSH. We believe that the proposed framework will facilitate the design and implementation of a more secure XNU kernel.
- Evan Johnson (University of California San Diego), David Thien (University of California San Diego), Yousef Alhessi (University of California San Diego), Shravan Narayan (University Of California San Diego), Fraser Brown (Stanford University), Sorin Lerner (University of California San Diego), Tyler McMullen (Fastly Labs), Stefan Savage (University of California San Diego), Deian Stefan (University of California…
WebAssembly (Wasm) is a platform-independent bytecode that offers both good performance and runtime isolation. To implement isolation, the compiler inserts safety checks when it compiles Wasm to native machine code. While this approach is cheap, it also requires trust in the compiler's correctness---trust that the compiler has inserted each necessary check, correctly formed, in each proper place. Unfortunately, subtle bugs in the Wasm compiler can break---and have broken---isolation guarantees. To address this problem, we propose verifying memory isolation of Wasm binaries post-compilation. We implement this approach in VeriWasm, a static offline verifier for native x86-64 binaries compiled from Wasm; we prove the verifier's soundness, and find that it can detect bugs with no false positives. Finally, we describe our deployment of VeriWasm at Fastly.
- Navid Emamdoost (University of Minnesota), Qiushi Wu (University of Minnesota), Kangjie Lu (University of Minnesota), Stephen McCamant (University of Minnesota)
The kernel space is shared by hardware and all processes, so its memory usage is more limited, and memory is harder to reclaim, compared to user-space memory; as a result, memory leaks in the kernel can easily lead to high-impact denial of service. The problem is particularly critical in long-running servers. Kernel code makes heavy use of dynamic (heap) allocation, and many code modules within the kernel provide their own abstractions for customized memory management. On the other hand, the kernel code involves highly complicated data flow, so it is hard to determine where an object is supposed to be released. Given the complex and critical nature of OS kernels, as well as the heavy specialization, existing methods largely fail at effectively and thoroughly detecting kernel memory leaks.
In this paper, we present K-MELD, a static detection system for kernel memory leaks. K-MELD features multiple new techniques that can automatically identify specialized allocation/deallocation functions and determine the expected memory-release locations. Specifically, we first develop a usage- and structure-aware approach to effectively identify specialized allocation functions, and employ a new rule-mining approach to identify the corresponding deallocation functions. We then develop a new ownership reasoning mechanism that employs enhanced escape analysis and consumer-function analysis to infer expected release locations. By applying K-MELD to the Linux kernel, we confirm its effectiveness: it finds 218 new bugs, with 41 CVEs assigned. Out of those 218 bugs, 115 are in specialized modules.
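The allocator/deallocator pairing lends itself to a small worked example. The sketch below mines release-function candidates from co-occurrence statistics over execution paths; it is a toy illustration of the rule-mining idea, not K-MELD's implementation, and the path data and function names are invented.

```python
# Toy rule mining: rank functions by how often they follow a candidate
# allocator on the same path; the top candidate is its likely release.
from collections import Counter

def mine_release_candidates(paths, alloc):
    """Return (function, support) pairs for functions following `alloc`."""
    follows = Counter()
    paths_with_alloc = 0
    for path in paths:
        if alloc in path:
            paths_with_alloc += 1
            idx = path.index(alloc)
            follows.update(set(path[idx + 1:]))
    return [(f, n / paths_with_alloc) for f, n in follows.most_common()]

paths = [
    ["foo_create", "foo_init", "foo_use", "foo_destroy"],
    ["foo_create", "foo_use", "foo_destroy"],
    ["foo_create", "foo_use"],          # a potential leak path
    ["bar_open", "bar_close"],
]
# foo_destroy follows foo_create on 2 of 3 paths -> the top candidate.
print(mine_release_candidates(paths, "foo_create"))
```

A path lacking the mined release function (the third one above) is exactly the kind of location where an expected-release analysis would flag a leak.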
07:30 - 08:50
Session 5C: Machine Learning
Chairs: Saman Zonouz, Minhui (Jason) Xue
- Jack P. K. Ma (The Chinese University of Hong Kong), Raymond K. H. Tai (The Chinese University of Hong Kong), Yongjun Zhao (Nanyang Technological University), Sherman S.M. Chow (The Chinese University of Hong Kong)
Decision trees are popular machine-learning classification models due to their simplicity and effectiveness. Tai et al. (ESORICS '17) propose a privacy-preserving decision-tree evaluation protocol purely based on additive homomorphic encryption, without introducing dummy nodes for hiding the tree structure, but it runs a secure comparison for each decision node, resulting in linear complexity. Later protocols (DBSEC '18, PETS '19) achieve sublinear (client-side) complexity, yet the server-side path evaluation requires oblivious transfer among 2^d real and dummy nodes even for a sparse tree of depth d to hide the tree structure.
This paper aims for the best of both worlds and hence the most lightweight protocol to date. Our complete-tree protocol can be easily extended to the sparse-tree setting and the reusable outsourcing setting: a model owner (resp. client) can outsource the decision tree (resp. attributes) to two non-colluding servers for classifications. The outsourced extension supports multi-client joint evaluation, which is the first of its kind without using multi-key fully-homomorphic encryption (TDSC '19). We also extend our protocol for achieving privacy against malicious adversaries.
Our experiments compare, in various network settings, our offline and online communication costs and online computation time with the prior sublinear protocol of Tueno et al. (PETS '19) and the O(1)-round linear protocols of Kiss et al. (PETS '19), which can be seen as garbled circuit variants of Tai et al.'s. Our protocols are shown to be desirable for IoT-like scenarios with weak clients and big-data scenarios with high-dimensional feature vectors.
- Bo Hui (The Johns Hopkins University), Yuchen Yang (The Johns Hopkins University), Haolin Yuan (The Johns Hopkins University), Philippe Burlina (The Johns Hopkins University Applied Physics Laboratory), Neil Zhenqiang Gong (Duke University), Yinzhi Cao (The Johns Hopkins University)
Membership inference (MI) attacks affect user privacy by inferring whether given data samples have been used to train a target learning model, e.g., a deep neural network. There are two types of MI attacks in the literature, i.e., those with and without shadow models. The success of the former heavily depends on the quality of the shadow model, i.e., the transferability between the shadow and the target; the latter, given only black-box probing access to the target model, cannot make an effective inference of unknowns, compared with MI attacks using shadow models, due to the insufficient number of qualified samples labeled with ground-truth membership information.
In this paper, we propose an MI attack, called BLINDMI, which probes the target model and extracts membership semantics via a novel approach called differential comparison. The high-level idea is that BLINDMI first generates a dataset of non-members by transforming existing samples into new samples, and then differentially moves samples from a target dataset to the generated non-member set in an iterative manner. If the differential move of a sample increases the set distance, BLINDMI considers the sample a non-member, and vice versa.
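The differential move can be written down in a few lines. The sketch below uses Maximum Mean Discrepancy as the set distance over model output probability vectors; it is an illustrative re-implementation of the idea rather than the authors' code, and the kernel width, round count, and synthetic data are arbitrary choices.

```python
import numpy as np

def mmd(X, Y, gamma=1.0):
    """Maximum Mean Discrepancy between two sample sets (RBF kernel)."""
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def blindmi_like(target_probs, nonmember_probs, rounds=3):
    """Mark a target sample as non-member if moving it into the generated
    non-member set increases the distance between the two sets."""
    member = np.ones(len(target_probs), dtype=bool)
    for _ in range(rounds):
        for i in range(len(target_probs)):
            if not member[i]:
                continue
            kept = member.copy()
            kept[i] = False
            if not kept.any():
                break
            base = mmd(target_probs[member], nonmember_probs)
            moved = np.vstack([nonmember_probs, target_probs[i:i + 1]])
            if mmd(target_probs[kept], moved) > base:
                member[i] = False
    return member

# Toy data: confident outputs mimic members, diffuse outputs non-members.
rng = np.random.default_rng(1)
target = np.vstack([rng.dirichlet([10, 1, 1], 10),   # member-like
                    rng.dirichlet([2, 2, 2], 10)])   # non-member-like
generated_nonmembers = rng.dirichlet([2, 2, 2], 10)
print(blindmi_like(target, generated_nonmembers).astype(int))
```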
BLINDMI was evaluated by comparing it with state-of-the-art MI attack algorithms. Our evaluation shows that BLINDMI improves the F1-score by nearly 20% compared to the state of the art on some datasets, such as Purchase-50 and Birds-200, in the blind setting where the adversary knows neither the target model's architecture nor the target dataset's ground-truth labels. We also show that BLINDMI can defeat state-of-the-art defenses.
- Qiao Zhang (Old Dominion University), Chunsheng Xin (Old Dominion University), Hongyi Wu (Old Dominion University)
Machine Learning as a Service (MLaaS) is enabling a wide range of smart applications on end devices. However, privacy remains a fundamental challenge. Schemes that exploit Homomorphic Encryption (HE)-based linear computations and Garbled Circuit (GC)-based nonlinear computations have demonstrated superior performance in enabling privacy-preserving MLaaS. Nevertheless, there is still a significant gap in computation speed. Our investigation found that the HE-based linear computation dominates the total computation time for state-of-the-art deep neural networks. Furthermore, the most time-consuming component of the HE-based linear computation is a series of Permutation (Perm) operations that are imperative for dot product and convolution in privacy-preserving MLaaS. This work focuses on a deep optimization of the HE-based linear computations to minimize the Perm operations, thus substantially reducing the overall computation time. To this end, we propose GALA: Greedy computAtion for Linear Algebra in privacy-preserved neural networks, which views the HE-based linear computation as a series of Homomorphic Add, Mult, and Perm operations and chooses the least expensive operation in each linear computation step to reduce the overall cost. GALA makes the following contributions: (1) it introduces a row-wise weight matrix encoding and combines the share generation that is needed for the GC-based nonlinear computation, to reduce the Perm operations for the dot product; (2) it designs a first-Add-second-Perm approach (named kernel grouping) to reduce Perm operations for convolution. As such, GALA efficiently reduces the cost of the HE-based linear computation, which is a critical building block in almost all of the recent frameworks for privacy-preserving neural networks, including GAZELLE (Usenix Security'18), DELPHI (Usenix Security'20), and CrypTFlow2 (CCS'20). With its deep optimization of the HE-based linear computation, GALA can be a plug-and-play module integrated into these systems to further boost their efficiency. Our experiments show that it achieves a significant speedup of up to 700× for the dot product and 14× for the convolution computation under different data dimensions. Meanwhile, GALA demonstrates an encouraging runtime boost of 2.5×, 2.7×, 3.2×, 8.3×, 7.7×, and 7.5× over GAZELLE and 6.5×, 6×, 5.7×, 4.5×, 4.2×, and 4.1× over CrypTFlow2, on AlexNet, VGG, ResNet-18, ResNet-50, ResNet-101, and ResNet-152, respectively.
- Junjie Liang (The Pennsylvania State University), Wenbo Guo (The Pennsylvania State University), Tongbo Luo (Robinhood), Vasant Honavar (The Pennsylvania State University), Gang Wang (University of Illinois at Urbana-Champaign), Xinyu Xing (The Pennsylvania State University)
Supervised machine learning classifiers have been widely used for attack detection, but their training requires abundant high-quality labels. Unfortunately, high-quality labels are difficult to obtain in practice due to the high cost of data labeling and the constant evolution of attackers. Without such labels, it is challenging to train and deploy targeted countermeasures.
In this paper, we propose FARE, a clustering method to enable fine-grained attack categorization under low-quality labels. We focus on two common issues in data labels: 1) missing labels for certain attack classes or families; and 2) only coarse-grained labels being available for different attack types. The core idea of FARE is to take full advantage of the limited labels while using the underlying data distribution to consolidate the low-quality labels. We design an ensemble model to fuse the results of multiple unsupervised learning algorithms with the given labels to mitigate the negative impact of missing classes and coarse-grained labels. We then train an input transformation network to map the input data into a low-dimensional latent space for fine-grained clustering. Using two security datasets (Android malware and network intrusion traces), we show that FARE significantly outperforms state-of-the-art (semi-)supervised learning methods in clustering quality/correctness. Further, we perform an initial deployment of FARE by working with a large e-commerce service to detect fraudulent accounts. With real-world A/B tests and manual investigation, we demonstrate the effectiveness of FARE in catching previously unseen fraud.
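The label-fusion idea can be illustrated with a small consensus-clustering sketch: several unsupervised partitions vote on whether two samples belong together, the partial labels cast one more vote, and the fused matrix is clustered. This is a simplified stand-in for FARE's ensemble model (which additionally learns an input transformation network), with all parameters chosen arbitrarily.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def fused_clustering(X, partial_labels, n_clusters=3, n_runs=5, seed=0):
    """Fuse unsupervised partitions and partial labels into a
    co-association matrix, then cluster that matrix."""
    n = len(X)
    co = np.zeros((n, n))
    for r in range(n_runs):
        km = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed + r).fit_predict(X)
        co += km[:, None] == km[None, :]
    known = partial_labels >= 0                    # -1 means "no label"
    same = partial_labels[:, None] == partial_labels[None, :]
    co += same & known[:, None] & known[None, :]   # labels add one more vote
    co /= n_runs + 1
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=seed).fit_predict(co)

# Three synthetic families; only a few samples carry (coarse) labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0, 3, 6)])
labels = np.full(90, -1)
labels[:5], labels[30:35] = 0, 1
print(fused_clustering(X, labels))
```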
09:10 - 10:30
Session 6A: Fuzzing
Chairs: Kangjie Lu, Sazzadur Rahaman
- Hyungsub Kim (Purdue University), Muslum Ozgur Ozmen (Purdue University), Antonio Bianchi (Purdue University), Z. Berkay Celik (Purdue University), Dongyan Xu (Purdue University)
Robotic vehicles (RVs) are becoming essential tools of modern systems, including autonomous delivery services, public transportation, and environment monitoring. Despite their diverse deployment, safety and security issues with RVs limit their wide adoption. Most attempts to date in RV security aim to propose defenses that harden their control program against syntactic bugs, input validation bugs, and external sensor spoofing attacks. In this paper, we introduce PGFUZZ, a policy-guided fuzzing framework, which validates whether an RV adheres to identified safety and functional policies that cover user commands, configuration parameters, and physical states. PGFUZZ expresses desired policies through temporal logic formulas with time constraints as a guide to fuzz the analyzed system. Specifically, it generates fuzzing inputs that minimize a distance metric measuring "how close" the RV's current state is to a policy violation. In addition, it uses static and dynamic analysis to focus the fuzzing effort only on those commands, parameters, and environmental factors that influence the "truth value" of any of the exercised policies. The combination of these two techniques allows PGFUZZ to significantly increase the efficiency of the fuzzing process. We validate PGFUZZ on three RV control programs, ArduPilot, PX4, and Paparazzi, with 56 unique policies. PGFUZZ discovered 156 previously unknown bugs, 106 of which have been acknowledged by their developers.
- Sung Ta Dinh (Arizona State University), Haehyun Cho (Arizona State University), Kyle Martin (North Carolina State University), Adam Oest (PayPal, Inc.), Kyle Zeng (Arizona State University), Alexandros Kapravelos (North Carolina State University), Gail-Joon Ahn (Arizona State University and Samsung Research), Tiffany Bao (Arizona State University), Ruoyu Wang (Arizona State University), Adam Doupe (Arizona State University),…
JavaScript runtime systems include specialized programming interfaces, called binding layers. Binding layers translate data representations between JavaScript and unsafe low-level languages, such as C and C++, by converting data between different types. Due to the wide adoption of JavaScript (and JavaScript engines) across the computing ecosystem, discovering bugs in JavaScript binding layers is critical. Nonetheless, existing JavaScript fuzzers cannot adequately fuzz binding layers due to two major challenges: generating syntactically and semantically correct test cases, and reducing the size of the input space for fuzzing.
In this paper, we propose Favocado, a novel fuzzing approach that focuses on fuzzing binding layers of JavaScript runtime systems. Favocado can generate syntactically and semantically correct JavaScript test cases through the use of extracted semantic information and careful maintenance of execution states. This way, test cases that Favocado generates do not raise unintended runtime exceptions, which substantially increases the chance of triggering binding code. Additionally, exploiting a unique feature (relative isolation) of binding layers, Favocado significantly reduces the size of the fuzzing input space by splitting DOM objects into equivalence classes and focusing fuzzing within each equivalence class.
We demonstrate the effectiveness of Favocado in our experiments and show that Favocado outperforms another state-of-the-art DOM fuzzer, discovering six times more bugs. Finally, during the evaluation, we find 61 previously unknown bugs in four JavaScript runtime systems (Adobe Acrobat Reader, Foxit PDF Reader, Chromium, and WebKit), 33 of which are security vulnerabilities.
- Jinho Jung (Georgia Institute of Technology), Stephen Tong (Georgia Institute of Technology), Hong Hu (Pennsylvania State University), Jungwon Lim (Georgia Institute of Technology), Yonghwi Jin (Georgia Institute of Technology), Taesoo Kim (Georgia Institute of Technology)
Fuzzing is an emerging technique to automatically validate programs and uncover bugs. It has been widely used to test many programs and has found thousands of security vulnerabilities. However, existing fuzzing efforts are mainly centered around Unix-like systems, as Windows imposes unique challenges for fuzzing: a closed-source ecosystem, the heavy use of graphical interfaces and the lack of fast process cloning machinery.
In this paper, we propose two solutions to address the challenges Windows fuzzing faces. Our system, WINNIE, first tries to synthesize a harness for the application, a simple program that directly invokes target functions, based on sample executions. It then tests the harness, instead of the original complicated program, using an efficient implementation of fork on Windows. Using these techniques, WINNIE can bypass irrelevant GUI code to test logic deep within the application. We used WINNIE to fuzz 59 closed-source Windows binaries, and it successfully generated valid fuzzing harnesses for all of them. In our evaluation, WINNIE supported 2.2× more programs than existing Windows fuzzers, identified 3.9× more program states, and achieved 26.6× faster execution. In total, WINNIE found 61 unique bugs in 32 Windows binaries.
- Jinghan Wang (University of California, Riverside), Chengyu Song (University of California, Riverside), Heng Yin (University of California, Riverside)
Coverage metrics play an essential role in greybox fuzzing. Recent work has shown that fine-grained coverage metrics could allow a fuzzer to detect bugs that cannot be covered by traditional edge coverage. However, fine-grained coverage metrics will also select more seeds, which cannot be efficiently scheduled by existing algorithms. This work addresses this problem by introducing a new concept of multi-level coverage metric and the corresponding reinforcement-learning-based hierarchical scheduler. Evaluation of our prototype on DARPA CGC showed that our approach outperforms AFL and AFLFast significantly: it can detect 20% more bugs, achieve higher coverage on 83 out of 180 challenges, and achieve the same coverage on 60 challenges. More importantly, it can detect the same number of bugs and achieve the same coverage faster. On FuzzBench, our approach achieves higher coverage than AFL++ (Qemu) on 10 out of 20 projects.
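Seed scheduling over a finer-grained metric is essentially an explore-exploit problem. The sketch below shows a UCB1 bandit over groups of seeds, a simplified stand-in for the paper's reinforcement-learning-based hierarchical scheduler; the cluster names and reward signal are invented for the example.

```python
import math
import random

class UCBScheduler:
    """UCB1 over clusters of seeds; each fuzzing round picks the cluster
    with the best upper confidence bound on finding new coverage."""
    def __init__(self, clusters):
        self.clusters = clusters                    # name -> list of seeds
        self.pulls = {c: 0 for c in clusters}
        self.wins = {c: 0.0 for c in clusters}
        self.total = 0

    def pick(self):
        self.total += 1
        for c in self.clusters:                     # play every arm once first
            if self.pulls[c] == 0:
                return c, random.choice(self.clusters[c])
        def ucb(c):
            return (self.wins[c] / self.pulls[c]
                    + math.sqrt(2 * math.log(self.total) / self.pulls[c]))
        best = max(self.clusters, key=ucb)
        return best, random.choice(self.clusters[best])

    def update(self, cluster, found_new_coverage):
        self.pulls[cluster] += 1
        self.wins[cluster] += 1.0 if found_new_coverage else 0.0

# Toy loop: pretend seeds in the "hash-new" cluster keep finding coverage.
sched = UCBScheduler({"edge-new": ["s1", "s2"], "hash-new": ["s3", "s4"]})
for _ in range(100):
    cluster, seed = sched.pick()
    sched.update(cluster, found_new_coverage=(cluster == "hash-new"))
print(sched.pulls)                                  # most pulls go to hash-new
```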
09:10 - 10:30
Session 6B: Embedded Security
Chairs: Lejla Batina, Hamed Okhravi
- Rohit Bhatia (Purdue University), Vireshwar Kumar (Indian Institute of Technology Delhi), Khaled Serag (Purdue University), Z. Berkay Celik (Purdue University), Mathias Payer (EPFL), Dongyan Xu (Purdue University)
The controller area network (CAN) is widely adopted in modern automobiles to enable communications among in-vehicle electronic control units (ECUs). Lacking mainstream network security capabilities due to resource constraints, the CAN is susceptible to the ECU masquerade attack, in which a compromised (attacker) ECU impersonates an uncompromised (victim) ECU and spoofs the latter's CAN messages. A cost-effective state-of-the-art defense against such attacks is the CAN bus voltage-based intrusion detection system (VIDS), which identifies the source of each message using its voltage fingerprint on the bus. Since the voltage fingerprint emanates from an ECU's hardware characteristics, an attacker ECU by itself cannot controllably modify it. As such, VIDS has proven effective in detecting masquerade attacks that each involve a single attacker.
In this paper, we discover a novel voltage corruption tactic that leverages the capabilities of two compromised ECUs (i.e., an attacker ECU working in tandem with an accomplice ECU) to corrupt the bus voltages recorded by the VIDS. By exploiting this tactic along with the fundamental deficiencies of the CAN protocol, we propose a novel masquerade attack called DUET, which evades all existing VIDS irrespective of the features and classification algorithms employed in them. DUET follows a two-stage attack strategy to first manipulate a victim ECU’s voltage fingerprint during VIDS retraining mode, and then impersonate the manipulated fingerprint during VIDS operation mode. Our evaluation of DUET on real CAN buses (including three in two real cars) demonstrates an impersonation success rate of at least 90% in evading two state-of-the-art VIDS.
Finally, to mitigate ECU masquerade attacks, we advocate the development of cost-effective defenses that break away from the "attack vs. IDS" arms race. We propose a lightweight defense called RAID, which enables each ECU to make protocol-compatible modifications to its frame format, generating a unique dialect (spoken by ECUs) during VIDS retraining mode. RAID prevents corruption of ECUs' voltage fingerprints and re-enables VIDS to detect all ECU masquerade attacks, including DUET.
- Christian Niesler (University of Duisburg-Essen), Sebastian Surminski (University of Duisburg-Essen), Lucas Davi (University of Duisburg-Essen)
Memory corruption attacks are a predominant attack vector against IoT devices. Simply updating vulnerable IoT software is not always possible due to unacceptable downtime and a required reboot. These side-effects must be avoided for highly available embedded systems such as medical devices and, generally speaking, for any embedded system with real-time constraints.
To avoid downtime and reboot of a system, previous research has introduced the concept of hotpatching. However, the existing approaches cannot be applied to resource-constrained IoT devices. Furthermore, possible hardware-related issues have not been addressed, i.e., the inability to directly modify the firmware image due to read-only memory.
In this paper, we present the design and implementation of HERA (Hotpatching of Embedded Real-time Applications), which utilizes hardware-based built-in features of commodity Cortex-M microcontrollers to perform hotpatching of embedded systems. HERA preserves hard real-time constraints while keeping the additional resource usage to a minimum. In a case study, we apply HERA to two vulnerable medical devices. Furthermore, we leverage HERA to patch an existing vulnerability in the FreeRTOS operating system. These applications demonstrate the high practicality and efficiency of our approach.
- Wenqiang Li (State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences; Department of Computer Science, the University of Georgia, USA; School of Cyber Security, University of Chinese Academy of Sciences; Department of Electrical Engineering and Computer Science, the University of Kansas, USA), Le Guan (Department of Computer Science, the University…
Finding bugs in microcontroller (MCU) firmware is challenging, even for device manufacturers who own the source code. MCUs run different instruction sets than x86 and expose a very different development environment, which invalidates many existing sophisticated software testing tools built for x86. To maintain a unified development and testing environment, a straightforward approach is to re-compile the source code into a native executable for a commodity machine (called rehosting). However, ad-hoc rehosting is a daunting and tedious task and subject to many issues (library dependence, kernel dependence, and hardware dependence). In this work, we systematically explore the portability problem of MCU software and propose para-rehosting to ease the porting process. Specifically, we abstract and implement a portable MCU (PMCU) using the POSIX interface. It models common functions of MCU cores. For peripheral-specific logic, we propose HAL-based peripheral function replacement, in which high-level hardware functions are replaced with an equivalent backend driver on the host. These backend drivers are invoked by well-designed para-APIs and can be reused across many MCU OSs. We categorize common HAL functions into four types and implement templates for quick backend development. Using the proposed approach, we have successfully rehosted nine MCU OSs, including the widely deployed Amazon FreeRTOS, ARM Mbed OS, Zephyr, and LiteOS. To demonstrate the superiority of our approach in terms of security testing, we used off-the-shelf dynamic analysis tools (AFL and ASAN) against the rehosted programs and discovered 28 previously unknown bugs, among which 5 were assigned CVEs and another 19 were confirmed by vendors at the time of writing.
- Eunsoo Kim (KAIST), Dongkwan Kim (KAIST), CheolJun Park (KAIST), Insu Yun (KAIST), Yongdae Kim (KAIST)
Cellular basebands play a crucial role in mobile communication. However, it is significantly challenging to assess their security for several reasons. Manual analysis is inevitable because of the obscurity and complexity of baseband firmware; however, such analysis requires repetitive effort to cover diverse models or versions. Automating the analysis is also non-trivial because the firmware is large and contains numerous functions associated with complex cellular protocols. Therefore, existing approaches to baseband analysis are limited to only a couple of models or versions within a single vendor. In this paper, we propose a novel approach named BaseSpec, which performs a comparative analysis of baseband software and cellular specifications. By leveraging the standardized message structures in the specification, BaseSpec inspects the message structures implemented in the baseband software systematically. It requires a manual yet one-time analysis effort to determine how the message structures are embedded in target firmware. Then, BaseSpec compares the extracted message structures with those in the specification, syntactically and semantically, and finally reports mismatches. These mismatches indicate developer mistakes that break the compliance of the baseband with the specification, or they imply potential vulnerabilities. We evaluated BaseSpec with 18 baseband firmware images of 9 models from one of the top three vendors and found hundreds of mismatches. By analyzing these mismatches, we discovered 9 erroneous cases: 5 functional errors and 4 memory-related vulnerabilities. Notably, two of these are critical remote code execution 0-days. Moreover, we applied BaseSpec to 3 models from another vendor, and BaseSpec found multiple mismatches, two of which led us to discover a buffer overflow bug.
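The comparison step boils down to diffing two structured descriptions of the same protocol messages. The sketch below illustrates this with invented information-element (IE) records; the message name, IEIs, and length bounds are hypothetical placeholders, and the real system works on structures recovered from firmware binaries.

```python
# Hypothetical IE tables: one derived from the 3GPP specification, one
# extracted from firmware. A firmware length bound wider than the spec's
# is the kind of semantic mismatch that can hide a memory-safety bug.
SPEC = {
    "ATTACH_ACCEPT": {
        "t3412_value": {"iei": 0x5A, "min_len": 1, "max_len": 1},
        "ms_identity": {"iei": 0x23, "min_len": 5, "max_len": 10},
    },
}
FIRMWARE = {
    "ATTACH_ACCEPT": {
        "t3412_value": {"iei": 0x5A, "min_len": 1, "max_len": 1},
        "ms_identity": {"iei": 0x23, "min_len": 5, "max_len": 64},  # suspicious
    },
}

def diff_messages(spec, fw):
    for msg, spec_ies in spec.items():
        fw_ies = fw.get(msg, {})
        for name, s in spec_ies.items():
            f = fw_ies.get(name)
            if f is None:
                print(f"{msg}/{name}: missing in firmware (syntactic mismatch)")
            elif f != s:
                print(f"{msg}/{name}: spec={s} firmware={f} (semantic mismatch)")
        for name in set(fw_ies) - set(spec_ies):
            print(f"{msg}/{name}: unknown IE in firmware (syntactic mismatch)")

diff_messages(SPEC, FIRMWARE)
```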
09:10 - 10:30
Session 6C: Federated Learning and Poisoning Attacks
Chairs: Saman Zonouz, Neil Gong
- Sinem Sav (EPFL), Apostolos Pyrgelis (EPFL), Juan Ramón Troncoso-Pastoriza (EPFL), David Froelicher (EPFL), Jean-Philippe Bossuat (EPFL), Joao Sa Sousa (EPFL), Jean-Pierre Hubaux (EPFL)
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an N-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to N-1 parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrapping operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
- Xiaoyu Cao (Duke University), Minghong Fang (The Ohio State University), Jia Liu (The Ohio State University), Neil Zhenqiang Gong (Duke University)
Byzantine-robust federated learning aims to enable a service provider to learn an accurate global model when a bounded number of clients are malicious. The key idea of existing Byzantine-robust federated learning methods is that the service provider performs statistical analysis among the clients' local model updates and removes suspicious ones before aggregating them to update the global model. However, malicious clients can still corrupt the global models in these methods by sending carefully crafted local model updates to the service provider. The fundamental reason is that there is no root of trust in existing federated learning methods, i.e., from the service provider's perspective, every client could be malicious.
In this work, we bridge the gap by proposing FLTrust, a new federated learning method in which the service provider itself bootstraps trust. In particular, the service provider collects a small, clean training dataset (called the root dataset) for the learning task and maintains a model (called the server model) based on it to bootstrap trust. In each iteration, the service provider first assigns a trust score to each local model update from the clients, where a local model update has a lower trust score if its direction deviates more from the direction of the server model update. Then, the service provider normalizes the magnitudes of the local model updates such that they lie in the same hyper-sphere as the server model update in the vector space. Our normalization limits the impact of malicious local model updates with large magnitudes. Finally, the service provider computes the average of the normalized local model updates weighted by their trust scores as a global model update, which is used to update the global model. Our extensive evaluations on six datasets from different domains show that FLTrust is secure against both existing attacks and strong adaptive attacks. For instance, using a root dataset with fewer than 100 examples, FLTrust under adaptive attacks with 40%-60% of malicious clients can still train global models that are as accurate as the global models trained by FedAvg under no attacks, where FedAvg is a popular method in non-adversarial settings.
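The aggregation rule itself is compact enough to sketch. Below is an illustrative NumPy re-implementation of one FLTrust-style round; the toy updates and constants are invented, and the real method derives the server update from the root dataset each iteration.

```python
import numpy as np

def fltrust_aggregate(server_update, client_updates):
    """Trust score = ReLU(cosine similarity to the server update); client
    updates are rescaled to the server update's magnitude, then averaged
    with the trust scores as weights."""
    g0 = server_update
    n0 = np.linalg.norm(g0)
    scores, scaled = [], []
    for g in client_updates:
        cos = g0 @ g / (np.linalg.norm(g) * n0 + 1e-12)
        scores.append(max(cos, 0.0))                 # clip opposing directions
        scaled.append(g * (n0 / (np.linalg.norm(g) + 1e-12)))
    scores = np.array(scores)
    if scores.sum() == 0:
        return g0                                    # fall back to server update
    return np.average(np.array(scaled), axis=0, weights=scores)

# Toy round: 8 benign updates near the server direction, 2 malicious ones
# pointing the opposite way with large magnitude.
rng = np.random.default_rng(0)
g0 = np.array([1.0, 1.0])
benign = [g0 + rng.normal(0, 0.1, 2) for _ in range(8)]
malicious = [np.array([-10.0, -10.0])] * 2
print(fltrust_aggregate(g0, benign + malicious))     # stays close to (1, 1)
```

The malicious updates receive a zero trust score (negative cosine similarity) and the magnitude normalization caps what any single accepted update can contribute.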
- Virat Shejwalkar (UMass Amherst), Amir Houmansadr (UMass Amherst)
Federated learning (FL) enables many data owners (e.g., mobile devices) to train a joint ML model (e.g., a next-word prediction classifier) without sharing their private training data.
However, FL is known to be susceptible to poisoning attacks by malicious participants (e.g., adversary-owned mobile devices) who aim to hamper the accuracy of the jointly trained model by sending malicious inputs during the federated training process.
In this paper, we present a generic framework for model poisoning attacks on FL. We show that our framework leads to poisoning attacks that substantially outperform state-of-the-art model poisoning attacks. For instance, our attacks result in 1.5× to 60× larger reductions in the accuracy of FL models compared to previously discovered poisoning attacks.
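At its core, such a framework searches for the most aggressive malicious update a robust aggregator will still accept. Below is a bare-bones sketch of that search loop against a trimmed-mean aggregator; the perturbation direction, impact test, and all constants are simplified placeholders rather than the paper's actual formulation.

```python
import numpy as np

def trimmed_mean(updates, k=1):
    """Coordinate-wise trimmed mean: drop the k largest/smallest values."""
    A = np.sort(np.array(updates), axis=0)
    return A[k:len(updates) - k].mean(axis=0)

def craft_malicious_updates(benign, aggregator, n_malicious=3):
    """Search for the largest scaling factor gamma whose perturbed update
    still shifts the robust aggregate noticeably."""
    mean = np.mean(benign, axis=0)
    direction = -np.sign(mean)                # push opposite the benign trend
    for gamma in np.linspace(10.0, 0.0, 50):  # aggressive -> mild
        crafted = mean + gamma * direction
        agg = aggregator(benign + [crafted] * n_malicious)
        if np.linalg.norm(agg - mean) > 0.1 * np.linalg.norm(mean):
            return [crafted] * n_malicious    # accepted and impactful
    return [mean] * n_malicious               # attack failed; blend in

rng = np.random.default_rng(0)
benign = [np.array([1.0, 1.0]) + rng.normal(0, 0.1, 2) for _ in range(8)]
malicious = craft_malicious_updates(benign, trimmed_mean)
print(trimmed_mean(benign + malicious))       # aggregate dragged off target
```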
Our work demonstrates that existing Byzantine-robust FL algorithms are significantly more susceptible to model poisoning than previously thought. Motivated by this, we design a defense against FL poisoning, called divide-and-conquer (DnC). We demonstrate that DnC outperforms all existing Byzantine-robust FL algorithms in defeating model poisoning attacks; specifically, it is 2.5× to 12× more resilient in our experiments with different datasets and models.
- Hai Huang (Tsinghua University), Jiaming Mu (Tsinghua University), Neil Zhenqiang Gong (Duke University), Qi Li (Tsinghua University), Bin Liu (West Virginia University), Mingwei Xu (Tsinghua University)
Recommender systems play a crucial role in helping users find the information they are interested in on various web services such as Amazon, YouTube, and Google News. A variety of recommender systems, ranging from neighborhood-based and association-rule-based to matrix-factorization-based and deep-learning-based, have been developed and deployed in industry. Among them, deep-learning-based recommender systems have become increasingly popular due to their superior performance.
In this work, we conduct the first systematic study of data poisoning attacks on deep-learning-based recommender systems. An attacker's goal is to manipulate a recommender system such that attacker-chosen target items are recommended to many users. To achieve this goal, our attack injects fake users with carefully crafted ratings into a recommender system. Specifically, we formulate our attack as an optimization problem, such that the injected ratings maximize the number of normal users to whom the target items are recommended. However, it is challenging to solve the optimization problem because it is a non-convex integer programming problem. To address the challenge, we develop multiple techniques to approximately solve the optimization problem. Our experimental results on three real-world datasets, including small and large datasets, show that our attack is effective and outperforms existing attacks. Moreover, we attempt to detect fake users via statistical analysis of the rating patterns of normal and fake users. Our results show that our attack remains effective and outperforms existing attacks even when such a detector is deployed.
10:50 - 12:10
Session 7A: Forensics and Audits
Chairs: Zhou Li, Markus Miettinen
- Yonghwi Kwon (University of Virginia), Weihang Wang (University at Buffalo, SUNY), Jinho Jung (Georgia Institute of Technology), Kyu Hyung Lee (University of Georgia), Roberto Perdisci (Georgia Institute of Technology and University of Georgia)
Cybercrime scene reconstruction, which aims to reconstruct a previous execution of the cyber attack delivery process, is an important capability for cyber forensics (e.g., post-mortem analysis of cyber attack executions). Unfortunately, existing techniques such as log-based forensics or record-and-replay techniques are not suitable for handling complex and long-running modern applications for cybercrime scene reconstruction and post-mortem forensic analysis. Specifically, log-based cyber forensics techniques often suffer from a lack of inspection capability and do not provide details of how an attack unfolded. Record-and-replay techniques impose significant runtime overhead, often require significant modifications on end-user systems, and require replaying the entire recorded execution from the beginning. In this paper, we propose C^2SR, a novel technique that can reconstruct an attack delivery chain (i.e., a cybercrime scene) for post-mortem forensic analysis. It provides a highly desired capability: interactable partial execution reconstruction. In particular, it reproduces a partial execution of interest from a large execution trace of a long-running program. The reconstructed execution is also interactable, allowing forensic analysts to leverage debugging and analysis tools that did not exist on the recorded machine. The key intuition behind C^2SR is partitioning an execution trace by resources and reproducing resource accesses that are consistent with the original execution. It tolerates user interactions required for inspections that do not cause inconsistent resource accesses. Our evaluation results on 26 real-world programs show that C^2SR has low runtime overhead (less than 5.47%) and acceptable space overhead. We also demonstrate with four realistic attack scenarios that C^2SR successfully reconstructs partial executions of long-running applications such as web browsers, and that it can remarkably reduce the user's effort to understand an incident.
- Le Yu (Purdue University), Shiqing Ma (Rutgers University), Zhuo Zhang (Purdue University), Guanhong Tao (Purdue University), Xiangyu Zhang (Purdue University), Dongyan Xu (Purdue University), Vincent E. Urias (Sandia National Laboratories), Han Wei Lin (Sandia National Laboratories), Gabriela Ciocarlie (SRI International), Vinod Yegneswaran (SRI International), Ashish Gehani (SRI International)
Cyber-attacks are becoming more persistent and complex. Most state-of-the-art attack forensics techniques either require annotating and instrumenting software applications or rely on high-quality execution profiles to serve as the basis for anomaly detection. We propose a novel attack forensics technique, ALchemist. It is based on the observations that built-in application logs provide critical high-level semantics while the audit log provides low-level fine-grained information, and that the two share many common elements. ALchemist is hence a log fusion technique that couples application logs and the audit log to derive critical attack information invisible in either log alone. It is based on Datalog, a relational reasoning engine, and features the capability of inferring new relations, such as the task structure of execution (e.g., tabs in Firefox), especially in the presence of complex asynchronous execution models, and high-level dependencies between log events. Our evaluation on 15 popular applications, including Firefox, Chromium, and OpenOffice, and 14 APT attacks from the literature demonstrates that although ALchemist does not require instrumentation, it is highly effective in partitioning execution into autonomous tasks (in order to avoid bogus dependencies) and deriving precise attack provenance graphs, with very small overhead. It also outperforms NoDoze and OmegaLog, two state-of-the-art techniques that do not require instrumentation.
- Jun Zeng (National University of Singapore), Zheng Leong Chua (Independent Researcher), Yinfang Chen (National University of Singapore), Kaihang Ji (National University of Singapore), Zhenkai Liang (National University of Singapore), Jian Mao (Beihang University)
Endpoint monitoring solutions are widely deployed in today’s enterprise environments to support advanced attack detection and investigation. These monitors continuously record system-level activities as audit logs and provide deep visibility into security incidents. Unfortunately, to recognize behaviors of interest and detect potential threats, cyber analysts face a semantic gap between low-level audit events and high-level system behaviors. To bridge this gap, existing work largely matches streams of audit logs against a knowledge base of rules that describe behaviors. However, specifying such rules heavily relies on expert knowledge. In this paper, we present Watson, an automated approach to abstracting behaviors by inferring and aggregating the semantics of audit events. Watson uncovers the semantics of events through their usage context in audit logs. By extracting behaviors as connected system operations, Watson then combines event semantics as the representation of behaviors. To reduce analysis workload, Watson further clusters semantically similar behaviors and distinguishes the representatives for analyst investigation. In our evaluation against both benign and malicious behaviors, Watson exhibits high accuracy for behavior abstraction. Moreover, Watson can reduce analysis workload by two orders of magnitude for attack investigation.
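As a rough flavor of behavior abstraction, the sketch below represents each behavior as a bag of audit events and compares behaviors by cosine similarity. Watson itself learns event embeddings from usage context rather than using raw counts, so treat this purely as an invented, minimal analogue; the event vocabulary is made up.

```python
import numpy as np
from collections import Counter

VOCAB = ["open:/etc/passwd", "read", "connect:443", "write:/tmp", "exec:sh"]

def behavior_vector(events, vocab=VOCAB):
    """Bag-of-events representation of one behavior (a connected group
    of system operations)."""
    counts = Counter(events)
    return np.array([counts[v] for v in vocab], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

b1 = behavior_vector(["open:/etc/passwd", "read", "read"])
b2 = behavior_vector(["open:/etc/passwd", "read"])
b3 = behavior_vector(["connect:443", "write:/tmp", "exec:sh"])
# Semantically similar behaviors score high and can be clustered so the
# analyst only inspects one representative per cluster.
print(round(cosine(b1, b2), 2), round(cosine(b1, b3), 2))  # 0.99 0.0
```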
10:50 - 12:10
Session 7B: Trusted Computing
Chairs: Hamed Okhravi, Marcus Peinado
- Hyun Bin Lee (University of Illinois at Urbana-Champaign), Tushar M. Jois (Johns Hopkins University), Christopher W. Fletcher (University of Illinois at Urbana-Champaign), Carl A. Gunter (University of Illinois at Urbana-Champaign)
Users can improve the security of remote communications by using Trusted Execution Environments (TEEs) to protect against direct introspection and tampering of sensitive data. This can even be done with applications coded in high-level languages with complex programming stacks such as R, Python, and Ruby. However, this creates a trade-off between programming convenience and the risk of attacks using microarchitectural side channels.
In this paper, we argue that it is possible to address this problem for important applications by instrumenting a complex programming environment (like R) to produce a Data-Oblivious Transcript (DOT) that is explicitly designed to support computation that excludes side channels. Such a transcript is then evaluated in a TEE containing the sensitive data, using a small trusted computing base called the Data-Oblivious Virtual Environment (DOVE).
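To give a flavor of what "data-oblivious" means in practice, the sketch below shows the standard branch-to-arithmetic idiom: a secret-dependent if/else is replaced by masking so the same instructions run regardless of the secret. This is only a toy; a real DOT must also keep memory access patterns independent of secrets.

```python
import numpy as np

ALL_ONES = np.uint64(0xFFFFFFFFFFFFFFFF)

def oblivious_select(cond, a, b):
    """Return a if cond else b, without a secret-dependent branch."""
    mask = np.uint64(cond) * ALL_ONES          # all-ones or all-zeros
    return (a & mask) | (b & ~mask)

def oblivious_max(values):
    """Maximum of a sequence, doing identical work on every path."""
    m = values[0]
    for v in values[1:]:
        m = oblivious_select(int(v > m), v, m)
    return m

secret = np.array([3, 9, 4], dtype=np.uint64)
print(oblivious_max(secret))                   # 9
```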
To motivate the problem, we demonstrate a number of subtle side-channel vulnerabilities in the R language. We then provide an illustrative design and implementation of DOVE for R, creating the first side-channel-resistant R programming stack. We demonstrate that the two-phase architecture provided by DOT generation and DOVE evaluation can provide practical support for complex programming languages with usable performance and high security assurances against side channels.
- Adil Ahmad (Purdue University), Juhee Kim (Seoul National University), Jaebaek Seo (Google), Insik Shin (KAIST), Pedro Fonseca (Purdue University), Byoungyoung Lee (Seoul National University)
Intel SGX aims to provide the confidentiality of user data on untrusted cloud machines. However, applications that process confidential user data may contain bugs that leak information or be programmed maliciously to collect user data. Existing research that attempts to solve this problem does not consider multi-client isolation in a single enclave. We show that by not supporting such isolation, they incur considerable slowdown when concurrently processing multiple clients in different processes, due to the limitations of SGX.
This paper proposes CHANCEL, a sandbox designed for multi-client isolation within a single SGX enclave. In particular, CHANCEL allows a program's threads to access both a per-thread memory region and a shared read-only memory region while servicing requests. Each thread handles requests from a single client at a time and is isolated from other threads using a Multi-Client Software Fault Isolation (MCSFI) scheme. Furthermore, CHANCEL supports various in-enclave services such as an in-memory file system and shielded client communication to ensure complete mediation of the program's interactions with the outside world. We implemented CHANCEL and evaluated it on SGX hardware using both micro-benchmarks and realistic target scenarios, including private information retrieval and product recommendation services. Our results show that CHANCEL outperforms a baseline multi-process sandbox by 4.06-53.70× on micro-benchmarks and 0.02-21.18× on realistic workloads while providing strong security guarantees.
- Rongzhen Cui (University of Toronto), Lianying Zhao (Carleton University), David Lie (University of Toronto)
There has been interest in mechanisms that enable the secure use of legacy code to implement trusted code in a Trusted Execution Environment (TEE), such as Intel SGX. However, because legacy code generally assumes the presence of an operating system, this naturally raises the spectre of Iago attacks on the legacy code. We observe that not all legacy code is vulnerable to Iago attacks and that legacy code must use return values from system calls in an unsafe way to have Iago vulnerabilities.
Based on this observation, we develop Emilia, which automatically detects Iago vulnerabilities in legacy applications by fuzzing applications using system call return values. We use Emilia to discover 51 Iago vulnerabilities in 17 applications, and find that Iago vulnerabilities are widespread and common. We conduct an in-depth analysis of the vulnerabilities we found and conclude that while common, the majority (82.4%) can be mitigated with simple, stateless checks in the system call forwarding layer, while the rest are best fixed by finding and patching them in the legacy code. Finally, we study and evaluate different trade-offs in the design of Emilia.
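The fuzzing idea can be demonstrated end-to-end at toy scale: put the "OS" behind a callable interface and mutate the return values it hands back. The vulnerable function below is an invented example of the unsafe-use pattern (trusting a syscall-reported length); Emilia itself instruments real applications and real system calls.

```python
import random

def vulnerable_copy(read_syscall):
    """Invented Iago pattern: trusts the length reported by the OS
    (nread) instead of the data actually received."""
    buf = bytearray(64)
    data, nread = read_syscall(len(buf))
    for i in range(nread):                      # BUG: nread is attacker-controlled
        buf[i] = data[i % max(len(data), 1)]
    return buf

def fuzz_iago(target, trials=100):
    """Drive the target with fuzzed syscall return values."""
    for _ in range(trials):
        lie = random.randint(0, 4096)           # the fuzzed return value
        try:
            target(lambda n, lie=lie: (b"A" * min(n, 8), lie))
        except IndexError as e:
            return f"Iago vulnerability triggered with nread={lie}: {e}"
    return "no crash observed"

print(fuzz_iago(vulnerable_copy))
```

A stateless check in the forwarding layer (e.g., clamping nread to the requested length) would mitigate this instance, matching the paper's observation that most found vulnerabilities admit such simple fixes.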
10:50 - 12:10
Session 7C: Machine Learning Applications
Chairs: Anupam Das, Stjepan Picek
- Hieu Le (University of California, Irvine), Athina Markopoulou (University of California, Irvine), Zubair Shafiq (University of California, Davis)
The adblocking arms race has escalated over the last few years. An entirely new ecosystem of circumvention (CV) services has recently emerged that aims to bypass adblockers by obfuscating site content, making it difficult for adblocking filter lists to distinguish between ads and functional content. In this paper, we investigate recent anti-circumvention efforts by the adblocking community that leverage custom filter lists. In particular, we analyze the anti-circumvention filter list (ACVL), which supports advanced filter rules with enriched syntax and capabilities designed specifically to counter circumvention. We show that keeping ACVL rules up-to-date requires expert list curators to continuously monitor sites known to employ CV services and to discover new such sites in the wild — both tasks require considerable manual effort. To help automate and scale ACVL curation, we develop CV-INSPECTOR, a machine learning approach for automatically detecting adblock circumvention using differential execution analysis. We show that CV-INSPECTOR achieves 93% accuracy in detecting sites that successfully circumvent adblockers. We deploy CV-INSPECTOR on the top-20K sites to discover the sites that employ circumvention in the wild. We further apply CV-INSPECTOR to a list of sites that are known to utilize circumvention and are closely monitored by ACVL authors. We demonstrate that CV-INSPECTOR reduces the human labeling effort by 98%, which removes a major bottleneck for ACVL authors. Our work is the first large-scale study of the state of the adblock circumvention arms race, and makes an important step towards automating anti-CV efforts.
- Diogo Barradas (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa), Nuno Santos (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa), Luis Rodrigues (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa), Salvatore Signorello (LASIGE, Faculdade de Ciências, Universidade de Lisboa), Fernando M. V. Ramos (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa), André Madeira (INESC-ID, Instituto Superior Técnico, Universidade de…
An emerging trend in network security consists in the adoption of programmable switches for performing various security tasks in large-scale, high-speed networks. However, since existing solutions are tailored to specific tasks, they cannot accommodate a growing variety of ML-based security applications, i.e., security-focused tasks that perform targeted flow classification based on packet size or inter-packet frequency distributions with the help of supervised machine learning algorithms. We present FlowLens, a system that leverages programmable switches to efficiently support multi-purpose ML-based security applications. FlowLens collects features of packet distributions at line speed and classifies flows directly on the switches, enabling network operators to re-purpose this measurement primitive at run-time to serve a different flow classification task. To cope with the resource constraints of programmable switches, FlowLens computes for each flow a memory-efficient representation of relevant features, named a "flow marker". Despite its small size, a flow marker contains enough information to perform accurate flow classification. Since flow markers are highly customizable and application-dependent, FlowLens can automatically parameterize flow marker generation, guided by a multi-objective optimization process that can balance their size and accuracy. We evaluated our system in three usage scenarios: covert channel detection, website fingerprinting, and botnet chatter detection. We find that very small markers enable FlowLens to achieve a 150-fold increase in monitoring capacity for covert channel detection, with an accuracy drop of only 3% compared to collecting full packet distributions.
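A flow marker is, in essence, a coarsened and quantized feature vector small enough to fit in switch memory. The sketch below builds one from packet sizes; the bin width, bin count, and quantization levels are illustrative choices, not the paper's parameters (which the multi-objective optimizer would pick per application).

```python
import numpy as np

def flow_marker(packet_sizes, bin_width=32, n_bins=16, quant_levels=4):
    """Compact per-flow feature: a coarsened packet-size histogram with
    log-quantized counts so each bucket fits in 2 bits."""
    bins = np.minimum(np.asarray(packet_sizes) // bin_width, n_bins - 1)
    hist = np.bincount(bins.astype(int), minlength=n_bins)
    return np.minimum(np.log2(hist + 1).astype(int), quant_levels - 1)

# Two flows with different size profiles yield distinguishable markers
# that a lightweight classifier can separate on the switch.
print(flow_marker([60, 60, 64, 1500, 1500, 1500]))   # small + MTU-sized packets
print(flow_marker([200, 220, 240, 260, 280, 300]))   # mid-sized packets only
```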
- Sebastian Zimmeck (Wesleyan University), Rafael Goldstein (Wesleyan University), David Baraka (Wesleyan University)
Various privacy laws require mobile apps to have privacy policies. Questionnaire-based policy generators are intended to help developers with the task of policy creation. However, generated policies depend on the generators' designs as well as developers' abilities to correctly answer privacy questions about their apps. In this study we show that policies generated with popular policy generators are often not reflective of apps' privacy practices. We believe that policy generation can be improved by supplementing the questionnaire-based approach with code analysis. We design and implement PrivacyFlash Pro, a privacy policy generator for iOS apps that leverages static analysis. PrivacyFlash Pro identifies code signatures --- composed of Plist permission strings, framework imports, class instantiations, authorization methods, and other evidence --- that are mapped to privacy practices expressed in privacy policies. Resources from package managers are used to identify libraries.
We tested PrivacyFlash Pro in a usability study with 40 iOS app developers and received promising results, both in terms of reliably identifying apps' privacy practices and in terms of usability. We measured an F-1 score of 0.95 for identifying permission uses. 24 of 40 developers rated PrivacyFlash Pro with at least 9 points on a scale of 0 to 10, for a Net Promoter Score of 42.5. The mean System Usability Score of 83.4 is close to excellent. We provide PrivacyFlash Pro as an open source project to the iOS developer community. In principle, our approach is platform-agnostic and adaptable to the Android and web platforms as well. To increase privacy transparency and reduce compliance issues, we make the case for privacy policies as software development artifacts. Privacy policy creation should become a native extension of the software development process and adhere to the mental model of software developers.
- Nishant Vishwamitra (University at Buffalo), Hongxin Hu (University at Buffalo), Feng Luo (Clemson University), Long Cheng (Clemson University)
Cyberbullying has become widely recognized as a critical social problem plaguing today's Internet users. This problem involves perpetrators using Internet-based technologies to bully their victims by sharing cyberbullying-related content. To combat this problem, researchers have studied the factors associated with such content and proposed automatic detection techniques based on those factors. However, most of these studies have mainly focused on understanding the factors of textual content, such as comments and text messages, while largely overlooking the misuse of visual content in perpetrating cyberbullying. Recent technological advancements in the way users access the Internet have led to a new cyberbullying paradigm: perpetrators can use visual media to bully their victims by sending and distributing images with cyberbullying content. As a first step toward understanding the threat of cyberbullying in images, we report in this paper a comprehensive study on the nature of images used in cyberbullying. We first collect a real-world cyberbullying image dataset with 19,300 valid images. We then analyze the images in our dataset and identify the factors related to cyberbullying images that can be used to build systems to detect cyberbullying in images. Our analysis of factors in cyberbullying images reveals that, unlike traditional offensive image content (e.g., violence and nudity), the factors in cyberbullying images tend to be highly contextual. We further demonstrate the effectiveness of the factors by evaluating several classifier models based on the identified factors. With respect to the cyberbullying factors identified in our work, the best classifier model, based on multimodal classification, achieves a mean detection accuracy of 93.36% on our cyberbullying image dataset.