
SNE Master Research Projects 2021 - 2022


Contact

Francesco Regazzoni, Cees de Laat
Course Codes:

Research Project 1 53841REP6Y
Research Project 2 53842REP6Y

TimeLine


RP1 (January):
  • Wednesday Nov 10, 10h00-10h30: Introduction to the Research Projects.
  • Wednesday Dec 8, 10h00-12h00 and 14h00-16h00: Detailed discussion on selections for RP1.
  • Friday Jan 14, 24h00: research plan due.
  • Tuesday Feb 8, 10h00-17h00 (updated): Presentations RP1 and out-of-order RP2.
  • Wednesday Feb 9, 10h00-17h00 (updated): Presentations RP1 and out-of-order RP2.
  • Sunday Feb 13, 23h59: RP1 reports due.
RP2 (June):
  • Wednesday May 25, 10h15-13h00: Detailed discussion on selections for RP2, plus delivery of a preliminary analysis of the ethical implications of the project (one or two paragraphs max).
  • Friday Jun 10, 24h00: research plan due.
  • Every Wednesday in June, 9h00-10h00: virtual BBB room open for spontaneous questions (optional).
  • Tuesday Jul 5, 10h00-17h00, SP C0.110 (updated): Presentations RP1 and RP2.
  • Wednesday Jul 6, 10h00-17h00, SP C0.110 (updated): Presentations RP1 and RP2.
  • Friday Jul 8, 17h00 (updated): RP2 reports due.

Projects

Here is a list of student projects, with new ones added at the end. Old and unavailable RPs are removed, including their numbers, hence the gaps. Remaining RPs carry over to the next year and can be found here. In a futile attempt to prevent spam, "@" is replaced by "=>" in the table. The cell background colour indicates status:
  • Project available
  • Currently chosen project
  • Project plan received
  • Presentation received
  • Report received
  • Completed project
  • Confidentiality was requested
  • Blocked, not available
  • Report but no presentation


1

Blockchain's Relationship with Sovrin for Digital Self-Sovereign Identities.

Summary: Sovrin (sovrin.org) is a blockchain for self-sovereign identities. TNO operates one of the nodes of the Sovrin network. Sovrin enables easy exchange and verification of identity information (e.g. “age=18+”) for business transactions. Potential savings are estimated at over 1 B€ per year for the Netherlands alone. However, Sovrin provides only an underlying infrastructure; additional query-response protocols are needed. This is being studied in e.g. the Techruption Self-Sovereign-Identity-Framework (SSIF) project. The research question is which functionalities are needed in these protocols. The work includes the development of a data model, as well as an implementation that connects to the Sovrin network.
(2018-05)
Oskar van Deventer <oskar.vandeventer=>tno.nl>




2

Sensor data streaming framework for Unity.

In order to build a Virtual Reality “digital twin” of an existing technical framework (like a smart factory), the static 3D representation needs to “play” sensor data which either is directly connected or comes from a stored snapshot. Although a specific implementation of this already exists, the student is asked to build a more generic framework for this, which is also able to “play” position data of parts of the infrastructure (for example moving robots). This will enable the research on virtually working on a digital twin factory.
Research question:
  • What are the requirements and limitations of a seamless integration of smart factory sensor data for a digital twin scenario?
There are existing network capabilities of Unity, existing connectors from Unity to ROS (robot operation system) for sensor data transmission and an existing 3D model which uses position data.
The student is asked to:
  • Build a generic infrastructure which can either play live data or snapshot data.
  • The sensor data will include position data, but also other properties which are displayed in graphs and should be visualized by 2D plots within Unity.
The software framework will be published under an open source license after the end of the project.
Doris Aschenbrenner <d.aschenbrenner=>tudelft.nl>



3

To optimize or not: on the impact of architectural optimizations on network performance.

Project description: Networks are becoming extremely fast. On our testbed with 100Gbps network cards, we can send up to 150 million packets per second with under 1us of latency. To support such speeds, many microarchitectural optimizations, such as the use of huge pages and direct cache placement of network packets, need to be in effect. Unfortunately, these optimizations, if not done carefully, can significantly harm performance or security. While the security aspects are becoming clear [1], the end-to-end performance impacts remain unknown. In this project, you will investigate the performance impact of using huge pages and last-level cache management in high-performance networking environments. If you were always wondering what happens when receiving millions of packets at nanosecond scale, this project is for you!

Requirements: C programming, knowledge of computer architecture and operating systems internals.

[1] NetCAT: Practical Cache Attacks from the Network, Security and Privacy 2020.
Animesh Trivedi <animesh.trivedi=>vu.nl>
Kaveh Razavi <kaveh=>cs.vu.nl>


4

The other faces of RDMA virtualization.

Project description: RDMA is a technology that enables very efficient transfer of data over the network. With 100Gbps RDMA-enabled network cards, it is possible to send hundreds of millions of messages with under 1us latency. Traditionally, RDMA has mostly been used in single-user setups in HPC environments. However, RDMA technology has recently been commoditized and is used in general-purpose workloads such as key-value stores and transaction processing. Major data centers such as Microsoft Azure are already using this technology in their backend services. It is not surprising that there is now support for RDMA virtualization to make it available to virtual machines. We would like you to investigate the limitations of this new technology in terms of isolation and quality of service between different tenants.

Requirements: C programming, knowledge of computer architecture and operating systems internals.

Supervisors: Animesh Trivedi and Kaveh Razavi, VU Amsterdam
Animesh Trivedi <animesh.trivedi=>vu.nl>
Kaveh Razavi <kaveh=>cs.vu.nl>



5

Verification of Object Location Data through Picture Data Mining Techniques.

Shadows in outdoor pictures give away information about the location of the objects shown. From the position, length, and direction of a shadow, the location information found in a picture's metadata can be verified. The objective of this project is to develop algorithms that find freely available images on the internet in which the location data has been tampered with. The deliverables of this project are the location verification algorithms, a live web service that verifies the location information of the object, and a non-public-facing database with information about images whose metadata had location information removed or falsely altered.
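A minimal sketch of the core consistency check, assuming the Pillow and astral packages: read the GPS coordinates and timestamp from the picture's EXIF metadata, compute the sun azimuth they imply, and compare it with the direction of a shadow measured in the image (that measurement step, and proper time-zone handling, are left out; the timestamp is naively treated as UTC). A large mismatch suggests tampered location data.

```python
from datetime import datetime, timezone

from PIL import Image, ExifTags   # Pillow, for EXIF parsing
from astral import Observer       # astral, for solar geometry
from astral.sun import azimuth

def exif_gps_and_time(path):
    """Extract ((lat, lon), timestamp) from a JPEG's EXIF metadata."""
    raw = Image.open(path)._getexif() or {}
    tags = {ExifTags.TAGS.get(k, k): v for k, v in raw.items()}
    gps = {ExifTags.GPSTAGS.get(k, k): v for k, v in tags["GPSInfo"].items()}
    d, m, s = (float(x) for x in gps["GPSLatitude"])
    lat = (d + m / 60 + s / 3600) * (-1 if gps.get("GPSLatitudeRef") == "S" else 1)
    d, m, s = (float(x) for x in gps["GPSLongitude"])
    lon = (d + m / 60 + s / 3600) * (-1 if gps.get("GPSLongitudeRef") == "W" else 1)
    when = datetime.strptime(tags["DateTimeOriginal"], "%Y:%m:%d %H:%M:%S")
    return (lat, lon), when.replace(tzinfo=timezone.utc)  # assumption: UTC clock

def location_consistent(path, shadow_azimuth_deg, tolerance_deg=15.0):
    """A shadow points away from the sun: expected azimuth = sun azimuth + 180."""
    (lat, lon), when = exif_gps_and_time(path)
    expected = (azimuth(Observer(lat, lon), when) + 180.0) % 360.0
    delta = abs((shadow_azimuth_deg - expected + 180.0) % 360.0 - 180.0)
    return delta <= tolerance_deg
```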
Junaid Chaudhry <chaudhry=>ieee.org>




6

Artificial Intelligence Assisted carving.

Problem Description:
Carving for data and locating files belonging to a principal can be hard if we only use keywords. This still requires a lot of manual work to create keyword lists, which might not even be sufficient to find what we are looking for.
Goal:
  • Create a simple framework to detect documents of a certain set (or company) within carved data by utilizing machine learning. Closely related to document identification; a minimal baseline is sketched below.
The research project below is currently the only open project at our Forensics department rated at MSc level. Of course, if students have any ideas for a cybersecurity/forensics-related project, they are always welcome to contact us.
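As a rough illustration of the intended framework (a sketch only, assuming scikit-learn and labelled example documents; in practice carved data would first need text extraction, and the variable names are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_detector(company_docs, other_docs):
    """Train a baseline detector on texts known to belong to the principal
    (company_docs) versus unrelated texts (other_docs)."""
    texts = company_docs + other_docs
    labels = [1] * len(company_docs) + [0] * len(other_docs)
    model = make_pipeline(TfidfVectorizer(max_features=50_000),
                          LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    return model

def rank_fragments(model, carved_fragments):
    """Rank carved text fragments by how much they resemble the principal's
    documents, so an examiner can triage the most likely hits first."""
    scores = model.predict_proba(carved_fragments)[:, 1]
    return sorted(zip(scores, carved_fragments), reverse=True)
```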
Danny Kielman <danny.kielman=>fox-it.com>
Mattijs Dijkstra <mattijs.dijkstra=>fox-it.com>


7

Usage Control in inter-data-space data exchange.

Data spaces are a new concept and model for organising and managing data in domain-specific data ecosystems. Data spaces include technical/infrastructure aspects, semantic aspects, organisational/governance aspects, and legal frameworks. Data exchange and data processing are the main technical activities in data spaces and may be organised in a data workflow that can span multiple domains and systems. Important aspects of managing data workflows include access policy enforcement and usage control, which covers both enforcement of the data usage policy (i.e. allowed uses and actions) and recording of all activities on data.
The thesis will involve the following steps:
- Propose an architecture based on IDS for a selected use case incorporating the enforcement of usage control policies
- Implement the architecture and evaluate its performance.
References
[1] IDS Connector Architecture https://www.dataspaces.fraunhofer.de/en/software/connector.html
[2] IDS Connector Framework https://github.com/International-Data-Spaces-Association/IDS-Connector-Framework
https://www.dataspaces.fraunhofer.de/en/software/connector.html
[3] Jaehong Park, Ravi S. Sandhu: The UCONABC usage control model. ACM Trans. Inf. Syst. Secur. 7(1): 128-174 (2004)
[4] Slim Trabelsi, Jakub Sendor: "Sticky policies for data control in the cloud" PST 2012: 75-80
Yuri Demchenko <y.demchenko=>uva.nl>

8

Security of embedded technology.

Analyzing the security of embedded technology, which operates in an ever-changing environment, is Riscure's primary business. Therefore, research and development (R&D) is of utmost importance for Riscure to stay relevant. The R&D conducted at Riscure focuses on four domains: software, hardware, fault injection and side-channel analysis. Potential SNE Master projects can be shaped around topics in any of these fields. We would like to invite interested students to discuss a potential Research Project at Riscure in any of the mentioned fields. Projects will be shaped according to the requirements of the SNE Master.
Please have a look at our website for more information: https://www.riscure.com
Previous Research Projects conducted by SNE students:
  1. https://www.os3.nl/_media/2013-2014/courses/rp1/p67_report.pdf
  2. https://www.os3.nl/_media/2011-2012/courses/rp2/p61_report.pdf
  3. http://rp.os3.nl/2014-2015/p48/report.pdf
  4. https://www.os3.nl/_media/2011-2012/courses/rp2/p19_report.pdf
If you want to see what the atmosphere is at Riscure, please have a look at: https://vimeo.com/78065043
Please let us know if you have any additional questions!
Ronan Loftus <loftus=>riscure.com>
Alexandru Geana <Geana=>riscure.com>
Karolina Mrozek <Mrozek=>riscure.com>
Dana Geist <geist=>riscure.com>




9

Cross-blockchain oracle.

Interconnection between different blockchain instances, and smart contracts residing on those, will be essential for a thriving multi-blockchain business ecosystem. Technologies like hashed timelock contracts (HTLC) enable atomic swaps of cryptocurrencies and tokens between blockchains. A next challenge is the cross-blockchain oracle, where the status of an oracle value on one blockchain enables or prevents a transaction on another blockchain.
The goal of this research project is to explore the possibilities, impossibilities, trust assumptions, security and options for a cross-blockchain oracle, as well as to provide a minimal viable implementation.
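To make the underlying mechanism concrete, here is the hashlock-plus-timelock condition at the core of an HTLC, sketched in plain Python rather than as on-chain code (all contract and consensus details omitted): the same secret preimage unlocks funds on both chains, which is what makes a swap atomic.

```python
import hashlib, os, time

secret = os.urandom(32)                     # chosen by the swap initiator
hashlock = hashlib.sha256(secret).digest()  # published in the contracts on both chains
deadline = time.time() + 3600               # after this, the funder can reclaim

def claim(preimage: bytes) -> bool:
    """The counterparty can claim the funds iff it presents the preimage before
    the timeout; revealing it on one chain enables the claim on the other."""
    return hashlib.sha256(preimage).digest() == hashlock and time.time() < deadline
```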
(2018-05)
Oskar van Deventer <oskar.vandeventer=>tno.nl>
Maarten Everts <maarten.everts=>tno.nl>


24

Network aware performance optimization for Big Data applications using coflows.

Optimizing data transmission is crucial to improve the performance of data-intensive applications. In many cases, network traffic control plays a key role in optimising data transmission, especially when data volumes are very large. Data-intensive jobs can often be divided into multiple successive computation stages, e.g., in MapReduce-type jobs. A computation stage relies on the outputs of the previous stage and cannot start until all its required inputs are in place. Inter-stage data transfer involves a group of parallel flows, which share the same performance goal, such as minimising the flows' completion time.

CoFlow is an application-aware network control model for cluster-based data centric computing. The CoFlow framework is able to schedule the network usage based on the abstract application data flows (called coflows). However, customizing CoFlow for different application patterns, e.g., choosing proper network scheduling strategies, is often difficult, in particular when the high level job scheduling tools have their own optimizing strategies.

The project aims to profile the behavior of CoFlow with different computing platforms, e.g., Hadoop and Spark:
  1. Review the existing CoFlow scheduling strategies and related work.
  2. Prototype test applications using big data platforms (including Apache Hadoop, Spark, Hive, Tez).
  3. Set up a coflow test bed (Aalo, Varys, etc.) using existing CoFlow installations.
  4. Benchmark the behavior of CoFlow in different application patterns, and characterise the behavior.
Background reading:
  1. CoFlow introduction: http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-211.pdf
  2. Junchao Wang, Huan Zhou, Yang Hu, Cees de Laat and Zhiming Zhao, Deadline-Aware Coflow Scheduling in a DAG, in NetCloud 2017, Hong Kong, to appear [upon request]
More info: Junchao Wang, Spiros Koulouzis, Zhiming Zhao
Zhiming Zhao <z.zhao=>uva.nl>

10

Elastic data services for time critical distributed workflows.

Large-scale observations over extended periods of time are necessary for constructing and validating models of the environment. Therefore, it is necessary to provide advanced computational networked infrastructure for transporting large datasets and performing data-intensive processing. Data infrastructures manage the lifecycle of observation data and provide services for users and workflows to discover, subscribe and obtain data for different application purposes. In many cases, applications have high performance requirements, e.g., disaster early warning systems.

This project focuses on data aggregation and processing use-cases from European research infrastructures, and investigates how to optimise infrastructures to meet critical time requirements of data services, in particular for different patterns of data-intensive workflow. The student will use some initial software components [1] developed in the ENVRIPLUS [2] and SWITCH [3] projects, and will:
  1. Model the time constraints for the data services and the characteristics of data access patterns found in given use cases.
  2. Review the state of the art technologies for optimising virtual infrastructures.
  3. Propose and prototype an elastic data service solution based on a number of selected workflow patterns.
  4. Evaluate the results using a use case provided by an environmental research infrastructure.
Reference:
  1. https://staff.fnwi.uva.nl/z.zhao/software/drip/
  2. http://www.envriplus.eu
  3. http://www.switchproject.eu
More info: "Spiros Koulouzis, Paul Martin, Zhiming Zhao
Zhiming Zhao <z.zhao=>uva.nl>




11

Contextual information capture and analysis in data provenance.

Tracking the history of events and the evolution of data plays a crucial role in data-centric applications for ensuring reproducibility of results, diagnosing faults, and performing optimisation of data-flow. Data provenance systems [1] are a typical solution, capturing and recording the events generated in the course of a process workflow using contextual metadata, and providing querying and visualisation tools for use in analysing such events later.

Conceptual models such as W3C PROV (and extensions such as ProvONE), OPM and CERIF have been proposed to describe data provenance, and a number of different solutions have been developed. Choosing a suitable provenance solution for a given workflow system or data infrastructure requires consideration of not only the high-level workflow or data pipeline, but also performance issues such as the overhead of event capture and the volume of provenance data generated.

The project will be conducted in the context of the EU H2020 ENVRIPLUS project [1, 2]. The goal of this project is to provide practical guidelines for choosing provenance solutions. This entails:
  1. Reviewing the state of the art for provenance systems.
  2. Prototyping sample workflows that demonstrate selected provenance models.
  3. Benchmarking the results of sample workflows, and defining guidelines for choosing between different provenance solutions (considering metadata, logging, analytics, etc.).
References:
  1. About project: http://www.envriplus.eu
  2. Provenance background in ENVRIPLUS: https://surfdrive.surf.nl/files/index.php/s/uRa1AdyURMtYxbb
  3. Michael Gerhards, Volker Sander, Torsten Matzerath, Adam Belloum, Dmitry Vasunin, and Ammar Benabdelkader. 2011. Provenance opportunities for WS-VLAM: an exploration of an e-science and an e-business approach. In Proceedings of the 6th workshop on Workflows in support of large-scale science (WORKS '11). http://dx.doi.org/10.1145/2110497.2110505
More info: - Zhiming Zhao, Adam Belloum, Paul Martin
Zhiming Zhao <z.zhao=>uva.nl>

Rik Janssen <Rik.Janssen=>os3.nl>

RP2
12

Profiling Partitioning Mechanisms for Graphs with Different Characteristics.

In computer systems, graphs are an important model for describing many things, such as workflows, virtual infrastructures, and ontological models. Partitioning is a frequently used graph operation in contexts like parallelising workflow execution, mapping networked infrastructures onto distributed data centers [1], and controlling the load balance of resources. However, developing an effective partition solution is often not easy; it is a complex optimization issue involving constraints such as system performance and cost.

A comprehensive benchmark of graph partitioning mechanisms is helpful for choosing a partitioning solver for a specific model. Such a portfolio can also give advice on how to partition based on the characteristics of the graph. This project aims at benchmarking the existing partition algorithms for graphs with different characteristics, and profiling their applicability to specific types of graphs.
This project will be conducted in the context of the EU SWITCH [2] project. The students will:
  1. Review the state of the art of graph partitioning algorithms and related tools, such as Chaco, METIS and KaHIP.
  2. Investigate how to define the characteristics of a graph, such as sparse graphs, skewed graphs, etc. This can also be discussed for different graph models, like planar graphs, DAGs, hypergraphs, etc.
  3. Build a benchmark for different types of graphs with various partitioning mechanisms and find the relationships behind them.
  4. Discuss how to choose a partitioning mechanism based on the graph characteristics.
Reading material:
  1. Zhou, H., Hu Y., Wang, J., Martin, P., de Laat, C. and Zhao, Z., (2016) Fast and Dynamic Resource Provisioning for Quality Critical Cloud Applications, IEEE International Symposium On Real-time Computing (ISORC) 2016, York UK http://dx.doi.org/10.1109/ISORC.2016.22
  2. SWITCH: www.switchproject.eu

More info: Huan Zhou, Arie Taal, Zhiming Zhao

Zhiming Zhao <z.zhao=>uva.nl>

13

Auto-Tuning for GPU Pipelines and Fused Kernels.

Achieving high performance on many-core accelerators is a complex task, even for experienced programmers. This task is made even more challenging by the fact that, to achieve high performance, code optimization is not enough, and auto-tuning is often necessary. The reason for this is that computational kernels running on many-core accelerators need ad-hoc configurations that are a function of kernel, input, and accelerator characteristics to achieve high performance. However, tuning kernels in isolation may not be the best strategy for all scenarios.

Imagine having a pipeline that is composed of a certain number of computational kernels. You can tune each of these kernels in isolation, and find the optimal configuration for each of them. Then you can use these configurations in the pipeline, and achieve some level of performance. But these kernels may depend on each other, and may also influence each other. What if the choice of a certain memory layout for one kernel causes performance degradation in another kernel?

One of the existing optimization strategies to deal with pipelines is to fuse kernels together, to simplify execution patterns and decrease overhead. In this project we aim to measure the performance of accelerated pipelines in three different tuning scenarios:
  1. tuning each component in isolation,
  2. tuning the pipeline as a whole, and
  3. tuning the fused kernel.
By measuring the performance of one or more pipelines in these scenarios we hope, on one level, to determine which is the best strategy for specific pipelines on different hardware platforms, and on another level, to better understand which characteristics influence this behavior. A minimal tuning skeleton is sketched below.
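A minimal, CPU-only skeleton of the comparison (toy NumPy stages stand in for GPU kernels, and chunk size stands in for tunables such as block size or memory layout; a real study would use actual kernels and a dedicated tuner):

```python
import itertools, time
import numpy as np

data = np.random.rand(1_000_000)

def stage_a(x, chunk):  # first "kernel" of the pipeline
    return np.concatenate([np.sqrt(x[i:i + chunk]) for i in range(0, len(x), chunk)])

def stage_b(x, chunk):  # second "kernel" of the pipeline
    return np.concatenate([x[i:i + chunk] ** 2 for i in range(0, len(x), chunk)])

def fused(x, chunk):    # both stages fused into a single pass
    return np.concatenate([np.sqrt(x[i:i + chunk]) ** 2 for i in range(0, len(x), chunk)])

def timed(f, repeats=3):
    ts = []
    for _ in range(repeats):
        t0 = time.perf_counter(); f(); ts.append(time.perf_counter() - t0)
    return min(ts)

chunks = [2 ** k for k in range(10, 18)]

# Scenario 1: tune each kernel in isolation, then combine the two winners.
best_a = min(chunks, key=lambda c: timed(lambda: stage_a(data, c)))
best_b = min(chunks, key=lambda c: timed(lambda: stage_b(data, c)))

# Scenario 2: tune the pipeline as a whole over the joint configuration space.
best_joint = min(itertools.product(chunks, chunks),
                 key=lambda cc: timed(lambda: stage_b(stage_a(data, cc[0]), cc[1])))

# Scenario 3: tune the fused kernel.
best_fused = min(chunks, key=lambda c: timed(lambda: fused(data, c)))
print(best_a, best_b, best_joint, best_fused)
```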
Rob van Nieuwpoort <R.vanNieuwpoort=>uva.nl>




14

Auto-tuning for Power Efficiency.

Auto-tuning is a well-known optimization technique in computer science. It has been used to ease the manual optimization process that is traditionally performed by programmers, and to maximize performance portability. Auto-tuning works by executing the code that has to be tuned many times on a small problem set, with different tuning parameters. The best-performing version is then used for the real problems. Tuning can be done with application-specific parameters (different algorithms, granularity, convergence heuristics, etc.) or platform parameters (number of parallel threads used, compiler flags, etc.).

For this project, we apply auto-tuning on GPUs. We have several GPU applications where absolute performance is not the most important bottleneck in the real world; instead, the power dissipation of the total system is critical. This can be due to the enormous scale of the application, or because the application must run in an embedded device. An example of the first is the Square Kilometre Array, a large radio telescope that is currently under construction. With current technology, it would need more power than the whole of the Netherlands consumes. In embedded systems, power usage can be critical as well. For instance, we have GPU codes that make images for radar systems in drones. The weight and power limitations are an important bottleneck (batteries are heavy).

In this project, we use power dissipation as the evaluation function for the auto-tuning system. Earlier work by others investigated this, but only for a single compute-bound application. However, many realistic applications are memory-bound. This is a problem, because loading a value from the L1 cache can already take 7-15x more energy than an instruction that only performs a computation (e.g., multiply).

There are also interesting platform parameters that can be changed in this context. It is possible to change both core and memory clock frequencies, for instance. It will be interesting to see if we can, at runtime, achieve the optimal balance between these frequencies.

We want to perform auto-tuning on a set of GPU benchmark applications that we developed.
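A minimal sketch of how the evaluation function could switch from runtime to energy, assuming an NVIDIA GPU and the nvidia-smi tool: sample the instantaneous board power while the candidate configuration runs and integrate over time. (A careful study would also handle sampling resolution, warm-up, and idle baseline power.)

```python
import subprocess, threading, time

def gpu_power_watts():
    """Instantaneous board power as reported by the driver."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        text=True)
    return float(out.splitlines()[0])

def energy_joules(run_kernel, interval=0.05):
    """Run `run_kernel()` while sampling power; return (energy_J, runtime_s)."""
    samples, stop = [], threading.Event()

    def sampler():
        while not stop.is_set():
            samples.append(gpu_power_watts())
            time.sleep(interval)

    t = threading.Thread(target=sampler)
    t.start()
    t0 = time.perf_counter()
    run_kernel()                      # the candidate kernel configuration
    runtime = time.perf_counter() - t0
    stop.set(); t.join()
    avg_power = sum(samples) / max(len(samples), 1)
    return avg_power * runtime, runtime

# An energy-aware tuner then minimises energy_joules(lambda: run(cfg)) instead of runtime.
```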
Rob van Nieuwpoort <R.vanNieuwpoort=>uva.nl>

15

Applying and Generalizing Data Locality Abstractions for Parallel Programs.

TIDA is a library for high-level programming of parallel applications, focusing on data locality. TIDA has been shown to work well for grid-based operations, like stencils and convolutions. These are an important building block for many simulations, in astrophysics, climate modelling and water management, for instance. The TIDA paper gives more details on the programming model.

This project aims to achieve several things and answer several research questions:

  • TIDA currently only works with up to 3D. In many applications we have, higher dimensionalities are needed. Can we generalize the model to N dimensions?
  • The model currently only supports a two-level hierarchy of data locality. However, modern memory systems often have many more levels, both on CPUs and GPUs (e.g., L1, L2 and L3 cache, main memory, memory banks coupled to a different core, etc.). Can we generalize the model to support N-level memory hierarchies?
  • The current implementation only works on CPUs; can we generalize to GPUs as well?
  • Given the above generalizations, can we still implement the model efficiently? How should we perform the mapping from the abstract hierarchical model to a real physical memory system?

We want to test the new extended model on a real application. We have examples available in many domains. The student can pick one that is of interest to her/him.
Rob van Nieuwpoort <R.vanNieuwpoort=>uva.nl>

16

Ethereum Smart Contract Fuzz Testing.

An Ethereum smart contract can be seen as a computer program that runs on the Ethereum Virtual Machine (EVM), with the ability to accept, hold and transfer funds programmatically. Once a smart contract has been placed on the blockchain, it can be executed by anyone. Furthermore, many smart contracts accept user input. Because smart contracts operate on a cryptocurrency with real value, their security is of the utmost importance. I would like to create a smart contract fuzzer that will check for unexpected behaviour or crashes of the EVM. Based on preliminary research, such a fuzzer does not exist yet.
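A minimal sketch of a starting point, assuming a local development node exposing the standard JSON-RPC interface and an already deployed target contract (the address below is a placeholder): throw random calldata at the contract via eth_call and flag anything that is not an ordinary revert. A serious fuzzer would add ABI-aware input generation, coverage feedback and detection of EVM-level crashes.

```python
import os
import requests

NODE_URL = "http://127.0.0.1:8545"   # assumption: local dev node (e.g. ganache/geth)
CONTRACT = "0x" + "00" * 20          # placeholder: address of the target contract

def eth_call(data_hex):
    """Execute calldata against the contract without sending a transaction."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_call",
               "params": [{"to": CONTRACT, "data": data_hex}, "latest"]}
    return requests.post(NODE_URL, json=payload, timeout=10).json()

for i in range(1000):
    # 4-byte function selector plus a random number of random argument words
    calldata = "0x" + os.urandom(4 + 32 * (i % 4)).hex()
    reply = eth_call(calldata)
    if "error" in reply and "revert" not in str(reply["error"]).lower():
        print("interesting behaviour:", calldata, reply["error"])
```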
Rodrigo Marcos <rodrigo.marcos=>secforce.com>





17

Smart contracts specified as contracts.

Developing a distributed state of mind: from control flow to control structure

The concepts of control flow, of data structures, and of data flow are well established in the computational literature; in contrast, one can find different definitions of control structures, and typically these are not associated with the common use of the term, which refers to the power relationships holding in society or in organizations.

The goal of this project is the design and development of a social architecture language that cross-compiles to a modern concurrent programming language (Rust, Go, or Scala), in order to make explicit a multi-threaded, distributed state of mind, following results obtained in agent-based programming. The starting point will be a minimal language subset of AgentSpeak(L).

Potential applications: controlled machine learning for Responsible AI, control of distributed computation
Giovanni Sileno <G.Sileno=>uva.nl>
Mostafa Mohajeriparizi <m.mohajeriparizi=>uva.nl>


18

Zero Trust Validation

ON2IT advocates the Zero Trust Validation conceptual strategy [1] to strengthen information security at the architectural level. Zero Trust is often mistakenly perceived as an architectural approach. However, it is, in the end, a strategic approach towards protecting assets regardless of location. To enable this approach, controls are needed to provide sufficient insight (visibility), to exert control, and to provide operational feedback. However, these controls/probes are not naturally available in all environments. Finding ways to embed such controls, and finding/applying them, can be challenging, especially in the context of containerized, cloud and virtualized workflows.

At the strategic level, Zero Trust is not sufficiently perceived as a value contributor. At the managerial level, it is perceived mainly as an architectural ‘toy’. This makes it hard to translate a Zero Trust strategic approach to the operational level; there is a lack of overall coherence. For this reason, ON2IT developed a Zero Trust Readiness Assessment framework which facilitates testing the readiness level on three levels: governance, management and operations.

Research (sub)questions that emerge:
  • What is missing in the current approach of ZTA to make it resonate with the board?
    • What are Critical Success Factors for drafting and implementing ZTA?
    • What is an easy to consume capability maturity or readiness model for the adoption of ZTA that guides boards and management teams in making the right decisions?
    • What does a management portal with associated KPIs need to offer in order to enable board and management to manage and monitor the ZTA implementation process and take appropriate ownership?
    • How do we add the necessary controls, and how do we efficiently leverage the control and monitoring facilities thus provided?
  1. Zero Trust Validation
  2. "On Exploring Research Methods for Business Information Security Alignment and Artefact Engineering" by Yuri Bobbert, University of Antwerp
Jeroen Scheerder <Jeroen.Scheerder=>on2it.net>



19

OSINT Washing Street.

At the moment, more and more OSINT is available via all kinds of sources, a lot of them legitimate services that are abused by malicious actors. Examples are GitHub, Pastebin, Twitter, etc. If you look at Pastebin data you might find IOCs/TTPs, but the payloads are usually delivered in many stages, so it is important to have a system that follows the path until it finds the real payload. The question here is how you can build a generic pipeline that unravels data like a matryoshka doll: no matter the input, the pipeline will try to decode, query or perform whatever relevant action is needed. This would result in better insight into the later stages of an attack. An example of a framework using this method is Stoq (https://github.com/PUNCH-Cyber/stoq), but it lacks research into usability and into whether the results add value compared to other OSINT sources.
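A minimal sketch of such a washing street, using only the Python standard library and just two example decoders: every stage tries all registered decoders on the current layer and stops when none applies any more. A real pipeline would register many more actions (hex, XOR, archive extraction, URL fetching, sandbox detonation, ...).

```python
import base64, binascii, zlib

def try_base64(data: bytes):
    try:
        out = base64.b64decode(data, validate=True)
        return out or None
    except (binascii.Error, ValueError):
        return None

def try_zlib(data: bytes):
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None

DECODERS = [try_base64, try_zlib]   # extend with hex, XOR, gzip, ...

def unwrap(blob: bytes, max_depth: int = 10):
    """Peel layers like a matryoshka doll; returns all intermediate layers."""
    layers = [blob]
    for _ in range(max_depth):
        for decode in DECODERS:
            out = decode(layers[-1])
            if out is not None and out != layers[-1]:
                layers.append(out)
                break
        else:
            break   # no decoder matched: assume the real payload is reached
    return layers
```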
Joao Novaismarques <joao.novaismarques=>kpn.com>

20

Building an open-source, flexible, large-scale static code analyzer.

Background information
Data drives business, and maybe even the world. Businesses that make it their business to gather data are often aggregators of client-side generated data. Client-side generated data, however, is inherently untrustworthy. Malicious users can construct their data to exploit careless, or naive, programming and use this malicious, untrusted data to steal information or even take over systems.
It is no surprise that large companies such as Google, Facebook and Yahoo spend considerable resources on securing their own systems against would-be attackers. Generally, many methods have been developed to let untrusted data cross the trust boundary to trusted data, and effectively render malicious data harmless. However, securing your systems against malicious data often requires expertise beyond what even skilled programmers might reasonably possess.
Problem description
Ideally, tools that analyze code for vulnerabilities would be used to detect common security issues. Such tools, or static code analyzers, exist, but are either outdated (http://rips-scanner.sourceforge.net/) or part of very expensive commercial packages (https://www.checkmarx.com/ and http://armorize.com/). Next to the need for an open-source alternative to the previously mentioned tools, we also need to look at increasing our scope. Rather than focusing on a single codebase, the tool would ideally be able to scan many remote, large-scale repositories and report the findings back in an easily accessible way.
An interesting target for this research would be very popular, open source (at this stage) Content Management Systems (CMSs), and specifically plugins created for these CMSs. CMS cores are held to a very high coding standard and are often relatively secure. Plugins, however, are necessarily less so, but are generally as popular as the CMSs they are created for. This is problematic, because an insecure plugin is as dangerous as an insecure CMS. Experienced programmers and security experts generally audit the most popular plugins, but this is: a) very time-intensive, b) prone to errors and c) of limited scope, i.e. not every plugin can be audited. For example, if it were feasible to audit all aspects of a CMS repository (CMS core and plugins), the DigiNotar debacle could have easily been avoided.
Research proposal
Your research would consist of extending our proof-of-concept static code analyzer written in Python and using it to scan code repositories, possibly of some major CMSs and their plugins, for security issues, and finding innovative ways of reporting on the massive number of possible issues you are sure to find. Help others keep our data that little bit safer.
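For a flavour of the approach (a toy example, not our actual proof of concept): the sketch below uses Python's ast module to flag calls to dangerous sinks whose arguments directly contain user input. Real analyzers additionally track taint across assignments, functions and files.

```python
import ast, sys

SINKS = {"system", "popen", "eval", "exec"}   # calls we treat as dangerous
SOURCES = {"input"}                           # calls producing untrusted data

def called_name(node):
    if isinstance(node.func, ast.Attribute):
        return node.func.attr
    if isinstance(node.func, ast.Name):
        return node.func.id
    return ""

class SinkFinder(ast.NodeVisitor):
    def visit_Call(self, node):
        if called_name(node) in SINKS:
            tainted = any(isinstance(sub, ast.Call) and called_name(sub) in SOURCES
                          for arg in node.args for sub in ast.walk(arg))
            if tainted:
                print(f"line {node.lineno}: user input flows into {called_name(node)}()")
        self.generic_visit(node)

# usage: python analyzer.py file1.py file2.py ...
for path in sys.argv[1:]:
    with open(path) as f:
        SinkFinder().visit(ast.parse(f.read(), filename=path))
```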
Patrick Jagusiak <patrick.jagusiak=>dongit.nl>
Wouter van Dongen <wouter.vandongen=>dongit.nl>



21

Developing a Distributed State of Mind.

A system required to be autonomous needs to be more than just a computational black box that produces a set of outputs from a set of inputs. Interpreted as an agent provided with (some degree of) rationality, it should act based on desires, goals and internal knowledge for justifying its decisions. One could then imagine a software agent much like a human being or a human group, with multiple parallel threads of thoughts and considerations which more often than not are in conflict with each other. This distributed view contrasts with the common centralized view used in agent-based programming, and opens up potential cross-fertilization with distributed computing applications, which is for the moment largely unexplored.

The goal of this project is the design and development of an efficient agent architecture in a modern concurrent programming language (Rust, Go, or Scala), in order to make explicit a multi-threaded, distributed state of mind.
Giovanni Sileno <G.Sileno=>uva.nl>
Mostafa Mohajeriparizi <m.mohajeriparizi=>uva.nl>





22

Development of a control framework to guarantee the security of a collaborative open-source project.

We are now living in an information society, and everyone expects to be able to find everything on the Web. IT developers are no exception and spend a large part of their working hours searching for and reusing pieces of code found in public repositories (e.g. GitHub, GitLab, etc.) or web forums (e.g. StackOverflow).

The use of open-source software has long been seen as a secure alternative, as the code is available for review by everyone, so bugs and vulnerabilities should be found and fixed more easily. Multiple incidents related to the use of open-source software (NPM, Gentoo, Homebrew) have shown that the greater security of open-source components can turn out to be theoretical.

This research aims to highlight the root causes of major recent incidents related to open-source collaborative projects, as well as to propose a global open-source security framework that could address those issues.

Huub van Wieren <vanWieren.Huub=>kpmg.nl>

23

Security of IoT communication protocols on the AWS platform.

In January 2020, Jason and Hoang from the OS3 master worked on the project “Security Evaluation on Amazon Web Services’ REST API Authentication Protocol Signature Version 4” [1]. This project showed the resilience of the SigV4 authentication mechanism for HTTP protocol communications.
In June 2017, AWS released a service called AWS Greengrass [2] that can be used as an intermediate server for low-connectivity devices running the AWS IoT SDK [3]. This is an interesting configuration, as it makes it possible to further challenge SigV4 authentication in a disconnected environment using the MQTT protocol.

Reference:
  1. https://homepages.staff.os3.nl/~delaat/rp/2019-2020/p65/report.pdf
  2. https://docs.aws.amazon.com/greengrass/latest/developerguide/what-is-gg.html
  3. https://github.com/aws/aws-iot-device-sdk-python
Huub van Wieren <vanWieren.Huub=>kpmg.nl>



25

Version management of project files in ICS.

Research in Industrial Control Systems: it is difficult to have proper version management of project files, as they are usually stored offline. We would like to come up with a solution to back up and store project files in real time on a server, with the capability to revert to or snapshot earlier versions, a bit like Puppet/Chef/Ansible but for ICS. A minimal sketch of the idea follows.
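The simplest possible form of the idea, sketched under stated assumptions (the watchdog package, a git repository at a placeholder path, and project files reachable on a filesystem; a production version would need engineering-workstation agents, a central server and authentication): watch the directory and commit every change.

```python
import subprocess, time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

REPO = "/srv/ics-projects"   # placeholder: a git repo holding the project files

class AutoCommit(FileSystemEventHandler):
    def on_any_event(self, event):
        subprocess.run(["git", "-C", REPO, "add", "-A"], check=True)
        subprocess.run(["git", "-C", REPO, "commit", "-m",
                        f"auto: {event.event_type} {event.src_path}"],
                       check=False)   # commit is a harmless no-op if nothing changed

observer = Observer()
observer.schedule(AutoCommit(), REPO, recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```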
<mvanveen=>deloitte.nl>



26

Future tooling and cyber defense strategy for ICS.

Research in Industrial Control Systems: is zero trust networking possible in ICS? This is one of the questions we are wondering about, to sharpen our vision and story around where ICS security is going and which solutions are emerging.
Pavlos Lontorfos <plontorfos=>deloitte.nl>
Dominika Rusek-Jonkers <drusek=>deloitte.nl>

Leroy van der Steenhoven <lsteenhoven=>os3.nl>

RP1
27

End-to-end encryption for browser-based meeting technologies.

Investigating the possibilities and limitations of end-to-end encrypted browser-based video conferencing, with a specific focus on security and preserving privacy.
  • What are possible approaches?
  • How would they compare to each other?
Jan Freudenreich <jfreudenreich=>deloitte.nl>




38

Evaluation of the Jitsi Meet approach for end-to-end encrypted browser-based video conferencing.

Determining the security of the library, implementation and the environment setup.
Jan Freudenreich <jfreudenreich=>deloitte.nl>




31

Approximate computing and side channels.

Approximate computing is an emerging computing paradigm in which the precision of the computation is traded against other metrics, such as energy consumption or performance. This paradigm has been shown to be effective in various applications, including machine learning and video streaming. However, the effects of approximate computing on security are still unknown. This project investigates the effects of the approximate computing paradigm on side-channel attacks.

The specific use case considered here is the exploration of the resistance of devices against power analysis attacks when classical techniques used in the approximate computing paradigm to reduce energy consumption (such as voltage scaling) are applied. The research will address the following challenges (a sketch of the attack in question follows below):

  • Selection of the most appropriate techniques for energy saving among the ones used in the approximate computing paradigm
  • Realization of a number of simple cryptographic benchmarks in an HDL (VHDL or Verilog)
  • Simulation of the power consumption in the different scenarios
  • Evaluation of the side-channel resistance in each scenario
This thesis is in collaboration with the University of Stuttgart (Prof. Ilia Polian).
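For reference, the core loop of a classical correlation power analysis, the attack whose success across the scenarios is to be compared (a hedged NumPy sketch: the traces are assumed to come from the power simulation step, and the S-box is a placeholder to be replaced by the real cipher's):

```python
import numpy as np

SBOX = np.arange(256, dtype=np.uint8)   # placeholder: substitute the cipher's S-box
HW = np.array([bin(v).count("1") for v in range(256)], dtype=float)

def cpa_best_guess(traces, plaintexts):
    """traces: (n_traces, n_samples) simulated power; plaintexts: (n_traces,) bytes.
    Correlate a hamming-weight leakage model with the traces for each key guess;
    the correct guess should yield the highest peak correlation."""
    centred = traces - traces.mean(axis=0)
    norms = np.linalg.norm(centred, axis=0)
    peaks = []
    for guess in range(256):
        model = HW[SBOX[plaintexts ^ guess]]       # predicted leakage per trace
        m = model - model.mean()
        corr = (m @ centred) / (np.linalg.norm(m) * norms + 1e-12)
        peaks.append(np.max(np.abs(corr)))
    return int(np.argmax(peaks))
```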
Francesco Regazzoni <f.regazzoni=>uva.nl>

Steef van Wooning <swooning=>os3.nl>
Brice Habets <bhabets=>os3.nl>

RP1
32

Decentralize a legacy application using blockchain: a crowd journalism case study.

Blockchain technologies have demonstrated huge potential for application developers and operators to improve service trustworthiness, e.g., in logistics, finance and provenance. The migration of a centralized distributed application to a decentralized paradigm often requires not only a conceptual re-design of the application architecture, but also a profound understanding of the technical integration of the business logic with the blockchain technologies. In this project, we will use a social network application (crowd journalism) as a test case to investigate the integration possibilities between a legacy system and the blockchain. Key activities in the project:
  1. investigate the integration possibilities between social network application and permissioned blockchain technologies,
  2. make a rapid prototype to demonstrate the feasibility, and
  3. assess the operational cost of blockchain services.
The software for the crowd journalism case will be provided by an SME partner of the EU ARTICONF project.

References: http://www.articonf.eu
Zhiming Zhao <z.zhao=>uva.nl>


33

Location aware data processing in the cloud environment.

Data-intensive applications are often workflows involving distributed data sources and services. When the data volumes are very large, especially with different access constraints, the workflow system has to decide on suitable locations to process the data and to deliver the results. In this project, we perform a case study on eco-LiDAR data from different European countries; the processing will be done using the test bed offered by the European Open Science Cloud. The project will investigate data-location-aware scheduling strategies and service automation technologies for workflow execution. The data processing pipeline and data sources in the use case will be provided by partners in the EU LifeWatch project, and the test bed will be provided by the European Open Science Cloud early adopter program.
Zhiming Zhao <z.zhao=>uva.nl>

34

Trust bootstrapping for secure data exchange infrastructure provisioned on demand.

Data exchange in the data market requires more than just an end-to-end secure connection, which is well supported by VPNs. A data market and data exchange that are integrated into complex research, industrial and business processes may require connections to data market and data exchange services supporting data search, combination and quality assurance, as well as delivery to data processing or execution facilities. This can be achieved by providing a trusted data exchange and execution environment on demand using a cloud hosting platform.
This project will (1) investigate the current state of the art in trust management, trust bootstrapping and key management in cloud-based services provisioned on demand; (2) test several available solutions; and (3) implement a selected solution in a working prototype.
References
[1] Bootstrapping and Maintaining Trust in the Cloud https://www.ll.mit.edu/sites/default/files/publication/doc/2018-04/2016_12_07_SchearN_ACSAC_FP.pdf
[2] Keylime: Bootstrap and Maintain Trust on the Edge/Cloud and IoT https://github.com/keylime
Yuri Demchenko <y.demchenko=>uva.nl>

35

Supporting infrastructure for distributed data exchange scenarios when using IDS (Industrial Data Spaces) Trusted Connector.

This project will investigate the International Data Spaces Association (IDSA) Reference Architecture Model (RAM), the proposed IDS Connector, and its applicability to complex data exchange scenarios that involve multiple data sources/suppliers and multiple data consumers in a complex multi-staged data-centric workflow.
The project will assess the UCON library providing a native IDS Connector implementation, test it in a proposed scenario that supports one of the general use cases for secure and trusted data exchange, and identify the infrastructure components necessary to support the IDS Connector and RAM, such as trust management, data identification and lineage, multi-stage session management, etc.
References
[1] International Data Spaces Architecture Reference Architecture Model 3.0 (IDS-RAM) https://internationaldataspaces.org/ids-ram-3-0/
[2] IDS Connector Framework https://github.com/International-Data-Spaces-Association/IDS-Connector-Framework
https://www.dataspaces.fraunhofer.de/en/software/connector.html
Yuri Demchenko <y.demchenko=>uva.nl>


36

Security projects at KPN.

The following are some ideas for an RP that we would like to propose from KPN Security. We are also open to other ideas, as long as they are related to the proposed ones. To give a better impression, I added the "rough ideas" section as examples of topics we would be interested in supervising. We are more than happy to assist the students in finding the right angle for their research.

Info stealer landscape 2021
Create an overview of the info stealer landscape 2020/2021. What stealers are used, how do they work, what are the similarities, config extraction of samples, and how to detect the info stealers. We hope this could lead to something similar to https://azorult-tracker.net/, where data is published from automatically analyzing info stealers. An example of what can be used for that is OpenCTI (https://github.com/OpenCTI-Platform/opencti).

Hacked WordPress sites
In today’s threat landscape, several malicious groups, including REvil, Emotet, Qakbot and Dridex, are using compromised WordPress websites to aid in their operations. This RP would analyze how many of those vulnerable websites are out there, using OSINT techniques like urlscan.io, Shodan and RiskIQ. Identifying the vulnerable components, and whether they have already been hacked, would also help fight this problem. Ideally, a notification system is put in place to warn owners and hosting companies about their websites.

Rough ideas (freestyle)
  • Literature review of the state of the art of a given malware category (trojans, info stealers, ransomware, etc.); some examples:
  • Which cloud services are the most abused for distributing malware? (Pastebin, GitHub, Drive, Dropbox, etc.) URLHaus, public sandboxes and other sources could be starting points. (Curious about CDNs and social applications like Discord, Telegram and others.)
  • Looking at Raccine (https://github.com/Neo23x0/Raccine): what steps does ransomware take, and what possibilities are there to create other vaccines or to improve Raccine?
  • Building a non-detectable web scraper
    • A lot of the time, data from the darknet is available on a website with no option for an API/feed. These websites tend to have scraping detection in several ways, ranging from rate limiting to “human” behavior checks. What is the best way to scrape these types of websites in such a way that it is hard to impossible to detect that a bot is retrieving data? Can this be done while still maintaining a good pace of data retrieval?
  • Malware Aquarium
    • Inspired by XKCD: https://xkcd.com/350/. Can you create an open source malware aquarium? There are several challenges in how to set it up, how to get an infection going, keeping it contained, and how to keep track of everything (alerts on changes).
Jordi Scharloo <jordi.scharloo=>kpn.com>

Tom van Gorkom <tom.vangorkom=>os3.nl>

RP1
37

Assessing data remnants in modern smartphones after factory reset.

Description:

Factory reset is a function built into modern smartphones which restores the settings of a device to the state in which it was shipped from the factory. While user data becomes inaccessible through the device's user interface, research performed in 2018 reports that mobile forensic techniques can still recover old data even after a smartphone undergoes a factory reset.

In recent smartphones, however, multiple security measures have been implemented by vendors due to growing concerns over security and privacy. The implementation of encryption in particular is supposed to be effective for protecting user data from an attacker after a factory reset. In the meantime, its impact on the digital forensics domain has not yet been explored.

In this project, the effectiveness of factory reset against digital forensics will be evaluated using modern smartphones. Using the latest digital forensic techniques, data remnants in factory-reset smartphones will be investigated, and their implications for the forensic domain will be evaluated.

Zeno Geradts <zeno=>holmes.nl>
Aya Fukami <ayaf=>safeguardcyber.com>

Mattijs Blankesteijn <mblankesteijn=>os3.nl>


39

Vocal Fakes.

Deepfakes are in the news, especially those where real people are being copied. You see that really good deepfakes use doubles and voice actors. Audio deepfakes are not that good yet, and the available tools are mainly trained on the English language.
Voice clones can be used for good (for example, for ALS patients), but also for evil, such as in CEO fraud. It is important for the police to know the latest state of affairs, on the one hand to combat crime (think not only of fraud, but also of access systems where the voice is used as a biometric access control). But there are also applications where the police can use voice cloning themselves.
The central question is what the latest state of the technology is, specifically also for the Dutch language, who the most important players are, what the starting points for recognizing it are, and… to make a demo application with which the possibilities can be demonstrated.
On the internet, Corentin's real-time voice cloning project is promoted, with which you can create your own voice-cloning framework, so that you can also clone other people's voices. This repository on GitHub was open-sourced last year as an implementation of a research paper about a real-time working "vocoder". Perhaps a good starting point?
Zeno Geradts <zeno=>holmes.nl>




40

Web of Deepfakes.

According to the well-known magazine Wired, text synthesis is at least as great a threat as deepfakes. Thanks to a new language model, called GPT-3, it has now become much easier to analyze entered texts and generate variants and extensions in large volumes. This can be used for guessing passwords, automating social engineering, and many forms of scams (friend-in-need fraud) and extortion.
It is therefore not expected that this will be used to create incidents like deepfakes, but rather to create a web of lies, disguised as regular conversations on social media. This can also undermine the sincerity of online debate. Europol also warns against text synthesis because it allows the first steps of phishing and fraud to be fully automated.
A lot of money is also being invested in text synthesis by marketing and services: for chatbots, but also because you can tailor campaigns to the specific language use of your target group. This technology can also be used by criminals.
The central question is what the latest state of affairs is, who the most important players are, and what the starting points are for recognizing text synthesis in, for example, fraudulent emails/chats, and for (soon) distinguishing real people from chatbots. Perhaps it is interesting to build your own example in slang or for another domain?
Zeno Geradts <zeno=>holmes.nl>

Steef van Wooning <Steef.vanWooning=>os3.nl>
Danny Janssen <Danny.Janssen=>os3.nl>

RP2
42

Zero Trust architecture applications in the University ICT environment.

Traditionally, security in ICT is managed by creating zones in which everything is trusted to be secure, and security is seen as defending the inside from attacks originating from the outside. For that purpose, firewalls and intrusion detection systems are used. That model is considered broken. One reason is that a significant part of security incidents are inside jobs with grave consequences. Another reason is that even well-meaning insiders (employees) may inadvertently become the source of an incident because of phishing or brute-force hacking. For an organization such as the university, an additional problem is that an ever-changing population of students, (guest) researchers, educators and staff, with wildly varying functions and goals (education, teaching, research and basic operations), puts an enormous strain on the security and integrity of the ICT at the university. A radically different approach is to trust nothing and start from that viewpoint. This RP is to create an overview of the zero-trust literature and propose a feasible approach & architecture that can work at the university's scale of about 40,000 persons.
Roeland Reijers <r.reijers=>uva.nl>
Cees de Laat <C.T.A.M.deLaat=>uva.nl>




43

High-speed implementation of lightweight ciphers.

The crypto community is constantly trying to improve the ciphers that we use. AES has been the 'gold standard' for a very long time, yet it can be inefficient in applications that require very cheap encryption. To this end, the community developed 'lightweight' alternatives to it, and NIST launched a public competition to find good lightweight ciphers. In this project we are interested in a high-speed implementation of lightweight ciphers, utilizing the modern RISC-V architecture.
https://csrc.nist.gov/projects/lightweight-cryptography/finalists

This project involves:
-a couple of lectures on coding in RISC-V assembly
-the implementation of a lightweight cipher (e.g. the GIFT cipher) in assembly
-benchmarking the performance on a HiFive1 Rev B board
https://www.sifive.com/boards/hifive1-rev-b

Links/Literature:

A small intro to the RISC-V architecture:
https://www.youtube.com/watch?v=m8DqCTogb8w

High-speed cipher implementations in the RISC-V architecture:
-Efficient Cryptography on the RISC-V Architecture https://eprint.iacr.org/2019/794.pdf
-Fixslicing: A New GIFT Representation https://eprint.iacr.org/2020/412.pdf
-Fixslicing AES-like Ciphers https://eprint.iacr.org/2020/1123.pdf

The RISC-V instruction set:
https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
https://www.cs.sfu.ca/~ashriram/Courses/CS295/assets/notebooks/RISCV/RISCV_CARD.pdf

The GIFT cipher:
GIFT: a small present https://eprint.iacr.org/2017/622.pdf
Kostas Papagiannopoulos <k.papagiannopoulos=>uva.nl>

Gheorghe Pojoga <Gheorghe.Pojoga=>os3.nl>

RP1
44

Federated Authentication platform.

SURF operates a federated authentication platform which, amongst others, can interface with 4500 Identity Providers (universities etc.) from 73 countries, based on SAML 2.0. In the authentication flow, the Service Provider (SP) can ask the Identity Provider (IdP) to force the user to present a second factor during login. The SP does this by adding a specific value in the AuthnContextClassRef field. Unfortunately, the values for AuthnContextClassRef are not standardized. Especially in an international context with so many different actors, this poses a huge problem and causes strong authentication and second-factor logins to be disregarded in federated contexts, even though many IdPs support them. In this project, you will investigate possible solutions for this problem and build a proof of concept with your own mock federation consisting of an SP and IdPs from multiple vendors (in particular Microsoft ADFS/Azure and Shibboleth), implement the chosen approach, and determine whether this could be used in practice without interfering too much with the user experience.
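To make the field in question concrete, a small sketch generating the relevant SAML 2.0 element. The class-ref value shown is the REFEDS MFA profile (https://refeds.org/profile/mfa), one existing attempt at a standardised value, used here purely as an illustration of where the non-standardised value lives in the AuthnRequest:

```python
import xml.etree.ElementTree as ET

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"

# The SP embeds this element in its AuthnRequest to demand a second factor.
req = ET.Element(f"{{{SAMLP}}}RequestedAuthnContext", Comparison="exact")
ref = ET.SubElement(req, f"{{{SAML}}}AuthnContextClassRef")
ref.text = "https://refeds.org/profile/mfa"   # the non-standardised value at issue
print(ET.tostring(req, encoding="unicode"))
```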
Bas Zoetekouw <bas.zoetekouw=>surf.nl>

Hilco de Lathouder <hilco.delathouder=>os3.nl>

RP1
45

Researching the efficiency of Trend Micro's HAC-T algorithm.

HAC-T is an algorithm for clustering TLSH hashes that enables efficient (O(log n)) comparisons of TLSH hashes. Trend Micro recently published a Python implementation of the HAC-T algorithm, using scikit-learn.
The examples given for this implementation use ~50,000 TLSH hashes. The question is: does this particular implementation scale well enough to be used in production when looking at many millions of hashes?
A test set (either hashes of binary files, or pre-processed open source license texts) will be provided.

Techniques: Linux, Python, scikit-learn
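A minimal baseline for the scaling comparison, assuming the py-tlsh package: clustering over an explicit O(n^2) distance matrix, which is exactly what becomes infeasible at millions of hashes and what HAC-T's tree-based approach avoids. The thresholds are illustrative only.

```python
import numpy as np
import tlsh                      # py-tlsh, providing tlsh.diff()
from sklearn.cluster import DBSCAN

def cluster_tlsh(hashes, eps=30, min_samples=2):
    """Cluster TLSH digest strings via a precomputed pairwise distance matrix."""
    n = len(hashes)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = tlsh.diff(hashes[i], hashes[j])
    return DBSCAN(eps=eps, min_samples=min_samples,
                  metric="precomputed").fit_predict(dist)
```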


Armijn Hemel <armijn=>tjaldur.nl>

Tijmen van der Spijk <tspijk=>os3.nl>
Imre Fodi <Imre.Fodi=>os3.nl>

RP1
46

Towards a unified software package dependency resolution strategy.

To install code, software package management tools need to determine which dependent package of which version to install. Each ecosystem has evolved its own ways to deal with versioning and resolve dependencies.

The goal of this project is to:

- Inventory and document the many different ways dependencies are resolved today across ecosystems such as Maven/Java, RPM, Debian, npm, Rubygems, PyPI, Conda, R, Perl, Go, Dart, Rust, Swift, Eclipse, Conan and PHP.
- Propose and apply a dependency resolution classification based on the specific semantics of each resolution approach.
- Suggest a possible unified strategy for dependency resolution to "debabelize" package dependency resolution.

The research question is: Is a unified dependency resolution strategy attainable across all ecosystems?
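A small illustration of why the question is non-trivial: superficially similar constraints carry ecosystem-specific semantics. The sketch below, assuming the packaging library (pip's own implementation of PEP 440), contrasts a Python specifier with a hand-rolled approximation of npm's caret rule (deliberately simplified; real npm treats 0.x versions differently):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Python/PEP 440: the upper bound must be spelled out explicitly.
pep440 = SpecifierSet(">=1.2.3,<2.0.0")
print(Version("1.9.0") in pep440)   # True
print(Version("2.0.0") in pep440)   # False

def npm_caret_allows(spec: str, candidate: str) -> bool:
    """Minimal ^x.y.z rule: same major version, not below the base version."""
    base, cand = Version(spec.lstrip("^")), Version(candidate)
    return cand.major == base.major and cand >= base

print(npm_caret_allows("^1.2.3", "1.9.0"))   # True
print(npm_caret_allows("^1.2.3", "2.0.0"))   # False
```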
Philippe Ombredanne <pombredanne=>nexb.com>



47

In search of popularity and prominence metrics for software packages.

Software is consumed as packages, such as Maven/Java, RPM, Debian, npm and Rubygems. Each ecosystem typically offers a centralized package repository, though some are fully decentralized (such as Go). Determining the popularity and prominence of a software package within its ecosystem in a somewhat unbiased way is an unresolved issue, and goes well beyond just counting stars on GitHub.

The goal of this project is to:

- Inventory and research existing efforts to provide metric proxies, such as Libraries.io SourceRank, Sonatype MTTU, OpenSSF Criticality, and others
- Inventory and document the metrics and data that could be available in key ecosystems such as Maven/Java, RPM, Debian, npm, Rubygems, PyPI, Conda, R, Perl, Go, Dart, Rust, Swift, Eclipse, Conan and PHP.
- Propose new metrics directions and a validation process for a possible unified approach (or multiple ecosystem-specific approaches)

The research question is: How can the relative prominence and popularity of open source packages be ranked?

Bonus: write actual code to compute these.
Philippe Ombredanne <pombredanne=>nexb.com>


48

Towards an estimation of the volume of open source code.

There is no clear estimate of how much open source code there is in the whole wide world today. A simple count of the number of repositories on GitHub, or of the number of packages in an ecosystem such as at http://www.modulecounts.com/, provides an incomplete and likely misleading estimate, as packages may be of vastly different sizes (such as npm one-liner packages).

The goal of this project is to:
- Research existing metrics to use to quantify open source projects (such as number of packages, files, lines of code, etc) possibly specialized by ecosystem and language
- Propose new metrics directions  and a validation process to establish an improved estimate of the volume of open source
- Using existing available data from sources such as SWH, ClearlyDefined, Libraries.io, GitHub or past research projects, provide a rough estimation using these new metrics.

The research question is: How much open source code is there in the world?

Bonus: write actual code to compute these.
Philippe Ombredanne <pombredanne=>nexb.com>


49

Energy consumption of secure neural network inference using multi-party computation.

Secure neural network inference assumes the following setup. A service provider Bob offers neural network inference as a service, and a client Alice wants to use this service for a particular input. The aim is that Alice gets the output of the neural network for her input, but Alice should not learn anything about the parameters (weights, biases etc.) of the neural network, and Bob should not learn anything about the input provided by Alice.

Secure multi-party computation is a family of techniques that enable two or more actors to jointly evaluate a function on their private inputs without revealing anything but the function's output. Using secure multi-party computation to achieve secure neural network inference has been an active research area in recent years. The main research objective has been to make secure multi-party computation efficient enough.

In this project, the student(s) will work with software from Microsoft Research that implements secure neural network inference based on secure multi-party computation, available from https://github.com/mpc-msri/EzPC and described in [1-3]. While these papers showed that secure multi-party computation has the potential to efficiently perform secure neural network inference, the evaluation in these papers was limited to efficiency (latency, amount of data transfer). However, energy consumption is a growing concern for multiple reasons, including environmental impact, energy costs, and the limited energy of battery-powered devices. The aim of this project is to evaluate the energy consumption of secure neural network inference.

Specific questions to answer in the project may include the following:
1. How much energy does secure neural network inference consume on the client side, on the server side, and in the network?
2. What factors influence the energy consumption?
3. From the methods available in the software library, which is the most energy-efficient?
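
On Linux, a first approximation of client- or server-side energy use can be read from the RAPL counters; a minimal sketch (assuming a host exposing the powercap interface; the domain index differs per machine and reading may require elevated privileges):

    import time

    RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package-0 domain

    def read_uj() -> int:
        with open(RAPL) as f:
            return int(f.read())

    before = read_uj()
    time.sleep(1.0)        # replace with one secure-inference run
    after = read_uj()
    # NB: the counter wraps around; real code must consult max_energy_range_uj.
    print(f"~{(after - before) / 1e6:.3f} J on the package domain")

Energy consumed in the network would need a different methodology, e.g. modelling energy per transferred byte.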


References

[1] Nishant Kumar, Mayank Rathee, Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma. CrypTFlow: Secure TensorFlow Inference. 2020 IEEE Symposium on Security and Privacy (SP 2020), pp. 336-353, 2020
[2] Deevashwer Rathee, Mayank Rathee, Nishant Kumar, Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma. CrypTFlow2: Practical 2-Party Secure Inference. 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS '20), pp. 325-342, 2020
[3] Deevashwer Rathee, Mayank Rathee, Rahul Kranti Kiran Goli, Divya Gupta, Rahul Sharma, Nishanth Chandran, Aseem Rastogi. SiRnn: A math library for secure RNN inference. 2021 IEEE Symposium on Security and Privacy (SP 2021), pp. 1003-1020, 2021

Zoltan Mann <z.a.mann=>uva.nl>
Daphne Chabal <d.n.m.s.chabal=>uva.nl>



50

Key Management as a Service for Blockchain Access Control Applications.

Project description: Blockchain is a distributed database that is shared and synchronised across a peer-to-peer network with no single or central control point. A fundamental aspect of blockchain is the transparency of the transactions on the ledger, which ensures their validity and auditability. Transparency is a core feature of blockchain that leads to a challenge: because some data we want to store on the blockchain are sensitive, we may not want to expose them to other peers in the network. Some blockchain solutions suggest that the answer to this problem is to store sensitive data off the blockchain altogether. Those solutions use smart contract facilities to enable access control for the sensitive data stored off-chain. However, as good as these facilities might be, access control alone will not address data protection requirements such as confidentiality and integrity. Thus, such solutions additionally need to encrypt sensitive data off-chain and manage the keys.

Research proposal: Your research will investigate the existing key management mechanisms hosted on a decentralised platform to enable users to manage the private keys required to protect sensitive data off-chain on the blockchain applications. Then, you will develop and implement a mechanism that makes encryption keys available to the users and smart contracts that need them. The mechanism also restricts access to encryption keys, granting and revoking access to these keys. This project would extend our proof-of-concept access control protocol based on smart contracts to access control for sensitive data.
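
As a generic illustration of the pattern involved (envelope encryption: a per-record data key wrapped by a key-encryption key), a minimal sketch assuming the cryptography package; this is not the project's specific protocol:

    from cryptography.fernet import Fernet

    kek = Fernet.generate_key()    # key-encryption key, held by the key manager
    dek = Fernet.generate_key()    # data-encryption key, per record

    ciphertext = Fernet(dek).encrypt(b"sensitive off-chain record")
    wrapped_dek = Fernet(kek).encrypt(dek)   # only the wrapped DEK is stored

    # Granting access = releasing (or re-wrapping) the DEK; revoking access in
    # practice means rotating the DEK and re-encrypting, since a key once
    # released cannot be recalled.
    recovered = Fernet(Fernet(kek).decrypt(wrapped_dek)).decrypt(ciphertext)
    assert recovered == b"sensitive off-chain record"

The research question is then how to host the key-manager role on a decentralised platform and tie the grant/revoke operations to the smart contracts.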
Marcela Tuler de Oliveira <m.tuler=>amsterdamumc.nl>
Dr. Silvia Olabarriaga <s.d.olabarriaga=>amsterdamumc.nl>



51

Modeling of medical data access logs for understanding and detecting potential privacy breach.

Healthcare organizations keep electronic medical records (EMRs) that provide information for patient care. These organizations have the legal duty of safeguarding access to EMRs by establishing procedures to control, track and monitor all accesses to the data, as well as to detect and act upon intrusions. Extensive data access logs need to be analysed regularly to detect illegitimate data access actions, but such analysis is challenging due to the volume of the logs and the difficulty of recognizing such rare events. Moreover, log data are extremely sensitive because they contain references to patients, employees and organizations. This hampers access to such log data for research purposes, for example, to develop machine learning methods that can aid in the detection of illegitimate events.

In this project we aim at taking initial steps for understanding and modelling the statistical properties of medical data access logs with the goal of developing a computational model that enables the generation of synthetic datasets to aid in the development of new approaches for intrusion detection in such logs. The structure of the logs available at the Amsterdam UMC - location AMC will be used as a starting point for the modelling.

The code of the model will be published as open source at the end of the project.
Dr. Silvia Olabarriaga <s.d.olabarriaga=>amsterdamumc.nl>
Zwinderman, A.H. (Koos)<a.h.zwinderman=>amsterdamumc.nl>

Luc Vink <luc.vink=>os3.nl>

1
52

Automated Incident Response in the Cloud: An Environment Agnostic Solution in AWS.

Organizations are moving to the cloud; some organizations go for a hybrid setup and some go full cloud. From an incident response perspective the cloud offers some great possibilities (and challenges). For this research we are looking for one or more students that want to dive into one of the big three clouds (Microsoft Azure, Amazon AWS or Google Cloud Platform) to research how the cloud can be leveraged for incident response automation. Often during an investigation you need to acquire data from a cloud environment. What we are interested in is whether we can leverage serverless functions like AWS Lambda or Azure Functions to efficiently acquire and process data. The end goal is a PoC with several serverless functions that can be used for incident response cases. Together we can scope this into something manageable for the project time period.
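
As an indication of direction, acquisition steps can be expressed as small serverless functions; a minimal sketch of one AWS Lambda handler (boto3 ships with the Lambda runtime; the volume ID, region and case ID are placeholders, and the function role must be allowed to create snapshots):

    import boto3

    def handler(event, context):
        ec2 = boto3.client("ec2", region_name=event.get("region", "eu-west-1"))
        snap = ec2.create_snapshot(
            VolumeId=event["volume_id"],    # volume of the suspect instance
            Description="IR acquisition " + event.get("case_id", "unknown"),
        )
        # Follow-up functions could copy the snapshot to an evidence account
        # and hash the exported data for chain of custody.
        return {"snapshot_id": snap["SnapshotId"]}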

Korstiaan Stam <korstiaan=>invictus-ir.com>

Antonio Macovei <Antonio.Macovei=>os3.nl>
Rares Bratean <Rares.Bratean=>os3.nl>

1
53

Cloud forensics of Docker containers on Amazon AWS.

One of the challenges of the cloud from an incident response or forensics perspective is the volatility of data. This is especially challenging for cloud environments that make use of containers such as Kubernetes/Docker. In this research we are looking for someone to investigate what options an investigator has when it comes to investigating container-based systems. Ideally at the end of the research you can answer questions related to the availability, content, and acquisition methods of containers in the cloud.
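
For a still-running container, one acquisition option is streaming its filesystem out as a tar archive; a minimal sketch (assuming the Docker SDK for Python and access to the Docker socket of the host under investigation; the container name is a placeholder):

    import hashlib
    import docker

    client = docker.from_env()
    container = client.containers.get("suspect-container")

    sha256 = hashlib.sha256()
    with open("suspect.tar", "wb") as out:
        for chunk in container.export():   # streamed tar of the container fs
            out.write(chunk)
            sha256.update(chunk)
    print("evidence hash:", sha256.hexdigest())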

Korstiaan Stam <korstiaan=>invictus-ir.com>

Artemis Mytilinaios <amytilinaios=>os3.nl>

2
54

Forensic analysis of Google Workspace evidence.

Google Workspace is the suite used by many organizations around the world for email and productivity tooling. As such investigating a Google Workspace for possible misuse, insider threat or a Business Email Compromise (BEC) attack is becoming more common. The main evidence for Google Workspace is stored in Google Workspace Audit Logs. We are looking for a student that wants to create a method for forensic analysis of those logs. During the research you will have a test environment where you can simulate attacks. We want you to come up with a forensic analysis method for identifying attacks based on the available audit logs. We want to publish this research to the world, and this is your chance to participate in that effort.
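
The raw material for such a method is available through the Reports API; a minimal sketch (assuming google-api-python-client and google-auth with domain-wide delegated service-account credentials; the key file, admin address and the "login" application are placeholders/examples):

    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        "sa.json",
        scopes=["https://www.googleapis.com/auth/admin.reports.audit.readonly"],
    ).with_subject("admin@example.com")

    service = build("admin", "reports_v1", credentials=creds)
    resp = service.activities().list(
        userKey="all", applicationName="login", maxResults=100
    ).execute()
    for activity in resp.get("items", []):
        # e.g. flag logins from unfamiliar IPs or suspicious event names
        print(activity["id"]["time"], activity["actor"].get("email"),
              [e["name"] for e in activity.get("events", [])])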

Korstiaan Stam <korstiaan=>invictus-ir.com>

Bert-Jan Pals
<bpals=>os3.nl>
Greg Charitonos <gcharitonos=>os3.nl>

1
55

Poisoning Attacks against LDP-based Federated Learning.

Federated learning is a collaborative learning infrastructure in which the data owners do not need to share raw data with one another or rely on a single trusted entity. Instead, the data owners jointly train a Machine Learning model by executing the model locally on their own data and only sharing the model parameters with the aggregator.
While the participants only share the updated parameters, some private information about the underlying data can still be revealed from the shared parameters. To address this issue, Local Differential Privacy has been used as an effective tool to protect against information leakage over shared parameters in Federated Learning, known as LDP-FED. However, it has not yet been investigated whether (and to what extent) LDP-FED is resistant against data and model poisoning attacks. Also, if LDP-FED is not resistant against these attacks, how can we design a robust LDP-FED whose performance is only negligibly affected by poisoning attacks?
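
For intuition, the LDP primitive at stake can be as simple as randomized response on a single bit; a minimal sketch (stdlib only; epsilon and the data are illustrative), which also shows why poisoned reports can skew the aggregate:

    import math
    import random

    def perturb(bit: int, eps: float) -> int:
        # Report the true bit with prob e^eps/(e^eps+1), else flip it.
        p = math.exp(eps) / (math.exp(eps) + 1)
        return bit if random.random() < p else 1 - bit

    def estimate(reports, eps):
        # Unbiased aggregator-side estimate of the frequency of 1s.
        p = math.exp(eps) / (math.exp(eps) + 1)
        observed = sum(reports) / len(reports)
        return (observed + p - 1) / (2 * p - 1)

    true_bits = [1] * 300 + [0] * 700
    reports = [perturb(b, eps=1.0) for b in true_bits]
    print(estimate(reports, 1.0))   # ~0.3; attackers injecting crafted
                                    # reports can shift this estimate at will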

This project aims to evaluate the resistance of the LDP-FED against poisoning attacks and to explore the possibilities of reducing the success rate of these attacks. The following papers are suggested to be studied for this work:

1. Stacey Truex, Ling Liu, Ka-Ho Chow, Mehmet Emre Gursoy, Wenqi Wei; LDP-Fed: Federated Learning with Local Differential Privacy, CoRR, 2020.

2. Mohammad Naseri, Jamie Hayes, and Emiliano De Cristofaro; Toward Robustness and Privacy in Federated Learning: Experimenting with Local and Central Differential Privacy, CoRR, 2020.

3. Lingjuan Lyu, Han Yu, Xingjun Ma, Lichao Sun, Jun Zhao, Qiang Yan, Philip S. Yu, Privacy and Robustness in Federated Learning: Attacks and Defenses, arXiv, 2020.

4. Malhar Jere, Tyler Farnan, and Farinaz Koushanfar; A Taxonomy of Attacks on Federated Learning, IEEE Security & Privacy, 2021.

5. Xiaoyu Cao, Jinyuan Jia, Neil Zhenqiang Gong, Data Poisoning Attacks to Local Differential Privacy Protocols, CoRR, 2019.

6. Minghong Fang, Xiaoyu Cao, Jinyuan Jia, Neil Zhenqiang Gong; Local Model Poisoning Attacks to Byzantine-Robust Federated Learning, the 29th Usenix Security Symposium, 2020.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>



56

Privacy by Design in Smart Cities.

The smart city has emerged as a new paradigm aiming to provide citizens with better facilities and quality of life in terms of transportation, healthcare, environment, entertainment, education, and energy. To this end, a smart city monitors the physical world in real time and collects data from sensing devices, heterogeneous networks, and RFID devices. As cities become smarter, the security and privacy of citizens are increasingly threatened, and these threats need to be carefully addressed. Accordingly, it is of crucial importance to understand the privacy threats in smart cities so that researchers, stakeholders, and engineers can design a privacy-friendly smart city.

This project aims to explore the existing and future privacy threats in a smart city and how they can be addressed by design.
The following papers are suggested to be studied for this work:

1. Mehdi Sookhak, Helen Tang, Ying He, F. Richard Yu; Security and Privacy of Smart Cities: A Survey, Research Issues and Challenges, IEEE Communications Surveys & Tutorials, 2019.

2. David Eckhoff, Isabel Wagner; Privacy in the Smart City-Applications, Technologies, Challenges, and Solutions, IEEE Communications Surveys & Tutorials, 2018.

3. Kuan Zhang, Jianbing Ni, Kan Yang, Xiaohui Liang, Ju Ren, and Xuemin (Sherman) Shen; Security and Privacy in Smart City Applications: Challenges and Solutions, IEEE Communications Magazine, 2017.


Mina Alishahi <mina.sheikhalishahi=>ou.nl>

Babak Rashidi <brashidi=>os3.nl>
Cesar Panaijo <cpanaijo=>os3.nl>

1
57

Privacy Preserving k-means/k-median Distributed Learning using Local Differential Privacy.

The classical clustering algorithms were designed to be run on a central server. However, in recent years data is often located at distributed sites in different locations. Due to privacy concerns, the data owners are unwilling to share their original data with the public or even with each other. Several approaches have been proposed in the literature that protect the data owners' privacy while a clustering algorithm is shaped over the protected data.
Given that the proposed solutions mainly require the presence of a trusted party, Local Differential Privacy (LDP) can be used as an effective alternative that protects each data owner's data on their own local device.

This study aims to investigate the application of LDP in the distributed learning of two well-known clustering algorithms, namely k-means and k-medians, in terms of utility loss and privacy leakage. Specifically, it explores the resistance of LDP-based k-means/k-medians clustering against poisoning attacks.
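
A minimal baseline for such an investigation (assuming numpy and scikit-learn; Laplace input perturbation stands in here for a full LDP protocol such as the one in paper 3 below):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    true = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])

    eps = 1.0
    sensitivity = 1.0      # assumes coordinates are clipped to a unit range
    noisy = true + rng.laplace(0, sensitivity / eps, true.shape)  # perturbed on-device

    centres = KMeans(n_clusters=2, n_init=10).fit(noisy).cluster_centers_
    print(centres)   # utility loss = drift of these centres vs. the true ones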

The following papers are suggested to be studied for this work:

1. Maria Florina Balcan, Steven Ehrlich, Yingyu Liang; Distributed k-Means and k-Median Clustering on General Topologies, NIPS, 2013.

2. Geetha Jagannathan, Rebecca N. Wright; Privacy-Preserving Distributed k-Means Clustering over Arbitrarily Partitioned Data, ACM SIGKDD, 2005.

3. Chang Xia, Jingyu Hua, Wei Tong, Sheng Zhong; Distributed K-Means clustering guaranteeing local differential privacy, Computer & Security journal, 2020.

4. Pathum Chamikara, Mahawaga Arachchige, Peter Bertok, Ibrahim Khalil, Dongxi Liu, Seyit Camtepe, Mohammed Atiquzzaman; Local Differential Privacy for Deep Learning, IEEE Internet of Things Journal, 2020.

5. Malhar Jere, Tyler Farnan, and Farinaz Koushanfar; A Taxonomy of Attacks on Federated Learning, IEEE Security & Privacy, 2021.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>

58

Scalable Blockchain-based framework in Internet of Things (IoT).

The Internet of Things (IoT) is an ever-growing technology in which many of our daily objects are connected via the Internet (or other networks) and transfer data for analysis or for performing certain tasks. This property makes the IoT vulnerable to security threats, which need to be addressed to build trust among clients.
Blockchain, a technology born with cryptocurrency, has shown its effectiveness and robustness against some security threats when integrated with the IoT. Despite its capabilities, the main issue of integrating blockchain with the IoT is its scalability and efficiency in a large-scale network like the IoT.
In this project we aim to explore the existing solutions addressing the scalability of blockchain in the IoT and to investigate the possibilities of improving on them by proposing new solutions.

The following papers are suggested to be studied for this work:

1. Hong-Ning Dai, Zibin Zheng, Yan Zhang; Blockchain for Internet of Things: A Survey, IEEE Internet of Things Journal, VOL. 6, NO. 5, 2019.

2. Hany F. Atlam, Muhammad Ajmal Azad, Ahmed G. Alzahrani, Gary Wills; A Review of Blockchain in Internet of Things and AI, Big Data and Cognitive Computing, MDPI, 2020.

3. Tiago M. Fernandez-Carames, Paula Fraga-Lamas; A Review on the Use of Blockchain for the Internet of Things, IEEE Access, 2020.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>

59

Deep Learning for Partial Image Encryption.

Face recognition has increasingly gained importance for a variety of applications, e.g., surveillance in public places, access control in organizations, photo tagging in social networks, and border control at airports. The widespread application of face recognition, however, raises privacy risks, as individuals' biometric information can be used to profile and track people against their will.
A typical solution to this problem is the application of Homomorphic Encryption, where an encrypted image is checked against a list of images for a possible match. However, this solution is heavy in terms of both computation and communication costs, as it requires all of an image's pixels to be encrypted, while not all pixels of an image contain privacy-sensitive information.

In this project, we plan to investigate the application of Deep Learning in detecting the user's identifiable pixels (instead of all pixels) for partial encryption of an image.
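
A minimal sketch of the intended pipeline (using OpenCV's bundled Haar cascade as a stand-in for the deep-learning detector the project would build, and Fernet as a stand-in for the encryption step; the image file is a placeholder):

    import cv2
    from cryptography.fernet import Fernet

    img = cv2.imread("photo.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    key = Fernet.generate_key()
    regions = []
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        face = img[y:y+h, x:x+w].tobytes()
        regions.append(((x, y, w, h), Fernet(key).encrypt(face)))
        img[y:y+h, x:x+w] = 0            # blank the sensitive pixels in place

    cv2.imwrite("photo_partial.jpg", img)  # non-sensitive pixels stay plaintext

Only the detected regions would then need (homomorphic) encryption, which is the claimed saving in computation and communication.
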
1. Zekeriya Erkin, Martin Franz, Jorge Guajardo, Stefan Katzenbeisser, Inald Lagendijk, Tomas Toft; Privacy-Preserving Face Recognition, International Symposium on Privacy Enhancing Technologies Symposium, 2009.

2. Peiyang He, Charlie Griffin, Krzysztof Kacprzyk, Artjom Joosen, Michael Collyer, Aleksandar Shtedritski, Yuki M. Asano; Privacy-preserving Object Detection, arXiv, 2021.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>

Carmen Veenker <c.m.i.veenker=>uva.nl>
Danny Opdam <dopdam=>os3.nl>

1
60

Deep Learning for Detecting Network Traffic Attack.

The expansion of new communication technologies and services, along with an increasing number of interconnected network devices, web users, services, and applications, is making computer networks ever larger and more complex systems. Network anomalies pose significant challenges to many online services whose performance depends heavily on network performance. For instance, a faulty airport network caused a nine-hour delay of all its flights in 2007. To address the issues related to network anomalies, security solutions need to analyze, detect, and stop such attacks in real time. Although there is a significant amount of technical and scientific literature on anomaly detection methods for network traffic, still 1) a newly generated (simulated) dataset that contains a wide range of network attacks (detectable through network traffic monitoring) is missing; 2) the valuable step of feature selection is often underrepresented and treated inattentively in the literature; and 3) the detection techniques suffer from considerable false-positive rates.

The aim of this project is to address these issues by analyzing network traffic using Deep Learning.

The following papers are suggested to be studied for this project:

1. A. Kind, M. P. Stoecklin, and X. Dimitropoulos; Histogram-based traffic anomaly detection, IEEE Transactions on Network and Service Management, vol. 6, no. 2, 2009.

2. R. Chapaneri and S. Shah; A comprehensive survey of machine learning based network intrusion detection, in Smart Intelligent Computing and Applications, S. C. Satapathy, V. Bhateja, and S. Das, Eds. Springer, 2019.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>

61

Local Differential Privacy (LDP) in Protecting the Privacy of the Internet of Things (IoT).

The Internet of Things (IoT) comprises physical objects with sensors, software and processing technologies, which connect and exchange data with other devices and systems over the Internet or other communication networks. One of the main challenges in the Internet of Things is the users' privacy. While several approaches have been proposed in the literature to protect information privacy in the IoT, a thorough analysis of the application of Local Differential Privacy (LDP) in this setting is still missing. LDP offers a strong level of privacy, in which individuals perturb their data locally before sending them to a third party (named the aggregator). This means that LDP eliminates the need for a trusted party in the middle. In this project, we aim to investigate the application of LDP in protecting the privacy of IoT data, while the results of statistical analyses over the protected data remain practically useful.

The following papers are suggested to be studied for this project:
1. Chao Li, Balaji Palanisamy; Privacy in Internet of Things: From Principles to Technologies, IEEE Internet Of Things Journal, VOL. 6, NO. 1, 2019.

2. Diego Mendez, Ioannis Papapanagiotou, Baijian Yang; Internet of Things: Survey on Security and Privacy, https://arxiv.org/pdf/1707.01879.pdf, 2017.

3. Mengmeng Yang, Lingjuan Lyu, Jun Zhao, Tianqing Zhu, Kwok-Yan Lam; Local Differential Privacy and Its Applications: A Comprehensive Survey, https://arxiv.org/pdf/2008.03686.pdf, 2015.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>

62

Game Theory Meets Privacy-preserving Distributed Learning.

Companies, organizations, and even individuals find mutual benefits in sharing their own information to make better decisions or to increase their revenues. However, generally due to privacy concerns, the data holders are unwilling to share their own tables of data, but are interested in getting information from other parties' data. Hence, it is an essential task to define a platform in which several aspects of data sharing come under consideration and, through a game-theoretic approach, all parties relax their privacy requirements as much as possible to obtain a more effective output.

In this project, we plan to define data sharing as a game in which several aspects are considered: 1) the value of the shared data (freshness, size, etc.), 2) privacy gain (in terms of anonymization, differential privacy, etc.), 3) trust or reputation, and 4) the utility of the result. The outcome of the game is a Nash equilibrium that achieves the best balance between utility and privacy.

The following papers are suggested to be studied for this project:
1. Ningning Ding, Zhixuan Fang, Jianwei Huang; Incentive Mechanism Design for Federated Learning with Multi-Dimensional Private Information, 18th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT), 2020.

2. Yufeng Zhan, Jie Zhang, Zicong Hong, Leijie Wu, Peng Li, Song Guo; A Survey of Incentive Mechanism Design for Federated Learning, IEEE Transactions on Emerging Topics in Computing, 2021.

3. Ningning Ding; Zhixuan Fang; Lingjie Duan; Jianwei Huang; Incentive Mechanism Design for Distributed Coded Machine Learning, IEEE Conference on Computer Communications (InfoComm), 2021.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>

63

The Output Privacy of Collaborative Classifiers Learning.

Privacy-preserving data mining has focused on obtaining valid results when the input data is private. For example, secure multi-party computation techniques are utilized to construct a data-mining algorithm over whole distributed data without revealing the original data. However, these approaches might still leave potential privacy breaches, e.g., by looking at the structure of a decision tree constructed on the protected shared data.

The aim of this project is to investigate how the output of a classifier constructed collaboratively over private data violates the input data privacy. We then plan to propose solutions to reduce the privacy leakage in this setting.

The following papers are suggested to be studied for this project:

1. Qi Jia, Linke Guo, Zhanpeng Jin, Yuguang Fang; Preserving Model Privacy for Machine Learning in Distributed Systems, IEEE Transactions on Parallel and Distributed Systems, 2018.

2. Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov; Membership Inference Attacks Against Machine Learning Models, IEEE Symposium on Security and Privacy, 2017.

3. Ting Wang, Ling Liu, Output Privacy in Data Mining, ACM Transactions on Database Systems, 2011.

4. Radhika Kotecha, Sanjay Garg; Preserving output-privacy in data stream classification, Progress in Artificial Intelligence, June 2017, Volume 6, Issue 2, pp 8710.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>

64

On the Trade-off between Utility Loss and Privacy Gain in LDP-based Distributed Learning.

Local Differential Privacy (LDP) is a notion of privacy that provides a very strong privacy guarantee by protecting confidential information on the users' side. In this setting, users employ a randomization mechanism to properly perturb their data on their own devices. The collected data, when aggregated, preserve some statistical properties, e.g., a mean value can be computed from the perturbed data. This interesting property of LDP has led to its wide application in many real-world scenarios. In particular, it has been used as an effective tool in privacy-preserving distributed machine learning. However, a thorough analysis of the trade-off between utility loss and privacy gain in LDP-based distributed learning is missing.

In this project we plan to investigate the utility-privacy trade-offs in learning some well-known classifiers when they are trained on distributed data respecting LDP.
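
To make the trade-off concrete, a minimal sketch (numpy only) of mean estimation under Laplace-based LDP across a sweep of epsilon values:

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.uniform(0, 1, 10_000)    # each value held by one client

    for eps in [0.1, 0.5, 1.0, 2.0, 8.0]:
        noisy = data + rng.laplace(0, 1.0 / eps, data.size)  # sensitivity 1
        err = abs(noisy.mean() - data.mean())
        print(f"eps={eps:>4}: |mean error| = {err:.4f}")
    # smaller eps -> stronger privacy -> larger utility loss
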
The following papers are suggested to be studied for this project:

1. Emre Yilmaz, Mohammad Al-Rubaie, Morris Chang; Locally Differentially Private Naive Bayes Classification, https://arxiv.org/pdf/1905.01039.pdf, 2019.

2. Mengmeng Yang, Lingjuan Lyu, Jun Zhao, Tianqing Zhu, Kwok-Yan Lam; Local Differential Privacy and Its Applications: A  Comprehensive Survey, https://arxiv.org/pdf/2008.03686.pdf, 2015.

3. Mario S. Alvim, Miguel E. Andres, Konstantinos Chatzikokolakis, Pierpaolo Degano, Catuscia Palamidessi; Differential Privacy: on the trade-off between Utility and Information Leakage, 2011.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>

65

Deep Learning for Private Text Generation.

The recent development of Deep Learning has led to its success in tasks related to text processing. In particular, Recurrent Neural Networks (specifically LSTMs) have served as effective tools for next-word prediction. However, the application of Deep Learning in 1) generating text that respects some privacy constraints and 2) predicting the next word in a sentence in a way that protects confidential information is currently missing.

In this project, we plan to employ Deep Learning as a tool to detect the words and sentences that cause privacy violations by uniquely identifying a person (or other confidential information linked to a person) in a text, and to replace them with meaningful substitute words. We also plan to design an LSTM that suggests next words while taking text privacy protection into account.

1. Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, Jianfeng Gao; Deep Learningbased Text Classification: A Comprehensive Review, ACM Computing Surveys, 2021.

2. Ankush Chatterjee, Umang Gupta, Manoj Kumar Chinnakotla, Radhakrishnan Srikanth, Michel Galley, Puneet Agrawal; Understanding Emotions in Text Using Deep Learning and Big Data; Computers in Human Behavior, 2019.

3. Hong Liang, Xiao Sun, Yunlei Sun & Yuan Gao; Text feature extraction based on deep learning: a review, EURASIP Journal on Wireless Communications and Networking volume, 2017.

4. Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Francoise Beaufays, Sean Augenstein, Hubert Eichner; Federated Learning for Mobile Keyboard Prediction, 2019.

Mina Alishahi <mina.sheikhalishahi=>ou.nl>




66

Automatically calculate the financial footprint of an application (container) inside the (public) cloud.

Background information
Using (public) cloud resources is becoming a more popular option today [1]. However, moving (part of) the infrastructure/applications to a cloud environment introduces several challenges; one of these is determining the financial footprint of said applications. Many cloud providers provide tools to determine a (rough) estimate for moving to the cloud (e.g. AWS Pricing Calculator). However, much of the information needed to determine the cost must be entered manually into such tools. Having a tool/framework that allows automatic calculation/estimation of a (web) application's costs would provide valuable and more accurate insight for companies that want to move to the cloud.

Problem Description
Currently it is hard to determine up front what the effective costs of an application will be inside the public cloud based on static cost analysis. Effective application behaviour cannot easily be taken into account up front, which could be a blocking factor for a client to migrate to the cloud or even to adopt a cloud (native) strategy.

Research
Determine the feasibility of developing a method/framework to automatically determine the effective financial footprint of an application in the public cloud, and prove it with a Proof of Concept.

Scope suggestions and requirements on Proof of Concept implementation:
- an application is already containerized
- an application is not (yet) containerized
- an input/output specification is part of the framework, to generate input for the application to start generating CPU cycles (inside the public cloud) and to fetch the bill from e.g. AWS after X amount of time.

[1] https://www.gartner.com/en/newsroom/press-releases/2021-04-21-gartner-forecasts-worldwide-public-cloud-end-user-spending-to-grow-23-percent-in-2021
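
As an indication of the billing side of such a framework, incurred costs can be retrieved programmatically; a minimal sketch (assuming boto3, Cost Explorer enabled on the account, and a placeholder cost-allocation tag on the deployed test application):

    import boto3

    ce = boto3.client("ce", region_name="us-east-1")
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": "2022-01-01", "End": "2022-01-08"},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        Filter={"Tags": {"Key": "app", "Values": ["footprint-poc"]}},
    )
    for day in resp["ResultsByTime"]:
        print(day["TimePeriod"]["Start"],
              day["Total"]["UnblendedCost"]["Amount"])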

Maurice Mouw <maurice.mouw=>sue.nl>
Serge van Namen <serge.van.namen=>sue.nl>


67

Future proofing networks: On core routing and SRv6

IPv6 was introduced in 1998 and is intended to be the successor of IPv4. The address space of IPv6 is 2^96 times larger than that of IPv4, and the transition to IPv6 has now been going on for two decades.

Where operators have the ability to move away from MPLS, SRv6 might become an alternative. LDP is used over IPv4/IPv6 to exchange labels in MPLS environments; how can this task be accomplished in SRv6? Would it be possible to operate both stacks side by side while moving away from MPLS towards SRv6? How does 4PE differ from the SRv6 technology, and can they be related at all? And does SRv6 have the same key capabilities as MPLS, like L2VPNs? It is interesting to look at how SRv6 can be implemented in existing environments instead of focusing on greenfield situations.

This project focuses on how routing IPv4 traffic over SRv6 works. Additionally, it will involve route exchange mechanisms using MP-BGP and the difference between the OSPF and IS-IS extensions for SRv6. This research project will answer the question whether the switch from MPLS towards SRv6 would be feasible in an existing environment while providing comparable features.
Ruben Valke <ruben=>2hip.nl>

Sander Post <sander.post=>os3.nl>
Krik van der Vinne <krik.vandervinne=>os3.nl>

1
68

Post-Exploitation Defence of Git Repositories using Honey Tokens

It is not unheard of that secrets leak through open-source Git repositories or similar version control software. Many tools have been developed that assist in detecting secrets that are accidentally pushed to the repository, notifying the developer upon detection. However, much less attention seems to be given to incidents in which the leak is another type of sensitive data: the complete content of the repository itself. This study aims to shed some light on this issue by determining whether an additional level of security can be added to Git repositories in the form of honey tokens. Adding honey tokens to Git repositories means creating a Defense-in-Depth measure that raises an alarm once a repository is cloned or viewed. Moreover, the possibilities of trip-wiring repositories with honey tokens are reviewed by considering the applicability, usability and effectiveness of the created tokens along with the options of using these notifications to start Incident Response workflows. Overall, the study presents a way for security teams to create 'tokened repositories' as a last line of defense for compromised credentials to Git repositories.
Melanie Rieback <melanie=>radicallyopensecurity.com>

Max van der Horst <Max.vanderHorst=>os3.nl>

1
69

Audio as an entropy source for true random number generation.

There are many fields that rely heavily on generating nondeterministic and unpredictable sequences of numbers. True random number generators (TRNGs) try to achieve this by extracting randomness - sometimes also called entropy - from some type of physical source. A wide variety of these physical sources is available to choose from. Thermal noise, natural clock jitter, and keyboard timing and mouse movements - used by Intel, AMD, and Linux, respectively - are all examples of physical sources to extract randomness from. Sound is another interesting physical source from which randomness could be extracted. This research will focus on how random the numbers from a sound-based TRNG are, how these numbers compare to numbers supplied by other forms of random number generation, and how viable the practical use of a sound-based TRNG is when looking at efficiency and throughput.
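
A minimal sketch of the extraction step (stdlib only; assumes a 16-bit mono WAV recorded from a noisy input, file name is a placeholder): take the sample LSBs as raw entropy and debias them with a von Neumann extractor before any statistical testing:

    import struct
    import wave

    with wave.open("noise.wav", "rb") as w:
        raw = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)

    bits = [s & 1 for s in samples]             # least significant bits

    def von_neumann(bits):
        # 01 -> 0, 10 -> 1, 00/11 discarded
        return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

    unbiased = von_neumann(bits)
    print(len(bits), "raw bits ->", len(unbiased), "debiased bits")

The resulting stream could then be put through batteries such as NIST SP 800-22 to quantify randomness, and the raw-to-debiased ratio gives a first throughput figure.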
Taco Walstra <t.r.walstra=>uva.nl>

Oscar Muller <Oscar.Muller=>os3.nl>

1
70

Bring your own Living off the Land binaries

In the past, malware has repeatedly found and actively used new techniques to perform malicious activities undetected. Living Off The Land Binaries and Scripts (LOLBAS) are Microsoft-signed files, either native to the OS or downloaded from Microsoft, that sometimes include extra "unexpected" functionality, deliberately left undocumented, which can be misused for malware or red teaming. Example use cases are: executing code, file operations such as downloading, uploading and copying, persistence, UAC bypass, dumping process memory and/or DLL side-loading. Various types of enterprise software simplify maintainability and are therefore widely installed by sysadmins. This research focuses on whether a set of common, trusted third-party enterprise applications that unpack binaries and libraries during installation can be misused for malicious activities.
Roy Duisters <roy.duisters=>shell.com>

Vincent Denneman <vincent.denneman=>os3.nl>

1
71

Development of an open source malicious network traffic generator based on MITRE ATT&CK

Currently, most network intrusion detection systems incorporate artificial intelligence, specifically machine learning and deep learning. Such systems need to be trained with a dataset simulating realistic malicious traffic inside regular network traffic. These datasets are hard to find, because they either contain sensitive information or outdated traffic, or lack realistic malicious traffic. For that reason, this study aims to build a framework through which malicious network traffic can safely be generated and included inside a dataset with realistic synthetic network traffic.
 Irina Chiscop <irina.chiscop=>tno.nl>

Jeroen van Saane <jeroen.vansaane=>os3.nl>

Dennis van Wijk <dennis.vanwijk=>os3.nl>

1
41

An analysis of the security of LiFi and WiFi systems.

WiFi is the de facto standard for Wireless Local Area Networks for communication service providers globally. LiFi is a relatively new technology using Optical Wireless Communication. The term LiFi was coined in a TED Talk in July 2011 [1]. LiFi has now been commissioned in defence [2] and standardisation has commenced in the ITU [3] and IEEE [4].

One of the key claims of LiFi is the additional security that the restriction of the physical transmission medium brings. Light is unable to penetrate solid objects, so any transmission in a room stays internal to the room. In addition, standard AES encryption is added to communication links. There are numerous claims that LiFi is more secure than WiFi [5], but WiFi has made enormous strides in recent years with the introduction of WPA3 and other mechanisms.

Our challenge is to understand whether LiFi, as a wireless transmission medium, is as secure as, or more secure than, WiFi (both legacy and latest versions). We propose testing LiFi and WiFi in a proof of concept environment based on the latest generally available equipment to provide a side-by-side comparison.


[1] TED Talks July 2011. https://www.ted.com/talks/harald_haas_wireless_data_from_every_light_bulb
[2] BBC News April 2021 https://www.bbc.co.uk/news/uk-scotland-scotland-business-56900762
[3] ITU G9991  May 2019  https://www.itu.int/ITU-T/workprog/wp_item.aspx?isn=13397
[4] IEEE 802.11bb Nov 2021 https://www.ieee802.org/11/Reports/tgbb_update.htm
[5] Why LiFi is more secure than WiFi  https://lifi.co/why-lifi-is-more-secure-than-wifi/
Vegt, Arjan van der <avdvegt=>libertyglobal.com>

Carmen Veenker <c.m.i.veenker=>uva.nl>


72

Misuse of vulnerable WordPress websites by malicious actors.

This research will look into the misuse of vulnerable WordPress websites by malicious actors for resource development as an aid to their cyber operations. Using OSINT techniques and data analysis on open HTTP & HTTPS data, we will look at how many vulnerable websites are currently running globally. Furthermore, we will analyse how many of these websites are compromised and classified as malicious by comparing the data against open Cyber Threat Intelligence feeds. We will determine what purpose the compromised websites serve within cyber operations, and which actors and groups are most affiliated with this attack methodology. Lastly, we will propose a proof-of-concept mechanism that uses Threat Intelligence to mitigate such attacks on the webserver side.
Jordi Scharloo <jordi.scharloo=>kpn.com>

Talha Uçar <tucar=>os3.nl>


1
73

Detecting NTLM relay attacks using a honeypot.


Abstract:

NTLM relay attacks are very prevalent at the moment and new ways to trigger NTLM authentication methods have been found this year (printerbug and petitpotam). These authentications can then be relayed to targets like Active Directory Certificate Services. In this research we will make a framework to detect NTLM relay attacks in a generic way. The goal is to build a framework that can detect all NTLM relay attacks instead of just the vulnerabilities that are already known. This way new vulnerabilities can be detected, investigated and mitigated. We will build a honeypot as a PoC for the framework.


Steps:
* Giving a theoretical overview of NTLM authentication and NTLM relay attacks
* Creating a framework to detect NTLM relay attacks based on the behaviour of  the systems
* Building a honeypot as PoC for the detection of NTLM relay attacks
* Evaluating the framework based on the honeypot data
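
As a point in the design space, the honeypot can start as a low-interaction listener that merely records NTLM negotiation attempts; a minimal sketch (stdlib only; binding port 445 requires privileges and a host without a real SMB service), whereas a fuller PoC would speak enough of the protocol, e.g. via impacket, to capture the NEGOTIATE/AUTHENTICATE messages:

    import datetime
    import socket

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 445))
    srv.listen(5)

    while True:
        conn, (ip, port) = srv.accept()
        print(f"{datetime.datetime.utcnow().isoformat()} connection from {ip}:{port}")
        data = conn.recv(4096)            # first client bytes
        if b"NTLMSSP" in data:            # NTLM negotiation marker
            print("  NTLM negotiation observed -> alert")
        conn.close()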


Sources:
https://posts.specterops.io/certified-pre-owned-d95910965cd2
https://github.com/topotam/PetitPotam
Robert Diepeveen <robert.diepeveen=>northwave.nl>

Maurits Maas <Maurits.Maas=>os3.nl>
Freek Bax <fbax=>os3.nl>

2
74

Industrial programmable logic controller automation with configuration management tools.

Research in Industrial Control Systems: the configuration of PLC (Programmable Logic Controller) devices in an ICS environment is not yet automated. It would be interesting to research the feasibility of automating this process using Siemens PLCs. This solution would be based on Ansible, Chef or Puppet.
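
To indicate what the automation layer would wrap, a minimal sketch of talking to a Siemens S7 PLC (assuming the python-snap7 bindings; the address, DB number and offsets are placeholders); a custom Ansible/Chef/Puppet module could invoke logic like this to read and push configuration:

    import snap7

    client = snap7.client.Client()
    client.connect("192.168.0.10", rack=0, slot=1)

    data = client.db_read(db_number=1, start=0, size=4)   # read 4 bytes from DB1
    print("current value:", data.hex())

    client.db_write(db_number=1, start=0, data=bytearray(b"\x00\x01\x02\x03"))
    client.disconnect()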
Chandni Raghuraman <craghuraman=>deloitte.nl>
Pavlos Lontorfos <plontorfos=>deloitte.nl>

Nathan Keyaerts <nathan.keyaerts=>os3.nl> Mattijs Blankesteijn <mblankesteijn=>os3.nl>

1
75

Implementing side channel resistance in QARMA.

QARMA is a tweakable lightweight block cipher. This project explores the design of an implementation of QARMA resistant against power analysis attacks.
Marco Brohet <m.j.a.brohet=>uva.nl>
Francesco Regazzoni <f.regazzoni=>uva.nl>

Joris Janssen <Joris.Janssen=>os3.nl>

2
76

Enriching IDS detection on network protocols using anomaly-based detection.

The growing cyber threat has led to the rise of Network IDS (NIDS). However, anomaly-based NIDS suffer from high false-positive rates and, if Machine Learning (ML) based, from a lack of explainability. Within this research, a Domain Name System (DNS) anomaly-based ML solution is created on top of Zeek, with promising results. The best performing model without hyper-parameters is the Local Outlier Factor, while the best model with hyper-parameters is the Isolation Forest. Overall, the hyper-parameters seem to reduce performance. Additionally, steps are being made towards a cookbook for other protocols. In the discussion, future work is outlined, such as looking at other hyper-parameters, other protocols, and real-world performance.
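
For reference, the two models named above can be compared in a few lines of scikit-learn; a minimal sketch with synthetic stand-ins for the numeric features one could extract from Zeek dns.log records (query length, entropy, answer count, ...):

    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(0)
    normal = rng.normal(0, 1, (1000, 3))    # benign DNS feature vectors
    anomalous = rng.normal(6, 1, (10, 3))   # e.g. tunnelling-like queries
    X = np.vstack([normal, anomalous])

    iso = IsolationForest(random_state=0).fit_predict(X)   # -1 = outlier
    lof = LocalOutlierFactor(n_neighbors=20).fit_predict(X)
    print("IsolationForest flagged:", (iso == -1).sum())
    print("LOF flagged:", (lof == -1).sum())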
Francisco Dominguez <francisco.dominguez=>huntandhackett.com>

Pim van Helvoirt <Pim.VanHelvoirt=>os3.nl>

1
77

Comparison of state-of-the-art endpoint defence solutions to (partially) open-source endpoint defence

Endpoint defence has evolved a lot in the last decade, and the old anti-malware / anti-virus software is now only a small sub-section of the state-of-the-art endpoint defence solutions. Instead of anti-malware / anti-virus, we are now talking about Endpoint Detection and Response (EDR), Data Loss Protection (DLP), File Integrity Monitoring (FIM) and other fancy words that suppliers have the creativity to come up with. The biggest suppliers on the market are busy expanding their software with new features. This project will allow the students to get access to some vendor trial licences (1 or more) and compare the functionality of the products with free and open-source product offerings. Depending on student ability, the project can result in the development of new features for open-source products. A minimum expected deliverable of the project is a comparison report and a proposed development path to improve the open-source or proprietary products.
Peter Prjevara <peter.prjevara=>solvinity.com>

Dennis van Wijk <dwijk=>os3.nl>


78

Baruwa mail security solution - is it good enough?

Baruwa is an open-source mail security solution (https://pythonhosted.org/baruwa/introduction.html). It builds on other open-source components, such as SpamAssassin or ClamAV, to deliver protection against malicious e-mail. The effectiveness of the solution depends on the individual components. SpamAssassin (https://github.com/apache/spamassassin) for instance promises that it "differentiates successfully between spam and non-spam in between 95% and 100% of cases", but is this true? If so, with what configuration? One research question could be related to this: what is the ideal SpamAssassin configuration to achieve this ratio? Does the default configuration suffice? A student could, however, also choose to focus on a different component of Baruwa - the ultimate goal of this project is to assess the limitations of, and improve on, the capabilities of this open-source product. A broad research question then could be: could additional components be added to Baruwa to increase its capabilities? Another interesting question could be the assessment of Baruwa's enterprise licences: are they worth the money compared to the open-source product?
Peter Prjevara <peter.prjevara=>solvinity.com>

Bram Peters <Bram.Peters=>os3.nl>


79

Investigate hidden VNC methodologies for malware

During a Red Teaming engagement we simulate a realistic cyber attack on an organisation. The end goal of such a simulation is to achieve a real impact within the organisation. These final "actions on the objective" can require interaction with applications that are specific to the client. For example, a railroad company will use very specific software to control the trains. The most practical, and sometimes only, way to interact with these tools is via a graphical user interface. However, common methods of interacting with GUI applications, such as the RDP protocol, are not as stealthy as an attacker would like. One technique that is used in the wild is the concept of a Hidden VNC service. This technique provides an operator with a VNC like experience, while remaining hidden from the real desktop. The goal of this research is to investigate various methods to create a Hidden VNC service, compare the pros and cons of each method and implement the best technique.
Huub van Wieren <vanWieren.Huub=>kpmg.nl>

Antonio Macovei <Antonio.Macovei=>os3.nl>
Shadi Alhakimi<shadi.alhakimi=>os3.nl>


80

Implement lateral movement techniques in Beacon Object File (BOF) format

Due to increasingly effective endpoint detection and response (EDR) solutions, attackers are required to move from living-off-the-land binaries to bring-your-own-code techniques. In other words, they no longer use tools located on the systems themselves, but instead dynamically introduce new code into the malware process when necessary. Various techniques exist to execute code in-memory, each creating different artifacts that can be detected. Currently, the most stealthy method is inline execution, where byte code is introduced in the current thread. Existing tooling mostly relies on less stealthy execution techniques. The goal of this research is to create stealthy, inline implementations of the most important lateral movement techniques.
Huub van Wieren <vanWieren.Huub=>kpmg.nl>




81

Does the oscillation protection mechanism of a hard disk drive provide enough vibration data to recover low-quality, audible voice data from their physical environment?

In 2014, Michalevsky et al. researched the ability to use pattern recognition to recover voice data from very low-quality oscillation signals. In 2017, Ortega revealed at a conference that hard disks can function as a basic gyrophone using their oscillation protection feature. In 2018, Bolton et al. discussed this demonstration in a paper. However, up until now, no scientific research has been carried out to show whether it is possible to use this behavior to recover low-quality voice data. We would like to see someone research this question. If there is enough time and the student has the ability: a proof of concept would be a "nice-to-have", but it is not a requirement.
Maarten van der Slik (NL) <maarten.van.der.slik=>pwc.com> Wouter Otterspeer (NL) <wouter.otterspeer=>pwc.com>

Floris Heringa <floris.heringa=>os3.nl>


82

Project with SURF

At SURF, we plan on providing virtualized routers to our constituents. Virtual routers give us scalability and efficiency, and add an on-demand character to the services we offer to our constituents. Such a service would be very interesting for our international research partners that connect to NetherLight, mostly with 100G connections, but also for other use cases such as offering our EduVPN service on the NFV platform.

While developing the virtual router "as a service" several questions arise, such as:

- How can we give our constituents as much functionality as possible?
- How do virtual routers perform?
- Which virtual router performs the best?
- How do we manage security?
- How do we 'slice' systems to cope with 'noisy neighbors'?

For our NFV infrastructure, we operate a heavily tuned KVM/qemu setup, with VPP as a software data plane for acceleration, configured on Lenovo SR635 servers with AMD EPYC 2 CPUs and Mellanox ConnectX-5 100 Gbit NICs.

How can we make virtual routing a success with this set-up?
Marijke Kaat <marijke.kaat=>surf.nl> Eyle Brinkhuis <eyle.brinkhuis=>surf.nl>

Inigo Gonzalez de Galdeano <Inigo.GonzalezdeGaldeano=>os3.nl>
Imre Fodi <Imre.Fodi=>os3.nl>


83

Researching missing/incorrect (Linux) system calls in Qiling Framework

Qiling is an advanced binary emulation framework written in Python (building on Unicorn Engine and QEMU) that is cross-platform and cross-architecture. It is possible to run binaries for several architectures and operating systems. Some uses are instrumenting binaries for security research or live patching of binaries. Not every binary can currently be run successfully: several very standard Linux binaries will fail, because not every system call is supported yet, or because a system call does not work correctly for 64-bit binaries or for some big-endian platforms (example: MIPS big endian).

Your task is to identify the system calls that are missing or incorrect when running x86_64 Linux binaries in Qiling Framework, search for any anti-patterns that the developers might have used, and ideally add/fix the missing/incorrect system calls and contribute the fixes back to Qiling Framework.

For this project you will need to know how to program Python and read some basic C (C library, Linux kernel).
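
A minimal sketch of the workflow involved (assuming the qiling package and an extracted rootfs containing the target binary; paths are placeholders, and the exact hook API may differ per Qiling version): run a binary under emulation and override one syscall, which is also how a missing or broken implementation would be patched:

    from qiling import Qiling

    def my_uname(ql, buf, *args):
        print("uname() called, buf at", hex(buf))
        return 0     # pretend success; a real fix would fill in the struct

    ql = Qiling(["rootfs/x8664_linux/bin/ls"], "rootfs/x8664_linux")
    ql.os.set_syscall("uname", my_uname)   # override / add an implementation
    ql.run()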
Armijn Hemel - Tjaldur Software Governance Solutions <armijn=>tjaldur.nl>



84

Performance of RSA in OpenSSL vs. libgcrypt

The goal of this project is to investigate why RSA (for blind signatures) with OpenSSL seems to be about 7x as fast (on AMD64) as the implementation in libgcrypt. Once the cause has been identified, the goal is to modify the libgcrypt implementation to catch up with OpenSSL --- unless of course there is a good security reason why libgcrypt is slower (but I can hardly believe that, and it would be a surprising result). It should be noted that while the underlying bignum arithmetic is likely in assembler, it is doubtful that this is the cause of the performance difference (perhaps libgcrypt fails to properly implement CRT optimizations?).
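
For intuition on the suspected cause, a minimal pure-Python sketch of the CRT optimization (toy Mersenne primes for illustration only; real keys use random primes of equal size): two half-size exponentiations are roughly four times cheaper than one full-size one, so a missing CRT path alone can explain a large gap:

    import time

    p, q = 2**2203 - 1, 2**2281 - 1      # toy primes, just to get big moduli
    n = p * q
    e = 65537
    d = pow(e, -1, (p - 1) * (q - 1))
    c = 0xC0FFEE

    t0 = time.perf_counter()
    m_plain = pow(c, d, n)                         # straightforward
    t1 = time.perf_counter()

    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    mp, mq = pow(c, dp, p), pow(c, dq, q)          # two half-size exponentiations
    m_crt = mq + q * ((q_inv * (mp - mq)) % p)     # Garner recombination
    t2 = time.perf_counter()

    assert m_plain == m_crt
    print(f"plain: {t1-t0:.3f}s, CRT: {t2-t1:.3f}s")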

Any resulting code should ideally be provided in a way that is suitable to be merged into libgcrypt (clearly demonstrated performance improvement, clear coding style and licensing under LGPLv3+).
Christian Grothoff <grothoff=>gnu.org>



85

Research usefulness of running semgrep on pseudo C code obtained from decompilation with angr

Semgrep [1] is a tool to perform static analysis on source code. Very often when analysing programs, source code is not available and only a binary file is. There are platforms, such as angr [2], which make it possible to (partially) decompile the code and generate pseudo C code that (to an extent) resembles the original C code. Combining the two might make it possible to apply the power of tools such as semgrep to the domain of binary files. It seems that (apart from Java) this has not yet been tried (an Internet search did not reveal anything).

Your tasks:

1. decompile (Linux) ELF binaries for which source code is available using angr and generate pseudo C code
2. run semgrep on the generated pseudo C code (you might need to write some semgrep rules for this yourself)
3. run semgrep on the original source code
4. compare the results of 2. and 3. and report

You will need to know Linux and Python. Some knowledge about the ELF format might be useful as well.
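
A minimal sketch of steps 1 and 2 (assuming angr is installed and a small ELF at ./target; the Decompiler's output quality varies per function, so failures must be tolerated):

    import angr

    proj = angr.Project("./target", auto_load_libs=False)
    cfg = proj.analyses.CFGFast(normalize=True)

    with open("target_pseudo.c", "w") as out:
        for func in proj.kb.functions.values():
            if func.is_plt or func.is_simprocedure:
                continue
            dec = proj.analyses.Decompiler(func, cfg=cfg.model)
            if dec.codegen:
                out.write(dec.codegen.text + "\n")

    # then, on the shell: semgrep --lang c --config rules/ target_pseudo.c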

[1] https://semgrep.dev/

[2] https://angr.io/
Armijn Hemel - Tjaldur Software Governance Solutions <armijn=>tjaldur.nl>



86

Efficient secure neural network inference using multi-party computation

Secure neural network inference assumes the following setup. A service provider Bob offers neural network inference as a service, and a client Alice wants to use this service for a particular input. The aim is that Alice gets the output of the neural network for her input, but Alice should not learn anything about the parameters (weights, biases etc.) of the neural network, and Bob should not learn anything about the input provided by Alice.

Secure multi-party computation is a family of techniques that enable two or more actors to jointly evaluate a function on their private inputs without revealing anything but the function's output. Using secure multi-party computation to achieve secure neural network inference has been an active research area in recent years. The main challenge is how to make secure multi-party computation efficient enough.

In this project, the student(s) will work with software from Microsoft Research that implements secure neural network inference based on secure multi-party computation, available from https://github.com/mpc-msri/EzPC and described in [1-3]. While these papers showed that secure multi-party computation has the potential to efficiently perform secure neural network inference, the evaluation setup in these papers was rather impractical (for example, using a single client that is equally strong as the server). The aim of this project is to evaluate the efficiency of this approach in more practical settings.

Specific questions to answer in the project may include the following:
1. If the computational capacity of the client and server machines differs, how is overall performance (latency, throughput) impacted by the difference in the machines' computational capacity?
2. If the same server serves multiple clients, how does the number of clients influence the system's performance?
3. How can server-side parallelization be used to improve system performance?


References

[1] Nishant Kumar, Mayank Rathee, Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma. CrypTFlow: Secure TensorFlow Inference. 2020 IEEE Symposium on Security and Privacy (SP 2020), pp. 336-353, 2020
[2] Deevashwer Rathee, Mayank Rathee, Nishant Kumar, Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma. CrypTFlow2: Practical 2-Party Secure Inference. 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS '20), pp. 325-342, 2020
[3] Deevashwer Rathee, Mayank Rathee, Rahul Kranti Kiran Goli, Divya Gupta, Rahul Sharma, Nishanth Chandran, Aseem Rastogi. SiRnn: A math library for secure RNN inference. 2021 IEEE Symposium on Security and Privacy (SP 2021), pp. 1003-1020, 2021
Zoltan Mann <z.a.mann=>uva.nl>
Daphne Chabal <d.n.m.s.chabal=>uva.nl>




87

 The role of IXPs in a SCION ecosystem

SCION [1] is a promising future internet architecture that guarantees secure end-to-end communication and enhanced route control. Together with failure isolation and explicit trust, it has attracted the attention not only of multiple researchers and institutions but also of the networking industry. Although SCION largely targets the ISP world, it is interesting to investigate how the low-latency paths of an Internet Exchange Point can be combined with the SCION architecture without disrupting its added benefits and functionalities.

The students of this project are asked to identify all the critical SCION functionalities that an IXP needs to adopt to respect the new architecture. Based on that outcome, the students can design a SCION IXP and build a small PoC that can run on the 2STiC testbed [2]. To save time in the implementation part, the students can use the SCION code [3] of SIDN Labs and make the necessary modifications to prove their theory. As a last step, the students will compare their research against the real-life scenario [4] of Swiss-IX, where a few SCION-enabled networks are connected.

[1] https://scion-architecture.net/

[2] https://2stic.nl/testbed.html

[3] https://github.com/sidn/p4-scion

[4] https://www.swissix.ch/public/scion_flyer.pdf
Stavros Konstantaras <stavros.konstantaras=>ams-ix.net>

Krik van der Vinne <krik.vandervinne=>os3.nl> Leroy van der Steenhoven <leroy.vandersteenhoven=>os3.nl>


88

Command and Control over Microsoft Teams

Command and Control (C2) servers are used by attackers to control operations of compromised systems. C2 servers are typically used to store stolen data and disseminate commands. Establishing C2 communications is a vital component for adversaries. For that reason, adversaries commonly attempt to mimic expected traffic to avoid detection and/or adhere to network restrictions. There are various ways, with varying stealth levels, to establish communication, depending on the victim's network structure and defenses. Since enterprises are increasingly opting for Microsoft Teams, leveraging such a platform for the malicious traffic might make it indistinguishable from legitimate traffic. This research intends to develop a novel command and control architecture that leverages Microsoft Teams to establish communication and disseminate commands.
Stefan Broeder <stefan.broeder=>nl.abnamro.com>
Rob Mouris <rob.muris=>nl.abnamro.com>

Jeroen van Saane <Jeroen.vanSaane=>os3.nl>


89

Secure Multiparty Computation

There is a lot of confidential data that is interesting for researchers. Different technologies exist to make this data available for re-use without sharing the data itself. The simplest technology is algorithm-to-data, which works well for single datasets or federated machine learning. We created a demonstrator to show this in action (see https://dataexchange-test.lab.surf.nl/). Another technology is to use cryptographic techniques: secure multi-party computation (MPC) or (fully) homomorphic encryption. TNO has made a demo of the latter, using Paillier encryption: https://mhe.github.io/jspaillier/. The goal of this RP is to dive a bit deeper into these cryptographic techniques, and answer the questions: (1) how does this work in more detail? (2) What are the properties of the different solutions? (3) Is there a particular technology that is well suited for re-using data for research? (4) If SURF were to create a convincing demonstrator for either secure MPC or homomorphic encryption, what should it look like?
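As a flavour of the underlying primitive, a minimal sketch of Paillier's additive homomorphism using the python-paillier library (pip install phe); the values are arbitrary placeholders.

from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Two parties encrypt their confidential values...
a = public_key.encrypt(18)
b = public_key.encrypt(24)

# ...and an untrusted server can add them without ever seeing the inputs.
encrypted_sum = a + b

# Only the key holder can decrypt the aggregate result.
assert private_key.decrypt(encrypted_sum) == 42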
Freek Dijkstra <freek.dijkstra=>surf.nl>

Cesar Panaijo <Cesar.Panaijo=>os3.nl>


90

Measuring Route Origin Validation of authoritative name servers


The Domain Name System (DNS) and Border Gateway Protocol (BGP) are two fundamental building blocks of the internet. However, these protocols were initially not developed with security in mind. For instance, malicious parties can hijack an IP prefix and then spoof a DNS name server IP address within the hijacked prefix. BGP is also prone to route leaks. In 2008, Resource Public Key Infrastructure (RPKI) [RFC6480, 1] was proposed to address these issues.

RPKI is a hierarchical Public Key Infrastructure (PKI) that binds Internet Number Resources (INRs), such as Autonomous System Numbers (ASNs) and IP addresses, to public keys via certificates. With the RPKI certificate scheme, AS owners can prove that they are authorized to advertise certain IP prefixes. To make this certificate scheme work, the Regional Internet Registries (RIRs) control the trust anchors for each region.

We have been measuring Route Origin Validation (ROV) by DNS resolvers since the beginning of 2020, starting with a research project performed by SNE students [2]. RPKI has seen a rapid increase in deployment since that project, which we were able to monitor closely thanks to this research.

With this project we aim to measure the other side of the DNS spectrum: the authoritative servers. For this we have a so-called RPKI beacon at our disposal [3]. The beacon deliberately announces RPKI-invalid prefixes, while overlapping, less specific prefixes are validly announced from elsewhere. One approach to measure the state of Route Origin Validation of an authoritative name server is to send it a query with a source IP address from the invalidly announced prefix. The state of ROV can then be determined by detecting where the responses arrive. (A minimal sketch of such a probe follows below.)
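As an illustration of such a probe, a sketch using Scapy (pip install scapy), to be run from a host that legitimately originates addresses in the beacon prefix; the addresses below are placeholders.

from scapy.all import IP, UDP, DNS, DNSQR, send

SRC = "192.0.2.10"    # placeholder: address inside the RPKI-invalid beacon prefix
NS = "198.51.100.53"  # placeholder: authoritative name server under test

# A plain DNS query whose reply must be routed back to the beacon prefix.
query = (IP(src=SRC, dst=NS) /
         UDP(sport=4053, dport=53) /
         DNS(rd=0, qd=DNSQR(qname="example.com", qtype="A")))
send(query)
# Collectors at the beacon site and at the valid less-specific announcement
# then observe where the response arrives, revealing the ROV state.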

This project is for you if you are interested in (internet) measurements of real-life security which will help create better future standards. Knowledge of programming is useful in this project but not a requirement.

[1] https://rpki.readthedocs.io/en/latest/
[2] https://rp.os3.nl/2019-2020/p04/report.pdf
[3] https://docs.google.com/presentation/d/1Qb-HkRo4qMRxIJBqR54Dz7KYY7ITsIyhlDAtRF6CvVs/edit?usp=sharing
Tom Carpay <tom=>nlnetlabs.nl> Willem Toorop <willem=>nlnetlabs.nl>

Brice Habets <bhabets=>os3.nl>
Sander Post <sander.post=>os3.nl>


91

Analysing a real-world malicious Network Implant

Network Implants are a well-known tool for Red Teamers and attackers. They can be bought off-the-shelf (such as the Packet Squirrel), but attackers often look for a more custom device that fully adheres to their needs. One of our clients found such a device that was used during an actual attack. The device is already a bit older (roughly 8 years) and was sophisticated in some aspects and rather simple in others. It contains, among other things, a LAN port, a 3G dongle with SIM card, and a fan, and was disguised as an ordinary network switch. It was found near a Point-of-Sale terminal, so the assumption is that the goal was to steal credit card data. The objective of this research is to analyse the device and determine its functionality, components, software, and usage history. You have full access to the device, but cannot use destructive research methods.
Huub van Wieren <vanWieren.Huub=>kpmg.nl>

Floris Heringa <floris.heringa=>os3.nl>


92

Decentralized proactive data protection in edge computing

Connected devices in the Internet of Things (IoT) produce large amounts of valuable data. Data from IoT devices is increasingly processed in edge servers, i.e., geographically distributed computing resources offering cloud-like services with low latency to nearby IoT devices. In such a setting, the protection of data from unauthorized access, as required by data protection legislation for personal data and by business imperatives for valuable non-personal data, is challenging for multiple reasons [1]. First, edge computing systems change dynamically while being used (e.g., new edge servers may become available, existing ones may be removed, new applications may be deployed to the edge servers, etc.). Such changes may introduce new data protection risks on the fly. Second, edge servers typically have only a local view of a part of the network, which may prevent them from detecting data protection risks that stem from the interplay of multiple nodes. Third, successfully mitigating an identified data protection risk may require changes in multiple edge servers, thus requiring coordination among otherwise independent entities [2].

The aim of this project is to develop a software framework which allows the simulation of an edge computing system and the implementation of and experimentation with different coordination schemes to achieve decentralized proactive data protection. In this software framework, every node should have its own - dynamically updated - model, corresponding to the node's local knowledge of the network. The nodes should exchange information about their knowledge, using a gossip protocol. In addition, the nodes should inform each other about identified data protection risks, and coordinate with each other using an auction protocol to decide on the best mitigation strategy for an identified data protection risk.
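To make the gossip part concrete, a toy sketch of the kind of push-pull knowledge exchange such a framework could simulate; all class and field names are illustrative, not an existing API.

import random

class EdgeNode:
    def __init__(self, name):
        self.name = name
        self.model = {}        # local knowledge of the network
        self.neighbours = []

    def observe(self, key, value):
        self.model[key] = value

    def gossip_round(self):
        if self.neighbours:
            peer = random.choice(self.neighbours)
            # Exchange knowledge in both directions (push-pull gossip).
            merged = {**self.model, **peer.model}
            self.model = dict(merged)
            peer.model = dict(merged)

# Toy topology: three nodes in a line; knowledge spreads in a few rounds.
a, b, c = EdgeNode("a"), EdgeNode("b"), EdgeNode("c")
a.neighbours, b.neighbours, c.neighbours = [b], [a, c], [b]
a.observe("risk:app1", "unencrypted personal data on node a")
for _ in range(5):
    for node in (a, b, c):
        node.gossip_round()
print(c.model)  # node c has likely learned about the risk observed at node a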

This project involves (online) collaboration with the University of Duisburg-Essen in Germany.

References

[1] Z. Á. Mann. Data protection in fog computing through monitoring and adaptation. KuVS-Fachgespräch Fog Computing 2018, Technical Report, Technische Universität Wien, pp. 25-28, 2018.
[2] Z. Á. Mann, F. Kunz, J. Laufer, J. Bellendorf, A. Metzger, K. Pohl. RADAR: Data protection in cloud-based computer systems at run time. IEEE Access, 9:70816-70842, 2021.
Zoltan Mann <z.a.mann=>uva.nl>




93

TCP-Prague evaluation

Low Latency Low Loss Scalable Throughput (L4S) [1] is a technology intended to reduce queueing delay, ensuring low latency for Internet Protocol flows with high throughput performance. TCP-Prague is the reference implementation for the upcoming L4S Internet service. Other congestion controls that support L4S, such as Google's BBRv2, are already available or will be released soon. The task of this project is to compare the performance of TCP-Prague against at least one of these other congestion controls (like BBRv2), on at least one of the following criteria: (i) for steady state: fairness, RTT (in)dependence, and convergence speed; (ii) for dynamic behavior: fairness, responsiveness, and stability. Further fine-tuning of the open-source implementation will be required to align the behavior of the congestion controls.
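By way of illustration, a sketch of a single comparison run, assuming a Linux sender on which the TCP-Prague module is loaded and registered as "prague" (as in the L4STeam kernel tree), iperf3 is installed, and the receiver sits behind an L4S-enabled bottleneck; the server address and the "bbr2" module name are assumptions.

import json
import subprocess

SERVER = "10.0.0.2"  # placeholder: receiver behind an L4S-enabled bottleneck

def run_flow(cc, seconds=30):
    # iperf3 -C selects the per-socket congestion control on Linux.
    out = subprocess.run(
        ["iperf3", "-c", SERVER, "-C", cc, "-t", str(seconds), "--json"],
        capture_output=True, text=True, check=True).stdout
    bps = json.loads(out)["end"]["sum_sent"]["bits_per_second"]
    print(f"{cc}: {bps / 1e6:.1f} Mbit/s")

for cc in ("prague", "bbr2", "cubic"):  # module names depend on the kernel build
    run_flow(cc)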

Supervisor: Chrysa Papagianni (c.papagianni=>uva.nl), in collaboration with Koen De Schepper (koen.de_schepper=>nokia-bell-labs.com)
[1] B. Briscoe et al. Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service: Architecture. Internet-Draft draft-ietf-tsvwg-l4s-arch-09. Work in Progress. Internet Engineering Task Force, March 2022. https://datatracker.ietf.org/doc/draft-ietf-tsvwg-l4s-arch/


Chrysa Papagianni <c.papagianni=>uva.nl>

Nathan Keyaerts <nathan.keyaerts=>student.uva.nl>


94

TCP-Prague enhancement

Low Latency Low Loss Scalable Throughput (L4S) [1] is a technology intended to reduce queueing delay, ensuring low latency for Internet Protocol flows with high throughput performance. TCP-Prague is the reference implementation for the upcoming L4S Internet service. However, the reference implementation could be further improved on aspects such as an appropriate slow-start response to achieve full link utilization faster, and a faster flow convergence time when other flows are active. The goal of the project is to modify the Linux TCP-Prague kernel module in this direction and validate possible improvements to the code.

Supervisor: Chrysa Papagianni (c.papagianni=>uva.nl), in collaboration with Koen De Schepper (koen.de_schepper=>nokia-bell-labs.com)

[1] B. Briscoe et al. Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service: Architecture. Internet-Draft draft-ietf-tsvwg-l4s-arch-09. Work in Progress. Internet Engineering Task Force, March 2022. https://datatracker.ietf.org/doc/draft-ietf-tsvwg-l4s-arch/
Chrysa Papagianni <c.papagianni=>uva.nl>



95

Research user initiated tracing with low overhead

At ASE 2014, a paper was presented about tracing builds to find out what really goes into a binary (https://rebels.cs.uwaterloo.ca/confpaper/2014/09/14/tracing-software-build-processes-to-uncover-license-compliance-inconsistencies.html). The described method uses the strace tool to trace builds on Linux. While strace captures all the necessary output, it adds a lot of overhead, creates a lot of output, and slows down builds significantly. In-kernel tracing has become more popular (for example: DTrace, SystemTap), but if understood correctly, probes have to be defined in advance, for example to monitor access to a certain directory. Since during a build it is not known which directories will be accessed (and finding out is the actual goal of tracing builds), it seems that in-kernel tracing is perhaps not suitable. User events (https://lwn.net/Articles/889607/) might or might not change this.

The research question: is it actually possible to use in-kernel tracing such as DTrace or SystemTap to do something like strace?

Constraints:

1. no configuration of enabling certain probes before running the build should be necessary (but having a wrapper script that does set up work is perfectly fine)

2. only system calls related to the build process and all its children should be traced, so as not to pollute the results

3. all relevant system calls related to I/O (which might include some network calls for build systems that automatically download programs during a build) need to be captured (a minimal strace-based baseline is sketched after this list)
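For reference, the strace baseline that an in-kernel approach would have to match could look like the following sketch; the build command is a placeholder.

import subprocess

BUILD_CMD = ["make"]  # placeholder build command

subprocess.run(
    ["strace",
     "-f",                          # follow the build process and all its children
     "-e", "trace=%file,%network",  # restrict to file- and network-related syscalls
     "-o", "build-trace.log"] + BUILD_CMD,
    check=True)

# The research question is whether DTrace/SystemTap (or user events) can
# produce an equivalent build-trace.log with much lower overhead.
with open("build-trace.log") as fh:
    opened = [line for line in fh if "openat(" in line]
print(f"{len(opened)} openat() calls recorded during the build")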
Armijn Hemel <armijn=>tjaldur.nl>



96

Malware Aquarium: A virtualized Infrastructure where malware resides and is being monitored

This research project investigates the requirements for an isolated virtual infrastructure with strong monitoring capabilities that offers the possibility of deploying multiple malware samples inside it and analyzing them over a longer period of time. The research tackles problems encountered in a sandbox environment, such as limited analysis time, no visibility of lateral movement activity, and the restriction of deploying one malware sample at a time.
Roy Duisters <roy.duisters=>shell.com>
Arjan Sturkenboom

Rares Bratean <Rares.Bratean=>os3.nl> Rio Kierkels <rkierkels=>os3.nl>


97

Path tracing to increase internet transparency

Due to the resiliency and redundancy of the internet, changes in the underlying routing table are mostly unnoticeable to users.
This opaqueness could cause their network traffic to be redirected via potentially harmful networks or jurisdictions.

One way of bringing transparency is to monitor the path to an end destination using tools like traceroute.
Traceroute has some limitations (e.g. one-way path, packets being dropped, limited information, etc.) that affect the reliability of the output.

The goal of this work is to find innovative ways to improve the reliability of such path measurement output without changing the workings of the Internet.
Some initial ideas are: a different approach to tracerouting, combining UDP/ICMP/TCP probes to improve results, or enriching results using tools like (RIPE) RIS, Atlas, or BGP looking glasses.
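As a starting point for the multi-protocol idea, a sketch using Scapy (pip install scapy) that probes each hop with UDP, ICMP, and TCP and merges the answers; the target and port choices are illustrative.

from scapy.all import IP, UDP, ICMP, TCP, sr1

TARGET = "example.com"

for ttl in range(1, 21):
    hops = set()
    # A hop that drops one probe type may still answer another.
    for probe in (UDP(dport=33434), ICMP(), TCP(dport=80, flags="S")):
        reply = sr1(IP(dst=TARGET, ttl=ttl) / probe, timeout=1, verbose=0)
        if reply is not None:
            hops.add(reply.src)
    print(ttl, hops or "*")
# Merged answers could then be cross-checked against RIS/Atlas or looking glasses.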
Ralph Koning <ralph.koning=>sidn.nl>

Gerlof Fokkema <gerlof.fokkema=>os3.nl>

1
98

Hierarchical Classification for side-channel analysis

Current side-channel attacks rely on the common `flat' classifier. That is, finding the secret key requires a single classifier decision. However, it can be beneficial for the attack accuracy to replace this single, flat decision with a multi-layer approach that finds the secret key after several hierarchical decisions. In this project we will work towards combining various types of classifiers across different decision layers, aiming to improve the attack accuracy.
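A toy sketch of the two-layer idea with scikit-learn: a coarse classifier first predicts, say, the Hamming weight of the key byte, and a per-group classifier then refines it to the exact value; the traces and labels below are synthetic placeholders.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))                # placeholder side-channel traces
y = rng.integers(0, 256, size=2000)            # key-byte labels
hw = np.array([bin(v).count("1") for v in y])  # coarse label: Hamming weight

coarse = RandomForestClassifier().fit(X, hw)
fine = {w: RandomForestClassifier().fit(X[hw == w], y[hw == w])
        for w in np.unique(hw)}

def predict(trace):
    w = coarse.predict(trace.reshape(1, -1))[0]      # first hierarchical decision
    return fine[w].predict(trace.reshape(1, -1))[0]  # refine within the group

print(predict(X[0]))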
Kostas Papagiannopoulos <k.papagiannopoulos=>uva.nl>
Gheorghe Pojoga <Gheorghe.Pojoga=>os3.nl>


99

Persistent Fault Analysis -- attacks and countermeasures

Persistent Fault Analysis (PFA) is a very simple yet potent attack on modern cryptography. With just a single fault it can bypass redundancy-based protections and recover the secret key. In this project we will work towards expanding the attack, combining it with side-channel analysis and fault sensitivity analysis, and looking into countermeasures.
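To see why a single persistent fault is so powerful, a toy simulation of the core PFA observation on an S-box-based last round: one ciphertext value never occurs, and that missing value leaks the round-key byte. The S-box and key below are stand-ins, not a real cipher.

import random

SBOX = list(range(256))
random.Random(1).shuffle(SBOX)  # stand-in for a bijective 8-bit S-box

key = 0x5A
faulty = SBOX[:]
v = faulty[3]
faulty[3] = (v + 1) & 0xFF      # persistent fault: value v leaves the S-box image

counts = [0] * 256
for _ in range(20000):          # many encryptions with the faulted S-box
    x = random.randrange(256)
    counts[faulty[x] ^ key] += 1  # last round: c = S'(x) XOR k

missing = counts.index(0)       # the ciphertext value that never occurs
print(hex(missing ^ v))         # recovers the key byte: k = missing XOR v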
Kostas Papagiannopoulos <k.papagiannopoulos=>uva.nl>
Greg Charitonos <gcharitonos=>os3.nl>


100

Benefits of applying machine learning to Cilium

Cilium is an open source project that enables cloud native networking, security, and observability in environments such as Kubernetes and other containerized systems. Cilium makes use of a Linux kernel feature known as eBPF (Extended Berkeley Packet Filter). As a result, the Linux kernel may incorporate security, visibility, and networking control logic dynamically. Due to the volume of data contained within, only a portion of it is exported by default. This project investigates the Cilium dataset in order to determine the possible benefits of applying machine learning. One such benefit is that anomaly detection may be used to assess whether pods within a Kubernetes cluster are misbehaving.
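One hedged sketch of that anomaly-detection idea, assuming Cilium/Hubble flow records have been exported as JSON lines (the file name and field names are illustrative), using scikit-learn's IsolationForest:

import json
import numpy as np
from sklearn.ensemble import IsolationForest

def features(flow):
    # Reduce one flow record to a numeric vector (illustrative fields).
    return [flow.get("destination_port", 0),
            flow.get("bytes_sent", 0),
            flow.get("packets", 0)]

with open("hubble-flows.json") as fh:
    X = np.array([features(json.loads(line)) for line in fh])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = model.predict(X)  # -1 marks a suspicious flow
print(f"{(labels == -1).sum()} anomalous flows out of {len(X)}")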
Serge van Namen <serge.van.namen=>sue.nl>
Bart van Dongen <Bart.vanDongen=>os3.nl>

2
101

Efficient identification of passwords in large quantities of plaintext data

Password policies requiring passwords to contain at least one capital letter, number, and special character make them relatively easy for humans to distinguish from normal text. This is, however, not the case for all passwords, and manual identification does not scale when a lot of passwords need to be identified. The aim of this research is to develop ways to perform efficient password identification at a large scale. Multiple methods will be investigated. The Bidirectional Encoder Representations from Transformers (BERT) language model for natural language processing (NLP) will be used to identify text that may contain a password. Next, password leak lists and their corresponding passwords will be used to identify any passwords. Finally, password generation algorithms like OMEN and PassGAN will be used to generate candidate passwords based on password leak lists, producing additional passwords that can be used for comparison.
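As a baseline for the pre-filtering step, a simple sketch that flags password-like tokens by character-class mix and entropy and checks them against a leak list; the thresholds and the tiny leak list are placeholders, and the BERT/OMEN/PassGAN stages of the project would refine this.

import math
import re
from collections import Counter

def entropy(token):
    # Shannon entropy over the token's characters, in bits.
    counts = Counter(token)
    return -sum(c / len(token) * math.log2(c / len(token))
                for c in counts.values())

def looks_like_password(token):
    classes = [re.search(p, token) for p in (r"[a-z]", r"[A-Z]", r"\d", r"\W")]
    return (len(token) >= 8
            and sum(1 for c in classes if c) >= 3
            and entropy(token) > 2.5)

leaked = {"Summer2021!", "P@ssw0rd"}  # placeholder leak list

text = "the meeting is at 10 please use P@ssw0rd to log in"
for token in text.split():
    if token in leaked or looks_like_password(token):
        print("candidate password:", token)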
Zeno Geradts <zeno=>holmes.nl> Romke van Dijk <romke=>holmes.nl>
Oscar Muller <Oscar.Muller=>os3.nl> Tijmen van der Spijk  <tspijk=>os3.nl>




Presentations-rp June 2022

Tuesday July 5 2022, SP C0.110, online using bigbluebutton
Time | #RP | Title | Name(s) | RP
10h00 | | Introduction and welcome | |
10h30 | 93 | Work in Progress: TCP-Prague evaluation | Nathan Keyaerts | 2
10h55 | | Break | |
11h05 | 77 | Comparison of state-of-the-art endpoint defence solutions to (partially) open-source endpoint defence | Dennis van Wijk | 2
11h30 | 82 | Virtualized Routers | Inigo Gonzalez de Galdeano, Imre Fodi | 2
11h55 | | Lunch | |
13h05 | 37 | Assessing data remnants in modern smartphones after factory reset | Mattijs Blankesteijn | 2
13h30 | 40 | Web of Deepfakes | Steef van Wooning, Danny Janssen | 2
13h55 | | Break | |
14h05 | 101 | Efficient identification of passwords in large quantities of plaintext data | Oscar Muller, Tijmen van der Spijk | 2
14h30 | 88 | Command and Control over Microsoft Teams | Jeroen van Saane | 2
14h55 | | Close | |


Wednesday July 6 2022, SP C0.110, online using bigbluebutton
Time | #RP | Title | Name(s) | RP
10h00 | | Introduction | |
10h05 | 99 | Persistent Fault Analysis -- attacks and countermeasures | Greg Charitonos | 2
10h30 | 41 | An analysis of the security of LiFi and WiFi systems | Carmen Veenker | 2
10h55 | | Break | |
11h05 | 87 | The role of IXPs in a SCION ecosystem | Krik van der Vinne, Leroy van der Steenhoven | 2
11h30 | 100 | Benefits of applying machine learning to Cilium | Bart van Dongen | 2
11h55 | | Lunch | |
13h05 | 90 | Measuring Route Origin Validation of authoritative name servers | Brice Habets, Sander Post | 2
13h30 | 96 | Malware Aquarium: A virtualized Infrastructure where malware resides and is being monitored | Rares Bratean, Rio Kierkels | 2
13h55 | | Break | |
14h05 | 79 | Investigate hidden VNC methodologies for malware | Antonio Macovei, Shadi Alhakimi | 2
14h30 | 89 | Secure Multiparty Computation | Cesar Panaijo | 2
14h55 | 11 | Work in Progress: Contextual information capture and analysis in data provenance | Rik Janssen | 2
15h20 | | Close | |




Presentations-rp January 2022

Tuesday Feb 8 2022, hybrid; to connect online use bigbluebutton
Time | #RP | Title | Name(s) | LOC | RP
10h25 | | Introduction | Francesco Regazzoni | |
10h30 | 74 | Industrial programmable logic controller automation with configuration management tools | Nathan Keyaerts, Mattijs Blankesteijn | |
10h55 | | Break | | |
11h05 | 73 | Detecting NTLM relay attacks using a honeypot | Maurits Maas, Freek Bax | | 2
11h30 | 26 | Future tooling and cyber defense strategy for ICS | Leroy van der Steenhoven | | 1
11h55 | | Lunch | | |
13h30 | 51 | Modeling of medical data access logs for understanding and detecting potential privacy breach | Luc Vink | online | 1
13h55 | | Break | | |
14h05 | 43 | High-speed implementation of lightweight ciphers | Gheorghe Pojoga | | 1
14h30 | 53 | Research cloud container evidence | Artemis Mytilinaios | online | 2
14h55 | | Break | | |
15h30 | 72 | Misusage of vulnerable Wordpress websites by malicious actors | Talha Uçar | |
15h55 | | Break | | |
16h05 | 45 | Researching efficiency of Trendmicro's HAC-T algorithm | Tijmen van der Spijk, Imre Fodi | online | 1
16h30 | 71 | Development of an open source malicious network traffic generator based on MITRE ATT&CK | Dennis van Wijk, Jeroen van Saane | |
16h55 | | Close | | |




Wednesday Feb 9 2022, hybrid; to connect online use bigbluebutton
Time | #RP | Title | Name(s) | LOC | RP
10h00 | | Introduction | Francesco Regazzoni | |
10h05 | 69 | Using sound to facilitate true random number generation | Oscar Muller | | 1
10h30 | 44 | Federated Authentication platform | Hilco de Lathouder | |
10h55 | | Break | | |
11h05 | 67 | Future proofing networks: On core routing and SRv6 | Sander Post, Krik van der Vinne | | 1
11h30 | 52 | Cloud native IR automation | Antonio Macovei, Rares Bratean | | 1
11h55 | | Lunch | | |
13h05 | 56 | Privacy by Design in Smart Cities | Babak Rashidi, Cesar Panaijo | | 1
13h30 | 36 | Characteristics of Info Stealers in 2021 | Tom van Gorkom | |
13h55 | | Break | | |
14h05 | 54 | Forensic analysis of Google Workspace evidence | Bert-Jan Pals, Greg Charitonos | |
14h30 | 70 | Bring your own Living off the Land binaries | Vincent Denneman | |
14h55 | | Break | | |
15h05 | 76 | Enriching IDS detection on network protocols using anomaly-based detection | Pin van Helvoirt | |
15h30 | 75 | Implementing side channel resistance in QARMA | Joris Janssen | | 2
15h55 | | Break | | |
16h05 | 68 | Post-Exploitation Defence of Git Repositories using Honey Tokens | Max van der Horst | | 1
16h30 | 59 | Deep Learning for Partial Image Encryption | Carmen Veenker, Danny Opdam | |
16h30 | | Close | | |




Out of normal schedule presentations

Room B1.23 at Science Park 904 NL-1098XH Amsterdam.
Date | Time | Place | #RP | Title | Name(s) | LOC | RP
2021-09-xx | 10h00 | online | | | | |
| 11h00 | online | | | | |