SNE Master Research Projects 2015 - 2016 - LeftOvers

# title
supervisor contact



Automated migration testing.

Unattended content management systems are a serious risk factor for internet security and for end users, as they allow trustworthy information sources on the web to be easily infected with malware and turn evil.
  • How can we use well-known software testing methodologies (e.g. continuous integration) to automatically test whether available updates that fix security weaknesses in software running on a website can be safely implemented with as little involvement of the end user as possible?
  • How would such a migration work in a real world scenario?
In this project you will look at the technical requirements for automated migration testing, and if possible design a working prototype.
Michiel Leenaars <michiel=>nlnet.nl>




Virtualization vs. Security Boundaries.

Traditionally, security defenses are built upon a classification of the sensitivity and criticality of data and services. This leads to a logical layering into zones, with an emphasis on command and control at the point of inter-zone traffic. The classical "defense in depth" approach applies a series of defensive measures to network traffic as it traverses the various layers.

Virtualization erodes the natural edges, and this affects guarding system and network boundaries. In turn, additional technology is developed to add instruments to virtual infrastructure. The question that arises is the validity of this approach in terms of fitness for purpose, maintainability, scalability and practical viability.
Jeroen Scheerder <Jeroen.Scheerder=>on2it.net>


Efficient delivery of tiled streaming content.

HTTP Adaptive Streaming (e.g. MPEG DASH, Apple HLS, Microsoft Smooth Streaming) is responsible for an ever-increasing share of streaming video, replacing traditional streaming methods such as RTP and RTMP. The main characteristic of HTTP Adaptive Streaming is that it is based on the concept of splitting content up in numerous small chunks that are independently decodable. By sequentially requesting and receiving chunks, a client can recreate the content. An advantage of this mechanism is that it allows a client to seamlessly switch between different encodings (e.g. qualities) of the same content.
The technique known as Tiled Streaming builds on this concept by not only splitting up content temporally, but also spatially, allowing for specific areas of a video to be independently encoded and requested. This method allows for navigation in ultra-high resolution content, while not requiring the entire video to be transmitted.
An open question is how these numerous spatial tiles can be distributed and delivered most efficiently over a network, reducing both unnecessary overhead and latency.
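
As an illustration of the spatial splitting described above, the following minimal sketch computes which tiles a client would have to request for a given viewport. The uniform grid, the function name tiles_for_viewport and the pixel dimensions are assumptions made for this example; they are not taken from any particular tiled streaming implementation.

```python
import math


def tiles_for_viewport(video_w, video_h, cols, rows, x, y, w, h):
    """Return the (col, row) indices of all tiles overlapping a viewport.

    Assumes a uniform grid of cols x rows tiles and a viewport given by its
    top-left corner (x, y) and its size (w, h), all in pixels.
    """
    tile_w = video_w / cols
    tile_h = video_h / rows
    # Clamp the viewport to the video area.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(video_w, x + w), min(video_h, y + h)
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies entirely outside the video
    first_col, first_row = int(x0 // tile_w), int(y0 // tile_h)
    # The right and bottom edges are exclusive, hence ceil(...) - 1.
    last_col = math.ceil(x1 / tile_w) - 1
    last_row = math.ceil(y1 / tile_h) - 1
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


needed = tiles_for_viewport(7680, 4320, 8, 8, x=1000, y=500, w=1920, h=1080)
print(len(needed), "of", 8 * 8, "tiles")  # 9 of 64 tiles
```

Each returned tile still has to be fetched for every temporal chunk, which is where the overhead and latency questions of this project come in.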
Ray van Brandenburg <ray.vanbrandenburg=>tno.nl>


What is the effectiveness of monitoring darknet fora to predict possible hacking attempts against, for example, Dutch targets (banks, critical infrastructure, etc)?

The motivation for this research is that, in theory, a well-built system might have foreseen the DDoS attack against Ziggo's nameservers a few months ago, based on chatter about hiring a botnet to "target a Dutch ISP". That may have been enough to at least make preparations against such an attack.

We reference the OS3 paper of Diana Rusu, which is titled "Forum post classification to support forensic investigations of illegal trade on the Dark Web".

As for an introduction of both of us: we are not experts on the machine learning part, but are enthusiastic to learn new subjects. Machine learning is becoming more important these days due to the growth of data, so we think learning this skill is a good investment. We both like to program in different languages.

The exact research question could be slightly changed if need be, for example if it seems that the research question is too broad.


System Security Monitoring using Hadoop.

This project involves data mining of system and network logs using Hadoop, with a focus on system security. The research will investigate a real-time ‘streaming’ approach for monitoring system security: streaming data through Hadoop (e.g. via Spark Streaming) and then identifying and storing possible incidents. As an example of visualization, you could think of a real-time map of the world displaying both failed and successful login attempts. In any case, an important first part of the project would be investigating what others have done in this field and which systems and techniques they used, in order to get an overview of all the possibilities. Finally, implementing a small proof of concept based on ‘best-practice’ or cutting-edge tools/APIs would be a great final result.
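
To make the idea of identifying incidents in a stream concrete, here is a minimal sketch in plain Python rather than Spark, so that it runs without a cluster. The event layout "timestamp,user,source_ip,status", the window length and the threshold are all assumptions invented for this example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # assumed sliding-window length
THRESHOLD = 3        # assumed number of failures that counts as an incident


def detect_incidents(events, window=WINDOW_SECONDS, threshold=THRESHOLD):
    """Yield (timestamp, source_ip, failures) for each possible incident.

    `events` is any iterable of "timestamp,user,source_ip,status" lines in
    time order. This CSV-like layout is a made-up format for illustration.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps of recent failures
    for line in events:
        ts_text, _user, ip, status = line.strip().split(",")
        ts = int(ts_text)
        if status != "FAIL":
            continue
        failures = recent[ip]
        failures.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while failures and failures[0] <= ts - window:
            failures.popleft()
        if len(failures) >= threshold:
            yield ts, ip, len(failures)


sample = [
    "0,alice,192.0.2.1,FAIL",
    "10,alice,192.0.2.1,FAIL",
    "20,bob,192.0.2.2,OK",
    "30,alice,192.0.2.1,FAIL",
    "200,alice,192.0.2.1,FAIL",
]
incidents = list(detect_incidents(sample))
```

Because the function consumes one event at a time and keeps only per-source state, the same logic could in principle be moved into a streaming framework.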
Machiel Jansen <machiel.jansen=>surfsara.nl>
Mathijs Kattenberg <mathijs.kattenberg=>surfsara.nl>


Qualitative analysis of Internet measurement methods and bias.

In the past year NLnet Labs and other organisations have run a number of measurements on DNSSEC deployment and validation.  We used the RIPE Atlas infrastructure for measurements, while others used Google ads where flash code runs the measurements.  The results differ as the measurement points (or observation points) differ: RIPE Atlas measurement points are mainly located in Europe, while Google ads flash measurements run globally (or with some stronger representation of East-Asia).

The question is whether we can quantify the bias in the Atlas measurements, or qualitatively compare the measurements, so that we can correlate the results of both measurement platforms.  This would greatly help interpret our results and the results from others based on the Atlas infrastructure. The results are highly relevant as many operational discussions on DNS and DNSSEC deployment are supported or falsified by this kind of measurement.
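
One conceivable way to quantify such a bias is post-stratification: compute the measured rate per region and reweight it by a reference distribution of users. The sketch below assumes this technique; the region names, probe counts and population shares are invented for illustration and are not measurement results.

```python
def raw_rate(per_region):
    """Overall rate when every probe counts equally."""
    validating = sum(v for v, _ in per_region.values())
    probes = sum(n for _, n in per_region.values())
    return validating / probes


def poststratified_rate(per_region, reference_share):
    """Reweight per-region rates by a reference population share.

    per_region: {region: (validating_probes, total_probes)}
    reference_share: {region: share of the reference population}, summing to 1
    """
    total = 0.0
    for region, share in reference_share.items():
        validating, probes = per_region[region]
        total += share * (validating / probes)
    return total


# Hypothetical numbers: most probes sit in one region.
atlas = {"europe": (300, 800), "asia": (10, 100), "other": (20, 100)}
users = {"europe": 0.2, "asia": 0.5, "other": 0.3}

unweighted = raw_rate(atlas)
reweighted = poststratified_rate(atlas, users)
bias = unweighted - reweighted
```

With these invented numbers the unweighted rate is 0.33 and the reweighted rate is about 0.185; the difference gives a first indication of how much the placement of the measurement points influences the result.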
Willem Toorop <willem=>nlnetlabs.nl>



More and more videos are being published on YouTube that contain content you would want to find soon after upload. The metadata associated with videos is often limited. Therefore, the selection has to be based on the visual content of the video.

Develop a demonstrator that automatically downloads and analyses the latest YouTube videos. The demonstrator should operate in a two-stage process: first, make a content-based selection of the most relevant video material using the screenshot that YouTube provides for every new video. If the video is considered relevant, download the entire video for full analysis. Use available open source tools such as OpenCV.

Demonstrator for the YouTube-scanner.
Mark van Staalduinen <mark.vanstaalduinen=>tno.nl>



Cross-linking of objects and people in social media pictures.

Automatically cross-link persons and objects found in one social media picture to the same persons and objects in other pictures.

Develop a concept and make a quickscan of suitable technologies. Validate the concept by developing a demonstrator using TNO/commercial/open-source software. Investigate which elements influence the cross-linking results.

Presentation of the concept and demonstrator.
John Schavemaker <john.schavemaker=>tno.nl>


Recognizing underwater objects using MPEG CDVS.

The standards organization MPEG has recently published a new standard that contains a feature that allows a device with limited bandwidth to perform "visual search": Compact Descriptors For Visual Search (CDVS). With CDVS, a number of small "compact descriptors" can be created from a photo. These descriptors extract and compress the so-called "local features" that are in a certain photo. The descriptors can then be sent to a server, in place of the photo itself, to perform object recognition. The server performs the recognition by comparing the received descriptors with those from the photos in a database. The object recognition can thus be performed quickly and with minimal bandwidth usage. Of course, the more "compact descriptors" of a photo are sent, the higher the success rate of the object recognition performed by the server, but this also increases the bandwidth used by the device that transmits the descriptors.

In this project, you will investigate whether MPEG CDVS is suitable for the recognition of underwater objects. Recognizing underwater objects is particularly challenging because the visibility is reduced, and objects further away from the camera appear blurred. Furthermore, the underwater photos are taken by marine robots, which communicate with the server via a connection based on acoustic signals, which has a very low bandwidth.

The project thus aims at answering the following research questions:
  • Is MPEG CDVS suitable to recognize the salient features of underwater photos? If not, how could the descriptors be modified to capture these features?
  • What is the bandwidth required to transmit these descriptors? Can the bandwidth be reduced without significantly affecting the success rate of the object recognition performed by the server?
At the end of the project you will produce a prototype of a system that uses CDVS to recognize underwater objects (the prototype can be built as an extension to the already existing MPEG CDVS testbed of TNO).
Lucia D’Acunto <lucia.dacunto=>tno.nl>
Stefania Giodini <stefania.giodini=>tno.nl>


Mobile app fraud detection framework.

How can fraud in mobile banking applications be prevented? Applications for smartphones are commodity goods used for retail (and other) banking purposes. Leveraging this type of technology for money transfer attracts criminal organisations trying to commit fraud. One of many security controls can be detection of fraudulent transactions or other types of activity. Detection can be implemented at many levels within the payment chain. One level to implement detection could be at the application level itself. This assignment will entail research into the information that would be required to detect fraud from within mobile banking applications and to turn fraud around by building a client side fraud detection framework within mobile banking applications.
Steven Raspe <steven.raspe=>nl.abnamro.com>


Malware analysis NFC enabled smartphones with payment capability.

The risk of mobile malware is rising rapidly. This, combined with the development of new techniques, provides a lot of new attack scenarios. One of these techniques is the use of mobile phones for payments. In this research project you will take a look at how resistant these systems are against malware on the mobile device. We would like to look at the theoretical threats, but also perform hands-on testing.
NOTE: timing on this project might be a challenge since the testing environment is only available during the pilot from August 1st to November 1st.
Steven Raspe <steven.raspe=>nl.abnamro.com>



Transencrypting streaming video.

Common encryption (CE) and digital rights management (DRM) are solutions used by the content industry to control the delivery of digital content, in particular streaming video. Whereas DRM focusses on securely getting a CE key to a trusted piece of user equipment, trans-encryption has been suggested as a technical alternative. Transencryption transforms the encryption of content without decrypting it. So encrypted content that can be decrypted with private key A is transformed into encrypted content that can be decrypted with private key B. This solution enables a content provider to outsource the transencryption of a piece of content to an untrusted third party in order to get the content cryptographically targeted to a single specific piece of user equipment.
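
The project description does not prescribe a particular scheme. As a toy illustration of the principle, the sketch below follows the ElGamal-style proxy re-encryption construction by Blaze, Bleumer and Strauss. The tiny group parameters and all function names are assumptions chosen for readability and offer no security whatsoever.

```python
import secrets

# Toy parameters: P = 2Q + 1 with P and Q prime; G generates the subgroup
# of order Q. These values are far too small for real use.
Q = 1019
P = 2 * Q + 1  # 2039
G = 4


def keygen():
    """Return (private, public) with private in [1, Q-1], public = G^private."""
    priv = secrets.randbelow(Q - 1) + 1
    return priv, pow(G, priv, P)


def encrypt(pub, m):
    """Encrypt m (1 <= m < P) under pub = G^a as (m * G^r, G^(a*r))."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pub, r, P)


def rekey(priv_a, priv_b):
    """Re-encryption key b/a mod Q. Note that it is derived from both private keys."""
    return (priv_b * pow(priv_a, -1, Q)) % Q


def reencrypt(rk, ciphertext):
    """Turn G^(a*r) into G^(b*r) without ever recovering m."""
    c1, c2 = ciphertext
    return c1, pow(c2, rk, P)


def decrypt(priv, ciphertext):
    c1, c2 = ciphertext
    g_r = pow(c2, pow(priv, -1, Q), P)  # (G^(priv*r))^(1/priv) = G^r
    return (c1 * pow(g_r, -1, P)) % P


a, pub_a = keygen()
b, _pub_b = keygen()
ct_a = encrypt(pub_a, 1234)
ct_b = reencrypt(rekey(a, b), ct_a)  # done by the untrusted third party
```

The party running reencrypt only sees ciphertexts and the re-encryption key, never the plaintext. Whether such a transformation can keep up with a 2 Mb/s stream on ordinary hardware is exactly the kind of question the implementation should answer.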

In this project, you will investigate the technical viability of transencrypting streaming video by building an implementation. Your implementation should answer the following questions for at least the implemented configuration.
  • Is it possible to implement transencryption on commercial off-the-shelf computer equipment?
  • Can the implementation handle transencryption of streaming video of 2 Mb/s?

Oskar van Deventer <oskar.vandeventer=>tno.nl>


Research MS Enhanced Mitigation Experience Toolkit (EMET).

Every month new security vulnerabilities are identified and reported. Many of these vulnerabilities rely on memory corruption to compromise the system. For most vulnerabilities a patch is released after the fact to remediate the vulnerability. Nowadays there are also new preventive security measures that can prevent vulnerabilities from becoming exploitable without availability of a patch for the specific issue. One of these technologies is Microsoft’s Enhanced Mitigation Experience Toolkit (EMET), which adds additional protection to Windows, preventing many vulnerabilities from becoming exploitable. We would like to research whether this technology is effective in practice and can indeed prevent exploitation of a number of vulnerabilities without applying the specific patch. We would also like to research whether running EMET has other impact on the system, for example a noticeable performance drop or common software which does not function properly once EMET is installed. If time permits, it is also interesting to see if existing exploits can be modified to work in an environment protected by EMET.
Henri Hambartsumyan <HHambartsumyan=>deloitte.nl>


Triage software.

In previous research a remote acquisition and storage solution was designed and built that allowed sparse acquisition of disks over a VPN using iSCSI. This system allows sparse reading of remote disks. The triage software should decide which parts of the disk must be read. The initial goal is to use meta-data to retrieve the blocks that are assumed to be most relevant first. This is in contrast to techniques that perform triage by running remotely while performing a full disk scan (e.g. run bulk_extractor remotely, keyword scan or do a hash based filescan remotely).

The student is asked to:
  1. Define criteria that can be used for deciding which (parts of) files to acquire
  2. Define a configuration document/language that can be used to order based on these criteria
  3. Implement a prototype for this acquisition
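
The three steps above could be sketched as follows. The metadata fields, the rules and their weights are hypothetical and only show how criteria might be turned into an acquisition order; a real configuration language would express the rules declaratively rather than as Python lambdas.

```python
from dataclasses import dataclass


@dataclass
class FileMeta:
    path: str
    size: int   # bytes
    mtime: int  # modification time, seconds since the epoch


# Hypothetical rule set: (description, predicate, weight).
RULES = [
    ("office documents",
     lambda f: f.path.lower().endswith((".docx", ".xlsx", ".pdf")), 5),
    ("user profile", lambda f: "/users/" in f.path.lower(), 3),
    ("recently modified", lambda f: f.mtime >= 1_450_000_000, 2),
    ("quick to fetch", lambda f: f.size <= 10 * 1024 * 1024, 1),
]


def score(meta, rules=RULES):
    """Sum the weights of all rules that match this file."""
    return sum(weight for _desc, matches, weight in rules if matches(meta))


def acquisition_order(files, rules=RULES):
    """Sort files so the highest-scoring (then smallest) are acquired first."""
    return sorted(files, key=lambda f: (-score(f, rules), f.size))


files = [
    FileMeta("/Windows/System32/ntoskrnl.exe", 8_000_000, 1_400_000_000),
    FileMeta("/Users/alice/video.mp4", 2_000_000_000, 1_460_000_000),
    FileMeta("/Users/alice/report.docx", 50_000, 1_460_000_000),
]
ordered = [f.path for f in acquisition_order(files)]
```

Only file system metadata is needed to compute this order, so the blocks holding the metadata would be read first and file contents afterwards.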
"Ruud Schramp (DT)" <schramp=>holmes.nl>
"Zeno Geradts (DT)" <zeno=>holmes.nl>
"Erwin van Eijk (DT)" <eijk=>holmes.nl>


Parsing CentralTable.accdb from Office file cache and restoring cached office documents.

The Microsoft Office suite uses a file cache for several reasons, one of which is delayed uploading and caching of documents from a SharePoint server.
In these cache files, partial or complete Office documents that have been opened on a computer might be available. The master database in the file cache folder also contains document metadata from SharePoint sites. In this project you are asked to research the use of the Office file cache and deliver a POC for extracting and parsing metadata from the database file, and for decoding or parsing document contents from the cache files (.FSD).
Kevin Jonkers <jonkers=>fox-it.com>


UsnJrnl parsing for Microsoft Office activity.

In modern Windows versions, the NTFS filesystem keeps a log (the UsnJrnl file) of all operations that take place on files and folders. This can include interesting information about read and write operations on files. Microsoft Office programs perform a lot of file operations in the background while a user is working on a file (think of autosave, back-up copies, copy-paste operations, etc.). While a lot of this activity leaves short-term traces on the file system, they can often only be found in the UsnJrnl after a while. Little research has been done on the forensic implications of these traces. In this project, you are requested to research which traces are left in the UsnJrnl when using Office applications like Word and Excel and how these traces can be combined into a hypothesis about what activity was performed on a document.
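
As a starting point for such an analysis, a journal entry can be decoded with a few lines of Python. The sketch below assumes the USN_RECORD_V2 layout as documented by Microsoft and handles only a subset of the reason flags; other record versions and the sparse, zero-filled regions of a real journal are not handled.

```python
import struct
from datetime import datetime, timedelta, timezone

# Fixed-size part of USN_RECORD_V2 (60 bytes, little-endian):
# RecordLength, MajorVersion, MinorVersion, FileReferenceNumber,
# ParentFileReferenceNumber, Usn, TimeStamp, Reason, SourceInfo,
# SecurityId, FileAttributes, FileNameLength, FileNameOffset.
HEADER = struct.Struct("<IHHQQqqIIIIHH")

# A subset of the USN_REASON_* flags.
REASONS = {
    0x00000001: "DATA_OVERWRITE",
    0x00000002: "DATA_EXTEND",
    0x00000004: "DATA_TRUNCATION",
    0x00000100: "FILE_CREATE",
    0x00000200: "FILE_DELETE",
    0x00001000: "RENAME_OLD_NAME",
    0x00002000: "RENAME_NEW_NAME",
    0x80000000: "CLOSE",
}


def filetime_to_datetime(filetime):
    """Convert a Windows FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)


def parse_record(buf, offset=0):
    """Decode one USN_RECORD_V2 starting at `offset` in `buf`."""
    (length, major, _minor, file_ref, parent_ref, usn, timestamp, reason,
     _source, _security_id, _attributes, name_len, name_off) = \
        HEADER.unpack_from(buf, offset)
    if major != 2:
        raise ValueError("only USN_RECORD_V2 is handled in this sketch")
    start = offset + name_off
    name = buf[start:start + name_len].decode("utf-16-le")
    return {
        "length": length,
        "usn": usn,
        "file_ref": file_ref,
        "parent_ref": parent_ref,
        "time": filetime_to_datetime(timestamp),
        "reasons": [label for flag, label in REASONS.items() if reason & flag],
        "name": name,
    }
```

Grouping the decoded records by file name and ordering them by time would then give a sequence of reasons per document from which a hypothesis about user activity might be built.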
Gina Doekhie <gina.doekhie=>fox-it.com>




The Serval Project.

Here are a few projects from the Serval Project. Not everything is equally appropriate for the SNE master, but they may provide ideas for RPs.

1. Porting Serval Project to iOS

The Serval Project (http://servalproject.org, http://developer.servalproject.org/wiki) is looking to port to iOS.  There are a variety of activities to be explored in this space, including how to provide interoperability with Android and explore user interface issues.

3. C65GS FPGA-Based Retro-Computer

The C65GS (http://c65gs.blogspot.nl, http://github.com/gardners/c65gs) is a reimplementation of the Commodore 65 computer in FPGA, plus various enhancements.  The objective is to create a fun 8-bit computer for the 21st century, complete with 1920x1200 display, ethernet, accelerometer and other features -- and then adapt it to make a secure 8-bit smart-phone.  There are various aspects of this project that can be worked on.

4. FPGA Based Mobile Phone

One of the long-term objectives of the Serval Project (http://servalproject.org, http://developer.servalproject.org/wiki) is to create a fully-open mobile phone.  We believe that the most effective path to this is to use a modern FPGA, like a Zynq, that contains an ARM processor and sufficient FPGA resources to directly drive cellular communications, without using a proprietary baseband radio.  In this way it should be possible to make a mobile phone that has no binary blobs, and is built using only free and open-source software.  There are considerable challenges to this project, not the least of which is implementing 2G/3G handset communications in an FPGA.  However, if successful, it raises the possibility of making a mobile phone that has long-range UHF mobile mesh communications as a first-class feature, which would be an extremely disruptive innovation.
Paul Gardner-Stephen <paul.gardner-stephen=>flinders.edu.au>


SURFdrive security.

SURFdrive is a personal cloud storage service for the Dutch higher education and research community, offering staff, researchers and students an easy way to store, synchronise and share files in the secure and reliable SURF community cloud.

SURFdrive is based on Owncloud, an open-source personal cloud storage product. Our challenge is to make the software environment as safe and secure as possible. The question is:
  • How can we make the environment resistant to future 0-day attacks?
Anomaly detection techniques might be helpful. The research task is to examine which techniques are effective against 0-day attacks.
Rogier Spoor <Rogier.Spoor=>surfnet.nl>




Extending the range of NFC capable devices.

Recently it has been shown that default and widely available Android devices can be used to effectively perform relay attacks on contactless payments (EMV Contactless). However, these attacks are obviously limited by the maximum range at which an Android device can read a bank card in a wallet or pocket. As of now, there does not seem to be a business case for large scale fraud as the range is typically limited to 5 cm. Improvements in the range, however, drastically change the business case and make relay attack fraud much more profitable. Reports from a laboratory show that a range up to 28 cm is possible. Another paper reports a simulated antenna with a range up to 55 cm. Students can research what the practical limit is, and what resources are needed to extend the range of NFC capable devices.
Jordi van den Breekel <vandenBreekel.Jordi=>kpmg.nl>


Comparison of security features of major Enterprise Mobility Management solutions

For years, Gartner has identified the major EMM (formerly known as MDM) vendors. These vendors are typically rated on performance and features; security often is not addressed in detail.
This research concerns an in-depth analysis of the security features of major EMM solutions (such as MobileIron, Good, AirWatch, XenMobile, InTune, and so forth) on major mobile platforms (iOS, Android, Windows Phone). Points of interest include: protection of data at rest (containerization and encryption), protection of data in transit (e.g. VPN), local key management, vendor-specific security features (added to platform APIs),
Paul van Iterson <vanIterson.Paul=>kpmg.nl>


Partitioning of big graphs.

Distributed graph processing and GPU processing of graphs that are bigger than GPU memory both require that a graph be partitioned into sections that are small enough to fit in a single machine/GPU. Having fair partitions is crucial to obtaining good workload balance. However, most current partitioning algorithms either require the entire graph to fit in memory or repeatedly process the same nodes, making partitioning a very computationally intensive process.

Since a good partitioning scheme depends on both the number of machines used (i.e., the number of partitions) and the graph itself, precomputing a partitioning is unhelpful. It would mean that incrementally updating the graph becomes impossible. We therefore need to do partitioning on-the-fly, preferably in a distributed fashion. This project involves investigating one or more possible partitioning schemes and developing prototypes. Possible starting points:
  • Partitioning that minimises cross-partition communication
  • Fine-grained partitioning that allows easy recombining of partitions to scale to the appropriate number of machines.
  • Distributed edge-count based partitioning that minimises communication.
Expected deliverables:
  • One or more partitioning prototypes
  • Write-up of the partitioning scheme and its benefits
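
To give an impression of what on-the-fly partitioning can look like, the sketch below implements the linear deterministic greedy heuristic known from the streaming graph partitioning literature. It is one possible baseline, not a scheme prescribed by this project; the function names and the tie-breaking rule are assumptions made for the example.

```python
def ldg_partition(vertex_stream, num_parts, capacity):
    """Assign vertices to partitions in a single pass over the stream.

    vertex_stream yields (vertex, neighbours) pairs. Each vertex goes to the
    partition that already holds most of its neighbours, discounted by how
    full that partition is. Ties go to the least-loaded partition.
    """
    assignment = {}
    sizes = [0] * num_parts
    for vertex, neighbours in vertex_stream:
        best, best_key = None, None
        for part in range(num_parts):
            if sizes[part] >= capacity:
                continue  # partition is full
            placed = sum(1 for n in neighbours if assignment.get(n) == part)
            weighted = placed * (1 - sizes[part] / capacity)
            key = (weighted, -sizes[part])
            if best_key is None or key > best_key:
                best, best_key = part, key
        if best is None:
            raise ValueError("all partitions are full")
        assignment[vertex] = best
        sizes[best] += 1
    return assignment


def cut_edges(edges, assignment):
    """Count edges whose endpoints ended up in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = {v: [] for v in range(6)}
for u, v in edges:
    adjacency[u].append(v)
    adjacency[v].append(u)

parts = ldg_partition(adjacency.items(), num_parts=2, capacity=3)
```

The number of cut edges is a simple proxy for cross-partition communication, so different schemes and stream orders can be compared with cut_edges.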
Merijn Verstraaten <M.E.Verstraaten=>uva.nl>


Analysing ELF binaries to find compiler switches that were used.

The Binary Analysis Tool is an open source tool that can automate analysis of binary files by fingerprinting them. For ELF files this is done by extracting string constants, function names and variable names from the various ELF sections. Sometimes compiler optimisations move the string constants to different ELF sections and extraction will fail in the current implementation.
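
For illustration, extracting string constants from the raw bytes of a section can be approximated in the way the strings utility does it. This is a simplified sketch and not the Binary Analysis Tool's actual implementation; the minimum length of four characters is an assumption.

```python
import re

# Runs of at least four printable ASCII characters terminated by a NUL byte.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}(?=\x00)")


def string_constants(section_bytes):
    """Return the NUL-terminated printable strings found in a section."""
    return [m.group().decode("ascii")
            for m in PRINTABLE_RUN.finditer(section_bytes)]


data = b"\x00\x01Usage: %s\x00ab\x00error: out of memory\x00\xff\xfe"
found = string_constants(data)
```

Running the same extraction on every section, rather than on one expected section only, would show where the constants of a given binary actually ended up.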

Your task is to find out whether it is possible, by looking at the binary alone, to determine if optimisation flags that cause constants to be moved between ELF sections were passed to the compiler, and to report those flags. The scope of this project is limited to Linux.

Armijn Hemel - Tjaldur Software Governance Solutions
Armijn Hemel <armijn=>tjaldur.nl>


Designing structured metadata for CVE reports.

Vulnerability reports such as MITRE's CVE are currently free-format text, without much structure in them. This makes it hard to machine-process reports and automatically extract useful information and combine it with other information sources. With tens of thousands of such reports published each year, it is increasingly hard to keep a holistic overview and see patterns. With our open source Binary Analysis Tool we aim to correlate data with firmware databases.

Your task is to analyse how we can use the information from these reports, what metadata is relevant and propose a useful metadata format for CVE reports. In your research you make an inventory of tools that can be used to convert existing CVE reports with minimal effort.
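
To give an idea of what structured metadata could look like, here is a minimal sketch of a machine-readable record. Every field name and value is hypothetical and only serves as a conversation starter; designing the actual format is the research task.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AffectedProduct:
    vendor: str
    product: str
    versions: list


@dataclass
class VulnerabilityRecord:
    identifier: str      # placeholder identifier, not a real CVE entry
    summary: str
    weakness_class: str  # e.g. a category such as "buffer overflow"
    affected: list = field(default_factory=list)
    file_names: list = field(default_factory=list)  # hooks for firmware matching


def to_json(record):
    """Serialise a record, including nested products, to JSON text."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


record = VulnerabilityRecord(
    identifier="CVE-0000-00000",
    summary="Example entry used only to show the structure.",
    weakness_class="buffer overflow",
    affected=[AffectedProduct("ExampleVendor", "exampled", ["1.0", "1.1"])],
    file_names=["libexample.so"],
)
text = to_json(record)
```

Fields such as file names or version lists are the kind of data that could be correlated with the contents of a firmware database.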

Armijn Hemel - Tjaldur Software Governance Solutions
Armijn Hemel <armijn=>tjaldur.nl>


Automatic comparison of photo response non-uniformity (PRNU) on YouTube.

Goal:
  • Compare the PRNU patterns of the different video files available on YouTube in a fast way.
Approach:
  • The software for PRNU extraction and comparison is available at NFI. The question, however, is how we can process large numbers of video files from YouTube based on this method and limit the amount of data transferred.
Result:
  • Report and demonstrator for this approach
Zeno Geradts (DT) <zeno=>holmes.nl>


RedStar OS reverse engineering.

During the 32C3 conference, two researchers showed that RedStar OS, North Korea's operating system, implements custom cryptography in the pilsung.ko kernel module. Reverse engineer this module and determine how the pilsung implementation of AES differs from standard AES. Is there some kind of backdoor or weakness in pilsung?
Note that we expect that a deep understanding of assembly/reverse engineering and the Linux kernel is required to successfully research this topic.

for more info on RedStar OS reversing.
Tim van Essen <TvanEssen=>deloitte.nl>
Henri Hambartsumyan <hhambartsumyan=>deloitte.nl>


Efficient networking for clouds-on-a-chip.

The "Cloud" is a way to organize business where the owners of physical servers rent their resources to software companies to run their application as virtual machines. With the growing availability of multiple cores on a chip, it becomes interesting to rent different parts of a chip to different companies. In the near future, multiple virtual machines will co-exist and run simultaneously on larger and larger multi-core chips.
Meanwhile, the technology used to implement virtual machines on a chip is based on very old principles that were designed in the 1970's for single-processor systems, namely the use of shared memory to communicate data between processes running on the same processor.
As multi-core chips become prevalent, we can do better and use more modern techniques. In particular, the direct connections between cores on the chip can be used to implement a faster network than using the off-chip shared memory. This is what this project is about: demonstrate that direct use of on-chip networks yields better networking between VMs on the same chip than using shared memory.
The challenge in this project is that the on-chip network is programmatically different than "regular" network adapters like Ethernet, so we cannot use existing network stacks as-is.
The project candidate will thus need to explore the adaptation and simplification of an existing network stack to use on-chip networking.
The research should be carried out either on a current multi-core product or simulations of future many-core accelerators. Simulation technology will be provided as needed.

Raphael 'kena' Poss <r.poss=>uva.nl>


Secure on-chip protocols for clouds-on-a-chip.

The "Cloud" is a way to organize business where the owners of physical servers rent their resources to software companies to run their application as virtual machines. With the growing availability of multiple cores on a chip, it becomes interesting to rent different parts of a chip to different companies. In the near future, multiple virtual machines will co-exist and run simultaneously on larger and larger multi-core chips.
Meanwhile, the technology used to implement virtual machines on a chip is based on very old principles that were designed in the 1970's for single-processor systems, namely the virtualization of shared memory using virtual address translation within the core.
The problem with this old technique is that it assumes that the connection between cores is "secure". The physical memory accesses are communicated over the chip without any protection: if a VM running on core A exchanges data with off-chip memory, a VM running on core B that runs malicious code can exploit hardware errors or hardware design bugs to snoop and tamper with the traffic of core A.
To make Clouds-on-a-chip viable from a security perspective, further research is needed to harden the on-chip protocols, in  particular the protocols for accessing memory, virtual address translation and the routing of I/O data and interrupts.
The candidate for this project should perform a thorough analysis of the various on-chip protocols required to implement VMs on individual cores, then design protocol modifications that provide resistance against snooping and tampering by other cores on the same chip, together with an analysis of the corresponding overheads in hardware complexity and operating costs (extra network latencies and/or energy usage).
The research will be carried out in a simulation environment so that inspection of on-chip network traffic becomes possible. Simulation tools will be provided prior to the start of the project.
Raphael 'kena' Poss <r.poss=>uva.nl>


Multicast delivery of HTTP Adaptive Streaming.

HTTP Adaptive Streaming (e.g. MPEG DASH, Apple HLS, Microsoft Smooth Streaming) is responsible for an ever-increasing share of streaming video, replacing traditional streaming methods such as RTP and RTMP. The main characteristic of HTTP Adaptive Streaming is that it is based on the concept of splitting content up in numerous small chunks that are independently decodable. By sequentially requesting and receiving chunks, a client can recreate the content. An advantage of this mechanism is that it allows a client to seamlessly switch between different encodings (e.g. qualities) of the same content.
There is a growing interest from both content parties as well as operators and CDNs to not only be able to deliver these chunks over unicast via HTTP, but to also allow for them to be distributed using multicast. The question is how current multicast technologies could be used, or adapted, to achieve this goal.
Ray van Brandenburg <ray.vanbrandenburg=>tno.nl>


Generating test images for forensic file system parsers.

Traditionally, forensic file system parsers (such as The Sleuthkit and the ones contained in Encase/FTK etc.) have been focused on extracting as much information as possible. The state of software in general is lamentable — new security vulnerabilities are found every day — and forensic software is not necessarily an exception. However, software bugs that affect the results used for convictions or acquittals in criminal court are especially damning. As evidence is increasingly being processed in large automated bulk analysis systems without intervention by forensic researchers, investigators unversed in the intricacies of forensic analysis of digital materials are presented with multifaceted results that may be incomplete, incorrect, imprecise, or any combination of these.

There are multiple stages in an automated forensic analysis. The file system parser is usually one of the earlier analysis phases, and errors (in the form of faulty or missing results) produced here will influence the results of the later stages of the investigation, and not always in a predictable or detectable manner. It is relatively easy (modulo programmer quality) to create strict parsers that bomb-out on any unexpected input. But real-world data is often not well-formed, and a parser may need to be able to resync with input data and resume on a best-effort basis after having reached some unexpected input in the format. While file system images are being (semi-) hand-generated to test parsers, when doing so, testers are severely limited by their imagination in coming up with edge cases and corner cases. We need a file system chaos monkey.

The assignment consists of one of the following (each may also be spawned into a separate RP):
  1. Test image generator for NTFS. Think of it as some sort of fuzzer for forensic NTFS parsers. NTFS is a complex filesystem which offers interesting possibilities to trip a parser or trick it into yielding incorrect results. For this project, familiarity with C/C++ and the use of the Windows API is required (but only as much as is necessary to create function wrappers). The goal is to automatically produce "valid" — in the sense of "the bytes went by way of ntfs.sys" — but hopefully quite bizarre NTFS images.
  2. Another interesting research avenue lies in the production of /subtly illegal/ images. For instance, in FAT, it should be possible, in the data format, to double-book clusters (akin to a hard link). It may also be possible to create circular structures in some file systems. It will be interesting to see if and how forensic filesystem parsers deal with such errors.
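
The two FAT anomalies mentioned in the second option can be made concrete with a simplified model of the allocation table. In the sketch below the table is a plain mapping from cluster number to next cluster and the end-of-chain marker is -1; both are simplifications for illustration and do not match the on-disk encoding of any FAT variant.

```python
END_OF_CHAIN = -1  # simplified marker; real FAT variants use reserved values


def follow_chain(fat, start):
    """Return (clusters, is_circular) for the chain starting at `start`."""
    seen, chain = set(), []
    cluster = start
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            return chain, True  # we came back to a cluster already visited
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain, False


def find_anomalies(fat, start_clusters):
    """Report circular chains and clusters claimed by more than one file."""
    owners = {}
    circular = []
    for name, start in start_clusters.items():
        chain, loops = follow_chain(fat, start)
        if loops:
            circular.append(name)
        for cluster in chain:
            owners.setdefault(cluster, []).append(name)
    cross_linked = {c: names for c, names in owners.items() if len(names) > 1}
    return circular, cross_linked


# A and B share clusters 3 and 4 (double-booked); C loops between 6 and 7.
fat = {2: 3, 3: 4, 4: END_OF_CHAIN, 5: 3, 6: 7, 7: 6}
circular, cross_linked = find_anomalies(fat, {"A": 2, "B": 5, "C": 6})
```

A generator for subtly illegal images would produce exactly such tables, and a checker like this one describes what a robust parser is expected to notice.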
"Wicher Minnaard (DT)" <wicher=>holmes.nl>
Zeno Geradts <zeno=>holmes.nl>


Large scale Log Analytics.

Central log analysis is a "Big Data" challenge at Vancis. We have thousands of servers, devices and applications logging  data. We'd like to retrieve intelligent (or preferably, actionable) information from logs by applying machine learning techniques. We expect that you select and apply methods that should (substantiated by research) deliver a tangible result. The initial business question is intentionally broad. We expect you to narrow the scope such that you are left with a final research question that can be answered in the limited time you are given. You can focus on a particular type of data (e.g. system-, audit-, network-, application- logs) or combine different sets.

We expect you to demo your solution (algorithm, code-pieces) on both a small set of data (<1TiB) and a large set of data (>1TiB) and prove that the solution scales. A big bonus would be if the chosen method delivers a tangible business outcome (e.g. security is improved, the speed of finding the cause for a failing service is increased, etc).

We are facilitating a ready-to-use cluster including Hadoop/Spark, ElasticSearch, LogStash & related technologies. During the project you are free to add applications if necessary to execute your task. We are more than happy to interact with you to scope the research question and support you by supplying data that you need to execute the case.
Anthony Potappel <Anthony.Potappel=>vancis.nl>
Patrick Beitsma <Patrick.beitsma=>vancis.nl>


Android Application Security.

Recent Android releases have significantly improved support for full disk encryption, with it being enabled by default as of version 5.0. As we have seen on iOS, full disk encryption is not fully effective (powering on the device decrypts the disk). With disk encryption potentially not fully effective, there may be a need for encryption on the application level that developers can include in their app. Research the possibility of secure encryption per app, either via loadable libraries in the app, or perhaps an encryption layer between OS and app. Make a proof-of-concept implementation if time allows for it. Note that dynamic code loading comes with its own set of application security tradeoffs.
  • Sufficient programming skills are needed.
Rick van Galen <vanGalen.Rick=>kpmg.nl>


(In)security of java usage in large software frameworks and middleware.

Java is used in almost all large software application packages. Examples of such packages are middleware (Tomcat, JBoss and WebSphere) and products like SAP and Oracle. The goal of this research is to investigate the possible attacks that exist on Java (e.g. RMI) as used in such large software packages and to develop a framework to securely deploy (or attack) those.
Martijn Sprengers <Sprengers.Martijn=>kpmg.nl>


A time machine for registration data.

This project involves using ElasticSearch or a similar non-relational database technology for research that we carry out at SIDN Labs with (historical) registration data on .nl domain names. Because of the size of our dataset (tens of millions of updates to all .nl domain names), our existing relational database environment is no longer adequate, e.g. in terms of performance or user-friendliness.
More info:
Marco Davids <marco.davids=>sidn.nl>
Cristian Hesselman <cristian.hesselman=>sidn.nl>


Text mining on the basis of Natural Language Processing.

This project involves using Natural Language Processing  (NLP) to analyse registrant data, e.g. to identify false information and other abuses promptly when a new domain name is registered.
More info:
Marco Davids <marco.davids=>sidn.nl>
Cristian Hesselman <cristian.hesselman=>sidn.nl>


Virtual reality interface for data analysis.

This project involves designing and developing a virtual reality (VR) interface for the analysis of large volumes of DNS data. The virtual world should enable the user to explore the data on an intuitive basis. The VR interface should also aid the recognition of irregularities and interrelationships.
More info:
Marco Davids <marco.davids=>sidn.nl>
Cristian Hesselman <cristian.hesselman=>sidn.nl>


Usage Control in the Mobile Cloud.

Mobile clouds [1] aim to integrate mobile computing and sensing with rich computational resources offered by cloud back-ends. They are particularly useful in services such as transportation, healthcare and so on when used to collect, process and present data from the physical world. In this thesis, we will focus on the usage control, in particular privacy, of the collected data pertinent to mobile clouds. Usage control [2] differs from traditional access control by enforcing security requirements not only on the release of data but also on what happens afterwards. The thesis will involve the following steps:
  • Propose an architecture over cloud for "usage control as a service" (extension of authorization as a service) for the enforcement of usage control policies
  • Implement the architecture (compatible with Openstack[3] and Android) and evaluate its performance.
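As an illustration of how usage control differs from a one-time access check, here is a minimal sketch in which every use of a data item is re-evaluated against a policy whose state changes with use. It is only loosely inspired by the idea of mutable attributes and is not an implementation of the UCON model from [2]; the class `UsagePolicy` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsagePolicy:
    """Hypothetical policy that travels with a released data item."""
    allowed_purposes: frozenset
    max_uses: int
    expires_at: float  # timestamp after which no use is allowed
    uses: int = 0      # mutable state, updated on every granted use

    def authorize(self, purpose, now):
        """Decide on a single use; called again for every later use."""
        if now >= self.expires_at:
            return False
        if purpose not in self.allowed_purposes:
            return False
        if self.uses >= self.max_uses:
            return False
        self.uses += 1
        return True


policy = UsagePolicy(frozenset({"diagnosis"}), max_uses=2, expires_at=100.0)
decisions = [
    policy.authorize("diagnosis", now=10.0),  # granted
    policy.authorize("marketing", now=11.0),  # wrong purpose
    policy.authorize("diagnosis", now=12.0),  # granted, budget now exhausted
    policy.authorize("diagnosis", now=13.0),  # denied, no uses left
]
```

In a "usage control as a service" design, the decision function would run in the cloud back-end while enforcement happens on the device, which is where the performance evaluation becomes interesting.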
[1] https://en.wikipedia.org/wiki/Mobile_cloud_computing
[2] Jaehong Park, Ravi S. Sandhu: The UCONABC usage control model. ACM Trans. Inf. Syst. Secur. 7(1): 128-174 (2004)
[3] https://en.wikipedia.org/wiki/OpenStack
[4] Slim Trabelsi, Jakub Sendor: "Sticky policies for data control in the cloud" PST 2012: 75-80
Fatih Turkmen <F.Turkmen=>uva.nl>


Security and Performance Analysis of (Encrypted) NoSQL Databases.

It has been shown that encryption over SQL data gives a performance penalty in the range of 6-26% [1,2]. In return, SQL databases ensure confidentiality/privacy against malicious users such as "curious database admins" by protecting not only data but also the logs [3]. In this thesis, we will look at the same problems from the perspective of NoSQL databases. NoSQL databases are frequently used in Big Data applications thanks to their scalability for certain types of data (often less structured) [4].
There are many freely available NoSQL databases such as MongoDB[5] and Cassandra[6] that support "encryption at rest". We will try to answer the following questions over the selected databases in this  thesis:
  • What are the possible weaknesses and strengths in terms of security?
  • What is the performance of the selected databases over a variety of encryption schemes?
  • What are the possible remedies/optimizations to the first two questions?
[1] http://www.databasejournal.com/features/mssql/article.php/3815501/Performance-Testing-SQL-2008146s-Transparent-Data-Encryption.htm
[2] Raluca A. Popa, Catherine M. S. Redfield, Nickolai Zeldovich, Hari Balakrishnan: CryptDB: protecting confidentiality with encrypted query processing. SOSP 2011: 85-100
[3] https://en.wikipedia.org/wiki/NoSQL
[4] https://www.mongodb.org/
[5] http://cassandra.apache.org/
Fatih Turkmen <F.Turkmen=>uva.nl>


Detection of DDoS Mitigation.

The recent rise in DDoS issues has given rise to a wide range of mitigation approaches.

An attacker that seeks to maximize impact could be interested in predicting potential success: is a potential target "protected" or not? Deciding this question  probably involves measurements, and reasoning about measurement results -- heuristics? -- among other things.  How to?  To what extent can an attacker expect to succeed in detecting the presence/absence of protective layers on the intermediate network path?

For more information in Dutch: SURFnet Project DDos
Jeroen Scheerder <js=>on2it.net>


Automated asset identification in large organizations.

Many large organizations are struggling to remain in control over their IT infrastructure. What would help these organizations is automated asset identification: given an internal IP range, scan the network and, based on certain heuristics, identify the role of each host (e.g. is it a web server, a database, an ERP system, an end-user system, or a VoIP device).
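One simple heuristic of this kind is to map the open ports found on a host to a likely role. The sketch below assumes the port scan has already been done and only shows the classification step; the table `ROLE_HINTS` and the function `guess_role` are illustrative assumptions, and a real tool would combine ports with banners, protocols and other evidence.

```python
# Well-known default ports per role (illustrative, far from complete).
ROLE_HINTS = {
    "web server": {80, 443, 8080},
    "database": {1433, 1521, 3306, 5432},           # MSSQL, Oracle, MySQL, PostgreSQL
    "VoIP device": {5060, 5061},                    # SIP
    "end-user workstation": {135, 139, 445, 3389},  # Windows RPC/SMB, RDP
}


def guess_role(open_ports):
    """Return the role whose hint ports overlap most with the open ports."""
    ports = set(open_ports)
    scores = {role: len(ports & hints) for role, hints in ROLE_HINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


roles = {
    "10.0.0.5": guess_role([22, 80, 443]),
    "10.0.0.9": guess_role([3306]),
    "10.0.0.12": guess_role([22]),
}
```

Ties are resolved by the order of the table here, which is arbitrary; part of the research would be finding heuristics that separate, for example, a file server from a workstation with the same ports open.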
Rick van Galen <vanGalen.Rick=>kpmg.nl>


Automatic phishing mail identification based on language markers.

Phishing mails are still a large threat for organizations. They are hard to identify from an end user's perspective. Quite often, mails sent internally within organizations are even very similar to phishing mails. Security operations centers often miss phishing mails as they are not caught by spam filters. What identifiers are included in phishing mails that can be used for automatic alerting of security teams in organizations?
Rick van Galen <vanGalen.Rick=>kpmg.nl>


Forensic investigation of smartwatches.

Smartwatches are an unknown area in information risk. They are an additional display for certain sensitive data (i.e. executive mail, calendars and other notifications), but are not necessarily covered by organizations' existing mobile security products. In addition, it is often much easier to steal a watch than it is to steal a phone. What is the data that gets 'left behind' on smartwatches in case of theft, and what information risks do they pose?
Rick van Galen <vanGalen.Rick=>kpmg.nl>


IoT security aspects and testing methodology.

The internet-of-things world has a security problem, that much is clear. But what security problems are specific to IoT, and, given the large number of different standards being followed, what is the proper methodology to test for these?
Rick van Galen <vanGalen.Rick=>kpmg.nl>


Pentest auditability 2.0: Digging into the network.

During security tests, it is often difficult to achieve great accountability of actions. Systems may be disrupted by a security test, or may be disrupted by unrelated bugs and administration within the organization. To prove accountability of certain actions, one must keep good records of pentest activities. One such method is to simply log and analyze network traffic. But is it feasible to do this? Does one log all network traffic, or only meta-information? And is it feasible to do this given storage requirements?
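The storage question can be approached with a back-of-the-envelope calculation. The sketch below compares full packet capture with flow-style metadata logging; the function names and the assumed 64 bytes per metadata record are illustrative assumptions, not measurements.

```python
def capture_size_bytes(avg_mbit_per_s, hours):
    """Bytes needed to store all traffic at a given average rate."""
    return avg_mbit_per_s * 1_000_000 / 8 * hours * 3600


def flow_log_size_bytes(flows_per_s, hours, bytes_per_record=64):
    """Bytes needed when storing one fixed-size record per flow (size is assumed)."""
    return flows_per_s * bytes_per_record * hours * 3600


# Example: a 40-hour test averaging 100 Mbit/s and 500 new flows per second.
full_capture = capture_size_bytes(100, 40)  # 1.8e12 bytes, i.e. 1.8 TB
metadata_only = flow_log_size_bytes(500, 40)  # 4,608,000,000 bytes, i.e. about 4.6 GB
```

Under these assumptions the difference is a factor of several hundred, which suggests that the interesting research question is what evidential value is lost when only metadata is kept.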
Rick van Galen <vanGalen.Rick=>kpmg.nl>


WhatsApp end-to-end encryption: show us the money.

WhatsApp has recently switched to using the Signal protocol for their messaging, which should provide greatly enhanced security and privacy over their earlier, non end-to-end encrypted proprietary protocol. Of course, since WhatsApp is closed source, one has to trust WhatsApp to actually use this Signal protocol, since one cannot review the source code. What other (automated) methods are there to verify that WhatsApp actually employs this protocol? This research is about reverse engineering Android and/or iOS apps.
Rick van Galen <vanGalen.Rick=>kpmg.nl>


Android full-disk encryption: FBI-proof security?

The recent stories about iPhone encryption have sparked a new debate about control over encryption. However, is this really warranted given that the majority of smartphone users in the world use Android? How does Android full-disk encryption stack up against the competition: how does it work, and what methods are available for attackers (both criminal and law enforcement) to circumvent it?
Rick van Galen <vanGalen.Rick=>kpmg.nl>


AppLocker and Software Restriction Policies review.

AppLocker and SRP are Microsoft technologies for application whitelisting. These solutions, however, strongly depend on the policies and configurations. There are various ways to bypass AppLocker (e.g. PowerShell, Word macros).
The goal of this project is to review AppLocker security and develop a policy that prevents or detects various AppLocker bypass mechanisms.

In this research project the students should:
  •  Implement a test deployment of AppLocker
  •  Create an overview of mechanisms that can be used to circumvent application whitelisting technology (e.g. http://en.wooyun.io/2016/01/28/Bypass-Windows-AppLocker.html)
  •  Review options for limiting specific attack vectors (e.g. disallow PowerShell) and check if they block correctly
  •  Create rules and triggers to detect specific attacks based on audit logs and audit settings (e.g. someone tries to stop the AppLocker service)
Marc Smeets <marc=>outflank.nl>


Analysis of tactics, techniques and procedures in targeted attacks.

Starting with the Mandiant report on APT1 we see more and more information being published by researchers on targeted attacks by state actors, hacking groups and determined individuals. This gives us more insight into the juicy details of the Tactics, Techniques and Procedures (TTPs) used in these attacks. But with several dozen of these reports published, we are seeing significant variations in the sophistication of the attacks. An easy example is the use of 0-days in the attacks: some do, some don't. Another is the use of specific tools like mimikatz and PowerShell: some seem to like default tools while others rely heavily on customised tools. Another is the sophistication used (or needed) per business sector.

But with enough information out there, what we lack at the moment is a more structured and detailed overview of the TTPs of the attacks (info such as https://docs.google.com/spreadsheets/d/1H9_xaxQHpWaa4O_Son4Gx0YOIzlcBWMsdvePFX68EKU/ is not enough). While some corporations provide periodic trend reports, these lack the technical details and are of questionable independence. Once the overview is gained we want to know if it's possible to make trend analyses that help us better understand these types of attacks:
  • Are there sector or country specific trends?
  • How do attacks evolve over time?
  • How much overlap is there amongst techniques being used by different actors?
  • Can you classify which TTPs belong to a low-sophistication attack, and which to an advanced one?
Although this research mostly consists of desk research, we welcome any effort to spice it up with any relevant technical research you can think of. Be creative!
Marc Smeets <marc=>outflank.nl>


Visualising DNS big data: .com .sucks and .nl .rocks?

SURFnet, the University of Twente and SIDN jointly work on a project called "OpenINTEL", in which we perform large-scale active measurements of the Domain Name System [1]. We query every single domain in a large list of top-level domains (see below) once every 24 hours and store the results on a dedicated Hadoop cluster. One interesting challenge is to find attractive and insightful ways to visualise this data. Visual analytics can help researchers gain quick intuitions or discover interesting anomalies in the data. The goal of this project is to create one or two visualisations based on this dataset. Suggested visualisations are:
  • visualising IPv4 and IPv6 addresses used in the TLDs that we measure (a good example of IP visualisation are IPv4 heatmaps, e.g. http://maps.measurement-factory.com/gallery/index.html)
  • visualising IPv4 and IPv6 geolocation in the TLDs we measure
  • visualising domains per autonomous system for the TLDs we measure
In all cases, an important aspect of the visualisations will be showing changes over time, either as an animation, or through an interactive system (e.g. a web page).
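IPv4 heatmaps of the kind mentioned in the first suggestion are commonly laid out along a Hilbert curve, so that numerically adjacent prefixes end up next to each other on the map. The sketch below shows that mapping under the assumption of a 256x256 grid with one cell per /16 prefix; the function names `hilbert_d2xy` and `ip_to_cell` are hypothetical.

```python
import ipaddress


def hilbert_d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so the curve stays continuous
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_cell(ip):
    """Return the grid cell of the /16 prefix that contains the given IPv4 address."""
    prefix_index = int(ipaddress.IPv4Address(ip)) >> 16  # 0 .. 65535
    return hilbert_d2xy(256, prefix_index)


# Counting measured addresses per cell gives the intensity of each heatmap pixel.
counts = {}
for address in ["192.0.2.1", "192.0.2.77", "198.51.100.4"]:
    cell = ip_to_cell(address)
    counts[cell] = counts.get(cell, 0) + 1
```

Producing one such grid per measurement day would give the frames for the animation or interactive view mentioned above.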

[1] van Rijswijk-Deij, R., Jonker, M., Sperotto, A., & Pras, A. (2015). The Internet of Names: A DNS Big Dataset Actively Measuring 50% of the Entire DNS Name Space, Every Day. In Proceedings of ACM SIGCOMM 2015. London, UK: ACM Press. doi:10.1145/2785956.2789996
  1. http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p91.pdf
  2. http://wwwhome.ewi.utwente.nl/~rijswijkrm/pub/sigcomm2015-poster.pdf
Roland van Rijswijk - Deij <roland.vanrijswijk=>surfnet.nl>


Developing a public permissioned blockchain.

Blockchain technology is getting much attention, triggered by the popularity of the bitcoin cryptocurrency. Ethereum (https://ethereum.org/) is a blockchain-based computer that runs smart contracts: applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third party interference. However, the unlimited openness of Ethereum poses risks. For example, bad actors can permanently put illegal content or applications on such a blockchain. This risk, and the associated legal liability, will deter legitimate businesses from running applications on or supporting such an infrastructure. Permissioned blockchains, see e.g.
allow for certain parties to have more control in who can do what and, therefore, can help mitigate this risk. The hypothesis is that such permissioned blockchains can retain many of the benefits of blockchain technology.

In this project, you will investigate the hypothesis by
  • Performing a brief risk analysis, identifying the most prominent risks of permissionless blockchains
  • Performing a brief analysis of the main (quantifiable) benefits of permissionless blockchains
  • Developing permission requirements for managing the identified risks and relating those to the (potential) loss of benefits (e.g., openness, censorship resistance).
  • Implementing a permissioned blockchain (for example by using technology provided by Eris (https://erisindustries.com/) or Tendermint (http://tendermint.com/)).
  • Demonstrating the functionality of the system with a test application
  • Evaluating the system
Oskar van Deventer <oskar.vandeventer=>tno.nl>
Maarten Everts <maarten.everts=>tno.nl>


Automated gadget injection for reverse engineering iOS and Android applications.

Because root detection measures implemented in mobile applications may slow down reverse engineers, we would like to develop a new method for dynamic analysis of mobile applications by injecting a so-called gadget into the existing (compiled) application. The gadget makes it possible to perform any operation within the sandbox of the current mobile application, such as accessing files within the local application directory, or hooking functions. This method not only makes it possible to work around root detection measures, but also allows dynamic analysis of applications that are compiled for iOS or Android versions that have no available jailbreak or root method.
Cedric Van Bockhaven <cvanbockhaven=>deloitte.nl>


SCADA security demo.

During security reviews, industrial systems have revealed flaws in their communications that may lead to physical damage. We are interested to find out how SCADA systems can be set-up in a closed control loop.
  • What sensors could be used to detect tampering, and how could they be hacked?
We can provide the materials to build a SCADA set-up for mixing liquids, working with robotic arms, changing traffic lights, or something else.
Coen Steenbeek <csteenbeek=>deloitte.nl>
Dima van de Wouw <dvandewouw=>deloitte.nl>


Dynamic profiles for malware communication.

Malware has communication possibilities to a (de)central C&C. To hide this traffic we use profiles, so that it looks like legitimate traffic. However, these profiles are static, making it possible to fingerprint and detect them. Making these profiles dynamic will make it harder to detect the hidden traffic and harder to create signatures for monitoring tools.
Cedric van Bockhaven <CvanBockhaven=>deloitte.nl>
Ari Davies <adavies=>deloitte.nl>


Visualising security boundaries and POIs in virtualised environments.

Because infrastructure is increasingly dynamic, we want to have an automated scanner which is capable of collecting data from host systems and mapping out security boundaries between guests and the network itself, taking into account guest and host firewall settings. Ideally it would run as some agent to collect the data and then compile a simple network graph which highlights hosts and paths between systems, in order to find points of interest or certain super nodes which may pose a security risk if compromised.

For the project the scope would be limited to identifying services and possible security risks that occur from a network topology point of view. Preferably it would function for both para-virtualised and fully virtualised environments or even jail environments like FreeBSD jails. Taking into account all possible firewall software stacks and the different possible configurations would probably stretch the scope, however. An alternative could be running the tool on all nodes and collecting data by just trying to punch holes and seeing if it works or not: decentralised network scanning and graphing, with some added measure for evaluating risk impacts.
Esan Wit <esan=>bunq.com>


Building an open-source, flexible, large-scale static code analyzer.

Background information
Data drives business, and maybe even the world. Businesses that make it their business to gather data are often aggregators of client-side generated data. Client-side generated data, however, is inherently untrustworthy. Malicious users can construct their data to exploit careless, or naive, programming and use this malicious, untrusted data to steal information or even take over systems.
It is no surprise that large companies such as Google, Facebook and Yahoo spend considerable resources in securing their own systems against would-be attackers. Generally, many methods have been developed to make untrusted data cross the trust-boundary to trusted data, and effectively make malicious data harmless. However, securing your systems against malicious data often requires expertise beyond what even skilled programmers might reasonably possess.
Problem description
Ideally, tools that analyze code for vulnerabilities would be used to detect common security issues. Such tools, or static code analyzers, exist, but are either out-dated (http://rips-scanner.sourceforge.net/) or part of very expensive commercial packages (https://www.checkmarx.com/ and http://armorize.com/). Next to the need for an open-source alternative to the previously mentioned tools, we also need to look at increasing our scope. Rather than focusing on a single codebase, the tool would ideally be able to scan many remote, large-scale repositories and report the findings back in an easily accessible way.
An interesting target for this research would be very popular, open-source (at this stage) Content Management Systems (CMSs), and specifically plug-ins created for these CMSs. CMS cores are held to a very high coding standard and are often relatively secure. Plug-ins, however, are necessarily less so, but are generally as popular as the CMSs they’re created for. This is problematic, because an insecure plug-in is as dangerous as an insecure CMS. Experienced programmers and security experts generally audit the most popular plug-ins, but this is: a) very time-intensive, b) prone to errors and c) of limited scope, i.e. not every plug-in can be audited. For example, if it was feasible to audit all aspects of a CMS repository (CMS core and plug-ins), the DigiNotar debacle could have easily been avoided.
Research proposal
Your research would consist of extending our proof-of-concept static code analyzer written in Python and using it to scan code repositories, possibly of some major CMSs and their plug-ins, for security issues and finding innovative ways of reporting on the massive amount of possible issues you are sure to find. Help others keep our data that little bit more safe.
Patrick Jagusiak <patrick.jagusiak@dongit.nl>
Wouter van Dongen <wouter.vandongen@dongit.nl>


Leader election and logical organization in inter-cloud virtual machines.

The objective of the project is to create a service that is deployed on every VM of a distributed cluster which allows the cluster of VMs to elect a leader. This can be extended further so that the cluster of VMs can have different groups with each group having its own leader.

When considering highly distributed volatile systems such as those that can be created using virtual machines from different cloud providers, mapping distributed applications to the virtual machines is not a trivial task. The basic necessities for having a functioning distributed system cannot be taken for granted. E.g. networking between nodes on different providers can quickly get out of hand. Another issue is the logical organization of the nodes into groups with leaders. Many application mapping scenarios require temporary leaders to e.g. coordinate replication or coalesce monitoring information. Without any central node this leader needs to be elected just-in-time in a distributed fashion. It is convenient for virtual machines to be equipped with a service that helps them organize themselves into logical groups where every group elects a leader. The volatile cloud environment means that virtual machines come and go, so the service must be dynamic enough to ensure a leader is elected in every scenario. With such a service running on each VM, an application can query the service to request the leader IP or other ID information and use this info to further optimize the application scheduling.

This area has been studied for a long time and there are various algorithms to achieve consensus and leader election. The most common are the Paxos and Raft algorithms, of which Raft is the simpler. Also of research interest is blockchain consensus, which has been popularized by bitcoin and is aimed at achieving consensus among untrusted nodes.
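To give a feel for the core rule behind majority-based election, here is a minimal single-process sketch in the style of Raft's voting: each node grants at most one vote per term, and a candidate becomes leader only with votes from a strict majority of the cluster. It is a simplification and not a Raft implementation: there is no networking, no timeouts and no log comparison, and the names `Node` and `run_election` are hypothetical.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        self.voted_for = None  # candidate this node voted for in current_term

    def handle_vote_request(self, term, candidate_id):
        """Grant the vote unless the request is stale or the vote is already given."""
        if term < self.current_term:
            return False
        if term > self.current_term:
            self.current_term = term  # newer term: forget the old vote
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate, reachable_peers, cluster_size):
    """Start a new term and return True if the candidate wins a strict majority."""
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id
    votes = 1  # the candidate votes for itself
    for peer in reachable_peers:
        if peer.handle_vote_request(candidate.current_term, candidate.node_id):
            votes += 1
    return votes > cluster_size // 2


nodes = [Node(i) for i in range(5)]
first = run_election(nodes[0], nodes[2:4], cluster_size=5)  # wins with 3 of 5 votes
second = run_election(nodes[1], nodes[2:5], cluster_size=5)  # same term, only node 4 is free
```

Because two majorities always overlap, at most one candidate can win in any given term, even when VMs come and go; extending this to groups would mean running one such election per group.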
Yuri Demchenko <y.demchenko=>uva.nl>
Reggie Cushing <r.s.cushing=>uva.nl>