title / summary | supervisor / contact | students | RP 1 / 2 |
2 |
Automated migration testing.
Unattended content management systems are a serious risk factor for internet security and for end users, as they
allow trustworthy information sources on the web to be easily infected with malware and turned malicious.
- How can we use well-known software testing methodologies (e.g. continuous integration) to automatically
test whether available updates that fix security weaknesses in software running on a website can be safely
implemented with as little end-user involvement as possible?
- How would such a migration work in a real-world scenario?
In this project you will look at the technical requirements for automated migration testing, and if possible design a
working prototype. |
Michiel Leenaars <michiel=>nlnet.nl>
|
|
|
3 |
Virtualization vs. Security Boundaries.
Traditionally, security defenses are built upon a classification of the sensitivity and criticality of data and
services. This leads to a logical layering into zones, with an emphasis on command and control at the point of
inter-zone traffic. The classical "defense in depth" approach applies a series of defensive measures to
network traffic as it traverses the various layers.
Virtualization erodes the natural edges, and this affects guarding system and network boundaries. In turn,
additional technology is developed to add instruments to virtual infrastructure. The question that arises is the
validity of this approach in terms of fitness for purpose, maintainability, scalability and practical viability.
|
Jeroen Scheerder
<Jeroen.Scheerder=>on2it.net>
|
|
|
4 |
Efficient delivery of tiled streaming content.
HTTP Adaptive Streaming (e.g. MPEG DASH, Apple HLS, Microsoft Smooth Streaming) is responsible for an
ever-increasing share of streaming video, replacing traditional streaming methods such as RTP and RTMP. The main
characteristic of HTTP Adaptive Streaming is that it is based on the concept of splitting content up in numerous
small chunks that are independently decodable. By sequentially requesting and receiving chunks, a client can
recreate the content. An advantage of this mechanism is that it allows a client to seamlessly switch between
different encodings (e.g. qualities) of the same content.
The technique known as Tiled Streaming builds on this concept by not only splitting up content temporally, but
also spatially, allowing specific areas of a video to be independently encoded and requested. This method
allows for navigation in ultra-high-resolution content without requiring the entire video to be
transmitted.
An open question is how these numerous spatial tiles can be distributed and delivered most efficiently over a
network, reducing both unnecessary overhead as well as latency. |
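The chunk-request mechanism described above can be illustrated with a small sketch: a client sequentially fetches independently decodable chunks and may switch encodings between any two chunks. The manifest layout, chunk names and switching rule below are invented for illustration and do not follow any particular HAS standard:

```python
# Minimal sketch of an HTTP Adaptive Streaming client loop (illustrative;
# manifest layout and switching thresholds are assumptions, not DASH/HLS spec).

MANIFEST = {
    # bitrate (kbit/s) -> list of chunk identifiers for that encoding
    500:  ["low/seg1", "low/seg2", "low/seg3"],
    2000: ["mid/seg1", "mid/seg2", "mid/seg3"],
    8000: ["high/seg1", "high/seg2", "high/seg3"],
}

def pick_bitrate(measured_kbps):
    """Choose the highest encoding that fits the measured bandwidth."""
    fitting = [b for b in MANIFEST if b <= measured_kbps]
    return max(fitting) if fitting else min(MANIFEST)

def play(bandwidth_samples):
    """Sequentially 'request' one chunk per bandwidth sample; quality can
    change per chunk because every chunk is independently decodable."""
    played = []
    for i, kbps in enumerate(bandwidth_samples):
        played.append(MANIFEST[pick_bitrate(kbps)][i])
    return played
```

Because each chunk decodes on its own, the switch between encodings in `play` needs no coordination with the server beyond requesting a different URL.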
Omar Niamut <omar.niamut=>tno.nl>
|
|
|
6 |
Ad-hoc identity information exchange using the Sovrin blockchain.
Summary: Sovrin (sovrin.org) is a blockchain for self-sovereign identities. TNO operates one of the nodes of the
Sovrin network. Sovrin enables easy exchange and verification of identity information (e.g. "age=18+") for
business transactions. Potential savings are estimated to be over €1 billion per year for the Netherlands alone.
However, Sovrin provides only an underlying infrastructure. Additional query-response protocols are needed. This
is being studied in e.g. the Techruption Self-Sovereign-Identity-Framework (SSIF) project. The research
question is which functionalities are needed in these protocols. The work includes the development of a
data model, as well as an implementation that connects to the Sovrin network.
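As a sketch of what such a query-response exchange and data model might look like, the following is a minimal illustration; all field names, DIDs and the claim format are invented assumptions, not the actual Sovrin or SSIF protocol:

```python
# Hypothetical data model for an identity query-response exchange on top of a
# self-sovereign identity ledger. Every field name here is an illustrative
# assumption, not part of the real Sovrin/SSIF protocols.
from dataclasses import dataclass

@dataclass
class AttributeQuery:
    requester: str      # DID of the verifier
    attribute: str      # e.g. "age"
    predicate: str      # e.g. ">=18" -- ask for a claim, not the raw value

@dataclass
class AttributeResponse:
    holder: str         # DID of the identity holder
    attribute: str
    claim: str          # e.g. "age>=18", attested on-ledger
    issuer: str         # DID of the attesting party

def satisfies(query: AttributeQuery, response: AttributeResponse) -> bool:
    """A response answers a query if it attests exactly the asked predicate."""
    return (response.attribute == query.attribute
            and response.claim == query.attribute + query.predicate)
```

Note the privacy property the summary hints at: the verifier asks for "age>=18" and never sees the actual birth date.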
(2018-05) |
Oskar van Deventer <oskar.vandeventer=>tno.nl>
|
R
P |
|
7 |
Qualitative analysis of Internet measurement methods and bias.
In the past year NLnet Labs and other organisations have run a number of measurements on DNSSEC deployment and
validation. We used the RIPE Atlas infrastructure for measurements, while others used Google ads in which
flash code runs the measurements. The results differ because the measurement points (or observation points)
differ: RIPE Atlas measurement points are mainly located in Europe, while Google ads flash measurements run
globally (with some over-representation of East Asia).
The question is whether we can quantify the bias in the Atlas measurements, or qualitatively compare the
measurements, so that we can correlate the results of both platforms. This would greatly help interpret our
results and those of others based on the Atlas infrastructure. The results are highly relevant, as many
operational discussions on DNS and DNSSEC deployment are supported or refuted by these kinds of measurements.
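One hedged starting point for quantifying such bias is to compare the regional distribution of measurement points across the two platforms, e.g. via total variation distance. The region shares below are invented illustration values, not actual Atlas or Google figures:

```python
# Quantifying vantage-point bias between two measurement platforms as the
# total variation distance between their probe distributions per region.
# The shares below are made-up illustration values.

def total_variation(p, q):
    """Half the L1 distance between two distributions over the same keys."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

atlas = {"EU": 0.70, "NA": 0.15, "AS": 0.10, "other": 0.05}   # assumed shares
ads   = {"EU": 0.25, "NA": 0.25, "AS": 0.40, "other": 0.10}   # assumed shares
```

A distance of 0 would mean identically distributed vantage points; values toward 1 indicate that results from the two platforms cover largely different populations and need reweighting before they can be correlated.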
|
Willem Toorop <willem=>nlnetlabs.nl>
|
R
P |
2
|
9 |
Building an open-source, flexible, large-scale static code
analyzer.
Background information
Data drives business, and maybe even the world. Businesses that make it their business to gather data are often
aggregators of client-side generated data. Client-side generated data, however, is inherently untrustworthy.
Malicious users can construct their data to exploit careless, or naive, programming and use this malicious,
untrusted data to steal information or even take over systems.
It is no surprise that large companies such as Google, Facebook and Yahoo spend considerable resources on
securing their own systems against would-be attackers. Many methods have been developed to let
untrusted data cross the trust boundary to trusted data and effectively render malicious data harmless. However,
securing your systems against malicious data often requires expertise beyond what even skilled programmers might
reasonably possess.
Problem description
Ideally, tools that analyze code for vulnerabilities would be used to detect common security issues. Such tools,
or static code analyzers, exist, but are either outdated (http://ripsscanner.sourceforge.net/) or part of very
expensive commercial packages (https://www.checkmarx.com/ and http://armorize.com/). Next to the need for an
open-source alternative to the previously mentioned tools, we also need to look at increasing our scope. Rather
than focusing on a single codebase, the tool would ideally be able to scan many remote, large-scale repositories
and report the findings back in an easily accessible way.
An interesting target for this research would be very popular, open-source (at this stage) Content Management
Systems (CMSs), and specifically plugins created for these CMSs. CMS cores are held to a very high coding
standard and are often relatively secure. Plugins, however, are necessarily less so, but are generally as
popular as the CMSs they’re created for. This is problematic, because an insecure plugin is as dangerous as an
insecure CMS. Experienced programmers and security experts generally audit the most popular plugins, but this
is: a) very time-intensive, b) prone to errors and c) of limited scope, i.e. not every plugin can be audited. For
example, if it were feasible to audit all aspects of a CMS repository (CMS core and plugins), the DigiNotar
debacle could easily have been avoided.
Research proposal
Your research would consist of extending our proof-of-concept static code analyzer written in Python and using
it to scan code repositories, possibly of some major CMSs and their plugins, for security issues, and finding
innovative ways of reporting on the massive number of possible issues you are sure to find. Help others keep our
data that little bit safer. |
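To give a flavour of the kind of check such an analyzer performs, here is a deliberately simplified sketch that flags PHP-style lines where client input reaches an output sink without a known sanitizer. It is line-based and regex-driven; a real analyzer (including the proof-of-concept mentioned above) would track data flow through an AST:

```python
# Toy taint check: flag lines where client-side input ($_GET/$_POST/...)
# flows into an output sink without passing a sanitization function.
# Regex- and line-based on purpose; only a sketch of the real technique.
import re

TAINT = re.compile(r"\$_(GET|POST|REQUEST)\b")
SINK = re.compile(r"\b(echo|print)\b")
SANITIZERS = ("htmlspecialchars", "htmlentities", "intval")

def scan(php_source):
    """Return (line number, line) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        if TAINT.search(line) and SINK.search(line) \
                and not any(s in line for s in SANITIZERS):
            findings.append((lineno, line.strip()))
    return findings
```

Scaling this idea to many repositories is then mostly an engineering problem: fetch, scan, and aggregate findings per plugin.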
Patrick Jagusiak
<patrick.jagusiak=>dongit.nl>
Wouter van Dongen <wouter.vandongen=>dongit.nl> |
|
|
10 |
Mobile app fraud detection framework.
How to prevent fraud in mobile banking applications? Applications for smartphones are commodity goods used for
retail (and other) banking purposes. Leveraging this type of technology for money transfer attracts criminal
organisations trying to commit fraud. One of many possible security controls is the detection of fraudulent
transactions or other types of activity. Detection can be implemented at many levels within the payment chain;
one of these is the application level itself. This assignment entails research into the information that would
be required to detect fraud from within mobile banking applications, and building a client-side fraud detection
framework within mobile banking applications. |
Steven Raspe
<steven.raspe=>nl.abnamro.com>
|
|
|
11 |
Malware analysis NFC enabled smartphones with payment capability.
The risk of mobile malware is rising rapidly. Combined with the development of new techniques, this creates a
lot of new attack scenarios. One of these techniques is the use of mobile phones for payments. In this research
project you will look at how resistant these systems are against malware on the mobile device. We would like to
look at the theoretical threats, but also perform hands-on testing.
NOTE: timing on this project might be a challenge since the testing environment is only available during the
pilot from August 1st to November 1st. |
Steven Raspe
<steven.raspe=>nl.abnamro.com>
|
|
|
12 |
Research MS Enhanced Mitigation Experience Toolkit (EMET).
Every month new security vulnerabilities are identified and reported. Many of these vulnerabilities rely on
memory corruption to compromise the system. For most vulnerabilities a patch is released after the fact to
remediate the vulnerability. Nowadays there are also new preventive security measures that can keep
vulnerabilities from becoming exploitable without a patch being available for the specific issue. One of these
technologies is Microsoft's Enhanced Mitigation Experience Toolkit (EMET), which adds additional protection to
Windows, preventing many vulnerabilities from becoming exploitable. We would like to research whether this
technology is effective in practice and can indeed prevent exploitation of a number of vulnerabilities without
applying the specific patch. We would also like to research whether there is other impact on a system running
EMET, for example a noticeable performance drop or common software that does not function properly once EMET is
installed. If time permits, it would also be interesting to see if existing exploits can be modified to work in an
environment protected by EMET. |
Henri Hambartsumyan
<HHambartsumyan=>deloitte.nl>
|
|
|
13
|
Triage software.
In previous research a remote acquisition and storage solution was designed and built that allowed sparse
acquisition of disks over a VPN using iSCSI. This system allows sparse reading of remote disks. The triage
software should decide which parts of the disk must be read. The initial goal is to use metadata to retrieve
the blocks that are assumed to be most relevant first. This is in contrast to techniques that perform triage by
running remotely while performing a full disk scan (e.g. running bulk_extractor remotely, keyword scanning or
doing a hash-based file scan remotely).
The student is asked to:
- Define criteria that can be used for deciding which (parts of) files to acquire
- Define a configuration document/language that can be used to order acquisition based on these criteria
- Implement a prototype for this acquisition
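The steps above can be sketched as follows; the specific criteria and weights are invented examples of the kind of configuration the project asks the student to define:

```python
# Sketch of metadata-driven triage: rank files for sparse acquisition by
# configurable criteria. Predicates and weights below are invented examples,
# not a proposed standard.

CRITERIA = [
    # (predicate over file metadata, score weight)
    (lambda f: f["path"].endswith((".doc", ".pdf", ".jpg")), 10),
    (lambda f: f["size"] < 10 * 1024 * 1024, 5),    # small files first
    (lambda f: "Users" in f["path"], 3),            # user data over system data
]

def triage_order(files):
    """Return files sorted by descending acquisition priority."""
    def score(f):
        return sum(w for pred, w in CRITERIA if pred(f))
    return sorted(files, key=score, reverse=True)
```

A real configuration language would externalize these predicates (e.g. as declarative rules) rather than hard-code Python lambdas, but the ranking principle is the same.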
|
"Ruud Schramp (DT)" <schramp=>holmes.nl>
"Zeno Geradts (DT)" <zeno=>holmes.nl>
"Erwin van Eijk (DT)" <eijk=>holmes.nl>
|
|
|
15 |
Network Functions Virtualization and Security.
The security threat landscape is ever changing, with cyber-attacks becoming increasingly sophisticated and
targeted. The fact that an increasing number of applications and data are moving into the cloud only further
complicates the situation. The traditional approach of securing an organization with a firewall is probably not
sufficient anymore.
The research assignment is to investigate this issue and suggest solutions for the research and education
sector in the Netherlands.
- How are the research and education institutions in the Netherlands securing themselves from cyber-attacks
today?
- What additional measures need to be taken and which functionality needs to be added to their
infrastructure (IPS, IDS)?
- Can Network Functions Virtualization play a role in providing (part of) the security functionality?
- Would it be useful if the required (virtualized) functionality is centrally arranged on a common NFV
platform instead of each research and education institution arranging this for themselves?
|
Marijke Kaat <marijke.kaat=>surfnet.nl>
|
R
P |
|
17 |
SURFdrive security.
SURFdrive is a personal cloud storage service for the Dutch higher education and research community,
offering staff, researchers and students an easy way to store, synchronise and share files in the secure and
reliable SURF community cloud.
SURFdrive is based on ownCloud, an open-source personal cloud storage product. Our challenge is to make the
software environment as safe and secure as possible. The question is:
- How can we make the environment resistant to future 0-day attacks?
Anomaly detection techniques might be helpful here. The research task is to examine which techniques are helpful
against 0-day attacks.
|
Rogier Spoor <Rogier.Spoor=>surfnet.nl>
|
|
|
18 |
Comparison of security features of major Enterprise Mobility Management
solutions.
For years, Gartner has identified the major EMM (formerly known as MDM) vendors. These vendors are typically
rated on performance and features; security is often not addressed in detail.
This research concerns an in-depth analysis of the security features of major EMM solutions (such as MobileIron,
Good, AirWatch, XenMobile, InTune, and so forth) on major mobile platforms (iOS, Android, Windows Phone). Points
of interest include: protection of data at rest (containerization and encryption), protection of data in transit
(i.e. VPN), local key management, and vendor-specific security features (added to platform APIs).
|
Paul van Iterson <vanIterson.Paul=>kpmg.nl>
|
|
|
21 |
Designing structured metadata for CVE reports.
Vulnerability reports such as MITRE's CVEs are currently free-format text, without much structure in them. This
makes it hard to machine-process reports and automatically extract useful information and combine it with other
information sources. With tens of thousands of such reports published each year, it is increasingly hard to keep
a holistic overview and see patterns. With our open-source Binary Analysis Tool we aim to correlate this data
with firmware databases.
Your task is to analyse how we can use the information from these reports and what metadata is relevant, and to
propose a useful metadata format for CVE reports. In your research you will make an inventory of tools that can
be used to convert existing CVE reports with minimal effort.
Armijn Hemel - Tjaldur Software Governance Solutions
|
Armijn Hemel <armijn=>tjaldur.nl> |
|
|
24 |
Verification of Object Location Data through Picture Data Mining Techniques.
Shadows in outdoor pictures give away information about the location of the objects in those pictures. From the
position, length, and direction of a shadow, location information found in the metadata of a picture can be
verified. The objective of this project is to develop algorithms that find freely available images on the
internet in which the location data has been tampered with. The deliverables of this project are the location
verification algorithms, a live web service that verifies the location information of an object, and a
non-public-facing database containing information about images whose location metadata has been removed or
falsely altered.
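One consistency check behind such verification can be sketched as follows: the sun elevation implied by a vertical object's shadow length must match the elevation expected for the claimed location and capture time. The expected elevation is treated as an assumed input here; in practice it would come from a solar ephemeris for the EXIF coordinates and timestamp:

```python
# Sketch of a shadow-based location consistency check. The expected sun
# elevation is an assumed input (from an ephemeris for the claimed EXIF
# coordinates and time); only the geometry is implemented here.
import math

def elevation_from_shadow(object_height_m, shadow_length_m):
    """Sun elevation angle (degrees) implied by a vertical object's shadow."""
    return math.degrees(math.atan2(object_height_m, shadow_length_m))

def location_consistent(object_height_m, shadow_length_m,
                        expected_elevation_deg, tolerance_deg=5.0):
    """True if the implied elevation matches the expected one within tolerance."""
    implied = elevation_from_shadow(object_height_m, shadow_length_m)
    return abs(implied - expected_elevation_deg) <= tolerance_deg
```

Shadow direction gives a second, independent check against the expected solar azimuth; combining both narrows down the set of plausible locations and times.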
|
Junaid Chaudhry <chaudhry=>ieee.org>
|
R
P |
2
|
25 |
Multicast delivery of HTTP Adaptive Streaming.
HTTP Adaptive Streaming (e.g. MPEG DASH, Apple HLS, Microsoft Smooth Streaming) is responsible for an
ever-increasing share of streaming video, replacing traditional streaming methods such as RTP and RTMP. The main
characteristic of HTTP Adaptive Streaming is that it is based on the concept of splitting content up in numerous
small chunks that are independently decodable. By sequentially requesting and receiving chunks, a client can
recreate the content. An advantage of this mechanism is that it allows a client to seamlessly switch between
different encodings (e.g. qualities) of the same content.
There is a growing interest among content parties, operators and CDNs in not only being able to deliver these
chunks over unicast via HTTP, but also allowing them to be distributed using multicast. The question is how
current multicast technologies could be used, or adapted, to achieve this goal. |
Ray van Brandenburg
<ray.vanbrandenburg=>tno.nl>
|
|
|
26 |
Generating test images for forensic file system parsers.
Traditionally, forensic file system parsers (such as The Sleuthkit and the ones contained in Encase/FTK etc.)
have been focused on extracting as much information as possible. The state of software in general is lamentable
— new security vulnerabilities are found every day — and forensic software is not necessarily an exception.
However, software bugs that affect the results used for convictions or acquittals in criminal court are
especially damning. As evidence is increasingly being processed in large automated bulk analysis systems without
intervention by forensic researchers, investigators unversed in the intricacies of forensic analysis of digital
materials are presented with multifaceted results that may be incomplete, incorrect, imprecise, or any
combination of these.
There are multiple stages in an automated forensic analysis. The file system parser is usually one of the
earlier analysis phases, and errors (in the form of faulty or missing results) produced here will influence the
results of the later stages of the investigation, and not always in a predictable or detectable manner. It is
relatively easy (modulo programmer quality) to create strict parsers that bomb-out on any unexpected input. But
real-world data is often not well-formed, and a parser may need to be able to resync with input data and resume
on a best-effort basis after having reached some unexpected input in the format. While file system images are
being (semi-) hand-generated to test parsers, when doing so, testers are severely limited by their imagination
in coming up with edge cases and corner cases. We need a file system chaos monkey.
The assignment consists of one of the following (it may also be spun off into a separate RP):
- Test image generator for NTFS. Think of it as some sort of fuzzer for forensic NTFS parsers. NTFS is a
complex filesystem which offers interesting possibilities to trip a parser or trick it into yielding
incorrect results. For this project, familiarity with C/C++ and the use of the Windows API is required (but
only as much as is necessary to create function wrappers). The goal is to automatically produce "valid" — in
the sense of "the bytes went by way of ntfs.sys" — but hopefully quite bizarre NTFS images.
- Another interesting research avenue lies in the production of /subtly illegal/ images. For instance, in
FAT, it should be possible, in the data format, to double-book clusters (akin to a hard link). It may also
be possible to create circular structures in some file systems. It will be interesting to see if and how
forensic filesystem parsers deal with such errors.
|
"Wicher Minnaard (DT)" <wicher=>holmes.nl>
Zeno Geradts <zeno=>holmes.nl>
|
|
|
28 |
Android Application Security.
Recent Android releases have significantly improved support for full disk encryption, with it being enabled by
default as of version 5.0. As we have seen on iOS, full disk encryption is not fully effective (powering on the
device decrypts the disk). With disk encryption potentially not fully effective, there may be a need for
encryption at the application level that developers can include in their apps. Research the possibility of
secure encryption per app, either via loadable libraries in the app, or perhaps an encryption layer between OS
and app. Make a proof-of-concept implementation if time allows. Note that dynamic code loading comes with its
own set of application security tradeoffs.
- Sufficient programming skills are needed.
|
Rick van Galen <vanGalen.Rick=>kpmg.nl>
|
R
P |
|
29 |
(In)security of java usage in large software frameworks and middleware.
Java is used in almost all large software application packages. Examples of such packages are middleware
(Tomcat, JBoss and WebSphere) and products like SAP and Oracle. The goal of this research is to investigate the
possible attacks that exist on Java (e.g. RMI) as used in such large software packages, and to develop a
framework to securely deploy (or attack) them. |
Martijn Sprengers <Sprengers.Martijn=>kpmg.nl>
|
|
|
31
|
Virtual reality interface for data analysis.
This project involves designing and developing a virtual reality (VR) interface for the analysis of large
volumes of DNS data. The virtual world should enable the user to explore the data on an intuitive basis. The VR
interface should also aid the recognition of irregularities and interrelationships.
More info:
|
Marco Davids <marco.davids=>sidn.nl>
Cristian Hesselman <cristian.hesselman=>sidn.nl>
|
R
P |
1
|
32
|
Usage Control in the Mobile Cloud.
Mobile clouds [1] aim to integrate mobile computing and sensing with the rich computational resources offered by
cloud back-ends. They are particularly useful in services such as transportation and healthcare, where they are
used to collect, process and present data from the physical world. In this thesis we will focus on the usage
control, in particular the privacy, of the collected data pertinent to mobile clouds. Usage control [2] differs
from traditional access control by enforcing security requirements not only on the release of data but also on
what happens afterwards. The thesis will involve the following steps:
- Propose an architecture over cloud for "usage control as a service" (extension of authorization as a
service) for the enforcement of usage control policies
- Implement the architecture (compatible with Openstack[3] and Android) and evaluate its performance.
References
[1] https://en.wikipedia.org/wiki/Mobile_cloud_computing
[2] Jaehong Park, Ravi S. Sandhu: The UCONABC usage control model. ACM Trans. Inf. Syst. Secur. 7(1): 128-174
(2004)
[3] https://en.wikipedia.org/wiki/OpenStack
[4] Slim Trabelsi, Jakub Sendor: "Sticky policies for data control in the cloud" PST 2012: 75-80
|
Fatih Turkmen <F.Turkmen=>uva.nl> |
|
|
33
|
Detection of DDoS Mitigation. The recent rise in DDoS incidents has given rise to a wide
range of mitigation approaches.
An attacker that seeks to maximize impact could be interested in predicting potential success: is a potential
target "protected" or not? Deciding this question probably involves measurements, and reasoning about
measurement results -- heuristics? -- among other things. How could this be done? To what extent can an attacker
expect to succeed in detecting the presence or absence of protective layers on the intermediate network path?
For more information in Dutch:
|
Jeroen Scheerder <js=>on2it.net>
|
R
P |
|
34
|
Automated asset identification in large organizations.
Many large organizations are struggling to remain in control of their IT infrastructure. What would help these
organizations is automated asset identification: given an internal IP range, scan the network and, based on
certain heuristics, identify the server's role (e.g. a web server, a database, an ERP system, an end-user
workstation, or a VoIP device).
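The heuristic step can be sketched as a simple mapping from observed open ports to a likely role. The port-to-role table below is an illustrative assumption; a real tool would combine ports with service banners, TLS certificates and traffic patterns:

```python
# Sketch of role identification from open ports. The hint table is an
# illustrative assumption, not a complete or authoritative mapping.

ROLE_HINTS = {
    "web server":  {80, 443, 8080},
    "database":    {1433, 3306, 5432},
    "VoIP device": {5060, 5061},
    "ERP system":  {3200, 3300},   # e.g. SAP-style dispatcher/gateway ports
}

def guess_role(open_ports):
    """Return the role whose hint ports overlap most with the observed set."""
    scored = {role: len(hints & set(open_ports))
              for role, hints in ROLE_HINTS.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0 else "unknown"
```

Feeding this with the output of a port scanner run over the internal IP range gives a first, rough asset inventory that the research could then refine.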
|
Rick van Galen
<vanGalen.Rick=>kpmg.nl>
|
R
P |
|
36 |
Forensic investigation of smartwatches.
Smartwatches are an unknown area in information risk. They are an additional display for certain sensitive data
(e.g. executive mail, calendars and other notifications), but are not necessarily covered by organizations'
existing mobile security products. In addition, it is often much easier to steal a watch than it is to steal a
phone. What data gets 'left behind' on smartwatches in case of theft, and what information risks do they
pose? |
Rick van Galen <vanGalen.Rick=>kpmg.nl>
|
|
|
38 |
WhatsApp end-to-end encryption: show us the money.
WhatsApp has recently switched to using the Signal protocol for its messaging, which should provide greatly
enhanced security and privacy over its earlier, non-end-to-end-encrypted proprietary protocol. Of course, since
WhatsApp is closed source, one has to trust WhatsApp to actually use the Signal protocol, since one cannot
review the source code. What other (automated) methods are there to verify that WhatsApp actually employs this
protocol? This research is about reverse engineering Android and/or iOS apps. |
Rick van Galen <vanGalen.Rick=>kpmg.nl> |
|
|
39 |
Video broadcasting manipulation detection.
The detection of manipulation of broadcast video streams with facial morphing on the internet. Examples are
provided at https://dl.acm.org/citation.cfm?id=2818122
and other online sources.
|
"Zeno Geradts (DBS)" <zeno=>holmes.nl>
|
|
|
40 |
ImageNet classification of images.
Classify images into categories such as:
- computer screens (close-up images with text such as passwords and recovery keys)
- handwritten notes
- screenshots (including those with a visible desktop)
- natural photographs
|
"Zeno Geradts (DBS)" <zeno=>holmes.nl> |
|
|
41 |
Various projects @ Deloitte.
Please follow the link below and look specifically for the one-month projects. Inform me (CdL) which one you
want to do and we will create a separate project number for it.
Topic: Adding some new tests to our existing QuickScan vulnerability scanner.
Area of expertise: Development / Hacking.
Abstract: We are in the process of updating our existing QuickScan vulnerability scanner. It currently scans
for issues such as improperly configured certificates, existence of admin interfaces, vulnerabilities such as
Heartbleed, etc. We would like to add some tests, such as a check for Shellshock, HttPoxy, support for Perfect
Forward Secrecy and Secure Renegotiation.
Duration: 1 month
Topic: Evaluating various executable packers (MS Windows) and understanding how A/V products behave
Area of expertise: Red Teaming Operations
Abstract: An executable packer is software that modifies executable code while maintaining the file's behavior.
Packers are commonly used to reduce the file size of large executables for added portability, or more commonly
to obfuscate them and make reverse engineering a complicated, costly and time-intensive process. There are
multiple legitimate and underground software packers. The purpose of this research is to identify the most
common of them and evaluate them against a number of common antivirus (A/V) products in order to understand the
differences between A/V products, signature-based detection and heuristic algorithms.
Duration: 1 month
Topic: Building an A/V assessment platform
Area of expertise: Red Teaming Operations
Abstract: Using common tools such as Puppet, Docker or other mass-deployment solutions create a Windows
and Linux blended solution that enables the automatic creation of a virtualized test lab for the evaluation of
a potential malware across multiple Antivirus (A/V) products concurrently and securely. This does not involve
analysis of the potential malware in a sandbox such as Cuckoo sandbox but the evaluation of an executable
across multiple free and commercial A/V products.
Duration: 1 month
Topic: How to remain undetected in an environment with Microsoft Advanced Threat Analytics (ATA)
Area of expertise: Red Teaming Operations
Abstract: In 2015 Microsoft launched an on-premises platform that protects Microsoft-driven environments from
advanced targeted attacks by automatically analyzing, learning and identifying normal and abnormal behavior of
users, devices and resources. This platform can detect a number of attacks commonly used during Red
Teaming engagements such as Pass-the-Hash and abnormal usage of the Kerberos Golden Ticket within a
domain. The purpose of this research is to figure out how to identify one or more of the following items: the
usage of ATA within a network, the location of the "beacons" that can be used to detect an attack, and
what specific Windows events, network signatures or other events (could) trigger an alert.
Duration: 1 month |
"van Essen, Tim (NL -
Amsterdam)" <TvanEssen=>deloitte.nl>
|
|
|
43 |
Incident Management.
Vancis owns over 4000 virtual machines deployed over 2500 hosts. Optimizing the Incident Management System has a
strong impact on overall performance. The student will help minimize network disruptions and their effects in
order to increase the overall availability of the network to customers. This involves, among other things,
analysis and automatic repair scripts, using an iterative approach to solve issues at deeper levels.
This research question falls under the current Operational Excellence program at Vancis BV. The project is of a
technical nature. For the project the student will have a coach available from the Network Team (Chapter lead or
the Product Owner).
|
Sander Ruiter <Sander.ruiter=>vancis.nl> |
|
|
44 |
Automatic Port Handling.
The Vancis Capacity Management System includes over 250 switches and around 2000 ports. In order to allocate the
available ports to customers in a more efficient way, the team needs weekly updates based on scripted data. This
information needs to be pushed automatically to the CMS using graphics that are easy to understand, so that the
delivery units can base their decisions on them and help customers. Building such an automated flow is the
objective of this research project, which also includes triggers to expand the total capacity.
This research question falls under the current Operational Excellence program at Vancis BV. The project is of a
technical nature. For the project the student will have a coach available from the Network Team (Chapter lead or
the Product Owner).
|
Sander Ruiter <Sander.ruiter=>vancis.nl> |
|
|
45 |
Network Monitoring System.
At Vancis we follow a preferred-OSS policy. The system comprises all network services, including compute and
storage components. This research question focuses on the replacement of the current Network Monitoring System
with an OSS alternative. The student should also have business knowledge and interest, as a business case is
part of the project, including a technical and functional analysis (using e.g. the MoSCoW method), testing of
possible OSS solutions, and making recommendations.
This research question falls under the current Operational Excellence program at Vancis BV. This project is more
in the field of technical business consulting. For the project the student will have a coach available from the
Network Team (Chapter lead or the Product Owner).
|
Sander Ruiter <Sander.ruiter=>vancis.nl> |
|
|
46 |
The Serval Project.
Here are a few projects from the Serval Project. Not everything is equally appropriate for the SNE master, but
it may give ideas for RPs.
1. Porting Serval Project to iOS
The Serval Project (http://servalproject.org, http://developer.servalproject.org/wiki) is looking to port to
iOS. There are a variety of activities to be explored in this space, including how to provide
interoperability with Android and explore user interface issues.
3. C65GS FPGA-Based Retro-Computer
The C65GS (http://c65gs.blogspot.nl, http://github.com/gardners/c65gs) is a reimplementation of the Commodore 65
computer in FPGA, plus various enhancements. The objective is to create a fun 8-bit computer for the 21st
century, complete with 1920x1200 display, ethernet, accelerometer and other features -- and then adapt it to
make a secure 8-bit smart-phone. There are various aspects of this project that can be worked on.
4. FPGA Based Mobile Phone
One of the long-term objectives of the Serval Project (http://servalproject.org,
http://developer.servalproject.org/wiki) is to create a fully-open mobile phone. We believe that the most
effective path to this is to use a modern FPGA, like a Zynq, that contains an ARM processor and sufficient FPGA
resources to directly drive cellular communications, without using a proprietary baseband radio. In this
way it should be possible to make a mobile phone that has no binary blobs, and is built using only free and
open-source software. There are considerable challenges to this project, not the least of which is
implementing 2G/3G handset communications in an FPGA. However, if successful, it raises the possibility of
making a mobile phone that has long-range UHF mobile mesh communications as a first-class feature, which would
be an extremely disruptive innovation. |
Paul Gardner-Stephen
<paul.gardner-stephen=>flinders.edu.au> |
|
|
47 |
Cross-blockchain oracle.
Interconnection between different blockchain instances, and smart contracts residing on those, will be essential
for a thriving multi-blockchain business ecosystem. Technologies like hashed timelock contracts (HTLC) enable
atomic swaps of cryptocurrencies and tokens between blockchains. A next challenge is the cross-blockchain
oracle, where the status of an oracle value on one blockchain enables or prevents a transaction on another
blockchain.
The goal of this research project is to explore the possibilities, impossibilities, trust assumptions, security
and options for a cross-blockchain oracle, as well as to provide a minimal viable implementation.
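To make the underlying HTLC principle concrete, here is a minimal Python sketch of the hashlock/timelock logic. This is an illustration only: real HTLCs are implemented in chain-specific contract or script code, and the function names here are invented for the example.

```python
import hashlib
import time

def make_hashlock(preimage: bytes) -> bytes:
    """The contract stores only the hash; the preimage stays secret."""
    return hashlib.sha256(preimage).digest()

def can_claim(hashlock: bytes, preimage: bytes, deadline: float, now: float) -> bool:
    """A claim succeeds only with the correct preimage, before the deadline."""
    return now < deadline and hashlib.sha256(preimage).digest() == hashlock

def can_refund(deadline: float, now: float) -> bool:
    """After the deadline, the original sender can reclaim the locked funds."""
    return now >= deadline

# Alice locks funds on chain A; Bob locks funds on chain B under the same hashlock,
# so revealing the preimage on one chain enables the claim on the other.
secret = b"alice's secret"
lock = make_hashlock(secret)
deadline = time.time() + 3600  # one-hour timelock

assert can_claim(lock, secret, deadline, time.time())        # correct preimage, in time
assert not can_claim(lock, b"wrong", deadline, time.time())  # wrong preimage fails
assert can_refund(deadline, deadline + 1)                    # refund after expiry
```

A cross-blockchain oracle would add a further condition of this kind, where the unlocking fact is an oracle value attested on another chain rather than a revealed preimage.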
(2018-05)
|
Oskar van Deventer <oskar.vandeventer=>tno.nl>
Maarten Everts <maarten.everts=>tno.nl> |
|
|
48 |
Apple File System (APFS).
Apple recently introduced APFS with the latest version of macOS. The new file system comes with some
interesting new features that pose challenges or offer opportunities for digital forensics. The goal of this
project is to pick one or more relevant features (e.g. encryption, nanosecond timestamps, flexible space
allocation, snapshots/cloning) and reverse engineer their inner workings to come up with a proof-of-concept
parsing tool that provides useful input for forensic investigations of Apple systems. |
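As a taste of the kind of low-level parsing such a tool involves: APFS is commonly described as storing timestamps as a signed 64-bit count of nanoseconds since the Unix epoch. A minimal sketch, assuming little-endian byte order (to be verified against the actual on-disk format during the project):

```python
from datetime import datetime, timezone
import struct

def parse_apfs_timestamp(raw: bytes) -> datetime:
    """Interpret 8 little-endian bytes as signed nanoseconds since the Unix epoch.

    Python's datetime only resolves microseconds, so the last three digits
    of the nanosecond count are truncated here.
    """
    (nanos,) = struct.unpack("<q", raw)
    secs, frac_ns = divmod(nanos, 1_000_000_000)
    return datetime.fromtimestamp(secs, tz=timezone.utc).replace(
        microsecond=frac_ns // 1000)

# 1,500,000,000 s + 123,456,789 ns past the epoch.
raw = struct.pack("<q", 1_500_000_000 * 10**9 + 123_456_789)
ts = parse_apfs_timestamp(raw)
print(ts.isoformat())  # 2017-07-14T02:40:00.123456+00:00
```

The nanosecond resolution itself is forensically interesting, since it can order events that coarser timestamps cannot distinguish.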
Yonne de Bruijn <yonne.debruijn=>fox-it.com>
|
|
|
49 |
vmdk snapshot support for DfVFS.
DfVFS is a back-end library that provides read-only access to file system objects from various storage media
types. Virtual machines are becoming increasingly common in company infrastructures. This project aims at
analyzing the vmdk snapshot structure and implementing support for it in DfVFS. |
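As a starting point, the text descriptor embedded in a VMDK links a snapshot delta disk to its parent via the `parentCID` and `parentFileNameHint` fields, which is how a snapshot chain can be reconstructed. A minimal sketch of parsing those key=value fields (the sample descriptor below is illustrative, not taken from a real image):

```python
def parse_vmdk_descriptor(text: str) -> dict:
    """Extract key=value fields from the text descriptor of a VMDK file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip().strip('"')
    return fields

descriptor = """\
# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=a1b2c3d4
createType="vmfsSparse"
parentFileNameHint="base-disk.vmdk"
"""
info = parse_vmdk_descriptor(descriptor)
print(info["parentCID"], info["parentFileNameHint"])  # a1b2c3d4 base-disk.vmdk
```

The DfVFS implementation itself would additionally have to handle the binary sparse-extent format and grain directories, which is where the real research effort lies.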
Yonne de Bruijn <yonne.debruijn=>fox-it.com> |
|
|
52 |
IoT DOS prevention and corporate
responsibility.
The Dyn DDoS attack exposed a fundamental problem with internet-connected devices: huge swathes of unpatched and
improperly configured devices with access to high bandwidth are misused to bring down internet services. What
technical prevention and detection methods can organizations employ to make sure they are not contributing to
this problem? And what can they do once it appears they are inadvertently contributing? This project would focus
on literature research combining DoS prevention, asset management, patch management and network monitoring. |
Rick van Galen <vanGalen.Rick=>kpmg.nl>
|
|
|
54 |
Penetration test dashboarding.
A penetration test is a difficult thing for both the penetration tester and the organization being tested. How
does the tested party really know what is going on in their penetration test, and how do they stay in control?
How can the tester stay up to date on what his/her team members are actually doing?
Penetration testing is a creative process, but it can be dashboarded to some degree. The data is there - mostly
in log files - but it takes an extraordinary amount of effort to make this log data understandable in human
terms. Making it understandable could, however, be automated with penetration-test tooling. So what information
should such a dashboard display, and what is the best way of actually showing this data? This research would
combine literature research and interviews with the development of a small proof of concept.
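As a toy illustration of the aggregation step such a dashboard needs, the sketch below rolls raw tool logs up into per-tool event counts. The log format is invented for the example; real tooling (nmap, hydra, etc.) each has its own output format that the proof of concept would have to normalize.

```python
import collections
import re

# Assumed line shape: "<timestamp> <tool> <free-text event>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<tool>\S+) (?P<event>.+)$")

def summarise(log_lines):
    """Count events per tool -- the kind of roll-up a dashboard would plot."""
    counts = collections.Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m:
            counts[m.group("tool")] += 1
    return counts

logs = [
    "2024-01-01T10:00:00 nmap scan started against 10.0.0.0/24",
    "2024-01-01T10:05:12 nmap host 10.0.0.5 up, 3 ports open",
    "2024-01-01T10:07:30 hydra brute force against 10.0.0.5:22",
]
print(summarise(logs))  # Counter({'nmap': 2, 'hydra': 1})
```

The open research question is precisely which roll-ups (per tool, per target, per phase) are meaningful to the tested organization rather than just to the testers.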
|
Rick van Galen <vanGalen.Rick=>kpmg.nl> |
|
|
55 |
Forensic investigation of wearables.
Wearables and especially smartwatches are an unfamiliar area in information risk because of their novelty. At
the moment, primary concerns are aimed at privacy issues, but not at information risk issues. However, these
devices are an additional display for certain sensitive data (e.g. executive mail, calendars and other
notifications), but are not necessarily covered by organizations' existing mobile security processes and
technology. In addition, it is often simply much easier to steal a watch or another wearable than it is to steal
a phone.
This research focuses on the following question: what value could a wearable have to cyber criminals when it is
stolen? What is the data that gets 'left behind' on smartwatches in case of theft, and what information risks do
they pose?
|
Rick van Galen <vanGalen.Rick=>kpmg.nl> |
|
|
56 |
Does a healthy lifestyle harm a cyber healthy
lifestyle?
With the recent health trend of fitness apps and hardware such as Fitbits, combined with the perceived need to
share results with friends, family and the world through Facebook, Runkeeper, Strava and other sites, we have
entered an era of potential cyber unhealthiness. What potentially valuable information could be retrieved from
the web or over Bluetooth about people, and used to influence individuals' health insurance rates? Note: this is
a broad question, and it is up to the student to choose a focus (e.g. Bluetooth security of Fitbit/Mi devices;
identification of individuals through Strava/Runkeeper posts; quantifying the public sharing of health
information; etc.).
|
Ruud Verbij <Verbij.Ruud=>kpmg.nl>
|
R
P |
|
58 |
Inventory of smartcard-based healthcare identification
solutions in Europe and beyond: technology and adoption.
For potential international adoption of Whitebox technology in the future, in particular the technique of
patients carrying authorization codes with them to authorize healthcare professionals, we want to make an
inventory of the current status of healthcare PKIs and smartcard technology in Europe and if possible also
outside Europe.
Many countries have developed health information exchange systems over the last one to two decades, most of them
without much regard for what other countries are doing, or for international interoperability. However, common
to most systems developed today is a (per-country) PKI for credentials, typically smartcards, that are provided
to healthcare professionals so that the health information exchange system can identify these professionals and
establish their 'role' (or rather, the speciality of a doctor, such as GP, pharmacist, gynaecologist, etc.). We
know a few of these smartcard systems, e.g. in Austria and France, but not all of them, and we do not know their
degree of adoption.
In this project, we would like students to enquire about and report on the state of the art of healthcare
smartcard systems in Europe and possibly outside Europe (e.g., Asia, Russia):
- what products are rolled out by what companies, backed by what CAs (e.g., governmental, as is the case
with the Dutch "UZI" healthcare smartcard)?
- Is it easy to obtain the relevant CA keys?
- And what is the adoption rate of these smartcards among GPs, emergency care wards, and hospitals in
different countries?
- What are relevant new developments (e.g., contactless solutions) proposed by major stakeholders or
industry players in the market?
Note that this project is probably less technical than usual for an SNE student, although it is technically
interesting. By way of comparison, this project may also fit an MBA student.
For more information, see also (in Dutch): https://whiteboxsystems.nl/sne-projecten/#project-2-onderzoek-adoptie-health-smartcards-in-europa-en-daarbuiten
General introduction
Whitebox Systems is a UvA spin-off company working on a decentralized system for health information exchange.
Security and privacy protection are key concerns for the products and standards provided by the company. The
main product is the Whitebox, a system owned by doctors (GPs) that is used by the GP to authorize other
healthcare professionals so that they - and only they - can retrieve information about a patient when needed.
Any data transfer is protected end-to-end; central components and central trust are avoided as much as possible.
The system will use a published source model, meaning that although we do not give away copyright, the code can
be inspected and validated externally.
The Whitebox is currently transitioning from an authorization model that started with doctor-initiated static
connections/authorizations, to a model that includes patient-initiated authorizations. Essentially, patients can
use an authorization code (a kind of token) that is generated by the Whitebox, to authorize a healthcare
professional at any point of care (e.g., a pharmacist or a hospital). Such a code may become part of a referral
letter or a prescription. This transition gives rise to a number of interesting questions, and thus to possible
research projects related to the Whitebox design, implementation and use. Two of these projects are described
below. If you are interested in these projects or have questions about other possibilities, please contact
<guido=>whiteboxsystems.nl>.
For a more in-depth description of the projects below (in Dutch), please see https://whiteboxsystems.nl/sne-projecten/
|
Guido van 't Noordende <g.j.vantnoordende=>uva.nl>
|
|
|
59 |
Decentralized trust and key management.
Currently, the Whitebox provides a means for doctors (general practitioners, GPs) to establish static trusted
connections with parties they know personally. These connections (essentially authenticated TLS connections
with known, validated keys), once established, can subsequently be used by the GP to authorize the party in
question to access particular patient information. Examples are static connections to the GP post that takes
care of evening/night and weekend shifts, or to a specific pharmacist. In this model, trust management is
intuitive and direct. However, with dynamic authorizations established by patients (see the general description
above), the question comes up whether the underlying (trust) connections between the GP practice (i.e., the
Whitebox) and the authorized organization (e.g., a hospital or pharmacist) may be re-usable as a 'trusted'
connection by the GP in the future.
The basic question is:
- What degree of trust can a doctor place in (trust) relations that are established by this doctor's
patients when they authorize another healthcare professional?
More generally:
- What degree of trust can be placed in relations/connections established by a patient, also in view of
possible theft of authorization tokens held by patients?
- What validation methods could a GP use to increase or validate a given trust relation implied by an
authorization action of a patient?
Perhaps the problem can also be raised to a higher level: can (public) auditing mechanisms -- for example, using
blockchains -- be used to help establish and validate trust in organizations (technically: keys of such
organizations), in systems that implement decentralized trust-based transactions, as the Whitebox system does?
In this project, the student(s) may either implement part of a solution or design, or model the behavior of a
system inspired by the decentralized authorization model of the Whitebox.
As an example: reputation based trust management based on decentralized authorization actions by patients of
multiple doctors may be an effective way to establish trust in organization keys, over time. Modeling trust
networks may be an interesting contribution to understanding the problem at hand, and could thus be an
interesting student project in this context.
NB: this project is a rather advanced/involved design and/or modelling project. Students should be confident in
their ability to understand and design/model a complex system in the relatively short timeframe provided by an
RP2 project -- this project is not for the faint of heart. Once completed, an excellent implementation or
evaluation may become the basis for a research paper.
See also (in Dutch): https://whiteboxsystems.nl/sne-projecten/#project-2-ontwerp-van-een-decentraal-vertrouwensmodel
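To make the reputation idea concrete, here is a minimal sketch of aggregating patient authorization events into a tentative trust verdict per organization key. The event format and the threshold are assumptions for the example, not part of the Whitebox design; a real model would also weigh the reputation of the vouching doctors' practices and handle revocation.

```python
import collections

def reputation(events, threshold=3):
    """Count distinct patients authorizing each organization key; a key
    vouched for by enough independent patients is tentatively trusted."""
    vouchers = collections.defaultdict(set)
    for patient_id, org_key in events:
        vouchers[org_key].add(patient_id)  # a set ignores repeat vouches
    return {key: len(patients) >= threshold for key, patients in vouchers.items()}

events = [
    ("patient-1", "pharmacy-key-A"),
    ("patient-2", "pharmacy-key-A"),
    ("patient-3", "pharmacy-key-A"),
    ("patient-1", "hospital-key-B"),
]
print(reputation(events))  # {'pharmacy-key-A': True, 'hospital-key-B': False}
```

Even this toy version shows the core modelling question: how many independent authorizations, from which patients, should it take before a GP treats an organization key as trustworthy?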
|
Guido van 't Noordende <g.j.vantnoordende=>uva.nl>
|
|
|
60 |
Behavioral analysis through the hypervisor.
Dynamic analysis is often used to gain more insight into the functionality and behavior of malicious (or
not-so-malicious) samples. Most sandboxes and dynamic analysis solutions use various hooking techniques or the
OS debugging APIs to, for example, monitor API calls or interact with the execution flow. Various issues arise:
- the confidentiality and integrity of the analysis tooling and its output can’t be guaranteed
- analyzing kernel-mode code from tooling running on top of that kernel doesn’t work too well
The goal is therefore to research and develop a monitoring and instrumentation API using hypervisor-level
debugging functionality.
|
Mitchel Sahertian <sahertian=>fox-it.com> |
|
|
61 |
Deanonymisation in Ethereum Using Existing Methods for Bitcoin.
Intelligence collected from a large number of sources helps to provide context and insight in various scenarios,
for example:
- Contextual querying in (Forensic) Investigations
- The activity of malicious actors is tracked and subsequently turned into indicators of compromise that
can be used to detect and counter malicious activity.
The decentralized and pseudonymous Bitcoin currency is exploited by actors with malicious intentions. The goal
is to research the metadata that is available on a node within the Bitcoin network, and to develop code that
structures and provides a real-time feed of this metadata. |
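A sketch of what one record of such a structured feed could look like, emitted as a JSON line. The field names are illustrative only; which metadata a node can actually observe (transaction relay times, announcing peers, etc.) is precisely the research question.

```python
import json
import time

def metadata_record(txid: str, peer_ip: str, peer_port: int) -> str:
    """Structure one observation -- which peer relayed which transaction,
    and when it was first seen -- as a single JSON line for a feed."""
    return json.dumps({
        "txid": txid,
        "peer": f"{peer_ip}:{peer_port}",
        "first_seen": time.time(),
    })

# A hypothetical transaction id (64 hex chars) observed from one peer.
line = metadata_record("ab" * 32, "203.0.113.7", 8333)
record = json.loads(line)
print(record["peer"])  # 203.0.113.7:8333
```

First-seen timestamps per peer are exactly the kind of metadata used in the deanonymisation literature to guess a transaction's origin node.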
Arno Bakker <Arno.Bakker=>os3.nl>
Tim Dijkhuizen <tim.dijkhuizen=>os3.nl>
Robin Klusman <robin.klusman=>os3.nl> |
R
P |
1
|
62 |
Breaking CAPTCHAs on the Dark Web.
The dark web contains, among other things, a wealth of information about illegal activity, which might be
interesting from an intelligence perspective. The intelligence can be used to monitor activity related to
specific high-profile organizations, or to specific threat actors. Since there are many different types of
websites, sometimes with unique subscriber requirements, it is hard to scrape them. In some cases an existing
member has to vouch for a new member, users have to post a message on the website at least once a month
(otherwise they will be banned), or you have to pay in bitcoins to get access, etc.
The goal of this research project is to come up with a theoretical framework for scraping (potentially)
interesting dark web websites, taking into account the different kinds of subscription models and subscriber
requirements. For this research project it is not required to develop a PoC. |
Yonne de Bruijn <yonne.debruijn=>fox-it.com>
Dirk Gaastra <Dirk.Gaastra=>os3.nl>
Kevin Csuka <kevin.csuka=>os3.nl> |
R
P |
1
|
63 |
LDBC Graphalytics.
LDBC Graphalytics is a mature, industrial-grade benchmark for graph-processing platforms. It consists of six
deterministic algorithms, standard datasets, synthetic dataset generators, and reference output, that enable the
objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple
kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness, such as failures
and performance variability. The benchmark comes with open-source software for generating data and monitoring
performance.
Until recently, graph processing used only common big data infrastructure, that is, with much local and remote
memory per core and storage on disk. However, operating separate HPC and big data infrastructures is
increasingly unsustainable: the energy and (human) resource costs far exceed what most organizations can
afford. Instead, we see a convergence between big data and HPC infrastructure.
For example, next-generation HPC infrastructure includes more cores and hardware threads than ever before. This
leads to a large search space for application developers to explore when adapting their workloads to the
platform.
To take a step towards a better understanding of performance for graph processing platforms on next-generation
HPC infrastructure, we would like to work together with 3-5 students on the following topics:
- How to configure graph processing platforms to efficiently run on many/multi-core devices, such as the
Intel Knights Landing, which exhibits configurable and dynamic behavior?
- How to evaluate the performance of modern many-core platforms, such as the NVIDIA Tesla?
- How to setup a fair, reproducible experiment to compare and benchmark graph-processing platforms?
|
Alex Uta <a.uta=>vu.nl>
Marc X. Makkes <m.x.makkes=>vu.nl> |
|
|
65 |
Normal traffic flow information distribution to detect
malicious traffic.
In an era of increasingly encrypted communication it is getting harder to distinguish normal from malicious
traffic. Deep packet inspection is no longer an option, unless the trusted certificate store of the monitored
clients is altered. However, NetFlow data may still provide relevant information about the parties involved in
the communication and the traffic volumes they exchange. So would it be possible to identify ill-intentioned
traffic by looking only at the flows, with a little help from content providers such as website owners and
mobile-application vendors?
The basic idea is to research a framework or a data interchange format between the content providers described
above and the monitoring devices. For both a website and a mobile application, such a description can list the
authorised online resources that should be used and the expected relative distribution of traffic between them.
If such a framework proves successful, it can help alert on covert-channel malware communication, cross-site
scripting and other types of network communication not intended by the original content provider.
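The interchange format and the monitoring check could be sketched as follows. The manifest structure (resource list with declared traffic shares plus a tolerance) is an assumption made for this example, not an existing standard.

```python
def check_flows(manifest, observed):
    """Flag flows to hosts missing from the provider's manifest, and hosts
    whose observed share of traffic deviates strongly from the declared share."""
    alerts = []
    total = sum(observed.values()) or 1  # avoid division by zero
    for host, volume in observed.items():
        declared = manifest["resources"].get(host)
        if declared is None:
            alerts.append(f"unknown destination: {host}")
        elif abs(volume / total - declared) > manifest["tolerance"]:
            alerts.append(f"unexpected traffic share for {host}")
    return alerts

# A provider declares its authorised resources and their expected traffic shares.
manifest = {
    "resources": {"cdn.example.com": 0.7, "api.example.com": 0.3},
    "tolerance": 0.15,
}
# Flow volumes observed by the monitoring device (e.g. bytes from NetFlow).
observed = {"cdn.example.com": 900, "api.example.com": 50, "evil.example.net": 50}
for alert in check_flows(manifest, observed):
    print(alert)
```

Here all three destinations trigger an alert: one host is not in the manifest at all, and the two authorised hosts deviate from their declared 70/30 split, which is the covert-channel signal the project is after.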
|
TBD
|
|
|
67 |
Smart performance information discovery for Cloud
resources.
The selection of virtual machines (VMs) must account for the performance requirements of applications (or
application components) to be hosted on them. The performance of components on specific types of VM can be
predicted based on static information (e.g. CPU, memory and storage) provided by cloud providers, however the
provisioning overhead for different VM instances and the network performance in one data centre or across
different data centres is also important. Moreover, application-specific performance cannot always be easily
derived from this static information.
An information catalogue is envisaged that aims to provide a service that can deliver the most up to date cloud
resource information to cloud customers to help them use the Cloud better. The goal of this project will be to
extend earlier work [1], but will focus on smart performance information discovery. The student will:
- Investigate the state of the art for cloud performance information retrieval and cataloguing.
- Propose Cloud performance metadata, and prototype a performance information catalogue.
- Customize and integrate an (existing) automated performance collection agent with the catalogue.
- Enable smart query of performance information from the catalogue using certain metadata.
- (Optional) Test the results with the use cases in on-going EU projects like SWITCH.
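A minimal sketch of what such a performance information catalogue could look like: records carry free-form metadata, and queries filter on it. The record schema and VM type names are illustrative assumptions, not the catalogue design from the earlier work.

```python
class PerformanceCatalogue:
    """Store performance records per VM type and answer metadata queries."""

    def __init__(self):
        self.records = []

    def add(self, vm_type, metric, value, **metadata):
        """Record one measurement, tagged with arbitrary metadata fields."""
        self.records.append({"vm_type": vm_type, "metric": metric,
                             "value": value, **metadata})

    def query(self, metric, **filters):
        """Return records for a metric matching all given metadata fields."""
        return [r for r in self.records
                if r["metric"] == metric
                and all(r.get(k) == v for k, v in filters.items())]

cat = PerformanceCatalogue()
cat.add("m5.large", "provisioning_seconds", 41.0, region="eu-west")
cat.add("m5.large", "provisioning_seconds", 95.0, region="us-east")
hits = cat.query("provisioning_seconds", region="eu-west")
print([r["value"] for r in hits])  # [41.0]
```

The "smart" part of the project then lies in what this sketch omits: keeping records up to date via collection agents, and answering queries about metrics (e.g. application-level performance) that were never measured directly.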
Some reading material:
- Elzinga, O., Koulouzis, S., Hu, Y., Wang, J., Zhou, H., Martin, P., Taal, A., de Laat, C., and Zhao, Z
(2017), Automatic collector for dynamic cloud performance information, IEEE Networking, Architecture and
Storage (NAS), Shenzhen, China, August 7-8, 2017. https://doi.org/10.1109/NAS.2017.8026845
More info: Arie Taal, Paul Martin, Zhiming Zhao |
Zhiming Zhao <z.zhao=>uva.nl> |
|
|
68 |
Network aware performance optimization for Big Data
applications using coflows.
Optimizing data transmission is crucial to improve the performance of data intensive applications. In many
cases, network traffic control plays a key role in optimising data transmission especially when data volumes are
very large. In many cases, data-intensive jobs can be divided into multiple successive computation stages, e.g.,
in MapReduce-type jobs. A computation stage relies on the outputs of the previous stage and cannot start
until all its required inputs are in place. Inter-stage data transfer involves a group of parallel flows, which
share the same performance goal, such as minimising the collective flow completion time.
CoFlow is an application-aware network control model for cluster-based data centric computing. The CoFlow
framework is able to schedule the network usage based on the abstract application data flows (called coflows).
However, customizing CoFlow for different application patterns, e.g., choosing proper network scheduling
strategies, is often difficult, in particular when the high level job scheduling tools have their own optimizing
strategies.
The project aims to profile the behavior of CoFlow with different computing platforms, e.g., Hadoop and Spark
etc.
- Review the existing CoFlow scheduling strategies and related work
- Prototyping test applications using big data platforms (including Apache Hadoop, Spark, Hive, Tez).
- Set up coflow test bed (Aalo, Varys etc.) using existing CoFlow installations.
- Benchmark the behavior of CoFlow in different application patterns, and characterise the behavior.
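The central coflow metric can be illustrated with a toy model: a coflow is only finished when its slowest member flow is, so the completion time is governed by the largest flow. The fair-sharing assumption below (each of n concurrent flows on a link gets capacity/n) is a deliberate simplification of what schedulers like Varys actually optimise.

```python
def coflow_completion_time(flows, link_capacity):
    """Completion time of a coflow under naive fair sharing of one link.

    flows: list of flow sizes (e.g. MB); link_capacity: e.g. MB/s.
    The coflow finishes when its slowest flow finishes.
    """
    rate = link_capacity / len(flows)  # fair share per concurrent flow
    return max(size / rate for size in flows)

# Three parallel shuffle flows over a 100 MB/s link shared fairly:
# the 500 MB flow dominates, so the coflow takes about 15 s.
cct = coflow_completion_time([200, 500, 300], 100)
print(cct)
```

Coflow-aware schedulers improve on this baseline by, for instance, giving the 500 MB flow more than its fair share, since finishing the small flows early does not help the coflow as a whole.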
Background reading:
- CoFlow introduction: http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-211.pdf
- Junchao Wang, Huan Zhouy, Yang Huz, Cees de Laatx and Zhiming Zhao, Deadline-Aware Coflow Scheduling in a
DAG, in NetCloud 2017, Hongkong, to appear [upon request]
More info: Junchao Wang, Spiros Koulouzis, Zhiming Zhao |
Zhiming Zhao <z.zhao=>uva.nl> |
|
|
69 |
Elastic data services for time critical distributed
workflows.
Large-scale observations over extended periods of time are necessary for constructing and validating models of
the environment. Therefore, it is necessary to provide advanced computational networked infrastructure for
transporting large datasets and performing data-intensive processing. Data infrastructures manage the lifecycle
of observation data and provide services for users and workflows to discover, subscribe and obtain data for
different application purposes. In many cases, applications have high performance requirements, e.g., disaster
early warning systems.
This project focuses on data aggregation and processing use-cases from European research infrastructures, and
investigates how to optimise infrastructures to meet critical time requirements of data services, in particular
for different patterns of data-intensive workflow. The student will use some initial software components [1]
developed in the ENVRIPLUS [2] and SWITCH [3] projects, and will:
- Model the time constraints for the data services and the characteristics of data access patterns found in
given use cases.
- Review the state of the art technologies for optimising virtual infrastructures.
- Propose and prototype an elastic data service solution based on a number of selected workflow patterns.
- Evaluate the results using a use case provided by an environmental research infrastructure.
References:
- https://staff.fnwi.uva.nl/z.zhao/software/drip/
- http://www.envriplus.eu
- http://www.switchproject.eu
More info: Spiros Koulouzis, Paul Martin, Zhiming Zhao |
Zhiming Zhao <z.zhao=>uva.nl> |
|
|
70 |
Contextual information capture and analysis in data
provenance.
Tracking the history of events and the evolution of data plays a crucial role in data-centric applications for
ensuring reproducibility of results, diagnosing faults, and performing optimisation of data-flow. Data
provenance systems [1] are a typical solution, capturing and recording the events generated in the course of a
process workflow using contextual metadata, and providing querying and visualisation tools for use in analysing
such events later.
Conceptual models such as W3C PROV (and extensions such as ProvONE), OPM and CERIF have been proposed to
describe data provenance, and a number of different solutions have been developed. Choosing a suitable
provenance solution for a given workflow system or data infrastructure requires consideration of not only the
high-level workflow or data pipeline, but also performance issues such as the overhead of event capture and the
volume of provenance data generated.
The project will be conducted in the context of EU H2020 ENVRIPLUS project [1, 2]. The goal of this project is
to provide practical guidelines for choosing provenance solutions. This entails:
- Reviewing the state of the art for provenance systems.
- Prototyping sample workflows that demonstrate selected provenance models.
- Benchmarking the results of sample workflows, and defining guidelines for choosing between different
provenance solutions (considering metadata, logging, analytics, etc.).
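To make the event-capture overhead tangible, here is a toy capture layer using two W3C PROV-style relations (`used` and `wasGeneratedBy`). It is not a real PROV serializer; the workflow step and file names are invented for the example.

```python
class ProvenanceLog:
    """Minimal W3C PROV-style capture: activities use entities,
    and entities are generated by activities."""

    def __init__(self):
        self.relations = []

    def was_generated_by(self, entity, activity):
        self.relations.append(("wasGeneratedBy", entity, activity))

    def used(self, activity, entity):
        self.relations.append(("used", activity, entity))

    def lineage(self, entity):
        """Trace back which activities (and their inputs) led to an entity."""
        trail, frontier, seen = [], [entity], set()
        while frontier:
            node = frontier.pop()
            if node in seen:
                continue  # guard against cyclic records
            seen.add(node)
            for rel, subj, obj in self.relations:
                if rel == "wasGeneratedBy" and subj == node:
                    trail.append((node, "generated by", obj))
                    frontier += [e for r, a, e in self.relations
                                 if r == "used" and a == obj]
        return trail

log = ProvenanceLog()
log.used("clean-step", "raw.csv")
log.was_generated_by("clean.csv", "clean-step")
log.used("plot-step", "clean.csv")
log.was_generated_by("figure.png", "plot-step")
print(log.lineage("figure.png"))
```

Every workflow event appends a record, which is exactly why capture overhead and provenance data volume belong in the benchmarking guidelines this project aims to produce.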
References:
- About project: http://www.envriplus.eu
- Provenance background in ENVRIPLUS: https://surfdrive.surf.nl/files/index.php/s/uRa1AdyURMtYxbb
- Michael Gerhards, Volker Sander, Torsten Matzerath, Adam Belloum, Dmitry Vasunin, and Ammar
Benabdelkader. 2011. Provenance opportunities for WS-VLAM: an exploration of an e-science and an e-business
approach. In Proceedings of the 6th workshop on Workflows in support of large-scale science (WORKS '11). http://dx.doi.org/10.1145/2110497.2110505
More info: - Zhiming Zhao, Adam Belloum, Paul Martin |
Zhiming Zhao <z.zhao=>uva.nl> |
|
|
73 |
Profiling Partitioning Mechanisms for Graphs with
Different Characteristics.
In computer systems, graphs are an important model for describing many things, such as workflows, virtual
infrastructures, ontological models, etc. Partitioning is a frequently used graph operation in contexts like
parallelizing workflow execution, mapping networked infrastructures onto distributed data centers [1], and
controlling load balance of resources. However, developing an effective partitioning solution is often not
easy; it is frequently a complex optimization problem involving constraints like system performance and cost.
A comprehensive benchmark of graph partitioning mechanisms is helpful for choosing a partitioning solver for a
specific model. Such a portfolio can also give advice on how to partition based on the characteristics of the
graph. This project aims at benchmarking existing partitioning algorithms for graphs with different
characteristics, and profiling their applicability for specific types of graphs.
This project will be conducted in the context of the EU SWITCH [2] project. The students will:
- Review the state of the art of the graph partitioning algorithms and related tools, such as Chaco, METIS
and KaHIP, etc.
- Investigate how to define the characteristics of a graph, such as sparse graphs, skewed graphs, etc. This
can also be discussed for different graph models, like planar graphs, DAGs, hypergraphs, etc.
- Build a benchmark for different types of graphs with various partitioning mechanisms and find the
relationship behind.
- Discuss how to choose a partitioning mechanism based on the graph characteristics.
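The basic quality metric that any such benchmark reports is the edge cut, the number of edges crossing partition boundaries, which partitioners like METIS and KaHIP minimise (usually subject to a balance constraint). A minimal sketch on a toy graph:

```python
def edge_cut(edges, partition):
    """Number of edges whose endpoints land in different parts."""
    return sum(1 for u, v in edges if partition[u] != partition[v])

# A 6-node ring split into two balanced halves: only the two
# "bridge" edges (2,3) and (0,5) cross the cut.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(edge_cut(edges, partition))  # 2
```

A benchmark harness would compute this (plus balance and runtime) for each solver across the different graph classes mentioned above, which is where the interesting comparisons emerge.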
Reading material:
- Zhou, H., Hu Y., Wang, J., Martin, P., de Laat, C. and Zhao, Z., (2016) Fast and Dynamic Resource
Provisioning for Quality Critical Cloud Applications, IEEE International Symposium On Real-time Computing
(ISORC) 2016, York UK
http://dx.doi.org/10.1109/ISORC.2016.22
- SWITCH: www.switchproject.eu
More info: Huan Zhou, Arie Taal, Zhiming Zhao
|
Zhiming Zhao <z.zhao=>uva.nl> |
|
|
74 |
Auto-Tuning for GPU Pipelines and Fused Kernels.
Achieving high performance on many-core accelerators is a complex task, even for experienced programmers. This
task is made even more challenging by the fact that, to achieve high performance, code optimization is not
enough, and auto-tuning is often necessary. The reason for this is that computational kernels running on
many-core accelerators need ad-hoc configurations that are a function of kernel, input, and accelerator
characteristics to achieve high performance. However, tuning kernels in isolation may not be the best strategy
for all scenarios.
Imagine having a pipeline that is composed of a certain number of computational kernels. You can tune each of
these kernels in isolation, and find the optimal configuration for each of them. Then you can use these
configurations in the pipeline, and achieve some level of performance. But these kernels may depend on each
other, and may also influence each other. What if the choice of a certain memory layout for one kernel causes
performance degradation on another kernel?
One of the existing optimization strategies to deal with pipelines is to fuse kernels together, to simplify
execution patterns and decrease overhead. In this project we aim to measure the performance of accelerated
pipelines in three different tuning scenarios:
- tuning each component in isolation,
- tuning the pipeline as a whole, and
- tuning the fused kernel.
By measuring the performance of one or more pipelines in these scenarios we hope, on one level, to determine
which is the best strategy for the specific pipelines on different hardware platforms, and on another level, to
better understand which characteristics influence this behavior.
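The gap between the first two scenarios can be shown with a toy cost model: if kernels interact (say, through a layout-conversion penalty), per-kernel optima need not compose into the pipeline optimum. All numbers below are invented to illustrate the effect, not measured on any GPU.

```python
import itertools

# Toy cost model: each kernel's runtime depends on its own layout choice,
# plus a coupling penalty when adjacent kernels disagree on layout.
KERNEL_COST = {"k1": {"row": 1.0, "col": 1.4}, "k2": {"row": 1.3, "col": 0.9}}
COUPLING_PENALTY = 0.8  # cost of converting layouts between the two kernels

def pipeline_time(layouts):
    t = sum(KERNEL_COST[k][layouts[k]] for k in KERNEL_COST)
    if layouts["k1"] != layouts["k2"]:
        t += COUPLING_PENALTY
    return t

# Strategy 1: tune each kernel in isolation (ignores the coupling).
isolated = {k: min(costs, key=costs.get) for k, costs in KERNEL_COST.items()}
# Strategy 2: tune the pipeline as a whole over the joint search space.
joint = min((dict(zip(KERNEL_COST, combo))
             for combo in itertools.product(["row", "col"], repeat=2)),
            key=pipeline_time)

print(pipeline_time(isolated), pipeline_time(joint))
```

Here isolation picks `row` for k1 and `col` for k2 and pays the conversion penalty, while joint tuning accepts a slightly slower per-kernel choice to avoid it. Real pipelines have far larger joint search spaces, which is exactly the trade-off this project measures.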
|
Rob van Nieuwpoort <R.vanNieuwpoort=>uva.nl> |
|
|
75 |
Speeding up next generation sequencing of potatoes.
Genotype and single-nucleotide polymorphism (SNP) calling is a technique to find bases in next-generation
sequencing data that differ from a reference genome. This technique is commonly used in (plant) genetic
research. However, most algorithms support calling in diploid heterozygous organisms (specifically human) only.
Within the realm of plant breeding, many species are of a polyploid nature (e.g. potato with four copies, wheat
with six copies and strawberry with eight copies). For genotype and SNP calling in these organisms, only a few
algorithms exist, such as freebayes (https://github.com/ekg/freebayes).
However, with the increasing amount of next generation sequencing data being generated, we are noticing limits
to the scalability of this methodology, both in compute time and memory consumption (>100 GB).
We are looking for a student with a background in computer science, who will perform the following tasks:
- Examine the current implementation of the freebayes algorithm
- Identify bottlenecks in memory consumption and compute performance
- Come up with an improved strategy to reduce memory consumption of the freebayes algorithm
- Come up with an improved strategy to execute this algorithm on a cluster with multiple CPUs or on GPUs
(using the memory of multiple compute nodes)
- Implement an improved version of freebayes, according to the guidelines established above
- Test the improved algorithm on real datasets of potato.
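One common scaling strategy, assuming that calls in distinct genomic regions are independent, is to split the genome into windows and process them in parallel. The sketch below is a stand-in: call_region and the chromosome lengths are hypothetical, and real code would invoke freebayes on each region and distribute the work over processes or cluster nodes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor

def call_region(region):
    """Stand-in for running the variant caller on one genomic region;
    real code would invoke freebayes with --region chrom:start-end."""
    chrom, start, end = region
    return f"{chrom}:{start}-{end}"

def make_regions(chrom_lengths, window=1_000_000):
    """Split each chromosome into fixed-size windows so that memory use
    per task stays bounded regardless of total genome size."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window):
            regions.append((chrom, start, min(start + window, length)))
    return regions

# Hypothetical chromosome lengths for illustration only.
regions = make_regions({"chr01": 2_500_000, "chr02": 1_200_000})
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(call_region, regions))
print(len(results))  # 5 regions: 3 for chr01, 2 for chr02
```

The final merge step (concatenating per-region call sets) is omitted; the point is the decomposition that bounds per-task memory.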
This is a challenging master thesis project on an important food crop (potato) on a problem which is relevant
for both science and industry. As part of the thesis, you will be given the opportunity to present your
progress/results to relevant industrial partners for the Dutch breeding industry.
Occasional traveling to Wageningen will be required. |
Rob van Nieuwpoort <R.vanNieuwpoort=>uva.nl> |
|
|
77 |
Auto-tuning for Power Efficiency.
Auto-tuning is a well-known optimization technique in computer science. It has been used to ease the manual
optimization process that is traditionally performed by programmers, and to maximize the performance
portability. Auto-tuning works by executing the code to be tuned many times on a small problem
set, with different tuning parameters. The best performing version is then used for the real
problems. Tuning can be done with application-specific parameters (different algorithms, granularity,
convergence heuristics, etc) or platform parameters (number of parallel threads used, compiler flags, etc).
For this project, we apply auto-tuning on GPUs. We have several GPU applications where the absolute performance
is not the most important bottleneck for the application in the real world. Instead, the power dissipation of the
total system is critical. This can be due to the enormous scale of the application, or because the application
must run in an embedded device. An example of the first is the Square Kilometre Array, a large radio telescope
that is currently under construction. With current technology, it would need more power than the entire
Netherlands consumes. In embedded systems, power usage can be critical as well. For instance, we have GPU codes
that make images for radar systems in drones. The weight and power limitations are an important bottleneck
(batteries are heavy).
In this project, we use power dissipation as the evaluation function for the auto-tuning system. Earlier work by
others investigated this, but only for a single compute-bound application. However, many realistic applications
are memory-bound. This is a problem, because loading a value from the L1 cache can already take 7-15x more
energy than an instruction that only performs a computation (e.g., multiply).
There are also interesting platform parameters that can be changed in this context. It is possible to change
both core and memory clock frequencies, for instance. It will be interesting to see whether we can achieve
the optimal balance between these frequencies at runtime.
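A minimal sketch of energy-aware tuning over clock frequencies: the runtime and power models below are invented (real tuning would set clocks and read power via e.g. NVML or nvidia-smi), but they show that for a memory-bound kernel the fastest configuration need not be the most energy-efficient one.

```python
from itertools import product

# Hypothetical clock ranges; on real hardware these come from the driver.
CORE_CLOCKS = [900, 1100, 1300]   # MHz
MEM_CLOCKS = [2000, 2600, 3200]   # MHz

def runtime_s(core, mem):
    # Memory-bound kernel: runtime is dominated by the memory clock.
    return 100.0 / mem + 5.0 / core

def power_w(core, mem):
    # Invented power model: core clock dominates the power draw.
    return 50 + 0.1 * core + 0.01 * mem

def energy_j(cfg):
    core, mem = cfg
    return power_w(core, mem) * runtime_s(core, mem)

configs = list(product(CORE_CLOCKS, MEM_CLOCKS))
best_perf = min(configs, key=lambda c: runtime_s(*c))
best_energy = min(configs, key=energy_j)
print(best_perf, best_energy)  # (1300, 3200) (900, 3200)
```

Under this model, lowering the core clock barely slows the memory-bound kernel but cuts power substantially, so the energy optimum uses a lower core clock than the performance optimum.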
We want to perform auto-tuning on a set of GPU benchmark applications that we developed. |
Rob van Nieuwpoort <R.vanNieuwpoort=>uva.nl> |
|
|
78 |
Fast Data Serialization and Networking for Apache Spark.
Apache Spark is a system for large-scale data processing, used in Big Data business applications
but also in many scientific applications. Spark uses Java (or Scala) object serialization to transfer data over
the network. Especially if data fits in memory, the performance of serialization is the most important
bottleneck in Spark applications. Spark currently offers two mechanisms for serialization: Standard Java object
serialization and Kryo serialization.
In the Ibis project (www.cs.vu.nl/ibis), we have developed an
alternative serialization mechanism for high-performance computing applications that relies on compile-time code
generation and zero-copy networking for increased performance. Performance of JVM serialization can also be
compared with benchmarks: https://github.com/eishay/jvm-serializers/wiki.
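Why serializer choice matters can be illustrated with the standard library alone (an illustrative comparison, not a Spark benchmark): generic pickling keeps structural metadata with every record, while a schema-aware encoding, which is what compile-time code generation can emit, packs only the raw values.

```python
import pickle
import struct

# A record as a data-processing job might shuffle it over the network.
record = {"id": 12345, "score": 0.97, "clicks": 42}

# Generic serialization: self-describing, carries keys and type info.
generic = pickle.dumps(record)

# Schema known at compile time: one int64, one double, one int64,
# packed little-endian with no per-record metadata.
compact = struct.pack("<qdq", record["id"], record["score"], record["clicks"])

print(len(generic), len(compact))  # compact is 24 bytes; generic is larger
```

When millions of such records cross the network per second, the per-record metadata and encoding cost of the generic path add up, which is exactly what generated serializers avoid.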
However, we also want to evaluate whether we can increase Spark performance at the application level by using
our improved object serialization system. In addition, our Ibis implementation can use fast local networks such as
Infiniband transparently. We also want to investigate if using specialized networks increases application
performance. Therefore, this project involves extending Spark with our serialization and networking methods
(based on existing libraries), and on analyzing the performance of several real-world Spark applications. |
Rob van Nieuwpoort <R.vanNieuwpoort=>uva.nl> |
|
|
79 |
Applying and Generalizing Data Locality Abstractions for Parallel Programs.
TIDA is a library for high-level programming of parallel applications, focusing on data locality. TIDA has been
shown to work well for grid-based operations, like stencils and convolutions. These are an important building
block for many simulations in astrophysics, climate science and water management, for instance. The TIDA
paper gives more details on the programming model.
This project aims to achieve several things and answer several research questions:
- TIDA currently only works with up to 3D. In many applications we have, higher dimensionalities are
needed. Can we generalize the model to N dimensions?
- The model currently only supports a two-level hierarchy of data locality. However, modern memory systems
often have many more levels, both on CPUs and GPUs (e.g., L1, L2 and L3 cache, main memory, memory banks
coupled to a different core, etc). Can we generalize the model to support N-level memory hierarchies?
- The current implementation only works on CPUs; can we generalize it to GPUs as well?
- Given the above generalizations, can we still implement the model efficiently? How should we perform the
mapping from the abstract hierarchical model to a real physical memory system?
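The kind of locality transformation such a library automates can be sketched in plain Python (this does not use the TIDA API): a tiled traversal of a 5-point stencil produces exactly the same result as the naive one, changing only the order in which memory is touched.

```python
# Naive vs tiled traversal of a 2D 5-point stencil (one Jacobi step).
# Tiling does not change the result, only the memory access order, which
# is what data-locality frameworks exploit to keep tiles cache-resident.
N, TILE = 8, 4

def jacobi_naive(grid):
    out = [row[:] for row in grid]
    for i in range(1, N - 1):
        for j in range(1, N - 1):
            out[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1])
    return out

def jacobi_tiled(grid):
    out = [row[:] for row in grid]
    for ti in range(1, N - 1, TILE):        # loop over tiles
        for tj in range(1, N - 1, TILE):
            for i in range(ti, min(ti + TILE, N - 1)):   # loop within a tile
                for j in range(tj, min(tj + TILE, N - 1)):
                    out[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                        grid[i][j-1] + grid[i][j+1])
    return out

grid = [[float(i * N + j) for j in range(N)] for i in range(N)]
print(jacobi_tiled(grid) == jacobi_naive(grid))  # True
```

Generalizing this pattern to N dimensions and N-level memory hierarchies is precisely where the research questions above start.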
We want to test the new extended model on a real application. We have examples available in many domains. The
student can pick one that is of interest to her/him. |
Rob van Nieuwpoort <R.vanNieuwpoort=>uva.nl> |
|
|
81 |
How large is the 'dark web' compared to the 'surface web'?
Description:
Every now and then you encounter claims that the 'surface' web is about 4% of the internet and the deep web is
about 96% of the internet. Many 'infographics' have been made to illustrate this, and it is a popular belief (https://www.google.nl/search?q=surface+web+4%25+deep+web+96%26&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjLp8e_nNTWAhVFU1AKHdmJDVoQ_AUICigB&biw=1689&bih=922
).
However, these claims seem to originate from a white paper released in 2001 with the following claims [https://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104]:
- Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World
Wide Web.
- The deep Web contains 7,500 terabytes of information compared to nineteen terabytes of information in the
surface Web.
The goal of this research project is to determine how large the dark web currently is, either in absolute size or
relative to the 'surface' web. Other focus points can be the different definitions of the 'surface', 'dark'
and 'deep' web, and how the size, popularity and/or definition of the dark web has developed since 2001. |
Jordi van den Breekel <vandenBreekel.Jordi=>kpmg.nl> |
|
|
87 |
Ethereum Smart Contract Fuzz Testing.
An Ethereum smart contract can be seen as a computer program that runs on the Ethereum Virtual Machine (EVM),
with the ability to accept, hold and transfer funds programmatically. Once a smart contract has been placed on
the blockchain, it can be executed by anyone. Furthermore, many smart contracts accept user input. Because smart
contracts operate on a cryptocurrency with real value, security of smart contracts is of the utmost importance.
I would like to create a smart contract fuzzer that will check for unexpected behaviour or crashes of the EVM.
Based on preliminary research, such a fuzzer does not exist yet.
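A minimal mutation-based fuzz loop might look as follows. Everything here is a stand-in: execute_contract simulates dispatching calldata to an EVM implementation (with an invented faulting selector), and a real fuzzer would run an actual EVM and watch for crashes, hangs or consensus divergence.

```python
import random

def execute_contract(calldata):
    """Stand-in for an EVM call. A real fuzzer would hand the calldata to
    an EVM implementation; here a specific 4-byte selector simulates a fault."""
    if calldata[:4] == b"\xde\xad\xbe\xef":
        raise RuntimeError("simulated EVM fault")
    return b""

def mutate(data, rng):
    """Flip one random byte of a seed input."""
    pos = rng.randrange(len(data))
    return data[:pos] + bytes([rng.randrange(256)]) + data[pos + 1:]

def fuzz(seed_corpus, runs=1000, rng_seed=7):
    """Repeatedly mutate seed inputs, execute them, and record crashes."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(runs):
        calldata = mutate(rng.choice(seed_corpus), rng)
        try:
            execute_contract(calldata)
        except Exception as exc:
            crashes.append((calldata, repr(exc)))
    return crashes

# Seed corpus: one known-valid transaction (hypothetical function selector
# plus a 32-byte argument word) used as the mutation base.
corpus = [b"\xde\xad\xbe\xef" + bytes(32)]
crashes = fuzz(corpus)
print(len(crashes) > 0)  # True: mutants that keep the selector still fault
```

A real tool would add coverage feedback, corpus management, and differential execution across EVM implementations, but the core generate-execute-observe loop is the same.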
|
Rodrigo Marcos <rodrigo.marcos=>secforce.com>
|
R
P |
|
90 |
Credential extraction of in-memory password managers.
Tools exist for the extraction of credentials for certain popular password managers (such as KeePass 2.x, the
Chrome password manager, etc.). During redteam projects where a cyber attack is simulated, we make use of
tooling that can extract credentials from memory (e.g. KeeThief for KeePass 2.x).
However, similar tooling appears to be missing for older KeePass 1.x databases and other popular password
managers including PasswordSafe, 1Password and LastPass. We are looking to investigate which protection
mechanisms these password managers employ, and whether it is possible to extract credentials in the same way.
Both solutions for offline usage and online usage are of interest (especially if a desktop client is available). |
Cedric Van Bockhaven <cvanbockhaven=>deloitte.nl>
|
R
P |
2
|
91 |
TCP tunneling over Citrix.
Citrix provides services for remote virtual desktop infrastructure (VDI / Xen Desktop) or application
virtualization (XenApp). Citrix is sometimes used as a security measure to sandbox the execution of sensitive
applications (e.g. a financial application that may only be run from a single server, with the users that
require access connecting to the virtual desktop). The organization then sets additional restrictions: no
access to clipboard data, no access to shared drives, and no outbound connectivity, in order to prevent
data leaks.
Citrix is built on top of traditional Windows technologies such as RDP to establish the connection to the
virtualized desktop infrastructure. RDP has the capability to extend the remote desktop session with clipboard
management, attaching of printers and sound devices, and drive mapping. Additionally, it is possible to create
plugins to provide other functionalities.
The rdp2tcp project features the possibility to tunnel TCP connections (TCP forwarding) over a remote desktop
session. This means no extra ports have to be opened.
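At its core, the rdp2tcp idea reduces to a bidirectional byte relay between a local TCP endpoint and the remote-session channel. A minimal sketch, with a socketpair standing in for the Citrix virtual channel and a tiny echo loop playing the role of a service behind the Citrix server:

```python
import socket
import threading

def pipe(src, dst):
    """Forward bytes one way until the source side closes."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)

def tunnel(a, b):
    """Relay traffic in both directions between the local TCP endpoint
    and the (stand-in) virtual-channel endpoint."""
    threading.Thread(target=pipe, args=(a, b), daemon=True).start()
    threading.Thread(target=pipe, args=(b, a), daemon=True).start()

# Stand-in for the virtual channel: a socketpair with an echo service on
# the far end, as a host reachable behind the Citrix server would be.
chan_local, chan_remote = socket.socketpair()

def echo_service():
    while True:
        data = chan_remote.recv(4096)
        if not data:
            break
        chan_remote.sendall(data)

threading.Thread(target=echo_service, daemon=True).start()

client, app_side = socket.socketpair()  # stands in for the client's TCP conn
tunnel(app_side, chan_local)

client.sendall(b"ping")
reply = client.recv(4096)
print(reply)  # b'ping'
```

The research question is then whether Citrix's ICA virtual-channel plugin interface can supply the two channel endpoints that this relay needs, the way RDP's virtual channels do for rdp2tcp.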
We would like to investigate whether it is possible to establish a TCP tunnel over a Citrix virtual desktop
session. This would allow routing of traffic through the Citrix server, potentially providing the ability to
move laterally through the network in order to access systems connected to the Citrix server (that are not
directly exposed to the Internet). |
Cedric Van Bockhaven <cvanbockhaven=>deloitte.nl>
|
R
P |
2
|
94 |
Security of embedded technology.
Analyzing the security of embedded technology, which operates in an ever changing environment, is Riscure's
primary business. Therefore, research and development (R&D) is of utmost importance for Riscure to stay
relevant. The R&D conducted at Riscure focuses on four domains: software, hardware, fault injection and
side-channel analysis. Potential SNE Master projects can be shaped around the topics of any of these fields.
We would like to invite interested students to discuss a potential Research Project at Riscure in any of the
mentioned fields. Projects will be shaped according to the requirements of the SNE Master.
Please have a look at our website for more information: https://www.riscure.com
Previous Research Projects conducted by SNE students:
- https://www.os3.nl/_media/2013-2014/courses/rp1/p67_report.pdf
- https://www.os3.nl/_media/2011-2012/courses/rp2/p61_report.pdf
- http://rp.os3.nl/2014-2015/p48/report.pdf
- https://www.os3.nl/_media/2011-2012/courses/rp2/p19_report.pdf
If you want to see what the atmosphere is at Riscure, please have a look at: https://vimeo.com/78065043
Please let us know if you have any additional questions! |
Niek Timmers <Timmers=>riscure.com>
Albert Spruyt <Spruyt=>riscure.com>
Martijn Bogaard <bogaard=>riscure.com>
Dana Geist <geist=>riscure.com>
|
R
P |
2
|
99 |
Invisible Internet Project - I2P.
Anonymity networks, such as Tor or I2P, were built to allow users to access network resources without
revealing their identity [1,2]. This project will be aimed at theoretical research into existing attacks and
how they hold up given I2P updates and the current network size [3,4,5]. The fact that little is known about
the I2P network, together with its potential for future growth and the public perception that it is the most
secure solution compared to Tor and Freenet [6], leads to our research questions:
- What are possible attacks against the I2P network?
- What is the feasibility of such attacks?
In this research, we will present a number of possible attacks against the I2P network, specifically
attacks that are able to deanonymize I2P users and reveal their identities. We will study these attacks
from a theoretical point of view and propose mitigation mechanisms against them. Should time and
ethical considerations allow, a proof of concept can supplement the research.
Ethical Consideration:
During our research, we will be looking at the I2P network and the way it works. In addition to that, we will
be looking at the possible attacks from a theoretical point of view. To get a better understanding of the
network, we may need to do some practical reconnaissance. However, this will be mostly passive in nature and
no attack shall be attempted against the live I2P network. Therefore, we do not see ethical issues where any
confidential or personal data might leak.
Some related work:
- https://www.cs.ucsb.edu/~chris/research/doc/raid13_i2p.pdf
- https://static.siccegge.de/pdfs/bachelor-thesis.pdf
- https://www.dailydot.com/debug/tor-freenet-i2p-anonymous-network/
- https://hal.inria.fr/file/index/docid/653136/filename/RR-7844.pdf
- https://geti2p.net/en/comparison/tor
- https://www.tandfonline.com/doi/full/10.1080/21642583.2017.1331770
- https://hal.inria.fr/hal-01238453/file/I2P-design-vs-performance-security.pdf
See also: http://www.dcssproject.net/i2p/
|
Henri Hambartsumyan <HHambartsumyan=>deloitte.nl>
Vincent van Mieghem <vvanmieghem=>deloitte.nl>
|
|
|
|