Markus Demmel
Bachelor's Thesis
Differential Testing
Abstract
We live in a well-connected world in which almost every device is connected to the internet and able to communicate with other devices. This communication relies on standards such as protocols and data exchange formats. One of these exchange formats is the JavaScript Object Notation (JSON). JSON is a popular way to exchange data because it is very efficient and language independent. Douglas Crockford published the first specification of JSON at the beginning of this millennium. Since then, several specifications have described how JSON should be parsed. However, developers often do not follow these specifications. As a result, JSON parsers frequently accept more inputs than they should.
This bachelor thesis investigates the consequences of this lax implementation of the standards by evaluating various parsers on diverse inputs. It generates and optimizes invalid JSON using well-known software testing techniques. The thesis shows that over 50% of the analyzed parsers mishandle malformed JSON arrays and JSON objects. While the structure of most illegally parsed inputs is already known, the thesis also identifies some new structures. Furthermore, it demonstrates that the problem is not restricted to specific programming languages but is widespread.