Unicode security attacks and test cases – fuzzing with Unicode
23 Apr 2009
When it comes to fuzzing parsers, protocols, and other software, I want the fuzzer to be capable of producing tests specific to Unicode. Here’s what it should do at a minimum:
- Generate a lone surrogate (half of a surrogate pair) in UTF-8 or UTF-16
- Generate ill-formed byte sequences for UTF-8 and UTF-16
- Generate overlong UTF-8
- Generate unassigned and reserved code points
- Generate code points outside the valid range (above U+10FFFF)
- Generate interesting control characters and characters with special meaning, like the BOM and the bidirectional embedding and override controls
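To make the byte-level items in the list concrete, here is a minimal sketch in Python 3. It is only an illustration under my own assumptions: `encode_utf8_raw`, `utf8_cases`, `utf16_cases`, and `is_rejected` are hypothetical names, and the handful of cases is a sample rather than a complete test suite.

```python
import struct


def encode_utf8_raw(cp, length):
    """Pack cp into `length` UTF-8-shaped bytes (2 to 4) with no validity checks.

    Hypothetical helper: unlike str.encode, it will emit overlong forms,
    surrogates, and values above U+10FFFF.
    """
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}[length]
    lead_mask = 0xFF >> (length + 1)  # payload bits available in the lead byte
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    return bytes([lead_marker | (cp & lead_mask)] + tail[::-1])


def utf8_cases():
    """Return a dict of label -> malformed UTF-8 byte string."""
    return {
        "lone high surrogate": encode_utf8_raw(0xD800, 3),    # ED A0 80
        "lone low surrogate": encode_utf8_raw(0xDFFF, 3),     # ED BF BF
        "overlong NUL (2 bytes)": encode_utf8_raw(0x00, 2),   # C0 80
        "overlong '/' (2 bytes)": encode_utf8_raw(0x2F, 2),   # C0 AF
        "overlong '/' (3 bytes)": encode_utf8_raw(0x2F, 3),   # E0 80 AF
        "overlong '/' (4 bytes)": encode_utf8_raw(0x2F, 4),   # F0 80 80 AF
        "above U+10FFFF": encode_utf8_raw(0x110000, 4),       # F4 90 80 80
        "stray continuation byte": b"\x80",
        "truncated 3-byte sequence": b"\xE2\x82",
        "lead byte followed by ASCII": b"\xC3\x28",
        "byte that never appears in UTF-8": b"\xFF",
    }


def utf16_cases(little_endian=True):
    """Return a dict of label -> malformed UTF-16 byte string (no BOM)."""
    order = "<" if little_endian else ">"
    return {
        "lone high surrogate": struct.pack(order + "H", 0xD800),
        "lone low surrogate": struct.pack(order + "H", 0xDC00),
        "reversed surrogate pair": struct.pack(order + "2H", 0xDC00, 0xD800),
        "odd byte count": struct.pack(order + "H", 0x0041) + b"\x00",
    }


def is_rejected(data, encoding):
    """True if Python's strict decoder refuses the input."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    for label, data in utf8_cases().items():
        print(f"utf-8     {label:34} {data.hex(' ')}  rejected={is_rejected(data, 'utf-8')}")
    for label, data in utf16_cases().items():
        print(f"utf-16-le {label:34} {data.hex(' ')}  rejected={is_rejected(data, 'utf-16-le')}")
```

A strict decoder should reject every one of these inputs. Checking them against Python's own decoder is a quick sanity test before feeding them to the parser under test, where the interesting results are the ones that get accepted or silently normalized.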
I’ve got some code that does most of these things. Maybe I should elaborate on them some more… Does Peach or another fuzzing framework provide this already?
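For the code-point-level items in the list (unassigned and reserved code points, and characters with special meaning), a rough Python 3 sketch could look like the following. The names `noncharacters`, `unassigned`, `SPECIAL`, and `inject` are hypothetical, the choice of special characters is just a starting sample, and which code points count as unassigned depends on the Unicode version bundled with the Python build in use.

```python
import unicodedata


def noncharacters():
    """Yield the 66 noncharacters: U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF."""
    yield from range(0xFDD0, 0xFDF0)
    for plane in range(0x11):
        yield (plane << 16) | 0xFFFE
        yield (plane << 16) | 0xFFFF


def unassigned(start=0x0000, stop=0x110000, limit=16):
    """Yield up to `limit` code points whose general category is Cn.

    Cn covers both unassigned code points and noncharacters, and the result
    depends on the Unicode database shipped with this Python.
    """
    found = 0
    for cp in range(start, stop):
        if found >= limit:
            return
        if unicodedata.category(chr(cp)) == "Cn":
            found += 1
            yield cp


# A sample of characters with special meaning (not exhaustive).
SPECIAL = {
    "NUL": "\u0000",
    "BOM / zero width no-break space": "\uFEFF",
    "byte-swapped BOM (noncharacter)": "\uFFFE",
    "replacement character": "\uFFFD",
    "left-to-right embedding": "\u202A",
    "right-to-left embedding": "\u202B",
    "pop directional formatting": "\u202C",
    "left-to-right override": "\u202D",
    "right-to-left override": "\u202E",
    "zero width space": "\u200B",
    "zero width joiner": "\u200D",
    "line separator": "\u2028",
    "paragraph separator": "\u2029",
}


def inject(text, ch, position):
    """Return `text` with the character `ch` inserted at index `position`."""
    return text[:position] + ch + text[position:]


if __name__ == "__main__":
    print("noncharacters:", len(list(noncharacters())))
    print("first unassigned:", [f"U+{cp:04X}" for cp in unassigned(limit=4)])
    for label, ch in SPECIAL.items():
        fuzzed = inject("file.txt", ch, 4)
        print(f"{label:34} {fuzzed.encode('utf-8').hex(' ')}")
```

These work on code points rather than bytes, so each one still has to be encoded before it reaches the target. That makes it easy to run the same case through several encodings.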