confusables.js - Unicode confusables in javascript
24 Aug 2015
The Unicode confusables have long been of interest in testing the security of applications and in social engineering. I work with Unicode often in tools and testing, and wanted to have the confusables data available in a javascript module, confusables.js. The Unicode confusables are characters which are visually similar and easily confused with other characters. More information is available from the Unicode Consortium at http://www.unicode.org/reports/tr36/#visual_spoofing.
Because of some limitations in most javascript implementations, confusables.js requires a modified String.fromCodePoint; this polyfill by Mathias Bynens works just fine.
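To illustrate the kind of limitation involved, here is a minimal sketch. It assumes the problem is the usual one with code points outside the Basic Multilingual Plane, which the older String.fromCharCode cannot produce from a single number. The code point used (U+1D400) is just an example I picked and is not taken from confusables.js itself.

```javascript
// U+1D400 MATHEMATICAL BOLD CAPITAL A lies outside the Basic Multilingual Plane.
const astral = 0x1D400;

// String.fromCharCode keeps only the low 16 bits, so it yields U+D400 instead.
const truncated = String.fromCharCode(astral);

// String.fromCodePoint emits the correct surrogate pair (two UTF-16 code units).
const correct = String.fromCodePoint(astral);

console.log(truncated.codePointAt(0).toString(16)); // "d400"
console.log(correct.codePointAt(0).toString(16));   // "1d400"
console.log(correct.length);                        // 2
```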
Also known as homoglyphs, lookalikes, and spoofs, the confusables are characters that visually resemble or are indistinguishable from another character. For example, the following two characters are visually similar and easily confused:
FF21 ; 0041 ; SA # ( Ａ → A ) FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A
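A line like the one above can be pulled apart with a few lines of javascript. This is only a sketch, assuming the three-field layout shown (source ; target ; type # comment); parseConfusableLine is a hypothetical helper name and is not part of confusables.js.

```javascript
// Parse one data line of the confusables file: "source ; target ; type # comment".
function parseConfusableLine(line) {
  // Drop the trailing comment, then split the semicolon-separated fields.
  const [data] = line.split("#");
  const [source, target, type] = data.split(";").map((field) => field.trim());

  // A field may hold several space-separated hex code points.
  const hexToString = (hex) =>
    hex
      .split(/\s+/)
      .map((cp) => String.fromCodePoint(parseInt(cp, 16)))
      .join("");

  return { source: hexToString(source), target: hexToString(target), type };
}

const entry = parseConfusableLine(
  "FF21 ; 0041 ; SA # ( Ａ → A ) FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A"
);
console.log(entry.source === "\uFF21"); // true
console.log(entry.target);              // "A"
console.log(entry.type);                // "SA"
```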
Sometimes during penetration testing, we want to bypass word blacklists, spoof URLs, spoof email addresses, or perform other tasks. Being able to generate lookalike strings can be quite useful in these cases, and we know that bad guys will apply the same tactics to bypass antivirus or other security boundaries.
If you require more capability than this javascript provides, then go check out the Unicode Consortium's utility for generating confusables.
Note that generating a full list of all confusable permutations is expensive and often unnecessary, so confusables.js only generates a single permutation from randomly selected characters.
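To get a feel for why full enumeration is expensive, the rough sketch below counts how many fully substituted variants a short string has. The table is a tiny hand-picked sample that I am assuming for illustration (the real data set is far larger), and countPermutations is a hypothetical helper, not something confusables.js exposes.

```javascript
// Hypothetical, hand-picked sample of lookalikes; not the real data set.
const sampleTable = {
  a: ["\u0430", "\u0251", "\uFF41"], // Cyrillic a, Latin alpha, fullwidth a
  l: ["\u04CF", "\uFF4C"],           // Cyrillic palochka, fullwidth l
  p: ["\u0440", "\u03C1", "\uFF50"], // Cyrillic er, Greek rho, fullwidth p
  y: ["\u0443", "\uFF59"],           // Cyrillic u, fullwidth y
};

// Number of distinct strings obtained by replacing every character with one
// of its listed lookalikes (characters without an entry count as one choice).
function countPermutations(input, table) {
  return Array.from(input).reduce(
    (total, ch) => total * (table[ch] ? table[ch].length : 1),
    1
  );
}

console.log(countPermutations("paypal", sampleTable));       // 324
console.log(countPermutations("paypalpaypal", sampleTable)); // 104976
```

Even with two or three lookalikes per character the count multiplies with every added character, which is why picking one random permutation is usually enough.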
Installation
The test page, index.html, is running at http://lookout.net/test/confusablesjs
In a browser:
Two public methods are available with confusables.js to return the confusable data. You can pass in a string of characters and get a randomly selected string of confusable characters returned, or you can pass in a code point or single character and get an array of all confusables for that character.
The confusables.utility.getConfusableString() method accepts a string of one or more characters as input and returns a string of confusable characters. Since each character of input can have several confusables, a random one is selected from the data set. This provides a quick and convenient way to select confusables without enumerating the entire set.
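The random selection described above can be sketched roughly as follows. This is a stand-in and not the module's actual code: sampleTable is a tiny hand-picked table, randomConfusableString is a hypothetical name, and passing characters through unchanged when they have no entry is my assumption.

```javascript
// Hypothetical, hand-picked sample of lookalikes; not the real data set.
const sampleTable = {
  a: ["\u0430", "\u0251"], // Cyrillic a, Latin alpha
  e: ["\u0435", "\uFF45"], // Cyrillic ie, fullwidth e
  o: ["\u043E", "\u03BF"], // Cyrillic o, Greek omicron
};

// Replace each character with one randomly chosen lookalike.
// `random` is injectable so the choice can be made repeatable in tests.
function randomConfusableString(input, table, random = Math.random) {
  return Array.from(input)
    .map((ch) => {
      const options = table[ch];
      if (!options) return ch; // assumption: keep characters with no entry
      return options[Math.floor(random() * options.length)];
    })
    .join("");
}

// Output varies from run to run; the trailing "n" is always left untouched.
console.log(randomConfusableString("aeon", sampleTable));
```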
The confusables.utility.getConfusableCharacters() method accepts a single character or code point value (decimal or hex) as input and returns all of its confusable characters in an array, which could be multidimensional when several characters combine to create a single confusable.
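As an illustration of that return shape, here is a small stand-in. Everything in it is assumed for the sake of the example: sampleData is a hand-picked table, lookupConfusables and toCodePoint are hypothetical names, and the way hex strings are recognised is my own choice, so the real method may differ in its details.

```javascript
// Hypothetical sample: keys are code points, values list lookalike sequences.
const sampleData = new Map([
  [0x41, [[0xFF21], [0x0391], [0x0410]]], // fullwidth, Greek and Cyrillic A
  [0x6D, [[0x72, 0x6E]]],                 // "r" + "n" together read as "m"
]);

// Accept a number (65 or 0x41), a single character ("A"), or a hex string ("0041").
function toCodePoint(input) {
  if (typeof input === "number") return input;
  if (Array.from(input).length === 1) return input.codePointAt(0);
  return parseInt(input, 16);
}

// Single-character lookalikes come back as strings; multi-character ones as
// nested arrays, which makes the result multidimensional.
function lookupConfusables(input, data) {
  const sequences = data.get(toCodePoint(input)) || [];
  return sequences.map((seq) =>
    seq.length === 1
      ? String.fromCodePoint(seq[0])
      : seq.map((cp) => String.fromCodePoint(cp))
  );
}

console.log(lookupConfusables("A", sampleData));  // three single characters
console.log(lookupConfusables(0x6d, sampleData)); // one nested pair: r, n
```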