Simple Anti-Forensic and Signature stamping techniques using Unicode
June 7th, 2009 | Published in Documents, Forensics by Alfredo Panzera
The availability of Unicode characters (such as Persian, Cyrillic and Arabic characters) has created both a simple means of fingerprinting intellectual property (signature stamping) and a very simple steganographic data hiding technique.
The following is an extract from the Cyrillic Unicode character set [1].
Unicode # Character
0410 А CYRILLIC CAPITAL LETTER A
0430 а CYRILLIC SMALL LETTER A
0412 В CYRILLIC CAPITAL LETTER VE
0415 Е CYRILLIC CAPITAL LETTER IE
0435 е CYRILLIC SMALL LETTER IE
041C М CYRILLIC CAPITAL LETTER EM
041E О CYRILLIC CAPITAL LETTER O
043E о CYRILLIC SMALL LETTER O
0420 Р CYRILLIC CAPITAL LETTER ER
0440 р CYRILLIC SMALL LETTER ER
0422 Т CYRILLIC CAPITAL LETTER TE
0443 у CYRILLIC SMALL LETTER U
0405 Ѕ CYRILLIC CAPITAL LETTER DZE (this is the Old Cyrillic zelo – Macedonian)
0455 ѕ CYRILLIC SMALL LETTER DZE
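To make the signature stamping idea concrete, here is a minimal sketch in Python. It assumes the cover text is plain Latin text with no genuine Cyrillic in it, and it uses only the look-alike pairs listed above. The names `HOMOGLYPHS`, `stamp` and `read_stamp` are hypothetical and chosen for illustration; this is one possible encoding, not a standard scheme.

```python
# Latin -> Cyrillic look-alikes taken from the table above (not exhaustive).
HOMOGLYPHS = {
    "A": "\u0410", "a": "\u0430", "B": "\u0412", "E": "\u0415",
    "e": "\u0435", "M": "\u041C", "O": "\u041E", "o": "\u043E",
    "P": "\u0420", "p": "\u0440", "T": "\u0422", "y": "\u0443",
    "S": "\u0405", "s": "\u0455",
}
REVERSE = {cyr: lat for lat, cyr in HOMOGLYPHS.items()}


def stamp(text, bits):
    """Hide a bit string in text: 1 = use the Cyrillic look-alike, 0 = keep Latin."""
    out, remaining = [], list(bits)
    for ch in text:
        if ch in HOMOGLYPHS and remaining:
            bit = remaining.pop(0)
            out.append(HOMOGLYPHS[ch] if bit == "1" else ch)
        else:
            out.append(ch)
    if remaining:
        raise ValueError("cover text has too few substitutable characters")
    return "".join(out)


def read_stamp(text, nbits):
    """Recover the first nbits hidden bits from a stamped text."""
    bits = []
    for ch in text:
        if len(bits) == nbits:
            break
        if ch in REVERSE:        # Cyrillic look-alike found: bit 1
            bits.append("1")
        elif ch in HOMOGLYPHS:   # Latin carrier left untouched: bit 0
            bits.append("0")
    return "".join(bits)


cover = "Some property report"
stamped = stamp(cover, "1011")
print(stamped == cover)        # False, although the two strings render alike
print(read_stamp(stamped, 4))  # 1011
```

The stamped text has the same length as the cover text and looks the same on screen, but a byte-level comparison or a hex dump shows the difference immediately.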
The basic Latin character table contains symbols that look the same as the Cyrillic characters listed above. The difference is that the displayed glyph appears identical while the underlying code point is not. An attacker can exploit this to carry out a phishing attack using a look-alike domain name, now that the registration of domain names containing Unicode characters has been allowed. For instance, the following domains are distinctly different, but appear the same:
\x004D\x0069\x0063\x0072\x006F\x0073\x006F\x0066\x0074\x002E\x0063\x006F\x006D
\x041C\x0069\x0441\x0072\x043E\x0455\x043E\x0066\x0074\x002E\x0063\x006F\x006D
| Unicode Mixed Characters | Latin Characters |
| 041C М CYRILLIC CAPITAL LETTER EM | 004D M LATIN CAPITAL LETTER M |
| 0069 i LATIN SMALL LETTER I | 0069 i LATIN SMALL LETTER I |
| 0441 с CYRILLIC SMALL LETTER ES | 0063 c LATIN SMALL LETTER C |
| 0072 r LATIN SMALL LETTER R | 0072 r LATIN SMALL LETTER R |
| 043E о CYRILLIC SMALL LETTER O | 006F o LATIN SMALL LETTER O |
| 0455 ѕ CYRILLIC SMALL LETTER DZE | 0073 s LATIN SMALL LETTER S |
| 043E о CYRILLIC SMALL LETTER O | 006F o LATIN SMALL LETTER O |
| 0066 f LATIN SMALL LETTER F | 0066 f LATIN SMALL LETTER F |
| 0074 t LATIN SMALL LETTER T | 0074 t LATIN SMALL LETTER T |
| 002E . FULL STOP | 002E . FULL STOP |
| 0063 c LATIN SMALL LETTER C | 0063 c LATIN SMALL LETTER C |
| 006F o LATIN SMALL LETTER O | 006F o LATIN SMALL LETTER O |
| 006D m LATIN SMALL LETTER M | 006D m LATIN SMALL LETTER M |
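The comparison above can be checked in a few lines of Python. This is a rough sketch rather than a complete detector: it builds the two domain strings from the code points in the table (with U+0455 standing in for the "s") and treats the first word of each character's Unicode name as its script, which is a crude heuristic. The helper names `scripts_used` and `is_mixed_script` are hypothetical.

```python
import unicodedata

# The two domains from the table above, built from their code points.
latin = "\u004D\u0069\u0063\u0072\u006F\u0073\u006F\u0066\u0074\u002E\u0063\u006F\u006D"
mixed = "\u041C\u0069\u0441\u0072\u043E\u0455\u043E\u0066\u0074\u002E\u0063\u006F\u006D"


def scripts_used(label):
    """Return the script names seen among the letters of a label.

    Heuristic: the first word of the Unicode character name, for example
    'LATIN' or 'CYRILLIC'. Digits and punctuation are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(label):
    """Flag labels whose letters come from more than one script."""
    return len(scripts_used(label)) > 1


print(latin == mixed)                 # False
print(sorted(scripts_used(latin)))    # ['LATIN']
print(sorted(scripts_used(mixed)))    # ['CYRILLIC', 'LATIN']
print(is_mixed_script(mixed))         # True
```

A name that mixes Latin and Cyrillic letters is not proof of an attack, but it is a cheap flag that tells an examiner where to look more closely.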
Think of file names as well. Windows allows file names to be created using Unicode characters. Hence, if you are looking for a file called “cat.txt”, a simple string search will miss a “саt.txt” defined using the following Unicode: (\x0441\x0430\x0074\x002E\x0074\x0078\x0074). Here the first two characters are CYRILLIC SMALL LETTER ES and CYRILLIC SMALL LETTER A, and the rest are Latin. I have linked a site that does online Unicode conversions and display.
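One way to catch this kind of disguised file name is to fold known look-alikes back to Latin before comparing. The sketch below is a minimal illustration in Python: the mapping covers only the Cyrillic characters mentioned in this article, and the names `TO_LATIN`, `fold` and `find_lookalikes` are hypothetical. A real examination would need a far larger table, such as the confusables data published by the Unicode Consortium.

```python
# Cyrillic -> Latin look-alikes mentioned in this article (not exhaustive).
TO_LATIN = str.maketrans({
    "\u0410": "A", "\u0430": "a", "\u0412": "B", "\u0415": "E",
    "\u0435": "e", "\u041C": "M", "\u041E": "O", "\u043E": "o",
    "\u0420": "P", "\u0440": "p", "\u0422": "T", "\u0443": "y",
    "\u0405": "S", "\u0455": "s",
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES, used in the examples
})


def fold(name):
    """Replace known Cyrillic look-alikes with their Latin counterparts."""
    return name.translate(TO_LATIN)


def find_lookalikes(target, names):
    """Return (name, is_exact) for every name that folds to the target."""
    return [(n, n == target) for n in names if fold(n) == target]


disguised = "\u0441\u0430\u0074\u002E\u0074\u0078\u0074"
names = ["cat.txt", disguised, "dog.txt"]

print("cat.txt" in disguised)   # False: the simple string search misses it
for name, is_exact in find_lookalikes("cat.txt", names):
    print(ascii(name), "exact" if is_exact else "look-alike")
```

Printing with `ascii()` shows the escaped code points, which makes the substituted characters visible where the rendered name would hide them.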
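An issue with trying to uncover all versions and possible combinations is that the problem quickly becomes computationally infeasible: every character that has a look-alike at least doubles the number of candidate strings, so the search space grows exponentially with the length of the name. There are more ways to hide data than there are simple string searches to find it. This means that we as forensic professionals need to use our greatest tool: our brain. Things are not always as they seem.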
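To get a feel for how fast the combinations grow, the following Python sketch counts and enumerates the variants of a string when each substitutable character can be either Latin or its Cyrillic look-alike. It assumes only one look-alike per character and uses a small hypothetical map (`LOOKALIKES`); in practice many characters have several look-alikes across different scripts, so the real numbers are larger.

```python
from itertools import product

# One Cyrillic look-alike per Latin character; a deliberately small map.
LOOKALIKES = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043E",
    "p": "\u0440", "s": "\u0455", "y": "\u0443", "M": "\u041C",
}


def variant_count(text):
    """Number of look-alike spellings: 2 choices per substitutable character."""
    count = 1
    for ch in text:
        if ch in LOOKALIKES:
            count *= 2
    return count


def variants(text):
    """Lazily generate every look-alike spelling of text."""
    choices = [
        (ch, LOOKALIKES[ch]) if ch in LOOKALIKES else (ch,)
        for ch in text
    ]
    return ("".join(combo) for combo in product(*choices))


print(variant_count("cat.txt"))        # 4
print(variant_count("Microsoft.com"))  # 128
print(len(set(variants("cat.txt"))))   # 4
```

Seven substitutable characters already give 128 spellings of one short domain name, which is why folding look-alikes to a common form before searching scales better than searching for every variant.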
[1] Unicode Character Table: Cyrillic
http://jrgraphix.net/research/unicode_blocks.php?block=8
Author: Craig Wright is a Director with Information Defense in Australia. He holds both the GSE-Malware and GSE-Compliance certifications from GIAC. He is a perpetual student with numerous postgraduate degrees, including an LLM specializing in international commercial law and ecommerce law, and is working on his 4th IT-focused Masters degree (Masters in System Development) from Charles Sturt University, where he is helping to launch a Masters degree in digital forensics. He starts his second doctorate, a PhD on the quantification of information system risk, at CSU in April this year.
