About this event
Existing deobfuscation techniques usually target specific obfuscation passes and assume
prior knowledge of where the obfuscation is located within a program. Some approaches also
tend to be computationally costly. Conversely, few studies consider bypassing obfuscation by
correlating several variants of the same obfuscated program, or a clear program with a
later obfuscated variant. Both scenarios are common in IP protection and in malware
analysis. We formalize an attacker model that targets obfuscation by exploiting knowledge
transfer between binaries through a dedicated binary diffing algorithm, which leverages
both intra-procedural structure (CFG) and inter-procedural structure (call graph),
combined through message passing. The associated tool, QBinDiff, exhibits better results
than state-of-the-art diffing tools.
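To give an intuition for how intra-procedural and inter-procedural information can be combined, here is a minimal toy sketch. It is not QBinDiff's algorithm or API: the feature tuples, the `alpha` weight, the number of rounds, the simple neighbor-averaging update and the greedy assignment are all assumptions made for illustration. Each function is described by hypothetical CFG features (block count, edge count), and similarity scores are repeatedly refined using the scores of the functions each one calls.

```python
from itertools import product

def feature_similarity(feat_a, feat_b):
    # Intra-procedural part: L1 distance between CFG features, mapped to (0, 1].
    dist = sum(abs(x - y) for x, y in zip(feat_a, feat_b))
    return 1.0 / (1.0 + dist)

def neighbor_agreement(sim, callees_a, callees_b):
    # Inter-procedural part: how well the callees of two functions match.
    if not callees_a and not callees_b:
        return 1.0  # two leaf functions are structurally compatible
    if not callees_a or not callees_b:
        return 0.0
    best = [max(sim[(x, y)] for y in callees_b) for x in callees_a]
    return sum(best) / len(best)

def diff(features_a, calls_a, features_b, calls_b, alpha=0.5, rounds=3):
    # Base scores come from CFG features only.
    base = {(a, b): feature_similarity(fa, fb)
            for (a, fa), (b, fb) in product(features_a.items(), features_b.items())}
    sim = dict(base)
    # Each round mixes the base score with the scores of call-graph neighbors.
    for _ in range(rounds):
        sim = {(a, b): (1 - alpha) * base[(a, b)]
               + alpha * neighbor_agreement(sim, calls_a.get(a, ()), calls_b.get(b, ()))
               for (a, b) in base}
    # Greedy one-to-one assignment, highest score first (an assumption, not optimal).
    matches, used = {}, set()
    for (a, b), _score in sorted(sim.items(), key=lambda kv: -kv[1]):
        if a not in matches and b not in used:
            matches[a] = b
            used.add(b)
    return matches

# Hypothetical clear program and obfuscated variant (names stripped in the variant).
features_clear = {"main": (3, 2), "parse": (5, 6), "check": (4, 4)}
calls_clear = {"main": ("parse", "check")}
features_obf = {"f0": (3, 2), "f1": (9, 12), "f2": (4, 4)}
calls_obf = {"f0": ("f1", "f2")}

matches = diff(features_clear, calls_clear, features_obf, calls_obf)
print(matches)
```

In this toy input, `f1` stands for an obfuscated `parse` whose CFG has been inflated; it is still paired with `parse` because the other two functions are matched first and the assignment is one-to-one.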
When an adversary cannot find multiple variants of the same program, deobfuscation may be necessary, which in turn requires locating the obfuscated code. Locating it requires distinguishing obfuscated code from genuine code. We analyze both the binary classification problem, namely determining whether a function is obfuscated, and the multi-class problem, namely determining which pass was applied. Our baseline reaches an F1-score of 0.95 at the function level for the binary problem. We show early results for the multi-class problem using several base representations and several algorithms, including RandomForest, GradientBoosting and various GNNs.
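For readers unfamiliar with the metric, the function-level F1-score mentioned above is the harmonic mean of precision and recall over per-function predictions. The sketch below computes it on made-up labels (1 for obfuscated, 0 for genuine); the data is purely illustrative and unrelated to the results reported in the talk.

```python
def f1_score(y_true, y_pred, positive=1):
    # Count true positives, false positives and false negatives per function.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive prediction: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground truth and predictions for eight functions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)
print(score)
```

Here three obfuscated functions are detected, one is missed and one genuine function is flagged, so precision and recall are both 0.75.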
Hosted by
The CEA is a research organization working on defense and security, nuclear and renewable energies, technological research for industry, and fundamental research in the physical and life sciences.