LAS VEGAS—Over the past five years, antivirus firms and malware-analysis startups have created automated systems for identifying and categorizing malware. Yet such closed systems frequently miss the newest malware and rarely let security researchers see the results.
At the Black Hat security conference here on Aug. 7, a group of researchers announced an open platform for automated malware analysis that uses big data techniques to match any submitted sample with known malware families. The platform is meant to help security analysts and researchers trace the lineage of code, and it could aid in attributing attacks.
Dubbed Cynomix, the service combines four types of analysis to identify shared code and detect new or evolving techniques, while scaling to a collection of hundreds of millions of malware samples, Joshua Saxe, associate research director for data science at Invincea Labs, told Black Hat attendees.
By matching a piece of malware to a family of similar programs, the automated analysis can help researchers speed their reverse engineering of samples, he said.
“If I’m a biologist and I’ve done all this work on horses and then I learn that there is an animal known as a zebra, I don’t want to have to redo all my work,” Saxe said. “The same goes for malware.”
The Cynomix project comes as antivirus companies have increasingly automated their analysis of malware, reacting to a rapidly expanding number of variants of malicious code.
In the early 2000s, attackers adopted techniques for quickly generating code variants that escape standard antivirus scanners. As a result, the number of unique variants of Trojans, viruses and other malicious code has skyrocketed from fewer than 10 million to more than 250 million today, according to antivirus testing lab AV-Test.
The growing volume of malware has made analysis more difficult and forced companies to rely more heavily on automated techniques and machine learning. A number of cloud-based malware analysis services have emerged that let companies analyze the binaries they encounter. Services such as ThreatGRID, Anubis from the International Secure Systems Lab, and Malwr allow analysts to compare features of samples or run potentially malicious programs and observe their runtime behavior.
Cynomix applies four types of analysis and uses machine learning algorithms to combine their results and improve accuracy, according to Invincea’s Saxe. The system compares static file attributes, dynamic runtime behavior, instruction-level features and program metadata to group similar threats into clusters.
“The intuition in using all of these [different techniques] is that a malware author can defeat one of these methods, but it is much harder for a malware author to defeat all four,” Saxe said.
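To make Saxe’s point concrete, here is a minimal Python sketch of combining several views of a sample. It is not Cynomix’s method; the article does not describe its algorithms. It assumes each sample has already been reduced to four sets of feature strings, one per analysis type. Pairs are scored by averaging Jaccard similarity across the four sets with equal weights, and samples are grouped with single-linkage clustering at a hypothetical 0.5 threshold. The function names, sample names and feature strings are all invented for illustration.

```python
from itertools import combinations

# One feature set per analysis type mentioned in the article (names are hypothetical).
VIEWS = ("static", "dynamic", "instructions", "metadata")


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of features two sets have in common (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined_similarity(x: dict, y: dict) -> float:
    """Average the per-view Jaccard scores; equal weighting is an assumption."""
    return sum(jaccard(x[v], y[v]) for v in VIEWS) / len(VIEWS)


def cluster(samples: dict[str, dict], threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage grouping: link any pair whose combined score >= threshold."""
    parent = {name: name for name in samples}

    def find(n: str) -> str:
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(samples, 2):
        if combined_similarity(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in samples:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Invented feature data: the repacked variant changes only its static features.
samples = {
    "sample_a": {
        "static": {"import:wininet", "section:.upx0", "string:cmd.exe"},
        "dynamic": {"reg:run_key", "net:http_post", "file:drop_exe"},
        "instructions": {"xor_loop", "call_getprocaddress", "rc4_ksa"},
        "metadata": {"compiler:msvc", "lang:ru"},
    },
    "sample_a_repacked": {
        "static": {"import:kernel32", "section:.text2"},
        "dynamic": {"reg:run_key", "net:http_post", "file:drop_exe"},
        "instructions": {"xor_loop", "call_getprocaddress", "rc4_ksa"},
        "metadata": {"compiler:msvc", "lang:ru"},
    },
    "sample_b": {
        "static": {"import:ws2_32"},
        "dynamic": {"net:irc_join"},
        "instructions": {"aes_sbox"},
        "metadata": {"compiler:gcc"},
    },
}

if __name__ == "__main__":
    print(cluster(samples))
```

Because `sample_a_repacked` alters only its static features, it still scores 0.75 against `sample_a` and lands in the same cluster. This mirrors the intuition in the quote: defeating one view of the sample is not enough to break the match.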
Researchers can submit malware to Cynomix and use the system to find which of a sample’s features it shares with other malware. The service handles obfuscation as well as the larger challenge of comparing features across a database of hundreds of millions of malware variants, he said. Invincea plans to use the data generated by the project to improve its products and speed detection of new malware families, Saxe said.
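The article does not say how Cynomix compares features at this scale. A widely used technique for approximate set similarity over very large collections is MinHash combined with locality-sensitive hashing (LSH). The Python sketch below illustrates that technique only. The signature length (128), the band count (32) and all function names are illustrative assumptions, not details of Cynomix.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 128  # signature length (illustrative choice)
BANDS = 32        # 32 bands of 4 rows each (illustrative choice)


def _seeded_hash(seed: int, feature: str) -> int:
    """Deterministic 64-bit hash; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(f"{seed}:{feature}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(features: set[str], num_hashes: int = NUM_HASHES) -> list[int]:
    """For each hash function, keep the smallest hash over a non-empty feature set."""
    return [min(_seeded_hash(seed, f) for f in features) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The share of positions where two signatures agree estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures: dict[str, list[int]], bands: int = BANDS) -> set[tuple[str, str]]:
    """LSH banding: samples whose signatures agree on every row of some band share a bucket."""
    buckets: dict[tuple, set[str]] = defaultdict(set)
    for name, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs: set[tuple[str, str]] = set()
    for members in buckets.values():
        # Buckets with a single member produce no pairs.
        pairs.update(combinations(sorted(members), 2))
    return pairs


if __name__ == "__main__":
    sigs = {
        "sample_a": minhash({f"feat{i}" for i in range(100)}),
        "sample_a_variant": minhash({f"feat{i}" for i in range(10, 110)}),
        "sample_b": minhash({f"other{i}" for i in range(100)}),
    }
    print(candidate_pairs(sigs))
    print(estimate_jaccard(sigs["sample_a"], sigs["sample_a_variant"]))
```

Only samples that share an identical band of their signatures become candidate pairs, so the exact, more expensive comparison runs on a small fraction of the collection rather than on every possible pair.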