Cybersecurity researchers have uncovered two malicious machine learning (ML) models on Hugging Face that leveraged an unusual technique of “broken” pickle files to evade detection.
ReversingLabs researcher Karlo Zanki said in a report shared with The Hacker News that “the pickle files extracted from the mentioned PyTorch archives revealed the malicious Python content at the beginning of the file.” “In both instances, the malicious payload was a typical platform-aware reverse shell that connects to a hard-coded IP address.”
The method has been dubbed nullifAI, as it involves deliberate attempts to bypass existing safeguards put in place to identify malicious models. The Hugging Face repositories are listed below:
- glockr1/ballr7
- who-r-u0000/0000000000000000000000000000000000000
It’s believed that the models are more of a proof-of-concept (PoC) than an active supply chain attack scenario.
The Pickle serialization format, which is widely used for distributing ML models, has repeatedly been shown to be a security risk, as it enables arbitrary code execution as soon as a file is loaded and deserialized.
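The risk is straightforward to demonstrate. The following minimal sketch (a harmless illustration, not code from the discovered models; the class name and echo command are invented for the example) abuses Pickle’s `__reduce__` hook so that merely deserializing the bytes runs a shell command:

```python
import os
import pickle

class Payload:
    # pickle calls __reduce__ to learn how to rebuild the object;
    # returning (os.system, (cmd,)) makes deserialization itself
    # run the command. A harmless echo stands in for a real payload.
    def __reduce__(self):
        return (os.system, ("echo arbitrary code ran at load time",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # prints the message: no attribute access needed
```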
The two models spotted by the cybersecurity firm are stored in the PyTorch format, which is essentially a compressed Pickle file. While PyTorch uses the ZIP format for compression by default, the identified models were compressed using the 7z format, a mismatch visible in the sketch below.
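Since ZIP and 7z containers start with different magic bytes, the mismatch is easy to spot from the first few bytes of a file. A rough sketch, where the file name is hypothetical:

```python
# Minimal sketch: tell a ZIP-based PyTorch checkpoint apart from a
# 7z archive by their leading magic bytes. "model.pth" is a
# hypothetical file name, not one of the discovered models.
ZIP_MAGIC = b"PK\x03\x04"              # standard torch.save() output
SEVENZ_MAGIC = b"7z\xbc\xaf\x27\x1c"   # 7z container signature

def archive_kind(path: str) -> str:
    with open(path, "rb") as f:
        header = f.read(6)
    if header.startswith(ZIP_MAGIC):
        return "zip (expected for a PyTorch model)"
    if header.startswith(SEVENZ_MAGIC):
        return "7z (unexpected; worth a closer look)"
    return "unknown"

print(archive_kind("model.pth"))
```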
As a consequence of this behavior, the models were able to fly under the radar and avoid being flagged as malicious by Picklescan, a tool Hugging Face uses to detect suspicious Pickle files.
The object serialization, which is the purpose of the Pickle file, breaks shortly after the malicious payload is executed, causing the object’s decompilation to fail, according to Zanki.
Further analysis revealed that such broken pickle files can still be partially deserialized owing to a discrepancy between Picklescan and how deserialization actually works, resulting in the malicious code being executed even though the tool throws an error message. The open-source utility has since been updated to fix this bug.
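A rough sketch of the mechanics, with a harmless print() call standing in for the reverse shell: the payload opcodes sit at the front of a hand-built stream, execute during deserialization, and only afterwards does the broken remainder raise an error:

```python
import pickle

# Hand-built protocol-0 stream: a GLOBAL/TUPLE/REDUCE sequence that
# calls print() (standing in for the reverse shell), followed by
# junk bytes instead of a valid STOP opcode.
broken = (
    b"cbuiltins\nprint\n"        # GLOBAL: push builtins.print
    b"(S'payload executed'\nt"   # MARK, STRING, TUPLE: build the args
    b"R"                         # REDUCE: call print(...) right now
    b"\x01\x02"                  # invalid opcodes: the stream "breaks"
)

try:
    pickle.loads(broken)
except pickle.UnpicklingError as exc:
    # print() already ran before the loader hit the junk bytes.
    print(f"deserialization failed only afterwards: {exc}")
```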
Zanki explained that this behavior stems from the fact that object deserialization is performed on Pickle files sequentially.
“Pickle opcodes are executed as they are encountered, until all opcodes are executed or a broken instruction is encountered. Because the malicious payload is inserted at the beginning of the Pickle stream in the discovered model, Hugging Face’s existing security scanning tools wouldn’t be able to flag it as unsafe.”
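The same sequential behavior can be observed statically with Python’s standard pickletools disassembler, which walks the identical stream opcode by opcode and gives up only when it reaches the invalid instruction:

```python
import pickletools

# The same broken stream as in the previous sketch: the disassembler
# also proceeds opcode by opcode and aborts at the invalid byte.
broken = b"cbuiltins\nprint\n(S'payload executed'\ntR\x01\x02"

try:
    pickletools.dis(broken)  # lists GLOBAL, MARK, STRING, TUPLE, REDUCE...
except ValueError as exc:
    print(f"disassembly aborts: {exc}")
```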