David is a computer security researcher with over 18 years of experience in malware analysis and antivirus software evaluation. He runs the Privacy-PC.com project, which presents expert opinions on contemporary information security matters, including social engineering, penetration testing, threat intelligence, online privacy, and white hat hacking. He has a strong malware troubleshooting background, with a recent focus on ransomware countermeasures.
AI at the service of cybercriminals
Malicious use of artificial intelligence (AI) and machine learning is a particularly unsettling trend because it allows threat actors to make their attacks more sophisticated and far harder to detect.
Let’s explore the likely scenarios of AI misuse and how cybercriminals may exploit the technology in the near future.
In 2018, analysts at IBM described a new approach to “weaponising” AI. They created a system called DeepLocker that incorporates AI capabilities into malware to enhance its anti-detection mechanisms. This was achieved by exploiting a common characteristic of all AI systems – their ability to operate in obscure ways.
The role of malware boils down to ensuring that malicious activity takes place exactly where the attackers need it to occur. That’s why IT environment monitoring and target identification are among the most important functions of any malware. This set of checks is pretty much the same for all malware strains, so the checks serve as telltale signs that allow antivirus and protection systems to identify the code as a threat.
DeepLocker’s authors came up with a way to cloak malware behaviour by performing these checks via a Deep Neural Network (DNN) trained to recognise a specific target. As a result, the malware flies under the radar because the checks look like a series of simple mathematical operations between matrices.
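To see why such a check blends in, consider what a small neural network does at run time. The following minimal sketch in plain Python is not DeepLocker code, and its weights are arbitrary, hypothetical values. It only illustrates that a forward pass is a chain of matrix-vector products and simple element-wise functions, with no readable conditions for an analyst to inspect.

```python
# Illustrative only: a tiny two-layer network with arbitrary, hypothetical weights.

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    """Element-wise max(0, x), a common activation function."""
    return [max(0.0, x) for x in vector]

def forward(x, w1, w2):
    """Forward pass: two matrix products separated by a ReLU."""
    hidden = relu(matvec(w1, x))
    return matvec(w2, hidden)

W1 = [[0.5, -1.0], [1.5, 2.0]]   # hypothetical first-layer weights
W2 = [[1.0, -0.5]]               # hypothetical output-layer weights

output = forward([2.0, 1.0], W1, W2)
print(output)
```

Nothing in `W1` or `W2` says what input the network responds to; that knowledge is spread across the numbers themselves.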
When there is no need to explicitly encode malware launch conditions, a sample is much more difficult to analyse. For instance, DeepLocker uses many attributes to identify the target machine, including software environment, geolocation, and even multimedia content, thus making it even more problematic to predict and determine malware execution conditions.
Moreover, DeepLocker creators added another layer of obfuscation: its built-in neural network does not output a clear-cut activation value, nor does it provide an answer to the question: “Am I on the target machine?” Instead, the program calculates the key to decrypt the payload. But this key will only be valid when the neural network positively identifies the target.
Therefore, any explicit information about the target is hidden in the network’s weight matrices, making reverse-engineering and target identification nearly infeasible. Until the moment the malware hits the target, the rest of its code remains encrypted and cannot be analysed.
The complexity of developing programs with this kind of functionality has been a big roadblock for most malware developers so far. However, as technologies and frameworks evolve, we can expect to see a new generation of dangerous and almost undetectable malware appearing on a massive scale sometime soon.
Mimicking human beings
One of the most common methods of malicious monetisation is to use bots that pose as real users of social networks and other mainstream digital platforms. The makers of such bots try to dupe anti-fraud systems to boost the popularity of a song on Spotify or an app on Google Play. They may also use them to inflate the subscriber count on an Instagram, Twitter, or Facebook account, or generate fake likes, views, and reviews.
Since the efficiency of ordinary bots is low and platforms’ algorithms can detect bogus subscribers in a snap, recruiting human users to perform such tasks is a common practice. However, this approach requires significant budgets to pay people for their work, so an AI system that realistically mimics human behaviour would be a breath of fresh air to spammers and phishers.
The use of AI will go beyond merely imitating the behaviour of a real person; the AI will actively subscribe to various accounts, interact with them, and add fake likes to them. Trained “neurobots” will be able to maintain long-term activity on rogue social network accounts, creating realistic profiles with years of history.
Social engineering of the future
With growing cyber awareness and better cybersecurity tools, fraudulent “spray and pray” emails are no longer very effective. Although the low cost of such campaigns still allows them to turn a profit quickly, criminals would be happy to harness any technology that raises their success rate.
AI can automatically analyse a database of potential victims’ email addresses, discard known-inactive records, and quickly perform open-source intelligence (OSINT) for the rest by examining the information available in public sources. This mechanism makes it possible to compile psychological profiles and select recipients who are more susceptible to deception.
If the campaign involves correspondence with the victim, machine learning algorithms can predict the likely responses of the target and choose the most effective strategy for further communication.
Another use case involves manipulating KYC (Know Your Customer) procedures used by financial institutions for client verification. The KYC workflow usually consists of three stages:
Identification – the customer provides various documents; a bank employee checks that the customer’s appearance matches the photos and may also take additional photographs, record voice samples, or take fingerprints.
Verification – checking the validity of the data provided by the client.
Authentication – the customer must sign in to the online banking system with a username and password, answer a security question or provide biometric data.
Since the start of the pandemic, many banks have performed customer identification remotely using video conferencing software. AI will allow con artists to manipulate real-time video streams and thereby fool such verification systems.
By implementing voice and face synthesis and deepfakes, criminals will be able to get around these checks on a large scale and create new bank accounts or make fraudulent payments. This is a serious concern because video conferencing is currently one of the recommended identification mechanisms for the banking industry in several countries.
Scouring stolen databases for valuable information
Data has become the new gold. It comes as no surprise that nearly all ransomware operators steal their victims’ files in addition to encrypting them. As a result, they get extra leverage in ransom negotiations by threatening to leak the stolen data if the infected organisation refuses to pay for decryption.
From an attacker’s perspective, the problem with such a tactic is that the amount of illegally retrieved information is huge. Finding something truly valuable in terabytes of data is easier said than done. This is an area where AI can help them: a trained neural network can identify and extract structured information from unstructured documents. For instance, one of the implementations of the Named Entity Recognition (NER) technology can be applied here.
NER is a branch of language processing technology geared toward finding specific categories of words and phrases in speech and text. It was originally intended to pinpoint geographical names, addresses, as well as the names of people and organisations. The concept has expanded considerably over time. Nowadays, NER makes it possible to find relative and absolute dates along with other types of numeric data.
This means that the technology can find credit card details, phone numbers, and bank account numbers in huge volumes of text. AI that uses NER to spot valuable data inside stolen documents will likely become another tool in cybercriminals’ toolkit.
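As a rough illustration of why numeric entities of this kind are easy to pick out, the sketch below flags card-like numbers in free text with a regular expression and the Luhn checksum. It is a much simpler stand-in for a trained NER model, not an NER implementation, and the function names and sample text are hypothetical. Data loss prevention tools apply the same idea defensively to locate sensitive data before it leaks.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:      # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0

def find_card_numbers(text):
    """Return card-like numbers found in text that also pass the Luhn check."""
    found = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found

# Made-up text: the first number is a well-known test card number,
# the second has the right shape but fails the checksum.
sample = "Invoice 2021-04: paid with 4111 1111 1111 1111, ref 1234 5678 9012 3456."
hits = find_card_numbers(sample)
print(hits)
```

The checksum step matters: the pattern alone matches any long run of digits, while the Luhn test filters out most strings that merely look like card numbers.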
Data poisoning attacks against deep learning systems
Since a neural network’s behaviour is determined by the data it learns from, we can expect an entirely new attack vector targeting this technology – data poisoning. The data at the core of such a network comprises huge arrays of numbers, which can be supplemented and modified as the algorithm operates. A neural network depends on this data to generate results in response to a particular query, and even its creator often cannot explain the whys and wherefores of these results.
Criminals may see this problem as an exploitation opportunity. Unauthorised modification of a legitimate program or library is relatively easy to identify based on discrepancies in the code or behavioural deviations in its execution. However, detecting anomalies in a deep learning model is difficult because of its lack of explainability. Consequently, future malware could impact host machines by directly targeting, filtering, or modifying model data rather than libraries or APIs. A model change made by malware is nearly impossible to distinguish from the tweaks introduced by a legitimate model update.
Dark web forums are already full of threads about introducing a kind of “poisonous injection” into deep learning data, which would make a neural network behave in a specific way in certain situations. Although these discussions mostly remain in the realm of theory so far, the logic of such attacks has already been formulated. Therefore, real-world implementations are probably just a matter of time.
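The underlying effect is easy to reproduce on a toy model. The sketch below is not a neural network: it assumes a hypothetical one-dimensional nearest-centroid classifier with made-up data and labels. It shows how a handful of mislabelled training points can flip the prediction for a chosen input while the program code stays exactly the same.

```python
def centroids(samples):
    """Compute the mean feature value per label from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, value):
    """Return the label whose centroid lies closest to value."""
    return min(model, key=lambda label: abs(model[label] - value))

# Hypothetical clean training data with two well-separated classes.
clean = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
         (7.0, "high"), (8.0, "high"), (9.0, "high")]

# Poisoned copy: three extra samples with large values, mislabelled as "low".
poisoned = clean + [(9.0, "low"), (9.0, "low"), (9.0, "low")]

query = 6.5
before = predict(centroids(clean), query)
after = predict(centroids(poisoned), query)
print(before, after)
```

Only the training data changed between the two runs, which is why code-level integrity checks alone would not notice this kind of manipulation.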
Will abandoning AI help?
The increasing number of threats associated with the abuse of neural networks has caused many researchers to advocate restrictions on access to this technology, government oversight of AI usage, and heavier penalties for criminal exploitation of deep learning and related areas. Would such limitations do any real good, though?
Malefactors are already operating outside the law, often cashing in on illegal access to classified technologies to carry out their campaigns. Therefore, restrictions will most likely hamper the development of this socially useful technology rather than curb individuals who plan to use it maliciously.
It will be much more effective to analyse the present and future threats associated with the criminal use of neural networks and to develop countermeasures that themselves involve deep learning systems. One of the biggest challenges here is protecting model data, for example through a checksum-based integrity monitoring system.
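A minimal sketch of such a check follows. It assumes the model is stored as a single file and that a known-good SHA-256 digest was recorded at release time through a trusted channel. The file name and helper names are hypothetical, and a real deployment would also have to protect the reference digest itself, for example by signing it.

```python
import hashlib
import hmac
import os
import tempfile

def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def model_is_intact(path, expected_hex):
    """Compare the current digest with the recorded one in constant time."""
    return hmac.compare_digest(file_sha256(path), expected_hex)

# Demo with a temporary stand-in for a model file.
with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "model.bin")   # hypothetical file name
    with open(model_path, "wb") as handle:
        handle.write(b"\x00\x01\x02\x03" * 1024)

    trusted_digest = file_sha256(model_path)          # recorded at release time
    ok_before = model_is_intact(model_path, trusted_digest)

    with open(model_path, "r+b") as handle:           # simulate one tampered byte
        handle.seek(10)
        handle.write(b"\xff")

    ok_after = model_is_intact(model_path, trusted_digest)

print(ok_before, ok_after)
```

A checksum detects any change to the file, but it cannot tell a malicious edit from a legitimate update, so the reference digest has to be refreshed as part of every authorised model release.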