If the dark web interests you, DarkBERT probably will too. Researchers from the Korea Advanced Institute of Science and Technology (KAIST), working with the data intelligence company S2W, have unveiled DarkBERT, described as the first language model of its kind: one trained on data collected from the dark web.

To understand what DarkBERT is and how it works, we first need to understand the dark web.

What Is the Dark Web?

The dark web is the part of the internet that standard search engines do not index and that can only be reached through anonymizing software such as Tor. A significant portion of it hosts illicit material and criminal activity. Its core purpose, though, is to keep internet activity private and anonymous, which serves legitimate and illegal uses alike: some people rely on it to get around government censorship, while others have used it for serious crimes.

The project’s objective is to build a tool that analyzes data sets and answers specific queries, not a chatbot like ChatGPT or Bard. The researchers wanted to find out whether using the dark web as a data source helps a language model better understand the language used there, which could make DarkBERT a useful tool for cybersecurity experts and law enforcement.

To help DarkBERT adapt to the language used on the dark web, the research team compiled a sizable corpus by crawling the Tor network. The team also applied deduplication, data filtering, and pre-processing, partly to ease ethical concerns about the sensitive information that dark web content contains.
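To make the deduplication and filtering steps more concrete, here is a minimal Python sketch. It is not the researchers' actual pipeline: the function names, the exact-match hashing approach, and the `MIN_WORDS` threshold are all assumptions chosen for illustration.

```python
import hashlib
import re

MIN_WORDS = 5  # assumed threshold, for illustration only


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_corpus(pages: list[str], min_words: int = MIN_WORDS) -> list[str]:
    """Drop very short pages and exact duplicates, keeping first occurrences."""
    seen: set[str] = set()
    kept: list[str] = []
    for page in pages:
        norm = normalize(page)
        if len(norm.split()) < min_words:
            continue  # filtering: too little text to be useful
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # deduplication: an identical page was already kept
        seen.add(digest)
        kept.append(page)
    return kept
```

Hashing the normalized text keeps memory use low on a large crawl, since only the digests are stored. A real pipeline would likely also need near-duplicate detection, which this sketch does not attempt.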

Over the course of 16 days, the model was trained on two sets of data. One of them was pre-processed so that sensitive details were redacted, such as victim organizations’ names, information about data leaks, threat statements, and illicit images. The data set included more than a thousand pages categorized as adult content.
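The article does not say how the redaction was carried out. As a rough illustration only, the sketch below assumes a simple pattern-based approach that swaps identifiers for placeholder tokens. The patterns, the `ID_EMAIL` and `ID_URL` tokens, and the function name are hypothetical, and redacting organization names or threat statements would need far more than regular expressions.

```python
import re

# Hypothetical patterns and placeholder tokens, chosen for illustration.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "ID_EMAIL"),
    (re.compile(r"https?://\S+"), "ID_URL"),
]


def mask_identifiers(text: str) -> str:
    """Replace each matched identifier with its placeholder token."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Replacing identifiers with fixed tokens, instead of deleting them, keeps the sentence structure intact for the model while removing the sensitive value itself.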

Because dark web content can be dangerous, DarkBERT will not be released to the general public for now. Requests to use the model for academic purposes are reportedly being accepted, however.

The researchers built DarkBERT as part of a broader effort to combat cybercrime, a fast-growing field of work that relies heavily on natural language processing. The model is meant to shed light on the intricate details of the digital underworld.

By delving into areas such as leaked data sharing and the illegal drug trade, DarkBERT offers a distinctive perspective on the ongoing fight against online crime, with worthwhile goals at its core.