This research project is part of my Master’s program at the University of San Francisco, where I collaborated with the AT&T Alien Labs team. I would like to share a new approach to automate the extraction of key details from cybersecurity documents. The goal is to extract entities such as country of origin, industry targeted, and malware name.
The data is obtained from the AlienVault Open Threat Exchange (OTX) platform:
Figure 1: The website otx.alienvault.com
The Open Threat Exchange is a crowd-sourced platform where, where users upload “pulses” which contain information about a recent cybersecurity threat. A pulse consists of indicators of compromise and links to blog posts, whitepapers, reports, etc. with details of the attack. The pulse normally contains a link to the full content (a blog post), together with key meta-data manually extracted from the full content (the malware…