JPL Creates World’s Largest PDF Archive to Aid Malware Research

siteadmin June 14, 2023

A team from NASA’s Jet Propulsion Laboratory (JPL) has developed a corpus of 8 million PDFs, the largest of its kind, to help identify potential online threats and improve PDF technology. The team built the corpus by web scraping, gathering a diverse range of PDFs that are publicly available online. The resulting dataset, hosted by the Digital Corpora project, will also aid privacy and software research.