Online and print directories of internet resources—often called “yellow pages” or “hotlists”—were popular among early web users in the 1990s. Since 2021, I have been collecting and digitizing “Internet directory” books—print reference books from the late 1990s and early 2000s that list web page URLs sorted into hierarchical categories with descriptions written by human curators. I initially focused on Chinese and English directories but later expanded my scope to include Japanese, Korean, Russian, and Spanish directories as well. Although most URLs in these books are no longer accessible, they provide researchers with a valuable corpus of historical URLs in various languages.
Internet directory books, especially those in languages other than English, are valuable because they list large numbers of URLs on diverse topics and were written for specific language audiences. Their physical format makes them better sources of historical URLs than born-digital compilations like Yahoo (partially archived by the Wayback Machine with snapshots from different times) and open-source directories like dmoz (which actively pruned inaccessible URLs). The print format, rightly criticized by users at the time as inflexible and unable to keep up with rapid changes on the web, happens to make these books valuable records of the URLs that were accessible around the date of publication and that appealed to audiences in different languages.
I am digitizing these books to create an aggregated database of historical URLs from the late 1990s to early 2000s. For each unique URL, I collect archived snapshot timestamps from the Wayback Machine, the number of its appearances across all books in my collection, and the title and description from each appearance. I am currently developing a web interface that allows users to query the contents of my collection and access these URLs through web archives.
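The per-URL aggregation described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the `UrlRecord` structure, the `normalize` rule (lowercase host, no trailing slash), and the sample entries in the usage note are all assumptions made for the example. The snapshot lookup uses the Wayback Machine's public CDX endpoint and is defined here but not called.

```python
import json
from dataclasses import dataclass, field
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen


@dataclass
class UrlRecord:
    """One unique URL with every appearance found across the books (hypothetical schema)."""
    url: str
    appearances: list = field(default_factory=list)  # (book, title, description) tuples
    snapshots: list = field(default_factory=list)    # Wayback timestamps, filled in later

    @property
    def count(self):
        # Number of times the URL appears across all books.
        return len(self.appearances)


def normalize(url):
    """Assumed normalization: add a missing scheme, lowercase the host, drop a trailing slash."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    query = "?" + parts.query if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}{query}"


def aggregate(entries):
    """Group (book, url, title, description) entries by normalized URL."""
    records = {}
    for book, url, title, description in entries:
        key = normalize(url)
        record = records.setdefault(key, UrlRecord(url=key))
        record.appearances.append((book, title, description))
    return records


def fetch_snapshot_timestamps(url, limit=50):
    """Ask the Wayback Machine CDX API for capture timestamps of a URL (needs network access)."""
    query = urlencode({"url": url, "output": "json", "fl": "timestamp", "limit": limit})
    with urlopen(f"https://web.archive.org/cdx/search/cdx?{query}", timeout=30) as resp:
        body = resp.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    return [row[0] for row in rows[1:]]  # the first row of the JSON output is a header
```

With made-up entries such as `("Book A", "http://WWW.Example.com/", "Example", "...")` and `("Book B", "http://www.example.com", "Ex", "...")`, `aggregate` would merge both into a single record with two appearances. A real pipeline would likely need more careful normalization, since print directories often omit the scheme or vary in capitalization.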