Performance Comparison of TOR Hidden Service Crawlers


Creative Commons License

ARISOY M., KÜÇÜKSİLLE E.

Bilecik Şeyh Edebali Üniversitesi Fen Bilimleri Dergisi, vol.6, no.2, pp.147-161, 2019 (Peer-Reviewed Journal)

Abstract

TOR (The Onion Router) is a network that has become popular in recent years because it provides anonymity to its users, and it is often preferred for hosting hidden services. Because privacy is essential there, the network attracts attention, and the amount of data stored on it grows day by day, making it difficult to scan and analyze. Various crawlers have been developed to scan the services (onion web pages) in this network. However, crawling here differs from crawling the surface web, because the TOR network sits in the lower layers beneath the surface web and its pages are accessed only through the TOR browser. To protect confidentiality, each request to an address is routed along a path selected through different relays. Reaching the target address over different relays on every request slows access down, and a low-performing crawler that retrieves information through TOR adds long waiting periods on top of that. Working with software that crawls and acquires information quickly will therefore improve researchers' analysis process. In this study, four crawlers were evaluated against various criteria, both to guide those who will conduct research in this field and to assess the strengths and weaknesses of the crawlers relative to one another. The study offers an important perspective for choosing the right crawler as a starting point for researchers who want to analyze TOR web services.