Mondrian Based Real Time Anonymization Model


Creative Commons License

CİVELEK İ., AYDIN M. A.

Bilecik Şeyh Edebali Üniversitesi Fen Bilimleri Dergisi, vol.8, no.1, pp.472-483, 2021 (Peer-Reviewed Journal)

Abstract

The presence of individuals' private information in the large data collections known as "Big Data" exposes personal privacy to disclosure attacks. To protect personal privacy in big data, anonymization methods are used so that data is created, stored, and shared in anonymous form. However, de-identified data cannot be restored to its original form. The aim of this study is to create a new method that provides instant de-identification and does not disrupt the data structure in the system. In the study, the Hadoop ecosystem was used to process large volumes of data. In the proposed model, user requests are processed in the Hadoop ecosystem through services in a middle layer, which returns anonymous data. The algorithm used for de-identification is optimized, and its results are compared with those of algorithms in the literature. The proposed model was observed to be user-friendly for querying and obtaining an anonymous data set. According to the analysis results, the de-identification algorithm used in the model works with 40% efficiency relative to other algorithms in terms of processing speed.
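For readers unfamiliar with the Mondrian approach named in the title, the following is a minimal sketch of Mondrian-style partitioning for k-anonymity. It is not the authors' optimized algorithm and does not involve Hadoop. The function names `mondrian` and `generalize`, the sample attributes, and the choice of k are all assumptions made for illustration. It also simplifies the usual method by comparing raw value ranges rather than normalized ones and by splitting at the median position (a relaxed split).

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def mondrian(records: List[Record], quasi_identifiers: Sequence[str], k: int) -> List[List[Record]]:
    """Recursively split records at the median until no split keeps both halves >= k."""
    def span(attr: str) -> float:
        values = [r[attr] for r in records]
        return max(values) - min(values)

    # Try attributes from the widest raw range to the narrowest (illustrative heuristic).
    for attr in sorted(quasi_identifiers, key=span, reverse=True):
        ordered = sorted(records, key=lambda r: r[attr])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        if len(left) >= k and len(right) >= k:
            return (mondrian(left, quasi_identifiers, k)
                    + mondrian(right, quasi_identifiers, k))
    # No allowable cut: this group becomes one equivalence class.
    return [records]


def generalize(partition: List[Record], quasi_identifiers: Sequence[str]) -> List[Record]:
    """Replace each quasi-identifier value with the partition's min-max range."""
    out = [dict(r) for r in partition]
    for attr in quasi_identifiers:
        lo = min(r[attr] for r in partition)
        hi = max(r[attr] for r in partition)
        label = str(lo) if lo == hi else f"{lo}-{hi}"
        for row in out:
            row[attr] = label
    return out


# Hypothetical sample data, not taken from the study.
records = [
    {"age": 25, "zip": 47677, "diagnosis": "flu"},
    {"age": 27, "zip": 47602, "diagnosis": "cold"},
    {"age": 31, "zip": 47678, "diagnosis": "flu"},
    {"age": 35, "zip": 47905, "diagnosis": "asthma"},
    {"age": 41, "zip": 47909, "diagnosis": "cold"},
    {"age": 44, "zip": 47906, "diagnosis": "flu"},
]
partitions = mondrian(records, ["age", "zip"], k=2)
anonymized = [row for part in partitions for row in generalize(part, ["age", "zip"])]
```

Each resulting partition holds at least k records that share the same generalized quasi-identifier values, while sensitive attributes such as `diagnosis` are left unchanged.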