Optimized Data Deduplication Strategy with Distributed Bloom Filters for Efficient Routing and Load Balancing in Clustered Environments


Authors : Teja Chalikanti; Bobbili Sreeja Reddy

Volume/Issue : Volume 8 - 2023, Issue 9 - September

Google Scholar : https://bit.ly/3TmGbDi

Scribd : https://tinyurl.com/28v2n39r

DOI : https://doi.org/10.5281/zenodo.8340645

Abstract : This research paper examines data routing strategies enhanced by a distributed Bloom Filter. Data deduplication reduces storage requirements and improves resource utilization. A single node has limited storage and computing capacity, so deduplicating across a cluster offers significant advantages. However, clustered deduplication introduces new challenges: the deduplication rate tends to fall, and load must be kept balanced across storage nodes. To address these concerns, the study introduces a novel data routing strategy based on distributed Bloom Filters. The strategy uses the "Super chunk" as the basic unit of data routing, which improves overall system throughput. Following Broder's theorem, the k smallest chunk fingerprints are selected as the features of each Super chunk, and these features are sent to the storage nodes. Each node compares the features against its Bloom Filter, and the optimal routing node is chosen while taking node storage capacity and memory maintenance into account. The research then designs and implements a system prototype. Experiments determine the parameters of several routing strategies, which are then tested. The results confirm the viability of the proposed strategies both theoretically and empirically.

Keywords : Data Routing, Load Balancing, Clustered Deduplication, Distributed Bloom Filters, Super chunk, Deduplication rate, Communication overhead, Storage system, Cloud computing, System throughput.
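The routing step described in the abstract can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made here for illustration:

* chunk and Super chunk boundaries are taken as given;
* fingerprints are SHA-1 digests;
* each node keeps a Bloom filter built from salted SHA-256 digests;
* capacity is counted in unique chunks;
* Super chunks that match no node go to the least-utilized node.

The names `fingerprint`, `super_chunk_features`, `BloomFilter`, `StorageNode`, and `route` are likewise illustrative. Keeping the k smallest fingerprints is a bottom-k, min-hash style sample. Broder's result is that the overlap of such samples estimates how much two chunk sets resemble each other, which is why matching a few features against each node's filter can stand in for comparing whole Super chunks.

```python
import hashlib
import heapq


def fingerprint(chunk: bytes) -> int:
    """Hypothetical chunk fingerprint: SHA-1 digest read as an integer so it can be ordered."""
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")


def super_chunk_features(chunks: list[bytes], k: int) -> list[int]:
    """Return the k smallest distinct chunk fingerprints (a bottom-k, min-hash style sample)."""
    return heapq.nsmallest(k, {fingerprint(c) for c in chunks})


class BloomFilter:
    """Fixed-size Bloom filter; positions come from num_hashes salted SHA-256 digests."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: int):
        data = item.to_bytes(20, "big")  # SHA-1 fingerprints fit in 20 bytes
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(bytes([salt]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: int) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: int) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class StorageNode:
    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity        # hypothetical unit: number of unique chunks
        self.index: set[int] = set()    # exact fingerprint index used to deduplicate writes
        self.summary = BloomFilter()    # compact summary consulted when routing

    def utilization(self) -> float:
        return len(self.index) / self.capacity

    def store(self, chunks: list[bytes]) -> int:
        """Write only chunks this node has not seen; return how many were new."""
        new = 0
        for chunk in chunks:
            fp = fingerprint(chunk)
            if fp not in self.index:
                self.index.add(fp)
                self.summary.add(fp)
                new += 1
        return new


def route(chunks: list[bytes], nodes: list[StorageNode], k: int = 4) -> StorageNode:
    """Pick a node for one Super chunk (hypothetical scoring rule)."""
    features = super_chunk_features(chunks, k)
    # Skip full nodes unless every node is full.
    candidates = [n for n in nodes if n.utilization() < 1.0] or nodes
    # Most Bloom filter hits wins; ties (e.g. entirely new data) go to the
    # least-utilized node, and max() keeps the first node on an exact tie.
    return max(candidates, key=lambda n: (sum(f in n.summary for f in features), -n.utilization()))


if __name__ == "__main__":
    nodes = [StorageNode("node-a", 1000), StorageNode("node-b", 1000)]
    super_chunk = [f"block-{i}".encode() for i in range(32)]  # stand-in chunk contents
    first = route(super_chunk, nodes)
    print(first.name, first.store(super_chunk))
    # Re-sending the same chunks matches node-a's filter, so nothing new is written.
    again = route(super_chunk, nodes)
    print(again.name, again.store(super_chunk))
```

Ranking nodes by feature hits first keeps similar Super chunks on the same node, which protects the deduplication rate. The utilization tie-break spreads data that matches nowhere, which is one simple way to account for load balance. The abstract does not state how the paper weighs these two goals against each other.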
