Distributed Hash Table (DHT)
A distributed hash table (DHT) is a decentralized storage system that provides lookup and storage schemes similar to a hash table, storing key-value pairs. [1]
Distributed hash tables are decentralized, so all nodes form the collective system without any centralized coordination. They are generally fault-tolerant because data is replicated across multiple nodes. [2]
DHTs have the following properties [1]:
- Decentralized & Autonomous: Nodes collectively form the system without any central authority.
- Fault Tolerant: System is reliable with lots of nodes joining, leaving, and failing at all times.
- Scalable: System should function efficiently with even thousands or millions of nodes.
Figure 1: In DHT, values mapped against keys. [1]
Why Is a DHT Used?
Distributed hash tables provide an easy way to find information in a large collection of data because all keys are in a consistent format, and the entire set of keys can be partitioned in a way that allows fast identification on where the key/value pair resides. The nodes participating in a distributed hash table act as peers to find specific data values, as each node stores the key partitioning scheme so that if it receives a request to access a given key, it can quickly map the key to the node that stores the data. It then sends the request to that node. [2]
Also, nodes in a distributed hash table can be easily added or removed without forcing a significant amount of re-balancing of the data in the cluster. Cluster rebalancing, especially for large data sets, can often be a time-consuming task that also impacts performance. Having a quick and easy means for growing or shrinking a cluster ensures that changes in data size does not disrupt the operation of the applications that access data in the distributed hash table. [2]
Issues in DHT [3]
· Searching
o Consequence of hash algorithm
o “abc” and “abcd” are at totally different nodes
o Warning: DHT people call lookup “search”
· Security problems
o Hard to verify data integrity
o Secure routing is an open problem
Architecture and Implementation
Distributed hash table architecture: each box in the diagram represents a software process. In the simplest case, each process runs on its own physical machine; however there is nothing preventing processes from sharing machines. [4]
Figure 2: illustrates the architecture of hash table’s. [4]
The distributed hash table’s architecture consists of the following components:
Client: a client consists of service-specific software running on a client machine that communicates across the wide area with one of many service instances running in the cluster. [4]
Service: a service is a set of cooperating software processes, each of which we call a service instance. Service instances communicate with wide-area clients and perform some application-level function. [4]
Hash table API: the hash table API is the boundary between a service instance and its ``DHT library’’. The API provides services with put(), get(), remove(), create(), and destroy() operations on hash tables. [4]
DHT library: the DHT library is a Java class library that presents the hash table API to services. The library accepts hash table operations, and cooperates with the ``bricks’’ to realize those operations. The library contains only soft state, including metadata about the cluster’s current configuration and the partitioning of data in the distributed hash tables across the ``bricks’’. [4]
Brick: bricks are the only system components that manage durable data. Each brick manages a set of network-accessible single node hash tables. A brick consists of a buffer cache, a lock manager, a persistent chained hash table implementation, and network stubs and skeletons for remote communication. [4]
References
[1] Creative Commons. (n.d.). What is a distributed hash table? Educative. https://www.educative.io/edpresso/what-is-a-distributed-hash-table
[2] What Is a Distributed Hash Table? (n.d.). Hazelcast. https://hazelcast.com/glossary/distributed-hash-table/
[3] Rescorla, E. (2021). Introductionto Distributed Hash Tables [Slides]. The Internet Engineering Task Force. https://www.ietf.org/proceedings/65/slides/plenaryt-2.pdf
[4] Gribble, S. D., Brewer, E. A., Hellerstein, J. M., & Culler, D. (n.d.). Scalable, Distributed Data Structures for Internet Service Construction. The University of California at Berkeley.