Distributed Hash Table (DHT)

Sertsedengle Shewandagn
3 min readJun 9, 2021

--

A distributed hash table (DHT) is a decentralized storage system that provides lookup and storage schemes similar to a hash table, storing key-value pairs. [1]

Distributed hash tables are decentralized, so all nodes form the collective system without any centralized coordination. They are generally fault-tolerant because data is replicated across multiple nodes. [2]

DHTs have the following properties [1]:

  • Decentralized & Autonomous: Nodes collectively form the system without any central authority.
  • Fault Tolerant: System is reliable with lots of nodes joining, leaving, and failing at all times.
  • Scalable: System should function efficiently with even thousands or millions of nodes.
DHT simple diagram

Figure 1: In DHT, values mapped against keys. [1]

Why Is a DHT Used?

Distributed hash tables provide an easy way to find information in a large collection of data because all keys are in a consistent format, and the entire set of keys can be partitioned in a way that allows fast identification on where the key/value pair resides. The nodes participating in a distributed hash table act as peers to find specific data values, as each node stores the key partitioning scheme so that if it receives a request to access a given key, it can quickly map the key to the node that stores the data. It then sends the request to that node. [2]

Also, nodes in a distributed hash table can be easily added or removed without forcing a significant amount of re-balancing of the data in the cluster. Cluster rebalancing, especially for large data sets, can often be a time-consuming task that also impacts performance. Having a quick and easy means for growing or shrinking a cluster ensures that changes in data size does not disrupt the operation of the applications that access data in the distributed hash table. [2]

Issues in DHT [3]

· Searching

o Consequence of hash algorithm

o “abc” and “abcd” are at totally different nodes

o Warning: DHT people call lookup “search”

· Security problems

o Hard to verify data integrity

o Secure routing is an open problem

Architecture and Implementation

Distributed hash table architecture: each box in the diagram represents a software process. In the simplest case, each process runs on its own physical machine; however there is nothing preventing processes from sharing machines. [4]

Figure 2: illustrates the architecture of hash table’s. [4]

The distributed hash table’s architecture consists of the following components:

Client: a client consists of service-specific software running on a client machine that communicates across the wide area with one of many service instances running in the cluster. [4]

Service: a service is a set of cooperating software processes, each of which we call a service instance. Service instances communicate with wide-area clients and perform some application-level function. [4]

Hash table API: the hash table API is the boundary between a service instance and its ``DHT library’’. The API provides services with put(), get(), remove(), create(), and destroy() operations on hash tables. [4]

DHT library: the DHT library is a Java class library that presents the hash table API to services. The library accepts hash table operations, and cooperates with the ``bricks’’ to realize those operations. The library contains only soft state, including metadata about the cluster’s current configuration and the partitioning of data in the distributed hash tables across the ``bricks’’. [4]

Brick: bricks are the only system components that manage durable data. Each brick manages a set of network-accessible single node hash tables. A brick consists of a buffer cache, a lock manager, a persistent chained hash table implementation, and network stubs and skeletons for remote communication. [4]

References

[1] Creative Commons. (n.d.). What is a distributed hash table? Educative. https://www.educative.io/edpresso/what-is-a-distributed-hash-table

[2] What Is a Distributed Hash Table? (n.d.). Hazelcast. https://hazelcast.com/glossary/distributed-hash-table/

[3] Rescorla, E. (2021). Introductionto Distributed Hash Tables [Slides]. The Internet Engineering Task Force. https://www.ietf.org/proceedings/65/slides/plenaryt-2.pdf

[4] Gribble, S. D., Brewer, E. A., Hellerstein, J. M., & Culler, D. (n.d.). Scalable, Distributed Data Structures for Internet Service Construction. The University of California at Berkeley.

--

--

No responses yet