Complexity Results in Some Clustering Algorithms

No Thumbnail Available

Date

2025-06

Journal Title

Journal ISSN

Volume Title

Publisher

Indian Statistical Institute, Kolkata

Abstract

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a prevalent Clustering method without supervision renowned for its capability to recognize arbitrarily shaped clusters and detect noise in spatial data. Unlike partitioning methods such as k-means, DBSCAN operates without inputting a predefined number of clusters and is particularly effective in handling datasets with varying densities. In this dissertation, we have undertaken an in-depth exploration of the DBSCAN algorithm. We reviewed and analyzed several research papers that build upon or revise the original DBSCAN framework, with the goal of understanding their motivations, design choices, and computational implications. In addition to studying the foundational principles, we examined traditional spatial data structures that are commonly employed to accelerate DBSCAN, such as R-trees and KD-trees. This background enabled us to identify key computational bottlenecks in both neighbor search and density estimation. Building on these insights, we proposed two novel algorithms. The first is an approximate algorithm that efficiently replicates standard DBSCAN behavior, and the second is a modified version termed Box-based DBSCAN, which operates under a slightly altered definition of neighborhood using axis-aligned bounding boxes. The box-based approach improves clustering performance for geometrically structured data and introduces new ways to identify core regions without relying on exhaustive point-wise comparisons.

Description

Dissertation under the supervision of Prof. Sandip Das

Keywords

DBSCAN (Density-Based Spatial Clustering of Applications with Noise ), BB (Bounding Box), MBB (Minimum Bounding Box), USEC (Unit-Spherical Emptiness Checking), BCP (Bi-Chromatic Closest Pair)

Citation

53p.

Endorsement

Review

Supplemented By

Referenced By