Aneesh Raman - Academia.edu
Papers by Aneesh Raman
Indexes facilitate efficient querying when the selection predicate is on an indexed key. As a result, when loading data, if we anticipate future selective (point or range) queries, we typically maintain an index that is gradually populated as new data is ingested. In that respect, indexing can be perceived as the process of adding structure to an incoming, otherwise unsorted, data collection. The process of adding structure comes at a cost, as instead of simply appending incoming data, every new entry is inserted into the index. If the data ingestion order matches the indexed attribute order, the ingestion cost is entirely redundant and can be avoided (e.g., via bulk loading in a B+-tree). However, state-of-the-art index designs do not benefit when data is ingested in an order that is close to being sorted but not fully sorted. In this paper, we study how indexes can benefit from partial data sortedness or near-sortedness, and we propose an ensemble of techniques that combine bulk l...
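The contrast between appending in-order data and inserting out-of-order data can be illustrated with a minimal sketch. This is not the technique proposed in the paper (the abstract is truncated before describing it); `SortedRunIndex` and `ingest` are hypothetical names, and a plain Python list stands in for a real index structure.

```python
import bisect


class SortedRunIndex:
    """Toy sorted-key index with an append fast path (illustrative only)."""

    def __init__(self):
        self.keys = []          # always kept in sorted order
        self.fast_appends = 0   # entries that arrived in order
        self.slow_inserts = 0   # entries that needed a positional insert

    def insert(self, key):
        if not self.keys or key >= self.keys[-1]:
            # In-order arrival: a plain append keeps the keys sorted.
            self.keys.append(key)
            self.fast_appends += 1
        else:
            # Out-of-order arrival: search for the position, then insert.
            bisect.insort(self.keys, key)
            self.slow_inserts += 1


def ingest(keys):
    """Insert every key and return the populated index."""
    index = SortedRunIndex()
    for key in keys:
        index.insert(key)
    return index
```

On a nearly sorted stream such as `[1, 2, 3, 5, 4, 6, 7]`, only the single displaced key takes the slow path; the remaining entries are appended.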
2019 IEEE International Conference on Big Knowledge (ICBK), 2019
Criminological research and theory have traditionally focused on individual offenders and macro-level analysis to characterize crime distribution. However, local aspects of crime activities have also been recognized as important factors in crime analysis. It is an interesting problem to discover implicit local patterns between crime activities and environmental factors such as nearby facilities and business establishment types. This work presents micro-level analysis of criminal incidents using spatial association rule mining. We show how to process crime incident points and their spatial relationships with other task-relevant spatial features, and discover interesting crime patterns using an association rule mining algorithm. A case study was conducted with real incident records and points of interest in a study area to discover interesting relationship patterns among crimes, their characteristics, and nearby spatial features. This study shows that our approach with spatial association rule mining is promising for micro-level analysis of crime.
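As a rough illustration of the measures that association rule mining typically relies on, the sketch below computes support and confidence for a rule of the form "nearby feature implies crime type". The item labels and the four transactions are invented toy data, not records from the case study, and the abstract does not state which mining algorithm or thresholds the authors used.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical toy data: each incident is the set of its crime type
# and the types of spatial features found near it.
incidents = [
    frozenset({"near:bar", "crime:assault"}),
    frozenset({"near:bar", "near:atm", "crime:robbery"}),
    frozenset({"near:bar", "crime:assault"}),
    frozenset({"near:school", "crime:vandalism"}),
]

rule_support = support(incidents, {"near:bar", "crime:assault"})
rule_confidence = confidence(incidents, {"near:bar"}, {"crime:assault"})
```

Here each incident becomes one transaction once its spatial relationships have been turned into items, so the spatial part of the problem reduces to the ordinary itemset setting.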
Proceedings of the 17th International Workshop on Data Management on New Hardware (DaMoN 2021), 2021
Bloom filters (BFs) accelerate point lookups in Log-Structured Merge (LSM) trees by reducing unnecessary storage accesses to levels that do not contain the desired key. BFs are particularly beneficial when there is a significant performance difference between querying a BF (hashing and accessing memory) and accessing data (on secondary storage). This gap, however, is decreasing as modern storage devices (SSDs and NVMs) have increasingly lower latency, to the point that the cost of accessing data can be comparable to that of filter probing and hashing, especially for large key sizes that exhibit high hashing cost. In an LSM-tree, BFs are employed when querying each level of the tree, thus exacerbating the CPU cost as the data size (and thus the tree height) grows. To address the increasing CPU cost of BFs in LSM-trees, we propose to re-use hash calculations aggressively within and across BFs, as well as between different levels, and we show both analytically and experimentally th...
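One generic way to re-use hash work is to hash each key once and derive every probe position, in every level's filter, from that single digest (double hashing). The sketch below shows the idea under that assumption; it is not necessarily the scheme the authors propose, and `hash_pair`, `BloomFilter`, and `lookup_levels` are hypothetical names.

```python
import hashlib


def hash_pair(key: bytes):
    """Hash the key once and split the digest into two 64-bit values."""
    digest = hashlib.blake2b(key, digest_size=16).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:], "little") | 1  # force odd, so never zero
    return h1, h2


class BloomFilter:
    """Bloom filter that takes pre-computed hashes instead of raw keys."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, h1: int, h2: int):
        # Double hashing: the i-th probe is h1 + i * h2 (mod m).
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add_hashed(self, h1: int, h2: int) -> None:
        for p in self._positions(h1, h2):
            self.bits[p >> 3] |= 1 << (p & 7)

    def may_contain_hashed(self, h1: int, h2: int) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(h1, h2))


def lookup_levels(filters, key: bytes):
    """Return the levels whose filter may hold key, hashing only once."""
    h1, h2 = hash_pair(key)
    return [level for level, f in enumerate(filters)
            if f.may_contain_hashed(h1, h2)]
```

With this layout, the hashing cost of a point lookup no longer grows with the number of levels probed; only the cheap modular arithmetic and memory accesses repeat per filter.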