Murat Ali Bayir | SUNY: University at Buffalo

Papers by Murat Ali Bayir

IEEE Access

We propose a new ML model called Topological Forest that contains an ensemble of decision trees. Unlike a vanilla Random Forest, Topological Forest has a special training process that selects a smaller number of decision trees based on a topological graph representation constructed by TDA Mapper. Compared to a vanilla Random Forest, Topological Forest significantly improves computational efficiency at inference time, owing to its smaller ensemble and its selection of better decision trees, while preserving the diversity of the decision trees. Our experiments show that Topological Forest can speed up inference by more than 100x on average, at the cost of at most a 2% reduction in AUC.
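The reason a smaller ensemble speeds up inference can be sketched in a few lines. The example below is a simplified stand-in, not the paper's training process. It assumes the trees have already been grouped (the paper derives this grouping with TDA Mapper; here the groups are given as plain lists of tree indices) and keeps the best-scoring tree from each group. The names `select_representatives` and `inference_speedup` and the validation scores are hypothetical.

```python
from typing import Dict, List, Sequence


def select_representatives(
    groups: Sequence[Sequence[int]],
    val_scores: Dict[int, float],
) -> List[int]:
    """Keep the highest-scoring tree from each group of similar trees.

    `groups` is a hypothetical stand-in for the nodes of a topological graph;
    `val_scores` maps a tree index to a validation metric such as AUC.
    """
    selected: List[int] = []
    for group in groups:
        if not group:
            continue
        best = max(group, key=lambda t: val_scores[t])
        # Groups may overlap, so the same tree can win twice; keep it only once.
        if best not in selected:
            selected.append(best)
    return selected


def inference_speedup(full_size: int, pruned_size: int) -> float:
    """Assuming equal per-tree cost, inference time scales with the tree count."""
    return full_size / pruned_size
```

With validation AUCs of 0.70, 0.75, 0.80, 0.60, 0.72, and 0.71 for trees 0 to 5 and groups `[[0, 1, 2], [2, 3], [4, 5]]`, the sketch keeps trees 2 and 4 (tree 2 wins two overlapping groups but is kept once). Shrinking a 500-tree forest to 5 trees corresponds to evaluating 100x fewer trees per prediction under the equal-cost assumption.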

WWW Journal, 2022

This paper introduces a new method for the session construction problem, the first main step of the Web usage mining process. The proposed method defines user sessions as a set of navigation paths in the Web graph and produces the complete set of all possible maximal paths. Our method can generate navigation paths that cannot be extracted by previous greedy approaches. Experiments on real data show that our technique outperforms previous approaches in Web usage mining applications such as next-page prediction. Our analysis of Web user sessions also reveals an important observation: Web user sessions contain navigation graphs with a small number of nodes at which users branch out their navigation into multiple paths.
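To make "all possible maximal paths" concrete, the sketch below enumerates them on a small navigation graph: depth-first search starts at every page with no incoming link and extends a path until no outgoing link remains. It assumes the session's navigation graph is already built and acyclic; the function name and toy graph are illustrative, and the paper's construction of the graph from logs is not reproduced.

```python
from typing import Dict, List


def maximal_paths(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every maximal path in a DAG given as an adjacency list.

    A path is maximal if it starts at a node with no incoming edge and ends
    at a node with no outgoing edge, so it cannot be extended either way.
    Assumes the graph is acyclic (a cycle would recurse forever).
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    has_incoming = {t for targets in graph.values() for t in targets}
    roots = sorted(n for n in nodes if n not in has_incoming)

    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        successors = graph.get(node, [])
        if not successors:
            paths.append(path)  # dead end: the path is maximal
            return
        for nxt in successors:
            dfs(nxt, path + [nxt])

    for root in roots:
        dfs(root, [root])
    return paths
```

On the graph `{"A": ["B", "C"], "B": ["D"], "C": ["D"]}`, where the user branches at page A, the function returns both `["A", "B", "D"]` and `["A", "C", "D"]`; a method that commits to one link at a time could report only one of them.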

Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion, 2016

Applied Soft Computing, 2015

Cloud computing enables a conventional relational database system's hardware to be adjusted dynamically according to query workload, performance, and deadline constraints. One can rent a large amount of resources for a short duration in order to run complex queries efficiently on large-scale data with virtual machine clusters. Complex queries usually contain common subexpressions, either within a single query or among multiple queries submitted as a batch. These common subexpressions scan the same relations, compute the same tasks (join, sort, etc.), and/or ship the same data among virtual machines. The total time spent on the queries can be reduced by executing these common tasks only once. In this study, we build and use efficient sets of query execution plans to reduce the total execution time. Because this problem is NP-hard, we propose a set of robust heuristic algorithms (Branch-and-Bound, Genetic, Hill Climbing, and Hybrid Genetic-Hill Climbing) to find (near-)optimal query execution plans and maximize the benefits. The optimization time of each algorithm and the quality of the query execution plans it identifies are analyzed through extensive experiments.
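The sketch below illustrates one of the listed heuristics, hill climbing, on a simplified cost model that is assumed here rather than taken from the paper: each candidate plan is a set of task identifiers, each task has a fixed cost, and a task shared by several chosen plans is paid for only once. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence

Plan = FrozenSet[str]


def total_cost(choice: Sequence[int], plans: Sequence[Sequence[Plan]],
               task_cost: Dict[str, float]) -> float:
    """Cost of a global plan: every distinct task is executed (and paid for) once."""
    tasks = set()
    for query_idx, plan_idx in enumerate(choice):
        tasks |= plans[query_idx][plan_idx]
    return sum(task_cost[t] for t in tasks)


def hill_climb(plans: Sequence[Sequence[Plan]],
               task_cost: Dict[str, float]) -> List[int]:
    """Steepest-descent hill climbing over one-plan-per-query assignments.

    A move changes the plan of a single query; the best improving move is
    applied until no move lowers the total cost (a local optimum).
    """
    choice = [0] * len(plans)  # start from each query's first plan
    best = total_cost(choice, plans, task_cost)
    improved = True
    while improved:
        improved = False
        best_move = None
        for q, alternatives in enumerate(plans):
            for p in range(len(alternatives)):
                if p == choice[q]:
                    continue
                candidate = choice.copy()
                candidate[q] = p
                cost = total_cost(candidate, plans, task_cost)
                if cost < best:
                    best, best_move = cost, candidate
        if best_move is not None:
            choice = best_move
            improved = True
    return choice
```

With tasks `t1` and `t2` costing 10 and `t3` costing 4, and plans `{t1}` or `{t2, t3}` for the first query and `{t2}` or `{t1, t3}` for the second, the starting assignment costs 20. One move switches the first query to `{t2, t3}`, which shares `t2` with the second query and lowers the cost to 14, after which no single change helps. Like any hill climber, this can stop at a local optimum, which is one motivation for the hybrid and branch-and-bound variants named above.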

Lecture Notes in Computer Science, 2010

With the increasing use of dynamic page generation, asynchronous page loading (AJAX), and rich user interaction on the Web, it is possible to capture more information for web usage analysis. While these advances are a great opportunity to collect more information about web users, they also increase the complexity of the usage data. As a result, traditional page-view-based web usage mining methods have become insufficient to fully understand web usage behavior. In order to solve the problems with current approaches, our framework ...

Multiple Query Optimization (MQO) is a technique for processing a batch of queries in such a way that shared tasks in these queries are executed only once, resulting in significant savings in the total evaluation cost. The first phase of MQO requires producing alternative query execution plans so that the shared tasks between queries are identified and maximized. The second phase of MQO is an optimization problem whose goal is to select exactly one of the alternative plans for each query so as to minimize the total execution cost of all queries. A-star, branch-and-bound, dynamic programming (DP), and genetic algorithm (GA) solutions for MQO have been given in the literature. However, the performance of the optimal algorithms, A-star and DP, is not sufficient for solving large MQO problems involving a large number of queries. In this study, we propose an Integer Linear Programming (ILP) formulation to solve the MQO problem exactly for a large number of queries and evaluate its performance. Our results show that ILP outperforms the existing A-star algorithm.
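The second phase described above, choosing exactly one plan per query so that shared tasks are paid for once, can be stated as a small exact search. The sketch below is a brute-force reference under an assumed cost model (plans as sets of task identifiers with fixed task costs); it is not the ILP formulation proposed in the paper, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Sequence, Tuple

Plan = FrozenSet[str]


def exact_mqo(plans: Sequence[Sequence[Plan]],
              task_cost: Dict[str, float]) -> Tuple[List[int], float]:
    """Return an optimal one-plan-per-query choice by exhaustive search.

    Shared tasks are counted once. The search space is the product of the
    per-query plan counts, so this is only usable for tiny batches.
    """
    best_choice: List[int] = []
    best_cost = float("inf")
    for choice in product(*(range(len(alts)) for alts in plans)):
        # Union of all tasks required by the chosen plans.
        tasks = set().union(*(plans[q][p] for q, p in enumerate(choice)))
        cost = sum(task_cost[t] for t in tasks)
        if cost < best_cost:
            best_choice, best_cost = list(choice), cost
    return best_choice, best_cost
```

For two queries with two plans each this checks only four assignments, but a batch of 30 queries with 4 plans each already has 4^30 (about 10^18) assignments, which illustrates why exact methods that avoid full enumeration, such as the ILP approach in the abstract, matter for large batches.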

Information Sciences and Systems 2014, 2014

Multiple Query Optimization (MQO) is a technique for processing a batch of queries in such a way that shared tasks in these queries are executed only once, resulting in significant savings in the total evaluation cost. The first phase of MQO requires producing alternative query execution plans so that the shared tasks between queries are identified and maximized. The second phase of MQO is an optimization problem whose goal is to select exactly one of the alternative plans for each query so as to minimize the total execution cost of all queries. A-star, branch-and-bound, dynamic programming (DP), and genetic algorithm (GA) solutions for MQO have been given in the literature. However, the performance of the optimal algorithms, A-star and DP, is not sufficient for solving large MQO problems involving a large number of queries. In this study, we propose an Integer Linear Programming (ILP) formulation to solve the MQO problem exactly for a large number of queries and evaluate its performance. Our results show that ILP outperforms the existing A-star algorithm.

Annals of Epidemiology, 2010

Annals of Epidemiology, Volume 20, Issue 9, Page 708, September 2010. Authors: CB Rudra; MA Bayir; M. Demirbas; A. Rudra.

Ad Hoc Networks, 2014

In this paper, we propose a novel routing protocol, PRO, for profile-based routing in pocket switched networks. Unlike previous routing protocols, PRO treats node encounters as periodic patterns and uses them to predict the times of future encounters. By exploiting the regularity of human mobility profiles, PRO achieves fast (low-delivery-latency) and efficient (low-message-overhead) routing in intermittently connected pocket switched networks. PRO is self-learning, completely decentralized, and local to the nodes. Despite being simple, PRO forms a general framework that can be easily instantiated to solve searching and querying problems in smartphone networks. We validate the performance of PRO with the "Reality Mining" dataset, containing 350K hours of cell-tower connectivity data, and compare its performance with that of previous approaches.
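The idea of treating encounters as periodic patterns can be sketched as follows. This is an illustration under assumptions, not PRO itself: the period is assumed to be one week of hourly slots, a slot enters a node's profile if an encounter occurred in it during at least `min_weeks` distinct weeks, and the next encounter is predicted as the earliest profile slot after the current hour. The function names and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional, Set

WEEK_HOURS = 7 * 24  # assumed period: one week, in hourly slots


def encounter_profile(encounter_hours: Iterable[int], min_weeks: int = 2) -> Set[int]:
    """Weekly hour-slots in which an encounter recurred in at least `min_weeks` weeks.

    `encounter_hours` are absolute hour indices since the start of the trace.
    """
    weeks_seen = defaultdict(set)  # slot within the week -> set of week numbers
    for h in encounter_hours:
        weeks_seen[h % WEEK_HOURS].add(h // WEEK_HOURS)
    return {slot for slot, weeks in weeks_seen.items() if len(weeks) >= min_weeks}


def predict_next_encounter(profile: Set[int], now_hour: int) -> Optional[int]:
    """Earliest absolute hour after `now_hour` that falls in a profile slot."""
    if not profile:
        return None
    # Every slot recurs within one period, so scanning one week is enough.
    for offset in range(1, WEEK_HOURS + 1):
        if (now_hour + offset) % WEEK_HOURS in profile:
            return now_hour + offset
    return None  # not reached when the profile is non-empty
```

For encounters at hours 9, 33, 177, and 300, only slot 9 recurs in two different weeks, so the profile is `{9}`; at hour 200 the predicted next encounter is hour 345. In a routing setting, a node holding a message could compare such predictions across neighbors and forward to the one expected to meet the destination soonest; the weekly period and threshold are assumptions, not PRO's published parameters.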

Journal of Cloud Computing: Advances, Systems and Applications, 2014

MapReduce is a popular programming model for executing time-consuming analytical queries as a batch of tasks on large-scale data clusters. In environments where multiple queries with similar selection predicates, common tables, and join tasks arrive simultaneously, many opportunities arise for sharing scan and/or join computation tasks. Executing common tasks only once can substantially reduce the total execution time of a batch of queries. In this study, we propose a Multiple Query Optimization framework, SharedHive, to improve the overall performance of Hadoop Hive, an open-source SQL-based data warehouse built on MapReduce. SharedHive transforms a set of correlated HiveQL queries into a new set of insert queries that produce all of the required outputs in a shorter execution time. Experiments show that SharedHive achieves significant reductions in the total execution time of TPC-H queries.
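One simple instance of sharing a scan in Hive is its multi-table insert form, `FROM source INSERT OVERWRITE TABLE out1 SELECT ... WHERE ... INSERT OVERWRITE TABLE out2 SELECT ...`, in which one read of the source feeds several outputs. The sketch below only generates such a statement for single-table queries over the same source; it is not SharedHive's transformation (which, per the abstract, also targets shared joins), and `SimpleQuery` and `merge_into_multi_insert` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleQuery:
    """Hypothetical model of: SELECT columns FROM <source> WHERE predicate."""
    output_table: str
    columns: str
    predicate: str


def merge_into_multi_insert(source_table: str, queries: List[SimpleQuery]) -> str:
    """Combine queries over the same table into one Hive multi-insert statement.

    The single FROM clause means the source table is scanned once for all
    outputs instead of once per query.
    """
    lines = [f"FROM {source_table}"]
    for q in queries:
        lines.append(
            f"INSERT OVERWRITE TABLE {q.output_table} "
            f"SELECT {q.columns} WHERE {q.predicate}"
        )
    return "\n".join(lines)
```

For two TPC-H-style queries on `lineitem`, the function emits one `FROM lineitem` line followed by two `INSERT OVERWRITE TABLE` branches, each keeping its own column list and predicate.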

Data & Knowledge Engineering, 2012

In this paper, we propose a novel page-view-based session model and session construction method to address the Web Usage Mining (WUM) problem. In simple session models, sessions are sequences of web pages requested from the server (or served from a browser/proxy cache) and viewed in the browser, which may not guarantee a direct relationship between subsequent web pages in a session. We instead define a more realistic session model in which a session is a set of paths traversed in the web graph, corresponding to user navigation performed by following links on web pages. We define session construction from raw server logs as a new graph problem and present a novel algorithm, Smart-SRA (Smart Session Reconstruction Algorithm), to solve it efficiently. An experimental evaluation based on data collected from real web access scenarios showed that Smart-SRA produces more accurate user sessions than the session construction methods found in the literature.
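The difference between the two session models comes down to one constraint: in the path model, each consecutive pair of pages must be connected by a hyperlink. The sketch below assumes the site's link structure is available as a set of `(from_page, to_page)` pairs and shows only that constraint; it is a naive splitter for illustration, not Smart-SRA, and it does not reconstruct branching navigation.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]


def is_navigation_path(pages: List[str], links: Set[Link]) -> bool:
    """True if every consecutive pair of pages is joined by a hyperlink."""
    return all((a, b) in links for a, b in zip(pages, pages[1:]))


def split_at_missing_links(pages: List[str], links: Set[Link]) -> List[List[str]]:
    """Cut a time-ordered page sequence wherever the next page is not linked
    from the previous one, yielding segments that satisfy the path model."""
    if not pages:
        return []
    segments = [[pages[0]]]
    for prev, cur in zip(pages, pages[1:]):
        if (prev, cur) in links:
            segments[-1].append(cur)
        else:
            segments.append([cur])
    return segments
```

For the request sequence `index, products, cart, blog` with links only from `index` to `products` and from `products` to `cart`, the simple model would treat all four pages as one session, whereas the path constraint yields `[index, products, cart]` and `[blog]`.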

In classical causal inference, inferring cause-effect relations from data relies on the assumption that units are independent and identically distributed. This assumption is violated in settings where units are related through a network of dependencies. An example of such a setting is ad placement in sponsored search advertising, where the clickability of a particular ad is potentially influenced by where it is placed and where other ads are placed on the search result page. In such scenarios, confounding arises not only from the individual ad-level covariates but also from the placements and covariates of other ads in the system. In this paper, we leverage the language of causal inference in the presence of interference to model interactions among the ads. Quantifying such interactions allows us to better understand the click behavior of users, which in turn impacts the revenue of the host search engine and enhances user satisfaction. We illustrate the utility of our formalizati...

Mobility path information of cellphone users plays a crucial role in a wide range of cellphone applications, including context-based search and advertising, early warning systems, and city-wide sensing applications such as air pollution exposure estimation and traffic planning. However, there is a disconnect between the low-level location data logs available from cellphones and the high-level mobility path information required to support these applications. In this paper, we present formal definitions to capture cellphone users' mobility patterns and profiles, and provide a complete framework, Mobility Profiler, for discovering mobile user profiles starting from cell-based location log data. We use real-world cellphone log data (over 350K hours of coverage) to demonstrate our framework and perform experiments for discovering frequent mobility patterns and profiles. Our analysis of the mobility profiles of cellphone users exposes a significant long tail in a user's loca...
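Bridging low-level cell logs and higher-level mobility patterns can be illustrated with a much simpler procedure than the Mobility Profiler framework. The sketch below assumes each day's log is a list of cell IDs, turns it into a path by collapsing consecutive repeats, and reports contiguous k-cell patterns whose support (number of days containing them) meets a threshold. Both steps and all names are assumptions chosen for illustration, and the sketch does not filter noise such as rapid switching between neighboring cells.

```python
from collections import Counter
from typing import Dict, List, Tuple


def collapse_repeats(cells: List[str]) -> List[str]:
    """Turn a raw per-sample cell log into a path by dropping consecutive duplicates."""
    path: List[str] = []
    for c in cells:
        if not path or path[-1] != c:
            path.append(c)
    return path


def frequent_patterns(days: List[List[str]], k: int,
                      min_support: int) -> Dict[Tuple[str, ...], int]:
    """Contiguous k-cell patterns that occur on at least `min_support` days.

    Support counts days, not occurrences, so a pattern repeated within one
    day is counted once for that day.
    """
    support: Counter = Counter()
    for raw in days:
        path = collapse_repeats(raw)
        seen = {tuple(path[i:i + k]) for i in range(len(path) - k + 1)}
        support.update(seen)
    return {p: n for p, n in support.items() if n >= min_support}
```

For three days logged as `c1 c1 c2 c3`, `c1 c2 c2 c3 c4`, and `c5 c1 c2`, with `k = 2` and a minimum support of 2 days, the frequent patterns are `(c1, c2)` on 3 days and `(c2, c3)` on 2 days.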

In this paper, we propose a novel routing protocol, PRO, for profile-based routing in pocket switched networks. Unlike previous routing protocols, PRO treats node encounters as periodic patterns and uses them to predict the times of future encounters. By exploiting the regularity of human mobility profiles, PRO achieves fast (low-delivery-latency) and efficient (low-message-overhead) routing in intermittently connected pocket switched networks. PRO is self-learning, completely decentralized, and local to the nodes. Despite being simple, PRO forms a general framework that can be easily instantiated to solve searching and querying problems in ad hoc smartphone networks. We validate the performance of PRO with the “Reality Mining” dataset, containing 350K hours of cell-tower connectivity and Bluetooth connection data, and compare its performance with that of previous approaches.

Despite the availability of sensor and smartphone devices to fulfill the ubiquitous computing vision, the state of the art falls short of this vision. We argue that the reason for this gap is the lack of an infrastructure to task and utilize these devices for collaboration. We propose that Twitter can provide an “open” publish-subscribe infrastructure for sensors and smartphones, and pave the way for ubiquitous crowd-sourced sensing and collaboration applications. We design and implement a crowd-sourced sensing and collaboration system over Twitter, and showcase our system in the context of two applications: a crowd-sourced weather radar and a participatory noise-mapping application. Our results from real-world Twitter experiments give insights into the feasibility of this approach and outline the research challenges in integrating sensors and smartphones with Twitter.

Isaac Amundson Murat Ali Bayir Hugues Casse Shou-Wei Chang Chen-Mou Cheng Yongjin Cho Dingxiong Deng Jicheng Fu Rajesh Harjani Edward KS Ho Kun-Yuan Hsieh Tsun-Hua Hsueh Po-Chun Huang Yun Huang Ramkumar Jayaseelan Jinkyu Jeong Hanhong Keum Abdelwahid Khamiss ... Hwanju Kim Junghyun Kim Jungwon Kim Sang-Hoon Kim Seungkyun Kim T. Vamsi Krishna Ci-Bang Kuan Manish Kumar Sang-Won Lee Youngjae Lee Jia-Jhe Li Xiao-Feng Li Yifan Li Lee Booi Lim Cheng-Yen Lin Yu-Te Lin Wen Ming Liu Xuming Lu

Distributed and Parallel Databases

Ailidani Ailijiang Murat Ali Bayir Nihat Altiparmak Douglas Alves Peixoto Vaibhav Arora Ira Assent Manos Athanassoulis Erman Ayday Samira Babalou Mehdi Bahrami Madhushi Bandara Fuat Basik Kaustubh Beedkar Ladjel Bellatreche Carsten Binnig Klemens Böhm Angela Bonifati Vanessa Braganholo Guadalupe Canahuate Alberto Cano Lei Cao Nicholas Car Fabio Casati Aniket Chakrabarti Lijun Chang Aleksey Charapko Shimin Chen Ling Chen Yu Cheng Fei Chiang Byron Choi Bin Cui Khuzaima Daudjee Engin Demir Murat Demirbas Anton Dignoes Bailu Ding Xiaofeng Ding Jaeyoung Do Bin Dong Hai Dong Laurent D’orazio Ahmed Eldawy Iman Elghandour Mohammed Eunus Ali Liyue Fan

In this paper, we propose a novel routing protocol, PRO, for profile-based routing in pocket switched networks. Unlike previous routing protocols, PRO treats node encounters as periodic patterns and uses them to predict the times of future encounters. By exploiting the regularity of human mobility profiles, PRO achieves fast (low-delivery-latency) and efficient (low-message-overhead) routing in intermittently connected pocket switched networks. PRO is self-learning, completely decentralized, and local to the nodes. Despite being simple, PRO forms a general framework that can be easily instantiated to solve searching and querying problems in smartphone networks. We validate the performance of PRO with the “Reality Mining” dataset, containing 350K hours of cell-tower connectivity data, and compare its performance with that of previous approaches.

Ubiquitous computing is weaving itself into the fabric of our age, creating unique opportunities for accessing and sharing information regardless of time and location. Recent developments in hardware technology have paved the way for small, portable devices such as wireless sensors, PDAs, and iPods, and have led to a new generation of cell phones with computing capabilities, called smartphones. These smart devices enable location-aware applications and empower users to generate and access multimedia content anywhere. Mobility information of cell phone users plays an important role in a wide range of smartphone applications, such as context-based search and advertising, early warning systems, traffic planning, route prediction, and air pollution exposure risk estimation. However, the mobility information captured by a cell phone consists of low-level data units and cannot benefit these applications directly. In this thesis, we investigate the problem of enhancing smartphone applicati...
