Ayhan Demiriz | Gebze Technical University

Papers by Ayhan Demiriz

An MIP model to schedule the call center workforce and organize the breaks

Nucleation and Atmospheric Aerosols, 2016

In modern economies, companies place a premium on managing their workforce efficiently, especially in the labor-intensive service sector, since services have become a significant portion of these economies. Tour scheduling is an important tool to minimize overall workforce costs while satisfying minimum service level constraints. In this study, we consider the workforce management problem of an inbound call center, satisfying the call demand within short time periods at minimum cost. We propose a mixed-integer programming model to assign workers to daily shifts, to determine the weekly off-days, and to determine the timings of lunch and other daily breaks for each worker. The proposed model has been verified on the weekly demand data observed at a specific call center location of a satellite TV operator. The model was run with both 15-minute and 10-minute demand estimation periods (planning time intervals).
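The covering idea behind such a model can be illustrated on a toy instance. The sketch below is not the paper's MIP and uses no solver: it enumerates integer staffing levels for three hypothetical shifts and keeps the cheapest one that meets a made-up per-period demand. The demand figures, shift names, and unit cost per agent are all assumptions for illustration; off-days and breaks are left out.

```python
from itertools import product

# Hypothetical toy data: demand[t] is the number of agents required in period t.
demand = [2, 3, 4, 3, 2, 1]

# Hypothetical shifts: each covers a contiguous block of periods (start, end exclusive).
shifts = {"early": (0, 4), "mid": (1, 5), "late": (2, 6)}


def coverage(staffing):
    """Return the number of agents on duty in each period."""
    on_duty = [0] * len(demand)
    for name, count in staffing.items():
        start, end = shifts[name]
        for t in range(start, end):
            on_duty[t] += count
    return on_duty


def cheapest_staffing(max_per_shift=4):
    """Enumerate integer staffing levels and keep the cheapest feasible one."""
    best = None
    names = list(shifts)
    for counts in product(range(max_per_shift + 1), repeat=len(names)):
        staffing = dict(zip(names, counts))
        # Feasible only if every period has at least the required agents.
        if all(c >= d for c, d in zip(coverage(staffing), demand)):
            cost = sum(counts)  # assumed: one cost unit per assigned agent
            if best is None or cost < best[0]:
                best = (cost, staffing)
    return best


best_cost, best_plan = cheapest_staffing()
```

Exhaustive search only works at this scale; a real instance with 10- or 15-minute periods and per-worker break decisions needs an MIP or CP solver.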

Used Car Pricing and Beyond: A Survival Analysis Framework

A significant part of the overall automotive market is derived from the used car trade. Correctly determining used car market values will help achieve fairer trade in many economies. By using web listings as a proxy data source, we can create models for used car pricing based on the asking prices listed in web adverts. This type of data acquisition requires a thorough data cleaning process to generate dependable statistical models. This paper proposes a survival analysis based approach to study the lifetime of used car listings found at web sites like Craigslist. Pricing models can be built from this type of data to determine the market values of used cars. One of the most important assumptions in our approach is to consider the delisting of an advert as a sale event, which is equivalent to death in the survival analysis context. Since the collected data have labels indicating sale or no sale, we can utilize predictive models to determine whether a particular car at a certain price will be successfully sold.
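To make the "delisting as a sale event" assumption concrete, here is a minimal sketch of a Kaplan-Meier survival curve over listing lifetimes. The abstract does not say which estimator the authors use, so Kaplan-Meier is an assumption here, and the listing data are invented. A listing still online when observation stops is treated as censored.

```python
def kaplan_meier(durations, sold):
    """Return [(t, S(t))] at each time t where at least one sale was observed.

    durations: days each listing was observed.
    sold: True if the listing was delisted (treated as a sale),
          False if it was still online when observation ended (censored).
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, s in zip(durations, sold) if d == t and s)
        leaving = sum(1 for d in durations if d == t)
        if events:
            # Probability of staying listed past t, given listed just before t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # sold and censored listings both leave the risk set
    return curve


# Hypothetical listings, observed for at most thirty days.
days = [3, 5, 5, 8, 12, 30, 30]
delisted = [True, True, False, True, True, False, False]
curve = kaplan_meier(days, delisted)
```

Each pair in `curve` gives the estimated share of listings still unsold after that many days.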

Using pairwise associations for multi-item markdown optimisation

International Journal of Systems Science: Operations & Logistics, Oct 12, 2015

Forecasting is an essential task conducted regularly by competitive retailers around the world. Most retail decisions, particularly markdowns, are made based on demand forecasts, which may or may not be accurate in the first place. In this study, we propose a framework for forecasting weekly demands of retail items via linear regression models within multi-item groups that incorporate both positive and negative item associations. We then utilise dynamic pricing models to optimise markdown decisions based on the forecasts within multi-item groups. Grouping items can be considered a form of variable selection to prevent overfitting in prediction models. We report regression results from multi-item groupings alongside results from a single-item regression model on a real-world data-set provided by an apparel retailer. We then report markdown optimisation results for the single items and for the multi-item groupings that the multi-item forecasting models are built upon. The results show that the regression mo...

Analyzing Classified Listings at an E-Commerce Site by Using Survival Analysis

Periodicals of Engineering and Natural Sciences (PEN), Oct 2, 2013

Sahibinden.com is a leading e-commerce site in Turkey where sellers (buyers) may advertise their goods (needs) with or without a fee. Since it generates a large volume of traffic to the classified car listings, the site plays an important role in determining the market value of used cars. In this study, we first randomly selected 200 car classifieds from 950 new classified ads posted on February 22, 2012. We then observed these listings daily for a month to record any updates and deletions of the ads. We assume that if an ad is taken down, the car has been sold. In addition to the cars' features, we observed the posted price and the number of daily views of the ads throughout the data collection. One can therefore construct survival models to study the effects of the features and price of a car on the life of the ad. In other words, it is possible to study which features and price levels expedite the sale of used cars.

An Integrated Approach for Shift Scheduling and Rostering Problems with Break Times for Inbound Call Centers

Mathematical Problems in Engineering, Nov 21, 2018

It may be very difficult to achieve the optimal shift schedule in call centers that have highly uncertain and peaked demand during short time periods. Overlapping shift systems are usually designed for such cases. This paper studies shift scheduling and rostering problems for inbound call centers where overlapping shift systems are used. An integer programming model that determines which shifts are to be opened and how many operators are to be assigned to these shifts is proposed for the shift scheduling problem. For the rostering problem, both integer programming and constraint programming models are developed to determine the assignments of operators to all shifts, the weekly days off, and the meal and relief break times of the operators. The proposed models are tested on real data supplied by an outsourced call center, and optimal results are found in an acceptable computation time. An improvement of 15% in the objective function compared to the current situation is observed with the proposed model for the shift scheduling problem. The computational performances of the proposed integer and constraint programming models for the rostering problem are compared using real data observed at a call center and simulated test instances. In addition, benchmark instances are used to compare our Constraint Programming (CP) approach with the existing models. The results of the comprehensive computational study indicate that the constraint programming model runs more efficiently than the integer programming model for the rostering problem.
The originality of this research can be attributed to two contributions: (a) a model for the shift scheduling problem and two models for the rostering problem are presented in detail and compared using real data, and (b) the rostering problem is treated as a task-resource allocation problem, and considerably shorter computation times are obtained by modeling this new problem via CP.

webSPADE: a parallel sequence mining algorithm to analyze web log data

In this work we made a study of several other works where association and sequence mining techniques were applied to the field of web usage mining. This report is to be submitted for grading to the Data Mining course of the PhD program "Diseno, Analisis y Aplicaciones de Sistemas Inteligentes" of the University of Granada.

A Framework for Visualizing Association Mining Results

Lecture Notes in Computer Science, 2006

Analyzing used-car web listings via text mining

Used car trade is one of the major components of world economies. These days it is not uncommon to sell a car by placing an internet advertisement, irrespective of geography. An advertisement is usually composed of two parts, namely the structured data and the free text data. The structured data may include the asking price, make, model, year, and mileage of the car, and the contact information. In most cases, the seller may give important clues about the car's current condition in the free text data, where the title (head) of the advertisement can be included as free text too. This paper reports preliminary results from a text mining study conducted on 75K used car internet listings collected from two major car listing web sites in Turkey. As expected, the words and phrases related to the description of the car are observed to be frequent. The leading concepts in the free text are found to concern how to describe the current condition of a car, for example "no crash history".

Analyzing Price Data to Determine Positive and Negative Product Associations

Lecture Notes in Computer Science, 2009

This paper presents a simple method for mining both positive and negative association rules in databases using singular value decomposition (SVD) and similarity measures. In the literature, SVD is used for summarizing matrices. We use a transaction-item price matrix to generate what the literature calls ratio rules. The transaction-item price matrix is formed from the price data of the corresponding items in the sales transactions. Ratio rules are generated by running SVD on the transaction-item price matrix. We then use similarity measures on a subset of rules found by Pareto analysis to determine positive and negative associations. The proposed method can present the positive and negative associations with their strengths. We obtain subsequent results using cosine and correlation similarity measures.
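The two similarity measures named in the abstract are easy to state in code. The sketch below assumes, for illustration only, that each item is represented by its loadings on the retained ratio rules; the loading values are invented, and the SVD step itself is omitted. A similarity near +1 would suggest a positive association and one near -1 a negative association.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def correlation(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Hypothetical item loadings on three retained ratio rules.
item_a = [0.8, 0.1, -0.3]
item_b = [0.7, 0.2, -0.2]
item_c = [-0.6, 0.1, 0.4]

sim_ab = cosine(item_a, item_b)  # close to +1: candidate positive association
sim_ac = cosine(item_a, item_c)  # close to -1: candidate negative association
```

Both functions divide by vector norms, so an all-zero (or, for `correlation`, constant) vector would need to be filtered out first.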

A Web based Decision Support System for Used Car Pricing

Used-car trade makes up a significant portion of the overall automobile market, and determining the values of the cars is an important problem. This study proposes a new methodology for determining the market value of used cars by observing the classifieds on an e-commerce site. This type of data acquisition plays an important role in building pricing models and conducting further analysis in our approach. In the data acquisition stage, a set of new listings is chosen randomly each day from an e-commerce site (a web site like Craigslist); these listings are then observed until a predetermined period (e.g. thirty days) elapses or until delisting, whichever comes first. The crucial part of our approach is the assumption of a sale event when the listing is no longer available, i.e. delisted from the e-commerce site. The proposed methodology may potentially be used for pricing any used item based on web listings. A web site was developed as a decision support tool that helps clients/users determine the market values of their cars and that can assess the likelihood of selling a particular car at a certain price. We also present the applicability of predictive models to determine the likelihood of selling a car within a thirty-day period based on the price set by the owner.

Linking Behavioral Patterns to Personal Attributes Through Data Re-Mining

Hybridization of population-based ant colony optimization via data mining

Intelligent Data Analysis, Mar 27, 2020

We propose a hybrid application of Population Based Ant Colony Optimization that uses a data mining procedure to wisely initialize the pheromone entries. Hybridization of metaheuristics with data mining techniques has been studied by several researchers in recent years. In this line of research, frequent patterns in a number of initial high-quality solutions are extracted to guide the subsequent iterations of an algorithm, which results in an improvement in solution quality and computational time. Our proposal possesses certain differences from and contributions to existing literature. Instead of one single run that incorporates both the main metaheuristic and the data mining module inside, we propose to carry out independent runs and collect elite sets over these trials. Another contribution is the way we use the knowledge gained from the application of the data mining module. The extracted knowledge is used to initialize the memory model in the algorithm rather than to construct new initial solutions. One additional contribution is the use of a path mining algorithm (a specific sequence mining algorithm) rather than Apriori-like association mining algorithms. Computational experiments, conducted both on symmetric Travelling Salesman Problem and symmetric/asymmetric Quadratic Assignment Problem instances, showed that our proposal produces significantly better results, and is more robust than pure applications of population-based ant colony optimization.
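The idea of seeding pheromone values from elite solutions can be sketched as follows. This is a simplified stand-in, not the paper's procedure: it replaces the path mining step with plain edge-frequency counting over elite tours of a symmetric TSP, and the `base` and `bonus` parameters, the update formula, and the tours themselves are all hypothetical.

```python
from collections import Counter


def edge_frequencies(elite_tours):
    """Count how often each undirected edge appears across the elite tours."""
    counts = Counter()
    for tour in elite_tours:
        # Pair each city with its successor, wrapping around to close the tour.
        for a, b in zip(tour, tour[1:] + tour[:1]):
            counts[frozenset((a, b))] += 1
    return counts


def initial_pheromone(n, elite_tours, base=1.0, bonus=0.5):
    """Build an n x n pheromone matrix, boosting edges common in elite tours."""
    tau = [[base] * n for _ in range(n)]
    for edge, count in edge_frequencies(elite_tours).items():
        i, j = tuple(edge)
        share = count / len(elite_tours)  # fraction of elite tours using the edge
        tau[i][j] = tau[j][i] = base + bonus * share
    return tau


# Hypothetical elite tours collected from independent runs on a 5-city instance.
elite = [[0, 1, 2, 3, 4], [0, 1, 2, 4, 3], [0, 1, 3, 2, 4]]
tau = initial_pheromone(5, elite)
```

Edge (0, 1) appears in all three tours and so receives the full bonus, while an edge that never appears, such as (1, 4), stays at the base level.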

Using constraint programming for the design of network-on-chip architectures

Computing, Oct 11, 2013

NoC technology is composed of packet-based interconnections, where the communication resources are distributed across the network. Therefore, the optimal resource utilization is a crucial consideration for efficient architectural designs. This paper studies the practicality of the Constraint Programming (CP) models for NoC architecture designs that effectively use a regular mesh with wormhole switching and the XY routing. The complexity of the CP models is compared with the earlier Mixed Integer Programming (MIP) models. Practical CP-based mapping and scheduling models are developed and results are reported on the benchmark datasets. Results indicate that mapping and scheduling problems can be solved at near optimality even under relatively shorter run-time limits as compared to those required by the MIP models.
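XY routing on a regular mesh is deterministic: a packet first travels along the X dimension until it reaches the destination column, then along the Y dimension. A minimal sketch, assuming nodes are addressed by hypothetical `(x, y)` integer coordinates:

```python
def xy_route(src, dst):
    """Return the list of mesh nodes visited from src to dst under XY routing."""
    x, y = src
    path = [(x, y)]
    # Phase 1: move along X until the destination column is reached.
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    # Phase 2: move along Y until the destination row is reached.
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
hops = len(route) - 1  # equals the Manhattan distance between the two nodes
```

Because the route is fixed by the endpoints, the links used by each communication follow directly from where the tasks are mapped.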

Re-mining Positive and Negative Association Mining Results

A framework for balanced service and cross-selling by using queuing science

Journal of Intelligent Manufacturing, Jan 8, 2009

Re-mining item associations: Methodology and a case study in apparel retailing

Decision Support Systems, Dec 1, 2011

Association mining is the conventional data mining technique for analyzing market basket data and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques.

Enhancing Product Recommender Systems on Sparse Binary Data

Data Mining and Knowledge Discovery, Sep 1, 2004

A Data Mining-Based Framework for Multi-item Markdown Optimization

Springer series in fashion business, 2018

Markdown decisions in retailing are made based on demand forecasts, which may or may not be accurate in the first place. In this chapter, we propose a framework for forecasting weekly demands of retail items via linear regression models within multi-item groups that incorporate both positive and negative item associations. We then utilize dynamic pricing models to optimize markdown decisions based on the forecasts within multi-item groups. Grouping items can be considered a form of variable selection to prevent overfitting in prediction models. We report regression results from multi-item groupings alongside results from a single-item regression model on a real-world dataset provided by an apparel retailer. We then report markdown optimization results for the single items and for the multi-item groupings that the multi-item forecasting models are built upon. The results show that the regression models provide better estimates within multi-item groups compared to the single-item model. Moreover, the overall revenues achieved in multi-item markdown optimization across all grouping schemes are higher than the total revenue yielded by the single-item markdown optimization scheme.

On Analyzing Web Log Data: A Parallel Sequence Mining Algorithm

DiscoVars: A New Data Analysis Perspective -- Application in Variable Selection for Clustering

arXiv (Cornell University), Apr 8, 2023

We present a new data analysis perspective to determine variable importance regardless of the underlying learning task. Traditionally, variable selection is considered an important step in supervised learning for both classification and regression problems. The variable selection also becomes critical when costs associated with the data collection and storage are considerably high for cases like remote sensing. Therefore, we propose a new methodology to select important variables from the data by first creating dependency networks among all variables and then ranking them (i.e. nodes) by graph centrality measures. Selecting Top-n variables according to preferred centrality measure will yield a strong candidate subset of variables for further learning tasks e.g. clustering. We present our tool as a Shiny app which is a user-friendly interface development environment. We also extend the user interface for two well-known unsupervised variable selection methods from literature for comparison reasons.
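The two-step recipe (build a dependency network among variables, then rank the nodes by a centrality measure) can be sketched in a few lines. The abstract does not specify how dependencies are detected or which centrality is preferred, so this example makes two loud assumptions: an edge is drawn when the absolute Pearson correlation reaches a hypothetical threshold, and nodes are ranked by degree centrality. The data are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rank_by_degree(data, threshold=0.8):
    """Rank variables by degree in a correlation-threshold dependency network."""
    degree = {name: 0 for name in data}
    for a, b in combinations(data, 2):
        # Assumed dependency test: strong linear correlation in either direction.
        if abs(pearson(data[a], data[b])) >= threshold:
            degree[a] += 1
            degree[b] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree, key=lambda name: (-degree[name], name))


# Hypothetical dataset: x1, x2, x3 are strongly related; x4 is mostly noise.
data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],
    "x3": [5, 4, 3, 2, 1],
    "x4": [3, 1, 4, 1, 5],
}
ranking = rank_by_degree(data)
top_2 = ranking[:2]  # a Top-n selection with n = 2
```

Swapping in another dependency test or centrality measure only changes the edge condition and the sort key.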

Research paper thumbnail of An MIP model to schedule the call center workforce and organize the breaks

Nucleation and Atmospheric Aerosols, 2016

In modern economies, companies place a premium on managing their workforce efficiently especially... more In modern economies, companies place a premium on managing their workforce efficiently especially in labor intensive service sector, since the services have become the significant portion of the economies. Tour scheduling is an important tool to minimize the overall workforce costs while satisfying the minimum service level constraints. In this study, we consider the workforce management problem of an inbound call-center while satisfying the call demand within the short time periods with the minimum cost. We propose a mixed-integer programming model to assign workers to the daily shifts, to determine the weekly off-days, and to determine the timings of lunch and other daily breaks for each worker. The proposed model has been verified on the weekly demand data observed at a specific call center location of a satellite TV operator. The model was run on both 15 and 10 minutes demand estimation periods (planning time intervals).

Research paper thumbnail of Used Car Pricing and Beyond: A Survival Analysis Framework

A significant part of the overall automotive market is derived from the used car trade. Determini... more A significant part of the overall automotive market is derived from the used car trade. Determining correctly the used car market values will certainly help achieving fairer trade in many economies. By using the web listings as a proxy data source, we can create some models for the used car pricing based on the asking prices listed in the web adverts. This type of data acquisition requires a thorough data cleaning process to generate dependable statistical models after all. This paper proposes a survival analysis based approach to study the lifetime of the used car listings that can be found at web sites like Craigslist. Pricing models can be easily built to determine the market values of the used-cars from this type of data. One of the most important assumptions in our approach is to consider the delisting of an advert as a sale event. This is also equivalent to the death in the survival analysis context. Since the collected data have labels in terms of sale or not, we can utilize the predictive models to determine whether a particular car at a certain price will be successfully sold or not.

Research paper thumbnail of Using pairwise associations for multi-item markdown optimisation

International Journal of Systems Science: Operations & Logistics, Oct 12, 2015

ABSTRACTForecasting is an essential task conducted regularly by competitive retailers around the ... more ABSTRACTForecasting is an essential task conducted regularly by competitive retailers around the world. Most retail decisions particularly markdowns are made based on the demand forecasts which may or may not be accurate in the first place. In this study, we propose a framework for forecasting weekly demands of retail items via linear regression models within multi-item groups that incorporate both positive and negative item associations. We then utilise dynamic pricing models to optimise markdown decisions based on the forecasts within multi-item groups. Grouping items can be considered as a form of variable selection to prevent the overfitting in prediction models. We report regression results from multi-item groupings besides results from single-item regression model on a real-world data-set provided by an apparel retailer. We then report markdown optimisation results for the single items and multi-item groupings that multi-item forecasting models are built upon. The results show that the regression mo...

Research paper thumbnail of Analyzing Classified Listings at an E-Commerce Site by Using Survival Analysis

Periodicals of Engineering and Natural Sciences (PEN), Oct 2, 2013

Sahibinden.com is a leading e-commerce site in Turkey where sellers (buyers) may advertise their ... more Sahibinden.com is a leading e-commerce site in Turkey where sellers (buyers) may advertise their goods (needs) with or without a fee. Since it generates a large volume of traffic to the classified car listings, the site plays an important role for determining the market value of the used cars. In this study, we first randomly selected 200 car classifieds from 950 new classified ads on the day of February 22, 2012. We then observed these listings on a daily basis for a month to determine the possible updates and deletions of the ads. We assume that if an ad is taken out it means that the car has been sold. In addition to the cars' features, we observed the posted price and the number of daily views of the ads throughout the data collection. Therefore one can construct survival models to study the effects of the features and price of a car on the life of the ad. In other words, it is possible to study that what features and price levels expedite the sales of used cars.

Research paper thumbnail of An Integrated Approach for Shift Scheduling and Rostering Problems with Break Times for Inbound Call Centers

Mathematical Problems in Engineering, Nov 21, 2018

It may be very difficult to achieve the optimal shift schedule in call centers which have highly ... more It may be very difficult to achieve the optimal shift schedule in call centers which have highly uncertain and peaked demand during short time periods. Overlapping shift systems are usually designed for such cases. This paper studies shift scheduling and rostering problems for inbound call centers where overlapping shift systems are used. An integer programming model that determines which shifts to be opened and how many operators to be assigned to these shifts is proposed for the shift scheduling problem. For the rostering problem both integer programming and constraint programming models are developed to determine assignments of operators to all shifts, weekly days-off, and meal and relief break times of the operators. The proposed models are tested on real data supplied by an outsource call center and optimal results are found in an acceptable computation time. An improvement of 15% in the objective function compared to the current situation is observed with the proposed model for the shift scheduling problem. The computational performances of the proposed integer and constraint programming models for the rostering problem are compared using real data observed at a call center and simulated test instances. In addition, benchmark instances are used to compare our Constraint Programming (CP) approach with the existing models. The results of the comprehensive computational study indicate that the constraint programming model runs more efficiently than the integer programming model for the rostering problem. 
The originality of this research can be attributed to two contributions: (a) a model for shift scheduling problem and two models for rostering problem are presented in detail and compared using real data and (b) the rostering problem is considered as a task-resource allocation and considerably shorter computation times are obtained by modeling this new problem via CP.

Research paper thumbnail of webSPADE: a parallel sequence mining algorithm to analyze web log data

In this work we made a study of several other works were the association and sequence mining tech... more In this work we made a study of several other works were the association and sequence mining techniques were applied to the field of web usage mining. This report is to be submitted to classification to the Data Mining course at the phd program "Diseno, Analisis y Aplicaciones de Sistemas Inteligentes", of University of Granada.

Research paper thumbnail of A Framework for Visualizing Association Mining Results

Lecture Notes in Computer Science, 2006

Research paper thumbnail of Analyzing used-car web listings via text mining

Used car trade is one of the major components of the world economies. It is not uncommon to sell ... more Used car trade is one of the major components of the world economies. It is not uncommon to sell a car by placing an internet advertisement irrespective of the geography in these days. A typical content of an advertisement is usually composed of two parts namely the structured and the free text data. The structured data may include some information about the asking price, make, model, year, mileage of the car and the contact info. In most cases, seller may give important clues about the car's current conditions in the free text data where the title (head) of the advertisement can be included as free text too. This paper reports preliminary results from a text mining study conducted on 75K used car internet listings collected from two major car listing web sites in Turkey. As expected, the words and the phrases related to the description of the car are observed to be frequent. The leading concepts in the free text are found to be regarding how to describe the current condition of a car, for example "no crash history".

Research paper thumbnail of Analyzing Price Data to Determine Positive and Negative Product Associations

Lecture Notes in Computer Science, 2009

This paper presents a simple method for mining both positive and negative association rules in da... more This paper presents a simple method for mining both positive and negative association rules in databases using singular value decomposition (SVD) and similarity measures. In literature, SVD is used for summarizing matrices. We use transaction-item price matrix to generate so called ratio rules in the literature. Transaction-item price matrix is formed by using the price data of corresponding items from the sales transactions. Ratio rules are generated by running SVD on transactionitem price matrix. We then use similarity measures on a subset of rules found by Pareto analysis to determine positive and negative associations. The proposed method can present the positive and negative associations with their strengths. We obtain subsequent results using cosine and correlation similarity measures.

Research paper thumbnail of A Web based Decision Support System for Used Car Pricing

Used-car trade has a significant portion in overall automobile market and determining the values ... more Used-car trade has a significant portion in overall automobile market and determining the values of the cars is an important problem. This study proposes a new methodology for determining the market value of the used-cars by observing the classifieds in an e-commerce site. This type of data acquisition plays an important role to build pricing models and to conduct further analysis in our approach. In data acquisition stage, a set of new listings are chosen randomly each day from an ecommerce site (a web site like Craigslist), then these listings are observed until a predetermined period (e.g. thirty days) or delisting time, whichever comes first. The crucial part of our approach is the assumption of a sale event when the listing is no longer available i.e. delisted from the e-commerce site. The proposed methodology may potentially be used for pricing any used item based on the web listings. A web site was developed to help clients/users for determining the market values of their cars as a decision support tool that can assess the likelihood of selling a particular car at a certain price. We also presented the applicability of predictive models to determine the likelihood of selling a car within thirty day period based on the price set by the owner.

Research paper thumbnail of Linking Behavioral Patterns to Personal Attributes Through Data Re-Mining

Research paper thumbnail of Hybridization of population-based ant colony optimization via data mining

Intelligent Data Analysis, Mar 27, 2020

We propose a hybrid application of Population Based Ant Colony Optimization that uses a data mini... more We propose a hybrid application of Population Based Ant Colony Optimization that uses a data mining procedure to wisely initialize the pheromone entries. Hybridization of metaheuristics with data mining techniques has been studied by several researchers in recent years. In this line of research, frequent patterns in a number of initial high-quality solutions are extracted to guide the subsequent iterations of an algorithm, which results in an improvement in solution quality and computational time. Our proposal possesses certain differences from and contributions to existing literature. Instead of one single run that incorporates both the main metaheuristic and the data mining module inside, we propose to carry out independent runs and collect elite sets over these trials. Another contribution is the way we use the knowledge gained from the application of the data mining module. The extracted knowledge is used to initialize the memory model in the algorithm rather than to construct new initial solutions. One additional contribution is the use of a path mining algorithm (a specific sequence mining algorithm) rather than Apriori-like association mining algorithms. Computational experiments, conducted both on symmetric Travelling Salesman Problem and symmetric/asymmetric Quadratic Assignment Problem instances, showed that our proposal produces significantly better results, and is more robust than pure applications of population-based ant colony optimization.

Using constraint programming for the design of network-on-chip architectures

Computing, Oct 11, 2013

NoC technology is composed of packet-based interconnections, where the communication resources are distributed across the network. Therefore, optimal resource utilization is a crucial consideration for efficient architectural designs. This paper studies the practicality of Constraint Programming (CP) models for NoC architecture designs that use a regular mesh with wormhole switching and XY routing. The complexity of the CP models is compared with that of the earlier Mixed Integer Programming (MIP) models. Practical CP-based mapping and scheduling models are developed, and results are reported on the benchmark datasets. Results indicate that mapping and scheduling problems can be solved to near optimality even under shorter run-time limits than those required by the MIP models.

Re-mining Positive and Negative Association Mining Results

A framework for balanced service and cross-selling by using queuing science

Journal of Intelligent Manufacturing, Jan 8, 2009

Re-mining item associations: Methodology and a case study in apparel retailing

Decision Support Systems, Dec 1, 2011

Association mining is the conventional data mining technique for analyzing market basket data and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques.

Enhancing Product Recommender Systems on Sparse Binary Data

Data Mining and Knowledge Discovery, Sep 1, 2004

A Data Mining-Based Framework for Multi-item Markdown Optimization

Springer series in fashion business, 2018

Markdown decisions in retailing are made based on demand forecasts, which may or may not be accurate in the first place. In this chapter, we propose a framework for forecasting weekly demands of retail items via linear regression models within multi-item groups that incorporate both positive and negative item associations. We then utilize dynamic pricing models to optimize markdown decisions based on the forecasts within multi-item groups. Grouping items can be considered a form of variable selection that prevents overfitting in prediction models. We report regression results from multi-item groupings alongside results from a single-item regression model on a real-world dataset provided by an apparel retailer. We then report markdown optimization results for the single items and for the multi-item groupings on which the multi-item forecasting models are built. The results show that the regression models provide better estimates within multi-item groups than the single-item model does. Moreover, the overall revenues achieved in multi-item markdown optimization across all grouping schemes are higher than the total revenue yielded by the single-item markdown optimization scheme.

On Analyzing Web Log Data: A Parallel Sequence Mining Algorithm

DiscoVars: A New Data Analysis Perspective -- Application in Variable Selection for Clustering

arXiv (Cornell University), Apr 8, 2023

We present a new data analysis perspective to determine variable importance regardless of the underlying learning task. Traditionally, variable selection is considered an important step in supervised learning for both classification and regression problems. Variable selection also becomes critical when the costs associated with data collection and storage are considerably high, as in remote sensing. Therefore, we propose a new methodology to select important variables from the data by first creating dependency networks among all variables and then ranking the variables (i.e. nodes) by graph centrality measures. Selecting the top-n variables according to a preferred centrality measure yields a strong candidate subset of variables for further learning tasks, e.g. clustering. We present our tool as a Shiny app; Shiny is a user-friendly interface development environment. We also extend the user interface to two well-known unsupervised variable selection methods from the literature for comparison.
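The two-step idea described here (build a dependency network over the variables, then rank nodes by centrality and keep the top n) can be sketched in a few lines of Python. The abstract does not say how dependencies are detected or which centrality measure is preferred, so thresholded absolute Pearson correlation and degree centrality are assumptions chosen for brevity; the original tool is a Shiny app, and none of the names below come from it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear dependency
    return sxy / sqrt(sxx * syy)


def dependency_network(columns, threshold=0.5):
    """Link two variables when |correlation| >= threshold (assumed rule)."""
    names = list(columns)
    edges = {name: set() for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if abs(pearson(columns[u], columns[v])) >= threshold:
                edges[u].add(v)
                edges[v].add(u)
    return edges


def top_n_by_degree(edges, n):
    """Rank variables by degree centrality; ties are broken by name."""
    ranked = sorted(edges, key=lambda name: (-len(edges[name]), name))
    return ranked[:n]
```

Swapping in another dependency test or another centrality measure only changes `dependency_network` or the sort key, which is the main reason for keeping the two steps separate in this sketch.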