H. Altay Güvenir - Profile on Academia.edu

Papers by H. Altay Güvenir

Movie Trailer Scene Classification Based on Audio VGGish Features

2022 International Conference on Machine Learning, Control, and Robotics (MLCR)

Abstract 18783: Hospitalization for Atrial Fibrillation Increases in the Elderly: Recent Analysis From TuRkish Atrial Fibrillation Data Base

Circulation, 2013

Objective: Atrial fibrillation (AF) is the most prevalent sustained cardiac arrhythmia and constitutes a major public health problem. Patients with AF often have a variety of comorbidities and require frequent hospitalization. This retrospective cohort study used medical claims data to evaluate hospitalization rates in patients with AF in Turkey. Methods: We analyzed the records of patients over the age of 18 with a diagnosis of non-valvular AF (ICD-10 code I48) from MEDULA, a claims and utilization management system that has processed claims for all health insurance funds in Turkey since 2007. MEDULA covers close to 100% of the population, comprises pharmacy, inpatient, outpatient, and laboratory claims, and spans 23,500 pharmacies, 20,000 general practitioners, 850 government hospitals, 60 university hospitals, and 500 private hospitals. All data used in this study were fully anonymized. Results: Of an eligible study popu...

Investigating the Validity of Ground Truth in Code Reviewer Recommendation Studies

2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2019

Background: Selecting the ideal code reviewer is a crucial first step toward effective modern code review. Several algorithms have been proposed in the literature for recommending the ideal code reviewer for a given pull request. Their success is measured by comparing the recommended reviewers with the ground truth, i.e., the reviewers actually assigned in real life. In practice, however, the assigned reviewer may not be the ideal reviewer for a given pull request. Aims: In this study, we investigate the validity of ground truth data in code reviewer recommendation studies. Method: Through an informal literature review, we compared the reviewer selection heuristics used in real life with the algorithms used in recommendation models. We further support our claims with empirical data from code reviewer recommendation studies. Results: The literature review and accompanying empirical data show that the ground truth data used in code reviewer recommendation studies is potentially problematic, which reduces the validity of code reviewer datasets and of reviewer recommendation studies. We demonstrated cases where the ground truth in these studies is invalid and discussed potential solutions to address this issue.
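
The evaluation described in this abstract can be made concrete with a top-k accuracy score, a metric commonly used in reviewer recommendation work; whether the studies examined here use exactly this metric is an assumption, and the function name and inputs are hypothetical. The sketch shows why the ground truth matters: the score only measures agreement with the reviewers who were actually assigned, so if those were not the ideal reviewers, a high score does not by itself imply good recommendations.

```python
from typing import Sequence, Set


def top_k_accuracy(recommended: Sequence[Sequence[str]],
                   assigned: Sequence[Set[str]], k: int) -> float:
    """Fraction of pull requests whose top-k recommended reviewers include at
    least one reviewer that was actually assigned (the 'ground truth')."""
    if len(recommended) != len(assigned):
        raise ValueError("one recommendation list per pull request is required")
    if not recommended:
        return 0.0
    hits = sum(1 for recs, truth in zip(recommended, assigned)
               if truth.intersection(recs[:k]))
    return hits / len(recommended)
```

For example, with top recommendations ["alice", "bob", "carol"] and ["dave", "erin"] for two pull requests, and assigned reviewers {"bob"} and {"frank"}, top-2 accuracy is 0.5 and top-1 accuracy is 0.0.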

Türkçe İçin Bir Bağ Grameri (A Link Grammar for Turkish)

Syntactic parsing, or syntactic analysis, is the process of analyzing an input sequence to determine its grammatical structure, i.e., the formal relationships between the words of a sentence, with respect to a given grammar. In this thesis, we developed a grammar of Turkish in the link grammar formalism. The grammar uses the output of a fully described morphological analyzer, which is very important for agglutinative languages like Turkish. The grammar is lexical in the sense that it uses the lexemes of only some function words; for the remaining word classes it uses morphological feature structures. In addition, our system preserves some of the syntactic roles of the intermediate derived forms of words.

Modeling interestingness of streaming classification rules as a classification problem

Inducing classification rules on domains from which information is gathered at regular periods generally yields so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information; these rules therefore stream through time and are called streaming classification rules. In this paper, an interactive classification rules' interestingness learning algorithm (ICRIL) is developed to automatically label classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFFP (Voting Fuzzified Feature Projections), a feature projection based incremental classification algorithm, is also developed in the framework of ICRIL. The concept description learned by VFFP is the interestingness concept of streaming classification rules.

Multicriteria inventory classification using a genetic algorithm

European Journal of Operational Research, 1998

One of the application areas of genetic algorithms is parameter optimization. This paper addresses the problem of optimizing a set of parameters that represent the weights of criteria, where the weights sum to 1. A chromosome represents the values of the weights, possibly along with some cut-off points. A new crossover operation, called continuous uniform crossover, is proposed that always produces valid chromosomes from valid parent chromosomes. The new crossover technique is applied to the problem of multicriteria inventory classification, and the results are compared with the classical inventory classification technique based on the Analytical Hierarchy Process. © 1998 Elsevier Science B.V.
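
The abstract does not define continuous uniform crossover, so the sketch below does not reproduce it. It shows a simpler, well-known operator with the same validity property, arithmetic (blend) crossover: the child is a convex combination of the two parents using one random coefficient, so if both parents hold non-negative weights summing to 1, so does the child. Cut-off points, which the paper's chromosomes may also carry, are not handled, and the function name is hypothetical.

```python
import random
from typing import List, Optional


def arithmetic_crossover(p1: List[float], p2: List[float],
                         rng: Optional[random.Random] = None) -> List[float]:
    """Return lam * p1 + (1 - lam) * p2 with a single random lam in [0, 1).

    Using the same lam for every gene keeps the child's weights non-negative
    and summing to 1 (up to rounding) whenever both parents' weights do."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same number of weights")
    rng = rng if rng is not None else random.Random()
    lam = rng.random()
    return [lam * a + (1.0 - lam) * b for a, b in zip(p1, p2)]
```

A single shared coefficient is what preserves the constraint; drawing an independent coefficient per gene would generally break the sum-to-1 property and require renormalization.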

Reply to Nezic et al

European Journal of Cardio-Thoracic Surgery, 2012

We appreciate the comments by Nezic et al. [1] regarding our article [2]. Nezic et al. pointed out the pinholes of the everting mattress sutures on the flange as potential sites of bleeding. However, we considered this problem rare because a recent gelatin-sealed woven Dacron graft was used. In addition, after sewing the Valsalva sinus skirt to the residual aortic wall, we injected fibrin glue around the proximal suture line to prevent bleeding from the pinholes. In fact, no patient required re-exploration for bleeding. Therefore, we think that the flange-side pinholes of everting mattress sutures are not potential sites of bleeding. As stated in their letter, sewing the residual aortic wall to the prosthetic sewing cuff or the prosthetic tube is a simple technique [3]. However, with their technique it is difficult to wrap the entire proximal anastomosis line tightly when the residual aortic wall is thin and fragile or too short to reach the cuff after resection of the coronary ostia. To prevent this problem, Chen et al. [4] made the corresponding portions of the flange long enough to sew the residual aortic wall to the flange. We used the Valsalva sinus skirt, which is soft and stretchable, as the flange. This property makes it possible to sew the skirt tightly to the residual aortic wall even in such cases.

A Negotiation Platform for Cooperating Multi-agent Systems

Concurrent Engineering, 1993

Distributed artificial intelligence attempts to integrate and coordinate the activities of multiple intelligent problem solvers that come together to solve complex tasks in domains such as design, medical diagnosis, and business management. Because the agents have different goals, knowledge, and viewpoints, conflicts may arise at any phase of the problem-solving process. Managing diverse knowledge requires well-organized models of conflict resolution. In this paper, a system for cooperating intelligent agents that openly supports multi-agent conflict detection and resolution is described. The system is based on two insights: first, that each agent has its own conflict knowledge, separate from its domain-level knowledge; and second, that each agent has its own conflict management knowledge, which is not accessible to or known by others. Furthermore, there are no globally known conflict-resolution strategies. Each agent involved in a conflict chooses a resolution s...

An Eager Regression Method Based on Selecting Appropriate Features

This paper describes a machine learning method called Regression by Selecting Best Features (RSBF). RSBF consists of two phases. The first phase aims to find the predictive power of each feature by constructing simple linear regression lines, one per continuous ...
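
Only the first phase of RSBF is visible in the truncated abstract. A minimal sketch of that phase, assuming a feature's predictive power is measured by the mean squared error of its own least-squares line (the paper's actual criterion is not shown here): fit one simple linear regression per continuous feature and rank the features by error. The function names and toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0  # a constant feature gets a flat line
    return slope, my - slope * mx


def rank_features(columns: Dict[str, List[float]],
                  y: List[float]) -> List[Tuple[str, float]]:
    """Fit one line per feature; return (feature, MSE) pairs, best (lowest MSE) first."""
    scores = []
    for name, x in columns.items():
        slope, intercept = fit_line(x, y)
        mse = sum((yi - (slope * xi + intercept)) ** 2
                  for xi, yi in zip(x, y)) / len(y)
        scores.append((name, mse))
    return sorted(scores, key=lambda item: item[1])
```

For instance, with target [1, 3, 5, 7], a feature [0, 1, 2, 3] is fitted exactly (MSE 0.0), while a feature [1, 0, 1, 0] leaves an MSE of 4.0, so the first feature ranks higher.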

Proceedings of the 8th European conference on Advances in Case-Based Reasoning

Konum Önerisi için Zaman Tabanlı Uzman Destekli İşbirliğine Dayalı Filtreleme (Time-Based Expert-Supported Collaborative Filtering for Location Recommendation)

Over the last decade, location-based social networks have developed considerably, giving us a new platform for investigating users' preferences based on their location histories. Most location-based social networks provide a variety of venues, organized under a category hierarchy, at which users can announce their presence, comment, or leave tips. Although geographically informed location recommendation has attracted many researchers, most research projects have ignored the effect of time on users' preferences. Because a user may prefer to visit different venues at different times of day, two users with the same number of check-ins in a given category may be less similar depending on when they were at the venue. Moreover, while traditional collaborative filtering techniques take into account the preferences of all users, by considering only the preferences of category experts, ... in that category ...
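
The motivating observation in this abstract (two users with the same number of check-ins in a category can turn out less similar once the time of the visits is considered) can be illustrated with a small, hypothetical sketch. It assumes check-ins are (category, hour) pairs, groups hours into fixed slots (six hours by default, an arbitrary choice), and compares users by cosine similarity. The paper's actual similarity measure and its expert-only neighbor selection are not given in the truncated abstract and are not shown.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

CheckIn = Tuple[str, int]  # hypothetical record: (venue category, hour of day 0-23)


def profile(checkins: Iterable[CheckIn], slot_hours: int = 6,
            use_time: bool = True) -> Counter:
    """Count check-ins per category, or per (category, time slot) when use_time is True."""
    counts: Counter = Counter()
    for category, hour in checkins:
        key = (category, hour // slot_hours) if use_time else (category,)
        counts[key] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Two users who each checked in twice at cafés, one at 08:00 and 09:00 and the other at 20:00 and 21:00, get a similarity of 1.0 when time is ignored and 0.0 when time slots are used.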

Learning Interestingness of Streaming Classification Rules

Lecture Notes in Computer Science, 2004

Inducing classification rules on domains from which information is gathered at regular periods generally yields so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information; these rules therefore stream through time and are called streaming classification rules. In this paper, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFP (Voting Feature Projections), a feature projection based incremental classification learning algorithm, is also developed in the framework of IRIL. The concept description learned by the VFP algorithm constitutes a novel approach for interestingness analysis of streaming classification rules.
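
The abstract does not describe VFP's internals beyond its being an incremental, feature projection based learner. As a rough illustration only, the sketch below keeps, for each feature, the class counts observed for each feature value, updates them one labeled rule at a time, and classifies by letting each feature vote with its normalized class distribution. Describing a rule by categorical features such as support or novelty levels is an assumption, and the class and feature names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable


class IncrementalFeatureVoter:
    """Simplified incremental voting over per-feature value projections."""

    def __init__(self) -> None:
        # counts[feature][value][label] -> number of labeled examples seen so far
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.labels = set()

    def update(self, example: Dict[str, Hashable], label: str) -> None:
        """Incorporate one labeled example without revisiting earlier ones."""
        self.labels.add(label)
        for feature, value in example.items():
            self.counts[feature][value][label] += 1

    def predict(self, example: Dict[str, Hashable]) -> str:
        """Each feature votes with the class distribution observed for its value."""
        if not self.labels:
            raise ValueError("no labeled examples yet")
        votes = {label: 0.0 for label in sorted(self.labels)}
        for feature, value in example.items():
            # .get avoids creating empty entries for unseen features or values
            dist = self.counts.get(feature, {}).get(value, {})
            total = sum(dist.values())
            if total == 0:
                continue  # value never seen for this feature: the feature abstains
            for label, count in dist.items():
                votes[label] += count / total
        return max(votes, key=votes.get)
```

Because training only increments counts, a newly labeled rule can be absorbed immediately, which fits the streaming, limited-interaction setting the abstract describes.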

Classification by Feature Partitioning

Machine Learning, Apr 1, 1996

This paper presents a new form of exemplar-based learning, based on a representation scheme called feature partitioning, and a particular implementation of this technique called CFP (for Classification by Feature Partitioning). Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint sets of values called segments. A segment is expanded through generalization or specialized by dividing it into sub-segments. Classification is based on a weighted voting among the individual predictions of the features, which are simply the class values of the segments corresponding to the values of a test instance for each feature. An empirical evaluation of CFP and its comparison with two other classification techniques that consider each feature separately are given.
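
The classification step described in this abstract can be sketched as follows, assuming the segments have already been learned and are stored as closed, disjoint intervals per feature, each labeled with a class, and that feature weights are given. How CFP learns segments (generalization and specialization) and how it sets the weights are not shown; all names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Segment = Tuple[float, float, str]  # closed interval [low, high] and its class label


def classify(segments: Dict[str, List[Segment]], weights: Dict[str, float],
             instance: Dict[str, float]) -> Optional[str]:
    """Each feature predicts the class of the segment containing the instance's value;
    the predictions are combined by weighted voting. Returns None if no feature votes."""
    votes: Dict[str, float] = defaultdict(float)
    for feature, value in instance.items():
        for low, high, label in segments.get(feature, []):
            if low <= value <= high:
                votes[label] += weights.get(feature, 1.0)  # unweighted features count 1.0
                break  # segments of one feature are disjoint, so at most one matches
    return max(votes, key=votes.get) if votes else None
```

A feature whose value falls outside every stored segment simply does not vote, so an instance can still be classified from the remaining features.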

A Discretization Method Based on Maximizing the Area Under Receiver Operating Characteristic Curve

International Journal of Pattern Recognition and Artificial Intelligence, Feb 1, 2013

Many machine learning algorithms require the features to be categorical; hence, all numeric-valued data must be discretized into intervals. In this paper, we present a new discretization method based on the area under the receiver operating characteristic (ROC) curve (AUC). Maximum area under ROC curve-based discretization (MAD) is a global, static, and supervised discretization method. MAD uses the sorted order of the continuous values of a feature and discretizes the feature so that the AUC based on that feature is maximized. The proposed method is compared with alternative discretization methods such as ChiMerge, Entropy-Minimum Description Length Principle (MDLP), Fixed Frequency Discretization (FFD), and Proportional Discretization (PD). FFD and PD were recently proposed and are designed for Naïve Bayes learning. Like MAD, ChiMerge is a merging discretization method. Evaluations are performed in terms of the M-Measure, an AUC-based metric for multi-class classification, and of accuracy values obtained from the Naïve Bayes and Aggregating One-Dependence Estimators (AODE) algorithms on real-world datasets. Empirical results show that MAD is a strong candidate as an alternative to other discretization methods.
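
The abstract states the criterion MAD maximizes but not its search procedure. The sketch below shows only that criterion for a two-class problem: the AUC of a feature, computed as the probability that a randomly chosen positive example has a higher value than a randomly chosen negative one (ties count one half), and how the choice of cut points changes the AUC of the discretized feature. Using the interval index as the score assumes higher intervals favor the positive class; the multi-class M-Measure is not shown, and the function names are hypothetical.

```python
import bisect
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Two-class AUC: P(score of a random positive > score of a random negative),
    counting ties as 1/2. Pairwise O(P * N) form, chosen for clarity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def discretize(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Map each value to the index of its interval; cuts must be sorted ascending."""
    return [bisect.bisect_right(cuts, v) for v in values]
```

With values [1, 2, 3, 4] and labels [0, 0, 1, 1], the raw feature and the discretization with a cut at 2.5 both have an AUC of 1.0, whereas a cut at 1.5 lowers it to 0.75.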

Learning Translation Rules From A Bilingual Corpus

arXiv (Cornell University), Jul 26, 1996

This paper proposes a mechanism for learning pattern correspondences between two languages from a corpus of translated sentence pairs. The mechanism uses analogical reasoning between two translations: given a pair of translations, the similar parts of the sentences in the source language must correspond to the similar parts of the sentences in the target language. Similarly, the different parts should correspond to the respective parts in the translated sentences. The correspondences between the similarities, and also between the differences, are learned in the form of translation rules. The system was tested on a small training dataset and produced promising results for further investigation.
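
A minimal sketch of the similarity/difference idea, restricted to the simplest case where each sentence pair differs in exactly one place. It uses Python's `difflib.SequenceMatcher` to split token sequences into shared and differing parts, then pairs the differing source part with the differing target part. This is an illustration under those assumptions, not the paper's mechanism; the function names and example sentences are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Diff = Tuple[List[str], List[str]]


def split_parts(a: List[str], b: List[str]) -> Tuple[List[str], List[Diff]]:
    """Split two token lists into their shared tokens and (a-part, b-part) differences."""
    same: List[str] = []
    diffs: List[Diff] = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag == "equal":
            same.extend(a[i1:i2])
        else:  # "replace", "delete" or "insert"
            diffs.append((a[i1:i2], b[j1:j2]))
    return same, diffs


def learn_rules(src1: str, src2: str, tgt1: str, tgt2: str) -> List[Tuple[str, str]]:
    """Pair differing source parts with differing target parts, but only when each
    side differs in exactly one place (the simplest analogical case)."""
    _, src_diffs = split_parts(src1.split(), src2.split())
    _, tgt_diffs = split_parts(tgt1.split(), tgt2.split())
    if len(src_diffs) != 1 or len(tgt_diffs) != 1:
        return []
    (s1, s2), (t1, t2) = src_diffs[0], tgt_diffs[0]
    return [(" ".join(s1), " ".join(t1)), (" ".join(s2), " ".join(t2))]
```

Given "I drank tea" / "I drank coffee" and the Turkish translations "çay içtim" / "kahve içtim", the sketch learns the correspondences (tea, çay) and (coffee, kahve); the shared parts ("I drank" and "içtim") could be paired the same way.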

Distributed artificial intelligence in engineering design

Distributed Artificial Intelligence (DAI) research is concerned with solving problems using both AI techniques and distributed processing. DAI techniques can be used to solve many complex problems, such as engineering design, interpretation, and planning problems. In this paper, we first survey the DAI approach to problem solving and then describe how a DAI tool, CEF, is used to implement STEAMER, a system for designing steam condensers.

Application of the RIMARC Algorithm to a Large Data Set of Action Potentials and Clinical Parameters for Risk Prediction of Atrial Fibrillation

Biophysical Journal, 2015

Feature Dependency in Benefit Maximization: A Case Study in the Evaluation of Bank Loan Applications

In most real-world domains, the benefits and costs of classifications can depend on the characteristics of individual examples. In such cases, no static benefit matrix is available for the domain, and the benefit of each classification is calculated separately. This situation, called feature dependency, is evaluated in the framework of our newly proposed classification algorithm, Benefit Maximizing classifier with Feature Intervals (BMFI), which uses a feature projection based knowledge representation. The new approach has been evaluated on bank loan applications, and experimental results are presented.
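
Feature dependency means the benefit of a decision is computed from each example rather than read from a fixed matrix. The sketch below illustrates only that idea, not BMFI itself: given class probabilities from some classifier, it picks the decision with the highest expected benefit, where each benefit entry is a function of the example. The loan benefit figures (10% interest, a fixed handling cost of 20, loss of the full amount on default) are invented for illustration.

```python
from typing import Callable, Dict, Mapping, Tuple

Example = Mapping[str, float]
# (decision, true class) -> benefit computed from the example itself
BenefitTable = Dict[Tuple[str, str], Callable[[Example], float]]


def best_decision(example: Example, class_probs: Mapping[str, float],
                  benefit: BenefitTable) -> str:
    """Return the decision with the highest expected benefit for this example."""
    decisions = sorted({decision for decision, _ in benefit})

    def expected(decision: str) -> float:
        return sum(p * benefit[(decision, cls)](example)
                   for cls, p in class_probs.items())

    return max(decisions, key=expected)


# Hypothetical loan benefits: 10% interest, a fixed handling cost of 20,
# and loss of the full amount on default.
loan_benefit: BenefitTable = {
    ("approve", "repays"): lambda x: 0.1 * x["amount"] - 20.0,
    ("approve", "defaults"): lambda x: -x["amount"] - 20.0,
    ("reject", "repays"): lambda x: 0.0,
    ("reject", "defaults"): lambda x: 0.0,
}
```

With a 95% repayment probability, a loan of 1000 is approved (expected benefit 25) while a loan of 100 is rejected (expected benefit -15.5), even though the classifier output is identical.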

Feature Projection Based Rule Classification

Due to the growth of data mining research and applications, selecting the interesting rules among a huge number of learned rules is an important task in data mining applications. In this paper, metrics for the interestingness of a rule are investigated, and an algorithm that classifies learned rules according to their interestingness is developed. Classification algorithms are designed to maximize the number of correctly classified instances, given a set of unseen test cases. Furthermore, feature projection based classification algorithms have been tested and shown to be successful in a large number of real domains. Therefore, in this work, a feature projection based classification algorithm, VFI (Voting Feature Intervals), is adapted to the rule interestingness problem, and the FPRC (Feature Projection Based Rule Classification) algorithm is developed.
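
A simplified sketch of the voting feature intervals idea, for numeric features only: for each feature, the lowest and highest values of every class become end points; a test value falls either on an end point or in the open range between two of them, and that region votes for each class in proportion to the class-size-normalized share of training values it contains. The published VFI algorithm treats end points and interval counts in more detail, so this is an approximation, and the class name `SimpleVFI` is hypothetical.

```python
import bisect
from collections import Counter, defaultdict
from typing import Callable, List, Sequence


class SimpleVFI:
    """Simplified voting over class end-point intervals, numeric features only."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "SimpleVFI":
        self.classes = sorted(set(y))
        self.class_size = Counter(y)
        self.labels = list(y)
        self.columns = list(zip(*X))  # one tuple of training values per feature
        self.points: List[List[float]] = []
        for col in self.columns:
            ends = set()
            for c in self.classes:  # lowest and highest value of each class
                vals = [v for v, lab in zip(col, self.labels) if lab == c]
                ends.update((min(vals), max(vals)))
            self.points.append(sorted(ends))
        return self

    def _region(self, f: int, v: float) -> Callable[[float], bool]:
        """Membership test for the end point or open range of feature f containing v."""
        pts = self.points[f]
        i = bisect.bisect_left(pts, v)
        if i < len(pts) and pts[i] == v:
            return lambda x: x == v
        lo = pts[i - 1] if i > 0 else float("-inf")
        hi = pts[i] if i < len(pts) else float("inf")
        return lambda x: lo < x < hi

    def predict(self, x: Sequence[float]) -> str:
        votes = defaultdict(float)
        for f, v in enumerate(x):
            inside = self._region(f, v)
            # share of each class's training values that fall in the region
            share = {c: 0.0 for c in self.classes}
            for val, lab in zip(self.columns[f], self.labels):
                if inside(val):
                    share[lab] += 1.0 / self.class_size[lab]
            total = sum(share.values())
            if total > 0:  # otherwise the feature abstains
                for c in self.classes:
                    votes[c] += share[c] / total
        return max(self.classes, key=lambda c: votes[c])
```

A feature whose values are spread across both classes casts a split vote, so an informative feature usually dominates the prediction.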

Feature Construction and its Application to Predicting Financial Distress

Voting Features based Classifiers (VFC for short) have been shown to perform well on most real-world data sets. They are robust to irrelevant features and missing feature values. In this paper, we introduce an extension to VFC, called Voting Features based Classifier with feature Construction (VFCC), and show its application to predicting whether a bank will encounter financial distress by analyzing its current financial statements. The previously developed VFC learns a set of rules, each containing a single condition based on a single feature in its antecedent. The VFCC algorithm proposed in this work, in contrast, constructs rules whose antecedents may contain conjuncts based on several features. Experimental results on recent financial ratios of banks in Turkey show that VFCC achieves better accuracy than other well-known rule learning classification algorithms.

Research paper thumbnail of Movie Trailer Scene Classification Based on Audio VGGish Features

Movie Trailer Scene Classification Based on Audio VGGish Features

2022 International Conference on Machine Learning, Control, and Robotics (MLCR)

Research paper thumbnail of Abstract 18783: Hospitalization for Atrial Fibrillation Increases in the Elderly: Recent Analysis From TuRkish Atrial Fibrillation Data Base

Abstract 18783: Hospitalization for Atrial Fibrillation Increases in the Elderly: Recent Analysis From TuRkish Atrial Fibrillation Data Base

Circulation, 2013

Objective: Atrial fibrillation (AF) is the most prevalent sustained cardiac arrhytmia and constit... more Objective: Atrial fibrillation (AF) is the most prevalent sustained cardiac arrhytmia and constitutes a major public health problem. Patients with AF often have a variety of co-morbidities and need frequent hospitalizations. The present retrospective cohort study used medical claims data to evaluate the rates of hospitalization in patients with AF in Turkey. Methods: We analyzed the records of patients over the age 18 who had the diagnosis of non-valvular atrial fibrillation (AF) according to ICD-10 code I48 from a claims and utilization management system called MEDULA which processes claims for all health insurance funds in Turkey since 2007. Covering close to 100 % of the population, MEDULA is comprised of pharmacy, inpatient, outpatient and laboratory claims and covers 23,500 pharmacies, 20,000 general practitioners, 850 government hospitals, 60 university hospitals and 500 private hospitals. In this study we have used completely anonymized data Results: Of an eligible study popu...

Research paper thumbnail of Investigating the Validity of Ground Truth in Code Reviewer Recommendation Studies

2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2019

Background: Selecting the ideal code reviewer in modern code review is a crucial first step to pe... more Background: Selecting the ideal code reviewer in modern code review is a crucial first step to perform effective code reviews. There are several algorithms proposed in the literature for recommending the ideal code reviewer for a given pull request. The success of these code reviewer recommendation algorithms is measured by comparing the recommended reviewers with the ground truth that is the assigned reviewers selected in real life. However, in practice, the assigned reviewer may not be the ideal reviewer for a given pull request. Aims: In this study, we investigate the validity of ground truth data in code reviewer recommendation studies. Method: By conducting an informal literature review, we compared the reviewer selection heuristics in real life and the algorithms used in recommendation models. We further support our claims by using empirical data from code reviewer recommendation studies. Results: By literature review, and accompanying empirical data, we show that ground truth data used in code reviewer recommendation studies is potentially problematic. This reduces the validity of the code reviewer datasets and the reviewer recommendation studies. We demonstrated the cases where the ground truth in code reviewer recommendation studies are invalid and discussed the potential solutions to address this issue.

Research paper thumbnail of Türkçe İçi̇n Bi̇r Bağ Grameri̇

Syntactic parsing, or syntactic analysis, is the process of analyzing an input sequence in order ... more Syntactic parsing, or syntactic analysis, is the process of analyzing an input sequence in order to determine its grammatical structure, i.e. the formal relationships between the words of a sentence, with respect to a given grammar. In this thesis, we developed the grammar of Turkish language in the link grammar formalism. In the grammar, we used the output of a fully described morphological analyzer, which is very important for agglutinative languages like Turkish. The grammar that we developed is lexical such that we used the lexemes of only some function words and for the rest of the word classes we used the morphological feature structures. In addition, we preserved the some of the syntactic roles of the intermediate derived forms of words in our system.

Research paper thumbnail of Modeling interestingness of streaming classification rules as a classification problem

Inducing classification rules on domains from which information is gathered at regular periods le... more Inducing classification rules on domains from which information is gathered at regular periods lead the number of such classification rules to be generally so huge that selection of interesting ones among all discovered rules becomes an important task. At each period, using the newly gathered information from the domain, the new classification rules are induced. Therefore, these rules stream through time and are so called streaming classification rules. In this paper, an interactive classification rules' interestingness learning algorithm (ICRIL) is developed to automatically label the classification rules either as "interesting" or "uninteresting" with limited user interaction. In our study, VFFP (Voting Fuzzified Feature Projections), a feature projection based incremental classification algorithm, is also developed in the framework of ICRIL. The concept description learned by the VFFP is the interestingness concept of streaming classification rules.

Research paper thumbnail of Multicriteria inventory classification using a genetic algorithm

European Journal of Operational Research, 1998

One of the application areas of genetic algorithms is parameter optimization. This paper addresse... more One of the application areas of genetic algorithms is parameter optimization. This paper addresses the problem of optimizing a set of parameters that represent the weights of criteria, where the sum of all weights is 1. A chromosome represents the values of the weights, possibly along with some cut-off points. A new crossover operation, called continuous uniform crossover, is proposed, such that it produces valid chromosomes given that the parent chromosomes are valid. The new crossover technique is applied to the problem of multicriteria inventory classification. The results are compared with the classical inventory classification technique using the Analytical Hierarchy Process. @ 1998 Elsevier Science B.V.

Research paper thumbnail of Reply to Nezic et al

Reply to Nezic et al

European Journal of Cardio-Thoracic Surgery, 2012

We appreciate the comments by Nezic et al. [1] regarding our article [2]. Nezic et al. pointed ou... more We appreciate the comments by Nezic et al. [1] regarding our article [2]. Nezic et al. pointed out the pinholes of the everting mattress sutures on the flange as the potential sites of bleeding. However, a recent Gelatin-sealed woven Dacron graft was considered because the problem was rare. In addition, we injected fibrin glue around the proximal suture line after sewing the Valsalva sinus skirt to the residual aortic wall to prevent bleeding of the pinholes. Actually, no patient required re-exploration for bleeding. Therefore, we think that the flange side pinholes of everting mattress sutures are not the potential sites of bleeding. As stated in their letter, sewing the residual aortic wall to the prosthetic sewing cuff or the prosthetic tube is a simple technique [3]. However, their technique is difficult to wrap tightly all of the proximal anastomosis line in case with the thin and fragile residual aortic wall or the inadequate length to contact the cuff after resection of coronary ostia. To prevent this problem, Chen et al. [4] made the corresponding portions of the flange long enough in order to sew between the residual aortic wall and the flange. We used the Valsalva sinus skirt as the flange, which is soft and stretchable. This characteristic is useful to sew tightly the skirt to the residual aortic wall even in the cases.

Research paper thumbnail of A Negotiation Platform for Cooperating Multi-agent Systems

Concurrent Engineering, 1993

Distributed artificial intelligence attempts to integrate and coordinate the activities of multip... more Distributed artificial intelligence attempts to integrate and coordinate the activities of multiple, intelligent problem solvers that come together to solve complex tasks in domains such as design, medical diagnosis, business management, and so on Due to the different goals, knowledge, and viewpoint of the agents, conflicts might arise at any phase of the problem-solving process. Managing diverse knowledge requires well-organized models of conflict resolution. In this paper, a system for cooperating intelligent agents which openly supports multi- agent conflict detection and resolution is described. The system is based on the insights, first, that each agent has its own conflict knowledge which is separated from its domain-level knowledge; and, second, that each agent has its own conflict management knowledge which is not accessible to or known by others. Furthermore, there are no globally-known conflict-resolution strategies. Each agent involved in a conflict chooses a resolution s...

Research paper thumbnail of An Eager Regression Method Based on Selecting Appropriate Features

This paper describes a machine learning method, called Regression by Selecthtg Best P~'ttlll... more This paper describes a machine learning method, called Regression by Selecthtg Best P~'ttllll'es (RSBF). RSBF consists of two phases: The first phase aims to find the predictive power of each feature by constructing simple linear regression lines, one per each continuous ...

Research paper thumbnail of Proceedings of the 8th European conference on Advances in Case-Based Reasoning

Proceedings of the 8th European conference on Advances in Case-Based Reasoning

Research paper thumbnail of Konum Önerisi için Zaman Tabanlı Uzman Destekli İşbirliğine Dayalı Filtreleme

Konuma dayali sosyal aglar, son on yilda, kullanicinin konum gecmislerine dayanarak tercihlerini ... more Konuma dayali sosyal aglar, son on yilda, kullanicinin konum gecmislerine dayanarak tercihlerini arastirmamiz icin bize yeni bir platform saglayarak onemli olcude gelisti. Konuma dayali sosyal aglarin cogu, kullanicilarin varliklarini aciklayabilecekleri, yorumlayabilecekleri veya ipucu birakabilecekleri bir kategori hiyerarsisi altina yerlestirilen cesitli mekanlar saglar. Cografi bilgili konum onerileri bircok arastirmacinin ilgisini cekmesine ragmen, arastirma projelerinin cogunda zamanin kullanicinin tercihleri uzerindeki etkisi goz ardi edilmistir. Bir kullanici, gunun farkli saatlerinde ziyaret etmek icin farkli mekanlari tercih edebileceginden, belirli bir kategoride ayni miktarda giris yapan iki kullanici, o mekanda bulunma zamanina bagli olarak daha az benzer olabilir. Ayrica, geleneksel isbirligine dayali filtreleme teknikleri, tum kullanicilarin tercihlerini goz onunde bulundururken, yalnizca kategori uzmanlarinin tercihlerini goz onunde bulundurarak, o kategorideki bir m...

Research paper thumbnail of Learning Interestingness of Streaming Classification Rules

Lecture Notes in Computer Science, 2004

Inducing classification rules on domains from which information is gathered at regular periods le... more Inducing classification rules on domains from which information is gathered at regular periods lead the number of such classification rules to be generally so huge that selection of interesting ones among all discovered rules becomes an important task. At each period, using the newly gathered information from the domain, the new classification rules are induced. Therefore, these rules stream through time and are so called streaming classification rules. In this paper, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label the classification rules either as "interesting" or "uninteresting" with limited user interaction. In our study, VFP (Voting Feature Projections), a feature projection based incremental classification learning algorithm, is also developed in the framework of IRIL. The concept description learned by the VFP algorithm constitutes a novel approach for interestingness analysis of streaming classification rules.

Classification by Feature Partitioning

Machine Learning, Apr 1, 1996

This paper presents a new form of exemplar-based learning, based on a representation scheme called feature partitioning, and a particular implementation of this technique called CFP (for Classification by Feature Partitioning). Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint sets of values called segments. A segment is expanded through generalization or specialized by dividing it into sub-segments. Classification is based on a weighted voting among the individual predictions of the features, which are simply the class values of the segments corresponding to the values of a test instance for each feature. An empirical evaluation of CFP and its comparison with two other classification techniques that consider each feature separately are given.
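The feature-partitioning representation can be illustrated with a simplified, batch Python sketch. It is an approximation under stated assumptions, not CFP itself: segments are built by grouping runs of consecutive sorted values that share a class (CFP instead grows and splits segments incrementally), feature weights are supplied by the caller rather than learned, and a value outside every segment simply casts no vote. The function names and toy data are hypothetical.

```python
from collections import defaultdict

def build_segments(values, labels):
    """Split one feature's values into segments [low, high, label]: maximal runs
    of consecutive sorted values sharing a class (a batch simplification)."""
    segments = []
    for v, label in sorted(zip(values, labels)):
        if segments and segments[-1][2] == label:
            segments[-1][1] = v                 # generalize: extend the current segment
        else:
            segments.append([v, v, label])      # start a new segment
    return segments

def train_segments(X, y):
    """Store the training data separately in each feature dimension."""
    return [build_segments([row[f] for row in X], y) for f in range(len(X[0]))]

def predict_vote(model, x, weights=None):
    """Each feature predicts the class of the segment containing x[f];
    the predictions are combined by weighted voting."""
    weights = weights or [1.0] * len(model)
    votes = defaultdict(float)
    for f, segments in enumerate(model):
        for low, high, label in segments:
            if low <= x[f] <= high:
                votes[label] += weights[f]
                break                           # use the first matching segment
    return max(votes, key=votes.get) if votes else None

X = [[1.0, 5.0], [2.0, 6.0], [8.0, 1.0], [9.0, 2.0]]  # hypothetical toy data
y = ["a", "a", "b", "b"]
model = train_segments(X, y)
print(predict_vote(model, [1.5, 5.5]))              # a
print(predict_vote(model, [1.5, 1.5], [1.0, 2.0]))  # b: feature 1 outweighs feature 0
```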

A Discretization Method Based on Maximizing the Area Under Receiver Operating Characteristic Curve

International Journal of Pattern Recognition and Artificial Intelligence, Feb 1, 2013

Many machine learning algorithms require categorical features, so all numeric-valued data must first be discretized into intervals. In this paper, we present a new discretization method based on the area under the receiver operating characteristic (ROC) curve (AUC). Maximum area under ROC curve-based discretization (MAD) is a global, static, and supervised discretization method. MAD uses the sorted order of a feature's continuous values and discretizes the feature so that the AUC based on that feature is maximized. The proposed method is compared with alternative discretization methods such as ChiMerge, Entropy-Minimum Description Length Principle (MDLP), Fixed Frequency Discretization (FFD), and Proportional Discretization (PD). FFD and PD were proposed recently and are designed for Naïve Bayes learning. Like MAD, ChiMerge is a merging discretization method. Evaluations are performed on real-world datasets in terms of M-Measure, an AUC-based metric for multi-class classification, and of the accuracy obtained with the Naïve Bayes and Aggregating One-Dependence Estimators (AODE) algorithms. Empirical results show that MAD is a strong candidate as an alternative to other discretization methods.
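The objective that MAD maximizes can be illustrated for the binary case with a short Python sketch (the paper's multi-class evaluation uses M-Measure instead). It computes a feature's AUC directly from pairwise comparisons, counting ties as one half, and shows how a hypothetical set of cut points changes that AUC. The cut-point search itself is not shown; the cut points and toy data are assumptions.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 1/2 (O(n_pos * n_neg), for clarity only)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def discretize(values, cuts):
    """Replace each value by the index of its interval, given sorted cut points."""
    return [sum(v > c for c in cuts) for v in values]

values = [1, 2, 3, 4, 5, 6]   # hypothetical continuous feature
labels = [0, 0, 1, 0, 1, 1]   # hypothetical binary classes
print(auc(values, labels))                          # 8/9, about 0.889
print(auc(discretize(values, [2.5, 4.5]), labels))  # 8.5/9, about 0.944
```

Because tied scores count one half, placing the misordered pair 3 (positive) and 4 (negative) in the same interval raises the AUC in this toy example; this is the kind of effect a discretizer that targets AUC can exploit.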

Learning Translation Rules From A Bilingual Corpus

arXiv (Cornell University), Jul 26, 1996

This paper proposes a mechanism for learning pattern correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language must correspond to the similar parts of the sentences in the target language. Similarly, the different parts should correspond to the respective parts in the translated sentences. The correspondences between the similarities, and also between the differences, are learned in the form of translation rules. The system was tested on a small training dataset and produced promising results that warrant further investigation.
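A much-simplified Python sketch of the analogical idea follows, assuming whitespace tokenization and that each pair of sentences differs in exactly one contiguous span (the paper's matching is more general). The shared context becomes a template with a variable `X`, and the differing spans are paired as lexical correspondences. The function names and the English-Turkish sentence pairs are illustrative only.

```python
def split_diff(a, b):
    """Split token lists a and b into (shared prefix, a's differing span,
    b's differing span, shared suffix)."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    j = 0
    while j < n - i and a[-1 - j] == b[-1 - j]:
        j += 1
    return a[:i], a[i:len(a) - j], b[i:len(b) - j], a[len(a) - j:]

def learn_rules(pair1, pair2):
    """From two (source, target) translation pairs, learn a template for the
    similar parts and lexical correspondences for the differing parts."""
    (s1, t1), (s2, t2) = pair1, pair2
    s_pre, s_d1, s_d2, s_suf = split_diff(s1.split(), s2.split())
    t_pre, t_d1, t_d2, t_suf = split_diff(t1.split(), t2.split())
    template = (" ".join(s_pre + ["X"] + s_suf), " ".join(t_pre + ["X"] + t_suf))
    lexical = [(" ".join(s_d1), " ".join(t_d1)), (" ".join(s_d2), " ".join(t_d2))]
    return template, lexical

template, lexical = learn_rules(("I saw the house", "evi gördüm"),
                                ("I saw the car", "arabayı gördüm"))
print(template)  # ('I saw the X', 'X gördüm')
print(lexical)   # [('house', 'evi'), ('car', 'arabayı')]
```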

Distributed artificial intelligence in engineering design

Distributed Artificial Intelligence (DAI) research is concerned with solving problems using both AI techniques and distributed processing. DAI techniques can be used to solve many complex problems, such as engineering design, interpretation, and planning problems. This paper first surveys the DAI approach to problem solving and then describes how a DAI tool, CEF, is used to implement STEAMER, a system for designing steam condensers.

Application of the RIMARC Algorithm to a Large Data Set of Action Potentials and Clinical Parameters for Risk Prediction of Atrial Fibrillation

Biophysical Journal, 2015

Feature Dependency in Benefit Maximization: A Case Study in the Evaluation of Bank Loan Applications

In many real-world domains, the benefits and costs of classifications can depend on the characteristics of individual examples. In such cases, no static benefit matrix is available for the domain, and the benefit of each classification is calculated separately. This situation, called feature dependency, is evaluated in the framework of our newly proposed classification algorithm, Benefit Maximizing classifier with Feature Intervals (BMFI), which uses a feature-projection-based knowledge representation. The new approach has been evaluated on bank loan applications, and experimental results are presented.
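Feature dependency can be illustrated with a small Python sketch of expected-benefit decision making. This is not BMFI: the class probabilities are supplied directly rather than learned from feature intervals, and the benefit function (interest earned on a good loan, unrecovered principal lost on a bad one) is a hypothetical assumption, as are the `amount` and `collateral` fields. With identical class probabilities, the best action still changes with the applicant's features, which a single static benefit matrix cannot express.

```python
def benefit(action, outcome, x):
    """Hypothetical feature-dependent benefit of one loan decision."""
    if action == "approve":
        if outcome == "good":
            return 0.1 * x["amount"]                      # assumed 10% interest earned
        return -max(x["amount"] - x["collateral"], 0.0)   # assumed unrecovered principal
    return 0.0                                            # rejecting earns and loses nothing

def decide(class_probs, x, actions=("approve", "reject")):
    """Pick the action with the highest expected benefit for this example."""
    def expected(action):
        return sum(p * benefit(action, outcome, x) for outcome, p in class_probs.items())
    return max(actions, key=expected)

probs = {"good": 0.8, "bad": 0.2}  # same class probabilities for both applicants
print(decide(probs, {"amount": 1000, "collateral": 0}))    # reject
print(decide(probs, {"amount": 1000, "collateral": 900}))  # approve
```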

Feature Projection Based Rule Classification

Due to the growth of data mining research and applications, selecting interesting rules from among a huge number of learned rules is an important task in data mining applications. In this paper, metrics for the interestingness of a rule are investigated, and an algorithm that classifies learned rules according to their interestingness is developed. Classification algorithms are designed to maximize the number of correctly classified instances on a set of unseen test cases, and feature-projection-based classification algorithms have been tested and shown to be successful in a large number of real domains. In this work, therefore, a feature-projection-based classification algorithm, VFI (Voting Feature Intervals), is adapted to the rule interestingness problem, and the FPRC (Feature Projection Based Rule Classification) algorithm is developed.
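The voting idea behind VFI can be sketched in Python as a simplified VFI-style classifier, written from the general idea of voting feature intervals rather than from the paper. Each class's lowest and highest value on a feature become end points; each end point forms a point interval and each gap between adjacent end points forms a range interval; class counts in an interval are normalized by class frequency; and each feature's votes are normalized to sum to 1 before they are added up. The names and toy data are hypothetical, and details of the published algorithm may differ.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

def interval_key(v, ends):
    """('point', k) if v equals end point k, ('range', k) if it lies strictly
    between end points k and k+1, or None if it is outside all end points."""
    k = bisect_left(ends, v)
    if k < len(ends) and ends[k] == v:
        return ("point", k)
    if 0 < k < len(ends):
        return ("range", k - 1)
    return None

def train_vfi(X, y):
    """For each feature, find the end points and count classes per interval."""
    class_counts = Counter(y)
    model = []
    for f in range(len(X[0])):
        ends = set()
        for c in class_counts:
            vals = [row[f] for row, label in zip(X, y) if label == c]
            ends.update((min(vals), max(vals)))   # lowest and highest value per class
        ends = sorted(ends)
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[interval_key(row[f], ends)][label] += 1
        model.append((ends, counts))
    return model, class_counts

def predict_vfi(model, class_counts, x):
    """Sum each feature's normalized class votes and return the top class."""
    total = Counter()
    for f, (ends, counts) in enumerate(model):
        key = interval_key(x[f], ends)
        if key is None or key not in counts:
            continue                              # this feature abstains
        votes = {c: counts[key][c] / class_counts[c] for c in class_counts}
        s = sum(votes.values())
        for c, v in votes.items():
            total[c] += v / s                     # each feature's votes sum to 1
    return max(total, key=total.get) if total else None

X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [7.0, 15.0], [8.0, 25.0], [9.0, 35.0]]
y = ["a", "a", "a", "b", "b", "b"]
model, class_counts = train_vfi(X, y)
print(predict_vfi(model, class_counts, [2.5, 22.0]))  # a
print(predict_vfi(model, class_counts, [8.5, 22.0]))  # b
```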

Feature Construction and its Application to Predicting Financial Distress

Voting Features based Classifiers (VFC) have been shown to perform well on most real-world data sets. They are robust to irrelevant features and missing feature values. In this paper, we introduce an extension to VFC called Voting Features based Classifier with feature Construction (VFCC), and show its application to predicting whether a bank will encounter financial distress by analyzing its current financial statements. The previously developed VFC learns a set of rules, each containing a single condition on a single feature in its antecedent. The VFCC algorithm proposed in this work, by contrast, constructs rules whose antecedents may contain conjuncts based on several features. Experimental results on recent financial ratios of banks in Turkey show that VFCC achieves better accuracy than other well-known rule-learning classification algorithms.