E. Adalı | Istanbul Technical University
Papers by E. Adalı
Developments in computing and informatics have given rise to new areas of specialization. The need for personnel trained in these professions has required educational institutions to reorganize themselves to keep pace with these developments. Worldwide developments in computing and informatics, and the resulting demand for trained personnel, have encouraged people trained in other professions to work in this field. This has started a debate about the competencies of those for whom it is the primary profession. In this article, the areas of specialization in computing and informatics are defined first; then, taking our country's needs into account, what needs to be done...
In this paper we present a framework for extracting Turkish phrases and their concepts. The objective of the study is to meet the need for resources for Turkish semantic extraction and to represent a Turkish sentence at the phrase-concept level. The semantic and grammatical analysis of a sentence is a core task of Natural Language Processing (NLP), which is a branch of Artificial Intelligence (AI). In our study, the Turkish Phrase-Content Finding system is built as a resource for other application areas in NLP. This system can be used in summarization systems, information extraction, automatic question answering, semantic role labeling, and other semantic applications.
Lecture Notes in Electrical Engineering, 2013
In this paper, the effect of different windowing schemes on word sense disambiguation accuracy is presented. The Turkish Lexical Sample Dataset was used in the experiments. We took the samples of ambiguous verbs and nouns from the dataset and used bag-of-words properties as context information. The experiments were repeated for different window sizes with several machine learning algorithms. We follow a 2/3 splitting strategy (2/3 for training, 1/3 for testing) and determine the most frequently used words in the training part. After removing stop words, we repeated the experiments using the most frequent 100, 75, 50 and 25 content words of the training data. Our findings show that using the most frequent 75 words as features improves accuracy for Turkish verbs. Similar results were obtained for Turkish nouns when the most frequent 100 words of the training set were used. Based on this, the selected algorithms were tested on window sizes of 30, 15, 10 and 5. Our findings show that the Naïve Bayes and Functional Tree methods yielded better accuracy, and that a window size of ±5 gives the best average results for both the noun and the verb groups. The best results of the two groups are 65.8 and 56 percentage points above the most-frequent-sense baseline of the verb and noun groups, respectively.
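The feature construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the binary (present/absent) encoding and the toy tokens are assumptions made here for clarity.

```python
from collections import Counter


def top_content_words(training_contexts, stop_words, k):
    """Return the k most frequent non-stop-word tokens in the training contexts."""
    counts = Counter(
        token
        for context in training_contexts
        for token in context
        if token not in stop_words
    )
    return [word for word, _ in counts.most_common(k)]


def window_features(tokens, target_index, window, vocabulary):
    """Binary bag-of-words vector over the +/- `window` tokens around the target.

    The target word itself is excluded from the context (an assumption).
    """
    start = max(0, target_index - window)
    context = tokens[start:target_index] + tokens[target_index + 1:target_index + 1 + window]
    present = set(context)
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical toy data: "yüz" is the ambiguous target at index 3.
sentence = ["a", "b", "c", "yüz", "d", "e", "f"]
narrow = window_features(sentence, 3, 1, ["c", "e"])
wide = window_features(sentence, 3, 2, ["c", "e"])
```

With a window of 1 only "c" falls inside the context, while a window of 2 also reaches "e"; the resulting vectors could then be passed to any classifier, such as Naïve Bayes.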
In data mining and knowledge discovery, similarity between objects is one of the central concepts. A measure of similarity can be user-defined, but an important problem is defining similarity on the basis of data. In this paper we introduce the problem of finding the pairwise similarities of quantitative-valued sequences, where each sequence is a list of items. Traditional approaches to defining the similarity between two sequences typically consider only the binary values of items in sequences, not the quantitative values. Such similarity measures are often useful for finding the similarities between gene or protein sequences. However, they cannot reflect certain kinds of similarity where the sequences carry two different kinds of information, such as quantitative and order information. Sequence data of this type arise in many applications; for example, marketing and sales data or web log data may contain both kinds of information. Therefore, we introduce a new similarity measure that takes into account the values of items in sequences. We give an algorithm for calculating the similarity between such quantitative sequences. Finally, we describe the results of using this approach on two different real-life datasets.
This paper presents a comparative evaluation of support vector machines (SVM), memory-based learning (MBL) and Naïve Bayes (NB) for separating spam from legitimate e-mail. Although many studies compare spam filtering methods, most of them are not directly comparable because they are tested on different data sets. To compare SVM, MBL and NB, we used LINGSPAM, a common, publicly available corpus. Because MBL and NB had already been tested on this corpus in earlier work, the best parameters obtained in those experiments were used with minor changes. For SVM, in contrast, extensive experiments were run to find the best attribute dimensions. We also propose scenarios for how a message should be handled once it is identified as spam, and we evaluate the three classifiers by the cost of the errors that a misclassification would cause under each scenario. The results show that SVM performs significantly better than the other methods in the no-cost and high-cost cases, whereas NB performs best when the cost is extremely high.
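The abstract does not say which cost-sensitive measure was used. As an illustration only, the sketch below assumes the weighted accuracy commonly reported in work on this corpus, in which blocking a legitimate message counts `cost` times as much as letting a spam message through. The function name and the counts are hypothetical.

```python
def weighted_accuracy(legit_ok, legit_blocked, spam_caught, spam_missed, cost):
    """Cost-weighted accuracy: each legitimate message counts as `cost` messages.

    cost = 1 corresponds to the no-cost scenario (plain accuracy); larger values
    penalize blocking legitimate mail more heavily.
    """
    n_legit = legit_ok + legit_blocked
    n_spam = spam_caught + spam_missed
    return (cost * legit_ok + spam_caught) / (cost * n_legit + n_spam)


# Hypothetical confusion counts for one classifier.
plain = weighted_accuracy(90, 10, 40, 10, cost=1)     # 130 / 150
strict = weighted_accuracy(90, 10, 40, 10, cost=999)  # dominated by legitimate mail
```

As `cost` grows, the score is governed almost entirely by how many legitimate messages survive, which is one way a classifier that rarely blocks legitimate mail can overtake a more accurate one in the extremely high-cost scenario.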
Vectorial Approach For Analysing Turkish Sentence
Keywords: Grammatical and semantic analysis, phrase-concept and verb compatibility, vector representation of sentence.
Abstract: The grammatical and semantic analysis of the sentence is one of the main subjects of Natural Language Processing (NLP). In this paper, we present a novel method that detects basic grammatical and semantic errors by concentrating on the predicate. In Turkish, the predicate carries information about the subject and the tense. The predicate also determines which phrases can make up the sentence. For example, "büyümek" (to grow) does not take an object, but it can take a locative phrase ending with the suffix "-de". The predicate is also informative about the semantic concept of each phrase. For example, "düşünmek" (to think) is an action specific to humans, so its subject is related to the concept of a human. Based on these properties, a model has been designed to find the phrases in a sentence, identify the concept each phrase relates to, and analyze the sentence grammatically and semantically.
This paper presents the design and the implementation of a morphological analyzer for Turkish. A new methodology is proposed for analyzing Turkish words with an affix stripping approach and without using any lexicon. The rule-based and agglutinative structure of the language allows Turkish to be modeled with finite state machines (FSMs). In contrast to previous work, in this study the FSMs are formed by using the morphotactic rules in reverse order. This paper describes the steps of this new methodology, including the classification of the suffixes, the generation of the FSMs for each suffix class and their unification into a main machine to cooperate in the analysis.
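The idea of stripping affixes from the end of a word by following the morphotactics in reverse can be illustrated with a toy sketch. This is not the analyzer described in the paper: it covers only two hypothetical suffix classes (case and plural), ignores most allomorphs, and replaces the per-class FSMs with a simple right-to-left pass.

```python
# Toy suffix classes, listed in REVERSE morphotactic order: in a Turkish noun the
# plural precedes the case suffix, so the case suffix is tried first.
# The class names and suffix lists are illustrative assumptions.
REVERSED_CLASSES = [
    ("CASE", ("den", "dan", "de", "da")),
    ("PLURAL", ("ler", "lar")),
]


def strip_affixes(word, min_stem=2):
    """Strip at most one suffix per class from the end of `word`, without a lexicon.

    Returns (stem, suffixes), with the suffixes given in surface order.
    """
    stem = word
    found = []
    for label, suffixes in REVERSED_CLASSES:
        # Try longer suffixes first so that "den" wins over "de".
        for suffix in sorted(suffixes, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
                stem = stem[:-len(suffix)]
                found.append((label, suffix))
                break
    found.reverse()
    return stem, found


analysis = strip_affixes("evlerden")  # "ev" + plural "ler" + case "den"
```

Because no lexicon is consulted, this toy version overstrips words whose stem merely ends in a suffix-like string (for example, "moda" would be split into "mo" + "da"); a full analyzer has to handle such ambiguity explicitly.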
… Conference on Intelligent Computing and Information …, 2009
In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the two languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment in a word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation in our system. We believe that this approach is valid for most of the Turkic languages and that the architecture, implemented using FSTs, can be easily extended to those languages.
This paper describes the implementation of a two-level morphological analyzer for the Turkmen language. Like all Turkic languages, Turkmen is an agglutinative language that has productive inflectional and derivational suffixes. In this work, we implemented a finite-state two-level mo...