Sonal P Patil | SSBT, COET, North Maharashtra University, India

Papers by Sonal P Patil

Inferring User Search Goals with feedback Sessions using Fuzzy K-Means Algorithm

International Journal Of Engineering And Computer Science, 2016

For a broad-topic or ambiguous query, different users may have different search goals when they submit it to a search engine. Inferring and analyzing these user search goals can be very useful for improving search engine relevance and user experience. In this paper, we propose a novel approach to infer user search goals by analyzing search engine query logs. First, we propose a framework that discovers the different search goals for a query by clustering the proposed feedback sessions. Feedback sessions are constructed from user click-through logs (i.e., user responses) and can efficiently reflect users' information needs. Second, we propose a novel approach to create pseudo-documents that better represent the feedback sessions for clustering. Finally, we propose a new criterion, Classified Average Precision (CAP), to evaluate the performance of inferring user search goals. Experimental results on user click-through logs from a commercial search engine are presented to evaluate the effectiveness of the proposed methods.
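The title names fuzzy k-means as the clustering step. As a hedged illustration only, the sketch below implements standard fuzzy c-means (fuzzy k-means) in NumPy, assuming each feedback session has already been turned into a numeric pseudo-document vector. The function name, the fuzzifier m = 2 and the stopping rule are assumptions, not settings taken from the paper.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means.
    X: (n, d) array of session vectors. Returns (centers, U), where U has shape (n, c)
    and U[i, k] is the membership of vector i in cluster k (each row sums to 1)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so every row sums to 1
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the vectors
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every vector to every center; epsilon avoids division by zero
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Unlike hard k-means, every session receives a membership in every cluster, so an ambiguous session can contribute to more than one inferred search goal.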

Oblique Decision Tree Learning Approaches - A Critical Review

International Journal of Computer Applications, 2013

Decision tree classification techniques are gaining increasing impact, especially in light of the ongoing growth of data mining services. A central challenge in decision tree classification is identifying the split rule and the correct attributes. A variation on the traditional approach is the so-called oblique (or multivariate) decision tree, which allows multivariate tests at its non-terminal nodes. Univariate trees can only perform axis-parallel splits, whereas oblique decision trees can model decision boundaries that are oblique to the attribute axes. Most decision tree induction algorithms follow a top-down tree-growing strategy and rely on an impurity-based measure as the node-splitting criterion. In this context, the article presents the current state of research on techniques for oblique decision tree classification. To this end, the paper analyzes traditional multivariate and oblique decision tree algorithms, namely CART and OC1, as well as standard SVM and GDT implementations.
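To make the univariate/oblique distinction concrete, here is a minimal sketch that scores one split of each kind with the Gini impurity, a common impurity measure (it is the one CART uses for classification). The weight vector is chosen by hand for illustration; induction algorithms such as OC1 search for it. The function names are illustrative.

```python
import numpy as np

def gini(labels):
    """Gini impurity of an array of class labels (0 for an empty or pure node)."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def split_impurity(X, y, w, b):
    """Size-weighted Gini impurity of the two children produced by the test w . x <= b.
    An axis-parallel (univariate) test is the special case where w has one non-zero entry;
    an oblique (multivariate) test uses several non-zero weights."""
    left = X @ w <= b
    n = y.size
    return (left.sum() / n) * gini(y[left]) + ((~left).sum() / n) * gini(y[~left])
```

On the toy points `[[0.1, 0.1], [0.8, 0.1], [0.1, 0.8], [0.9, 0.9], [0.3, 0.9], [0.9, 0.3]]` with labels `[0, 0, 0, 1, 1, 1]`, no single threshold on x1 or x2 separates the classes: the axis-parallel test x1 <= 0.5 leaves an impurity of 4/9, while the oblique test x1 + x2 <= 1 gives 0.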

Two-Step Approach for Acquiring Semantic Relations from Textual Web Content

International Journal of Innovations in Engineering and Technology (IJIET)

...study of the meaning of a particular word in different contexts. With the vast growth of the World Wide Web, we face an increasing amount of information resources. Mining semantic relations from such vast resources is a difficult task that differs from ordinary text mining. Ordinary text mining techniques are not sufficient for knowledge discovery, because they simply transform free text into a bag-of-words representation and therefore do not preserve any semantics. In this paper, we present a two-step procedure to mine semantic relations from textual web content. The two steps use RDF (Resource Description Framework) and GP-Close (a generalized pattern mining algorithm). RDF is a language specification, used to extract metadata in the form of RDF statements representing semantic relations from raw data. For this purpose, natural language processing techniques, i.e. Myriad, will be used. Once the metadata representing semantic relations is extracted, a novel Gener...

Geometric Approach for Induction of Oblique Decision Tree

In this paper we present a new algorithm for oblique decision tree induction. We propose a new classifier that performs better than other decision tree approaches in terms of accuracy, size, and time. The proposed algorithm uses the geometric structure of the data to assess the hyperplanes. At each node of the decision tree, we find the clustering hyperplanes for both classes and, using this representation, select an angle bisector of these hyperplanes as the split rule at that node. The algorithm is applicable to both 2-class and multiclass problems. Through empirical investigation we demonstrate that this idea leads to small decision trees and better performance. We also present some analysis showing that the angle bisectors of the clustering hyperplanes used as split rules at each node are solutions of an interesting optimization problem, and hence argue that the classifier obtained with this approach is a good and novel classification method. Keywords: oblique decision tree, CART, OC1, SV...
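The bisector split rule can be made concrete. If the two clustering hyperplanes are w1 · x + b1 = 0 and w2 · x + b2 = 0, the points equidistant from both satisfy |w1 · x + b1| / ||w1|| = |w2 · x + b2| / ||w2||, so the two angle bisectors are obtained by subtracting and adding the normalized equations. This sketch assumes the clustering hyperplanes are already known (finding them is not shown) and is an illustration, not the paper's code.

```python
import numpy as np

def angle_bisectors(w1, b1, w2, b2):
    """Return the two angle bisectors of the hyperplanes w1.x + b1 = 0 and w2.x + b2 = 0.
    Each bisector is a (w, b) pair; both are loci of points equidistant from the two hyperplanes."""
    n1, n2 = np.linalg.norm(w1), np.linalg.norm(w2)
    # Normalize each hyperplane so that w.x + b is a signed distance
    u1, c1 = w1 / n1, b1 / n1
    u2, c2 = w2 / n2, b2 / n2
    return (u1 - u2, c1 - c2), (u1 + u2, c1 + c2)
```

For example, the hyperplanes x = 0 (w = (1, 0), b = 0) and 2y = 0 (w = (0, 2), b = 0) yield the bisectors x - y = 0 and x + y = 0. A node would then keep whichever bisector gives the better split, for example the lower impurity; that selection criterion is an assumption here.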

Two-Step Approach for Acquiring Semantic Relations from Textual Web Content

Semantics is one of the most important and widespread areas of natural language processing, concerned with the study of the meaning of a particular word in different contexts. With the vast growth of the World Wide Web, we face an increasing amount of information resources. Mining semantic relations from such vast resources is a difficult task that differs from ordinary text mining. Ordinary text mining techniques are not sufficient for knowledge discovery, because they simply transform free text into a bag-of-words representation and therefore do not preserve any semantics. In this paper, we present a two-step procedure to mine semantic relations from textual web content. The two steps use RDF (Resource Description Framework) and GP-Close (a generalized pattern mining algorithm). RDF is a language specification, used to extract metadata in the form of RDF statements representing semantic relations from raw data. For this purpose, natural language processing technique...

Survey on How to Read a Dendrogram

Classification of Cotton Leaf Spot Disease Using Support Vector Machine

Obtaining more value-added products essentially requires product quality control. Many studies show that the quality of agricultural products may be reduced by many causes, one of the most important being plant diseases. Consequently, minimizing plant diseases substantially improves product quality, and suitable diagnosis of crop disease in the field is critical for increased production. Foliar disease is the most important fungal disease of cotton and occurs in all Indian cotton-growing regions. In this paper I present technological strategies that use mobile-captured images of Cotton Leaf Spot symptoms and categorize the diseases using a support vector machine. The classifier is trained to support intelligent farming, including early detection of disease in the groves, selective fungicide application, etc. The proposed work is based on segmentation techniques in which the captured images are first processed for enhancement. Then texture an...

Classification and Grading of Wheat Granules using SVM and Naive Bayes Classifier

India is the second-largest producer of wheat in the world. Assessing wheat quality manually is very time-consuming and requires expert judgment. With image processing techniques, a system can be built that avoids human inspection. Wheat grains are classified according to their grades to determine quality. Images of the wheat grains are acquired using a digital camera, and grayscale conversion, smoothing, thresholding, and Canny edge detection are then applied to the acquired images. Classification and grading of the wheat grains are carried out by extracting morphological, color, and texture features, which are given to SVM and Naive Bayes classifiers. To evaluate classification accuracy, 50% of the total of 1300 data sets were used for training and the remaining 50% for testing. The classification system was supervised, corresponding to the predefined classes of ...
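The evaluation protocol in the abstract (half of the samples for training, the other half for testing) can be sketched with a from-scratch Gaussian Naive Bayes classifier. This is a minimal illustration under assumptions: feature extraction is not shown, the SVM half is omitted, the names are hypothetical, and any data fed to it in a quick check would be synthetic placeholders, not the paper's 1300 wheat samples.

```python
import numpy as np

def split_half(X, y, seed=0):
    """Shuffle, then use 50% of the samples for training and the remaining 50% for testing."""
    idx = np.random.default_rng(seed).permutation(len(y))
    half = len(y) // 2
    tr, te = idx[:half], idx[half:]
    return X[tr], y[tr], X[te], y[te]

class GaussianNaiveBayes:
    """Naive Bayes that models each feature as an independent Gaussian per class."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        self.mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # A small variance floor keeps constant features from dividing by zero
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        return self

    def predict(self, X):
        # Gaussian log-likelihood of each sample under each class: shape (n_samples, n_classes)
        ll = -0.5 * (np.log(2 * np.pi * self.var)[None, :, :]
                     + (X[:, None, :] - self.mean[None, :, :]) ** 2 / self.var[None, :, :]).sum(axis=2)
        # Add log-priors and pick the most probable class
        return self.classes[np.argmax(ll + np.log(self.prior), axis=1)]
```

An SVM (for example scikit-learn's `SVC`) could be trained and tested on the same split for comparison.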

Digital Image Forgery Detection Using Passive Techniques by Means of Keypoint Classification

In today's era, image manipulation has become a simple task because of advanced photo-editing software packages and high-resolution capture devices. Verifying the authenticity of images and detecting tampering without extra prior knowledge of the image content is a significant research field. This paper attempts to review recent developments in digital image forgery detection. Passive methods do not require prior information about the image. The paper first classifies various image forgery detection techniques and then develops their general structure. An overview of passive image authentication is presented, and existing passive forgery detection techniques are reviewed. The present status of image forgery detection techniques is discussed, along with recommendations for future research. An effort has also been made to find the best forgery detection algorithm, such as SIFT, for identifying the manipulated region. Th...
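As a hedged illustration of keypoint-based passive detection of copy-move forgery: when a region is cloned within one image, keypoints in the copy have near-identical descriptors to keypoints in the original. The sketch assumes descriptors and keypoint coordinates have already been extracted (for instance with a SIFT implementation such as OpenCV's) and matches each keypoint against all others in the same image using Lowe's ratio test. The function name and the thresholds `ratio=0.6` and `min_dist=10` pixels are illustrative assumptions.

```python
import numpy as np

def copy_move_pairs(desc, pts, ratio=0.6, min_dist=10.0):
    """Return sorted (i, j) keypoint pairs that look like a cloned region.
    desc: (n, d) descriptor array (d = 128 for SIFT), pts: (n, 2) keypoint coordinates in pixels.
    Requires at least two keypoints."""
    # Pairwise Euclidean distances between descriptors; a keypoint must not match itself
    d = np.linalg.norm(desc[:, None, :] - desc[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    pairs = set()
    for i in range(len(desc)):
        order = np.argsort(d[i])
        best, second = int(order[0]), int(order[1])
        # Lowe's ratio test: accept only a clearly unambiguous nearest neighbour
        if d[i, best] < ratio * d[i, second]:
            # Matches that are spatially very close are likely the same structure, not a copy
            if np.linalg.norm(pts[i] - pts[best]) >= min_dist:
                pairs.add((min(i, best), max(i, best)))
    return sorted(pairs)
```

Real detectors usually go further, for example clustering the matched pairs and estimating the geometric transform between the source and the copied region.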

Retrieving Content Based Images with Query point technique based on K-mean Clustering

This paper provides an overview of technical achievements in the research area of k-means clustering in content-based image retrieval (CBIR). K-means clustering is a powerful technique for effectively improving the performance of CBIR systems. Reducing trapping in local optima and the number of iterations needed to find the target image remain open research problems. The paper covers the current state of the art of research on k-means clustering in CBIR and the query point method for such systems.
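One plausible reading of combining k-means with a query point, sketched here as an assumption rather than the paper's method: cluster the database feature vectors offline, then compare the query only with images in the cluster whose center is nearest to it, which cuts the number of comparisons. Image feature extraction (e.g. color or texture descriptors) is not shown, and the function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on feature vectors X (n, d). Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct database vectors
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each vector to its nearest center
        labels = np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

def retrieve(query, X, centers, labels, top_n=3):
    """Rank only the images in the cluster whose center is nearest to the query point."""
    cluster = np.argmin(np.linalg.norm(centers - query, axis=1))
    candidates = np.flatnonzero(labels == cluster)
    dists = np.linalg.norm(X[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:top_n]]
```

Because only one cluster is searched, a relevant image assigned to a neighbouring cluster can be missed; this is the usual trade-off of cluster-pruned retrieval.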

A Survey on methods of Trustworthiness towards Artificial Intelligence

International Journal of Computer Applications

Design of a machine learning layer to optimize forgery detection in videos

Journal of Interdisciplinary Mathematics

With the advent of so many YouTubers, the number of videos uploaded to the cloud is increasing at an exponential rate. Most video creators take advantage of the fact that uploaded videos are not checked for any kind of tampering, owing to computational and information limits in the cloud. As a result, there are many fake videos on the internet that mislead viewers into believing whatever the creator conveys, leading to confusion and misunderstanding among viewers at large. To overcome this issue, this text proposes a novel, computationally efficient machine learning layer that uses a two-layer processing scheme for forgery detection in large-scale videos. With this scheme, the system pre-learns the forgery types and keeps a large learning set ready for fast reference; this initially static set is then dynamically modified based on the real-time videos given to the system. The two-layer scheme improves the end-to-end delay of the system without compromising system security. Our evaluations on the REWIND dataset show that the proposed system achieves more than 95% accuracy in forgery detection and reduces delay by more than 50% compared with other state-of-the-art algorithms such as keypoint selection, genetic algorithms, and others.

Optimization of Association Rule in Horizontally Distributed Database using Unique Key Value

International Journal of Computer Applications, 2016

The proposed work describes the optimization of association rules for distributed databases in terms of speed and memory used, both while distributing transactions and while extracting data from the various data sources in the network. The work has two parts: distributing the data using association rules, and redirecting each search to a specific source based on the key values used to create the subsets in the association rule. Availability is tested by verifying whether the specific source is ready; if it is not, the search for that part is carried out on the server itself.
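The routing part of the proposed work (redirecting a search to the source that owns a key, and falling back to the server when that source is not ready) can be sketched as follows. The key-range partition map, the source names and the `is_available` callback are hypothetical; the abstract does not specify how keys map to sources or how availability is checked.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical partition map entry: a source holding the horizontal slice of keys [lo, hi)
Partition = Tuple[int, int, str]

def route_query(keys: List[int], partitions: List[Partition],
                is_available: Callable[[str], bool], server: str = "server") -> Dict[str, List[int]]:
    """Group the requested keys by the location that should search them.
    Keys whose owning source is not ready, or that no partition covers, are searched on the server."""
    plan: Dict[str, List[int]] = {}
    for key in keys:
        target = server
        for lo, hi, source in partitions:
            if lo <= key < hi:
                # Redirect to the owning source only if it reports itself ready
                target = source if is_available(source) else server
                break
        plan.setdefault(target, []).append(key)
    return plan
```

For example, with partitions `[(0, 100, "siteA"), (100, 200, "siteB")]` and `siteB` down, keys `[5, 150, 42, 250]` are routed as `{"siteA": [5, 42], "server": [150, 250]}`.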

Botnet-A Network Threat

IJCA Proceedings on International Conference on Recent Trends in Information Technology and Computer Science, Mar 10, 2012

Botnets are network threats that generally arise from cyber attacks and pose serious threats to network assets and organizations' properties. Botnets are collections of compromised computers (bots) that are remotely controlled by their originator (the botmaster) through a common command-and-control (C&C) infrastructure. Among the various forms of malware, botnets are emerging as the most serious threat to cyber-security, as they provide a distributed platform for several illegal activities, such as launching distributed denial-of-service attacks against critical targets, malware dissemination, phishing, and click fraud. The most important characteristic of botnets is their use of command-and-control channels, through which they can be updated and directed. The actors behind botnet attacks on the integrity and resources of users are multifarious, ranging from teenagers showing off their hacking skills to organized criminal syndicates disabling infrastructure and causing financial damage to organizations and governments. In this context, it is crucial to know in what ways a system could be targeted. The major advantage of this classification is that it helps identify the problem and find specific means of defense and recovery. This paper aims to provide a concise overview of the major existing types of botnets, classified by their attacking techniques.

Evaluating Student Descriptive Answers Using Natural Language Processing

Computer-assisted assessment of free-text answers has attracted a great deal of work in recent years, because evaluating deep understanding of a lesson's concepts cannot, according to most educators and researchers, be done with simple multiple-choice (MCQ) testing. In this paper we review the techniques underpinning such systems, describe currently available systems for marking short free-text responses, and finally propose a system that evaluates descriptive answers using natural language processing.
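The abstract does not say which NLP technique the proposed system uses. As a hedged baseline only (a swapped-in technique, not the paper's method), a student's answer can be compared with a model answer by bag-of-words cosine similarity, and the similarity scaled to marks. The stop-word list, the tokenizer and the `max_marks` parameter are assumptions.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption; real systems use a fuller list)
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "it"}

def bag_of_words(text):
    """Lower-case alphabetic tokens with stop words removed, counted."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def score_answer(student, model, max_marks=10):
    """Scale the similarity to the model answer into marks, rounded to one decimal place."""
    return round(max_marks * cosine_similarity(bag_of_words(student), bag_of_words(model)), 1)
```

Such a baseline ignores word order and synonyms, so an answer that paraphrases the model answer with different vocabulary scores low.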

Study Climate and Impact of Ict in Cultivation of Crops in Yawal Taluka, Khandesh Region

International Journal of Research in Engineering and Technology, 2016

India is an agriculture-based country in which farming and farmers nowadays face two different conditions owing to natural irregularities. In this project we study the role of Information and Communication Technology (ICT) in the development of farmers in Yawal Taluka in the Khandesh region. Various ICTs are available to the agriculture sector, such as SMS, telephone calls, TV, newsletters, magazines, and call centers. From this study we can conclude whether farmers are using the available ICTs, and whether there are problems in using them due to a lack of infrastructure and facilities in villages.

Study Paper on How to Read a Dendrogram

International Journal of Computer Applications, 2014
