Apurva Desai - Academia.edu
Papers by Apurva Desai
International Journal of Medical and Health Sciences, 2012
Facial anthropometry has well-known implications in fields such as forensic science and reconstructive surgery. Facial index and face type provide an indication of race and individuality. The present study aimed to examine the facial index in Gujarati males and to determine the distribution of their face types. Morphometric variation in faces makes baseline data necessary for reconstruction and forensic science. The mean facial index of Gujarati males was 81.7; the dominant face type was Euriprosopic (42.96%) and the rarest face type was Leptoprosopic (3.64%). We found significant variation in face type among adult males of Gujarat.
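As an illustration of how a facial index such as the one reported above is typically computed, here is a minimal Python sketch. The abstract does not give the formula or the class boundaries; the definition used here (face height divided by face width, times 100) and the Banister-style ranges are assumptions taken from common anthropometric usage, and the function names are hypothetical.

```python
def facial_index(face_height_mm: float, face_width_mm: float) -> float:
    """Facial index = (face height / face width) * 100 (assumed definition)."""
    if face_width_mm <= 0:
        raise ValueError("face width must be positive")
    return face_height_mm / face_width_mm * 100.0


def face_type(index: float) -> str:
    """Map a facial index to a face type using assumed Banister-style ranges."""
    if index < 80.0:
        return "hypereuryprosopic"
    if index < 85.0:
        return "euryprosopic"  # spelled "Euriprosopic" in the abstract
    if index < 90.0:
        return "mesoprosopic"
    if index < 95.0:
        return "leptoprosopic"
    return "hyperleptoprosopic"
```

Under these assumed ranges, the reported mean index of 81.7 falls in the euryprosopic class, which agrees with the dominant face type stated in the abstract.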
In recent times, the problem of Unsolicited Bulk Email (UBE), commonly known as spam email, has grown at a tremendous rate. We present an analysis of a survey of the classifications of UBE used in various research works. There are many research instances of classification between spam and non-spam emails, but very few of classification of spam emails per se. This paper does not intend to assert that some UBE classification is better than the others, nor does it propose any new classification, but it bemoans the lack of harmony on the number and definition of categories proposed by different researchers. The paper also elaborates on factors such as the intent of the spammer, the content of UBE, and ambiguity in the categories proposed in related research on UBE classification.
2019 5th International Conference On Computing, Communication, Control And Automation (ICCUBEA), 2019
The segmentation of touching symbols is one of the key factors that degrade the performance of an Optical Character Recognition (OCR) system. The presence of touching characters in documents is a major obstacle to effective character segmentation. In this paper, we present an algorithm for segmenting frequently used handwritten Gujarati conjunctive characters into their constituent symbols and characters. A predictive algorithm is developed for selecting the possible cut column for the segmentation of conjunctive characters. This algorithm uses the structural properties of the Gujarati alphabet. The possible cut column is defined using information derived from the neighboring pixels. The algorithm covers 728 handwritten conjunctive characters of the Gujarati script. The conjunctive characters are segmented into easily separable characters, which can then be sent to the classifier for recognition.
Advances in Intelligent Systems and Computing, 2021
Recognition of numerals in an image is a very difficult problem because there is no prior knowledge about the color of the numerals, the distribution of lighting, noise, the complexity of the background, shape similarity, etc. To the best of our knowledge, no work has been reported on segmentation and recognition of Gujarati printed numerals from an image. In this paper, we discuss the problems of segmentation and recognition of Gujarati printed numerals from an image and adopt a simple approach to address them. We use edge detection, dilation and connected component analysis for segmentation. Various heuristics are used to find candidate objects for Gujarati numerals. For classification, template matching is used. The experimental results demonstrate its effectiveness. We tested our model on different types of images and achieved a success rate of more than 95%.
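The connected component analysis step mentioned above can be sketched as follows. This is only an illustration of that one step, not the authors' pipeline: it assumes the image has already been reduced to a binary grid (for example after edge detection and dilation), it assumes 8-connectivity, and the function name is hypothetical.

```python
from collections import deque


def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 8-connected blobs.

    `binary` is a list of rows; any truthy cell counts as foreground.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each bounding box is a candidate object; size or aspect-ratio heuristics of the kind the abstract alludes to could then be applied to the boxes before classification.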
2016 International Conference on Computing, Analytics and Security Trends (CAST), 2016
Automatic recognition of fruits using machine vision is considered a challenging task, as fruits exist in various colors, sizes, shapes and textures. Additionally, when images of them are acquired, further variation is introduced by imaging conditions. In this paper we recognize nine different classes of fruits. The fruit image dataset was obtained from the web, and some images were acquired using a mobile phone camera. These images are pre-processed to subtract the background and extract the blob representing the fruit. To represent fruits and capture their visual characteristics, a combination of color, shape and texture features is used. This feature dataset is then passed to two different classifiers: multiclass SVM and KNN. The experimental results obtained are used to draw various conclusions. The best accuracy obtained in the study is 91.3% with the KNN classifier (K=2), whereas with multiclass SVM (one-versus-all) the best accuracy obtained is 86.96%.
Email has become an important means of electronic communication, but the viability of its usage is marred by Unsolicited Bulk Email (UBE) messages. UBE poses technical and socio-economic challenges to the usage of email. Besides, the definition and understanding of UBE differ from one person to another. To meet these challenges and combat this menace, we need to understand UBE. Towards this end, this paper proposes a classifier for UBE documents. Technically, this is an application of unstructured document classification using text content analysis, and we approach it using a supervised machine learning technique. Our experiments show that the success rate of the proposed classifier is 98.50%. This is the first formal attempt to provide a novel tool for UBE classification, and the empirical results show that the tool is strong enough to be implemented in the real world.
Advances in Intelligent Systems and Computing
International Journal of Research in Advent Technology
The Gujarati language has a large and complex character set, and many characters have similar strokes, which makes OCR more challenging. Here we suggest a two-layer classification technique with SVM (RBF) and k-NN classifiers in order to propose robust online handwritten character recognition for the Gujarati language. In the first layer of classification, an SVM classifier with the RBF kernel is used, and in the second layer, a k-NN classifier is used. The training data of the second-layer classifier is decided based on the outcome of the first-layer classifier: the training data of a group of characters similar to the character returned by the first-layer classifier is supplied to the k-NN classifier. A hybrid feature set is used, consisting of first- and second-order derivatives of pixel values, zoning, and normalized chain code features. A data set of around 12000 samples was generated from different writers. Around 2000 samples of the data set are used for training, and the rest of the samples are used to test the system. The proposed system obtained an average accuracy of 94.65% and an average processing time of 0.095 seconds per stroke.
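The two-layer control flow described above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the first-layer SVM (RBF) so that the example needs only the standard library, and the `confusable_groups` mapping, the feature tuples and all function names are hypothetical.

```python
import math
from collections import Counter


def nearest_centroid(sample, training):
    """Stand-in for the first-layer SVM: label of the closest class mean."""
    sums, counts = {}, Counter()
    for features, label in training:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1

    def distance_to(label):
        centroid = [s / counts[label] for s in sums[label]]
        return math.dist(sample, centroid)

    return min(sums, key=distance_to)


def knn(sample, training, k=3):
    """Second layer: majority vote among the k nearest training samples."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def two_layer_predict(sample, training, confusable_groups, k=3):
    """First layer picks a label; k-NN then decides within its similar group."""
    first = nearest_centroid(sample, training)
    group = confusable_groups.get(first, {first})
    subset = [(f, label) for f, label in training if label in group]
    return knn(sample, subset, k)
```

The point of the design is that the second layer only sees training samples from characters that look alike, so the k-NN vote is spent on the hard distinctions rather than on the whole character set.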
International Journal of Computer Vision and Image Processing
In this article, an online handwritten word recognition system for the Gujarati language is presented by combining strokes, characters, punctuation marks, and diacritics. The authors used a support vector machine classification algorithm with a radial basis function kernel and a hybrid feature set. The hybrid feature set consists of directional features with curvature data. The authors used normalized chain code and zoning-based chain code features. Words are a combination of characters and diacritics, so recognized strokes require post-processing to form a word. The authors used location-based and mapping rule-based post-processing methods. They achieved an accuracy of 95.3% for individual characters, 91.5% for individual words, and 83.3% for sentences. The average processing time for individual characters is 0.071 seconds.
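To make the chain code features mentioned above concrete, here is a minimal sketch of an 8-direction (Freeman-style) chain code computed from the pen points of an online stroke. The abstract does not define its normalization; treating "normalized" as a direction histogram divided by the number of codes is an assumption, as are the function names and the convention that the y axis points upward.

```python
import math


def chain_code(points):
    """Quantize each pen movement into one of 8 directions (0 = +x, 2 = +y)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # repeated point carries no direction
        angle = math.atan2(dy, dx)
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes


def normalized_chain_histogram(points):
    """Share of each of the 8 directions in the stroke (assumed normalization)."""
    codes = chain_code(points)
    histogram = [0] * 8
    for code in codes:
        histogram[code] += 1
    total = len(codes)
    if total == 0:
        return [0.0] * 8
    return [count / total for count in histogram]
```

Dividing by the number of codes makes the feature vector independent of how many points the stroke contains, which is one reason such a histogram is convenient as classifier input.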
Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly known as spam email. The current work aims at identification of slang words in pornographic UBE. The motives of the paper are manifold. This is an attempt to better understand UBE, slang and their interplay. The problem has been addressed by employing a tokenization technique and the unigram BOW model. The current paper reports the first results on identification of 115 slang words from more than 1850 pornographic UBE analyzed by us. To the best of our knowledge, this is the first attempt to identify slang words in a corpus of pornographic UBE.
May-June 2011, Volume 5, Issue 4, Journal of Computer Science. This publication is an effort of Karpagam Charity Trust. The annual subscription for this journal is Rs. 1500. Additional information about this journal can be found on the back cover. The Karpagam Charity Trust publishes JCS bimonthly. Responsibility for the contents lies upon the authors and not upon the JCS. For copying or reprint permission, write to Copyright Department, JCS Administration,
Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identification of unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is not present in the English dictionary and is a slang term. The motives of the paper are manifold. This is an attempt to analyze spamming behavior and the employment of the word-mutation technique. On the sidelines of the paper, we have attempted to better understand spam, slang and their interplay. The problem has been addressed by employing a tokenization technique and the unigram BOW model. We found that non-lexicon words constitute nearly 66% of the total number of lexis of the corpus, whereas slang words constitute nearly 5.34% of non-lexicon words. Further, non-lexicon slang unigrams composed of a mutated form of a single lexicon word form more than 90% of the total number of such unigrams. To the best of our knowledge, this is the first attempt to analyze usage of non-lexicon slang unigrams in any kind of UBE.
Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identification of unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is neither present in the dictionary nor a slang term. The motives of the paper are manifold. This is an attempt to analyze spamming behaviour and the employment of the word-mutation technique. On the sidelines of the paper, we have attempted to better understand spam, slang and their interplay. The problem has been addressed by employing a tokenization technique and the unigram BOW model. We found that non-lexicon words constitute nearly 66% of the total number of lexis of the corpus, whereas non-slang words constitute nearly 2.4% of non-lexicon words. Further, non-lexicon non-slang unigrams composed of two lexicon words form more than 71% of the total number of such unigrams. To the best of our knowledge, this is the first attempt to analyze usage of non-lexicon non-slang unigrams in any kind of UBE.
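The tokenization and unigram bag-of-words steps named above, followed by the "neither in the dictionary nor slang" filter, might look roughly like this. It is a minimal sketch, not the authors' tool: the tokenizer regex, the tiny `lexicon` and `slang` sets in the usage note, and the function names are all hypothetical stand-ins for a real dictionary and slang list.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_bow(messages):
    """Unigram bag of words: token -> number of occurrences in the corpus."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


def non_lexicon_non_slang(bow, lexicon, slang):
    """Keep unigrams found in neither the dictionary nor the slang list."""
    return {
        token: count
        for token, count in bow.items()
        if token not in lexicon and token not in slang
    }
```

For example, with `lexicon = {"buy", "now", "cheap", "meds"}` and an empty slang set, the messages `"Buy cheapmeds now"` and `"buy v1agra NOW"` leave `cheapmeds` (two lexicon words fused together) and `v1agra` (a mutated word) as the surviving unigrams.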
The pursuit of research has increased in recent times. The results of scientific research works are published in the form of research papers in journals. There is no standardized and specific value for factors like title length, number of pages and number of authors of the research paper. The attention of the scientific community is positively more on innovation in research rather than these factors. The title of a research paper is important, being the first point of interaction between the writer and the reader. Also, the writing style of authors is different and is studied by the stylometric analysis of their writings. The current paper presents a detailed comparison and analysis of research titles proposed by Indian and foreign authors. The paper elaborates on the analysis by employing 65 stylometric features for more than 28.{}0 research papers from various international journals. We believe that this is the first formal attempt to provide such a detailed investigation of the interplay of the stylometric features of research titles designed by Indian and foreign authors. Keywords: Author, Research Paper, Stylometry, Title Length. 1. INTRODUCTION The current times have seen an increase in the pursuit of research. Compared to times around a century back, this increase in recent times owes to provision of more formal