Yogesh Awate - Academia.edu

Papers by Yogesh Awate

Natural-Language-based Generation of Search Results in a Meaning-based Multilingual Search Engine

IETE Technical Review, 2006

Most current search engines are keyword based: they do not consider the meaning of the query posed to them and hence can be ineffective. In contrast, Agro Explorer [1], a multilingual, meaning-based search engine for the agricultural domain, first extracts the meaning of the query and then searches on that extracted meaning. Search can therefore be carried out even if the language of the query differs from the language of the documents. Meaning is represented in the form of Universal Networking Language (UNL) expressions, and the search is carried out by UNL expression matching. The relevant documents are stored in UNL form; the Deconverter converts them into the language of the user's choice using a lexicon and a rule base. In this paper, we discuss the design of the Deconverter we developed for Agro Explorer, with Marathi as the target language. The deconversion proceeds through four stages: a) Syntax Planning; b) Lexical Replacement; c) Case Mark Insertion; d) Morph Generation.
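As a reading aid, the sketch below shows how a four-stage deconversion pipeline of this kind could be organized. The stage functions, the UNLGraph structure and the behavior of each placeholder are hypothetical and purely illustrative; they are not the authors' implementation and omit all Marathi-specific linguistic detail.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins, for illustration only (not the authors' code).

@dataclass
class UNLGraph:
    """Toy representation of a parsed UNL expression."""
    nodes: list = field(default_factory=list)       # universal words with attributes
    relations: list = field(default_factory=list)   # (relation, head, modifier) triples

def syntax_planning(graph: UNLGraph) -> list:
    """Order the universal words according to target-language word order."""
    return list(graph.nodes)                         # placeholder: keep input order

def lexical_replacement(ordered, lexicon: dict) -> list:
    """Replace each universal word by a target-language root from the lexicon."""
    return [lexicon.get(word, word) for word in ordered]

def case_mark_insertion(words, graph: UNLGraph, rules) -> list:
    """Insert case markers dictated by the UNL relations and the rule base."""
    return words                                     # placeholder: no markers added

def morph_generation(words, rules) -> str:
    """Inflect the roots and emit the final sentence."""
    return " ".join(words)

def deconvert(graph: UNLGraph, lexicon: dict, rules) -> str:
    """Run the four stages in order: syntax planning -> lexical replacement
    -> case mark insertion -> morph generation."""
    ordered = syntax_planning(graph)
    roots = lexical_replacement(ordered, lexicon)
    marked = case_mark_insertion(roots, graph, rules)
    return morph_generation(marked, rules)
```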

Algorithms for variance reduction in a policy-gradient based actor-critic framework

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009

We consider the framework of a set of recently proposed two-timescale actor-critic algorithms for reinforcement learning (RL) under the long-run average-reward criterion with linear, feature-based value-function approximation. The actor and critic updates are based on stochastic policy-gradient ascent and temporal-difference algorithms, respectively. Unlike conventional RL algorithms, policy-gradient-based algorithms guarantee convergence even with value-function approximation, but they suffer from the high variance of the policy-gradient estimator. To minimize this variance for an existing algorithm, we derive a novel stochastic-gradient-based critic update. We propose a novel baseline structure for variance minimization of an estimator and derive an optimal baseline that makes the covariance matrix the zero matrix, the best achievable. We derive a novel actor update based on the optimal baseline deduced for an existing algorithm, and another novel actor update using the optimal baseline for an unbiased policy-gradient estimator deduced from the Policy-Gradient Theorem with Function Approximation. We also obtain a novel variance-minimization-based interpretation of an existing algorithm. Computational results demonstrate that the proposed algorithms outperform the state of the art on Garnet problems.
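For orientation, here is a minimal sketch of a generic two-timescale actor-critic update under the average-reward criterion with linear value-function approximation, in which the critic's value estimate also serves as a baseline in the actor's policy-gradient step. It is not the paper's algorithm or its optimal baseline; the softmax parameterization, feature shapes and step sizes are assumptions made for illustration.

```python
import numpy as np

def softmax_policy(theta, sa_feats):
    """Action probabilities for one state; sa_feats is an |A| x d feature matrix."""
    prefs = sa_feats @ theta
    prefs -= prefs.max()                      # numerical stability
    weights = np.exp(prefs)
    return weights / weights.sum()

def actor_critic_step(theta, v, rho, s_feats, sa_feats, a, r, next_s_feats,
                      alpha=0.01, beta=0.1, xi=0.05):
    """One update from a transition (s, a, r, s').

    theta: actor (policy) parameters, v: critic (value) weights,
    rho: running estimate of the average reward.  Choosing beta larger
    than alpha mimics the faster critic timescale."""
    pi = softmax_policy(theta, sa_feats)
    # Average-reward temporal-difference error; the critic's value
    # estimate acts as the baseline in the actor update below.
    delta = r - rho + next_s_feats @ v - s_feats @ v
    rho_new = rho + xi * (r - rho)                 # track average reward
    v_new = v + beta * delta * s_feats             # critic update
    # Score function of the softmax policy: grad log pi(a|s).
    score = sa_feats[a] - pi @ sa_feats
    theta_new = theta + alpha * delta * score      # actor update
    return theta_new, v_new, rho_new
```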

Actor-Critic Algorithms for Variance Minimization

Technological Developments in Education and Automation, 2009

We consider the framework of a set of recently proposed two-timescale actor-critic algorithms for reinforcement learning (RL) under the long-run average-reward criterion with linear, feature-based value-function approximation. The actor and critic updates are based on stochastic policy-gradient ascent and temporal-difference algorithms, respectively. Unlike conventional RL algorithms, policy-gradient-based algorithms guarantee convergence even with value-function approximation, but they suffer from the high variance of the policy-gradient estimator. To minimize this variance for an existing algorithm, we derive a novel stochastic-gradient-based critic update. We propose a novel baseline structure for variance minimization of an estimator and derive an optimal baseline that makes the covariance matrix the zero matrix, the best achievable. We derive a novel actor update based on the optimal baseline deduced for an existing algorithm, and another novel actor update using the optimal baseline for an unbiased policy-gradient estimator deduced from the Policy-Gradient Theorem with Function Approximation. We also obtain a novel variance-minimization-based interpretation of an existing algorithm. Computational results demonstrate that the proposed algorithms outperform the state of the art on Garnet problems.
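As background for the baseline construction the abstract refers to, the standard identity below (written in notation chosen here, not necessarily the paper's) records why subtracting a state-dependent baseline leaves the policy-gradient estimator unbiased, so the baseline can be tuned purely to reduce variance; the paper's contribution is an optimal baseline within a richer baseline structure.

$$\nabla_\theta J(\theta) \;=\; \sum_{s} d^{\pi_\theta}(s) \sum_{a} \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)\, \bigl(Q^{\pi_\theta}(s,a) - b(s)\bigr),$$

because for any baseline $b(s)$,

$$\sum_{a} \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)\, b(s) \;=\; b(s)\, \nabla_\theta \sum_{a} \pi_\theta(a \mid s) \;=\; 0.$$

The baseline therefore only affects the covariance of the sampled estimator $\nabla_\theta \log \pi_\theta(a \mid s)\,\bigl(Q^{\pi_\theta}(s,a) - b(s)\bigr)$, which is what the optimal baseline is chosen to minimize.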

Policy-Gradient Based Actor-Critic Algorithms

2009 WRI Global Congress on Intelligent Systems, 2009

We consider the framework of a set of recently proposed two-timescale actor-critic algorithms for reinforcement learning under the long-run average-reward criterion with linear, feature-based value-function approximation. The actor update is based on the stochastic policy-gradient ascent rule. We derive a novel stochastic-gradient-based critic update to minimize the variance of the policy-gradient estimator used in the actor update. We propose a novel baseline structure for variance minimization of an estimator and derive an optimal baseline that makes the covariance matrix the zero matrix, the best achievable. We derive a novel actor update based on the optimal baseline deduced for an existing algorithm, and another novel actor update using the optimal baseline for an unbiased policy-gradient estimator deduced from the Policy-Gradient Theorem with Function Approximation. Computational results demonstrate that the proposed algorithms outperform the state of the art on Garnet problems.
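Since all of the abstracts above report comparisons on Garnet problems, the sketch below shows one common way such randomly constructed finite MDP benchmarks are generated, parameterized by the number of states, the number of actions and a branching factor that controls the sparsity of the transition kernel. The parameter names and the exact construction are assumptions for illustration, not the experimental setup used in these papers.

```python
import numpy as np

def make_garnet(n_states=30, n_actions=4, branching=3, seed=0):
    """Random finite MDP: each (state, action) pair transitions to
    `branching` randomly chosen successor states with random probabilities,
    and carries a random reward."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))    # transition kernel
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            cuts = np.sort(rng.uniform(size=branching - 1))
            probs = np.diff(np.concatenate(([0.0], cuts, [1.0])))
            P[s, a, successors] = probs
    R = rng.normal(size=(n_states, n_actions))        # reward table
    return P, R
```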

Improved theoretical guarantees regarding a class of two-row cutting planes

The corner polyhedron is described by minimal valid inequalities from maximal lattice-free convex sets. For the Relaxed Corner Polyhedron (RCP), with two free integer variables and any number of non-negative continuous variables, such facet-defining inequalities are known to arise from maximal lattice-free splits, triangles and quadrilaterals. We improve the tightest known upper bound on the approximation of the RCP purely by minimal valid inequalities from maximal lattice-free quadrilaterals from 2 to 1.71. We also generalize the tightest known lower bound of 1.125, for the approximation of the RCP purely by minimal valid inequalities from maximal lattice-free triangles, to an infinite subclass of quadrilaterals.
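In symbols, and under the Goemans-style relative-strength convention commonly used in this literature (an assumption about the paper's exact definitions, not a quotation of them): write $R_f$ for the RCP, $\mathcal{Q}$ for its closure by minimal valid inequalities from maximal lattice-free quadrilaterals, and $\alpha R_f$ for the relaxation obtained by weakening every nontrivial valid inequality $\sum_j \psi(r^j)\, s_j \ge 1$ of $R_f$ to $\sum_j \psi(r^j)\, s_j \ge 1/\alpha$. The abstract's upper bound then reads

$$R_f \;\subseteq\; \mathcal{Q} \;\subseteq\; 1.71\, R_f,$$

improving the previously known factor of $2$, while the lower bound says that for an infinite subclass of maximal lattice-free quadrilaterals there are instances on which the corresponding closure is not contained in $\alpha R_f$ for any $\alpha < 1.125$.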

On the relative strength of families of intersection cuts arising from pairs of tableau constraints in mixed integer programs

Mathematical Programming, 2014

We compare the relative strength of valid inequalities for the integer hull of the feasible region of mixed integer linear programs with two equality constraints, two unrestricted integer variables and any number of nonnegative continuous variables. In particular, we prove that the closures of Type 2 triangle, Type 3 triangle and quadrilateral inequalities are each within a factor of 1.5 of the integer hull, and we provide examples showing that the approximation factor is no less than 1.125. However, there is no fixed approximation ratio for split or Type 1 triangle inequalities.
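Under the same weakened-right-hand-side convention as above (again an assumed notation rather than the paper's own), write $\mathrm{conv}(S)$ for the integer hull and $\mathcal{T}_2$, $\mathcal{T}_3$, $\mathcal{Q}$ for the closures of Type 2 triangle, Type 3 triangle and quadrilateral inequalities. The stated results can then be summarized as

$$\mathrm{conv}(S) \;\subseteq\; \mathcal{C} \;\subseteq\; \tfrac{3}{2}\, \mathrm{conv}(S) \qquad \text{for each } \mathcal{C} \in \{\mathcal{T}_2,\ \mathcal{T}_3,\ \mathcal{Q}\},$$

with examples showing the factor cannot be taken below $1.125$, whereas no finite $\alpha$ gives the analogous containment for the split closure or the Type 1 triangle closure.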
