A. Gosavi - Academia.edu
Papers by A. Gosavi
International Journal of Control, Automation and Systems, 2011
We develop the theory for Markov and semi-Markov control using dynamic programming and reinforcement learning in which a form of semi-variance, which measures the variability of rewards below a pre-specified target, is penalized. The objective is to optimize a function of the rewards and risk in which risk is penalized. Penalizing variance, which is popular in the literature, has some drawbacks that can be avoided with semi-variance.
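As an illustration of the idea in this abstract, the sketch below computes a target semi-variance from a sample of rewards, together with a risk-penalized score. It assumes the common definition (the mean squared shortfall below the target); the abstract does not state the paper's exact form, and the names `semi_variance` and `penalized_score` and the weight `theta` are hypothetical.

```python
def semi_variance(rewards, target):
    """Mean squared shortfall of the rewards below the target.

    Rewards at or above the target contribute zero, which is the
    property that distinguishes semi-variance from ordinary variance.
    """
    shortfalls = [max(0.0, target - r) for r in rewards]
    return sum(s * s for s in shortfalls) / len(rewards)


def penalized_score(rewards, target, theta):
    """Average reward minus theta times the semi-variance (theta is assumed)."""
    mean_reward = sum(rewards) / len(rewards)
    return mean_reward - theta * semi_variance(rewards, target)


if __name__ == "__main__":
    sample = [4, 6, 10, 0]  # made-up rewards, for illustration only
    print(semi_variance(sample, target=5))
    print(penalized_score(sample, target=5, theta=0.1))
```

Note that a sample whose rewards all exceed the target has zero semi-variance no matter how widely the rewards are spread, whereas ordinary variance would still penalize that spread.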
Springer Series in Reliability Engineering, 2010
Engineering Management Journal, 2011
Unexpected failures can reduce throughput, especially if the failure affects the process bottleneck. Furthermore, when a failure occurs, it usually takes longer to correct than a scheduled maintenance activity would, resulting in significantly higher costs. It has been empirically shown that preventive maintenance can reduce the frequency of unexpected failures and, if done at appropriate time intervals, can reduce the overall costs. Total productive maintenance (TPM) is now viewed as an integral part of regular operations in production firms. With the advent of computers and the cheap availability of personal computers in the last couple of decades, computerized maintenance systems have become increasingly popular in industry. Such systems make it easy to collect and maintain historical data on machine failures, their frequencies, and down-times due to repairs and maintenance. These databases can be used to determine parameters of system failures (such as distribution type and mean), providing an excellent basis to model and improve the maintenance process. It is no exaggeration to state that production systems cannot remain healthy and productive without a good TPM program; however, developing effective operational strategies for TPM can be quite challenging because of numerous complicating factors, such as random failures of the different machines and pieces of equipment in a system, randomness in repair and maintenance times due to variability in the availability of spare parts and repairpersons, and the complex stochastic dynamics of production systems. The manager has to analyze the underlying stochastic processes, costs, revenues, and a host of other factors in order to develop an effective TPM program.
We consider a manufacturing system that uses Automated Guided Vehicles (AGVs) for material handling. Increasing the capacity of the AGVs used can lead to increased throughput and reduced inventory-holding costs. However, AGVs are typically expensive. Hence, modeling the economics of buying an additional vehicle is an important problem from the standpoint of making a system lean. Optimization of the AGV's capacity can be performed analytically only under some simplifying assumptions about the system. We present a simulation-optimization approach to determine the optimal capacity of an AGV in a closed-loop path, where the AGV is used as a device for pick-up from machines and drop-off at a conveyor. Our focus in this work is on optimizing the capacity of the AGV.
Operations Research/Computer Science Interfaces Series, 2014
2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2011
Adaptive or actor critics are a class of reinforcement learning (RL) or approximate dynamic programming (ADP) algorithms in which one searches over stochastic policies in order to determine the optimal deterministic policy. Classically, these algorithms have been studied for Markov decision processes (MDPs) in the context of model-free updates in which transition probabilities are avoided altogether. A model-free version for the semi-MDP (SMDP) for discounted reward, in which the transition time of each transition can be a random variable, was proposed in Gosavi [1]. In this paper, we propose a variant in which the transition probability model is built simultaneously with the value function and action-probability functions. While our new algorithm does not require the transition probabilities a priori, it generates them along with the estimation of the value function and the action-probability functions required in adaptive critics. Model-building and model-based versions of algorithms have numerous advantages in contrast to their model-free counterparts. In particular, they are more stable and may require less training. However, the additional steps of building the model may require increased storage in the computer's memory. In addition to enumerating potential application areas for our algorithm, we will analyze the advantages and disadvantages of model building.
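The model-building step described in this abstract can be pictured with a minimal sketch that estimates transition probabilities from observed transitions by relative frequency. This is only one simple way to build such a model, shown under that assumption; it is not the paper's algorithm, it omits the value-function and action-probability updates, and the class `TransitionModel` and its methods are hypothetical names.

```python
from collections import defaultdict


class TransitionModel:
    """Relative-frequency estimate of p(next_state | state, action)."""

    def __init__(self):
        # Number of times each (state, action) pair has been tried.
        self.visits = defaultdict(int)
        # Number of times each (state, action, next_state) triple was observed.
        self.counts = defaultdict(int)

    def update(self, state, action, next_state):
        """Record one observed transition from the simulator or the real system."""
        self.visits[(state, action)] += 1
        self.counts[(state, action, next_state)] += 1

    def prob(self, state, action, next_state):
        """Current estimate of the transition probability (0.0 if never tried)."""
        n = self.visits.get((state, action), 0)
        if n == 0:
            return 0.0
        return self.counts.get((state, action, next_state), 0) / n


if __name__ == "__main__":
    model = TransitionModel()
    # Made-up transitions: from state 0 under action "a", three moves to 1, one to 2.
    for nxt in (1, 1, 2, 1):
        model.update(0, "a", nxt)
    print(model.prob(0, "a", 1), model.prob(0, "a", 2))
```

The two dictionaries grow with the number of distinct state-action-next-state triples visited, which illustrates the extra storage cost that the abstract attributes to model building.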
BMJ case reports, 2015
A 23-year-old woman, gravida 2 para 0, presented at 8 weeks' gestation with a spontaneously conceived triplet cornual ectopic pregnancy. She was at high risk of ectopic pregnancy as she had been previously treated for pelvic inflammatory disease and had also undergone laparoscopic salpingostomy for right-sided ectopic pregnancy. She was clinically stable and her abdomen was soft and non-tender. The diagnosis was made on transvaginal ultrasound scan and confirmed on the three-dimensional scan. She was counselled about her treatment options and subsequently underwent laparoscopic cornual resection using the modified endoloop method. The estimated blood loss was 20 ml intraoperatively and the patient recovered well. She subsequently conceived spontaneously with an intrauterine pregnancy and underwent lower segment caesarean section at 37 weeks in view of previous laparoscopic cornual resection. Intraoperatively, the right cornua appeared normal and there was no sign of thinning.
2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2010
Variance-penalized Markov decision processes (MDPs) for an infinite time horizon have been studied in the literature for asymptotic and one-step variance; in these models, the objective function is generally the expected long-run reward minus a constant times the variance, where variance is used as a measure of risk. For the finite time horizon, asymptotic variance has been considered in Collins, but this model accounts for only a terminal reward, i.e., reward is earned at the end of the time horizon. In this paper, we seek to develop a framework for one-step variance in the finite time horizon in which rewards can be non-zero in every state. We develop a solution algorithm based on the stochastic shortest path algorithm of Bertsekas and Tsitsiklis. We also present a Q-Learning algorithm for a simulation-based scenario which applies in the absence of the transition probability model, along with some preliminary convergence results.
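The objective described in words in this abstract can be written compactly. The symbols below are assumptions introduced for illustration and are not taken from the paper: $\mu$ denotes a policy, $\rho(\mu)$ its expected long-run reward, $\sigma^2(\mu)$ the variance of its rewards, and $\theta$ the constant that weights risk.

```latex
\max_{\mu} \; \phi(\mu), \qquad \phi(\mu) = \rho(\mu) - \theta \, \sigma^{2}(\mu)
```

With $\theta = 0$ the score reduces to the risk-neutral expected reward; a larger $\theta$ trades more expected reward for lower variability.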
Pregnancy is unarguably the most important event in a woman's life. Most women will have a normal pregnancy resulting in normal delivery with a healthy mother and baby. However, pregnancy carries a host of inherent complications which can at times be life-threatening. ...
International Journal of Refractory Metals and Hard Materials, 2013
ASME/ISCIE 2012 International Symposium on Flexible Automation, 2012
The popularity of forklifts that use fuel cells based on proton exchange membranes (PEMs) has steadily increased with time in manufacturing industries and distribution centers. Because they potentially reduce our dependence on fossil fuels that emit carbon dioxide while generating energy, they have certain environmental benefits in comparison to forklifts driven by lead-acid batteries that are typically charged using regular sources of energy. In this paper, we study the impact of using PEM forklifts on material-handling costs and lead times, which are commonly used in measuring the cost-effectiveness of a manufacturing system's layout. We report some initial findings in this paper. In general, we find that layouts designed for PEM forklifts tend to have lower material-handling costs, improved closeness ratings, and higher area utilization, while the shop-floor lead times tend to be shorter, leading to lower inventory and higher flexibility in responding to fluctuations in customer demand. Overall, PEM forklifts may hence improve the health of the supply chain of the product by making it more flexible and cost-effective.
Operations Research/Computer Science Interfaces Series, 2003
Operations Research/Computer Science Interfaces Series, 2003
Operations Research/Computer Science Interfaces Series, 2003