Using reinforcement learning method to price a perishable product, case study: orange

Shekari Firouzjaie, Abbas; Sahebjamnia, Navid; Abdollahzade, Hadi

doi:10.22054/jmmf.2020.54852.1013

Document Type: Research Article

Authors

¹ Industrial Engineering Department, Science and Technology of Behshahr, Mazandaran, Iran

² Department of Industrial Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran

³ Industrial Engineering Department, Science and Technology of Behshahr, Mazandaran, Iran


Abstract

Determining the optimal selling price of a commodity is a long-standing topic in both scientific and industrial research. Perishable products have a short life and deteriorate over time, so they cause substantial losses if they are not managed properly. Many industries, retailers, and service providers can increase their revenue through optimal pricing of perishable products that must be sold within a fixed period. In this pricing problem, a seller must set the price of several units of a perishable or seasonal product that can be sold only for a limited time. This article examines pricing policies that increase the revenue from selling a given inventory with an expiration date. Reinforcement learning algorithms are used to analyze how companies can learn and optimize their pricing strategy simultaneously in response to buyers. It is also shown that reinforcement learning can be used to model a demand-dependent problem. The paper presents an optimization method for a model-free environment in which demand is learned and pricing decisions are updated in real time. The performance of the learning algorithms is compared using Monte Carlo simulations.
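The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it uses plain tabular Q-learning, and the price grid, horizon, starting inventory, and the `demand` function are all hypothetical values chosen only to make the example runnable. The agent never reads the demand model directly; it only observes sales, which is what "model-free" means here. A small Monte Carlo routine then estimates the mean revenue of any pricing policy.

```python
import random

PRICES = [1.0, 2.0, 3.0]   # hypothetical price levels per unit
HORIZON = 10               # selling periods before the stock expires (assumed)
START_INVENTORY = 5        # units on hand at the start of the season (assumed)


def demand(price, rng):
    """Assumed demand: at most one sale per period, less likely at higher prices."""
    return 1 if rng.random() < max(0.0, 1.0 - 0.25 * price) else 0


def best_action(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def train(episodes=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning over states (period, inventory) with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(t, s): [0.0] * len(PRICES)
         for t in range(HORIZON + 1) for s in range(START_INVENTORY + 1)}
    for _ in range(episodes):
        inventory = START_INVENTORY
        for t in range(HORIZON):
            if inventory == 0:
                break
            state = (t, inventory)
            if rng.random() < epsilon:
                a = rng.randrange(len(PRICES))   # explore
            else:
                a = best_action(q[state])        # exploit
            sold = min(inventory, demand(PRICES[a], rng))
            inventory -= sold
            # Entries for t == HORIZON or inventory == 0 are never updated and
            # stay at 0, so expired or sold-out stock has no further value.
            target = PRICES[a] * sold + max(q[(t + 1, inventory)])
            q[state][a] += alpha * (target - q[state][a])
    return q


def simulate(policy, runs=5000, seed=1):
    """Monte Carlo estimate of mean season revenue for policy(t, inventory) -> price."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        inventory = START_INVENTORY
        for t in range(HORIZON):
            if inventory == 0:
                break
            price = policy(t, inventory)
            sold = min(inventory, demand(price, rng))
            inventory -= sold
            total += price * sold
    return total / runs


if __name__ == "__main__":
    q = train()
    print("learned policy:", simulate(lambda t, s: PRICES[best_action(q[(t, s)])]))
    for p in PRICES:
        print(f"fixed price {p}:", simulate(lambda t, s, p=p: p))
```

Running the script prints the estimated revenue of the learned policy next to three fixed-price baselines. Because the learning rate is constant and the rewards are noisy, the learned policy is not guaranteed to beat every baseline under these toy settings; the comparison is meant only to show the shape of a Monte Carlo evaluation.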

Journal of Mathematics and Modeling in Finance

Volume 1, Issue 1
March 2021
Pages 27-40
