Online fitted policy iteration based on extreme learning machines

Reinforcement learning (RL) is a learning paradigm that can be useful in a wide variety of real-world applications. However, its applicability to complex problems remains problematic due to different causes. Particularly important among these are the high quantity of data required by the agent to learn useful policies and the poor scalability to high-dimensional problems due to the use of local approximators. This paper presents a novel RL algorithm, called online fitted policy iteration (OFPI), that steps forward in both directions. OFPI is based on a semi-batch scheme that increases the convergence speed by reusing data and enables the use of global approximators by reformulating the value function approximation as a standard supervised problem. The proposed method has been empirically evaluated in three benchmark problems. During the experiments, OFPI has employed a neural network trained with the extreme learning machine algorithm to approximate the value functions. Results have demonstrated the stability of OFPI using a global function approximator and also performance improvements over two baseline algorithms (SARSA and Q-learning) combined with eligibility traces and a radial basis function network.


Bibliographic Details
Main Authors: Escandell-Montero, Pablo, Lorente, Delia, Martínez-Martínez, José M., Soria-Olivas, Emilio, Vila-Francés, Joan, Martín-Guerrero, José D.
Format: article
Language: English
Published: Elsevier 2021
Subjects: Reinforcement learning; Sequential decision-making; Fitted policy iteration; Extreme learning machine; N01 Agricultural engineering
Online Access:http://hdl.handle.net/20.500.11939/6972
https://www.sciencedirect.com/science/article/abs/pii/S0950705116001209#!
author Escandell-Montero, Pablo
Lorente, Delia
Martínez-Martínez, José M.
Soria-Olivas, Emilio
Vila-Francés, Joan
Martín-Guerrero, José D.
collection ReDivia
description Reinforcement learning (RL) is a learning paradigm that can be useful in a wide variety of real-world applications. However, its applicability to complex problems remains problematic due to different causes. Particularly important among these are the high quantity of data required by the agent to learn useful policies and the poor scalability to high-dimensional problems due to the use of local approximators. This paper presents a novel RL algorithm, called online fitted policy iteration (OFPI), that steps forward in both directions. OFPI is based on a semi-batch scheme that increases the convergence speed by reusing data and enables the use of global approximators by reformulating the value function approximation as a standard supervised problem. The proposed method has been empirically evaluated in three benchmark problems. During the experiments, OFPI has employed a neural network trained with the extreme learning machine algorithm to approximate the value functions. Results have demonstrated the stability of OFPI using a global function approximator and also performance improvements over two baseline algorithms (SARSA and Q-learning) combined with eligibility traces and a radial basis function network.
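The value-function approximator mentioned in the abstract, a network trained with the extreme learning machine (ELM) algorithm, fits only its output layer in closed form: input weights and biases are random and frozen, and the output weights come from a single least-squares solve. A minimal illustrative sketch follows; the function names and the toy regression target are ours, not the paper's.

```python
import numpy as np

def elm_fit(X, y, n_hidden=100, seed=0):
    """ELM-style training: draw random, frozen input weights and biases,
    then solve only the linear output weights by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # frozen input weights
    b = rng.standard_normal(n_hidden)                 # frozen biases
    H = np.tanh(X @ W + b)                            # random feature map
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)      # the only trained part
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy 1-D regression target to show the closed-form fit in action.
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(3.0 * X).ravel()
W, b, beta = elm_fit(X, y)
mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
```

Because the only optimization is a linear solve, training is fast and deterministic given the random seed, which is what makes ELM networks attractive as global approximators inside an RL loop.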
id ReDivia6972
institution Instituto Valenciano de Investigaciones Agrarias (IVIA)
spelling ReDivia6972 (indexed 2025-04-25T14:48:01Z)
deposited 2021-01-18T09:31:45Z
issued 2016 (article, publishedVersion)
citation Escandell-Montero, P., Lorente, D., Martínez-Martínez, J. M., Soria-Olivas, E., Vila-Francés, J., & Martín-Guerrero, J. D. (2016). Online fitted policy iteration based on extreme learning machines. Knowledge-Based Systems, 100, 200-211.
issn 0950-7051
doi 10.1016/j.knosys.2016.03.007
handle http://hdl.handle.net/20.500.11939/6972
license Atribución-NoComercial-SinDerivadas 3.0 España (http://creativecommons.org/licenses/by-nc-nd/3.0/es/)
access closedAccess
publisher Elsevier (electronic)
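OFPI's central move, per the abstract, is reformulating value-function approximation as a standard supervised problem so that a global approximator can be fit to it. The paper's exact semi-batch update is not reproduced here; a generic fitted-style target construction in the same family, with illustrative names of our own, looks like:

```python
import numpy as np

def fitted_q_targets(transitions, q_predict, actions, gamma=0.99):
    """Recast one value-approximation step as supervised regression:
    from stored (s, a, r, s') samples, build inputs (s, a) and
    bootstrap targets r + gamma * max_a' Q(s', a'). Any global
    regressor (e.g. an ELM network) can then be fit to (X, y)."""
    X, y = [], []
    for s, a, r, s_next in transitions:
        best_next = max(q_predict(s_next, a2) for a2 in actions)
        X.append(np.append(s, a))          # input: state-action pair
        y.append(r + gamma * best_next)    # target: bootstrapped return
    return np.array(X), np.array(y)

# With a zero initial Q, the regression targets are just the rewards.
transitions = [(np.array([0.0]), 1, 1.0, np.array([0.5])),
               (np.array([0.5]), 0, -1.0, np.array([1.0]))]
X, y = fitted_q_targets(transitions, lambda s, a: 0.0, actions=[0, 1])
```

Reusing the same stored transitions to rebuild targets as Q improves is what lets fitted schemes extract more value per sample than purely online updates such as SARSA or Q-learning.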
title Online fitted policy iteration based on extreme learning machines
topic Reinforcement learning
Sequential decision-making
Fitted policy iteration
Extreme learning machine
N01 Agricultural engineering
work_keys_str_mv AT escandellmonteropablo onlinefittedpolicyiterationbasedonextremelearningmachines
AT lorentedelia onlinefittedpolicyiterationbasedonextremelearningmachines
AT martinezmartinezjosem onlinefittedpolicyiterationbasedonextremelearningmachines
AT soriaolivasemilio onlinefittedpolicyiterationbasedonextremelearningmachines
AT vilafrancesjoan onlinefittedpolicyiterationbasedonextremelearningmachines
AT martinguerrerojosed onlinefittedpolicyiterationbasedonextremelearningmachines