Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control

Tang, D.; Chen, L.; Tian, Z.; Hu, E.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/124816


Full metadata record

DC Field	Value	Language
dc.contributor.author	Tang, D.	-
dc.contributor.author	Chen, L.	-
dc.contributor.author	Tian, Z.	-
dc.contributor.author	Hu, E.	-
dc.date.issued	2021	-
dc.identifier.citation	International Journal of Control, 2021; 94(5):1321-1333	-
dc.identifier.issn	0020-7179	-
dc.identifier.issn	1366-5820	-
dc.identifier.uri	http://hdl.handle.net/2440/124816	-
dc.description	Published online: 11 Aug 2019.	-
dc.description.abstract	This study proposes a modified value-function-approximation (MVFA) and investigates its use under a single-critic configuration based on neural networks (NNs) for synchronous policy iteration (SPI) to deliver compact implementation of optimal control online synthesis for control-affine continuous-time nonlinear systems. Existing single-critic algorithms require stabilising critic tuning laws while eliminating actor tuning. This paper thus studies alternative single-critic realisation aiming to relax the needs for stabilising mechanisms in the critic tuning law. Optimal control laws are determined from the Hamilton-Jacobi-Bellman equality by solving for the associated value function via SPI in a single-critic configuration. Different from other existing single-critic methods, an MVFA is proposed to deal with closed-loop stability during online learning. Gradient-descent tuning is employed to adjust the critic NN parameters in the interests of not complicating the problem. Parameters convergence and closed-loop stability are examined. The proposed MVFA-based approach yields an alternative single-critic SPI method with uniformly ultimately bounded closed-loop stability during online learning without the need for stabilising mechanisms in the critic tuning law. The proposed approach is verified via simulations.	-
dc.description.statementofresponsibility	Difan Tang, Lei Chen, Zhao Feng Tian and Eric Hu	-
dc.language.iso	en	-
dc.publisher	Taylor & Francis	-
dc.rights	© 2019 Informa UK Limited, trading as Taylor & Francis Group	-
dc.source.uri	https://www.tandfonline.com/	-
dc.subject	Adaptive dynamic programming; approximate dynamic programming; neural networks; nonlinear control; optimal control; policy iteration	-
dc.title	Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control	-
dc.type	Journal article	-
dc.identifier.doi	10.1080/00207179.2019.1648874	-
pubs.publication-status	Published	-
dc.identifier.orcid	Tang, D. [0000-0002-7143-0441]	-
dc.identifier.orcid	Chen, L. [0000-0002-2269-2912]	-
dc.identifier.orcid	Tian, Z. [0000-0001-9847-6004]	-
dc.identifier.orcid	Hu, E. [0000-0002-7390-0961]	-
Appears in Collections:	Aurora harvest 4 Mechanical Engineering publications

Files in This Item:

File	Description	Size	Format
hdl_124816.pdf	Submitted version	3.07 MB	Adobe PDF


Adelaide Research & Scholarship