A methodology for designing spacecraft control using reinforcement learning methods

M. G. Shirobokov

doi:10.31857/S0023420624050082

Authors: M. G. Shirobokov¹
Affiliations:
1. Keldysh Institute of Applied Mathematics, Russian Academy of Sciences
Issue: Vol 62, No 5 (2024)
Pages: 498-515
Section: Articles
URL: https://kld-journal.fedlab.ru/0023-4206/article/view/672821
DOI: https://doi.org/10.31857/S0023420624050082
EDN: https://elibrary.ru/IGZREA
ID: 672821

Abstract

This paper formulates a methodology for reducing the general problem of optimal spacecraft control to a reinforcement learning problem. The methodology includes a method for assessing the quality of a control algorithm based on inequalities from probability theory. The author's software library for reducing optimal control problems to reinforcement learning is presented. Two examples of applying the methodology are considered. The proposed methodology may also be of interest for designing control of general mechanical systems.
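The abstract does not say which probability inequalities the quality-assessment method relies on. As a minimal sketch, assuming Hoeffding's inequality and a per-episode quality score bounded in a known interval (for example, a success indicator in [0, 1]), the number of independent simulated episodes needed to estimate the mean score to within `eps` with confidence `1 - delta` could be computed as shown below. The function names are hypothetical and are not taken from the author's library.

```python
import math


def hoeffding_sample_size(eps, delta, lo=0.0, hi=1.0):
    """Smallest n such that 2 * exp(-2 * n * eps**2 / (hi - lo)**2) <= delta.

    Assumption: episode scores are i.i.d. and lie in [lo, hi].
    """
    if eps <= 0 or not (0 < delta < 1) or hi <= lo:
        raise ValueError("require eps > 0, 0 < delta < 1, and hi > lo")
    return math.ceil((hi - lo) ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


def hoeffding_half_width(n, delta, lo=0.0, hi=1.0):
    """Half-width of the two-sided (1 - delta) Hoeffding interval after n episodes."""
    if n <= 0 or not (0 < delta < 1) or hi <= lo:
        raise ValueError("require n > 0, 0 < delta < 1, and hi > lo")
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # Episodes needed to know the success rate to within 0.05 with 99% confidence.
    n = hoeffding_sample_size(eps=0.05, delta=0.01)
    print(n, hoeffding_half_width(n, delta=0.01))
```

A bound of this kind is distribution-free: it needs only the range of the score, not a model of the disturbances, at the price of being conservative compared with intervals that use the sample variance.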

About the authors

M. G. Shirobokov

Keldysh Institute of Applied Mathematics, Russian Academy of Sciences

Author for correspondence.
Email: shirobokov@keldysh.ru
Russian Federation, Moscow
