Show simple item record

dc.contributor.advisor: Tavares, Anderson Rocha (pt_BR)
dc.contributor.author: Silva, Giovani da (pt_BR)
dc.date.accessioned: 2024-02-16T05:00:52Z (pt_BR)
dc.date.issued: 2023 (pt_BR)
dc.identifier.uri: http://hdl.handle.net/10183/272023 (pt_BR)
dc.description.abstract: In today’s complex optimization landscape, challenges often transcend the pursuit of a solitary objective, instead requiring the simultaneous consideration of multiple, sometimes conflicting, goals. This complexity has given rise to the field of Multi-Objective Optimization. In recent years, researchers have begun to integrate these multi-objective approaches into Reinforcement Learning, leading to the emergence of Multi-Objective Reinforcement Learning (MORL). The field is gaining traction, especially due to the capabilities of model-free Reinforcement Learning algorithms. Essentially, these model-free MORL algorithms strive to balance multiple, often conflicting, objectives without requiring prior knowledge of the environment. This thesis provides an in-depth analysis of model-free MORL algorithms anchored in Pareto Dominating Policies (PDP), specifically focusing on two key algorithms: Pareto Q-Learning (PQL) and Pareto Deep Q-Networks (PDQN). These algorithms were selected for their model-free characteristics and their resemblance to well-known reinforcement learning algorithms such as Q-Learning and Deep Q-Networks. This study features from-scratch implementations of both the PQL and PDQN algorithms. It evaluates the performance of PQL in the Deep Sea Treasure environment and assesses PDQN in both the Deep Sea Treasure and a simulated urban traffic setting. This research identifies common challenges, such as the generation of non-optimal policies and the difficulties associated with managing large state spaces. Our findings reveal that applying the PDQN algorithm to real-world scenarios, such as Gym City Flow (ZHANG et al., 2019), led to no improvements, demonstrating its inefficacy in that setting. To address these challenges, this work proposes enhancements to the PDQN algorithm and introduces a new MORL technique based on Pareto Dominating Actions. Preliminary tests indicate that this new approach shows promise in improving the effectiveness of MORL algorithms. The primary contributions of this work lie in its examination of the current state of MORL algorithms based on Pareto Dominating Policies: discussing their architecture, their challenges, and their possible improvements, while also testing their effectiveness in multi-objective scenarios. In doing so, we aim to shed light on their inherent limitations and challenges. In light of these limitations, we propose enhancements to the PDQN algorithm through an innovative approach that has the potential to serve as an effective basis for MORL algorithms in the future. This work serves as both a critical review of existing methodologies and a forward-looking exploration of the future landscape of MORL centered on Pareto optimality. (en)
dc.format.mimetype: application/pdf (pt_BR)
dc.language.iso: eng (pt_BR)
dc.rights: Open Access (en)
dc.subject: Aprendizagem por reforço (pt_BR)
dc.subject: Multi-objective (en)
dc.subject: Inteligência artificial (pt_BR)
dc.subject: Pareto Dominating Policies (en)
dc.subject: Redes (pt_BR)
dc.subject: Pareto Deep Q-networks (en)
dc.title: Analysis and improvements of multi objective reinforcement learning algorithms based on pareto dominating policies (pt_BR)
dc.type: Trabalho de conclusão de graduação (pt_BR)
dc.identifier.nrb: 001195959 (pt_BR)
dc.degree.grantor: Universidade Federal do Rio Grande do Sul (pt_BR)
dc.degree.department: Instituto de Informática (pt_BR)
dc.degree.local: Porto Alegre, BR-RS (pt_BR)
dc.degree.date: 2023 (pt_BR)
dc.degree.graduation: Ciência da Computação: Ênfase em Engenharia da Computação: Bacharelado (pt_BR)
dc.degree.level: graduação (pt_BR)
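The abstract above centers on Pareto dominance between multi-objective value vectors, the comparison that Pareto Q-Learning and Pareto Deep Q-Networks use to retain only non-dominated policies. As a purely illustrative aside (not code from the thesis or the repository record), a minimal Python sketch of that dominance test, with hypothetical names and example values:

```python
def dominates(v, w):
    """True if reward vector v Pareto-dominates w: at least as good in every
    objective and strictly better in at least one (higher is better)."""
    return all(a >= b for a, b in zip(v, w)) and any(a > b for a, b in zip(v, w))

def pareto_front(vectors):
    """Keep only the vectors that no other vector in the set dominates."""
    return [v for i, v in enumerate(vectors)
            if not any(dominates(w, v) for j, w in enumerate(vectors) if j != i)]

# Hypothetical (treasure value, -time penalty) trade-offs, loosely in the
# spirit of the Deep Sea Treasure benchmark mentioned in the abstract.
candidates = [(1.0, -1.0), (2.0, -3.0), (0.5, -2.0)]
print(pareto_front(candidates))  # -> [(1.0, -1.0), (2.0, -3.0)]; (0.5, -2.0) is dominated
```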



This item is licensed under a Creative Commons License
