- Authors
- Sebastian Jost
- title
- Solving general Twisty Puzzles with Reinforcement Learning and automatic Algorithm Generation
- Please use the following URL when quoting:
- https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-949917
- publication_date
- 2025
- Date of submission
- 01.12.2024
- Date of defense
- 19.12.2024
- Abstract (EN)
- The Rubik's Cube and the hundreds of twisty puzzles created since have challenged millions of people around the world for decades. Coming up with new solution strategies for a twisty puzzle can be very difficult and often takes even experienced humans many hours. Recently, Machine Learning has been used to find new solutions to such puzzles with little to no human knowledge required. However, these solutions are difficult for humans to replicate, as they exhibit little obvious structure. We propose a method to automatically generate algorithms for puzzles and use them to augment the action set of reinforcement learning agents. The augmented action set enables learning with an intuitive, dense reward function, and the resulting agents learn more interpretable solution strategies while using significantly less compute than previous methods to solve the Rubik's Cube.
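The abstract's core idea (moves as permutations of sticker positions, and a generated "algorithm" compiled into a single macro-move added to the agent's action set) can be sketched as follows. This is a minimal illustration on a toy four-sticker puzzle, not the thesis's implementation; the move names and the chosen macro are hypothetical.

```python
def compose(p, q):
    """Permutation that applies p first, then q (both index tuples)."""
    return tuple(p[i] for i in q)

def apply_move(state, perm):
    """Permute the sticker tuple `state` by `perm`."""
    return tuple(state[i] for i in perm)

# Toy 4-sticker "puzzle" with two base moves (hypothetical examples):
cycle = (1, 2, 3, 0)   # cyclic shift of all stickers
swap  = (1, 0, 2, 3)   # swap the first two stickers

# A generated "algorithm" is a fixed move sequence compiled into one
# permutation; here `swap` followed by `cycle`, chosen arbitrarily.
macro = compose(swap, cycle)

# Augmented action set: base moves plus the generated macro-move.
actions = {"cycle": cycle, "swap": swap, "macro": macro}

state = ("a", "b", "c", "d")
# Applying the macro equals applying its constituent moves in order.
assert apply_move(apply_move(state, swap), cycle) == apply_move(state, macro)
```

Exposing such macros as single actions lets a dense reward (e.g. counting correctly placed stickers) guide learning, since one action can make multi-piece progress that individual face turns cannot.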
- Keywords (EN)
- reinforcement learning, machine learning, Rubik's Cube, AI interpretability
- Classification (DDC)
- 006
- 620
- 510
- Classification (RVK)
- ST 300
- ST 134
- Examiner
- Prof. Dr. Ostap Okhrin
- Prof. Dr. Pascal Kerschke
- Ankit Anil Chaudhari
- Awarding institution
- Technische Universität Dresden, Dresden
- version
- updated version
- URN Qucosa
- urn:nbn:de:bsz:14-qucosa2-949917
- Qucosa date of publication
- 02.01.2025
- Document type
- master_thesis
- Document language
- English
- licence
- CC BY-NC-SA 4.0
- Table of contents
- 1 Introduction and motivation — 1
  - 1.1 Related work — 3
  - 1.2 Mathematical Foundation — 4
    - 1.2.1 Permutations — 4
    - 1.2.2 Modeling Twisty Puzzles — 4
    - 1.2.3 Group Properties of non-bandaged Twisty Puzzles — 6
- 2 Algorithm generation — 7
  - 2.1 Automatic Symmetry Detection — 8
  - 2.2 Automatic Piece Detection — 11
  - 2.3 Algorithm Generation — 14
  - 2.4 Reinforcement Learning — 18
- 3 Experiments and Results — 21
  - 3.1 Experiment Description — 22
    - 3.1.1 Research Questions — 22
    - 3.1.2 Investigated Puzzles — 22
  - 3.2 Results and Discussion — 27
    - 3.2.1 Training behavior — 27
    - 3.2.2 Training Time Scaling with State Space Size — 28
    - 3.2.3 Policy Analysis — 30
    - 3.2.4 Limitations — 39
  - 3.3 Further questions — 40
  - 3.4 Conclusion — 42
- A Compute Hardware & training time — 47
- B Alternative Symmetry Detection — 49