
dc.contributor.author | Laterre, A | en
dc.contributor.author | Fu, Y | en
dc.contributor.author | Jabri, MK | en
dc.contributor.author | Cohen, A-S | en
dc.contributor.author | Kas, D | en
dc.contributor.author | Hajjar, K | en
dc.contributor.author | Dahl, TS | en
dc.contributor.author | Kerkeni, A | en
dc.contributor.author | Beguir, K | en
dc.date.accessioned | 2018-12-08T18:23:53Z
dc.date.available | 2018-12-08T18:23:53Z
dc.date.issued | 2018-07-04 | en
dc.identifier.uri | http://hdl.handle.net/10026.1/13006
dc.description.abstract | en

Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimisation problems, such as the travelling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm, which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm to instances of two-dimensional and three-dimensional bin packing problems show that it outperforms generic Monte Carlo tree search, heuristic algorithms, and integer programming solvers. We also present an analysis of the ranked reward mechanism, in particular the effects of problem instances of varying difficulty and of different ranking thresholds.
(A minimal code sketch of this ranking mechanism follows the record fields below.)

dc.language.iso | en | en
dc.subject | cs.LG | en
dc.subject | cs.AI | en
dc.subject | stat.ML | en
dc.title | Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization | en
dc.type | Journal Article
plymouth.author-url | http://arxiv.org/abs/1807.01672v3 | en
plymouth.journal | Presented at the Thirty-second Conference on Neural Information Processing Systems (NeurIPS 2018), Deep Reinforcement Learning Workshop, Montreal, Canada, December 3-8, 2018 | en
plymouth.organisational-group | /Plymouth
plymouth.organisational-group | /Plymouth/Faculty of Science and Engineering
plymouth.organisational-group | /Plymouth/Research Groups
plymouth.organisational-group | /Plymouth/Research Groups/Institute of Health and Community
plymouth.organisational-group | /Plymouth/Users by role
dc.rights.embargoperiod | Not known | en
rioxxterms.licenseref.uri | http://www.rioxx.net/licenses/all-rights-reserved | en
rioxxterms.type | Journal Article/Review | en
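
The abstract describes turning a single agent's raw episode rewards into a relative performance signal by ranking each new reward against rewards from recent games and comparing it to a ranking threshold. Below is a minimal Python sketch of that idea. The class name, buffer size, nearest-rank percentile rule, and random tie-breaking are illustrative assumptions, not the paper's exact specification.

```python
import random
from collections import deque

class RankedReward:
    """Reshape a single agent's episode reward into a binary relative
    reward by ranking it against recent rewards (a sketch of the R2 idea;
    buffer size, percentile rule, and tie-breaking are assumptions)."""

    def __init__(self, buffer_size=250, alpha=0.75):
        self.rewards = deque(maxlen=buffer_size)  # recent episode rewards
        self.alpha = alpha                        # ranking threshold percentile

    def _threshold(self):
        # alpha-percentile of the stored rewards (nearest-rank method)
        ordered = sorted(self.rewards)
        k = min(len(ordered) - 1, max(0, int(self.alpha * len(ordered)) - 1))
        return ordered[k]

    def reshape(self, reward):
        """Return +1 if the reward beats the recent alpha-percentile,
        -1 if it falls below it, and a random sign on ties."""
        self.rewards.append(reward)
        r_alpha = self._threshold()
        if reward > r_alpha:
            return 1.0
        if reward < r_alpha:
            return -1.0
        return random.choice([-1.0, 1.0])

# Example: reshape a stream of bin-packing episode rewards.
r2 = RankedReward(buffer_size=100, alpha=0.75)
for episode_reward in [3.0, 5.0, 4.0, 6.0]:
    print(r2.reshape(episode_reward))
```

The reshaped plus-or-minus-one signal plays the role of the win/loss outcome that two-player self-play normally provides, so a tree-search-plus-network training loop in the AlphaZero style can be applied to single-player combinatorial optimisation problems.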



