Finding the Bandit in a Graph: Sequential Search-and-Stop
- Published in 2018
- Added on 2018-09-22
In the collections: Attention-grabbing titles, Protocols and strategies
We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In scheduling theory, this problem is denoted by $1|prec|\sum w_jC_j$. However, in this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. The goal is to maximize the total number of hidden objects found under a time constraint. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are both to search theory and to multi-armed bandits. If the distribution is known, we provide a quasi-optimal greedy strategy with the help of known computationally efficient algorithms for solving $1|prec|\sum w_jC_j$ under some assumption on the DAG. If the distribution is unknown, we show how to sequentially learn it and, at the same time, act near-optimally in order to collect as many hidden objects as possible. We provide an algorithm, prove theoretical guarantees, and empirically show that it outperforms the naïve baseline.
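To make the known-distribution setting concrete, here is a minimal Python sketch. It is not the paper's quasi-optimal algorithm; the toy DAG, the costs and priors, and the probability-to-cost ratio rule are all illustrative assumptions. It orders vertices greedily while respecting the in-neighbor precedence constraint, and computes the expected time to find the object, which takes exactly the weighted-completion-time form $\sum_j w_jC_j$ of the scheduling objective, with the location probabilities playing the role of the job weights $w_j$.

```python
# Illustrative sketch only: a ratio-greedy search order on a toy DAG.
# The DAG, costs, priors, and greedy rule are assumptions for illustration,
# not the paper's algorithm.

from collections import defaultdict

# Toy DAG: edge (u, v) means u must be examined before v.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
COST = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 3.0}   # examination times
PRIOR = {"a": 0.1, "b": 0.3, "c": 0.4, "d": 0.2}  # P(object at vertex)

def greedy_search_order(edges, cost, prior):
    """Order all vertices, respecting precedence, greedily by prob/cost."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    remaining = set(cost)
    examined = []
    while remaining:
        # A vertex is available once all of its in-neighbors are examined.
        available = [v for v in remaining if parents[v] <= set(examined)]
        # Ratio rule: most probability mass uncovered per unit of time.
        best = max(available, key=lambda v: prior[v] / cost[v])
        examined.append(best)
        remaining.remove(best)
    return examined

def expected_time_to_find(order, cost, prior):
    """E[time to find] = sum_v P(object at v) * completion time of v."""
    elapsed, total = 0.0, 0.0
    for v in order:
        elapsed += cost[v]
        total += prior[v] * elapsed
    return total

order = greedy_search_order(EDGES, COST, PRIOR)
print("search order:", order)                                  # a, c, b, d
print("expected time:", expected_time_to_find(order, COST, PRIOR))  # 3.5
```

The stop-and-restart option from the abstract would then correspond to truncating this order once the marginal ratio of the next vertex drops below the long-run rate of finding objects, and the unknown-distribution case would replace PRIOR with sequentially learned estimates.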
Links
- http://arxiv.org/abs/1806.02282v1
- http://arxiv.org/pdf/1806.02282v1
Other information
- key: FindingtheBanditinaGraphSequentialSearchandStop
- type: article
- date_added: 2018-09-22
- date_published: 2018-10-09
BibTeX entry
@article{FindingtheBanditinaGraphSequentialSearchandStop,
    key = {FindingtheBanditinaGraphSequentialSearchandStop},
    type = {article},
    title = {Finding the Bandit in a Graph: Sequential Search-and-Stop},
    author = {Pierre Perrault and Vianney Perchet and Michal Valko},
    abstract = {We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In scheduling theory, this problem is denoted by {\$}1|prec|\sum w{\_}jC{\_}j{\$}. However, in this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. The goal is to maximize the total number of hidden objects found under a time constraint. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are both to search theory and to multi-armed bandits. If the distribution is known, we provide a quasi-optimal greedy strategy with the help of known computationally efficient algorithms for solving {\$}1|prec|\sum w{\_}jC{\_}j{\$} under some assumption on the DAG. If the distribution is unknown, we show how to sequentially learn it and, at the same time, act near-optimally in order to collect as many hidden objects as possible. We provide an algorithm, prove theoretical guarantees, and empirically show that it outperforms the na\"ive baseline.},
    comment = {},
    date_added = {2018-09-22},
    date_published = {2018-10-09},
    urls = {http://arxiv.org/abs/1806.02282v1,http://arxiv.org/pdf/1806.02282v1},
    collections = {Attention-grabbing titles,Protocols and strategies},
    url = {http://arxiv.org/abs/1806.02282v1 http://arxiv.org/pdf/1806.02282v1},
    year = 2018,
    urldate = {2018-09-22},
    archivePrefix = {arXiv},
    eprint = {1806.02282},
    primaryClass = {stat.ML}
}