poker_ai.ai package

Submodules

poker_ai.ai.agent module

class poker_ai.ai.agent.Agent(agent_path: Union[str, pathlib.Path, None] = None, use_manager: bool = True)

Bases: object

Create an agent, optionally initialising it from the agent specified at agent_path.

Variables
  • strategy (Dict[str, Dict[str, int]]) – The preflop strategy for an agent.

  • regret (Dict[str, Dict[str, int]]) – The regret for an agent.
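
A minimal construction sketch, assuming use_manager selects whether the agent's dictionaries are backed by a multiprocessing.Manager so worker processes can share them (the file path is hypothetical):

```
from pathlib import Path

from poker_ai.ai.agent import Agent

# Fresh agent; use_manager=True presumably backs the strategy/regret
# dictionaries with a multiprocessing.Manager for shared access.
agent = Agent(use_manager=True)

# Resume from a previously serialised agent on disk (hypothetical path).
agent = Agent(agent_path=Path("./results/agent.joblib"), use_manager=False)
```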

poker_ai.ai.ai module

poker_ai.ai.ai.calculate_strategy(this_info_sets_regret: Dict[str, float]) → Dict[str, float]

Calculate the strategy based on the current information set’s regret.

Parameters

this_info_sets_regret (Dict[str, float]) – Regret for each action at this info set.

Returns

strategy – Strategy as a probability distribution over actions.

Return type

Dict[str, float]
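
The signature matches standard regret matching; the following is a sketch of that computation, not necessarily the exact implementation:

```
from typing import Dict

def calculate_strategy(this_info_sets_regret: Dict[str, float]) -> Dict[str, float]:
    """Regret matching: play each action in proportion to its positive regret."""
    actions = list(this_info_sets_regret.keys())
    positive_regret_sum = sum(max(r, 0.0) for r in this_info_sets_regret.values())
    if positive_regret_sum > 0:
        return {
            a: max(this_info_sets_regret[a], 0.0) / positive_regret_sum
            for a in actions
        }
    # No action has positive regret: fall back to the uniform strategy.
    return {a: 1.0 / len(actions) for a in actions}
```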

poker_ai.ai.ai.cfr(agent: poker_ai.ai.agent.Agent, state: poker_ai.games.short_deck.state.ShortDeckPokerState, i: int, t: int, locks: Dict[str, multiprocessing.synchronize.Lock] = {}) → float

Regular counterfactual regret minimization (CFR) algorithm.

Parameters
  • agent (Agent) – Agent being trained.

  • state (ShortDeckPokerState) – Current game state.

  • i (int) – The player index.

  • t (int) – The iteration.

  • locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
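
A single-process training-loop sketch. new_game is assumed here as a convenience constructor for a fresh ShortDeckPokerState; substitute however you build states in your own setup:

```
from poker_ai.ai import ai
from poker_ai.ai.agent import Agent
# Assumed helper for starting a fresh hand; adapt to your own state setup.
from poker_ai.games.short_deck.state import new_game

agent = Agent(use_manager=False)
n_players = 3
for t in range(1, 101):
    for i in range(n_players):  # traverse the game tree once per player
        state = new_game(n_players)
        ai.cfr(agent, state, i, t, locks={})
```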

poker_ai.ai.ai.cfrp(agent: poker_ai.ai.agent.Agent, state: poker_ai.games.short_deck.state.ShortDeckPokerState, i: int, t: int, c: int, locks: Dict[str, multiprocessing.synchronize.Lock] = {})

Counterfactual regret minimization with pruning.

Parameters
  • agent (Agent) – Agent being trained.

  • state (ShortDeckPokerState) – Current game state.

  • i (int) – The player index.

  • t (int) – The iteration.

  • c (int) – Regret pruning threshold; see the --c option under CLI Use below.

  • locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
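
A sketch of how cfr and cfrp might be combined, mirroring the prune_threshold behaviour described under CLI Use below; both threshold values are illustrative only:

```
import random

from poker_ai.ai import ai

PRUNE_THRESHOLD = 4000  # illustrative iteration threshold
C = -20000              # illustrative regret pruning threshold

def traverse(agent, state, i, t):
    # After the prune threshold, use the pruned variant 95% of the time.
    if t > PRUNE_THRESHOLD and random.uniform(0, 1) < 0.95:
        ai.cfrp(agent, state, i, t, C, locks={})
    else:
        ai.cfr(agent, state, i, t, locks={})
```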

poker_ai.ai.ai.serialise(agent: poker_ai.ai.agent.Agent, save_path: pathlib.Path, t: int, server_state: Dict[str, Union[str, float, int, None]], locks: Dict[str, multiprocessing.synchronize.Lock] = {})

Write progress of optimising agent (and server state) to file.

Parameters
  • agent (Agent) – Agent being trained.

  • save_path (Path) – Path to write the agent and server state to.

  • t (int) – The iteration.

  • server_state (Dict[str, Union[str, float, int, None]]) – All the variables required to resume training.

  • locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
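
A checkpointing sketch; the dump interval and the contents of server_state are illustrative, since the real set of resume variables is defined by the runner:

```
from pathlib import Path

from poker_ai.ai import ai

DUMP_ITERATION = 10  # illustrative

def checkpoint(agent, t):
    if t % DUMP_ITERATION == 0:
        # Hypothetical snapshot of the variables needed to resume training.
        server_state = {"iteration": t, "nickname": "my_study"}
        ai.serialise(
            agent, save_path=Path("./results"), t=t,
            server_state=server_state, locks={},
        )
```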

poker_ai.ai.ai.update_strategy(agent: poker_ai.ai.agent.Agent, state: poker_ai.games.short_deck.state.ShortDeckPokerState, i: int, t: int, locks: Dict[str, multiprocessing.synchronize.Lock] = {})

Update the preflop strategy using a more theoretically sound approach.

Parameters
  • agent (Agent) – Agent being trained.

  • state (ShortDeckPokerState) – Current game state.

  • i (int) – The player index.

  • t (int) – The iteration.

  • locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
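
This is typically invoked on a schedule rather than on every iteration; a sketch using the strategy_interval idea described under CLI Use below (value illustrative):

```
from poker_ai.ai import ai

STRATEGY_INTERVAL = 20  # illustrative; see --strategy_interval below

def maybe_update_strategy(agent, state, i, t):
    if t % STRATEGY_INTERVAL == 0:
        ai.update_strategy(agent, state, i, t, locks={})
```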

poker_ai.ai.runner module

Script for using multiprocessing to train the agent.

CLI Use

Below you can run python runner.py --help to get the following description of the two commands available in the CLI, resume and start:

```
Usage: poker_ai train start [OPTIONS]

Train agent from scratch.

Options:
--strategy_interval INTEGER

Update the current strategy whenever the iteration % strategy_interval == 0.

--n_iterations INTEGER

The total number of iterations we should train the model for.

--lcfr_threshold INTEGER

A threshold for linear CFR; discounting is not applied before this iteration.

--discount_interval INTEGER

Discount the current regret and strategy whenever iteration % discount_interval == 0.

--prune_threshold INTEGER

When a uniform random number is less than 95%, and the iteration > prune_threshold, use CFR with pruning.

--c INTEGER

Pruning threshold for regret: when we are using CFR with pruning and reach a state with a regret of less than c, we elect not to recursively visit it and its child nodes.

--n_players INTEGER

The number of players in the game.

--dump_iteration INTEGER

When the iteration % dump_iteration == 0, we will compute a new strategy and write it to the accumulated strategy, which gets normalised at a later time.

--update_threshold INTEGER

When the iteration is greater than update_threshold we can start updating the strategy.

--lut_path TEXT

The path to the files for clustering the infosets.

--pickle_dir TEXT

Whether or not the lut files are pickle files. This lookup method is deprecated.

--single_process / --multi_process

Either use or don’t use multiple processes.

--sync_update_strategy / --async_update_strategy

Do or don’t synchronise update_strategy.

--sync_cfr / --async_cfr

Do or don’t synchronise CFR.

--sync_discount / --async_discount

Do or don’t synchronise the discounting.

--sync_serialise / --async_serialise

Do or don’t synchronise the serialisation.

--nickname TEXT

The nickname of the study.

--help

Show this message and exit.

```
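
An example invocation with illustrative option values:

```
poker_ai train start --n_players 3 --n_iterations 1500 --nickname my_study
```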

Safely run the server, and allow the user to control c.

Module contents