poker_ai.ai package

Submodules

poker_ai.ai.agent module

class poker_ai.ai.agent.Agent(agent_path: Union[str, pathlib.Path, None] = None, use_manager: bool = True)

Bases: object

Create an agent, optionally initialising it from the agent specified at agent_path.

Variables
  • strategy (Dict[str, Dict[str, int]]) – The preflop strategy for an agent.

  • regret (Dict[str, Dict[str, int]]) – The regret for an agent.
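
A minimal construction sketch, assuming use_manager selects whether the agent's dictionaries are backed by a multiprocessing.Manager so worker processes can share them (the file path is hypothetical):

```
from pathlib import Path

from poker_ai.ai.agent import Agent

# Fresh agent; use_manager=True presumably backs the strategy/regret
# dictionaries with a multiprocessing.Manager for shared access.
agent = Agent(use_manager=True)

# Resume from a previously serialised agent on disk (hypothetical path).
agent = Agent(agent_path=Path("./results/agent.joblib"), use_manager=False)
```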

poker_ai.ai.ai module

poker_ai.ai.ai.calculate_strategy(this_info_sets_regret: Dict[str, float]) → Dict[str, float]

Calculate the strategy based on the current information set’s regret.

Parameters

this_info_sets_regret (Dict[str, float]) – Regret for each action at this info set.

Returns

strategy – Strategy as a probability distribution over actions.

Return type

Dict[str, float]
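
The signature matches standard regret matching; the following is a sketch of that computation, not necessarily the exact implementation:

```
from typing import Dict

def calculate_strategy(this_info_sets_regret: Dict[str, float]) -> Dict[str, float]:
    """Regret matching: play each action in proportion to its positive regret."""
    actions = list(this_info_sets_regret.keys())
    positive_regret_sum = sum(max(r, 0.0) for r in this_info_sets_regret.values())
    if positive_regret_sum > 0:
        return {
            a: max(this_info_sets_regret[a], 0.0) / positive_regret_sum
            for a in actions
        }
    # No action has positive regret: fall back to the uniform strategy.
    return {a: 1.0 / len(actions) for a in actions}
```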

poker_ai.ai.ai.cfr(agent: poker_ai.ai.agent.Agent, state: poker_ai.games.short_deck.state.ShortDeckPokerState, i: int, t: int, locks: Dict[str, multiprocessing.synchronize.Lock] = {}) → float

Regular counterfactual regret minimization (CFR) algorithm.

Parameters
  • agent (Agent) – Agent being trained.

  • state (ShortDeckPokerState) – Current game state.

  • i (int) – The player index.

  • t (int) – The iteration.

  • locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
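
A single-process training-loop sketch. new_game is assumed here as a convenience constructor for a fresh ShortDeckPokerState; substitute however you build states in your own setup:

```
from poker_ai.ai import ai
from poker_ai.ai.agent import Agent
# Assumed helper for starting a fresh hand; adapt to your own state setup.
from poker_ai.games.short_deck.state import new_game

agent = Agent(use_manager=False)
n_players = 3
for t in range(1, 101):
    for i in range(n_players):  # traverse the game tree once per player
        state = new_game(n_players)
        ai.cfr(agent, state, i, t, locks={})
```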

poker_ai.ai.ai.cfrp(agent: poker_ai.ai.agent.Agent, state: poker_ai.games.short_deck.state.ShortDeckPokerState, i: int, t: int, c: int, locks: Dict[str, multiprocessing.synchronize.Lock] = {})

Counterfactual regret minimization with pruning.

Parameters
  • agent (Agent) – Agent being trained.

  • state (ShortDeckPokerState) – Current game state.

  • i (int) – The player index.

  • t (int) – The iteration.

  • c (int) – Regret pruning threshold; see the --c option under CLI Use below.

  • locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
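
A sketch of how cfr and cfrp might be combined, mirroring the prune_threshold behaviour described under CLI Use below; both threshold values are illustrative only:

```
import random

from poker_ai.ai import ai

PRUNE_THRESHOLD = 4000  # illustrative iteration threshold
C = -20000              # illustrative regret pruning threshold

def traverse(agent, state, i, t):
    # After the prune threshold, use the pruned variant 95% of the time.
    if t > PRUNE_THRESHOLD and random.uniform(0, 1) < 0.95:
        ai.cfrp(agent, state, i, t, C, locks={})
    else:
        ai.cfr(agent, state, i, t, locks={})
```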

poker_ai.ai.ai.serialise(agent: poker_ai.ai.agent.Agent, save_path: pathlib.Path, t: int, server_state: Dict[str, Union[str, float, int, None]], locks: Dict[str, multiprocessing.synchronize.Lock] = {})

Write progress of optimising agent (and server state) to file.

Parameters
  • agent (Agent) – Agent being trained.

  • save_path (Path) – Path to write the agent and server state to.

  • t (int) – The iteration.

  • server_state (Dict[str, Union[str, float, int, None]]) – All the variables required to resume training.

  • locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
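
A checkpointing sketch; the dump interval and the contents of server_state are illustrative, since the real set of resume variables is defined by the runner:

```
from pathlib import Path

from poker_ai.ai import ai

DUMP_ITERATION = 10  # illustrative

def checkpoint(agent, t):
    if t % DUMP_ITERATION == 0:
        # Hypothetical snapshot of the variables needed to resume training.
        server_state = {"iteration": t, "nickname": "my_study"}
        ai.serialise(
            agent, save_path=Path("./results"), t=t,
            server_state=server_state, locks={},
        )
```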

poker_ai.ai.ai.update_strategy(agent: poker_ai.ai.agent.Agent, state: poker_ai.games.short_deck.state.ShortDeckPokerState, i: int, t: int, locks: Dict[str, multiprocessing.synchronize.Lock] = {})

Update the preflop strategy using a more theoretically sound approach.

Parameters
  • agent (Agent) – Agent being trained.

  • state (ShortDeckPokerState) – Current game state.

  • i (int) – The player index.

  • t (int) – The iteration.

  • locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
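
This is typically invoked on a schedule rather than on every iteration; a sketch using the strategy_interval idea described under CLI Use below (value illustrative):

```
from poker_ai.ai import ai

STRATEGY_INTERVAL = 20  # illustrative; see --strategy_interval below

def maybe_update_strategy(agent, state, i, t):
    if t % STRATEGY_INTERVAL == 0:
        ai.update_strategy(agent, state, i, t, locks={})
```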

poker_ai.ai.runner module

Script for using multiprocessing to train the agent.

CLI Use

Below you can run python runner.py --help to get the following description of the two commands available in the CLI, resume and start:

```
Usage: poker_ai train start [OPTIONS]

Train agent from scratch.

Options:
--strategy_interval INTEGER

Update the current strategy whenever the iteration % strategy_interval == 0.

--n_iterations INTEGER

The total number of iterations we should train the model for.

--lcfr_threshold INTEGER

A threshold for linear CFR; discounting is not applied before this iteration.

--discount_interval INTEGER

Discount the current regret and strategy whenever iteration % discount_interval == 0.

--prune_threshold INTEGER

When a uniform random number is less than 95%, and the iteration > prune_threshold, use CFR with pruning.

--c INTEGER

Pruning threshold for regret: when we are using CFR with pruning and reach a state with a regret of less than c, we elect not to recursively visit it and its child nodes.

--n_players INTEGER

The number of players in the game.

--dump_iteration INTEGER

When the iteration % dump_iteration == 0, we will compute a new strategy and write it to the accumulated strategy, which gets normalised at a later time.

--update_threshold INTEGER

When the iteration is greater than update_threshold we can start updating the strategy.

--lut_path TEXT

The path to the files for clustering the infosets.

--pickle_dir TEXT

Whether or not the lut files are pickle files. This lookup method is deprecated.

--single_process / --multi_process

Either use or don’t use multiple processes.

--sync_update_strategy / --async_update_strategy

Do or don’t synchronise update_strategy.

--sync_cfr / --async_cfr

Do or don’t synchronise CFR.

--sync_discount / --async_discount

Do or don’t synchronise the discounting.

--sync_serialise / --async_serialise

Do or don’t synchronise the serialisation.

--nickname TEXT

The nickname of the study.

--help

Show this message and exit.

```
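
An example invocation with illustrative option values:

```
poker_ai train start --n_players 3 --n_iterations 1500 --nickname my_study
```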

Safely run the server, and allow the user to control c.

Module contents