poker_ai.ai package¶
Subpackages¶
Submodules¶
poker_ai.ai.agent module¶
-
class
poker_ai.ai.agent.
Agent
(agent_path: Union[str, pathlib.Path, None] = None, use_manager: bool = True)¶ Bases:
object
Create an agent, optionally initialised from the agent saved at agent_path.
…
- Variables
strategy (Dict[str, Dict[str, int]]) – The preflop strategy for an agent.
regret (Dict[str, Dict[str, float]]) – The regret for an agent.
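For orientation, the two nested mappings described above are keyed by info set and then by action. A hypothetical sketch of the shapes (the keys here are illustrative; real info-set strings come from the lookup tables):

```python
# Hypothetical contents of an Agent's two tables.
# Outer key: an info-set identifier; inner key: an action name.
strategy = {
    "infoset_0": {"fold": 0, "call": 12, "raise": 3},
}
regret = {
    "infoset_0": {"fold": -1.5, "call": 4.0, "raise": 0.5},
}

# Example query: total accumulated weight on "call" across all info sets.
calls = sum(actions.get("call", 0) for actions in strategy.values())
```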
poker_ai.ai.ai module¶
-
poker_ai.ai.ai.
calculate_strategy
(this_info_sets_regret: Dict[str, float]) → Dict[str, float]¶ Calculate the strategy based on the current information set's regret.
…
- Parameters
this_info_sets_regret (Dict[str, float]) – Regret for each action at this info set.
- Returns
strategy – Strategy as a probability over actions.
- Return type
Dict[str, float]
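The regret-matching rule behind calculate_strategy can be sketched as follows. This is a self-contained illustration of the standard technique, not the library's implementation:

```python
from typing import Dict

def regret_matching(info_set_regret: Dict[str, float]) -> Dict[str, float]:
    """Turn per-action regrets into a probability distribution.

    Actions with positive regret get probability proportional to that
    regret; if no action has positive regret, fall back to uniform.
    """
    positive = {a: max(r, 0.0) for a, r in info_set_regret.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: r / total for a, r in positive.items()}
    # No positive regret anywhere: play uniformly at random.
    n = len(info_set_regret)
    return {a: 1.0 / n for a in info_set_regret}

strategy = regret_matching({"fold": -1.0, "call": 3.0, "raise": 1.0})
# "call" receives 3/4 of the probability mass, "raise" 1/4, "fold" none.
```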
-
poker_ai.ai.ai.
cfr
(agent: poker_ai.ai.agent.Agent, state: poker_ai.games.short_deck.state.ShortDeckPokerState, i: int, t: int, locks: Dict[str, multiprocessing.synchronize.Lock] = {}) → float¶ Vanilla counterfactual regret minimization (CFR) algorithm.
…
- Parameters
agent (Agent) – Agent being trained.
state (ShortDeckPokerState) – Current game state.
i (int) – The player.
t (int) – The iteration.
locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
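The regret-update loop at the heart of CFR can be shown on a one-shot game. The toy example below (a rock-paper-scissors best response against a fixed, biased opponent; entirely hypothetical, not the library's code) shows how accumulated regret steers the average strategy toward the highest-value action:

```python
from typing import Dict

ACTIONS = ["rock", "paper", "scissors"]
# Hypothetical fixed opponent who over-plays rock.
OPPONENT = {"rock": 0.5, "paper": 0.25, "scissors": 0.25}

def utility(mine: str, theirs: str) -> float:
    """Payoff for the row player: +1 win, -1 loss, 0 tie."""
    beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
    if mine == theirs:
        return 0.0
    return 1.0 if beats[mine] == theirs else -1.0

def regret_matching(regret: Dict[str, float]) -> Dict[str, float]:
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: r / total for a, r in positive.items()}
    return {a: 1.0 / len(regret) for a in regret}

def train(iterations: int) -> Dict[str, float]:
    regret = {a: 0.0 for a in ACTIONS}
    strategy_sum = {a: 0.0 for a in ACTIONS}
    for _ in range(iterations):
        strategy = regret_matching(regret)
        # Expected value of each pure action against the opponent's mix.
        action_value = {
            a: sum(OPPONENT[b] * utility(a, b) for b in ACTIONS)
            for a in ACTIONS
        }
        node_value = sum(strategy[a] * action_value[a] for a in ACTIONS)
        # Accumulate regret for not having played each action,
        # and accumulate the strategy for later averaging.
        for a in ACTIONS:
            regret[a] += action_value[a] - node_value
            strategy_sum[a] += strategy[a]
    total = sum(strategy_sum.values())
    return {a: s / total for a, s in strategy_sum.items()}

average = train(100)
# Paper beats the rock-heavy opponent, so the average strategy
# concentrates on "paper".
```

In full CFR the same update runs at every information set of the game tree, with counterfactual weighting; the loop above isolates just the regret bookkeeping.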
-
poker_ai.ai.ai.
cfrp
(agent: poker_ai.ai.agent.Agent, state: poker_ai.games.short_deck.state.ShortDeckPokerState, i: int, t: int, c: int, locks: Dict[str, multiprocessing.synchronize.Lock] = {})¶ Counterfactual regret minimization with pruning.
…
- Parameters
agent (Agent) – Agent being trained.
state (ShortDeckPokerState) – Current game state.
i (int) – The player.
t (int) – The iteration.
c (int) – The regret threshold below which subtrees are pruned.
locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
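The pruning rule can be sketched as: skip recursing into any action whose accumulated regret has fallen below the threshold c. A hypothetical illustration (threshold and regrets chosen for the example, not the library's code):

```python
from typing import Dict, List

C = -20_000.0  # hypothetical pruning threshold, passed as `c` in practice

def actions_to_explore(info_set_regret: Dict[str, float], c: float) -> List[str]:
    """Keep only actions whose regret is at or above the pruning threshold."""
    return [a for a, r in info_set_regret.items() if r >= c]

explored = actions_to_explore(
    {"fold": -50_000.0, "call": 10.0, "raise": -5.0}, C
)
# "fold" is pruned: its subtree is not visited this iteration,
# which is what makes pruned CFR iterations cheaper.
```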
-
poker_ai.ai.ai.
serialise
(agent: poker_ai.ai.agent.Agent, save_path: pathlib.Path, t: int, server_state: Dict[str, Union[str, float, int, None]], locks: Dict[str, multiprocessing.synchronize.Lock] = {})¶ Write progress of optimising agent (and server state) to file.
…
- Parameters
agent (Agent) – Agent being trained.
save_path (Path) – Path to write the agent and server state to.
t (int) – The iteration.
server_state (Dict[str, Union[str, float, int, None]]) – All the variables required to resume training.
locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
-
poker_ai.ai.ai.
update_strategy
(agent: poker_ai.ai.agent.Agent, state: poker_ai.games.short_deck.state.ShortDeckPokerState, i: int, t: int, locks: Dict[str, multiprocessing.synchronize.Lock] = {})¶ Update the pre-flop strategy using a more theoretically sound approach.
…
- Parameters
agent (Agent) – Agent being trained.
state (ShortDeckPokerState) – Current game state.
i (int) – The player.
t (int) – The iteration.
locks (Dict[str, mp.synchronize.Lock]) – The locks for multiprocessing.
poker_ai.ai.runner module¶
Script for using multiprocessing to train the agent.
CLI Use¶
Run poker_ai train start --help to get the following description of the training CLI (a resume command is also available):
Usage: poker_ai train start [OPTIONS]
Train agent from scratch.
- Options:
- --strategy_interval INTEGER
Update the current strategy whenever the iteration % strategy_interval == 0.
- --n_iterations INTEGER
The total number of iterations we should train the model for.
- --lcfr_threshold INTEGER
A threshold for linear CFR which means don’t apply discounting before this iteration.
- --discount_interval INTEGER
Discount the current regret and strategy whenever iteration % discount_interval == 0.
- --prune_threshold INTEGER
When the iteration is greater than prune_threshold and a uniform random number is less than 95%, use CFR with pruning.
- --c INTEGER
Pruning threshold for regret: when using CFR with pruning and a state has a regret of less than c, we elect not to recursively visit it or its child nodes.
- --n_players INTEGER
The number of players in the game.
- --dump_iteration INTEGER
When the iteration % dump_iteration == 0, we compute a new strategy and write it to the accumulated strategy, which gets normalised at a later time.
- --update_threshold INTEGER
When the iteration is greater than update_threshold we can start updating the strategy.
- --lut_path TEXT
The path to the files for clustering the infosets.
- --pickle_dir TEXT
Whether or not the lut files are pickle files. This lookup method is deprecated.
- --single_process / --multi_process
Either use or don’t use multiple processes.
- --sync_update_strategy / --async_update_strategy
Do or don’t synchronise update_strategy.
- --sync_cfr / --async_cfr
Do or don’t synchronise CFR.
- --sync_discount / --async_discount
Do or don’t synchronise the discounting.
- --sync_serialise / --async_serialise
Do or don’t synchronise the serialisation.
- --nickname TEXT
The nickname of the study.
- --help
Show this message and exit.
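A hypothetical invocation combining several of the options above (all flag values chosen purely for illustration; defaults may differ):

```shell
poker_ai train start \
    --n_players 3 \
    --n_iterations 1500 \
    --strategy_interval 20 \
    --dump_iteration 20 \
    --multi_process \
    --nickname my_first_run
```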
-
poker_ai.ai.runner.
_safe_search
(server: poker_ai.ai.multiprocess.server.Server)¶ Safely run the server, and allow the user to stop it with control-C.