August 17, 2025
Haris Sahovic
Algorithmic Pokémon Teambuilding

Introduction: The Pokémon Teambuilding Problem
Competitive Pokémon battles are organized by formats, which define a unique set of rules and restrictions. Within a given format, success is determined by two key factors: building a high-performing team and executing an effective battle policy. This article focuses on the first of these challenges: teambuilding.
A team is a selection of Pokémon, their assigned items, abilities, movesets, and carefully tuned statistical values (EVs and IVs), all of which must adhere to the format's legality rules. In most formats, teams are composed of six Pokémon.
The core problem is one of optimization: how do we find the best team? In this post, we will explore this question by:
- Formalizing the teambuilding problem
- Introducing a simple no-interaction model as a baseline
- Proposing a more complex model that captures team synergy and interactions
Pokémon battle mechanics primer
- Format: 6v6, turn-based singles; players may switch between turns. A battle ends when one side has no usable Pokémon left.
- Types and matchups: Moves have types; damage is scaled by effectiveness (super-effective, resisted, immune). Same-type attack bonus (STAB) boosts damage. Example: Electric is strong vs Water/Flying; Ground is immune to Electric.
- Status effects: Sleep, paralysis, freeze, burn, and poison alter speed, damage, or move availability, creating tempo advantages and shaping team roles.
- Roles and synergy: Teams mix offensive threats, defensive pivots, and support. Synergy means teammates cover each other's weaknesses or enable strategies (e.g., spreading status to open sweeps). Each team must contain six unique Pokémon.
- Speed and move order: The faster Pokémon acts first (ties break randomly). Priority is rare in Gen 1, so speed control strongly affects tempo.
- Gen 1 specifics: No abilities or held items. Critical hits and accuracy/PP matter. In our experiments, we use standardized movesets and parameters to isolate model behavior.
1. Problem formalization
Let \(\mathcal{T}\) be the set of all legal teams in a given format. Our goal is to select a team \(t \in \mathcal{T}\) that maximizes its performance.
The fundamental building block for measuring performance is the win probability \(p(t \text{ beats } s)\) of team \(t\) against an opponent's team \(s\). This probability is always conditioned on a "battle policy" - a fixed strategy that determines the in-game decisions for both teams.
For now, we will assume that we have access to a function measuring this win probability; Sections 2 and 3 introduce two models that can be fitted to estimate it.
1.1. Countering a Specific Team
The simplest scenario is preparing for a known opponent. Given a single opponent team \(s\), our goal is to find a team \(t\) that maximizes the win probability against it:
\[ \operatorname*{arg\,max}_{t \in \mathcal{T}} \; p(t \text{ beats } s) \]
This formulation is useful when preparing for a specific team, but it is too narrow for general competitive play.
1.2. Optimizing for a Metagame
In most competitive settings, such as online ladder play, we face a diverse range of opponents. This environment is often referred to as a metagame, which we can formalize as a probability distribution \(m\) over the set of all teams \(\mathcal{T}\).
The probability \(p_m(s)\) represents the likelihood of encountering team \(s\) in any given match in the metagame.
We use lowercase \(t, s\) for specific teams and uppercase \(T, S\) for random teams drawn from distributions.
Our objective now shifts to finding the team that performs best on average against this metagame:
\[ \operatorname*{arg\,max}_{t \in \mathcal{T}} \; \mathbb{E}_{S \sim m} \left[ p(t \text{ beats } S) \right] = \operatorname*{arg\,max}_{t \in \mathcal{T}} \; \sum_{s \in \mathcal{T}} p(t \text{ beats } s) \cdot p_m(s) \]
This is a more robust objective, as it encourages building teams that are resilient against a variety of popular strategies.
1.3. Finding a Nash Equilibrium
The metagame is not static. Over time, counter-play dynamics emerge and the dominant strategies can change. Building a team that counters the current metagame leaves us vulnerable to future metagame shifts. This dynamic leads to a game-theoretic view of teambuilding.
By a game-theoretic view, we mean modeling teambuilding as a two-player, zero-sum, normal-form game. Each pure strategy is a legal team in \(\mathcal{T}\). When our team \(t\) faces the opponent's team \(s\), the payoff is a function of the win probability; a convenient choice is the margin \(u(t,s) = 2\,p(t \text{ beats } s) - 1\). The game is symmetric: both players choose from the same action set and payoffs satisfy \(u(t,s) = -u(s,t)\).
Instead of choosing a single best team, we can seek a mixed strategy — a probability distribution over teams that is robust to counterplay.
This leads to the concept of a Nash equilibrium. In this context, an equilibrium is a pair of distributions, \(q\) (for us) and \(m\) (for the opponent), where neither player can improve their expected win rate by unilaterally changing their strategy. Formally, for any alternative strategies \(q'\) and \(m'\):
\[ \mathbb{E}_{T \sim q, \; S \sim m} \left[ p(T \text{ beats } S) \right] \geq \mathbb{E}_{T \sim q', \; S \sim m} \left[ p(T \text{ beats } S) \right] \quad \forall q' \]
and
\[ \mathbb{E}_{T \sim q, \; S \sim m} \left[ p(T \text{ beats } S) \right] \geq \mathbb{E}_{T \sim q, \; S \sim m'} \left[ p(T \text{ beats } S) \right] \quad \forall m' \]
At equilibrium, our optimal strategy is to sample teams from our distribution \(q\), creating a balanced and unpredictable portfolio.
Why equilibrium? In zero-sum games, any Nash equilibrium guarantees a value \(v\): by playing its equilibrium mixture, a player secures at least \(v\) against any opponent strategy (the minimax property). This makes equilibrium mixtures inherently robust to counterplay and metagame shifts: no unilateral deviation can improve expected performance.
About symmetry. Because this teambuilding game is symmetric and zero-sum, a symmetric equilibrium exists; we can take \(q^* = m^*\) (both players use the same distribution over teams). We keep separate symbols \(q\) and \(m\) to allow for asymmetric variants (e.g., role asymmetries, format constraints, or different fixed battle policies), in which case equilibrium mixtures need not coincide.
2. A Simple Baseline: The No-Interaction Logistic Model
2.1. The No-Interaction Model
To make these objectives concrete, we need a way to estimate the win probability \(p(t \text{ beats } s)\). Let's start with a simple baseline model that ignores all interactions between Pokémon.
We assume each Pokémon has an intrinsic strength parameter, \(\theta_i\), and that the total strength of a team is simply the sum of its members' strengths. The probability of winning is then modeled using the logistic function, \(\sigma(x) = \frac{1}{1 + e^{-x}}\), based on the difference in total team strength:
\[ p(t \text{ beats } s) = \sigma\left(\sum_{i \in t} \theta_i - \sum_{j \in s} \theta_j\right) \]
The logistic function maps the score difference to \((0,1)\), obeys \(\sigma(x)+\sigma(-x)=1\), and yields linear log-odds (as in Bradley–Terry/Elo), which makes it a convenient choice for our purposes.
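To make this concrete, here is a minimal sketch of the model's scoring rule in Python. The strength values below are made up for illustration; they are not the fitted coefficients from Section 2.2.

```python
import math

def p_beats_no_interaction(team_t, team_s, theta):
    """Win probability of team_t over team_s under the no-interaction model.

    theta maps each Pokémon name to its learned intrinsic strength.
    """
    score = sum(theta[i] for i in team_t) - sum(theta[j] for j in team_s)
    return 1.0 / (1.0 + math.exp(-score))  # logistic sigma

# Illustrative (made-up) strengths, not fitted values:
theta = {"Tauros": 0.9, "Snorlax": 0.8, "Chansey": 0.7, "Starmie": 0.6,
         "Alakazam": 0.55, "Exeggutor": 0.5, "Zapdos": 0.45, "Jynx": 0.4,
         "Lapras": 0.35, "Gengar": 0.3, "Rhydon": 0.3, "Slowbro": 0.2}
t = ["Tauros", "Snorlax", "Chansey", "Starmie", "Alakazam", "Exeggutor"]
s = ["Zapdos", "Jynx", "Lapras", "Gengar", "Rhydon", "Slowbro"]
print(p_beats_no_interaction(t, s, theta))  # > 0.5: t's summed strength is higher
```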
This no-interaction model is a significant oversimplification, as it neglects synergy, type matchups, and all other strategic interactions. However, it serves as a useful analytical baseline.
An interesting consequence of this model is that it collapses all three of our proposed objectives into one. Whether countering a single team, optimizing for a metagame, or seeking a Nash equilibrium, the optimal strategy is always the same: construct the team with the highest possible sum of Pokémon strengths.
Because all interactions are ignored, there is no need for mixed strategies or adaptation - a single "best" team dominates in all contexts.
2.2. No-interaction model: experimental results
2.2.1. Methodology
To estimate the intrinsic strength (\(\theta_i\)) of each Pokémon, we use simulated battles under a simplified, controlled setup, then fit a model to those outcomes. We first state the setup assumptions that make the experiment tractable, and then outline the procedure.
Setup assumptions:
- Simplified movesets: We deliberately use heavily simplified movesets and parameters relative to the real Gen 1 OU metagame. Each Pokémon has one standardized set (four most-used moves, max EVs, neutral nature, no items). This reduces variance and isolates model behavior, but it does not capture the full diversity and depth of actual play.
- Fixed heuristic agent: All battles are played by a fixed heuristic agent (SimpleHeuristicsPlayer from poke-env). It's a simple rules-based policy that prioritizes highest estimated damage, uses strong status moves when advantageous, and switches into favorable type matchups; it performs no lookahead or opponent modeling. This policy is far from optimal; results reflect the model under this specific, limited battle policy, not human or optimal play.
Given these assumptions, our procedure is:
- Data Sourcing and Pokémon Selection: We used Smogon usage statistics for the Generation 1 OverUsed (OU) format - the community-standard singles ruleset for Generation 1, with standard clauses such as Sleep and Species, and certain legendaries banned - from July 2025. To focus on the most relevant Pokémon, we filtered for those with a usage rate of at least 0.1%, which resulted in 47 Pokémon. For each of these Pokémon, we created a single, standardized moveset using their four most-used moves. To simplify the problem, EVs were maxed out for all stats, the nature was set to a neutral one ("Serious"), and no items were assigned.
- Battle Simulation: We generated a dataset of 50,000 simulated battles. In each battle, two teams of six Pokémon were created by randomly sampling from the pool of standardized sets defined in the previous step, so that any popular Pokémon is equally likely to be encountered; we call this the uniform metagame. Battles were conducted on the Pokémon Showdown simulator, via poke-env.
- Battle Policy: Both teams in every simulated battle were controlled by the same fixed AI, the SimpleHeuristicsPlayer from the poke-env library. Using a single agent ensures consistent, repeatable win rates under the stated assumptions.
- Model Fitting: We fitted a logistic regression model to the battle outcomes. The model was designed to learn the contribution of each individual Pokémon to the probability of winning. For each battle, we created a feature vector where each Pokémon was represented by a value: +1 if it was on the first team, -1 if it was on the second, and 0 otherwise. The model then learned a single coefficient for each Pokémon, which serves as our estimate for its intrinsic strength, \(\theta_i\). To enforce team-swap symmetry, every training example \((x, y)\) was augmented with \((-x, 1 - y)\). Logistic regression hyperparameters were left at scikit-learn's defaults (LogisticRegression).
Feature space - Baseline
Let \(P\) be the number of eligible Pokémon.
Each training example is \((x, y)\) with \(x \in \{-1,0,1\}^P\) and \(y \in \{0,1\}\), where \(x_i = 1\) if Pokémon \(i\) is on team A, \(x_i = -1\) if on team B, and \(x_i = 0\) otherwise; \(y=1\) if team A wins and \(y=0\) otherwise.
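As a sketch of this encoding and fitting pipeline under the stated assumptions: `battles` is a hypothetical list of `(team_a, team_b, a_won)` records produced by the simulator, and `pokemon_index` maps the 47 eligible Pokémon to column indices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def baseline_features(battles, pokemon_index):
    """Encode each battle as x in {-1, 0, 1}^P: +1 for team A members, -1 for team B."""
    P = len(pokemon_index)
    X = np.zeros((len(battles), P))
    y = np.zeros(len(battles))
    for n, (team_a, team_b, a_won) in enumerate(battles):
        for name in team_a:
            X[n, pokemon_index[name]] = 1.0
        for name in team_b:
            X[n, pokemon_index[name]] = -1.0
        y[n] = 1.0 if a_won else 0.0
    return X, y

def augment(X, y):
    """Team-swap symmetry: every (x, y) also appears as (-x, 1 - y)."""
    return np.vstack([X, -X]), np.concatenate([y, 1.0 - y])

# X, y = baseline_features(battles, pokemon_index)
# X_aug, y_aug = augment(X, y)
# model = LogisticRegression().fit(X_aug, y_aug)  # scikit-learn defaults, as in the post
# theta = model.coef_.ravel()  # one intrinsic-strength estimate per Pokémon
```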
2.2.2. Results by Pokémon in Gen 1 OU
The following table lists the selected Pokémon in our restricted Gen 1 OU metagame with their estimated strength parameters \(\theta_i\), in descending order.
All Pokémon rankings
2.2.3. Comparison to uniform metagame
According to our no-interaction model, the optimal team is the one composed of the six Pokémon with the highest intrinsic strength coefficients: .
Note: Team members are displayed alphabetically; order is not meaningful in this article, as leads are chosen randomly.
To validate this prediction, we evaluated this team against the uniform metagame. We ran a simulation of 500 battles, pitting this team against randomly generated teams from the same pool of Pokémon used to train the model. The team achieved a win rate of .
This high win rate demonstrates that even in this simplified framework, the model is effective at identifying a dominant strategy. Even when synergy, matchups and lower level details are disregarded, a team of individually powerful Pokémon overwhelmingly succeeds against the average team in the uniform metagame.
Key takeaways
- Best team (baseline): - see 2.2.3.
- Uniform metagame: win rate over 500 games.
- Limitation: Ignoring interactions collapses objectives; one team dominates.
3. A More Complex Model: The Interaction Model
3.1. The Interaction Model
The no-interaction model provides a simple baseline, but its core assumption - that a team's strength is merely the sum of its parts - is fundamentally at odds with the nature of Pokémon. The game is defined by interactions: synergy between teammates and strategic matchups against opponents. A Pokémon that is powerful in isolation may be ineffective without the right partners, while a seemingly weaker Pokémon might excel as a specific counter to a popular threat.
To capture these dynamics, we introduce an interaction model. This model extends our logistic regression framework by adding two new types of parameters:
- Synergy (\(\alpha_{i,j}\)): A coefficient that represents the change in win probability when Pokémon \(i\) and Pokémon \(j\) are on the same team. A positive value indicates they work well together, while a negative value suggests they are redundant or have conflicting roles.
- Matchup (\(\beta_{i,j}\)): A coefficient that represents the change in win probability when Pokémon \(i\) is on our team \(t\) and Pokémon \(j\) is on the opposing team \(s\). This term directly models how well one Pokémon counters another.
The win probability is now calculated based on three components: the intrinsic strength of each Pokémon, the synergy between all pairs of Pokémon on each team, and the matchup effects between all pairs of opposing Pokémon.
We anchor parameters by zero-centering \(\theta\) (and similarly regularizing \(\alpha\), \(\beta\)) and we use an anti-symmetric matchup convention (\(\beta_{i,j} = -\beta_{j,i}\)). The full model is expressed as:
\[ p(t \text{ beats } s) = \sigma \left( \left( \sum_{i \in t} \theta_i - \sum_{j \in s} \theta_j \right) + \left( \sum_{\{i,k\} \subset t} \alpha_{i,k} - \sum_{\{j,l\} \subset s} \alpha_{j,l} \right) + \sum_{i \in t, j \in s} \beta_{i,j} \right) \]
where \(\sigma\) is the logistic function.
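A direct translation of this formula into code might look as follows. Here `theta`, `alpha`, and `beta` are assumed to be dictionaries of fitted coefficients, with `alpha` keyed by name pairs in sorted (unordered) form and `beta` stored anti-symmetrically, per the convention above.

```python
import math
from itertools import combinations, product

def p_beats_interaction(t, s, theta, alpha, beta):
    """Win probability under the interaction model.

    theta[i]: intrinsic strength; alpha[(i, k)]: synergy of an unordered pair
    (keyed with sorted names); beta[(i, j)]: matchup of i (our side) vs j
    (opponent), with beta[(j, i)] == -beta[(i, j)] by anti-symmetry.
    """
    score = sum(theta[i] for i in t) - sum(theta[j] for j in s)
    score += sum(alpha[tuple(sorted(pair))] for pair in combinations(t, 2))
    score -= sum(alpha[tuple(sorted(pair))] for pair in combinations(s, 2))
    score += sum(beta[(i, j)] for i, j in product(t, s))
    return 1.0 / (1.0 + math.exp(-score))  # logistic sigma
```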
This richer model no longer collapses the three teambuilding objectives. The presence of interaction terms means that the best team now depends on the context of the metagame. A team that excels against one set of opponents may be weak against another, making the problems of metagame optimization and finding a Nash equilibrium distinct and far more interesting.
3.2. Interaction model: experimental results
3.2.1. Methodology
The interaction model was trained on the same dataset of 50,000 simulated battles described in Section 2.2.1. The core difference lies in the model's complexity and the feature engineering process. As in Section 2.2.1, we augment each training example \((x, y)\) with \((-x, 1-y)\) and ensure paired examples reside in the same split to preserve team-swap symmetry.
While the no-interaction model only learned coefficients for individual Pokémon, the interaction model was designed to capture pairwise relationships. To achieve this, we expanded the feature vector for each battle to include three types of terms:
- Individual Strength: A term for each Pokémon, indicating its presence on team A (+1) or team B (-1), as in the baseline model.
- Synergy: A term for every possible pair of Pokémon. This feature was set to +1 if the pair appeared on team A, -1 if on team B, and 0 otherwise. This allows the model to learn the value of specific Pokémon pairings on the same team.
- Matchup: A term for every possible pair of opposing Pokémon. For a pair of Pokémon (i, j) where i is on team A and j is on team B, this feature was set to +1. This allows the model to learn the advantage of Pokémon i against Pokémon j.
Feature space - Interaction
Let \(P\) be the number of eligible Pokémon.
We construct \(x = [x^{(\theta)}, x^{(\alpha)}, x^{(\beta)}]\) where:
- \(x^{(\theta)} \in \{-1,0,1\}^P\) with \(x^{(\theta)}_i = 1\) if \(i\) is on team A, \(-1\) if on team B, else 0
- \(x^{(\alpha)} \in \{-1,0,1\}^{\binom{P}{2}}\) indexed by unordered pairs \(\{i,k\}\) with entry +1 if \(\{i,k\} \subset\) team A, -1 if \(\{i,k\} \subset\) team B, else 0
- \(x^{(\beta)} \in \{0,1\}^{P\times P}\) indexed by ordered pairs \((i,j)\) with entry 1 if \(i \in\) team A and \(j \in\) team B, else 0
With anti-symmetric parameters \(\beta\) and the augmentation \((x,y) \mapsto (-x, 1-y)\), this encoding captures both matchup directions. Labels are \(y \in \{0,1\}\) with \(y=1\) if team A wins.
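A sketch of this encoding under the same conventions; `pokemon_index` maps names to indices \(0,\dots,P-1\), and pairs are indexed in a fixed canonical order.

```python
import numpy as np
from itertools import combinations

def interaction_features(team_a, team_b, pokemon_index):
    """Build x = [x_theta, x_alpha, x_beta] for one battle."""
    P = len(pokemon_index)
    pair_index = {pair: n for n, pair in enumerate(combinations(range(P), 2))}
    x_theta = np.zeros(P)
    x_alpha = np.zeros(len(pair_index))
    x_beta = np.zeros(P * P)  # ordered pairs (i, j), row-major

    a = sorted(pokemon_index[name] for name in team_a)
    b = sorted(pokemon_index[name] for name in team_b)
    for i in a:
        x_theta[i] = 1.0
    for j in b:
        x_theta[j] = -1.0
    for pair in combinations(a, 2):   # unordered pairs within team A
        x_alpha[pair_index[pair]] = 1.0
    for pair in combinations(b, 2):   # unordered pairs within team B
        x_alpha[pair_index[pair]] = -1.0
    for i in a:                       # ordered cross-team pairs (i in A, j in B)
        for j in b:
            x_beta[i * P + j] = 1.0
    return np.concatenate([x_theta, x_alpha, x_beta])
```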
This expansion resulted in a significantly larger feature space. To manage this complexity and prevent overfitting, we fitted the logistic regression model with L2 regularization. The regularization hyperparameter, C, was tuned by splitting the data into a training and validation set (before augmentation, with swap-augmented pairs kept within the same split) and then refitting on the full augmented set using the selected C.
3.2.2. Results by Pokémon in Gen 1 OU
The interaction model provides a more nuanced view of each Pokémon's role. Beyond a single strength score, it quantifies how each Pokémon interacts with every other Pokémon in the metagame.
The following table displays every Pokémon in our restricted Gen 1 OU metagame, sorted by their adjusted intrinsic strength (\(\theta_i\)). For each of these Pokémon, we also present the top five Pokémon that synergize best with it, the top five Pokémon it counters, and the top five Pokémon it is countered by, along with their respective \(\alpha\) and \(\beta\) parameters.
All Pokémon rankings
3.2.3. Team selection algorithm and comparison to the no-interaction model's best team in Gen 1 OU
We now address the first objective from Section 1.1 in the interaction setting: given a specific opposing team, construct a team that maximizes our win probability. As the fixed opponent, we take the no-interaction model's best team from Section 2.2.3: .
Let \(s^\star\) denote this fixed opponent team (we avoid \(\alpha\), which already names the synergy coefficients). Using the interaction model as our evaluator, we search for a team \(t\) that maximizes \(p(t \text{ beats } s^\star)\). We use a fast coordinate-ascent local search (a code sketch follows the list below):
- Initialization: start from the six Pokémon with the largest learned intrinsic strengths (\(\theta_i\)) under the interaction model.
- Iterative improvement: for each slot in the team, try swapping in every Pokémon not currently on the team; keep the swap that increases the model-predicted win probability the most; if any improvement is found, restart from the first slot; otherwise advance to the next slot.
- Termination: stop when a full pass over all six positions yields no improvement.
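A minimal sketch of this local search; `objective` is any callable mapping a candidate team to a model-predicted win probability (here, \(t \mapsto p(t \text{ beats } s^\star)\)), and `pool` is the list of eligible Pokémon.

```python
def coordinate_ascent(initial_team, pool, objective):
    """Greedy slot-by-slot local search, as described in the list above."""
    team = list(initial_team)
    best = objective(team)
    while True:
        improved = False
        for slot in range(len(team)):
            best_swap, best_value = None, best
            for candidate in pool:
                if candidate in team:
                    continue
                trial = team[:slot] + [candidate] + team[slot + 1:]
                value = objective(trial)
                if value > best_value:  # keep the best swap for this slot
                    best_swap, best_value = candidate, value
            if best_swap is not None:
                team[slot] = best_swap
                best = best_value
                improved = True
                break  # an improvement was found: restart from the first slot
        if not improved:  # a full pass with no improvement: converged
            return team, best
```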
This procedure quickly converges to the following counter-team: . The interaction model predicts a win probability of against \(s^\star\). To validate, we simulated 500 battles using the same fixed battle policy as in Section 2.2.1; the observed win rate was , closely matching the model's prediction.
This result highlights why interactions matter: the optimal counter to a strong, synergy-agnostic team is not simply the six most individually powerful Pokémon. Instead, the counter-team emphasizes favorable matchups (for example, Electric- and Ground-types into Starmie/Lapras/Zapdos) and synergies between members, which the interaction model captures.
3.2.4. Expansion to uniform Gen 1 OU metagame
We now optimize for a metagame rather than a single team, as described in Section 1.2. To approximate the uniform metagame in Gen 1 OU, we draw a set of random opponent teams by uniformly sampling six Pokémon from the eligible pool. Let this sample be \(\{s^{(1)}, \dots, s^{(N)}\}\). Our objective is to find a team \(t\) that maximizes the interaction model's estimate of the average win probability:
\[ \operatorname*{arg\,max}_{t \in \mathcal{T}} \; \frac{1}{N} \sum_{n=1}^N p(t \text{ beats } s^{(n)}) \]
We reuse the coordinate-ascent search from Section 3.2.3, but evaluate each candidate swap by its average predicted win rate across the sampled teams. Using \(N=100\) sampled teams, this procedure yields the following team for the uniform metagame: . Multiple runs of this procedure yield the same team.
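A sketch of the averaged objective, reusing the `coordinate_ascent` helper from the previous sketch; `predict_win_prob` is a hypothetical wrapper around the fitted interaction model, and `pool` is the list of 47 eligible Pokémon.

```python
import random

def metagame_objective(sample_teams, predict_win_prob):
    """Average model-predicted win rate against a fixed sample of opponents."""
    def objective(team):
        return sum(predict_win_prob(team, s) for s in sample_teams) / len(sample_teams)
    return objective

# Uniform metagame sample (N = 100), then the same local search as in 3.2.3:
# sample = [random.sample(pool, 6) for _ in range(100)]
# team, avg_p = coordinate_ascent(top_six_by_theta, pool,
#                                 metagame_objective(sample, predict_win_prob))
```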
The interaction model predicts an average win rate of against the sampled metagame. To validate, we played 500 simulated battles where the opponent was freshly and uniformly sampled each game; the observed win rate was , closely tracking the model's estimate.
Compared to the no-interaction best team, the uniform-optimized team shifts composition to emphasize strong cross-team matchups (notably Electric- and Normal-type pressure) and beneficial synergies captured by the interaction terms, underscoring the value of modeling interactions when optimizing for a metagame.
3.2.5. Metagame iteration to Nash equilibrium
Finally, we approximate a Nash equilibrium of the metagame, using the interaction model to evaluate matchups. Our iterative best-response procedure is loosely inspired by Double Oracle approaches for continuous games (Kroupa & Votroubek, 2021). We begin from the uniform-optimized team from Section 3.2.4 and iteratively add its best response team, using the coordinate-ascent procedure from Section 3.2.3. Concretely, let the current pool of strategies be \(\mathcal{S}\). We repeatedly compute a best-response team to the last-added team and append it to \(\mathcal{S}\) until a previously-seen team appears (cycle) or a size limit is reached.
Given the resulting pool \(\mathcal{S} = \{t^{(1)},\dots,t^{(K)}\}\), we form a zero-sum payoff matrix \(A \in \mathbb{R}^{K\times K}\). For each ordered pair \((i,j)\), we use the interaction model's predicted head-to-head probability and map it to a margin. Because the model is trained with symmetric augmentation \((x, y) \to (-x, 1-y)\), these predictions are anti-symmetric: \(p\!\left(t^{(i)} \text{ beats } t^{(j)}\right)= 1 - p\!\left(t^{(j)} \text{ beats } t^{(i)}\right)\), so a single direction suffices:
\[ A_{ij} = 2\, p\!\left(t^{(i)} \text{ beats } t^{(j)}\right) - 1. \]
We then solve the normal-form game \((A, -A)\) — i.e., the two-player zero-sum game where the row player's payoffs are \(A\) and the column player's payoffs are \(-A\) — using support enumeration with Nashpy to obtain a mixed-strategy equilibrium over \(\mathcal{S}\). In our run, the equilibrium (both players identical) puts non-zero probability on three teams:
Notably, the uniform-optimized team from Section 3.2.4 receives 0% weight at equilibrium within this restricted pool. The equilibrium mixture reflects a balance of counterplay captured by interaction terms: each supported team covers weaknesses of the others, stabilizing the metagame against unilateral deviations.
Finally, note that the converged metagame here is very simple. This is expected: our interaction model is deliberately basic and all simulations use a fixed, heuristic battle agent. In a richer setting - with more expressive models (moves, items, EVs, leads) and stronger, adaptive agents - we expect the equilibrium support to be broader and the metagame structure more complex.
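To make the final solving step concrete, here is a sketch of the payoff-matrix construction and equilibrium computation with Nashpy, assuming the same hypothetical `predict_win_prob` wrapper as before.

```python
import numpy as np
import nashpy as nash

def equilibrium_mixture(pool_teams, predict_win_prob):
    """Solve the restricted zero-sum teambuilding game over a pool of teams."""
    K = len(pool_teams)
    A = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            # Margin payoff: A is skew-symmetric because the model's
            # predictions satisfy p(i beats j) = 1 - p(j beats i).
            A[i, j] = 2.0 * predict_win_prob(pool_teams[i], pool_teams[j]) - 1.0
    game = nash.Game(A, -A)
    # Support enumeration yields equilibria as (row, column) mixtures;
    # in this symmetric zero-sum game we take the first one.
    q, m = next(game.support_enumeration())
    return q, m
```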
Key takeaways
4. Conclusion and Future Work
In this article, we established a formal framework for algorithmic Pokémon teambuilding. We began by defining clear optimization objectives, from countering single teams to finding a Nash equilibrium in a dynamic metagame. We demonstrated that a simple no-interaction model, which values Pokémon on individual strength alone, is easily outperformed by an interaction-based model that accounts for synergies and counter-play dynamics inherent to competitive Pokémon.
While this analysis provides a strong foundation, several exciting avenues for future research remain open. These extensions could lead to more sophisticated and powerful teambuilding tools.
- Expanding Model Complexity: Our models simplify a team to a selection of six Pokémon. A key extension would be to incorporate other critical teambuilding variables, such as move selection, item choice, abilities, natures, leads, and the fine-tuning of effort and individual values (EVs / IVs). This would create a much higher-dimensional but more realistic optimization problem.
- Exploring Different Metagames: The empirical analysis was focused on the Gen 1 OU metagame. Applying these models to other formats - from different single-battle generations to entirely different rule sets like the VGC doubles format - would test their robustness and reveal format-specific strategic patterns.
- Leveraging Human Battle Data: These models could be fitted on large-scale datasets of human competitive battles, grounding the win-probability estimates in real play rather than a fixed heuristic agent.
- Analyzing and Predicting Metagames: Beyond building a single optimal team, these frameworks can be used as analytical tools. They could model an empirical metagame to identify its core strengths and weaknesses or even predict how a metagame might evolve in response to new strategies, focusing on the dynamic simulation of a competitive ecosystem.
- Creating a Pokémon Teambuilding Tool: These models could also power a practical tool that helps players build and refine teams.
Citations and references
- Nash equilibrium: John F. Nash (1950). “Equilibrium points in n-person games.” Proceedings of the National Academy of Sciences 36(1), 48–49. https://www.pnas.org/doi/10.1073/pnas.36.1.48
- Non-cooperative games: John F. Nash (1951). “Non-Cooperative Games.” Annals of Mathematics 54(2), 286–295. https://doi.org/10.2307/1969529
- Bradley–Terry: R. A. Bradley, M. E. Terry (1952). “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39(3/4), 324–345. https://www.jstor.org/stable/2334029
- Elo rating system: Arpad E. Elo (1978). “The Rating of Chessplayers, Past and Present.” Arco Publishing.
- Pokémon Showdown: Pokémon Showdown (website). https://pokemonshowdown.com/
- poke-env: Haris Sahovic. poke-env (Python interface to Pokémon Showdown). GitHub repository: https://github.com/hsahovic/poke-env
- scikit-learn: LogisticRegression documentation. https://scikit-learn.org/.../LogisticRegression.html
- Nashpy: Vincent Knight, James Campbell (2018). “Nashpy: A Python library for the computation of Nash equilibria.” Journal of Open Source Software 3(30), 904. https://doi.org/10.21105/joss.00904 · Docs: https://nashpy.readthedocs.io/
- Double Oracle for continuous games: Tomáš Kroupa, Tomáš Votroubek (2021). “Multiple Oracle Algorithm to Solve Continuous Games” (extends Double Oracle to continuous games). arXiv:2109.04178. https://arxiv.org/abs/2109.04178
- Smogon usage statistics: https://www.smogon.com/stats/
- Smogon Dex: Gen 1 OU: https://www.smogon.com/dex/rb/formats/ou/
- Pokémon Database sprites: https://pokemondb.net/sprites