FAQ

Overview

What is Gonka?

Gonka is a decentralized network for high‑efficiency AI compute — run by those who run it. It functions as a cost-effective and efficient alternative to centralized cloud services for AI model training and inference. As a protocol, it's not a company or a start-up.

In terms of Blockchain, Gonka is the foundational ledger and coordination layer (L1) of the decentralized AI network. It records balances, transactions and cryptographic artifacts that prove Hosts have correctly performed AI work, while all actual computations (such as inference and training) happen off-chain.
In terms of Network, Gonka is a comprehensive ecosystem of participants, including Hosts and Developers, that interact through a decentralized infrastructure. Powered by the Gonka Blockchain, the network distributes tasks, verifies results, and rewards only honest participation and verifiable useful work, creating a competitive, scalable environment for AI workloads.

What problem is Gonka solving?

Gonka is a decentralized AI infrastructure built to reduce dependence on centralized cloud providers and to use computational power more efficiently than traditional decentralized networks. Its goal is to direct as much compute as possible toward useful AI tasks, such as inference and training, while minimizing waste due to consensus overhead.

Who are the key participants in the Gonka ecosystem?

The Gonka ecosystem has four key participant groups:

Developer builds and deploys AI applications by leveraging the network’s distributed computing power.
Gonka Contributor participates in development of the core blockchain codebase, protocol upgrades, performance optimizations, security patches, and new feature integrations.
Holder holds the network’s native coin, which simply means having a Gonka wallet with coins in it. Holders may hold coins, transfer or sell them, spend them on inference and use them according to the protocol rules. Being a holder does not imply any obligation, responsibility, or governance role beyond standard coin ownership.
Host contributes compute capacity to the network. Hosts perform inference and other computational tasks and are rewarded proportionally to their contributed compute capacity, as long as they maintain honest participation and reliability. Hosts form the backbone of the network. Only Hosts have voting power in the network. This voting power represents their weight in governance and is used to propose and vote on protocol decisions, parameter changes, and upgrades. Any Host acts as a Validator, a Transfer Agent, and an Executor (these are not predefined or on-chain roles, but dynamic operational functions assumed when processing an inference request).

What is the GNK coin?

GNK is the native coin of the Gonka network. It is used to incentivize participants, price resources, and ensure the sustainable growth of the network.

Can I buy GNK coin?

Native GNK is not listed on any centralized exchange (CEX) yet, so you cannot buy it on a CEX. Follow official announcements on Twitter for any updates regarding listings.

There are, however, two legitimate ways to obtain GNK today:

Mine as a Host. Contribute computational resources to the network and earn GNK directly. See mine as a Host.
Buy WGNK on Ethereum and bridge it back to GNK. GNK can be bridged to Ethereum as WGNK (wrapped GNK), which is a standard ERC-20 that trades on DEXs such as Uniswap. You can buy WGNK there and then bridge it back to native GNK. See the Ethereum bridge overview.

Track WGNK price and market data

You can follow WGNK (wrapped GNK) price, market cap, and trading volume on:

Verify the contract address before trading

The only official Ethereum representation of GNK is WGNK at 0x972a7a92d92796a98801a8818bcf91f1648f2f68 — this address is both the bridge contract and the WGNK ERC-20 token. Always confirm any listing or trade resolves to this exact address.

Fake GNK listings and pages still exist on other trackers and networks: any coin claiming to be GNK on Solana, or on any contract other than the WGNK address above, is not an official GNK asset. Always verify information through official channels.

What makes the protocol efficient?

What differentiates Gonka from the "big players" is its pricing and the fact that inference is distributed equally, regardless of a Host's size. To learn more, please review the Whitepaper.

How does the network operate?

The network's operation is collaborative and depends on the role you wish to take:

As a Developer: You can use the network's computational resources to build and deploy your AI applications.
As a Host: You can contribute your computational resources to power the network. The protocol is designed to reward you for your contribution, ensuring the network's continuity and sovereignty.

Is this documentation exhaustive?

No. This documentation covers the primary concepts, standard workflows, and the most common operational scenarios of the protocol, but it does not represent the full behavior or implementation details of the codebase. The code includes additional logic, interactions, and edge cases that are not described here.

Because Gonka is an open-source and decentralized network, various parameters, mechanisms, and governance-driven behaviors may evolve through on-chain voting and community decisions. Certain details may change after publication, and not all edge cases or future updates may be reflected immediately.

For Hosts, Developers, and Contributors, the ultimate source of truth is the code itself. If any discrepancy arises between this documentation and the code, the code always prevails.

Participants are encouraged to review the relevant repositories, governance proposals, and network updates to ensure their understanding aligns with the protocol’s current state.

What is the incentive for contributing computational resources?

We've created a dedicated document focused on Tokenomics, where you can find all the information about how the incentive is measured.

What are the hardware requirements?

You can find the minimum and recommended hardware specifications clearly outlined in the documentation. You should review this section to ensure your hardware meets the requirements for effective contribution.

What wallets can I use to store GNK coins?

You can store GNK coin in several supported wallets within the Cosmos ecosystem:

Keplr
Cosmostation
inferenced CLI - a command-line utility for local account management and network operations in Gonka.

Important for existing Leap Wallet users

If you previously created your Gonka account with Leap Wallet, please be aware that Leap is shutting down all of its products on May 28, 2026, including the browser extension, mobile app, and dashboard.

Because Leap is a non-custodial wallet, your assets and account remain on-chain. However, to keep access to your wallet, you should import your existing recovery phrase into another supported wallet, such as Keplr, before Leap services go offline.

Where can I find useful information about Gonka?

Below are the most important resources for learning about the Gonka ecosystem:

gonka.ai — the main entry point for project information and ecosystem overview.
Whitepaper — technical documentation describing the architecture, consensus model, Proof-of-Compute, etc.
Tokenomics — project tokenomics overview, including supply, distribution, incentives, and economic design.
GitHub — access to the project’s source code, repositories, development activity, and open-source contributions.
Discord — the primary place for community discussions, announcements, and technical support.
X (Twitter) — news, updates, and announcements.

Tokenomics

How is governance power calculated in Gonka?

Gonka uses a PoC-weighted voting model:

Proof-of-Compute (PoC): Voting power is proportional to your verified compute contribution.
Collateral commitment:
- 20% of PoC-derived voting weight is activated automatically.
- To unlock the remaining 80%, you must lock GNK coins as collateral.
This ensures that governance influence reflects real compute work + economic collateral.

For the first 180 epochs (approximately 6 months), new participants can participate in governance and earn voting weight through PoC alone, without collateral requirements. During this period, the full governance rights are available, while voting weight remains tied to verified compute activity.

Why does Gonka require locking GNK coins for governance power?

Voting power is never derived solely from holding coins. GNK coins serve as economic collateral, not as a source of influence. Influence is earned through continuous computational contribution, while locking GNK collateral is required to secure participation in governance and enforce accountability.

Collateral

What is collateral?

Collateral is required to activate the collateral-eligible portion of PoC weight after the Grace Period (first 180 epochs). After the Grace Period:

Base Weight (default 20%) is always active.
The remaining weight requires GNK collateral to become active.

Collateral ensures that participants with governance weight also bear economic responsibility. Parameters are defined on-chain and may change via governance. Always verify current values before making economic decisions.

Is collateral required per node or per account?

Collateral is deposited per account. If multiple ML nodes are linked to the same account, the required collateral is calculated based on the total account weight across all nodes.

Do I need to deposit collateral?

Yes, if you want to activate more than the Base Weight. If no collateral is deposited, only the Base Weight remains active.

How much collateral is required?

Formula:

Required Collateral =
Total Weight × (1 - base_weight_ratio) × collateral_per_weight_unit

Because PoC weight may fluctuate across epochs, depositing the exact minimum may result in temporary under-collateralization. Smaller weights may experience proportionally larger relative fluctuations. A buffer of up to 2× the calculated minimum is recommended while collateral levels remain relatively small.

Recommended (with conservative buffer):
Total Weight × 2 × (1 - base_weight_ratio) × collateral_per_weight_unit
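
As a rough illustration of the two formulas above, the sketch below computes the minimum and the buffered amount with integer shell arithmetic. The function names are hypothetical, and the parameter values (a base weight ratio of 20%, passed as a percentage, and 1,000,000 ngonka per weight unit) are placeholders for the example, not the current on-chain values.

```shell
# Sketch only: the parameter values used below are placeholders,
# not values read from the chain.

# required_collateral TOTAL_WEIGHT BASE_WEIGHT_PERCENT NGONKA_PER_WEIGHT_UNIT
# Prints the minimum collateral (in ngonka) for the collateral-eligible weight.
required_collateral() {
  echo $(( $1 * (100 - $2) * $3 / 100 ))
}

# recommended_collateral: same arguments, applies the conservative 2x buffer.
recommended_collateral() {
  echo $(( 2 * $(required_collateral "$1" "$2" "$3") ))
}

required_collateral 5000 20 1000000      # prints 4000000000
recommended_collateral 5000 20 1000000   # prints 8000000000
```

Both results are in ngonka. Query the current base_weight_ratio and collateral_per_weight_unit from the chain before relying on any computed amount.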

Can I partially collateralize my weight?

Yes. Your total Active Weight consists of:

Base Weight (always active)
Collateral-Eligible Weight (activated proportionally to deposited collateral)

If you deposit less than the full required amount:

Base Weight remains fully active
Only the corresponding portion of collateral-eligible weight becomes active
The remaining portion stays inactive

Active Weight is calculated as:

Active Weight =
Base Weight +
(Deposited Collateral / Required Collateral) × Collateral-Eligible Weight
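
The Active Weight formula can be sketched as a small shell function. This is an illustration under stated assumptions: the function name is hypothetical, the base weight ratio is passed as a whole percentage, and capping the result at Total Weight when more than the required collateral is deposited is an assumption rather than documented behavior.

```shell
# active_weight TOTAL_WEIGHT BASE_WEIGHT_PERCENT DEPOSITED_NGONKA REQUIRED_NGONKA
# Prints the Active Weight for a (possibly partial) collateral deposit.
active_weight() {
  total=$1
  base_pct=$2
  deposited=$3
  required=$4
  base=$(( total * base_pct / 100 ))   # Base Weight, always active
  eligible=$(( total - base ))         # Collateral-Eligible Weight
  # Assumption: over-depositing never raises Active Weight above Total Weight.
  if [ "$required" -le 0 ] || [ "$deposited" -ge "$required" ]; then
    echo "$total"
  else
    echo $(( base + eligible * deposited / required ))
  fi
}

active_weight 5000 20 0 4000000000            # prints 1000 (Base Weight only)
active_weight 5000 20 2000000000 4000000000   # prints 3000 (half collateralized)
active_weight 5000 20 4000000000 4000000000   # prints 5000 (fully collateralized)
```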

What happens if I do not deposit enough collateral?

Your Active Weight is reduced proportionally. Because rewards are distributed proportionally to Active Weight, other hosts receive a larger share of emissions when you under-collateralize. Inactive weight is not directly redistributed; it simply does not participate in consensus.

When does collateral take effect?

Collateral must be deposited before the start of the epoch to be effective. Collateral deposited during an epoch:

does NOT increase weight immediately
applies starting from the next epoch

Collateral cannot be increased mid-epoch.

In what unit do I deposit collateral?

Transactions must use ngonka, not GNK.

1 GNK = 1,000,000,000 ngonka

Example:

10 GNK = 10,000,000,000 ngonka
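
To avoid counting zeros by hand, the conversion can be scripted. The helper below is a hypothetical convenience function, not part of the inferenced CLI. It works on the decimal string directly (so no floating-point rounding is involved), silently truncates anything beyond nine decimal places, and does not validate its input.

```shell
# gnk_to_ngonka AMOUNT: converts a decimal GNK amount to ngonka
# (1 GNK = 1,000,000,000 ngonka). Digits past the 9th decimal are dropped.
gnk_to_ngonka() {
  amount=$1
  case $amount in
    *.*) whole=${amount%%.*}; frac=${amount#*.} ;;
    *)   whole=$amount; frac="" ;;
  esac
  # Right-pad the fractional part with zeros, then keep exactly 9 digits.
  frac=$(printf '%s000000000' "$frac" | cut -c1-9)
  # Concatenate, strip leading zeros, and print 0 if nothing is left.
  printf '%s%s\n' "$whole" "$frac" | sed 's/^0*//; s/^$/0/'
}

gnk_to_ngonka 10     # prints 10000000000
gnk_to_ngonka 0.5    # prints 500000000
gnk_to_ngonka 1.25   # prints 1250000000
```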

Can collateral be slashed?

Yes. Collateral may be slashed for:

Invalid inference
Downtime (Confirmation PoC failure or jail)

Invalid inference slashing is capped at once per epoch. Downtime slashing may be applied per jail event.

What happens to slashed coins?

Currently, slashed GNK is permanently burned and removed from circulation. Future governance may change this mechanism.

Can I withdraw collateral?

Yes. Withdrawal triggers an unbonding period (default: 1 epoch). During unbonding, collateral remains subject to slashing. After unbonding, funds are automatically returned to your account balance.

What collateral is NOT

Collateral is NOT voting power. Voting power is derived from PoC weight, not token balance.
Collateral is NOT delegation. Each account must back its own weight.
Collateral is NOT a permanent lock. It can be withdrawn (subject to unbonding).
Collateral was NOT required during the Grace Period (first 180 epochs).

How are epoch-minted rewards distributed?

A fixed amount of GNK is minted each epoch and distributed proportionally to Active PoC Weight. Active Weight determines:

Your share of epoch-minted Reward Coins
Your governance influence

If your Active Weight is reduced due to insufficient collateral, your share of epoch rewards decreases proportionally. Inactive weight does not receive rewards.
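
The proportional split can be illustrated with a short calculation. Every number below is made up for the example (an epoch reward of 1,000 GNK expressed in ngonka, and other Hosts holding 97,000 units of Active Weight in total); the function name is hypothetical.

```shell
# epoch_reward_share EPOCH_REWARD_NGONKA MY_ACTIVE_WEIGHT TOTAL_ACTIVE_WEIGHT
# Prints this Host's share of the epoch reward, in ngonka (rounded down).
epoch_reward_share() {
  echo $(( $1 * $2 / $3 ))
}

# Under-collateralized: 3000 active out of 97000 + 3000 network-wide.
epoch_reward_share 1000000000000 3000 100000   # prints 30000000000

# Fully collateralized: 5000 active out of 97000 + 5000 network-wide.
epoch_reward_share 1000000000000 5000 102000   # prints 49019607843
```

Note that the network-wide total changes together with your own Active Weight, which is why the second call uses a larger denominator.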

Do I need to manually deposit collateral?

Yes. Collateral must be deposited by submitting an on-chain transaction. It is not activated automatically. If no collateral is deposited:

Your node continues operating normally.
It is not jailed or disabled.
Only the Base Weight (e.g. 20%) remains active.

Your rewards and governance influence will be reduced proportionally.

Can vested (locked) GNK be used as collateral?

No. Collateral must be deposited from your available (unlocked) GNK balance. Vested coins that are not yet released cannot be used as collateral.

Governance

What types of changes require a Governance Proposal?

Governance Proposals are required for any on-chain changes that affect the network, for example:

Updating module parameters (MsgUpdateParams)
Executing software upgrades
Adding, updating, or deprecating inference models
Any other actions that must be approved and executed via the governance module

Who can create a Governance Proposal?

Anyone with a valid governance key (cold account) can pay the required fee and create a Governance Proposal. However, each proposal must still be approved by active participants through PoC-weighted voting. Proposers are encouraged to discuss significant changes off-chain first (for example, via GitHub or community forums) to increase the likelihood of approval. See the full guide.

What happens if a proposal fails?

If a proposal does not meet quorum → it automatically fails
If the majority votes no → proposal rejected, no on-chain changes
If a significant percentage votes no_with_veto (above veto threshold) → proposal is rejected and flagged, signaling strong community disagreement
Deposits may or may not be refunded, depending on chain settings
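
The outcomes listed above can be sketched as a decision function. This is an illustration, not Gonka's tally code: the order of the checks and the threshold values (33.4% quorum, 50% majority, 33.4% veto, written in per-mille) are assumptions modelled on the stock Cosmos SDK governance module, and the actual on-chain parameters may differ.

```shell
# Placeholder thresholds in per-mille; the real values are on-chain parameters.
QUORUM_PM=334
THRESHOLD_PM=500
VETO_PM=334

# proposal_outcome YES NO NO_WITH_VETO ABSTAIN TOTAL_VOTING_POWER
proposal_outcome() {
  yes=$1
  no=$2
  veto=$3
  abstain=$4
  total_power=$5
  voted=$(( yes + no + veto + abstain ))
  non_abstain=$(( voted - abstain ))
  if [ $(( voted * 1000 )) -lt $(( total_power * QUORUM_PM )) ]; then
    echo "failed: quorum not met"
  elif [ "$non_abstain" -eq 0 ]; then
    echo "rejected: only abstain votes"
  elif [ $(( veto * 1000 )) -gt $(( voted * VETO_PM )) ]; then
    echo "rejected: veto threshold exceeded"
  elif [ $(( yes * 1000 )) -gt $(( non_abstain * THRESHOLD_PM )) ]; then
    echo "passed"
  else
    echo "rejected: majority threshold not met"
  fi
}

proposal_outcome 10 0 0 0 100    # prints failed: quorum not met
proposal_outcome 60 30 5 5 100   # prints passed
proposal_outcome 50 5 40 5 100   # prints rejected: veto threshold exceeded
```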

Can governance parameters themselves be changed?

Yes. All key governance rules (quorum, majority threshold, and veto threshold) are on-chain configurable and can be updated via Governance Proposals. This allows the network to evolve its decision-making rules as participation patterns and compute economics change.

What should I do if I cannot vote because I do not have access to the cold key, or if I want another key to vote on my behalf?

If the key that holds voting power is not the key you use for day-to-day operations, governance voting permission can be granted in advance.

In this setup:

Granter = account that owns voting power (cold key)
Grantee = account that will submit votes on the granter’s behalf (warm key)

There are two common scenarios:

1. You want to vote, but you do not have access to the key that holds the voting power.

Please contact the owner of that key and ask them to grant your key permission to vote on their behalf. Without this authorization, your key cannot submit a governance vote for that voting power.

2. You want another key to vote on your behalf.

Use the grant command below from the key that holds the voting power. This will authorize the grantee key to submit governance votes for you. This delegation only allows voting on governance proposals. The grantee can still vote for their own key as well. The granter can revoke this permission at any time.

1) Grant voting permission (run from the granter key)

Command:

./inferenced tx authz grant <GRANTEE_GONKA_ADDRESS> generic \
  --msg-type=/cosmos.gov.v1beta1.MsgVote \
  --from=<GRANTER_KEY_NAME> \
  --chain-id=gonka-mainnet \
  --expiration=<UNIX_TIMESTAMP> \
  --home .inference \
  --keyring-backend file

Example response:

{
    "height": "0",
    "txhash": "8D96FB6FC06FFB928FBC89FE950689CD040C7F338C197BA856175EC7462A3FFA",
    "codespace": "",
    "code": 0,
    "data": "",
    "raw_log": "",
    "logs": [],
    "info": "",
    "gas_wanted": "0",
    "gas_used": "0",
    "tx": null,
    "timestamp": "",
    "events": []
}
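
The --expiration value in the grant command is a Unix timestamp. One way to produce it is sketched below; the helper name and the 90-day lifetime are arbitrary choices for the example, and date +%s is assumed to be available (it is on GNU and BSD systems).

```shell
# expiration_in_days DAYS: prints the Unix timestamp DAYS days from now.
expiration_in_days() {
  echo $(( $(date +%s) + $1 * 24 * 60 * 60 ))
}

EXPIRATION=$(expiration_in_days 90)
echo "--expiration=$EXPIRATION"
```

Pick a lifetime that matches how long the grantee should be allowed to vote; the grant can still be revoked earlier.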

2) Verify the grant exists (run from any node)

Command:

./inferenced query authz grants <GRANTER_GONKA_ADDRESS> <GRANTEE_GONKA_ADDRESS> \
  --node="http://<MAINNET_NODE_URL>:26657" \
  --output=json | jq .

Example response:

{
    "grants": [
        {
            "authorization": {
                "type": "cosmos-sdk/GenericAuthorization",
                "value": {
                    "msg": "/cosmos.gov.v1beta1.MsgVote"
                }
            },
            "expiration": "2026-12-03T18:38:18Z"
        }
    ],
    "pagination": {
        "total": "1"
    }
}

3) Vote using the grantee

Command:

# Find the ID of the proposal you are voting on and use it as <VOTE_PROPOSAL_ID> in the vote body
./inferenced query gov proposals --output json

# Prepare the file with the vote body
cat > /tmp/authz-vote.json << 'EOF'
{
  "body": {
    "messages": [
      {
        "@type": "/cosmos.authz.v1beta1.MsgExec",
        "grantee": "<GRANTEE_GONKA_ADDRESS>",
        "msgs": [
          {
            "@type": "/cosmos.gov.v1beta1.MsgVote",
            "proposal_id": "<VOTE_PROPOSAL_ID>",
            "voter": "<GRANTER_GONKA_ADDRESS>",
            "option": "VOTE_OPTION_YES"
          }
        ]
      }
    ]
  }
}
EOF

# Vote using the file
./inferenced tx authz exec /tmp/authz-vote.json \
  --from=<GRANTEE_KEY_NAME> \
  --chain-id=gonka-mainnet \
  --home .inference \
  --keyring-backend file \
  --node="http://<MAINNET_NODE_URL>:26657" -y

Example response (output of the gov proposals query):

{
    "pagination": {
        "total": "1"
    },
    "proposals": [
        {
            "deposit_end_time": "2026-03-06T10:40:07.016920026Z",
            "final_tally_result": {
                "abstain_count": "0",
                "no_count": "0",
                "no_with_veto_count": "0",
                "yes_count": "0"
            },
            "id": "1",
            "messages": [
                {
                    "type": "cosmos-sdk/MsgSoftwareUpgrade",
                    "value": {
                        "authority": "gonka10d07y265gmmuvt4z0w9aw880jnsr700j2h5m33",
                        "plan": {
                            "height": "406062",
                            "info": "{\n \"binaries\":{\n \"linux/amd64\":\"https://github.com/product-science/race-releases/releases/download/release%2Fv0.2.10-testnet1/inferenced-amd64.zip?checksum=sha256:fb71310427436aebac32813735231882fca420cf0d94b036f8cacd055d0e1c78\"\n },\n \"api_binaries\":{\n \"linux/amd64\":\"https://github.com/product-science/race-releases/releases/download/release%2Fv0.2.10-testnet1/decentralized-api-amd64.zip?checksum=sha256:6fe214f4bb2d831c02ce407682820d95d01e6ae94a33fe9c4617b80e0ca716ce\"\n }\n }",
                            "name": "v0.2.10",
                            "time": "0001-01-01T00:00:00Z"
                        }
                    }
                }
            ],
            "proposer": "gonka1xfvr8mywcrxrcrryvj8c5d2grvyjdj5c90fd88",
            "status": 2,
            "submit_time": "2026-03-04T10:40:07.016920026Z",
            "summary": "Upgrade Proposal v0.2.10",
            "title": "Upgrade Proposal v0.2.10",
            "total_deposit": [
                {
                    "amount": "50000000",
                    "denom": "ngonka"
                }
            ],
            "voting_end_time": "2026-03-04T10:50:07.016920026Z",
            "voting_start_time": "2026-03-04T10:40:07.016920026Z"
        }
    ]
}

Voting options:

VOTE_OPTION_YES
VOTE_OPTION_ABSTAIN
VOTE_OPTION_NO
VOTE_OPTION_NO_WITH_VETO

4) Revoke delegation (run from the granter key)

Command:

./inferenced tx authz revoke <GRANTEE_GONKA_ADDRESS> /cosmos.gov.v1beta1.MsgVote \
  --from=<GRANTER_KEY_NAME> \
  --chain-id=gonka-mainnet \
  --home .inference \
  --keyring-backend file

Example response:

{
    code: 0
    codespace: ""
    data: ""
    events: []
    gas_used: "0"
    gas_wanted: "0"
    height: "0"
    info: ""
    logs: []
    raw_log: ""
    timestamp: ""
    tx: null
    txhash: A2C3CDA9E95DCF143C0D8981A4F573F1E68879ECF4903B25BA97383C3F2FDFBA
}

Improvement proposals

What’s the difference between Governance Proposals and Improvement Proposals?

Governance Proposals → on-chain proposals. Used for changes that directly affect the network and require on-chain voting. Examples:

Updating network parameters (MsgUpdateParams)
Executing software upgrades
Adding new models or capabilities
Any modification that needs to be executed by the governance module

Improvement Proposals → off-chain proposals under the control of active participants. Used for shaping the long-term roadmap, discussing new ideas, and coordinating larger strategic changes.

Managed as Markdown files in the /proposals directory
Reviewed and discussed through GitHub Pull Request
Approved proposals are merged into the repository

How are Improvement Proposals reviewed and approved?

The goal of community proposal review is to gather community validation: reactions, comments, and concrete feedback that strengthens the case for eventual governance approval. This is especially relevant if implementing the proposal requires a lot of work, long-term commitment, coordination, or significant changes to the protocol.

Read the recommended guide first: https://github.com/gonka-ai/gonka/discussions/795. It explains what belongs in Improvement Proposals and how to write a strong, structured proposal.
Publish and discuss improvement proposals in GitHub Discussions (preferred); previously they were stored as Markdown files in the /proposals directory.
To help the community evaluate your proposal (and improve its chances later in governance), it’s in the proposer’s interest and responsibility to actively gather early feedback and support signals (reactions, comments, concrete concerns).
- Share the Discussion link in Discord’s #improvements-proposals channel for reach and visibility, and amplify it through any other channels available to you (including direct outreach to Hosts/miners) to gather practical input and support.
- Share context about your experience and expertise in the proposal thread. If you represent a team or a company, mention it and link relevant work to help the community assess credibility and evaluate the proposal more efficiently.
Community review:
- Active contributors and maintainers discuss the proposal in GitHub Discussions. Conversation can happen on any platform, but please consolidate the key context back in GitHub Discussions: it keeps the full history in one place, stays searchable, and is much easier to maintain over time. GitHub is the main source of truth.
- Please ask questions, provide feedback, suggestions, refinements, and upvote relevant proposals. Everybody’s attention and participation in this process is essential for sustainable evolution of the chain.
Strong positive feedback and a high number of upvotes signal genuine community demand, allowing teams to treat well-received proposals as part of a community-driven roadmap and begin implementation with confidence in both community alignment and eventual governance approval. Note that feedback from the hosts is essential - it can help structure the project into milestones, unlock partial bounty payments, and even secure grants from the community pool. Ultimately, however, all on-chain updates and payments are subject to governance approval.

Can an Improvement Proposal lead to a Governance Proposal?

Yes. Often, an Improvement Proposal is used to explore ideas and gather consensus before drafting a Governance Proposal. For example:

You might first propose a new model integration as an Improvement Proposal.
After the community agrees, an on-chain Governance Proposal is created to update parameters or trigger the software upgrade.

Voting

How does the voting process work?

Once a proposal is submitted and funded with the minimum deposit, it enters the voting period
Voting options: yes, no, no_with_veto, abstain
- yes → approve the proposal
- no → reject the proposal
- no_with_veto → reject and signal a strong objection
- abstain → neither approve nor reject, but counts toward quorum
You can change your vote anytime during the voting period; only your last vote is counted
If quorum and thresholds are met, the proposal passes and executes automatically via the governance module

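To vote, you can use the command below. This example votes yes on proposal 2; replace the proposal ID and the option (yes, no, no_with_veto, abstain) as needed. NODE_URL must be set to a node's base URL beforehand (for example, export NODE_URL=http://<NODE_URL>:<port>):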

./inferenced tx gov vote 2 yes \
      --from <cold_key_name> \
      --keyring-backend file \
      --unordered \
      --timeout-duration=60s --gas=2000000 --gas-adjustment=5.0 \
      --node $NODE_URL/chain-rpc/ \
      --chain-id gonka-mainnet \
      --yes

How can I track the status of a Governance Proposal?

You can query the proposal status at any time using the CLI:

export NODE_URL=http://47.236.19.22:18000
./inferenced query gov tally 2 -o json --node $NODE_URL/chain-rpc/

Running a Node

What if I want to stop mining but still use my account when I come back?

To restore a Network Node in the future, it will be sufficient to back up:

cold key (most important; everything else can be rotated)
secrets from tmkms: .tmkms/secrets/
keyring from .inference: .inference/keyring-file/
node key from .inference/config: .inference/config/node_key.json
password for the warm key: KEYRING_PASSWORD

My node was jailed. What does it mean?

Your validator has been jailed because it signed fewer than 50 blocks out of the last 100 blocks (the requirement counts the total number of signed blocks in that window, not consecutive ones). This means your node was temporarily excluded (about 15 minutes) from block production to protect network stability. There are several possible reasons for this:

Consensus Key Mismatch. The consensus key used by your node may differ from the one registered on-chain for your validator. Make sure the consensus key you are using matches the one registered on-chain for your validator.
Unstable Network Connection. Network instability or interruptions can prevent your node from reaching consensus, causing missed signatures. Ensure your node has a stable, low-latency connection and isn’t overloaded by other processes.
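
The signing rule described above counts signed blocks anywhere in the window, not consecutive runs. The toy sketch below makes that concrete; it is not how the chain tracks signatures. A window is written as a string with one character per block (1 = signed, 0 = missed), and the helper names are hypothetical.

```shell
# Values taken from the rule above: at least 50 signed blocks per 100-block window.
WINDOW_SIZE=100
MIN_SIGNED=50

# repeat STRING N: prints STRING N times (helper for building sample windows).
repeat() {
  i=0
  out=""
  while [ "$i" -lt "$2" ]; do
    out="$out$1"
    i=$(( i + 1 ))
  done
  printf '%s' "$out"
}

# count_signed WINDOW: counts the 1 characters (signed blocks) in WINDOW.
count_signed() {
  printf '%s' "$1" | tr -cd '1' | wc -c | tr -d ' '
}

# signing_status WINDOW: reports whether the window falls below the minimum.
signing_status() {
  signed=$(count_signed "$1")
  if [ "$signed" -lt "$MIN_SIGNED" ]; then
    echo "below minimum: $signed of $WINDOW_SIZE signed"
  else
    echo "ok: $signed of $WINDOW_SIZE signed"
  fi
}

# 45 signed blocks interleaved with misses, then 10 more misses.
signing_status "$(repeat 10 45)$(repeat 0 10)"   # prints below minimum: 45 of 100 signed
# Every other block signed: exactly 50 of 100.
signing_status "$(repeat 10 50)"                 # prints ok: 50 of 100 signed
```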

Rewards: Even if your node is jailed, you will continue to receive most of the rewards as a Host as long as it remains active in inference or other validator-related work. So, the reward is not lost unless inference issues are detected.

How to Unjail Your Node: To resume normal operation, unjail your validator once the issue is resolved. Use your cold key to submit the unjail transaction:

export NODE_URL=http://<NODE_URL>:<port>
 ./inferenced tx slashing unjail \
    --from <cold_key_name> \
    --keyring-backend file \
    --chain-id gonka-mainnet \
    --gas auto \
    --gas-adjustment 1.5 \
    --fees 200000ngonka \
    --node $NODE_URL/chain-rpc/

Then, to check if the node was unjailed:

 ./inferenced query staking delegator-validators \
    <cold_key_addr> \
    --node $NODE_URL/chain-rpc/

When a node is jailed, it shows jailed: true.

How to decommission an old cluster?

Follow this guide to safely shut down an old cluster without impacting reputation.

1) Use the following command to disable each ML Node:

curl -X POST http://localhost:9200/admin/v1/nodes/<id>/disable

You can list all node IDs with:

curl http://localhost:9200/admin/v1/nodes | jq '.[].node.id'

2) Nodes that are not scheduled to serve inference during the next Proof-of-Compute (PoC) will automatically stop during that PoC. Nodes that are scheduled to serve inference will remain active for one more epoch before stopping. You can verify a node’s status in the mlnode field at:

curl http://<inference_url>/v1/epochs/current/participants

Once a node is marked as disabled, it is safe to power off the ML Node server.

3) After all ML Nodes have been disabled and powered off, you can shut down the Network Node. Before doing so, it’s recommended (but optional) to back up the following files:

.dapi/api-config.yaml
.dapi/gonka.db (created after on-chain upgrade)
.inference/config/
.inference/keyring-file/
.tmkms/

If you skip the backup, the setup can still be restored later using your Account Key.

My node cannot connect to the default seed node specified in config.env

If your node cannot connect to the default seed node, simply point it to another one by updating three variables in config.env.

SEED_API_URL - HTTP endpoint of the seed node (used for API communication). Choose any URL from the list below and assign it directly to SEED_API_URL.

export SEED_API_URL=<chosen_http_url>

Available genesis API URLs:

http://185.216.21.98:8000
http://36.189.234.197:18026
http://36.189.234.237:17241
http://node1.gonka.ai:8000
http://node2.gonka.ai:8000
http://node3.gonka.ai:8000
https://node4.gonka.ai
http://47.236.26.199:8000
http://47.236.19.22:18000
http://gonka.spv.re:8000

SEED_NODE_RPC_URL - the Tendermint RPC endpoint of the seed node. Public Tendermint RPC access MUST go through the seed node's HTTP(S) proxy path /chain-rpc. Use the same scheme (http or https), host, and port as in SEED_API_URL, and append /chain-rpc/.
```
export SEED_NODE_RPC_URL=http://<host>:<http_port>/chain-rpc/
```
Example
```
export SEED_NODE_RPC_URL=http://node2.gonka.ai:8000/chain-rpc/
```

Important

Do NOT use http://<host>:26657 as a public RPC endpoint.
Port 26657 MUST be internal-only (localhost/private network). Public RPC must go via /chain-rpc.

SEED_NODE_P2P_URL - the P2P address used for networking between nodes. You must obtain the P2P port from the seed node’s status endpoint via the same /chain-rpc proxy.

Query the node:

http://<host>:<http_port>/chain-rpc/status

Example

https://node3.gonka.ai/chain-rpc/status

Find listen_addr in the response, for example:

"listen_addr": "tcp://0.0.0.0:5000"

Use this port:

export SEED_NODE_P2P_URL=tcp://<host>:<p2p_port>

Example

export SEED_NODE_P2P_URL=tcp://node3.gonka.ai:5000

Final result example

export SEED_API_URL=http://node2.gonka.ai:8000
export SEED_NODE_RPC_URL=http://node2.gonka.ai:8000/chain-rpc/
export SEED_NODE_P2P_URL=tcp://node2.gonka.ai:5000
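
The three values follow mechanically from the chosen API URL and the listen_addr reported by the status endpoint, so they can be derived with a few parameter expansions. The helper below is a hypothetical sketch; it performs no network calls and does not handle IPv6 literals.

```shell
# seed_env API_URL LISTEN_ADDR
# Prints the three export lines for config.env.
#   API_URL     e.g. http://node2.gonka.ai:8000
#   LISTEN_ADDR listen_addr from /chain-rpc/status, e.g. tcp://0.0.0.0:5000
seed_env() {
  api_url=${1%/}          # drop a trailing slash, if any
  host=${api_url#*://}    # strip the scheme
  host=${host%%/*}        # strip any path
  host=${host%%:*}        # strip the HTTP port
  p2p_port=${2##*:}       # keep only the port of listen_addr
  echo "export SEED_API_URL=$api_url"
  echo "export SEED_NODE_RPC_URL=$api_url/chain-rpc/"
  echo "export SEED_NODE_P2P_URL=tcp://$host:$p2p_port"
}

seed_env http://node2.gonka.ai:8000 tcp://0.0.0.0:5000
```

For the inputs shown, the output is the same three lines as in the final result example above.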

How to change the seed nodes?

There are two distinct ways to update seed nodes, depending on whether the node has already been initialized.

Option 1. Manually edit seed nodes (after initialization)
Option 2. Reinitialize the Node (seeds auto-applied from environment)

Option 1 (manual edit): Once the file .node_initialized is created, the system no longer updates seed nodes automatically. After that point:

The seed list is used as-is
Any changes must be done manually
You can add as many seed nodes as you want

The format is a single comma-separated string:

seeds = "<node1_id>@<node1_ip>:<node1_p2p_port>,<node2_id>@<node2_ip>:<node2_p2p_port>"

To view known peers from any running node, use chain RPC:

curl http://47.236.26.199:8000/chain-rpc/net_info | jq

In response, look for:

listen_addr - P2P endpoint
rpc_addr - RPC endpoint

Example:

{
  "jsonrpc": "2.0",
  "id": -1,
  "result": {
    "listening": true,
    "listeners": [
      "Listener(@tcp://47.236.26.199:5000)"
    ],
    "n_peers": "50",
    "peers": [
      {
        "node_info": {
          "protocol_version": {
            "p2p": "8",
            "block": "11",
            "app": "0"
          },
          "id": "ce6f26b9508839c29e0bfd9e3e20e01ff4dda360",
          "listen_addr": "tcp://85.234.78.106:5000",
          "network": "gonka-mainnet",
          "version": "0.38.17",
          "channels": "40202122233038606100",
          "moniker": "my-node",
          "other": {
            "tx_index": "on",
            "rpc_address": "tcp://0.0.0.0:26657"
          }
        },
...

This displays all peers the node currently sees.
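
As a rough sketch, peers from a net_info response can be turned into the seeds string format shown earlier. The helper name seeds_from_net_info and the limit parameter are hypothetical; the sample reuses the peer from the example output. Appearing in the peer list does not by itself make a node a good seed, so treat the result as a starting point.

```python
from urllib.parse import urlparse


def seeds_from_net_info(net_info: dict, limit: int = 3) -> str:
    """Format peers from a net_info response as a comma-separated seeds string."""
    entries = []
    for peer in net_info["result"]["peers"]:
        info = peer["node_info"]
        addr = urlparse(info["listen_addr"])
        # Skip peers that advertise only a wildcard address, or no usable host/port.
        if addr.hostname in (None, "0.0.0.0") or addr.port is None:
            continue
        entries.append(f"{info['id']}@{addr.hostname}:{addr.port}")
        if len(entries) == limit:
            break
    return ",".join(entries)


sample_net_info = {
    "result": {
        "peers": [
            {"node_info": {"id": "ce6f26b9508839c29e0bfd9e3e20e01ff4dda360",
                           "listen_addr": "tcp://85.234.78.106:5000"}},
            # Hypothetical peer with a wildcard address, which is skipped.
            {"node_info": {"id": "0" * 40, "listen_addr": "tcp://0.0.0.0:5000"}},
        ]
    }
}
seeds = seeds_from_net_info(sample_net_info)
print(f'seeds = "{seeds}"')
```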

Option 2. Reinitialize the Node (seeds auto-applied from environment)

Use this method if you want the node to regenerate its configuration and automatically apply the seed nodes defined in config.env.

source config.env
docker compose down node
sudo rm -rf .inference/data/ .inference/.node_initialized
sudo mkdir -p .inference/data/

After restarting the node, it will behave like a fresh installation and recreate its configuration, including the seeds from the environment variables. To verify which seeds were actually applied:

sudo cat .inference/config/config.toml

Look for the field:

seeds = "<node1_id>@<node1_ip>:<node1_p2p_port>,..."
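
To check the applied seeds programmatically, a small parser is enough. The sketch below assumes the seeds value is a single quoted, comma-separated string and that node IDs are 40 hexadecimal characters, as in the net_info example; read_seeds and invalid_seeds are hypothetical helper names.

```python
import re

# Assumption: <node_id>@<host>:<port>, with a 40-character hex node ID.
SEED_RE = re.compile(r"^[0-9a-f]{40}@[^:@\s]+:\d+$")


def read_seeds(config_text: str) -> list:
    """Return the entries of the first `seeds = "..."` line in config.toml text."""
    for line in config_text.splitlines():
        match = re.match(r'^\s*seeds\s*=\s*"(.*)"\s*$', line)
        if match:
            return [entry for entry in match.group(1).split(",") if entry]
    return []


def invalid_seeds(seeds: list) -> list:
    """Return the entries that do not look like <node_id>@<host>:<port>."""
    return [seed for seed in seeds if not SEED_RE.match(seed)]


sample_config = '''
[p2p]
seeds = "ce6f26b9508839c29e0bfd9e3e20e01ff4dda360@85.234.78.106:5000,broken-entry"
'''
applied = read_seeds(sample_config)
print(applied)
print(invalid_seeds(applied))
```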

How are Hardware, Node Weight, and ML Node configuration actually validated?

The chain does not verify real hardware. It only validates the total participant weight, and this is the sole value used for weight distribution and reward calculation.

Any breakdown of this weight across ML Nodes, as well as any “hardware type” or other descriptive fields, is purely informational and can be freely modified by the Host.

When creating or updating a node (for example, via POST http://localhost:9200/admin/v1/nodes as shown in the handler code at https://github.com/gonka-ai/gonka/blob/aa85699ab203f8c7fa83eb1111a2647241c30fc4/decentralized-api/internal/server/admin/node_handlers.go#L62), the hardware field can be explicitly specified. If it is omitted, the API service attempts to auto-detect hardware information from the ML Node.

In practice, many hosts run a proxy ML Node behind which multiple servers operate; auto-detection only sees one of these servers, which is a fully valid setup. Regardless of configuration, all weight distribution and rewards rely solely on the Host total weight, and the internal split across ML Nodes or the reported hardware types never affect on-chain validation.

How to switch to `Qwen/Qwen3-235B-A22B-Instruct-2507-FP8`, upgrade ML Nodes, and remove other models?

Historical — v0.2.8 / PoC v2 migration

This entry documents the v0.2.8 / PoC v2 migration (Epoch 155), when Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 was the single enforced model. It is kept for historical reference only. As of epoch 308, Qwen3-235B has been retired by governance (proposal 78) and MiniMaxAI/MiniMax-M2.7 is the base/active PoC model. For current setup, follow the Host Quickstart and Multi-Model PoC — Host Operations Guide.

This guide explains how Hosts should update their ML Nodes in response to changes in v0.2.8 model availability and the upcoming PoC v2 update. ML Node configuration compliance with PoC v2 is observed starting Epoch 155. Hosts are encouraged to review and prepare their ML Node configuration before that point. Migration to PoC v2 can be scheduled after epoch 155. After the migration phase, weights from ML Nodes that do not meet the configuration requirements may not be counted.

1. Background: model availability changes (upgrade v0.2.8)

As part of the v0.2.8 upgrade, the active model set has been updated.

Supported models (active set)

Only the following models remain supported:

Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
Qwen/Qwen3-32B-FP8

Qwen/Qwen3-32B-FP8 is supported during the migration period, but does not contribute to PoC v2 readiness or weight assignment. Participation in PoC v2 requires serving Qwen/Qwen3-235B-A22B-Instruct-2507-FP8.

Removed models

All previously supported models are removed from the active set and must not be served.

2. PoC v2 readiness criteria (Important)

Successful participation in the PoC v2 transition requires both of the following:

All your ML Nodes serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8. This is the only model that contributes to PoC v2 weight.
All your ML Nodes are upgraded to a PoC v2–compatible image:
- ghcr.io/product-science/mlnode:3.0.12-post3
- ghcr.io/product-science/mlnode:3.0.12-post3-blackwell

Important

Serving the correct model without upgrading the ML Node is not sufficient.
Nodes that do not meet both conditions will not be eligible once the network switches to a single-model configuration.
The ML Node upgrade must be completed before the migration is finished and PoC v2 is activated through a separate governance proposal following the v0.2.8 upgrade.
The v0.2.8 upgrade itself does not enable PoC v2.

3. Check ML Node allocation status (recommended safety step)

Before changing models, you should inspect the current ML Node allocation. Query your Network Node admin API:

curl http://127.0.0.1:9200/admin/v1/nodes

Look for the field:

"timeslot_allocation": [
  true,
  false
]

Interpretation:

First boolean: Whether the node is serving inference in the current epoch
Second boolean: Whether the node is scheduled to serve inference in the next PoC

Recommended behavior

Prefer changing the model only on nodes where the second value is false
This reduces risk while PoC v2 behavior is still being observed
Gradual rollout across epochs is encouraged
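
The recommendation above can be expressed as a small filter. This is a sketch only: the mapping from node ID to timeslot_allocation is a hypothetical shape that you would build from the admin API response, and nodes_safe_to_switch is not part of the Gonka tooling.

```python
def nodes_safe_to_switch(allocations: dict) -> list:
    """Return node IDs whose second timeslot_allocation value is false.

    allocations maps a node ID to its [current_epoch, next_poc] booleans.
    """
    return [
        node_id
        for node_id, (_serving_now, scheduled_next_poc) in allocations.items()
        if not scheduled_next_poc
    ]


# Hypothetical node IDs and allocation values.
sample_allocations = {
    "node1": [True, False],   # serving now, not scheduled for the next PoC
    "node2": [True, True],    # scheduled for the next PoC, so wait
    "node3": [False, False],
}
candidates = nodes_safe_to_switch(sample_allocations)
print(candidates)
```

Switching only the returned nodes in a given epoch gives the gradual rollout described above.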

4. Update models for ML Nodes: keep the supported model only

Pre-download model weights (recommended). To avoid startup delays, pre-download weights into HF_HOME:

mkdir -p $HF_HOME
huggingface-cli download Qwen/Qwen3-235B-A22B-Instruct-2507-FP8

Use ML Node Management API to switch ML Node to a supported model (Qwen/Qwen3-235B-A22B-Instruct-2507-FP8).

For example:

curl -X PUT "http://localhost:9200/admin/v1/nodes/node1" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "node1",
    "host": "inference",
    "inference_port": 5000,
    "poc_port": 8080,
    "max_concurrent": 800,
    "models": {
      "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8": {
        "args": [
          "--tensor-parallel-size",
          "4",
          "--max-model-len",
          "240000"
        ]
      }
    }
  }'

Changes applied via the Admin API replace the model starting from the next epoch (see https://gonka.ai/host/mlnode-management/#updating-an-existing-mlnode).
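
If you manage several ML Nodes, it may help to generate the request body rather than edit JSON by hand. The sketch below reproduces the payload from the example; build_node_update is a hypothetical helper, and the default ports and concurrency are simply the values shown above.

```python
import json


def build_node_update(node_id, host, model, tensor_parallel_size, max_model_len,
                      inference_port=5000, poc_port=8080, max_concurrent=800):
    """Build a PUT /admin/v1/nodes/<id> body that lists exactly one model."""
    return {
        "id": node_id,
        "host": host,
        "inference_port": inference_port,
        "poc_port": poc_port,
        "max_concurrent": max_concurrent,
        "models": {
            model: {
                # vLLM-style arguments are passed as strings, as in the example.
                "args": [
                    "--tensor-parallel-size", str(tensor_parallel_size),
                    "--max-model-len", str(max_model_len),
                ]
            }
        },
    }


payload = build_node_update(
    "node1", "inference", "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    tensor_parallel_size=4, max_model_len=240000,
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed to curl with -d, as in the example request.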

Note

node-config.json is used only on the first launch of the Network Node API or when the local state/database is removed. Edit it for a fresh restart. For existing nodes, model updates should be performed via the Admin API.

5. Upgrade the ML Node image (required for PoC v2)

Edit docker-compose.mlnode.yml and update the ML Node image:

Standard GPUs

image: ghcr.io/product-science/mlnode:3.0.12-post3

NVIDIA Blackwell GPUs

image: ghcr.io/product-science/mlnode:3.0.12-post3-blackwell

Apply changes and restart services. From gonka/deploy/join:

source config.env
docker compose -f docker-compose.yml -f docker-compose.mlnode.yml pull
docker compose -f docker-compose.yml -f docker-compose.mlnode.yml up -d

6. Verify model serving (applied at the next epoch)

Confirm the ML Node is serving Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 only, which is the only model used for PoC v2 weights and future weight assignment:

curl http://127.0.0.1:8080/v1/models | jq

Optionally re-check node allocation:

curl http://127.0.0.1:9200/admin/v1/nodes
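
The check can also be scripted. The sketch below assumes the /v1/models endpoint returns an OpenAI-style list, that is, an object with a data array whose items carry an id; that shape is an assumption, and served_models and serves_only are hypothetical names.

```python
EXPECTED_MODEL = "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"


def served_models(models_response: dict) -> list:
    """Extract model IDs from an OpenAI-style /v1/models response (assumed shape)."""
    return [entry["id"] for entry in models_response.get("data", [])]


def serves_only(models_response: dict, expected: str = EXPECTED_MODEL) -> bool:
    """True when the expected model is the only one being served."""
    return served_models(models_response) == [expected]


good = {"object": "list", "data": [{"id": EXPECTED_MODEL}]}
mixed = {"object": "list", "data": [{"id": EXPECTED_MODEL}, {"id": "Qwen/Qwen3-32B-FP8"}]}
print(serves_only(good), serves_only(mixed))
```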

Governance and PoC v2 activation notes

PoC v2 is introduced in stages, not activated all at once.

Stage 1. Observation (current state after v0.2.8)

After the v0.2.8 upgrade, PoC v2 logic is available but not active for weight assignment.

During this stage:

Hosts are able to serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 or Qwen/Qwen3-32B-FP8
Hosts must switch their ML Nodes to serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 and upgrade them to PoC v2-compatible versions in order to contribute to PoC v2 weight.
The network observes adoption to assess Host readiness for moving to PoC v2 weights.

Stage 2. Governance proposal (optional, future)

Once a sufficient level of adoption among active Hosts is observed (approximately 50%):

A separate governance proposal may be submitted
This proposal may request approval to activate PoC v2 and use PoC v2 for weight assignment

The adoption threshold is observational only and does not trigger any automatic changes.

Stage 3. Activation (only after governance approval)

PoC v2 becomes the active method of weight assignment only if and when the governance proposal is approved by the chain.

Until this proposal is approved:

PoC v2 remains inactive for weight assignment
The existing PoC mechanism continues to be used to determine weight

Summary checklist

Before PoC v2 activation, ensure that:

ML Node serves Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
All other models are removed from the configuration
ML Node image is 3.0.12-post3 (or 3.0.12-post3-blackwell)

Keys & security

Which CLI version should be used for warm keys created after the v0.2.9 upgrade?

For granting permissions to new warm keys created after the v0.2.9 upgrade, the CLI version v0.2.9 should be used.

Where can I find information on key management?

You can find a dedicated section on Key Management in the documentation. It outlines the procedures and best practices for securely managing your application's keys on the network.

I Cleared or Overwrote My Consensus Key

If you are using tmkms and deleted the .tmkms folder, simply restart tmkms — it will automatically generate a new key. To register the new consensus key, submit the following transaction:

./inferenced tx inference submit-new-participant \
    <PUBLIC_URL> \
    --validator-key <CONSENSUS_KEY> \
    --keyring-backend file \
    --unordered \
    --from <COLD_KEY_NAME> \
    --timeout-duration 1m \
    --node http://<node-url>/chain-rpc/ \
    --chain-id gonka-mainnet

I Deleted the Warm Key

Back up the cold key on your local device, outside the server.

1) Stop the API container:

docker compose down api --no-deps

2) Set KEY_NAME for the warm key in your config.env file.

3) [SERVER]: Recreate the warm key:

source config.env && docker compose run --rm --no-deps -it api /bin/sh

4) Then execute inside the container:

printf '%s\n%s\n' "$KEYRING_PASSWORD" "$KEYRING_PASSWORD" | \
inferenced keys add "$KEY_NAME" --keyring-backend file

5) [LOCAL]: From your local device (where you backed up the cold key), run the transaction:

./inferenced tx inference grant-ml-ops-permissions \
    gonka-account-key \
    <address-of-warm-key-you-just-created> \
    --from gonka-account-key \
    --keyring-backend file \
    --gas 2000000 \
    --node http://<node-url>/chain-rpc/

6) Start the API container:

source config.env && docker compose up -d

Proof-of-Compute (PoC)

What is Proof-of-Compute?

Proof of Compute (PoC) is a consensus mechanism that replaces capital-based or hash-based weighting with provable Transformer-based computational capability. It defines how real AI compute is measured and converted into governance and consensus weight. PoC is executed through short, synchronized Sprints that occur at the end of each epoch. Outside the Sprint, the epoch is used for real-world AI computation. In practice, the terms Proof of Compute (PoC) and Sprint are often used interchangeably. When referring to “Next PoC” or “PoC phase”, this typically means the next Sprint, which is the execution phase of Proof of Compute.

What is Sprint?

Sprint is a phase of Proof of Compute. During a Sprint, all Hosts simultaneously run AI-relevant inference on a transformer with randomized layers over a stream of nonces, producing output vectors. A Host’s voting power for the next epoch is proportional to the number of nonces it processed, as long as the reported outputs are verifiably produced by the required Sprint model.

How to simulate Proof-of-Compute (PoC)?

You may want to simulate PoC on an ML Node yourself to make sure that everything will work when the PoC phase begins on the chain.

To run this test, you need either a running ML Node that isn't yet registered with the api node, or a paused api node. To pause the api node, use docker pause api. Once you’re finished with the test, you can unpause it: docker unpause api.

For the test itself, you will send a POST /api/v1/pow/init/generate request to the ML Node, the same request that the api node sends at the start of the PoC phase: https://github.com/gonka-ai/gonka/blob/312044d28c7170d7f08bf88e41427396f3b95817/mlnode/packages/pow/src/pow/service/routes.py#L32

The following model params are used for PoC: https://github.com/gonka-ai/gonka/blob/312044d28c7170d7f08bf88e41427396f3b95817/mlnode/packages/pow/src/pow/models/utils.py#L41

If your node is in the INFERENCE state then you first need to transition the node to the stopped state:

curl -X POST "http://<ml-node-host>:<port>/api/v1/stop" \
  -H "Content-Type: application/json"

Now you can send a request to initiate PoC:

curl -X POST "http://<ml-node-host>:<port>/api/v1/pow/init/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "node_id": 0,
    "node_count": 1,
    "block_hash": "EXAMPLE_BLOCK_HASH",
    "block_height": 1,
    "public_key": "EXAMPLE_PUBLIC_KEY",
    "batch_size": 1,
    "r_target": 10.0,
    "fraud_threshold": 0.01,
    "params": {
      "dim": 1792,
      "n_layers": 64,
      "n_heads": 64,
      "n_kv_heads": 64,
      "vocab_size": 8196,
      "ffn_dim_multiplier": 10.0,
      "multiple_of": 8192,
      "norm_eps": 1e-5,
      "rope_theta": 10000.0,
      "use_scaled_rope": false,
      "seq_len": 256
    },
    "url": "http://api:9100"
  }'

Send this request to port 8080 of the ML Node's proxy container, or directly to port 8080 of the ML Node: https://github.com/gonka-ai/gonka/blob/312044d28c7170d7f08bf88e41427396f3b95817/deploy/join/docker-compose.mlnode.yml#L26

If the test runs successfully, you will see logs similar to the following:

2025-08-25 20:53:33,568 - pow.compute.controller - INFO - Created 4 GPU groups:
2025-08-25 20:53:33,568 - pow.compute.controller - INFO -   Group 0: GpuGroup(devices=[0], primary=0) (VRAM: 79.2GB)
2025-08-25 20:53:33,568 - pow.compute.controller - INFO -   Group 1: GpuGroup(devices=[1], primary=1) (VRAM: 79.2GB)
2025-08-25 20:53:33,568 - pow.compute.controller - INFO -   Group 2: GpuGroup(devices=[2], primary=2) (VRAM: 79.2GB)
2025-08-25 20:53:33,568 - pow.compute.controller - INFO -   Group 3: GpuGroup(devices=[3], primary=3) (VRAM: 79.2GB)
2025-08-25 20:53:33,758 - pow.compute.controller - INFO - Using batch size: 247 for GPU group [0]
2025-08-25 20:53:33,944 - pow.compute.controller - INFO - Using batch size: 247 for GPU group [1]
2025-08-25 20:53:34,151 - pow.compute.controller - INFO - Using batch size: 247 for GPU group [2]
2025-08-25 20:53:34,353 - pow.compute.controller - INFO - Using batch size: 247 for GPU group [3]

Then the service will start sending generated nonces to DAPI_API__POC_CALLBACK_URL.

2025-08-25 20:54:58,822 - pow.service.sender - INFO - Sending generated batch to http://api:9100/

The http://api:9100 URL won’t be available if you paused the api container or if the ML Node and api containers don’t share the same Docker network. In that case, expect error messages saying that the ML Node failed to send generated batches. The important part is to confirm that the generation process is happening.
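
To confirm from the logs that generation is happening, you can scan for the two message types shown above. This is a sketch based only on those sample lines; other log formats are not handled, and summarize_poc_logs is a hypothetical helper.

```python
import re

BATCH_RE = re.compile(r"Using batch size: (\d+) for GPU group \[([\d, ]+)\]")
SEND_RE = re.compile(r"pow\.service\.sender - INFO - Sending generated batch to (\S+)")


def summarize_poc_logs(log_text: str) -> dict:
    """Count GPU groups that picked a batch size and batches the sender tried to send."""
    batch_sizes = {m.group(2): int(m.group(1)) for m in BATCH_RE.finditer(log_text)}
    return {
        "gpu_groups": len(batch_sizes),
        "batch_sizes": batch_sizes,
        "batches_sent": len(SEND_RE.findall(log_text)),
    }


sample_logs = """\
2025-08-25 20:53:33,758 - pow.compute.controller - INFO - Using batch size: 247 for GPU group [0]
2025-08-25 20:53:33,944 - pow.compute.controller - INFO - Using batch size: 247 for GPU group [1]
2025-08-25 20:54:58,822 - pow.service.sender - INFO - Sending generated batch to http://api:9100/
"""
summary = summarize_poc_logs(sample_logs)
print(summary)
```

A non-zero batches_sent value means batches are being generated, even if delivery to the callback URL fails afterwards.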

What does a confirmation ratio of 0 mean, and what should I do if this happens?

A 0% confirmation ratio is an unusual condition and indicates that no nonces were sent from your API node during the epoch, meaning the node did not participate in Confirmation Proof-of-Compute (CPoC) at all. To investigate, check the API node logs and ML Node logs, as they should indicate why nonce submission did not occur.

Possible causes include:

API node misconfiguration or downtime
publicly exposed admin or management ports that allow access to ML Nodes
consensus node lagging behind the chain, which may delay PoC participation beyond the allowed window
ML Node driver failures

To mitigate this risk, ensure that admin and management ports are not publicly accessible, verify that the API node is running and correctly configured, monitor consensus node synchronization, and set up alerts for ML Node and driver failures.

Performance & troubleshooting

How do I protect my node from DDoS attacks using the proxy pre-release (v0.2.8)?

A new proxy version is available with rate limiting and DDoS protection measures.

What’s New:

Rate limiting on API/RPC endpoints, as protection against excessive requests that have been affecting network nodes
Blocks resource-intensive internal routes like training and poc-batches
Optional disabling of /chain-api, /chain-rpc, and /chain-grpc endpoints

Update instructions

Step 1: Update proxy image

sed -i -E 's|(image:[[:space:]]*ghcr.io/product-science/proxy)(:.*)?$|\1:0.2.8-pre-release-proxy@sha256:6ccb8ac8885e03aab786298858cc763a99f99543b076f2a334b3c67d60fb295f |' docker-compose.yml

Important

Step 2 disables /chain-api, /chain-rpc, and /chain-grpc endpoints on this node. After applying it, this node will no longer serve public RPC traffic. If you operate public RPC endpoints, you must run separate RPC-only nodes (without these restrictions) and keep this node private.

Step 2 (Optional): Disable chain-api, chain-rpc, and chain-grpc

If you want to completely disable /chain-api, /chain-rpc, and /chain-grpc endpoints:

sed -i 's|DASHBOARD_PORT=5173|DASHBOARD_PORT=5173\n      - DISABLE_CHAIN_API=${DISABLE_CHAIN_API:-true}\n      - DISABLE_CHAIN_RPC=${DISABLE_CHAIN_RPC:-true}\n      - DISABLE_CHAIN_GRPC=${DISABLE_CHAIN_GRPC:-true}\n|' docker-compose.yml

Disable the training URL that was used for recent attacks:

sed -i -E -e '/GONKA_API_(EXEMPT|BLOCKED)_ROUTES/d' -e 's|(- GONKA_API_PORT=9000)|\1\n      - GONKA_API_EXEMPT_ROUTES=chat inference\n      - GONKA_API_BLOCKED_ROUTES=poc-batches training|' docker-compose.yml

After this, your proxy configuration should look like:

proxy:
    container_name: proxy
    image: ghcr.io/product-science/proxy:0.2.8-pre-release-proxy@sha256:6ccb8ac8885e03aab786298858cc763a99f99543b076f2a334b3c67d60fb295f
    ports:
      - "${API_PORT:-8000}:80"
      - "${API_SSL_PORT:-8443}:443"
    environment:
      - NGINX_MODE=${NGINX_MODE:-http}
      - SERVER_NAME=${SERVER_NAME:-}
      - GONKA_API_PORT=9000
      - GONKA_API_EXEMPT_ROUTES=chat inference
      - GONKA_API_BLOCKED_ROUTES=poc-batches training
      - CHAIN_RPC_PORT=26657
      - CHAIN_API_PORT=1317
      - CHAIN_GRPC_PORT=9090
      - DASHBOARD_PORT=5173
      - DISABLE_CHAIN_API=${DISABLE_CHAIN_API:-true}
      - DISABLE_CHAIN_RPC=${DISABLE_CHAIN_RPC:-true}
      - DISABLE_CHAIN_GRPC=${DISABLE_CHAIN_GRPC:-true}

Step 3: Pull and restart proxy

docker compose -f docker-compose.mlnode.yml -f docker-compose.yml pull proxy
source ./config.env && docker compose -f docker-compose.mlnode.yml -f docker-compose.yml up -d --no-deps proxy

Step 4: Close External Port 26657

Closing port 26657 as an external port is optional, but highly recommended:

sed -i 's|- "26657:26657"|#- "26657:26657"|g' docker-compose.yml

This will comment out the port mapping in your node container:

node:
    container_name: node
    ...
    ports:
      - "5000:26656" #p2p
      #- "26657:26657" #rpc

Step 5: Restart the node:

source ./config.env && docker compose -f docker-compose.mlnode.yml -f docker-compose.yml up -d --no-deps node

Accessing Node Status After Closing Port 26657

If you previously accessed the node status using curl -s http://localhost:26657/status, you can now access it from within the containers:

Option 1: From the proxy container (using curl)

docker exec proxy curl -s node:26657/status | jq

Option 2: From the node container (using wget)

docker exec node wget -qO- http://localhost:26657/status | jq

For continuous monitoring with watch:

watch -n 5 'docker exec node wget -qO- http://localhost:26657/status | jq -r ".result.sync_info | \"Block: \(.latest_block_height) | Time: \(.latest_block_time) | Syncing: \(.catching_up)\""'
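
The same summary line can be produced in Python from a saved status response, which is convenient for alerting scripts. This sketch uses only the sync_info fields referenced by the jq filter above; the sample values are hypothetical.

```python
def sync_summary(status: dict) -> str:
    """Format the sync_info fields the same way as the jq filter above."""
    sync = status["result"]["sync_info"]
    # jq prints booleans in lowercase, so mirror that here.
    catching_up = str(sync["catching_up"]).lower()
    return (
        f"Block: {sync['latest_block_height']} | "
        f"Time: {sync['latest_block_time']} | "
        f"Syncing: {catching_up}"
    )


# Hypothetical values for illustration.
sample_status = {
    "result": {
        "sync_info": {
            "latest_block_height": "1234567",
            "latest_block_time": "2025-11-20T10:00:00Z",
            "catching_up": False,
        }
    }
}
line = sync_summary(sample_status)
print(line)
```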

How much free disk space is required for a Cosmovisor update, and how can I safely remove old backups from the `.inference` directory?

Cosmovisor creates a full backup in the .inference state folder whenever it performs an update. For example, you can see a folder like data-backup-<some_date>. As of November 20, 2025, the size of the data directory is about 150 GB, so each backup will take approximately the same amount of space. To safely run the update, it is recommended to have 250+ GB of free disk space. You can remove old backups to free space, although in some cases this may still be insufficient and you might need to expand the server disk. To remove an old backup directory, you can use:

sudo su
cd .inference
ls -la   # view the list of folders. There will be folders like data-backup... DO NOT DELETE ANYTHING EXCEPT THESE
rm -rf <data-backup...>
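
Before an update, you can check free space and list backup folders without deleting anything. The sketch below is illustrative: the 250 GB threshold is the recommendation quoted above, the folder name used in the demo is hypothetical, and the helpers only read the filesystem.

```python
import shutil
import tempfile
from pathlib import Path

REQUIRED_FREE_GB = 250  # recommendation quoted in the text above


def list_backups(state_dir):
    """Return data-backup* directories inside the state folder, sorted by name."""
    return sorted(
        p for p in Path(state_dir).iterdir()
        if p.is_dir() and p.name.startswith("data-backup")
    )


def has_enough_free_space(state_dir, required_gb=REQUIRED_FREE_GB):
    """True if the filesystem holding state_dir has at least required_gb free."""
    return shutil.disk_usage(state_dir).free >= required_gb * 1024 ** 3


# Demo on a throwaway directory that mimics the .inference layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("data", "config", "data-backup-2025-11-20"):
        (Path(tmp) / name).mkdir()
    backup_names = [p.name for p in list_backups(tmp)]
    print(backup_names)
    print(has_enough_free_space(tmp, required_gb=0))
```

On a real host you would pass the path to the .inference directory and review the listed folders before removing any of them manually.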

How to prevent unbounded memory growth in NATS?

NATS is currently configured to store all messages indefinitely, which leads to continuous growth in memory usage. A recommended solution is to configure a 24-hour time-to-live (TTL) for messages in both NATS streams.

Install the NATS CLI. First install Go by following the instructions at https://go.dev/doc/install, then install the NATS CLI:
```
go install github.com/nats-io/natscli/nats@latest
```

With the NATS CLI installed, run the following to inspect both streams:

nats stream info txs_to_send --server localhost:<your_nats_server_port>
nats stream info txs_to_observe --server localhost:<your_nats_server_port>

How to change `inference_url`?

You may need to update your inference_url if:

You changed your API domain;
You moved your API node to a new machine;
You reconfigured HTTPS / reverse proxy;
You are migrating infrastructure and want your Host entry to point to a new endpoint.

This operation does not require re-registration, re-deployment, or key regeneration. Updating your inference_url is performed through the same transaction used for initial registration (the submit-new-participant msg).

The chain logic checks whether your Host (participant) already exists:

If the participant does not exist, the transaction creates a new one;
If the participant already exists, only three fields may be updated: InferenceURL, ValidatorKey, WorkerKey.

All other fields are preserved automatically.

This means updating inference_url is a safe, non-destructive operation.
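
The create-or-update behavior described above can be modeled as follows. This is an in-memory illustration only, not the chain code: the dictionary-based store and the function name are hypothetical, and how the chain treats omitted fields is not covered here.

```python
UPDATABLE_FIELDS = ("InferenceURL", "ValidatorKey", "WorkerKey")


def submit_new_participant(participants: dict, address: str, msg: dict) -> dict:
    """Create the participant if missing; otherwise update only the three allowed fields."""
    existing = participants.get(address)
    if existing is None:
        participants[address] = dict(msg)
        return participants[address]
    for field in UPDATABLE_FIELDS:
        if field in msg:
            existing[field] = msg[field]
    return existing


# Hypothetical participant record; "Weight" stands in for any preserved field.
store = {"gonka1example": {"InferenceURL": "http://old.example:8000", "Weight": 10}}
updated = submit_new_participant(
    store, "gonka1example",
    {"InferenceURL": "http://new.example:8000", "Weight": 999},
)
print(updated)
```

In this model the attempted change to Weight is ignored, which mirrors the statement that all other fields are preserved.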

Note

When a Node updates its execution URL, the new URL becomes active immediately for inference requests coming from other Nodes. However, the URL recorded in ActiveParticipants is not updated until the next epoch because modifying it earlier would invalidate the cryptographic proof associated with the participant set. To avoid service disruption, it is recommended to keep both the previous and the new URLs operational until the next epoch completes.

[LOCAL] Perform the update locally, using your Cold Key:

./inferenced tx inference submit-new-participant \
    <PUBLIC_URL> \
    --validator-key <CONSENSUS_KEY> \
    --keyring-backend file \
    --unordered \
    --from <COLD_KEY_NAME> \
    --timeout-duration 1m \
    --node http://<node-url>/chain-rpc/ \
    --chain-id gonka-mainnet

Verify the update by opening the link below, replacing the final path segment with your node address: http://node2.gonka.ai:8000/chain-api/productscience/inference/inference/participant/gonka1qqqc2vc7fn9jyrtal25l3yn6hkk74fq2c54qve

Why is my `application.db` growing so large, and how do I fix it?

Some nodes experience unbounded growth of application.db.

.inference/data/application.db stores the history of chain states (not blocks). By default, it keeps the state for the last 362880 blocks.

The state history contains a full Merkle tree for each state, and it is safe to keep it for a significantly shorter window, for example only 1000 blocks.

The pruning parameters can be set in .inference/config/app.toml:

...
pruning = "custom"
pruning-keep-recent = "1000"
pruning-interval    = "100"
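
To double-check what is set in app.toml and how much shorter the retained history becomes, a few lines of Python suffice. This sketch uses a simple line parser rather than a full TOML reader; pruning_settings is a hypothetical helper, and 362880 is the default quoted above.

```python
DEFAULT_KEEP_RECENT = 362880  # default number of retained states quoted above


def pruning_settings(app_toml_text: str) -> dict:
    """Collect pruning* keys from app.toml text, ignoring comments."""
    settings = {}
    for raw_line in app_toml_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("pruning"):
            settings[key] = value.strip('"')
    return settings


sample_app_toml = '''
pruning = "custom"
pruning-keep-recent = "1000"
pruning-interval    = "100"
'''
settings = pruning_settings(sample_app_toml)
reduction = DEFAULT_KEEP_RECENT / int(settings["pruning-keep-recent"])
print(settings)
print(f"retained history is about {reduction:.0f}x shorter than the default")
```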

The new configuration takes effect after the node container restarts. However, even with pruning enabled, cleaning the database is very slow.

There are several ways to reset application.db:

OPTION 1: Full resync from snapshot
OPTION 2: Resync from local snapshot
OPTION 3: Experimental
OPTION 4: Upgrade to the pruning fix

OPTION 1: Full resync from snapshot

1) Stop node

docker stop node

2) Remove data

sudo rm -rf .inference/data/ .inference/.node_initialized
sudo mkdir -p .inference/data/

3) Start node

docker start node

This approach may take some time during which the node will not be able to record transactions.

Please use available trusted nodes to download the snapshot.

OPTION 2: Resync from local snapshot

Snapshots are enabled by default and stored in .inference/data/snapshots

1) Prepare a new application.db (the node container is still running)

1.1) Prepare temporary home directory for inferenced

mkdir -p .inference/temp
cp -r .inference/config .inference/temp/config
mkdir -p .inference/temp/data/

1.2) Copy snapshots:

cp -r .inference/data/snapshots .inference/temp/data/

1.3) List snapshots

inferenced snapshots list --home .inference/temp

Copy height for the latest snapshot.

1.4) Start restoring from the snapshot (the node container is still running)

inferenced snapshots restore <INSERT_HEIGHT> 3  --home .inference/temp

This might take some time. Once it is finished, you'll have a new application.db in .inference/temp/data/application.db

2) Replace application.db with new one

2.1) Stop node container (from another terminal window)

docker stop node

2.2) Move original application.db

mv .inference/data/application.db .inference/temp/application.db-backup
mv .inference/wasm .inference/wasm.db-backup

2.3) Replace it with new one

cp -r .inference/temp/data/application.db .inference/data/application.db
cp -r .inference/temp/wasm .inference/wasm

2.4) Start node container (from another terminal window):

docker start node

3) Wait until the node container is synchronized, then delete .inference/temp/

If you have several nodes, it is recommended to clean them one by one.

OPTION 3: Experimental

An additional option is to start a separate instance of the node container on a separate CPU-only machine and set it up in strict validator mode:

preserve really short history
limit RPC and API access only to api container

Once it's running, move the existing tmkms volume to the new node (disable block signing on the existing one first).

This is the general idea of the approach. If you decide to try it and have any questions, feel free to reach out on Discord.

OPTION 4: Upgrade to the pruning fix

A fix is now available for the long-standing issue where application.db continues to grow under many pruning configurations. This improvement was contributed by Lelouch33 and is included in release 0.2.10-post6. With the updated logic and the following settings, application.db can remain around 100 GB:

SNAPSHOT_INTERVAL=1000
SNAPSHOT_KEEP_RECENT=2
pruning-keep-recent = "20000"
pruning-interval = "512"

After upgrading to this binary, pruning will begin after the next snapshot block. This process is relatively heavy and may temporarily slow down the node container while the old state history is being removed.

To reduce operational impact, it is recommended to apply the update to nodes one by one and use a higher pruning-interval, such as 512, to avoid pruning too frequently.

If a node slows down significantly during pruning, restarting the node container may help it catch up.

Applying this update before the upcoming v0.2.11 upgrade is recommended to prevent pruning from starting simultaneously across many nodes.

Apply update (example from v0.2.7, which has identical inferenced):

# Pre-check: Ensure no confirmation PoC is active (fails entire script if not false)
echo "--- Pre-flight Check: Confirmation PoC Status ---" && \
CONFIRMATION_POC_ACTIVE=$(curl -sf "https://node3.gonka.ai/v1/epochs/latest" | jq -r '.is_confirmation_poc_active') && \
[ "$CONFIRMATION_POC_ACTIVE" = "false" ] && \
echo "OK: No confirmation PoC active" && \

sudo rm -rf inferenced.zip .inference/cosmovisor/upgrades/v0.2.10-post7/ .inference/data/upgrade-info.json  && \
sudo mkdir -p  .inference/cosmovisor/upgrades/v0.2.10-post7/bin/  && \
wget -q -O  inferenced.zip 'https://github.com/gonka-ai/gonka/releases/download/release%2Fv0.2.10-post7/inferenced-amd64.zip' && \
echo "5ed8941d50779fa2359a9745263b324b887465104f81073827321945ab1f392a  inferenced.zip" | sha256sum --check && \
sudo unzip -o -j  inferenced.zip -d .inference/cosmovisor/upgrades/v0.2.10-post7/bin/ && \
sudo chmod +x .inference/cosmovisor/upgrades/v0.2.10-post7/bin/inferenced && \
echo "Inference Installed and Verified"  && \

# Link Binary
echo "--- Final Verification ---" && \
sudo rm -rf .inference/cosmovisor/current  && \
sudo ln -sf upgrades/v0.2.10-post7 .inference/cosmovisor/current  && \
echo "d9093b225cbd531afc56c99d0b0996b1fa2896c0745cd73293f0de08132f7754 .inference/cosmovisor/current/bin/inferenced" | sudo sha256sum --check && \

# Restart 
source config.env && docker compose up node --no-deps --force-recreate -d
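
The checksum step in the script can also be done in Python, for example on a machine without sha256sum. This is a minimal sketch using hashlib; the file written in the demo is a stand-in and not a real release archive, and matches_checksum is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file digest with the published checksum, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo with a stand-in file; use the published checksum for a real download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "inferenced.zip"
    sample.write_bytes(b"not a real release archive")
    expected = hashlib.sha256(b"not a real release archive").hexdigest()
    ok = matches_checksum(sample, expected)
    bad = matches_checksum(sample, "0" * 64)
    print(ok, bad)
```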

Automatic `ClaimReward` didn’t go through, what should I do?

If you have an unclaimed reward, execute:

curl -X POST http://localhost:9200/admin/v1/claim-reward/recover \
    -H "Content-Type: application/json" \
    -d '{"force_claim": true, "epoch_index": 106}'

To check whether you have an unclaimed reward, you can use:

curl http://node2.gonka.ai:8000/chain-api/productscience/inference/inference/epoch_performance_summary/106/<ACCOUNT_ADDRESS> | jq

Upgrades

Upgrade v0.2.14: Pre-upgrade Bridge Update

To help keep the Ethereum bridge stable during the mainnet upgrade, update the bridge image to 0.2.14-post3 in advance. If you have multiple network nodes, please update them one by one. Please make sure to perform this step outside of PoC or cPoC.

Run all commands from deploy/join (where docker-compose.yml and .dapi/ are).

Update bridge image to 0.2.14-post3

  bridge:
    container_name: bridge
    image: ghcr.io/product-science/bridge:0.2.14-post3

Restart bridge container

source config.env && docker compose up --force-recreate bridge

Upgrade v0.2.12: Pre-Upgrade Model Cleanup

Important

This cleanup process must be completed before the upgrade happens. If you upgrade before cleaning up the models, your node will be rejected and go offline.

Version 0.2.12 removes every governance model that is not on the post-upgrade approved list. On mainnet, only the previously enforced model and Kimi will remain.

Each DAPI persists its MLNode configurations locally. On startup, it validates every configured model against the on-chain governance list. If a configuration includes at least one unsupported model, the entire node is rejected and the host goes offline.

Version 0.2.11 masked this problem by trimming the runtime view down to the enforced model, so /admin/v1/nodes appeared clean even when the persisted config still contained extra models. Version 0.2.12 stops this trimming, meaning the persisted config is loaded directly.

To fix this, the script below finds each node with extra models in /admin/v1/config and sends a PUT request with a cleaned config to /admin/v1/nodes/<id>. These changes are persisted within 60 seconds. The remaining model's arguments, hardware, and ports are preserved exactly. Nodes that do not list the enforced model are skipped and will require manual fixing.

Paste the following script into the host's shell. By default, it will apply the changes. To preview the changes without applying them, set APPLY=dry (or any value other than --apply).

Script in the repository: Bash, Python.

ADMIN=${ADMIN:-http://127.0.0.1:9200}
KEEP=${KEEP:-Qwen/Qwen3-235B-A22B-Instruct-2507-FP8}
APPLY=${APPLY:-"--apply"}

curl -sS "$ADMIN/admin/v1/config" | jq -r --arg k "$KEEP" '
  .nodes[] | "\(.id): " + (
    if (.models | has($k) | not) then "skip (\(.models | keys))"
    elif (.models | length) == 1 then "ok"
    else "\(.models | keys) -> [\($k)]" end)'

if [[ "$APPLY" == "--apply" ]]; then
  curl -sS "$ADMIN/admin/v1/config" \
    | jq -c --arg k "$KEEP" \
        '.nodes[] | select((.models | has($k)) and (.models | length > 1)) | .models = {($k): .models[$k]}' \
    | while IFS= read -r p; do
        id=$(jq -r .id <<<"$p")
        curl -sS -f -X PUT -H 'Content-Type: application/json' -d "$p" \
          "$ADMIN/admin/v1/nodes/$id" >/dev/null && echo "$id: updated"
      done
  echo "done; persisted within 60s"
else
  echo "preview only; rerun without APPLY=dry to commit"
fi
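The per-node decision the script makes can also be followed outside of jq. The sketch below is an illustration only, not the repository's Python script: the function name `plan_cleanup` is hypothetical, the config shape (a `nodes` list whose entries have `id` and a `models` object keyed by model name) is inferred from the jq filters above, and the `args` values are made-up example data.

```python
import copy

def plan_cleanup(config, keep):
    """Classify nodes and build cleaned payloads, mirroring the jq filters.

    Returns (report, payloads): report maps node id -> "skip" | "ok" | "clean";
    payloads are the node objects that would be PUT to /admin/v1/nodes/<id>.
    """
    report, payloads = {}, []
    for node in config.get("nodes", []):
        models = node.get("models", {})
        if keep not in models:
            report[node["id"]] = "skip"    # enforced model missing: fix manually
        elif len(models) == 1:
            report[node["id"]] = "ok"      # already clean
        else:
            cleaned = copy.deepcopy(node)  # leave the input untouched
            cleaned["models"] = {keep: models[keep]}  # kept model's entry unchanged
            report[node["id"]] = "clean"
            payloads.append(cleaned)
    return report, payloads


KEEP = "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
# Made-up example config; real entries carry more fields (hardware, ports, ...).
example = {"nodes": [
    {"id": "node1", "models": {KEEP: {"args": ["--x"]}, "other/model": {"args": []}}},
    {"id": "node2", "models": {KEEP: {"args": []}}},
    {"id": "node3", "models": {"other/model": {"args": []}}},
]}
report, payloads = plan_cleanup(example, KEEP)
print(report)
```

Only nodes reported as "clean" produce a payload; "skip" nodes are the ones that need manual fixing.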

Wait 60 seconds after running the script to ensure the changes are persisted before triggering the upgrade. Then, verify the configuration:

curl -sS http://127.0.0.1:9200/admin/v1/config \
  | jq '.nodes[] | {id, models: (.models | keys)}'

Expected output:

{
  "id": "<nodeId>",
  "models": [
    "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
  ]
}

(Additional nodes will follow the same format)

Upgrade v0.2.12: Pre-download binaries

# 1. Create Directories
sudo mkdir -p .dapi/cosmovisor/upgrades/v0.2.12/bin \
              .inference/cosmovisor/upgrades/v0.2.12/bin && \

# 2. DAPI: Download -> Verify -> Unzip directly to bin -> Make Executable
wget -q -O decentralized-api.zip "https://github.com/gonka-ai/gonka/releases/download/release%2Fv0.2.12/decentralized-api-amd64.zip" && \
echo "d0143a95e12e1ada06cfea5e4d3deab13534c3523c967e9a6b87ac9f9bf3247d decentralized-api.zip" | sha256sum --check && \
sudo unzip -o -j decentralized-api.zip -d .dapi/cosmovisor/upgrades/v0.2.12/bin/ && \
sudo chmod +x .dapi/cosmovisor/upgrades/v0.2.12/bin/decentralized-api && \
echo "DAPI Installed and Verified" && \

# 3. Inference: Download -> Verify -> Unzip directly to bin -> Make Executable
sudo rm -rf inferenced.zip .inference/cosmovisor/upgrades/v0.2.12/bin/ && \
wget -q -O inferenced.zip "https://github.com/gonka-ai/gonka/releases/download/release%2Fv0.2.12/inferenced-amd64.zip" && \
echo "df7656503d39f6703767d32d5578d1291e32cb114844d8c1cd0f134d1bf4babd inferenced.zip" | sha256sum --check && \
sudo unzip -o -j inferenced.zip -d .inference/cosmovisor/upgrades/v0.2.12/bin/ && \
sudo chmod +x .inference/cosmovisor/upgrades/v0.2.12/bin/inferenced && \
echo "Inference Installed and Verified" && \

# 4. Cleanup and Final Check
rm decentralized-api.zip inferenced.zip && \
echo "--- Final Verification ---" && \
sudo ls -l .dapi/cosmovisor/upgrades/v0.2.12/bin/decentralized-api && \
sudo ls -l .inference/cosmovisor/upgrades/v0.2.12/bin/inferenced && \
echo "94ce943338d12844028e84fe770106c9d28d866cf0af99f27da30f56d69efa34 .dapi/cosmovisor/upgrades/v0.2.12/bin/decentralized-api" | sudo sha256sum --check && \
echo "642eb9858cd77d182f3e1c4d44553f5379d615983430e1fd8e85f09632af4271 .inference/cosmovisor/upgrades/v0.2.12/bin/inferenced" | sudo sha256sum --check

Bounty program

What is the bounty program? Who can participate? How are rewards paid?

It’s not necessary to be a Host to participate: anyone can report a security vulnerability or contribute fixes, improvements, and new features to the broader Gonka infrastructure.

There are two complementary tracks:

Security vulnerabilities are handled through Gonka’s official program on HackerOne. See How do I report a security vulnerability? below.
Protocol contributions (fixes, improvements, and new features) are proposed, reviewed, and validated by the community on GitHub, and rewards are paid out through a network upgrade in a stablecoin. See How do I contribute to protocol development? below.

How do I report a security vulnerability?

Gonka runs its security program on HackerOne. Submit all vulnerability reports through the official form at gonka.ai/docs/report-vulnerability rather than disclosing them in public issues, pull requests, or chats.

How rewards work on HackerOne:

Payment is made right after your report is triaged on HackerOne — it does not require you to also submit a fix.
Fixing the issue is rewarded separately, in addition to the report itself.
The authoritative severity model, reward amounts, categories, scope, and eligibility rules are all defined by the program on HackerOne. Always read the full program terms on HackerOne before submitting, as they take precedence over any summary here.

What is the vulnerability severity model?

The final severity classification and payout are determined by the Gonka program on HackerOne. The table below is provided only as a general guide to how severity is reasoned about.

A common way to think about severity is:

Risk = Impact × Likelihood

Impact is evaluated from a network perspective (a network-wide effect is required for High/Critical). Issues affecting only one participant typically cap at Low or Medium.

Impact levels

Level	Description	Examples
Critical	Catastrophic for the whole network	Full network control hijack
High	Significant disturbance at scale	Network crash/halt; theft from module; wrong rewards for all participants
Medium	Moderate disruption, limited scope	Consensus or reward integrity at risk; single-participant funds or availability
Low	Minor impact on isolated participants, no chain impact	Single-component issue with a minor effect on a single participant; non-chain

Likelihood

Organic — Unintentional; occurs under normal conditions. Estimate by probability (how often conditions trigger it, usage patterns).
Intentional — Profitable — Exploited for financial gain. Higher likelihood when gain is large and cost/complexity is low.
Intentional — Griefing — Exploited to cause disruption. Higher likelihood when network-wide effect and low cost; single-participant griefing → lower likelihood.

Risk Matrix

Impact \ Likelihood	High	Medium	Low
Critical	Critical	Critical	High
High	Critical	High	Medium
Medium	High	Medium	Low
Low	Medium	Low	Informational
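As a reading aid, the matrix above can be written as a simple lookup. This is a minimal sketch; the names `RISK_MATRIX` and `risk` are illustrative, and the final classification is still made by the program on HackerOne.

```python
# Rows are impact levels, columns are likelihood levels, as in the matrix above.
RISK_MATRIX = {
    "Critical": {"High": "Critical", "Medium": "Critical", "Low": "High"},
    "High":     {"High": "Critical", "Medium": "High",     "Low": "Medium"},
    "Medium":   {"High": "High",     "Medium": "Medium",   "Low": "Low"},
    "Low":      {"High": "Medium",   "Medium": "Low",      "Low": "Informational"},
}

def risk(impact, likelihood):
    """Return the risk level for a given impact and likelihood."""
    return RISK_MATRIX[impact][likelihood]

print(risk("High", "Medium"))  # High
print(risk("Low", "Low"))      # Informational
```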

How do I contribute to protocol development?

If you want to help develop the protocol (not report a security issue), the workflow is community-driven on GitHub:

Find or create work. Look for existing issues labeled up-for-grabs, or create your own issue and seek validation from the community that the work is worth doing. Before starting an existing issue, leave a quick comment that work has started and include an approximate ETA, so others have visibility and avoid duplicate effort.
Open a pull request. Ship a solid fix or implementation and open a PR against gonka-ai/gonka.
Gather community validation. Share the PR in the relevant developer channels and seek validation from the community so the change can be reviewed and included in a network upgrade.

How contribution rewards are paid: rewards for accepted contributions are paid out through a network upgrade in a stablecoin. As with all on-chain actions, the upgrade and its payments are subject to governance approval.

Where do I propose and discuss ideas for the protocol?

Publish your ideas as GitHub Discussions. Start with the welcome guide at Welcome to Proposals #795, which explains what belongs there and how to write a strong, structured proposal.
Gather feedback from the community across the channels where it is active — Telegram groups, other community groups, and the Gonka Discord. Please consolidate the key context back into GitHub Discussions so the full history stays searchable and in one place.

Where can I see the current protocol priorities?

The community-aligned Gonka Network Development Roadmap describes the strategic horizons, roadmap tracks, and current priorities for protocol development. Use it to understand what matters most right now and to align your contributions and proposals with the network’s direction.

Where can I see who was paid bounties, for what, and when?

For security bounties, the record lives in the Gonka program on HackerOne. For protocol contributions, the most reliable sources are on-chain records and GitHub. Use them as the main source of truth for who was paid, what the reward was for, and when it was executed.

Errors

`No epoch models available for this node`

Here you can find examples of common errors and typical log entries that may appear in node logs.

2025/08/28 08:37:08 ERROR No epoch models available for this node subsystem=Nodes node_id=node1
2025/08/28 08:37:08 INFO Finalizing state transition for node subsystem=Nodes node_id=node1 from_status=FAILED to_status=FAILED from_poc_status="" to_poc_status="" succeeded=false blockHeight=92476

It’s not actually an error. It just indicates that your node hasn’t been assigned a model yet. Most likely, this is because your node hasn’t participated in a Sprint, hasn’t received Voting Power, and therefore hasn’t had a model assigned. If your node has already passed PoC, you shouldn’t see this log anymore. If not, PoC takes place every ~24 hours.

How do I fix `err="no validator signing info found"` when starting from a state sync snapshot?

If you periodically hit err="no validator signing info found" during startup from a state sync snapshot, it is typically related to the Cosmos SDK iavl-fastnode behavior. A safe workaround is to disable fastnode for the initial startup, then (optionally) re-enable it after the node is fully synced.

Fix (Docker):

Stop the node:
```
docker stop node
```
In .inference/config/app.toml, set:
```
iavl-disable-fastnode = true
```
Start the node:
```
docker start node
```
After a restart, the issue should not recur.

Note

main includes v0.2.10-post6. Nodes starting from this version apply this setting automatically, so you typically won’t need to change it manually.

Inference

Why does the 4,096 output token limit cause the model to stall during thinking — returning zero tokens?

This applies to you if:

You see content=null and finish_reason=length.
The model is "silent" — usage shows tokens, but there's no text.
A probe request with max_tokens=100 returns nothing.

Fix-first: a working configuration for Kimi-K2.6

If you don't have time to dig in — copy this payload as a starting point. As of 2026-05-28 it worked on two public brokers; verify it's still current with your broker operator before using it.

{
  "model": "moonshotai/Kimi-K2.6",
  "messages": [
    {"role": "user", "content": "Write hello world in Python."}
  ],
  "max_tokens": 4096,
  "thinking": {"type": "disabled"},
  "thinking_token_budget": 0,
  "temperature": 0.2
}

Why these exact fields:

max_tokens: 4096 — give the model the entire available output budget. The effective cap on brokers right now is 3,072 (see Q3) — going higher is useless. Minimum 256, otherwise the gateway may force thinking_token_budget to zero.
thinking: {"type": "disabled"} — disables hidden thinking via a chat template hint.
thinking_token_budget: 0 — belt-and-suspenders: explicitly zeroes the budget at the generation parameter level (see Q2).
The model ID is case-sensitive: moonshotai/Kimi-K2.6 (capital K) on gonka-api.org, moonshotai/kimi-k2.6 (lowercase k) on gonkagate.com. Got a 404 — flip the case. Cross-check against the GET /v1/models response.

Ready-to-use curl (replace <broker> and the model ID case):

curl -sS https://<broker>/v1/chat/completions \
  -H "Authorization: Bearer $GONKA_API_KEY" \
  -H "Content-Type: application/json" \
  -d @payload.json

If it returned meaningful text — the problem is in your original payload; compare the fields one by one. If content=null — capture the id from the response and send it to the broker's support.
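The check described above can be automated on the client. The sketch below assumes the standard OpenAI-style chat completion shape (`choices[0].message.content`, `choices[0].finish_reason`, top-level `id`); the function name `diagnose` and the sample response are illustrative only.

```python
def diagnose(response):
    """Flag the 'silent' case (content=null with finish_reason=length)."""
    choice = response["choices"][0]
    content = choice["message"].get("content")
    silent = content is None and choice.get("finish_reason") == "length"
    # The id is what the broker's support needs to trace the request.
    return {"silent": silent, "id": response.get("id")}


# Made-up sample response illustrating the symptom.
sample = {
    "id": "devshard-7a4f-31b2",
    "choices": [{"message": {"role": "assistant", "content": None},
                 "finish_reason": "length"}],
    "usage": {"completion_tokens": 100},
}
result = diagnose(sample)
print(result)
```

If `silent` is true even with the fix-first payload, the returned `id` is the value to send to the broker.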

First check whether the rules are active on your broker

Gateway behavior depends on the broker and changes over time. Run this test:

curl https://<your-broker>/v1/chat/completions \
  -H 'content-type: application/json' \
  -H "Authorization: Bearer $GONKA_API_KEY" \
  -d '{
    "model": "moonshotai/Kimi-K2.6",
    "messages": [{"role": "user", "content": "one word"}],
    "max_tokens": 100
  }'

Gateway version	Expected result
`devshard ≥ 0.2.13` (force-zero-below-256 active)	`finish_reason="length"`, ~0–10 reasoning tokens
Older build	`finish_reason="length"`, ~40–60 reasoning tokens (default `max_tokens / 2`)

The rules below describe the recent gateway code (devshard ≥ 0.2.13). Your broker may not be updated yet. If you are not sure about the version, run the fix-first payload above: if it returns meaningful text, the gateway is recent enough; if not, send the response id to the broker's support and ask about an update.

What happens on the model and gateway side

Kimi-K2.6 specifics. The model emits <think>…</think> blocks. Both sections (<think> and visible content) consume max_tokens equally. With small max_tokens the model burns the entire budget inside <think> and returns only </think>, which vLLM strips as a special token → content=null, finish_reason=length. From the client side — "0 tokens."

Gateway rules for thinking_token_budget (PR #1202, devshard 0.2.13+):

Condition	What the gateway does
`max_tokens < 256`	`ttb = 0` (force-zero, overrides the client)
`ttb` not set, `max_tokens >= 256`	`ttb = max_tokens / 2`
`ttb` set by the client	uses the client's value
always	clamp: `ttb ≤ 96,000` and `ttb ≤ max_tokens − 64`

Additionally:

max_tokens floor → 16 (PR #1227) — previously max_tokens=1 reliably produced content=null. Now it's silently raised to 16.
thinking: {"type":"disabled"} mirror (PR #1224) — the gateway mirrors it into chat_template_kwargs.thinking=false. The Kimi chat template reads the kwarg.
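The resolver table and the max_tokens floor above can be sketched as one function. This is a sketch under stated assumptions, not the gateway's code: the function name is hypothetical, the order in which the rules are applied is inferred from the table, and `max_tokens / 2` is assumed to be integer division.

```python
def resolve_thinking_token_budget(max_tokens, client_ttb=None):
    """Approximate the documented gateway rules for thinking_token_budget."""
    max_tokens = max(max_tokens, 16)       # floor: tiny values are raised to 16
    if max_tokens < 256:
        return 0                           # force-zero, overrides the client
    if client_ttb is None:
        ttb = max_tokens // 2              # default (assumed integer division)
    else:
        ttb = client_ttb                   # client's value wins
    # Clamp: at most 96,000 and at least 64 tokens left for visible output.
    return max(0, min(ttb, 96_000, max_tokens - 64))


print(resolve_thinking_token_budget(100))        # 0
print(resolve_thinking_token_budget(4096))       # 2048
print(resolve_thinking_token_budget(4096, 0))    # 0
print(resolve_thinking_token_budget(300, 1000))  # 236
```

The second call shows why sending only a large `max_tokens` still leaves half of the budget to hidden thinking, and the third shows the effect of an explicit `thinking_token_budget: 0`.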

Scenarios that historically produced content=null (max_tokens=1, the probe-shape max=100, min=100, ttb=50) now return non-empty content through the recent gateway. On gonkagate.com (2026-05-25), max_tokens=100 without ttb returned ~50 reasoning tokens — force-zero-below-256 is not active there.

For Inference User:

Re-test against a broker with gateway ≥ 0.2.13 (release 2026-05-23+).
See zero tokens — capture the id from the response and send it to the broker. To extract it:

curl ... | jq .id

Format: devshard-<short>-<short>, e.g. devshard-7a4f-31b2. Where to send: the broker's support channel (for gonka-api.org, the support links on the site; for gonkagate.com, the /contact section).
Don't rely on thinking:disabled alone; to be safe, set thinking_token_budget: 0 explicitly (see Q2).

For Broker: on pre-0.2.13 — update per your validation / release cadence (no rush: clients on older versions and escrow rules require re-qualification). Until the update, clients apply the workaround above; after devshard-0.2.13 the zero-output content=null cases will disappear.

With Kimi K2, the entire token limit can be spent on thinking with no actual output. Is this an output cap, bandwidth, or upstream issue?

This is a gateway policy, not a model limitation. The thinking_token_budget resolver (PR #1202) allocates max_tokens / 2 for reasoning by default. For tool-heavy flows the budget burns out before any useful output. The mitigation is to explicitly set thinking_token_budget: 0 or thinking: {"type": "disabled"} (the gateway mirrors it into chat_template_kwargs via PR #1224). The model simply respects the budget.

Same cause as in Q1 — the model splits max_tokens between <think> and visible content. This is not bandwidth and not an output cap.

Two escape hatches

thinking: {"type": "disabled"} — the gateway mirrors it into chat_template_kwargs.thinking=false (the Kimi chat template reads the kwarg) and removes the top-level thinking. "adaptive" and "auto" are accepted (Claude Code CLI / Anthropic SDK preset, PR #1224) — both resolve to enabled.
thinking_token_budget: 0 — an explicit zero goes straight to vLLM as a generation parameter and reliably zeroes the thinking budget.

Important nuance: the mechanisms work at different levels (chat template hint vs. generation parameter) and don't overlap. thinking:disabled does NOT automatically zero thinking_token_budget — with the default max_tokens=4096 and only disabled, the model still gets a hidden ttb=2048 from the gateway resolver. In our tests Kimi respected thinking:disabled even on reasoning-heavy prompts. The model documentation (the planned docs/chat-api/kimi-k2.6.md) warns that in some reasoning scenarios the model may ignore the hint — we didn't reproduce it, but hedge anyway. Belt-and-suspenders: for critical flows, send both parameters together.

Numeric confirmation

The same bug-find prompt, max_tokens=500, the answer is identical in meaning:

Config	usage.completion_tokens	Wall-clock
`thinking: {"type":"disabled"}`	65	3.6s
default (gateway resolver → ttb = max_tokens/2 = 250)	312	12.5s

With the default resolver, roughly half of max_tokens goes to hidden thinking even for a trivial task (312 vs. 65 completion tokens here), hence the advice to disable thinking for tool-heavy / agentic flows.

For Inference User:

Tool-heavy / agentic flows without reasoning — "thinking": {"type": "disabled"} (Kimi) or "enable_thinking": false (Qwen, translated automatically).
Complex reasoning — set thinking_token_budget explicitly (don't rely on the default max_tokens / 2).
If thinking:disabled still causes burn on your prompt — duplicate it with thinking_token_budget: 0 explicitly.

For Broker: on pre-0.2.13 — update per cadence. Until the update, clients apply the workaround. On the landing page, note: Kimi for tool-heavy flows requires thinking:disabled, or an explicit thinking_token_budget, or a large max_tokens.

The input token cap for Kimi is 4k tokens, and the output cap is 8,192 tokens. When will these limits be raised?

The numbers in the question are incorrect

Output cap: 3,072 tokens on both tested brokers (they return finish_reason=length at exactly 3,072 even with max_tokens=8000).
Input: up to 240,000 tokens (--max-model-len on the mainnet Kimi deploy). Not 4,000.

Where the output cap comes from

The network ceiling in the code is 4,096 (RequestMaxTokensCap), but the effective limit is lower. The exact mechanism is a black box. Possible explanations (by likelihood, not confirmed from public code):

The gateway default DefaultRequestMaxTokens = 3,072 is not overridden by the broker operator.
The broker operator set request_max_tokens_cap = 3,072 per-model via an admin endpoint (POST /v1/admin/settings).
An upstream DAPI or host-side cap (e.g. vLLM --max-tokens-per-request or a loader constraint).

To know for sure — ask the broker for the request_max_tokens_cap value for each model.

How much fits in 3,072 tokens

Scenario	Fits in 3,072?
~1,900–2,200 words of regular English	yes
~600–800 lines of Python/JS	yes
A short answer (5–10 sentences)	yes
One tool call + moderate JSON (`arguments` ≤ 500 tokens)	yes
Small structured output (3–5 summary points)	yes
A long document summary (>10k source tokens)	no
Large code diffs (>2k lines)	no
3+ parallel tool calls in one response	no
Agentic loop: reasoning + tool calls + visible content at once	no

For the use cases marked "no" above, request a cap increase from the broker (see For Broker).

How to raise the cap

The output cap is controlled by the broker, not the network. To raise it — ask your broker: they can increase request_max_tokens_cap with a single admin call (no code changes). A network-wide bump above 4,096 requires a PR to the gateway code + a new release; you can initiate it through a GitHub Discussion on gonka-ai/gonka.

For the curious / operators: the blockchain stores per-model price parameters (coins_per_input_token, coins_per_output_token) and deploy parameters (model_args), but there's no field for a hard output limit — the relaxation is a local broker policy, not a governance-defined value.

Where the 240k input comes from

The mainnet Kimi-K2.6 deploy is registered via the on-chain governance proposal v0.2.12 (inference-chain/app/upgrades/v0_2_12/upgrades.go:kimiGovernanceModel()):

ModelArgs: ["--max-model-len","240000",
            "--tool-call-parser","kimi_k2",
            "--reasoning-parser","kimi_k2"]
VRam: 720 (GB)

The model card declares 256K native context. The gateway doesn't limit input separately, other than the universal body size (10 MiB) and message count (≤ 2,048) — the "Request limits" section in docs/chat-api/README.md (planned document).

Important caveat (open issue)

Even if the broker agrees to raise the output cap, individual nodes may be started with a smaller --max-model-len. The gateway routing layer does not account for per-host context capacity (issue #818). For large payloads (>50k), landing on a "small" node is systematic behavior, not transient randomness.

For Inference User:

The real output cap is determined by the broker — ask them for the request_max_tokens_cap value for each model.
Hitting a small input limit — this is almost certainly --max-model-len on a specific node, not a global limit. The routing layer doesn't account for per-host context (issue #818); for large payloads (>50k) this is a systematic problem. Workaround: retry or split the request into several API calls.
Hitting the output cap — ask the broker to raise it. A network-wide bump (above 4,096) is a code change; raise it through a GitHub Discussion on gonka-ai/gonka.

For Broker:

Raising the cap per-model is a single admin call via POST /v1/admin/settings with model_limits[].request_max_tokens_cap, no code change. It increases per-request escrow exposure and the risk of hitting the per-host --max-model-len (5xx on individual nodes). Raise it for specific models under proven demand, after verifying --max-model-len on all escrow nodes.
A network-wide bump (above 4,096) is a PR to the gateway code + a new release. If you see stable demand for large outputs, open a Discussion.

Agents like Hermes, OpenClaw with 30k+ system prompts fail on Kimi. Why?

In brief

The Kimi model accepts 30k+ input at the model and gateway levels, but stability depends on routing. The native window is 256K, the mainnet deploy uses --max-model-len 240000, and the gateway accepts a body of up to 10 MiB. Empirically, a single-shot ~69,000 prompt tokens (≈800 messages × 80 words) completed in 5.5 seconds. On sustained / repeated long requests (>50k) you'll encounter instability (issue #818) — on large payloads (215k) repeated attempts may fail with 503.

Sources for verification (all in gonka-ai/gonka)

Native context 256K — the model card in docs/chat-api/ (the exact filename is planned as part of the chat-api docs set).
Mainnet deploy params (on-chain) — inference-chain/app/upgrades/v0_2_12/upgrades.go:kimiGovernanceModel().
Body / message limits (10 MiB, ≤ 2,048 messages) — docs/chat-api/README.md (planned), the "Request limits" section.

When 30k does break — two typical causes

1. A single rejected field in the agent's payload. The gateway maintains a strict allowlist. If the agent sends even one non-standard field (tags, enforced_tokens, plugins, guided_json), the entire request is rejected with HTTP 400. The Hermes-specific tags rejection is documented at anchor #reject-tags in docs/chat-api/troubleshooting.md (planned). Empirically: a valid 69k payload + tags:["session:abc"] → HTTP 400 in 2 seconds.

2. Routing to a node with a smaller --max-model-len. The gateway routing layer doesn't account for the host's actual context size (issue #818; see also the planned known-issues.md §3). On very long payloads (>50k, especially >200k), landing on a "small" node is systematic behavior at the network level, not a client error: in our measurements, 0 of 5 requests at 215k succeeded. Such a request fails on the vLLM side.

A related builder request: issue #1229 (opened in May 2026), blockers for agentic scenarios — long reasoning chains, tool-call compatibility, continuation after exceeding output limits.

Quick self-diagnosis checklist

Remove the fields tags, enforced_tokens, plugins, strict, guided_json, guided_regex, guided_grammar, guided_choice one at a time. Resend the same request after each removal.
If none of the removals helped — check the schema depth in tools[].function.parameters (≤ 16) and the total number of nodes (≤ 256), see Q9.
The payload is clean and it still fails — this is network-level (issue #818). Workaround: retry or split the request.
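The first step of the checklist can be scripted. The sketch below is one possible reading of it (removals are cumulative, and the loop stops as soon as the request is no longer rejected). `send` is a caller-supplied function that posts the payload and returns the HTTP status code; it and `fake_send` are hypothetical stand-ins, not part of any Gonka client.

```python
SUSPECT_FIELDS = ["tags", "enforced_tokens", "plugins", "strict",
                  "guided_json", "guided_regex", "guided_grammar", "guided_choice"]

def find_rejected_fields(payload, send):
    """Remove suspect fields one at a time, resending after each removal.

    Returns (removed_fields, final_status). If the final status is no longer
    400, the last removed field is the likely culprit.
    """
    current = dict(payload)              # shallow copy; the original is kept
    removed = []
    status = send(current)
    for field in SUSPECT_FIELDS:
        if status != 400:
            break
        if field in current:
            del current[field]
            removed.append(field)
            status = send(current)
    return removed, status


def fake_send(payload):
    """Stand-in for a real HTTP call: rejects any payload containing 'tags'."""
    return 400 if "tags" in payload else 200

request = {"model": "moonshotai/Kimi-K2.6",
           "messages": [{"role": "user", "content": "hi"}],
           "tags": ["session:abc"]}
removed, status = find_rejected_fields(request, fake_send)
print(removed, status)  # ['tags'] 200
```

If every suspect field has been removed and the status is still 400, move on to the schema checks in the next step.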

For Inference User:

First check the payload against the allowlist in docs/chat-api/README.md (planned). Most Hermes / OpenClaw 400s are due to a single field or schema.
Generic broker messages like "upstream model provider rejected" are misleading: some brokers collapse specific gateway 400s into a generic message, some pass through the original ("Chat completions parameter \"tags\" is currently rejected by the Gonka network..." with a link to docs). The broker comparison — comparison-brokers.md (planned). If one broker shows a generic error — try another to get a readable message and understand the root cause.
The payload is clean and it still fails — network-level (issue #818). Workaround: retry or split; on sustained >50k payloads a single retry is often not enough — split.

For Broker:

(1) Show the native context window for each model (on the landing page, via the /v1/models endpoint, or in docs) with an explicit caveat that effective per-request capacity may be lower due to host heterogeneity (issue #818). Some brokers intentionally omit this to avoid over-promising — a defensible choice. (2) Until host-level capacity advertising is implemented — consider client-side filtering or a "preferred-host" list.
UX: the gateway returns specific 400s with field names and messages ("Chat completions parameter \"tags\" is currently rejected by the Gonka network..." + a link to docs). We recommend passing detailed messages through to clients in production — it speeds up diagnosis. Security note: detailed messages may reveal internal field names, host paths, and validator IDs that help enumeration or prompt-injection probes. Conservative masking is a defensible default. If you wrap them in a generic "upstream provider rejected" for security — use a hybrid approach: full details in async logs / error tracking, a generic message with a tracking ID to the client. The compatibility map for agents — docs/chat-api/agents.md (planned).

Why does Kimi generate malformed JSON for tool calls when output exceeds 4k–8k tokens?

This is neither a bandwidth problem nor a Gonka-side limitation. There are three overlapping causes.

(a) max_tokens truncation

The effective output cap on the tested brokers is 3,072 tokens; the gateway network ceiling is 4,096. When the assistant emits tool calls with large JSON blobs in arguments plus visible content, you can hit the broker's real cap and get truncated JSON. Details on per-broker override — Q3.

(b) Kimi-K2.6 tool-parser duplicate ID collision

[vLLM PR #21259 — UNVERIFIED]. With n > 1, the kimi_k2 parser recomputes history_tool_call_cnt inside the per-choice loop — both branches get id = functions.<name>:0. The gateway sees a duplicate ID in vLLM's response and rejects it with HTTP 400 (per the OpenAI spec). Anchor #reject-duplicate-tool-call-id in docs/chat-api/troubleshooting.md (planned). Upstream fix — vLLM PR #21259 (merge status not independently confirmed).

(c) Hermes tool-parser JSONDecodeError on multiple tool blocks

[vLLM #17790 — awaiting upstream fix]. Different parser, different problem: a JSONDecodeError when the model emits several tool-call blocks in one response — vLLM #17790. Related: <tool_call> inside <think> breaks hermes parsing — vLLM #42021. These don't depend on Gonka — awaiting an upstream fix.

For Inference User:

Rewrite tool_call.id on the client side before sending subsequent messages, into the canonical format functions.<name>:<global_idx> — the official Moonshot recommendation, duplicated in docs/chat-api/troubleshooting.md#reject-duplicate-tool-call-id (planned). An alternative is fresh UUIDs.
Don't dedup by id — two calls with the same id may contain different results. Losing them = losing the agent's work.
Raise max_tokens for responses with tool calls; large arguments blobs quickly hit the cap.
A generic broker error "upstream model provider rejected" usually means a gateway-side reject, not the model. First check the message and the ID for duplicates, then suspect the model (see broker differences in Q4).
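The client-side ID rewrite recommended above can be sketched as follows. It assumes OpenAI-style messages (`tool_calls[].id`, `tool_calls[].function.name`, and `tool_call_id` on tool messages), a global index starting at 0, and that tool results follow their calls in order. The function name and these choices are illustrative assumptions, not the official Moonshot implementation.

```python
import copy
from collections import defaultdict, deque

def canonicalize_tool_call_ids(messages):
    """Rewrite tool-call ids to functions.<name>:<global_idx> without dedup."""
    out = copy.deepcopy(messages)        # never mutate the caller's history
    pending = defaultdict(deque)         # old id -> new ids awaiting a result
    idx = 0
    for msg in out:
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                new_id = f"functions.{call['function']['name']}:{idx}"
                pending[call["id"]].append(new_id)
                call["id"] = new_id
                idx += 1
        elif msg.get("role") == "tool":
            queue = pending.get(msg.get("tool_call_id"))
            if queue:                    # assumption: results arrive in call order
                msg["tool_call_id"] = queue.popleft()
    return out


history = [
    {"role": "assistant", "tool_calls": [
        {"id": "functions.search:0", "function": {"name": "search", "arguments": "{}"}},
        {"id": "functions.search:0", "function": {"name": "search", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "functions.search:0", "content": "result A"},
    {"role": "tool", "tool_call_id": "functions.search:0", "content": "result B"},
]
fixed = canonicalize_tool_call_ids(history)
print([c["id"] for c in fixed[0]["tool_calls"]])
```

Both results survive with distinct IDs, which is the point of rewriting rather than deduplicating.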

For Broker:

If you are considering gateway-side dedup-by-id, note that two tool calls with the same ID may contain different results; it's safer to rewrite the ID into the canonical format functions.<name>:<global_idx> than to dedup. Document the pattern in the customer FAQ with a link to troubleshooting.md#reject-duplicate-tool-call-id. Security note: a naive dedup-by-id is an attack surface if not validated carefully. Canonicalizing the ID instead of removing the call is safer.
UX: pass through the specific gateway error message ("messages[N].tool_calls[M].id is duplicated") instead of a generic wrapper — it reduces time-to-fix for agentic clients. Security note: balance debug-friendliness and information disclosure — see Q4.

Could enabling guided decoding fix the token cap issue?

Guided decoding has nothing to do with the token cap. The mechanism constrains the model's output to a schema (JSON Schema, regex) but doesn't change the token limit. For the cap, see Q3.

The low-level vLLM fields guided_json, guided_regex, guided_grammar, guided_choice are rejected by the gateway with HTTP 400 (anchor #reject-guided-decoding in docs/chat-api/troubleshooting.md (planned)). The reason — they bypass the xgrammar bounds applied to the response_format / structured_outputs envelope to mitigate CVE-2025-48944.

The correct fields for structured output

Field	Kimi K2.6	Qwen3-235B	Notes
`response_format` (`type: "json_schema"` or `"json_object"`)	works	works	OpenAI standard. Reliable choice. Empirically verified on both models through a public broker.
`structured_outputs` envelope (`json`/`regex`/`choice`/`grammar`/`structural_tag`/`json_object`)	HTTP 400 (network-wide reject)	HTTP 400 (network-wide reject)	PR #1215 (`StructuredOutputsValidator`) merged in the repository, but not activated on production mainnet as of 2026-05-25. Both brokers reject with an identical error: `"Chat completions parameter \"structured_outputs\" is currently rejected by the Gonka network"`. The error references the dev branch `dl/devshards-gateway-to-main`, not main. This is a network-wide release lag, not per-broker. The only reliable structured-output option today is `response_format` on Kimi K2.6 and Qwen3.
Both at once (`response_format` + `structured_outputs`)	HTTP 400	HTTP 400 / 502 (depends on the broker)	The gateway rejects the combo before vLLM (anchor `#reject-structured_outputs-with-response_format`). On vLLM 0.20.0 the fields are merged via `dataclasses.replace()` and violate exactly-one in `StructuredOutputsParams.__post_init__`.

For Inference User:

Need maximum portability across brokers and models — use response_format (works everywhere). The structured_outputs envelope is currently rejected network-wide.
Don't combine response_format and structured_outputs in one request — HTTP 400.
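A request that follows both recommendations above might be built like this. The `response_format` layout is the OpenAI-standard `json_schema` form; the helper names, the example schema, and the `max_tokens` value are illustrative assumptions.

```python
import json

def build_structured_request(model, prompt, schema, name="result"):
    """Build a chat request that asks for JSON matching `schema`."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,  # illustrative; stay within your broker's cap
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "schema": schema},
        },
    }

def check_request(payload):
    """Refuse the combination the gateway rejects with HTTP 400."""
    if "response_format" in payload and "structured_outputs" in payload:
        raise ValueError("do not combine response_format and structured_outputs")
    return payload


schema = {"type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]}
payload = check_request(
    build_structured_request("moonshotai/Kimi-K2.6", "Name one city.", schema))
print(json.dumps(payload, indent=2))
```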

For Broker:

Guided decoding doesn't raise throughput. Don't promise it to clients as a solution for the token cap.
Watch for the rollout of PR #1215 (StructuredOutputsValidator) on all routes — Qwen3 users are already waiting for the structured_outputs envelope for regex / choice / grammar workloads.

Why does generation speed fluctuate so drastically? And why does the boost apply only to reasoning tokens?

Speed fluctuations are a real, known open problem. The roots are in three different layers.

1. Per-host slowdowns / stalls (host-level)

This is an open research task: issue #818 "Slow nodes investigation" (OPEN since February 2026, Priority: High). Specific patterns are recorded without a root cause in the planned known-issues.md: §1 "Host returns no stream after receipt" and §2 "Host stalls after producing chunks" (in some cases the stream resumes after a minute, in others never).

2. Routing variance (broker-level)

Between two consecutive requests the broker may land on different hosts with different loads. End-to-end latency varies depending on the devshard-XXXX-YYY host ID. Per-token generation speed on a stable host stays practically the same.[¹]

[¹] Illustrative observation: in one test (5 requests over ~30 sec) end-to-end latency varied such that tokens / total_latency showed a range of ~8–54 tok/s, but this metric includes TTFT and is not a published variance metric.
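The difference between the two metrics is simple arithmetic. The sketch below uses made-up numbers (not measurements) to show how the end-to-end figure can swing widely while per-token generation speed stays the same; the function name is illustrative.

```python
def throughput(completion_tokens, total_latency_s, ttft_s):
    """Return (end_to_end_tok_per_s, decode_tok_per_s).

    end_to_end includes time to first token (TTFT); decode excludes it.
    """
    end_to_end = completion_tokens / total_latency_s
    decode = completion_tokens / (total_latency_s - ttft_s)
    return end_to_end, decode


# Hypothetical requests: same generation speed, very different TTFT.
fast = throughput(200, 5.0, 1.0)    # (40.0, 50.0)
slow = throughput(200, 20.0, 16.0)  # (10.0, 50.0)
print(fast, slow)
```

When comparing hosts or brokers, measure TTFT separately so that queueing and routing delays are not mistaken for slow generation.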

3. Validation windows at the network level (chain-level)

During PoC / Confirmation-PoC events (cPoC: the phases that confirm validator work within an epoch) some nodes are temporarily unavailable. At epoch boundaries there was a known problem with the preserved-nodes snapshot, in which the gateway returned attempts: [] (no available hosts on the route), which the client sees as a timeout. The fewer nodes a broker has serving a given model, the more noticeable the effect, so it hits models with few providers hardest.

"Reasoning faster than visible" — not prioritization, but output structure

There's no special fast route for reasoning tokens on the gateway. In the devshard code, delta.reasoning, delta.content, delta.reasoning_content, delta.tool_calls are all detected the same way via sseChunkHasContent. Per-token speed is the same.

Kimi with thinking enabled first generates a bulky reasoning_content (hundreds to thousands of tokens), then a short visible answer (tens to hundreds). A client that doesn't show the reasoning field sees silence, followed by the answer arriving in a burst. In reality the model was generating the whole time; the output was just hidden.

For Inference User:

Choose a broker that publishes uptime / p50 TTFT metrics. Available dashboards include gonka.pw and meter.gonka.gg (there may be others; the list is not exhaustive).
On a slow request, consider the payload size. For short payloads, a retry usually lands on a different node. For sustained large payloads (>50k), landing on a node with a reduced window is a systematic problem (issue #818), and a retry alone may not help; splitting the payload is the better option.
Want to see progress while the model thinks — render delta.reasoning_content (or delta.reasoning) in the UI, e.g. in a collapsed block.
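
A minimal sketch of the last point, assuming the client has already parsed each streamed `data:` line into a dict in the usual OpenAI-compatible chunk shape. The function name `split_stream` is hypothetical; the field names delta.reasoning_content, delta.reasoning and delta.content are the ones mentioned above.

```python
def split_stream(chunks):
    """Accumulate reasoning text and visible text from parsed stream chunks."""
    reasoning, visible = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Either field name may carry reasoning text, depending on the model.
            thought = delta.get("reasoning_content") or delta.get("reasoning")
            if thought:
                reasoning.append(thought)   # e.g. render in a collapsed block
            text = delta.get("content")
            if text:
                visible.append(text)        # the answer shown to the user
    return "".join(reasoning), "".join(visible)


# Hypothetical chunks, as they might look after JSON-decoding the stream.
sample_chunks = [
    {"choices": [{"delta": {"reasoning_content": "Let me "}}]},
    {"choices": [{"delta": {"reasoning_content": "think."}}]},
    {"choices": [{"delta": {"content": "42"}}]},
    {"choices": [{"delta": {}}]},
]
thoughts, answer = split_stream(sample_chunks)
print(thoughts)
print(answer)
```

In a real UI the two lists would be rendered incrementally rather than joined at the end; the point is only that reasoning deltas are a separate stream of progress the client can choose to show.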

For Broker:

This is the highest-priority shared problem for the whole network. Contribute production logs / traces to issue #818; this gives the core team data they don't have.
Help implement host-side improvements (chunked gossip recovery, per-escrow lastAfterReq tracking — tracked in the planned host-improvements.md and related issues) — they directly address routing / recovery weak spots.

Why does speed vary depending on hardware — faster on B200, slower on H200?

Speed depends on hardware — this is normal for a heterogeneous network. The PoC weight on the chain reflects the node's real performance (affecting the validator's reward share), while the broker's routing locally picks an available host from escrow — two consecutive requests may land on GPUs of different generations.

For Inference User: speed depends on the hardware distribution in the network. You don't pick hardware directly — you pick a broker. Need predictable latency — ask the broker which hardware tier they route to by default.

For Broker:

Where exactly the difference comes from (per internal benchmarks from kaitakuai/experiments — not measured on gonka-api.org or gonkagate.com):

GPU	Memory	Compute capability (sm)	Qwen3-235B nonces/min per instance	Nonces/min per GPU
4×H100 SXM5	80 GB HBM3	90	1,248 @ batch=16	~312
4×H200	141 GB HBM3e	90	1,408 @ batch=32–64	~352
2×B200	192 GB HBM3e	100	1,984 @ batch=64	~992

H200 vs H100: +13% per-GPU. Same chip (sm_90), but HBM3e + 141 GB vs HBM3 + 80 GB → allows a smaller TP for large models and a faster KV cache.
B200/B300 vs H100/H200: ~3× per-GPU on Qwen3-235B FP8.
Kimi-K2.6 INT4 — specific numbers: 4×B200 gives 2,240 nonces/min = ~560 per-GPU (see experiments/2026-05/kimi_k26_int4_4xb200_q-int4-k2). 16×H100 TP gives 1,389 nonces/min = ~87 per-GPU (see experiments/2026-05/kimi-k26-int4-2x8xh100). The difference on a per-GPU basis is roughly 6×; in absolute numbers, per-GPU Kimi is slower than Qwen on the same hardware (4×B200 Kimi INT4 ~560 per-GPU vs Qwen ~992 per-GPU).
Kimi-K2.6 INT4 on Blackwell: VLLM_USE_FLASHINFER_MOE_INT4=1 gives +138% vs Marlin (A/B test in experiments/2026-05/kimi_k26_b300_eager_flashinfer). Applicable only to INT4 MoE workloads on the Blackwell family (kernel gate — is_device_capability_family(100), covers B100/B200/B300; B300 is effectively sm_103a).
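
The per-GPU figures and ratios quoted above follow from dividing each instance's nonces/min by its GPU count. The sketch below only reproduces that arithmetic from the numbers in this answer; the dictionary keys and the `ratio` helper are made up for illustration.

```python
# (nonces/min per instance, GPUs per instance), copied from the figures above.
RUNS = {
    "qwen_4xh100": (1248, 4),
    "qwen_4xh200": (1408, 4),
    "qwen_2xb200": (1984, 2),
    "kimi_4xb200": (2240, 4),
    "kimi_16xh100": (1389, 16),
}

per_gpu = {name: nonces / gpus for name, (nonces, gpus) in RUNS.items()}


def ratio(a, b):
    """How many times faster run `a` is than run `b` on a per-GPU basis."""
    return per_gpu[a] / per_gpu[b]


for name, value in per_gpu.items():
    print(f"{name}: {value:.0f} nonces/min per GPU")
print(f"H200 vs H100 (Qwen): {ratio('qwen_4xh200', 'qwen_4xh100'):.2f}x")
print(f"B200 vs H100 (Qwen): {ratio('qwen_2xb200', 'qwen_4xh100'):.2f}x")
print(f"B200 vs H100 (Kimi): {ratio('kimi_4xb200', 'kimi_16xh100'):.2f}x")
```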

Tracing and diagnostics: observability was merged in PR #1046 "Implement dapi & devshard observability" — it adds OpenTelemetry traces, Prometheus metrics, and dashboards. If Grafana has no per-host TTFT panels — check that DAPI / devshard are updated and the dashboards are included in the build.

Additional sources: the repo kaitakuai/experiments (updated regularly), your own per-host stats from gonka.pw, and network status from meter.gonka.gg. Want to influence the hardware distribution — scale devshard escrow toward hosts with the preferred GPUs.

Why can't the model use tools properly within Kilo Code?

Most likely one of the causes below: the gateway applies a strict parameter allowlist and tight caps on the JSON Schema. None of this is Kilo-specific; the same causes affect any coding agent (Cline, Continue.dev, OpenCode, etc.).

1. Hard reject (HTTP 400) — needs to be fixed on the client side

Trigger	Cause	Fix
The `tags` field in the payload	Not part of the OpenAI Chat Completions standard; an informal Hermes convention; anchor `#reject-tags`	Use `metadata` (OpenAI standard) or `user` for tracking
Schema depth > 16 in `tools[].function.parameters`	CVE-driven cap	Flatten the schema; PR #1187 raised the cap from 5 → 16
Schema nodes > 256 (total)	CVE-driven cap	Reduce the node count; PR #1195 raised the cap from 128 → 256. MCP tools with large input schemas may approach the limit; test on your gateway. If you genuinely need an MCP tool with >256 nodes, file a feature request.

2. Silent coerce / strip — the request doesn't fail, but behavior changes

Trigger	What the gateway does	Notes
`tool_choice: "required"`	Silently → `"auto"` (network policy)	Anchor `#coerce-tool-choice-required`. In most cases the model will make a tool call for an obviously tool-relevant prompt, but there's no "required" guarantee
`tools[].function.strict: true`	Silently drops the field	vLLM parsers (`hermes`, `kimi_k2`) ignore the flag. PR #1193

The compat matrix for known clients: docs/chat-api/agents.md (planned). A basic working tool-calling example: Developer Quickstart §1.4.

For Inference User:

Reproduce with the same curl that Kilo Code generates (via the client's debug log or an intermediate proxy). In the 400 body the gateway usually states the name of the rejected field; the broker may mask the message into a generic "upstream rejected", but usually a single field is at fault.
Cross-check against the lists in agents.md and troubleshooting.md (planned): most 400s fall into the documented reject anchors (#reject-tags, #reject-enforced_tokens, #reject-structured_outputs-kimi).
Quick checklist if the error message is unclear: check the fields tags, enforced_tokens, plugins, strict, guided_*; remove them one at a time and resend the request. If that doesn't help, check the schema depth (≤ 16) and nodes (≤ 256).
If the rejected field is not documented, open an issue on gonka-ai/gonka with the captured request.
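
For the schema part of the checklist, a rough client-side pre-check can flag tools that are likely to exceed the caps. This is a sketch under a loud assumption: this FAQ does not specify how the gateway counts depth and nodes, so the code counts every JSON object and array as one node and one nesting level, which is probably stricter than the real validator. The function names are hypothetical.

```python
def schema_stats(node, depth=1):
    """Return (max_depth, node_count), counting every dict and list as a node.

    ASSUMPTION: the gateway's exact counting rules are not documented here;
    this deliberately conservative count may overestimate both values.
    """
    if not isinstance(node, (dict, list)):
        return 0, 0                       # scalars are not counted
    max_depth, nodes = depth, 1
    children = node.values() if isinstance(node, dict) else node
    for child in children:
        child_depth, child_nodes = schema_stats(child, depth + 1)
        max_depth = max(max_depth, child_depth)
        nodes += child_nodes
    return max_depth, nodes


def oversized_tools(tools, max_depth=16, max_nodes=256):
    """List (name, depth, nodes) for tools whose parameters exceed either cap."""
    flagged = []
    for tool in tools:
        fn = tool.get("function", {})
        depth, nodes = schema_stats(fn.get("parameters", {}))
        if depth > max_depth or nodes > max_nodes:
            flagged.append((fn.get("name"), depth, nodes))
    return flagged


small = {"type": "object", "properties": {"a": {"type": "string"}}, "required": ["a"]}
deep = {"type": "string"}
for _ in range(20):                       # build an artificially deep schema
    deep = {"type": "object", "properties": {"x": deep}}

tools = [
    {"type": "function", "function": {"name": "ok_tool", "parameters": small}},
    {"type": "function", "function": {"name": "deep_tool", "parameters": deep}},
]
print(schema_stats(small))
print(oversized_tools(tools))
```

A tool flagged here is a candidate for flattening before sending; a tool that passes this check may still be rejected if the gateway counts differently, so the 400 body remains the source of truth.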

For Broker:

If your dashboard has no link to agents.md, adding one is a cheap quick win.
If you have capacity, file issues about non-standard fields in gonka-ai/gonka; it helps every broker in the ecosystem.

Agents like Hermes and OpenClaw fail to complete tool tasks on Kimi. Why?

A combination of three factors.

The original FAQ version mentioned a fourth, the special-token sanitizer, but that concerns security and prompt injection rather than tool-call failures; the fix PR is deferred, since Kimi handles special tokens correctly (empirically).

The gateway allocates half of max_tokens for thinking by default (see Q1/Q2). With the default thinking_token_budget = max_tokens / 2, half of the budget goes to <think> before the model even starts emitting a tool call, so for tool-heavy agentic flows the budget runs out before any useful output. Mitigation: set thinking_token_budget: 0 explicitly (Q2). This is a gateway policy, not a model limitation.
The output cap 3,072 (effective) / 4,096 (network ceiling) is tight for tool-heavy outputs (Q3). Large arguments blobs + visible content easily hit the ceiling.
Upstream vLLM tool-parser bugs (Q5): duplicate tool_calls[].id collisions with n>1 (vLLM PR #21259 — UNVERIFIED) and the hermes parser JSONDecodeError on multiple tool blocks (vLLM #17790).

Builders report this pain point in issue #1229, which lists long reasoning chains, tool-call compatibility, and continuation after exceeding output limits as blockers of agentic coding workflows.

For Inference User:

For Kimi this is mandatory: "thinking": {"type": "disabled"} + "max_tokens": 4096 (or an explicit thinking_token_budget: 0, see Q2 on belt-and-suspenders). This frees the entire cap for tool-heavy output. Empirically: Kimi easily emits 5 parallel tool calls in one response in ~4 seconds.
Control tool_call.id on the client side — rewrite it into the canonical format functions.<name>:<global_idx> (Q5) to avoid the gateway duplicate-id reject.
Control the schema — keep depth ≤ 16 and nodes ≤ 256 (Q9). MCP tools with large input schemas may not pass.
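
The id rewrite can be done on the message history before each request. The sketch below is one possible approach, not the gateway's or any agent's actual code. It assumes that <global_idx> is a zero-based counter over all tool calls in the conversation, and that when ids collide, tool results appear in the same order as the calls they answer. The function name is hypothetical.

```python
import copy
from collections import defaultdict, deque


def canonicalize_tool_ids(messages):
    """Return a copy of `messages` with ids rewritten to functions.<name>:<idx>.

    ASSUMPTION: <idx> is a zero-based counter across the whole conversation.
    """
    out = copy.deepcopy(messages)
    pending = defaultdict(deque)   # old id -> queue of new ids awaiting results
    idx = 0
    for msg in out:
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                new_id = f"functions.{call['function']['name']}:{idx}"
                pending[call["id"]].append(new_id)
                call["id"] = new_id
                idx += 1
        elif msg.get("role") == "tool":
            queue = pending.get(msg.get("tool_call_id"))
            if queue:
                # Duplicate old ids are matched to results in call order.
                msg["tool_call_id"] = queue.popleft()
    return out


history = [
    {"role": "user", "content": "Weather and time in Paris?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{}"}},
        {"id": "call_1", "type": "function",   # colliding id
         "function": {"name": "get_time", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "sunny"},
    {"role": "tool", "tool_call_id": "call_1", "content": "12:00"},
]
fixed = canonicalize_tool_ids(history)
print([c["id"] for c in fixed[1]["tool_calls"]])
print([m["tool_call_id"] for m in fixed[2:]])
```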

For Broker:

Combine the cap bump (Q3 — per-model request_max_tokens_cap via /v1/admin/settings) with the recommendations above — it covers the main class of agent failures on your gateway.

OpenCode cannot apply requested code changes (cuts off mid-sentence). What is causing this?

Three causes; the client can work around two, but not the third.

max_tokens truncation on large diffs. Large code patches don't fit in the effective cap of 3,072 (Q3). Workaround: split the diff into several tool calls — the model fits the budget more easily on each.
vLLM crashes on edge-case params. A series of 8 merged PRs (#1170, #1171, #1172, #1174, #1180, #1212, #1215, #1216) added hardening against fields that crashed the engine. On a recent gateway (≥ devshard 0.2.13), most known crash scenarios are rejected by validators with HTTP 400 instead of crashing the engine.
Host stream drops after receipt (open — described in the planned known-issues.md §1) — the host accepted the request but doesn't return chunks. This is network-level with no client workaround other than retry.
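
Since retry is the only client-side option for the third cause, a small wrapper keeps it bounded. This is a generic sketch: `send` is a hypothetical callable that performs one request and raises TimeoutError when no chunk arrives within the client's own read timeout; the attempt count and delays are arbitrary illustrative values.

```python
import time


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns; retry on TimeoutError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return send()
        except TimeoutError as err:
            last_error = err
            if attempt < max_attempts - 1:
                # Wait 1x, 2x, 4x ... base_delay before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Simulated host that stalls twice, then answers.
state = {"calls": 0}

def flaky_send():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("no chunk received")
    return "ok"

delays = []
result = call_with_retry(flaky_send, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter is only there to make the wrapper testable without real waiting; in production the default `time.sleep` applies.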

For Inference User:

For Kimi: "thinking": {"type": "disabled"} + "max_tokens": 4096. Split large diffs into several tool calls.
Long-term: Q3 on the broker cap and Q5 on the tool-call canonical id format.

For Broker: document the "split big diffs" pattern in the customer FAQ for coding-agent clients.

Is there a model that handles both input and output without trade-offs?

MiniMax-M2.7 launched on mainnet ~2026-05-28 via the chain governance upgrade v0.2.13 — Gonka's third model. Verified live on both brokers. Clarification: "Qwen output cap 8,192" in the question is inaccurate — the output cap is the same for all models (3,072 / 4,096, Q3), not model-side.

Model	Native context	Mainnet	Native thinking	Tool calls
Kimi-K2.6	256K	240K	yes (chat_template_kwargs)	`functions.<name>:<idx>`
Qwen3-235B-A22B-Instruct-2507-FP8	128K	240K	no (Instruct)	hermes parser
MiniMax-M2.7	~180K	180K	yes (`<think>` in content)	`chatcmpl-tool-<hash>`

MiniMax deploy spec (inference-chain/app/upgrades/v0_2_13/upgrades.go:minimaxGovernanceModel()):

ModelArgs: ["--enable-auto-tool-choice", "--kv-cache-dtype", "fp8",
            "--tool-call-parser", "minimax_m2",
            "--reasoning-parser", "minimax_m2_append_think"]
VRam: 320 GB         ThroughputPerNonce: 5000 (Kimi 1500 — MiniMax ×3.3 higher)
minimaxStartEpoch: 271
HfCommit: d494266a4affc0d2995ba1fa35c8481cbd84294b

Important differences of MiniMax from Kimi/Qwen:

<think> blocks in delta.content (not in reasoning_content like Kimi) — behavior of the minimax_m2_append_think parser. Parse the tags client-side if you don't need them in the final text.
Tool-call IDs chatcmpl-tool-<hash> — already unique by shape, so the Q5 advice about canonical id rewriting doesn't apply.
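
One way to drop the <think> blocks client-side is sketched below. It operates on the fully assembled content string rather than on individual deltas, because a tag can be split across chunks. It also assumes that a response truncated mid-reasoning leaves an unclosed <think>, in which case everything after the tag is discarded. The function name is hypothetical.

```python
import re

# A complete reasoning block, plus any whitespace that follows it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
# An unclosed block, e.g. when the output was cut off during reasoning.
_OPEN_THINK = re.compile(r"<think>.*\Z", re.DOTALL)


def strip_think(content):
    """Remove <think>...</think> blocks from an assembled content string."""
    text = _THINK_BLOCK.sub("", content)
    text = _OPEN_THINK.sub("", text)
    return text.strip()


print(strip_think("<think>plan\nsteps</think>\nHello"))
print(repr(strip_think("<think>cut off before the answer")))
```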

Related artifacts: PR #1163 Weight Scaling (merged 2026-05-13, aligned the economics with Kimi); PR #1226 (open, not merged) — a gateway-side refactor on top of the deployed model, not a blocker.

For Inference User: MiniMax-M2.7 is available today (ID MiniMaxAI/MiniMax-M2.7 on gonka-api.org, minimaxai/minimax-m2.7 on gonkagate.com — see case-sensitivity Q1). Choose by workload: Kimi for reasoning+tools, Qwen3 for large context + structured outputs, MiniMax-M2.7 — a tool-friendly alternative to Kimi with better throughput.

For Broker: the deploy was done by the network via the v0.2.13 upgrade. If you are not serving MiniMax yet, check that the mlnode-image supports the deploy args above and the hosts are updated. PR #1226 (open) will improve the UX (per-model dispatch, tool-message shape), but doesn't block.

Why is there no working web search available?

By design — Gonka is an inference network, not an agent framework. Plugin / web execution is a concern of the client's agent layer or a broker with value-add services, not the inference path.

Specifically: on 2026-05-25 we tested the same plugins payload through two brokers. gonka-api.org silently strips the field (HTTP 200, anchor #strip-plugins in docs/chat-api/troubleshooting.md (planned)); gonkagate.com rejects it with HTTP 400 "Plugin config is invalid". Both are valid interpretations of the gateway contract: one in favor of lenient parsing (silent strip), the other strict validation (reject unknown fields). In both cases plugins is not executed: vLLM has no plugin-execution path, and quietly passing this field through would imply a backend capability that doesn't exist. When migrating between brokers, account for the divergence (details in comparison-brokers.md (planned)).

For Inference User: run the search in your own agent layer (LangChain, LlamaIndex, your own wrapper), inject the results into messages[].content before calling /v1/chat/completions. This is the standard pattern for all OpenAI-compatible endpoints.
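
A minimal sketch of the injection step, assuming the search itself has already been run by your agent layer. The result shape (title, url, snippet), the character budget, and the function name are all hypothetical; only the idea of folding results into messages[].content before the call comes from the answer above.

```python
def inject_search_results(messages, results, max_chars=4000):
    """Return a copy of `messages` with results folded into the last user message."""
    blocks = [
        f"[{i}] {r['title']} ({r['url']})\n{r['snippet']}"
        for i, r in enumerate(results, start=1)
    ]
    context = "\n\n".join(blocks)[:max_chars]   # crude budget on injected text
    out = [dict(m) for m in messages]           # shallow copies; input is untouched
    for msg in reversed(out):
        if msg.get("role") == "user" and isinstance(msg.get("content"), str):
            msg["content"] = (
                f"Web search results:\n\n{context}\n\nUser request:\n{msg['content']}"
            )
            break
    return out


messages = [
    {"role": "system", "content": "Answer using the provided sources."},
    {"role": "user", "content": "What is Gonka?"},
]
results = [{"title": "Example", "url": "https://example.com", "snippet": "A snippet."}]
augmented = inject_search_results(messages, results)
print(augmented[-1]["content"])
```

The augmented list is then sent as the messages field of an ordinary /v1/chat/completions request; nothing search-specific reaches the gateway.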

For Broker: this is an opportunity for differentiation. A broker-level value-add ("we do search and inject the results into messages") is a legitimate product, and it can be implemented fully on top of Gonka, without protocol changes. Security note: stripping plugins may reflect an abuse-resistance policy (not a UX failure), so think this through if you plan to offer plugin execution as a product. If you want to propose it as a standard, open an ecosystem Discussion on gonka-ai/gonka.

When will reliable web fetching be supported?

By design it's not on the Gonka roadmap. The right place is a side-car or a value-add at the broker layer.

For Inference User: build / buy a fetch service (Tavily, Exa, Perplexity API for search; trafilatura/Readability for parsing), normalize into text, send it through an OpenAI-compatible call. There are plenty of ready-made solutions.

For Broker: if you want to offer it as a tier, open an ecosystem Discussion in gonka-ai/gonka so the community converges on common conventions (e.g. a side-car that everyone deploys consistently).

Context7 docs research — summary fails. Is this the output token limit?

The same blocker as in "The input token cap for Kimi is 4k tokens, and the output cap is 8,192 tokens. When will these limits be raised?". The output cap (effective 3,072 / network ceiling 4,096) is tight for "tool result body + summary in one response." If thinking is enabled, half of the budget goes there (Q1/Q2).

A ready-made payload for the summary use case:

{
  "model": "moonshotai/Kimi-K2.6",
  "messages": [
    {"role": "system", "content": "You produce structured summaries of technical documents."},
    {"role": "user", "content": "Summarize the following document:\n\n<paste the text here>"}
  ],
  "max_tokens": 4096,
  "thinking": {"type": "disabled"},
  "thinking_token_budget": 0,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "document_summary",
      "strict": true,
      "schema": {
        "type": "object",
        "additionalProperties": false,
        "required": ["summary", "key_points"],
        "properties": {
          "summary": {"type": "string", "description": "3-5 sentences"},
          "key_points": {"type": "array", "items": {"type": "string"}, "minItems": 3, "maxItems": 7}
        }
      }
    }
  }
}

For Inference User:

Use the payload above as a template. response_format compresses the output into the required shape, saving budget.
If the document is long and hits the cap (finish_reason=length) — split it into N+1 calls: one fetch+plan, the rest section summaries; stitch them together client-side.
Don't combine response_format with the structured_outputs envelope — HTTP 400 (Q6).
Schema: depth ≤ 16, nodes ≤ 256 (Q9).
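
The split-and-stitch pattern from the list above can be sketched as two small helpers. This is an illustration under assumptions: the character budget is a hypothetical stand-in for a real token count, sections are cut on blank lines, and each per-section response is assumed to be already parsed into the summary / key_points shape used in the payload above. Function names are made up.

```python
def split_sections(text, max_chars=8000):
    """Greedily pack paragraphs into sections of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own section.
    """
    sections, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            sections.append(current)
            current = para
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections


def stitch(partials):
    """Merge per-section results into one summary, de-duplicating key points."""
    seen, points = set(), []
    for part in partials:
        for point in part["key_points"]:
            if point not in seen:
                seen.add(point)
                points.append(point)
    return {
        "summary": " ".join(part["summary"] for part in partials),
        "key_points": points,
    }


doc = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 10
sections = split_sections(doc, max_chars=25)
merged = stitch([
    {"summary": "First part.", "key_points": ["x", "y"]},
    {"summary": "Second part.", "key_points": ["y", "z"]},
])
print(len(sections), merged)
```

Each section would be sent as its own request using the template payload, and `stitch` runs client-side once all responses are back. The merged key_points list is not re-checked against the schema's maxItems here.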

For Broker: response_format is the simplest and most portable mitigation regardless of your cap-bump policy. Consider a per-customer cap-bump option once the per-model request_max_tokens_cap is in your admin config.

Gonka has no KV cache. When will caching be added?

Short answer: there's no ETA. On the Gonka gateway side everything is ready; the blocker is upstream in vLLM, where issue #33264 has been open for 4+ months with no merged PR. Until it's closed, the prompt_cache_key field in your request is silently ignored. Don't include it, so you don't rely on behavior that doesn't exist.

The vLLM prefix KV cache works on each ML node. Gateway-level prompt_cache_key / cache_key are currently silently stripped — a limitation blocked by an unmerged upstream vLLM PR.

Current status quo

Gateway behavior: prompt_cache_key (OpenAI standard) and cache_key (Moonshot Kimi convention) are silently stripped — neither reaches vLLM. Anchors: docs/chat-api/troubleshooting.md#strip-prompt_cache_key and #strip-cache_key (planned).
Upstream blocker: vLLM uses the cache_salt field for prompt-cache isolation (RFC #16016, PR #17045). Aliasing prompt_cache_key → cache_salt is the open vLLM #33264 since January 2026, with no merged PR.
Security rationale: simply forwarding cache_key without isolation would be unsafe — there are published prompt-cache timing side-channel attacks (arxiv 2502.07776 PROMPTPEEK). The gateway cannot implement false cache-isolation guarantees.
The 80–90% hit rate is not a Gonka claim. It's either a misinterpretation of someone's marketing material or confusion with the OpenAI / Anthropic native caches (which guarantee sticky routing within a single provider).

Important architectural caveat

Even when vLLM #33264 merges and the gateway adds a hash → cache_salt bridge, the cache remains per-vLLM-instance. Gonka's multi-host routing means two requests with the same cache_key may land on different hosts with different prefix caches. Without sticky routing (which doesn't exist now), guaranteeing an OpenAI-style ~80% hit rate is architecturally hard. None of the three blockers (upstream vLLM PR, gateway bridge, sticky routing) is shipped today.

For Inference User: there's nothing to do today — prompt_cache_key and cache_key are no-ops. Don't rely on these fields for cost optimization.

For Broker: no gateway-side change is needed until vLLM #33264 merges. Want to speed it up — comment / contribute to that upstream issue. After the merge, the Gonka gateway will add a bridge that lights up both fields together.

When will image input be enabled for Kimi on the Gonka gateway?

Not available today. ETA — release v0.2.14 or later (current is 0.2.13), no fixed date. Multimodal payloads (messages[].content with type: "image_url" or "video_url") currently return HTTP 400 on both public brokers.

Work is active: the plan is written and broken into phases. It lives in the planned document multimodal-inference-plan.md in gonka-ai/gonka (≈466 lines, 6 phases: ML Node, Host↔ML Node, Broker/DAPI, Devshard Protocol, etc.). Until it's published, it's easier to track progress via the issues / PRs below.

Hard blockers today

A multimodal-specific special-token sanitizer. The Kimi-K2.6 chat template accepts image_url / video_url content parts, but the gateway currently validates only text. Multimodal payloads (image URLs, alt-text, metadata) provide an additional injection surface that must be validated. The security review flagged it as a Phase 2 blocker. There's no published CVE for this specific multimodal threat yet; internal tracking is in progress.
Independent VLM validation review. The validation methodology for image inputs needs to be independently confirmed. Issue #1026 (initial research: Qwen2-VL-2B F1=100% intermediate) + #1198 (re-validate, up-for-grabs).

Target: v0.2.14+, but there's no committed timeline; blocked by issue #1198 (independent validation, up-for-grabs).

What's empirically confirmed today: a request with a messages[0].content array containing {type:"image_url"} returns HTTP 400 on both routes (Kimi and Qwen3). Multimodal inputs are not accepted at the gateway level.

For Inference User: not available today.

For Broker: three ways to speed it up:

Take issue #1198 (up-for-grabs) — the independent VLM validation review is the hardest gating item.
Review PR #1150 "vlm benchmark".
When Phases 1-3 of the plan become reachable, prepare the gateway capability registry (Phase 3); the operator config will determine which content types your broker accepts.
