401 Unauthorized on Inference Providers router chat completions (token works for /v1/models)

Hi HF team,
I’m trying to use the Inference Providers router for chat completions:

POST https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions

I consistently get:
{"error":"Invalid username or password."} (401)

What I’ve already verified:

  • GET https://huggingface.co/proxy/router.huggingface.co/v1/models succeeds (status 200) with the same token.

  • Token is fine-grained with the “Make calls to Inference Providers” permission.

  • Inference Providers are enabled in settings.

  • I set HF Inference as preferred provider.

  • I generated a new token after enabling settings.

Model tried:

  • Qwen/Qwen3.5-9B:preferred (also tried other model/provider combos)

The issue persists both in a small Python test and in a Streamlit app.

Could you check if my account has a restriction for chat completions or if there’s another required permission?

Thanks!


Account issues aren’t typically handled on the forum, so for anything account-specific you’d need to email support (website@huggingface.co) first.

That said, this symptom doesn’t look like an account issue to me.


My view is that this is probably not a hidden chat-completions restriction on your account.

The current Hugging Face docs do not describe a separate chat-only entitlement beyond the normal routed-inference requirements: a Hugging Face account, a fine-grained token with “Make calls to Inference Providers”, remaining Inference Providers credits, and a model that is actually available for chat on a provider-backed route. (Hugging Face)

What your current evidence means

The important distinction is this:

  • GET /v1/models is a discovery call.
  • POST /v1/chat/completions is an execution call.

A successful models-list response shows that your token is present and acceptable enough for the router to return catalog information. It does not prove that the router can execute a paid, provider-backed chat request for your chosen model and provider path. Hugging Face’s own docs explicitly list remaining credits as a prerequisite for chat-style routed usage. (Hugging Face)

So this combination:

  • /v1/models = 200
  • /v1/chat/completions = 401

is entirely possible without any special account ban.
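To make the distinction concrete, here is a minimal, standard-library-only Python sketch that issues both calls with the same token. The URL and payload shapes follow the documented router API; the model id is the openai/gpt-oss-120b:fastest example used later in this thread, so swap in whatever you are testing.

```python
import json
import os
import urllib.error
import urllib.request

ROUTER = "https://huggingface.co/proxy/router.huggingface.co/v1"

def build_models_request(token: str) -> urllib.request.Request:
    """Discovery call: GET /v1/models."""
    return urllib.request.Request(
        f"{ROUTER}/models",
        headers={"Authorization": f"Bearer {token}"},
    )

def build_chat_request(token: str, model: str) -> urllib.request.Request:
    """Execution call: POST /v1/chat/completions, model in the JSON body."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "Reply with exactly OK"}],
        "max_tokens": 8,
    }).encode()
    return urllib.request.Request(
        f"{ROUTER}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    token = os.environ["HF_TOKEN"].strip()
    for req in (build_models_request(token),
                build_chat_request(token, "openai/gpt-oss-120b:fastest")):
        try:
            with urllib.request.urlopen(req) as resp:
                print(req.full_url, "->", resp.status)
        except urllib.error.HTTPError as e:
            print(req.full_url, "->", e.code, e.read().decode())
```

If the first line prints 200 and the second prints 401 with the same token, you have reproduced exactly the split described above, with everything else held constant.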

Why your specific setup looks risky

This is the strongest clue in your case:

  • you used Qwen/Qwen3.5-9B:preferred
  • you set HF Inference as your preferred provider

Hugging Face’s current Inference Providers docs say that:

  • the default router behavior is effectively :fastest
  • :preferred follows your provider preference order
  • you can also pin a provider explicitly with model:provider syntax. (Hugging Face)

Separately, the current Supported Models catalog shows Qwen/Qwen3.5-9B with together as the provider listing. (Hugging Face)

That means your test is not a neutral “does Qwen work on the router?” test. It is a more specific test:

“Does this model work when I force routing to follow my preferred-provider policy, where HF Inference is ranked first?”

That is a weaker test, because it mixes model choice with provider policy.

The most likely explanations, ranked

1. Provider-selection mismatch

This is my top diagnosis.

Your model appears in the current catalog with Together. Your request uses :preferred, and your settings prefer HF Inference. That makes it plausible that you are steering the router toward a provider/model path that is not the natural or supported chat backend for that model. (Hugging Face)

This is also consistent with how Inference Providers works now. It is not the old “single generic inference backend.” It is provider-aware and task-aware. (Hugging Face)

2. Credits or billing problem with misleading error text

This is the second most likely.

The docs say chat-style Inference Providers usage requires remaining credits. The pricing docs also explain that Team and Enterprise orgs can centralize billing with X-HF-Bill-To, and org admins can set spending limits or disable providers. (Hugging Face)

There is also a public forum case where the surfaced message included {"error":"Invalid username or password."} even though the deeper issue involved payment. That means this error string is not always a literal diagnosis. (Hugging Face Forums)

3. Runtime token mismatch

Still possible, but less convincing than the first two.

There are public issues where Invalid username or password came from token or environment problems, including whoami failures and bad runtime token state. (GitHub)

But in your case, the same token already works for /v1/models, and you already regenerated the token after changing settings. That weakens the “purely bad token” theory.

4. Account-side backend bug

Possible, but lower probability.

There are multiple public reports of the same 401 body on router chat paths, including Qwen-related examples and HF Inference-backed routes. That means account- or router-side bugs do happen. (Hugging Face Forums)

But that is not the first conclusion I would jump to, because your current provider-policy choice already creates a cleaner explanation.

What I do not think is happening

I do not think the best explanation is:

“Your account is blocked from chat completions, but allowed to use /v1/models.”

I did not find official documentation for a separate hidden permission like that. The documented requirements are token scope, credits, and provider-backed model availability. (Hugging Face)

How to isolate it cleanly

Use a staged test plan. Each step changes one variable only.

Step 1: Stop using :preferred

This is the first thing to change.

Use either:

  • :fastest, which follows the router’s default policy, or
  • an explicit provider suffix.

The docs explicitly document both patterns. (Hugging Face)

Step 2: Test a known-good router chat example

Use a model that Hugging Face itself uses in current router examples.

curl -i https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b:fastest",
    "messages": [{"role":"user","content":"Reply with exactly OK"}],
    "max_tokens": 8
  }'

Hugging Face’s current docs use openai/gpt-oss-120b as a normal chat-completions example on the router. (Hugging Face)

Interpretation:

  • if this works, your account and token are probably fine for chat
  • if this fails with the same 401, look harder at credits, billing, or account-side issues

Step 3: Test your Qwen model with an explicit provider

Because the Supported Models catalog currently shows Qwen/Qwen3.5-9B under Together, test that exact provider path.

curl -i https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.5-9B:together",
    "messages": [{"role":"user","content":"Reply with exactly OK"}],
    "max_tokens": 8
  }'

If this works but Qwen/Qwen3.5-9B:preferred fails, then the problem is almost certainly your provider preference path, not your account. (Hugging Face)

Step 4: Verify the exact token used in the failing runtime

Do this inside the same environment that fails.

import os
from huggingface_hub import HfApi

# Read the token exactly as the failing runtime sees it
token = os.environ["HF_TOKEN"].strip()
print("suffix:", token[-8:])        # confirm which token is actually loaded
print(HfApi().whoami(token=token))  # raises an HTTP error if the token is bad

The token docs confirm that user access tokens are the normal bearer-token mechanism for Inference Providers. (Hugging Face)

This catches:

  • old token still loaded
  • whitespace or newline contamination
  • Streamlit secrets not matching shell env
  • wrong variable name used by the app
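Those four failure modes can be checked mechanically before involving the network. A sketch (the hf_ prefix check assumes the usual user-access-token format; adjust if your tokens look different):

```python
import os

def token_problems(raw):
    """Return a list of suspicious things about a token value; empty means it looks clean."""
    problems = []
    if raw is None:
        problems.append("variable is not set at all")
        return problems
    if raw != raw.strip():
        problems.append("leading/trailing whitespace or newline")
    token = raw.strip()
    if not token:
        problems.append("value is empty")
    elif not token.startswith("hf_"):
        problems.append("does not start with 'hf_' (wrong variable? truncated paste?)")
    return problems

if __name__ == "__main__":
    # Run this inside the environment that fails (e.g. the Streamlit process),
    # not just in your shell -- that is the whole point of Step 4.
    print(token_problems(os.environ.get("HF_TOKEN")))
```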

Step 5: Check credits and org billing

If you are on Team or Enterprise, or if billing should go to an org, test with the org billing header:

curl -i https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "X-HF-Bill-To: your-org-name" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b:fastest",
    "messages": [{"role":"user","content":"Reply with exactly OK"}],
    "max_tokens": 8
  }'

HF documents this billing mode directly. (Hugging Face)

This step matters if:

  • your personal credits are empty
  • your org should be paying
  • your org admins limited providers or spending

Step 6: Compare router base path patterns

Use only the documented router path:

  • POST https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions

with the model in the JSON body.

Do not mix this with older or provider-path-specific URL styles unless you have a reason. The current chat-completions docs show the OpenAI-style /v1/chat/completions route with the model passed in the request body. (Hugging Face)

How to read the outcomes

Here is the diagnosis matrix.

Case A: gpt-oss-120b:fastest works, Qwen/Qwen3.5-9B:preferred fails

That strongly suggests provider-policy mismatch.

Case B: gpt-oss-120b:fastest works, Qwen/Qwen3.5-9B:together also works

That confirms your account is fine and :preferred was the problem.

Case C: both fail with the same 401

That points more toward:

  • credits/billing
  • runtime token mismatch
  • account-side router issue

Case D: adding X-HF-Bill-To fixes it

That means the problem was billing path, not chat permissions.

When to suspect a real account problem

Suspect a real account/backend issue only if all of these are true:

  • a known-good router model like openai/gpt-oss-120b:fastest still fails
  • an explicit-provider Qwen test still fails
  • the exact runtime token passes a whoami check
  • credits and billing path are confirmed
  • the same minimal curl fails outside your app too

At that point, you have narrowed it to something HF support can actually investigate.

There are public examples of unusual account-wide 401 problems on Hugging Face, but they usually affect far more than just one chat-completions path. (Hugging Face Forums)

What to send support if needed

If you escalate, send:

  • exact timestamp

  • full response headers

  • full response body

  • x-request-id from the failing response

  • your HF username

  • whether billing is personal or org

  • whether X-HF-Bill-To changes anything

  • results of:

    • openai/gpt-oss-120b:fastest
    • Qwen/Qwen3.5-9B:together

That gives support enough to separate auth, billing, provider routing, and account-state issues. Public HF discussions around 401 troubleshooting repeatedly rely on request IDs and minimal reproductions. (Hugging Face Forums)
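To collect those request ids in one pass, you can read the x-request-id header off each response, successful or not. A standard-library sketch (the header extraction is the testable part; the network calls run only if HF_TOKEN is set):

```python
import json
import os
import urllib.error
import urllib.request

def request_id_from_headers(headers):
    """Pull x-request-id out of a header mapping, case-insensitively."""
    for key, value in headers.items():
        if key.lower() == "x-request-id":
            return value
    return None

def probe(model: str, token: str) -> None:
    """POST one minimal chat request and print status, request id, and body."""
    req = urllib.request.Request(
        "https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": "Reply with exactly OK"}],
            "max_tokens": 8,
        }).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print(model, "->", resp.status, "id:", request_id_from_headers(resp.headers))
    except urllib.error.HTTPError as e:
        print(model, "->", e.code, "id:", request_id_from_headers(e.headers),
              "body:", e.read().decode())

if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    for model in ("openai/gpt-oss-120b:fastest", "Qwen/Qwen3.5-9B:together"):
        probe(model, os.environ["HF_TOKEN"].strip())
```

Paste the printed lines directly into the support email; they cover the model pair, status codes, request ids, and error bodies in one place.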

My actual conclusion for your case

Most likely:

  1. not a hidden chat restriction
  2. not a missing extra permission beyond the one you already enabled
  3. most likely a provider-selection problem caused by :preferred + HF Inference
  4. second most likely a credits/billing issue surfaced with a misleading auth message

A broader caution also applies: on platforms like HF, failures can come from server-side behavior changes, so a request that worked yesterday can fail today without any change in your local code.

The fastest isolating test is this pair:

  • openai/gpt-oss-120b:fastest
  • Qwen/Qwen3.5-9B:together