Account issues are not typically handled on the forum, so you’ll need to email support (website@huggingface.co) first.
That said, this symptom doesn’t look like an account issue to me, though I can’t rule it out completely.
My view is that this is probably not a hidden chat-completions restriction on your account.
The current Hugging Face docs do not describe a separate chat-only entitlement beyond the normal routed-inference requirements: a Hugging Face account, a fine-grained token with “Make calls to Inference Providers”, remaining Inference Providers credits, and a model that is actually available for chat on a provider-backed route. (Hugging Face)
What your current evidence means
The important distinction is this:
GET /v1/models is a discovery call.
POST /v1/chat/completions is an execution call.
A successful models-list response shows that your token is present and acceptable enough for the router to return catalog information. It does not prove that the router can execute a paid, provider-backed chat request for your chosen model and provider path. Hugging Face’s own docs explicitly list remaining credits as a prerequisite for chat-style routed usage. (Hugging Face)
So this combination:
- /v1/models = 200
- /v1/chat/completions = 401
is entirely possible without any special account ban.
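To make that distinction concrete, here is a minimal sketch that classifies the status-code pair from the two calls. The diagnose helper and its wording are mine, not an official HF tool; it just encodes the reasoning above.

```python
# Sketch only: interpret the status-code pair from
# GET /v1/models (discovery) and POST /v1/chat/completions (execution).

def diagnose(models_status: int, chat_status: int) -> str:
    """Interpret the discovery/execution status-code pair from the router."""
    if models_status == 200 and chat_status == 200:
        return "token and execution path both fine"
    if models_status == 200 and chat_status == 401:
        # Discovery worked but execution was refused: check credits,
        # billing, and provider routing before suspecting an account ban.
        return "discovery ok, execution refused: check credits/billing/provider routing"
    if models_status == 401:
        return "token itself rejected: re-check the bearer token"
    return f"unclassified pair: {models_status}/{chat_status}"

print(diagnose(200, 401))  # your case
```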
Why your specific setup looks risky
This is the strongest clue in your case:
- you used Qwen/Qwen3.5-9B:preferred
- you set HF Inference as your preferred provider
Hugging Face’s current Inference Providers docs say that:
- the default router behavior is effectively :fastest
- :preferred follows your provider preference order
- you can also pin a provider explicitly with model:provider syntax. (Hugging Face)
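As a quick sketch of that suffix grammar (the build_model_id helper name is mine; the three forms follow the docs cited above):

```python
# Compose the router "model" field: bare id, ":policy" suffix, or ":provider" pin.
def build_model_id(model: str, selector: str = "") -> str:
    """Return the model string to put in the chat-completions request body."""
    return f"{model}:{selector}" if selector else model

# Bare id falls back to the router's default policy (effectively :fastest).
assert build_model_id("Qwen/Qwen3.5-9B") == "Qwen/Qwen3.5-9B"
# :preferred follows your provider preference order.
assert build_model_id("Qwen/Qwen3.5-9B", "preferred") == "Qwen/Qwen3.5-9B:preferred"
# Explicit provider pin bypasses preference ordering entirely.
assert build_model_id("Qwen/Qwen3.5-9B", "together") == "Qwen/Qwen3.5-9B:together"
```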
Separately, the current Supported Models catalog shows Qwen/Qwen3.5-9B with together as the provider listing. (Hugging Face)
That means your test is not a neutral “does Qwen work on the router?” test. It is a more specific test:
“Does this model work when I force routing to follow my preferred-provider policy, where HF Inference is ranked first?”
That is a weaker test, because it mixes model choice with provider policy.
The most likely explanations, ranked
1. Provider-selection mismatch
This is my top diagnosis.
Your model appears in the current catalog with Together. Your request uses :preferred, and your settings prefer HF Inference. That makes it plausible that you are steering the router toward a provider/model path that is not the natural or supported chat backend for that model. (Hugging Face)
This is also consistent with how Inference Providers works now. It is not the old “single generic inference backend.” It is provider-aware and task-aware. (Hugging Face)
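You can also inspect the provider mapping programmatically: the Hub API exposes it (for example via huggingface_hub's model_info with expand=["inferenceProviderMapping"]). The exact response shape may vary, so here is a pure helper over an illustrative sample rather than a live call; the sample values below are made up for demonstration.

```python
# Hedged sketch: filter a provider-mapping dict down to providers that
# report a "live" status for the model. The mapping shape here is an
# assumption based on the Hub API's inferenceProviderMapping expansion.

def chat_providers(mapping: dict) -> list:
    """Providers whose entry reports a 'live' status for the model."""
    return [p for p, info in mapping.items() if info.get("status") == "live"]

# Illustrative sample only, not real API output:
sample = {
    "together": {"status": "live", "task": "conversational"},
    "hf-inference": {"status": "staging", "task": "conversational"},
}
print(chat_providers(sample))  # ['together']
```

If your preferred provider is not in that live list, a :preferred request can steer the router into a path that cannot serve the model.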
2. Credits or billing problem with misleading error text
This is the second most likely.
The docs say chat-style Inference Providers usage requires remaining credits. The pricing docs also explain that Team and Enterprise orgs can centralize billing with X-HF-Bill-To, and org admins can set spending limits or disable providers. (Hugging Face)
There is also a public forum case where the surfaced message included {"error":"Invalid username or password."} even though the deeper issue involved payment. That means this error string is not always a literal diagnosis. (Hugging Face Forums)
3. Runtime token mismatch
Still possible, but less convincing than the first two.
There are public issues where Invalid username or password came from token or environment problems, including whoami failures and bad runtime token state. (GitHub)
But in your case, the same token already works for /v1/models, and you already regenerated the token after changing settings. That weakens the “purely bad token” theory.
4. Account-side backend bug
Possible, but lower probability.
There are multiple public reports of the same 401 body on router chat paths, including Qwen-related examples and HF Inference-backed routes. That means account- or router-side bugs do happen. (Hugging Face Forums)
But that is not the first conclusion I would jump to, because your current provider-policy choice already creates a cleaner explanation.
What I do not think is happening
I do not think the best explanation is:
“Your account is blocked from chat completions, but allowed to use /v1/models.”
I did not find official documentation for a separate hidden permission like that. The documented requirements are token scope, credits, and provider-backed model availability. (Hugging Face)
How to isolate it cleanly
Use a staged test plan. Each step changes one variable only.
Step 1: Stop using :preferred
This is the first thing to change.
Use either:
- :fastest, which follows the router’s default policy, or
- an explicit provider suffix.
The docs explicitly document both patterns. (Hugging Face)
Step 2: Test a known-good router chat example
Use a model that Hugging Face itself uses in current router examples.
curl -i https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions \
-H "Authorization: Bearer $HF_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-120b:fastest",
"messages": [{"role":"user","content":"Reply with exactly OK"}],
"max_tokens": 8
}'
Hugging Face’s current docs use openai/gpt-oss-120b as a normal chat-completions example on the router. (Hugging Face)
Interpretation:
- if this works, your account and token are probably fine for chat
- if this fails with the same 401, look harder at credits, billing, or account-side issues
Step 3: Test your Qwen model with an explicit provider
Because the Supported Models catalog currently shows Qwen/Qwen3.5-9B under Together, test that exact provider path.
curl -i https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions \
-H "Authorization: Bearer $HF_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3.5-9B:together",
"messages": [{"role":"user","content":"Reply with exactly OK"}],
"max_tokens": 8
}'
If this works but Qwen/Qwen3.5-9B:preferred fails, then the problem is almost certainly your provider preference path, not your account. (Hugging Face)
Step 4: Verify the exact token used in the failing runtime
Do this inside the same environment that fails.
import os
from huggingface_hub import HfApi

token = os.environ["HF_TOKEN"].strip()  # strip() guards against trailing newlines
print("suffix:", token[-8:])            # confirm which token is actually loaded
print(HfApi().whoami(token=token))      # fails if the token is invalid
The token docs confirm that user access tokens are the normal bearer-token mechanism for Inference Providers. (Hugging Face)
This catches:
- old token still loaded
- whitespace or newline contamination
- Streamlit secrets not matching shell env
- wrong variable name used by the app
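Those failure modes can also be screened with a small pure helper before you even hit the network. The token_issues name is mine; the hf_ prefix check assumes the usual user-access-token format.

```python
# Sketch: flag the common runtime-token problems listed above
# (contamination, wrong token type) without making any API call.

def token_issues(token: str) -> list:
    """Return a list of human-readable problems with a candidate token."""
    issues = []
    if token != token.strip():
        issues.append("leading/trailing whitespace or newline")
    if not token.strip().startswith("hf_"):
        # User access tokens normally carry an hf_ prefix; anything else
        # suggests the wrong secret was loaded.
        issues.append("does not look like a user access token (hf_ prefix)")
    return issues

print(token_issues("hf_abc123\n"))  # ['leading/trailing whitespace or newline']
```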
Step 5: Check credits and org billing
If you are on Team or Enterprise, or if billing should go to an org, test with the org billing header:
curl -i https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions \
-H "Authorization: Bearer $HF_TOKEN" \
-H "X-HF-Bill-To: your-org-name" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-120b:fastest",
"messages": [{"role":"user","content":"Reply with exactly OK"}],
"max_tokens": 8
}'
HF documents this billing mode directly. (Hugging Face)
This step matters if:
- your personal credits are empty
- your org should be paying
- your org admins limited providers or spending
Step 6: Compare router base path patterns
Use only the documented router path:
POST https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions
with the model in the JSON body.
Do not mix this with older or provider-path-specific URL styles unless you have a reason. The current chat-completions docs show the OpenAI-style /v1/chat/completions route with the model passed in the request body. (Hugging Face)
How to read the outcomes
Here is the diagnosis matrix.
Case A: gpt-oss-120b:fastest works, Qwen/Qwen3.5-9B:preferred fails
That strongly suggests provider-policy mismatch.
Case B: gpt-oss-120b:fastest works, Qwen/Qwen3.5-9B:together also works
That confirms your account is fine and :preferred was the problem.
Case C: both fail with the same 401
That points more toward:
- credits/billing
- runtime token mismatch
- account-side router issue
Case D: adding X-HF-Bill-To fixes it
That means the problem was billing path, not chat permissions.
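The matrix above can be sketched as a pure function. The names are mine; the booleans would come from the Step 2, 3, and 5 tests.

```python
# Sketch: encode the diagnosis matrix as a classifier over test outcomes.

def classify(gpt_fastest_ok: bool, qwen_preferred_ok: bool,
             qwen_together_ok: bool, bill_to_fixes: bool) -> str:
    """Map the staged-test outcomes onto Cases A-D above."""
    if bill_to_fixes:
        return "Case D: billing path, not chat permissions"
    if gpt_fastest_ok and qwen_together_ok and not qwen_preferred_ok:
        return "Case B: account fine, :preferred was the problem"
    if gpt_fastest_ok and not qwen_preferred_ok:
        return "Case A: provider-policy mismatch"
    if not gpt_fastest_ok and not qwen_preferred_ok:
        return "Case C: credits/billing, runtime token, or router-side issue"
    return "unclassified: re-run the staged tests"

print(classify(True, False, True, False))  # Case B
```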
When to suspect a real account problem
Suspect a real account/backend issue only if all of these are true:
- a known-good router model like openai/gpt-oss-120b:fastest still fails
- an explicit-provider Qwen test still fails
- the exact runtime token passes a whoami check
- credits and billing path are confirmed
- the same minimal curl fails outside your app too
At that point, you have narrowed it to something HF support can actually investigate.
There are public examples of unusual account-wide 401 problems on Hugging Face, but they usually affect far more than just one chat-completions path. (Hugging Face Forums)
What to send support if needed
If you escalate, send:
- exact timestamp
- full response headers
- full response body
- x-request-id from the failing response
- your HF username
- whether billing is personal or org
- whether X-HF-Bill-To changes anything
- results of openai/gpt-oss-120b:fastest and Qwen/Qwen3.5-9B:together
That gives support enough to separate auth, billing, provider routing, and account-state issues. Public HF discussions around 401 troubleshooting repeatedly rely on request IDs and minimal reproductions. (Hugging Face Forums)
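A small sketch for collecting those fields from a failed response (the support_bundle name is mine; the header lookup is normalized because HTTP header names are case-insensitive):

```python
# Sketch: assemble the fields support asks for from a failing response.
from datetime import datetime, timezone

def support_bundle(status: int, headers: dict, body: str) -> dict:
    """Pack status, timestamp, x-request-id, and body for a support ticket."""
    norm = {k.lower(): v for k, v in headers.items()}  # case-insensitive lookup
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "x-request-id": norm.get("x-request-id", "(missing)"),
        "body": body,
    }

bundle = support_bundle(
    401,
    {"X-Request-Id": "abc123"},
    '{"error":"Invalid username or password."}',
)
print(bundle["x-request-id"])  # abc123
```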
My actual conclusion for your case
Most likely:
- not a hidden chat restriction
- not a missing extra permission beyond the one you already enabled
- most likely a provider-selection problem caused by :preferred + HF Inference
- second most likely a credits/billing issue surfaced with a misleading auth message
A broader caution also applies: on platforms like HF, some failures can come from server-side behavior changes rather than a local code change alone.
The fastest isolating test is this pair:
- openai/gpt-oss-120b:fastest
- Qwen/Qwen3.5-9B:together