If asking Claude doesn’t resolve the issue, it’s possible that HF is intentionally or mistakenly blocking the Space (in which case you’ll have to contact HF support at website@huggingface.co), but with Streamlit there are quite a few cases where the problem stems from configuration errors in the Dockerfile or README.md.
Most likely, your Space is launching a Streamlit process, but Hugging Face never marks the Space as a healthy HTTP target. In practice that usually means one of these: the Space metadata points HF at the wrong port, Streamlit is bound or configured in a proxy-hostile way, startup never reaches a healthy state, or the Space record itself is wedged. HF’s Docker docs say Docker Spaces expose one external app port via app_port, with 7860 as the default example, while the legacy built-in Streamlit SDK is a separate path that only allows port 8501. HF’s official Streamlit Docker template also shows a very specific healthy shape: sdk: docker, app_port: 8501, EXPOSE 8501, a health check on /_stcore/health, and --server.address=0.0.0.0. (Hugging Face)
What your four facts imply
Your facts narrow the problem a lot:
- the container logs show Streamlit starting
- the browser gets 503 on every request
- the Space often looks paused or un-restartable
- you have a load-balancer trace root id
That pattern is more consistent with routing or health failure than with ordinary app-code exceptions. Streamlit’s deployment guide says that when Streamlit appears to be running remotely but the app does not load, the most likely cause is that the Streamlit port is not actually exposed or reachable from the outside. HF’s config reference also says a Space is flagged unhealthy if startup exceeds the allowed timeout. (Streamlit Document)
Your Root ID looks like an AWS Application Load Balancer trace id, not an app-specific error code. AWS documents that X-Amzn-Trace-Id is added or updated by the load balancer and can be used to trace a request through the edge and target path. That makes it useful for Hugging Face support, but not directly diagnostic on its own. (AWS Documentation)
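For reference, that Root ID matches the `Root=` field of an ALB `X-Amzn-Trace-Id` header: a version digit, a hex UNIX timestamp, and a random id. A small sketch of how you might split the header into fields and recover the request time for your support note (generic parsing, not an HF API):

```python
import datetime

def parse_amzn_trace(header: str) -> dict:
    """Split an X-Amzn-Trace-Id header value into its Key=Value fields."""
    fields = dict(part.split("=", 1) for part in header.split(";") if "=" in part)
    root = fields.get("Root", "")
    # Root is "version-epochhex-randomid"; the middle field is the request
    # time as a hex UNIX timestamp, which is handy for the support note.
    parts = root.split("-")
    if len(parts) == 3:
        fields["timestamp"] = datetime.datetime.fromtimestamp(
            int(parts[1], 16), tz=datetime.timezone.utc
        ).isoformat()
    return fields

print(parse_amzn_trace("Root=1-69b734ae-6e89e03645bc568f68d02530"))
```

The decoded timestamp lets support correlate your failed request with their load-balancer logs without guessing at clock skew.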
The most likely causes, ranked
1. Space mode or port metadata mismatch
This is my top suspect.
If your repo is a Docker Space, HF expects the external port to match README.md app_port. HF documents that the Docker Space default is 7860, but that value is only correct if your app, your Dockerfile, and your runtime all actually use the same port. If your repo is still a legacy sdk: streamlit Space, HF says only 8501 is allowed. A repo that mixes those two worlds can easily produce exactly your symptom: “Streamlit started” in logs, but 503 at the browser because the proxy is checking the wrong place. (Hugging Face)
This is why 7860 in the logs is not enough by itself. 7860 is valid for Docker Spaces, but invalid for legacy built-in Streamlit Spaces. The key question is not “what port did Streamlit print,” but “does HF route to that exact port for this exact Space type.” (Hugging Face)
2. Bind-address or health-surface problem
Second suspect.
Streamlit’s config reference says server.address controls where the server listens. If it is set to a specific address, the app is only accessible from that address. The official HF Streamlit Docker template binds to 0.0.0.0 and defines a health check at http://localhost:8501/_stcore/health, which shows the shape HF expects for a routable Streamlit container. If your app is listening on 127.0.0.1 or otherwise not on the container’s public interface, logs can look healthy while the proxy still fails every request. (Streamlit Document)
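One quick way to see a bind problem from inside the container (e.g. via a debug shell) is a plain TCP probe. This is a stdlib sketch; the port and hostname probes at the bottom are assumptions about your setup, so adjust them to whatever your Space actually exposes:

```python
import socket

def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# If the loopback probe succeeds but the container-hostname probe fails,
# Streamlit is bound to 127.0.0.1 only and the HF proxy cannot reach it.
# print(is_listening("127.0.0.1", 7860))
# print(is_listening(socket.gethostname(), 7860))
```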
3. Streamlit reverse-proxy incompatibility
Third suspect.
Streamlit has open upstream issues around reverse proxies, path prefixes, and WebSocket traffic on /_stcore/stream. Recent examples include failures behind Istio and other proxy layers, plus problems caused by URL rewrites and subpaths. Those issues matter because Hugging Face Spaces sit behind a proxy. (GitHub)
That said, your symptom is 503 on every request, not “HTML loads but the app hangs on skeletons.” Streamlit’s own deployment guide separates those failure modes: “never loads” usually points to exposure or reachability problems, while “keeps loading forever” more often points to CORS or WebSocket problems. That makes pure proxy/WebSocket breakage possible, but not my first guess for your case. (Streamlit Document)
4. Slow or incomplete startup health
Fourth suspect.
HF documents startup_duration_timeout, default 30 minutes, and says the Space is flagged unhealthy if startup exceeds that time. If your app downloads data, builds an index, authenticates to a private upstream, or waits on a missing secret before it can actually serve traffic, the process may still appear in logs while HF never gets a healthy app surface. (Hugging Face)
5. Space-level HF state problem
Still plausible.
There are recent public reports of Docker Spaces that stop building properly or never trigger real builds despite valid files, and there are forum cases where the same code works after recreation or duplication under a new slug. That does not prove your Space is in that category, but it is a real failure family. (GitHub)
6. Intentional HF restriction
Possible in principle. Not the default reading.
HF’s Terms allow suspension or termination, and the Content Policy describes moderation decisions that can disable content or suspend an account, with an appeal path. But HF’s documented rate limiting is a 429 Too Many Requests, not a 503, and a plain 503 by itself is weak evidence of deliberate blocking. (Hugging Face)
What I think is happening in your case
My best inference is this:
HF is reaching the Space lifecycle, but not a healthy app endpoint. The Streamlit process exists. The HF edge still cannot validate the Space as ready. So the browser gets a generic 503 from the proxy layer instead of your app. That inference is supported by HF’s Docker routing model, Streamlit’s remote-deployment troubleshooting, and the official Streamlit Docker template’s use of a concrete health endpoint. (Hugging Face)
If I had to rank probabilities without seeing the repo, I would put them in this order: port/SDK mismatch first, bind/health visibility second, HF Space-state issue third, Streamlit proxy edge case fourth, intentional restriction last. That ordering is an inference from the docs and public cases above. (Hugging Face)
The exact checks I would do, in order
1. Check the README YAML first
For a Docker Space, the top of README.md should be internally consistent with the actual app port:
---
sdk: docker
app_port: 7860
startup_duration_timeout: 1h
---
That is valid only if your Dockerfile and Streamlit command also use 7860. HF documents sdk: docker and app_port, and documents startup_duration_timeout as the health timeout ceiling. (Hugging Face)
If the YAML says sdk: streamlit, then 7860 is immediately suspicious, because HF says only 8501 is allowed for built-in Streamlit Spaces. (Hugging Face)
2. Make the runtime use the same port everywhere
For a Docker Space on 7860, all of these should agree:
- README.md:
app_port: 7860
- Dockerfile:
EXPOSE 7860
- startup command:
streamlit run ... --server.port=7860 --server.address=0.0.0.0
- optional health check:
curl --fail http://localhost:7860/_stcore/health
HF’s official Streamlit template demonstrates that same pattern on 8501, which is the same principle with a different port. (Hugging Face)
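A throwaway script along these lines can catch the mismatch mechanically. It is a rough sketch that assumes the standard layout (YAML front matter in README.md, Dockerfile in the repo root) and simple regex matching, not a robust YAML or Dockerfile parser:

```python
import re

def declared_ports(readme: str, dockerfile: str) -> dict:
    """Extract the port each file claims, so they can be compared."""
    ports = {}
    m = re.search(r"^app_port:\s*(\d+)", readme, re.MULTILINE)
    if m:
        ports["README app_port"] = m.group(1)
    m = re.search(r"^EXPOSE\s+(\d+)", dockerfile, re.MULTILINE)
    if m:
        ports["Dockerfile EXPOSE"] = m.group(1)
    m = re.search(r"--server\.port[= ](\d+)", dockerfile)
    if m:
        ports["--server.port"] = m.group(1)
    return ports

ports = declared_ports(
    "---\nsdk: docker\napp_port: 7860\n---\n",
    'EXPOSE 7860\nENTRYPOINT ["streamlit", "run", "app.py", "--server.port=7860"]\n',
)
assert len(set(ports.values())) == 1, f"port mismatch: {ports}"
```

In a real repo you would read the two files from disk instead of inlining sample strings; any disagreement in the resulting dict is exactly the 503-shaped mismatch described above.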
3. Strip Streamlit config to the minimum safe baseline
Streamlit’s config docs are clear here:
- server.address controls the listen address
- server.port controls the actual server port
- browser.serverPort is not how you change the app port
- browser.serverAddress is the browser-facing address used for URL/CORS/XSRF purposes
- baseUrlPath is only for serving under a path prefix (Streamlit Document)
For Hugging Face, the clean baseline is:
[server]
address = "0.0.0.0"
port = 7860
headless = true
[browser]
gatherUsageStats = false
What I would remove first, unless you intentionally need them:
- browser.serverAddress
- browser.serverPort
- server.baseUrlPath
Those are common “looks fine in logs, broken in browser” settings in proxied deployments. (Streamlit Document)
4. Add a health check if you do not already have one
The official HF Streamlit Docker template uses:
HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
That is not proof HF requires the exact same instruction in every Space, but it is a very strong known-good reference. If you are on 7860, mirror the same pattern on 7860. (Hugging Face)
5. Replace the app with a trivial hello-world once
Use the smallest possible app:
import streamlit as st
st.title("health test")
st.write("ok")
If that still 503s after the port/config cleanup, suspicion shifts away from your application code and toward either the Space record or HF-side routing/state. Streamlit’s own troubleshooting guide recommends using a simple HTTP test to separate app/config problems from platform/network problems. (Streamlit Document)
6. Duplicate the Space to a new private slug
This is a good diagnostic, not superstition. HF officially supports duplicating Spaces, and public reports show cases where the same code works under a new slug while the original remains broken. If the duplicate works, the original Space is likely wedged or flagged. (GitHub)
How to classify the failure from the browser
Open DevTools and reload once.
- If / itself returns 503, that points more to HF routing/health not seeing a good backend target.
- If / loads but requests to /_stcore/stream fail, that points more to Streamlit proxy/WebSocket behavior. Public Streamlit issues around /_stcore/stream and path rewrites match that pattern. (GitHub)
Since you said every request returns 503, my read is that the problem is earlier than the WebSocket layer. That is an inference, but it is the simpler fit. (Streamlit Document)
What I would not assume
I would not assume this is rate limiting. HF documents rate limiting as 429, with RateLimit headers and five-minute windows. That does not match your symptom. (Hugging Face)
I would also not assume a global HF outage right now. The public status page currently shows the main services, including Spaces and Spaces Proxy, as online. That does not rule out a Space-specific bug, but it weakens the “platform-wide incident” theory. (Hugging Face Status)
I would also not assume intentional blocking from the 503 alone. HF can suspend or disable content, but the 503 by itself is not decisive evidence of that. (Hugging Face)
The strongest “good” baseline to compare against
If your goal is a Docker Space running Streamlit cleanly, compare your repo against this shape:
---
sdk: docker
app_port: 7860
---
EXPOSE 7860
HEALTHCHECK CMD curl --fail http://localhost:7860/_stcore/health
ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0"]
[server]
address = "0.0.0.0"
port = 7860
headless = true
That is a direct adaptation of HF’s documented Docker model plus the official Streamlit Docker template’s known-good health and bind pattern. (Hugging Face)
If you have to escalate to Hugging Face
Send one concise support note with:
- Space: PhillyMac/deku-corpus-builder
- exact UTC time of one failed request
- Root ID: 1-69b734ae-6e89e03645bc568f68d02530
- whether the repo is sdk: docker or sdk: streamlit
- whether a minimal hello-world app still 503s
- whether a duplicated Space works
- whether / fails directly or only Streamlit subrequests fail
The root id is useful because AWS traces can be correlated on the load-balancer side. HF’s policy pages also give you the right contacts if this turns out to be a moderation or account-state issue rather than a technical one. (AWS Documentation)
Bottom line
My best diagnosis is:
You are probably dealing with a deployment-shape mismatch, not a mysterious paused-state bug. The most likely mismatch is between HF’s routing expectations and what your Streamlit container is actually exposing. The first place to look is the quartet of sdk, app_port, EXPOSE, and Streamlit server.port/address. If those are already perfect, then the next most likely answer is a stuck Space record, and the duplicate-space test becomes very informative. (Hugging Face)