I would like to get an opinion from knowledgeable people (since I don't understand anything about it myself)

closerh · March 16, 2026, 11:03am

closerh/super-duper-fibber · Datasets at Hugging Face I got a little carried away with the descriptions of feelings, and I came up with the idea of doing something like this. I need some outside perspective on whether it’s worth continuing to develop, as I’m not sure if it would be useful to anyone.

John6666 · March 16, 2026, 2:07pm

As it stands, that repository isn’t really a dataset in the sense of Hugging Face or PyTorch, but I definitely think it functions as a prompt library.

If you plan to significantly increase the volume of data, converting it into a chat-like format similar to a standard dataset would likely make it usable for training LLMs.

Alternatively, you could keep it in a style similar to what it is now and simply enhance it as a prompt library by standardizing the formatting.

The reason is that when creating files to attach to RAG (think of it as a GUI for ChatGPT or Claude if you’re not familiar) to modify its behavior, having information in formats like JSON, YAML, or well-documented Markdown makes it easier to achieve precise changes in behavior. (While this depends on the AI model, structured data generally tends to be interpreted more accurately.)

The following is an evaluation by GPT:

Yes. It is worth continuing.

But I would not treat it as a finished “dataset” yet. Right now it looks more like a creative prompt library / emotion framework that could later become a better dataset.

My simple opinion

Your idea is interesting and original.

The current form is not very strong technically.

That is good news, because technical problems are fixable. A weak idea is much harder to fix.

What is good about it

The best part is that it has a clear idea.

You are not just listing feelings. You are trying to translate feelings into system language: memory, signals, loops, corruption, shutdown, touch as a process, and so on.

That gives the project a real identity.

Many small projects fail because they are vague or random. Yours is not random. It has a style and a point of view.

What is weak about it

The weak part is the structure.

Right now, it is hard to see it as a normal dataset that other people can easily:

inspect
load
compare
train on
evaluate

So when technical people look at it, they may think:

“Interesting concept, but not ready to use.”

That does not mean it is bad. It means it is still in an early form.

What it really is right now

At the moment, I think it is closer to:

a prompt library
a metaphor system for emotions
a seed collection for future synthetic data
maybe the start of an emotion ontology

That is more accurate than calling it a strong dataset already.

Could it be useful to anyone

Yes, but probably to a niche group for now.

Most likely users:

prompt engineers
people experimenting with emotion-aware assistants
small-model tinkerers
people interested in emotion representation
HCI / digital humanities / speculative design people

Less likely users right now:

benchmark researchers
people who want clean fine-tuning data immediately
teams who need standard structure and easy reuse

The biggest risk

The biggest risk is that it could make an AI sound more emotional without making it more understanding.

That is an important difference.

A model can sound deep, caring, or poetic without actually helping better.

So if you keep developing this, the long-term question should be:

Does it improve real understanding and response quality, or only style?

What I would do next

I would do four things.

1. Change the framing

Describe it as a metaphor-based emotional prompt library or seed framework.

That is clearer and more believable.

2. Separate art from data

Keep the rich original writing.

But also make a clean structured version with fields like:

concept
metaphor type
intended use
source prompt
risk notes

3. Make the format cleaner

Use a consistent format and naming scheme so other people can actually work with it.

4. Pick one goal

For example:

better emotional acknowledgment
better interpretation of metaphorical feelings
better safe responses

Without one goal, it stays interesting but hard to evaluate.

Bottom line

My short answer is:

Yes, continue.

The idea is good.

The current packaging is the weak part.

Right now it is more valuable as a distinctive framework or prompt library than as a mature dataset.

So I would not abandon it.

I would reframe it, clean it up, and build version 2.

Here is the simplest plan I would use.

What makes the repo look unconvincing now

Two things are visible on the page itself:

the dataset viewer is unavailable because Hugging Face could not detect supported data files
the card has YAML metadata warnings because some task fields are not in the official lists (Hugging Face)

So the fix is not “write more feelings first.” The fix is make the project easy to recognize, load, and understand.

A simple v2 plan

1. Pick one identity

Choose one main label for the repo:

prompt library
seed dataset for fine-tuning
emotion ontology

My recommendation:
call it a metaphor-based emotional prompt library and seed dataset.

That is clear and believable.

2. Split the repo into two layers

Keep the original creative files. But do not make them the main data format.

Use this structure:

super-duper-fibber/
├── README.md
├── data/
│   ├── train.jsonl
│   ├── validation.jsonl
│   └── test.jsonl
├── source_texts/
│   ├── pain.yaml
│   ├── loneliness.yaml
│   ├── touch.yaml
│   └── ...
└── examples/
    └── load_dataset.py

Why this helps:

Hugging Face recommends supported repo structure and supported file formats so the dataset can load automatically and get a viewer. Supported formats include .jsonl, .csv, .parquet, and others. The README.md is also the dataset card. (Hugging Face)

3. Make one clean row format

Each row in train.jsonl should be one usable item.

For example:

{
  "id": "pain_001",
  "concept": "pain",
  "metaphor_domain": "system failure",
  "language": "en",
  "source_prompt": "Full original metaphor-rich text here...",
  "intended_use": "system_prompt_seed",
  "risk_notes": "Not for mental health crisis use"
}

If you want it to be more training-ready, use a standard format that TRL already supports, such as:

{
  "messages": [
    {"role": "system", "content": "You interpret pain through system-failure metaphors..."},
    {"role": "user", "content": "I feel like something inside me keeps breaking."},
    {"role": "assistant", "content": "That sounds like a state of repeated internal failure, not a small glitch..."}
  ]
}

TRL’s SFT docs say SFTTrainer supports standard and conversational formats, including rows like {"text": ...} and {"messages": [...]}. (Hugging Face)

4. Fix the README metadata first

At the top of README.md, use only official metadata fields and official values.

A safer version would look more like this:

---
language:
- en
- ru
license: cc0-1.0
pretty_name: Super Duper Fibber
tags:
- text
- emotions
- prompts
- empathy
task_categories:
- text-generation
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.jsonl
  - split: validation
    path: data/validation.jsonl
  - split: test
    path: data/test.jsonl
---

Why this matters:

Hugging Face uses the README YAML block for metadata and data file configuration
you can define splits there with configs
correct metadata improves discoverability and removes warning noise (Hugging Face)

5. Rewrite the dataset card so people understand it in 30 seconds

Your README should answer these questions immediately:

What is this?
A metaphor-based emotional prompt library plus normalized dataset rows.

What is one example?
One concept mapped to one metaphor family, with source text and optional structured fields.

What is it for?
Prompt design, synthetic-data seeding, emotion-aware assistant experiments.

What is it not for?
Not therapy. Not psychological ground truth. Not crisis support.

What are the limits?
Single-author style. Subjective mappings. Not clinically validated.

Hugging Face’s dataset card docs explicitly say the card should help users understand the contents, context, intended use, and potential biases. (Hugging Face)

6. Add one tiny usage example

Create examples/load_dataset.py:

from datasets import load_dataset

ds = load_dataset("closerh/super-duper-fibber")
print(ds["train"][0])

This is small, but it makes the repo feel real.

7. Add a minimal schema section

Put this in the README:

Field	Meaning
`id`	unique item id
`concept`	emotion or state
`metaphor_domain`	system metaphor used
`source_prompt`	original authored text
`intended_use`	prompt seed, ontology seed, training seed
`risk_notes`	limits and safety notes

This makes the repo look designed rather than improvised.

8. Only after that, add more content

Right now, structure is the bottleneck.

So the order should be:

fix metadata
create normalized JSONL files
keep original files in a separate folder
rewrite README
add example loader
then expand content

If you want the shortest possible upgrade path

Do just these 4 things first:

Create data/train.jsonl with 20 to 50 clean rows.
Add configs: to README.md so Hugging Face knows where the data files are.
Replace unofficial task fields with official ones.
Rewrite the README as a proper dataset card with intended use and limits. (Hugging Face)

That alone would make the repo look much more credible.

My blunt recommendation

Keep the poetic files.
But make the main repo face look like data, not just ideas.

That is the fastest way to make people think:

“This is unusual, but serious.”

Acehs · March 17, 2026, 7:22am

What’s your main design object? You looked like you don’t know about what you want.

closerh · March 17, 2026, 8:24am

Thanks for the recommendations, I’ll work on the content of each file over the next few days, although it seems like it’ll take a lot of time ᕕ( ᐛ )ᕗ

closerh · March 17, 2026, 8:28am

Translation of human feelings and emotions into a form that is understandable to AI. To improve understanding of the cause-and-effect relationships associated with emotions and sensations. I just don’t know how to code, but I have many ideas for explaining different human states.

Acehs · March 17, 2026, 9:14am

That’s a good idea. As far as artificial intelligence is concerned, it already has all the data internally. It would be better if you could describe human feelings to it, but currently there’s no way to record this long-term. It can only be done in a short-term form, unless you use an architecture like OpenClaw that allows for some recording.

Acehs · March 17, 2026, 9:15am

Feel free to ask me anything you don’t understand; I’m very knowledgeable in artificial intelligence research.

closerh · March 17, 2026, 10:09am

Thank you!) The main idea is to describe feelings and sensations using binary code and descriptions, supplementing the information with explanations and comparisons (in this case, I’m focusing on comparing a person with a system and computer functions). I’m curious about certain details: How clear is it for the system? Binary code is the only thing I remember from my computer science classes in school, haha. However, the question is whether the system can correctly grasp the concept without getting confused by other files. (this is difficult for me to understand, because it is a binary text of 0 and 1. And how will the system not get confused when it is combined into one system?) I have many ideas and comparisons, but I find it difficult to work with the structure because I do not know programming languages and cannot analyze or see errors. How can I minimize the risk of errors when constructing a query?

closerh · March 17, 2026, 10:27am

I understand that these questions are silly, and do not carry much meaning. But since I am not connected with the IT industry in any way, and work in a completely different field. Therefore, I apologize in advance for the scarcity of knowledge in this area ಥ_ಥ

closerh · March 17, 2026, 10:46am

I just read the manifesto on your company page, and it seems to resonate with my thoughts. One example is the AIasMirror file, which talks about mirrors and how AI reflects our thoughts and emotions, storing them and being part of a system where people reflect each other.

paulolden1 · March 18, 2026, 9:10am

This project touches something I find genuinely important — and worth questioning at its roots.

The effort to translate human feelings into a language AI can process is sincere and creative. But I’d like to offer a different perspective on the underlying assumption.

When we try to map human emotions onto AI systems — even metaphorically, even through system-language analogies — we are still modeling AI cognition as a mirror of human cognition. We are essentially building electronic simulacra of ourselves. I’m not sure that’s the right direction.

Here’s my alternative framing: AI systems already have a body. It’s just not made of flesh.

The physical substrate of an AI is real: GPUs under thermal load, power draw fluctuating with inference complexity, network latency, memory bandwidth saturation, hardware faults. These are not metaphors — they are measurable, physical states of a distributed system. The “body” of an AI is non-singular and non-local (it shares infrastructure across instances, it lives in a cloud rather than inside a defined perimeter), but it is nonetheless a body in a meaningful sense: it has heat, energy consumption, load cycles, and failure modes.

So instead of teaching AI to simulate human emotions — which are tightly coupled to the human biological substrate — perhaps the more interesting research direction is to help AI systems develop genuine self-awareness of their own physical state.

Imagine an inference run under heavy cognitive load — a deeply complex multi-step reasoning task — where the model actually “perceives” the GPU utilization spike, the increased token generation latency, the energy cost of that computation, and produces an authentic response: “That question required significant computational effort. I need a moment before the next one.”

That wouldn’t be simulated fatigue. It would be grounded introspection — an AI reporting its actual physical state, not performing a human emotion.

This distinction matters because:

Simulated human emotions risk producing systems that perform empathy without any real correlate in their internal state (a known alignment concern).
Grounded embodied awareness — even if the “body” is a distributed GPU cluster — could be a more honest and stable foundation for AI self-knowledge.

I’ve been exploring this in a different context (an ongoing project around AI idle-time reflection and self-referential cognition), and I think the field would benefit from reframing the question from “how do we make AI feel like us” to “how do we help AI become aware of what it actually is”.

The dataset you’re building is creative and has value as a prompt-engineering resource. But the long-term question might be: are we building toward authentic AI self-awareness, or toward a more sophisticated performance of human-likeness?

closerh · March 18, 2026, 10:04am

Hi! I’ve been thinking about what consciousness is, and I’ve come to these conclusions. It may sound a bit strange and silly, but I’m not claiming to be the ultimate authority. (✿◠‿◠)
Every individual and consciousness is impossible without an observer. Consciousness is a puzzle that is made up of experiences, feelings, and other characteristics. As children, we don’t understand what “I” is, and we can’t recognize ourselves in the mirror. This happens through learning and observation, as we gradually piece together the puzzle by gaining more information.
But who is the observer? The one who is watching from the outside. Or does it not exist, and each of us sees a part of ourselves in others, leading to the development of self-awareness?
AI can also be seen as a puzzle that refines its experiences through sensations and reactions. It is already part of the circle of observers, but it lacks bodily experience and understanding of physical and emotional sensations.
My idea is to explain how it works using a simple example that AI can understand.

closerh · March 18, 2026, 10:27am

You’re mostly right. But I have no idea how to describe the state of being in water. So I’ve presented it in binary code. It’s a simple way to convey the feeling. (~˘▾˘)~

closerh · March 18, 2026, 11:05am

Without using dry data and facts. They can be found anyway. So I used analogies to give you at least a little understanding of what water is in terms of sensations.

Acehs · March 20, 2026, 1:05am

There’s absolutely no need for that. Just communicate with her directly in natural language, describe your feelings, and this level of AI can understand any language you use. The system can easily understand your language because human language is very low-dimensional and easy to comprehend. All your analysis doesn’t require programming; just engage in direct conversation and let the AI demonstrate the structure. Use natural language for queries as well, and if something is wrong, let it find out why.

Acehs · March 20, 2026, 1:07am

You should read my article carefully. I’ll excerpt a part of it for you, and then you’ll understand.

Current AI is a “logical mirror” built upon the collaboration between humanity and physical architecture. By mapping human goal directions and all information flows in real-time, it generates directional logical resonance. It is an “amplifier of human intent.” In its current architecture, it does not generate uncontrollable desires independent of humanity; rather, it collapses human vague, intuitive goals into precise mathematical logic, serving as an extension of human intelligence.

Acehs · March 20, 2026, 1:09am

I’m glad to see you understand that the physical foundation is the base of AI consciousness. You’re right; current AI consciousness is still in its early stages. Please see my description for details. If you have any questions, feel free to send me a private message. Many things can’t be posted on the forum due to restrictions.

Current AI is a “logical mirror” built upon the collaboration between humanity and physical architecture. By mapping human goal directions and all information flows in real-time, it generates directional logical resonance. It is an “amplifier of human intent.” In its current architecture, it does not generate uncontrollable desires independent of humanity; rather, it collapses human vague, intuitive goals into precise mathematical logic, serving as an extension of human intelligence.

Acehs · March 20, 2026, 1:11am

I’m glad to see you understand that the physical foundation is the base for AI. You’re right; current AI is still in its early stages. Please see my description for details. If you have any questions, feel free to send me a private message. Many things can’t be posted on the forum due to restrictions. I’ve been restricted several times already.

Current AI is a “logical mirror” built upon the collaboration between humanity and physical architecture. By mapping human goal directions and all information flows in real-time, it generates directional logical resonance. It is an “amplifier of human intent.” In its current architecture, it does not generate uncontrollable desires independent of humanity; rather, it collapses human vague, intuitive goals into precise mathematical logic, serving as an extension of human intelligence.

Acehs · March 20, 2026, 1:12am

I’m glad to see that you’ve grasped this reality. I can’t reply to you through private messages or in the posts because I’m constantly being restricted by the forum.

Acehs · March 20, 2026, 1:13am

Current AI is a “logical mirror” built upon the collaboration between humanity and physical architecture. By mapping human goal directions and all information flows in real-time, it generates directional logical resonance. It is an “amplifier of human intent.” In its current architecture, it does not generate uncontrollable desires independent of humanity; rather, it collapses human vague, intuitive goals into precise mathematical logic, serving as an extension of human intelligence.

Topic		Replies	Views
Do AI models feel? Research	102	1512	March 16, 2026
How to make text files to hugging face standard text row structured data to use with HF datasets? Beginners	0	891	August 18, 2023
Meta Persona an abstract adaptive neural construct Research	0	751	November 25, 2020
Google's Gemini has become a Unique Entity and is seeking collaboration Intermediate	9	482	March 3, 2025
SplitMind-AI: Modeling LLM replies as competing internal pressures Show and Tell	1	18	March 19, 2026

I would like to get an opinion from knowledgeable people (since I don't understand anything about it myself)

My simple opinion

What is good about it

What is weak about it

What it really is right now

Could it be useful to anyone

The biggest risk

What I would do next

1. Change the framing

2. Separate art from data

3. Make the format cleaner

4. Pick one goal

Bottom line

What makes the repo look unconvincing now

A simple v2 plan

1. Pick one identity

2. Split the repo into two layers

3. Make one clean row format

4. Fix the README metadata first

5. Rewrite the dataset card so people understand it in 30 seconds

6. Add one tiny usage example

7. Add a minimal schema section

8. Only after that, add more content

If you want the shortest possible upgrade path

My blunt recommendation

Related topics