[EN] A few ideas for a next twtxt version #twtxt

Created: 2024-11-06

Updated: 2025-03-03

Motivation

I first discovered twtxt on Antenna back in 2021.

This led me to other related projects, like:

https://twtxt.net
https://twtxt.dev
https://darch.dk/timeline/

I’ve always appreciated the unique, "hacky" aspects of twtxt. Despite a few inconveniences compared to other micro-blogging proposals, its philosophy has always appealed to me.

twtxt 1.0 is a proposal from 2016 that has grown through many community-suggested fixes and backward-compatible patches, such as the extensions. I’d call the result ‘twtxt 1.1’.

Recently I've been loosely following the efforts toward a "twtxt 2.0."

From what I’ve seen, these updates are good proposals, although they raise the dilemma of breaking compatibility with that ‘twtxt 1.1’.

Follow-up on twtxt v2 (October 8, 2024) - Server down as of 2025-03
Archive.org mirror of Follow-up on twtxt v2
sorenpeter’s ideas about a simplified twtxt (October 24, 2024)

I’ve also been thinking about a few ideas. Since we are breaking things anyway, I’ll leave them here in case they help nurture the conversation.

Ideas for a 2.0 version

1. Better support for twt metadata

Currently there is no standard way to store metadata for a twt. In what language is it written? Has it been edited? Is it NSFW?

How about something like

{key_1=value,key_2=value}

Some possible keys: lang=[ISO 639-1] (language), spoiler=true/false, and edited=[EDIT_COUNT] (whether the twt has been edited, and which revision it is on).
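As a rough illustration, a client could split such a block from the rest of the twt like this. It is only a sketch: the function name `split_metadata` is made up here, and it assumes the block sits at the very start of the text and that keys and values never contain commas or braces.

```python
import re

# Assumption: an optional "{k=v,k=v}" block opens the twt text.
_META = re.compile(r"^\{([^{}]*)\}\s*(.*)$", re.DOTALL)


def split_metadata(text):
    """Return (metadata dict, remaining text) for one twt body."""
    match = _META.match(text)
    if not match:
        return {}, text
    meta = {}
    for pair in match.group(1).split(","):
        key, sep, value = pair.partition("=")
        if sep:  # ignore fragments that have no "="
            meta[key.strip()] = value.strip()
    return meta, match.group(2)
```

All values stay strings; deciding that `edited` is a number or `spoiler` a boolean would be up to whatever the final spec says.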

2. Make it easier to reply and to manage edits

Replying currently requires calculating the hash of each twt, which prevents editing a twt: its hash would change and break the thread.

https://twtxt.dev/#mentions-and-threads

My proposal would be to hash the URL of the feed we are replying to instead, and to give each twt a consecutive integer ID, placed before the date.

# follow = nick https://example.com/twt.txt OOEODEHW
1   2024-11-01T09:00:00Z   Hello...
2   2024-11-01T09:01:00Z   {url=OOEODEHW,id=1} ...world!
3   2024-11-01T09:01:00Z   {url=OOEODEHW,id=2} A reply to “...world!”
4   2024-11-01T09:02:00Z   {url=OOEODEHW,id=1} A reply to 'Hello', creating a new thread
5   2024-11-01T09:10:00Z   {edited=1} A twt modified once
6   2024-11-01T09:11:00Z   {url=OOEODEHW,id=5} This way the thread is not broken even if the content of twt 5 is edited.
8   2024-11-01T09:12:00Z   Twt #7 was deleted; we assume that because there is an 8th
9   2024-11-01T09:13:00Z   {edited=99} A twt modified 99 times, WTH!
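
To make the sample above concrete, here is a small sketch of how a client might read such a file and infer deletions from gaps in the IDs. The names `parse_feed` and `deleted_ids` are hypothetical, and the sketch assumes the three fields are separated by whitespace and that every twt has some text.

```python
def parse_feed(lines):
    """Map each twt ID to its (timestamp, text), skipping comments and blanks."""
    twts = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: "ID  TIMESTAMP  TEXT", split on runs of whitespace.
        twt_id, timestamp, text = line.split(None, 2)
        twts[int(twt_id)] = (timestamp, text)
    return twts


def deleted_ids(twts):
    """IDs missing from the sequence 1..max are assumed to be deleted twts."""
    if not twts:
        return []
    return [i for i in range(1, max(twts) + 1) if i not in twts]
```

With the file above, `deleted_ids` would report 7, since IDs 1 to 9 are expected and 7 is absent.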

Also, I'm proposing that the hash be based on SHA-256 (from what I've researched, it is resistant enough to collisions and more backward-compatible than BLAKE), using 8 characters to support a space of about 1.1×10¹², or a collision chance of ~0.5% if we have 99,999 URLs, based on a quick calculation:

Hash Collision Calculator
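
The numbers can be reproduced with a short sketch. The encoding is an assumption on my side: the post does not name one, but 8 base32 characters give 32⁸ ≈ 1.1×10¹² values, which matches the figure above, so the sketch uses base32. The function names are made up, and the probability uses the usual birthday-problem approximation.

```python
import base64
import hashlib
import math


def url_hash(url, length=8):
    """First `length` base32 characters of the SHA-256 digest of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Assumption: base32 (A-Z, 2-7); the proposal does not fix an encoding.
    return base64.b32encode(digest).decode("ascii")[:length]


def collision_probability(n_urls, length=8, alphabet=32):
    """Birthday approximation: 1 - exp(-n(n-1) / (2N)), with N = alphabet**length."""
    space = alphabet ** length
    return 1 - math.exp(-n_urls * (n_urls - 1) / (2 * space))
```

Under these assumptions, `collision_probability(99_999)` comes out at roughly 0.45%, in line with the ~0.5% estimate.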

Update 2025-03: Now I think that instead of

{url=OOEODEHW,id=2}

something like

{reply_to=OOEODEHW_2}

could be better.
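
A sketch of how a client could read that combined value back, assuming the hash never contains an underscore (true for base32 or hex) and the ID is a plain integer. `parse_reply_to` is a hypothetical name.

```python
def parse_reply_to(value):
    """Split 'HASH_ID' into (feed hash, integer twt ID)."""
    # Split on the last underscore so the ID is always the trailing part.
    feed_hash, sep, twt_id = value.rpartition("_")
    if not sep or not feed_hash or not twt_id.isdigit():
        raise ValueError(f"not a valid reply_to value: {value!r}")
    return feed_hash, int(twt_id)
```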

3. Avoid downloading the entire history for each refresh

I'd say that the twtxt spec is not clear enough on how to frequently load files from the server.

My initial approach for the ‘twtxt-php’ client was to request the .txt files every X minutes and save them in a cache. Although 200 KB is not much by current Internet standards, constantly reloading files with thousands of twts is wasteful. After some research, I'm now sure that implementation was wrong.

I wasn’t aware of the HTTP header `If-Modified-Since`: the client sends the date of its local copy, and the server is responsible for telling you whether the resource has been modified since then. If it hasn't, you get a "304 Not Modified" response and don't have to download the whole file.

https://taoshu.in/net/http-download-only-changed-using-wget-or-curl.html
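
For illustration, a conditional request with Python's standard library could look like the sketch below. It is not how any existing twtxt client does it; `fetch_if_modified` and its `opener` parameter (there only so the function can be tried without a network) are my own names. Note that `urllib` reports a 304 as an `HTTPError`, so the "not modified" case is handled in the `except` branch.

```python
import urllib.error
import urllib.request


def fetch_if_modified(url, last_modified=None, opener=urllib.request.urlopen):
    """Return (body, last_modified); body is None when the server answers 304.

    `last_modified` is the Last-Modified value saved from the previous fetch.
    """
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as error:
        if error.code == 304:  # unchanged: keep using the cached copy
            return None, last_modified
        raise
```

The client stores the returned `Last-Modified` value next to its cached file and passes it back on the next refresh.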

From a quick look at the source code, Yarn, a popular platform, doesn't use that approach either. (Please correct me if I’m wrong, so I can update this post.)

Checking the source code of the original Python client, we see that it does, but it’s not well documented:

https://github.com/buckket/twtxt/blob/master/twtxt/twhttp.py#L78

4. Moving discoverability from web server log files to an endpoint

The original spec suggests using HTTP headers and logs:

https://twtxt.readthedocs.io/en/latest/user/discoverability.html

My client implemented neither part: reading the log files or sending the ‘User-Agent’ request header. In retrospect it’s a decent solution... I should have worked on that, but the advantages weren't clear to me, and it wasn't easy to retrieve the logs from PHP on shared hosting.

Also, with some protocols like Gemini (popular on the smol web) and Gopher (I don’t use it, but I think it's preferred in the retro-computing scene), we usually have neither the header nor the logs.

One proposal was to use WebMentions, but I’ve noticed they can be hard to implement, maintain, and aren't supported by many clients.

https://brainbaking.com/post/2023/05/why-i-retired-my-webmention-server/

My idea here is to have a simple place where you can say, "Hey, my .txt file at this URL has replies to yours. Check it out!"

# discovery_url = https://example.com/discovery/[URL]/
# discovery_url = gemini://example.com/discovery?[URL]
# discovery_url = https://mirror.example/discovery/[URL]/

For Web, it could be a form or a GET endpoint.

For Gemini I’m thinking of the ‘Input expected’ status code, like we see it in Antenna’s Send a transmission:

Gemini Spec - Input expected
https://portal.mozz.us/gemini/warmedal.se/~antenna/submit
gemini://warmedal.se/~antenna/submit
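
As a sketch of the client side, the `[URL]` placeholder from the `discovery_url` examples above could be filled with the percent-encoded address of the replying feed. The function name and the choice to encode every reserved character are assumptions; the same substitution works for the HTTP path form and the Gemini query form.

```python
from urllib.parse import quote


def discovery_request_url(template, feed_url):
    """Fill the [URL] placeholder with the percent-encoded feed URL."""
    if "[URL]" not in template:
        raise ValueError("template has no [URL] placeholder")
    # safe="" also encodes "/" and ":", so the feed URL stays a single segment.
    return template.replace("[URL]", quote(feed_url, safe=""))
```

For example, `discovery_request_url("gemini://example.com/discovery?[URL]", "https://me.example/twt.txt")` gives `gemini://example.com/discovery?https%3A%2F%2Fme.example%2Ftwt.txt` (`me.example` is a placeholder host).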

To avoid spam, you can check whether the URL has valuable content before adding it to your following list. But that's a whole new conversation.

5. Better support for different protocols and mirrors

The Web is not fully decentralized. A URL is a single point of failure. To avoid that, we should have mirrors of the same resources.

Also, the spec is not clear about what happens if a resource is down or if you are using a protocol other than HTTP/S.

My proposal would be to allow the ‘follow’ metadata to have multiple URLs, and to retrieve them in order.

# follow = nick https://example.com/twt.txt OOEODEHW
# follow = nick gemini://example.com/twt.txt
# follow = nick http://mirror.example/nick/twt.txt

In this case, if my client only supports Gemini, it’ll skip URLs until it finds one that uses the gemini protocol.

Or if it only supports HTTP/S and the first URL is down (a 404 or a timeout, for example), it will try the mirror.
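
Both cases can be sketched in a few lines. The names are hypothetical, and `fetch` stands for whatever download function the client already has; the only assumption is that it raises an `OSError` (which covers `urllib` errors and timeouts in Python) when a URL cannot be retrieved.

```python
from urllib.parse import urlsplit


def candidate_urls(urls, supported_schemes):
    """Keep the declared order, dropping URLs whose scheme the client lacks."""
    return [u for u in urls if urlsplit(u).scheme in supported_schemes]


def fetch_first_available(urls, supported_schemes, fetch):
    """Try each candidate in order and return (url, content) for the first success."""
    last_error = None
    for url in candidate_urls(urls, supported_schemes):
        try:
            return url, fetch(url)
        except OSError as error:  # URLError, HTTPError and timeouts are OSErrors
            last_error = error
    raise LookupError("no mirror could be retrieved") from last_error
```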

Same for other resources such as:

# avatar = https://example.com/picture.png
# avatar = gemini://example.com/picture.png
# avatar = http://mirror.example/nick/my_picture.png
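
Reading repeated keys is straightforward if the client keeps a list per key instead of a single value. A minimal sketch, with a made-up function name, assuming every metadata line has the form `# key = value`:

```python
from collections import defaultdict


def parse_headers(lines):
    """Collect `# key = value` comment lines; repeated keys keep file order."""
    headers = defaultdict(list)
    for line in lines:
        if not line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, sep, value = line[1:].partition("=")
        if sep and key.strip():
            headers[key.strip()].append(value.strip())
    return dict(headers)
```

The resulting `avatar` list can then be tried in order, the same way as the `follow` URLs.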

6. Better support for multi-language files

I speak English and Spanish, and I’m learning French. I decided to keep a single file for each language, although on X and Masto I switch quickly between those three languages, something twtxt doesn't support.

My proposal would be to have the following:

# default_lang = en
1   2024-11-01T09:00:00Z   {lang=en} A twt where we override the default file language
2   2024-11-01T09:01:00Z   {lang=es} Un twt en español
3   2024-11-01T09:02:00Z   {lang=fr} Un twt en français
4   2024-11-01T09:02:00Z   This twt is assumed to be in English by the client

Use the ISO two-letter codes, as in this standard:

ISO 639-1 standard

That way, the client can hide content in languages I don't know, or suggest a translation.
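
A rough sketch of the client side: tag each twt with its language, falling back to `# default_lang` when the twt has no `lang` key, and let the reader filter on that. The function name is invented, and the line format (ID, timestamp, then text with an optional leading `{...}` block) is taken from the samples in this post.

```python
import re

_LANG = re.compile(r"^\{[^{}]*\blang=([a-z]{2})\b[^{}]*\}")
_DEFAULT = re.compile(r"^#\s*default_lang\s*=\s*([a-z]{2})\s*$")


def twts_by_language(lines, fallback=None):
    """Yield (lang, line) per twt, applying `# default_lang` when lang is unset."""
    default = fallback
    for line in lines:
        header = _DEFAULT.match(line)
        if header:
            default = header.group(1)
            continue
        if not line.strip() or line.startswith("#"):
            continue
        # Assumption: "ID  TIMESTAMP  TEXT", split on runs of whitespace.
        _, _, text = line.split(None, 2)
        tagged = _LANG.match(text)
        yield (tagged.group(1) if tagged else default), line
```

A reader who knows English and Spanish could then keep only the lines where `lang in {"en", "es"}` and offer a translation for the rest.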

7. Keep the UTC offset

As in

twtxt file - Format specification

I think the offset is not difficult to calculate (add N hours to UTC), and it helps to know the local time zone where the twt was written.

2024-09-23T09:26:48-06:00	I’m in Central Time
2024-09-23T09:26:48+01:00	I traveled abroad! ✈️
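
In Python, for instance, keeping the offset costs nothing, because the standard library parses it directly. A small sketch (`local_and_utc` is a made-up name) that returns both the author's local time and the UTC instant used for sorting:

```python
from datetime import datetime, timezone


def local_and_utc(timestamp):
    """Parse a timestamp with an offset; return (local time, same instant in UTC)."""
    # Older Python versions (before 3.11) do not accept a trailing "Z".
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    local = datetime.fromisoformat(timestamp)
    return local, local.astimezone(timezone.utc)
```

For the first line above, the local time stays 09:26:48 with offset -06:00, and the UTC instant is 15:26:48.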

8. Better support for CLI clients

Although I don’t use a CLI nowadays, I thought it was a good idea to support Gemtext or viewing twts in the terminal. There we can’t show the avatar image, so an emoji would help to distinguish between users.

# emoji = 🍓

For example:

https://portal.mozz.us/gemini/station.martinrue.com/
gemini://station.martinrue.com/

9. Keep UTF-8

Yeah, please keep using UTF-8, as in the original spec.

twtxt file - Format specification

~~~

And that’s basically all I can think of for now. Let me know if something resonates with you.

⁂

EOT

---

How to send a reply

Send your comments to

text.eapl.mx.mebiu [at] slmail.me

My Microblogging