I first discovered twtxt on Antenna back in 2021
gemini://warmedal.se/~antenna/
This led me to other related projects, like:
https://twtxt.net
https://twtxt.dev
https://darch.dk/timeline/
I’ve always appreciated the unique, "hacky" aspects of twtxt. Despite a few inconveniences compared to other micro-blogging proposals, its philosophy has always appealed to me.
twtxt 1.0 is a proposal from 2016 that has grown through many community-suggested fixes and backward-compatible patches, such as the extensions. I’d call the result ‘twtxt 1.1’.
Recently I've been loosely following the efforts toward a "twtxt 2.0."
From what I’ve seen, these updates are good proposals, although there’s the dilemma of breaking compatibility with that twtxt 1.1.
Follow-up on twtxt v2 (October 8, 2024)
sorenpeter’s ideas about a simplified twtxt (October 24, 2024)
I’ve been thinking about a few ideas as well. If we are breaking things anyway, I’ll leave them here in case they help nurture the conversation.
Currently there is no standard to store metadata for a tweet. In what language is it written? Has it been edited? Is it NSFW?
How about something like
{key_1=value,key_2=value}
Some keys could be lang=[ISO 639-1] (language), spoiler=true/false, and edited=[EDIT_COUNTS] (whether the twt has been edited, and which revision it is on).
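A minimal sketch of how a client could read such a block, assuming the metadata sits at the very start of the twt text and that values never contain commas or braces. The helper name `parse_metadata` is hypothetical and not part of any spec.

```python
import re

# Assumption: the optional {key=value,...} block is the first thing in the twt body.
META_RE = re.compile(r"^\{([^{}]*)\}\s*")

def parse_metadata(body: str) -> tuple[dict[str, str], str]:
    """Split a twt body into (metadata dict, remaining text)."""
    match = META_RE.match(body)
    if not match:
        return {}, body
    meta = {}
    for pair in match.group(1).split(","):
        key, sep, value = pair.partition("=")
        if sep:  # ignore malformed fragments that have no "="
            meta[key.strip()] = value.strip()
    return meta, body[match.end():]

meta, text = parse_metadata("{lang=es,edited=2} Un twt en español")
print(meta)  # {'lang': 'es', 'edited': '2'}
print(text)  # Un twt en español
```

A twt without a leading block comes back unchanged with an empty dict, so older feeds would still parse.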
Replying currently requires calculating the hash of each twt, which rules out editing a twt because its hash would change.
https://twtxt.dev/#mentions-and-threads
My proposal would be to hash the URL we are replying to instead, and to give each twt a consecutive integer ID placed before the date.
# follow = nick https://example.com/twt.txt OOEODEHW
1 2024-11-01T09:00:00Z Hello...
2 2024-11-01T09:01:00Z {url=OOEODEHW,id=1} ...world!
3 2024-11-01T09:01:00Z {url=OOEODEHW,id=2} A reply to “...world!”
4 2024-11-01T09:02:00Z {url=OOEODEHW,id=1} A reply to 'Hello', creating a new thread
5 2024-11-01T09:10:00Z {edited=1} A twt modified once
6 2024-11-01T09:11:00Z {url=OOEODEHW,id=5} This way the thread is not broken even if the content of twt 5 is edited.
8 2024-11-01T09:12:00Z The twt #7 was deleted, we assume that since there is a 8th
9 2024-11-01T09:13:00Z {edited=99} A twt modified 99 times, WTH!
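A rough sketch of how a client might resolve replies under this scheme. It assumes every twt line has the shape `<id> <timestamp> [{metadata}] <text>` as in the example above; `parse_feed` and `replies_to` are hypothetical names.

```python
import re

# Assumed line shape: "<id> <timestamp> [{key=value,...}] <text>"
LINE_RE = re.compile(r"^(\d+)\s+(\S+)\s+(?:\{([^{}]*)\}\s*)?(.*)$")

def parse_feed(lines):
    """Return {id: {"time", "meta", "text"}}; comment lines such as "# follow" are skipped."""
    twts = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        twt_id, time, raw_meta, text = m.groups()
        meta = dict(p.split("=", 1) for p in (raw_meta or "").split(",") if "=" in p)
        twts[int(twt_id)] = {"time": time, "meta": meta, "text": text}
    return twts

def replies_to(twts, url_hash, target_id):
    """IDs of the twts whose metadata points at (url_hash, target_id)."""
    return [i for i, t in twts.items()
            if t["meta"].get("url") == url_hash and t["meta"].get("id") == str(target_id)]

feed = [
    "# follow = nick https://example.com/twt.txt OOEODEHW",
    "1 2024-11-01T09:00:00Z Hello...",
    "2 2024-11-01T09:01:00Z {url=OOEODEHW,id=1} ...world!",
    "4 2024-11-01T09:02:00Z {url=OOEODEHW,id=1} A reply to 'Hello', creating a new thread",
    "5 2024-11-01T09:10:00Z {edited=1} A twt modified once",
    "6 2024-11-01T09:11:00Z {url=OOEODEHW,id=5} Still attached after the edit",
]
twts = parse_feed(feed)
print(replies_to(twts, "OOEODEHW", 1))  # [2, 4]
print(replies_to(twts, "OOEODEHW", 5))  # [6]
```

Because the lookup key is the pair (URL hash, ID) rather than a hash of the content, editing twt 5 does not detach reply 6.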
Also, I'm proposing that the hash be based on SHA-256 (from what I've researched, it is collision-resistant enough and more backward-compatible than BLAKE), using 8 characters to cover a space of about 1.1×10¹², which gives a collision chance of ~0.5% with 99,999 URLs, based on a quick calculation.
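The figures can be reproduced with a short script. This is only a sketch: it assumes the 8 characters are base32 (32⁸ ≈ 1.1×10¹² matches the stated space, but the encoding is not fixed in the text) and uses the usual birthday approximation p ≈ 1 − exp(−n(n−1)/(2N)). The function names are hypothetical.

```python
import base64
import hashlib
import math

def url_hash(url: str, length: int = 8) -> str:
    """First `length` base32 characters of the SHA-256 digest of the URL (assumed encoding)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")[:length]

def collision_probability(n_urls: int, length: int = 8) -> float:
    """Birthday-bound estimate of at least one collision among n_urls hashes."""
    space = 32 ** length  # 32 symbols per base32 character
    return 1.0 - math.exp(-n_urls * (n_urls - 1) / (2 * space))

print(url_hash("https://example.com/twt.txt"))
print(f"{32 ** 8:.2e}")                          # 1.10e+12
print(f"{collision_probability(99_999):.2%}")    # 0.45%
```

With 99,999 URLs the estimate comes out at about 0.45%, in line with the rounded ~0.5% above.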
I'd say the twtxt spec is not clear enough about how often clients should load files from the server.
My initial approach for the ‘twtxt-php’ client was to request the .txt files every X minutes and save them in a cache. Although 200 KB is not much by current Internet standards, I think constantly loading files with thousands of twts is wasteful. After some research, I'm now sure this implementation is wrong.
I wasn’t aware of the HTTP request header `If-Modified-Since`, which lets the server tell you whether the resource has changed since your local copy was fetched. If it hasn't, you get a "304 Not Modified" response and don't have to download the whole file.
https://taoshu.in/net/http-download-only-changed-using-wget-or-curl.html
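A minimal sketch of such a conditional request using only the Python standard library. The function name `fetch_if_modified` and the `opener` parameter (there only so the function can be exercised without a network) are assumptions of this example, not part of any twtxt client.

```python
import urllib.error
import urllib.request
from email.utils import formatdate

def fetch_if_modified(url, last_fetch_epoch=None, opener=urllib.request.urlopen):
    """Return the feed body, or None when the server answers 304 Not Modified."""
    request = urllib.request.Request(url)
    if last_fetch_epoch is not None:
        # HTTP dates are expressed in GMT, e.g. "Fri, 01 Nov 2024 09:00:00 GMT"
        request.add_header("If-Modified-Since", formatdate(last_fetch_epoch, usegmt=True))
    try:
        with opener(request, timeout=10) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return None  # the cached copy is still current
        raise
```

A client would store the time of its last successful fetch next to the cached file and pass it as `last_fetch_epoch`; a `None` result means the cache can be reused as-is.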
From a quick check of the source code, Yarn, a popular platform, doesn't use that approach either. (Please correct me if I’m wrong, and I'll update this post.)
The source code of the original Python client shows that it does, although this is not well documented:
https://github.com/buckket/twtxt/blob/master/twtxt/twhttp.py#L78
The original spec suggests using HTTP headers and logs:
https://twtxt.readthedocs.io/en/latest/user/discoverability.html
My client didn’t implement this: it neither read the log files nor sent the ‘User-Agent’ request header. In retrospect it’s a decent solution... I should have worked on that, but the advantages weren't clear to me, and it wasn't easy to retrieve the logs from PHP on shared hosting.
Also, for some protocols like Gemini (popular on the smol web) and Gopher (I don’t use it, but I think it's preferred in the retro-computing scene) we usually have neither the header nor the logs.
One proposal was to use WebMentions, but I’ve noticed they can be hard to implement, maintain, and aren't supported by many clients.
https://brainbaking.com/post/2023/05/why-i-retired-my-webmention-server/
My idea here is to have a simple place where you can say, "Hey, my .txt file at this URL has replies to yours. Check it out!"
# discovery_url = https://example.com/discovery/[URL]/
# discovery_url = gemini://example.com/discovery?[URL]
# discovery_url = https://mirror.example/discovery/[URL]/
For Web, it could be a form or a GET endpoint.
For Gemini, I’m thinking of the ‘Input expected’ status code, as seen in Antenna’s ‘Send a transmission’ page:
Gemini Spec - Input expected
https://portal.mozz.us/gemini/warmedal.se/~antenna/submit
gemini://warmedal.se/~antenna/submit
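As a sketch of the client side, the notification address could be built by filling the [URL] placeholder of a `discovery_url` entry. Percent-encoding the feed URL is an assumption of this example (the idea above doesn't settle how the URL is escaped), and `build_discovery_url` is a hypothetical helper.

```python
from urllib.parse import quote

def build_discovery_url(template: str, feed_url: str) -> str:
    """Fill the [URL] placeholder with the percent-encoded address of the replying feed."""
    return template.replace("[URL]", quote(feed_url, safe=""))

mine = "https://me.example/twt.txt"
print(build_discovery_url("https://example.com/discovery/[URL]/", mine))
# https://example.com/discovery/https%3A%2F%2Fme.example%2Ftwt.txt/
print(build_discovery_url("gemini://example.com/discovery?[URL]", mine))
# gemini://example.com/discovery?https%3A%2F%2Fme.example%2Ftwt.txt
```

The query-string form used for Gemini mirrors how an ‘Input expected’ prompt is answered. For the Web variant, some servers reject encoded slashes inside a path, so a query parameter might be the safer choice there.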
To avoid spam, you can check whether the URL has valuable content before adding it to your following list. But that's a whole new conversation.
The Web is not fully decentralized. A URL is a single point of failure. To avoid that, we should have mirrors of the same resources.
Also, it’s not clear in the spec what happens if a resource is down or if you are using a protocol other than HTTP/S.
My proposal would be to allow the ‘follow’ metadata to list multiple URLs, which clients try in order.
# follow = nick https://example.com/twt.txt OOEODEHW
# follow = nick gemini://example.com/twt.txt
# follow = nick http://mirror.example/nick/twt.txt
In this case, if my client only supports Gemini, it’ll skip the first URL and continue until it finds one using the gemini protocol.
Or, if it only supports HTTP/S and the first URL is down (404 or timeout, for example), it will try the mirror.
Same for other resources such as:
# avatar = https://example.com/picture.png
# avatar = gemini://example.com/picture.png
# avatar = http://mirror.example/nick/my_picture.png
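The fallback order described above could look roughly like this. The sketch assumes the client keeps one fetch function per protocol it supports; `fetch_first_available` and the `fetchers` mapping are hypothetical names.

```python
from urllib.parse import urlsplit

def fetch_first_available(urls, fetchers):
    """Try each URL in order; skip unsupported schemes and move on when a fetch fails.

    `fetchers` maps a scheme ("https", "gemini", ...) to a callable that returns
    the resource body or raises an exception (404, timeout, ...).
    """
    for url in urls:
        fetch = fetchers.get(urlsplit(url).scheme)
        if fetch is None:
            continue  # protocol not supported by this client
        try:
            return url, fetch(url)
        except Exception:
            continue  # resource is down, try the next mirror
    return None, None

urls = [
    "https://example.com/twt.txt",
    "gemini://example.com/twt.txt",
    "http://mirror.example/nick/twt.txt",
]

def fake_http(url):
    # Stand-in for a real HTTP fetch: pretend the primary host is down.
    if "mirror.example" not in url:
        raise TimeoutError(url)
    return "feed from mirror"

print(fetch_first_available(urls, {"https": fake_http, "http": fake_http}))
# ('http://mirror.example/nick/twt.txt', 'feed from mirror')
```

The same function would serve for `avatar` or any other metadata key that lists several URLs.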
I speak English and Spanish, and I’m learning French. I considered keeping a separate file for each language, but on X and Masto I switch quickly between those three languages, which twtxt doesn't support.
My proposal would be to have the following:
# default_lang = en
1 2024-11-01T09:00:00Z {lang=en} A twt where we override the default file language
2 2024-11-01T09:01:00Z {lang=es} Un twt en español
3 2024-11-01T09:02:00Z {lang=fr} Un twt en français
4 2024-11-01T09:02:00Z This twt is assumed in english by the client
Use the two-letter ISO 639-1 codes.
That way, the client may hide content in languages I don't know, or can suggest a translation.
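A small sketch of that filtering, assuming the `lang` key lives in a leading `{...}` block and that twts without it inherit `default_lang`. The names `twt_language` and `visible_twts` are hypothetical.

```python
import re

# Assumption: "lang=xx" appears inside a metadata block at the start of the twt text.
LANG_RE = re.compile(r"^\{[^{}]*\blang=([A-Za-z]{2})\b")

def twt_language(body: str, default_lang: str) -> str:
    """Language of one twt: its own lang key if present, else the feed default."""
    match = LANG_RE.match(body)
    return match.group(1).lower() if match else default_lang

def visible_twts(bodies, default_lang, known_langs):
    """Keep only the twts written in a language the reader understands."""
    return [b for b in bodies if twt_language(b, default_lang) in known_langs]

bodies = [
    "{lang=en} A twt where we override the default file language",
    "{lang=es} Un twt en español",
    "{lang=fr} Un twt en français",
    "This twt is assumed in english by the client",
]
for body in visible_twts(bodies, default_lang="en", known_langs={"en", "es"}):
    print(body)
```

Here the French twt is hidden, while the one without metadata is kept because it falls back to the feed's default language.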
As in
twtxt file - Format specification
I think the offset is not difficult to calculate (add N hours to UTC), and it helps to know the local time zone in which the twt was written:
2024-09-23T09:26:48-06:00 I’m in Central Time
2024-09-23T09:26:48+01:00 I traveled abroad! ✈️
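For what it's worth, standard libraries already handle these offsets. A short Python sketch (the helper name `parse_timestamp` is hypothetical) shows that a client can keep the author's local time and still sort twts by UTC:

```python
from datetime import datetime, timezone

def parse_timestamp(stamp: str) -> datetime:
    """Parse a twtxt timestamp into a timezone-aware datetime."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so normalise it.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))

central = parse_timestamp("2024-09-23T09:26:48-06:00")
abroad = parse_timestamp("2024-09-23T09:26:48+01:00")

print(central.astimezone(timezone.utc).isoformat())  # 2024-09-23T15:26:48+00:00
print(abroad.astimezone(timezone.utc).isoformat())   # 2024-09-23T08:26:48+00:00
print(abroad < central)                              # True
```

Both twts show the same wall-clock time, yet the one written abroad happened seven hours earlier.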
Although I don’t use a CLI nowadays, I thought it was a good idea to support Gemtext or reading twts in the terminal. We can’t show an image there, so an emoji would help to distinguish between users.
# emoji = 🍓
For example:
https://portal.mozz.us/gemini/station.martinrue.com/
gemini://station.martinrue.com/
Yeah, please use UTF-8, as in the original spec:
twtxt file - Format specification
~~~
And basically that’s what I can think of now. Let me know if something resonates with you.
EOT
---
Send me your comments to
text.eapl.mx.mebiu [at] slmail.me
or