• I just made the following request to the GNU coreutils team.

    From Annada Behera@annada@tilde.green to tilde.meta on Wed May 28 15:52:48 2025
    Dear GNU Coreutils maintainers,
    I am writing to propose a backward-compatible enhancement that could
    improve modern scripting environments while maintaining complete
    compatibility with existing workflows and without any impact on performance.
    PROBLEM:
    Although the output of coreutils is meant to be human-readable
    only, many scripts today pipe it to other commands for various
    kinds of automation. This leads to brittle solutions involving
    complex awk/sed/grep gymnastics that break when the output format
    changes slightly. While the "everything is text" philosophy has served
    GNU/Unix/Linux well, structured data processing has become important in
    modern computing.
    Even Microsoft recognized this more than 20 years ago and built
    structured output into PowerShell from day one, eliminating text parsing
    entirely. Cloud tools like Docker, kubectl, GitHub's gh, Google's gcloud,
    and an increasing number of other CLI tools provide JSON output flags,
    and shells like Nushell have reimplemented most of the coreutils to
    output structured data. This is not unprecedented in the industry.
    PROPOSAL: stdoutm and stderrm
    I would like to propose the addition of two new optional
    machine-readable output streams (in addition to the already present
    human-readable streams):
    - stdout (fd 1): human readable output
    - stderr (fd 2): human readable errors
    - stdoutm (fd 3): machine readable output (NEW)
    - stderrm (fd 4): machine readable errors (NEW)
    The machine-readable output format and conventions need to be
    established. JSON is the most obvious choice, with battle-tested parsers
    and tools already available to the scripting ecosystem. This
    could be implemented incrementally, starting with high-usage commands
    (ls, ps, df, du) and then gradually expanding coverage.
    If the structured output is generated only when fd 3/4 are open, there
    should be no performance penalty and all existing behavior will remain
    identical. It also doesn't require any flags or arguments.
    EXAMPLES:
    # Traditional usage - UNCHANGED
    ls -l
    # Structured output
    ls 3> metadata.json 1>/dev/null
    # Structured output scripting
    ls 3>&1 1>/dev/null | fx 'this.filter(x => x.size > 1048576)'
    ls 3>&1 1>/dev/null | jq '.[] | select(.size > 1048576)'
    # Traditional brittle approach (unreadable)
    ls -la | grep -v '^d' | awk '$5 > 1048576 {print $9}'
    # Structured error handling
    find / -name "*.txt" 4>&1 1>/dev/null | jq '.[] | select(.error == "EACCES")'
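    To make the fd-3 check above concrete, here is a minimal wrapper-script
    sketch (illustrative only; "lsj.sh" is a hypothetical wrapper, and the
    real change would live inside each utility, e.g. via a single
    fcntl(3, F_GETFD) call):
    #!/bin/sh
    # lsj.sh: normal listing on stdout; JSON listing on fd 3, but only if
    # the caller has already opened fd 3. No flags, no change otherwise.
    ls -l "$@"                             # fd 1: human-readable, unchanged
    if { true >&3; } 2>/dev/null; then     # is fd 3 open?
        {
            printf '['
            sep=''
            for f in "$@"; do
                [ -f "$f" ] || continue    # sketch only: regular files, no escaping
                printf '%s{"name":"%s","size":%s}' "$sep" "$f" "$(wc -c < "$f")"
                sep=','
            done
            printf ']\n'
        } >&3
    fi
    # Usage: sh lsj.sh *.txt                                  # fd 3 closed, human output only
    #        sh lsj.sh *.txt 3>&1 1>/dev/null | jq '.[].name'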
    This eliminates fragile regex-based approaches, provides
    structured error handling, and integrates with existing tools like
    fx, jq, and Python scripts, while making sure existing scripts are not
    affected at all (allowing a gradual transition to structured output).
    Would the maintainer team be interested in discussing this further?
    Thank you for your time and consideration.
    Annada
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From keyboardan@keyboardan@tilde.club to tilde.meta on Sat Jun 7 16:44:25 2025
    --=-=-=
    Content-Type: text/plain
    Content-Transfer-Encoding: quoted-printable

    I like this idea, Annada :-) .

    -- 
    The pioneers of a warless world are the youth that
    refuse military service. ~ Albert Einstein

    --=-=-=
    Content-Type: application/pgp-signature; name="signature.asc"

    -----BEGIN PGP SIGNATURE-----

    iQJKBAEBCgA0FiEEOVeKaEm0xBhCsMmYlk/BEMQK1XUFAmhEXloWHGtleWJvYXJk YW5AdGlsZGUuY2x1YgAKCRCWT8EQxArVddKWEAC9ENyToucDnj7ivsXc+MO+xlVk l9bgt4of02W7eWS5RQ/ViqVPA9D2oF6j4iv4mz48Pm/Q9Ye1K6qfCTUoLuTPBBdS 8N4YGXq670x4o/jVsr5Emk/MYLa2SQK7jw7zYCKgfrjAikBI1O0hbvePg4qMAqTt 7KtkAxRmtl6VlaxdjkAR2WLSqf5LqAmGOviBH1oMgHZjGx0JDZ07+Nb8tn/B2Kyv pGYU7/gDwHbJ89ysT/t8aPINRrUX2/dzoShuEEPjjuFv3euO3NwZfnfMHFUof3Tt 0jmD8Hirh9adQjTrJqhWtztCBRSuyuKMmWhm0YPgHmWZCEmlY1COl2zO3ue98poo Z/HedA21KsL/HrVu0FJj3KiWbsORJYhZ7fsbWezsXdojW5/ZL/0Mr0oJ5a+rcISB BAU75Yb+Cxh09ynHsAJoCC1aZn7Zz7wIxFiEaEDIEt+equbY+VFS920WK/7lra5W 9xTVcauX0yQFDiuIFizZWelBZNNCcCgeYzePWQE3EFEVfafccAWPd387wxWSrIEA xBFgnOAOYLUpCr5aNih2RFV10omv6XV2V98jLBPMXc6MjN09qPUu/1htTn8lHW9A FL79fqPeSlnvhro9I6OCxUsNU5q8PTeS7Z6QhjduXYBZI28xseru8aHj/s5MNzLS 9X9ytzH4zsEFz3Tmcw==
    =tYMW
    -----END PGP SIGNATURE-----
    --=-=-=--
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Anton Shepelev@ant@tilde.culb to tilde.meta on Sun Jul 20 19:56:53 2025
    keyboardan,

    When I view your post in the tin newsreader, installed on the ~club
    machine, it shows the following two attachments in addition to the
    direct text:

    [-- text/plain, size 0.1K, charset US-ASCII, 6 lines, quoted-printable --]
    [-- application/pgp-signature, name signature.asc, size 0.8K, 17, 7bit --]

    and says it is unable to open them. Are you sure your articles are
    standard Usenet messages?
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Anton Shepelev@ant@tilde.culb to tilde.meta on Sun Jul 20 20:15:47 2025
    Subject: Re: I just made the following request to GNU coreutils team.

    Please, share the URL to your proposal, that we may follow, and if need
    be, participate in its discussion.

    Annada Behera <annada@tilde.green> wrote:

    Although the output of coreutils are meant to be human readable
    only, many scripts today use/pipe them to other commands for various
    kinds of automation. This leads to brittle solutions involving
    complex awk/sed/grep gymnastics that break when the output format
    changes slightly. While "everything is text" philosophy has served
    GNU/Unix/Linux well, structured data processing has become important in
    modern computing.

    But pure text can also be structured and machine-oriented,
    rather than human-oriented, such as tab- or comma-separated files,
    which are /way/ simpler than JSON.
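    For instance (an illustrative one-liner), a tab-separated listing
    needs nothing beyond awk or cut:

    printf 'report.txt\t2048\nnotes.txt\t300\n' | awk -F'\t' '$2 > 1024 {print $1}'
    # prints: report.txt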

    I would like to propose the addition of two new optional machine
    readable output streams (in addition to already present human readable streams):

    - stdout (fd 1): human readable output
    - stderr (fd 2): human readable errors
    - stdoutm (fd 3): machine readable output (NEW)
    - stderrm (fd 4): machine readable errors (NEW)

    The machine readable output format and conventions needs to be
    established. JSON is the most obvious choice with battle-tested parsers
    and tools, and immediately available for the scripting ecosystem. This
    could be implemented incrementally, starting with "high-usage" commands
    like (ls, ps, df, du) and then gradually expand coverage.

    I think it is a good idea, but if the medium is JSON text, then repeated parsing is still there.
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From yeti@yeti@tilde.institute to tilde.meta on Sun Jul 20 19:00:02 2025
    Anton Shepelev <ant@tilde.culb> wrote:

    I think it is a good idea, but if the medium is JSON text, then
    repeated parsing is still there.

    That was why I did not reply yet. IMO using a blown-up text
    representation is counterproductive there.

    I would be happier with a second set of utilities only spitting out
    binary-coded serialised data. Maybe just give them a different prefix?
    `Bls`, `Bfind`, ...? That way the analogy to the usual commands would
    be kept, and that would help with transforming current text pipes into
    binary ones "good enough"?

    Something like

    MessagePack – It's like JSON, but fast and small.
    <https://msgpack.org/>

    may help there. That's not completely free of parsing, but looks much
    lighter.
    --
    I do not bite, I just want to play.
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From xwindows@xwindows@tilde.club to tilde.meta on Mon Jul 21 16:33:55 2025
    On Sun, 20 Jul 2025, Anton Shepelev wrote:

    [-- text/plain, size 0.1K, charset US-ASCII, 6 lines, quoted-printable --]
    [-- application/pgp-signature, name signature.asc, size 0.8K, 17, 7bit --]

    and says it is unable to open them. Are you sure your articles are
    standard Usenet messages?

    The fact that TIN correctly showed that as an "attachment" [1],
    as opposed to littering its Base64 content all over the main message,
    means that this is indeed a valid netnews article-- it is just formatted
    according to the MIME standard, rather than being an old-style bare
    single-part article.

    What this actually means is that you are reading a GPG/OpenPGP-signed
    netnews article: inside that "attachment" is just a few-line blob
    of Base64-encoded message-integrity metadata. This is not a new thing;
    from what I have heard, it has been in use on USENET too.

    If your newsreader does not [2] support OpenPGP-MIME [3],
    then it obviously can't do anything useful with this "attachment".
    In such a case, you can safely ignore any such "attachment" with the
    `application/pgp-signature` content type-- you're not missing anything
    that the poster was saying or trying to show. [4]

    Regards,
    xwindows


    [1] Scare-quoted "attachment" because it is not one; the content type
    of the main message is explicitly `multipart/signed`.
    Displaying a cryptographic signature as an attachment is just
    fallback handling in a MIME-supporting newsreader
    for generic `multipart/*` content types. [2]

    [2] This fact is obvious, since ones which do support this standard
    would not display the mysterious MIME part as an "attachment",
    but would instead flag the entire article as cryptographically signed,
    then proceed to show options for the user to verify that signature. [5]

    [3] RFC 3156: MIME Security with OpenPGP [Aug-2001]
    https://www.rfc-editor.org/rfc/rfc3156.html

    [4] Obligatory comic insert:
    https://www.explainxkcd.com/wiki/index.php/1181:_PGP

    [5] This is what was displayed when I navigated to the article you mentioned,
    using Claws Mail-- which supports GnuPG integration (my emphases in red):
    http://tilde.club/~xwindows/temp/2025-07-21/pgpsigned.png
    https://tilde.club/~xwindows/temp/2025-07-21/pgpsigned.png
    --
    xwindows' gallery of freely-licensed artworks
    https://tilde.club/~xwindows/ http://tilde.club/~xwindows/ gopher://tilde.club/1/~xwindows/
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From keyboardan@keyboardan@tilde.club to tilde.meta on Mon Jul 21 11:41:05 2025
    --=-=-=
    Content-Type: text/plain
    Content-Transfer-Encoding: quoted-printable

    xwindows <xwindows@tilde.club> writes:

    On Sun, 20 Jul 2025, Anton Shepelev wrote:

    [-- text/plain, size 0.1K, charset US-ASCII, 6 lines, quoted-printable --]
    [-- application/pgp-signature, name signature.asc, size 0.8K, 17, 7bit --]

    and says it is unable to open them. Are you sure your articles are
    standard Usenet messages?

    The fact that TIN correctly shown that as an "attachment" [1]
    as opposed to littering its Base64 content all over the main message,
    mean the this is indeed a valid netnews article-- it is just formatted according to MIME standard, rather than being old-style bare
    single-part article.

    What this actually mean is you are reading is a GPG/OpenPGP-signed
    netnews article: inside that "attachment" is just a few-line blob
    of Base64-encoded message integrity metadata. This is not a new thing,
    from what I have heard, it has been in-use on USENET too.

    If your newsreader does not [2] support OpenPGP-MIME [3],
    then it obviously can't do anything useful with this "attachment".
    In such case, you can safely ignore any of such "attachment" with `application/pgp-signature` content type-- you're not missing anything
    that the poster was saying or trying to show. [4]

    Regards,
    xwindows


    [1] Scare-quoted "attachment" because it is not; as the content type
    of the main message is explicitly `multipart/signed`.
    A display of cryptographic signature as attachment is just
    a fallback handling of in MIME-supporting newsreader
    for generic `multipart/*` content type. [2]

    [2] This fact is obvious, since ones which do support this standard
    would not display the mysterious MIME part as "attachment",
    and will instead flag this entire article as cryptographically-signed,
    then proceed to show options for user to verify that signature. [5]

    [3] RFC 3156: MIME Security with OpenPGP [Aug-2001]
    https://www.rfc-editor.org/rfc/rfc3156.html

    [4] Obligatory comic insert:
    https://www.explainxkcd.com/wiki/index.php/1181:_PGP

    [5] This is what displayed, when I navigated to the article you mentioned,
    using Claws Mail-- which supports GnuPG integration (my emphases in red):
    http://tilde.club/~xwindows/temp/2025-07-21/pgpsigned.png
    https://tilde.club/~xwindows/temp/2025-07-21/pgpsigned.png

    Thank you, xwindows.

    You have so much more patience, and your reply is very informative.

    What a "standard Usenet message" actually means... and where they read
    that concept from... remains in a fog of mystery...


    -- 
    The pioneers of a warless world are the youth that
    refuse military service. ~ Albert Einstein

    --=-=-=
    Content-Type: application/pgp-signature; name="signature.asc"

    -----BEGIN PGP SIGNATURE-----

    iQJKBAEBCgA0FiEEOVeKaEm0xBhCsMmYlk/BEMQK1XUFAmh+GUIWHGtleWJvYXJk YW5AdGlsZGUuY2x1YgAKCRCWT8EQxArVdfimD/9ZLDYs1ZW7mHz7NzzY5EGsp5MX wEcEWQ1hpWDXPAdpuZFB9jD94SJjhQ8mr2qkoJ2C/zjPhv5fBQM1AMmMN6PQIJJ3 YtG7g/Zvx/k/pE5pLVImDfS2YKJpi/X/BqgdR82PTXF2GFCL5kpHqkOiVejSmVzA h8CAa+CAslaTkA9k9Mn51mxKgev4xrhq3jVb69Jj/rAgkbvAGCJROiYHwksNORu1 e/rYKGDvUwB7tVhnKqEpBnmnr8uEekK4Ag/fRpo/Q2THBXsZD3fGByuwLCsDQm4D iBmIbcXJVttyEn2hUeOL+D8pCkZ7qsNK3N/MBrTvkALIt1PDB2eqxIu52mBLol+k s6TzpAy3XbiHRjlUTwrjdKN8EeljDdTIdpqmi/Bi5tMD92+T3ljfvv+wtOu4E4J7 dXiREiI6oEw/DJHVt/ztbdE9bWYomilA4/kRO8tKNt+SqZHWSXB1uaX6Pt9YQfI1 mX2o/rEbf9kjIVgIBly9sC0BaYuOg7pYxtGbYysxSTeFFW2CDvRzoiQjU+BPjLiC Tnkx+iejFzTpjqZsJpjWp4yUaF9L4ftm64xbUZrcnXTTcEFwIzhkXfa6M3MN2bok MWkM53SbHqIcsCfvouPyUcKGDKFvyGA6A3WCw6rUz6tatijspY+Pv14Mtzew0xlx W1mx47IKe7CU6gad1A==
    =l1pC
    -----END PGP SIGNATURE-----
    --=-=-=--
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From xwindows@xwindows@tilde.club to tilde.meta on Mon Jul 21 21:23:26 2025
    (Reposted due to incorrect threading on the last try)

    On Mon, 21 Jul 2025, keyboardan wrote:

    What a "standard Usenet message" actually means...

    Well, there have been several revisions of the Netnews message format;
    the majority of them actually predated the MIME specification by many years.

    The original netnews message format was first specified in 1983 (RFC 850),
    and the longest-living one was specified in 1987 (RFC 1036);
    MIME was officially introduced in 1992 (RFC 1341)--
    half a decade after it-- and was mainly used in the email context.

    There is actually only *one* version of the netnews message format
    specification that was released after MIME (and officially references it):
    the current edition from 2009 (RFC 5536).

    For people who actively use/used text-only USENET, I can understand
    the confusion about whether it is a valid format, because
    even regular multipart MIME messages are not common in that realm [1][2]
    (they are more common in binary-allowing USENET newsgroups,
    from what I understand); and it is likely quite rare to randomly
    find them used for cryptographic signing out there.

    The main reasons I am not surprised by this are:

    A. I have read the netnews message format specifications (both old and new),
    and thus know that the current version explicitly allows MIME formatting.
    B. I have heard about how people used PGP/GPG to sign netnews articles.
    C. I have read the MIME specification (just casually, not back to back).
    D. I kinda know what PGP/MIME-signed messages look like at the byte level,
    because I have used GPG-encrypted+signed emails before, and have actually
    tried "view source" on the result.

    I think the D. part is the most important reason; because in the past,
    I have also seen users (not normies, by the way) ask the same question,
    but in a *mailing list* context; ~ant is not alone in this one.

    Regards,
    ~xwindows


    [1] Unlike email, where attachments (a main use of multipart MIME messages)
    have been considered a basic user-level option since at least
    the mid-to-late 1990s.

    [2] The vast majority of posts in text USENET are made in the format I described
    in the grandparent post as an "old-style bare single-part article",
    i.e. RFC 1036 pre-MIME style. Such messages would sometimes have
    MIME headers, but they would display correctly in very old newsreaders
    that only supported old RFC 1036. (A MIME multipart article would
    collapse into a single part and look broken, and an article using
    MIME with a Base64 Content-Transfer-Encoding on the main part wouldn't
    be readable under such newsreaders at all.)
    --
    xwindows' gallery of freely-licensed artworks
    https://tilde.club/~xwindows/ http://tilde.club/~xwindows/ gopher://tilde.club/1/~xwindows/
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Anton Shepelev@ant@tilde.culb to tilde.meta on Mon Jul 21 18:52:24 2025
    xwindows <xwindows@tilde.club> wrote:
    On Sun, 20 Jul 2025, Anton Shepelev wrote:

    [-- text/plain, size 0.1K, charset US-ASCII, 6 lines, quoted-printable --]
    [-- application/pgp-signature, name signature.asc, size 0.8K, 17, 7bit --]
    and says it is unable to open them. Are you sure your articles are
    standard Usenet messages?

    The fact that TIN correctly shown that as an "attachment" [1]
    as opposed to littering its Base64 content all over the main message,
    mean the this is indeed a valid netnews article-- it is just formatted
    according to MIME standard, rather than being old-style bare
    single-part article.

    What this actually mean is you are reading is a GPG/OpenPGP-signed
    netnews article: inside that "attachment" is just a few-line blob
    of Base64-encoded message integrity metadata. This is not a new thing,
    from what I have heard, it has been in-use on USENET too.

    If your newsreader does not [2] support OpenPGP-MIME [3],
    then it obviously can't do anything useful with this "attachment".
    In such case, you can safely ignore any of such "attachment" with
    `application/pgp-signature` content type-- you're not missing anything
    that the poster was saying or trying to show. [4]

    In addition to the PGP part, it also complains about a .txt attachment
    with the contents of the article body, and that is rather annoying.
    I will see if I can reconfigure ~club's tin to stop complaining.
    Sylpheed, on the other hand, knows about PGP, and does not complain.

    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From ant@ant@~.club to tilde.meta on Tue Jul 22 02:00:59 2025
    xwindows:


    D. I kinda know what PGP/MIME-signed messages look like at
    byte level; because I have used GPG-encrypted+signed
    emails before, and have actually tried "view source" on
    the result.


    I think the D. part is the most important reason; because
    in the past, I have also seen users (not normies, by the
    way) asked the same question but in *mailing list*
    context; ~ant is not alone in this one.

    I have seen PGP e-mails, but never knew PGP was used in
    Usenet articles. Tin, my ~club newsreader, seems surprised as
    well...
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Annada Behera@annada@tilde.green to tilde.meta on Thu Jul 24 13:43:01 2025
    Please, share the URL to your proposal, that we may follow, and if need
    be, participate in its discussion.

    I sent it to their mailing list. Here is a link to the mailing-list
    archive:
    https://lists.gnu.org/archive/html/coreutils/2025-05/msg00013.html
    Annada Behera <annada@tilde.green> wrote:

    Although the output of coreutils are meant to be human readable
    only, many scripts today use/pipe them to other commands for various
    kinds of automation. This leads to brittle solutions involving
    complex awk/sed/grep gymnastics that break when the output format
    changes slightly. While "everything is text" philosophy has served
    GNU/Unix/Linux well, structured data processing has become important
    in
    modern computing.

    But pure text can also be structured and machine-oriented,
    rather than human-oriented, such as tab- or comma-separated files,
    which are /way/ simpler than JSON.

    Yes, pure text can be structured. The output of, say, ls is also structured,
    but we have to do brittle parsing with AWK/Perl regexes and run into a
    lot of edge cases too. I proposed JSON because we have some very good,
    battle-tested JSON parsers like jq and fx. We could use TSV/CSV, but the
    data is not guaranteed to be tabular, and CSV parsing with AWK has edge
    cases where it too breaks in unexpected ways. XML/TOML/YAML would be OK,
    but JSON is the most popular data format these days. We have a whole
    species of database engines built on top of JSON, which gives me
    more confidence. And since this is meant for machine readability anyway,
    the complexity of JSON shouldn't matter.
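    A trivial example of the kind of edge case I mean (illustrative): a
    quoted CSV field with an embedded comma defeats naive field splitting:

    printf '"big, important file.txt",2097152\n' | awk -F, '{print $2}'
    # prints ' important file.txt"' instead of the size 2097152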
    I would like to propose the addition of two new optional machine
    readable output streams (in addition to already present human readable
    streams):

       - stdout  (fd 1): human readable output
       - stderr  (fd 2): human readable errors
       - stdoutm (fd 3): machine readable output (NEW)
       - stderrm (fd 4): machine readable errors (NEW)

    The machine readable output format and conventions needs to be
    established. JSON is the most obvious choice with battle-tested
    parsers and tools, and immediately available for the scripting
    ecosystem. This could be implemented incrementally, starting with
    "high-usage" commands like (ls, ps, df, du) and then gradually expand
    coverage.

    I think it is a good idea, but if the medium is JSON text, then repeated
    parsing is still there.

    I don't understand what repeated parsing you are talking about; care to
    elaborate?
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From ant@ant@tilde.club to tilde.meta on Fri Jul 25 13:51:54 2025
    Annada Behera to Anton Shepelev:

    But pure text can also be structured and machine-
    oriented, rather than human-oriented, such as tab- or
    comma-separated files, which are /way/ simpler than
    JSON.

    Yes, pure text can be structured.

    And I value that, as TSV/CSV are much simpler than JSON and
    both human- and machine-readable. I should hate to lose
    them from my toolchains. And I do fear to lose them, as the
    advent of JSON will cause a gradual extinction of classical
    processing.

    Output of, say, ls is also structured but we have to do
    brittle parsing with AWK/Perl regex and run into a lot of
    edge-cases too.

    I understand what you mean. In addition to robustness, JSON
    brings a higher power of expression. At the expense of a
    more complicated and less human-readable format.

    With suitable settings, I believe one can parse ls, e.g.:

    <https://s.tilde.club/?file=3nwf>

    I think it is a good idea, but if the medium is JSON
    text, then repeated parsing is still there.

    I don't understand what repeated parsing you are talking
    about, care to elaborate.

    Consider the following toolchain, assuming JSON input and
    output:
    t1 | t2 | t3 | t4

    tools 2..4 parse the JSON from tools 1..3. That's repeated
    parsing for you, whereas a purely structural approach would
    parse JSON at the input of t1, process the data internally
    in its native form, and then generate JSON on output.
    Otherwise, JSON is repeatedly parsed and generated. In case
    of simple filtering functions, this is literally re-parsing
    the same JSON data.
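    For instance (illustrative; the field names assume whatever schema
    the fd-3 JSON would use):

    # Each stage parses and re-serialises the same data:
    ls 3>&1 1>/dev/null | jq '.[] | select(.size > 1048576)' | jq -s 'sort_by(.name)' | jq -r '.[].name'
    # A single stage parses once and does all the work internally:
    ls 3>&1 1>/dev/null | jq -r '[.[] | select(.size > 1048576)] | sort_by(.name) | .[].name'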
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Annada Behera@annada@tilde.green to tilde.meta on Mon Jul 28 15:22:20 2025
    Annada Behera to Anton Shepelev:

    But pure text can also be structured and machine-
    oriented, rather than human-oriented, such as tab- or
    comma-separated files, which are /way/ simpler than
    JSON.

    Yes, pure text can be structured.

    And I value that, as TSV/CSV are much simpler than JSON and
    both human- and machine-readable.  I should hate to lose
    them from my toolchains.  And I do fear to lose them, as the
    advent of JSON will cause a gradual extinction of classical
    processing.

    Output of, say, ls is also structured but we have to do
    brittle parsing with AWK/Perl regex and run into a lot of
    edge-cases too.

    I understand what you mean.  In addition to robustness, JSON
    brings a higher power of expression.  At the expense of a
    more compicated and less human-readable format.
    I am proposing a machine-only readable format. For instance, with `ls`,
    if human readability is important, just use 'ls' like everyone
    else. Human readability lives on 'stdout'. Fd 3 is strictly meant to be
    machine-read, not for human beings. 'ls 3>&1 1>/dev/null' is only meant
    for piping to other programs. If you can come up with a more expressive
    (potentially binary) format with battle-tested parsers like jq/fx, I am
    up for it. But at this point, JSON is so universal that anyone can look
    at the output and correctly guess which tool they need to parse it with.
    Again, my proposal is not to replace stdout with JSON output; my
    proposal is to leave the user-facing stdout untouched and, only when
    fd 3 is open, collect the necessary data into JSON and put it on fd 3.
    No performance overhead whatsoever if fd 3 is not open.
    With suitable settings, I believe one can parse ls, e.g.:

                 <https://s.tilde.club/?file=3nwf>
    Parse, yes. Robustly, handling the edge cases? Not likely. That is the
    entire point of format specifications, and there is none for ls' output.
    I think it is a good idea, but if the medium is JSON
    text, then repeated parsing is still there.

    I don't understand what repeated parsing you are talking
    about, care to elaborate.

    Consider the following toolchain, assuming JSON input and
    output:
                         t1 | t2 | t3 | t4

    tools 2..4 parse the JSON from tools 1..3.  That's repeated
    parsing for you, whereas a purely structural approach would
    parse JSON at the input of t1, process the data internally
    in its native form, and the generate JSON on output.
    Otherwise, JSON is repeatedly parsed and generated.  In case
    of simple filtering functions, this is literall parsing of
    the same JSON data.
    Interesting point. But a lot of tools in coreutils are essentially
    helpers around this specific human/machine-parsability issue. For
    example, `ls | wc -l` or `ls | sort` is essentially doing processing
    on the output of ls. In my world, you do your processing in a
    complete programming language of your choice. For instance, you'd do
    something like:
    ls 3>&1 1>/dev/null | fx 'this.length' # for ls | wc -l
    ls 3>&1 1>/dev/null | fx 'this.sort()' # for ls | sort
    See how fx uses JavaScript to do the processing, with the entire weight
    of JS behind any kind of processing you want. So, ideally, the
    '3>&1 1>/dev/null' should be at the end of your pipes. JS is just an
    example; choose your poison. This is language-agnostic as long as the
    language has a library that can parse JSON. And if you have a legacy
    pipeline that does something you have been relying on, then you leave it
    untouched, just append '3>&1 1>/dev/null', and continue with a real
    programming language from there.
    And honestly, I feel the repeated parsing concern is a little overstated.
    Unix pipelines are generally very short, and JSON parsers are extremely
    optimized and fast even for a computer from the 1990s, without any
    brittleness whatsoever. Don't let perfect be the enemy of good.
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From ant@ant@tilde.club to tilde.meta on Mon Jul 28 13:55:18 2025
    Annada Behera to Anton Shepelev:

    I understand what you mean. In addition to robustness,
    JSON brings a higher power of expression. At the
    expense of a more complicated and less human-readable
    format.

    I am proposing a machine-only readable format. For
    instance, in `ls` if human-readablility is important, just
    use 'ls' like everyone else. Human-parsibility is at
    'stdout'. Fd3 is strictly meant to be machine-read, not
    for human beings. 'ls 3>&1 1>/dev/null' only meant for
    piping to other programs.

    Yes, I see your point: you are proposing, as it were, a
    different signal band, a complete separation between human-
    and machine-oriented I/O. And I am a tad worried about this
    because I value the /unity/ that currently exists in Unix
    tools, where the text-stream I/O is simultaneously machine-
    and human-oriented, with all the attendant quirks and
    problems.

    If you can come up with a more expressive, (potentially
    binary format) with battle tested parsers like jq/fx, I am
    up for it. But at this point, JSON is so universal, anyone
    can look at the output and correctly guess which tool they
    need to parse with.

    JSON is very good as it is, and please do leave it in text
    form in your proposal, to keep some of the human
    readability. We humans need it, if only for debugging our
    tool chains.
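    For instance (illustrative), one could always eyeball the machine
    stream directly:

    ls 3>&1 1>/dev/null | jq .    # pretty-print the fd-3 stream for a human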

    Again, my proposal is not to replace the stdout with JSON
    output, my proposal is to leave the userfacing stdout
    untouched, and only when a fd-3 is open, collect the
    necessary data into JSON and put it in fd-3. No
    performance overhead whatsoever if fd-3 is not open.

    Nor did I count performance overhead among my misgivings.

    With suitable settings, I believe one can parse ls,
    e.g.:

    <https://s.tilde.club/?file=3nwf>

    Parse yes. Robust and handles edge cases. Not likely.

    This may be a straw man for you, what with my little
    experience on the Unix console, but may I ask you to go ahead
    and break my script?

    That is the entire point of format specifications, and there is none
    for ls' output.

    How so? Is the POSIX doc ambiguous?--

    https://manned.org/man/ls.1p#head11

    Consider the following toolchain, assuming JSON input
    and output:

    t1 | t2 | t3 | t4

    tools 2..4 parse the JSON from tools 1..3. That's
    repeated parsing for you, whereas a purely structural
    approach would parse JSON at the input of t1, process
    the data internally in its native form, and the generate
    JSON on output. Otherwise, JSON is repeatedly parsed
    and generated. In case of simple filtering functions,
    this is literall parsing of the same JSON data.

    Interesting point. But a lot of tools in coreutils are
    essentially helpers around the this specific human-
    machine-parsibility issue.

    I think they are means to an end, rather than helpers around
    an issue...

    For example, `ls | wc -l` or `ls | sort` is essentially a
    doing processing on the output of the ls.

    Yes, which is why many tools have added the -h option for
    human-oriented output.

    In my world, you do your processing in a complete
    programming language of your choice.

    Then you have to write Lua, Perl, Python, Pascal[1], or even
    C, and then rely on libraries instead of shell utilities.
    This is a valid, but completely different mode of operation
    from using the shell and utilities, including mini-languages
    such as ed(1), sed(1), awk(1), and grep(1).

    For instance, you'd do something like,

    ls 3>&1 1>/dev/null | fx 'this.length' # for ls | wc -l
    ls 3>&1 1>/dev/null | fx 'this.sort()' # for ls | sort

    See how fx uses Javascript to do any processing.

    Yes. I did not know from your previous examples that `fx'
    used Javascript. In my view, so huge, complicated, and
    trendy a language as Javascript is hardly compatible with
    the Unix Way, cf.:

    <https://felipec.wordpress.com/2025/02/13/rust-not-for-linux/>

    And honestly, I feel the repeated parsing is a little
    overstated.

    Overstated or not, it is there in Unix toolchains and in
    your proposal, and, for aught I understand, eliminated in
    Powershell.

    Unix pipelines are generally very short and JSON parsers
    are extremely optimized and fast even for a computer from
    1990s, without any brittleness whatsoever. Don't let
    perfect be the enemy of good.

    Well, I did not suggest that you modify your proposal to
    avoid repeated parsing...
    ____________________
    1. https://wiki.freepascal.org/Pascal_Script
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Annada Behera@annada@tilde.green to tilde.meta on Tue Jul 29 17:12:16 2025
    -----Original Message-----
    From: ant <ant@tilde.club>
    Subject: Re: I just made the following request to GNU coreutils team.
    Date: 07/28/2025 04:25:18 PM
    Newsgroups: tilde.meta
    Annada Behera to Anton Shepelev:
    I understand what you mean. In addition to robustness,
    JSON brings a higher power of expression. At the
    expense of a more complicated and less human-readable
    format.

    I am proposing a machine-only readable format. For
    instance, in `ls` if human-readablility is important, just
    use 'ls' like everyone else. Human-parsibility is at
    'stdout'. Fd3 is strictly meant to be machine-read, not
    for human beings. 'ls 3>&1 1>/dev/null' only meant for
    piping to other programs.

    Yes, I see your point: you are proposing, as it were, a
    different signal band, a complete separation between human-
    and machine-oriented I/O.  And I am a tad worried about this
    because I value the /unity/ that currently exists in Unix
    tools, where the text-stream I/O is simultaneously machine-
    and human-oriented, with all the attendant quirks and
    problems.
    You may value the unity that currently exists; my proposal
    does not break any of your workflow. It is not substituting
    for what currently exists; it is additive, for people who value
    robust parsing.
    If you can come up with a more expressive, (potentially
    binary format) with battle tested parsers like jq/fx, I am
    up for it. But at this point, JSON is so universal, anyone
    can look at the output and correctly guess which tool they
    need to parse with.

    JSON is very good as it is, and please to leave it in text
    form in your proposal, to keep some of the human
    readability.  We humans need it, if only for debugging our
    tool chains.
    I too want it that way, text JSON, precisely for the debugging
    reason you mentioned.
    Again, my proposal is not to replace the stdout with JSON
    output, my proposal is to leave the userfacing stdout
    untouched, and only when a fd-3 is open, collect the
    necessary data into JSON and put it in fd-3.  No
    performance overhead whatsoever if fd-3 is not open.

    Nor did I count performance overhead among my misgivings.
    Ok.
    With suitable settings, I believe one can parse ls,
    e.g.:

               <https://s.tilde.club/?file=3nwf>

    Parse yes.  Robust and handles edge cases.  Not likely.

    This may be a straw man for you, what with my little
    experience on the Unix console, but may I ask to go ahead
    and break my script?
    I didn't have to look carefully. I just ran it on my home
    directory and it broke for files older than 6 months, which
    have 7 fields instead of 8, even with careful field selection
    (--time-style=iso, --quoting-style=shell-escape). And you
    don't get proper types; everything is a string.
    And even if you could perfect it with all the edge cases and
    all, you are essentially writing a human-readable-to-JSON
    parser. Are you really going to put that in one-liners and
    beat the battle-tested, extremely optimized JSON parsers out
    there?
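    Another edge case is easy to demonstrate (illustrative):

    touch 'big file.dat'
    ls -la | awk '{print $9}'   # that entry shows up as just "big"; " file.dat" is lost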
    That is the entire point of format specifications, and there is none
    for ls' output.

    How so?  Is the POSIX doc ambiguous?--
    Does ls output have a specification? The POSIX documentation
    describes the layout for human consumption. It defines what
    the columns mean, but it is not a formal data interchange
    specification in the way that RFC 8259 is for JSON. It
    leaves too much ambiguity for robust machine parsing (e.g.,
    the timestamp format change).
                https://manned.org/man/ls.1p#head11

    Consider the following toolchain, assuming JSON input
    and output:

                       t1 | t2 | t3 | t4

    tools 2..4 parse the JSON from tools 1..3.  That's
    repeated parsing for you, whereas a purely structural
    approach would parse JSON at the input of t1, process
    the data internally in its native form, and the generate
    JSON on output.  Otherwise, JSON is repeatedly parsed
    and generated.  In case of simple filtering functions,
    this is literall parsing of the same JSON data.

    Interesting point.  But a lot of tools in coreutils are
    essentially helpers around the this specific human-
    machine-parsibility issue.

    I think they are means to an end, rather than helpers around
    an issue...
    I am not saying that those tools are useless on their own
    and don't provide essential computational primitives; I am
    saying that they are now being used as a workaround for
    brittle parsing.
    For example, `ls | wc -l` or `ls | sort` is essentially a
    doing processing on the output of the ls.

    Yes, which is why many tools have added the -h option for
    human-oriented output.
    The -h flag evolution shows that the Unix community already
    recognized the human/machine tension.
    In my world, you do your processing in a complete
    programming language of your choice.

    Then you have to write Lua, Perl, Python, Pascal[1], or even
    C, and then rely on libraries instead of shell utilities.
    This is a valid, but completely different mode of operation
    from using the shell and utilities, including mini-languages
    such as ed(1), sed(1), awk(1), and grep(1).
    No. You don't have to write Lua/Perl/Python. Again, 'stdout'
    is untouched. If you already have a script/mini-language
    that works for you, more power to you.
    For those who opt in, tools like `jq` or `fx` act as more
    powerful "mini-languages" within the shell, directly
    analogous to `awk` or `sed`, but operating on structured
    data instead of fragile text.
    You stay within the shell, using the same pipeline
    composition model.
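    Side by side (illustrative; the JSON field names are whatever the fd-3
    schema would define):

    ls -la | awk '$5 > 1048576 {print $9}'                              # text mini-language
    ls 3>&1 1>/dev/null | jq -r '.[] | select(.size > 1048576) | .name' # structured mini-language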
    For instance, you'd do something like,

    ls 3>&1 1>/dev/null | fx 'this.length' # for ls | wc -l
    ls 3>&1 1>/dev/null | fx 'this.sort()' # for ls | sort

    See how fx uses Javascript to do any processing.

    Yes.  I did not know from your previous examples that `fx'
    used Javascript.  In my view, so huge, complicated, and
    trendy a language as Javascript is hardly compatible with
    the Unix Way, cf.:

      <https://felipec.wordpress.com/2025/02/13/rust-not-for-linux/>

    First, you are attacking JavaScript, when I am proposing
    machine readability. My proposal, again, doesn't care what
    you use to parse the JSON. You can use C (if that is
    "Unix Way" enough for you) all you want.
    Second, you are confusing "different from 1970s design" with
    "against the Unix philosophy." The Unix philosophy says:
    - Do one thing well - output machine-readable JSON, nothing more.
    - Compose together - the fd-3 proposal is fundamentally a composition tool.
    - Text streams - JSON is just text; structured text, but text.
    It's beside the point, but I'd argue that just because JS is
    used in web bloat doesn't mean it's bad. Google and Apple
    have poured billions of dollars into the language, and the
    engineering that has gone into it would make any 70s Unix
    pioneer proud. Tools are neutral; what matters is how you use
    them. It's probably faster than your AWK/sed/bash at this
    point.
    And honestly, I feel the repeated parsing is a little
    overstated.


    Overstated or not, it is there in Unix toolchains and in
    your proposal, and,
    JSON parsers are optimized for a very narrow
    specification, unlike AWK/grep, which are more general and
    have to handle a wide variety of cases, and regex engines need
    complex backtracking. JSON parsers are highly specialized,
    vectorized, and single-pass, with predictable behaviour. Unlike
    your Unix toolchains, JSON parsing can be streamed, so
    parsing starts as soon as the data is emitted.
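    For example (illustrative; this assumes the fd-3 stream is emitted as
    one JSON object per line, which is one possible convention):

    find / -type f 3>&1 1>/dev/null | jq -c 'select(.size > 1048576)'  # each record is parsed as it arrives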
    For the majority of use cases, the JSON is not going to be
    dumped anywhere; it only exists in pipes and FIFOs, which
    even a 90s computer would not break a sweat over. The
    bottleneck is not JSON parsing, it's I/O. You are optimizing
    away microseconds of repeated JSON parsing while the
    milliseconds of I/O are the bottleneck.
    And I am not even factoring in the brittleness.
    for aught I understand, eliminated in Powershell.
    And no one uses PowerShell (or similar shells like Nushell),
    because they break the user's workflow. They prove the concept
    but require abandoning existing Unix tools entirely - my
    proposal keeps everything working.
    Unix pipelines are generally very short and JSON parsers
    are extremely optimized and fast even for a computer from
    1990s, without any brittleness whatsoever.  Don't let
    perfect be the enemy of good.

    Well, I did not suggest that you modify your proposal to
    avoid repeated parsing...
    Repeated parsing is not an issue, as I mentioned earlier.
    ____________________
    1. https://wiki.freepascal.org/Pascal_Script
    --- Synchronet 3.20a-Linux NewsLink 1.2