• I just made the following request to the GNU coreutils team.

    From Annada Behera@annada@tilde.green to tilde.meta on Wed May 28 15:52:48 2025
    Dear GNU Coreutils maintainers,
    I am writing to propose a backward-compatible enhancement that could
    improve modern scripting environments while maintaining complete
    compatibility with existing workflows and without any impact on performance.
    PROBLEM:
    Although the output of coreutils is meant to be human-readable
    only, many scripts today pipe it to other commands for various
    kinds of automation. This leads to brittle solutions involving
    complex awk/sed/grep gymnastics that break when the output format
    changes slightly. While the "everything is text" philosophy has served
    GNU/Unix/Linux well, structured data processing has become important in
    modern computing.
    Even Microsoft recognized this more than 20 years ago and built
    structured output into PowerShell from day one, eliminating text parsing
    entirely. Cloud tools like Docker, kubectl, GitHub's gh, Google's gcloud,
    and an increasing number of other CLI tools provide JSON output flags,
    and shells like Nushell have reimplemented most of the coreutils to
    output structured data. This is not unprecedented in the industry.
    PROPOSAL: stdoutm and stderrm
    I would like to propose the addition of two new optional
    machine-readable output streams (in addition to the already present
    human-readable streams):
    - stdout (fd 1): human readable output
    - stderr (fd 2): human readable errors
    - stdoutm (fd 3): machine readable output (NEW)
    - stderrm (fd 4): machine readable errors (NEW)
    The machine-readable output format and conventions need to be
    established. JSON is the most obvious choice, with battle-tested parsers
    and tools already available to the scripting ecosystem. This
    could be implemented incrementally, starting with high-usage commands
    (ls, ps, df, du) and then gradually expanding coverage.
    If the structured output is generated only when fd 3/4 are open, there
    should be no performance penalty and all existing behavior will remain
    identical. It also doesn't require any flags or arguments.
    EXAMPLES:
    # Traditional usage - UNCHANGED
    ls -l
    # Structured output
    ls 3> metadata.json 1>/dev/null
    # Structured output scripting
    ls 3>&1 1>/dev/null | fx 'this.filter(x => x.size > 1048576)'
    ls 3>&1 1>/dev/null | jq '.[] | select(.size > 1048576)'
    # Traditional brittle approach (unreadable)
    ls -la | grep -v '^d' | awk '$5 > 1048576 {print $9}'
    # Structured error handling
    find / -name "*.txt" 4>&1 1>/dev/null | jq '.[] | select(.error == "EACCES")'
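    To make the fd-3 check above concrete, here is a minimal wrapper-script
    sketch (illustrative only; "lsj.sh" is a hypothetical wrapper, and the
    real change would live inside each utility, e.g. via a single
    fcntl(3, F_GETFD) call):
    #!/bin/sh
    # lsj.sh: normal listing on stdout; JSON listing on fd 3, but only if
    # the caller has already opened fd 3. No flags, no change otherwise.
    ls -l "$@"                             # fd 1: human-readable, unchanged
    if { true >&3; } 2>/dev/null; then     # is fd 3 open?
        {
            printf '['
            sep=''
            for f in "$@"; do
                [ -f "$f" ] || continue    # sketch only: regular files, no escaping
                printf '%s{"name":"%s","size":%s}' "$sep" "$f" "$(wc -c < "$f")"
                sep=','
            done
            printf ']\n'
        } >&3
    fi
    # Usage: sh lsj.sh *.txt                                  # fd 3 closed, human output only
    #        sh lsj.sh *.txt 3>&1 1>/dev/null | jq '.[].name'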
    This eliminates fragile regex-based approaches, provides
    structured error handling, and integrates with existing tools like
    fx, jq, and Python scripts, while making sure existing scripts are not
    affected at all (allowing a gradual transition to structured output).
    Would the maintainer team be interested in discussing this further?
    Thank you for your time and consideration.
    Annada
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From keyboardan@keyboardan@tilde.club to tilde.meta on Sat Jun 7 16:44:25 2025
    --=-=-=
    Content-Type: text/plain
    Content-Transfer-Encoding: quoted-printable

    I like this idea, Annada :-) .

    -- 
    The pioneers of a warless world are the youth that
    refuse military service. ~ Albert Einstein

    --=-=-=
    Content-Type: application/pgp-signature; name="signature.asc"

    -----BEGIN PGP SIGNATURE-----

    iQJKBAEBCgA0FiEEOVeKaEm0xBhCsMmYlk/BEMQK1XUFAmhEXloWHGtleWJvYXJk YW5AdGlsZGUuY2x1YgAKCRCWT8EQxArVddKWEAC9ENyToucDnj7ivsXc+MO+xlVk l9bgt4of02W7eWS5RQ/ViqVPA9D2oF6j4iv4mz48Pm/Q9Ye1K6qfCTUoLuTPBBdS 8N4YGXq670x4o/jVsr5Emk/MYLa2SQK7jw7zYCKgfrjAikBI1O0hbvePg4qMAqTt 7KtkAxRmtl6VlaxdjkAR2WLSqf5LqAmGOviBH1oMgHZjGx0JDZ07+Nb8tn/B2Kyv pGYU7/gDwHbJ89ysT/t8aPINRrUX2/dzoShuEEPjjuFv3euO3NwZfnfMHFUof3Tt 0jmD8Hirh9adQjTrJqhWtztCBRSuyuKMmWhm0YPgHmWZCEmlY1COl2zO3ue98poo Z/HedA21KsL/HrVu0FJj3KiWbsORJYhZ7fsbWezsXdojW5/ZL/0Mr0oJ5a+rcISB BAU75Yb+Cxh09ynHsAJoCC1aZn7Zz7wIxFiEaEDIEt+equbY+VFS920WK/7lra5W 9xTVcauX0yQFDiuIFizZWelBZNNCcCgeYzePWQE3EFEVfafccAWPd387wxWSrIEA xBFgnOAOYLUpCr5aNih2RFV10omv6XV2V98jLBPMXc6MjN09qPUu/1htTn8lHW9A FL79fqPeSlnvhro9I6OCxUsNU5q8PTeS7Z6QhjduXYBZI28xseru8aHj/s5MNzLS 9X9ytzH4zsEFz3Tmcw==
    =tYMW
    -----END PGP SIGNATURE-----
    --=-=-=--
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Anton Shepelev@ant@tilde.culb to tilde.meta on Sun Jul 20 19:56:53 2025
    keyboardan,

    When I view your post in the tin newsreader, installed on the ~club
    machine, it shows the following two attachments in addition to the
    direct text:

    [-- text/plain, size 0.1K, charset US-ASCII, 6 lines, quoted-printable --]
    [-- application/pgp-signature, name signature.asc, size 0.8K, 17, 7bit --]

    and says it is unable to open them. Are you sure your articles are
    standard Usenet messages?
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Anton Shepelev@ant@tilde.culb to tilde.meta on Sun Jul 20 20:15:47 2025
    Subject: Re: I just made the following request to GNU coreutils team.

    Please, share the URL to your proposal, that we may follow, and if need
    be, participate in its discussion.

    Annada Behera <annada@tilde.green> wrote:

    Although the output of coreutils are meant to be human readable
    only, many scripts today use/pipe them to other commands for various
    kinds of automation. This leads to brittle solutions involving
    complex awk/sed/grep gymnastics that break when the output format
    changes slightly. While "everything is text" philosophy has served
    GNU/Unix/Linux well, structured data processing has become important in
    modern computing.

    But pure text can also be structured and machine-oriented,
    rather than human-oriented, such as tab- or comma-separated files,
    which are /way/ simpler than JSON.
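    For instance (an illustrative one-liner), a tab-separated listing
    needs nothing beyond awk or cut:

    printf 'report.txt\t2048\nnotes.txt\t300\n' | awk -F'\t' '$2 > 1024 {print $1}'
    # prints: report.txt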

    I would like to propose the addition of two new optional machine
    readable output streams (in addition to already present human readable streams):

    - stdout (fd 1): human readable output
    - stderr (fd 2): human readable errors
    - stdoutm (fd 3): machine readable output (NEW)
    - stderrm (fd 4): machine readable errors (NEW)

    The machine readable output format and conventions needs to be
    established. JSON is the most obvious choice with battle-tested parsers
    and tools, and immediately available for the scripting ecosystem. This
    could be implemented incrementally, starting with "high-usage" commands
    like (ls, ps, df, du) and then gradually expand coverage.

    I think it is a good idea, but if the medium is JSON text, then repeated parsing is still there.
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From yeti@yeti@tilde.institute to tilde.meta on Sun Jul 20 19:00:02 2025
    Anton Shepelev <ant@tilde.culb> wrote:

    I think it is a good idea, but if the medium is JSON text, then
    repeated parsing is still there.

    That was why I did not reply yet. IMO using a blown-up text
    representation is counterproductive there.

    I would be happier with a second set of utilities only spitting out
    binary-coded serialised data. Maybe just give them a different prefix?
    `Bls`, `Bfind`, ...? That way the analogy to the usual commands would
    be kept, and that would help with transforming current text pipes into
    binary ones "good enough"?

    Something like

    MessagePack – It's like JSON, but fast and small.
    <https://msgpack.org/>

    may help there. That's not completely free of parsing, but looks much
    lighter.
    --
    I do not bite, I just want to play.
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From xwindows@xwindows@tilde.club to tilde.meta on Mon Jul 21 16:33:55 2025
    On Sun, 20 Jul 2025, Anton Shepelev wrote:

    [-- text/plain, size 0.1K, charset US-ASCII, 6 lines, quoted-printable --]
    [-- application/pgp-signature, name signature.asc, size 0.8K, 17, 7bit --]

    and says it is unable to open them. Are you sure your articles are
    standard Usenet messages?

    The fact that TIN correctly showed that as an "attachment" [1],
    as opposed to littering its Base64 content all over the main message,
    means that this is indeed a valid netnews article-- it is just formatted
    according to the MIME standard, rather than being an old-style bare
    single-part article.

    What this actually means is that you are reading a GPG/OpenPGP-signed
    netnews article: inside that "attachment" is just a few-line blob
    of Base64-encoded message-integrity metadata. This is not a new thing;
    from what I have heard, it has been in use on USENET too.

    If your newsreader does not [2] support OpenPGP-MIME [3],
    then it obviously can't do anything useful with this "attachment".
    In such a case, you can safely ignore any such "attachment" with the
    `application/pgp-signature` content type-- you're not missing anything
    that the poster was saying or trying to show. [4]

    Regards,
    xwindows


    [1] Scare-quoted "attachment" because it is not one; the content type
    of the main message is explicitly `multipart/signed`.
    Displaying a cryptographic signature as an attachment is just
    fallback handling in a MIME-supporting newsreader
    for generic `multipart/*` content types. [2]

    [2] This fact is obvious, since ones which do support this standard
    would not display the mysterious MIME part as an "attachment",
    but would instead flag the entire article as cryptographically signed,
    then proceed to show options for the user to verify that signature. [5]

    [3] RFC 3156: MIME Security with OpenPGP [Aug-2001]
    https://www.rfc-editor.org/rfc/rfc3156.html

    [4] Obligatory comic insert:
    https://www.explainxkcd.com/wiki/index.php/1181:_PGP

    [5] This is what was displayed when I navigated to the article you mentioned,
    using Claws Mail-- which supports GnuPG integration (my emphases in red):
    http://tilde.club/~xwindows/temp/2025-07-21/pgpsigned.png
    https://tilde.club/~xwindows/temp/2025-07-21/pgpsigned.png
    --
    xwindows' gallery of freely-licensed artworks
    https://tilde.club/~xwindows/ http://tilde.club/~xwindows/ gopher://tilde.club/1/~xwindows/
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From keyboardan@keyboardan@tilde.club to tilde.meta on Mon Jul 21 11:41:05 2025
    --=-=-=
    Content-Type: text/plain
    Content-Transfer-Encoding: quoted-printable

    xwindows <xwindows@tilde.club> writes:

    On Sun, 20 Jul 2025, Anton Shepelev wrote:

    [-- text/plain, size 0.1K, charset US-ASCII, 6 lines, quoted-printable --]
    [-- application/pgp-signature, name signature.asc, size 0.8K, 17, 7bit --]

    and says it is unable to open them. Are you sure your articles are
    standard Usenet messages?

    The fact that TIN correctly shown that as an "attachment" [1]
    as opposed to littering its Base64 content all over the main message,
    mean the this is indeed a valid netnews article-- it is just formatted according to MIME standard, rather than being old-style bare
    single-part article.

    What this actually mean is you are reading is a GPG/OpenPGP-signed
    netnews article: inside that "attachment" is just a few-line blob
    of Base64-encoded message integrity metadata. This is not a new thing,
    from what I have heard, it has been in-use on USENET too.

    If your newsreader does not [2] support OpenPGP-MIME [3],
    then it obviously can't do anything useful with this "attachment".
    In such case, you can safely ignore any of such "attachment" with `application/pgp-signature` content type-- you're not missing anything
    that the poster was saying or trying to show. [4]

    Regards,
    xwindows


    [1] Scare-quoted "attachment" because it is not; as the content type
    of the main message is explicitly `multipart/signed`.
    A display of cryptographic signature as attachment is just
    a fallback handling of in MIME-supporting newsreader
    for generic `multipart/*` content type. [2]

    [2] This fact is obvious, since ones which do support this standard
    would not display the mysterious MIME part as "attachment",
    and will instead flag this entire article as cryptographically-signed,
    then proceed to show options for user to verify that signature. [5]

    [3] RFC 3156: MIME Security with OpenPGP [Aug-2001]
    https://www.rfc-editor.org/rfc/rfc3156.html

    [4] Obligatory comic insert:
    https://www.explainxkcd.com/wiki/index.php/1181:_PGP

    [5] This is what displayed, when I navigated to the article you mentioned,
    using Claws Mail-- which supports GnuPG integration (my emphases in red):
    http://tilde.club/~xwindows/temp/2025-07-21/pgpsigned.png
    https://tilde.club/~xwindows/temp/2025-07-21/pgpsigned.png

    Thank you, xwindows.

    You have so much more patience, and your reply is very informative.

    What a "standard Usenet message" actually means... and where they read
    that concept from... remains in a fog of mystery...


    -- 
    The pioneers of a warless world are the youth that
    refuse military service. ~ Albert Einstein

    --=-=-=
    Content-Type: application/pgp-signature; name="signature.asc"

    -----BEGIN PGP SIGNATURE-----

    iQJKBAEBCgA0FiEEOVeKaEm0xBhCsMmYlk/BEMQK1XUFAmh+GUIWHGtleWJvYXJk YW5AdGlsZGUuY2x1YgAKCRCWT8EQxArVdfimD/9ZLDYs1ZW7mHz7NzzY5EGsp5MX wEcEWQ1hpWDXPAdpuZFB9jD94SJjhQ8mr2qkoJ2C/zjPhv5fBQM1AMmMN6PQIJJ3 YtG7g/Zvx/k/pE5pLVImDfS2YKJpi/X/BqgdR82PTXF2GFCL5kpHqkOiVejSmVzA h8CAa+CAslaTkA9k9Mn51mxKgev4xrhq3jVb69Jj/rAgkbvAGCJROiYHwksNORu1 e/rYKGDvUwB7tVhnKqEpBnmnr8uEekK4Ag/fRpo/Q2THBXsZD3fGByuwLCsDQm4D iBmIbcXJVttyEn2hUeOL+D8pCkZ7qsNK3N/MBrTvkALIt1PDB2eqxIu52mBLol+k s6TzpAy3XbiHRjlUTwrjdKN8EeljDdTIdpqmi/Bi5tMD92+T3ljfvv+wtOu4E4J7 dXiREiI6oEw/DJHVt/ztbdE9bWYomilA4/kRO8tKNt+SqZHWSXB1uaX6Pt9YQfI1 mX2o/rEbf9kjIVgIBly9sC0BaYuOg7pYxtGbYysxSTeFFW2CDvRzoiQjU+BPjLiC Tnkx+iejFzTpjqZsJpjWp4yUaF9L4ftm64xbUZrcnXTTcEFwIzhkXfa6M3MN2bok MWkM53SbHqIcsCfvouPyUcKGDKFvyGA6A3WCw6rUz6tatijspY+Pv14Mtzew0xlx W1mx47IKe7CU6gad1A==
    =l1pC
    -----END PGP SIGNATURE-----
    --=-=-=--
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From xwindows@xwindows@tilde.club to tilde.meta on Mon Jul 21 21:23:26 2025
    (Reposted due to incorrect threading on the last try)

    On Mon, 21 Jul 2025, keyboardan wrote:

    What a "standard Usenet message" actually means...

    Well, there have been several revisions of the Netnews message format;
    the majority of them actually predated the MIME specification by many years.

    The original netnews message format was first specified in 1983 (RFC 850),
    and the longest-living one was specified in 1987 (RFC 1036);
    MIME was officially introduced in 1992 (RFC 1341)--
    half a decade after it-- and was mainly used in the email context.

    There is actually only *one* version of the netnews message format
    specification that was released after MIME (and officially references it):
    the current edition from 2009 (RFC 5536).

    For people who actively use/used text-only USENET, I can understand
    the confusion about whether it is a valid format, because
    even regular multipart MIME messages are not common in that realm [1][2]
    (they are more common in binary-allowing USENET newsgroups,
    from what I understand); and it is likely quite rare to randomly
    find them used for cryptographic signing out there.

    The main reasons I am not surprised by this are:

    A. I have read the netnews message format specifications (both old and new),
    and thus know that the current version explicitly allows MIME formatting.
    B. I have heard about how people used PGP/GPG to sign netnews articles.
    C. I have read the MIME specification (just casually, not back to back).
    D. I kinda know what PGP/MIME-signed messages look like at the byte level,
    because I have used GPG-encrypted+signed emails before, and have actually
    tried "view source" on the result.

    I think the D. part is the most important reason; because in the past,
    I have also seen users (not normies, by the way) ask the same question,
    but in a *mailing list* context; ~ant is not alone in this one.

    Regards,
    ~xwindows


    [1] Unlike email, where attachments (a main use of multipart MIME messages)
    have been considered a basic user-level option since at least
    the mid-to-late 1990s.

    [2] The vast majority of posts in text USENET are made in the format I described
    in the grandparent post as an "old-style bare single-part article",
    i.e. RFC 1036 pre-MIME style. Such messages would sometimes have
    MIME headers, but they would display correctly in very old newsreaders
    that only supported old RFC 1036. (A MIME multipart article would
    collapse into a single part and look broken, and an article using
    MIME with a Base64 Content-Transfer-Encoding on the main part wouldn't
    be readable under such newsreaders at all.)
    --
    xwindows' gallery of freely-licensed artworks
    https://tilde.club/~xwindows/ http://tilde.club/~xwindows/ gopher://tilde.club/1/~xwindows/
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Anton Shepelev@ant@tilde.culb to tilde.meta on Mon Jul 21 18:52:24 2025
    xwindows <xwindows@tilde.club> wrote:
    On Sun, 20 Jul 2025, Anton Shepelev wrote:

    [-- text/plain, size 0.1K, charset US-ASCII, 6 lines, quoted-printable --]
    [-- application/pgp-signature, name signature.asc, size 0.8K, 17, 7bit --]
    and says it is unable to open them. Are you sure your articles are
    standard Usenet messages?

    The fact that TIN correctly shown that as an "attachment" [1]
    as opposed to littering its Base64 content all over the main message,
    mean the this is indeed a valid netnews article-- it is just formatted
    according to MIME standard, rather than being old-style bare
    single-part article.

    What this actually mean is you are reading is a GPG/OpenPGP-signed
    netnews article: inside that "attachment" is just a few-line blob
    of Base64-encoded message integrity metadata. This is not a new thing,
    from what I have heard, it has been in-use on USENET too.

    If your newsreader does not [2] support OpenPGP-MIME [3],
    then it obviously can't do anything useful with this "attachment".
    In such case, you can safely ignore any of such "attachment" with
    `application/pgp-signature` content type-- you're not missing anything
    that the poster was saying or trying to show. [4]

    In addition to the PGP part, it also complains about a .txt attachment
    with the contents of the article body, and that is rather annoying.
    I will see if I can reconfigure ~club's tin to stop complaining.
    Sylpheed, on the other hand, knows about PGP, and does not complain.

    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From ant@ant@~.club to tilde.meta on Tue Jul 22 02:00:59 2025
    xwindows:


    D. I kinda know what PGP/MIME-signed messages look like at
    byte level; because I have used GPG-encrypted+signed
    emails before, and have actually tried "view source" on
    the result.


    I think the D. part is the most important reason; because
    in the past, I have also seen users (not normies, by the
    way) asked the same question but in *mailing list*
    context; ~ant is not alone in this one.

    I have seen PGP e-mails, but never knew PGP was used in
    Usenet articles. Tin, my ~club newsreader, seems surprised as
    well...
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Annada Behera@annada@tilde.green to tilde.meta on Thu Jul 24 13:43:01 2025
    Please, share the URL to your proposal, that we may follow, and if need
    be, participate in its discussion.

    I sent it to their mailing list. Here is a link to the mailing-list
    archive:
    https://lists.gnu.org/archive/html/coreutils/2025-05/msg00013.html
    Annada Behera <annada@tilde.green> wrote:

    Although the output of coreutils are meant to be human readable
    only, many scripts today use/pipe them to other commands for various
    kinds of automation. This leads to brittle solutions involving
    complex awk/sed/grep gymnastics that break when the output format
    changes slightly. While "everything is text" philosophy has served
    GNU/Unix/Linux well, structured data processing has become important
    in
    modern computing.

    But pure text can also be structured and machine-oriented,
    rather than human-oriented, such as tab- or comma-separated files,
    which are /way/ simpler than JSON.

    Yes, pure text can be structured. The output of, say, ls is also structured,
    but we have to do brittle parsing with AWK/Perl regexes and run into a
    lot of edge cases too. I proposed JSON because we have some very good,
    battle-tested JSON parsers like jq and fx. We could use TSV/CSV, but the
    data is not guaranteed to be tabular, and CSV parsing with AWK has edge
    cases where it too breaks in unexpected ways. XML/TOML/YAML would be OK,
    but JSON is the most popular data format these days. We have a whole
    species of database engines built on top of JSON, which gives me
    more confidence. And since this is meant for machine readability anyway,
    the complexity of JSON shouldn't matter.
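    A trivial example of the kind of edge case I mean (illustrative): a
    quoted CSV field with an embedded comma defeats naive field splitting:

    printf '"big, important file.txt",2097152\n' | awk -F, '{print $2}'
    # prints ' important file.txt"' instead of the size 2097152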
    I would like to propose the addition of two new optional machine
    readable output streams (in addition to already present human readable
    streams):

       - stdout  (fd 1): human readable output
       - stderr  (fd 2): human readable errors
       - stdoutm (fd 3): machine readable output (NEW)
       - stderrm (fd 4): machine readable errors (NEW)

    The machine readable output format and conventions needs to be
    established. JSON is the most obvious choice with battle-tested
    parsers and tools, and immediately available for the scripting
    ecosystem. This could be implemented incrementally, starting with
    "high-usage" commands like (ls, ps, df, du) and then gradually expand
    coverage.

    I think it is a good idea, but if the medium is JSON text, then repeated
    parsing is still there.

    I don't understand what repeated parsing you are talking about; care to
    elaborate?
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From ant@ant@tilde.club to tilde.meta on Fri Jul 25 13:51:54 2025
    Annada Behera to Anton Shepelev:

    But pure text can also be structured and machine-
    oriented, rather than human-oriented, such as tab- or
    comma-separated files, which are /way/ simpler than
    JSON.

    Yes, pure text can be structured.

    And I value that, as TSV/CSV are much simpler than JSON and
    both human- and machine-readable. I should hate to lose
    them from my toolchains. And I do fear to lose them, as the
    advent of JSON will cause a gradual extinction of classical
    processing.

    Output of, say, ls is also structured but we have to do
    brittle parsing with AWK/Perl regex and run into a lot of
    edge-cases too.

    I understand what you mean. In addition to robustness, JSON
    brings a higher power of expression. At the expense of a
    more complicated and less human-readable format.

    With suitable settings, I believe one can parse ls, e.g.:

    <https://s.tilde.club/?file=3nwf>

    I think it is a good idea, but if the medium is JSON
    text, then repeated parsing is still there.

    I don't understand what repeated parsing you are talking
    about, care to elaborate.

    Consider the following toolchain, assuming JSON input and
    output:
    t1 | t2 | t3 | t4

    tools 2..4 parse the JSON from tools 1..3. That's repeated
    parsing for you, whereas a purely structural approach would
    parse JSON at the input of t1, process the data internally
    in its native form, and then generate JSON on output.
    Otherwise, JSON is repeatedly parsed and generated. In case
    of simple filtering functions, this is literally re-parsing
    the same JSON data.
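    For instance (illustrative; the field names assume whatever schema
    the fd-3 JSON would use):

    # Each stage parses and re-serialises the same data:
    ls 3>&1 1>/dev/null | jq '.[] | select(.size > 1048576)' | jq -s 'sort_by(.name)' | jq -r '.[].name'
    # A single stage parses once and does all the work internally:
    ls 3>&1 1>/dev/null | jq -r '[.[] | select(.size > 1048576)] | sort_by(.name) | .[].name'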
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Annada Behera@annada@tilde.green to tilde.meta on Mon Jul 28 15:22:20 2025
    Annada Behera to Anton Shepelev:

    But pure text can also be structured and machine-
    oriented, rather than human-oriented, such as tab- or
    comma-separated files, which are /way/ simpler than
    JSON.

    Yes, pure text can be structured.

    And I value that, as TSV/CSV are much simpler than JSON and
    both human- and machine-readable.  I should hate to lose
    them from my toolchains.  And I do fear to lose them, as the
    advent of JSON will cause a gradual extinction of classical
    processing.

    Output of, say, ls is also structured but we have to do
    brittle parsing with AWK/Perl regex and run into a lot of
    edge-cases too.

    I understand what you mean.  In addition to robustness, JSON
    brings a higher power of expression.  At the expense of a
    more compicated and less human-readable format.
    I am proposing a machine-only readable format. For instance, with `ls`,
    if human readability is important, just use 'ls' like everyone
    else. Human readability lives on 'stdout'. Fd 3 is strictly meant to be
    machine-read, not for human beings. 'ls 3>&1 1>/dev/null' is only meant
    for piping to other programs. If you can come up with a more expressive
    (potentially binary) format with battle-tested parsers like jq/fx, I am
    up for it. But at this point, JSON is so universal that anyone can look
    at the output and correctly guess which tool they need to parse it with.
    Again, my proposal is not to replace stdout with JSON output; my
    proposal is to leave the user-facing stdout untouched and, only when
    fd 3 is open, collect the necessary data into JSON and put it on fd 3.
    No performance overhead whatsoever if fd 3 is not open.
    With suitable settings, I believe one can parse ls, e.g.:

                 <https://s.tilde.club/?file=3nwf>
    Parse, yes. Robustly, handling the edge cases? Not likely. That is the
    entire point of format specifications, and there is none for ls' output.
    I think it is a good idea, but if the medium is JSON
    text, then repeated parsing is still there.

    I don't understand what repeated parsing you are talking
    about, care to elaborate.

    Consider the following toolchain, assuming JSON input and
    output:
                         t1 | t2 | t3 | t4

    tools 2..4 parse the JSON from tools 1..3.  That's repeated
    parsing for you, whereas a purely structural approach would
    parse JSON at the input of t1, process the data internally
    in its native form, and the generate JSON on output.
    Otherwise, JSON is repeatedly parsed and generated.  In case
    of simple filtering functions, this is literall parsing of
    the same JSON data.
    Interesting point. But a lot of tools in coreutils are essentially
    helpers around this specific human/machine-parsability issue. For
    example, `ls | wc -l` or `ls | sort` is essentially doing processing
    on the output of ls. In my world, you do your processing in a
    complete programming language of your choice. For instance, you'd do
    something like:
    ls 3>&1 1>/dev/null | fx 'this.length' # for ls | wc -l
    ls 3>&1 1>/dev/null | fx 'this.sort()' # for ls | sort
    See how fx uses JavaScript to do the processing, with the entire weight
    of JS behind any kind of processing you want. So, ideally, the
    '3>&1 1>/dev/null' should be at the end of your pipes. JS is just an
    example; choose your poison. This is language-agnostic as long as the
    language has a library that can parse JSON. And if you have a legacy
    pipeline that does something you have been relying on, then you leave it
    untouched, just append '3>&1 1>/dev/null', and continue with a real
    programming language from there.
    And honestly, I feel the repeated parsing concern is a little overstated.
    Unix pipelines are generally very short, and JSON parsers are extremely
    optimized and fast even for a computer from the 1990s, without any
    brittleness whatsoever. Don't let perfect be the enemy of good.
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From ant@ant@tilde.club to tilde.meta on Mon Jul 28 13:55:18 2025
    Annada Behera to Anton Shepelev:

    I understand what you mean. In addition to robustness,
    JSON brings a higher power of expression. At the
    expense of a more complicated and less human-readable
    format.

    I am proposing a machine-only readable format. For
    instance, in `ls` if human-readablility is important, just
    use 'ls' like everyone else. Human-parsibility is at
    'stdout'. Fd3 is strictly meant to be machine-read, not
    for human beings. 'ls 3>&1 1>/dev/null' only meant for
    piping to other programs.

    Yes, I see your point: you are proposing, as it were, a
    different signal band, a complete separation between human-
    and machine-oriented I/O. And I am a tad worried about this
    because I value the /unity/ that currently exists in Unix
    tools, where the text-stream I/O is simultaneously machine-
    and human-oriented, with all the attendant quirks and
    problems.

    If you can come up with a more expressive, (potentially
    binary format) with battle tested parsers like jq/fx, I am
    up for it. But at this point, JSON is so universal, anyone
    can look at the output and correctly guess which tool they
    need to parse with.

    JSON is very good as it is, and please do leave it in text
    form in your proposal, to keep some of the human
    readability. We humans need it, if only for debugging our
    tool chains.
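    For instance (illustrative), one could always eyeball the machine
    stream directly:

    ls 3>&1 1>/dev/null | jq .    # pretty-print the fd-3 stream for a human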

    Again, my proposal is not to replace the stdout with JSON
    output, my proposal is to leave the userfacing stdout
    untouched, and only when a fd-3 is open, collect the
    necessary data into JSON and put it in fd-3. No
    performance overhead whatsoever if fd-3 is not open.

    Nor did I count performance overhead among my misgivings.

    With suitable settings, I believe one can parse ls,
    e.g.:

    <https://s.tilde.club/?file=3nwf>

    Parse yes. Robust and handles edge cases. Not likely.

    This may be a straw man for you, what with my little
    experience on the Unix console, but may I ask you to go ahead
    and break my script?

    That is the entire point of format specifications, and there is none
    for ls' output.

    How so? Is the POSIX doc ambiguous?--

    https://manned.org/man/ls.1p#head11

    Consider the following toolchain, assuming JSON input
    and output:

    t1 | t2 | t3 | t4

    tools 2..4 parse the JSON from tools 1..3. That's
    repeated parsing for you, whereas a purely structural
    approach would parse JSON at the input of t1, process
    the data internally in its native form, and the generate
    JSON on output. Otherwise, JSON is repeatedly parsed
    and generated. In case of simple filtering functions,
    this is literall parsing of the same JSON data.

    Interesting point. But a lot of tools in coreutils are
    essentially helpers around the this specific human-
    machine-parsibility issue.

    I think they are means to an end, rather than helpers around
    an issue...

    For example, `ls | wc -l` or `ls | sort` is essentially a
    doing processing on the output of the ls.

    Yes, which is why many tools have added the -h option for
    human-oriented output.

    In my world, you do your processing in a complete
    programming language of your choice.

    Then you have to write Lua, Perl, Python, Pascal[1], or even
    C, and then rely on libraries instead of shell utilities.
    This is a valid, but completely different mode of operation
    from using the shell and utilities, including mini-languages
    such as ed(1), sed(1), awk(1), and grep(1).

    For instance, you'd do something like,

    ls 3>&1 1>/dev/null | fx 'this.length' # for ls | wc -l
    ls 3>&1 1>/dev/null | fx 'this.sort()' # for ls | sort

    See how fx uses Javascript to do any processing.

    Yes. I did not know from your previous examples that `fx'
    used Javascript. In my view, so huge, complicated, and
    trendy a language as Javascript is hardly compatible with
    the Unix Way, cf.:

    <https://felipec.wordpress.com/2025/02/13/rust-not-for-linux/>

    And honestly, I feel the repeated parsing is a little
    overstated.

    Overstated or not, it is there in Unix toolchains and in
    your proposal, and, for aught I understand, eliminated in
    Powershell.

    Unix pipelines are generally very short and JSON parsers
    are extremely optimized and fast even for a computer from
    1990s, without any brittleness whatsoever. Don't let
    perfect be the enemy of good.

    Well, I did not suggest that you modify your proposal to
    avoid repeated parsing...
    ____________________
    1. https://wiki.freepascal.org/Pascal_Script
    --- Synchronet 3.20a-Linux NewsLink 1.2
  • From Annada Behera@annada@tilde.green to tilde.meta on Tue Jul 29 17:12:16 2025
    -----Original Message-----
    From: ant <ant@tilde.club>
    Subject: Re: I just made the following request to GNU coreutils team.
    Date: 07/28/2025 04:25:18 PM
    Newsgroups: tilde.meta
    Annada Behera to Anton Shepelev:
    I understand what you mean. In addition to robustness,
    JSON brings a higher power of expression. At the
    expense of a more complicated and less human-readable
    format.

    I am proposing a machine-only readable format. For
    instance, in `ls` if human-readablility is important, just
    use 'ls' like everyone else. Human-parsibility is at
    'stdout'. Fd3 is strictly meant to be machine-read, not
    for human beings. 'ls 3>&1 1>/dev/null' only meant for
    piping to other programs.

    Yes, I see your point: you are proposing, as it were, a
    different signal band, a complete separation between human-
    and machine-oriented I/O.  And I am a tad worried about this
    because I value the /unity/ that currently exists in Unix
    tools, where the text-stream I/O is simultaneously machine-
    and human-oriented, with all the attendant quirks and
    problems.
    You may value the unity that currently exists; my proposal
    does not break any of your workflow. It is not substituting
    for what currently exists; it is additive, for people who value
    robust parsing.
    If you can come up with a more expressive, (potentially
    binary format) with battle tested parsers like jq/fx, I am
    up for it. But at this point, JSON is so universal, anyone
    can look at the output and correctly guess which tool they
    need to parse with.

    JSON is very good as it is, and please to leave it in text
    form in your proposal, to keep some of the human
    readability.  We humans need it, if only for debugging our
    tool chains.
    I too want it that way, text JSON, precisely for the debugging
    reason you mentioned.
    Again, my proposal is not to replace the stdout with JSON
    output, my proposal is to leave the userfacing stdout
    untouched, and only when a fd-3 is open, collect the
    necessary data into JSON and put it in fd-3.  No
    performance overhead whatsoever if fd-3 is not open.

    Nor did I count performance overhead among my misgivings.
    Ok.
    With suitable settings, I believe one can parse ls,
    e.g.:

               <https://s.tilde.club/?file=3nwf>

    Parse yes.  Robust and handles edge cases.  Not likely.

    This may be a straw man for you, what with my little
    experience on the Unix console, but may I ask to go ahead
    and break my script?
    I didn't have to look carefully. I just ran it on my home
    directory and it broke for files older than 6 months, which
    have 7 fields instead of 8, even with careful field selection
    (--time-style=iso, --quoting-style=shell-escape). And you
    don't get proper types; everything is a string.
    And even if you could perfect it with all the edge cases and
    all, you are essentially writing a human-readable-to-JSON
    parser. Are you really going to put that in one-liners and
    beat the battle-tested, extremely optimized JSON parsers out
    there?
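    Another edge case is easy to demonstrate (illustrative):

    touch 'big file.dat'
    ls -la | awk '{print $9}'   # that entry shows up as just "big"; " file.dat" is lost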
    That is the entire point of format specifications, and there is none
    for ls' output.

    How so?  Is the POSIX doc ambiguous?--
    Does ls output have a specification? The POSIX documentation
    describes the layout for human consumption. It defines what
    the columns mean, but it is not a formal data interchange
    specification in the way that RFC 8259 is for JSON. It
    leaves too much ambiguity for robust machine parsing (e.g.,
    the timestamp format change).
                https://manned.org/man/ls.1p#head11

    Consider the following toolchain, assuming JSON input
    and output:

                       t1 | t2 | t3 | t4

    tools 2..4 parse the JSON from tools 1..3.  That's
    repeated parsing for you, whereas a purely structural
    approach would parse JSON at the input of t1, process
    the data internally in its native form, and the generate
    JSON on output.  Otherwise, JSON is repeatedly parsed
    and generated.  In case of simple filtering functions,
    this is literall parsing of the same JSON data.

    Interesting point.  But a lot of tools in coreutils are
    essentially helpers around the this specific human-
    machine-parsibility issue.

    I think they are means to an end, rather than helpers around
    an issue...
    I am not saying that those tools are useless on their own
    and don't provide essential computational primitives; I am
    saying that they are now being used as a workaround for
    brittle parsing.
    For example, `ls | wc -l` or `ls | sort` is essentially a
    doing processing on the output of the ls.

    Yes, which is why many tools have added the -h option for
    human-oriented output.
    The -h flag evolution shows that the Unix community already
    recognized the human/machine tension.
    In my world, you do your processing in a complete
    programming language of your choice.

    Then you have to write Lua, Perl, Python, Pascal[1], or even
    C, and then rely on libraries instead of shell utilities.
    This is a valid, but completely different mode of operation
    from using the shell and utilities, including mini-languages
    such as ed(1), sed(1), awk(1), and grep(1).
    No. You don't have to write Lua/Perl/Python. Again, 'stdout'
    is untouched. If you already have a script/mini-language
    that works for you, more power to you.
    For those who opt in, tools like `jq` or `fx` act as more
    powerful "mini-languages" within the shell, directly
    analogous to `awk` or `sed`, but operating on structured
    data instead of fragile text.
    You stay within the shell, using the same pipeline
    composition model.
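    Side by side (illustrative; the JSON field names are whatever the fd-3
    schema would define):

    ls -la | awk '$5 > 1048576 {print $9}'                              # text mini-language
    ls 3>&1 1>/dev/null | jq -r '.[] | select(.size > 1048576) | .name' # structured mini-language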
    For instance, you'd do something like,

    ls 3>&1 1>/dev/null | fx 'this.length' # for ls | wc -l
    ls 3>&1 1>/dev/null | fx 'this.sort()' # for ls | sort

    See how fx uses Javascript to do any processing.

    Yes.  I did not know from your previous examples that `fx'
    used Javascript.  In my view, so huge, complicated, and
    trendy a language as Javascript is hardly compatible with
    the Unix Way, cf.:

      <https://felipec.wordpress.com/2025/02/13/rust-not-for-linux/>

    First, you are attacking JavaScript, when I am proposing
    machine readability. My proposal, again, doesn't care what
    you use to parse the JSON. You can use C (if that is
    "Unix Way" enough for you) all you want.
    Second, you are confusing "different from 1970s design" with
    "against the Unix philosophy." The Unix philosophy says:
    - Do one thing well - output machine-readable JSON, nothing more.
    - Compose together - the fd-3 proposal is fundamentally a composition tool.
    - Text streams - JSON is just text; structured text, but text.
    It's beside the point, but I'd argue that just because JS is
    used in web bloat doesn't mean it's bad. Google and Apple
    have poured billions of dollars into the language, and the
    engineering that has gone into it would make any 70s Unix
    pioneer proud. Tools are neutral; what matters is how you use
    them. It's probably faster than your AWK/sed/bash at this
    point.
    And honestly, I feel the repeated parsing is a little
    overstated.


    Overstated or not, it is there in Unix toolchains and in
    your proposal, and,
    JSON parsers are optimized for a very narrow
    specification, unlike AWK/grep, which are more general and
    have to handle a wide variety of cases, and regex engines need
    complex backtracking. JSON parsers are highly specialized,
    vectorized, and single-pass, with predictable behaviour. Unlike
    your Unix toolchains, JSON parsing can be streamed, so
    parsing starts as soon as the data is emitted.
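    For example (illustrative; this assumes the fd-3 stream is emitted as
    one JSON object per line, which is one possible convention):

    find / -type f 3>&1 1>/dev/null | jq -c 'select(.size > 1048576)'  # each record is parsed as it arrives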
    For the majority of use cases, the JSON is not going to be
    dumped anywhere; it only exists in pipes and FIFOs, which
    even a 90s computer would not break a sweat over. The
    bottleneck is not JSON parsing, it's I/O. You are optimizing
    away microseconds of repeated JSON parsing while the
    milliseconds of I/O are the bottleneck.
    And I am not even factoring in the brittleness.
    for aught I understand, eliminated in Powershell.
    And no one uses PowerShell (or similar shells like Nushell),
    because they break the user's workflow. They prove the concept
    but require abandoning existing Unix tools entirely - my
    proposal keeps everything working.
    Unix pipelines are generally very short and JSON parsers
    are extremely optimized and fast even for a computer from
    1990s, without any brittleness whatsoever.  Don't let
    perfect be the enemy of good.

    Well, I did not suggest that you modify your proposal to
    avoid repeated parsing...
    Repeated parsing is not an issue, as I mentioned earlier.
    ____________________
    1. https://wiki.freepascal.org/Pascal_Script
    --- Synchronet 3.20a-Linux NewsLink 1.2