• Crossbow, the minimalist feed aggregator

    From Dacav Doe@dacav@tilde.institute to tilde.projects on Thu Jun 11 23:05:32 2020

    Hello, brothers and sisters of the Tildeverse!

    I'm very close to a (second) release of one of my side projects: Crossbow.
    As the subject implies, it is a minimalist RSS/Atom feed aggregator. It is written in modern C and portable across UNIX-like systems.

    Crossbow aims at being:

    - well integrated in a UNIX environment (leverages cron and local mail delivery)
    - well documented by means of man pages (uses mdoc)
    - supportive of my beloved Gopher protocol (besides HTTP, HTTPS, and local files)
    - a powerful tool for handling feeds through sub-commands, in the UNIX tradition
    - a privacy enhancement tool, since it reduces the need to use a web browser

    I hope you don't mind if I shamelessly trigger your curiosity with a teaser: the man page you will find at the end of this post. It contains a collection of recipes that should give you the flavour of the software.

    If anyone is curious about this software, the project is hosted on GitLab:

    https://gitlab.com/dacav/crossbow/

    In February I released version 0.9.0, which is available (with documentation) here:

    https://gitlab.com/dacav/crossbow/-/releases

    Please note that:

    - The new version will use a save-file format which is incompatible with the
    one used by version 0.9.0.

    - Since the new release is not complete, there's no "dist" tarball available
    quite yet, so you'll need to build from the git repository yourself

    Any feedback, suggestions, bug reports, or typo reports are welcome. Feel free to reply in this thread, open a pull request on GitLab, or drop me an email.


    Bye! :)
    / dacav


    -- * -- * -- * --


    CROSSBOW(1) General Commands Manual CROSSBOW(1)

    NAME
    crossbow-cookbook - examples of handling feeds

    SYNOPSIS
    crossbow set [...]

    DESCRIPTION
    This manual page contains short recipes describing
    common usage patterns for the crossbow feed aggregator.

    In all the following examples we will assume that the
    $ID environment variable is defined as an arbitrary
    feed identifier, and that the $URL environment variable
    is defined as the feed URL.
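
    For example, one might pick (the values here are arbitrary, and
    any protocol supported by crossbow would do):

    ID=tilde-news
    URL=gopher://example.com/0/feed.xml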

    EXAMPLES
    Simple local mail notification
    We want a periodic bulk notification about the availability of
    updates.

    The following feed setup can be used for this purpose:

    crossbow set -i "$ID" -u "$URL" \
        -o pretty \
        -f "updates from $ID:\n title: %t\n link: %l\n"

    The invocation of crossbow-fetch(1) will emit on
    stdout(3) a "record" like the following for each new
    item:

    updates from foobar:
    title: Today is a good day
    link: http://example.com/today-is-a-good-day

    The user can schedule a periodic invocation of
    crossbow-fetch(1) with cron(8). Assuming that local mail
    delivery is enabled, and since any output of a cron job is
    emailed to the owner of the crontab(5), the user will receive
    an email whose body is the concatenation of the records.
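
    For example, a crontab(5) entry along the following lines (the
    hourly schedule is arbitrary) triggers the fetch sub-command
    documented in crossbow-fetch(1):

    # min hour dom mon dow  command
    0 * * * * crossbow fetch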

    Keep a local HTML file collection
    Let's consider the case of a feed whose item description field
    carries the whole article in HTML format. Each article should
    be stored in a separate HTML file under a given directory on
    the filesystem.

    The following feed setup can be used for this purpose:

    crossbow set -i "$ID" -u "$URL" \
        -o pipe \
        -f "sed -n w%n.html" \
        -C /some/destination/path/

    The invocation of crossbow-fetch(1) will spawn one sed(1)
    process for each new item. The item description will be piped
    to sed(1), which in turn will write it to a file (the w
    command). The output files will be named 000000.html,
    000001.html, 000002.html, and so on, since %n is expanded to
    an incremental numeric value. See crossbow-outfmt(5).
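
    Concretely, for the first new item the spawned process is
    roughly equivalent to the following, where $description stands
    for the item's description field:

    printf '%s' "$description" | sed -n w000000.html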

    Security remark: Unless the feed is trusted, using anything
    but %n to name files is strongly discouraged. Consider, for
    example, the case where %t is used instead of %n, and the
    title of a post is
    ../../../../home/user/public-html/index

    Security remark: We are using the w command of sed(1)
    to write to a file. It is not possible to use shell
    redirection since sub-commands are never executed
    through a shell interpreter. Invoking a shell
    interpreter from a command template is strongly
    discouraged, since the placeholders would be directly
    mixed with the shell script, and doing proper shell
    escaping against untrusted input is really hard, if not
    impossible. It is on the other hand safe to invoke a
    shell script whose code lives in a file and pass
    parameters to it. See crossbow-outfmt(5).
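
    For example, a small helper script (the name crossbow-save
    below is made up) can receive the untrusted title as a plain
    argument and treat it purely as data:

    #!/bin/sh
    # Hypothetical /usr/local/bin/crossbow-save: the item body
    # arrives on stdin; the incremental id and the title arrive as
    # ordinary arguments, so a hostile title is never interpreted
    # as shell code.
    set -e
    n="${1:?missing incremental id}"
    title="${2:-untitled}"
    { printf '<!-- %s -->\n' "$title"; cat; } > "$n.html"

    It can then be integrated as:

    crossbow set -i "$ID" -u "$URL" \
        -o pipe \
        -f "crossbow-save %n %t" \
        -C /some/destination/path/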

    Download the full article
    This scenario is similar to the previous one, except
    that the item description contains only part of the
    content, or nothing at all. The link field contains a
    valid URL, which is intended to be reached by means of
    a browser.

    In this case we can leverage curl(1) to do the
    retrieval:

    crossbow set -i "$ID" -u "$URL" \
        -o subproc \
        -f "curl -o /some/destination/path/%n.html %l" \
        -C /some/destination/path/
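
    If the linked pages may redirect, a slightly more defensive
    variant of the same command (a suggestion, not a requirement)
    asks curl(1) to follow redirects, to fail on HTTP errors and to
    stay quiet except for errors:

    crossbow set -i "$ID" -u "$URL" \
        -o subproc \
        -f "curl -fsSL -o /some/destination/path/%n.html %l" \
        -C /some/destination/path/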

    One mail per item
    We want to turn individual feed items into plain (HTML-free)
    text messages delivered via email.

    Our goal can be achieved by means of a generic shell
    script like the following:

    #!/bin/sh

    set -e

    while getopts l:s:t: opt; do
        case "$opt" in
        l) link="$OPTARG";;
        s) source="$OPTARG";;
        t) title="$OPTARG";;
        *) exit 1;;
        esac
    done

    lynx "${link:--stdin}" -dump -force_html |
        sed "s/^~/~~/" | # Escape dangerous tilde expressions
        mail -s "${source:+${source}: }${title:-...}" "${USER:?}"
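
    To give the script a quick manual try before wiring it into
    crossbow(1) (this assumes lynx(1) is installed and local mail
    delivery works), one can feed it some HTML on standard input:

    printf '<h1>Hello</h1><p>It works.</p>' |
        crossbow-to-mail -s test -t "Hello"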

    The script can be installed in the PATH, e.g. as
    /usr/local/bin/crossbow-to-mail, and then integrated in
    crossbow(1) as follows:

    •   If the feed provides the whole content as item description:

        crossbow set -i "$ID" -u "$URL" \
            -o pipe \
            -f "crossbow-to-mail -s joe's\ blog -t %t"

    •   If the feed provides only the URL of the article as item link:

        crossbow set -i "$ID" -u "$URL" \
            -o subproc \
            -f "crossbow-to-mail -l %l -s joe's\ blog -t %t"

    Some useful remarks about the shell script:

    •   The script depends on the excellent lynx(1) browser to
        download and parse the HTML into textual form.

    ΓÇó The "s/^~/~~/" sed(1) command improves security by
    preventing tilde escapes to be honored by unsafe
    implementations of mail(1). The mutt(1) mail user
    agent, if available, can be used as a safer drop-in
    replacement.
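
        For example, with mutt(1) installed, the last line of the
        script could become:

        mutt -s "${source:+${source}: }${title:-...}" -- "${USER:?}"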

    Follow YouTube user, channel or playlist
    The YouTube site provides feeds for users, channels and
    playlists. Each of these entities is assigned a unique
    identifier, which can easily be obtained by looking at the web
    URL.

    Once the user, channel or playlist identifier is known,
    it is trivial to obtain the corresponding feeds:

    •   https://youtube.com/videos.xml?user=user

    •   https://youtube.com/videos.xml?channel_id=channel

    •   https://youtube.com/videos.xml?playlist_id=playlist

    It is possible to combine crossbow(1) with the youtube-dl(1)
    tool to keep a local collection of video or audio files up to
    date.

    What follows is a convenient wrapper script that
    ensures proper file naming:

    #!/bin/sh

    while getopts f:l:n: opt; do
        case "$opt" in
        f) format="$OPTARG";;
        l) link="$OPTARG";;
        n) incremental_id="$OPTARG";;
        *) exit 1;;
        esac
    done

    : "${link:?mandatory flag missing: -l}"
    : "${incremental_id:?mandatory flag missing: -n}"
    # Transform a title into a reasonably safe 'slug'
    slugify() {
        tr -d '\n' |                   # explicitly drop new-lines
        tr '/[:punct:][:space:]' . |   # turn all sneaky chars into dots
        tr -cs '[:alnum:]'             # squeeze ugly repetitions
    }

    fname="$(
    youtube-dl \
    --get-filename \
    -o "%(id)s_%(title)s.%(ext)s" \
    "$link"
    )" || exit 1

    youtube-dl \
    ${format:+-f "$format"} \
    -o "$(printf %s_%s "$incremental_id" "$fname" | slugify)" \
    --no-progress \
    "$link"

    Once again, the script can be installed in the PATH, e.g. as
    /usr/local/bin/crossbow-ytdl, and then integrated into
    crossbow(1) as follows:

    •   To save each published item:

        crossbow set -i "$ID" -u "$URL" \
            -o subproc \
            -f "crossbow-ytdl -l %l -n %n" \
            -C /some/destination/path

    •   To save each published item as audio:

        crossbow set -i "$ID" -u "$URL" \
            -o subproc \
            -f "crossbow-ytdl -f bestaudio -l %l -n %n" \
            -C /some/destination/path

    SEE ALSO
    crossbow-fetch(1), sed(1), youtube-dl(1), crontab(5),
    cron(8)

    Debian February 3, 2020 CROSSBOW(1)
  • From Dacav Doe@dacav@tilde.institute to tilde.projects on Sat Jul 11 21:07:41 2020
    It is release time for Crossbow!

    https://gitlab.com/dacav/crossbow/-/tags/v1.1.1

    I spoke about Crossbow some time ago on this thread.

    I think it's especially interesting for Tilde inhabitants, due to its UNIX-friendliness and its (possibly unique) Gopher support.

    If you are curious about how it works, check out the crossbow-cookbook(7)
    man page!

    Happy hacking
  • From Dacav Doe@dacav@tilde.institute to tilde.projects on Wed Aug 5 10:12:15 2020
    It is release time again for Crossbow!

    https://gitlab.com/dacav/crossbow/-/releases/v1.1.2

    Thanks to a couple of fellow hackers, it now supports the pledge(2) syscall under OpenBSD, and compiles on Darwin.

    Some interesting news about it:

    I've fuzz-tested the underlying libraries (libnxml, libmxml), discovering a couple of crashes/leaks. The bugs are not severe; in any case, I have already sent fixes to the owner/maintainer.

    Even so, I'm somewhat unsatisfied with both libraries, so in the long run (I don't have much free time these days) I plan to write my own minimal RSS/Atom parsing library, leveraging the excellent mini-xml library (https://www.msweet.org/mxml/). The project has already started, and it is hosted on GitLab (https://gitlab.com/dacav/liborss).


    Happy hacking!
    - dacav