Hello, brothers and sisters of the Tildeverse!
I'm very close to a (second) release of one of my side projects: Crossbow.
As the subject implies, it is a minimalist RSS/Atom feed aggregator. It is written in modern C and portable across UNIX-like systems.
Crossbow aims at being:
- well integrated in a UNIX environment (leverages cron and local mail delivery)
- well documented by means of man pages (uses mdoc)
- supportive of my beloved Gopher protocol (besides HTTP, HTTPS, and local files)
- a powerful tool that handles feeds through sub-commands, in the UNIX tradition
- a privacy enhancement tool, since it reduces the need to use a web browser
I hope you don't mind if I shamelessly trigger your curiosity with a teaser: the man page you will find at the end of this post. It contains a collection of recipes that should give you a flavour of the software.
If anyone is curious about this software, the project is hosted on GitLab:
https://gitlab.com/dacav/crossbow/
In February I released version 0.9.0, which is available (with documentation) here:
https://gitlab.com/dacav/crossbow/-/releases
Please note that:
- The new version will use a save-file format which is incompatible with the
one used by version 0.9.0.
- Since the new release is not complete, there's no "dist" tarball available
quite yet, so you'll need to build it from the repository yourself.
Any feedback, suggestion, bug report, or typo report is welcome. Feel free to reply in this thread, to open a merge request on GitLab, or to drop me an email.
Bye! :)
/ dacav
-- * -- * -- * --
CROSSBOW(1) General Commands Manual CROSSBOW(1)
NAME
crossbow-cookbook - examples of handling feeds
SYNOPSIS
crossbow set [...]
DESCRIPTION
This manual page contains short recipes describing
common usage patterns for the crossbow feed aggregator.
In all the following examples we will assume that the
$ID environment variable is defined as an arbitrary
feed identifier, and that the $URL environment variable
is defined as the feed URL.
EXAMPLES
Simple local mail notification
We want a periodic bulk notification about available
updates.
The following feed setup can be used for this purpose:
crossbow set -i "$ID" -u "$URL" \
-o pretty \
-f "updates from $ID:\n title: %t\n link: %l\n"
The invocation of crossbow-fetch(1) will emit on
stdout(3) a "record" like the following for each new
item:
updates from foobar:
title: Today is a good day
link: http://example.com/today-is-a-good-day
The user can schedule a periodic invocation of
crossbow-fetch(1) with cron(8). Assuming that local
mail delivery is enabled, and since any output of a
cron job is emailed to the owner of the crontab(5), the
user will receive an email whose body is the
concatenation of the records.
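As a sketch of the scheduling step, the helper below prints a crontab(5) entry that runs a command once per hour. The helper name crossbow_cron_entry is made up for illustration, and the "crossbow fetch" command line is an assumption (this page only names the crossbow-fetch(1) manual), so check that manual for the real invocation.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab(5) line that runs a command hourly.
# $1 is the minute of the hour (0-59); the remaining arguments form the
# command line to run.
crossbow_cron_entry() {
    minute="$1"; shift
    printf '%s * * * * %s\n' "$minute" "$*"
}

# Assumption: "crossbow fetch" is how the fetch sub-command is invoked.
crossbow_cron_entry 15 crossbow fetch
```

The printed line can be pasted into the crontab opened by "crontab -e". Whatever the command writes to stdout is then mailed by cron, as described above.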
Keep a local HTML file collection
Let's consider the case of a feed for which the item's
description field reports the whole article in HTML
format. Each article needs to be stored in a separate
HTML file under a certain directory on the filesystem.
The following feed setup can be used for this purpose:
crossbow set -i "$ID" -u "$URL" \
-o pipe \
-f "sed -n w%n.html" \
-C /some/destination/path/
The invocation of crossbow-fetch(1) will spawn one
sed(1) process for each new item. The item description
will be piped to sed(1), which in turn will write it to
a file (w command). The output files will be named
000000.html, 000001.html, 000002.html ..., since %n is
expanded with an incremental numeric value. See
crossbow-outfmt(5).
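The effect of the "sed -n w" trick can be tried in isolation. The sketch below is not part of crossbow: save_item is a hypothetical helper, and it assumes that -C makes the sub-process run inside the destination directory.

```shell
#!/bin/sh
# Hypothetical helper mimicking the recipe for a single item: enter the
# destination directory, then let sed copy standard input to a file.
# $1: destination directory (the -C argument), $2: output file name.
save_item() {
    # -n suppresses normal output; the w command writes each line to $2.
    ( cd "$1" && sed -n "w$2" )
}

dest="$(mktemp -d)"    # scratch directory standing in for -C
printf '<h1>Title</h1>\n<p>Body</p>\n' | save_item "$dest" 000000.html
cat "$dest/000000.html"    # shows the piped text, unchanged
rm -r "$dest"
```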
Security remark: Unless the feed is trusted, it is
strongly discouraged to use anything but %n to name
files. Consider for example the case where %t is used
instead of %n, and the title of a post is
../../../../home/user/public-html/index
The resulting file would then be written outside the
destination directory.
Security remark: We are using the w command of sed(1)
to write to a file. It is not possible to use shell
redirection since sub-commands are never executed
through a shell interpreter. Invoking a shell
interpreter from a command template is strongly
discouraged, since the placeholders would be directly
mixed with the shell script, and doing proper shell
escaping against untrusted input is really hard, if not
impossible. It is on the other hand safe to invoke a
shell script whose code lives in a file and pass
parameters to it. See crossbow-outfmt(5).
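The difference between mixing a placeholder into shell code and passing it as a parameter can be sketched as follows. Both helpers are hypothetical and only illustrate the principle. They are not how crossbow expands templates.

```shell
#!/bin/sh
# A hostile item title, as an untrusted feed could supply it.
title='$(echo INJECTED)'

# Unsafe (hypothetical helper): the title becomes part of the script text,
# so the inner shell evaluates the command substitution hidden in it.
unsafe_print() { sh -c "printf '%s\n' $1"; }

# Safer (hypothetical helper): the script text is constant and the title
# is passed as the positional parameter $1, which is never re-parsed.
safe_print() { sh -c 'printf "%s\n" "$1"' sh "$1"; }

unsafe_print "$title"   # the injected command runs
safe_print "$title"     # the title is printed as plain text
```

A script stored in a file and invoked with the placeholders as arguments behaves like safe_print: the code is fixed, and only the data varies.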
Download the full article
This scenario is similar to the previous one, except
that the item description contains only part of the
content, or nothing at all. The link field contains a
valid URL, which is intended to be reached by means of
a browser.
In this case we can leverage curl(1) to do the
retrieval:
crossbow set -i "$ID" -u "$URL" \
-o subproc \
-f "curl -o /some/destination/path/%n.html %l" \
-C /some/destination/path/
One mail per item
We want to turn individual feed items into plain
(HTML-free) text messages delivered via email.
Our goal can be achieved by means of a generic shell
script like the following:
#!/bin/sh
set -e
while getopts l:s:t: opt; do
    case "$opt" in
        l) link="$OPTARG";;
        s) source="$OPTARG";;
        t) title="$OPTARG";;
        *) exit 1;;
    esac
done
lynx "${link:--stdin}" -dump -force_html |
    sed "s/^~/~~/" |    # Escape dangerous tilde expressions
    mail -s "${source:+${source}: }${title:-...}" "${USER:?}"
The script can be installed in the PATH, e.g. as
/usr/local/bin/crossbow-to-mail, and then integrated in
crossbow(1) as follows:
• If the feed provides the whole content as item
description:
crossbow set -i "$ID" -u "$URL" \
-o pipe \
-f "crossbow-to-mail -s joe's\ blog -t %t"
• If the feed provides only the URL of the article as
item link:
crossbow set -i "$ID" -u "$URL" \
-o subproc \
-f "crossbow-to-mail -l %l -s joe's\ blog -t %t"
Some useful remarks about the shell script:
• The script depends on the excellent lynx(1) browser
to download and parse the HTML into textual form.
• The "s/^~/~~/" sed(1) command improves security by
preventing tilde escapes from being honored by unsafe
implementations of mail(1). The mutt(1) mail user
agent, if available, can be used as a safer drop-in
replacement.
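The effect of the tilde escaping can be checked without sending any mail. In the sketch below, escape_tildes is a hypothetical name for the sed(1) stage of the script, and the sample message is invented. In implementations of mail(1) that honor tilde escapes on piped input, a line starting with "~!" would run a shell command, while "~~" stands for a literal tilde.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed stage used in crossbow-to-mail:
# double a tilde at the beginning of a line, leave other tildes alone.
escape_tildes() { sed 's/^~/~~/'; }

# Invented message body with one dangerous line.
printf '%s\n' \
    'Dear reader,' \
    '~!touch /tmp/owned' \
    'a ~ in the middle of a line is harmless' |
    escape_tildes
```

Only the second line changes: its leading tilde is doubled, so mail(1) treats it as text instead of a command.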
Follow YouTube user, channel or playlist
The YouTube site provides feeds for users, channels and
playlists. Each of these entities is assigned a unique
identifier which can be easily obtained by looking at
the web URL.
Once the user, channel or playlist identifier is known,
it is trivial to obtain the corresponding feeds:
• https://youtube.com/feeds/videos.xml?user=user
• https://youtube.com/feeds/videos.xml?channel_id=channel
• https://youtube.com/feeds/videos.xml?playlist_id=playlist
It is possible to combine crossbow(1) with the
youtube-dl(1) tool to keep a local collection of video
or audio files up to date.
What follows is a convenient wrapper script that
ensures proper file naming:
#!/bin/sh
while getopts f:l:n: opt; do
    case "$opt" in
        f) format="$OPTARG";;
        l) link="$OPTARG";;
        n) incremental_id="$OPTARG";;
        *) exit 1;;
    esac
done
: "${link:?mandatory flag missing: -l}"
: "${incremental_id:?mandatory flag missing: -n}"
# Transform a title in a reasonably safe 'slug'
# (the tr operands are quoted so the shell cannot glob-expand them)
slugify() {
    tr -d '\n' |                        # explicitly drop new-lines
    tr '/[:punct:][:space:]' . |        # turn all sneaky chars into dots
    tr -cs '[:alnum:]'                  # squeeze ugly repetitions
}
fname="$(
    youtube-dl \
        --get-filename \
        -o "%(id)s_%(title)s.%(ext)s" \
        "$link"
)" || exit 1
youtube-dl \
    ${format:+-f "$format"} \
    -o "$(printf %s_%s "$incremental_id" "$fname" | slugify)" \
    --no-progress \
    "$link"
Once again, the script can be installed in the PATH,
e.g. as /usr/local/bin/crossbow-ytdl, and then
integrated in crossbow(1) as follows:
• To save each published item:
crossbow set -i "$ID" -u "$URL" \
-o subproc \
-f "crossbow-ytdl -l %l -n %n" \
-C /some/destination/path
• To save each published item as audio:
crossbow set -i "$ID" -u "$URL" \
-o subproc \
-f "crossbow-ytdl -f bestaudio -l %l -n %n" \
-C /some/destination/path
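To see what the slugify step of the wrapper script does to a file name, it can be run on its own. The sketch below repeats the function (with its tr(1) operands quoted) and feeds it an invented counter and file name. It relies on tr(1) padding the shorter second operand with its last character, as the GNU and BSD implementations do.

```shell
#!/bin/sh
# Same pipeline as in the wrapper script, with quoted operands.
slugify() {
    tr -d '\n' |                      # drop new-lines
    tr '/[:punct:][:space:]' . |      # punctuation and blanks become dots
    tr -cs '[:alnum:]'                # squeeze runs of non-alphanumerics
}

# Invented sample values standing in for %n and the youtube-dl file name.
incremental_id=000007
fname='abc123_My Video: Part 1/2.mp4'

printf %s_%s "$incremental_id" "$fname" | slugify
echo    # slugify removes the trailing new-line, so add one for display
```

With these inputs the result is 000007.abc123.My.Video.Part.1.2.mp4: slashes and other path-sensitive characters are gone, and the incremental prefix keeps the files in publication order.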
SEE ALSO
crossbow-fetch(1), sed(1), youtube-dl(1), crontab(5),
cron(8)
Debian February 3, 2020 CROSSBOW(1)