Usenet retention time
From ant@ant@tilde.club to tilde.meta on Sun Oct 19 02:03:20 2025
As far as I understand, the major strategies for article
retention are:
a. by article age (articles have a fixed lifespan),
b. by article count (a fixed number of articles in newsgroup).
The first strategy seems almost universal, but it has a drawback:
such load-balancing favours high-traffic groups. It is often
disappointing in a small newsgroup to see some thirty articles
for the past year, when retaining twice that number would have
required just another 20 kB. Thus, the cost of retaining a given
period of history is proportional to the traffic. Can a retention
strategy be devised that interpolates between strategies a. and
b., and would it make sense?
If Ret is retention time and rate the average rate of incoming
articles in the group, then the strategies above may be
expressed as:
a. Ret = const ,
b. Ret = const / rate .
A naive intermediate solution is:
Ret = const / sqrt( rate );
N = Ret * rate = const * sqrt( rate );
It makes the number of retained articles proportional to the
square root of the newsgroup traffic, so that low-traffic groups
have a longer retention time but still take less storage than
high-traffic ones. It may be thought of as a compression of the
article-count ratio.
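To make the comparison concrete, here is a small sketch of the
three strategies side by side. All names and the tuning constant K
are my own assumptions for illustration, not code from any news
server:

```python
import math

# K is an assumed tuning constant: the number of articles retained
# in a group that receives exactly one article per day.
K = 30.0

def retention_days(rate_per_day, strategy):
    """Retention time in days for a given average posting rate.
    a: fixed lifespan; b: fixed article count; ab: sqrt interpolation."""
    if strategy == "a":
        return K
    if strategy == "b":
        return K / rate_per_day
    if strategy == "ab":
        return K / math.sqrt(rate_per_day)
    raise ValueError(strategy)

def retained_articles(rate_per_day, strategy):
    # N = Ret * rate; under "ab" this grows as sqrt(rate).
    return retention_days(rate_per_day, strategy) * rate_per_day

# Under "ab", a group with 100x the traffic keeps only 10x the
# articles, while its retention time shrinks by a factor of 10.
for rate in (0.01, 1.0, 100.0):
    print(rate, retention_days(rate, "ab"), retained_articles(rate, "ab"))
```

Note how strategy "b" would give every group exactly K articles
regardless of traffic, while "ab" splits the difference.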
rate should be determined by some sort of exponential smoothing,
either of the article frequency or of the period between two
consecutive articles. (If done right, both are valid methods for
averaging the rate of events, but I have yet to publish my
algorithm.)
Since I have never succeeded in implementing adaptive exponential
smoothing, my first and simplest proposal is to make the length
of the exponential window proportional to the current effective
retention time.
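One way that proposal might look in practice: smooth the period
between consecutive articles with the standard irregular-interval
exponential weight, and take the smoothing window to be the current
effective retention time itself. This is my own sketch under those
assumptions; the class and parameter names are invented:

```python
import math

class RateEstimator:
    """Exponentially smoothed inter-arrival period; rate = 1/period.
    The smoothing window tracks the current effective retention
    time, so quiet groups average over a longer history."""

    def __init__(self, const=30.0, initial_period=1.0):
        self.const = const            # same const as in Ret = const/sqrt(rate)
        self.period = initial_period  # smoothed days between articles (a guess)
        self.last_arrival = None

    def retention(self):
        # Ret = const / sqrt(rate), with rate = 1 / period.
        rate = 1.0 / self.period
        return self.const / math.sqrt(rate)

    def observe(self, arrival_time):
        """Feed the timestamp (in days) of each arriving article."""
        if self.last_arrival is not None:
            gap = arrival_time - self.last_arrival
            # Window length proportional to the effective retention time;
            # the usual weight for irregularly spaced observations.
            alpha = 1.0 - math.exp(-gap / self.retention())
            self.period += alpha * (gap - self.period)
        self.last_arrival = arrival_time
```

With daily posts and a too-long initial guess, the smoothed period
drifts down toward one day, slowly, because the window is long when
the estimated rate is low.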
This is just my stream of thought; they say it is good for the
soul.
--- Synchronet 3.20a-Linux NewsLink 1.2