• Usenet retention time

    From ant@ant@tilde.club to tilde.meta on Sun Oct 19 02:03:20 2025
    As far as I understand, the major strategies for article
    retention are:

    a. by article age (articles have a fixed lifespan),
    b. by article count (a fixed number of articles in newsgroup).

    The first strategy seems almost universal, but it has a drawback:
    balancing storage this way favours high-traffic groups. It is
    often disappointing in a small newsgroup to see some thirty
    articles for the past year, when retaining twice that number
    would have required just another 20 kb. Thus the cost of
    retaining a given period of history is proportional to the
    traffic. Can a retention strategy be devised that interpolates
    between strategies a. and b., and would it make sense?

    If Ret is retention time and rate the average rate of incoming
    articles in the group, then the strategies above may be
    expressed as:

    a. Ret = const ,
    b. Ret = const / rate .

    A naive intermediate solution is:

    Ret = const / sqrt( rate );
    N = Ret * rate = const * sqrt( rate );

    It makes the number of retained articles proportional to the
    square root of the newsgroup traffic, so that low-traffic groups
    have a longer retention time, but still take less storage than
    high-traffic ones. It may be thought of as a compression of the
    article-count ratio.
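
    For illustration, a rough sketch of the square-root rule in
    Python (the constant and the sample rates are made up, not tuned
    values):

        import math

        K = 300.0   # hypothetical constant: days of retention at
                    # a traffic of 1 article/day

        def retention_days(rate_per_day):
            # Square-root interpolation between fixed-age and
            # fixed-count retention.
            return K / math.sqrt(rate_per_day)

        def retained_count(rate_per_day):
            # Articles kept on the spool: K * sqrt(rate).
            return retention_days(rate_per_day) * rate_per_day

        for rate in (0.1, 1.0, 10.0, 100.0):   # articles per day
            print(rate, retention_days(rate), retained_count(rate))

    A quiet group at 0.1 articles/day keeps roughly 950 days of
    history but only ~95 articles, while a busy one at 100/day
    keeps 30 days and ~3000 articles.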

    The rate should be determined by some sort of exponential
    smoothing of the article frequency, or of the period between two
    consecutive articles. (If done right, both are valid methods for
    averaging the rate of events, but I have yet to publish my
    algorithm.)
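
    Not that unpublished algorithm, but a plain, non-adaptive sketch
    of the idea: an exponentially weighted average of the
    inter-arrival period, whose reciprocal is the rate (ALPHA and
    the class name are my own placeholders):

        ALPHA = 0.1     # fixed smoothing factor, 0 < ALPHA <= 1

        class RateEstimator:
            def __init__(self):
                self.last = None      # time of the previous article
                self.period = None    # smoothed gap, seconds

            def observe(self, t):
                # Feed the arrival time of each new article.
                if self.last is not None:
                    dt = t - self.last
                    if self.period is None:
                        self.period = dt
                    else:
                        self.period += ALPHA * (dt - self.period)
                self.last = t

            def rate(self):
                # Articles per second; None until two articles seen.
                return None if not self.period else 1.0 / self.period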

    Since I have never succeeded in implementing adaptive exponential
    smoothing, my first and simplest proposal is to make the length
    of the exponential window proportional to the current effective
    retention time.
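
    One way that proposal could look, again only a guess (the factor
    c and the weighting rule are mine, not a worked-out design):
    derive the per-article weight from the latest gap and a window
    equal to c times the current retention time.

        def smooth_period(period, dt, ret_current, c=1.0):
            # One smoothing step.  `period` is the current estimate
            # of the inter-arrival gap, `dt` the newest observed gap,
            # and the window is c * ret_current (seconds).
            window = max(c * ret_current, dt)   # never shorter than one gap
            alpha = dt / window                 # weight of the new observation
            if period is None:
                return dt
            return period + alpha * (dt - period)

    As a group's retention time grows, its rate estimate reacts more
    slowly, and vice versa.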

    This is just my stream of thought; they say it is good for the
    soul.
    --- Synchronet 3.20a-Linux NewsLink 1.2