Re: Keeping track of what a user has read on a web site



In article <11h8uojahj8q993@xxxxxxxxxxxxxxxxxx>,
gordonb.04o1s@xxxxxxxxxxx (Gordon Burditt) wrote:

>> This is evident when I want to have a function in my page that alerts
>> the user if there is new content of a specific kind (for example: "1
>> new articles on cooking"). That function will report that until the
>> user has reset the timestamp (by visiting the 'what's new?' page that
>> lists all new articles). I can't mark just this article as 'read'.
>>
>> So, what options do I have? Well, each item has an ID, so if I
>> should keep track of read/unread I should base that on the IDs in the
>> aggregation database.
>
> I think it is better to keep track of what the user *HAS* read. Why?
> You don't have to mess with user read lists when a new article is
> added, but you DO have to mess with the unread list.

Good point.

>> Looking at how newsreaders (specifically those that make use of a
>> .newsrc file) do it, they keep track of series of IDs, like
>> "12,14-67,69" - which in my case could mean that the user has read
>> the items with ID 13 and 68.
>
> Newsreaders using .newsrc keep track of what the user HAS read.
> The newsrc isn't edited when a new article shows up. The newsrc
> approach of using ranges is a compact way to store what has been
> read, but a bit awkward to manipulate. Also, if the user interface
> has a "catch-up" function (mark everything read), this collapses
> down to a single range starting with the lowest possible ID (e.g.
> 1).

Exactly.
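
To get a feel for how awkward the manipulation really is, here is a rough
sketch of marking one ID as read and of the catch-up case. The function names
(parse_ranges, format_ranges, mark_read, catch_up) are made up for
illustration, and it assumes PHP 7.1 or later and plain positive integer IDs.

```php
<?php
// Hypothetical helpers; none of these names come from an existing library.

// "12,14-67,69" -> [[12, 12], [14, 67], [69, 69]]
function parse_ranges(string $s): array {
    $ranges = [];
    foreach (explode(',', $s) as $part) {
        $part = trim($part);
        if ($part === '') continue;
        $ends = explode('-', $part, 2);
        $lo = (int)$ends[0];
        $hi = isset($ends[1]) ? (int)$ends[1] : $lo;
        $ranges[] = [$lo, $hi];
    }
    return $ranges;
}

// [[12, 12], [14, 67]] -> "12,14-67"
function format_ranges(array $ranges): string {
    $parts = [];
    foreach ($ranges as [$lo, $hi]) {
        $parts[] = ($lo === $hi) ? "$lo" : "$lo-$hi";
    }
    return implode(',', $parts);
}

// Add one ID, then merge ranges that overlap or touch each other.
function mark_read(string $s, int $id): string {
    $ranges = parse_ranges($s);
    $ranges[] = [$id, $id];
    usort($ranges, function ($a, $b) { return $a[0] <=> $b[0]; });
    $merged = [];
    foreach ($ranges as [$lo, $hi]) {
        $last = count($merged) - 1;
        if ($last >= 0 && $lo <= $merged[$last][1] + 1) {
            $merged[$last][1] = max($merged[$last][1], $hi);
        } else {
            $merged[] = [$lo, $hi];
        }
    }
    return format_ranges($merged);
}

// "Catch-up": everything from 1 up to the current highest ID counts as read.
function catch_up(int $max_id): string {
    return format_ranges([[1, $max_id]]);
}

echo mark_read("12,14-67,69", 13), "\n";  // 12-67,69
echo mark_read("12-67,69", 68), "\n";     // 12-69
echo catch_up(1678), "\n";                // 1-1678
```

So the whole string gets rewritten on every change, but it stays short as
long as the user reads mostly in sequence.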

>>The aggregation database looks something like this:
>>
>>ID    | Kind        | Headline                      | Original ID
>>------+-------------+-------------------------------+------------
>>    1 | article     | Home made pie                 | 23
>>    2 | article     | Hamburgers a'plenty           | 24
>>    3 | forum       | Anyone likes strawberries?    | 298
>>    4 | comments    | Re: Home made pie             | 67
>>
>> Get the idea? The ID is the ID in the aggregated database, the kind
>> says which original database the content came from, and the original
>> ID is the ID in that database.
>
> Another approach is to keep a SQL table containing user ID and
> article ID (and forum ID, if there's more than one). An entry in
> that table means that user has read that article.

Yes, but lookups in that table would get slower as time goes by and new
articles and new forum posts arrive. The aggregate database is there only to
check on what's new, and it contains nothing but the last month's fresh
items.

>> So, if I go and read "Hamburgers a'plenty", it should perhaps update
>> my profile to say "1,3-4" or somesuch to note that I have read id
>> number 2. Or perhaps I should just keep track of all the IDs I have
>> read? The aggregate database keeps content around for about a month,
>> which could mean thousands of items.
>>
>> I am guessing that a MySQL query that looked like this:
>> "select * from aggregate where id not in(1,2,3,4,5,6,7,8.....1678)"
>
> select aggregate.* from aggregate LEFT JOIN readlist on aggregate.id =
> readlist.id and readlist.userid = 'this guys user id' where
> readlist.id is null;
>
> gets you a list of all articles this guy hasn't read. A problem with
> this approach is that readlist grows continuously over time.

Exactly.

>> So, I am wondering how YOU would have done - Or are you already doing
>> this in one way or the other? I'm just venting here and hoping that
>> someone will come with good suggestions on how to solve this in an
>> efficient manner.
>
> The .newsrc approach isn't too bad: it assumes that articles are
> created in sequential order, and that getting the high id of current
> articles is fairly easy. (select max(id) from aggregate). Many
> newsreaders assume that if the article id <= the max and id >= the min
> not yet expired and it's not in the newsrc list, there's a pretty good
> chance that it actually exists. If it later discovers that the
> article does not exist (say, trying to fetch it or its subject line),
> it marks it read.
>
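
As a rough sketch of that min/max idea: anything between the lowest unexpired
ID and max(id) that isn't covered by a read range is treated as unread,
whether or not the row actually still exists. The function name unread_ids is
made up, and it assumes the read ranges are already parsed into sorted,
non-overlapping [low, high] pairs (PHP 7.1 or later).

```php
<?php
// Hypothetical helper: list the IDs in [$min_id, $max_id] that are NOT covered
// by a sorted, non-overlapping list of read ranges such as [[12, 12], [14, 67]].
function unread_ids(array $read_ranges, int $min_id, int $max_id): array {
    $unread = [];
    $next = $min_id;                      // lowest ID not yet accounted for
    foreach ($read_ranges as [$lo, $hi]) {
        if ($hi < $next) continue;        // range lies entirely below the window (expired)
        if ($lo > $max_id) break;         // range lies entirely above the window
        for ($id = $next; $id < $lo; $id++) {
            $unread[] = $id;              // gap in front of this range
        }
        $next = max($next, $hi + 1);
    }
    for ($id = $next; $id <= $max_id; $id++) {
        $unread[] = $id;                  // tail after the last read range
    }
    return $unread;
}

// Assuming min(id) = 10 and max(id) = 70 in the aggregate table:
print_r(unread_ids([[12, 12], [14, 67], [69, 69]], 10, 70));
// lists 10, 11, 13, 68 and 70
```

Ranges that fall below the lowest unexpired ID are simply skipped, so they
could also be dropped from the stored list to keep it short.
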
> You could try putting the list of ranges into SQL. It saves storage.
> Chances are, you'd need to wipe out and re-store the entire list of
> ranges for a particular user (for a particular forum, if there's more
> than one) every time.

Exactly. I'll probably store the list of read IDs in the user table, from which
I load info about the current user at the beginning of each page load. Like this:

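<?php
# $db is assumed to be an open mysqli connection. The cookie values are
# bound as parameters so they can't inject SQL into the query.
$stmt = $db->prepare("select * from member where email = ? and md5pass = ?");
$stmt->bind_param("ss", $_COOKIE['email'], $_COOKIE['passwd']);
$stmt->execute();
$user = $stmt->get_result()->fetch_assoc();
?>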

That way, I could have a function that takes the current kind and id and
matches it to the list, something like this:

<?php
$user["read"] = "article=12,23-56;forum=23,56-12989";
# $article = array of current article
# has_read() doesn't exist yet; this is just how I'd want to call it

if (has_read($user["read"], $article["id"], "article")) {
    # The user has read this
}
?>

But the has_read() function needs to be really efficient, since when listing 40
articles it will be called once per article ID to check read status.
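
One way it might go, as a sketch only: parse the string once per page load,
keep the result in a static cache, and binary-search the ranges on each call.
parse_read_list is a made-up helper name, and this assumes the ranges within
each kind are stored sorted and non-overlapping (PHP 7.1 or later).

```php
<?php
// Parse "article=12,23-56;forum=23,56-12989" into
// ['article' => [[12, 12], [23, 56]], 'forum' => [[23, 23], [56, 12989]]].
// Assumes the ranges within each kind are sorted and non-overlapping.
function parse_read_list(string $read): array {
    $by_kind = [];
    foreach (explode(';', $read) as $section) {
        if (strpos($section, '=') === false) continue;  // skip empty or malformed sections
        [$kind, $list] = explode('=', $section, 2);
        $ranges = [];
        foreach (explode(',', $list) as $part) {
            if ($part === '') continue;
            $ends = explode('-', $part, 2);
            $lo = (int)$ends[0];
            $ranges[] = [$lo, isset($ends[1]) ? (int)$ends[1] : $lo];
        }
        $by_kind[trim($kind)] = $ranges;
    }
    return $by_kind;
}

function has_read(string $read, int $id, string $kind): bool {
    static $cache = [];                   // parsed lists, keyed by the raw string
    if (!isset($cache[$read])) {
        $cache[$read] = parse_read_list($read);
    }
    $ranges = $cache[$read][$kind] ?? [];
    $lo = 0;
    $hi = count($ranges) - 1;
    while ($lo <= $hi) {                  // binary search over the ranges
        $mid = intdiv($lo + $hi, 2);
        if ($id < $ranges[$mid][0]) {
            $hi = $mid - 1;
        } elseif ($id > $ranges[$mid][1]) {
            $lo = $mid + 1;
        } else {
            return true;                  // $id falls inside this range
        }
    }
    return false;
}

$read = "article=12,23-56;forum=23,56-12989";
var_dump(has_read($read, 40, "article"));  // bool(true)
var_dump(has_read($read, 13, "article"));  // bool(false)
var_dump(has_read($read, 40, "forum"));    // bool(false)
```

With the parse done only once, 40 calls per page should cost next to nothing,
since each lookup is just a handful of integer comparisons.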

*ponders*

--
Sandman[.net]