Lock-free reference counting

This is most probably not a novel idea, and I would be interested in
reading about this subject if there is some material out there about it.

One problem with reference counting smart pointers as a memory
management tool is that if an object is shared among different threads,
the reference counting becomes inefficient because each time the
reference count has to be changed, it needs locking. Two threads must
not change the reference count at the same time, or it will malfunction.

So I was thinking: Inside one single thread locking is not necessary
because one single thread never tries to access the reference count from
two places at the same time. Thus why not have separate reference
counters for each thread?

In other words, rather than having just one reference count for the
object, it could have n of them, where n is the number of threads which
share the object. The smart pointer would modify the reference count
corresponding to the thread it's running in. If it's running in thread
0, it modifies the first reference count, if it's running in thread 1 it
modifies the second, etc. How the smart pointer knows which thread it's
running in is an implementation detail, but the number could be, for
example, given to it when it's created (and it could keep it as a member
variable), or by other similar means.

The fact that once a reference count drops to 0 it will never be
changed again, can be used here. When a reference count drops to 0, the
smart pointer can simply check if *all* the reference counts are 0, and
if so, it can lock, delete the object, null the pointer, and unlock.
(The locking is necessary here to avoid double deletions, which could
happen if there was no locking.)

Thus the only place where locking is needed is in the deletion itself,
rather than each time the reference count is modified.

Can anyone see any flaw in this idea?