Re: Server Advice Wanted.



"Brett Watters" <blwatters@xxxxxxx> wrote in
news:431de2da@xxxxxxxxxxxxxxxxxxxxxx:

> DVD-R is estimated at about 50-100 years. DDS tape
> is only good for 10 years of storage. And since you use
> the same tapes over-and-over, most fail within a few
> years of heavy use. They are also subject to magnets,
> less forgiving of heat, moisture/water, etc. and can fail
> mechanically -- i.e. snap, get caught in machines,
> have their bearings go, etc.

My experience with CDs is very different from what you
describe. In most cases I expect a maximum of 3~4 years
from a mildly used disk. A hard drive, on the other hand,
will give me around 8~10 years of life before the troubles
begin. I do not have any experience with CD/DVD jukeboxes
(no scratches, better handling, etc.), so I can't really
comment on that.


> Honestly, no need to update them. Just use DVD-R.
> The scanned images and basic info about the book
> won't change. The DB can hold any information
> which is likely to change -- last time it was accessed,
> number of accesses, etc.
>
> If you honestly need to change the information -- bad
> scans or something -- you just make a new copy of
> the book on a different DVD and point the DB
> entry to the new copy. You can include the 'Copy: 2'
> marking on the text/data file for the book so that
> if you need to rebuild the entire thing, you can.
>
> (You could also just copy the DVD to a HD,
> correct the HD image and burn it out to a
> replacement DVD.)

My concern is not how hard it is for me to do it;
it is how hard the customer will perceive it to be.
I am dealing with people who know almost nothing
about computers and are not very willing to learn,
so my software needs to be as simple as possible, and
the use of external programs or procedures that require
knowledge of the OS should be kept to zero if possible.

> As far as the need to write the disks... it is going
> to take someone longer to scan them all in than
> write out to DVDs. High-end systems can make
> a 4.7GB disk in about 30 minutes. Remember you
> can make the DVDs on workstations (with the
> book info), put the DVDs into the juke box and
> then order your program to scan through the
> directories, read the info from the data files, and
> update the DB. Thus, you can have dozens of
> workstations making the library if you wish.
> 3.2TB is about 680 disks (1360 with backups),
> and 2 per hour, that is about 17 weeks for
> one machine/person or a month for four
> workstations.

I can have the photographers do the writing to
DVDs as they scan the images. They have the required
knowledge, so this is not my problem.
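
The burn-time figures quoted above can be checked with a few
lines of arithmetic. This is only a sketch: the function name
is made up for illustration, and it assumes 1TB = 1000GB and a
40-hour working week, neither of which is stated in the thread.

```python
import math

def burn_estimate(library_gb, disc_gb=4.7, copies=2, discs_per_hour=2,
                  hours_per_week=40, workstations=1):
    """Return (discs_per_set, total_discs, weeks) for burning the library.

    hours_per_week=40 is an assumption; the thread only gives the
    resulting number of weeks.
    """
    discs_per_set = math.ceil(library_gb / disc_gb)   # discs for one copy
    total_discs = discs_per_set * copies              # including backup set
    hours = total_discs / discs_per_hour
    weeks = hours / (hours_per_week * workstations)
    return discs_per_set, total_discs, weeks

if __name__ == "__main__":
    print(burn_estimate(3200))                  # one workstation
    print(burn_estimate(3200, workstations=4))  # four workstations
```

Under those assumptions 3.2TB comes to 681 disks per set and
about 17 weeks on one workstation, which agrees with the quoted
estimate of roughly 680 disks and 17 weeks.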

> As for updating the library over time... you
> said you only expect about 5GB of data per
> year. Simply maintain a 'volume' of the HD
> which contains new books. When this reaches
> 4.5GB, then burn the volume onto a DVD,
> and update the DB entries to point to the
> volume on the jukebox.
>
> Based on your estimates this would only
> need to be done once per year. Even if
> you were off by a factor of 10, that is only
> an hour or so each month -- including
> the backup copy.

This process will probably require me to visit
the customer at least once a year. This is no
problem, since my contract includes a number of visits
to the customer per year for maintenance reasons,
but if this becomes once per month it would
prove problematic for me.
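
If the software itself watched the staging volume, the burn
could be scheduled to fall on a maintenance visit. A minimal
sketch, assuming the new books live under one staging directory
(the function names and the directory layout are hypothetical;
the 4.5GB threshold is the figure quoted above):

```python
import os

BURN_THRESHOLD = int(4.5 * 1000**3)  # 4.5GB, the figure quoted above

def staging_size(root):
    """Total size in bytes of all files under the staging directory."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def ready_to_burn(root, threshold=BURN_THRESHOLD):
    """True once the staged books would fill a DVD volume."""
    return staging_size(root) >= threshold
```

The application could call ready_to_burn() at startup and show
a plain message to the operator, so nobody on site has to check
disk usage through the OS.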

> Why? You get to set the rules for how large of
> a cache you need. Remember the goal is to
> reduce the size of the HD since they are more
> expensive to mirror and backup.

That is correct but at the same time I need to
minimize the wait time as far as possible.
I am considering some kind of caching mechanism
in order to avoid hitting the DVDs all the time.
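
One common shape for such a caching mechanism is a
least-recently-used (LRU) cache bounded by size. The sketch
below keeps the images in memory to stay short; a real cache
would keep them on the HD. The class name and the
fetch_from_dvd callback are assumptions for illustration only.

```python
from collections import OrderedDict

class ImageCache:
    """Least-recently-used cache of page images, bounded by total bytes."""

    def __init__(self, capacity_bytes, fetch_from_dvd):
        self.capacity = capacity_bytes
        self.fetch = fetch_from_dvd      # hypothetical slow loader: key -> bytes
        self.items = OrderedDict()       # key -> bytes, least recent first
        self.used = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        data = self.fetch(key)           # cache miss: go to the jukebox
        self._store(key, data)
        return data

    def _store(self, key, data):
        if len(data) > self.capacity:
            return                       # too large to cache; serve uncached
        while self.used + len(data) > self.capacity:
            _, old = self.items.popitem(last=False)  # evict least recent
            self.used -= len(old)
        self.items[key] = data
        self.used += len(data)
```

Since the cache can always be rebuilt from the DVDs, losing it
costs only wait time, which is why it would not need to be
part of the backup.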

> I'd put DVDs against mirrored HDs anytime.
>
> 1. DVDs are fixed. You aren't writing to them again.
> The jukebox can't write to them, so it can't corrupt
> them. i.e. no FAT failures, no fragmentation, or
> accidentally deleting files, etc.
>
> Mirroring a HD only safeguards against physical
> failure of the HD. If the OS, app, FAT, gets corrupted,
> both sets get corrupted.
>
> 2. If a 3TB mirrored system fails, the cost and down
> time to replace one set and re-image it is staggering.
> If a 4.7GB disk fails, it takes a few seconds to
> replace.
>
> 3. If a 3TB system needs to be restored from a
> backup... ugg.


That is why I have a dual computer as a requirement.
The second one just mirrors the information I need
(on mirrored disks as well), so I can recover even from
a system crash (motherboard burn, etc.).

> 4. Daily tape backups of a 3TB system aren't
> practical. If you use incremental backups, restore
> time gets increasingly long and worse... the chance
> of it working decreases.

I know I don't have any usage statistics yet, but from
what has been described I need a backup once a week,
and I have 4 tape sets. So the loss of one week's work
is the worst-case scenario, and only if both systems
fail at the same time. That is going to happen only
from external causes (physical disasters, electrical
problems, etc.), and in that case I think the loss of a
week's work is at least acceptable.
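
A weekly rotation over four tape sets can be reduced to a small
calculation, so the operator is simply told which set to load.
This is a sketch under the assumption that the sets are numbered
0 to 3 and reused in strict order; the function name is made up.

```python
from datetime import date

TAPE_SETS = 4  # from the post: four tape sets, one backup per week

def tape_set_for(day):
    """Return the tape set index (0..3) for the week containing day."""
    week_index = day.toordinal() // 7   # weeks counted from a fixed origin
    return week_index % TAPE_SETS

if __name__ == "__main__":
    print("Load tape set", tape_set_for(date.today()))
```

Counting weeks from a fixed origin rather than using the
calendar week number keeps the order unbroken across year
boundaries. Each set is overwritten every fourth week.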

> 5. If a DVD fails, it can only take down a tiny
> percentage of your library, and can be replaced
> from your backup quickly. Even if an entire
> jukebox fails... it only takes down a percentage
> of your library, and you can move the DVDs
> to another quickly.
>
> If a HD fails, it could take days to rebuild
> a 3TB mirror. And of course you have no
> protection if anything other than the media
> fails -- i.e. the controller itself, corruption,
> fragmentation, running out of HD space, etc.

That is why I need RAID 0 and RAID 1. RAID 0 is the
ability to combine multiple disks as one, which means
that if a disk fails then part of the data will be
lost, and replacing a 250GB disk is easier than replacing
a 4TB disk, isn't it? RAID 1 is the ability to mirror
a disk onto a second disk, sector by sector, giving a
second copy of the disk. When one of the disks fails
(bad sectors, etc.), the mirror is used automatically
by the controller until the defective disk is changed.

Keep in mind that this is a very simplified explanation.
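
The combination described above (blocks spread over several
disks, each disk mirrored) can be pictured with a toy model.
This is not how any particular controller is implemented; the
class and its one-block stripe unit are assumptions chosen to
keep the sketch short.

```python
class MirroredStripe:
    """Toy model: blocks striped over N pairs, each pair holding two copies."""

    def __init__(self, pairs):
        # disks[p][c] is copy c (0 or 1) of pair p: a dict offset -> data
        self.disks = [[{}, {}] for _ in range(pairs)]
        self.failed = set()              # (pair, copy) tuples marked as failed

    def _locate(self, block):
        # Striping part: consecutive blocks go to consecutive pairs.
        return block % len(self.disks), block // len(self.disks)

    def write(self, block, data):
        pair, offset = self._locate(block)
        for copy in (0, 1):              # mirroring part: write both copies
            if (pair, copy) not in self.failed:
                self.disks[pair][copy][offset] = data

    def read(self, block):
        pair, offset = self._locate(block)
        for copy in (0, 1):              # fall back to the mirror copy
            if (pair, copy) not in self.failed:
                return self.disks[pair][copy][offset]
        raise IOError("both copies of pair %d have failed" % pair)
```

In this model a single failed disk is invisible to the reader,
and data becomes unreadable only when both disks of the same
pair fail.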

> For backups with a DVD jukebox, it is easy.
> Burn two sets of DVDs. Take one set off site.
> You then only need to backup the DB. You
> honestly don't need to backup all the cache
> information, since the system can rebuild it
> during normal use.
>
> However, if the company honestly wants
> two copies of the data on the network...

The company doesn't <want> two copies of the
data, but they require a maximum down time of
2 hours. This is impossible for me to support:
the trip to the customer's site alone is around
5 hours. So I need the second computer and the
backup. Although this decreases my earnings
and increases the customer's cost, I need this
installation to work without any problems
if possible, because this is a key customer.


> then make three copies of each DVD.
> Keep one off site and double up on
> jukeboxes. Start with 8 jukeboxes
> rather than 4. For an extra $40,000,
> you can just include an 'alternate'
> volume info for each DB book entry and
> your app can access either jukebox entry --
> either using only one as a primary and
> going to the other in case it fails, or using
> both to help increase performance and
> noting if one fails.

In order to reach the same level of security
as the one I have in mind with hard disks,
I will probably need 16 jukeboxes (8 per machine)
and a set of backups. Considering that I would have to
actually code part of RAID 1 in my program, along
with the fact that I need to support writing
DVDs and a few other extras that are not needed
with a RAID chain of disks, this has some hidden costs
that I need to account for.
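
For a sense of scale, the "alternate volume" lookup quoted
above, used as primary plus fallback, might look roughly like
the sketch below. The entry layout, the volume names and the
read_volume callback are all hypothetical; the real work would
be in whatever reads a file from a jukebox volume.

```python
def read_page(entry, page, read_volume):
    """Try the primary volume, then the alternate.

    entry is a dict with 'primary' and 'alternate' volume names
    (an assumed DB record layout). read_volume(volume, page) is a
    hypothetical loader that raises OSError when a disc cannot be
    read. Returns (data, volume_used) so a failure can be noted.
    """
    errors = []
    for volume in (entry["primary"], entry["alternate"]):
        try:
            return read_volume(volume, page), volume
        except OSError as exc:           # unreadable disc or jukebox offline
            errors.append((volume, exc))
    raise OSError("all copies failed: %r" % errors)
```

Returning the volume that actually served the request lets the
caller log when the primary copy had to be skipped.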


> You always have the option of doing this
> later. For example, you can use it without
> OCR'd information, but program a system
> scan through the books, access the scanned
> pages, and OCR them into the DB at some
> later time.
>
> However, I would recommend that during
> scanning, someone enters the page numbers
> of the TOC and abstract into the data file,
> so you can automate this process later.

Actually, this information will be entered in the
file name when the images are created.
OCR has been declined for starters, and I'll try to
push it for the second version of the software.


You have a number of valid comments in here, most
of them very well thought out, and I really appreciate
your time and knowledge. The only thing I need is
some hardware to play with, to see how things work
for myself. Do you have any recommendations that I
could follow?


Thank you for your time and effort; they are appreciated.

Regards
Yannis.
