Re: [vox-tech] perl db question
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [vox-tech] perl db question
I won't go into details, but in summary I'd strongly recommend
a SQL database for anything much more complicated then a small table
or two. I'll list some of the things you should consider if your
going to do it yourself:
1. What do you do if you add a column? Write a program to migrate
your data from the old format? How many lines of code?
2. What happens if *gasp* you want to allow more then one user/process
access the data? Locking is tricky, painful to get right, painful
to track down when you get it wrong, and can result in lost data.
3. What happens if a table doesn't fit in memory? What if it does?
Getting a nice balance of speed, performance and robustness is
tricky. Ideally buffer sizes would change dynamically as needed,
and handle paging to/from disk as needed, complete with safechecking
in case of a powerfailure.
4. What happens if your index isn't fast enough? What data structure
will you use? Hashtables? How big are the bins? Which hash
function? What if part doesn't fit in memory? Ideally these would
5. New kinds of datatypes, timestamps, strings, blobs, arrays,
datetime, date, timezones etc are all a pain to handle, not to
mention date calculations, searches, and similar. How much code will
you use? Will you have y2k problems? Y2010? Y2038 (unix epoch).
6. New reports, ways to find records, new ways to view data, etc.
7. New functionality, say emailing customers who haven't bought something
in 90 days. Maybe a most reliable cars report based on repairs per
8. Recovery tools, indexs can and do get out of sync, files corrupt,
even backups are tricky for live databases, are you going to write
tools to check the validity of a row/column? Rebuild indexes? Check
for illegal values?
9. Things like holding strings can be tricky, bounds checking, insuring
strings don't leak into the next column, insuring that control/escape
characters don't cause confusion, unicode support, string sorting by
partial columns etc.
Each of the above can take a substantial amount of developer time,
substantial number of lines of code, bugs, and time to track them down,
losing data can be very annoying to users. Lost confidence, etc.
Rebuilding indexes because a rule, data, whatever changed unexpectedly
In my experience small projects often waste much more time reimplementing
a database, then it would to just support a real relational database
up front. Spend your time on the unique stuff, the next level of
complexity, whatever it maybe will likely be easier with a database.
SQL syntax can be quite handy... use it... Hell I even write my apache
logs to a sql database and I find it quite handy.
It gives me higher density (more logs per MB), greater flexibility when
I want to find something out quick. Anything from which bots searched
my webserver today, what are the top-10 broken url's, top 10 url's,
top 10 refers, tracking the progress of the various web-borne virus's etc.
It populates 6 relational databases, storing each agent, host, refer, url
etc only if it's unique. Not bad for 45 lines of perl or so.
Mathematics/Institute of Theoretical Dynamics
vox-tech mailing list