Linux Users' Group of Davis
Page last updated:
2010 Jun 23 19:41

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] Memory addressing?

On 06/23/2010 12:39 PM, Brian Lavender wrote:
> What started this is that a hash ideally provides O(1) lookup. So,
> I say that to achieve this, one would ideally want a large array
> to store your IP addresses and a hashing function that provides good
> distribution. To which Tim pointed to using a smaller array with
> chaining, because he initially thought that arrays were limited to smaller
> sizes than was observed by doing some simple tests. To which I replied that
> using a fixed-size array with chaining could cause the hash to decay
> into a linked list, which has O(n) lookup.

Indeed, the classic memory vs. access-time trade-off for hashes.
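
To make the trade-off concrete, here's a minimal sketch (Python, purely
illustrative, not from the original thread) of a fixed-size chained hash
table: with too few buckets relative to the key count, chains grow long
and lookups decay toward a linear scan.

```python
# Minimal chained hash table with a fixed bucket count (illustrative).
# Average-case lookup is O(1) when buckets are plentiful; with too few
# buckets every chain grows and lookup degrades toward O(n).

class ChainedHash:
    def __init__(self, nbuckets):
        self.buckets = [[] for _ in range(nbuckets)]

    def insert(self, key, value):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # overwrite existing key
                return
        chain.append((key, value))

    def lookup(self, key):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for k, v in chain:                # scans the chain: O(len(chain))
            if k == key:
                return v
        return None

# Degenerate case: 1 bucket, so all keys share one chain (a linked list).
h = ChainedHash(1)
for i in range(100):
    h.insert("10.0.0.%d" % i, i)
print(h.lookup("10.0.0.42"))  # 42
```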

> And to which Tim noted that using a
> large fixed-size array may run up against the architectural limits
> of the program. I imagine that having such a large array would push
> the lower bound of the stack up in memory, or, if allocated at runtime,
> would push the heap way down from the top.

I had what sounds like a similar problem with web logging.  I wanted to 
take each log entry, which contained an agent, IP, URL, and referer, 
and replace each string with a small integer id.  I posted the source 
on lugod quite a while ago.

The pseudo code was:
for hits:
     agent_id=getid(agent,"agent string");
     ip_id=getid(ip,"ip string");
     url_id=getid(url,"url string");
     refer_id=getid(refer,"refer string");

Of course, getid would look up the value, and if it didn't exist it 
would add that string to the table.
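
A rough Python rendering of such a getid (the original was posted
against a SQL backend; this in-memory dict version is my own sketch):
each table is just a string-to-id map that assigns a fresh id the first
time a string is seen.

```python
# Sketch of getid(): intern each distinct string as a small integer id,
# assigning a new id the first time the string is seen.

def getid(table, s):
    if s not in table:
        table[s] = len(table) + 1   # ids start at 1, like SQL auto-increment
    return table[s]

agents, ips = {}, {}
print(getid(agents, "Mozilla/5.0"))   # 1
print(getid(ips, "10.0.0.1"))         # 1 (separate table, separate ids)
print(getid(agents, "Mozilla/5.0"))   # 1 again -- already interned
print(getid(agents, "curl/7.68"))     # 2
```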

This resulted in several advantages:
* Much easier to extract info from the logs, e.g. how many hits a
   subset of the website had last year vs. this year, so we could
   evaluate advertising effectiveness.
* Logs were denser (less disk space).
* Running top-10 reports for things like hits, bandwidth, IPs,
   agents, and 404s was easy.
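
For instance, a top-10 report over id-coded logs reduces to counting ids
and resolving them back to strings at the end.  A Python sketch, with
made-up example data and the id tables assumed to exist in reverse:

```python
from collections import Counter

# Assumed id-coded hit log: (ip_id, url_id, bytes) tuples -- made-up data.
hits = [(1, 1, 500), (2, 1, 700), (1, 2, 900), (1, 1, 100)]

# Reverse of the intern table: id -> original string.
ip_names = {1: "10.0.0.1", 2: "10.0.0.2"}

by_ip = Counter()
for ip_id, url_id, nbytes in hits:
    by_ip[ip_id] += nbytes            # tally bandwidth per IP id

for ip_id, total in by_ip.most_common(10):
    print(ip_names[ip_id], total)
# 10.0.0.1 1500
# 10.0.0.2 700
```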

While load testing on a rather old machine (a PII-400, I believe) I could 
record 1000 hits/sec even with MySQL as the backend.  With a 
memory-resident database, or something lighter like SQLite, I'd expect 
much better.  Of course, hardware has come a long way since the PII-400 
as well.

Such a setup might work well for an IDS as well.  With some care it 
should be relatively straightforward to make it parallel-friendly.

With all that said, I used the big-hash method in a simple little tool I 
wrote that tells you instantly where your bandwidth is going:

The output:
tcpdump -n -c 1000 | ./tcpdump.pl | head -5
1000 packets captured
1000 packets received by filter
0 packets dropped by kernel
lines=1000 lengths=760
     613884 ->
     153463 ->
     153462 ->
       2684 ->
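
The big-hash method here amounts to keying a byte counter on the
src -> dst pair parsed from each tcpdump line.  A rough Python rendering
of the idea (the original tool is tcpdump.pl; the line format and regex
below are simplified assumptions, as real tcpdump output varies):

```python
import re
from collections import defaultdict

# Accumulate bytes per src -> dst pair from tcpdump-style lines.
# The line format assumed here is simplified; real output varies by flags.
LINE = re.compile(r'IP (\S+) > (\S+):.* length (\d+)')

def tally(lines):
    bandwidth = defaultdict(int)      # one big hash keyed on the flow
    for line in lines:
        m = LINE.search(line)
        if m:
            src, dst = m.group(1), m.group(2)
            bandwidth[(src, dst)] += int(m.group(3))
    return bandwidth

sample = [
    "12:00:00.000 IP 10.0.0.1.80 > 10.0.0.2.5000: Flags [.], length 1448",
    "12:00:00.001 IP 10.0.0.1.80 > 10.0.0.2.5000: Flags [.], length 1448",
]
for (src, dst), total in sorted(tally(sample).items(),
                                key=lambda kv: -kv[1]):
    print("%10d %s -> %s" % (total, src, dst))
```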
