l i n u x - u s e r s - g r o u p - o f - d a v i s
L U G O D
 
Next Meeting:
April 21: Google Glass
Next Installfest:
TBD
Latest News:
Mar. 18: Google Glass at LUGOD's April meeting
Page last updated:
2010 Jun 23 19:41

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Report this post as spam:

(Enter your email address)
Re: [vox-tech] Memory addressing?
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [vox-tech] Memory addressing?



On 06/23/2010 12:39 PM, Brian Lavender wrote:
> What started this is that having a hash provides ideally a O(1).  So,
> I say that to achieve this, one would ideally want to have a large array
> to store your IP addresses and a hashing function that provides good
> distribution. To which, Tim pointed to using a smaller array and using
> chaining, because he initially thought that arrays were limited to smaller
> sizes than was observed by doing some simple tests. To which I replied,
> using a fixed size array and the chaining could cause the hash to decay
> into a linked list which has a O(n).

Indeed, the classic memory vs access time for hashes.

> And to which Tim has noted using a
> large fixed size array may just take up all your architectural limits
> of the program. I imagine that having such a large array would push
> the lower bound of the stack up in memory, or if allocated at runtime,
> would push the heap way down from the top.

I had what sounds like a similar problem for web logging.  I wanted to 
take a URL which contained agent, IP, URL, referer.  I posted the source 
on lugod quite awhile ago.

The pseudo code was:
for hits:
     agent_id=getid(agent,"agent string");
     ip_id=getid(ip,"ip string");
     url_id=getid(url,"url string");
     refer_id=getid(refer,"refer string");
     append_log(agent_id,ip_id,url_id,refer_id,datastamp,returncode)

Of course getid would lookup the value and if it didn't exist it would 
add that string to the table.

This resulted in several advantages:
* Made it much easier to extract info from the logs, things like how
   many hits did a subset of the website have last year vs this year
   so we can evaluate advertising effectiveness.
* Logs were denser (less disk)
* Running reports like top 10 for things like hits, bandwidth, IPs,
   agents, and 404s was easy.

While load testing on a rather old machine (PII-400 I believe) I could 
record 1000 hits/sec even when using mysql as the backend.  With a 
memory resident database or something lighter like sqllite I'd expect 
much better.  Of course hardware has come a long way since the PII-400 
as well.

Such a setup might work well with for an IDS as well.  With some care it 
should be relatively straight forward to make it parallel friendly.

With all that said, I used the big hash method in a simple little tool I 
wrote to instantly tell you where your bandwidth is going:
   http://broadley.org/bill/tcpdump.pl

The output:
tcpdump -n -c 1000 | ./tcpdump.pl | head -5
1000 packets captured
1000 packets received by filter
0 packets dropped by kernel
lines=1000 lengths=760
     613884 128.120.46.32.8649 -> 128.120.246.1.48876
     153463 128.120.46.33.8649 -> 128.120.246.1.48609
     153462 128.120.46.33.8649 -> 128.120.246.52.54912
       2684 131.215.74.22.61733 -> 128.120.246.102.9996


_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech



LinkedIn
LUGOD Group on LinkedIn
Sign up for LUGOD event announcements
Your email address:
facebook
LUGOD Group on Facebook
'Like' LUGOD on Facebook:

Hosting provided by:
Sunset Systems
Sunset Systems offers preconfigured Linux systems, remote system administration and custom software development.

LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617
Contact Us

LUGOD is a 501(c)7 non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.

Sponsored in part by:
O'Reilly and Associates
For numerous book donations.