Page last updated:
2005 Jul 08 12:46

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] Matching Contents of Lists

Lango, Trevor M. wrote:
First I apologize for the lame "reply" format - I am forced to use Microsoft Outlook Web Access (shudder) at work and, wouldn't you know, it doesn't offer any options for mail format.
Based on your rules above, TALL0047A and TAL0047A do in fact match

No, actually - however many characters are present in each have to match.  If the number of alpha characters in the first set of each field in the two lists differs - no match.


Are you really saying:

From both items
remove trailing alphas
No. If trailing alphas are present they must also match.


take the last 4 digits
remove any leading zeros
Yes.


Do the strings always start with alphas? Or are there sometimes numerics
within the first 1-4 characters?
Yes - always start with alphas.



Is there stuff between the leading and ending portions, such that the
entries may be more than 10 characters long?

There will never be more than 4 leading alphas, 5 numerics, and 2 trailing alphas.
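
The field limits stated above can be written down as a single pattern. This is only a minimal sketch: the helper name is_valid_entry is hypothetical, and the lower bounds (at least one leading alpha and at least one digit) are an assumption, since the thread only gives maximums.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true when an entry fits the
# stated limits of at most 4 leading alphas, 5 digits, and 2 trailing alphas.
# Assumption: at least one leading alpha and one digit are required.
sub is_valid_entry {
    my ($entry) = @_;
    return $entry =~ /^[A-Za-z]{1,4}\d{1,5}[A-Za-z]{0,2}$/ ? 1 : 0;
}

# Sample entries; only the first two come from the thread, the rest are made up
for my $entry (qw(TAL0047A TALL0047A TOOLONG123 TAL123456 47A)) {
    printf "%-12s %s\n", $entry, is_valid_entry($entry) ? "ok" : "rejected";
}
```

Running a check like this over both lists before comparing them would flag entries that do not fit the expected layout, rather than letting them silently fail to match.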
So, if the string always starts with alphas, followed by digits, followed (optionally) by alphas, and the digits must match once leading zeros are removed, then you could:

# here is one method (as always, there are many ways to do it)
# read each file, parse the sections of each string, and store each
# line in a hash keyed on those parsed sections
# then compare the hashes, deleting the keys when you get a match

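use strict;
use warnings;

my (@new, %file1, %file2);

# index file1 by its parsed key; the value keeps the original line
open(my $fh1, '<', "file1") or die "Cannot open file1: $!";
while (my $line = <$fh1>) {
    chomp $line;
    my $key = join("", grep { defined } parse($line));
    next unless length $key;    # skip blank or unparseable lines
    $file1{$key} = $line;
}
close($fh1);

# same for file2
open(my $fh2, '<', "file2") or die "Cannot open file2: $!";
while (my $line = <$fh2>) {
    chomp $line;
    my $key = join("", grep { defined } parse($line));
    next unless length $key;    # skip blank or unparseable lines
    $file2{$key} = $line;
}
close($fh2);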

# walk a copy of %file1 so matched keys can be deleted from the originals
my %tmp = %file1;
while (my ($key, $value) = each %tmp) {
    if (exists $file2{$key}) {
        delete $file1{$key};
        delete $file2{$key};

        push @new, $value;
    }
}

print "matching\n";
print join("\n", sort @new), "\n";

print "in file1 but not file2\n";
print join("\n", sort values %file1), "\n";

print "in file2 but not file1\n";
print join("\n", sort values %file2), "\n";

sub parse {
    my $str = $_[0];

    # Capture the parts: leading alphas, followed by n digits,
    # followed optionally by alphas
    $str =~ /([a-zA-Z]+)(\d+)([a-zA-Z]*)/
        or return;    # return an empty list if the entry does not fit the pattern

    my @str = ($1, $2, $3);     # put the matches back into an array
    $str[1] =~ s/^0+(?=\d)//;   # strip leading 0s from digit portion, keeping a lone 0

    return @str;
}
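
To see how the key comparison plays out on the entries mentioned earlier in the thread, here is a small standalone sketch. The helper name normalize_key is hypothetical, and the fully anchored pattern is an assumption that each entry holds nothing besides the alpha/digit/alpha fields.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the parse-and-join idea from the script:
# split an entry into leading alphas, digits, and trailing alphas,
# then drop leading zeros from the digits.
sub normalize_key {
    my ($entry) = @_;
    my ($lead, $digits, $trail) = $entry =~ /^([a-zA-Z]+)(\d+)([a-zA-Z]*)$/
        or return;                  # entry does not fit the expected layout
    $digits =~ s/^0+(?=\d)//;       # "0047" becomes "47", a lone "0" is kept
    return join "", $lead, $digits, $trail;
}

# Each pair is compared by normalized key
for my $pair (["TAL0047A", "TAL47A"], ["TALL0047A", "TAL0047A"], ["TAL0047A", "TAL0047"]) {
    my ($left, $right) = @$pair;
    my $same = normalize_key($left) eq normalize_key($right);
    printf "%s vs %s: %s\n", $left, $right, $same ? "match" : "no match";
}
```

Under these assumptions only the first pair matches: the second differs in the number of leading alphas, and the third differs in the trailing alpha, which is in line with the rules confirmed in the quoted replies.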
_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


