Page last updated: 2005 Apr 24 22:55

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] simple parsing of multiple-line records... [solved]



Thanks again, Mark, for the excellent examples.

After a bit of reading in an O'Reilly book on AWK, it seems that multiple-line records aren't as hard as I was making them:

awk ' BEGIN {FS = "\n"; RS = ""}... rest of script'

did the job!
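A minimal sketch of a complete script built on that idea, assuming each record is a block of "name: value" lines separated by a blank line, as in the data quoted at the bottom of this thread. With RS set to the empty string, AWK reads each blank-line-separated block as one record ("paragraph mode"), and FS = "\n" makes every line of that block its own field. Dylan's actual script body isn't shown, so the shell function name parse_records and the array name val are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical name parse_records): paragraph-mode parsing.
# RS = "" -> each blank-line-separated block is one record.
# FS = "\n" -> each line inside that block is one field ($1, $2, ...).
parse_records() {
  awk 'BEGIN { RS = ""; FS = "\n" }
  {
    # Clear the per-record array (split on "" empties it portably).
    split("", val)
    for (i = 1; i <= NF; i++) {
      # Split each "name: value" line at the first ": ".
      pos = index($i, ": ")
      if (pos == 0) continue   # skip lines with no "name: value" pair
      val[substr($i, 1, pos - 1)] = substr($i, pos + 2)
    }
    # Number records from 0 to match the other examples in this thread.
    printf("rec=%d easting=%s northing=%s elevation=%s dist=%s\n",
           NR - 1, val["easting"], val["northing"], val["elevation"],
           val["distance along surface"])
  }' "$@"
}
```

Running `parse_records data.txt` on the sample data should print the same two lines as Mark's gawk example below. Because fields are looked up by name rather than by position, the order of the lines within a record doesn't matter.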

thanks!

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341



On Apr 22, 2005, at 12:35 AM, Mark K. Kim wrote:

AWK was made for what you're trying to do. I don't know AWK all that
well, but here's a very simple example that will produce a one-line record
from the data you gave. This works with GAWK, and may or may not work
with other AWK implementations:

/easting/ { printf("rec=%d easting=%s ", rec, $2); }
/northing/ { printf("northing=%s ", $2); }
/elevation/ { printf("elevation=%s ", $2); }
/distance along surface/ { printf("dist=%s\n", $4); rec++; }

After creating the above as a file (call it "parse.awk"), you can parse
your data (assuming the data is in a file called "data.txt") like this:

$ gawk -f parse.awk data.txt

which will produce the following:

rec=0 easting=661674.9375 northing=4035004.0000 elevation=968.8617 dist=15.9540
rec=1 easting=661683.7500 northing=4034946.7500 elevation=961.4768 dist=58.4077

where "rec" is the record number and the rest of the line is the parsed data.
This code makes several assumptions, such as the order of the fields, so
YOU SHOULDN'T USE THE ABOVE CODE in production; use it only as a
reference for writing more robust code.

Here's some code that's a little more robust, but you should still customize
it for your data:

# Check to see whether we want to print the record, then print if we do.
function rec_chk() {
# If I have all the data for a record, print the record.
# (Test against "" so that a value of 0 still counts as present.)
if(easting != "" && northing != "" && elevation != "" && dist != "") {
# Print the record
print_rec();

# Reset all data and increment the record counter
easting = northing = elevation = dist = "";
rec++;
}
}

# Print the record.
function print_rec() {
printf("%d %s %s %s %s\n", rec, easting, northing, elevation, dist);
}

# Data updates
/easting/ { easting=$2; rec_chk(); }
/northing/ { northing=$2; rec_chk(); }
/elevation/ { elevation=$2; rec_chk(); }
/distance along surface/ { dist=$4; rec_chk(); }
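To see what "a little more robust" buys, here is a condensed copy of the script above wrapped in a shell function (the name robust_parse is hypothetical) and fed a record whose lines are shuffled. This copy tests each field with != "" rather than relying on truthiness: in gawk, a field whose value looks like 0 is treated as numeric zero in a test such as if (elevation), so it would otherwise look missing.

```shell
#!/bin/sh
# Sketch (hypothetical name robust_parse): condensed copy of the
# "more robust" script, reading from files given as arguments or stdin.
robust_parse() {
  awk '
  # Print a record only once all four values have been seen.
  function rec_chk() {
    if (easting != "" && northing != "" && elevation != "" && dist != "") {
      printf("%d %s %s %s %s\n", rec, easting, northing, elevation, dist)
      easting = northing = elevation = dist = ""   # reset for next record
      rec++
    }
  }
  /easting/                { easting = $2;   rec_chk() }
  /northing/               { northing = $2;  rec_chk() }
  /elevation/              { elevation = $2; rec_chk() }
  /distance along surface/ { dist = $4;      rec_chk() }
  ' "$@"
}

# The same record with its lines shuffled still comes out as one line.
printf '%s\n' 'northing: 4035004.0000' 'distance along surface: 15.9540' \
  'easting: 661674.9375' 'elevation: 968.8617' | robust_parse
```

With the shuffled input it prints the single line `0 661674.9375 4035004.0000 968.8617 15.9540`, because nothing is emitted until the fourth value arrives. It still assumes every record contains all four fields exactly once.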

There are a lot of built-in functions if you want to do math
operations, variable type conversions, or whatever else you can think of.
See `info gawk` under "functions" or "library functions" for more
information.
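As a small illustration of the kind of arithmetic Mark mentions, here is a sketch that, assuming the same blank-line-separated layout as Dylan's data, converts each elevation to a number with `+ 0` and reports the minimum, maximum, and mean with `sprintf`. The function name elev_summary and the output labels are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical name elev_summary): numeric work on parsed fields.
elev_summary() {
  awk 'BEGIN { RS = ""; FS = "\n" }   # one record per blank-line block
  {
    for (i = 1; i <= NF; i++) {
      pos = index($i, ": ")
      if (pos == 0) continue
      if (substr($i, 1, pos - 1) == "elevation") {
        z = substr($i, pos + 2) + 0   # "+ 0" forces a numeric value
        if (n == 0 || z < lo) lo = z  # track minimum
        if (n == 0 || z > hi) hi = z  # track maximum
        sum += z
        n++
      }
    }
  }
  END {
    if (n > 0)
      print sprintf("count=%d min_elev=%.2f max_elev=%.2f mean_elev=%.2f",
                    n, lo, hi, sum / n)
  }' "$@"
}
```

On the two sample records it should print `count=2 min_elev=961.48 max_elev=968.86 mean_elev=965.17`.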

I hope that gets you started.

-Mark

PS: I usually just use Perl rather than AWK for something like this; Perl has
more support, is more flexible, and I know it better than AWK! GAWK sure
does make simple parsing like this easier if you know how, though!

PPS: A friend of mine who went to Columbia tells me he took a class from
"A" (Prof. Alfred Aho) of "AWK"... =P How funny!


On Thu, 21 Apr 2005, Dylan Beaudette wrote:

Hi everyone,

This might be a really simple question, but:


I have a text file with multiple-line records in a format like this:

----------------------------
easting: 661674.9375
northing: 4035004.0000
elevation: 968.8617
distance along surface: 15.9540

easting: 661683.7500
northing: 4034946.7500
elevation: 961.4768
distance along surface: 58.4077
-----------------------------


I am familiar with single-line records, and work with them in AWK quite
frequently. However, I have yet to come upon an elegant solution to
parsing something like the above file, such that I can assign each
field in a multi-line record to a unique variable. A solution in AWK
would be ideal, but I am open to tackling this problem with something
like Python.


Any ideas?

Thanks!

Dylan


_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech

--
Mark K. Kim
AIM: markus kimius
Homepage: http://www.cbreak.org/
Xanga: http://www.xanga.com/vindaci
Friendster: http://www.friendster.com/user.php?uid=13046
PGP key fingerprint: 7324 BACA 53AD E504 A76E  5167 6822 94F0 F298 5DCE
PGP key available on the homepage



LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617

LUGOD is a 501(c)7 non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.
