The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] Understanding a C hello world program



On Wed, 24 Nov 2004, Peter Jay Salzman wrote:
[snip]
>    p@satan$ size hello_world-1.o
>       text    data     bss     dec     hex filename
>         48       0       0      48      30 hello_world-1.o
[snip]

What the... how did you make your program so small?  My output:

  $ gcc hello.c -o hello
  $ size hello
     text    data     bss     dec     hex filename
      960     256       4    1220     4c4 hello
  $ strip hello
  $ size hello
     text    data     bss     dec     hex filename
      960     256       4    1220     4c4 hello
  $ ls -l hello
  -rwxr-xr-x  1 mark    mark    2948 Nov 24 10:24 hello*
  $ gcc --version
  gcc (GCC) 3.3.5 (Debian 1:3.3.5-2)
  [snip]

-_-'

> I assume that data is the initialized data segment where programmer
> initialized variables are stored.  That's zero because I have no global
> variables.

Yes.

> I also assume that bss is zero because I have no programmer-uninitialized
> global variables.

What is bss?  Is that space dynamically generated in memory at runtime?

> When I disassemble the object file, I get 35 bytes:
[snip]
> however, "size" reports a text size of 48.  Where do the extra 13 bytes come
> from that size reports?  Probably from "hello world\n\0", which is 13 bytes.

Yes.

> But if that's true, the string lives in the text segment.  I always pictured
> the text segment as being straight opcodes.  There must be a structure to the
> text segment that I was unaware of.

To the CPU, there's no distinction between opcodes and read-only data.
Pretty much the only differences between text and data sections are:

  Text: r-x (readable, not writable, executable)
  Data: rw- (readable, writable, not executable)

At least in theory.  In practice the data section is often executable as
well, which is what lets a buffer overflow be used to run malicious code.
I'm not sure we can simply block execution of the data section, though,
because I think function pointers may need to execute code outside the
text area..???

Anyway, if you think about a constant string like "hello, world",
there's no reason not to store it in the text section, because the string
isn't supposed to change.  If you stored it in the data section instead,
you could accidentally overwrite it.  That is harmless from a
single-program point of view, but putting it in the text section also
saves memory, because the text section isn't replicated (only the other
sections are) when you run multiple instances of the same program.  That
sharing makes it all the more important that the text section be
non-writable and that the policy be enforced, so that one instance
accidentally modifying the text section doesn't affect another instance
of the same program.

Or that's how I understand it.

> But then, looking at the output of the picture that objdump has of the object
> file...
[snip]
> However, I note that there's a section called .rodata, which I've never heard
> of before, but I'm assuming that it stands for "read only data section".
> It's 13 bytes - just the size of "hello world\n\0".

Looking at the assembly output (thanks for that idea, Bryan!), .rodata is
a custom name of a section:

   .section .rodata

whereas the .text section is reserved:

   .text

They might as well have named the .rodata section .whatever and it would
still work the same way.  What happens to this section is, I think,
linker-dependent, but the linker probably takes the section and sticks it
in one of the canonical sections (.text, .data, etc.) according to some
predefined rule.  (It's probably the .text section by default, unless
specified otherwise.)

These custom sections are useful because they allow the linker to shuffle
them around to fit any hardware constraints that may exist.  The linker
won't break up a section, though.

From the assembly output, the .rodata section goes before the .text
section, so it probably gets mapped to some default section, which happens
to be .text.  It's possible gcc knows to map the custom section named
.rodata to the .text area, but I just tried changing the name from .rodata
to .whatever and it still compiles without complaining.

> That's probably why this
> program segfaults:
>
>    #include<stdio.h>
>
>    int main(void)
>    {
>       char *string="hello world\n";
>
>       string[3] = 'T';
>
>       puts(string);
>
>       return 0;
>    }
>
> because string is the address of something that lives in a section of memory
> marked read-only.  Whammo -- sigsegv.  I remember Mark posting about 3 or 4
> years ago that this actually worked on some other Unices (not Linux).

Wow... you still remember that?  Yeah, I seem to recall trying out
something like that.  I think it was one of Sun's OSes.  Might have been
HP-UX, too, since those two were the primary other unices I had access to.

> Maybe my immediate question is -- where do read-only strings live?  In the
> text section or the .rodata section?  I've seen evidence that it lives in
> both sections.

.rodata is a custom section name which lives in the .text section.  Think
of it as an alias to an offset into the .text section.  At least that's
how it worked in one assembler I used to use, and this seems to work the
same way from what you've discovered.  That's pretty cool!

> If anyone cares to riff off this, I'd certainly be interested in anything
> anyone says.

I think it's cool that you've done all this analysis.  I didn't know about
the size program and I haven't really seen much usage of objdump so it's
cool to see it used here.  Thanks Peter!

-Mark


-- 
Mark K. Kim
AIM: markus kimius
Homepage: http://www.cbreak.org/
Xanga: http://www.xanga.com/vindaci
Friendster: http://www.friendster.com/user.php?uid=13046
PGP key fingerprint: 7324 BACA 53AD E504 A76E  5167 6822 94F0 F298 5DCE
PGP key available on the homepage
_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


