
I am studying computer architecture from the Intel manual. What I understand is that the addresses used by instructions are logical addresses, which consist of a segment selector and an offset. (In real mode it is basically segment register << 4 + offset.) The segment selector indexes into the GDT or the LDT, as chosen by the TI bit of the selector. The GDT consists of segment descriptors, which hold the BASE, LIMIT, and DPL, and the output is a base address. This base address + offset gives the linear address.
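The two translations described above can be modeled roughly as follows. This is a minimal sketch, not a CPU model: the `SegmentDescriptor` class, its byte-granularity `limit`, and the helper names are hypothetical simplifications (a real 8-byte descriptor scatters base and limit bits across several fields, and privilege, type, and expand-down checks are ignored).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentDescriptor:
    # Simplified: a real 8-byte descriptor scatters these bits across fields.
    base: int   # linear address where the segment starts
    limit: int  # highest valid offset (byte granularity assumed)


def real_mode_linear(segment: int, offset: int) -> int:
    """Real mode: no descriptor lookup, the segment value is just shifted left by 4."""
    # Mask to 20 bits to model 8086-style wraparound at 1 MiB (A20 disabled).
    return ((segment << 4) + offset) & 0xFFFFF


def protected_mode_linear(selector: int, offset: int,
                          gdt: List[SegmentDescriptor],
                          ldt: List[SegmentDescriptor]) -> int:
    """Protected mode: selector -> descriptor -> base, then base + offset."""
    index = selector >> 3        # bits 15..3 select a descriptor
    ti = (selector >> 2) & 1     # bit 2 (TI): 0 = GDT, 1 = LDT
    # Bits 1..0 are the RPL; privilege checks are not modeled here.
    desc = (ldt if ti else gdt)[index]
    if offset > desc.limit:
        # Real hardware would raise #GP (or #SS for stack accesses) instead.
        raise ValueError("offset exceeds segment limit")
    return (desc.base + offset) & 0xFFFFFFFF
```

For example, `real_mode_linear(0x1234, 0x0010)` gives 0x12350, and with a hypothetical GDT whose entry 2 has base 0x10000, selector 0x10 (index 2, TI=0, RPL=0) maps offset 0x20 to linear address 0x10020.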

What are the rules that decide which segment register (SS, DS, etc.) applies to different memory operations? e.g. what determines which segment is used for mov eax, [edi]?

Peter Cordes
ashish
  • Your question is not clear to me; can you explain a bit more? – Rupsingh Aug 09 '16 at 06:26
  • Basically, I want to know how the data segment and stack segment get assigned for a procedure. – ashish Aug 09 '16 at 07:10
  • This may help http://stackoverflow.com/questions/29785991/can-someone-help-me-with-segmentation-and-8086-intels-microprocessor – Rupsingh Aug 09 '16 at 07:32
  • It explains that the segments are 64 KiB and can overlap in a linear address space, but it does not explain the process! – ashish Aug 09 '16 at 07:53
  • I edited your question to clearly ask what I *think* you were trying to ask, but I may have guessed wrong. Please edit if necessary. – Peter Cordes Aug 09 '16 at 16:07

1 Answer


Code fetch always uses CS.

Data addressing modes default to DS (or to SS when EBP or ESP is the base register) in "normal" addressing modes. For example, mov eax, [edi] is equivalent to mov eax, [ds:edi], and mov eax, [ebp+edi*4] is equivalent to mov eax, [ss:ebp+edi*4].
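The rule for explicit 32-bit memory operands can be written as a tiny function. This is an illustrative sketch: `default_segment_32` is a hypothetical helper that takes register names as strings and ignores segment-override prefixes.

```python
from typing import Optional


def default_segment_32(base: Optional[str], index: Optional[str] = None) -> str:
    """Default segment for an explicit 32-bit memory operand (no override prefix).

    Only the base register matters: ESP or EBP as the base selects SS;
    anything else, including no base at all or EBP used as an index, selects DS.
    """
    if index is not None and index.lower() == "esp":
        raise ValueError("ESP cannot be used as an index register")
    if base is not None and base.lower() in ("esp", "ebp"):
        return "ss"
    return "ds"


# mov eax, [edi]           -> ds
# mov eax, [ebp+edi*4]     -> ss
# mov eax, [symbol+ebp*4]  -> ds (EBP is the index here, not the base)
for base, index in [("edi", None), ("ebp", "edi"), (None, "ebp")]:
    print(base, index, default_segment_32(base, index))
```

Note that the index register never influences the choice, which is why the function only uses it for a validity check.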

Some disassemblers make the segment explicit even when it's the default, so you see a lot of DS: cluttering up the disassembly output. You can use a segment override prefix to select which segment applies to the memory operand in an instruction. In NASM syntax, explicitly writing a [ds:edi] addressing mode results in a redundant DS prefix in the machine code.
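Each segment override is a one-byte instruction prefix. The sketch below maps segment names to those prefix bytes and flags when an explicit override just repeats the default (SS for an ESP/EBP base, DS otherwise); `override_prefix` is a hypothetical helper written only for illustration.

```python
from typing import Optional, Tuple

# Segment-override prefix bytes (legacy prefix group 2).
SEGMENT_OVERRIDE_PREFIX = {
    "es": 0x26, "cs": 0x2E, "ss": 0x36,
    "ds": 0x3E, "fs": 0x64, "gs": 0x65,
}


def override_prefix(segment: str, base: Optional[str]) -> Tuple[int, bool]:
    """Return (prefix byte, is_redundant) for an explicit [seg:base...] operand.

    The prefix is redundant when it names the segment the CPU would use anyway:
    SS for an ESP/EBP base, DS otherwise.
    """
    seg = segment.lower()
    default = "ss" if base is not None and base.lower() in ("esp", "ebp") else "ds"
    return SEGMENT_OVERRIDE_PREFIX[seg], seg == default
```

For instance, mov eax, [edi] should assemble to 8B 07, while mov eax, [ds:edi] should become 3E 8B 07 in NASM: the leading 3E is exactly the redundant DS prefix that `override_prefix("ds", "edi")` reports as `(0x3E, True)`.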

Some instructions with implicit memory operands have different defaults:

Some string instructions use ES:EDI implicitly. For example, movs reads from [DS:ESI] and writes to [ES:EDI], making it easy to copy between segments without segment override prefixes.

Memory operands using esp or ebp as the base register default to SS, and so do the implicit accesses for stack instructions like push/pop/call/ret.
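The implicit operands mentioned above can be summarized as a lookup table. This is a sketch for illustration (the table covers only a few common instructions, and `implicit_accesses` is a hypothetical helper). It also encodes one detail worth knowing: a segment override prefix can change the DS:ESI source of a string instruction, but not its ES:EDI destination, and not the SS:ESP stack access.

```python
from typing import List, Optional

# Implicit memory operands: (segment, pointer register, overridable by a prefix?)
# Only the implicit accesses are listed; e.g. the explicit memory operand of
# "push [mem]" follows the normal rules for explicit operands.
IMPLICIT_OPERANDS = {
    "movs": [("ds", "esi", True), ("es", "edi", False)],
    "cmps": [("ds", "esi", True), ("es", "edi", False)],
    "lods": [("ds", "esi", True)],
    "stos": [("es", "edi", False)],
    "scas": [("es", "edi", False)],
    "push": [("ss", "esp", False)],
    "pop":  [("ss", "esp", False)],
    "call": [("ss", "esp", False)],
    "ret":  [("ss", "esp", False)],
}


def implicit_accesses(mnemonic: str, override: Optional[str] = None) -> List[str]:
    """List the seg:reg pairs an instruction's implicit operands use."""
    result = []
    for seg, reg, overridable in IMPLICIT_OPERANDS[mnemonic.lower()]:
        if override is not None and overridable:
            seg = override.lower()  # the prefix only replaces overridable segments
        result.append(f"{seg}:{reg}")
    return result
```

So an FS prefix on movs turns the source into FS:ESI while the destination stays ES:EDI, and a prefix on stos or push has no effect on the implicit operand.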

FS and GS are never the default, so they can be used for special purposes (like thread-local storage) in a flat memory model system like modern 32 and 64-bit OSes.

Wikipedia explains the same thing.


This is also documented officially in Intel's ISA manuals. For example, in Volume 2 (the instruction-set reference), Table 2-1, "16-Bit Addressing Forms with the ModR/M Byte", has a footnote saying:

The default segment register is SS for the effective addresses containing a BP index, DS for other effective addresses.

(Note that SP isn't a valid base register in 16-bit addressing modes.
Also note that "index" here means any use of BP at all, even in [bp+si] or [bp+di]. In 32- and 64-bit addressing modes there is a clearer distinction between base and index, and [symbol + ebp*4] still implies DS as the segment, because EBP is used as an index, not the base.)
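The 16-bit footnote translates into an even simpler rule, because only a handful of register combinations exist. The sketch below (with hypothetical names `VALID_16BIT_COMBOS` and `default_segment_16`) rejects combinations that 16-bit ModR/M cannot encode, such as anything involving SP, and returns SS whenever BP appears at all.

```python
# Register sets that 16-bit ModR/M addressing can express (displacements not modeled).
VALID_16BIT_COMBOS = {
    frozenset({"bx", "si"}), frozenset({"bx", "di"}),
    frozenset({"bp", "si"}), frozenset({"bp", "di"}),
    frozenset({"si"}), frozenset({"di"}),
    frozenset({"bp"}),   # [bp] is encoded with a displacement, e.g. [bp+0]
    frozenset({"bx"}),
    frozenset(),         # displacement only, e.g. [1234h]
}


def default_segment_16(*regs: str) -> str:
    """Default segment for a 16-bit addressing mode: SS if BP appears at all, else DS."""
    combo = frozenset(r.lower() for r in regs)
    if combo not in VALID_16BIT_COMBOS:
        raise ValueError(f"not a valid 16-bit addressing mode: {sorted(combo)}")
    return "ss" if "bp" in combo else "ds"
```

Unlike the 32-bit case, there is no base/index distinction to consult: [bp+si] and [bp+di] both default to SS.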

There's no equivalent footnote for 32 or 64-bit addressing modes, so the details must be in another volume of the manual.

See also the tag wiki for more links.

Peter Cordes
  • where do these segments reside in memory? Linear address? Physical address? – ashish Aug 09 '16 at 18:25
    @ashish: In current 32-bit OSes, all the segments (except FS and GS) have base=0, limit=4GiB, giving a flat memory space where `CS`, `DS`, `SS`, and `ES` are all equivalent. Segment translation happens before virtual-to-physical address translation. In Real mode, segment registers don't index descriptors, they're just multiplied by 16 and added to the offset part of the address. – Peter Cordes Aug 09 '16 at 18:36
  • And in long mode i.e., x86_64 mode? – ashish Aug 09 '16 at 18:46
    @ashish: IIRC, segments other than FS and GS can't even have non-zero bases in long mode. AMD simplified / neutered them a lot for AMD64, since 32-bit OSes didn't use the functionality. Only Multics fans were disappointed: https://stackoverflow.com/a/10810340/224132 – Peter Cordes Aug 09 '16 at 18:56
  • that link answered my question. Thanks! – ashish Aug 09 '16 at 19:11
  • @PeterCordes where can I read about the values that CS, DS, SS, and ES are set to? What specifies that they're all set to base=0, limit=4gib? – Omar Darwish Sep 20 '20 at 23:52
  • @OmarDarwish: https://en.wikipedia.org/wiki/X86_memory_segmentation#Practices / https://en.wikipedia.org/wiki/Flat_memory_model – Peter Cordes Sep 20 '20 at 23:57
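The flat-model and long-mode points from the comments can be illustrated with a short sketch. All base values here are hypothetical (the FS base stands in for a made-up thread-local-storage block), limits and canonical-address checks are omitted, and `linear_address` is an illustrative helper. It reflects the comments' description that, with CS, DS, ES, and SS all at base 0, their addresses coincide, and that 64-bit mode treats those four bases as zero regardless of the descriptor contents.

```python
from typing import Dict

# Hypothetical segment bases for a flat-model OS: CS, DS, ES and SS start at 0,
# FS points at a made-up thread-local-storage block.
FLAT_BASES: Dict[str, int] = {
    "cs": 0, "ds": 0, "es": 0, "ss": 0,
    "fs": 0x7FFD_E000, "gs": 0,
}


def linear_address(seg: str, offset: int, bases: Dict[str, int],
                   long_mode: bool = False) -> int:
    """Segment base + offset; limits, privilege and canonical checks are ignored."""
    if long_mode and seg in ("cs", "ds", "es", "ss"):
        base = 0  # 64-bit mode treats these bases as zero whatever the descriptor says
    else:
        base = bases[seg]
    mask = (1 << 64) - 1 if long_mode else 0xFFFF_FFFF
    return (base + offset) & mask
```

With these bases, [ds:0x1000] and [ss:0x1000] land on the same linear address 0x1000, while [fs:0x18] lands at 0x7FFDE018, which is how FS (or GS) can give each thread its own block in an otherwise flat address space.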