Friday, January 30, 2009

IA32 System Programming - Part VII

Task Switch

A processor-supported task context (or state) is stored in a TSS (task state segment) structure, which includes the following fields:

Dynamic Fields
  • Segment selector registers (CS, SS, DS, ES, FS, and GS).
  • General purpose registers (EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI).
  • The processor status register (EFLAGS).
  • The program counter register (EIP).
  • Link to the previous task.
Static Fields
  • Task LDT (local descriptor table) segment selector.
  • Task page directory base register (CR3/PDBR).
  • Stack pointers for privilege levels 0~2.
Static TSS fields are usually initialized by system software at task creation time.
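The field list above maps onto the 104-byte 32-bit TSS image. The following is a minimal sketch of that layout in C. The struct name, field names, and the tss32_init_static helper are my own illustrative choices, not names from the Intel manual. The T (debug trap) flag and the I/O map base are also static fields of the TSS, although they are not in the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS image. Each selector sits in the low half of a 32-bit slot;
 * the reservedN halves are not used by the processor. */
struct tss32 {
    /* dynamic: updated by the processor on a task switch */
    uint16_t prev_task_link, reserved0;
    /* static: set up by system software at task creation */
    uint32_t esp0; uint16_t ss0, reserved1;
    uint32_t esp1; uint16_t ss1, reserved2;
    uint32_t esp2; uint16_t ss2, reserved3;
    uint32_t cr3;
    /* dynamic */
    uint32_t eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, reserved4, cs, reserved5, ss, reserved6;
    uint16_t ds, reserved7, fs, reserved8, gs, reserved9;
    /* static */
    uint16_t ldt_selector, reserved10;
    uint16_t debug_trap;   /* bit 0 is the T flag */
    uint16_t io_map_base;
};

static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");
static_assert(offsetof(struct tss32, cr3) == 28, "CR3 at offset 28");
static_assert(offsetof(struct tss32, ldt_selector) == 96, "LDT selector at offset 96");

/* Hypothetical helper: fill in the static fields for a task that only
 * ever enters privilege level 0 from a less privileged level. */
static void tss32_init_static(struct tss32 *tss, uint32_t cr3,
                              uint16_t ldt_selector,
                              uint16_t ss0, uint32_t esp0)
{
    memset(tss, 0, sizeof *tss);
    tss->cr3 = cr3;
    tss->ldt_selector = ldt_selector;
    tss->ss0 = ss0;
    tss->esp0 = esp0;
    /* A base at or beyond the TSS limit means "no I/O permission bitmap". */
    tss->io_map_base = (uint16_t)sizeof *tss;
}
```

The dynamic fields are left zeroed here because the processor fills them in when it switches away from the task; a real kernel would still seed EIP, ESP, and the segment selectors before the first dispatch.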

The processor transfers execution to another task in four cases:
  • A far call or jump directly to a TSS descriptor in the GDT.
  • A far call or jump indirectly to a task-gate descriptor in the GDT or the current LDT.
  • An asserted interrupt or exception vector points to a task-gate descriptor in the IDT.
  • An "iret" when the EFLAGS::NT flag is set.
Analogous to the call gate descriptor, which provides indirect access to privileged procedures, the task gate descriptor provides protected indirect references to tasks. In a direct call or jump to a TSS, the CPL and the RPL are checked against the DPL of the target TSS descriptor. In an indirect task switch, they are checked against the DPL of the task gate descriptor instead. During the switch, the processor saves the current state into the outgoing task's TSS and restores the state of the incoming task from its TSS.
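As a rough model of the check described above, the sketch below tests whether a far call or jump may reference a TSS descriptor or task gate, and which kinds of switch leave a back link so that a later "iret" can return. The function and enum names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum switch_kind { SWITCH_JMP, SWITCH_CALL, SWITCH_INTERRUPT, SWITCH_IRET };

/* descriptor_dpl is the DPL of the TSS descriptor for a direct switch,
 * or the DPL of the task gate for an indirect one. Numerically larger
 * values are less privileged. */
static bool far_task_switch_allowed(unsigned cpl, unsigned rpl,
                                    unsigned descriptor_dpl)
{
    assert(cpl <= 3 && rpl <= 3 && descriptor_dpl <= 3);
    return cpl <= descriptor_dpl && rpl <= descriptor_dpl;
}

/* A call or an interrupt/exception nests the new task: the processor
 * stores the old TSS selector in the new task's previous-task link and
 * sets EFLAGS::NT. A jmp does neither, and iret unwinds the nesting. */
static bool switch_nests_task(enum switch_kind kind)
{
    return kind == SWITCH_CALL || kind == SWITCH_INTERRUPT;
}
```

For example, user code at CPL 3 can reach a task through a gate with DPL 3, but not through one with DPL 0.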


IA32 System Programming - Part VI

Fast System Calls

The IA32 architecture provides fast system calls as a low-overhead mechanism for entering and leaving system software.

  • The “sysenter” instruction is for use by user code running at privilege level 3 to access system procedures running at privilege level 0.
  • The “sysexit” instruction is for use by system procedures running at privilege level 0 for fast returns to user code running at privilege level 3.

The target procedure entry point and stack pointer are predefined in MSRs (model specific registers) at fixed addresses. MSRs are accessed with the privileged “rdmsr” and “wrmsr” instructions. Most of the complex privilege checks are skipped, and memory accesses to descriptor tables are eliminated.
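The reason no descriptor table access is needed is that the processor derives every selector from a single MSR value and loads flat segments for them. Here is a small sketch of that arithmetic for the 32-bit case. The MSR addresses are the architectural ones; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR addresses used by sysenter/sysexit. */
enum {
    MSR_IA32_SYSENTER_CS  = 0x174,
    MSR_IA32_SYSENTER_ESP = 0x175,
    MSR_IA32_SYSENTER_EIP = 0x176
};

struct selector_pair { uint16_t cs, ss; };

/* Selectors loaded by sysenter: CS comes from the MSR with RPL forced
 * to 0, and SS is the next GDT entry. */
static struct selector_pair sysenter_selectors(uint16_t sysenter_cs)
{
    /* The processor raises #GP if the index part of the MSR is zero. */
    assert((sysenter_cs & 0xFFFCu) != 0);
    uint16_t cs = (uint16_t)(sysenter_cs & 0xFFFCu);
    struct selector_pair p = { cs, (uint16_t)(cs + 8) };
    return p;
}

/* Selectors loaded by a 32-bit sysexit: two and three GDT entries past
 * the MSR value, with RPL forced to 3. */
static struct selector_pair sysexit_selectors(uint16_t sysenter_cs)
{
    assert((sysenter_cs & 0xFFFCu) != 0);
    struct selector_pair p = {
        (uint16_t)((sysenter_cs + 16) | 3),
        (uint16_t)((sysenter_cs + 24) | 3)
    };
    return p;
}
```

This is why system software has to lay out four consecutive GDT entries (kernel code, kernel stack, user code, user stack) starting at the selector written to IA32_SYSENTER_CS.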



IA32 System Programming - Part V

Inter-Privilege-Level Call

Program control transfer to a privileged code segment through a call gate descriptor, which in turn contains the location of the target code segment, is called an inter-privilege-level call. The call gate descriptor is specified in the far form of the call/jmp instruction. The processor performs several privilege level checks before loading new values into the CS and EIP registers. The general rules involve the following fields:

  • CPL of current code segment.
  • RPL of the requestor (segment selector of the far form call/jmp instruction)
  • DPL of the call gate descriptor.
  • DPL of target code segment descriptor.

The CPL, the RPL, and the DPL of the target code segment are checked for the privilege level switch. In addition, the DPL of the call gate descriptor acts as a guardian that controls who may reach the target code segment, based on the requestor’s privilege level. For instance, system software components that are designed to be used by both the system software itself and application programs (e.g., device I/O interfaces) could be exposed through call gates with DPL 3, which admit callers at every privilege level (0~3). Services that are designed to be used by system software internally (e.g., device initialization procedures) should only be reachable through more privileged call gates (DPL 0 or 1).
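The four fields combine into a fairly small rule set. The sketch below is one way to write it down; the function name and parameter order are mine, and the conforming/non-conforming distinction for jmp follows the Intel manual rather than the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege check for a far call or jmp through a call gate.
 * Numerically larger values are less privileged. */
static bool call_gate_access_ok(bool is_call, bool target_conforming,
                                unsigned cpl, unsigned rpl,
                                unsigned gate_dpl, unsigned target_dpl)
{
    assert(cpl <= 3 && rpl <= 3 && gate_dpl <= 3 && target_dpl <= 3);

    /* The gate itself guards access: both the caller and the selector
     * it presents must be at least as privileged as the gate. */
    if (cpl > gate_dpl || rpl > gate_dpl)
        return false;

    /* A call may move to equal or higher privilege; so may a jmp to a
     * conforming segment (which leaves CPL unchanged). */
    if (is_call || target_conforming)
        return target_dpl <= cpl;

    /* A jmp to a non-conforming segment cannot change privilege. */
    return target_dpl == cpl;
}
```

With these rules, an application at CPL 3 can call a ring 0 service through a DPL 3 gate, while the same call through a DPL 0 gate is rejected.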

A stack switch occurs automatically if the CPL differs from the target code segment DPL, and the CPL changes to the destination DPL accordingly. Stack pointers should be defined in each task's TSS structure for every privilege level the task uses. On a far return, the processor switches back to the caller's stack automatically.
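A small model of the stack selection may help here. It assumes the privilege checks have already passed, and it treats conforming targets as "no switch", which is how the processor behaves even though the paragraph above does not spell it out. All type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct stack_ptr { uint16_t ss; uint32_t esp; };

/* The three privileged stack pointers kept in a task's TSS. */
struct tss_stacks { struct stack_ptr level[3]; };

/* Returns the stack in use after a far call through a call gate and
 * stores the resulting CPL in *new_cpl. */
static struct stack_ptr stack_after_gate_call(const struct tss_stacks *tss,
                                              struct stack_ptr current,
                                              bool target_conforming,
                                              unsigned cpl,
                                              unsigned target_dpl,
                                              unsigned *new_cpl)
{
    assert(cpl <= 3 && target_dpl <= cpl);

    if (!target_conforming && target_dpl < cpl) {
        /* Privilege rises: take SS:ESP for the new level from the TSS. */
        *new_cpl = target_dpl;
        return tss->level[target_dpl];
    }

    /* Same privilege, or a conforming target: CPL and stack stay put. */
    *new_cpl = cpl;
    return current;
}
```

The old SS:ESP is pushed on the new stack by the processor, which is what lets the far return restore the caller's stack later.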



IA32 System Programming - Part IV

Inter-Segment Call

To transfer program control directly to another code segment without a privilege level change, the target procedure entry point is specified in the far form of a call/jmp instruction. The processor performs several privilege level checks before loading new values into the CS and EIP registers. The privilege level fields involved are:

  • CPL (the privilege level of current code segment which contains the source call or jmp instruction)
  • DPL (the privilege level of the target code segment descriptor which contains the target procedure)
  • RPL (the requestor’s segment selector privilege level in the call or jmp instruction)

System software that needs to be protected from user-privilege code is placed in non-conforming code segments. A direct call or jump to a non-conforming segment succeeds only when the CPL equals the target DPL, and execution can never be transferred directly to a less privileged code segment; otherwise the processor raises a general-protection exception (#GP).

Some types of exception handlers (e.g., divide-error or overflow) and system software components that don’t have to access protected facilities (e.g., math libraries) could be loaded in conforming code segments. Less privileged callers can execute them directly, but the CPL stays unchanged, which prevents them from accessing more privileged data. In this way, the overhead of a privilege level change is avoided.

There is no CPL change in either conforming or non-conforming form of direct call or jump to a target code segment. Since the CPL does not change, no stack switch occurs.
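The direct-transfer rules can be condensed into a few lines. This is a sketch under the usual convention that numerically larger privilege values are less privileged; the function name is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege check for a far call or jmp that names a code segment
 * descriptor directly (no gate). The CPL never changes on success. */
static bool direct_far_transfer_ok(bool target_conforming,
                                   unsigned cpl, unsigned rpl,
                                   unsigned target_dpl)
{
    assert(cpl <= 3 && rpl <= 3 && target_dpl <= 3);

    if (target_conforming) {
        /* Conforming: the target may be equally or more privileged
         * than the caller; the RPL is not checked. */
        return target_dpl <= cpl;
    }

    /* Non-conforming: privilege must match exactly, and the selector's
     * RPL must not be less privileged than the caller. */
    return target_dpl == cpl && rpl <= cpl;
}
```

So user code at CPL 3 can call into a DPL 0 conforming math library directly, but a direct call into a DPL 0 non-conforming kernel segment fails with #GP.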



IA32 System Programming - Part III

Program Control Transfer Overview

When the CR0::PE bit is set, the processor switches to protected mode, where segmentation is always active: there is no mode bit that disables segmentation while the processor stays in protected mode. Paging, by contrast, is optional and is enabled only when the CR0::PG bit is also set.
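As a quick illustration of the two control bits, the sketch below classifies a CR0 value. PE is bit 0 and PG is bit 31; the enum and function names are only for this example. Setting PG without PE is not a legal combination, so it is reported separately.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE (UINT32_C(1) << 0)   /* protection enable */
#define CR0_PG (UINT32_C(1) << 31)  /* paging enable */

static_assert(CR0_PG == UINT32_C(0x80000000), "PG is bit 31 of CR0");

enum cpu_mode {
    MODE_REAL,             /* PE = 0, PG = 0 */
    MODE_PROTECTED,        /* PE = 1, PG = 0: segmentation only */
    MODE_PROTECTED_PAGED,  /* PE = 1, PG = 1: segmentation plus paging */
    MODE_INVALID           /* PE = 0, PG = 1: rejected by the processor */
};

static enum cpu_mode mode_from_cr0(uint32_t cr0)
{
    int pe = (cr0 & CR0_PE) != 0;
    int pg = (cr0 & CR0_PG) != 0;

    if (!pe)
        return pg ? MODE_INVALID : MODE_REAL;
    return pg ? MODE_PROTECTED_PAGED : MODE_PROTECTED;
}
```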

In protected mode, the processor always executes within a task context, so at least one task must be defined in the system. Apart from the scheduling policy, which is implemented by system software, task dispatching, execution, and suspension are supported by the processor's task management facility. A task context is held in a structure called the TSS (task state segment), which contains execution space information (a code segment, one or more data segments, and a stack segment) and task state information (processor status, general purpose registers, program counter, page directory base, LDT selector, I/O map base, and a link to the previous task). The processor supports control transfer between tasks either directly through a TSS descriptor or indirectly through a task gate. The task switch mechanism is described in Part VII.

Program control transfer without an explicit task switch covers inter-segment calls, inter-privilege-level calls, fast system calls, and interrupts/exceptions. The first two are described in Part IV and Part V, and fast system calls in Part VI. Interrupts and exceptions are left as a future topic. The corresponding formal documentation can be found in the IA32 Intel Architecture Software Developer’s Manual – Volume 3: System Programming Guide.


Before digging into these topics further, readers can refer to the IA32 Intel Architecture Software Developer’s Manual – Volume 2: Instruction Set Reference, section 3.2, for the format and usage of the “call”, “jmp”, and “ret” instructions. A near call or jump transfers control to a procedure within the current code segment. That is not what we are interested in here, so the focus will be on the far form of these instructions.


IA32 System Programming - Part II

Segmentation and Paging Overview

IA32 protected mode memory management is divided into two parts: segmentation and paging. Segmentation provides hardware-supported partitioning of the linear address space for isolating and placing code, data, and stack sections. Paging provides a mechanism for on-demand virtual-to-physical memory mapping, which can be used to isolate and protect memory between multiple tasks. At least a minimal form of segmentation is required in IA32 protected mode, so there is no way to disable segmentation. Paging, however, is an optional function for system software.

Segmentation uses a 16-bit segment selector and a 32-bit offset to locate a particular byte in the processor’s linear address space. The “selector:offset” pair is called a logical address (also called a far pointer). The selector identifies a segment descriptor in a descriptor table. Its TI field specifies whether to look in the global descriptor table (GDT) or in the current local descriptor table (LDT), whose base addresses are held in the GDTR and LDTR registers respectively. There are other types of descriptor tables, but they are not involved in logical-to-linear address translation, so they are out of scope for now. The RPL field of the selector specifies the requestor’s privilege level and takes part in the privilege level checks described in the next few parts of this IA32 system programming series.

Each descriptor table entry is 8 bytes in size. The base and limit fields of a descriptor specify the base and range of the segment in the processor’s linear address space. The flags field specifies the descriptor type. When the S flag is set, the descriptor is either a code or a data segment descriptor. When the S flag is clear, the descriptor is a system descriptor, which is either a system segment descriptor (LDT segment descriptor, TSS descriptor) or a gate descriptor (call-gate, task-gate, interrupt-gate, or trap-gate descriptor). Gate descriptors act as “gates” that point indirectly to a procedure entry point in a code segment or to a TSS.
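To make the bit fields concrete, here is a minimal decoding sketch for a selector and for a code/data segment descriptor treated as a little-endian 64-bit value. The struct and function names are assumptions made for this example; the bit positions follow the architectural descriptor format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;   /* entry number in the GDT or LDT */
    bool     local;   /* TI: false = GDT, true = LDT */
    unsigned rpl;     /* requestor's privilege level */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f = {
        (unsigned)(sel >> 3), (sel & 0x4) != 0, (unsigned)(sel & 0x3)
    };
    return f;
}

struct descriptor_fields {
    uint32_t base;
    uint32_t limit;          /* 20-bit raw value, before G scaling */
    unsigned type;           /* 4-bit type field */
    bool     code_or_data;   /* S flag: false = system descriptor */
    unsigned dpl;
    bool     present;
    bool     granularity_4k; /* G flag: limit counted in 4K units */
};

static struct descriptor_fields decode_descriptor(uint64_t raw)
{
    struct descriptor_fields d;
    d.base  = (uint32_t)((raw >> 16) & 0xFFFFFF)         /* base 23:0  */
            | (uint32_t)(((raw >> 56) & 0xFF) << 24);    /* base 31:24 */
    d.limit = (uint32_t)(raw & 0xFFFF)                   /* limit 15:0  */
            | (uint32_t)(((raw >> 48) & 0xF) << 16);     /* limit 19:16 */
    d.type           = (unsigned)((raw >> 40) & 0xF);
    d.code_or_data   = ((raw >> 44) & 1) != 0;
    d.dpl            = (unsigned)((raw >> 45) & 0x3);
    d.present        = ((raw >> 47) & 1) != 0;
    d.granularity_4k = ((raw >> 55) & 1) != 0;
    assert(d.limit <= 0xFFFFF);  /* the limit field is 20 bits wide */
    return d;
}
```

For instance, the selector 0x1B refers to GDT entry 3 with RPL 3, and the descriptor value 0x00CF9A000000FFFF is the familiar flat ring 0 code segment (base 0, limit 0xFFFFF in 4K units).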

If paging is not enabled, the linear address space maps directly onto the physical address space of the processor. If it is enabled, the mapping goes through levels of page tables, and the linear address space is divided into fixed-size pages. Control bits in the system registers, configured by system software, select the page size, which can be 4K, 2M, or 4M bytes. If a linear address falls in a page that is not currently backed by a physical page, a page-fault exception is raised. The page-fault handler of the system software typically allocates a physical frame and/or copies data in from disk for that linear address. The first level of page table is called the page directory, whose base address is specified by system software in the CR3 system register. To minimize the bus accesses required for address translation, the most recently used translations are cached in translation lookaside buffers (TLBs). When the CR3 register is reloaded, the processor invalidates the previously cached translations (apart from entries marked global), so system software does not have to worry about stale TLB entries after switching page directories.

The whole story above is summarized in the following figure.
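For the common case of 4K pages with two-level tables (no PAE, which is an assumption of this sketch), the linear address splits into a page directory index, a page table index, and a byte offset. The names below are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE    UINT32_C(4096)
#define PTE_PRESENT  UINT32_C(0x1)         /* bit 0 of a directory/table entry */
#define PTE_FRAME    UINT32_C(0xFFFFF000)  /* bits 31:12 hold the frame address */

struct linear_parts {
    unsigned dir_index;    /* bits 31:22, selects a page directory entry */
    unsigned table_index;  /* bits 21:12, selects a page table entry */
    unsigned offset;       /* bits 11:0, byte within the 4K page */
};

static struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p = {
        (unsigned)(linear >> 22),
        (unsigned)((linear >> 12) & 0x3FF),
        (unsigned)(linear & 0xFFF)
    };
    return p;
}

/* A clear present bit is what triggers the page-fault exception. */
static bool pte_present(uint32_t pte)
{
    return (pte & PTE_PRESENT) != 0;
}

/* Combine the frame address from a present page table entry with the
 * page offset to form the physical address. */
static uint32_t physical_address(uint32_t pte, unsigned offset)
{
    assert(offset < PAGE_SIZE);
    return (pte & PTE_FRAME) | offset;
}
```

As an example, linear address 0xC0101234 uses directory entry 0x300 and table entry 0x101, with offset 0x234 inside the page.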


Thursday, January 29, 2009

IA32 System Programming - Part I


System Architecture Overview

Besides the well-known general purpose registers and segment registers, IA32 incorporates a set of system registers, data structures, and instructions in its system level architecture. Only privileged code is allowed to access these system level resources. This architecture includes:
  • The EFLAGS register controls I/O privilege, maskable interrupts, debugging, task switching, and virtual-8086 mode.
  • Control registers (CR0~CR4) control the operating mode of the processor and memory paging.
  • Debug registers (DR0~DR7) and related instructions provide facilities for debugging code.
  • Descriptor registers (GDTR, LDTR, IDTR, TR), descriptor tables, and related load/store instructions control segmented memory management.