Assembler Instructions for nasm and fasm

\H{insAAA} \i\c{AAA}, \i\c{AAS}, \i\c{AAM}, \i\c{AAD}: ASCII
Adjustments

\c AAA                           ; 37                   [8086]

\c AAS                           ; 3F                   [8086]

\c AAD                           ; D5 0A                [8086]
\c AAD imm                       ; D5 ib                [8086]

\c AAM                           ; D4 0A                [8086]
\c AAM imm                       ; D4 ib                [8086]

These instructions are used in conjunction with the add, subtract,
multiply and divide instructions to perform binary-coded decimal
arithmetic in \e{unpacked} form (one BCD digit per byte - easy to
translate to and from ASCII, hence the instruction names). There are
also packed BCD instructions \c{DAA} and \c{DAS}: see \k{insDAA}.

\c{AAA} should be used after a one-byte \c{ADD} instruction whose
destination was the \c{AL} register: by examining the value in the low
nibble of \c{AL} and the auxiliary carry flag \c{AF}, it determines
whether the addition has overflowed, and adjusts it (and sets the
carry flag) if so. You can add long BCD strings together by doing
\c{ADD}/\c{AAA} on the low digits, then doing \c{ADC}/\c{AAA} on each
subsequent digit.

\c{AAS} works similarly to \c{AAA}, but is for use after \c{SUB}
instructions rather than \c{ADD}.

\c{AAM} is for use after you have multiplied two decimal digits
together and left the result in \c{AL}: it divides \c{AL} by ten and
stores the quotient in \c{AH}, leaving the remainder in \c{AL}. The
divisor 10 can be changed by specifying an operand to the instruction:
a particularly handy use of this is \c{AAM 16}, which separates the
two nibbles in \c{AL} into \c{AH} and \c{AL}.

\c{AAD} performs the inverse operation to \c{AAM}: it multiplies
\c{AH} by ten, adds it to \c{AL}, and sets \c{AH} to zero. Again, the
multiplier 10 can be changed.
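
The split and merge performed by \c{AAM} and \c{AAD} can be modelled in
a few lines of Python. This is only an illustrative sketch: the function
names \c{aam} and \c{aad} are hypothetical helpers, registers are plain
integers, and flag updates are ignored.

```python
def aam(al, base=10):
    """Model AAM: quotient goes to AH, remainder stays in AL."""
    return al // base, al % base          # (AH, AL)

def aad(ah, al, base=10):
    """Model AAD: fold AH back into AL and clear AH."""
    return 0, (ah * base + al) & 0xFF     # (AH, AL)

# 7 * 9 = 63: AAM yields the unpacked BCD digits 6 and 3.
print(aam(7 * 9))
# AAM 16 separates the two nibbles of AL.
print(aam(0xA5, 16))
# AAD reverses the split.
print(aad(6, 3))
```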

\H{insADC} \i\c{ADC}: Add with Carry

\c ADC r/m8,reg8                 ; 10 /r                [8086]
\c ADC r/m16,reg16               ; o16 11 /r            [8086]
\c ADC r/m32,reg32               ; o32 11 /r            [386]

\c ADC reg8,r/m8                 ; 12 /r                [8086]
\c ADC reg16,r/m16               ; o16 13 /r            [8086]
\c ADC reg32,r/m32               ; o32 13 /r            [386]

\c ADC r/m8,imm8                 ; 80 /2 ib             [8086]
\c ADC r/m16,imm16               ; o16 81 /2 iw         [8086]
\c ADC r/m32,imm32               ; o32 81 /2 id         [386]

\c ADC r/m16,imm8                ; o16 83 /2 ib         [8086]
\c ADC r/m32,imm8                ; o32 83 /2 ib         [386]

\c ADC AL,imm8                   ; 14 ib                [8086]
\c ADC AX,imm16                  ; o16 15 iw            [8086]
\c ADC EAX,imm32                 ; o32 15 id            [386]

\c{ADC} performs integer addition: it adds its two operands together,
plus the value of the carry flag, and leaves the result in its
destination (first) operand. The flags are set according to the result
of the operation: in particular, the carry flag is affected and can be
used by a subsequent \c{ADC} instruction.

In the forms with an 8-bit immediate second operand and a longer first
operand, the second operand is considered to be signed, and is
sign-extended to the length of the first operand. In these cases, the
\c{BYTE} qualifier is necessary to force NASM to generate this form of
the instruction.

To add two numbers without also adding the contents of the carry flag,
use \c{ADD} (\k{insADD}).
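
As a rough illustration of why the carry flag matters, the following
Python sketch adds two 64-bit values held as 32-bit halves, the way an
\c{ADD} on the low halves followed by an \c{ADC} on the high halves
would. The helper names \c{add_with_carry} and \c{add64} are assumptions
made for this example only.

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a, b, carry_in):
    """Model a 32-bit ADC: return (result, carry_out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values stored as (low, high) 32-bit halves."""
    lo, carry = add_with_carry(a_lo, b_lo, 0)      # plain ADD: no carry in
    hi, carry = add_with_carry(a_hi, b_hi, carry)  # ADC picks up the carry
    return lo, hi, carry

# 0x00000000FFFFFFFF + 1 carries into the high half.
print(add64(0xFFFFFFFF, 0x00000000, 0x00000001, 0x00000000))
```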

\H{insADD} \i\c{ADD}: Add Integers

\c ADD r/m8,reg8                 ; 00 /r                [8086]
\c ADD r/m16,reg16               ; o16 01 /r            [8086]
\c ADD r/m32,reg32               ; o32 01 /r            [386]

\c ADD reg8,r/m8                 ; 02 /r                [8086]
\c ADD reg16,r/m16               ; o16 03 /r            [8086]
\c ADD reg32,r/m32               ; o32 03 /r            [386]

\c ADD r/m8,imm8                 ; 80 /0 ib             [8086]
\c ADD r/m16,imm16               ; o16 81 /0 iw         [8086]
\c ADD r/m32,imm32               ; o32 81 /0 id         [386]

\c ADD r/m16,imm8                ; o16 83 /0 ib         [8086]
\c ADD r/m32,imm8                ; o32 83 /0 ib         [386]

\c ADD AL,imm8                   ; 04 ib                [8086]
\c ADD AX,imm16                  ; o16 05 iw            [8086]
\c ADD EAX,imm32                 ; o32 05 id            [386]

\c{ADD} performs integer addition: it adds its two operands together,
and leaves the result in its destination (first) operand. The flags
are set according to the result of the operation: in particular, the
carry flag is affected and can be used by a subsequent \c{ADC}
instruction (\k{insADC}).

In the forms with an 8-bit immediate second operand and a longer first
operand, the second operand is considered to be signed, and is
sign-extended to the length of the first operand. In these cases, the
\c{BYTE} qualifier is necessary to force NASM to generate this form of
the instruction.
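
Sign extension of an 8-bit immediate can be sketched as follows. The
function name \c{sign_extend_imm8} is a hypothetical helper for
illustration; it returns the bit pattern the processor would actually
add to a 16-bit or 32-bit operand.

```python
def sign_extend_imm8(imm8, bits):
    """Sign-extend an 8-bit immediate to `bits` bits (16 or 32)."""
    imm8 &= 0xFF
    if imm8 & 0x80:                   # top bit set: the value is negative
        imm8 -= 0x100
    return imm8 & ((1 << bits) - 1)   # two's-complement bit pattern

# 0xFF means -1, so it widens to all ones.
print(hex(sign_extend_imm8(0xFF, 32)))
# 0x7F is positive and widens with leading zeros.
print(hex(sign_extend_imm8(0x7F, 16)))
```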

\H{insAND} \i\c{AND}: Bitwise AND

\c AND r/m8,reg8                 ; 20 /r                [8086]
\c AND r/m16,reg16               ; o16 21 /r            [8086]
\c AND r/m32,reg32               ; o32 21 /r            [386]

\c AND reg8,r/m8                 ; 22 /r                [8086]
\c AND reg16,r/m16               ; o16 23 /r            [8086]
\c AND reg32,r/m32               ; o32 23 /r            [386]

\c AND r/m8,imm8                 ; 80 /4 ib             [8086]
\c AND r/m16,imm16               ; o16 81 /4 iw         [8086]
\c AND r/m32,imm32               ; o32 81 /4 id         [386]

\c AND r/m16,imm8                ; o16 83 /4 ib         [8086]
\c AND r/m32,imm8                ; o32 83 /4 ib         [386]

\c AND AL,imm8                   ; 24 ib                [8086]
\c AND AX,imm16                  ; o16 25 iw            [8086]
\c AND EAX,imm32                 ; o32 25 id            [386]

\c{AND} performs a bitwise AND operation between its two operands
(i.e. each bit of the result is 1 if and only if the corresponding
bits of the two inputs are both 1), and stores the result in the
destination (first) operand.

In the forms with an 8-bit immediate second operand and a longer first
operand, the second operand is considered to be signed, and is
sign-extended to the length of the first operand. In these cases, the
\c{BYTE} qualifier is necessary to force NASM to generate this form of
the instruction.

The MMX instruction \c{PAND} (see \k{insPAND}) performs the same
operation on the 64-bit MMX registers.

\H{insARPL} \i\c{ARPL}: Adjust RPL Field of Selector

\c ARPL r/m16,reg16              ; 63 /r                [286,PRIV]

\c{ARPL} expects its two word operands to be segment selectors. It
adjusts the RPL (requested privilege level - stored in the bottom two
bits of the selector) field of the destination (first) operand to
ensure that it is no less (i.e. no more privileged than) the RPL field
of the source operand. The zero flag is set if and only if a change
had to be made.

\H{insBOUND} \i\c{BOUND}: Check Array Index against Bounds

\c BOUND reg16,mem               ; o16 62 /r            [186]
\c BOUND reg32,mem               ; o32 62 /r            [386]

\c{BOUND} expects its second operand to point to an area of memory
containing two signed values of the same size as its first operand
(i.e. two words for the 16-bit form; two doublewords for the 32-bit
form). It performs two signed comparisons: if the value in the
register passed as its first operand is less than the first of the
in-memory values, or is greater than the second, it throws a BR
exception. Otherwise, it does nothing.

\H{insBSF} \i\c{BSF}, \i\c{BSR}: Bit Scan

\c BSF reg16,r/m16               ; o16 0F BC /r         [386]
\c BSF reg32,r/m32               ; o32 0F BC /r         [386]

\c BSR reg16,r/m16               ; o16 0F BD /r         [386]
\c BSR reg32,r/m32               ; o32 0F BD /r         [386]

\c{BSF} searches for a set bit in its source (second) operand,
starting from the bottom, and if it finds one, stores the index in its
destination (first) operand. If no set bit is found, the contents of
the destination operand are undefined.

\c{BSR} performs the same function, but searches from the top instead,
so it finds the most significant set bit.

Bit indices are from 0 (least significant) to 15 or 31 (most
significant).
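
The two scans can be approximated in Python as shown below. The names
\c{bsf} and \c{bsr} are illustrative helpers only; returning \c{None}
for a zero input is an assumption of this sketch, standing in for the
undefined destination on real hardware.

```python
def bsf(value):
    """Index of the lowest set bit, or None when no bit is set."""
    if value == 0:
        return None                           # destination undefined on hardware
    return (value & -value).bit_length() - 1  # isolate the lowest set bit

def bsr(value):
    """Index of the highest set bit, or None when no bit is set."""
    if value == 0:
        return None
    return value.bit_length() - 1

print(bsf(0b101000), bsr(0b101000))
```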

\H{insBSWAP} \i\c{BSWAP}: Byte Swap

\c BSWAP reg32                   ; o32 0F C8+r          [486]

\c{BSWAP} swaps the order of the four bytes of a 32-bit register: bits
0-7 exchange places with bits 24-31, and bits 8-15 swap with bits
16-23. There is no explicit 16-bit equivalent: to byte-swap \c{AX},
\c{BX}, \c{CX} or \c{DX}, \c{XCHG} can be used (for example
\c{XCHG AL,AH}).
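
A small Python sketch of the byte reversal, plus the 16-bit swap that
\c{XCHG} on the two halves of a register achieves. Both function names
are hypothetical and exist only for this illustration.

```python
def bswap32(value):
    """Reverse the four bytes of a 32-bit value."""
    raw = (value & 0xFFFFFFFF).to_bytes(4, "little")
    return int.from_bytes(raw, "big")

def swap_bytes16(value):
    """Swap the two bytes of a 16-bit value, like XCHG AL,AH on AX."""
    return ((value << 8) | (value >> 8)) & 0xFFFF

print(hex(bswap32(0x12345678)))
print(hex(swap_bytes16(0x1234)))
```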

\H{insBT} \i\c{BT}, \i\c{BTC}, \i\c{BTR}, \i\c{BTS}: Bit Test

\c BT r/m16,reg16                ; o16 0F A3 /r         [386]
\c BT r/m32,reg32                ; o32 0F A3 /r         [386]
\c BT r/m16,imm8                 ; o16 0F BA /4 ib      [386]
\c BT r/m32,imm8                 ; o32 0F BA /4 ib      [386]

\c BTC r/m16,reg16               ; o16 0F BB /r         [386]
\c BTC r/m32,reg32               ; o32 0F BB /r         [386]
\c BTC r/m16,imm8                ; o16 0F BA /7 ib      [386]
\c BTC r/m32,imm8                ; o32 0F BA /7 ib      [386]

\c BTR r/m16,reg16               ; o16 0F B3 /r         [386]
\c BTR r/m32,reg32               ; o32 0F B3 /r         [386]
\c BTR r/m16,imm8                ; o16 0F BA /6 ib      [386]
\c BTR r/m32,imm8                ; o32 0F BA /6 ib      [386]

\c BTS r/m16,reg16               ; o16 0F AB /r         [386]
\c BTS r/m32,reg32               ; o32 0F AB /r         [386]
\c BTS r/m16,imm8                ; o16 0F BA /5 ib      [386]
\c BTS r/m32,imm8                ; o32 0F BA /5 ib      [386]

These instructions all test one bit of their first operand, whose
index is given by the second operand, and store the value of that bit
into the carry flag. Bit indices are from 0 (least significant) to 15
or 31 (most significant).

In addition to storing the original value of the bit into the carry
flag, \c{BTR} also resets (clears) the bit in the operand itself.
\c{BTS} sets the bit, and \c{BTC} complements the bit. \c{BT} does not
modify its operands.

The bit offset should be no greater than the size of the operand.
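
The four variants differ only in what they do to the tested bit
afterwards, which the following sketch tries to make explicit. The
function \c{bit_test} and its \c{op} parameter are assumptions of this
example, not part of any assembler.

```python
def bit_test(value, index, op="bt"):
    """Return (new_value, carry) for BT, BTS, BTR or BTC."""
    mask = 1 << index
    carry = (value >> index) & 1   # the original bit goes to the carry flag
    if op == "bts":
        value |= mask              # set the bit
    elif op == "btr":
        value &= ~mask             # clear the bit
    elif op == "btc":
        value ^= mask              # complement the bit
    return value, carry            # plain BT leaves the operand alone

print(bit_test(0b0100, 2, "btr"))
print(bit_test(0b0100, 0, "bts"))
```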

\H{insCALL} \i\c{CALL}: Call Subroutine

\c CALL imm                      ; E8 rw/rd             [8086]
\c CALL imm:imm16                ; o16 9A iw iw         [8086]
\c CALL imm:imm32                ; o32 9A id iw         [386]
\c CALL FAR mem16                ; o16 FF /3            [8086]
\c CALL FAR mem32                ; o32 FF /3            [386]
\c CALL r/m16                    ; o16 FF /2            [8086]
\c CALL r/m32                    ; o32 FF /2            [386]

\c{CALL} calls a subroutine, by pushing the current instruction
pointer (\c{IP}) and optionally \c{CS} as well on the stack, and then
jumping to a given address.

\c{CS} is pushed as well as \c{IP} if and only if the call is a far
call, i.e. a destination segment address is specified in the
instruction. The forms involving two colon-separated arguments are far
calls; so are the \c{CALL FAR mem} forms.

You can choose between the two immediate \i{far call} forms
(\c{CALL imm:imm}) by the use of the \c{WORD} and \c{DWORD} keywords:
\c{CALL WORD 0x1234:0x5678} or \c{CALL DWORD 0x1234:0x56789abc}.

The \c{CALL FAR mem} forms execute a far call by loading the
destination address out of memory. The address loaded consists of 16
or 32 bits of offset (depending on the operand size), and 16 bits of
segment. The operand size may be overridden using \c{CALL WORD FAR
mem} or \c{CALL DWORD FAR mem}.

The \c{CALL r/m} forms execute a \i{near call} (within the same
segment), loading the destination address out of memory or out of a
register. The keyword \c{NEAR} may be specified, for clarity, in these
forms, but is not necessary. Again, operand size can be overridden
using \c{CALL WORD mem} or \c{CALL DWORD mem}.

As a convenience, NASM does not require you to call a far procedure
symbol by coding the cumbersome \c{CALL SEG routine:routine}, but
instead allows the easier synonym \c{CALL FAR routine}.

The \c{CALL r/m} forms given above are near calls; NASM will accept
the \c{NEAR} keyword (e.g. \c{CALL NEAR [address]}), even though it is
not strictly necessary.

\H{insCBW} \i\c{CBW}, \i\c{CWD}, \i\c{CDQ}, \i\c{CWDE}: Sign Extensions

\c CBW                           ; o16 98               [8086]
\c CWD                           ; o16 99               [8086]
\c CDQ                           ; o32 99               [386]
\c CWDE                          ; o32 98               [386]

These instructions all convert a short value into a longer one by sign
extension: they copy the top bit of the original value into every bit
of the extended part.

\c{CBW} extends \c{AL} into \c{AX} by repeating the top bit of \c{AL}
in every bit of \c{AH}. \c{CWD} extends \c{AX} into \c{DX:AX} by
repeating the top bit of \c{AX} throughout \c{DX}. \c{CWDE} extends
\c{AX} into \c{EAX}, and \c{CDQ} extends \c{EAX} into \c{EDX:EAX}.
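
As an illustration, the two 16-bit variants can be modelled like this.
The Python functions \c{cbw} and \c{cwd} are hypothetical stand-ins that
take and return plain integers instead of touching registers.

```python
def cbw(al):
    """AL -> AX: copy bit 7 of AL into every bit of AH."""
    ah = 0xFF if al & 0x80 else 0x00
    return (ah << 8) | (al & 0xFF)

def cwd(ax):
    """AX -> (DX, AX): copy bit 15 of AX into every bit of DX."""
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax & 0xFFFF

print(hex(cbw(0x80)))   # negative byte: AH is filled with ones
print(cwd(0x1234))      # positive word: DX is zero
```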

\H{insCLC} \i\c{CLC}, \i\c{CLD}, \i\c{CLI}, \i\c{CLTS}: Clear Flags

\c CLC                           ; F8                   [8086]
\c CLD                           ; FC                   [8086]
\c CLI                           ; FA                   [8086]
\c CLTS                          ; 0F 06                [286,PRIV]

These instructions clear various flags. \c{CLC} clears the carry flag;
\c{CLD} clears the direction flag; \c{CLI} clears the interrupt flag
(thus disabling interrupts); and \c{CLTS} clears the task-switched
(\c{TS}) flag in \c{CR0}.

To set the carry, direction, or interrupt flags, use the \c{STC},
\c{STD} and \c{STI} instructions (\k{insSTD}). To invert the carry
flag, use \c{CMC} (\k{insCMC}).

\H{insCMC} \i\c{CMC}: Complement Carry Flag

\c CMC                           ; F5                   [8086]

\c{CMC} changes the value of the carry flag: if it was 0, it sets it
to 1, and vice versa.

\H{insCMOVcc} \i\c{CMOVcc}: Conditional Move

\c CMOVcc reg16,r/m16            ; o16 0F 40+cc /r      [P6]
\c CMOVcc reg32,r/m32            ; o32 0F 40+cc /r      [P6]

\c{CMOV} moves its source (second) operand into its destination
(first) operand if the given condition code is satisfied; otherwise it
does nothing.

For a list of condition codes, see \k{iref-cc}.

Although the \c{CMOV} instructions are flagged \c{P6} above, they may
not be supported by all Pentium Pro processors; the \c{CPUID}
instruction (\k{insCPUID}) will return a bit which indicates whether
conditional moves are supported.

\H{insCMP} \i\c{CMP}: Compare Integers

\c CMP r/m8,reg8                 ; 38 /r                [8086]
\c CMP r/m16,reg16               ; o16 39 /r            [8086]
\c CMP r/m32,reg32               ; o32 39 /r            [386]

\c CMP reg8,r/m8                 ; 3A /r                [8086]
\c CMP reg16,r/m16               ; o16 3B /r            [8086]
\c CMP reg32,r/m32               ; o32 3B /r            [386]

\c CMP r/m8,imm8                 ; 80 /7 ib             [8086]
\c CMP r/m16,imm16               ; o16 81 /7 iw         [8086]
\c CMP r/m32,imm32               ; o32 81 /7 id         [386]

\c CMP r/m16,imm8                ; o16 83 /7 ib         [8086]
\c CMP r/m32,imm8                ; o32 83 /7 ib         [386]

\c CMP AL,imm8                   ; 3C ib                [8086]
\c CMP AX,imm16                  ; o16 3D iw            [8086]
\c CMP EAX,imm32                 ; o32 3D id            [386]

\c{CMP} performs a 'mental' subtraction of its second operand from its
first operand, and affects the flags as if the subtraction had taken
place, but does not store the result of the subtraction anywhere.

In the forms with an 8-bit immediate second operand and a longer first
operand, the second operand is considered to be signed, and is
sign-extended to the length of the first operand. In these cases, the
\c{BYTE} qualifier is necessary to force NASM to generate this form of
the instruction.

\H{insCMPSB} \i\c{CMPSB}, \i\c{CMPSW}, \i\c{CMPSD}: Compare Strings

\c CMPSB                         ; A6                   [8086]
\c CMPSW                         ; o16 A7               [8086]
\c CMPSD                         ; o32 A7               [386]

\c{CMPSB} compares the byte at \c{[DS:SI]} or \c{[DS:ESI]} with the
byte at \c{[ES:DI]} or \c{[ES:EDI]}, and sets the flags accordingly.
It then increments or decrements (depending on the direction flag:
increments if the flag is clear, decrements if it is set) \c{SI} and
\c{DI} (or \c{ESI} and \c{EDI}).

The registers used are \c{SI} and \c{DI} if the address size is 16
bits, and \c{ESI} and \c{EDI} if it is 32 bits. If you need to use an
address size not equal to the current \c{BITS} setting, you can use an
explicit \i\c{a16} or \i\c{a32} prefix.

The segment register used to load from \c{[SI]} or \c{[ESI]} can be
overridden by using a segment register name as a prefix (for example,
\c{es cmpsb}). The use of \c{ES} for the load from \c{[DI]} or
\c{[EDI]} cannot be overridden.

\c{CMPSW} and \c{CMPSD} work in the same way, but they compare a word
or a doubleword instead of a byte, and increment or decrement the
addressing registers by 2 or 4 instead of 1.

The \c{REPE} and \c{REPNE} prefixes (equivalently, \c{REPZ} and
\c{REPNZ}) may be used to repeat the instruction up to \c{CX} (or
\c{ECX}, depending on the address size) times, until the first unequal
or equal byte is found.
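
The behaviour of \c{REPE CMPSB} with the direction flag clear can be
sketched in Python as below. The function \c{repe_cmpsb} is a
hypothetical model: the byte strings stand in for the memory at
\c{[DS:SI]} and \c{[ES:DI]}, and \c{count} for \c{CX}. One
simplification to be aware of: with a count of zero the real
instruction leaves the flags untouched, whereas this sketch reports
\c{True}.

```python
def repe_cmpsb(src, dst, count):
    """Return (bytes_compared, equal) like REPE CMPSB with DF clear."""
    compared = 0
    equal = True                   # stands in for the zero flag
    while count > 0:
        equal = src[compared] == dst[compared]
        compared += 1              # SI and DI advance by one byte
        count -= 1                 # CX counts down
        if not equal:              # REPE stops at the first mismatch
            break
    return compared, equal

print(repe_cmpsb(b"hello", b"help!", 5))
```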

\H{insCMPXCHG} \i\c{CMPXCHG}, \i\c{CMPXCHG486}: Compare and Exchange

\c CMPXCHG r/m8,reg8             ; 0F B0 /r             [PENT]
\c CMPXCHG r/m16,reg16           ; o16 0F B1 /r         [PENT]
\c CMPXCHG r/m32,reg32           ; o32 0F B1 /r         [PENT]

\c CMPXCHG486 r/m8,reg8          ; 0F A6 /r             [486,UNDOC]
\c CMPXCHG486 r/m16,reg16        ; o16 0F A7 /r         [486,UNDOC]
\c CMPXCHG486 r/m32,reg32        ; o32 0F A7 /r         [486,UNDOC]

These two instructions perform exactly the same operation; however,
apparently some (not all) 486 processors support it under a
non-standard opcode, so NASM provides the undocumented
\c{CMPXCHG486} form to generate the non-standard opcode.

\c{CMPXCHG} compares its destination (first) operand to the value in
\c{AL}, \c{AX} or \c{EAX} (depending on the size of the instruction).
If they are equal, it copies its source (second) operand into the
destination and sets the zero flag. Otherwise, it clears the zero flag
and leaves the destination alone.

\c{CMPXCHG} is intended to be used for atomic operations in
multitasking or multiprocessor environments. To safely update a value
in shared memory, for example, you might load the value into \c{EAX},
load the updated value into \c{EBX}, and then execute the instruction
\c{lock cmpxchg [value],ebx}. If \c{value} has not changed since being
loaded, it is updated with your desired new value, and the zero flag
is set to let you know it has worked. (The \c{LOCK} prefix prevents
another processor doing anything in the middle of this operation.)
However, if another processor has modified the value in between your
load and your attempted store, the store does not happen, and you are
notified of the failure by a cleared zero flag, so you can go round
and try again.
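
The load, compute, exchange, retry pattern can be sketched in Python as
follows. This is a single-threaded model meant only to show the control
flow: the \c{cmpxchg} function here is a hypothetical stand-in that is
not atomic, a dictionary plays the role of shared memory, and
\c{atomic_increment} is an invented name.

```python
def cmpxchg(memory, key, accumulator, source):
    """Model CMPXCHG on memory[key]; return the resulting zero flag."""
    if memory[key] == accumulator:
        memory[key] = source       # equal: store the source operand
        return True                # zero flag set
    return False                   # zero flag clear, destination untouched

def atomic_increment(memory, key):
    """Retry until the exchange succeeds, then return the new value."""
    while True:
        expected = memory[key]     # value loaded into EAX
        updated = expected + 1     # new value prepared in EBX
        if cmpxchg(memory, key, expected, updated):
            return updated

shared = {"value": 41}
print(atomic_increment(shared, "value"), shared)
```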

\H{insCMPXCHG8B} \i\c{CMPXCHG8B}: Compare and Exchange Eight Bytes

\c CMPXCHG8B mem                 ; 0F C7 /1             [PENT]

This is a larger and more unwieldy version of \c{CMPXCHG}: it compares
the 64-bit (eight-byte) value stored at \c{[mem]} with the value in
\c{EDX:EAX}. If they are equal, it sets the zero flag and stores
\c{ECX:EBX} into the memory area. If they are unequal, it clears the
zero flag and leaves the memory area untouched.

\H{insCPUID} \i\c{CPUID}: Get CPU Identification Code

\c CPUID                         ; 0F A2                [PENT]

\c{CPUID} returns various information about the processor it is being
executed on. It fills the four registers \c{EAX}, \c{EBX}, \c{ECX} and
\c{EDX} with information, which varies depending on the input contents
of \c{EAX}.

\c{CPUID} also acts as a barrier to serialise instruction execution:
executing the \c{CPUID} instruction guarantees that all the effects
(memory modification, flag modification, register modification) of
previous instructions have been completed before the next instruction
gets fetched.

The information returned is as follows:

\b If \c{EAX} is zero on input, \c{EAX} on output holds the maximum
acceptable input value of \c{EAX}, and \c{EBX:EDX:ECX} contain the
string \c{"GenuineIntel"} (or not, if you have a clone processor).
That is to say, \c{EBX} contains \c{"Genu"} (in NASM's own sense of
character constants, described in \k{chrconst}), \c{EDX} contains
\c{"ineI"} and \c{ECX} contains \c{"ntel"}.

\b If \c{EAX} is one on input, \c{EAX} on output contains version
information about the processor, and \c{EDX} contains a set of feature
flags, showing the presence and absence of various features. For
example, bit 8 is set if the \c{CMPXCHG8B} instruction
(\k{insCMPXCHG8B}) is supported, bit 15 is set if the conditional move
instructions (\k{insCMOVcc} and \k{insFCMOVB}) are supported, and bit
23 is set if MMX instructions are supported.

\b If \c{EAX} is two on input, \c{EAX}, \c{EBX}, \c{ECX} and \c{EDX}
all contain information about caches and TLBs (Translation Lookaside
Buffers).

For more information on the data returned from \c{CPUID}, see the
documentation on Intel's web site.
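
How the three registers spell out the vendor string, and how a feature
bit is read from \c{EDX}, can be illustrated with the Python sketch
below. The register values are passed in as plain integers (nothing
here executes \c{CPUID}), and the helper names and the three bit
constants are assumptions taken from the description above.

```python
import struct

# Feature bit positions in EDX for EAX=1, as listed in the text.
BIT_CMPXCHG8B, BIT_CMOV, BIT_MMX = 8, 15, 23

def vendor_string(ebx, edx, ecx):
    """Join EBX, EDX, ECX in that order, least significant byte first."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

def has_feature(edx, bit):
    """True when the given feature bit is set in EDX."""
    return bool((edx >> bit) & 1)

print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
print(has_feature(1 << 23, BIT_MMX))
```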

\H{insDAA} \i\c{DAA}, \i\c{DAS}: Decimal Adjustments

\c DAA                           ; 27                   [8086]
\c DAS                           ; 2F                   [8086]

These instructions are used in conjunction with the add and subtract
instructions to perform binary-coded decimal arithmetic in \e{packed}
(one BCD digit per nibble) form. For the unpacked equivalents, see
\k{insAAA}.

\c{DAA} should be used after a one-byte \c{ADD} instruction whose
destination was the \c{AL} register: by examining the value in \c{AL}
and the auxiliary carry flag \c{AF}, it determines whether either
digit of the addition has overflowed, and adjusts it (and sets the
carry and auxiliary-carry flags) if so. You can add long BCD strings
together by doing \c{ADD}/\c{DAA} on the low two digits, then doing
\c{ADC}/\c{DAA} on each subsequent pair of digits.

\c{DAS} works similarly to \c{DAA}, but is for use after \c{SUB}
instructions rather than \c{ADD}.
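
A possible Python model of an \c{ADD} (or \c{ADC}) followed by
\c{DAA} on one packed-BCD byte is shown below. It follows the
adjustment rule as commonly documented for \c{DAA} (add 6 when the low
digit overflowed, add 0x60 when the high digit did); the function name
\c{add_daa} is hypothetical, and only \c{AL} and the carry flag are
returned.

```python
def add_daa(a, b, carry_in=0):
    """Add two packed-BCD bytes, then adjust; return (al, carry)."""
    total = a + b + carry_in
    al, cf = total & 0xFF, total > 0xFF
    af = ((a & 0x0F) + (b & 0x0F) + carry_in) > 0x0F  # carry out of bit 3
    old_al = al
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF    # fix up the low digit
    if old_al > 0x99 or cf:
        al = (al + 0x60) & 0xFF    # fix up the high digit
        cf = True
    return al, int(cf)

# 38 + 45 = 83 in packed BCD.
print(tuple(hex(x) for x in add_daa(0x38, 0x45)))
```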

\H{insDEC} \i\c{DEC}: Decrement Integer

\c DEC reg16                     ; o16 48+r             [8086]
\c DEC reg32                     ; o32 48+r             [386]
\c DEC r/m8                      ; FE /1                [8086]
\c DEC r/m16                     ; o16 FF /1            [8086]
\c DEC r/m32                     ; o32 FF /1            [386]

\c{DEC} subtracts 1 from its operand. It does \e{not} affect the carry
flag: to affect the carry flag, use \c{SUB something,1} (see
\k{insSUB}). See also \c{INC} (\k{insINC}).

\H{insDIV} \i\c{DIV}: Unsigned Integer Divide

\c DIV r/m8                      ; F6 /6                [8086]
\c DIV r/m16                     ; o16 F7 /6            [8086]
\c DIV r/m32                     ; o32 F7 /6            [386]

\c{DIV} performs unsigned integer division. The explicit operand
provided is the divisor; the dividend and destination operands are
implicit, in the following way:

\b For \c{DIV r/m8}, \c{AX} is divided by the given operand; the
quotient is stored in \c{AL} and the remainder in \c{AH}.

\b For \c{DIV r/m16}, \c{DX:AX} is divided by the given operand; the
quotient is stored in \c{AX} and the remainder in \c{DX}.

\b For \c{DIV r/m32}, \c{EDX:EAX} is divided by the given operand; the
quotient is stored in \c{EAX} and the remainder in \c{EDX}.

Signed integer division is performed by the \c{IDIV} instruction: see
\k{insIDIV}.
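
The 8-bit form can be modelled in Python as sketched below. The
function \c{div8} is a hypothetical helper; raising Python exceptions
for a zero divisor or an oversized quotient is this sketch's way of
representing the divide error the processor raises in those cases.

```python
def div8(ax, divisor):
    """Model DIV r/m8: divide AX by the operand; return (AL, AH)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error")
    quotient, remainder = divmod(ax & 0xFFFF, divisor & 0xFF)
    if quotient > 0xFF:            # the quotient must fit in AL
        raise OverflowError("divide error: quotient does not fit in AL")
    return quotient, remainder     # (AL, AH)

print(div8(1000, 7))
```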

\H{insEMMS} \i\c{EMMS}: Empty MMX State

\c EMMS                          ; 0F 77                [PENT,MMX]

\c{EMMS} sets the FPU tag word (marking which floating-point registers
are available) to all ones, meaning all registers are available for
the FPU to use. It should be used after executing MMX instructions and
before executing any subsequent floating-point operations.

\H{insENTER} \i\c{ENTER}: Create Stack Frame

\c ENTER imm,imm                 ; C8 iw ib             [186]

\c{ENTER} constructs a stack frame for a high-level language procedure
call. The first operand (the \c{iw} in the opcode definition above
refers to the first operand) gives the amount of stack space to
allocate for local variables; the second (the \c{ib} above) gives the
nesting level of the procedure (for languages like Pascal, with nested
procedures).

The function of \c{ENTER}, with a nesting level of zero, is equivalent
to

\c           PUSH EBP            ; or PUSH BP         in 16 bits
\c           MOV EBP,ESP         ; or MOV BP,SP       in 16 bits
\c           SUB ESP,operand1    ; or SUB SP,operand1 in 16 bits

This creates a stack frame with the procedure parameters accessible
upwards from \c{EBP}, and local variables accessible downwards from
\c{EBP}.

With a nesting level of one, the stack frame created is 4 (or 2) bytes
bigger, and the value of the final frame pointer \c{EBP} is accessible
in memory at \c{[EBP-4]}.

This allows \c{ENTER}, when called with a nesting level of two, to
look at the stack frame described by the previous value of \c{EBP},
find the frame pointer at offset -4 from that, and push it along with
its new frame pointer, so that when a level-two procedure is called
from within a level-one procedure, \c{[EBP-4]} holds the frame pointer
of the level-one procedure and \c{[EBP-8]} holds that of the level-two
procedure. And so on, for nesting levels up to 31.

Stack frames created by \c{ENTER} can be destroyed by the \c{LEAVE}
instruction: see \k{insLEAVE}.

\H{insF2XM1} \i\c{F2XM1}: Calculate 2**X-1

\c F2XM1                         ; D9 F0                [8086,FPU]

\c{F2XM1} raises 2 to the power of \c{ST0}, subtracts one, and stores
the result back into \c{ST0}. The initial contents of \c{ST0} must be
a number in the range -1 to +1.
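
The computation named in the heading, 2**X-1, can be written out in
Python as a quick check of what ends up in \c{ST0}. The function
\c{f2xm1} is a hypothetical model; rejecting out-of-range input with an
exception is an assumption of this sketch, not a description of what
the FPU does in that case.

```python
def f2xm1(st0):
    """Return 2**st0 - 1 for st0 in the range -1 to +1."""
    if not -1.0 <= st0 <= 1.0:
        raise ValueError("F2XM1 expects ST0 in the range -1 to +1")
    return 2.0 ** st0 - 1.0

print(f2xm1(1.0), f2xm1(0.0), f2xm1(-1.0))
```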

\H{insFABS} \i\c{FABS}: Floating-Point Absolute Value

\c FABS                          ; D9 E1                [8086,FPU]

\c{FABS} computes the absolute value of \c{ST0}, storing the result
back in \c{ST0}.

\H{insFADD} \i\c{FADD}, \i\c{FADDP}: Floating-Point Addition

\c FADD mem32                    ; D8 /0                [8086,FPU]
\c FADD mem64                    ; DC /0                [8086,FPU]

\c FADD fpureg                   ; D8 C0+r              [8086,FPU]
\c FADD ST0,fpureg               ; D8 C0+r              [8086,FPU]

\c FADD TO fpureg                ; DC C0+r              [8086,FPU]
\c FADD fpureg,ST0               ; DC C0+r              [8086,FPU]

\c FADDP fpureg                  ; DE C0+r              [8086,FPU]
\c FADDP fpureg,ST0              ; DE C0+r              [8086,FPU]

\c{FADD}, given one operand, adds the operand to \c{ST0} and stores
the result back in \c{ST0}. If the operand has the \c{TO} modifier,
the result is stored in the register given rather than in \c{ST0}.

\c{FADDP} performs the same function as \c{FADD TO}, but pops the
register stack after storing the result.

The given two-operand forms are synonyms for the one-operand forms.

\H{insFBLD} \i\c{FBLD}, \i\c{FBSTP}: BCD Floating-Point Load and Store

\c FBLD mem80                    ; DF /4                [8086,FPU]
\c FBSTP mem80                   ; DF /6                [8086,FPU]

\c{FBLD} loads an 80-bit (ten-byte) packed binary-coded decimal number
from the given memory address, converts it to a real, and pushes it on
the register stack. \c{FBSTP} stores the value of \c{ST0}, in packed
BCD, at the given address and then pops the register stack.

\H{insFCHS} \i\c{FCHS}: Floating-Point Change Sign

\c FCHS                          ; D9 E0                [8086,FPU]

\c{FCHS} negates the number in \c{ST0}: negative numbers become
positive, and vice versa.

\H{insFCLEX} \i\c{FCLEX}, \i\c{FNCLEX}: Clear Floating-Point Exceptions

\c FCLEX                         ; 9B DB E2             [8086,FPU]
\c FNCLEX                        ; DB E2                [8086,FPU]

\c{FCLEX} clears any floating-point exceptions which may be pending.
\c{FNCLEX} does the same thing but does not wait for previous
floating-point operations (including the handling of pending
exceptions) to finish first.

\H{insFCMOVB} \i\c{FCMOVcc}: Floating-Point Conditional Move

\c FCMOVB fpureg                 ; DA C0+r              [P6,FPU]
\c FCMOVB ST0,fpureg             ; DA C0+r              [P6,FPU]

\c FCMOVBE fpureg                ; DA D0+r              [P6,FPU]
\c FCMOVBE ST0,fpureg            ; DA D0+r              [P6,FPU]

\c FCMOVE fpureg                 ; DA C8+r              [P6,FPU]
\c FCMOVE ST0,fpureg             ; DA C8+r              [P6,FPU]

\c FCMOVNB fpureg                ; DB C0+r              [P6,FPU]
\c FCMOVNB ST0,fpureg            ; DB C0+r              [P6,FPU]

\c FCMOVNBE fpureg               ; DB D0+r              [P6,FPU]
\c FCMOVNBE ST0,fpureg           ; DB D0+r              [P6,FPU]

\c FCMOVNE fpureg                ; DB C8+r              [P6,FPU]
\c FCMOVNE ST0,fpureg            ; DB C8+r              [P6,FPU]

\c FCMOVNU fpureg                ; DB D8+r              [P6,FPU]
\c FCMOVNU ST0,fpureg            ; DB D8+r              [P6,FPU]

\c FCMOVU fpureg                 ; DA D8+r              [P6,FPU]
\c FCMOVU ST0,fpureg             ; DA D8+r              [P6,FPU]

The \c{FCMOV} instructions perform conditional move operations: each
of them moves the contents of the given register into \c{ST0} if its
condition is satisfied, and does nothing otherwise.

The conditions are not the same as the standard condition codes used
with conditional jump instructions. The conditions \c{B}, \c{BE},
\c{NB}, \c{NBE}, \c{E} and \c{NE} are exactly as normal, but none of
the other standard ones are supported. Instead, the condition \c{U}
and its counterpart \c{NU} are provided; the \c{U} condition is
satisfied if the last two floating-point numbers compared were
\e{unordered}, i.e. they were not equal but neither one could be said
to be greater than the other, for example if they were NaNs (Not a
Number). (The flag state which signals this is the setting of the
parity flag: so the \c{U} condition is notionally equivalent to
\c{PE}, and \c{NU} is equivalent to \c{PO}.)

The \c{FCMOV} conditions test the main processor's status flags, not
the FPU status flags, so using \c{FCMOV} directly after \c{FCOM} will
not work. Instead, you should either use \c{FCOMI}, which writes
directly to the main CPU flags word, or use \c{FSTSW} to extract the
FPU flags and then transfer them to the CPU flags (for example with
\c{SAHF}).
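
As a minimal sketch of the \c{FCOMI} route, the following fragment
selects the larger of two values. The labels \c{a}, \c{b} and \c{max}
are hypothetical 64-bit memory variables, not names taken from this
manual, and the fragment assumes a processor on which \c{FCOMI} and
\c{FCMOV} are available.

\c         fld     qword [b]       ; ST0 = b
\c         fld     qword [a]       ; ST0 = a, ST1 = b
\c         fcomi   st0,st1         ; CPU carry flag set if a < b
\c         fcmovb  st0,st1         ; if a < b, copy b into ST0
\c         fstp    qword [max]     ; store the larger value and pop
\c         fstp    st0             ; discard the remaining copy of b

If either value is a NaN, the comparison is unordered, which also sets
the carry flag, so \c{b} is the value stored in that case.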

Although the \c{FCMOV} instructions are flagged \c{P6} above, they may
not be supported by all Pentium Pro processors; the \c{CPUID}
instruction (\k{insCPUID}) will return a bit which indicates whether
conditional moves are supported.
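
A hedged sketch of such a check follows. It assumes that \c{CPUID}
itself is available, and it relies on the feature flags returned in
\c{EDX} by \c{CPUID} function 1, where bit 0 reports an on-chip FPU and
bit 15 reports conditional-move support; \c{FCMOV} needs both. The
label \c{no_fcmov} is hypothetical.

\c         mov     eax,1           ; request the feature flags
\c         cpuid                   ; clobbers EAX, EBX, ECX and EDX
\c         and     edx,0x8001      ; keep bit 15 (CMOV) and bit 0 (FPU)
\c         cmp     edx,0x8001      ; both bits must be set
\c         jne     no_fcmov        ; otherwise avoid FCMOVcc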

\H{insFCOM} \i\c{FCOM}, \i\c{FCOMP}, \i\c{FCOMPP}, \i\c{FCOMI}, \i\c{FCOMIP}: Floating-Point Compare

\c FCOM mem32                    ; D8 /2                [8086,FPU]
\c FCOM mem64                    ; DC /2                [8086,FPU]
\c FCOM fpureg                   ; D8 D0+r              [8086,FPU]
\c FCOM ST0,fpureg               ; D8 D0+r              [8086,FPU]

\c FCOMP mem32                   ; D8 /3                [8086,FPU]
\c FCOMP mem64                   ; DC /3                [8086,FPU]
\c FCOMP fpureg                  ; D8 D8+r              [8086,FPU]
\c FCOMP ST0,fpureg              ; D8 D8+r              [8086,FPU]

\c FCOMPP                        ; DE D9                [8086,FPU]

\c FCOMI fpureg                  ; DB F0+r              [P6,FPU]
\c FCOMI ST0,fpureg              ; DB F0+r              [P6,FPU]

\c FCOMIP fpureg                 ; DF F0+r              [P6,FPU]
\c FCOMIP ST0,fpureg             ; DF F0+r              [P6,FPU]

\c{FCOM} compares \c{ST0} with the given operand, and sets the FPU
flags accordingly. \c{ST0} is treated as the left-hand side of the
comparison, so that the C0 condition flag is set (a 'less-than'
result, which becomes the carry flag once the FPU flags are copied to
the CPU flags) if \c{ST0} is less than the given operand.

\c{FCOMP} does the same as \c{FCOM}, but pops the register stack
afterwards. \c{FCOMPP} compares \c{ST0} with \c{ST1} and then pops the
register stack twice.

\c{FCOMI} and \c{FCOMIP} work like the corresponding forms of
\c{FCOM} and \c{FCOMP}, but write their results directly to the CPU
flags register rather than the FPU status word, so they can be
immediately followed by conditional jump or conditional move
instructions.

The \c{FCOM} instructions differ from the \c{FUCOM} instructions
(\k{insFUCOM}) only in the way they handle quiet NaNs (Not a Number):
\c{FUCOM} will handle them silently and set the condition code flags
to an 'unordered' result, whereas \c{FCOM} will generate an exception.
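
On processors without \c{FCOMI}, the result of \c{FCOM} has to be moved
into the CPU flags by hand before it can be branched on. The following
is a sketch of one common way to do that; \c{x} and \c{y} are
hypothetical 32-bit memory variables, \c{unordered} and \c{x_below_y}
are hypothetical labels, and \c{FNSTSW AX} requires a 287 or later.

\c         fld     dword [x]       ; ST0 = x
\c         fcomp   dword [y]       ; compare x with y, then pop
\c         fnstsw  ax              ; AH now holds C0, C2 and C3
\c         sahf                    ; C0 -> CF, C2 -> PF, C3 -> ZF
\c         jp      unordered       ; parity set: at least one NaN
\c         jb      x_below_y       ; carry set: x < y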

\H{insFCOS} \i\c{FCOS}: Cosine

\c FCOS                          ; D9 FF                [386,FPU]

\c{FCOS} computes the cosine of \c{ST0} (in radians), and stores the
result in \c{ST0}. See also \c{FSINCOS} (\k{insFSIN}).

\H{insFDECSTP} \i\c{FDECSTP}: Decrement Floating-Point Stack Pointer

\c FDECSTP                       ; D9 F6                [8086,FPU]

\c{FDECSTP} decrements the 'top' field in the floating-point status
word. This has the effect of rotating the FPU register stack by one,
as if the contents of \c{ST7} had been pushed on the stack. See also
\c{FINCSTP} (\k{insFINCSTP}).

\H{insFDISI} \i\c{FxDISI}, \i\c{FxENI}: Disable and Enable Floating-Point Interrupts

\c FDISI                         ; 9B DB E1             [8086,FPU]
\c FNDISI                        ; DB E1                [8086,FPU]

\c FENI                          ; 9B DB E0             [8086,FPU]
\c FNENI                         ; DB E0                [8086,FPU]

\c{FDISI} and \c{FENI} disable and enable floating-point interrupts.
These instructions are only meaningful on original 8087 processors:
the 287 and above treat them as no-operation (NOP) instructions.

\c{FNDISI} and \c{FNENI} do the same thing as \c{FDISI} and \c{FENI}
respectively, but without waiting for the floating-point processor to
finish what it was doing first.

\H{insFDIV} \i\c{FDIV}, \i\c{FDIVP}, \i\c{FDIVR}, \i\c{FDIVRP}: Floating-Point Division

\c FDIV mem32                    ; D8 /6                [8086,FPU]
\c FDIV mem64                    ; DC /6                [8086,FPU]

\c FDIV fpureg                   ; D8 F0+r              [8086,FPU]
\c FDIV ST0,fpureg               ; D8 F0+r              [8086,FPU]

\c FDIV TO fpureg                ; DC F8+r              [8086,FPU]
\c FDIV fpureg,ST0               ; DC F8+r              [8086,FPU]

\c FDIVR mem32                   ; D8 /7                [8086,FPU]
\c FDIVR mem64                   ; DC /7                [8086,FPU]

\c FDIVR fpureg                  ; D8 F8+r              [8086,FPU]
\c FDIVR ST0,fpureg              ; D8 F8+r              [8086,FPU]

\c FDIVR TO fpureg               ; DC F0+r              [8086,FPU]
\c FDIVR fpureg,ST0              ; DC F0+r              [8086,FPU]

\c FDIVP fpureg                  ; DE F8+r              [8086,FPU]
\c FDIVP fpureg,ST0              ; DE F8+r              [8086,FPU]

\c FDIVRP fpureg                 ; DE F0+r              [8086,FPU]
\c FDIVRP fpureg,ST0             ; DE F0+r              [8086,FPU]

\c{FDIV} divides \c{ST0} by the given operand and stores the result
back in \c{ST0}, unless the \c{TO} qualifier is given, in which case
it divides the given operand by \c{ST0} and stores the result in the
operand.

\c{FDIVR} does the same thing, but does the division the other way
up: so if \c{TO} is not given, it divides the given operand by
\c{ST0} and stores the result in \c{ST0}, whereas if \c{TO} is given
it divides \c{ST0} by its operand and stores the result in the
operand.

\c{FDIVP} operates like \c{FDIV TO}, but pops the register stack once
it has finished. \c{FDIVRP} operates like \c{FDIVR TO}, but pops the
register stack once it has finished.
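
The operand order is easy to get wrong, so here is a small sketch that
follows the definitions above. The labels \c{a} and \c{b} are
hypothetical 32-bit memory variables.

\c         fld     dword [a]       ; ST0 = a
\c         fld     dword [b]       ; ST0 = b, ST1 = a
\c         fdivp   st1,st0         ; ST1 = ST1 / ST0 = a / b, then pop
\c                                 ; the quotient a / b is now in ST0

Replacing \c{FDIVP} by \c{FDIVRP} in the same fragment would leave
\c{b} divided by \c{a} in \c{ST0} instead.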

\H{insFFREE} \i\c{FFREE}: Flag Floating-Point Register as Unused

\c FFREE fpureg                  ; DD C0+r              [8086,FPU]

\c{FFREE} marks the given register as being empty.

\H{insFIADD} \i\c{FIADD}: Floating-Point/Integer Addition

\c FIADD mem16                   ; DE /0                [8086,FPU]
\c FIADD mem32                   ; DA /0                [8086,FPU]

\c{FIADD} adds the 16-bit or 32-bit integer stored in the given
memory location to \c{ST0}, storing the result in \c{ST0}.

\H{insFICOM} \i\c{FICOM}, \i\c{FICOMP}: Floating-Point/Integer Compare

\c FICOM mem16                   ; DE /2                [8086,FPU]
\c FICOM mem32                   ; DA /2                [8086,FPU]

\c FICOMP mem16                  ; DE /3                [8086,FPU]
\c FICOMP mem32                  ; DA /3                [8086,FPU]

\c{FICOM} compares \c{ST0} with the 16-bit or 32-bit integer stored
in the given memory location, and sets the FPU flags accordingly.
\c{FICOMP} does the same, but pops the register stack afterwards.

\H{insFIDIV} \i\c{FIDIV}, \i\c{FIDIVR}: Floating-Point/Integer Division

\c FIDIV mem16                   ; DE /6                [8086,FPU]
\c FIDIV mem32                   ; DA /6                [8086,FPU]

\c FIDIVR mem16                  ; DE /7                [8086,FPU]
\c FIDIVR mem32                  ; DA /7                [8086,FPU]

\c{FIDIV} divides \c{ST0} by the 16-bit or 32-bit integer stored in
the given memory location, and stores the result in \c{ST0}.
\c{FIDIVR} does the division the other way up: it divides the integer
by \c{ST0}, but still stores the result in \c{ST0}.

\H{insFILD} \i\c{FILD}, \i\c{FIST}, \i\c{FISTP}: Floating-Point/Integer Conversion

\c FILD mem16                    ; DF /0                [8086,FPU]
\c FILD mem32                    ; DB /0                [8086,FPU]
\c FILD mem64                    ; DF /5                [8086,FPU]

\c FIST mem16                    ; DF /2                [8086,FPU]
\c FIST mem32                    ; DB /2                [8086,FPU]

\c FISTP mem16                   ; DF /3                [8086,FPU]
\c FISTP mem32                   ; DB /3                [8086,FPU]
\c FISTP mem64                   ; DF /7                [8086,FPU]

\c{FILD} loads an integer out of a memory location, converts it to a
real, and pushes it on the FPU register stack. \c{FIST} converts
\c{ST0} to an integer and stores that in memory; \c{FISTP} does the
same as \c{FIST}, but pops the register stack afterwards.
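
A short sketch of a round trip through the FPU: it takes the square
root of an integer and stores the result as an integer again. The
labels \c{n} and \c{root} are hypothetical 32-bit memory variables,
and the conversion back to an integer uses whatever rounding mode is
currently set in the FPU control word.

\c         fild    dword [n]       ; push n, converted to a real
\c         fsqrt                   ; ST0 = square root of n
\c         fistp   dword [root]    ; convert to integer, store and pop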

\H{insFIMUL} \i\c{FIMUL}: Floating-Point/Integer Multiplication

\c FIMUL mem16                   ; DE /1                [8086,FPU]
\c FIMUL mem32                   ; DA /1                [8086,FPU]

\c{FIMUL} multiplies \c{ST0} by the 16-bit or 32-bit integer stored
in the given memory location, and stores the result in \c{ST0}.

\H{insFINCSTP} \i\c{FINCSTP}: Increment Floating-Point Stack Pointer

\c FINCSTP                       ; D9 F7                [8086,FPU]

\c{FINCSTP} increments the 'top' field in the floating-point status
word. This has the effect of rotating the FPU register stack by one,
as if the register stack had been popped; however, unlike the popping
of the stack performed by many FPU instructions, it does not flag the
new \c{ST7} (previously \c{ST0}) as empty. See also \c{FDECSTP}
(\k{insFDECSTP}).

\H{insFINIT} \i\c{FINIT}, \i\c{FNINIT}: Initialise Floating-Point Unit

\c FINIT                         ; 9B DB E3             [8086,FPU]
\c FNINIT                        ; DB E3                [8086,FPU]

\c{FINIT} initialises the FPU to its default state. It flags all
registers as empty, though it does not actually change their values.
\c{FNINIT} does the same, without first waiting for pending
exceptions to clear.

\H{insFISUB} \i\c{FISUB}: Floating-Point/Integer Subtraction

\c FISUB mem16                   ; DE /4                [8086,FPU]
\c FISUB mem32                   ; DA /4                [8086,FPU]

\c FISUBR mem16                  ; DE /5                [8086,FPU]
\c FISUBR mem32                  ; DA /5                [8086,FPU]

\c{FISUB} subtracts the 16-bit or 32-bit integer stored in the given
memory location from \c{ST0}, and stores the result in \c{ST0}.
\c{FISUBR} does the subtraction the other way round, i.e. it
subtracts \c{ST0} from the given integer, but still stores the result
in \c{ST0}.

\H{insFLD} \i\c{FLD}: Floating-Point Load

\c FLD mem32                     ; D9 /0                [8086,FPU]
\c FLD mem64                     ; DD /0                [8086,FPU]
\c FLD mem80                     ; DB /5                [8086,FPU]
\c FLD fpureg                    ; D9 C0+r              [8086,FPU]

\c{FLD} loads a floating-point value out of the given register or
memory location, and pushes it on the FPU register stack.

\H{insFLD1} \i\c{FLDxx}: Floating-Point Load Constants

\c FLD1                          ; D9 E8                [8086,FPU]
\c FLDL2E                        ; D9 EA                [8086,FPU]
\c FLDL2T                        ; D9 E9                [8086,FPU]
\c FLDLG2                        ; D9 EC                [8086,FPU]
\c FLDLN2                        ; D9 ED                [8086,FPU]
\c FLDPI                         ; D9 EB                [8086,FPU]
\c FLDZ                          ; D9 EE                [8086,FPU]

These instructions push specific standard constants on the FPU
register stack. \c{FLD1} pushes the value 1; \c{FLDL2E} pushes the
base-2 logarithm of e; \c{FLDL2T} pushes the base-2 logarithm of 10;
\c{FLDLG2} pushes the base-10 logarithm of 2; \c{FLDLN2} pushes the
base-e logarithm of 2; \c{FLDPI} pushes pi; and \c{FLDZ} pushes zero.

\H{insFLDCW} \i\c{FLDCW}: Load Floating-Point Control Word

\c FLDCW mem16                   ; D9 /5                [8086,FPU]

\c{FLDCW} loads a 16-bit value out of memory and stores it into the
FPU control word (governing things like the rounding mode, the
precision, and the exception masks). See also \c{FSTCW}
(\k{insFSTCW}).

\H{insFLDENV} \i\c{FLDENV}: Load Floating-Point Environment

\c FLDENV mem                    ; D9 /4                [8086,FPU]

\c{FLDENV} loads the FPU operating environment (control word, status
word, tag word, instruction pointer, data pointer and last opcode)
from memory. The memory area is 14 or 28 bytes long, depending on the
CPU mode at the time. See also \c{FSTENV} (\k{insFSTENV}).

\H{insFMUL} \i\c{FMUL}, \i\c{FMULP}: Floating-Point Multiply

\c FMUL mem32                    ; D8 /1                [8086,FPU]
\c FMUL mem64                    ; DC /1                [8086,FPU]

\c FMUL fpureg                   ; D8 C8+r              [8086,FPU]
\c FMUL ST0,fpureg               ; D8 C8+r              [8086,FPU]

\c FMUL TO fpureg                ; DC C8+r              [8086,FPU]
\c FMUL fpureg,ST0               ; DC C8+r              [8086,FPU]

\c FMULP fpureg                  ; DE C8+r              [8086,FPU]
\c FMULP fpureg,ST0              ; DE C8+r              [8086,FPU]

\c{FMUL} multiplies \c{ST0} by the given operand, and stores the
result in \c{ST0}, unless the \c{TO} qualifier is used, in which case
it stores the result in the operand. \c{FMULP} performs the same
operation as \c{FMUL TO}, and then pops the register stack.

\H{insFNOP} \i\c{FNOP}: Floating-Point No Operation

\c FNOP                          ; D9 D0                [8086,FPU]

\c{FNOP} does nothing.

\H{insFPATAN} \i\c{FPATAN}, \i\c{FPTAN}: Arctangent and Tangent

\c FPATAN                        ; D9 F3                [8086,FPU]
\c FPTAN                         ; D9 F2                [8086,FPU]

\c{FPATAN} computes the arctangent, in radians, of the result of
dividing \c{ST1} by \c{ST0}, stores the result in \c{ST1}, and pops
the register stack. It works like the C \c{atan2} function, in that
changing the sign of both \c{ST0} and \c{ST1} changes the output
value by pi (so it performs true rectangular-to-polar coordinate
conversion, with \c{ST1} being the Y coordinate and \c{ST0} being the
X coordinate, not merely an arctangent).
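
As a sketch of the load order this implies, the Y coordinate has to be
pushed first so that it ends up in \c{ST1}. The labels \c{y}, \c{x}
and \c{angle} are hypothetical 64-bit memory variables.

\c         fld     qword [y]       ; ST0 = y
\c         fld     qword [x]       ; ST0 = x, ST1 = y
\c         fpatan                  ; pops once; ST0 = angle of (x,y)
\c         fstp    qword [angle]   ; store the angle in radians and pop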

\c{FPTAN} computes the tangent of the value in \c{ST0} (in radians),
stores the result back into \c{ST0}, and then pushes the constant 1.0
on the register stack, so that the tangent ends up in \c{ST1}.

\H{insFPREM} \i\c{FPREM}, \i\c{FPREM1}: Floating-Point Partial Remainder

\c FPREM                         ; D9 F8                [8086,FPU]
\c FPREM1                        ; D9 F5                [386,FPU]

These instructions both produce the remainder obtained by dividing
\c{ST0} by \c{ST1}. This is calculated, notionally, by dividing
\c{ST0} by \c{ST1}, rounding the result to an integer, multiplying by
\c{ST1} again, and computing the value which would need to be added
back on to the result to get back to the original value in \c{ST0}.

The two instructions differ in the way the notional round-to-integer
operation is performed. \c{FPREM} does it by rounding towards zero,
so that the remainder it returns always has the same sign as the
original value in \c{ST0}; \c{FPREM1} does it by rounding to the
nearest integer, so that the remainder always has at most half the
magnitude of \c{ST1}.

Both instructions calculate \e{partial} remainders, meaning that they
may not manage to provide the final result, but might leave
intermediate results in \c{ST0} instead. If this happens, they will
set the C2 flag in the FPU status word; therefore, to calculate a
remainder, you should repeatedly execute \c{FPREM} or \c{FPREM1}
until C2 becomes clear.
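
The retry loop described above might be sketched as follows. The
labels \c{divisor}, \c{value}, \c{rem} and \c{again} are hypothetical,
and \c{FNSTSW AX} assumes a 287 or later; C2 is bit 10 of the status
word, which is bit 2 of \c{AH} after the store.

\c         fld     qword [divisor] ; ST0 = divisor
\c         fld     qword [value]   ; ST0 = value, ST1 = divisor
\c again:  fprem                   ; ST0 = partial remainder
\c         fnstsw  ax              ; copy the FPU status word into AX
\c         test    ah,0x04         ; is C2 still set?
\c         jnz     again           ; yes: reduction is not complete
\c         fstp    qword [rem]     ; store the final remainder and pop
\c         fstp    st0             ; discard the divisor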

\H{insFRNDINT} \i\c{FRNDINT}: Floating-Point Round to Integer

\c FRNDINT                       ; D9 FC                [8086,FPU]

\c{FRNDINT} rounds the contents of \c{ST0} to an integer, according
to the current rounding mode set in the FPU control word, and stores
the result back in \c{ST0}.

\H{insFRSTOR} \i\c{FSAVE}, \i\c{FRSTOR}: Save/Restore Floating-Point State

\c FSAVE mem                     ; 9B DD /6             [8086,FPU]
\c FNSAVE mem                    ; DD /6                [8086,FPU]

\c FRSTOR mem                    ; DD /4                [8086,FPU]

\c{FSAVE} saves the entire floating-point unit state, including all
the information saved by \c{FSTENV} (\k{insFSTENV}) plus the contents
of all the registers, to a 94 or 108 byte area of memory (depending
on the CPU mode). \c{FRSTOR} restores the floating-point state from
the same area of memory.

\c{FNSAVE} does the same as \c{FSAVE}, without first waiting for
pending floating-point exceptions to clear.

\H{insFSCALE} \i\c{FSCALE}: Scale Floating-Point Value by Power of Two

\c FSCALE                        ; D9 FD                [8086,FPU]

\c{FSCALE} scales a number by a power of two: it rounds \c{ST1}
towards zero to obtain an integer, then multiplies \c{ST0} by two to
the power of that integer, and stores the result in \c{ST0}.
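
Because \c{FSCALE} does not pop the stack, the exponent stays behind
in \c{ST1} and usually has to be removed by hand. A minimal sketch,
with \c{n} as a hypothetical 32-bit integer variable and \c{x} as a
hypothetical 64-bit real variable:

\c         fild    dword [n]       ; ST0 = n
\c         fld     qword [x]       ; ST0 = x, ST1 = n
\c         fscale                  ; ST0 = x times 2 to the power n
\c         fstp    st1             ; copy result over n, then pop
\c                                 ; the scaled value is now in ST0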

\H{insFSETPM} \i\c{FSETPM}: Set Protected Mode

\c FSETPM                        ; DB E4                [286,FPU]

This instruction initialises protected mode on the 287 floating-point
coprocessor. It is only meaningful on that processor: the 387 and
above treat the instruction as a no-operation (NOP).

\H{insFSIN} \i\c{FSIN}, \i\c{FSINCOS}: Sine and Cosine

\c FSIN                          ; D9 FE                [386,FPU]
\c FSINCOS                       ; D9 FB                [386,FPU]

\c{FSIN} calculates the sine of \c{ST0} (in radians) and stores the
result in \c{ST0}. \c{FSINCOS} does the same, but then pushes the
cosine of the same value on the register stack, so that the sine ends
up in \c{ST1} and the cosine in \c{ST0}. \c{FSINCOS} is faster than
executing \c{FSIN} and \c{FCOS} (see \k{insFCOS}) in succession.

\H{insFSQRT} \i\c{FSQRT}: Floating-Point Square Root

\c FSQRT                         ; D9 FA                [8086,FPU]

\c{FSQRT} calculates the square root of \c{ST0} and stores the result
in \c{ST0}.

\H{insFST} \i\c{FST}, \i\c{FSTP}: Floating-Point Store

\c FST mem32                     ; D9 /2                [8086,FPU]
\c FST mem64                     ; DD /2                [8086,FPU]
\c FST fpureg                    ; DD D0+r              [8086,FPU]

\c FSTP mem32                    ; D9 /3                [8086,FPU]
\c FSTP mem64                    ; DD /3                [8086,FPU]
\c FSTP mem80                    ; DB /7                [8086,FPU]
\c FSTP fpureg                   ; DD D8+r              [8086,FPU]

\c{FST} stores the value in \c{ST0} into the given memory location or
other FPU register. \c{FSTP} does the same, but then pops the
register stack.

\H{insFSTCW} \i\c{FSTCW}: Store Floating-Point Control Word

\c FSTCW mem16                   ; 9B D9 /7             [8086,FPU]
\c FNSTCW mem16                  ; D9 /7                [8086,FPU]

\c{FSTCW} stores the FPU control word (governing things like the
rounding mode, the precision, and the exception masks) into a 2-byte
memory area. See also \c{FLDCW} (\k{insFLDCW}).

\c{FNSTCW} does the same thing as \c{FSTCW}, without first waiting
for pending floating-point exceptions to clear.
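
A typical use of the store/load pair is to change the rounding mode
temporarily. The sketch below assumes the usual control word layout,
in which bits 10 and 11 form the rounding-control field and the value
11 (binary) selects rounding towards zero. The labels \c{oldcw},
\c{newcw} (16-bit variables) and \c{result} (a 32-bit variable) are
hypothetical.

\c         fnstcw  [oldcw]         ; save the current control word
\c         mov     ax,[oldcw]
\c         or      ax,0x0C00       ; rounding control = towards zero
\c         mov     [newcw],ax
\c         fldcw   [newcw]         ; activate the modified word
\c         fistp   dword [result]  ; store ST0 truncated, then pop
\c         fldcw   [oldcw]         ; restore the original settings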

\H{insFSTENV} \i\c{FSTENV}: Store Floating-Point Environment

\c FSTENV mem                    ; 9B D9 /6             [8086,FPU]
\c FNSTENV mem                   ; D9 /6                [8086,FPU]

\c{FSTENV} stores the FPU operating environment (control word, status
word, tag word, instruction pointer, data pointer and last opcode)
into memory. The memory area is 14 or 28 bytes long, depending on the
CPU mode at the time. See also \c{FLDENV} (\k{insFLDENV}).

\c{FNSTENV} does the same thing as \c{FSTENV}, without first waiting
for pending floating-point exceptions to clear.

\H{insFSTSW} \i\c{FSTSW}: Store Floating-Point Status Word

\c FSTSW mem16                   ; 9B DD /7             [8086,FPU]
\c FSTSW AX                      ; 9B DF E0             [286,FPU]

\c FNSTSW mem16                  ; DD /7                [8086,FPU]
\c FNSTSW AX                     ; DF E0                [286,FPU]

\c{FSTSW} stores the FPU status word into \c{AX} or into a 2-byte
memory area.

\c{FNSTSW} does the same thing as \c{FSTSW}, without first waiting
for pending floating-point exceptions to clear.

\H{insFSUB} \i\c{FSUB}, \i\c{FSUBP}, \i\c{FSUBR}, \i\c{FSUBRP}: Floating-Point Subtract

\c FSUB mem32                    ; D8 /4                [8086,FPU]
\c FSUB mem64                    ; DC /4                [8086,FPU]

\c FSUB fpureg                   ; D8 E0+r              [8086,FPU]
\c FSUB ST0,fpureg               ; D8 E0+r              [8086,FPU]

\c FSUB TO fpureg                ; DC E8+r              [8086,FPU]
\c FSUB fpureg,ST0               ; DC E8+r              [8086,FPU]

\c FSUBR mem32                   ; D8 /5                [8086,FPU]
\c FSUBR mem64                   ; DC /5                [8086,FPU]

\c FSUBR fpureg                  ; D8 E8+r              [8086,FPU]
\c FSUBR ST0,fpureg              ; D8 E8+r              [8086,FPU]

\c FSUBR TO fpureg               ; DC E0+r              [8086,FPU]
\c FSUBR fpureg,ST0              ; DC E0+r              [8086,FPU]

\c FSUBP fpureg                  ; DE E8+r              [8086,FPU]
\c FSUBP fpureg,ST0              ; DE E8+r              [8086,FPU]

\c FSUBRP fpureg                 ; DE E0+r              [8086,FPU]
\c FSUBRP fpureg,ST0             ; DE E0+r              [8086,FPU]

\c{FSUB} subtracts the given operand from \c{ST0} and stores the
result back in \c{ST0}, unless the \c{TO} qualifier is given, in
which case it subtracts \c{ST0} from the given operand and stores the
result in the operand.

\c{FSUBR} does the same thing, but does the subtraction the other way
round: so if \c{TO} is not given, it subtracts \c{ST0} from the given
operand and stores the result in \c{ST0}, whereas if \c{TO} is given
it subtracts its operand from \c{ST0} and stores the result in the
operand.

\c{FSUBP} operates like \c{FSUB TO}, but pops the register stack once
it has finished. \c{FSUBRP} operates like \c{FSUBR TO}, but pops the
register stack once it has finished.

\H{insFTST} \i\c{FTST}: Test \c{ST0} Against Zero

\c FTST                          ; D9 E4                [8086,FPU]

\c{FTST} compares \c{ST0} with zero and sets the FPU flags
accordingly. \c{ST0} is treated as the left-hand side of the
comparison, so that a 'less-than' result is generated if \c{ST0} is
negative.

\H{insFUCOM} \i\c{FUCOMxx}: Floating-Point Unordered Compare

\c FUCOM fpureg                  ; DD E0+r              [386,FPU]
\c FUCOM ST0,fpureg              ; DD E0+r              [386,FPU]

\c FUCOMP fpureg                 ; DD E8+r              [386,FPU]
\c FUCOMP ST0,fpureg             ; DD E8+r              [386,FPU]

\c FUCOMPP                       ; DA E9                [386,FPU]

\c FUCOMI fpureg                 ; DB E8+r              [P6,FPU]
\c FUCOMI ST0,fpureg             ; DB E8+r              [P6,FPU]

\c FUCOMIP fpureg                ; DF E8+r              [P6,FPU]
\c FUCOMIP ST0,fpureg            ; DF E8+r              [P6,FPU]

\c{FUCOM} compares \c{ST0} with the given operand, and sets the FPU
flags accordingly. \c{ST0} is treated as the left-hand side of the
comparison, so that the C0 condition flag is set (a 'less-than'
result, which becomes the carry flag once the FPU flags are copied to
the CPU flags) if \c{ST0} is less than the given operand.

\c{FUCOMP} does the same as \c{FUCOM}, but pops the register stack
afterwards. \c{FUCOMPP} compares \c{ST0} with \c{ST1} and then pops
the register stack twice.

\c{FUCOMI} and \c{FUCOMIP} work like the corresponding forms of
\c{FUCOM} and \c{FUCOMP}, but write their results directly to the CPU
flags register rather than the FPU status word, so they can be
immediately followed by conditional jump or conditional move
instructions.

The \c{FUCOM} instructions differ from the \c{FCOM} instructions
(\k{insFCOM}) only in the way they handle quiet NaNs (Not a Number):
\c{FUCOM} will handle them silently and set the condition code flags
to an 'unordered' result, whereas \c{FCOM} will generate an exception.

\H{insFXAM} \i\c{FXAM}: Examine Class of Value in \c{ST0}

\c FXAM                          ; D9 E5                [8086,FPU]

\c{FXAM} sets the FPU flags C3, C2 and C0 depending on the type of
value stored in \c{ST0}: 000 (respectively) for an unsupported
format, 001 for a NaN (Not a Number), 010 for a normal finite number,
011 for an infinity, 100 for a zero, 101 for an empty register, and
110 for a denormal. It also sets the C1 flag to the sign of the
number.
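
A sketch of how the class code might be decoded, here testing for a
NaN. After \c{FNSTSW AX} (287 or later), C3, C2 and C0 sit in bits 6,
2 and 0 of \c{AH}, so the mask is 0x45. The label \c{is_nan} is
hypothetical.

\c         fxam                    ; classify the value in ST0
\c         fnstsw  ax              ; copy the FPU status word into AX
\c         and     ah,0x45         ; keep only C3, C2 and C0
\c         cmp     ah,0x01         ; C3,C2,C0 = 001 means NaN
\c         je      is_nan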

\H{insFXCH} \i\c{FXCH}: Floating-Point Exchange

\c FXCH                          ; D9 C9                [8086,FPU]
\c FXCH fpureg                   ; D9 C8+r              [8086,FPU]
\c FXCH fpureg,ST0               ; D9 C8+r              [8086,FPU]
\c FXCH ST0,fpureg               ; D9 C8+r              [8086,FPU]

\c{FXCH} exchanges \c{ST0} with a given FPU register. The no-operand
form exchanges \c{ST0} with \c{ST1}.

\H{insFXTRACT} \i\c{FXTRACT}: Extract Exponent and Significand

\c FXTRACT                       ; D9 F4                [8086,FPU]

\c{FXTRACT} separates the number in \c{ST0} into its exponent and
significand (mantissa), stores the exponent back into \c{ST0}, and
then pushes the significand on the register stack (so that the
significand ends up in \c{ST0}, and the exponent in \c{ST1}).

\H{insFYL2X} \i\c{FYL2X}, \i\c{FYL2XP1}: Compute Y times Log2(X) or Log2(X+1)

\c FYL2X                         ; D9 F1                [8086,FPU]
\c FYL2XP1                       ; D9 F9                [8086,FPU]

\c{FYL2X} multiplies \c{ST1} by the base-2 logarithm of \c{ST0},
stores the result in \c{ST1}, and pops the register stack (so that
the result ends up in \c{ST0}). \c{ST0} must be non-zero and positive.
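
The multiplier in \c{ST1} makes it easy to obtain logarithms to other
bases. As a sketch, loading the natural logarithm of 2 first yields
the natural logarithm of the argument; \c{x} and \c{lnx} are
hypothetical 64-bit memory variables, and \c{x} is assumed to be
positive.

\c         fldln2                  ; ST0 = ln(2)
\c         fld     qword [x]       ; ST0 = x, ST1 = ln(2)
\c         fyl2x                   ; ST0 = ln(2) * log2(x) = ln(x)
\c         fstp    qword [lnx]     ; store the result and pop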

\c{FYL2XP1} works the same way, but replaces the base-2 logarithm of
\c{ST0} with that of \c{ST0} plus one. This time, \c{ST0} must have
magnitude no greater than 1 minus half the square root of two.

\H{insHLT} \i\c{HLT}: Halt Processor

\c HLT                           ; F4                   [8086]

\c{HLT} puts the processor into a halted state, where it will perform
no more operations until restarted by an interrupt or a reset.

\H{insIBTS} \i\c{IBTS}: Insert Bit String

\c IBTS r/m16,reg16              ; o16 0F A7 /r         [386,UNDOC]
\c IBTS r/m32,reg32              ; o32 0F A7 /r         [386,UNDOC]

No clear documentation seems to be available for this instruction:
the best I've been able to find reads 'Takes a string of bits from
the second operand and puts them in the first operand'. It is present
only in early 386 processors, and conflicts with the opcodes for
\c{CMPXCHG486}. NASM supports it only for completeness. Its
counterpart is \c{XBTS} (see \k{insXBTS}).

\H{insIDIV} \i\c{IDIV}: Signed Integer Divide

\c IDIV r/m8                     ; F6 /7                [8086]
\c IDIV r/m16                    ; o16 F7 /7            [8086]
\c IDIV r/m32                    ; o32 F7 /7            [386]

\c{IDIV} performs signed integer division. The explicit operand
provided is the divisor; the dividend and destination operands are
implicit, in the following way:

\b For \c{IDIV r/m8}, \c{AX} is divided by the given operand;
the quotient is stored in \c{AL} and the remainder in \c{AH}.

\b For \c{IDIV r/m16}, \c{DX:AX} is divided by the given operand;
the quotient is stored in \c{AX} and the remainder in \c{DX}.

\b For \c{IDIV r/m32}, \c{EDX:EAX} is divided by the given operand;
the quotient is stored in \c{EAX} and the remainder in \c{EDX}.

Unsigned integer division is performed by the \c{DIV} instruction:
see \k{insDIV}.
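
Since the dividend is twice as wide as the divisor, its upper half
must be prepared before \c{IDIV} is executed. A minimal sketch for the
32-bit form, using \c{CDQ} to sign-extend \c{EAX} into \c{EDX:EAX};
the labels \c{dividend}, \c{divisor}, \c{quot} and \c{rest} are
hypothetical 32-bit memory variables.

\c         mov     eax,[dividend]  ; low half of the dividend
\c         cdq                     ; sign-extend EAX into EDX:EAX
\c         idiv    dword [divisor] ; EAX = quotient, EDX = remainder
\c         mov     [quot],eax
\c         mov     [rest],edx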

\H{insIMUL} \i\c{IMUL}: Signed Integer Multiply

\c IMUL r/m8                     ; F6 /5                [8086]
\c IMUL r/m16                    ; o16 F7 /5            [8086]
\c IMUL r/m32                    ; o32 F7 /5            [386]

\c IMUL reg16,r/m16              ; o16 0F AF /r         [386]
\c IMUL reg32,r/m32              ; o32 0F AF /r         [386]

\c IMUL reg16,imm8               ; o16 6B /r ib         [286]
\c IMUL reg16,imm16              ; o16 69 /r iw         [286]
\c IMUL reg32,imm8               ; o32 6B /r ib         [386]
\c IMUL reg32,imm32              ; o32 69 /r id         [386]

\c IMUL reg16,r/m16,imm8         ; o16 6B /r ib         [286]
\c IMUL reg16,r/m16,imm16        ; o16 69 /r iw         [286]
\c IMUL reg32,r/m32,imm8         ; o32 6B /r ib         [386]
\c IMUL reg32,r/m32,imm32        ; o32 69 /r id         [386]

\c{IMUL} performs signed integer multiplication. For the
single-operand form, the other operand and destination are implicit,
in the following way:

\b For \c{IMUL r/m8}, \c{AL} is multiplied by the given operand;
the product is stored in \c{AX}.

\b For \c{IMUL r/m16}, \c{AX} is multiplied by the given operand;
the product is stored in \c{DX:AX}.

\b For \c{IMUL r/m32}, \c{EAX} is multiplied by the given operand;
the product is stored in \c{EDX:EAX}.

The two-operand form multiplies its two operands and stores the
result in the destination (first) operand. The three-operand form
multiplies its last two operands and stores the result in the first
operand.

The two-operand form with an immediate second operand is in fact a
shorthand for the three-operand form, as can be seen by examining the
opcode descriptions: in the two-operand form, the code \c{/r} takes
both its register and \c{r/m} parts from the same operand (the first
one).

In the forms with an 8-bit immediate operand and another longer
source operand, the immediate operand is considered to be signed, and
is sign-extended to the length of the other source operand. In these
cases, the \c{BYTE} qualifier is necessary to force NASM to generate
this form of the instruction.

Unsigned integer multiplication is performed by the \c{MUL}
instruction: see \k{insMUL}.
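
The different forms might be used as in the following sketch. The
register choices and constants are arbitrary illustrations, and the
fragment assumes a 386 or later.

\c         imul    ebx             ; EDX:EAX = EAX * EBX
\c         imul    eax,ebx         ; EAX = EAX * EBX
\c         imul    eax,ebx,1000    ; EAX = EBX * 1000
\c         imul    eax,ebx,byte 10 ; EAX = EBX * 10, 8-bit immediate
\c         imul    eax,byte 10     ; EAX = EAX * 10, 8-bit immediate

Only the single-operand form keeps the full double-width product; the
other forms truncate the result to the size of the destination
register.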

\H{insIN} \i\c{IN}: Input from I/O Port

\c IN AL,imm8                    ; E4 ib                [8086]
\c IN AX,imm8                    ; o16 E5 ib            [8086]
\c IN EAX,imm8                   ; o32 E5 ib            [386]
\c IN AL,DX                      ; EC                   [8086]
\c IN AX,DX                      ; o16 ED               [8086]
\c IN EAX,DX                     ; o32 ED               [386]

\c{IN} reads a byte, word or doubleword from the specified I/O port,
and stores it in the given destination register. The port number may
be specified as an immediate value if it is between 0 and 255, and
otherwise must be stored in \c{DX}. See also \c{OUT} (\k{insOUT}).
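
A brief sketch of both addressing variants; the port numbers 0x60 and
0x3F8 are arbitrary example values chosen to fall on either side of
the 255 limit, not ports this manual says anything about.

\c         in      al,0x60         ; port fits in 8 bits: immediate
\c         mov     dx,0x3F8        ; larger port number goes in DX
\c         in      al,dx           ; read one byte from port DX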

\H{insINC} \i\c{INC}: Increment Integer

\c INC reg16                     ; o16 40+r             [8086]
\c INC reg32                     ; o32 40+r             [386]
\c INC r/m8                      ; FE /0                [8086]
\c INC r/m16                     ; o16 FF /0            [8086]
\c INC r/m32                     ; o32 FF /0            [386]

\c{INC} adds 1 to its operand. It does \e{not} affect the carry flag:
to affect the carry flag, use \c{ADD something,1} (see \k{insADD}).
See also \c{DEC} (\k{insDEC}).

\H{insINSB} \i\c{INSB}, \i\c{INSW}, \i\c{INSD}: Input String from I/O Port

\c INSB                          ; 6C                   [186]
\c INSW                          ; o16 6D               [186]
\c INSD                          ; o32 6D               [386]

\c{INSB} inputs a byte from the I/O port specified in \c{DX} and
stores it at \c{[ES:DI]} or \c{[ES:EDI]}. It then increments or
decrements (depending on the direction flag: increments if the flag
is clear, decrements if it is set) \c{DI} or \c{EDI}.

The register used is \c{DI} if the address size is 16 bits, and
\c{EDI} if it is 32 bits. If you need to use an address size not
equal to the current \c{BITS} setting, you can use an explicit
\i\c{a16} or \i\c{a32} prefix.

Segment override prefixes have no effect for this instruction: the
use of \c{ES} for the store to \c{[DI]} or \c{[EDI]} cannot be
overridden.

\c{INSW} and \c{INSD} work in the same way, but they input a word or
a doubleword instead of a byte, and increment or decrement the
addressing register by 2 or 4 instead of 1.

The \c{REP} prefix may be used to repeat the instruction \c{CX} (or
\c{ECX} - again, the address size chooses which) times.

See also \c{OUTSB}, \c{OUTSW} and \c{OUTSD} (\k{insOUTSB}).
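
A minimal sketch of a block read with \c{REP INSW}, assuming
\c{BITS 32}, that \c{ES} already addresses the destination, and that
\c{buffer} is a hypothetical label reserving at least 512 bytes. The
port number \c{0x1F0} is likewise only an example.

\c         cld                     ; DF = 0, so EDI counts upwards
\c         mov     edi,buffer      ; destination is always ES:EDI
\c         mov     ecx,256         ; number of words to transfer
\c         mov     dx,0x1F0        ; source port (assumed)
\c         rep insw                ; reads 256 words, EDI advances by 512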

\H{insINT} \i\c{INT}: Software Interrupt

\c INT imm8                      ; CD ib                [8086]

\c{INT} causes a software interrupt through a specified vector
number from 0 to 255.

The code generated by the \c{INT} instruction is always two bytes
long: although there are short forms for some \c{INT} instructions,
NASM does not generate them when it sees the \c{INT} mnemonic. In
order to generate single-byte breakpoint instructions, use the
\c{INT3} or \c{INT1} instructions (see \k{insINT1}) instead.
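
The encodings listed in this section and in \k{insINT1} give the
following byte sequences for the two spellings of a breakpoint:

\c         int     3               ; assembles to CD 03 (two bytes)
\c         int3                    ; assembles to CC (one byte)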

\H{insINT1} \i\c{INT3}, \i\c{INT1}, \i\c{ICEBP}, \i\c{INT01}: Breakpoints

\c INT1                          ; F1                   [P6]
\c ICEBP                         ; F1                   [P6]
\c INT01                         ; F1                   [P6]

\c INT3                          ; CC                   [8086]

\c{INT1} and \c{INT3} are short one-byte forms of the instructions
\c{INT 1} and \c{INT 3} (see \k{insINT}). They perform a similar
function to their longer counterparts, but take up less code space.
They are used as breakpoints by debuggers.

\c{INT1}, and its alternative synonyms \c{INT01} and \c{ICEBP}, is
an instruction used by in-circuit emulators (ICEs). It is present,
though not documented, on some processors down to the 286, but is
only documented for the Pentium Pro. \c{INT3} is the instruction
normally used as a breakpoint by debuggers.

\c{INT3} is not precisely equivalent to \c{INT 3}: the short form,
since it is designed to be used as a breakpoint, bypasses the normal
IOPL checks in virtual-8086 mode, and also does not go through
interrupt redirection.

\H{insINTO} \i\c{INTO}: Interrupt if Overflow

\c INTO                          ; CE                   [8086]

\c{INTO} performs an \c{INT 4} software interrupt (see \k{insINT})
if and only if the overflow flag is set.

\H{insINVD} \i\c{INVD}: Invalidate Internal Caches

\c INVD                          ; 0F 08                [486]

\c{INVD} invalidates and empties the processor's internal caches,
and causes the processor to instruct external caches to do the same.
It does not write the contents of the caches back to memory first:
any modified data held in the caches will be lost. To write the data
back first, use \c{WBINVD} (\k{insWBINVD}).

\H{insINVLPG} \i\c{INVLPG}: Invalidate TLB Entry

\c INVLPG mem                    ; 0F 01 /0             [486]

\c{INVLPG} invalidates the translation lookaside buffer (TLB) entry
associated with the supplied memory address.

\H{insIRET} \i\c{IRET}, \i\c{IRETW}, \i\c{IRETD}: Return from Interrupt

\c IRET                          ; CF                   [8086]
\c IRETW                         ; o16 CF               [8086]
\c IRETD                         ; o32 CF               [386]

\c{IRET} returns from an interrupt (hardware or software) by means
of popping \c{IP} (or \c{EIP}), \c{CS} and the flags off the stack
and then continuing execution from the new \c{CS:IP}.

\c{IRETW} pops \c{IP}, \c{CS} and the flags as 2 bytes each, taking
6 bytes off the stack in total. \c{IRETD} pops \c{EIP} as 4 bytes,
pops a further 4 bytes of which the top two are discarded and the
bottom two go into \c{CS}, and pops the flags as 4 bytes as well,
taking 12 bytes off the stack.

\c{IRET} is a shorthand for either \c{IRETW} or \c{IRETD}, depending
on the default \c{BITS} setting at the time.

\H{insJCXZ} \i\c{JCXZ}, \i\c{JECXZ}: Jump if CX/ECX Zero

\c JCXZ imm                      ; a16 E3 rb            [8086]
\c JECXZ imm                     ; a32 E3 rb            [386]

\c{JCXZ} performs a short jump (with maximum range 128 bytes) if and
only if the contents of the \c{CX} register is 0. \c{JECXZ} does the
same thing, but with \c{ECX}.
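
A common use is to guard a \c{LOOP} (see \k{insLOOP}) against a zero
count, since \c{LOOP} decrements before it tests and would otherwise
run 65536 times. The sketch below assumes \c{BITS 16}; \c{count} and
\c{process_item} are hypothetical names, and the routine is assumed
to preserve \c{CX}.

\c         mov     cx,[count]
\c         jcxz    skip            ; CX = 0: do not enter the loop
\c next:   call    process_item
\c         loop    next
\c skip: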

\H{insJMP} \i\c{JMP}: Jump

\c JMP imm                       ; E9 rw/rd             [8086]
\c JMP SHORT imm                 ; EB rb                [8086]
\c JMP imm:imm16                 ; o16 EA iw iw         [8086]
\c JMP imm:imm32                 ; o32 EA id iw         [386]
\c JMP FAR mem                   ; o16 FF /5            [8086]
\c JMP FAR mem                   ; o32 FF /5            [386]
\c JMP r/m16                     ; o16 FF /4            [8086]
\c JMP r/m32                     ; o32 FF /4            [386]

\c{JMP} jumps to a given address. The address may be specified as an
absolute segment and offset, or as a relative jump within the
current segment.

\c{JMP SHORT imm} has a maximum range of 128 bytes, since the
displacement is specified as only 8 bits, but takes up less code
space. NASM does not choose when to generate \c{JMP SHORT} for you:
you must explicitly code \c{SHORT} every time you want a short jump.

You can choose between the two immediate \i{far jump} forms (\c{JMP
imm:imm}) by the use of the \c{WORD} and \c{DWORD} keywords: \c{JMP
WORD 0x1234:0x5678} or \c{JMP DWORD 0x1234:0x56789abc}.

The \c{JMP FAR mem} forms execute a far jump by loading the
destination address out of memory. The address loaded consists of 16
or 32 bits of offset (depending on the operand size), and 16 bits of
segment. The operand size may be overridden using \c{JMP WORD FAR
mem} or \c{JMP DWORD FAR mem}.

The \c{JMP r/m} forms execute a \i{near jump} (within the same
segment), loading the destination address out of memory or out of a
register. The keyword \c{NEAR} may be specified, for clarity, in
these forms, but is not necessary. Again, operand size can be
overridden using \c{JMP WORD mem} or \c{JMP DWORD mem}.

As a convenience, NASM does not require you to jump to a far symbol
by coding the cumbersome \c{JMP SEG routine:routine}, but instead
allows the easier synonym \c{JMP FAR routine}.

The \c{JMP r/m} forms given above are near jumps; NASM will accept
the \c{NEAR} keyword (e.g. \c{JMP NEAR [address]}), even though it
is not strictly necessary.
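
The forms can be put side by side as follows. This is only a sketch
assuming \c{BITS 32}; the labels \c{nearby}, \c{elsewhere},
\c{table} and \c{farptr} are hypothetical.

\c         jmp     short nearby    ; EB rb: 8-bit displacement
\c         jmp     elsewhere       ; E9 rw/rd: near relative jump
\c         jmp     ebx             ; FF /4: near jump through a register
\c         jmp     [table+eax*4]   ; FF /4: near jump through memory
\c         jmp     far [farptr]    ; FF /5: far jump through memory
\c         jmp     0x1234:0x5678   ; EA: far jump to immediate seg:offset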

\H{insJcc} \i\c{Jcc}: Conditional Branch

\c Jcc imm                       ; 70+cc rb             [8086]
\c Jcc NEAR imm                  ; 0F 80+cc rw/rd       [386]

The \i{conditional jump} instructions execute a near (same segment)
jump if and only if their conditions are satisfied. For example,
\c{JNZ} jumps only if the zero flag is not set.

The ordinary form of the instructions has only a 128-byte range; the
\c{NEAR} form is a 386 extension to the instruction set, and can
span the full size of a segment. NASM will not override your choice
of jump instruction: if you want \c{Jcc NEAR}, you have to use the
\c{NEAR} keyword.

The \c{SHORT} keyword is allowed on the first form of the
instruction, for clarity, but is not necessary.
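
For instance, with the condition code for \c{JNE} the two forms come
out as shown below. The labels \c{different} and \c{far_away} are
hypothetical.

\c         cmp     eax,ebx
\c         jne     different       ; 75 rb: range limited to 128 bytes
\c         jne     near far_away   ; 0F 85 rw/rd: anywhere in the segment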

\H{insLAHF} \i\c{LAHF}: Load AH from Flags

\c LAHF                          ; 9F                   [8086]

\c{LAHF} sets the \c{AH} register according to the contents of the
low byte of the flags word. See also \c{SAHF} (\k{insSAHF}).

\H{insLAR} \i\c{LAR}: Load Access Rights

\c LAR reg16,r/m16               ; o16 0F 02 /r         [286,PRIV]
\c LAR reg32,r/m32               ; o32 0F 02 /r         [386,PRIV]

\c{LAR} takes the segment selector specified by its source (second)
operand, finds the corresponding segment descriptor in the GDT or
LDT, and loads the access-rights byte of the descriptor into its
destination (first) operand.

\H{insLDS} \i\c{LDS}, \i\c{LES}, \i\c{LFS}, \i\c{LGS}, \i\c{LSS}: Load Far Pointer

\c LDS reg16,mem                 ; o16 C5 /r            [8086]
\c LDS reg32,mem                 ; o32 C5 /r            [386]

\c LES reg16,mem                 ; o16 C4 /r            [8086]
\c LES reg32,mem                 ; o32 C4 /r            [386]

\c LFS reg16,mem                 ; o16 0F B4 /r         [386]
\c LFS reg32,mem                 ; o32 0F B4 /r         [386]

\c LGS reg16,mem                 ; o16 0F B5 /r         [386]
\c LGS reg32,mem                 ; o32 0F B5 /r         [386]

\c LSS reg16,mem                 ; o16 0F B2 /r         [386]
\c LSS reg32,mem                 ; o32 0F B2 /r         [386]

These instructions load an entire far pointer (16 or 32 bits of
offset, plus 16 bits of segment) out of memory in one go. \c{LDS},
for example, loads 16 or 32 bits from the given memory address into
the given register (depending on the size of the register), then
loads the \e{next} 16 bits from memory into \c{DS}. \c{LES},
\c{LFS}, \c{LGS} and \c{LSS} work in the same way but use the other
segment registers.
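
A small sketch of the memory layout these instructions expect,
assuming \c{BITS 16}: the offset comes first, the segment directly
after it. The label \c{farptr} and the values are made up for
illustration.

\c farptr: dw      0x5678          ; offset part
\c         dw      0x1234          ; segment part
\c
\c         lds     si,[farptr]     ; SI = 0x5678, DS = 0x1234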

\H{insLEA} \i\c{LEA}: Load Effective Address

\c LEA reg16,mem                 ; o16 8D /r            [8086]
\c LEA reg32,mem                 ; o32 8D /r            [386]

\c{LEA}, despite its syntax, does not access memory. It calculates
the effective address specified by its second operand as if it were
going to load or store data from it, but instead it stores the
calculated address into the register specified by its first operand.
This can be used to perform quite complex calculations (e.g. \c{LEA
EAX,[EBX+ECX*4+100]}) in one instruction.

\c{LEA}, despite being a purely arithmetic instruction which
accesses no memory, still requires square brackets around its second
operand, as if it were a memory reference.
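
Two short illustrations of \c{LEA} used as arithmetic; the registers
are arbitrary choices.

\c         lea     eax,[ebx+ecx*4+100] ; EAX = EBX + ECX*4 + 100
\c         lea     eax,[eax+eax*4]     ; EAX = EAX * 5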

\H{insLEAVE} \i\c{LEAVE}: Destroy Stack Frame

\c LEAVE                         ; C9                   [186]

\c{LEAVE} destroys a stack frame of the form created by the
\c{ENTER} instruction (see \k{insENTER}). It is functionally
equivalent to \c{MOV ESP,EBP} followed by \c{POP EBP} (or \c{MOV
SP,BP} followed by \c{POP BP} in 16-bit mode).
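
A sketch of a 32-bit routine that builds its frame by hand and tears
it down with \c{LEAVE}. The label \c{myfunc} and the frame size are
assumptions for the example.

\c myfunc: push    ebp             ; set up the frame by hand
\c         mov     ebp,esp
\c         sub     esp,8           ; room for two local doublewords
\c         mov     dword [ebp-4],0 ; use one of the locals
\c         leave                   ; same as MOV ESP,EBP then POP EBP
\c         ret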

\H{insLGDT} \i\c{LGDT}, \i\c{LIDT}, \i\c{LLDT}: Load Descriptor Tables

\c LGDT mem                      ; 0F 01 /2             [286,PRIV]
\c LIDT mem                      ; 0F 01 /3             [286,PRIV]
\c LLDT r/m16                    ; 0F 00 /2             [286,PRIV]

\c{LGDT} and \c{LIDT} both take a 6-byte memory area as an operand:
they load a 32-bit linear address and a 16-bit size limit from that
area (in the opposite order) into the GDTR (global descriptor table
register) or IDTR (interrupt descriptor table register). These are
the only instructions which directly use \e{linear} addresses,
rather than segment/offset pairs.
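
The 6-byte operand can be laid out as in the following sketch. It
assumes a flat setup in which the symbol \c{gdt} already equals the
table's linear address; all labels are hypothetical. The limit field
holds the table size in bytes minus one.

\c gdt:    dq      0                   ; null descriptor
\c         dq      0x00CF9A000000FFFF  ; flat 32-bit code segment
\c         dq      0x00CF92000000FFFF  ; flat 32-bit data segment
\c gdt_end:
\c
\c gdtr:   dw      gdt_end - gdt - 1   ; 16-bit limit comes first
\c         dd      gdt                 ; 32-bit linear base follows
\c
\c         lgdt    [gdtr]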

\c{LLDT} takes a segment selector as an operand. The processor looks
up that selector in the GDT and stores the limit and base address
given there into the LDTR (local descriptor table register).

See also \c{SGDT}, \c{SIDT} and \c{SLDT} (\k{insSGDT}).

\H{insLMSW} \i\c{LMSW}: Load Machine Status Word

\c LMSW r/m16                    ; 0F 01 /6             [286,PRIV]

\c{LMSW} loads the bottom four bits of the source operand into the
bottom four bits of the \c{CR0} control register (or the Machine
Status Word, on 286 processors). See also \c{SMSW} (\k{insSMSW}).

\H{insLOADALL} \i\c{LOADALL}, \i\c{LOADALL286}: Load Processor State

\c LOADALL                       ; 0F 07                [386,UNDOC]
\c LOADALL286                    ; 0F 05                [286,UNDOC]

This instruction, in its two different-opcode forms, is apparently
supported on most 286 processors, some 386 and possibly some 486.
The opcode differs between the 286 and the 386.

The function of the instruction is to load all information relating
to the state of the processor out of a block of memory: on the 286,
this block is located implicitly at absolute address \c{0x800}, and
on the 386 and 486 it is at \c{[ES:EDI]}.

\H{insLODSB} \i\c{LODSB}, \i\c{LODSW}, \i\c{LODSD}: Load from String

\c LODSB                         ; AC                   [8086]
\c LODSW                         ; o16 AD               [8086]
\c LODSD                         ; o32 AD               [386]

\c{LODSB} loads a byte from \c{[DS:SI]} or \c{[DS:ESI]} into \c{AL}.
It then increments or decrements (depending on the direction flag:
increments if the flag is clear, decrements if it is set) \c{SI} or
\c{ESI}.

The register used is \c{SI} if the address size is 16 bits, and
\c{ESI} if it is 32 bits. If you need to use an address size not
equal to the current \c{BITS} setting, you can use an explicit
\i\c{a16} or \i\c{a32} prefix.

The segment register used to load from \c{[SI]} or \c{[ESI]} can be
overridden by using a segment register name as a prefix (for
example, \c{es lodsb}).

\c{LODSW} and \c{LODSD} work in the same way, but they load a
word or a doubleword instead of a byte, and increment or decrement
the addressing registers by 2 or 4 instead of 1.
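
As an illustration, the loop below walks a zero-terminated string
with \c{LODSB}. It assumes \c{BITS 32} and a hypothetical label
\c{string}; when it finishes, \c{ESI} points one byte past the
terminating zero.

\c         cld                     ; DF = 0, so ESI counts upwards
\c         mov     esi,string
\c next:   lodsb                   ; AL = [DS:ESI], then ESI = ESI + 1
\c         test    al,al
\c         jnz     next            ; stop once the zero byte is loaded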

\H{insLOOP} \i\c{LOOP}, \i\c{LOOPE}, \i\c{LOOPZ}, \i\c{LOOPNE}, \i\c{LOOPNZ}: Loop with Counter

\c LOOP imm                      ; E2 rb                [8086]
\c LOOP imm,CX                   ; a16 E2 rb            [8086]
\c LOOP imm,ECX                  ; a32 E2 rb            [386]

\c LOOPE imm                     ; E1 rb                [8086]
\c LOOPE imm,CX                  ; a16 E1 rb            [8086]
\c LOOPE imm,ECX                 ; a32 E1 rb            [386]
\c LOOPZ imm                     ; E1 rb                [8086]
\c LOOPZ imm,CX                  ; a16 E1 rb            [8086]
\c LOOPZ imm,ECX                 ; a32 E1 rb            [386]

\c LOOPNE imm                    ; E0 rb                [8086]
\c LOOPNE imm,CX                 ; a16 E0 rb            [8086]
\c LOOPNE imm,ECX                ; a32 E0 rb            [386]
\c LOOPNZ imm                    ; E0 rb                [8086]
\c LOOPNZ imm,CX                 ; a16 E0 rb            [8086]
\c LOOPNZ imm,ECX                ; a32 E0 rb            [386]

\c{LOOP} decrements its counter register (either \c{CX} or \c{ECX} -
if one is not specified explicitly, the \c{BITS} setting dictates
which is used) by one, and if the counter does not become zero as a
result of this operation, it jumps to the given label. The jump has
a range of 128 bytes.

\c{LOOPE} (or its synonym \c{LOOPZ}) adds the additional condition
that it only jumps if the counter is nonzero \e{and} the zero flag
is set. Similarly, \c{LOOPNE} (and \c{LOOPNZ}) jumps only if the
counter is nonzero and the zero flag is clear.
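
A minimal counted loop, assuming \c{BITS 32} so that \c{ECX} is the
counter. It adds up the counter values 10 down to 1.

\c         mov     ecx,10
\c         xor     eax,eax
\c again:  add     eax,ecx         ; accumulate 10 + 9 + ... + 1
\c         loop    again           ; ECX = ECX - 1, jump while ECX != 0
\c                                 ; here EAX = 55 and ECX = 0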

\H{insLSL} \i\c{LSL}: Load Segment Limit

\c LSL reg16,r/m16               ; o16 0F 03 /r         [286,PRIV]
\c LSL reg32,r/m32               ; o32 0F 03 /r         [386,PRIV]

\c{LSL} is given a segment selector in its source (second) operand;
it computes the segment limit value by loading the segment limit
field from the associated segment descriptor in the GDT or LDT.
(This involves shifting left by 12 bits if the segment limit is
page-granular, and not if it is byte-granular; so you end up with a
byte limit in either case.) The segment limit obtained is then
loaded into the destination (first) operand.

\H{insLTR} \i\c{LTR}: Load Task Register

\c LTR r/m16                     ; 0F 00 /3             [286,PRIV]

\c{LTR} looks up the segment base and limit in the GDT or LDT
descriptor specified by the segment selector given as its operand,
and loads them into the Task Register.

\H{insMOV} \i\c{MOV}: Move Data

\c MOV r/m8,reg8                 ; 88 /r                [8086]
\c MOV r/m16,reg16               ; o16 89 /r            [8086]
\c MOV r/m32,reg32               ; o32 89 /r            [386]
\c MOV reg8,r/m8                 ; 8A /r                [8086]
\c MOV reg16,r/m16               ; o16 8B /r            [8086]
\c MOV reg32,r/m32               ; o32 8B /r            [386]

\c MOV reg8,imm8                 ; B0+r ib              [8086]
\c MOV reg16,imm16               ; o16 B8+r iw          [8086]
\c MOV reg32,imm32               ; o32 B8+r id          [386]
\c MOV r/m8,imm8                 ; C6 /0 ib             [8086]
\c MOV r/m16,imm16               ; o16 C7 /0 iw         [8086]
\c MOV r/m32,imm32               ; o32 C7 /0 id         [386]

\c MOV AL,memoffs8               ; A0 ow/od             [8086]
\c MOV AX,memoffs16              ; o16 A1 ow/od         [8086]
\c MOV EAX,memoffs32             ; o32 A1 ow/od         [386]
\c MOV memoffs8,AL               ; A2 ow/od             [8086]
\c MOV memoffs16,AX              ; o16 A3 ow/od         [8086]
\c MOV memoffs32,EAX             ; o32 A3 ow/od         [386]

\c MOV r/m16,segreg              ; o16 8C /r            [8086]
\c MOV r/m32,segreg              ; o32 8C /r            [386]
\c MOV segreg,r/m16              ; o16 8E /r            [8086]
\c MOV segreg,r/m32              ; o32 8E /r            [386]

\c MOV reg32,CR0/2/3/4           ; 0F 20 /r             [386]
\c MOV reg32,DR0/1/2/3/6/7       ; 0F 21 /r             [386]
\c MOV reg32,TR3/4/5/6/7         ; 0F 24 /r             [386]
\c MOV CR0/2/3/4,reg32           ; 0F 22 /r             [386]
\c MOV DR0/1/2/3/6/7,reg32       ; 0F 23 /r             [386]
\c MOV TR3/4/5/6/7,reg32         ; 0F 26 /r             [386]

\c{MOV} copies the contents of its source (second) operand into its
destination (first) operand.

In all forms of the \c{MOV} instruction, the two operands are the
same size, except for moving between a segment register and an
\c{r/m32} operand. These instructions are treated exactly like the
corresponding 16-bit equivalent (so that, for example, \c{MOV
DS,EAX} functions identically to \c{MOV DS,AX} but saves a prefix
when in 32-bit mode), except that when a segment register is moved
into a 32-bit destination, the top two bytes of the result are
undefined.

\c{MOV} may not use \c{CS} as a destination.

\c{CR4} is only a supported register on the Pentium and above.

\H{insMOVD} \i\c{MOVD}: Move Doubleword to/from MMX Register

\c MOVD mmxreg,r/m32             ; 0F 6E /r             [PENT,MMX]
\c MOVD r/m32,mmxreg             ; 0F 7E /r             [PENT,MMX]

\c{MOVD} copies 32 bits from its source (second) operand into its
destination (first) operand. When the destination is a 64-bit MMX
register, the top 32 bits are set to zero.

\H{insMOVQ} \i\c{MOVQ}: Move Quadword to/from MMX Register

\c MOVQ mmxreg,r/m64             ; 0F 6F /r             [PENT,MMX]
\c MOVQ r/m64,mmxreg             ; 0F 7F /r             [PENT,MMX]

\c{MOVQ} copies 64 bits from its source (second) operand into its
destination (first) operand.

\H{insMOVSB} \i\c{MOVSB}, \i\c{MOVSW}, \i\c{MOVSD}: Move String

\c MOVSB                         ; A4                   [8086]
\c MOVSW                         ; o16 A5               [8086]
\c MOVSD                         ; o32 A5               [386]

\c{MOVSB} copies the byte at \c{[DS:SI]} or \c{[DS:ESI]} to
\c{[ES:DI]} or \c{[ES:EDI]}. It then increments or decrements
(depending on the direction flag: increments if the flag is clear,
decrements if it is set) \c{SI} and \c{DI} (or \c{ESI} and \c{EDI}).

The registers used are \c{SI} and \c{DI} if the address size is 16
bits, and \c{ESI} and \c{EDI} if it is 32 bits. If you need to use
an address size not equal to the current \c{BITS} setting, you can
use an explicit \i\c{a16} or \i\c{a32} prefix.

The segment register used to load from \c{[SI]} or \c{[ESI]} can be
overridden by using a segment register name as a prefix (for
example, \c{es movsb}). The use of \c{ES} for the store to \c{[DI]}
or \c{[EDI]} cannot be overridden.

\c{MOVSW} and \c{MOVSD} work in the same way, but they copy a word
or a doubleword instead of a byte, and increment or decrement the
addressing registers by 2 or 4 instead of 1.

The \c{REP} prefix may be used to repeat the instruction \c{CX} (or
\c{ECX} - again, the address size chooses which) times.
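
A sketch of a forward block copy with \c{REP MOVSB}, assuming
\c{BITS 32} and that \c{DS} and \c{ES} address the source and
destination. The names \c{source}, \c{dest} and \c{nbytes} are
hypothetical.

\c         cld                     ; DF = 0: copy upwards
\c         mov     esi,source      ; bytes are read from DS:ESI
\c         mov     edi,dest        ; and written to ES:EDI
\c         mov     ecx,nbytes      ; number of bytes to copy
\c         rep movsb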

\H{insMOVSX} \i\c{MOVSX}, \i\c{MOVZX}: Move Data with Sign or Zero Extend

\c MOVSX reg16,r/m8              ; o16 0F BE /r         [386]
\c MOVSX reg32,r/m8              ; o32 0F BE /r         [386]
\c MOVSX reg32,r/m16             ; o32 0F BF /r         [386]

\c MOVZX reg16,r/m8              ; o16 0F B6 /r         [386]
\c MOVZX reg32,r/m8              ; o32 0F B6 /r         [386]
\c MOVZX reg32,r/m16             ; o32 0F B7 /r         [386]

\c{MOVSX} sign-extends its source (second) operand to the length of
its destination (first) operand, and copies the result into the
destination operand. \c{MOVZX} does the same, but zero-extends
rather than sign-extending.
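
The difference shows up as soon as the top bit of the source is set;
the registers here are arbitrary.

\c         mov     bl,0x80
\c         movsx   eax,bl          ; EAX = 0xFFFFFF80 (-128)
\c         movzx   ecx,bl          ; ECX = 0x00000080 (128)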

\H{insMUL} \i\c{MUL}: Unsigned Integer Multiply

\c MUL r/m8                      ; F6 /4                [8086]
\c MUL r/m16                     ; o16 F7 /4            [8086]
\c MUL r/m32                     ; o32 F7 /4            [386]

\c{MUL} performs unsigned integer multiplication. The other operand
to the multiplication, and the destination operand, are implicit, in
the following way:

\b For \c{MUL r/m8}, \c{AL} is multiplied by the given operand; the
product is stored in \c{AX}.

\b For \c{MUL r/m16}, \c{AX} is multiplied by the given operand;
the product is stored in \c{DX:AX}.

\b For \c{MUL r/m32}, \c{EAX} is multiplied by the given operand;
the product is stored in \c{EDX:EAX}.

Signed integer multiplication is performed by the \c{IMUL}
instruction: see \k{insIMUL}.
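
Two worked cases of the implicit operands, with values chosen only
for illustration:

\c         mov     al,200
\c         mov     bl,3
\c         mul     bl              ; AX = 600 = 0x0258
\c
\c         mov     ax,0x8000
\c         mov     cx,4
\c         mul     cx              ; DX:AX = 0x0002:0x0000 (0x20000)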

\H{insNEG} \i\c{NEG}, \i\c{NOT}: Two's and One's Complement

\c NEG r/m8                      ; F6 /3                [8086]
\c NEG r/m16                     ; o16 F7 /3            [8086]
\c NEG r/m32                     ; o32 F7 /3            [386]

\c NOT r/m8                      ; F6 /2                [8086]
\c NOT r/m16                     ; o16 F7 /2            [8086]
\c NOT r/m32                     ; o32 F7 /2            [386]

\c{NEG} replaces the contents of its operand by the two's complement
negation (invert all the bits and then add one) of the original
value. \c{NOT}, similarly, performs one's complement (inverts all
the bits).
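
For example, starting from the value 5 in an 8-bit register:

\c         mov     al,5
\c         neg     al              ; AL = 0xFB (-5 in two's complement)
\c         mov     al,5
\c         not     al              ; AL = 0xFA (every bit inverted)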

\H{insNOP} \i\c{NOP}: No Operation

\c NOP                           ; 90                   [8086]

\c{NOP} performs no operation. Its opcode is the same as that
generated by \c{XCHG AX,AX} or \c{XCHG EAX,EAX} (depending on the
processor mode; see \k{insXCHG}).

\H{insOR} \i\c{OR}: Bitwise OR

\c OR r/m8,reg8                  ; 08 /r                [8086]
\c OR r/m16,reg16                ; o16 09 /r            [8086]
\c OR r/m32,reg32                ; o32 09 /r            [386]

\c OR reg8,r/m8                  ; 0A /r                [8086]
\c OR reg16,r/m16                ; o16 0B /r            [8086]
\c OR reg32,r/m32                ; o32 0B /r            [386]

\c OR r/m8,imm8                  ; 80 /1 ib             [8086]
\c OR r/m16,imm16                ; o16 81 /1 iw         [8086]
\c OR r/m32,imm32                ; o32 81 /1 id         [386]

\c OR r/m16,imm8                 ; o16 83 /1 ib         [8086]
\c OR r/m32,imm8                 ; o32 83 /1 ib         [386]

\c OR AL,imm8                    ; 0C ib                [8086]
\c OR AX,imm16                   ; o16 0D iw            [8086]
\c OR EAX,imm32                  ; o32 0D id            [386]

\c{OR} performs a bitwise OR operation between its two operands
(i.e. each bit of the result is 1 if and only if at least one of the
corresponding bits of the two inputs was 1), and stores the result
in the destination (first) operand.

In the forms with an 8-bit immediate second operand and a longer
first operand, the second operand is considered to be signed, and is
sign-extended to the length of the first operand. In these cases,
the \c{BYTE} qualifier is necessary to force NASM to generate this
form of the instruction.
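
A sketch of the two immediate forms applied to \c{EAX}, assuming
\c{BITS 32}. The byte sequences follow from the opcode table above.

\c         or      eax,byte -1     ; 83 C8 FF: imm8 becomes 0xFFFFFFFF
\c         or      eax,0x40000000  ; 0D 00 00 00 40: full imm32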

The MMX instruction \c{POR} (see \k{insPOR}) performs the same
operation on the 64-bit MMX registers.

\H{insOUT} \i\c{OUT}: Output Data to I/O Port

\c OUT imm8,AL                   ; E6 ib                [8086]
\c OUT imm8,AX                   ; o16 E7 ib            [8086]
\c OUT imm8,EAX                  ; o32 E7 ib            [386]
\c OUT DX,AL                     ; EE                   [8086]
\c OUT DX,AX                     ; o16 EF               [8086]
\c OUT DX,EAX                    ; o32 EF               [386]

\c{OUT} writes the contents of the given source register to the
specified I/O port. The port number may be specified as an immediate
value if it is between 0 and 255, and otherwise must be stored in
\c{DX}. See also \c{IN} (\k{insIN}).

\H{insOUTSB} \i\c{OUTSB}, \i\c{OUTSW}, \i\c{OUTSD}: Output String to I/O Port

\c OUTSB                         ; 6E                   [186]

\c OUTSW                         ; o16 6F               [186]

\c OUTSD                         ; o32 6F               [386]

\c{OUTSB} loads a byte from \c{[DS:SI]} or \c{[DS:ESI]} and writes
it to the I/O port specified in \c{DX}. It then increments or
decrements (depending on the direction flag: increments if the flag
is clear, decrements if it is set) \c{SI} or \c{ESI}.

The register used is \c{SI} if the address size is 16 bits, and
\c{ESI} if it is 32 bits. If you need to use an address size not
equal to the current \c{BITS} setting, you can use an explicit
\i\c{a16} or \i\c{a32} prefix.

The segment register used to load from \c{[SI]} or \c{[ESI]} can be
overridden by using a segment register name as a prefix (for
example, \c{es outsb}).

\c{OUTSW} and \c{OUTSD} work in the same way, but they output a
word or a doubleword instead of a byte, and increment or decrement
the addressing registers by 2 or 4 instead of 1.

The \c{REP} prefix may be used to repeat the instruction \c{CX} (or
\c{ECX} - again, the address size chooses which) times.

\H{insPACKSSDW} \i\c{PACKSSDW}, \i\c{PACKSSWB}, \i\c{PACKUSWB}: Pack Data

\c PACKSSDW mmxreg,r/m64         ; 0F 6B /r             [PENT,MMX]
\c PACKSSWB mmxreg,r/m64         ; 0F 63 /r             [PENT,MMX]
\c PACKUSWB mmxreg,r/m64         ; 0F 67 /r             [PENT,MMX]

All these instructions start by forming a notional 128-bit word by
placing the source (second) operand on the left of the destination
(first) operand. \c{PACKSSDW} then splits this 128-bit word into
four doublewords, converts each to a word, and loads them side by
side into the destination register; \c{PACKSSWB} and \c{PACKUSWB}
both split the 128-bit word into eight words, convert each to a
byte, and load \e{those} side by side into the destination
register.

\c{PACKSSDW} and \c{PACKSSWB} perform signed saturation when
reducing the length of numbers: if the number is too large to fit
into the reduced space, they replace it by the largest signed number
(\c{7FFFh} or \c{7Fh}) that \e{will} fit, and if it is too small
then they replace it by the smallest signed number (\c{8000h} or
\c{80h}) that will fit. \c{PACKUSWB} performs unsigned saturation:
it treats each input word as signed, and clips it to the range of an
unsigned byte, so negative values become \c{00h} and values above
\c{0FFh} become \c{0FFh}.
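
A worked \c{PACKSSDW} case may make the layout clearer. The labels
\c{a} and \c{b} and their contents are invented for the example; two
of the four doublewords are out of range and get saturated.

\c a:      dq      0x00012345FFFFFFFE  ; doublewords 74565 and -2
\c b:      dq      0xFFFF000000000010  ; doublewords -65536 and 16
\c
\c         movq    mm0,[a]
\c         packssdw mm0,[b]            ; MM0 = 0x800000107FFFFFFE

The source words (\c{8000h}, \c{0010h}) end up in the top half of
\c{MM0}, and the destination words (\c{7FFFh}, \c{0FFFEh}) in the
bottom half.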

\H{insPADDB} \i\c{PADDxx}: MMX Packed Addition

\c PADDB mmxreg,r/m64            ; 0F FC /r             [PENT,MMX]
\c PADDW mmxreg,r/m64            ; 0F FD /r             [PENT,MMX]
\c PADDD mmxreg,r/m64            ; 0F FE /r             [PENT,MMX]

\c PADDSB mmxreg,r/m64           ; 0F EC /r             [PENT,MMX]
\c PADDSW mmxreg,r/m64           ; 0F ED /r             [PENT,MMX]

\c PADDUSB mmxreg,r/m64          ; 0F DC /r             [PENT,MMX]
\c PADDUSW mmxreg,r/m64          ; 0F DD /r             [PENT,MMX]

\c{PADDxx} all perform packed addition between their two 64-bit
operands, storing the result in the destination (first) operand. The
\c{PADDxB} forms treat the 64-bit operands as vectors of eight
bytes, and add each byte individually; \c{PADDxW} treat the operands
as vectors of four words; and \c{PADDD} treats its operands as
vectors of two doublewords.

\c{PADDSB} and \c{PADDSW} perform signed saturation on the sum of
each pair of bytes or words: if the result of an addition is too
large or too small to fit into a signed byte or word result, it is
clipped (saturated) to the largest or smallest value which \e{will}
fit. \c{PADDUSB} and \c{PADDUSW} similarly perform unsigned
saturation, clipping to \c{0FFh} or \c{0FFFFh} if the result is
larger than that.
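
The difference between wrapping and saturating addition can be
sketched with byte values that overflow. The labels \c{a} and \c{b}
are hypothetical.

\c a:      times 8 db 0xF0
\c b:      times 8 db 0x20
\c
\c         movq    mm0,[a]
\c         paddb   mm0,[b]         ; each byte wraps: 0xF0 + 0x20 = 0x10
\c         movq    mm1,[a]
\c         paddusb mm1,[b]         ; each byte saturates to 0xFF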

\H{insPADDSIW} \i\c{PADDSIW}: MMX Packed Addition to Implicit
Destination

\c PADDSIW mmxreg,r/m64          ; 0F 51 /r             [CYRIX,MMX]

\c{PADDSIW}, specific to the Cyrix extensions to the MMX instruction
set, performs the same function as \c{PADDSW}, except that the
result is not placed in the register specified by the first operand,
but instead in the register whose number differs from the first
operand only in the last bit. So \c{PADDSIW MM0,MM2} would put the
result in \c{MM1}, but \c{PADDSIW MM1,MM2} would put the result in
\c{MM0}.

\H{insPAND} \i\c{PAND}, \i\c{PANDN}: MMX Bitwise AND and AND-NOT

\c PAND mmxreg,r/m64             ; 0F DB /r             [PENT,MMX]
\c PANDN mmxreg,r/m64            ; 0F DF /r             [PENT,MMX]

\c{PAND} performs a bitwise AND operation between its two operands
(i.e. each bit of the result is 1 if and only if the corresponding
bits of the two inputs were both 1), and stores the result in the
destination (first) operand.

\c{PANDN} performs the same operation, but performs a one's
complement operation on the destination (first) operand first.
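
With a mask in the destination register, the two instructions select
opposite halves of a value. The labels \c{mask} and \c{value} and
their contents are assumptions for illustration.

\c mask:   dq      0x00000000FFFFFFFF
\c value:  dq      0x1122334455667788
\c
\c         movq    mm0,[mask]
\c         pand    mm0,[value]     ; MM0 = 0x0000000055667788
\c         movq    mm1,[mask]
\c         pandn   mm1,[value]     ; MM1 = 0x1122334400000000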

\H{insPAVEB} \i\c{PAVEB}: MMX Packed Average

\c PAVEB mmxreg,r/m64            ; 0F 50 /r             [CYRIX,MMX]

\c{PAVEB}, specific to the Cyrix MMX extensions, treats its two
operands as vectors of eight unsigned bytes, and calculates the
average of the corresponding bytes in the operands. The resulting
vector of eight averages is stored in the first operand.

\H{insPCMPEQB} \i\c{PCMPxx}: MMX Packed Comparison

\c PCMPEQB mmxreg,r/m64          ; 0F 74 /r             [PENT,MMX]
\c PCMPEQW mmxreg,r/m64          ; 0F 75 /r             [PENT,MMX]
\c PCMPEQD mmxreg,r/m64          ; 0F 76 /r             [PENT,MMX]

\c PCMPGTB mmxreg,r/m64          ; 0F 64 /r             [PENT,MMX]
\c PCMPGTW mmxreg,r/m64          ; 0F 65 /r             [PENT,MMX]
\c PCMPGTD mmxreg,r/m64          ; 0F 66 /r             [PENT,MMX]

The \c{PCMPxx} instructions all treat their operands as vectors of
bytes, words, or doublewords; corresponding elements of the source
and destination are compared, and the corresponding element of the
destination (first) operand is set to all zeros or all ones
depending on the result of the comparison.

\c{PCMPxxB} treats the operands as vectors of eight bytes,
\c{PCMPxxW} treats them as vectors of four words, and \c{PCMPxxD} as
two doublewords.

\c{PCMPEQx} sets the corresponding element of the destination
operand to all ones if the two elements compared are equal;
\c{PCMPGTx} sets the destination element to all ones if the element
of the first (destination) operand is greater (treated as a signed
integer) than that of the second (source) operand.
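
A word-sized sketch, with made-up operands in \c{a} and \c{b}. Note
that \c{0FFFFh} counts as -1 in the signed comparison, so it is not
greater than 0.

\c a:      dq      0x0005FFFF00030000  ; words 5, -1, 3, 0
\c b:      dq      0x0004000000030001  ; words 4,  0, 3, 1
\c
\c         movq    mm0,[a]
\c         pcmpgtw mm0,[b]         ; MM0 = 0xFFFF000000000000
\c         movq    mm1,[a]
\c         pcmpeqw mm1,[b]         ; MM1 = 0x00000000FFFF0000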

\H{insPDISTIB} \i\c{PDISTIB}: MMX Packed Distance and Accumulate
with Implied Register

\c PDISTIB mmxreg,mem64          ; 0F 54 /r             [CYRIX,MMX]

\c{PDISTIB}, specific to the Cyrix MMX extensions, treats its two
input operands as vectors of eight unsigned bytes. For each byte
position, it finds the absolute difference between the bytes in that
position in the two input operands, and adds that value to the byte
in the same position in the implied output register. The addition is
saturated to an unsigned byte in the same way as \c{PADDUSB}.

The implied output register is found in the same way as \c{PADDSIW}
(\k{insPADDSIW}).

Note that \c{PDISTIB} cannot take a register as its second source
operand.

\H{insPMACHRIW} \i\c{PMACHRIW}: MMX Packed Multiply and Accumulate
with Rounding

\c PMACHRIW mmxreg,mem64         ; 0F 5E /r             [CYRIX,MMX]

\c{PMACHRIW} acts almost identically to \c{PMULHRIW}
(\k{insPMULHRW}), but instead of \e{storing} its result in the
implied destination register, it \e{adds} its result, as four packed
words, to the implied destination register. No saturation is done:
the addition can wrap around.

Note that \c{PMACHRIW} cannot take a register as its second source
operand.

\H{insPMADDWD} \i\c{PMADDWD}: MMX Packed Multiply and Add

\c PMADDWD mmxreg,r/m64          ; 0F F5 /r             [PENT,MMX]

\c{PMADDWD} treats its two inputs as vectors of four signed words.
It multiplies corresponding elements of the two operands, giving
four signed doubleword results. The top two of these are added and
placed in the top 32 bits of the destination (first) operand; the
bottom two are added and placed in the bottom 32 bits.
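
As an illustration, the following Python sketch models the
multiply-and-add step described above. The function names \c{words}
and \c{pmaddwd} are hypothetical, and the 64-bit operands are passed
as plain integers; sums are assumed to wrap to 32 bits.

```python
def words(value):
    """Four signed 16-bit words of a 64-bit value, lowest first."""
    out = []
    for i in range(4):
        w = (value >> (16 * i)) & 0xFFFF
        out.append(w - 0x10000 if w & 0x8000 else w)
    return out

def pmaddwd(dst, src):
    """Model of PMADDWD: multiply word pairs, then add adjacent products."""
    products = [a * b for a, b in zip(words(dst), words(src))]
    low = (products[0] + products[1]) & 0xFFFFFFFF    # bottom two products
    high = (products[2] + products[3]) & 0xFFFFFFFF   # top two products
    return (high << 32) | low
```

With words 1, 2, 3, 4 in the first operand and 5, 6, 7, 8 in the
second (lowest first), the low doubleword is 1*5 + 2*6 = 17 and the
high doubleword is 3*7 + 4*8 = 53.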

\H{insPMAGW} \i\c{PMAGW}: MMX Packed Magnitude

\c PMAGW mmxreg,r/m64            ; 0F 52 /r             [CYRIX,MMX]

\c{PMAGW}, specific to the Cyrix MMX extensions, treats both its
operands as vectors of four signed words. It compares the absolute
values of the words in corresponding positions, and sets each word
of the destination (first) operand to whichever of the two words in
that position had the larger absolute value.

\H{insPMULHRW} \i\c{PMULHRW}, \i\c{PMULHRIW}: MMX Packed Multiply
High with Rounding

\c PMULHRW mmxreg,r/m64          ; 0F 59 /r             [CYRIX,MMX]
\c PMULHRIW mmxreg,r/m64         ; 0F 5D /r             [CYRIX,MMX]

These instructions, specific to the Cyrix MMX extensions, treat
their operands as vectors of four signed words. Words in
corresponding positions are multiplied, to give a 32-bit value in
which bits 30 and 31 are guaranteed equal. Bits 30 to 15 of this
value (bit mask \c{0x7FFF8000}) are taken and stored in the
corresponding position of the destination operand, after first
rounding the low bit (equivalent to adding \c{0x4000} before
extracting bits 30 to 15).

For \c{PMULHRW}, the destination operand is the first operand; for
\c{PMULHRIW} the destination operand is implied by the first operand
in the manner of \c{PADDSIW} (\k{insPADDSIW}).
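
The rounding rule can be sketched for a single pair of words as
follows. This Python model follows the description above literally
(add \c{0x4000}, then take bits 30 to 15); the function name
\c{pmulhrw_word} is hypothetical, and the inputs are assumed to be
already converted to signed Python integers.

```python
def pmulhrw_word(a, b):
    """Rounded high multiply of two signed 16-bit values.

    Returns the 16-bit pattern stored in the destination word.
    """
    product = a * b                      # full signed 32-bit product
    # Python's >> is an arithmetic shift, so negative products work too.
    return ((product + 0x4000) >> 15) & 0xFFFF
```

Read as fixed-point fractions with 15 fractional bits, \c{0x4000} is
0.5, so multiplying it by itself yields \c{0x2000} (0.25).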

\H{insPMULHW} \i\c{PMULHW}, \i\c{PMULLW}: MMX Packed Multiply

\c PMULHW mmxreg,r/m64           ; 0F E5 /r             [PENT,MMX]
\c PMULLW mmxreg,r/m64           ; 0F D5 /r             [PENT,MMX]

\c{PMULxW} treats its two inputs as vectors of four signed words. It
multiplies corresponding elements of the two operands, giving four
signed doubleword results.

\c{PMULHW} then stores the top 16 bits of each doubleword in the
destination (first) operand; \c{PMULLW} stores the bottom 16 bits of
each doubleword in the destination operand.
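
A minimal Python sketch of one word position, assuming the inputs are
given as 16-bit patterns; \c{to_signed16} and \c{pmul_word} are
hypothetical names chosen for this illustration.

```python
def to_signed16(w):
    """Reinterpret a 16-bit pattern as a signed integer."""
    return w - 0x10000 if w & 0x8000 else w

def pmul_word(a, b):
    """Return (PMULHW result, PMULLW result) for one pair of words."""
    product = (to_signed16(a) * to_signed16(b)) & 0xFFFFFFFF
    return product >> 16, product & 0xFFFF
```

Using both instructions on the same inputs therefore recovers the
complete 32-bit product of each pair of words.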

\H{insPMVccZB} \i\c{PMVccZB}: MMX Packed Conditional Move

\c PMVZB mmxreg,mem64            ; 0F 58 /r             [CYRIX,MMX]
\c PMVNZB mmxreg,mem64           ; 0F 5A /r             [CYRIX,MMX]
\c PMVLZB mmxreg,mem64           ; 0F 5B /r             [CYRIX,MMX]
\c PMVGEZB mmxreg,mem64          ; 0F 5C /r             [CYRIX,MMX]

These instructions, specific to the Cyrix MMX extensions, perform
parallel conditional moves. The two input operands are treated as
vectors of eight bytes. Each byte of the destination (first) operand
is either written from the corresponding byte of the source (second)
operand, or left alone, depending on the value of the byte in the
\e{implied} operand (specified in the same way as \c{PADDSIW}, in
\k{insPADDSIW}).

\c{PMVZB} performs each move if the corresponding byte in the
implied operand is zero. \c{PMVNZB} moves if the byte is non-zero.
\c{PMVLZB} moves if the byte is less than zero, and \c{PMVGEZB}
moves if the byte is greater than or equal to zero.

Note that these instructions cannot take a register as their second
source operand.

\H{insPOP} \i\c{POP}: Pop Data from Stack

\c POP reg16                     ; o16 58+r             [8086]
\c POP reg32                     ; o32 58+r             [386]

\c POP r/m16                     ; o16 8F /0            [8086]
\c POP r/m32                     ; o32 8F /0            [386]

\c POP CS                        ; 0F                   [8086,UNDOC]
\c POP DS                        ; 1F                   [8086]
\c POP ES                        ; 07                   [8086]
\c POP SS                        ; 17                   [8086]
\c POP FS                        ; 0F A1                [386]
\c POP GS                        ; 0F A9                [386]

\c{POP} loads a value from the stack (from \c{[SS:SP]} or
\c{[SS:ESP]}) and then increments the stack pointer.

The address-size attribute of the instruction determines whether
\c{SP} or \c{ESP} is used as the stack pointer: to deliberately
override the default given by the \c{BITS} setting, you can use an
\i\c{a16} or \i\c{a32} prefix.

The operand-size attribute of the instruction determines whether the
stack pointer is incremented by 2 or 4: this means that segment
register pops in \c{BITS 32} mode will pop 4 bytes off the stack and
discard the upper two of them. If you need to override that, you can
use an \i\c{o16} or \i\c{o32} prefix.

The above opcode listings give two forms for general-purpose
register pop instructions: for example, \c{POP BX} has the two forms
\c{5B} and \c{8F C3}. NASM will always generate the shorter form
when given \c{POP BX}. NDISASM will disassemble both.

\c{POP CS} is not a documented instruction, and is not supported on
any processor above the 8086 (since they use \c{0Fh} as an opcode
prefix for instruction set extensions). However, at least some 8086
processors do support it, and so NASM generates it for completeness.

\H{insPOPA} \i\c{POPAx}: Pop All General-Purpose Registers

\c POPA                          ; 61                   [186]
\c POPAW                         ; o16 61               [186]
\c POPAD                         ; o32 61               [386]

\c{POPAW} pops a word from the stack into each of, successively,
\c{DI}, \c{SI}, \c{BP}, nothing (it discards a word from the stack
which was a placeholder for \c{SP}), \c{BX}, \c{DX}, \c{CX} and
\c{AX}. It is intended to reverse the operation of \c{PUSHAW} (see
\k{insPUSHA}), but it ignores the value for \c{SP} that was pushed
on the stack by \c{PUSHAW}.

\c{POPAD} pops twice as much data, and places the results in
\c{EDI}, \c{ESI}, \c{EBP}, nothing (placeholder for \c{ESP}),
\c{EBX}, \c{EDX}, \c{ECX} and \c{EAX}. It reverses the operation of
\c{PUSHAD}.

\c{POPA} is an alias mnemonic for either \c{POPAW} or \c{POPAD},
depending on the current \c{BITS} setting.

Note that the registers are popped in reverse order of their numeric
values in opcodes (see \k{iref-rv}).

\H{insPOPF} \i\c{POPFx}: Pop Flags Register

\c POPF                          ; 9D                   [8086]
\c POPFW                         ; o16 9D               [8086]
\c POPFD                         ; o32 9D               [386]

\c{POPFW} pops a word from the stack and stores it in the bottom 16
bits of the flags register (or the whole flags register, on
processors below a 386). \c{POPFD} pops a doubleword and stores it
in the entire flags register.

\c{POPF} is an alias mnemonic for either \c{POPFW} or \c{POPFD},
depending on the current \c{BITS} setting.

See also \c{PUSHF} (\k{insPUSHF}).

\H{insPOR} \i\c{POR}: MMX Bitwise OR

\c POR mmxreg,r/m64              ; 0F EB /r             [PENT,MMX]

\c{POR} performs a bitwise OR operation between its two operands
(i.e. each bit of the result is 1 if and only if at least one of the
corresponding bits of the two inputs was 1), and stores the result
in the destination (first) operand.

\H{insPSLLD} \i\c{PSLLx}, \i\c{PSRLx}, \i\c{PSRAx}: MMX Bit Shifts

\c PSLLW mmxreg,r/m64            ; 0F F1 /r             [PENT,MMX]
\c PSLLW mmxreg,imm8             ; 0F 71 /6 ib          [PENT,MMX]

\c PSLLD mmxreg,r/m64            ; 0F F2 /r             [PENT,MMX]
\c PSLLD mmxreg,imm8             ; 0F 72 /6 ib          [PENT,MMX]

\c PSLLQ mmxreg,r/m64            ; 0F F3 /r             [PENT,MMX]
\c PSLLQ mmxreg,imm8             ; 0F 73 /6 ib          [PENT,MMX]

\c PSRAW mmxreg,r/m64            ; 0F E1 /r             [PENT,MMX]
\c PSRAW mmxreg,imm8             ; 0F 71 /4 ib          [PENT,MMX]

\c PSRAD mmxreg,r/m64            ; 0F E2 /r             [PENT,MMX]
\c PSRAD mmxreg,imm8             ; 0F 72 /4 ib          [PENT,MMX]

\c PSRLW mmxreg,r/m64            ; 0F D1 /r             [PENT,MMX]
\c PSRLW mmxreg,imm8             ; 0F 71 /2 ib          [PENT,MMX]

\c PSRLD mmxreg,r/m64            ; 0F D2 /r             [PENT,MMX]
\c PSRLD mmxreg,imm8             ; 0F 72 /2 ib          [PENT,MMX]

\c PSRLQ mmxreg,r/m64            ; 0F D3 /r             [PENT,MMX]
\c PSRLQ mmxreg,imm8             ; 0F 73 /2 ib          [PENT,MMX]

\c{PSxxQ} perform simple bit shifts on the 64-bit MMX registers: the
destination (first) operand is shifted left or right by the number of
bits given in the source (second) operand, and the vacated bits are
filled in with zeros (for a logical shift) or copies of the original
sign bit (for an arithmetic right shift).

\c{PSxxW} and \c{PSxxD} perform packed bit shifts: the destination
operand is treated as a vector of four words or two doublewords, and
each element is shifted individually, so bits shifted out of one
element do not interfere with empty bits coming into the next.

\c{PSLLx} and \c{PSRLx} perform logical shifts: the vacated bits at
one end of the shifted number are filled with zeros. \c{PSRAx}
performs an arithmetic right shift: the vacated bits at the top of
the shifted number are filled with copies of the original top (sign)
bit.
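
The difference between the three kinds of shift, and between packed
and whole-register shifts, can be seen in the following Python
sketch. The function \c{packed_shift} and its \c{kind} strings are
hypothetical, and the shift count is assumed to be smaller than the
element width.

```python
def packed_shift(value, count, width, kind):
    """Shift each `width`-bit element of a 64-bit value.

    kind is 'sll' (PSLLx), 'srl' (PSRLx) or 'sra' (PSRAx);
    width is 16, 32 or 64.
    """
    mask = (1 << width) - 1
    result = 0
    for i in range(64 // width):
        e = (value >> (i * width)) & mask
        if kind == "sll":
            e = (e << count) & mask      # bits leaving the element are lost
        elif kind == "srl":
            e >>= count                  # zeros enter at the top
        else:
            if e & (1 << (width - 1)):   # negative: make it a signed int
                e -= 1 << width
            e = (e >> count) & mask      # sign bit is replicated
        result |= e << (i * width)
    return result
```

Shifting \c{0x8001800180018001} left by one as four words gives
\c{0x0002} in every word, whereas the same shift with \c{PSLLQ}
carries the top bit of each word into the next.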

\H{insPSUBB} \i\c{PSUBxx}: MMX Packed Subtraction

\c PSUBB mmxreg,r/m64            ; 0F F8 /r             [PENT,MMX]
\c PSUBW mmxreg,r/m64            ; 0F F9 /r             [PENT,MMX]
\c PSUBD mmxreg,r/m64            ; 0F FA /r             [PENT,MMX]

\c PSUBSB mmxreg,r/m64           ; 0F E8 /r             [PENT,MMX]
\c PSUBSW mmxreg,r/m64           ; 0F E9 /r             [PENT,MMX]

\c PSUBUSB mmxreg,r/m64          ; 0F D8 /r             [PENT,MMX]
\c PSUBUSW mmxreg,r/m64          ; 0F D9 /r             [PENT,MMX]

\c{PSUBxx} all perform packed subtraction between their two 64-bit
operands, storing the result in the destination (first) operand. The
\c{PSUBxB} forms treat the 64-bit operands as vectors of eight
bytes, and subtract each byte individually; \c{PSUBxW} treat the operands
as vectors of four words; and \c{PSUBD} treats its operands as
vectors of two doublewords.

In all cases, the elements of the operand on the right are
subtracted from the corresponding elements of the operand on the
left, not the other way round.

\c{PSUBSB} and \c{PSUBSW} perform signed saturation on the difference
of each pair of bytes or words: if the result of a subtraction is too
large or too small to fit into a signed byte or word result, it is
clipped (saturated) to the largest or smallest value which \e{will}
fit. \c{PSUBUSB} and \c{PSUBUSW} similarly perform unsigned
saturation, clipping to zero if the result would otherwise be
negative.
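
The three behaviours (wrap-around, signed saturation and unsigned
saturation) are sketched below for a single element. The function
\c{psub_element} and its \c{mode} strings are hypothetical. Because
the difference of two unsigned values can never exceed the largest
unsigned value, the unsigned case in this model only needs to clamp
at zero.

```python
def psub_element(a, b, width, mode):
    """Compute a - b for one element of `width` bits.

    mode is 'wrap' (PSUBB/W/D), 'signed' (PSUBSB/W)
    or 'unsigned' (PSUBUSB/W).
    """
    mask = (1 << width) - 1
    if mode == "wrap":
        return (a - b) & mask
    if mode == "unsigned":
        return max(a - b, 0)
    sign = 1 << (width - 1)
    sa = a - (1 << width) if a & sign else a
    sb = b - (1 << width) if b & sign else b
    # Clamp to the signed range before converting back to a bit pattern.
    return max(-sign, min(sign - 1, sa - sb)) & mask
```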

\H{insPSUBSIW} \i\c{PSUBSIW}: MMX Packed Subtract with Saturation to
Implied Destination

\c PSUBSIW mmxreg,r/m64          ; 0F 55 /r             [CYRIX,MMX]

\c{PSUBSIW}, specific to the Cyrix extensions to the MMX instruction
set, performs the same function as \c{PSUBSW}, except that the
result is not placed in the register specified by the first operand,
but instead in the implied destination register, specified as for
\c{PADDSIW} (\k{insPADDSIW}).

\H{insPUNPCKHBW} \i\c{PUNPCKxxx}: Unpack Data

\c PUNPCKHBW mmxreg,r/m64        ; 0F 68 /r             [PENT,MMX]
\c PUNPCKHWD mmxreg,r/m64        ; 0F 69 /r             [PENT,MMX]
\c PUNPCKHDQ mmxreg,r/m64        ; 0F 6A /r             [PENT,MMX]

\c PUNPCKLBW mmxreg,r/m64        ; 0F 60 /r             [PENT,MMX]
\c PUNPCKLWD mmxreg,r/m64        ; 0F 61 /r             [PENT,MMX]
\c PUNPCKLDQ mmxreg,r/m64        ; 0F 62 /r             [PENT,MMX]

\c{PUNPCKxx} all treat their operands as vectors, and produce a new
vector generated by interleaving elements from the two inputs. The
\c{PUNPCKHxx} instructions start by throwing away the bottom half of
each input operand, and the \c{PUNPCKLxx} instructions throw away
the top half.

The remaining elements, totalling 64 bits, are then interleaved into
the destination, alternating elements from the second (source)
operand and the first (destination) operand: so the leftmost element
in the result always comes from the second operand, and the
rightmost from the destination.

\c{PUNPCKxBW} works a byte at a time, \c{PUNPCKxWD} a word at a
time, and \c{PUNPCKxDQ} a doubleword at a time.

So, for example, if the first operand held \c{0x7A6A5A4A3A2A1A0A}
and the second held \c{0x7B6B5B4B3B2B1B0B}, then:

\b \c{PUNPCKHBW} would return \c{0x7B7A6B6A5B5A4B4A}.

\b \c{PUNPCKHWD} would return \c{0x7B6B7A6A5B4B5A4A}.

\b \c{PUNPCKHDQ} would return \c{0x7B6B5B4B7A6A5A4A}.

\b \c{PUNPCKLBW} would return \c{0x3B3A2B2A1B1A0B0A}.

\b \c{PUNPCKLWD} would return \c{0x3B2B3A2A1B0B1A0A}.

\b \c{PUNPCKLDQ} would return \c{0x3B2B1B0B3A2A1A0A}.
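
The interleaving can be reproduced with a short Python model. The
function \c{punpck} and its parameters are hypothetical names; the
model simply follows the description above, with the element from the
destination going into the lower slot of each pair.

```python
def punpck(dst, src, width, high):
    """Model of PUNPCKH*/PUNPCKL* for elements of `width` bits.

    width is 8 (BW), 16 (WD) or 32 (DQ); high selects the H forms.
    """
    mask = (1 << width) - 1
    n = 32 // width                  # elements in each 32-bit half
    base = n if high else 0          # first element that is kept
    result = 0
    for i in range(n):
        d = (dst >> ((base + i) * width)) & mask
        s = (src >> ((base + i) * width)) & mask
        result |= d << (2 * i * width)          # destination: lower slot
        result |= s << ((2 * i + 1) * width)    # source: upper slot
    return result
```

Calling \c{punpck(0x7A6A5A4A3A2A1A0A, 0x7B6B5B4B3B2B1B0B, 8, True)}
gives \c{0x7B7A6B6A5B5A4B4A}, matching the \c{PUNPCKHBW} entry in the
list above.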

\H{insPUSH} \i\c{PUSH}: Push Data on Stack

\c PUSH reg16                    ; o16 50+r             [8086]
\c PUSH reg32                    ; o32 50+r             [386]

\c PUSH r/m16                    ; o16 FF /6            [8086]
\c PUSH r/m32                    ; o32 FF /6            [386]

\c PUSH CS                       ; 0E                   [8086]
\c PUSH DS                       ; 1E                   [8086]
\c PUSH ES                       ; 06                   [8086]
\c PUSH SS                       ; 16                   [8086]
\c PUSH FS                       ; 0F A0                [386]
\c PUSH GS                       ; 0F A8                [386]

\c PUSH imm8                     ; 6A ib                [286]
\c PUSH imm16                    ; o16 68 iw            [286]
\c PUSH imm32                    ; o32 68 id            [386]

\c{PUSH} decrements the stack pointer (\c{SP} or \c{ESP}) by 2 or 4,
and then stores the given value at \c{[SS:SP]} or \c{[SS:ESP]}.

The address-size attribute of the instruction determines whether
\c{SP} or \c{ESP} is used as the stack pointer: to deliberately
override the default given by the \c{BITS} setting, you can use an
\i\c{a16} or \i\c{a32} prefix.

The operand-size attribute of the instruction determines whether the
stack pointer is decremented by 2 or 4: this means that segment
register pushes in \c{BITS 32} mode will push 4 bytes on the stack,
of which the upper two are undefined. If you need to override that,
you can use an \i\c{o16} or \i\c{o32} prefix.

The above opcode listings give two forms for general-purpose
\i{register push} instructions: for example, \c{PUSH BX} has the two
forms \c{53} and \c{FF F3}. NASM will always generate the shorter
form when given \c{PUSH BX}. NDISASM will disassemble both.

Unlike the undocumented and barely supported \c{POP CS}, \c{PUSH CS}
is a perfectly valid and sensible instruction, supported on all
processors.

The instruction \c{PUSH SP} may be used to distinguish an 8086 from
later processors: on an 8086, the value of \c{SP} stored is the
value it has \e{after} the push instruction, whereas on later
processors it is the value \e{before} the push instruction.

\H{insPUSHA} \i\c{PUSHAx}: Push All General-Purpose Registers

\c PUSHA                         ; 60                   [186]
\c PUSHAD                        ; o32 60               [386]
\c PUSHAW                        ; o16 60               [186]

\c{PUSHAW} pushes, in succession, \c{AX}, \c{CX}, \c{DX}, \c{BX},
\c{SP}, \c{BP}, \c{SI} and \c{DI} on the stack, decrementing the
stack pointer by a total of 16.

\c{PUSHAD} pushes, in succession, \c{EAX}, \c{ECX}, \c{EDX},
\c{EBX}, \c{ESP}, \c{EBP}, \c{ESI} and \c{EDI} on the stack,
decrementing the stack pointer by a total of 32.

In both cases, the value of \c{SP} or \c{ESP} pushed is its
\e{original} value, as it had before the instruction was executed.

\c{PUSHA} is an alias mnemonic for either \c{PUSHAW} or \c{PUSHAD},
depending on the current \c{BITS} setting.

Note that the registers are pushed in order of their numeric values
in opcodes (see \k{iref-rv}).

See also \c{POPA} (\k{insPOPA}).
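
The push order, and the way \c{POPAW} undoes it, can be sketched in
Python as follows. This is a simplified model: \c{pushaw} and
\c{popaw} are hypothetical names, the registers live in a dictionary,
and the stack is a Python list whose last element is the top of the
stack.

```python
PUSH_ORDER = ("AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI")

def pushaw(regs, stack):
    """Push all eight registers; SP is pushed with its original value."""
    original_sp = regs["SP"]
    for name in PUSH_ORDER:
        stack.append(original_sp if name == "SP" else regs[name])
        regs["SP"] = (regs["SP"] - 2) & 0xFFFF

def popaw(regs, stack):
    """Pop in reverse order, discarding the saved SP value."""
    for name in reversed(PUSH_ORDER):
        value = stack.pop()
        if name != "SP":
            regs[name] = value
        regs["SP"] = (regs["SP"] + 2) & 0xFFFF
```

After \c{pushaw}, \c{SP} has dropped by 16; a following \c{popaw}
restores every register, including \c{SP}, which returns to its old
value purely through the eight increments.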

\H{insPUSHF} \i\c{PUSHFx}: Push Flags Register

\c PUSHF                         ; 9C                   [8086]
\c PUSHFD                        ; o32 9C               [386]
\c PUSHFW                        ; o16 9C               [8086]

\c{PUSHFW} pushes the bottom 16 bits of the flags register (or the
whole flags register, on processors below a 386) onto the stack.
\c{PUSHFD} pushes the entire 32-bit flags register onto the stack.

\c{PUSHF} is an alias mnemonic for either \c{PUSHFW} or \c{PUSHFD},
depending on the current \c{BITS} setting.

See also \c{POPF} (\k{insPOPF}).

\H{insPXOR} \i\c{PXOR}: MMX Bitwise XOR

\c PXOR mmxreg,r/m64             ; 0F EF /r             [PENT,MMX]

\c{PXOR} performs a bitwise XOR operation between its two operands
(i.e. each bit of the result is 1 if and only if exactly one of the
corresponding bits of the two inputs was 1), and stores the result
in the destination (first) operand.

\H{insRCL} \i\c{RCL}, \i\c{RCR}: Bitwise Rotate through Carry Bit

\c RCL r/m8,1                    ; D0 /2                [8086]
\c RCL r/m8,CL                   ; D2 /2                [8086]
\c RCL r/m8,imm8                 ; C0 /2 ib             [286]
\c RCL r/m16,1                   ; o16 D1 /2            [8086]
\c RCL r/m16,CL                  ; o16 D3 /2            [8086]
\c RCL r/m16,imm8                ; o16 C1 /2 ib         [286]
\c RCL r/m32,1                   ; o32 D1 /2            [386]
\c RCL r/m32,CL                  ; o32 D3 /2            [386]
\c RCL r/m32,imm8                ; o32 C1 /2 ib         [386]

\c RCR r/m8,1                    ; D0 /3                [8086]
\c RCR r/m8,CL                   ; D2 /3                [8086]
\c RCR r/m8,imm8                 ; C0 /3 ib             [286]
\c RCR r/m16,1                   ; o16 D1 /3            [8086]
\c RCR r/m16,CL                  ; o16 D3 /3            [8086]
\c RCR r/m16,imm8                ; o16 C1 /3 ib         [286]
\c RCR r/m32,1                   ; o32 D1 /3            [386]
\c RCR r/m32,CL                  ; o32 D3 /3            [386]
\c RCR r/m32,imm8                ; o32 C1 /3 ib         [386]

\c{RCL} and \c{RCR} perform a 9-bit, 17-bit or 33-bit bitwise
rotation operation, involving the given source/destination (first)
operand and the carry bit. Thus, for example, in the operation
\c{RCL AL,1}, a 9-bit rotation is performed in which \c{AL} is
shifted left by 1, the top bit of \c{AL} moves into the carry flag,
and the original value of the carry flag is placed in the low bit of
\c{AL}.

The number of bits to rotate by is given by the second operand. Only
the bottom five bits of the rotation count are considered by
processors above the 8086.

You can force the longer (286 and upwards, beginning with a \c{C1}
byte) form of \c{RCL foo,1} by using a \c{BYTE} prefix: \c{RCL
foo,BYTE 1}. Similarly with \c{RCR}.
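
A bit-at-a-time Python model makes the role of the carry flag
explicit. The functions \c{rcl} and \c{rcr} below are hypothetical
helpers that return the new operand value together with the new carry
flag; the count is assumed to have been reduced already.

```python
def rcl(value, carry, count, width=8):
    """Rotate left through carry; returns (value, carry)."""
    for _ in range(count):
        top = (value >> (width - 1)) & 1
        value = ((value << 1) | carry) & ((1 << width) - 1)
        carry = top                  # old top bit becomes the carry
    return value, carry

def rcr(value, carry, count, width=8):
    """Rotate right through carry; returns (value, carry)."""
    for _ in range(count):
        low = value & 1
        value = (value >> 1) | (carry << (width - 1))
        carry = low                  # old low bit becomes the carry
    return value, carry
```

Since the carry flag takes part in the rotation, an 8-bit operand
only returns to its starting state after nine single-bit rotations.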

\H{insRDMSR} \i\c{RDMSR}: Read Model-Specific Registers

\c RDMSR                         ; 0F 32                [PENT]

\c{RDMSR} reads the processor Model-Specific Register (MSR) whose
index is stored in \c{ECX}, and stores the result in \c{EDX:EAX}.
See also \c{WRMSR} (\k{insWRMSR}).

\H{insRDPMC} \i\c{RDPMC}: Read Performance-Monitoring Counters

\c RDPMC                         ; 0F 33                [P6]

\c{RDPMC} reads the processor performance-monitoring counter whose
index is stored in \c{ECX}, and stores the result in \c{EDX:EAX}.

\H{insRDTSC} \i\c{RDTSC}: Read Time-Stamp Counter

\c RDTSC                         ; 0F 31                [PENT]

\c{RDTSC} reads the processor's time-stamp counter into \c{EDX:EAX}.

\H{insRET} \i\c{RET}, \i\c{RETF}, \i\c{RETN}: Return from Procedure Call

\c RET                           ; C3                   [8086]
\c RET imm16                     ; C2 iw                [8086]

\c RETF                          ; CB                   [8086]
\c RETF imm16                    ; CA iw                [8086]

\c RETN                          ; C3                   [8086]
\c RETN imm16                    ; C2 iw                [8086]

\c{RET}, and its exact synonym \c{RETN}, pop \c{IP} or \c{EIP} from
the stack and transfer control to the new address. Optionally, if a
numeric operand is provided, they increment the stack pointer
by a further \c{imm16} bytes after popping the return address.

\c{RETF} executes a far return: after popping \c{IP}/\c{EIP}, it
then pops \c{CS}, and \e{then} increments the stack pointer by the
optional argument if present.

\H{insROL} \i\c{ROL}, \i\c{ROR}: Bitwise Rotate

\c ROL r/m8,1                    ; D0 /0                [8086]
\c ROL r/m8,CL                   ; D2 /0                [8086]
\c ROL r/m8,imm8                 ; C0 /0 ib             [286]
\c ROL r/m16,1                   ; o16 D1 /0            [8086]
\c ROL r/m16,CL                  ; o16 D3 /0            [8086]
\c ROL r/m16,imm8                ; o16 C1 /0 ib         [286]
\c ROL r/m32,1                   ; o32 D1 /0            [386]
\c ROL r/m32,CL                  ; o32 D3 /0            [386]
\c ROL r/m32,imm8                ; o32 C1 /0 ib         [386]

\c ROR r/m8,1                    ; D0 /1                [8086]
\c ROR r/m8,CL                   ; D2 /1                [8086]
\c ROR r/m8,imm8                 ; C0 /1 ib             [286]
\c ROR r/m16,1                   ; o16 D1 /1            [8086]
\c ROR r/m16,CL                  ; o16 D3 /1            [8086]
\c ROR r/m16,imm8                ; o16 C1 /1 ib         [286]
\c ROR r/m32,1                   ; o32 D1 /1            [386]
\c ROR r/m32,CL                  ; o32 D3 /1            [386]
\c ROR r/m32,imm8                ; o32 C1 /1 ib         [386]

\c{ROL} and \c{ROR} perform a bitwise rotation operation on the given
source/destination (first) operand. Thus, for example, in the
operation \c{ROL AL,1}, an 8-bit rotation is performed in which
\c{AL} is shifted left by 1 and the original top bit of \c{AL} moves
round into the low bit.

The number of bits to rotate by is given by the second operand. Only
the bottom 3, 4 or 5 bits (depending on the source operand size) of
the rotation count are considered by processors above the 8086.

You can force the longer (286 and upwards, beginning with a \c{C1}
byte) form of \c{ROL foo,1} by using a \c{BYTE} prefix: \c{ROL
foo,BYTE 1}. Similarly with \c{ROR}.
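
For comparison with the rotate-through-carry instructions, a plain
rotation can be modelled in Python as below. The names \c{rol} and
\c{ror} are hypothetical helpers; the effect on the flags is not
modelled.

```python
def rol(value, count, width=8):
    """Rotate `value` left by `count` bits within `width` bits."""
    count %= width
    mask = (1 << width) - 1
    return ((value << count) | (value >> (width - count))) & mask

def ror(value, count, width=8):
    """Rotate `value` right by `count` bits within `width` bits."""
    count %= width
    mask = (1 << width) - 1
    return ((value >> count) | (value << (width - count))) & mask
```

For instance, rotating \c{0x81} left by one bit gives \c{0x03}, and
rotating it right by one bit gives \c{0xC0}.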

\H{insRSM} \i\c{RSM}: Resume from System-Management Mode

\c RSM                           ; 0F AA                [PENT]

\c{RSM} returns the processor to its normal operating mode when it
was in System-Management Mode.

\H{insSAHF} \i\c{SAHF}: Store AH to Flags

\c SAHF                          ; 9E                   [8086]

\c{SAHF} sets the low byte of the flags word according to the
contents of the \c{AH} register. See also \c{LAHF} (\k{insLAHF}).

\H{insSAL} \i\c{SAL}, \i\c{SAR}: Bitwise Arithmetic Shifts

\c SAL r/m8,1                    ; D0 /4                [8086]
\c SAL r/m8,CL                   ; D2 /4                [8086]
\c SAL r/m8,imm8                 ; C0 /4 ib             [286]
\c SAL r/m16,1                   ; o16 D1 /4            [8086]
\c SAL r/m16,CL                  ; o16 D3 /4            [8086]
\c SAL r/m16,imm8                ; o16 C1 /4 ib         [286]
\c SAL r/m32,1                   ; o32 D1 /4            [386]
\c SAL r/m32,CL                  ; o32 D3 /4            [386]
\c SAL r/m32,imm8                ; o32 C1 /4 ib         [386]

\c SAR r/m8,1                    ; D0 /7                [8086]
\c SAR r/m8,CL                   ; D2 /7                [8086]
\c SAR r/m8,imm8                 ; C0 /7 ib             [286]
\c SAR r/m16,1                   ; o16 D1 /7            [8086]
\c SAR r/m16,CL                  ; o16 D3 /7            [8086]
\c SAR r/m16,imm8                ; o16 C1 /7 ib         [286]
\c SAR r/m32,1                   ; o32 D1 /7            [386]
\c SAR r/m32,CL                  ; o32 D3 /7            [386]
\c SAR r/m32,imm8                ; o32 C1 /7 ib         [386]

\c{SAL} and \c{SAR} perform an arithmetic shift operation on the given
source/destination (first) operand. The vacated bits are filled with
zero for \c{SAL}, and with copies of the original high bit of the
source operand for \c{SAR}.

\c{SAL} is a synonym for \c{SHL} (see \k{insSHL}). NASM will
assemble either one to the same code, but NDISASM will always
disassemble that code as \c{SHL}.

The number of bits to shift by is given by the second operand. Only
the bottom five bits of the shift count are considered by processors
above the 8086, regardless of the operand size.

You can force the longer (286 and upwards, beginning with a \c{C1}
byte) form of \c{SAL foo,1} by using a \c{BYTE} prefix: \c{SAL
foo,BYTE 1}. Similarly with \c{SAR}.
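
The difference in how the vacated bits are filled can be sketched in
Python. The helpers \c{sal} and \c{sar} are hypothetical names, the
flags are not modelled, and the count is assumed to be smaller than
the operand width.

```python
def sal(value, count, width=8):
    """Arithmetic (equivalently, logical) left shift: zeros enter at the bottom."""
    return (value << count) & ((1 << width) - 1)

def sar(value, count, width=8):
    """Arithmetic right shift: copies of the sign bit enter at the top."""
    if value & (1 << (width - 1)):
        value -= 1 << width          # reinterpret as a negative integer
    return (value >> count) & ((1 << width) - 1)
```

Treated as signed numbers, \c{sar} therefore divides by a power of
two and rounds towards minus infinity: \c{0x81} (-127) shifted right
by one gives \c{0xC0} (-64).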

\H{insSALC} \i\c{SALC}: Set AL from Carry Flag

\c SALC                          ; D6                   [8086,UNDOC]

\c{SALC} is an early undocumented instruction similar in concept to
\c{SETcc} (\k{insSETcc}). Its function is to set \c{AL} to zero if
the carry flag is clear, or to \c{0xFF} if it is set.

\H{insSBB} \i\c{SBB}: Subtract with Borrow

\c SBB r/m8,reg8                 ; 18 /r                [8086]
\c SBB r/m16,reg16               ; o16 19 /r            [8086]
\c SBB r/m32,reg32               ; o32 19 /r            [386]

\c SBB reg8,r/m8                 ; 1A /r                [8086]
\c SBB reg16,r/m16               ; o16 1B /r            [8086]
\c SBB reg32,r/m32               ; o32 1B /r            [386]

\c SBB r/m8,imm8                 ; 80 /3 ib             [8086]
\c SBB r/m16,imm16               ; o16 81 /3 iw         [8086]
\c SBB r/m32,imm32               ; o32 81 /3 id         [386]

\c SBB r/m16,imm8                ; o16 83 /3 ib         [8086]
\c SBB r/m32,imm8                ; o32 83 /3 ib         [386]

\c SBB AL,imm8                   ; 1C ib                [8086]
\c SBB AX,imm16                  ; o16 1D iw            [8086]
\c SBB EAX,imm32                 ; o32 1D id            [386]

\c{SBB} performs integer subtraction: it subtracts its second
operand, plus the value of the carry flag, from its first, and
leaves the result in its destination (first) operand. The flags are
set according to the result of the operation: in particular, the
carry flag is affected and can be used by a subsequent \c{SBB}
instruction.

In the forms with an 8-bit immediate second operand and a longer
first operand, the second operand is considered to be signed, and is
sign-extended to the length of the first operand. In these cases,
the \c{BYTE} qualifier is necessary to force NASM to generate this
form of the instruction.

To subtract one number from another without also subtracting the
contents of the carry flag, use \c{SUB} (\k{insSUB}).
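
A typical use is multi-word subtraction: \c{SUB} on the low words,
then \c{SBB} on the high words so that the borrow propagates. The
Python sketch below models this for a 64-bit value held in two 32-bit
halves; \c{sub}, \c{sbb} and \c{sub64} are hypothetical helpers that
return the result together with the resulting carry flag.

```python
def sub(a, b, width=32):
    """Model of SUB: returns (a - b, borrow out)."""
    return (a - b) & ((1 << width) - 1), int(a < b)

def sbb(a, b, carry, width=32):
    """Model of SBB: returns (a - (b + carry), borrow out)."""
    return (a - b - carry) & ((1 << width) - 1), int(a < b + carry)

def sub64(a_hi, a_lo, b_hi, b_lo):
    """Subtract two 64-bit numbers given as 32-bit halves."""
    lo, carry = sub(a_lo, b_lo)          # SUB on the low doublewords
    hi, carry = sbb(a_hi, b_hi, carry)   # SBB on the high doublewords
    return hi, lo, carry
```

Subtracting 1 from \c{0x0000000100000000} borrows from the high
half, leaving \c{0x00000000FFFFFFFF} with the carry flag clear.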

\H{insSCASB} \i\c{SCASB}, \i\c{SCASW}, \i\c{SCASD}: Scan String

\c SCASB                         ; AE                   [8086]
\c SCASW                         ; o16 AF               [8086]
\c SCASD                         ; o32 AF               [386]

\c{SCASB} compares the byte in \c{AL} with the byte at \c{[ES:DI]}
or \c{[ES:EDI]}, and sets the flags accordingly. It then increments
or decrements (depending on the direction flag: increments if the
flag is clear, decrements if it is set) \c{DI} (or \c{EDI}).

The register used is \c{DI} if the address size is 16 bits, and
\c{EDI} if it is 32 bits. If you need to use an address size not
equal to the current \c{BITS} setting, you can use an explicit
\i\c{a16} or \i\c{a32} prefix.

Segment override prefixes have no effect for this instruction: the
use of \c{ES} for the load from \c{[DI]} or \c{[EDI]} cannot be
overridden.

\c{SCASW} and \c{SCASD} work in the same way, but they compare a
word to \c{AX} or a doubleword to \c{EAX} instead of a byte to
\c{AL}, and increment or decrement the addressing registers by 2 or
4 instead of 1.

The \c{REPE} and \c{REPNE} prefixes (equivalently, \c{REPZ} and
\c{REPNZ}) may be used to repeat the instruction up to \c{CX} (or
\c{ECX} - again, the address size chooses which) times until the
first unequal or equal byte is found.
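
As a sketch of how the repeat prefix interacts with the comparison,
the following Python function models \c{REPNE SCASB}. The name
\c{repne_scasb} is hypothetical, memory is a \c{bytes} object indexed
by \c{DI}, and the zero flag is assumed to start clear (the real
instruction leaves the flags untouched when the count is zero).

```python
def repne_scasb(memory, di, al, cx, direction_flag=0):
    """Scan for the byte `al`; returns (di, cx, zf) afterwards."""
    step = -1 if direction_flag else 1
    zf = 0
    while cx != 0:
        zf = int(memory[di] == al)   # compare AL with the byte at [ES:DI]
        di += step                   # DI moves even on the matching byte
        cx -= 1
        if zf:                       # REPNE stops at the first equal byte
            break
    return di, cx, zf
```

Scanning \c{b"hello\\0world"} for a zero byte with \c{CX} set to
\c{0xFFFF} stops with \c{DI} pointing one byte past the terminator,
so the string length can be recovered as \c{0xFFFF - CX - 1}.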

\H{insSETcc} \i\c{SETcc}: Set Register from Condition

\c SETcc r/m8                    ; 0F 90+cc /2          [386]

\c{SETcc} sets the given 8-bit operand to zero if its condition is
not satisfied, and to 1 if it is.

\H{insSGDT} \i\c{SGDT}, \i\c{SIDT}, \i\c{SLDT}: Store Descriptor Table Pointers

\c SGDT mem                      ; 0F 01 /0             [286,PRIV]
\c SIDT mem                      ; 0F 01 /1             [286,PRIV]
\c SLDT r/m16                    ; 0F 00 /0             [286,PRIV]

\c{SGDT} and \c{SIDT} both take a 6-byte memory area as an operand:
they store the contents of the GDTR (global descriptor table
register) or IDTR (interrupt descriptor table register) into that
area as a 16-bit size limit followed by a 32-bit linear address.
These are the only instructions which directly use \e{linear}
addresses, rather than segment/offset pairs.

\c{SLDT} stores the segment selector corresponding to the LDT (local
descriptor table) into the given operand.

See also \c{LGDT}, \c{LIDT} and \c{LLDT} (\k{insLGDT}).

\H{insSHL} \i\c{SHL}, \i\c{SHR}: Bitwise Logical Shifts

\c SHL r/m8,1                    ; D0 /4                [8086]
\c SHL r/m8,CL                   ; D2 /4                [8086]
\c SHL r/m8,imm8                 ; C0 /4 ib             [286]
\c SHL r/m16,1                   ; o16 D1 /4            [8086]
\c SHL r/m16,CL                  ; o16 D3 /4            [8086]
\c SHL r/m16,imm8                ; o16 C1 /4 ib         [286]
\c SHL r/m32,1                   ; o32 D1 /4            [386]
\c SHL r/m32,CL                  ; o32 D3 /4            [386]
\c SHL r/m32,imm8                ; o32 C1 /4 ib         [386]

\c SHR r/m8,1                    ; D0 /5                [8086]
\c SHR r/m8,CL                   ; D2 /5                [8086]
\c SHR r/m8,imm8                 ; C0 /5 ib             [286]
\c SHR r/m16,1                   ; o16 D1 /5            [8086]
\c SHR r/m16,CL                  ; o16 D3 /5            [8086]
\c SHR r/m16,imm8                ; o16 C1 /5 ib         [286]
\c SHR r/m32,1                   ; o32 D1 /5            [386]
\c SHR r/m32,CL                  ; o32 D3 /5            [386]
\c SHR r/m32,imm8                ; o32 C1 /5 ib         [386]

\c{SHL} and \c{SHR} perform a logical shift operation on the given
source/destination (first) operand. The vacated bits are filled with
zero.

A synonym for \c{SHL} is \c{SAL} (see \k{insSAL}). NASM will
assemble either one to the same code, but NDISASM will always
disassemble that code as \c{SHL}.

The number of bits to shift by is given by the second operand. Only
the bottom five bits of the shift count are considered by processors
above the 8086, regardless of the operand size.

You can force the longer (286 and upwards, beginning with a \c{C1}
byte) form of \c{SHL foo,1} by using a \c{BYTE} prefix: \c{SHL
foo,BYTE 1}. Similarly with \c{SHR}.

\H{insSHLD} \i\c{SHLD}, \i\c{SHRD}: Bitwise Double-Precision Shifts

\c SHLD r/m16,reg16,imm8         ; o16 0F A4 /r ib      [386]
\c SHLD r/m32,reg32,imm8         ; o32 0F A4 /r ib      [386]
\c SHLD r/m16,reg16,CL           ; o16 0F A5 /r         [386]
\c SHLD r/m32,reg32,CL           ; o32 0F A5 /r         [386]

\c SHRD r/m16,reg16,imm8         ; o16 0F AC /r ib      [386]
\c SHRD r/m32,reg32,imm8         ; o32 0F AC /r ib      [386]
\c SHRD r/m16,reg16,CL           ; o16 0F AD /r         [386]
\c SHRD r/m32,reg32,CL           ; o32 0F AD /r         [386]

\c{SHLD} performs a double-precision left shift. It notionally places
its second operand to the right of its first, then shifts the entire
bit string thus generated to the left by a number of bits specified
in the third operand. It then updates only the \e{first} operand
according to the result of this. The second operand is not modified.

\c{SHRD} performs the corresponding right shift: it notionally
places the second operand to the \e{left} of the first, shifts the
whole bit string right, and updates only the first operand.

For example, if \c{EAX} holds \c{0x01234567} and \c{EBX} holds
\c{0x89ABCDEF}, then the instruction \c{SHLD EAX,EBX,4} would update
\c{EAX} to hold \c{0x12345678}. Under the same conditions, \c{SHRD
EAX,EBX,4} would update \c{EAX} to hold \c{0xF0123456}.

The number of bits to shift by is given by the third operand. Only
the bottom 5 bits of the shift count are considered.
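
A typical use, sketched here on the assumption that a 64-bit value is
held in \c{EDX:EAX}, is to shift a quantity wider than one register.
\c{SHLD} must come first, because it reads \c{EAX} without modifying
it:

\c         shld    edx,eax,4       ; high half: EDX shifted left by 4, with
\c                                 ;  the top 4 bits of EAX filling the gap
\c         shl     eax,4           ; low half: ordinary shift, zeros enter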

\H{insSMI} \i\c{SMI}: System Management Interrupt

\c SMI                           ; F1                   [386,UNDOC]

This is an opcode apparently supported by some AMD processors (which
is why it can generate the same opcode as \c{INT1}), and places the
machine into system-management mode, a special debugging mode.

\H{insSMSW} \i\c{SMSW}: Store Machine Status Word

\c SMSW r/m16                    ; 0F 01 /4             [286,PRIV]

\c{SMSW} stores the bottom half of the \c{CR0} control register (or
the Machine Status Word, on 286 processors) into the destination
operand. See also \c{LMSW} (\k{insLMSW}).
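
For instance, one possible way to find out whether the processor is
in protected mode is to examine bit 0 (the PE bit) of the stored
value. The label \c{in_pmode} is an assumption made for this sketch:

\c         smsw    ax              ; AX = low 16 bits of CR0 (or the MSW)
\c         test    al,1            ; bit 0 is PE, protection enable
\c         jnz     in_pmode        ; non-zero: protected mode is active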

\H{insSTC} \i\c{STC}, \i\c{STD}, \i\c{STI}: Set Flags

\c STC                           ; F9                   [8086]
\c STD                           ; FD                   [8086]
\c STI                           ; FB                   [8086]

These instructions set various flags. \c{STC} sets the carry flag;
\c{STD} sets the direction flag; and \c{STI} sets the interrupt flag
(thus enabling interrupts).

To clear the carry, direction, or interrupt flags, use the \c{CLC},
\c{CLD} and \c{CLI} instructions (\k{insCLC}). To invert the carry
flag, use \c{CMC} (\k{insCMC}).
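
One common convention, shown here only as a sketch with a
hypothetical routine name, is to return a success or failure
indication to the caller in the carry flag:

\c check_digit:                    ; hypothetical: is AL an ASCII digit?
\c         cmp     al,'0'
\c         jb      .bad
\c         cmp     al,'9'
\c         ja      .bad
\c         clc                     ; CF = 0: AL is a digit
\c         ret
\c .bad:   stc                     ; CF = 1: AL is not a digit
\c         ret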

\H{insSTOSB} \i\c{STOSB}, \i\c{STOSW}, \i\c{STOSD}: Store Byte to String

\c STOSB                         ; AA                   [8086]
\c STOSW                         ; o16 AB               [8086]
\c STOSD                         ; o32 AB               [386]

\c{STOSB} stores the byte in \c{AL} at \c{[ES:DI]} or \c{[ES:EDI]}.
It does not affect the flags. It then increments or decrements
(depending on the direction flag: increments if the flag is clear,
decrements if it is set) \c{DI} (or \c{EDI}).

The register used is \c{DI} if the address size is 16 bits, and
\c{EDI} if it is 32 bits. If you need to use an address size not
equal to the current \c{BITS} setting, you can use an explicit
\i\c{a16} or \i\c{a32} prefix.

Segment override prefixes have no effect for this instruction: the
use of \c{ES} for the store to \c{[DI]} or \c{[EDI]} cannot be
overridden.

\c{STOSW} and \c{STOSD} work in the same way, but they store the
word in \c{AX} or the doubleword in \c{EAX} instead of the byte in
\c{AL}, and increment or decrement the addressing registers by 2 or
4 instead of 1.

The \c{REP} prefix may be used to repeat the instruction \c{CX} (or
\c{ECX} - again, the address size chooses which) times.
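
The following sketch fills a buffer with zeros. It assumes 32-bit
code, an \c{ES} that already addresses the data segment, and a label
\c{buffer} (not defined here) marking at least 256 bytes of storage:

\c         cld                     ; clear DF so that EDI counts upwards
\c         mov     edi,buffer      ; ES:EDI -> start of the destination
\c         xor     eax,eax         ; value to store: zero
\c         mov     ecx,64          ; 64 doublewords = 256 bytes
\c         rep stosd               ; store EAX at [ES:EDI], ECX times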

\H{insSTR} \i\c{STR}: Store Task Register

\c STR r/m16                     ; 0F 00 /1             [286,PRIV]

\c{STR} stores the segment selector corresponding to the contents of
the Task Register into its operand.

\H{insSUB} \i\c{SUB}: Subtract Integers

\c SUB r/m8,reg8                 ; 28 /r                [8086]
\c SUB r/m16,reg16               ; o16 29 /r            [8086]
\c SUB r/m32,reg32               ; o32 29 /r            [386]

\c SUB reg8,r/m8                 ; 2A /r                [8086]
\c SUB reg16,r/m16               ; o16 2B /r            [8086]
\c SUB reg32,r/m32               ; o32 2B /r            [386]

\c SUB r/m8,imm8                 ; 80 /5 ib             [8086]
\c SUB r/m16,imm16               ; o16 81 /5 iw         [8086]
\c SUB r/m32,imm32               ; o32 81 /5 id         [386]

\c SUB r/m16,imm8                ; o16 83 /5 ib         [8086]
\c SUB r/m32,imm8                ; o32 83 /5 ib         [386]

\c SUB AL,imm8                   ; 2C ib                [8086]
\c SUB AX,imm16                  ; o16 2D iw            [8086]
\c SUB EAX,imm32                 ; o32 2D id            [386]

\c{SUB} performs integer subtraction: it subtracts its second
operand from its first, and leaves the result in its destination
(first) operand. The flags are set according to the result of the
operation: in particular, the carry flag is affected and can be used
by a subsequent \c{SBB} instruction (\k{insSBB}).

In the forms with an 8-bit immediate second operand and a longer
first operand, the second operand is considered to be signed, and is
sign-extended to the length of the first operand. In these cases,
the \c{BYTE} qualifier is necessary to force NASM to generate this
form of the instruction.
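
As an illustration (the register assignments are assumptions made for
this sketch), a 64-bit subtraction of \c{EBX:ECX} from \c{EDX:EAX}
chains \c{SUB} and \c{SBB} through the carry flag. The last line
shows the \c{BYTE} qualifier selecting the sign-extended form:

\c         sub     eax,ecx         ; low halves; CF = borrow out
\c         sbb     edx,ebx         ; high halves, minus that borrow
\c         sub     esp,byte 16     ; assembles to the 83 /5 ib form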

\H{insTEST} \i\c{TEST}: Test Bits (notional bitwise AND)

\c TEST r/m8,reg8                ; 84 /r                [8086]
\c TEST r/m16,reg16              ; o16 85 /r            [8086]
\c TEST r/m32,reg32              ; o32 85 /r            [386]

\c TEST r/m8,imm8                ; F6 /0 ib             [8086]
\c TEST r/m16,imm16              ; o16 F7 /0 iw         [8086]
\c TEST r/m32,imm32              ; o32 F7 /0 id         [386]

\c TEST AL,imm8                  ; A8 ib                [8086]
\c TEST AX,imm16                 ; o16 A9 iw            [8086]
\c TEST EAX,imm32                ; o32 A9 id            [386]

\c{TEST} performs a notional bitwise AND of its two operands, and
affects the flags as if the operation had taken place, but does not
store the result of the operation anywhere.
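
Two common uses are sketched below; the labels \c{is_odd} and
\c{is_zero} are assumed to be defined elsewhere:

\c         test    al,1            ; AND AL with 1, discard the result
\c         jnz     is_odd          ; ZF clear: bit 0 of AL was set
\c         test    eax,eax         ; AND EAX with itself
\c         jz      is_zero         ; ZF set: EAX is zero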

\H{insUMOV} \i\c{UMOV}: User Move Data

\c UMOV r/m8,reg8                ; 0F 10 /r             [386,UNDOC]
\c UMOV r/m16,reg16              ; o16 0F 11 /r         [386,UNDOC]
\c UMOV r/m32,reg32              ; o32 0F 11 /r         [386,UNDOC]

\c UMOV reg8,r/m8                ; 0F 12 /r             [386,UNDOC]
\c UMOV reg16,r/m16              ; o16 0F 13 /r         [386,UNDOC]
\c UMOV reg32,r/m32              ; o32 0F 13 /r         [386,UNDOC]

This undocumented instruction is used by in-circuit emulators to
access user memory (as opposed to host memory). It is used just like
an ordinary memory/register or register/register \c{MOV}
instruction, but accesses user space.

\H{insVERR} \i\c{VERR}, \i\c{VERW}: Verify Segment Readability/Writability

\c VERR r/m16                    ; 0F 00 /4             [286,PRIV]

\c VERW r/m16                    ; 0F 00 /5             [286,PRIV]

\c{VERR} sets the zero flag if the segment specified by the selector
in its operand can be read from at the current privilege level.
\c{VERW} sets the zero flag if the segment can be written.
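
A minimal sketch of checking a selector before using it, assuming a
label \c{not_readable} that handles the failure case:

\c         mov     ax,ds           ; selector to check
\c         verr    ax              ; ZF = 1 if the segment is readable
\c         jnz     not_readable    ; ZF = 0: do not read through it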

\H{insWAIT} \i\c{WAIT}: Wait for Floating-Point Processor

\c WAIT                          ; 9B                   [8086]

\c{WAIT}, on 8086 systems with a separate 8087 FPU, waits for the
FPU to have finished any operation it is engaged in before
continuing main processor operations, so that (for example) an FPU
store to main memory can be guaranteed to have completed before the
CPU tries to read the result back out.

On higher processors, \c{WAIT} is unnecessary for this purpose, and
it has the alternative purpose of ensuring that any pending unmasked
FPU exceptions have happened before execution continues.
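
The 8086/8087 case might look like the following sketch, in which
\c{result} is an assumed word-sized variable:

\c         fistp   word [result]   ; FPU converts ST0 and stores an integer
\c         wait                    ; let the 8087 finish the store
\c         mov     ax,[result]     ; now safe to read the value back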

\H{insWBINVD} \i\c{WBINVD}: Write Back and Invalidate Cache

\c WBINVD                        ; 0F 09                [486]

\c{WBINVD} invalidates and empties the processor's internal caches,
and causes the processor to instruct external caches to do the same.
It writes the contents of the caches back to memory first, so no
data is lost. To flush the caches quickly without bothering to write
the data back first, use \c{INVD} (\k{insINVD}).

\H{insWRMSR} \i\c{WRMSR}: Write Model-Specific Registers

\c WRMSR                         ; 0F 30                [PENT]

\c{WRMSR} writes the value in \c{EDX:EAX} to the processor
Model-Specific Register (MSR) whose index is stored in \c{ECX}. See
also \c{RDMSR} (\k{insRDMSR}).

\H{insXADD} \i\c{XADD}: Exchange and Add

\c XADD r/m8,reg8                ; 0F C0 /r             [486]
\c XADD r/m16,reg16              ; o16 0F C1 /r         [486]
\c XADD r/m32,reg32              ; o32 0F C1 /r         [486]

\c{XADD} exchanges the values in its two operands, and then adds
them together and writes the result into the destination (first)
operand. This instruction can be used with a \c{LOCK} prefix for
multi-processor synchronisation purposes.
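
In practice the effect is a fetch-and-add: the destination receives
the sum, and the source register receives the old value of the
destination. A sketch, assuming a doubleword variable \c{counter}
shared between processors:

\c         mov     eax,1           ; amount to add
\c         lock xadd [counter],eax ; [counter] = [counter] + 1, and
\c                                 ;  EAX = previous value of [counter]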

\H{insXBTS} \i\c{XBTS}: Extract Bit String

\c XBTS reg16,r/m16              ; o16 0F A6 /r         [386,UNDOC]
\c XBTS reg32,r/m32              ; o32 0F A6 /r         [386,UNDOC]

No clear documentation seems to be available for this instruction:
the best I've been able to find reads `Takes a string of bits from
the first operand and puts them in the second operand'. It is
present only in early 386 processors, and conflicts with the opcodes
for \c{CMPXCHG486}. NASM supports it only for completeness. Its
counterpart is \c{IBTS} (see \k{insIBTS}).

\H{insXCHG} \i\c{XCHG}: Exchange

\c XCHG reg8,r/m8                ; 86 /r                [8086]
\c XCHG reg16,r/m16              ; o16 87 /r            [8086]
\c XCHG reg32,r/m32              ; o32 87 /r            [386]

\c XCHG r/m8,reg8                ; 86 /r                [8086]
\c XCHG r/m16,reg16              ; o16 87 /r            [8086]
\c XCHG r/m32,reg32              ; o32 87 /r            [386]

\c XCHG AX,reg16                 ; o16 90+r             [8086]
\c XCHG EAX,reg32                ; o32 90+r             [386]
\c XCHG reg16,AX                 ; o16 90+r             [8086]
\c XCHG reg32,EAX                ; o32 90+r             [386]

\c{XCHG} exchanges the values in its two operands. It can be used
with a \c{LOCK} prefix for purposes of multi-processor
synchronisation.

\c{XCHG AX,AX} or \c{XCHG EAX,EAX} (depending on the \c{BITS}
setting) generates the opcode \c{90h}, and so is a synonym for
\c{NOP} (\k{insNOP}).
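
One possible use of \c{XCHG} for synchronisation is a simple spin
lock. This is only a sketch: \c{lock_var} is an assumed byte variable
that holds 0 when the lock is free and 1 when it is held.

\c acquire:
\c         mov     al,1
\c         xchg    al,[lock_var]   ; AL = old value, [lock_var] = 1
\c         test    al,al
\c         jnz     acquire         ; it was already held: try again
\c         ; ... critical section goes here ...
\c         mov     byte [lock_var],0 ; release the lock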

\H{insXLATB} \i\c{XLATB}: Translate Byte in Lookup Table

\c XLATB                         ; D7                   [8086]

\c{XLATB} adds the value in \c{AL}, treated as an unsigned byte, to
\c{BX} or \c{EBX}, and loads the byte from the resulting address (in
the segment specified by \c{DS}) back into \c{AL}.

The base register used is \c{BX} if the address size is 16 bits, and
\c{EBX} if it is 32 bits. If you need to use an address size not
equal to the current \c{BITS} setting, you can use an explicit
\i\c{a16} or \i\c{a32} prefix.

The segment register used to load from \c{[BX+AL]} or \c{[EBX+AL]}
can be overridden by using a segment register name as a prefix (for
example, \c{es xlatb}).
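
As a sketch, the following converts a value from 0 to 15 in \c{AL}
into the corresponding ASCII hexadecimal digit. It assumes 32-bit
code, and the table name \c{hextab} is chosen for this example:

\c         mov     ebx,hextab      ; DS:EBX -> lookup table
\c         and     al,0Fh          ; keep AL in the range 0 to 15
\c         xlatb                   ; AL = [EBX+AL], so 10 becomes 'A'
\c         ; elsewhere, in the data section:
\c hextab  db      '0123456789ABCDEF'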

\H{insXOR} \i\c{XOR}: Bitwise Exclusive OR

\c XOR r/m8,reg8                 ; 30 /r                [8086]
\c XOR r/m16,reg16               ; o16 31 /r            [8086]
\c XOR r/m32,reg32               ; o32 31 /r            [386]

\c XOR reg8,r/m8                 ; 32 /r                [8086]
\c XOR reg16,r/m16               ; o16 33 /r            [8086]
\c XOR reg32,r/m32               ; o32 33 /r            [386]

\c XOR r/m8,imm8                 ; 80 /6 ib             [8086]
\c XOR r/m16,imm16               ; o16 81 /6 iw         [8086]
\c XOR r/m32,imm32               ; o32 81 /6 id         [386]

\c XOR r/m16,imm8                ; o16 83 /6 ib         [8086]
\c XOR r/m32,imm8                ; o32 83 /6 ib         [386]

\c XOR AL,imm8                   ; 34 ib                [8086]
\c XOR AX,imm16                  ; o16 35 iw            [8086]
\c XOR EAX,imm32                 ; o32 35 id            [386]

\c{XOR} performs a bitwise XOR operation between its two operands
(i.e. each bit of the result is 1 if and only if exactly one of the
corresponding bits of the two inputs was 1), and stores the result
in the destination (first) operand.

In the forms with an 8-bit immediate second operand and a longer
first operand, the second operand is considered to be signed, and is
sign-extended to the length of the first operand. In these cases,
the \c{BYTE} qualifier is necessary to force NASM to generate this
form of the instruction.
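
A few common idioms, given here as a sketch with arbitrarily chosen
registers:

\c         xor     eax,eax         ; EAX = 0, whatever it held before
\c         xor     bl,20h          ; flip bit 5 of BL (this toggles the
\c                                 ;  case of an ASCII letter)
\c         xor     ecx,byte -1     ; imm8 sign-extended to FFFFFFFFh:
\c                                 ;  inverts every bit of ECX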

The MMX instruction \c{PXOR} (see \k{insPXOR}) performs the same
operation on the 64-bit MMX registers.