Discussion:
[gem5-users] REX prefix implementation in x86
Abhishek Singh
2018-11-01 16:51:38 UTC
Hello Everyone,

I want to add a new implementation of the MOV instruction that uses the R11
register. My new opcodes are placed in two_byte.isa, and I have duplicated
the 'mov' functionality from move.py and ldstop.isa.

My question: I understand how to decode the opcode itself. For example, if
the new opcode is '0x11', I take the top 5 bits and then the bottom 3 bits
to write the case entries in two_byte.isa.
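A minimal Python sketch of the split described above, assuming (as the question states) that two_byte.isa switches first on the top 5 bits of the opcode byte and then on the bottom 3 bits. `split_opcode` is a hypothetical helper for illustration, not part of gem5.

```python
def split_opcode(opcode: int) -> tuple[int, int]:
    """Split an opcode byte into (top 5 bits, bottom 3 bits)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must be a single byte")
    top5 = opcode >> 3       # selects the outer case
    bottom3 = opcode & 0x7   # selects the inner case
    return top5, bottom3


if __name__ == "__main__":
    top5, bottom3 = split_opcode(0x11)
    print(f"outer case 0x{top5:02X}, inner case 0x{bottom3:X}")
```

For 0x11 this gives outer case 0x02 and inner case 0x1.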

What I don't understand is how to make sure the new instruction handles the
REX prefix the same way MOV does.


For example:
In the case of 8 bits:

*41* 8a 03 mov (%r11),%al
*41* 0f xx 03 new_mov (%r11),%al

In the case of 16 bits:

*66 41* 8b 03 mov (%r11),%ax
*66 41* 0f xx 03 new_mov (%r11),%ax

In the case of 32 bits:

*41* 8b 03 mov (%r11),%eax
*41* 0f xx 03 new_mov (%r11),%eax

In the case of 64 bits:

*49* 8b 03 mov (%r11),%rax
*49* 0f xx 03 new_mov (%r11),%rax

Numbers in bold are prefix bytes (41 and 49 are REX prefixes; 66 is the
operand-size override prefix), and xx is the new opcode.
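To see what the prefix bytes in these examples encode, here is a small Python sketch that splits a REX byte (0x40-0x4F, valid only in 64-bit mode) into its W, R, X and B bits. `decode_rex` and `Rex` are illustrative names, not gem5 code.

```python
from typing import NamedTuple, Optional


class Rex(NamedTuple):
    w: int  # 1 = 64-bit operand size
    r: int  # extends the ModRM reg field
    x: int  # extends the SIB index field
    b: int  # extends ModRM r/m, SIB base, or the opcode reg field


def decode_rex(byte: int) -> Optional[Rex]:
    """Return the REX fields if byte is a REX prefix (0x40-0x4F), else None.

    Only meaningful in 64-bit mode; in 32-bit mode 0x40-0x4F are INC/DEC.
    """
    if (byte & 0xF0) != 0x40:
        return None
    return Rex(w=(byte >> 3) & 1, r=(byte >> 2) & 1,
               x=(byte >> 1) & 1, b=byte & 1)


if __name__ == "__main__":
    for prefix in (0x41, 0x49, 0x66):
        print(f"0x{prefix:02X}: {decode_rex(prefix)}")
```

0x41 sets only REX.B, so the ModRM r/m value 011 selects r11 instead of rbx; 0x49 additionally sets REX.W for a 64-bit operand; 0x66 is not a REX byte, so the sketch returns None for it.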

Gabe, or anyone else: do you have any information on this?


Best regards,

Abhishek
Gabe Black
2018-11-01 22:25:17 UTC
Hi Abhishek. In gem5 in general, but particularly in x86, decoding happens
in two steps. The predecoder reads in the bytes which are in memory, applies
context to them (operating mode, various global settings like address
sizes), and translates them into a canonical structure called an
ExtMachInst. In x86, that step gathers up all the prefixes, opcode bytes,
etc., and stores them in the ExtMachInst. When an instruction is specified
in the decoder, it has some parameters which specify what format its
operands come in. That's useful if, for instance, the basic functionality
of the instruction is the same but in different scenarios it takes register
indices from different parts of the encoding. If that flavor of operand is
defined to include bits from the REX prefix, then those bits will be
factored in when that instruction is set up. The format of those specifiers
is modeled after an encoding you'll find in the AMD architecture manuals,
where it serves a similar purpose, and you can look there to get an idea of
what a particular specifier means.
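The two-step flow described above can be illustrated with a heavily simplified Python predecoder that collects legacy prefixes, a REX byte, opcode bytes and the ModRM byte into one record. `ExtMachInstSketch` and `predecode` are hypothetical; gem5's real ExtMachInst is a C++ structure with its own fields, and this sketch assumes 64-bit mode and skips SIB, displacements, immediates and several prefix types.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Subset of legacy prefixes: operand size, address size, REPNE, REP.
LEGACY_PREFIXES = {0x66, 0x67, 0xF2, 0xF3}


@dataclass
class ExtMachInstSketch:
    """Hypothetical stand-in for gem5's ExtMachInst (not its real layout)."""
    legacy: List[int] = field(default_factory=list)
    rex: Optional[int] = None
    opcode: List[int] = field(default_factory=list)
    modrm: Optional[int] = None


def predecode(raw: bytes) -> ExtMachInstSketch:
    """Gather prefixes, opcode bytes and ModRM, assuming 64-bit mode.

    Simplifications: segment/LOCK prefixes, SIB bytes, displacements,
    immediates and three-byte opcodes are not handled, and every
    opcode is assumed to take a ModRM byte.
    """
    emi = ExtMachInstSketch()
    i = 0
    while i < len(raw) and raw[i] in LEGACY_PREFIXES:
        emi.legacy.append(raw[i])
        i += 1
    # A REX prefix must immediately precede the opcode.
    if i < len(raw) and (raw[i] & 0xF0) == 0x40:
        emi.rex = raw[i]
        i += 1
    if i < len(raw) and raw[i] == 0x0F:  # two-byte opcode escape
        emi.opcode.append(raw[i])
        i += 1
    if i >= len(raw):
        raise ValueError("truncated instruction: missing opcode")
    emi.opcode.append(raw[i])
    i += 1
    if i >= len(raw):
        raise ValueError("truncated instruction: missing ModRM")
    emi.modrm = raw[i]
    return emi


if __name__ == "__main__":
    print(predecode(bytes.fromhex("66 41 8b 03")))
    print(predecode(bytes.fromhex("41 0f 6c 03")))
```

Both byte strings end up with rex=0x41 in the record, which illustrates why a decoder entry that reads its operands from this record can see the REX bits without handling the prefix byte itself.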

If you use the same operand specifiers as regular mov does (for instance
Ev,Gv), then your mov should get its arguments in the same way. For
reference, E means the operand may be a register or a memory location,
selected by the ModRM byte, and G means the "reg" field of ModRM. The
lowercase v means to use the effective operand size.
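The following Python sketch shows how E and G specifiers could pick up REX bits: REX.R is prepended to the ModRM reg field for G, and REX.B to the r/m field for E. It is an illustration under stated assumptions (64-bit mode, 64-bit addressing, no SIB, RIP-relative or displacement forms), and `resolve_e_g` and `reg_name` are hypothetical names rather than gem5 functions. The operand size is passed in: 8 for b, and the effective operand size for v.

```python
from typing import Optional, Tuple

# Register names indexed by the 4-bit number (REX bit + 3-bit field).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
          "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]
REGS16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di",
          "r8w", "r9w", "r10w", "r11w", "r12w", "r13w", "r14w", "r15w"]
REGS8 = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil",
         "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b"]


def reg_name(index: int, size: int, rex_present: bool) -> str:
    """Name a register; without any REX, 8-bit indices 4-7 mean ah..bh."""
    if size == 8 and not rex_present and 4 <= index <= 7:
        return ["ah", "ch", "dh", "bh"][index - 4]
    table = {8: REGS8, 16: REGS16, 32: REGS32, 64: REGS64}[size]
    return table[index]


def resolve_e_g(rex: Optional[int], modrm: int, size: int) -> Tuple[str, str]:
    """Return (E, G) operands for a ModRM byte as AT&T-style strings.

    Only mod=00 (register-indirect) and mod=11 (register) are handled.
    """
    rex_bits = rex if rex is not None else 0
    mod = modrm >> 6
    reg = ((rex_bits >> 2) & 1) << 3 | ((modrm >> 3) & 7)  # REX.R:reg
    rm = (rex_bits & 1) << 3 | (modrm & 7)                 # REX.B:rm
    g = "%" + reg_name(reg, size, rex is not None)
    if mod == 3:
        e = "%" + reg_name(rm, size, rex is not None)
    elif mod == 0 and (modrm & 7) not in (4, 5):
        e = f"(%{REGS64[rm]})"  # 64-bit address size assumed
    else:
        raise NotImplementedError("SIB, RIP-relative and displacement forms")
    return e, g


if __name__ == "__main__":
    print(resolve_e_g(0x41, 0x03, 8))   # 41 8a 03: mov (%r11),%al
    print(resolve_e_g(0x49, 0x03, 64))  # 49 8b 03: mov (%r11),%rax
```

With this helper, `41 8a 03` resolves to E=(%r11), G=%al and `49 8b 03` to E=(%r11), G=%rax. In the AMD opcode-map notation the first specifier is the destination, so 88 is MOV Eb,Gb (store to r/m) and 8A is MOV Gb,Eb (load into the reg-field register).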

Gabe

_______________________________________________
gem5-users mailing list
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
Abhishek Singh
2018-11-02 01:37:44 UTC
Hello Gabe,

Thanks for your help. Just to verify my understanding of your explanation:
to add a new instruction that behaves like MOV, I only need to use the
proper operand specifiers (Gb,Eb), and *the REX prefix will be taken care
of automatically*.

From the available opcodes in two_byte.isa, I have chosen 6c, 6d, 7c, and
7d. For example, to implement
6c: New_mov(Eb,Gb)

I just add the following lines to two_byte.isa:

    0x0D: decode LEGACY_DECODEVAL {
        // no prefix
        0x0: decode OPCODE_OP_BOTTOM3 {
            0x4: NEWMOV(Eb,Gb);
        }
    }

Then I just duplicate the functions under the new name in
"insts/general_purpose/data_transfer/move.py" and "microops/ldstop.isa".
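As a quick check of where the four chosen opcodes land, this Python sketch groups them by the outer (top 5 bits) and inner (bottom 3 bits) cases; `group_by_top5` is a hypothetical helper. 6c and 6d fall under 0x0D, but 7c and 7d fall under 0x0F, so they would need entries in a second outer case.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_top5(opcodes: Iterable[int]) -> Dict[int, List[int]]:
    """Map each outer case (top 5 bits) to its inner cases (bottom 3 bits)."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for op in opcodes:
        groups[op >> 3].append(op & 0x7)
    return dict(groups)


if __name__ == "__main__":
    for top5, inner in sorted(group_by_top5([0x6C, 0x6D, 0x7C, 0x7D]).items()):
        print(f"0x{top5:02X}: inner cases {[hex(b) for b in inner]}")
```

One caution, based on the x86 opcode map rather than on gem5's source: with a 66 prefix these bytes are already defined (66 0F 6C/6D are PUNPCKLQDQ/PUNPCKHQDQ and 66 0F 7C/7D are HADDPD/HSUBPD), so a 16-bit form like `66 41 0f 6c 03` would probably be steered to a different LEGACY_DECODEVAL branch than the no-prefix entry.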

So if I create a binary containing "41 0f 6c 03" (for NEWMOV (%r11),%al),

I do have to worry about the "41" in "41 0f 6c 03" (41 is used to extend
the r/m field, base field, or opcode reg field; reference:
http://ref.x86asm.net/coder64.html).

Is this correct?

Best regards,

Abhishek
Abhishek Singh
2018-11-02 04:13:30 UTC
There was a typo in my last line. It should read:
I do *NOT* have to worry about the "41" in "41 0f 6c 03" (41 is used to
extend the r/m field, base field, or opcode reg field; reference:
http://ref.x86asm.net/coder64.html).


Gabe Black
2018-11-03 00:41:06 UTC
You don't need to change ldstop.isa unless you're also adding a new
microop, but yes, I think that's correct. If you use Eb and Gb, I think
you're restricting your operand size to always be a byte, but that might
be what you want.
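To make the point about Eb/Gb concrete, here is a Python sketch of how the size suffix could be resolved in 64-bit mode: b always means one byte, while v follows REX.W first, then the 0x66 operand-size prefix, then the 32-bit default. It follows the AMD manual's rules rather than gem5's source, and `effective_operand_size` is a hypothetical name.

```python
from typing import Optional


def effective_operand_size(spec: str, rex: Optional[int], has_66: bool) -> int:
    """Return the operand size in bits for a size suffix in 64-bit mode.

    'b' is always a byte; 'v' follows REX.W, then the 0x66 prefix,
    then the 32-bit default. Other suffixes are not modelled here.
    """
    if spec == "b":
        return 8
    if spec != "v":
        raise ValueError(f"unsupported size suffix: {spec!r}")
    if rex is not None and rex & 0x08:  # REX.W takes precedence over 0x66
        return 64
    return 16 if has_66 else 32


if __name__ == "__main__":
    cases = [("b", 0x41, False), ("v", 0x41, True),
             ("v", 0x41, False), ("v", 0x49, False)]
    for spec, rex, has_66 in cases:
        print(spec, hex(rex), has_66, effective_operand_size(spec, rex, has_66))
```

So an Eb,Gb entry stays 8-bit even with `66 41` or `49` prefixes, which suggests the 16-, 32- and 64-bit forms from the original question would likely need v-sized operands (for example Gv,Ev, as 8b uses) to behave like the existing mov.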

Gabe

Abhishek Singh
2018-11-03 00:42:42 UTC
Thanks for the clarification.
You helped me a lot, thanks :-)