shithub: rgbds

ref: f3f2c2ca16ba3d624829bcece86c60f0e060c4ac
parent: 9ec8186ac6e6dd9715e542ecf50ae6c29ee6f7aa
author: Eldred Habert <eldredhabert0@gmail.com>
date: Fri Jul 29 18:48:55 EDT 2022

Improve object file format documentation (#1010)

Replacing the big pre-formatted text block with a list brings:
- Better accessibility, obviously
- Responsiveness
- Better formatting (bold, etc.)
- Sub-sections that can now be linked to
- Hyperlink cross-refs to other pages

The slight disadvantage is that `ENDC` etc. are now individual
list items, whereas they'd be better as part of the same item.
No big deal though, it was much worse before.

Some descriptions have been overhauled for clarity, and some
outright corrected (such as Assertions' "Offset" field).

Co-authored-by: Antonio Vivace <avivace4@gmail.com>
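
Reviewer note: the RPN encoding documented in the patch below (a stream of BYTE opcodes, with $80 and $81 each followed by a LONG) can be sketched as a small stack machine. This is only an illustration in Python, not RGBLINK's actual code. The names `eval_rpn` and `symbols` are made up for the example, LONGs are assumed to be signed, 32-bit wraparound is ignored, and only a handful of opcodes are handled.

```python
import struct


def eval_rpn(expr: bytes, symbols=None) -> int:
    """Evaluate an RPN byte string using a subset of the documented opcodes.

    `symbols` is a hypothetical mapping from symbol ID to an already
    resolved value; a real linker would look these up in its own tables.
    """
    symbols = symbols or {}
    stack = []
    pos = 0

    def read_long() -> int:
        # LONG = 32-bit little-endian integer (read as signed here, an assumption).
        nonlocal pos
        (value,) = struct.unpack_from("<i", expr, pos)
        pos += 4
        return value

    binary_ops = {
        0x00: lambda a, b: a + b,  # addition
        0x01: lambda a, b: a - b,  # subtraction
        0x02: lambda a, b: a * b,  # multiplication
        0x10: lambda a, b: a | b,  # bitwise OR
        0x11: lambda a, b: a & b,  # bitwise AND
        0x12: lambda a, b: a ^ b,  # bitwise XOR
    }

    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op == 0x80:  # integer literal, followed by a LONG
            stack.append(read_long())
        elif op == 0x81:  # symbol value, followed by the symbol's LONG ID
            stack.append(symbols[read_long()])
        elif op in binary_ops:
            rhs = stack.pop()  # pushed last
            lhs = stack.pop()  # pushed first
            stack.append(binary_ops[op](lhs, rhs))
        elif op == 0x60:  # ldh check
            value = stack.pop()
            if not (0x00 <= value <= 0xFF or 0xFF00 <= value <= 0xFFFF):
                raise ValueError(f"${value:04X} is not a valid ldh operand")
            stack.append(value & 0xFF)
        elif op == 0x61:  # rst check
            value = stack.pop()
            if value not in range(0x00, 0x40, 8):
                raise ValueError(f"${value:02X} is not a valid rst vector")
            stack.append(value | 0xC7)
        else:
            raise NotImplementedError(f"opcode ${op:02X} is not part of this sketch")

    # A well-formed expression leaves exactly one value behind.
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


def lit(n: int) -> bytes:
    """Encode an integer literal: opcode $80 followed by a LONG."""
    return b"\x80" + struct.pack("<i", n)


# "10 3 -": the value pushed last is subtracted from the one pushed first.
print(eval_rpn(lit(10) + lit(3) + b"\x01"))  # prints 7
```

Popping from an empty stack surfaces as an `IndexError` in this sketch; a real implementation would presumably report a proper diagnostic instead.
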

--- a/man/rgbds.5
+++ b/man/rgbds.5
@@ -16,251 +16,381 @@
 .Xr rgbasm 1
 and
 .Xr rgblink 1 .
-.Em Please note that the specifications may change .
-This toolchain is in development and new features may require adding more information to the current format, or modifying some fields, which would break compatibility with older versions.
+.Em Please note that the specification is not stable yet .
+RGBDS is still in active development, and some new features require adding more information to the object file, or modifying some fields, both of which break compatibility with older versions.
 .Sh FILE STRUCTURE
 The following types are used:
 .Pp
-.Ar LONG
+.Cm LONG
 is a 32-bit integer stored in little-endian format.
-.Ar BYTE
+.Cm BYTE
 is an 8-bit integer.
-.Ar STRING
+.Cm STRING
 is a 0-terminated string of
-.Ar BYTE .
-.Bd -literal
-; Header
-
-BYTE    ID[4]            ; "RGB9"
-LONG    RevisionNumber   ; The format's revision number this file uses.
-LONG    NumberOfSymbols  ; The number of symbols used in this file.
-LONG    NumberOfSections ; The number of sections used in this file.
-
-; File info
-
-LONG    NumberOfNodes       ; The number of nodes contained in this file.
-
-REPT NumberOfNodes          ; IMPORTANT NOTE: the nodes are actually written in
-                            ; **reverse** order, meaning the node with ID 0 is
-                            ; the last one in the file!
-
-    LONG    ParentID        ; ID of the parent node, -1 means this is the root.
-
-    LONG    ParentLineNo    ; Line at which the parent context was exited.
-                            ; Meaningless on the root node.
-
-    BYTE    Type            ; 0 = REPT node
-                            ; 1 = File node
-                            ; 2 = Macro node
-
-    IF Type != 0            ; If the node is not a REPT...
-
-        STRING  Name        ; The node's name: either a file name, or macro name
-                            ; prefixed by its definition file name.
-
-    ELSE                    ; If the node is a REPT, it also contains the iter
-                            ; counts of all the parent REPTs.
-
-        LONG    Depth       ; Size of the array below.
-
-        LONG    Iter[Depth] ; The number of REPT iterations by increasing depth.
-
-    ENDC
-
-ENDR
-
-; Symbols
-
-REPT    NumberOfSymbols    ; Number of symbols defined in this object file.
-
-    STRING  Name           ; The name of this symbol. Local symbols are stored
-                           ; as "Scope.Symbol".
-
-    BYTE    Type           ; 0 = LOCAL symbol only used in this file.
-                           ; 1 = IMPORT this symbol from elsewhere
-                           ; 2 = EXPORT this symbol to other objects.
-
-    IF (Type & 0x7F) != 1  ; If symbol is defined in this object file.
-
-        LONG    SourceFile ; File where the symbol is defined.
-
-        LONG    LineNum    ; Line number in the file where the symbol is defined.
-
-        LONG    SectionID  ; The section number (of this object file) in which
-                           ; this symbol is defined. If it doesn't belong to any
-                           ; specific section (like a constant), this field has
-                           ; the value -1.
-
-        LONG    Value      ; The symbols value. It's the offset into that
-                           ; symbol's section.
-
-    ENDC
-
-ENDR
-
-; Sections
-
-REPT NumberOfSections
-    STRING  Name  ; Name of the section
-
-    LONG    Size  ; Size in bytes of this section
-
-    BYTE    Type  ; 0 = WRAM0
-                  ; 1 = VRAM
-                  ; 2 = ROMX
-                  ; 3 = ROM0
-                  ; 4 = HRAM
-                  ; 5 = WRAMX
-                  ; 6 = SRAM
-                  ; 7 = OAM
-                  ; Bits 7 and 6 are independent from the above value:
-                  ; Bit 7 encodes whether the section is unionized
-                  ; Bit 6 encodes whether the section is a fragment
-                  ; Bits 6 and 7 may not be both set at the same time!
-
-    LONG    Org   ; Address to fix this section at. -1 if the linker should
-                  ; decide (floating address).
-
-    LONG    Bank  ; Bank to load this section into. -1 if the linker should
-                  ; decide (floating bank). This field is only valid for ROMX,
-                  ; VRAM, WRAMX and SRAM sections.
-
-    BYTE    Align ; Alignment of this section, as N bits. 0 when not specified.
-
-    LONG    Ofs   ; Offset relative to the alignment specified above.
-                  ; Must be below 1 << Align.
-
-    IF      (Type == ROMX) || (Type == ROM0) ; Sections that can contain data.
-
-        BYTE    Data[Size]      ; Raw data of the section.
-
-        LONG    NumberOfPatches ; Number of patches to apply.
-
-        REPT    NumberOfPatches
-
-            LONG    SourceFile   ; ID of the source file node (for printing
-                                 ; error messages).
-
-            LONG    LineNo       ; Line at which the patch was created.
-
-            LONG    Offset       ; Offset into the section where patch should
-                                 ; be applied (in bytes).
-
-            LONG    PCSectionID  ; Index within the file of the section in which
-                                 ; PC is located.
-                                 ; This is usually the same section that the
-                                 ; patch should be applied into, except e.g.
-                                 ; with LOAD blocks.
-
-            LONG    PCOffset     ; PC's offset into the above section.
-                                 ; Used because the section may be floating, so
-                                 ; PC's value is not known to RGBASM.
-
-            BYTE    Type         ; 0 = BYTE patch.
-                                 ; 1 = little endian WORD patch.
-                                 ; 2 = little endian LONG patch.
-                                 ; 3 = JR offset value BYTE patch.
-
-            LONG    RPNSize      ; Size of the buffer with the RPN.
-                                 ; expression.
-
-            BYTE    RPN[RPNSize] ; RPN expression. Definition below.
-
-        ENDR
-
-    ENDC
-
-ENDR
-
-; Assertions
-
-LONG  NumberOfAssertions
-
-REPT  NumberOfAssertions
-
-  LONG    SourceFile   ; ID of the source file node (for printing the failure).
-
-  LONG    LineNo       ; Line at which the assertion was created.
-
-  LONG    Offset       ; Offset into the section where the assertion is located.
-
-  LONG    SectionID    ; Index within the file of the section in which PC is
-                       ; located, or -1 if defined outside a section.
-
-  LONG    PCOffset     ; PC's offset into the above section.
-                       ; Used because the section may be floating, so PC's value
-                       ; is not known to RGBASM.
-
-  BYTE    Type         ; 0 = Prints the message but allows linking to continue
-                       ; 1 = Prints the message and evaluates other assertions,
-                       ;     but linking fails afterwards
-                       ; 2 = Prints the message and immediately fails linking
-
-  LONG    RPNSize      ; Size of the RPN expression's buffer.
-
-  BYTE    RPN[RPNSize] ; RPN expression, same as patches. Assert fails if == 0.
-
-  STRING  Message      ; A message displayed when the assert fails. If set to
-                       ; the empty string, a generic message is printed instead.
-
-ENDR
-.Ed
-.Ss RPN DATA
-Expressions in the object file are stored as RPN.
-This is an expression of the form
-.Dq 2 5 + .
-This will first push the value
-.Do 2 Dc to the stack, then
+.Cm BYTE .
+Brackets after a type
+.Pq e.g. Cm LONG Ns Bq Ar n
+indicate
+.Ar n
+consecutive elements
+.Pq here, Cm LONG Ns s .
+All items are contiguous, with no padding anywhere\(emthis also means that they may not be aligned in the file!
+.Pp
+.Cm REPT Ar n
+indicates that the fields between the
+.Cm REPT
+and corresponding
+.Cm ENDR
+are repeated
+.Ar n
+times.
+.Pp
+All IDs refer to objects within the file; for example, symbol ID $0001 refers to the second symbol defined in
+.Em this
+object file's
+.Sx Symbols
+array.
+The only exception is the
+.Sx Source file info
+nodes, whose IDs are backwards, i.e. source node ID $0000 refers to the
+.Em last
+node in the array, not the first one.
+References to other object files are made by imports (symbols), by name (sections), etc.\(embut never by ID.
+.Ss Header
+.Bl -tag -width Ds -compact
+.It Cm BYTE Ar Magic[4]
+"RGB9"
+.It Cm LONG Ar RevisionNumber
+The format's revision number this file uses.
+.Pq This is always in the same place in all revisions.
+.It Cm LONG Ar NumberOfSymbols
+How many symbols are defined in this object file.
+.It Cm LONG Ar NumberOfSections
+How many sections are defined in this object file.
+.El
+.Ss Source file info
+.Bl -tag -width Ds -compact
+.It Cm LONG Ar NumberOfNodes
+The number of source context nodes contained in this file.
+.It Cm REPT Ar NumberOfNodes
+.Bl -tag -width Ds -compact
+.It Cm LONG Ar ParentID
+ID of the parent node, -1 meaning that this is the root node.
+.Pp
+.Sy Important :
+the nodes are actually written in
+.Sy reverse
+order, meaning the node with ID 0 is the last one in the list!
+.It Cm LONG Ar ParentLineNo
+Line at which the parent node's context was exited; meaningless for the root node.
+.It Cm BYTE Ar Type
+.Bl -column "Value" -compact
+.It Sy Value Ta Sy Meaning
+.It 0 Ta REPT node
+.It 1 Ta File node
+.It 2 Ta Macro node
+.El
+.It Cm IF Ar Type No \(!= 0
+If the node is not a REPT node...
+.Pp
+.Bl -tag -width Ds -compact
+.It Cm STRING Ar Name
+The node's name: either a file name, or the macro's name prefixed by its definition's file name
+.Pq e.g. Ql src/includes/defines.asm::error .
+.El
+.It Cm ELSE
+If the node is a REPT, it also contains the iteration counter of all parent REPTs.
+.Pp
+.Bl -tag -width Ds -compact
+.It Cm LONG Ar Depth
+.It Cm LONG Ar Iter Ns Bq Ar Depth
+The number of REPT iterations, by increasing depth.
+.El
+.It Cm ENDC
+.El
+.It Cm ENDR
+.El
+.Ss Symbols
+.Bl -tag -width Ds -compact
+.It Cm REPT Ar NumberOfSymbols
+.Bl -tag -width Ds -compact
+.It Cm STRING Ar Name
+This symbol's name.
+Local symbols are stored as their full name
+.Pq Ql Scope.symbol .
+.It Cm BYTE Ar Type
+.Bl -column "Value" -compact
+.It Sy Value Ta Sy Meaning
+.It 0 Ta Sy Local No symbol only used in this file.
+.It 1 Ta Sy Import No of an exported symbol (by name) from another object file.
+.It 2 Ta Sy Exported No symbol visible from other object files.
+.El
+.It Cm IF Ar Type No \(!= 1
+If the symbol is defined in this object file...
+.Pp
+.Bl -tag -width Ds -compact
+.It Cm LONG Ar NodeID
+Context in which the symbol was defined.
+.It Cm LONG Ar LineNo
+Line number in the context at which the symbol was defined.
+.It Cm LONG Ar SectionID
+The ID of the section in which the symbol is defined.
+If the symbol doesn't belong to any specific section (i.e. it's a constant), this field contains -1.
+.It Cm LONG Ar Value
+The symbol's value.
+If the symbol belongs to a section, this is the offset within that symbol's section.
+.El
+.It Cm ENDC
+.El
+.It Cm ENDR
+.El
+.Ss Sections
+.Bl -tag -width Ds -compact
+.It Cm REPT Ar NumberOfSections
+.Bl -tag -width Ds -compact
+.It Cm STRING Ar Name
+The section's name.
+.It Cm LONG Ar Size
+The section's size, in bytes.
+.It Cm BYTE Ar Type
+Bits 0\(en2 indicate the section's type:
+.Bl -column "Value" -compact
+.It Sy Value Ta Sy Meaning
+.It 0 Ta WRAM0
+.It 1 Ta VRAM
+.It 2 Ta ROMX
+.It 3 Ta ROM0
+.It 4 Ta HRAM
+.It 5 Ta WRAMX
+.It 6 Ta SRAM
+.It 7 Ta OAM
+.El
+.Pp
+Bit\ 7 being set means that the section is a "union"
+.Pq see Do Unionized sections Dc in Xr rgbasm 5 .
+Bit\ 6 being set means that the section is a "fragment"
+.Pq see Do Section fragments Dc in Xr rgbasm 5 .
+These two bits are mutually exclusive.
+.It Cm LONG Ar Address
+Address this section must be placed at.
+This must either be valid for the section's
+.Ar Type
+(as affected by flags like
+.Fl t
+or
+.Fl d
+in
+.Xr rgblink 1 ) ,
+or -1 to indicate that the linker should automatically decide
+.Pq the section is Dq floating .
+.It Cm LONG Ar Bank
+ID of the bank this section must be placed in.
+This must either be valid for the section's
+.Ar Type
+(with the same caveats as for the
+.Ar Address ) ,
+or -1 to indicate that the linker should automatically decide.
+.It Cm BYTE Ar Alignment
+How many bits of the section's address should be equal to
+.Ar AlignOfs ,
+starting from the least-significant bit.
+.It Cm LONG Ar AlignOfs
+Alignment offset.
+Must be strictly less than
+.Ql 1 << Ar Alignment .
+.It Cm IF Ar Type No \(eq 2 || Ar Type No \(eq 3
+If the section has ROM type, it contains data.
+.Pp
+.Bl -tag -width Ds -compact
+.It Cm BYTE Ar Data Ns Bq Size
+The section's raw data.
+Bytes that will be patched over must be present, even though their contents will be overwritten.
+.It Cm LONG Ar NumberOfPatches
+How many patches must be applied to this section's
+.Ar Data .
+.It Cm REPT Ar NumberOfPatches
+.Bl -tag -width Ds -compact
+.It Cm LONG Ar NodeID
+Context in which the patch was defined.
+.It Cm LONG Ar LineNo
+Line number in the context at which the patch was defined.
+.It Cm LONG Ar Offset
+Offset within the section's
+.Ar Data
+at which the patch should be applied.
+Must not be greater than the section's
+.Ar Size
+minus the patch's size
+.Pq see Ar Type No below .
+.It Cm LONG Ar PCSectionID
+ID of the section in which PC is located.
+(This is usually the same section within which the patch is applied, except for e.g.\&
+.Ql LOAD
+blocks, see
+.Do RAM code Dc in Xr rgbasm 5 . )
+.It Cm LONG Ar PCOffset
+Offset of the PC symbol within the section designated by
+.Ar PCSectionID .
+It is expected that PC points to the instruction's first byte for instruction operands (i.e.\&
+.Ql jp @
+must be an infinite loop), and to the patch's first byte otherwise
+.Ql ( db ,
+.Ql dw ,
+.Ql dl ) .
+.It Cm BYTE Ar Type
+.Bl -column "Value" -compact
+.It Sy Value Ta Sy Meaning
+.It 0 Ta Single-byte patch
+.It 1 Ta Little-endian two-byte patch
+.It 2 Ta Little-endian four-byte patch
+.It 3 Ta Single-byte Ql jr
+patch; PC + 2 will be subtracted from the patch's value (i.e.\&
+.Ql jr @
+must be the infinite loop
+.Ql 18 FE ) .
+.El
+.It Cm LONG Ar RPNSize
+Size of the
+.Ar RPNExpr
+below.
+.It Cm BYTE Ar RPNExpr Ns Bq RPNSize
+The patch's value, encoded as an RPN expression
+.Pq see Sx RPN EXPRESSIONS .
+.El
+.It Cm ENDR
+.El
+.It Cm ENDC
+.El
+.El
+.Ss Assertions
+.Bl -tag -width Ds -compact
+.It Cm LONG Ar NumberOfAssertions
+How many assertions this object file contains.
+.It Cm REPT Ar NumberOfAssertions
+Assertions are essentially patches with a message.
+.Pp
+.Bl -tag -width Ds -compact
+.It Cm LONG Ar NodeID
+Context in which the assertion was defined.
+.It Cm LONG Ar LineNo
+Line number in the context at which the assertion was defined.
+.It Cm LONG Ar Offset
+Unused leftover from the patch structure.
+.It Cm LONG Ar PCSectionID
+ID of the section in which PC is located.
+.It Cm LONG Ar PCOffset
+Offset of the PC symbol within the section designated by
+.Ar PCSectionID .
+.It Cm BYTE Ar Type
+Describes what should happen if the expression evaluates to zero, i.e. if the assertion fails.
+.Bl -column "Value" -compact
+.It Sy Value Ta Sy Meaning
+.It 0 Ta Print a warning message, and continue linking normally.
+.It 1 Ta Print an error message, so linking will fail, but allow other assertions to be evaluated.
+.It 2 Ta Print a fatal error message, and abort immediately.
+.El
+.It Cm LONG Ar RPNSize
+Size of the
+.Ar RPNExpr
+below.
+.It Cm BYTE Ar RPNExpr Ns Bq RPNSize
+The assertion's condition, encoded as an RPN expression
+.Pq see Sx RPN EXPRESSIONS .
+.It Cm STRING Ar Message
+The message displayed if the expression evaluates to zero.
+If empty, a generic message is displayed instead.
+.El
+.It Cm ENDR
+.El
+.Ss RPN EXPRESSIONS
+Expressions in the object file are stored as RPN, or
+.Dq Reverse Polish Notation ,
+which is a notation that allows computing arbitrary expressions with just a simple stack.
+For example, the expression
+.Ql 2 5 -
+will first push the value
+.Dq 2
+to the stack, then
 .Dq 5 .
 The
-.Do + Dc operator pops two arguments from the stack, adds them, and then pushes the result on the stack, effectively replacing the two top arguments with their sum.
-In the RGB format, RPN expressions are stored as
-.Ar BYTE Ns s
-with some bytes being special prefixes for integers and symbols.
-.Bl -column -offset indent "Sy String" "Sy String"
+.Ql -
+operator pops two arguments from the stack, subtracts the one pushed last from the one pushed first, and then pushes back the result
+.Pq Dq \-3
+on the stack.
+A well-formed RPN expression never tries to pop from an empty stack, and leaves exactly one value on it at the end.
+.Pp
+RGBDS encodes RPN expressions as an array of
+.Cm BYTE Ns s .
+The first byte encodes either an operator, or a literal, which consumes more
+.Cm BYTE Ns s
+after it.
+.Bl -column -offset Ds "Value"
 .It Sy Value Ta Sy Meaning
-.It Li $00 Ta Li + operator
-.It Li $01 Ta Li - operator
-.It Li $02 Ta Li * operator
-.It Li $03 Ta Li / operator
-.It Li $04 Ta Li % operator
-.It Li $05 Ta Li unary -
-.It Li $06 Ta Li ** operator
-.It Li $10 Ta Li \&| operator
-.It Li $11 Ta Li & operator
-.It Li $12 Ta Li ^ operator
-.It Li $13 Ta Li unary ~
-.It Li $21 Ta Li && comparison
-.It Li $22 Ta Li || comparison
-.It Li $23 Ta Li unary \&!
-.It Li $30 Ta Li == comparison
-.It Li $31 Ta Li != comparison
-.It Li $32 Ta Li > comparison
-.It Li $33 Ta Li < comparison
-.It Li $34 Ta Li >= comparison
-.It Li $35 Ta Li <= comparison
-.It Li $40 Ta Li << operator
-.It Li $41 Ta Li >> operator
-.It Li $42 Ta Li >>> operator
-.It Li $50 Ta Li BANK(symbol) ,
-a
-.Ar LONG
-Symbol ID follows, where -1 means PC
-.It Li $51 Ta Li BANK(section_name) ,
-a null-terminated string follows.
-.It Li $52 Ta Li Current BANK()
-.It Li $53 Ta Li SIZEOF(section_name) ,
-a null-terminated string follows.
-.It Li $54 Ta Li STARTOF(section_name) ,
-a null-terminated string follows.
-.It Li $60 Ta Li HRAMCheck .
-Checks if the value is in HRAM, ANDs it with 0xFF.
-.It Li $61 Ta Li RSTCheck .
-Checks if the value is a RST vector, ORs it with 0xC7.
-.It Li $80 Ta Ar LONG
-integer follows.
-.It Li $81 Ta Ar LONG
-symbol ID follows.
+.It Li $00 Ta Addition operator Pq Ql +
+.It Li $01 Ta Subtraction operator Pq Ql -
+.It Li $02 Ta Multiplication operator Pq Ql *
+.It Li $03 Ta Division operator Pq Ql /
+.It Li $04 Ta Modulo operator Pq Ql %
+.It Li $05 Ta Negation Pq unary Ql -
+.It Li $06 Ta Exponent operator Pq Ql **
+.It Li $10 Ta Bitwise OR operator Pq Ql \&|
+.It Li $11 Ta Bitwise AND operator Pq Ql &
+.It Li $12 Ta Bitwise XOR operator Pq Ql ^
+.It Li $13 Ta Bitwise complement operator Pq unary Ql ~
+.It Li $21 Ta Logical AND operator Pq Ql &&
+.It Li $22 Ta Logical OR operator Pq Ql ||
+.It Li $23 Ta Logical complement operator Pq unary Ql \&!
+.It Li $30 Ta Equality operator Pq Ql ==
+.It Li $31 Ta Non-equality operator Pq Ql !=
+.It Li $32 Ta Greater-than operator Pq Ql >
+.It Li $33 Ta Less-than operator Pq Ql <
+.It Li $34 Ta Greater-than-or-equal operator Pq Ql >=
+.It Li $35 Ta Less-than-or-equal operator Pq Ql <=
+.It Li $40 Ta Left shift operator Pq Ql <<
+.It Li $41 Ta Arithmetic/signed right shift operator Pq Ql >>
+.It Li $42 Ta Logical/unsigned right shift operator Pq Ql >>>
+.It Li $50 Ta Fn BANK symbol ,
+followed by the
+.Ar symbol Ap s Cm LONG
+ID.
+.It Li $51 Ta Fn BANK section ,
+followed by the
+.Ar section Ap s Cm STRING
+name.
+.It Li $52 Ta PC's Fn BANK Pq i.e. Ql BANK(@) .
+.It Li $53 Ta Fn SIZEOF section ,
+followed by the
+.Ar section Ap s Cm STRING
+name.
+.It Li $54 Ta Fn STARTOF section ,
+followed by the
+.Ar section Ap s Cm STRING
+name.
+.It Li $60 Ta Ql ldh
+check.
+Checks if the value is a valid
+.Ql ldh
+operand
+.Pq see Do Load Instructions Dc in Xr gbz80 7 ,
+i.e. that it is between either $00 and $FF, or $FF00 and $FFFF, both inclusive.
+The value is then ANDed with $00FF
+.Pq Ql & $FF .
+.It Li $61 Ta Ql rst
+check.
+Checks if the value is a valid
+.Ql rst
+.Pq see Do RST vec Dc in Xr gbz80 7
+vector, that is one of $00, $08, $10, $18, $20, $28, $30, or $38.
+The value is then ORed with $C7
+.Pq Ql \&| $C7 .
+.It Li $80 Ta Integer literal.
+Followed by the
+.Cm LONG
+integer.
+.It Li $81 Ta A symbol's value.
+Followed by the symbol's
+.Cm LONG
+ID.
 .El
 .Sh SEE ALSO
 .Xr rgbasm 1 ,