shithub: rgbds

ref: 3bebedf1f810c3d6149b59b75923bf4ee2442696
parent: 2ed937db2cd1a6b9b4251bd316a4f336f3d02191
author: Antonio Niño Díaz <antonio_nd@outlook.com>
date: Thu Feb 22 17:11:30 EST 2018

Handle newlines and comments correctly

Newlines have to be normalized before comments are removed. Otherwise,
comment removal can't find the end of a line whose line ending doesn't
include an LF character (for example, a bare CR).

Also, document an obscure comment syntax: anything that follows a '*'
placed at the start of a line is a comment until the end of the line.

Signed-off-by: Antonio Niño Díaz <antonio_nd@outlook.com>
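
The ordering described above can be illustrated with a minimal standalone
sketch. It is not the code from src/asm/lexer.c: the function names
normalize_newlines and strip_comments are hypothetical, and the sketch
assumes a mutable, NUL-terminated buffer. It only mirrors the two passes
of the patch: first rewrite every line ending so that it ends in LF, then
blank out comments up to the next LF.

```c
/*
 * Hypothetical sketch of the two passes; names are illustrative only.
 * Both functions expect a writable, NUL-terminated buffer.
 */

/* Pass 1: make every line ending outside a string end in '\n'. */
static void normalize_newlines(char *mem)
{
	int instring = 0;

	while (*mem) {
		if (*mem == '"')
			instring = !instring;

		if (mem[0] == '\\' && (mem[1] == '"' || mem[1] == '\\')) {
			mem += 2;	/* skip an escaped quote or backslash */
		} else if (instring) {
			mem += 1;	/* leave string contents untouched */
		} else if ((mem[0] == '\n' && mem[1] == '\r')
			|| (mem[0] == '\r' && mem[1] == '\n')) {
			mem[0] = ' ';	/* LF CR and CR LF become " \n" */
			mem[1] = '\n';
			mem += 2;
		} else if (mem[0] == '\r') {
			mem[0] = '\n';	/* a bare CR becomes LF */
			mem += 1;
		} else {
			mem += 1;
		}
	}
}

/* Pass 2: overwrite comments with spaces, up to the next '\n'. */
static void strip_comments(char *mem)
{
	int instring = 0;

	while (*mem) {
		if (*mem == '"')
			instring = !instring;

		if (mem[0] == '\\' && (mem[1] == '"' || mem[1] == '\\')) {
			mem += 2;
		} else if (instring) {
			mem += 1;
		} else if (*mem == ';') {
			/* ';' anywhere in a line */
			while (*mem != '\n' && *mem != '\0')
				*mem++ = ' ';
		} else if (mem[0] == '\n' && mem[1] == '*') {
			/* '*' right after a newline */
			mem += 1;
			while (*mem != '\n' && *mem != '\0')
				*mem++ = ' ';
		} else {
			mem += 1;
		}
	}
}
```

If strip_comments is run on its own over "a ; c\rb", the bare CR does not
stop the comment and the following line is blanked as well. Running
normalize_newlines first turns the CR into LF, so the comment ends where
the line does.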

--- a/docs/rgbasm.5.html
+++ b/docs/rgbasm.5.html
@@ -42,6 +42,12 @@
 <div class="Pp"></div>
 All pseudo&#x2010;ops, mnemonics and registers (reserved keywords) are
   case&#x2010;insensitive and all labels are case&#x2010;sensitive.
+<div class="Pp"></div>
+There are two syntaxes for comments. In both cases, a comment ends at the end of
+  the line. The most common one is: anything that follows a semicolon
+  &quot;;&quot; (that isn't inside a string) is a comment. There is another
+  format: anything that follows a &quot;*&quot; that is placed right at the
+  start of a line is a comment.
 <h2 class="Ss" title="Ss" id="Sections"><a class="selflink" href="#Sections">Sections</a></h2>
 Before you can start writing code, you must define a section. This tells the
   assembler what kind of information follows and, if it is code, where to put
--- a/src/asm/lexer.c
+++ b/src/asm/lexer.c
@@ -159,6 +159,8 @@
 	pBuffer->pBuffer[size + 1] = 0;
 	pBuffer->nBufferSize = size + 1;
 
+	/* Convert all line endings to LF and spaces */
+
 	char *mem = pBuffer->pBuffer;
 	uint32_t instring = 0;
 
@@ -171,20 +173,44 @@
 		} else if (instring) {
 			mem += 1;
 		} else {
-			if ((mem[0] == 10 && mem[1] == 13)
-				|| (mem[0] == 13 && mem[1] == 10)) {
+			/* LF CR and CR LF */
+			if (((mem[0] == 10) && (mem[1] == 13))
+			 || ((mem[0] == 13) && (mem[1] == 10))) {
 				mem[0] = ' ';
 				mem[1] = '\n';
 				mem += 2;
-			} else if (mem[0] == 10 || mem[0] == 13) {
+			/* LF and CR */
+			} else if ((mem[0] == 10) || (mem[0] == 13)) {
 				mem[0] = '\n';
 				mem += 1;
-			} else if (mem[0] == '\n' && mem[1] == '*') {
+			} else {
 				mem += 1;
-				while (!(*mem == '\n' || *mem == '\0'))
+			}
+		}
+	}
+
+	/* Remove comments */
+
+	mem = pBuffer->pBuffer;
+	instring = 0;
+
+	while (*mem) {
+		if (*mem == '\"')
+			instring = 1 - instring;
+
+		if ((mem[0] == '\\') && (mem[1] == '\"' || mem[1] == '\\')) {
+			mem += 2;
+		} else if (instring) {
+			mem += 1;
+		} else {
+			/* Comments that start with ; anywhere in a line */
+			if (*mem == ';') {
+				while (!((*mem == '\n') || (*mem == '\0')))
 					*mem++ = ' ';
-			} else if (*mem == ';') {
-				while (!(*mem == '\n' || *mem == '\0'))
+			/* Comments that start with * at the start of a line */
+			} else if ((mem[0] == '\n') && (mem[1] == '*')) {
+				mem += 1;
+				while (!((*mem == '\n') || (*mem == '\0')))
 					*mem++ = ' ';
 			} else {
 				mem += 1;
--- a/src/asm/rgbasm.5
+++ b/src/asm/rgbasm.5
@@ -30,6 +30,12 @@
 .Pp
 All pseudo‐ops, mnemonics and registers (reserved keywords) are case‐insensitive
 and all labels are case‐sensitive.
+.Pp
+There are two syntaxes for comments. In both cases, a comment ends at the end of
+the line. The most common one is: anything that follows a semicolon
+\[dq]\&;\[dq] (that isn't inside a string) is a comment. There is another
+format: anything that follows a \[dq]*\[dq] that is placed right at the start of
+a line is a comment.
 .Ss Sections
 Before you can start writing code, you must define a section.
 This tells the assembler what kind of information follows and, if it is code,