shithub: blog

ref: bc7312edbdb6248a48dbe330616b5e849fe359e2
dir: /dram.txt/

Been seeing crazy compiler and linker crashes recently,
usually right after startup when trying to rebuild
everything with:
cd /sys/src && mk all

This is on my main terminal, which is a Lenovo X230.

The stack traces usually looked impossible, and the
crashes went away when re-running mk. Strange.

Turns out my machine has developed some bad DRAM over the years.

Tried to PXE boot the memtest86 image by extracting
bootx64.efi from their USB image, putting it in
/lib/tftp and pointing the bootf= entry at that file.

Turns out their commercial version actively refuses
to work when PXE booted. Motherfuckers.

Use memtest86+ from https://memtest.org instead.
It worked like a charm, and they even had direct links
to the .efi files!

Running memtest86+ immediately showed some bad
addresses. Write that down in your copybook!

Make a file with the broken addresses, rounding each
one out to a 64k-aligned range just in case. Boot the
system and run:
cat '#ec/*e820' > /tmp/memmap.txt
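If memtest86+ reports more than a handful of addresses, the rounding is easy to script. Below is a minimal sketch in Python, not something used in this post: the script name badram.py and its functions are made up for illustration, and it assumes you pass the failing addresses as hex arguments. Each address is rounded down to the 64k boundary of the block containing it and printed as a type 5 triple, zero-padded to 16 hex digits like the *e820 output.

```python
import sys

BLOCK = 0x10000  # 64k: bad addresses are rounded out to this granularity
E820_BAD = 5     # e820 type used for unusable (bad) memory

def bad_ranges(addrs, block=BLOCK):
    """Round each address down to its block boundary and return sorted,
    de-duplicated (start, end) ranges, with end exclusive."""
    starts = sorted({a - a % block for a in addrs})
    return [(s, s + block) for s in starts]

def format_e820(ranges, typ=E820_BAD):
    """Format ranges as e820 triples: type, start, end (16 hex digits each)."""
    return [f"{typ} {s:016x} {e:016x}" for s, e in ranges]

if __name__ == "__main__":
    # hypothetical usage: python3 badram.py 45f3a1c8 407a1fff0 ...
    addrs = [int(arg, 16) for arg in sys.argv[1:]]
    print("\n".join(format_e820(bad_ranges(addrs))))
```

For example, with the made-up addresses 45f3a1c8 and 407a1fff0 it prints "5 0000000045f30000 0000000045f40000" and "5 0000000407a10000 0000000407a20000". Adjacent blocks are kept as separate entries, the same way they are listed in this post.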

Edit the /cfg/pxe/...... file (plan9.ini) and add the
contents of /tmp/memmap.txt as an *e820= entry, like:

*e820=1 0000000000000000 0000000000008000 1 0000000000008000 000000000000c000 1 000000000000c000 0000000000087000 2 0000000000087000 0000000000088000 1 0000000000088000 000000000009c000 2 000000000009c000 000000000009d000 2 000000000009d000 000000000009e000 2 000000000009e000 00000000000a0000 1 0000000000100000 0000000020000000 2 0000000020000000 0000000020200000 1 0000000020200000 0000000040004000 2 0000000040004000 0000000040005000 1 0000000040005000 00000000cd7ec000 2 00000000cd7ec000 00000000cd80c000 1 00000000cd80c000 00000000cfdf5000 2 00000000cfdf5000 00000000cfe0b000 2 00000000cfe0b000 00000000d000d000 2 00000000d000d000 00000000d07dc000 1 00000000d07dc000 00000000d2a55000 2 00000000d2a55000 00000000d44d3000 1 00000000d44d3000 00000000d44e1000 2 00000000d44e1000 00000000d5e50000 1 00000000d5e50000 00000000d63b8000 2 00000000d63b8000 00000000d6850000 2 00000000d6850000 00000000d6929000 2 00000000d6929000 00000000d6a50000 2 00000000d6a50000 00000000d75f9000 2 00000000d75f9000 00000000da49f000 2 00000000da49f000 00000000dabb0000 2 00000000dabb0000 00000000dae9b000 2 00000000dae9b000 00000000dae9c000 2 00000000dae9c000 00000000dae9f000 4 00000000dae9f000 00000000daef5000 4 00000000daef5000 00000000daf9f000 3 00000000daf9f000 00000000dafd6000 3 00000000dafd6000 00000000dafff000 2 00000000dafff000 00000000db000000 1 0000000100000000 000000041e600000 2 00000000000a0000 00000000000c0000 2 00000000db000000 00000000dfa00000 2 00000000f80f8000 00000000f80f9000 2 00000000fed1c000 00000000fed20000 2 000000041e600000 000000041f000000 5 0000000006620000 0000000006630000 5 000000000ad10000 000000000ad20000 5 0000000010230000 0000000010240000 5 000000001e770000 000000001e780000
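The *e820 value is a flat list of triples: a type, then a start and an end address in hex. The end appears to be exclusive, since neighbouring ranges share endpoints (0 to 8000, then 8000 to c000). Types 1 to 5 follow the usual BIOS E820 numbering: usable RAM, reserved, ACPI reclaimable, ACPI NVS, and unusable. A minimal Python sketch (the helper names are hypothetical) that splits the value into triples and totals the bytes per type, handy for seeing how much memory the bad ranges cost:

```python
# Usual BIOS E820 range types
E820_TYPES = {1: "usable", 2: "reserved", 3: "ACPI reclaimable",
              4: "ACPI NVS", 5: "unusable"}

def parse_e820(value):
    """Split an *e820= value into (type, start, end) triples, end exclusive.
    A leading '*e820=' is accepted and ignored."""
    fields = value.split("=", 1)[-1].split()
    if len(fields) % 3:
        raise ValueError("expected type/start/end triples")
    return [(int(fields[i]), int(fields[i + 1], 16), int(fields[i + 2], 16))
            for i in range(0, len(fields), 3)]

def bytes_per_type(entries):
    """Total size in bytes of the ranges of each type."""
    totals = {}
    for typ, start, end in entries:
        totals[typ] = totals.get(typ, 0) + (end - start)
    return totals

def describe(entries):
    """Human-readable per-type totals, sorted by type number."""
    return [f"{E820_TYPES.get(t, 'type %d' % t)}: {n} bytes"
            for t, n in sorted(bytes_per_type(entries).items())]
```

For the entry above, the type 5 total is 4 x 64k = 256k. For the final entry further down, it comes to 16 x 64k = 1 MiB.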

Now add the broken addresses to the list with type 5 (bad RAM):

5 0000000045f30000 0000000045f40000
5 0000000051730000 0000000051740000
5 000000005e3b0000 000000005e3c0000
5 000000007e260000 000000007e270000
5 0000000089d90000 0000000089da0000
5 00000000953a0000 00000000953b0000
5 000000009b550000 000000009b560000 
5 000000009d3a0000 000000009d3b0000
5 0000000407a10000 0000000407a20000
5 000000040f860000 000000040f870000
5 0000000416600000 0000000416610000
5 000000041a340000 000000041a350000

...

So the final entry becomes:

*e820=1 0000000000000000 0000000000008000 1 0000000000008000 000000000000c000 1 000000000000c000 0000000000087000 2 0000000000087000 0000000000088000 1 0000000000088000 000000000009c000 2 000000000009c000 000000000009d000 2 000000000009d000 000000000009e000 2 000000000009e000 00000000000a0000 1 0000000000100000 0000000020000000 2 0000000020000000 0000000020200000 1 0000000020200000 0000000040004000 2 0000000040004000 0000000040005000 1 0000000040005000 00000000cd7ec000 2 00000000cd7ec000 00000000cd80c000 1 00000000cd80c000 00000000cfdf5000 2 00000000cfdf5000 00000000cfe0b000 2 00000000cfe0b000 00000000d000d000 2 00000000d000d000 00000000d07dc000 1 00000000d07dc000 00000000d2a55000 2 00000000d2a55000 00000000d44d3000 1 00000000d44d3000 00000000d44e1000 2 00000000d44e1000 00000000d5e50000 1 00000000d5e50000 00000000d63b8000 2 00000000d63b8000 00000000d6850000 2 00000000d6850000 00000000d6929000 2 00000000d6929000 00000000d6a50000 2 00000000d6a50000 00000000d75f9000 2 00000000d75f9000 00000000da49f000 2 00000000da49f000 00000000dabb0000 2 00000000dabb0000 00000000dae9b000 2 00000000dae9b000 00000000dae9c000 2 00000000dae9c000 00000000dae9f000 4 00000000dae9f000 00000000daef5000 4 00000000daef5000 00000000daf9f000 3 00000000daf9f000 00000000dafd6000 3 00000000dafd6000 00000000dafff000 2 00000000dafff000 00000000db000000 1 0000000100000000 000000041e600000 2 00000000000a0000 00000000000c0000 2 00000000db000000 00000000dfa00000 2 00000000f80f8000 00000000f80f9000 2 00000000fed1c000 00000000fed20000 2 000000041e600000 000000041f000000 5 0000000006620000 0000000006630000 5 000000000ad10000 000000000ad20000 5 0000000010230000 0000000010240000 5 000000001e770000 000000001e780000 5 0000000045f30000 0000000045f40000 5 0000000051730000 0000000051740000 5 000000005e3b0000 000000005e3c0000 5 000000007e260000 000000007e270000 5 0000000089d90000 0000000089da0000 5 00000000953a0000 00000000953b0000 5 000000009b550000 000000009b560000 5 000000009d3a0000 000000009d3b0000 5 0000000407a10000 0000000407a20000 5 000000040f860000 000000040f870000 5 0000000416600000 0000000416610000 5 000000041a340000 000000041a350000
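Before rebooting, it may be worth checking the edited line for typos, since a mistyped digit silently excludes the wrong memory. A minimal Python sketch (hypothetical helpers, not part of any 9front tool) that flags type 5 ranges that are not 64k-aligned or that do not sit inside a type 1 (usable) range. A bad range outside usable RAM is not harmful as such; it most likely just means a digit got mistyped.

```python
def parse_e820(value):
    """Split an *e820= value into (type, start, end) triples, end exclusive.
    A leading '*e820=' is accepted and ignored."""
    f = value.split("=", 1)[-1].split()
    if len(f) % 3:
        raise ValueError("expected type/start/end triples")
    return [(int(f[i]), int(f[i + 1], 16), int(f[i + 2], 16))
            for i in range(0, len(f), 3)]

def check_bad_ranges(value, block=0x10000):
    """List problems with the type 5 (bad RAM) entries: ranges that are not
    block-aligned, or that are not inside any type 1 (usable) range."""
    entries = parse_e820(value)
    usable = [(s, e) for t, s, e in entries if t == 1]
    problems = []
    for t, s, e in entries:
        if t != 5:
            continue
        name = f"{s:016x}-{e:016x}"
        if s % block or e % block:
            problems.append(f"{name} not {block:#x}-aligned")
        if not any(us <= s and e <= ue for us, ue in usable):
            problems.append(f"{name} outside usable RAM")
    return problems
```

Run on the final entry above, every type 5 range is 64k-aligned and falls inside a type 1 range, so the sketch returns an empty list.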

And there you go: bad RAM excluded, and the crashes are gone.