16:39 <FUZxxl> mntmn: didn't you say something about the LS1046A being a possible alternative SoC for the Reform?
16:42 <mntmn> FUZxxl: rbz+us are developing a LS1028A SOM
16:42 <FUZxxl> mntmn: Ah, I see.  Seems like I have confused the product numbers.
16:42 <FUZxxl> (LS1046A support has landed in FreeBSD HEAD today)
16:45 <mntmn> the 1028A has the same vivante GC7000L gpu as imx8m
16:45 <mntmn> but it can address more memory and has bigger pcie address space
16:45 <mntmn> it is dual cortex-a72 vs quad cortex-a53 of imx8mq.
16:46 <FUZxxl> that sounds cool
16:46 <FUZxxl> Cortex A72---that's the same core used in the RPi 4B
16:47 <FUZxxl> I've written some SIMD code for that chip in the last weeks
16:47 <FUZxxl> the performance is okay, but not stellar
16:47 <FUZxxl> might also be because the SIMD units have at least a 3 cycle latency on all operations
16:49 <FUZxxl> SIMD performance is about 1/4 the speed of my Haswell-based laptop (using 128 bit vectors on both; if I use AVX on the Haswell box it gets a lot worse)
17:28 <mntmn> FUZxxl: interesting
17:29 <mntmn> FUZxxl: anything public?
17:30 <FUZxxl> mntmn: https://github.com/clausecker/pospop
17:31 <FUZxxl> you need to build a Go toolchain from master as the code requires some recently added extensions to the assembler to work.
17:31 <FUZxxl> If you build with an older toolchain, the SIMD version won't be built
17:32 <mntmn> FUZxxl: interesting! is the perf tied to caches as well?
17:33 <FUZxxl> mntmn: the ARM64 code doesn't have prefetching added yet
17:34 <FUZxxl> on x86, cache size makes a huge difference
17:34 <FUZxxl> it eventually regresses to being main memory bound
17:35 <FUZxxl> though that's only after the L3 cache is exhausted
17:35 <FUZxxl> on the Cortex-A72, there doesn't really seem to be an effect
17:35 <FUZxxl> I'm getting the same performance for each buffer size.  Perhaps the code is too slow for memory bandwidth to matter?
17:36 <FUZxxl> It only reaches around 3.6 GB/s
18:18 <mntmn> FUZxxl: really interesting. i should run it on the LS1028A and the LX2160A when i have the time
18:21 <FUZxxl> mntmn: cool!  You can run the tests with ' go get github.com/clausecker/pospop; go test -bench=. github.com/clausecker/pospop'
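
(Editor's sketch, not from the log.) The buffer-size question raised above can also be probed without the SIMD kernels. The following is a minimal sketch, assuming that "positional population count" means counting, for each bit position, how many bytes have that bit set. `count8` and `throughput` are hypothetical names written for this illustration; this is a plain scalar loop, not the pospop package's implementation, and the buffer sizes are arbitrary guesses at "fits in cache" versus "does not".

```go
package main

import (
	"fmt"
	"time"
)

// sink keeps the results alive so the counting work cannot be discarded.
var sink [8]int

// count8 adds to counts[i] the number of bytes in buf that have bit i set.
// Scalar reference only; the real package uses hand-written SIMD assembly.
func count8(counts *[8]int, buf []byte) {
	for _, b := range buf {
		for i := 0; i < 8; i++ {
			counts[i] += int(b>>uint(i)) & 1
		}
	}
}

// throughput runs count8 reps times over an n-byte buffer and
// returns the observed rate in GB/s (1 GB = 1e9 bytes).
func throughput(n, reps int) float64 {
	buf := make([]byte, n)
	for i := range buf {
		buf[i] = byte(i * 31) // arbitrary non-constant fill
	}
	var counts [8]int
	start := time.Now()
	for r := 0; r < reps; r++ {
		count8(&counts, buf)
	}
	elapsed := time.Since(start).Seconds()
	sink = counts
	return float64(n) * float64(reps) / elapsed / 1e9
}

func main() {
	const total = 64 << 20 // bytes processed per measurement
	// Assumed sizes: small, medium, and larger than typical caches.
	for _, n := range []int{16 << 10, 256 << 10, 64 << 20} {
		fmt.Printf("%9d bytes: %.2f GB/s\n", n, throughput(n, total/n))
	}
}
```

If the rate stays flat across sizes, as reported for the Cortex-A72 above, the loop is compute bound rather than memory bound. A scalar loop like this one will be far slower than the SIMD code, so it only illustrates the measurement, not the numbers quoted in the log.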
