Foreword
The Raspberry Pi 4 (HQ Cambridge, England) is an absolute must when it comes to DIY projects: it has a huge community, plenty of software, “there is a HAT for that”… all of it ready to use, and far too much to list here.
And today, we don’t care.
Le Potato from LibreComputer (HQ Shenzhen, China), well, it’s a potato. I like to call it a Raspberry Pi 3 and a half. It is hard to find, sold mostly on its manufacturer’s website and on Amazon. When it’s in stock, it’s discounted, which tells you everything you need to know about it.
Except… The Raspberry Pi’s processor, the Broadcom BCM2711, implements the Armv8 instruction set, whereas Le Potato’s Amlogic S905X processor implements the Armv8 instruction set and the optional crypto instructions.
Does it make a difference?
CPU and instruction sets
Raspberry Pi 4 Broadcom BCM2711
The datasheet makes no mention of crypto keywords such as aes or sha. That is slightly misleading, since useful instructions such as asimd (Advanced SIMD: Single Instruction, Multiple Data) are part of the Armv8 base architecture. You might remember them as NEON.
OpenSSL therefore performs AES computations with hand-optimized assembly code, carefully shielded from compiler optimization, rather than with the general-purpose instruction set. The relevant OpenSSL code is linked in the Bibliography section below.
/proc/cpuinfo shows:
Features : fp asimd evtstrm crc32 cpuid
Le Potato Amlogic S905X
The datasheet brags about crypto:
Crypto Engine – AES/AES-XTS block cipher with 128/192/256 bits keys, DES/TDES block cipher, Hardware crypto key-ladder operation and DVB-CSA for transport stream encryption, built-in hardware True Random Number Generator (TRNG), CRC and SHA-1/SHA-2/HMAC SHA engine
This processor actually has more to offer in terms of security in silicon, but that is for another day.
/proc/cpuinfo shows more, in particular the flags for hardware-accelerated block ciphers and hashing algorithms:
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
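The difference between the two Features lines can be checked programmatically. Here is a minimal Python sketch, assuming the flag names aes, pmull, sha1 and sha2 are the ones of interest (they are the flags present on Le Potato and absent on the Raspberry); the function name is only illustrative.

```python
# Flags that Le Potato reports and the Raspberry Pi 4 does not.
CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")


def crypto_flags(features_line):
    """Map each crypto flag to True/False for one 'Features : ...' line."""
    _, _, flags = features_line.partition(":")
    present = set(flags.split())
    return {flag: flag in present for flag in CRYPTO_FLAGS}


# Feature lines copied from the two /proc/cpuinfo outputs in this article.
RPI4 = "Features : fp asimd evtstrm crc32 cpuid"
POTATO = "Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid"

for name, line in (("Raspberry Pi 4", RPI4), ("Le Potato", POTATO)):
    found = [flag for flag, ok in crypto_flags(line).items() if ok]
    print(f"{name}: {', '.join(found) or 'no crypto extension flags'}")
```

On a live board, the same function could be fed the Features line read from /proc/cpuinfo instead of the hard-coded strings.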
Here is the S905X schematic, with emphasis on the crypto block so you can’t miss it 🤭
The bench
We will be running a couple of simple benchmarks:
- Speedtest-cli from Ookla, to see the best-case network speed one can expect
- Openssl speed, for a pure CPU benchmark, free of network latency and other external factors
The test setup:
- The speedtest server is carefully chosen as the closest to the VPN server, both hosted at Scaleway
- The client LAN is a wired gigabit LAN, with a 1 Gbps fiber optic uplink to the Internet
- The server is a VPS hosted in a nearby datacenter, with unrestricted 1 Gbps Internet access
- Both single-board computers run Ubuntu 22.04 LTS arm64, kernel 6.1
That said, the Raspberry has a gigabit interface and faster CPU cores. Will that give it an advantage over Le Potato?
SBC | Cores | RAM | NIC |
RPi 4 | A72 1.8 GHz | 3200 MHz LPDDR4 | 1 Gbps, dedicated Gigabit Ethernet PHY chip |
Potato | A53 1.5 GHz | 2133 MHz DDR3 | 100 Mbps, on-chip controller |
The SSL VPN will be established with the same cipher and hashing parameters for both contenders:
Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bit RSA, signature: RSA-SHA256
Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Raspberry Pi 4
First, the network speed test:
# speedtest --server-id 28073
Speedtest by Ookla
Server: Networth Telecom - Clichy (id: 28073)
ISP: Scaleway
Idle Latency: 27.65 ms (jitter: 0.12ms, low: 27.49ms, high: 27.70ms)
Download: 44.17 Mbps (data used: 51.3 MB)
259.80 ms (jitter: 67.05ms, low: 50.64ms, high: 631.93ms)
Upload: 29.53 Mbps (data used: 47.5 MB)
411.88 ms (jitter: 80.75ms, low: 137.53ms, high: 2278.01ms)
Packet Loss: 0.0%
Result URL: https://www.speedtest.net/result/c/737edbea-e904-420e-93bb-1145d1d5286f
Now OpenSSL for CPU benchmarking:
# openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 5141443 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1374584 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 351475 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 88352 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 11060 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 5529 aes-256-cbc's in 3.00s
OpenSSL 1.1.1n 15 Mar 2022
built on: Mon Jun 26 13:47:46 2023 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-CPJSK9/openssl-1.1.1n=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DBSAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 27421.03k 29324.46k 29992.53k 30157.48k 30201.17k 30195.71k
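The summary row can be derived from the "Doing ..." lines: each figure is the number of operations times the block size, divided by the elapsed time, expressed in thousands of bytes per second. A minimal Python sketch of that arithmetic, assuming the rounded 3.00 s printed in the output is the elapsed time (the function name is illustrative, not part of OpenSSL):

```python
def throughput_k(operations, block_size, seconds):
    """Throughput in 1000s of bytes per second, the unit openssl speed reports."""
    return operations * block_size / seconds / 1000


# (block size in bytes, operation count) copied from the Raspberry run above.
runs = [(16, 5141443), (1024, 88352), (16384, 5529)]

for block_size, operations in runs:
    print(f"{block_size:>6} bytes: {throughput_k(operations, block_size, 3.00):.2f}k")
```

The results line up with the 16, 1024 and 16384 bytes columns of the summary row.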
Le Potato
Same thing, network speed test:
# speedtest --server-id 28073
Speedtest by Ookla
Server: Networth Telecom - Clichy (id: 28073)
ISP: Scaleway
Idle Latency: 26.58 ms (jitter: 0.21ms, low: 26.32ms, high: 26.68ms)
Download: 80.70 Mbps (data used: 139.0 MB)
97.54 ms (jitter: 22.94ms, low: 36.11ms, high: 194.47ms)
Upload: 88.85 Mbps (data used: 100.2 MB)
242.54 ms (jitter: 68.72ms, low: 40.67ms, high: 1049.26ms)
Packet Loss: 0.0%
Result URL: https://www.speedtest.net/result/c/d740c37b-82f4-4dc5-b828-f8095df40bf8
And CPU test:
# openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing AES-256-CBC for 3s on 16 size blocks: 17585558 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 64 size blocks: 12705705 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 256 size blocks: 5636913 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 1024 size blocks: 1777408 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 8192 size blocks: 240662 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 16384 size blocks: 120914 AES-256-CBC's in 3.00s
version: 3.0.2
built on: Wed May 24 17:12:55 2023 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-7eq86f/openssl-3.0.2=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_armcap=0xbf
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-256-CBC 93789.64k 271055.04k 481016.58k 606688.60k 657167.70k 660351.66k
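Both openssl runs measure aes-256-cbc, whereas the tunnel negotiates AES-256-GCM. To benchmark the same cipher as the data channel, the identical command can be run with aes-256-gcm as the -evp argument. A small Python sketch that builds and optionally runs it; the helper names are hypothetical, and the GCM run was not part of the original test:

```python
import shutil
import subprocess


def speed_command(cipher):
    """Build the argv for an elapsed-time EVP benchmark of one cipher."""
    return ["openssl", "speed", "-elapsed", "-evp", cipher]


def run_speed(cipher):
    """Run the benchmark and return its standard output (takes a while)."""
    if shutil.which("openssl") is None:
        raise RuntimeError("openssl binary not found in PATH")
    result = subprocess.run(
        speed_command(cipher), capture_output=True, text=True, check=True
    )
    return result.stdout


# Only print the command here; call run_speed("aes-256-gcm") to execute it.
print(" ".join(speed_command("aes-256-gcm")))
```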
Results
Please note that, however fair the bench tries to be, the performance metrics don’t depend solely on the CPU’s own performance: a pretty long chain is tested along the way. The two openssl runs above, for instance, report different OpenSSL builds (1.1.1n and 3.0.2). But it is accurate enough for this humble comparison.
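Through the VPN, Le Potato performs roughly two to three times as fast as the Raspberry Pi 4 (about 1.8x on download, 3x on upload), and its results sit close to the ceiling of its 100 Mbps interface. In the raw AES benchmark the gap is wider still: about 15 times faster on average. Here you know that LibreComputer is an engineering-focused company, not another leech jumping on the SBC and Raspberry trend… They resisted the urge to run bad marketing along the lines of “our Potato is 300% faster(*) than a common Raspberry”.
SBC | Network speed (Mbps) | CPU AES (MB/s) |
RPi 4 | ⬇ 44 - ⬆ 30 | ~30 average |
Potato | ⬇ 81 - ⬆ 89 | ~462 average |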
Given the Raspberry’s advantages over Le Potato in Arm core frequency, memory bandwidth, and gigabit network interface, I expected the difference to be much smaller.
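The averages and ratios can be recomputed from the raw figures reported earlier in this article. A minimal Python sketch, with the values copied from the speedtest and openssl outputs; the variable names are only illustrative:

```python
from statistics import mean

# openssl speed summary rows, in 1000s of bytes per second.
rpi4_k = [27421.03, 29324.46, 29992.53, 30157.48, 30201.17, 30195.71]
potato_k = [93789.64, 271055.04, 481016.58, 606688.60, 657167.70, 660351.66]

# Averages over the six block sizes, converted to MB/s.
rpi4_avg = mean(rpi4_k) / 1000
potato_avg = mean(potato_k) / 1000

print(f"AES average: RPi 4 {rpi4_avg:.1f} MB/s, Le Potato {potato_avg:.1f} MB/s")
print(f"AES ratio, average: {potato_avg / rpi4_avg:.1f}x")
print(f"AES ratio, 16384-byte blocks: {potato_k[-1] / rpi4_k[-1]:.1f}x")

# Speedtest figures in Mbps (download, upload).
print(f"Download ratio: {80.70 / 44.17:.1f}x, upload ratio: {88.85 / 29.53:.1f}x")
```

On these figures, the AES gap is about 15x on average and about 22x on the largest blocks, while the network gap lies between roughly 1.8x and 3x.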
Just a note about retail prices
At the time of this writing, a Potato with 2 GB of RAM costs 40€, a Raspberry Pi 4 with 2 GB of RAM costs 90€.
TL;DR Le Potato > Raspberry Pi
In short, Le Potato’s Amlogic CPU has built-in AES and SHA instructions, which give it a significant advantage in crypto-intensive applications such as SSL VPN tunnels. That makes it a very good VPN server, or client for that matter.
Plus, it’s inexpensive, as a single-board computer should be.
For general computing and probably everything else, it’s the other way around.
Bibliography
Armv8 architecture base and crypto extension quick overview
Armv8 instruction set overview
Optional aes instruction use in openssl aes assembler source code
NEON/SIMD instruction use in openssl aes assembler source code
Le Potato Amlogic S905X block diagram
Raspberry Pi 4 datasheet
BCM2711 peripherals datasheet