Sorry, but I'm not impressed. The 4K read speed (the one that really matters) isn't better than on a decent SATA3 SSD.
That there is so much difference between read and write speed clearly means write-back caching is involved, which means those numbers aren't worth as much as the publisher would like you to believe they are.
Any system with Intel RST drivers and sufficiently large RAM will show very high numbers for write speeds because of write-back RAM caching. Conversely, no device you will encounter will be able to sustain much higher tiny I/O rates until something fundamentally changes in the PC architecture.
Never forget that there are limitations based on the operating system itself that influence the speed at which a tiny I/O request can be turned around. That's why you'll notice that even the tiny I/O write speed shown is still topping out at less than 100 MB/second.
I've mentioned this before - 94.91 megabytes divided by 4K bytes (binary units, 4,096-byte blocks) is 24,297 I/O operations in 1 second, or about 0.041 milliseconds per operation. Even with today's gigahertz processors, 41 microseconds isn't a whole lot of time to do 1 I/O operation. It simply takes some base time for the CPU to call through the proper layers to do an I/O operation.
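For anyone who wants to check that arithmetic, here is a rough sketch in Python. It assumes the benchmark's "MB" means 1,048,576 bytes and a 4K block is 4,096 bytes, since that is the reading that reproduces the 24,297 figure. The function name is just made up for illustration.

```python
def per_op_latency_us(throughput_mib_s, block_bytes=4096):
    """Convert a small-block throughput figure into IOPS and time per I/O.

    Assumes binary units (1 MB = 1024 * 1024 bytes) and one outstanding
    request at a time, so time per operation is simply 1 / IOPS.
    """
    bytes_per_second = throughput_mib_s * 1024 * 1024
    iops = bytes_per_second / block_bytes
    latency_us = 1_000_000 / iops
    return iops, latency_us


iops, latency_us = per_op_latency_us(94.91)
print(f"{iops:,.0f} IOPS, {latency_us:.1f} us per operation")
# prints: 24,297 IOPS, 41.2 us per operation
```

Note this only holds at queue depth 1. With several requests in flight, throughput no longer maps directly onto per-request turnaround time.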
Assuming a virtually zero-latency operation for the write-back cache, the difference you're seeing between the actual reads and the (effectively instantaneous) writes is 112 microseconds - 41 microseconds = 71 additional microseconds to do the I/O from the flash memory.
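The same subtraction as a quick sketch, using only the 112 and 41 microsecond figures above. It also converts the 112 microsecond read time back into a throughput number, again assuming binary megabytes, 4,096-byte blocks and one request at a time. The names here are made up for illustration.

```python
BLOCK_BYTES = 4096  # assumed 4K block size


def latency_us_to_mib_s(latency_us, block_bytes=BLOCK_BYTES):
    """Throughput implied by a per-operation time, one request at a time."""
    iops = 1_000_000 / latency_us
    return iops * block_bytes / (1024 * 1024)


read_latency_us = 112.0          # per-read time quoted above
cached_write_latency_us = 41.0   # per-write time, treated as pure software overhead

# Whatever the read costs beyond the cached write is attributed to the
# device and the interfaces in between.
device_latency_us = read_latency_us - cached_write_latency_us
print(device_latency_us)                                # 71.0
print(round(latency_us_to_mib_s(read_latency_us), 1))   # 34.9
```

Treating the cached write as zero device latency is the key assumption. If the cache itself costs a few microseconds, the flash side comes out slightly better than 71.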
71 microseconds to complete an I/O operation across ANY interface is phenomenally fast. That's a 0.071 millisecond round trip. I don't know the specifics for this card, but I'm willing to bet the lion's share of that time is actually spent getting the data across the various interfaces.
You are simply NOT going to see a separate device be able to return I/O data to a CPU a whole heckuva lot faster than that.
NVMe will help, as the stack is shortened. But the data is still out on the PCIe bus, which takes time to use.
Now, when gargantuan blobs of flash memory are integrated right into the processor chipset, THEN we'll see much greater tiny I/O throughput.