I have reached those speeds while using the USBC_MassStorage class and writing and reading data from a computer.
In my implementation I use FileStream to read bytes from a large file. That seems to be slower than the direct access that the USBC_MassStorage device class is most likely using. (No source is available, so I cannot confirm.)
I’m pretty sure the GHI lib uses low-level calls to the SD card, while FileStream puts NETMF between you and the card, which is a lot slower.
You can probably use RLP, or maybe bribe someone to develop an SDFilestream class with DMA and direct access to the SD interface.
Also try tweaking the chunk sizes you request. I think they should be multiples of 512 bytes. See if a larger or smaller buffer speeds things up. Avoid working on your NETMF buffer byte by byte. That is really slow.
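
To make the chunk-size idea concrete, here is a rough sketch. It is plain Python rather than NETMF code, so treat it only as an illustration of the pattern: one buffer allocated up front, sized as a multiple of 512 bytes, refilled by each read, and handled as a block. The names `SECTOR_SIZE`, `read_in_chunks` and `make_test_file` are made up for this example, and 512 is just the sector size assumed above.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumption: sector size, per the "multiples of 512 bytes" advice


def read_in_chunks(path, sectors_per_chunk=8):
    """Read a file in sector-multiple chunks; return (bytes_read, read_calls)."""
    if sectors_per_chunk < 1:
        raise ValueError("sectors_per_chunk must be at least 1")
    # Allocate the buffer once and reuse it for every read.
    buffer = bytearray(SECTOR_SIZE * sectors_per_chunk)
    view = memoryview(buffer)
    total = 0
    calls = 0
    # buffering=0 gives unbuffered reads, so each readinto asks for one chunk.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(view)
            if not n:  # 0 bytes means end of file
                break
            total += n
            calls += 1
            # Handle view[:n] as a whole block here, not byte by byte.
    return total, calls


def make_test_file(size):
    """Hypothetical helper: create a temporary file of `size` random bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path
```

Calling `read_in_chunks` with a few different `sectors_per_chunk` values (say 1, 8 and 64) and timing each run is one way to see which buffer size works best; the same experiment should carry over to a FileStream read loop on the device.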