Note: Staying true to the paper, I got SHARP running in ~200ms on an H100; the total process (download, inference, compression, upload) takes about 3 seconds. Read more or reach out to spatial@fncore.com if you're interested. The MacBook Pro and Mac Mini timings below are 2-4 seconds higher than they should be for an approach identical to Apple's paper; I'll update them as soon as I re-run the benchmarks.
I benchmarked SHARP on my MacBook Pro and a Mac Mini to compare its performance against the NVIDIA GPUs from Apple's paper.
## The Results
Testing 9 images on each machine:
| Machine | Chip | GPU Cores | Inference | Total Time |
|---|---|---|---|---|
| MacBook Pro | M4 Pro | 16-20 | 5.56s | 7.04s |
| Mac Mini | M4 | 10 | 9.28s | 10.79s |
The Mac Mini runs about 1.7x slower than the MacBook Pro (9.28s vs 5.56s), which falls squarely within the GPU core ratio of 1.6-2.0x (10 cores vs 16-20).
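A quick sanity check on that ratio, using only the numbers from the table above (the rounding is mine):

```python
# Observed inference times from the table (seconds).
mbp_inference = 5.56   # MacBook Pro, M4 Pro
mini_inference = 9.28  # Mac Mini, M4

slowdown = mini_inference / mbp_inference
print(f"Mac Mini slowdown: {slowdown:.2f}x")  # ~1.67x, i.e. roughly 1.7x

# The M4 Pro ships with either 16 or 20 GPU cores, so the
# core-count ratio brackets the observed slowdown.
low, high = 16 / 10, 20 / 10
print(f"GPU core ratio range: {low:.1f}x-{high:.1f}x")  # 1.6x-2.0x
```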
## CoreML vs MPS
I also tested CoreML as an alternative to PyTorch's MPS backend on the MacBook Pro:
| Backend | Inference | Speedup |
|---|---|---|
| MPS (PyTorch) | 5.56s | baseline |
| CoreML | 4.04s | 1.4x faster |
CoreML delivers a 1.4x speedup over the standard PyTorch MPS backend on the same hardware.
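The speedup figure follows directly from the raw times in the table:

```python
mps_inference = 5.56     # PyTorch MPS backend (seconds)
coreml_inference = 4.04  # CoreML (seconds)

speedup = mps_inference / coreml_inference
print(f"CoreML speedup: {speedup:.2f}x")  # ~1.38x, reported as 1.4x

saved = mps_inference - coreml_inference
print(f"Time saved per image: {saved:.2f}s")  # ~1.52s
```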
## Takeaway
Inference speed scales roughly linearly with GPU core count on Apple Silicon. The M4 Pro's extra cores translate directly into faster 3D processing, and CoreML can squeeze out another 40% on top of that.