Note: Staying true to the paper, I got SHARP running in ~200ms on an H100; the total process (download, inference, compression, upload) takes about 3 seconds. Read more or reach out to spatial@fncore.com if you're interested. The MacBook Pro and Mac Mini timings below are 2-4 seconds higher than they should be for an approach identical to Apple's paper; I'll update them as soon as I re-run the benchmarks.
I benchmarked SHARP on my MacBook Pro and a Mac Mini to compare its performance against the NVIDIA GPUs from Apple's paper.
## The Results
Testing 9 images on each machine:
| Machine | Chip | GPU Cores | Inference | Total Time |
|---|---|---|---|---|
| MacBook Pro | M4 Pro | 16-20 | 5.56s | 7.04s |
| Mac Mini | M4 | 10 | 9.28s | 10.79s |
The Mac Mini runs about 1.7x slower than the MacBook Pro (9.28s vs 5.56s), which falls squarely within the GPU core ratio of 1.6-2.0x (10 cores vs 16-20).
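A quick sanity check on that ratio, using only the numbers from the table above (the rounding is mine):

```python
# Observed inference times from the table (seconds).
mbp_inference = 5.56   # MacBook Pro, M4 Pro
mini_inference = 9.28  # Mac Mini, M4

slowdown = mini_inference / mbp_inference
print(f"Mac Mini slowdown: {slowdown:.2f}x")  # ~1.67x, i.e. roughly 1.7x

# The M4 Pro ships with either 16 or 20 GPU cores, so the
# core-count ratio brackets the observed slowdown.
low, high = 16 / 10, 20 / 10
print(f"GPU core ratio range: {low:.1f}x-{high:.1f}x")  # 1.6x-2.0x
```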
## CoreML vs MPS
I also tested CoreML as an alternative to PyTorch's MPS backend on the MacBook Pro:
| Backend | Inference | Speedup |
|---|---|---|
| MPS (PyTorch) | 5.56s | baseline |
| CoreML | 4.04s | 1.4x faster |
CoreML delivers a 1.4x speedup over the standard PyTorch MPS backend on the same hardware.
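The speedup figure follows directly from the raw times in the table:

```python
mps_inference = 5.56     # PyTorch MPS backend (seconds)
coreml_inference = 4.04  # CoreML (seconds)

speedup = mps_inference / coreml_inference
print(f"CoreML speedup: {speedup:.2f}x")  # ~1.38x, reported as 1.4x

saved = mps_inference - coreml_inference
print(f"Time saved per image: {saved:.2f}s")  # ~1.52s
```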
## Takeaway
Inference speed scales roughly linearly with GPU core count on Apple Silicon. The M4 Pro's extra cores translate directly into faster 3D processing, and CoreML can squeeze out another 40% on top of that.