Here are some recent papers that study how to optimize private set intersection (PSI) when the two sets have significantly different sizes. In all of them, the primary goal is to reduce communication.
These two papers save on communication by having an offline phase with high communication. When it comes time to actually compute the intersection, the online communication is low:
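One common way to realize this offline/online split uses an oblivious PRF (OPRF). This is not necessarily the exact construction of the papers above. Offline, the server sends F_k(y) for every item y in its large set. Online, the client learns F_k(x) for each of its few items without revealing x, and checks membership locally. Below is a minimal sketch assuming a Diffie-Hellman-style OPRF, F_k(x) = H(x)^k, in a toy 62-bit group. The group size, all function names, and the plain Python `set` (standing in for a compact filter a real system might send) are illustrative assumptions, and the parameters offer no real security.

```python
import hashlib
import secrets


def is_prime(n: int) -> bool:
    """Miller-Rabin with the first 12 prime bases (deterministic for n < 2**64)."""
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for b in bases:
        if n % b == 0:
            return n == b
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def find_safe_prime(start: int) -> int:
    """Smallest safe prime p = 2q + 1 with q >= start."""
    q = start | 1
    while not (is_prime(q) and is_prime(2 * q + 1)):
        q += 2
    return 2 * q + 1


# Toy 62-bit group (assumption): far too small for real security.
P = find_safe_prime(2**61)
Q = (P - 1) // 2  # prime order of the subgroup of quadratic residues mod P


def hash_to_group(item: str) -> int:
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return pow(h % (P - 1) + 1, 2, P)  # squaring lands in the order-Q subgroup


def prf(key: int, item: str) -> int:
    """F_k(x) = H(x)^k, evaluated directly by the server, who knows k."""
    return pow(hash_to_group(item), key, P)


def offline_phase(key: int, server_set: list) -> set:
    """Large one-time transfer: the server sends F_k(y) for every y it holds."""
    return {prf(key, y) for y in server_set}


def online_phase(key: int, client_set: list, encoding: set) -> list:
    """Cheap per-item OPRF round trips; both parties simulated in one process."""
    result = []
    for x in client_set:
        r = secrets.randbelow(Q - 1) + 1               # client's blinding factor
        blinded = pow(hash_to_group(x), r, P)          # client -> server: H(x)^r
        evaluated = pow(blinded, key, P)               # server -> client: H(x)^(rk)
        value = pow(evaluated, pow(r, -1, Q), P)       # client unblinds to H(x)^k
        if value in encoding:
            result.append(x)
    return result
```

For example, `key = secrets.randbelow(Q - 1) + 1`, then `online_phase(key, ["user7", "mallory"], offline_phase(key, [f"user{i}" for i in range(1000)]))` keeps only `"user7"`. The offline message grows with the server's set, while the online cost grows only with the client's set, which is the trade-off these papers exploit.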
This paper saves on communication by assuming that the server's large set is held by two (or more) non-colluding parties. That way, it can leverage techniques from private information retrieval (PIR):
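To see why non-colluding servers help, here is the classic textbook two-server XOR-based PIR scheme as a minimal Python sketch. It is not the construction used in the paper above. It assumes equal-length byte records, and the function names are hypothetical. Each query vector on its own is uniformly random, so neither server alone learns which record is read, yet each server must XOR over its entire database.

```python
from __future__ import annotations

import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def server_answer(db: list[bytes], selection: list[int]) -> bytes:
    """XOR of every record whose selection bit is 1 (scans the whole database)."""
    zero = bytes(len(db[0]))  # assumes a non-empty db of equal-length records
    return reduce(xor_bytes, (rec for rec, bit in zip(db, selection) if bit), zero)


def client_query(n: int, index: int) -> tuple[list[int], list[int]]:
    """Random bit vector for server 1; the same vector with bit `index` flipped for server 2."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[index] ^= 1
    return q1, q2


def pir_read(db: list[bytes], index: int) -> bytes:
    q1, q2 = client_query(len(db), index)
    a1 = server_answer(db, q1)  # computed by server 1
    a2 = server_answer(db, q2)  # computed by server 2 (must not collude with server 1)
    return xor_bytes(a1, a2)    # every record except db[index] cancels out
```

The n-bit queries here are as large as the database. Practical schemes compress them (for example with distributed point functions built from a PRG such as AES), but the servers still do some work for every record.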
You could argue that the previous papers are "cheating" in the sense that they are not in the standard 2-party setting with a one-off computation.
There are a few papers that achieve low communication without such "cheating", and they all use expensive public-key cryptographic operations (with cost proportional to the size of the large set). The most efficient one that I know of is the following, which uses somewhat homomorphic encryption:
I happen to know a followup to this work will appear at CCS 2018, but it is not available online yet.
You cannot avoid having the protocol "touch" every part of the large set (if the server's computation skipped some item, the server would learn that this item cannot be in the intersection), so the server's cost will always be somewhat high. In PIR-PSI, the cost is a small number of very cheap AES operations per item in the large set. That makes it the fastest protocol I know of, once you include the cost of the offline pre-computation in the other works. The FHE-based scheme is surprisingly fast, though as far as I know it is a little slower than PIR-PSI.
The state of the art for all of these supports 10-100 million server items and 500-5000 client items, with costs on the order of a few seconds and maybe 5-20 MB of communication.
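As a back-of-the-envelope check on those figures, assume 10^8 server items processed in about 3 seconds (the top of the item range and a rough midpoint for "a few seconds"; both are assumptions, and parallelism is ignored):

$$\frac{10^8\ \text{items}}{3\ \text{s}} \approx 3.3 \times 10^7\ \text{items/s}, \qquad \frac{3\ \text{s}}{10^8\ \text{items}} = 30\ \text{ns per item}.$$

So the per-item server budget is on the order of tens of nanoseconds, which is consistent with per-item work limited to a handful of cheap symmetric-key operations such as AES calls.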