Given two bit vectors A and B of length $N$ the Jaccard similarity between them is $\|A \land B\|/\|A \lor B\|$.
Select $M$ bit vectors from the set of $2^N-1$ possible values, excluding all-zeroes. What is the largest $M$ for a given $N$ such that the $M(M-1)/2$ pairwise Jaccard similarities are all distinct? What is a concrete example of a subset with that property?
The number of distinct Jaccard similarity values for $N$ is the number of fractions in Farey series of order $N$, which sets the upper bound.
Experimental results
A brute-force search starting with $N=2$ finds $M = 2, 3, 4, 5, 5, 6, 7, \ldots$. I found no likely matches in OEIS. An example of 7 bit vectors for $N=8$ is:
00000001
00001110
00010111
00110111
01110111
01111111
11111111
I have lower bounds for a few larger cases. For $N=10$, $M \ge 8$ with decimal values {1, 31, 63, 99, 191, 447, 511, 1022}. For $N=64$, $M \ge 27$ with {75616855644387687, 2459624554201425171, 1521930251796536, 17293822019346890751, 18157352612976647935, 2404968724985570464, 9007705569197305865, 16717132001596030797, 16134634947158732799, 1291115287526644499, 3849546270639050489, 13168489964796115711, 144115188075864354, 16131190038204382975, 1316340271877180188, 7489449894117883466, 15037035512890333456, 8638537081332189361, 6036230857943676155, 13051322593571675920, 72092782704984705, 17867483455257025887, 17985060916847511162, 17221764975064768383, 1073193546549267, 10470171935520675435, 11344355769778337692}. The last was found using a random-based first-fit search.
Driving Problem
In chemistry, similar molecules tend to have similar properties. There are many ways to characterize "similar". One way is to use a bit vector (typically with $N=1024$ or $N=2048$) called a "fingerprint", where a given bit it set due to the presence of one or more chemical features in a molecule. The Jaccard similarity between two corresponding fingerprints - called the "Tanimoto" in my field - is used as a proxy for chemical similarity. It is easy to understand, simple to use, and gives reasonably good predictive ability.
There are many ways to analyze a set of fingerprints, for example, to cluster them, or find a dissimilar subset. The large number of ties in the NxN comparison means that the final result is not exactly determined. Different implementations of the same algorithm will likely give different results. Even multiple runs of the same program might give different result if, for example, an RNG is used to break ties and different seeds are used.
This indeterminacy makes it more difficult to validate basic correctness.
On the other hand, if the data set has no ties then the results should be well-defined, which lead me on this path.
The $M=27$ case above is good enough for my current purposes. I still have an intellectual itch to find out the exact value for $M$ and to find larger data sets, especially for $N=1024$.