It very much depends on your data. Say you have two million points, with $x_i$ randomly in the interval $[0, 1000000]$. Simple and fast: sort the points by their first coordinate, then compare every point $(x_i, y_i)$ with the points $(x_j, y_j)$ for $j = i+1, i+2, \dots$ as long as $x_j \le x_i + 1$. Since two million points spread over an x-range of one million give about two points per unit of $x$, on average you compare each point with only about two others.
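A minimal Python sketch of that sweep (the random point generation, the tolerance `EPS`, and the helper `at_unit_distance` are assumptions for illustration; the answer above does not pin down the exact distance test):

```python
import random

EPS = 1e-9   # assumed tolerance for "distance equals 1" with float coordinates

def at_unit_distance(p, q):
    """True if p and q are (approximately) at distance 1."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return abs(dx * dx + dy * dy - 1.0) <= EPS

n = 2_000_000
points = [(random.uniform(0, 1_000_000), random.uniform(0, 1_000_000))
          for _ in range(n)]
points.sort()                        # sort by first coordinate

pairs = []
for i in range(n):
    xi, yi = points[i]
    j = i + 1
    # only points with x_j <= x_i + 1 can be at distance 1 from (x_i, y_i)
    while j < n and points[j][0] <= xi + 1:
        if at_unit_distance(points[i], points[j]):
            pairs.append((points[i], points[j]))
        j += 1
```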
Now assume you have two million points, with coordinates randomly in the interval $[0, 2]$. The algorithm above is totally useless: a constant fraction of all pairs still satisfies $x_j \le x_i + 1$, so the sweep degenerates into comparing on the order of $n^2$ pairs. Instead, divide the x-axis into intervals of size $d$ for a suitable $d$, say $d = 0.01$; for each of these intervals, collect all points with $x_i$ in the interval and sort them by $y_i$.
For example, one interval will contain all points with $0.37 \le x_i \le 0.38$. You compare them to the points with $x_j$ in the intervals from $[0.00, 0.01]$ up to $[1.37, 1.38]$ (the left end is clipped at $0$ because all coordinates lie in $[0, 2]$). For each $x_i$ and each interval you compare with, you calculate the two ranges in which $y_j$ can lie, since $(x_j, y_j)$ must be somewhere on the circle of radius $1$ around $(x_i, y_i)$. Since the points in each interval are sorted by $y_j$, you can access all candidates in those ranges quickly, e.g. with binary search.
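A hedged sketch of this bucketed version: points are grouped by $\lfloor x/d \rfloor$, each group is kept sorted by $y$, and for every point the two candidate $y$-ranges are pulled out of the nearby groups with binary search. The names (`at_unit_distance`, `EPS`, `D`) and the interpretation that pairs at distance exactly 1 are wanted are my assumptions, not part of the original answer.

```python
import bisect
import math
import random
from collections import defaultdict

EPS = 1e-9                    # assumed tolerance for the distance test
D = 0.01                      # interval width d from the text

def at_unit_distance(p, q):
    dx, dy = p[0] - q[0], p[1] - q[1]
    return abs(dx * dx + dy * dy - 1.0) <= EPS

n = 2_000_000
points = [(random.uniform(0, 2), random.uniform(0, 2)) for _ in range(n)]

# Group points by x-interval of width D and sort each group by y.
buckets = defaultdict(list)
for p in points:
    buckets[int(p[0] / D)].append(p)
for b in buckets.values():
    b.sort(key=lambda p: p[1])
ys = {k: [p[1] for p in b] for k, b in buckets.items()}   # sorted y per bucket

pairs = []
for xi, yi in points:
    # Only buckets overlapping [x_i - 1, x_i + 1] can hold a partner.
    for k in range(int((xi - 1) / D), int((xi + 1) / D) + 1):
        if k not in buckets:
            continue
        # Smallest and largest |x_j - x_i| achievable inside bucket k.
        lo_dx = max(0.0, k * D - xi, xi - (k + 1) * D)
        hi_dx = max(abs(xi - k * D), abs(xi - (k + 1) * D))
        if lo_dx > 1:
            continue
        h_max = math.sqrt(1 - lo_dx * lo_dx)
        h_min = math.sqrt(max(0.0, 1 - min(1.0, hi_dx) ** 2))
        # The two y-ranges cut out of the unit circle around (x_i, y_i);
        # they merge into one range when h_min is (nearly) zero.
        ranges = [(yi - h_max - EPS, yi - h_min + EPS),
                  (yi + h_min - EPS, yi + h_max + EPS)]
        if ranges[0][1] >= ranges[1][0]:
            ranges = [(ranges[0][0], ranges[1][1])]
        for lo, hi in ranges:
            lo_idx = bisect.bisect_left(ys[k], lo)
            hi_idx = bisect.bisect_right(ys[k], hi)
            for q in buckets[k][lo_idx:hi_idx]:
                # report each pair once, and skip the point itself
                if q > (xi, yi) and at_unit_distance((xi, yi), q):
                    pairs.append(((xi, yi), q))
```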
For each $(x_i, y_i)$, the candidate points lie in a band of width roughly $d$ around a circle of circumference $2\pi$, i.e. within a total area of about $2\pi d$. So the smaller $d$ is, the less area you have to search, but the bookkeeping overhead grows.