If we first zoom out and then zoom in, the following chain of ideas gives a vague and ahistorical motivation. A Grothendieck fibration between categories is just a cartesian fibration between $\infty$-categories that happen to be ordinary categories (more precisely, nerves of ordinary categories). A cartesian fibration is a generalization of a right fibration, which in turn is a generalization of a Kan fibration (when $\infty$-categories are modeled as simplicial sets).

So you can get from topological fibrations to Grothendieck fibrations as follows. Think of a fibration $E\to X$ as a family of spaces $E_x$ indexed by the points of $X$, together with a lot of coherence data. A path $x\to y$ gives maps $E_x\to E_y$ and $E_y\to E_x$ by path lifting, and there is a great deal of analogous higher-dimensional data expressing, for instance, that all possible choices of these maps agree up to homotopy. Now pretend simplicial sets are spaces, and restrict your notion of fibration so that you only get the pullback map $E_y\to E_x$, not the pushforward $E_x\to E_y$; otherwise the same amount of coherence data is still present. Next, allow your fibers and base to be $\infty$-categories instead of just spaces. Finally, if we restrict everything to be an ordinary category, all the higher-dimensional data disappears, and we have found Grothendieck fibrations. So if you're willing to view a fibration of spaces as encoding a way to glue together the fibers using data from the base, then a Grothendieck fibration of categories can be viewed in the same way.