As stated in the question, this organization recognizes only context-free languages.
Lexical analysis is actually a GSM mapping, i.e. a finite-state
transduction. Since the context-free languages form a full Abstract Family of Languages (full AFL), they are closed under
inverse GSM mapping. Hence, if the sequences of lexemes parsed by the second phase belong to a context-free language, the original texts, as sequences of characters, also belong to a
context-free language. This two-level organization does not change the
context-free character of the syntax, assuming the second level is indeed context-free.
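To make that first phase concrete, here is a minimal sketch of my own (a toy example, not from the question) of lexical analysis as a finite-state transduction from characters to tokens; the regular expressions stand in for the underlying finite automaton:

```python
import re

# Toy token vocabulary: finite, even though the set of lexemes is not.
TOKEN_SPEC = [
    ("NUM",  r"\d+"),
    ("ID",   r"[A-Za-z_]\w*"),
    ("PLUS", r"\+"),
    ("LPAR", r"\("),
    ("RPAR", r"\)"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def lex(text):
    """Transduce a character string into a (token, lexeme) sequence."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(lex("foo + (42)")))
# [('ID', 'foo'), ('PLUS', '+'), ('LPAR', '('), ('NUM', '42'), ('RPAR', ')')]
```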
One point is worth noting, though it is a minor technical issue. For
the second phase to have this context-free character, identifiers must be replaced
by a finite set of standard categories, each represented by a unique
symbol. The same is true for literals such as strings or numbers. The
point is that many languages place, in principle, no limit on the
size of identifiers, or of some literals, so that there are in principle infinitely many
of them. But the context-free grammar of the second phase must have a
finite alphabet, like any CF grammar. Hence the lexical elements
returned by the first phase cannot actually distinguish all
identifiers, or all literals, for the second phase.
However, the information is kept by other means for the more semantic
phases of the compiling process.
It is true that any given program does have a finite number of
identifiers and literals, but the point is that we are considering the language of all possible programs.
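A hedged sketch of how that information can be kept: the parser sees only the single symbol ID, while the actual spelling goes into a symbol table for the semantic phases (the token shape and names below are my own illustrative choices):

```python
# The parser's alphabet stays finite because every identifier collapses
# to the one symbol ID; the spelling is kept "by other means"
# (a symbol table) for the semantic phases.
symbol_table = []

def make_token(kind, lexeme):
    if kind == "ID":
        symbol_table.append(lexeme)           # information kept aside
        return ("ID", len(symbol_table) - 1)  # the parser sees only ID
    return (kind, lexeme)

print(make_token("ID", "total_count"))  # ('ID', 0)
print(make_token("ID", "i"))            # ('ID', 1)
print(symbol_table)                     # ['total_count', 'i']
```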
Another point is of course that the texts of compiler-accepted programs do
not form a context-free language, as other constraints (such as
declarations of identifiers) are imposed on programs, in parallel to
the strictly syntactic parsing process.
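For instance, a declared-before-use constraint is typically enforced by a separate pass rather than by the grammar; a rough sketch, with token shapes of my own invention:

```python
# The "declared before use" constraint is not context-free, so it is
# checked outside the grammar, over the identifiers the parse produced.
def check_declarations(tokens):
    declared = set()
    for kind, name in tokens:
        if kind == "DECL":
            declared.add(name)
        elif kind == "USE" and name not in declared:
            raise NameError(f"{name!r} used before declaration")

check_declarations([("DECL", "x"), ("USE", "x")])  # passes silently
# check_declarations([("USE", "y")])               # would raise NameError
```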
But, ignoring those more "semantic" constraints, some parsers also
use a variety of parsing "tricks", such as prioritization of reduction
rules to avoid ambiguities or non-determinism. As long as this is done
with a finite memory and a pushdown stack that is used only within a
fixed finite distance from its top, it does not take the syntax out
of the context-free realm. However, whatever extra parsing rules the
parser uses also have to be known to the users of the language,
which makes the syntax more complex for users to understand and
gives more ground for misunderstanding. TANSTAAFL. When such rules are
used to reduce ambiguity, it can then happen that the user reads
his program one way, while the parser reads it another way.
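One common instance of such a trick is an operator-precedence table that commits the parser to one of several possible reductions. A minimal sketch (a precedence-climbing parser of my own; generator mechanisms such as declared precedences differ in detail but serve the same purpose):

```python
# A fixed precedence table resolves the ambiguous grammar
# E -> E OP E | NUM deterministically: finite extra memory, and the
# stack is still used as an ordinary pushdown.
PREC = {"+": 1, "*": 2}

def parse_expr(tokens, pos=0, min_prec=1):
    node, pos = tokens[pos], pos + 1              # a NUM leaf
    while pos < len(tokens) and PREC.get(tokens[pos], 0) >= min_prec:
        op, pos = tokens[pos], pos + 1
        rhs, pos = parse_expr(tokens, pos, PREC[op] + 1)
        node = (op, node, rhs)
    return node, pos

tree, _ = parse_expr(["1", "+", "2", "*", "3"])
print(tree)  # ('+', '1', ('*', '2', '3')) -- '*' binds tighter by fiat
```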
My own preference is for general CF parsers, which will detect
ambiguity and reject it as a programming error (while grammar
ambiguity is undecidable in general, whether a given string has more
than one parse is, of course, decidable). But that is very much a
matter of taste in design.
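As a rough illustration of per-string ambiguity detection (a toy exhaustive parser, not a production technique), one can count the parses of a string under an ambiguous grammar such as E -> E '+' E | 'n' and reject when the count exceeds one:

```python
from functools import lru_cache

def count_parses(tokens):
    """Count the parse trees of tokens under E -> E '+' E | 'n'."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                       # parses of toks[i:j] as an E
        n = 1 if j - i == 1 and toks[i] == "n" else 0
        for k in range(i + 1, j - 1):      # try each '+' as the root
            if toks[k] == "+":
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))

print(count_parses(["n", "+", "n"]))            # 1 parse: accept
print(count_parses(["n", "+", "n", "+", "n"]))  # 2 parses: reject
```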
Including the lexical phase in the context-free syntax may possibly
make those extra rules more complex, as reasoning can no longer
isolate lexical issues from CF syntactic ones. It can also make the
automaton underlying the parser more complex, or call for more extra
rules. Typically, if parsing the CF part of the syntax requires a
lookahead that may include an identifier, the lookahead becomes
unbounded when identifiers are not reduced to a single symbol by a lexical
phase. Much also depends on the chosen parser-generation technology.
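To illustrate the lookahead point with a toy example of my own: deciding between two constructs may take one token of lookahead after an identifier, but arbitrarily many characters when the identifier is not pre-tokenized:

```python
# With a lexical phase: one token of lookahead decides the construct.
def classify_tokens(tokens):
    return "assignment" if tokens[1] == "=" else "call"

# Without one: the decision must scan past the identifier itself, so
# the character-level lookahead grows with the identifier's length.
def classify_chars(text):
    i = 0
    while i < len(text) and (text[i].isalnum() or text[i] == "_"):
        i += 1
    return "assignment" if i < len(text) and text[i] == "=" else "call"

print(classify_tokens(["ID", "=", "NUM"]))        # assignment
print(classify_chars("very_long_identifier(1)"))  # call
```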
Though some scannerless parsers are used, I would think this applies more to languages with specific lexical and syntactic characteristics. But I am no expert on this, and the Wikipedia page seems a bit weak on the general presentation side.
Finally, another interesting point, close to the initial question, is
error recovery. It is used by compilers so as to be able to catch many
errors in each run, and it mattered most when compiling was done in batch
mode rather than interactively. Many syntactic error-recovery
techniques are based on a formal model of finite-state generation of
errors, such as a missing or extra symbol, or local garbling of the string
(this is also true of natural-language processing). This, too, can be
modeled by a GSM. Hence, again ignoring semantic aspects, the larger
language of programs with syntactic errors accepted by a parser with
error correction is also context-free.
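A rough sketch of those single-symbol error hypotheses (names and token shapes are my own illustrative choices): at a failure point, the recovery tries the "extra symbol" edit first, then falls back to the "missing symbol" edit:

```python
# Finite-state error hypotheses: an extra symbol (delete it) or a
# missing symbol (insert one), each a bounded local edit of the input.
def repair(tokens, pos, expected):
    """Return (hypothesis, resumption position) for a failure at pos."""
    if pos + 1 < len(tokens) and tokens[pos + 1] in expected:
        return f"extra {tokens[pos]!r}: delete it", pos + 1
    return f"missing one of {sorted(expected)}: insert it", pos

print(repair(["(", "x", "x", ")"], 2, {")"}))
# ("extra 'x': delete it", 3)
print(repair(["(", "x"], 2, {")"}))
# ("missing one of [')']: insert it", 2)
```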
There is more in the comments, but how long should an answer be?