In addition to using timeit for simple benchmarking you can pytest-benchmark, which makes it super-simple to create a comparison, simply:
import os
def f1(top):
for root, dirs, files in os.walk(top):
for f in files:
path = os.path.join(root, f)
print(path)
def f2(top):
for root, dirs, files in os.walk(top):
for f in files:
print(os.path.join(root, f))
def test_f1(benchmark):
benchmark(f1, '~/tmp')
def test_f2(benchmark):
benchmark(f2, '~/tmp')
Note: ~/tmp contains 350 files/folders, YMMV. Running
python -m pytest test.py --benchmark-min-time=0.001 --benchmark-histogram=hist
Gives you nice data and a histogram:
----------------------------------------------------------------------- benchmark: 2 tests ----------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
test_f1 4.4811 (1.0) 8.6253 (1.0) 4.7941 (1.00) 0.3531 (1.0) 4.7141 (1.01) 0.2762 (1.31) 15;7 216 1000
test_f2 4.4967 (1.00) 9.3009 (1.08) 4.7773 (1.0) 0.5242 (1.48) 4.6838 (1.0) 0.2113 (1.0) 6;13 215 1000
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, the difference is not significant considering the high variance.
Now if you are still curious, you can use dis to show the bytecode that is being executed by CPython. This is a functionality of the CPython interpreter, which is the most common way to run python code:
In [1]: import os, dis
In [2]: def f1(top):
...: for root, dirs, files in os.walk(top):
...: for f in files:
...: path = os.path.join(root, f)
...: print(path)
...:
In [3]: def f2(top):
...: for root, dirs, files, in os.walk(top):
...: for f in files:
...: print(os.path.join(root, f))
...:
In [4]: dis.dis(f1)
2 0 SETUP_LOOP 60 (to 62)
2 LOAD_GLOBAL 0 (os)
4 LOAD_ATTR 1 (walk)
6 LOAD_FAST 0 (top)
8 CALL_FUNCTION 1
10 GET_ITER
>> 12 FOR_ITER 46 (to 60)
14 UNPACK_SEQUENCE 3
16 STORE_FAST 1 (root)
18 STORE_FAST 2 (dirs)
20 STORE_FAST 3 (files)
3 22 SETUP_LOOP 34 (to 58)
24 LOAD_FAST 3 (files)
26 GET_ITER
>> 28 FOR_ITER 26 (to 56)
30 STORE_FAST 4 (f)
4 32 LOAD_GLOBAL 0 (os)
34 LOAD_ATTR 2 (path)
36 LOAD_ATTR 3 (join)
38 LOAD_FAST 1 (root)
40 LOAD_FAST 4 (f)
42 CALL_FUNCTION 2
44 STORE_FAST 5 (path)
5 46 LOAD_GLOBAL 4 (print)
48 LOAD_FAST 5 (path)
50 CALL_FUNCTION 1
52 POP_TOP
54 JUMP_ABSOLUTE 28
>> 56 POP_BLOCK
>> 58 JUMP_ABSOLUTE 12
>> 60 POP_BLOCK
>> 62 LOAD_CONST 0 (None)
64 RETURN_VALUE
In [5]: dis.dis(f2)
2 0 SETUP_LOOP 56 (to 58)
2 LOAD_GLOBAL 0 (os)
4 LOAD_ATTR 1 (walk)
6 LOAD_FAST 0 (top)
8 CALL_FUNCTION 1
10 GET_ITER
>> 12 FOR_ITER 42 (to 56)
14 UNPACK_SEQUENCE 3
16 STORE_FAST 1 (root)
18 STORE_FAST 2 (dirs)
20 STORE_FAST 3 (files)
3 22 SETUP_LOOP 30 (to 54)
24 LOAD_FAST 3 (files)
26 GET_ITER
>> 28 FOR_ITER 22 (to 52)
30 STORE_FAST 4 (f)
4 32 LOAD_GLOBAL 2 (print)
34 LOAD_GLOBAL 0 (os)
36 LOAD_ATTR 3 (path)
38 LOAD_ATTR 4 (join)
40 LOAD_FAST 1 (root)
42 LOAD_FAST 4 (f)
44 CALL_FUNCTION 2
46 CALL_FUNCTION 1
48 POP_TOP
50 JUMP_ABSOLUTE 28
>> 52 POP_BLOCK
>> 54 JUMP_ABSOLUTE 12
>> 56 POP_BLOCK
>> 58 LOAD_CONST 0 (None)
60 RETURN_VALUE
So the first code does indeed produce more bytecode instructions.
Anyhow, you should consider profiling - make sure you look at parts of the code that are really relevant and avoid optimizing blindly.