I want to retrieve the number of DRAM accesses in my application. Precisely, I need to distinguish between data and code accesses. The processor is an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell). Based on Intel Software Developer's Manual, Volume 3 and Perf, I could find and categorize the following memory-access-related events:
(A)
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
=========================================================================
(B)
mem_load_uops_l3_miss_retired.local_dram
mem_load_uops_retired.l3_miss
=========================================================================
(C)
offcore_response.all_code_rd.l3_miss.any_response
offcore_response.all_code_rd.l3_miss.local_dram
offcore_response.all_data_rd.l3_miss.any_response
offcore_response.all_data_rd.l3_miss.local_dram
offcore_response.all_reads.l3_miss.any_response
offcore_response.all_reads.l3_miss.local_dram
offcore_response.all_requests.l3_miss.any_response
=========================================================================
(D)
offcore_response.all_rfo.l3_miss.any_response
offcore_response.all_rfo.l3_miss.local_dram
=========================================================================
(E)
offcore_response.demand_code_rd.l3_miss.any_response
offcore_response.demand_code_rd.l3_miss.local_dram
offcore_response.demand_data_rd.l3_miss.any_response
offcore_response.demand_data_rd.l3_miss.local_dram
offcore_response.demand_rfo.l3_miss.any_response
offcore_response.demand_rfo.l3_miss.local_dram
=========================================================================
(F)
offcore_response.pf_l2_code_rd.l3_miss.any_response
offcore_response.pf_l2_data_rd.l3_miss.any_response
offcore_response.pf_l2_rfo.l3_miss.any_response
offcore_response.pf_l3_code_rd.l3_miss.any_response
offcore_response.pf_l3_data_rd.l3_miss.any_response
offcore_response.pf_l3_rfo.l3_miss.any_response
My choices are as follows:
- It seems that the sum of
LLC-load-missesandLLC-store-misseswill return the whole DRAM accesses (equivalently, I could useLLC-missesinPerf). - For data-only accesses, I used
mem_load_uops_retired.l3_miss. It does not include stores, but seems to be OK (because stores seem to be much less frequent?!). - Simplistically,
LLC-load-misses-mem_load_uops_retired.l3_miss=DRAM Accesses for Code(As code is read-only).
Are these choices reasonable?
My other questions: (The 2nd one is the most important)
- What are
local_dramandany_response? - At first, it seems that, group (C), is a higher resolution version of the load events of group (A). But my tests show that the events in the former group is much more frequent than the latter. For example, in a simple benchmark, the number of
offcore_response.all_reads.l3_miss.any_responseevents were twice as many asLLC-load-misses. - Group (E), pertains to
demand reads(i.e., allnon-prefetchedreads). Does this mean that, e.g.:offcore_response.all_data_rd.l3_miss.any_response-offcore_response.demand_data_rd.l3_miss.any_response= DRAM read accesses caused by prefeching?
Group (D), includes DRAM access events caused by Read for Ownership operations (for Cache Coherency Protocols). It seems irrelevant to my problem.
Group (F), counts DRAM reads caused by L2-cache prefetcher which is also irrelevant to my problem.