
I work with microcontrollers where there are occasionally machine registers that have actions that occur when a register is read. Yes, that's not a typo: there are plenty of registers that cause actions when they are written, but in a few cases, reading a register makes something happen.

The most common instance of this is a UART receive register hooked up to one end of a FIFO; for example, say there is a register RXDATA. Reading RXDATA pulls one byte out of the FIFO, and the next read of RXDATA gets the next byte.

Is there enough information in volatile to get the compiler to understand that there might be side effects from a read?

Example fragment in C:

#include <stdint.h>
#include <stdbool.h>

volatile uint8_t RXDATA;
// There is some mechanism for associating this with a known hardware address
// (either in linker information, or in some compiler-specific attribute not shown).

// Check that bit 0 is 1 and bit 7 is 0.
bool check_bits_1() 
{
   const uint8_t rxdata = RXDATA;
   return (rxdata & 1) && ((rxdata & 0x80) == 0);
}

// Check that bit 0 is 1 and bit 7 is 0.
bool check_bits_2() 
{
   return (RXDATA & 1) && ((RXDATA & 0x80) == 0);
}

// Check that bit 0 is 1 and bit 7 is 0.
bool check_bits_3() 
{
   const bool bit_0_is_1 = RXDATA & 1;
   const bool bit_7_is_0 = (RXDATA & 0x80) == 0;
   return bit_0_is_1 && bit_7_is_0;
}

If I ignore the C standard and pretend that a compiler does exactly what I think I am asking it to do (DWIM), then my intuition is that these three functions have different behavior:

  • In the first case, we read RXDATA once, so we pull out one byte of the FIFO and then do some math on it.
  • In the second case, we read RXDATA either once or twice (because && short-circuits), doing the math directly on the register value, so we might pull either one or two bytes from the FIFO; this is incorrect behavior.
  • In the third case, we read RXDATA twice, pulling two bytes from the FIFO, so this is incorrect behavior.

Whereas if RXDATA isn't volatile then presumably all three of the above implementations are equivalent.

Does the C standard require the compiler to interpret volatile in this case in the same way I am looking at it? If not, how can a hardware register be handled properly in C?

Jason S
  • So far as I can tell, the standard does not distinguish between read and write accesses when defining the behaviour of `volatile`. – Oliver Charlesworth Jun 20 '17 at 22:53
  • https://stackoverflow.com/questions/13823669/how-to-force-an-unused-memory-read-in-c-that-wont-be-optimized-away feels like a dupe. – Oliver Charlesworth Jun 20 '17 at 22:55
  • I agree that it's closely related (and thanks for the reference; [Lundin's answer](https://stackoverflow.com/a/13842698/44330) seems to answer my question) but not exactly a duplicate. – Jason S Jun 20 '17 at 23:05
  • `volatile` solves the problem of a compiler optimising away access operations it "considers unnecessary" (because it doesn't know the register might be changed externally) - It does **not** solve *atomicity*, however. This must be done on top of using `volatile` – tofro Jun 21 '17 at 05:46
  • Too broad. It depends on the platform. Not sure what you mean with "`volatile` containing information". The standard defines the semantics of `volatile`, but there is no additional information stored with the qualifier. – too honest for this site Jun 21 '17 at 11:27

3 Answers


Is there enough information in volatile to get the compiler to understand that there might be side effects from a read?

Yes.

The C language formal definition of a side effect actually targets this very scenario. C11 5.1.2.3:

Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

Regarding what the compiler is allowed to optimize, C11 5.1.2.3 ¶4:

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

In plain English, this means that any form of access, read or write, to a volatile object is considered a side effect, and a compiler is not allowed to optimize away needed side effects.

...then my intuition is that these three functions have different behavior

Indeed they have. This is why coding standards such as MISRA-C forbid mixing volatile variable accesses with other operations in the same expression. In the UART scenario, doing so might cause loss of status flags, which would be a severe bug.

Robust programs read/write to volatile variables on a single line and do all other necessary arithmetic in separate expressions.

Lundin

I think your description of how the compiler has to look at it is correct. The C standard's requirements are specified in ISO/IEC 9899:2011 §6.7.3 Type qualifiers:

¶7 An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.134) What constitutes an access to an object that has volatile-qualified type is implementation-defined.


134) A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be ‘‘optimized out’’ by an implementation or reordered except as permitted by the rules for evaluating expressions.

The only cause for concern is the last sentence — that what qualifies as access is implementation-defined. That means you should be able to find out for any given compiler what qualifies as access; the implementation is required to define and document the rules. However, different compilers on different machines might have different interpretations of what 'an access' means.

Section 5.1.2.3 Program execution is moderately long and moderately complex to parse:

¶1 The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

¶2 Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects,12) which are changes in the state of the execution environment. Evaluation of an expression in general includes both value computations and initiation of side effects. Value computation for an lvalue expression includes determining the identity of the designated object.

¶4 In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

¶6 The least requirements on a conforming implementation are:

  • Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.

Footnote 12 refers to floating point state. ¶3 defines 'sequenced before' etc. ¶5 discusses signal handling. There are more 'least requirements' but they don't mention volatile.

I think this all agrees with your interpretation of how the code you show should be handled in the light of the volatile qualifier.

Jonathan Leffler
  • Looks like "What constitutes an access to an object that has volatile-qualified type is implementation-defined." is the key. If OP's compiler details the implementation-defined behavior, all is copacetic. – chux - Reinstate Monica Jun 21 '17 at 02:55

When dealing with memory-mapped registers, you pretty much have to go beyond what the C standard guarantees. If you're lucky, you only need to rely on implementation-specific behavior, but often you have to go further and verify by hand that the generated code is correct. Even if you find a way to make this strictly standards-conforming, you have to account for the fact that this is a rarely exercised and rarely tested area of the compiler, one of those things that can easily be broken by an edge case in some obscure minor bugfix release. This is why almost all operating system kernels have a limited list of compilers they are supposed to be compiled with.

The kernels I have experience with follow pretty much the same pattern. Memory-mapped registers are abstracted away behind handles with function pointers for the various register accesses. This is primarily so that you can use the same API to talk to different buses on different architectures, but a secondary benefit is that functions hidden behind function pointers are good at convincing the compiler not to inline and reorder things (there is rarely an actual guarantee of that, but see paragraph 1). The functions themselves range from a trivial pointer dereference on some architectures to raw assembly on architectures where the compilers have proven hard to convince not to be creative, or where specific memory barriers are necessary.

Speaking of that last bit, you need to take the memory model into consideration. Just because the compiler isn't creative with reordering your code doesn't mean that the CPU isn't free to do whatever it wants. And this is definitely outside of the C standard.

Art
  • The CPU is often a minor problem. Modern "bus" interfaces like PCIe also buffer, reorder, etc. The commonly used DMA on modern systems adds additional complications. – too honest for this site Jun 21 '17 at 11:29
  • @Olaf CPUs are rarely a problem, true. But when they are things go spectacularly bad. AMD had a fun one. Some memory mapped registers can be safely accessed cached. One system created register mappings that were both cached and uncached. The code had a branch that picked which mapping a write should go to. Speculative execution executed both branches which fetched and dirtied a cache line while the actual write went directly to the device. Next cache eviction wrote out the unmodified cache line to the device. – Art Jun 21 '17 at 11:51
  • That sounds more like a misconfiguration of the MMU and related, i.e. an OS problem. Point is: this was no issue on bare-metal embedded systems until a few years ago. But it becomes more and more relevant with larger ARM systems like Cortex-M7 (the M3/4 are still relatively simple by default). – too honest for this site Jun 21 '17 at 11:55
  • There is no reason to go beyond the C standard at all, no idea where you got that idea from. The only issue is if register access in C translates "bit set" versus "read-modify-write", which is a separate issue not related to `volatile`. Overall, there's not necessarily some complex desktop OS which will hide everything behind 42 levels of abstraction. None of your concerns are particularly relevant on for example a bare metal embedded system. Where register access almost always can be written in 100% standard compliant C. – Lundin Jun 21 '17 at 12:06
  • @Lundin I agree for most embedded systems but the processor has to have [sequentially-consistent behavior](https://homes.cs.washington.edu/~bornholt/post/memory-models.html) so if it can reorder instructions using relaxed memory model then yeah, I have to worry. – Jason S Jun 21 '17 at 19:01