Pointers are one of the tools in C that let us do almost anything
we want. There are many hacks we can pull off with pointers. I thought
of writing about one such hack and about the ways the same hack
could backfire. Let's get started.
Assume the following requirement: your project has more
than one node, and the nodes are interconnected and communicate with one another.
Whatever the communication line is (LAN or serial), the data
travels over it as bytes. Each node holds data of
various datatypes and wants to share that data with the others. Because the
communication link carries bytes, each node has to pack and
unpack its data.
Suppose one node wants to share an array of uint32_t
data; then it has to convert the data as shown
below,
uint32_t u32DataBuffer[10];
uint8_t u8LanTxBuffer[100];
u8LanTxBuffer[ 0 ] = u32DataBuffer[ 0 ] & 0xFF;
u8LanTxBuffer[ 1 ] = ( u32DataBuffer[ 0 ] >> 8 ) & 0xFF;
u8LanTxBuffer[ 2 ] = ( u32DataBuffer[ 0 ] >> 16 ) & 0xFF;
u8LanTxBuffer[ 3 ] = ( u32DataBuffer[ 0 ] >> 24 ) & 0xFF;
And each node that receives the data has to handle the conversion as shown below,
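uint32_t u32DataBuffer[10];
uint8_t u8LanRxBuffer[100];
u32DataBuffer[ 0 ] = u8LanRxBuffer[ 0 ];
u32DataBuffer[ 0 ] |= ( uint32_t ) u8LanRxBuffer[ 1 ] << 8;
u32DataBuffer[ 0 ] |= ( uint32_t ) u8LanRxBuffer[ 2 ] << 16;
/* Cast before shifting: a uint8_t promotes to int, and shifting it by 24 can overflow. */
u32DataBuffer[ 0 ] |= ( uint32_t ) u8LanRxBuffer[ 3 ] << 24;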
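To make the byte layout concrete, here is a minimal sketch of the same shift-based packing and unpacking wrapped into two functions. The names pack_u32_le and unpack_u32_le are hypothetical and not from any library; the sketch assumes the wire format is little-endian (least significant byte first), as in the statements above.

```c
#include <stdint.h>

/* Hypothetical helper: write one uint32_t into 4 bytes,
   least significant byte first. */
static void pack_u32_le(uint8_t *dst, uint32_t value)
{
    dst[0] = (uint8_t)(value & 0xFFu);
    dst[1] = (uint8_t)((value >> 8) & 0xFFu);
    dst[2] = (uint8_t)((value >> 16) & 0xFFu);
    dst[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Hypothetical helper: rebuild the uint32_t from 4 little-endian bytes.
   Each byte is widened to uint32_t before it is shifted. */
static uint32_t unpack_u32_le(const uint8_t *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

Packing 0x11223344 this way puts the bytes 0x44, 0x33, 0x22, 0x11 on the wire in that order, and unpacking them gives back 0x11223344 regardless of the byte order of either node's processor.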
Let's suppose you want to transfer 50 elements of
u32DataBuffer; then you will face two issues. One is code readability:
copying one element takes four assignment statements, so the packing
alone could take 200 lines. The second issue is performance.
The readability issue can be mitigated by using a for
loop to iterate through the array, or memcpy can be used. But assume the worst case:
the data you want to transmit in a single LAN packet is an assortment
of different datatypes. Look at the sequence below,
LanDataTransmitBuffer <- uint32_t Data1
LanDataTransmitBuffer <- uint16_t Data2
LanDataTransmitBuffer <- uint8_t Data3
LanDataTransmitBuffer <- uint32_t Data4
In the above case a for loop can't be used to iterate through the data,
and you need four statements for a single copy of uint32_t data, which
really messes up the code readability. This issue can be fixed by using a macro as
shown below,
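/* destination is the first byte to write, for example LanDataTransmitBuffer[ 0 ] */
#define UINT32_TO_UINT8_IN_LE( destination, source ) \
do \
{ \
( &( destination ) )[ 0 ] = ( uint8_t ) ( ( source ) & 0xFF ); \
( &( destination ) )[ 1 ] = ( uint8_t ) ( ( ( source ) >> 8 ) & 0xFF ); \
( &( destination ) )[ 2 ] = ( uint8_t ) ( ( ( source ) >> 16 ) & 0xFF ); \
( &( destination ) )[ 3 ] = ( uint8_t ) ( ( ( source ) >> 24 ) & 0xFF ); \
} while( 0 )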
The macro can be used as shown below,
UINT32_TO_UINT8_IN_LE( LanDataTransmitBuffer[ 0 ], Data1 );
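For the mixed sequence listed earlier (uint32_t, uint16_t, uint8_t, uint32_t), another way to keep the packing readable is a set of small functions that carry a running offset. This is only a sketch: put_u8, put_u16_le, put_u32_le and pack_mixed are hypothetical names, and the caller is assumed to supply a buffer of at least 11 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: each writes one value in little-endian order at
   buf[offset] and returns the offset of the next free byte. */
static size_t put_u8(uint8_t *buf, size_t offset, uint8_t value)
{
    buf[offset] = value;
    return offset + 1;
}

static size_t put_u16_le(uint8_t *buf, size_t offset, uint16_t value)
{
    buf[offset]     = (uint8_t)(value & 0xFFu);
    buf[offset + 1] = (uint8_t)((value >> 8) & 0xFFu);
    return offset + 2;
}

static size_t put_u32_le(uint8_t *buf, size_t offset, uint32_t value)
{
    buf[offset]     = (uint8_t)(value & 0xFFu);
    buf[offset + 1] = (uint8_t)((value >> 8) & 0xFFu);
    buf[offset + 2] = (uint8_t)((value >> 16) & 0xFFu);
    buf[offset + 3] = (uint8_t)((value >> 24) & 0xFFu);
    return offset + 4;
}

/* Packs Data1 (uint32_t), Data2 (uint16_t), Data3 (uint8_t) and
   Data4 (uint32_t) back to back; returns the number of bytes written. */
static size_t pack_mixed(uint8_t *buf, uint32_t data1, uint16_t data2,
                         uint8_t data3, uint32_t data4)
{
    size_t offset = 0;
    offset = put_u32_le(buf, offset, data1);
    offset = put_u16_le(buf, offset, data2);
    offset = put_u8(buf, offset, data3);
    offset = put_u32_le(buf, offset, data4);
    return offset;
}
```

The whole packet then takes one line per field, and the returned length (11 bytes here) can be handed straight to the transmit routine.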
The code readability issue is fixed, but what about
performance? Packing and unpacking in each node surely consumes a
considerable share of the total transmission time, as each copy needs
four load/store instructions in addition to the shifts and other
instructions. What can be done about this?
This is where one of the pointer hacks can be used to
improve performance. Instead of the above method, which needs at least
four statements, the assignment can be done in a single statement
using pointers. The single statement below does the same copy of uint32_t data
into the byte buffer array as the method above,
*(( uint32_t * ) ( &destination[ 0 ] ) ) = source[ 0 ];
Fair and simple, right? But you can't type this much
for each conversion, and having it at every conversion would make the code
a bit unreadable and prone to mistakes. This can be fixed with a
simple macro definition as shown below,
#define UINT32_TO_UINT8_IN_LE( destination, source ) \
( *(( uint32_t * ) ( & ( destination ) ) ) = ( source ) )
The macro can be used as shown below,
UINT32_TO_UINT8_IN_LE( LanDataTransmitBuffer[ 0 ], Data1 );
Hurrah, we've achieved what we want in a single statement
instead of four (a similar macro can be implemented for unpacking in the receiving node). Performance-wise this is a considerable
improvement, and code readability is okay too.
Is there anything wrong with this method? Can it be
used blindly on any system without any other consideration?
As usual when pointers are involved, there's a
loophole here too, and it could create havoc if you don't take the necessary
precautions.
Any guess what it is? Yes, the issue is unaligned memory
access. On higher-end processors that support unaligned memory access, this
pointer dereference method can be used without any fuss. But most
embedded systems use low- or mid-range processors that may or may not
support unaligned memory access, so this is definitely a worry.
One way of tackling this problem is to take care of the
alignment of the uint8_t data buffer when creating it; this can be done with a pragma.
By placing the first byte of the uint8_t data buffer at an address that is a
multiple of four, as our controllers need, we can sort out the issue. But
the problem with this method is that it fails when a packet needs a mixture of
datatypes as mentioned previously, because we may
copy a uint32_t into a buffer element located at an address that is not a
multiple of four. So on processors that don't support unaligned memory
access, this method can't be used.
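To see why aligning the start of the buffer is not enough, consider the earlier mixed sequence (uint32_t, uint16_t, uint8_t, uint32_t) packed back to back. The sketch below computes where the second uint32_t lands and checks that address; is_aligned_u32 and DATA4_OFFSET are hypothetical names, and the check assumes, as in the text, that a uint32_t access needs an address that is a multiple of four.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when ptr is a multiple of 4 (the uint32_t size), 0 otherwise.
   Assumes that converting a pointer to uintptr_t yields its numeric
   address, which holds on common platforms. */
static int is_aligned_u32(const void *ptr)
{
    return ((uintptr_t)ptr % sizeof(uint32_t)) == 0;
}

/* Offset of Data4 when Data1 (uint32_t), Data2 (uint16_t) and
   Data3 (uint8_t) are packed in front of it without padding: 4 + 2 + 1. */
enum { DATA4_OFFSET = sizeof(uint32_t) + sizeof(uint16_t) + sizeof(uint8_t) };

/* Returns 1 if a uint32_t store for Data4 would be aligned, given a
   buffer whose first byte is itself 4-byte aligned. */
static int data4_store_is_aligned(const uint8_t *aligned_buffer)
{
    return is_aligned_u32(aligned_buffer + DATA4_OFFSET);
}
```

With a buffer that starts on a 4-byte boundary, Data1 at offset 0 is fine, but Data4 sits at offset 7, so data4_store_is_aligned reports 0 and the pointer store for Data4 would be an unaligned access.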
Even on most processors that support unaligned memory
access, there are certain things you need to ensure before using this method.
The first is that the processor's MMU is properly configured to support
unaligned memory access.
For example, on ARMv7 architectures you need to disable
the unaligned memory access trap (the alignment check bit in the CP15 system
control register) before using the above method. Likewise, you need to look for
the appropriate configuration on other architectures as well.
You also need to ensure one more thing: the uint8_t data buffer
must be defined in a memory region that is not configured as strongly ordered
in the cache lookup table. If you've configured that memory region as
strongly ordered, you'll again end up in a trap.
So there are important things to consider even on
processors that support unaligned memory access. With this many
configurations, using this method will definitely be painful in software that
may need to be ported to different platforms and architectures in the future.
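Another basic thing you need to consider with this method is
endianness: the pointer store writes the bytes in the host's byte order, so it
matches the little-endian layout of the shift-based macro only on a
little-endian processor.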
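One way to handle the endianness point is to convert the value to little-endian byte order before it is stored. The following is a sketch under the assumption that the wire format is little-endian; host_is_little_endian, swap_u32 and host_to_le32 are hypothetical names, not functions from a standard library.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. The first byte of the
   value 1 in memory is 1 only when the least significant byte comes first. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    uint8_t first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1u;
}

/* Reverses the byte order of a 32-bit value. */
static uint32_t swap_u32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Returns a value whose in-memory bytes are in little-endian order,
   whatever the byte order of the host. */
static uint32_t host_to_le32(uint32_t v)
{
    return host_is_little_endian() ? v : swap_u32(v);
}
```

Passing host_to_le32( Data1 ) as the source of the single-statement copy would then put the same bytes on the wire on both kinds of processor, at the cost of a swap on big-endian hosts.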
With so many things to consider, do you think this
can be used in production software, especially in safety-critical systems? Well,
I've seen this method used in the production code of a Class III medical device
(yes, a software failure can result in the death of a person). That device uses
an Intel Atom processor and the VxWorks platform. As these higher-end processor
platforms support unaligned memory access, we used this pointer dereference
method in the project. If you know what you're doing, pointers let you
make your software run at its utmost efficiency, but if you miss even a
tiny bit you'll have to face the wrath.
Okay, what's your take? Will you go for this pointer
dereference method, the traditional method, or memcpy? If you have any other alternative,
please let me know in the comments or by mail.
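For comparison, the memcpy option could look like the sketch below. The names store_u32_host and load_u32_host are hypothetical. Like the pointer dereference, memcpy copies the bytes in the host's byte order, but it places no alignment requirement on the buffer address; optimizing compilers commonly reduce a fixed 4-byte memcpy to a single load or store on targets that allow unaligned access, though that depends on the compiler and its settings.

```c
#include <stdint.h>
#include <string.h>

/* Copies the 4 bytes of value into dst in host byte order.
   dst may point to any address, aligned or not. */
static void store_u32_host(uint8_t *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* Reads 4 bytes from src, which may be unaligned, back into a uint32_t. */
static uint32_t load_u32_host(const uint8_t *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

A call such as store_u32_host( &LanDataTransmitBuffer[ 7 ], Data4 ) is then valid even though offset 7 is not a multiple of four.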