AWG Architecture and Execution Timing
The HDAWG Arbitrary Waveform Generator functionality is realized using field-programmable gate array (FPGA) technology. In order to provide sufficient digital signal processing resources to supply 4 or 8 high-speed outputs, the instrument architecture contains two types of FPGAs: 1 back end FPGA handling the central tasks of signal distribution and synchronization, and 2 (or 4) front end FPGAs, each feeding one pair of front panel Wave, Mark, and Trig connectors. This is sketched in Figure 1 for the 4-channel model HDAWG4, and an analogous diagram is valid for the HDAWG8.
On each front end FPGA, there is one so-called AWG Core, which is the unit sending waveforms from the memory to one pair of Wave and Mark outputs. Additionally, each front end FPGA holds 2 sine generators for digital modulation of this pair of outputs. This aspect of the HDAWG architecture is most relevant in understanding the channel grouping feature as well as triggering.
- Independently of the channel grouping mode of the HDAWG (1x8, 2x4, 4x2), sequence programs are always executed on the AWG cores. This means that, e.g. in 1x8 mode, a high-level sequence program written in the AWG Sequencer Tab is compiled into 4 low-level sequence programs that are executed in parallel on the 4 AWG cores. The back-end FPGA synchronizes the execution timing.
- Sine generator signals are local within one front end FPGA, which is why combinations between AWG channels and sine generators for digital modulation are only possible within one output pair.
- Oscillator signals are global with the HDAWG-MF option; this is realized by having multiple synchronized copies of the oscillators on the different AWG cores.
- The 4 Marker signals from one AWG core (2 per AWG channel) can be routed to the two Mark outputs within one output pair, but not to other Mark outputs.
The digital signal processing paths between the AWG Cores and the instrument periphery (Wave, Mark, Trig, DIO, and ZSync connectors) are associated with different propagation delays. This has the following consequences:
- The relative timing of sequencer instruction execution on the AWG Cores (such as playWave) is not necessarily identical to the timing of their effect at the instrument periphery (changing a DIO connector voltage level, reading a DIO voltage level, changing a Trig voltage level, output of the first sample of a waveform signal).
- Trigger input signals from the front panel arrive first at one of the front-end FPGAs, from where they are distributed to the back-end FPGA and to the other front-end FPGAs. The internal trigger distribution is associated with a delay; therefore, the lowest trigger-to-output latency is achieved using local triggering within one input/output connector pair in 4x2 mode.
One practical example where the propagation delay matters is the following sequence program, which generates a short rectangular pulse on Wave output 1, as well as a rising and falling edge on Mark output 1, when those outputs are configured accordingly.
playWave(ones(64));
setTrigger(1);
waitWave();
setTrigger(0);
In this sequence program, the sequencer on the AWG Core issues the setTrigger(1) instruction after it starts the waveform playback, and it issues the setTrigger(0) instruction after the end of the waveform playback. However, in the output signals of the Wave and Mark connectors as measured on a scope, the rising (falling) edge of the Mark output signal occurs earlier than the rising (falling) edge of the Wave output signal. The reason is that the processing delay between the sequencer and the Mark output, which is relevant for the setTrigger instructions, is roughly 15 ns shorter than the processing delay between the sequencer and the Wave output via the waveform player.
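To get a feeling for the size of this offset, the following Python sketch expresses a delay in samples and in Sequencer clock cycles. It assumes a nominal sampling rate of 2.4 GSa/s, i.e. 8 samples per cycle of a nominally 300 MHz Sequencer clock; the constant and function names are illustrative only and are not part of any LabOne API.

```python
# Assumed nominal values: 2.4 GSa/s sampling rate, 8 samples per Sequencer cycle.
SAMPLE_RATE_HZ = 2.4e9
SAMPLES_PER_CYCLE = 8


def delay_in_samples(delay_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the delay rounded to the nearest whole sample."""
    return round(delay_s * sample_rate_hz)


def delay_in_cycles(delay_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the delay in (possibly fractional) Sequencer clock cycles."""
    return delay_in_samples(delay_s, sample_rate_hz) / SAMPLES_PER_CYCLE


if __name__ == "__main__":
    mark_lead_s = 15e-9  # approximate Mark-before-Wave lead quoted in the text
    print(delay_in_samples(mark_lead_s), "samples")
    print(delay_in_cycles(mark_lead_s), "Sequencer cycles")
```

Under these assumptions the offset is not a whole number of Sequencer cycles, so it can only be compensated approximately by inserting instructions in the sequence program.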
AWG Core Architecture
The AWG core architecture is sketched in Figure 2. The main element of the core is the Sequencer, a real-time processor running at a clock speed of nominally 300 MHz, or 1/8 of the sampling rate. Each high-level sequence instruction represented in the AWG Sequence Editor is compiled into one or more low-level instructions represented in the AWG Sequencer Advanced sub-tab. The low-level instructions are executed with deterministic timing, one per Sequencer clock cycle. Each instruction is executed immediately after the previous one, with the exception of waveform playback instructions such as playWave and playZero, which are executed after the previous waveform playback is finished. The last point means that sequential waveforms are played immediately after one another, back to back, as long as their length meets the granularity specification, i.e. is equal to 32 samples plus a multiple of 16 samples. The table below shows examples of high-level instructions and the corresponding low-level instructions.
|High-level instruction||Low-level (compiled) instructions|
|playWave(ones(128)); // (used in a short program)||wvfe R1, 256|
|playWave(ones(128)); // (used in a long program)||addi R1, R0, 256; addiu R1, R1, 524288; wvfe R1, 256|
addi R1, R0, 1
st R1, 34
ld R1, 0
addi R2, R0, 1165
st R2, 105
st R1, 34
High-level and compiled instructions
As the examples show, a single line in the LabOne Sequencer language may translate into different numbers of low-level instructions, depending on how high-level instructions are nested in that line. The example of playWave(ones(128)) also shows that an identical high-level instruction may compile into different low-level instructions depending on other parameters. In this case, the total number of waveforms has an influence on the waveform memory address width on the hardware, and either 1 or 3 instructions are required to start the waveform playback. In practice, commenting out an individual instruction and recompiling the sequence program allows one to infer the number of corresponding low-level instructions. This is suitable for predicting the relative timing in a series of instructions, e.g. a series of setDIO instructions. A more transparent approach is offered by the command table feature. The command table allows one to execute groups of related low-level instructions in a single clock cycle, independently of the length of a sequence program.
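Once the number of low-level instructions is known, their execution time follows from the rule of one instruction per Sequencer clock cycle. The Python sketch below does this arithmetic under the assumption of the nominal 300 MHz clock; the function name is hypothetical, and the estimate applies only to instructions that execute immediately, not to playback instructions that wait for the previous waveform to finish.

```python
SEQUENCER_CLOCK_HZ = 300e6  # nominal Sequencer clock quoted in the text


def instruction_time_ns(n_instructions, clock_hz=SEQUENCER_CLOCK_HZ):
    """Time in nanoseconds for n low-level instructions executed back to back."""
    if n_instructions < 0:
        raise ValueError("n_instructions must be non-negative")
    return n_instructions * 1e9 / clock_hz


if __name__ == "__main__":
    # playWave(ones(128)) compiles to 1 or 3 low-level instructions (see table).
    for count in (1, 3):
        print(count, "instruction(s):", instruction_time_ns(count), "ns")
```

For the playWave example this means the longer compiled form takes two additional clock cycles before playback starts.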
The knowledge of the exact number of low-level instructions is typically not needed in sequence programs that make use of the classical AWG instruction set only, i.e. waveform playback and external triggering.
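For such playback-only programs, what matters more is the granularity rule stated earlier: waveforms play back to back only if their length equals 32 samples plus a multiple of 16 samples. The following Python sketch checks a length against that rule and rounds a length up to the next valid value; both helper names are illustrative and do not correspond to LabOne functions.

```python
MIN_LENGTH = 32   # minimum waveform length in samples, from the text
GRANULARITY = 16  # allowed length increment in samples, from the text


def meets_granularity(length):
    """True if length is 32 samples plus a non-negative multiple of 16."""
    return length >= MIN_LENGTH and (length - MIN_LENGTH) % GRANULARITY == 0


def pad_to_granularity(length):
    """Smallest valid length that is greater than or equal to length."""
    if length <= MIN_LENGTH:
        return MIN_LENGTH
    extra = length - MIN_LENGTH
    steps = -(-extra // GRANULARITY)  # ceiling division on integers
    return MIN_LENGTH + steps * GRANULARITY


if __name__ == "__main__":
    for n in (10, 40, 64, 128):
        print(n, meets_granularity(n), pad_to_granularity(n))
```

A waveform that fails the check could, for example, be zero-padded to the length returned by the second helper before upload.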
For a deeper understanding of the execution timing, it’s necessary to look at the interplay between the Sequencer and the other elements of the AWG core. The Sequencer distributes most of the instructions of the high-level AWG Sequencer Tab into separate queues:
- The Playback Queue holds waveform playback instructions
- The Prefetch Queue holds waveform prefetch instructions that load waveform data from the high-latency main memory into the low-latency, real-time cache memory
- The Wait&Set Queue holds instructions to wait for a trigger, or to set parameters
In this way, the Sequencer is able to "move ahead in time" and distribute multiple waveform prefetch instructions in the Prefetch Queue, allowing enough time to load the corresponding waveform data into the cache memory before they are required by the Playback Queue. The timed execution of instructions across separate queues is managed by the Timing Unit.
There is, however, one class of instructions that cannot be distributed into queues ahead of time: these are instructions of the "Get" type, such as getDIO, that return a value to the Sequencer. Since the sequencer language allows subsequent instructions to be influenced by the returned value (e.g. by using the external input for a conditional branch), the Sequencer must run on the assumption that all previous queues have to be empty before executing the Get instruction, and queues can only be filled again when the Get instruction is completed. This instruction classification and the associated timing rules are summarized in the following table.
|Instruction class||Examples||Executed by...||Executed...|
...when triggered by Timing Unit
...as early as possible
Wait & Set queue
...when triggered by Timing Unit
...when Wait&Set queue is empty
The cache memory holds space for 256 kSa of dual-channel waveform data. This memory is divided into 256 blocks of 1024 Sa (dual-channel) length. At any start of a waveform playback, the necessary waveform data needs to be present in the cache memory. For long waveforms that exceed the size of 2 cache memory blocks (2048 Sa dual-channel), only the waveform beginning of that size (2048 Sa dual-channel) needs to be present. The remainder of the waveform data is processed through a FIFO (first-in-first-out) buffer and does not occupy cache memory. This division between cache memory and FIFO buffer allows the system to bridge the access time to the main memory.
As a consequence, there is a limit on the total number of long waveforms in a sequence, below which the user has full freedom in specifying a sequence program. Above that limit, the sequence program needs to fulfill certain conditions that are detailed below and that are verified by the AWG Compiler.
The limit on the number of long waveforms depends on the amount of waveform memory (say, M_short) occupied by all short waveforms in a sequence. The limit is then given by (M_cache - M_short)/(2 M_block), where M_cache = 256 kSa is the size of the cache memory, and M_block = 1024 Sa is the size of one cache memory block. In case no short waveforms are used, the limit on the number of long waveforms is highest, namely M_cache/(2 M_block) = 128.
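The limit can be evaluated with a few lines of Python. This is a minimal sketch using the cache size of 256 kSa and block size of 1024 Sa given above; the function name is hypothetical, rounding down to a whole number of waveforms is an assumption, and how the memory occupied by short waveforms is determined is not covered here, so that value is simply passed in.

```python
M_BLOCK = 1024           # size of one cache memory block in samples
M_CACHE = 256 * M_BLOCK  # cache memory size: 256 blocks = 256 kSa


def max_long_waveforms(m_short=0):
    """Limit on the number of long waveforms for a given short-waveform memory.

    Each long waveform keeps its first two blocks in the cache. The result is
    rounded down to a whole number of waveforms (an assumption of this sketch).
    """
    if not 0 <= m_short <= M_CACHE:
        raise ValueError("m_short must lie between 0 and the cache size")
    return (M_CACHE - m_short) // (2 * M_BLOCK)


if __name__ == "__main__":
    print(max_long_waveforms())           # no short waveforms in the sequence
    print(max_long_waveforms(64 * 1024))  # 64 kSa taken by short waveforms
```

Comparing the returned value with the number of long waveforms in a planned sequence indicates whether the idle-time conditions described next become relevant.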
The limit can be lifted for sequences whose structure allows for idle times to controllably replace waveform beginnings in the cache during runtime. These idle times can arise due to either of the following sequence elements:
- playZero(n) instructions of sufficient duration
- Playback of very large waveforms with a length beyond 16 cache memory blocks (16 × 1024 Sa = 16,384 Sa)
In the first case, the playZero instruction creates a well-defined segment of zero-valued signal without the use of waveform memory. In the second case, the FIFO buffer is able to advance sufficiently in data accumulation so as to create an idle time during runtime. Both of these sequence elements are identified by the AWG Compiler, which then places appropriate waveform prefetch instructions in the compiled sequence. If the AWG Compiler finds that the waveform limit is exceeded, but there are not sufficient idle times to fetch data, it will generate an error, because the AWG could not guarantee gapless and jitter-free playback.
Consider a typical example of a triggered series of waveform playbacks:
waitDigTrigger(1);
playWave(long_waveform_001);
waitDigTrigger(1);
playWave(long_waveform_002);
(...)
waitDigTrigger(1);
playWave(long_waveform_200);
The total number of long waveforms, 200, exceeds the limit of 128. Unless some of these waveforms are longer than 30 kSa, the compiler will not detect any opportunity to fetch waveform data in the middle of the sequence, and will thus reject the sequence at compilation.
In practice, however, there is often enough time between the end of a waveform and the following trigger event. This idle time can be made available for waveform data fetching by inserting playZero instructions at places where no signal is to be generated. The sequence can be modified e.g. to this:
const M_idle = 8000;
waitDigTrigger(1);
playWave(long_waveform_001);
playZero(M_idle);
waitDigTrigger(1);
playWave(long_waveform_002);
playZero(M_idle);
(...)
waitDigTrigger(1);
playWave(long_waveform_200);
playZero(M_idle);
The AWG Compiler is thus made aware of the idle times and can use them to fetch waveform data. The sequence will be accepted for compilation, and is played back as intended.
The use of the playZero(n) instruction is strongly encouraged as a replacement for the playback of a zero-valued waveform, e.g. playWave(zeros(n)). Both instructions result in the same output, but playZero consumes no waveform memory. In addition to avoiding the limit on the number of waveforms, it allows you to greatly reduce the waveform upload time to the instrument.
The instruction playZero(n) can replace the instruction wait(m) with m=n/8 in most cases. It has the advantage of behaving equivalently to a playWave instruction. The wait instruction is required to implement a waiting time depending on a run-time variable, or to
generate a constant output voltage in combination with the