# AWG Architecture and Execution Timing

## Global Architecture

The HDAWG Arbitrary Waveform Generator functionality is realized using field-programmable gate array (FPGA) technology. To provide sufficient digital signal processing resources for 4 or 8 high-speed outputs, the instrument architecture contains two types of FPGAs: one back-end FPGA handling the central tasks of signal distribution and synchronization, and 2 (or 4) front-end FPGAs, each feeding one pair of front panel Wave, Mark, and Trig connectors. This is sketched in Figure 1 for the 4-channel model HDAWG4; an analogous diagram is valid for the HDAWG8.

Figure 1. HDAWG4 device architecture

On each front-end FPGA, there is one so-called AWG Core, which is the unit sending waveforms from the memory to one pair of Wave and Mark outputs. Additionally, each front-end FPGA holds 2 sine generators for digital modulation of this pair of outputs. This aspect of the HDAWG architecture is most relevant for understanding the channel grouping feature as well as triggering.

• Independently of the channel grouping mode of the HDAWG (1x8, 2x4, 4x2), sequence programs are always executed on the AWG cores. For example, in 1x8 mode, a high-level sequence program written in the AWG Sequencer Tab is compiled into 4 low-level sequence programs that are executed in parallel on the 4 AWG cores. The back-end FPGA synchronizes the execution timing.

• Sine generator signals are local within one front-end FPGA, which is why combinations between AWG channels and sine generators for digital modulation are only possible within one output pair.

• Oscillator signals are global with the HDAWG-MF option. This is realized by having multiple synchronized copies of the oscillators on the different AWG cores.

• The 4 Marker signals from one AWG core (2 per AWG channel) can be routed to the two Mark outputs within one output pair, but not to other Mark outputs.

The digital signal processing paths between the AWG Cores and the instrument periphery (Wave, Mark, Trig, DIO, and ZSync connectors) are associated with different propagation delays. This has the following consequences:

• The relative timing of sequencer instruction execution on the AWG Cores (such as setDIO, getDIO, setTrigger, playWave) is not necessarily identical to the timing of their effect at the instrument periphery (changing a DIO connector voltage level, reading a DIO voltage level, changing a Trig voltage level, output of the first sample of a waveform signal).

• Trigger input signals from the front panel arrive first at one of the front-end FPGAs, from where they are distributed to the back-end FPGA and to the other front-end FPGAs. The internal trigger distribution is associated with a delay; therefore, the lowest trigger-to-output latency is achieved using local triggering within one input/output connector pair in 4x2 mode.

One practical example where the propagation delay matters is the following sequence program, which generates a short rectangular pulse on Wave output 1, as well as a rising and falling edge on Mark output 1, when those outputs are configured accordingly.

playWave(ones(64));
setTrigger(1);
waitWave();
setTrigger(0);

In this sequence program, the sequencer on the AWG Core issues the instruction setTrigger(1) after it starts the waveform playback, and it issues the instruction setTrigger(0) after the end of the waveform playback. However, in the output signals of the Wave and Mark connectors as measured on a scope, the rising (falling) edge of the Mark output signal is earlier than the rising (falling) edge of the Wave output signal. The reason is that the processing delay between the sequencer and the Mark output relevant for the setTrigger instructions is roughly 15 ns shorter than the processing delay between the sequencer and the Wave output via the waveform player.
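To get a feeling for the size of this offset, the approximate 15 ns difference can be converted into samples and Sequencer clock cycles. The following Python sketch assumes the nominal figures quoted in this chapter (a 300 MHz Sequencer clock equal to 1/8 of the sampling rate, hence 2.4 GSa/s). The variable names are illustrative, and the actual numbers depend on the configured sampling rate.

```python
# Nominal figures from this chapter; real values depend on the configured sampling rate.
SEQUENCER_CLOCK_HZ = 300e6                                  # nominal Sequencer clock
SAMPLES_PER_CYCLE = 8                                       # Sequencer clock is 1/8 of the sampling rate
SAMPLE_RATE_HZ = SEQUENCER_CLOCK_HZ * SAMPLES_PER_CYCLE     # 2.4 GSa/s

MARK_LEAD_S = 15e-9   # approximate: the Mark path is roughly 15 ns shorter than the Wave path

lead_samples = MARK_LEAD_S * SAMPLE_RATE_HZ       # offset expressed in samples
lead_cycles = MARK_LEAD_S * SEQUENCER_CLOCK_HZ    # offset expressed in Sequencer cycles
pulse_duration_s = 64 / SAMPLE_RATE_HZ            # duration of playWave(ones(64))

print(f"Mark leads Wave by about {lead_samples:.0f} samples "
      f"({lead_cycles:.1f} Sequencer cycles)")
print(f"The 64-sample pulse lasts about {pulse_duration_s * 1e9:.1f} ns")
```

Under these assumptions the offset is more than half the length of the 64-sample pulse, which is why it is clearly visible on a scope.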

The Signal Routing and Modulation block enables different methods of digital modulation and is described in more detail in Output Tab and Multi Frequency Modulation Tab.

## AWG Core Architecture

The AWG core architecture is sketched in Figure 2. The main element of the core is the Sequencer, a real-time processor running at a clock speed of nominally 300 MHz, or 1/8 of the sampling rate. Each high-level sequence instruction represented in the AWG Sequence Editor is compiled into one or more low-level instructions represented in the AWG Sequencer Advanced sub-tab. The low-level instructions are executed with deterministic timing, one per Sequencer clock cycle. Each instruction is executed immediately after the previous one, with the exception of the playWave and playZero instructions, which are executed after the previous waveform playback is finished. As a result, consecutive waveforms are played back to back without gaps, as long as their length meets the granularity specification, i.e. is equal to 32 samples plus a multiple of 16 samples. The table below shows examples of high-level and corresponding low-level instructions.
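The granularity rule stated above can be checked before a waveform is uploaded. The following Python sketch is only an illustration: the helper names `meets_granularity` and `next_valid_length` are hypothetical and not part of any LabOne API, and the rule is taken literally from the text (32 samples plus a non-negative multiple of 16 samples).

```python
def meets_granularity(length_sa: int) -> bool:
    """True if the length is 32 samples plus a non-negative multiple of 16 samples."""
    return length_sa >= 32 and (length_sa - 32) % 16 == 0


def next_valid_length(length_sa: int) -> int:
    """Smallest length >= length_sa that satisfies the granularity rule."""
    if length_sa <= 32:
        return 32
    extra_blocks = -(-(length_sa - 32) // 16)   # ceiling division
    return 32 + 16 * extra_blocks


for n in (64, 100, 128):
    print(n, meets_granularity(n), next_valid_length(n))
```

A waveform whose length fails the check would have to be extended to the length returned by `next_valid_length`, for example by appending samples, if back-to-back playback is required.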

Table 1. High-level and compiled instructions

| High-level instruction | Low-level (compiled) instructions |
|---|---|
| `playWave(ones(128)); // (used in a short program)` | `wvfe R1, 256` |
| `playWave(ones(128)); // (used in a long program)` | `wvfe R1, 256` |
| `setTrigger(1);` | `st R1, 34` |
| `setTrigger(getUserReg(0));` | `ld R1, 0`<br>`st R2, 105`<br>`st R1, 34` |

As the examples show, a single line in the LabOne Sequencer language may translate into a different number of low-level instructions, depending on how high-level instructions are nested in that line. The example of the instruction playWave(ones(128)) also shows that an identical high-level instruction may compile into different low-level instructions depending on other parameters. In this case, the total number of waveforms influences the waveform memory address width on the hardware, and either 1 or 3 instructions are required to start the waveform playback.

In practice, commenting out an individual instruction and recompiling the sequence program allows one to infer the number of corresponding low-level instructions. This is suitable for predicting the relative timing in a series of instructions, e.g. a series of setTrigger, wait, setDIO. A more transparent approach is offered by the command table feature. The command table allows one to execute groups of related low-level instructions in a single clock cycle, independently of the length of a sequence program.
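Once the number of low-level instructions per high-level instruction is known, the relative issue times follow from the rule of one low-level instruction per Sequencer clock cycle. The Python sketch below is a simplified model under that rule only: it assumes the nominal 300 MHz clock, ignores that playWave and playZero wait for the previous playback to finish, and ignores the propagation delays to the instrument periphery. The function name is hypothetical.

```python
CYCLE_NS = 1e9 / 300e6   # nominal Sequencer clock period, about 3.33 ns


def issue_offsets_ns(instruction_counts):
    """Start time in ns of each high-level instruction, relative to the first one.

    instruction_counts: number of low-level instructions that each high-level
    instruction compiles into, in program order.
    """
    offsets = []
    elapsed_cycles = 0
    for count in instruction_counts:
        offsets.append(elapsed_cycles * CYCLE_NS)
        elapsed_cycles += count
    return offsets


# Counts taken from Table 1:
#   setTrigger(getUserReg(0));  -> 3 low-level instructions
#   setTrigger(1);              -> 1 low-level instruction
offsets = issue_offsets_ns([3, 1])
print([round(t, 2) for t in offsets])
```

In this model, the second instruction starts three clock cycles after the first, i.e. nominally 10 ns later.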

The knowledge of the exact number of low-level instructions is typically not needed in sequence programs that make use of the classical AWG instruction set only, i.e. waveform playback (playWave, playZero) and external triggering (waitDigTrigger).

Figure 2. AWG core architecture

For a deeper understanding of the execution timing, it’s necessary to look at the interplay between the Sequencer and the other elements of the AWG core. The Sequencer distributes most of the instructions of the high-level AWG Sequencer Tab into separate queues:

• The Playback Queue holds waveform playback instructions

• The Prefetch Queue holds waveform prefetch instructions that load waveform data from the high-latency main memory into the low-latency, real-time cache memory

• The Wait&Set Queue holds instructions to wait for a trigger, or to set parameters

In this way, the Sequencer is able to "move ahead in time" and distribute multiple waveform prefetch instructions in the Prefetch Queue, allowing enough time to load the corresponding waveform data into the cache memory before they are required by the Playback Queue. The timed execution of instructions across separate queues is managed by the Timing Unit.

There is, however, one class of instructions that cannot be distributed into queues ahead of time: instructions of the "Get" type, such as getDIO, that return a value to the Sequencer. Since the sequencer language allows subsequent instructions to depend on the returned value (e.g. by using the external input for a conditional branch), the Sequencer must wait until all previously filled queues are empty before executing the Get instruction, and the queues can only be filled again once the Get instruction is completed. The instruction classes and their timing rules are summarized in the following table.

Table 2. Instruction execution

| Instruction class | Examples | Executed by… | Executed… |
|---|---|---|---|
| Playback | `playWave`, `playAuxWave`, `playWaveDIO`, `playWaveZSync`, `playZero`, `executeTableEntry` | Playback queue | …when triggered by Timing Unit |
| Prefetch | `prefetch` | Prefetch queue | …as early as possible |
| Wait/Set | `wait`, `waitWave`, `waitDigTrigger`, `waitDIOTrigger`, `waitZSyncTrigger`, `waitCntTrigger`, `waitSineOscPhase`, `resetOscPhase`, `setSinePhase`, `incrementSinePhase`, `waitPlayQueueEmpty`, `setTrigger`, `randomSeed`, `setDIO`, `setRate`, `setPrecompClear`, `setPRNGSeed`, `setPRNGRange` | Wait & Set queue | …when triggered by Timing Unit |
| Get | `getDIO`, `getZSyncData`, `getDigTrigger`, `getUserReg`, `getCnt`, `getPRNGValue` | Sequencer | …when Wait&Set queue is empty |
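The classification in Table 2 can be written down as a simple lookup, which helps to spot where a sequence program forces the queues to drain. The Python sketch below only mirrors the table. It is not a model of the actual Sequencer or Timing Unit, and the names `instruction_class` and `sync_points` are hypothetical.

```python
# Instruction classes as listed in Table 2 (names copied from the table).
INSTRUCTION_CLASSES = {
    "playback": ["playWave", "playAuxWave", "playWaveDIO", "playWaveZSync",
                 "playZero", "executeTableEntry"],
    "prefetch": ["prefetch"],
    "wait_set": ["wait", "waitWave", "waitDigTrigger", "waitDIOTrigger",
                 "waitZSyncTrigger", "waitCntTrigger", "waitSineOscPhase",
                 "resetOscPhase", "setSinePhase", "incrementSinePhase",
                 "waitPlayQueueEmpty", "setTrigger", "randomSeed", "setDIO",
                 "setRate", "setPrecompClear", "setPRNGSeed", "setPRNGRange"],
    "get": ["getDIO", "getZSyncData", "getDigTrigger", "getUserReg",
            "getCnt", "getPRNGValue"],
}

# Reverse lookup: instruction name -> class name.
CLASS_OF = {name: cls
            for cls, names in INSTRUCTION_CLASSES.items()
            for name in names}


def instruction_class(name: str) -> str:
    """Class of an instruction according to Table 2, or 'unknown'."""
    return CLASS_OF.get(name, "unknown")


def sync_points(program):
    """Indices of Get instructions, before which the queues have to be empty."""
    return [i for i, name in enumerate(program)
            if instruction_class(name) == "get"]


program = ["playWave", "setTrigger", "getDIO", "playWave"]
print([instruction_class(name) for name in program])
print(sync_points(program))
```

Each index returned by `sync_points` marks a place where the Sequencer cannot queue instructions ahead of time, which is worth keeping in mind when a sequence relies on tight timing.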

The cache memory holds space for 256 kSa of dual-channel waveform data. This memory is divided into 256 blocks of 1024 Sa (dual-channel) length. At the start of any waveform playback, the necessary waveform data needs to be present in the cache memory. For long waveforms that exceed the size of 2 cache memory blocks (2048 Sa dual-channel), only the beginning of the waveform of that size (2048 Sa dual-channel) needs to be present. The remainder of the waveform data is processed through a FIFO (first-in-first-out) buffer and does not occupy cache memory. This division between cache memory and FIFO buffer allows the system to bridge the access time to the main memory.

As a consequence, there is a limit on the total number of long waveforms in a sequence below which the user has full freedom in specifying a sequence program. Above that limit, the sequence program needs to fulfill certain conditions that are detailed below and that are verified by the AWG Compiler.

The limit on the number of long waveforms depends on the amount of waveform memory, Mshort, occupied by all short waveforms in a sequence. The limit is then given by (Mcache − Mshort)/(2·Mblock), where Mcache = 256 kSa is the size of the cache memory, and Mblock = 1024 Sa is the size of one cache memory block. If no short waveforms are used, the limit on the number of long waveforms is highest, namely Mcache/(2·Mblock) = 128.
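The limit can be evaluated numerically for a given amount of short-waveform memory. The following Python sketch implements the formula above with Mcache = 256 × 1024 Sa and Mblock = 1024 Sa. The function name is hypothetical, and rounding the result down to an integer is an assumption, since the text does not state how fractional results are treated.

```python
M_BLOCK = 1024            # size of one cache memory block in Sa (dual-channel)
M_CACHE = 256 * M_BLOCK   # cache size: 256 blocks, i.e. 256 kSa


def max_long_waveforms(m_short_sa: int) -> int:
    """Limit on the number of long waveforms for a given short-waveform memory use.

    m_short_sa: cache memory in Sa occupied by all short waveforms.
    Rounding down is an assumption made for this sketch.
    """
    if not 0 <= m_short_sa <= M_CACHE:
        raise ValueError("short-waveform memory must be between 0 and the cache size")
    return (M_CACHE - m_short_sa) // (2 * M_BLOCK)


print(max_long_waveforms(0))          # no short waveforms
print(max_long_waveforms(64 * 1024))  # 64 kSa of short waveforms
```

With no short waveforms the function returns 128, in agreement with the value given in the text.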

The limit can be lifted for sequences whose structure allows for idle times to controllably replace waveform beginnings in the cache during runtime. These idle times can arise due to either of the following sequence elements:

1. playZero(n) instructions of sufficient duration n

2. Playback of very large waveforms with a length beyond 16 Mblock = 16,384 Sa

In the first case, the playZero instruction creates a well-defined segment of zero-valued signal without the use of waveform memory. In the second case, the FIFO buffer is able to advance sufficiently in data accumulation so as to create an idle time during runtime. Both of these sequence elements are identified by the AWG Compiler, which then places appropriate waveform prefetch instructions in the compiled sequence. If the AWG Compiler finds that the waveform limit is exceeded but there are not sufficient idle times to fetch data, it will generate an error, because the AWG could not guarantee gapless and jitter-free playback of the intended sequence.

Consider a typical example of a triggered series of waveform playbacks:

waitDigTrigger(1);
playWave(long_waveform_001);
waitDigTrigger(1);
playWave(long_waveform_002);
(...)
waitDigTrigger(1);
playWave(long_waveform_200);

The total of 200 long waveforms exceeds the limit of 128. Unless some of these waveforms are longer than 30 kSa, the compiler will not detect any opportunity to fetch waveform data in the middle of the sequence, and will thus reject the sequence at compilation.

In practice, however, there is often enough time between the end of a waveform and the following trigger event. This idle time can be made available for waveform data fetching by inserting playZero instructions at places where no signal is to be generated. The sequence can be modified, for example, as follows:

const M_idle = 8000;
waitDigTrigger(1);
playWave(long_waveform_001);
playZero(M_idle);
waitDigTrigger(1);
playWave(long_waveform_002);
playZero(M_idle);
(...)
waitDigTrigger(1);
playWave(long_waveform_200);
playZero(M_idle);

The AWG Compiler is thus made aware of the idle times and can use them to fetch waveform data. The sequence will be accepted for compilation, and is played back as intended.
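Writing out 200 nearly identical blocks by hand is tedious, so the sequence text is often generated by a script. The Python sketch below builds the sequence program shown above as a string. It assumes that waveforms named long_waveform_001 to long_waveform_200 are defined elsewhere in the sequence program, and the function name `triggered_sequence` is hypothetical. Uploading and compiling the resulting text is not shown.

```python
def triggered_sequence(num_waveforms: int, idle_samples: int = 8000) -> str:
    """Return sequence program text with a playZero after each triggered waveform.

    Assumes waveforms long_waveform_001, long_waveform_002, ... are
    defined elsewhere in the sequence program.
    """
    lines = [f"const M_idle = {idle_samples};"]
    for index in range(1, num_waveforms + 1):
        lines.append("waitDigTrigger(1);")
        lines.append(f"playWave(long_waveform_{index:03d});")
        lines.append("playZero(M_idle);")   # idle time usable for waveform prefetching
    return "\n".join(lines)


sequence_text = triggered_sequence(200)
print("\n".join(sequence_text.splitlines()[:4]))   # first block only
```

The default of 8000 samples is the value used in the example above; it also satisfies the granularity rule of 32 samples plus a multiple of 16 samples.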

The use of the playZero(n) instruction is strongly encouraged as a replacement for the playback of a zero-valued waveform, e.g. playWave(zeros(n)). Both instructions result in the same output, but playZero consumes no waveform memory. In addition to avoiding the limit on the number of long waveforms, it greatly reduces the waveform upload time to the instrument.
The instruction playZero(n) can replace the instruction wait(m) with m = n/8 in most cases. It has the advantage of behaving equivalently to a playWave instruction. The wait instruction is still required to implement a waiting time that depends on a run-time variable, or to generate a constant output voltage in combination with the last-sample-hold feature.
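The relation m = n/8 between the two arguments reflects that wait counts Sequencer clock cycles while playZero counts samples, with 8 samples per cycle. The Python sketch below converts between the two. The helper names are hypothetical, and the conversion only restates the relation given above; it does not account for any additional instruction overhead.

```python
SAMPLES_PER_CYCLE = 8   # the Sequencer clock is 1/8 of the sampling rate


def wait_cycles_for(play_zero_samples: int) -> int:
    """Argument m of wait(m) corresponding to playZero(n), using m = n/8."""
    if play_zero_samples % SAMPLES_PER_CYCLE != 0:
        raise ValueError("n must be a multiple of 8 samples")
    return play_zero_samples // SAMPLES_PER_CYCLE


def play_zero_samples_for(wait_cycles: int) -> int:
    """Argument n of playZero(n) corresponding to wait(m), using n = 8*m."""
    return wait_cycles * SAMPLES_PER_CYCLE


print(wait_cycles_for(8000))        # wait argument matching playZero(8000)
print(play_zero_samples_for(1000))  # playZero argument matching wait(1000)
```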