This document was written while trying to understand the LTTng trace format, in the hope that it provides a useful reference for others. UPDATE: I have since found the official documentation, so this document now mainly records my questions about the format.
Things that I do not understand are marked in red. Things that are currently unused or unimplemented are marked in green.
LTTng stores data in a directory containing multiple files, rather than just a single file. This directory contains a subdirectory eventdefs, which contains the XML-like definitions of the events. It also contains a directory control, which contains a set of trace files of the form tracefilename_n, where n is the CPU number. One of the sets of traces is called facilities. These files are special as they contain facility load events, which load event definitions from the XML-like files. Other sets of files are:
Trace set name | Description |
---|---|
interrupts | Contains any interrupt events |
modules | Contains module load and unload events |
processes | Contains important process events such as creation and exit. |
The trace set called cpu seems to contain most of the events. Its trace files are stored in the root of the trace directory, not under control.
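The layout described above can be sketched in code. The following is a minimal sketch, assuming that every trace file name really follows the tracefilename_n pattern (for example cpu_0 or facilities_1); the helper names split_tracefile_name and group_tracefiles are hypothetical and not part of any LTTng tool.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern: <trace set name>_<CPU number>, e.g. "cpu_0".
TRACEFILE_RE = re.compile(r"^(?P<set>.+)_(?P<cpu>\d+)$")


def split_tracefile_name(name):
    """Split 'cpu_0' into ('cpu', 0); return None if the name does not match."""
    match = TRACEFILE_RE.match(name)
    if match is None:
        return None
    return match.group("set"), int(match.group("cpu"))


def group_tracefiles(trace_dir):
    """Map each trace set name to {cpu_number: path} for one trace directory."""
    trace_dir = Path(trace_dir)
    sets = defaultdict(dict)
    # The cpu set lives in the trace root, the other sets under control/.
    for folder in (trace_dir, trace_dir / "control"):
        if not folder.is_dir():
            continue
        for path in folder.iterdir():
            parsed = split_tracefile_name(path.name) if path.is_file() else None
            if parsed is not None:
                set_name, cpu = parsed
                sets[set_name][cpu] = path
    return dict(sets)
```

Files that do not match the pattern, such as XML files in the trace root, are simply skipped.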
Each tracefile consists of a list of blocks. Each block contains a header followed by a list of events.
Each block starts with a header containing the following fields:
Name | Offset (bytes) | Type | Description |
---|---|---|---|
begin_cycle_count | 0 | uint64_t | Value of the timestamp counter for the first entry in this block |
begin_freq | 8 | uint64_t | CPU frequency at the start of the block. |
end_cycle_count | 16 | uint64_t | Value of the timestamp counter for the last entry in this block |
end_freq | 24 | uint64_t | Clock frequency at the end of the block. Will be used on variable frequency processors. |
lost_size | 32 | uint32_t | The number of unused bytes in this block. As events don't always fill the block entirely this indicates how many bytes are unused at the end. |
buf_size | 36 | uint32_t | The size of this buffer. |
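As an illustration, the block header table above can be decoded with Python's struct module. This is a sketch, not a reference implementation: it assumes the fields are packed with no padding (as the offsets suggest), and the byte order is passed in by the caller, with little-endian as an arbitrary default. The names BlockHeader and parse_block_header are hypothetical.

```python
import struct
from collections import namedtuple

BlockHeader = namedtuple(
    "BlockHeader",
    "begin_cycle_count begin_freq end_cycle_count end_freq lost_size buf_size",
)

# Four uint64_t fields followed by two uint32_t fields: 40 bytes in total.
BLOCK_HEADER_LAYOUT = "4Q2I"


def parse_block_header(data, byte_order="<"):
    """Decode the 40-byte block header found at the start of a block."""
    layout = struct.Struct(byte_order + BLOCK_HEADER_LAYOUT)
    return BlockHeader(*layout.unpack_from(data, 0))


def used_bytes(header):
    """Bytes of the block that actually hold data (buf_size minus lost_size)."""
    return header.buf_size - header.lost_size
```

For example, a block with buf_size 4096 and lost_size 16 holds 4080 bytes of header and event data.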
The number of blocks in a file is determined by dividing the size of the file by the size of the first block. In the future, blocks of variable size will be used. Determining the number of blocks in that case will require a scan of the file.
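The block count calculation for fixed-size blocks might look like the sketch below. It assumes that buf_size at offset 36 of the first block is the full block size and that the caller already knows the byte order; count_blocks is a hypothetical helper name.

```python
import os
import struct

BUF_SIZE_OFFSET = 36  # offset of buf_size within the block header


def count_blocks(path, byte_order="<"):
    """Return file size divided by the buf_size of the first block."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as trace_file:
        trace_file.seek(BUF_SIZE_OFFSET)
        raw = trace_file.read(4)
    if len(raw) < 4:
        raise ValueError("file is too short to contain a block header")
    (buf_size,) = struct.unpack(byte_order + "I", raw)
    if buf_size == 0:
        raise ValueError("buf_size of the first block is zero")
    return file_size // buf_size
```

This only holds while all blocks have the same size; with variable-size blocks the file would have to be walked block by block instead.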
Following the block header is the trace file header. It is contained in each block to support the flight recorder mode of operation. The trace file header has a set of general fields, which are defined for all trace files, and then, depending on the trace file version, a set of extra fields. The basic fields are:
Name | Offset (bytes) | Type | Description |
---|---|---|---|
magic_number | 40 | uint32_t | The magic number 0x00D6B7ED. This field can be read to determine the endianness of the file. |
arch_type | 44 | uint32_t | Architecture of the traced machine. Valid values are:
|
arch_variant | 48 | uint32_t | Architecture variant. May not be used on all architectures. Valid values are:
|
float_word_order | 52 | uint32_t | Byte order of float and doubles. Valid values are:
|
arch_size | 56 | uint8_t | Size of void * in bytes. |
major_version | 57 | uint8_t | Major version number of the file format. |
minor_version | 58 | uint8_t | Minor version number of the file format. |
flight_recorder | 59 | uint8_t | Is flight recorder mode activated? |
has_heartbeat | 60 | uint8_t | Does this trace have a heartbeat? Valid values:
|
alignment | 61 | uint8_t | Alignment of event header and event data within the trace file. (Note: this field was previously badly named has_alignment.) |
freq_scale | 62 | uint32_t | Amount by which frequency is scaled: real_freq = freq / freq_scale. |
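Reading the magic number to determine endianness can be sketched as follows. The sketch tries both byte orders on the four bytes at offset 40 and then decodes the basic fields with the order that matched. It assumes the fields are packed without padding, as the offsets in the table suggest; the names detect_byte_order and parse_basic_trace_header are hypothetical.

```python
import struct

MAGIC_NUMBER = 0x00D6B7ED
TRACE_HEADER_OFFSET = 40  # the trace file header follows the block header

BASIC_FIELDS = (
    "magic_number", "arch_type", "arch_variant", "float_word_order",
    "arch_size", "major_version", "minor_version", "flight_recorder",
    "has_heartbeat", "alignment", "freq_scale",
)


def detect_byte_order(block):
    """Return '<' or '>' depending on how the magic number was written."""
    raw = block[TRACE_HEADER_OFFSET:TRACE_HEADER_OFFSET + 4]
    if len(raw) < 4:
        raise ValueError("block is too short to contain the magic number")
    for order in ("<", ">"):
        if struct.unpack(order + "I", raw)[0] == MAGIC_NUMBER:
            return order
    raise ValueError("magic number not found at offset 40")


def parse_basic_trace_header(block):
    """Decode the basic trace header fields (offsets 40 to 65) into a dict."""
    order = detect_byte_order(block)
    # Four uint32_t, six uint8_t, one uint32_t, with no padding in between.
    values = struct.unpack_from(order + "4I6BI", block, TRACE_HEADER_OFFSET)
    return dict(zip(BASIC_FIELDS, values))
```

The detected byte order can then be reused for every other multi-byte field in the file.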
The version-dependent extra fields are:
Name | Offset (bytes) | Type | Description |
---|---|---|---|
start_freq | 66 | uint64_t | Frequency at the start of the trace. This may differ from the begin_freq field in the first block if LTT was operating in flight recorder mode. |
start_tsc | 74 | uint64_t | Time stamp counter at the start of the trace. This may differ from the begin_cycle_count field in the first block if LTT was operating in flight recorder mode. |
unused | 82 | uint64_t | UNUSED |
start_time_sec | 90 | uint64_t | Seconds component of time at start of the trace. This is the only reference to real time in the trace. |
start_time_usec | 98 | uint64_t | Microseconds component of time at start of the trace. |
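Since start_time_sec and start_time_usec are the only reference to real time, an event's cycle count presumably has to be converted relative to start_tsc. The sketch below shows one way this could work. It rests on several assumptions that the tables do not confirm: that the scaled frequency is in cycles per second, that it stays constant at start_freq for the whole trace, and that event timestamps count from the same counter as start_tsc. The function name tsc_to_seconds is hypothetical.

```python
def tsc_to_seconds(tsc, start_tsc, start_freq, freq_scale,
                   start_time_sec, start_time_usec):
    """Convert a cycle count to seconds of real time (assumed constant frequency)."""
    # From the freq_scale description: real_freq = freq / freq_scale.
    real_freq = start_freq / freq_scale
    # Assumption: real_freq is in cycles per second.
    elapsed = (tsc - start_tsc) / real_freq
    return start_time_sec + start_time_usec / 1_000_000 + elapsed
```

On a variable frequency processor the per-block begin_freq and end_freq values would have to be taken into account instead of a single start_freq.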
Following the trace file header is a list of events. Each event consists of an event header, which is generic to all types of events, and event data, which is specific to the event type. Each component is aligned according to the alignment field in the trace header.
The event header has the following fields:
Name | Offset (bytes) | Type | Description |
---|---|---|---|
timestamp | 0 | uint64_t | Timestamp counter of this event |
facility_id | 8 | uint8_t | References which facility this event is in. |
event_id | 9 | uint8_t | Identifies the specific event in the given facility. |
event_size | 10 | uint16_t | Size of event data. Valid values are:
|
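Walking the event list with the 64-bit timestamp header might look like the sketch below. The interpretation of alignment is an assumption: here both the event header and the event data are rounded up to the next multiple of the alignment value, measured from the start of the buffer passed in, and a value of 0 or 1 is treated as no alignment. The helper names align_up and parse_event are hypothetical.

```python
import struct


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (0 or 1: unchanged)."""
    if alignment <= 1:
        return offset
    return (offset + alignment - 1) // alignment * alignment


def parse_event(data, offset, alignment, byte_order="<"):
    """Decode one event at offset; return (event dict, offset after the event)."""
    # uint64_t timestamp, uint8_t facility_id, uint8_t event_id, uint16_t event_size
    header = struct.Struct(byte_order + "QBBH")
    offset = align_up(offset, alignment)
    timestamp, facility_id, event_id, event_size = header.unpack_from(data, offset)
    data_start = align_up(offset + header.size, alignment)
    event = {
        "timestamp": timestamp,
        "facility_id": facility_id,
        "event_id": event_id,
        "data": bytes(data[data_start:data_start + event_size]),
    }
    return event, data_start + event_size
```

Calling parse_event repeatedly, feeding the returned offset back in, steps through the events of a block until the used part of the block (buf_size minus lost_size) is exhausted.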
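When only the 32 least significant bits of the timestamp counter are saved, the event header has the following fields:
Name | Offset (bytes) | Type | Description |
---|---|---|---|
timestamp | 0 | uint32_t | 32 least significant bits of the timestamp counter of this event |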
facility_id | 4 | uint8_t | References which facility this event is in. |
event_id | 5 | uint8_t | Identifies the specific event in the given facility. |
event_size | 6 | uint16_t | Size of event data. Valid values are:
|
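With the short header, a reader has to rebuild the full 64-bit timestamp from the 32 low bits. A minimal sketch of one way to do this is shown below. It assumes events are read in time order and that a full 64-bit value is available as a starting point (for example start_tsc, or the value carried by a heartbeat event in the core facility), with less than 2^32 cycles between two consecutive timestamps. The function name extend_timestamp is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def extend_timestamp(last_full_tsc, low32):
    """Rebuild a 64-bit timestamp from its 32 low bits and the previous full value."""
    # Start with the high 32 bits of the previous timestamp.
    candidate = (last_full_tsc & ~MASK32) | (low32 & MASK32)
    # If that goes backwards in time, the low 32 bits have rolled over once.
    if candidate < last_full_tsc:
        candidate += 1 << 32
    return candidate
```

Each reconstructed value becomes last_full_tsc for the next event, so a single rollover between two events is handled but a gap of 2^32 cycles or more is not.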
bookmarks.xml
bookmarks.xml will eventually be generated by the viewer to add annotations to the trace (for example, a specific time marked with some text information). It has not been implemented yet.
system.xml
system.xml should be an XML file that contains the system information at trace start, as seen by lttctl. Note that this information should be optional, as a trace can start before any user space VFS exists. Moreover, this information is not always relevant: on a vserver system, the hostname is different for processes in each vserver, and since the whole kernel is traced, which hostname should be chosen? The answer is the hostname as seen in the environment variables of the lttctl process. This, too, has not been implemented yet.
The format of the event data depends on the specific event. Events are defined in XML-like files found in the eventdefs subdirectory of the trace. The core events are defined in core.xml and are available at startup. Other event definitions are loaded explicitly by events in the facilities trace files.
This is best described by the XML schema description.
The following documentation is generated directly from the XML files.
The core facility contains the basic tracing related events.
Facility is loaded.
Facility is unloaded.
System time values sent periodically to detect cycle counter rollovers. Useful when only the 32 LSB of the TSC are saved in the event header: the full 64 bits are saved in this event.
Facility is loaded while in state dump. XXX: I am not sure why this is special, and needed in addition to facility load.
The fs facility contains events related to file system operation
Starting to wait for a buffer
Ending wait for a buffer
Executing a file
Opening a file
Closing a file descriptor
Reading from a file descriptor
Write to a file descriptor
Seek on a file descriptor
Do an IOCTL on a file descriptor
Do a select on a file descriptor
Do a poll on a file descriptor
The ipc facility contains events related to Inter Process Communication
IPC call
Get an IPC message queue identifier
Get an IPC semaphore identifier
Get an IPC shared memory identifier
The kernel facility has events related to kernel execution status.
Entry in a trap
Exit from a trap
Soft IRQ entry
Soft IRQ exit
Tasklet entry
Tasklet exit
Entry in an irq
Exit from an IRQ
The kernel facility has events related to kernel execution status for the i386 architecture.
System call entry
System call exit
The memory facility has memory management events.
Page allocation
Page free
Page swapped into memory
Page swapped to disk
Starting to wait for a page
Ending wait for a page
The network facility contains events related to low level network operations
A packet is arriving
We send a packet
The process facility has events related to process handling in the kernel.
Process fork
Just created a new kernel thread
Process exit
Process wait
Process kernel data structure free (end of life of a zombie)
Process kill system call
Process signal reception
Process wakeup
Scheduling change
The socket facility contains events related to sockets
Generic socket call. FIXME: should be more detailed.
Create a socket
Sending a socket message
Receiving a socket message
The statedump facility contains the events generated at trace startup
List of open file descriptors
List of active vm maps
List of loaded kernel modules
List of registered interrupts
State of each process when statedump is performed
Kernel state dump complete
The timer facility has events related to timer events in the kernel.
A timer or itimer has expired.
The timer softirq is currently running.
An interval timer is set.