<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
  <!-- N.B. Sentences in this document are double-spaced so that Emacs
       sentence-editing functions work more reliably. -->
  <head>
    <title>DCTV trace analysis system</title>
    <link rel="stylesheet" href="reset.css" />
    <link rel="stylesheet" href="styles.css" />
  </head>
  <body>
    <div id="container">
      <header id="header">DCTV</header>
      <nav id="sidebar">
        <ul>
          <li>
            <a href="#introduction">Introduction</a>
            <ul>
              <li><a href="#quickstart">Quick start</a></li>
              <li><a href="#background">Background</a></li>
              <li><a href="#conventions">Document conventions</a></li>
              <li><a href="#datamodel">Data model</a></li>
              <li><a href="#differences">Differences from standard SQL</a></li>
            </ul>
          </li>
          <li><a href="#syntax">Syntax reference</a></li>
          <li><a href="#standard_library">Standard library</a></li>
          <li><a href="#example">Worked example</a></li>
        </ul>
      </nav>
      <main id="manual">
        <h1><a name="introduction">Introduction</a></h1>
        <p>DCTV is a data exploration toolkit designed for both
        interactive and batch analysis of trace files and other
        heterogeneous time series data.  It's designed to answer
        complex questions about the sort of data that one frequently
        finds in records of system activity.</p>
        <p>Important features of DCTV are:</p>
        <ul>
          <li>SQL:1999 querying of trace files</li>
          <li>specialized relational algebra and SQL syntax for time series</li>
          <li>comprehensive dimensional analysis for unit conversion
          and error detection</li>
          <li>support for analyzing very large (larger than memory)
          trace files</li>
          <li>powerful GUI for interactive trace exploration</li>
        </ul>
        <p>Use cases include:</p>
        <ul>
          <li>examining CPU time spent by a particular application</li>
          <li>examining CPU time spent in <emph>part</emph> of an
          application</li>
          <li>examining memory activity of the whole system to determine
          what caused a game to miss a frame deadline</li>
          <li>finding which functions cause the most page faults
          during app startup</li>
          <li>tracking down slow memory leaks</li>
          <li>finding why a real-time thread took too long to run and
          poll a device</li>
          <li>bulk analysis of traces from production to extract metrics for a
          dashboard</li>
        </ul>
        <p>DCTV is a "power user" tool: using it effectively requires
        an understanding of both the system components that generate
        the trace events being queried and SQL-like declarative query
        systems.  This document aims to document DCTV's functionality,
        walk through a few examples of trace analysis, and invite the
        reader to investigate further.</p>
        <h2><a name="quickstart">Quick start</a></h2>
        <aside class="warning">
          DCTV is under active development and is not yet stable.
          It also currently runs on Linux systems only; a port to
          macOS is underway.  See the <a href="go/dctv-db-design">DB
          design document</a> for further information on checking out and
          building the source code.
        </aside>
        <h3>Getting DCTV</h3>
        <ol>
          <li>Be running gLinux (we'll port eventually)</li>
          <li><code>git clone sso://team/dctv/dctv</code></li>
          <li><code>make dev</code></li>
          <li>follow prompts; install dependencies</li>
          <li>while the build is broken, complain to dancol@, goto 2</li>
          <li><code>./dctv</code></li>
        </ol>
        <h3>Hello world</h3>
        <code class="blockquote"><![CDATA[$ ./dctv repl mytrace=mytrace.ftrace
Type .help for help.
DCTV> SELECT COUNT(*) FROM mytrace.scheduler.timeslices_p_cpu;
COUNT()
-------
  32362
  ]]></code>
        <h2><a name="background">Background</a></h2>
        <blockquote>
          Life is just one damned thing after another.
          <cite>Arnold J. Toynbee</cite>
        </blockquote>
        <h3>Purpose of DCTV</h3>
        <p>A trace file by itself is of limited utility: it's
        gigabytes of detailed, low-level records of system activity.
        When we analyze a trace file, what we really want to do is
        <emph>pose questions</emph> to that trace file and get back
        meaningful answers.  The information we want lies in the
        non-trivial <emph>relationships</emph> between trace events,
        the relationships between relationships, and so on, in a way
        that puts limits on the kind of trace analysis that it's
        possible to do using ad-hoc analysis of trace
        events themselves.</p>
        <p>After we pose questions to a trace file and get answers, we
        frequently want to use these answers as the basis for further
        questions.  In this way, we gradually increase the level of
        abstraction of our analysis, moving from questions posed in
        terms of raw trace events to ones posed in terms of the
        problem we're actually trying to solve.</p>
        <p>DCTV is a question-answering machine.  By incrementally
        constructing queries and then querying against them (for
        example, using the <code>WITH</code> construction), users
        extract increasingly abstract data from trace files, data not
        directly represented by discrete and specific low-level events
        in a trace.  The SQL REPL and the GUI both provide
        information-querying capabilities.</p>
        <p>DCTV also provides a <a href="#standard_library">standard
        library</a> of ready-made building blocks that users can query
        during trace analysis.</p>
        <h3>Other trace analysis tools</h3>
        <p>DCTV is not the first such tool for trace analysis.
        It integrates the best parts of WPA, LISA, and Perfetto's
        trace analysis models.</p>
        <code>TODO(dancol): flesh out this section</code>
        <h2><a name="conventions">Document conventions</a></h2>
        <p>This document currently assumes the reader is familiar with
        the basics of SQL and the basics of trace processing, focusing
        on DCTV's specific features in this area.</p>
        <h3>Time tables</h3>
        <p>Some figures below are "time tables" (they have "Time ▶" in
        the upper-left).  They represent timelines, where each row in
        the table is a separate and independent data series.
        Some tables represent operands and results; in this case, a
        thick black line separates the input rows and output rows.</p>

        <h3>Function signatures</h3>
        <p>Table-valued function signatures are given in Python
        syntax, with a bare <code>*</code> signifying that all
        arguments following the <code>*</code> are keyword-only and
        cannot be specified positionally.  (That is, if a function
        signature is <code>foo(*, bar=7)</code>, then you have to
        write either <code>foo()</code> (using <code>bar</code>'s
        default value) or <code>foo(bar=&gt;5)</code> (specifying an
        explicit value for the keyword argument); you can't write
        <code>foo(1)</code>, because <code>bar</code> can't be
        specified positionally.)</p>
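        <p>Since the signature notation is borrowed from Python, a
        small plain-Python sketch may help readers who haven't met a
        bare <code>*</code> before.  The function <code>foo</code>
        here is hypothetical and is not part of DCTV; note also that
        Python itself spells the keyword call <code>foo(bar=5)</code>,
        while the call examples in this document are written with
        <code>=&gt;</code>.</p>
        <code class="blockquote"><![CDATA[
```python
def foo(*, bar=7):
    """Hypothetical function whose only parameter is keyword-only."""
    return bar


# Calling with no arguments uses the default value of bar.
default_result = foo()

# Passing bar by keyword is allowed.
keyword_result = foo(bar=5)

# Passing bar positionally is rejected by Python itself.
try:
    foo(1)
    positional_rejected = False
except TypeError:
    positional_rejected = True

print(default_result, keyword_result, positional_rejected)
```
]]></code>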
        <h2><a name="datamodel">Data model</a></h2>
        <p>DCTV is designed around querying one or more trace files
        using SQL queries.  DCTV performs no hardcoded pre-processing
        of trace files: we model each event in a trace file as a row
        of the "raw events" table corresponding to that event's type.
        Each field in an event is a column in that event's table;
        users extract higher-level information from these low-level
        events by defining views in terms of these low-level events.
        By querying the views, users can extract higher-level trace
        events; users can also define views in terms of other views to
        answer more abstract questions.</p>
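        <p>The layering of views over raw events can be sketched with
        any SQL engine.  The following uses Python's built-in
        <code>sqlite3</code> module rather than DCTV, and the table
        name <code>raw_sched_wakeup</code>, its columns, and its rows
        are all made up for illustration; DCTV's real table names and
        view syntax may differ.</p>
        <code class="blockquote"><![CDATA[
```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw event table: one row per event, one column per field.
conn.execute(
    "CREATE TABLE raw_sched_wakeup (ts INTEGER, pid INTEGER, cpu INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_sched_wakeup VALUES (?, ?, ?)",
    [(10, 100, 0), (20, 200, 1), (30, 100, 1), (40, 100, 0)],
)

# A view that derives a higher-level fact (wakeups per process)
# from the raw events.
conn.execute(
    "CREATE VIEW wakeups_per_pid AS "
    "SELECT pid, COUNT(*) AS wakeups FROM raw_sched_wakeup GROUP BY pid"
)

# A second view defined in terms of the first one.
conn.execute(
    "CREATE VIEW busiest_pid AS "
    "SELECT pid FROM wakeups_per_pid ORDER BY wakeups DESC LIMIT 1"
)

busiest = conn.execute("SELECT pid FROM busiest_pid").fetchone()[0]
print(busiest)
```
]]></code>
        <p>Neither view stores any data of its own: each is just a
        named question posed against the layer beneath it.</p>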
        <h3>Table types</h3>
        <p>DCTV's query engine provides the tables and set functions
        that any SQL system provides, but extends these facilities
        with a set of operators and functions dedicated to working
        with heterogeneous time series.  Tables in DCTV are
        first-class <emph>typed</emph> objects: tables are either
        regular tables, span tables, or event tables.  Each type of
        table has a set of query operations that it supports; DCTV
        provides functions to convert one type of table to another as
        needed.</p>

        <aside class="note">It's always possible to "view" one of
        DCTV's special table types as a regular table by just using
        regular table operations (like the non-<code>SPAN</code>
        variant of <code>SELECT</code>) on it.  The result of any of
        these non-special operations is itself a regular
        table.</aside>

        <p>This table summarizes the special operations DCTV supports.
        Don't worry if you don't recognize some of these terms (like
        "partitioned span table"): they're defined below.</p>

        <table class="general">
          <tr>
            <th>Operation</th>
            <th>Left operand</th>
            <th>Right operand</th>
            <th>Result</th>
          </tr>
          <tr>
            <td>SELECT</td>
            <td>Regular table</td>
            <td>N/A</td>
            <td>Regular table</td>
          </tr>
          <tr>
            <td>SELECT</td>
            <td>Span table</td>
            <td>N/A</td>
            <td>Regular table</td>
          </tr>
          <tr>
            <td>SELECT SPAN</td>
            <td>Span table</td>
            <td>N/A</td>
            <td>Span table</td>
          </tr>
          <tr>
            <td>SPAN JOIN</td>
            <td>Unpartitioned span table</td>
            <td>Unpartitioned span table</td>
            <td>Unpartitioned span table</td>
          </tr>
          <tr>
            <td>SPAN BROADCAST INTO</td>
            <td>Unpartitioned span table</td>
            <td>Partitioned span table</td>
            <td>Partitioned span table</td>
          </tr>
          <tr>
            <td>SPAN BROADCAST FROM</td>
            <td>Partitioned span table</td>
            <td>Unpartitioned span table</td>
            <td>Partitioned span table</td>
          </tr>
          <tr>
            <td>GROUP USING PARTITION</td>
            <td>Partitioned span table</td>
            <td>N/A</td>
            <td>Unpartitioned span table</td>
          </tr>
          <tr>
            <td>GROUP USING SPANS FROM</td>
            <td>Partitioned span table</td>
            <td>Unpartitioned span table</td>
            <td>Partitioned span table</td>
          </tr>
          <tr>
            <td>GROUP USING SPANS FROM</td>
            <td>Unpartitioned span table</td>
            <td>Unpartitioned span table</td>
            <td>Unpartitioned span table</td>
          </tr>
        </table>

        <p>A <dfn>regular SQL table</dfn> is essentially a list of
        points in high-dimensional space, with each column in the
        table representing one dimension along which a point can
        vary.</p>

        <p>A <dfn>span table</dfn> represents data that vary over the
        time dimension.  An interval of time over which the data in a
        span table remain the same is called a <dfn>span</dfn>.
        The collection of time-varying data described by a span table
        is the <dfn>payload</dfn> of that span table.</p>
        <!-- TODO(dancol): talk about different time basis? -->
        <p>All span tables have two special columns:
        <dfn><code>_ts</code></dfn> and
        <dfn><code>_duration</code></dfn>.  <code>_ts</code> is an
        <code>INT64</code> timestamp, in nanoseconds since the start
        of the trace.  <code>_duration</code> is a non-zero
        <code>INT64</code> number of nanoseconds that the span covers.
        (That is, the span describes the half-open region of time
        [<code>_ts</code>, <code>_ts</code> +
        <code>_duration</code>): it includes its start time but not
        its end time.)</p>

        <p><code>_ts</code> and <code>_duration</code> are always
        non-<code>NULL</code>, and a span table is always ordered by
        increasing values of <code>_ts</code>.  Spans in a span table
        cannot "overlap": a span must end either before or at exactly
        the same time as the next span begins.  (Spans from different
        partitions may overlap, however: see immediately below.)  A
        span table need not be contiguous: that is, it's legal for
        gaps to exist between spans.</p>
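        <p>The rules in the last two paragraphs can be written down as
        a small checker.  This is only a sketch in plain Python, not
        DCTV code: the <code>Span</code> type and
        <code>is_valid_span_table</code> are hypothetical names, and
        we assume that a "non-zero" duration means a strictly positive
        one.</p>
        <code class="blockquote"><![CDATA[
```python
from typing import NamedTuple, Sequence


class Span(NamedTuple):
    ts: int        # stands in for _ts (nanoseconds since trace start)
    duration: int  # stands in for _duration (nanoseconds)
    payload: str


def is_valid_span_table(rows: Sequence[Span]) -> bool:
    """Check the ordering and non-overlap rules for one span sequence."""
    previous_end = None
    for row in rows:
        # Assumption: "non-zero" duration is read as strictly positive.
        if row.duration <= 0:
            return False
        # Each span must start at or after the end of the previous one.
        # Since every span ends after it starts, this also guarantees
        # that _ts is increasing.
        if previous_end is not None and row.ts < previous_end:
            return False
        previous_end = row.ts + row.duration
    return True


with_gap = [Span(1, 2, "a"), Span(4, 1, "b")]      # gap from 3 to 4
touching = [Span(1, 2, "a"), Span(3, 1, "b")]      # ends exactly at 3
overlapping = [Span(1, 3, "a"), Span(2, 1, "b")]   # second starts too early

print(is_valid_span_table(with_gap))
print(is_valid_span_table(touching))
print(is_valid_span_table(overlapping))
```
]]></code>
        <p>The first two tables pass the check (gaps and exactly
        adjacent spans are both legal); the third fails because its
        spans overlap.</p>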

        <p>For example, imagine that you're looking at a Christmas
        tree light that changes color in time with music.  We might
        describe the color of the light using spans.  The following
        diagram depicts how we might use spans to describe the light's
        state.  Each pair of numbers (one above the table, one below)
        indicates the time corresponding to the vertical line
        connecting them.</p>

        <table class="spanop">
          <caption>Light color</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
          <tr>
            <td>Color</td>
            <td colspan="2">Red</td>
            <td colspan="1" class="empty" />
            <td colspan="1">Green</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
        </table>

        <p>Here, the light was red from time one to time three and
        then green from time four to time five.  (From time
        three to time four, the light was off; we're choosing to
        represent "off" as the absence of a span, but an equally valid
        choice would be to make a span with a special "Off" value for
        the color.)</p>

        <p>It's useful to look at the physical table representation
        of the above set of spans.</p>

        <table class="general spantable">
          <caption>Light color (span table representation)</caption>
          <tr>
            <th>_ts</th>
            <th>_duration</th>
            <th>color</th>
          </tr>
          <tr>
            <td>1</td>
            <td>2</td>
            <td>red</td>
          </tr>
          <tr>
            <td>4</td>
            <td>1</td>
            <td>green</td>
          </tr>
        </table>

        <p>Note that one row in the physical table representation of a
        span table corresponds to one <emph>logical</emph> span.</p>
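        <p>To see how the physical rows answer a question like "what
        color was the light at time t?", here is a plain-Python sketch
        over the rows of the table above.  The helper
        <code>color_at</code> is hypothetical, and it assumes that a
        span covers its start time but not its end time.</p>
        <code class="blockquote"><![CDATA[
```python
from bisect import bisect_right

# Rows of the "Light color" span table: (_ts, _duration, color).
LIGHT_COLOR = [
    (1, 2, "red"),
    (4, 1, "green"),
]


def color_at(rows, t):
    """Return the color in effect at time t, or None if no span covers t."""
    starts = [ts for ts, _duration, _color in rows]
    # Find the last span that starts at or before t.
    index = bisect_right(starts, t) - 1
    if index == -1:
        return None
    ts, duration, color = rows[index]
    # Assumption: a span covers its start time but not its end time.
    if t - ts >= duration:
        return None
    return color


for t in range(0, 6):
    print(t, color_at(LIGHT_COLOR, t))
```
]]></code>
        <p>The lookup can use binary search only because the rows are
        ordered by <code>_ts</code> and never overlap.  A result of
        <code>None</code> corresponds to the light being off.</p>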

        <aside class="note">
          It's because span tables are always ordered by
          <code>_ts</code> that DCTV disallows queries of the form
          <code>SELECT SPAN ... ORDER BY ...</code>.  Re-ordering a
          span table makes no sense.  If you want to
          <code>SELECT</code> from a span table without making the
          result a span table, you can instead view the span table as
          a regular table by using the non-<code>SPAN</code> variant
          of <code>SELECT</code> (<code>SELECT * FROM my_span_table</code>);
          in this mode, <code>SELECT</code> lets you order the result
          set however you want.
        </aside>
        <p>An <dfn>event table</dfn> is like a span table, but without
        the <code>_duration</code> column.  It represents a sequence
        of "points" in time.  The advantage of using an event table
        over a regular SQL table to represent points is automatic
        integration of the event table into time-based operations
        on spans.</p>

        <h3>Partitions</h3>

        <p>A span table is either a <dfn>partitioned span table</dfn>
        or a <dfn>non-partitioned span table</dfn>.  A non-partitioned
        span table is just the kind of span table described above.
        A partitioned span table, by contrast, has an additional
        special column, the <dfn>partition column</dfn>.
        A partitioned span table is basically a bundle of logical
        partition tables all combined into a single table under a
        single name.  Each distinct <emph>value</emph> of the
        partition column, which is called a <dfn>partition</dfn>,
        defines one independent sequence of spans.</p>

        <p>All of DCTV's operations on span tables know about
        partitioned span tables (the partition column is part of the
        span table's type) and operate on each partition within a span
        table independently.  There are also operations that transform
        a partitioned span table into a non-partitioned span table
        through the use of SQL grouping operators.</p>

        <p>It's useful to bundle sequences of spans this way instead
        of putting each in its own table.  Using a partitioned span
        table, we can operate on groups of related time series
        uniformly without having to change our queries depending on
        how many different time series we have.  For example, a
        CPU-related query should look the same on any system no matter
        how many CPUs it has!</p>

        <p>DCTV currently allows a span table to have either zero or
        one partition column, but not more.  This limit is just an
        implementation limit, and in the future, DCTV will allow
        partitioning by more than one column.</p>

        <p>Let's look at our Christmas tree light example, but with
        partitions.  Here, we're looking at two lights, one called
        "light#0" and another called "light#1".  We use a sequence of
        spans to describe each light's state.  It's critical to
        understand that each light has a distinct state history, but
        that we store all of these histories in the same physical
        table, using a column to describe the specific light that a
        specific row describes.</p>

        <aside class="note">For the remainder of this document, when
        the character "#" appears in a span row label, it refers to a
        specific partition of a partitioned span table.</aside>

        <table class="spanop">
          <caption>Colors of two lights</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
          <tr>
            <td>Light#0</td>
            <td colspan="2">Red</td>
            <td colspan="1" class="empty" />
            <td colspan="1">Green</td>
          </tr>
          <tr>
            <td>Light#1</td>
            <td colspan="1">Green</td>
            <td colspan="3">Red</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
        </table>

        <p>Here's the physical partitioned span table representation
        of the logical spans from the above diagram.</p>

        <table class="general spantable">
          <caption>Colors of two lights (span table representation)</caption>
          <tr>
            <th>_ts</th>
            <th>_duration</th>
            <th>lightno</th>
            <th>color</th>
          </tr>
          <tr>
            <td>1</td>
            <td>2</td>
            <td>0</td>
            <td>red</td>
          </tr>
          <tr>
            <td>1</td>
            <td>1</td>
            <td>1</td>
            <td>green</td>
          </tr>
          <tr>
            <td>2</td>
            <td>3</td>
            <td>1</td>
            <td>red</td>
          </tr>
          <tr>
            <td>4</td>
            <td>1</td>
            <td>0</td>
            <td>green</td>
          </tr>
        </table>

        <p>Like an unpartitioned span table, a partitioned span table
        is ordered by increasing <code>_ts</code>.  If spans from two
        different partitions begin at the same time, the relative
        order of the rows with the same <code>_ts</code> value is
        unspecified.</p>
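        <p>One way to picture a partitioned span table is to split its
        rows back into one span sequence per partition.  The sketch
        below does this in plain Python for the two-lights table; the
        helper names are hypothetical and this is not how DCTV itself
        is implemented.</p>
        <code class="blockquote"><![CDATA[
```python
from collections import defaultdict

# Rows of the "Colors of two lights" table:
# (_ts, _duration, lightno, color).
TWO_LIGHTS = [
    (1, 2, 0, "red"),
    (1, 1, 1, "green"),
    (2, 3, 1, "red"),
    (4, 1, 0, "green"),
]


def split_by_partition(rows):
    """Group rows into one span sequence per partition value."""
    partitions = defaultdict(list)
    for ts, duration, lightno, color in rows:
        partitions[lightno].append((ts, duration, color))
    return dict(partitions)


def overlaps_within(spans):
    """Return True if any two consecutive spans in one sequence overlap."""
    return any(
        next_ts < ts + duration
        for (ts, duration, _), (next_ts, _, _) in zip(spans, spans[1:])
    )


by_light = split_by_partition(TWO_LIGHTS)

# The same rows with the partition column ignored.
flattened = [(ts, duration, color) for ts, duration, _, color in TWO_LIGHTS]

print(by_light)
print(overlaps_within(by_light[0]), overlaps_within(by_light[1]))
print(overlaps_within(flattened))
```
]]></code>
        <p>Within each partition the spans never overlap, but the
        physical table taken as a whole does contain overlapping rows,
        which is why the partition column has to be part of the span
        table's type.</p>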

        <aside class="example">
          A real-world use of spans is analyzing CPU-specific data.
          On a multi-CPU system, each CPU has its own frequency.
          A CPU might change from 800MHz to 1GHz and then down to
          600MHz, while another CPU, at the same time, might change
          its frequency from 600MHz to 800MHz and then up to 1GHz.
          Each of the two time series (the first CPU's frequency
          history and the second CPU's frequency history) is an
          independent time series.
        </aside>

        <h3>Span operations</h3>
        <p>While we can apply normal SQL querying operations to span
        tables, we can answer certain questions much more conveniently
        by using DCTV's special span operations, which are designed to
        make it easy to work with real-world time series data.</p>

        <h4>Span join</h4>

        <p>The <dfn>span join</dfn> family of operations merges spans
        together in a timewise-correct way and generates new spans
        divided on the common boundaries of the spans that flow as
        input into the span join.</p>

        <p>It's easiest to demonstrate a span join visually.</p>
        <!-- TODO(dancol): can we make this diagram more fun? -->
        <table class="spanop">
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
          <tr>
            <td>Size</td>
            <td colspan="2">tiny</td>
            <td colspan="1">giant</td>
          </tr>
          <tr>
            <td>Species</td>
            <td colspan="1">fish</td>
            <td colspan="2">squirrel</td>
          </tr>
          <tr class="result-divider"><td>SPAN JOIN</td></tr>
          <tr>
            <td>Phenotype</td>
            <td>tiny fish</td>
            <td>tiny squirrel</td>
            <td>giant squirrel</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
        </table>

        <p>Here, we're joining two hypothetical time series (as
        represented by span tables), a time series of sizes and a time
        series of animal types.  (Imagine we're trying to reconstruct
        the state of an animal given a record of the transmutation
        spells some novice sorcerer's apprentice might have haphazardly
        cast.)</p>

        <p>In this trace, the "make the animal tiny" spell was in
        effect from timestamp one until timestamp three, and the "make
        the animal giant" spell was in effect from timestamp three
        onward.  Likewise, the "make the animal a fish" spell was in
        effect from timestamp one until timestamp two, and the "make
        the animal a squirrel" spell was in effect from timestamp two
        onward.  The first row depicts the effect of the size spells,
        and the second row depicts the effect of the animal-type
        spells.  (We imagine that each spell cancels the effect of the
        last spell of the same type.)</p>

        <p>The last row, "phenotype", represents a span table giving
        the type of animal that we observe at each moment, inferred
        from the effects of the previous two rows.  Note that the
        result span table has a span division wherever any of the
        inputs has a span division.  We ensure that all the properties
        of any of the input spans stay constant "within" any of the
        output spans, allowing for correct future computation
        involving these values.
        </p>

        <p>It may be informative to look at the row-wise representation
        of the above span tables:</p>

        <table class="general spantable">
          <caption>Size</caption>
          <tr>
            <th>_ts</th>
            <th>_duration</th>
            <th>size</th>
          </tr>
          <tr>
            <td>1</td>
            <td>2</td>
            <td>tiny</td>
          </tr>
          <tr>
            <td>3</td>
            <td>1</td>
            <td>giant</td>
          </tr>
        </table>

        <table class="general spantable">
          <caption>Species</caption>
          <tr>
            <th>_ts</th>
            <th>_duration</th>
            <th>species</th>
          </tr>
          <tr>
            <td>1</td>
            <td>1</td>
            <td>fish</td>
          </tr>
          <tr>
            <td>2</td>
            <td>2</td>
            <td>squirrel</td>
          </tr>
        </table>

        <table class="general spantable">
          <caption>Phenotype</caption>
          <tr>
            <th>_ts</th>
            <th>_duration</th>
            <th>size</th>
            <th>species</th>
          </tr>
          <tr>
            <td>1</td>
            <td>1</td>
            <td>tiny</td>
            <td>fish</td>
          </tr>
          <tr>
            <td>2</td>
            <td>1</td>
            <td>tiny</td>
            <td>squirrel</td>
          </tr>
          <tr>
            <td>3</td>
            <td>1</td>
            <td>giant</td>
            <td>squirrel</td>
          </tr>
        </table>
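        <p>For readers who prefer code to diagrams, here is a minimal
        plain-Python sketch of a span join over two unpartitioned span
        tables.  It is not DCTV's implementation, and
        <code>span_inner_join</code> is a hypothetical name.
        The input rows follow the diagram above, in which the size
        changes to "giant" at time three.</p>
        <code class="blockquote"><![CDATA[
```python
# Input rows are (_ts, _duration, value), ordered by _ts.
SIZE = [(1, 2, "tiny"), (3, 1, "giant")]
SPECIES = [(1, 1, "fish"), (2, 2, "squirrel")]


def span_inner_join(left, right):
    """Join two unpartitioned span tables.

    Emits (_ts, _duration, left_value, right_value) for every stretch
    of time covered by a span from both inputs.
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        l_ts, l_dur, l_val = left[i]
        r_ts, r_dur, r_val = right[j]
        l_end, r_end = l_ts + l_dur, r_ts + r_dur
        # The overlap of the two current spans, if there is one.
        start, end = max(l_ts, r_ts), min(l_end, r_end)
        if start < end:
            result.append((start, end - start, l_val, r_val))
        # Advance whichever input's span ends first (both on a tie).
        if l_end <= end:
            i += 1
        if r_end <= end:
            j += 1
    return result


for row in span_inner_join(SIZE, SPECIES):
    print(row)
```
]]></code>
        <p>The output rows match the Phenotype table: a new output
        span begins wherever either input begins a new span.</p>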

        <h4>Span join: inner and outer</h4>
        <p>What happens when spans don't line up exactly?</p>
        <p>Span joins come in two varieties, named after the varieties
        of regular SQL joins: <dfn>inner span join</dfn> and
        <dfn>outer span join</dfn>.  When all the inputs to a span
        join cover the same period of time, the difference doesn't
        matter.  But when there are gaps in one sequence or another,
        the difference becomes important.  Just as in the previous
        section, we'll start with a diagram.</p>

        <table class="spanop">
          <caption>Sample inputs</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
          <tr>
            <td>Breath</td>
            <td colspan="1">fire</td>
            <td class="empty"/>
            <td colspan="1">ice</td>
          </tr>
          <tr>
            <td>Color</td>
            <td colspan="1">red</td>
            <td colspan="2">green</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
        </table>

        <p>Here, we see that there is no magic breath spell in effect
        from time two to time three.  What happens when we
        perform a span join on these span tables?  It depends on the
        kind of span join.</p>

        <table class="spanop">
          <caption>Span inner join</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
          <tr>
            <td>Breath</td>
            <td colspan="1">fire</td>
            <td class="empty"/>
            <td colspan="1">ice</td>
          </tr>
          <tr>
            <td>Color</td>
            <td colspan="1">red</td>
            <td colspan="2">green</td>
          </tr>
          <tr class="result-divider">
            <td>Span inner join</td>
          </tr>
          <tr>
            <td>Phenotype</td>
            <td colspan="1">fire-breathing red</td>
            <td class="empty"/>
            <td colspan="1">ice-breathing green</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
        </table>

        <table class="spanop">
          <caption>Span outer join</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
          <tr>
            <td>Breath</td>
            <td colspan="1">fire</td>
            <td class="empty"/>
            <td colspan="1">ice</td>
          </tr>
          <tr>
            <td>Color</td>
            <td colspan="1">red</td>
            <td colspan="2">green</td>
          </tr>
          <tr class="result-divider">
            <td>Span outer join</td>
          </tr>
          <tr>
            <td>Phenotype</td>
            <td colspan="1">fire-breathing red</td>
            <td colspan="1"><code>NULL</code>-breathing green</td>
            <td colspan="1">ice-breathing green</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
        </table>

        <p>In the span inner join case, we emit an output span only
        when <i>all</i> input spans cover a time interval.  In the
        span outer join case, we emit an output span when <i>any</i>
        input span covers a specific time region, providing NULL for
        the value of any payload column not provided by a span for
        that region.</p>

        <p>The table representations of the two result span tables may
        make the result clearer.</p>

        <table class="general spantable">
          <caption>Span inner join result (table view)</caption>
          <tr>
            <th>_ts</th>
            <th>_duration</th>
            <th>breath</th>
            <th>color</th>
          </tr>
          <tr>
            <td>1</td>
            <td>1</td>
            <td>fire</td>
            <td>red</td>
          </tr>
          <tr>
            <td>3</td>
            <td>1</td>
            <td>ice</td>
            <td>green</td>
          </tr>
        </table>

        <table class="general spantable">
          <caption>Span outer join result (table view)</caption>
          <tr>
            <th>_ts</th>
            <th>_duration</th>
            <th>breath</th>
            <th>color</th>
          </tr>
          <tr>
            <td>1</td>
            <td>1</td>
            <td>fire</td>
            <td>red</td>
          </tr>
          <tr>
            <td>2</td>
            <td>1</td>
            <td><code>NULL</code></td>
            <td>green</td>
          </tr>
          <tr>
            <td>3</td>
            <td>1</td>
            <td>ice</td>
            <td>green</td>
          </tr>
        </table>

        <p>Note that even a span outer join won't produce a result
        span that covers a period of time that no input span covered,
        as the following diagram indicates.</p>

        <table class="spanop">
          <caption>Holes in span outer join</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
          <tr>
            <td>Breath</td>
            <td colspan="1" class="empty"/>
            <td colspan="1" class="empty"/>
            <td colspan="1">ice</td>
          </tr>
          <tr>
            <td>Color</td>
            <td class="empty"/>
            <td colspan="1">red</td>
            <td colspan="1">green</td>
          </tr>
          <tr class="result-divider">
            <td>Span outer join </td>
          </tr>
          <tr>
            <td>Phenotype</td>
            <td class="empty"/>
            <td colspan="1"><code>NULL</code>-breathing red</td>
            <td colspan="1">ice-breathing green</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span><span>4</span></td>
          </tr>
        </table>
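
        <p>The inner and outer span join rules can also be modeled in
        a few lines of Python.  This is only an illustrative sketch,
        not DCTV code: the <code>span_join</code> and
        <code>value_at</code> helpers are hypothetical, each span
        table is a plain list of <code>(_ts, _duration, payload)</code>
        rows, and <code>None</code> stands in for <code>NULL</code>
        (so the sketch cannot tell a missing span from a span whose
        payload is itself <code>NULL</code>).  The inputs are the
        "Breath" and "Color" rows of the diagram above.</p>

```python
def value_at(spans, t):
    """Return the payload of the span covering time t, or None if none does."""
    for ts, duration, value in spans:
        if t >= ts and ts + duration > t:
            return value
    return None


def span_join(left, right, outer):
    """Join two non-partitioned span tables of (ts, duration, payload) rows."""
    # Every span boundary is a point where the combined payload may change.
    edges = sorted({e for ts, d, _ in left + right for e in (ts, ts + d)})
    rows = []
    for start, end in zip(edges, edges[1:]):
        lv, rv = value_at(left, start), value_at(right, start)
        if outer:
            keep = lv is not None or rv is not None   # any input covers
        else:
            keep = lv is not None and rv is not None  # all inputs cover
        if keep:
            rows.append((start, end - start, lv, rv))
    return rows


breath = [(3, 1, "ice")]
color = [(2, 1, "red"), (3, 1, "green")]

outer_rows = span_join(breath, color, outer=True)
inner_rows = span_join(breath, color, outer=False)
```

        <p>In this model, neither result contains a row starting at
        time 1, because no input span covers that region.</p>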

        <h4>Span broadcast</h4>

        <p>A <dfn>span broadcast</dfn> is a special kind of span join
        that operates on two span tables, one partitioned and one not.
        Normally, DCTV treats each partition within a partitioned span
        table as a separate time series and operates on each
        independently; DCTV refuses to perform span operations on span
        tables partitioned by different columns or between partitioned
        and non-partitioned span tables, since the desired operation
        isn't obvious.</p>

        <p>With a span broadcast, we can tell DCTV to perform a
        special kind of span join between a partitioned and
        non-partitioned table, "broadcasting" the non-partitioned span
        into every partition in the partitioned span table in such a
        way that the result has useful properties.</p>

        <p>The overall result is <emph>almost</emph> as if we copied
        the non-partitioned span table N times, once for each of the N
        partitions, into a new partitioned span table, and then joined
        that new partitioned span table with the other partitioned
        span table that we had when we started.  The difference
        between this hypothetical operation and span broadcast is that
        span broadcast doesn't generate any output spans for regions
        not covered by any span in the partitioned span table, even if
        that region is covered by the non-partitioned span table.</p>

        <p>Another way to think of it is that span broadcast "labels"
        each span in a partitioned span table with the payload of the
        non-partitioned table.  The output of a span broadcast
        operation is partitioned in the same way as its partitioned
        input.</p>

        <p>As usual, a diagram may be illustrative.  Here, "Size#0"
        and "Size#1" indicate two partitions of the same span table,
        "Size" (let's suppose animals 0 and 1 have different size
        spells cast on them).  "Color" is the input non-partitioned
        span table (let's suppose color spells affect all animals at
        the same time).</p>

        <table class="spanop">
          <caption>Sample inputs</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
          <tr>
            <td>Size #0</td>
            <td colspan="1">tiny</td>
            <td colspan="2">giant</td>
          </tr>
          <tr>
            <td>Size #1</td>
            <td colspan="3">tiny</td>
          </tr>
          <tr>
            <td>Color</td>
            <td colspan="1">red</td>
            <td colspan="1" class="empty" />
            <td colspan="2">green</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
        </table>

        <p>Just like regular span joins, span broadcasts come in
        <dfn>span inner broadcast</dfn> and <dfn>span outer
        broadcast</dfn> varieties, depicted below.  Note that the time
        period from four to five doesn't appear in the result span
        tables, since from time four to time five, we had a color span
        from the non-partitioned span table, but no spans from size,
        the partitioned span table.</p>

        <table class="spanop">
          <caption>Inner broadcast of color into size</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
          <tr>
            <td>Size #0</td>
            <td colspan="1">tiny</td>
            <td colspan="2">giant</td>
          </tr>
          <tr>
            <td>Size #1</td>
            <td colspan="3">tiny</td>
          </tr>
          <tr>
            <td>Color</td>
            <td colspan="1">red</td>
            <td colspan="1" class="empty" />
            <td colspan="2">green</td>
          </tr>
          <tr class="result-divider">
            <td>Inner broadcast</td>
          </tr>
          <tr>
            <td>Result#0</td>
            <td colspan="1">tiny red</td>
            <td colspan="1" class="empty" />
            <td colspan="1">giant green</td>
          </tr>
          <tr>
            <td>Result#1</td>
            <td colspan="1">tiny red</td>
            <td colspan="1" class="empty" />
            <td colspan="1">tiny green</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
        </table>

        <table class="spanop">
          <caption>Outer broadcast of color into size</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
          <tr>
            <td>Size #0</td>
            <td colspan="1">tiny</td>
            <td colspan="2">giant</td>
          </tr>
          <tr>
            <td>Size #1</td>
            <td colspan="3">tiny</td>
          </tr>
          <tr>
            <td>Color</td>
            <td colspan="1">red</td>
            <td colspan="1" class="empty" />
            <td colspan="2">green</td>
          </tr>
          <tr class="result-divider">
            <td>Outer broadcast</td>
          </tr>
          <tr>
            <td>Result#0</td>
            <td colspan="1">tiny red</td>
            <td colspan="1"><code>NULL</code>-colored giant</td>
            <td colspan="1">giant green</td>
          </tr>
          <tr>
            <td>Result#1</td>
            <td colspan="1">tiny red</td>
            <td colspan="1"><code>NULL</code>-colored tiny</td>
            <td colspan="1">tiny green</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span><span>5</span></td>
          </tr>
        </table>

        <p>In general, we use a span broadcast when we have a number
        of different things happening at the same time (each
        represented by one partition of a span table) and we want to
        "mix into" this span table knowledge of something that affects
        the environment as a whole.</p>
        <aside class="example">We might denote a period of a few
        seconds during which the app
        com.flashlightco.myflashlight starts up in response to a
        launcher tap.  (This is not an efficient flashlight app.)  If
        we have a table of process activity, partitioned by CPU, we
        can apply a span inner broadcast to the process activity table
        and narrow our view of that table to the interval during which
        the flashlight app was starting, but keep the result
        partitioned by CPU.
        </aside>
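
        <p>The broadcast diagrams above can be reproduced with a small
        Python model.  As before, this is a hedged sketch rather than
        DCTV's implementation: <code>span_broadcast</code> and
        <code>value_at</code> are hypothetical names, a partitioned
        span table is modeled as a dictionary from partition key to a
        list of <code>(_ts, _duration, payload)</code> rows, and
        <code>None</code> stands in for <code>NULL</code>.</p>

```python
def value_at(spans, t):
    """Return the payload of the span covering time t, or None if none does."""
    for ts, duration, value in spans:
        if t >= ts and ts + duration > t:
            return value
    return None


def span_broadcast(partitioned, broadcast, outer):
    """Label every partition's spans with the non-partitioned table's payload."""
    result = {}
    for key, spans in partitioned.items():
        edges = sorted({e for ts, d, _ in spans + broadcast for e in (ts, ts + d)})
        rows = []
        for start, end in zip(edges, edges[1:]):
            own = value_at(spans, start)
            if own is None:
                # Never emit output where the partition itself has no span,
                # even if the broadcast table covers the region.
                continue
            other = value_at(broadcast, start)
            if outer or other is not None:
                rows.append((start, end - start, own, other))
        result[key] = rows  # output keeps the partitioning of the input
    return result


size = {0: [(1, 1, "tiny"), (2, 2, "giant")], 1: [(1, 3, "tiny")]}
color = [(1, 1, "red"), (3, 2, "green")]

inner_result = span_broadcast(size, color, outer=False)
outer_result = span_broadcast(size, color, outer=True)
```

        <p>In both results, no row covers the period from four to
        five, matching the diagrams: only the non-partitioned "Color"
        table has a span there.</p>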

        <h4>Span group</h4>

        <p>A <dfn>span group</dfn> operation is the opposite of a span
        join, in a sense.  It merges spans together and applies SQL
        set functions (like <code>MAX</code> and <code>SUM</code>) to
        the payloads of the merged spans, forming for each payload a
        combined value determined through the usual SQL aggregation
        operation.</p>

        <p>Here's a diagram.</p>

        <table class="spanop">
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
          <tr>
            <td>Number arms</td>
            <td>2</td>
            <td>5</td>
            <td>0</td>
            <td>7</td>
            <td>2</td>
            <td>4</td>
            <td>9</td>
            <td>0</td>
          </tr>
          <tr>
            <td>Periods</td>
            <td colspan="2">A</td>
            <td colspan="2">B</td>
            <td colspan="2">C</td>
            <td colspan="2">D</td>
          </tr>
          <tr class="result-divider"><td>Span group</td></tr>
          <tr>
            <td><code>MAX(arms)</code></td>
            <td colspan="2">5</td>
            <td colspan="2">7</td>
            <td colspan="2">4</td>
            <td colspan="2">9</td>
          </tr>
          <tr>
            <td><code>MIN(arms)</code></td>
            <td colspan="2">2</td>
            <td colspan="2">0</td>
            <td colspan="2">2</td>
            <td colspan="2">0</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
        </table>

        <p>Here, our hapless sorcerer repeatedly changed the number
        of arms that our poor animal had.  We want to
        determine, based on the record of arm-number changes, for each
        relatively broad interval A, B, C, and D, the minimum and
        maximum number of arms our animal had during that
        interval.</p>

        <p>A span group operation involves two span tables: the
        <dfn>grouped</dfn> table and the <dfn>grouper</dfn> table.
        The grouped table ("number of arms", in our example) supplies
        the source data for the grouping operations; the grouper table
        (here, "periods") supplies spans describing the groups that
        form the output value.  The grouped table may or may not be
        partitioned; if it is partitioned, DCTV applies grouping to
        each partition individually.  Currently, the grouper table
        must not be partitioned.</p>

        <p>A span group operation always emits one output span for
        each span in its grouper input span table.  If no grouped span
        overlaps with a given grouper span, all its aggregate values
        end up being <code>NULL</code>.  An example follows.</p>

        <table class="spanop">
          <caption>Illustration of span group behavior with missing
          grouped values</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
          <tr>
            <td>Number arms</td>
            <td>2</td>
            <td>5</td>
            <td>0</td>
            <td class="empty" />
            <td class="empty" />
            <td class="empty" />
            <td>9</td>
            <td>0</td>
          </tr>
          <tr>
            <td>Periods</td>
            <td colspan="2">A</td>
            <td colspan="2">B</td>
            <td colspan="2">C</td>
            <td colspan="2">D</td>
          </tr>
          <tr class="result-divider"><td>Span group</td></tr>
          <tr>
            <td><code>MAX(arms)</code></td>
            <td colspan="2">5</td>
            <td colspan="2">0</td>
            <td colspan="2"><code>NULL</code></td>
            <td colspan="2">9</td>
          </tr>
          <tr>
            <td><code>MIN(arms)</code></td>
            <td colspan="2">2</td>
            <td colspan="2">0</td>
            <td colspan="2"><code>NULL</code></td>
            <td colspan="2">0</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
        </table>
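
        <p>The following Python sketch models the span group behavior
        shown in the diagram above for a non-partitioned grouped
        table.  It is an illustration under stated assumptions, not
        DCTV's implementation: <code>span_group</code> is a
        hypothetical helper, span tables are lists of
        <code>(_ts, _duration, payload)</code> rows, and the sketch
        assumes that a grouped span contributes to every grouper span
        it overlaps (the text above doesn't say how DCTV treats
        grouped spans that straddle a grouper boundary).</p>

```python
def span_group(grouped, grouper, aggregate):
    """Emit exactly one output row per grouper span."""
    rows = []
    for g_ts, g_duration, label in grouper:
        g_end = g_ts + g_duration
        # Payloads of the grouped spans that overlap this grouper span.
        values = [v for ts, d, v in grouped if g_end > ts and ts + d > g_ts]
        # With nothing to aggregate, the result is None (modeling NULL).
        rows.append((g_ts, g_duration, label, aggregate(values) if values else None))
    return rows


# "Number arms" with a hole from time 4 to time 7, as in the diagram above.
arms = [(1, 1, 2), (2, 1, 5), (3, 1, 0), (7, 1, 9), (8, 1, 0)]
periods = [(1, 2, "A"), (3, 2, "B"), (5, 2, "C"), (7, 2, "D")]

max_arms = span_group(arms, periods, max)
min_arms = span_group(arms, periods, min)
```

        <p>Period C still appears in both results, with
        <code>None</code> as its aggregate value, because one output
        span is emitted for every grouper span.</p>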

        <!-- TODO(dancol): make this paragraph more clear -->
        <p>Span group operations have two flavors: <dfn>span group and
        intersect</dfn> and <dfn>span group and union</dfn>.
        The difference matters only when the grouped span table has
        multiple partitions.  In intersect mode, we include payloads
        from the grouped span table only for intervals in which all
        partitions are present; in union mode, we include payloads
        for intervals in which any grouped partition is present.</p>

        <aside class="note">If we want the output of a span group to
        include only the regions of time covered by the grouped span
        table, first span join the grouper with the grouped span
        table, then use the result as the grouper span table in the
        span group.</aside>

        <h4>Span departition</h4>

        <p>A <dfn>span departition</dfn> operation transforms a
        partitioned span table into a non-partitioned span table by
        grouping the partition payloads with SQL set functions.
        This operation is useful mainly when we have a "split up" view
        of activity on the system and want to derive a whole-system
        view by matching up all the partitions.</p>

        <p>To return to our magical forensics example, imagine our
        apprentice cast some very expensive add-arms-to-animals spells
        on a number of different animals.  We're billed for arms based
        on the total number we're using at any one time (there's a
        license server and everything), so we want to reconstruct,
        based on a record of each animal's arm count, the number of
        arms we were using in total at a particular moment.  In the
        following table, "Arms#0", "Arms#1", and so on denote the
        partitions of a single "Arms" span table.</p>

        <table class="spanop">
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
          <tr>
            <td>Arms#0</td>
            <td colspan="3">2</td>
            <td colspan="2">7</td>
            <td>4</td>
            <td>9</td>
            <td>0</td>
          </tr>
          <tr>
            <td>Arms#1</td>
            <td colspan="5">2</td>
            <td colspan="3">4</td>
          </tr>
          <tr class="result-divider"><td>Departition</td></tr>
          <tr>
            <td><code>SUM(arms)</code></td>
            <td colspan="3">4</td>
            <td colspan="2">9</td>
            <td colspan="1">8</td>
            <td colspan="1">13</td>
            <td colspan="1">4</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
        </table>

        <p>A span departition operation resembles a span join
        followed by a span group operation, but it's specified
        separately so that we can work with partitioned span tables
        without knowing in advance how many partitions we have or
        having to expand our queries to work with each partition
        separately.</p>

        <p>Span departitions come in two varieties, the <dfn>span
        departition and union</dfn> and <dfn>span departition and
        intersect</dfn> operations, with the difference concerning the
        treatment of missing data.  The following tables illustrate
        the differences between these approaches, starting from an
        arm history with missing data.</p>

        <table class="spanop">
          <caption>Arm history with missing data</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
          <tr>
            <td>Arms#0</td>
            <td colspan="3" class="empty" />
            <td colspan="2">7</td>
            <td>4</td>
            <td>9</td>
            <td>0</td>
          </tr>
          <tr>
            <td>Arms#1</td>
            <td colspan="5">2</td>
            <td colspan="3" class="empty" />
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
        </table>

        <p>In intersect mode, we generate an output span for a region
        of time only when <emph>all</emph> partitions have a span
        covering that period.</p>

        <table class="spanop">
          <caption>Departition intersect result</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
          <tr>
            <td>Arms#0</td>
            <td colspan="3" class="empty" />
            <td colspan="2">7</td>
            <td>4</td>
            <td>9</td>
            <td>0</td>
          </tr>
          <tr>
            <td>Arms#1</td>
            <td colspan="5">2</td>
            <td colspan="3" class="empty" />
          </tr>
          <tr class="result-divider">
            <td>Departition intersect</td>
          </tr>
          <tr>
            <td><code>SUM(arms)</code></td>
            <td colspan="3" class="empty" />
            <td colspan="2">9</td>
            <td colspan="3" class="empty" />
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
        </table>

        <p>By contrast, in union mode, we generate an output span when
        <emph>any</emph> partition covers a region of time.  We treat
        any missing partitions as contributing <code>NULL</code> to
        the output aggregation for each span.  Note that SQL
        aggregation functions just skip <code>NULL</code> values, so
        the sums below are correct.</p>

        <table class="spanop">
          <caption>Departition union result</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
          <tr>
            <td>Arms#0</td>
            <td colspan="3" class="empty" />
            <td colspan="2">7</td>
            <td>4</td>
            <td>9</td>
            <td>0</td>
          </tr>
          <tr>
            <td>Arms#1</td>
            <td colspan="5">2</td>
            <td colspan="3" class="empty" />
          </tr>
          <tr class="result-divider">
            <td>Departition union</td>
          </tr>
          <tr>
            <td><code>SUM(arms)</code></td>
            <td colspan="3">2</td>
            <td colspan="2">9</td>
            <td colspan="1">4</td>
            <td colspan="1">9</td>
            <td colspan="1">0</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span></td>
            <td><span>7</span></td>
            <td><span>8</span><span>9</span></td>
          </tr>
        </table>
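
        <p>The two departition modes can be compared with a short
        Python model, using the input from the "Arm history with
        missing data" table.  This is a sketch under assumptions, not
        DCTV's implementation: <code>span_departition</code> and
        <code>value_at</code> are hypothetical names, the partitioned
        table is a dictionary from partition key to
        <code>(_ts, _duration, payload)</code> rows, and a missing
        span is modeled by simply leaving its value out of the
        aggregation, which mirrors how SQL aggregation functions skip
        <code>NULL</code>.</p>

```python
def value_at(spans, t):
    """Return the payload of the span covering time t, or None if none does."""
    for ts, duration, value in spans:
        if t >= ts and ts + duration > t:
            return value
    return None


def span_departition(partitions, aggregate, mode):
    """Collapse a partitioned span table into a non-partitioned one."""
    all_spans = [s for spans in partitions.values() for s in spans]
    edges = sorted({e for ts, d, _ in all_spans for e in (ts, ts + d)})
    rows = []
    for start, end in zip(edges, edges[1:]):
        values = [value_at(spans, start) for spans in partitions.values()]
        present = [v for v in values if v is not None]
        if mode == "intersect":
            keep = len(present) == len(values)  # every partition covers
        else:
            keep = len(present) > 0             # any partition covers
        if keep:
            rows.append((start, end - start, aggregate(present)))
    return rows


arms = {
    0: [(4, 2, 7), (6, 1, 4), (7, 1, 9), (8, 1, 0)],  # no data before time 4
    1: [(1, 5, 2)],                                   # no data after time 6
}

intersect_rows = span_departition(arms, sum, "intersect")
union_rows = span_departition(arms, sum, "union")
```

        <p>In this model, intersect mode yields a single span from
        time four to six with a sum of 9, while union mode yields a
        span for every region covered by at least one partition.</p>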

        <h3>Trace processing intrinsic functions</h3>

        <p>DCTV aims to be a general-purpose time series analysis
        program, one that just happens to be especially useful for
        processing Android system traces.  Its general approach is to
        avoid system- and metric-specific data processing routines and
        provide general-purpose operators that users can combine to
        analyze data in particular situations.</p>

        <p>The previous section describes operations that DCTV
        provides in the form of query operators.  DCTV also provides
        some operations, usually less common ones, in the form of
        table-valued functions.</p>

        <h4><a name="time_series_to_span_conversion">Time series to span conversion</a></h4>

        <p>Recall that DCTV exposes events from trace files as raw
        data points, in event tables.  We have to build span tables
        from these raw data somehow, and the <a
        href="#time_series_to_spans"><code>time_series_to_spans</code></a>
        table-valued function does exactly that.</p>

        <p><code>time_series_to_spans</code> takes as input a set of
        event sources and a set of output column descriptors and
        produces a span table as output.  Logically, it consumes
        events from the given sources, in time order, and constructs
        spans by watching for "start" and "stop" events as denoted by
        the input sources.  Payload values attached to the event
        sources become payload columns of the output span table
        according to each output column's specification.</p>

        <p>Each source is either a "start-start" source or a "stop"
        source.  The former case models a set of events that divide a
        timeline up into discrete chunks: each event ends the
        previous span and starts the next one.</p>

        <p>Returning for a moment to our hypothetical wizardly
        apprentice, we recall that an animal's size might change as
        our apprentice casts various "change size" spells on it.
        The raw, event-by-event record of spells cast by our
        apprentice might look like this.</p>

        <table class="general spantable">
          <caption>Raw size spell record</caption>
          <tr>
            <th>_ts</th>
            <th>size</th>
          </tr>
          <tr>
            <td>1</td>
            <td>tiny</td>
          </tr>
          <tr>
            <td>3</td>
            <td>huge</td>
          </tr>
          <tr>
            <td>4</td>
            <td>large</td>
          </tr>
          <tr>
            <td>6</td>
            <td>huge</td>
          </tr>
        </table>

        <p>Processing this raw event table into spans using
        <code>time_series_to_spans</code>, we end up with a span table
        that looks like this.  (The time scale goes to seven for
        easier comparison with the next example.)</p>

        <table class="spanop">
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span><span>7</span></td>
          </tr>
          <tr>
            <td>Size</td>
            <td colspan="2">tiny</td>
            <td colspan="1">huge</td>
            <td colspan="2">large</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span><span>7</span></td>
          </tr>
        </table>

        <aside class="note">The final "huge" spell isn't reflected in
        the output span table, because
        <code>time_series_to_spans</code> ignores spans left "open"
        (i.e., unclosed) at the end of processing.  The intent of this
        feature is to work with span inner join operations to
        automatically ignore noisy partial-data "junk intervals" at
        the beginning and end of traces.  If a need arises,
        <code>time_series_to_spans</code> could be extended in the
        future to automatically close open spans.</aside>

        <p><code>time_series_to_spans</code> also supports "stop"
        events.  These events don't start new spans, but do indicate
        that any open span active at the time of the stop event should
        be finished.  In an operating system context, if
        <code>sched_switch</code> is a start-start event, a CPU
        hotplug off event might be a "stop" event, since it would
        indicate that a CPU has stopped processing traces without
        producing any new ones.</p>

        <p>To return to our unfortunate apprentice example, suppose we
        have an additional table of "size reset" spells that we know
        were cast during the sequence of size change spells.  A size
        reset spell just returns a creature to whatever size it had
        without any magical augmentation.  The raw table might look
        something like this.</p>

        <table class="general spantable">
          <caption>Raw size-reset spell record</caption>
          <tr>
            <th>_ts</th>
          </tr>
          <tr>
            <td>5</td>
          </tr>
          <tr>
            <td>7</td>
          </tr>
        </table>

        <p>If we feed both our original size spell record event table
        <emph>and</emph> our size-reset spell table into
        <code>time_series_to_spans</code>, we end up with a span table
        that looks like this.</p>

        <table class="spanop">
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span><span>7</span></td>
          </tr>
          <tr>
            <td>Size</td>
            <td colspan="2">tiny</td>
            <td colspan="1">huge</td>
            <td colspan="1">large</td>
            <td colspan="1" class="empty" />
            <td colspan="1">huge</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span><span>7</span></td>
          </tr>
        </table>

        <p>Note the differences: first, we now have a "hole" between
        times five and six, because the stop table told us that we
        stopped changing our poor confused creature's size at time
        five and didn't start changing it again until time six.
        Second, we have a "huge" span from time six to seven, because
        the span beginning at time six is no longer left open after
        <code>time_series_to_spans</code> ends.</p>

        <aside class="note">If you want a span table that substitutes
        a concrete value (say, "normal") for the hole, you can combine
        a span outer join of the whole-trace span with
        <code>COALESCE</code> on the payload column to make
        one.</aside>
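
        <p>The start-start and stop behavior described above can be
        modeled in Python as follows.  This sketch is not the real
        <code>time_series_to_spans</code> function and does not
        reflect its signature: <code>model_time_series_to_spans</code>
        is a hypothetical helper that handles one non-partitioned
        start-start source of <code>(_ts, payload)</code> events and
        one stop source of timestamps, and always takes the payload
        from the event that opened the span.  Processing a stop
        before a start that has the same timestamp is an assumption
        of the sketch.</p>

```python
def model_time_series_to_spans(starts, stops):
    """Turn start-start events and stop events into (ts, duration, payload) rows."""
    events = [(ts, 1, value) for ts, value in starts]
    events += [(ts, 0, None) for ts in stops]
    # Process events in time order; at equal timestamps, stops come first.
    events.sort(key=lambda e: (e[0], e[1]))

    spans = []
    open_span = None  # (start timestamp, payload) of the span in progress
    for ts, is_start, value in events:
        if open_span is not None:
            # Any event, start or stop, finishes the currently open span.
            open_ts, open_value = open_span
            spans.append((open_ts, ts - open_ts, open_value))
            open_span = None
        if is_start:
            open_span = (ts, value)
    # A span still open here is dropped, as the note above describes.
    return spans


size_spells = [(1, "tiny"), (3, "huge"), (4, "large"), (6, "huge")]
size_resets = [5, 7]

without_stops = model_time_series_to_spans(size_spells, [])
with_stops = model_time_series_to_spans(size_spells, size_resets)
```

        <p>Without the stop source, the final "huge" event never
        closes and is dropped; with it, the model produces a hole from
        five to six and a closed "huge" span from six to seven.</p>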

        <p>Each payload column that <code>time_series_to_spans</code>
        generates is described by a "source specification".
        The specification describes, for each output column, the
        source event table from which we get the column's value and
        the "edge" from which we draw the value.  (The edge defaults
        to "rising".)  Using the "rising" edge means that we draw the
        output payload column for a span from the event that started
        the span; using "falling" instead tells
        <code>time_series_to_spans</code> to draw the payload column
        value from the <emph>closing</emph> event.  We typically stick
        with "rising" except in special cases.</p>

        <p><code>time_series_to_spans</code> supports creating
        partitioned span tables as well; each source specification can
        be associated with a partition column in that source table.
        All sources for a given call to
        <code>time_series_to_spans</code> must be partitioned the same
        way.</p>

        <h4><a name="stackification">Stackification</a></h4>

        <p>Not all raw input events look like a series of start and
        stop events on a timeline.  Another common pattern in raw
        input is the "start-stop stack", in which a series of nested
        and balanced start and stop events describe the erection and
        demolition of a stack of some kind of thing.</p>

        <p>Stacks can be anything: examples include procedure call
        stacks, Android synchronous atrace regions, and nested
        interrupt handlers.  To keep with our hapless-apprentice
        example theme, we'll imagine that spells are prepared by
        simultaneous chanting, waving, and stirring, and that we have
        distinct "start" and "stop" records for each activity.</p>

        <p>Suppose we know at what time our apprentice starts a given
        activity and know at what time an activity ends.  Suppose also
        that our apprentice at least paid enough attention in class to
        understand that one always stops the magical activity one most
        recently started.</p>

        <p>(Note that at time five, a second chant begins even though
        a chant was already ongoing.  A friend must have joined
        in.)</p>

        <table class="general spantable">
          <caption>Spell starts</caption>
          <tr>
            <th>_ts</th>
            <th>activity</th>
          </tr>
          <tr>
            <td>1</td>
            <td>stir</td>
          </tr>
          <tr>
            <td>3</td>
            <td>wave</td>
          </tr>
          <tr>
            <td>4</td>
            <td>chant</td>
          </tr>
          <tr>
            <td>5</td>
            <td>chant</td>
          </tr>
        </table>

        <table class="general spantable">
          <caption>Spell stops</caption>
          <tr>
            <th>_ts</th>
          </tr>
          <tr>
            <td>2</td>
          </tr>
          <tr>
            <td>7</td>
          </tr>
          <tr>
            <td>7</td>
          </tr>
          <tr>
            <td>7</td>
          </tr>
        </table>

        <p>What happens if we rearrange these data into spans?</p>

        <table class="spanop">
          <caption>Notional stackified spells</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span><span>7</span></td>
          </tr>
          <tr>
            <td>Effects</td>
            <td colspan="1">[stir]</td>
            <td colspan="1" class="empty" />
            <td colspan="1">[wave]</td>
            <td colspan="1">[wave, chant]</td>
            <td colspan="2">[wave, chant, chant]</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span><span>7</span></td>
          </tr>
        </table>

        <p>This arrangement makes logical sense, but it isn't quite
        compatible with DCTV's data model.  Note that the value of
        each cell is actually a list!  Unlike some databases, DCTV
        does not support composite (multi-part) values as column
        values.  But here we apparently have composite values in the
        cells.  How do we represent these spans as tables?  By <a
        href="https://en.wikipedia.org/wiki/Database_normalization">
        normalization</a>.</p>

        <table class="general spantable">
          <caption>Stack contents</caption>
          <tr>
            <th>stack_id</th>
            <th>depth</th>
            <th>token</th>
          </tr>
          <tr>
            <td>1</td>
            <td>0</td>
            <td>stir</td>
          </tr>
          <tr>
            <td>2</td>
            <td>0</td>
            <td>wave</td>
          </tr>
          <tr>
            <td>3</td>
            <td>0</td>
            <td>wave</td>
          </tr>
          <tr>
            <td>3</td>
            <td>1</td>
            <td>chant</td>
          </tr>
          <tr>
            <td>4</td>
            <td>0</td>
            <td>wave</td>
          </tr>
          <tr>
            <td>4</td>
            <td>1</td>
            <td>chant</td>
          </tr>
          <tr>
            <td>4</td>
            <td>2</td>
            <td>chant</td>
          </tr>
        </table>

        <table class="spanop">
          <caption>Normalized stackified spells</caption>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span><span>7</span></td>
          </tr>
          <tr>
            <td>Stack Id</td>
            <td colspan="1">1</td>
            <td colspan="1" class="empty" />
            <td colspan="1">2</td>
            <td colspan="1">3</td>
            <td colspan="2">4</td>
          </tr>
          <tr class="times">
            <td>Time ▶</td>
            <td><span>1</span></td>
            <td><span>2</span></td>
            <td><span>3</span></td>
            <td><span>4</span></td>
            <td><span>5</span></td>
            <td><span>6</span><span>7</span></td>
          </tr>
        </table>

        <p>Now, we can look up the stack corresponding to each span by
        looking at that span's stack id payload and joining it against
        the stack contents table.  The stackify DCTV intrinsic
        processes any kind of stack into these two tables (the stack
        contents regular table and the "stack history" span
        table).</p>

        <aside class="note">This approach is admittedly pretty ugly,
        but it works.  If it turns out to be a big enough
        problem, we may just implement support for composite values
        (which is really just this approach under the hood).</aside>
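
        <p>The following Python sketch illustrates the normalization
        described above, using the spell starts and stops from the
        tables in this section.  It is only an illustration, not
        DCTV's implementation: the function name
        <code>stackify</code>, its argument shapes, the choice to
        process stops before starts at equal timestamps, and the
        choice to reuse a stack id when an identical stack recurs are
        all assumptions made for the example.</p>

        <aside class="example code-example"><![CDATA[
```python
# Illustrative sketch only: not DCTV's stackify implementation.
def stackify(starts, stops):
    """Turn balanced start/stop events into two normalized tables.

    starts: list of (timestamp, token) pairs.
    stops:  list of timestamps; each stop pops the most recent start.
    Returns (contents, history): contents holds (stack_id, depth, token)
    rows and history holds (ts, duration, stack_id) spans.
    """
    # Merge both event kinds into one timeline.  At equal timestamps,
    # stops (kind 0) sort before starts (kind 1); this is an assumption.
    events = [(ts, 1, token) for ts, token in starts]
    events += [(ts, 0, None) for ts in stops]
    events.sort(key=lambda event: (event[0], event[1]))

    stack = []      # tokens currently on the stack, bottom first
    ids = {}        # distinct stack (as a tuple) -> stack_id
    contents = []
    history = []
    prev_ts = None
    for ts, kind, token in events:
        # Emit a span for the stack that was live since the last event.
        if stack and ts > prev_ts:
            key = tuple(stack)
            if key not in ids:
                ids[key] = len(ids) + 1
                contents.extend(
                    (ids[key], depth, tok) for depth, tok in enumerate(key))
            history.append((prev_ts, ts - prev_ts, ids[key]))
        if kind == 1:
            stack.append(token)
        else:
            stack.pop()
        prev_ts = ts
    return contents, history


spell_starts = [(1, "stir"), (3, "wave"), (4, "chant"), (5, "chant")]
spell_stops = [2, 7, 7, 7]
contents, history = stackify(spell_starts, spell_stops)
for row in contents:
    print(row)
for span in history:
    print(span)
```
]]></aside>

        <p>For the spell data, the rows of <code>contents</code>
        match the "Stack contents" table, and <code>history</code>
        holds one span per stack id, matching the "Normalized
        stackified spells" diagram.</p>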

        <h4><a name="span_generation">Generating span tables from thin air</a></h4>

        <p>There are general utility functions to generate specialized
        span tables useful for composing with others.
        The <code>generate_sequential_spans</code> table-valued
        function generates a sequence of spans according to the start
        time, stop time, and duration specified in the call.
        It's useful for generating spans to quantize the timeline into
        discrete intervals and for generating "whole trace" spans that
        act as inputs to span joins.</p>
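
        <p>As a rough illustration of what such a table contains,
        here is a minimal Python sketch.  The helper name
        <code>sequential_spans</code> and its argument names are
        hypothetical, and clipping the final span to the stop time is
        an assumption of the sketch rather than documented behavior
        of <code>generate_sequential_spans</code>.</p>

        <aside class="example code-example"><![CDATA[
```python
# Illustrative sketch only: hypothetical helper, not the DCTV function.
def sequential_spans(start, stop, duration):
    """Return (ts, duration) pairs tiling the interval [start, stop)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    spans = []
    ts = start
    while ts < stop:
        # Assumption: the last span is clipped so it ends at `stop`.
        spans.append((ts, min(duration, stop - ts)))
        ts += duration
    return spans


# Quantize a ten-tick timeline into four-tick blocks.
print(sequential_spans(0, 10, 4))
# A single "whole trace" span: one span as long as the timeline.
print(sequential_spans(0, 10, 10))
```
]]></aside>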

        <p>Each trace namespace has a few convenience functions for
        succinctly generating, using
        <code>generate_sequential_spans</code>, certain kinds of span
        tables.  See the <a href="#standard_library">"standard
        library"</a> reference below.</p>

        <h3>Dimensional analysis</h3>

        <p>DCTV provides a dimensional analysis feature to make it
        easy and natural to query traces using naturally-specified
        values and to avoid errors that can arise from accidental
        nonsensical combinations of incompatible units.  Each quantity
        in a query is associated with a <dfn>unit</dfn>, and these
        units propagate through the query as it is processed.
        Quantities with different units combine according to the rules
        of <a
        href="https://en.wikipedia.org/wiki/Dimensional_analysis">dimensional
        analysis</a>.  DCTV also knows how to convert from one
        compatible unit to another.  DCTV will signal errors rather
        than produce results that are dimensional nonsense.</p>

        <h4>Unit specification</h4>

        <p>Units come from two sources:</p>
        <ol>
          <li>intrinsic tagging of
          quantities with units during trace parsing, and</li>
          <li>explicit
          tagging of quantities with units in query syntax.</li>
        </ol>

        <p>The syntax for specifying a unit is just adding the name of
        the unit after a numeric literal.  For simple alphanumeric
        unit names, a bare word is sufficient, e.g., <code>4ns</code>.
        For more complicated units that contain operators that SQL
        would otherwise interpret as part of expressions, the unit
        name needs to be quoted with backticks, as in
        <code>4`miles/hour`</code>.  Without the backticks, SQL would
        interpret <code>4miles/hour</code> as an attempt to divide the
        quantity <code>4</code> by the column <code>hour</code>, which
        is probably not what we want.</p>

        <h4>Unit names</h4>

        <!-- TODO(dancol): include the unit name list here -->
        <p>DCTV understands both common and abbreviated names for
        units.  This document will eventually list all understood unit
        names; for the moment, see <a
        href="https://team.git.corp.google.com/dctv/dctv/+/master/src/dctv/units.txt">units.txt</a>
        in the DCTV source code.</p>

        <h4>Unit conversion</h4>

        <p>Queries can explicitly convert units from one type to
        another using the <code>IN</code> operator. <!-- TODO(dancol):
        link to syntax --></p>

        <aside class="example code-example"><![CDATA[DCTV> SELECT 4`inches` IN cm;
4 IN cm [cm]
------------
       10.16]]></aside>

        <p>In the DCTV REPL, column headers that denote a quantity
        with unit list that unit in square brackets after the column
        name.  Above, we see <code>[cm]</code> at the end of the
        column name, indicating that the <code>10.16</code> is
        specified in terms of centimeters.</p>

        <p>DCTV's unit analysis also understands rates.  In the
        example below, DCTV gives a unit in terms of miles, because
        we're multiplying a rate, in miles per hour, by a unit of
        time.  The time unit here need not be the literal unit used in
        the rate: DCTV will convert units as needed.</p>

        <aside class="example code-example"><![CDATA[DCTV> SELECT 4`miles/hour` * 2`days`;
(4 * 2) [mi]
------------
         192]]></aside>
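
        <p>The arithmetic behind the two examples above can be
        sketched in Python.  This is not how DCTV implements units:
        the <code>Quantity</code> class, the <code>q</code> helper,
        and the small <code>UNITS</code> table are hypothetical names
        invented for the illustration.  The idea is that each quantity
        carries a magnitude in base units plus dimension exponents;
        multiplication and division combine the exponents, and
        conversion or addition across mismatched dimensions is an
        error.</p>

        <aside class="example code-example"><![CDATA[
```python
from fractions import Fraction

# Hypothetical unit table for this sketch (not DCTV's units.txt):
# name -> (size in base units, (length exponent, time exponent)).
# Base units are meters and seconds.
UNITS = {
    "cm": (Fraction(1, 100), (1, 0)),
    "inches": (Fraction(254, 10000), (1, 0)),
    "miles": (Fraction(1609344, 1000), (1, 0)),
    "hour": (Fraction(3600), (0, 1)),
    "days": (Fraction(86400), (0, 1)),
}


class Quantity:
    """A magnitude in base units plus its dimension exponents."""

    def __init__(self, magnitude, dims):
        self.magnitude = Fraction(magnitude)
        self.dims = dims

    def __mul__(self, other):
        # Multiplying quantities adds dimension exponents.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.magnitude * other.magnitude, dims)

    def __truediv__(self, other):
        # Dividing quantities subtracts dimension exponents.
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.magnitude / other.magnitude, dims)

    def __add__(self, other):
        # Addition only makes sense between like dimensions.
        if self.dims != other.dims:
            raise TypeError("incompatible dimensions")
        return Quantity(self.magnitude + other.magnitude, self.dims)

    def to(self, unit):
        """Return the magnitude expressed in `unit`, or raise TypeError."""
        size, dims = UNITS[unit]
        if dims != self.dims:
            raise TypeError("cannot convert to " + unit)
        return self.magnitude / size


def q(value, unit):
    """Build a Quantity from a number and a unit name."""
    size, dims = UNITS[unit]
    return Quantity(Fraction(value) * size, dims)


inches_in_cm = q(4, "inches").to("cm")
distance = (q(4, "miles") / q(1, "hour") * q(2, "days")).to("miles")
print(float(inches_in_cm), float(distance))
```
]]></aside>

        <p>In this sketch, the rate in miles per hour times a duration
        in days has the dimension of a length, so converting it to
        miles succeeds, while adding a length to a time raises an
        error.</p>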


        <h2><a name="differences">Differences from standard SQL</a></h2>

        <h3>Nested namespaces</h3>
        <p>Standard SQL provides a two-level namespace for tables:
        each table is named by an optional schema (followed by a dot),
        and then a table name.  DCTV, by contrast, allows for
        arbitrarily deep nesting of namespaces, with each namespace
        component separated by a period.  (SQL's standard syntax is a
        special case.)  We use the nested namespace syntax to talk
        about specific tables and views embedded in a "trace
        sub-namespace", which we form when we mount a trace into the
        global SQL namespace.</p>

        <h3>Keyword arguments</h3>

        <p>Normal SQL allows only positional arguments to function
        calls.  DCTV allows for Python-style keyword arguments as
        well, with each keyword separated from its value by the
        "=&gt;" token.  See the <a href="#syntax">syntax reference</a> for
        details.</p>

        <h3>Extended table-valued-function-call syntax</h3>
        <p>DCTV exposes some facilities as table-valued functions.
        The arguments to these functions are evaluated in a context
        different from normal SQL expression evaluation, and in this
        context, DCTV supports extended syntax, including the use of
        list and dictionary literals.  (Table-valued functions are
        Python functions and these list and dictionary literals become
        list and dict values inside calls.)  See the syntax reference
        for details.</p>

        <h3>Miscellaneous syntax extensions</h3>

        <p>DCTV is designed to minimize users fighting with the
        syntax.  Wherever SQL requires a list of something to be
        comma-separated, DCTV allows and ignores a trailing comma.
        Where SQL requires a list terminator (e.g., semicolons after
        each query statement), DCTV allows users to omit the list
        terminator.</p>

        <p>DCTV recognizes <code>&lt;&gt;</code> and the C-style
        <code>!=</code> operators as equivalent.</p>

        <p>DCTV provides the "spaceship" and "anti-spaceship"
        operators <code>&lt;=&gt;</code> and <code>&lt;!=&gt;</code>,
        respectively, which act like <code>==</code> and
        <code>!=</code>, except that they treat <code>NULL</code> as
        being equal to itself.  (MySQL calls these operators "null
        safe comparison operators".)</p>
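
        <p>The difference can be modeled in a few lines of Python,
        with <code>None</code> standing in for <code>NULL</code>.
        This is a sketch of the semantics only, not DCTV code: the
        function names are hypothetical, and the sketch assumes that
        ordinary equality follows the usual SQL rule that a comparison
        involving <code>NULL</code> yields <code>NULL</code>.</p>

        <aside class="example code-example"><![CDATA[
```python
# Illustrative sketch only: None plays the role of SQL NULL.
def sql_eq(a, b):
    """Ordinary equality: NULL on either side yields NULL (assumed)."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """Model of <=>: NULL equals NULL and differs from any other value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def null_safe_ne(a, b):
    """Model of <!=>: the negation of null_safe_eq; never NULL."""
    return not null_safe_eq(a, b)


for left, right in [(1, 1), (1, 2), (None, 1), (None, None)]:
    print(left, right, sql_eq(left, right),
          null_safe_eq(left, right), null_safe_ne(left, right))
```
]]></aside>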

        <p>In addition to the standard SQL <code>-- </code> comment
        prefix, DCTV allows the use of <code>#</code> as a
        Python-style comment prefix and the use of <code>/*</code> and
        <code>*/</code> for C-style block comments.</p>

        <h3>Missing features</h3>

        <p>DCTV does not implement some features of more traditional
        databases.  The following table summarizes the features not
        provided, whether we plan to provide them, and any additional
        relevant information.</p>

        <table class="general">
          <tr>
            <th>Feature</th>
            <th>Status</th>
            <th>Comment</th>
          </tr>
          <tr>
            <td>INSERT/UPDATE/DELETE</td>
            <td>Not planned</td>
            <td>DCTV is immutable</td>
          </tr>
          <tr>
            <td>SQL1999 window functions</td>
            <td>Planned</td>
            <td></td>
          </tr>
          <tr>
            <td>SQL/PL</td>
            <td>Planned</td>
            <td>Will be accelerated</td>
          </tr>
          <tr>
            <td>Recursive CTEs</td>
            <td>Planned</td>
            <td></td>
          </tr>
          <tr>
            <td>Correlated subqueries</td>
            <td>Planned</td>
            <td></td>
          </tr>
        </table>
        <h1><a name="syntax">Syntax reference</a></h1>
        <h2>SQL Statement list</h2>
        <p>The REPL accepts statement lists as top-level input.</p>
        <object class="sytax" data="sql-stmt-list.syntax.svg" type="image/svg+xml" />
        <h2>SQL statement</h2>
        <p>A given SQL statement is either a SELECT, which performs a
        query, or one of a few miscellaneous types of data management
        operation.</p>
        <object class="sytax" data="sql-stmt.syntax.svg" type="image/svg+xml" />
        <h2>SELECT</h2>
        <p>A SELECT is a combination of one or more "select core"
        statements (combined together with operators like
        <code>UNION</code>), all sorted and windowed.</p>
        <p>Note that span tables cannot be combined using
        SQL compound operators.</p>
        <p>Common table expressions are "local" views that exist only
        in the context of the following SELECT and
        subsequently-defined common table expressions.  (That is, the
        common table expressions have lexical scope, and the names are
        bound as with <code>let*</code> in Lisp.)</p>
        <object class="sytax" data="select-stmt.syntax.svg" type="image/svg+xml" />
        <h2>Regular select core</h2>

        <p>This diagram shows the syntax for the main body of a
        <code>SELECT</code> statement.  If the keyword
        <code>SPAN</code> appears after the <code>SELECT</code>, the
        "type" of the result of the <code>SELECT</code> is a span
        table; otherwise, it's a regular table.</p>

        <p>In <code>SPAN</code> mode, <code>SELECT</code> always
        includes the special columns <code>_ts</code>,
        <code>_duration</code>, and (if partitioned) the partition
        column in the selected column set.  <code>SELECT SPAN FROM
        ...</code> (with no column list between the <code>SPAN</code>
        and the <code>FROM</code>) indicates selecting
        <emph>only</emph> these special columns.  These special
        columns may not be specified "by hand" in the result-column
        list.</p>

        <p>The <code>table-or-join</code> clause describes the syntax
        for span join operators.  The <code>GROUP...USING SPANS</code>
        syntax describes a span group operation from the data model
        section; the <code>GROUP...USING PARTITION</code> syntax
        describes a span departition operation.  <code>GROUP BY</code>
        works exactly the same way it does in standard SQL.</p>
        <p><code>HAVING</code> in span mode always filters the
        <emph>generated</emph> spans; <code>WHERE</code> filters the
        <emph>inputs</emph> to any span join and grouping operations,
        analogously to the distinction between <code>WHERE</code> and
        <code>HAVING</code> in standard SQL.</p>
        <object class="sytax" data="regular-select.syntax.svg" type="image/svg+xml" />
        <h2>Result column</h2>
        <object class="sytax" data="result-column.syntax.svg" type="image/svg+xml" />
        <h2>Table or join specification</h2>
        <p>This element describes a "column source" from which a
        <code>SELECT</code> draws columns.  It can be a simple table
        name, a call to a table-valued function, a subquery (of which
        <code>VALUES</code> is a special case), or a join operation of
        other column sources.</p>
        <p>A comma joining two table specifications is equivalent to
        <code>INNER JOIN</code>.</p>
        <p><code>AS</code> assigns a local alias to one of these
        column sources, the alias being useful in expressions in the
        result-column clauses.  If a column list comes after the
        <code>AS</code>, the columns of the thing named with
        <code>AS</code> are renamed to match the columns in the column
        list that follows the <code>AS</code>, which must have the
        same length as the set of columns in the named thing.</p>
        <object class="sytax" data="table-or-join.syntax.svg" type="image/svg+xml" />
        <h2>Conventional join</h2>
        <p>A normal SQL join.</p>
        <object class="sytax" data="conventional-join.syntax.svg" type="image/svg+xml" />
        <h2>Span join</h2>
        <p>Describes a span join operation.  The <code>PARTITION
        AS</code> clause provides the name of the partition column in
        the resulting span table, which must be specified if the left
        and right span tables have partition columns with
        different names.</p>
        <object class="sytax" data="span-join.syntax.svg" type="image/svg+xml" />
        <h2>Span broadcast</h2>
        <p>A span broadcast operation.  In the
        <code>BROADCAST...INTO</code> variant, the unpartitioned span
        table is on the left, whereas in the
        <code>BROADCAST...FROM</code> variant, the unpartitioned span
        table is on the right.</p>
        <object class="sytax" data="span-broadcast.syntax.svg" type="image/svg+xml" />
        <h2>Table specification</h2>
        <p>Names a single table, either as a name of a table
        in the table namespace, a call to a table-valued
        function in the table namespace, or a subquery.</p>
        <object class="sytax" data="table-spec.syntax.svg" type="image/svg+xml" />
        <h2>Table-valued function arglist</h2>
        <p>The argument list for a table-valued function call.  Note
        the keyword arguments.</p>
        <object class="sytax" data="tvf-arglist.syntax.svg" type="image/svg+xml" />
        <h2>Table-valued function (TVF) expression</h2>
        <p>The syntax for an expression in TVF context.  Note the dict
        and list literal syntax.  A subquery is also a valid argument
        to a table-valued function!</p>
        <object class="sytax" data="tvf-expr.syntax.svg" type="image/svg+xml" />
        <h2>SQL expression</h2>
        <p>Syntax for expressions that can occur in a query
        outside TVF context.</p>
        <object class="sytax" data="expr.syntax.svg" type="image/svg+xml" />
        <h2>Function call argument list</h2>
        <p>The argument list for a call to a function in SQL
        expression context.  Note the keyword arguments and
        the optional <code>DISTINCT</code> keyword.</p>
        <object class="sytax" data="function-arglist.syntax.svg" type="image/svg+xml" />
        <h2>Data type names</h2>
        <p>List of allowed data type names.</p>
        <object class="sytax" data="type-name.syntax.svg" type="image/svg+xml" />
        <h2>Literal value syntax</h2>
        <p>Literal values can appear either as regular SQL expressions
        or as TVF expressions.  In TVF (i.e., Python) context,
        <code>TRUE</code> becomes <code>True</code>,
        <code>FALSE</code> becomes <code>False</code>, and
        <code>NULL</code> becomes <code>None</code>.</p>
        <object class="sytax" data="literal-value.syntax.svg" type="image/svg+xml" />
        <h2>Bind parameters</h2>
        <p>Represents a parameter substitution in a query.
        Positional argument numbers are assigned automatically to
        positional <code>?</code> substitutions without explicit
        numbers.  The assignment starts from zero and proceeds
        left-to-right during parsing, incrementing each unnumbered
        positional substitution's substitution number by one.
        Explicitly numbered substitutions do not affect this automatic
        numbering.</p>
        <object class="sytax" data="bind-parameter.syntax.svg" type="image/svg+xml" />
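        <p>The numbering rule described above can be sketched in
        Python.  The sketch deliberately avoids guessing at the
        concrete syntax for explicitly numbered substitutions: it
        takes a list with one entry per substitution in left-to-right
        order, where <code>None</code> marks an unnumbered
        <code>?</code> and an integer marks an explicit number.  The
        function name <code>number_parameters</code> is
        hypothetical.</p>

        <aside class="example code-example"><![CDATA[
```python
# Illustrative sketch only: models the numbering rule, not the parser.
def number_parameters(explicit_numbers):
    """Assign a substitution number to each parameter, left to right.

    explicit_numbers: one entry per substitution; None for an
    unnumbered `?`, or an int for an explicitly numbered one.
    """
    next_auto = 0
    assigned = []
    for explicit in explicit_numbers:
        if explicit is None:
            # Unnumbered: take the next automatic number.
            assigned.append(next_auto)
            next_auto += 1
        else:
            # Explicit numbers leave the automatic counter untouched.
            assigned.append(explicit)
    return assigned


print(number_parameters([None, 5, None, 0, None]))
```
]]></aside>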
        <h2>Numeric literal</h2>
        <object class="sytax" data="numeric-literal.syntax.svg" type="image/svg+xml" />
        <h2>VALUES list</h2>
        <p><code>VALUES</code> works just as it does in standard SQL
        and allows query authors to include data inline.  DCTV does
        not provide a <code>CREATE TABLE</code> function, but one can
        achieve a similar effect by using <code>CREATE VIEW</code>
        with a <code>VALUES</code> select-part.</p>
        <object class="sytax" data="values-list.syntax.svg" type="image/svg+xml" />
        <div />
        <object class="sytax" data="values-list-row.syntax.svg" type="image/svg+xml" />
        <h2>Common table expression</h2>
        <p>The common table expression part of a SQL query.
        The optional column list in the name performs the same
        column-renaming operation that the optional column list after
        <code>AS</code> does.</p>
        <object class="sytax" data="common-table-expression.syntax.svg" type="image/svg+xml" />
        <h2>Namespace prefix</h2>
        <p>Names a part of the DCTV namespace.</p>
        <object class="sytax" data="ns-prefix.syntax.svg" type="image/svg+xml" />
        <h2>Table namespace name</h2>
        <p>Describes a table in the table namespace.</p>
        <object class="sytax" data="table-ns-name.syntax.svg" type="image/svg+xml" />
        <h2>Table-valued-function name</h2>
        <p>Describes a table-valued-function in the table namespace.</p>
        <object class="sytax" data="tvf-ns-name.syntax.svg" type="image/svg+xml" />
        <h2>SQL function name</h2>
        <p>Describes a SQL function name in the function namespace.
        Note that the function namespace is distinct from the table
        namespace.</p>
        <object class="sytax" data="function-name.syntax.svg" type="image/svg+xml" />
        <h2>CREATE VIEW</h2>
        <p><code>CREATE VIEW</code> works just like it does in
        standard SQL.</p>
        <object class="sytax" data="create-view-stmt.syntax.svg" type="image/svg+xml" />
        <h2>DROP VIEW</h2>
        <p><code>DROP VIEW</code> works just like it does in
        standard SQL.</p>
        <object class="sytax" data="drop-view-stmt.syntax.svg" type="image/svg+xml" />
        <h2>DROP ALL</h2>
        <p><code>DROP ALL</code> drops everything from a prefix
        of the DCTV namespace.  It is useful for "unmounting"
        traces by detaching a trace sub-namespace from the global
        namespace.</p>
        <object class="sytax" data="drop-all-stmt.syntax.svg" type="image/svg+xml" />
        <h2>MOUNT TRACE</h2>
        <p>Mount trace "mounts" a trace file at a prefix of the trace
        namespace.  See the standard library section of this manual
        for a description of what's available under the mount
        prefix.</p>
        <object class="sytax" data="mount-trace-stmt.syntax.svg" type="image/svg+xml" />
        <h2>Ordering term</h2>
        <p>An ordering term in a <code>SELECT</code>.</p>
        <object class="sytax" data="ordering-term.syntax.svg" type="image/svg+xml" />
        <h2>SQL compound operators</h2>
        <p>Ways of combining multiple <code>SELECT</code> "cores".</p>
        <p>Not applicable to span tables.</p>
        <object class="sytax" data="compound-operator-name.syntax.svg" type="image/svg+xml" />
        <h2>Comment syntax</h2>
        <object class="sytax" data="comment-syntax.syntax.svg" type="image/svg+xml" />
        <h2>Operators</h2>
        <p>The following table gives the precedence
        of each operator.  Operators on the same row have
        the same precedence.</p>

        <table class="general" style="width:50%">
          <caption>Operator precedence</caption>
          <tr>
            <td>
              *
              /
              %
              //
            </td>
          </tr>
          <tr>
            <td>
              +
              -
            </td>
          </tr>
          <tr>
            <td>
              &lt;&lt;
              &gt;&gt;
            </td>
          </tr>
          <tr>
            <td>
              &amp;
            </td>
          </tr>
          <tr>
            <td>
              |
            </td>
          </tr>
          <tr>
            <td>
              BETWEEN
            </td>
          </tr>
          <tr>
            <td>
              =
              &lt;=&gt;
              &lt;!=&gt;
              &gt;=
              &gt;
              &lt;=
              &lt;
              !=
              &lt;&gt;
              ==
              IS
            </td>
          </tr>
          <tr>
            <td>
              NOT
            </td>
          </tr>
          <tr>
            <td>
              AND
            </td>
          </tr>
          <tr>
            <td>
              OR
            </td>
          </tr>
        </table>

        <table class="general" style="width:50%">
          <caption>Operator descriptions</caption>
          <tr>
            <th>Operator</th>
            <th>Description</th>
          </tr>
          <tr>
            <td>*</td>
            <td>Multiply</td>
          </tr>
          <tr>
            <td>/</td>
            <td>True division (yields a float)</td>
          </tr>
          <tr>
            <td>%</td>
            <td>Modulus</td>
          </tr>
          <tr>
            <td>//</td>
            <td>Floor division (truncates toward zero)</td>
          </tr>
          <tr>
            <td>+</td>
            <td>Addition</td>
          </tr>
          <tr>
            <td>-</td>
            <td>Subtraction</td>
          </tr>
          <tr>
            <td>&lt;&lt;</td>
            <td>Left shift</td>
          </tr>
          <tr>
            <td>&gt;&gt;</td>
            <td>Right shift</td>
          </tr>
          <tr>
            <td>&amp;</td>
            <td>Bitwise AND</td>
          </tr>
          <tr>
            <td>|</td>
            <td>Bitwise OR</td>
          </tr>
          <tr>
            <td>BETWEEN</td>
            <td>Standard SQL</td>
          </tr>
          <tr>
            <td>= ==</td>
            <td>Equality</td>
          </tr>
          <tr>
            <td>&lt;=&gt; IS</td>
            <td>NULL-safe equality</td>
          </tr>
          <tr>
            <td>&lt;!=&gt; IS NOT</td>
            <td>NULL-safe inequality</td>
          </tr>
          <tr>
            <td>&gt;=</td>
            <td>Greater than or equal</td>
          </tr>
          <tr>
            <td>&gt;</td>
            <td>Greater than</td>
          </tr>
          <tr>
            <td>&lt;=</td>
            <td>Less than or equal</td>
          </tr>
          <tr>
            <td>&lt;</td>
            <td>Less than</td>
          </tr>
          <tr>
            <td>!= &lt;&gt;</td>
            <td>Inequality</td>
          </tr>
          <tr>
            <td>NOT</td>
            <td>Logical negation</td>
          </tr>
          <tr>
            <td>AND</td>
            <td>Logical conjunction</td>
          </tr>
          <tr>
            <td>OR</td>
            <td>Logical disjunction</td>
          </tr>
        </table>

        <h1><a name="standard_library">Trace standard library</a></h1>
        <!-- TODO(dancol): document columns -->
        <p>DCTV queries operate on a single shared per-session
        namespace.  (Each leaf level of the namespace has distinct
        mappings for table-valued things and for SQL functions, but
        the non-leaf nodes are shared between the namespaces.)</p>
        <h2>Per-trace names</h2>
        <p>DCTV makes a trace available by "mounting" it under a
        namespace prefix.  Names beginning with this prefix then refer
        to the trace that was mounted.  Multiple traces can be mounted
        in the same session, and a single query can pull data from
        multiple traces.</p>
        <p>In the description below, <code>mytrace.</code>
        refers to an arbitrary trace mountpoint.</p>
        <dl>
          <dt><code>mytrace.raw_events.*</code></dt>
          <dd>
            <p>These tables provide access to the raw events embedded
            in the trace.  For example,
            <code>mytrace.raw_events.sched_switch</code> is a table of
            <code>sched_switch</code> events, with one column for each
            field in the ftrace event.</p>
            <p>The DCTV event parser has a special case for
            <code>trace_marker_write</code> events: we put each one in
            a table formed by the concatenation of the event name and
            the first part of the event payload.  For example,
            <code>`print|B`</code> refers to those
            <code>trace_marker_write</code> events that begin with the
            prefix <code>B|</code>, indicating the start of a
            synchronous application-defined trace event.  We need to
            write <code>mytrace.raw_events.`print|B`</code> instead of
            <code>mytrace.raw_events.print|B</code> because
            <code>|</code> is normally an operator, so to treat it as
            part of a table name, we need to escape it with
            backticks.</p>
            <aside class="note">This special case for
            <code>write</code> is an ugly hack that exists so that we
            can give a different "schema" (set of columns) to each
            different kind of write event depending on its
            payload.</aside>
          </dd>
          <dt><code>mytrace.scheduler.timeslices_p_cpu</code></dt>
          <dd>
            <p>This table is a span table partitioned by CPU
            representing the scheduler activity of the system.</p>
          </dd>
          <dt><code>mytrace.scheduler.cpufreq_p_cpu</code></dt>
          <dd>
            <p>This table is a span table partitioned by CPU
            representing the CPU frequency that each CPU is
            known to have.</p>
          </dd>
          <dt><code>mytrace.last_ts</code></dt>
          <dd>
            <p>Single-column, single-value event table giving the
            largest timestamp found in the trace.  It's useful for
            building spans that cover the whole trace, but see
            <code>quantize</code> immediately below.</p>
          </dd>
          <dt><code>mytrace.quantize(interval=&gt;NULL)</code></dt>
          <dd>
            <p>This table-valued function generates a payloadless span
            table that divides the trace timeline into fixed-size
            spans of duration <code>interval</code>.  This table is
            useful for quantizing the trace timeline into fixed-size
            blocks for display or analysis, and is designed to work
            with span group operations.</p>
            <p>If <code>interval</code> is <code>NULL</code>,
            generates a span table with one huge span covering the
            whole trace.</p>
            <aside class="example code-example"><![CDATA[SELECT SPAN SUM(_duration)/5s AS non_idle_ratio
FROM (SELECT SPAN * FROM my_cpu_timeslices WHERE pid != 0)
GROUP USING SPANS FROM mytrace.quantize(5s) ]]></aside>
          </dd>
        </dl>

        <h2>The DCTV namespace</h2>

        <p>DCTV-specific query functions live under the
        <code>dctv.</code> namespace prefix.</p>

        <dl>
          <dt><code><dfn><a name="time_series_to_spans">
            dctv.time_series_to_spans(*, sources, columns, partition=&gt;NULL)
          </a></dfn></code></dt>
          <dd>
            <p>This function implements the time series to span
            conversion operation described <a
            href="#time_series_to_span_conversion"> above</a>.</p>
            <p><code>sources</code> is a list of source
            specifications.  Each source specification is a dict with
            the following entries; entries are optional unless
            otherwise indicated.  As a convenience, a source
            specification can also be a list, the elements of which
            are turned into dict elements in the order given below.
            If a source specification is neither a dict nor a list, it
            is treated as if it were a dict with only the source
            element provided.  (This way, a bare table is a valid
            event source.)</p>
            <dl>
              <dt>source</dt>
              <dd>The event table providing the raw events that this routine
              turns into spans. Mandatory.</dd>
              <dt>role</dt>
              <dd>Either <code>"start"</code> or <code>"stop"</code>, defaulting to
              <code>"start"</code>.  A <code>"start"</code> source both starts and
              separates output spans; a <code>"stop"</code> source only ends spans
              that have already been started.</dd>
              <dt>partition</dt>
              <dd>Either a string naming the column by which this
              source is partitioned or <code>NULL</code>, indicating
              that the source is unpartitioned.  Defaults to
              <code>NULL</code>.</dd>
              <dt>timestamp</dt>
              <dd>The name of the column in the source providing the timestamp.
              Defaults to <code>"_ts"</code>.</dd>
              <dt>nickname</dt>
              <dd>An optional string assigning a name to this source that column
              specifications in <code>columns</code> can reference.</dd>
            </dl>
            <p><code>columns</code> is a list of column
            specifications, each representing one payload column in
            the generated span table.</p>

            <p>Each column specification is a dict with the elements
            below.  As a convenience, a column specification can also
            be a list, the elements of which are turned into dict
            elements in the order given below.  If a column
            specification is neither a dict nor a list, it must be a
            string, and it is treated as if it were a dict with only
            the column element set.  (This way, a simple string is a
            valid column descriptor in the case that we have only one
            source.)</p>

            <dl>
              <dt>column</dt>
              <dd>String naming the output column in the generated
              span table.  Mandatory.</dd>
              <dt>source</dt>
              <dd>Identifies the source that supplies this output
              column.  May be omitted when only one source is given to
              the call; otherwise, must either be a number (naming a
              source positionally) or a string (matching the nickname
              given to a source in its specification).</dd>
              <dt>source_column</dt>
              <dd>Name of the column in the source event table that
              supplies the value of the corresponding column in the
              output table.  Defaults to the name of the output
              column.</dd>
              <dt>edge</dt>
              <dd>Either <code>"rising"</code> or
              <code>"falling"</code>, defaulting to the former.
              Determines which event supplies the value of the column
              in the output table: the event that starts a span or the
              event that ends a span.</dd>
            </dl>
            <p><code>partition</code> is the name of the partition
            column in the output span table.  If it is specified, all
            sources must have their own partitions specified.  If it
            is not specified, then no source may be partitioned.</p>

            <aside class="example code-example"><![CDATA[SELECT SPAN * FROM dctv.time_series_to_spans(
  sources=>[my_raw_events_table],
  columns=>["foo", "bar", "qux"])]]></aside>

            <aside class="example code-example"><![CDATA[SELECT SPAN * FROM dctv.time_series_to_spans(
  sources=>[{source=>table1, partition=>"cpu", nickname=>"foo"},
            {source=>table2, partition=>"cpu", role=>"stop"}],
  columns=>[{column=>"total_things",
             source=>"foo",
             source_column=>"last_things",
             edge=>"falling"}],
  partition=>"cpu")]]></aside>
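            <p>The list shorthand described above can make calls
            somewhat more compact.  The sketch below assumes that list
            elements map onto dict entries strictly in the documented
            order (<code>source</code>, <code>role</code>,
            <code>partition</code>, <code>timestamp</code>,
            <code>nickname</code> for sources; <code>column</code>,
            <code>source</code>, <code>source_column</code>,
            <code>edge</code> for columns).  <code>table1</code> and
            <code>table2</code> are placeholder event tables, not
            tables from the standard library.</p>
            <aside class="example code-example"><![CDATA[SELECT SPAN * FROM dctv.time_series_to_spans(
  sources=>[[table1, "start", "cpu", "_ts", "foo"],
            [table2, "stop", "cpu"]],
  columns=>[["total_things", "foo", "last_things", "falling"]],
  partition=>"cpu")]]></aside>
            <p>Note that supplying a nickname in list form means
            spelling out the earlier entries too, including the
            default <code>"_ts"</code> timestamp column, so the dict
            form may still read better when only a few entries
            differ from their defaults.</p>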
            <p>This routine looks pretty ugly when called.  Most of
            the time, you want to use one of the pre-defined span
            tables in the <a href="#standard_library">standard
            library</a>, which call <code>time_series_to_spans</code>
            for you.</p>
          </dd>
          <dt><dfn><code>dctv.stack_history()</code></dfn></dt>
          <dd>
            <p>This table-valued function understands "nested" events,
            turning them into stacks for further analysis.  It
            generates a span table mapping time intervals to stack
            IDs.</p>
            <p>See the <a href="#stackification">stackification</a>
            sub-section of the data model section.</p>
            <!-- TODO(dancol): document argument list -->
          </dd>
          <dt><dfn><code>dctv.stack_contents()</code></dfn></dt>
          <dd>
            <p>This table-valued function generates the
            <em>contents</em> of the stack IDs generated by
            <code>dctv.stack_history</code>.</p>
            <p>See the <a href="#stackification">stackification</a>
            sub-section of the data model section.</p>
          </dd>
          <dt><dfn><code>dctv.generate_sequential_spans(start, stop, duration)</code></dfn></dt>
          <dd>
            <p>This table-valued function generates "synthetic" spans
            useful for a variety of purposes.  See the <a
            href="#span_generation">span generation</a> sub-section of
            the data model section above.</p>
            <p><code>start</code> is a timestamp at which the spans
            should start.  <code>stop</code> is the time at which the
            last span should end.  <code>duration</code> is the length
            of each generated span.  Output spans are generated with
            no gaps.</p>
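            <p>As a rough sketch, the query below asks for one second
            of back-to-back 100ms spans, which by the description
            above would be ten spans in all.  The argument values are
            illustrative assumptions, in particular the use of
            <code>0s</code> as a starting timestamp; they are not
            taken from a real trace.</p>
            <aside class="example code-example"><![CDATA[SELECT SPAN * FROM dctv.generate_sequential_spans(0s, 1s, 100ms)]]></aside>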
          </dd>
        </dl>

        <h1><a name="example">Worked example</a></h1>
        <p>After reading the manual above, you should be able to make
        sense of the following query.</p>
        <p><code>TODO(dancol):</code> expand this section.</p>
        <ol>
          <li>Extract from “print|B” a list of frame-start events.</li>
          <li>Take these events and, using
          <code>time_series_to_spans</code> in “start-start” mode,
          assemble a set of spans partitioning the trace timeline into
          frames.</li>
          <li>Select those frame-spans that lasted longer than 17ms,
          i.e., that took a long time to render.</li>
          <li>Intersect this bad-frame span set with the per-processor
          span table describing what the system is actually
          doing. Don’t consider the idle process.</li>
        </ol>
        <code class="blockquote"><![CDATA[WITH frames AS (SELECT SPAN * FROM dctv.time_series_to_spans(
                      sources=>[{source=>(SELECT * FROM trace.raw_events.`print|B`
                                          WHERE name='eglBeginFrame'),
                                 timestamp=>'ts'}],
                      columns=>[])),
           bad_frames AS (SELECT SPAN * FROM frames WHERE _duration > 17ms),
           bad_timeslices AS (SELECT SPAN * FROM trace.scheduler.timeslices_p_cpu
                                       SPAN BROADCAST FROM bad_frames)
      SELECT comm, cpu, SUM(_duration) AS totdur FROM bad_timeslices
      WHERE pid != 0
      GROUP BY comm, cpu
      ORDER BY totdur DESC
      LIMIT 20
        ]]></code>
      </main>
    </div>
  </body>
</html>
