Rules for the Intermediate Model in SQL Generation

Overview

Each KQL query is transformed into an intermediate object model prior to SQL generation.

Class diagram

All transformation rules operate exclusively on this intermediate object model. After rule processing is complete, the final SQL statement is generated from the transformed model.


Transformation Rules

1. Push Logical Expressions to Outer-Joined Tables

Rule: PushLogicalExpressionRule

Optional linked tables may contain additional filter conditions. In SQL, such conditions must be placed inside the ON clause of the OUTER JOIN. If they are placed in the WHERE clause, the OUTER JOIN semantics are destroyed and effectively converted into an INNER JOIN.

PushLogicalExpressionRule moves logical expressions from the SELECT filter or having to the appropriate joined table.

Example

Move the expression o.order_data BETWEEN DATE '2023-01-01' AND DATE '2023-12-31' to table orders.

    // give customers and optional count of orders in 2023

    FIND customers c, c+orders o
    FILTER o.order_data BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
    FETCH c.company_name, count(o)

Expected SQL

    SELECT
    c.company_name
    , count(o.order_id)
    FROM
     customers c
     LEFT OUTER JOIN orders o ON
      c.customer_id = o.customer_id
     AND
      o.order_data BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
    GROUP BY
     c.company_name

Incorrect SQL (Without Rule Application)

    SELECT
    c.company_name
    , count(o.order_id)
    FROM
     customers c
     LEFT OUTER JOIN orders o ON
      c.customer_id = o.customer_id
    WHERE 
      o.order_data BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
    GROUP BY
     c.company_name

This invalid transformation destroys OUTER JOIN behavior. The user explicitly requested an optional count of orders; therefore, the LEFT OUTER JOIN semantics must be preserved.


2. Add GROUP Expressions for Non-Aggregate OUT Expressions

Rule: GroupRule

If at least one aggregate expression appears in the OUT clause, SQL requires all non-aggregate expressions to be included in the GROUP BY clause.

In the previous example, count(o) is detected as an aggregate expression. Therefore, c.company_name is automatically added to the GROUP BY clause.


3. Push Aggregate Filter Expressions to HAVING Clause

Rule: HavingRule

Filter expressions that contain aggregate functions must not appear in the WHERE clause. Instead, they must be placed in the HAVING clause.

Example

    // Find customers who have placed more than 10 orders in January 2023,
    // return companyname and count, sort by count.
    
    FIND customers c, c-orders o
    FILTER count(o) > 10 AND o.order_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
    FETCH c.company_name, count(o) DESC

This query uses an INNER JOIN. The expression count(o) > 10 is an aggregate expression.

Expected SQL

    SELECT
      c.company_name
    , count(o.order_id)
    FROM
     customers c
      INNER JOIN orders o ON
       c.customer_id = o.customer_id
    WHERE
     o.order_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
    GROUP BY
     c.company_name
    HAVING
     count(o.order_id) > 10
    ORDER BY
     count(o.order_id) DESC

Placing count(o) > 10 inside the WHERE clause would result in invalid SQL.


4. Push Expressions to Tables

Expressions are classified as:

  • Homogeneous expressions — reference at most one table.
  • Heterogeneous expressions — reference two or more tables.

Homogeneous expressions are assigned to their referenced table. Heterogeneous expressions remain at the SELECT level.

4.1 Push OUT Expressions

Rule: PushOutRule

Homogeneous expressions in the OUT clause of SELECT are pushed down to the referenced table.


4.2 Push GROUP Expressions

Rule: PushGroupRule

Homogeneous expressions in the GROUP of SELECT clause are pushed down to the referenced table.


4.3 Push ORDER Expressions

Rule: PushOrderRule

Homogeneous expressions in the ORDER clause of SELECT are pushed down to the referenced table.


5. Identity Rule

Rule: IdentityRule

KQL supports intuitive expressions such as:

count(o)

This differs from SQL, where a column reference (e.g., count(o.order_id)) or count(*) is required.

The IdentityRule transforms entity-based count expressions into valid SQL column-based count expressions.

Both table-count and column-count semantics are supported.


6. Infer OUT Expressions for Query Blocks

Rule: InferJoinColumnsToBlockRule

KQL abstracts join columns and does not require them to be explicitly returned.

When query blocks (CTEs) are linked to other tables, required join columns must be added automatically to the intermediate model to ensure valid SQL generation.

InferJoinColumnsToBlockRule performs this enrichment.

Example

    // find the sum of unit_price * quantity for orders
    // order by sum and limit result to first row
    // join with employees and return last_name, first_name and phone number
    
    
    WITH sales AS (
    FIND orders o, o-order_details d
    FILTER sum(d.unit_price * d.quantity) sum DESC
    LIMIT 1
    )
    FIND employees e, e-sales s
    FETCH e.last_name, e.first_name, e.home_phone

The sales query block is linked to employees. Therefore, the join column o.employee_id must be added automatically to the block.

Expected SQL

    WITH sales AS (
    SELECT
      sum(d.unit_price * d.quantity) AS sum
    , o.employee_id
    FROM
     orders o
      INNER JOIN order_details d ON
       o.order_id = d.order_id
    GROUP BY
      o.employee_id
    ORDER BY
      sum DESC
    FETCH FIRST 1 ROWS ONLY
    )
    SELECT
      e.last_name
    , e.first_name
    , e.home_phone
    FROM
     employees e
      INNER JOIN sales s ON
       e.employee_id = s.employee_id

Without automatic inference of the join column, the JOIN condition in the outer query would be invalid.


Summary

The intermediate model enables deterministic, rule-based SQL generation. All transformation rules operate on the structured object representation of the KQL query to ensure:

  • Preservation of join semantics
  • Correct aggregate handling
  • Valid SQL clause placement
  • Proper grouping and ordering
  • Automatic join column inference for query blocks

This layered approach ensures that generated SQL remains syntactically valid and semantically consistent with the original KQL intent.