Querying

The API provides a simple, but powerful query language in addition to iterating directly over entity attributes. Using queries can often substantially speed up your code as well as reduce the amount of code written.

A query is issued using Session.query() and returns a list of matching entities. The query always has a single target entity type that the query is built against. This means that you cannot currently retrieve back a list of different entity types in one query, though using projections does allow retrieving related entities of a different type in one go.

The syntax for a query is:

select <projections> from <entity type> where <criteria>

However, both the selection of projections and criteria are optional. This means the most basic query is just to fetch all entities of a particular type, such as all projects in the system:

projects = session.query('Project')

A query always returns a QueryResult instance that acts like a list with some special behaviour. The main special behaviour is that the actual query to the server is not issued until you iterate or index into the query results:

for project in projects:
    print(project['name'])

You can also explicitly call all() on the result set:

projects = session.query('Project').all()

Note

This behaviour exists in order to make way for efficient paging and other optimisations in future.

Using criteria to narrow results

Often you will have some idea of the entities you want to retrieve. In this case you can optimise your code by not fetching more data than you need. To do this, add criteria to your query:

projects = session.query('Project where status is active')

Each criteria follows the form:

<attribute> <operator> <value>

You can inspect the entity type or instance to find out which attributes are available to filter on for a particular entity type. The list of operators that can be applied and the types of values they expect is listed later on.

Combining criteria

Multiple criteria can be applied in a single expression by joining them with either and or or:

projects = session.query(
    'Project where status is active and name like "%thrones"'
)

You can use parenthesis to control the precedence when compound criteria are used (by default and takes precedence):

projects = session.query(
    'Project where status is active and '
    '(name like "%thrones" or full_name like "%thrones")'
)

Filtering on relationships

Filtering on relationships is also intuitively supported. Simply follow the relationship using a dotted notation:

tasks_in_project = session.query(
    'Task where project.id is "{0}"'.format(project['id'])
)

This works even for multiple strides across relationships (though do note that excessive strides can affect performance):

tasks_completed_in_project = session.query(
    'Task where project.id is "{0}" and '
    'status.type.name is "Done"'
    .format(project['id'])
)

The same works for collections (where each entity in the collection is compared against the subsequent condition):

import arrow

tasks_with_time_logged_today = session.query(
    'Task where timelogs.start >= "{0}"'.format(arrow.now().floor('day'))
)

In the above query, each Task that has at least one Timelog with a start time greater than the start of today is returned.

When filtering on relationships, the conjunctions has and any can be used to specify how the criteria should be applied. This becomes important when querying using multiple conditions on collection relationships. The relationship condition can be written against the following form:

<not?> <relationship> <has|any> (<criteria>)

For optimal performance has should be used for scalar relationships when multiple conditions are involved. For example, to find notes by a specific author when only name is known:

notes_written_by_jane_doe = session.query(
    'Note where author has (first_name is "Jane" and last_name is "Doe")'
)

This query could be written without has, giving the same results:

notes_written_by_jane_doe = session.query(
    'Note where author.first_name is "Jane" and author.last_name is "Doe"'
)

any should be used for collection relationships. For example, to find all projects that have at least one metadata instance that has key=some_key and value=some_value the query would be:

projects_where_some_key_is_some_value = session.query(
    'Project where metadata any (key=some_key and value=some_value)'
)

If the query was written without any, projects with one metadata matching key and another matching the value would be returned.

any can also be used to query for empty relationship collections:

users_without_timelogs = session.query(
    'User where not timelogs any ()'
)

Supported operators

This is the list of currently supported operators:

Operators

Description

Example

= is

Exactly equal.

name is “martin”

!= is_not

Not exactly equal.

name is_not “martin”

> after greater_than

Greater than exclusive.

start after “2015-06-01”

< before less_than

Less than exclusive.

end before “2015-06-01”

>=

Greater than inclusive.

bid >= 14400

<=

Less than inclusive.

bid <= 14400

in

One of.

status.type.name in (“In Progress”, “Done”)

not_in

Not one of.

status.name not_in (“Omitted”, “On Hold”)

like

Matches pattern.

name like “%thrones”

not_like

Does not match pattern.

name not_like “%thrones”

has

Test scalar relationship.

author has (first_name is “Jane” and last_name is “Doe”)

any

Test collection relationship.

metadata any (key=some_key and value=some_value)

Optimising using projections

In Understanding sessions we mentioned auto-population of attribute values on access. This meant that when iterating over a lot of entities and attributes a large number of queries were being sent to the server. Ultimately, this can cause your code to run slowly:

>>> projects = session.query('Project')
>>> for project in projects:
...     print(
...         # Multiple queries issued here for each attribute accessed for
...         # each project in the loop!
...         '{project[full_name]} - {project[status][name]})'
...         .format(project=project)
...     )

Fortunately, there is an easy way to optimise. If you know what attributes you are interested in ahead of time you can include them in your query string as projections in order to fetch them in one go:

>>> projects = session.query(
...     'select full_name, status.name from Project'
... )
>>> for project in projects:
...     print(
...         # No additional queries issued here as the values were already
...         # loaded by the above query!
...         '{project[full_name]} - {project[status][name]})'
...         .format(project=project)
...     )

Notice how this works for related entities as well. In the example above, we also fetched the name of each Status entity attached to a project in the same query, which meant that no further queries had to be issued when accessing those nested attributes.

Note

There are no arbitrary limits to the number (or depth) of projections, but do be aware that excessive projections can ultimately result in poor performance also. As always, it is about choosing the right tool for the job.

You can also customise the Default projections to use for each entity type when none are specified in the query string.

Aggregate data with group by

Group by can be used in a query to aggregate data. For example, to find the number of tasks in each status:

select count(id) from Task group by status_id

Read more about how to use group by in Aggregate results with group by

Queries that include group by clauses must be executed using the session.call() method versus session.query().

Not all entities support group by yet. The most common ones that already support it include Task and Project. You can find a full list of supported entities in the API reference documentation in your ftrack workspace. It available under the Help menu and “group by” is visible on entities that support it if you are on ftrack 4.13+.

Note

group by is available since ftrack 4.12