Explains how to make aggregate queries over data. Samples demostrating some common use cases.

Overview

Aggregated queries allow you to perform aggregation operations over records in entities, like count, sum or average.

This feature is implemented following MongoDB’s aggregation framework, so you will notice many similarities. Here you will find all the information to write aggregate queries and you shouldn’t need to check MongoDB’s docs.

Basically there is a pipeline where the input are the records in the entity and from there you apply different operations to filter, sort and aggregate data. Here is a sample of an aggregate query:

var resultSet = sys.data.aggregate('contacts', [
  {match: {'company.isCustomer': true}},
  {group: {by: 'company', totalSkills: 'sum(numberOfSkills)'}}
]);
while (resultSet.hasNext()) {
  var result = resultSet.next();
  // the format of the result is a simple object you can stringify
  log(JSON.stringify(result));
}

Here in the pipeline we have two steps:

  1. Select only records associated with a customer
  2. Group by company and sum up the field numberOfSkills and store it in the field totalSkills

So the input of one step is the result of the previous one. That’s why it is a pipeline.

You will end up with a result set that contains records with the following structure for the example above:

{
  "company": {
    "id": "57fd2d61e4b0ce322b0c530d",
    "label": "Flashset"
  },
  "totalSkills": 12
}

As you can see you get a simple Javascript (it follows the JSON representation you can see in the REST API object instead of something like a record object. This is because the structure changes based on the operations in your pipeline and the result is not a record from an entity any longer. You should check each type’s documentation to verify what’s the format used in the JSON representation.

Aggregated queries, similar to other queries, can be expressed using a query map, a query builder and the REST API. Here you will find samples for all versions.

Operations

Count

Allows to count elements and store the result in a field. This field cannot contain special characters, only numbers and letters and will be the only one in the result.

Here are some samples of how you can use it:

// counts companies from New York
var resultSet = sys.data.aggregate('companies', [
  {match: {'address.state': 'NY'},
  count: 'counter_field'
  }
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// counts companies from New York
var query = sys.data.createAggregateQuery('companies');
query.match().field('address.state').equals('NY');
query.count().counterName('counter_field');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
counts companies from New York

PUT /data/companies/aggregate

[
  {"match": {"address.state": "NY"},
            "count": "counter_field"}
]

When you run the above query you will get something like this as the output:

{"counter_field":2}

Skip

Allows to skip a number of elements. This operator is configured with a positive integer and omits the first number of results.

Here are some samples of how you can use it:

// retrieves contacts skipping first 3 results
var resultSet = sys.data.aggregate('contacts', [
  {skip: 3}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// retrieves contacts skipping first 3 results
var query = sys.data.createAggregateQuery('contacts');
query.skip().step(3);
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
retrieves contacts skipping first 3 results

PUT /data/contacts/aggregate

[
  {"skip": 3}
]

Limit

Allows to limit number of elements. This operator is configured with a positive integer and retrieves the first number of results.

Here are some samples of how you can use it:

// retrieves first 5 contacts
var resultSet = sys.data.aggregate('contacts', [
  {limit: 5}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// retrieves first 5 contacts
var query = sys.data.createAggregateQuery('contacts');
query.limit().step(5);
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
retrieves first 5 contacts

PUT /data/contacts/aggregate

[
  {"limit": 5}
]

Unwind

Allows to unwind multivalued fields. This operator is configured with field, if this field is an array, it will retrieve the same record with a single value for selected field per each element in array.

Here are some samples of how you can use it:

// retrieves companies unwinding services
var resultSet = sys.data.aggregate('companies', [
  {unwind: {fieldPath: 'services', includeEmpty: true}}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// retrieves companies unwinding services
var query = sys.data.createAggregateQuery('companies');
query.unwind().path('services').includeEmpty(true);
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
retrieves companies unwinding services

PUT /data/companies/aggregate

[
  {"unwind": {"fieldPath": "services", "includeEmpty": true}}
]

When you run the above query you will get something like this as the output:

{"id":"57fd2d64e4b0ce322b0c8349","name":"Photolist","services":"SERVICE_A"} 
{"id":"57fd2d64e4b0ce322b0c8349","name":"Photolist","services":"SERVICE_C"} 
{"id":"57fd2d62e4b0ce322b0c6268","name":"DabZ","services":"SERVICE_A"} 
{"id":"57fd2d62e4b0ce322b0c6268","name":"DabZ","services":"SERVICE_B"} 
{"id":"57fd2d62e4b0ce322b0c6268","name":"DabZ","services":"SERVICE_D"} 
{"id":"57fd2d60e4b0ce322b0c50f0","name":"Mydo","services":null} 
{"id":"57fd2d64e4b0ce322b0c7d12","name":"Zazio","services":"SERVICE_C"} 
...

Lookup

Allows to lookup elements from other entities in current records. This operator makes an equivalent to a SQL left-join using the name of a foreign entity, a foreign field in that entity and a local field in current entity (set in query).

Here are some samples of how you can use it:

// retrieves companies with related contacts
var resultSet = sys.data.aggregate('companies', [
  {lookup: {localFieldPath: 'id', foreignFieldPath: 'company.id', foreignEntityName: 'contacts', as: 'relatedContacts'}}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// retrieves companies with related contacts
var query = sys.data.createAggregateQuery('companies');
query.lookup().localField('id').foreignField('company.id').foreignEntity('contacts').as('relatedContacts');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
retrieves companies with related contacts

PUT /data/companies/aggregate

[
  {"lookup": {
    "localFieldPath": "id", 
    "foreignFieldPath": "company.id", 
    "foreignEntityName": "contacts", 
    "as": "relatedContacts"}
    }
]

When you run the above query you will get something like this as the output:

{"id":"57fd2d64e4b0ce322b0c8349","name":"Photolist","relatedContacts":[{
                                                                           "id": "5506fc44c2eee3b1a702694c",
                                                                           "company": {
                                                                             "id": "5506fc43c2eee3b1a7026944",
                                                                             "label": "Photolist"
                                                                           },
                                                                           "firstName": "John",
                                                                           "lastName": "Doe",
                                                                           "email": "john.doe@abcinc.com",
                                                                           "phoneNumbers": [
                                                                             "3039514211",
                                                                             "3039514210"
                                                                           ]
                                                                         },
                                                                         {
                                                                           "id": "5506fc44c2eee3b1a702694d",
                                                                           "company": {
                                                                             "id": "5506fc43c2eee3b1a7026944",
                                                                             "label": "Photolist"
                                                                           },
                                                                           "firstName": "Martin",
                                                                           "lastName": "Smith",
                                                                           "email": "martin.smith@abcinc.com"
                                                                         }]} 
{"id":"57fd2d62e4b0ce322b0c6268","name":"DabZ","relatedContacts":[{"id": "5506fc44c2eee3b1a702694e",
                                                                   "company": {
                                                                     "id": "5506fc43c2eee3b1a7026946",
                                                                     "label": "DabZ"
                                                                   },
                                                                   "firstName": "William",
                                                                   "lastName": "Brown",
                                                                   "email": "william.brown@acme.com",
                                                                   "phoneNumbers": [
                                                                     "3039514211",
                                                                     "3039514210"
                                                                   ]}]} 
{"id":"57fd2d60e4b0ce322b0c50f0","name":"Mydo","relatedContacts":[]} 
{"id":"57fd2d64e4b0ce322b0c7d12","name":"Zazio","relatedContacts":[]} 
...

Match

Allows to filter records in the pipeline. You need to use this operation to select the records you want to use for your aggregation.

This filter works the same way as the ones for regular queries, which means you can pass a query map or query builder (if you are using the Javascript API). Keep in mind that only filters by fields are supported here while other parameters are ignored.

Here are some sample of matching operations:

// counts the number of skills for contacts of customers
// see how the match operator filters contacts where company is a customer
var resultSet = sys.data.aggregate('contacts', [
  {match: {'company.isCustomer': true}},
  {group: {totalNumberOfSkills: 'sum(numberOfSkills)'}}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// counts the number of skills for contacts of customers
// see how the match operator filters contacts where company is a customer
var query = sys.data.createAggregateQuery('contacts');
query.match().field('company.isCustomer').equals(true);
query.group().accumulate('totalNumberOfSkills').sum('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
counts the number of skills for contacts of customers
// see how the match operator filters contacts where company is a customer

PUT /data/contacts/aggregate

[
  {"match": {"company.isCustomer": true}},
  {"group": {"totalNumberOfSkills": "sum(numberOfSkills)"}}
]
// counts the number of skills for contacts
// the match operator filters contacts by customers and state equals to New Jersey
var resultSet = sys.data.aggregate('contacts', [
  {match: {
    'company.isCustomer': true, 
    'address.state': 'NJ'
  }},
  {group: {
    totalNumberOfSkills: 'sum(numberOfSkills)'
  }}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// counts the number of skills for contacts
// the match operator filters contacts by customers and state equals to New Jersey
var query = sys.data.createAggregateQuery('contacts');
query.match()
    .field('company.isCustomer').equals(true)
    .field('address.state').equals('NJ');
query.group().accumulate('totalNumberOfSkills').sum('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
counts the number of skills for contacts
// the match operator filters contacts by customers and state equals to New Jersey

PUT /data/contacts/aggregate

[
  {"match": {
    "company.isCustomer": true, 
    "address.state": "NJ"
  }},
  {"group": {
    "totalNumberOfSkills": "sum(numberOfSkills)"
  }}
]

Sort

Allows to change the sorting of records in your aggregation pipeline. This is useful for sorting the final result or to use together with accumulators like first() or last() in the group operator.

Here are some samples of how you can use it:

// finds the contact with more skills per company
var resultSet = sys.data.aggregate('contacts', [
  {sort: {'numberOfSkills': 'desc'}},
  {group: {
    by: 'company', 
    firstName: 'first(firstName)', 
    lastName: 'first(lastName)', 
    skills: 'first(numberOfSkills)'
  }}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// finds the contact with more skills per company
var query = sys.data.createAggregateQuery('contacts');
query.sort().by('numberOfSkills', 'desc');
query.group()
    .by('company')
    .accumulate('firstName').first('firstName')
    .accumulate('lastName').first('lastName')
    .accumulate('skills').first('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
finds the contact with more skills per company

PUT /data/contacts/aggregate

[
  {"sort": {"numberOfSkills": "desc"}},
  {"group": {
    "by": "company", 
    "firstName": "first(firstName)", 
    "lastName": "first(lastName)", 
    "skills": "first(numberOfSkills)"
  }}
]
// finds the contact with more skills per company and sorts the result by skills and last name
var resultSet = sys.data.aggregate('contacts', [
  {sort: {'numberOfSkills': 'desc'}},
  {group: {
    by: 'company', 
    firstName: 'first(firstName)', 
    lastName: 'first(lastName)', 
    skills: 'first(numberOfSkills)'
  }},
  {sort: {'skills': 'desc', 'lastName': 'asc'}}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// finds the contact with more skills per company and sorts the result by skills and last name
var query = sys.data.createAggregateQuery('contacts');
query.sort().by('numberOfSkills', 'desc');
query.group()
    .by('company')
    .accumulate('firstName').first('firstName')
    .accumulate('lastName').first('lastName')
    .accumulate('skills').first('numberOfSkills');
query.sort()
    .by('skills', 'desc')
    .by('lastName', 'asc');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
finds the contact with more skills per company and sorts the result by skills and last name

PUT /data/contacts/aggregate

[
  {"sort": {"numberOfSkills": "desc"}},
  {"group": {
    "by": "company", 
    "firstName": "first(firstName)", 
    "lastName": "first(lastName)", 
    "skills": "first(numberOfSkills)"
  }},
  {"sort": {"skills": "desc", "lastName": "asc"}}
]

Project

Allows to remove fields from records and reduce memory usage. This is important if you need to process a lot of records but you just need a few fields.

Here are some samples of how to use it:

// leaves only number of employees and the sums up
var resultSet = sys.data.aggregate('companies', [
  {project: 'name,numberOfEmployees'}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// leaves only number of employees and the sums up
var query = sys.data.createAggregateQuery('companies');
query.project()
    .field('name')
    .field('numberOfEmployees');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
leaves only number of employees and the sums up

PUT /data/companies/aggregate

[
  {"project": "name,numberOfEmployees"}
]

When you run the above query you will get something like this as the output:

{"id":"57fd2d64e4b0ce322b0c8349","name":"Photolist","numberOfEmployees":83} 
{"id":"57fd2d62e4b0ce322b0c6268","name":"DabZ","numberOfEmployees":635} 
{"id":"57fd2d60e4b0ce322b0c50f0","name":"Mydo","numberOfEmployees":917} 
{"id":"57fd2d64e4b0ce322b0c7d12","name":"Zazio","numberOfEmployees":618} 
...

You can see that only fields name and numberOfEmployees were included in the output (and id that is always there).

Group

This is the most important operation as it is the one that will actually aggregate data. It allows to group a set of records based on some fields and accumulate the results of those records into a field.

For example:

// calculate the total number of skills per company
var resultSet = sys.data.aggregate('contacts', [
  {project: 'company,numberOfSkills'},
  {group: {by: 'company', totalSkills: 'sum(numberOfSkills)'}}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// calculate the total number of skills per company
var query = sys.data.createAggregateQuery('contacts');
query.project()
    .field('company')
    .field('numberOfSkills');
query.group()
    .by('company')
    .accumulate('totalSkills').sum('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
calculate the total number of skills per company

PUT /data/contacts/aggregate

[
  {"project": "company,numberOfSkills"},
  {"group": {"by": "company", "totalSkills": "sum(numberOfSkills)"}}
]

For this query you will get a result set like this one:

{"company":{"id":"57fd2d60e4b0ce322b0c503d","label":"Jabbercube"},"totalSkills":4}
{"company":{"id":"57fd2d63e4b0ce322b0c75b0","label":"Skilith"},"totalSkills":4}
{"company":{"id":"57fd2d61e4b0ce322b0c5dc6","label":"Trupe"},"totalSkills":4}
{"company":{"id":"57fd2d60e4b0ce322b0c4c22","label":"Feedbug"},"totalSkills":8}
{"company":{"id":"57fd2d62e4b0ce322b0c65ce","label":"Realcube"},"totalSkills":8}
{"company":{"id":"57fd2d61e4b0ce322b0c530d","label":"Flashset"},"totalSkills":12}
{"company":{"id":"57fd2d60e4b0ce322b0c51fe","label":"Thoughtworks"},"totalSkills":8}
{"company":{"id":"57fd2d60e4b0ce322b0c4e02","label":"Mynte"},"totalSkills":0}

Here you can see how data was grouped by the fields indicated in the by option (in this case the company field) and then you have accumulators defined that are added as fields in the output (in this case the totalSkills field).

Basically the by option indicates how records will be groups. Records that have the same value(s) in the fields listed in this option will belong to the same group. You can put more than one field if you separate the names using commas or calling by() multiple times in the query builder.

Then you can define any number of accumulators. Each accumulator will be a field in the output and must use one of the accumulator operations available:

count()

This will return the number of records in the group:

// counts contacts on each company
var resultSet = sys.data.aggregate('contacts', [
  {project: 'company'},
  {group: {by: 'company', numberOfContacts: 'count()'}}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// counts contacts on each company
var query = sys.data.createAggregateQuery('contacts');
query.project()
    .field('company')
    .field('numberOfSkills');
query.group()
    .by('company')
    .accumulate('numberOfContacts').count();
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
counts contacts on each company

PUT /data/contacts/aggregate

[
  {"project": "company,numberOfSkills"},
  {"group": {"by": "company", "numberOfContacts": "count()"}}
]

sum(field)

Sums up the value in one field for the records in the group:

// calculate the total number of skills per company
var resultSet = sys.data.aggregate('contacts', [
  {project: 'company,numberOfSkills'},
  {group: {by: 'company', totalSkills: 'sum(numberOfSkills)'}}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// calculate the total number of skills per company
var query = sys.data.createAggregateQuery('contacts');
query.project()
    .field('company')
    .field('numberOfSkills');
query.group()
    .by('company')
    .accumulate('totalSkills').sum('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
calculate the total number of skills per company

PUT /data/contacts/aggregate

[
  {"project": "company,numberOfSkills"},
  {"group": {"by": "company", "totalSkills": "sum(numberOfSkills)"}}
]

This only works for number fields like integer, money, decimal or percentage.

avg(field)

Calculates the average of the values in one field for the records in the group:

// calculate the average number of skills per company per contact
var resultSet = sys.data.aggregate('contacts', [
  {project: 'company,numberOfSkills'},
  {group: {by: 'company', avgSkills: 'avg(numberOfSkills)'}}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// calculate the average number of skills per company per contact
var query = sys.data.createAggregateQuery('contacts');
query.project()
    .field('company')
    .field('numberOfSkills');
query.group()
    .by('company')
    .accumulate('avgSkills').avg('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
calculate the average number of skills per company per contact

PUT /data/contacts/aggregate

[
  {"project": "company,numberOfSkills"},
  {"group": {"by": "company", "avgSkills": "avg(numberOfSkills)"}}
]

This only works for number fields like integer, money, decimal or percentage.

Keep in mind that if the value of the field is null or the field isn’t present at all in the record it won’t count for the average calculation.

first(field)

It will select the value of the first record as the value of the output. This is usually used in combination with the sort operator:

// finds the contact with more skills per company
var resultSet = sys.data.aggregate('contacts', [
  {sort: {'numberOfSkills': 'desc'}},
  {group: {
    by: 'company', 
    firstName: 'first(firstName)', 
    lastName: 'first(lastName)', 
    skills: 'first(numberOfSkills)'
  }}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// finds the contact with more skills per company
var query = sys.data.createAggregateQuery('contacts');
query.sort().by('numberOfSkills', 'desc');
query.group()
    .by('company')
    .accumulate('firstName').first('firstName')
    .accumulate('lastName').first('lastName')
    .accumulate('skills').first('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
finds the contact with more skills per company

PUT /data/contacts/aggregate

[
  {"sort": {"numberOfSkills": "desc"}},
  {"group": {
    "by": "company", 
    "firstName": "first(firstName)", 
    "lastName": "first(lastName)", 
    "skills": "first(numberOfSkills)"
  }}
]

last(field)

It will select the value of the last record as the value of the output. This is usually used in combination with the sort operator:

// finds the contact with more skills per company
var resultSet = sys.data.aggregate('contacts', [
  {sort: {'numberOfSkills': 'asc'}},
  {group: {
    by: 'company', 
    firstName: 'last(firstName)', 
    lastName: 'last(lastName)', 
    skills: 'last(numberOfSkills)'
  }}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// finds the contact with more skills per company
var query = sys.data.createAggregateQuery('contacts');
query.sort().by('numberOfSkills', 'asc');
query.group()
    .by('company')
    .accumulate('firstName').last('firstName')
    .accumulate('lastName').last('lastName')
    .accumulate('skills').last('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
finds the contact with more skills per company

PUT /data/contacts/aggregate

[
  {"sort": {"numberOfSkills": "asc"}},
  {"group": {
    "by": "company", 
    "firstName": "last(firstName)", 
    "lastName": "last(lastName)", 
    "skills": "last(numberOfSkills)"
  }}
]

max(field)

It will select the maximum value from all records record as the value of the output.

// finds the maximum number of skills per company for one contact
var resultSet = sys.data.aggregate('contacts', [
  {group: {
    by: 'company', 
    skills: 'max(numberOfSkills)'
  }}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// finds the maximum number of skills per company for one contact
var query = sys.data.createAggregateQuery('contacts');
query.group()
    .by('company')
    .accumulate('skills').max('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
finds the maximum number of skills per company for one contact

PUT /data/contacts/aggregate

[
  {"group": {
    "by": "company", 
    "skills": "max(numberOfSkills)"
  }}
]

min(field)

It will select the minimum value from all records record as the value of the output.

// finds the minimum number of skills per company for one contact
var resultSet = sys.data.aggregate('contacts', [
  {group: {
    by: 'company', 
    skills: 'min(numberOfSkills)'
  }}
]);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
// finds the minimum number of skills per company for one contact
var query = sys.data.createAggregateQuery('contacts');
query.group()
    .by('company')
    .accumulate('skills').min('numberOfSkills');
var resultSet = sys.data.aggregate(query);
while (resultSet.hasNext()) {
  log(JSON.stringify(resultSet.next()));
}
finds the minimum number of skills per company for one contact

PUT /data/contacts/aggregate

[
  {"group": {
    "by": "company", 
    "skills": "min(numberOfSkills)"
  }}
]

Limitations

Aggregated queries work well for small and moderated data sets. If you need to perform analysis over big data sets we recommend moving that information to other services especifically designed with that purpose in mind like Google Big Query or Amazon Redshift.

Back to top