Aggregate Functions
This list includes functions that perform statistical calculations over groups of data.
avg(<value>)
This function returns the average, or mean, of the values in a field.
Usage
You can use this function with the stats
and timechart
commands.
Example
-
The following example returns the average of the values in the
latitude
field for each distinct value in thecity
field:... | stats avg(latitude) BY city
-
The following example displays a timechart of the average of the
latency
field by city:... | timechart avg(latency) BY city
Use-Case Example
Calculate Average Response Time for Web Services
Problem: A web administrator wants to determine the average response time for different web services to identify which services might need optimization.
Solution: Use the stats
command with the avg
function to calculate the average response time for each web service.
Implementation:
index=weblog sourcetype=access_combined
| stats avg(response_time) AS avg_response_time BY service
| sort - avg_response_time
min(<value>)
This function returns the minimum value in a field.
Usage
This function processes field values as numbers if possible, otherwise processes field values as strings. You can use this function with the stats
and timechart
commands. This command cannot be used on non-numeric fields.
Example
-
The following example calculates the minimum value of the
latitude
field for locations in the southern hemisphere (latitude < 0) and groups the results byweekday
:... | stats min(eval(latitude < 0)) AS min BY weekday
-
The following example finds the minimum
latency
value:... | stats min(latency)
-
The following example displays a timechart of the minimum latency over time for each city:
... | timechart min(latency) BY city
Use-Case Example
Identify the Minimum CPU Utilization per Minute per Server
Problem: You aim to monitor the performance of various servers in your network by identifying the minimum CPU utilization recorded, to ensure that no server is showing abnormal behavior which could indicate issues.
Solution: Utilize the stats
command in conjunction with the min
function to calculate the minimum CPU utilization for each server. This method allows for a straightforward identification of servers that may be underperforming or experiencing issues, by highlighting instances of unusually low CPU utilization.
Implementation:
... | stats min(cpu_utilization) AS MinCPUUtilization BY host
Explanation:
- This command calculates the minimum CPU Utilization (
cpu_utilization
field) for each server (host
field). Themin
function is used to find the lowest value of CPU utilization for each unique server. This approach is useful for monitoring server performance and quickly identifying any servers that may be experiencing issues.
max(<value>)
This function returns the maximum value in a field.
Usage
This function processes field values as numbers if possible, otherwise processes field values as strings. You can use this function with the stats
and timechart
commands. This command cannot be used on non-numeric fields.
Example
-
The following example returns the maximum value of the
latency
field:... | stats max(latency)
-
The following example calculates the maximum value of the
latitude
field for locations in the northern hemisphere (latitude > 0) and groups the results byweekday
:... | stats max(eval(latitude > 0)) AS max BY weekday
-
The following example displays a timechart of the maximum latency over time for each city:
... | timechart max(latency) BY city
Use-Case Example
Identify the Maximum CPU Utilization per Server Over Time
Problem: A system administrator wants to monitor the maximum CPU utilization for each server over time to identify potential overloads.
Solution: Use the stats
command to calculate the maximum CPU utilization per server. Alternatively, use the timechart
command to visualize the maximum CPU utilization over time for each server.
Implementation with stats
:
Assuming the data is structured with fields for server
, time
, and cpu_usage
, the following SPL query calculates the maximum CPU utilization per server:
index=server_metrics sourcetype=cpu_usage
| stats max(cpu_usage) AS max_cpu_usage BY server
Explanation:
- This query filters logs to those related to CPU usage metrics.
- The
stats
command calculates the maximumcpu_usage
value for eachserver
.
Implementation with timechart
:
To visualize the maximum CPU utilization over time for each server:
index=server_metrics sourcetype=cpu_usage
| timechart span=1m max(cpu_usage) BY server
Explanation:
- This query also filters logs to those related to CPU usage metrics.
- The
timechart
command creates a time series chart with a 1-minute interval (span=1m
), showing the maximumcpu_usage
for eachserver
over time. - This visualization helps in identifying trends and potential peaks in CPU utilization across different servers.
Output:
- The
stats
command output lists each server with its maximum CPU utilization. - The
timechart
command output will be a time series chart, with time on the x-axis and CPU utilization on the y-axis, displaying lines for each server to indicate how the maximum CPU utilization varies over time.
range(<value>)
Returns the difference between the maximum and minimum values in a field.
Usage
The values in the field must be numeric. You can use this function with the stats
and timechart
commands.
Example
- The following command calculates the range of latency for each city, providing a simple measure of variability:
... | stats range(latency) AS LatencyRange BY city
Detailed Example
- For a more in-depth analysis, suppose you want to analyze network latency across different cities, focusing on peak hours (9 AM to 5 PM) during weekdays. You aim to identify cities with high latency variability and peak latency times:
... | where date_wday!="saturday" AND date_wday!="sunday"
| where date_hour>=9 AND date_hour<=17
| eval CityLatency=if(latency>200, "High", "Normal")
| stats min(latency) AS MinLatency, max(latency) AS MaxLatency, range(latency) AS LatencyRange BY city
| sort - LatencyRange
- The following example displays a timechart of the range of latency over time for each city:
... | timechart span=1h range(latency) BY city
Use-Case Example
Analyze Temperature Fluctuations Across Different Locations
Problem: A meteorologist wants to identify locations with the highest temperature fluctuations over a 24-hour period.
Solution: Use the stats
command with the range
function to calculate the temperature range for each location.
Implementation:
index=weather sourcetype=temperature_readings
| eval hour=strftime(_time, "%H")
| stats min(temperature) AS min_temp, max(temperature) AS max_temp, range(temperature) AS temp_range BY location
| sort - temp_range
Explanation:
- This query filters logs to those containing temperature readings.
- The
eval
command extracts the hour from the timestamp. - The
stats
command calculates the minimum, maximum, and range of temperature for each location. - Results are sorted in descending order of temperature range.
- The output will show each location with its minimum and maximum temperatures, as well as the temperature range, helping identify areas with the highest temperature fluctuations.
sum(<value>)
This function returns the sum of the values in a field.
Usage
You can use this function with the stats
and timechart
commands.
Example
-
To calculate the total revenue from subscriptions for all events:
... | stats sum(revenue)
-
For summing revenue from subscriptions per month per city:
... | stats sum(revenue) AS "total revenue" BY city, month
-
The following example displays a timechart of the sum of revenue over time per city:
... | timechart span=1m sum(revenue) BY city
Use-Case Example
Calculate Total Sales by Product Category
Problem: A sales manager wants to determine the total sales for each product category to identify top-performing categories.
Solution: Use the stats
command with the sum
function to calculate the total sales for each product category.
Implementation:
index=sales sourcetype=transactions
| stats sum(sale_amount) AS total_sales BY product_category
| sort - total_sales
Explanation:
- This query filters logs to those related to sales transactions.
- The
stats
command calculates the sum of sale_amount for each product_category. - Results are sorted in descending order of total sales.
- The output will show each product category and its corresponding total sales, helping identify the top-performing categories.
count(<value>) or c(<value>)
This function returns the number of occurrences in a field.
Usage
To use this function, you can specify count(<value>)
, or the abbreviation c(<value>)
. This function processes field values as strings. You can use this function with the stats
and timechart
commands.
Example
-
Count the number of events in each city:
... | stats count by city
-
The following example displays a timechart of the count of events over time, split by city:
... | timechart count by city
Use-Case Example
Analyze Website Traffic by Source
Problem: A digital marketer wants to determine which traffic sources are bringing the most visitors to the website.
Solution: Use the stats
command with the count
function to count the number of visits from each traffic source.
Implementation:
index=web sourcetype=access_combined
| stats count AS visits BY referrer_domain
| sort - visits
| head 10
Explanation:
- This query filters logs to those related to web access.
- The
stats
command counts the occurrences for eachreferrer_domain
. - Results are sorted in descending order of visit count.
- The
head
command limits the output to the top 10 results. - The output will show the top 10 referrer domains and their corresponding visit counts, helping identify the most effective traffic sources.
distinct_count(<value>) or dc(<value>)
This function returns the count of distinct values in a field.
Usage
To use this function, you can specify distinct_count(<value>)
, or the abbreviation dc(<value>)
. This function processes field values as strings. You can use this function with the stats
and timechart
commands.
Example
-
Calculate the number of unique zip codes from which events are reported:
... | stats dc(zip)
-
For each city, calculate the distinct count of zip codes with events reporting latency greater than 100ms:
... | where latency > 100 | stats dc(zip) AS UniqueZipCodes by city
-
The following example displays a timechart of the distinct count of zip codes reporting events over time:
... | timechart dc(zip)
Use-Case Example
Analyze User Engagement Across Different Devices
Problem: A product manager wants to understand how many unique users are engaging with their application across different device types.
Solution: Use the stats
command with the distinct_count
function to count the number of unique users for each device type.
Implementation:
index=app_usage sourcetype=user_sessions
| stats dc(user_id) AS unique_users BY device_type
| sort - unique_users
Explanation:
- This query filters logs to those related to app usage and user sessions.
- The
stats
command calculates the distinct count ofuser_id
for eachdevice_type
. - Results are sorted in descending order of unique user count.
- The output will show each device type and its corresponding number of unique users, helping understand user engagement across different devices.