Splunk Dedup Command Tutorial: Remove Duplicate Events

The dedup command removes duplicate events from your search results. If you're seeing the same log entry multiple times or getting repeated data from different sources, dedup cleans it up so you see each unique record only once. It's one of the first commands you'll reach for when dealing with noisy data.

What the Dedup Command Does

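Dedup filters out duplicate events based on the fields you specify. You tell it which fields to compare, and it keeps only the first occurrence of each unique combination, where "first" means first in the order the search returns events. Later events that match exactly on those fields are discarded. By default, events that are missing any of the specified fields are dropped as well; add keepempty=true to retain them.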

This is different from the stats command. Where stats aggregates and summarizes data, dedup removes events that repeat the same values in the fields you specify and leaves the surviving events intact, with all of their original fields.

Basic Dedup Syntax

The simplest form looks like this:

... | dedup field_name

This keeps the first occurrence of each unique value in field_name and removes all subsequent duplicates. If you have fifty events with the same user ID, you'll get back just one.
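If you want to try this without touching real data, a small generated data set is enough. The search below is a sketch: makeresults and streamstats create six placeholder events numbered 1 to 6, and the user_id values (alice, bob, carol) are invented for illustration.

```spl
| makeresults count=6
| streamstats count AS n
| eval user_id = case(n<=3, "alice", n<=5, "bob", true(), "carol")
| dedup user_id
| table n user_id
```

Three events come back: n=1 for alice, n=4 for bob, and n=6 for carol. In each case it is the first event that carried that user_id.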

You can dedup on multiple fields:

source="auth.log" | dedup user_id, source_ip

Now dedup removes events where both user_id and source_ip match previous entries. It only considers an event a duplicate if both fields have the same values as a previous event.
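Here is the same idea on generated data, as a rough sketch. The four events and their user_id and source_ip values are made up: events 1 and 2 are identical on both fields, event 3 has the same user but a different IP, and event 4 is a different user.

```spl
| makeresults count=4
| streamstats count AS n
| eval user_id = if(n<=3, "alice", "bob")
| eval source_ip = if(n==3, "10.0.0.2", "10.0.0.1")
| dedup user_id source_ip
| table n user_id source_ip
```

Only event 2 is removed. Event 3 survives because its source_ip differs, even though the user_id repeats.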

Dedup Limits and Control

By default, dedup keeps the first occurrence. You can specify how many events to keep for each value by putting a number before the field list:

source="errors.log" | dedup 2 error_code

This keeps the first two occurrences of each unique error_code value and removes the rest. Useful when you want to see a few examples of each error but not hundreds.
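A quick way to see the count in action is a generated data set. In this sketch the count goes before the field list, and the seven events and the error codes E500 and E404 are placeholders: five events carry E500 and two carry E404.

```spl
| makeresults count=7
| streamstats count AS n
| eval error_code = if(n<=5, "E500", "E404")
| dedup 2 error_code
| table n error_code
```

Four events remain: n=1 and n=2 for E500, and n=6 and n=7 for E404. The other three E500 events are dropped.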

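You can also attach a sortby clause to control ordering:

source="events.log" | dedup user_id sortby -_time

The sortby clause sorts by the specified field, and a minus sign means descending order, so results here are ordered newest-first. Keep in mind that a historical search already returns events newest-first, so the default "first occurrence" is usually the most recent event. If you need to guarantee which occurrence survives, sort explicitly before dedup, as described under Dedup with Sorting.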

Practical Example: Cleaning Up Duplicate Logs

Imagine you have logs flowing from multiple agents and they're sending duplicates. Here's how to clean it:

source="application.log" | dedup host, message

This keeps one event for each unique combination of host and message. If Server A and Server B log the same message, you get both. But if Server A logs the same message twice, you see it only once.

Now your data is cleaner and you can run stats or other analysis without inflated counts from duplicates.

Dedup with Sorting

Sorting before dedup changes which occurrence you keep. Sort by _time ascending and dedup keeps the earliest event. Sort by _time descending and it keeps the latest:

source="metrics.log" | sort 0 - _time | dedup metric_name, status

This sorts events newest-first, then keeps the most recent occurrence of each metric_name and status combination. Great for getting the latest state of something. The 0 tells sort to return every result; without it, sort stops at 10,000 results by default.
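To see the effect of the sort, you can generate events whose timestamps run oldest to newest. This is a sketch with invented values: four events one minute apart, alternating between the metric names cpu and memory, all with status ok. The 0 after sort lifts its default result limit, which makes no difference on four events but matters on real data.

```spl
| makeresults count=4
| streamstats count AS n
| eval _time = _time - (4 - n) * 60
| eval metric_name = if(n%2==1, "cpu", "memory")
| eval status = "ok"
| sort 0 -_time
| dedup metric_name status
| table _time n metric_name status
```

With the sort in place, dedup keeps n=4 (memory) and n=3 (cpu), the newest event for each metric. Remove the sort line and it keeps n=1 and n=2 instead, because those come first in the generated order.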

Dedup vs. Uniq vs. Stats

Dedup is similar to other commands but works differently. The uniq command also removes duplicates, but it takes no field arguments: it drops a result only when it is an exact duplicate of the result immediately before it. Stats aggregates and counts, replacing your events with summary rows, while dedup removes duplicates and keeps the original events and their field values.

If you need exact counts of occurrences, use stats count by field_name. If you need to see each unique record once, use dedup.
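The difference is easy to see on generated data. In this sketch, five placeholder events carry the made-up values a (three times) and b (twice) in field_name.

```spl
| makeresults count=5
| streamstats count AS n
| eval field_name = if(n<=3, "a", "b")
| stats count by field_name
```

The stats version returns two summary rows: a with a count of 3 and b with a count of 2. The original events and the n field are gone. Swap the last line for dedup field_name and you get two of the original events back instead (n=1 and n=4), with every field intact but no counts.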

Dedup with Complex Field Combinations

You can dedup on computed fields created with eval:

source="transactions.log" | eval hour = strftime(_time, "%H") | dedup user_id, hour

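This creates an hour field from the timestamp, then deduplicates so you see each user only once per hour of the day. Useful for tracking user activity patterns without repetitive entries. Because %H yields only the hour (00 to 23), a search that spans several days also merges the same hour across days; use a format such as "%Y-%m-%d %H" to keep the days separate.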

You can also dedup on part of a field's value by extracting that part with another command first:

source="web.log" | eval domain = replace(url, "^(?:https?://)?([^/]+).*$", "\1") | dedup domain, status_code

This extracts the domain from the URL (everything up to the first slash after an optional http:// or https:// prefix), then keeps one occurrence of each domain and status code combination.
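It is worth checking an extraction like this on a few known values before trusting it. The sketch below uses eval's replace function with a regular expression, and assumes the url field holds a domain with an optional http:// or https:// prefix. The three URLs and the status code are invented test data.

```spl
| makeresults count=3
| streamstats count AS n
| eval url = case(n==1, "https://example.com/login", n==2, "https://example.com/cart", true(), "https://example.org/")
| eval status_code = 200
| eval domain = replace(url, "^(?:https?://)?([^/]+).*$", "\1")
| dedup domain status_code
| table url domain status_code
```

Two events remain, one for example.com (the /login URL) and one for example.org. The /cart event is dropped because it shares both domain and status_code with the first event.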

Dedup Performance Considerations

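Dedup works on all data in your pipeline. It doesn't aggregate, so you're still working with individual events. It does have to remember every distinct combination of values it has already seen, so its cost grows with the number of unique values in the fields you specify. Deduplicating on a few fields with a limited set of values is cheap; deduplicating on _raw over a large volume of data keeps far more in memory and slows the search.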

However, dedup only shrinks your result set when there are duplicates to remove. If you have millions of events that are all unique on the fields you specify, dedup keeps all of them. The optional count controls how many events are kept per value:

source="logs.log" | dedup 1 user_id

This keeps only the first occurrence of each user_id, which is also what you get when you leave the number out. To cap the total number of events returned, pipe the results to the head command.
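One way to cap the total is the head command, which keeps the first N results. In this sketch, ten generated events are spread across five made-up user IDs, and the cap of 3 is an arbitrary number chosen for the example.

```spl
| makeresults count=10
| streamstats count AS n
| eval user_id = "user_" . tostring(n % 5)
| dedup user_id
| head 3
| table n user_id
```

Dedup reduces the ten events to five (one per user), and head then trims that to the first three: n=1, n=2, and n=3.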

Common Dedup Patterns

Remove exact duplicate events:

... | dedup _raw

The _raw field holds the original text of each event, so two events count as duplicates only when their raw text is identical. This effectively removes perfect duplicate lines.

Keep only the most recent occurrence of each user:

... | sort - _time | dedup user_id

See which hosts are generating which errors, without duplicates:

source="errors.log" | dedup host, error_type

This gives you one example of each error type per host, making it easy to see the variety of problems without repetition.

When to Use Dedup

Use dedup when your data contains redundant copies of the same event and you want to see each unique event once, not multiple times. It's perfect for agent-generated logs, APIs that retry requests, or systems that send the same event across multiple channels.

Avoid dedup when you want to count occurrences. If you need to know how many times something happened, use stats instead. Dedup discards the duplicates, so any count you run afterwards reflects unique records, not total occurrences.

You now understand how dedup cleans up duplicate events and keeps your search results clear and focused. Start using it whenever you notice duplicate entries in your searches.
