Remove duplicate values from a Map in Java


Overview

This article builds on the excellent article from Baeldung.

It explores a couple of ways to remove duplicate values from a Map that are not mentioned in that article.

💡
This is an updated version of this article; I have modified one of the code samples (see below regarding potential memory leaks).

Introduction

I am using Java 17 and start with the following map:

Map<String, String> placesAndCapitals = Map.of("antarctica", "new town",
        "fantasia", "small ville",
        "mars", "big city",
        "sun", "hot plaza",
        "underworld", "new town");

We see that the value "new town" appears twice.
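
A quick check of my own (not from the original article) confirms it:

long distinctValues = placesAndCapitals.values().stream().distinct().count();
// prints "4 distinct values for 5 entries"
System.out.println(distinctValues + " distinct values for " + placesAndCapitals.size() + " entries");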

The original Baeldung article explores a couple of ways to remove duplicates, so let's see how we could do it differently.

Filter with a custom predicate

We can stream the entries of the Map and filter out duplicate values with a custom predicate. The predicate keeps track of values that have already been seen (I believe this technique is used in another Baeldung article).

Predicate<Map.Entry<String, String>> checkDistinct = new Predicate<>() {
    // Values encountered so far; putIfAbsent tells us whether a value is new
    Map<String, Boolean> alreadySeen = new HashMap<>();

    @Override
    public boolean test(Map.Entry<String, String> entry) {
        // putIfAbsent returns null the first time a value is seen, so the entry
        // passes the filter; on later occurrences it returns Boolean.TRUE
        return alreadySeen.putIfAbsent(entry.getValue(), Boolean.TRUE) == null;
    }
};

Map<String, String> result = placesAndCapitals.entrySet()
    .stream()
    .filter(checkDistinct)
    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

The predicate uses an internal Map named alreadySeen and makes use of putIfAbsent.

putIfAbsent returns null for a key that is not yet in the Map, and inserts the given value for that key as a side effect.
Subsequent calls to putIfAbsent with a key that is already present return the existing value (Boolean.TRUE here), so the predicate returns false and the duplicate is filtered out.
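
To illustrate putIfAbsent in isolation, here is a minimal sketch (variable names are my own):

Map<String, Boolean> seen = new HashMap<>();
System.out.println(seen.putIfAbsent("new town", Boolean.TRUE)); // null: first occurrence, Boolean.TRUE is inserted
System.out.println(seen.putIfAbsent("new town", Boolean.TRUE)); // true: key already present, existing value returned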

The value type of alreadySeen is Boolean; however, you can choose anything you fancy.
As long as you insert a non-null value with putIfAbsent, that should be fine.

Note that alreadySeen grows with every distinct value and lives as long as the predicate instance, so avoid holding on to the predicate longer than necessary (this is the potential memory leak mentioned at the top of this article). Also note that Map.of gives no guarantee about iteration order, so which of the duplicate keys ends up in the result is unspecified.

Using Reduce

We can use reduce to achieve our goal.
Let us do it first with lambda expressions.

HashMap<String, String> result = placesAndCapitals.entrySet()
         .stream()
         .reduce(new HashMap<>(),      // identity: the map we accumulate into
               (acc, item) -> {
                     // keep the entry only if its value has not been accumulated yet
                     if (!acc.containsValue(item.getValue())) {
                        acc.put(item.getKey(), item.getValue());
                     }
                     return acc;
               },
               (mapA, mapB) -> {
                     // combiner, only used by parallel streams
                     mapA.putAll(mapB);
                     return mapA;
               });

Here we use acc, which is itself a Map, to accumulate the result.
acc is initialized with new HashMap<>().
If acc does not already contain the entry's value, we add the key and its value.
We have to return something from the lambdas passed to reduce; acc is reused for the next element in the stream (and mapA is returned by the combiner).
We use the overload of reduce that takes three parameters:
<U> U reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)
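
A word of caution, which is my own note rather than part of the original article: mutating the identity inside reduce technically breaks its contract, since the identity HashMap would be shared across chunks of a parallel stream, and the putAll combiner would merge duplicates back in anyway. On a sequential stream it works, but collect is the API designed for mutable reduction. A minimal sketch of the same logic with collect:

Map<String, String> result = placesAndCapitals.entrySet()
        .stream()
        .collect(HashMap::new,      // each chunk gets its own container
                (acc, entry) -> {
                    // same check as the reduce accumulator
                    if (!acc.containsValue(entry.getValue())) {
                        acc.put(entry.getKey(), entry.getValue());
                    }
                },
                Map::putAll);       // merges partial results (not called for a sequential stream)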

We can refactor this a little:

Map<String, String> result = placesAndCapitals.entrySet()
         .stream()
         .reduce(new HashMap<>(), AppUtils.accumulator, AppUtils.combiner);

import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class AppUtils {
    private AppUtils() {}

    static BiFunction<Map<String, String>, Map.Entry<String, String>, Map<String, String>> accumulator =
            (acc, item) -> {
                if (!acc.containsValue(item.getValue())) {
                    acc.put(item.getKey(), item.getValue());
                }
                return acc;
            };

    static BinaryOperator<Map<String, String>> combiner =
            (mapA, mapB) -> {
                mapA.putAll(mapB);
                return mapA;
            };
}

I've simply extracted the lambdas into a utility class.
It's a matter of taste; you could do it differently.

Using the stream API twice

Similarly to what is done in the Baeldung article, we can use the Stream API twice in a row.
This is a slight variation of what is done in the original article, just for fun.

PS: prefer the approach from the original article; I think it is more straightforward.

Map<String, String> result = placesAndCapitals.entrySet()
      .stream()
      .collect(Collectors.groupingBy(Map.Entry::getValue))
      .entrySet()
      .stream()
      .collect(Collectors.toMap(e -> e.getValue().get(0).getKey(), Map.Entry::getKey));

We use groupingBy to group the entries by value; we end up with an intermediate Map<String, List<Map.Entry<String, String>>>, keyed by the values of the original Map.
We then stream the entry set of that intermediate Map and, for each of its entries, keep the key of the first element (an Entry<K,V>) of the associated List.
Each list of Entry<K,V> contains at least one element, so get(0) is safe.
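
To make the intermediate step concrete, here is a sketch of what the grouped Map contains (the contents are shown as comments, keeping in mind that Map.of does not guarantee any order):

Map<String, List<Map.Entry<String, String>>> grouped = placesAndCapitals.entrySet()
        .stream()
        .collect(Collectors.groupingBy(Map.Entry::getValue));

// "new town"    -> [antarctica=new town, underworld=new town]
// "small ville" -> [fantasia=small ville]
// "big city"    -> [mars=big city]
// "hot plaza"   -> [sun=hot plaza]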

Conclusion

In this article, we've built on an article from Baeldung to explore approaches to removing duplicate Map values.

The complete source code is available on GitHub.
