Remove duplicate values from a Map in Java
Overview
This article builds on an excellent article from Baeldung.
It explores one or two additional ways to remove duplicate values from a Map that are not mentioned in that article.
Introduction
I am using Java 17 and start with the following map:
Map<String, String> placesAndCapitals = Map.of("antartica", "new town",
"fantasia", "small ville",
"mars", "big city",
"sun", "hot plazza",
"underworld", "new town");
We see that the value "new town" appears twice.
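As a quick sanity check, we can confirm the duplicate by trying to add each value to a Set; values that fail to be added are duplicates (a minimal, self-contained sketch; the class name DuplicateCheck is mine, not from the original article):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DuplicateCheck {
    public static void main(String[] args) {
        Map<String, String> placesAndCapitals = Map.of("antartica", "new town",
                "fantasia", "small ville",
                "mars", "big city",
                "sun", "hot plazza",
                "underworld", "new town");
        // Set.add returns false when the element is already present,
        // so every false result flags one duplicate value
        Set<String> seen = new HashSet<>();
        long duplicates = placesAndCapitals.values().stream()
                .filter(v -> !seen.add(v))
                .count();
        System.out.println(duplicates); // 1 ("new town" appears twice)
    }
}
```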
The original Baeldung article explores a couple of ways to remove duplicates, so let's see how we could do it differently.
Filter with a custom predicate
We can stream the entries of the Map and filter distinct values with a custom predicate. The predicate keeps track of values that have already been seen (a technique also used in another Baeldung article).
Predicate<Map.Entry<String, String>> checkDistinct = new Predicate<>() {
Map<String, Boolean> alreadySeen = new HashMap<>();
@Override
public boolean test(Map.Entry<String, String> entry) {
return alreadySeen.putIfAbsent(entry.getValue(), Boolean.TRUE) == null;
}
};
Map<String, String> result = placesAndCapitals.entrySet()
.stream()
.filter(checkDistinct)
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
The predicate uses an internal Map named alreadySeen and makes use of putIfAbsent. putIfAbsent returns null for keys that are not yet in the Map, and also inserts the given value for the missing key. Subsequent calls to putIfAbsent with keys already present return the existing value (here Boolean.TRUE).
The type of alreadySeen is Map<String, Boolean>; however, you can choose any value type you fancy. As long as you insert a value with putIfAbsent, that should be fine.
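To see the return-value semantics of putIfAbsent in isolation, here is a small sketch (the class name PutIfAbsentDemo is just for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        Map<String, Boolean> alreadySeen = new HashMap<>();
        // First call: "new town" is absent, so it is inserted and null is returned
        Boolean first = alreadySeen.putIfAbsent("new town", Boolean.TRUE);
        // Second call: "new town" is present, so the existing mapping is returned
        Boolean second = alreadySeen.putIfAbsent("new town", Boolean.TRUE);
        System.out.println(first == null); // true  -> the predicate accepts the entry
        System.out.println(second);        // true  -> the predicate rejects the duplicate
    }
}
```

This is exactly the test performed in checkDistinct: a null return means the value was seen for the first time.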
Using Reduce
We can use reduce
to achieve our goal.
Let us do it first with lambda functions.
HashMap<String, String> result = placesAndCapitals.entrySet()
.stream()
.reduce(new HashMap<>(),
(acc, item) -> {
if (!acc.containsValue(item.getValue())) {
acc.put(item.getKey(), item.getValue());
}
return acc;
},
(mapA, mapB) -> {
mapA.putAll(mapB);
return mapA;
});
Here we use acc, which is itself a Map, to accumulate the results. acc is initialized with new HashMap<>().
If acc does not already contain a given value, we add the key and the associated value.
We have to return something from the lambdas passed to reduce; acc is reused for the next element in the stream (and mapA is returned by the combiner).
We use the overload of reduce which takes three parameters:
<U> U reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)
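To see this three-parameter overload in isolation, here is a minimal sketch on a simpler stream (the variable names are my own): the identity is the starting value, the accumulator folds each element into the partial result, and the combiner merges partial results when the stream runs in parallel.

```java
import java.util.List;

public class ReduceDemo {
    public static void main(String[] args) {
        List<String> words = List.of("mars", "sun", "fantasia");
        // identity: 0; accumulator: add each word's length to the running total;
        // combiner: sum two partial totals (only relevant for parallel streams)
        int totalLength = words.stream()
                .reduce(0,
                        (acc, word) -> acc + word.length(),
                        Integer::sum);
        System.out.println(totalLength); // 15
    }
}
```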
We can refactor a little:
placesAndCapitals.entrySet()
.stream()
.reduce(new HashMap<>(), AppUtils.accumulator, AppUtils.combiner);
public class AppUtils {
private AppUtils() {}
static BiFunction<Map<String, String>, Map.Entry<String, String>, Map<String, String>> accumulator =
(acc, item) -> {
if (!acc.containsValue(item.getValue())) {
acc.put(item.getKey(), item.getValue());
}
return acc;
};
static BinaryOperator<Map<String, String>> combiner =
(mapA, mapB) -> {
mapA.putAll(mapB);
return mapA;
};
}
I've simply extracted the lambdas into a utility class.
It's a matter of taste; you could do it differently.
Using the stream API twice
Similarly to what is done in the Baeldung article, one can use the Stream API twice in a row.
This is a slight variation of the original article's approach, just for fun.
PS: prefer what is done in the original article; I think it is more straightforward.
placesAndCapitals.entrySet().stream()
.collect(Collectors.groupingBy(Map.Entry::getValue))
.entrySet().stream()
.collect(Collectors.toMap(e -> e.getValue().get(0).getKey(), Map.Entry::getKey));
We use groupingBy to group entries by value; we end up with a Map<String, List<Map.Entry<String, String>>>, where the keys are the values of the original Map.
We then stream the entry set of that intermediate Map and keep the key of the first element (an Entry<K,V>) in the List associated with each key.
Each list of Entry<K,V> contains at least one element, so get(0) is safe.
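To make the shape of the intermediate Map concrete, here is a sketch showing just the groupingBy step (the class name GroupingDemo is mine):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingDemo {
    public static void main(String[] args) {
        Map<String, String> placesAndCapitals = Map.of("antartica", "new town",
                "fantasia", "small ville",
                "mars", "big city",
                "sun", "hot plazza",
                "underworld", "new town");
        // Group entries by their value: the duplicated value "new town"
        // maps to a list of two entries, every other value to a list of one
        Map<String, List<Map.Entry<String, String>>> grouped =
                placesAndCapitals.entrySet().stream()
                        .collect(Collectors.groupingBy(Map.Entry::getValue));
        System.out.println(grouped.size());                 // 4 distinct values
        System.out.println(grouped.get("new town").size()); // 2 entries share it
    }
}
```

The second collect then reduces each list back to a single (key, value) pair.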
Conclusion
In this article, we've built on an article from Baeldung to explore approaches to removing duplicate Map values.
The complete source code is available on GitHub.