GOOGLE ADS

jeudi 21 avril 2022

Elasticsearch Date Histogram aggregations results mismatch

I have taken the user count from the elastic search index. same query but different histogram interval types like Day, Month, Week, Quarter, and year the counts are not matching correctly

note: totally I have only 4 months of data for this year

This is for month interval ES search query

{
"query": {
"range": {
"@timestamp": {
"gte": "2022-01-01",
"lte": "2022-04-14"
}
}},
"aggs": {
"dt": {
"date_histogram": {
"field": "@timestamp",
"interval": "month",
"format": "yyyy-MM-dd"

},
"aggs": {
"events": {
"nested": {
"path": "events"
},

"aggs": {
"unique_user_count": {
"cardinality": {
"field": "events.actor.id.keyword"
}
}
}
}}}}}


got below Month results (response)

{
"aggregations": {
"dt": {
"buckets": [
{
"key_as_string": "2022-01-01",
"key": 1640995200000,
"doc_count": 2930,
"events": {
"doc_count": 13988,
"unique_user_count": {
"value": 37
}
}
},
{
"key_as_string": "2022-02-01",
"key": 1643673600000,
"doc_count": 36910,
"events": {
"doc_count": 175151,
"unique_user_count": {
"value": 580
}
}
},
{
"key_as_string": "2022-03-01",
"key": 1646092800000,
"doc_count": 24861,
"events": {
"doc_count": 133383,
"unique_user_count": {
"value": 555
}
}
},
{
"key_as_string": "2022-04-01",
"key": 1648771200000,
"doc_count": 6005,
"events": {
"doc_count": 30730,
"unique_user_count": {
"value": 170
}
}
}
]
}
}
}

Again I run the same query but changed intervals = Year

{
"query": {
"range": {
"@timestamp": {
"gte": "2022-01-01",
"lte": "2022-04-14"
}
}},
"aggs": {
"dt": {
"date_histogram": {
"field": "@timestamp",
"interval": "year",
"format": "yyyy-MM-dd"

},
"aggs": {
"events": {
"nested": {
"path": "events"
},

"aggs": {
"unique_user_count": {
"cardinality": {
"field": "events.actor.id.keyword"
}
}
}
}}}}}


I got the below Year response

{
"aggregations": {
"dt": {
"buckets": [
{
"key_as_string": "2022-01-01",
"key": 1640995200000,
"doc_count": 70706,
"events": {
"doc_count": 353252,
"unique_user_count": {
"value": 1007
}
}
}
]
}
}
}

My Expected results like this
year = 37+580+555+170
year = 1342 ----> but I got "1007" wrong values

how do match the sum(month) value and year values?


Solution of the problem

In your monthly buckets you're running a cardinality aggregation to get the unique user count per month.

If you're running the same aggregation over a year, the unique user count cannot be the sum of the monthly user count because a given user might have had interactions during several months.

If you compare the total events count they do match: 13988 + 175151 + 133383 + 30730 = 353252

So all is fine, you just need to compare apples to apples

Aucun commentaire:

Enregistrer un commentaire

Comment utiliseriez-vous .reduce() sur des arguments au lieu d'un tableau ou d'un objet spécifique ?

Je veux définir une fonction.flatten qui aplatit plusieurs éléments en un seul tableau. Je sais que ce qui suit n'est pas possible, mais...