I have taken the user count from the elastic search index. same query but different histogram interval types like Day, Month, Week, Quarter, and year the counts are not matching correctly
note: totally I have only 4 months of data for this year
This is for month interval ES search query
{
"query": {
"range": {
"@timestamp": {
"gte": "2022-01-01",
"lte": "2022-04-14"
}
}},
"aggs": {
"dt": {
"date_histogram": {
"field": "@timestamp",
"interval": "month",
"format": "yyyy-MM-dd"
},
"aggs": {
"events": {
"nested": {
"path": "events"
},
"aggs": {
"unique_user_count": {
"cardinality": {
"field": "events.actor.id.keyword"
}
}
}
}}}}}
got below Month results (response)
{
"aggregations": {
"dt": {
"buckets": [
{
"key_as_string": "2022-01-01",
"key": 1640995200000,
"doc_count": 2930,
"events": {
"doc_count": 13988,
"unique_user_count": {
"value": 37
}
}
},
{
"key_as_string": "2022-02-01",
"key": 1643673600000,
"doc_count": 36910,
"events": {
"doc_count": 175151,
"unique_user_count": {
"value": 580
}
}
},
{
"key_as_string": "2022-03-01",
"key": 1646092800000,
"doc_count": 24861,
"events": {
"doc_count": 133383,
"unique_user_count": {
"value": 555
}
}
},
{
"key_as_string": "2022-04-01",
"key": 1648771200000,
"doc_count": 6005,
"events": {
"doc_count": 30730,
"unique_user_count": {
"value": 170
}
}
}
]
}
}
}
Again I run the same query but changed intervals = Year
{
"query": {
"range": {
"@timestamp": {
"gte": "2022-01-01",
"lte": "2022-04-14"
}
}},
"aggs": {
"dt": {
"date_histogram": {
"field": "@timestamp",
"interval": "year",
"format": "yyyy-MM-dd"
},
"aggs": {
"events": {
"nested": {
"path": "events"
},
"aggs": {
"unique_user_count": {
"cardinality": {
"field": "events.actor.id.keyword"
}
}
}
}}}}}
I got the below Year response
{
"aggregations": {
"dt": {
"buckets": [
{
"key_as_string": "2022-01-01",
"key": 1640995200000,
"doc_count": 70706,
"events": {
"doc_count": 353252,
"unique_user_count": {
"value": 1007
}
}
}
]
}
}
}
My Expected results like this
year = 37+580+555+170
year = 1342 ----> but I got "1007" wrong values
how do match the sum(month) value and year values?
Solution of the problem
In your monthly buckets you're running a cardinality aggregation to get the unique user count per month.
If you're running the same aggregation over a year, the unique user count cannot be the sum of the monthly user count because a given user might have had interactions during several months.
If you compare the total events count they do match: 13988 + 175151 + 133383 + 30730 = 353252
So all is fine, you just need to compare apples to apples
Aucun commentaire:
Enregistrer un commentaire