Inconsistent Data Querying ElasticSearch #409

@ribeirodba

Here is a test I performed against Elassandra with Python.

I wrote a function that queries data through the Cassandra driver, paging through the results manually:

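# Imports inferred from the names used below (timer is assumed to be timeit.default_timer).
from datetime import timedelta
from timeit import default_timer as timer

import pandas as pd
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# `session` is an already-connected cassandra.cluster.Session.
def process_query_cassandra(query, fetch_size=5000, consistency_level=ConsistencyLevel.LOCAL_ONE):
    start = timer()
    paging_state = None
    rows = []
    while True:
        statement = SimpleStatement(query, fetch_size=fetch_size, consistency_level=consistency_level)
        results = session.execute(statement, paging_state=paging_state)
        paging_state = results.paging_state
        # Collect only the rows of the current page; the next page is requested explicitly.
        rows.extend(results.current_rows)
        # No paging state means the last page has been read.
        if paging_state is None:
            break
    df = pd.DataFrame(rows)
    end = timer()
    return df, timedelta(seconds=end - start)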

Table f0101 has 872390 rows.

When I query with plain CQL, the row count is correct and identical across runs:

query1 = """
select *
from "dlfinjdep"."f0101"
ALLOW FILTERING
"""

Running Cassandra #1
(22-06-01 12:43) Rows: 872390 seconds: 0:03:17.609349
Running Cassandra #2
(22-06-01 12:46) Rows: 872390 seconds: 0:03:04.289089

However, when I query the Elasticsearch index through CQL, fewer rows come back and the count changes between runs:

query2 = """
select *
from "dlfinjdep"."f0101"
WHERE es_query='{"query":{"match_all":{}}}'
AND es_options='indices=dlfinjdep-f0101-index'
ALLOW FILTERING
"""

Running Elastic #1
(22-06-01 12:50) Rows: 841350 seconds: 0:03:49.136313
Running Elastic #2
(22-06-01 12:54) Rows: 834372 seconds: 0:03:33.985948
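
To put numbers on the gap, here is a small sketch that computes how many rows each Elastic run is missing relative to the CQL count. It uses only the counts reported above. The `missing_keys` helper is a hypothetical addition for finding which rows are absent; it assumes you can extract a primary key value per row, and the key column of f0101 is not stated here.

```python
# Row counts reported in the runs above.
CQL_ROWS = 872390
ES_ROWS = {"Elastic #1": 841350, "Elastic #2": 834372}


def shortfall(expected, actual):
    """Return (missing row count, missing rows as a percentage of expected)."""
    missing = expected - actual
    return missing, 100.0 * missing / expected


def missing_keys(cql_keys, es_keys):
    """Hypothetical helper: keys returned by plain CQL but not by the es_query run."""
    return set(cql_keys) - set(es_keys)


for run, count in ES_ROWS.items():
    missing, pct = shortfall(CQL_ROWS, count)
    print(f"{run}: {missing} rows missing ({pct:.2f}%)")
```

With the DataFrames returned by process_query_cassandra, the key column of each result could be passed to `missing_keys` to check whether the same rows are missing on every run or a different set each time.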
