We didn't rely on it, or on the ordering of messages received from Kafka. Timestamps and transaction IDs were generated by the client app / Kafka publisher and were part of the message put onto the Kafka topic. When one of the parallel Kafka consumers consumes that message and saves a row into the HBase table, that original timestamp + transaction ID becomes part of the rowkey string, the other parts being the attributes we wanted to index (secondary indexes are supported in HBase/Phoenix but we didn't use them much; essentially the composite rowkey is the index). Then when querying, HBase works as a parallel scanning machine and can do a time-range scan + filtering + aggregation very fast.
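As a rough sketch of the idea (not our actual code: the table name, column family, key layout, and attribute names here are all made up for illustration), a consumer could build the composite rowkey from an indexed attribute plus the client-generated timestamp and transaction ID, and a query then becomes a start/stop-key scan with the plain HBase client API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CompositeRowKeyExample {

    // Hypothetical rowkey layout: indexed attribute first, then the
    // client-generated event timestamp and transaction id from the Kafka message.
    static byte[] rowKey(String accountId, long eventTimeMillis, String txnId) {
        return Bytes.toBytes(accountId + "|" + eventTimeMillis + "|" + txnId);
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {

            // Write path: the Kafka consumer stores the message under the composite key.
            Put put = new Put(rowKey("acct-42", 1700000000000L, "txn-0001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("19.99"));
            table.put(put);

            // Read path: a time-range query for one account becomes a
            // start/stop-key scan over the same composite layout.
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("acct-42|" + 1700000000000L))
                    .withStopRow(Bytes.toBytes("acct-42|" + 1700003600000L));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```

(Millisecond timestamps stay fixed-width for the foreseeable future, so the string encoding sorts correctly; with variable-width values you'd want zero-padding or a binary encoding.)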
On a separate note, we didn't use joins even though they are supported in Phoenix; the data was completely denormalized into one big table.
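Through Phoenix the same pattern looks roughly like the sketch below (again illustrative only: the ZooKeeper host, table, and columns are assumptions). One wide denormalized table whose composite PRIMARY KEY maps to the HBase rowkey, and a time-range + filter + aggregation query instead of joins:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixDenormalizedExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper quorum; Phoenix is reached over JDBC.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {

            // One wide, denormalized table: the composite PRIMARY KEY becomes
            // the HBase rowkey, so the key itself acts as the index.
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(
                    "CREATE TABLE IF NOT EXISTS events (" +
                    "  account_id VARCHAR NOT NULL," +
                    "  event_time TIMESTAMP NOT NULL," +   // client-generated timestamp
                    "  txn_id     VARCHAR NOT NULL," +     // client-generated transaction id
                    "  merchant   VARCHAR," +
                    "  amount     DECIMAL," +
                    "  CONSTRAINT pk PRIMARY KEY (account_id, event_time, txn_id))");
            }

            // Time-range scan + filter + aggregation, executed as parallel
            // scans on the region servers rather than as a join.
            String sql = "SELECT merchant, SUM(amount) FROM events " +
                         "WHERE account_id = ? AND event_time BETWEEN ? AND ? " +
                         "GROUP BY merchant";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "acct-42");
                ps.setTimestamp(2, java.sql.Timestamp.valueOf("2024-01-01 00:00:00"));
                ps.setTimestamp(3, java.sql.Timestamp.valueOf("2024-01-02 00:00:00"));
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " -> " + rs.getBigDecimal(2));
                    }
                }
            }
        }
    }
}
```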