I'm working on a price data project, trying to store OHLC 1 minute data and then sampling it over various intervals. I decided to try QuestDB for its neat time-series features such as SAMPLE BY but am running into a wall.
I want to use SAMPLE BY to query my data, which looks like this:
| timestamp | open | high | low | close |
|---|---|---|---|---|
| "2021-09-10T19:43:21.657672Z" | 2.0 | 4.0 | 1.0 | 3.0 |
| "2021-09-15T06:12:42.267416Z" | 3.0 | 6.0 | 3.0 | 6.0 |
For example, for the two rows above, which are in the same month, I would like to run SAMPLE BY 1M and aggregate the values for the month.
For high and low values the aggregate functions are min(low) and max(high), but I can't figure out what to do for open and close values.
I tried retrofitting this approach but got as far as the following without errors:
with ohlc as (
select row_number() over (
partition by cast(cast(timestamp as float) / 1000000 / 60 / 60 / 24 / 7 / 4.34524 as int)
order by timestamp
), open, high, low, close, timestamp
from eurusd
)
select max(high) high,
min(low) low,
min(case when rn_asc = 1 then [open] end) as open
from ohlc
Some errors I encountered:
- Adding the second
row_number() over [...] order by timestamp descfails with a syntax error. If I remove thedescit works, so I just removed that column altogether since then it's the same as the first. - Neither
SAMPLE BYnorGROUP BYat the end works with "base query does not provide dedicated TIMESTAMP column" and "group by expression does not match anything select in statement" errors, respectively. - QuestDB recognizes the
CASE WHEN THENbut if I yield a column name in theTHENportion it doesn't work with no error message.
I'm currently stuck at this point having looked across the QuestDB tutorials, SO questions, and GitHub issues for a mention of this because it seems like a common operation, especially since they have written many price time-series data tutorials, but none address this.
I'm wondering if anyone has encountered this and has a solution? Thank you!