I have been trying to publish source data to external Hive tables that are partitioned. I cannot see an option in the run settings to direct the data to a particular partition. Please help.
Hi @Sooraj P,
There is no such option. It should not be necessary, and it could lead to data loss if the destination partition were specified incorrectly.
Partitioning is always driven by the data: a row can only go into the partition that matches its partition-column value, so you cannot write data into a partition it does not match.
Suppose you have a year column: you could use it as the partition column. On the filesystem, that translates to something like /table/year=2020/..., /table/year=2019/..., and so on.
So when you write rows from 2020 to such a table, they automatically go into that partition.
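To make the routing concrete, here is a minimal sketch in plain Python of what a Hive-style partitioned write does. It is not how Hive or any particular platform implements it: it assumes rows are Python dicts written out as CSV, and `write_partitioned` and the `part-0.csv` file name are hypothetical. Engines such as Spark (`DataFrameWriter.partitionBy`) or Hive dynamic-partition inserts do this grouping for you.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, table_dir, partition_col, file_name="part-0.csv"):
    """Hypothetical helper: write dict rows into Hive-style key=value dirs.

    Each row goes to the directory named after its own partition value, and
    the partition column is left out of the data file itself.
    Returns the list of files written. Overwrites file_name if it already
    exists (a real engine would add new files instead).
    """
    # Group rows by the value of the partition column (e.g. year).
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)

    written = []
    for value, group in groups.items():
        # e.g. <table_dir>/year=2020/
        part_dir = Path(table_dir) / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # Every column except the partition column ends up in the file.
        fields = [k for k in group[0] if k != partition_col]
        out_path = part_dir / file_name
        with open(out_path, "w", newline="") as f:
            # extrasaction="ignore" skips the partition column when writing.
            writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(out_path)
    return written


if __name__ == "__main__":
    import tempfile

    rows = [
        {"id": 1, "amount": 10, "year": 2020},
        {"id": 2, "amount": 20, "year": 2019},
        {"id": 3, "amount": 30, "year": 2020},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in sorted(write_partitioned(rows, tmp, "year")):
            print(path.relative_to(tmp), "->", path.read_text().splitlines())
```

Nothing in the call says which partition to use: the 2020 rows land under year=2020 purely because of their own year value, which is why a "target partition" setting is not needed.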
The trick here is that the year value is not stored anywhere inside the table's data files; it exists only in the partition (directory) name.
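Reading the data back shows the other side of this: a reader has to recover the year by parsing the directory names. Another hypothetical sketch in plain Python (`partition_values` and `with_partition_columns` are made-up names), assuming POSIX-style paths:

```python
from pathlib import PurePosixPath


def partition_values(file_path):
    """Return {column: value} parsed from key=value directories in file_path.

    Values come back as strings: the directory name carries no type
    information, so the table schema has to say that year is an int.
    """
    values = {}
    # Skip the last part, which is the data file name itself.
    for segment in PurePosixPath(file_path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep:  # only key=value segments are partition directories
            values[key] = value
    return values


def with_partition_columns(rows, file_path):
    """Re-attach the partition columns to rows read from file_path."""
    values = partition_values(file_path)
    return [{**row, **values} for row in rows]


if __name__ == "__main__":
    print(partition_values("/table/year=2020/part-0.csv"))
    # {'year': '2020'}
    print(with_partition_columns([{"id": "1", "amount": "10"}],
                                 "/table/year=2020/part-0.csv"))
    # [{'id': '1', 'amount': '10', 'year': '2020'}]
```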
Does this help?
Regards,
Nathanael
Thank you @Nathanael Kuipers.