I am trying to split a .gz file that is on my local Linux machine and upload the parts to HDFS, either uncompressed or compressed, without writing the split output to local disk. I am having trouble getting the commands below to do this.

The following commands work, but they write the parts to local disk before I upload them to HDFS, which I want to avoid:

zcat ./file.txt.gz | tail -n +2 | split -l 20 - file.part


hdfs dfs -copyFromLocal ./*file.part* /folder/in/hdfs/

I want something like the following. Is this achievable?

zcat ./file.txt.gz | tail  -n +2 | split -l 20 | gzip -d | hdfs dfs -put - /folder/in/hdfs/file.part
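The split-based pipeline above is close to what GNU split's --filter=COMMAND option does: instead of writing each chunk to a file, split pipes it into a shell command and sets $FILE to the name the chunk would have had. Note that zcat already decompresses, so no gzip -d is needed afterwards. A minimal sketch, assuming GNU split (BSD/macOS split has no --filter) and a working hdfs client on PATH; split_to_hdfs and DEST are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: split a gzipped text file into N-line parts and stream each part to HDFS
# without writing the parts to local disk.
# Assumes GNU split (--filter) and an `hdfs` client on PATH.
# split_to_hdfs and DEST are hypothetical names chosen for this example.
split_to_hdfs() {
    local gz=$1 dest=$2 lines=${3:-20}
    # zcat decompresses; tail -n +2 drops the header line, as in the question.
    # For every chunk, split runs the filter in a shell with $FILE set to the
    # chunk's would-be name (file.partaa, file.partab, ...). DEST is passed
    # through the environment so the filter shell can see it.
    zcat -- "$gz" | tail -n +2 |
        DEST=$dest split -l "$lines" --filter='hdfs dfs -put - "$DEST/$FILE"' - file.part
}
```

Usage: `split_to_hdfs ./file.txt.gz /folder/in/hdfs` would upload /folder/in/hdfs/file.partaa, file.partab, and so on. For compressed parts, the filter could be 'gzip | hdfs dfs -put - "$DEST/$FILE.gz"' instead.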
Jeff Schaller

1 Answer

You can avoid split and do the splitting on your own:

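number_of_files=5 # for you to determine: ceil(data lines / 20)
zcat ./file.txt.gz | tail -n +2 | for ((i=0; i<number_of_files; i++)); do
    # Read exactly 20 lines with the read builtin; head can read (and discard) more than it prints from a pipe
    for ((n=0; n<20; n++)); do
        IFS= read -r line || break
        printf '%s\n' "$line"
    done | hdfs dfs -put - "/folder/in/hdfs/file.part_$i"
done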
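The loop needs number_of_files up front. One way to get it is to count the data lines and round up to whole 20-line parts. A minimal sketch, assuming the header line is dropped with tail -n +2 as in the question; count_parts is a hypothetical helper name, and it decompresses the file one extra time just to count:

```shell
#!/usr/bin/env bash
# Sketch: number of parts = ceil(data lines / lines per part).
# count_parts is a hypothetical helper; it skips the header like tail -n +2.
count_parts() {
    local gz=$1 per_part=${2:-20} lines
    # wc -l counts newline characters, so a final line without one is not counted
    lines=$(zcat -- "$gz" | tail -n +2 | wc -l)
    # integer ceiling division
    echo $(( (lines + per_part - 1) / per_part ))
}
```

Usage: `number_of_files=$(count_parts ./file.txt.gz)`.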
Hauke Laging