Comparative Analysis of Various File Formats in HIVE

Volume 3, Issue 6 (June, 2017)
Ramanjot Kaur, Dr. Raman Chadha

A file format is the way in which information is stored or encoded in a computer file. In Hive, the term generally refers to how records are stored inside a file. Because Hive deals with structured data, each record has its own structure, and the way records are encoded in a file defines the file format. File formats differ mainly in data encoding, compression rate, space usage, and disk I/O. Hive does not verify whether the data being loaded matches the table's schema; it does, however, verify whether the file format matches the table definition. This paper presents an in-depth study and analysis of the file formats supported by Hive, identifying the key questions that drive the choice of a file format.

Tags Associated: Big Data, Hadoop, Hive