

{"id":160427,"date":"2022-09-28T10:31:00","date_gmt":"2022-09-28T05:01:00","guid":{"rendered":"https:\/\/www.jigsawacademy.com\/?p=160427"},"modified":"2022-09-30T09:24:22","modified_gmt":"2022-09-30T03:54:22","slug":"53-crucial-hive-interview-questions-with-answers-2022","status":"publish","type":"post","link":"https:\/\/www.jigsawacademy.com\/blogs\/business-analytics\/53-crucial-hive-interview-questions-with-answers-2022\/","title":{"rendered":"53 Crucial Hive Interview Questions With Answers (2022) | UNext Jigsaw"},"content":{"rendered":"\r\n<h2><strong>Introduction<\/strong><\/h2>\r\n\r\n\r\n\r\n<p>Big Data interviews can be general or can concentrate on a specific system or method. This article focuses on a frequently used Big Data tool, Apache Hive. After going through these Apache Hive interview questions, you will have a detailed understanding of the Hive questions employers ask in Big Data interviews.<\/p>\r\n\r\n\r\n\r\n<p>Hadoop is an open-source framework designed to facilitate the storing and processing of large volumes of data. Hive is a data warehouse tool that works in the Hadoop ecosystem to process and summarize the data, making it easier to use. Now that you know what Hive is in the Hadoop ecosystem, read on to find out the most common Hive interview questions.<\/p>\r\n\r\n<p><a class=\"all-link\"><img decoding=\"async\" class=\"blog-desk-banner\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2022\/05\/IPBA-02.webp\" alt=\"Desktop Banner\" title=\"\"> <img decoding=\"async\" class=\"blog-mob-banner\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2022\/05\/IPBA-01.webp\" alt=\"Mobile Banner\" title=\"\"><\/a><\/p>\r\n\r\n<h3><b>Apache Hive \u2013 A Brief Introduction<\/b><\/h3>\r\n<p><span style=\"font-weight: 400;\">Apache Hive is a popular data warehouse system. It is built on top of Hadoop and is extensively used for analyzing structured and semi-structured data. 
It provides an easy and reliable mechanism to project structure onto the data and to run queries written in HQL (Hive Query Language), which closely resembles SQL.<\/span><\/p>\r\n<h3><b>Apache Hive Job Trends:<\/b><\/h3>\r\n<p><span style=\"font-weight: 400;\">Many companies today use Apache Hive as their go-to resource for analytics on large data sets. Because it supports SQL-like query statements, it is popular among professionals from a non-programming background who want to work on the Hadoop MapReduce framework.<\/span><\/p>\r\n<h3><b>Hive Interview Questions<\/b><\/h3>\r\n<p><span style=\"font-weight: 400;\">Hive-related questions are a common part of Data Science and Big Data interviews. Answering them with confidence helps you make a good impression on the interviewer and chart out a successful career. The following <\/span><b>Hive Interview Questions<\/b><span style=\"font-weight: 400;\"> have been curated to acquaint you with the kind of questions you might have to answer in the interview. If you are a beginner, the interviewer will check how strong your foundation is and may ask you about basic concepts. 
As your experience increases, so will the difficulty level of the questions, with them becoming more technical and application-oriented.<\/span><\/p>\r\n<p><strong>Hive Interview Questions<\/strong><\/p>\r\n\r\n\r\n\r\n<ol>\r\n<li><strong>What applications are supported by Hive?<\/strong><\/li>\r\n<li><strong>What are the different tables available in Hive?<\/strong><\/li>\r\n<li><strong>What is the difference between external and managed tables?<\/strong><\/li>\r\n<li><strong>Where does the data of a Hive table get stored?<\/strong><\/li>\r\n<li><strong>Can Hive be used in OLTP systems?<\/strong><\/li>\r\n<li><strong>Can a table name be changed in Hive?<\/strong><\/li>\r\n<li><strong>Where is Hive table data stored?<\/strong><\/li>\r\n<li><strong>Can the default location of a managed table be changed in Hive?<\/strong><\/li>\r\n<li><strong>What is a Hive Metastore?<\/strong><\/li>\r\n<li><strong>What are the types of meta stores?<\/strong><\/li>\r\n<li><strong>What is the difference between Local and Remote meta stores?<\/strong><\/li>\r\n<li><strong>What is the default Apache Hive metastore database?<\/strong><\/li>\r\n<li><strong>Can multiple users use one metastore?<\/strong><\/li>\r\n<li><strong>What are the three different modes in which Hive can be operated?<\/strong><\/li>\r\n<li><strong>Is there a data type in Hive to store date information?<\/strong><\/li>\r\n<li><strong>Why is partitioning used in Hive?<\/strong><\/li>\r\n<li><strong>What is dynamic partitioning and when is it used? 
<\/strong><\/li>\r\n<li><strong>What are the Hive collection data types?<\/strong><\/li>\r\n<li><strong>Is it possible to run UNIX shell commands in Hive?<\/strong><\/li>\r\n<li><strong>Is it possible to execute Hive queries from a script file?<\/strong><\/li>\r\n<li><strong>What is a .hiverc file?<\/strong><\/li>\r\n<li><strong>How can you check if a specific partition exists?<\/strong><\/li>\r\n<li><strong>\u00a0If you had to list all databases that began with the letter \u2018c\u2019, how would you do it?<\/strong><\/li>\r\n<li><strong>Is it possible to delete DBPROPERTY in Hive?<\/strong><\/li>\r\n<li><strong>Which Java class handles the input record encoding into files that store Hive tables?<\/strong><\/li>\r\n<li><strong>Which Java class handles output record encoding into Hive query files?<\/strong><\/li>\r\n<li><strong>When a Hive table partition is pointed to a new directory, what happens to the data?<\/strong><\/li>\r\n<li><strong>Do you save space in the HDFS by archiving Hive tables?<\/strong><\/li>\r\n<li><strong>How can you stop a partition from being accessed in a query?<\/strong><\/li>\r\n<li><strong>What is a table generating function on Hive?<\/strong><\/li>\r\n<li><strong>Can you avoid MapReduce on Hive?<\/strong><\/li>\r\n<li><strong>Can a Cartesian join be created between two Hive tables?<\/strong><\/li>\r\n<li><strong>What is a view in Hive?<\/strong><\/li>\r\n<li><strong>Can the name of a view be the same as a Hive table name?<\/strong><\/li>\r\n<li><strong>Can we use the LOAD or INSERT command to view?<\/strong><\/li>\r\n<li><strong>What is indexing in Hive?<\/strong><\/li>\r\n<li><strong>Are multi-line comments supported by Hive?<\/strong><\/li>\r\n<li><strong>How can you view the indexes of a Hive table?<\/strong><\/li>\r\n<li><strong>What is the Hive ObjectInspector function?<\/strong><\/li>\r\n<li><strong>What is bucketing?<\/strong><\/li>\r\n<li><strong>How is bucketing helpful?<\/strong><\/li>\r\n<li><strong>Can you specify the name 
of the table creator in Hive?<\/strong><\/li>\r\n<li><strong>What is HCatalog?<\/strong><\/li>\r\n<li><strong>What is UDF in Hive?<\/strong><\/li>\r\n<li><strong>What does \/*streamtable(table_name)*\/ do?\u00a0<\/strong><\/li>\r\n<li><strong>What are the limitations of Hive?<\/strong><\/li>\r\n<li><strong>Why do you need HCatalog?<\/strong><\/li>\r\n<li><strong>Name the components of a Hive query processor?<\/strong><\/li>\r\n<li><strong>Why do we need buckets?\u00a0<\/strong><\/li>\r\n<li><strong>How does Hive distribute the rows into buckets?<\/strong><\/li>\r\n<li><strong>What will happen in case you have not issued the command: \u00a0\u2018SET hive.enforce.bucketing=true;\u2019\u00a0before bucketing a table in Hive in Apache Hive 0.x or 1.x?\u00a0<\/strong><\/li>\r\n<li><strong>How do ORC format tables help Hive to enhance its performance?<\/strong><\/li>\r\n<li><strong>What are the different components of a Hive architecture?<\/strong><\/li>\r\n<\/ol>\r\n<p>&nbsp;<\/p>\r\n\r\n<h2 class=\"has-vivid-cyan-blue-color has-text-color\">Hive Interview Questions for 2022<\/h2>\r\n\r\n<p><a class=\"all-link\"><img decoding=\"async\" class=\"blog-desk-banner\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2022\/06\/IPBA-03.webp\" alt=\"Desktop Banner\" title=\"\"> <img decoding=\"async\" class=\"blog-mob-banner\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2022\/06\/IPBA_2_Mobile-1.png\" alt=\"Mobile Banner\" title=\"\"><\/a><\/p>\r\n\r\n<p>Here is a list of the most commonly asked Hive interview questions. Interview questions on Hive may be direct or application-based.<\/p>\r\n\r\n\r\n\r\n<h2>1. What applications are supported by <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>Hive supports client applications written in Java, PHP, Python, C, and Ruby.<\/p>\r\n\r\n\r\n\r\n<h2>2. 
What are the different tables available in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>There are two types of tables available in Hive &#8211; managed and external.<\/p>\r\n\r\n\r\n\r\n<h2>3. What is the difference between external and managed tables?<\/h2>\r\n\r\n\r\n\r\n<p>With an external table, Hive controls only the schema, not the data: dropping the table removes the metadata but leaves the underlying files in place. With a managed table, Hive controls both the schema and the data, so dropping the table also deletes its data.<\/p>\r\n\r\n\r\n\r\n<h2>4. Where does the data of a <strong>Hive<\/strong> table get stored?<\/h2>\r\n\r\n\r\n\r\n<p>By default, the data of a Hive table is stored in the HDFS directory \/user\/hive\/warehouse. You can change it by setting the desired directory in the configuration parameter hive.metastore.warehouse.dir in hive-site.xml.<\/p>\r\n\r\n\r\n\r\n<h2>5. Can <strong>Hive<\/strong> be used in OLTP systems?<\/h2>\r\n\r\n\r\n\r\n<p>No. Hive is built for batch-oriented analytics rather than frequent row-level inserts and updates, so it is unsuitable for OLTP systems.<\/p>\r\n\r\n\r\n\r\n<h2>6. Can a table name be changed in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>Yes, you can rename a table in Hive by using: ALTER TABLE table_name RENAME TO new_name;<\/p>\r\n\r\n\r\n\r\n<h2>7. Where is <strong>Hive<\/strong> table data stored?<\/h2>\r\n\r\n\r\n\r\n<p>Hive table data is stored in an HDFS directory, by default \/user\/hive\/warehouse. This can be altered.<\/p>\r\n\r\n\r\n\r\n<h2>8. Can the default location of a managed table be changed in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>Yes, the default managed table location can be changed in Hive by using the LOCATION \u2018&lt;hdfs_path&gt;\u2019 clause.<\/p>\r\n\r\n\r\n\r\n<h2>9. What is a <strong>Hive<\/strong> Metastore?<\/h2>\r\n\r\n\r\n\r\n<p>A Metastore is a relational database that stores the metadata of Hive partitions, tables, databases, and so on.<\/p>\r\n\r\n\r\n\r\n<h2>10. 
What are the types of metastores?<\/h2>\r\n\r\n<p>Local and Remote metastores are the two types of Hive metastores.<\/p>\r\n\r\n\r\n\r\n<h2>11. What is the difference between Local and Remote metastores?<\/h2>\r\n\r\n\r\n\r\n<p>A local metastore runs in the same Java Virtual Machine (JVM) as the <strong>Hive<\/strong> service, whereas a remote metastore runs in a separate JVM.<\/p>\r\n\r\n\r\n\r\n<h2>12. What is the default Apache <strong>Hive <\/strong>metastore database?<\/h2>\r\n\r\n\r\n\r\n<p>The default database for the metastore is the embedded Derby database provided by Hive, which is backed by the local disk.<\/p>\r\n\r\n\r\n\r\n<h2>13. Can multiple users use one metastore?<\/h2>\r\n\r\n\r\n\r\n<p>Not with the default embedded Derby metastore, which allows only one connection at a time. To share a metastore among multiple users, configure a standalone database such as MySQL as the metastore.<\/p>\r\n\r\n\r\n\r\n<h2>14. What are the three different modes in which <strong>Hive<\/strong> can be operated?<\/h2>\r\n\r\n\r\n\r\n<p>The three modes in which Hive can be operated are local mode, distributed mode, and pseudo-distributed mode.<\/p>\r\n\r\n\r\n\r\n<h2>15. Is there a data type in <strong>Hive<\/strong> to store date information?<\/h2>\r\n\r\n\r\n\r\n<p>Yes. The TIMESTAMP data type in Hive stores date and time information in java.sql.Timestamp format.<\/p>\r\n\r\n\r\n\r\n<h2>16. Why is partitioning used in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>Partitioning is used in Hive because it reduces query latency. Instead of scanning entire tables, Hive scans only the relevant partitions and their data.<\/p>\r\n\r\n\r\n\r\n<h2>17. 
What is dynamic partitioning, and when is it used?<\/h2>\r\n<p>Dynamic partitioning is partitioning in which the values of the partition column become known only at runtime, i.e., while data is being loaded into the Hive table.<\/p>\r\n<p>A dynamic partition can be used in the following two cases:<\/p>\r\n<ul>\r\n<li>When loading data from an existing non-partitioned table, to improve sampling and thus decrease query latency.<\/li>\r\n<li>When the partition values are not known beforehand, so finding them manually in a huge data set would be tedious.<\/li>\r\n<\/ul>\r\n<h2>18. What are the <strong>Hive<\/strong> collection data types?<\/h2>\r\n\r\n\r\n\r\n<p>ARRAY, MAP, and STRUCT are the three Hive collection data types.<\/p>\r\n\r\n\r\n\r\n<h2>19. Is it possible to run UNIX shell commands in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>Yes, one can run shell commands in Hive by adding a \u2018!\u2019 before the command.<\/p>\r\n\r\n\r\n\r\n<h2>20. Is it possible to execute <strong>Hive <\/strong>queries from a script file?<\/h2>\r\n\r\n\r\n\r\n<p>Yes, one can do so with the source command. For example: hive&gt; source \/path\/queryfile.hql;<\/p>\r\n\r\n\r\n\r\n<h2>21. What is a .hiverc file?<\/h2>\r\n\r\n\r\n\r\n<p>It is a file containing a list of commands that are run when the Hive Command Line Interface (CLI) starts.<\/p>\r\n\r\n\r\n\r\n<h2>22. How can you check if a specific partition exists?<\/h2>\r\n\r\n\r\n\r\n<p>Use the following command: SHOW PARTITIONS table_name PARTITION (partitioned_column='partition_value');<\/p>\r\n\r\n\r\n\r\n<h2>23. If you had to list all databases that began with the letter \u2018c\u2019, how would you do it?<\/h2>\r\n\r\n\r\n\r\n<p>By using the following command: SHOW DATABASES LIKE 'c.*';<\/p>\r\n\r\n\r\n\r\n<h2>24. 
Is it possible to delete DBPROPERTY in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>No, there is no way to delete a DBPROPERTY.<\/p>\r\n\r\n\r\n\r\n<h2>25. Which Java class handles the input record encoding into files that store <strong>Hive <\/strong>tables?<\/h2>\r\n\r\n\r\n\r\n<p>The \u2018org.apache.hadoop.mapred.TextInputFormat\u2019 class.<\/p>\r\n\r\n\r\n\r\n<h2>26. Which Java class handles output record encoding into <strong>Hive <\/strong>query files?<\/h2>\r\n\r\n\r\n\r\n<p>The \u2018org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\u2019 class.<\/p>\r\n\r\n\r\n\r\n<h2>27. When a <strong>Hive<\/strong> table partition is pointed to a new directory, what happens to the data?<\/h2>\r\n\r\n\r\n\r\n<p>The data remains in the old directory and needs to be transferred manually.<\/p>\r\n\r\n\r\n\r\n<h2>28. Do you save space in the HDFS by archiving <strong>Hive<\/strong> tables?<\/h2>\r\n\r\n\r\n\r\n<p>No, archiving Hive tables only reduces the number of files, which makes the data easier to manage.<\/p>\r\n\r\n\r\n\r\n<h2>29. How can you stop a partition from being accessed in a query?<\/h2>\r\n\r\n\r\n\r\n<p>Use the ENABLE OFFLINE clause along with the ALTER TABLE command.<\/p>\r\n\r\n\r\n\r\n<h2>30. What is a table generating function on <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>A table-generating function (UDTF) takes a single input row and produces multiple output rows. The built-in explode() function, which turns each element of an array into its own row, is a common example.<\/p>\r\n\r\n\r\n\r\n<h2>31. Can you avoid MapReduce on <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>MapReduce is the programming framework that allows Hive to divide large datasets into smaller units and process them in parallel. You can make Hive avoid MapReduce when returning query results by setting the hive.exec.mode.local.auto property to \u2018true\u2019.<\/p>\r\n\r\n\r\n\r\n<h2>32. Can a Cartesian join be created between two <strong>Hive<\/strong> tables?<\/h2>\r\n\r\n\r\n\r\n<p>A Cartesian join is expensive because it cannot be implemented efficiently in MapReduce programming. Hive blocks it when strict mode (hive.mapred.mode=strict) is enabled; in non-strict mode, it can be written with CROSS JOIN.<\/p>\r\n\r\n\r\n\r\n<h2>33. 
What is a view in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>A view is a logical construct that allows a query to be treated as a table.<\/p>\r\n\r\n\r\n\r\n<h2>34. Can the name of a view be the same as a <strong>Hive<\/strong> table name?<\/h2>\r\n\r\n\r\n\r\n<p>No, the name of the view must always be unique in the database.<\/p>\r\n\r\n\r\n\r\n<h2>35. Can we use the LOAD or INSERT command with a view?<\/h2>\r\n\r\n\r\n\r\n<p>No, these commands cannot be used with a view in Hive.<\/p>\r\n\r\n\r\n\r\n<h2>36. What is indexing in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p><strong>Hive <\/strong>indexing is a query optimization technique to reduce the time needed to access a column or a set of columns within a Hive database. Note that indexing was removed in Hive 3.0.<\/p>\r\n\r\n\r\n\r\n<h2>37. Are multi-line comments supported by <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>No, multi-line comments are not supported by Hive.<\/p>\r\n\r\n\r\n\r\n<h2>38. How can you view the indexes of a <strong>Hive <\/strong>table?<\/h2>\r\n\r\n\r\n\r\n<p>By using the following command: SHOW INDEX ON table_name;<\/p>\r\n\r\n\r\n\r\n<h2>39. What is the <strong>Hive<\/strong> ObjectInspector function?<\/h2>\r\n\r\n\r\n\r\n<p>It helps to analyze the structure of individual columns and rows and provides access to the complex objects that are stored within the database.<\/p>\r\n\r\n\r\n\r\n<h2>40. What is bucketing?<\/h2>\r\n\r\n\r\n\r\n<p>Bucketing is the process of hashing the values in a column into a user-defined number of buckets, which helps avoid over-partitioning.<\/p>\r\n\r\n\r\n\r\n<h2>41. 
How is bucketing helpful?<\/h2>\r\n\r\n<p>Bucketing helps optimize the sampling process and shortens the query response time.<\/p>\r\n\r\n\r\n\r\n<h2>42. Can you specify the name of the table creator in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>Yes, by using the TBLPROPERTIES clause. For example: TBLPROPERTIES ('creator'='john')<\/p>\r\n\r\n\r\n\r\n<h2>43. What is HCatalog?<\/h2>\r\n\r\n\r\n\r\n<p>HCatalog is a tool that helps to share data structures with other external systems in the Hadoop ecosystem.<\/p>\r\n\r\n\r\n\r\n<h2>44. What is UDF in <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p>A UDF is a user-defined function, written in Java, that performs an operation not covered by the existing Hive functions.<\/p>\r\n\r\n\r\n\r\n<h2>45. What does \/*streamtable(table_name)*\/ do?\u00a0<\/h2>\r\n\r\n\r\n\r\n<p>It is a query hint, written in full as \/*+ STREAMTABLE(table_name) *\/, that tells Hive which table in a join to stream through the reducers instead of buffering it in memory. By default, Hive streams the last table in the join.<\/p>\r\n\r\n\r\n\r\n<h2>46. What are the limitations of <strong>Hive<\/strong>?<\/h2>\r\n\r\n\r\n\r\n<p><strong>Hive<\/strong> has the following limitations:<\/p>\r\n\r\n\r\n\r\n<ul>\r\n<li>Real-time queries cannot be executed, and row-level operations are poorly supported.<\/li>\r\n<li>Hive cannot be used for online transaction processing.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h2>47. 
Why do you need HCatalog?<\/h2>\r\n\r\n<p>HCatalog is needed for sharing data structures with external systems. It offers access to the Hive metastore for reading and writing data in a Hive data warehouse.<\/p>\r\n\r\n\r\n\r\n<h2>48. Name the components of a <strong>Hive<\/strong> query processor?<\/h2>\r\n\r\n\r\n\r\n<p>Following are the components of a Hive query processor:<\/p>\r\n\r\n\r\n\r\n<ul>\r\n<li>Logical Plan Generation.<\/li>\r\n<li>Physical Plan Generation.<\/li>\r\n<li>Execution Engine.<\/li>\r\n<li>UDFs and UDAFs.<\/li>\r\n<li>Operators.<\/li>\r\n<li>Optimizer.<\/li>\r\n<li>Parser.<\/li>\r\n<li>Semantic Analyzer.<\/li>\r\n<li>Type Checking.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h2>49. 
Why do we need buckets?<\/h2>\r\n<p>Here are the two main reasons for bucketing a partition:<\/p>\r\n<ul>\r\n<li>A map-side join requires data belonging to a unique join key to be present in the same partition. When your partition key differs from the join key, you can still perform a map-side join by bucketing the table on the join key.<\/li>\r\n<li>Bucketing makes the sampling process more efficient and thus decreases the query time.<\/li>\r\n<\/ul>\r\n<h2>50. How does <strong>Hive<\/strong> distribute the rows into buckets?<\/h2>\r\n\r\n\r\n\r\n<p><strong>Hive<\/strong> uses the formula <em>hash_function(bucketing_column) modulo (num_of_buckets)<\/em> to calculate a row&#8217;s bucket number. Here, hash_function depends on the data type of the column. For the integer data type, hash_function is:<\/p>\r\n\r\n\r\n\r\n<p><em>hash_function(int_type_column) = value of int_type_column<\/em><\/p>\r\n\r\n\r\n\r\n<h2>51. 
What will happen if you have not issued the command <i>\u2018SET hive.enforce.bucketing=true;\u2019<\/i> before bucketing a table in Apache Hive 0.x or 1.x?<\/h2>\r\n<p>The command <i>\u2018SET hive.enforce.bucketing=true;\u2019<\/i> ensures the correct number of reducers when the \u2018CLUSTER BY\u2019 clause is used to bucket a column. Without it, the number of files generated in the table directory may not equal the number of buckets. Alternatively, you can set the number of reducers equal to the number of buckets by using <i>set mapred.reduce.tasks = num_buckets<\/i>.<\/p>\r\n<h2>52. How do ORC format tables help <strong>Hive<\/strong> to enhance its performance?<\/h2>\r\n\r\n\r\n\r\n<p>You can store Hive data in the ORC (Optimized Row Columnar) format, which overcomes several limitations of the other Hive file formats and improves performance when Hive reads, writes, and processes data.<\/p>\r\n\r\n\r\n\r\n<h2>53. What are the different components of a <strong>Hive<\/strong> architecture?<\/h2>\r\n\r\n\r\n\r\n<p>Following are the five components of a Hive architecture:<\/p>\r\n\r\n\r\n\r\n<ol>\r\n<li><strong>User Interface:<\/strong> It lets the user submit queries and other operations to the Hive system. 
The user interface provides Hive Web UI, Hive Command-Line, and Hive HDInsight.<br \/>\r\n\r\n<\/li>\r\n<li><strong>Driver:<\/strong> It creates a session handle for the query and then sends the query to the compiler to generate an execution plan.<\/li>\r\n<li><strong>Metastore: <\/strong>It contains metadata and structural information on the various warehouse tables and partitions.<\/li>\r\n<li><strong>Compiler: <\/strong>It creates the execution plan for the queries, performs semantic analysis on different query blocks, and generates query expressions.<\/li>\r\n<li><strong>Execution Engine:<\/strong> It implements the execution plans created by the compiler.\r\n\r\n<\/li>\r\n<\/ol>\r\n<h4><strong>Conclusion<\/strong><\/h4>\r\n\r\n\r\n\r\n<p>The above-listed Hive interview questions and answers cover most of the important topics under Hive, but this is in no way an exhaustive list. If you are interested in making it big in the world of data and evolving as a Future Leader, you may consider our <a href=\"https:\/\/www.jigsawacademy.com\/integrated-program-in-business-analytics\/\">Integrated Program In Business Analytics<\/a>, a 10-month online program in collaboration with IIM Indore!<\/p>\r\n\r\n\r\n\r\n<h2>Also, Read<\/h2>\r\n\r\n\r\n\r\n<ul>\r\n<li><a class=\"rank-math-link\" href=\"https:\/\/www.jigsawacademy.com\/blogs\/business-analytics\/excel-interview-questions\/\"><strong>Top 50 Excel Interview Questions<\/strong><\/a><\/li>\r\n<\/ul>\r\n","protected":false},"excerpt":{"rendered":"<p>Introduction Big Data interviews can take place in general lines or concentrate on a specific system or method. This article will focus on the Big Data tool- Apache Hive- frequently used. 
You get a detailed understanding of questions asked in Big Data interviews by employers connected with Apache Hive after going through this Apache Hive [&hellip;]<\/p>\n","protected":false},"author":2640,"featured_media":253376,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1210],"tags":[3263,3250,3261,3262,7230,3258,3260],"form":[10307],"acf":[],"_links":{"self":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/160427"}],"collection":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/users\/2640"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/comments?post=160427"}],"version-history":[{"count":13,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/160427\/revisions"}],"predecessor-version":[{"id":253817,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/160427\/revisions\/253817"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/media\/253376"}],"wp:attachment":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/media?parent=160427"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/categories?post=160427"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/tags?post=160427"},{"taxonomy":"form","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/form?post=160427"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}