Apache CarbonData 1.4.0 正式发布，多项新功能及性能提升

本文原文：https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85475081。
Carbondata 1.4.0 下载 Carbondata 官方文档 Carbondata 源码

Apache CarbonData社区很高兴发布1.4.0版本，在社区开发者和用户的共同努力下，1.4.0解决了超过230个JIRA Tickets（新特性和bug修复），欢迎大家试用。

简介

CarbonData是一个高性能的数据解决方案，目标是实现一份数据支持多种分析场景，包括BI分析，即席SQL查询，明细数据分析，流式分析等。CarbonData已经部署在许多企业生产环境中，例如一个规模较大的场景，支持单个表5PB数据(超过10万亿条记录)上明细数据分析，响应时间小于3秒! 下面是1.4.0支持的新特性介绍。

Carbon Core

数据加载性能提升

通过增强入库过程中的IO读写(包括排序临时文件改进，分区排序，免拷贝等)，数据加载性能得到了显着提高。在其中一个生产环境中，与上一版本相比，我们观察到多达300%的改进。

数据Compaction性能提升

通过在Compaction过程中采用数据预取和矢量化读取的改进，CarbonData表上的Compaction执行性能与上一版本相比提高了500%。得益于这个提升，在其中一个生产环境中，可以实现每5分钟的数据加载(数据量为几百GB)的同时达到秒级查询响应，通过设置自动Compaction，系统每隔30分钟和60分钟进行一次Compaction。可以将 carbon.compaction.level.threshold 设置为 "6,2"，减少了Segments的数量，使CarbonData的索引更有效。

DataMap管理

1.4.0中的CREATE DATAMAP语句中引入了新的语法'DEFERRED REBUILD'，这使得用户可以选择DataMap管理机制是自动或手动。在创建DataMap时，如果用户指定了'DEFERRED REBUILD'，系统会默认设置DataMap的状态为不可用，当用户执行REBUILD DATAMAP命令后，系统会触发DataMap的加载，并在查询时使用该DataMap。这使用户可以控制何时加载DataMap，有利于用户控制对资源的使用。相对地，用户也可以不指定'DEFERRED REBUILD', 每当有新的数据加载发生时系统会自动触发所有相关DataMap的加载（与老版本一样）。详细操作请参阅DataMap管理。

外部表

现在您可以通过CREATE TABLE ... LOCATION ...来指定Carbon数据文件的存储位置，这个特性的行为和用户与Hive External Table相同。

支持云存储

您可以使用云存储来建立Carbon外部表，例如将Carbon表存储在AWS S3，华为云OBS等云存储中。

支持在独立应用程序中使用SDK

1.4.0提供了Java SDK，通过使用该SDK，应用程序可以不依赖Hadoop和Spark来创建表格、写入和读取CarbonData文件。例如，用户可以写一个独立的Java程序将现有数据转换为CarbonData文件。，目前，SDK支持把以下格式转换为CarbonData文件，支持写入到本地磁盘或云存储。

CSV数据，Schema由用户指定。
JSON数据，Schema通过Avro对象表达。

针对OLAP场景的增强

支持在Streaming Table里使用预汇聚 (PreAggregate DataMap)

在上个版本中，一个表格不能同时进行流式入库和创建预汇聚表，在1.4.0中去除了这个限制。现在您可以在流式表上创建预聚合表，既缩短了数据从产生到可分析的时间，也可以利用预汇总表来提高查询性能。此特性的实现机制是把一个查询分为两个部分，一部分查询流数据，另一部分查询预聚合数据，最终系统自动合并查询结果。由于预聚合数据比原始数据少得多，所以使查询更快。

预聚合表支持分区

针对分区表，用户创建预汇聚表（preaggregate DataMap）后，预汇聚表会具备相同的分区属性（相同的分区列）。由于此时主表和预汇聚表的分区是Aligned的，因此当您在主表上执行数据管理操作(如创建/删除/覆盖写分区)时，同样的操作将在聚合表上自动完成，使两者保持同步。例如，用户可以创建一个天分区表，每天导入数据到新分区，这样系统也会自动完成对应预汇聚表的新分区导入。

支持物化视图(Alpha功能, MV DataMap)

与1.3.0版中引入的预汇聚表（PreAggregate DataMap）相比，1.4.0中引入了功能更强大的物化视图（MV DataMap），它可以涵盖更多的 OLAP分析场景。用户通过类似的DataMap语句(CTAS)创建，删除，显示物化视图，在查询时系统会根据查询条件和执行成本找到合适的物化视图，将查询语句重写为针对物化视图的查询，提升查询性能。

CarbonData物化视图作为一个长期演进特性，目前支持SPJGH的形式(select-predicate-join-groupby-having)，用户可以创建单表或多表的汇聚表，也可以针对单表只做过滤，不做汇聚。

这个特性目前是Alpha版本，仍存在不完善的地方，不建议用户在生产系统中使用，但我们鼓励所有用户在非生产系统中试用，该特性会在未来版本中逐步改进。

针对明细数据分析的增强

针对高基数列的BloomFilter DataMap(Alpha功能)

为了提升高基数列的过滤效果和查询性能，1.4.0引入了BloomFilter索引。它针对的场景是类似用户名/ID等高基数列上进行精确匹配。在一个与上一版本的对比测试中，我们针对用户名进行过滤查询，发现并发查询性能提高了3~5倍。有关更多详细信息，请参阅 BloomFilter DataMap指南

针对文本检索的Lucene DataMap(Alpha功能)

Lucene是一个高性能全文检索引擎，1.4.0实现了一个基于Lucene的DataMap索引，用户可以创建Lucene DataMap来提高长文本字符串列的模糊匹配查询性能。有关更多详细信息，请参阅 Lucene DataMap指南

支持搜索模式(Alpha功能)

为了提高并发过滤查询性能，CarbonData新增了一种“搜索模式”来执行查询（包含查询调度和执行）。该模式不使用Spark RDD和DAG Scheduler，避免了由于RDD带来的性能开销。在一个与“Spark模式”的对比测试中，“搜索模式”使查询时延降低了一半，从1秒降低到500ms。

其他重要改进

改进了EXPLAIN命令输出，通过EXPLAIN命令，用户可以得知某个查询是否被重写针对预聚合表或物化视图的查询，使用了哪个索引，命中了多少个文件和Blocklet等，可以基于此对物化视图和索引进行调优。
在Carbon Core中增加了性能调优日志，包括输出SQL解析和优化器占用时间，索引过滤信息，Carbon文件IO读取时间，解码Blocklet的数量和时间，向上层引擎填充结果的时间等。参考“enable.query.statistics”配置设置。
并行支持数据加载和Compaction并发执行。
支持将可见和不可见的Segment元数据分隔为两个文件，并在SHOW SEGMENTS命令中显示它们。
支持分区表上的全局排序选项
减少全局排序表中的对象生成，减少GC
对DESC命令进行优化以显示分区表的分区值和位置

1.4.0版本的完整JIRA列表如下。

Sub-task

[CARBONDATA-1522] - 6. Loading aggregation tables for streaming data tables.
[CARBONDATA-1575] - Support large scale data on DataMap
[CARBONDATA-1601] - Add carbon store module
[CARBONDATA-1998] - Support FileReader Java API for file level carbondata
[CARBONDATA-2165] - Remove spark dependency in carbon-hadoop module
[CARBONDATA-2189] - Support add and drop interface
[CARBONDATA-2206] - Integrate lucene as datamap
[CARBONDATA-2247] - Support writing index in CarbonWriter
[CARBONDATA-2294] - Support preaggregate table creation on partition tables
[CARBONDATA-2301] - Support query interface in CarbonStore
[CARBONDATA-2359] - Support applicable load options and table properties for Non Transactional table
[CARBONDATA-2360] - Insert into and Insert Into overwrite support for Non Transactional table
[CARBONDATA-2361] - Refactor Read Committed Scope implementation.
[CARBONDATA-2369] - Add a document for Non Transactional table with SDK writer guide
[CARBONDATA-2388] - Avro Nested Datatype Support
[CARBONDATA-2423] - CarbonReader Support To Read Non Transactional Table
[CARBONDATA-2430] - Reshuffling of Columns given by user in SDK
[CARBONDATA-2433] - Executor OOM because of GC when blocklet pruning is done using Lucene datamap
[CARBONDATA-2443] - Multi Level Complex Type Support for AVRO SDK
[CARBONDATA-2457] - Add converter to get Carbon SDK Schema from Avro schema directly.
[CARBONDATA-2474] - Support Modular Plan
[CARBONDATA-2475] - Support Materialized View query rewrite
[CARBONDATA-2484] - Refactor the datamap code and clear the datamap from executor on table drop

Bug

[CARBONDATA-1114] - Failed to run tests in windows env
[CARBONDATA-1990] - Null values shown when the basic word count example is tried on carbon streaming table
[CARBONDATA-2002] - Streaming segment status is not getting updated to finished or success
[CARBONDATA-2056] - Hadoop Configuration with access key and secret key should be passed while creating InputStream of distributed carbon file.
[CARBONDATA-2080] - Hadoop Conf not propagated from driver to executor in S3
[CARBONDATA-2085] - It's different between load twice and create datamap with load again after load data and create datamap
[CARBONDATA-2130] - Find some Spelling error in CarbonData
[CARBONDATA-2147] - Exception displays while loading data with streaming
[CARBONDATA-2152] - Min function working incorrectly for string type with dictionary include in presto.
[CARBONDATA-2155] - IS NULL not working correctly on string datatype with dictionary_include in presto integration
[CARBONDATA-2161] - Compacted Segment of Streaming Table should update "mergeTo" column
[CARBONDATA-2194] - Exception message is improper when use incorrect bad record action type
[CARBONDATA-2198] - Streaming data to a table with bad_records_action as IGNORE throws ClassCastException
[CARBONDATA-2199] - Exception occurs when change the datatype of measure having sort_column
[CARBONDATA-2207] - TestCase Fails using Hive Metastore
[CARBONDATA-2208] - Pre aggregate datamap creation is failing when count(*) present in query
[CARBONDATA-2209] - Rename table with partitions not working issue and batch_sort and no_sort with partition table issue
[CARBONDATA-2211] - Alter Table Streaming DDL should blocking DDL like other DDL ( All DDL are blocking DDL)
[CARBONDATA-2213] - Wrong version in datamap example module cause compilation failure
[CARBONDATA-2216] - Error in compilation and execution in sdvtest
[CARBONDATA-2217] - nullpointer issue drop partition where column does not exists and clean files issue after second level of compaction
[CARBONDATA-2219] - Add validation for external partition location to use same schema
[CARBONDATA-2221] - Drop table should throw exception when metastore operation failed
[CARBONDATA-2222] - Update the FAQ doc for some mistakes
[CARBONDATA-2229] - Unable to save dataframe as carbontable with specified external database path
[CARBONDATA-2232] - Wrong logic in spilling unsafe pages to disk
[CARBONDATA-2235] - add system configuration to filter datamaps from show tables command
[CARBONDATA-2236] - Add SDV Test Cases for Standard Partition
[CARBONDATA-2237] - Scala Parser failures are accumulated into memory form thread local
[CARBONDATA-2241] - Wrong Query written in Preaggregation Document
[CARBONDATA-2244] - When there are some invisibility INSERT_IN_PROGRESS/INSERT_OVERWRITE_IN_PROGRESS segments on main table, it can not create preaggregate table on it.
[CARBONDATA-2248] - Removing parsers thread local objects after parsing of carbon query
[CARBONDATA-2249] - Not able to query data through presto with local carbondata-store
[CARBONDATA-2261] - Support Set segment command for Streaming Table
[CARBONDATA-2264] - There is error when we create table using CarbonSource
[CARBONDATA-2265] - [DFX]-Load]: Load job fails if 1 folder contains 1000 files
[CARBONDATA-2266] - All Examples are throwing NoSuchElement Exception in current master branch
[CARBONDATA-2274] - Partition table having more than 4 column giving zero record
[CARBONDATA-2275] - Query Failed for 0 byte deletedelta file
[CARBONDATA-2277] - Filter on default values are not working
[CARBONDATA-2287] - Add event to alter partition table
[CARBONDATA-2289] - If carbon merge index is enabled then after IUD operation if some blocks of a segment is deleted, then during query and IUD operation the driver is throwing FileNotFoundException while preparing BlockMetaInfo.
[CARBONDATA-2302] - Fix some bugs when separate visible and invisible segments info into two files
[CARBONDATA-2303] - If dataload is failed for parition table then cleanup is not working.
[CARBONDATA-2307] - OOM when using DataFrame.coalesce
[CARBONDATA-2308] - Compaction should be allow when loading is in progress
[CARBONDATA-2314] - Data mismatch in Pre-Aggregate table after Streaming load due to threadset issue
[CARBONDATA-2319] - carbon_scan_time and carbon_IO_time are incorrect in task statistics
[CARBONDATA-2320] - Fix error in lucene coarse grain datamap suite
[CARBONDATA-2321] - Selecton after a Concurrent Load Failing for Partition columns
[CARBONDATA-2327] - invalid schema name _system shows when executed show schemas in presto
[CARBONDATA-2329] - Non Serializable extra info in session is overwritten by values from thread
[CARBONDATA-2333] - Block insert overwrite on parent table if any of the child tables are not partitioned on the specified partition columns
[CARBONDATA-2335] - Autohandoff is failing when preaggregate is created on streaming table
[CARBONDATA-2337] - Fix duplicately acquiring 'streaming.lock' error when integrating with spark-streaming
[CARBONDATA-2343] - Improper filter resolver cause more filter scan on data that could be skipped
[CARBONDATA-2346] - Dropping partition failing with null error for Partition table with Pre-Aggregate tables
[CARBONDATA-2347] - Fix Functional issues in LuceneDatamap in load and query and make stable
[CARBONDATA-2350] - Fix bugs in minmax datamap example
[CARBONDATA-2364] - Remove useless and time consuming code block
[CARBONDATA-2366] - Concurrent Datamap creation is failing when using hive metastore
[CARBONDATA-2374] - Fix bugs in minmax datamap example
[CARBONDATA-2386] - Query on Pre-Aggregate table is slower
[CARBONDATA-2391] - Thread leak in compaction operation if prefetch is enabled and compaction process is killed
[CARBONDATA-2394] - Setting segments in thread local space but not getting reflected in the driver
[CARBONDATA-2401] - Date and Timestamp options are not working in SDK
[CARBONDATA-2406] - Dictionary Server and Dictionary Client MD5 Validation failed with hive.server2.enable.doAs = true
[CARBONDATA-2408] - Before register to master, the master maybe not finished the start service.
[CARBONDATA-2410] - Error message correction when column value length exceeds 320000 charactor
[CARBONDATA-2413] - After running CarbonWriter, there is null directory about datamap
[CARBONDATA-2417] - SDK writer goes to infinite wait when consumer thread goes dead
[CARBONDATA-2419] - sortColumns Order we are getting wrong as we set for external table is fixed
[CARBONDATA-2426] - IOException after compaction on Pre-Aggregate table on Partition table
[CARBONDATA-2427] - Fix SearchMode Serialization Issue during Load
[CARBONDATA-2431] - Incremental data added after table creation is not reflecting while doing select query.
[CARBONDATA-2432] - BloomFilter DataMap should be contained in carbon assembly jar
[CARBONDATA-2435] - SDK dependency Spark jar
[CARBONDATA-2436] - Block pruning problem post the carbon schema restructure.
[CARBONDATA-2437] - Complex Type data loading is failing is for null values
[CARBONDATA-2438] - Remove spark/hadoop related classes in carbon assembly
[CARBONDATA-2439] - Update guava version for bloom datamap
[CARBONDATA-2440] - In SDK user can not specified the Unsafe memory , so it should take complete from Heap , and it should not be sorted using unsafe.
[CARBONDATA-2441] - Implement distribute interface for bloom datamap
[CARBONDATA-2442] - Reading two sdk writer output with differnt schema should prompt exception
[CARBONDATA-2463] - if two insert operations are running concurrently 1 task fails and causes wrong no of records in select
[CARBONDATA-2464] - Fixed OOM in case of complex type
[CARBONDATA-2465] - Improve the carbondata file reliability in data load when direct hdfs write is enabled
[CARBONDATA-2468] - sortcolumns considers all dimension also if few columns specified for sort_columns prop
[CARBONDATA-2469] - External Table must show its location instead of default store path in describe formatted
[CARBONDATA-2472] - Refactor NonTransactional table code for Index file IO performance
[CARBONDATA-2476] - Fix bug in bloom datamap cache
[CARBONDATA-2477] - No dictionary Complex type with double/date/decimal data type table creation is failing
[CARBONDATA-2479] - Multiple issue in sdk writer and external table flow
[CARBONDATA-2480] - Search mode RuntimeException: Error while resolving filter expression
[CARBONDATA-2486] - set search mode information is not updated in the documentation
[CARBONDATA-2487] - Block filters for lucene with more than one text_match udf
[CARBONDATA-2489] - Fix coverity reported warnings
[CARBONDATA-2492] - Thread leak issue in case of any data load failure
[CARBONDATA-2493] - DataType.equals() failes for complex types
[CARBONDATA-2498] - Change CarbonWriterBuilder interface to take schema while creating writer
[CARBONDATA-2503] - Data write fails if empty value is provided for sort columns in sdk
[CARBONDATA-2520] - datamap writers are not getting closed on task failure
[CARBONDATA-2538] - No exception is thrown if writer path has only lock files
[CARBONDATA-2545] - Fix some spell error in CarbonData
[CARBONDATA-2552] - Fix Data Mismatch for Complex Data type Array of Timestamp with Dictionary Include
[CARBONDATA-2555] - SDK Reader should have isTransactionalTable = false by default, to be inline with SDK writer

New Feature

[CARBONDATA-1516] - Support pre-aggregate tables and timeseries in carbondata
[CARBONDATA-2055] - Support integrating Streaming table with Spark Streaming
[CARBONDATA-2242] - Support materialized view
[CARBONDATA-2253] - Support write JSON/Avro data to carbon files
[CARBONDATA-2262] - Create table should support using carbondata and stored as carbondata
[CARBONDATA-2267] - Implement Reading Of Carbon Partition From Presto
[CARBONDATA-2276] - Support SDK API to read schema in data file and schema file
[CARBONDATA-2278] - Save the datamaps to system folder of warehouse
[CARBONDATA-2291] - Add datamap status and refresh command to sync data manually to datamaps
[CARBONDATA-2296] - Test famework should take the location of local module target folder if not integrtion module
[CARBONDATA-2297] - Support SEARCH_MODE for basic filter query
[CARBONDATA-2312] - Support In Memory catalog
[CARBONDATA-2323] - Distributed search mode using gRPC
[CARBONDATA-2371] - Add Profiler output in EXPLAIN command
[CARBONDATA-2373] - Add bloom filter datamap to support precise query
[CARBONDATA-2378] - Support enable/disable search mode in ThriftServer
[CARBONDATA-2380] - Support visible/invisible datamap for performance tuning
[CARBONDATA-2415] - All DataMap should support REFRESH command
[CARBONDATA-2416] - Index DataMap should support immediate load and deferred load when creating the DataMap

Improvement

[CARBONDATA-1663] - Decouple spark in carbon modules
[CARBONDATA-2018] - Optimization in reading/writing for sort temp row during data loading
[CARBONDATA-2032] - Skip writing final data files to local disk to save disk IO in data loading
[CARBONDATA-2099] - Refactor on query scan process to improve readability
[CARBONDATA-2139] - Optimize CTAS documentation and test case
[CARBONDATA-2140] - Presto Integration - Code Refactoring
[CARBONDATA-2148] - Use Row parser to replace current default parser:CSVStreamParserImp
[CARBONDATA-2159] - Remove carbon-spark dependency for sdk module
[CARBONDATA-2168] - Support global sort on partition tables
[CARBONDATA-2184] - Improve memory reuse for heap memory in `HeapMemoryAllocator`
[CARBONDATA-2187] - Restructure the partition folders as per the standard hive folders
[CARBONDATA-2196] - during stream sometime carbontable is null in executor side
[CARBONDATA-2204] - Access tablestatus file too many times during query
[CARBONDATA-2223] - Adding Listener Support for Partition
[CARBONDATA-2226] - Refactor UT's to remove duplicate test scenarios to improve CI time for PreAggregate create and drop feature
[CARBONDATA-2227] - Add Partition Values and Location information in describe formatted for Standard partition feature
[CARBONDATA-2230] - Add a path into table path to store lock files and delete useless segment lock files before loading
[CARBONDATA-2231] - Refactor FT's to remove duplicate test scenarios to improve CI time for Streaming feature
[CARBONDATA-2234] - Support UTF-8 with BOM encoding in CSVInputFormat
[CARBONDATA-2250] - Reduce massive object generation in global sort
[CARBONDATA-2251] - Refactored sdv failures running on different environment
[CARBONDATA-2254] - Optimize CarbonData documentation
[CARBONDATA-2255] - Should rename the streaming examples to make it easy to understand
[CARBONDATA-2256] - Adding sdv Testcases for SET_Parameter_Dynamically_Feature
[CARBONDATA-2258] - Separate visible and invisible segments info into two files to reduce the size of tablestatus file.
[CARBONDATA-2260] - CarbonThriftServer should support S3 carbon table
[CARBONDATA-2271] - Collect SQL execution information to driver side
[CARBONDATA-2285] - spark integration code refactor
[CARBONDATA-2295] - Add UNSAFE_WORKING_MEMORY_IN_MB as a configuration parameter in presto integration
[CARBONDATA-2298] - Delete segment lock files before update metadata
[CARBONDATA-2299] - Support showing all segment information(include visible and invisible segments)
[CARBONDATA-2304] - Enhance compaction performance by enabling prefetch
[CARBONDATA-2310] - Refactored code to improve Distributable interface
[CARBONDATA-2315] - DataLoad is showing success and failure message in log,when no data is loaded into table during LOAD
[CARBONDATA-2316] - Even though one of the Compaction task failed at executor. All the executor task is showing success in UI and Job fails from driver.
[CARBONDATA-2317] - concurrent datamap with same name and schema creation throws exception
[CARBONDATA-2324] - Support config ExecutorService in search mode
[CARBONDATA-2325] - Page level uncompress and Query performance improvement for Unsafe No Dictionary
[CARBONDATA-2338] - Add example to upload data to S3 by using SDK
[CARBONDATA-2341] - Add CleanUp for Pre-Aggregate table
[CARBONDATA-2353] - Add cache for DataMap schema provider to avoid IO for each read
[CARBONDATA-2357] - Add column name and index mapping in lucene datamap writer
[CARBONDATA-2358] - Dataframe overwrite does not work properly if the table is already created and has deleted segments
[CARBONDATA-2365] - Add QueryExecutor in SearchMode for row-based CarbonRecordReader
[CARBONDATA-2375] - Add CG prune before FG prune
[CARBONDATA-2376] - Improve Lucene datamap performance by eliminating blockid while writing and reading index.
[CARBONDATA-2379] - Support Search mode run in the cluster and fix some error
[CARBONDATA-2381] - Improve compaction performance by filling batch result in columnar format and performing IO at blocklet level
[CARBONDATA-2384] - SDK support write/read data into/from S3
[CARBONDATA-2390] - Refresh Lucene data map for the exists table with data
[CARBONDATA-2392] - Add close method for CarbonReader
[CARBONDATA-2396] - Add CTAS support for using DataSource Syntax
[CARBONDATA-2404] - Add documentation for using carbondata and stored as carbondata
[CARBONDATA-2407] - Removed All Unused Executor BTree code
[CARBONDATA-2414] - Optimize documents for sort_column_bounds
[CARBONDATA-2422] - Search mode Master port should be dynamic
[CARBONDATA-2448] - Adding compacted segments to load and alter events
[CARBONDATA-2454] - Add false positive probability property for bloom filter datamap
[CARBONDATA-2455] - Fix _System Folder creation and lucene AND,OR,NOT Filter fix
[CARBONDATA-2458] - Remove unnecessary TableProvider interface
[CARBONDATA-2459] - Support cache for bloom datamap
[CARBONDATA-2467] - Null is printed in the SDK writer logs for operations logged
[CARBONDATA-2470] - Refactor AlterTableCompactionPostStatusUpdateEvent usage in compaction flow
[CARBONDATA-2473] - Support Materialized View as enhanced Preaggregate DataMap
[CARBONDATA-2494] - Improve Lucene datamap size and performnace.
[CARBONDATA-2495] - Add document for bloomfilter datamap
[CARBONDATA-2496] - Chnage the bloom implementation to hadoop for better performance and compression
[CARBONDATA-2524] - Support create carbonReader with default projection

Test

[CARBONDATA-2073] - Add test cases for pre aggregate table
[CARBONDATA-2257] - Add SDV test cases for Partition with Global Sort
[CARBONDATA-2352] - Add SDV Test Cases for Partition with Pre-Aggregate table
[CARBONDATA-2356] - Adding UT for Lucene DataMap

Task

[CARBONDATA-1827] - Add Support to provide S3 Functionality in Carbondata
[CARBONDATA-1959] - Support compaction on S3 table
[CARBONDATA-1960] - Add example for creating a local table and load CSV data which is stored in S3.
[CARBONDATA-1961] - Support data update/delete on S3 table
[CARBONDATA-2135] - Documentation for Table Comment and Column Comment
[CARBONDATA-2138] - Documentation for HEADER option
[CARBONDATA-2214] - Remove config 'spark.sql.hive.thriftServer.singleSession' from installation-guide.md
[CARBONDATA-2215] - Add the description of Carbon Stream Parser into streaming-guide.md
[CARBONDATA-2259] - Add auto CI for examples
[CARBONDATA-2300] - Add ENABLE_UNSAFE_IN_QUERY_EXECUTION as a configuration parameter in presto integration
[CARBONDATA-2370] - Document for Presto cluster setup for carbondata
[CARBONDATA-2424] - Add documentation for properties of Pre-Aggregate tables
[CARBONDATA-2434] - Add ExternalTableExample and LuceneDataMapExample
[CARBONDATA-2507] - Some properties not validate in CarbonData, like enable.offheap.sort