Parquet Data Format used in ThingWorx Analytics
Starting with ThingWorx Analytics version 8.1, data storage no longer requires the installation of a PostgreSQL database. Instead, uploaded CSV data is converted to the optimized Apache Parquet format and stored directly in the file system.
This blog post explains some of the features of Apache Parquet that justify this transition in ThingWorx Analytics data storage.
What is Apache Parquet:
Apache Parquet is a column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.
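To make the compression claim more concrete, here is a minimal Python sketch of why a column layout tends to compress well. It is not Parquet's actual implementation: Parquet's real encodings (dictionary encoding, a run-length/bit-packing hybrid, and others) are more elaborate. The sample readings and the helper names to_columns and run_length_encode are made up for illustration.

```python
from itertools import groupby

# Hypothetical sensor readings, one tuple per row: (device, status, temperature)
rows = [
    ("pump-1", "OK", 20.5),
    ("pump-1", "OK", 20.7),
    ("pump-1", "OK", 20.6),
    ("pump-2", "WARN", 35.1),
    ("pump-2", "WARN", 35.4),
]

def to_columns(rows):
    """Pivot a list of row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]

def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

device, status, temperature = to_columns(rows)

# Neighbouring values in one column often repeat, so they collapse well.
encoded_device = run_length_encode(device)
encoded_status = run_length_encode(status)
print(encoded_device)
print(encoded_status)
```

In the row layout the repeated device names are separated by unrelated values, so this kind of encoding would find no runs. Once the values of a column sit next to each other, five device entries shrink to two pairs.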
Below is an illustration of the Columnar Storage model:
Apache Parquet Features and Benefits:
Apache Parquet is built on the record shredding and assembly algorithm described in Google's Dremel paper, which allows complex, nested data structures to be stored in columnar form.
Apache Parquet stores data so that the values of each column are physically stored next to each other. Due to the columnar storage, Apache Parquet provides the following benefits:
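The idea behind record shredding and assembly can be sketched in a few lines of Python. This is a heavily simplified illustration for a single repeated field, not the full algorithm and not Parquet's on-disk layout: the field name tags, the functions shred and assemble, and the sample records are all assumptions made for the example. Each stored value carries a repetition level (0 starts a new record, 1 continues the current list) and a definition level (0 means the list was empty, 1 means a value is present).

```python
def shred(records):
    """Flatten the repeated 'tags' field into (value, rep_level, def_level) triples."""
    column = []
    for record in records:
        tags = record["tags"]
        if not tags:
            # Empty list: emit a placeholder so the record boundary survives.
            column.append((None, 0, 0))
        for i, tag in enumerate(tags):
            # rep_level 0 starts a new record, 1 continues the current list.
            column.append((tag, 0 if i == 0 else 1, 1))
    return column

def assemble(column):
    """Rebuild the original records from the shredded column."""
    records = []
    for value, rep_level, def_level in column:
        if rep_level == 0:
            records.append({"tags": []})
        if def_level == 1:
            records[-1]["tags"].append(value)
    return records

records = [{"tags": ["a", "b"]}, {"tags": []}, {"tags": ["c"]}]
column = shred(records)
restored = assemble(column)
print(column)
print(restored == records)
```

The two levels are what let a flat column of values be turned back into nested records without storing the nesting itself.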
Some advantages of using Parquet for ThingWorx Analytics:
Apart from the above benefits of using Parquet, which amount to higher efficiency and increased performance, below are some advantages that apply specifically to ThingWorx Analytics:
The illustration below shows the transition from the row-based data storage model to the columnar storage model of Parquet:
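The same transition can also be sketched in code. The Python snippet below is only an illustration of the idea, not how ThingWorx Analytics performs the conversion: it reads a small, made-up CSV (row-based) and regroups it into one list per column, so that a calculation on a single column never touches the others. The column names and the function csv_to_columns are hypothetical.

```python
import csv
import io

# Hypothetical uploaded CSV data: each line is one complete row.
csv_text = """device,temperature,pressure
pump-1,20.5,1.2
pump-2,35.1,0.9
pump-3,22.0,1.1
"""

def csv_to_columns(text):
    """Regroup row-based CSV text into a dict of column name -> list of values."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

columns = csv_to_columns(csv_text)

# Only the 'temperature' column is read for this calculation.
temperatures = [float(v) for v in columns["temperature"]]
mean_temperature = sum(temperatures) / len(temperatures)
print(columns["device"])
print(round(mean_temperature, 2))
```

With the row-based layout, computing the mean would require scanning every full row. With the column-based layout, only the one column that is needed is read.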