gfontana1 (1-Visitor)
November 4, 2022
Solved

Create parquet file from Thingworx Extension


Hello everyone,

I've created an extension that converts a file from CSV to Parquet and sends it to Event Hub.

The Parquet file is created using the spark-3.3.0-bin-hadoop2 libraries, which are installed on the file system and referenced through an environment variable.

This works well, but I'm wondering whether it is possible to do the same thing by loading spark-3.3.0-bin-hadoop2 as a ThingWorx extension. In that case, how can I reference the library in an environment variable?

Answering this question could solve the problem when ThingWorx is installed in a cloud environment.
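On the environment-variable part of the question: a running JVM cannot change its own environment. `System.getenv()` returns an unmodifiable map and the JDK has no setter, so an extension cannot define the variable for itself; it has to be set before the ThingWorx server process starts, which a cloud deployment may not allow. A JVM system property is the closest in-process substitute. The following is a minimal sketch, assuming a hypothetical `LibraryLocator` helper: it looks for a library directory first in an environment variable, then in a system property, and finally falls back to a directory supplied by the caller (for example, one shipped inside the extension). The variable and property names you pass in are placeholders, not ThingWorx or Spark settings.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: resolves a library directory without requiring an environment variable. */
final class LibraryLocator {
    private LibraryLocator() {}

    /**
     * Lookup order assumed for this sketch:
     * 1. environment variable (must be set before the JVM starts),
     * 2. JVM system property (can be set from inside the running process),
     * 3. fallback directory supplied by the caller, e.g. one bundled with the extension.
     */
    static Path resolve(String envVar, String sysProp, Path fallback) {
        return existingDir(System.getenv(envVar))
                .or(() -> existingDir(System.getProperty(sysProp)))
                .orElse(fallback);
    }

    /** Returns the value as a Path only if it names an existing directory. */
    private static Optional<Path> existingDir(String value) {
        if (value == null || value.isBlank()) {
            return Optional.empty();
        }
        Path path = Paths.get(value);
        return Files.isDirectory(path) ? Optional.of(path) : Optional.empty();
    }
}
```

For example, `LibraryLocator.resolve("MY_LIB_DIR", "my.lib.dir", bundledDir)` (all three names hypothetical) lets the same extension code run unchanged on a server where the variable is set and on one where it is not.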


Thank you

Best answer by VladimirRosu_116627



VladimirRosu_116627 (19-Tanzanite)
November 8, 2022

Is the code you're running with that library referencing an existing Hadoop installation? In other words, does that library need access to an existing Apache Hadoop installation?
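One way to check this from inside the extension is to test whether the Hadoop classes the Parquet writer needs can be loaded by the extension's own class loader; if they can, those classes are already visible to the extension, whether from its bundled JARs or from the platform. A minimal sketch, assuming a hypothetical `ClasspathCheck` helper. `org.apache.hadoop.conf.Configuration` is a real Hadoop class and a reasonable probe, but which classes a given writer actually depends on is an assumption you would need to confirm.

```java
/** Hypothetical helper: reports whether a class is visible to a given class loader. */
final class ClasspathCheck {
    private ClasspathCheck() {}

    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize = false: locate and load the class without running its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: no such class on the loader's class path.
            // LinkageError (e.g. NoClassDefFoundError for a missing superclass):
            // the class file exists but cannot be loaded.
            return false;
        }
    }
}
```

From an extension class this might be called as `ClasspathCheck.isLoadable("org.apache.hadoop.conf.Configuration", getClass().getClassLoader())`.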

gfontana1 (1-Visitor, Author)
November 9, 2022

Hello Vladimir,

I've discovered that, if you include the right library in your project, you don't actually need to reference the Hadoop installation.

It's a bit of a workaround: the Hadoop installation is always looked up when the Parquet file is generated, but if it's missing, the writer uses the JARs bundled locally, so the extension works well.
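This matches how Hadoop looks for an installation: its `org.apache.hadoop.util.Shell` class reads the `hadoop.home.dir` system property and, if that is unset, the `HADOOP_HOME` environment variable. In recent Hadoop versions a missing value is generally tolerated for local file writes on Linux (on Windows, Hadoop additionally looks for `winutils.exe`), which fits the behaviour described above. The sketch below is a hypothetical diagnostic helper, not Hadoop's own code; it only mirrors that lookup order so an extension can log which case applies.

```java
import java.util.Optional;

/** Hypothetical diagnostic helper mirroring Hadoop's Hadoop-home lookup order. */
final class HadoopHomeCheck {
    static final String SYS_PROP = "hadoop.home.dir";
    static final String ENV_VAR = "HADOOP_HOME";

    private HadoopHomeCheck() {}

    /** System property first, then environment variable; empty if neither is set. */
    static Optional<String> configuredHome() {
        String home = System.getProperty(SYS_PROP);
        if (home == null || home.isBlank()) {
            home = System.getenv(ENV_VAR);
        }
        return (home == null || home.isBlank()) ? Optional.empty() : Optional.of(home);
    }

    /** One-line message suitable for an extension's startup log. */
    static String describe() {
        return configuredHome()
                .map(home -> "External Hadoop home configured: " + home)
                .orElse("No Hadoop home configured; using Hadoop JARs on the extension class path");
    }
}
```

Because system properties are JVM-wide, setting `hadoop.home.dir` from an extension would affect every other component in the same ThingWorx server; relying on the bundled JARs avoids that side effect.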

Thank you for your answer and have a nice day 

Giorgio Fontana

VladimirRosu_116627 (19-Tanzanite)
November 9, 2022

Very good information. So, did you manage to use that library without any conflicts with the built-in ThingWorx JAR libraries? External JAR libraries usually have their own dependencies, which can (but do not necessarily) conflict with the ThingWorx libraries. I looked at the library you mentioned, and it pulls in 240 JAR files as dependencies. If it works as an extension, this means that all those libraries are already included in ThingWorx, which is a very lucky situation.
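One rough way to look for the conflicts mentioned here is to compare JAR file names: the same artifact present in both the platform and the extension with different versions is a candidate for a clash. A minimal sketch, assuming a hypothetical `JarOverlap` helper and Maven-style file names such as `guava-31.1-jre.jar`; it is a name-based heuristic only and will miss shaded or renamed JARs, as well as classes that collide across different artifacts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flags artifacts present in two JAR sets with different versions. */
final class JarOverlap {
    // Artifact name (shortest prefix), then "-", then a version starting with a digit.
    private static final Pattern JAR_NAME = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    private JarOverlap() {}

    /** Maps artifact name to version; names without a recognizable version map to "". */
    static Map<String, String> index(List<String> jarNames) {
        Map<String, String> result = new HashMap<>();
        for (String name : jarNames) {
            Matcher m = JAR_NAME.matcher(name);
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            } else {
                result.put(name.replaceFirst("\\.jar$", ""), "");
            }
        }
        return result;
    }

    /** Lists artifacts found in both sets with differing versions, sorted by name. */
    static List<String> versionConflicts(List<String> platformJars, List<String> extensionJars) {
        Map<String, String> platform = index(platformJars);
        Map<String, String> extension = index(extensionJars);
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, String> e : extension.entrySet()) {
            String platformVersion = platform.get(e.getKey());
            if (platformVersion != null && !platformVersion.equals(e.getValue())) {
                conflicts.add(e.getKey() + ": platform=" + platformVersion + ", extension=" + e.getValue());
            }
        }
        Collections.sort(conflicts);
        return conflicts;
    }
}
```

The two lists could, for instance, be the file names in the ThingWorx web application's `WEB-INF/lib` directory and the JARs packaged with the extension.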