
Store large CSV multi-column files in a ValueStream

plpt
7-Bedrock


I want to import a large amount of timestamped data into a ValueStream. The files are CSV with multiple columns and 100k+ rows per file. It seems that CSVParser is the way to go, but it doesn't fit my needs. There are several problems:

  • The result of the parser is a 500k-row infoTable, and infoTables are not suited to that amount of data.
  • I don't know how to archive this data into a ValueStream. According to this article, I can either:
    • save values row by row with SetPropertyValue, but a for loop over 500k rows is far too slow, and I cannot save the timestamps of my historical values;
    • use UpdatePropertyValues, but I can only import one column, and the required dataShape doesn't match the infoTable returned by CSVParser.

So, how can I import such a large data history into a ValueStream?

Could a JSON format be easier?

1 ACCEPTED SOLUTION


Hello @plpt,

 

It depends on several factors -- how often you're going to do it, whether you have access to the database and filesystem, etc. There are many possible solutions; here are just a few ideas:

 

  1. Unless you need to load such files regularly, the most straightforward approach would be to disable the service timeout (set it to 12 hours, for example) and do what you described above -- use an extension to parse the CSV files into a large infotable (increase -Xmx, if needed) and then call AddXXXValueStreamEntry in a loop to populate the stream;
  2. Load the CSV data directly into the database (make sure the server is stopped before doing that, update sequences if needed, and generally speaking be very careful with this approach);
  3. Parse the CSV on the command line, e.g. via AWK, use cURL to call AddXXXValueStreamEntry, and leave it running overnight. This is probably the safest and easiest thing to do if you don't have full control over your hosting (a sketch of this idea follows the list);
  4. Write an extension that processes the CSV in streaming mode and calls AddXXXValueStreamEntry for you (there's probably a lower-level API you could use to make it even faster).
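
Here is a minimal sketch of option 3, written in Python instead of AWK + cURL so the whole flow fits in one place. Everything named in it is an assumption rather than something taken from your setup: the server URL, the Thing DataLoggerThing with one logged NUMBER property per CSV column, the "timestamp" column in the file, and the AddNumberValueStreamEntry parameter names -- check the actual service definition in Composer before running anything like this.

# Rough sketch, not production code: stream a CSV and push each row into the
# ValueStream through the ThingWorx REST API (one service call per value),
# which is essentially what the AWK + cURL approach does.
# Assumptions: a Thing named "DataLoggerThing" with a logged NUMBER property
# per CSV column, an app key with write permission, and the parameter names
# used below for AddNumberValueStreamEntry (verify them in Composer).

import csv
import requests

BASE_URL = "https://my-thingworx-host/Thingworx"   # assumption: your server URL
THING = "DataLoggerThing"                          # assumption: target Thing name
APP_KEY = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # assumption: a valid app key

HEADERS = {
    "appKey": APP_KEY,
    "Content-Type": "application/json",
    "Accept": "application/json",
}

def push_row(property_name, value, timestamp):
    """Call AddNumberValueStreamEntry on the Thing for a single value."""
    url = f"{BASE_URL}/Things/{THING}/Services/AddNumberValueStreamEntry"
    payload = {
        "propertyName": property_name,  # assumed parameter name
        "value": float(value),          # assumed parameter name
        "timestamp": timestamp,         # assumed parameter name, e.g. ISO-8601 string
    }
    response = requests.post(url, json=payload, headers=HEADERS, timeout=30)
    response.raise_for_status()

with open("history.csv", newline="") as f:
    reader = csv.DictReader(f)      # header row supplies the column names
    for row in reader:
        ts = row.pop("timestamp")   # assumption: the CSV has a "timestamp" column
        for column, value in row.items():
            push_row(column, value, ts)

Left running overnight, a script like this never hits a service timeout, since each REST call is tiny; if it turns out to be too slow, the usual tweaks are batching rows or running a few workers in parallel.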

 

JSON format won't help you much, I'm afraid.

 

Regards,
Constantine


2 REPLIES


Thank you Constantine, it's very clear now!

 

Philippe
