Solved: Re: Extract information from text data file

GR_13078482 · ‎Apr 01, 2025

I have an ASCII text data file that has 3000+ lines of information. I am attempting to extract some values from the first "column" of a group of lines near the end of the file. The lines are bracketed by known labels.

I have attempted to do this using the READFILE command in the following approach:

FileDat := READFILE("path to file", "delimited", 1, [1,1])

startline:= match("Data Starts Here", FileDat) + 1

endline:= match("Data Ends Here", FileDat) -1

data:= READFILE("path to file", "delimited", [startline, endline],[1,1],1)

I have two questions:

1) Is there a way to match the full expression, including spaces, in the data file? Right now, it is not matching the "Data Starts Here" - it is only getting the first word "Data". I can correct this by putting underscores in the expression to be matched, and in the data file, but that is a less desirable workaround.

2) I note that the first "READFILE" generates an array (FileDat) that pulls in the first column (or first word or value) from each line in the text file. However, it does not include the blank lines that are in that file. That isn't necessarily a problem, but it does make it so that the startline and endline are based on this "compressed" line numbering. When we get to the second READFILE (where I am trying to extract the actual data I want - which is a set of time points that are the first word/value on the targeted lines), the line numbers from the actual file are used, which include the blank lines. I want to be able to use this for different data files, which do not have predictable line numbering. Is there a way to preserve the full line numbers that include the blank lines in the first READFILE statement, or to be able to reference the compressed line numbering in the second statement?

Werner_E · ‎Apr 01, 2025

You may try to use READPRN.

You could also post a demo Prime file along with a demo data file here, showing what your data file looks like and what you would like to achieve.

It may also be possible to read in the file using READBIN and write a parser in Prime to extract the desired data.

Using READFILE with "delimited" makes Prime choose the space as the data delimiter, which is why the string "Data" is seen as the first data item read in.

Depending on the data file it might also be possible to use READTEXT or READCSV.

GR_13078482 · ‎Apr 01, 2025

Here are some sample files showing what I am attempting to do. The data file has been cut down significantly from my actual application, but it has a representative sample of the information and format that I am using.

I provided notes in the Mathcad sheet with additional information of what I am trying to accomplish.

Thanks!

Werner_E · ‎Apr 01, 2025

As a workaround, and to avoid having to write a parser for the data extracted by READBIN, you can detect the correct line numbers via READCSV and then use READFILE as you did.

I replaced the underscores in your data file with spaces and it works OK.

This should work as long as there are no commas in your data file.

Another option is to use READTEXT with a delimiter character that certainly does not occur in the file (I used @).

Remark: You could also use READFILE to search for the text with spaces if you use "fixed" with a large enough column width, but it's of no use because READFILE ignores empty lines when reading in the file, so the line numbers are wrong, as you already noticed.
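
For readers who want to check the logic outside Prime, here is a rough Python sketch of the same idea: read every line whole so that blank lines keep their place in the numbering, find the two labels, then take the first value on each line in between. This is only an illustration, not the Prime worksheet. The function names and the sample text are made up, and it assumes each label sits alone on its line apart from surrounding spaces.

```python
def find_label_lines(lines, start_label, end_label):
    """Return the 1-based line numbers of both labels, counting blank lines."""
    start = end = None
    for number, line in enumerate(lines, start=1):
        text = line.strip()  # assumption: label is alone on its line
        if start is None and text == start_label:
            start = number
        elif start is not None and text == end_label:
            end = number
            break
    if start is None or end is None:
        raise ValueError("start or end label not found")
    return start, end


def first_column_between(lines, start_label, end_label):
    """Return the first value of every non-blank line between the labels."""
    start, end = find_label_lines(lines, start_label, end_label)
    values = []
    # lines[start] is the line after the start label (list is 0-based),
    # and the slice stops just before the end label.
    for line in lines[start:end - 1]:
        fields = line.split()
        if fields:  # skip blank lines inside the block
            values.append(float(fields[0]))
    return values


# Hypothetical sample, shaped like the file described in this thread.
SAMPLE = (
    "Header line\n"
    "\n"
    "Data Starts Here\n"
    "0.0 1.5 2.5\n"
    "\n"
    "0.1 1.6 2.6\n"
    "0.2 1.7 2.7\n"
    "Data Ends Here\n"
    "\n"
    "Footer\n"
)

sample_lines = SAMPLE.splitlines()  # keeps empty lines as ""
print(find_label_lines(sample_lines, "Data Starts Here", "Data Ends Here"))
print(first_column_between(sample_lines, "Data Starts Here", "Data Ends Here"))
```

Because the blank lines stay in the list, the line numbers found here are the real file line numbers, which is what the second READFILE call needs.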

terryhendicott · ‎Apr 01, 2025

Thanks Werner,

I often use a lot of text file input to Prime. I have a small C++ program that deletes blank lines so I can use READTEXT.

Using READCSV first is a better technique, thank you.

Cheers

Terry

Werner_E · ‎Apr 01, 2025

Maybe READFILE with "fixed" and a large column width is preferable, as it should read every line into a string and will not choke on commas as READCSV possibly will.

We can also use READBIN and split at CR(LF) = 13(10).

But doing so also preserves leading spaces. Sometimes this can be desirable, but sometimes (as in this thread) it can be bothersome.

So searching for the correct line number here would require using a different search strategy, providing the correct number of leading spaces in the search string, or modifying READLN so that leading (and maybe also trailing) spaces are skipped.
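
To make the leading-space issue concrete, here is a small Python sketch of the byte-splitting approach. It is only an illustration of the logic, not the Prime worksheet. The function names and sample bytes are hypothetical, and it assumes the file is plain ASCII.

```python
def split_lines(raw: bytes):
    """Split raw file bytes at CR LF (13 10), bare CR or bare LF, keeping empty lines."""
    text = raw.decode("ascii")  # assumption: plain ASCII file
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")


def find_line(lines, label, ignore_spaces=True):
    """Return the 1-based number of the first line equal to label, or None."""
    for number, line in enumerate(lines, start=1):
        candidate = line.strip(" ") if ignore_spaces else line
        if candidate == label:
            return number
    return None


# Hypothetical sample with a blank line and an indented label.
raw = b"Header\r\n\r\n   Data Starts Here\r\n1.0 2.0\r\n"
lines = split_lines(raw)

print(find_line(lines, "Data Starts Here"))                       # spaces skipped
print(find_line(lines, "Data Starts Here", ignore_spaces=False))  # exact match only
```

With the spaces skipped the label is found on line 3. With an exact match the search fails unless the search string carries the same three leading spaces.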