The objective is to programmatically remove a specific string containing special characters from a file.
The problem: I can remove everything leading up to the forward slash, but as soon as I include a forward slash in the $stringToRemove variable, the program fails to perform the substitution with the value "test". Ultimately I will replace "test" with an empty space. The lack of debugging tools makes this a bit tricky to solve.
# $stringToRemove = '<!string1 % string2 "string3//string4/string5"> %string6; '
$Double_Quote_Symbol = chr(34)
$Percent_Symbol = chr(37)
$Forward_Slash_Symbol = chr(47)
$stringToRemove = "string1 " . $Percent_Symbol . "string2" . $Double_Quote_Symbol . "string3//string4/string5"
execute("substitute -a -c -noe -ws -noq /" . $stringToRemove . "/test")
Note: this is performed on an XML file opened as a text file using "edit -untagged".
I've attempted:
1) using chr() with the decimal value of a forward slash, to ensure it is treated as a string value rather than as a built-in special function.
2) using the quote() function in an attempt to return the string value, so that built-in functions would not misread the forward slash in the string. Unsuccessful.
$stringToRemove = quote('<!string1 % string2 "string3//string4/string5"> %string6;')
3) building the string piece by piece and escaping each forward slash with a backslash, which did not work for me. Example:
$stringToRemove = $string1 . $string2 . $string3 . "\/\/" . $string4
I noticed in the 'substitute' docs that ACL may perform what is essentially a tag balancing check. Does ACL interpret this forward slash in the string as an end tag and fail because of that? All I hope to accomplish is to ensure this string can be removed programmatically.
Hello again. It's not 100% clear what you're trying to achieve. Are you looking to rename some elements across a whole bunch of files? If so, then the "Arbortext way" is not to use this sort of script. You would either use the oid_XXX functions to implement a recursive tree walker algorithm, or a simple XSLT to apply the markup transformation.
If you're just looking to do a string replace then this works to replace all ABC/XYZ with ZZZ: subs -a -c -ws "ABC/XYZ"ZZZ"
If you want to do some basic markup replacement then this works to rename <abbrev> tags to <acronym>, but again is not really the Arbortext way: subs -a -c -ws -m "<abbrev>(.*)</abbrev>"<acronym>\1</acronym>"
@GarethOakes You have been a life saver with all the help, thank you.
So, in conclusion to this particular issue: I noticed that instead of forward slashes you used double quotes as the delimiters that tell the ACL command where the oldtext and newtext parameters begin and end.
That was the ah-ha moment. Because the execute function interprets the string passed into it, the forward slashes in the string I wanted to replace conflicted with the forward slashes in the ACL command syntax.
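To illustrate that conflict, here is a minimal Python sketch of a generic sed-style argument of the form <delimiter>oldtext<delimiter>newtext. It is only a model under that assumption, not ACL's actual parser, and the helper names split_substitute_args and pick_delimiter are hypothetical.

```python
from typing import Tuple


def split_substitute_args(arg: str) -> Tuple[str, str]:
    """Model of a naive parser: the first character is the delimiter,
    the first field is oldtext and the second field is newtext."""
    delim = arg[0]
    fields = arg[1:].split(delim)
    if len(fields) < 2:
        raise ValueError("newtext is missing")
    # Anything after the second field is ignored by this naive model.
    return fields[0], fields[1]


def pick_delimiter(old: str, new: str, candidates: str = '/"|#~') -> str:
    """Return the first candidate character that occurs in neither string."""
    for c in candidates:
        if c not in old and c not in new:
            return c
    raise ValueError("no safe delimiter among the candidates")


old = "string3//string4/string5"

# With "/" as the delimiter, the slashes inside oldtext split it early.
broken = split_substitute_args("/" + old + "/test")   # ('string3', '')

# A delimiter that does not occur in either string keeps oldtext whole.
delim = pick_delimiter(old, "test")                   # '"'
fixed = split_substitute_args(delim + old + delim + "test")
```

In this model the slash-delimited form yields oldtext "string3" and an empty newtext, while the quote-delimited form recovers the full search string. That matches the behaviour seen above, where everything up to the first forward slash could be matched and nothing after it.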
The string I wanted to delete is the entity declaration up to the ending semicolon. (The format is exact, but for work reasons I have replaced the sensitive info.)
<!DOCTYPE pm [
<!ENTITY % ISOEntities PUBLIC "aaaa//bbbb//cccc//dddd" "eeee//ffff/gggg/hhhh/iiii"> %ISOEntities;
]>
So it may very well be that I should be using the oid functions to iterate over the DOCTYPE declaration. Judging by the square brackets, it appears to hold a collection of declarations.
I can see now that if there are multiple items within the DOCTYPE pm brackets and I simply substitute a line with an empty space, there could be a problem with the formatting of the XML in that particular collection. I'm not certain whether this will actually be an issue, but I will be experimenting with it next!
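As a rough way to reason about that concern outside Arbortext, the Python sketch below removes the declaration as a literal string and leaves any neighbouring declarations untouched. It is an assumption-laden illustration, not an ACL solution: the second entity (fig1) is made up for the example, and remove_declaration is a hypothetical helper.

```python
# Placeholder declaration from this thread, followed by its parameter-entity reference.
target = ('<!ENTITY % ISOEntities PUBLIC "aaaa//bbbb//cccc//dddd" '
          '"eeee//ffff/gggg/hhhh/iiii"> %ISOEntities;')

# Hypothetical DOCTYPE with a second, unrelated declaration in the same brackets.
doctype = (
    "<!DOCTYPE pm [\n"
    + target + "\n"
    + '<!ENTITY fig1 SYSTEM "fig1.png" NDATA png>\n'
    + "]>\n"
)


def remove_declaration(text: str, declaration: str) -> str:
    """Delete the literal declaration; drop its line break too when it
    sits on a line of its own, so no blank line is left behind."""
    text = text.replace(declaration + "\n", "")
    return text.replace(declaration, "")


cleaned = remove_declaration(doctype, target)
```

Because the match is literal rather than a pattern, the slashes, percent sign and quotes in the declaration need no escaping, and the remaining declaration stays between the square brackets.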
I just thought that opening the file as text to remove this single string from all files would be the quickest way to add value to the operation. I'm only about a month into ACL, so still learning!
This is the resulting successful solution:
$stringToRemove_ISOEntities = '<!ENTITY % ISOEntities PUBLIC " '
$stringToRemove_ISOEntities_2 = 'aaaa//bbbb//cccc//dddd'
$stringToRemove_ISOEntities_3 = '" "eeee//ffff/gggg/hhhh/iiii"'
$stringToRemove_ISOEntities_4 = '"> %ISOEntities;'
execute('substitute -a -c -ws -m "' . $stringToRemove_ISOEntities . '"' . 'test1-"' )
execute('substitute -a -c -ws -m /' . $stringToRemove_ISOEntities_2 . '/' . 'test2-' )
execute('substitute -a -c -ws -m "' . $stringToRemove_ISOEntities_3 . '"' . 'test3-"')
execute('substitute -a -c -ws -m /' . $stringToRemove_ISOEntities_4 . '/' . 'test4-' )
output:
<!DOCTYPE pm [test1-test2-test3-test4-
]>
Ah OK, the DOCTYPE is particularly awkward, as it's not a regular tag or PI. Entities in particular are a bit of a pain with XML because they are replaced during parsing, so you don't really get access to them with most XML APIs. I think Arbortext has a way to deal with it programmatically, but I've not tried it. If your search and replace works across your data set then that is probably easiest! I guess the edit -untagged trick would be enough to get around most of the issues, so you're probably pretty safe. Glad to have helped!