Solved: Get Elements via using regular expressions in Arbo...

hbestler · ‎Jul 06, 2016

Hi,

We would like to collect all <simpara> elements into an array (one element per entry) when the user selects several <simpara>s.

This is for example the marked selection:

<simpara role="klein">This is a test text<emphasis role="underlined">with emphasis elements</emphasis> inside simpara</simpara><simpara>Lorem ipsum</simpara><simpara>Test test test <emphasis role="italic_on">asdfasdf</emphasis>lol omg</simpara><simpara>Test text with footnote inside<footnote><simpara>footnote text</simpara></footnote>lorem lorem</simpara><simpara>Pun<emphasis role="bold_on">asdfasdf</emphasis>kt 6</simpara>

As you can see, this example contains 5 top-level <simpara>s, which we would like to get in an array like this:

$arr[0] = <simpara role="klein">This is a test text<emphasis role="underlined">with emphasis elements</emphasis> inside simpara</simpara>

$arr[1] = <simpara>Lorem ipsum</simpara>

$arr[2] = <simpara>Test test test <emphasis role="italic_on">asdfasdf</emphasis>lol omg</simpara>

$arr[3] = <simpara>Test text with footnote inside<footnote><simpara>footnote text</simpara></footnote>lorem lorem</simpara>

...

We tried to solve this with a while loop and the "index" function, searching for '<simpara>' and '</simpara>' to extract each simpara. Unfortunately, this fails when a simpara contains a footnote (see the fourth simpara above), because the footnote itself contains a <simpara>: the text is cut off after the first </simpara>, and the rest of the outer simpara is lost.

Our other attempt, using regular expressions, does not work either, although the regex seems to be correct (tested in regexlab):

$res = match($simparas,'<simpara.*?[^footnote>]</simpara>')

message_box($res,0)

We get no result when we run this ACL code. We think the problem is the '?' (lookahead) in our regex. Could it be that lookahead does not work in ACL regular expressions?

Can anybody help us? Maybe some of you have had the same requirement, or have another idea for getting the desired result.

Thank you in advance.

Greetings from Germany

ClayHelberg · ‎Jul 06, 2016

Hi Hubert--

You'll have better luck thinking about this in terms of a document tree, rather than as markup strings. Let the parser worry about matching start tags and end tags, and just focus on elements.
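
To see why string matching is fragile here, the following is a small sketch in Python rather than ACL. It uses Python's re module, which is not the ACL regex engine, so it says nothing about why the ACL match() call returned no result. It only illustrates two points about Perl-style patterns: the '?' in '.*?' makes the quantifier lazy (it is not a lookahead), and '[^footnote>]' is a character class that excludes the single characters f, o, t, n, e and '>', not the string "footnote>". The helper name find_simparas and the shortened sample strings are made up for this illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = r'<simpara.*?[^footnote>]</simpara>'


def find_simparas(markup):
    """Return every non-overlapping match of PATTERN in a markup string."""
    return re.findall(PATTERN, markup)


# Shortened sample: one plain simpara, then one with a footnote inside.
with_text = (
    '<simpara>Lorem ipsum</simpara>'
    '<simpara>Test text with footnote inside<footnote>'
    '<simpara>footnote text</simpara></footnote>lorem lorem</simpara>'
)

# Same markup, but the footnote now ends in "y" instead of "t".
with_body = with_text.replace('footnote text', 'footnote body')

# "footnote text" ends in "t", which the character class excludes, so the
# inner </simpara> is skipped and the outer element is matched in full.
matches_text = find_simparas(with_text)

# "footnote body" ends in "y", which the class allows, so the match stops
# at the inner </simpara> and the outer element is truncated.
matches_body = find_simparas(with_body)

for m in matches_text + matches_body:
    print(m)
```

In other words, the pattern only appears to handle the footnote case because of the last letter of the footnote text, which is why a tree-based approach is more reliable.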

You should be able to get what you want with something like this:

function getSimParas(simparas[]) {
    # start with a clean array
    delete(simparas);
    # create a temp array to store simpara oids
    local oids[], o;
    # grab all the simparas
    xpath_nodeset(oids, "//simpara");
    # now iterate over the oids and store the content of each
    for (o in oids) {
        simparas[o] = oid_content(oids[o], 0x1); # 0x1 flag = include tags
    }
}

Call this function, and the array simparas will be filled with the markup for each of the simpara elements. Note that this will include the nested one inside the footnote (as a separate entry in the array). If you don't want to include the nested ones, change the XPath expression to something like this:

xpath_nodeset(oids, "//simpara[not(ancestor::simpara)]");

If you really want the elements, and not the markup, then it's even easier: you can just return the contents of the oids array and omit the final for loop.
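
The difference between the two selections (all simparas versus only the outer ones) can be tried outside Arbortext with a short Python sketch. This is an illustration under assumptions, not ACL: it uses Python's xml.etree.ElementTree, wraps the fragment in a made-up <selection> root so that it parses, and uses findall("simpara") on that root as a stand-in for the not(ancestor::simpara) filter, because ElementTree does not support the ancestor axis. That stand-in is only equivalent when the outer simparas are direct children of the wrapper. The helper element_markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened version of the selection from the question.
fragment = (
    '<simpara role="klein">This is a test text'
    '<emphasis role="underlined">with emphasis elements</emphasis>'
    ' inside simpara</simpara>'
    '<simpara>Test text with footnote inside<footnote>'
    '<simpara>footnote text</simpara></footnote>lorem lorem</simpara>'
)


def element_markup(el):
    """Serialize one element with its tags, leaving out any text after it."""
    tail, el.tail = el.tail, None
    try:
        return ET.tostring(el, encoding="unicode")
    finally:
        el.tail = tail


# Hypothetical wrapper element so the fragment has a single root.
root = ET.fromstring("<selection>" + fragment + "</selection>")

# Every simpara in document order, nested ones included (like //simpara).
all_simparas = [element_markup(e) for e in root.iter("simpara")]

# Only simparas that are direct children of the wrapper, so the one
# inside the footnote is left out.
top_level = [element_markup(e) for e in root.findall("simpara")]

for markup in top_level:
    print(markup)
```

Because the parser matches start and end tags, the simpara containing the footnote comes back whole, and the nested one shows up only in the first list.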

--Clay

hbestler · ‎Jul 08, 2016

Hi Clay,

Thank you for your help. In the meantime we have found a solution.

With selection_start and selection_end we get the start and end positions of the selection. We then use oid_next and oid_content to collect all simpara elements, looping with a while loop until we reach the OID at the selection_end position.

Get Elements via using regular expressions in Arbortext Editor 6.1
