Get elements using regular expressions in Arbortext Editor 6.1

hbestler
12-Amethyst

Hi,

We would like to collect all <simpara> elements from a user's selection into an array, one element per entry.

This is for example the marked selection:

<simpara role="klein">This is a test text<emphasis role="underlined">with emphasis elements</emphasis> inside simpara</simpara><simpara>Lorem ipsum</simpara><simpara>Test test test <emphasis role="italic_on">asdfasdf</emphasis>lol omg</simpara><simpara>Test text with footnote inside<footnote><simpara>footnote text</simpara></footnote>lorem lorem</simpara><simpara>Pun<emphasis role="bold_on">asdfasdf</emphasis>kt 6</simpara>

As you can see, this example contains five top-level <simpara> elements, which we would like to get in an array like this:

$arr[0] = <simpara role="klein">This is a test text<emphasis role="underlined">with emphasis elements</emphasis> inside simpara</simpara>

$arr[1] = <simpara>Lorem ipsum</simpara>

$arr[2] = <simpara>Test test test <emphasis role="italic_on">asdfasdf</emphasis>lol omg</simpara>

$arr[3] = <simpara>Test text with footnote inside<footnote><simpara>footnote text</simpara></footnote>lorem lorem</simpara>

...

We tried to solve this with a while loop and the "index" function, searching for '<simpara>' and '</simpara>' to extract each simpara. Unfortunately, this fails when a simpara contains a footnote (see the fourth simpara above), because the footnote has its own <simpara> inside and the text is cut off at the first </simpara>.

Our other attempt, using regular expressions, does not work either, although the regex appears to be correct (tested in regexlab):

$res = match($simparas,'<simpara.*?[^footnote>]</simpara>')

message_box($res,0)

We get no result when we run this ACL code. We suspect the problem is the '?' (non-greedy quantifier) in our regex. Could it be that ACL's regex engine does not support it?

Can anybody help us? Maybe some of you had the same request or another idea to get the desired result.

Thank you in advance.

Greetings from Germany

ACCEPTED SOLUTION

Hi Clay,

thank you for your help. Meanwhile we have a solution for that.

Using selection_start and selection_end, we get the start and end positions of the selection. Then we use oid_next and oid_content in a while loop to collect the simpara elements until we reach the OID at the selection_end position.
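The walk described above can be sketched roughly as follows. This is Python with `xml.etree.ElementTree`, not ACL: the thread does not show the ACL code or the exact signatures of selection_start, selection_end, oid_next and oid_content, so nothing here reproduces them. The helper name `collect_between` and the sample document are hypothetical, and the selection is assumed to start and end on sibling elements.

```python
import xml.etree.ElementTree as ET

def collect_between(parent, start, end, tag="simpara"):
    """Walk the children of `parent` from `start` through `end` (inclusive)
    and return those whose tag is `tag`. Hypothetical stand-in for looping
    with oid_next until the selection-end element is reached."""
    collected = []
    inside = False
    for child in parent:
        if child is start:
            inside = True          # reached the first selected element
        if inside and child.tag == tag:
            collected.append(child)
        if child is end:
            break                  # stop at the last selected element
    return collected

# Made-up document: a title followed by four sibling paragraphs.
doc = ET.fromstring(
    "<section><title>T</title>"
    "<simpara>one</simpara><simpara>two</simpara>"
    "<simpara>three</simpara><simpara>four</simpara></section>"
)
paras = doc.findall("simpara")

# Pretend the user selected the second and third paragraphs.
selected = collect_between(doc, paras[1], paras[2])
texts = [p.text for p in selected]
print(texts)
```

Because the loop only visits siblings, a simpara nested inside a footnote is never collected as a separate entry.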

2 REPLIES

Hi Hubert--

You'll have better luck thinking about this in terms of a document tree, rather than as markup strings. Let the parser worry about matching start tags and end tags, and just focus on elements.
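As a rough illustration of why string matching is fragile here, the sketch below runs the pattern from the question through Python's `re` module. This is Python, not ACL, and ACL's regex engine may well behave differently; the two sample strings are made up for the demonstration. The point is that `[^footnote>]` is a character class (any single character other than f, o, t, n, e or >), not a "not inside a footnote" test.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r'<simpara.*?[^footnote>]</simpara>')

# A nested simpara whose text ends in "." (a character NOT in the class).
nested = '<simpara>a<footnote><simpara>note.</simpara></footnote>b</simpara>'

# Two sibling simparas; the first ends in "t" (a character that IS in the class).
siblings = '<simpara>test</simpara><simpara>x</simpara>'

def first_match(text):
    """Return the first match of PATTERN in text, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

nested_hit = first_match(nested)
siblings_hit = first_match(siblings)

print(nested_hit)    # <simpara>a<footnote><simpara>note.</simpara>
print(siblings_hit)  # <simpara>test</simpara><simpara>x</simpara>
```

The first result stops at the nested close tag, and the second swallows two separate paragraphs because the first one happens to end in "t". Neither failure depends on lookahead support.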

You should be able to get what you want with something like this:

function getSimParas(simparas[]) {
     # start with a clean array
     delete(simparas);
     # create a temp array to store simpara oids
     local oids[], o;
     # grab all the simparas
     xpath_nodeset(oids, "//simpara");
     # now iterate over the oids and store the content of each
     for (o in oids) {
          simparas[o] = oid_content(oids[o], 0x1); # 0x1 flag = include tags
     }
}


Call this function, and the array simparas will be filled with the markup for each of the simpara elements. Note that this will include the nested one inside the footnote (as a separate entry in the array). If you don't want to include the nested ones, change the XPath expression to something like this:


     xpath_nodeset(oids, "//simpara[not(ancestor::simpara)]");


If you really want the elements rather than the markup, it's even easier: just return the contents of the oids array and omit the final for loop.
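For readers who want to try the tree-based idea outside Arbortext, here is a minimal sketch in Python using `xml.etree.ElementTree`. It is not ACL and is only an analogy to the function above. The sample markup, the `<root>` wrapper and the `markup` helper are assumptions made for the illustration. ElementTree's limited XPath has no `ancestor` axis, so the non-nested case is approximated by taking only the direct children of the wrapper, which is valid only when the selected paragraphs are siblings.

```python
import copy
import xml.etree.ElementTree as ET

def markup(element):
    """Serialize an element with its tags, without any trailing tail text."""
    clone = copy.copy(element)  # shallow copy; children are shared
    clone.tail = None           # tostring() would otherwise append the tail
    return ET.tostring(clone, encoding="unicode")

# Made-up selection: two paragraphs, the second containing a footnote.
selection = (
    '<simpara role="klein">One <emphasis role="bold_on">two</emphasis> three</simpara>'
    '<simpara>Text<footnote><simpara>footnote text</simpara></footnote> more</simpara>'
)

# Wrap the fragment in a single root so that it parses as one document.
root = ET.fromstring("<root>" + selection + "</root>")

# Every simpara, including the one nested in the footnote (like "//simpara").
all_paras = [markup(e) for e in root.iter("simpara")]

# Only the direct children of the wrapper, i.e. the non-nested paragraphs.
top_level = [markup(e) for e in root.findall("simpara")]

print(len(all_paras), len(top_level))
for entry in top_level:
    print(entry)
```

The parser matches start and end tags, so the nested paragraph never truncates its parent.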


--Clay
