Re: SOLR indexing published content PDFs from 3 di...

boemi · ‎May 29, 2023

Hello Windchill Experts

We have a Windchill 12.0.2.0 environment and we handle 3 CAD systems in it.

All drawings are published as HPGL and PDF (PDFs available under "default" representation under files - see screenshot).

We set up SOLR indexing, and so far it is working well with all the DB fields.

PDF indexing is also working OOTB for WTDocuments.

Indexing of Creo data could be activated through the native data, but this doesn't help with the other systems.

We are struggling with the published PDF content; I cannot find anything about it in the documentation.

In an indexed search webinar, FELCO mentioned that indexing is also possible for additional CAD systems through published PDFs.

Can anyone give us guidance on how to solve this?

Thank you

Marc

buenosroas · ‎Jun 22, 2023

Check that PDFs are set as an indexable MIME type, and if not, add it. See this page in the PTC Help Centre: https://support.ptc.com/help/wnc/r12.1.1.0/en/index.html#page/Windchill_Help_Center/WCSysAdminIndexSearch/WCSysAdminIndexSearchIndexingRules.html#

I'm sure you've found this already, but linking it regardless: https://www.ptc.com/en/support/article/CS178261

If PDF wasn't in your MIME list of indexable file types you will have to run bulk indexing after adding it. Hope any of this helps.

boemi · ‎Jun 22, 2023

Hi buenosroas

Thank you for your feedback

We already have PDF included as a MIME type.

Indexing of PDFs included in WTDocuments is working.

The issue we have is when the PDF is published from a drawing and then attached as a file to the CAD object (as shown in the screenshot).

Do you also attach PDFs in the same way?

Regards,
Marc

HelesicPetr · ‎Jun 22, 2023

Hi @boemi

I guess that you use your own custom code to attach the PDF from a drawing to the CAD object.

In this situation, SOLR does not know that something has changed. So you could try to start indexing of that object manually, if that is possible.

It is just an idea. I have never tried it, but you could write code to start indexing that PDF after it is attached to the CAD object.

PetrH

HelesicPetr · ‎Jun 22, 2023

PS: Indexing works through queues, so you could just add a new entry to the queue with the correct object.

buenosroas · ‎Jun 22, 2023

We do for NX (which also happens to be the only CAD system we've got for the time being). We publish the PDF through publish rules. Keep in mind, though, that even if you can see the PDF as part of the representation, it is not stored as an object directly but embedded within the representation for the object. So if you save your representation, you get it as a zip or jar. If I recall correctly, the representation is stored as a pvz.
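
To illustrate the point about the PDF being embedded in the representation, here is a minimal Python sketch that lists the PDF entries inside a saved representation file. It assumes the saved file really is a ZIP-compatible archive, as described above. The function name and the file name in the usage note are made up for the example; this is not a Windchill API.

```python
import zipfile


def list_pdf_entries(archive_path):
    """Return the names of all entries ending in .pdf inside a ZIP-compatible archive.

    archive_path may be a file path or a file-like object.
    Assumption: the saved representation (zip/jar/pvz) can be opened as a ZIP file.
    """
    with zipfile.ZipFile(archive_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith(".pdf")
        ]
```

For example, `list_pdf_entries("drawing_representation.pvz")` (hypothetical file name) would return the paths of the PDFs packed into that file, which shows they live inside the representation and not as separate content files.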

In our case we extract that PDF through a customization when a drawing is approved and reattach it to the drawing object as a PDF. We can then link to it through another customization, so that we can permalink to it, since we've set up our server with an alias. I'm trying to think, though, whether there are cases where we need to find the drawing through indexed search from content that is not represented in the metadata stored with the object in Windchill. What kind of data would you like to locate through search that is not also found through attributes? If you can search for the owning object, surely you're already where you want to be?

HelesicPetr · ‎Jun 22, 2023

Hi @buenosroas

One use case could be that a user would like to search by some dimension from a drawing, combined with other conditions.

I can imagine that requirement.

PetrH

boemi · ‎Jun 22, 2023

Hi @buenosroas

The goal is to be able to search through the content on the drawing as well (notes and so on).

For Creo this is available OOTB through the MIME type; we just needed to activate it.

For other CAD systems this only works if the MIME type is available.

For example, for Inventor there is no MIME type available (just AutoCAD, but that doesn't help).

Some practical examples:

- On old drawings we didn't have a field in the DB for the change numbers.

With indexing I could easily collect all the objects for such a change number.

- Searching for all objects with a special coating note, because the coating is no longer available on the market.

Marc

buenosroas · ‎Jun 22, 2023

I tried to search for a drawing PDF that has been fed back in as secondary content (attachment), but our indexed search does not return a hit on either the attachment or the parent object (which article CS173028 states it should find). On further investigation of the PDF attachment, the text on the drawing itself is not selectable like I would expect text to be. Our drawings are converted to PDF through CGM and not HPGL, so our cases are not equal. Sorry that I'm not able to be of much help.

boemi · ‎Jun 22, 2023

Hi @buenosroas

Thank you anyway, it is always helpful to exchange knowledge - this gives good inputs and other viewpoints.

Marc

boemi · ‎Oct 26, 2023

Dear Community

Has anyone had the same use case and can help?

@jfelkins Any chance we could talk about whether your company could support us with this issue?

Thank you in advance for the help.

Regards.
Marc

mmeadows-3 · ‎Oct 26, 2023

Hi Marc,

I would be the one configuring SOLR for @jfelkins . I have a customer configured similar to what you describe. They generate PDF attachments of MicroStation drawings and SOLR Index Search indexes the PDF attachments.

I'm not 100% confident in the behavior. Through limited experience, this is what I think is happening. Any clarification or corrections would be appreciated.

SOLR is indexing the object if the primary or secondary content file's extension is identified as indexable in the Data Formats (MIME types). SOLR doesn't actually care what the object is (CAD Document or Document).

The behavior needs to be validated. Before adding/enabling the drawing's MIME type for indexing, add the PDF of the drawing as an Attachment and index the object.

If content sensitive search returns the object based on some content in the PDF, then no need to mess with the MIME types.
If the object is not returned, then only the Primary content file's extension is being considered for indexing and we need to add/enable the Primary content's MIME type for indexing.

Note: When a content file is sent to SOLR Server for indexing, SOLR Server needs a Rosetta Stone to understand how to parse and index that particular file type. AutoCAD is understood, but most CAD packages are not. In this scenario, SOLR Server will return empty results on the primary content file and move on to the PDF attachment. Since the PDF has the relevant information we want indexed, this is good enough for our needs.

Once indexed, Content Sensitive search in Windchill only returns the CAD Document object and doesn't distinguish between the primary and secondary content that was indexed.

The setup...
1. SOLR does not index Representation content.
According to PTC, it only indexes the primary (drawing) and secondary (attachment) content.

https://support.ptc.com/help/wnc/r12.1.1.0/en/index.html#page/Windchill_Help_Center/WCSysAdminIndexSearch/WCSysAdminIndexSearchIndexableTypes.html

We can configure Windchill to upload the Additional File content PDF generated by WVS as an Attachment.

https://www.ptc.com/en/support/article/CS348848

2. The PDF Attachment must be indexable.
As @buenosroas mentioned, a PDF of an image doesn't include any indexable text.
Note: In Creo Parametric you can stroke out text to line art to make the text on the PDF look correct (intf2d_out_pdf_stroke_text_font). The native text is still embedded in the PDF, though not displayed. SOLR Indexing can't index the line art, but it will index the hidden text.
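
A quick way to get a first impression of whether a published PDF carries any text at all is to look for text-showing operators in its content streams. The Python sketch below is only a crude heuristic using the standard library, not what SOLR does internally: it assumes streams are either uncompressed or Flate-compressed, it does not parse the PDF structure properly, and it can give false positives or negatives. The function name is made up for the example. A proper PDF library or simply trying to select text in a viewer is more reliable.

```python
import re
import zlib

# Raw bytes between "stream" and "endstream" (heuristic, no real PDF parsing).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ...) that contains a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"BT\s.*?\b(?:Tj|TJ)\b", re.DOTALL)


def pdf_seems_to_contain_text(pdf_bytes):
    """Return True if any stream appears to contain a text-showing operator.

    Assumption: streams are either plain or Flate (zlib) compressed.
    """
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes such as the newline
            # that usually precedes "endstream".
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate data; inspect the bytes as they are
        if TEXT_OP_RE.search(data):
            return True
    return False
```

If this returns False for a published drawing PDF, the drawing is probably pure line art or an image, and there would be nothing for the indexer to pick up from it.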

3. (Optional, based on testing above) SOLR must be told the MIME type should be indexed.
Get the current list of MIME types from your Windchill instance. I don't trust that the Data Formats list hasn't been altered previously.

java wt.content.DataFormatUtil -list>D:\PTC\dataFormats.txt

If the primary content file must be flagged for indexing, we need to add the appropriate CAD drawing file extensions (IDW, SLDDRW, etc.) before those CAD Document drawings will be indexed.

https://www.ptc.com/en/support/article/CS32720

For example:
The system where I have this configuration running indexes Document objects that represented legacy CAD formats (e.g. MicroStation primary content and PDF attachments). I had to add the MicroStation file extension to the MIME formats before those PDFs were indexed.

https://support.ptc.com/help/wnc/r12.0.2.0/en/index.html#page/Windchill_Help_Center%2FWCCG_Serv_DataFormats_AddUpdateDataFormats.html%23

formatName = Microstation
mimeType = image/vnd.dgn
description = DGN file
indexable = true
icon = netmarkets/images/microstation_3.png
extensions = DGN
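
To check a data format definition like the one above before a bulk index, a small Python sketch can parse the `key = value` lines and report whether a given file extension is flagged as indexable. This assumes the definition is available as plain text in exactly the layout shown here, and that multiple extensions would be separated by commas or spaces; both are assumptions, and the real output of DataFormatUtil may look different. The function names are made up for the example.

```python
def parse_data_format(text):
    """Parse 'key = value' lines into a dict (layout assumed from the example above)."""
    props = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not a property
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def is_indexed_extension(props, extension):
    """Return True if the format is indexable and lists the given extension."""
    # Assumption: several extensions are separated by commas or whitespace.
    listed = props.get("extensions", "").replace(",", " ").split()
    extensions = {ext.upper() for ext in listed}
    wanted = extension.upper().lstrip(".")
    return props.get("indexable", "").lower() == "true" and wanted in extensions
```

With the MicroStation definition above, `is_indexed_extension(parse_data_format(text), "dgn")` would be true, while the same check for "IDW" would be false until a matching format is added.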

SOLR Server probably still cannot index the native drawing format (SLDDRW, IDW, etc.). If interested, you can confirm what SOLR can index.

https://www.ptc.com/en/support/article/CS101260

Please let us know if step 3 is necessary. Hope this helps.

mmeadows-3 · ‎Oct 26, 2023

FYI: CAD users can add the PDF attachments during upload to avoid publishing.

https://www.ptc.com/en/support/article/CS353471

https://support.ptc.com/help/windchill/whc/whc_en/index.html#page/Windchill_Help_Center/ProEWCIntegCustSecondaryContentManageAttachSecContent.html

boemi · ‎Oct 26, 2023

Hi mmeadows-3

Thank you for the fast response.

So far I understand that only the primary and secondary content are indexable.

Unfortunately we do not store the PDF as secondary content; ours is stored under the representations (we have additional tools connecting to this location).

I will check with our admins what options we have for the future.

This helped a lot - Thank you so much.

Regards,
Marc

SOLR indexing published content PDFs from 3 different CAD Systems (Creo, Inventor, Solidworks)
