SOLR crashing during Bulk Indexing

avillanueva
22-Sapphire III

I've completed a rehearsal upgrade to 13.0.2.4 and kicked off a full re-index. In the past, this has been a low-risk task that churned away for hours until completion. I've indexed 850K objects out of 1M but have been fighting to keep Solr running. The rate was initially 2200/min, which I guess is good. I have not changed any of the default memory settings. The system has 8 GB of RAM and plenty of disk space, and I did not see any areas filling up. I checked the obvious logs but could not find any glaring error messages from when it shut down. I would just notice failed jobs in the Index Administrator, and when I checked, the Solr process had ended. I restarted it and resumed, but it would stop again for some reason maybe half an hour later.

When it's running, I am able to search and can see Classification Search, so I know it's functioning. I did see a curious issue with text preview on documents, where it showed only name and number (see https://www.ptc.com/en/support/article/CS249768?source=search); the article indicates that I should reindex again. No problem since this is a rehearsal, but still strange.

 

I'd like to know what I should be looking for in the logs as an indicator of why it's stopping. I suspect memory issues but did not see any out-of-memory messages as I would expect. This is RHEL 9.2 with SELinux. Note that, as in a previous issue, I was not able to run this via systemctl as root (it runs as another user), so I had to start it manually from a shell as that user each time.
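As a starting point for that log search, here is a minimal Python sketch that scans a directory of log files for a few well-known failure strings. The patterns are assumptions about what a memory or disk problem might leave behind (a Java `OutOfMemoryError`, the Linux "No space left on device" error, and the kernel's OOM-killer message); none of them were confirmed in this thread. Note that a process killed by the kernel writes nothing to its own logs, so the third pattern is only useful against system logs. The function name and the file pattern are hypothetical.

```python
import re
from pathlib import Path

# Assumed indicators; adjust to whatever your logs actually contain.
PATTERNS = {
    "java_oom": re.compile(r"java\.lang\.OutOfMemoryError"),
    "disk_full": re.compile(r"No space left on device"),
    "kernel_oom_kill": re.compile(r"Out of memory: Kill(ed)? process"),
}

def scan_logs(log_dir, file_pattern="*.log*"):
    """Return (file name, line number, label, line text) for every match."""
    hits = []
    for path in sorted(Path(log_dir).glob(file_pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on odd bytes in a log file.
        with path.open("r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                for label, pattern in PATTERNS.items():
                    if pattern.search(line):
                        hits.append((path.name, number, label, line.strip()))
    return hits
```

Calling `scan_logs` on the Solr log directory (and separately on a copy of the system log) and printing the tuples gives a quick yes/no on whether any of these strings appear around the time the process ended.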

ACCEPTED SOLUTION

Accepted Solutions
avillanueva
22-Sapphire III
(To:avillanueva)

Closing out thread since it seems to have stabilized and completed Bulk Indexing. No reason why it stopped misbehaving. 

8 REPLIES

I changed SOLR_HEAP in SolrServer/bin/set_env.sh to 8192m in our environment.

However, when I upgraded from 12.1.2.13 to 13.0.2.2, I reused the indexed data.
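Before raising the heap, it may be worth comparing the value against the machine's physical RAM (the original post mentions 8 GB). A small Python sketch, assuming the heap value uses the usual JVM size suffixes (`k`, `m`, `g`); the function names and the 25% reserve for the OS and other processes are illustrative assumptions, not PTC guidance.

```python
import re

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def heap_to_bytes(value):
    """Convert a JVM-style size such as '8192m' or '4g' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg])", value.strip().lower())
    if not match:
        raise ValueError(f"unrecognized heap size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]

def heap_fits(heap, physical_ram_bytes, reserve_fraction=0.25):
    """True if the heap leaves at least reserve_fraction of RAM free.

    reserve_fraction is an arbitrary assumption; tune it to your system.
    """
    return heap_to_bytes(heap) <= physical_ram_bytes * (1 - reserve_fraction)
```

For example, `heap_fits("8192m", 8 * 1024 ** 3)` is false, because an 8192m heap would claim all of an 8 GB machine's memory.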

avillanueva
22-Sapphire III
(To:avillanueva)

Well, now it decided to show text preview. Go figure. I made no changes. Running now to complete rest. Let's see if we can make it to the end. 

I opened a case on this while still at NASA. It seemed to occur only during bulk indexing. The process wouldn't clear its temp files, and that would crash the service. Additionally, depending on the size of the indexing job, the temp files would fill up the drive where your index resides. If you started the service back up, it would properly clear the temp files and resume.

 

PTC Tech said it wasn't reproducible; however, it happened in every bulk indexing run we did.

avillanueva
22-Sapphire III
(To:jbailey)

Thanks @jbailey, I remember you posting about that. I did see files in the temp area, but I am unsure whether it was filling up. I will certainly be doing a second rehearsal and will watch for that. I was able to watch the JVM rapidly fill up memory and empty it as it was bulk indexing. In your investigation, were you able to see any evidence of out-of-space errors in the logs? Something I could look for? Now that you are on the other side, maybe you can push to get it resolved. Were you able to change the temp folder location? I am sure I can shift that to a mount with oodles of space just in case.

"The other side" 😀. I have a feeling I will be hearing that a lot!

 

Our indexing was not super large, so I just poked at it (restarted the service every 6 hours or so) and called it a day once TS said it wasn't reproducible. Once it ran out of space, it killed the service, and I saw the threshold notifications for bulk indexing in the JMX notifications.

 

I will poke around and see if I can find the errors I ran into.

A good way to see if the temp files are filling up:

  • Figure out how long it takes to index.
  • Check the drive at 20% increments of that time (drive usage should have increased significantly).
  • Restart the indexing service on your SOLR server.
  • Look at the drive space usage again and compare it to the size on disk from before the restart.

That might give you an idea if you are running into the same issue we did.
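The check above can be scripted rather than eyeballed. Here is a minimal Python sketch that records the size of a directory plus the free space on its drive, so that snapshots taken during indexing and after a service restart can be compared. The function names are hypothetical, and which directory holds the temp files is something you would need to find on your own system.

```python
import os
import shutil
import time

def dir_size_bytes(root):
    """Sum the sizes of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file disappeared between listing and stat
    return total

def snapshot(root):
    """Record directory size and free space on the drive holding root."""
    usage = shutil.disk_usage(root)
    return {
        "time": time.time(),
        "dir_bytes": dir_size_bytes(root),
        "free_bytes": usage.free,
    }

def growth(before, after):
    """Bytes gained (positive) or released (negative) between snapshots."""
    return after["dir_bytes"] - before["dir_bytes"]
```

Take one `snapshot` at each 20% mark and one more after the restart. A large negative `growth` across the restart would suggest the temp files were what was filling the drive.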

We did the upgrade from version 12.0.2 to 13.0.2 just last weekend, taking advantage of the fact that Monday was a national holiday in Italy.

 

The new server for version 13.0 has 128 GB of RAM versus 64 GB in the previous one, configured with 5 MS, each 10 GB, versus the old one's 4 MS, each 8 GB.

 

So we didn't have the problem of the service crashing, given the sizing of the new server, but we still noticed that something wasn't working because the server monitoring system we use (Icinga) reported that the disk where the Windchill installation is had very little free space.

 

We checked the reason for this large increase in occupied disk space with a disk analysis tool and found that it was due to the huge amount of logs written by Solr (150 GB on the new server versus about 5 GB on the old one).
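For anyone without a disk analysis tool at hand, a rough equivalent can be sketched in Python: list the largest entries directly under a folder so that an oversized log directory stands out. The function names are hypothetical, and the folder to inspect (for example the Solr installation folder) is up to you.

```python
import os

def tree_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # skip files that vanish or cannot be read
    return total

def largest_children(root, top=10):
    """Return up to `top` (size_bytes, name) pairs, largest first."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size(entry.path), entry.name))
            elif entry.is_file(follow_symlinks=False):
                sizes.append((entry.stat(follow_symlinks=False).st_size, entry.name))
    sizes.sort(reverse=True)
    return sizes[:top]
```

Running `largest_children` on the installation folder, and then again on whichever child comes out on top, narrows down where the space went in a couple of steps.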

 

Marco
