Error updating statistics from Admin screen, and system not working actually

We just did our first load of real data, with nearly 4 million patients and about 65 million notes. But… all searches come back empty and EMERSE claims we have just 2231 patients in the system. The PATIENT table and the patient index show the right count, but I can see that we get this NPE when it tries to update stats. I think it’s related to the LAST_UPDATED field in SOLR, which we currently do NOT have in any of our indexes. Ideas on how to fix this or what might be wrong?

Thanks,
Eric
ERROR: [2020-04-08T11:45:56,556] org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler.handleError()96 Unexpected error occurred in scheduled task.
java.lang.NullPointerException: null
at edu.umich.med.emerse.dto.SolrStatsDTO.<init>(SolrStatsDTO.java:32) ~[emerse-service-4.10.8.jar:?]
at edu.umich.med.emerse.service.search.SolrClientHelper.queryForStats(SolrClientHelper.java:303) ~[emerse-service-4.10.8.jar:?]
at edu.umich.med.emerse.batch2.BatchService.updateIndexStatsViaSolr(BatchService.java:61) ~[emerse-service-4.10.8.jar:?]
at edu.umich.med.emerse.batch2.BatchService$$FastClassBySpringCGLIB$$afa90bc.invoke(<generated>) ~[emerse-service-4.10.8.jar:?]

More details on this:
I’ve added 75 million notes to our system but it isn’t working just yet. I think I found out why, and have a few questions:

  1. Are we required to have a LAST_UPDATED date field in the documents index? It isn’t listed here: http://project-emerse.org/documentation/integration_guide.html#trueintegrating-documents but I can see that we are getting an NPE when Emerse is looking for that field.
  2. Also, the documentation says that a RPT_DATE field is required, but in the code (EMERSE.DOC_FIELD_EMR_INTENT) it looks like it should instead be called ENCOUNTER_DATE, and that is what I see in the sample documents index. Should I rename it to ENCOUNTER_DATE?
  3. Related to item 2, do I need an ENCOUNTER_DATE for EVERY note in the system? I don’t have one for some notes, so I suppose I could either skip those or put in a fake date.

Yes. It does look like LAST_UPDATED is actually required, at least for the batch process that updates the index statistics. That batch process is responsible for updating the min and max dates of the index as seen in EMERSE when searching without constraining on date. (The date constraints are actually applied to the CLINICAL_DATE / ENCOUNTER_DATE.) Sorry that wasn’t in the documentation as a required field; we can add that.
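If you want to check what Solr itself reports for that field, something like the sketch below might help. It is not EMERSE code and may not match what the batch job does internally: it only builds a request for Solr's standard stats component (stats=true, stats.field) and reads min/max out of the JSON response. The base URL and the core name "documents" are assumptions, so substitute your own.

```python
from urllib.parse import urlencode


def stats_query_url(solr_base, core, field):
    """Build a Solr select URL asking the stats component about one field."""
    params = {
        "q": "*:*",          # match every document
        "rows": 0,           # we only want the stats, not the documents
        "stats": "true",
        "stats.field": field,
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/{core}/select?{urlencode(params)}"


def date_range(response, field):
    """Return (min, max) for the field, or None if Solr has no values for it."""
    field_stats = response.get("stats", {}).get("stats_fields", {}).get(field)
    if not field_stats or field_stats.get("min") is None:
        return None
    return field_stats["min"], field_stats["max"]


# Hypothetical host and core name; adjust to your installation.
url = stats_query_url("http://localhost:8983/solr", "documents", "LAST_UPDATED")
```

A None result from date_range would mean no document in that index carries the field, which fits the situation described above.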

As for (2), yes, it should match the DOC_FIELD_EMR_INTENT value. If you loaded it as RPT_DATE, you could just change the DOC_FIELD_EMR_INTENT table instead. (That table is intended to make the field names in Solr remappable. See http://project-emerse.org/documentation/data_guide.html#truedoc_field_emr_intent-table for more details, though I’m guessing you’ve seen that page already.)

As for (3), we assume you do have an encounter date for every note in Solr, but, just from looking at the code, it looks like there’s a chance it’ll work even without them. Such documents should never be returned from any search that constrains on date, but if you leave that filter open, I believe they should appear. However, we rely on that field for sorting, so either they may be out of order, or that part of the system might break. Like I said, we’re expecting all documents to have an encounter date. Setting it to a dummy date should work.
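For the dummy-date route, a rough sketch of what the loader could do before sending each note to Solr is below. The sentinel value (1900-01-01) and the helper name are made up for illustration; the only fixed part is that Solr date fields expect the ISO 8601 form with a trailing Z.

```python
from datetime import date

# Hypothetical sentinel; pick any date that is obviously fake in your data.
DUMMY_ENCOUNTER_DATE = date(1900, 1, 1)


def with_encounter_date(doc, field="ENCOUNTER_DATE"):
    """Return a copy of doc in which the date field is guaranteed to be set."""
    filled = dict(doc)
    if not filled.get(field):
        # Solr date fields use ISO 8601 in UTC, e.g. 1900-01-01T00:00:00Z
        filled[field] = DUMMY_ENCOUNTER_DATE.isoformat() + "T00:00:00Z"
    return filled
```

Notes that already have a date pass through unchanged, and the input document is never modified in place.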

Oh, and if you don’t have LAST_UPDATED at all, you can just map it to your RPT_DATE / ENCOUNTER_DATE in the DOC_FIELD_EMR_INTENT table. Something like:

update DOC_FIELD_EMR_INTENT 
   set SOLR_FIELD_NAME = 'RPT_DATE' 
 where NAME = 'LAST_UPDATED'

Thanks, I was able to get past all that but now have a new issue. I am getting the following exception when I do a search that I know has results. The Lucene code inside of EMERSE is unable to find a Document by its ID, but I can see that the Document does exist in SOLR as expected when I query for it via the SOLR UI. Note that the ID is a big string originally sourced from MongoDB. Is this just a lag time thing?
Caused by: edu.umich.med.emerse.domain.DocumentNotFoundException: 5e45e1bb5808eee774af32ba
at edu.umich.med.emerse.service.search.LuceneSearcher.getDocumentFragments(LuceneSearcher.java:252) ~[emerse-service-4.10.8.jar:?]

Yes, basically. EMERSE both talks to Solr and reads Solr’s index files directly. So, in all likelihood, EMERSE hasn’t “re-opened” the index files and is reading an old version. (Solr can present a consistent old view of the index, kind of like a database presents a single consistent view of tables for transactions.) EMERSE should re-open its indexes during its batch jobs. You can force it to re-open them by making a GET to:

/springmvc/admin/refresh/indexes
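If you’d rather script that call after a load than do it from a browser, a minimal sketch is below. Only the path comes from this thread; the base URL is a placeholder, and since this is an admin endpoint it may well need an authenticated session, which this sketch does not handle.

```python
from urllib.request import urlopen

REFRESH_PATH = "/springmvc/admin/refresh/indexes"


def refresh_url(base_url):
    """Join the EMERSE base URL (host plus context path) with the refresh path."""
    return base_url.rstrip("/") + REFRESH_PATH


def refresh_indexes(base_url, timeout=30):
    """Issue the GET and return the HTTP status code."""
    with urlopen(refresh_url(base_url), timeout=timeout) as resp:
        return resp.status


# Hypothetical base URL; replace with wherever your EMERSE webapp is deployed.
# refresh_indexes("http://localhost:8080/emerse")
```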

The property to re-configure the batch job is:

task.refreshIndexes.cron=00 30 6 * * ?

(The syntax means it should run at 6:30 AM every morning.)
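In case the six fields are unfamiliar: Spring-style cron expressions read second, minute, hour, day of month, month, day of week. The little helper below is only an illustration for sanity-checking a value before putting it in the properties file; it is not part of EMERSE and only understands plain numbers, * and ?.

```python
FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expr):
    """Split a six-field cron expression into a dict keyed by field name."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def daily_time(expr):
    """Return 'HH:MM:SS' if the expression fires once a day at a fixed time, else None."""
    f = parse_cron(expr)
    every_day = (
        f["day_of_month"] in ("*", "?")
        and f["month"] == "*"
        and f["day_of_week"] in ("*", "?")
    )
    fixed = (f["hour"], f["minute"], f["second"])
    if every_day and all(p.isdigit() for p in fixed):
        return "{:02d}:{:02d}:{:02d}".format(*(int(p) for p in fixed))
    return None


print(daily_time("00 30 6 * * ?"))
```

To refresh at a different time you would change the second, minute and hour fields and leave the last three as they are.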

I’ve found that restarting Tomcat fixes the error, at least for a while. It does pop up again but I’ve also been in the process of loading 75 million notes. That’s going to finish in the next hour or so and after that I’ll refresh everything, etc. and expect I’ll have a good working stable system.
Thanks!