The best way to integrate Solr with as400

Technical 17 June 2014

As I found little information or shared experience on this integration topic on the Internet, I decided to share my own experience in this post.

As “Mr. interface”, I’ve been working on the integration of the as400 (a.k.a. iSeries or System i) with a wide variety of systems (ERP, WMS, e-commerce, card payments etc.). Over the last decade, I have worked on Solr search and on how to integrate Solr with the as400: using the Solr services to perform full-text search for clients in a native OS/400 environment. This integration is not always straightforward: there is a lot to consider, and the two worlds, “Java web-apps” and “OS/400”, are quite different and require different sets of technical knowledge. This complexity is the same whether you deploy Solr on the as400 itself or on a different system.

Companies use different middleware solutions to integrate with the as400. When middleware is the company’s guideline, this approach can be valid, because it combines the company’s existing knowledge of as400 integration with the use of the SolrJ or HTTP interface to communicate with Solr.

However, for smaller implementations, a direct integration between both systems (as400 and Solr) might be simpler and better because it avoids the “3rd player”. The middleware often represents a SPOF (single point of failure) and often requires the involvement of a 3rd group of design and support people.

In this document, we consider the “best” way to be the approach that will most likely result in an integrated system with the lowest TCO (total cost of ownership), considering initial setup, maintenance and support. Of course, your particular case might deviate from the approach suggested here. But the approach that is simple and standard in terms of knowledge and processes is likely to be the best.

To define this “best” way to integrate as400 with Solr, let’s start by looking at what both systems offer and at the typical expertise of the technicians who work on these systems.

Required expertise on as400 side

RPG is the popular programming language on as400. But I think it’s safe to assume that other programming languages on as400 could take the approach depicted here to integrate with external systems.

Due to the nature of the as400, programming on the as400 is structured and tightly integrated with the DB (the OS is the DB). This makes developing as400 programs “comfortable” once you know the rules.

The as400 takes care of all the complexity involved in developing applications with DB integration. This means that DB integration with iSeries is a daily job that doesn’t cause any headaches… However, this DB-oriented approach is not really suitable for our needs: we require application integration.

Application integration is required when search requests are processed by an external service (Solr in our case) on an index that requires an NRT (near-real-time) update of the data. We need to find a way to transfer requests and data from an as400 to an external system, taking into account the usual habitat of an as400 RPG programmer.

I believe a lot of technicians would agree that data queues are the obvious choice to meet our requirement for simplicity: traditionally, data queues are a widely adopted way to communicate data asynchronously between applications within a single as400 or from an as400 to another system. If we can do the job with data queues, the knowledge requirements on the as400 side are covered.

Required expertise on Solr side

There are 2 main questions that need to be answered when integrating a search service:

– how to index the data?

– how to process the search requests and responses?

Indexing data

The ‘native’ tool for importing data into Solr is the dataimporthandler. This is a do-it-all solution for importing data from different data sources and mapping it to the Solr schema. This tool is sufficient for database integration, so when database integration covers your needs for indexing data, you’re fine. This process handles the old-time transport of “baskets” of data.

Is this approach compatible with today’s standard for NRT (near-real-time) integration? No.

The problem with the dataimporthandler is that it has to be triggered by an HTTP request to start transporting the “baskets”. This is fine for most operating systems, but it is not really “simple” or “natural” to achieve from within an OS/400 environment (as there is no native OS/400 command that performs the same function as “curl” on Linux). So the first bottleneck to solve in our Solr-as400 integration is: how to transfer the “trigger” from the as400 to Solr to initiate the dataimporthandler.
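To make the “trigger” concrete: on a platform that has an HTTP client at hand, starting the dataimporthandler comes down to a single GET request. The Java sketch below only builds that request and does not send it. The base URL (host, port and core name) is a hypothetical placeholder, and the class and method names are invented for this illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DihTrigger {

    // Builds (but does not send) the GET request that asks the
    // dataimporthandler to start a full import.
    // baseUrl is an assumption: the URL of the Solr core, supplied by the caller.
    static HttpRequest fullImportRequest(String baseUrl) {
        URI uri = URI.create(baseUrl + "/dataimport?command=full-import");
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        // Hypothetical local Solr core, used only for the demonstration.
        HttpRequest request = fullImportRequest("http://localhost:8983/solr/mycore");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

This is exactly the kind of call that is easy on Linux or in a Java web-app, and awkward from a native RPG program.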

Re-using the dataimporthandler makes a lot of sense, but we need an approach in which the external system (the as400 in our case) pushes the data into Solr rather than the traditional dataimporthandler approach where Solr pulls the data from the external system.

As already mentioned, we want to achieve this “trigger” or “push” approach via the entries in the data queues on the as400 side.

Processing search requests and responses

The 2nd question, “how to process the search requests and responses?”, suffers from the same bottleneck. As said, the HTTP interface is not “natural” on OS/400, while the SolrJ client is Java-based and therefore not suitable to run in a multi-user native OS/400 application (as you don’t want to start up a JVM in every user’s session).

Proposed solution

Solr is a platform in which you can easily add components known as requesthandlers. This means that it’s easy to add additional functionality or processes. So we should be able to extend Solr so it can directly integrate with our native as400 application’s data queues.

Proposed solution for index update process

To index the data, we will re-use the dataimporthandler. The functionality of this component is extendable, allowing the configuration of different data sources and entities. This “plug-in” option is the obvious entry point to talk with the data queue on the as400. The dataimporthandler can wait for an entity record (or “trigger”) from a data queue the same way as it “waits” for a database server to return a row of changed data. The only missing components in the vanilla Solr build are the ‘datasource’ and ‘entity’ that read the data queue in a way that is compatible with the dataimporthandler. These Java components are not so difficult to build, and I’ll provide more details on this in a future post (see also http://wiki.apache.org/solr/DataImportHandler ).

When this component is deployed, the setup of the root-entity in the data-config.xml can be kept very simple and this will capture the data received in the data queue entry:

<document>
  <entity name="queue"
          dataSource="as400-dataqueue"
          processor="net.sr_sl.solr.handler.dataimport.AS400DataQueueEntityProcessor"
          AS400DataQueueName="MYQUEUE"
          AS400DataQueueLibrary="*LIBL"
          colName="KEY"
          timeout="120">
    <entity name="mydata" dataSource="jdbc" query="select…where mydata_key='${queue.KEY}' "
    ...

This setup will receive a queue entity (as root entity), which results in the field “queue.KEY” holding the content of the data queue entry that was sent from the as400.

This allows for an easy integration from as400 perspective: just prepare the updated data on DB/400 and send the key to that data in a data queue. This key is retrieved via the data queue entry in the dataimporthandler. The dataimporthandler provides the necessary functionality to manipulate the data from the entry and pulls all relevant data into the index. Of course, you could also send all data in the data queue entry when that is a better fit for your use-case.
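As an illustration of how the key from the data queue entry ends up in the nested query, the following sketch mimics the placeholder substitution in plain Java. It is not the dataimporthandler’s own code: the class name, the sample key value and the choice to replace unknown placeholders with an empty string are assumptions made for this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderDemo {

    // Matches placeholders of the form ${name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${name} in the template with its value from the map.
    // Assumption of this sketch: unknown names become an empty string.
    static String resolve(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Only the where-clause of the nested entity's query is used here.
        String whereClause = "where mydata_key='${queue.KEY}'";
        // "ORDER0042" is a made-up key, as it could arrive in a data queue entry.
        System.out.println(resolve(whereClause, Map.of("queue.KEY", "ORDER0042")));
    }
}
```

The point is that the as400 program only has to put the key into the data queue; everything else is resolved on the Solr side.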

There is, however, a further issue to solve. When this import process is started via the dataimporthandler, the data queue is read until the timeout is reached or until 1 or more data queue entries have been received. After building the data to be indexed based on the remaining setup in the data-config, the dataimporthandler commits the data to the Solr index. Then the dataimporthandler stops and waits for the next trigger to start up again.

This is where an additional function is required in the dataimporter: the startup and continuous iteration of the process. Most likely you want this process to start automatically and to continuously read the data-queue for new entries.

In order to achieve the startup, the dataimporthandler scheduling functionality can be used (see the wiki on scheduling the dataimporthandler).

We also need to initiate a new dataimport request when the previous one has been committed or has timed out. To achieve this, we modified the dataimport request handler to support a request parameter ‘infinite’: …/dataimport?command=full-import&config=data-config.xml&infinite=on

With infinite=on, our dataimport request handler automatically re-iterates the data-import (via data queue reading in our case) when the previous iteration has been committed, unless an “abort” command was received from an administrator. This provides a natural way to start (via scheduling) and stop (via abort) the infinite data-import process.
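The behaviour of such an infinite import can be sketched as a simple loop. The code below is an in-memory simulation and not the modified request handler itself: a BlockingQueue stands in for the as400 data queue, a list stands in for the Solr index, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class InfiniteImportLoop {
    private final BlockingQueue<String> queue;          // stand-in for the data queue
    private final AtomicBoolean aborted = new AtomicBoolean(false);
    final List<String> committed = new ArrayList<>();   // stand-in for the index

    InfiniteImportLoop(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    // Corresponds to the administrator's "abort" command.
    void abort() {
        aborted.set(true);
    }

    // One import iteration: wait until the timeout for a first entry, then
    // take whatever else is already waiting and "commit" the batch.
    // Returns the number of entries committed (0 on timeout).
    int runOnce(long timeoutMillis) {
        String first;
        try {
            first = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            aborted.set(true);   // treat an interrupt like an abort
            return 0;
        }
        if (first == null) {
            return 0;            // timed out without receiving an entry
        }
        List<String> batch = new ArrayList<>();
        batch.add(first);
        queue.drainTo(batch);
        committed.addAll(batch);
        return batch.size();
    }

    // The "infinite=on" behaviour: start a new iteration after every
    // commit or timeout, until an abort has been requested.
    void runInfinite(long timeoutMillis) {
        while (!aborted.get()) {
            runOnce(timeoutMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> dataQueue = new LinkedBlockingQueue<>();
        dataQueue.add("KEY-1");
        dataQueue.add("KEY-2");
        InfiniteImportLoop loop = new InfiniteImportLoop(dataQueue);
        Thread worker = new Thread(() -> loop.runInfinite(100));
        worker.start();
        dataQueue.add("KEY-3");  // arrives while the loop is running
        Thread.sleep(300);       // give the loop some time to pick it up
        loop.abort();
        worker.join();
        System.out.println("committed: " + loop.committed);
    }
}
```

In this sketch the abort only takes effect between iterations, so a stop request waits at most one timeout period.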

Proposed solution for index searching process

On the search side, the integration requires a bit more functionality on the as400 side. For the index update process, we only needed to send the data (or the key to the data) into the data queue. The request/response process of a search request adds some complexity: we need a synchronous process, also built on data queues. This sounds more difficult than it is. We send a request into a data queue and, when we expect a response, we simply wait for an answer (within the expected time-out) on another, keyed data queue. The key for this data queue entry is sent within the search request and is reused by the responder to write the answer into the keyed data queue. This “key” value can be a transaction ID or a unique identifier of the job (I like to use jobnumber/user/jobname as unique identifier). The latter also allows a clean-up of the keyed data queue before every new request, in case something went wrong or the response timed out in a previous request/response process.

Schematically:

as400 side:

→ user requests a search

→ data queue entry is sent with 2 elements:

(1) unique job identifier

(2) request parameters for the search

→ wait for reply on keyed data queue entry with key = unique job identifier

→ when data queue entry is received or timed out, process the results

Solr side:

→ wait for data queue entries with search requests

→ when a data queue entry is received, process the search request

→ write the search results to the keyed data queue in an entry with key = unique job identifier

The as400 code can be implemented in a black-box program that handles this request/response process. The Solr-side code is implemented as a request handler that is started when Solr starts up and runs in an infinite loop, similar to the data import request handler. This request handler just needs to forward the request parameters to the local search request handler and interpret the results before sending the reply back into the keyed data queue.
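The request/response scheme above can be sketched with in-memory stand-ins. This is a simulation only: the real components would read and write actual as400 data queues, and on the as400 side the logic would live in a native program rather than in Java. The class and method names, the sample job identifier and the fake search function are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class KeyedQueueSearchDemo {

    // A request entry: (1) unique job identifier, (2) request parameters.
    static final class Request {
        final String jobId;
        final String params;

        Request(String jobId, String params) {
            this.jobId = jobId;
            this.params = params;
        }
    }

    // Stand-in for the request data queue.
    private final BlockingQueue<Request> requestQueue = new LinkedBlockingQueue<>();
    // Stand-in for the keyed response data queue: one small queue per key.
    private final ConcurrentMap<String, BlockingQueue<String>> responses = new ConcurrentHashMap<>();

    private BlockingQueue<String> slot(String key) {
        return responses.computeIfAbsent(key, k -> new LinkedBlockingQueue<>());
    }

    // "as400 side": clean up stale replies, send the request, then wait for
    // the reply under the job's key. Returns null when the wait times out.
    String search(String jobId, String params, long timeoutMillis) {
        slot(jobId).clear();
        requestQueue.add(new Request(jobId, params));
        try {
            return slot(jobId).poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // "Solr side": wait for one request, run the search and write the reply
    // under the requester's key. Returns false when no request arrived in time.
    boolean serveOne(Function<String, String> searcher, long timeoutMillis) {
        try {
            Request request = requestQueue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (request == null) {
                return false;
            }
            slot(request.jobId).add(searcher.apply(request.params));
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        KeyedQueueSearchDemo demo = new KeyedQueueSearchDemo();
        // The fake searcher only echoes the parameters; a real handler would
        // forward them to the local search request handler.
        Thread solrSide = new Thread(() -> demo.serveOne(p -> "results for " + p, 1000));
        solrSide.start();
        String reply = demo.search("123456/QUSER/SEARCHJOB", "q=widget", 1000);
        System.out.println(reply);
    }
}
```

Clearing the keyed slot before each request is the clean-up step mentioned earlier: a late reply from a timed-out request cannot be mistaken for the answer to the next one.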

Conclusion

Although this explanation of the integration approach might not look easy, the depicted approach ensures the simplest, most direct and future-proof integration between native as400 applications and Solr. The required steps and components to solve the bottlenecks with as400-Solr integration are:

1) on the Solr side:

– add a jar with the updated request handlers to the Solr instance

– configure the request handlers (data-import and search request/response) to use the right data queues

– configure the scheduler to start these request handlers automatically and in an infinite way

2) on the as400 side, the problem is reduced to sending/receiving data via data queue entries. No additional components are required.

Can this integration be done more simply, with fewer components, and remain as flexible?

The re-use of Solr as the server component for running the integration processes ensures that only the 2 systems at play have to be “known” and maintained. No additional administrative processes are required, and the only knowledge required is the knowledge of both systems, which is needed anyway when both systems are in play.

If there is interest in this topic, I will elaborate on the Solr and RPG source code in a future blog post. In the meantime, any questions and suggestions are welcome and will be considered carefully. If you want to add comments or reactions to this blog post, don’t hesitate to contact us.