Monday, November 30, 2009

Security

It struck me today that security is a very important topic in system integration, yet it was largely missing from our topic presentations and discussions, apart from single sign-on and SAML.

I did a couple of quick searches through our texts and found an interesting quote in one of them:

From "Enterprise Application Integration" (Wiley), Chapter 1:

"In the 1998 FBI/Computer Security Institute Computer Crime and Security Survey, 64 percent of respondents said their enterprises had been successfully attacked. Data modification occurred in 14 percent of the attacks. Quantifiable losses reached $136 million."

If things were bad in 1998, I would guess that they're worse now (just a gut feeling, not backed up by data in any way!), so I think we must be aware of security issues if and when we are involved in the design, development or use of an integrated system. A few quick Google searches show that there are a LOT of resources (or at least articles) covering security for SOA, SOAP, XML, web portals, etc.

In my own experience with integration solutions (see my prior posts), I regularly handle rather sensitive personal information. In these cases security is largely handled by encrypting the data before sending it, which effectively makes any remaining security issues an "internal" matter.

I'm curious: does anyone have particular examples of security done well, or poorly, in an integrated system?

Friday, November 27, 2009

Supply Chain E-Procurement Application: An example of B2B Integration

I work with our company’s enterprise e-procurement solution, which I will refer to as E-Market (not its actual name). This is an excellent example of B2B integration because it involves not only the company I work for and the software vendor of the E-Market product, but also each of the product vendors we purchase from. I will try to explain how this product works in the real world and who the important players in the process are.

Below is an architectural diagram of the systems and people involved:




End User Experience
I will start with the end user. They log onto the E-Market application via the web on their local machine. After logging into E-Market, the user can purchase items from the different product vendors’ websites (e.g. Staples, Dell, Office Depot) through what are known as ‘PunchOuts’. To the end user, the interface for selecting products from a vendor’s website looks almost exactly the same as if they had logged directly into that vendor’s website without going through E-Market. The difference is that once the user has selected all of the items and hit submit, instead of the order going to the product vendor, all of the information is transferred back into E-Market for final processing (e.g. selecting the payment method, the shipping location and, if necessary, approvers for the order). Once the user has entered all the necessary information into E-Market and submitted the order, their work is done; E-Market then submits the order to the product vendor. The end user may contact Application Support if they are having issues, or the Supply Chain Department about product vendors that could be added as 'PunchOuts'.
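To make the PunchOut round trip a little more concrete, here is a minimal sketch of the kind of request E-Market might send when a user clicks into a vendor's catalog. It assumes the cXML PunchOutSetupRequest message; the identities, shared secret, buyer cookie and return URL are all hypothetical placeholders, and a real request carries an XML declaration, DOCTYPE and more header detail than shown here.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def credential(parent, tag, domain, identity, secret=None):
    """Append <tag><Credential domain=...><Identity>...</Identity></Credential></tag>."""
    wrapper = ET.SubElement(parent, tag)
    cred = ET.SubElement(wrapper, "Credential", domain=domain)
    ET.SubElement(cred, "Identity").text = identity
    if secret is not None:
        ET.SubElement(cred, "SharedSecret").text = secret
    return wrapper


def punchout_setup_request(buyer_id, supplier_id, secret, buyer_cookie, return_url):
    """Build a simplified cXML PunchOutSetupRequest (all values are placeholders)."""
    root = ET.Element(
        "cXML",
        payloadID=f"{uuid.uuid4()}@e-market.example",  # hypothetical host name
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    header = ET.SubElement(root, "Header")
    credential(header, "From", "NetworkID", buyer_id)   # who is buying
    credential(header, "To", "DUNS", supplier_id)       # which vendor catalog
    sender = credential(header, "Sender", "NetworkID", buyer_id, secret)
    ET.SubElement(sender, "UserAgent").text = "E-Market"
    request = ET.SubElement(root, "Request")
    setup = ET.SubElement(request, "PunchOutSetupRequest", operation="create")
    # The cookie lets E-Market tie the returned cart to this user's session.
    ET.SubElement(setup, "BuyerCookie").text = buyer_cookie
    # Where the vendor should post the cart when the user hits submit.
    post = ET.SubElement(setup, "BrowserFormPost")
    ET.SubElement(post, "URL").text = return_url
    return ET.tostring(root, encoding="unicode")
```

For example, `punchout_setup_request("AN-BUYER-123", "123456789", "s3cret", "cart-42", "https://e-market.example/punchout/return")` returns the request as a string. The vendor answers with a start page URL for the user's browser, and when the user submits their selections the cart is posted back to the `BrowserFormPost` URL instead of being ordered directly, which is the "transferred back into E-Market" step described above.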

Application Support for E-Market Web Server/Database
This person is responsible for the upkeep of the E-Market web server, making sure that the system is always up and running, and providing support to the end user. Responsibilities also include applying any patches or updates from the E-Market software vendor to the physical systems and applications. The physical systems can either reside on-site at the company that uses the E-Market software, or the E-Market software vendor can host the company’s web/database systems. The latter option introduces the possibility of cloud computing.

Software Vendor of E-Market Application
The software vendor is responsible for the complete software life cycle of the E-Market product. They provide the updates and patches that ‘Application Support’ applies to the system; if the system is hosted, the software vendor both provides and applies the patches. The E-Market software vendor not only works with ‘Application Support’ to provide fixes but, more importantly, works with the product vendors whose websites the end users visit. This is truly where the B2B work is done. The software and product vendors must work together to agree on the standards that govern communication between the systems (the E-Market system and the product vendors’ systems). The procurement purchasing process uses standards such as the following:

XML – the underlying format for the documents sent between the systems (which contain purchase information such as the description, quantity and price of the items ordered)
cXML (commerce XML) – an XML-based standard that defines those documents and the request/response protocol used to exchange them between the systems.
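As a rough illustration of the purchase information these documents carry, the sketch below pulls the part number, description, quantity and unit price out of a trimmed-down cXML PunchOutOrderMessage (the document a vendor sends back when the user hits submit) and checks the line items against the stated total. The sample document, part numbers and prices are made up, and real messages include a Header and many more optional elements.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical, heavily trimmed PunchOutOrderMessage (no Header, made-up items).
SAMPLE = """
<cXML payloadID="example@vendor.example" timestamp="2009-11-27T10:00:00-05:00">
  <Message>
    <PunchOutOrderMessage>
      <BuyerCookie>cart-42</BuyerCookie>
      <PunchOutOrderMessageHeader operationAllowed="edit">
        <Total><Money currency="USD">31.97</Money></Total>
      </PunchOutOrderMessageHeader>
      <ItemIn quantity="3">
        <ItemID><SupplierPartID>PEN-100</SupplierPartID></ItemID>
        <ItemDetail>
          <UnitPrice><Money currency="USD">4.99</Money></UnitPrice>
          <Description xml:lang="en">Ballpoint pens, box of 12</Description>
          <UnitOfMeasure>BX</UnitOfMeasure>
        </ItemDetail>
      </ItemIn>
      <ItemIn quantity="1">
        <ItemID><SupplierPartID>PAPER-500</SupplierPartID></ItemID>
        <ItemDetail>
          <UnitPrice><Money currency="USD">17.00</Money></UnitPrice>
          <Description xml:lang="en">Copy paper, 500 sheets</Description>
          <UnitOfMeasure>RM</UnitOfMeasure>
        </ItemDetail>
      </ItemIn>
    </PunchOutOrderMessage>
  </Message>
</cXML>
"""


def order_lines(cxml_text):
    """Return (part_id, description, quantity, unit_price) for each ItemIn element."""
    root = ET.fromstring(cxml_text.strip())
    lines = []
    for item in root.iter("ItemIn"):
        lines.append((
            item.findtext("ItemID/SupplierPartID"),
            item.findtext("ItemDetail/Description"),
            int(item.get("quantity")),
            Decimal(item.findtext("ItemDetail/UnitPrice/Money")),  # Decimal avoids float rounding
        ))
    return lines


def check_total(cxml_text):
    """True if quantity * unit price summed over all items equals the header Total."""
    root = ET.fromstring(cxml_text.strip())
    stated = Decimal(root.findtext(".//PunchOutOrderMessageHeader/Total/Money"))
    computed = sum(qty * price for _, _, qty, price in order_lines(cxml_text))
    return stated == computed
```

Here `order_lines(SAMPLE)` yields two lines (3 boxes of pens at 4.99 and 1 ream of paper at 17.00), and `check_total(SAMPLE)` confirms that they add up to the stated 31.97. A consistency check like this is the sort of thing either side of the integration might run before accepting a cart.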

Product Vendors (e.g. Staples, Office Depot, Dell)
These vendors work with both the E-Market software vendor and the Supply Chain Department representatives of the company that uses the E-Market software. The first relationship (product vendors <-> E-Market software vendor) was described above. The second relationship (product vendor <-> Supply Chain Department) is primarily in place to agree on which product vendors will be available to the end users. This can be driven by end-user requests to add certain product vendors, or it can grow out of existing relationships between the Supply Chain Department and product vendors. Interestingly enough, this is another point of strong B2B integration: it gives both the product vendor and the company using the E-Market software insight into each other’s processing patterns, such as how much money is being spent and how many transactions are occurring with which product vendors and which specific end users. This is important for both the company (to strike better deals on products) and the product vendor (to understand customer demand), and it supports a host of other improvements to the overall efficiency of supply chain management.
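The kind of insight described above mostly comes down to grouping completed orders. A minimal sketch, assuming each order can be exported from E-Market as a simple record with a vendor, an end user and an order total (the `Order` type, its field names and the figures are hypothetical):

```python
from collections import defaultdict
from decimal import Decimal
from typing import NamedTuple


class Order(NamedTuple):
    vendor: str
    user: str
    total: Decimal


def summarize(orders, key):
    """Return {value: (order count, total spend)}, grouped by key='vendor' or key='user'."""
    counts = defaultdict(int)
    spend = defaultdict(Decimal)  # Decimal() starts at Decimal('0')
    for order in orders:
        k = getattr(order, key)
        counts[k] += 1
        spend[k] += order.total
    return {k: (counts[k], spend[k]) for k in counts}
```

With orders of 31.97 and 12.50 from Staples and 899.00 from Dell, `summarize(orders, "vendor")` reports Staples at 2 orders and 44.47 and Dell at 1 order and 899.00; grouping by `"user"` gives the per-employee view instead.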

References:

cXML.org – an excellent reference for more detail on the standards and protocols used; it also defines some of the terms used above.
Wikipedia: E-procurement – general background on e-procurement.

Monday, November 23, 2009

A real-world example of integration that needs work.

I've been thinking about integration solutions from the user's perspective, since it turns out I use a fair number of them day to day. In particular, I've been thinking a lot about the importance of finding a good solution: one that not only works properly but also functions well, especially from the user's perspective. I have the...um...pleasure (I use the term very loosely!) of using a rather sub-par integration solution every day at work.

We receive large amounts of applicant data in digital format every day. The program (which I will simply call "Solution X") can import data in a wide variety of formats, such as non-delimited text in a flat file, XML, or delimited text. All of the data inputs we use Solution X for are non-delimited flat files; I think there are three that we use regularly. Solution X is fairly extensible in that you can set up a different profile for each format. I haven't done this part myself, but I'm told you don't need any particular level of expertise and don't even need to be a programmer (though it helps). The data ends up in an organization-wide Oracle database.
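A "profile" for a non-delimited (fixed-width) flat file is essentially a list of field names with their column positions. The sketch below shows the idea; the field names and column widths are hypothetical and are not Solution X's actual profile format.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: (field name, start column, end column), zero-based, end exclusive.
Profile = List[Tuple[str, int, int]]

APPLICANT_PROFILE: Profile = [
    ("last_name", 0, 15),
    ("first_name", 15, 27),
    ("birth_date", 27, 35),  # YYYYMMDD
    ("applicant_id", 35, 44),
]


def parse_line(line: str, profile: Profile) -> Dict[str, str]:
    """Slice one fixed-width record into named fields, trimming the space padding."""
    record = line.rstrip("\r\n")
    expected = max(end for _, _, end in profile)
    if len(record) < expected:
        # A short line is exactly the kind of small input change that breaks a rigid profile.
        raise ValueError(f"record has {len(record)} characters, profile expects {expected}")
    return {name: record[start:end].strip() for name, start, end in profile}
```

Because the profile is just data, supporting a new format means writing a new column list rather than new code, which fits the claim that you don't need to be a programmer to set one up. The flip side is also visible: if a sender shifts a column by even one character, every field after it is sliced in the wrong place.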

The program examines the data in two "passes", the first of which is almost completely automated: the user starts the program, loads the appropriate profile, loads the file and starts the first pass. The execution time per record varies with the complexity of the file format and the resources of the computer (a desktop PC) running the program. If there are no errors, the total execution time for the first pass is simply the number of records in the file multiplied by the time per record. If there are errors, a user may have to dismiss a dialog box, which can stall the first pass. Once the pass is done, each record is in one of three states: matched, new, or unmatched. A matched record was determined to belong to an existing applicant, and its data has been added to that applicant's file. A new record was determined not to belong to any existing applicant, so a new record was created for them. An unmatched record couldn't automatically be classified as either a match or new, so the user must make the final decision. The second pass loads each unmatched record and gives the user the opportunity to decide whether the potential matches are in fact matches.
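The first-pass decision can be pictured as scoring each incoming record against existing applicants and applying two thresholds. How Solution X actually scores candidates isn't something I know, so the scoring rule below (an exact ID match, or points for agreeing name and birth date fields) and the thresholds are assumptions for illustration only.

```python
from enum import Enum
from typing import Dict, List, Tuple

Record = Dict[str, str]


class Outcome(Enum):
    MATCHED = "matched"
    NEW = "new"
    UNMATCHED = "unmatched"  # left for a person to decide in the second pass


def score(incoming: Record, existing: Record) -> int:
    """Hypothetical score: 100 for a matching ID, otherwise 30 per agreeing field."""
    if incoming.get("applicant_id") and incoming["applicant_id"] == existing.get("applicant_id"):
        return 100
    points = 0
    for field in ("last_name", "first_name", "birth_date"):
        value = incoming.get(field, "").lower()
        if value and value == existing.get(field, "").lower():
            points += 30
    return points


def classify(incoming: Record, applicants: List[Record],
             match_at: int = 90, new_below: int = 30) -> Tuple[Outcome, List[Record]]:
    """Return the outcome plus the candidate applicants a reviewer would be shown."""
    scored = [(score(incoming, a), a) for a in applicants]
    best = max((s for s, _ in scored), default=0)
    strong = [a for s, a in scored if s >= match_at]
    if len(strong) == 1:
        return Outcome.MATCHED, strong      # exactly one confident match
    if best < new_below:
        return Outcome.NEW, []              # nothing resembles this applicant
    # Partial matches (or more than one strong match) need a human decision.
    return Outcome.UNMATCHED, [a for s, a in scored if s >= new_below]
```

The design choice worth noticing is that two confident matches are treated as "unmatched" rather than guessed at, which is the same cautious behavior the second pass exists to handle.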

So far so good. Well, sort of.

First, there are some practical problems. One is that the program is not very forgiving when the input data changes: even very small differences can require fairly significant work, either by IT staff to change the profile or by the user to manually alter the input data. Unfortunately, because of project priorities, we end up manually changing the input data on a daily basis.

Another problem is that errors during the first pass can easily halt the entire process. This is a major issue, particularly with very large files. For example, yesterday evening we received a file with about 1200 records that takes, on average, about 30 seconds per record on the first pass. That's 10 hours if there are no errors. I will often start a first pass on a large file at the end of the day, and sometimes I come back in the morning to find that the program has been displaying an error dialog since record 23.

Also difficult is that a record flagged as unmatched might not actually be one that couldn't be matched or determined to be new. There may have been a problem with the input data that was serious enough to prevent loading but not serious enough to warrant informing the user. The user then has to look through cryptic log files and fix the problem manually.
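Both of these problems come from handling an error either as a blocking dialog or as a silent "unmatched". A sketch of the alternative, assuming a hypothetical `load_record` function that raises `ValueError` on bad input: the pass records each failure with its line number and a readable reason, and keeps going.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class PassReport:
    loaded: int = 0
    errors: List[Tuple[int, str]] = field(default_factory=list)  # (line number, reason)


def first_pass(lines: Iterable[str], load_record: Callable[[str], None]) -> PassReport:
    """Run every record; collect failures instead of stopping at the first one."""
    report = PassReport()
    for number, line in enumerate(lines, start=1):
        try:
            load_record(line)  # hypothetical loader supplied by the caller
        except ValueError as exc:
            report.errors.append((number, str(exc)))  # reason stays attached to the record
        else:
            report.loaded += 1
    return report
```

With this shape, the overnight run that hits a bad record 23 still processes the remaining records, and in the morning the user gets a short list of line numbers with reasons instead of an error dialog or a trawl through log files.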

As you can see, Solution X is functional. It does what it needs to do, albeit with some effort, and perhaps irritation, on the user's part.

The bigger problem, at least in my opinion, is that Solution X was obviously not designed for processing large numbers of files or records, or even with efficiency in mind! You cannot, for example, have two people working on the same input file at the same time, nor can you run multiple instances of the program on the same computer (e.g. to work on two different types of files simultaneously). The only way to reduce that 10-hour first pass is to split the input file manually and run the program on several different computers. This means that to get the first pass of that file done by the end of the day, we need to dedicate at least two computers to the job.
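The manual workaround is easy to script, and the arithmetic shows why it helps: at about 30 seconds per record, 1200 records take 36,000 seconds (10 hours) on one machine and roughly 5 hours each on two. The helper below is a sketch that assumes one record per line and error-free runs; the function names are my own.

```python
from typing import List


def split_records(lines: List[str], parts: int) -> List[List[str]]:
    """Split records into exactly `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(lines[start:end])
        start = end
    return chunks


def estimated_hours(records: int, seconds_per_record: float, machines: int = 1) -> float:
    """Wall-clock estimate for the slowest machine, assuming no errors."""
    largest_chunk = -(-records // machines)  # ceiling division
    return largest_chunk * seconds_per_record / 3600
```

`estimated_hours(1200, 30)` gives 10.0 and `estimated_hours(1200, 30, machines=2)` gives 5.0, which is why two computers are the minimum for finishing within a working day.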

The biggest problem is that everything is started by the user: the user must load the program, the profile and the file, and then start the process. Because the program can get stuck rather easily, the user has to check in to make sure that the "automatic" part is even running! For larger volumes like this, it seems that we need a solution that runs as a daemon and takes care of the "easy" stuff itself...

The following is what I think Solution X should have been to begin with. Let's call it Solution Z. Solution Z is a two-part program consisting of a server-side daemon and a front-end GUI application. The server-side daemon continually monitors a drop folder. When new data is available, it is automatically retrieved, decrypted and put into the drop folder. Within the next minute, the daemon loads the file and starts the matching/loading process. As with Solution X, records that are clearly matches or clearly new are loaded automatically. Records that are flagged as "unmatched" or that contain malformed data are placed in a queue. The user can open the GUI app and pull unmatched or malformed records from the queue as time permits. For each record pulled, the user makes the matching decision and/or corrects any malformed data, and the record is then resubmitted to the daemon for loading.

Using queues for error correction means that multiple users can work on a single input batch at the same time, and using a server-side daemon for loading allows for greater automation and parallelism in processing an input file. For high-priority records, you could open the GUI app right away and pull from the queue frequently to complete the whole batch as soon as possible; for lower-priority records, you could let the queue build up. If the daemon runs as a job on a server (as opposed to running the "server" process on a desktop workstation), you would likely also see a significant decrease in processing time per record. Even if that doesn't change appreciably, you would almost certainly benefit significantly from not having to wait for a human to start the "automatic" first pass.
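Here is a minimal sketch of Solution Z's server side. It assumes a drop folder on the server's file system, treats `*.dat` files as input batches (a made-up convention), and uses the standard library's thread-safe `queue.Queue` as a stand-in for the review queue; GUI clients on separate machines would really need a shared store such as a database table or message broker. The `process_file` callback is hypothetical: it stands in for the matching/loading logic and returns the records that need a person.

```python
import queue
import time
from pathlib import Path
from typing import Callable, List, Optional, Set

# Hypothetical callback: loads clear matches and new records, returns the ones needing review.
ProcessFile = Callable[[Path], List[str]]


def scan_once(drop_dir: Path, seen: Set[str], review_queue: "queue.Queue[str]",
              process_file: ProcessFile) -> int:
    """Process any batch files not seen before; queue records needing review. Returns files handled."""
    handled = 0
    for path in sorted(drop_dir.glob("*.dat")):
        if path.name in seen:
            continue
        for record in process_file(path):
            review_queue.put(record)  # unmatched or malformed, for the GUI to pull
        seen.add(path.name)
        handled += 1
    return handled


def run_daemon(drop_dir: Path, review_queue: "queue.Queue[str]",
               process_file: ProcessFile, poll_seconds: int = 60) -> None:
    """Poll the drop folder forever, so no one has to start the first pass by hand."""
    seen: Set[str] = set()
    while True:
        scan_once(drop_dir, seen, review_queue, process_file)
        time.sleep(poll_seconds)


def next_for_review(review_queue: "queue.Queue[str]") -> Optional[str]:
    """What the GUI app might call: take one record to review, or None if the queue is empty."""
    try:
        return review_queue.get_nowait()
    except queue.Empty:
        return None
```

Because each reviewer just takes the next item, several people can work through one batch at once without stepping on each other. One practical detail a real version would need: the retrieve-and-decrypt step should write to a temporary name and rename the file into the drop folder when finished, so the daemon never picks up a half-written batch.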

Thoughts?