Spring Batch – Concepts and interfaces

Recently I came across a very interesting incident related to Spring Batch, where a write skip count was not getting updated properly. I had never worked with Spring Batch and knew almost nothing about it before picking up this incident.

So, after understanding what the incident was about, I started reading up on how Spring Batch works.

Let’s see what Spring Batch is and how it works.

In general terms, a “batch” is the execution of a series of programs on a computer without manual intervention.

Where batch processing can be used:

  • Data Export
  • Invoice generation
  • Bulk database updates
  • Automated transaction processing
  • Processing digital images

What is Spring Batch: Spring Batch is an open-source framework for batch processing. It is a lightweight framework built on top of the Spring Framework.

Features:

  • Transaction management
  • Chunk based processing
  • Start/Stop/Restart
  • Retry/Skip

Let’s first understand the terms that are core to the Spring Batch framework.

Batch: execution of a series of jobs

Job: A sequence of one or more steps and the associated configuration that belong to the batch job. A job is intended to be executed without interruption.

JobInstance: A uniquely identifiable logical run of a job; launching the same job with the same identifying parameters belongs to the same JobInstance.

JobExecution: A single attempt to run a JobInstance. A JobInstance is considered complete only when a JobExecution for it completes successfully.

Step: A part of a Job that contains all the information necessary to execute the batch processing actions expected at that phase of the job. A Step is a single state within the flow of a job.

StepExecution: A single attempt to execute a Step. It holds counters such as the commit, read, write and skip counts, and provides access to the ExecutionContext.

JobRepository: Provides CRUD persistence operations for all job-related metadata: the JobInstances and their executions, the parameters used for the jobs, and the execution context in which the processing runs.

JobLauncher: Responsible for launching jobs with their job parameters.
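
To make the JobLauncher’s role concrete, here is a minimal sketch of launching a job programmatically; the runner class, the sampleJob bean and the run.date parameter are made-up names for illustration:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class SampleJobRunner {

    private final JobLauncher jobLauncher;
    private final Job sampleJob; // assumed to be defined elsewhere

    public SampleJobRunner(JobLauncher jobLauncher, Job sampleJob) {
        this.jobLauncher = jobLauncher;
        this.sampleJob = sampleJob;
    }

    public void run() throws Exception {
        // Identifying parameters: a new value creates a new JobInstance,
        // re-using the same value restarts the existing instance.
        JobParameters params = new JobParametersBuilder()
                .addString("run.date", "2016-01-01")
                .toJobParameters();

        JobExecution execution = jobLauncher.run(sampleJob, params);
        System.out.println("Exit status: " + execution.getStatus());
    }
}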

A batch application can be divided into three main parts:

  • Reading the data (from a database, file system, etc.)
  • Processing the data (filtering, grouping, calculating, validating…)
  • Writing the data (to a database, reporting, distributing…)

There are various reader and writer implementations provided by the Spring Batch framework.

The key interfaces are:

  • ItemReader
  • ItemWriter
  • ItemProcessor

ItemReader: Readers are abstractions responsible for retrieving the data. Here is a list of readers (a small custom-reader sketch follows the list):

  • AmqpItemReader
  • AggregateItemReader
  • FlatFileItemReader
  • HibernateCursorItemReader
  • HibernatePagingItemReader
  • IbatisPagingItemReader
  • ItemReaderAdapter
  • JdbcCursorItemReader
  • JdbcPagingItemReader
  • JmsItemReader
  • JpaPagingItemReader
  • ListItemReader
  • MongoItemReader
  • Neo4jItemReader
  • RepositoryItemReader
  • StoredProcedureItemReader
  • StaxEventItemReader
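
Besides the built-in readers, you can write your own by implementing ItemReader; returning null signals that the input is exhausted. A minimal sketch, with made-up in-memory data:

import java.util.Arrays;
import java.util.Iterator;
import org.springframework.batch.item.ItemReader;

// A hand-rolled reader over an in-memory list; real jobs would normally use
// one of the provided readers (FlatFileItemReader, JdbcCursorItemReader, ...).
public class InMemoryNameReader implements ItemReader<String> {

    private final Iterator<String> names =
            Arrays.asList("alice", "bob", "carol").iterator();

    @Override
    public String read() {
        // Returning null tells Spring Batch that the input is exhausted.
        return names.hasNext() ? names.next() : null;
    }
}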

ItemWriter: Writers are abstractions responsible for writing the data to the desired output database or system. Here is a list of writers (a small custom-writer sketch follows the list):

  • AbstractItemStreamItemWriter
  • AmqpItemWriter
  • CompositeItemWriter
  • FlatFileItemWriter
  • GemfireItemWriter
  • HibernateItemWriter
  • IbatisBatchItemWriter
  • ItemWriterAdapter
  • JdbcBatchItemWriter
  • JmsItemWriter
  • JpaItemWriter
  • MimeMessageItemWriter
  • MongoItemWriter
  • Neo4jItemWriter
  • StaxEventItemWriter
  • RepositoryItemWriter
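
As with readers, a custom writer only needs to implement ItemWriter. A minimal sketch that just logs each chunk (it assumes the List-based write() signature used in Spring Batch 4.x and earlier):

import java.util.List;
import org.springframework.batch.item.ItemWriter;

// A trivial writer that logs each chunk; real jobs would typically use
// JdbcBatchItemWriter, FlatFileItemWriter, etc.
public class LoggingItemWriter implements ItemWriter<String> {

    @Override
    public void write(List<? extends String> items) {
        // The whole chunk arrives in one call, inside the step's transaction.
        System.out.println("Writing chunk of " + items.size() + " items: " + items);
    }
}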

ItemProcessor: Processors are responsible for transforming the data, converting each record from the input format to the desired output format. Processors are optional. Here is a list (a small custom-processor sketch follows the list):

  • ValidatingItemProcessor
  • PassThroughItemProcessor
  • ScriptItemProcessor

And many others.
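
A custom processor is just as simple: implement ItemProcessor and either return the transformed item, or return null to filter the item out of the chunk. A minimal sketch:

import org.springframework.batch.item.ItemProcessor;

// Converts each name to upper case; returning null filters the item
// out of the chunk entirely.
public class UpperCaseProcessor implements ItemProcessor<String, String> {

    @Override
    public String process(String item) {
        if (item == null || item.trim().isEmpty()) {
            return null; // skip blank records
        }
        return item.toUpperCase();
    }
}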

To put it all together, this is how it looks:

[Figure: Spring Batch overview]

OK. So now, how are these batch jobs processed? There are two ways.

1. Chunk-oriented processing:

Chunk-oriented processing refers to reading the data one item at a time and creating “chunks” that are written out within a transaction boundary. One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.

[Figure: chunk-oriented processing]
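
In pseudo-code, one chunk inside a transaction looks roughly like the sketch below; the real TaskletStep additionally handles end of input, skip/retry logic and listeners:

import java.util.ArrayList;
import java.util.List;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

// Simplified sketch of what happens for one chunk, not the actual framework code.
public class ChunkLoopSketch<I, O> {

    void processOneChunk(ItemReader<I> reader,
                         ItemProcessor<I, O> processor,
                         ItemWriter<O> writer,
                         int commitInterval) throws Exception {
        // --- transaction starts here ---
        List<O> items = new ArrayList<>();
        for (int i = 0; i < commitInterval; i++) {
            I item = reader.read();
            if (item == null) {
                break;                      // input exhausted
            }
            items.add(processor.process(item));
        }
        writer.write(items);                // whole chunk written at once
        // --- transaction commits here ---
    }
}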

Configuring a step for chunk-oriented processing:

<job id="sampleJob" job-repository="jobRepository">
    <step id="step1">
        <tasklet transaction-manager="transactionManager">
            <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/>
        </tasklet>
    </step>
</job>
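
For comparison, here is a rough Java-config equivalent of the same step and job, using the Spring Batch 3/4 builder APIs; the itemReader and itemWriter beans are assumed to be defined elsewhere:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class SampleJobConfig {

    @Bean
    public Step step1(StepBuilderFactory steps,
                      ItemReader<String> itemReader,
                      ItemWriter<String> itemWriter) {
        return steps.get("step1")
                .<String, String>chunk(10)   // commit-interval = 10
                .reader(itemReader)
                .writer(itemWriter)
                .build();
    }

    @Bean
    public Job sampleJob(JobBuilderFactory jobs, Step step1) {
        return jobs.get("sampleJob")
                .start(step1)
                .build();
    }
}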


2. Tasklet-oriented processing: Sometimes a step consists of a single simple task, such as a stored procedure call or deleting a file.

For such cases the Tasklet interface is provided.

Tasklet is a simple interface that has one method, execute, which is called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure.
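
For example, the “deleting a file” case mentioned above could be written as a Tasklet roughly like this (the file path is a made-up example, and the bean would be registered as myTasklet to match the configuration below):

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// A single-task step: delete a temporary file and finish.
public class FileDeletingTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution,
                                ChunkContext chunkContext) throws Exception {
        java.nio.file.Files.deleteIfExists(
                java.nio.file.Paths.get("/tmp/sample-input.csv")); // hypothetical path
        // FINISHED tells the TaskletStep not to call execute() again.
        return RepeatStatus.FINISHED;
    }
}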

Configuration of step as a tasklet:

<step id="step1">
    <tasklet ref="myTasklet"/>
</step>


In the next article we will see how these concepts are applied to create a batch job application.
