Spring Batch – Concepts and interfaces

Recently I came across a very interesting incident related to Spring Batch, where a write skip count was not getting updated properly. I had never worked with Spring Batch and knew almost nothing about it before picking up this incident.

So, after understanding what the incident was about, I started reading up on how Spring Batch works.

Let’s see what Spring Batch is and how it works.

In general terms, a “batch” is the execution of a series of programs on a computer without manual intervention.

Where batch processing can be used:

  • Data Export
  • Invoice generation
  • Bulk database updates
  • Automated transaction processing
  • Processing digital images

What is Spring Batch: Spring Batch is an open-source framework for batch processing. It is a lightweight framework built on top of the Spring Framework.

Features:

  • Transaction management
  • Chunk based processing
  • Start/Stop/Restart
  • Retry/Skip

Let’s first understand the terms that are core to the Spring Batch framework.

Batch: execution of a series of jobs

Job: A sequence of one or more steps and the associated configuration that belong to the batch job. A job is intended to be executed without interruption.

JobInstance: A uniquely identifiable logical run of a job; launching the same job with the same identifying parameters belongs to the same JobInstance.

JobExecution: A single attempt to run a JobInstance. A JobInstance is considered complete only when a JobExecution for it completes successfully.

Step: A part of a Job that contains all the information necessary to execute the batch processing actions expected at that phase of the job. A Step is a single state within the flow of a job.

StepExecution: A single attempt to execute a Step. It holds counters such as the commit, read, write and skip counts, and provides access to the ExecutionContext.

JobRepository: Provides CRUD persistence operations for all job-related metadata: the JobInstances and their executions, the parameters used for the jobs, and the execution context in which the processing runs.

JobLauncher: Responsible for launching jobs with their job parameters.
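
To make the JobLauncher’s role concrete, here is a minimal sketch of launching a job programmatically; the runner class, the sampleJob bean and the run.date parameter are made-up names for illustration:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class SampleJobRunner {

    private final JobLauncher jobLauncher;
    private final Job sampleJob; // assumed to be defined elsewhere

    public SampleJobRunner(JobLauncher jobLauncher, Job sampleJob) {
        this.jobLauncher = jobLauncher;
        this.sampleJob = sampleJob;
    }

    public void run() throws Exception {
        // Identifying parameters: a new value creates a new JobInstance,
        // re-using the same value restarts the existing instance.
        JobParameters params = new JobParametersBuilder()
                .addString("run.date", "2016-01-01")
                .toJobParameters();

        JobExecution execution = jobLauncher.run(sampleJob, params);
        System.out.println("Exit status: " + execution.getStatus());
    }
}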

A batch application can be divided into three main parts:

  • Reading the data (from a database, file system, etc.)
  • Processing the data (filtering, grouping, calculating, validating…)
  • Writing the data (to a database, reporting, distributing…)

There are various reader and writer implementations provided by the Spring Batch framework.

The key interfaces are:

  • ItemReader
  • ItemWriter
  • ItemProcessor

ItemReader: Readers are abstractions responsible for retrieving the data. Here is a list of readers (a small custom-reader sketch follows the list):

  • AmqpItemReader
  • AggregateItemReader
  • FlatFileItemReader
  • HibernateCursorItemReader
  • HibernatePagingItemReader
  • IbatisPagingItemReader
  • ItemReaderAdapter
  • JdbcCursorItemReader
  • JdbcPagingItemReader
  • JmsItemReader
  • JpaPagingItemReader
  • ListItemReader
  • MongoItemReader
  • Neo4jItemReader
  • RepositoryItemReader
  • StoredProcedureItemReader
  • StaxEventItemReader
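
Besides the built-in readers, you can write your own by implementing ItemReader; returning null signals that the input is exhausted. A minimal sketch, with made-up in-memory data:

import java.util.Arrays;
import java.util.Iterator;
import org.springframework.batch.item.ItemReader;

// A hand-rolled reader over an in-memory list; real jobs would normally use
// one of the provided readers (FlatFileItemReader, JdbcCursorItemReader, ...).
public class InMemoryNameReader implements ItemReader<String> {

    private final Iterator<String> names =
            Arrays.asList("alice", "bob", "carol").iterator();

    @Override
    public String read() {
        // Returning null tells Spring Batch that the input is exhausted.
        return names.hasNext() ? names.next() : null;
    }
}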

ItemWriter: Writers are abstractions responsible for writing the data to the desired output database or system. Here is a list of writers (a small custom-writer sketch follows the list):

  • AbstractItemStreamItemWriter
  • AmqpItemWriter
  • CompositeItemWriter
  • FlatFileItemWriter
  • GemfireItemWriter
  • HibernateItemWriter
  • IbatisBatchItemWriter
  • ItemWriterAdapter
  • JdbcBatchItemWriter
  • JmsItemWriter
  • JpaItemWriter
  • MimeMessageItemWriter
  • MongoItemWriter
  • Neo4jItemWriter
  • StaxEventItemWriter
  • RepositoryItemWriter
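
As with readers, a custom writer only needs to implement ItemWriter. A minimal sketch that just logs each chunk (it assumes the List-based write() signature used in Spring Batch 4.x and earlier):

import java.util.List;
import org.springframework.batch.item.ItemWriter;

// A trivial writer that logs each chunk; real jobs would typically use
// JdbcBatchItemWriter, FlatFileItemWriter, etc.
public class LoggingItemWriter implements ItemWriter<String> {

    @Override
    public void write(List<? extends String> items) {
        // The whole chunk arrives in one call, inside the step's transaction.
        System.out.println("Writing chunk of " + items.size() + " items: " + items);
    }
}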

ItemProcessor: Processors are responsible for transforming the data, converting each record from the input format to the desired output format. Processors are optional. Here is a list (a small custom-processor sketch follows the list):

  • ValidatingItemProcessor
  • PassThroughItemProcessor
  • ScriptItemProcessor

And many others.
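
A custom processor is just as simple: implement ItemProcessor and either return the transformed item, or return null to filter the item out of the chunk. A minimal sketch:

import org.springframework.batch.item.ItemProcessor;

// Converts each name to upper case; returning null filters the item
// out of the chunk entirely.
public class UpperCaseProcessor implements ItemProcessor<String, String> {

    @Override
    public String process(String item) {
        if (item == null || item.trim().isEmpty()) {
            return null; // skip blank records
        }
        return item.toUpperCase();
    }
}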

To put it all together, this is how it looks:

[Figure: Spring Batch overview]

OK. So now, how are these batch jobs processed? There are two ways.

1. Chunk-oriented processing:

Chunk-oriented processing refers to reading the data one item at a time and creating “chunks” that are written out within a transaction boundary. One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.

[Figure: chunk-oriented processing]
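
In pseudo-code, one chunk inside a transaction looks roughly like the sketch below; the real TaskletStep additionally handles end of input, skip/retry logic and listeners:

import java.util.ArrayList;
import java.util.List;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

// Simplified sketch of what happens for one chunk, not the actual framework code.
public class ChunkLoopSketch<I, O> {

    void processOneChunk(ItemReader<I> reader,
                         ItemProcessor<I, O> processor,
                         ItemWriter<O> writer,
                         int commitInterval) throws Exception {
        // --- transaction starts here ---
        List<O> items = new ArrayList<>();
        for (int i = 0; i < commitInterval; i++) {
            I item = reader.read();
            if (item == null) {
                break;                      // input exhausted
            }
            items.add(processor.process(item));
        }
        writer.write(items);                // whole chunk written at once
        // --- transaction commits here ---
    }
}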

Configuring a step for chunk-oriented processing:

<job id="sampleJob" job-repository="jobRepository">
    <step id="step1">
        <tasklet transaction-manager="transactionManager">
            <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/>
        </tasklet>
    </step>
</job>
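
For comparison, here is a rough Java-config equivalent of the same step and job, using the Spring Batch 3/4 builder APIs; the itemReader and itemWriter beans are assumed to be defined elsewhere:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class SampleJobConfig {

    @Bean
    public Step step1(StepBuilderFactory steps,
                      ItemReader<String> itemReader,
                      ItemWriter<String> itemWriter) {
        return steps.get("step1")
                .<String, String>chunk(10)   // commit-interval = 10
                .reader(itemReader)
                .writer(itemWriter)
                .build();
    }

    @Bean
    public Job sampleJob(JobBuilderFactory jobs, Step step1) {
        return jobs.get("sampleJob")
                .start(step1)
                .build();
    }
}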


2. Tasklet-oriented processing: Sometimes a step consists of a single simple task, such as a stored procedure call or deleting a file.

For such cases the Tasklet interface is provided.

Tasklet is a simple interface that has one method, execute, which is called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure.
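
For example, the “deleting a file” case mentioned above could be written as a Tasklet roughly like this (the file path is a made-up example, and the bean would be registered as myTasklet to match the configuration below):

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// A single-task step: delete a temporary file and finish.
public class FileDeletingTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution,
                                ChunkContext chunkContext) throws Exception {
        java.nio.file.Files.deleteIfExists(
                java.nio.file.Paths.get("/tmp/sample-input.csv")); // hypothetical path
        // FINISHED tells the TaskletStep not to call execute() again.
        return RepeatStatus.FINISHED;
    }
}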

Configuration of step as a tasklet:

<step id="step1">
    <tasklet ref="myTasklet"/>
</step>


In the next article we will see how these concepts are applied to create a batch job application.
