BUSINESSOBJECTS DATA SERVICES (BODS)
Architecture:
BODS has a three-tier architecture: the top tier consists of user interfaces such as the Data Services Designer and the Management Console, the application tier consists of the job server, and finally we have the local and central repositories, where all the objects of the system are stored. Now let us go into each component individually.
Data Services Designer:
This is where we develop, validate and execute the ETL jobs. We can connect to the central repository only through the Designer. We can also import and export jobs to a file or a repository from the Designer.
Local Repository:
->It contains all the user-defined objects like projects, jobs, work flows, data flows, etc.
->It also contains the system-defined objects, like the different transforms that are used to manipulate the source data as per the business rules.
->The connection to the database server is also created in the local repository only. We call it a data store.
->We can create a repository using the Repository Manager that comes with the installation.
->The steps to create a repository are:
Start -> Programs -> BI Data Services -> Repository Manager -> select the type of repository (local, central or profiler) -> repository name -> connection name -> database credentials -> Create.
Central Repository:
1. It is similar to the local repository; it also stores both user-defined and pre-defined objects. It is used to serve a multi-user environment, so if the environment is not multi-user, its usage is optional. In a multi-user environment, when one user is working on an object, it can't be accessed by any other user. We use a check-in and check-out mechanism for this purpose, which is similar to locking. We can create the repository using the Repository Manager that comes with the installation.
2. The steps to create the repository are:
Start -> Programs -> BI Data Services -> Repository Manager -> select the type of repository (local, central or profiler) -> repository name -> connection name -> database credentials -> Create.
Job Server:
The job server consists of .bat files (which contain the jobs), and its work is to start the job engine, which executes the jobs.
Job Engine:
It executes the requested jobs.
Access Server:
This is used for DI Real-time service jobs. This Server controls the XML message passing between the source and target nodes in Real-Time jobs.
Web Server:
This is used for administration activities. Tasks such as scheduling jobs, executing jobs and creating users can be done through this server.
So these are all the components that exist in the BODS architecture. When we need to create an ETL job, we request the source data from the local repository, which in turn requests it from the connected database. Then we pick the required transform and create the job in the Designer. Then we save it in the local repository (and perhaps in the central repository too, as per the requirement). Then we execute it through the job server and job engine. This is how the flow goes. The source or target can be an SAP or a non-SAP system.
Objects Hierarchy:
Now let us discuss the object hierarchy. It consists of different objects like projects, jobs, work flows, scripts, data flows, sources, targets and transforms.
Project:
It is the highest level of the hierarchy, under which we create jobs.
Jobs:
A job is the executable object of BODS.
Work Flow:
It is an optional object in which we create different data flows and scripts.
Script:
Any additional logic that is not built into BODS can be developed by writing scripts.
Data Flow:
This is the most important object of the hierarchy, where we create our mappings.
Transform:
These are the pre-built objects used for developing the required business rules.
This is all about the object hierarchy of BODS.
Now let us start by creating data stores.
Connecting BODS to an SAP System:
This process starts from the SAP BW workbench. The steps include:
1. Open the BW workbench.
2. Go to Modeling.
3. Go to Source Systems.
4. Right-click on External Systems -> click on New Source.
5. A dialogue box appears.
6. Give a name to the source system.
7. RFC Destination -> DI_SOURCE -> Program ID (case sensitive) -> Save.
Now go to the Data Services Management Console and follow the steps below:
1. Expand SAP Connections.
2. Click the RFC Server Interface item.
3. Go to the Configuration tab.
4. Give the requested details. You need to give the program ID as well, and it should be the same as the one given earlier in the BW workbench.
5. All the details to be filled in here can be obtained with the help of the BW team.
6. Then go to the Status tab.
7. Then click on Start.
Now go to the BW workbench and test the connection. It shows a success or failed status.
Now you can create a data store in BODS and import tables from the SAP system as well.
Let us see the screenshots of this connection process.
Connecting to Relational Systems:
We can connect to relational systems directly using the database credentials.
Creating a Data Store:
A data store is the connection we establish to a database. We can create it through the local object library pane. In the local object library pane:
1. Right-click on vacant space and then click on New.
2. Then give the name of the data store, the type of data store (database, SAP, web services, etc.), the version of the database in your system, the connection name and the login credentials of the database schema.
Importing Tables through a Data Store:
For importing tables:
1. Right-click on the data store name and click on either Import By Name or Import All, and then give the particular schema's credentials.
2. Then we can see the imported tables in the local object library.
We can also use flat files, XML files, DTDs, etc. as sources and as targets. Excel can be used only as a source; we can't use it as a target. For using these files as a source:
1. Click on the Formats tab at the bottom of the local object library.
2. Then right-click -> New -> select the file type -> give the format details like delimiters, etc.
Then close the dialogue box.
Template Tables:
These are tables used when we don't know what the processed data will look like. We can attach them to an ETL transform on the fly and use them as targets.
Now Let Us Create and Execute a Job:
We execute all the ETL mappings only through a job; it is the only executable object of BODS.
For creating any job in BODS, we first need to have a project created. For creating a job:
1. Go to File -> New -> Project.
2. Then give the name of the project.
3. Then you will be able to see your created project in both the project and local object library panes.
4. If you delete a project, you should remove it from the local object library as well, because whenever you create an object, it is automatically stored in the local repository. So, if you want to delete that project, you should delete it from the local repository also.
5. Now, for creating a job, right-click on the project -> Create New Job.
6. Select a batch or real-time job.
7. Now you can see your created job in the project and local object library panes.
8. Now you can create a work flow (optional). For creating a work flow:
i. Click on the work flow icon, which you can find at the right end of the Designer.
ii. Place it in the mapping pane.
9. Now you can create the data flows. For creating a data flow:
i. Click on the data flow icon, which you can find at the right end of the Designer.
ii. Place it in the mapping pane.
10. Now you can develop your ETL mapping. For that:
i. Click on the source table which you have imported through the data store.
ii. Drag the table and drop it in the Designer pane.
iii. Select the option to make it the source.
iv. Click on the required transform(s) in the local object library.
v. Drag it and drop it in the Designer pane.
vi. Connect the source table and the transform.
vii. Now edit the transform as per the business requirement.
viii. Click on the target table which you have imported through the data store.
ix. Drag the table and drop it in the Designer pane.
x. Connect the transform and the target table.
xi. Now we have successfully created the job.
xii. Click on the validate icon at the top of the Designer, or click on the Validate option which you can find in the Validation menu.
xiii. Now right-click on the job in the project pane and click on Execute.
xiv. Click OK -> OK.
xv. Now you can watch the statistics of the job by clicking on the Monitor tab, and you can see the errors (if any) in the Error tab as well.
xvi. Now you can check the target table for the loaded data.
Exporting Objects to Files:
We can export created objects like jobs to a file as well. The format of the file is .atl. For exporting an object to a file:
1. Right-click on the object in the local object library.
2. Click on Export. Now you can find the export pane in the Designer area.
3. Drag the object into the export pane.
4. Right-click on it and click Export again.
5. Then give the location at which the .atl file has to be saved.
6. Click OK.
Connecting to the Central Repository:
We can export objects to the central repository from the Designer only; we can't log in to the central repository directly. For that, we need to connect to the central repository. The steps to connect to the central repository are as follows:
1. Go to Tools -> Central Repository -> give the requested details -> click OK.
Exporting Objects to the Central Repository:
For exporting objects to the central repository:
1. Right-click on the object to be exported, then click on Add to Central Repository.
2. Select the repository, then click OK.
Scheduling Jobs:
We can also schedule, monitor and execute jobs from the Data Services Management Console.
For scheduling jobs:
1. Open the Data Services Management Console from Start -> Programs.
2. Log in using admin and admin as the user name and password respectively.
3. Then go to Batch Jobs, which you can find on your left side.
4. Select the job to be scheduled.
5. Click on Schedule.
6. Then a dialogue box appears.
7. Then fill in the date and time details for scheduling.
8. Check the "Active" check box.
9. Click Execute.
Creating Users and Groups for a Particular Repository:
We can also create users and groups from the Data Services Management Console. For that:
1. Click on Repositories.
2. Click on Add Groups or Users.
3. Click Add.
4. Give the user name and password.
So far we have covered creating repositories and data stores, importing tables and files, creating jobs, connecting to the central repository, exporting objects to files and to the central repository, scheduling jobs, and creating users and groups for a particular repository.
Now let us see the different kinds of transforms provided by BODS for accomplishing our business needs. There are three kinds of transforms in BODS. They are:
1. Data Integrator Transforms
2. Platform Transforms
3. Data Quality Transforms
The Data Integrator transforms include:
1. Data Transfer transform
2. Date Generation transform
3. Effective Date transform
4. Hierarchy Flattening transform
5. Table Comparison transform
6. Key Generation transform
7. History Preserving transform
8. Pivot transform
9. Reverse Pivot transform
10. XML Pipeline transform
1. Data Transfer transform:
• This is used to transfer the data from the source to the target system.
• It is mainly used when millions of records are being processed.
• It splits the data flow into different sub data flows and executes them serially.
• It is mainly used when there are multiple threads, so performance increases.
2. Date Generation transform:
• It produces dates serially, from a start date to an end date specified by us.
• The increment may be daily, weekly or monthly, or we can assign it manually using global variables.
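As a rough SQL illustration of what a daily Date Generation output looks like (the start and end dates here are hypothetical, and recursive CTE syntax varies by database):
    -- One row per day between the two hypothetical boundary dates
    WITH RECURSIVE dates (gen_date) AS (
        SELECT DATE '2024-01-01'
        UNION ALL
        SELECT gen_date + INTERVAL '1' DAY
        FROM dates
        WHERE gen_date < DATE '2024-12-31'
    )
    SELECT gen_date FROM dates;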
3. Effective Date transform:
• This transform calculates the "effective to" date for records that contain an "effective date" field of date type.
• The "effective to" column is added to the target along with the source schema, and it gives the date until which each record is valid.
• The "effective date" and "effective to" columns are like "valid from" and "valid to" respectively.
• So we can identify the validity range of each record.
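A SQL sketch of the same idea, assuming a hypothetical policy table keyed by policy_id with an effective_date column; the next record's effective date becomes this record's "effective to" date (the latest record per key has no next record, so effective_to comes out as NULL in this sketch):
    SELECT policy_id,
           effective_date,
           LEAD(effective_date) OVER (
               PARTITION BY policy_id
               ORDER BY effective_date
           ) AS effective_to            -- valid until the next version starts
    FROM policy;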
4. Hierarchy Flattening transform:
• This transform builds the complete hierarchy in the target table for the source data set, describing the hierarchy based on the given parent and child columns.
• It describes the hierarchy in either horizontal or vertical format, based on our request.
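A SQL sketch of vertical flattening, assuming a hypothetical emp table with child_id and parent_id columns; every ancestor/descendant pair is written out together with its depth (recursive CTE syntax varies by database):
    WITH RECURSIVE flat (ancestor_id, descendant_id, depth) AS (
        SELECT parent_id, child_id, 1       -- direct parent-child links
        FROM emp
        WHERE parent_id IS NOT NULL
        UNION ALL
        SELECT f.ancestor_id, e.child_id, f.depth + 1   -- walk one level down
        FROM flat f
        JOIN emp e ON e.parent_id = f.descendant_id
    )
    SELECT ancestor_id, descendant_id, depth FROM flat;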
5. Table Comparison transform:
• It is used for identifying data manipulation operations in the source.
• It identifies changes in the source and transfers the data to the target with rows flagged as insert, update or delete as per the change.
• We can select which columns' changes have to be identified by dragging those columns into the Compare Columns tab in the transform editor.
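A simplified SQL view of the comparison, assuming hypothetical src (incoming data) and tgt (comparison) tables keyed by cust_id with one compare column, city; the CASE expression plays the role of the operation code the transform attaches to each row:
    SELECT COALESCE(s.cust_id, t.cust_id) AS cust_id,
           s.city,
           CASE
               WHEN t.cust_id IS NULL THEN 'INSERT'     -- new in the source
               WHEN s.cust_id IS NULL THEN 'DELETE'     -- missing from the source
               WHEN s.city <> t.city  THEN 'UPDATE'     -- compare column changed
               ELSE 'NO CHANGE'
           END AS opcode
    FROM src s
    FULL OUTER JOIN tgt t ON s.cust_id = t.cust_id;
In the actual transform the rows carry operation codes rather than a literal column, and delete detection is an optional setting.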
6. Key Generation transform:
• It is used to generate keys for the newly inserted rows.
• This transform identifies the maximum existing key value in the specified key source table and generates new keys by taking that maximum value as the starting value.
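A SQL sketch of the key generation logic, assuming a hypothetical dim_customer table whose surrogate key column is cust_key and a hypothetical stage_customer input set of new rows:
    -- Continue numbering from the maximum key already present
    SELECT s.*,
           (SELECT COALESCE(MAX(cust_key), 0) FROM dim_customer)
             + ROW_NUMBER() OVER (ORDER BY s.cust_id) AS cust_key
    FROM stage_customer s;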
7. History Preserving transform:
• As the name indicates, it is used to preserve the history of all the records.
• When a record is updated in the source, instead of updating the record in the target, this transform inserts a new row with the updated values while keeping the previous record.
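A simplified SQL sketch of the idea, using a hypothetical dim_customer table with valid_from, valid_to and current_flag columns and a single changed customer (the column names and values are assumptions for illustration):
    -- Close the existing version of the record instead of overwriting it
    UPDATE dim_customer
    SET valid_to = CURRENT_DATE,
        current_flag = 'N'
    WHERE cust_id = 101
      AND current_flag = 'Y';

    -- Insert the updated values as a new, current row
    INSERT INTO dim_customer (cust_id, city, valid_from, valid_to, current_flag)
    VALUES (101, 'Hyderabad', CURRENT_DATE, DATE '9999-12-31', 'Y');
In a data flow this transform is typically placed after a Table Comparison transform, which supplies the flagged rows.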
8. Pivot transform:
• It is used for converting the selected columns to rows.
• That is, for each value in the selected columns, the Pivot transform inserts a new row in the target table.
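A SQL sketch of the pivoting, assuming a hypothetical sales_wide table with one column per quarter (q1 to q4); each selected column becomes its own output row:
    SELECT region, 'Q1' AS quarter, q1 AS sales FROM sales_wide
    UNION ALL
    SELECT region, 'Q2', q2 FROM sales_wide
    UNION ALL
    SELECT region, 'Q3', q3 FROM sales_wide
    UNION ALL
    SELECT region, 'Q4', q4 FROM sales_wide;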
9. Reverse Pivot transform:
• It is used for converting rows to columns.
• It combines data from multiple rows into a single row.
• It creates new columns to perform the above task.
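A SQL sketch of the reverse pivot, assuming a hypothetical sales_long table with region, quarter and sales columns; rows for the same region are combined into one row with a new column per quarter:
    SELECT region,
           MAX(CASE WHEN quarter = 'Q1' THEN sales END) AS q1,
           MAX(CASE WHEN quarter = 'Q2' THEN sales END) AS q2,
           MAX(CASE WHEN quarter = 'Q3' THEN sales END) AS q3,
           MAX(CASE WHEN quarter = 'Q4' THEN sales END) AS q4
    FROM sales_long
    GROUP BY region;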
10. XML Pipeline transform:
• It processes large XML files. Instead of loading all the instances into memory at a time, it takes one instance at a time, processes it and releases the memory.
• The next instance is then loaded into the same memory location, so memory consumption is reduced.
The Platform transforms include:
1. Case Transform
2. Map Operation Transform
3. Merge Transform
4. Query Transform
5. Row Generation Transform
6. SQL Transform
7. Validation Transform
1. Case Transform:
• It is used to split the input rows based on given conditions and route them to different targets respectively.
• Rows that do not satisfy any of the given conditions go to the specified target labeled 'default'.
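A SQL sketch of the routing, assuming a hypothetical orders table with a region column and three hypothetical targets; each condition feeds one target and the unmatched rows go to the default target:
    INSERT INTO target_north
    SELECT * FROM orders WHERE region = 'NORTH';

    INSERT INTO target_south
    SELECT * FROM orders WHERE region = 'SOUTH';

    -- 'default' target: rows that match none of the conditions
    INSERT INTO target_default
    SELECT * FROM orders
    WHERE region NOT IN ('NORTH', 'SOUTH') OR region IS NULL;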
2. Map Operation Transform:
• This transform is used to alter the data manipulation operation codes.
• It takes rows carrying data manipulation operation codes as input and changes them to a specified operation code, like update to insert, update to update, etc.
3. Merge Transform:
• This transform acts like UNION ALL in SQL.
• It combines all the input data sets, which may come from different sources, and sends them to the target.
• It allows duplicates.
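Since the transform behaves like UNION ALL, a minimal SQL equivalent would be the following (the two customer tables are hypothetical and must have identical schemas):
    SELECT cust_id, cust_name FROM customers_india
    UNION ALL                          -- duplicates are kept, unlike UNION
    SELECT cust_id, cust_name FROM customers_usa;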
4. Query Transform:
It is a very important transform which is frequently used in jobs. It is used for:
• Mapping columns from the input schema to the output schema
• Assigning primary keys to the output schema
• Adding new columns in the output schema
• Performing join operations
• Applying GROUP BY and ORDER BY clauses
• Removing duplicate rows
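As an illustration, a single Query transform can express roughly what this SQL statement does (all table and column names here are hypothetical):
    SELECT c.cust_id,                          -- column mapping
           UPPER(c.cust_name) AS cust_name,    -- new / derived column
           SUM(o.amount)      AS total_amount  -- aggregation
    FROM customers c
    JOIN orders o ON o.cust_id = c.cust_id     -- join operation
    WHERE o.order_date >= DATE '2024-01-01'    -- filter condition
    GROUP BY c.cust_id, UPPER(c.cust_name)     -- group by
    ORDER BY total_amount DESC;                -- order by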
5. Row Generation Transform:
• It produces a data set with a single column. The column values start with the number that you set in the "Row number starts at" option and then increment by one up to the specified number of rows.
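A rough SQL equivalent of a Row Generation output of 10 rows starting at 1 (recursive CTE syntax varies by database):
    WITH RECURSIVE rows_gen (row_num) AS (
        SELECT 1
        UNION ALL
        SELECT row_num + 1 FROM rows_gen WHERE row_num < 10
    )
    SELECT row_num FROM rows_gen;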
6. SQL Transform:
• This transform does not take any input.
• It is useful when none of the built-in transforms can perform the required operation.
• We write the SQL query manually in this transform.
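For example, the kind of statement we might type into a SQL transform (the orders table and the aggregation are hypothetical; the transform's output columns have to match what the query returns):
    SELECT region,
           COUNT(*) AS order_count
    FROM orders
    WHERE status = 'OPEN'
    GROUP BY region;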
7. Validation Transform:
• This is used to validate the input rows based on specified criteria.
• The rows that satisfy the criteria are routed to the target labeled "Pass".
• The rows that do not satisfy the criteria are routed to the target labeled "Fail".
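A SQL sketch of the Pass/Fail routing for a single rule, assuming a hypothetical customers table where the phone column must be a non-null 10-character value:
    -- Rows routed to the "Pass" target
    SELECT * FROM customers
    WHERE phone IS NOT NULL AND LENGTH(phone) = 10;

    -- Rows routed to the "Fail" target
    SELECT * FROM customers
    WHERE phone IS NULL OR LENGTH(phone) <> 10;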
The Data Quality transforms are used for parsing, standardizing and correcting the data. There are different transforms for address cleansing, matching, country identification, geocoding, etc.