Monday, 28 January 2013



BUSINESS OBJECTS DATA SERVICES (BODS)

Architecture:


BODS has a three-tier architecture. The client tier consists of user interfaces such as the Data Services Designer and the Management Console, the application tier consists of the Job Server, and finally we have the local and central repositories, where all the objects of the system are stored. Now let us go through each component individually.
Data Services Designer:

It is the place where we develop, validate and execute ETL jobs.
We can connect to the central repository only through the Designer.
We can also import and export jobs to a file or repository from the Designer.
Local Repository: 

->It contains all the user-defined objects like projects, jobs, workflows, data flows, etc.
->It also contains the system-defined objects, like the different transforms that are used to manipulate the source data as per the business rules.
->The connection to the database server is also created in the local repository. We call it a data store.

->We can create a repository using the Repository Manager that comes with the installation.
->The steps to create a repository are:
 Start -> Programs -> BI Data Services -> Repository Manager -> select the type of repository (local, central or profiler) -> repository name -> connection name -> database credentials -> Create.

Central Repository:

1. It is similar to the local repository: it also stores both user-defined and pre-defined objects. It is used to serve a multi-user environment; if there is no multi-user environment, its use is optional. In a multi-user environment, when one user has checked out an object, it cannot be modified by any other user. This check-in and check-out mechanism is similar to locking. We can create a central repository using the Repository Manager that comes with the installation.
2. The steps to create it are the same as for the local repository:
 Start -> Programs -> BI Data Services -> Repository Manager -> select the type of repository (local, central or profiler) -> repository name -> connection name -> database credentials -> Create.

Job Server:

The Job Server holds the job launch files (.bat files that contain the jobs) and its work is to start the job engine, which executes the jobs.
Job Engine:

It executes the requested jobs.

Access Server:

This is used for DI Real-time service jobs. This Server controls the XML message passing between the source and target nodes in Real-Time jobs.
Web Server:

This is used for administration activities. Scheduling jobs, executing jobs, creating users, etc. are the tasks that can be done through this server.

So these are all the components that exist in the BODS architecture. When we need to create an ETL job, we request the source data through the local repository, which in turn requests it from the connected database. Then we pick the required transforms and create the job in the Designer. We save it in the local repository (and maybe in the central repository too, as per requirement). Then we execute it through the job server and job engine. This is how the flow goes.
The source or target can be an SAP or non-SAP system.

Object Hierarchy:

Now let us discuss the object hierarchy. It consists of different objects like projects, jobs, workflows, scripts, data flows, sources, targets and transforms.

Project: 
It is the highest level of the hierarchy, under which we create jobs.
Jobs: 

Job is the executable object of BODS.
Work Flow:

It is an optional object in which we group different data flows and scripts.
Script: 

Any required logic that is not available through the built-in BODS objects can be developed by writing scripts.
Data Flow:

This is the most important object of the hierarchy; it is where we create our mappings.
Transform:

These are the pre-built objects used for developing the required business rules.
This is all about the object hierarchy of BODS.
Now let us start from creating data stores.
Connecting BODS to a SAP system: 
This process starts from the SAP BW Workbench. The steps are:
1. Open the BW Workbench
2. Go to Modeling
3. Go to Source Systems
4. Right click on External Systems -> click on New Source
5. A dialogue box appears
6. Give a name to the source system
7. RFC Destination -> DI_SOURCE -> Program ID (case sensitive) -> Save
Now go to data services management console and follow the steps as below
1. expand sap connections
2. RFC server interface item
3. Configuration tab
4. Give the requested details. You need to give the Program ID as well, and it should be the same as the one given earlier in the BW Workbench.
5. All the details to be filled in here can be obtained with the help of the BW team.
6. Then go to status tab
7. Then click on start
Now go back to the BW Workbench and test the connection. It shows a success or failure status.
Now you can create a data store in BODS and import the tables from the SAP system as well.
Let us see the screenshots of this connection process.

Connecting to Relational Systems:
We can connect to relational systems directly using the credentials of the database.
Creating Data Store:
A data store is the connection we establish to the database. We can create it through the local object library pane. In the local object library pane,
1. Right click on vacant space, then click on New.
2. Then give the name of the data store, the type of data store (database, SAP, web services, etc.), the version of the database in your system, the connection name and the login credentials of the database schema.
Importing Tables through Data Store:
For importing tables,
1. We just need to right click on the data store name and click on either Import By Name or Import All, and then give the particular schema’s credentials.
2. Then we can see the imported tables in the local object library.
We can also use flat files, XML files, DTDs, etc. as sources and as targets. Excel files can be used only as a source; we can’t use them as a target. For using these files as a source,
1. Click on the Formats tab at the bottom of the local object library.
2. Then right click -> New -> select the file type -> give the format details like delimiters, etc.
Then close the dialogue box.
Template Tables:

These are tables used when we do not know in advance what the processed data will look like. We can attach these tables to an ETL transform on the fly and use them as targets.
Now Let Us Move On to Creating and Executing a Job:
We execute all ETL mappings only through a job. It is the only executable object of BODS.
For creating any job in BODS, we first need to have a project created. For creating a job,
1. Go to file->new->project
2. Then give the name of the project
3. Then you will be able to see your created project in both project and local object library panes.
4. If you delete a project from the project pane, you should remove it from the local object library as well, because whenever you create an object it is automatically stored in the local repository; so if you want to delete that project completely, you should delete it from the local repository too.
5. Now for creating job, right click on project-> create new job
6. Select Batch or real time job.
7. Now, you can see your created job in project and local object library panes.
8. Now you can create workflow (optional). For creating work flow,
    i. Click on the work flow icon, which you can find at the right end of the designer.
   ii. Place it in the mapping pane.
9. Now you can create the data flows. For creating data flow,
    i. Click on the data flow icon, which you can find at the right end of the designer.
   ii. Place it in the mapping pane.
10. Now, you can develop your ETL mapping. For that,
    i. Click on the source table which you imported through the data store.
   ii. Drag the table and drop it in the designer pane.
  iii. Choose the option to make it the source.
  iv. Click on the required transform(s) from local object library
   v. Drag it and drop in the designer pane.
  vi. Connect the source table and the transform.
 vii. Now edit the transform as per business requirement.
viii. Click on the target table which you imported through the data store.
  ix. Drag the table and drop it in the designer pane.
   x. Connect transform and target table.
  xi. Now we have successfully created the job.
 xii. Click on the validate icon on top of the designer or click on validate option which you can find in the validation menu.
xiii. Now, right click on job in the project pane and click on execute.
xiv. Click ok->ok.
 xv. Now you can watch the statistics of the job by clicking on monitor tab and you can see the errors (if any) in the error tab as well.
xvi. Now you can see the target table for loaded data.

Exporting Objects to Files:

We can export created objects, like jobs, to a file as well. The format of the file is .atl. For exporting an object to a file,
1. Right click in the local object library.
2. Click on Export. Now you can find the export pane in the designer area.
3. Drag the object to the export pane.
4. Right click on it and click Export again.
5. Then give the location at which the .atl file has to be created.
6. Click ok.
Connecting to central repository:

We can export objects to the central repository from the Designer only; we cannot log in to the central repository directly. For that, we first need to connect to the central repository. The steps to connect to the central repository are as follows.
1. Go to tools-> central repository -> give the requested details->click ok.
Exporting objects to central repository:

For exporting objects to central repository,
1. Right click on the object to be exported, then click on add to central repository.
2. Select the repository then ok.
Scheduling Jobs

We can also schedule, monitor and execute jobs from the Data Services Management Console.
For scheduling jobs,
1. Open data services management console from start->programs
2. Login using admin and admin as user name and password respectively.
3. Then go to batch job which you can find on your left side.
4. Select a job to be scheduled
5. Click on schedule
6. Then a dialogue box appears
7. Then fill the date and time details for scheduling.
8. Check the “active” check box.
9. Click execute.
Creating users and groups for particular repository:

We can also create users and groups from the data services management console. For that,
1. Click on repositories
2. Click on add groups or users
3. Click add
4. Give the user name and password.
Now we have covered creating repositories and data stores, importing tables and files, creating jobs, connecting to the central repository, exporting objects to files and the central repository, scheduling jobs, and creating users and groups for a particular repository.

Now let us see the different kinds of transforms provided by BODS for accomplishing our business needs. There are three kinds of transforms in BODS.
They are
1. Data Integrator Transforms
2. Platform Transforms
3. Data Quality Transforms.

Data Integrator transforms include
1. Data Transfer transform
2. Date Generation transform
3. Effective Date transform
4. Hierarchy Flattening transform
5. Table Comparison transform
6. Key Generation transform
7. History Preservation transform
8. Pivot transform
9. Reverse Pivot Transform
10. XML Pipeline Transform


1. Data Transfer transform:
 
      This is used to transfer data from the source to the target system through an intermediate staging object.
      It is mainly used when millions of records are being processed.
      It splits the data flow into different sub data flows and executes them serially, so that the heavy work is broken into smaller units and performance increases.
2. Date Generation transform:
      It produces dates serially from a start date to an end date specified by us.
      The increment may be daily, weekly or monthly, or the dates can be assigned using global variables.
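      To get a feel for what this transform produces, here is a rough Python sketch of the same idea (this is not Data Services code, and the function name is just an example):

from datetime import date, timedelta

def generate_dates(start, end, step_days=1):
    # Yield one date per output row, from start to end, like Date Generation.
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)

# Daily series for one week; a weekly series would use step_days=7.
for d in generate_dates(date(2013, 1, 1), date(2013, 1, 7)):
    print(d)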
3. Effective Date Transform:
      This transform calculates an “effective to” date for records that contain an “Effective Date” field of date type.
      The “effective to” column is added to the target along with the source schema, and it gives the date until which each record is valid.
      The effective date and effective to columns are like valid from and valid to respectively.
      So we can identify the validity range of each record.
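      A minimal Python sketch of the idea, using assumed field names and a far-future date for the latest record (not the transform’s exact behaviour in every detail):

from datetime import date

def add_effective_to(records, end_of_time=date(9999, 12, 31)):
    # Set each record's effective_to to the next record's effective_date.
    # (A real job would first group the rows by their business key.)
    ordered = sorted(records, key=lambda r: r["effective_date"])
    for current, nxt in zip(ordered, ordered[1:] + [None]):
        current["effective_to"] = nxt["effective_date"] if nxt else end_of_time
    return ordered

rows = [
    {"key": 1, "rate": 10, "effective_date": date(2012, 1, 1)},
    {"key": 1, "rate": 12, "effective_date": date(2012, 6, 1)},
]
for r in add_effective_to(rows):
    print(r)   # each row is valid from effective_date up to effective_to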
4. Hierarchy Flattening Transform:
      This transform builds the complete hierarchy in the target table for the source data set, based on the given parent and child columns.
      It describes the hierarchy in either horizontal or vertical format, based on our request.
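      As a rough illustration, the Python sketch below flattens a parent-child table into ancestor/descendant rows (the vertical style); the column layout is an assumption for illustration, not the exact output of the transform:

def flatten_vertical(edges):
    # edges: list of (parent, child) pairs. Returns (ancestor, descendant, depth) rows.
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    rows = []
    def walk(ancestor, node, depth):
        for child in children.get(node, []):
            rows.append((ancestor, child, depth))
            walk(ancestor, child, depth + 1)

    for node in children:
        walk(node, node, 1)
    return rows

edges = [("Company", "Sales"), ("Sales", "EMEA"), ("Sales", "APAC")]
for row in flatten_vertical(edges):
    print(row)   # e.g. ('Company', 'EMEA', 2)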
5. Table Comparison transform:
      It is used for identifying the data manipulation operations needed on the target with respect to the source.
      It detects changes in the source and transfers the data to the target with the rows flagged as insert, update or delete, as per the change.
      We can select which columns’ changes have to be detected by dragging those columns into the compare columns tab of the transform editor.
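      Conceptually the comparison works like the Python sketch below (a single-column key and one compared column are assumed; detecting deleted rows is an optional setting in the real transform):

def compare_tables(source_rows, target_rows, key="id", compare_cols=("name",)):
    # Flag every change needed to make the target match the source.
    target_by_key = {row[key]: row for row in target_rows}
    source_keys = {row[key] for row in source_rows}

    changes = []
    for row in source_rows:
        existing = target_by_key.get(row[key])
        if existing is None:
            changes.append(("insert", row))
        elif any(existing[c] != row[c] for c in compare_cols):
            changes.append(("update", row))
    for key_value, row in target_by_key.items():
        if key_value not in source_keys:
            changes.append(("delete", row))
    return changes

source = [{"id": 1, "name": "A"}, {"id": 2, "name": "B2"}]
target = [{"id": 2, "name": "B"}, {"id": 3, "name": "C"}]
print(compare_tables(source, target))
# -> insert id=1, update id=2, delete id=3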
6. Key Generation Transform:
      It is used to generate surrogate keys for the newly inserted rows.
      This transform identifies the maximum existing value of the key column and generates new keys sequentially, starting after that maximum value.
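      The idea is simple enough to show in a few lines of Python (the column name surrogate_key is only an example):

def generate_keys(new_rows, existing_max_key, key_col="surrogate_key"):
    # Assign sequential keys to new rows, starting after the current maximum.
    next_key = existing_max_key + 1
    for row in new_rows:
        row[key_col] = next_key
        next_key += 1
    return new_rows

# If the highest existing key is 105, the new rows get 106, 107, ...
print(generate_keys([{"name": "A"}, {"name": "B"}], existing_max_key=105))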
7. History Preserving Transform:
      As the name indicates, it is used to preserve the history of the records.
      When we update a record in the source, instead of updating the record in the target, this transform inserts a new row with the updated values while keeping the previous record.
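      This is essentially slowly-changing-dimension behaviour; the Python sketch below shows the idea with an assumed current_flag column (the real transform offers valid-from/valid-to dates and a current flag as options):

def preserve_history(target_rows, updated_row, key="id"):
    # Instead of overwriting, close out the old version and insert the new one.
    for row in target_rows:
        if row[key] == updated_row[key] and row.get("current_flag") == "Y":
            row["current_flag"] = "N"          # keep the old version, mark it inactive
    target_rows.append({**updated_row, "current_flag": "Y"})
    return target_rows

target = [{"id": 1, "city": "Pune", "current_flag": "Y"}]
print(preserve_history(target, {"id": 1, "city": "Mumbai"}))
# the old Pune row is kept with current_flag='N'; the new Mumbai row is current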

8. Pivot Transform:
      It is used for converting the selected columns to rows,
      i.e., for each value in a selected column, the Pivot transform inserts a new row in the target table.
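      A small Python sketch of the column-to-row conversion (the header/value output columns are illustrative):

def pivot(rows, pivot_cols, id_col="id"):
    # Turn each selected column of a row into its own output row.
    output = []
    for row in rows:
        for col in pivot_cols:
            output.append({id_col: row[id_col], "header": col, "value": row[col]})
    return output

rows = [{"id": 1, "q1_sales": 100, "q2_sales": 150}]
for out in pivot(rows, ["q1_sales", "q2_sales"]):
    print(out)
# {'id': 1, 'header': 'q1_sales', 'value': 100}
# {'id': 1, 'header': 'q2_sales', 'value': 150}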
9. Reverse Pivot Transform:
      It is used for converting the selected rows to columns.
      It combines data from multiple rows into a single row.
      It creates new columns to hold the combined values.
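      And the inverse operation, again only as an illustrative Python sketch:

def reverse_pivot(rows, id_col="id"):
    # Combine multiple (header, value) rows per id back into one wide row.
    combined = {}
    for row in rows:
        wide = combined.setdefault(row[id_col], {id_col: row[id_col]})
        wide[row["header"]] = row["value"]   # each distinct header becomes a column
    return list(combined.values())

narrow = [
    {"id": 1, "header": "q1_sales", "value": 100},
    {"id": 1, "header": "q2_sales", "value": 150},
]
print(reverse_pivot(narrow))   # [{'id': 1, 'q1_sales': 100, 'q2_sales': 150}]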
10. XML Pipeline Transform:
      It is used to process large XML files. Instead of loading all the instances into memory at once, it takes one instance at a time, processes it and releases the memory.
      The next instance is then loaded into the same memory location, so that memory consumption is reduced.
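      The streaming idea can be illustrated with Python's standard xml.etree.ElementTree.iterparse, which also reads one element at a time and frees it instead of loading the whole document (the tag name 'order' and the downstream call are assumed examples, not part of Data Services):

import xml.etree.ElementTree as ET

def stream_instances(xml_path, tag="order"):
    # Process one repeating instance at a time and release its memory.
    for event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()   # free the processed instance before reading the next one

# for row in stream_instances("orders.xml"):
#     load_into_target(row)   # hypothetical downstream step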
Platform transforms include
1. Case Transform
2. Map Operation Transform
3. Merge Transform
4. Query Transform
5. Row Generation Transform
6. SQL Transform
7. Validation Transform

 1. Case Transform:
      It is used to split the input rows based on given conditions and route them to different locations respectively.
      Rows that do not satisfy any of the given conditions go to the specified location labeled as ‘default’.
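      A conceptual Python sketch of this routing (the labels and conditions are made-up examples; the sketch simply sends each row to its first matching label):

def case_transform(rows, cases, default_label="default"):
    # Route each row to the first matching label; unmatched rows go to 'default'.
    outputs = {label: [] for label, _ in cases}
    outputs[default_label] = []
    for row in rows:
        for label, condition in cases:
            if condition(row):
                outputs[label].append(row)
                break
        else:
            outputs[default_label].append(row)
    return outputs

rows = [{"country": "IN"}, {"country": "US"}, {"country": "FR"}]
cases = [("india", lambda r: r["country"] == "IN"),
         ("usa", lambda r: r["country"] == "US")]
print(case_transform(rows, cases))   # the FR row lands in 'default'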
2. Map Operation Transform:
      This transform is used to alter the data manipulation operation codes.
      It takes rows with their operation codes as input and changes them to the specified operation code, like update to insert, update to update, etc.
3. Merge Transform:
      This transform acts like UNION ALL in SQL.
      It combines all the input data sets, which may come from different sources, and sends them to the target.
      It allows duplicates.
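      In Python terms it is just a concatenation of input sets with the same structure, duplicates included:

def merge(*input_sets):
    # Concatenate all input data sets without removing duplicates (UNION ALL).
    merged = []
    for data_set in input_sets:
        merged.extend(data_set)
    return merged

print(merge([{"id": 1}], [{"id": 1}, {"id": 2}]))   # the duplicate id=1 row is kept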
4. Query Transform:
      It is a very important transform that is frequently used in jobs. It is used for the following operations (a conceptual Python sketch follows this list):
      Mapping columns from the input schema to the output schema
      Assigning primary keys to the output schema
      Adding new columns to the output schema
      Adding function calls
      Performing join operations
      Applying group by and order by clauses
      Removing duplicate rows
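      Here is that sketch, in plain Python with made-up sample data; it is only a conceptual equivalent of what we configure in the Query transform editor:

from itertools import groupby
from operator import itemgetter

orders = [{"cust_id": 1, "amount": 100}, {"cust_id": 1, "amount": 50},
          {"cust_id": 2, "amount": 70}, {"cust_id": 2, "amount": 70}]
customers = {1: "Alice", 2: "Bob"}

# Column mapping plus a join: look up the customer name for every order row
joined = [{"customer": customers[o["cust_id"]], "amount": o["amount"]} for o in orders]

# Remove duplicate rows (the second Bob/70 row disappears)
distinct = [dict(t) for t in {tuple(sorted(r.items())) for r in joined}]

# Group by customer, aggregate the amounts, and order by customer name
distinct.sort(key=itemgetter("customer"))
summary = [{"customer": name, "total": sum(r["amount"] for r in group)}
           for name, group in ((k, list(g)) for k, g in groupby(distinct, key=itemgetter("customer")))]
print(summary)   # [{'customer': 'Alice', 'total': 150}, {'customer': 'Bob', 'total': 70}]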

5. Row Generation Transform:
      It produces a data set with a single column. The column values start with the number that you set in the “Row number starts at” option and then increment by one up to the specified number of rows.
6. SQL Transform:
      This transform does not take any input.
      It is useful when none of the built-in transforms can perform the required operation.
      We write the SQL query manually in this transform.
7. Validation Transform:
      This is used to validate the input rows based on specified criteria.
      The rows that satisfy the criteria are routed to the target labeled as “pass”.
      The rows that do not satisfy the criteria are routed to the target labeled as “fail”.
Data Quality transforms are used for parsing, standardizing and correcting the data. We have different transforms like address cleansing, match, country code, geocode, etc.