ETL Tools



The era of growing data dependence forced many companies to invest in complicated data warehousing systems. The differences and incompatibilities between those systems led to uncontrolled growth in the cost and time needed to coordinate all the processes. ETL (Extract, Transform, Load) tools were created to simplify data management while reducing the effort involved.

Depending on customers' needs, there are several types of tools.
Some of them perform and supervise only selected stages of the ETL process, such as data migration tools (EtL tools, "small t" tools) and data transformation tools (eTl tools, "capital T" tools). Others are complete ETL tools and offer many functions intended for processing large amounts of data or more complicated ETL projects.

Some, like server engine tools, execute many ETL steps at the same time from more than one developer, while others, like client engine tools, are simpler and execute ETL routines on the same machine on which they are developed.
There are two more types. The first, called code-based tools, is a family of programming tools that work with many operating systems and programming languages. The second, called GUI-based tools, removes the coding layer and lets you work without any knowledge (in theory) of coding languages.



How do the ETL tools work?

The first task is data extraction from internal or external sources. After queries are sent to the source system, the data may go directly to the target database. Usually, however, there is a need to monitor or gather more information first, in which case the data goes to a staging area. Some tools extract only new or changed information automatically, so we do not have to update it on our own.
The second task is transformation, which is a broad category:
-transforming data into the structure required for further processing (extracted data usually has a structure typical of the source)
-sorting data
-joining or splitting data
-cleansing
-checking quality

The third task is loading the data into a data warehouse.

As you can see, ETL tools have many other capabilities (beyond the main three of extraction, transformation, and loading), such as sorting, filtering, data profiling, quality control, cleansing, monitoring, synchronization, and consolidation.
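To make the three main steps concrete, here is a minimal, self-contained sketch of an extract-transform-load pipeline in Java. It is only an illustration of the idea, not any particular tool's API: the source rows, field layout, and the in-memory "warehouse" are all assumptions, and the transform stage shows a few of the capabilities mentioned above (cleansing, de-duplication, sorting).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class EtlSketch {
    // Extract: a real tool would query a source system; here an
    // in-memory list of raw CSV-like rows stands in (assumption).
    static List<String> extract() {
        return Arrays.asList("  alice ,30", "BOB,25", "  alice ,30", "carol,41");
    }

    // Transform: cleanse (trim, normalize case), de-duplicate, and sort.
    static List<String> transform(List<String> rows) {
        return rows.stream()
                .map(r -> {
                    String[] f = r.split(",");
                    return f[0].trim().toLowerCase() + "," + f[1].trim();
                })
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // Load: a real tool would write to warehouse tables; here a list stands in.
    static List<String> load(List<String> rows) {
        return new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        List<String> warehouse = load(transform(extract()));
        warehouse.forEach(System.out::println);
    }
}
```

Running the sketch prints the cleansed, de-duplicated, sorted rows (alice,30 / bob,25 / carol,41) - the same pipeline shape a full ETL tool applies at much larger scale.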





ETL PROCESS

The three-stage ETL process, and the ETL tools implementing the concept, were designed as an answer to these needs.

The abbreviation 'ETL' comes from 'Extract, transform, and load' – the words that describe the idea behind the system. ETL tools were created to improve and facilitate data warehousing.
    The ETL process consists of the following steps:
  1. Initiation
  2. Build reference data
  3. Extract from sources
  4. Validate
  5. Transform
  6. Load into staging tables
  7. Audit reports
  8. Publish
  9. Archive
  10. Clean up
Sometimes these steps are supervised and performed by hand, but that is very time-consuming and may be less accurate.
The purpose of using ETL tools is to save time and make the whole process more reliable.




Nowadays, most companies' existence depends on data flow. With plenty of information generally accessible, and almost everything one needs only a search away, managing has become easier than ever before. The Internet simplifies cooperation – the time needed to send and receive requested data gets shorter as more and more institutions computerize their resources. Communication between separate corporate departments has also become easier – no one needs to send paper letters (or even office boys), as the process has been replaced by e-mail. Although the new ways of communication have improved and facilitated management, ubiquitous computerization has significant disadvantages.

The variety of data – positive a phenomenon as it is – has gotten a little out of control. The unlimited growth of database sizes has caused a mess that often slows down (or even disables) the process of finding data.

It is all about effective information storage. Uncategorized data is spread across different platforms and systems. As a consequence, finding the data you want brings a lot of trouble – a user needs to know what data he administers, where it is located (and whether he has proper access), and finally how to get it out.
Whoever thought that the hardest task was making decisions based on data was wrong. No – finding the data itself is often much more frustrating. And users are not the only ones suffering from database overgrowth. The IT departments – usually responsible for keeping the systems working – have to struggle with data in different formats and systems. 'Keeping it alive' is extremely time-consuming, which delays the company's work.
Slow (or sometimes entirely omitted) transformation of data makes it usually impossible to provide the demanded information at the demanded time. The resulting divergence between the data provided and the data actually existing at the moment of need harms the IT department's image.

To achieve better results, companies invest in external systems – computing power and resources. Insufficient power causes a lack of data synchronization: transporting information between separate units takes too long for effective work. On the other hand, increasing computing power – one possible solution – is expensive and leads to overgrowth of operating costs.
Suppose an example company has managed to prepare a well-working database supporting a designated operation. A lot of money and time has been spent. Everything seems wonderful until another operation comes along. Suddenly it appears that the once-created system does not really fit the requirements of the new operation, and the best idea is to build a new system from scratch. Yes, modifications might be made, but there is no single developer common to all parts of the projects, so it demands the cooperation of at least a few parties – which practically kills the idea.

Some List of ETL Tools
----------------------------

Here is a list of the most popular commercial and freeware (open-source) ETL tools.

Commercial ETL tools:

  • IBM Infosphere DataStage
  • Informatica PowerCenter
  • Oracle Warehouse Builder (OWB)
  • Oracle Data Integrator (ODI)
  • SAS ETL Studio
  • Business Objects Data Integrator (BODI)
  • Microsoft SQL Server Integration Services (SSIS)
  • Ab Initio

Freeware, open-source ETL tools:

  • Pentaho Data Integration (Kettle)
  • Talend Integrator Suite
  • CloverETL
  • Jasper ETL



What are the different ways of initiating a transaction in CICS?

  • We can initiate the transaction through MQ, TCP/IP, a terminal, or the console.
  • We can initiate a CICS transaction by entering the transaction ID, by issuing the CICS START command, or through Automatic Task Initiation (ATI).

Other methods:

1. By entering the transaction ID (4 bytes)
2. By the START command
3. By RETURN TRANSID (as in pseudo-conversational programming)
4. By registering the transaction ID in the PLT so that it runs automatically during CICS startup
5. By ATI
6. By PF/PA keys as defined in the PCT

How is a CICS transaction initiated?

It involves the CICS control programs and control tables:
a. The trans-id is entered at a terminal.
b. TCP with the TCT recognizes the incoming data.
c. SCP acquires the storage for the Terminal Input-Output Area (TIOA).
d. KCP prepares control data for this task.
e. KCP, through the PCT, tries to find the application program associated with the trans-id.
f. If the PPT entry of the application program does not show the resident address of the program, KCP passes control to PCP, which fetches the application program from the load library and places it into main storage.
g. KCP passes control to the application program.
h. The application program starts its processing.

CEDF,CEMT,CEBR,CECI transactions in CICS

CEDF:
The CICS-supplied Execution Diagnostic Facility transaction. It provides interactive program execution and debugging functions for CICS programs.
CEMT: The CICS-supplied Extended Master Terminal transaction. It displays or manipulates the CICS control environment interactively.
CEBR: The CICS-supplied Temporary Storage Browse transaction. It displays the contents of a Temporary Storage Queue (TSQ).
CECI: The CICS-supplied Command Interpreter transaction. It verifies the syntax of a CICS command and executes the command.

CICS Interview Questions for TCS, Infosys, IBM, HP and Other Companies


  1. How do you place the cursor on a particular position on the screen? – Move -1 to the length attribute of the field and use the CURSOR option. Alternatively, define the field with IC in the BMS map, or use CURSOR(n, m).
  2. What are the two outputs created as a result of generating a map? – The map copybook and the load module.
  3. What is the difference between the physical map and the symbolic map? – The physical map is the load module, and the symbolic map is the data structure.
  4. How do you protect a field from being overlaid?
  5. What is the attribute byte? – It defines the display/transmission properties of a field. In most cases it is an output field from the program.
  6. How do you use extended attributes? – Define EXTATT=YES and the correct terminal type.
  7. What are the 3 working-storage fields used for every field on the map? – Length, attribute, and input/output field.
  8. What is MDT? What are FSET and FRSET? – MDT is the Modified Data Tag, a bit in the attribute byte indicating modification of a field on the screen; it is set on an input operation. FSET sets the MDT on to ensure the field is transmitted; it happens on an output operation. FRSET resets the MDT; until this happens, the field continues to be sent.
  9. What is the use of the DSECT parameter in BMS? – It is the parameter used to generate a symbolic map.
  10. Do you receive the attribute byte in the symbolic map? – On EOF, yes.
  11. How do you make your BMS maps case sensitive? – Use ASIS(?).
  12. What is the effect on RECEIVE MAP when a PF key is pressed? – Data transmission may happen. When a PA key is pressed? – Data transmission will not happen.
  13. What are SEND MAP MAPONLY and SEND MAP DATAONLY? – MAPONLY sends the map alone, without any data (e.g., used for sending menu screens). DATAONLY sends data alone, without sending the screen layout again (e.g., used for refreshing the screen data).
  14. What is the difference between a PF key and a PA key? – PF keys wake up the task and transmit modified data; PA keys only wake up the task.
  15. Name the macros used to define a MAPSET, a MAP, and a FIELD. – DFHMSD, DFHMDI, and DFHMDF, respectively.
  16. Can you use OCCURS in a BMS map? If you do, what are the issues related to its use? – Yes; reportedly you cannot use a group clause with it(?).
  17. Can you define multiple maps in a BMS mapset? – Yes.
  18. How is the storage determined in the symbolic map if you have multiple maps? – Storage for the maps redefines the first, which means the largest map has to be first.
  19. What is the meaning of BMS length of field = 0? – Data was not entered in the field.
  20. Can you simply check if length = 0 to determine whether a field was modified? – No, not if ERASE EOF was used.

Difference between single quotes and double quotes in unix

Enclosing characters in single quotes (') preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash.

Enclosing characters in double quotes (") preserves the literal value of all characters within the quotes, with the exception of '$', '`', and '\'. The characters '$' and '`' retain their special meaning within double quotes (see Shell Expansions). The backslash retains its special meaning only when followed by one of the following characters: '$', '`', '"', '\', or newline. Within double quotes, backslashes that are followed by one of these characters are removed. Backslashes preceding characters without a special meaning are left unmodified. A double quote may be quoted within double quotes by preceding it with a backslash.

The special parameters '*' and '@' have special meaning when in double quotes (see Shell Parameter Expansion).

What is the difference between a CI (Control Interval) and a CA (Control Area)?

The CI is the single unit of transfer between secondary storage and main memory. Inside, it has free space to store records of fixed or variable size.
If you try to insert a new record into a CI that has no space left to hold it, half of the records from that CI are moved to a free CI. This splitting of a CI when it is full is called a CI split.
A collection of CIs is called a CA. If, during a CI split, no free CI can be found, the entire CA is split into two and half of it is moved to a free CA. This is called a CA split. This is the primary mechanism VSAM uses to keep all the records inside your KSDS in sorted order.
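The split mechanics can be sketched with a toy data structure. The following Java sketch is only an analogy, not VSAM internals: the capacity, the integer keys, and the list-of-lists layout are all assumptions. A "CI" with a fixed record capacity is split when an insert overflows it, with half of its records moving to a new CI within the "CA".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CiSplitSketch {
    static final int CI_CAPACITY = 4; // records per control interval (assumption)

    // A "control area" as a list of "control intervals", each a sorted record list.
    static List<List<Integer>> ca = new ArrayList<>();

    static void insert(int key) {
        // Find the CI whose key range should hold this record (simplified scan).
        List<Integer> target = null;
        for (List<Integer> ci : ca) {
            if (ci.isEmpty() || key <= ci.get(ci.size() - 1)) { target = ci; break; }
        }
        if (target == null) {
            if (ca.isEmpty()) ca.add(new ArrayList<>());
            target = ca.get(ca.size() - 1);
        }
        int pos = Collections.binarySearch(target, key);
        target.add(pos < 0 ? -pos - 1 : pos, key);
        if (target.size() > CI_CAPACITY) {          // CI full: split it
            int at = ca.indexOf(target);
            List<Integer> newCi = new ArrayList<>(target.subList(CI_CAPACITY / 2, target.size()));
            target.subList(CI_CAPACITY / 2, target.size()).clear();
            ca.add(at + 1, newCi);                  // half the records move to a new CI
        }
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 5; k++) insert(k);
        System.out.println(ca); // fifth insert overflows the CI and triggers a split
    }
}
```

After inserting keys 1 through 5, the single full CI has been split into two, and the records remain in sorted order across the "CA" - the same invariant a real CI split preserves in a KSDS.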

What is the procedure for running a batch job on the mainframe?

Let me explain with an example: adding two numbers.

Step 1: Compile the program and place the compiled program (i.e., the error-free load module) in a load library (e.g., ABCD.GEEK.LOADLIB).

Step 2: Execute the program, giving PGM=ADD (the program name) and the load library name (ABCD.GEEK.LOADLIB).

Step 3: View the results in SYSOUT.

That's it – this is the process of compiling and executing a batch program.

How do you convert flat files to VSAM files?

Let's be a little more precise. We will take the tack that you will be creating a KSDS file. First, you need to know which field(s) you will use as the primary key, and whether you need alternate keys. Sort the records by those field(s). Then use IDCAMS REPRO to load the flat file into the VSAM file.

What is the difference between an intrapartition TDQ and an extrapartition TDQ?

INTRAPARTITION TD QUEUES: a group of sequential records produced by the same and/or different transactions within a CICS region. These queues are stored in only one physical (VSAM) file in a CICS region, which is prepared by the system programmer. Once a record is read from a queue, it is logically removed from the queue; that is, the record cannot be read again.

EXTRAPARTITION TD QUEUES: a group of sequential records that interface between the transactions of the CICS region and systems outside the CICS region. Each of these TDQs is a separate physical file, and it may be on disk, tape, a printer, or a plotter.
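The defining property of a TDQ read – a record, once read, is logically removed and cannot be read again – is exactly the behavior of a destructive-read FIFO queue. A minimal Java sketch of that behavior (the queue and record names are assumptions for illustration, not CICS API calls):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class TdqSketch {
    public static void main(String[] args) {
        // Records written by one or more transactions (a WRITEQ TD analogue).
        Queue<String> tdq = new ArrayDeque<>();
        tdq.add("record-1");
        tdq.add("record-2");

        // READQ TD analogue: each read logically removes the record,
        // so it cannot be read a second time.
        System.out.println(tdq.poll()); // record-1
        System.out.println(tdq.poll()); // record-2
        System.out.println(tdq.poll()); // null - queue exhausted
    }
}
```

Contrast this with a Temporary Storage Queue, where records stay readable until the queue is explicitly deleted.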

What is pseudo-conversation? How is it achieved in CICS programs?

The programming methodology in which the task does not wait for the terminal user's response, but frees its resources after sending the message, is called the pseudo-conversational technique. Terminating the task every time the application needs a response from the user, and then starting the next transaction when the user presses an attention key, is pseudo-conversational processing.


Generally, an I/O request requiring user input takes on the order of a thousand times longer than CPU processing. So the idea is to free the current CPU resources until the user gives his response, and to resume after receiving that response. In the meantime, the CPU engages itself in processing other programs/transactions. This supports multitasking, and hence faster and concurrent processing.
 

What is the difference between the XCTL and LINK commands?


XCTL: This command passes control to a program at the same level. Control does not return after this command is executed.
Syntax:
EXEC CICS XCTL
PROGRAM(PGMNAME)
COMMAREA(DATA-NAME)
END-EXEC.

LINK: This command transfers control to a program at the next lower level. The LINK command also expects control to return.
The syntax is the same; use LINK in place of XCTL.

Both XCTL and LINK are control-transfer commands.

XCTL transfers control to another program (the called program) at the same level as the calling program. Control is never returned to the calling program.

LINK transfers control to another program (the called program) at the level below the calling program. Control is expected to return to the calling program.
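As a rough analogy (an assumption for illustration, not CICS semantics rendered in Java): LINK behaves like an ordinary method call whose caller resumes afterwards, while XCTL behaves like a hand-off after which the transferring program executes nothing further.

```java
import java.util.ArrayList;
import java.util.List;

public class ControlTransferSketch {
    static List<String> trace = new ArrayList<>();

    // LINK analogue: the called program runs one logical level below,
    // and control comes back to the statement after the transfer.
    static void programWithLink() {
        trace.add("caller: before transfer");
        calledProgram();
        trace.add("caller: resumed after return");
    }

    // XCTL analogue: control is handed over at the same level;
    // nothing after the transfer ever runs in the transferring program.
    static void programWithXctl() {
        trace.add("caller: before transfer");
        calledProgram();
        // no statements here - control never returns to this program
    }

    static void calledProgram() {
        trace.add("called program ran");
    }

    public static void main(String[] args) {
        programWithLink();
        System.out.println(trace);
    }
}
```

With LINK the trace ends in "caller: resumed after return"; with XCTL the trace stops at the called program, mirroring the "control never returns" rule.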

CICS Error Code AEIV (IBM) due to length error of COMMAREA on EXEC CICS LINK command

Problem(Abstract)

You have a modified Node Error Program, DFHZNEP, that issues an EXEC CICS LINK command to another application. This program has worked for many years until upgrading to z/OS 1.8. Now, the program abends with abend code AEIV indicating that there is a length error (LENGERR).

Cause

The EXEC CICS LINK command has its LENGTH parameter coded incorrectly.

Diagnosing the problem

Your Link command is coded similar to the following:

EXEC CICS LINK PROGRAM('SSCCERR')

COMMAREA(NEPCABEG)
LENGTH(NEPCALEN)

NEPCALEN is defined using the DFHNEPCA macro as an equate:


NEPCALEN EQU *-NEPCABEG


Thus, NEPCALEN is equated to the fullword binary value x'0000009E', causing the length value to be picked up at location x'9E'. The CICS translator loads the address of the equated field into the parameter list, and this actually yields an address of x'0000009E'.


Prior to upgrading to z/OS V1.8, location x'9E' contained either a halfword of zeros or a halfword value without the leftmost bit turned on. After upgrading, this storage contains a halfword value with the leftmost bit turned on, which is therefore treated as a negative value. The negative value causes LENGERR and abend AEIV.


Source :IBM

How can I connect to SQL Server 2005 from Java?

The JDBC pattern is the same for any database: load the driver class, then open a connection with the appropriate JDBC URL. The sample script below uses the MySQL driver; for SQL Server, load the SQL Server JDBC driver (e.g., com.microsoft.sqlserver.jdbc.SQLServerDriver) and use a jdbc:sqlserver:// URL instead:

<%@ page import="java.sql.*" %>
<%@ page import="com.mysql.jdbc.Driver" %>

<%
Class.forName("com.mysql.jdbc.Driver").newInstance();
java.sql.Connection conn;
conn = DriverManager.getConnection(
    "jdbc:mysql:///?user=&password=");
%>
Use the same JDBC functionality to open a connection to SQL Server, substituting the SQL Server driver class and URL.

How to connect MySQL Database from Java

In order to connect to a MySQL database from Java, you can use the sample script below:

<%@ page import="java.sql.*" %>
<%@ page import="com.mysql.jdbc.Driver" %>

<%
Class.forName("com.mysql.jdbc.Driver").newInstance();
java.sql.Connection conn;
conn = DriverManager.getConnection(
    "jdbc:mysql:///?user=&password=");
%>

Connect to an Oracle database with JDBC in Java

JDBC means Java Database Connectivity. In Java, using JDBC drivers, we can connect to a database.
Steps to connect via JDBC:
1) Load the driver using Class.forName(driverName);
2) Get the connection object: Connection con = DriverManager.getConnection(url, user, password);
3) Create a SQL statement: Statement s = con.createStatement();
4) Create a ResultSet object using the statement created above: ResultSet rs = s.executeQuery("sql statement");

Iterate the result set to get all the values from the database.

Finally, don't miss:
5) s.close();
6) con.close();


Example:


import java.sql.*;

public class TestDBOracle {

  public static void main(String[] args)
      throws ClassNotFoundException, SQLException
  {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    //
    // or
    // DriverManager.registerDriver
    //        (new oracle.jdbc.driver.OracleDriver());

        String url = "jdbc:oracle:thin:@//server.local:1521/prod";
    //               jdbc:oracle:thin:@//host:port/service
    // or
    // String url = "jdbc:oracle:thin:@server.local:1521:prodsid";
    //               jdbc:oracle:thin:@host:port:SID
    //
    //  SID  - System ID of the Oracle server database instance.
    //         By default, Oracle Database 10g Express Edition
    //         creates one database instance called XE.
    //         ex : String url = "jdbc:oracle:thin:@myhost:1521:xe";



    Connection conn =
         DriverManager.getConnection(url,"scott","tiger");

    conn.setAutoCommit(false);
    Statement stmt = conn.createStatement();
    ResultSet rset =
         stmt.executeQuery("select BANNER from SYS.V_$VERSION");
    while (rset.next()) {
         System.out.println (rset.getString(1));
    }
    stmt.close();
    System.out.println ("Ok.");
  }
}

XML and using the Xerces parser for Java to generate and parse XML

This small tutorial introduces you to the basic concepts of XML and to using the Xerces parser for Java to generate and parse XML.
The intended audience is XML beginners with knowledge of Java.
DOM (Document Object Model) parser – tree-structure-based API:
    The DOM parser implements the DOM API and creates a DOM tree in memory for an XML document.
SAX (Simple API for XML) parser – event-based API:
    The SAX parser implements the SAX API and is an event-driven interface. As it parses, it invokes callback methods.
When to use the DOM parser:
  • To manipulate the document
  • To traverse the document back and forth
  • For small XML files
Drawbacks of the DOM parser:
    It consumes a lot of memory.
When to use the SAX parser:
  • When no structural modification is needed
  • For huge XML files
Validating and non-validating parsers:
Both DOM and SAX parsers can be validating or non-validating.
    A validating parser checks the XML file against the rules imposed by a DTD or XML Schema.
    A non-validating parser does not validate the XML file against a DTD or XML Schema.
Both validating and non-validating parsers check the well-formedness of the XML document.
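As a concrete illustration of the DOM approach described above, the following sketch parses a small document with the JAXP DocumentBuilder bundled with the JDK (whose default implementation is Xerces-derived); the XML content here is an assumption made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DomParseSketch {
    public static void main(String[] args) throws Exception {
        String xml = "<books><book>DOM</book><book>SAX</book></books>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The whole document is materialized as a tree in memory -
        // fine for small files, costly for huge ones (hence SAX).
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            System.out.println(books.item(i).getTextContent());
        }
    }
}
```

Because the tree is fully built, the code can jump to any element at will; a SAX parser would instead deliver the two book elements through callbacks as it streams past them.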


How to create an XML document using Java (DOM API)

In this section, you will learn to create an XML document using the DOM APIs. The XML document uses version 1.0 and UTF-8 encoding.
Working with an XML document is easy once you have a Document object with the XML document loaded into it. Yes, Java too has APIs for working with XML documents. With these APIs you can navigate through an XML document, or create one from scratch.
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
DocumentBuilder bd = fact.newDocumentBuilder();
Document doc = bd.newDocument();
Element rt = doc.createElement("rtElement");
doc.appendChild(rt);
rt.appendChild(doc.createTextNode("Some"));
. . .
. . .
Proper imports are needed to work with the classes and methods used here. The code sample above should give you an idea of how to go about it. Methods like createElement of the Document object are used to create elements,
and these elements are added to the root node using the appendChild method. Lots of sample code is available on the Internet on this topic.

How to Create Running Total Field in Crystal Reports

Overview

A running total field in a Crystal report is a summary field that allows you to control how and when the summary is calculated, and when it is reset.
In this tutorial, you will create a running total field and insert it into a report.
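The idea behind a running total – a summary evaluated for each record and reset on some condition, such as a change of group – can be sketched in a few lines of code. This Java snippet only illustrates the evaluate/reset logic (the sample rows are assumptions); in Crystal Reports you configure the same behavior declaratively in the running total field's options.

```java
import java.util.Arrays;
import java.util.List;

public class RunningTotalSketch {
    public static void main(String[] args) {
        // (group, amount) detail rows, in the order a report would read them.
        List<String[]> rows = Arrays.asList(
                new String[]{"East", "10"}, new String[]{"East", "5"},
                new String[]{"West", "7"},  new String[]{"West", "3"});

        String currentGroup = null;
        int runningTotal = 0;
        for (String[] row : rows) {
            if (!row[0].equals(currentGroup)) {   // "reset on change of group"
                currentGroup = row[0];
                runningTotal = 0;
            }
            runningTotal += Integer.parseInt(row[1]); // "evaluate for each record"
            System.out.println(currentGroup + " " + runningTotal);
        }
        // Output:
        // East 10
        // East 15
        // West 7
        // West 10
    }
}
```

The "reset" branch is what distinguishes a running total from a plain summary field, which would simply accumulate across the whole report.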



VB.NET Crystal Reports Summary Fields


The following C# / Crystal Reports section describes how to add a summary field in Crystal Reports.
In the Crystal Reports designer view window, right-click the Report Footer just below the Total field and select Insert -> Summary.

On the screen that appears, select Total from the first combo box, select Sum from the next combo box, and choose Grand Total (Report Footer) as the summary location. Click the OK button.

Now you can see @Total just below the Total field in the Report Footer.

The designing part is now over; the next step is to call the Crystal Report from C# and view it in a CrystalReportViewer control.

Select the default form (Form1.cs) you created in C# and drag a button and a CrystalReportViewer control onto your form.

You have to include CrystalDecisions.CrystalReports.Engine in your C# source code:

using CrystalDecisions.CrystalReports.Engine;

using CrystalDecisions.Shared;

How to create a login page in ASP.NET with SQL Server

First we have to create a database for storing the login information.

Open SQL Server Management Studio.

Create a new database and name it db_Logininformation.

Create a new table in that database (db_Logininformation) and name it tab_userinformation.
Now create the following fields:

  1. UserID   int         PrimaryKey
  2. Username varchar(50)
  3. Password varchar(50)

Next, enter some usernames and passwords directly into the database for testing purposes, and save the table.

Next, open Microsoft Visual Studio 2008.

Next, select an ASP.NET web application and name it LoginPage.

Next, go to the design page of LoginPage and drag and drop two labels, two textboxes, and a button.

Next, go to the code page, Login.aspx.cs.

Write the following code in the button's click event:
 
protected void Button1_Click(object sender, EventArgs e)
{
    con = new SqlConnection(ConfigurationManager.ConnectionStrings["connectionstring"].ToString());
    con.Open();
    // Note: concatenating user input into SQL is vulnerable to SQL injection;
    // in real code, use a parameterized query instead.
    com = new SqlCommand("select Password from tab_userinformation where Username='" + txt_uname.Text + "'", con);
    dr = com.ExecuteReader();
    if (!dr.Read())
    {
        Response.Write("Invalid User");
    }
    else
    {
        if (dr[0].ToString() == txt_pwd.Text)
        {
            Response.Redirect("~/mainPage.aspx");
        }
        else
        {
            Response.Write("Wrong Password");
            txt_pwd.Focus();
        }
    }
}

How to Create ASP.NET Session Login Without Cookies

The following describes the easiest way I have found to force users to log into an ASP.NET website for each session but not require them to accept cookies. You must do the following things.
  1. Create a Web.config file with the appropriate entries to allow session state management.
  2. Create a well formed Global.asax file with the code below included in it.
  3. Create a login page to authenticate users against a database or whatever method you desire.
' Fires when the session is started and sets the default logged-in state to ""

Sub Session_Start(ByVal sender As Object, ByVal e As EventArgs)
    Session("LoggedIn") = ""
    CheckLoggedIn()
End Sub

' Called when the request has been processed by the request handler and
' HttpSessionState is available. [This is the key piece of code that forces
' the logged-in check with each page request.]

Sub Application_OnPostRequestHandlerExecute()
    CheckLoggedIn()
End Sub

' Check that the user is logged in.

Sub CheckLoggedIn()
    ' If the user is not logged in and is not currently on the login page.
    If Session("LoggedIn") = "" And InStr(Request.RawUrl, "Login.aspx") = 0 Then
        Response.Redirect("~/Login/Login.aspx")
    End If
End Sub

Finally, create a Login.aspx file that authenticates the user. If the user is allowed in, set: Session("LoggedIn") = "Yes"
That's all there is to it. Hope this helps! Enjoy!