a Power Center 8x Key Concepts

  • 8/7/2019 a Power Center 8x Key Concepts

    Informatica PowerCenter 8x Key Concepts

    We shall look at the fundamental components of the Informatica PowerCenter 8.x suite. The key components are:

    1. PowerCenter Domain
    2. PowerCenter Repository
    3. Administration Console
    4. PowerCenter Client
    5. Repository Service
    6. Integration Service

    1. PowerCenter Domain

    A domain is the primary unit for management and administration of services in

    PowerCenter. Node, Service Manager and Application Services are components of a

    domain.

    Node

    A node is the logical representation of a machine in a domain. The machine on which PowerCenter is installed hosts the domain and also acts as the primary node. We can add other machines as nodes in the domain and configure them to run application services such as the Integration Service or Repository Service. All service requests from other nodes in the domain go through the primary node, also called the master gateway.

    The Service Manager

    The Service Manager runs on each node within a domain and is responsible for starting and running the application services. The Service Manager performs the following functions:

    o Alerts. Provides notifications of events such as shutdowns and restarts.
    o Authentication. Authenticates user requests from the Administration Console, PowerCenter Client, Metadata Manager, and Data Analyzer.
    o Domain configuration. Manages configuration details of the domain, such as machine name and port.
    o Node configuration. Manages configuration metadata of a node, such as machine name and port.
    o Licensing. When an application service connects to the domain for the first time, licensing registration is performed; for subsequent connections the licensing information is verified.
    o Logging. Manages the event logs from each service. Messages can be Fatal, Error, Warning or Info.
    o User management. Manages users, groups, roles, and privileges.

    Application services

    The services that perform data movement, connect to different data sources and manage data are called application services. They are the Repository Service, Integration Service, Web Services Hub, SAP BW Service, Reporting Service and Metadata Manager Service. The application services run on each node based on the way we configure the node and the application service.

    Domain Configuration

    Configuring a domain involves assigning host names and port numbers to the nodes, setting Resilience Timeout values, and providing connection information for the metadata database, SMTP details and so on. All the configuration information for a domain is stored in a set of relational database tables within the domain configuration repository. Some global properties that apply to application services, such as Maximum Restart Attempts and Dispatch Mode (Round Robin, Metric Based or Adaptive), are also configured under Domain Configuration.

    2. PowerCenter Repository

    The PowerCenter Repository is one of the best metadata stores among ETL products. The repository is sufficiently normalized to store metadata at a very detailed level, which in turn means that updates to the repository are quick and team-based development runs smoothly. The repository data structure is also useful for analysis and reporting.

    Access to the repository through MX views and the SDK extends the repository's capability from simple storage of technical data to a database for analysis of ETL metadata.

    The PowerCenter Repository is a collection of 355 tables which can be created on any major relational database. The kinds of information stored in the repository are:

    1. Repository configuration details
    2. Mappings
    3. Workflows
    4. User security
    5. Process data of session runs

    For a quick understanding:

    o When a user creates a folder, corresponding entries are made in table OPB_SUBJECT; attributes such as folder name, owner ID and folder type (shared or not) are stored.
    o When we create or import sources and define field names, datatypes and so on in the Source Analyzer, entries are made in OPB_SRC and OPB_SRC_FLD.
    o When targets and their fields are created or imported from a database, entries are made in tables such as OPB_TARG and OPB_TARG_FLD.
    o Table OPB_MAPPING stores mapping attributes such as mapping name, folder ID, valid status and mapping comments.
    o Table OPB_WIDGET stores attributes such as widget type, widget name and comments. Widgets are simply transformations; "widget" is the name Informatica uses for them internally.
    o Table OPB_SESSION stores the configuration of a session task, and table OPB_CNX_ATTR stores information related to connection objects.
    o Table OPB_WFLOW_RUN stores process details such as workflow name, workflow start time, workflow completion time and the server node it ran on.

    REP_ALL_SOURCES, REP_ALL_TARGETS and REP_ALL_MAPPINGS are a few of the many views created over these tables.
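
    To make the folder-to-mapping relationship concrete, here is a minimal sketch that rebuilds two of these tables in an in-memory SQLite database and joins them. Only the table names OPB_SUBJECT and OPB_MAPPING come from the text above; the column names, the sample rows and the helper invalid_mappings are assumptions for illustration, and a real repository would normally be read through its views rather than its base tables.

```python
import sqlite3

# In-memory stand-in for a repository database. Table names come from the
# text; the column names and sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OPB_SUBJECT (SUBJ_ID INTEGER PRIMARY KEY, SUBJ_NAME TEXT);
    CREATE TABLE OPB_MAPPING (
        MAPPING_ID INTEGER PRIMARY KEY,
        SUBJECT_ID INTEGER REFERENCES OPB_SUBJECT (SUBJ_ID),
        MAPPING_NAME TEXT,
        IS_VALID INTEGER
    );
    INSERT INTO OPB_SUBJECT VALUES (1, 'SALES_DEV'), (2, 'SHARED');
    INSERT INTO OPB_MAPPING VALUES
        (10, 1, 'm_load_orders', 1),
        (11, 1, 'm_load_customers', 0),
        (12, 2, 'm_common_lookup', 1);
""")


def invalid_mappings(connection):
    """Return (folder name, mapping name) pairs for mappings flagged invalid."""
    rows = connection.execute(
        """
        SELECT s.SUBJ_NAME, m.MAPPING_NAME
        FROM OPB_MAPPING m
        JOIN OPB_SUBJECT s ON s.SUBJ_ID = m.SUBJECT_ID
        WHERE m.IS_VALID = 0
        ORDER BY s.SUBJ_NAME, m.MAPPING_NAME
        """
    )
    return rows.fetchall()


print(invalid_mappings(conn))
```

    With the sample rows above, the only mapping reported is m_load_customers in folder SALES_DEV.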

    PowerCenter applications access the PowerCenter repository through the Repository Service. The Repository Service protects metadata in the repository by managing repository connections and using object locking to ensure object consistency.

    We can create a repository as global or local. We use a global repository to store common objects that multiple developers can use through shortcuts, and a local repository to develop mappings and workflows. From a local repository, we can create shortcuts to objects in shared folders in the global repository. PowerCenter supports versioning. A versioned repository can store multiple versions of an object.

    3. Administration Console

    The Administration Console is a web application that we use to administer the PowerCenter domain and PowerCenter security. There are two pages in the console: the Domain page and the Security page.

    We can do the following on the Domain page:

    o Create and manage application services like the Integration Service and Repository Service
    o Create and manage nodes, licenses and folders
    o Restart and shut down nodes
    o View log events
    o Other domain management tasks like applying licenses and managing grids and resources

    We can do the following on the Security page:

    o Create, edit and delete native users and groups
    o Configure a connection to an LDAP directory service. Import users and groups from the LDAP directory service
    o Create, edit and delete roles (roles are collections of privileges)
    o Assign roles and privileges to users and groups
    o Create, edit, and delete operating system profiles. An operating system profile is a level of security that the Integration Service uses to run workflows.

    4. PowerCenter Client

    Designer, Workflow Manager, Workflow Monitor, Repository Manager and Data Stencil are the five client tools used to design mappings and mapplets, create sessions to load data, and manage the repository.

    A mapping is ETL code that pictorially depicts the logical data flow from source to target, including the transformations applied to the data. Designer is the tool used to create mappings.

    Designer has five window panes: Source Analyzer, Warehouse Designer, Transformation Developer, Mapping Designer and Mapplet Designer.

    Source Analyzer:

    Allows us to import source table metadata from relational databases, flat files, XML and COBOL files. Note that the Source Analyzer imports only the source definition, not the source data itself. The Source Analyzer also allows us to define our own source data definitions.

    Warehouse Designer:

    Allows us to import target table definitions from relational databases, flat files, XML and COBOL files. We can also create target definitions manually and group them into folders. Unlike the Source Analyzer, it offers an option to create the tables physically in the database. The Warehouse Designer doesn't allow creating two tables with the same name, even if their column names differ or they come from different databases or schemas.

    Transformation Developer:

    Transformations such as Filters, Lookups and Expressions that have scope to be reused are developed in this pane. Alternatively, a transformation developed in the Mapping Designer can be made reusable by checking the reuse option, after which it is displayed under the Transformation Developer folders.

    Mapping Designer:

    This is where we actually depict our ETL process: we bring in source definitions, target definitions and transformations such as filter, lookup and aggregate, and develop a logical ETL program. At this stage it is only a logical program, because the actual data load can be done only by creating a session and a workflow.

    Mapplet Designer:

    We create a set of transformations to be used and re-used across mappings.

    Workflow Manager:

    In the Workflow Manager, we define a set of instructions called a workflow to execute the mappings we build in the Designer. Generally, a workflow contains a session and any other task we may want to perform when we run a session. Tasks can include a session, email notification, or scheduling information.

    A set of tasks grouped together becomes a worklet. After we create a workflow, we run the workflow in the Workflow Manager and monitor it in the Workflow Monitor.

    The Workflow Manager has the following three window panes:

    o Task Developer: Create the tasks we want to accomplish in the workflow.
    o Worklet Designer: Create a worklet. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. We can nest worklets inside a workflow.
    o Workflow Designer: Create a workflow by connecting tasks with links. We can also create tasks in the Workflow Designer as we develop the workflow.

    The ODBC connection details are defined in the Connections menu of the Workflow Manager.

    Workflow Monitor:

    We can monitor workflows and tasks in the Workflow Monitor. We can view details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view session and workflow log events in the Workflow Monitor Log Viewer.

    The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the Integration Service and Repository Service. It also fetches information from the repository to display historic information.

    The Workflow Monitor consists of the following windows:

    o Navigator window: Displays monitored repositories, servers, and repository objects.
    o Output window: Displays messages from the Integration Service and Repository Service.
    o Time window: Displays progress of workflow runs.
    o Gantt Chart view: Displays details about workflow runs in chronological format.
    o Task view: Displays details about workflow runs in a report format.

    Repository Manager

    We can navigate through multiple folders and repositories and perform basic repository tasks with the Repository Manager. We use the Repository Manager to complete the following tasks:

    1. Add and connect to a repository. We can add repositories to the Navigator window and client registry and then connect to the repositories.
    2. Work with PowerCenter domain and repository connections. We can edit or remove domain connection information. We can connect to one repository or multiple repositories. We can export repository connection information from the client registry to a file. We can import the file on a different machine and add the repository connection information to the client registry.
    3. Change the password. We can change the password for our user account.
    4. Search for repository objects or keywords. We can search for repository objects containing specified text. If we add keywords to target definitions, we can use a keyword to search for a target definition.
    5. View object dependencies. Before we remove or change an object, we can view dependencies to see the impact on other objects.
    6. Compare repository objects. In the Repository Manager, we can compare two repository objects of the same type to identify differences between the objects.
    7. Truncate session and workflow log entries. We can truncate the list of session and workflow logs that the Integration Service writes to the repository. We can truncate all logs, or truncate all logs older than a specified date.

    5. Repository Service

    Having discussed the metadata repository, we now turn to the Repository Service: a separate, multi-threaded process that retrieves, inserts and updates metadata in the repository database tables.

    The Repository Service manages connections to the PowerCenter repository from PowerCenter client applications such as Designer, Workflow Manager, Workflow Monitor and Repository Manager, as well as from the Administration Console and the Integration Service. The Repository Service is responsible for ensuring the consistency of metadata in the repository.

    Creation & Properties:

    Use the PowerCenter Administration Console Navigator window to create a Repository Service. The properties needed to create one are:

    > Service Name: Name of the service, for example rep_SalesPerformanceDev
    > Location: Domain and folder where the service is created
    > License: Name of the license for the service
    > Node, Primary Node & Backup Nodes: Node on which the service process runs
    > CodePage: The Repository Service uses the character set encoded in the repository code page when writing data to the repository
    > Database type & details: Type of database, username, password, connect string and tablespace name

    The above properties are sufficient to create a Repository Service. However, the following properties are also worth a look, as they matter for performance and maintenance.

    General Properties

    > OperatingMode: Values are Normal and Exclusive. Use Exclusive mode to perform administrative tasks such as enabling version control or promoting a local repository to a global repository.

    > EnableVersionControl: Creates a versioned repository.

    Node Assignments: The high availability option is a licensed feature which allows us to choose primary and backup nodes so that the Repository Service runs continuously. Under a normal license we see only a single Node to select.

    Database Properties

    > DatabaseArrayOperationSize: Number of rows to fetch each time an array database operation is issued, such as insert or fetch. Default is 100.

    > DatabasePoolSize: Maximum number of connections to the repository database that the Repository Service can establish. If the Repository Service tries to establish more connections than specified for DatabasePoolSize, it times out the connection attempt after the number of seconds specified for Database Connection Timeout.
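
    The interaction between DatabasePoolSize and Database Connection Timeout can be pictured with a small bounded pool. This is only a sketch of the general technique, not how the Repository Service is implemented; the class BoundedPool and its parameter names are hypothetical, and the "connection" is a placeholder object.

```python
import threading


class BoundedPool:
    """Toy pool: at most `pool_size` connections may be checked out at once."""

    def __init__(self, pool_size, connection_timeout):
        self._slots = threading.BoundedSemaphore(pool_size)
        self._timeout = connection_timeout  # seconds

    def acquire(self):
        # Wait up to the timeout for a free slot, then give up.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("no repository connection available")
        return object()  # placeholder for a real database connection

    def release(self):
        self._slots.release()


pool = BoundedPool(pool_size=2, connection_timeout=0.1)
first, second = pool.acquire(), pool.acquire()
try:
    pool.acquire()          # third request exceeds the pool size
    timed_out = False
except TimeoutError:
    timed_out = True
pool.release()              # returning a connection frees a slot
third = pool.acquire()
```

    The third request times out while both slots are taken and succeeds once one connection has been released.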

    Advanced Properties

    > CommentsRequiredForCheckin: Requires users to add comments when checking in repository objects.

    > Error Severity Level: Level of error messages written to the Repository Service log. Specify one of the following message levels: Fatal, Error, Warning, Info, Trace or Debug.

    > EnableRepAgentCaching: Enables repository agent caching. Repository agent caching provides optimal performance of the repository when you run workflows. When you enable repository agent caching, the Repository Service process caches metadata requested by the Integration Service. Default is Yes.

    > RACacheCapacity: Number of objects that the cache can contain when repository agent caching is enabled. You can increase the number of objects if there is available memory on the machine running the Repository Service process. The value must be between 100 and 10,000,000,000. Default is 10,000.

    > AllowWritesWithRACaching: Allows you to modify metadata in the repository when repository agent caching is enabled. When you allow writes, the Repository Service process flushes the cache each time you save metadata through the PowerCenter Client tools. You might want to disable writes to improve performance in a production environment where the Integration Service makes all changes to repository metadata. Default is Yes.
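
    The three caching properties can be read together as a capacity-bounded cache that is flushed on every save. The sketch below illustrates that idea only; the class RepoAgentCache is hypothetical, and the least-recently-used eviction it applies when the capacity is reached is an assumption, since the text does not say how the real cache chooses what to drop.

```python
from collections import OrderedDict


class RepoAgentCache:
    """Toy metadata cache with a fixed capacity and flush-on-write."""

    def __init__(self, capacity=10_000, allow_writes=True):
        self.capacity = capacity
        self.allow_writes = allow_writes
        self._objects = OrderedDict()
        self.repository_reads = 0

    def get(self, object_id, load_from_repository):
        if object_id in self._objects:
            self._objects.move_to_end(object_id)   # mark as recently used
            return self._objects[object_id]
        self.repository_reads += 1
        value = load_from_repository(object_id)
        self._objects[object_id] = value
        if len(self._objects) > self.capacity:
            self._objects.popitem(last=False)      # assumed LRU eviction
        return value

    def save(self, object_id, value, write_to_repository):
        if not self.allow_writes:
            raise PermissionError("writes are disabled while caching is on")
        write_to_repository(object_id, value)
        self._objects.clear()                      # flush the whole cache


store = {"m_load_orders": "v1"}                    # stand-in for the repository
cache = RepoAgentCache(capacity=2)
cache.get("m_load_orders", store.get)
cache.get("m_load_orders", store.get)              # served from the cache
reads_before_save = cache.repository_reads
cache.save("m_load_orders", "v2", store.__setitem__)
latest = cache.get("m_load_orders", store.get)     # reloaded after the flush
```

    The flush in save is what makes writes costly: every save forces later reads back to the repository, which is why disabling writes can help where only the Integration Service changes metadata.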

    Environment Variables

    The database client code page on a node is usually controlled by an environment variable. For example, Oracle uses NLS_LANG, and IBM DB2 uses DB2CODEPAGE. All Integration Services and Repository Services that run on this node use the same environment variable. You can configure a Repository Service process to use a different value for the database client code page environment variable than the value set for the node.

    You might want to configure the code page environment variable for a Repository Service process when the Repository Service process requires a different database client code page than the Integration Service process running on the same node.

    For example, the Integration Service reads from and writes to databases using the UTF-8 code page. The Integration Service requires that the code page environment variable be set to UTF-8. However, you have a Shift-JIS repository that requires that the code page environment variable be set to Shift-JIS. Set the environment variable on the node to UTF-8. Then add the environment variable to the Repository Service process properties and set the value to Shift-JIS.
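
    The UTF-8 and Shift-JIS example amounts to a node-level setting with a per-process override. A minimal sketch, assuming the environment is modelled as a plain dictionary; the function effective_environment is hypothetical and the NLS_LANG values are illustrative only:

```python
def effective_environment(node_env, process_overrides):
    """Return the environment a service process sees on a node."""
    env = dict(node_env)            # start from the node-level settings
    env.update(process_overrides)   # process-level values take precedence
    return env


# Values are illustrative; real NLS_LANG strings depend on the Oracle client.
node_env = {"NLS_LANG": "AMERICAN_AMERICA.UTF8"}
integration_service_env = effective_environment(node_env, {})
repository_service_env = effective_environment(
    node_env, {"NLS_LANG": "JAPANESE_JAPAN.JA16SJIS"}
)
```

    The Integration Service process inherits the node value, while the Repository Service process sees its own override; the node-level setting itself is left unchanged.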

    6. Integration Service (IS)

    The key functions of the IS are:

    o Interprets the workflow and mapping metadata from the repository
    o Executes the instructions in the metadata
    o Manages the data from the source system to the target system, in memory and on disk

    The three main components of the Integration Service that enable data movement are:

    o Integration Service Process
    o Load Balancer
    o Data Transformation Manager

    1. Integration Service Process (ISP)

    The Integration Service starts one or more Integration Service processes to run and monitor workflows. When we run a workflow, the ISP starts and locks the workflow, runs the workflow tasks, and starts the process to run sessions. The functions of the Integration Service Process are:

    o Locks and reads the workflow
    o Manages workflow scheduling, i.e., maintains session dependency
    o Reads the workflow parameter file
    o Creates the workflow log
    o Runs workflow tasks and evaluates the conditional links
    o Starts the DTM process to run the session
    o Writes historical run information to the repository
    o Sends post-session emails

    2. Load Balancer

    The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks to a single node or across the nodes in a grid after performing a sequence of steps. Before looking at these steps, we need to understand resources, resource provision thresholds, dispatch modes and service levels.

    o Resources: We can configure the Integration Service to check the resources available on each node and match them with the resources required to run the task. For example, if a session uses an SAP source, the Load Balancer dispatches the session only to nodes where the SAP client is installed.
    o Resource provision thresholds: There are three. Maximum CPU Run Queue Length is the maximum number of runnable threads waiting for CPU resources on the node. Maximum Memory % is the maximum percentage of virtual memory allocated on the node relative to the total physical memory size. Maximum Processes is the maximum number of running Session and Command tasks allowed for each Integration Service process running on the node.
    o Dispatch modes: There are three. Round-robin: the Load Balancer dispatches tasks to available nodes in a round-robin fashion after checking the Maximum Processes threshold. Metric-based: checks all three resource provision thresholds and dispatches tasks in a round-robin fashion. Adaptive: checks all three resource provision thresholds and also ranks nodes according to current CPU availability.
    o Service levels: Establish priority among tasks that are waiting to be dispatched. The three components of a service level are name, dispatch priority and maximum dispatch wait time. Maximum dispatch wait time is the amount of time a task can wait in the queue; it ensures that no task waits forever.
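
    The difference between the three dispatch modes is easiest to see side by side. The following is a rough sketch rather than the Load Balancer's actual algorithm: the classes Node and LoadBalancer are hypothetical, the threshold limits are placeholders, thresholds are checked against current load only, and a shorter run queue is used as a stand-in for "more CPU available" in adaptive mode.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    run_queue: int            # runnable threads waiting for CPU
    memory_pct: float         # virtual memory allocated / physical memory
    processes: int            # running Session and Command tasks
    max_run_queue: int = 10   # placeholder limits, not product defaults
    max_memory_pct: float = 150.0
    max_processes: int = 10

    def within(self, all_thresholds):
        """True if one more task would not push the node over its limits."""
        ok = self.processes < self.max_processes
        if all_thresholds:
            ok = ok and self.run_queue < self.max_run_queue
            ok = ok and self.memory_pct < self.max_memory_pct
        return ok


class LoadBalancer:
    def __init__(self, nodes, mode):
        self.nodes = nodes
        self.mode = mode      # "round-robin", "metric-based" or "adaptive"
        self._next = 0        # round-robin position

    def pick_node(self):
        """Return the node to dispatch to, or None if the task must wait."""
        if self.mode == "adaptive":
            eligible = [n for n in self.nodes if n.within(all_thresholds=True)]
            # Assumption: rank by run queue as a proxy for CPU availability.
            return min(eligible, key=lambda n: n.run_queue, default=None)
        check_all = self.mode == "metric-based"
        for offset in range(len(self.nodes)):
            index = (self._next + offset) % len(self.nodes)
            node = self.nodes[index]
            if node.within(all_thresholds=check_all):
                self._next = index + 1
                return node
        return None


nodes = [
    Node("node_a", run_queue=12, memory_pct=80.0, processes=3),
    Node("node_b", run_queue=4, memory_pct=90.0, processes=5),
    Node("node_c", run_queue=2, memory_pct=60.0, processes=9),
]
round_robin_choice = LoadBalancer(nodes, "round-robin").pick_node().name
metric_choice = LoadBalancer(nodes, "metric-based").pick_node().name
adaptive_choice = LoadBalancer(nodes, "adaptive").pick_node().name
```

    With these sample nodes, round-robin takes node_a because it only looks at the process count, metric-based skips node_a for its long run queue and takes node_b, and adaptive prefers node_c, which has the shortest run queue among the eligible nodes.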

    A. Dispatching tasks on a node:

    1. The Load Balancer checks different resource provision thresholds on the node depending on the dispatch mode set. If dispatching the task causes any threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later.
    2. The Load Balancer dispatches all tasks to the node that runs the master Integration Service process.

    B. Dispatching tasks on a grid:

    1. The Load Balancer verifies which nodes are currently running and enabled.
    2. The Load Balancer identifies nodes that have the PowerCenter resources required by the tasks in the workflow.
    3. The Load Balancer verifies that the resource provision thresholds on each candidate node are not exceeded. If dispatching the task causes a threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later.
    4. The Load Balancer selects a node based on the dispatch mode.
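
    The four grid steps form a simple filter pipeline. The sketch below is illustrative only: the function dispatch_on_grid, the dictionary layout of nodes and tasks, and the selector first_candidate are all hypothetical. It models only the Maximum Processes threshold, and it assumes that a task left without any candidate node is simply queued.

```python
from collections import deque


def dispatch_on_grid(task, nodes, select, queue):
    """Apply the four grid steps; return the chosen node name or None."""
    # 1. Keep nodes that are running and enabled.
    candidates = [n for n in nodes if n["running"] and n["enabled"]]
    # 2. Keep nodes that provide every resource the task requires.
    candidates = [n for n in candidates if task["requires"] <= n["resources"]]
    # 3. Keep nodes whose threshold would not be exceeded by one more task.
    candidates = [n for n in candidates if n["processes"] < n["max_processes"]]
    if not candidates:
        queue.append(task)      # wait in the dispatch queue; retried later
        return None
    # 4. Let the dispatch mode choose among the remaining nodes.
    return select(candidates)["name"]


def first_candidate(candidates):
    """Stand-in for a dispatch mode: take the first remaining node."""
    return candidates[0]


nodes = [
    {"name": "node_a", "running": True, "enabled": True,
     "resources": {"sap_client"}, "processes": 10, "max_processes": 10},
    {"name": "node_b", "running": True, "enabled": True,
     "resources": {"sap_client"}, "processes": 2, "max_processes": 10},
    {"name": "node_c", "running": False, "enabled": True,
     "resources": {"sap_client"}, "processes": 0, "max_processes": 10},
    {"name": "node_d", "running": True, "enabled": True,
     "resources": set(), "processes": 0, "max_processes": 10},
]
dispatch_queue = deque()
sap_session = {"name": "s_load_sap", "requires": {"sap_client"}}

chosen = dispatch_on_grid(sap_session, nodes, first_candidate, dispatch_queue)
nodes[1]["processes"] = 10     # node_b is now at its process limit
queued = dispatch_on_grid(sap_session, nodes, first_candidate, dispatch_queue)
```

    The first call lands on node_b: node_c is not running, node_d lacks the SAP client, and node_a is already at its limit. Once node_b fills up as well, the same task goes to the dispatch queue instead.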

    3. Data Transformation Manager (DTM) Process

    When the workflow reaches a session, the Integration Service Process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:

    o Retrieves and validates session information from the repository.
    o Validates source and target code pages.
    o Verifies connection object permissions.
    o Performs pushdown optimization when the session is configured for pushdown optimization.
    o Adds partitions to the session when the session is configured for dynamic partitioning.
    o Expands the service process variables, session parameters, and mapping variables and parameters.
    o Creates the session log.
    o Runs pre-session shell commands, stored procedures, and SQL.
    o Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.
    o Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
    o Runs post-session stored procedures, SQL, and shell commands, and sends post-session email.
    o After the session is complete, reports the execution result to the ISP.