Infolinks

Friday, 28 December 2012

Oracle SQL* Loader Concepts

Oracle SQL* Loader Concepts


Introduction
SQL*Loader is a bulk loader utility used for moving data from external files into the Oracle database.
Capabilities of SQL*Loader
          Can load data from multiple data files during the same load session.
          Can load data into multiple tables during the same load session.
          Can selectively load data (Records can be loaded based on the records' values).
          Can manipulate the data before loading it, using SQL functions.
          Can generate unique sequential key values in specified columns .
Does sophisticated error reporting which greatly aids troubleshooting


SQL Loader - How it works
SQL*Loader takes as input a control file, which controls the behavior of SQL*Loader, and one or more data files.
Output of the SQL*Loader is an Oracle database (where the data is loaded), a log file, a bad file, and potentially a discard file.
Control File : The SQL*Loader control File is the key to any load process.
The control file provides the following information to SQL*Loader.
          The name and location of the input file
          The format of the records in the input data file
          The name of table or tables to be loaded
          The correspondence between the fields in the input record and the columns in the database tables being loaded.
          Selection criteria defining which records from the input file contain data to be inserted into the destination database tables
          The names and location of the bad file and the discard file.
Data File : The data file contains the data to be loaded in a specific format.
Sometimes if the amount of data is less, it can be embedded as part of the control file.
Bad File : The bad file contains records rejected, either by SQL*Loader or by Oracle.
Records are rejected by SQL*Loader when the input format is invalid.
For example: If a delimited field exceeds the specified length.

After a record is accepted for processing by SQL*Loader, a row is sent to Oracle for insertion.
If Oracle determines that the row is valid, then the row is inserted into the database.
If not, the record is rejected, and SQL*Loader puts it in the bad file.
The row may be rejected, for example, because a key is not unique, because a required field is not null, or because the field contains invalid data for the Oracle data type.
Log File : The log file contains a detailed summary of the load, including a description of any errors that occurred during the load.
The summary of the load includes the number of logical records read from the data file, the number of records rejected because of errors, the number of records discarded because of selection criteria, and the elapsed time of the load.

Discard File : The discard file contains records that were filtered out of the load because they did not match any record-selection criteria specified in the control file.
Discard file is created only when it is needed, and only if specified that a discard file should be enabled. 
Structure of control file
   LOAD DATA
   INFILE ‘<data file name>’
   BADFILE ‘<bad file name>’
   DISCARDFILE ‘<discard file name>’
   APPEND/TRUNCATE/REPLACE/INSERT
   INTO TABLE <table name>
   WHEN <condition>
   TRAILING NULLCOLS
     (column 1  constant,
      column 2  <sequence_name>.nextval,
      column 3  “<sql function(data field)>”,
      …)
     <BEGINDATA>
          LOAD DATA : It signifies the start of control file.
          INFILE : Specifies the Data file name(s)
          BADFILE : Specifies Bad file name that gets created during the SQL loader run
          DISCARDFILE: Specifies the Discard file name that gets created
          APPEND/TRUNCATE/INSERT/REPLACE : Operation that will be performed  on the database table
          INTO TABLE: Table name to which the data will be loaded
          BEGINDATA: Specifies that the subsequent lines of the control file contains data. In this case INFILE use is not required
          POSITION: Specifies the position of each column of data
          WHEN : Contains the condition(s)
Types of Data files
Fixed length Data files
       All the records in the data file are of the same byte length
       Fields are identified by their positions in the records.
Example: Emp No : 1-5
               Emp Name : 6-16
               Department Name : 17-21
111  JOHN      SALES
222  MARK      HR
333  NANCY    MARKETING  
Variable length Data files
       Columns in the input data file have variable lengths
       Fields are identified by a delimiter
Example: In the following example, a comma separates the input values
111,JOHN,SALES
222,MARK,HR
333,NANCY,MARKETING  
The fields terminated by “,” clause tells SQL*Loader that during the load, each column value will be terminated by a comma
Load Processing:
A control file is executed by using SQLLDR command at the operating system prompt.
Some input parameters should be given for executing the control file.
Each load must have an control file.
Syntax : sqlldr <apps user name>/<apps pwd> control=<control file name> log=<log file name> data=<data file name>
 

Oracle SQL Loader Utility


Abstract
The document is aimed at describing the procedure to be followed while working with SQL Loader Utility. Using the tool, we can load the external data coming in various formats like excel sheet, flat text file, comma Separated Variable (.csv) files and so on.
Case History
Database Administrators may come across some situation, where they need to load the external data coming from other sources into oracle tables. Before loading the data, they can restrict/limit/skip the loading with the usage of Control file (.ctl)

Analysis
External data will be of two types like Fixed Data and Variable Data and depending upon the type of the data, we need to generate the creation of the control file which in turn will convert the utility tool functioned as per the requirement and ensure that only the qualified or eligible data is loaded into the the Tables.

Introduction: As discussed above, the data source will be in the form of various types like .csv and  flat text files and hence the loading of the data into the tables has been classified into types namely Fixed Width and Varied data.

Terminology:
1)    Control File: Control file describes the actions to be done by the sql loader and we can use any text editor to writing a control file. Usually this will be followed by .ctl file extension.
2)    In File: In file or input file is the name of the data source file like .csv or .txt which will be given as a input for the control file to look for the data.
3)    Bad file: Bad file is the file which records the rows which got failed during the loading operation and the cause for the failure. This is a Optional file to provide while executing the sql loader at the command line
4)    Discard File: Discard file is again an optional file, which will store the rows that are not qualified during the load operation, when a condition like where or when has been specified as a part of the control file.
5)    Loading Options:  INSERT,  APPEND, REPLACE and TRUNCATE

Insert           :  Loads rows only if the target table is empty.
Append         :  Load rows if the target table is empty or not.
Replace         :  First deletes all the rows in the table and then load rows
Truncate        :  First truncates the table and then load rows.

Varied Data Format ( .csv format or .txt files)
Example:

“10”,”James”,”1000”,”Finance”
“20”,”Stuart”,”2000”,”Design”
“30”,”Mclean”,”3000”,”Media”
“40”,”Sandra”,”8000”,”Architect”
“50”,”Natalie”,”9000”,”Solutions”

Now we have to generate a control file to describe the loading actions for sqlloader utility to load the data into the table EMP as illustrated below:

load data
infile ‘C:\mine\emp_data.csv’
insert into table emp
fields terminated by “,” optionally enclosed by  `” ` TRAILING NULLCOLS
( table_column1,
 Table_column2,
 Table_column3,
 Table_column4
)

Note: TRAILING NULLCOLS means if the last column is null then treat this as null value, otherwise,  SQL LOADER will treat the record as bad if the last column is null.

After you have written the control file save it and call SQL Loader utility by typing the following command

$ sqlldr userid= scott/tiger control=emp.ctl  log=emp_load.log
                
After you have executed the above command SQL Loader will shows you the output describing how many rows it has loaded.

The LOG option of sql loader specifies where the log file of this sql loader session should be created.  The log file contains all actions which SQL loader has performed i.e. how many rows were loaded, how many were rejected and how much time is taken to load the rows and etc. You have to view this file for any errors encountered while running SQL Loader.

Fixed Data Format ( .dat file)
Example:
10 James   1000   Finance
20 Stuart  2000   Design
30 Mclean  3000  Media
10 Sandra  8000  Architect
20 Natalie  9000  Solutions
SOLUTION:
Steps :-
First Open the file in a text editor and count the length of fields, for above example, employee number is from 1st position to 2nd position, employee name is from 4th position to 10th position, Salary column is from 11th position to 15th position and finally Department column from 16th postion to  21st Position.

load data
infile ‘C:\mine\emp_data.dat’
insert into table emp
(
   table_column1       position(01:02)         integer external,
   table_column2       position(03:10)         char,
   table_column3       position(11:15)         integer external
   table_column4       position(16:21)         char
);
Note:-  The datatypes (INTEGER EXTERNAL, CHAR, DECIMAL EXTERNAL) identify the datatype of data fields in the file, not of corresponding columns in the table.

After writing the control file save it and initiate the SQL Loader utility by typing the following command
$ sqlldr userid= scott/tiger control=emp.ctl  log=emp_load.log

We can simultaneously load data into multiple tables within the same session and using WHEN condition to load only specified rows which meets a particular condition (only equal to “=” and not equal to “<>” conditions are allowed).

Considering the above mentioned data, our requirement is to load the rows having department number equals =10 into one table and  department number not equal to <> 10  to go into another table.
load data
infile ‘C:\mine\emp_data.dat’
append into table scott.emp1
when (dept_no=’ 10’)
(
   table_column1       position(01:02)         integer external,
   table_column2       position(03:10)         char,
   table_column3       position(11:15)         integer external
   table_column4       position(16:21)         char
)
Into table scott.emp2
when (dept_no<>’ 10 ’)
(
   table_column1       position(01:02)         integer external,
   table_column2       position(03:10)         char,
   table_column3       position(11:15)         integer external
   table_column4       position(16:21)         char
);

Note:-          SQL Loader help can be obtained by  typing the command
                                             $sqlldr help=Y
userid          ORACLE username/password
control         Control file name
log              Log file name
bad             Bad file name
data            Data file name
discard         Discard file name
discardmax    Number of discards to allow
skip              Number of logical records to skip
load              Number of logical records to load
errors           Number of errors to allow
bindsize        Size of conventional path bind array in bytes
silent            Suppress messages during run direct, use direct path
parfile           parameter file: name of file that contains parameter specifications
parallel          do parallel load
file               File to allocate extents from
readsize         Size of Read buffer
 
=================
 

SQL Loader Tutorial


Example 1: Loading Variable length Data
Example 1: demonstartes the following
  • A simple control file identifying one table and three columns to be loaded.
  • Including the data to be loaded in the control file itself, so there is no separate data file.
Control File
The control file is example1.ctl
1)   LOAD DATA
2)   INFILE *
3)   INTO TABLE dept_ldr_trg
4)   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
5)   (deptno, dname, loc)
6)   BEGINDATA
   123,SALES,"DELHI"
   124,"ACCOUNTING",HYDERABAD
   125,"PACKING",KOLKATTA
   126,FINANCE,"MUMBAI"
   127,"HR",CHENNAI.
   128,"ADMIN",PUNE
   129,"RECRUITMENT","GURGAON"
   ,TRANSPORT,"NOIDA"
Points to note:
  1. The LOAD DATA statement marks the beginning of the control file.
  2. INFILE * specifies that data is found in the control file and not in an external file.
  3. The INTO TABLE statement is required to identify the table to be loaded. Since no option is mentioned, the default option is INSERT. By default SQL* Loader requires the table to be empty before it inserts any records.
  4. TERMINATED BY ',' specifies that the data is terminated by commas.
OPTIONALLY ENCLOSED BY specifies that the data may be optionally enclosed by Quotations.
  1. The names of the columns to be loaded are enclosed in parentheses.  Since no data type is specified, the default is a character of length 255.
  2. BEGINDATA specifies the beginning of the data.
To run this example:
sqlldr scott/tiger control=example1.ctl log=example1.log
SQL Loader loads the dept_ldr_trg table and creates a log file example1.log.
The last record gets rejected and the same can be checked from the bad file and the log file as the table dept_trg_ldr has a NOT NULL constraint on the deptno column.
Example 2 – Loading Fixed format Fields
Example 2: demonstrates the following
  • A separate data file
  • Data conversions
  • Load dates using the data type DATE.
  • Usage of functions like TO_CHAR, TO_DATE, DECODE, UPPER etc
  • Using a sequence to insert unique values.
  • I
  •  
  • ===================================================================
 

Oracle SQL*Loader


General
Note: This page consists of a series of demonstrations of various SQL*Loader capabilities. It is by no means complete.

SQL Loader Data Types
CHAR
DECIMAL EXTERNAL
INTEGER EXTERNAL
Modes
APPEND
INSERT
REPLACE
TRUNCATE
INFILE
INFILE * or INFILE '<file_name>'
[RECSIZE <integer> BUFFERS <integer>]
INFILE 'mydata.dat' "RECSIZE 80 BUFFERS 8" 
INTO
INTO <table_name>
B
BADFILE

Records with formatting errors or that cause Oracle errors
BADFILE '<file_name>'
BADFILE 'sample.bad'
DISCARDFILE

Records not satisfying a WHEN clause
DISCARDFILE '<file_name>'
DISCARDMAX <integer>
DISCARDFILE 'sample.dsc'
CHARACTERSET
CHARACTERSET <character_set_name>
X
LENGTH
LENGTH [SEMANTICS] <BYTE | CHAR>
CHARACTERSET <character_set_name>
LENGTH
Load Type
APPEND
INSERT
REPLACE
INSERT INTO <table_name>
Terminators
Comma
','
Tab
0x'09'
WHEN
WHEN
I
TRAILING NULLCOLS
TRAILING NULLCOLS
T
Options Clause
BINDSIZE = n
COLUMNARRAYROWS = n
DIRECT = {TRUE | FALSE}
ERRORS = n
LOAD = n
MULTITHREADING = {TRUE | FALSE}
PARALLEL = {TRUE | FALSE}
READSIZE = n
RESUMABLE = {TRUE | FALSE}
RESUMABLE_NAME = 'text string'
RESUMABLE_TIMEOUT = n
ROWS = n
SILENT = {HEADER | FEEDBACK | ERRORS | DISCARDS | PARTITIONS | ALL}
SKIP = n
SKIP_INDEX_MAINTENANCE = {TRUE | FALSE}
SKIP_UNUSABLE_INDEXES = {TRUE | FALSE}
STREAMSIZE = n
OPTIONS (BINDSIZE=100000, SILENT=(ERRORS, FEEDBACK) )
Paths
CONVENTIONAL PATH
DIRECT PATH

All loads demonstrated below are convention with the exception of demo 6.


Assembling Logical Records
CONCATENATE
CONCATENATE

CONTINUEIF
CONTINUEIF THIS

CONTINUEIF
CONTINUEIF NEXT

CONTINUEIF
CONTINUEIF LAST


Demo Tables & Data
Demo Tables
CREATE TABLE dept (
deptno   VARCHAR2(2),
dname    VARCHAR2(20),
loc      VARCHAR2(20));

CREATE TABLE emp (
empno    NUMBER(4),
ename    VARCHAR2(10),
job      VARCHAR2(10),
mgr      NUMBER(4),
hiredate DATE,
sal      NUMBER(8,2),
comm     NUMBER(7,2),
deptno   NUMBER(2),
projno   NUMBER(4),
loadseq  NUMBER(3));

CREATE TABLE proj (
emp      NUMBER(4),
projno   NUMBER(3));

CREATE TABLE funcdemo (
last_name  VARCHAR2(20),
first_name VARCHAR2(20));

CREATE TABLE decodemo (
fld1    VARCHAR2(20),
fld2    VARCHAR2(20));

CREATE TABLE denver_prj (
projno  VARCHAR2(3),
empno   NUMBER(5),
projhrs NUMBER(2));

CREATE TABLE orlando_prj (
projno  VARCHAR2(3),
empno   NUMBER(5),
projhrs NUMBER(2));

CREATE TABLE misc_prj (
projno  VARCHAR2(3),
empno   NUMBER(5),
projhrs NUMBER(2));

Demo 1
Basic import of delimited data with data in the control file
Control File
LOAD DATA
INFILE
*
INTO TABLE <table_name>
FIELDS TERMINATED BY <delimiter>
OPTIONALLY ENCLOSED BY <enclosing character>
(<column_name>, <column_name>, <column_name>)
sqlldr userid=dev/dev control=c:\load\demo1.ctl log=d:\load\sqlldr01.log

Demo 2
Basic import of fixed length data with separate data and control files
Control File

Data File
LOAD DATA
INFILE
<data_file_path_and_name>
INTO TABLE <table_name> (
<column_name>
POSITION(<integer>:<integer>) <data_type>,
<column_name> POSITION(<integer>:<integer>) <data_type>,
<column_name> POSITION(<integer>:<integer>) <data_type>)
sqlldr userid=dev/dev control=c:\load\sqlldr02.ctl log=c:\load\demo02.log

Demo 3
Append of delimited data with data in the control file. This sample demonstrates date formatting, delimiters within delimiters and implementation of record numbering with a SQL*Loader sequence. APPEND indicates that the table need not be empty before the SQL*Loader is run.
Control File
LOAD DATA
INFILE *
APPEND
INTO TABLE emp
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
(<column_name>, <column_name> DATE
"DD-Month-YYYY",
<column_name>
CHAR TERMINATED BY ':',
<column_name>
SEQUENCE(MAX,1))
sqlldr userid=dev/dev control=c:\load\sqldr03.ctl log=c:\load\demo3.log

Demo 4
Replace of fixed length data with separate data and control file. This sample demonstrates specifying a discard file, the maximum number of records to discard (DISCARDMAX), and CONTINUEIF ( where it looks for an asterisk in the first position to determine if a new line has started.
Control File

Data File
LOAD DATA
INFILE 'c:\temp\sqlldr04.dat'
DISCARDFILE 'c:\temp\demo4.dsc'
DISCARDMAX 999
REPLACE
CONTINUEIF THIS (1) = '*'

INTO TABLE emp (
empno        POSITION(1:4)    INTEGER EXTERNAL,
ename        POSITION(6:15)   CHAR,
hiredate    POSITION(52:60)   INTEGER EXTERNAL)
sqlldr userid=dev/dev control=c:\load\sqlldr04.ctl log=c:\load\demo4.log

Demo 5
Loading into multiple tables during an import using the WHEN keyword. The control file loads two different tables making three passes at one of them. Note the problem with the Doolittle record and how it is handled.
Control File

Data File
LOAD DATA
INFILE 'c:\temp\sqlldr05.dat'
BADFILE 'c:\temp\bad05.bad'
DISCARDFILE 'c:\temp\disc05.dsc'
REPLACE

INTO TABLE emp (
empno  POSITION(1:4)   INTEGER EXTERNAL,
ename  POSITION(6:15)  CHAR,
deptno POSITION(17:18) CHAR,
mgr    POSITION(20:23) INTEGER EXTERNAL)

--1st project: proj has two columns, both not null
INTO TABLE proj
WHEN projno != ' ' (
emp POSITION(1:4) INTEGER EXTERNAL,
projno POSITION(25:27) INTEGER EXTERNAL)

-- 2nd project
INTO TABLE proj
WHEN projno != ' ' (
emp POSITION(1:4) INTEGER EXTERNAL,
projno POSITION(29:31) INTEGER EXTERNAL)

-- 3rd project
INTO TABLE proj
WHEN projno != ' ' (
emp POSITION(1:4) INTEGER EXTERNAL,
projno POSITION(33:35) INTEGER EXTERNAL)
sqlldr userid=dev/dev control=c:\load\sqlldr5.ctl log=d:\load\demo5.log

Demo 6
Using the NULLIF and BLANKS keywords to handle zero length strings being loaded into numeric columns. Also note the use of Direct Path Load in the control file (direct=true).
Control File

Data File
LOAD DATA
INFILE 'c:\temp\sqlldr06.dat'
INSERT
INTO TABLE emp
-- SORTED INDEXES (emp_empno)
(
empno  POSITION(01:04) INTEGER EXTERNAL
NULLIF empno=BLANKS,
ename  POSITION(06:15) CHAR,
job    POSITION(17:25) CHAR,
mgr    POSITION(27:30) INTEGER EXTERNAL
NULLIF mgr=BLANKS,
sal    POSITION(32:39) DECIMAL EXTERNAL
NULLIF sal=BLANKS,
comm   POSITION(41:48) DECIMAL EXTERNAL
NULLIF comm=BLANKS,
deptno POSITION(50:51) INTEGER EXTERNAL
NULLIF deptno=BLANKS)
sqlldr userid=dev/dev control=c:\load\sqlldr06.ctl log=c:\load\demo6.log direct=true

Demo 7
Using a buit-in function to modify data during loading
Control File
LOAD DATA
INFILE *
INSERT
INTO TABLE funcdemo
(
LAST_NAME position(1:7) CHAR
"UPPER(:LAST_NAME)",
FIRST_NAME position(8:15) CHAR
"LOWER(:FIRST_NAME)"
)
BEGINDATA
Locke Phil
Cline Jack
sqlldr userid=dev/dev control=c:\load\sqlldr7.ctl log=c:\load\demo7.log

Demo 8
Another example of using a built-in function, in this case DECODE, to modify data during loading
Control File
LOAD DATA
INFILE *
INSERT
INTO TABLE decodemo
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
(
fld1,
fld2
"DECODE(:fld1, 'hello', 'goodbye', :fld1)"
)
BEGINDATA
hello,""
goodbye,""
this is a test,""
hello,""
sqlldr userid=dev/dev control=c:\load\sqlldr08.ctl log=c:\load\demo8.log

Demo 9
Loading multiple files into multiple tables in a singe control file. Note the use of the WHEN keyword.
Control File

Data File

Data File
LOAD DATA
INFILE 'c:\temp\sqlldr9a.dat'
INFILE 'c:\temp\sqlldr9b.dat'

APPEND
INTO TABLE denver_prj
WHEN projno = '101'
(projno position(1:3) CHAR,
empno position(4:8) INTEGER EXTERNAL,
projhrs position(9:10) INTEGER EXTERNAL)

INTO TABLE orlando_prj
WHEN projno = '202'
(projno position(1:3) CHAR,
empno position(4:8) INTEGER EXTERNAL,
projhrs position(9:10) INTEGER EXTERNAL)

INTO TABLE misc_prj
WHEN projno != '101' AND projno != '202'
(projno position(1:3) CHAR,
empno position(4:8) INTEGER EXTERNAL,
projhrs position(9:10) INTEGER EXTERNAL)
sqlldr userid=dev/dev control=c:\load\sqlldr09.ctl log=c:\load\demo9.log

Demo 10
Loading negative numeric values. Note Clark and Miller's records in the data file.
Control File

Data File
LOAD DATA
INFILE 'c:\temp\sqlldr10.dat'
INTO TABLE emp
(
empno  POSITION(01:04) INTEGER EXTERNAL,
ename  POSITION(06:15) CHAR,
job    POSITION(17:25) CHAR,
mgr    POSITION(27:30) INTEGER EXTERNAL,
sal    POSITION(32:39) DECIMAL EXTERNAL,
comm   POSITION(41:48) DECIMAL EXTERNAL,
deptno POSITION(50:51) INTEGER EXTERNAL)
sqlldr userid=dev/dev control=c:\load\sqlldr10.ctl log=c:\load\demo10.log

Demo 11
Loading XML
Control File

Data File
LOAD DATA
INFILE *
INTO TABLE po_tab
APPEND
XMLTYPE (xmldata)
FIELDS
(xmldata CHAR(2000))
BEGINDATA

<?xml version="1.0"?>
<purchaseOrder xmlns="http://www.oracle.com/PO" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.oracle.com/PO http://www.oracle.com/scha0/po1.xsd"
orderDate="1999-10-20">

<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<state>CA</state>
<zip>90952</zip>
</shipTo>

<billTo country="US">
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<city>Old Town</city>
<state>PA</state>
<zip>95819</zip>
</billTo>

<comment>Hurry, my lawn is going wild!</comment>

<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>1999-05-21</shipDate>
</item>
</items>
</purchaseOrder>
sqlldr userid=dev/dev control=c:\load\sqlldr11.ctl log=c:\load\demo11.log

1 comment:

  1. It was really a nice article and i was really impressed by reading this Data Science online Course


    ReplyDelete