Oracle SQL* Loader Concepts
Introduction
SQL*Loader is a bulk loader
utility used for moving data from external files into the Oracle database.
Capabilities
of SQL*Loader
•
Can
load data from multiple data files during the same load session.
•
Can
load data into multiple tables during the same load session.
•
Can
selectively load data (Records can be loaded based on the records' values).
•
Can
manipulate the data before loading it, using SQL functions.
•
Can
generate unique sequential key values in specified columns .
Does
sophisticated error reporting which greatly aids troubleshooting
SQL Loader - How it works
SQL*Loader takes as input a
control file, which controls the behavior of SQL*Loader, and one or more data
files.
Output
of the SQL*Loader is an Oracle database (where the data is loaded), a log file,
a bad file, and potentially a discard file.
Control
File : The SQL*Loader control File is the key to any load process.
The
control file provides the following information to SQL*Loader.
•
The
name and location of the input file
•
The
format of the records in the input data file
•
The
name of table or tables to be loaded
•
The
correspondence between the fields in the input record and the columns in the
database tables being loaded.
•
Selection
criteria defining which records from the input file contain data to be inserted
into the destination database tables
•
The
names and location of the bad file and the discard file.
Data
File : The data file contains the data to be loaded in a specific
format.
Sometimes
if the amount of data is less, it can be embedded as part of the control file.
Bad
File : The bad file contains records rejected, either by SQL*Loader or
by Oracle.
Records
are rejected by SQL*Loader when the input format is invalid.
For
example: If a delimited field exceeds the specified length.
After
a record is accepted for processing by SQL*Loader, a row is sent to Oracle for
insertion.
If
Oracle determines that the row is valid, then the row is inserted into the
database.
If
not, the record is rejected, and SQL*Loader puts it in the bad file.
The
row may be rejected, for example, because a key is not unique, because a
required field is not null, or because the field contains invalid data for the
Oracle data type.
Log
File : The log file contains a detailed summary of the load, including a
description of any errors that occurred during the load.
The
summary of the load includes the number of logical records read from the data
file, the number of records rejected because of errors, the number of records
discarded because of selection criteria, and the elapsed time of the load.
Discard
File : The discard file contains records that were filtered out of the load
because they did not match any record-selection criteria specified in the
control file.
Discard
file is created only when it is needed, and only if specified that a discard
file should be enabled.
Structure of control file
LOAD DATA
INFILE ‘<data file name>’
BADFILE ‘<bad file name>’
DISCARDFILE ‘<discard file name>’
APPEND/TRUNCATE/REPLACE/INSERT
INTO TABLE <table name>
WHEN <condition>
TRAILING NULLCOLS
(column 1
constant,
column 2
<sequence_name>.nextval,
column 3
“<sql function(data field)>”,
…)
<BEGINDATA>
•
LOAD DATA : It signifies the start of control file.
•
INFILE : Specifies the Data file name(s)
•
BADFILE : Specifies Bad file name that gets
created during the SQL loader run
•
DISCARDFILE: Specifies the Discard file name that gets
created
•
APPEND/TRUNCATE/INSERT/REPLACE : Operation that
will be performed on the database table
•
INTO TABLE: Table name to which the data will be
loaded
•
BEGINDATA: Specifies that the subsequent lines of
the control file contains data. In this case INFILE use is not required
•
POSITION: Specifies the position of each column of
data
•
WHEN : Contains the condition(s)
Types of Data files
Fixed
length Data files
All the records in the data file are of
the same byte length
Fields are identified by their positions
in the records.
Example:
Emp No : 1-5
Emp Name : 6-16
Department Name : 17-21
111 JOHN
SALES
222 MARK
HR
333 NANCY
MARKETING
Variable
length Data files
Columns in the input data file have
variable lengths
Fields are identified by a delimiter
Example:
In the following example, a comma separates the input values
111,JOHN,SALES
222,MARK,HR
333,NANCY,MARKETING
The
fields terminated by “,” clause tells SQL*Loader that during the load, each
column value will be terminated by a comma
Load Processing:
A
control file is executed by using SQLLDR command at the operating system
prompt.
Some
input parameters should be given for executing the control file.
Each
load must have an control file.
Syntax
: sqlldr <apps user name>/<apps pwd> control=<control file
name> log=<log file name> data=<data file name>
Oracle SQL Loader Utility
Abstract
The
document is aimed at describing the procedure to be followed while
working with SQL Loader Utility. Using the tool, we can load the
external data coming in various formats like excel sheet, flat text file, comma Separated Variable (.csv) files and so on.
Case History
Database
Administrators may come across some situation, where they need to load
the external data coming from other sources into oracle tables. Before
loading the data, they can restrict/limit/skip the loading with the
usage of Control file (.ctl)
Analysis
External
data will be of two types like Fixed Data and Variable Data and
depending upon the type of the data, we need to generate the creation of
the control file which in turn will convert the utility tool functioned as per the requirement and ensure that only the qualified or eligible data is loaded into the the Tables.
Introduction: As discussed above, the data source will be in the form of various types like .csv and flat text files and hence the loading of the data into the tables has been classified into types namely Fixed Width and Varied data.
Terminology:
1) Control
File: Control file describes the actions to be done by the sql loader
and we can use any text editor to writing a control file. Usually this
will be followed by .ctl file extension.
2) In
File: In file or input file is the name of the data source file like
.csv or .txt which will be given as a input for the control file to look
for the data.
3) Bad
file: Bad file is the file which records the rows which got failed
during the loading operation and the cause for the failure. This is a
Optional file to provide while executing the sql loader at the command
line
4) Discard
File: Discard file is again an optional file, which will store the rows
that are not qualified during the load operation, when a condition like
where or when has been specified as a part of the control file.
5) Loading Options: INSERT, APPEND, REPLACE and TRUNCATE
Insert : Loads rows only if the target table is empty.
Append : Load rows if the target table is empty or not.
Replace : First deletes all the rows in the table and then load rows
Truncate : First truncates the table and then load rows.
Varied Data Format ( .csv format or .txt files)
Example:
“10”,”James”,”1000”,”Finance”
“20”,”Stuart”,”2000”,”Design”
“30”,”Mclean”,”3000”,”Media”
“40”,”Sandra”,”8000”,”Architect”
“50”,”Natalie”,”9000”,”Solutions”
Now
we have to generate a control file to describe the loading actions for
sqlloader utility to load the data into the table EMP as illustrated
below:
load data
infile ‘C:\mine\emp_data.csv’
insert into table emp
fields terminated by “,” optionally enclosed by `” ` TRAILING NULLCOLS
( table_column1,
Table_column2,
Table_column3,
Table_column4
)
Note: TRAILING NULLCOLS
means if the last column is null then treat this as null value,
otherwise, SQL LOADER will treat the record as bad if the last column
is null.
After you have written the control file save it and call SQL Loader utility by typing the following command
$ sqlldr userid= scott/tiger control=emp.ctl log=emp_load.log
After you have executed the above command SQL Loader will shows you the output describing how many rows it has loaded.
The LOG option of sql loader specifies where the log file of this sql loader session should be created. The log file
contains all actions which SQL loader has performed i.e. how many rows
were loaded, how many were rejected and how much time is taken to load
the rows and etc. You have to view this file for any errors encountered
while running SQL Loader.
Fixed Data Format ( .dat file)
Example:
10 James 1000 Finance
20 Stuart 2000 Design
30 Mclean 3000 Media
10 Sandra 8000 Architect
20 Natalie 9000 Solutions
SOLUTION:
Steps :-
First Open the file in a text editor and count the length of fields, for above example, employee number is from 1st position to 2nd position, employee name is from 4th position to 10th position, Salary column is from 11th position to 15th position and finally Department column from 16th postion to 21st Position.
load data
infile ‘C:\mine\emp_data.dat’
insert into table emp
(
table_column1 position(01:02) integer external,
table_column2 position(03:10) char,
table_column3 position(11:15) integer external
table_column4 position(16:21) char
);
Note:- The
datatypes (INTEGER EXTERNAL, CHAR, DECIMAL EXTERNAL) identify the
datatype of data fields in the file, not of corresponding columns in the
table.
After writing the control file save it and initiate the SQL Loader utility by typing the following command
$ sqlldr userid= scott/tiger control=emp.ctl log=emp_load.log
We
can simultaneously load data into multiple tables within the same
session and using WHEN condition to load only specified rows which meets
a particular condition (only equal to “=” and not equal to “<>”
conditions are allowed).
Considering
the above mentioned data, our requirement is to load the rows having
department number equals =10 into one table and department number not
equal to <> 10 to go into another table.
load data
infile ‘C:\mine\emp_data.dat’
append into table scott.emp1
when (dept_no=’ 10’)
(
table_column1 position(01:02) integer external,
table_column2 position(03:10) char,
table_column3 position(11:15) integer external
table_column4 position(16:21) char
)
Into table scott.emp2
when (dept_no<>’ 10 ’)
(
table_column1 position(01:02) integer external,
table_column2 position(03:10) char,
table_column3 position(11:15) integer external
table_column4 position(16:21) char
);
Note:- SQL Loader help can be obtained by typing the command
$sqlldr help=Y
userid ORACLE username/password
control Control file name
log Log file name
bad Bad file name
data Data file name
discard Discard file name
discardmax Number of discards to allow
skip Number of logical records to skip
load Number of logical records to load
errors Number of errors to allow
bindsize Size of conventional path bind array in bytes
silent Suppress messages during run direct, use direct path
parfile parameter file: name of file that contains parameter specifications
parallel do parallel load
file File to allocate extents from
readsize Size of Read buffer
=================
SQL Loader Tutorial
Example 1: Loading
Variable length Data
Example
1: demonstartes the following
- A simple control file identifying one table and three columns to be loaded.
- Including the data to be loaded in the control file itself, so there is no separate data file.
Control File
The
control file is example1.ctl
1) LOAD DATA
2) INFILE *
3) INTO TABLE dept_ldr_trg
4) FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
5) (deptno, dname, loc)
6) BEGINDATA
123,SALES,"DELHI"
124,"ACCOUNTING",HYDERABAD
125,"PACKING",KOLKATTA
126,FINANCE,"MUMBAI"
127,"HR",CHENNAI.
128,"ADMIN",PUNE
,TRANSPORT,"NOIDA"
Points to note:
- The LOAD DATA statement marks the beginning of the control file.
- INFILE * specifies that data is found in the control file and not in an external file.
- The INTO TABLE statement is required to identify the table to be loaded. Since no option is mentioned, the default option is INSERT. By default SQL* Loader requires the table to be empty before it inserts any records.
- TERMINATED BY ',' specifies that the data is terminated by commas.
OPTIONALLY ENCLOSED BY specifies that the
data may be optionally enclosed by Quotations.
- The names of the columns to be loaded are enclosed in parentheses. Since no data type is specified, the default is a character of length 255.
- BEGINDATA specifies the beginning of the data.
To
run this example:
sqlldr
scott/tiger control=example1.ctl log=example1.log
SQL
Loader loads the dept_ldr_trg table and creates a log file example1.log.
The
last record gets rejected and the same can be checked from the bad file and the
log file as the table dept_trg_ldr has a NOT NULL constraint on the deptno
column.
Example 2 – Loading
Fixed format Fields
Example
2: demonstrates the following
- A separate data file
- Data conversions
- Load dates using the data type DATE.
- Usage of functions like TO_CHAR, TO_DATE, DECODE, UPPER etc
- Using a sequence to insert unique values.
- I
- ===================================================================