A Comprehensive Guide
Becoming a proficient Oracle Database
Administrator (DBA) involves a deep understanding of database architecture,
robust backup and recovery strategies, keen performance tuning skills, and
stringent security protocols. This guide delves into these critical areas,
offering insights based on the experience of an Oracle DBA and the foundational
concepts of Oracle databases.
The Role of an Oracle Database Administrator: A Professional's Perspective
Joe Black, an Oracle Database
Administrator for over 8 years, brings a wealth of experience to the field. He
began his career as a research analyst, managing and updating European Buy-side
Databases, and later transitioned to a Senior Account Manager assisting clients
with customized CRMs. This progression from an "end-user" account
manager to a database administrator highlights a valuable journey through
various roles, gaining diverse experience.
Joe currently works at the U.S.
Department of State in Edina, MN, providing DBA services to clients within the
Metro Area and collaborating with the Preston IT Consulting team on contractual
bids. His daily activities are multifaceted and critical for maintaining
database health:
- Checking emails for outstanding requests (via Remedy) and resolving them based on priority.
- Checking alert logs for major changes, errors, warnings, and messages. The alert log captures significant events and initialization parameters upon startup.
- Primarily using Oracle Enterprise Manager (OEM) to stay apprised of database issues.
- Checking critical tablespaces to ensure sufficient space, preventing performance slowdowns.
- Validating backups to ensure successful completion, a crucial part of his routine.
- Running simulated backup and recovery scenarios in testing environments.
- Monitoring databases for performance bottlenecks.
- Working with development teams to create new databases or test new applications.
- Reviewing updates and patches for Oracle.
He manages databases using tools such
as TOAD, OEM, SQL Developer, and SQL Loader. His experience spans Oracle
versions 10g, 11g, and 12c across various platforms including Linux, UNIX,
Solaris, and Windows. His current team consists of 5 DBAs, who work closely
with system administrators, network administrators, and application developers.
They support over 200 databases in production, development, and testing
environments, ranging from 100 GB to 23 terabytes, with approximately 7,000
concurrent users. Their databases currently run on Oracle version 11.2.0.4 on
Red Hat Enterprise Linux 7.3.
Understanding Oracle Database Fundamentals
An Oracle database is composed of both
logical and physical structures.
- Physical Structures, which are viewable from the
operating system, include:
- Control files: These are crucial binary files that record the physical structure of the database, including its name, timestamps, datafile locations and statuses, redo log file locations, archiving mode, tablespace information, checkpoint data, and backup files. Oracle recommends at least two control files, but having three on different disks is preferred for redundancy and availability, preventing a single point of failure. If a control file is corrupted or inaccessible due to media failure, the instance will shut down.
- Online redo log files: These contain information necessary to re-create any database transactions and are multiplexed for reliability. The Log Writer (LGWR) writes to all members of a redo log group simultaneously. A "log switch" occurs when LGWR moves to a new group. In noarchivelog mode, older redo log files are overwritten, while in archivelog mode, log archiving happens before reuse. The LGWR writes to the online redo log files before the Database Buffer Writer (DBWn) writes dirty blocks to data files, a process called write-ahead protocol, fundamental to data integrity.
- Archived redo log files: Copies of online redo log files used for recovery purposes.
- Other physical files: Password files, parameter files, data files, alert logs, trace files, and backup files.
- Logical Structures, which cannot be viewed from the
operating system, include:
- Tablespace: Comprises at least one datafile,
physically located on the server's file system. A datafile belongs to exactly
one tablespace, and all database objects reside within a tablespace.
- Data block: The smallest unit of Oracle storage, where data is stored. Oracle I/O operates on data blocks, whose size is a multiple of the underlying operating system block size.
- Extents: A specific number of contiguous data blocks obtained in a single allocation, used to store information as part of a segment.
- Segment: Consists of one or more extents allocated within a tablespace. There are four types: Data, Index, Undo, and Temporary segments.
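For reference, the structures described above can be inspected through the data dictionary and dynamic performance views. The queries below are a minimal sketch; the 'HR' schema is only an example:

    -- Physical structures: control files, online redo logs, datafiles
    SELECT name FROM v$controlfile;
    SELECT group#, member FROM v$logfile;
    SELECT file_name, tablespace_name, bytes FROM dba_data_files;

    -- Logical structures: tablespaces, segments, extents, and block size
    SELECT tablespace_name, block_size FROM dba_tablespaces;
    SELECT segment_name, segment_type, extents, blocks
      FROM dba_segments
     WHERE owner = 'HR';   -- example schema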
An Oracle Instance comprises the
memory structure and background processes.
- Memory Structure (SGA/PGA):
- Required components:
- Database buffer cache: Keeps recently used data in memory.
- Shared pool: Contains the library cache (for SQL/PLSQL and recently executed statements) and data dictionary cache (for data definitions like datafiles, tables, indexes, privileges, users). Caching these improves performance.
- Redo log buffer: Keeps transaction log information (redo records).
- Optional components:
- Large pool: Caches data for large operations like RMAN backup/recovery.
- Java pool: Caches Java objects/codes.
- Streams pool: Caches queued messages.
- Background Processes:
- System Monitor (SMON): Performs system-level cleanup, instance recovery, and cleans unused temporary segments.
- Process Monitor (PMON): Cleans up after failed user connections, resets active transaction tables, releases locks, and removes process IDs.
- Log Writer (LGWR): Writes redo records from the log buffer to online redo log files on disk.
- Database Buffer Writer (DBWn): Writes dirty blocks from the Database Buffer Cache to datafiles.
- Checkpoint (CKPT): Signals checkpoints and updates datafile headers and control files with the latest System Change Number (SCN), keeping datafiles and control files synchronized and ensuring system-wide consistency.
- Archiver (ARCn): Copies online redo log files to archived redo log files, when enabled.
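These instance components can be confirmed on a running system with a couple of dictionary queries; a small sketch (output varies by release):

    -- SGA components (buffer cache, shared pool, large/java/streams pools, ...)
    SELECT component, current_size FROM v$sga_dynamic_components;

    -- Background processes currently running (SMON, PMON, LGWR, DBWn, CKPT, ARCn, ...)
    SELECT name, description FROM v$bgprocess WHERE paddr <> '00';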
Strategic Backup and Recovery (B/R)
Backup and recovery are crucial for guaranteeing data recoverability and business continuity. The ultimate goal is to minimize
Mean Time to Recover (MTTR). Joe's strategy is guided by factors such as
database size, Service Level Agreements (SLAs), the environment, available disk
space, and the volume of data changes.
He primarily uses RMAN (Recovery
Manager) for B/R due to its many benefits:
- Ability to perform incremental backups.
- Leverages the Data Recovery Advisor for diagnosing and repairing data failures and corruption.
- Enables easy automation of Backup and Recovery.
- Allows verification of backup usability and checks for corrupted files, with the ability to recover corrupted blocks without taking datafiles offline.
- Offers simpler backup and recovery commands.
Joe enables Autobackup Controlfile
and Backup Optimization in RMAN configuration. He utilizes the Fast
Recovery Area (FRA) as the default B/R file location because it acts as a
central storage area, automatically manages disk space, performs operations
faster, and offers increased reliability as disk is safer than tape.
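A rough sketch of that configuration follows; the size and path values are placeholders, not taken from Joe's environment:

    RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
    RMAN> CONFIGURE BACKUP OPTIMIZATION ON;

    -- FRA quota and location are set at the database level (example values)
    SQL> ALTER SYSTEM SET db_recovery_file_dest_size = 200G SCOPE=BOTH;
    SQL> ALTER SYSTEM SET db_recovery_file_dest = '/u01/app/oracle/fra' SCOPE=BOTH;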
Backup strategies vary by database
size:
- Small databases (below 300 GB): Daily full export backups and daily Archivelog backups.
- Medium databases (300 GB – 3 TB): Incremental level 0 backups on Sundays and daily level 1 incremental backups Monday to Saturday, with archivelogs backed up hourly. For the larger databases in this range, a Data Domain appliance is used for backup storage instead of the FRA.
- Large databases (over 3 TB): Full hot backup scheduled quarterly on a Sunday (during a slow period) with hourly archivelog backups. Data Domain is also used for storage here.
Joe enables Oracle block change
tracking to improve incremental backup performance by recording data block
changes. This allows RMAN to identify changed blocks without scanning all data
blocks. He performs a full level 0 hot backup every Sunday, followed by level 1
incremental backups Monday to Saturday as overnight jobs, coinciding with low
traffic periods. Production databases are always run in archivelog mode.
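That weekly cycle corresponds roughly to the commands below; the tracking-file path is only an example:

    -- Enable block change tracking so level 1 backups read only changed blocks
    ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
      USING FILE '/u01/app/oracle/bct/change_tracking.f';

    -- Sunday: level 0 incremental (all used blocks)
    RMAN> BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;

    -- Monday to Saturday: level 1 incremental (changed blocks only)
    RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;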
For additional safety, he maintains a redundancy
set including the last backup of data and control files, multiplexed copies
of current redo log files, copies of the current control file, and all archived
redo logs since the last backup. Tape copies of backup files are maintained at
an offsite location using Oracle Secure Backup, with encryption for security.
A key decision for DBAs is hot
versus cold backups:
- Hot backups are performed while the database is online and running, so the backup is taken in an inconsistent state; RMAN can perform whole-database hot backups.
- Cold backups are done when the database is shut down, ensuring a consistent state in which data is static and SCNs (System Change Numbers) are consistent.
Incremental backups can be
differential (default) or cumulative:
- Differential (the default): Each level 1 backup contains only the blocks changed since the most recent level 1 or level 0 backup. For recovery, the level 0 backup and all subsequent level 1 backups are needed.
- Cumulative: Each level 1 backup contains all blocks changed since the last level 0 backup. For recovery, only the level 0 backup and the most recent level 1 backup are needed. A level 0 incremental backup copies all blocks containing data, similar to a full backup, but serves as the starting point of an incremental strategy.
- To increase backup speed, in addition to block change tracking, increasing the number of RMAN channels (the default is 1) helps when backing up large datafiles, as sketched below.
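For illustration, the two incremental styles and the channel setting look like this in RMAN; the parallelism value is only an example:

    -- Differential level 1 (the default): blocks changed since the most recent level 1 or 0
    RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE;

    -- Cumulative level 1: all blocks changed since the last level 0
    RMAN> BACKUP INCREMENTAL LEVEL 1 CUMULATIVE DATABASE;

    -- More channels (default is 1) parallelize backups of large datafiles
    RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 4;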
- Recovery is typically performed in mount mode,
except for control file recovery. The process generally involves restoring
files, then recovering them by applying redo log files and incremental backups.
- Recovering a lost or damaged control file with autobackup enabled often involves starting the instance in nomount mode, restoring the controlfile from autobackup, then mounting, restoring, recovering the database, and finally opening with resetlogs. Alternatively, if control files are multiplexed, one can use operating system commands to copy a good control file over a bad one or update the CONTROL_FILES parameter.
- Recovering a lost datafile involves starting the instance in mount mode, restoring the datafile, recovering it by applying redo, and then opening the database normally; RESETLOGS is only required after incomplete recovery or recovery with a backup control file.
- Incomplete recovery is used to recover to a specific point in time before the most recent log transaction, either by time or change. Complete recovery runs until no more logs are available.
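The control file and datafile scenarios above map to RMAN sequences along these lines; this is a sketch only, and datafile 4 is just an example:

    -- Lost control file, with controlfile autobackup enabled
    RMAN> STARTUP NOMOUNT;
    RMAN> RESTORE CONTROLFILE FROM AUTOBACKUP;
    RMAN> ALTER DATABASE MOUNT;
    RMAN> RESTORE DATABASE;
    RMAN> RECOVER DATABASE;
    RMAN> ALTER DATABASE OPEN RESETLOGS;

    -- Lost datafile (complete recovery, so no RESETLOGS needed)
    RMAN> STARTUP MOUNT;
    RMAN> RESTORE DATAFILE 4;
    RMAN> RECOVER DATAFILE 4;
    RMAN> ALTER DATABASE OPEN;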
Ensuring Database Security
Database security focuses on
preventing unauthorized use of the database and its components. It’s important
to note that security breaches are often caused by internal users, not
just external hackers. Joe's security strategy is based on Cisco Systems'
Triple A principles (Authentication, Authorization, and Auditing) and Data
Encryption via Transparent Data Encryption (TDE).
- Authentication: Granting database access through credential validation. This involves managing password profiles (failed login attempts, password lifetime, grace time) and enforcing password complexity (e.g., at least one number and one alphabetical character, not a simple reversal of username).
- Authorization: Limiting access for authenticated users. Oracle follows the Principle of Least Privilege, meaning users cannot create sessions or manipulate objects without explicit privileges. Joe avoids using the ADMIN OPTION when granting privileges to maintain clear oversight.
- Auditing: Monitoring user activity. To be efficient and manage space, Fine Grained Auditing (FGA) is used to audit specific selects, inserts, or updates on the most sensitive objects.
- Data Encryption: Using Transparent Data Encryption (TDE), introduced in Oracle 10g, for sensitive data. TDE automatically manages encryption/decryption keys and allows indexing of encrypted columns. It uses the Advanced Encryption Standard (AES) algorithm by default, which is recommended by the US government and safer than DES. An Oracle Wallet is a container for storing authentication and signing credentials, crucial for TDE's master key. There are encryption wallets (manual opening after startup) and auto-open wallets (automatic opening), with encryption wallets generally recommended unless dealing with unattended Oracle Data Guard environments.
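A condensed sketch of those four controls; the profile limits, the user name app_user, and the HR.EMPLOYEES/SSN objects are assumptions for illustration:

    -- Authentication: password profile (verify function from utlpwdmg.sql enforces complexity)
    CREATE PROFILE secure_users LIMIT
      FAILED_LOGIN_ATTEMPTS    5
      PASSWORD_LIFE_TIME       90
      PASSWORD_GRACE_TIME      7
      PASSWORD_VERIFY_FUNCTION verify_function_11g;
    ALTER USER app_user PROFILE secure_users;

    -- Authorization: least privilege, granted without ADMIN OPTION
    GRANT CREATE SESSION TO app_user;
    GRANT SELECT ON hr.employees TO app_user;

    -- Auditing: fine-grained audit policy on a sensitive column
    BEGIN
      DBMS_FGA.ADD_POLICY(
        object_schema   => 'HR',
        object_name     => 'EMPLOYEES',
        policy_name     => 'AUDIT_SALARY',
        audit_column    => 'SALARY',
        statement_types => 'SELECT,UPDATE');
    END;
    /

    -- Encryption: TDE column encryption (the wallet must already be created and open)
    ALTER TABLE hr.employees MODIFY (ssn ENCRYPT USING 'AES256');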
Performance Tuning: An Ongoing Challenge
Performance tuning is not an exact
science, as many factors can cause poor database performance. Joe's approach
involves a systematic diagnosis:
1. Check Connection/Network: Verify if the LISTENER is running (using lsnrctl status) and ping the network to rule out router issues.
2. Check System Resources: If the network is good, analyze excessive resource consumption (CPU, memory, I/O) using tools like System Activity Report (SAR), TOP, IOSTAT, and FREE.
3. Address Oracle-Related Problems:
- Literal values versus bind variables: Using literal values forces a hard parse for every statement, while bind variables allow parsed statements to be reused from the library cache, reducing parse overhead and library cache contention.
- Row-lock contention: Identify and kill blocking sessions by querying v$session.
- Unusable indexes: If DML statements make an index unusable, the Cost Based Optimizer (CBO) will perform full table scans instead of index scans, slowing performance. Rebuilding the index is the solution.
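To make these three items concrete, here is a small sketch; the table, index, and session identifiers are placeholders:

    -- Bind variable instead of a literal, so the parsed cursor can be reused
    VARIABLE v_id NUMBER
    EXEC :v_id := 1001
    SELECT * FROM orders WHERE order_id = :v_id;

    -- Find blocking sessions, then kill the blocker (sid,serial# from the query)
    SELECT sid, serial#, blocking_session
      FROM v$session
     WHERE blocking_session IS NOT NULL;
    ALTER SYSTEM KILL SESSION '123,45';

    -- Identify and rebuild an unusable index
    SELECT owner, index_name FROM dba_indexes WHERE status = 'UNUSABLE';
    ALTER INDEX hr.emp_name_idx REBUILD;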
4. Diagnostic Reports:
- Automatic Diagnostic Repository (ADR): Stores Alert logs, Health monitor reports, and Trace files, accessed via adrci.
- Automatic Workload Repository (AWR): Contains snapshots of database statistics from the SGA, stored in the SYSAUX tablespace, collected hourly by default. Used for analyzing performance between two points in time via awrrpt.sql.
- Active Session History (ASH): Provides information from the past 10 minutes, unlike AWR. Accessed via ashrpt.sql.
- Automatic Database Diagnostic Monitor (ADDM): Analyzes AWR snapshots and recommends actions for severe performance issues (e.g., expensive SQL, I/O, locking, memory allocation).
- Explain Plan: Shows the execution plan chosen by the Oracle optimizer for a SQL statement, helping to understand query performance and whether indexes or full table scans are used.
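For example, an execution plan can be generated and displayed as follows (the query itself is illustrative); the AWR and ASH reports are run from the supplied scripts noted in the comments:

    EXPLAIN PLAN FOR
      SELECT * FROM hr.employees WHERE department_id = 10;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

    -- AWR and ASH reports are generated from SQL*Plus:
    --   @?/rdbms/admin/awrrpt.sql
    --   @?/rdbms/admin/ashrpt.sql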
5. Statistics: The database automatically collects optimizer statistics via the GATHER_STATS_JOB daily during maintenance windows. It collects statistics for tables without stats or with stale stats (more than 10% data change). For objects undergoing huge changes, manual scheduling of stats collection might be needed.
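Where the automatic job is not enough, statistics can be gathered manually; the schema and table names here are placeholders:

    -- One heavily changing table
    EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SALES', tabname => 'ORDERS')

    -- Or an entire schema
    EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'SALES')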
6. Indexes: Speed up access to table rows.
- When to index: When selecting a small portion of a table, for primary key/unique key/not-null columns, secondary indexes on frequently queried columns (e.g., city), foreign keys, predicate columns, and columns used in joins. Data warehousing can have many indexes due to its query-oriented nature.
- When NOT to index: If a query retrieves more than 10-15% of rows, or with large numbers of DML operations in OLTP environments (too many indexes are detrimental).
- Types: B-tree indexes (default, best for OLTP, high cardinality like SSN), and Bitmap indexes (ideal for low cardinality, few distinct values like Sex/State, good for data warehousing, problematic with many DMLs).
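By way of illustration (table and column names are assumptions):

    -- B-tree index: high-cardinality column in an OLTP system
    CREATE INDEX emp_ssn_idx ON employees (ssn);

    -- Bitmap index: low-cardinality column, better suited to reporting workloads
    CREATE BITMAP INDEX emp_gender_idx ON employees (gender);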
7. Hints: Can be used to override the CBO's execution plan, forcing Oracle to use an index when it is known to be better than a full table scan.
8. Views: Virtual tables formed by a query, requiring the CREATE VIEW privilege.
9. Materialized Views: Real objects, not virtual. They allow statistics and reports to run without significantly affecting the database, because they are refreshed later to pick up changes rather than being maintained in real time the way indexes are.
10. Table Partitioning: Enhances performance, manageability, and availability by subdividing large tables and indexes. Common types include Range Partitioning (often on dates) and Hash Partitioning (when data is not historical or lacks an obvious partitioning key). Recommended for tables larger than 2 GB or containing historical data. Sketches of a hint, a materialized view, and a partitioned table follow below.
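The sketches below show a hint, a materialized view, and a range-partitioned table; all object names and date boundaries are illustrative:

    -- Hint forcing an index the CBO might otherwise skip
    SELECT /*+ INDEX(e emp_dept_idx) */ *
      FROM employees e
     WHERE department_id = 10;

    -- Materialized view refreshed on demand, so reports avoid hitting the base table
    CREATE MATERIALIZED VIEW sales_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
    AS SELECT product_id, SUM(amount) AS total_amount
         FROM sales
        GROUP BY product_id;

    -- Range partitioning by date for a large historical table
    CREATE TABLE sales_history (
      sale_id   NUMBER,
      sale_date DATE,
      amount    NUMBER
    )
    PARTITION BY RANGE (sale_date) (
      PARTITION p2016 VALUES LESS THAN (TO_DATE('2017-01-01', 'YYYY-MM-DD')),
      PARTITION p2017 VALUES LESS THAN (TO_DATE('2018-01-01', 'YYYY-MM-DD')),
      PARTITION pmax  VALUES LESS THAN (MAXVALUE)
    );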
Key Operational Concepts
- PFILE vs. SPFILE:
- PFILE (Parameter File): A text file, the only option before Oracle 9i, containing static initialization parameters. Changes require a database restart, which is problematic for 24/7 uptime.
- SPFILE (Server Parameter File): Introduced in 9i, it's a dynamically maintained binary file. It contains both dynamic (changes take effect immediately without restart) and static parameters, making it superior for continuous operation.
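A short sketch of converting to an SPFILE and changing parameters dynamically; the parameter values are examples:

    -- One-time conversion from the text PFILE to a binary SPFILE
    CREATE SPFILE FROM PFILE;

    -- Dynamic parameter: changed in memory and in the SPFILE, no restart required
    ALTER SYSTEM SET sga_target = 4G SCOPE=BOTH;

    -- Static parameter: can only be recorded for the next startup
    ALTER SYSTEM SET processes = 500 SCOPE=SPFILE;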
Database Startup and Shutdown:
Startup stages:
1. NOMOUNT: Reads the spfile/pfile, allocates SGA memory, and starts the background processes, creating an instance. Alert log entries are written.
2. MOUNT: Oracle opens and reads the control files, locating datafiles and redo log files. Operations like renaming datafiles or enabling/disabling archivelog mode can be done here.
3. OPEN: Datafiles and redo log files are opened. If they are inconsistent, SMON performs instance recovery, and the database becomes available.
Shutdown stages: A checkpoint is performed, dirty blocks are written, and control file and datafile headers are updated for consistency. Datafiles and redo log files are then closed, the database is dismounted, and the control files are closed. Finally, the instance is removed from memory.
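In SQL*Plus the same stages can be stepped through explicitly, for example:

    -- Instance only: SGA allocated, background processes started
    STARTUP NOMOUNT
    -- Control files read; datafile and redo log locations known
    ALTER DATABASE MOUNT;
    -- Datafiles and redo logs opened; users can connect
    ALTER DATABASE OPEN;

    -- Clean shutdown: checkpoint, files closed, database dismounted, instance removed
    SHUTDOWN IMMEDIATE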
Tablespace Management:
- Temporary tablespaces: Used to manage space for database sort operations and global temporary tables. Assigning them to users prevents them from allocating sort space in the SYSTEM tablespace. Benefits include preventing sort space errors and efficient parallel execution.
- Undo tablespace: Essential for maintaining undo information (records of transaction actions before commitment). Undo records are used for transaction rollbacks, database recovery, read consistency, Flashback Query, and Flashback features to recover from logical corruptions. Oracle 9i introduced automatic undo management via undo tablespaces, strongly recommended over rollback segments.
- OLTP (Online Transaction Processing): Systems designed for frequent, short, online transactions. Examples include order entry, retail sales, and financial systems.
- Oracle Data Guard: Ensures high availability, data protection, and disaster recovery. It creates, maintains, manages, and monitors standby databases as copies of the production database. In case of primary database unavailability (planned or unplanned), Data Guard can switch a standby to the primary role, minimizing downtime.
- Configuration: One primary database and one or more standby databases, with redo logs shipped and applied to standbys.
Types of Standby Databases:
- Physical Standby: Physically identical copy, synchronized via Redo Apply (recovering redo data).
- Logical Standby: Contains the same logical information but can have different physical organization. Synchronized via SQL Apply (transforming redo data into SQL statements). Allows concurrent user access for queries, reporting, and enables almost no downtime upgrades.
- Snapshot Standby: A fully updatable standby that archives redo logs but doesn't apply them until converted back to a physical standby, discarding local updates.
Operations:
- Switchover: A planned role reversal between primary and standby with no data loss.
- Failover: Occurs when the primary database fails, transitioning a standby to the primary role. Can be configured for zero data loss.
- Data Guard Broker: A distributed management framework that automates configuration, maintenance, and monitoring of Data Guard.
- Protection Modes: Max Availability (zero data loss, sync transport of redo, availability priority), Max Protection (sync transport of redo, zero data loss), Max Performance (performance priority, potential data loss).
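As a sketch of the broker-driven operations, assuming a configuration in which the databases are named prod and prod_stby (both names are illustrative):

    DGMGRL> CONNECT sys@prod
    -- Planned role reversal with no data loss
    DGMGRL> SWITCHOVER TO 'prod_stby';
    -- Unplanned loss of the primary
    DGMGRL> FAILOVER TO 'prod_stby';

    -- Protection mode set on the primary (SQL*Plus)
    SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;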
- Oracle ASM (Automatic Storage Management): A file system manager that optimizes storage for database files, supporting features like striping and mirroring.
- ASM Instance: A lightweight memory structure with its own background processes that maintains metadata about files and disk groups.
- Disk Group: A group of disks (LUNs or JBODs) across which striping is done for performance.
- ASM Redundancy: Mirroring of files within a disk group for availability. Types include External (the storage array handles mirroring), Normal (ASM maintains two copies; the default), and High (ASM maintains three copies).
- Fail Groups: Subsets of disks within a disk group that share hardware whose failure must be tolerated together; ASM places mirrored copies in different fail groups, so mirroring across separate controllers or servers improves fault tolerance, as sketched below.
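As a sketch, a disk group with normal redundancy and two fail groups might be created from the ASM instance like this; the disk paths are placeholders:

    CREATE DISKGROUP data NORMAL REDUNDANCY
      FAILGROUP controller_a DISK '/dev/mapper/asm_disk1', '/dev/mapper/asm_disk2'
      FAILGROUP controller_b DISK '/dev/mapper/asm_disk3', '/dev/mapper/asm_disk4';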
- Transportable Tablespaces: A feature to copy a set of tablespaces between Oracle databases.
- Benefits: Much faster than export/import because datafiles are directly copied, and only metadata is transferred via utilities like Data Pump or imp/exp. Useful for data warehousing partitions, publishing structured data, copying read-only versions, and archiving historical data.
- Process: Requires placing tablespaces in read-only mode during transport. Supports cross-platform transport starting with Oracle Database 10g, especially if platforms have the same endianness. If endianness differs, a conversion step is needed.
- Limitations: Source and target databases must use the same character set. Cannot transport to a database with an existing tablespace of the same name (though renaming is possible). Objects with underlying/contained objects are only transportable if all are in the set. Specific limitations apply to XMLType tables and Advanced Queues. Compatibility level of the transportable set must be equal to or lower than the target database's compatibility.
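A sketch of the source-side steps, assuming a tablespace named SALES_TS and a pre-created Data Pump directory object (both hypothetical):

    -- Confirm the tablespace set is self-contained
    EXEC DBMS_TTS.TRANSPORT_SET_CHECK('SALES_TS', TRUE)
    SELECT * FROM transport_set_violations;

    -- Make the tablespace read-only for the duration of the copy
    ALTER TABLESPACE sales_ts READ ONLY;

    -- Export only the metadata (run from the shell):
    --   expdp system DIRECTORY=dp_dir DUMPFILE=sales_ts.dmp TRANSPORT_TABLESPACES=sales_ts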
Troubleshooting Common Scenarios
- Database Hung (ORA-00257: Archiver Full): A common reason for a hung database is space shortage, particularly if the archiver is full. Users cannot execute DML. Solutions include changing retention policies to purge archive logs, increasing DB_RECOVERY_FILE_DEST_SIZE, backing up FRA contents, or deleting obsolete backup files with RMAN. Other causes for a hung database can include poorly performing SQL statements, blocking locks, latch contention, or insufficient PGA sizing. If the Fast Recovery Area (FRA) fills up, the database won't process new work or spawn new connections.
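The space-related fixes correspond roughly to commands like these; the size value is an example:

    -- Give the FRA more room, or...
    SQL> ALTER SYSTEM SET db_recovery_file_dest_size = 300G SCOPE=BOTH;

    -- ...back up and purge archived logs, and remove obsolete backups
    RMAN> BACKUP ARCHIVELOG ALL DELETE INPUT;
    RMAN> DELETE OBSOLETE;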
- ORA-01652: Unable to extend temp segment in tablespace: This error, often misleading, indicates insufficient space in a permanent tablespace where a temporary segment (e.g., for index creation) is being built, not the actual TEMP tablespace. The solution is to add a datafile to the offending permanent tablespace.
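For example, assuming the offending permanent tablespace is USERS and the file path is a placeholder:

    ALTER TABLESPACE users
      ADD DATAFILE '/u01/oradata/orcl/users02.dbf' SIZE 4G AUTOEXTEND ON;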