故障概述
今天项目上其他公司的运维人员找我说某数据库好像归档满了,并且他直接rman删了3天前的所有归档,然后发现二节点是宕机的,启动二节点数据库后发现2节点undo数据文件需要recover,这个数据库没有备份。(万幸的是,需要recover的只是2节点undo数据文件)。
切记:!!!!归档日志没有备份的情况下,在删除归档日志之前一定要确认是否存在需要使用归档日志recover的数据文件!!!!!否则只有丢数据了!!!
进入正题:
故障原因:
UNDOTBS2表空间上还有active状态的事务(未提交),二节点由于归档目录满了导致数据库宕机,然后Oracle会将undo表空间下的记录着active未提交的段都标志为NEEDS RECOVERY。
处理思路:
1、有备份的情况下,直接备份恢复就可以了。
2、如果没有备份,可以用新建的UNDO替代损坏的UNDO,那么损坏UNDO上的未提交事务也将不得不丢弃。当然NEEDS RECOVERY的回滚段需要先进行处理。
故障处理
检查需要recover的数据文件:
SQL> select file#,status,name from v$datafile where status ='RECOVER';
FILE# STATUS
---------- ---------------------
NAME
---------------------------------------------------
5 RECOVER
+XXXX/XXXX/datafile/undotbs2.267.795022137
1 rows selected
先创建一个新的undo表空间
SQL> create undo tablespace UNDOTBS02 datafile size 10g autoextend on next 2g;
Tablespace created.
查看当前节点(2节点)的表空间,还是需要recover的undotbs2。
SQL> show parameter undo
NAME TYPE VALUE
------------------- ------------ --------------------
undo_management string AUTO
undo_retention integer 900
undo_tablespace string UNDOTBS2
用新创建的undotbs02替换之前的undotbs2
SQL> alter system set undo_tablespace='UNDOTBS02' scope=both;
System altered.
在删除之前的undo表空间的时候报错ora-01548
SQL> alter database datafile 5 offline;
Database altered.
SQL> drop tablespace UNDOTBS2 including contents and datafiles;
drop tablespace UNDOTBS2 including contents and datafiles
*
ERROR at line 1:
ORA-01548: active rollback segment '_SYSSMU11_3993685818$' found, terminate dropping tablespace
这个ora-01538的报错说是回滚段中还有活动的事务。
检查回滚段状态:
SQL> select segment_name,status,tablespace_name from dba_rollback_segs where status not in ('ONLINE','OFFLINE') and tablespace_name='UNDOTBS2';
SEGMENT_NAME STATUS TABLESPACE_NAME
------------------------------ ------------------ -----------------------------
_SYSSMU11_3993685818$ NEEDS RECOVERY UNDOTBS2
_SYSSMU12_3786484004$ NEEDS RECOVERY UNDOTBS2
_SYSSMU13_4206247907$ NEEDS RECOVERY UNDOTBS2
_SYSSMU14_3550840220$ NEEDS RECOVERY UNDOTBS2
_SYSSMU15_1667460038$ NEEDS RECOVERY UNDOTBS2
_SYSSMU16_102810502$ NEEDS RECOVERY UNDOTBS2
_SYSSMU17_4197495598$ NEEDS RECOVERY UNDOTBS2
_SYSSMU18_2803337290$ NEEDS RECOVERY UNDOTBS2
_SYSSMU19_2100581483$ NEEDS RECOVERY UNDOTBS2
_SYSSMU20_3702784048$ NEEDS RECOVERY UNDOTBS2
10 rows selected.
试图将这些回滚段置于offline,然后drop掉。
SQL> alter rollback segment "_SYSSMU11_3993685818$" offline;
Rollback segment altered.
SQL> alter rollback segment "_SYSSMU12_3786484004$" offline;
Rollback segment altered.
SQL> alter rollback segment "_SYSSMU13_4206247907$" offline;
Rollback segment altered.
SQL> alter rollback segment "_SYSSMU14_3550840220$" offline;
Rollback segment altered.
SQL> alter rollback segment "_SYSSMU15_1667460038$" offline;
Rollback segment altered.
SQL> alter rollback segment "_SYSSMU16_102810502$" offline;
Rollback segment altered.
SQL> alter rollback segment "_SYSSMU17_4197495598$" offline;
Rollback segment altered.
SQL> alter rollback segment "_SYSSMU18_2803337290$" offline;
Rollback segment altered.
SQL> alter rollback segment "_SYSSMU19_2100581483$" offline;
Rollback segment altered.
SQL> alter rollback segment "_SYSSMU20_3702784048$" offline;
Rollback segment altered.
drop回滚段时报错ORA-30025:
SQL> drop rollback segment "_SYSSMU11_3993685818$";
drop rollback segment "_SYSSMU11_3993685818$"
*
ERROR at line 1:
ORA-30025: DROP segment '_SYSSMU11_3993685818$' (in undo tablespace) not allowed
将_smu_debug_mode设置成4。这里4代表可以在UNDO自动管理(SMU)下能够进行一些手工的回滚段管理模式下的操作(RBU)
SQL> alter system set"_smu_debug_mode" = 4;
System altered.
使用一下两个隐含参数_OFFLINE_ROLLBACK_SEGMENTS和_CORRUPTED_ROLLBACK_SEGMENTS,将NEEDS RECOVERY的回滚段添加至这两个隐含参数列表并编辑添加至pfile中:
vi init.ora
...
_OFFLINE_ROLLBACK_SEGMENTS=(_SYSSMU11_3993685818$,_SYSSMU12_3786484004$,_SYSSMU13_4206247907$,_SYSSMU14_3550840220$,_SYSSMU15_1667460038$,_SYSSMU16_102810502$,_SYSSMU17_4197495598$,_SYSSMU18_2803337290$,_SYSSMU19_2100581483$,_SYSSMU20_3702784048$)
_CORRUPTED_ROLLBACK_SEGMENTS=(_SYSSMU11_3993685818$,_SYSSMU12_3786484004$,_SYSSMU13_4206247907$,_SYSSMU14_3550840220$,_SYSSMU15_1667460038$,_SYSSMU16_102810502$,_SYSSMU17_4197495598$,_SYSSMU18_2803337290$,_SYSSMU19_2100581483$,_SYSSMU20_3702784048$)
...
关闭数据库后以init.ora文件启动数据库。
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount pfile='/home/oracle/init.ora';
ORACLE instance started.
Total System Global Area 2.6991E+10 bytes
Fixed Size 2213976 bytes
Variable Size 2.5770E+10 bytes
Database Buffers 1073741824 bytes
Redo Buffers 145174528 bytes
Database mounted.
SQL> alter database open;
Database altered.
再次drop状态为NEEDS RECOVERY的回滚段。
SQL> drop rollback segment "_SYSSMU11_3993685818$";
Rollback segment dropped.
SQL> drop rollback segment "_SYSSMU12_3786484004$";
Rollback segment dropped.
SQL> drop rollback segment "_SYSSMU13_4206247907$";
Rollback segment dropped.
SQL> drop rollback segment "_SYSSMU14_3550840220$";
Rollback segment dropped.
SQL> drop rollback segment "_SYSSMU16_102810502$";
Rollback segment dropped.
SQL> drop rollback segment "_SYSSMU15_1667460038$";
Rollback segment dropped.
SQL> drop rollback segment "_SYSSMU17_4197495598$";
Rollback segment dropped.
SQL> drop rollback segment "_SYSSMU18_2803337290$";
Rollback segment dropped.
SQL> drop rollback segment "_SYSSMU19_2100581483$";
Rollback segment dropped.
SQL> drop rollback segment "_SYSSMU20_3702784048$";
Rollback segment dropped.
成功drop不再报错,然后删除之前的undo表空间和数据文件:
SQL> alter system set undo_tablespace='UNDOTBS02' scope=both;
System altered.
SQL> drop tablespace UNDOTBS2 including contents and datafiles;
Tablespace dropped.
将新建的undotbs02表空间成功替换之前的undotbs2,并将undotbs2表空间删除。
SQL> select file#,status,name from v$datafile status='ONLINE';
FILE# STATUS NAME
---------- --------------------- --------------------------------------------------------------------------------
1 SYSTEM +DHOMSAPP/dhomsapp/datafile/system.256.795021933
...
10 ONLINE +DHOMSAPP/dhomsapp/datafile/undotbs02.289.145649813
10 rows selected.
至此,2节点undo表空间成功恢复至online,数据库恢复正常。