{"id":228,"date":"2008-09-05T10:32:58","date_gmt":"2008-09-05T15:32:58","guid":{"rendered":"http:\/\/www.dannorris.com\/?p=228"},"modified":"2019-04-01T13:57:59","modified_gmt":"2019-04-01T13:57:59","slug":"ora-1555-after-switchover-to-standby","status":"publish","type":"post","link":"https:\/\/www.dannorris.com\/blog\/2008\/09\/05\/ora-1555-after-switchover-to-standby\/","title":{"rendered":"ORA-1555 after switchover to standby"},"content":{"rendered":"<p>\t\t\t\tThis week started with diagnosing an interesting situation that didn&#8217;t seem to be talked about much (maybe because it is a relatively &#8220;old&#8221; problem&#8211;explanation in a minute), so I thought I&#8217;d share the experience for everyone&#8217;s benefit.<\/p>\n<p>The situation involves a 10.2.0.1 (&lt;gasp&gt;) database on Linux 32-bit RHEL 4. The database is in flashback mode and has a single physical standby database with Data Guard configured in maximum availability mode. The standby was configured relatively recently and had never been tested with a switchover. The switchover was scheduled and conducted without much issue.<!--more--><\/p>\n<p>The switchover to the standby database was done, application brought up against the now activated standby database and there were no issues reported. Some basic testing was done for about 30 minutes, no application errors, no database errors.<\/p>\n<p>Once satisfied that the standby database was viable and worked, the customer wanted the original production database server to be put back into the primary role. Another switchover was executed without issue and the applications were again reconfigured to access the primary database and brought up. It was at this point that the alert log started showing many (100+ per hour, during low utilization period) errors like these:<\/p>\n<pre><code>ORA-01555 caused by SQL statement below (SQL ID: 74rfy9pxwabb4, Query Duration=0 sec, SCN: 0x0003.52c296c8):\n<\/code><\/pre>\n<p>This error was puzzling since the query duration is 0 sec, so it isn&#8217;t a undo_retention issue. These statistics from v$undostat were also puzzling.<\/p>\n<pre><code>\n                            Tuned Undo # Unexpired  # Expired   ORA-1555 Out-Of-space  Max Query\n   INST_ID BEGIN_TIME        Retention      Stolen     Reused      Error        Error     Length\n---------- ---------------- ---------- ----------- ---------- ---------- ------------ ----------\n         1 09\/02\/2008 12:59       1500           0          0          7            0        136\n         1 09\/02\/2008 13:09       1500           0          0          3            0        137\n         1 09\/02\/2008 13:19       1500           0          0          4            0         37\n         1 09\/02\/2008 13:29       1500           0          0         12            0         27\n         1 09\/02\/2008 13:39       1500           0          0          5            0         19\n         1 09\/02\/2008 13:49       1500           0          0          0            0         23\n         1 09\/02\/2008 13:59       1500           0          0          0            0         66\n         1 09\/02\/2008 14:09       1500           0          0         19            0        157\n         1 09\/02\/2008 14:19       1500           0          0         22            0        142\n         1 09\/02\/2008 14:29       1500           0          0         14            0          0\n         1 09\/02\/2008 14:39       1500           0          0         10            0          0\n         1 09\/02\/2008 14:49       1500           0          0         17            0        114\n         1 09\/02\/2008 14:59       1500           0          0          3            0          0\n<\/code><\/pre>\n<p>Seeing that Max Query Length is at least an order of magnitude less than the undo_retention (1500), there&#8217;s no reason why the ORA-1555 errors should be occurring&#8211;especially more than 20 in a 10-minute interval. At this point, I concluded that we must be hitting a bug. After all, these statistics were complete inconsistent with logic and we are on 10.2.0.1, so it seems pretty likely that others have probably hit this issue before given the 3+ years of field experience that 10g Release 2 has.<\/p>\n<p>It took a few tries, but eventually, I located Oracle BUG 5380055 which is documented in Metalink Note 386830.1. It affects environments on 10.1.0.5, 10.2.0.1 or 10.2.0.2 that have flashback database enabled and conduct two switchovers. There are some other scenarios that will trigger the bug as well, but this scenario fit our circumstances.<\/p>\n<p>Luckily, the note suggests that there is a patch. I looked and was lucky to find that 10.2.0.1 does indeed have a patch available for our platform. Following the procedure to install the patch on both primary and standby environments is supposed to prevent the issue from occurring in the future. A second procedure after the patch is installed is required to &#8220;fix&#8221; the issues that were introduced in to the database by the bug. I don&#8217;t really understand the internals of the issue well enough to understand how this fixes it, but the fix is to set these two parameters in the primary instance:<\/p>\n<pre><code>*._allow_error_simulation = TRUE\n*._smu_debug_mode = 268435456\n<\/code><\/pre>\n<p>Then, start the primary instance in restricted mode. Remove the parameters and restart the primary instance in normal mode and the problem(s) should be gone. The standby database will receive the fixes in the redo stream, so there&#8217;s no additional procedures to conduct on the standby side. In our case, it worked well and the entire downtime to install the patch and conduct the restart procedure was about 15 minutes.<\/p>\n<p>Obviously, we long ago recommended to this customer that they update their patch level to 10.2.0.3 or 10.2.0.4. The application they use is custom written in-house, so the only barrier to the upgrade is the time necessary to test the patchset. They will upgrade soon&#8211;I&#8217;m sure this issue will help motivate them even more.\t\t<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This week started with diagnosing an interesting situation that didn&#8217;t seem to be talked about much (maybe because it is a relatively &#8220;old&#8221; problem&#8211;explanation in a minute), so I thought I&#8217;d share the experience for everyone&#8217;s benefit. The situation involves a 10.2.0.1 (&lt;gasp&gt;) database on Linux 32-bit RHEL 4. The database is in flashback mode &hellip; <a href=\"https:\/\/www.dannorris.com\/blog\/2008\/09\/05\/ora-1555-after-switchover-to-standby\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;ORA-1555 after switchover to standby&#8221;<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[30,15,22],"tags":[235,236,40,149,209,211],"class_list":["post-228","post","type-post","status-publish","format-standard","hentry","category-database","category-oracle","category-technical","tag-_allow_error_simulation","tag-_smu_debug_mode","tag-40","tag-ora-1555","tag-standby","tag-switchover"],"_links":{"self":[{"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/posts\/228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/comments?post=228"}],"version-history":[{"count":1,"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/posts\/228\/revisions"}],"predecessor-version":[{"id":526,"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/posts\/228\/revisions\/526"}],"wp:attachment":[{"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/media?parent=228"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/categories?post=228"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dannorris.com\/blog\/wp-json\/wp\/v2\/tags?post=228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}