Tas pats (kas) tehniskās piezīmes -- Entries on 18th September 2010

18th September 2010

2:06am: bread: Cannot read the block
ReiserFS example
http://smartmontools.sourceforge.net/badblockhowto.html#reiserfs_ex

This section was written by Joachim Jautz with additions from Manfred Schwarb.

The following problems were reported during a scheduled test:

smartd[575]: Device: /dev/hda, starting scheduled Offline Immediate Test.
[... 1 hour later ...]
smartd[575]: Device: /dev/hda, 1 Currently unreadable (pending) sectors
smartd[575]: Device: /dev/hda, 1 Offline uncorrectable sectors

[Step 0] The SMART self-test/error log (see smartctl -l selftest) reported a problem at block address 58656333, i.e. the 512-byte sector with that LBA. The partition table (see e.g. sfdisk -luS /dev/hda or fdisk -ul /dev/hda) showed that this sector lay in the /dev/hda3 partition, which contained a ReiserFS file system. That partition started at block address 54781650.
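The Step 0 lookup comes down to checking start <= LBA < start + size for each partition. Below is a minimal shell sketch. It assumes you have already reduced the partition table to "device start size" lines in 512-byte sectors (real sfdisk/fdisk output has more columns). partition_for_lba is a made-up name. Apart from the /dev/hda3 start, the numbers in the table are invented.

```shell
#!/bin/sh
# Print the partition whose sector range contains a given LBA ($1).
# Input on stdin: one "device start size" line per partition, all in
# 512-byte sectors (a hand-made table; real sfdisk/fdisk output must be
# reduced to this form first).
partition_for_lba() {
    awk -v lba="$1" '
        # A sector belongs to a partition if start <= lba < start + size.
        lba >= $2 && lba < $2 + $3 { print $1; found = 1 }
        # Exit status 1 if no partition contains the LBA.
        END { exit (found ? 0 : 1) }
    '
}

# Illustrative table: only the /dev/hda3 start (54781650) comes from the
# text; all other numbers are made up.
partition_for_lba 58656333 <<'EOF'
/dev/hda1 63 1028097
/dev/hda2 1028160 53753490
/dev/hda3 54781650 23374035
EOF
# prints: /dev/hda3
```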

While doing the initial analysis, it may also be useful to record the disk attributes returned by smartctl -A /dev/hda. Note in particular the values of the "Reallocated_Sector_Ct" and "Reallocated_Event_Count" attributes for ATA disks, or the length of the grown defect list (GLIST) for SCSI disks. If these values have increased by the end of the procedure, the disk has reallocated one or more sectors.
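To keep that baseline, a small helper can extract one attribute's raw value from smartctl -A output. This sketch assumes the default smartctl attribute table, where RAW_VALUE is the 10th column. smart_raw is a hypothetical name. The sample line is illustrative and was not taken from this disk.

```shell
#!/bin/sh
# Print the RAW_VALUE (10th column of the default smartctl -A table)
# of the attribute named in $1, reading smartctl -A output on stdin.
smart_raw() {
    awk -v name="$1" '
        $2 == name { print $10; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Typical use (needs root and the real disk), saving the baseline:
#   smartctl -A /dev/hda | smart_raw Reallocated_Sector_Ct > realloc.before
# Offline demo on an illustrative attribute line:
printf '  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0\n' |
    smart_raw Reallocated_Sector_Ct
# prints: 0
```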

[Step 1] Get the file system's block size:

# debugreiserfs /dev/hda3 | grep '^Blocksize'
Blocksize: 4096

[Step 2] Calculate the block number:

# echo "(58656333-54781650)*512/4096" | bc -l
484335.37500000000000000000
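You can do the same conversion with the shell's integer arithmetic, which truncates just like taking the integer part of the bc result. The fractional part 0.375 also shows which 512-byte sector inside the 4 KB block is bad: 0.375 * 8 = 3. The sketch below uses fs_block and sector_in_block, both names made up for this example.

```shell
#!/bin/sh
# $1 = bad LBA (512-byte sectors), $2 = partition start LBA,
# $3 = file system block size in bytes (a multiple of 512).

# File system block containing the bad sector. Dividing by the number of
# sectors per block (instead of multiplying by 512 first) avoids overflow.
fs_block() {
    echo $(( ($1 - $2) / ($3 / 512) ))
}

# Index (0-based) of the bad 512-byte sector inside that block.
sector_in_block() {
    echo $(( ($1 - $2) % ($3 / 512) ))
}

fs_block 58656333 54781650 4096         # prints 484335
sector_in_block 58656333 54781650 4096  # prints 3
```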

It is reassuring that the calculated 4 KB address of the damaged block in /dev/hda3 is less than the "Count of blocks on the device" value shown in the full output of debugreiserfs /dev/hda3 (i.e. without the grep).
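You can script that sanity check. This sketch assumes the debugreiserfs output contains a line of the form "Count of blocks on the device: N". block_in_fs is a hypothetical name. The count in the demo input is made up.

```shell
#!/bin/sh
# Succeed if block number $1 is below the block count reported by
# debugreiserfs (output read on stdin). Assumes a line like
#   Count of blocks on the device: N
block_in_fs() {
    count=$(sed -n 's/^ *Count of blocks on the device: *\([0-9][0-9]*\).*/\1/p')
    # Fail if the line was not found or the block is out of range.
    [ -n "$count" ] && [ "$1" -lt "$count" ]
}

# Typical use (needs root):
#   debugreiserfs /dev/hda3 | block_in_fs 484335 && echo ok
# Offline demo; the count below is invented:
if printf 'Blocksize: 4096\nCount of blocks on the device: 2921754\n' |
    block_in_fs 484335
then
    echo "484335 lies inside the file system"
fi
```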

[Step 3] Try to get more information about this block. Reading the block fails, as expected, but at least we now see that it appears to be unused. If we do not get the "Cannot read the block" error, we should check whether our calculation in [Step 2] was correct ;)

# debugreiserfs -1 484335 /dev/hda3
debugreiserfs 3.6.19 (2003 http://www.namesys.com)

484335 is free in ondisk bitmap
The problem has occurred looks like a hardware problem.

If you have bad blocks, we advise you to get a new hard drive, because once you get one bad block that the disk drive internals cannot hide from your sight, the chances of getting more are generally said to become much higher (precise statistics are unknown to us), and this disk drive is probably not expensive enough for you to risk your time and data on it. If you don't want to follow that advice then if you have just a few bad blocks, try writing to the bad blocks and see if the drive remaps the bad blocks (that means it takes a block it has in reserve and allocates it for use for of that block number). If it cannot remap the block, use badblock option (-B) with reiserfs utils to handle this block correctly.

bread: Cannot read the block (484335): (Input/output error).

Aborted

So it looks like we have the right (i.e. faulty) block address.
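When several blocks have to be checked, you can classify the debugreiserfs -1 output automatically. This sketch keys on the two messages in the report above. It assumes the wording does not change between reiserfsprogs versions. It uses 2>&1 in the usage line because it is not certain which of these messages go to stderr. classify_block_report is a made-up name.

```shell
#!/bin/sh
# Summarize a debugreiserfs -1 report read on stdin, keyed on the two
# messages seen above: "... is free in ondisk bitmap" and
# "bread: Cannot read the block ...".
classify_block_report() {
    report=$(cat)
    case $report in
        *"Cannot read the block"*) readable=no ;;
        *) readable=yes ;;
    esac
    case $report in
        *"is free in ondisk bitmap"*) bitmap=free ;;
        *) bitmap=unknown ;;
    esac
    echo "readable=$readable bitmap=$bitmap"
}

# Typical use (needs root):
#   debugreiserfs -1 484335 /dev/hda3 2>&1 | classify_block_report
# Offline demo on the two key lines of the report above:
printf '484335 is free in ondisk bitmap\nbread: Cannot read the block (484335): (Input/output error).\n' |
    classify_block_report
# prints: readable=no bitmap=free
```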

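[Step 4] Then try to find the affected file by reading every file in the file system [3]. Replace /mydir with the directory to scan, e.g. the mount point of /dev/hda3. The pipe through cat matters: GNU tar may skip reading file contents when its archive is /dev/null.

tar -cO /mydir | cat >/dev/null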

If you do not find any unreadable files, then the block may be free or located in some metadata of the file system.
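If tar's error messages are hard to pick out, a find-based variant lists only the files that cannot be read to the end. This is a swapped-in technique, not the one from the original how-to. The sketch below uses unreadable_files, a made-up name, and keeps /mydir as a placeholder.

```shell
#!/bin/sh
# List every regular file under $1 (staying on that file system) that
# cannot be read to the end. Each file is read with cat; a read error,
# such as EIO on a bad sector, makes cat fail, so "! -exec" is true
# and find prints the path.
unreadable_files() {
    find "$1" -xdev -type f ! -exec sh -c 'cat -- "$1" > /dev/null 2>&1' sh {} \; -print
}

# Typical use:
#   unreadable_files /mydir
# Offline demo on a scratch directory:
dir=$(mktemp -d)
printf 'ok\n' > "$dir/good.txt"
unreadable_files "$dir"   # prints nothing: every file reads cleanly
rm -r "$dir"
```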

[Step 5] Try your luck: hammer the affected block with badblocks -n (non-destructive read-write mode; unmount the file system first). If you are very lucky, the failure is transient and you can provoke a reallocation [4]:

# badblocks -b 4096 -p 3 -s -v -n /dev/hda3 `expr 484335 + 100` `expr 484335 - 100`

[5]

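Check whether this worked with debugreiserfs -1 484335 /dev/hda3. (badblocks takes the last block to test before the first one, which is why 484335 + 100 comes before 484335 - 100.) If the block is still unreadable: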
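You can compute the badblocks window instead of typing it. This sketch only prints the command and does not run badblocks. badblocks_cmd is a hypothetical helper. The 100-block margin is the one used above, and the first block is clamped at 0 for blocks near the start of the partition.

```shell
#!/bin/sh
# Print (do not run) a non-destructive badblocks command covering
# $4 blocks on either side of suspect block $2 on device $1,
# with file system block size $3.
badblocks_cmd() {
    first=$(( $2 - $4 ))
    if [ "$first" -lt 0 ]; then
        first=0    # a window cannot start before block 0
    fi
    last=$(( $2 + $4 ))
    # badblocks argument order: last_block, then first_block
    echo "badblocks -b $3 -p 3 -s -v -n $1 $last $first"
}

badblocks_cmd /dev/hda3 484335 4096 100
# prints: badblocks -b 4096 -p 3 -s -v -n /dev/hda3 484435 484235
```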

[Step 6] Perform this step only if Step 5 failed to fix the problem: overwrite the block with zeros to force a reallocation. Whatever the block contained is lost.

# dd if=/dev/zero of=/dev/hda3 count=1 bs=4096 seek=484335
1+0 records in
1+0 records out
4096 bytes transferred in 0.007770 seconds (527153 bytes/sec)
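dd's seek= counts in units of bs, so seek=484335 with bs=4096 starts writing at byte 484335 * 4096 of /dev/hda3. The sketch below checks that behaviour on a scratch file rather than a disk. Block 5 of an 8-block file stands in for block 484335. The sketch assumes a GNU, BSD or BusyBox userland for mktemp and head -c.

```shell
#!/bin/sh
# Byte-oriented tr regardless of the user's locale.
LC_ALL=C; export LC_ALL

img=$(mktemp)
# 8 blocks of 4096 bytes, every byte 0xFF (octal 377).
head -c $((8 * 4096)) /dev/zero | tr '\0' '\377' > "$img"

# Zero block 5 exactly as Step 6 zeroes block 484335. conv=notrunc keeps
# the rest of a regular file intact (a block device is never truncated).
dd if=/dev/zero of="$img" bs=4096 seek=5 count=1 conv=notrunc 2>/dev/null

# Count NUL bytes in the whole file and in block 5 alone.
zeros=$(tr -cd '\0' < "$img" | wc -c | tr -d ' ')
in_block5=$(dd if="$img" bs=4096 skip=5 count=1 2>/dev/null | tr -cd '\0' | wc -c | tr -d ' ')
echo "zeroed bytes: $zeros, of which in block 5: $in_block5"
# prints: zeroed bytes: 4096, of which in block 5: 4096
rm -f "$img"
```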

[Step 7] If you can't rule out the bad block being in metadata, do a file system check:

# reiserfsck --check /dev/hda3

This can take a long time, so you had better go for lunch ...

[Step 8] Proceed as described earlier: for example, sync the disk and run a long self-test (smartctl -t long /dev/hda), which should now succeed.
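Step 8 can close the loop with the baseline recorded in Step 0. This sketch compares the before and after raw values of Reallocated_Sector_Ct (or the GLIST length). realloc_report is a made-up name. An unchanged count does not necessarily mean nothing happened, because a drive may also clear a pending sector by rewriting it in place.

```shell
#!/bin/sh
# Compare a reallocation counter recorded before the repair ($1)
# with the value read after the long self-test ($2).
realloc_report() {
    if [ "$2" -gt "$1" ]; then
        echo "reallocated $(( $2 - $1 )) more sector(s)"
    else
        echo "no new reallocations recorded"
    fi
}

# Typical sequence on the real disk (needs root):
#   sync
#   smartctl -t long /dev/hda       # start the long self-test
#   smartctl -l selftest /dev/hda   # read the result once it has finished
#   smartctl -A /dev/hda            # read the counters again
realloc_report 0 1   # prints: reallocated 1 more sector(s)
```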
