SUMMARY-Problems in solaris reading large nfs mounted directories.

Susan Thielen (thielen@irus.rri.uwo.ca)
Fri, 30 May 1997 09:14:43 -0400 (EDT)

Well, that was fast.. a solution first thing in the morning...

The solution -

Use the 32bitclients option in the /etc/exports file on the SGI.

The question

>
>Hi!
>
>I have a few directories on SGI's running IRIX 6.2 that my Sparc 10
>running Solaris 2.5.1 cannot access.. It's the oddest thing... They
>have more than 512 entries... and this happens....
>
>cd reco1
>ra.irus.rri.uwo.ca# ls
>NFS readdir+ failed for server diamond.irus: error 2 (RPC: Can't decode result)
>NFS readdir failed for server diamond.irus: error 2 (RPC: Can't decode result)
>ra.irus.rri.uwo.ca#
>
>Now I can read it on a Sparc 10 running 4.1.3_U1 no problem. But there is
>quite a while before the directory does list...
>
>Is this a timeout problem?? Where can I fine tune this?? Help!! I have
>people doing reconstructions that need this solved!!!
>

Much thanks to

Casper Dik <casper@holland.Sun.COM>
Donald Molaro <molaro@canuck.com>
Kevin.Sheehan@uniq.com.au (Kevin Sheehan {Consulting Poster Child})
Ulla Fischer <ulla@dmi.min.dk>
bismark@alta.Jpl.Nasa.Gov (Bismark Espinoza)

Ulla provided the most comprehensive solution, which I'll include here...

>From ulla@dmi.min.dk Fri May 30 07:03:31 1997
>Date: Fri, 30 May 1997 11:03:13 GMT
>From: Ulla Fischer <ulla@dmi.min.dk>
>To: thielen@irus.rri.uwo.ca
>Subject: Re: Problems in solaris reading large nfs mounted directories...
>
>This is reported as bug id 1247376:
>
> Bug Id: 1247376
> Category: network
> Subcategory: nfs
> State: closed
> Synopsis: ls -l on large dir generates RPC: Can't decode result on readdir on 2.5 client
> Description:
>
>
>Details of the problem as described by the customer:
>
>
>Under certain circumstances (typically involving a directory with a large
>number of entries), a Solaris NFS 3 client fails to correctly read the results
>of a correct NFS Version 3 Readdirplus or readdir response. To recreate the
>problem, one need only issue an "ls -l" on the right directory on an SGI IRIX
>machine. The problem was even worse in the early release of NFS v3, but that
>is not really relevant.
>
>The Sun client prints the messages:
>
># ls -l
>NFS readdir+ failed for server neteng: error 2 (RPC: Can't decode result)
>NFS readdir failed for server neteng: error 2 (RPC: Can't decode result)
>total 0
>#
>
>Snoop shows the Sun client doing a readdirplus request:
>
>NFS: ----- Sun NFS -----
>NFS:
>NFS: Proc = 17 (Read from directory - plus)
>NFS: File handle = 86410032BC215A94000A00000000004E
>NFS: 02C16987000A00000000000000000080
>NFS: Cookie = 0
>NFS: Verifier = 0000000000000000
>NFS: Dircount = 1048
>NFS: Maxcount = 8192
>NFS:
>
>And the SGI machine sending a valid response:
>
>RPC: ----- SUN RPC Header -----
>RPC:
>RPC: Transaction id = 2690142209
>RPC: Type = 1 (Reply)
>RPC: This is a reply to frame 17
>RPC: Status = 0 (Accepted)
>RPC: Verifier : Flavor = 0 (None), len = 0 bytes
>RPC: Accept status = 0 (Success)
>RPC:
>NFS: ----- Sun NFS -----
>NFS:
>NFS: Proc = 17 (Read from directory - plus)
>NFS: Status = 0 (OK)
>NFS: Post-operation attributes:
>NFS: File type = 2 (Directory)
>NFS: Mode = 0755
>NFS: Setuid = 0, Setgid = 0, Sticky = 0
>NFS: Owner's permissions = rwx
>NFS: Group's permissions = r-x
>NFS: Other's permissions = r-x
>NFS: Link count = 8, User ID = 3739, Group ID = 10
>NFS: File size = 12288, Used = 1572864
>NFS: Special: Major = 14, Minor = 0
>NFS: File system id = 50331652, File id = 46229895
>NFS: Last access time = 01-Apr-96 23:25:42.344810309 GMT
>NFS: Modification time = 18-Mar-96 21:53:59.368977050 GMT
>NFS: Attribute change time = 18-Mar-96 21:53:59.368977050 GMT
>NFS:
>NFS: Cookie verifier = 0000000000000000
>NFS:
>NFS: ------------------ entry #1
>NFS: File ID = 46229895
>NFS: Name = .
>NFS: Cookie = 1099511633710
>NFS: Post-operation attributes:
>NFS: File type = 2 (Directory)
>NFS: Mode = 0755
>NFS: Setuid = 0, Setgid = 0, Sticky = 0
>NFS: Owner's permissions = rwx
>NFS: Group's permissions = r-x
>NFS: Other's permissions = r-x
>NFS: Link count = 8, User ID = 3739, Group ID = 10
>NFS: File size = 12288, Used = 1572864
>NFS: Special: Major = 14, Minor = 0
>NFS: File system id = 50331652, File id = 46229895
>NFS: Last access time = 01-Apr-96 23:25:42.344810309 GMT
>NFS: Modification time = 18-Mar-96 21:53:59.368977050 GMT
>NFS: Attribute change time = 18-Mar-96 21:53:59.368977050 GMT
>NFS:
>NFS: File handle = 86410032BC215A94000A00000000004E
>NFS: 02C16987000A00000000000000000080
>NFS: ------------------ entry #2
>NFS: File ID = 42297867
>NFS: Name = ..
>NFS: Cookie = 1099532764776
>...
>
>But notice the cookies returned by the server, e.g. 1099532764776, which
>= 0x10001428668, which won't fit in 32 bits.
>
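A quick way to check that arithmetic, as a minimal Python sketch that is not part of the bug report. The two cookie values are copied from the snoop trace above; the helper name fits_in_32_bits is made up for illustration.

```python
# Cookies copied from the snoop trace above (entries "." and "..").
cookies = [1099511633710, 1099532764776]


def fits_in_32_bits(value):
    """Return True if value can be stored in an unsigned 32-bit field."""
    return 0 <= value <= 0xFFFFFFFF


for cookie in cookies:
    # Show the decimal value, its hex form, and how many bits it needs.
    print(f"{cookie} = {cookie:#x}, needs {cookie.bit_length()} bits, "
          f"fits in 32 bits: {fits_in_32_bits(cookie)}")
```

Both values lie just above 2^40, so each needs 41 bits and neither fits in a 32-bit field.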
>The problem is that Solaris is trying to cram a 64 bit opaque entity into
>a 32 bit field in a dirent structure. Since it is opaque, the client should
>do nothing but remember it and pass it back. Instead the Solaris code
>does something like this in nfs3_xdr.c (I am quoting code from the ONC+ early
>release of v3, which had MAXOFF_T set to something less than 0xffffffff.
>I believe in 2.5 it was set to 0xffffffff so that at least it let in all 32
>bits. The line numbers are valid for rev 1.9 of nfs3_xdr.c):
>line 19:
> #pragma ident "@(#)nfs3_xdr.c 1.9 94/05/12 SMI"
>line 1704:
> xdr_getdirpluslist(register XDR *xdrs, READDIRPLUS3resok *objp)
> {
> .....
>line 1746:
> if (cookie > (cookie3)MAXOFF_T)
> return (FALSE);
>
>It is this bogus examination of cookie in line 1746 that is the culprit.
>Since cookie is opaque, the client shouldn't try to interpret what it contains.
>
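The effect of the range check at line 1746 can be illustrated with a small Python sketch. This is not the Solaris code: both function names are hypothetical, and MAXOFF_T is set to 0xffffffff only because the report believes that was the value in 2.5.

```python
MAXOFF_T = 0xFFFFFFFF  # assumed value, per the report's guess for Solaris 2.5


def decode_cookie_with_range_check(cookie):
    """Mimic the reported check: refuse any cookie larger than MAXOFF_T."""
    if cookie > MAXOFF_T:
        return None  # stands in for "return (FALSE)", i.e. a decode failure
    return cookie


def decode_cookie_opaque(cookie):
    """Treat the cookie as opaque: keep it as-is so it can be passed back."""
    return cookie


server_cookie = 1099532764776  # cookie for ".." in the snoop trace
print("range check:", decode_cookie_with_range_check(server_cookie))
print("opaque:     ", decode_cookie_opaque(server_cookie))
```

With the range check, the whole reply is rejected as soon as one cookie exceeds 32 bits, which matches the "Can't decode result" messages. Treating the cookie as opaque never inspects its value.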
>
>
> Work around:
>According to the exports man page on IRIX 6.2, the -32bitclients option can be
>used to generate 32-bit cookies instead of 64-bit ones.
>
> Integrated in releases:
> Duplicate of:
> Patch id:
> See also:
> Summary:
>Large NFS V3 READDIR cookies are not accepted by Solaris 2.5 client
>
>
>
>The exports manpage on IRIX 6.3 says about 32bitclients:
>
> 32bitclients
> Causes the server to mask off the high order 32 bits of
> directory cookies in NFS version 3 directory operations. This
> option may be required when clients run 32-bit operating
> systems that assume the entire cookie is contained in 32 bits
> and reject responses containing version 3 cookies with high
> bits on. IRIX 5.3 and Solaris 2.5 are examples of 32-bit
> operating systems with this behavior, which produces error
> messages like "Cannot decode response" on directory operations.
> XFS filesystems on the server can generate cookies with high
> bits on. Exporting filesystems with the 32bitclients option
> causes these bits to be masked and prevents error messages.
>
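What "mask off the high order 32 bits" means arithmetically, as a minimal Python sketch applied to the two cookies from the snoop trace. It only illustrates the bit operation; how IRIX actually implements the option is not described here, and the name mask_cookie is made up.

```python
LOW_32_BITS = 0xFFFFFFFF


def mask_cookie(cookie):
    """Keep only the low-order 32 bits of a 64-bit directory cookie."""
    return cookie & LOW_32_BITS


for cookie in (1099511633710, 1099532764776):
    masked = mask_cookie(cookie)
    print(f"{cookie:#x} -> {masked:#x} ({masked})")
```

The masked values are 5934 and 21137000, both small enough to pass a client-side check against 0xffffffff.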
> A filesystem name that is not followed by a name list is exported to
> everyone. A ``#'' anywhere in the file indicates a comment extending to
> the end of the line on which it appears. A backslash (\) at the end of a
> line permits splitting long lines into shorter ones.
>
>So your entries in the exports file should look like this:
>/data/decode -ro,32bitclients
>/data/GDB/94 -ro,32bitclients
>
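For a server with many exports, a rough Python sketch like the following could list the entries that still lack the option. It assumes only the syntax quoted above ('#' comments, backslash continuation, options in a field starting with '-'); the function names and the second sample entry are hypothetical, and it should be tried on a copy of the exports file first.

```python
def logical_lines(text):
    """Join backslash-continued lines, strip '#' comments, skip blank lines."""
    joined = text.replace("\\\n", " ")
    for raw in joined.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            yield line


def exports_missing_option(text, option="32bitclients"):
    """Return the exported paths whose option list does not contain option."""
    missing = []
    for line in logical_lines(text):
        fields = line.split()
        path = fields[0]
        options = []
        for field in fields[1:]:
            if field.startswith("-"):
                # Options are comma-separated after the leading '-'.
                options.extend(field[1:].split(","))
        if option not in options:
            missing.append(path)
    return missing


sample = """\
/data/decode -ro,32bitclients
/data/GDB/94 -ro   # hypothetical entry without the option
"""
print(exports_missing_option(sample))
```

To check a real file, read it with open(path).read() and pass the text to exports_missing_option.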
> Yours,
>Ulla Fischer
>Danmarks Meteorologiske Institut Email: ulla@dmi.min.dk
>Edb-afdelingen Phone: + 45 39 15 75 00
>Lyngbyvej 100 Phone: + 45 39 15 75 54
>2100 Koebenhavn 0 Fax: + 45 39 27 75 01
>Denmark
>

Susan KJ Thielen System/Network Manager
Imaging Lab, Robarts Research Institute Phone: (519) 663-5777x4029
PO Box 5015, 100 Perth Drive Fax: (519) 663-3900
London, ON N6A 5K8 Canada E-mail: thielen@irus.rri.uwo.ca