Reading files from the proc filesystem

July 27, 2014

GravatarBy Michael Snoyman

I was stumped by this one myself for a bit today, so I thought writing it up in a blog post would be a good way to make sure (1) I don't forget this little fact, and (2) hopefully the next person doesn't need to puzzle over this as long as I did. Let's say you want to read the contents of a file in the proc filesystem, such as /proc/uptime. There are many ways to do that in Haskell. Let's ignore any streaming data framework for the moment, and instead focus on just the "string-like" types: String and strict/lazy ByteString/Text. Here's a little program that tries all of them out:

import qualified Data.ByteString      as S
import qualified Data.ByteString.Lazy as L
import qualified Data.Text.IO         as T
import qualified Data.Text.Lazy.IO    as TL

test :: Show a => String -> (FilePath -> IO a) -> IO ()
test name reader = do
    contents <- reader "/proc/uptime"
    putStrLn $ name ++ ": " ++ show contents

main :: IO ()
main = do
    test "String           " readFile
    test "strict ByteString" S.readFile
    test "lazy ByteString  " L.readFile
    test "strict Text      " T.readFile
    test "lazy Text        " TL.readFile

Given that the uptime file is just simple ASCII data, you'd probably assume (like I did) that all of these will produce the same result. In fact, that's not the case. On my system, the results are:

String           : "60740.70 136144.86\n"
strict ByteString: ""
lazy ByteString  : "60740.70 136144.86\n"
strict Text      : "60740.70 136144.86\n"
lazy Text        : "60740.70 136144.86\n"

Strict ByteString reading is returning an empty value! Why is this happening? It's actually quite easy to see once you throw in two new pieces of information. First, let's look at the implementation of Data.ByteString.readFile:

readFile :: FilePath -> IO ByteString
readFile f = bracket (openBinaryFile f ReadMode) hClose
    (\h -> hFileSize h >>= hGet h . fromIntegral)

Notice how we allocate a buffer exactly the right size to read in the entire contents of the file. We don't do this with any of the file reading functions. For the lazy variants, we don't want to read the entire file into memory at once. And for strict Text, knowing the size of the file doesn't tell us the size of the buffer we need to allocate, due to variable length encoding. So this nifty optimization only applies to strict ByteStrings.

Now piece of data number two:

$ ls -l /proc/uptime 
-r--r--r-- 1 root root 0 Jul 27 13:56 /proc/uptime

Huh, the file is empty! As is well documented, virtually every file in the proc filesystem is listed as empty, and the contents are generated on demand by the kernel.

So how do you read the file contents into a strict ByteString? There are actually plenty of approaches that work. In my case, I ended up just writing a helper function using conduit:

    localReadFile fp =
         IO.withBinaryFile fp IO.ReadMode $ \h ->
         sourceHandle h $$ foldC

But probably the simplest thing to do is to just convert a lazy ByteString into a strict ByteString, e.g. fmap L.toStrict . L.readFile.

Comments

comments powered by Disqus

Archives