Make a unique filesystem path, without creating the file

Tue Feb 16 12:36:50 EST 2016

On Tue, 16 Feb 2016 04:56 pm, Ben Finney wrote:

> An example::
> 
>     import io
>     import os
>     import tempfile
>     names = tempfile._get_candidate_names()

I'm not sure that calling a private function of the tempfile module is
better than calling a deprecated function.

>     def test_frobnicates_configured_spungfile():
>         """ ‘foo’ should frobnicate the configured spungfile. """
> 
>         fake_file_path = os.path.join(tempfile.gettempdir(), next(names))

At this point, you have a valid pathname, but no guarantee whether it refers
to a real file on the file system or not. That's the whole problem with
tempfile.makepath -- it can return a file name which is not in use, but by
the time it returns to you, you cannot guarantee that it still doesn't
exist.
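
For contrast, here is a minimal sketch of the race-free alternative,
tempfile.mkstemp, which creates and opens the file in a single step instead
of handing back a bare name. The variable names and the "scratch data"
contents are only illustrative.

```python
import os
import tempfile

# mkstemp() creates the file and opens it in one step, so no other
# process can claim the same name between "pick a name" and "create it".
fd, path = tempfile.mkstemp()
try:
    exists_now = os.path.exists(path)  # the file is already on disk
    with os.fdopen(fd, "w") as f:
        f.write("scratch data\n")
finally:
    # The caller is responsible for removing the file.
    os.unlink(path)
```

The price is that the file really exists on disk, which is exactly what a
mocked test is trying to avoid.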

Now, since this is a test which doesn't actually open that file, it doesn't
matter. There's no actual security vulnerability here. So your test doesn't
actually require that the file is unique, or that it doesn't actually
exist. (Which is good, because you can't guarantee that it doesn't exist.)

So why not just pick a random bunch of characters?

    import random
    import string

    chars = list(string.ascii_letters)
    random.shuffle(chars)
    fake_file_path = ''.join(chars[:10])

>         fake_file = io.BytesIO(
>                 "Lorem ipsum, dolor sit amet".encode("utf-8"))
>
>         patch_builtins_open(
>                 when_accessing_path=fake_file_path,
>                 provide_file=fake_file)

There's nothing apparent in this that requires that fake_file_path not
actually exist, which is good since (as I've pointed out before) you cannot
guarantee that it doesn't exist. One could just as easily, and just as
correctly, write:

        patch_builtins_open(
                when_accessing_path='/foo/bar/baz',
                provide_file=fake_file)

and regardless of whether /foo/bar/baz actually exists or not, you are
guaranteed to get the fake file rather than the real file. So I question
whether you actually need this tempfile.makepath function at all.
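
To make that concrete, here is a rough sketch of what a helper like
patch_builtins_open might look like. I haven't seen the real one, so this
version is an assumption: it is written as a context manager on top of
unittest.mock, and it passes every other path through to the real open().

```python
import builtins
import io
from contextlib import contextmanager
from unittest import mock


@contextmanager
def patch_builtins_open(when_accessing_path, provide_file):
    """Hypothetical helper: open() returns provide_file for one path only."""
    real_open = builtins.open

    def fake_open(path, *args, **kwargs):
        if path == when_accessing_path:
            return provide_file
        # Any other path falls through to the real open().
        return real_open(path, *args, **kwargs)

    with mock.patch("builtins.open", side_effect=fake_open):
        yield


fake_file = io.BytesIO("Lorem ipsum, dolor sit amet".encode("utf-8"))
with patch_builtins_open("/foo/bar/baz", fake_file):
    # Whether or not /foo/bar/baz exists on disk, we get the fake file.
    data = open("/foo/bar/baz", "rb").read()
```

The path is only ever used as a dictionary-style key by the patched open(),
so the filesystem never gets a say in the matter.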

*But* having questioned it, for the sake of the argument I'll assume you do
need it, and continue accordingly.

>         system_under_test.config.spungfile_path = fake_file_path
>         system_under_test.foo()
>         assert_correctly_frobnicated(fake_file)
> 
> So the test case creates a fake file, makes a valid filesystem path to
> associate with it, then patches the ‘open’ function so that it will
> return the fake file when that specific path is requested.
> 
> Then the test case alters the system under test's configuration, giving
> it the generated filesystem path for an important file. The test case
> then calls the function about which the unit test is asserting
> behaviour, ‘system_under_test.foo’. When that call returns, the test
> case asserts some properties of the fake file to ensure the system under
> test actually accessed that file.

Personally, I think it would be simpler and easier to understand if, instead
of patching open, you allowed the test to read and write real files:

    file_path = '/tmp/spam'
    system_under_test.config.spungfile_path = file_path
    system_under_test.foo()
    assert_correctly_frobnicated(file_path)
    os.unlink(file_path)

In practice, I'd want to only unlink the file if the test passes. If it
fails, I'd want to look at the file to see why it wasn't frobnicated.
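
A minimal sketch of that idea follows. The frobnicate function is a made-up
stand-in for system_under_test.foo, and I've swapped the fixed '/tmp/spam'
for a file inside a directory from tempfile.mkdtemp so the example doesn't
clobber anything.

```python
import os
import tempfile


def frobnicate(path):
    """Hypothetical stand-in for system_under_test.foo()."""
    with open(path, "w") as f:
        f.write("frobnicated\n")


def run_test():
    workdir = tempfile.mkdtemp()
    file_path = os.path.join(workdir, "spam")
    frobnicate(file_path)
    with open(file_path) as f:
        assert f.read() == "frobnicated\n"
    # Only reached when the assertion passed; on failure the file is
    # left behind so it can be inspected.
    os.unlink(file_path)
    os.rmdir(workdir)
    return file_path


checked_path = run_test()
```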

I think that a correctly-working filesystem is a perfectly reasonable
prerequisite for the test, just like a working CPU, memory, power supply,
operating system and Python interpreter. You don't have to guard against
every imaginable failure ("fixme: test may return invalid results if the
speed of light changes by more than 0.0001%"), and you might as well take
advantage of real files for debugging. But that's my opinion, and if you
have another, that's your personal choice.

> With a supported standard library API for this – ‘tempfile.makepath’ for
> example – the generation of the filesystem path would change from four
> separate function calls, one of which is a private API::
> 
>     names = tempfile._get_candidate_names()
>     fake_file_path = os.path.join(tempfile.gettempdir(), next(names))
> 
> to a simple public function call::
> 
>     fake_file_path = tempfile.makepath()

Nobody doubts that your use of tempfile.makepath is legitimate for your
use-case. But it is *not* legitimate for the tempfile module: adding the
equivalent function, tempfile.mktemp, was a mistake in the first place,
hence its deprecation.
Assuming that your test suite needs this function, your test library, or
test suite, should provide that function, not tempfile. I believe it is
unreasonable to expect the tempfile module to keep a function which is a
security risk in the context of "temp files" just because it is useful for
some completely unrelated use-cases.

After all, your use of this doesn't actually have anything to do with
temporary files. It is a mocked *permanent* file, not a real temporary one.
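
Such a helper could be as small as the sketch below. The name make_fake_path
and the default directory are invented for illustration; nothing here
touches the disk, and nothing promises the path is unused.

```python
import os
import uuid


def make_fake_path(directory="/nonexistent-test-dir"):
    """Hypothetical test helper: return a path-shaped string.

    The result is never created or checked on disk; it only needs to be
    distinct enough to key a patched open().
    """
    return os.path.join(directory, uuid.uuid4().hex)


fake_file_path = make_fake_path()
```

Since it lives in the test suite, it carries none of the security baggage
that comes with the word "temp".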

> This whole thread began because I expected such an API would exist.
> 
> 
>> I don't see how it is useful to have a notion of a filepath at all
>> in this case, and therefore I don't see why you would want a
>> mktemp-like function available.
> 
> Because the system under test expects to be dealing with a filesystem,
> including normal restrictions on filesystem paths.

Yes, but the system doesn't try to enforce the filesystem's rules, does it?
Apart from simple restrictions like "path must be a string", I wouldn't
expect your system to make rules like:

- file names must be 8 characters followed by a dot followed by 3
characters;

- paths must not contain ASCII nulls;

etc. That's for the file system to enforce. So you don't really know ahead
of time what "normal restrictions" exist. (Are newlines allowed in
filenames? ASCII null bytes? limited to 2**16 path components?)

So your test can completely ignore any and all restrictions, and your
monkeypatched open() will be perfectly happy to deal with them:

    fake_file_path = ""  # Empty string.
    patch_builtins_open(
            when_accessing_path=fake_file_path,
            provide_file=fake_file)

and your system shouldn't care.

(Well, perhaps it will reject the empty string as a path. But it shouldn't
reject anything else.)

> The file object needs to be fake because the test case should not be
> prone to irrelevant failures when the real filesystem isn't behaving as
> expected; this test case makes assertions only about what
> ‘system_under_test.foo’ does internally, not what the filesystem does.

Since you're monkeypatching open(), the patched open() can define any path
it likes as "fake".

Admittedly it might be a bit scary to see things like:

    open("/dev/sda", "w").write("pwned!!!")

in the test suite, but it's perfectly safe (assuming your test harness is
working correctly to patch open).

>> But.. then why a filesystem path at all in that case?
> 
> Because the system under test is expecting valid filesystem paths, and I
> have no good reason to violate that constraint.

Since your test doesn't know what filesystem your code will be running on,
you can't make any assumptions about what paths are valid or not valid.

> Almost. I want the filesystem paths to be valid because the system under
> test expects them, it may perform its own validation, 

If the system tries to validate paths, it is broken. That's how you get
broken applications that insist that all file names must be located in 
C:\\My Documents. The application should allow the file system to validate
paths.

-- 
Steven