python file synchronization

Cameron Simpson cs at zip.com.au
Thu Feb 16 19:33:24 EST 2012


On 16Feb2012 22:11, Sherif Shehab Aldin <silentquote at gmail.com> wrote:
| First sorry for my very very late reply, has been overloaded at work last
| week :(

Me too.

There's no hurry at my end, take your time.

[...]
| > Can a simple byte count help here? Copy the whole file with FTP. From
| > the new copy, extract the bytes from the last byte count offset onward.
| > Then parse the smaller file, extracting whole records for use by (C).
| > That way you can just keep the unparsed tail (partial record I imagine)
| > around for the next fetch.
| >
| > Looking at RFC959 (the FTP protocol):
| >
| >  http://www.w3.org/Protocols/rfc959/4_FileTransfer.html
| >
| > it looks like you can do a partial file fetch, also, by issuing a REST
| > (restart) command to set a file offset and then issuing a RETR (retrieve)
| > command to get the rest of the file. These all need to be in binary mode
| > of course.
| >
| > So in principle you could track the byte offset of what you have fetched
| > with FTP so far, and fetch only what is new.
| 
|  I am actually grabbing the file from ftp with a bash script using lftp, It
| seemed a simple task for python at the beginning and then I noticed the
| many problems. I have checked lftp and did not know how to continue
| downloading a file. Do I have to use ftp library, may be in python so I can
| use that feature?
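
For the first idea quoted above (fetch the whole file, then use only the bytes past the last offset you processed), a minimal Python sketch might look like this. `read_new_bytes` is a hypothetical name, and it assumes the file on the server only ever grows by appending:

```python
import os

def read_new_bytes(path, offset):
    """Return (new_data, new_offset) for the bytes of `path` past `offset`.

    Assumes the file only grows by appending. If it is now shorter than
    `offset` (replaced or truncated), everything is re-read from 0.
    """
    if os.path.getsize(path) < offset:
        offset = 0                  # file shrank: start again from the top
    with open(path, "rb") as f:     # binary mode, so offsets are byte counts
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)
```

Save the returned offset somewhere (a small state file, say) between runs and pass it back in next time.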

Looking at "man lftp" I see that the "get" command has a "-c" option
(for "continue"). That probably does it for you. Should be easy to test:

  - make big file on FTP server
  - fetch with lftp (interactively - this is all just to check)
  - append a little data to the file on the server
     date >> the-big-file
  - refetch:
      get -c the-big-file

and see how much data gets copied. Hopefully just the new bytes.
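
If you would rather do it in Python, the REST/RETR approach quoted above is available in the standard ftplib module: FTP.retrbinary() takes a rest= argument and sends "REST offset" before the RETR. A minimal sketch, assuming the remote file only grows and that the size of your local partial copy is the offset to resume from; `fetch_new_bytes` and its arguments are hypothetical:

```python
import os
from ftplib import FTP

def fetch_new_bytes(host, remote_name, local_path, user="anonymous", passwd=""):
    """Append to local_path whatever remote_name has beyond local_path's size.

    Returns the number of new bytes fetched.
    """
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    with FTP(host) as ftp:
        ftp.login(user, passwd)
        ftp.voidcmd("TYPE I")       # binary mode; some servers only answer SIZE in it
        remote_size = ftp.size(remote_name)
        if remote_size is not None and remote_size <= offset:
            return 0                # nothing new (a shrunken remote file needs separate handling)
        with open(local_path, "ab") as out:
            # rest=offset makes ftplib send "REST <offset>" before "RETR"
            ftp.retrbinary("RETR " + remote_name, out.write, rest=offset)
    return os.path.getsize(local_path) - offset
```

The local file doubles as the record of how far you have got, so there is no separate offset to keep in sync.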

[...]
| > | After searching more yesterday, I found that local mv is atomic, so
| > | instead of creating the lock files, I will copy the new diffs to tmp
| > | dir, and after the copy is over, mv it to actual diffs dir, that will
| > | avoid reading It while It's still being copied.
| >
| > Yes, this sounds good. Provided the mv is on the same filesystem.
[...]
| > Yes they are in same file system, I am making sure of that ;)

Good.
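
For the record, the copy-to-tmp-then-mv scheme looks like this in Python, using os.replace(), which is an atomic rename on POSIX when source and destination are on the same filesystem. A minimal sketch; `publish` and the directory names are hypothetical:

```python
import os
import shutil
import tempfile

def publish(src_path, diffs_dir, tmp_dir):
    """Copy src_path into diffs_dir so readers never see a partial file.

    tmp_dir must be on the same filesystem as diffs_dir; otherwise
    os.replace() raises OSError instead of doing an atomic rename.
    """
    name = os.path.basename(src_path)
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir, prefix=name + ".")
    try:
        with os.fdopen(fd, "wb") as out, open(src_path, "rb") as src:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())  # get the data onto disk before the rename
        final_path = os.path.join(diffs_dir, name)
        os.replace(tmp_path, final_path)  # the atomic "mv"
    except BaseException:
        os.unlink(tmp_path)         # don't leave a half-written temp file behind
        raise
    return final_path
```

Readers scanning diffs_dir then see either no file or the complete file, never a partial one.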

BTW, when replying inline, try to make sure your new text has clean blank
lines above and below it and does not keep the quote markers.
Notice above that your "Yes they are in same file system" sits at the
same quoting level as my earlier sentence? That made your reply look
like part of the quote instead of a reply, and I nearly missed it.

[...]
| > It is also useful to make simple tests of small pieces of the code.
| > So make the code to get part of the data a simple function, and write
| > tests to execute it in a few ways (no new data, part of a record,
| > several records etc).
| >
| > You are right, my problem is that I don't care about testing until my code
| grows badly and then I notice what I got myself into :)

(Again, see the quoting level?)

One approach to get into testing slowly is to make a test for your bug.
Suppose something is not working. Write a small function that exhibits
the bug, as small as possible. That is now a test function! When you fix
the bug, the test function will pass. Keep it around!

Some projects have tests for every bug report that gets fixed.
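
To make that concrete for your case, here is a minimal sketch of a record-splitting function with tests for the cases I suggested earlier (no new data, part of a record, several records), plus a regression-style test for a record split across two fetches. It assumes, hypothetically, that each record ends with a newline; `split_records` is an illustrative name:

```python
import unittest

def split_records(data, sep=b"\n"):
    """Split `data` into complete records and an unparsed tail.

    Assumes (hypothetically) that every record ends with `sep`.
    Returns (records, tail): the complete records without `sep`, and
    any trailing partial record, to be prepended to the next fetch.
    """
    *records, tail = data.split(sep)
    return records, tail

class SplitRecordsTests(unittest.TestCase):
    def test_no_new_data(self):
        self.assertEqual(split_records(b""), ([], b""))

    def test_partial_record(self):
        self.assertEqual(split_records(b"abc"), ([], b"abc"))

    def test_several_records(self):
        self.assertEqual(split_records(b"a\nb\nc"), ([b"a", b"b"], b"c"))

    def test_record_split_across_fetches(self):
        # a record cut in half by a fetch must come out whole next time
        records, tail = split_records(b"one\ntw")
        records2, tail2 = split_records(tail + b"o\n")
        self.assertEqual(records + records2, [b"one", b"two"])
        self.assertEqual(tail2, b"")
```

Run it with "python -m unittest yourmodule". When you hit a new bug, add another test method like the last one.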

Another approach to growing a test suite is to write a good docstring
for something, describing precisely what the function guarantees,
i.e. not the internal mechanisms, but what the caller can rely on being
true after the function has been called. Then write a test function that
calls the main function and checks each thing the docstring says
should be true. Each check is a test.

Trite example:

  import unittest

  def double(x):
    ''' Return double the value of the integer `x`.
        The return value will be twice `x`.
        The return value will be even.
    '''
    return x * 2

  class Tests(unittest.TestCase):
    def test_double(self):
      for x in 0, 3, 17, 100:           # test a few different values
        x2 = double(x)
        self.assertEqual(x2, x + x)     # test doubling
        self.assertEqual(x2 % 2, 0)     # test evenness

  if __name__ == '__main__':
    unittest.main()

You can see that writing out the guarantees in the docstring helps in
writing some tests.

| > I really appreciate your help. I am trying to learn from the mailing list,
| I noticed many interesting posts in the list already. I wish I could read
| the python-list same way.. but unfortunately the mail digest they send is
| quiet annoying :(

I do not use digests; I have my subscription set to individual emails.
Just arrange for your mailer to file them in a "python" mail folder when
they arrive so they do not clutter your inbox.

| Many thanks to you, and I will keep you posted if I got other ideas. :)

Excellent. Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Q: How does a hacker fix a function which doesn't work for all of the elements in its domain?
A: He changes the domain.


