[Zope-dev] Large file support

seant@superchannel.org
Tue, 24 Oct 2000 20:31:52 +0200


I have been building an "ExternalFile" class which stores the body of
the file in an external file, mirroring the Zope path/hierarchy.  This
will allow easy integration with servers that can mount the external
representation of the content and serve it with a consistent namespace.

To make life simple, I tried to move all file manipulation to Zope,
including upload/download/copy/cut/paste/delete and permissions.  These
external files are transaction aware, and so on.

Working with files > 20 MB, I noticed some serious performance and
scalability issues and investigated.  Here are the results.

A diff with my changes against version 2.2.2 is available at
<http://www.superchannel.org/Playground/large_file_zope2.2.2_200010241.diff>


Concerns:

	Zope objects like File require data as a seekable file or as a
	coherent block, rather than as a stream.  Initializing/updating
	these objects *may* require loading the entire file into memory.

	In-memory buffering of request or response data could cause
	excessive swapping of the working set.

	The multi-service architecture (ZServer->ZPublisher) could limit
	the reuse of stream handles.

	Creating temporary files as FIFO buffers between the services
	causes significant swapping.


Modifications:

	Using pipes

	I found that FTPServer.ContentCollector was using a StringIO to
	buffer the uploads from FTP clients.  I changed this into a
	TemporaryFile for a while, which revealed the leaked file
	descriptor bug (see below).  This intermediary temp file caused
	one extra file copy for each request.  The goal is to have no
	intermediary files at all, and to pipeline the content directly
	into the Zope objects.

	To remove this FTP upload file buffer, I converted the FTP
	collector again, from a TemporaryFile into a pipe with reader and
	writer file objects.  The FTPRequest receives the reader, from
	which it can process the input on the publish thread in
	processInputs.

	Since we are dealing with blocking pipes, it is OK to have a
	reader on the publish thread and a writer on the ZServer thread.
	The main consideration was the proper way to read from a pipe
	through the chain of control, especially in cgi.FieldStorage.
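
	A minimal sketch of the two-thread arrangement, with illustrative
	names rather than the actual FTPServer/FTPRequest code:

	import os
	import threading

	# Wrap both ends of an anonymous pipe in file objects.  The
	# writer end stays with the ZServer-side collector; the reader
	# end is handed to the request as its stdin.
	r_fd, w_fd = os.pipe()
	reader = os.fdopen(r_fd, 'rb')
	writer = os.fdopen(w_fd, 'wb')

	def collector():
	    # Stands in for the ZServer thread receiving FTP data.
	    for i in range(4):
	        writer.write(b'x' * 8192)
	    writer.close()      # closing the writer is the reader's EOF

	t = threading.Thread(target=collector)
	t.start()

	# Publish-thread side: read() blocks until the writer supplies
	# data, so the two threads need no further synchronization.
	total = 0
	while 1:
	    chunk = reader.read(8192)
	    if not chunk:
	        break
	    total = total + len(chunk)
	t.join()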

	Stdin is treated as the reader end of the pipe throughout the
	code.  All seek()s and tell()s on sys.stdin-like objects (a tty,
	not a seekable file) should be considered illegal and removed.
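
	One way to flush out any remaining offenders is a guard wrapper
	around the pipe reader; this is a hypothetical helper, not part
	of the diff:

	class UnseekableInput:
	    # Wraps the pipe reader handed to the publisher as stdin so
	    # that stray seek()/tell() calls fail loudly instead of
	    # silently misbehaving on a pipe.
	    def __init__(self, fp):
	        self._fp = fp
	    def read(self, *args):
	        return self._fp.read(*args)
	    def readline(self, *args):
	        return self._fp.readline(*args)
	    def close(self):
	        self._fp.close()
	    def seek(self, offset, whence=0):
	        raise IOError('seek() is illegal on a pipe-backed stdin')
	    def tell(self):
	        raise IOError('tell() is illegal on a pipe-backed stdin')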


	Usage of FieldStorage from FTP (Unknown content-length)

	To gain access to the body of a request, one typically calls
	REQUEST['BODY'] or REQUEST['BODYFILE'].  This returns the file
	object that FieldStorage copied from stdin.

	To prevent FieldStorage from copying the file from stdin to a
	temporary file, we can set the CONTENT_LENGTH header to '0' in the
	FTP _get_env for a STOR.

	In this case, FieldStorage creates a temporary file but doesn't
	read any data from stdin, so we can return stdin directly when
	BODYFILE is requested and 'content-length' is '0'.  However,
	BODYFILE could be a pipe, which doesn't support 'seek' or 'tell'.
	The code used to suck the data off the BODYFILE needs to be
	modified to adapt to the possibility of being passed a pipe.
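
	A simplified sketch of that dispatch, with hypothetical attribute
	names (the real logic lives in HTTPRequest):

	class Request:
	    # Minimal stand-in carrying just the fields the sketch needs.
	    def __init__(self, environ, stdin, fieldstorage_file):
	        self.environ = environ
	        self.stdin = stdin
	        self.fieldstorage_file = fieldstorage_file

	def get_bodyfile(request):
	    # When the FTP _get_env advertised CONTENT_LENGTH '0' for a
	    # STOR, FieldStorage read nothing, so the untouched stdin --
	    # possibly a pipe with no seek()/tell() -- is the body.
	    if request.environ.get('CONTENT_LENGTH') == '0':
	        return request.stdin
	    # Otherwise FieldStorage already spooled the body into a
	    # temporary file; hand that back as before.
	    return request.fieldstorage_file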


	Updating Image.File to play with pipes

	The _read_data method of Image.File pulls the data out of the
	BODYFILE and sticks it in the instance as a string, pdata object,
	or a linked list of pdata objects.  The existing code reads and
	builds the list in one clean sweep, back-to-front.  I believe this
	keeps the pdata.data chunks out of memory by quickly
	(sub)committing and then deactivating (_p_changed = None) them.

	Since we can no longer safely assume 'seek' is valid for BODYFILE,
	I tried to read and build the list front-to-back.  This kept the
	data in memory, even though I tried to deactivate the objects
	quickly.

	As a tradeoff, I read the data front-to-back, which builds the
	list back-to-front, and then take another pass to reverse the
	list so it is in the correct order.
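
	A sketch of the two passes, using a simplified stand-in for the
	pdata chunks (the real _read_data also subcommits and deactivates
	each chunk as it goes):

	class Pdata:
	    # Simplified stand-in for OFS.Image's Pdata: one chunk of
	    # bytes plus a link to the next chunk.
	    def __init__(self, data):
	        self.data = data
	        self.next = None

	def read_data(bodyfile, chunk_size=65536):
	    # Pass 1: read the stream front-to-back (no seek() needed),
	    # linking each new chunk in front of the previous one, so the
	    # chain ends up back-to-front.
	    head = None
	    size = 0
	    while 1:
	        chunk = bodyfile.read(chunk_size)
	        if not chunk:
	            break
	        node = Pdata(chunk)
	        node.next = head
	        head = node
	        size = size + len(chunk)
	    # Pass 2: reverse the links so the chain runs front-to-back.
	    prev = None
	    while head is not None:
	        next_node = head.next
	        head.next = prev
	        prev = head
	        head = next_node
	    return prev, size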

	Memory usage appears to be steady, meaning the whole file is not
	loaded into the working set.  This also prevents unnecessary
	reading into a temporary FieldStorage file during an FTP upload.


	Web based uploads...

	...suck.  I do not recommend doing a web-based upload for files
	> 1 MB.  First, a content-length is known, so we don't get the
	advantage of pipelining the data directly from the socket: a
	temporary file must be created, written, and read.  Second, I
	believe the content is form-encoded, so the transferred byte
	count is much higher than with FTP.

	Plus, most browsers today do not show a progress bar for POSTs,
	so there is no indication of status, which causes most people to
	click 'Upload' multiple times.

	I haven't done any optimization for this case, but I have tested
	that it still works properly.


	Cleaning up (leaked file descriptor bug)

	I noticed that after uploading 20+ MB a couple of times, I ran
	out of hard drive space.  This didn't make sense, so I looked
	into which files Zope had open.  Running 'lsof', I found that the
	temporary files, which are immediately unlinked after creation,
	were still open until the end of the Zope process.  These files
	(created by tempfile.TemporaryFile) needed to be closed at the
	end of the REQUEST and RESPONSE, rather than at the end of the
	Zope process.

	After publishing, the close method of the REQUEST gets called.
	Here I added closing of stdin and of the FieldStorage-created
	TemporaryFile '_file'.
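
	A sketch of the extra cleanup (simplified; the helper name is
	illustrative, and the attributes follow the description above):

	def close_request_files(request):
	    # Called from the request's close() after publishing.
	    # Closing these here releases the unlinked temp files' disk
	    # space at the end of the request rather than at process exit.
	    stdin = getattr(request, 'stdin', None)
	    if stdin is not None:
	        stdin.close()
	    # '_file' is the TemporaryFile that cgi.FieldStorage created.
	    fs_file = getattr(request, '_file', None)
	    if fs_file is not None:
	        fs_file.close()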


	Output producers

	The ZServer.HTTPResponse object makes a good attempt at keeping
	large results out of memory, but does so by creating a temporary
	file, copying any written data to it, and then pushing a
	file_part_producer onto the channel output queue.

	If Zope objects know how to produce the data themselves, they can
	push producers directly to the channel.  I added a single check
	in ZServer.HTTPResponse (line 256) so that a temporary file is
	only created if the data is larger than the in-memory buffer
	*and* doesn't already look like a producer, i.e. something with a
	'more' method.

	If the temporary file doesn't exist, the rest of the code simply
	writes the data to the channel, and the channel produces the
	output directly from the producer created by the Zope object.
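
	A sketch of a minimal medusa-style producer and the kind of check
	added to the write path (simplified relative to the actual
	HTTPResponse code; names are illustrative):

	class file_producer:
	    # Minimal medusa-style producer: the channel calls more()
	    # repeatedly, and an empty string signals the end of output.
	    def __init__(self, fp, chunk_size=8192):
	        self.fp = fp
	        self.chunk_size = chunk_size
	    def more(self):
	        data = self.fp.read(self.chunk_size)
	        if not data:
	            self.fp.close()
	        return data

	def write_body(channel, data, in_memory_limit=65536):
	    if hasattr(data, 'more'):
	        # Already a producer: push it straight to the channel, so
	        # no temporary-file copy is ever made.
	        channel.push_with_producer(data)
	    elif len(data) > in_memory_limit:
	        # Too big for memory and not a producer: spool to a
	        # temporary file, as the stock code does.
	        pass
	    else:
	        channel.push(data)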

	Using a file producer from my Zope object cuts out a file copy, and
	those get expensive when one is dealing with 20+ MB files.  The
	response time is also dramatically reduced because the file copy
	step before streaming to the client was removed.

	I would like to apply the same concept to Image.File.index_html:
	rather than creating a temporary file in the RESPONSE to queue
	the contents, create a producer that pulls the data directly out
	of the backend when the channel is ready to write.  With the
	current code I am seeing a 10-second latency (233 MHz laptop)
	between requesting a 10 MB file and receiving the first byte.
	With an output producer, this latency would drop below one
	second.
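
	The shape of such a producer is simple; here is a sketch that
	sidesteps the ZODB activation problem described below, which is
	the part that actually failed:

	class pdata_producer:
	    # Walks the pdata chain lazily: each more() call hands the
	    # channel one chunk, so bytes flow out as they are loaded
	    # instead of after a full copy into a temporary file.
	    def __init__(self, pdata):
	        self.next_chunk = pdata
	    def more(self):
	        if self.next_chunk is None:
	            return ''       # empty string signals end of output
	        data = self.next_chunk.data
	        self.next_chunk = self.next_chunk.next
	        return data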

	I made an attempt to create such a pdata_producer but failed
	because of ZODB errors reloading the object.  I get a traceback
	like:

	2000-10-24T09:19:08 ERROR(200) ZODB Couldn't load state for
	'\000\000\000\000\000\000&\370'
	Traceback (innermost last):
	  File /usr/local/zope/lib/python/ZODB/Connection.py, line 442,
	    in setstate
	AttributeError: 'None' object has no attribute 'load'

	My hunch is that the Image, pdata_producer, or pdata object gets
	deactivated and can't find its DB to load itself.  I tried
	setting a _p_jar on the pdata_producer, but I don't really know
	what happens when the object context leaves publish_module.
	Since the object activation happens in the ZServer thread, some
	voodoo may be needed to get the proper state into the
	pdata_producer... any takers?


Testing...

	I have only tested these changes with FTPServer and HTTPServer, not
	PCGIServer or FCGIServer.

	I have tested round-trip coherency because of the change in
	Image.File._read_data.

	I haven't completely tested the boundary conditions where
	Image.File._read_data makes decisions.  Testing has covered large
	files (10+ MB) and small files (< 64 KB).

	I haven't tested HTTPRequest.retry, which will probably fail
	because HTTPRequest.stdin may now be a pipe.

	Third-party products which treat BODYFILE as a seekable file
	object may fail during FTP uploads.


Summary:

	Most of these efforts are geared towards FTP, as HTTP form uploads
	don't seem to be worth the effort.

	I haven't taken a look at HTTP PUT, for WebDAV clients etc.
	Similar pipelining could be used, though I doubt it would be
	possible without modifying cgi.FieldStorage.

	Zope seems to be doing a lot with TempStorage and other ZODB
	magic that I didn't care to check out.  Some performance
	improvements could probably be made there as well.

	My changes, including the ExternalFile custom output producer,
	dramatically improve Zope's FTP I/O performance and scalability.

-Sean