April 2024 - This site and Kamaelia are being updated. Significant work is needed, and PRs are welcome.

Kamaelia.Protocol.HTTP.HTTPClient

Single-Shot HTTP Client

This component downloads a single file from an HTTP server. The data received from the server is sent to its "outbox" outbox.

Generally you should use SimpleHTTPClient in preference to this.

Example Usage

How to use it:

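from Kamaelia.Chassis.Pipeline import Pipeline
from Kamaelia.Protocol.HTTP.HTTPClient import SingleShotHTTPClient

Pipeline(
    SingleShotHTTPClient("http://www.google.co.uk/"),
    SomeComponentThatUnderstandsThoseMessageTypes()  # placeholder: your own consumer component
).run()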

If you want to use it directly, note that it does not output strings but ParsedHTTPHeader, ParsedHTTPBodyChunk and ParsedHTTPEnd messages, like HTTPParser. This has the advantage of not buffering huge files in memory, outputting them instead as a stream of chunks. With plain strings you would not know the contents of the headers or at what point the response had ended.

How does it work?

SingleShotHTTPClient accepts a URL parameter at its creation (to __init__). When activated it creates an HTTPParser instance and then connects to the webserver specified in the URL using a TCPClient component. It sends an HTTP request and then any response from the server is received by the HTTPParser.

HTTPParser processes the response and outputs it in parts as:

ParsedHTTPHeader,
ParsedHTTPBodyChunk,
ParsedHTTPBodyChunk,
...
ParsedHTTPBodyChunk,
ParsedHTTPEnd
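A downstream consumer might reassemble this stream along the following lines. This is only a minimal sketch: the three classes are local stand-ins for the Kamaelia message types, and the attribute names `header` and `bodychunk` are assumptions that should be checked against HTTPParser. The helper `collect_response` is hypothetical.

```python
class ParsedHTTPHeader:
    """Stand-in for the real message type (assumed attribute: header)."""
    def __init__(self, header):
        self.header = header


class ParsedHTTPBodyChunk:
    """Stand-in for the real message type (assumed attribute: bodychunk)."""
    def __init__(self, bodychunk):
        self.bodychunk = bodychunk


class ParsedHTTPEnd:
    """Stand-in marker for the end of a response."""


def collect_response(messages):
    """Consume one response's messages and return (header, body)."""
    header = None
    chunks = []
    for msg in messages:
        if isinstance(msg, ParsedHTTPHeader):
            header = msg.header
        elif isinstance(msg, ParsedHTTPBodyChunk):
            chunks.append(msg.bodychunk)
        elif isinstance(msg, ParsedHTTPEnd):
            return header, b"".join(chunks)
    raise ValueError("stream ended without ParsedHTTPEnd")
```

Joining the chunks gives up the streaming advantage described above, so a real consumer would more likely write each chunk out as it arrives.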

If SingleShotHTTPClient detects that the response to the requested URL is a redirect (using the Location header), it begins this cycle anew with the URL of the new page. Otherwise, the parts of the page output by HTTPParser are sent on to "outbox".
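The redirect decision could be sketched roughly as follows. The set of status codes treated as redirects, the lowercase `location` key and the function name `redirect_target` are all assumptions made for illustration; the component's own check may differ.

```python
from urllib.parse import urljoin

# Assumed set of redirect status codes; not taken from the component itself.
REDIRECT_CODES = {"301", "302", "303", "307", "308"}


def redirect_target(current_url, status_code, headers):
    """Return the URL to fetch next, or None if this is not a redirect."""
    location = headers.get("location")
    if str(status_code) in REDIRECT_CODES and location:
        # Resolve relative Location values against the URL just requested.
        return urljoin(current_url, location)
    return None
```

A caller would loop while `redirect_target` returns a URL, and forward the parsed parts to "outbox" once it returns None.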

Simple HTTP Client

This component downloads the pages corresponding to HTTP URLs received on "inbox" and outputs their contents (file data) as a message, one per URL, to "outbox" in the order they were received.

Example Usage

Type URLs, and they will be downloaded and written, back to back, to "downloadedfile.txt":

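from Kamaelia.Chassis.Pipeline import Pipeline
from Kamaelia.Util.Console import ConsoleReader
from Kamaelia.File.Writing import SimpleFileWriter
from Kamaelia.Protocol.HTTP.HTTPClient import SimpleHTTPClient

Pipeline(
    ConsoleReader(">>> ", ""),
    SimpleHTTPClient(),
    SimpleFileWriter("downloadedfile.txt"),
).run()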

How does it work?

SimpleHTTPClient uses the Carousel component to create a new SingleShotHTTPClient component for every URL requested. As URLs are handled sequentially, it has only one SingleShotHTTPClient child at any one time.
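The one-child-at-a-time pattern can be illustrated without Axon or Carousel. In this plain-Python sketch, `run_sequentially` and `make_client` are hypothetical names, and each "client" is simply a callable that returns the downloaded data.

```python
from collections import deque


def run_sequentially(urls, make_client):
    """Process URLs in order, creating a fresh single-use client for each."""
    pending = deque(urls)
    results = []
    while pending:
        url = pending.popleft()
        client = make_client(url)  # new child, used for this URL only
        results.append(client())   # run it to completion before the next one
    return results
```

Because each client finishes before the next is created, results come out in the same order as the URLs went in, which matches the ordering described for "outbox".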


Kamaelia.Protocol.HTTP.HTTPClient.SimpleHTTPClient

class SimpleHTTPClient(Axon.Component.component)

Inboxes

  • control : Shut me down
  • _carouselready : Receive NEXT when carousel has completed a request
  • inbox : URLs to download - a dict {'url':'x', 'postbody':'y'} or just the URL as a string
  • _carouselinbox : Data from SingleShotHTTPClient via Carousel
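The two message forms accepted on "inbox" could be normalised along these lines. This is a sketch only: `normalise_request` is a hypothetical helper, and treating a missing 'postbody' key as an empty string is an assumption.

```python
def normalise_request(message):
    """Return (url, postbody) from either accepted inbox message form."""
    if isinstance(message, dict):
        # Dict form: {'url': ..., 'postbody': ...}; postbody assumed optional.
        return message["url"], message.get("postbody", "")
    # String form: the message is the URL itself.
    return message, ""
```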

Outboxes

  • debug : Information to aid debugging
  • outbox : Requested file's data string
  • signal : Signal I have shutdown
  • _carouselnext : Create a new SingleShotHTTPClient
  • _carouselsignal : Shutdown the carousel

Methods defined here

Warning!

You should be using the inbox/outbox interface, not these methods (except construction). This documentation is designed as a roadmap to their functionality for maintainers and new component developers.

__init__(self)

Create and link to a carousel object

cleanup(self)

Destroy child components and send producerFinished when we quit.

debug(self, msg)

main(self)

Main loop.

Kamaelia.Protocol.HTTP.HTTPClient.SingleShotHTTPClient

class SingleShotHTTPClient(Axon.Component.component)

SingleShotHTTPClient() -> component that can download a file using HTTP by URL

Arguments:

  • starturl -- the URL of the file to download
  • [postbody] -- data to POST to that URL; if set to None, an empty body is sent in a POST (or PUT) request
  • [connectionclass] -- a class other than TCPClient to connect with
  • [method] -- the HTTP method for the request (defaults to GET normally, or POST if postbody != "")
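The defaulting rule for the method can be read as the following sketch. The function names and the default values (`postbody=""`, `method=None`) are assumptions inferred from the argument description, not the component's confirmed signature.

```python
def choose_method(postbody="", method=None):
    """Pick the HTTP method: explicit method wins, else GET or POST."""
    if method is not None:
        return method
    # Any postbody other than "" (including None) implies a POST.
    return "GET" if postbody == "" else "POST"


def effective_body(postbody):
    """A postbody of None is sent as an empty body."""
    return "" if postbody is None else postbody
```

Under this reading, `postbody=None` selects POST with an empty body, while leaving postbody at its default yields a plain GET.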

Inboxes

  • control : UNUSED
  • _parserinbox : Data from HTTP parser
  • _parsercontrol : Signals from HTTP parser
  • _tcpcontrol : Signals from TCP client
  • inbox : UNUSED

Outboxes

  • signal : UNUSED
  • _parsersignal : Signals for HTTP parser
  • _tcpoutbox : Send over TCP connection
  • debug : Output to aid debugging
  • outbox : Requested file
  • _tcpsignal : Signals shutdown of TCP connection

Methods defined here

Warning!

You should be using the inbox/outbox interface, not these methods (except construction). This documentation is designed as a roadmap to their functionality for maintainers and new component developers.

__init__(self, starturl[, postbody][, connectionclass][, extraheaders][, method])

formRequest(self, url)

Craft an HTTP request string for the supplied URL
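For orientation, crafting a request string from a URL might look like the sketch below. It is not the component's implementation: the free function `form_request`, its extra parameters, the HTTP/1.1 version and the "Connection: close" header are all assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def form_request(url, method="GET", body="", extra_headers=None):
    """Build a raw HTTP request string for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    headers = {"Host": parts.netloc, "Connection": "close"}
    if body:
        headers["Content-Length"] = str(len(body.encode("utf-8")))
    headers.update(extra_headers or {})
    lines = ["%s %s HTTP/1.1" % (method, path)]
    lines += ["%s: %s" % item for item in headers.items()]
    # A blank line separates the headers from the (possibly empty) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body
```

The sketch uses the URL's netloc unchanged as the Host header, so URLs carrying credentials would need extra handling.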

handleRedirect(self, header)

Check for a redirect response and, if it is one, queue the fetching of the page it points to. Returns true if it was a redirect and false otherwise.

main(self)

Main loop.

mainBody(self)

Called repeatedly by the main loop. Checks inboxes and processes messages received. Starts fetching the new page if the current one is a redirect and has been completely fetched.

makeRequest(self, request)

Connect to the remote HTTP server and send request

shutdownKids(self)

Close TCP connection and HTTP parser

Feedback

Got a problem with the documentation? Something unclear that could be clearer? Want to help improve it? Constructive criticism is very welcome - especially if you can suggest a better rewording!

Please leave your feedback in reply to the documentation thread in the Kamaelia blog.

-- Automatic documentation generator, 05 Jun 2009 at 03:01:38 UTC/GMT