PaPy

Parallel Pipelines for Python

PaPy Ranking & Summary

  • License: GPL v3
  • Price: FREE
  • Publisher Name: Marcin Cieslik
  • Publisher web site: http://muralab.org/

PaPy Description

Parallel Pipelines for Python (PaPy) is a Python framework for constructing pipelines (flow-charts) of arbitrary tasks and executing them in parallel, either locally using multiprocessing or multithreading, or remotely using RPC (Remote Procedure Calls) as provided by RPyC (Remote Python Call). A pipeline is represented as an arbitrary directed acyclic graph. The user defines the functions of the nodes (called Pipers) and the edges (called pipes), which represent data flow or dependencies. Piper instances are assigned to virtual resources (called IMaps), which are technically pools of local and/or remote processes or threads. An IMap is similar to the imap method of multiprocessing.Pool, but it supports multiple tasks, i.e. (function, iterable) tuples, which are interwoven rather than evaluated one after another.

An example of a valid Piper function (it identifies the host, the process and itself):

    @imports(...)
    def hid(i):
        return "item %s is on host:%s using process:%s, and function:%s" % (
            i, socket.gethostname(), os.getpid(), id(hid))

Creates an IMap instance that uses 4 processes:

    local_pool = IMap(worker_num=4)

Creates an IMap instance that uses 4 threads:

    local_thread_pool = IMap(worker_type='thread', worker_num=4)

Creates an IMap instance that uses 8 remote processes on 2 hosts:

    remote_pool = IMap(worker_num=0, worker_remote=[['host1', 4], ['host2', 4]])

Setting worker_num=0 overrides the default, which is to create as many local worker processes as there are available CPUs.
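To illustrate what "interwoven rather than evaluated one after another" means, here is a minimal sketch using only the standard library's concurrent.futures thread pool. It is not PaPy's IMap: the function name interwoven_map and its (results, order) return value are hypothetical, and unlike an imap it evaluates eagerly and returns complete lists. It submits one item from each (function, iterable) task in turn to a single shared pool and records the submission order.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

_DONE = object()  # private marker for an exhausted iterable in zip_longest


def interwoven_map(tasks, worker_num=4):
    """Hypothetical helper (not part of PaPy): run several
    (function, iterable) tasks on ONE shared thread pool.

    Items are submitted round-robin across tasks (task 0 item 0, task 1
    item 0, task 0 item 1, ...) instead of finishing task 0 before task 1
    starts. Returns (results, order): results[k] holds task k's outputs in
    input order; order lists the (task, item_index) submission sequence.
    """
    futures = [[] for _ in tasks]
    order = []
    iterables = [iterable for _, iterable in tasks]
    with ThreadPoolExecutor(max_workers=worker_num) as pool:
        # Row i of zip_longest holds item i of every task: the interleaving.
        for index, row in enumerate(zip_longest(*iterables, fillvalue=_DONE)):
            for task_id, item in enumerate(row):
                if item is _DONE:
                    continue  # this task has no items left
                func = tasks[task_id][0]
                futures[task_id].append(pool.submit(func, item))
                order.append((task_id, index))
        # Collect results per task, preserving each task's input order.
        results = [[f.result() for f in task_futures] for task_futures in futures]
    return results, order


if __name__ == "__main__":
    tasks = [(lambda x: x * x, [1, 2, 3]), (str.upper, ["a", "b"])]
    results, order = interwoven_map(tasks, worker_num=2)
    print(results)  # [[1, 4, 9], ['A', 'B']]
    print(order)    # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)]
```

Because both tasks share one pool, a short task does not have to wait for a long one to finish before its items start running.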
The remote hosts ('host1' and 'host2') should run an RPyC classic server in forking mode, i.e.:

    python classic_server.py -m 'forking'

The input to the pipeline needs to be a collection. PaPy processes data in the pipeline in batches of adjustable size, which allows a trade-off between parallelism (at the cost of memory consumption) and laziness (immediate results).

PaPy is flexible:

  • graph topology is unrestricted
  • user-function code is unrestricted
  • the number of inputs is unrestricted
  • the number of IMaps is unrestricted
  • the number of processes/threads used is unrestricted
  • Pipers can be assigned to IMaps arbitrarily
  • IMaps can be shared (load balancing)
  • the memory-parallelism-laziness trade-off is adjustable
  • cross-platform hosts are supported

Here are some key features of "PaPy":

  • construction of arbitrarily complex pipelines
  • flexible local and remote parallelism
  • shared local and remote resources
  • robustness to exceptions
  • support for time-outs
  • real-time logging
  • OS-independent (really a feature of multiprocessing)
  • cross-platform (really a feature of RPyC)
  • tested and documented

Requirements:

  • Python
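The batch-size trade-off between parallelism and laziness described above can be sketched with the standard library alone. The function batched_parallel_map and its batch_size parameter are hypothetical illustrations, not PaPy's API: the sketch pulls batch_size items from the input, evaluates them in parallel on a thread pool, and yields their results before reading the next batch.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched_parallel_map(func, iterable, batch_size, worker_num=4):
    """Hypothetical helper (not part of PaPy): lazily yield func(item)
    for every item, evaluating batch_size items at a time in parallel.

    The next batch is not read from the input until every result of the
    current batch has been yielded, so at most one batch of inputs and
    results is in flight at any time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = iter(iterable)
    with ThreadPoolExecutor(max_workers=worker_num) as pool:
        while True:
            batch = list(islice(items, batch_size))
            if not batch:
                return  # input exhausted
            # Executor.map runs the batch concurrently but yields results
            # in input order.
            yield from pool.map(func, batch)


if __name__ == "__main__":
    for result in batched_parallel_map(str.upper, ["a", "b", "c", "d"], batch_size=2):
        print(result)
```

A larger batch_size keeps more work in flight for the pool (and more items in memory); a smaller one delivers the first results sooner.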

