DataHub

Quickly find and create data mining programs

DataHub Ranking & Summary


  • License: GPL
  • Price: FREE
  • Publisher Name: Lukasz Szybalski
  • Publisher web site: https://launchpad.net/~szybalski



DataHub Description

DataHub is a tool that lets you quickly find and create data mining programs that can crawl, parse, and load a data source into a database or other useful forms.

Install DataHub

* The best way to get started with DataHub is to install it as follows.

* Set up a virtualenv, which keeps the installation in a separate directory:

    virtualenv --no-site-packages datahubENV
    New python executable in datahubENV/bin/python
    Installing setuptools............done.
    source datahubENV/bin/activate

* Download the source and untar it:

    wget http://launchpad.net/datahub/trunk/0.7/+download/datahub-0.7.tar.gz
    tar -xzvf datahub-0.7.tar.gz

* Install it:

    cd datahub-0.7/
    python setup.py install

* Make sure it was installed by listing the available templates:

    paster create --list-templates

* Done. Move on to the next section.

Run DataHub

* DataHub is a paster template, so you run it as follows:

    paster create --list-templates
    paster create -t datahub

* You should see something like this:

    paster create -t datahub
    Selected and implied templates:
      PasteScript#basic_package  A basic setuptools-enabled package
      datahub#datahub            DataHub is a tool to help you datamine(crawl, parse, and load) any data.

    Enter project name: myproject
    Variables:
      egg:      myproject
      package:  myproject
      project:  myproject
    Enter version (Version (like 0.1)) :
    Enter description (One-line description of the package) : my project
    Enter long_description (Multi-line description (in reST)) : this is a long description
    Enter keywords (Space-separated keywords/tags) : datahub dataprocess
    Enter author (Author name) : myname
    Enter author_email (Author email) :
    Enter url (URL of homepage) :
    Enter license_name (License name) :
    Enter zip_safe (True/False: if the package can be distributed as a .zip file) :
    Creating template basic_package
    Creating directory ./myproject
      Recursing into +package+
        Creating ./myproject/myproject/
        Copying __init__.py to ./myproject/myproject/__init__.py
      Copying setup.cfg to ./myproject/setup.cfg
      Copying setup.py_tmpl to ./myproject/setup.py
    Creating template datahub
      Recursing into +package+
        Copying README.txt_tmpl to ./myproject/myproject/README.txt
        Recursing into crawl
          Creating ./myproject/myproject/crawl/
          Copying Readme.txt_tmpl to ./myproject/myproject/crawl/Readme.txt
          Copying __init__.py to ./myproject/myproject/crawl/__init__.py
          Copying download.sh to ./myproject/myproject/crawl/download.sh
          Copying download_list.txt_tmpl to ./myproject/myproject/crawl/download_list.txt
          Copying harvestman-+package+.xml to ./myproject/myproject/crawl/harvestman-myproject.xml
        Recursing into hdf5
          Creating ./myproject/myproject/hdf5/
          Copying READEM_hdf5.txt_tmpl to ./myproject/myproject/hdf5/READEM_hdf5.txt
          Copying __init__.py to ./myproject/myproject/hdf5/__init__.py
        Recursing into load
          Creating ./myproject/myproject/load/
          Copying __init__.py to ./myproject/myproject/load/__init__.py
          Copying model.template to ./myproject/myproject/load/model.template
        Recursing into parse
          Creating ./myproject/myproject/parse/
          Copying __init__.py to ./myproject/myproject/parse/__init__.py
        Recursing into wiki
          Creating ./myproject/myproject/wiki/
          Copying REAME.wiki_tmpl to ./myproject/myproject/wiki/REAME.wiki
    Running /home/lucas/tmp/datahubENV/bin/python setup.py egg_info
    Manually creating paster_plugins.txt (deprecated! pass a paster_plugins keyword to setup() instead)
    Adding datahub to paster_plugins.txt

* Go into the myproject folder and start coding.

* The folder structure looks like this:

    myproject
    |-- myproject
    |   |-- README.txt
    |   |-- __init__.py
    |   |-- crawl
    |   |   |-- Readme.txt
    |   |   |-- __init__.py
    |   |   |-- download.sh
    |   |   |-- download_list.txt
    |   |   `-- harvestman-myproject.xml
    |   |-- hdf5
    |   |   |-- READEM_hdf5.txt
    |   |   `-- __init__.py
    |   |-- load
    |   |   |-- __init__.py
    |   |   `-- model.template
    |   |-- parse
    |   |   `-- __init__.py
    |   `-- wiki
    |       `-- REAME.wiki
    |-- myproject.egg-info
    |   |-- PKG-INFO
    |   |-- SOURCES.txt
    |   |-- dependency_links.txt
    |   |-- entry_points.txt
    |   |-- not-zip-safe
    |   |-- paster_plugins.txt
    |   `-- top_level.txt
    |-- setup.cfg
    `-- setup.py

Requirements:

  • Python
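After running the template, it can be handy to confirm that the generated project matches the folder structure listed above. The following is a minimal sketch using only the Python standard library; it is not part of DataHub, and the names `EXPECTED_PACKAGE_FILES` and `missing_project_files` are hypothetical helpers introduced here for illustration. The file list is copied from the tree shown in the description and assumes the project and package are both named myproject.

```python
from pathlib import Path

# Paths relative to the package directory, copied from the folder tree
# in the description (assumption: the template output is unchanged).
EXPECTED_PACKAGE_FILES = (
    "__init__.py",
    "README.txt",
    "crawl/__init__.py",
    "crawl/download.sh",
    "crawl/download_list.txt",
    "hdf5/__init__.py",
    "load/__init__.py",
    "load/model.template",
    "parse/__init__.py",
)


def missing_project_files(project_root, package_name):
    """Return the expected files (as POSIX-style paths) absent under project_root."""
    root = Path(project_root)
    # Top-level files created by the basic_package template.
    expected = [Path("setup.py"), Path("setup.cfg")]
    # Files created inside the package directory by the datahub template.
    expected += [Path(package_name) / rel for rel in EXPECTED_PACKAGE_FILES]
    return [p.as_posix() for p in expected if not (root / p).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: run from the directory that contains ./myproject
    missing = missing_project_files("myproject", "myproject")
    print("missing:", missing if missing else "nothing")
```

Run it from the directory where `paster create -t datahub` was executed; an empty result means every file from the listing is present.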

