PDFTextStream free download, PDFTextStream download on software download

PDFTextStream: a PDF text and metadata extraction library available for Java, Python, and .NET.

PDFTextStream Ranking & Summary

License:
Other/Proprietary Li...

Price:
USD 1900.00

Publisher Name:
Snowtide Informatics Systems, Inc.

Publisher web site:
http://snowtide.com/

PDFTextStream Description

PDFTextStream is a PDF text and metadata extraction library available for Java, Python, and .NET. It supports all versions of the PDF document specification (including v1.6, used by Acrobat 7), extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean), decryption of 40-bit and 128-bit encrypted documents, and extraction of all document metadata provided by PDF documents (including form data, bookmarks, and annotations). Easy integration with Jakarta Lucene is included.

Requirements:
· Apache Lucene

What's New in This Release:
· Added an .isStruckThrough() method to com.snowtide.pdf.TextUnit, indicating whether a character has a strikethrough drawn through it.
· Improved PDFTextStream's support for embedded character mappings.
· Fixed the calculation of whitespace between words to properly account for whitespace that is explicitly encoded in the source PDF documents.
· Improved PDFTextStream's handling of composite content encodings, which previously could fail, resulting in some ranges of PDF content being ignored during extraction.
· Fixed a bug in VisualOutputTarget where text from a single line would be split over multiple lines.
· Improved vertical alignment of text extracted using VisualOutputTarget.
· Improved VisualOutputTarget-produced extracts to eliminate spurious additional whitespace between closely adjacent words.
