public class pdfParser extends AbstractParser implements Parser
Modifier and Type | Class and Description |
---|---|
private static class | pdfParser.ResourceCleaner |
Nested classes inherited from interface Parser: Parser.Failure
Modifier and Type | Field and Description |
---|---|
static java.lang.String | individualPagePropertyname |
static boolean | individualPages |
Fields inherited from class AbstractParser: log, SUPPORTED_EXTENSIONS, SUPPORTED_MIME_TYPES
Constructor and Description |
---|
pdfParser() |
Modifier and Type | Method and Description |
---|---|
static void | clean_up_idiotic_PDFParser_font_cache_which_eats_up_tons_of_megabytes() |
private java.util.Collection<AnchorURL>[] | extractPdfLinks(PDDocument pdf) - extract clickable links from pdf |
static void | main(java.lang.String[] args) - test |
Document[] | parse(DigestURL location, java.lang.String mimeType, java.lang.String charset, VocabularyScraper scraper, int timezoneOffset, java.io.InputStream source) - parse an input stream |
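Methods inherited from class AbstractParser: equals, getName, hashCode, singleList, supportedExtensions, supportedMimeTypes
Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Parser: equals, getName, hashCode, supportedExtensions, supportedMimeTypes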
public static boolean individualPages
public static java.lang.String individualPagePropertyname
public Document[] parse(DigestURL location, java.lang.String mimeType, java.lang.String charset, VocabularyScraper scraper, int timezoneOffset, java.io.InputStream source) throws Parser.Failure, java.lang.InterruptedException
Specified by: parse in interface Parser
Parameters:
location - the url of the source
mimeType - the mime type of the source, if known
charset - the charset of the source, if known
scraper - an entity scraper to detect facets from text annotation context
source - an input stream
Throws:
Parser.Failure
java.lang.InterruptedException
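A minimal sketch of how a caller might drive a method of this shape: open the stream in try-with-resources, pass the MIME type and charset when they are known, and handle the two checked exceptions separately. `ParseCallSketch`, `StreamParser`, `ParseFailure`, and `parseAndCount` are hypothetical stand-ins so that the block compiles without YaCy on the classpath. They are not part of the documented API, and the stand-in signature replaces `DigestURL` with `String` and omits `VocabularyScraper`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ParseCallSketch {

    /** Hypothetical stand-in for Parser.Failure (a checked exception). */
    public static class ParseFailure extends Exception {
        public ParseFailure(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in mirroring the shape of parse(...), with simplified types. */
    public interface StreamParser {
        String[] parse(String location, String mimeType, String charset,
                       int timezoneOffset, InputStream source)
                throws ParseFailure, InterruptedException;
    }

    /** Calls the parser and returns the number of documents, or -1 if parsing failed. */
    public static int parseAndCount(StreamParser parser, String location, byte[] content) {
        // try-with-resources closes the stream even when parsing throws
        try (InputStream source = new ByteArrayInputStream(content)) {
            // charset is unknown here, so null is passed ("if known" in the parameter list)
            String[] docs = parser.parse(location, "application/pdf", null, 0, source);
            return docs.length;
        } catch (ParseFailure e) {
            // the content could not be parsed
            return -1;
        } catch (InterruptedException e) {
            // restore the interrupt flag so callers can still observe it
            Thread.currentThread().interrupt();
            return -1;
        } catch (IOException e) {
            // raised by closing the stream
            return -1;
        }
    }

    public static void main(String[] args) {
        StreamParser stub = (location, mimeType, charset, tz, source) -> new String[] { location };
        StreamParser failing = (location, mimeType, charset, tz, source) -> {
            throw new ParseFailure("not a PDF");
        };
        System.out.println(parseAndCount(stub, "http://example.org/a.pdf", new byte[0]));
        System.out.println(parseAndCount(failing, "http://example.org/a.pdf", new byte[0]));
    }
}
```

Catching `InterruptedException` separately and re-setting the interrupt flag keeps a cancelled parse distinguishable from a malformed document further up the call stack.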
private java.util.Collection<AnchorURL>[] extractPdfLinks(PDDocument pdf)
Parameters:
pdf - the document to parse
public static void clean_up_idiotic_PDFParser_font_cache_which_eats_up_tons_of_megabytes()
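The `Collection<AnchorURL>[]` return type of `extractPdfLinks` is an array of generic collections, which Java does not allow to be created directly with `new Collection<AnchorURL>[n]`. The sketch below shows one common way to build and fill such an array. It assumes one collection per PDF page, which this page does not state. `PerPageLinksSketch`, `newPerPageArray`, `totalLinks`, and the use of `String` in place of `AnchorURL` are illustrative stand-ins, not part of the documented API.

```java
import java.util.ArrayList;
import java.util.Collection;

public class PerPageLinksSketch {

    /** Creates an array holding one empty collection per page (assumed layout). */
    @SuppressWarnings("unchecked")
    public static Collection<String>[] newPerPageArray(int pageCount) {
        // Generic array creation is not allowed, so create a raw array and cast it.
        Collection<String>[] perPage = (Collection<String>[]) new Collection[pageCount];
        for (int i = 0; i < pageCount; i++) {
            perPage[i] = new ArrayList<>();
        }
        return perPage;
    }

    /** Sums the number of links over all pages. */
    public static int totalLinks(Collection<String>[] perPage) {
        int total = 0;
        for (Collection<String> links : perPage) {
            total += links.size();
        }
        return total;
    }

    public static void main(String[] args) {
        Collection<String>[] links = newPerPageArray(3);
        links[0].add("http://example.org/");
        links[2].add("http://example.org/a");
        links[2].add("http://example.org/b");
        System.out.println(totalLinks(links)); // prints 3
    }
}
```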
public static void main(java.lang.String[] args)
Parameters:
args -