pdfParser (YaCy API: javadoc documentation)

java.lang.Object
- net.yacy.document.AbstractParser
- - net.yacy.document.parser.pdfParser

All Implemented Interfaces: Parser

public class pdfParser
extends AbstractParser
implements Parser

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`private static class`	`pdfParser.ResourceCleaner`

Nested classes/interfaces inherited from interface net.yacy.document.Parser
Parser.Failure

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`individualPagePropertyname`
`static boolean`	`individualPages`

Fields inherited from class net.yacy.document.AbstractParser
log, SUPPORTED_EXTENSIONS, SUPPORTED_MIME_TYPES

Constructor Summary

Constructors
Constructor and Description
`pdfParser()`

Method Summary

Methods
Modifier and Type	Method and Description
`static void`	`clean_up_idiotic_PDFParser_font_cache_which_eats_up_tons_of_megabytes()`
`private java.util.Collection<AnchorURL>[]`	`extractPdfLinks(PDDocument pdf)` extract clickable links from pdf
`static void`	`main(java.lang.String[] args)` test
`Document[]`	`parse(DigestURL location, java.lang.String mimeType, java.lang.String charset, VocabularyScraper scraper, int timezoneOffset, java.io.InputStream source)` parse an input stream

Methods inherited from class net.yacy.document.AbstractParser
equals, getName, hashCode, singleList, supportedExtensions, supportedMimeTypes

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface net.yacy.document.Parser
equals, getName, hashCode, supportedExtensions, supportedMimeTypes

- Field Detail
  - individualPages
```
public static boolean individualPages
```
  - individualPagePropertyname
```
public static java.lang.String individualPagePropertyname
```
- Constructor Detail
  - pdfParser
```
public pdfParser()
```
- Method Detail
  - parse
```
public Document[] parse(DigestURL location,
               java.lang.String mimeType,
               java.lang.String charset,
               VocabularyScraper scraper,
               int timezoneOffset,
               java.io.InputStream source)
                 throws Parser.Failure,
                        java.lang.InterruptedException
```
    Description copied from interface: Parser

    Parse an input stream.

    Specified by:
    parse in interface Parser

    Parameters:
    location - the URL of the source
    mimeType - the MIME type of the source, if known
    charset - the charset of the source, if known
    scraper - an entity scraper to detect facets from text annotation context
    source - an input stream

    Returns:
    a list of documents that result from parsing the source

    Throws:
    Parser.Failure
    java.lang.InterruptedException
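
    The calling pattern implied by this signature (six arguments, an array result, two checked exceptions) can be sketched as follows. This is a minimal illustration, not YaCy code: `StreamParser`, `Failure`, and `countDocuments` are hypothetical stand-ins so the example compiles without YaCy on the classpath, with `String` in place of `DigestURL`, `Object` in place of `VocabularyScraper`, and `String[]` in place of `Document[]`. Passing `null` for an unknown charset and `0` for the timezone offset are assumptions of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ParseCallSketch {

    // Hypothetical stand-in for Parser.Failure.
    static class Failure extends Exception {
        Failure(String message) { super(message); }
    }

    // Hypothetical stand-in mirroring the shape of parse(...) with simplified types.
    interface StreamParser {
        String[] parse(String location, String mimeType, String charset,
                       Object scraper, int timezoneOffset, InputStream source)
                throws Failure, InterruptedException;
    }

    // Returns the number of parsed documents, or -1 if parsing failed.
    static int countDocuments(StreamParser parser, String location, byte[] content) {
        // try-with-resources closes the stream even when parse(...) throws.
        try (InputStream source = new ByteArrayInputStream(content)) {
            String[] docs = parser.parse(location, "application/pdf", null, null, 0, source);
            return docs == null ? 0 : docs.length;
        } catch (Failure e) {
            // The parser rejected the input.
            return -1;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe the interruption.
            Thread.currentThread().interrupt();
            return -1;
        } catch (IOException e) {
            // Closing the stream failed.
            return -1;
        }
    }
}
```

    With the real class, the same structure would apply to `new pdfParser().parse(...)`, catching `Parser.Failure` instead of the stand-in exception.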
  - extractPdfLinks
```
private java.util.Collection<AnchorURL>[] extractPdfLinks(PDDocument pdf)
```
    extract clickable links from pdf
    
    Parameters:
    pdf - the document to parse
    
    Returns:
    all detected links
  - clean_up_idiotic_PDFParser_font_cache_which_eats_up_tons_of_megabytes
```
public static void clean_up_idiotic_PDFParser_font_cache_which_eats_up_tons_of_megabytes()
```
  - main
```
public static void main(java.lang.String[] args)
```
    test
    
    Parameters:
    args -
