YaCy Release current_development

Major Changes   

Fri Feb 16 11:35:15 CET 2018
by luccioman
Fixed CrawlStartExpert.html HTML validation errors

Validated with Nu Html Checker 17.11.1
Changed Files: htroot/CrawlStartExpert.html, locales/de.lng, locales/fr.lng, locales/hi.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, locales/sk.lng, locales/uk.lng, locales/zh.lng
Wed Feb 14 07:51:07 CET 2018
by luccioman
Adjusted last blacklist entry example for a more accurate description

As discussed in issue #160, blacklist entries currently cannot be
"complete" regular expressions, but must be structured as a domain
part, a separator character ('/'), and a path part.
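
As a minimal sketch of the entry structure described above, the following splits an entry at its first '/' into a domain part and a path part. The class BlacklistEntrySketch and the sample entry are hypothetical and only illustrate the layout; this is not YaCy's actual Blacklist code, and how each part is then matched is not shown.

```java
/** Hypothetical illustration only; not YaCy's actual Blacklist implementation. */
final class BlacklistEntrySketch {
    final String domainPart;
    final String pathPart;

    private BlacklistEntrySketch(String domainPart, String pathPart) {
        this.domainPart = domainPart;
        this.pathPart = pathPart;
    }

    /** Splits an entry at the first '/' separator into its domain and path parts. */
    static BlacklistEntrySketch parse(String entry) {
        int sep = entry.indexOf('/');
        if (sep < 0) {
            throw new IllegalArgumentException("Missing '/' separator: " + entry);
        }
        return new BlacklistEntrySketch(entry.substring(0, sep), entry.substring(sep + 1));
    }

    public static void main(String[] args) {
        // Made-up entry, used only to show the two parts.
        BlacklistEntrySketch e = parse("ads.example.org/banners/.*");
        System.out.println(e.domainPart + " | " + e.pathPart);
    }
}
```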
Changed Files: htroot/Blacklist_p.html, locales/de.lng, locales/fr.lng, locales/hi.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng, locales/zh.lng
Tue Feb 06 10:25:38 CET 2018
by luccioman
Added basic support for autotagging microdata annotated item types.

With the appropriate vocabulary settings in Vocabulary_p.html page, this
can produce Vocabulary search facets displaying item types referenced in
html documents by microdata annotation.
Tested notably, but not limited to, vocabulary classes/types defined by
Schema.org and Dublin Core.
Changed Files: defaults/yacy.init, htroot/Vocabulary_p.html, htroot/Vocabulary_p.java, source/net/yacy/cora/document/id/MultiProtocolURL.java, source/net/yacy/cora/language/synonyms/AutotaggingLibrary.java, source/net/yacy/cora/lod/vocabulary/Tagging.java, source/net/yacy/document/Condenser.java, source/net/yacy/document/Document.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, test/java/net/yacy/cora/language/synonyms/AutotaggingLibraryTest.java
Sat Dec 23 18:56:17 CET 2017
by luccioman
Added optional search parameter/setting to control content domain filter

This makes it possible to choose, in the configuration or per search
request, whether to extend results beyond the strict content domain
filter (image, video, audio or application).

Related graphical controls remain to be added to the user interface.
Changed Files: defaults/yacy.init, htroot/yacy/search.java, htroot/yacysearch.java, htroot/yacysearchitem.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/RemoteSearch.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java
Tue Dec 19 13:52:05 CET 2017
by luccioman
Do locale independent case conversion on hosts, schemes, and file exts.

Required for proper operation when the default system locale is Turkish,
as dotless and dotted i characters have specific case conversion rules
in this language.
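
The locale problem described above can be reproduced with plain JDK calls. The sketch below assumes a hypothetical helper class LocaleSafeCaseSketch (not a YaCy class) and uses Locale.ROOT so that the conversion does not depend on the system default locale.

```java
import java.util.Locale;

/** Hypothetical helper, shown only to illustrate locale independent case conversion. */
final class LocaleSafeCaseSketch {

    /** Lower-cases a technical string (host, scheme, file extension) with locale neutral rules. */
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // With Turkish rules, 'I' maps to dotless 'ı' (U+0131), so "FILE" does not become "file".
        System.out.println("FILE".toLowerCase(turkish).equals("file")); // false
        System.out.println(lower("FILE").equals("file"));               // true
    }
}
```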
Changed Files: htroot/api/ymarks/get_metadata.java, source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/crawler/robots/RobotsTxtEntry.java, source/net/yacy/peers/SeedDB.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/repository/FilterEngine.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/snippet/MediaSnippet.java, source/net/yacy/server/http/HTTPDProxyHandler.java
Fri Dec 15 11:28:46 CET 2017
by luccioman
Started implementing optional https preference for protocol operations

Introduced through the new configurable setting
network.unit.protocol.https.preferred, defaulting to false for now.

Allows preferring https, when available on remote peers, to perform
YaCy protocol operations, notably hello or transferRWI.

Not yet implemented for every YaCy protocol operation.

Changed Files: defaults/yacy.init, htroot/MessageSend_p.java, htroot/Network.java, htroot/yacy/hello.java, source/net/yacy/peers/Network.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/Seed.java, source/net/yacy/search/SwitchboardConstants.java
Sat Dec 09 22:29:35 CET 2017
by Michael Peter Christen
added a crawl filter based on <div> tag class names
When a crawl is started, a new field is available to exclude content
from scraping. The content to exclude is identified by the class name
of div tags. All text contained in a div tag whose class matches one of
the configured class names is not indexed, while the rest of the page is.
Changed Files: htroot/CrawlStartExpert.html, htroot/CrawlStartExpert.java, htroot/Crawler_p.java, htroot/QuickCrawlLink_p.java, source/net/yacy/crawler/CrawlSwitchboard.java, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/data/BookmarkHelper.java, source/net/yacy/data/ymark/YMarkCrawlStart.java, source/net/yacy/document/AbstractParser.java, source/net/yacy/document/Parser.java, source/net/yacy/document/TextParser.java, source/net/yacy/document/importer/MediawikiImporter.java, source/net/yacy/document/parser/bzipParser.java, source/net/yacy/document/parser/gzipParser.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/html/ScraperInputStream.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/document/parser/sevenzipParser.java, source/net/yacy/document/parser/tarParser.java, source/net/yacy/document/parser/zipParser.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/DocumentIndex.java, test/java/net/yacy/document/parser/html/ContentScraperTest.java, test/java/net/yacy/document/parser/htmlParserTest.java
Fri Dec 08 15:12:08 CET 2017
by luccioman
Removed use of deprecated Jetty IPAccessHandler for client filtering.

Upgraded to InetAccessHandler.
Added InetPathAccessHandler extension to InetAccessHandler to maintain
path patterns capability previously available in IPAccessHandler but
lost in InetAccessHandler.

Filtering on IPv6 addresses is now supported.

Support for deprecated pattern formats such as "192.168." and
"" has been removed, but an automated migration at startup
should convert any such patterns still present in serverClient.
Changed Files: defaults/yacy.init, htroot/SettingsAck_p.java, htroot/Settings_ServerAccess.inc, locales/ru.lng, source/net/yacy/http/InetPathAccessHandler.java, source/net/yacy/http/Jetty9HttpServerImpl.java, source/net/yacy/migration.java, source/net/yacy/yacy.java, test/java/net/yacy/http/InetPathAccessHandlerTest.java, test/java/net/yacy/migrationTest.java
Thu Dec 07 00:24:33 CET 2017
by reger
upd to Jetty-9.4.8.v20171121
Changed Files: .classpath, build.xml, lib/jetty-client-9.4.8.v20171121.jar, lib/jetty-continuation-9.4.8.v20171121.jar, lib/jetty-deploy-9.4.8.v20171121.jar, lib/jetty-http-9.4.8.v20171121.jar, lib/jetty-io-9.4.8.v20171121.jar, lib/jetty-jmx-9.4.8.v20171121.jar, lib/jetty-proxy-9.4.8.v20171121.jar, lib/jetty-security-9.4.8.v20171121.jar, lib/jetty-server-9.4.8.v20171121.jar, lib/jetty-servlet-9.4.8.v20171121.jar, lib/jetty-servlets-9.4.8.v20171121.jar, lib/jetty-util-9.4.8.v20171121.jar, lib/jetty-webapp-9.4.8.v20171121.jar, lib/jetty-xml-9.4.8.v20171121.jar, pom.xml
Mon Dec 04 08:48:37 CET 2017
by luccioman
Use HTTP Post operation for resetting memory monitoring state.

Fixes issue #145

Also added textual hint on the button, and display it only when it makes
sense, that is to say when the memory state is 'exhausted'.
Changed Files: htroot/PerformanceQueues_p.java, htroot/Performance_p.html, locales/de.lng, locales/fr.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng, locales/zh.lng
Fri Nov 24 14:10:41 CET 2017
by luccioman
Made it possible to use https for remote search on peers with SSL enabled.

Default is still http to prevent any regressions, but a new setting is
available to choose https as the preferred protocol to perform remote
searches.
The new configuration setting 'remotesearch.https.preferred' can be
edited manually in the yacy.conf file or in the Advanced Properties page.
Should be enabled as default in the future for improved privacy.
Https could also eventually be used for other peer communications.
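
A minimal sketch of how such a preference could drive the protocol choice, assuming the setting is read from a java.util.Properties object and that the caller already knows whether the remote peer has SSL enabled. Only the key name 'remotesearch.https.preferred' comes from this entry; the class RemoteSearchSchemeSketch and its method are hypothetical.

```java
import java.util.Properties;

/** Hypothetical sketch; not the actual logic in YaCy's Protocol or Seed classes. */
final class RemoteSearchSchemeSketch {
    static final String KEY = "remotesearch.https.preferred";

    /** Returns "https" only when the setting is enabled and the peer has SSL enabled. */
    static String scheme(Properties config, boolean peerHasSsl) {
        // Defaults to false, matching the "default is still http" behaviour.
        boolean preferHttps = Boolean.parseBoolean(config.getProperty(KEY, "false"));
        return (preferHttps && peerHasSsl) ? "https" : "http";
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(KEY, "true");
        System.out.println(scheme(config, true));
        System.out.println(scheme(config, false));
    }
}
```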
Changed Files: defaults/yacy.init, source/net/yacy/cora/federate/solr/instance/RemoteInstance.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/Seed.java, source/net/yacy/search/SwitchboardConstants.java
Thu Nov 09 09:30:20 CET 2017
by luccioman
Upgraded com.twelvemonkeys.imageio dependencies from 3.3.1 to 3.3.2
Changed Files: .classpath, build.xml, lib/common-image-3.3.2.jar, lib/common-io-3.3.2.jar, lib/common-lang-3.3.2.jar, lib/imageio-bmp-3.3.2.jar, lib/imageio-core-3.3.2.jar, lib/imageio-metadata-3.3.2.jar, lib/imageio-tiff-3.3.2.jar, pom.xml
Thu Oct 26 07:51:18 CEST 2017
by luccioman
Enable HTTP Digest authentication for non admin users.

Also ensure authentication is not lost by Digest timeout when navigating
between index.html and search results page.

This way, running searches with extended features on a remote peer or a
password protected peer works with a regular user (with "Extended
search" rights). 
When authenticating on the search page with a user without "Extended
search" rights, it appears as authenticated, but has just its usual
access to the public search features.
Changed Files: htroot/Blog.java, htroot/BlogComments.java, htroot/User.java, htroot/env/templates/header.template, htroot/env/templates/simpleSearchHeader.template, htroot/index.html, htroot/index.java, htroot/yacysearch.html, htroot/yacysearch.java, htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchtrailer.java, source/net/yacy/data/UserDB.java, source/net/yacy/search/Switchboard.java
Sun Oct 22 20:00:00 CEST 2017
by reger
upd to Solr 6.6.2
Changed Files: .classpath, build.xml, lib/lucene-analyzers-common-6.6.2.jar, lib/lucene-analyzers-phonetic-6.6.2.jar, lib/lucene-backward-codecs-6.6.2.jar, lib/lucene-classification-6.6.2.jar, lib/lucene-codecs-6.6.2.jar, lib/lucene-core-6.6.2.jar, lib/lucene-grouping-6.6.2.jar, lib/lucene-highlighter-6.6.2.jar, lib/lucene-join-6.6.2.jar, lib/lucene-memory-6.6.2.jar, lib/lucene-misc-6.6.2.jar, lib/lucene-queries-6.6.2.jar, lib/lucene-queryparser-6.6.2.jar, lib/lucene-spatial-extras-6.6.2.jar, lib/lucene-suggest-6.6.2.jar, lib/solr-core-6.6.2.jar, lib/solr-solrj-6.6.2.jar, pom.xml
Sat Oct 21 10:57:36 CEST 2017
by luccioman
Added an optional login link/status to the search public top nav bar.

This provides a more convenient way (without the need to go to the admin
section) to log in when searching on your remote or password protected
peer and benefit from extended search features such as Heuristics,
Bookmarking or JavaScript resorting.

Can be disabled on the ConfigSearchPage_p.html page.
Changed Files: defaults/yacy.init, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/env/templates/simpleSearchHeader.template, htroot/yacysearch.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java
Thu Oct 12 07:16:19 CEST 2017
by luccioman
Reduced number of search navigators refresh requests in JS resort mode

The SearchEvent listens to changes on each of its navigators, and the
information about their overall state is sent with each fetched search
item (as a "data-nav-generation" attribute). Then the browser can
regularly fetch a fresh version of yacysearchtrailer.html only if
necessary (when that nav-generation value changes).
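
The generation mechanism described above can be sketched as a simple counter. The class NavGenerationSketch and its method names are hypothetical; only the idea of a "nav-generation" value sent with each item comes from this entry, and the real comparison happens in the browser (yacysort.js), not in Java.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a change counter shared by all navigators of one search. */
final class NavGenerationSketch {
    private final AtomicLong generation = new AtomicLong();

    /** Would be called by a listener whenever any navigator's counts change. */
    void onNavigatorUpdated() {
        generation.incrementAndGet();
    }

    /** Value that would be sent with each fetched search item. */
    long currentGeneration() {
        return generation.get();
    }

    /** Client-side decision: refresh the navigators only when the value has changed. */
    static boolean needsRefresh(long lastSeen, long received) {
        return received != lastSeen;
    }

    public static void main(String[] args) {
        NavGenerationSketch nav = new NavGenerationSketch();
        long lastSeen = nav.currentGeneration();
        System.out.println(needsRefresh(lastSeen, nav.currentGeneration()));
        nav.onNavigatorUpdated();
        System.out.println(needsRefresh(lastSeen, nav.currentGeneration()));
    }
}
```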
Changed Files: htroot/js/yacysort.js, htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchtrailer.html, htroot/yacysearchtrailer.java, source/net/yacy/cora/sorting/ClusteredScoreMap.java, source/net/yacy/cora/sorting/ConcurrentScoreMap.java, source/net/yacy/cora/sorting/OrderedScoreMap.java, source/net/yacy/cora/sorting/ScoreMap.java, source/net/yacy/cora/sorting/ScoreMapUpdatesListener.java, source/net/yacy/search/navigator/Navigator.java, source/net/yacy/search/navigator/StringNavigator.java, source/net/yacy/search/query/SearchEvent.java
Mon Oct 09 14:13:46 CEST 2017
by luccioman
Add a configurable limit to tags initially displayed in search results

When the limit is reached, a button allows expanding/collapsing the remaining
tags.

When this feature is activated without a limit to the number of
displayed tags, search results with a very large number of keywords
can make the results page almost unusable (very long vertical
scrollbar).
Changed Files: defaults/yacy.init, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/env/base.css, htroot/js/yacysearch.js, htroot/yacysearchitem.html, htroot/yacysearchitem.java, source/net/yacy/search/SwitchboardConstants.java
Sat Oct 07 12:29:55 CEST 2017
by Andreas
Merge pull request #3 from yacy/master

Fork update
Changed Files: .classpath, .travis.yml, build.xml, debian/control, htroot/ConfigBasic.html, htroot/ConfigBasic.java, htroot/ConfigSearchPage_p.html, lib/jetty-client-9.4.7.v20170914.jar, lib/jetty-continuation-9.4.7.v20170914.jar, lib/jetty-deploy-9.4.7.v20170914.jar, lib/jetty-http-9.4.7.v20170914.jar, lib/jetty-io-9.4.7.v20170914.jar, lib/jetty-jmx-9.4.7.v20170914.jar, lib/jetty-proxy-9.4.7.v20170914.jar, lib/jetty-security-9.4.7.v20170914.jar, lib/jetty-server-9.4.7.v20170914.jar, lib/jetty-servlet-9.4.7.v20170914.jar, lib/jetty-servlets-9.4.7.v20170914.jar, lib/jetty-util-9.4.7.v20170914.jar, lib/jetty-webapp-9.4.7.v20170914.jar, lib/jetty-xml-9.4.7.v20170914.jar, pom.xml, source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java, source/net/yacy/cora/util/Html2Image.java, source/net/yacy/data/Translator.java, source/net/yacy/document/parser/bzipParser.java, source/net/yacy/kelondro/data/word/WordReferenceRow.java, source/net/yacy/kelondro/data/word/WordReferenceVars.java, source/net/yacy/search/schema/WebgraphConfiguration.java, test/java/net/yacy/document/parser/bzipParserTest.java, test/java/net/yacy/kelondro/rwi/ReferenceContainerTest.java, test/parsertest/umlaute_html_utf8.html.bz2, test/parsertest/umlaute_html_xml_txt_gnu.tbz2, test/parsertest/umlaute_linux.txt.bz2
Mon Oct 02 00:50:30 CEST 2017
by reger
upd to Jetty-9.4.7.v20170914
Changed Files: .classpath, build.xml, lib/jetty-client-9.4.7.v20170914.jar, lib/jetty-continuation-9.4.7.v20170914.jar, lib/jetty-deploy-9.4.7.v20170914.jar, lib/jetty-http-9.4.7.v20170914.jar, lib/jetty-io-9.4.7.v20170914.jar, lib/jetty-jmx-9.4.7.v20170914.jar, lib/jetty-proxy-9.4.7.v20170914.jar, lib/jetty-security-9.4.7.v20170914.jar, lib/jetty-server-9.4.7.v20170914.jar, lib/jetty-servlet-9.4.7.v20170914.jar, lib/jetty-servlets-9.4.7.v20170914.jar, lib/jetty-util-9.4.7.v20170914.jar, lib/jetty-webapp-9.4.7.v20170914.jar, lib/jetty-xml-9.4.7.v20170914.jar, pom.xml
Fri Sep 29 23:22:39 CEST 2017
by Andreas
Merge pull request #2 from yacy/master

Merge #2
Changed Files: htroot/env/templates/header.template, htroot/index.html, htroot/yacysearch.html, htroot/yacysearch.java, htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchlatestinfo.java, htroot/yacysearchtrailer.java, locales/de.lng, locales/fr.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/QueryParams.java
Fri Sep 29 19:18:12 CEST 2017
by luccioman
Ensure private search features are not lost on Digest auth timeout

This is a fix for mantis 766 ( http://mantis.tokeek.de/view.php?id=766 )

Since the upgrade to Digest authentication, access to protected search
features was indeed disabled once the Digest nonce timed out.

After the Digest auth timeout, the browser no longer sent authentication
information and, as the search results page is not private, protected
features were simply hidden without asking the browser again for
authentication.
Adding a supplementary parameter when accessing the search results as
authenticated fixes this.
Changed Files: htroot/env/templates/header.template, htroot/index.html, htroot/yacysearch.html, htroot/yacysearch.java, htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchlatestinfo.java, htroot/yacysearchtrailer.java, source/net/yacy/search/query/QueryParams.java
Wed Sep 27 23:32:00 CEST 2017
by Andreas
Merge pull request #1 from yacy/master

Changed Files: .classpath, build.xml, defaults/yacy.init, htroot/ConfigPortal_p.html, htroot/ConfigPortal_p.java, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/HostBrowser.html, htroot/IndexControlRWIs_p.java, htroot/env/templates/footer.template, htroot/env/templates/submenuDesign.template, htroot/env/yacysort.css, htroot/js/accessibleHistogram.js, htroot/js/raphael.min.js, htroot/js/yacysearch.js, htroot/js/yacysort.js, htroot/jslicense.html, htroot/yacysearch.html, htroot/yacysearch.java, htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchtrailer.html, htroot/yacysearchtrailer.java, lib/jsonic-1.3.10.jar, lib/lucene-analyzers-common-6.6.1.jar, lib/lucene-analyzers-phonetic-6.6.1.jar, lib/lucene-backward-codecs-6.6.1.jar, lib/lucene-classification-6.6.1.jar, lib/lucene-codecs-6.6.1.jar, lib/lucene-core-6.6.1.jar, lib/lucene-grouping-6.6.1.jar, lib/lucene-highlighter-6.6.1.jar, lib/lucene-join-6.6.1.jar, lib/lucene-memory-6.6.1.jar, lib/lucene-misc-6.6.1.jar, lib/lucene-queries-6.6.1.jar, lib/lucene-queryparser-6.6.1.jar, lib/lucene-spatial-extras-6.6.1.jar, lib/lucene-suggest-6.6.1.jar, lib/solr-core-6.6.1.jar, lib/solr-solrj-6.6.1.jar, pom.xml, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/query/QueryParams.java
Mon Sep 25 09:19:08 CEST 2017
by luccioman
Made the dates navigator max elements number user configurable.

Also used object properties on QueryParams instances, rather than using
mutable class (static) properties.
Changed Files: defaults/yacy.init, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/IndexControlRWIs_p.java, htroot/yacy/search.java, htroot/yacysearch.java, htroot/yacysearchtrailer.java, source/net/yacy/cora/federate/FederateSearchManager.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/query/QueryParams.java
Sun Sep 17 08:25:14 CEST 2017
by reger
update jars for upd solr 6.6. commit for ant
Changed Files: lib/jsonic-1.3.10.jar, lib/lucene-analyzers-common-6.6.1.jar, lib/lucene-analyzers-phonetic-6.6.1.jar, lib/lucene-backward-codecs-6.6.1.jar, lib/lucene-classification-6.6.1.jar, lib/lucene-codecs-6.6.1.jar, lib/lucene-core-6.6.1.jar, lib/lucene-grouping-6.6.1.jar, lib/lucene-highlighter-6.6.1.jar, lib/lucene-join-6.6.1.jar, lib/lucene-memory-6.6.1.jar, lib/lucene-misc-6.6.1.jar, lib/lucene-queries-6.6.1.jar, lib/lucene-queryparser-6.6.1.jar, lib/lucene-spatial-extras-6.6.1.jar, lib/lucene-suggest-6.6.1.jar, lib/solr-core-6.6.1.jar, lib/solr-solrj-6.6.1.jar
Wed Sep 06 16:58:40 CEST 2017
by luccioman
Improved search navigators counters accuracy and consistency.

- added some missing increments from RWI results
- decrement relevant navigator counts when solr or RWI results are
evicted because of duplicates detection or constraints checked belatedly
- do not compute facets when unnecessary to avoid unwanted CPU load
- do not increment from facets when already done
- do not rely on facets on remote solr peers requests, as most of the
time only a limited part of their total results is fetched (thus also
preventing unnecessary load on remote peers)
- use a concurrency friendly score map for the dates navigators to
prevent unwanted ConcurrentModificationExceptions

This improves the situation for the most obvious inconsistencies in
search navigators counts, but more has to be done for a true accuracy
(notably when query modifiers constraints are applied belatedly - after
the solr or RWI retrieval request - such as the content domain
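
As an illustration of the "concurrency friendly score map" point in the list above, here is a minimal counter map built on JDK classes only. It is a sketch under assumptions: the class ConcurrentCountsSketch is hypothetical and YaCy's own ConcurrentScoreMap may be implemented differently. Iterating over a ConcurrentHashMap is weakly consistent, so concurrent updates do not raise ConcurrentModificationException.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch; YaCy's own ConcurrentScoreMap may be implemented differently. */
final class ConcurrentCountsSketch {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Increments the count of a key, creating it when missing. */
    void inc(String key) {
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Decrements the count of a key, for example when a result is evicted. */
    void dec(String key) {
        LongAdder adder = counts.get(key);
        if (adder != null) adder.decrement();
    }

    long get(String key) {
        LongAdder adder = counts.get(key);
        return adder == null ? 0L : adder.sum();
    }

    /** Two threads increment the same key while the calling thread iterates over the keys. */
    static long parallelDemo() {
        ConcurrentCountsSketch map = new ConcurrentCountsSketch();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) map.inc("2017-09-06");
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        for (String key : map.counts.keySet()) {
            map.get(key); // safe while other threads are still writing
        }
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get("2017-09-06");
    }

    public static void main(String[] args) {
        System.out.println(parallelDemo());
    }
}
```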
Changed Files: htroot/yacysearchtrailer.java, source/net/yacy/cora/federate/AbstractFederateSearchConnector.java, source/net/yacy/cora/sorting/ConcurrentScoreMap.java, source/net/yacy/kelondro/data/word/WordReferenceVars.java, source/net/yacy/kelondro/util/ISO639.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/RemoteSearch.java, source/net/yacy/search/query/SearchEvent.java
Thu Aug 31 07:37:24 CEST 2017
by luccioman
Use final results counts in progress bar detailed statistics.

Using unfiltered detailed counts (local and remote entries found before
doubles detection and before applying query modifiers) was confusing and
inconsistent with the total count. It could lead users to think more
results are to come in the next pages, without understanding why they are not
Changed Files: htroot/js/yacysearch.js, htroot/yacysearch.html, htroot/yacysearch.java, htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchlatestinfo.java, htroot/yacysearchlatestinfo.json, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Thu Aug 24 18:47:18 CEST 2017
by luccioman
Removed some unnecessary uses of java.lang.reflect api.

This improves code browsing and readability, making search by references
or call hierarchy IDE features more accurate.
Changed Files: htroot/ConfigBasic.java, htroot/api/ymarks/import_ymark.java, source/net/yacy/contentcontrol/ContentControlFilterUpdateThread.java, source/net/yacy/contentcontrol/SMWListSyncThread.java, source/net/yacy/kelondro/workflow/InstantBusyThread.java, source/net/yacy/kelondro/workflow/OneTimeBusyThread.java, source/net/yacy/peers/OnePeerPingBusyThread.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/query/SearchEvent.java
Mon Aug 21 09:38:20 CEST 2017
by luccioman
Improved parsing support for OOXML spreadsheets (.xlsx)

As reported by edycop in mantis 765 (
http://mantis.tokeek.de/view.php?id=765 ), parsing of xlsx files was
quite incomplete.
Now properly supports the "Shared String Table" entry in Office Open XML
spreadsheets, and also detects embedded URLs.

Integrating the Apache poi-ooxml library could be an option for finer
OOXML formats support, but their SAX style parsing example (
http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api ) tends to
show that a custom SAX handler is still efficient for lightweight and
low memory footprint processing.
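
To illustrate the kind of lightweight SAX handling mentioned above, here is a minimal sketch that collects the strings of a shared string table (one si item per string, with its text in t elements). The class SharedStringsSketch is hypothetical and is not the OOXMLSharedStringsHandler added by this commit; it ignores details such as phonetic runs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of a SAX handler collecting shared strings, in document order. */
final class SharedStringsSketch extends DefaultHandler {
    final List<String> strings = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private boolean inText;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("si".equals(qName)) {
            current.setLength(0); // a new shared string item starts
        } else if ("t".equals(qName)) {
            inText = true;        // text run inside the current item
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inText) current.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("t".equals(qName)) {
            inText = false;
        } else if ("si".equals(qName)) {
            strings.add(current.toString()); // rich text runs are concatenated
        }
    }

    /** Parses the XML content of a shared strings part given as a string. */
    static List<String> parse(String xml) {
        try {
            SharedStringsSketch handler = new SharedStringsSketch();
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.strings;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Cells of type "s" in a worksheet then hold an index into this list rather than the text itself, which is why skipping the table leaves most text cells unparsed.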
Changed Files: source/net/yacy/document/parser/ooxmlParser.java, source/net/yacy/document/parser/xml/GenericXMLContentHandler.java, source/net/yacy/document/parser/xml/OOXMLSharedStringsHandler.java, source/net/yacy/document/parser/xml/OOXMLSpreeadsheetHandler.java, test/java/net/yacy/document/ParserTest.java, test/java/net/yacy/document/parser/ooxmlParserTest.java, test/parsertest/umlaute_linux.ppsx, test/parsertest/umlaute_linux.xlsx
Mon Aug 14 14:57:58 CEST 2017
by luccioman
Implemented partial stream parsing of tar archives.

Also added JUnit tests for the tar parser and fixed unwanted use of the
tar parser as a fallback on files included in a tar archive.
Changed Files: source/net/yacy/document/parser/tarParser.java, test/java/net/yacy/document/parser/tarParserTest.java, test/parsertest/umlaute_dc_xml_iso.xml, test/parsertest/umlaute_dc_xml_utf8.xml, test/parsertest/umlaute_html_iso.html, test/parsertest/umlaute_html_utf8.html, test/parsertest/umlaute_html_xml_txt_gnu.tar, test/parsertest/umlaute_html_xml_txt_pax.tar, test/parsertest/umlaute_html_xml_txt_ustar.tar, test/parsertest/umlaute_html_xml_txt_v7.tar, test/parsertest/umlaute_linux.txt
Fri Aug 11 20:50:36 CEST 2017
by luccioman
Fixed missing transitive dependency to commons-collections4-4.1

Dependency required by poi-3.16. 

Dependency was not provided in YaCy but already defined on previous poi
versions. This only became problematic since upgrade from poi-3.15 to
poi-3.16 (commit dedc6552d37b5e877258abddac9621f7fe75bf9b). Indeed in
this new poi release, a poi component used in some YaCy parsers code
paths now explicitly needs a class from the commons-collections4
library : org.apache.poi.hpsf.Section uses now

Impacted YaCy parsers : xlsParser, pptParser, docParser.

Issue detected by the following JUnit tests failing :
ParserTest.testpptParsers(), ParserTest.testdocParsers(),
Changed Files: .classpath, lib/commons-collections4-4.1.License, lib/commons-collections4-4.1.jar
Sat Jul 08 09:04:03 CEST 2017
by luccioman
Started support of partial parsing on large streamed resources.

This enables the getpageinfo_p API to return something in a reasonable
amount of time on resources in the megabyte size range and above.
Support is added first to the generic XML parser; for other formats
regular crawler limits apply as usual.
Changed Files: htroot/api/getpageinfo_p.java, source/net/yacy/crawler/retrieval/StreamResponse.java, source/net/yacy/document/AbstractParser.java, source/net/yacy/document/Document.java, source/net/yacy/document/Parser.java, source/net/yacy/document/TextParser.java, source/net/yacy/document/parser/GenericXMLParser.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/xml/GenericXMLContentHandler.java, source/net/yacy/kelondro/util/FileUtils.java, source/net/yacy/repository/LoaderDispatcher.java, test/java/net/yacy/document/parser/GenericXMLParserTest.java, test/java/net/yacy/document/parser/html/ContentScraperTest.java
Sat Jul 01 23:58:28 CEST 2017
by reger
upd to Jetty 9.4.6.v20170531
Adapted loginservice to the changes in Jetty, partially based on pull
request #101 https://github.com/yacy/yacy_search_server/pull/101 by @automenta
Changed Files: .classpath, build.xml, htroot/ConfigUser_p.java, lib/jetty-client-9.4.6.v20170531.jar, lib/jetty-continuation-9.4.6.v20170531.jar, lib/jetty-deploy-9.4.6.v20170531.jar, lib/jetty-http-9.4.6.v20170531.jar, lib/jetty-io-9.4.6.v20170531.jar, lib/jetty-jmx-9.4.6.v20170531.jar, lib/jetty-proxy-9.4.6.v20170531.jar, lib/jetty-security-9.4.6.v20170531.jar, lib/jetty-server-9.4.6.v20170531.jar, lib/jetty-servlet-9.4.6.v20170531.jar, lib/jetty-servlets-9.4.6.v20170531.jar, lib/jetty-util-9.4.6.v20170531.jar, lib/jetty-webapp-9.4.6.v20170531.jar, lib/jetty-xml-9.4.6.v20170531.jar, lib/jetty.License, pom.xml, source/net/yacy/http/Jetty9HttpServerImpl.java, source/net/yacy/http/MonitorHandler.java, source/net/yacy/http/YaCyLegacyCredential.java, source/net/yacy/http/YaCyLoginService.java
Tue Jun 27 06:42:33 CEST 2017
by luccioman
Ensure lower case conversion consistency with any default locale.

Especially for Turkish speaking users using "tr" as their system default
locale : strings for technical stuff (URLs, tag names, constants...)
must not be lower cased with the default locale, as 'I' does not become
'i' like in other locales such as "en", but becomes 'ı' (dotless i).
Changed Files: htroot/ConfigHeuristics_p.java, htroot/Crawler_p.java, htroot/api/blacklists/add_entry_p.java, htroot/api/blacklists/delete_entry_p.java, htroot/api/getpageinfo_p.java, htroot/api/ymarks/add_ymark.java, source/net/yacy/cora/protocol/RequestHeader.java, source/net/yacy/cora/protocol/http/HTTPClient.java, source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/data/wiki/WikiCode.java, source/net/yacy/document/Document.java, source/net/yacy/document/content/SurrogateReader.java, source/net/yacy/document/parser/html/TransformerWriter.java, source/net/yacy/document/parser/rdfa/impl/RDFaTripleImpl.java, source/net/yacy/gui/framework/Browser.java, source/net/yacy/http/AbstractRemoteHandler.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/kelondro/util/Formatter.java, source/net/yacy/kelondro/util/ISO639.java, source/net/yacy/kelondro/util/OS.java, source/net/yacy/peers/Network.java, source/net/yacy/peers/operation/yacyRelease.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java, source/net/yacy/search/schema/WebgraphSchema.java, source/net/yacy/server/serverObjects.java, source/net/yacy/utils/translation/TranslatorXliff.java, source/net/yacy/yacy.java, test/java/net/yacy/document/parser/htmlParserTest.java
Mon Jun 26 16:30:21 CEST 2017
by luccioman
Added a generic XML parser, able to parse elements text and URLs.

This parser adds support for any XML based format other than already
supported XML vocabularies such as XHTML, RSS/Atom feeds... It will
eventually be used as a fallback if one of these specific parsers fails,
before falling back to the existing genericParser, which extracts little
useful information apart from URL tokens.
Changed Files: source/net/yacy/document/TextParser.java, source/net/yacy/document/parser/GenericXMLParser.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/xml/GenericXMLContentHandler.java, source/net/yacy/kelondro/io/CharBuffer.java, test/java/net/yacy/document/parser/GenericXMLParserTest.java, test/parsertest/umlaute_dc_xml_iso.xml, test/parsertest/umlaute_dc_xml_utf8.xml
Tue Jun 20 09:21:55 CEST 2017
by luccioman
Cleaned up memory usage page HTML

- fixed validation errors
- removed deprecated attributes
- improved accessibility with richer table semantics (headers and
caption elements) and language declaration
Changed Files: htroot/PerformanceMemory_p.html, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/master.lng.xlf, locales/ru.lng, locales/sk.lng, locales/uk.lng
Wed Jun 14 09:13:50 CEST 2017
by luccioman
Limit the synchronization blocking time on some Cache operations.

Using a Reentrant lock instead of the intrinsic synchronization lock
permits limiting the blocking time to acquire a lock.

Useful on a very busy Cache concurrently accessed by many threads : when
the time to acquire a lock is too high, getting/storing content on the
cache becomes inefficient, and it is then better to fall back to loading
remote resources.

Illustrated by the CacheTest stress test and some traces reported in
mantis 751 ( http://mantis.tokeek.de/view.php?id=751 )
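
A minimal sketch of the bounded lock acquisition described above, using ReentrantLock.tryLock with a timeout. The class BoundedLockSketch, its method names and the timeout value are hypothetical and do not reproduce the actual Cache.java code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical sketch: give up on the cache when the lock cannot be acquired in time. */
final class BoundedLockSketch {
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs cacheRead under the lock if acquired within timeoutMs, otherwise runs fallback. */
    <T> T readOrFallback(long timeoutMs, Supplier<T> cacheRead, Supplier<T> fallback) {
        boolean acquired = false;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status, then fall back
        }
        if (!acquired) {
            return fallback.get();
        }
        try {
            return cacheRead.get();
        } finally {
            lock.unlock();
        }
    }

    /** Simulates a busy cache: another thread holds the lock, so the fallback is used. */
    static String contendedDemo() {
        BoundedLockSketch cache = new BoundedLockSketch();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            cache.lock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cache.lock.unlock();
            }
        });
        holder.start();
        try {
            held.await();
            return cache.readOrFallback(10, () -> "cache", () -> "remote");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            release.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println(new BoundedLockSketch().readOrFallback(10, () -> "cache", () -> "remote"));
        System.out.println(contendedDemo());
    }
}
```

With the intrinsic synchronized lock there is no equivalent of the timeout, so a waiting thread cannot choose the fallback path.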
Changed Files: source/net/yacy/crawler/data/Cache.java, source/net/yacy/kelondro/blob/ArrayStack.java, source/net/yacy/kelondro/blob/Compressor.java, source/net/yacy/search/Switchboard.java, test/java/net/yacy/crawler/data/CacheTest.java
Fri Jun 09 12:25:23 CEST 2017
by Michael Peter Christen
migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8
Also: now Version 1.921
Changed Files: .classpath, .settings/org.eclipse.jdt.core.prefs, build.properties, build.xml, defaults/solr/schema.xml, defaults/solr/solrconfig.xml, htroot/yacysearchtrailer.java, lib/commons-math3-3.4.1.jar, lib/lucene-analyzers-common-6.6.0.jar, lib/lucene-analyzers-phonetic-6.6.0.jar, lib/lucene-backward-codecs-6.6.0.jar, lib/lucene-classification-6.6.0.jar, lib/lucene-codecs-6.6.0.jar, lib/lucene-core-6.6.0.jar, lib/lucene-facet-6.6.0.jar, lib/lucene-grouping-6.6.0.jar, lib/lucene-highlighter-6.6.0.jar, lib/lucene-join-6.6.0.jar, lib/lucene-memory-6.6.0.jar, lib/lucene-misc-6.6.0.jar, lib/lucene-queries-6.6.0.jar, lib/lucene-queryparser-6.6.0.jar, lib/lucene-spatial-6.6.0.jar, lib/lucene-suggest-6.6.0.jar, lib/metrics-core-3.2.2.jar, lib/solr-core-6.6.0.jar, lib/solr-dataimporthandler-6.6.0.jar, lib/solr-solrj-6.6.0.jar, lib/spatial4j-0.6.jar, lib/zookeeper-3.4.10.jar, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/cora/federate/solr/instance/EmbeddedInstance.java, source/net/yacy/cora/federate/solr/instance/InstanceMirror.java, source/net/yacy/cora/federate/solr/instance/ServerMirror.java, source/net/yacy/cora/federate/solr/instance/ServerShard.java, source/net/yacy/cora/federate/solr/responsewriter/EnhancedXMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/FlatJSONResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/GrepHTMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/SnapshotImagesReponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java, source/net/yacy/http/servlets/GSAsearchServlet.java, source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/QueryParams.java, test/java/net/yacy/document/DateDetectionTest.java
Sat Jun 03 04:00:46 CEST 2017
by luccioman
Ensure file input streams are properly closed on both success and failure

Also add, when possible, a warning level log message on input stream
closing errors instead of failing silently. This could help in
understanding some IO exceptions such as "too many files open".
Changed Files: source/net/yacy/document/parser/images/bmpParser.java, source/net/yacy/document/parser/images/genericImageParser.java, source/net/yacy/document/parser/images/icoParser.java, source/net/yacy/gui/framework/Switchboard.java, source/net/yacy/kelondro/blob/Gap.java, source/net/yacy/kelondro/blob/HeapReader.java, source/net/yacy/kelondro/index/RowHandleMap.java, source/net/yacy/kelondro/index/RowHandleSet.java, source/net/yacy/kelondro/util/FileUtils.java, source/net/yacy/kelondro/util/SetTools.java, source/net/yacy/kelondro/util/XMLTables.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/search/AutoSearch.java, source/net/yacy/search/Switchboard.java, source/net/yacy/server/http/TemplateEngine.java, source/net/yacy/utils/PKCS12Tool.java, source/net/yacy/utils/cryptbig.java, source/net/yacy/utils/tarTools.java, source/net/yacy/utils/translation/TranslationManager.java, test/java/net/yacy/document/parser/htmlParserTest.java, test/java/net/yacy/document/parser/images/genericImageParserTest.java, test/java/net/yacy/document/parser/images/metadataImageParserTest.java, test/java/net/yacy/document/parser/pdfParserTest.java
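
The closing pattern described in this entry can be illustrated with a minimal sketch. It is not the code from the commit: the class name StreamClosingSketch and both method names are hypothetical, and java.util.logging stands in for whatever logger the project uses.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration class, not part of the YaCy sources.
final class StreamClosingSketch {

    private static final Logger LOG = Logger.getLogger(StreamClosingSketch.class.getName());

    /** Reads the whole stream and returns the byte count; the stream is closed on success and on failure. */
    static long countBytes(final InputStream in) throws IOException {
        try {
            final byte[] buffer = new byte[4096];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } finally {
            // Runs whether the loop finished normally or threw
            closeWithWarning(in);
        }
    }

    /** Closes the resource; a closing error is logged at warning level instead of being swallowed silently. */
    static boolean closeWithWarning(final Closeable resource) {
        try {
            resource.close();
            return true;
        } catch (final IOException e) {
            LOG.log(Level.WARNING, "Could not close stream", e);
            return false;
        }
    }

    public static void main(final String[] args) throws IOException {
        System.out.println(countBytes(new ByteArrayInputStream(new byte[10000])));
    }
}
```

Where the resource is created and used in the same method, a try-with-resources statement gives the same guarantee with less code; the explicit finally block is shown here only because it makes the warning on close failure visible.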
Fri Jun 02 12:14:29 CEST 2017
by luccioman
Ensure proper closing of file input streams.
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, source/net/yacy/cora/geo/OpenGeoDBLocation.java, source/net/yacy/cora/protocol/ftp/FTPClient.java, source/net/yacy/cora/storage/Files.java, source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/data/Translator.java, source/net/yacy/document/Condenser.java, source/net/yacy/document/Document.java, source/net/yacy/document/parser/pdfParser.java, source/net/yacy/http/Jetty9HttpServerImpl.java, source/net/yacy/utils/CryptoLib.java, source/net/yacy/utils/PKCS12Tool.java, source/net/yacy/utils/cryptbig.java, source/net/yacy/utils/gzip.java, source/net/yacy/yacy.java, test/java/net/yacy/document/ParserTest.java, test/java/net/yacy/document/parser/xlsParserTest.java
Fri Jun 02 01:00:21 CEST 2017
by reger
Introduce keyword query parameter
This enables the keyword navigator to filter on keywords. Added search page
output and layout config for keywords, allowing e.g. intranet installations
to display the keywords. No styling or links are applied to the keyword
text (though this would be desirable, possibly in combination with
bootstrap-tagsinput, for future/intranet use).
Changed Files: defaults/yacy.init, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/index.html, htroot/yacysearchitem.html, htroot/yacysearchitem.java, source/net/yacy/search/navigator/StringNavigator.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java
Mon May 15 13:15:16 CEST 2017
by luccioman
Added user interface feedback on results feeding termination status.

Added as an additional icon with title in the search progress bar, to
inform whether background search feeder threads have terminated or are
still running. While giving users a bit more information about the p2p
search process, this can help in choosing whether or not to wait a little
longer before going to the next page, in order to get results from
various sources sorted as well as possible (see #91 for a discussion
about sorting accuracy and network latency).

Other related modifications included:
 - regular updates to statistics in the progress bar until the
background feeders are completely terminated.
 - removed some uses of insecure and discouraged JavaScript elements
Changed Files: htroot/js/yacysearch.js, htroot/yacysearch.html, htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchlatestinfo.java, htroot/yacysearchlatestinfo.json, source/net/yacy/search/query/SearchEvent.java
Thu May 11 18:02:33 CEST 2017
by luccioman
Improved previous merge "Show ranking in HTML UI".

- added the new setting as configurable in the "Debug/Analysis" settings
page. Debug/analysis is its main purpose for now as there is currently
no nice and "understandable" ranking score info servlet (see forum
discussion http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5884 ) 
- render in the "Search Page Layout" page preview when enabled
- added constants
Changed Files: defaults/yacy.init, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/SettingsAck_p.java, htroot/Settings_Debug.inc, htroot/Settings_p.java, htroot/yacysearchitem.html, source/net/yacy/search/SwitchboardConstants.java
Fri Apr 14 14:32:44 CEST 2017
by luccioman
Extended Mediawiki dump import to remote URLs.

When using a public HTTP URL in /IndexImportMediawiki_p.html, the remote
file is now streamed and processed directly, allowing import of dumps of
several GB even with a low memory remote peer, and without the need to
manually download the dump file first.
Changed Files: bin/importmediawiki.sh, htroot/IndexImportMediawiki_p.html, htroot/IndexImportMediawiki_p.java, source/net/yacy/cora/document/id/MultiProtocolURL.java, source/net/yacy/crawler/retrieval/FileLoader.java, source/net/yacy/crawler/retrieval/SMBLoader.java, source/net/yacy/document/importer/MediawikiImporter.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/index/DocumentIndex.java
Thu Apr 06 21:18:01 CEST 2017
by reger
upd to Solr-5.5.4
Changed Files: .classpath, build.xml, lib/lucene-analyzers-common-5.5.4.jar, lib/lucene-analyzers-phonetic-5.5.4.jar, lib/lucene-backward-codecs-5.5.4.jar, lib/lucene-classification-5.5.4.jar, lib/lucene-codecs-5.5.4.jar, lib/lucene-core-5.5.4.jar, lib/lucene-facet-5.5.4.jar, lib/lucene-grouping-5.5.4.jar, lib/lucene-highlighter-5.5.4.jar, lib/lucene-join-5.5.4.jar, lib/lucene-memory-5.5.4.jar, lib/lucene-misc-5.5.4.jar, lib/lucene-queries-5.5.4.jar, lib/lucene-queryparser-5.5.4.jar, lib/lucene-spatial-5.5.4.jar, lib/lucene-suggest-5.5.4.jar, lib/solr-core-5.5.4.jar, lib/solr-solrj-5.5.4.jar, pom.xml
Tue Apr 04 00:59:26 CEST 2017
by reger
upd to pdfbox-2.0.5.jar and transitive dependency xmpcore-5.1.3.jar
required by metadata-extractor-2.10.1 (fix build.xml compiler warning)
Changed Files: .classpath, build.xml, lib/fontbox-2.0.5.License, lib/fontbox-2.0.5.jar, lib/pdfbox-2.0.5.License, lib/pdfbox-2.0.5.jar, lib/xmpcore-5.1.3.jar, lib/xmpcore-5.1.3.license, pom.xml
Mon Apr 03 11:34:49 CEST 2017
by luccioman
Set Config Portal as a private administration page.

Consistently with its required action from submission credentials, and
because external unauthenticated users do not need to access these
Changed Files: defaults/yacy.init, htroot/ConfigAppearance_p.html, htroot/ConfigPortal.java, htroot/ConfigPortal_p.html, htroot/ConfigPortal_p.java, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/env/templates/header.template, htroot/env/templates/submenuPortalConfiguration.template, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/hi.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng, source/net/yacy/http/servlets/GSAsearchServlet.java
Fri Mar 31 00:58:11 CEST 2017
by reger
Implement surrogate import from Warc archives (as first option handle
warc = Web ARChive File Format.
WARC files with the extension .warc, or compressed as .warc.gz, can be
placed in DATA/surrogate/in; the responses they contain are imported into
the index. The library used is stream based, so it can easily be extended
later to load WARC files from the net.
Changed Files: .classpath, build.xml, lib/jwat-archive-common-1.0.4.jar, lib/jwat-common-1.0.4.jar, lib/jwat-gzip-1.0.4.jar, lib/jwat-warc-1.0.4.jar, pom.xml, source/net/yacy/document/importer/WarcImporter.java, source/net/yacy/search/Switchboard.java
Sun Mar 26 11:48:00 CEST 2017
by luccioman
Enforced access controls on some administrative actions.

 - ensure use of HTTP POST method : HTTP GET should only be used for
information retrieval and not to perform server side effect operations
(see HTTP standard https://tools.ietf.org/html/rfc7231#section-4.2.1)
 - a transaction token is now required for these administrative form
submissions to ensure the request can not be included in an external
site and performed silently/by mistake by the user browser
Changed Files: bin/clearall.sh, bin/clearcache.sh, bin/clearindex.sh, bin/deleteurl.sh, bin/passwd.sh, bin/protectedPostApiCall.sh, htroot/ConfigAccounts_p.html, htroot/ConfigAccounts_p.java, htroot/ConfigProperties_p.html, htroot/ConfigProperties_p.java, htroot/ConfigUpdate_p.html, htroot/ConfigUpdate_p.java, htroot/IndexControlRWIs_p.html, htroot/IndexControlRWIs_p.java, htroot/IndexControlURLs_p.html, htroot/IndexControlURLs_p.java, htroot/IndexDeletion_p.html, htroot/IndexDeletion_p.java, htroot/IndexFederated_p.html, htroot/IndexFederated_p.java, htroot/PerformanceQueues_p.html, htroot/PerformanceQueues_p.java, htroot/Performance_p.html, htroot/Steering.html, htroot/Steering.java, htroot/env/templates/header.template, htroot/terminal_p.html, source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/data/BadTransactionException.java, source/net/yacy/data/TransactionManager.java, source/net/yacy/http/servlets/DisallowedMethodException.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/yacy.java, stopYACY.sh
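
The two checks described in this entry (POST only, plus a transaction token) can be sketched as follows. This is an illustration under stated assumptions, not the TransactionManager implementation: the class TransactionTokenSketch, its method names, the token length and the single-use behavior are all hypothetical.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration class, not part of the YaCy sources.
final class TransactionTokenSketch {

    private final SecureRandom random = new SecureRandom();

    /** Tokens handed out with rendered forms and not yet consumed. */
    private final Set<String> issued = ConcurrentHashMap.newKeySet();

    /** Creates an unpredictable token to embed as a hidden field in an administrative form. */
    String issueToken() {
        final byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        final String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        issued.add(token);
        return token;
    }

    /** Accepts the submission only for HTTP POST carrying a token that was issued and not yet used. */
    boolean isAllowed(final String httpMethod, final String token) {
        if (!"POST".equalsIgnoreCase(httpMethod)) {
            return false; // GET must not trigger server side effects
        }
        // Assumption of this sketch: a token is valid for one submission only
        return token != null && issued.remove(token);
    }

    public static void main(final String[] args) {
        final TransactionTokenSketch tokens = new TransactionTokenSketch();
        final String token = tokens.issueToken();
        System.out.println(tokens.isAllowed("GET", token));
        System.out.println(tokens.isAllowed("POST", token));
    }
}
```

A page on an external site cannot read the token embedded in the form, so a request it triggers in the user's browser fails the second check even when the browser sends valid credentials.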
Tue Mar 21 17:15:01 CET 2017
by luccioman
Updated shell scripts to be compatible with HTTP Digest authentication

Because curl and wget do not allow passing a hashed password as a parameter,
YaCy shell scripts which require authentication are now interactive by
default when HTTP Digest is the only available authentication method.
Batch mode is still available through the use of an environment variable.

Other improvements :
 - added backward compatibility for Basic Authentication
 - fixed curl/wget presence detection 
 - do not return with exit code 0 when an API call failed, and print an
error message when the case occurs
 - documented available authentication options for API calls
Changed Files: bin/apicall.sh, bin/apicat.sh, bin/down.sh, bin/passwd.sh, bin/search1.sh, stopYACY.sh
Sun Mar 19 02:30:08 CET 2017
by reger
Introduce the option to configure a shutdown port.
A port value of -1 will disable this option.

If set to a value greater than 0, YaCy listens on this port on the local
loopback address for a shutdown or restart signal.
E.g. connecting to http://localhost:8005/shutdown will stop the YaCy server;
http://localhost:8005/restart will restart it.
This option allows stopping YaCy locally, independently of the web
frontend (which might be configured for password protected remote access).

Changed Files: defaults/yacy.init, htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_ServerAccess.inc, htroot/Settings_p.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/server/serverSwitch.java
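
A rough sketch of how such a listener could behave is given below. It is not the code from the commit: the class ShutdownPortSketch, the Action enum and both method names are hypothetical, and only the loopback binding, the disabling port value and the /shutdown and /restart paths come from the entry above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration class, not part of the YaCy sources.
final class ShutdownPortSketch {

    enum Action { SHUTDOWN, RESTART, NONE }

    /** Opens a listener bound to the loopback address only, or returns null when the option is disabled. */
    static ServerSocket openListener(final int port) throws IOException {
        if (port <= 0) {
            return null; // -1 disables the option; only values greater than 0 are listened on
        }
        return new ServerSocket(port, 1, InetAddress.getLoopbackAddress());
    }

    /** Maps an HTTP request line such as "GET /shutdown HTTP/1.1" to the requested action. */
    static Action parseAction(final String requestLine) {
        if (requestLine == null) {
            return Action.NONE;
        }
        final String[] parts = requestLine.trim().split("\\s+");
        if (parts.length < 2) {
            return Action.NONE;
        }
        switch (parts[1]) {
            case "/shutdown":
                return Action.SHUTDOWN;
            case "/restart":
                return Action.RESTART;
            default:
                return Action.NONE;
        }
    }

    public static void main(final String[] args) {
        System.out.println(parseAction("GET /shutdown HTTP/1.1"));
        System.out.println(parseAction("GET /restart HTTP/1.1"));
    }
}
```

Binding to the loopback address is what keeps the signal local: connections from other hosts never reach the listener, whatever the web frontend's access settings are.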
Sat Mar 18 20:02:26 CET 2017
by reger
add switchboardconstants for server ports config keys
Changed Files: htroot/ConfigBasic.java, htroot/QuickCrawlLink_p.java, htroot/SettingsAck_p.java, htroot/api/snapshot.java, source/net/yacy/gui/Tray.java, source/net/yacy/http/Jetty9HttpServerImpl.java, source/net/yacy/migration.java, source/net/yacy/peers/Network.java, source/net/yacy/peers/Seed.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/utils/upnp/UPnP.java, source/net/yacy/yacy.java
Tue Feb 28 18:11:54 CET 2017
by luccioman
Privacy enhancement : added settings to control referrer policy.

HTTP "Referer" header sent by the browser when using YaCy can now be
controlled either with the referrer meta tag as a global policy, or only
for search result links by adding the attribute rel="noreferrer".

To improve privacy with the fewest possible regressions, the default is
set as meta tag with value "origin-when-cross-origin": internal YaCy
links behavior is not affected, but when visiting external websites the
referrer URL is not empty but stripped of its query parameters and path.

Older browsers, Safari, MS IE and Edge do not support the referrer meta
tag, so the standard but less flexible noreferrer link type can also be
enabled as an alternative.

User-friendly settings page to be implemented.
Changed Files: defaults/yacy.init, htroot/env/templates/metas.template, htroot/yacysearchitem.html, htroot/yacysearchitem.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/search/SwitchboardConstants.java
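
To make the effect of "origin-when-cross-origin" concrete, the sketch below computes the referrer a supporting browser would send under that policy. It only models browser behavior for illustration; the class ReferrerPolicySketch and its methods are hypothetical and not part of YaCy, and the URLs in main are made up.

```java
import java.net.URI;

// Hypothetical illustration class modelling the browser side of the policy.
final class ReferrerPolicySketch {

    /** Returns the referrer sent when navigating from fromUrl to toUrl under origin-when-cross-origin. */
    static String referrerFor(final String fromUrl, final String toUrl) {
        final URI from = URI.create(fromUrl);
        final URI to = URI.create(toUrl);
        if (sameOrigin(from, to)) {
            // Same origin: path and query are kept, only the fragment is dropped
            final String query = from.getRawQuery();
            return origin(from) + from.getRawPath() + (query != null ? "?" + query : "");
        }
        // Cross origin: only scheme, host and port remain
        return origin(from) + "/";
    }

    private static boolean sameOrigin(final URI a, final URI b) {
        return a.getScheme().equalsIgnoreCase(b.getScheme())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && effectivePort(a) == effectivePort(b);
    }

    private static int effectivePort(final URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    private static String origin(final URI uri) {
        final StringBuilder sb = new StringBuilder(uri.getScheme()).append("://").append(uri.getHost());
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(referrerFor("http://localhost:8090/yacysearch.html?query=secret",
                "https://example.org/page"));
    }
}
```

With rel="noreferrer" on a link, by contrast, no referrer is sent at all for that link, which is why the entry calls it less flexible.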
Mon Feb 20 10:48:07 CET 2017
by luccioman
Refactored and enforced Solr mandatory fields for proper operation

- Added a new method to check activation of mandatory fields on
Collection Configuration commit, consistently with checks previously
performed in Switchboard startup and with mandatory fields in the
default schema.
- Reorganized default schema and CollectionConfiguration enumeration:
moved fields that are no longer mandatory to a specific section, and moved
fields enabled at startup to the mandatory section.
- Marked mandatory fields as required and with stronger font in the
IndexSchema_p.html page
Changed Files: defaults/solr.collection.schema, htroot/IndexSchema_p.html, htroot/IndexSchema_p.java, source/net/yacy/cora/federate/solr/SchemaDeclaration.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java, source/net/yacy/search/schema/WebgraphSchema.java
Mon Feb 13 19:11:17 CET 2017
by luccioman
Added support for HTML OpenSearch results.

Many OpenSearch systems do not provide results as standard RSS/Atom
feeds but only as HTML. 

This modification adds some support for custom OpenSearch HTML results
through the use of mapping files (as already done for federated Solr
search) relying on CSS-like selectors to retrieve information from HTML.

An example mapping file is provided to map results from the
www.npmjs.com OpenSearch URL.
Changed Files: defaults/federatecfg/npmjs.html.map.properties, defaults/heuristicopensearch.conf, source/net/yacy/cora/federate/AbstractFederateSearchConnector.java, source/net/yacy/cora/federate/FederateSearchManager.java, source/net/yacy/cora/federate/opensearch/OpenSearchConnector.java, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/cora/protocol/http/HTTPClient.java
Sat Feb 11 19:53:27 CET 2017
by reger
upd to Jetty-9.2.21.v20170120
Changed Files: .classpath, build.xml, lib/jetty-client-9.2.21.v20170120.jar, lib/jetty-continuation-9.2.21.v20170120.jar, lib/jetty-deploy-9.2.21.v20170120.jar, lib/jetty-http-9.2.21.v20170120.jar, lib/jetty-io-9.2.21.v20170120.jar, lib/jetty-jmx-9.2.21.v20170120.jar, lib/jetty-proxy-9.2.21.v20170120.jar, lib/jetty-security-9.2.21.v20170120.jar, lib/jetty-server-9.2.21.v20170120.jar, lib/jetty-servlet-9.2.21.v20170120.jar, lib/jetty-servlets-9.2.21.v20170120.jar, lib/jetty-util-9.2.21.v20170120.jar, lib/jetty-webapp-9.2.21.v20170120.jar, lib/jetty-xml-9.2.21.v20170120.jar, pom.xml
Thu Feb 09 11:05:06 CET 2017
by luccioman
Added a new Debug/Analysis advanced settings subsection.

As discussed in PR #93 with @JeremyRand and @reger24 this new advanced
settings page includes:
 - a new setting to control remote Solr responses encoding
 - some existing debug settings which could not be set through the admin
user interface
Changed Files: defaults/yacy.init, htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_Debug.inc, htroot/Settings_p.html, htroot/Settings_p.java, source/net/yacy/cora/federate/SolrFederateSearchConnector.java, source/net/yacy/cora/federate/solr/instance/InstanceMirror.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/AutoSearch.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/index/Fulltext.java
Fri Jan 27 15:47:15 CET 2017
by luccioman
Added user-friendly controls over disk usage configuration settings.

As mentioned in issue #103, control settings over YaCy disk usage
already existed but lacked a user-friendly way to set them.

I added it to the Performance_p.html administration page with a little
refactoring on the "Resource Observer" fieldset for improved
accessibility and HTML standards compliance.
Also added the possibility to enable/disable the autoregulation function
from this page.
Changed Files: htroot/PerformanceQueues_p.java, htroot/Performance_p.html, htroot/env/base.css, locales/cn.lng, locales/de.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng, source/net/yacy/search/ResourceObserver.java, source/net/yacy/search/SwitchboardConstants.java
Sun Jan 22 23:58:46 CET 2017
by reger
Group all proxy settings on System Administration by adding settings of
UrlProxyAccess page (moved from deleted AugmentedBrowsing_p), adjust
submenu (remove Augmented Browsing) and translation files.
Changed Files: htroot/ConfigSearchPage_p.html, htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_UrlProxyAccess.inc, htroot/Settings_p.html, htroot/Settings_p.java, htroot/Status_p.inc, htroot/env/templates/submenuSemantic.template, locales/de.lng, locales/fr.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, source/net/yacy/http/servlets/UrlProxyServlet.java, source/net/yacy/http/servlets/YaCyProxyServlet.java
Sat Jan 21 00:26:04 CET 2017
by reger
upd to solr-5.5.3
minor bugfix version
Changed Files: .classpath, build.xml, lib/lucene-analyzers-common-5.5.3.jar, lib/lucene-analyzers-phonetic-5.5.3.jar, lib/lucene-backward-codecs-5.5.3.jar, lib/lucene-classification-5.5.3.jar, lib/lucene-codecs-5.5.3.jar, lib/lucene-core-5.5.3.jar, lib/lucene-facet-5.5.3.jar, lib/lucene-grouping-5.5.3.jar, lib/lucene-highlighter-5.5.3.jar, lib/lucene-join-5.5.3.jar, lib/lucene-memory-5.5.3.jar, lib/lucene-misc-5.5.3.jar, lib/lucene-queries-5.5.3.jar, lib/lucene-queryparser-5.5.3.jar, lib/lucene-spatial-5.5.3.jar, lib/lucene-suggest-5.5.3.jar, lib/solr-core-5.5.3.jar, lib/solr-solrj-5.5.3.jar, pom.xml
Mon Jan 09 16:44:47 CET 2017
by luccioman
Cleaned up some Javadoc warnings.
Changed Files: source/net/yacy/cora/date/ISO8601Formatter.java, source/net/yacy/cora/protocol/http/HTTPClient.java, source/net/yacy/data/list/ListAccumulator.java, source/net/yacy/data/list/XMLBlacklistImporter.java, source/net/yacy/data/ymark/YMarkUtil.java, source/net/yacy/document/AbstractParser.java, source/net/yacy/document/Document.java, source/net/yacy/document/LargeNumberCache.java, source/net/yacy/document/LibraryProvider.java, source/net/yacy/document/Parser.java, source/net/yacy/document/TextParser.java, source/net/yacy/document/content/DCEntry.java, source/net/yacy/document/importer/Importer.java, source/net/yacy/document/importer/MediawikiImporter.java, source/net/yacy/document/importer/ResumptionToken.java, source/net/yacy/document/parser/apkParser.java, source/net/yacy/document/parser/docParser.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/html/Evaluation.java, source/net/yacy/document/parser/html/ImageEntry.java, source/net/yacy/document/parser/html/TransformerWriter.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/gui/framework/Switchboard.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/navigator/LanguageNavigator.java, source/net/yacy/search/navigator/Navigator.java, source/net/yacy/search/navigator/RestrictedStringNavigator.java, source/net/yacy/search/navigator/YearNavigator.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/snippet/TextSnippet.java
Wed Jan 04 17:09:37 CET 2017
by luccioman
Upgraded jgit build library to version 4.5.0

This is the latest Java 7 compatible jgit release.

Properly support GitHub tags marked as "Pre-release". 
With the previous venerable jgit version 1.1.0, a YaCy repository clone
having such a tag made GitRevTask and GitRevMavenTask crash.
Changed Files: build.xml, libbuild/GitRevMavenTask/pom.xml, libbuild/GitRevMavenTask/src/GitRevMavenTask.java, libbuild/GitRevTask/GitRevTask.java, libbuild/JavaEWAH-0.7.9.License, libbuild/JavaEWAH-0.7.9.jar, libbuild/httpclient-4.3.6.License, libbuild/httpclient-4.3.6.jar, libbuild/jsch-0.1.53.License, libbuild/jsch-0.1.53.jar, libbuild/org.eclipse.jgit-, libbuild/org.eclipse.jgit-, libbuild/slf4j-api-1.7.2.License, libbuild/slf4j-api-1.7.2.jar, pom.xml

Bugfixes

Wed Feb 28 12:23:52 CET 2018
by luccioman
Small fix on svg parser error message
Changed Files: source/net/yacy/document/parser/images/svgParser.java
Wed Feb 28 07:31:32 CET 2018
by luccioman
Fixed NPE case on audio resource parsed with null tag
Changed Files: source/net/yacy/document/parser/audioTagParser.java
Sat Feb 10 11:56:28 CET 2018
by luccioman
Fixed issue #158 : completed div CSS class ignore in crawl
Changed Files: htroot/CrawlStartExpert.html, source/net/yacy/document/parser/html/AbstractScraper.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/html/Scraper.java, source/net/yacy/document/parser/html/TransformerWriter.java, test/java/net/yacy/document/parser/htmlParserTest.java
Thu Feb 08 14:31:26 CET 2018
by luccioman
Fixed loss of search modifiers on bookmark, recommend or delete result
Changed Files: htroot/yacysearchitem.java
Tue Feb 06 17:17:13 CET 2018
by luccioman
Fixed loss of other modifiers on keywords/tags search navigation links
Changed Files: source/net/yacy/search/query/QueryParams.java, test/java/net/yacy/search/query/QueryParamsTest.java
Sat Jan 13 10:45:00 CET 2018
by luccioman
Use a constant for crawler reject reason prefix with specific processing
Changed Files: source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/search/Switchboard.java
Wed Jan 10 18:38:42 CET 2018
by luccioman
Fixed internal tables exact value match iterator
Changed Files: source/net/yacy/kelondro/blob/Tables.java
Fri Dec 01 09:48:42 CET 2017
by luccioman
Fixed URL parsing with fragment and empty path
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
Thu Nov 30 20:21:45 CET 2017
by luccioman
Fixed url mask filter generated when protocol modifier is not null
Changed Files: source/net/yacy/search/query/QueryParams.java
Tue Oct 31 07:52:30 CET 2017
by luccioman
Fixed spelling
Changed Files: htroot/index.html, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Tue Oct 24 09:30:21 CEST 2017
by luccioman
Fixed blacklist returned location URL on empty parameters
Changed Files: source/net/yacy/repository/BlacklistHelper.java
Wed Oct 18 08:31:18 CEST 2017
by luccioman
Fixed NullPointerException cases on snapshot images parsing.
Changed Files: htroot/api/snapshot.java, source/net/yacy/cora/util/Html2Image.java
Mon Oct 16 19:47:18 CEST 2017
by luccioman
Fixed a NullPointerException case on images encoding errors.
Changed Files: source/net/yacy/http/servlets/YaCyDefaultServlet.java
Thu Oct 05 14:44:33 CEST 2017
by luccioman
Fixed Travis configuration for Debian package building task
Changed Files: .travis.yml, build.xml
Thu Oct 05 14:44:33 CEST 2017
by luccioman
Fixed Travis configuration for Debian package building task
Changed Files: .travis.yml
Thu Oct 05 14:26:55 CEST 2017
by luccioman
Fixed YaCy Debian package path in Travis configuration
Changed Files: .travis.yml
Tue Aug 29 07:32:33 CEST 2017
by luccioman
Fixed Unresolved_Pattern occurrence on results favicon HTML id.
Changed Files: htroot/yacysearchitem.java
Sun Jul 16 14:39:53 CEST 2017
by luccioman
Distinguish response parsing failures from unexpected exceptions.
Changed Files: source/net/yacy/crawler/retrieval/Response.java
Tue Jul 11 09:00:27 CEST 2017
by luccioman
Fixed read/copy on input streams reading sometimes less than expected.
Changed Files: source/net/yacy/kelondro/util/FileUtils.java, test/java/net/yacy/kelondro/util/FileUtilsTest.java
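
The underlying pitfall is that a single InputStream.read call may return fewer bytes than requested even when more data follows. A minimal sketch of the usual remedy, looping until the buffer is full or the stream ends, is shown below; the class FullReadSketch and its methods are hypothetical and are not the FileUtils code from the commit.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration class, not part of the YaCy sources.
final class FullReadSketch {

    /** Fills the buffer as far as the stream allows and returns the number of bytes actually read. */
    static int readFully(final InputStream in, final byte[] buffer) throws IOException {
        int offset = 0;
        while (offset < buffer.length) {
            // A single read may legally return fewer bytes than requested
            final int read = in.read(buffer, offset, buffer.length - offset);
            if (read == -1) {
                break; // end of stream reached before the buffer was full
            }
            offset += read;
        }
        return offset;
    }

    /** Test helper: wraps a stream so that each read call delivers at most one byte. */
    static InputStream oneByteAtATime(final InputStream source) {
        return new FilterInputStream(source) {
            @Override
            public int read(final byte[] b, final int off, final int len) throws IOException {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(final String[] args) throws IOException {
        final InputStream slow = oneByteAtATime(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}));
        System.out.println(readFully(slow, new byte[4]));
    }
}
```

Code that calls read once and assumes the buffer is full works with most local files and then fails intermittently on network or compressed streams, which matches the "sometimes" in the entry title.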
Sat Jul 08 22:46:15 CEST 2017
by reger
Fix unresolved pattern in api/share.html by initializing some display variables
Changed Files: htroot/api/share.java
Fri Jun 30 01:06:17 CEST 2017
by luccioman
Do not wrap unnecessarily loader IOExceptions in IOExceptions
Changed Files: source/net/yacy/repository/LoaderDispatcher.java
Thu Jun 08 07:19:16 CEST 2017
by luccioman
Properly close file output streams even in exception scenarios.
Changed Files: htroot/ConfigLanguage_p.java, source/net/yacy/cora/federate/solr/instance/EmbeddedInstance.java, source/net/yacy/cora/lod/vocabulary/Tagging.java, source/net/yacy/cora/protocol/ftp/FTPClient.java, source/net/yacy/cora/storage/ZIPWriter.java, source/net/yacy/crawler/data/Transactions.java, source/net/yacy/data/Translator.java, source/net/yacy/document/content/dao/PhpBB3Dao.java, source/net/yacy/document/parser/apkParser.java, source/net/yacy/document/parser/bzipParser.java, source/net/yacy/document/parser/gzipParser.java, source/net/yacy/http/Jetty9HttpServerImpl.java, source/net/yacy/kelondro/blob/Gap.java, source/net/yacy/kelondro/blob/HeapWriter.java, source/net/yacy/kelondro/index/BinSearch.java, source/net/yacy/kelondro/index/RowHandleMap.java, source/net/yacy/kelondro/index/RowHandleSet.java, source/net/yacy/kelondro/util/XMLTables.java, source/net/yacy/peers/operation/yacyRelease.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/search/AutoSearch.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/server/serverSwitch.java, source/net/yacy/utils/gzip.java, source/net/yacy/utils/tarTools.java, source/net/yacy/utils/translation/TranslatorXliff.java, source/net/yacy/visualization/AnimationGIF.java, source/net/yacy/visualization/AnimationPlotter.java, source/net/yacy/visualization/ChartPlotter.java, source/net/yacy/visualization/RasterPlotter.java
Tue May 30 12:32:14 CEST 2017
by luccioman
Fix unescape of URLs having some '%' chars but not percent-encoded
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
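
One way to handle such URLs is to decode a '%' only when two hexadecimal digits follow it and to keep any other '%' as a literal character. The sketch below shows that approach; it is an assumption about a reasonable strategy, not the MultiProtocolURL code from the commit, and the class LenientUnescapeSketch and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the YaCy sources.
final class LenientUnescapeSketch {

    /** Decodes valid %XX sequences as UTF-8 bytes and leaves any other '%' untouched. */
    static String unescape(final String input) {
        final byte[] in = input.getBytes(StandardCharsets.UTF_8);
        final ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            if (in[i] == '%' && i + 2 < in.length) {
                final int high = hexValue(in[i + 1]);
                final int low = hexValue(in[i + 2]);
                if (high >= 0 && low >= 0) {
                    out.write((high << 4) | low);
                    i += 3;
                    continue;
                }
            }
            // Plain byte, or a '%' that does not start a valid escape sequence
            out.write(in[i]);
            i++;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Returns the value of an ASCII hexadecimal digit, or -1 for any other byte. */
    private static int hexValue(final int c) {
        if (c >= '0' && c <= '9') {
            return c - '0';
        }
        if (c >= 'a' && c <= 'f') {
            return c - 'a' + 10;
        }
        if (c >= 'A' && c <= 'F') {
            return c - 'A' + 10;
        }
        return -1;
    }

    public static void main(final String[] args) {
        System.out.println(unescape("/offers/discount-50%-off?note=100%25%20sure"));
    }
}
```

A strict decoder such as java.net.URLDecoder throws IllegalArgumentException on a stray '%', so a lenient pass of this kind trades strictness for robustness on URLs found in the wild.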
Tue May 30 08:48:20 CEST 2017
by luccioman
Fixed scraper NullPointerException cases on malformed URLs.
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java
Thu May 18 00:28:12 CEST 2017
by Michael Peter Christen
enhanced debugging
Changed Files: source/net/yacy/search/schema/CollectionSchema.java
Tue May 09 12:15:41 CEST 2017
by luccioman
Fixed Debian install message misspelling.
Changed Files: debian/yacy.templates
Thu May 04 08:45:30 CEST 2017
by luccioman
Fixed the previously added link to scheduled dump operations.
Changed Files: htroot/IndexImportMediawiki_p.html
Mon May 01 11:44:26 CEST 2017
by Michael Peter Christen
copied fix from yacy_grid_parser for wrong array type
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java
Mon Apr 24 13:27:07 CEST 2017
by luccioman
Fixed "Unchecked conversion" compilation warnings.
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/FlatJSONResponseWriter.java, source/net/yacy/cora/util/JSONArray.java, source/net/yacy/cora/util/JSONObject.java, source/net/yacy/document/parser/pdfParser.java, source/net/yacy/search/navigator/FileTypeNavigator.java, source/net/yacy/search/navigator/HostNavigator.java, source/net/yacy/search/navigator/StringNavigator.java, source/net/yacy/search/navigator/TokenizedStringNavigator.java, source/net/yacy/search/navigator/YearNavigator.java
Fri Apr 14 21:14:26 CEST 2017
by reger
fix unresolved_pattern on missing post parameter api/message.html
Changed Files: htroot/yacy/message.java
Thu Mar 30 15:41:14 CEST 2017
by luccioman
Fixed NPE case and API URL link on Solr HTML output for webgraph core.
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
Tue Mar 07 12:27:27 CET 2017
by luccioman
Fixed settingsAck_p.html back link for case where referrer is stripped.
Changed Files: htroot/SettingsAck_p.java
Fri Mar 03 13:46:44 CET 2017
by luccioman
Fixed unresolved pattern case on /yacysearchlatestinfo.json api
Changed Files: htroot/yacysearchlatestinfo.java
Thu Feb 16 02:36:24 CET 2017
by reger
fix NPE in HTMLResponseWriter on missing document title
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
Thu Feb 09 10:59:41 CET 2017
by luccioman
Fixed NPE case occurring when local solr index is disabled in search.
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue Jan 24 11:49:15 CET 2017
by luccioman
Index Browser : fixed display of "Count colors" for authorized users.
Changed Files: htroot/HostBrowser.java
Mon Jan 23 14:54:37 CET 2017
by luccioman
Fixed "-UNRESOLVED_PATTERN-" admin parameter in "load & index" links.
Changed Files: htroot/HostBrowser.java
Sat Jan 21 00:35:05 CET 2017
by reger
fix the missing solr-5.5.2.jar delete from prev. commit
Changed Files:
Mon Jan 09 17:59:01 CET 2017
by luccioman
Fixed 2 failing JUnit tests.
Changed Files: test/java/net/yacy/document/DateDetectionTest.java, test/java/net/yacy/utils/translation/TranslatorXliffTest.java
Mon Jan 09 09:57:53 CET 2017
by luccioman
Fixed some JavaDocs broken links.
Changed Files: source/net/yacy/cora/bayes/Classifier.java, source/net/yacy/data/list/ListAccumulator.java, source/net/yacy/search/SwitchboardConstants.java
Mon Jan 09 09:54:14 CET 2017
by luccioman
Fixed maven assembly base directory to match last main YaCy binaries.
Changed Files: assembly.xml

Other Changes   

Sat Mar 10 15:46:53 CET 2018
by Michael Peter Christen
added nav filter
Changed Files: htroot/AccessTracker_p.java, htroot/CrawlStartExpert.html, source/net/yacy/cora/protocol/Scanner.java, source/net/yacy/document/parser/html/ContentScraper.java
Thu Mar 01 20:50:44 CET 2018
by luccioman
Enabled partial parsing of audio resources.
Changed Files: source/net/yacy/document/parser/audioTagParser.java, source/net/yacy/kelondro/util/FileUtils.java
Wed Feb 28 13:46:40 CET 2018
by luccioman
Updated audio file extensions with ones recently added to audioTagParser
Changed Files: source/net/yacy/cora/document/analysis/Classification.java
Wed Feb 28 12:27:17 CET 2018
by luccioman
Give other parsers a chance on audioTagParser error

As done in all other parsers, eventually falling back in the end to the
genericParser which creates a minimal index entry.
Changed Files: source/net/yacy/document/parser/audioTagParser.java
Wed Feb 28 11:58:32 CET 2018
by luccioman
Reuse existing File copy function to handle audio parser tmp files
Changed Files: source/net/yacy/document/parser/audioTagParser.java
Wed Feb 28 08:19:13 CET 2018
by luccioman
Factored audio parser tag processing
Changed Files: source/net/yacy/document/parser/audioTagParser.java
Wed Feb 28 07:49:40 CET 2018
by luccioman
Removed some unnecessary intermediate list creation on array copy.
Changed Files: source/net/yacy/document/Document.java
Tue Feb 27 18:04:12 CET 2018
by luccioman
Updated the list of audio file formats supported by the audioTagParser

Follows the upgrade of the Jaudiotagger dependency to version 2.2.5.
Changed Files: defaults/yacy.init, source/net/yacy/document/parser/audioTagParser.java, source/net/yacy/migration.java, source/net/yacy/search/Switchboard.java
Mon Feb 26 09:17:26 CET 2018
by luccioman
Upgraded Jaudiotagger dependency from 2.0.3 to 2.2.5
Changed Files: .classpath, build.xml, lib/jaudiotagger-2.2.5.License, lib/jaudiotagger-2.2.5.jar, pom.xml
Fri Feb 23 19:17:09 CET 2018
by reger
upd to commons-compress-1.16.1 
Changed Files: .classpath, build.xml, lib/commons-compress-1.16.1.jar, lib/commons-compress-1.16.License, pom.xml
Fri Feb 23 11:41:50 CET 2018
by luccioman
Added HTML5 embedded audio for results playing on supporting browsers

Restricted to authenticated or localhost users only to prevent
redistribution license issues.
Changed Files: htroot/env/base.css, htroot/js/yacysearch.js, htroot/yacysearch.html, htroot/yacysearch.java, htroot/yacysearchitem.html, htroot/yacysearchitem.java
Fri Feb 23 11:36:03 CET 2018
by luccioman
Added missing vocabulary navigator increment on results from RWI
Changed Files: source/net/yacy/search/query/SearchEvent.java
Wed Feb 21 08:41:13 CET 2018
by luccioman
Allow creation of vocabularies from remote CSV file URLs.
Changed Files: htroot/Vocabulary_p.html, htroot/Vocabulary_p.java, source/net/yacy/kelondro/util/FileUtils.java
Wed Feb 21 08:38:35 CET 2018
by luccioman
Make StreamResponse usable in Java try-with-resources statements
Changed Files: source/net/yacy/crawler/retrieval/StreamResponse.java
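
In Java, a class becomes usable in a try-with-resources statement by
implementing AutoCloseable (or Closeable). The sketch below illustrates the
idea with a hypothetical DemoStreamResponse class; it is not YaCy's
StreamResponse, whose fields and methods may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class TryWithResourcesDemo {

    /** Hypothetical stand-in for a response wrapping a content stream. */
    static final class DemoStreamResponse implements AutoCloseable {
        private final InputStream contentStream;
        private boolean closed = false;

        DemoStreamResponse(final InputStream contentStream) {
            this.contentStream = contentStream;
        }

        InputStream getContentStream() {
            return this.contentStream;
        }

        boolean isClosed() {
            return this.closed;
        }

        /** Invoked automatically when a try-with-resources block exits. */
        @Override
        public void close() throws IOException {
            this.closed = true;
            this.contentStream.close();
        }
    }

    /** Reads the first byte; the response is closed even if reading fails. */
    static int readFirstByte(final DemoStreamResponse response) {
        try (DemoStreamResponse r = response) {
            return r.getContentStream().read();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        final DemoStreamResponse response = new DemoStreamResponse(
                new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII)));
        System.out.println("first byte: " + readFirstByte(response));
        System.out.println("closed: " + response.isClosed());
    }
}
```

The benefit is that callers no longer need an explicit finally block to
release the underlying stream.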
Tue Feb 20 12:22:54 CET 2018
by luccioman
Enforced controls on vocabulary editing operations.
Changed Files: htroot/Vocabulary_p.html, htroot/Vocabulary_p.java
Tue Feb 20 11:22:34 CET 2018
by luccioman
Vocabulary editor : use accessible labels and CSS for elements position
Changed Files: htroot/Vocabulary_p.html, htroot/env/base.css
Mon Feb 19 15:15:02 CET 2018
by luccioman
Vocabulary_p.html : richer semantics for HTML tables

Also replaced deprecated attributes
Changed Files: htroot/Vocabulary_p.html
Mon Feb 19 11:48:40 CET 2018
by luccioman
Provide user interface messages on vocabulary creation read/write errors
Changed Files: htroot/Vocabulary_p.html, htroot/Vocabulary_p.java, source/net/yacy/cora/lod/vocabulary/Tagging.java
Mon Feb 19 09:35:44 CET 2018
by luccioman
Mark vocabulary name field as required using html instead of JavaScript
Changed Files: htroot/Vocabulary_p.html
Mon Feb 19 08:54:42 CET 2018
by luccioman
Fixed Vocabulary_p.html HTML validation errors.

Validated with Nu Html Checker 17.11.1.
Changed Files: htroot/Vocabulary_p.html, locales/de.lng, locales/master.lng.xlf, locales/ru.lng
Fri Feb 16 10:19:41 CET 2018
by luccioman
Issue #156 : new option to clean up (or not) search cache on crawl start

Prevent also unnecessary search event cache clean-up on each access to
the crawl monitor page (Crawler_p.html).
Changed Files: htroot/CrawlStartExpert.html, htroot/CrawlStartExpert.java, htroot/CrawlStartSite.html, htroot/Crawler_p.java, htroot/Load_MediawikiWiki.html, htroot/Load_MediawikiWiki.java, htroot/Load_PHPBB3.html, htroot/Load_PHPBB3.java
Fri Feb 16 08:51:26 CET 2018
by luccioman
Upgraded maven JUnit test dependency from 4.11 to 4.12
Changed Files: pom.xml
Thu Feb 15 19:14:07 CET 2018
by luccioman
Use https rather than http in links and queries to openstreetmap.org
Changed Files: htroot/yacysearch.html, source/net/yacy/peers/graphics/OSMTile.java
Thu Feb 15 07:29:17 CET 2018
by luccioman
Handle escaped line breaks and separators in vocabulary import from CSV
Changed Files: htroot/Vocabulary_p.java, test/Vocabulary_pTest.java
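
The exact escaping rules handled by the vocabulary import are not stated in
this entry. As a minimal sketch, assuming RFC 4180 style quoting (fields
wrapped in double quotes may contain the separator, line breaks and doubled
quotes), a parser could look like the following. The class and method names
are hypothetical and not taken from Vocabulary_p.java.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvEscapeDemo {

    /** Splits CSV text into rows of fields, honoring double-quoted fields. */
    static List<List<String>> parse(final String csv, final char delimiter) {
        final List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        final StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            final char c = csv.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < csv.length() && csv.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // separators and line breaks are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == delimiter) {
                row.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c);
            }
        }
        if (field.length() > 0 || !row.isEmpty()) {
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(final String[] args) {
        final String csv = "a;\"b;c\"\n\"line1\nline2\";d";
        System.out.println(parse(csv, ';'));
    }
}
```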
Wed Feb 14 10:31:09 CET 2018
by luccioman
Added a line start field for vocabulary import from CSV file

As a convenience to skip any CSV header lines
Changed Files: htroot/Vocabulary_p.html, htroot/Vocabulary_p.java
Wed Feb 14 09:29:04 CET 2018
by luccioman
Added option to choose field delimiter in vocabulary import from CSV
Changed Files: htroot/Vocabulary_p.html, htroot/Vocabulary_p.java
Wed Feb 14 09:27:17 CET 2018
by luccioman
Removed unused import
Changed Files: source/net/yacy/search/query/SearchEvent.java
Wed Feb 14 07:14:25 CET 2018
by luccioman
Reuse the same Pattern instance when matching multiple key/values
Changed Files: source/net/yacy/server/serverObjects.java
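
The general idea can be sketched as follows: compile the regular expression
once, outside the loop, instead of calling String.matches() for each key,
which compiles a new Pattern on every call. The class, method and sample keys
below are hypothetical and do not reproduce the serverObjects code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class PatternReuseDemo {

    /** Returns the keys matching keyRegex, compiling the pattern only once. */
    static List<String> matchingKeys(final Map<String, String> entries, final String keyRegex) {
        final Pattern keyPattern = Pattern.compile(keyRegex); // compiled once
        final List<String> result = new ArrayList<>();
        for (final String key : entries.keySet()) {
            // matcher() is cheap compared to recompiling the expression
            if (keyPattern.matcher(key).matches()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        final Map<String, String> entries = new LinkedHashMap<>();
        entries.put("item_1", "a");
        entries.put("item_2", "b");
        entries.put("other", "c");
        System.out.println(matchingKeys(entries, "item_\\d+"));
    }
}
```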
Tue Feb 13 18:24:26 CET 2018
by luccioman
Improved blacklist entries editing operations :

- Fixes issue #160 : handle properly syntax exceptions with a user
friendly message
- Fixes loss of information on multiple blacklist entries editions
- Fixes loss of entries when moving entries from one list to another
Changed Files: htroot/Blacklist_p.html, htroot/Blacklist_p.java, htroot/IndexControlRWIs_p.java, htroot/api/blacklists/add_entry_p.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/repository/BlacklistHelper.java, source/net/yacy/server/serverObjects.java
Mon Feb 12 01:16:14 CET 2018
by reger
Remove now obsolete html for language-nav and ISO639 jar reference
Changed Files: htroot/yacysearchtrailer.html, htroot/yacysearchtrailer.java
Mon Feb 12 00:16:34 CET 2018
by reger
Adjust and move Language Navigator to be a member of the navigator plugin
Changed Files: htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/yacysearchtrailer.java, source/net/yacy/search/navigator/LanguageNavigator.java, source/net/yacy/search/navigator/NavigatorPlugins.java, source/net/yacy/search/query/SearchEvent.java
Sat Feb 10 20:01:35 CET 2018
by reger
upd to httpclient-4.5.5
Changed Files: .classpath, build.xml, lib/httpclient-4.5.5.jar, lib/httpcore-4.4.9.License, lib/httpcore-4.4.9.jar, lib/httpmime-4.5.5.jar, pom.xml
Thu Feb 08 08:07:30 CET 2018
by luccioman
Fixed loss of "meanCount" search param when using facets or page buttons

Then on new search queries, no suggestions at all could be displayed.
Changed Files: htroot/yacysearch.java, htroot/yacysearchitem.java, source/net/yacy/search/query/QueryParams.java
Wed Feb 07 15:54:46 CET 2018
by luccioman
Do not clear all search modifiers when unselecting one modifier.

Previously, when clicking a selected facet in the search results page to
unselect it, all other eventually selected modifiers/facets were also
Changed Files: htroot/yacysearchtrailer.java, source/net/yacy/search/query/QueryParams.java
Tue Feb 06 15:14:14 CET 2018
by luccioman
Remove old query terms from search results suggestions links.

Especially when old terms were misspelled, suggestions links then
provided most of the time empty results.
Changed Files: htroot/yacysearch.java, source/net/yacy/search/query/QueryParams.java
Tue Feb 06 12:33:44 CET 2018
by luccioman
Enable results suggestions (Did you Mean) even when RWI is not enabled.

RWI is no longer necessary for suggestions processing since commit
Revealed by a question about spell check from ouahpiti on YaCy forum
(http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6084 ).
Changed Files: htroot/yacysearch.java
Fri Feb 02 10:27:36 CET 2018
by luccioman
Refactoring : documented and extracted autotagging processing functions.
Changed Files: source/net/yacy/document/Tokenizer.java, test/java/net/yacy/document/TokenizerTest.java
Fri Feb 02 09:31:40 CET 2018
by luccioman
Added HTML microdata typed items parsing capability.

This adds the possibility for the HTML parser to gather typed items URLs
annotated in HTML tags with itemscope and itemtype attributes (see
microdata specification https://www.w3.org/TR/microdata/ ), notably
Types from the schema.org vocabulary, but also Types/Classes from any
other vocabulary, such as the common ones listed in the RDFa core
context ( https://www.w3.org/2011/rdfa-context/rdfa-1.1.html ).
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/html/Scraper.java, source/net/yacy/document/parser/html/TransformerWriter.java, test/java/net/yacy/document/parser/html/ContentScraperTest.java
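
To illustrate what gathering typed item URLs means, here is a deliberately
simplified sketch. It uses regular expressions on raw tags, which is an
assumption made for brevity and not how ContentScraper works; the class and
method names are hypothetical. It collects the URLs of itemtype attributes
found on tags that also carry itemscope.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MicrodataTypesDemo {

    private static final Pattern TAG = Pattern.compile("<[a-zA-Z][^>]*>");
    private static final Pattern ITEMSCOPE = Pattern.compile("\\sitemscope(\\s|=|>|/)");
    private static final Pattern ITEMTYPE = Pattern.compile("\\sitemtype\\s*=\\s*\"([^\"]*)\"");

    /** Collects itemtype URLs from opening tags that also declare itemscope. */
    static Set<String> itemTypes(final String html) {
        final Set<String> types = new LinkedHashSet<>();
        final Matcher tag = TAG.matcher(html);
        while (tag.find()) {
            final String tagText = tag.group();
            if (!ITEMSCOPE.matcher(tagText).find()) {
                continue; // itemtype is only meaningful together with itemscope
            }
            final Matcher type = ITEMTYPE.matcher(tagText);
            if (type.find()) {
                // itemtype may hold several space-separated URLs
                for (final String url : type.group(1).trim().split("\\s+")) {
                    if (!url.isEmpty()) {
                        types.add(url);
                    }
                }
            }
        }
        return types;
    }

    public static void main(final String[] args) {
        final String html = "<div itemscope itemtype=\"https://schema.org/Person\">"
                + "<span itemprop=\"name\">Jane</span></div>";
        System.out.println(itemTypes(html));
    }
}
```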
Tue Jan 30 21:00:18 CET 2018
by luccioman
Create recrawl requests with the relevant crawl profile.

Recrawl default profile was previously effectively used for crawl
stacker acceptance check, but request entries were indeed still created
with the "snippetGlobalText" profile.
Changed Files: source/net/yacy/crawler/RecrawlBusyThread.java
Mon Jan 29 18:34:47 CET 2018
by luccioman
Added a utility to generate/update the XLIFF master file from lng files.
Changed Files: htroot/Translator_p.java, source/net/yacy/utils/translation/GenerateMasterXliff.java, source/net/yacy/utils/translation/TranslationManager.java
Mon Jan 29 16:51:00 CET 2018
by luccioman
Updated master and French translation for the IndexReIndexMonitor_p page
Changed Files: htroot/IndexReIndexMonitor_p.html, locales/de.lng, locales/fr.lng, locales/master.lng.xlf
Mon Jan 29 14:03:01 CET 2018
by luccioman
Moved dbtest to the test source folder.
Changed Files: test/java/net/yacy/dbtest.java
Mon Jan 29 14:00:43 CET 2018
by luccioman
Fixed NullPointerException case on Table init with relative file path.

Can occur for example when running dbtest with a relative test table file
name (without explicit parent folder).
Changed Files: source/net/yacy/kelondro/table/Table.java
Mon Jan 29 13:56:37 CET 2018
by luccioman
Shutdown daemon threads at the end of dbtest
Changed Files: source/net/yacy/dbtest.java
Mon Jan 29 13:38:25 CET 2018
by luccioman
Replaced improper ByteBuffer.equals() implementation by Arrays.equals()

Renamed also ByteBuffer.equals() to startsWith() as this is the
appropriate function implementation semantics.
Changed Files: htroot/IndexControlRWIs_p.java, htroot/Wiki.java, htroot/yacy/search.java, source/net/yacy/cora/util/ByteBuffer.java, source/net/yacy/dbtest.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/ranking/ReferenceOrder.java
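
The difference between the two semantics can be sketched as follows. The
startsWith helper here is a hypothetical standalone function written for
illustration, not the ByteBuffer method itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteArrayCompareDemo {

    /** True when buffer begins with all bytes of prefix (prefix test, not equality). */
    static boolean startsWith(final byte[] buffer, final byte[] prefix) {
        if (prefix.length > buffer.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (buffer[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final byte[] full = "yacy".getBytes(StandardCharsets.US_ASCII);
        final byte[] prefix = "ya".getBytes(StandardCharsets.US_ASCII);
        // Arrays.equals requires same length and same content
        System.out.println("equals: " + Arrays.equals(full, prefix));
        System.out.println("startsWith: " + startsWith(full, prefix));
    }
}
```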
Sun Jan 28 12:41:56 CET 2018
by luccioman
Added a manual performance test for the HostBalancer.

Follows the report in mantis 776.

Running the performance test with different control parameters seems to
reveal that YaCy's RowHandleMap used in the balancer depthCache is in the
end more efficient than, for example, the ConcurrentHashMap from JDK 8.
Changed Files: test/java/net/yacy/crawler/HostBalancerTest.java
Sat Jan 27 18:32:45 CET 2018
by reger
upd to metadata-extractor-2.11.0.jar
Changed Files: .classpath, build.xml, lib/metadata-extractor-2.11.0.License, lib/metadata-extractor-2.11.0.jar, pom.xml
Fri Jan 26 17:15:27 CET 2018
by luccioman
Removed time condition on HostBalancer initialization in JUnit test.

Its initialization in main application usage remains asynchronous.
Changed Files: source/net/yacy/crawler/HostBalancer.java, test/java/net/yacy/crawler/HostBalancerTest.java
Fri Jan 26 10:31:13 CET 2018
by luccioman
Commit Solr index before simulating or starting recrawl job.

This ensures up-to-date simulation query results, and recrawl
Changed Files: htroot/IndexReIndexMonitor_p.java, source/net/yacy/crawler/RecrawlBusyThread.java
Fri Jan 26 09:50:40 CET 2018
by luccioman
Merge pull request #155 from JeremyRand/readme-typo-fixes

Fix some typos in the README.
Changed Files: README.md
Fri Jan 26 05:34:31 CET 2018
by JeremyRand
Fix some typos in the README.
Changed Files: README.md
Thu Jan 25 07:57:56 CET 2018
by luccioman
Revised the RDFaParser main launcher for minimal proper operation.

This parser is still not enabled in the main text parsers list. More
would have to be done to make it functional.
Changed Files: source/net/yacy/document/parser/rdfa/impl/RDFaParser.java, source/net/yacy/document/parser/rdfa/impl/RDFaTripleImpl.java
Sat Jan 20 18:54:08 CET 2018
by luccioman
Fixed stored URL in web cache when redirection(s) occur.

Associate cached content to the last redirection location, instead of
the first URL of a redirection(s) chain :
 - for proper base URL processing in parsers (fixes mantis 636 -
 - to prevent duplicated content in Solr index when recrawling a
redirected URL
Changed Files: source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/repository/LoaderDispatcher.java
Fri Jan 19 11:58:52 CET 2018
by luccioman
Automatically refresh running recrawl report when JavaScript is enabled.

For users who would prefer to keep JavaScript disabled, a manual Refresh
button is still available.
Changed Files: htroot/IndexReIndexMonitor_p.html, htroot/IndexReIndexMonitor_p.java, htroot/IndexReIndexMonitor_p.json, htroot/js/IndexReIndexMonitor.js, htroot/jslicense.html
Fri Jan 19 10:18:35 CET 2018
by luccioman
Merge pull request #154 from tangdou1/master

update chinese translation
Changed Files: locales/zh.lng
Tue Jan 16 10:16:14 CET 2018
by tangdou1
Merge pull request #1 from tangdou1/tangdou1-patch-1

Update zh.lng
Changed Files: locales/zh.lng
Tue Jan 16 10:11:07 CET 2018
by tangdou1
Update zh.lng

translate some untranslated words to chinese.
Changed Files: locales/zh.lng
Tue Jan 16 08:35:54 CET 2018
by tangdou1
Update zh.lng
Changed Files: locales/zh.lng
Mon Jan 15 18:32:34 CET 2018
by luccioman
Set reindex page to html5 and removed presentational only html tables.
Changed Files: htroot/IndexReIndexMonitor_p.html
Mon Jan 15 17:16:54 CET 2018
by luccioman
Removed unused duplicated HTML id on header hidden field
Changed Files: htroot/env/templates/header.template
Mon Jan 15 10:05:49 CET 2018
by luccioman
Removed unnecessary reflection usage for workflow tasks.

This improves code readability and maintainability (call hierarchies are
easier to read) and possibly performance.
Changed Files: source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/kelondro/workflow/InstantBlockingThread.java, source/net/yacy/kelondro/workflow/WorkflowProcessor.java, source/net/yacy/kelondro/workflow/WorkflowTask.java, source/net/yacy/peers/Dispatcher.java, source/net/yacy/search/Switchboard.java
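
As a rough illustration of this kind of change, the sketch below contrasts a
reflective method lookup with a call through a small functional interface.
The Task interface and the normalize method are hypothetical; the actual
WorkflowTask signature in YaCy may differ.

```java
import java.lang.reflect.Method;
import java.util.Locale;

final class WorkflowTaskDemo {

    /** Hypothetical task interface: one job in, one job out. */
    interface Task<J> {
        J process(J job);
    }

    /** Sample processing step used by both call styles. */
    static String normalize(final String job) {
        return job.trim().toLowerCase(Locale.ROOT);
    }

    /** Reflective call: the target is resolved by name at runtime. */
    static String viaReflection(final String job) {
        try {
            final Method method = WorkflowTaskDemo.class.getDeclaredMethod("normalize", String.class);
            return (String) method.invoke(null, job);
        } catch (final ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Direct call: the target is checked by the compiler and visible in call hierarchies. */
    static String viaInterface(final String job) {
        final Task<String> task = WorkflowTaskDemo::normalize;
        return task.process(job);
    }

    public static void main(final String[] args) {
        System.out.println(viaReflection("  Crawl Job "));
        System.out.println(viaInterface("  Crawl Job "));
    }
}
```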
Mon Jan 15 08:30:37 CET 2018
by luccioman
Added new recrawl job profile to the list of default crawl profiles
Changed Files: source/net/yacy/crawler/CrawlSwitchboard.java
Mon Jan 15 08:06:28 CET 2018
by luccioman
Refresh recrawl job profile threshold date like other default profiles
Changed Files: source/net/yacy/search/Switchboard.java
Sat Jan 13 15:46:04 CET 2018
by luccioman
Added a specific default crawl profile for the recrawl job.

- with only a light constraint on known indexed documents load date, as it
can already be controlled by the selection query, and the goal of the
job is indeed to recrawl selected documents now
- using the iffresh cache strategy
Changed Files: source/net/yacy/crawler/CrawlSwitchboard.java, source/net/yacy/crawler/RecrawlBusyThread.java
Sat Jan 13 12:13:04 CET 2018
by luccioman
Added comments about crawl profiles recrawl cycles
Changed Files: source/net/yacy/crawler/CrawlSwitchboard.java
Sat Jan 13 12:07:56 CET 2018
by luccioman
More comprehensive log on rejected recrawls caused by date constraint
Changed Files: source/net/yacy/crawler/CrawlStacker.java
Fri Jan 12 11:47:13 CET 2018
by luccioman
Added more details to the recrawl job report
Changed Files: htroot/IndexReIndexMonitor_p.html, htroot/IndexReIndexMonitor_p.java, source/net/yacy/crawler/RecrawlBusyThread.java
Fri Jan 12 10:23:26 CET 2018
by luccioman
Add a query link to local Solr to browse selected recrawl candidates
Changed Files: htroot/IndexReIndexMonitor_p.html, htroot/IndexReIndexMonitor_p.java
Thu Jan 11 09:53:27 CET 2018
by luccioman
Display recrawl job report also when job is actively running
Changed Files: htroot/IndexReIndexMonitor_p.html, htroot/IndexReIndexMonitor_p.java
Wed Jan 10 17:05:53 CET 2018
by luccioman
Record recrawl calls to make them schedulable
Changed Files: htroot/IndexImportMediawiki_p.java, htroot/IndexReIndexMonitor_p.java, source/net/yacy/data/WorkTables.java
Tue Jan 09 22:33:15 CET 2018
by luccioman
Added a report info box about the last terminated recrawl job, if any

For easier monitoring of recrawls.
Changed Files: htroot/IndexReIndexMonitor_p.html, htroot/IndexReIndexMonitor_p.java, source/net/yacy/crawler/RecrawlBusyThread.java
Tue Jan 09 10:22:26 CET 2018
by luccioman
Added a stop condition to the Recrawl busy thread
Changed Files: htroot/IndexReIndexMonitor_p.java, source/net/yacy/crawler/RecrawlBusyThread.java
Mon Jan 08 21:20:46 CET 2018
by luccioman
Made it possible to customize the selection query before launching a recrawl
Changed Files: htroot/IndexReIndexMonitor_p.html, htroot/IndexReIndexMonitor_p.java, locales/de.lng, locales/master.lng.xlf, source/net/yacy/crawler/RecrawlBusyThread.java
Sun Jan 07 15:25:16 CET 2018
by luccioman
Enforced controls (HTTP method, token) on ReIndex and ReCrawl operations
Changed Files: htroot/IndexReIndexMonitor_p.html, htroot/IndexReIndexMonitor_p.java, htroot/IndexSchema_p.html, htroot/IndexSchema_p.java
Tue Jan 02 10:21:07 CET 2018
by luccioman
Fixed occasional time-dependent failures of the SegmentTest test case

As highlighted by latest automated Travis builds.
Changed Files: source/net/yacy/kelondro/rwi/IndexCell.java, test/java/net/yacy/search/index/SegmentTest.java
Tue Jan 02 08:13:14 CET 2018
by luccioman
Added UI switch to control content domain constraint per search request
Changed Files: htroot/ConfigPortal_p.html, htroot/index.html, htroot/index.java, htroot/yacysearch.html, htroot/yacysearch.java, htroot/yacysearchtrailer.html, htroot/yacysearchtrailer.java, source/net/yacy/search/query/QueryParams.java
Fri Dec 29 11:32:42 CET 2017
by luccioman
Added UI setting for strictness of content-type checking on media search
Changed Files: htroot/ConfigPortal_p.html, htroot/ConfigPortal_p.java
Thu Dec 28 03:13:42 CET 2017
by reger
upd to commons-io-2.6
Changed Files: .classpath, build.xml, lib/commons-io-2.6.License, lib/commons-io-2.6.jar, pom.xml
Thu Dec 28 02:51:52 CET 2017
by reger
Make the TokenizedStringNavigator (used for keyword search facet) active
check case insensitive.
As keywords are compared in lower case, make sure user input keyword:Key
or keyword:key will be shown as active in the facet entry key.
Changed Files: source/net/yacy/search/navigator/TokenizedStringNavigator.java
Sun Dec 24 01:34:23 CET 2017
by reger
upd to httpclient-4.5.4 and httpmime-4.5.4
Changed Files: .classpath, build.xml, lib/httpclient-4.5.4.jar, lib/httpmime-4.5.4.jar, pom.xml
Sun Dec 24 01:02:18 CET 2017
by reger
upd to icu4j-60.2
Changed Files: .classpath, build.xml, lib/icu4j-60.2.License, lib/icu4j-60.2.jar, pom.xml
Fri Dec 22 11:39:30 CET 2017
by luccioman
Enable full size images preview for users with extended search rights
Changed Files: source/net/yacy/visualization/ImageViewer.java
Fri Dec 22 11:01:02 CET 2017
by luccioman
Added UI setting for optional encryption with https on p2p searches
Changed Files: htroot/ConfigPortal_p.html, htroot/ConfigPortal_p.java
Thu Dec 21 18:41:32 CET 2017
by luccioman
Added optional https support for remote crawl and profile operations
Changed Files: htroot/ViewProfile.java, htroot/rct_p.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/Switchboard.java
Tue Dec 19 12:30:49 CET 2017
by luccioman
Enable optional https support for /yacy/transferURL API calls.

Also updated some Javadoc and consistently use Switchboard instance as a
constructor parameter where relevant.
Changed Files: htroot/IndexControlRWIs_p.java, source/net/yacy/peers/Dispatcher.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/Transmission.java, source/net/yacy/search/Switchboard.java
Tue Dec 19 11:14:20 CET 2017
by luccioman
Updated links to Java Regular Expressions documentation to version 8
Changed Files: htroot/Blacklist_p.html, htroot/CrawlStartExpert.html, htroot/RegexTest.html, locales/hi.lng, locales/uk.lng, locales/zh.lng
Sat Dec 16 00:49:48 CET 2017
by reger
upd to commons-compress-1.15
Changed Files: .classpath, build.xml, lib/commons-compress-1.15.License, lib/commons-compress-1.15.jar, pom.xml
Fri Dec 15 17:03:35 CET 2017
by luccioman
Restored peer URL host name stripping removed from previous commit.

Still useful for peers with IPv6 addresses.
Changed Files: source/net/yacy/peers/Protocol.java
Wed Dec 13 07:38:04 CET 2017
by luccioman
Merge pull request #149 from Scre13/bugfix_default_settings

Fixed loading default thread load setting in Performance Settings of Queues and Processes.
Changed Files: htroot/PerformanceQueues_p.java
Tue Dec 12 23:25:56 CET 2017
by ScRe13
fixed loading of default settings; load was populated with wrong value
Changed Files: htroot/PerformanceQueues_p.java
Sun Dec 10 01:25:20 CET 2017
by reger
Show either the hide or the show public surftip button depending on the
current config status, so that only the button that switches the status is
displayed (the button for the current status is hidden)
Changed Files: htroot/Surftips.html, htroot/Surftips.java
Fri Dec 08 15:26:46 CET 2017
by luccioman
Removed no longer necessary Java 1.8 version checking (fixes issue #147)

Java 1.8 is now a prerequisite to run from latest sources anyway.
Changed Files: htroot/Status.html, htroot/Status.java
Fri Dec 08 01:01:07 CET 2017
by reger
remove deprecated jetty continuation class from urlproxyservlet
(was a long time carry over, while not supporting async requests)
Changed Files: source/net/yacy/http/servlets/UrlProxyServlet.java
Thu Dec 07 15:16:11 CET 2017
by Michael Peter Christen
(more!) evaluation of XRealIP from nginx reverse proxy
Changed Files: htroot/yacysearchitem.java, source/net/yacy/cora/protocol/RequestHeader.java, source/net/yacy/http/Jetty9YaCySecurityHandler.java, source/net/yacy/http/MonitorHandler.java, source/net/yacy/http/servlets/GSAsearchServlet.java, source/net/yacy/http/servlets/UrlProxyServlet.java, source/net/yacy/http/servlets/YaCyQoSFilter.java
Mon Dec 04 19:13:16 CET 2017
by luccioman
Made "tld:" modifier case insensitive and IDN compliant.

Thus allowing typing internationalized top-level domains with non ASCII
characters as tld: modifier.
Changed Files: htroot/yacysearch.java
Mon Dec 04 18:23:26 CET 2017
by luccioman
Improved support for internationalized domain names on "site:" modifier

Allow typing directly internationalized domain names including non ASCII
characters in the search field. 
Search is done using the ASCII Compatible Encoding (ACE) representation.
Changed Files: source/net/yacy/search/query/QueryModifier.java
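
A minimal sketch of the conversion, using the standard java.net.IDN class.
The IdnModifierDemo class and the toAce method are hypothetical names, and
the real QueryModifier code may apply further checks.

```java
import java.net.IDN;
import java.util.Locale;

final class IdnModifierDemo {

    /** Converts a typed host name to its lower-case ASCII Compatible Encoding form. */
    static String toAce(final String typedHost) {
        // Locale.ROOT keeps the case conversion independent of the system locale
        return IDN.toASCII(typedHost.toLowerCase(Locale.ROOT));
    }

    public static void main(final String[] args) {
        // "B\u00dcCHER.Example" is "BÜCHER.Example" written with a Unicode escape
        System.out.println(toAce("B\u00dcCHER.Example"));
    }
}
```

With such a conversion, a site: modifier typed with non ASCII characters
and one typed in ACE form end up as the same search filter value.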
Mon Dec 04 14:11:29 CET 2017
by luccioman
Do locale independent case conversion on "filetype:" query modifier.
Changed Files: source/net/yacy/search/query/QueryModifier.java
Mon Dec 04 14:08:34 CET 2017
by luccioman
Made "site:" query modifier case insensitive.
Changed Files: source/net/yacy/search/query/QueryModifier.java
Mon Dec 04 13:58:15 CET 2017
by luccioman
Refactored 'site:' query modifier parsing into a dedicated function.
Changed Files: source/net/yacy/search/query/QueryModifier.java
Mon Dec 04 01:12:50 CET 2017
by reger
upd to httpcore-4.4.8
Changed Files: .classpath, build.xml, lib/httpcore-4.4.8.License, lib/httpcore-4.4.8.jar, pom.xml
Sat Dec 02 08:45:42 CET 2017
by luccioman
Merge pull request #144 from him2him2/_fic_HTTPS

Update HTTP -> HTTPS in README.md
Changed Files: README.md
Fri Dec 01 11:52:52 CET 2017
by luccioman
Prefer fine URL match over approximate URL mask regex on final filtering

Also prevent adding a redundant and CPU costly Solr url mask filter
query when possible
Changed Files: source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java
Fri Dec 01 11:19:31 CET 2017
by luccioman
Improved accuracy of URLs search filters : protocol, tld, host, file ext
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java, test/java/net/yacy/search/query/QueryParamsTest.java
Fri Dec 01 08:46:46 CET 2017
by luccioman
Apply tld query modifier on Solr host_s mandatory field.

The filter has thus much more chances to be effective than when applied
on the optional field host_dnc_s.
Changed Files: source/net/yacy/search/query/QueryParams.java
Thu Nov 30 09:20:32 CET 2017
by luccioman
Refactored url mask filter build from query modifiers

For better readability and easier unit testing.
Changed Files: source/net/yacy/search/query/QueryParams.java
Sun Nov 26 22:01:42 CET 2017
by reger
upd to Jsoup-1.11.2
Changed Files: .classpath, build.xml, lib/jsoup-1.11.2.jar, pom.xml
Sun Nov 26 02:53:51 CET 2017
by reger
remove redundant setting of timeout for remoteinstance
and replace deprecated updatesolrclient instantiation with recommended builder
Changed Files: source/net/yacy/cora/federate/solr/instance/RemoteInstance.java
Thu Nov 23 09:54:36 CET 2017
by Ronald Eddy Jr
Update HTTP -> HTTPS in README.md

URLs were updated to use HTTPS protocol in README.md.
Changed Files: README.md
Wed Nov 22 09:07:36 CET 2017
by luccioman
Upgraded apache POI dependency from 3.16 to 3.17
Changed Files: .classpath, build.xml, lib/poi-3.17.License, lib/poi-3.17.jar, lib/poi-scratchpad-3.17.jar, pom.xml
Wed Nov 22 09:06:16 CET 2017
by luccioman
Added a basic JUnit test for the Visio parser (vsdParser)
Changed Files: source/net/yacy/document/parser/vsdParser.java, test/java/net/yacy/document/parser/vsdParserTest.java
Mon Nov 20 18:52:45 CET 2017
by luccioman
Do locale neutral case conversion of HTML charset name.

Required to properly run on systems with default locale set to Turkish
language, as with this locale the 'i' character has different upper and
lower case flavors than with other locales.
Changed Files: source/net/yacy/document/parser/htmlParser.java
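
The underlying Java behavior can be reproduced without changing the system
locale by passing the locale explicitly. The class and method names below
are hypothetical and only serve the demonstration.

```java
import java.util.Locale;

final class LocaleNeutralCaseDemo {

    /** Locale-neutral lower-casing, suitable for technical identifiers. */
    static String neutralLowerCase(final String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        final Locale turkish = Locale.forLanguageTag("tr");
        final String charsetName = "ISO-8859-1";
        // With Turkish rules, 'I' becomes the dotless i (U+0131)
        System.out.println("Turkish rules: " + charsetName.toLowerCase(turkish));
        System.out.println("Locale.ROOT:   " + neutralLowerCase(charsetName));
    }
}
```

String.toLowerCase() without argument uses the default locale, so on a
system set to Turkish it behaves like the first call above.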
Mon Nov 20 18:50:49 CET 2017
by luccioman
Restore initial locale at the end of a JUnit test case which modifies it.
Changed Files: test/java/net/yacy/document/TextParserTest.java
Mon Nov 20 18:47:46 CET 2017
by luccioman
Do locale neutral case conversions on domain names.

Required to properly run on systems with default locale set to Turkish
language, as with this locale the 'i' character has different upper and
lower case flavors than with other locales.
Changed Files: source/net/yacy/cora/protocol/Domains.java, test/java/net/yacy/cora/protocol/DomainsTest.java
Mon Nov 20 15:23:33 CET 2017
by luccioman
Do locale neutral case conversions in MultiProtocolURL

For any relevant URL parts : host name, URL scheme, session ids or
technical parts (see https://url.spec.whatwg.org/#url-writing and
https://tools.ietf.org/html/rfc3986 for current standard references).

Remaining locale sensitive conversion used for detection of URL word
components in urlComps() makes sense, but using the detected language
would be preferable to using the default system locale.
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
Mon Nov 20 09:48:46 CET 2017
by luccioman
Do locale neutral case conversions in Classification

Required for people using Turkish language as their default system
locale, as with this locale the 'i' character has different upper and
lower case flavors than with other locales.
Changed Files: source/net/yacy/cora/document/analysis/Classification.java, test/java/net/yacy/cora/document/analysis/ClassificationTest.java
Fri Nov 17 11:09:55 CET 2017
by luccioman
Added signing key to developer releases location.
Changed Files: defaults/yacy.network.freeworld.unit
Thu Nov 16 09:50:55 CET 2017
by luccioman
Updated lists of known sponsored and country-code TLDs.

Using current IANA reference list at
https://www.iana.org/domains/root/db .

As for previous update on known generic TLDs list, the generated URL
hashes on these domains stay the same but it improves performance of URL
hash computation for URLs on these domains.
Changed Files: source/net/yacy/cora/protocol/Domains.java, source/net/yacy/cora/protocol/tld/GenericTLD.java, source/net/yacy/cora/protocol/tld/InternationalizedCountryCodeTLD.java, source/net/yacy/cora/protocol/tld/SponsoredTLD.java
Tue Nov 14 09:42:09 CET 2017
by luccioman
Updated the generic top-level known domains list.

Using current IANA reference list at

The generated URL hashes on these domains stay the same but performance
is greatly improved as a DNS resolve request is required on URL hash
computation when the TLD part of the host name is unknown.

Hash computation mean time measured on 1541 sample URLs (one on each
TLD) and a computer with a DSL connection : about 230ms before change,
then only 20ms.
Changed Files: source/net/yacy/cora/document/id/DigestURL.java, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/cora/protocol/GenericTLD.java, test/java/net/yacy/cora/document/id/DigestURLHashPerfTest.java
Tue Nov 14 09:24:13 CET 2017
by luccioman
Added some JavaDoc
Changed Files: source/net/yacy/kelondro/util/FileUtils.java
Tue Nov 14 09:17:43 CET 2017
by luccioman
Updated log path in informative message of stop script.

As highlighted by @Lew-Rockwell-Fan in issue #140, the two log paths
mentioned by the stopYACY.sh script were inconsistent.
Changed Files: stopYACY.sh
Wed Nov 08 09:33:30 CET 2017
by luccioman
Improved some JUnit tests isolation and resources release

The modified tests were successful when run manually from an IDE such
as Eclipse, but failed occasionally when run with maven as part of the
overall test suite.
Changed Files: test/java/net/yacy/kelondro/io/RecordsTest.java, test/java/net/yacy/search/index/SegmentTest.java
Tue Nov 07 19:02:09 CET 2017
by luccioman
Remove old hard-coded holiday dates from DateDetection class.

Replaced with rules relative to the current year, as already done for
part of the supported dates.
Changed Files: source/net/yacy/document/DateDetection.java, test/java/net/yacy/document/DateDetectionTest.java
Mon Nov 06 09:37:44 CET 2017
by luccioman
Upgraded icu4j dependency from 59_1 to 60.1
Changed Files: .classpath, build.xml, lib/icu4j-60.1.License, lib/icu4j-60.1.jar, pom.xml
Mon Nov 06 09:14:03 CET 2017
by luccioman
Added a html parser charset detection unit test
Changed Files: test/java/net/yacy/document/parser/htmlParserTest.java
Sun Nov 05 00:52:14 CET 2017
by reger
upd to pdfbox-2.0.8.jar
Changed Files: .classpath, build.xml, lib/fontbox-2.0.8.License, lib/fontbox-2.0.8.jar, lib/pdfbox-2.0.8.License, lib/pdfbox-2.0.8.jar, pom.xml
Sat Nov 04 11:06:05 CET 2017
by luccioman
Renamed Chinese & Greek lng files using ISO 639-1 codes.

Previously named with their ISO 3166-1 country code : this way, when
setting language to "Browser" in ConfigBasic.html, it didn't work
properly when browser preferred language was Chinese or Greek as their
respective language codes are "zh" and "el" (not "cn" and "gr" which are
their country codes)
Changed Files: htroot/ConfigBasic.html, htroot/ConfigBasic.java, locales/el.lng, locales/zh.lng, source/net/yacy/data/Translator.java
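
The distinction between language and country codes can be checked with
java.util.Locale. The sketch below derives a translation file name from a
browser language tag; the lngFileFor method and the "default" fallback are
assumptions made for illustration, not the Translator implementation.

```java
import java.util.Locale;

final class LanguageFileDemo {

    /** Derives a translation file name from a browser language tag such as "zh-CN". */
    static String lngFileFor(final String browserLanguageTag) {
        final String language = Locale.forLanguageTag(browserLanguageTag).getLanguage();
        // Hypothetical fallback when no language code can be extracted
        return (language.isEmpty() ? "default" : language) + ".lng";
    }

    public static void main(final String[] args) {
        final Locale chinese = Locale.forLanguageTag("zh-CN");
        System.out.println("language=" + chinese.getLanguage() + " country=" + chinese.getCountry());
        System.out.println(lngFileFor("zh-CN"));
        System.out.println(lngFileFor("el-GR"));
    }
}
```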
Fri Nov 03 10:34:36 CET 2017
by luccioman
Added a help link to ISO 639-1 language codes list ref
Changed Files: htroot/index.html, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Thu Nov 02 08:57:00 CET 2017
by luccioman
Added description of spatial restrictions in search options
Changed Files: htroot/index.html, locales/fr.lng, locales/master.lng.xlf
Tue Oct 31 08:53:17 CET 2017
by luccioman
Customized Threads with generic name for easier monitoring.
Changed Files: source/net/yacy/crawler/RecrawlBusyThread.java, source/net/yacy/document/importer/WarcImporter.java, source/net/yacy/search/Switchboard.java
Tue Oct 31 08:19:04 CET 2017
by luccioman
Added language HTML attribute to the search home page.
Changed Files: htroot/index.html, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, locales/sk.lng, locales/uk.lng
Tue Oct 31 07:44:37 CET 2017
by luccioman
Updated search page keyboard shortcuts descriptions. 
Changed Files: htroot/index.html, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Mon Oct 30 08:07:59 CET 2017
by luccioman
Use accessible labels for search home page radio buttons.
Changed Files: htroot/index.html
Mon Oct 30 07:38:47 CET 2017
by luccioman
Updated a license header typo.
Changed Files: source/net/yacy/crawler/CrawlStarterFromScraper.java
Fri Oct 27 14:00:30 CEST 2017
by Apply55gx
fix typo
Changed Files: source/net/yacy/crawler/CrawlStarterFromScraper.java, source/net/yacy/crawler/FileCrawlStarterTask.java
Tue Oct 24 09:54:54 CEST 2017
by luccioman
Stay authenticated when going to the search start page.

Otherwise, when authenticated as admin and navigating from search
results or admin pages to the search start page (/index.html), if
nothing is done on that page within the HTTP Digest Auth timeout (about
2 min), the search is performed without authentication and so without
extended search features.
Changed Files: htroot/env/templates/simpleSearchHeader.template, htroot/env/templates/simpleheader.template
Tue Oct 24 09:34:03 CEST 2017
by luccioman
Use the same top nav bar on index.html and search results.

The search start page can thus include the same optional login link/status
as the results page, offering the same convenient login without the need
to use the Administration section.
Changed Files: htroot/index.html, htroot/index.java, htroot/yacysearch.java
Mon Oct 23 18:28:11 CEST 2017
by luccioman
Fixed loss of index page form values on 'more options' link click.

Restores the behavior introduced eleven years ago (see commit
479861a3cf82e3439f7cdcce3865d3de602d53c3) and lost by mistake 3 years
ago (see commit 617dd9c97b5db119a4603190ccedaf7d504b728b), when the
click handler started referencing a missing HTML id.

Changed Files: htroot/index.html
Thu Oct 19 09:27:52 CEST 2017
by luccioman
Fixed JPEG snapshot resizing when running on OpenJDK.

Resizing JPEG snapshot images through /api/snapshot.jpg failed when
running on OpenJDK, but rendered successfully with an Oracle JDK.
Details in mantis 772 ( http://mantis.tokeek.de/view.php?id=772 ).

Removing any alpha component (useless in snapshot images) from the
rendered resized image solves the issue.
Changed Files: htroot/api/snapshot.java, source/net/yacy/peers/graphics/EncodedImage.java
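The idea of dropping the alpha component before JPEG encoding can be sketched with plain java.awt and javax.imageio calls. This is an illustration under assumptions, not the code from EncodedImage.java: the class OpaqueJpegDemo, the method withoutAlpha and the white background are all hypothetical choices.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class OpaqueJpegDemo {

    /** Copies the image onto an opaque RGB canvas, painting transparent areas white. */
    static BufferedImage withoutAlpha(BufferedImage source) {
        BufferedImage opaque = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = opaque.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, opaque.getWidth(), opaque.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return opaque;
    }

    public static void main(String[] args) throws IOException {
        // An image type with an alpha channel, as a resized snapshot might have
        BufferedImage argb = new BufferedImage(64, 48, BufferedImage.TYPE_INT_ARGB);
        BufferedImage rgb = withoutAlpha(argb);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean written = ImageIO.write(rgb, "jpg", out);
        System.out.println("has alpha: " + rgb.getColorModel().hasAlpha() + ", written: " + written);
    }
}
```

Since JPEG has no transparency, converting to an opaque image type before encoding keeps the behavior independent of how a given JDK's JPEG writer treats alpha channels.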
Wed Oct 18 14:17:06 CEST 2017
by luccioman
Updated Java version information on Readme
Changed Files: README.md
Wed Oct 18 07:53:07 CEST 2017
by luccioman
Consistently encode snapshot image with format requested on the API.

Previously, calling /api/snapshot.png rendered JPEG encoded images.
Changed Files: htroot/api/snapshot.java, source/net/yacy/cora/util/Html2Image.java, test/java/net/yacy/cora/util/Html2ImageTest.java
Tue Oct 17 09:41:58 CEST 2017
by luccioman
Fixed search result Snapshots link.

Previously rendered as a broken URL containing the absolute file path of
a snapshot on the search server.

Now rendered as a valid URL linking to the /api/snapshot API to provide
available snapshot content. Snapshot format is selected among the
available ones in the following order of preference  : JPG/PNG, PDF, and
Changed Files: htroot/ConfigSearchPage_p.html, htroot/yacysearchitem.html, htroot/yacysearchitem.java, locales/fr.lng, locales/master.lng.xlf
Mon Oct 16 19:45:17 CEST 2017
by luccioman
Fixed pdf2image conversion with imagemagick on PDFs having transparency

The target image format (jpeg) doesn't support transparency, so the
Html2ImageTest produced unusable black images when run on a Linux
machine with the imagemagick package installed.
Changed Files: source/net/yacy/cora/util/Html2Image.java, test/java/net/yacy/cora/util/Html2ImageTest.java
Mon Oct 16 17:04:22 CEST 2017
by luccioman
Properly close resources (even on error) on OS and ThreadDump classes.

Also updated some JavaDoc and main() function usage message on the same
Changed Files: source/net/yacy/kelondro/logging/ThreadDump.java, source/net/yacy/kelondro/util/OS.java
Mon Oct 16 09:18:12 CEST 2017
by luccioman
Fixed ProfilingGraph calculation integer overflows and added test class. 

Complementary to fix proposed in PR #128 by @otteresk.

Changed Files: htroot/PerformanceGraph.java, source/net/yacy/peers/graphics/ProfilingGraph.java, source/net/yacy/visualization/ChartPlotter.java, test/java/net/yacy/peers/graphics/ProfilingGraphTest.java
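The kind of integer overflow mentioned above typically appears when a value is scaled to a pixel coordinate with 32-bit arithmetic. The following sketch is only an assumption about the general pattern, not the actual ProfilingGraph code; the class ScaleOverflowDemo and its methods are hypothetical.

```java
class ScaleOverflowDemo {

    /** Naive version: value * pixels is computed in 32 bits and may wrap around. */
    static int scaleNaive(int value, int pixels, int range) {
        return value * pixels / range;
    }

    /** Safer version: widen to long before multiplying, narrow only the final result. */
    static int scaleSafe(int value, int pixels, int range) {
        return (int) ((long) value * pixels / range);
    }

    public static void main(String[] args) {
        int value = 3_000_000;  // e.g. elapsed milliseconds
        int pixels = 1_000;     // chart width
        int range = 6_000_000;  // total time span shown on the chart
        System.out.println("naive: " + scaleNaive(value, pixels, range));
        System.out.println("safe:  " + scaleSafe(value, pixels, range));
    }
}
```

With these inputs the intermediate product exceeds Integer.MAX_VALUE, so only the widened version yields the expected coordinate (half of the chart width).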
Wed Oct 11 07:13:28 CEST 2017
by luccioman
Added missing parameters to yacysearchtrailer call on JS resort mode
Changed Files: htroot/js/yacysort.js
Mon Oct 09 19:08:39 CEST 2017
by luccioman
Adjusted ResponseHeaderTest to succeed on slow or highly loaded CPU
Changed Files: test/java/net/yacy/cora/protocol/ResponseHeaderTest.java
Mon Oct 09 14:25:43 CEST 2017
by luccioman
Added a Travis build status image to Readme
Changed Files: README.md
Sat Oct 07 06:13:22 CEST 2017
by reger
Adjust tags css style in ConfigSearchPage to equal search page
Changed Files: htroot/ConfigSearchPage_p.html
Fri Oct 06 20:32:28 CEST 2017
by reger
Update deprecated SolrInputDocument.addField() with boost value
remove unused SchemaConfiguration.getDate (as it is designed to return
only past dates which might be unexpected for general configuration schema)
Changed Files: source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/search/schema/WebgraphConfiguration.java
Thu Oct 05 14:42:05 CEST 2017
by luccioman
Updated Debian optional dependencies with the ones used for snapshots
Changed Files: debian/control
Thu Oct 05 14:22:35 CEST 2017
by luccioman
Exclude any maven targets from ant dist task.
Changed Files: build.xml
Thu Oct 05 13:09:11 CEST 2017
by luccioman
Updated travis config : install ghostscript, required for Html2Image
Changed Files: .travis.yml
Thu Oct 05 13:09:11 CEST 2017
by luccioman
Updated travis config : install ghostscript, required for Html2Image
Changed Files: .travis.yml, source/net/yacy/cora/util/Html2Image.java
Thu Oct 05 09:25:02 CEST 2017
by luccioman
Updated Travis jdk version to match current requirements (Java 1.8)
Changed Files: .travis.yml
Wed Oct 04 18:33:09 CEST 2017
by luccioman
Added partial bzip2 stream parsing support and bzipParser Junit test
Changed Files: source/net/yacy/document/parser/bzipParser.java, test/java/net/yacy/document/parser/bzipParserTest.java, test/parsertest/umlaute_html_utf8.html.bz2, test/parsertest/umlaute_html_xml_txt_gnu.tbz2, test/parsertest/umlaute_linux.txt.bz2
Wed Oct 04 08:41:43 CEST 2017
by luccioman
Fixed RWI distance calculation on multi words search queries.

Distance was lost when storing/retrieving references to intermediate
result container.

Now all JUnit tests are again successfully passing!
Changed Files: source/net/yacy/kelondro/data/word/WordReferenceRow.java, source/net/yacy/kelondro/data/word/WordReferenceVars.java, test/java/net/yacy/kelondro/rwi/ReferenceContainerTest.java
Mon Oct 02 10:05:57 CEST 2017
by luccioman
Added textual hints to language radio buttons labels

As a help and accessible alternative to the visual styling that marks whether
a language is available in browser preferred lang mode.
Changed Files: htroot/ConfigBasic.html, htroot/ConfigBasic.java
Mon Oct 02 09:36:13 CEST 2017
by luccioman
Fixed NullPointerException case on 'Browser' lang selection

Occurred when English was the only active language, then making the
ConfigBasic.html page unusable until manually modifying the
locale.language setting.
Changed Files: source/net/yacy/data/Translator.java
Mon Oct 02 02:51:10 CEST 2017
by reger
fix array out of bounds in YJsonResponseWriter and OpensearchResponseWriter
on recreation of image url. 
Set parameter of indexList2protocolList to required number of images (image_stubs)
Situation e.g. image_stub(size=15) but images_protocol(size=12)
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java
Sat Sep 30 11:58:49 CEST 2017
by otter
prevent integer overflow in chartDot for nodes with a big index
Changed Files: source/net/yacy/visualization/ChartPlotter.java
Sat Sep 30 00:48:54 CEST 2017
by otter
prevent integer overflow in chartLine
Changed Files: source/net/yacy/visualization/ChartPlotter.java
Fri Sep 29 00:26:30 CEST 2017
by reger
Adjust filetype: query modifier parameter to lower case
to prevent mismatch on user input with mixed case
Internally file extension are always compared lowercase.
Changed Files: source/net/yacy/search/query/QueryModifier.java
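Normalizing a user-supplied file extension so that it matches lowercase internal comparisons can be sketched as follows. This is a hedged illustration, not the QueryModifier code; the class FiletypeModifierDemo and the method normalizeExtension are hypothetical.

```java
import java.util.Locale;

class FiletypeModifierDemo {

    /** Lowercases the extension given in a filetype: modifier, independently of the JVM locale. */
    static String normalizeExtension(String userInput) {
        return userInput.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalizeExtension("PDF"));
        System.out.println(normalizeExtension("JpG"));
    }
}
```

Locale.ROOT is used here so that the result does not depend on the default locale of the machine (for example the Turkish dotless i when lowercasing "I").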
Thu Sep 28 09:55:23 CEST 2017
by luccioman
Updated master translation file for ConfigSearchPage_p.html
Changed Files: locales/de.lng, locales/fr.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng
Thu Sep 28 00:46:49 CEST 2017
by reger
Add links to the optional keyword tags of search result
If switched on, clicking a tag link adds the keyword to the search query.
If a keyword navigator is active, the selected keyword adds or replaces
a query keyword: modifier (currently replace was chosen, as multiple
keywords are not fully supported yet)
Changed Files: htroot/yacysearchitem.html, htroot/yacysearchitem.java
Wed Sep 27 17:51:11 CEST 2017
by luccioman
Added French translation for ConfigSearchPage_p.html
Changed Files: locales/fr.lng
Tue Sep 26 14:58:30 CEST 2017
by luccioman
Added missing accessible labels to ConfigSearchPage_p.html
Changed Files: htroot/ConfigSearchPage_p.html, htroot/env/templates/submenuDesign.template
Tue Sep 26 07:59:44 CEST 2017
by luccioman
Fixed ConfigSearchPage_p HTML validation errors.

Validated with Nu Html Checker 17.9.0
Changed Files: htroot/ConfigSearchPage_p.html, htroot/env/templates/footer.template
Mon Sep 25 15:21:17 CEST 2017
by luccioman
Removed unnecessary max counts init on empty search navigators.
Changed Files: htroot/IndexControlRWIs_p.java, htroot/yacy/search.java, source/net/yacy/cora/federate/FederateSearchManager.java
Mon Sep 25 14:54:35 CEST 2017
by luccioman
Restrict Search Result Layout modification to HTTP POST only.
Changed Files: htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java
Fri Sep 22 11:00:46 CEST 2017
by luccioman
Improved accessibility of histograms widgets.

Added keyboard navigation support and missing WAI-ARIA attributes.

Tested with NVDA 2017.3 screenreader on recent major browsers.
Changed Files: htroot/ConfigSearchPage_p.html, htroot/HostBrowser.html, htroot/js/accessibleHistogram.js, htroot/jslicense.html, htroot/yacysearchtrailer.html
Wed Sep 20 07:59:20 CEST 2017
by luccioman
Upgraded JavaScript lib raphael.js from 2.1.3 to 2.2.7
Changed Files: htroot/ConfigSearchPage_p.html, htroot/HostBrowser.html, htroot/js/raphael.min.js, htroot/jslicense.html, htroot/yacysearchtrailer.html
Mon Sep 18 17:36:07 CEST 2017
by luccioman
Refresh pagination buttons instead of fully rendering each time.

This prevents the already displayed pagination buttons from being unresponsive
when clicking on them while the rendering JS function is running.
Changed Files: htroot/js/yacysearch.js, htroot/js/yacysort.js, htroot/yacysearch.html
Sun Sep 17 00:29:36 CEST 2017
by reger
update classpath for Eclipse project config to Solr 6.6.1 
Changed Files: .classpath
Sun Sep 17 00:27:04 CEST 2017
by reger
update to Solr 6.6.1 
(ant build)
Changed Files: build.xml
Sun Sep 17 00:26:18 CEST 2017
by reger
update to Solr 6.6.1 
(maven build)
Changed Files: pom.xml
Sat Sep 16 23:58:17 CEST 2017
by reger
update maven source and compiler plugin to latest version
Changed Files: pom.xml
Sat Sep 16 10:13:09 CEST 2017
by luccioman
Handle JS refreshing of belatedly added search navigators
Changed Files: htroot/js/yacysort.js
Sat Sep 16 09:26:08 CEST 2017
by luccioman
Restrict JS results resorting to authenticated users.

Until a more efficient DOM refresh model needing less XHR requests per
search is implemented.
Changed Files: htroot/ConfigPortal_p.html, htroot/yacysearch.java
Fri Sep 15 14:23:49 CEST 2017
by luccioman
Added HTML ids to search navigators for a more reliable JS refreshing.
Changed Files: htroot/js/yacysort.js, htroot/yacysearchtrailer.html
Fri Sep 15 12:16:24 CEST 2017
by luccioman
Results JS resort : properly handle results with same ranking value.
Changed Files: htroot/env/yacysort.css, htroot/js/yacysort.js
Fri Sep 15 11:12:23 CEST 2017
by luccioman
Added new graphical setting for browser JS/On demand results resorting.
Changed Files: htroot/ConfigPortal_p.html, htroot/ConfigPortal_p.java
Fri Sep 15 09:51:34 CEST 2017
by luccioman
Apply JS resort only when currently relevant : p2p text search
Changed Files: htroot/yacysearch.java
Thu Sep 14 09:36:55 CEST 2017
by luccioman
Do not animate unnecessarily when changing page on JS sorted results.
Changed Files: htroot/env/yacysort.css, htroot/js/yacysort.js
Wed Sep 13 19:03:01 CEST 2017
by luccioman
Prevent unnecessary DOM finds in JS resorting functions.

Also removed now unused functions earlierPage() and laterPage().
Changed Files: htroot/js/yacysort.js
Wed Sep 13 09:03:24 CEST 2017
by luccioman
Stop updating results with JS resorting on server feeds termination
Changed Files: htroot/js/yacysort.js
Wed Sep 13 08:35:15 CEST 2017
by luccioman
Updated the JavaScript license information page
Changed Files: htroot/jslicense.html
Wed Sep 13 08:23:19 CEST 2017
by luccioman
Disabled as default verbose browser console logs in yacysort.js
Changed Files: htroot/js/yacysort.js
Wed Sep 13 08:16:29 CEST 2017
by luccioman
Added missing copyright header to the yacysort.js file
Changed Files: htroot/js/yacysort.js
Wed Sep 13 08:06:11 CEST 2017
by luccioman
Moved the JS resort specific styling to the usual YaCy CSS location
Changed Files: htroot/env/yacysort.css, htroot/yacysearch.html
Wed Sep 13 07:58:05 CEST 2017
by luccioman
Disable manual search results resorting when resorting is done with JS

Also added a constant for the js resorting setting key.
Changed Files: htroot/yacysearch.java, source/net/yacy/search/SwitchboardConstants.java
Wed Sep 13 07:41:03 CEST 2017
by luccioman
Trigger js resorting animations using only CSS classes.

Also added some more descriptive comments.
Changed Files: htroot/js/yacysort.js, htroot/yacysort.css
Mon Sep 11 20:02:19 CEST 2017
by Ryszard Go?
Javascript re-sorting: Remove potentially breaking display property and reset max-height when animation is finished.
Changed Files: htroot/yacysort.css
Sun Sep 10 17:09:35 CEST 2017
by Ryszard Go?
Javascript re-sorting: replace jQuery show() with css animations
Changed Files: htroot/js/yacysort.js, htroot/yacysearch.html, htroot/yacysort.css
Fri Sep 08 11:16:37 CEST 2017
by luccioman
Added Solr filter queries for audio, video and application domains

Inspired from the existing one used on image search, and consistent with
post filtering on content domain applied in SearchEvent.addNodes().

These filters are quite simplistic but at least audio, video or
application search now return results. Previously, when filtering on
these content domains, many results pages (and often even the first
page) were empty while the total results count suggested that results
should be available. This was because filtering on domain was only
applied AFTER requesting Solr indexes.
Changed Files: source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java
Tue Sep 05 00:51:43 CEST 2017
by reger
update master.lng, IndexExport_p.html text
Changed Files: locales/master.lng.xlf
Sun Sep 03 19:34:48 CEST 2017
by JeremyRand
Javascript re-sorting: optimize the jQuery selectors a little bit.
Changed Files: htroot/js/yacysort.js
Sun Sep 03 20:03:48 CEST 2017
by JeremyRand
Fix numbered page navigation from getting corrupted when statistics() runs.
Changed Files: htroot/js/yacysearch.js
Sun Sep 03 20:09:44 CEST 2017
by JeremyRand
Add UI for numbered page navigation when Javascript re-sorting is enabled.
Changed Files: htroot/js/yacysearch.js, htroot/js/yacysort.js, htroot/yacysearch.html
Mon Apr 03 05:33:10 CEST 2017
by JeremyRand
Fix the sidebar item "Wiki Name Space" with Javascript re-sorting.
Changed Files: htroot/js/yacysort.js
Mon Apr 03 05:18:16 CEST 2017
by JeremyRand
(WIP) Add numbered page navigation when Javascript re-sorting is enabled.

TODO: Add UI for selecting the number.
Changed Files: htroot/js/yacysort.js
Mon Apr 03 04:32:09 CEST 2017
by JeremyRand
(WIP) Fix the sidebar when Javascript resorting is in use.

TODO: Add some markup so that DOM traversal in the animations is less painful.
Changed Files: htroot/js/yacysort.js, htroot/yacysearch.html, htroot/yacysearchtrailer.html
Sun Sep 03 19:50:08 CEST 2017
by JeremyRand
(WIP) Optionally sort HTML search items via Javascript.

TODO: Expose a GUI setting for this.
Changed Files: defaults/yacy.init, htroot/js/yacysort.js, htroot/yacysearch.html, htroot/yacysearch.java
Mon Aug 28 16:33:53 CEST 2017
by JeremyRand
Add data-ranking attribute to each HTML search item.
Changed Files: htroot/yacysearchitem.html, htroot/yacysearchitem.java
Sat Sep 02 09:53:38 CEST 2017
by luccioman
Updated internal ISO 639-1 language codes with latest standards.

Includes 54 language code additions, some name modifications, and
marking a few deprecated.
Changed Files: source/net/yacy/kelondro/util/ISO639.java
Thu Aug 31 11:24:59 CEST 2017
by luccioman
Fixed count of filtered results from local solr.

Was inadequately modified in my previous related commits (making next
pages buttons unavailable in Search portal mode), as
SearchEvent.local_solr_available did not count the total filtered
results but only the ones within the currently fetched result page(s).
Changed Files: htroot/yacysearch.java, htroot/yacysearchitem.java, htroot/yacysearchlatestinfo.java, source/net/yacy/search/query/SearchEvent.java
Wed Aug 30 23:50:14 CEST 2017
by Michael Peter Christen
try to fix problem
with error description
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java
Wed Aug 30 12:23:45 CEST 2017
by luccioman
Use local solr filtered results in total search results count.

This modification has indeed little impact, as any query modifiers
are already applied when requesting the local solr index.
It mainly affects duplicates detected with results from remote peers.

Also updated javadocs for clarification.
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue Aug 29 08:16:12 CEST 2017
by luccioman
Make result action links visible when focusing them with keyboard.
Changed Files: htroot/env/base.css
Tue Aug 29 07:39:12 CEST 2017
by luccioman
Removed duplicate HTML class attribute.
Changed Files: htroot/yacysearch.html
Mon Aug 28 19:03:51 CEST 2017
by luccioman
Added a button to manually refresh sorting of p2p search results.

As a server-side oriented alternative to the JavaScript realtime
resorting feature proposed in PR #104.
The goal is the same as in this PR : to make it possible to compensate for
the network latency when fetching results from various peers and to obtain,
as soon as possible, a consistently ranked result set.
Changed Files: htroot/js/yacysearch.js, htroot/yacysearch.html, htroot/yacysearch.java, source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java, source/net/yacy/search/query/SearchEvent.java
Sun Aug 27 04:22:39 CEST 2017
by reger
update master.lng, RankingSolr_p.html text
Changed Files: locales/master.lng.xlf
Wed Aug 23 08:20:37 CEST 2017
by luccioman
Use Javadoc style comments on SearchEvent properties.

For better code readability and understanding.
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue Aug 22 14:13:00 CEST 2017
by luccioman
Added unit tests on the gzip parser.
Changed Files: source/net/yacy/document/parser/gzipParser.java, test/java/net/yacy/document/parser/gzipParserTest.java, test/parsertest/umlaute_html_utf8.html.gz, test/parsertest/umlaute_html_xml_txt_gnu.tgz, test/parsertest/umlaute_linux.txt.gz
Tue Aug 22 14:11:35 CEST 2017
by luccioman
Finer control on max links to parse in the html parser.
Changed Files: source/net/yacy/cora/storage/SizeLimitedMap.java, source/net/yacy/cora/storage/SizeLimitedSet.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/htmlParser.java, test/java/net/yacy/document/parser/htmlParserTest.java, test/parsertest/umlaute_html_namedentities.html
Tue Aug 22 14:06:09 CEST 2017
by luccioman
Added some unit tests on FileUtils.
Changed Files: test/java/net/yacy/kelondro/util/FileUtilsTest.java
Sun Aug 20 22:17:27 CEST 2017
by reger
Allow to stop currently running warc import (stop button) 
Changed Files: htroot/IndexImportWarc_p.html, htroot/IndexImportWarc_p.java, source/net/yacy/document/importer/WarcImporter.java
Wed Aug 16 14:21:07 CEST 2017
by luccioman
Use unredirected robots.txt URL when adding an entry to the table.
Changed Files: source/net/yacy/crawler/robots/RobotsTxt.java
Wed Aug 16 09:30:33 CEST 2017
by luccioman
Ensure proper synchronous robots entry retrieval on first check.

Previously, when checking the robots.txt policy for the first time on an
unknown host (not cached in the robots table), the result was always empty
in the /getpageinfo_p.xml api and in the /CrawlCheck_p.html page. Subsequent
calls, however, returned the correct information.
Changed Files: htroot/api/getpageinfo_p.java, source/net/yacy/crawler/robots/RobotsTxt.java
Tue Aug 15 21:04:36 CEST 2017
by luccioman
Upgraded Docker base image from deprecated java to openjdk.
Changed Files: docker/Dockerfile, docker/Dockerfile.alpine
Tue Aug 15 10:11:05 CEST 2017
by luccioman
Prevent search result failure on incomplete images information.

Complements the recent modification related to images in commit 7f395ef.

Unfortunately many documents metadata fetched from the freeworld p2p
network have only partial information about embedded images. Without
proper error handling, this made many searches in p2p mode fail
Changed Files: htroot/yacysearchitem.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java
Tue Aug 15 07:16:01 CEST 2017
by Michael Peter Christen
added usage of X-Real-IP http header
to identify request IPs which came through NGINX reverse proxy
Changed Files: source/net/yacy/cora/protocol/RequestHeader.java, source/net/yacy/http/servlets/SolrSelectServlet.java
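Resolving the client address behind a reverse proxy can be sketched as below. This is a simplified illustration, not the RequestHeader implementation; the class ClientIpDemo, the method clientIp and the plain header map are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class ClientIpDemo {

    /** Prefers the X-Real-IP header set by the proxy, falling back to the socket address. */
    static String clientIp(Map<String, String> headers, String remoteAddr) {
        String realIp = headers.get("X-Real-IP");
        if (realIp != null && !realIp.trim().isEmpty()) {
            return realIp.trim();
        }
        return remoteAddr;
    }

    public static void main(String[] args) {
        // HTTP header names are case-insensitive, hence the case-insensitive map
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.put("x-real-ip", "203.0.113.7");
        System.out.println(clientIp(headers, "127.0.0.1"));
    }
}
```

Such a header should only be trusted when the request is known to come from the reverse proxy itself, since any client can send it.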
Mon Aug 14 20:12:09 CEST 2017
by Michael Peter Christen
added image link in search results
This should be a help to make a preview of search results.
The image is computed from the list of embedded images, it is
always the first image in that list.
In rss-type results the image is presented like
<media:content medium="image" url="https://abc.xyz/logo.png"/>
as defined in
Changed Files: htroot/yacysearchitem.java, htroot/yacysearchitem.json, htroot/yacysearchitem.xml, source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java
Mon Aug 14 14:47:01 CEST 2017
by luccioman
Also handle text content when parsing XML within limits.
Changed Files: source/net/yacy/document/parser/GenericXMLParser.java, test/java/net/yacy/document/parser/GenericXMLParserTest.java
Mon Aug 14 02:16:43 CEST 2017
by reger
Add junit test for AbstractOperations.addOperand()
Changed Files: test/java/net/yacy/cora/federate/solr/logic/AbstractOperationsTest.java
Mon Aug 14 01:03:15 CEST 2017
by reger
Correction of https://github.com/yacy/yacy_search_server/commit/d03e2c98ea6bd5701c8e8257174c439b9c006afb
Fix Conjunction.addOperator to do nothing if term is empty
to prevent query strings with a repeated logical operator
like "field:term AND AND field:term",
possibly causing out of memory errors in postprocessing_doublecontent
Changed Files: source/net/yacy/cora/federate/solr/logic/AbstractOperations.java
Mon Aug 14 00:52:03 CEST 2017
by reger
Fix Conjunction.addOperator to do nothing if term is empty
to prevent query strings with a repeated logical operator
like "field:term AND AND field:term",
possibly causing out of memory errors in postprocessing_doublecontent
Changed Files: source/net/yacy/cora/federate/solr/logic/AbstractOperations.java
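The effect of ignoring empty terms when building a conjunction can be illustrated with a small stand-alone class. This is a sketch under assumptions and not the AbstractOperations code; the class ConjunctionDemo and its methods addTerm and toQueryString are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctionDemo {

    private final List<String> terms = new ArrayList<>();

    /** Adds a term to the conjunction, silently ignoring null or blank ones. */
    void addTerm(String term) {
        if (term == null || term.trim().isEmpty()) {
            return; // an empty term would otherwise leave two adjacent operators
        }
        terms.add(term.trim());
    }

    /** Joins the collected terms with the AND operator. */
    String toQueryString() {
        return String.join(" AND ", terms);
    }

    public static void main(String[] args) {
        ConjunctionDemo query = new ConjunctionDemo();
        query.addTerm("field:a");
        query.addTerm("");
        query.addTerm("field:b");
        System.out.println(query.toQueryString());
    }
}
```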
Sat Aug 12 21:53:04 CEST 2017
by reger
Remove deprecated YaCyProxyServlet
was replaced by UrlProxyServlet
Changed Files: defaults/web.xml
Sat Aug 12 09:43:49 CEST 2017
by luccioman
Prevent unwanted cached bytes duplication on stream parsing.
Changed Files: source/net/yacy/document/TextParser.java
Sat Aug 12 09:42:06 CEST 2017
by luccioman
Updated xml parser limited parsing test for use with the latest JDK.
Changed Files: test/java/net/yacy/document/parser/GenericXMLParserTest.java
Fri Aug 11 20:34:59 CEST 2017
by luccioman
Updated debian package configuration to match new Java 1.8 target

Following migration from Java 1.7 to Java 1.8 in commit
Changed Files: debian/control
Thu Aug 10 23:57:37 CEST 2017
by reger
upd to icu4j-59_1.jar
Changed Files: .classpath, build.xml, lib/icu4j-59_1.jar, pom.xml
Sun Aug 06 23:41:53 CEST 2017
by reger
Skip public post of jre version.
Added to determine switch to java8  https://github.com/yacy/yacy_search_server/commit/596b5dfa5936b25b605c42807730c29a1d08cd15
Changed Files: htroot/Network.html, htroot/Network.java, source/net/yacy/peers/Seed.java, source/net/yacy/peers/SeedDB.java
Sun Aug 06 23:26:27 CEST 2017
by reger
Replace deprecated ConcurrentHashSet with recommended Java8 
ConcurrentHashMap.newKeySet() in postprocessDocuments()
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java
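ConcurrentHashMap.newKeySet() is part of the standard library since Java 8 and returns a thread-safe Set. The following minimal sketch shows its use from several threads; the class ConcurrentSetDemo, the method collectIds and the "doc-" ids are hypothetical and unrelated to postprocessDocuments().

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentSetDemo {

    /** Lets several threads add the same ids concurrently and returns the resulting set. */
    static Set<String> collectIds(int threads, int idsPerThread) {
        // Thread-safe set backed by a ConcurrentHashMap
        Set<String> ids = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < idsPerThread; i++) {
                    ids.add("doc-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Every thread adds the same ids, so duplicates collapse into one entry each
        System.out.println(collectIds(4, 1000).size());
    }
}
```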
Sat Aug 05 23:47:27 CEST 2017
by reger
Harmonizing use of xml reader / sax parser in XMLBlacklistImporter
eliminating the need for lib/xercesImpl.jar
Changed Files: .classpath, build.xml, pom.xml, source/net/yacy/data/list/XMLBlacklistImporter.java
Sat Aug 05 22:30:06 CEST 2017
by reger
Patch last_modified date with internal FirstSeenTime() if no date provided
to make sure updated documents are indexed with their last-modified
date as provided in current crawl. 
(always patching moddate with firstseen might bear the risk of missing
actual updates).
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java
Tue Aug 01 00:59:53 CEST 2017
by reger
Remove obsolete Protocol parameter ttl (time to live) 
not interpreted in target yacy/query.html
also Protocol.querySeed() not used and parameter not interpreted in 
target servlet yacy/query.html
Changed Files: source/net/yacy/peers/Protocol.java
Mon Jul 31 23:38:10 CEST 2017
by reger
upd to poi-3.16.jar
Changed Files: .classpath, build.xml, lib/poi-3.16.License, lib/poi-3.16.jar, lib/poi-scratchpad-3.16.jar, pom.xml
Mon Jul 31 01:55:01 CEST 2017
by reger
Replace deprecated getIP with getIPs in Protocol transferURL() and 
Remember used ip for error handling and departInterface
Changed Files: source/net/yacy/peers/Protocol.java
Sun Jul 30 23:02:15 CEST 2017
by reger
Replace one more deprecated peerDeparture in Protocol.transferIndex() 
by moving/using interfaceDeparture() in transferRWI()
Changed Files: source/net/yacy/peers/Protocol.java
Sun Jul 30 20:09:06 CEST 2017
by reger
upd to pdfbox-2.0.7.jar
Changed Files: .classpath, build.xml, lib/fontbox-2.0.7.License, lib/fontbox-2.0.7.jar, lib/pdfbox-2.0.7.License, lib/pdfbox-2.0.7.jar, pom.xml
Sun Jul 23 03:55:56 CEST 2017
by reger
Add SolrConfig ClassicIndexSchemaFactory to prevent Solr startup warning.
This overrides Solr default to use managed schema. As we don't use
programmatic schema changes this directs Solr to use schema.xml, eliminating
the warning.
Changed Files: defaults/solr/solrconfig.xml
Mon Jul 17 15:35:10 CEST 2017
by luccioman
Log an error when Solr folder migration fails for some reason.
Changed Files: source/net/yacy/search/index/Fulltext.java
Sun Jul 16 23:37:28 CEST 2017
by reger
upd to jwat-warc-1.1.0.jar
Changed Files: .classpath, build.xml, lib/jwat-archive-common-1.1.0.jar, lib/jwat-common-1.1.0.jar, lib/jwat-gzip-1.1.0.jar, lib/jwat-warc-1.1.0.jar, pom.xml
Sun Jul 16 23:35:56 CEST 2017
by reger
upd version for typeahead.jquery.js in jslicense.html
Changed Files: htroot/jslicense.html
Sun Jul 16 14:46:46 CEST 2017
by luccioman
Support parsing gzip files from servers with redundant headers.

Some web servers provide both 'Content-Encoding : "gzip"' and
'Content-Type : "application/x-gzip"' HTTP headers on their ".gz" files.
Failing on such resources was annoying: they are not so uncommon,
although non conforming (see RFC 7231 section for
"Content-Encoding" header specification
Changed Files: source/net/yacy/crawler/retrieval/StreamResponse.java, source/net/yacy/document/TextParser.java, source/net/yacy/document/parser/gzipParser.java
Sun Jul 16 14:37:06 CEST 2017
by luccioman
URL Viewer : apply crawler size limits when adding to local index.

This allows parsing and previewing large files, while preventing unwanted
OutOfMemory errors which are likely to occur when adding to the Solr
Index resources larger than configured crawler limits.
Changed Files: htroot/ViewFile.java
Sat Jul 15 00:19:23 CEST 2017
by reger
Clean up unmaintained and unused AugmentParser trail.
Changed Files:
Fri Jul 14 23:41:39 CEST 2017
by reger
Clean up redundant but obsolete jquery.rdfquery-core-1.0.js script lib
Changed Files: htroot/jslicense.html
Thu Jul 13 08:18:40 CEST 2017
by luccioman
Added gzip parser support for max content bytes limit
Changed Files: source/net/yacy/document/parser/gzipParser.java
Thu Jul 13 08:12:10 CEST 2017
by luccioman
Added HTML parser support for maximum content bytes parsing limit 
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/htmlParser.java
Wed Jul 12 16:03:23 CEST 2017
by luccioman
Merge pull request #122 from Scarfmonster/patch-1

I also reproduced the issue, and the fix is working fine.

Thanks @Scarfmonster 
Changed Files: source/net/yacy/http/Jetty9HttpServerImpl.java
Wed Jul 12 00:18:12 CEST 2017
by luccioman
Added RSS parser support for maximum content bytes parsing limit
Changed Files: source/net/yacy/cora/document/feed/RSSFeed.java, source/net/yacy/cora/document/feed/RSSReader.java, source/net/yacy/document/Document.java, source/net/yacy/document/parser/rssParser.java
Wed Jul 12 00:13:24 CEST 2017
by luccioman
Finer control on bounded input streams with custom stream implementation
Changed Files: source/net/yacy/cora/util/StreamLimitException.java, source/net/yacy/cora/util/StrictLimitInputStream.java, source/net/yacy/crawler/retrieval/FileLoader.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/document/TextParser.java, source/net/yacy/document/parser/GenericXMLParser.java
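A bounded input stream of the kind mentioned above can be sketched with a FilterInputStream subclass that fails once more than a given number of bytes has been read. This is an assumption about the general technique and not the StrictLimitInputStream implementation; the names BoundedStreamDemo, LimitedInputStream and fitsWithin are hypothetical, and skip() is deliberately left uncounted to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BoundedStreamDemo {

    /** Wraps a stream and throws as soon as more than maxBytes have been read. */
    static final class LimitedInputStream extends FilterInputStream {
        private final long maxBytes;
        private long consumed;

        LimitedInputStream(InputStream in, long maxBytes) {
            super(in);
            this.maxBytes = maxBytes;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count(1);
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count(n);
            return n;
        }

        private void count(long n) throws IOException {
            consumed += n;
            if (consumed > maxBytes) {
                throw new IOException("stream exceeds limit of " + maxBytes + " bytes");
            }
        }
    }

    /** Reads the data fully and reports whether it stayed within the limit. */
    static boolean fitsWithin(byte[] data, long maxBytes) {
        try (InputStream in = new LimitedInputStream(new ByteArrayInputStream(data), maxBytes)) {
            byte[] buf = new byte[16];
            while (in.read(buf, 0, buf.length) != -1) {
                // discard the content, only the size matters here
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsWithin(new byte[10], 10));
        System.out.println(fitsWithin(new byte[11], 10));
    }
}
```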
Tue Jul 11 09:07:48 CEST 2017
by luccioman
Added parsing within bounds implementation to the generic parser.
Changed Files: source/net/yacy/document/parser/genericParser.java
Tue Jul 11 09:06:37 CEST 2017
by luccioman
Support trying multiple parsers even when streaming on large resources.
Changed Files: source/net/yacy/document/TextParser.java
Tue Jul 11 09:04:23 CEST 2017
by luccioman
Support loading local files with a per request specified maximum size.

Consistently with the HTTP loader implementation.
Changed Files: source/net/yacy/crawler/retrieval/FileLoader.java, source/net/yacy/repository/LoaderDispatcher.java
Sun Jul 09 23:08:54 CEST 2017
by reger
Fix css conflict of YMarks.html to make it viewable.
yacy-ymarks.css sidebar conflicts with bootstraps sidebar (different
overlay settings). Simply renamed it to ymark-sidebar.
Changed Files: htroot/YMarks.html, htroot/env/yacy-ymarks.css
Sat Jul 08 23:46:10 CEST 2017
by reger
upd to commons-fileupload-1.3.3.jar
Changed Files: .classpath, build.xml, lib/commons-fileupload-1.3.3.License, lib/commons-fileupload-1.3.3.jar, pom.xml
Mon Jul 03 14:53:36 CEST 2017
by luccioman
Removed temporary html parser test code
Changed Files: test/java/net/yacy/document/parser/htmlParserTest.java
Mon Jul 03 13:51:14 CEST 2017
by luccioman
URL Viewer : decode raw text using the response charset when provided.

Otherwise decode as UTF-8, as previously done.
Changed Files: htroot/ViewFile.java
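Decoding with the charset announced by the response and falling back to UTF-8 can be sketched with the standard java.nio.charset API. This is an illustration only, not the ViewFile code; the class CharsetFallbackDemo and its methods charsetOrUtf8 and decode are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

class CharsetFallbackDemo {

    /** Resolves the given charset name, falling back to UTF-8 when missing or unknown. */
    static Charset charsetOrUtf8(String name) {
        if (name == null || name.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return StandardCharsets.UTF_8;
        }
    }

    /** Decodes raw bytes with the response charset when usable, else as UTF-8. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, charsetOrUtf8(charsetName));
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};             // e with acute accent in ISO-8859-1
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};  // same character in UTF-8
        System.out.println(decode(latin1, "ISO-8859-1").equals(decode(utf8, null)));
    }
}
```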
Mon Jul 03 10:00:53 CEST 2017
by luccioman
HTML parser : removed unnecessary remaining recursive processing

Recursive processing was removed in commit
67beef657f82e92f48dd8425073ad81896a2ff4b, but one remained for anchors
content (likely omitted from the refactoring). It is no longer necessary :
other links such as images embedded in anchors are currently correctly
detected by the parser.

More annoying : that remaining recursive processing could lead to almost
endless processing when encountering some (invalid) HTML structures
involving nested anchors, as detected and reported by lucipher on YaCy
forum ( http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6005 ).
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, test/java/net/yacy/document/parser/htmlParserTest.java
Fri Jun 30 11:41:48 CEST 2017
by luccioman
Updated PerformanceQueues_p.xml API with last related servlet changes
Changed Files: htroot/PerformanceQueues_p.xml
Fri Jun 30 11:30:54 CEST 2017
by luccioman
Made remote search max system load limits configurable from UI.

As reported by davide on YaCy forums (
http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6004 ) when the
system is under high load, it can be difficult to understand why remote
search results are not fetched without carefully reading the YaCy
configuration file.
Changed Files: htroot/PerformanceQueues_p.html, htroot/PerformanceQueues_p.java, source/net/yacy/peers/RemoteSearch.java, source/net/yacy/search/SwitchboardConstants.java
Fri Jun 30 02:11:18 CEST 2017
by reger
Add keyword constraint to rwi query result filter
To discard rwi results not matching query keyword: parameter 
Changed Files: source/net/yacy/search/query/SearchEvent.java
Fri Jun 30 01:13:47 CEST 2017
by luccioman
Apply consistent behavior on HTTP resource size exceeding limit.

On content size known from HTTP headers, terminates connection faster
and improves error reports quality by reporting relevant message
"Content to download exceed maximum value..." rather than previously "no
response (NULL) for url...".
Changed Files: source/net/yacy/cora/protocol/http/HTTPClient.java
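Illustration only: a minimal sketch of the fail-fast check described above, assuming the declared size comes from the Content-Length header. The class and method names are hypothetical and are not taken from YaCy's HTTPClient.

```java
import java.io.IOException;

/** Hypothetical helper, not YaCy code: rejects a download before its body is read. */
final class ContentLengthGuard {

    /**
     * @param declaredLength value of the Content-Length header, or -1 when unknown
     * @param maxBytes       maximum accepted size, or -1 for no limit
     * @throws IOException when the declared size is known and exceeds the limit
     */
    static void check(long declaredLength, long maxBytes) throws IOException {
        if (maxBytes >= 0 && declaredLength > maxBytes) {
            throw new IOException("Content to download exceeds maximum value: declared "
                    + declaredLength + " bytes, limit " + maxBytes + " bytes");
        }
    }

    public static void main(String[] args) {
        try {
            check(5_000_000L, 1_000_000L);
        } catch (IOException e) {
            // The caller can close the connection here without reading the body
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the header first lets the caller close the connection early and report a specific message instead of a generic empty response.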
Fri Jun 30 00:30:54 CEST 2017
by luccioman
Respect maxFileSize limit also when streaming HTTP and when relevant.

Constraint applied consistently with HTTP content full load in byte
Changed Files: source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/visualization/ImageViewer.java
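Illustration only: when the size is not known in advance, a limit can still be enforced while streaming. The wrapper below is a hypothetical sketch of that idea, not the code used by HTTPLoader; it counts the bytes actually read and fails once the limit is passed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper, not YaCy code: fails once more than maxBytes have been read. */
final class SizeLimitedInputStream extends FilterInputStream {
    private final long maxBytes;
    private long count;

    SizeLimitedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            countBytes(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            countBytes(n);
        }
        return n;
    }

    /** The limit is checked after each read, so one chunk may overshoot before the failure. */
    private void countBytes(int n) throws IOException {
        count += n;
        if (count > maxBytes) {
            throw new IOException("Stream exceeds the maximum size of " + maxBytes + " bytes");
        }
    }

    public static void main(String[] args) {
        try (InputStream in = new SizeLimitedInputStream(new ByteArrayInputStream(new byte[10]), 4)) {
            byte[] buf = new byte[8];
            while (in.read(buf) >= 0) {
                // consume the stream
            }
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```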
Thu Jun 29 11:36:47 CEST 2017
by luccioman
Added an informative title on the crawl start robots.txt status icon
Changed Files: htroot/js/IndexCreate.js
Thu Jun 29 11:25:27 CEST 2017
by luccioman
Crawl start Ajax request : properly handle possible XML parsing errors

Otherwise on a malformed getpageinfo_p XML response (from the browser
point of view), JavaScript errors were thrown and the ajax status
steering wheel remained displayed indefinitely.
Changed Files: htroot/js/IndexCreate.js
Tue Jun 27 19:30:40 CEST 2017
by luccioman
Refactored plain-text URLs detection implementation.

For faster processing (measured about 2 times faster on many real-world
examples) and more advanced detection (previous algorithm detected only
URLs separated from the rest of the text by a space character).
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, test/java/net/yacy/document/parser/html/ContentScraperTest.java
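Illustration only: one way to detect URLs that are not delimited by spaces is a regular expression scan followed by trimming of trailing sentence punctuation. This is a hypothetical sketch of the general technique; the pattern and the class name are assumptions and do not reproduce the ContentScraper algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not YaCy code: finds URLs embedded in plain text. */
final class PlainTextUrlFinder {

    /** A scheme followed by any run of characters that cannot end a URL in running text. */
    private static final Pattern URL = Pattern.compile("(?:https?|ftp)://[^\\s<>\"']+");

    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            String url = m.group();
            // Drop trailing punctuation that most likely belongs to the sentence, not the URL
            while (!url.isEmpty() && ".,;:!?)".indexOf(url.charAt(url.length() - 1)) >= 0) {
                url = url.substring(0, url.length() - 1);
            }
            urls.add(url);
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(findUrls("See (http://example.org/a), then ftp://host/file."));
    }
}
```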
Mon Jun 26 17:33:56 CEST 2017
by luccioman
Made mime type and extension normalization locale independent.

Previously, upper cased mime type was incorrectly normalized when the
default locale is Turkish.
Changed Files: source/net/yacy/document/TextParser.java, test/java/net/yacy/document/TextParserTest.java
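Illustration only: the Turkish locale issue comes from String.toLowerCase() using the default locale, where the upper case "I" maps to a dotless i (U+0131). The sketch below shows the general fix with Locale.ROOT; the class and method names are hypothetical, not those of TextParser.

```java
import java.util.Locale;

/** Hypothetical sketch, not YaCy code: locale independent lower casing. */
final class LocaleIndependentNormalization {

    /** Lower cases with Locale.ROOT so the result does not depend on the default locale. */
    static String normalizeMimeType(String mimeType) {
        return mimeType.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // With Turkish rules each 'I' becomes a dotless i, so the result is not "image/gif"
        System.out.println("image/gif".equals("IMAGE/GIF".toLowerCase(turkish)));
        // With Locale.ROOT the expected ASCII form is produced
        System.out.println("image/gif".equals(normalizeMimeType("IMAGE/GIF")));
    }
}
```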
Sun Jun 25 20:05:37 CEST 2017
by reger
upd to jwat-warc-1.0.6.jar
Changed Files: .classpath, build.xml, lib/jwat-archive-common-1.0.6.jar, lib/jwat-common-1.0.6.jar, lib/jwat-gzip-1.0.6.jar, lib/jwat-warc-1.0.6.jar, pom.xml
Sat Jun 24 23:15:25 CEST 2017
by reger
remove unused Solr optional extra handler lib solr-dataimporthandler-6.6.0.jar
Changed Files: .classpath, build.xml
Sat Jun 24 22:54:43 CEST 2017
by reger
upd to jsoup-1.10.3.jar
Changed Files: .classpath, build.xml, lib/jsoup-1.10.3.jar, pom.xml
Fri Jun 23 02:23:49 CEST 2017
by Ryszard Goń
Wrong password was removed after the SSL certificate import

Removing the keystore password will prevent ssl from working after the next restart. The certificate password should be removed instead.
Fixes http://mantis.tokeek.de/view.php?id=687
Changed Files: source/net/yacy/http/Jetty9HttpServerImpl.java
Thu Jun 22 10:50:34 CEST 2017
by luccioman
Improved character encoding detection from Content-Type header

Also updated some related JavaDocs
Changed Files: source/net/yacy/cora/protocol/HeaderFramework.java, test/java/net/yacy/cora/protocol/HeaderFrameworkTest.java
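Illustration only: a minimal sketch of extracting the charset parameter from a Content-Type header value, with a fallback when the parameter is missing or names an unknown charset. The helper is hypothetical and is not the HeaderFramework implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical sketch, not YaCy code: reads the charset parameter of a Content-Type value. */
final class ContentTypeCharset {

    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type, the following parts are parameters
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be a quoted string
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: both exceptions extend IllegalArgumentException
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```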
Wed Jun 21 09:14:50 CEST 2017
by luccioman
Added a basic JUnit test with test gz files for the gzip parser
Changed Files: test/java/net/yacy/document/parser/gzipParserTest.java, test/parsertest/umlaute_html_utf8.html.gz, test/parsertest/umlaute_linux.txt.gz
Wed Jun 21 09:11:17 CEST 2017
by luccioman
Properly close test files in htmlParser unit test
Changed Files: test/java/net/yacy/document/parser/htmlParserTest.java
Mon Jun 19 17:02:11 CEST 2017
by luccioman
Prevent integer overflow in table statistics and use strong typing
Changed Files: htroot/PerformanceMemory_p.java, source/net/yacy/kelondro/table/Table.java
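Illustration only: the kind of overflow this entry refers to typically appears when two int values are multiplied before being widened to long. The sketch below is a generic example with hypothetical names; it does not reproduce the statistics computed in Table.java.

```java
/** Hypothetical sketch, not YaCy code: int overflow in a size estimate. */
final class TableSizeEstimate {

    /** Wrong: the product is computed in 32-bit arithmetic, then widened. */
    static long memoryBytesOverflowing(int rowCount, int rowSize) {
        return rowCount * rowSize;
    }

    /** Right: one operand is widened first, so the product is computed in 64 bits. */
    static long memoryBytes(int rowCount, int rowSize) {
        return (long) rowCount * rowSize;
    }

    public static void main(String[] args) {
        int rows = 50_000_000;
        int bytesPerRow = 100;
        System.out.println("overflowing: " + memoryBytesOverflowing(rows, bytesPerRow));
        System.out.println("correct:     " + memoryBytes(rows, bytesPerRow));
    }
}
```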
Sat Jun 17 09:33:14 CEST 2017
by luccioman
Limit the number of initially previewed links in crawl start pages.

This prevents rendering a big and inconvenient scrollbar on resources
containing many links.
If really needed, preview of all links is still available with a "Show
all links" button.

Doesn't affect the number of links used once the crawl is effectively
started, as the list is then loaded again server-side.
Changed Files: htroot/CrawlStartExpert.html, htroot/CrawlStartSite.html, htroot/api/getpageinfo_p.java, htroot/api/getpageinfo_p.xml, htroot/js/IndexCreate.js
Sat Jun 17 09:26:37 CEST 2017
by luccioman
Improved stream-oriented parsing entering conditions.
Changed Files: source/net/yacy/document/TextParser.java
Fri Jun 16 08:50:57 CEST 2017
by luccioman
Limit scope of some local JavaScript variables.
Changed Files: htroot/js/IndexCreate.js
Fri Jun 16 08:44:40 CEST 2017
by Michael Peter Christen
added json(p) endpoint for crawl start
Changed Files: htroot/Crawler_p.java, htroot/Crawler_p.json
Fri Jun 16 06:31:45 CEST 2017
by reger
make nsis build script require java 8
Changed Files: build.nsi
Fri Jun 16 02:17:49 CEST 2017
by reger
update nsi installer java autodl bundleid to use jre-8u131
Changed Files: build.nsi
Fri Jun 16 00:12:09 CEST 2017
by reger
remove reference to velocityresponsewriter in solrconfig.xml 
it is no longer part of solr-core api
Changed Files: defaults/solr/solrconfig.xml
Thu Jun 15 21:02:18 CEST 2017
by reger
remove sample path setting in solrconfig.xml not valid in Yacy
resulting in startup stop exception after fresh switch to 1.921
Changed Files: defaults/solr/solrconfig.xml
Thu Jun 15 20:24:53 CEST 2017
by reger
update maven pom setting to YaCy version 1.921 
java 1.8 and solr 6.6
Changed Files: pom.xml
Thu Jun 15 14:13:46 CEST 2017
by luccioman
Prevent high CPU load at startup, caused by the Solr suggester build.

Reported by Collision on mantis 758 (
http://mantis.tokeek.de/view.php?id=758 ).
Introduced by the new YaCy Solr configuration for Solr 6.6.0 (see commit
6fe735945da97abcbb91ac545fb11cff9d48effc), including now Suggester
Changed Files: defaults/solr/solrconfig.xml
Thu Jun 15 09:50:02 CEST 2017
by luccioman
Added HT Cache basic statistics (hit rate)
Changed Files: htroot/ConfigHTCache_p.html, htroot/ConfigHTCache_p.java, source/net/yacy/crawler/data/Cache.java, test/java/net/yacy/crawler/data/CacheTest.java
Thu Jun 15 09:48:22 CEST 2017
by luccioman
Use volatile to ensure concurrent threads use up to date property value
Changed Files: source/net/yacy/kelondro/blob/Compressor.java
Wed Jun 14 19:02:08 CEST 2017
by luccioman
Made Cache compression level and lock timeout user configurable
Changed Files: defaults/yacy.init, htroot/ConfigHTCache_p.html, htroot/ConfigHTCache_p.java, source/net/yacy/crawler/data/Cache.java, source/net/yacy/kelondro/blob/Compressor.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, test/java/net/yacy/crawler/data/CacheTest.java
Wed Jun 14 08:56:11 CEST 2017
by luccioman
Prevent log pollution from unwanted Solr warnings.

Many non-blocking "java.nio.file.NoSuchFileException" traces with
warning log level can be logged by Solr, especially when heavily
crawling. This issue is known from Solr 5.x but still unresolved with
Solr 6.x ( https://issues.apache.org/jira/browse/SOLR-9120 )

Consequently upgraded to "SEVERE" the default log level of the related
internal Solr class.

See also mantis 727 ( http://mantis.tokeek.de/view.php?id=727 )
Changed Files: defaults/yacy.logging
Fri Jun 09 12:50:36 CEST 2017
by Michael Peter Christen
re-added solr synchronization hack
Changed Files: source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java
Thu Jun 08 07:36:11 CEST 2017
by luccioman
Ensure system resource release by closing document stream.
Changed Files: source/net/yacy/document/TextParser.java
Tue Jun 06 10:30:02 CEST 2017
by luccioman
Removed unnecessary finalize implementation.

On such private classes with limited scope but with frequent instance
creations and removals within the application lifecycle, implementing
the finalize method is particularly unwanted as it decreases the garbage
collector performance.
What's more the Object.finalize() method is now deprecated in the JDK 9
and will eventually disappear from future releases (see
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java
Sun Jun 04 01:50:40 CEST 2017
by reger
Tokenize result entry keywords and add some styling for display
Changed Files: htroot/env/base.css, htroot/yacysearchitem.html, htroot/yacysearchitem.java
Sat Jun 03 21:58:04 CEST 2017
by reger
upd to commons-compress-1.14.jar
Changed Files: .classpath, build.xml, lib/commons-compress-1.14.License, lib/commons-compress-1.14.jar, pom.xml
Fri Jun 02 09:47:45 CEST 2017
by luccioman
Ensure closing ChunkIterator stream in every possible use case.

Also trace in logs the eventual close failures instead of failing
This should help prevent holding too many unreleased system file
handlers, as in the case reported by eros on YaCy forum
Changed Files: source/net/yacy/kelondro/table/ChunkIterator.java, source/net/yacy/kelondro/table/Table.java
Fri Jun 02 01:46:06 CEST 2017
by luccioman
Improved consistency between loader openInputStream and load functions
Changed Files: source/net/yacy/crawler/retrieval/FTPLoader.java, source/net/yacy/crawler/retrieval/FileLoader.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/crawler/retrieval/SMBLoader.java, source/net/yacy/crawler/retrieval/StreamResponse.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/visualization/ImageViewer.java
Tue May 30 17:38:16 CEST 2017
by luccioman
Added JavaDoc to the getpageinfo_p API servlet.
Changed Files: htroot/api/getpageinfo_p.java
Tue May 30 09:29:28 CEST 2017
by luccioman
Deprecated duplicated and internally unused getpageinfo servlet.

Redirections set for the transition of any eventual external uses:
 - /api/getpageinfo.xml to /api/getpageinfo_p.xml
 - /api/getpageinfo.json to /api/getpageinfo_p.json
Changed Files: htroot/api/getpageinfo.java, htroot/api/getpageinfo_p.json
Mon May 29 19:16:09 CEST 2017
by luccioman
Fixed a NullPointerException case on Digest authentication.

Could occur when upgrading from a Debian package configured with Basic
authentication (as in release 1.92.9000) to a more recent one with
Digest authentication, without having re-encoded the admin password (for
example with dpkg-reconfigure).

As reported by eros on YaCy forum
Changed Files: source/net/yacy/http/YaCyLegacyCredential.java
Wed May 24 22:13:42 CEST 2017
by reger
upd to pdfbox-2.0.6.jar
Changed Files: .classpath, build.xml, lib/fontbox-2.0.6.License, lib/fontbox-2.0.6.jar, lib/pdfbox-2.0.6.License, lib/pdfbox-2.0.6.jar, pom.xml
Wed May 24 08:43:03 CEST 2017
by luccioman
Quoted param value in Solr query to avoid unwanted traces in logs

When Webgraph Solr core is enabled, crawling and removing from index an
URL whose hash starts with the '-' character (example URL :
https://cs.wikipedia.org/ whose hash is "-2-HuTEndn4x") produced a full
ParseException stack trace in YaCy logs. This was not blocking because
the Solr query parser is able to escape the query itself and run it
successfully, but it needlessly filled YaCy logs.
Changed Files: source/net/yacy/search/index/Fulltext.java
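Illustration only: in the Lucene/Solr standard query syntax a leading '-' is the prohibit operator, which is why a raw value such as -2-HuTEndn4x is ambiguous. Wrapping the value in double quotes makes it a literal phrase. The helper below is a hypothetical sketch; the field name "id" is an assumption and not necessarily the field used in Fulltext.java.

```java
/** Hypothetical sketch, not YaCy code: quotes a value for use in a Solr field query. */
final class SolrQueryQuoting {

    /** Builds field:"value", escaping backslashes and double quotes inside the value. */
    static String quotedTerm(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // "id" is a placeholder field name
        System.out.println(quotedTerm("id", "-2-HuTEndn4x"));
    }
}
```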
Tue May 23 07:25:40 CEST 2017
by luccioman
Restored search page default behavior for Tab, Page Up and Down keys

Replaced by shortcuts defined by the HTML "accesskey" attribute which
has the advantage to be advertised by screen readers when focusing the
corresponding buttons, contrary to custom JavaScript key handlers.
Now with Firefox :
 - "Alt + Shift + n" for next page
 - "Alt + Shift + p" for previous page

Following ARIA recommendation : "keyboard shortcuts enhance, not
replace, standard keyboard access." ( see

Fix for mantis 711 (http://mantis.tokeek.de/view.php?id=711)
Changed Files: htroot/js/yacysearch.js, htroot/yacysearch.html
Mon May 22 01:56:11 CEST 2017
by reger
Set request originator to own peer in warc importer
in addition to change in https://github.com/yacy/yacy_search_server/commit/039162fbf0eca808afd350d360c3bcfe62dc4195
Changed Files: source/net/yacy/document/importer/WarcImporter.java
Mon May 22 01:34:08 CEST 2017
by reger
Change warc importer to use defaultsurrogate-crawl profile, as reported
by LA_FORGE http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5990 and
analysed by @luccioman (see comment https://github.com/yacy/yacy_search_server/commit/510f11d3745e14841420781376b733fd248d51f3)
it creates a conflict when using another crawl profile without setting originator.
Changed Files: source/net/yacy/document/importer/WarcImporter.java
Thu May 18 00:28:00 CEST 2017
by Michael Peter Christen
added a cache to prevent too many seed enumerations
Changed Files: source/net/yacy/peers/Seed.java, source/net/yacy/peers/SeedDB.java
Wed May 17 09:00:29 CEST 2017
by luccioman
Enable p2p and cluster communication when "Protection of all pages" on

As reported by paul89 on YaCy forum
(http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5958 ), when setting
the "Protection of all pages" to "On" in the "ConfigAccounts_p.html"
page, the peer became completely unreachable by others, which is not the
purpose of this feature.
But the restriction still makes sense as a security enforcement and is
maintained in private "Robinson mode" where by the way any peer-to-peer
or cluster communication would be rejected.
Changed Files: source/net/yacy/http/Jetty9YaCySecurityHandler.java
Tue May 16 09:44:13 CEST 2017
by luccioman
Added missing accessibility attributes on search results progress bar.
Changed Files: htroot/js/yacysearch.js, htroot/yacysearch.html
Mon May 15 13:31:24 CEST 2017
by luccioman
Annotated search result information separators for screen readers.
Changed Files: htroot/ConfigSearchPage_p.html, htroot/yacysearchitem.html
Sat May 13 20:38:25 CEST 2017
by sgaebel
added closing of lst-Tag in solr-Export
Changed Files: source/net/yacy/search/index/Fulltext.java
Thu May 11 08:33:19 CEST 2017
by luccioman
Added some JavaDoc
Changed Files: source/net/yacy/peers/RemoteSearch.java
Tue May 09 22:52:54 CEST 2017
by reger
Adjust mergeDocuments to keep youngest last-modified date of document
Changed Files: source/net/yacy/document/Document.java, test/java/net/yacy/document/DocumentTest.java
Tue May 09 18:32:47 CEST 2017
by luccioman
Fixed StringIndexOutOfBoundsException case.

Revealed by commit c77e43a : the exception was then thrown when indexing
pages containing mailto: scheme URL links with the Solr Webgraph core
Fixed the error case and restored filtering on mailto links in
Document.resortLinks() as these URLs still should not appear in
Changed Files: source/net/yacy/document/Document.java, source/net/yacy/search/schema/WebgraphConfiguration.java
Tue May 09 12:20:41 CEST 2017
by luccioman
Updated Debian package post install script admin password encoding.

To fit the now default HTTP authentication method set to Digest in
commit f7fce1b.
Also fixed unauthenticated access from localhost setting when first
installing the Debian package and letting the prompted password field
Changed Files: debian/postinst
Thu May 04 16:36:45 CEST 2017
by luccioman
Improved new blacklist entries URL scheme detection.
Changed Files: source/net/yacy/repository/BlacklistHelper.java, test/java/net/yacy/repository/BlacklistHelperTest.java
Thu May 04 11:21:27 CEST 2017
by luccioman
Updated putHTML() JavaDoc
Changed Files: source/net/yacy/server/serverObjects.java
Thu May 04 11:19:59 CEST 2017
by luccioman
Handle '?' and '+' chars as valid wild cards when adding to blacklist.

An entry such as "domain.com/[a-z]+" is a valid regular expression and
does not need additional ".*.*/.*" wildcards.
Changed Files: source/net/yacy/repository/BlacklistHelper.java
Thu May 04 11:12:58 CEST 2017
by luccioman
Fixed blacklist Regex containing '+' characters rendering.

As reported on YaCy forum by shni
(http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5970) when a
blacklist entry contained both '?' and '+' characters, the '+' chars
were wrongly decoded and rendered as spaces.
Changed Files: htroot/Blacklist_p.java
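Illustration only: the general pitfall behind this kind of bug is that form decoding turns a literal '+' into a space, so the character must be percent-encoded before it goes through a URL or form round trip. The sketch uses the JDK's URLEncoder and URLDecoder; the exact code path fixed in Blacklist_p.java is not described here, and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical sketch, not YaCy code: '+' handling in form encoding. */
final class PlusSignRoundTrip {

    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String entry = "example.com/[a-z]+?";
        // Decoding the raw entry loses the '+': it is read as an encoded space
        System.out.println(decode(entry));
        // Encoding first preserves it through the round trip
        System.out.println(decode(encode(entry)));
    }
}
```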
Wed May 03 18:53:01 CEST 2017
by luccioman
Added MediaWiki dump import scheduling feature.

Checking the last modified date by default to prevent unnecessary long
running operations.
Changed Files: htroot/IndexImportMediawiki_p.html, htroot/IndexImportMediawiki_p.java, source/net/yacy/data/WorkTables.java
Tue May 02 09:38:45 CEST 2017
by luccioman
Improved MediaWiki dump import monitoring.

When import thread is terminated :
 - now stop refreshing and stay on the monitoring page to give user a
feedback after a long running import
 - added link to the next monitoring step : results from surrogates
 - added link to new import
On the new import page, added a link to the last import report, if any.
Changed Files: htroot/IndexImportMediawiki_p.html, htroot/IndexImportMediawiki_p.java
Tue May 02 09:33:11 CEST 2017
by luccioman
Added some JavaDoc
Changed Files: source/net/yacy/document/importer/Importer.java
Tue May 02 09:32:04 CEST 2017
by luccioman
Fixed regression introduced by commit 9ad4d16

On MediaWiki dump imports, the SurrogateReader was trying to unread too
many bytes, then failing with the following exception :
"java.io.IOException: Push back buffer is full".
Changed Files: source/net/yacy/document/content/SurrogateReader.java
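Illustration only: "Push back buffer is full" is the message raised by the JDK's java.io.PushbackInputStream when more bytes are unread than its buffer can hold. The sketch below reproduces that condition in isolation; the class name and sizes are hypothetical and unrelated to the actual SurrogateReader values.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

/** Hypothetical sketch, not YaCy code: the push back buffer size bounds what can be unread. */
final class PushbackLimitDemo {

    /**
     * Reads count bytes then tries to unread them all.
     *
     * @param bufferSize push back buffer size, must be greater than zero
     * @return true when the bytes fit in the push back buffer, false on IOException
     */
    static boolean canUnread(int bufferSize, int count) {
        byte[] data = new byte[count];
        try (PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), bufferSize)) {
            byte[] read = new byte[count];
            int n = in.read(read, 0, count);
            if (n > 0) {
                in.unread(read, 0, n);
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("unread 4 bytes with a 4 byte buffer: " + canUnread(4, 4));
        System.out.println("unread 5 bytes with a 4 byte buffer: " + canUnread(4, 5));
    }
}
```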
Mon May 01 11:38:02 CEST 2017
by Michael Peter Christen
added patch to rewrite altered yacy grid schema into yacy schema

This generates the stub and protocol parts of an url for inboundlinks,
outboundlinks and images
Changed Files: source/net/yacy/search/Switchboard.java
Sun Apr 30 23:53:52 CEST 2017
by reger
Add a responseHeader to the solr index export with a format identifier
and export parameter (in accordance with response xml format) for easier
format detection on import.
Changed Files: source/net/yacy/document/content/DCEntry.java, source/net/yacy/document/content/SurrogateReader.java, source/net/yacy/search/index/Fulltext.java
Fri Apr 28 11:39:51 CEST 2017
by luccioman
Fixed Index Export feature for compatibility with old indexed documents.

This is a fix for mantis 682 (http://mantis.tokeek.de/view.php?id=682)
and issue #116
Changed Files: source/net/yacy/search/index/Fulltext.java
Fri Apr 28 11:36:48 CEST 2017
by luccioman
Added some JavaDoc
Changed Files: source/net/yacy/cora/federate/solr/SchemaDeclaration.java
Thu Apr 27 18:24:54 CEST 2017
by luccioman
Crawl results page : apply table lines number limit.

Take into account the already existing default limit value (especially
useful after a long crawl or surrogates import), or a custom one from
parameter "count".
Added a "Show all" link for convenience.
Changed Files: htroot/CrawlResults.html, htroot/CrawlResults.java
Thu Apr 27 09:50:04 CEST 2017
by luccioman
Extended WikiCode template inclusion syntax support.

Wiki templates are not rendered but syntax support is improved, which
greatly enhances snippets rendering on search results coming from a
MediaWiki dump import.
Tested on various dumps from Wikimedia at
See also Wikipedia transclusion documentation at
Changed Files: source/net/yacy/data/wiki/WikiCode.java, test/java/net/yacy/data/wiki/WikiCodeTest.java
Tue Apr 25 08:44:02 CEST 2017
by Michael Peter Christen
added yacy grid flatjson surrogate parser
Changed Files: source/net/yacy/search/Switchboard.java, source/net/yacy/search/schema/CollectionSchema.java
Mon Apr 24 18:24:26 CEST 2017
by luccioman
Fixed surrogates import monitoring page (/CrawlResults.html?process=7)

This page was always empty, as described in mantis 740
Changed Files: source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/search/Switchboard.java
Sat Apr 22 23:32:40 CEST 2017
by reger
upd to jwat-1.0.5
Changed Files: .classpath, build.xml, lib/jwat-archive-common-1.0.5.jar, lib/jwat-common-1.0.5.jar, lib/jwat-gzip-1.0.5.jar, lib/jwat-warc-1.0.5.jar, pom.xml
Thu Apr 20 00:47:52 CEST 2017
by reger
fix unit test MultiProtocolURL(file) assertion for Windows path with
drive letter.
Changed Files: test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
Thu Apr 20 00:18:18 CEST 2017
by reger
Take out mailto collect in internal parsed document
As earlier plans to make use of mailto as separate webgraph entity didn't
materialize (see  http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5726&p=32493&hilit=mailto#p32493)
free the unused handling and resources.
Changed Files: htroot/ViewFile.java, source/net/yacy/document/Document.java
Sun Apr 16 04:25:29 CEST 2017
by reger
Add url input field as source for WarcImporter
allowing to import warc from url without prior download.
Changed Files: htroot/IndexImportWarc_p.html, htroot/IndexImportWarc_p.java, source/net/yacy/document/importer/WarcImporter.java
Fri Apr 14 14:23:50 CEST 2017
by luccioman
Improved http client close time on stream processing errors.
Changed Files: source/net/yacy/cora/protocol/http/HTTPClient.java
Wed Apr 12 17:17:03 CEST 2017
by luccioman
Fixed endless loop case in wikicode processing.

Detected when importing recent MediaWiki dumps containing some pages
with script content in plain text format (see Scribunto extension
https://www.mediawiki.org/wiki/Extension:Scribunto ).

Further improvement : modify the MediawikiImporter to prevent processing
revisions whose <model> is not wikitext.
Changed Files: source/net/yacy/data/wiki/WikiCode.java, test/java/net/yacy/data/wiki/WikiCodeTest.java
Wed Apr 12 09:23:10 CEST 2017
by luccioman
Improved support for non ASCII chars in local file system URLs

Creating a MultiProtocolURL instance from a File object and then
retrieving a File with getFSFile() was inconsistent with file paths
containing space or non ASCII chars. 
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
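Illustration only: the expected property is that converting a File to a URL and back yields the same file, even with spaces or non ASCII characters in the path. The sketch shows that round trip with the JDK's own File and URI classes; it does not use MultiProtocolURL or getFSFile(), and the class name is hypothetical.

```java
import java.io.File;
import java.net.URI;

/** Hypothetical sketch, not YaCy code: File to URI and back with the JDK only. */
final class FileUrlRoundTrip {

    /** Converts the file to a file: URI (spaces become %20) and back to a File. */
    static File roundTrip(File file) {
        URI uri = file.toURI();
        return new File(uri);
    }

    public static void main(String[] args) {
        // \u00e9 is a non ASCII letter (e with acute accent)
        File original = new File("dir with space", "caf\u00e9.txt").getAbsoluteFile();
        System.out.println(original.toURI());
        System.out.println(roundTrip(original).equals(original));
    }
}
```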
Tue Apr 11 08:21:34 CEST 2017
by luccioman
Improved error reports on various wiki dump prerequisites failure cases.

Also added some JavaDoc.
Changed Files: htroot/IndexImportMediawiki_p.html, htroot/IndexImportMediawiki_p.java
Tue Apr 11 07:34:17 CEST 2017
by luccioman
Used a text input for wiki dump import file selection.

Using an HTML "file" input was confusing (as reported by promocore on
YaCy forum : http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5965) ,
and it only worked with MS IE/Edge on a local YaCy peer :
 - for security reasons some current major browsers such as Firefox or
Chrome do not allow to send full file path information when using a file
form input
 - the local file system selection popup doesn't make sense when you
want to import a dump on a remote YaCy server
Changed Files: htroot/IndexImportMediawiki_p.html
Mon Apr 10 22:58:20 CEST 2017
by reger
Adjust ConfigSearchPage_p to activated hosts navigator as plugin
Changed Files: htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java
Mon Apr 10 22:42:06 CEST 2017
by reger
Activate hosts navigator plugin. This includes rwi results in the navigator
This might be tangentially related to http://mantis.tokeek.de/view.php?id=736
as the example includes a local index search, while rwi results are not
Changed Files: htroot/yacysearchtrailer.html, htroot/yacysearchtrailer.java, htroot/yacysearchtrailer.json, htroot/yacysearchtrailer.xml, source/net/yacy/search/navigator/NavigatorPlugins.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/SearchEvent.java
Sun Apr 09 21:42:05 CEST 2017
by reger
add missing text from ConfigRobotsTxt_p to master.lng
and link to Translation Editor to Translation News page.
Changed Files: htroot/TransNews_p.html, locales/master.lng.xlf
Sun Apr 09 02:09:32 CEST 2017
by reger
add servlet to list users in UserDB and make the user editor available in
a separate servlet, for a quick and easy overview of configured users and
selection for editing.
Changed Files: htroot/ConfigAccountList_p.html, htroot/ConfigAccountList_p.java, htroot/ConfigAccounts_p.html, htroot/ConfigAccounts_p.java, htroot/ConfigUser_p.html, htroot/ConfigUser_p.java
Sat Apr 08 22:54:57 CEST 2017
by reger
fix edit current user form to required post method
introduced with https://github.com/yacy/yacy_search_server/commit/cde237b68763c542da20038e5f62bea341ae1d37
Changed Files: htroot/ConfigAccounts_p.html, htroot/ConfigAccounts_p.java
Fri Apr 07 09:15:05 CEST 2017
by Michael Peter Christen
added flatjson parser (stub, unfinished)
Changed Files: source/net/yacy/search/Switchboard.java
Wed Apr 05 00:08:25 CEST 2017
by reger
Introduce a Keyword search navigator using the index field keywords.
The keywords field string is split into words as navigator entries.

A keyword navigator facet is essential for search appliance usage, where
documents and metadata often use specialized keyword vocabularies to
filter search results. This navi can be used without custom index schema.

As we have not defined a search query command to filter "keywords" yet,
the filtering is limited by adding the keyword to the search query.
Changed Files: source/net/yacy/search/navigator/NavigatorPlugins.java, source/net/yacy/search/navigator/TokenizedStringNavigator.java
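Illustration only: a minimal sketch of how a keywords field string can be split into words that become navigator entries with counts. The separator set (whitespace, comma, semicolon), the lower casing and the class name are assumptions, not the TokenizedStringNavigator implementation.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not YaCy code: counts keyword tokens over several documents. */
final class KeywordFacetSketch {

    /** Splits each keywords field into tokens and counts how often every token occurs. */
    static Map<String, Integer> countKeywords(Iterable<String> keywordFields) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String field : keywordFields) {
            if (field == null) {
                continue;
            }
            for (String token : field.split("[\\s,;]+")) {
                if (token.isEmpty()) {
                    continue; // a leading separator yields an empty first token
                }
                counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countKeywords(Arrays.asList("Solr, search", "search index")));
    }
}
```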
Mon Apr 03 22:53:07 CEST 2017
by reger
add CookieTest_p.html text to master.lng
Changed Files: locales/master.lng.xlf
Mon Apr 03 12:20:16 CEST 2017
by luccioman
Enforced access controls on a few more administration pages.

 - ensure use of HTTP POST method when performing server side effect
 - transaction token required to ensure the request has effectively been
requested by user interaction
Changed Files: htroot/ConfigPortal_p.html, htroot/ConfigPortal_p.java, htroot/Table_API_p.html, htroot/Table_API_p.java, htroot/Translator_p.html, htroot/Translator_p.java
Mon Apr 03 11:40:37 CEST 2017
by luccioman
Escaped potentially active HTML content from recorded API call comments.
Changed Files: htroot/Table_API_p.java
Sun Apr 02 22:30:23 CEST 2017
by reger
update master.lng with recent text changes 
to IndexExport_p.html, IndexImportWarc_p.html
Changed Files: locales/master.lng.xlf
Sun Apr 02 20:36:22 CEST 2017
by reger
use css error class for error msg in IndexImportOAIPMH_p.html,
adjust to xhtml <p> usage rule
Changed Files: htroot/IndexImportOAIPMH_p.html
Sun Apr 02 03:59:37 CEST 2017
by reger
remove test case for Standard_MemoryControl which will always fail
see https://github.com/yacy/yacy_search_server/pull/114
Changed Files:
Sun Apr 02 03:32:21 CEST 2017
by reger
Add servlet to import warc file from filesystem IndexImportWarc_p.html.
Apply Importer interface to WarcImporter
Changed Files: htroot/IndexImportWarc_p.html, htroot/IndexImportWarc_p.java, htroot/env/templates/submenuIndexImport.template, source/net/yacy/document/importer/WarcImporter.java, source/net/yacy/search/Switchboard.java
Sat Apr 01 01:04:17 CEST 2017
by Michael Peter Christen
added export to elasticsearch. The export dump can easily be imported to
elasticsearch using the command
curl -XPOST localhost:9200/collection1/yacy/_bulk --data-binary
Changed Files: htroot/IndexExport_p.html, htroot/IndexExport_p.java, source/net/yacy/cora/federate/solr/responsewriter/FlatJSONResponseWriter.java, source/net/yacy/search/index/Fulltext.java
Thu Mar 30 16:14:22 CEST 2017
by luccioman
URL Viewer : only display the link to metadata when metadata exists
Changed Files: htroot/ViewFile.html, htroot/ViewFile.java
Thu Mar 30 10:23:47 CEST 2017
by luccioman
Modified RWI settings page radio click event to use HTTP POST
Changed Files: htroot/IndexControlRWIs_p.html, locales/de.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Thu Mar 30 09:22:28 CEST 2017
by luccioman
Updated API calls recording/replay with recent changes.

 - enabled HTTP POST calls with Digest HTTP authentication
 - made API calls compatible with API newly restricted to HTTP POST only
with transaction token validation
 - ensured backward compatibility with older entries recorded as HTTP
Changed Files: htroot/CrawlStartScanner_p.java, source/net/yacy/data/WorkTables.java
Sun Mar 26 23:52:31 CEST 2017
by reger
fix default/httpd.mime Z file extension to lower case
+ test case
Changed Files: defaults/httpd.mime, test/java/net/yacy/cora/document/analysis/ClassificationTest.java
Sun Mar 26 23:26:40 CEST 2017
by reger
remove seedlist bootstrap target (not working for quite some time)
Changed Files: defaults/yacy.network.freeworld.unit
Sun Mar 26 23:13:12 CEST 2017
by reger
Add label text for search word statistic (AccessTracker_p.html) to master
lng file
Changed Files: locales/master.lng.xlf
Sun Mar 26 20:05:48 CEST 2017
by reger
One more use of SwitchboardConstants.SERVER_PORT constant,
apply standard servlet design pattern initialization of solrselectservlet 
Changed Files: source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java
Sun Mar 26 11:29:04 CEST 2017
by luccioman
Extended Apache HTTP Digest Auth. for use of YaCy encoded password

When programmatically requesting the local peer with Apache http client,
authentication credentials must be passed as clear-text values. 
This extension to the apache org.apache.http.impl.auth.DigestScheme
permits use of the YaCy encoded password stored in the
adminAccountBase64MD5 configuration property.
Changed Files: source/net/yacy/cora/protocol/http/HTTPClient.java, source/net/yacy/cora/protocol/http/auth/HttpEntityDigester.java, source/net/yacy/cora/protocol/http/auth/YaCyDigestScheme.java, source/net/yacy/cora/protocol/http/auth/YaCyDigestSchemeFactory.java
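Illustration only: HTTP Digest authentication (RFC 2617) never needs the clear-text password once HA1 = MD5(user:realm:password) is known, which is what makes authenticating from a stored hash possible. The sketch below computes a Digest response from a precomputed HA1 with qop=auth. It assumes the stored value can be mapped to HA1; the exact format of adminAccountBase64MD5 is not described here, and the class is hypothetical, not YaCyDigestScheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch, not YaCy code: RFC 2617 Digest response from a precomputed HA1. */
final class DigestFromHa1 {

    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.ISO_8859_1));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required by the Java platform", e);
        }
    }

    /** HA1 as defined by RFC 2617: the only place where the clear-text password is needed. */
    static String ha1(String user, String realm, String password) {
        return md5Hex(user + ":" + realm + ":" + password);
    }

    /** Response for qop=auth, computed from HA1 without access to the password. */
    static String response(String ha1, String method, String uri,
                           String nonce, String nc, String cnonce, String qop) {
        String ha2 = md5Hex(method + ":" + uri);
        return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":" + qop + ":" + ha2);
    }

    public static void main(String[] args) {
        // Values from the example in RFC 2617, section 3.5
        String storedHa1 = ha1("Mufasa", "testrealm@host.com", "Circle Of Life");
        System.out.println(response(storedHa1, "GET", "/dir/index.html",
                "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b", "auth"));
    }
}
```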
Sun Mar 26 10:59:04 CEST 2017
by luccioman
Updated dump/restore shell scripts : the API is now IndexExport_p.html
Changed Files: bin/indexdump.sh, bin/indexrestore.sh
Tue Mar 21 01:16:16 CET 2017
by reger
Update master lng file with added text in Settings_ServerAccess
remove outdated file entry in fr.lng & sk.lng
Changed Files: README.md, locales/fr.lng, locales/master.lng.xlf, locales/sk.lng
Tue Oct 25 05:06:42 CEST 2016
by Karl-Philipp Richter
adjusted .travis.yml to build in libbuild first (see http://mantis.tokeek.de/view.php?id=545); added test of build instructions
Changed Files: .travis.yml
Mon Mar 20 02:33:21 CET 2017
by reger
Add hint how to build with maven (for the first time) to readme
Changed Files: README.md
Sun Mar 19 21:45:33 CET 2017
by reger
Add hint text to default ServerAccess Port Settings page
Changed Files: htroot/Settings_ServerAccess.inc
Sun Mar 19 07:12:35 CET 2017
by reger
Display the local search word statistic in alphabetic order
Changed Files: htroot/AccessTracker_p.java, source/net/yacy/cora/sorting/OrderedScoreMap.java
Sat Mar 18 20:32:53 CET 2017
by reger
upd to slf4j-1.7.24.jar
Changed Files: .classpath, build.xml, lib/jcl-over-slf4j-1.7.24.jar, lib/log4j-over-slf4j-1.7.24.jar, lib/slf4j-api-1.7.24.jar, lib/slf4j-jdk14-1.7.24.jar, pom.xml
Sat Mar 18 20:06:58 CET 2017
by reger
upd to icu4j-58_2.jar
Changed Files: .classpath, build.xml, lib/icu4j-58_2.jar, pom.xml
Fri Mar 17 02:19:33 CET 2017
by reger
update to jsoup-1.10.2.jar
Changed Files: .classpath, build.xml, lib/jsoup-1.10.2.jar, pom.xml
Fri Mar 17 02:07:02 CET 2017
by reger
update to jsch-0.1.54.jar
Changed Files: .classpath, build.xml, lib/jsch-0.1.54.License, lib/jsch-0.1.54.jar, pom.xml
Wed Mar 15 22:36:53 CET 2017
by reger
update translation for ConfigNetwork_p.html
Changed Files: htroot/ConfigNetwork_p.html, locales/de.lng, locales/master.lng.xlf
Wed Mar 15 01:39:15 CET 2017
by reger
make digest default authentication in defaults/web.xml
Changed Files: defaults/web.xml
Mon Mar 13 03:08:44 CET 2017
by reger
remove duplicate occurrence of geo:lat in rss tokens
Changed Files: source/net/yacy/cora/document/feed/RSSMessage.java
Mon Mar 13 00:34:40 CET 2017
by reger
upd to metadata-extractor-2.10.1.jar
Changed Files: .classpath, build.xml, lib/metadata-extractor-2.10.1.License, lib/metadata-extractor-2.10.1.jar, pom.xml
Sun Mar 12 01:54:56 CET 2017
by reger
implement RequestHeader getRequestURI, getRequestURL for legacy request
Changed Files: source/net/yacy/cora/protocol/RequestHeader.java
Thu Mar 09 22:57:51 CET 2017
by reger
remove unused import pdfParser
Changed Files: source/net/yacy/document/parser/pdfParser.java
Thu Mar 09 22:56:33 CET 2017
by reger
Improve pdf text extraction resource handling.
For short pdf <= 3 pages use already extracted content,
only for long pdf > 3 pages reassign content and close internal writer (to free buffers directly)
Changed Files: source/net/yacy/document/parser/pdfParser.java
Thu Mar 09 22:50:19 CET 2017
by reger
upd to pdfbox-2.0.4.jar
Changed Files: .classpath, build.xml, lib/fontbox-2.0.4.License, lib/fontbox-2.0.4.jar, lib/pdfbox-2.0.4.License, lib/pdfbox-2.0.4.jar, pom.xml
Thu Mar 09 01:42:36 CET 2017
by reger
eliminate some compiler unchecked and deprecation warnings
in nav plugins by explicit type declaration and replacing date.getYear
with Calendar.get
Changed Files: source/net/yacy/search/navigator/NavigatorPlugins.java, source/net/yacy/search/navigator/YearNavigator.java
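The date.getYear replacement mentioned in this entry can be illustrated with a minimal sketch. The class and method names below (YearOfDate, yearOf) are hypothetical and not taken from YearNavigator.java; only the standard java.util API is used.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper, not the actual YaCy code.
final class YearOfDate {

    /** Returns the full Gregorian year of the given date in the given time zone. */
    static int yearOf(final Date date, final TimeZone zone) {
        final Calendar cal = new GregorianCalendar(zone);
        cal.setTime(date);
        // Calendar.YEAR is the full year (e.g. 2017), whereas the
        // deprecated Date.getYear() returned the year minus 1900.
        return cal.get(Calendar.YEAR);
    }
}
```

Passing the time zone explicitly keeps the result independent of the JVM default zone.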
Wed Mar 08 22:35:48 CET 2017
by reger
upd to httpclient v4.5.3
Changed Files: .classpath, build.xml, lib/httpclient-4.5.3.jar, lib/httpcore-4.4.6.License, lib/httpcore-4.4.6.jar, lib/httpmime-4.5.3.jar, pom.xml
Wed Mar 08 10:27:18 CET 2017
by luccioman
Fixed unresolved pattern case in search results progress bar.

This is a fix for mantis 715 (http://mantis.tokeek.de/view.php?id=715).

A possible path scenario that could lead to this case :
 - YaCy is running low in memory
 - a search is requested
 - before the end of search results rendering, the cleanup job runs and
deletes the running search event from the cache because of short memory
 - then yacysearchitem renders with "-UNRESOLVED_PATTERN-" parameter
values passed to the statistics() JavaScript function
Changed Files: htroot/yacysearchitem.html, htroot/yacysearchitem.java
Sun Mar 05 02:26:10 CET 2017
by reger
Extend DCEntry.getLanguage convert to ISO639-1 codes for more languages
by using icu.ULocale for languages not already covered (ICU normalizes 
to ISO639-1 2 char codes).
Add test class
Use DublinCore vocabulary declarations in DCEntry and SurrogateReader 
for easier usage debugging, 
Init SurrogateReader.inputSource on first use.

Changed Files: source/net/yacy/document/content/DCEntry.java, source/net/yacy/document/content/SurrogateReader.java, test/java/net/yacy/document/content/DCEntryTest.java
Sat Mar 04 22:45:17 CET 2017
by reger
further avoid to set connect info properties as header value
following comment "use of properties as header values is discouraged"
in case where (proxy)HTTPClient overwrites values with supplied url.
Use defined request.referer procedure in response class.
Changed Files: source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/http/servlets/UrlProxyServlet.java, source/net/yacy/http/servlets/YaCyProxyServlet.java, source/net/yacy/server/http/HTTPDProxyHandler.java
Sat Mar 04 19:41:31 CET 2017
by reger
use pre-defined "Connection" header key, replace deprecated
Changed Files: source/net/yacy/cora/federate/solr/instance/RemoteInstance.java, source/net/yacy/cora/protocol/http/HTTPClient.java
Fri Mar 03 12:05:30 CET 2017
by luccioman
Added an advanced settings page for referrer policy settings.

Feedback will be welcome, notably on the descriptive content of this
Changed Files: htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_Referrer.inc, htroot/Settings_p.html, htroot/Settings_p.java, source/net/yacy/http/ReferrerPolicy.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/search/SwitchboardConstants.java
Fri Mar 03 00:21:56 CET 2017
by reger
fix proxyservlet response url to respect http scheme if a relative 
Location header is returned.
Changed Files: source/net/yacy/http/servlets/UrlProxyServlet.java, source/net/yacy/http/servlets/YaCyProxyServlet.java
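As a rough illustration of the relative Location handling described in this entry, the sketch below resolves a Location header value against the requested URL with java.net.URI, so the scheme of the original request is preserved. The class and method names are hypothetical; the proxy servlets may implement this differently.

```java
import java.net.URI;

// Hypothetical helper, not the actual proxy servlet code.
final class LocationResolver {

    /**
     * Resolves a Location header value against the URL that was requested.
     * A relative value inherits scheme, host and port from the request URL;
     * an absolute value is returned unchanged.
     */
    static String resolveLocation(final String requestUrl, final String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }
}
```

For example, a relative "/login" returned for an https request resolves to an https URL on the same host rather than falling back to http.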
Wed Mar 01 09:43:00 CET 2017
by luccioman
Updated Archive-It heuristics URL.

The archive-it OpenSearch URL requested without restriction on
collections ("i" parameter) almost always ends up with timeout or fails.
Changed Files: defaults/heuristicopensearch.conf
Mon Feb 27 23:00:46 CET 2017
by reger
fixed ReindexSolrBusyThread new and unexpected repeat of same query with
low number of found documents - by adding additional end condition to 
remove processed query with number of found docs <= process-chunk-size.

Noticed on query h4_txt:[* TO *], found 21, process 21, call of commit happened
but on next cycle same query again 21 docs found (while h4_txt was removed 
from schema and committed inputdocuments).
Changed Files: source/net/yacy/search/index/ReindexSolrBusyThread.java
Mon Feb 27 01:04:31 CET 2017
by reger
fix delta time calculation in PerformanceSearch_p for the 1. entry
(INITIALIZATION displayed absolute date, set delta to 0 for 1. entry)
Changed Files: htroot/PerformanceSearch_p.java
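The delta rule from this entry (first entry shows 0, later entries show the time elapsed since the previous one) can be sketched as follows. The names are hypothetical and the timestamps are assumed to be epoch milliseconds; this is not the code of PerformanceSearch_p.java.

```java
// Hypothetical helper, not the actual servlet code.
final class DeltaTimes {

    /**
     * Computes the time difference between consecutive event timestamps.
     * The first entry has no predecessor, so its delta is 0.
     */
    static long[] deltas(final long[] timestamps) {
        final long[] result = new long[timestamps.length];
        for (int i = 1; i < timestamps.length; i++) {
            result[i] = timestamps[i] - timestamps[i - 1];
        }
        return result; // result[0] keeps its default value 0
    }
}
```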
Sun Feb 26 11:03:15 CET 2017
by luccioman
Fixed datacite.org heuristics base url.

The datacite Solr search http URL was returning http status 301 in order
to redirect to its https version, thus making that YaCy heuristic always
Changed Files: defaults/federatecfg/datacite.solr.schema
Sun Feb 26 02:39:52 CET 2017
by reger
Adjust DefaultServlet test case to recent change,
deprecate unused CONNECTION_PROP_PROTOCOL (also as it might be 
misleading with getProtocol vs getScheme)
Changed Files: source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/cora/protocol/RequestHeader.java, test/java/net/yacy/http/servlets/YaCyDefaultServletTest.java
Sat Feb 25 23:55:17 CET 2017
by reger
Fix call parameter for ConnectionInfo in MonitorHandler
(expected scheme e.g. http, was protocol version).
Deprecate obsolete custom X-...-Scheme header constant.
Use existing FORMAT_ANSIC Dateformatter in HeaderFramework.
Correct htmlParserTest (del one not intended println)
Changed Files: source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/cora/protocol/RequestHeader.java, source/net/yacy/http/MonitorHandler.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, test/java/net/yacy/document/parser/htmlParserTest.java
Fri Feb 24 11:09:42 CET 2017
by luccioman
Added a hint title for required fields in the Solr Schema editor
Changed Files: htroot/IndexSchema_p.html
Fri Feb 24 11:08:18 CET 2017
by luccioman
Switched a few more Solr fields from strictly mandatory to optional
Changed Files: defaults/solr.collection.schema, source/net/yacy/search/schema/CollectionSchema.java
Fri Feb 24 01:25:32 CET 2017
by reger
fix htmlParser <script> text extraction on code containing expression
recognized as tag like 1<a
reported in https://github.com/yacy/yacy_search_server/issues/109

Script content is ignored by default, but the text is filtered for html
tags. Modified scraper to skip tag filtering while within a <script> 
section (until a closing </script> tag is detected).
Possible side effect, missing </script> end-tag will truncate trailing 
content text.
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/html/TransformerWriter.java, test/java/net/yacy/document/parser/htmlParserTest.java
Thu Feb 23 11:09:43 CET 2017
by luccioman
Improved MultiProtocolURL non ASCII characters support.

After @sinkuu Pull Request #108 added JUnit tests, updated some JavaDoc
and also improved URL tokenization to support non ASCII characters.
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
Thu Feb 23 07:52:55 CET 2017
by luccioman
Merge pull request #110 from goofy-bz/patch-1

Fixing some typos
Changed Files: locales/fr.lng
Thu Feb 23 01:13:31 CET 2017
by goofy-bz
Fixing some typos

up to line #1000 only
Changed Files: locales/fr.lng
Thu Feb 23 00:27:56 CET 2017
by reger
Correct dublincore title property text to lowercase in htmlresponsewriter,
remove unused (carry over) local variable
Do the same for other responsewriter.
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/EnhancedXMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java
Wed Feb 22 02:01:48 CET 2017
by Burkhard
Update SearchEvent.java

Fix NPE on disabled local SolrIndex, occurring on search when moving to the 2nd result page.
The debug-only setting for disabling the local SolrIndex (System Admin -> Debug Settings) should probably be removed from production code in the long term.
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue Feb 21 22:59:11 CET 2017
by luccioman
Switched some Solr fields from mandatory to optional

These fields are default enabled but with no doubt not strictly
mandatory with the current code base.

As reported by @reger24, splitting between essential mandatory and
optional fields is still to be improved to reflect the current YaCy
Changed Files: defaults/solr.collection.schema, source/net/yacy/search/schema/CollectionSchema.java
Mon Feb 20 23:27:33 CET 2017
by reger
Add extract of queries.log in form of top search word cloud (last 7 days)
to AccessTracker_p.html (Network Access -> Local Search Log page).
It displays top 20 words of search queries.
Changed Files: htroot/AccessTracker_p.html, htroot/AccessTracker_p.java
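A word cloud such as the one described in this entry boils down to counting words across queries and keeping the most frequent ones. The sketch below shows that counting step only; the class name, the whitespace tokenization and the tie-breaking rule are assumptions, and the filtering of queries.log to the last 7 days is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not the actual AccessTracker_p code.
final class TopWords {

    /** Returns the most frequent words of the given queries, most frequent first. */
    static List<String> topWords(final List<String> queries, final int limit) {
        final Map<String, Integer> counts = new HashMap<>();
        for (final String query : queries) {
            for (final String word : query.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts.entrySet().stream()
                // highest count first, ties broken alphabetically for stable output
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Calling topWords(queries, 20) would give the top 20 words mentioned in the entry.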
Mon Feb 20 00:14:14 CET 2017
by reger
correct fromDate init value on missing param in api/timeline_p servlet
revert test modification from last commit in AccessTracker.main
Changed Files: htroot/api/timeline_p.java, source/net/yacy/search/query/AccessTracker.java
Sun Feb 19 05:23:17 CET 2017
by reger
add hint of query syntax in AccessTracker log (qs=normal querystring,
sq=solr-querystring) to allow to filter simple text queries for processing,
remove toString for counter parameter
use more predefined constants in solrservlet
Changed Files: source/net/yacy/http/servlets/GSAsearchServlet.java, source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/search/query/AccessTracker.java
Fri Feb 17 11:09:30 CET 2017
by luccioman
Fixed a NullPointerException case possible on Index Export

As reported by Palulukas in YaCy forum
the Index Export operation can fail, notably when the Solr index
contains one or more documents with empty (despite required)
"load_date_dt" field.

This fixes the export failure when the situation finally occurs, but
more should be done to harden verifications on minimum required fields.
Changed Files: source/net/yacy/search/index/Fulltext.java
Thu Feb 16 01:43:14 CET 2017
by reger
Reduce self generated content for text_t (visible text index field) 
to avoid repeat of tokenized url as description,
continuation of https://github.com/yacy/yacy_search_server/commit/7e09bff4a1a117d2f2336e004ec67ffb325a7e9d
Add some javadoc, and drop the unneeded removal of omitted fields in postprocessing.
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java
Wed Feb 15 23:26:54 CET 2017
by reger
removed faroo news from default opensearch config
As @luccioman informed, it's only useable with a free api key
Changed Files: defaults/heuristicopensearch.conf
Wed Feb 15 15:04:40 CET 2017
by luccioman
Added robots.txt support for heuristics federated search.

As noticed by @reger24, abusive use of OpenSearch systems should be
prevented, especially if allowing to parse and reuse HTML results.
robots.txt file is now checked before requesting an external OpenSearch
system to respect the host exclusions and any crawl-delay value.
The check is also performed when trying to add a new OpenSearch URL
template through the /ConfigHeuristics_p.html admin page.
Changed Files: htroot/ConfigHeuristics_p.java, source/net/yacy/cora/federate/FederateSearchManager.java
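To make the robots.txt check in this entry more concrete, here is a deliberately simplified sketch that reads the Disallow and Crawl-delay lines of the "User-agent: *" group. It is not YaCy's robots.txt parser: the class name is hypothetical, grouping of several consecutive User-agent lines and Allow rules are ignored, and matching is a plain path prefix test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified illustration only, not the parser used by FederateSearchManager.
final class RobotsSketch {
    final List<String> disallowed = new ArrayList<>();
    long crawlDelayMillis = 0L;

    /** Parses the rules that apply to all user agents ("User-agent: *"). */
    static RobotsSketch parse(final String robotsTxt) {
        final RobotsSketch rules = new RobotsSketch();
        boolean inWildcardGroup = false;
        for (final String rawLine : robotsTxt.split("\\r?\\n")) {
            final String line = rawLine.replaceFirst("#.*$", "").trim(); // strip comments
            final int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            final String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            final String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                inWildcardGroup = value.equals("*");
            } else if (inWildcardGroup && key.equals("disallow") && !value.isEmpty()) {
                rules.disallowed.add(value);
            } else if (inWildcardGroup && key.equals("crawl-delay")) {
                try {
                    rules.crawlDelayMillis = (long) (Double.parseDouble(value) * 1000);
                } catch (final NumberFormatException e) {
                    // malformed delay value: keep the default
                }
            }
        }
        return rules;
    }

    /** Returns false when the path starts with one of the disallowed prefixes. */
    boolean isAllowed(final String path) {
        for (final String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would skip the external OpenSearch request when isAllowed returns false, and otherwise wait at least crawlDelayMillis between two requests to the same host.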
Sat Feb 11 08:10:14 CET 2017
by sinkuu
Use java.net.URLDecoder
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java
Tue Feb 14 02:30:26 CET 2017
by reger
adjust translation to renamed configparser_p.html
Changed Files: locales/cn.lng, locales/de.lng, locales/hi.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Tue Feb 14 02:04:42 CET 2017
by reger
make ConfigParser a protected page, for consistent behavior of locked
menu items.
Changed Files: htroot/ConfigParser_p.html, htroot/ConfigParser_p.java, htroot/env/templates/submenuCrawler.template
Tue Feb 14 00:31:32 CET 2017
by reger
update opensearch conf - remove suche.sueddeutsche.de
apparently they've revoked the participation in opensearch initiative.
Changed Files: defaults/heuristicopensearch.conf
Fri Feb 10 09:40:42 CET 2017
by luccioman
Upgraded Apache Ant to 1.10.1 in the Docker alpine flavor image

For a more reliable Docker image build, also switched to the ant archive
repository to fetch the needed binary as other repositories only provide
the latest versions.
Changed Files: docker/Dockerfile.alpine
Thu Feb 09 16:42:21 CET 2017
by luccioman
Replaced absolute redirection locations by relative ones when possible.

This makes integration of YaCy behind a reverse proxy subfolder easier.
Changed Files: htroot/Blacklist_p.java, htroot/Status.java, htroot/Wiki.java, source/net/yacy/repository/BlacklistHelper.java
Mon Feb 06 12:41:24 CET 2017
by luccioman
Improved termination of timed out remote solr requests to peers.

On timeout, closing remote Solr requests is more appropriate than simply using
Thread.interrupt(), which is not effective in most cases. Closing does not
request a commit on the remote Solr, but releases HTTP connection resources and is
more likely to end those threads, which could otherwise wait indefinitely.

Other related improvements included :
 - no more marking remote peer as not available when remote search is
interrupted before timeout by the cleanup job.
 - added a short fine log level trace of failing remote solr requests
Changed Files: source/net/yacy/peers/Protocol.java
Fri Feb 03 10:32:31 CET 2017
by luccioman
Removed deprecated "localMissCount" prop from yacysearchlatestinfo.json.

This property has been deprecated four years ago by commit
d74472f5625ff097e7541e1a56156cbe487b2651. For any active search event
id, it was then always filled with "-UNRESOLVED_PATTERN-".
Changed Files: htroot/yacysearchlatestinfo.java, htroot/yacysearchlatestinfo.json
Fri Feb 03 09:55:08 CET 2017
by luccioman
Named a Thread without name for easier monitoring
Changed Files: source/net/yacy/search/query/SearchEvent.java
Fri Feb 03 09:54:29 CET 2017
by luccioman
Distinguished solr connectors thread names for easier monitoring.
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/RemoteSolrConnector.java
Wed Feb 01 18:44:42 CET 2017
by luccioman
Refactored the DHT-Trigger section in Performance_p.html page.

This makes the section easier to understand and reflects more accurately
the current memory strategy implementations, which may set the
"proper" state for reasons other than DHT reception.
Changed Files: htroot/Performance_p.html, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Tue Jan 31 16:33:17 CET 2017
by luccioman
Updated French translation for the /Performance_p.html page.

Also updated the master xliff file with missing recent changes.
Changed Files: locales/fr.lng, locales/master.lng.xlf
Tue Jan 31 09:20:19 CET 2017
by luccioman
Fixed unresolved pattern on directory entries in HostBrowser.xml api.

As described in mantis 725 (http://mantis.tokeek.de/view.php?id=725) the
HostBrowser.xml api directory entries had an incorrect count attribute.
This was because the HostBrowser html page and backing template servlet
evolved, but the modifications were not carried over to the xml api.
Changed Files: htroot/HostBrowser.xml
Mon Jan 30 22:44:28 CET 2017
by reger
adjust column layout in Settings_Proxy.inc
Changed Files: htroot/Settings_Proxy.inc
Sat Jan 28 10:19:39 CET 2017
by luccioman
Added a CSS class for infobox block.

This will prevent mistakenly hiding a div element not designed to be an
infobox but having a ".info" parent (After having previously added the
possibility for a div - and not only a span element - to be an infobox).
Changed Files: htroot/Performance_p.html, htroot/env/base.css
Sat Jan 28 01:13:57 CET 2017
by reger
Update language file de & master, remove obsolete "Augmented Browsing"
Changed Files: locales/de.lng, locales/master.lng.xlf
Sat Jan 28 00:36:03 CET 2017
by reger
Add consistency check for related index fields upon load and save of 
index schema.
To assemble the original link url for out-/inboundlinks, icons and pictures
the *_protocol_sxt and *_urlstub_sxt fields are needed (due to the data-reduced
storage method used). Auto-enable *_protocol_sxt if *_urlstub_sxt is enabled,
to be able to correctly assemble the original link url.
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java
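The auto-enable rule from this entry can be sketched on a plain set of enabled field names. The class and method names are hypothetical and the schema is modeled as a Set of strings purely for illustration; CollectionConfiguration works on its own schema entries.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not the actual CollectionConfiguration code.
final class SchemaConsistency {
    private static final String URLSTUB_SUFFIX = "_urlstub_sxt";
    private static final String PROTOCOL_SUFFIX = "_protocol_sxt";

    /**
     * Returns a copy of the enabled field names in which every
     * *_urlstub_sxt field has its matching *_protocol_sxt field enabled.
     */
    static Set<String> withRequiredProtocolFields(final Set<String> enabled) {
        final Set<String> result = new TreeSet<>(enabled);
        for (final String field : enabled) {
            if (field.endsWith(URLSTUB_SUFFIX)) {
                final String prefix = field.substring(0, field.length() - URLSTUB_SUFFIX.length());
                result.add(prefix + PROTOCOL_SUFFIX);
            }
        }
        return result;
    }
}
```

Running the same check on both load and save keeps the two related fields in step regardless of how the schema file was edited.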
Thu Jan 26 23:49:15 CET 2017
by reger
adjust the Field-Reindex Thread to verify and update the document id
in case hash (ID) doesn't match document url (sku field).
Changed Files: source/net/yacy/search/index/ReindexSolrBusyThread.java
Thu Jan 26 06:37:29 CET 2017
by Michael Christen
Merge pull request #98 from Velociraptor85/patch-2

Changed Files: addon/yacyInit.sh
Thu Jan 26 06:29:42 CET 2017
by Michael Christen
Merge pull request #105 from ivar/patch-1

Update README.md - removes deprecated URL
Changed Files: README.md
Thu Jan 26 05:36:48 CET 2017
by Ivar Vasara
Update README.md - removes deprecated URL
Changed Files: README.md
Thu Jan 26 01:13:32 CET 2017
by luccioman
Improved Index Browser accessibility with semantically richer html tags.

Made use of ol, li, thead, th, tbody, h1 and h2 html tags.
Added aria-label attributes to provide alternative textual information
previously only conveyed by color cue.

Tested behavior with NVDA 2016.4 screen reader.
Changed Files: htroot/HostBrowser.html
Wed Jan 25 09:54:39 CET 2017
by luccioman
Fixed local image search pagination regression.

As reported by @tglman on issue #90, when searching images on the local
index only, pages next to the first were always empty. This was a
regression from commit c25e48e969f180dcc3c73863acbfcc383a182c8f.
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue Jan 24 17:14:14 CET 2017
by luccioman
Updated master xliff file with missing entries for HostBrowser.html.

Also translated lang="en" html attribute to lang="[targetLang]" on
locale files having translated entries for HostBrowser.html
Changed Files: locales/de.lng, locales/fr.lng, locales/master.lng.xlf, locales/ru.lng
Tue Jan 24 15:56:29 CET 2017
by Michael Peter Christen
added dc.date.modified and dc.date.created to date parser
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java
Tue Jan 24 11:38:56 CET 2017
by luccioman
Updated French translation of HostBrowser.html
Changed Files: locales/fr.lng
Tue Jan 24 09:40:43 CET 2017
by luccioman
Fixed Index Browser page HTML validation errors and switched to HTML5.

Also removed deprecated HTML attributes uses.

Validation performed with Nu Html Checker 17.1.0.

Cross browser tested with :
 - Debian Jessie : Firefox ESR 45.6.0
 - MS Windows 10 : Firefox 50.1.0, Chrome 55.0.2883.87, MS Edge
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java, htroot/HostBrowserAdmin_p.html
Tue Jan 24 01:51:28 CET 2017
by reger
assure that RWI Index.Segment IODispatcher is not blocking on shutdown
waiting on a semaphore permit.
see desc. http://mantis.tokeek.de/view.php?id=723
Changed Files: source/net/yacy/kelondro/rwi/IODispatcher.java
Mon Jan 23 16:05:51 CET 2017
by luccioman
Documented /HostBrowser.html related configuration settings
Changed Files: defaults/yacy.init, htroot/HostBrowser.java
Mon Jan 23 14:49:02 CET 2017
by luccioman
Display Index Browser links requiring auth only when authenticated.

In the /HostBrowser.html page "only hosts with urls pending in the
crawler", "only with load errors" and "Administration Options" all
require administration credentials. But they were displayed even to
unauthenticated users, and clicking them did nothing and returned the
/HostBrowser.html page empty.
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java
Sun Jan 22 12:31:14 CET 2017
by luccioman
Fixed display of crawler pending URLs counts in HostBrowser.html page.

As described in mantis 722 (http://mantis.tokeek.de/view.php?id=722)

Also updated some Javadoc.
Changed Files: htroot/HostBrowser.java, source/net/yacy/crawler/Balancer.java, source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/data/NoticedURL.java
Sun Jan 22 12:19:43 CET 2017
by luccioman
Removed temporary test main method committed by mistake.
Changed Files: htroot/yacysearch.java
Sun Jan 22 00:01:18 CET 2017
by reger
add ukr and pol to DCEntry.getLanguage ISO639-2 3-char language code 
conversion to deliver uk, pl 2-char code
and use if else to return on match
Changed Files: source/net/yacy/document/content/DCEntry.java
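The 3-char to 2-char conversion described in this entry can be approximated with the JDK alone. The sketch below builds a lookup table from java.util.Locale instead of the explicit if/else chain used in DCEntry; the class and method names are hypothetical. Note that Locale.getISO3Language() yields the terminology codes of ISO 639-2 (for example "deu" rather than "ger"), so bibliographic variants are not covered here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

// Hypothetical helper based on java.util.Locale, not the DCEntry implementation.
final class LanguageCodes {
    private static final Map<String, String> ISO3_TO_ISO2 = new HashMap<>();

    static {
        for (final String iso2 : Locale.getISOLanguages()) {
            try {
                ISO3_TO_ISO2.put(Locale.forLanguageTag(iso2).getISO3Language(), iso2);
            } catch (final MissingResourceException e) {
                // no three-letter code available for this language: skip it
            }
        }
    }

    /** Returns the 2-char code for a 2- or 3-char language code, or null if unknown. */
    static String toIso2(final String code) {
        final String lower = code.toLowerCase(Locale.ROOT);
        if (lower.length() == 2) {
            return lower;
        }
        return ISO3_TO_ISO2.get(lower);
    }
}
```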
Sat Jan 21 01:53:43 CET 2017
by reger
delete outdated and unmaintained Netbeans project
Netbeans has good built-in maven support which is a supported and 
maintained build env, making special and additional NB setting obsolete.
Changed Files:
Fri Jan 20 02:15:11 CET 2017
by reger
upd to commons-compress-1.13.jar
hide external icon on forge logo (was also out of position in IE)
Changed Files: .classpath, build.xml, htroot/Status.html, lib/commons-compress-1.13.License, lib/commons-compress-1.13.jar, pom.xml
Thu Jan 19 12:30:44 CET 2017
by luccioman
Added an optional parameter to webstructure.xml api.

This new "documentStructure" parameter can be set to false to only get
hosts accumulated references on a resource and thus prevent scraping the
specified URL and getting citations references.

Also set WebStructureGraph constants as final and updated the Javadoc
with example api call URLs.  
Changed Files: htroot/api/webstructure.java, source/net/yacy/peers/graphics/WebStructureGraph.java
Tue Jan 17 23:45:56 CET 2017
by reger
remove obsolete lastmodified calculation in WebgraphConfig
Changed Files: source/net/yacy/search/schema/WebgraphConfiguration.java
Tue Jan 17 17:01:56 CET 2017
by luccioman
Updated Javadoc and Junit tests for the WebStructureGraph class.
Changed Files: source/net/yacy/peers/graphics/WebStructureGraph.java, test/java/net/yacy/peers/graphics/WebStructureGraphTest.java
Tue Jan 17 15:59:55 CET 2017
by luccioman
Made sure webstructure.xml API produces valid XML.

Host names should not contain XML special characters such as quotation
mark, but at this stage the WebGraph may have mistakenly recorded a host
name with such characters. What's more the DigestURL constructor does
not prevent this.
By using serverObjects.putXML to encode host names, we ensure
here that the rendered XML is well formed and can be parsed by external tools
even if a structure entry is incorrect.
Changed Files: htroot/api/webstructure.java
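The kind of encoding this entry relies on can be shown with a small stand-alone function. It only illustrates the general idea of escaping XML special characters in attribute or text values; it is not the implementation behind serverObjects.putXML, and the names are hypothetical.

```java
// Illustration of XML escaping, not the serverObjects.putXML implementation.
final class XmlEscape {

    /** Replaces the five XML special characters by their predefined entities. */
    static String escapeXml(final String text) {
        final StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            final char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With such escaping, a wrongly recorded host name containing a quotation mark no longer terminates the surrounding attribute value.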
Mon Jan 16 18:41:58 CET 2017
by luccioman
Fixed WatchWebStructure_p.html render to include https URLs.

As described in mantis 721 (http://mantis.tokeek.de/view.php?id=721)
WatchWebStructure_p.html failed to include in its structure view https
and other protocols and ports than default http.
Changed Files: htroot/WebStructurePicture_p.java, source/net/yacy/peers/graphics/WebStructureGraph.java, test/java/net/yacy/peers/graphics/WebStructureGraphTest.java
Mon Jan 16 16:41:06 CET 2017
by luccioman
Fixed webstructure.xml API used with a domain name 'about' parameter.

As described in mantis 720 (http://mantis.tokeek.de/view.php?id=720),
when requesting this API with a domain name instead of a complete URL
only HTTP references on default port were listed.
Changed Files: htroot/api/webstructure.java, source/net/yacy/peers/graphics/WebStructureGraph.java, test/java/net/yacy/peers/graphics/WebStructureGraphTest.java
Mon Jan 16 10:18:42 CET 2017
by luccioman
Factored code re-implementing DigestURL.hosthash() method.

This ensures a consistent implementation of the url host hash generation
and easier usage finding in source code.

Also added a unit test for this function.
Changed Files: htroot/WebStructurePicture_p.java, source/net/yacy/cora/document/id/DigestURL.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/peers/graphics/WebStructureGraph.java, source/net/yacy/search/Switchboard.java, test/java/net/yacy/cora/document/id/DigestURLTest.java
Fri Jan 13 16:10:59 CET 2017
by luccioman
Added automated unit tests and perfs test for WebStructureGraph class.

Fixed references count when multiple links target the same domain name
in one document.
Changed Files: source/net/yacy/peers/graphics/WebStructureGraph.java, test/java/net/yacy/peers/graphics/WebStructureGraphTest.java
Fri Jan 13 16:05:46 CET 2017
by luccioman
Factored common code with DigestURL.hosthash()
Changed Files: htroot/HostBrowser.java, htroot/api/webstructure.java
Thu Jan 12 17:52:47 CET 2017
by luccioman
Detailed some Javadoc related to /api/webstructure.xml usage.
Changed Files: htroot/api/webstructure.java, source/net/yacy/peers/graphics/WebStructureGraph.java
Thu Jan 12 01:36:30 CET 2017
by reger
Start to rename "Augmented Browsing" to "Web Proxy ..." / "View via Proxy"
The augmented Browsing option was reduced to the web proxy functionality.
Augmented browsing is not available and no known plan exists to reimplement
alteration of result pages with additional information.
Changed Files: htroot/AugmentedBrowsing_p.html, htroot/ConfigSearchPage_p.html, htroot/yacysearchitem.html, locales/de.lng, locales/master.lng.xlf
Mon Jan 09 16:45:31 CET 2017
by luccioman
Ignore generated Javadoc with git SCM.
Changed Files: .gitignore
Sat Jan 07 18:24:29 CET 2017
by reger
fix DC.Elements namespace in DublinCore vocabulary class
delete redundant (unused) DCElements.
Changed Files: source/net/yacy/cora/lod/vocabulary/DublinCore.java
Fri Jan 06 12:24:31 CET 2017
by luccioman
Blacklist import and update performance improvements.

Measurement sample : import from blacklist local file containing about
15000 entries
 - before refactoring : several minutes
 - after refactoring : a few seconds!
Changed Files: htroot/BlacklistCleaner_p.java, htroot/IndexControlRWIs_p.java, htroot/sharedBlacklist_p.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/repository/BlacklistHostAndPath.java
Fri Jan 06 11:23:40 CET 2017
by luccioman
Added some JavaDoc.
Changed Files: htroot/sharedBlacklist_p.java, source/net/yacy/server/serverObjects.java
Fri Jan 06 09:00:28 CET 2017
by luccioman
Display result favicons only for http or https resources.

Favicon display only makes sense for http(s) websites, being public or
intranet. So I modified the favicon conditional display to verify the
result URL protocol rather than if we are in intranet mode.

Also prevented rendering an img HTML tag with empty src on other results
protocols such as ftp or file.

Fixing this thanks to priest2 report
Changed Files: htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchitem.json
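The display condition described in this entry amounts to a test on the result URL scheme. A minimal sketch, with hypothetical names and using java.net.URI rather than YaCy's own URL classes:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not the actual yacysearchitem code.
final class FaviconRule {

    /** Returns true only for http and https result URLs. */
    static boolean showFavicon(final String resultUrl) {
        try {
            final String scheme = new URI(resultUrl).getScheme();
            return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        } catch (final URISyntaxException e) {
            return false; // unparsable URL: render no favicon
        }
    }
}
```

When this returns false, the template would simply omit the img tag instead of rendering one with an empty src.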
Fri Jan 06 03:01:52 CET 2017
by reger
fix concurrency issue with htmlParser using not current scraper data
resulting in incorrect data for some html index metadata.
Details see http://mantis.tokeek.de/view.php?id=717
Changed Files: source/net/yacy/document/AbstractParser.java, source/net/yacy/document/Document.java, source/net/yacy/document/content/DCEntry.java, source/net/yacy/document/parser/genericParser.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/search/schema/CollectionConfiguration.java
Thu Jan 05 14:54:59 CET 2017
by luccioman
Added descriptive titles to Crawler_p.html speed settings.

As reported by bubul
(http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5924) , LF and MH
acronyms meaning were not detailed.
Also added label tags for improved accessibility on these input fields.
Changed Files: htroot/Crawler_p.html
Thu Jan 05 00:24:37 CET 2017
by reger
fix exception on URIMetadataNode instantiation with corrected id hash on
host_id_s. Use Solr setField instead of addField to prevent
java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
	at net.yacy.kelondro.data.meta.URIMetadataNode.hosthash(URIMetadataNode.java:247)
	at net.yacy.search.query.SearchEvent.addNodes(SearchEvent.java:966)
	at net.yacy.peers.Protocol.solrQuery(Protocol.java:1242)
	at net.yacy.peers.RemoteSearch$2.run(RemoteSearch.java:349)
Changed Files: source/net/yacy/kelondro/data/meta/URIMetadataNode.java
Mon Jan 02 14:23:25 CET 2017
by luccioman
Upgraded Apache Ant to 1.10.0 for the Alpine flavor Docker image. 
Changed Files: docker/Dockerfile.alpine
Mon Jan 02 10:24:17 CET 2017
by luccioman
Adjusted crawl depth control for FTP crawl start URLs.
Changed Files: source/net/yacy/crawler/CrawlStacker.java
Mon Jan 02 03:04:21 CET 2017
by reger
Complete harmonization RequestHeader getCookie with std ServletRequest
to use javax.servlet.http.Cookie parameters.
Deprecate now obsolete getHeaderCookies.
Adjust setting of MaxAge to spec if >= 0 otherwise keep default.
Changed Files: htroot/CookieTest_p.java, htroot/User.java, source/net/yacy/cora/protocol/RequestHeader.java, source/net/yacy/cora/protocol/ResponseHeader.java, source/net/yacy/data/UserDB.java, source/net/yacy/search/Switchboard.java
Sun Jan 01 23:58:38 CET 2017
by reger
On negative result vote also delete document from fulltext index
(not only from dht)
Changed Files: htroot/yacysearch.java
Sun Jan 01 23:54:18 CET 2017
by reger
Merge origin/master
Changed Files: docker/Dockerfile, docker/Dockerfile.alpine, docker/Readme.md, startYACY.sh
Sun Jan 01 23:53:44 CET 2017
by reger
fix of fulltext.remove() by id of webgraph document
webgraph has document hash in source_id_s
Changed Files: source/net/yacy/search/index/Fulltext.java
Sat Dec 31 09:51:07 CET 2016
by luccioman
Fixed docker stop behavior.

- Adjusted start script in debug mode to make sure the main java process
can receive signals such as SIGTERM
- Modified docker images main command to properly propagate SIGTERM
signal to the main java process
Changed Files: docker/Dockerfile, docker/Dockerfile.alpine, docker/Readme.md, startYACY.sh
Wed Dec 28 09:47:27 CET 2016
by luccioman
Fixed YaCy proper shutdown triggered by SIGTERM signal.

The main shutdown hook thread was not properly waiting for the main
thread termination which consequently could not properly close resources
and threads. After terminating a running YaCy peer this way (Ctrl+C in
console, or kill <pid> for example), you could see the still existing
DATA/yacy.running file.

Tested with :
 - Debian Jessie openjdk 7 and 8 : regular shutdown, Ctrl+C, kill
command, system restart while yacy is running
 - Windows 10 Oracle JDK 7 and 8 : non regression on regular shutdown 
Changed Files: source/net/yacy/search/Switchboard.java, source/net/yacy/yacy.java
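The general pattern behind this fix, a shutdown hook that requests termination and then waits for the main thread to finish its cleanup, can be sketched as follows. All names are hypothetical and the stop request is modeled as a plain Runnable; the actual code in yacy.java and Switchboard.java is not reproduced here.

```java
// Sketch of the pattern only, not the actual YaCy shutdown code.
final class ShutdownSketch {

    /**
     * Builds a hook thread that asks the application to stop and then waits
     * (up to timeoutMillis) until the given main thread has terminated, so
     * that resources are closed before the JVM exits.
     */
    static Thread shutdownHook(final Thread mainThread, final Runnable requestStop,
            final long timeoutMillis) {
        return new Thread(() -> {
            requestStop.run();
            try {
                mainThread.join(timeoutMillis);
            } catch (final InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "shutdown-hook");
    }
}
```

It would be registered once at startup, for example with Runtime.getRuntime().addShutdownHook(ShutdownSketch.shutdownHook(Thread.currentThread(), stopAction, 30000L)), where stopAction is whatever signals the main loop to end. Without the join, the JVM may exit while the main thread is still closing resources, which matches the leftover DATA/yacy.running file described in the entry.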