YaCy Release current_development

Major Changes   

Fri Jun 09 12:25:23 CEST 2017
by Michael Peter Christen
migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8
Also: now Version 1.921
Changed Files: .classpath, .settings/org.eclipse.jdt.core.prefs, build.properties, build.xml, defaults/solr/schema.xml, defaults/solr/solrconfig.xml, htroot/yacysearchtrailer.java, lib/commons-math3-3.4.1.jar, lib/lucene-analyzers-common-6.6.0.jar, lib/lucene-analyzers-phonetic-6.6.0.jar, lib/lucene-backward-codecs-6.6.0.jar, lib/lucene-classification-6.6.0.jar, lib/lucene-codecs-6.6.0.jar, lib/lucene-core-6.6.0.jar, lib/lucene-facet-6.6.0.jar, lib/lucene-grouping-6.6.0.jar, lib/lucene-highlighter-6.6.0.jar, lib/lucene-join-6.6.0.jar, lib/lucene-memory-6.6.0.jar, lib/lucene-misc-6.6.0.jar, lib/lucene-queries-6.6.0.jar, lib/lucene-queryparser-6.6.0.jar, lib/lucene-spatial-6.6.0.jar, lib/lucene-suggest-6.6.0.jar, lib/metrics-core-3.2.2.jar, lib/solr-core-6.6.0.jar, lib/solr-dataimporthandler-6.6.0.jar, lib/solr-solrj-6.6.0.jar, lib/spatial4j-0.6.jar, lib/zookeeper-3.4.10.jar, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/cora/federate/solr/instance/EmbeddedInstance.java, source/net/yacy/cora/federate/solr/instance/InstanceMirror.java, source/net/yacy/cora/federate/solr/instance/ServerMirror.java, source/net/yacy/cora/federate/solr/instance/ServerShard.java, source/net/yacy/cora/federate/solr/responsewriter/EnhancedXMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/FlatJSONResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/GrepHTMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/SnapshotImagesReponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java, source/net/yacy/http/servlets/GSAsearchServlet.java, 
source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/QueryParams.java, test/java/net/yacy/document/DateDetectionTest.java
Sat Jun 03 04:00:46 CEST 2017
by luccioman
Ensure proper closing of file input streams on both success and failure

Also add, when possible, a warning level log message on input stream
closing errors instead of failing silently. This could help in understanding
some IO exceptions such as "too many files open".
Changed Files: source/net/yacy/document/parser/images/bmpParser.java, source/net/yacy/document/parser/images/genericImageParser.java, source/net/yacy/document/parser/images/icoParser.java, source/net/yacy/gui/framework/Switchboard.java, source/net/yacy/kelondro/blob/Gap.java, source/net/yacy/kelondro/blob/HeapReader.java, source/net/yacy/kelondro/index/RowHandleMap.java, source/net/yacy/kelondro/index/RowHandleSet.java, source/net/yacy/kelondro/util/FileUtils.java, source/net/yacy/kelondro/util/SetTools.java, source/net/yacy/kelondro/util/XMLTables.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/search/AutoSearch.java, source/net/yacy/search/Switchboard.java, source/net/yacy/server/http/TemplateEngine.java, source/net/yacy/utils/PKCS12Tool.java, source/net/yacy/utils/cryptbig.java, source/net/yacy/utils/tarTools.java, source/net/yacy/utils/translation/TranslationManager.java, test/java/net/yacy/document/parser/htmlParserTest.java, test/java/net/yacy/document/parser/images/genericImageParserTest.java, test/java/net/yacy/document/parser/images/metadataImageParserTest.java, test/java/net/yacy/document/parser/pdfParserTest.java
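The closing pattern described in this entry can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class StreamCloseSketch and both method names are hypothetical, and java.util.logging stands in for whatever logger the project actually uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, for illustration only (not a YaCy class).
final class StreamCloseSketch {

    private static final Logger LOG = Logger.getLogger(StreamCloseSketch.class.getName());

    /** Counts the bytes of a stream and always closes it, on success and on failure. */
    static long countBytes(final InputStream in) throws IOException {
        try {
            long total = 0;
            final byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } finally {
            // runs whether the loop completed or threw
            closeWithWarning(in);
        }
    }

    /** Closes the stream; a close failure is logged at warning level instead of being ignored. */
    static boolean closeWithWarning(final InputStream in) {
        try {
            in.close();
            return true;
        } catch (final IOException e) {
            LOG.log(Level.WARNING, "Could not close input stream", e);
            return false;
        }
    }
}
```

An explicit finally block is used here rather than try-with-resources because the goal is to log the close failure and carry on, whereas try-with-resources would propagate it to the caller.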
Fri Jun 02 12:14:29 CEST 2017
by luccioman
Ensure proper closing of file input streams.
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, source/net/yacy/cora/geo/OpenGeoDBLocation.java, source/net/yacy/cora/protocol/ftp/FTPClient.java, source/net/yacy/cora/storage/Files.java, source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/data/Translator.java, source/net/yacy/document/Condenser.java, source/net/yacy/document/Document.java, source/net/yacy/document/parser/pdfParser.java, source/net/yacy/http/Jetty9HttpServerImpl.java, source/net/yacy/utils/CryptoLib.java, source/net/yacy/utils/PKCS12Tool.java, source/net/yacy/utils/cryptbig.java, source/net/yacy/utils/gzip.java, source/net/yacy/yacy.java, test/java/net/yacy/document/ParserTest.java, test/java/net/yacy/document/parser/xlsParserTest.java
Fri Jun 02 01:00:21 CEST 2017
by reger
Introduce keyword query parameter 
This enables the keyword navigator to filter on keywords. Added search page
output and layout config for keywords, allowing the keywords to be
displayed, e.g. in intranet use. No styling or links are applied to the
keyword text (though this would be desirable, possibly in combination with
bootstrap-tagsinput, for future/intranet use).
Changed Files: defaults/yacy.init, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/index.html, htroot/yacysearchitem.html, htroot/yacysearchitem.java, source/net/yacy/search/navigator/StringNavigator.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java
Mon May 15 13:15:16 CEST 2017
by luccioman
Added user interface feedback on results feeding termination status.

Added as an additional icon with title in the search progress bar, to
inform about background search feeder threads terminated or still
running. While giving a bit more information to users about the p2p
search process, this can help in choosing whether or not to wait a little
longer before going to the next page, in order to get results from
various sources sorted as best as possible (see #91 for a discussion
about sorting accuracy and network latency).

Other related modifications included :
 - regular updates to statistics in the progress bar until the
background feeders are completely terminated.
 - removed some uses of insecure and discouraged JavaScript elements
Changed Files: htroot/js/yacysearch.js, htroot/yacysearch.html, htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchlatestinfo.java, htroot/yacysearchlatestinfo.json, source/net/yacy/search/query/SearchEvent.java
Thu May 11 18:02:33 CEST 2017
by luccioman
Improved previous merge "Show ranking in HTML UI".

- added the new setting as configurable in the "Debug/Analysis" settings
page. Debug/analysis is its main purpose for now as there is currently
no nice and "understandable" ranking score info servlet (see forum
discussion http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5884 ) 
- render in the "Search Page Layout" page preview when enabled
- added constants
Changed Files: defaults/yacy.init, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/SettingsAck_p.java, htroot/Settings_Debug.inc, htroot/Settings_p.java, htroot/yacysearchitem.html, source/net/yacy/search/SwitchboardConstants.java
Fri Apr 14 14:32:44 CEST 2017
by luccioman
Extended Mediawiki dump import to remote URLs.

When using a public HTTP URL in /IndexImportMediawiki_p.html, the remote
file now is directly streamed and processed, allowing import of several
GB dumps even with a low memory remote peer, and without need to
manually download the dump file first.
Changed Files: bin/importmediawiki.sh, htroot/IndexImportMediawiki_p.html, htroot/IndexImportMediawiki_p.java, source/net/yacy/cora/document/id/MultiProtocolURL.java, source/net/yacy/crawler/retrieval/FileLoader.java, source/net/yacy/crawler/retrieval/SMBLoader.java, source/net/yacy/document/importer/MediawikiImporter.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/index/DocumentIndex.java
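To illustrate why streaming keeps memory usage low, here is a rough sketch that walks an XML dump once and counts its page elements without ever holding the whole file. It is only an assumption-laden illustration: it uses the JDK StAX parser, which is not necessarily what MediawikiImporter does, and the class and method names are hypothetical.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not the YaCy importer.
final class StreamedDumpSketch {

    /** Counts page elements while reading the stream once, without buffering the whole dump. */
    static int countPages(final InputStream dump) throws XMLStreamException {
        final XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(dump);
        int pages = 0;
        try {
            while (reader.hasNext()) {
                // only the current parser event is held in memory
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "page".equals(reader.getLocalName())) {
                    pages++;
                }
            }
        } finally {
            reader.close();
        }
        return pages;
    }
}
```

For a remote dump, the input stream could come from something like new java.net.URL(dumpUrl).openStream(), with a decompressing stream wrapped around it if the dump is compressed; both are left out of the sketch.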
Thu Apr 06 21:18:01 CEST 2017
by reger
upd to Solr-5.5.4
Changed Files: .classpath, build.xml, lib/lucene-analyzers-common-5.5.4.jar, lib/lucene-analyzers-phonetic-5.5.4.jar, lib/lucene-backward-codecs-5.5.4.jar, lib/lucene-classification-5.5.4.jar, lib/lucene-codecs-5.5.4.jar, lib/lucene-core-5.5.4.jar, lib/lucene-facet-5.5.4.jar, lib/lucene-grouping-5.5.4.jar, lib/lucene-highlighter-5.5.4.jar, lib/lucene-join-5.5.4.jar, lib/lucene-memory-5.5.4.jar, lib/lucene-misc-5.5.4.jar, lib/lucene-queries-5.5.4.jar, lib/lucene-queryparser-5.5.4.jar, lib/lucene-spatial-5.5.4.jar, lib/lucene-suggest-5.5.4.jar, lib/solr-core-5.5.4.jar, lib/solr-solrj-5.5.4.jar, pom.xml
Tue Apr 04 00:59:26 CEST 2017
by reger
upd to pdfbox-2.0.5.jar and transient dependency xmpcore-5.1.3.jar
required by metadata-extractor-2.10.1 (fix build.xml compiler warning)
Changed Files: .classpath, build.xml, lib/fontbox-2.0.5.License, lib/fontbox-2.0.5.jar, lib/pdfbox-2.0.5.License, lib/pdfbox-2.0.5.jar, lib/xmpcore-5.1.3.jar, lib/xmpcore-5.1.3.license, pom.xml
Mon Apr 03 11:34:49 CEST 2017
by luccioman
Set Config Portal as a private administration page.

Consistently with its required action from submission credentials, and
because external unauthenticated users do not need to access these
Changed Files: defaults/yacy.init, htroot/ConfigAppearance_p.html, htroot/ConfigPortal.java, htroot/ConfigPortal_p.html, htroot/ConfigPortal_p.java, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/env/templates/header.template, htroot/env/templates/submenuPortalConfiguration.template, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/hi.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng, source/net/yacy/http/servlets/GSAsearchServlet.java
Fri Mar 31 00:58:11 CEST 2017
by reger
Implement surrogate import from Warc archives (as first option handle
warc = Web ARChive File Format.
Warc files with extension .warc or compressed .warc.gz can be placed in
DATA/surrogate/in, and the responses they contain are imported into the index.
The library used is stream based, so it can easily be extended later to
load warc files from the net.
Changed Files: .classpath, build.xml, lib/jwat-archive-common-1.0.4.jar, lib/jwat-common-1.0.4.jar, lib/jwat-gzip-1.0.4.jar, lib/jwat-warc-1.0.4.jar, pom.xml, source/net/yacy/document/importer/WarcImporter.java, source/net/yacy/search/Switchboard.java
Sun Mar 26 11:48:00 CEST 2017
by luccioman
Enforced access controls on some administrative actions.

 - ensure use of HTTP POST method : HTTP GET should only be used for
information retrieval and not to perform server side effect operations
(see HTTP standard https://tools.ietf.org/html/rfc7231#section-4.2.1)
 - a transaction token is now required for these administrative form
submissions to ensure the request can not be included in an external
site and performed silently/by mistake by the user browser
Changed Files: bin/clearall.sh, bin/clearcache.sh, bin/clearindex.sh, bin/deleteurl.sh, bin/passwd.sh, bin/protectedPostApiCall.sh, htroot/ConfigAccounts_p.html, htroot/ConfigAccounts_p.java, htroot/ConfigProperties_p.html, htroot/ConfigProperties_p.java, htroot/ConfigUpdate_p.html, htroot/ConfigUpdate_p.java, htroot/IndexControlRWIs_p.html, htroot/IndexControlRWIs_p.java, htroot/IndexControlURLs_p.html, htroot/IndexControlURLs_p.java, htroot/IndexDeletion_p.html, htroot/IndexDeletion_p.java, htroot/IndexFederated_p.html, htroot/IndexFederated_p.java, htroot/PerformanceQueues_p.html, htroot/PerformanceQueues_p.java, htroot/Performance_p.html, htroot/Steering.html, htroot/Steering.java, htroot/env/templates/header.template, htroot/terminal_p.html, source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/data/BadTransactionException.java, source/net/yacy/data/TransactionManager.java, source/net/yacy/http/servlets/DisallowedMethodException.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/yacy.java, stopYACY.sh
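The two checks listed in this entry (POST only, plus a transaction token) could look roughly like the sketch below. It is a generic illustration and does not reproduce the API of the TransactionManager class named in the changed files: the class name, method names and the single-use behavior are all assumptions.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a form token check, not YaCy's TransactionManager.
final class TransactionTokenSketch {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Tokens issued and not yet consumed, mapped to the form path they protect. */
    private final Map<String, String> issued = new ConcurrentHashMap<>();

    /** Issues a random token to embed as a hidden field in the form served at the given path. */
    String issueToken(final String formPath) {
        final byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        final String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        issued.put(token, formPath);
        return token;
    }

    /** Accepts a submission only if it is a POST carrying a token issued for that same path. */
    boolean isAllowed(final String method, final String formPath, final String token) {
        if (!"POST".equalsIgnoreCase(method) || token == null || formPath == null) {
            return false;
        }
        // remove(key, value) succeeds only on an exact match, which also makes the token single use
        return issued.remove(token, formPath);
    }
}
```

A page on an external site cannot read the token from the administration form, so a request it triggers in the user's browser fails the second check even when the browser attaches valid credentials.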
Tue Mar 21 17:15:01 CET 2017
by luccioman
Updated shell scripts to be compatible with HTTP Digest authentication

Because curl and wget do not accept a hashed password as a parameter,
YaCy shell scripts which require authentication are now interactive by
default when HTTP Digest is the only available authentication method.
Batch mode can still be available through the use of an environment

Other improvements :
 - added backward compatibility for Basic Authentication
 - fixed curl/wget presence detection 
 - do not return with exit code 0 when an API call failed, and print an
error message when the case occurs
 - documented available authentication options for API calls
Changed Files: bin/apicall.sh, bin/apicat.sh, bin/down.sh, bin/passwd.sh, bin/search1.sh, stopYACY.sh
Sun Mar 19 02:30:08 CET 2017
by reger
Introduce the option to configure a shutdown port.
A port value of -1 will disable this option.

If set to a value greater than 0, YaCy listens on this port on the local
loopback address for a shutdown or restart signal.
E.g. connect to http://localhost:8005/shutdown will stop the YaCy server.
http://localhost:8005/restart will restart it.
This option allows stopping YaCy locally, independently of the web
frontend (which might be configured for password protected remote access).

Changed Files: defaults/yacy.init, htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_ServerAccess.inc, htroot/Settings_p.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/server/serverSwitch.java
Sat Mar 18 20:02:26 CET 2017
by reger
add switchboardconstants for server ports config keys
Changed Files: htroot/ConfigBasic.java, htroot/QuickCrawlLink_p.java, htroot/SettingsAck_p.java, htroot/api/snapshot.java, source/net/yacy/gui/Tray.java, source/net/yacy/http/Jetty9HttpServerImpl.java, source/net/yacy/migration.java, source/net/yacy/peers/Network.java, source/net/yacy/peers/Seed.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/utils/upnp/UPnP.java, source/net/yacy/yacy.java
Tue Feb 28 18:11:54 CET 2017
by luccioman
Privacy enhancement : added settings to control referrer policy.

HTTP "Referer" header sent by the browser when using YaCy can now be
controlled either with the referrer meta tag as a global policy, or only
for search result links by adding the attribute rel="noreferrer".

To improve privacy with the fewest possible regressions, the default is
set as meta tag with value "origin-when-cross-origin" : internal YaCy
links behavior is not affected, but when visiting external websites the
referrer URL is not empty but stripped of its path and query parameters.

Older browsers, Safari, MS IE and Edge do not support the referrer meta
tag, so the standard but less flexible noreferrer link type can also be
enabled as an alternative.

User-friendly settings page to be implemented.
Changed Files: defaults/yacy.init, htroot/env/templates/metas.template, htroot/yacysearchitem.html, htroot/yacysearchitem.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/search/SwitchboardConstants.java
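The effect of the "origin-when-cross-origin" default can be sketched as a small function that computes what a browser would send as the Referer. This only models the browser behavior described above under simplifying assumptions (default ports and URL fragments are not normalized); the class and method names are hypothetical and the example URLs are made up.

```java
import java.net.URI;

// Hypothetical model of browser behavior, for illustration only.
final class ReferrerPolicySketch {

    /** Returns scheme://host[:port] of a URL, that is the URL without path and query. */
    static String origin(final URI url) {
        final String port = url.getPort() == -1 ? "" : ":" + url.getPort();
        return url.getScheme() + "://" + url.getHost() + port;
    }

    /** Referer value sent when following toLink from fromPage under "origin-when-cross-origin". */
    static String referrerFor(final String fromPage, final String toLink) {
        final URI from = URI.create(fromPage);
        final URI to = URI.create(toLink);
        if (origin(from).equalsIgnoreCase(origin(to))) {
            return fromPage; // same origin: the full URL is sent
        }
        return origin(from) + "/"; // cross origin: path and query are stripped
    }
}
```

With a search page at http://localhost:8090/yacysearch.html?query=secret, a link to another page on the same peer would receive the full URL, while a link to an external site would only receive http://localhost:8090/, so the search terms do not leak.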
Mon Feb 20 10:48:07 CET 2017
by luccioman
Refactored and enforced Solr mandatory fields for proper operation

- Added a new method to check activation of mandatory fields on
Collection Configuration commit, consistently with checks previously
performed in Switchboard startup and with mandatory fields in the
default schema.
- Reorganized default schema and CollectionConfiguration enumeration :
moved fields that are no longer mandatory to a specific section, and moved fields
enabled at startup to the mandatory section. 
- Marked mandatory fields as required and with stronger font in the
IndexSchema_p.html page
Changed Files: defaults/solr.collection.schema, htroot/IndexSchema_p.html, htroot/IndexSchema_p.java, source/net/yacy/cora/federate/solr/SchemaDeclaration.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java, source/net/yacy/search/schema/WebgraphSchema.java
Mon Feb 13 19:11:17 CET 2017
by luccioman
Added support for HTML OpenSearch results.

Many OpenSearch systems do not provide results as standard RSS/Atom
feeds but only as HTML. 

This modification adds some support for custom OpenSearch HTML results
through the use of mapping files (as already done for federated Solr
search) relying on CSS-like selectors to retrieve information from HTML

An example mapping file is provided to map results from the
www.npmjs.com OpenSearch URL.
Changed Files: defaults/federatecfg/npmjs.html.map.properties, defaults/heuristicopensearch.conf, source/net/yacy/cora/federate/AbstractFederateSearchConnector.java, source/net/yacy/cora/federate/FederateSearchManager.java, source/net/yacy/cora/federate/opensearch/OpenSearchConnector.java, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/cora/protocol/http/HTTPClient.java
Sat Feb 11 19:53:27 CET 2017
by reger
upd to Jetty-9.2.21.v20170120
Changed Files: .classpath, build.xml, lib/jetty-client-9.2.21.v20170120.jar, lib/jetty-continuation-9.2.21.v20170120.jar, lib/jetty-deploy-9.2.21.v20170120.jar, lib/jetty-http-9.2.21.v20170120.jar, lib/jetty-io-9.2.21.v20170120.jar, lib/jetty-jmx-9.2.21.v20170120.jar, lib/jetty-proxy-9.2.21.v20170120.jar, lib/jetty-security-9.2.21.v20170120.jar, lib/jetty-server-9.2.21.v20170120.jar, lib/jetty-servlet-9.2.21.v20170120.jar, lib/jetty-servlets-9.2.21.v20170120.jar, lib/jetty-util-9.2.21.v20170120.jar, lib/jetty-webapp-9.2.21.v20170120.jar, lib/jetty-xml-9.2.21.v20170120.jar, pom.xml
Thu Feb 09 11:05:06 CET 2017
by luccioman
Added a new Debug/Analysis advanced settings subsection.

As discussed in PR #93 with @JeremyRand and @reger24 this new advanced
settings page includes:
 - a new setting to control remote Solr responses encoding
 - some existing debug settings which could not be set through the admin
user interface
Changed Files: defaults/yacy.init, htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_Debug.inc, htroot/Settings_p.html, htroot/Settings_p.java, source/net/yacy/cora/federate/SolrFederateSearchConnector.java, source/net/yacy/cora/federate/solr/instance/InstanceMirror.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/AutoSearch.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/index/Fulltext.java
Fri Jan 27 15:47:15 CET 2017
by luccioman
Added user-friendly controls over disk usage configuration settings.

As mentioned in issue #103, control settings over YaCy disk usage
already existed but lacked a user-friendly way to set them.

I added it to the Performance_p.html administration page with a little
refactoring on the "Resource Observer" fieldset for improved
accessibility and compliance with HTML standards.
Also added the possibility to enable/disable the autoregulation function
from this page.
Changed Files: htroot/PerformanceQueues_p.java, htroot/Performance_p.html, htroot/env/base.css, locales/cn.lng, locales/de.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng, source/net/yacy/search/ResourceObserver.java, source/net/yacy/search/SwitchboardConstants.java
Sun Jan 22 23:58:46 CET 2017
by reger
Group all proxy settings on System Administration by adding settings of
UrlProxyAccess page (moved from deleted AugmentedBrowsing_p), adjust
submenu (remove Augmented Browsing) and translation files.
Changed Files: htroot/ConfigSearchPage_p.html, htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_UrlProxyAccess.inc, htroot/Settings_p.html, htroot/Settings_p.java, htroot/Status_p.inc, htroot/env/templates/submenuSemantic.template, locales/de.lng, locales/fr.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, source/net/yacy/http/servlets/UrlProxyServlet.java, source/net/yacy/http/servlets/YaCyProxyServlet.java
Sat Jan 21 00:26:04 CET 2017
by reger
upd to solr-5.5.3
minor bugfix version
Changed Files: .classpath, build.xml, lib/lucene-analyzers-common-5.5.3.jar, lib/lucene-analyzers-phonetic-5.5.3.jar, lib/lucene-backward-codecs-5.5.3.jar, lib/lucene-classification-5.5.3.jar, lib/lucene-codecs-5.5.3.jar, lib/lucene-core-5.5.3.jar, lib/lucene-facet-5.5.3.jar, lib/lucene-grouping-5.5.3.jar, lib/lucene-highlighter-5.5.3.jar, lib/lucene-join-5.5.3.jar, lib/lucene-memory-5.5.3.jar, lib/lucene-misc-5.5.3.jar, lib/lucene-queries-5.5.3.jar, lib/lucene-queryparser-5.5.3.jar, lib/lucene-spatial-5.5.3.jar, lib/lucene-suggest-5.5.3.jar, lib/solr-core-5.5.3.jar, lib/solr-solrj-5.5.3.jar, pom.xml
Mon Jan 09 16:44:47 CET 2017
by luccioman
Cleaned up some Javadoc warnings.
Changed Files: source/net/yacy/cora/date/ISO8601Formatter.java, source/net/yacy/cora/protocol/http/HTTPClient.java, source/net/yacy/data/list/ListAccumulator.java, source/net/yacy/data/list/XMLBlacklistImporter.java, source/net/yacy/data/ymark/YMarkUtil.java, source/net/yacy/document/AbstractParser.java, source/net/yacy/document/Document.java, source/net/yacy/document/LargeNumberCache.java, source/net/yacy/document/LibraryProvider.java, source/net/yacy/document/Parser.java, source/net/yacy/document/TextParser.java, source/net/yacy/document/content/DCEntry.java, source/net/yacy/document/importer/Importer.java, source/net/yacy/document/importer/MediawikiImporter.java, source/net/yacy/document/importer/ResumptionToken.java, source/net/yacy/document/parser/apkParser.java, source/net/yacy/document/parser/docParser.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/html/Evaluation.java, source/net/yacy/document/parser/html/ImageEntry.java, source/net/yacy/document/parser/html/TransformerWriter.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/gui/framework/Switchboard.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/navigator/LanguageNavigator.java, source/net/yacy/search/navigator/Navigator.java, source/net/yacy/search/navigator/RestrictedStringNavigator.java, source/net/yacy/search/navigator/YearNavigator.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/snippet/TextSnippet.java
Wed Jan 04 17:09:37 CET 2017
by luccioman
Upgraded jgit build library to version 4.5.0

This is the latest Java 7 compatible jgit release.

Properly support GitHub tags marked as "Pre-release". 
With the previous venerable jgit version 1.1.0, a YaCy repository clone
having such a tag made GitRevTask and GitRevMavenTask crash.
Changed Files: build.xml, libbuild/GitRevMavenTask/pom.xml, libbuild/GitRevMavenTask/src/GitRevMavenTask.java, libbuild/GitRevTask/GitRevTask.java, libbuild/JavaEWAH-0.7.9.License, libbuild/JavaEWAH-0.7.9.jar, libbuild/httpclient-4.3.6.License, libbuild/httpclient-4.3.6.jar, libbuild/jsch-0.1.53.License, libbuild/jsch-0.1.53.jar, libbuild/org.eclipse.jgit-, libbuild/org.eclipse.jgit-, libbuild/slf4j-api-1.7.2.License, libbuild/slf4j-api-1.7.2.jar, pom.xml

Bugfixes

Thu Jun 08 07:19:16 CEST 2017
by luccioman
Properly close file output streams even in exception scenarios.
Changed Files: htroot/ConfigLanguage_p.java, source/net/yacy/cora/federate/solr/instance/EmbeddedInstance.java, source/net/yacy/cora/lod/vocabulary/Tagging.java, source/net/yacy/cora/protocol/ftp/FTPClient.java, source/net/yacy/cora/storage/ZIPWriter.java, source/net/yacy/crawler/data/Transactions.java, source/net/yacy/data/Translator.java, source/net/yacy/document/content/dao/PhpBB3Dao.java, source/net/yacy/document/parser/apkParser.java, source/net/yacy/document/parser/bzipParser.java, source/net/yacy/document/parser/gzipParser.java, source/net/yacy/http/Jetty9HttpServerImpl.java, source/net/yacy/kelondro/blob/Gap.java, source/net/yacy/kelondro/blob/HeapWriter.java, source/net/yacy/kelondro/index/BinSearch.java, source/net/yacy/kelondro/index/RowHandleMap.java, source/net/yacy/kelondro/index/RowHandleSet.java, source/net/yacy/kelondro/util/XMLTables.java, source/net/yacy/peers/operation/yacyRelease.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/search/AutoSearch.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/server/serverSwitch.java, source/net/yacy/utils/gzip.java, source/net/yacy/utils/tarTools.java, source/net/yacy/utils/translation/TranslatorXliff.java, source/net/yacy/visualization/AnimationGIF.java, source/net/yacy/visualization/AnimationPlotter.java, source/net/yacy/visualization/ChartPlotter.java, source/net/yacy/visualization/RasterPlotter.java
Tue May 30 12:32:14 CEST 2017
by luccioman
Fix unescape of URLs having some '%' chars but not percent-encoded
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
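One way to handle such URLs is to decode a '%' only when two hexadecimal digits follow it and to keep it literally otherwise. The sketch below shows that idea under the assumption that decoded bytes are UTF-8; it is not the code from MultiProtocolURL, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the MultiProtocolURL implementation.
final class LenientUnescapeSketch {

    /** Decodes %XX sequences; a '%' not followed by two hex digits is kept as is. */
    static String unescape(final String s) {
        final byte[] in = s.getBytes(StandardCharsets.UTF_8);
        final ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            if (in[i] == '%' && i + 2 < in.length) {
                final int hi = hex(in[i + 1]);
                final int lo = hex(in[i + 2]);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo); // valid escape: emit the decoded byte
                    i += 3;
                    continue;
                }
            }
            out.write(in[i]); // ordinary byte or stray '%'
            i++;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Returns the value of an ASCII hex digit, or -1 if the byte is not one. */
    private static int hex(final byte b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        return -1;
    }
}
```

A strict decoder such as java.net.URLDecoder rejects input like "100%" with an IllegalArgumentException, which is the kind of failure a lenient pass avoids.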
Tue May 30 08:48:20 CEST 2017
by luccioman
Fixed scraper NullPointerException cases on malformed URLs.
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java
Thu May 18 00:28:12 CEST 2017
by Michael Peter Christen
enhanced debugging
Changed Files: source/net/yacy/search/schema/CollectionSchema.java
Tue May 09 12:15:41 CEST 2017
by luccioman
Fixed Debian install message misspelling.
Changed Files: debian/yacy.templates
Thu May 04 08:45:30 CEST 2017
by luccioman
Fixed the previously added link to scheduled dump operations.
Changed Files: htroot/IndexImportMediawiki_p.html
Mon May 01 11:44:26 CEST 2017
by Michael Peter Christen
copied fix from yacy_grid_parser for wrong array type
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java
Mon Apr 24 13:27:07 CEST 2017
by luccioman
Fixed "Unchecked conversion" compilation warnings.
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/FlatJSONResponseWriter.java, source/net/yacy/cora/util/JSONArray.java, source/net/yacy/cora/util/JSONObject.java, source/net/yacy/document/parser/pdfParser.java, source/net/yacy/search/navigator/FileTypeNavigator.java, source/net/yacy/search/navigator/HostNavigator.java, source/net/yacy/search/navigator/StringNavigator.java, source/net/yacy/search/navigator/TokenizedStringNavigator.java, source/net/yacy/search/navigator/YearNavigator.java
Fri Apr 14 21:14:26 CEST 2017
by reger
fix unresolved_pattern on missing post parameter api/message.html
Changed Files: htroot/yacy/message.java
Thu Mar 30 15:41:14 CEST 2017
by luccioman
Fixed NPE case and API URL link on Solr HTML output for webgraph core.
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
Tue Mar 07 12:27:27 CET 2017
by luccioman
Fixed settingsAck_p.html back link for case where referrer is stripped.
Changed Files: htroot/SettingsAck_p.java
Fri Mar 03 13:46:44 CET 2017
by luccioman
Fixed unresolved pattern case on /yacysearchlatestinfo.json api
Changed Files: htroot/yacysearchlatestinfo.java
Thu Feb 16 02:36:24 CET 2017
by reger
fix NPE in HTMLResponseWriter on missing document title
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
Thu Feb 09 10:59:41 CET 2017
by luccioman
Fixed NPE case occurring when local solr index is disabled in search.
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue Jan 24 11:49:15 CET 2017
by luccioman
Index Browser : fixed display of "Count colors" for authorized users.
Changed Files: htroot/HostBrowser.java
Mon Jan 23 14:54:37 CET 2017
by luccioman
Fixed "-UNRESOLVED_PATTERN-" admin parameter in "load & index" links.
Changed Files: htroot/HostBrowser.java
Sat Jan 21 00:35:05 CET 2017
by reger
fix the missing solr-5.5.2.jar delete from prev. commit
Changed Files:
Mon Jan 09 17:59:01 CET 2017
by luccioman
Fixed 2 failing JUnit tests.
Changed Files: test/java/net/yacy/document/DateDetectionTest.java, test/java/net/yacy/utils/translation/TranslatorXliffTest.java
Mon Jan 09 09:57:53 CET 2017
by luccioman
Fixed some broken Javadoc links.
Changed Files: source/net/yacy/cora/bayes/Classifier.java, source/net/yacy/data/list/ListAccumulator.java, source/net/yacy/search/SwitchboardConstants.java
Mon Jan 09 09:54:14 CET 2017
by luccioman
Fixed maven assembly base directory to match last main YaCy binaries.
Changed Files: assembly.xml

Other Changes   

Fri Jun 09 12:50:36 CEST 2017
by Michael Peter Christen
re-added solr synchronization hack
Changed Files: source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java
Thu Jun 08 07:36:11 CEST 2017
by luccioman
Ensure system resource release by closing document stream.
Changed Files: source/net/yacy/document/TextParser.java
Tue Jun 06 10:30:02 CEST 2017
by luccioman
Removed unnecessary finalize implementation.

On such private classes with limited scope but with frequent instance
creations and removals within the application lifecycle, implementing
the finalize method is particularly unwanted as it decreases the garbage
collector performance.
What's more, the Object.finalize() method is now deprecated in JDK 9
and will eventually disappear from future releases (see
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java
Sun Jun 04 01:50:40 CEST 2017
by reger
Tokenize result entry keywords and add some styling for display
Changed Files: htroot/env/base.css, htroot/yacysearchitem.html, htroot/yacysearchitem.java
Sat Jun 03 21:58:04 CEST 2017
by reger
upd to commons-compress-1.14.jar
Changed Files: .classpath, build.xml, lib/commons-compress-1.14.License, lib/commons-compress-1.14.jar, pom.xml
Fri Jun 02 09:47:45 CEST 2017
by luccioman
Ensure closing of the ChunkIterator stream in every possible use case.

Also trace in logs any close failures instead of failing silently.
This should help prevent holding too many unreleased system file
handles, as in the case reported by eros on YaCy forum
Changed Files: source/net/yacy/kelondro/table/ChunkIterator.java, source/net/yacy/kelondro/table/Table.java
Fri Jun 02 01:46:06 CEST 2017
by luccioman
Improved consistency between loader openInputStream and load functions
Changed Files: source/net/yacy/crawler/retrieval/FTPLoader.java, source/net/yacy/crawler/retrieval/FileLoader.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/crawler/retrieval/SMBLoader.java, source/net/yacy/crawler/retrieval/StreamResponse.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/visualization/ImageViewer.java
Tue May 30 17:38:16 CEST 2017
by luccioman
Added JavaDoc to the getpageinfo_p API servlet.
Changed Files: htroot/api/getpageinfo_p.java
Tue May 30 09:29:28 CEST 2017
by luccioman
Deprecated duplicated and internally unused getpageinfo servlet.

Redirections set for the transition of any possible external uses:
 - /api/getpageinfo.xml to /api/getpageinfo_p.xml
 - /api/getpageinfo.json to /api/getpageinfo_p.json
Changed Files: htroot/api/getpageinfo.java, htroot/api/getpageinfo_p.json
Mon May 29 19:16:09 CEST 2017
by luccioman
Fixed a NullPointerException case on Digest authentication.

Could occur when upgrading from a Debian package configured with Basic
authentication (as in release 1.92.9000) to a more recent one with
Digest authentication, without having re-encoded the admin password (for
example with dpkg-reconfigure).

As reported by eros on YaCy forum
Changed Files: source/net/yacy/http/YaCyLegacyCredential.java
Wed May 24 22:13:42 CEST 2017
by reger
upd to pdfbox-2.0.6.jar
Changed Files: .classpath, build.xml, lib/fontbox-2.0.6.License, lib/fontbox-2.0.6.jar, lib/pdfbox-2.0.6.License, lib/pdfbox-2.0.6.jar, pom.xml
Wed May 24 08:43:03 CEST 2017
by luccioman
Quoted param value in Solr query to avoid unwanted traces in logs

When Webgraph Solr core is enabled, crawling and removing from index an
URL whose hash starts with the '-' character (example URL :
https://cs.wikipedia.org/ whose hash is "-2-HuTEndn4x") produced a full
ParseException stack trace in YaCy logs. This was not blocking because
the Solr query parser is able to escape the query itself and run it
successfully, but it needlessly filled the YaCy logs.
Changed Files: source/net/yacy/search/index/Fulltext.java
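A minimal sketch of the quoting idea, not the actual Fulltext.java change: wrapping the value in double quotes keeps a leading '-' from being read as a query operator. The class name, the method name and the field name source_id_s are illustrative assumptions.

```java
final class SolrTermQuoting {

    /**
     * Builds a field:"value" clause. Backslashes and double quotes inside
     * the value are escaped so the phrase stays well-formed.
     * Hypothetical helper, not part of YaCy or Solr.
     */
    static String quotedTermQuery(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // The URL hash from the commit message starts with '-'.
        // "source_id_s" is only an assumed field name for illustration.
        System.out.println(quotedTermQuery("source_id_s", "-2-HuTEndn4x"));
        // prints source_id_s:"-2-HuTEndn4x"
    }
}
```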
Tue May 23 07:25:40 CEST 2017
by luccioman
Restored search page default behavior for Tab, Page Up and Down keys

Replaced by shortcuts defined by the HTML "accesskey" attribute, which
has the advantage of being announced by screen readers when focusing the
corresponding buttons, contrary to custom JavaScript key handlers.
Now with Firefox:
 - "Alt + Shift + n" for next page
 - "Alt + Shift + p" for previous page

Following ARIA recommendation : "keyboard shortcuts enhance, not
replace, standard keyboard access." ( see

Fix for mantis 711 (http://mantis.tokeek.de/view.php?id=711)
Changed Files: htroot/js/yacysearch.js, htroot/yacysearch.html
Mon May 22 01:56:11 CEST 2017
by reger
Set request originator to own peer in warc importer
in addition to change in https://github.com/yacy/yacy_search_server/commit/039162fbf0eca808afd350d360c3bcfe62dc4195
Changed Files: source/net/yacy/document/importer/WarcImporter.java
Mon May 22 01:34:08 CEST 2017
by reger
Change warc importer to use defaultsurrogate-crawl profile. As reported
by LA_FORGE http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5990 and
analysed by @luccioman (see comment https://github.com/yacy/yacy_search_server/commit/510f11d3745e14841420781376b733fd248d51f3),
using another crawl profile without setting the originator creates a conflict.
Changed Files: source/net/yacy/document/importer/WarcImporter.java
Thu May 18 00:28:00 CEST 2017
by Michael Peter Christen
added a cache to prevent too many seed enumerations
Changed Files: source/net/yacy/peers/Seed.java, source/net/yacy/peers/SeedDB.java
Wed May 17 09:00:29 CEST 2017
by luccioman
Enable p2p and cluster communication when "Protection of all pages" on

As reported by paul89 on YaCy forum
(http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5958 ), when setting
the "Protection of all pages" to "On" in the "ConfigAccounts_p.html"
page, the peer became completely unreachable by others, which is not the
purpose of this feature.
But the restriction still makes sense as a security enforcement and is
maintained in private "Robinson mode", where any peer-to-peer or cluster
communication would be rejected anyway.
Changed Files: source/net/yacy/http/Jetty9YaCySecurityHandler.java
Tue May 16 09:44:13 CEST 2017
by luccioman
Added missing accessibility attributes on search results progress bar.
Changed Files: htroot/js/yacysearch.js, htroot/yacysearch.html
Mon May 15 13:31:24 CEST 2017
by luccioman
Annotated search result information separators for screen readers.
Changed Files: htroot/ConfigSearchPage_p.html, htroot/yacysearchitem.html
Sat May 13 20:38:25 CEST 2017
by sgaebel
added closing of lst-Tag in solr-Export
Changed Files: source/net/yacy/search/index/Fulltext.java
Thu May 11 08:33:19 CEST 2017
by luccioman
Added some JavaDoc
Changed Files: source/net/yacy/peers/RemoteSearch.java
Tue May 09 22:52:54 CEST 2017
by reger
Adjust mergeDocuments to keep youngest last-modified date of document
Changed Files: source/net/yacy/document/Document.java, test/java/net/yacy/document/DocumentTest.java
Tue May 09 18:32:47 CEST 2017
by luccioman
Fixed StringIndexOutOfBoundsException case.

Revealed by commit c77e43a : the exception was then thrown when indexing
pages containing mailto: scheme URL links with the Solr Webgraph core
Fixed the error case and restored filtering on mailto links in
Document.resortLinks() as these URLs still should not appear in
Changed Files: source/net/yacy/document/Document.java, source/net/yacy/search/schema/WebgraphConfiguration.java
Tue May 09 12:20:41 CEST 2017
by luccioman
Updated Debian package post install script admin password encoding.

To fit the now default HTTP authentication method set to Digest in
commit f7fce1b.
Also fixed unauthenticated access from localhost setting when first
installing the Debian package and letting the prompted password field
Changed Files: debian/postinst
Thu May 04 16:36:45 CEST 2017
by luccioman
Improved new blacklist entries URL scheme detection.
Changed Files: source/net/yacy/repository/BlacklistHelper.java, test/java/net/yacy/repository/BlacklistHelperTest.java
Thu May 04 11:21:27 CEST 2017
by luccioman
Updated putHTML() JavaDoc
Changed Files: source/net/yacy/server/serverObjects.java
Thu May 04 11:19:59 CEST 2017
by luccioman
Handle '?' and '+' chars as valid wild cards when adding to blacklist.

An entry such as "domain.com/[a-z]+" is a valid regular expression and
does not need additional ".*.*/.*" wildcards.
Changed Files: source/net/yacy/repository/BlacklistHelper.java
Thu May 04 11:12:58 CEST 2017
by luccioman
Fixed blacklist Regex containing '+' characters rendering.

As reported on YaCy forum by shni
(http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5970) when a
blacklist entry contained both '?' and '+' characters, the '+' chars
were wrongly decoded and rendered as spaces.
Changed Files: htroot/Blacklist_p.java
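The effect described in the entry above can be reproduced with the standard java.net classes. This is a sketch of the general form-decoding pitfall, assuming UTF-8; it is not the code of Blacklist_p.java, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class PlusSignRoundTrip {

    /** URL-encodes a value as UTF-8; '+' becomes %2B. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** URL-decodes a value as UTF-8; a raw '+' is turned into a space. */
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String entry = "domain.com/[a-z]+";
        // Decoding a value that was never encoded loses the '+'.
        System.out.println("[" + decode(entry) + "]");
        // Encoding first keeps the regular expression intact.
        System.out.println("[" + decode(encode(entry)) + "]");
    }
}
```

Decoding a value exactly once, and only if it was encoded, avoids the problem.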
Wed May 03 18:53:01 CEST 2017
by luccioman
Added MediaWiki dump import scheduling feature.

Checking the last modified date by default to prevent unnecessary long
running operations.
Changed Files: htroot/IndexImportMediawiki_p.html, htroot/IndexImportMediawiki_p.java, source/net/yacy/data/WorkTables.java
Tue May 02 09:38:45 CEST 2017
by luccioman
Improved MediaWiki dump import monitoring.

When import thread is terminated :
 - now stop refreshing and stay on the monitoring page to give user a
feedback after a long running import
 - added link to the next monitoring step : results from surrogates
 - added link to new import
On the new import page, added a link to the last import report, if any.
Changed Files: htroot/IndexImportMediawiki_p.html, htroot/IndexImportMediawiki_p.java
Tue May 02 09:33:11 CEST 2017
by luccioman
Added some JavaDoc
Changed Files: source/net/yacy/document/importer/Importer.java
Tue May 02 09:32:04 CEST 2017
by luccioman
Fixed regression introduced by commit 9ad4d16

On MediaWiki dump imports, the SurrogateReader was trying to unread too
many bytes, then failing with the following exception :
"java.io.IOException: Push back buffer is full".
Changed Files: source/net/yacy/document/content/SurrogateReader.java
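For readers unfamiliar with this exception: java.io.PushbackInputStream throws it when more bytes are unread than its push back buffer can hold. The sketch below only illustrates that behavior with made-up sizes; it does not reproduce the SurrogateReader code, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

final class PushbackDemo {

    /**
     * Reads bytesToUnread bytes from a 16-byte source, then pushes them
     * back into a buffer of bufferSize bytes.
     * Returns null on success, or the exception message on failure.
     */
    static String tryUnread(int bufferSize, int bytesToUnread) {
        byte[] source = new byte[16];
        try (PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(source), bufferSize)) {
            byte[] chunk = new byte[bytesToUnread];
            int read = in.read(chunk, 0, chunk.length);
            in.unread(chunk, 0, read);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryUnread(4, 4)); // fits into the buffer
        System.out.println(tryUnread(4, 8)); // more than the buffer can hold
    }
}
```

Sizing the push back buffer to the largest chunk that may be unread, or unreading fewer bytes, avoids the failure.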
Mon May 01 11:38:02 CEST 2017
by Michael Peter Christen
added patch to rewrite altered yacy grid schema into yacy schema

This generates the stub and protocol parts of an url for inboundlinks,
outboundlinks and images
Changed Files: source/net/yacy/search/Switchboard.java
Sun Apr 30 23:53:52 CEST 2017
by reger
Add a responseHeader to the solr index export with a format identifier
and export parameter (in accordance with response xml format) for easier
format detection on import.
Changed Files: source/net/yacy/document/content/DCEntry.java, source/net/yacy/document/content/SurrogateReader.java, source/net/yacy/search/index/Fulltext.java
Fri Apr 28 11:39:51 CEST 2017
by luccioman
Fixed Index Export feature for compatibility with old indexed documents.

This is a fix for mantis 682 (http://mantis.tokeek.de/view.php?id=682)
and issue #116
Changed Files: source/net/yacy/search/index/Fulltext.java
Fri Apr 28 11:36:48 CEST 2017
by luccioman
Added some JavaDoc
Changed Files: source/net/yacy/cora/federate/solr/SchemaDeclaration.java
Thu Apr 27 18:24:54 CEST 2017
by luccioman
Crawl results page : apply table lines number limit.

Take into account the already existing default limit value (especially
useful after a long crawl or surrogates import), or a custom one from
parameter "count".
Added a "Show all" link for convenience.
Changed Files: htroot/CrawlResults.html, htroot/CrawlResults.java
Thu Apr 27 09:50:04 CEST 2017
by luccioman
Extended WikiCode template inclusion syntax support.

Wiki templates are not rendered but syntax support is improved, which
greatly enhances snippet rendering on search results coming from a
MediaWiki dump import.
Tested on various dumps from Wikimedia at
See also Wikipedia transclusion documentation at
Changed Files: source/net/yacy/data/wiki/WikiCode.java, test/java/net/yacy/data/wiki/WikiCodeTest.java
Tue Apr 25 08:44:02 CEST 2017
by Michael Peter Christen
added yacy grid flatjson surrogate parser
Changed Files: source/net/yacy/search/Switchboard.java, source/net/yacy/search/schema/CollectionSchema.java
Mon Apr 24 18:24:26 CEST 2017
by luccioman
Fixed surrogates import monitoring page (/CrawlResults.html?process=7)

This page was always empty, as described in mantis 740
Changed Files: source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/search/Switchboard.java
Sat Apr 22 23:32:40 CEST 2017
by reger
upd to jwat-1.0.5
Changed Files: .classpath, build.xml, lib/jwat-archive-common-1.0.5.jar, lib/jwat-common-1.0.5.jar, lib/jwat-gzip-1.0.5.jar, lib/jwat-warc-1.0.5.jar, pom.xml
Thu Apr 20 00:47:52 CEST 2017
by reger
fix unit test MultiProtocolURL(file) assertion for Windows path with
drive letter.
Changed Files: test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
Thu Apr 20 00:18:18 CEST 2017
by reger
Remove mailto collection from internally parsed documents
As earlier plans to make use of mailto as a separate webgraph entity didn't
materialize (see  http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5726&p=32493&hilit=mailto#p32493),
free the unused handling and resources.
Changed Files: htroot/ViewFile.java, source/net/yacy/document/Document.java
Sun Apr 16 04:25:29 CEST 2017
by reger
Add url input field as source for WarcImporter
allowing to import warc from url without prior download.
Changed Files: htroot/IndexImportWarc_p.html, htroot/IndexImportWarc_p.java, source/net/yacy/document/importer/WarcImporter.java
Fri Apr 14 14:23:50 CEST 2017
by luccioman
Improved http client close time on stream processing errors.
Changed Files: source/net/yacy/cora/protocol/http/HTTPClient.java
Wed Apr 12 17:17:03 CEST 2017
by luccioman
Fixed endless loop case in wikicode processing.

Detected when importing recent MediaWiki dumps containing some pages
with script content in plain text format (see Scribunto extension
https://www.mediawiki.org/wiki/Extension:Scribunto ).

Further improvement : modify the MediawikiImporter to prevent processing
revisions whose <model> is not wikitext.
Changed Files: source/net/yacy/data/wiki/WikiCode.java, test/java/net/yacy/data/wiki/WikiCodeTest.java
Wed Apr 12 09:23:10 CEST 2017
by luccioman
Improved support for non ASCII chars in local file system URLs

Creating a MultiProtocolURL instance from a File object and then
retrieving a File with getFSFile() was inconsistent with file paths
containing space or non ASCII chars. 
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
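The kind of round trip described in the entry above can be illustrated with the JDK's own File and URI classes. This sketch does not use MultiProtocolURL or getFSFile(); the class name, method names and the sample path are assumptions chosen for illustration.

```java
import java.io.File;
import java.net.URI;

final class FileUrlRoundTrip {

    /** Converts a file to an absolute file: URI (spaces become %20). */
    static URI toUri(File file) {
        return file.toURI();
    }

    /** Converts a file: URI back to a File, decoding escaped characters. */
    static File fromUri(URI uri) {
        return new File(uri);
    }

    public static void main(String[] args) {
        // Path with a space and a non ASCII char (u umlaut, written as an escape)
        File original = new File("data dir/f\u00fcr.txt");
        URI uri = toUri(original);
        // ASCII-only form: non ASCII chars are percent-encoded as UTF-8
        System.out.println(uri.toASCIIString());
        // The round trip should give back the same absolute path
        System.out.println(fromUri(uri).equals(original.getAbsoluteFile()));
    }
}
```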
Tue Apr 11 08:21:34 CEST 2017
by luccioman
Improved error reports on various wiki dump prerequisites failure cases.

Also added some JavaDoc.
Changed Files: htroot/IndexImportMediawiki_p.html, htroot/IndexImportMediawiki_p.java
Tue Apr 11 07:34:17 CEST 2017
by luccioman
Used a text input for wiki dump import file selection.

Using an HTML "file" input was confusing (as reported by promocore on
YaCy forum : http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5965) ,
and it only worked with MS IE/Edge on a local YaCy peer :
 - for security reasons some current major browsers such as Firefox or
Chrome do not allow to send full file path information when using a file
form input
 - the local file system selection popup doesn't make sense when you
want to import a dump on a remote YaCy server
Changed Files: htroot/IndexImportMediawiki_p.html
Mon Apr 10 22:58:20 CEST 2017
by reger
Adjust ConfigSearchPage_p to activated hosts navigator as plugin
Changed Files: htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java
Mon Apr 10 22:42:06 CEST 2017
by reger
Activate hosts navigator plugin. This includes rwi results in the navigator
This might be tangentially related to http://mantis.tokeek.de/view.php?id=736
as the example includes a local index search, while rwi results are not
Changed Files: htroot/yacysearchtrailer.html, htroot/yacysearchtrailer.java, htroot/yacysearchtrailer.json, htroot/yacysearchtrailer.xml, source/net/yacy/search/navigator/NavigatorPlugins.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/SearchEvent.java
Sun Apr 09 21:42:05 CEST 2017
by reger
add missing text from ConfigRobotsTxt_p to master.lng
and link to Translation Editor to Translation News page.
Changed Files: htroot/TransNews_p.html, locales/master.lng.xlf
Sun Apr 09 02:09:32 CEST 2017
by reger
add servlet to list users in UserDB and make the user editor available in
a separate servlet, for a quick and easy overview of configured users and
selection for editing.
Changed Files: htroot/ConfigAccountList_p.html, htroot/ConfigAccountList_p.java, htroot/ConfigAccounts_p.html, htroot/ConfigAccounts_p.java, htroot/ConfigUser_p.html, htroot/ConfigUser_p.java
Sat Apr 08 22:54:57 CEST 2017
by reger
fix edit current user form to use the required post method
introduced with https://github.com/yacy/yacy_search_server/commit/cde237b68763c542da20038e5f62bea341ae1d37
Changed Files: htroot/ConfigAccounts_p.html, htroot/ConfigAccounts_p.java
Fri Apr 07 09:15:05 CEST 2017
by Michael Peter Christen
added flatjson parser (stub, unfinished)
Changed Files: source/net/yacy/search/Switchboard.java
Wed Apr 05 00:08:25 CEST 2017
by reger
Introduce a Keyword search navigator using the index field keywords.
The keywords field string is split into words as navigator entries.

A keyword navigator facet is essential for search appliance usage, where
documents and metadata often use specialized keyword vocabularies to
filter search results. This navigator can be used without a custom index schema.

As we have not yet defined a search query command to filter "keywords",
the filtering is limited to adding the keyword to the search query.
Changed Files: source/net/yacy/search/navigator/NavigatorPlugins.java, source/net/yacy/search/navigator/TokenizedStringNavigator.java
Mon Apr 03 22:53:07 CEST 2017
by reger
add CookieTest_p.html text to master.lng
Changed Files: locales/master.lng.xlf
Mon Apr 03 12:20:16 CEST 2017
by luccioman
Enforced access controls on a few more administration pages.

 - ensure use of HTTP POST method for requests with server-side effects
 - transaction token required to ensure the request has effectively been
requested by user interaction
Changed Files: htroot/ConfigPortal_p.html, htroot/ConfigPortal_p.java, htroot/Table_API_p.html, htroot/Table_API_p.java, htroot/Translator_p.html, htroot/Translator_p.java
Mon Apr 03 11:40:37 CEST 2017
by luccioman
Escaped potentially active HTML content from recorded API call comments.
Changed Files: htroot/Table_API_p.java
Sun Apr 02 22:30:23 CEST 2017
by reger
update master.lng with recent text changes 
to IndexExport_p.html, IndexImportWarc_p.html
Changed Files: locales/master.lng.xlf
Sun Apr 02 20:36:22 CEST 2017
by reger
use css error class for error msg in IndexImportOAIPMH_p.html,
adjust to xhtml <p> usage rule
Changed Files: htroot/IndexImportOAIPMH_p.html
Sun Apr 02 03:59:37 CEST 2017
by reger
remove test case for Standard_MemoryControl which will always fail
see https://github.com/yacy/yacy_search_server/pull/114
Changed Files:
Sun Apr 02 03:32:21 CEST 2017
by reger
Add servlet to import warc file from filesystem IndexImportWarc_p.html.
Apply Importer interface to WarcImporter
Changed Files: htroot/IndexImportWarc_p.html, htroot/IndexImportWarc_p.java, htroot/env/templates/submenuIndexImport.template, source/net/yacy/document/importer/WarcImporter.java, source/net/yacy/search/Switchboard.java
Sat Apr 01 01:04:17 CEST 2017
by Michael Peter Christen
added export to elasticsearch. The export dump can easily be imported to
elasticsearch using the command
curl -XPOST localhost:9200/collection1/yacy/_bulk --data-binary
Changed Files: htroot/IndexExport_p.html, htroot/IndexExport_p.java, source/net/yacy/cora/federate/solr/responsewriter/FlatJSONResponseWriter.java, source/net/yacy/search/index/Fulltext.java
Thu Mar 30 16:14:22 CEST 2017
by luccioman
URL Viewer : only display the link to metadata when metadata exists
Changed Files: htroot/ViewFile.html, htroot/ViewFile.java
Thu Mar 30 10:23:47 CEST 2017
by luccioman
Modified RWI settings page radio click event to use HTTP POST
Changed Files: htroot/IndexControlRWIs_p.html, locales/de.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Thu Mar 30 09:22:28 CEST 2017
by luccioman
Updated API calls recording/replay with recent changes.

 - enabled HTTP POST calls with Digest HTTP authentication
 - made API calls compatible with API newly restricted to HTTP POST only
with transaction token validation
 - ensured backward compatibility with older entries recorded as HTTP
Changed Files: htroot/CrawlStartScanner_p.java, source/net/yacy/data/WorkTables.java
Sun Mar 26 23:52:31 CEST 2017
by reger
fix default/httpd.mime Z file extension to lower case
+ test case
Changed Files: defaults/httpd.mime, test/java/net/yacy/cora/document/analysis/ClassificationTest.java
Sun Mar 26 23:26:40 CEST 2017
by reger
remove seedlist bootstrap target (not working for quite some time)
Changed Files: defaults/yacy.network.freeworld.unit
Sun Mar 26 23:13:12 CEST 2017
by reger
Add label text for search word statistic (AccessTracker_p.html) to master
lng file
Changed Files: locales/master.lng.xlf
Sun Mar 26 20:05:48 CEST 2017
by reger
One more use of SwitchboardConstants.SERVER_PORT constant,
apply standard servlet design pattern initialization of solrselectservlet 
Changed Files: source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java
Sun Mar 26 11:29:04 CEST 2017
by luccioman
Extended Apache HTTP Digest Auth. for use of YaCy encoded password

When programmatically requesting the local peer with Apache http client,
authentication credentials must be passed as clear-text values. 
This extension to the apache org.apache.http.impl.auth.DigestScheme
permits use of the YaCy encoded password stored in the
adminAccountBase64MD5 configuration property.
Changed Files: source/net/yacy/cora/protocol/http/HTTPClient.java, source/net/yacy/cora/protocol/http/auth/HttpEntityDigester.java, source/net/yacy/cora/protocol/http/auth/YaCyDigestScheme.java, source/net/yacy/cora/protocol/http/auth/YaCyDigestSchemeFactory.java
Sun Mar 26 10:59:04 CEST 2017
by luccioman
Updated dump/restore shell scripts : the API is now IndexExport_p.html
Changed Files: bin/indexdump.sh, bin/indexrestore.sh
Tue Mar 21 01:16:16 CET 2017
by reger
Update master lng file with added text in Settings_ServerAccess
remove outdated file entry in fr.lng & sk.lng
Changed Files: README.md, locales/fr.lng, locales/master.lng.xlf, locales/sk.lng
Mon Mar 20 02:33:21 CET 2017
by reger
Add hint how to build with maven (for the first time) to readme
Changed Files: README.md
Sun Mar 19 21:45:33 CET 2017
by reger
Add hint text to default ServerAccess Port Settings page
Changed Files: htroot/Settings_ServerAccess.inc
Sun Mar 19 07:12:35 CET 2017
by reger
Display the local search word statistic in alphabetic order
Changed Files: htroot/AccessTracker_p.java, source/net/yacy/cora/sorting/OrderedScoreMap.java
Sat Mar 18 20:32:53 CET 2017
by reger
upd to slf4j-1.7.24.jar
Changed Files: .classpath, build.xml, lib/jcl-over-slf4j-1.7.24.jar, lib/log4j-over-slf4j-1.7.24.jar, lib/slf4j-api-1.7.24.jar, lib/slf4j-jdk14-1.7.24.jar, pom.xml
Sat Mar 18 20:06:58 CET 2017
by reger
upd to icu4j-58_2.jar
Changed Files: .classpath, build.xml, lib/icu4j-58_2.jar, pom.xml
Fri Mar 17 02:19:33 CET 2017
by reger
update to jsoup-1.10.2.jar
Changed Files: .classpath, build.xml, lib/jsoup-1.10.2.jar, pom.xml
Fri Mar 17 02:07:02 CET 2017
by reger
update to jsch-0.1.54.jar
Changed Files: .classpath, build.xml, lib/jsch-0.1.54.License, lib/jsch-0.1.54.jar, pom.xml
Wed Mar 15 22:36:53 CET 2017
by reger
update translation for ConfigNetwork_p.html
Changed Files: htroot/ConfigNetwork_p.html, locales/de.lng, locales/master.lng.xlf
Wed Mar 15 01:39:15 CET 2017
by reger
make digest default authentication in defaults/web.xml
Changed Files: defaults/web.xml
Mon Mar 13 03:08:44 CET 2017
by reger
remove double occurrence of geo:lat in rss tokens
Changed Files: source/net/yacy/cora/document/feed/RSSMessage.java
Mon Mar 13 00:34:40 CET 2017
by reger
upd to metadata-extractor-2.10.1.jar
Changed Files: .classpath, build.xml, lib/metadata-extractor-2.10.1.License, lib/metadata-extractor-2.10.1.jar, pom.xml
Sun Mar 12 01:54:56 CET 2017
by reger
implement RequestHeader getRequestURI, getRequestURL for legacy request
Changed Files: source/net/yacy/cora/protocol/RequestHeader.java
Thu Mar 09 22:57:51 CET 2017
by reger
remove unused import pdfParser
Changed Files: source/net/yacy/document/parser/pdfParser.java
Thu Mar 09 22:56:33 CET 2017
by reger
Improve pdf text extraction resource handling.
For short pdf <= 3 pages use already extracted content,
only for long pdf > 3 pages reassign content and close internal writer (to free buffers directly)
Changed Files: source/net/yacy/document/parser/pdfParser.java
Thu Mar 09 22:50:19 CET 2017
by reger
upd to pdfbox-2.0.4.jar
Changed Files: .classpath, build.xml, lib/fontbox-2.0.4.License, lib/fontbox-2.0.4.jar, lib/pdfbox-2.0.4.License, lib/pdfbox-2.0.4.jar, pom.xml
Thu Mar 09 01:42:36 CET 2017
by reger
eliminate some compiler unchecked and deprecation warnings
in nav plugins by explicit type declaration and replacing date.getYear
with Calendar.get
Changed Files: source/net/yacy/search/navigator/NavigatorPlugins.java, source/net/yacy/search/navigator/YearNavigator.java
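The getYear replacement mentioned above typically looks like the following. This is a generic sketch, not the YearNavigator code; the class and method names are hypothetical, and passing the time zone explicitly is an assumption made to keep the result deterministic.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class YearOfDate {

    /**
     * Returns the full calendar year of the given date in the given zone.
     * Replaces the deprecated Date.getYear(), which returns the year
     * minus 1900.
     */
    static int yearOf(Date date, TimeZone zone) {
        Calendar calendar = new GregorianCalendar(zone);
        calendar.setTime(date);
        return calendar.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        // 1489017600000 ms after the epoch is 2017-03-09 00:00:00 UTC
        System.out.println(yearOf(new Date(1489017600000L), utc)); // prints 2017
    }
}
```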
Wed Mar 08 22:35:48 CET 2017
by reger
upd to httpclient v4.5.3
Changed Files: .classpath, build.xml, lib/httpclient-4.5.3.jar, lib/httpcore-4.4.6.License, lib/httpcore-4.4.6.jar, lib/httpmime-4.5.3.jar, pom.xml
Wed Mar 08 10:27:18 CET 2017
by luccioman
Fixed unresolved pattern case in search results progress bar.

This is a fix for mantis 715 (http://mantis.tokeek.de/view.php?id=715).

A possible scenario that could lead to this case:
 - YaCy is running low in memory
 - a search is requested
 - before the end of search results rendering, the cleanup job runs and
deletes the running search event from the cache because of short memory
 - then yacysearchitem renders with "-UNRESOLVED_PATTERN-" parameter
values passed to the statistics() JavaScript function
Changed Files: htroot/yacysearchitem.html, htroot/yacysearchitem.java
Sun Mar 05 02:26:10 CET 2017
by reger
Extend DCEntry.getLanguage convert to ISO639-1 codes for more languages
by using icu.ULocale for languages not already covered (ICU normalizes 
to ISO639-1 2 char codes).
Add test class
Use DublinCore vocabulary declarations in DCEntry and SurrogateReader 
for easier usage debugging, 
Init SurrogateReader.inputSource on first use.

Changed Files: source/net/yacy/document/content/DCEntry.java, source/net/yacy/document/content/SurrogateReader.java, test/java/net/yacy/document/content/DCEntryTest.java
Sat Mar 04 22:45:17 CET 2017
by reger
further avoid to set connect info properties as header value
following comment "use of properties as header values is discouraged"
in case where (proxy)HTTPClient overwrites values with supplied url.
Use defined request.referer procedure in response class.
Changed Files: source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/http/servlets/UrlProxyServlet.java, source/net/yacy/http/servlets/YaCyProxyServlet.java, source/net/yacy/server/http/HTTPDProxyHandler.java
Sat Mar 04 19:41:31 CET 2017
by reger
use pre-defined "Connection" header key, replace deprecated
Changed Files: source/net/yacy/cora/federate/solr/instance/RemoteInstance.java, source/net/yacy/cora/protocol/http/HTTPClient.java
Fri Mar 03 12:05:30 CET 2017
by luccioman
Added an advanced settings page for referrer policy settings.

Feedback will be welcome, notably on the descriptive content of this
Changed Files: htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_Referrer.inc, htroot/Settings_p.html, htroot/Settings_p.java, source/net/yacy/http/ReferrerPolicy.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/search/SwitchboardConstants.java
Fri Mar 03 00:21:56 CET 2017
by reger
fix proxyservlet response url to respect http scheme if a relative 
Location header is returned.
Changed Files: source/net/yacy/http/servlets/UrlProxyServlet.java, source/net/yacy/http/servlets/YaCyProxyServlet.java
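One way to resolve a relative Location header against the request URL, so that the original scheme is kept, is java.net.URI.resolve. This is a sketch under that assumption and may differ from what the proxy servlets actually do; the class name, method name and example.org URLs are illustrative.

```java
import java.net.URI;

final class LocationResolver {

    /**
     * Resolves a Location header value against the URL that was requested.
     * Relative values inherit scheme and host from the request URL,
     * absolute values are returned unchanged.
     */
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        String request = "https://example.org/a/b.html";
        System.out.println(resolveLocation(request, "/login"));
        // prints https://example.org/login
        System.out.println(resolveLocation(request, "c.html"));
        // prints https://example.org/a/c.html
    }
}
```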
Wed Mar 01 09:43:00 CET 2017
by luccioman
Updated Archive-It heuristics URL.

The archive-it OpenSearch URL requested without restriction on
collections ("i" parameter) almost always ends up with timeout or fails.
Changed Files: defaults/heuristicopensearch.conf
Mon Feb 27 23:00:46 CET 2017
by reger
fixed ReindexSolrBusyThread new and unexpected repeat of same query with
low number of found documents - by adding additional end condition to
remove processed query with number of found docs <= process-chunk-size.

Noticed on query h4_txt:[* TO *], found 21, process 21, call of commit happened
but on next cycle same query again 21 docs found (while h4_txt was removed
from schema and committed inputdocuments).
Changed Files: source/net/yacy/search/index/ReindexSolrBusyThread.java
Mon Feb 27 01:04:31 CET 2017
by reger
fix delta time calculation in PerformanceSearch_p for the first entry
(INITIALIZATION displayed absolute date, set delta to 0 for first entry)
Changed Files: htroot/PerformanceSearch_p.java
Sun Feb 26 11:03:15 CET 2017
by luccioman
Fixed datacite.org heuristics base url.

The datacite Solr search http URL was returning http status 301 in order
to redirect to its https version, thus making that YaCy heuristic always
Changed Files: defaults/federatecfg/datacite.solr.schema
Sun Feb 26 02:39:52 CET 2017
by reger
Adjust DefaultServlet test case to recent change,
deprecate unused CONNECTION_PROP_PROTOCOL (also as it might be
misleading with getProtocol vs getScheme)
Changed Files: source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/cora/protocol/RequestHeader.java, test/java/net/yacy/http/servlets/YaCyDefaultServletTest.java
Sat Feb 25 23:55:17 CET 2017
by reger
Fix call parameter for ConnectionInfo in MonitorHandler
(expected scheme e.g. http, was protocol version).
Deprecate obsolete custom X-...-Scheme header constant.
Use existing FORMAT_ANSIC Dateformatter in HeaderFramework.
Correct htmlParserTest (removed one unintended println)
Changed Files: source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/cora/protocol/RequestHeader.java, source/net/yacy/http/MonitorHandler.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, test/java/net/yacy/document/parser/htmlParserTest.java
Fri Feb 24 11:09:42 CET 2017
by luccioman
Added a hint title for required fields in the Solr Schema editor
Changed Files: htroot/IndexSchema_p.html
Fri Feb 24 11:08:18 CET 2017
by luccioman
Switched a few more Solr fields from strictly mandatory to optional
Changed Files: defaults/solr.collection.schema, source/net/yacy/search/schema/CollectionSchema.java
Fri Feb 24 01:25:32 CET 2017
by reger
fix htmlParser <script> text extraction on code containing expression
recognized as tag like 1<a
reported in https://github.com/yacy/yacy_search_server/issues/109

Script content is ignored by default, but the text is filtered for html
tags. Modified scraper to skip tag filtering while within a <script>
section (until a closing </script> tag is detected).
Possible side effect: a missing </script> end-tag will truncate trailing
content text.
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/html/TransformerWriter.java, test/java/net/yacy/document/parser/htmlParserTest.java
Thu Feb 23 11:09:43 CET 2017
by luccioman
Improved MultiProtocolURL non ASCII characters support.

After @sinkuu Pull Request #108 added JUnit tests, updated some JavaDoc
and also improved URL tokenization to support non ASCII characters.
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
Thu Feb 23 07:52:55 CET 2017
by luccioman
Merge pull request #110 from goofy-bz/patch-1

Fixing some typos
Changed Files: locales/fr.lng
Thu Feb 23 01:13:31 CET 2017
by goofy-bz
Fixing some typos

up to line #1000 only
Changed Files: locales/fr.lng
Thu Feb 23 00:27:56 CET 2017
by reger
Correct dublincore title property text to lowercase in htmlresponsewriter,
remove unused (carry over) local variable
Do the same for other responsewriter.
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/EnhancedXMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java
Wed Feb 22 02:01:48 CET 2017
by Burkhard
Update SearchEvent.java

Fix NPE on disabled local SolrIndex, occurring on search moving to the 2nd result page.
The debug-only setting for disabling the local SolrIndex (System Admin -> Debug Settings) should probably be removed from production code in the long term.
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue Feb 21 22:59:11 CET 2017
by luccioman
Switched some Solr fields from mandatory to optional

These fields are enabled by default but are undoubtedly not strictly
mandatory with the current code base.

As reported by @reger24, splitting between essential mandatory and
optional fields is still to be improved to reflect the current YaCy
Changed Files: defaults/solr.collection.schema, source/net/yacy/search/schema/CollectionSchema.java
Mon Feb 20 23:27:33 CET 2017
by reger
Add extract of queries.log in form of top search word cloud (last 7 days)
to AccessTracker_p.html (Network Access -> Local Search Log page).
It displays top 20 words of search queries.
Changed Files: htroot/AccessTracker_p.html, htroot/AccessTracker_p.java
Mon Feb 20 00:14:14 CET 2017
by reger
correct fromDate init value on missing param in api/timeline_p servlet
revert test modification from last commit in AccessTracker.main
Changed Files: htroot/api/timeline_p.java, source/net/yacy/search/query/AccessTracker.java
Sun Feb 19 05:23:17 CET 2017
by reger
add hint of query syntax in AccessTracker log (qs=normal querystring,
sq=solr-querystring) to allow to filter simple text queries for processing,
remove toString for counter parameter
use more predefined constants in solrservlet
Changed Files: source/net/yacy/http/servlets/GSAsearchServlet.java, source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/search/query/AccessTracker.java
Fri Feb 17 11:09:30 CET 2017
by luccioman
Fixed a NullPointerException case possible on Index Export

As reported by Palulukas in YaCy forum
the Index Export operation can fail, notably when the Solr index
contains one or more documents with empty (despite required)
"load_date_dt" field.

This fixes the export failure when the situation finally occurs, but
more should be done to harden verifications on minimum required fields.
Changed Files: source/net/yacy/search/index/Fulltext.java
Thu Feb 16 01:43:14 CET 2017
by reger
Reduce self generated content for text_t (visible text index field) 
to avoid repeat of tokenized url as description,
continuation of https://github.com/yacy/yacy_search_server/commit/7e09bff4a1a117d2f2336e004ec67ffb325a7e9d
Add some javadoc, and not needed remove of omitted fields in postprocessing.
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java
Wed Feb 15 23:26:54 CET 2017
by reger
removed faroo news from default opensearch config
As @luccioman informed, it's only useable with a free api key
Changed Files: defaults/heuristicopensearch.conf
Wed Feb 15 15:04:40 CET 2017
by luccioman
Added robots.txt support for heuristics federated search.

As noticed by @reger24, abusive use of OpenSearch systems should be
prevented, especially if allowing to parse and reuse HTML results.
The robots.txt file is now checked before requesting an external OpenSearch
system, to respect the host exclusions and any crawl-delay value.
The check is also performed when trying to add a new OpenSearch URL
template through the /ConfigHeuristics_p.html admin page.
Changed Files: htroot/ConfigHeuristics_p.java, source/net/yacy/cora/federate/FederateSearchManager.java
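The robots.txt check described in the entry above could look roughly like this. It is a simplified sketch, not the FederateSearchManager code: `RobotsSketch` and its members are hypothetical names, only the `*` user-agent group is read, and path matching is plain prefix matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class RobotsSketch {
    final List<String> disallowed = new ArrayList<>();
    Integer crawlDelaySeconds; // null when no Crawl-delay line applies

    static RobotsSketch parse(String robotsTxt) {
        RobotsSketch rules = new RobotsSketch();
        boolean inWildcardGroup = false;
        boolean lastWasUserAgent = false;
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceFirst("#.*$", "").trim(); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                // Consecutive User-agent lines share one group of rules.
                inWildcardGroup = value.equals("*") || (lastWasUserAgent && inWildcardGroup);
                lastWasUserAgent = true;
                continue;
            }
            lastWasUserAgent = false;
            if (!inWildcardGroup) continue;
            if (key.equals("disallow") && !value.isEmpty()) {
                rules.disallowed.add(value);
            } else if (key.equals("crawl-delay")) {
                try {
                    rules.crawlDelaySeconds = Integer.valueOf(value);
                } catch (NumberFormatException e) {
                    // malformed delay: ignored in this sketch
                }
            }
        }
        return rules;
    }

    /** True when no Disallow prefix of the wildcard group matches the path. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RobotsSketch rules = parse("User-agent: *\nDisallow: /private\nCrawl-delay: 2\n");
        System.out.println(rules.isAllowed("/opensearch?q=yacy") + ", delay=" + rules.crawlDelaySeconds);
    }
}
```

A caller would fetch the host's robots.txt, skip the OpenSearch request when `isAllowed` returns false, and otherwise wait for the crawl delay between requests.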
Sat Feb 11 08:10:14 CET 2017
by sinkuu
Use java.net.URLDecoder
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java
Tue Feb 14 02:30:26 CET 2017
by reger
adjust translation to renamed configparser_p.html
Changed Files: locales/cn.lng, locales/de.lng, locales/hi.lng, locales/ja.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Tue Feb 14 02:04:42 CET 2017
by reger
make ConfigParser a protected page, for consistent behavior of locked
menu items.
Changed Files: htroot/ConfigParser_p.html, htroot/ConfigParser_p.java, htroot/env/templates/submenuCrawler.template
Tue Feb 14 00:31:32 CET 2017
by reger
update opensearch conf - remove suche.sueddeutsche.de
apparently they have withdrawn from the OpenSearch initiative.
Changed Files: defaults/heuristicopensearch.conf
Fri Feb 10 09:40:42 CET 2017
by luccioman
Upgraded Apache Ant to 1.10.1 in the Docker alpine flavor image

For a more reliable Docker image build, also switched to the ant archive
repository to fetch the needed binary as other repositories only provide
the latest versions.
Changed Files: docker/Dockerfile.alpine
Thu Feb 09 16:42:21 CET 2017
by luccioman
Replaced absolute redirection locations by relative ones when possible.

This makes integration of YaCy behind a reverse proxy subfolder easier.
Changed Files: htroot/Blacklist_p.java, htroot/Status.java, htroot/Wiki.java, source/net/yacy/repository/BlacklistHelper.java
Mon Feb 06 12:41:24 CET 2017
by luccioman
Improved termination of timed out remote solr requests to peers.

On timeout, closing remote Solr requests is more appropriate than simply
using Thread.interrupt(), which is not effective in most cases. Closing does
not request a commit on the remote Solr, but releases HTTP connection
resources and is more likely to end threads that could otherwise wait
indefinitely.

Other related improvements include:
 - a remote peer is no longer marked as unavailable when a remote search is
interrupted before timeout by the cleanup job.
 - added a short fine log level trace of failing remote solr requests
Changed Files: source/net/yacy/peers/Protocol.java
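To illustrate the point made in the entry above, here is a small sketch with plain `java.net` sockets instead of Solr connections (an assumption made to keep the example self-contained; class and method names are hypothetical). A thread blocked in a socket read is released when the socket is closed from another thread.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class CloseOnTimeoutSketch {

    /** Returns true when the blocked reader thread ended after its socket was closed. */
    static boolean readerEndsAfterClose(long timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket silentPeer = server.accept()) {
            Thread reader = new Thread(() -> {
                try {
                    client.getInputStream().read(); // blocks: the peer never answers
                } catch (IOException e) {
                    // expected once the socket is closed from another thread
                }
            }, "demo-remote-request");
            reader.start();
            reader.join(timeoutMillis); // wait for an answer until the timeout
            client.close();             // release the connection and unblock the read
            reader.join(2000);
            return !reader.isAlive();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reader ended after close: " + readerEndsAfterClose(200));
    }
}
```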
Fri Feb 03 10:32:31 CET 2017
by luccioman
Removed deprecated "localMissCount" prop from yacysearchlatestinfo.json.

This property was deprecated four years ago by commit
d74472f5625ff097e7541e1a56156cbe487b2651. For any active search event
id, it was then always filled with "-UNRESOLVED_PATTERN-".
Changed Files: htroot/yacysearchlatestinfo.java, htroot/yacysearchlatestinfo.json
Fri Feb 03 09:55:08 CET 2017
by luccioman
Named a Thread without name for easier monitoring
Changed Files: source/net/yacy/search/query/SearchEvent.java
Fri Feb 03 09:54:29 CET 2017
by luccioman
Distinguished solr connectors thread names for easier monitoring.
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/RemoteSolrConnector.java
Wed Feb 01 18:44:42 CET 2017
by luccioman
Refactored the DHT-Trigger section in Performance_p.html page.

This makes the section easier to understand and reflects more accurately
the current memory strategy implementations, which may set the
"proper" state for reasons other than DHT reception.
Changed Files: htroot/Performance_p.html, locales/cn.lng, locales/de.lng, locales/fr.lng, locales/master.lng.xlf, locales/ru.lng, locales/uk.lng
Tue Jan 31 16:33:17 CET 2017
by luccioman
Updated French translation for the /Performance_p.html page.

Also updated the master xliff file with missing recent changes.
Changed Files: locales/fr.lng, locales/master.lng.xlf
Tue Jan 31 09:20:19 CET 2017
by luccioman
Fixed unresolved pattern on directory entries in HostBrowser.xml api.

As described in mantis 725 (http://mantis.tokeek.de/view.php?id=725), the
HostBrowser.xml api directory entries had an incorrect count attribute.
This was because the HostBrowser html page and backing template servlet
evolved, but the modifications were not carried over to the xml api.
Changed Files: htroot/HostBrowser.xml
Mon Jan 30 22:44:28 CET 2017
by reger
adjust column layout in Settings_Proxy.inc
Changed Files: htroot/Settings_Proxy.inc
Sat Jan 28 10:19:39 CET 2017
by luccioman
Added a CSS class for infobox block.

This will prevent mistakenly hiding a div element not designed to be an
infobox but having a ".info" parent (After having previously added the
possibility for a div - and not only a span element - to be an infobox).
Changed Files: htroot/Performance_p.html, htroot/env/base.css
Sat Jan 28 01:13:57 CET 2017
by reger
Update language file de & master, remove obsolete "Augmented Browsing"
Changed Files: locales/de.lng, locales/master.lng.xlf
Sat Jan 28 00:36:03 CET 2017
by reger
Add consistency check for related index fields upon load and save of 
index schema.
To assemble the original link URL for out-/inbound links, icons and pictures,
both *_protocol_sxt and *_urlstub_sxt are needed (due to the data-reduced
storage method used). Auto-enable *_protocol_sxt if *_urlstub_sxt is enabled,
so that the original link URL can be assembled correctly.
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java
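The dependency between the two fields in the entry above can be pictured with a short sketch. It assumes, for illustration only, that the protocol and URL stub values are stored as parallel lists with one protocol per stub; the real storage format in CollectionConfiguration may differ, and the names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LinkUrlAssembly {

    /** Joins each protocol with its URL stub; fails when the two lists do not line up. */
    static List<String> assemble(List<String> protocols, List<String> urlStubs) {
        if (protocols.size() != urlStubs.size()) {
            throw new IllegalArgumentException("protocol and urlstub lists must have the same size");
        }
        List<String> urls = new ArrayList<>(urlStubs.size());
        for (int i = 0; i < urlStubs.size(); i++) {
            urls.add(protocols.get(i) + "://" + urlStubs.get(i));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(assemble(Arrays.asList("https"), Arrays.asList("example.org/index.html")));
    }
}
```

Without the protocol list, the stub alone cannot tell an http link from an https or ftp link, which is why enabling one field without the other would lose information.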
Thu Jan 26 23:49:15 CET 2017
by reger
adjust the Field-Reindex Thread to verify and update the document id
in case hash (ID) doesn't match document url (sku field).
Changed Files: source/net/yacy/search/index/ReindexSolrBusyThread.java
Thu Jan 26 06:37:29 CET 2017
by Michael Christen
Merge pull request #98 from Velociraptor85/patch-2

Changed Files: addon/yacyInit.sh
Thu Jan 26 06:29:42 CET 2017
by Michael Christen
Merge pull request #105 from ivar/patch-1

Update README.md - removes deprecated URL
Changed Files: README.md
Thu Jan 26 05:36:48 CET 2017
by Ivar Vasara
Update README.md - removes deprecated URL
Changed Files: README.md
Thu Jan 26 01:13:32 CET 2017
by luccioman
Improved Index Browser accessibility with semantically richer html tags.

Made use of ol, li, thead, th, tbody, h1 and h2 html tags.
Added aria-label attributes to provide alternative textual information
previously only conveyed by color cue.

Tested behavior with NVDA 2016.4 screen reader.
Changed Files: htroot/HostBrowser.html
Wed Jan 25 09:54:39 CET 2017
by luccioman
Fixed local image search pagination regression.

As reported by @tglman on issue #90, when searching images on the local
index only, all pages after the first were always empty. This was a
regression from commit c25e48e969f180dcc3c73863acbfcc383a182c8f.
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue Jan 24 17:14:14 CET 2017
by luccioman
Updated master xliff file with missing entries for HostBrowser.html.

Also translated lang="en" html attribute to lang="[targetLang]" on
locale files having translated entries for HostBrowser.html
Changed Files: locales/de.lng, locales/fr.lng, locales/master.lng.xlf, locales/ru.lng
Tue Jan 24 15:56:29 CET 2017
by Michael Peter Christen
added dc.date.modified and dc.date.created to date parser
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java
Tue Jan 24 11:38:56 CET 2017
by luccioman
Updated French translation of HostBrowser.html
Changed Files: locales/fr.lng
Tue Jan 24 09:40:43 CET 2017
by luccioman
Fixed Index Browser page HTML validation errors and switched to HTML5.

Also removed uses of deprecated HTML attributes.

Validation performed with Nu Html Checker 17.1.0.

Cross browser tested with :
 - Debian Jessie : Firefox ESR 45.6.0
 - MS Windows 10 : Firefox 50.1.0, Chrome 55.0.2883.87, MS Edge
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java, htroot/HostBrowserAdmin_p.html
Tue Jan 24 01:51:28 CET 2017
by reger
ensure that the RWI Index.Segment IODispatcher does not block on shutdown
while waiting for a semaphore permit.
For a description, see http://mantis.tokeek.de/view.php?id=723
Changed Files: source/net/yacy/kelondro/rwi/IODispatcher.java
Mon Jan 23 16:05:51 CET 2017
by luccioman
Documented /HostBrowser.html related configuration settings
Changed Files: defaults/yacy.init, htroot/HostBrowser.java
Mon Jan 23 14:49:02 CET 2017
by luccioman
Display Index Browser links requiring auth only when authenticated.

In the /HostBrowser.html page "only hosts with urls pending in the
crawler", "only with load errors" and "Administration Options" all
require administration credentials. But they were displayed even to
unauthenticated users, and clicking them did nothing and returned the
/HostBrowser.html page empty.
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java
Sun Jan 22 12:31:14 CET 2017
by luccioman
Fixed display of crawler pending URLs counts in HostBrowser.html page.

As described in mantis 722 (http://mantis.tokeek.de/view.php?id=722)

Also updated some Javadoc.
Changed Files: htroot/HostBrowser.java, source/net/yacy/crawler/Balancer.java, source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/data/NoticedURL.java
Sun Jan 22 12:19:43 CET 2017
by luccioman
Removed temporary test main method committed by mistake.
Changed Files: htroot/yacysearch.java
Sun Jan 22 00:01:18 CET 2017
by reger
add ukr and pol to DCEntry.getLanguage ISO639-2 3-char language code 
conversion to deliver uk, pl 2-char code
and use if else to return on match
Changed Files: source/net/yacy/document/content/DCEntry.java
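A rough sketch of the kind of conversion the entry above describes, using an if/else chain that returns on the first match. Only the two codes named in the entry are mapped; the method name and the handling of unknown codes are assumptions, not the DCEntry.getLanguage code.

```java
import java.util.Locale;

final class LanguageCodeSketch {

    /** Maps a 3-char ISO 639-2 code to its 2-char code, or returns null when unknown. */
    static String toTwoLetterCode(String code) {
        if (code == null) return null;
        String c = code.trim().toLowerCase(Locale.ROOT);
        if (c.length() == 2) {
            return c; // already a 2-char code
        } else if ("ukr".equals(c)) {
            return "uk";
        } else if ("pol".equals(c)) {
            return "pl";
        }
        return null; // assumption: unknown codes are reported as null
    }

    public static void main(String[] args) {
        System.out.println(toTwoLetterCode("ukr") + " " + toTwoLetterCode("pol"));
    }
}
```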
Sat Jan 21 01:53:43 CET 2017
by reger
delete outdated and unmaintained Netbeans project
Netbeans has good built-in maven support, which is a supported and
maintained build env, making special and additional NB settings obsolete.
Changed Files:
Fri Jan 20 02:15:11 CET 2017
by reger
upd to commons-compress-1.13.jar
hide external icon on forge logo (was also out of position in IE)
Changed Files: .classpath, build.xml, htroot/Status.html, lib/commons-compress-1.13.License, lib/commons-compress-1.13.jar, pom.xml
Thu Jan 19 12:30:44 CET 2017
by luccioman
Added an optional parameter to webstructure.xml api.

This new "documentStructure" parameter can be set to false to only get
hosts accumulated references on a resource and thus prevent scraping the
specified URL and getting citations references.

Also set WebStructureGraph constants as final and updated the Javadoc
with example api call URLs.  
Changed Files: htroot/api/webstructure.java, source/net/yacy/peers/graphics/WebStructureGraph.java
Tue Jan 17 23:45:56 CET 2017
by reger
remove obsolete lastmodified calculation in WebgraphConfig
Changed Files: source/net/yacy/search/schema/WebgraphConfiguration.java
Tue Jan 17 17:01:56 CET 2017
by luccioman
Updated Javadoc and Junit tests for the WebStructureGraph class.
Changed Files: source/net/yacy/peers/graphics/WebStructureGraph.java, test/java/net/yacy/peers/graphics/WebStructureGraphTest.java
Tue Jan 17 15:59:55 CET 2017
by luccioman
Made sure webstructure.xml API produces valid XML.

Host names should not contain XML special characters such as quotation
marks, but at this stage the WebGraph may have mistakenly recorded a host
name with such characters. What's more, the DigestURL constructor does
not prevent this.
By using serverObjects.putXML to encode host names, we ensure that the
rendered XML is well formed and can be parsed by external tools even if
a structure entry is incorrect.
Changed Files: htroot/api/webstructure.java
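As an illustration of the encoding step in the entry above, the sketch below escapes the five XML special characters before a value is written into an attribute. It is a generic example with hypothetical names and does not reproduce what serverObjects.putXML does internally.

```java
final class XmlEscapeSketch {

    /** Replaces the XML special characters with their predefined entities. */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A host name wrongly recorded with a quotation mark still yields well-formed XML.
        System.out.println("<domain host=\"" + escapeXml("exa\"mple.org") + "\"/>");
    }
}
```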
Mon Jan 16 18:41:58 CET 2017
by luccioman
Fixed WatchWebStructure_p.html render to include https URLs.

As described in mantis 721 (http://mantis.tokeek.de/view.php?id=721)
WatchWebStructure_p.html failed to include in its structure view https
and other protocols and ports than default http.
Changed Files: htroot/WebStructurePicture_p.java, source/net/yacy/peers/graphics/WebStructureGraph.java, test/java/net/yacy/peers/graphics/WebStructureGraphTest.java
Mon Jan 16 16:41:06 CET 2017
by luccioman
Fixed webstructure.xml API used with a domain name 'about' parameter.

As described in mantis 720 (http://mantis.tokeek.de/view.php?id=720),
when requesting this API with a domain name instead of a complete URL
only HTTP references on default port were listed.
Changed Files: htroot/api/webstructure.java, source/net/yacy/peers/graphics/WebStructureGraph.java, test/java/net/yacy/peers/graphics/WebStructureGraphTest.java
Mon Jan 16 10:18:42 CET 2017
by luccioman
Factored code re-implementing DigestURL.hosthash() method.

This ensures consistent implementation of the url host hash generation
and easier usage finding in source code.

Also added a unit test for this function.
Changed Files: htroot/WebStructurePicture_p.java, source/net/yacy/cora/document/id/DigestURL.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/peers/graphics/WebStructureGraph.java, source/net/yacy/search/Switchboard.java, test/java/net/yacy/cora/document/id/DigestURLTest.java
Fri Jan 13 16:10:59 CET 2017
by luccioman
Added automated unit tests and perfs test for WebStructureGraph class.

Fixed references count when multiple links target the same domain name
in one document.
Changed Files: source/net/yacy/peers/graphics/WebStructureGraph.java, test/java/net/yacy/peers/graphics/WebStructureGraphTest.java
Fri Jan 13 16:05:46 CET 2017
by luccioman
Factored common code with DigestURL.hosthash()
Changed Files: htroot/HostBrowser.java, htroot/api/webstructure.java
Thu Jan 12 17:52:47 CET 2017
by luccioman
Detailed some Javadoc related to /api/webstructure.xml usage.
Changed Files: htroot/api/webstructure.java, source/net/yacy/peers/graphics/WebStructureGraph.java
Thu Jan 12 01:36:30 CET 2017
by reger
Start to rename "Augmented Browsing" to "Web Proxy ..." / "View via Proxy"
The Augmented Browsing option was reduced to the web proxy functionality.
Augmented browsing is not available and no known plan exists to reimplement
alteration of result pages with additional information.
Changed Files: htroot/AugmentedBrowsing_p.html, htroot/ConfigSearchPage_p.html, htroot/yacysearchitem.html, locales/de.lng, locales/master.lng.xlf
Mon Jan 09 16:45:31 CET 2017
by luccioman
Ignore generated Javadoc with git SCM.
Changed Files: .gitignore
Sat Jan 07 18:24:29 CET 2017
by reger
fix DC.Elements namespace in DublinCore vocabulary class
delete redundant (unused) DCElements.
Changed Files: source/net/yacy/cora/lod/vocabulary/DublinCore.java
Fri Jan 06 12:24:31 CET 2017
by luccioman
Blacklist import and update performance improvements.

Measurement sample : import from blacklist local file containing about
15000 entries
 - before refactoring : several minutes
 - after refactoring : a few seconds!
Changed Files: htroot/BlacklistCleaner_p.java, htroot/IndexControlRWIs_p.java, htroot/sharedBlacklist_p.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/repository/BlacklistHostAndPath.java
Fri Jan 06 11:23:40 CET 2017
by luccioman
Added some JavaDoc.
Changed Files: htroot/sharedBlacklist_p.java, source/net/yacy/server/serverObjects.java
Fri Jan 06 09:00:28 CET 2017
by luccioman
Display result favicons only for http or https resources.

Favicon display only makes sense for http(s) websites, whether public or
intranet. So I modified the favicon conditional display to check the
result URL protocol rather than whether we are in intranet mode.

Also prevented rendering an img HTML tag with an empty src for results
with other protocols such as ftp or file.

Fixed thanks to a report by priest2.
Changed Files: htroot/yacysearchitem.html, htroot/yacysearchitem.java, htroot/yacysearchitem.json
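The protocol-based condition from the entry above might be sketched as follows. The class and method names are hypothetical, and `java.net.URI` is used here only for the example; the actual yacysearchitem servlet may test the protocol differently.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class FaviconSketch {

    /** True only for http and https result URLs; other or unparsable URLs get no favicon. */
    static boolean shouldShowFavicon(String resultUrl) {
        try {
            String scheme = new URI(resultUrl).getScheme();
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldShowFavicon("https://example.org/page"));
        System.out.println(shouldShowFavicon("ftp://example.org/file.txt"));
    }
}
```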
Fri Jan 06 03:01:52 CET 2017
by reger
fix concurrency issue with htmlParser using outdated scraper data,
resulting in incorrect data for some html index metadata.
Details see http://mantis.tokeek.de/view.php?id=717
Changed Files: source/net/yacy/document/AbstractParser.java, source/net/yacy/document/Document.java, source/net/yacy/document/content/DCEntry.java, source/net/yacy/document/parser/genericParser.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/search/schema/CollectionConfiguration.java
Thu Jan 05 14:54:59 CET 2017
by luccioman
Added descriptive titles to Crawler_p.html speed settings.

As reported by bubul
(http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5924), the meanings
of the LF and MH acronyms were not explained.
Also added label tags for improved accessibility on these input fields.
Changed Files: htroot/Crawler_p.html
Thu Jan 05 00:24:37 CET 2017
by reger
fix exception on URIMetadataNode instantiation with corrected id hash on
host_id_s. Use Solr setField instead of addField to prevent
java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
	at net.yacy.kelondro.data.meta.URIMetadataNode.hosthash(URIMetadataNode.java:247)
	at net.yacy.search.query.SearchEvent.addNodes(SearchEvent.java:966)
	at net.yacy.peers.Protocol.solrQuery(Protocol.java:1242)
	at net.yacy.peers.RemoteSearch$2.run(RemoteSearch.java:349)
Changed Files: source/net/yacy/kelondro/data/meta/URIMetadataNode.java
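Why setField avoids the ClassCastException in the entry above can be shown with a simplified model. `FieldModel` is a hypothetical stand-in written for this example, not Solr's document class; it only mimics the behavior where adding to an existing field turns its value into a list, while setting replaces it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FieldModel {
    private final Map<String, Object> fields = new HashMap<>();

    /** Replaces any existing value, so the field stays single-valued. */
    void setField(String name, Object value) {
        fields.put(name, value);
    }

    /** Appends to an existing value, turning the field into a list of values. */
    @SuppressWarnings("unchecked")
    void addField(String name, Object value) {
        Object existing = fields.get(name);
        if (existing == null) {
            fields.put(name, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            List<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            fields.put(name, values);
        }
    }

    Object getFieldValue(String name) {
        return fields.get(name);
    }

    public static void main(String[] args) {
        FieldModel doc = new FieldModel();
        doc.setField("host_id_s", "oldHash"); // hypothetical values
        doc.addField("host_id_s", "newHash"); // the value is now a list
        // Casting the value to String at this point would throw a ClassCastException.
        System.out.println(doc.getFieldValue("host_id_s") instanceof List);
    }
}
```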
Mon Jan 02 14:23:25 CET 2017
by luccioman
Upgraded Apache Ant to 1.10.0 for the Alpine flavor Docker image. 
Changed Files: docker/Dockerfile.alpine
Mon Jan 02 10:24:17 CET 2017
by luccioman
Adjusted crawl depth control for FTP crawl start URLs.
Changed Files: source/net/yacy/crawler/CrawlStacker.java
Mon Jan 02 03:04:21 CET 2017
by reger
Complete harmonization of RequestHeader getCookie with the standard ServletRequest
to use javax.servlet.http.Cookie parameters.
Deprecate the now obsolete getHeaderCookies.
Adjust setting of MaxAge to spec if >= 0 otherwise keep default.
Changed Files: htroot/CookieTest_p.java, htroot/User.java, source/net/yacy/cora/protocol/RequestHeader.java, source/net/yacy/cora/protocol/ResponseHeader.java, source/net/yacy/data/UserDB.java, source/net/yacy/search/Switchboard.java
Sun Jan 01 23:58:38 CET 2017
by reger
On negative result vote also delete document from fulltext index
(not only from dht)
Changed Files: htroot/yacysearch.java
Sun Jan 01 23:54:18 CET 2017
by reger
Merge origin/master
Changed Files: docker/Dockerfile, docker/Dockerfile.alpine, docker/Readme.md, startYACY.sh
Sun Jan 01 23:53:44 CET 2017
by reger
fix of fulltext.remove() by id of webgraph document
webgraph has document hash in source_id_s
Changed Files: source/net/yacy/search/index/Fulltext.java
Sat Dec 31 09:51:07 CET 2016
by luccioman
Fixed docker stop behavior.

- Adjusted start script in debug mode to make sure the main java process
can receive signals such as SIGTERM
- Modified docker images main command to properly propagate SIGTERM
signal to the main java process
Changed Files: docker/Dockerfile, docker/Dockerfile.alpine, docker/Readme.md, startYACY.sh
Wed Dec 28 09:47:27 CET 2016
by luccioman
Fixed YaCy proper shutdown triggered by SIGTERM signal.

The main shutdown hook thread was not properly waiting for the main
thread termination, which consequently could not properly close resources
and threads. After terminating a running YaCy peer this way (Ctrl+C in
console, or kill <pid> for example), the DATA/yacy.running file
still existed.

Tested with :
 - Debian Jessie openjdk 7 and 8 : regular shutdown, Ctrl+C, kill
command, system restart while yacy is running
 - Windows 10 Oracle JDK 7 and 8 : non regression on regular shutdown 
Changed Files: source/net/yacy/search/Switchboard.java, source/net/yacy/yacy.java
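The pattern behind the fix in the entry above, a shutdown hook that signals the main thread and then waits for it to finish, could be sketched like this. Names are hypothetical and the main loop is a stand-in; this is not the code from yacy.java.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ShutdownSketch {

    /** Builds a hook that requests shutdown and then waits until the main thread has ended. */
    static Thread newShutdownHook(Thread mainThread, AtomicBoolean shutdownRequested) {
        return new Thread(() -> {
            shutdownRequested.set(true);
            try {
                mainThread.join(); // without this, the JVM may halt before cleanup is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "shutdown-hook");
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean shutdownRequested = new AtomicBoolean(false);
        Runtime.getRuntime().addShutdownHook(
                newShutdownHook(Thread.currentThread(), shutdownRequested));
        // Stand-in for the main loop: ends on SIGTERM/Ctrl+C, or after 300 ms so the demo stops.
        long deadline = System.currentTimeMillis() + 300;
        while (!shutdownRequested.get() && System.currentTimeMillis() < deadline) {
            Thread.sleep(50);
        }
        System.out.println("main thread: closing resources");
    }
}
```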