YaCy Release 1.5

Release 1.5

Major Changes   

Commit / Description
Wed Jun 12 15:02:49 CEST 2013
by Michael Peter Christen
added a new 'Citations' function: each search result item can now be
explored for citations within other documents. A click on the
'Citations' link shows an analysis with all text lines in the document
each with a complete list of documents which contain the same line. A
second section shows the linking documents in ascending order of number
of citations from the original document. Because documents from
different hosts are most interesting here, they are listed at the top of
the page as possible 'copypasta' source.
Changed Files: defaults/yacy.init, htroot/ConfigPortal.html, htroot/ConfigPortal.java, htroot/ConfigSearchPage_p.html, htroot/ConfigSearchPage_p.java, htroot/api/citation.html, htroot/api/citation.java, htroot/yacysearchitem.html, htroot/yacysearchitem.java, source/net/yacy/cora/document/MultiProtocolURI.java, source/net/yacy/cora/federate/solr/responsewriter/GrepHTMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
Tue Jun 11 14:42:30 CEST 2013
by Michael Peter Christen
added a 'greedy learning' mechanism which causes a 'fresh'
YaCy peer to load linked web pages from search results until the total
number of web pages reaches 15000. This gives fresh peers a 'boost'
so that they get a personalized search index faster.
Changed Files: defaults/yacy.init, defaults/yacy.network.freeworld.unit, defaults/yacy.network.intranet.unit, defaults/yacy.network.metager.unit, defaults/yacy.network.webportal.unit, htroot/ConfigHeuristics_p.java, htroot/ConfigNetwork_p.java, htroot/yacysearch.java, htroot/yacysearchitem.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java
Mon Jun 10 16:22:00 CEST 2013
by Michael Peter Christen
added new buttons to search result page in p2p mode which show the
switch between p2p search and the 'stealth mode' which is simply a
non-p2p search within the p2p network. The functionality was there all
the time, but the switch to this was not very visible.
Changed Files: htroot/env/base.css, htroot/env/grafics/searchmode_p2p_activated_32.png, htroot/env/grafics/searchmode_p2p_deactivated_32.png, htroot/env/grafics/searchmode_stealth_activated_32.png, htroot/env/grafics/searchmode_stealth_deactivated_32.png, htroot/index.html, htroot/index.java, htroot/yacysearch.html, htroot/yacysearch.java
Sun Jun 09 12:12:34 CEST 2013
by orbiter
replaced yacydoc servlet usage by a solr result output using an html
output writer. This made the creation of a html result writer necessary
which is included in this commit. The yacydoc servlet was used to
present all metadata to a document, but the solr interface can serve for
this purpose in a much better way. All usages (except one) of yacydoc
were replaced by a solr call. This affects also the 'metadata' link
attached to search results.
Changed Files: htroot/ConfigSearchPage_p.html, htroot/IndexControlURLs_p.html, htroot/ViewFile.html, htroot/solr/select.java, htroot/yacysearchitem.html, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
Fri Jun 07 13:20:57 CEST 2013
by Michael Peter Christen
Added a citation reference computation for intra-domain link structures.
While the values for the reference evaluation are computed, a
backlink structure can also be discovered and written to the index.
The host browser has been extended to show such backlinks for each
presented link. The host browser can therefore now show
where a document is linked from. The new citation reference is computed as the
likelihood of a random click path, with recursive use of the previously
computed likelihood. This process is repeated until the likelihood
converges to a specific number. This number is then normalized to a
ranking value CRn, 0<=CRn<=1. The value CRn can therefore be used to
rank popularity within intra-domain link structures.
Changed Files: defaults/solr.collection.schema, htroot/HostBrowser.java, source/net/yacy/cora/federate/solr/ProcessType.java, source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/kelondro/workflow/WorkflowProcessor.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java
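The iteration described in this commit can be illustrated with a small sketch. It is not the code from Segment.java or CollectionConfiguration.java: the class name, the damping factor (added here only so that the iteration is guaranteed to converge), the dead-end handling and the normalization by the largest value are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of a random-click-path likelihood, iterated until it
// converges and then normalized to a value CRn with 0 <= CRn <= 1.
final class CitationRankSketch {

    /**
     * links[i] lists the indices of the pages that page i links to.
     * damping is an assumption of this sketch (not stated in the commit).
     */
    static double[] rank(int[][] links, double damping, double epsilon, int maxRounds) {
        int n = links.length;
        double[] p = new double[n];
        Arrays.fill(p, 1.0 / n);                       // start: every page equally likely
        for (int round = 0; round < maxRounds; round++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);    // random jump to any page
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {            // dead end: spread evenly
                    for (int j = 0; j < n; j++) next[j] += damping * p[i] / n;
                } else {                               // follow one outgoing link at random
                    for (int target : links[i]) next[target] += damping * p[i] / links[i].length;
                }
            }
            double change = 0.0;
            for (int i = 0; i < n; i++) change += Math.abs(next[i] - p[i]);
            p = next;                                  // reuse the previously computed likelihood
            if (change < epsilon) break;               // converged
        }
        double max = 0.0;
        for (double v : p) max = Math.max(max, v);
        double[] crn = new double[n];
        for (int i = 0; i < n; i++) crn[i] = max == 0.0 ? 0.0 : p[i] / max;
        return crn;
    }
}
```

In this sketch the most frequently reached page of a host gets the value 1 and all other pages get a fraction of it, which is one possible way to obtain a value in the range 0<=CRn<=1.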
Thu Jun 06 22:07:54 CEST 2013
by reger
- fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247
   - append language setting specific stopword list

- remove unused OVERHANG stack type
Changed Files: htroot/yacysearch.java, source/net/yacy/cora/storage/HandleSet.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/data/NoticedURL.java, source/net/yacy/kelondro/index/RowHandleSet.java, source/net/yacy/kelondro/util/SetTools.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/query/SearchEvent.java, yacy.stopwords
Sat Jun 01 05:43:08 CEST 2013
by reger
enable use of solrcore.properties for property substitution of solrconfig.xml
- move setting of system property solr.directoryFactory=solr.MMapDirectoryFactory to solrcore.properties
- add check of os.arch for 64bit system, if it fails use default/solrcore.x86.properties (if exists) as solrcore.properties
 
reason: on 32bit MMapDirectoryFactory may fail with.....
Caused by: java.io.IOException: Map failed
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
	at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)

Changed Files: defaults/solr/solrcore.properties, defaults/solr/solrcore.x86.properties, source/net/yacy/cora/federate/solr/instance/EmbeddedInstance.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/yacy.java
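A minimal sketch of the selection logic described above, assuming that a 64bit system is recognized by the substring "64" in the os.arch system property. The class and method names are hypothetical and the actual check in YaCy may differ.

```java
// Hypothetical sketch: choose which properties file is used as solrcore.properties.
final class SolrCorePropertiesSketch {

    /**
     * osArch would typically come from System.getProperty("os.arch").
     * Treating any value containing "64" as 64bit is an assumption of this sketch.
     */
    static String chooseProperties(String osArch, boolean x86FileExists) {
        boolean is64bit = osArch != null && osArch.contains("64");
        if (!is64bit && x86FileExists) {
            return "solrcore.x86.properties";   // avoids MMapDirectoryFactory on 32bit
        }
        return "solrcore.properties";           // default, sets solr.MMapDirectoryFactory
    }
}
```

A caller would pass `System.getProperty("os.arch")` and the result of an existence check on the x86 file.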
Wed May 29 18:27:27 CEST 2013
by Michael Peter Christen
removed 'later' tactic because it used too much RAM, reduced number of
soft commits, reduced caching size of search events, ensured that solr
results are processed before the connection is closed so that they are not
kept in RAM for too long
Changed Files: defaults/solr/solrconfig.xml, htroot/yacy/crawlReceipt.java, htroot/yacy/transferURL.java, source/net/yacy/cora/federate/solr/connector/RemoteSolrConnector.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/SearchEvent.java
Mon May 20 22:05:28 CEST 2013
by Michael Peter Christen
reduced locking situation in crawler: shifted synchronized location and
reduced time-out of robots.txt load limit
Changed Files: htroot/Bookmarks.java, htroot/CrawlCheck_p.java, htroot/Crawler_p.java, htroot/DictionaryLoader_p.java, htroot/Load_RSS_p.java, htroot/ViewFile.java, htroot/ViewImage.java, htroot/api/getpageinfo.java, htroot/api/getpageinfo_p.java, htroot/api/webstructure.java, htroot/yacysearch.java, htroot/yacysearchitem.java, source/net/yacy/crawler/Balancer.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/crawler/retrieval/RSSLoader.java, source/net/yacy/crawler/robots/RobotsTxt.java, source/net/yacy/data/ymark/YMarkAutoTagger.java, source/net/yacy/data/ymark/YMarkMetadata.java, source/net/yacy/document/importer/OAIListFriendsLoader.java, source/net/yacy/document/importer/OAIPMHLoader.java, source/net/yacy/peers/graphics/OSMTile.java, source/net/yacy/peers/operation/yacyRelease.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/snippet/MediaSnippet.java, source/net/yacy/search/snippet/TextSnippet.java
Fri May 17 13:59:37 CEST 2013
by Michael Peter Christen
redesign of index.exist-test: this shall now not be done using a single
id to be tested, but with a collection of ids. This will cause only a
single call to solr instead of many. The result is much better
performance when testing the existence of many urls. This should
cause much less IO during index transmission, on both the sender and
the receiver side.
Changed Files: htroot/HostBrowser.java, htroot/IndexControlRWIs_p.java, htroot/Load_RSS_p.java, htroot/yacy/transferRWI.java, htroot/yacy/transferURL.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/protocol/ftp/FTPClient.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/crawler/retrieval/RSSLoader.java, source/net/yacy/crawler/retrieval/SitemapImporter.java, source/net/yacy/kelondro/workflow/WorkflowProcessor.java, source/net/yacy/migration.java, source/net/yacy/peers/Transmission.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/server/serverObjects.java
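The idea of testing a collection of ids with a single query can be sketched as follows. This is not the SolrConnector API: the class, the method names and the field name passed in are hypothetical, and the sketch only builds the query string and evaluates the returned ids.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: one query for many ids instead of one query per id.
final class BatchExistsSketch {

    /** Builds a single query such as id:("a" OR "b") for all given ids. */
    static String existsQuery(String field, Collection<String> ids) {
        if (ids.isEmpty()) throw new IllegalArgumentException("no ids given");
        StringBuilder q = new StringBuilder(field).append(":(");
        boolean first = true;
        for (String id : ids) {
            if (!first) q.append(" OR ");
            // quote each id and escape backslashes and quotes inside it
            q.append('"').append(id.replace("\\", "\\\\").replace("\"", "\\\"")).append('"');
            first = false;
        }
        return q.append(')').toString();
    }

    /** Given the ids returned by that one query, reports which requested ids exist. */
    static Set<String> existing(Collection<String> requested, Collection<String> returned) {
        Set<String> found = new HashSet<>(returned);
        found.retainAll(requested);
        return found;
    }
}
```

With n ids this issues one request rather than n, which is where the IO saving during index transmission would come from.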
Mon May 13 13:28:07 CEST 2013
by Michael Peter Christen
refactoring of WorkflowProcessor, added process counter, update of
process counter if a blocking thread dies. Also added a new column in
PerformanceConcurrency_p servlet to show the actual number of concurrent
processes.
Changed Files: htroot/PerformanceConcurrency_p.html, htroot/PerformanceConcurrency_p.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/kelondro/workflow/AbstractBlockingThread.java, source/net/yacy/kelondro/workflow/InstantBlockingThread.java, source/net/yacy/kelondro/workflow/WorkflowProcessor.java, source/net/yacy/peers/Dispatcher.java, source/net/yacy/search/Switchboard.java
Thu May 09 02:17:53 CEST 2013
by Michael Peter Christen
migrated to solr 4.3.0
Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, defaults/solr/schema.xml, defaults/solr/solrconfig.xml, lib/lucene-analyzers-common-4.3.0.jar, lib/lucene-analyzers-phonetic-4.3.0.jar, lib/lucene-classification-4.3.0.jar, lib/lucene-codecs-4.3.0.jar, lib/lucene-core-4.3.0.jar, lib/lucene-facet-4.3.0.jar, lib/lucene-grouping-4.3.0.jar, lib/lucene-highlighter-4.3.0.jar, lib/lucene-join-4.3.0.jar, lib/lucene-memory-4.3.0.jar, lib/lucene-misc-4.3.0.jar, lib/lucene-queries-4.3.0.jar, lib/lucene-queryparser-4.3.0.jar, lib/lucene-spatial-4.3.0.jar, lib/lucene-suggest-4.3.0.jar, lib/noggit-0.5.jar, lib/solr-core-4.3.0.jar, lib/solr-solrj-4.3.0.License, lib/solr-solrj-4.3.0.jar
Thu May 09 00:22:45 CEST 2013
by Michael Peter Christen
- upgraded httpclient, httpcore and httpmime
- removed httpclient 3.1 which has been used by solrj < 4.x.x and is now
not used any more
- fixed some parts in YaCy which used methods from httpclient 3.1
Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/httpclient-4.2.5.License, lib/httpclient-4.2.5.jar, lib/httpcore-4.2.4.License, lib/httpcore-4.2.4.jar, lib/httpmime-4.2.5.License, lib/httpmime-4.2.5.jar, source/net/yacy/cora/federate/solr/instance/RemoteInstance.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/server/http/ChunkedInputStream.java
Wed May 08 11:50:46 CEST 2013
by Michael Peter Christen
prevent that the size of the index is computed too many times.
Because the index size is now provided by solr, and the only way to do
that is a match for [* TO *], a size computation is quite complex and
time-consuming. Therefore this patch prevents that the method is called
at all and if necessary puts a DOS-preventing barrier in front of it.
Changed Files: htroot/HostBrowser.java, htroot/IndexControlURLs_p.java, htroot/PerformanceGraph.java, htroot/yacy/hello.java, htroot/yacy/query.java, htroot/yacyinteractive.java, source/net/yacy/peers/Network.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java
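One way to put a barrier in front of an expensive size computation is to cache the last result and recompute it at most once per time interval. The sketch below illustrates that idea only. The class name, the interval parameter and the use of a LongSupplier are assumptions and do not describe the actual patch in Fulltext.java.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: an expensive count is recomputed at most once per interval.
final class ThrottledSizeSketch {
    private final LongSupplier expensiveCount;   // e.g. a match-all count query
    private final long minIntervalMillis;
    private long cachedSize = -1;                // -1 means: never computed
    private long lastComputed = 0;

    ThrottledSizeSketch(LongSupplier expensiveCount, long minIntervalMillis) {
        this.expensiveCount = expensiveCount;
        this.minIntervalMillis = minIntervalMillis;
    }

    /** nowMillis is passed in (instead of read from the clock) to keep the sketch testable. */
    synchronized long size(long nowMillis) {
        if (cachedSize < 0 || nowMillis - lastComputed >= minIntervalMillis) {
            cachedSize = expensiveCount.getAsLong();
            lastComputed = nowMillis;
        }
        return cachedSize;                       // possibly slightly stale, but cheap
    }
}
```

Callers that ask for the size many times per second then trigger only one real computation per interval.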
Mon May 06 16:45:54 CEST 2013
by Michael Peter Christen
re-declared some fields to be of type string rather than text which
makes them more efficient and smaller
Changed Files: defaults/solr.collection.schema, htroot/Crawler_p.java, htroot/HostBrowser.java, source/net/yacy/cora/federate/solr/SolrType.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java, source/net/yacy/search/schema/WebgraphSchema.java
Sat Apr 27 01:32:18 CEST 2013
by Michael Peter Christen
refactoring (renaming) of yacy-solr api
Changed Files: htroot/HostBrowser.java, source/net/yacy/cora/federate/opensearch/OpenSearchConnector.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/CachedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/RemoteSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/WebgraphConfiguration.java
Fri Apr 26 10:49:55 CEST 2013
by Michael Peter Christen
- added a new field for the regular expression in crawl start
- added the field in crawl profile
- adapted logging and error management
- adapted duplicate document detection
- added a new rule to the indexing process to reject non-matching
content
- full redesign of the expert crawl start servlet
The new filter field can now be seen in /CrawlStartExpert_p.html at
Section "Document Filter", subsection item "Filter on Content of
Document"
Changed Files: htroot/CrawlProfileEditor_p.java, htroot/CrawlStartExpert_p.html, htroot/CrawlStartExpert_p.java, htroot/Crawler_p.java, htroot/QuickCrawlLink_p.java, htroot/env/base.css, source/net/yacy/crawler/CrawlSwitchboard.java, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/data/ymark/YMarkCrawlStart.java, source/net/yacy/document/Document.java, source/net/yacy/search/Switchboard.java
Thu Apr 25 11:33:17 CEST 2013
by orbiter
- reduction of the concurrently running processes to make YaCy better
suited to smaller and 1-core devices.
- the workflow processor now starts no process at all. Processes are started
as soon as parser/condenser/indexing queues are filled.
- better abstraction
Changed Files: htroot/ViewImage.java, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/kelondro/blob/MapHeap.java, source/net/yacy/kelondro/data/word/Word.java, source/net/yacy/kelondro/data/word/WordReferenceVars.java, source/net/yacy/kelondro/index/RowHandleMap.java, source/net/yacy/kelondro/workflow/AbstractBlockingThread.java, source/net/yacy/kelondro/workflow/AbstractThread.java, source/net/yacy/kelondro/workflow/InstantBlockingThread.java, source/net/yacy/kelondro/workflow/WorkflowProcessor.java, source/net/yacy/peers/Dispatcher.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/DocumentIndex.java, source/net/yacy/search/snippet/TextSnippet.java
Tue Apr 16 14:45:14 CEST 2013
by Michael Peter Christen
fixed ranking for add-function queries: this did not work. The option
was removed. All function queries are now boosts (they multiply the score
according to a function). This is also the recommended way to boost
rankings based on functions as explained in
http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/
Changed Files: defaults/yacy.init, htroot/RankingSolr_p.html, htroot/RankingSolr_p.java, htroot/gsa/searchresult.java, htroot/solr/select.java, source/net/yacy/cora/federate/solr/Ranking.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java
Mon Apr 15 14:08:30 CEST 2013
by Michael Peter Christen
redesign of exists()-query (can now be called with query) and the
CachedSolrConnector which based its cache on the key value. This will be
used to correct the title_unique_b and description_unique_b field.
Changed Files: htroot/PerformanceMemory_p.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/CachedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java
Sat Apr 06 16:11:24 CEST 2013
by Michael Peter Christen
upgrade to solr 4.2.1
Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/lucene-analyzers-common-4.2.1.jar, lib/lucene-analyzers-phonetic-4.2.1.jar, lib/lucene-classification-4.2.1.jar, lib/lucene-core-4.2.1.jar, lib/lucene-facet-4.2.1.jar, lib/lucene-grouping-4.2.1.jar, lib/lucene-highlighter-4.2.1.jar, lib/lucene-join-4.2.1.jar, lib/lucene-memory-4.2.1.jar, lib/lucene-misc-4.2.1.jar, lib/lucene-queries-4.2.1.jar, lib/lucene-queryparser-4.2.1.jar, lib/lucene-spatial-4.2.1.jar, lib/lucene-suggest-4.2.1.jar, lib/solr-core-4.2.1.jar, lib/solr-solrj-4.2.1.jar, lib/solr.License, source/net/yacy/cora/federate/solr/responsewriter/EnhancedXMLResponseWriter.java, source/net/yacy/search/query/QueryParams.java


Bugfixes   

Commit / Description
Thu Jun 13 22:42:21 CEST 2013
by Michael Peter Christen
npe fix
Changed Files: source/net/yacy/kelondro/blob/Heap.java
Thu Jun 13 18:27:57 CEST 2013
by orbiter
fix for citation search in case that the citation is very fresh
Changed Files: htroot/api/citation.java
Wed Jun 12 13:23:58 CEST 2013
by Michael Peter Christen
npe fix
Changed Files: source/net/yacy/cora/sorting/OrderedScoreMap.java
Tue Jun 11 16:22:43 CEST 2013
by Michael Peter Christen
added fixed clear method as public method
Changed Files: source/net/yacy/crawler/data/NoticedURL.java
Fri Jun 07 00:13:45 CEST 2013
by reger
add null pointer check to stopword fix
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue May 28 16:26:38 CEST 2013
by orbiter
prevent NPE in case RWI is disabled
Changed Files: htroot/PerformanceQueues_p.java, htroot/yacy/query.java, htroot/yacy/search.java, htroot/yacysearch.java, source/net/yacy/peers/Dispatcher.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/snippet/ResultEntry.java
Sun May 26 03:24:32 CEST 2013
by reger
fix DHT url receive see http://bugs.yacy.net/view.php?id=242
Changed Files: htroot/yacy/transferURL.java
Mon May 13 13:27:01 CEST 2013
by Michael Peter Christen
fixed query expressions for collection selection (added quotes)
Changed Files: source/net/yacy/search/query/QueryModifier.java
Sun May 12 21:36:20 CEST 2013
by orbiter
fix for workflow processor (cause: latest redesign for less threads)
Changed Files: source/net/yacy/kelondro/workflow/WorkflowProcessor.java
Sat May 11 11:19:06 CEST 2013
by Michael Peter Christen
small memory leak patch
Changed Files: source/net/yacy/crawler/data/Latency.java, source/net/yacy/repository/LoaderDispatcher.java
Tue Apr 30 11:06:48 CEST 2013
by Michael Peter Christen
infinity timeout bug protection patch
Changed Files: source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java, source/net/yacy/crawler/Balancer.java, source/net/yacy/kelondro/data/word/WordReferenceFactory.java, source/net/yacy/kelondro/data/word/WordReferenceVars.java, source/net/yacy/peers/Dispatcher.java, source/net/yacy/peers/graphics/WebStructureGraph.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/ranking/ReferenceOrder.java
Sun Apr 28 20:09:45 CEST 2013
by Michael Peter Christen
fixed bad css change
Changed Files: htroot/env/base.css
Sun Apr 21 12:27:27 CEST 2013
by Michael Peter Christen
fixed default ranking values
Changed Files: htroot/RankingSolr_p.java
Sat Apr 20 10:53:49 CEST 2013
by orbiter
avoid NPE in regex checker
Changed Files: source/net/yacy/repository/RegexHelper.java
Tue Apr 16 13:32:13 CEST 2013
by Michael Peter Christen
fix for result counter logging
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java
Tue Apr 16 12:38:16 CEST 2013
by Michael Peter Christen
fix to ranking configuration servlet
Changed Files: htroot/RankingSolr_p.html
Tue Apr 16 01:39:30 CEST 2013
by Michael Peter Christen
fixed api table navigation
Changed Files: htroot/Table_API_p.java
Sun Apr 14 02:01:27 CEST 2013
by reger
skip postprocessing during document.store if no citation index connected (prevent null pointer exception)
Changed Files: source/net/yacy/search/index/Segment.java
Sat Apr 06 02:29:49 CEST 2013
by reger
fix typo in prev commit
Changed Files: htroot/AccessTracker_p.html
Wed Mar 20 16:19:49 CET 2013
by Michael Peter Christen
fixes for better search interface integration in yaml templates
Changed Files: htroot/solr/select.java, htroot/yacysearch.java, source/net/yacy/cora/federate/solr/responsewriter/JsonResponseWriter.java
Mon Mar 18 00:10:23 CET 2013
by reger
fix invisible icon not found
Changed Files: htroot/HostBrowser.html


Other Changes   

Commit / Description
Thu Jun 13 23:50:00 CEST 2013
by Michael Peter Christen
Release 1.5
Changed Files: build.properties
Thu Jun 13 22:40:46 CEST 2013
by Michael Peter Christen
typo
Changed Files: htroot/Steering.html
Thu Jun 13 22:32:06 CEST 2013
by Michael Peter Christen
increased time-out for loading of seed-lists
Changed Files: source/net/yacy/search/Switchboard.java
Thu Jun 13 22:31:39 CEST 2013
by Michael Peter Christen
added target="_blank" to shutdown links
Changed Files: htroot/Steering.html
Thu Jun 13 14:44:47 CEST 2013
by orbiter
added a feed-back message inside the shutdown page
Changed Files: htroot/Steering.html
Thu Jun 13 13:22:43 CEST 2013
by Michael Peter Christen
show the citation report also in ViewFile
Changed Files: htroot/ViewFile.html, htroot/ViewFile.java
Thu Jun 13 13:08:24 CEST 2013
by Michael Peter Christen
fixed usage of ViewFile which needs a commit before showing latest crawl
result pages.
Changed Files: htroot/ViewFile.java
Thu Jun 13 13:03:56 CEST 2013
by Michael Peter Christen
removed warning message during crawling
Changed Files: source/net/yacy/crawler/CrawlStacker.java
Thu Jun 13 13:01:28 CEST 2013
by Michael Peter Christen
removed fields references_internal_id_sxt and
references_internal_url_sxt because they had been shown to be
superfluous. The citation of referrers in the host browser is possible
without them. Therefore the host browser now shows not only
internal but also external referrers for each link.
Changed Files: defaults/solr.collection.schema, htroot/HostBrowser.java, source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java
Wed Jun 12 11:29:35 CEST 2013
by Michael Peter Christen
switching back to the merge factor 10; the solr default.
Changed Files: defaults/solr/solrconfig.xml
Wed Jun 12 02:13:18 CEST 2013
by Michael Peter Christen
added synchronizations and timeouts in solr api; missing
synchronizations in index modification methods caused deadlocks inside
solr.
Changed Files: defaults/yacy.init, htroot/IndexFederated_p.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/RemoteSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/cora/federate/solr/instance/RemoteInstance.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/Switchboard.java
Wed Jun 12 00:17:44 CEST 2013
by Michael Peter Christen
calling pdf cache flush on class initialization because calling of the
methods during runtime can conflict with dynamic solr class loader and
cause a deadlock (seriously!)
Changed Files: source/net/yacy/document/parser/pdfParser.java
Wed Jun 12 00:16:28 CEST 2013
by Michael Peter Christen
removed misleading http accessGranted message (this is only for
debugging)
Changed Files: source/net/yacy/server/http/HTTPDFileHandler.java
Wed Jun 12 00:14:55 CEST 2013
by Michael Peter Christen
reduced load on solr; no seed update in Status and no exists-check in
HTTPLoader in case of redirects, that can be done using the htcache.
Changed Files: htroot/Status.java, source/net/yacy/crawler/retrieval/HTTPLoader.java
Wed Jun 12 00:12:04 CEST 2013
by Michael Peter Christen
changed administration page headline to 'Administration'
Changed Files: htroot/env/templates/header.template
Wed Jun 12 00:10:25 CEST 2013
by Michael Peter Christen
changed windows icon again
Changed Files: addon/YaCy.ico, addon/YaCy_TrayIcon.png
Tue Jun 11 16:51:40 CEST 2013
by Michael Peter Christen
increased the solr merge factor because 4 was too much IO load for
frequent index receiving and re-indexing after clickdepth/cr
calculation.
Changed Files: defaults/solr/solrconfig.xml
Tue Jun 11 16:50:34 CEST 2013
by Michael Peter Christen
changed p2p/stealth mode text and links a bit
Changed Files: htroot/yacysearch.html
Tue Jun 11 14:52:46 CEST 2013
by Michael Peter Christen
allip net has greedy learning disabled
Changed Files: defaults/yacy.network.allip.unit
Tue Jun 11 14:51:26 CEST 2013
by Michael Peter Christen
removed forced soft commit since this may be the cause for a performance
problem
Changed Files: source/net/yacy/search/index/Segment.java
Tue Jun 11 13:16:46 CEST 2013
by Michael Peter Christen
new icons
Changed Files: addon/YaCy.app/Contents/Info.plist, addon/YaCy.app/Contents/Resources/YaCy_2013_Icon.icns, addon/YaCy.ico, htroot/favicon.bmp, htroot/favicon.ico, htroot/favicon.png
Tue Jun 11 13:12:59 CEST 2013
by Michael Peter Christen
use a greeting line which does not sound so beta
Changed Files: source/net/yacy/gui/InfoPage.java
Mon Jun 10 18:41:00 CEST 2013
by Michael Peter Christen
added another response writer which can present search result with
texts, separated by sentences. Then, these sentences can be used to
search again in the index for the same sentence. This can be used to
provide a tool for plagiarism-search. (not finished yet).
Try the following:
http://localhost:8090/solr/select?q=text_t:flut&grep=wasser&defType=edismax&start=0&rows=3&core=collection1&wt=grephtml
.. to search for 'flut' and show only sentences in the result documents
which contain the word 'wasser'.
Consider this like using a grep-tool on documents: you select the
documents by a search query and you grep sentences inside the found
documents with the 'grep' attribute.
Changed Files: htroot/solr/select.java, source/net/yacy/cora/federate/solr/responsewriter/GrepHTMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
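The grep idea described in this commit (select documents by a query, then keep only the sentences that contain the 'grep' word) can be sketched for a single document text as follows. This is not the code of GrepHTMLResponseWriter. The class name, the simple sentence splitting on '.', '!' and '?' and the case-insensitive substring match are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: keep only the sentences of a text that contain a given word.
final class SentenceGrepSketch {

    static List<String> grep(String text, String word) {
        List<String> hits = new ArrayList<>();
        String needle = word.toLowerCase(Locale.ROOT);
        // naive sentence split: break after '.', '!' or '?' followed by whitespace
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            String s = sentence.trim();
            // substring match, so 'wasser' would also match 'Hochwasser'
            if (!s.isEmpty() && s.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(s);
            }
        }
        return hits;
    }
}
```

Each returned sentence could then be used as a new query to look for other documents containing the same sentence, as the commit suggests for plagiarism search.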
Mon Jun 10 18:36:06 CEST 2013
by Michael Peter Christen
the line "Web Search by the People, for the People" is more generic for
P2P and portal search as default search string. Otherwise, if people
switch to Portal mode, the "P2P Web Search" does not make sense.
Changed Files: defaults/yacy.init
Mon Jun 10 16:23:58 CEST 2013
by Michael Peter Christen
fix for host compare in case that the host is null. This happens when
doing a search in the intranet for file resources (they don't have a
host).
Changed Files: source/net/yacy/search/Switchboard.java
Sun Jun 09 08:15:23 CEST 2013
by orbiter
show the cache link in search results only if there is actually a cache
entry stored in HTCACHE
Changed Files: htroot/yacysearchitem.java
Fri Jun 07 14:26:14 CEST 2013
by Michael Peter Christen
activated citation ranking by default
Changed Files: defaults/solr.collection.schema
Fri Jun 07 13:22:22 CEST 2013
by Michael Peter Christen
usage of the new normalized link popularity CRn as default ranking
function. This replaces the previous formula, which was bad. Before you
update to this version, please check if you changed the ranking function
yourself before, since it will be overwritten.
Changed Files: defaults/yacy.init, source/net/yacy/search/SwitchboardConstants.java
Fri Jun 07 12:52:03 CEST 2013
by Michael Peter Christen
patch in HTCache and CitationIndex loading in case that a file is
broken: do not crash; instead ignore the file and delete it.
Changed Files: source/net/yacy/crawler/data/Cache.java, source/net/yacy/kelondro/blob/ArrayStack.java, source/net/yacy/kelondro/io/CachedFileWriter.java, source/net/yacy/kelondro/rwi/ReferenceContainerArray.java
Fri Jun 07 08:52:07 CEST 2013
by Michael Peter Christen
fixes to index deletion: quoting of host name (a '-' may be part of the
url) and disabling the engage button when changing the url field at
'Delete by URL matching'
Changed Files: htroot/IndexDeletion_p.html, htroot/IndexDeletion_p.java
Thu Jun 06 13:36:58 CEST 2013
by orbiter
in GSA api enable usage of solr fq-attribute together with GSA
site-attribute
Changed Files: htroot/gsa/searchresult.java
Sun Jun 02 13:50:12 CEST 2013
by Michael Peter Christen
fix for bad exists 'enhancement'; see bug:
http://bugs.yacy.net/view.php?id=245
Changed Files: source/net/yacy/search/index/Fulltext.java
Sat Jun 01 05:50:03 CEST 2013
by reger
fix: enable use of solrcore.properties for property substitution of solrconfig.xml

 
Changed Files: source/net/yacy/cora/federate/solr/instance/EmbeddedInstance.java
Thu May 30 16:39:48 CEST 2013
by Michael Peter Christen
added missing class
Changed Files: source/net/yacy/search/StorageQueueEntry.java
Thu May 30 16:30:35 CEST 2013
by Michael Peter Christen
ranking and boost function update, small bugfixes, better default search
field for solr
Changed Files: defaults/solr/solrconfig.xml, defaults/yacy.init, htroot/IndexControlRWIs_p.html
Thu May 30 13:01:22 CEST 2013
by Michael Peter Christen
removed block rank ranking and all YBR files in /ranking
Changed Files: build.xml, htroot/IndexControlRWIs_p.html, htroot/IndexControlRWIs_p.java, htroot/RankingRWI_p.java, htroot/index.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/ranking/BlockRank.java, source/net/yacy/search/ranking/RankingProfile.java, source/net/yacy/search/ranking/ReferenceOrder.java
Thu May 30 12:47:22 CEST 2013
by Michael Peter Christen
cleanup
Changed Files: htroot/IndexControlRWIs_p.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/index/Fulltext.java
Thu May 30 12:39:28 CEST 2013
by Michael Peter Christen
added timeout for remote searches of 10 seconds
Changed Files: source/net/yacy/cora/federate/solr/instance/RemoteInstance.java
Thu May 30 12:38:54 CEST 2013
by Michael Peter Christen
try to commit in case of failure which hopefully frees up some RAM
Changed Files: source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java
Thu May 30 12:38:15 CEST 2013
by Michael Peter Christen
Store node/solr search threads to be able to send them an interrupt
signal in case that a cleanup process wants to remove the search
process. Added also a new cleanup process which can reduce the number of
stored searches to a specific number which can be higher or lower
according to the remaining RAM. The cleanup process is called every time
a search is started.
Changed Files: source/net/yacy/peers/RemoteSearch.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/query/SearchEventCache.java
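A rough sketch of such a cleanup, assuming that stored searches are kept in insertion order together with their search thread. The class, the memory estimate per entry and the method names are hypothetical and do not describe SearchEventCache itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: limit the number of stored searches depending on free RAM.
final class SearchCacheCleanupSketch {

    /** More free memory allows more stored searches, up to hardMax. bytesPerEntry must be > 0. */
    static int allowedEntries(long freeBytes, long bytesPerEntry, int hardMax) {
        long byMemory = freeBytes / bytesPerEntry;
        return (int) Math.max(0L, Math.min(hardMax, byMemory));
    }

    /** Removes the oldest searches until at most limit remain and interrupts their threads. */
    static <K> List<K> cleanup(LinkedHashMap<K, Thread> searches, int limit) {
        List<K> removed = new ArrayList<>();
        Iterator<Map.Entry<K, Thread>> it = searches.entrySet().iterator();
        while (searches.size() > limit && it.hasNext()) {
            Map.Entry<K, Thread> oldest = it.next();
            K key = oldest.getKey();
            oldest.getValue().interrupt();   // signal the search thread to stop
            it.remove();
            removed.add(key);
        }
        return removed;
    }
}
```

In a running peer, freeBytes could be derived from `Runtime.getRuntime()`, and cleanup would be called whenever a new search starts.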
Thu May 30 12:35:47 CEST 2013
by Michael Peter Christen
remove text_t in search result after snippet has been computed to save
space in search result cache
Changed Files: source/net/yacy/search/snippet/ResultEntry.java
Thu May 30 12:34:53 CEST 2013
by Michael Peter Christen
new workflow processor in Segment to enqueue indexing documents to solr
Changed Files: source/net/yacy/kelondro/data/word/Word.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/ReindexSolrBusyThread.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java
Thu May 30 12:31:28 CEST 2013
by Michael Peter Christen
default configuration of MMapDirectoryFactory for solr, increased lock
timeout, fewer documents from remote searches (too many results could
easily block a peer)
Changed Files: defaults/solr/solrconfig.xml, defaults/yacy.init, source/net/yacy/yacy.java
Wed May 29 16:09:05 CEST 2013
by Michael Peter Christen
getting the trash out
Changed Files: source/net/yacy/document/parser/pdfParser.java, source/net/yacy/kelondro/data/word/Word.java, source/net/yacy/search/Switchboard.java
Wed May 29 13:45:22 CEST 2013
by Michael Peter Christen
added new link for SMW
Changed Files: htroot/ContentControl_p.html
Wed May 29 13:42:38 CEST 2013
by Michael Peter Christen
removed dead link
Changed Files: htroot/ContentControl_p.html
Wed May 29 13:30:32 CEST 2013
by Michael Peter Christen
less logging
Changed Files: source/net/yacy/peers/Protocol.java
Wed May 29 13:10:32 CEST 2013
by Michael Peter Christen
added new keys for update locations
Changed Files: defaults/yacy.network.allip.unit, defaults/yacy.network.freeworld.unit, defaults/yacy.network.intranet.unit, defaults/yacy.network.metager.unit, defaults/yacy.network.webportal.unit
Wed May 29 13:09:34 CEST 2013
by Michael Peter Christen
added option to re-boot the embedded solr during run-time. Added also
API recording for this method so it can be repeated automatically. The
index dump generation is now also available for API recording. Added
some synchronization in backend which was necessary for this.
Changed Files: htroot/IndexControlURLs_p.html, htroot/IndexControlURLs_p.java, source/net/yacy/search/index/Fulltext.java
Wed May 29 12:02:19 CEST 2013
by Michael Peter Christen
fixed ClassCastException: [Ljava.lang.Object; cannot be cast to
[Ljava.util.List; in robots.txt servlet
Changed Files: htroot/robots.java
Tue May 28 11:38:45 CEST 2013
by Michael Peter Christen
use a retry handler with retryCount=0 because we usually expect requests
to fail if we access non-permanently available resources (peers, web
pages) and want to fail fast without repeating the same request which is
doomed to fail. The previous http client connection setup had a
1-2-4-8-second timeout scheme, which caused connection attempts to last
for 16 seconds.
Changed Files: source/net/yacy/cora/federate/solr/instance/RemoteInstance.java, source/net/yacy/cora/protocol/http/HTTPClient.java
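As an illustration of what retryCount=0 means, here is a generic retry loop. It is only a sketch and not the HTTPClient code from this commit; the class and method names are made up.

```java
import java.util.concurrent.Callable;

// Illustrative only: a generic retry loop, not YaCy's HTTPClient.
final class FailFastSketch {
    // Runs the task once, plus at most retryCount further attempts.
    // With retryCount = 0 the first failure is reported immediately.
    static <T> T call(Callable<T> task, int retryCount) throws Exception {
        if (retryCount < 0) throw new IllegalArgumentException("retryCount < 0");
        Exception last = null;
        for (int attempt = 0; attempt <= retryCount; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure, retry if attempts are left
            }
        }
        throw last;
    }
}
```

With retryCount=0 a request to an unreachable peer fails after a single attempt instead of being repeated.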
Tue May 28 11:35:56 CEST 2013
by Michael Peter Christen
include API Table deletion requests to the API recorder
Changed Files: htroot/Table_API_p.java
Tue May 28 10:36:49 CEST 2013
by Michael Peter Christen
activate pollImmediately if DHT receive is off. This delivers search
results much faster when running in public robinson mode.
Changed Files: source/net/yacy/search/query/SearchEvent.java
Tue May 28 10:33:41 CEST 2013
by Michael Peter Christen
fixed missing thisaddress in yacysearch.html, which caused the
opensearch link to not work
Changed Files: htroot/yacysearch.java
Mon May 27 16:15:58 CEST 2013
by Michael Peter Christen
added a (badly formatted) delete button for process scheduler entries
Changed Files: htroot/Table_API_p.html, htroot/Table_API_p.java
Mon May 27 15:23:12 CEST 2013
by orbiter
set a higher limit for table copy usage
Changed Files: source/net/yacy/kelondro/table/Table.java
Mon May 27 13:45:09 CEST 2013
by Michael Peter Christen
javadoc of new multiple-exist test
Changed Files: source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java
Sat May 25 12:56:43 CEST 2013
by Marc Nause
*) simplified banner creation code
Changed Files: htroot/Banner.java, source/net/yacy/peers/graphics/Banner.java, source/net/yacy/peers/graphics/BannerData.java
Sat May 25 11:08:06 CEST 2013
by Marc Nause
*) updated links to description of regex
Changed Files: htroot/Blacklist_p.html
Mon May 20 11:25:26 CEST 2013
by Michael Peter Christen
nice crawl name if crawl is started with file:// (was: null)
Changed Files: htroot/Crawler_p.java
Mon May 20 11:02:21 CEST 2013
by Michael Peter Christen
added the reindexing job servlet to the submenu structure
Changed Files: htroot/IndexReindexMonitor_p.html, htroot/env/templates/submenuIndexControl.template
Mon May 20 01:50:09 CEST 2013
by reger
- odt & ooxml (office document) parser correction to add content to fulltext index
- adjust Junit yacyVersionTest & ParserTest 
- update yacyVersion.combined2prettyVersion to the default 4-digit minor ver. 
Changed Files: source/net/yacy/document/parser/odtParser.java, source/net/yacy/document/parser/ooxmlParser.java, source/net/yacy/peers/operation/yacyVersion.java, test/de/anomic/document/ParserTest.java, test/de/anomic/yacy/yacyVersionTest.java
Fri May 17 14:11:10 CEST 2013
by Michael Peter Christen
- no downcase when using collection modifier
- removed warnings
Changed Files: source/net/yacy/crawler/retrieval/RSSLoader.java, source/net/yacy/search/query/QueryModifier.java
Wed May 15 23:16:32 CEST 2013
by reger
more generic field selection for reindex option of documents with disabled fields 
using Luke request to compare config with actual fields in index
Changed Files: source/net/yacy/migration.java, source/net/yacy/search/index/ReindexSolrBusyThread.java
Wed May 15 22:42:05 CEST 2013
by Michael Peter Christen
reject bad solr requests
Changed Files: htroot/solr/select.java, source/net/yacy/server/serverObjects.java
Mon May 13 13:26:24 CEST 2013
by Michael Peter Christen
enhanced deletion process for very large number of documents
Changed Files: source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java
Mon May 13 04:06:57 CEST 2013
by reger
added reindex option for documents with disabled or obsolete fields to Solr Schema Editor page (IndexSchema_p.html)
this allows removing obsolete fields from the index (according to current schema config)
by selecting all documents containing disabled fields.
Changed Files: htroot/IndexReIndexMonitor_p.java, htroot/IndexReindexMonitor_p.html, htroot/IndexSchema_p.html, source/net/yacy/migration.java, source/net/yacy/search/index/ReindexSolrBusyThread.java
Sun May 12 21:37:45 CEST 2013
by orbiter
prevent a concurrent deletion process from causing a wrong double-check
in crawl start
Changed Files: source/net/yacy/search/Switchboard.java
Sat May 11 10:53:12 CEST 2013
by Michael Peter Christen
removed synchronization and concurrency in Fulltext class, concurrent
deletions are now handled in ConcurrentUpdateSolrConnector
Changed Files: htroot/CrawlResults.java, htroot/Crawler_p.java, htroot/IndexControlURLs_p.java, source/net/yacy/search/index/Fulltext.java
Fri May 10 17:33:02 CEST 2013
by Michael Peter Christen
added new peer icons for Mentor peers and Mentee peers (not used yet)
Changed Files: htroot/env/grafics/JuniorMentee.gif, htroot/env/grafics/SeniorMentor.gif
Fri May 10 17:32:21 CEST 2013
by Michael Peter Christen
- added ssl configuration sign (a lock) to network statistic/table
- fixed a bug in bitfield
Changed Files: htroot/Network.html, htroot/Network.java, source/net/yacy/cora/document/ASCII.java, source/net/yacy/peers/Seed.java, source/net/yacy/search/Switchboard.java, source/net/yacy/utils/bitfield.java
Fri May 10 13:49:46 CEST 2013
by Michael Peter Christen
added checkbox (near port) to switch on ssl support (https access) to
the admin interface.
Changed Files: htroot/ConfigBasic.html, htroot/ConfigBasic.java
Fri May 10 12:02:31 CEST 2013
by orbiter
Added a default keystore for ssl encryption of the YaCy web interface.
This will enable https-access to YaCy, but this feature is disabled by
default using the new server.https=false attribute. This has two
purposes:
- make it easier for everyone to use https (just set server.https=true)
- provide the basis for secure yacy-to-yacy communication in the future
Changed Files: defaults/freeworldKeystore, defaults/yacy.init, htroot/Status.java, source/net/yacy/server/serverCore.java
Fri May 10 05:54:07 CEST 2013
by reger
reduce SolrConnectorLogging setting (from default ALL to INFO)
Changed Files: defaults/yacy.logging
Fri May 10 04:56:58 CEST 2013
by Michael Peter Christen
fix for sitemap detection: the sitemap url was not visible if it
appeared after the declaration of robots allow/deny for the crawler
because the sitemap parser terminated after the allow/deny rules had
been found. Now the parser reads the robots.txt until the end to
discover also sitemap rules at the end of the file.
Changed Files: htroot/api/getpageinfo_p.java, source/net/yacy/crawler/robots/RobotsTxtParser.java
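The fix above amounts to scanning the whole robots.txt instead of stopping after the allow/deny rules. A minimal sketch of that idea follows; it is not the RobotsTxtParser code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not YaCy's RobotsTxtParser.
final class RobotsSitemapSketch {
    // Collects every "Sitemap:" entry, wherever it appears in the file.
    static List<String> sitemaps(String robotsTxt) {
        List<String> found = new ArrayList<>();
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith("#")) continue; // skip comment lines
            // the directive name is matched case-insensitively
            if (line.toLowerCase(Locale.ROOT).startsWith("sitemap:")) {
                String url = line.substring("sitemap:".length()).trim();
                if (!url.isEmpty()) found.add(url);
            }
            // allow/deny lines are ignored here; the loop keeps reading to the end
        }
        return found;
    }
}
```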
Fri May 10 04:38:13 CEST 2013
by reger
- fix monitor url of crawl job in PerformanceQueues_p.html
- reduce logging of every index add  (switch embeddedsolr.add from info to debug)
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/search/Switchboard.java
Thu May 09 03:06:48 CEST 2013
by Michael Peter Christen
removed some unnecessary synchronizations
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java
Wed May 08 23:45:29 CEST 2013
by Michael Peter Christen
merged classpath
Changed Files: .classpath
Wed May 08 16:48:45 CEST 2013
by orbiter
fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4652
generate dht data even if dht receive and dht transmission are switched
off
Changed Files: source/net/yacy/kelondro/blob/HeapModifier.java, source/net/yacy/search/Switchboard.java
Wed May 08 15:17:06 CEST 2013
by orbiter
updated pdf parser
Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/fontbox-1.8.1.License, lib/fontbox-1.8.1.jar, lib/jempbox-1.8.1.License, lib/jempbox-1.8.1.jar, lib/pdfbox-1.8.1.License, lib/pdfbox-1.8.1.jar
Wed May 08 13:26:25 CEST 2013
by Michael Peter Christen
fixes to deletion methods (removed unnecessary concurrency and added
removal of crawl queue entries)
Changed Files: htroot/Crawler_p.java, htroot/HostBrowser.java, htroot/IndexDeletion_p.java, htroot/PerformanceMemory_p.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java
Wed May 08 12:41:24 CEST 2013
by Michael Peter Christen
better robustness of Concurrent Solr Connector against update/deletion
thread failure
Changed Files: source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java
Mon May 06 14:58:18 CEST 2013
by Michael Peter Christen
increased default proxy client timeout to one minute
Changed Files: defaults/yacy.init, source/net/yacy/server/http/HTTPDProxyHandler.java
Mon May 06 14:27:39 CEST 2013
by Michael Peter Christen
draw the names of other peers which receive/send dht into the network
graphic
Changed Files: htroot/Network.html, htroot/NetworkPicture.java, source/net/yacy/peers/graphics/NetworkGraph.java, source/net/yacy/visualization/PrintTool.java, source/net/yacy/visualization/RasterPlotter.java
Sun May 05 23:39:46 CEST 2013
by Michael Peter Christen
enlarge network graph circle according to image height and reduce the
image height in the Network servlet. Overall, the image is now larger
but takes less space on the web page.
Changed Files: htroot/Network.html, source/net/yacy/peers/graphics/NetworkGraph.java
Sun May 05 05:00:42 CEST 2013
by reger
remove pre 1.0 migration statement which possibly overwrites user navigator setting
Changed Files: source/net/yacy/migration.java
Sat May 04 09:34:06 CEST 2013
by Michael Peter Christen
typo
Changed Files: htroot/IndexDeletion_p.html
Sat May 04 01:14:10 CEST 2013
by Michael Peter Christen
- added regular-expression based deletions
- on-demand collection-list generation for collection-based deletions
instead of a default collection-list presentation (this makes calling
the interface much faster since the computation of collections lists for
large indexes may take some seconds)
Changed Files: htroot/IndexDeletion_p.html, htroot/IndexDeletion_p.java
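A regular-expression based deletion can be thought of as selecting the URLs that match a pattern. The following is a minimal sketch only; the names are hypothetical, and whether the servlet matches the full URL (as here) or only a part of it is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the IndexDeletion_p servlet.
final class RegexDeletionSketch {
    // Returns the URLs that would be selected for deletion.
    // Assumption: the expression must match the complete URL.
    static List<String> selectForDeletion(List<String> urls, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String url : urls) {
            if (p.matcher(url).matches()) hits.add(url);
        }
        return hits;
    }
}
```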
Sat May 04 00:14:22 CEST 2013
by Michael Peter Christen
abstraction of catchall term
Changed Files: htroot/HostBrowser.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/QueryGoal.java
Sat May 04 00:14:00 CEST 2013
by Michael Peter Christen
added the date to error documents
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java
Fri May 03 03:55:14 CEST 2013
by reger
adjust Test case EmbeddedSolrConnector
Changed Files: test/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnectorTest.java
Fri May 03 02:03:30 CEST 2013
by Michael Peter Christen
fix for solr cache when a delete buffer is filled and a document which
is in the delete queue is replaced with a new one.
Changed Files: source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java
Fri May 03 02:02:35 CEST 2013
by Michael Peter Christen
preventing score computation in solr where applicable
Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java
Fri May 03 00:24:39 CEST 2013
by orbiter
fix for http://bugs.yacy.net/view.php?id=233
- check geolocation coordinates and accept only those which are
well-formed
- the solr push process no longer stops crawling if Solr does not accept
the record after 20 requests. Instead, a severe log entry asks the user
to create a bug report
Changed Files: source/net/yacy/document/Document.java, source/net/yacy/kelondro/data/meta/URIMetadataRow.java, source/net/yacy/search/index/Segment.java
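The commit does not say what counts as well-formed. A minimal sketch, assuming it means finite values within the usual latitude and longitude ranges (the class name is hypothetical):

```java
// Hypothetical sketch; the range check is an assumption about what
// "well-formed" means, not taken from the YaCy source.
final class GeoCheckSketch {
    static boolean wellFormed(double lat, double lon) {
        return !Double.isNaN(lat) && !Double.isNaN(lon)
            && lat >= -90.0 && lat <= 90.0
            && lon >= -180.0 && lon <= 180.0;
    }
}
```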
Thu May 02 15:47:21 CEST 2013
by sixcooler
fix for PerformanceMemory showing UNRESOLVED_PATTERN by removing
solr-cache-stuff, which is not available anymore
Changed Files: htroot/PerformanceMemory_p.html, htroot/PerformanceMemory_p.java
Tue Apr 30 11:44:56 CEST 2013
by Michael Peter Christen
remove sort order in all cases where not needed
Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java
Tue Apr 30 11:09:21 CEST 2013
by Michael Peter Christen
prevent that long-running deletion tasks block a hard commit.
Changed Files: source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java
Tue Apr 30 02:11:28 CEST 2013
by Michael Peter Christen
- added index deletion to index administration submenu
- added index deletion processes to the process scheduler/recorder
Changed Files: htroot/IndexDeletion_p.html, htroot/IndexDeletion_p.java, htroot/env/templates/submenuIndexControl.template, source/net/yacy/data/WorkTables.java
Tue Apr 30 00:03:21 CEST 2013
by Saransh Sharma
New Hindi Translation
Changed Files: locales/hi.lng
Mon Apr 29 19:30:53 CEST 2013
by Michael Peter Christen
added an index deletion servlet and some style changes for the
'dangerous' engage-button
Changed Files: htroot/IndexDeletion_p.html, htroot/IndexDeletion_p.java, skins/pdblue.css
Mon Apr 29 19:30:04 CEST 2013
by Michael Peter Christen
added another solr connector, the ConcurrentUpdateSolrConnector which
does not block when long-running updates to solr are made. This is
realized using blocking queues which process all long-running tasks in
the background. Also some bugfixes to existing connectors.
Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/CachedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/cora/federate/solr/instance/InstanceMirror.java
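The non-blocking behaviour described above follows a common pattern: callers put work on a blocking queue and a background thread drains it. Below is a minimal sketch of that pattern, not the ConcurrentUpdateSolrConnector itself; all names are hypothetical and a list stands in for the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of queue-based background updates.
final class BackgroundUpdateSketch implements AutoCloseable {
    // end marker, compared by identity so no real document can collide with it
    private static final String POISON = new String("poison");
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> stored = Collections.synchronizedList(new ArrayList<>());
    private final Thread worker;

    BackgroundUpdateSketch() {
        worker = new Thread(() -> {
            try {
                for (String doc = queue.take(); doc != POISON; doc = queue.take()) {
                    stored.add(doc); // stands in for a slow write to the index
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
    }

    // Returns immediately; the write happens in the background.
    void add(String doc) { queue.add(doc); }

    // Waits until everything queued so far has been written.
    @Override
    public void close() throws InterruptedException {
        queue.add(POISON);
        worker.join();
    }

    List<String> stored() { return new ArrayList<>(stored); }
}
```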
Mon Apr 29 19:28:17 CEST 2013
by Michael Peter Christen
added more features to ScoreMap (pretty toString)
Changed Files: source/net/yacy/cora/sorting/AbstractScoreMap.java, source/net/yacy/cora/sorting/ClusteredScoreMap.java, source/net/yacy/cora/sorting/OrderedScoreMap.java, source/net/yacy/cora/sorting/ScoreMap.java, source/net/yacy/server/serverObjects.java
Sun Apr 28 21:20:14 CEST 2013
by Michael Peter Christen
- re-introduced existById in solr connector.
- introduced raw-queries for the re-introduced byId-Queries (they are
hopefully faster than full edismax queries)
- removed the cached solr connector (testing this) to rely only on the
solr built-in search caches. That should save some RAM (also). We will
see if this is usable.
Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/CachedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/federate/solr/instance/InstanceMirror.java, source/net/yacy/search/index/Fulltext.java
Sat Apr 27 03:11:44 CEST 2013
by reger
added httpstatus_i to automatically switched on fields (used in all search queries)
Changed Files: source/net/yacy/search/Switchboard.java
Fri Apr 26 02:26:38 CEST 2013
by reger
RankingSolr_p: include warning if boost field not in local index
Changed Files: htroot/RankingSolr_p.html, htroot/RankingSolr_p.java
Wed Apr 24 01:14:35 CEST 2013
by Michael Peter Christen
added collection attribute also to the rss feed reader
Changed Files: htroot/CrawlStartSite_p.html, htroot/Load_RSS_p.html, htroot/Load_RSS_p.java, source/net/yacy/cora/document/RSSMessage.java, source/net/yacy/crawler/retrieval/RSSLoader.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/schema/CollectionConfiguration.java
Tue Apr 23 20:42:54 CEST 2013
by orbiter
added a 'collection' property attribute in yacysearch.html which can be
used to select between different collections as defined during a crawl
start with the 'collection' attribute. This actually implements the
ability to prepare search tenants which restrict their search results to
a specific collection. The main use for this is to provide tenants to
the yaml4 interface (at this time).
Changed Files: htroot/gsa/searchresult.java, htroot/yacysearch.java, source/net/yacy/cora/document/RSSMessage.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/QueryParams.java
Tue Apr 23 16:01:17 CEST 2013
by Saransh Sharma
More Translation
Changed Files: locales/de.lng, locales/hi.lng
Tue Apr 23 12:15:33 CEST 2013
by orbiter
increased row limitation for authorized users from 10000 to 100000000 in
solr interface
Changed Files: htroot/solr/select.java
Mon Apr 22 22:33:13 CEST 2013
by Michael Peter Christen
extended limitation of dom export size from 100000 to 100000000
Changed Files: source/net/yacy/search/index/Fulltext.java
Mon Apr 22 14:33:04 CEST 2013
by Michael Peter Christen
some extensions to raster plotter to transform a RGB picture to an
indexed color scheme. This is needed for gif animations
Changed Files: source/net/yacy/peers/graphics/NetworkGraph.java, source/net/yacy/visualization/RasterPlotter.java
Sun Apr 21 12:29:05 CEST 2013
by Michael Peter Christen
added transparency to gif image animation and the integration to the
YaCy httpd for on-the-fly generated gifs (including animated gifs)
Changed Files: source/net/yacy/kelondro/util/ByteBuffer.java, source/net/yacy/peers/graphics/EncodedImage.java, source/net/yacy/server/http/HTTPDFileHandler.java, source/net/yacy/visualization/AnimationGIF.java
Fri Apr 19 09:42:23 CEST 2013
by Saransh Sharma
Hello world
Changed Files: locales/hi.lng
Thu Apr 18 17:21:17 CEST 2013
by Michael Peter Christen
added new schema fields:

hreflang_url_sxt and hreflang_cc_sxt
for
http://support.google.com/webmasters/bin/answer.py?hl=de&answer=189077

navigation_url_sxt and navigation_type_sxt
for
http://googlewebmastercentral.blogspot.de/2011/09/pagination-with-relnext-and-relprev.html

publisher_url_s
for http://support.google.com/plus/answer/1713826?hl=de

all fields are disabled by default and not written to the index.
Changed Files: defaults/solr.collection.schema, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java
Wed Apr 17 16:15:27 CEST 2013
by Michael Peter Christen
checking of document signature for a double-document check now refers
only to documents within the same domain
Changed Files: source/net/yacy/search/index/Segment.java
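Restricting the double-document check to one domain can be sketched as keeping a separate set of signatures per domain. This is a hypothetical in-memory illustration; the commit itself works against the index, not a map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: signatures are only compared within the same domain.
final class DomainDuplicateSketch {
    private final Map<String, Set<Long>> seen = new HashMap<>();

    // Returns true if this domain already has a document with the signature;
    // otherwise records the signature and returns false.
    boolean isDuplicate(String domain, long signature) {
        return !seen.computeIfAbsent(domain, d -> new HashSet<>()).add(signature);
    }
}
```

The same signature on two different domains is not reported as a double in this sketch.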
Wed Apr 17 12:57:27 CEST 2013
by Michael Peter Christen
added hindi translation configuration
Changed Files: htroot/ConfigBasic.html, source/net/yacy/data/Translator.java
Wed Apr 17 11:11:55 CEST 2013
by Saransh Sharma
Hindi Some parts only
Changed Files: locales/hi.lng
Tue Apr 16 15:02:00 CEST 2013
by Michael Peter Christen
setting of new default values for ranking
Changed Files: defaults/yacy.init, source/net/yacy/search/SwitchboardConstants.java
Tue Apr 16 11:38:51 CEST 2013
by Michael Peter Christen
added in RankingSolr_p.html a select box to switch between different
ranking situations. By default, four situations can be configured.
Changed Files: htroot/RankingSolr_p.html, htroot/RankingSolr_p.java
Tue Apr 16 01:35:15 CEST 2013
by Michael Peter Christen
added new solr title_exact_signature_l and
description_exact_signature_l to be able to identify unique title and
unique description fields.
Changed Files: defaults/solr.collection.schema, source/net/yacy/cora/document/analysis/EnhancedTextProfileSignature.java, source/net/yacy/document/Condenser.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java
Sun Apr 14 20:52:40 CEST 2013
by Michael Peter Christen
added new field host_extent_i which, after a crawl and postprocessing,
holds the number of documents for the host where the document is hosted.
This is necessary for ranking and the norming of references per local
host in the ranking computation.
Changed Files: defaults/solr.collection.schema, defaults/yacy.init, htroot/IndexControlRWIs_p.java, source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java
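What host_extent_i holds can be illustrated by counting documents per host. This is a sketch over a plain list of URLs with hypothetical names; the postprocessing in the commit works on the index.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a per-host document count.
final class HostExtentSketch {
    // Maps each host to the number of given document URLs on that host.
    static Map<String, Integer> hostExtent(List<String> urls) {
        Map<String, Integer> counts = new HashMap<>();
        for (String url : urls) {
            String host = URI.create(url).getHost();
            if (host != null) counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }
}
```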
Sun Apr 14 11:30:57 CEST 2013
by Michael Peter Christen
showing now the details of references count in host browser:
external (ext), internal (int) and external hosts (hosts) for each
indexed document.
Changed Files: htroot/HostBrowser.java
Sun Apr 14 05:33:01 CEST 2013
by reger
add admin option to delete load errors from index
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java, htroot/HostBrowserAdmin_p.html
Sat Apr 13 23:04:44 CEST 2013
by Marc Nause
*) did some long overdue refactoring
Changed Files: source/net/yacy/repository/Blacklist.java
Sat Apr 13 21:50:48 CEST 2013
by Marc Nause
*) fixed encoding of query in link to map (in case geolocalization is
enabled, "Show search results for "köln" on map")
*) applied suggestions of Checkstyle plugin
Changed Files: htroot/yacysearchtrailer.java
Fri Apr 12 16:17:14 CEST 2013
by Michael Peter Christen
added three new fields for better ranking: references_internal_i,
references_external_i and references_exthosts_i. These can be used to
count and evaluate the number of external links to every web page. An
experimental ranking function could be, e.g.:
div(add(references_internal_i,product(references_external_i,references_exthosts_i)),add(clickdepth_i,1))
Changed Files: defaults/solr.collection.schema, source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/kelondro/data/citation/CitationReference.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java
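To make the experimental function above concrete, here is the same arithmetic written out in plain Java. The class and parameter names are illustrative, and the sketch assumes clickdepth is not negative.

```java
// Illustrative arithmetic only; mirrors
// div(add(internal, product(external, exthosts)), add(clickdepth, 1)).
final class ReferenceBoostSketch {
    static double boost(int internal, int external, int extHosts, int clickdepth) {
        // assumption: clickdepth >= 0, so the divisor is at least 1
        return (internal + (double) external * extHosts) / (clickdepth + 1);
    }
}
```

For a page with 4 internal references, 3 external references from 2 external hosts and a click depth of 1, this gives (4 + 3 * 2) / (1 + 1) = 5.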
Fri Apr 12 10:48:41 CEST 2013
by Michael Peter Christen
- setting the same default ranking in the solr interface as for YaCy
search interfaces if no other ranking attributes are given
- using the YaCy ranking in the GSA interface only if no GSA-style sort
attribute was given
- to avoid confusion about correct ranking attributes, only the default
'0'-ranking profile is used and not a scenario-adapted one (site, date)
because that should be configurable in the web interface before it is
actually used for ranking.
Changed Files: htroot/gsa/searchresult.java, htroot/solr/select.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java
Thu Apr 11 15:07:08 CEST 2013
by Michael Peter Christen
resume paused crawls on startup; user expects that restarts 'heal'
everything
Changed Files: source/net/yacy/search/Switchboard.java
Thu Apr 11 14:46:13 CEST 2013
by Michael Peter Christen
- showing references count and clickdepth in host browser
- fixed generation and presentation of both values
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/schema/CollectionConfiguration.java
Tue Apr 09 18:55:26 CEST 2013
by orbiter
if the crawl was paused (automatically), show the reason for pausing in
the Crawler_p servlet.
Changed Files: htroot/Crawler_p.html, htroot/Crawler_p.java
Mon Apr 08 21:25:21 CEST 2013
by reger
fix: Index Administration > Reverse Word Index (IndexControlRWIs_p)  corrected use of word search to word-hash search 
- removed duplicate QueryParams.hashes2Handles , redundant  with .hashes2Set
Changed Files: htroot/IndexControlRWIs_p.java, htroot/yacy/search.java, source/net/yacy/search/query/QueryParams.java
Sun Apr 07 10:36:05 CEST 2013
by Michael Peter Christen
added missing library after solr upgrade
Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/lucene-codecs-4.2.1.jar
Sat Apr 06 23:00:48 CEST 2013
by reger
adjust Netbeans IDE project.xml classpath for Solr 4.2.1 jars
Changed Files: nbproject/project.xml
Sat Apr 06 02:34:56 CEST 2013
by reger
comment out dead menu link
Changed Files: htroot/env/templates/submenuIndexControl.template
Sat Apr 06 02:08:01 CEST 2013
by reger
uncomment "used time" calculation for remote search log
Changed Files: htroot/AccessTracker_p.html, htroot/AccessTracker_p.java
Fri Apr 05 03:33:33 CEST 2013
by reger
improve remote search log, set "Returned Results" to transmitcount (instead of no value)
Changed Files: htroot/AccessTracker_p.html, htroot/AccessTracker_p.java
Thu Apr 04 00:40:59 CEST 2013
by reger
- fix opensearch discover err msg - webgraph not enabled - if no opensearchdescription link found in index
- remove search2.net from sample config (is down)
Changed Files: defaults/heuristicopensearch.conf, source/net/yacy/cora/federate/opensearch/OpenSearchConnector.java
Mon Apr 01 03:51:57 CEST 2013
by reger
make sure configured port is reported on recreated mySeed.txt
Changed Files: source/net/yacy/peers/Seed.java
Tue Mar 19 11:23:18 CET 2013
by Michael Peter Christen
better search timing; prevents '0 results' for very large local
indexes (>> 10 million documents)
Changed Files: htroot/yacysearchitem.java
Tue Mar 19 10:33:35 CET 2013
by Michael Peter Christen
fix in GSA result writer which evaluates result context fields as
String. After the migration to Solr 4.1.0 'some' of these fields
suddenly are stored as String[]; this patch compensates this confusion.
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java
Tue Mar 19 10:32:01 CET 2013
by Michael Peter Christen
- callback fix
- memory allocation problem in RowCollection: if memory is too low, do
not try to increase by 1 because this leads to very long execution
time and in the end to the same OOM as if we allocate the memory at the
moment we need it, even if the resource observer states that this memory
is not there. To compensate for this, the increase size is reduced.
Changed Files: htroot/portalsearch/yacy-portalsearch.js, source/net/yacy/kelondro/index/RowCollection.java
Tue Mar 19 00:59:47 CET 2013
by orbiter
renamed callback function to 'callback' because that is a standard for
jsonp which is also used in backbone.js/jquery
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/JsonResponseWriter.java
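For readers unfamiliar with jsonp: the value of the 'callback' parameter is the name of a JavaScript function that the response wraps around the JSON. The following is a minimal sketch of that wrapping, not the JsonResponseWriter code; the class name and the name check are assumptions.

```java
// Hypothetical sketch of jsonp wrapping.
final class JsonpSketch {
    // Wraps json in a call to the given function; without a callback
    // the plain JSON is returned.
    static String wrap(String callback, String json) {
        if (callback == null || callback.isEmpty()) return json;
        // assumption: only plain identifiers (with dots) are accepted
        if (!callback.matches("[A-Za-z_$][A-Za-z0-9_$.]*")) {
            throw new IllegalArgumentException("invalid callback name");
        }
        return callback + "(" + json + ")";
    }
}
```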
Sun Mar 17 22:13:56 CET 2013
by orbiter
increased number of links limitation from 1000 to 10000 for rss feeds
and html documents
Changed Files: defaults/solr.webgraph.schema, source/net/yacy/cora/document/RSSFeed.java, source/net/yacy/document/parser/htmlParser.java
Sun Mar 17 11:43:12 CET 2013
by Frank
add the new PPMbar in Crawler_p for a better style and better use.
Changed Files: htroot/Crawler_p.html, htroot/js/Crawler.js
Sun Mar 17 10:52:31 CET 2013
by orbiter
enhanced did-you-mean (a bit): can now remember previously searched
words (plus small enhancements)
Changed Files: htroot/suggest.java, source/net/yacy/cora/document/WordCache.java, source/net/yacy/data/DidYouMean.java, source/net/yacy/search/ResourceObserver.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/QueryGoal.java
Sun Mar 17 03:46:29 CET 2013
by reger
add: reset Solr schema field selection to default button in IndexSchema_p
Changed Files: htroot/IndexSchema_p.html, htroot/IndexSchema_p.java, source/net/yacy/search/Switchboard.java