Compare commits

318 Commits

Author SHA1 Message Date
Nicolas Lœuillet 1186b3b67a Merge pull request #986 from goofy-bz/patch-2
Update fr_FR.utf8.po
2015-01-02 21:23:42 +01:00
goofy-bz 30b948e68b Update fr_FR.utf8.po
formulation (trivial)
2015-01-02 21:13:54 +01:00
Thomas Citharel 1aa1461a2e Merge pull request #974 from jbfavre/fix_ftrss_call_can_break_import
Make call to FTRSS silent to avoid warnings which can break import
2014-12-26 10:31:15 +01:00
Jean Baptiste Favre 81315897f0 Make call to FTRSS silent to avoid warnings which can break import 2014-12-21 22:41:04 +01:00
Nicolas Lœuillet a460404252 Merge pull request #933 from wallabag/dev
1.8.1b
2014-11-16 21:12:47 +01:00
Thomas Citharel d0287608b6 update constant version 2014-11-16 02:29:01 +01:00
Nicolas Lœuillet 1532376710 Merge pull request #932 from wallabag/dev
1.8.1
2014-11-15 20:48:49 +01:00
Nicolas Lœuillet d3122db7b2 add DS_Store in gitignore 2014-11-08 08:25:00 +01:00
Nicolas Lœuillet b46b8933ab Merge pull request #926 from jsit/aboutlink
Fixing URL typo on about page
2014-11-05 17:24:05 +01:00
Jay Sitter 62f3e6db75 Fixing URL typo on about page 2014-11-05 11:22:22 -05:00
Thomas Citharel 217f3ca0b4 Merge pull request #921 from wallabag/about-page
add about page
2014-11-03 23:01:23 +01:00
Nicolas Lœuillet 3eba7538a4 Merge pull request #920 from jsit/uifixes
Many small UI changes/fixes to Baggy theme and English translation files
2014-11-03 22:48:07 +01:00
Nicolas Lœuillet fa6f5db97f Merge pull request #922 from jsit/menuhiding
Fixing menu hiding behavior when switching from mobile width to desktop width
2014-11-03 22:42:49 +01:00
Jay Sitter ebea829d80 Improving class names for menu hiding 2014-11-03 13:32:20 -05:00
Jay Sitter e319c49891 Fixing menu hiding behavior when switching from mobile width to desktop width 2014-11-03 13:28:58 -05:00
Jay Sitter efd0a9f5f1 Applying changes to config.twig to default theme as well 2014-11-03 12:06:44 -05:00
Jay Sitter 94888d5fd4 Removing title text from bookmarklet 2014-11-03 09:13:42 -05:00
Jay Sitter ac8b064f47 Restoring change password section in config 2014-11-03 09:11:25 -05:00
Nicolas Lœuillet 3c133bff49 add about page 2014-11-03 07:44:56 +01:00
Nicolas Lœuillet 20bb3f7f2a Merge pull request #919 from jsit/popupjs_
Improving JavaScript for popup forms
2014-11-03 06:46:18 +01:00
Jay Sitter cc1f78a83d Many small UI changes/fixes to Baggy theme and English translation files. May need review. 2014-11-02 13:37:44 -05:00
Jay Sitter ff02fd8aca Improving JavaScript for popup forms 2014-11-02 11:19:21 -05:00
Jay Sitter 063a2fadaa Removing spaces before colons on config screen 2014-11-02 10:30:13 -05:00
Jay Sitter 266b7328ef Merge branch 'dev' into uitextcase 2014-11-01 18:29:41 -04:00
Thomas Citharel 893b8e4cef Merge pull request #917 from jsit/menuopacity
Fixing opacity issue when using mobile menu (#912)
2014-10-31 20:45:40 +01:00
Jay Sitter 1772de2531 Changing my tabs to spaces :) 2014-10-31 15:38:25 -04:00
Jay Sitter 75dc3a71b7 Fixing opacity issue when using mobile menu 2014-10-31 15:37:08 -04:00
Jay Sitter 0be82dedb6 Capitalizing "EPUB" as is defined by IDPF: http://idpf.org/epub 2014-10-31 15:26:33 -04:00
Thomas Citharel 8a76674568 Merge pull request #913 from jsit/hotfix
Fixing regression in popup close button styling
2014-10-31 19:35:35 +01:00
Jay Sitter 40800c97b2 Fixing regression in popup close button styling 2014-10-31 14:34:10 -04:00
tcit 6926f6dcc7 Merge branch 'jsit-duplicateformstyles' into dev 2014-10-31 18:54:20 +01:00
tcit a63cd1b06f fix merge errors 2014-10-31 18:54:08 +01:00
tcit 9cf370cfb6 Merge branch 'jsit-duplicateformstyles' into dev 2014-10-31 18:48:02 +01:00
tcit ccaefcf69a merge 2014-10-31 18:47:54 +01:00
Jay Sitter 15eb5ca4b8 Merging changes with dev 2014-10-31 13:47:45 -04:00
Thomas Citharel 224528f1de Merge pull request #909 from jsit/closebutton
Standardizing class names and styles for close buttons
2014-10-31 18:44:42 +01:00
Jay Sitter ad2b61db80 Removing left border on popup forms on mobile widths 2014-10-31 13:32:34 -04:00
Jay Sitter 344c8f6b5c Fixing popup form width issue on narrow width 2014-10-31 13:28:39 -04:00
Jay Sitter 4bc70ed401 Making visual styling of search and bag it popup forms more consistent 2014-10-31 12:48:35 -04:00
Jay Sitter b95a6f57bf Removing duplicate popup form styles 2014-10-31 12:37:54 -04:00
Thomas Citharel 87e37e82fd Merge pull request #910 from jsit/default-theme-search-form-css
Moving search-form style out of messages css and into style.css
2014-10-31 17:34:03 +01:00
Jay Sitter 8519cc796f Moving search-form style out of messages css and into style.css 2014-10-31 12:31:13 -04:00
Jay Sitter 827bd1f899 Standardizing class names and styles for close buttons 2014-10-31 11:10:38 -04:00
Thomas Citharel ed0436d21e Merge pull request #908 from jsit/closemessage
Changing close message button to use × instead of X
2014-10-31 16:10:16 +01:00
Jay Sitter 242746fd17 Changing close message button to use × instead of X 2014-10-31 10:44:20 -04:00
Thomas Citharel f23fd0ee5e Merge pull request #907 from jsit/login
Some small design tweaks to the login page: Moving 'Stay signed in'...
2014-10-31 15:10:01 +01:00
Jay Sitter 1087b3cb4e Adding right margin to labels (to compensate for now-missing left margin on input fields) 2014-10-31 09:53:01 -04:00
Jay Sitter f60c9b00ab Some small design tweaks to the login page: Moving 'Stay signed in' label and removing left margin on input boxes 2014-10-31 09:30:57 -04:00
Thomas Citharel 6fe9b616aa Merge pull request #894 from wallabag/change-password-field
Fix #891: change type for password field in installation
2014-10-31 11:51:13 +01:00
Thomas Citharel 655550e23a Merge pull request #904 from wallabag/vagrantfile
Vagrantfile
2014-10-31 11:50:53 +01:00
Nicolas Lœuillet 4bada2b954 Merge pull request #906 from jsit/tagstitle
Uppercasing 'tags' page title
2014-10-30 21:55:46 +01:00
Jay Sitter a87a1b7d3b Uppercasing 'tags' page title 2014-10-30 16:54:24 -04:00
Nicolas Lœuillet 4fae3b0a85 Merge pull request #898 from jsit/previewtext
Fixing issue #874, displaying preview text when in list mode
2014-10-30 21:43:21 +01:00
Nicolas Lœuillet 052bdfc17e Merge pull request #897 from jsit/displaymode
Fixing display mode switching in Baggy theme (issue #896)
2014-10-30 21:43:15 +01:00
Nicolas Lœuillet 476b8902bb Merge pull request #905 from jsit/closebutton
Making the close button more visually consistent on the menu popup forms
2014-10-30 21:25:26 +01:00
Nicolas Lœuillet 6f0b92138f Merge pull request #903 from jsit/uitextcase
Fixing some more text case issues
2014-10-30 21:08:11 +01:00
Jay Sitter cd271fc485 Making the close button more visually consistent on the menu popup forms 2014-10-30 16:00:18 -04:00
Nicolas Lœuillet 0bf65303ca change database name 2014-10-30 20:43:39 +01:00
Nicolas Lœuillet c4800fc6da ignore vagrant directory 2014-10-30 20:42:17 +01:00
Nicolas Lœuillet d51c2e05d3 Vagrantfile, from @fguillot for kanboard 2014-10-30 20:40:56 +01:00
Jay Sitter ce096afed7 Fixing some more text case issues 2014-10-30 15:37:59 -04:00
Nicolas Lœuillet 06e7e7ff7b Merge pull request #902 from jsit/en_us
Adding 'en_US' locale (issue #901)
2014-10-30 20:34:08 +01:00
Jay Sitter bbbda080bf Adding 'en_US' locale (issue #901) 2014-10-30 15:32:00 -04:00
Jay Sitter 574f3faf06 Adding 'en_US' locale (issue #901) 2014-10-30 15:30:09 -04:00
Thomas Citharel b56c86457c Merge pull request #900 from jsit/uitextcase
Fixing a bunch of English translation letter casing and syntax (issue #899)
2014-10-30 18:37:58 +01:00
Jay Sitter 7212386e98 Fixing a bunch of English translation letter casing and syntax (issue #899) 2014-10-30 12:17:26 -04:00
Jay Sitter b73a175386 Fixing issue #874, displaying preview text when in list mode 2014-10-30 11:23:18 -04:00
Jay Sitter c9e6fec4bf Fixing display mode switching in Baggy theme (issue #896) 2014-10-30 11:20:05 -04:00
Nicolas Lœuillet fcd37d0c7b change type for password field in installation 2014-10-29 21:02:07 +01:00
Nicolas Lœuillet b40cd4e73f Merge pull request #889 from wallabag/fix#871
Fix#871
2014-10-27 20:58:13 +01:00
tcit 1b6e21d7a6 translation fix finished for #871 and bring add tag from search feature to all themes 2014-10-27 15:12:46 +01:00
tcit 7ee1972599 translation fix for #887 and tiny display fix 2014-10-27 14:00:47 +01:00
Thomas Citharel 24479b479d Merge pull request #888 from wallabag/updated-site-config
updated site_config
2014-10-27 09:28:30 +01:00
Nicolas Lœuillet 90a1a78b1e updated site_config 2014-10-27 06:46:13 +01:00
Nicolas Lœuillet 4a50075784 Merge pull request #883 from wallabag/hotfixepub
fix #882
2014-10-22 15:12:49 +02:00
tcit 606bea72e1 fix #882 2014-10-22 15:10:38 +02:00
Thomas Citharel 4eb603430d Merge pull request #879 from Marmo/patch-1
update zeit.de.txt for removal of inline ads
2014-10-21 19:42:21 +02:00
Marmo 76b1e0babe update zeit.de.txt for removal of inline ads 2014-10-21 19:33:40 +02:00
Nicolas Lœuillet f2248e604d Merge pull request #878 from wallabag/greybuttonread
fix #873
2014-10-20 15:07:24 +02:00
tcit f56791e6c4 fix #873 2014-10-19 11:12:25 +02:00
tcit 750d904a16 fix translation issues 2014-10-17 21:08:08 +02:00
Nicolas Lœuillet 691a03f176 Merge pull request #868 from wallabag/popupoverlap
fix for #830
2014-10-15 16:53:06 +02:00
tcit 48fb171d7a fix for #830 2014-10-15 16:47:38 +02:00
Nicolas Lœuillet 8fd0512a3c Merge pull request #848 from 11mariom/dev
Add support for custom http port
2014-10-14 19:57:16 +02:00
Nicolas Lœuillet 5b16d508b5 Merge pull request #843 from rros/mysql-utf8mb4
Convert the MySQL charset to utf8mb4 to support the full range of unicode
2014-10-14 19:56:50 +02:00
Nicolas Lœuillet 05e313ad28 Merge pull request #867 from wallabag/zindex-menu-bug
fix z-index-menu mobile view bug #834
2014-10-14 19:50:28 +02:00
tcit b9fa7d2c9c fix z-index-menu mobile view bug #834 2014-10-12 10:24:07 +02:00
Thomas Citharel 8ce508cab0 Create adme.ru.txt
Siteconfig
2014-10-12 10:00:35 +02:00
Thomas Citharel dffbec1c44 Merge pull request #865 from Marmo/patch-1
update heise.de.txt
2014-10-11 15:30:51 +02:00
Marmo ad0eccb4cd update heise.de.txt
Multi-page Telepolis-articles (www.heise.de/tp/...) are not fetched correctly atm. My addition to the single_page_link makes it work (tested with http://www.heise.de/tp/artikel/42/42579/1.html).
2014-10-11 15:22:53 +02:00
Nicolas Lœuillet 44d35257e8 Merge branch 'dev' 2014-10-10 13:33:54 +02:00
Nicolas Lœuillet cf8a5e1eed Merge branch 'master' into dev
Conflicts:
	index.php
2014-10-10 13:33:36 +02:00
Thomas Citharel 6b0894c66a Merge pull request #860 from wallabag/compatibility_file
Move compatibility file (fixes #858)
2014-10-08 21:36:51 +02:00
tcit a7058a5a13 Right redirect from the new path 2014-10-08 21:35:21 +02:00
Thomas Citharel 1403af5be3 Merge pull request #861 from wallabag/fix-query-sqlite-install
query for populate mysql/postgres was called when we chose sqlite
2014-10-08 21:32:03 +02:00
Nicolas Lœuillet 20b4d7d621 query for populate mysql/postgres was called when we chose sqlite 2014-10-08 21:23:34 +02:00
Nicolas Lœuillet 7331ed3e80 change href in install/index.php 2014-10-08 21:11:56 +02:00
Nicolas Lœuillet 79dd109e37 Fixes #858: move compatibility file into install folder 2014-10-08 21:08:21 +02:00
Thomas Citharel a305326973 Merge pull request #787 from wallabag/data-for-mysql
Add data for mysql installation, see #624
2014-10-08 19:32:39 +02:00
tcit 3dca040a0b Fix bug for #787 2014-10-08 19:31:15 +02:00
tcit 8327f1c371 Merge branch 'dev' into data-for-mysql 2014-10-08 19:26:26 +02:00
Nicolas Lœuillet 73c833780c Merge pull request #855 from wallabag/fix-828
Fix #828
2014-10-04 21:27:05 +02:00
Nicolas Lœuillet f2cc1db1a8 Merge pull request #856 from wallabag/fix-826
Fix #826
2014-10-04 20:34:40 +02:00
tcit 34c2d1bdd1 get content 2014-10-04 20:17:00 +02:00
Nicolas Lœuillet 29e95769b5 Merge pull request #854 from wallabag/saveclick2search
Saveclick2search (fix for #831)
2014-10-04 20:13:10 +02:00
tcit e3c44f9c0f get full content 2014-10-04 19:45:02 +02:00
tcit 40d2042228 small fix for better width for search translations full display 2014-10-04 19:08:56 +02:00
tcit ab494e4ede translate search messages 2014-10-04 19:01:43 +02:00
tcit 1cd02d55fb autofocus on all themes 2014-10-04 18:51:43 +02:00
tcit f183f72bf4 Merge branch 'dev' into saveclick2search 2014-10-04 18:47:56 +02:00
tcit 8b6c710b09 fixed bug in config screen for default theme 2014-10-04 18:45:43 +02:00
tcit 04b589420e search field selected 2014-10-04 18:44:18 +02:00
Thomas Citharel e38e46ecdb Merge pull request #853 from wallabag/fix-for-#797
Fix for #797
2014-10-04 17:50:01 +02:00
tcit ace428669b fix for #758 2014-09-28 19:12:28 +02:00
tcit b37110cc82 Merge branch 'issue-844' of https://github.com/rros/wallabag into dev 2014-09-28 17:48:06 +02:00
tcit cde2fc3842 Merge branch 'dev' of https://github.com/wallabag/wallabag into dev 2014-09-28 17:32:50 +02:00
tcit ffcd442989 get up to date for merge 2014-09-28 17:31:02 +02:00
Thomas Citharel 76dd27e7f7 Merge pull request #802 from tcitworld/traductionfix
Traductionfix
2014-09-28 17:25:40 +02:00
tcitworld a0822259e7 Merge pull request #841 from wallabag/fixGDdetection
Fix #766 - GD detection
2014-09-27 18:13:35 +02:00
tcit 9b8283d0fc Merge branch 'refactor' into dev 2014-09-27 17:54:24 +02:00
tcit 04a7674bdd merge refactor and dev 2014-09-27 17:54:13 +02:00
Mariusz Kozakowski 2d4cfc58ec Add support for custom http port
Now you can use wallabag behind reverse proxy (i.e Squid or Varnish)
without problem with urls like wallabag.example.com:8080.
2014-09-23 18:44:14 +02:00
Robert Ros 0dc4797a4c Fix the PostgreSQL install errors 2014-09-21 00:39:40 +02:00
Robert Ros b668db242d Convert the MySQL charset to utf8mb4 to support the full range of unicode characters 2014-09-18 22:29:22 +02:00
Thomas Citharel bbfe6fa50b Fix #766 - GD detection 2014-09-17 16:36:10 +02:00
Nicolas Lœuillet a15108e65b Merge pull request #839 from wallabag/fixlocalpictures
fix pictures display when DOWNLOAD_PICTURES is enabled
2014-09-16 21:18:41 +02:00
Thomas Citharel aa1083bdac fix pictures display when DOWNLOAD_PICTURES is enabled 2014-09-16 20:27:03 +02:00
Nicolas Lœuillet b3c720b1c3 Merge pull request #836 from akoenig/x-forwarded-port
Implemented additional check for using the 'X-Forwarded-Port' header.
2014-09-16 20:09:58 +02:00
Nicolas Lœuillet 657245dcbd Merge pull request #771 from tcitworld/refactor
fixed bug for epub export #755 ; also better metadata title
2014-09-16 15:21:12 +02:00
André König 5af2555f59 Implemented additional check for using the 'X-Forwarded-Port' header. 2014-09-11 13:17:19 +02:00
tcitworld 49882dc151 Merge pull request #819 from wallabag/fixSQLiteDownloadDB
Fix downloading SQLite database from all users
2014-09-10 20:05:07 +02:00
Nicolas Lœuillet 19438d3021 Merge pull request #816 from zinnober/dev
Complete rework of faz.net-template
2014-09-03 13:14:29 +02:00
Thomas Citharel d5c481c2f4 remove old function 2014-08-28 21:01:43 +02:00
Thomas Citharel 8763e4efde Fix downloading SQLite database from all users 2014-08-26 12:43:56 +02:00
zinnober ecb8c1389c Complete rework of faz.net-template adding multipage support and major article cleanup 2014-08-23 16:47:29 +02:00
Maryana Rozhankivska d05f5eeb1d added moreQueries for postgresql 2014-08-21 19:07:19 +03:00
Thomas Citharel 4362417495 Merge branch 'dev' of https://github.com/wallabag/wallabag into dev 2014-08-21 16:42:22 +02:00
tcitworld a9bbe11169 Merge pull request #814 from wallabag/fix-issue813
vendor dir is not accessible before install, sqlite db dir write check moved into db class
2014-08-21 16:28:16 +02:00
Thomas Citharel 45e60cb52a Merge branch 'dev' of https://github.com/wallabag/wallabag into dev 2014-08-21 16:24:13 +02:00
Maryana Rozhankivska 211068ce50 vendor dir is not accessible before install, sqlite db dir write check moved into db class 2014-08-21 17:17:36 +03:00
Nicolas Lœuillet 051f7fb28c Merge pull request #783 from wallabag/message-after-login
#763 fix to display the login successful message with the translation
2014-08-18 14:41:09 +02:00
Nicolas Lœuillet 79666a3046 Merge pull request #784 from wallabag/fix-successful-add-message
fix display of 'Done' message when we add a link from 'save a link' item
2014-08-18 14:40:54 +02:00
Nicolas Lœuillet 78abff6a52 Merge pull request #785 from wallabag/change-default-pagination
change default pagination, set it to 12, to have a nice baggy display
2014-08-18 14:40:17 +02:00
Thomas Citharel 1daa8e4a0f merge fix 776 2014-08-16 00:54:46 +02:00
Maryana Rozhankivska dc76489221 minimum of control on server side added 2014-08-15 19:22:55 +03:00
tcitworld 7c503c4438 Fix for #797 2014-08-05 22:19:46 +02:00
tcit a34d920847 Improved instructions 2014-08-03 18:17:43 +02:00
tcit 2e8625c25f little fix 2014-07-29 22:18:15 +02:00
tcit 280972a66c changes in all themes 2014-07-26 12:44:55 +02:00
tcit 200c758ff4 Translations 2014-07-26 12:42:48 +02:00
tcit 9f3477a279 precision 2014-07-25 08:42:30 +02:00
tcit 046b931624 added email field 2014-07-25 08:42:03 +02:00
Nicolas Lœuillet 70549136ba link to guidelines in contributing file 2014-07-25 07:52:00 +02:00
Nicolas Lœuillet 6c0c750000 thank you @mariroz & @tcitworld :) 2014-07-25 07:50:56 +02:00
Nicolas Lœuillet 2f3c05651e guidelines for wallabag 2014-07-25 07:50:15 +02:00
Nicolas Lœuillet fa9a7bbb3c Merge branch 'fix/securityAllowedActions' into dev 2014-07-25 07:27:21 +02:00
Nicolas Lœuillet 830612f555 typo 2014-07-25 07:26:56 +02:00
Nicolas Lœuillet af8292c1de Merge branch 'fix/securityMaster' 2014-07-24 21:41:16 +02:00
Nicolas Lœuillet 38cf3413df 1.7.2 2014-07-24 21:41:01 +02:00
Maryana Rozhankivska 800868e27e security fix 2014-07-24 17:47:23 +03:00
Maryana Rozhankivska 7dd8b5026d security issue 2014-07-24 16:48:41 +03:00
Nicolas Lœuillet 6da20812ce Merge branch 'dev' of github.com:wallabag/wallabag into dev 2014-07-23 13:45:07 +02:00
Nicolas Lœuillet 887b015def Merge branch 'refactor' into dev 2014-07-23 13:44:48 +02:00
Nicolas Lœuillet 505a74ad1d Merge branch 'dev' into refactor
Conflicts:
	check_setup.php
	index.php
2014-07-23 13:42:30 +02:00
Nicolas Lœuillet 83cac9ac05 Merge pull request #789 from wallabag/feature/someMoreSitesConfig
config for habrahabr.ru to grab articles with comments
2014-07-23 13:38:21 +02:00
Nicolas Lœuillet a818ff2000 removed permissions test on htmlpurifier 2014-07-23 13:35:19 +02:00
Maryana Rozhankivska 0ce85e0a7f config for habrahabr.ru to grep articles with comments 2014-07-23 14:27:57 +03:00
Nicolas Lœuillet 86edff4447 Add data for mysql installation, see #624 2014-07-22 21:48:21 +02:00
Nicolas Lœuillet ebd6bf6007 Merge branch 'anno1337-dev' into dev 2014-07-22 21:45:21 +02:00
Nicolas Lœuillet 1f78bd8471 Merge branch 'dev' of github.com:anno1337/wallabag into anno1337-dev 2014-07-22 21:26:02 +02:00
Nicolas Lœuillet f83ffc3ac3 Merge branch 'feature/programmingCodeSyntaxHighlighting' into dev 2014-07-22 19:33:34 +02:00
Nicolas Lœuillet 392f9a1b9c Merge branch 'dev' into feature/programmingCodeSyntaxHighlighting 2014-07-22 19:32:24 +02:00
Maryana Rozhankivska 9f8541ef2a highlight.js library added to highlight programming code examples in article view 2014-07-22 20:17:15 +03:00
Nicolas Lœuillet cca9284b6a change default pagination, set it to 12, to have a nice baggy display 2014-07-22 18:14:41 +02:00
Nicolas Lœuillet 3e87066506 fix display of 'Done' message when we add a link from 'save a link' item 2014-07-22 18:12:03 +02:00
Nicolas Lœuillet 9cf6bac1a5 fix to display the login successful message with the translation 2014-07-22 18:01:27 +02:00
Nicolas Lœuillet b738bea9ca Fix #776 2014-07-22 16:37:13 +02:00
Marc Wilhelm 9c67b1b829 Split up check_setup.php into two files. The new file check_essentials.php takes care of stuff like the PHP version and is executed before the config files are included which are needed by check_setup. This patch addresses issue #773 2014-07-22 11:52:18 +02:00
Nicolas Lœuillet 955fc67438 Merge pull request #775 from wallabag/feature/someMoreSitesConfig
issue #750 - config for dn.pt site added
2014-07-21 21:31:45 +02:00
Nicolas Lœuillet 91b6be3186 Merge branch 'skibbipl-dev' into dev 2014-07-21 21:22:34 +02:00
Nicolas Lœuillet 17065e613f Merge branch 'dev' of github.com:skibbipl/wallabag into skibbipl-dev
Conflicts:
	locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.mo
	locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.po
2014-07-21 21:21:55 +02:00
skibbipl cec19bd866 Updated polish translation 2014-07-21 20:58:58 +02:00
Maryana Rozhankivska 5594d7d054 issue #750 - config for dn.pt site added 2014-07-21 19:34:59 +03:00
tcit 2b58426b2d fixed bug for epub export #755 ; also better metadata title 2014-07-20 00:45:45 +02:00
Nicolas Lœuillet 6a4bbf0fe5 Merge branch 'refactor' of github.com:wallabag/wallabag into refactor 2014-07-18 11:29:05 +02:00
Nicolas Lœuillet 8e68391a57 remove .idea in gitignore 2014-07-18 11:28:49 +02:00
Nicolas Lœuillet 93edcab52e Merge pull request #764 from tcitworld/refactor
Refactor Flattr class.
2014-07-17 16:06:53 +02:00
tcitworld ccd0b381b6 camelCase for FlattrItem class (following) 2014-07-17 15:42:59 +02:00
tcitworld d259f73665 camelCase for FlattrItem class 2014-07-17 15:34:55 +02:00
Nicolas Lœuillet 0f6273cdb8 Merge pull request #761 from wallabag/dev
1.7.1
2014-07-15 11:49:24 +02:00
Nicolas Lœuillet 4e067ceabd updated specific configuration for parsing 2014-07-13 10:15:40 +02:00
Nicolas Lœuillet 58dbe10388 #584 check permissions for HTMLPurifier/DefinitionCache/Serializer folder 2014-07-12 22:08:48 +02:00
Nicolas Lœuillet d423113b00 #683 Rename « home » into « unread » 2014-07-12 21:50:29 +02:00
Nicolas Lœuillet 26452f891f Merge pull request #752 from mariroz/dev
fix of issue #650, #619 and other similar, error in JSLikeHTMLElement: node no longer exists.
2014-07-12 19:28:16 +02:00
Nicolas Lœuillet 2f26729c84 Refactor 2014-07-12 19:01:11 +02:00
Nicolas Lœuillet b6a3c8866a forgot run() call 2014-07-12 16:41:55 +02:00
Nicolas Lœuillet d610968932 ignore my PHPStorm config 2014-07-12 16:40:00 +02:00
Nicolas Lœuillet 26b77483ee remove PicoFarad
I’ll implement it an other day.
2014-07-12 16:39:31 +02:00
Nicolas Lœuillet d14e3f1e22 Merge pull request #754 from sinisterstuf/about.com
Add support for *.about.com
2014-07-12 15:10:05 +02:00
Nicolas Lœuillet b3cda72e93 PicoFarad framework for routing 2014-07-11 17:06:51 +02:00
Nicolas Lœuillet 3602405ec0 WHAT. A. BIG. REFACTOR. + new license (we moved to MIT one) 2014-07-11 16:03:59 +02:00
Siôn Le Roux d59536deea Add support for *.about.com
Includes next_page_link for multi-page articles and strips pesky in-line
'next' links from the article body. Also includes an Xpath for author
but I can't see where this is used in the wallabag UI.

The 'tidy' option is turned off because it messed up bulleted lists.

Tested with psychology.about.com and food.about.com.
2014-07-11 00:04:24 +02:00
Nicolas Lœuillet 6400371ff9 I removed my previous commit. We have to create a new branch for that. 2014-07-10 13:17:04 +02:00
Maryana Rozhankivska c1aad6d574 fix of issue #619 and other similar, error in JSLikeHTMLElement: node no longer exists. 2014-07-09 16:56:52 +03:00
Maryana Rozhankivska cc1ec61b85 fix of issue #619 and other similar, error in JSLikeHTMLElement: node no longer exists. 2014-07-09 16:50:52 +03:00
Nicolas Lœuillet c710f977b2 new call for having domain name in entry view 2014-07-08 21:57:53 +02:00
Nicolas Lœuillet 5425b0dd82 new fields in database, reading time / date and domain name are stored 2014-07-08 21:46:32 +02:00
Nicolas Lœuillet 4247b37551 Merge pull request #751 from mariroz/dev
quick fix of issue #750: multipage content for politico.com/magazine articles
2014-07-07 21:11:07 +02:00
Maryana Rozhankivska 82980a148b quick fix of issue #750: multipage content for politico.com/magazine articles 2014-07-07 19:17:55 +03:00
Nicolas Lœuillet c13aac1bc3 1.7.1 2014-07-05 15:49:40 +02:00
tcitworld da87848cee new config file, fix for #740 2014-07-01 10:18:44 +02:00
tcitworld 25052a76ca fix for #738 2014-06-30 23:24:46 +02:00
tcitworld a13ff95777 security check 2014-06-30 22:15:55 +02:00
tcitworld cdda041a90 Merge pull request #737 from mariroz/dev
fix of issue #677: When downloading images, wallabag doesn't respect html "base" tag, tnx to @fivefilters
2014-06-25 19:33:28 +02:00
Maryana Rozhankivska 6924253423 fix of issue #677: When downloading images, wallabag doesn't respect html "base" tag, tnx to @fivefilters 2014-06-25 20:00:00 +03:00
tcitworld 69213014d1 Merge pull request #736 from mariroz/dev
fix of issue #718: Error parsing file imported from Pocket #718
2014-06-25 18:54:39 +02:00
Maryana Rozhankivska aa126ba458 fix of issue #718: Error parsing file imported from Pocket #718 2014-06-25 19:34:14 +03:00
tcitworld c9563378ea Merge pull request #728 from Draky50110/dev
FR typo fixes after review.
2014-06-12 23:30:53 +02:00
draky50110 ba22fb1cef minor typo 2014-06-12 23:10:26 +02:00
draky50110 29cd317aff end of FR typo correction 2014-06-12 22:21:44 +02:00
draky50110 0bf95d865a Revert "Typo FR (suite)"
This reverts commit 7f186e21e0.

Conflicts:
	locale/fr_FR.utf8/LC_MESSAGES/fr_FR.utf8.mo
	locale/fr_FR.utf8/LC_MESSAGES/fr_FR.utf8.po
2014-06-12 22:16:04 +02:00
draky50110 ae43ec99d9 typo FR 3 2014-06-12 20:32:02 +02:00
draky50110 7f186e21e0 Typo FR (suite) 2014-06-12 18:55:38 +02:00
tcitworld bca2853ade Merge pull request #724 from Draky50110/dev
typo FR
2014-06-12 09:34:05 +02:00
draky50110 97d54f2ac8 typo FR 2014-06-12 01:00:49 +02:00
tcitworld 8142d4b1e6 Merge pull request #722 from tcitworld/dev
do not output debug while generating epub
2014-06-07 16:38:39 +02:00
tcit 35d4e27588 up to date 2014-06-07 16:36:57 +02:00
tcit ec15d0a784 do not debug inside an epub 2014-06-07 15:53:39 +02:00
tcitworld c93a5c137f Merge pull request #716 from mariroz/dev
error reporting level set in E_ALL & ~E_NOTICE by default, can be overridden in config
2014-06-05 16:59:39 +02:00
Maryana Rozhankivska 752cd4a8ef error reporting level set in E_ALL & ~E_NOTICE by default, can be overridden in config 2014-06-02 18:00:09 +03:00
tcitworld 5d198e2b98 Merge pull request #715 from mariroz/dev
fix of undefined ATOM constant warning in full-text-rss, will fix ios-app issue #14
2014-06-01 19:06:45 +02:00
Maryana Rozhankivska 1d14e65315 fix of undefined ATOM constant warning in full-text-rss, will fix ios-app issue #14 2014-06-01 19:49:22 +03:00
Nicolas Lœuillet 67a8848aed Merge pull request #713 from mariroz/dev
small xss vulnerability and translation ability fix
2014-05-30 16:51:13 +02:00
Maryana Rozhankivska 30bd273580 small xss vulnerability and translation ability fix 2014-05-30 17:17:34 +03:00
Maryana Rozhankivska cbc75befb5 small xss vulnerability and translation ability fix 2014-05-30 17:14:53 +03:00
Nicolas Lœuillet a9f5e572dd Merge pull request #712 from wallabag/dev
1.7, call me "Premium version"
2014-05-29 18:54:06 +02:00
Nicolas Lœuillet 8038b38802 1.7, premium version :) 2014-05-29 18:52:34 +02:00
Nicolas Lœuillet 79024eb004 fix #344 FQDN with non-standard ports broken 2014-05-29 18:32:55 +02:00
Nicolas Lœuillet 0c3db64585 Merge pull request #711 from mariroz/dev
fix of uninitialized object warning, issue #710
2014-05-29 16:04:26 +02:00
Maryana Rozhankivska 3dc8d84229 fix of uninitialized object warning, issue #710 2014-05-29 16:35:00 +03:00
Nicolas Lœuillet 87f01ea2e9 Merge pull request #707 from mariroz/dev
update to 3.2 version of full-text-rss, issue #694
2014-05-29 12:50:28 +02:00
Maryana Rozhankivska 0b9bb8cb78 add dailymotion videos, issue #708 2014-05-26 14:29:18 +03:00
Maryana Rozhankivska 009669360d fix of onmouseover displaying of wallabag a link in wallabag form in default theme 2014-05-24 11:21:43 +03:00
Maryana Rozhankivska a342945b61 fix of rearch form popup in default theme 2014-05-24 11:08:39 +03:00
Maryana Rozhankivska 1fce49fac7 full-text rss config file for ted.com, issue #676 2014-05-23 19:29:01 +03:00
Maryana Rozhankivska a50583fb97 last 3 important changes to 3.2 version of full-text-rss, issue #694 2014-05-23 19:27:17 +03:00
Maryana Rozhankivska d18ff7d956 two small unimportant forgotten changes to 3.2 version of full-text-rss, issue #694 2014-05-23 19:25:48 +03:00
Maryana Rozhankivska 3ec62cf95a update to 3.2 version of full-text-rss, issue #694 2014-05-22 17:16:38 +03:00
tcitworld ab157bbb75 Merge pull request #706 from gboudreau/dev
Fixed Baggy theme CSS for Chrome Extension (and < 500px width layout)
2014-05-21 22:13:31 +02:00
Guillaume Boudreau f61ffec352 Fixed Baggy theme CSS for Chrome Extension (and < 500px width layout) 2014-05-21 15:54:54 -04:00
tcitworld 88f0e31622 Merge pull request #703 from tcitworld/images_security
Security fixes for downloaded images (thanks @leblanc-simon)
2014-05-20 11:55:20 +02:00
tcitworld 38eecef26b Added info for DOWNLOAD_PICTURES
We regenerate pictures, it might take some time
2014-05-20 11:46:05 +02:00
tcitworld 99408dfcf3 Merge pull request #1 from leblanc-simon/images_security
Optimization and error handling
2014-05-20 11:42:22 +02:00
Simon Leblanc 0bf0dfe10d Optimization and error handling 2014-05-20 00:42:51 +02:00
tcit e3b00bcaf5 Fixed bug for png images 2014-05-19 15:59:18 +02:00
tcit 6caba976ec Bug with bracket 2014-05-19 15:34:49 +02:00
tcit 1d6a9ac25a Option for setting quality 2014-05-19 15:24:11 +02:00
tcit 03e501dedd Merge branch 'dev' into images_security 2014-05-19 15:04:09 +02:00
tcit 18209292a4 Fix bad character encoding when downloading images 2014-05-19 15:01:36 +02:00
tcit 007f26e582 Security fix for Download Images 2014-05-18 22:11:56 +02:00
tcitworld 04b43dc097 Merge pull request #697 from tcitworld/dev
Up-to-date with me
2014-05-18 19:00:51 +02:00
tcit becc5bfbf2 Fix for #678 2014-05-18 18:56:20 +02:00
tcitworld 230fa05eb7 Better Grammar 2014-05-17 20:32:38 +02:00
tcitworld 96834a47b0 Added PHPePub 2014-05-16 17:38:19 +02:00
tcit e212e6b12a Cleaned epub function 2014-05-16 16:49:09 +02:00
tcit 404adf970d Merge branch 'epub' into dev
Conflicts:
	themes/baggy/home.twig
2014-05-16 16:31:18 +02:00
tcitworld 7d5d9ea449 Merge pull request #693 from tcitworld/dev
better wallabag in wallabag button (see #685)
2014-05-16 15:54:54 +02:00
tcit 74e09e562b better wallabag in wallabag button (see #685) 2014-05-16 15:34:13 +02:00
Nicolas Lœuillet 60c3a4d3e1 Merge pull request #689 from tcitworld/dev
Tag-related features
2014-05-16 13:31:53 +02:00
tcit f3f0b11393 Better names for epub files and epub in all themes now 2014-05-15 15:42:36 +02:00
tcit f2b6b4e230 Fix bugs and improved epub rendering 2014-05-14 22:03:16 +02:00
tcit 6a3c510157 Merge branch 'dev' of https://github.com/wallabag/wallabag into epub 2014-05-14 19:17:51 +02:00
tcit 4555c38d3b Changed theme for better translation method 2014-05-14 19:08:33 +02:00
tcit 24696800e5 Simplified function which does tag cleanup 2014-05-14 18:51:02 +02:00
Nicolas Lœuillet 818b186f8a Merge pull request #680 from m-r-r/fr-1.6.1
French translation update
2014-05-13 16:44:16 +02:00
tcit 4910af33ff Removed unnecessary function for cleaning tags 2014-05-11 16:14:47 +02:00
tcit 0626e52f3c Merge branch 'dev' of https://github.com/wallabag/wallabag into dev 2014-05-11 15:44:43 +02:00
tcitworld f034640ca3 Added advices to submit a bug
Discussion at #662.
2014-05-11 15:33:44 +02:00
m-r-r 1829b362fc Updated the french translation 2014-05-10 20:11:00 +02:00
tcit 34acb02cbb Added translation capabilities for epub system 2014-05-07 12:48:46 +02:00
tcit 4877836b12 Many improvements to epub produced : better cover, better tags 2014-05-07 12:40:09 +02:00
Nicolas LOEUILLET 07ed2b0231 #568 missing rss.png files 2014-05-02 17:04:08 +02:00
tcit 9c743ab965 Clean old unused tags when deleting a tag 2014-04-30 12:14:20 +02:00
tcit 78bddb22be Remove unnecessary line 2014-04-30 11:31:21 +02:00
tcit decc23aaf2 Added save search as tag functionality 2014-04-30 11:25:03 +02:00
tcit 2395a3802a Changed template organisation 2014-04-29 19:57:59 +02:00
tcit 7ec445b06e Big changes for epub export. Now possible to do it from a tag, a category and a search. Also, improved ebook rendering. 2014-04-25 16:20:25 +02:00
Nicolas Lœuillet 21f29fe492 Merge pull request #667 from mariroz/dev
fix of putenv warning under safe mode, issue #646
2014-04-25 13:03:09 +02:00
Maryana Rozhankivska 5b5e47c3ae fix of already defined constant notice, issue #653 2014-04-25 13:41:42 +03:00
Maryana Rozhankivska b6413975c3 fix of putenv warning under safe mode, issue #646 2014-04-25 13:25:03 +03:00
Nicolas Lœuillet d151b51c67 Merge pull request #665 from mariroz/dev
fix of rss headers problem
2014-04-24 18:05:16 +02:00
Maryana Rozhankivska 827f5b42a6 fix of rss headers problem 2014-04-24 11:48:00 +03:00
tcit ef17914960 Fix for #664 - Missing source url attribute in RSS feeds 2014-04-24 09:39:50 +02:00
tcit 72a857158c Fixed a bug into PHPePub with special caracters 2014-04-24 03:08:31 +02:00
tcit 87090d8ae7 Added epub export function 2014-04-24 03:04:02 +02:00
tcit 8af31ae0f7 Added default statement for db system 2014-04-23 22:48:33 +02:00
tcit feecea2806 Fix #657 and postgresql problems 2014-04-23 22:39:43 +02:00
tcit 07da861126 Added multi-user functionnality in all themes 2014-04-23 10:53:57 +02:00
tcit c97d23c533 Merge branch 'dev' of https://github.com/wallabag/wallabag into dev 2014-04-23 10:34:22 +02:00
tcit 4d99bae893 Fixed Multi-user system 2014-04-23 10:29:53 +02:00
Nicolas Lœuillet 2a6440c134 Merge pull request #659 from mariroz/dev
last error: config.inc.default.php missed
2014-04-22 21:44:12 +02:00
Maryana Rozhankivska 4dbba60439 last error: config.inc.default.php missed 2014-04-22 22:38:10 +03:00
Nicolas Lœuillet 29cf52b677 Merge pull request #658 from mariroz/dev
inc/poche/config.inc.php.new removed
2014-04-22 20:48:42 +02:00
Maryana Rozhankivska fd86559a5b inc/poche/config.inc.php.new removed 2014-04-22 21:39:03 +03:00
Nicolas Lœuillet d70dd7ac69 Merge pull request #653 from mariroz/dev
print view fixed in baggy; print link added; read percent added in default theme; archive and favorite re-factored to be ajax action in article view
2014-04-22 20:18:46 +02:00
Maryana Rozhankivska 43c7b978c3 config.inc.php.new renamed in config.inc.default.php 2014-04-22 20:58:40 +03:00
Nicolas Lœuillet dfff18f81b Merge pull request #655 from julien-m/patch-2
parse mobile.lemondeinformatique.fr
2014-04-22 18:24:17 +02:00
Maryana Rozhankivska a4a870e1ec default config is included after user config to prevent undefined constant warning 2014-04-22 19:11:26 +03:00
jmlrt 03303cd71b parse mobile.lemondeinformatique.fr
site_config file created because the title isn't parsed by default for mobile version of lemondeinformatique.fr
2014-04-22 14:44:31 +02:00
Maryana Rozhankivska c2cf7075c2 print view fixed in baggy; print link added; read percent added in default theme; archive and favorite re-factored to be ajax action in article view 2014-04-22 10:45:09 +03:00
Nicolas Lœuillet bfe1ad6dbc Merge pull request #643 from wallabag/fix-636
[fix] rss feed content type set to text/xml #636
2014-04-15 21:49:37 +02:00
Nicolas Lœuillet 6212acfc81 [fix] rss feed content type set to text/xml #636 2014-04-15 21:49:00 +02:00
Nicolas Lœuillet 847f57686e Merge pull request #642 from jplock/patch-2
Added IF NOT EXISTS
2014-04-14 15:42:09 +02:00
Justin Plock 44fd0faa23 Added IF NOT EXISTS 2014-04-14 09:40:35 -04:00
Nicolas Lœuillet df6c8b3be9 Merge pull request #637 from Bubbendorf/dev
Improvements on french translation
2014-04-13 12:50:48 +02:00
Amaury Carrade a192c21c6a Improved french translation 2014-04-13 00:02:03 +02:00
Nicolas Lœuillet f3312ce58d Merge pull request #635 from DmitrySandalov/dev
added missing image to dmagenta theme
2014-04-11 22:56:59 +02:00
Dmitry Sandalov 603ecb0052 added missing image to dmagenta theme 2014-04-12 00:08:11 +04:00
1093 changed files with 24283 additions and 11950 deletions

4
.gitignore vendored

@ -1,7 +1,9 @@
.DS_Store
assets/*
cache/*
vendor
composer.phar
db/poche.sqlite
inc/poche/config.inc.php
inc/3rdparty/htmlpurifier/HTMLPurifier/DefinitionCache/Serializer/
inc/3rdparty/htmlpurifier/HTMLPurifier/DefinitionCache/Serializer/
.vagrant


@ -1,11 +1,30 @@
# How contributing
# How to contribute
## You found a bug
Please [open a new issue](https://github.com/wallabag/wallabag/issues/new).
To fix the bug quickly, we need some infos:
* your wallabag version (in ./index.php)
* your wallabag version (on top of the ./index.php file, and also on config page)
* your webserver installation :
* type of hosting (shared or dedicaced)
* in case of a dedicaced server, the server and OS used
* the php version used, eventually `phpinfo()`
* which storage system you choose at install (SQLite, MySQL/MariaDB or PostgreSQL)
* any problem on the `wallabag_compatibility_test.php` page
* any particular details which could be related
If relevant :
* the link you want to save and which causes problem
* the file you want to import into wallabag, or just an extract
If you have the skills :
* enable DEBUG mode and look the output at cache/log.txt
* look for errors into php and server logs
Note : If you have large portions of text, use [Github's Gist service](https://gist.github.com/) or other pastebin-like.
## You want to fix a bug or to add a feature
Please fork wallabag and work with **the dev branch** only. **Do not work on master branch**.
Please fork wallabag and work with **the dev branch** only. **Do not work on master branch**.
[Don't forget to read our guidelines](https://github.com/wallabag/wallabag/blob/dev/GUIDELINES.md).


@ -1,14 +1,19 @@
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (c) 2013-2014 Nicolas Lœuillet
Copyright (C) 2004 Sam Hocevar <sam@hocevar.net>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished
to do so, subject to the following conditions:
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


@ -1,7 +1,6 @@
wallabag is based on :
* PHP Readability https://bitbucket.org/fivefilters/php-readability
* Full Text RSS http://code.fivefilters.org/full-text-rss/src
* Encoding https://github.com/neitanod/forceutf8
* logo by Maylis Agniel https://github.com/wallabag/logo
* icons http://icomoon.io
* PHP Simple HTML DOM Parser (for Pocket import) http://simplehtmldom.sourceforge.net/
@ -9,7 +8,10 @@ wallabag is based on :
* Twig http://twig.sensiolabs.org
* Flash messages https://github.com/plasticbrain/PHP-Flash-Messages
* Pagination https://github.com/daveismyname/pagination
* PHPePub https://github.com/Grandt/PHPePub/
wallabag is developed by Nicolas Lœuillet under the Do What the Fuck You Want to Public License
wallabag is mainly developed by Nicolas Lœuillet under the MIT License
Contributors : https://github.com/wallabag/wallabag/graphs/contributors
Thank you so much to @tcitworld and @mariroz.
Contributors : https://github.com/wallabag/wallabag/graphs/contributors

53
GUIDELINES.md Normal file

@ -0,0 +1,53 @@
# Guidelines for wallabag
If you want to contribute to wallabag, you have some rules to respect. These rules were defined by [PHP Framework Interop Group](http://www.php-fig.org).
## Basic Coding Standard (PSR-1)
This section of the standard comprises what should be considered the standard coding elements that are required to ensure a high level of technical interoperability between shared PHP code.
* Files MUST use only `<?php` and `<?=` tags.
* Files MUST use only UTF-8 without BOM for PHP code.
* Files SHOULD either declare symbols (classes, functions, constants, etc.) or cause side-effects (e.g. generate output, change .ini settings, etc.) but SHOULD NOT do both.
* Namespaces and classes MUST follow [PSR-0](https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-0.md).
* Class names MUST be declared in `StudlyCaps`.
* Class constants MUST be declared in all upper case with underscore separators.
* Method names MUST be declared in `camelCase`.
You can read details on [PHP FIG website](http://www.php-fig.org/psr/psr-1/).
## Coding Style Guide (PSR-2)
This guide extends and expands on PSR-1, the basic coding standard.
The intent of this guide is to reduce cognitive friction when scanning code from different authors. It does so by enumerating a shared set of rules and expectations about how to format PHP code.
The style rules herein are derived from commonalities among the various member projects. When various authors collaborate across multiple projects, it helps to have one set of guidelines to be used among all those projects. Thus, the benefit of this guide is not in the rules themselves, but in the sharing of those rules.
* Code MUST follow PSR-1.
* Code MUST use 4 spaces for indenting, not tabs.
* There MUST NOT be a hard limit on line length; the soft limit MUST be 120 characters; lines SHOULD be 80 characters or less.
* There MUST be one blank line after the `namespace` declaration, and there MUST be one blank line after the block of `use` declarations.
* Opening braces for classes MUST go on the next line, and closing braces MUST go on the next line after the body.
* Opening braces for methods MUST go on the next line, and closing braces MUST go on the next line after the body.
* Visibility MUST be declared on all properties and methods; `abstract` and `final` MUST be declared before the visibility; `static` MUST be declared after the visibility.
* Control structure keywords MUST have one space after them; method and function calls MUST NOT.
* Opening braces for control structures MUST go on the same line, and closing braces MUST go on the next line after the body.
* Opening parentheses for control structures MUST NOT have a space after them, and closing parentheses for control structures MUST NOT have a space before.
You can read details on [PHP FIG website](http://www.php-fig.org/psr/psr-2/).
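As a rough illustration of several of the rules above, here is a minimal sketch. The class, constant and method names (`ArticleCounter`, `MAX_ITEMS`, `addItem`, `getLimit`) are hypothetical and are not taken from wallabag.

```php
<?php

// Hypothetical class, for illustration only; not part of wallabag.
// Class names use StudlyCaps; the opening brace goes on the next line.
class ArticleCounter
{
    // Class constants: upper case with underscore separators (PSR-1).
    const MAX_ITEMS = 3;

    // Visibility is declared on every property (PSR-2).
    private $count = 0;

    // Method names use camelCase; the opening brace goes on the next line.
    public function addItem()
    {
        // Control structure: one space after the keyword, brace on the same line.
        if ($this->count < self::MAX_ITEMS) {
            $this->count++;
        }

        return $this->count;
    }

    // `static` is declared after the visibility.
    public static function getLimit()
    {
        return self::MAX_ITEMS;
    }
}
```

The file only declares a class and produces no output, in line with the PSR-1 rule that a file should either declare symbols or cause side effects, not both.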


@ -4,7 +4,6 @@ wallabag is a self hostable application allowing you to not miss any content any
More informations on our website: [wallabag.org](http://wallabag.org)
## License
Copyright © 2010-2014 Nicolas Lœuillet <nicolas@loeuillet.org>
Copyright © 2013-2014 Nicolas Lœuillet <nicolas@loeuillet.org>
This work is free. You can redistribute it and/or modify it under the
terms of the Do What The Fuck You Want To Public License, Version 2,
as published by Sam Hocevar. See the COPYING file for more details.
terms of the MIT License. See the COPYING file for more details.


@ -1,10 +1,10 @@
# How to manage translations of wallabag
# How to manage translations for wallabag
This guide will describe procedure of translation management of wallabag web application.
This guide will describe the procedure of translation management of the wallabag web application.
All translation are made using [gettext](http://en.wikipedia.org/wiki/Gettext) system and tools.
All translations are made using [gettext](http://en.wikipedia.org/wiki/Gettext) system and tools.
You will need [Poedit](http://www.poedit.net/download.php) editor to update, edit and create your translation files comfortably. In general, you can handle translations also without it: all can be done using gettext tools and your favorite plain text editor only. This guide, however, describes editing with Poedit. If you want to use gettext only, pls refer to xgettext manual page to update po files from sources (see also how it is used by Poedit below) and use msgunfmt tool to compile .mo files manually.
You will need the [Poedit](http://www.poedit.net/download.php) editor to update, edit and create your translation files easily. However, you can also handle translations without it: everything can be done using only gettext tools and your favorite plain text editor. This guide describes editing with Poedit. If you want to use gettext only, please refer to the xgettext manual page to update .po files from sources (see also how it is used by Poedit below) and use the msgfmt tool to compile .mo files manually.
You need to know, that translation phrases are stored in **".po"** files (for example: `locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.po`), which are then complied in **".mo"** files using **msgfmt** gettext tool or by Poedit, which will run msgfmt for you in background.
@ -22,7 +22,7 @@ go to root of your installation of wallabag project and run next command:
`rm -rf ./cache/*`
(this may require root privileges if you run, for example Apatche web server with mod_php)
(this may require root privileges if you run, for example Apache web server with mod_php)
### 2. Generate php files from all twig templates
Do this using next command:
@ -31,37 +31,37 @@ Do this using next command:
OR
from your browser: **http://your-wallabag-host.com/locale/tools/fillCache.php** (this may require removal of .htacces file in locale/ directory).
from your browser: **http://your-wallabag-host.com/locale/tools/fillCache.php** (this may require removal of .htaccess file in locale/ directory).
### 3. Configure your Poedit
Open Poedit editor, open Edit->Preferences. Go to "Parsers" tab, click on PHP and press "Edit" button. Make sure your "Parser command:" looks like
`xgettext --no-location --force-po -o %o %C %K %F`
Usualy it is required to add "--no-location" to default value.
Usually it is required to add "--no-location" to default value.
### 4. Open .po file you want to edit in Poedit and change it's settings
### 4. Open .po file you want to edit in Poedit and change its settings
Open, for example `locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.po` file in your Poedit.
Go to "Catalog"->"Settings..." menu. Go to "Path" tab and add path to wallabag installaion in your local file system. This step can't be ommited as you will not be able to update phrases otherwise.
Go to "Catalog"->"Settings..." menu. Then go to "Path" tab and add path to wallabag installation in your local file system. This step can't be omitted as you will not be able to update phrases otherwise.
You can also check "project into" tab to be sure, that "Language" is set correctly (this will allow you to spell check your translation).
### 5. Update opened .po file from sources
Once you have set your path correctly, you are able to update phrases from sources. Press "Update catalog - synchronize it with sources" button or go to "Catalog"->"Update from sources" menu.
As a result you will see confirmation popup with two tabs: "New strings" and "Obsolete strings". Pls review and accept changes (or press "Undo" if you see too many obsolete strings, as Poedit will remove them all - in this case please make sure all previous steps are performed w/o errors).
As a result you will see confirmation popup with two tabs: "New strings" and "Obsolete strings". Please review and accept changes (or press "Undo" if you see too many obsolete strings, as Poedit will remove them all - in this case please make sure all previous steps are performed w/o errors).
### 6. Translate and save your .po file
If you have any dificulties on this step, please consult with Poedit manual.
Every time you save your .po file, Poedit will also comple appropriate .mo file by default (of course, if not disabled in preferences).
If you have any difficulties on this step, please consult with Poedit manual.
Every time you save your .po file, Poedit will also compile appropriate .mo file by default (of course, if not disabled in preferences).
So, you are almost done.
You are now almost done.
### 7. Clear cache again
This step may be required if your web server runs php scripts in name of, say, www user (i.e. Apache with mod_php, not cgi).
##To create new translation
Please simple create appropriate directories in locale folder and perform all steps, described above. Instead of opening an existing file just create new one.
##To create new translation
You just have to copy the folder corresponding to the language you want to translate from, change language in the project settings and for the folder and files names. Then start replacing all existing translations with your own.

71
Vagrantfile vendored Normal file

@ -0,0 +1,71 @@
$script_sqlite = <<SCRIPT
apt-get update
apt-get install -y apache2 php5 php5-sqlite php5-xdebug
apt-get clean -y
echo "ServerName localhost" >> /etc/apache2/apache2.conf
service apache2 restart
rm -f /var/www/html/index.html
date > /etc/vagrant_provisioned_at
SCRIPT
$script_mysql = <<SCRIPT
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y apache2 php5 php5-mysql php5-xdebug mysql-server mysql-client
apt-get clean -y
echo "ServerName localhost" >> /etc/apache2/apache2.conf
service apache2 restart
service mysql restart
echo "create database wallabag;" | mysql -u root
rm -f /var/www/html/index.html
date > /etc/vagrant_provisioned_at
SCRIPT
$script_postgres = <<SCRIPT
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y apache2 php5 php5-pgsql php5-xdebug postgresql postgresql-contrib
apt-get clean -y
echo "ServerName localhost" >> /etc/apache2/apache2.conf
service apache2 restart
service postgresql restart
rm -f /var/www/html/index.html
date > /etc/vagrant_provisioned_at
SCRIPT
Vagrant.configure("2") do |config|
config.vm.define "sqlite" do |m|
m.vm.box = "ubuntu/trusty64"
m.vm.provision "shell", inline: $script_sqlite
m.vm.synced_folder ".", "/var/www/html", owner: "www-data", group: "www-data"
end
config.vm.define "mysql" do |m|
m.vm.box = "ubuntu/trusty64"
m.vm.provision "shell", inline: $script_mysql
m.vm.synced_folder ".", "/var/www/html", owner: "www-data", group: "www-data"
end
config.vm.define "postgres" do |m|
m.vm.box = "ubuntu/trusty64"
m.vm.provision "shell", inline: $script_postgres
m.vm.synced_folder ".", "/var/www/html", owner: "www-data", group: "www-data"
end
config.vm.define "debian7" do |m|
m.vm.box = "chef/debian-7.6"
m.vm.provision "shell", inline: $script_sqlite
m.vm.synced_folder ".", "/var/www", owner: "www-data", group: "www-data"
end
config.vm.define "debian6" do |m|
m.vm.box = "chef/debian-6.0.10"
m.vm.provision "shell", inline: $script_sqlite
m.vm.synced_folder ".", "/var/www", owner: "www-data", group: "www-data"
end
config.vm.network :forwarded_port, guest: 80, host: 8003
#config.vm.network "public_network", :bridge => "en0: Wi-Fi (AirPort)"
end

14
check_essentials.php Normal file

@ -0,0 +1,14 @@
<?php
// PHP 5.3 minimum
if (version_compare(PHP_VERSION, '5.3.3', '<')) {
die('This software require PHP 5.3.3 minimum');
}
// Short tags must be enabled for PHP < 5.4
if (version_compare(PHP_VERSION, '5.4.0', '<')) {
if (! ini_get('short_open_tag')) {
die('This software require to have short tags enabled, check your php.ini => "short_open_tag = On"');
}
}
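The two checks above can be sketched as a function that returns a message instead of calling `die()`, which makes the `version_compare` logic easier to try out. The function name `findEssentialsError` and its messages are hypothetical; this is only a sketch of the same conditions, not code from the commit.

```php
<?php

// Hypothetical helper mirroring the two checks above. It returns an
// error message, or null when both requirements are met.
function findEssentialsError($phpVersion, $shortOpenTag)
{
    // version_compare() compares version segments numerically,
    // so '5.3.10' is newer than '5.3.3'.
    if (version_compare($phpVersion, '5.3.3', '<')) {
        return 'PHP 5.3.3 minimum is required';
    }

    // Before PHP 5.4 the short echo tag depends on short_open_tag.
    if (version_compare($phpVersion, '5.4.0', '<') && !$shortOpenTag) {
        return 'short_open_tag must be enabled';
    }

    return null;
}
```

In a real check it would presumably be called as `findEssentialsError(PHP_VERSION, (bool) ini_get('short_open_tag'))`.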

18
check_setup.php Normal file → Executable file

@ -1,28 +1,10 @@
<?php
// PHP 5.3 minimum
if (version_compare(PHP_VERSION, '5.3.3', '<')) {
die('This software require PHP 5.3.3 minimum');
}
// Short tags must be enabled for PHP < 5.4
if (version_compare(PHP_VERSION, '5.4.0', '<')) {
if (! ini_get('short_open_tag')) {
die('This software require to have short tags enabled, check your php.ini => "short_open_tag = On"');
}
}
// Check if /cache is writeable
if (! is_writable('cache')) {
die('The directory "cache" must be writeable by your web server user');
}
// Check if /db is writeable
if (! is_writable('db')) {
die('The directory "db" must be writeable by your web server user');
}
// install folder still present, need to install wallabag
if (is_dir('install')) {
require('install/index.php');


@ -1,28 +1,35 @@
<?php
/*
* Class for Flattr querying
*/
class FlattrItem {
/**
* wallabag, self hostable application allowing you to not miss any content anymore
*
* @category wallabag
* @author Nicolas Lœuillet <nicolas@loeuillet.org>
* @copyright 2013
* @license http://opensource.org/licenses/MIT see COPYING file
*/
class FlattrItem
{
public $status;
public $urltoflattr;
public $urlToFlattr;
public $flattrItemURL;
public $numflattrs;
public $numFlattrs;
public function checkItem($urltoflattr,$id) {
$this->cacheflattrfile($urltoflattr, $id);
public function checkItem($urlToFlattr, $id)
{
$this->_cacheFlattrFile($urlToFlattr, $id);
$flattrResponse = file_get_contents(CACHE . "/flattr/".$id.".cache");
if($flattrResponse != FALSE) {
$result = json_decode($flattrResponse);
if (isset($result->message)){
if (isset($result->message)) {
if ($result->message == "flattrable") {
$this->status = FLATTRABLE;
}
}
elseif (is_object($result) && $result->link) {
elseif (is_object($result) && $result->link) {
$this->status = FLATTRED;
$this->flattrItemURL = $result->link;
$this->numflattrs = $result->flattrs;
$this->numFlattrs = $result->flattrs;
}
else {
$this->status = NOT_FLATTRABLE;
@ -33,17 +40,18 @@ class FlattrItem {
}
}
private function cacheflattrfile($urltoflattr, $id) {
private function _cacheFlattrFile($urlToFlattr, $id)
{
if (!is_dir(CACHE . '/flattr')) {
mkdir(CACHE . '/flattr', 0777);
}
// if a cache flattr file for this url already exists and it's been less than one day than it have been updated, see in /cache
if ((!file_exists(CACHE . "/flattr/".$id.".cache")) || (time() - filemtime(CACHE . "/flattr/".$id.".cache") > 86400)) {
$askForFlattr = Tools::getFile(FLATTR_API . $urltoflattr);
$askForFlattr = Tools::getFile(FLATTR_API . $urlToFlattr);
$flattrCacheFile = fopen(CACHE . "/flattr/".$id.".cache", 'w+');
fwrite($flattrCacheFile, $askForFlattr);
fclose($flattrCacheFile);
}
}
}
}
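The refresh condition in `_cacheFlattrFile` (the file is missing, or it was last modified more than 86400 seconds ago) can be isolated as a small predicate. The sketch below is an illustration only: the function name `isCacheStale` and the optional `$now` parameter are assumptions added to make the rule testable.

```php
<?php

// Hypothetical helper: true when the cache file is missing or older
// than $maxAge seconds (86400 = one day, as in the condition above).
function isCacheStale($path, $maxAge = 86400, $now = null)
{
    if (!file_exists($path)) {
        return true;
    }

    // Allow the current time to be injected; default to the real clock.
    $now = ($now === null) ? time() : $now;

    return ($now - filemtime($path)) > $maxAge;
}
```

With such a helper, the fetch-and-write branch would run only when `isCacheStale(CACHE . "/flattr/" . $id . ".cache")` returns true.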


@ -309,4 +309,38 @@ class Session
return true; // User is not banned.
}
/**
* Tells if a param exists in session
*
* @param $name name of the param to test
* @return bool
*/
public static function isInSession($name)
{
return (isset($_SESSION[$name]) ? : FALSE);
}
/**
* Returns param in session
*
* @param $name name of the param to return
* @return mixed param or null
*/
public static function getParam($name)
{
return (self::isInSession($name) ? $_SESSION[$name] : NULL);
}
/**
* Store value in session
*
* @param $name name of the variable to store
* @param $value value to store
*/
public static function setParam($name, $value)
{
$_SESSION[$name] = $value;
}
}
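A minimal sketch of how the three helpers added above behave. `SessionParams` is a hypothetical stand-in class so that the example runs on its own, and `$_SESSION` is filled as a plain array because no session is started on the command line.

```php
<?php

// Stand-in for the three static helpers shown above; the class name
// SessionParams is hypothetical.
class SessionParams
{
    public static function isInSession($name)
    {
        // isset() already returns a bool, so the short ternary
        // `isset(...) ? : FALSE` yields the same result.
        return isset($_SESSION[$name]);
    }

    public static function getParam($name)
    {
        return self::isInSession($name) ? $_SESSION[$name] : null;
    }

    public static function setParam($name, $value)
    {
        $_SESSION[$name] = $value;
    }
}

// No active session on the command line, so use a plain array.
$_SESSION = array();
SessionParams::setParam('lang', 'fr');
```

Because `isset()` is false for `null`, a parameter stored as `null` is reported as not being in the session.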


@ -44,7 +44,7 @@ class Messages {
var $msgId;
var $msgTypes = array( 'help', 'info', 'warning', 'success', 'error' );
var $msgClass = 'messages';
var $msgWrapper = "<div class='%s %s'><a href='#' class='closeMessage'>X</a>\n%s</div>\n";
var $msgWrapper = "<div class='%s %s'><a href='#' class='closeMessage'>&times;</a>\n%s</div>\n";
var $msgBefore = '<p>';
var $msgAfter = "</p>\n";


@ -19,7 +19,7 @@ if (!isset($options)) $options = new stdClass();
// Enable service
// ----------------------
// Set this to false if you want to disable the service.
// If set to false, no feed is produced and users will
// If set to false, no feed is produced and users will
// be told that the service is disabled.
$options->enabled = true;
@ -43,10 +43,64 @@ $options->default_entries = 5;
// ----------------------
// The maximum number of feed items to process when no access key is supplied.
// This limits the user-supplied &max=x value. For example, if the user
// asks for 20 items to be processed (&max=20), if max_entries is set to
// asks for 20 items to be processed (&max=20), if max_entries is set to
// 10, only 10 will be processed.
$options->max_entries = 10;
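The limit described above amounts to taking the smaller of the requested and configured values. A minimal sketch follows; the function name `clampMaxEntries` is hypothetical, and falling back to a default when no `&max` value is supplied is an assumption, not something shown in this diff.

```php
<?php

// Hypothetical helper: cap the user-supplied &max=x value at the
// configured limit. Using $default when nothing is supplied is an
// assumption made for this sketch.
function clampMaxEntries($requested, $maxEntries, $default)
{
    if ($requested === null) {
        return $default;
    }

    return min((int) $requested, $maxEntries);
}
```

For example, `clampMaxEntries('20', 10, 5)` gives 10, matching the case described in the comment.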
// Full content
// ----------------------
// By default Full-Text RSS includes the extracted content in the output.
// You can exclude this from the output by passing '&content=0' in the querystring.
//
// Possible values...
// Always include: true
// Never include: false
// Include unless user overrides (&content=0): 'user' (default)
//
// Note: currently this does not disable full content extraction. It simply omits it
// from the output.
$options->content = 'user';
// Excerpts
// ----------------------
// By default Full-Text RSS does not include excerpts in the output.
// You can enable this by passing '&summary=1' in the querystring.
// This will include a plain text excerpt from the extracted content.
//
// Possible values...
// Always include: true (recommended for new users)
// Never include: false
// Don't include unless user overrides (&summary=1): 'user' (default)
//
// Important: if both content and excerpts are requested, the excerpt will be
// placed in the description element and the full content inside content:encoded.
// If excerpts are not requested, the full content will go inside the description element.
//
// Why are we not returning both excerpts and content by default?
// Mainly for backward compatibility.
// Excerpts should appear in the feed item's description element. Previous versions
// of Full-Text RSS did not return excerpts, so the description element was always
// used for the full content (as recommended by the RSS advisory). When returning both,
// we need somewhere else to place the content (content:encoded).
// Having both enabled should not create any problems for news readers, but it may create
// problems for developers upgrading from one of our earlier versions who may now find
// their applications are returning excerpts instead of the full content they were
// expecting. To avoid such surprises for users who are upgrading Full-Text RSS,
// excerpts must be explicitly requested in the querystring by default.
//
// Why not use a different element name for excerpts?
// According to the RSS advisory:
// "Publishers who employ summaries should store the summary in description and
// the full content in content:encoded, ordering description first within the item.
// On items with no summary, the full content should be stored in description."
// See: http://www.rssboard.org/rss-profile#namespace-elements-content-encoded
//
// For more consistent element naming, we recommend new users set this option to true.
// The full content can still be excluded via the querystring, but the element names
// will not change: when $options->summary = true, the description element will always
// be reserved for the excerpt and content:encoded always for full content.
$options->summary = 'user';
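The element placement rules described in the comments above can be sketched as follows. Both function names (`resolveOption`, `placeItemText`) are hypothetical, and the way the query string value is read is an assumption; the actual Full-Text RSS code is not shown in this diff.

```php
<?php

// Hypothetical helper: resolve a true / false / 'user' setting.
// With 'user', the query string value ('0' or '1') overrides $default.
function resolveOption($setting, $queryValue, $default)
{
    if ($setting === 'user') {
        return ($queryValue === null) ? $default : ($queryValue === '1');
    }

    return (bool) $setting;
}

// Hypothetical helper: given whether the excerpt and the full content
// are included, return which RSS element carries which text.
function placeItemText($includeSummary, $includeContent)
{
    $elements = array();
    if ($includeSummary) {
        // With excerpts on, description is reserved for the excerpt.
        $elements['description'] = 'excerpt';
        if ($includeContent) {
            $elements['content:encoded'] = 'full content';
        }
    } elseif ($includeContent) {
        // No excerpt: the full content goes in description.
        $elements['description'] = 'full content';
    }

    return $elements;
}
```

With the defaults (`content = 'user'`, `summary = 'user'`) and no query string overrides, this sketch places the full content in `description`, which is the backward-compatible behaviour the comments describe.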
// Rewrite relative URLs
// ----------------------
// With this enabled relative URLs found in the extracted content
@ -67,7 +121,7 @@ $options->exclude_items_on_fail = 'user';
// Enable multi-page support
// -------------------------
// If enabled, we will try to follow next page links on multi-page articles.
// Currently this only happens for sites where next_page_link has been defined
// Currently this only happens for sites where next_page_link has been defined
// in a site config file.
$options->multipage = true;
@ -125,10 +179,10 @@ $options->detect_language = 1;
// Registration key
// ---------------
// The registration key is optional. It is not required to use Full-Text RSS,
// and does not affect the normal operation of Full-Text RSS. It is currently
// only used on admin pages which help you update site patterns with the
// latest version offered by FiveFilters.org. For these admin-related
// The registration key is optional. It is not required to use Full-Text RSS,
// and does not affect the normal operation of Full-Text RSS. It is currently
// only used on admin pages which help you update site patterns with the
// latest version offered by FiveFilters.org. For these admin-related
// tasks to complete, we will require a valid registration key.
// If you would like one, you can purchase the latest version of Full-Text RSS
// at http://fivefilters.org/content-only/
@ -144,12 +198,12 @@ $options->registration_key = '';
// ----------------------
// Certain pages/actions, e.g. updating site patterns with our online tool, will require admin credentials.
// To use these pages, enter a password here and you'll be prompted for it when you try to access those pages.
// If no password or username is set, pages requiring admin privelages will be inaccessible.
// If no password or username is set, pages requiring admin privelages will be inaccessible.
// The default username is 'admin'.
// If overriding with an environment variable, separate username and password with a colon, e.g.:
// ftr_admin_credentials: admin:my-secret-password
// Example: $options->admin_credentials = array('username'=>'admin', 'password'=>'my-secret-password');
$options->admin_credentials = array('username'=>'admin', 'password'=>'admin');
$options->admin_credentials = array('username'=>'admin', 'password'=>'');
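For the environment variable form mentioned above (`admin:my-secret-password`), a value could be split into the same array shape along these lines. The function name `parseAdminCredentials` is hypothetical, and this is not necessarily how Full-Text RSS parses the value.

```php
<?php

// Hypothetical parser for the "username:password" form described above.
// Splitting on the first colon only lets the password contain colons.
function parseAdminCredentials($value)
{
    $parts = explode(':', $value, 2);
    if (count($parts) !== 2) {
        return null;
    }

    return array('username' => $parts[0], 'password' => $parts[1]);
}
```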
// URLs to allow
// ----------------------
@ -178,12 +232,12 @@ $options->key_required = false;
// ----------------------
// By default, when processing feeds, we assume item titles in the feed
// have not been truncated. So after processing web pages, the extracted titles
// are not used in the generated feed. If you prefer to have extracted titles in
// the feed you can either set this to false, in which case we will always favour
// extracted titles. Alternatively, if set to 'user' (default) we'll use the
// are not used in the generated feed. If you prefer to have extracted titles in
// the feed you can either set this to false, in which case we will always favour
// extracted titles. Alternatively, if set to 'user' (default) we'll use the
// extracted title if you pass '&use_extracted_title' in the querystring.
// Possible values:
// * Favour feed titles: true
// * Favour feed titles: true
// * Favour extracted titles: false
// * Favour feed titles with user override: 'user' (default)
// Note: this has no effect when the input URL is to a web page - in these cases
@ -192,17 +246,17 @@ $options->favour_feed_titles = 'user';
// Access keys (password protected access)
// ------------------------------------
// NOTE: You do not need an API key from fivefilters.org to run your own
// NOTE: You do not need an API key from fivefilters.org to run your own
// copy of the code. This is here if you'd like to restrict access to
// _your_ copy.
// Keys let you group users - those with a key and those without - and
// restrict access to the service to those without a key.
// If you want everyone to access the service in the same way, you can
// leave the array below empty and ignore the access key options further down.
// The options further down let you control how the service should behave
// The options further down let you control how the service should behave
// in each mode.
// Note: Explicitly including the index number (1 and 2 in the examples below)
// is highly recommended (when generating feeds, we encode the key and
// Note: Explicitly including the index number (1 and 2 in the examples below)
// is highly recommended (when generating feeds, we encode the key and
// refer to it by index number and hash).
$options->api_keys = array();
// Example:
@ -232,13 +286,13 @@ $options->max_entries_with_key = 10;
// filter the resulting HTML for XSS attacks, making it redundant for
// Full-Text RSS do the same. Similarly with frameworks/CMS which display
// feed content - the content should be treated like any other user-submitted content.
//
//
// If you are writing an application yourself which is processing feeds generated by
// Full-Text RSS, you can either filter the HTML yourself to remove potential XSS attacks
// or enable this option. This might be useful if you are processing our generated
// feeds with JavaScript on the client side - although there's client side xss
// filtering available too, e.g. https://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
//
//
// If enabled, we'll pass retrieved HTML content through htmLawed with
// safe flag on and style attributes denied, see
// http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/htmLawed_README.htm#s3.6
@ -253,8 +307,8 @@ $options->xss_filter = 'user';
// Allowed parsers
// ----------------------
// Full-Text RSS attempts to use PHP's libxml extension to process HTML.
// While fast, on some sites it may not always produce good results.
// For these sites, you can specify an alternative HTML parser:
// While fast, on some sites it may not always produce good results.
// For these sites, you can specify an alternative HTML parser:
// parser: html5lib
// The html5lib parser is bundled with Full-Text RSS.
// see http://code.google.com/p/html5lib/
@ -273,7 +327,7 @@ $options->cors = false;
// Use APC user cache?
// ----------------------
// If enabled we will store site config files (when requested
// If enabled we will store site config files (when requested
// for the first time) in APC's user cache. Keys prefixed with 'sc.'
// This improves performance by reducing disk access.
// Note: this has no effect if APC is unavailable on your server.
@ -346,7 +400,7 @@ $options->rewrite_url = array(
// Valid actions:
// * 'exclude' - exclude this item from the result
// * 'link' - create HTML link to the item
$options->content_type_exc = array(
$options->content_type_exc = array(
'application/pdf' => array('action'=>'link', 'name'=>'PDF'),
'image' => array('action'=>'link', 'name'=>'Image'),
'audio' => array('action'=>'link', 'name'=>'Audio'),
@ -375,13 +429,13 @@ $options->cache_cleanup = 100;
/// DO NOT CHANGE ANYTHING BELOW THIS ///////////
/////////////////////////////////////////////////
if (!defined('_FF_FTR_VERSION')) define('_FF_FTR_VERSION', '3.1');
if (!defined('_FF_FTR_VERSION')) define('_FF_FTR_VERSION', '3.2');
if (basename(__FILE__) == 'config.php') {
if (file_exists(dirname(__FILE__).'/custom_config.php')) {
require_once dirname(__FILE__).'/custom_config.php';
}
// check for environment variables - often used on cloud platforms
// environment variables should be prefixed with 'ftr_', e.g.
// ftr_max_entries: 1
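
The option comments above can be illustrated with a short sketch of overrides of the kind one might put in custom_config.php. This is only an illustration: `$options` is replaced by a plain `stdClass` stand-in so the snippet runs on its own, the two key strings are placeholders, and the chosen values are assumptions rather than recommended settings.

```php
<?php
// Stand-in for the $options object that config.php sets up (assumption:
// a plain object with public properties is enough for illustration).
$options = new stdClass();

// Defaults as they appear in the hunks above.
$options->favour_feed_titles = 'user';
$options->api_keys = array();
$options->xss_filter = 'user';
$options->cors = false;

// Example overrides.
$options->favour_feed_titles = false; // favour extracted titles

// Explicit index numbers, as the comments recommend: generated feeds
// refer to a key by its index number and a hash.
$options->api_keys[1] = 'example-key-one'; // placeholder value
$options->api_keys[2] = 'example-key-two'; // placeholder value

// Link to PDFs, but drop images from the result entirely.
$options->content_type_exc = array(
    'application/pdf' => array('action' => 'link', 'name' => 'PDF'),
    'image' => array('action' => 'exclude', 'name' => 'Image'),
);
```

Keeping the indexes explicit means a key can be removed later without renumbering the remaining ones.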

View File

@ -0,0 +1,266 @@
<?php
/**
* This should be a complete list of all HTML entities, mapped to their UTF-8 character codes.
*
* @author A. Grandt
* @copyright A. Grandt 2009-2013
* @license GNU LGPL, Attribution required for commercial implementations, requested for everything else.
* @version 3.00
*/
global $htmlEntities;
$htmlEntities = array();
$htmlEntities["&quot;"] ="\x22"; // &#34; ((double) quotation mark)
$htmlEntities["&amp;"] ="\x26"; // &#38; (ampersand)
$htmlEntities["&apos;"] ="\x27"; // &#39; (apostrophe = apostrophe-quote)
$htmlEntities["&lt;"] ="\x3C"; // &#60; (less-than sign)
$htmlEntities["&gt;"] ="\x3E"; // &#62; (greater-than sign)
$htmlEntities["&nbsp;"] ="\xC2\xA0"; // &#160; (non-breaking space)
$htmlEntities["&iexcl;"] ="\xC2\xA1"; // &#161; (inverted exclamation mark)
$htmlEntities["&cent;"] ="\xC2\xA2"; // &#162; (cent)
$htmlEntities["&pound;"] ="\xC2\xA3"; // &#163; (pound)
$htmlEntities["&curren;"] ="\xC2\xA4"; // &#164; (currency)
$htmlEntities["&yen;"] ="\xC2\xA5"; // &#165; (yen)
$htmlEntities["&brvbar;"] ="\xC2\xA6"; // &#166; (broken vertical bar)
$htmlEntities["&sect;"] ="\xC2\xA7"; // &#167; (section)
$htmlEntities["&uml;"] ="\xC2\xA8"; // &#168; (spacing diaeresis)
$htmlEntities["&copy;"] ="\xC2\xA9"; // &#169; (copyright)
$htmlEntities["&ordf;"] ="\xC2\xAA"; // &#170; (feminine ordinal indicator)
$htmlEntities["&laquo;"] ="\xC2\xAB"; // &#171; (angle quotation mark (left))
$htmlEntities["&not;"] ="\xC2\xAC"; // &#172; (negation)
$htmlEntities["&shy;"] ="\xC2\xAD"; // &#173; (soft hyphen)
$htmlEntities["&reg;"] ="\xC2\xAE"; // &#174; (registered trademark)
$htmlEntities["&macr;"] ="\xC2\xAF"; // &#175; (spacing macron)
$htmlEntities["&deg;"] ="\xC2\xB0"; // &#176; (degree)
$htmlEntities["&plusmn;"] ="\xC2\xB1"; // &#177; (plus-or-minus)
$htmlEntities["&sup2;"] ="\xC2\xB2"; // &#178; (superscript 2)
$htmlEntities["&sup3;"] ="\xC2\xB3"; // &#179; (superscript 3)
$htmlEntities["&acute;"] ="\xC2\xB4"; // &#180; (spacing acute)
$htmlEntities["&micro;"] ="\xC2\xB5"; // &#181; (micro)
$htmlEntities["&para;"] ="\xC2\xB6"; // &#182; (paragraph)
$htmlEntities["&middot;"] ="\xC2\xB7"; // &#183; (middle dot)
$htmlEntities["&cedil;"] ="\xC2\xB8"; // &#184; (spacing cedilla)
$htmlEntities["&sup1;"] ="\xC2\xB9"; // &#185; (superscript 1)
$htmlEntities["&ordm;"] ="\xC2\xBA"; // &#186; (masculine ordinal indicator)
$htmlEntities["&raquo;"] ="\xC2\xBB"; // &#187; (angle quotation mark (right))
$htmlEntities["&frac14;"] ="\xC2\xBC"; // &#188; (fraction 1/4)
$htmlEntities["&frac12;"] ="\xC2\xBD"; // &#189; (fraction 1/2)
$htmlEntities["&frac34;"] ="\xC2\xBE"; // &#190; (fraction 3/4)
$htmlEntities["&iquest;"] ="\xC2\xBF"; // &#191; (inverted question mark)
$htmlEntities["&Agrave;"] ="\xC3\x80"; // &#192; (capital a, grave accent)
$htmlEntities["&Aacute;"] ="\xC3\x81"; // &#193; (capital a, acute accent)
$htmlEntities["&Acirc;"] ="\xC3\x82"; // &#194; (capital a, circumflex accent)
$htmlEntities["&Atilde;"] ="\xC3\x83"; // &#195; (capital a, tilde)
$htmlEntities["&Auml;"] ="\xC3\x84"; // &#196; (capital a, umlaut mark)
$htmlEntities["&Aring;"] ="\xC3\x85"; // &#197; (capital a, ring)
$htmlEntities["&AElig;"] ="\xC3\x86"; // &#198; (capital ae)
$htmlEntities["&Ccedil;"] ="\xC3\x87"; // &#199; (capital c, cedilla)
$htmlEntities["&Egrave;"] ="\xC3\x88"; // &#200; (capital e, grave accent)
$htmlEntities["&Eacute;"] ="\xC3\x89"; // &#201; (capital e, acute accent)
$htmlEntities["&Ecirc;"] ="\xC3\x8A"; // &#202; (capital e, circumflex accent)
$htmlEntities["&Euml;"] ="\xC3\x8B"; // &#203; (capital e, umlaut mark)
$htmlEntities["&Igrave;"] ="\xC3\x8C"; // &#204; (capital i, grave accent)
$htmlEntities["&Iacute;"] ="\xC3\x8D"; // &#205; (capital i, acute accent)
$htmlEntities["&Icirc;"] ="\xC3\x8E"; // &#206; (capital i, circumflex accent)
$htmlEntities["&Iuml;"] ="\xC3\x8F"; // &#207; (capital i, umlaut mark)
$htmlEntities["&ETH;"] ="\xC3\x90"; // &#208; (capital eth, Icelandic)
$htmlEntities["&Ntilde;"] ="\xC3\x91"; // &#209; (capital n, tilde)
$htmlEntities["&Ograve;"] ="\xC3\x92"; // &#210; (capital o, grave accent)
$htmlEntities["&Oacute;"] ="\xC3\x93"; // &#211; (capital o, acute accent)
$htmlEntities["&Ocirc;"] ="\xC3\x94"; // &#212; (capital o, circumflex accent)
$htmlEntities["&Otilde;"] ="\xC3\x95"; // &#213; (capital o, tilde)
$htmlEntities["&Ouml;"] ="\xC3\x96"; // &#214; (capital o, umlaut mark)
$htmlEntities["&times;"] ="\xC3\x97"; // &#215; (multiplication)
$htmlEntities["&Oslash;"] ="\xC3\x98"; // &#216; (capital o, slash)
$htmlEntities["&Ugrave;"] ="\xC3\x99"; // &#217; (capital u, grave accent)
$htmlEntities["&Uacute;"] ="\xC3\x9A"; // &#218; (capital u, acute accent)
$htmlEntities["&Ucirc;"] ="\xC3\x9B"; // &#219; (capital u, circumflex accent)
$htmlEntities["&Uuml;"] ="\xC3\x9C"; // &#220; (capital u, umlaut mark)
$htmlEntities["&Yacute;"] ="\xC3\x9D"; // &#221; (capital y, acute accent)
$htmlEntities["&THORN;"] ="\xC3\x9E"; // &#222; (capital THORN, Icelandic)
$htmlEntities["&szlig;"] ="\xC3\x9F"; // &#223; (small sharp s, German)
$htmlEntities["&agrave;"] ="\xC3\xA0"; // &#224; (small a, grave accent)
$htmlEntities["&aacute;"] ="\xC3\xA1"; // &#225; (small a, acute accent)
$htmlEntities["&acirc;"] ="\xC3\xA2"; // &#226; (small a, circumflex accent)
$htmlEntities["&atilde;"] ="\xC3\xA3"; // &#227; (small a, tilde)
$htmlEntities["&auml;"] ="\xC3\xA4"; // &#228; (small a, umlaut mark)
$htmlEntities["&aring;"] ="\xC3\xA5"; // &#229; (small a, ring)
$htmlEntities["&aelig;"] ="\xC3\xA6"; // &#230; (small ae)
$htmlEntities["&ccedil;"] ="\xC3\xA7"; // &#231; (small c, cedilla)
$htmlEntities["&egrave;"] ="\xC3\xA8"; // &#232; (small e, grave accent)
$htmlEntities["&eacute;"] ="\xC3\xA9"; // &#233; (small e, acute accent)
$htmlEntities["&ecirc;"] ="\xC3\xAA"; // &#234; (small e, circumflex accent)
$htmlEntities["&euml;"] ="\xC3\xAB"; // &#235; (small e, umlaut mark)
$htmlEntities["&igrave;"] ="\xC3\xAC"; // &#236; (small i, grave accent)
$htmlEntities["&iacute;"] ="\xC3\xAD"; // &#237; (small i, acute accent)
$htmlEntities["&icirc;"] ="\xC3\xAE"; // &#238; (small i, circumflex accent)
$htmlEntities["&iuml;"] ="\xC3\xAF"; // &#239; (small i, umlaut mark)
$htmlEntities["&eth;"] ="\xC3\xB0"; // &#240; (small eth, Icelandic)
$htmlEntities["&ntilde;"] ="\xC3\xB1"; // &#241; (small n, tilde)
$htmlEntities["&ograve;"] ="\xC3\xB2"; // &#242; (small o, grave accent)
$htmlEntities["&oacute;"] ="\xC3\xB3"; // &#243; (small o, acute accent)
$htmlEntities["&ocirc;"] ="\xC3\xB4"; // &#244; (small o, circumflex accent)
$htmlEntities["&otilde;"] ="\xC3\xB5"; // &#245; (small o, tilde)
$htmlEntities["&ouml;"] ="\xC3\xB6"; // &#246; (small o, umlaut mark)
$htmlEntities["&divide;"] ="\xC3\xB7"; // &#247; (division)
$htmlEntities["&oslash;"] ="\xC3\xB8"; // &#248; (small o, slash)
$htmlEntities["&ugrave;"] ="\xC3\xB9"; // &#249; (small u, grave accent)
$htmlEntities["&uacute;"] ="\xC3\xBA"; // &#250; (small u, acute accent)
$htmlEntities["&ucirc;"] ="\xC3\xBB"; // &#251; (small u, circumflex accent)
$htmlEntities["&uuml;"] ="\xC3\xBC"; // &#252; (small u, umlaut mark)
$htmlEntities["&yacute;"] ="\xC3\xBD"; // &#253; (small y, acute accent)
$htmlEntities["&thorn;"] ="\xC3\xBE"; // &#254; (small thorn, Icelandic)
$htmlEntities["&yuml;"] ="\xC3\xBF"; // &#255; (small y, umlaut mark)
$htmlEntities["&OElig;"] ="\xC5\x92"; // &#338; (capital ligature OE)
$htmlEntities["&oelig;"] ="\xC5\x93"; // &#339; (small ligature oe)
$htmlEntities["&Scaron;"] ="\xC5\xA0"; // &#352; (capital S with caron)
$htmlEntities["&scaron;"] ="\xC5\xA1"; // &#353; (small S with caron)
$htmlEntities["&Yuml;"] ="\xC5\xB8"; // &#376; (capital Y with diaeresis)
$htmlEntities["&fnof;"] ="\xC6\x92"; // &#402; (f with hook)
$htmlEntities["&circ;"] ="\xCB\x86"; // &#710; (modifier letter circumflex accent)
$htmlEntities["&tilde;"] ="\xCB\x9C"; // &#732; (small tilde)
$htmlEntities["&Alpha;"] ="\xCE\x91"; // &#913; (Alpha)
$htmlEntities["&Beta;"] ="\xCE\x92"; // &#914; (Beta)
$htmlEntities["&Gamma;"] ="\xCE\x93"; // &#915; (Gamma)
$htmlEntities["&Delta;"] ="\xCE\x94"; // &#916; (Delta)
$htmlEntities["&Epsilon;"] ="\xCE\x95"; // &#917; (Epsilon)
$htmlEntities["&Zeta;"] ="\xCE\x96"; // &#918; (Zeta)
$htmlEntities["&Eta;"] ="\xCE\x97"; // &#919; (Eta)
$htmlEntities["&Theta;"] ="\xCE\x98"; // &#920; (Theta)
$htmlEntities["&Iota;"] ="\xCE\x99"; // &#921; (Iota)
$htmlEntities["&Kappa;"] ="\xCE\x9A"; // &#922; (Kappa)
$htmlEntities["&Lambda;"] ="\xCE\x9B"; // &#923; (Lambda)
$htmlEntities["&Mu;"] ="\xCE\x9C"; // &#924; (Mu)
$htmlEntities["&Nu;"] ="\xCE\x9D"; // &#925; (Nu)
$htmlEntities["&Xi;"] ="\xCE\x9E"; // &#926; (Xi)
$htmlEntities["&Omicron;"] ="\xCE\x9F"; // &#927; (Omicron)
$htmlEntities["&Pi;"] ="\xCE\xA0"; // &#928; (Pi)
$htmlEntities["&Rho;"] ="\xCE\xA1"; // &#929; (Rho)
$htmlEntities["&Sigma;"] ="\xCE\xA3"; // &#931; (Sigma)
$htmlEntities["&Tau;"] ="\xCE\xA4"; // &#932; (Tau)
$htmlEntities["&Upsilon;"] ="\xCE\xA5"; // &#933; (Upsilon)
$htmlEntities["&Phi;"] ="\xCE\xA6"; // &#934; (Phi)
$htmlEntities["&Chi;"] ="\xCE\xA7"; // &#935; (Chi)
$htmlEntities["&Psi;"] ="\xCE\xA8"; // &#936; (Psi)
$htmlEntities["&Omega;"] ="\xCE\xA9"; // &#937; (Omega)
$htmlEntities["&alpha;"] ="\xCE\xB1"; // &#945; (alpha)
$htmlEntities["&beta;"] ="\xCE\xB2"; // &#946; (beta)
$htmlEntities["&gamma;"] ="\xCE\xB3"; // &#947; (gamma)
$htmlEntities["&delta;"] ="\xCE\xB4"; // &#948; (delta)
$htmlEntities["&epsilon;"] ="\xCE\xB5"; // &#949; (epsilon)
$htmlEntities["&zeta;"] ="\xCE\xB6"; // &#950; (zeta)
$htmlEntities["&eta;"] ="\xCE\xB7"; // &#951; (eta)
$htmlEntities["&theta;"] ="\xCE\xB8"; // &#952; (theta)
$htmlEntities["&iota;"] ="\xCE\xB9"; // &#953; (iota)
$htmlEntities["&kappa;"] ="\xCE\xBA"; // &#954; (kappa)
$htmlEntities["&lambda;"] ="\xCE\xBB"; // &#955; (lambda)
$htmlEntities["&mu;"] ="\xCE\xBC"; // &#956; (mu)
$htmlEntities["&nu;"] ="\xCE\xBD"; // &#957; (nu)
$htmlEntities["&xi;"] ="\xCE\xBE"; // &#958; (xi)
$htmlEntities["&omicron;"] ="\xCE\xBF"; // &#959; (omicron)
$htmlEntities["&pi;"] ="\xCF\x80"; // &#960; (pi)
$htmlEntities["&rho;"] ="\xCF\x81"; // &#961; (rho)
$htmlEntities["&sigmaf;"] ="\xCF\x82"; // &#962; (sigmaf)
$htmlEntities["&sigma;"] ="\xCF\x83"; // &#963; (sigma)
$htmlEntities["&tau;"] ="\xCF\x84"; // &#964; (tau)
$htmlEntities["&upsilon;"] ="\xCF\x85"; // &#965; (upsilon)
$htmlEntities["&phi;"] ="\xCF\x86"; // &#966; (phi)
$htmlEntities["&chi;"] ="\xCF\x87"; // &#967; (chi)
$htmlEntities["&psi;"] ="\xCF\x88"; // &#968; (psi)
$htmlEntities["&omega;"] ="\xCF\x89"; // &#969; (omega)
$htmlEntities["&thetasym;"] ="\xCF\x91"; // &#977; (theta symbol)
$htmlEntities["&upsih;"] ="\xCF\x92"; // &#978; (upsilon symbol)
$htmlEntities["&piv;"] ="\xCF\x96"; // &#982; (pi symbol)
$htmlEntities["&ensp;"] ="\xE2\x80\x82"; // &#8194; (en space)
$htmlEntities["&emsp;"] ="\xE2\x80\x83"; // &#8195; (em space)
$htmlEntities["&thinsp;"] ="\xE2\x80\x89"; // &#8201; (thin space)
$htmlEntities["&zwnj;"] ="\xE2\x80\x8C"; // &#8204; (zero width non-joiner)
$htmlEntities["&zwj;"] ="\xE2\x80\x8D"; // &#8205; (zero width joiner)
$htmlEntities["&lrm;"] ="\xE2\x80\x8E"; // &#8206; (left-to-right mark)
$htmlEntities["&rlm;"] ="\xE2\x80\x8F"; // &#8207; (right-to-left mark)
$htmlEntities["&ndash;"] ="\xE2\x80\x93"; // &#8211; (en dash)
$htmlEntities["&mdash;"] ="\xE2\x80\x94"; // &#8212; (em dash)
$htmlEntities["&lsquo;"] ="\xE2\x80\x98"; // &#8216; (left single quotation mark)
$htmlEntities["&rsquo;"] ="\xE2\x80\x99"; // &#8217; (right single quotation mark)
$htmlEntities["&sbquo;"] ="\xE2\x80\x9A"; // &#8218; (single low-9 quotation mark)
$htmlEntities["&ldquo;"] ="\xE2\x80\x9C"; // &#8220; (left double quotation mark)
$htmlEntities["&rdquo;"] ="\xE2\x80\x9D"; // &#8221; (right double quotation mark)
$htmlEntities["&bdquo;"] ="\xE2\x80\x9E"; // &#8222; (double low-9 quotation mark)
$htmlEntities["&dagger;"] ="\xE2\x80\xA0"; // &#8224; (dagger)
$htmlEntities["&Dagger;"] ="\xE2\x80\xA1"; // &#8225; (double dagger)
$htmlEntities["&bull;"] ="\xE2\x80\xA2"; // &#8226; (bullet)
$htmlEntities["&hellip;"] ="\xE2\x80\xA6"; // &#8230; (horizontal ellipsis)
$htmlEntities["&permil;"] ="\xE2\x80\xB0"; // &#8240; (per mille)
$htmlEntities["&prime;"] ="\xE2\x80\xB2"; // &#8242; (minutes or prime)
$htmlEntities["&Prime;"] ="\xE2\x80\xB3"; // &#8243; (seconds or Double Prime)
$htmlEntities["&lsaquo;"] ="\xE2\x80\xB9"; // &#8249; (single left angle quotation)
$htmlEntities["&rsaquo;"] ="\xE2\x80\xBA"; // &#8250; (single right angle quotation)
$htmlEntities["&oline;"] ="\xE2\x80\xBE"; // &#8254; (overline)
$htmlEntities["&frasl;"] ="\xE2\x81\x84"; // &#8260; (fraction slash)
$htmlEntities["&euro;"] ="\xE2\x82\xAC"; // &#8364; (euro)
$htmlEntities["&image;"] ="\xE2\x84\x91"; // &#8465; (blackletter capital I)
$htmlEntities["&weierp;"] ="\xE2\x84\x98"; // &#8472; (script capital P)
$htmlEntities["&real;"] ="\xE2\x84\x9C"; // &#8476; (blackletter capital R)
$htmlEntities["&trade;"] ="\xE2\x84\xA2"; // &#8482; (trademark)
$htmlEntities["&alefsym;"] ="\xE2\x84\xB5"; // &#8501; (alef)
$htmlEntities["&larr;"] ="\xE2\x86\x90"; // &#8592; (left arrow)
$htmlEntities["&uarr;"] ="\xE2\x86\x91"; // &#8593; (up arrow)
$htmlEntities["&rarr;"] ="\xE2\x86\x92"; // &#8594; (right arrow)
$htmlEntities["&darr;"] ="\xE2\x86\x93"; // &#8595; (down arrow)
$htmlEntities["&harr;"] ="\xE2\x86\x94"; // &#8596; (left right arrow)
$htmlEntities["&crarr;"] ="\xE2\x86\xB5"; // &#8629; (carriage return arrow)
$htmlEntities["&lArr;"] ="\xE2\x87\x90"; // &#8656; (left double arrow)
$htmlEntities["&uArr;"] ="\xE2\x87\x91"; // &#8657; (up double arrow)
$htmlEntities["&rArr;"] ="\xE2\x87\x92"; // &#8658; (right double arrow)
$htmlEntities["&dArr;"] ="\xE2\x87\x93"; // &#8659; (down double arrow)
$htmlEntities["&hArr;"] ="\xE2\x87\x94"; // &#8660; (left right double arrow)
$htmlEntities["&forall;"] ="\xE2\x88\x80"; // &#8704; (for all)
$htmlEntities["&part;"] ="\xE2\x88\x82"; // &#8706; (partial differential)
$htmlEntities["&exist;"] ="\xE2\x88\x83"; // &#8707; (there exists)
$htmlEntities["&empty;"] ="\xE2\x88\x85"; // &#8709; (empty set)
$htmlEntities["&nabla;"] ="\xE2\x88\x87"; // &#8711; (backward difference)
$htmlEntities["&isin;"] ="\xE2\x88\x88"; // &#8712; (element of)
$htmlEntities["&notin;"] ="\xE2\x88\x89"; // &#8713; (not an element of)
$htmlEntities["&ni;"] ="\xE2\x88\x8B"; // &#8715; (ni = contains as member)
$htmlEntities["&prod;"] ="\xE2\x88\x8F"; // &#8719; (n-ary product)
$htmlEntities["&sum;"] ="\xE2\x88\x91"; // &#8721; (n-ary summation)
$htmlEntities["&minus;"] ="\xE2\x88\x92"; // &#8722; (minus)
$htmlEntities["&lowast;"] ="\xE2\x88\x97"; // &#8727; (asterisk operator)
$htmlEntities["&radic;"] ="\xE2\x88\x9A"; // &#8730; (square root)
$htmlEntities["&prop;"] ="\xE2\x88\x9D"; // &#8733; (proportional to)
$htmlEntities["&infin;"] ="\xE2\x88\x9E"; // &#8734; (infinity)
$htmlEntities["&ang;"] ="\xE2\x88\xA0"; // &#8736; (angle)
$htmlEntities["&and;"] ="\xE2\x88\xA7"; // &#8743; (logical and)
$htmlEntities["&or;"] ="\xE2\x88\xA8"; // &#8744; (logical or)
$htmlEntities["&cap;"] ="\xE2\x88\xA9"; // &#8745; (intersection)
$htmlEntities["&cup;"] ="\xE2\x88\xAA"; // &#8746; (union)
$htmlEntities["&int;"] ="\xE2\x88\xAB"; // &#8747; (integral)
$htmlEntities["&there4;"] ="\xE2\x88\xB4"; // &#8756; (therefore)
$htmlEntities["&sim;"] ="\xE2\x88\xBC"; // &#8764; (similar to)
$htmlEntities["&cong;"] ="\xE2\x89\x85"; // &#8773; (congruent to)
$htmlEntities["&asymp;"] ="\xE2\x89\x88"; // &#8776; (approximately equal)
$htmlEntities["&ne;"] ="\xE2\x89\xA0"; // &#8800; (not equal)
$htmlEntities["&equiv;"] ="\xE2\x89\xA1"; // &#8801; (equivalent)
$htmlEntities["&le;"] ="\xE2\x89\xA4"; // &#8804; (less or equal)
$htmlEntities["&ge;"] ="\xE2\x89\xA5"; // &#8805; (greater or equal)
$htmlEntities["&sub;"] ="\xE2\x8A\x82"; // &#8834; (subset of)
$htmlEntities["&sup;"] ="\xE2\x8A\x83"; // &#8835; (superset of)
$htmlEntities["&nsub;"] ="\xE2\x8A\x84"; // &#8836; (not subset of)
$htmlEntities["&sube;"] ="\xE2\x8A\x86"; // &#8838; (subset or equal)
$htmlEntities["&supe;"] ="\xE2\x8A\x87"; // &#8839; (superset or equal)
$htmlEntities["&oplus;"] ="\xE2\x8A\x95"; // &#8853; (circled plus)
$htmlEntities["&otimes;"] ="\xE2\x8A\x97"; // &#8855; (circled times)
$htmlEntities["&perp;"] ="\xE2\x8A\xA5"; // &#8869; (perpendicular)
$htmlEntities["&sdot;"] ="\xE2\x8B\x85"; // &#8901; (dot operator)
$htmlEntities["&lceil;"] ="\xE2\x8C\x88"; // &#8968; (left ceiling)
$htmlEntities["&rceil;"] ="\xE2\x8C\x89"; // &#8969; (right ceiling)
$htmlEntities["&lfloor;"] ="\xE2\x8C\x8A"; // &#8970; (left floor)
$htmlEntities["&rfloor;"] ="\xE2\x8C\x8B"; // &#8971; (right floor)
$htmlEntities["&lang;"] ="\xE2\x8C\xA9"; // &#9001; (left angle bracket = bra)
$htmlEntities["&rang;"] ="\xE2\x8C\xAA"; // &#9002; (right angle bracket = ket)
$htmlEntities["&loz;"] ="\xE2\x97\x8A"; // &#9674; (lozenge)
$htmlEntities["&spades;"] ="\xE2\x99\xA0"; // &#9824; (spade)
$htmlEntities["&clubs;"] ="\xE2\x99\xA3"; // &#9827; (club)
$htmlEntities["&hearts;"] ="\xE2\x99\xA5"; // &#9829; (heart)
$htmlEntities["&diams;"] ="\xE2\x99\xA6"; // &#9830; (diamond)
?>
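
A table like the one above is typically applied with `strtr()`, and individual entries can be cross-checked against PHP's built-in decoder. The sketch below shows both under stated assumptions: `findEntityMismatches()` is a hypothetical helper that is not part of the library, and `$sample` is a three-entry excerpt standing in for the full `$htmlEntities` array. `ENT_HTML401` is used because the table follows the HTML 4 entity list.

```php
<?php
// Hypothetical helper (not part of the library): returns every entity whose
// mapped bytes differ from what html_entity_decode() produces, keyed by
// entity, with the expected bytes as upper-case hex.
function findEntityMismatches(array $map) {
    $mismatches = array();
    foreach ($map as $entity => $bytes) {
        $expected = html_entity_decode($entity, ENT_QUOTES | ENT_HTML401, 'UTF-8');
        // Entities the built-in table does not know come back unchanged; skip them.
        if ($expected !== $entity && $expected !== $bytes) {
            $mismatches[$entity] = strtoupper(bin2hex($expected));
        }
    }
    return $mismatches;
}

// Small excerpt standing in for the full $htmlEntities table.
$sample = array(
    "&amp;"  => "\x26",
    "&euro;" => "\xE2\x82\xAC",
    "&sup;"  => "\xE2\x8A\x83",
);

// Replace named entities with their UTF-8 bytes in a single pass.
$decoded = strtr("5 &euro; &amp; up", $sample); // "5 € & up"

// An empty array means every sampled entry agrees with PHP's own table.
$problems = findEntityMismatches($sample);
```

Because `strtr()` with an array never rescans its own output, a decoded `&` cannot combine with following text to form a new entity.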

View File

@ -0,0 +1,782 @@
<?php
/**
* ePub NCX file structure
*
* @author A. Grandt <php@grandt.com>
* @copyright 2009-2014 A. Grandt
* @license GNU LGPL, Attribution required for commercial implementations, requested for everything else.
* @version 3.20
*/
class Ncx {
const _VERSION = 3.20;
const MIMETYPE = "application/x-dtbncx+xml";
private $bookVersion = EPub::BOOK_VERSION_EPUB2;
private $navMap = NULL;
private $uid = NULL;
private $meta = array();
private $docTitle = NULL;
private $docAuthor = NULL;
private $currentLevel = NULL;
private $lastLevel = NULL;
private $languageCode = "en";
private $writingDirection = EPub::DIRECTION_LEFT_TO_RIGHT;
public $chapterList = array();
public $referencesTitle = "Guide";
public $referencesClass = "references";
public $referencesId = "references";
public $referencesList = array();
public $referencesName = array();
public $referencesOrder = NULL;
/**
* Class constructor.
*
* @param string $uid
* @param string $docTitle
* @param string $docAuthor
* @param string $languageCode
* @param string $writingDirection
*/
function __construct($uid = NULL, $docTitle = NULL, $docAuthor = NULL, $languageCode = "en", $writingDirection = EPub::DIRECTION_LEFT_TO_RIGHT) {
$this->navMap = new NavMap($writingDirection);
$this->currentLevel = $this->navMap;
$this->setUid($uid);
$this->setDocTitle($docTitle);
$this->setDocAuthor($docAuthor);
$this->setLanguageCode($languageCode);
$this->setWritingDirection($writingDirection);
}
/**
* Class destructor
*
* @return void
*/
function __destruct() {
unset($this->bookVersion, $this->navMap, $this->uid, $this->meta);
unset($this->docTitle, $this->docAuthor, $this->currentLevel, $this->lastLevel);
unset($this->languageCode, $this->writingDirection, $this->chapterList, $this->referencesTitle);
unset($this->referencesClass, $this->referencesId, $this->referencesList, $this->referencesName);
unset($this->referencesOrder);
}
/**
*
* Set the ePub version this NCX is generated for; non-string input falls back to ePub 2.
*
* @param string $bookVersion
*/
function setVersion($bookVersion) {
$this->bookVersion = is_string($bookVersion) ? trim($bookVersion) : EPub::BOOK_VERSION_EPUB2;
}
/**
*
* @return bool TRUE if the book is set to type ePub 2
*/
function isEPubVersion2() {
return $this->bookVersion === EPub::BOOK_VERSION_EPUB2;
}
/**
*
* Set the book's unique identifier, written to the dtb:uid meta entry.
*
* @param string $uid
*/
function setUid($uid) {
$this->uid = is_string($uid) ? trim($uid) : NULL;
}
/**
*
* Set the document title written to the docTitle element.
*
* @param string $docTitle
*/
function setDocTitle($docTitle) {
$this->docTitle = is_string($docTitle) ? trim($docTitle) : NULL;
}
/**
*
* Set the document author written to the docAuthor element.
*
* @param string $docAuthor
*/
function setDocAuthor($docAuthor) {
$this->docAuthor = is_string($docAuthor) ? trim($docAuthor) : NULL;
}
/**
*
* Set the language code; non-string input falls back to "en".
*
* @param string $languageCode
*/
function setLanguageCode($languageCode) {
$this->languageCode = is_string($languageCode) ? trim($languageCode) : "en";
}
/**
*
* Set the writing direction; non-string input falls back to left-to-right.
*
* @param string $writingDirection
*/
function setWritingDirection($writingDirection) {
$this->writingDirection = is_string($writingDirection) ? trim($writingDirection) : EPub::DIRECTION_LEFT_TO_RIGHT;
}
/**
*
* Replace the navigation map. Anything other than a NavMap instance is ignored.
*
* @param NavMap $navMap
*/
function setNavMap($navMap) {
if ($navMap != NULL && is_object($navMap) && get_class($navMap) === "NavMap") {
$this->navMap = $navMap;
}
}
/**
* Add one chapter level.
*
* Subsequent chapters will be added to this level.
*
* @param string $navTitle
* @param string $navId
* @param string $navClass
* @param bool $isNavHidden
* @param string $writingDirection
* @return NavPoint
*/
function subLevel($navTitle = NULL, $navId = NULL, $navClass = NULL, $isNavHidden = FALSE, $writingDirection = NULL) {
$navPoint = FALSE;
if (isset($navTitle) && isset($navClass)) {
$navPoint = new NavPoint($navTitle, NULL, $navId, $navClass, $isNavHidden, $writingDirection);
$this->addNavPoint($navPoint);
}
if ($this->lastLevel !== NULL) {
$this->currentLevel = $this->lastLevel;
}
return $navPoint;
}
/**
* Step back one chapter level.
*
* Subsequent chapters will be added to this chapter's parent level.
*/
function backLevel() {
$this->lastLevel = $this->currentLevel;
$this->currentLevel = $this->currentLevel->getParent();
}
/**
* Step back to the root level.
*
* Subsequent chapters will be added to the root NavMap.
*/
function rootLevel() {
$this->lastLevel = $this->currentLevel;
$this->currentLevel = $this->navMap;
}
/**
* Step back to the given level.
* Useful for returning to a previous level from deep within the structure.
* Values below 2 will have the same effect as rootLevel()
*
* @param int $newLevel
*/
function setCurrentLevel($newLevel) {
if ($newLevel <= 1) {
$this->rootLevel();
} else {
while ($this->currentLevel->getLevel() > $newLevel) {
$this->backLevel();
}
}
}
/**
* Get current level count.
* The indentation of the current structure point.
*
* @return int current level count
*/
function getCurrentLevel() {
return $this->currentLevel->getLevel();
}
/**
* Add child NavPoints to current level.
*
* @param NavPoint $navPoint
*/
function addNavPoint($navPoint) {
$this->lastLevel = $this->currentLevel->addNavPoint($navPoint);
}
/**
*
* Get the navigation map holding the root-level NavPoints.
*
* @return NavMap
*/
function getNavMap() {
return $this->navMap;
}
/**
*
* Add a meta entry to the NCX head. Empty or non-string names and values are ignored.
*
* @param string $name
* @param string $content
*/
function addMetaEntry($name, $content) {
$name = is_string($name) ? trim($name) : NULL;
$content = is_string($content) ? trim($content) : NULL;
if ($name != NULL && $content != NULL) {
$this->meta[] = array($name => $content);
}
}
/**
*
* Build the complete NCX document and return it as an XML string.
*
* @return string
*/
function finalize() {
$nav = $this->navMap->finalize();
$ncx = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
if ($this->isEPubVersion2()) {
$ncx .= "<!DOCTYPE ncx PUBLIC \"-//NISO//DTD ncx 2005-1//EN\"\n"
. " \"http://www.daisy.org/z3986/2005/ncx-2005-1.dtd\">\n";
}
$ncx .= "<ncx xmlns=\"http://www.daisy.org/z3986/2005/ncx/\" version=\"2005-1\" xml:lang=\"" . $this->languageCode . "\" dir=\"" . $this->writingDirection . "\">\n"
. "\t<head>\n"
. "\t\t<meta name=\"dtb:uid\" content=\"" . $this->uid . "\" />\n"
. "\t\t<meta name=\"dtb:depth\" content=\"" . $this->navMap->getNavLevels() . "\" />\n"
. "\t\t<meta name=\"dtb:totalPageCount\" content=\"0\" />\n"
. "\t\t<meta name=\"dtb:maxPageNumber\" content=\"0\" />\n";
if (sizeof($this->meta)) {
foreach ($this->meta as $metaEntry) {
// each() was removed in PHP 8; read the single name => content pair directly.
$name = key($metaEntry);
$content = current($metaEntry);
$ncx .= "\t\t<meta name=\"" . $name . "\" content=\"" . $content . "\" />\n";
}
}
$ncx .= "\t</head>\n\n\t<docTitle>\n\t\t<text>"
. $this->docTitle
. "</text>\n\t</docTitle>\n\n\t<docAuthor>\n\t\t<text>"
. $this->docAuthor
. "</text>\n\t</docAuthor>\n\n"
. $nav;
return $ncx . "</ncx>\n";
}
/**
*
* @param string $title
* @param string $cssFileName
* @return string
*/
function finalizeEPub3($title = "Table of Contents", $cssFileName = NULL) {
$end = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
. "<html xmlns=\"http://www.w3.org/1999/xhtml\"\n"
. " xmlns:epub=\"http://www.idpf.org/2007/ops\"\n"
. " xml:lang=\"" . $this->languageCode . "\" lang=\"" . $this->languageCode . "\" dir=\"" . $this->writingDirection . "\">\n"
. "\t<head>\n"
. "\t\t<title>" . $this->docTitle . "</title>\n"
. "\t\t<meta http-equiv=\"default-style\" content=\"text/html; charset=utf-8\"/>\n";
if ($cssFileName !== NULL) {
$end .= "\t\t<link rel=\"stylesheet\" href=\"" . $cssFileName . "\" type=\"text/css\"/>\n";
}
$end .= "\t</head>\n"
. "\t<body epub:type=\"frontmatter toc\">\n"
. "\t\t<header>\n"
. "\t\t\t<h1>" . $title . "</h1>\n"
. "\t\t</header>\n"
. $this->navMap->finalizeEPub3()
. $this->finalizeEPub3Landmarks()
. "\t</body>\n"
. "</html>\n";
return $end;
}
/**
* Build the references for the ePub 2 toc.
* These are merely reference pages added to the end of the navMap though.
*
* @return void
*/
function finalizeReferences() {
if (isset($this->referencesList) && sizeof($this->referencesList) > 0) {
$this->rootLevel();
$this->subLevel($this->referencesTitle, $this->referencesId, $this->referencesClass);
$refId = 1;
foreach ((array) $this->referencesOrder as $item => $descriptive) {
if (array_key_exists($item, $this->referencesList)) {
$name = (empty($this->referencesName[$item]) ? $descriptive : $this->referencesName[$item]);
$navPoint = new NavPoint($name, $this->referencesList[$item], "ref-" . $refId++);
$this->addNavPoint($navPoint);
}
}
}
}
/**
* Build the landmarks for the ePub 3 toc.
* @return string
*/
function finalizeEPub3Landmarks() {
$lm = "";
if (isset($this->referencesList) && sizeof($this->referencesList) > 0) {
$lm = "\t\t\t<nav epub:type=\"landmarks\">\n"
. "\t\t\t\t<h2"
. ($this->writingDirection === EPub::DIRECTION_RIGHT_TO_LEFT ? " dir=\"rtl\"" : "")
. ">" . $this->referencesTitle . "</h2>\n"
. "\t\t\t\t<ol>\n";
$li = "";
foreach ((array) $this->referencesOrder as $item => $descriptive) {
if (array_key_exists($item, $this->referencesList)) {
$li .= "\t\t\t\t\t<li><a epub:type=\""
. $item
. "\" href=\"" . $this->referencesList[$item] . "\">"
. (empty($this->referencesName[$item]) ? $descriptive : $this->referencesName[$item])
. "</a></li>\n";
}
}
if (empty($li)) {
return "";
}
$lm .= $li
. "\t\t\t\t</ol>\n"
. "\t\t\t</nav>\n";
}
return $lm;
}
}
/**
* ePub NavMap class
*/
class NavMap {
const _VERSION = 3.00;
private $navPoints = array();
private $navLevels = 0;
private $writingDirection = NULL;
/**
* Class constructor.
*
* @return void
*/
function __construct($writingDirection = NULL) {
$this->setWritingDirection($writingDirection);
}
/**
* Class destructor
*
* @return void
*/
function __destruct() {
unset($this->navPoints, $this->navLevels, $this->writingDirection);
}
/**
* Set the writing direction to be used for this NavMap.
*
* @param string $writingDirection
*/
function setWritingDirection($writingDirection) {
$this->writingDirection = isset($writingDirection) && is_string($writingDirection) ? trim($writingDirection) : NULL;
}
function getWritingDirection() {
return $this->writingDirection;
}
/**
* Add a navPoint to the root of the NavMap.
*
* @param NavPoint $navPoint
* @return NavPoint|NavMap the added NavPoint, or this NavMap if the argument was rejected
*/
function addNavPoint($navPoint) {
if ($navPoint != NULL && is_object($navPoint) && get_class($navPoint) === "NavPoint") {
$navPoint->setParent($this);
if ($navPoint->getWritingDirection() == NULL) {
$navPoint->setWritingDirection($this->writingDirection);
}
$this->navPoints[] = $navPoint;
return $navPoint;
}
return $this;
}
/**
* The final max depth for the "dtb:depth" meta attribute
* Only available after finalize() has been called.
*
* @return number
*/
function getNavLevels() {
return $this->navLevels+1;
}
function getLevel() {
return 1;
}
function getParent() {
return $this;
}
/**
* Finalize the navMap, the final max depth for the "dtb:depth" meta attribute can be retrieved with getNavLevels after finalization
*
*/
function finalize() {
$playOrder = 0;
$this->navLevels = 0;
$nav = "\t<navMap>\n";
if (sizeof($this->navPoints) > 0) {
$this->navLevels++;
foreach ($this->navPoints as $navPoint) {
$retLevel = $navPoint->finalize($nav, $playOrder, 0);
if ($retLevel > $this->navLevels) {
$this->navLevels = $retLevel;
}
}
}
return $nav . "\t</navMap>\n";
}
/**
* Finalize the navMap, the final max depth for the "dtb:depth" meta attribute can be retrieved with getNavLevels after finalization
*
*/
function finalizeEPub3() {
$playOrder = 0;
$level = 0;
$this->navLevels = 0;
$nav = "\t\t<nav epub:type=\"toc\" id=\"toc\">\n";
if (sizeof($this->navPoints) > 0) {
$this->navLevels++;
$nav .= str_repeat("\t", $level) . "\t\t\t<ol epub:type=\"list\">\n";
foreach ($this->navPoints as $navPoint) {
$retLevel = $navPoint->finalizeEPub3($nav, $playOrder, 0);
if ($retLevel > $this->navLevels) {
$this->navLevels = $retLevel;
}
}
$nav .= str_repeat("\t", $level) . "\t\t\t</ol>\n";
}
return $nav . "\t\t</nav>\n";
}
}
/**
* ePub NavPoint class
*/
class NavPoint {
const _VERSION = 3.00;
private $label = NULL;
private $contentSrc = NULL;
private $id = NULL;
private $navClass = NULL;
private $isNavHidden = FALSE;
private $navPoints = array();
private $parent = NULL;
/**
* Class constructor.
*
* All three attributes are mandatory, though if ID is set to null (default) the value will be generated.
*
* @param string $label
* @param string $contentSrc
* @param string $id
* @param string $navClass
* @param bool $isNavHidden
* @param string $writingDirection
*/
function __construct($label, $contentSrc = NULL, $id = NULL, $navClass = NULL, $isNavHidden = FALSE, $writingDirection = NULL) {
$this->setLabel($label);
$this->setContentSrc($contentSrc);
$this->setId($id);
$this->setNavClass($navClass);
$this->setNavHidden($isNavHidden);
$this->setWritingDirection($writingDirection);
}
/**
* Class destructor
*
* @return void
*/
function __destruct() {
unset($this->label, $this->contentSrc, $this->id, $this->navClass);
unset($this->isNavHidden, $this->navPoints, $this->parent);
}
/**
* Set the Text label for the NavPoint.
*
* The label is mandatory.
*
* @param string $label
*/
function setLabel($label) {
$this->label = is_string($label) ? trim($label) : NULL;
}
/**
* Get the Text label for the NavPoint.
*
* @return string Label
*/
function getLabel() {
return $this->label;
}
/**
* Set the src reference for the NavPoint.
*
* The src is mandatory for ePub 2.
*
* @param string $contentSrc
*/
function setContentSrc($contentSrc) {
$this->contentSrc = isset($contentSrc) && is_string($contentSrc) ? trim($contentSrc) : NULL;
}
/**
* Get the src reference for the NavPoint.
*
* @return string content src url.
*/
function getContentSrc() {
return $this->contentSrc;
}
/**
* Set the parent for this NavPoint.
*
* @param NavPoint|NavMap $parent
*/
function setParent($parent) {
if ($parent != NULL && is_object($parent) &&
(get_class($parent) === "NavPoint" || get_class($parent) === "NavMap") ) {
$this->parent = $parent;
}
}
/**
* Get the parent to this NavPoint.
*
* @return NavPoint|NavMap NavMap if the parent is the root.
*/
function getParent() {
return $this->parent;
}
/**
* Get the current level. 1 = document root.
*
* @return int level
*/
function getLevel() {
return $this->parent === NULL ? 1 : $this->parent->getLevel()+1;
}
/**
* Set the id for the NavPoint.
*
* The id must be unique, and is mandatory.
*
* @param string $id
*/
function setId($id) {
$this->id = is_string($id) ? trim($id) : NULL;
}
/**
* Set the class to be used for this NavPoint.
*
* @param string $navClass
*/
function setNavClass($navClass) {
$this->navClass = isset($navClass) && is_string($navClass) ? trim($navClass) : NULL;
}
/**
* Set whether this NavPoint is hidden. Hidden NavPoints are skipped by finalize().
*
* @param bool $isNavHidden
*/
function setNavHidden($isNavHidden) {
$this->isNavHidden = $isNavHidden === TRUE;
}
/**
* Set the writing direction to be used for this NavPoint.
*
* @param string $writingDirection
*/
function setWritingDirection($writingDirection) {
$this->writingDirection = isset($writingDirection) && is_string($writingDirection) ? trim($writingDirection) : NULL;
}
function getWritingDirection() {
return $this->writingDirection;
}
/**
* Add child NavPoints for multi level NavMaps.
*
* @param NavPoint $navPoint
*/
function addNavPoint($navPoint) {
if ($navPoint != NULL && is_object($navPoint) && get_class($navPoint) === "NavPoint") {
$navPoint->setParent($this);
if ($navPoint->getWritingDirection() == NULL) {
$navPoint->setWritingDirection($this->writingDirection);
}
$this->navPoints[] = $navPoint;
return $navPoint;
}
return $this;
}
/**
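* Append this NavPoint and its children to the NCX navMap markup.
*
* @param string $nav Markup buffer, passed by reference and appended to.
* @param int $playOrder Running playOrder counter, passed by reference.
* @param int $level Current nesting level, used for indentation.
* @return int The deepest level reached below this NavPoint.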
*/
function finalize(&$nav = "", &$playOrder = 0, $level = 0) {
$maxLevel = $level;
$levelAdjust = 0;
if ($this->isNavHidden) {
return $maxLevel;
}
if (isset($this->contentSrc)) {
$playOrder++;
if ($this->id == NULL) {
$this->id = "navpoint-" . $playOrder;
}
$nav .= str_repeat("\t", $level) . "\t\t<navPoint id=\"" . $this->id . "\" playOrder=\"" . $playOrder . "\">\n"
. str_repeat("\t", $level) . "\t\t\t<navLabel>\n"
. str_repeat("\t", $level) . "\t\t\t\t<text>" . $this->label . "</text>\n"
. str_repeat("\t", $level) . "\t\t\t</navLabel>\n"
. str_repeat("\t", $level) . "\t\t\t<content src=\"" . $this->contentSrc . "\" />\n";
} else {
$levelAdjust++;
}
if (sizeof($this->navPoints) > 0) {
$maxLevel++;
foreach ($this->navPoints as $navPoint) {
$retLevel = $navPoint->finalize($nav, $playOrder, ($level+1+$levelAdjust));
if ($retLevel > $maxLevel) {
$maxLevel = $retLevel;
}
}
}
if (isset($this->contentSrc)) {
$nav .= str_repeat("\t", $level) . "\t\t</navPoint>\n";
}
return $maxLevel;
}
/**
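* Append this NavPoint and its children to the ePub 3 navigation list markup.
*
* @param string $nav Markup buffer, passed by reference and appended to.
* @param int $playOrder Running counter used to generate missing ids, passed by reference.
* @param int $level Current nesting level, used for indentation.
* @param string $subLevelClass Optional class attribute for nested lists.
* @param bool $subLevelHidden Mark nested lists as hidden.
* @return int The deepest level reached below this NavPoint.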
*/
function finalizeEPub3(&$nav = "", &$playOrder = 0, $level = 0, $subLevelClass = NULL, $subLevelHidden = FALSE) {
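$maxLevel = $level;
// Advance the counter so that generated ids are unique; without this every
// NavPoint lacking an explicit id would receive the same "navpoint-0" id.
$playOrder++;
if ($this->id == NULL) {
$this->id = "navpoint-" . $playOrder;
}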
$indent = str_repeat("\t", $level) . "\t\t\t\t";
$nav .= $indent . "<li id=\"" . $this->id . "\"";
if (isset($this->writingDirection)) {
$nav .= " dir=\"" . $this->writingDirection . "\"";
}
$nav .= ">\n";
if (isset($this->contentSrc)) {
$nav .= $indent . "\t<a href=\"" . $this->contentSrc . "\">" . $this->label . "</a>\n";
} else {
$nav .= $indent . "\t<span>" . $this->label . "</span>\n";
}
if (sizeof($this->navPoints) > 0) {
$maxLevel++;
$nav .= $indent . "\t<ol epub:type=\"list\"";
if (isset($subLevelClass)) {
$nav .= " class=\"" . $subLevelClass . "\"";
}
if ($subLevelHidden) {
$nav .= " hidden=\"hidden\"";
}
$nav .= ">\n";
foreach ($this->navPoints as $navPoint) {
$retLevel = $navPoint->finalizeEPub3($nav, $playOrder, ($level+2), $subLevelClass, $subLevelHidden);
if ($retLevel > $maxLevel) {
$maxLevel = $retLevel;
}
}
$nav .= $indent . "\t</ol>\n";
}
$nav .= $indent . "</li>\n";
return $maxLevel;
}
}
?>
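The nested list that finalizeEPub3() builds can be illustrated with a small standalone sketch. This is not PHPePub code: `renderNavList` and the array layout (`label`, `src`, `children`) are hypothetical names chosen for the example, and unlike the class above it escapes labels and hrefs before writing them into the markup.

```php
<?php
// Hypothetical helper, not part of PHPePub: renders nested entries of the form
// array('label' => ..., 'src' => ..., 'children' => array(...)) as an <ol> list.
function renderNavList(array $items, $depth = 0) {
    $indent = str_repeat("\t", $depth);
    $out = $indent . "<ol>\n";
    foreach ($items as $item) {
        // Escape text so that characters such as & and < cannot break the XHTML.
        $label = htmlspecialchars($item['label'], ENT_QUOTES, 'UTF-8');
        $out .= $indent . "\t<li>\n";
        if (isset($item['src'])) {
            $href = htmlspecialchars($item['src'], ENT_QUOTES, 'UTF-8');
            $out .= $indent . "\t\t<a href=\"" . $href . "\">" . $label . "</a>\n";
        } else {
            // Entries without a target become plain headings, as in finalizeEPub3().
            $out .= $indent . "\t\t<span>" . $label . "</span>\n";
        }
        if (!empty($item['children'])) {
            $out .= renderNavList($item['children'], $depth + 2);
        }
        $out .= $indent . "\t</li>\n";
    }
    return $out . $indent . "</ol>\n";
}
```

Calling it with one heading entry that has a single linked child yields two nested `<ol>` elements, a `<span>` for the heading and an `<a>` for the child.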

File diff suppressed because it is too large

2438
inc/3rdparty/libraries/PHPePub/EPub.php vendored Normal file

File diff suppressed because it is too large


@ -0,0 +1,201 @@
<?php
/**
* Split an HTML file into smaller html files, retaining the formatting and structure for the individual parts.
* What this splitter does is using DOM to try and retain any formatting in the file, including rebuilding the DOM tree for subsequent parts.
* Split size is considered max target size. The actual size is the result of an even split across the resulting files.
*
* @author A. Grandt <php@grandt.com>
* @copyright 2009-2014 A. Grandt
* @license GNU LGPL 2.1
* @link http://www.phpclasses.org/package/6115
* @link https://github.com/Grandt/PHPePub
* @version 3.20
*/
class EPubChapterSplitter {
const VERSION = 3.20;
private $splitDefaultSize = 250000;
private $bookVersion = EPub::BOOK_VERSION_EPUB2;
/**
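* Set the ePub version the split chapters are generated for.
* Anything other than a string falls back to EPub::BOOK_VERSION_EPUB2.
*
* @param string $bookVersion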
*/
function setVersion($bookVersion) {
$this->bookVersion = is_string($bookVersion) ? trim($bookVersion) : EPub::BOOK_VERSION_EPUB2;
}
/**
* Set default chapter target size.
* Default is 250000 bytes, and minimum is 10240 bytes.
*
* @param int $size segment size in bytes
* @return void
*/
function setSplitSize($size) {
$this->splitDefaultSize = (int)$size;
if ($size < 10240) {
$this->splitDefaultSize = 10240; // Making the file smaller than 10k is not a good idea.
}
}
/**
* Get the chapter target size.
*
* @return int target size in bytes
*/
function getSplitSize() {
return $this->splitDefaultSize;
}
/**
* Split $chapter into multiple parts.
*
* The search string can either be a regular string or a PCRE (Perl Compatible Regular Expression) pattern as defined here: http://www.php.net/manual/en/pcre.pattern.php
* If the search string is a regular string, the matching will be for lines in the HTML starting with the string given
*
* @param String $chapter XHTML file
* @param Bool $splitOnSearchString Split on chapter boundaries, Splitting on search strings disables the split size check.
* @param String $searchString Chapter string to search for can be fixed text, or a regular expression pattern.
*
* @return array with 1 or more parts
*/
function splitChapter($chapter, $splitOnSearchString = false, $searchString = '/^Chapter\\ /i') {
$chapterData = array();
$isSearchRegexp = $splitOnSearchString && (preg_match('#^(\D|\S|\W).+\1[imsxeADSUXJu]*$#m', $searchString) == 1);
if ($splitOnSearchString && !$isSearchRegexp) {
$searchString = '#^<.+?>' . preg_quote($searchString, '#') . "#";
}
if (!$splitOnSearchString && strlen($chapter) <= $this->splitDefaultSize) {
return array($chapter);
}
$xmlDoc = new DOMDocument();
@$xmlDoc->loadHTML($chapter);
$head = $xmlDoc->getElementsByTagName("head");
$body = $xmlDoc->getElementsByTagName("body");
$htmlPos = stripos($chapter, "<html");
$htmlEndPos = stripos($chapter, ">", $htmlPos);
$newXML = substr($chapter, 0, $htmlEndPos+1) . "\n</html>";
if (strpos(trim($newXML), "<?xml ") === FALSE) {
$newXML = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" . $newXML;
}
$headerLength = strlen($newXML);
$files = array();
$chapterNames = array();
$domDepth = 0;
$domPath = array();
$domClonedPath = array();
$curFile = $xmlDoc->createDocumentFragment();
$files[] = $curFile;
$curParent = $curFile;
$curSize = 0;
$bodyLen = strlen($xmlDoc->saveXML($body->item(0)));
$headLen = strlen($xmlDoc->saveXML($head->item(0))) + $headerLength;
$partSize = $this->splitDefaultSize - $headLen;
if ($bodyLen > $partSize) {
$parts = ceil($bodyLen / $partSize);
$partSize = ($bodyLen / $parts) - $headLen;
}
$node = $body->item(0)->firstChild;
do {
$nodeData = $xmlDoc->saveXML($node);
$nodeLen = strlen($nodeData);
if ($nodeLen > $partSize && $node->hasChildNodes()) {
$domPath[] = $node;
$domClonedPath[] = $node->cloneNode(false);
$domDepth++;
$node = $node->firstChild;
}
$node2 = $node->nextSibling;
if ($node != null && $node->nodeName != "#text") {
$doSplit = false;
if ($splitOnSearchString) {
$doSplit = preg_match($searchString, $nodeData) == 1;
if ($doSplit) {
$chapterNames[] = trim($nodeData);
}
}
if ($curSize > 0 && ($doSplit || (!$splitOnSearchString && $curSize + $nodeLen > $partSize))) {
$curFile = $xmlDoc->createDocumentFragment();
$files[] = $curFile;
$curParent = $curFile;
if ($domDepth > 0) {
// Rebuild the chain of open parent elements in the new fragment.
// foreach replaces each(), which was removed in PHP 8.
foreach ($domClonedPath as $v) {
$newParent = $v->cloneNode(false);
$curParent->appendChild($newParent);
$curParent = $newParent;
}
}
$curSize = strlen($xmlDoc->saveXML($curFile));
}
$curParent->appendChild($node->cloneNode(true));
$curSize += $nodeLen;
}
$node = $node2;
while ($node == null && $domDepth > 0) {
$domDepth--;
$node = end($domPath)->nextSibling;
array_pop($domPath);
array_pop($domClonedPath);
$curParent = $curParent->parentNode;
}
} while ($node != null);
$curFile = null;
$curSize = 0;
$xml = new DOMDocument('1.0', $xmlDoc->xmlEncoding);
$xml->lookupPrefix("http://www.w3.org/1999/xhtml");
$xml->preserveWhiteSpace = false;
$xml->formatOutput = true;
for ($idx = 0; $idx < count($files); $idx++) {
$xml2Doc = new DOMDocument('1.0', $xmlDoc->xmlEncoding);
$xml2Doc->lookupPrefix("http://www.w3.org/1999/xhtml");
$xml2Doc->loadXML($newXML);
$html = $xml2Doc->getElementsByTagName("html")->item(0);
$html->appendChild($xml2Doc->importNode($head->item(0), true));
$body = $xml2Doc->createElement("body");
$html->appendChild($body);
$body->appendChild($xml2Doc->importNode($files[$idx], true));
// force pretty printing and correct formatting, should not be needed, but it is.
$xml->loadXML($xml2Doc->saveXML());
$doc = $xml->saveXML();
if ($this->bookVersion === EPub::BOOK_VERSION_EPUB3) {
$doc = preg_replace('#^\s*<!DOCTYPE\ .+?>\s*#im', '', $doc);
}
$chapterData[$splitOnSearchString ? $chapterNames[$idx] : $idx] = $doc;
}
return $chapterData;
}
}
?>
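The "even split" described in the class comment comes down to a few lines of arithmetic in splitChapter(). The sketch below isolates that calculation so it can be tried on its own; `evenPartSize` is a hypothetical helper name and is not part of PHPePub.

```php
<?php
// Hypothetical standalone helper mirroring the part-size arithmetic in
// splitChapter(); the name and signature are illustrative only.
function evenPartSize($bodyLen, $headLen, $maxSize) {
    // Room left for body content once the shared header is accounted for.
    $partSize = $maxSize - $headLen;
    if ($bodyLen > $partSize) {
        // Number of parts needed, then spread the body evenly across them.
        $parts = ceil($bodyLen / $partSize);
        $partSize = ($bodyLen / $parts) - $headLen;
    }
    return $partSize;
}

// 600000 bytes of body, 1000 bytes of header, 250000 byte target size:
// three parts of 200000 bytes each, minus the header, gives 199000.
echo evenPartSize(600000, 1000, 250000), "\n";
```

A body that already fits is left alone: the function then returns the target size minus the header length.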


@ -0,0 +1,92 @@
<?php
/**
* Simple log line aggregator.
*
* @author A. Grandt <php@grandt.com>
* @copyright 2012-2013 A. Grandt
* @license GNU LGPL, Attribution required for commercial implementations, requested for everything else.
* @version 1.00
*/
class Logger {
const VERSION = 1.00;
private $log = "";
private $tStart;
private $tLast;
private $name = NULL;
private $isLogging = FALSE;
private $isDebugging = FALSE;
/**
* Class constructor.
*
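* @param string $name Optional name used as a prefix for every log line.
* @param bool $isLogging Enable logging; when FALSE all log calls are no-ops.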
*/
function __construct($name = NULL, $isLogging = FALSE) {
if ($name === NULL) {
$this->name = "";
} else {
$this->name = $name . " : ";
}
$this->isLogging = $isLogging;
$this->start();
}
/**
* Class destructor
*
* @return void
* @TODO make sure elements in the destructor match the current class elements
*/
function __destruct() {
unset($this->log);
}
function start() {
/* Prepare logging, just in case it is used later. */
if ($this->isLogging) {
$this->tStart = gettimeofday();
$this->tLast = $this->tStart;
$this->log = "<h1>Log: " . $this->name . "</h1>\n<pre>Started: " . gmdate("D, d M Y H:i:s T", $this->tStart['sec']) . "\n &#916; Start ; &#916; Last ;";
$this->logLine("Start");
}
}
function dumpInstalledModules() {
if ($this->isLogging) {
$isCurlInstalled = extension_loaded('curl') && function_exists('curl_version');
$isGdInstalled = extension_loaded('gd') && function_exists('gd_info');
$isExifInstalled = extension_loaded('exif') && function_exists('exif_imagetype');
$isFileGetContentsInstalled = function_exists('file_get_contents');
$isFileGetContentsExtInstalled = $isFileGetContentsInstalled && ini_get('allow_url_fopen');
$this->logLine("isCurlInstalled...............: " . ($isCurlInstalled ? "Yes" : "No"));
$this->logLine("isGdInstalled.................: " . ($isGdInstalled ? "Yes" : "No"));
$this->logLine("isExifInstalled...............: " . ($isExifInstalled ? "Yes" : "No"));
$this->logLine("isFileGetContentsInstalled....: " . ($isFileGetContentsInstalled ? "Yes" : "No"));
$this->logLine("isFileGetContentsExtInstalled.: " . ($isFileGetContentsExtInstalled ? "Yes" : "No"));
}
}
function logLine($line) {
if ($this->isLogging) {
$tTemp = gettimeofday();
$tS = $this->tStart['sec'] + (((int)($this->tStart['usec']/100))/10000);
$tL = $this->tLast['sec'] + (((int)($this->tLast['usec']/100))/10000);
$tT = $tTemp['sec'] + (((int)($tTemp['usec']/100))/10000);
$logline = sprintf("\n+%08.04f; +%08.04f; ", ($tT-$tS), ($tT-$tL)) . $this->name . $line;
$this->log .= $logline;
$this->tLast = $tTemp;
if ($this->isDebugging) {
echo "<pre>" . $logline . "\n</pre>\n";
}
}
}
function getLog() {
return $this->log;
}
}
?>
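logLine() turns each gettimeofday() result into seconds truncated to four decimal places before computing the two deltas. A minimal sketch of that conversion follows, assuming the array layout gettimeofday() returns (`sec` and `usec` keys); `toSeconds` is a hypothetical helper name, not part of the Logger class.

```php
<?php
// Hypothetical helper: converts a gettimeofday()-style array to seconds,
// truncated to four decimal places in the same way logLine() does.
function toSeconds(array $t) {
    return $t['sec'] + ((int)($t['usec'] / 100)) / 10000;
}

// Fixed sample values stand in for two gettimeofday() calls.
$start = array('sec' => 10, 'usec' => 123456);
$now   = array('sec' => 12, 'usec' => 500000);

// Same format string as logLine() uses for the elapsed-time columns.
printf("+%08.04f\n", toSeconds($now) - toSeconds($start));
```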

818
inc/3rdparty/libraries/PHPePub/Zip.php vendored Normal file

@ -0,0 +1,818 @@
<?php
/**
* Class to create and manage a Zip file.
*
* Initially inspired by CreateZipFile by Rochak Chauhan www.rochakchauhan.com (http://www.phpclasses.org/browse/package/2322.html)
* and
* http://www.pkware.com/documents/casestudies/APPNOTE.TXT Zip file specification.
*
* License: GNU LGPL, Attribution required for commercial implementations, requested for everything else.
*
* @author A. Grandt <php@grandt.com>
* @copyright 2009-2014 A. Grandt
* @license GNU LGPL 2.1
* @link http://www.phpclasses.org/package/6110
* @link https://github.com/Grandt/PHPZip
* @version 1.60
*/
class Zip {
const VERSION = 1.60;
const ZIP_LOCAL_FILE_HEADER = "\x50\x4b\x03\x04"; // Local file header signature
const ZIP_CENTRAL_FILE_HEADER = "\x50\x4b\x01\x02"; // Central file header signature
const ZIP_END_OF_CENTRAL_DIRECTORY = "\x50\x4b\x05\x06\x00\x00\x00\x00"; //end of Central directory record
const EXT_FILE_ATTR_DIR = 010173200020; // Permission 755 drwxr-xr-x = (((S_IFDIR | 0755) << 16) | S_DOS_D);
const EXT_FILE_ATTR_FILE = 020151000040; // Permission 644 -rw-r--r-- = (((S_IFREG | 0644) << 16) | S_DOS_A);
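// Worked check of the two values above, in octal, using the S_* constants defined below:
//   directory: (0040000 | 0755) << 16 = 010173200000, then | 0000020 (S_DOS_D) = 010173200020
//   file:      (0100000 | 0644) << 16 = 020151000000, then | 0000040 (S_DOS_A) = 020151000040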
const ATTR_VERSION_TO_EXTRACT = "\x14\x00"; // Version needed to extract
const ATTR_MADE_BY_VERSION = "\x1E\x03"; // Made By Version
// Unix file types
const S_IFIFO = 0010000; // named pipe (fifo)
const S_IFCHR = 0020000; // character special
const S_IFDIR = 0040000; // directory
const S_IFBLK = 0060000; // block special
const S_IFREG = 0100000; // regular
const S_IFLNK = 0120000; // symbolic link
const S_IFSOCK = 0140000; // socket
// setuid/setgid/sticky bits, the same as for chmod:
const S_ISUID = 0004000; // set user id on execution
const S_ISGID = 0002000; // set group id on execution
const S_ISTXT = 0001000; // sticky bit
// And of course, the other 12 bits are for the permissions, the same as for chmod:
// When adding these up, you can also just write the permissions as a single octal number
// ie. 0755. The leading 0 specifies octal notation.
const S_IRWXU = 0000700; // RWX mask for owner
const S_IRUSR = 0000400; // R for owner
const S_IWUSR = 0000200; // W for owner
const S_IXUSR = 0000100; // X for owner
const S_IRWXG = 0000070; // RWX mask for group
const S_IRGRP = 0000040; // R for group
const S_IWGRP = 0000020; // W for group
const S_IXGRP = 0000010; // X for group
const S_IRWXO = 0000007; // RWX mask for other
const S_IROTH = 0000004; // R for other
const S_IWOTH = 0000002; // W for other
const S_IXOTH = 0000001; // X for other
const S_ISVTX = 0001000; // save swapped text even after use
// Filetype, sticky and permissions are added up, and shifted 16 bits left BEFORE adding the DOS flags.
// DOS file type flags, we really only use the S_DOS_D flag.
const S_DOS_A = 0000040; // DOS flag for Archive
const S_DOS_D = 0000020; // DOS flag for Directory
const S_DOS_V = 0000010; // DOS flag for Volume
const S_DOS_S = 0000004; // DOS flag for System
const S_DOS_H = 0000002; // DOS flag for Hidden
const S_DOS_R = 0000001; // DOS flag for Read Only
private $zipMemoryThreshold = 1048576; // Autocreate tempfile if the zip data exceeds 1048576 bytes (1 MB)
private $zipData = NULL;
private $zipFile = NULL;
private $zipComment = NULL;
private $cdRec = array(); // central directory
private $offset = 0;
private $isFinalized = FALSE;
private $addExtraField = TRUE;
private $streamChunkSize = 65536;
private $streamFilePath = NULL;
private $streamTimestamp = NULL;
private $streamFileComment = NULL;
private $streamFile = NULL;
private $streamData = NULL;
private $streamFileLength = 0;
private $streamExtFileAttr = null;
/**
* Constructor.
*
* @param boolean $useZipFile Write temp zip data to tempFile? Default FALSE
*/
function __construct($useZipFile = FALSE) {
if ($useZipFile) {
$this->zipFile = tmpfile();
} else {
$this->zipData = "";
}
}
function __destruct() {
if (is_resource($this->zipFile)) {
fclose($this->zipFile);
}
$this->zipData = NULL;
}
/**
* Extra fields on the Zip directory records are Unix time codes needed for compatibility on the default Mac zip archive tool.
* These are enabled as default, as they do no harm elsewhere and only add 26 bytes per file added.
*
* @param bool $setExtraField TRUE (default) will enable adding of extra fields, anything else will disable it.
*/
function setExtraField($setExtraField = TRUE) {
$this->addExtraField = ($setExtraField === TRUE);
}
/**
* Set Zip archive comment.
*
* @param string $newComment New comment. NULL to clear.
* @return bool $success
*/
public function setComment($newComment = NULL) {
if ($this->isFinalized) {
return FALSE;
}
$this->zipComment = $newComment;
return TRUE;
}
/**
* Set zip file to write zip data to.
* This will cause all present and future data written to this class to be written to this file.
* This can be used at any time, even after the Zip Archive has been finalized. Any previous file will be closed.
* Warning: If the given file already exists, it will be overwritten.
*
* @param string $fileName
* @return bool $success
*/
public function setZipFile($fileName) {
if (is_file($fileName)) {
unlink($fileName);
}
$fd=fopen($fileName, "x+b");
if (is_resource($this->zipFile)) {
rewind($this->zipFile);
while (!feof($this->zipFile)) {
fwrite($fd, fread($this->zipFile, $this->streamChunkSize));
}
fclose($this->zipFile);
} else {
fwrite($fd, $this->zipData);
$this->zipData = NULL;
}
$this->zipFile = $fd;
return TRUE;
}
/**
* Add an empty directory entry to the zip archive.
* Basically this is only used if an empty directory is added.
*
* @param string $directoryPath Directory Path and name to be added to the archive.
* @param int $timestamp (Optional) Timestamp for the added directory, if omitted or set to 0, the current time will be used.
* @param string $fileComment (Optional) Comment to be added to the archive for this directory. To use fileComment, timestamp must be given.
* @param int $extFileAttr (Optional) The external file reference, use generateExtAttr to generate this.
* @return bool $success
*/
public function addDirectory($directoryPath, $timestamp = 0, $fileComment = NULL, $extFileAttr = self::EXT_FILE_ATTR_DIR) {
if ($this->isFinalized) {
return FALSE;
}
$directoryPath = str_replace("\\", "/", $directoryPath);
$directoryPath = rtrim($directoryPath, "/");
if (strlen($directoryPath) > 0) {
$this->buildZipEntry($directoryPath.'/', $fileComment, "\x00\x00", "\x00\x00", $timestamp, "\x00\x00\x00\x00", 0, 0, $extFileAttr);
return TRUE;
}
return FALSE;
}
/**
* Add a file to the archive at the specified location and file name.
*
* @param string $data File data.
* @param string $filePath Filepath and name to be used in the archive.
* @param int $timestamp (Optional) Timestamp for the added file, if omitted or set to 0, the current time will be used.
* @param string $fileComment (Optional) Comment to be added to the archive for this file. To use fileComment, timestamp must be given.
* @param bool $compress (Optional) Compress file, if set to FALSE the file will only be stored. Default TRUE.
* @param int $extFileAttr (Optional) The external file reference, use generateExtAttr to generate this.
* @return bool $success
*/
public function addFile($data, $filePath, $timestamp = 0, $fileComment = NULL, $compress = TRUE, $extFileAttr = self::EXT_FILE_ATTR_FILE) {
if ($this->isFinalized) {
return FALSE;
}
if (is_resource($data) && get_resource_type($data) == "stream") {
// Stream resources are handed to addLargeFile; return its result instead of always reporting failure.
return $this->addLargeFile($data, $filePath, $timestamp, $fileComment, $extFileAttr);
}
$gzData = "";
$gzType = "\x08\x00"; // Compression type 8 = deflate
$gpFlags = "\x00\x00"; // General Purpose bit flags for compression type 8 it is: 0=Normal, 1=Maximum, 2=Fast, 3=super fast compression.
$dataLength = strlen($data);
$fileCRC32 = pack("V", crc32($data));
if ($compress) {
$gzTmp = gzcompress($data);
$gzData = substr(substr($gzTmp, 0, strlen($gzTmp) - 4), 2); // gzcompress adds a 2 byte header and a 4 byte Adler-32 checksum we can't use.
// The 2 byte header does contain useful data, though in this case the 2 parameters we'd be interested in will always be 8 for compression type, and 2 for General purpose flag.
$gzLength = strlen($gzData);
} else {
$gzLength = $dataLength;
}
if ($gzLength >= $dataLength) {
$gzLength = $dataLength;
$gzData = $data;
$gzType = "\x00\x00"; // Compression type 0 = stored
$gpFlags = "\x00\x00"; // Compression type 0 = stored
}
if (!is_resource($this->zipFile) && ($this->offset + $gzLength) > $this->zipMemoryThreshold) {
$this->zipflush();
}
$this->buildZipEntry($filePath, $fileComment, $gpFlags, $gzType, $timestamp, $fileCRC32, $gzLength, $dataLength, $extFileAttr);
$this->zipwrite($gzData);
return TRUE;
}
/**
* Add the contents of a directory to the archive.
*
* @author Adam Schmalhofer <Adam.Schmalhofer@gmx.de>
* @author A. Grandt
*
* @param string $realPath Path on the file system.
* @param string $zipPath Filepath and name to be used in the archive.
* @param bool $recursive Add content recursively, default is TRUE.
* @param bool $followSymlinks Follow and add symbolic links, if they are accessible, default is TRUE.
* @param array &$addedFiles Reference to the added files, this is used to prevent duplicates, default is an empty array.
* If you start the function by passing an array, the array will be populated with the realPath
* and zipPath key/value pairs added to the archive by the function.
* @param bool $overrideFilePermissions Force the use of the file/dir permissions set in the $extDirAttr
* and $extFileAttr parameters.
* @param int $extDirAttr Permissions for directories.
* @param int $extFileAttr Permissions for files.
*/
public function addDirectoryContent($realPath, $zipPath, $recursive = TRUE, $followSymlinks = TRUE, &$addedFiles = array(),
$overrideFilePermissions = FALSE, $extDirAttr = self::EXT_FILE_ATTR_DIR, $extFileAttr = self::EXT_FILE_ATTR_FILE) {
if (file_exists($realPath) && !isset($addedFiles[realpath($realPath)])) {
if (is_dir($realPath)) {
if ($overrideFilePermissions) {
$this->addDirectory($zipPath, 0, null, $extDirAttr);
} else {
$this->addDirectory($zipPath, 0, null, self::getFileExtAttr($realPath));
}
}
$addedFiles[realpath($realPath)] = $zipPath;
$iter = new DirectoryIterator($realPath);
foreach ($iter as $file) {
if ($file->isDot()) {
continue;
}
$newRealPath = $file->getPathname();
$newZipPath = self::pathJoin($zipPath, $file->getFilename());
if (file_exists($newRealPath) && ($followSymlinks === TRUE || !is_link($newRealPath))) {
if ($file->isFile()) {
$addedFiles[realpath($newRealPath)] = $newZipPath;
if ($overrideFilePermissions) {
$this->addLargeFile($newRealPath, $newZipPath, 0, null, $extFileAttr);
} else {
$this->addLargeFile($newRealPath, $newZipPath, 0, null, self::getFileExtAttr($newRealPath));
}
} else if ($recursive === TRUE) {
$this->addDirectoryContent($newRealPath, $newZipPath, $recursive, $followSymlinks, $addedFiles, $overrideFilePermissions, $extDirAttr, $extFileAttr);
} else {
// Not recursing: add the subdirectory itself as an empty entry.
if ($overrideFilePermissions) {
$this->addDirectory($newZipPath, 0, null, $extDirAttr);
} else {
$this->addDirectory($newZipPath, 0, null, self::getFileExtAttr($newRealPath));
}
}
}
}
}
}
/**
* Add a file to the archive at the specified location and file name.
*
* @param string $dataFile File name/path.
* @param string $filePath Filepath and name to be used in the archive.
* @param int $timestamp (Optional) Timestamp for the added file, if omitted or set to 0, the current time will be used.
* @param string $fileComment (Optional) Comment to be added to the archive for this file. To use fileComment, timestamp must be given.
* @param int $extFileAttr (Optional) The external file reference, use generateExtAttr to generate this.
* @return bool $success
*/
public function addLargeFile($dataFile, $filePath, $timestamp = 0, $fileComment = NULL, $extFileAttr = self::EXT_FILE_ATTR_FILE) {
if ($this->isFinalized) {
return FALSE;
}
if (is_string($dataFile) && is_file($dataFile)) {
$this->processFile($dataFile, $filePath, $timestamp, $fileComment, $extFileAttr);
} else if (is_resource($dataFile) && get_resource_type($dataFile) == "stream") {
$fh = $dataFile;
$this->openStream($filePath, $timestamp, $fileComment, $extFileAttr);
while (!feof($fh)) {
$this->addStreamData(fread($fh, $this->streamChunkSize));
}
$this->closeStream();
}
return TRUE;
}
/**
* Create a stream to be used for large entries.
*
* @param string $filePath Filepath and name to be used in the archive.
* @param int $timestamp (Optional) Timestamp for the added file, if omitted or set to 0, the current time will be used.
* @param string $fileComment (Optional) Comment to be added to the archive for this file. To use fileComment, timestamp must be given.
* @param int $extFileAttr (Optional) The external file reference, use generateExtAttr to generate this.
* @return bool $success
*/
public function openStream($filePath, $timestamp = 0, $fileComment = null, $extFileAttr = self::EXT_FILE_ATTR_FILE) {
if (!function_exists('sys_get_temp_dir')) {
die ("ERROR: Zip " . self::VERSION . " requires PHP version 5.2.1 or above if large files are used.");
}
if ($this->isFinalized) {
return FALSE;
}
$this->zipflush();
if (strlen($this->streamFilePath) > 0) {
$this->closeStream();
}
$this->streamFile = tempnam(sys_get_temp_dir(), 'Zip');
$this->streamData = fopen($this->streamFile, "wb");
$this->streamFilePath = $filePath;
$this->streamTimestamp = $timestamp;
$this->streamFileComment = $fileComment;
$this->streamFileLength = 0;
$this->streamExtFileAttr = $extFileAttr;
return TRUE;
}
/**
* Add data to the open stream.
*
* @param string $data
* @return mixed length in bytes added or FALSE if the archive is finalized or there is no open stream.
*/
public function addStreamData($data) {
if ($this->isFinalized || strlen($this->streamFilePath) == 0) {
return FALSE;
}
$length = fwrite($this->streamData, $data, strlen($data));
if ($length != strlen($data)) {
die ("<p>Length mismatch</p>\n");
}
$this->streamFileLength += $length;
return $length;
}
/**
* Close the current stream.
*
* @return bool $success
*/
public function closeStream() {
if ($this->isFinalized || strlen($this->streamFilePath) == 0) {
return FALSE;
}
fflush($this->streamData);
fclose($this->streamData);
$this->processFile($this->streamFile, $this->streamFilePath, $this->streamTimestamp, $this->streamFileComment, $this->streamExtFileAttr);
$this->streamData = null;
$this->streamFilePath = null;
$this->streamTimestamp = null;
$this->streamFileComment = null;
$this->streamFileLength = 0;
$this->streamExtFileAttr = null;
// Windows is a little slow at times, so a millisecond later, we can unlink this.
unlink($this->streamFile);
$this->streamFile = null;
return TRUE;
}
private function processFile($dataFile, $filePath, $timestamp = 0, $fileComment = null, $extFileAttr = self::EXT_FILE_ATTR_FILE) {
if ($this->isFinalized) {
return FALSE;
}
$tempzip = tempnam(sys_get_temp_dir(), 'ZipStream');
$zip = new ZipArchive;
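// tempnam() has already created an empty file, so ask ZipArchive to overwrite it.
if ($zip->open($tempzip, ZipArchive::OVERWRITE) === TRUE) {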
$zip->addFile($dataFile, 'file');
$zip->close();
}
$file_handle = fopen($tempzip, "rb");
$stats = fstat($file_handle);
$eof = $stats['size']-72;
fseek($file_handle, 6);
$gpFlags = fread($file_handle, 2);
$gzType = fread($file_handle, 2);
fread($file_handle, 4);
$fileCRC32 = fread($file_handle, 4);
$v = unpack("Vval", fread($file_handle, 4));
$gzLength = $v['val'];
$v = unpack("Vval", fread($file_handle, 4));
$dataLength = $v['val'];
$this->buildZipEntry($filePath, $fileComment, $gpFlags, $gzType, $timestamp, $fileCRC32, $gzLength, $dataLength, $extFileAttr);
fseek($file_handle, 34);
$pos = 34;
while (!feof($file_handle) && $pos < $eof) {
$datalen = $this->streamChunkSize;
if ($pos + $this->streamChunkSize > $eof) {
$datalen = $eof-$pos;
}
$data = fread($file_handle, $datalen);
$pos += $datalen;
$this->zipwrite($data);
}
fclose($file_handle);
unlink($tempzip);
}
/**
* Close the archive.
* A closed archive can no longer have new files added to it.
*
* @return bool $success
*/
public function finalize() {
if (!$this->isFinalized) {
if (strlen($this->streamFilePath) > 0) {
$this->closeStream();
}
$cd = implode("", $this->cdRec);
$cdRecSize = pack("v", sizeof($this->cdRec));
$cdRec = $cd . self::ZIP_END_OF_CENTRAL_DIRECTORY
. $cdRecSize . $cdRecSize
. pack("VV", strlen($cd), $this->offset);
if (!empty($this->zipComment)) {
$cdRec .= pack("v", strlen($this->zipComment)) . $this->zipComment;
} else {
$cdRec .= "\x00\x00";
}
$this->zipwrite($cdRec);
$this->isFinalized = TRUE;
$this->cdRec = NULL;
return TRUE;
}
return FALSE;
}
/**
* Get the handle resource for the archive zip file.
* If the zip hasn't been finalized yet, this will cause it to become finalized
*
* @return zip file handle
*/
public function getZipFile() {
if (!$this->isFinalized) {
$this->finalize();
}
$this->zipflush();
rewind($this->zipFile);
return $this->zipFile;
}
/**
* Get the zip file contents
* If the zip hasn't been finalized yet, this will cause it to become finalized
*
* @return zip data
*/
public function getZipData() {
if (!$this->isFinalized) {
$this->finalize();
}
if (!is_resource($this->zipFile)) {
return $this->zipData;
} else {
rewind($this->zipFile);
$filestat = fstat($this->zipFile);
return fread($this->zipFile, $filestat['size']);
}
}
/**
* Send the archive as a zip download
*
* @param String $fileName The name of the Zip archive, in ISO-8859-1 (or ASCII) encoding, ie. "archive.zip". Optional, defaults to NULL, which means that no ISO-8859-1 encoded file name will be specified.
* @param String $contentType Content mime type. Optional, defaults to "application/zip".
* @param String $utf8FileName The name of the Zip archive, in UTF-8 encoding. Optional, defaults to NULL, which means that no UTF-8 encoded file name will be specified.
* @param bool $inline Use Content-Disposition with "inline" instead of "attachment". Optional, defaults to FALSE.
* @return bool $success
*/
function sendZip($fileName = null, $contentType = "application/zip", $utf8FileName = null, $inline = false) {
if (!$this->isFinalized) {
$this->finalize();
}
$headerFile = null;
$headerLine = null;
if (!headers_sent($headerFile, $headerLine) or die("<p><strong>Error:</strong> Unable to send file $fileName. HTTP headers have already been sent from <strong>$headerFile</strong> in line <strong>$headerLine</strong></p>")) {
if ((ob_get_contents() === FALSE || ob_get_contents() == '') or die("\n<p><strong>Error:</strong> Unable to send file <strong>$fileName</strong>. Output buffer contains the following text (typically warnings or errors):<br>" . htmlentities(ob_get_contents()) . "</p>")) {
if (ini_get('zlib.output_compression')) {
ini_set('zlib.output_compression', 'Off');
}
header("Pragma: public");
header("Last-Modified: " . gmdate("D, d M Y H:i:s T"));
header("Expires: 0");
header("Accept-Ranges: bytes");
header("Connection: close");
header("Content-Type: " . $contentType);
$cd = "Content-Disposition: ";
if ($inline) {
$cd .= "inline";
} else {
$cd .= "attachment"; // the registered disposition type is "attachment", not "attached"
}
if ($fileName) {
$cd .= '; filename="' . $fileName . '"';
}
if ($utf8FileName) {
$cd .= "; filename*=UTF-8''" . rawurlencode($utf8FileName);
}
header($cd);
header("Content-Length: ". $this->getArchiveSize());
if (!is_resource($this->zipFile)) {
echo $this->zipData;
} else {
rewind($this->zipFile);
while (!feof($this->zipFile)) {
echo fread($this->zipFile, $this->streamChunkSize);
}
}
}
return TRUE;
}
return FALSE;
}
/**
* Return the current size of the archive
*
* @return int Size of the archive in bytes
*/
public function getArchiveSize() {
if (!is_resource($this->zipFile)) {
return strlen($this->zipData);
}
$filestat = fstat($this->zipFile);
return $filestat['size'];
}
/**
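* Calculate the 4-byte DOS date and time value used in the zip entries.
*
* @param int $timestamp Unix timestamp; 0 means the current time.
* @return string 4-byte packed DOS time and date (little-endian), or four zero bytes for years before 1980.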
*/
private function getDosTime($timestamp = 0) {
$timestamp = (int)$timestamp;
$oldTZ = @date_default_timezone_get();
date_default_timezone_set('UTC');
$date = ($timestamp == 0 ? getdate() : getdate($timestamp));
date_default_timezone_set($oldTZ);
if ($date["year"] >= 1980) {
return pack("V", (($date["mday"] + ($date["mon"] << 5) + (($date["year"]-1980) << 9)) << 16) |
(($date["seconds"] >> 1) + ($date["minutes"] << 5) + ($date["hours"] << 11)));
}
return "\x00\x00\x00\x00";
}
/**
* Build the Zip file structures
*
* @param string $filePath
* @param string $fileComment
* @param string $gpFlags
* @param string $gzType
* @param int $timestamp
* @param string $fileCRC32
* @param int $gzLength
* @param int $dataLength
* @param int $extFileAttr Use self::EXT_FILE_ATTR_FILE for files, self::EXT_FILE_ATTR_DIR for Directories.
*/
private function buildZipEntry($filePath, $fileComment, $gpFlags, $gzType, $timestamp, $fileCRC32, $gzLength, $dataLength, $extFileAttr) {
$filePath = str_replace("\\", "/", $filePath);
$fileCommentLength = (empty($fileComment) ? 0 : strlen($fileComment));
$timestamp = (int)$timestamp;
$timestamp = ($timestamp == 0 ? time() : $timestamp);
$dosTime = $this->getDosTime($timestamp);
$tsPack = pack("V", $timestamp);
$ux = "\x75\x78\x0B\x00\x01\x04\xE8\x03\x00\x00\x04\x00\x00\x00\x00";
if (!isset($gpFlags) || strlen($gpFlags) != 2) {
$gpFlags = "\x00\x00";
}
$isFileUTF8 = mb_check_encoding($filePath, "UTF-8") && !mb_check_encoding($filePath, "ASCII");
$isCommentUTF8 = !empty($fileComment) && mb_check_encoding($fileComment, "UTF-8") && !mb_check_encoding($fileComment, "ASCII");
if ($isFileUTF8 || $isCommentUTF8) {
$flag = 0;
$gpFlagsV = unpack("vflags", $gpFlags);
if (isset($gpFlagsV['flags'])) {
$flag = $gpFlagsV['flags'];
}
$gpFlags = pack("v", $flag | (1 << 11));
}
$header = $gpFlags . $gzType . $dosTime. $fileCRC32
. pack("VVv", $gzLength, $dataLength, strlen($filePath)); // File name length
$zipEntry = self::ZIP_LOCAL_FILE_HEADER;
$zipEntry .= self::ATTR_VERSION_TO_EXTRACT;
$zipEntry .= $header;
$zipEntry .= pack("v", ($this->addExtraField ? 28 : 0)); // Extra field length
$zipEntry .= $filePath; // FileName
// Extra fields
if ($this->addExtraField) {
$zipEntry .= "\x55\x54\x09\x00\x03" . $tsPack . $tsPack . $ux;
}
$this->zipwrite($zipEntry);
$cdEntry = self::ZIP_CENTRAL_FILE_HEADER;
$cdEntry .= self::ATTR_MADE_BY_VERSION;
$cdEntry .= ($dataLength === 0 ? "\x0A\x00" : self::ATTR_VERSION_TO_EXTRACT);
$cdEntry .= $header;
$cdEntry .= pack("v", ($this->addExtraField ? 24 : 0)); // Extra field length
$cdEntry .= pack("v", $fileCommentLength); // File comment length
$cdEntry .= "\x00\x00"; // Disk number start
$cdEntry .= "\x00\x00"; // internal file attributes
$cdEntry .= pack("V", $extFileAttr); // External file attributes
$cdEntry .= pack("V", $this->offset); // Relative offset of local header
$cdEntry .= $filePath; // FileName
// Extra fields
if ($this->addExtraField) {
$cdEntry .= "\x55\x54\x05\x00\x03" . $tsPack . $ux;
}
if (!empty($fileComment)) {
$cdEntry .= $fileComment; // Comment
}
$this->cdRec[] = $cdEntry;
$this->offset += strlen($zipEntry) + $gzLength;
}
private function zipwrite($data) {
if (!is_resource($this->zipFile)) {
$this->zipData .= $data;
} else {
fwrite($this->zipFile, $data);
fflush($this->zipFile);
}
}
private function zipflush() {
if (!is_resource($this->zipFile)) {
$this->zipFile = tmpfile();
fwrite($this->zipFile, $this->zipData);
$this->zipData = NULL;
}
}
/**
* Join $file to $dir path, and clean up any excess slashes.
*
* @param string $dir
* @param string $file
*/
public static function pathJoin($dir, $file) {
if (empty($dir) || empty($file)) {
return self::getRelativePath($dir . $file);
}
return self::getRelativePath($dir . '/' . $file);
}
/**
* Clean up a path, removing any unnecessary elements such as /./, // or redundant ../ segments.
* If the path starts with a "/", it is deemed an absolute path and any /../ in the beginning is stripped off.
* The returned path will not end in a "/".
*
* Sometimes, when a path is generated from multiple fragments,
* you can get something like "../data/html/../images/image.jpeg"
* This will normalize that example path to "../data/images/image.jpeg"
*
* @param string $path The path to clean up
* @return string the clean path
*/
public static function getRelativePath($path) {
$path = preg_replace("#/+\.?/+#", "/", str_replace("\\", "/", $path));
$dirs = explode("/", rtrim(preg_replace('#^(?:\./)+#', '', $path), '/'));
$offset = 0;
$sub = 0;
$subOffset = 0;
$root = "";
if (empty($dirs[0])) {
$root = "/";
$dirs = array_splice($dirs, 1);
} else if (preg_match("#[A-Za-z]:#", $dirs[0])) {
$root = strtoupper($dirs[0]) . "/";
$dirs = array_splice($dirs, 1);
}
$newDirs = array();
foreach ($dirs as $dir) {
if ($dir !== "..") {
$subOffset--;
$newDirs[++$offset] = $dir;
} else {
$subOffset++;
if (--$offset < 0) {
$offset = 0;
if ($subOffset > $sub) {
$sub++;
}
}
}
}
if (empty($root)) {
$root = str_repeat("../", $sub);
}
return $root . implode("/", array_slice($newDirs, 0, $offset));
}
/**
* Create the file permissions for a file or directory, for use in the extFileAttr parameters.
*
* @param int $owner Unix permissions for owner (octal from 00 to 07)
* @param int $group Unix permissions for group (octal from 00 to 07)
* @param int $other Unix permissions for others (octal from 00 to 07)
* @param bool $isFile
* @return int The external file attributes field.
*/
public static function generateExtAttr($owner = 07, $group = 05, $other = 05, $isFile = true) {
$fp = $isFile ? self::S_IFREG : self::S_IFDIR;
$fp |= (($owner & 07) << 6) | (($group & 07) << 3) | ($other & 07);
return ($fp << 16) | ($isFile ? self::S_DOS_A : self::S_DOS_D);
}
/**
* Get the file permissions for a file or directory, for use in the extFileAttr parameters.
*
* @param string $filename
* @return external ref field, or FALSE if the file is not found.
*/
public static function getFileExtAttr($filename) {
if (file_exists($filename)) {
$fp = fileperms($filename) << 16;
return $fp | (is_dir($filename) ? self::S_DOS_D : self::S_DOS_A);
}
return FALSE;
}
}
?>
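
The bit layout that `getDosTime()` packs can be checked by hand. The following is a minimal standalone sketch, not part of the library: `dosDateTime()` is a hypothetical helper that takes already-split date parts (so it does not depend on the time zone handling above) and returns the same 32-bit value as an integer.

```php
<?php
// Illustrative sketch only: dosDateTime() is a hypothetical helper that mirrors
// the bit layout used by getDosTime() above. It is not part of the Zip class.
function dosDateTime($year, $mon, $mday, $hours, $minutes, $seconds) {
    if ($year < 1980) {
        return 0; // DOS dates cannot represent years before 1980
    }
    // High 16 bits: day (5 bits), month (4 bits), years since 1980 (7 bits)
    $date = $mday | ($mon << 5) | (($year - 1980) << 9);
    // Low 16 bits: seconds / 2 (5 bits), minutes (6 bits), hours (5 bits)
    $time = ($seconds >> 1) | ($minutes << 5) | ($hours << 11);
    return ($date << 16) | $time;
}

$packed = dosDateTime(2015, 1, 2, 21, 23, 42);
printf("0x%08X\n", $packed);          // prints 0x4622AAF5
echo bin2hex(pack("V", $packed)), "\n"; // little-endian bytes as written to the archive
```

Because seconds are stored divided by two, DOS timestamps only have two-second resolution, which is one reason the class can also write the Unix timestamp in an extra field.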


@ -0,0 +1,31 @@
DrUUID RFC4122 library for PHP5
by J. King (http://jkingweb.ca/)
Licensed under MIT license
See http://jkingweb.ca/code/php/lib.uuid/
for documentation
Last revised 2010-02-15
Copyright (c) 2009 J. King
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.


@ -0,0 +1,314 @@
<?php
/*
DrUUID RFC4122 library for PHP5
by J. King (http://jkingweb.ca/)
Licensed under MIT license
See http://jkingweb.ca/code/php/lib.uuid/
for documentation
Last revised 2010-02-15
*/
/*
Copyright (c) 2009 J. King
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
*/
class UUID {
const MD5 = 3;
const SHA1 = 5;
const clearVer = 15; // 00001111 Clears all bits of version byte with AND
const clearVar = 63; // 00111111 Clears all relevant bits of variant byte with AND
const varRes = 224; // 11100000 Variant reserved for future use
const varMS = 192; // 11000000 Microsoft GUID variant
const varRFC = 128; // 10000000 The RFC 4122 variant (this variant)
const varNCS = 0; // 00000000 The NCS compatibility variant
const version1 = 16; // 00010000
const version3 = 48; // 00110000
const version4 = 64; // 01000000
const version5 = 80; // 01010000
const interval = 0x01b21dd213814000; // Time (in 100ns steps) between the Gregorian epoch used by UUIDs (1582-10-15) and the Unix epoch
const nsDNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
const nsURL = '6ba7b811-9dad-11d1-80b4-00c04fd430c8';
const nsOID = '6ba7b812-9dad-11d1-80b4-00c04fd430c8';
const nsX500 = '6ba7b814-9dad-11d1-80b4-00c04fd430c8';
protected static $randomFunc = 'randomTwister';
protected static $randomSource = NULL;
//instance properties
protected $bytes;
protected $hex;
protected $string;
protected $urn;
protected $version;
protected $variant;
protected $node;
protected $time;
public static function mint($ver = 1, $node = NULL, $ns = NULL) {
/* Create a new UUID based on provided data. */
switch((int) $ver) {
case 1:
return new self(self::mintTime($node));
case 2:
// Version 2 is not supported
throw new UUIDException("Version 2 is unsupported.");
case 3:
return new self(self::mintName(self::MD5, $node, $ns));
case 4:
return new self(self::mintRand());
case 5:
return new self(self::mintName(self::SHA1, $node, $ns));
default:
throw new UUIDException("Selected version is invalid or unsupported.");
}
}
public static function import($uuid) {
/* Import an existing UUID. */
return new self(self::makeBin($uuid, 16));
}
public static function compare($a, $b) {
/* Compares the binary representations of two UUIDs.
The comparison will return true if they are bit-exact,
or if neither is valid. */
if (self::makeBin($a, 16)==self::makeBin($b, 16)) {
return TRUE;
} else {
return FALSE;
}
}
public function __toString() {
return $this->string;
}
public function __get($var) {
switch($var) {
case "bytes":
return $this->bytes;
case "hex":
return bin2hex($this->bytes);
case "string":
return $this->__toString();
case "urn":
return "urn:uuid:".$this->__toString();
case "version":
return ord($this->bytes[6]) >> 4;
case "variant":
$byte = ord($this->bytes[8]);
if ($byte >= self::varRes) {
return 3;
}
if ($byte >= self::varMS) {
return 2;
}
if ($byte >= self::varRFC) {
return 1;
}
return 0;
case "node":
if (ord($this->bytes[6])>>4==1) {
return bin2hex(substr($this->bytes,10));
} else {
return NULL;
}
case "time":
if (ord($this->bytes[6])>>4==1) {
// Restore contiguous big-endian byte order
$time = bin2hex($this->bytes[6].$this->bytes[7].$this->bytes[4].$this->bytes[5].$this->bytes[0].$this->bytes[1].$this->bytes[2].$this->bytes[3]);
// Clear version flag
$time[0] = "0";
// Do some reverse arithmetic to get a Unix timestamp
$time = (hexdec($time) - self::interval) / 10000000;
return $time;
} else {
return NULL;
}
default:
return NULL;
}
}
protected function __construct($uuid) {
if (strlen($uuid) != 16) {
throw new UUIDException("Input must be a 128-bit integer.");
}
$this->bytes = $uuid;
// Optimize the most common use
$this->string =
bin2hex(substr($uuid,0,4))."-".
bin2hex(substr($uuid,4,2))."-".
bin2hex(substr($uuid,6,2))."-".
bin2hex(substr($uuid,8,2))."-".
bin2hex(substr($uuid,10,6));
}
protected static function mintTime($node = NULL) {
/* Generates a Version 1 UUID.
These are derived from the time at which they were generated. */
// Get time since Gregorian calendar reform in 100ns intervals
// This is exceedingly difficult because of PHP's (and pack()'s)
// integer size limits.
// Note that this will never be more accurate than to the microsecond.
$time = microtime(1) * 10000000 + self::interval;
// Convert to a string representation
$time = sprintf("%F", $time);
preg_match("/^\d+/", $time, $time); //strip decimal point
// And now to a 64-bit binary representation
$time = base_convert($time[0], 10, 16);
$time = pack("H*", str_pad($time, 16, "0", STR_PAD_LEFT));
// Reorder bytes to their proper locations in the UUID
$uuid = $time[4].$time[5].$time[6].$time[7].$time[2].$time[3].$time[0].$time[1];
// Generate a random clock sequence
$uuid .= self::randomBytes(2);
// set variant
$uuid[8] = chr(ord($uuid[8]) & self::clearVar | self::varRFC);
// set version
$uuid[6] = chr(ord($uuid[6]) & self::clearVer | self::version1);
// Set the final 'node' parameter, a MAC address
if ($node) {
$node = self::makeBin($node, 6);
}
if (!$node) {
// If no node was provided or if the node was invalid,
// generate a random MAC address and set the multicast bit
$node = self::randomBytes(6);
$node[0] = pack("C", ord($node[0]) | 1);
}
$uuid .= $node;
return $uuid;
}
protected static function mintRand() {
/* Generate a Version 4 UUID.
These are derived solely from random numbers. */
// generate random fields
$uuid = self::randomBytes(16);
// set variant
$uuid[8] = chr(ord($uuid[8]) & self::clearVar | self::varRFC);
// set version
$uuid[6] = chr(ord($uuid[6]) & self::clearVer | self::version4);
return $uuid;
}
protected static function mintName($ver, $node, $ns) {
/* Generates a Version 3 or Version 5 UUID.
These are derived from a hash of a name and its namespace, in binary form. */
if (!$node) {
throw new UUIDException("A name-string is required for Version 3 or 5 UUIDs.");
}
// if the namespace UUID isn't binary, make it so
$ns = self::makeBin($ns, 16);
if (!$ns) {
throw new UUIDException("A binary namespace is required for Version 3 or 5 UUIDs.");
}
$uuid = null;
$version = self::version3;
switch($ver) {
case self::MD5:
$version = self::version3;
$uuid = md5($ns.$node,1);
break;
case self::SHA1:
$version = self::version5;
$uuid = substr(sha1($ns.$node,1),0, 16);
break;
}
// set variant
$uuid[8] = chr(ord($uuid[8]) & self::clearVar | self::varRFC);
// set version
$uuid[6] = chr(ord($uuid[6]) & self::clearVer | $version);
return ($uuid);
}
protected static function makeBin($str, $len) {
/* Ensure that an input string is either binary or hexadecimal.
Returns binary representation, or false on failure. */
if ($str instanceof self) {
return $str->bytes;
}
if (strlen($str)==$len) {
return $str;
} else {
$str = preg_replace("/^urn:uuid:/is", "", $str); // strip URN scheme and namespace
}
$str = preg_replace("/[^a-f0-9]/is", "", $str); // strip non-hex characters
if (strlen($str) != ($len * 2)) {
return FALSE;
} else {
return pack("H*", $str);
}
}
public static function initRandom() {
/* Look for a system-provided source of randomness, which is usually cryptographically secure.
/dev/urandom is tried first simply out of bias for Linux systems. */
if (is_readable('/dev/urandom')) {
self::$randomSource = fopen('/dev/urandom', 'rb');
self::$randomFunc = 'randomFRead';
}
else if (class_exists('COM', 0)) {
try {
self::$randomSource = new COM('CAPICOM.Utilities.1'); // See http://msdn.microsoft.com/en-us/library/aa388182(VS.85).aspx
self::$randomFunc = 'randomCOM';
}
catch(Exception $e) {
}
}
return self::$randomFunc;
}
public static function randomBytes($bytes) {
return call_user_func(array('self', self::$randomFunc), $bytes);
}
protected static function randomTwister($bytes) {
/* Get the specified number of random bytes, using mt_rand().
Randomness is returned as a string of bytes. */
$rand = "";
for ($a = 0; $a < $bytes; $a++) {
$rand .= chr(mt_rand(0, 255));
}
return $rand;
}
protected static function randomFRead($bytes) {
/* Get the specified number of random bytes using a file handle
previously opened with UUID::initRandom().
Randomness is returned as a string of bytes. */
return fread(self::$randomSource, $bytes);
}
protected static function randomCOM($bytes) {
/* Get the specified number of random bytes using Windows'
randomness source via a COM object previously created by UUID::initRandom().
Randomness is returned as a string of bytes. */
return base64_decode(self::$randomSource->GetRandom($bytes,0)); // straight binary mysteriously doesn't work, hence the base64
}
}
class UUIDException extends Exception {
}
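
The name-based path in `mintName()` (hash the namespace bytes followed by the name, truncate to 128 bits, then overwrite the version and variant bits) can be sketched without the class. This is an illustration under stated assumptions: `uuidV5()` is a hypothetical helper, not DrUUID's API, and it handles only the SHA-1 (version 5) case with a namespace given in canonical string form.

```php
<?php
// Illustrative sketch only: uuidV5() is a hypothetical helper, not part of DrUUID.
function uuidV5($namespace, $name) {
    $hexNs = str_replace("-", "", $namespace);
    if (!preg_match('/^[0-9a-f]{32}$/i', $hexNs)) {
        throw new InvalidArgumentException("Namespace must be a 128-bit UUID string.");
    }
    // Namespace UUID as 16 raw bytes
    $ns = pack("H*", $hexNs);
    // SHA-1 over namespace bytes followed by the name, truncated to 128 bits
    $bytes = substr(sha1($ns . $name, true), 0, 16);
    // Version: clear the high nibble of byte 6, then set 0101 (version 5)
    $bytes[6] = chr((ord($bytes[6]) & 0x0F) | 0x50);
    // Variant: clear the top two bits of byte 8, then set 10 (RFC 4122)
    $bytes[8] = chr((ord($bytes[8]) & 0x3F) | 0x80);
    $hex = bin2hex($bytes);
    return substr($hex, 0, 8) . "-" . substr($hex, 8, 4) . "-" . substr($hex, 12, 4)
        . "-" . substr($hex, 16, 4) . "-" . substr($hex, 20, 12);
}

// The DNS namespace value is the same constant as UUID::nsDNS above.
echo uuidV5('6ba7b810-9dad-11d1-80b4-00c04fd430c8', 'python.org'), "\n";
```

The two masks correspond to `clearVer` (15) and `clearVar` (63) in the class, and the OR-ed values to `version5` (80) and `varRFC` (128). The same name and namespace always yield the same UUID, which is the point of versions 3 and 5.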

File diff suppressed because it is too large


@ -1,338 +1,343 @@
<?php
/**
* Site Config
*
* Each instance of this class should hold extraction patterns and other directives
* for a website. See ContentExtractor class to see how it's used.
*
* @version 0.7
* @date 2012-08-27
* @author Keyvan Minoukadeh
* @copyright 2012 Keyvan Minoukadeh
* @license http://www.gnu.org/licenses/agpl-3.0.html AGPL v3
*/
class SiteConfig
{
// Use first matching element as title (0 or more xpath expressions)
public $title = array();
// Use first matching element as body (0 or more xpath expressions)
public $body = array();
// Use first matching element as author (0 or more xpath expressions)
public $author = array();
// Use first matching element as date (0 or more xpath expressions)
public $date = array();
// Strip elements matching these xpath expressions (0 or more)
public $strip = array();
// Strip elements which contain these strings (0 or more) in the id or class attribute
public $strip_id_or_class = array();
// Strip images which contain these strings (0 or more) in the src attribute
public $strip_image_src = array();
// Additional HTTP headers to send
// NOT YET USED
public $http_header = array();
// Process HTML with tidy before creating DOM (bool or null if undeclared)
public $tidy = null;
protected $default_tidy = true; // used if undeclared
// Autodetect title/body if xpath expressions fail to produce results.
// Note that this applies to title and body separately, ie.
// * if we get a body match but no title match, this option will determine whether we autodetect title
// * if neither match, this determines whether we autodetect title and body.
// Also note that this only applies when there is at least one xpath expression in title or body, ie.
// * if title and body are both empty (no xpath expressions), this option has no effect (both title and body will be auto-detected)
// * if there's an xpath expression for title and none for body, body will be auto-detected and this option will determine whether we auto-detect title if the xpath expression for it fails to produce results.
// Usage scenario: you want to extract something specific from a set of URLs, e.g. a table, and if the table is not found, you want to ignore the entry completely. Auto-detection is unlikely to succeed here, so you construct your patterns and set this option to false. Another scenario may be a site where auto-detection has proven to fail (or worse, picked up the wrong content).
// bool or null if undeclared
public $autodetect_on_failure = null;
protected $default_autodetect_on_failure = true; // used if undeclared
// Clean up content block - attempt to remove elements that appear to be superfluous
// bool or null if undeclared
public $prune = null;
protected $default_prune = true; // used if undeclared
// Test URL - if present, can be used to test the config above
public $test_url = array();
// Single-page link - should identify a link element or URL pointing to the page holding the entire article
// This is useful for sites which split their articles across multiple pages. Links to such pages tend to
// display the first page with links to the other pages at the bottom. Often there is also a link to a page
// which displays the entire article on one page (e.g. 'print view').
// This should be an XPath expression identifying the link to that page. If present and we find a match,
// we will retrieve that page and the rest of the options in this config will be applied to the new page.
public $single_page_link = array();
public $next_page_link = array();
// Single-page link in feed? - same as above, but patterns applied to item description HTML taken from feed
public $single_page_link_in_feed = array();
// Which parser to use for turning raw HTML into a DOMDocument (either 'libxml' or 'html5lib')
// string or null if undeclared
public $parser = null;
protected $default_parser = 'libxml'; // used if undeclared
// Strings to search for in HTML before processing begins (used with $replace_string)
public $find_string = array();
// Strings to replace those found in $find_string before HTML processing begins
public $replace_string = array();
// the options below cannot be set in the config files which this class represents
//public $cache_in_apc = false; // used to decide if we should cache in apc or not
public $cache_key = null;
public static $debug = false;
protected static $apc = false;
protected static $config_path;
protected static $config_path_fallback;
protected static $config_cache = array();
const HOSTNAME_REGEX = '/^(([a-zA-Z0-9-]*[a-zA-Z0-9])\.)*([A-Za-z0-9-]*[A-Za-z0-9])$/';
protected static function debug($msg) {
if (self::$debug) {
//$mem = round(memory_get_usage()/1024, 2);
//$memPeak = round(memory_get_peak_usage()/1024, 2);
echo '* ',$msg;
//echo ' - mem used: ',$mem," (peak: $memPeak)\n";
echo "\n";
ob_flush();
flush();
}
}
// enable APC caching of certain site config files?
// If enabled the following site config files will be
// cached in APC cache (when requested for first time):
// * anything in site_config/custom/ and its corresponding file in site_config/standard/
// * the site config files associated with HTML fingerprints
// * the global site config file
// returns true if enabled, false otherwise
public static function use_apc($apc=true) {
if (!function_exists('apc_add')) {
if ($apc) self::debug('APC will not be used (function apc_add does not exist)');
return false;
}
self::$apc = $apc;
return $apc;
}
// return bool or null
public function tidy($use_default=true) {
if ($use_default) return (isset($this->tidy)) ? $this->tidy : $this->default_tidy;
return $this->tidy;
}
// return bool or null
public function prune($use_default=true) {
if ($use_default) return (isset($this->prune)) ? $this->prune : $this->default_prune;
return $this->prune;
}
// return string or null
public function parser($use_default=true) {
if ($use_default) return (isset($this->parser)) ? $this->parser : $this->default_parser;
return $this->parser;
}
// return bool or null
public function autodetect_on_failure($use_default=true) {
if ($use_default) return (isset($this->autodetect_on_failure)) ? $this->autodetect_on_failure : $this->default_autodetect_on_failure;
return $this->autodetect_on_failure;
}
public static function set_config_path($path, $fallback=null) {
self::$config_path = $path;
self::$config_path_fallback = $fallback;
}
public static function add_to_cache($key, SiteConfig $config, $use_apc=true) {
$key = strtolower($key);
if (substr($key, 0, 4) == 'www.') $key = substr($key, 4);
if ($config->cache_key) $key = $config->cache_key;
self::$config_cache[$key] = $config;
if (self::$apc && $use_apc) {
self::debug("Adding site config to APC cache with key sc.$key");
apc_add("sc.$key", $config);
}
self::debug("Cached site config with key $key");
}
public static function is_cached($key) {
$key = strtolower($key);
if (substr($key, 0, 4) == 'www.') $key = substr($key, 4);
if (array_key_exists($key, self::$config_cache)) {
return true;
} elseif (self::$apc && (bool)apc_fetch("sc.$key")) {
return true;
}
return false;
}
public function append(SiteConfig $newconfig) {
// check for commands where we accept multiple statements (no test_url)
foreach (array('title', 'body', 'author', 'date', 'strip', 'strip_id_or_class', 'strip_image_src', 'single_page_link', 'single_page_link_in_feed', 'next_page_link', 'http_header', 'find_string', 'replace_string') as $var) {
// append array elements for this config variable from $newconfig to this config
//$this->$var = $this->$var + $newconfig->$var;
$this->$var = array_unique(array_merge($this->$var, $newconfig->$var));
}
// check for single statement commands
// we do not overwrite existing non null values
foreach (array('tidy', 'prune', 'parser', 'autodetect_on_failure') as $var) {
if ($this->$var === null) $this->$var = $newconfig->$var;
}
}
// returns SiteConfig instance if an appropriate one is found, false otherwise
// if $exact_host_match is true, we will not look for wildcard config matches
// by default if host is 'test.example.org' we will look for and load '.example.org.txt' if it exists
public static function build($host, $exact_host_match=false) {
$host = strtolower($host);
if (substr($host, 0, 4) == 'www.') $host = substr($host, 4);
if (!$host || (strlen($host) > 200) || !preg_match(self::HOSTNAME_REGEX, ltrim($host, '.'))) return false;
// check for site configuration
$try = array($host);
// should we look for wildcard matches
if (!$exact_host_match) {
$split = explode('.', $host);
if (count($split) > 1) {
array_shift($split);
$try[] = '.'.implode('.', $split);
}
}
// look for site config file in primary folder
self::debug(". looking for site config for $host in primary folder");
foreach ($try as $h) {
if (array_key_exists($h, self::$config_cache)) {
self::debug("... site config for $h already loaded in this request");
return self::$config_cache[$h];
} elseif (self::$apc && ($sconfig = apc_fetch("sc.$h"))) {
self::debug("... site config for $h in APC cache");
return $sconfig;
} elseif (file_exists(self::$config_path."/$h.txt")) {
self::debug("... found site config ($h.txt)");
$file_primary = self::$config_path."/$h.txt";
$matched_name = $h;
break;
}
}
// if we found site config, process it
if (isset($file_primary)) {
$config_lines = file($file_primary, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if (!$config_lines || !is_array($config_lines)) return false;
$config = self::build_from_array($config_lines);
// if APC caching is available and enabled, mark this for cache
//$config->cache_in_apc = true;
$config->cache_key = $matched_name;
// if autodetect on failure is off (on by default) we do not need to look
// in secondary folder
if (!$config->autodetect_on_failure()) {
self::debug('... autodetect on failure is disabled (no other site config files will be loaded)');
return $config;
}
}
// look for site config file in secondary folder
if (isset(self::$config_path_fallback)) {
self::debug(". looking for site config for $host in secondary folder");
foreach ($try as $h) {
if (file_exists(self::$config_path_fallback."/$h.txt")) {
self::debug("... found site config in secondary folder ($h.txt)");
$file_secondary = self::$config_path_fallback."/$h.txt";
$matched_name = $h;
break;
}
}
if (!isset($file_secondary)) {
self::debug("... no site config match in secondary folder");
}
}
// return false if no config file found
if (!isset($file_primary) && !isset($file_secondary)) {
self::debug("... no site config match for $host");
return false;
}
// return primary config if secondary not found
if (!isset($file_secondary) && isset($config)) {
return $config;
}
// process secondary config file
$config_lines = file($file_secondary, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if (!$config_lines || !is_array($config_lines)) {
// failed to process secondary
if (isset($config)) {
// return primary config
return $config;
} else {
return false;
}
}
// merge with primary and return
if (isset($config)) {
self::debug('. merging config files');
$config->append(self::build_from_array($config_lines));
return $config;
} else {
// return just secondary
$config = self::build_from_array($config_lines);
// if APC caching is available and enabled, mark this for cache
//$config->cache_in_apc = true;
$config->cache_key = $matched_name;
return $config;
}
}
public static function build_from_array(array $lines) {
$config = new SiteConfig();
foreach ($lines as $line) {
$line = trim($line);
// skip comments, empty lines
if ($line == '' || $line[0] == '#') continue;
// get command
$command = explode(':', $line, 2);
// if there's no colon ':', skip this line
if (count($command) != 2) continue;
$val = trim($command[1]);
$command = trim($command[0]);
if ($command == '' || $val == '') continue;
// check for commands where we accept multiple statements
if (in_array($command, array('title', 'body', 'author', 'date', 'strip', 'strip_id_or_class', 'strip_image_src', 'single_page_link', 'single_page_link_in_feed', 'next_page_link', 'http_header', 'test_url', 'find_string', 'replace_string'))) {
array_push($config->$command, $val);
// check for single statement commands that evaluate to true or false
} elseif (in_array($command, array('tidy', 'prune', 'autodetect_on_failure'))) {
$config->$command = ($val == 'yes');
// check for single statement commands stored as strings
} elseif (in_array($command, array('parser'))) {
$config->$command = $val;
// check for replace_string(find): replace
} elseif ((substr($command, -1) == ')') && preg_match('!^([a-z0-9_]+)\((.*?)\)$!i', $command, $match)) {
if (in_array($match[1], array('replace_string'))) {
$command = $match[1];
array_push($config->find_string, $match[2]);
array_push($config->$command, $val);
}
}
}
return $config;
}
}
?>
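
The line format accepted by `build_from_array()` is `command: value`, with `#` comments and a `replace_string(find): replace` form. The sketch below shows how a few sample lines would be interpreted. It is an assumption-laden illustration: `parseSiteConfigLines()` is a hypothetical stand-in that returns a plain array and knows only a subset of the commands, and the sample rules and example.org URL are made up.

```php
<?php
// Illustrative sketch only: parseSiteConfigLines() is a hypothetical stand-in for
// SiteConfig::build_from_array(), returning a plain array instead of an object.
function parseSiteConfigLines(array $lines) {
    $multi = array('title', 'body', 'strip', 'test_url'); // subset of the multi-statement commands
    $bools = array('tidy', 'prune', 'autodetect_on_failure');
    $config = array('find_string' => array(), 'replace_string' => array());
    foreach ($lines as $line) {
        $line = trim($line);
        // skip comments and empty lines
        if ($line === '' || $line[0] === '#') continue;
        // split on the first colon only, so values may contain colons (e.g. URLs)
        $parts = explode(':', $line, 2);
        if (count($parts) != 2) continue;
        $command = trim($parts[0]);
        $val = trim($parts[1]);
        if ($command === '' || $val === '') continue;
        if (in_array($command, $multi)) {
            $config[$command][] = $val;
        } elseif (in_array($command, $bools)) {
            $config[$command] = ($val == 'yes'); // anything other than "yes" is false
        } elseif ($command === 'parser') {
            $config[$command] = $val;
        } elseif (preg_match('!^replace_string\((.*?)\)$!i', $command, $m)) {
            // replace_string(find): replace fills both arrays in step
            $config['find_string'][] = $m[1];
            $config['replace_string'][] = $val;
        }
    }
    return $config;
}

$sample = array(
    '# hypothetical config for example.org',
    'title: //h1[@class="headline"]',
    'body: //div[@id="article"]',
    'prune: no',
    'replace_string(<br/>): <br />',
    'test_url: http://example.org/story.html',
);
print_r(parseSiteConfigLines($sample));
```

Splitting with a limit of 2 is what lets XPath expressions and URLs keep their own colons, and pairing `find_string` with `replace_string` by index is why the two arrays must always grow together.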
<?php
/**
* Site Config
*
* Each instance of this class should hold extraction patterns and other directives
* for a website. See ContentExtractor class to see how it's used.
*
* @version 0.8
* @date 2013-04-16
* @author Keyvan Minoukadeh
* @copyright 2013 Keyvan Minoukadeh
* @license http://www.gnu.org/licenses/agpl-3.0.html AGPL v3
*/
class SiteConfig
{
// Use first matching element as title (0 or more xpath expressions)
public $title = array();
// Use first matching element as body (0 or more xpath expressions)
public $body = array();
// Use first matching element as author (0 or more xpath expressions)
public $author = array();
// Use first matching element as date (0 or more xpath expressions)
public $date = array();
// Strip elements matching these xpath expressions (0 or more)
public $strip = array();
// Strip elements which contain these strings (0 or more) in the id or class attribute
public $strip_id_or_class = array();
// Strip images which contain these strings (0 or more) in the src attribute
public $strip_image_src = array();
// Additional HTTP headers to send
// NOT YET USED
public $http_header = array();
// Process HTML with tidy before creating DOM (bool or null if undeclared)
public $tidy = null;
protected $default_tidy = true; // used if undeclared
// Autodetect title/body if xpath expressions fail to produce results.
// Note that this applies to title and body separately, ie.
// * if we get a body match but no title match, this option will determine whether we autodetect title
// * if neither match, this determines whether we autodetect title and body.
// Also note that this only applies when there is at least one xpath expression in title or body, ie.
// * if title and body are both empty (no xpath expressions), this option has no effect (both title and body will be auto-detected)
// * if there's an xpath expression for title and none for body, body will be auto-detected and this option will determine whether we auto-detect title if the xpath expression for it fails to produce results.
// Usage scenario: you want to extract something specific from a set of URLs, e.g. a table, and if the table is not found, you want to ignore the entry completely. Auto-detection is unlikely to succeed here, so you construct your patterns and set this option to false. Another scenario may be a site where auto-detection has proven to fail (or worse, picked up the wrong content).
// bool or null if undeclared
public $autodetect_on_failure = null;
protected $default_autodetect_on_failure = true; // used if undeclared
// Clean up content block - attempt to remove elements that appear to be superfluous
// bool or null if undeclared
public $prune = null;
protected $default_prune = true; // used if undeclared
// Test URL - if present, can be used to test the config above
public $test_url = array();
// Single-page link - should identify a link element or URL pointing to the page holding the entire article
// This is useful for sites which split their articles across multiple pages. Links to such pages tend to
// display the first page with links to the other pages at the bottom. Often there is also a link to a page
// which displays the entire article on one page (e.g. 'print view').
// This should be an XPath expression identifying the link to that page. If present and we find a match,
// we will retrieve that page and the rest of the options in this config will be applied to the new page.
public $single_page_link = array();
public $next_page_link = array();
// Single-page link in feed? - same as above, but patterns applied to item description HTML taken from feed
public $single_page_link_in_feed = array();
// Which parser to use for turning raw HTML into a DOMDocument (either 'libxml' or 'html5lib')
// string or null if undeclared
public $parser = null;
protected $default_parser = 'libxml'; // used if undeclared
// Strings to search for in HTML before processing begins (used with $replace_string)
public $find_string = array();
// Strings to replace those found in $find_string before HTML processing begins
public $replace_string = array();
// the options below cannot be set in the config files which this class represents
//public $cache_in_apc = false; // used to decide if we should cache in apc or not
public $cache_key = null;
public static $debug = false;
protected static $apc = false;
protected static $config_path;
protected static $config_path_fallback;
protected static $config_cache = array();
const HOSTNAME_REGEX = '/^(([a-zA-Z0-9-]*[a-zA-Z0-9])\.)*([A-Za-z0-9-]*[A-Za-z0-9])$/';
protected static function debug($msg) {
if (self::$debug) {
//$mem = round(memory_get_usage()/1024, 2);
//$memPeak = round(memory_get_peak_usage()/1024, 2);
echo '* ',$msg;
//echo ' - mem used: ',$mem," (peak: $memPeak)\n";
echo "\n";
ob_flush();
flush();
}
}
// enable APC caching of certain site config files?
// If enabled the following site config files will be
// cached in APC cache (when requested for first time):
// * anything in site_config/custom/ and its corresponding file in site_config/standard/
// * the site config files associated with HTML fingerprints
// * the global site config file
// returns true if enabled, false otherwise
public static function use_apc($apc=true) {
if (!function_exists('apc_add')) {
if ($apc) self::debug('APC will not be used (function apc_add does not exist)');
return false;
}
self::$apc = $apc;
return $apc;
}
// return bool or null
public function tidy($use_default=true) {
if ($use_default) return (isset($this->tidy)) ? $this->tidy : $this->default_tidy;
return $this->tidy;
}
// return bool or null
public function prune($use_default=true) {
if ($use_default) return (isset($this->prune)) ? $this->prune : $this->default_prune;
return $this->prune;
}
// return string or null
public function parser($use_default=true) {
if ($use_default) return (isset($this->parser)) ? $this->parser : $this->default_parser;
return $this->parser;
}
// return bool or null
public function autodetect_on_failure($use_default=true) {
if ($use_default) return (isset($this->autodetect_on_failure)) ? $this->autodetect_on_failure : $this->default_autodetect_on_failure;
return $this->autodetect_on_failure;
}
public static function set_config_path($path, $fallback=null) {
self::$config_path = $path;
self::$config_path_fallback = $fallback;
}
public static function add_to_cache($key, SiteConfig $config, $use_apc=true) {
$key = strtolower($key);
if (substr($key, 0, 4) == 'www.') $key = substr($key, 4);
if ($config->cache_key) $key = $config->cache_key;
self::$config_cache[$key] = $config;
if (self::$apc && $use_apc) {
self::debug("Adding site config to APC cache with key sc.$key");
apc_add("sc.$key", $config);
}
self::debug("Cached site config with key $key");
}
public static function is_cached($key) {
$key = strtolower($key);
if (substr($key, 0, 4) == 'www.') $key = substr($key, 4);
if (array_key_exists($key, self::$config_cache)) {
return true;
} elseif (self::$apc && (bool)apc_fetch("sc.$key")) {
return true;
}
return false;
}
public function append(SiteConfig $newconfig) {
// check for commands where we accept multiple statements (no test_url)
foreach (array('title', 'body', 'author', 'date', 'strip', 'strip_id_or_class', 'strip_image_src', 'single_page_link', 'single_page_link_in_feed', 'next_page_link', 'http_header') as $var) {
// append array elements for this config variable from $newconfig to this config
//$this->$var = $this->$var + $newconfig->$var;
$this->$var = array_unique(array_merge($this->$var, $newconfig->$var));
}
// check for single statement commands
// we do not overwrite existing non null values
foreach (array('tidy', 'prune', 'parser', 'autodetect_on_failure') as $var) {
if ($this->$var === null) $this->$var = $newconfig->$var;
}
// treat find_string and replace_string separately (don't apply array_unique) (thanks fabrizio!)
foreach (array('find_string', 'replace_string') as $var) {
// append array elements for this config variable from $newconfig to this config
//$this->$var = $this->$var + $newconfig->$var;
$this->$var = array_merge($this->$var, $newconfig->$var);
}
}
// returns SiteConfig instance if an appropriate one is found, false otherwise
// if $exact_host_match is true, we will not look for wildcard config matches
// by default if host is 'test.example.org' we will look for and load '.example.org.txt' if it exists
public static function build($host, $exact_host_match=false) {
$host = strtolower($host);
if (substr($host, 0, 4) == 'www.') $host = substr($host, 4);
if (!$host || (strlen($host) > 200) || !preg_match(self::HOSTNAME_REGEX, ltrim($host, '.'))) return false;
// check for site configuration
$try = array($host);
// should we look for wildcard matches
if (!$exact_host_match) {
$split = explode('.', $host);
if (count($split) > 1) {
array_shift($split);
$try[] = '.'.implode('.', $split);
}
}
// look for site config file in primary folder
self::debug(". looking for site config for $host in primary folder");
foreach ($try as $h) {
if (array_key_exists($h, self::$config_cache)) {
self::debug("... site config for $h already loaded in this request");
return self::$config_cache[$h];
} elseif (self::$apc && ($sconfig = apc_fetch("sc.$h"))) {
self::debug("... site config for $h in APC cache");
return $sconfig;
} elseif (file_exists(self::$config_path."/$h.txt")) {
self::debug("... found site config ($h.txt)");
$file_primary = self::$config_path."/$h.txt";
$matched_name = $h;
break;
}
}
// if we found site config, process it
if (isset($file_primary)) {
$config_lines = file($file_primary, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if (!$config_lines || !is_array($config_lines)) return false;
$config = self::build_from_array($config_lines);
// if APC caching is available and enabled, mark this for cache
//$config->cache_in_apc = true;
$config->cache_key = $matched_name;
// if autodetect on failure is off (on by default) we do not need to look
// in secondary folder
if (!$config->autodetect_on_failure()) {
self::debug('... autodetect on failure is disabled (no other site config files will be loaded)');
return $config;
}
}
// look for site config file in secondary folder
if (isset(self::$config_path_fallback)) {
self::debug(". looking for site config for $host in secondary folder");
foreach ($try as $h) {
if (file_exists(self::$config_path_fallback."/$h.txt")) {
self::debug("... found site config in secondary folder ($h.txt)");
$file_secondary = self::$config_path_fallback."/$h.txt";
$matched_name = $h;
break;
}
}
if (!isset($file_secondary)) {
self::debug("... no site config match in secondary folder");
}
}
// return false if no config file found
if (!isset($file_primary) && !isset($file_secondary)) {
self::debug("... no site config match for $host");
return false;
}
// return primary config if secondary not found
if (!isset($file_secondary) && isset($config)) {
return $config;
}
// process secondary config file
$config_lines = file($file_secondary, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if (!$config_lines || !is_array($config_lines)) {
// failed to process secondary
if (isset($config)) {
// return primary config
return $config;
} else {
return false;
}
}
// merge with primary and return
if (isset($config)) {
self::debug('. merging config files');
$config->append(self::build_from_array($config_lines));
return $config;
} else {
// return just secondary
$config = self::build_from_array($config_lines);
// if APC caching is available and enabled, mark this for cache
//$config->cache_in_apc = true;
$config->cache_key = $matched_name;
return $config;
}
}
public static function build_from_array(array $lines) {
$config = new SiteConfig();
foreach ($lines as $line) {
$line = trim($line);
// skip comments, empty lines
if ($line == '' || $line[0] == '#') continue;
// get command
$command = explode(':', $line, 2);
// if there's no colon ':', skip this line
if (count($command) != 2) continue;
$val = trim($command[1]);
$command = trim($command[0]);
if ($command == '' || $val == '') continue;
// check for commands where we accept multiple statements
if (in_array($command, array('title', 'body', 'author', 'date', 'strip', 'strip_id_or_class', 'strip_image_src', 'single_page_link', 'single_page_link_in_feed', 'next_page_link', 'http_header', 'test_url', 'find_string', 'replace_string'))) {
array_push($config->$command, $val);
// check for single statement commands that evaluate to true or false
} elseif (in_array($command, array('tidy', 'prune', 'autodetect_on_failure'))) {
$config->$command = ($val == 'yes');
// check for single statement commands stored as strings
} elseif (in_array($command, array('parser'))) {
$config->$command = $val;
// check for replace_string(find): replace
} elseif ((substr($command, -1) == ')') && preg_match('!^([a-z0-9_]+)\((.*?)\)$!i', $command, $match)) {
if (in_array($match[1], array('replace_string'))) {
$command = $match[1];
array_push($config->find_string, $match[2]);
array_push($config->$command, $val);
}
}
}
return $config;
}
}
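The `command: value` line format handled by `build_from_array()` above can be sketched as a standalone function. This is a simplified illustration, not the library's code: `parse_site_config_lines()` is a hypothetical name, it returns a plain array instead of a `SiteConfig` instance, and it recognises only a subset of the commands.

```php
<?php
// Hypothetical, simplified stand-in for SiteConfig::build_from_array().
// Returns a plain array keyed by command name instead of a SiteConfig object.
function parse_site_config_lines(array $lines): array
{
    // Subset of the commands that may appear more than once.
    $multi = array('title', 'body', 'strip', 'test_url', 'find_string', 'replace_string');
    // Commands that take 'yes' or anything else (treated as false).
    $bool = array('tidy', 'prune', 'autodetect_on_failure');
    $config = array();
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip empty lines and comments.
        if ($line === '' || $line[0] === '#') continue;
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) != 2) continue;
        $command = trim($parts[0]);
        $val = trim($parts[1]);
        if ($command === '' || $val === '') continue;
        if (in_array($command, $multi)) {
            $config[$command][] = $val;
        } elseif (in_array($command, $bool)) {
            $config[$command] = ($val == 'yes');
        } elseif ($command == 'parser') {
            $config[$command] = $val;
        } elseif (preg_match('!^replace_string\((.*?)\)$!i', $command, $m)) {
            // Shorthand form: replace_string(find): replace
            $config['find_string'][] = $m[1];
            $config['replace_string'][] = $val;
        }
    }
    return $config;
}

// Made-up config lines for an imaginary site.
$config = parse_site_config_lines(array(
    '# example.org',
    "title: //h1[@class='headline']",
    'prune: no',
    'replace_string(<br/>): <br />',
));
```

Because the shorthand pushes onto `find_string` and `replace_string` together, the two arrays stay index-aligned, which is presumably why `append()` merges them without `array_unique`.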

inc/3rdparty/libraries/feedwriter/FeedItem.php vendored Normal file → Executable file

@ -1,7 +1,7 @@
<?php
/**
* Univarsel Feed Writer
*
*
* FeedItem class - Used as feed element in FeedWriter class
*
* @package UnivarselFeedWriter
@ -12,20 +12,20 @@
{
private $elements = array(); //Collection of feed elements
private $version;
/**
* Constructor
*
* @param constant (RSS1/RSS2/ATOM) RSS2 is default.
*/
* Constructor
*
* @param constant (RSS1/RSS2/ATOM) RSS2 is default.
*/
function __construct($version = RSS2)
{
{
$this->version = $version;
}
/**
* Set element (overwrites existing elements with $elementName)
*
*
* @access public
* @param string The tag name of an element
* @param string The content of tag
@ -38,11 +38,11 @@
unset($this->elements[$elementName]);
}
$this->addElement($elementName, $content, $attributes);
}
}
/**
* Add an element to elements array
*
*
* @access public
* @param string The tag name of an element
* @param string The content of tag
@ -61,11 +61,11 @@
$this->elements[$elementName][$i]['content'] = $content;
$this->elements[$elementName][$i]['attributes'] = $attributes;
}
/**
* Set multiple feed elements from an array.
* Set multiple feed elements from an array.
* Elements which have attributes cannot be added by this method
*
*
* @access public
* @param array array of elements in 'tagName' => 'tagContent' format.
* @return void
@ -73,15 +73,15 @@
public function addElementArray($elementArray)
{
if(! is_array($elementArray)) return;
foreach ($elementArray as $elementName => $content)
foreach ($elementArray as $elementName => $content)
{
$this->addElement($elementName, $content);
}
}
/**
* Return the collection of elements in this feed item
*
*
* @access public
* @return array
*/
@ -89,68 +89,74 @@
{
return $this->elements;
}
// Wrapper functions ------------------------------------------------------
/**
* Set the 'description' element of feed item ('summary' for ATOM)
*
*
* @access public
* @param string The content of 'description' element
* @return void
*/
public function setDescription($description)
public function setDescription($description)
{
$this->setElement('description', $description);
$tag = ($this->version == ATOM)? 'summary' : 'description';
$this->setElement($tag, $description);
}
/**
* @desc Set the 'title' element of feed item
* @access public
* @param string The content of 'title' element
* @return void
*/
public function setTitle($title)
public function setTitle($title)
{
$this->setElement('title', $title);
$this->setElement('title', $title);
}
/**
* Set the 'date' element of feed item
*
*
* @access public
* @param string The content of 'date' element
* @return void
*/
public function setDate($date)
public function setDate($date)
{
if(! is_numeric($date))
{
$date = strtotime($date);
}
if($this->version == RSS2)
if($this->version == ATOM)
{
$tag = 'pubDate';
$value = date(DATE_RSS, $date);
$tag = 'updated';
$value = date(DATE_ATOM, $date);
}
else
elseif($this->version == RSS2)
{
$tag = 'dc:date';
$value = date("Y-m-d", $date);
$tag = 'pubDate';
$value = date(DATE_RSS, $date);
}
$this->setElement($tag, $value);
else
{
$tag = 'dc:date';
$value = date("Y-m-d", $date);
}
$this->setElement($tag, $value);
}
/**
* Set the 'link' element of feed item
*
*
* @access public
* @param string The content of 'link' element
* @return void
*/
public function setLink($link)
public function setLink($link)
{
if($this->version == RSS2 || $this->version == RSS1)
{
@ -161,26 +167,27 @@
{
$this->setElement('link','',array('href'=>$link));
$this->setElement('id', FeedWriter::uuid($link,'urn:uuid:'));
}
}
}
/**
* Set the 'source' element of feed item
*
*
* @access public
* @param string The content of 'source' element
* @return void
*/
public function setSource($link)
public function setSource($link)
{
$this->setElement('source', $link);
$attributes = array('url'=>$link);
$this->setElement('source', "wallabag",$attributes);
}
/**
* Set the 'enclosure' element of feed item
* For RSS 2.0 only
*
*
* @access public
* @param string The url attribute of enclosure tag
* @param string The length attribute of enclosure tag
@ -192,6 +199,6 @@
$attributes = array('url'=>$url, 'length'=>$length, 'type'=>$type);
$this->setElement('enclosure','',$attributes);
}
} // end of class FeedItem
?>
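The revised `setDate()` in the diff above picks the element name and date format from the feed version. A minimal sketch of that selection follows; the `SKETCH_*` constants and `sketch_date_element()` are hypothetical names used only here, since the library's own `RSS1`/`RSS2`/`ATOM` constants are defined elsewhere.

```php
<?php
// Hypothetical stand-ins for the library's RSS1/RSS2/ATOM constants.
const SKETCH_RSS1 = 1;
const SKETCH_RSS2 = 2;
const SKETCH_ATOM = 3;

// Returns array(tag name, formatted date) for the given feed version.
function sketch_date_element($version, $date)
{
    // Accept either a Unix timestamp or a date string.
    if (!is_numeric($date)) {
        $date = strtotime($date);
    }
    if ($version == SKETCH_ATOM) {
        return array('updated', date(DATE_ATOM, $date));
    } elseif ($version == SKETCH_RSS2) {
        return array('pubDate', date(DATE_RSS, $date));
    }
    // Anything else falls back to the Dublin Core date used for RSS 1.0.
    return array('dc:date', date('Y-m-d', $date));
}

// Fix the timezone so the formatted output is predictable.
date_default_timezone_set('UTC');
$atom = sketch_date_element(SKETCH_ATOM, 0); // array('updated', '1970-01-01T00:00:00+00:00')
$rss2 = sketch_date_element(SKETCH_RSS2, 0); // array('pubDate', 'Thu, 01 Jan 1970 00:00:00 +0000')
$rss1 = sketch_date_element(SKETCH_RSS1, 0); // array('dc:date', '1970-01-01')
```

Before this change the non-RSS2 branch covered ATOM as well, so ATOM items would have received a `dc:date` element rather than `updated`.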


@ -2,6 +2,7 @@
define('RSS2', 1, true);
define('JSON', 2, true);
define('JSONP', 3, true);
define('ATOM', 4, true);
/**
* Univarsel Feed Writer class
@ -87,20 +88,26 @@ define('JSONP', 3, true);
* @access public
* @return void
*/
public function genarateFeed()
public function genarateFeed($withHeaders = true)
{
if ($this->version == RSS2) {
// header('Content-type: text/xml; charset=UTF-8');
// this line prevents Chrome 20 from prompting download
// used by Google: https://news.google.com/news/feeds?ned=us&topic=b&output=rss
// header('X-content-type-options: nosniff');
} elseif ($this->version == JSON) {
// header('Content-type: application/json; charset=UTF-8');
$this->json = new stdClass();
} elseif ($this->version == JSONP) {
// header('Content-type: application/javascript; charset=UTF-8');
$this->json = new stdClass();
if ($withHeaders) {
if ($this->version == RSS2) {
header('Content-type: text/xml; charset=UTF-8');
// this line prevents Chrome 20 from prompting download
// used by Google: https://news.google.com/news/feeds?ned=us&topic=b&output=rss
header('X-content-type-options: nosniff');
} elseif ($this->version == JSON) {
header('Content-type: application/json; charset=UTF-8');
} elseif ($this->version == JSONP) {
header('Content-type: application/javascript; charset=UTF-8');
}
}
if ($this->version == JSON || $this->version == JSONP) {
$this->json = new stdClass();
}
$this->printHead();
$this->printChannels();
$this->printItems();
@ -110,6 +117,11 @@ define('JSONP', 3, true);
}
}
public function &getItems()
{
return $this->items;
}
/**
* Create a new FeedItem.
*
@ -193,7 +205,8 @@ define('JSONP', 3, true);
*/
public function setDescription($description)
{
$this->setChannelElement('description', $description);
$tag = ($this->version == ATOM)? 'subtitle' : 'description';
$this->setChannelElement($tag, $description);
}
/**
@ -238,7 +251,7 @@ define('JSONP', 3, true);
{
$out = '<?xml version="1.0" encoding="utf-8"?>'."\n";
if ($this->xsl) $out .= '<?xml-stylesheet type="text/xsl" href="'.htmlspecialchars($this->xsl).'"?>' . PHP_EOL;
$out .= '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">' . PHP_EOL;
$out .= '<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">' . PHP_EOL;
echo $out;
}
elseif ($this->version == JSON || $this->version == JSONP)


@ -134,6 +134,7 @@ class HTML5_TreeBuilder {
// Namespaces for foreign content
const NS_HTML = null; // to prevent DOM from requiring NS on everything
const NS_XHTML = 'http://www.w3.org/1999/xhtml';
const NS_MATHML = 'http://www.w3.org/1998/Math/MathML';
const NS_SVG = 'http://www.w3.org/2000/svg';
const NS_XLINK = 'http://www.w3.org/1999/xlink';
@ -3157,11 +3158,19 @@ class HTML5_TreeBuilder {
}
private function insertElement($token, $append = true) {
$el = $this->dom->createElementNS(self::NS_HTML, $token['name']);
//$el = $this->dom->createElementNS(self::NS_HTML, $token['name']);
$namespaceURI = strpos($token['name'], ':') ? self::NS_XHTML : self::NS_HTML;
$el = $this->dom->createElementNS($namespaceURI, $token['name']);
if (!empty($token['attr'])) {
foreach($token['attr'] as $attr) {
if(!$el->hasAttribute($attr['name'])) {
// mike@macgirvin.com 2011-11-17, check attribute name for
// validity (ignoring extenders and combiners) as illegal chars in names
// causes everything to abort
$valid = preg_match('/^[a-zA-Z\_\:]([\-a-zA-Z0-9\_\:\.]+$)/',$attr['name']);
if($attr['name'] && (!$el->hasAttribute($attr['name'])) && ($valid)) {
$el->setAttribute($attr['name'], $attr['value']);
}
}
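To see what the attribute-name check added to `insertElement()` accepts, the pattern can be tried in isolation. This is only a sketch: `sketch_is_valid_attr_name()` is a hypothetical wrapper, and the pattern is copied verbatim from the hunk above.

```php
<?php
// Hypothetical wrapper around the pattern used in insertElement().
function sketch_is_valid_attr_name($name)
{
    // First character: letter, underscore or colon.
    // Then one or more of: letter, digit, hyphen, underscore, colon, dot.
    return (bool) preg_match('/^[a-zA-Z\_\:]([\-a-zA-Z0-9\_\:\.]+$)/', $name);
}

$checks = array(
    'href'      => sketch_is_valid_attr_name('href'),      // true
    'data-id'   => sketch_is_valid_attr_name('data-id'),   // true
    'xml:lang'  => sketch_is_valid_attr_name('xml:lang'),  // true
    '"onclick'  => sketch_is_valid_attr_name('"onclick'),  // false: illegal first character
    '1abc'      => sketch_is_valid_attr_name('1abc'),      // false: starts with a digit
    'a'         => sketch_is_valid_attr_name('a'),         // false: pattern needs at least two characters
);
```

Note the last case: because the second character class uses `+`, a single-character attribute name is rejected and therefore silently dropped, which may or may not be intended.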


@ -1,404 +1,403 @@
<?php
/**
* Cookie Jar
*
* PHP class for handling cookies, as defined by the Netscape spec:
* <http://curl.haxx.se/rfc/cookie_spec.html>
*
* This class should be used to handle cookies (storing cookies from HTTP response messages, and
* sending out cookies in HTTP request messages). This has been adapted for FiveFilters.org
* from the original version used in HTTP Navigator. See http://www.keyvan.net/code/http-navigator/
*
* This class is mainly based on Cookies.pm <http://search.cpan.org/author/GAAS/libwww-perl-5.65/
* lib/HTTP/Cookies.pm> from the libwww-perl collection <http://www.linpro.no/lwp/>.
* Unlike Cookies.pm, this class only supports the Netscape cookie spec, not RFC 2965.
*
* @version 0.5
* @date 2011-03-15
* @see http://php.net/HttpRequestPool
* @author Keyvan Minoukadeh
* @copyright 2011 Keyvan Minoukadeh
* @license http://www.gnu.org/licenses/agpl-3.0.html AGPL v3
*/
class CookieJar
{
/**
* Cookies - array containing all cookies.
*
* <pre>
* Cookies are stored like this:
* [domain][path][name] = array
* where array is:
* 0 => value, 1 => secure, 2 => expires
* </pre>
* @var array
* @access private
*/
public $cookies = array();
public $debug = false;
/**
* Constructor
*/
function __construct() {
}
protected function debug($msg, $file=null, $line=null) {
if ($this->debug) {
$mem = round(memory_get_usage()/1024, 2);
$memPeak = round(memory_get_peak_usage()/1024, 2);
echo '* ',$msg;
if (isset($file, $line)) echo " ($file line $line)";
echo ' - mem used: ',$mem," (peak: $memPeak)\n";
ob_flush();
flush();
}
}
/**
* Get matching cookies
*
* Only use this method if you cannot use add_cookie_header(), for example, if you want to use
* this cookie jar class without using the request class.
*
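* @param string $url URL to match cookies against (must include scheme, host and path)
* @return string|false matching cookies joined as "name=value; name2=value2", or false if the URL cannot be parsed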
* @see add_cookie_header()
*/
public function getMatchingCookies($url)
{
if (($parts = @parse_url($url)) && isset($parts['scheme'], $parts['host'], $parts['path'])) {
$param['domain'] = $parts['host'];
$param['path'] = $parts['path'];
$param['secure'] = (strtolower($parts['scheme']) == 'https');
unset($parts);
} else {
return false;
}
// RFC 2965 notes:
// If multiple cookies satisfy the criteria above, they are ordered in
// the Cookie header such that those with more specific Path attributes
// precede those with less specific. Ordering with respect to other
// attributes (e.g., Domain) is unspecified.
$domain = $param['domain'];
if (strpos($domain, '.') === false) $domain .= '.local';
$request_path = $param['path'];
if ($request_path == '') $request_path = '/';
$request_secure = $param['secure'];
$now = time();
$matched_cookies = array();
// domain - find matching domains
$this->debug('Finding matching domains for '.$domain, __FILE__, __LINE__);
while (strpos($domain, '.') !== false) {
if (isset($this->cookies[$domain])) {
$this->debug(' domain match found: '.$domain);
$cookies =& $this->cookies[$domain];
} else {
$domain = $this->_reduce_domain($domain);
continue;
}
// paths - find matching paths starting from most specific
$this->debug(' - Finding matching paths for '.$request_path);
$paths = array_keys($cookies);
usort($paths, array($this, '_cmp_length'));
foreach ($paths as $path) {
// continue to next cookie if request path does not path-match cookie path
if (!$this->_path_match($request_path, $path)) continue;
// loop through cookie names
$this->debug(' path match found: '.$path);
foreach ($cookies[$path] as $name => $values) {
// if this cookie is secure but request isn't, continue to next cookie
if ($values[1] && !$request_secure) continue;
// if cookie is not a session cookie and has expired, continue to next cookie
if (is_int($values[2]) && ($values[2] < $now)) continue;
// cookie matches request
$this->debug(' cookie match: '.$name.'='.$values[0]);
$matched_cookies[] = $name.'='.$values[0];
}
}
$domain = $this->_reduce_domain($domain);
}
// return cookies
return implode('; ', $matched_cookies);
}
/**
* Parse Set-Cookie values.
*
* Only use this method if you cannot use extract_cookies(), for example, if you want to use
* this cookie jar class without using the response class.
*
* @param string $url URL of the response that carried the Set-Cookie headers
* @param array $set_cookies array holding 1 or more "Set-Cookie" header values
* @return void
* @see extract_cookies()
*/
public function storeCookies($url, $set_cookies)
{
if (count($set_cookies) == 0) return;
$param = @parse_url($url);
if (!is_array($param) || !isset($param['host'])) return;
$request_host = $param['host'];
if (strpos($request_host, '.') === false) $request_host .= '.local';
$request_path = @$param['path'];
if ($request_path == '') $request_path = '/';
//
// loop through set-cookie headers
//
foreach ($set_cookies as $set_cookie) {
$this->debug('Parsing: '.$set_cookie);
// temporary cookie store (before adding to jar)
$tmp_cookie = array();
$param = explode(';', $set_cookie);
// loop through params
for ($x=0; $x<count($param); $x++) {
$key_val = explode('=', $param[$x], 2);
if (count($key_val) != 2) {
// if the first param isn't a name=value pair, continue to the next set-cookie
// header
if ($x == 0) continue 2;
// check for secure flag
if (strtolower(trim($key_val[0])) == 'secure') $tmp_cookie['secure'] = true;
// continue to next param
continue;
}
list($key, $val) = array_map('trim', $key_val);
// first name=value pair is the cookie name and value
// the name and value are stored under 'name' and 'value' to avoid conflicts
// with later parameters.
if ($x == 0) {
$tmp_cookie = array('name'=>$key, 'value'=>$val);
continue;
}
$key = strtolower($key);
if (in_array($key, array('expires', 'path', 'domain', 'secure'))) {
$tmp_cookie[$key] = $val;
}
}
//
// set cookie
//
// check domain
if (isset($tmp_cookie['domain']) && ($tmp_cookie['domain'] != $request_host) &&
($tmp_cookie['domain'] != ".$request_host")) {
$domain = $tmp_cookie['domain'];
if ((strpos($domain, '.') === false) && ($domain != 'local')) {
$this->debug(' - domain "'.$domain.'" has no dot and is not a local domain');
continue;
}
if (preg_match('/\.[0-9]+$/', $domain)) {
$this->debug(' - domain "'.$domain.'" appears to be an ip address');
continue;
}
if (substr($domain, 0, 1) != '.') $domain = ".$domain";
if (!$this->_domain_match($request_host, $domain)) {
$this->debug(' - request host "'.$request_host.'" does not domain-match "'.$domain.'"');
continue;
}
} else {
// if domain is not specified in the set-cookie header, domain will default to
// the request host
$domain = $request_host;
}
// check path
if (isset($tmp_cookie['path']) && ($tmp_cookie['path'] != '')) {
$path = urldecode($tmp_cookie['path']);
if (!$this->_path_match($request_path, $path)) {
$this->debug(' - request path "'.$request_path.'" does not path-match "'.$path.'"');
continue;
}
} else {
$path = $request_path;
$path = substr($path, 0, strrpos($path, '/'));
if ($path == '') $path = '/';
}
// check if secure
$secure = (isset($tmp_cookie['secure'])) ? true : false;
// check expiry
if (isset($tmp_cookie['expires'])) {
if (($expires = strtotime($tmp_cookie['expires'])) < 0) {
$expires = null;
}
} else {
$expires = null;
}
// set cookie
$this->set_cookie($domain, $path, $tmp_cookie['name'], $tmp_cookie['value'], $secure, $expires);
}
}
// return array of set-cookie values extracted from HTTP response headers (string $h)
public function extractCookies($h) {
$x = 0;
$lines = 0;
$headers = array();
$last_match = false;
$h = explode("\n", $h);
foreach ($h as $line) {
$line = rtrim($line);
$lines++;
$trimmed_line = trim($line);
if (isset($line_last)) {
// check if we have \r\n\r\n (indicating the end of headers)
// some servers will not use CRLF (\r\n), so we make CR (\r) optional.
// if (preg_match('/\015?\012\015?\012/', $line_last.$line)) {
// break;
// }
// As an alternative, we can check if the current trimmed line is empty
if ($trimmed_line == '') {
break;
}
// check for continuation line...
// RFC 2616 Section 2.2 "Basic Rules":
// HTTP/1.1 header field values can be folded onto multiple lines if the
// continuation line begins with a space or horizontal tab. All linear
// white space, including folding, has the same semantics as SP. A
// recipient MAY replace any linear white space with a single SP before
// interpreting the field value or forwarding the message downstream.
if ($last_match && preg_match('/^\s+(.*)/', $line, $match)) {
// append to previous header value
$headers[$x-1] .= ' '.rtrim($match[1]);
continue;
}
}
$line_last = $line;
// split header name and value
if (preg_match('/^Set-Cookie\s*:\s*(.*)/i', $line, $match)) {
$headers[$x++] = rtrim($match[1]);
$last_match = true;
} else {
$last_match = false;
}
}
return $headers;
}
/**
* Set Cookie
* @param string $domain
* @param string $path
* @param string $name cookie name
* @param string $value cookie value
* @param bool $secure
* @param int $expires expiry time (null if session cookie, <= 0 will delete cookie)
* @return void
*/
function set_cookie($domain, $path, $name, $value, $secure=false, $expires=null)
{
if ($domain == '') return;
if ($path == '') return;
if ($name == '') return;
// check if cookie needs to go
if (isset($expires) && ($expires <= 0)) {
if (isset($this->cookies[$domain][$path][$name])) unset($this->cookies[$domain][$path][$name]);
return;
}
if ($value == '') return;
$this->cookies[$domain][$path][$name] = array($value, $secure, $expires);
return;
}
/**
* Clear cookies - [domain [,path [,name]]] - call method with no arguments to clear all cookies.
* @param string $domain
* @param string $path
* @param string $name
* @return void
*/
function clear($domain=null, $path=null, $name=null)
{
if (!isset($domain)) {
$this->cookies = array();
} elseif (!isset($path)) {
if (isset($this->cookies[$domain])) unset($this->cookies[$domain]);
} elseif (!isset($name)) {
if (isset($this->cookies[$domain][$path])) unset($this->cookies[$domain][$path]);
} elseif (isset($name)) {
if (isset($this->cookies[$domain][$path][$name])) unset($this->cookies[$domain][$path][$name]);
}
}
/**
* Compare string length - used for sorting
* @access private
* @return int
*/
function _cmp_length($a, $b)
{
$la = strlen($a); $lb = strlen($b);
if ($la == $lb) return 0;
return ($la > $lb) ? -1 : 1;
}
/**
* Reduce domain
* @param string $domain
* @return string
* @access private
*/
function _reduce_domain($domain)
{
if ($domain == '') return '';
if (substr($domain, 0, 1) == '.') return substr($domain, 1);
return substr($domain, strpos($domain, '.'));
}
/**
* Path match - check if path1 path-matches path2
*
* From RFC 2965:
* <i>For two strings that represent paths, P1 and P2, P1 path-matches P2
* if P2 is a prefix of P1 (including the case where P1 and P2 string-
* compare equal). Thus, the string /tec/waldo path-matches /tec.</i>
* @param string $path1
* @param string $path2
* @return bool
* @access private
*/
function _path_match($path1, $path2)
{
return (substr($path1, 0, strlen($path2)) == $path2);
}
/**
* Domain match - check if domain1 domain-matches domain2
*
* A few extracts from RFC 2965:
* - A Set-Cookie2 from request-host y.x.foo.com for Domain=.foo.com
* would be rejected, because H is y.x and contains a dot.
*
* - A Set-Cookie2 from request-host x.foo.com for Domain=.foo.com
* would be accepted.
*
* - A Set-Cookie2 with Domain=.com or Domain=.com., will always be
* rejected, because there is no embedded dot.
*
* - A Set-Cookie2 from request-host example for Domain=.local will
* be accepted, because the effective host name for the request-
* host is example.local, and example.local domain-matches .local.
*
* I'm ignoring the first point for now (must check to see how other browsers handle
* this rule for Set-Cookie headers)
*
* @param string $domain1
* @param string $domain2
* @return bool
* @access private
*/
function _domain_match($domain1, $domain2)
{
$domain1 = strtolower($domain1);
$domain2 = strtolower($domain2);
while (strpos($domain1, '.') !== false) {
if ($domain1 == $domain2) return true;
$domain1 = $this->_reduce_domain($domain1);
continue;
}
return false;
}
}
?>
<?php
/**
* Cookie Jar
*
* PHP class for handling cookies, as defined by the Netscape spec:
* <http://curl.haxx.se/rfc/cookie_spec.html>
*
* This class should be used to handle cookies (storing cookies from HTTP response messages, and
* sending out cookies in HTTP request messages). This has been adapted for FiveFilters.org
* from the original version used in HTTP Navigator. See http://www.keyvan.net/code/http-navigator/
*
* This class is mainly based on Cookies.pm <http://search.cpan.org/author/GAAS/libwww-perl-5.65/
* lib/HTTP/Cookies.pm> from the libwww-perl collection <http://www.linpro.no/lwp/>.
* Unlike Cookies.pm, this class only supports the Netscape cookie spec, not RFC 2965.
*
* @version 0.5
* @date 2011-03-15
* @see http://php.net/HttpRequestPool
* @author Keyvan Minoukadeh
* @copyright 2011 Keyvan Minoukadeh
* @license http://www.gnu.org/licenses/agpl-3.0.html AGPL v3
*/
class CookieJar
{
/**
* Cookies - array containing all cookies.
*
* <pre>
* Cookies are stored like this:
* [domain][path][name] = array
* where array is:
* 0 => value, 1 => secure, 2 => expires
* </pre>
* @var array
* @access private
*/
public $cookies = array();
public $debug = false;
/**
* Constructor
*/
function __construct() {
}
protected function debug($msg, $file=null, $line=null) {
if ($this->debug) {
$mem = round(memory_get_usage()/1024, 2);
$memPeak = round(memory_get_peak_usage()/1024, 2);
echo '* ',$msg;
if (isset($file, $line)) echo " ($file line $line)";
echo ' - mem used: ',$mem," (peak: $memPeak)\n";
ob_flush();
flush();
}
}
/**
* Get matching cookies
*
* Only use this method if you cannot use add_cookie_header(), for example, if you want to use
* this cookie jar class without using the request class.
*
* @param string $url absolute request URL; must contain a scheme, host and path
* @return string|false matching cookies as "name=value; name2=value2", or false if the URL cannot be parsed
* @see add_cookie_header()
*/
public function getMatchingCookies($url)
{
if (($parts = @parse_url($url)) && isset($parts['scheme'], $parts['host'], $parts['path'])) {
$param['domain'] = $parts['host'];
$param['path'] = $parts['path'];
$param['secure'] = (strtolower($parts['scheme']) == 'https');
unset($parts);
} else {
return false;
}
// RFC 2965 notes:
// If multiple cookies satisfy the criteria above, they are ordered in
// the Cookie header such that those with more specific Path attributes
// precede those with less specific. Ordering with respect to other
// attributes (e.g., Domain) is unspecified.
$domain = $param['domain'];
if (strpos($domain, '.') === false) $domain .= '.local';
$request_path = $param['path'];
if ($request_path == '') $request_path = '/';
$request_secure = $param['secure'];
$now = time();
$matched_cookies = array();
// domain - find matching domains
$this->debug('Finding matching domains for '.$domain, __FILE__, __LINE__);
while (strpos($domain, '.') !== false) {
if (isset($this->cookies[$domain])) {
$this->debug(' domain match found: '.$domain);
$cookies =& $this->cookies[$domain];
} else {
$domain = $this->_reduce_domain($domain);
continue;
}
// paths - find matching paths starting from most specific
$this->debug(' - Finding matching paths for '.$request_path);
$paths = array_keys($cookies);
usort($paths, array($this, '_cmp_length'));
foreach ($paths as $path) {
// continue to next cookie if request path does not path-match cookie path
if (!$this->_path_match($request_path, $path)) continue;
// loop through cookie names
$this->debug(' path match found: '.$path);
foreach ($cookies[$path] as $name => $values) {
// if this cookie is secure but request isn't, continue to next cookie
if ($values[1] && !$request_secure) continue;
// if cookie is not a session cookie and has expired, continue to next cookie
if (is_int($values[2]) && ($values[2] < $now)) continue;
// cookie matches request
$this->debug(' cookie match: '.$name.'='.$values[0]);
$matched_cookies[] = $name.'='.$values[0];
}
}
$domain = $this->_reduce_domain($domain);
}
// return cookies
return implode('; ', $matched_cookies);
}
/**
* Parse Set-Cookie values.
*
* Only use this method if you cannot use extract_cookies(), for example, if you want to use
* this cookie jar class without using the response class.
*
* @param string $url URL of the request that produced the response (supplies the host and path)
* @param array $set_cookies array holding 1 or more "Set-Cookie" header values
* @return void
* @see extract_cookies()
*/
public function storeCookies($url, $set_cookies)
{
if (count($set_cookies) == 0) return;
$param = @parse_url($url);
if (!is_array($param) || !isset($param['host'])) return;
$request_host = $param['host'];
if (strpos($request_host, '.') === false) $request_host .= '.local';
$request_path = @$param['path'];
if ($request_path == '') $request_path = '/';
//
// loop through set-cookie headers
//
foreach ($set_cookies as $set_cookie) {
$this->debug('Parsing: '.$set_cookie);
// temporary cookie store (before adding to jar)
$tmp_cookie = array();
$param = explode(';', $set_cookie);
// loop through params
for ($x=0; $x<count($param); $x++) {
$key_val = explode('=', $param[$x], 2);
if (count($key_val) != 2) {
// if the first param isn't a name=value pair, continue to the next set-cookie
// header
if ($x == 0) continue 2;
// check for secure flag
if (strtolower(trim($key_val[0])) == 'secure') $tmp_cookie['secure'] = true;
// continue to next param
continue;
}
list($key, $val) = array_map('trim', $key_val);
// first name=value pair is the cookie name and value
// the name and value are stored under 'name' and 'value' to avoid conflicts
// with later parameters.
if ($x == 0) {
$tmp_cookie = array('name'=>$key, 'value'=>$val);
continue;
}
$key = strtolower($key);
if (in_array($key, array('expires', 'path', 'domain', 'secure'))) {
$tmp_cookie[$key] = $val;
}
}
//
// set cookie
//
// check domain
if (isset($tmp_cookie['domain']) && ($tmp_cookie['domain'] != $request_host) &&
($tmp_cookie['domain'] != ".$request_host")) {
$domain = $tmp_cookie['domain'];
if ((strpos($domain, '.') === false) && ($domain != 'local')) {
$this->debug(' - domain "'.$domain.'" has no dot and is not a local domain');
continue;
}
if (preg_match('/\.[0-9]+$/', $domain)) {
$this->debug(' - domain "'.$domain.'" appears to be an ip address');
continue;
}
if (substr($domain, 0, 1) != '.') $domain = ".$domain";
if (!$this->_domain_match($request_host, $domain)) {
$this->debug(' - request host "'.$request_host.'" does not domain-match "'.$domain.'"');
continue;
}
} else {
// if domain is not specified in the set-cookie header, domain will default to
// the request host
$domain = $request_host;
}
// check path
if (isset($tmp_cookie['path']) && ($tmp_cookie['path'] != '')) {
$path = urldecode($tmp_cookie['path']);
if (!$this->_path_match($request_path, $path)) {
$this->debug(' - request path "'.$request_path.'" does not path-match "'.$path.'"');
continue;
}
} else {
$path = $request_path;
$path = substr($path, 0, strrpos($path, '/'));
if ($path == '') $path = '/';
}
// check if secure
$secure = (isset($tmp_cookie['secure'])) ? true : false;
// check expiry
if (isset($tmp_cookie['expires'])) {
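// strtotime() returns false for an unparseable date (-1 before PHP 5.1);
// treat both as "no expiry" so the cookie is kept as a session cookie
$expires = strtotime($tmp_cookie['expires']);
if (($expires === false) || ($expires < 0)) {
$expires = null;
}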
} else {
$expires = null;
}
// set cookie
$this->set_cookie($domain, $path, $tmp_cookie['name'], $tmp_cookie['value'], $secure, $expires);
}
}
// return array of set-cookie values extracted from HTTP response headers (string $h)
public function extractCookies($h) {
$x = 0;
$lines = 0;
$headers = array();
$last_match = false;
$h = explode("\n", $h);
foreach ($h as $line) {
$line = rtrim($line);
$lines++;
$trimmed_line = trim($line);
if (isset($line_last)) {
// check if we have \r\n\r\n (indicating the end of headers)
// some servers will not use CRLF (\r\n), so we make CR (\r) optional.
// if (preg_match('/\015?\012\015?\012/', $line_last.$line)) {
// break;
// }
// As an alternative, we can check if the current trimmed line is empty
if ($trimmed_line == '') {
break;
}
// check for continuation line...
// RFC 2616 Section 2.2 "Basic Rules":
// HTTP/1.1 header field values can be folded onto multiple lines if the
// continuation line begins with a space or horizontal tab. All linear
// white space, including folding, has the same semantics as SP. A
// recipient MAY replace any linear white space with a single SP before
// interpreting the field value or forwarding the message downstream.
if ($last_match && preg_match('/^\s+(.*)/', $line, $match)) {
// append to previous header value
$headers[$x-1] .= ' '.rtrim($match[1]);
continue;
}
}
$line_last = $line;
// split header name and value
if (preg_match('/^Set-Cookie\s*:\s*(.*)/i', $line, $match)) {
$headers[$x++] = rtrim($match[1]);
$last_match = true;
} else {
$last_match = false;
}
}
return $headers;
}
/**
* Set Cookie
* @param string $domain
* @param string $path
* @param string $name cookie name
* @param string $value cookie value
* @param bool $secure
* @param int $expires expiry time (null if session cookie, <= 0 will delete cookie)
* @return void
*/
function set_cookie($domain, $path, $name, $value, $secure=false, $expires=null)
{
if ($domain == '') return;
if ($path == '') return;
if ($name == '') return;
// check if cookie needs to go
if (isset($expires) && ($expires <= 0)) {
if (isset($this->cookies[$domain][$path][$name])) unset($this->cookies[$domain][$path][$name]);
return;
}
if ($value == '') return;
$this->cookies[$domain][$path][$name] = array($value, $secure, $expires);
return;
}
/**
* Clear cookies - [domain [,path [,name]]] - call method with no arguments to clear all cookies.
* @param string $domain
* @param string $path
* @param string $name
* @return void
*/
function clear($domain=null, $path=null, $name=null)
{
if (!isset($domain)) {
$this->cookies = array();
} elseif (!isset($path)) {
if (isset($this->cookies[$domain])) unset($this->cookies[$domain]);
} elseif (!isset($name)) {
if (isset($this->cookies[$domain][$path])) unset($this->cookies[$domain][$path]);
} elseif (isset($name)) {
if (isset($this->cookies[$domain][$path][$name])) unset($this->cookies[$domain][$path][$name]);
}
}
/**
* Compare string length - used for sorting
* @access private
* @return int
*/
function _cmp_length($a, $b)
{
$la = strlen($a); $lb = strlen($b);
if ($la == $lb) return 0;
return ($la > $lb) ? -1 : 1;
}
/**
* Reduce domain
* @param string $domain
* @return string
* @access private
*/
function _reduce_domain($domain)
{
if ($domain == '') return '';
if (substr($domain, 0, 1) == '.') return substr($domain, 1);
return substr($domain, strpos($domain, '.'));
}
/**
* Path match - check if path1 path-matches path2
*
* From RFC 2965:
* <i>For two strings that represent paths, P1 and P2, P1 path-matches P2
* if P2 is a prefix of P1 (including the case where P1 and P2 string-
* compare equal). Thus, the string /tec/waldo path-matches /tec.</i>
* @param string $path1
* @param string $path2
* @return bool
* @access private
*/
function _path_match($path1, $path2)
{
return (substr($path1, 0, strlen($path2)) == $path2);
}
/**
* Domain match - check if domain1 domain-matches domain2
*
* A few extracts from RFC 2965:
* - A Set-Cookie2 from request-host y.x.foo.com for Domain=.foo.com
* would be rejected, because H is y.x and contains a dot.
*
* - A Set-Cookie2 from request-host x.foo.com for Domain=.foo.com
* would be accepted.
*
* - A Set-Cookie2 with Domain=.com or Domain=.com., will always be
* rejected, because there is no embedded dot.
*
* - A Set-Cookie2 from request-host example for Domain=.local will
* be accepted, because the effective host name for the request-
* host is example.local, and example.local domain-matches .local.
*
* I'm ignoring the first point for now (must check to see how other browsers handle
* this rule for Set-Cookie headers)
*
* @param string $domain1
* @param string $domain2
* @return bool
* @access private
*/
function _domain_match($domain1, $domain2)
{
$domain1 = strtolower($domain1);
$domain2 = strtolower($domain2);
while (strpos($domain1, '.') !== false) {
if ($domain1 == $domain2) return true;
$domain1 = $this->_reduce_domain($domain1);
continue;
}
return false;
}
}
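
The domain handling in the class above walks a host name up one level at a time and compares each step against the cookie domain. A minimal standalone sketch of that loop follows. The free functions `reduce_domain()` and `domain_match()` are illustrative stand-ins for the private `_reduce_domain()` and `_domain_match()` methods, and the explicit "no dot left" guard is an assumption added for the sketch, not part of the original.

```php
<?php
// Standalone sketch of the jar's domain-reduction logic. The function names
// are illustrative; in the class these are private methods.

// Strip one level: "www.example.com" -> ".example.com" -> "example.com".
function reduce_domain(string $domain): string
{
    if ($domain === '') {
        return '';
    }
    if ($domain[0] === '.') {
        return substr($domain, 1);
    }
    $dot = strpos($domain, '.');
    // Assumption for this sketch: with no dot left there is nothing to reduce.
    return ($dot === false) ? '' : substr($domain, $dot);
}

// True if $host equals $cookieDomain at some step of the reduction.
function domain_match(string $host, string $cookieDomain): bool
{
    $host = strtolower($host);
    $cookieDomain = strtolower($cookieDomain);
    while (strpos($host, '.') !== false) {
        if ($host === $cookieDomain) {
            return true;
        }
        $host = reduce_domain($host);
    }
    return false;
}

var_dump(domain_match('www.example.com', '.example.com')); // bool(true)
var_dump(domain_match('www.example.com', '.example.org')); // bool(false)
```

Because the loop stops once no dot remains, a bare top-level label such as `com` is never compared, which is what keeps a cookie set for `.com` from matching every host.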

File diff suppressed because it is too large

@ -1,79 +1,78 @@
<?php
/**
* Humble HTTP Agent extension for SimplePie_File
*
* This class is designed to extend and override SimplePie_File
* in order to prevent duplicate HTTP requests being sent out.
* The idea is to initialise an instance of Humble HTTP Agent
* and attach it to a static class variable of this class.
* SimplePie will then automatically initialise this class.
*
* @date 2011-02-28
*/
class SimplePie_HumbleHttpAgent extends SimplePie_File
{
protected static $agent;
var $url;
var $useragent;
var $success = true;
var $headers = array();
var $body;
var $status_code;
var $redirects = 0;
var $error;
var $method = SIMPLEPIE_FILE_SOURCE_NONE;
public static function set_agent(HumbleHttpAgent $agent) {
self::$agent = $agent;
}
public function __construct($url, $timeout = 10, $redirects = 5, $headers = null, $useragent = null, $force_fsockopen = false) {
if (class_exists('idna_convert'))
{
$idn = new idna_convert();
$parsed = SimplePie_Misc::parse_url($url);
$url = SimplePie_Misc::compress_parse_url($parsed['scheme'], $idn->encode($parsed['authority']), $parsed['path'], $parsed['query'], $parsed['fragment']);
}
$this->url = $url;
$this->useragent = $useragent;
if (preg_match('/^http(s)?:\/\//i', $url))
{
if (!is_array($headers))
{
$headers = array();
}
$this->method = SIMPLEPIE_FILE_SOURCE_REMOTE | SIMPLEPIE_FILE_SOURCE_CURL;
$headers2 = array();
foreach ($headers as $key => $value) {
$headers2[] = "$key: $value";
}
//TODO: allow for HTTP headers
// curl_setopt($fp, CURLOPT_HTTPHEADER, $headers2);
$response = self::$agent->get($url);
if ($response === false || !isset($response['status_code'])) {
$this->error = 'failed to fetch URL';
$this->success = false;
} else {
// The extra lines at the end are there to satisfy SimplePie's HTTP parser.
// The class expects a full HTTP message, whereas we're giving it only
// headers - the new lines indicate the start of the body.
$parser = new SimplePie_HTTP_Parser($response['headers']."\r\n\r\n");
if ($parser->parse()) {
$this->headers = $parser->headers;
//$this->body = $parser->body;
$this->body = $response['body'];
$this->status_code = $parser->status_code;
}
}
}
else
{
$this->error = 'invalid URL';
$this->success = false;
}
}
}
?>
<?php
/**
* Humble HTTP Agent extension for SimplePie_File
*
* This class is designed to extend and override SimplePie_File
* in order to prevent duplicate HTTP requests being sent out.
* The idea is to initialise an instance of Humble HTTP Agent
* and attach it to a static class variable of this class.
* SimplePie will then automatically initialise this class.
*
* @date 2011-02-28
*/
class SimplePie_HumbleHttpAgent extends SimplePie_File
{
protected static $agent;
var $url;
var $useragent;
var $success = true;
var $headers = array();
var $body;
var $status_code;
var $redirects = 0;
var $error;
var $method = SIMPLEPIE_FILE_SOURCE_NONE;
public static function set_agent(HumbleHttpAgent $agent) {
self::$agent = $agent;
}
public function __construct($url, $timeout = 10, $redirects = 5, $headers = null, $useragent = null, $force_fsockopen = false) {
if (class_exists('idna_convert'))
{
$idn = new idna_convert();
$parsed = SimplePie_Misc::parse_url($url);
$url = SimplePie_Misc::compress_parse_url($parsed['scheme'], $idn->encode($parsed['authority']), $parsed['path'], $parsed['query'], $parsed['fragment']);
}
$this->url = $url;
$this->useragent = $useragent;
if (preg_match('/^http(s)?:\/\//i', $url))
{
if (!is_array($headers))
{
$headers = array();
}
$this->method = SIMPLEPIE_FILE_SOURCE_REMOTE | SIMPLEPIE_FILE_SOURCE_CURL;
$headers2 = array();
foreach ($headers as $key => $value) {
$headers2[] = "$key: $value";
}
//TODO: allow for HTTP headers
// curl_setopt($fp, CURLOPT_HTTPHEADER, $headers2);
$response = self::$agent->get($url);
if ($response === false || !isset($response['status_code'])) {
$this->error = 'failed to fetch URL';
$this->success = false;
} else {
// The extra lines at the end are there to satisfy SimplePie's HTTP parser.
// The class expects a full HTTP message, whereas we're giving it only
// headers - the new lines indicate the start of the body.
$parser = new SimplePie_HTTP_Parser($response['headers']."\r\n\r\n");
if ($parser->parse()) {
$this->headers = $parser->headers;
//$this->body = $parser->body;
$this->body = $response['body'];
$this->status_code = $parser->status_code;
}
}
}
else
{
$this->error = 'invalid URL';
$this->success = false;
}
}
}

File diff suppressed because it is too large

@ -0,0 +1,57 @@
<?php
class Text_LanguageDetect_Exception extends Exception
{
/**
* Database file could not be found
*/
const DB_NOT_FOUND = 10;
/**
* Database file found, but not readable
*/
const DB_NOT_READABLE = 11;
/**
* Database file is empty
*/
const DB_EMPTY = 12;
/**
* Database contents are not a PHP array
*/
const DB_NOT_ARRAY = 13;
/**
* Magic quotes are activated
*/
const MAGIC_QUOTES = 14;
/**
* Parameter of invalid type passed to method
*/
const PARAM_TYPE = 20;
/**
* Character in parameter is invalid
*/
const INVALID_CHAR = 21;
/**
* Language is not in the database
*/
const UNKNOWN_LANGUAGE = 30;
/**
* Error during block detection
*/
const BLOCK_DETECTION = 40;
/**
* Error while clustering languages
*/
const NO_HIGHEST_KEY = 50;
}
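
The class above only declares numeric codes, so callers would presumably branch on `getCode()` rather than on message text. A hedged sketch of that pattern follows. `Demo_LanguageDetect_Exception`, `load_db()` and `describe_failure()` are hypothetical names invented for illustration, and only two of the codes are repeated.

```php
<?php
// Stand-in with the same shape as the exception class above. The name is
// hypothetical and only two of the codes are copied.
class Demo_LanguageDetect_Exception extends Exception
{
    const DB_NOT_FOUND = 10;
    const DB_NOT_READABLE = 11;
}

// Hypothetical loader that reports failures through the code constants.
function load_db(string $file): string
{
    if (!file_exists($file)) {
        throw new Demo_LanguageDetect_Exception(
            'Language database does not exist: ' . $file,
            Demo_LanguageDetect_Exception::DB_NOT_FOUND
        );
    }
    if (!is_readable($file)) {
        throw new Demo_LanguageDetect_Exception(
            'Language database is not readable: ' . $file,
            Demo_LanguageDetect_Exception::DB_NOT_READABLE
        );
    }
    return (string) file_get_contents($file);
}

// Map the numeric code to a short label the caller can act on.
function describe_failure(string $file): string
{
    try {
        load_db($file);
        return 'ok';
    } catch (Demo_LanguageDetect_Exception $e) {
        switch ($e->getCode()) {
            case Demo_LanguageDetect_Exception::DB_NOT_FOUND:
                return 'missing';
            case Demo_LanguageDetect_Exception::DB_NOT_READABLE:
                return 'unreadable';
            default:
                return 'other';
        }
    }
}

echo describe_failure('/nonexistent/lang.dat'), "\n";
```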

@ -0,0 +1,339 @@
<?php
/**
* Part of Text_LanguageDetect
*
* PHP version 5
*
* @category Text
* @package Text_LanguageDetect
* @author Christian Weiske <cweiske@php.net>
* @copyright 2011 Christian Weiske <cweiske@php.net>
* @license http://www.debian.org/misc/bsd.license BSD
* @version SVN: $Id$
* @link http://pear.php.net/package/Text_LanguageDetect/
*/
/**
* Provides a mapping between the languages from lang.dat and the
* ISO 639-1 and ISO-639-2 codes.
*
* Note that this class contains only languages that exist in lang.dat.
*
* @category Text
* @package Text_LanguageDetect
* @author Christian Weiske <cweiske@php.net>
* @copyright 2011 Christian Weiske <cweiske@php.net>
* @license http://www.debian.org/misc/bsd.license BSD
* @link http://www.loc.gov/standards/iso639-2/php/code_list.php
*/
class Text_LanguageDetect_ISO639
{
/**
* Maps all language names from the language database to the
* ISO 639-1 2-letter language code.
*
* NULL indicates that there is no 2-letter code.
*
* @var array
*/
public static $nameToCode2 = array(
'albanian' => 'sq',
'arabic' => 'ar',
'azeri' => 'az',
'bengali' => 'bn',
'bulgarian' => 'bg',
'cebuano' => null,
'croatian' => 'hr',
'czech' => 'cs',
'danish' => 'da',
'dutch' => 'nl',
'english' => 'en',
'estonian' => 'et',
'farsi' => 'fa',
'finnish' => 'fi',
'french' => 'fr',
'german' => 'de',
'hausa' => 'ha',
'hawaiian' => null,
'hindi' => 'hi',
'hungarian' => 'hu',
'icelandic' => 'is',
'indonesian' => 'id',
'italian' => 'it',
'kazakh' => 'kk',
'kyrgyz' => 'ky',
'latin' => 'la',
'latvian' => 'lv',
'lithuanian' => 'lt',
'macedonian' => 'mk',
'mongolian' => 'mn',
'nepali' => 'ne',
'norwegian' => 'no',
'pashto' => 'ps',
'pidgin' => null,
'polish' => 'pl',
'portuguese' => 'pt',
'romanian' => 'ro',
'russian' => 'ru',
'serbian' => 'sr',
'slovak' => 'sk',
'slovene' => 'sl',
'somali' => 'so',
'spanish' => 'es',
'swahili' => 'sw',
'swedish' => 'sv',
'tagalog' => 'tl',
'turkish' => 'tr',
'ukrainian' => 'uk',
'urdu' => 'ur',
'uzbek' => 'uz',
'vietnamese' => 'vi',
'welsh' => 'cy',
);
/**
* Maps all language names from the language database to the
* ISO 639-2 3-letter language code.
*
* @var array
*/
public static $nameToCode3 = array(
'albanian' => 'sqi',
'arabic' => 'ara',
'azeri' => 'aze',
'bengali' => 'ben',
'bulgarian' => 'bul',
'cebuano' => 'ceb',
'croatian' => 'hrv',
'czech' => 'ces',
'danish' => 'dan',
'dutch' => 'nld',
'english' => 'eng',
'estonian' => 'est',
'farsi' => 'fas',
'finnish' => 'fin',
'french' => 'fra',
'german' => 'deu',
'hausa' => 'hau',
'hawaiian' => 'haw',
'hindi' => 'hin',
'hungarian' => 'hun',
'icelandic' => 'isl',
'indonesian' => 'ind',
'italian' => 'ita',
'kazakh' => 'kaz',
'kyrgyz' => 'kir',
'latin' => 'lat',
'latvian' => 'lav',
'lithuanian' => 'lit',
'macedonian' => 'mkd',
'mongolian' => 'mon',
'nepali' => 'nep',
'norwegian' => 'nor',
'pashto' => 'pus',
'pidgin' => 'crp',
'polish' => 'pol',
'portuguese' => 'por',
'romanian' => 'ron',
'russian' => 'rus',
'serbian' => 'srp',
'slovak' => 'slk',
'slovene' => 'slv',
'somali' => 'som',
'spanish' => 'spa',
'swahili' => 'swa',
'swedish' => 'swe',
'tagalog' => 'tgl',
'turkish' => 'tur',
'ukrainian' => 'ukr',
'urdu' => 'urd',
'uzbek' => 'uzb',
'vietnamese' => 'vie',
'welsh' => 'cym',
);
/**
* Maps ISO 639-1 2-letter language codes to the language names
* in the language database
*
* Not all languages have a 2 letter code, so some are missing
*
* @var array
*/
public static $code2ToName = array(
'ar' => 'arabic',
'az' => 'azeri',
'bg' => 'bulgarian',
'bn' => 'bengali',
'cs' => 'czech',
'cy' => 'welsh',
'da' => 'danish',
'de' => 'german',
'en' => 'english',
'es' => 'spanish',
'et' => 'estonian',
'fa' => 'farsi',
'fi' => 'finnish',
'fr' => 'french',
'ha' => 'hausa',
'hi' => 'hindi',
'hr' => 'croatian',
'hu' => 'hungarian',
'id' => 'indonesian',
'is' => 'icelandic',
'it' => 'italian',
'kk' => 'kazakh',
'ky' => 'kyrgyz',
'la' => 'latin',
'lt' => 'lithuanian',
'lv' => 'latvian',
'mk' => 'macedonian',
'mn' => 'mongolian',
'ne' => 'nepali',
'nl' => 'dutch',
'no' => 'norwegian',
'pl' => 'polish',
'ps' => 'pashto',
'pt' => 'portuguese',
'ro' => 'romanian',
'ru' => 'russian',
'sk' => 'slovak',
'sl' => 'slovene',
'so' => 'somali',
'sq' => 'albanian',
'sr' => 'serbian',
'sv' => 'swedish',
'sw' => 'swahili',
'tl' => 'tagalog',
'tr' => 'turkish',
'uk' => 'ukrainian',
'ur' => 'urdu',
'uz' => 'uzbek',
'vi' => 'vietnamese',
);
/**
* Maps ISO 639-2 3-letter language codes to the language names
* in the language database.
*
* @var array
*/
public static $code3ToName = array(
'ara' => 'arabic',
'aze' => 'azeri',
'ben' => 'bengali',
'bul' => 'bulgarian',
'ceb' => 'cebuano',
'ces' => 'czech',
'crp' => 'pidgin',
'cym' => 'welsh',
'dan' => 'danish',
'deu' => 'german',
'eng' => 'english',
'est' => 'estonian',
'fas' => 'farsi',
'fin' => 'finnish',
'fra' => 'french',
'hau' => 'hausa',
'haw' => 'hawaiian',
'hin' => 'hindi',
'hrv' => 'croatian',
'hun' => 'hungarian',
'ind' => 'indonesian',
'isl' => 'icelandic',
'ita' => 'italian',
'kaz' => 'kazakh',
'kir' => 'kyrgyz',
'lat' => 'latin',
'lav' => 'latvian',
'lit' => 'lithuanian',
'mkd' => 'macedonian',
'mon' => 'mongolian',
'nep' => 'nepali',
'nld' => 'dutch',
'nor' => 'norwegian',
'pol' => 'polish',
'por' => 'portuguese',
'pus' => 'pashto',
'ron' => 'romanian',
'rus' => 'russian',
'slk' => 'slovak',
'slv' => 'slovene',
'som' => 'somali',
'spa' => 'spanish',
'sqi' => 'albanian',
'srp' => 'serbian',
'swa' => 'swahili',
'swe' => 'swedish',
'tgl' => 'tagalog',
'tur' => 'turkish',
'ukr' => 'ukrainian',
'urd' => 'urdu',
'uzb' => 'uzbek',
'vie' => 'vietnamese',
);
/**
* Returns the 2-letter ISO 639-1 code for the given language name.
*
* @param string $lang English language name like "swedish"
*
* @return string Two-letter language code (e.g. "sv") or NULL if not found
*/
public static function nameToCode2($lang)
{
$lang = strtolower($lang);
if (!isset(self::$nameToCode2[$lang])) {
return null;
}
return self::$nameToCode2[$lang];
}
/**
* Returns the 3-letter ISO 639-2 code for the given language name.
*
* @param string $lang English language name like "swedish"
*
* @return string Three-letter language code (e.g. "swe") or NULL if not found
*/
public static function nameToCode3($lang)
{
$lang = strtolower($lang);
if (!isset(self::$nameToCode3[$lang])) {
return null;
}
return self::$nameToCode3[$lang];
}
/**
* Returns the language name for the given 2-letter ISO 639-1 code.
*
* @param string $code Two-letter language code (e.g. "sv")
*
* @return string English language name like "swedish" or NULL if not found
*/
public static function code2ToName($code)
{
$code = strtolower($code);
if (!isset(self::$code2ToName[$code])) {
return null;
}
return self::$code2ToName[$code];
}
/**
* Returns the language name for the given 3-letter ISO 639-2 code.
*
* @param string $code Three-letter language code (e.g. "swe")
*
* @return string English language name like "swedish" or NULL if not found
*/
public static function code3ToName($code)
{
$code = strtolower($code);
if (!isset(self::$code3ToName[$code])) {
return null;
}
return self::$code3ToName[$code];
}
}
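
The four helpers above share one pattern: lowercase the key, then look it up in a static table, returning NULL when nothing is found. A minimal sketch of such a case-insensitive lookup follows. The two small tables are an illustrative subset and the `lookup()` helper is a hypothetical name, not part of Text_LanguageDetect.

```php
<?php
// Illustrative subset of the name/code tables, not the full lang.dat list.
$nameToCode2 = array('swedish' => 'sv', 'german' => 'de', 'cebuano' => null);
$code2ToName = array('sv' => 'swedish', 'de' => 'german');

// Hypothetical helper: normalise the key, then consult the map.
// isset() is false for a null value, so languages without a 2-letter
// code fall through to the same NULL result as unknown names.
function lookup(array $map, string $key): ?string
{
    $key = strtolower($key);
    return isset($map[$key]) ? $map[$key] : null;
}

var_dump(lookup($nameToCode2, 'Swedish')); // string(2) "sv"
var_dump(lookup($code2ToName, 'SV'));      // string(7) "swedish"
var_dump(lookup($nameToCode2, 'cebuano')); // NULL
```

Note that the lowercased value has to be the one used as the array key. Lowercasing into a separate variable and then indexing with the original argument would make mixed-case input miss.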

@ -8,7 +8,7 @@
* @author Nicholas Pisarro
* @copyright 2006
* @license BSD
* @version CVS: $Id: Parser.php,v 1.5 2006/03/11 05:45:05 taak Exp $
* @version CVS: $Id: Parser.php 322327 2012-01-15 17:55:59Z cweiske $
* @link http://pear.php.net/package/Text_LanguageDetect/
* @link http://langdetect.blogspot.com/
*/
@ -28,7 +28,7 @@
* @author Nicholas Pisarro
* @copyright 2006
* @license BSD
* @version release: 0.2.3
* @version release: 0.3.0
*/
class Text_LanguageDetect_Parser extends Text_LanguageDetect
{
@ -102,21 +102,17 @@ class Text_LanguageDetect_Parser extends Text_LanguageDetect
* @access private
* @param string $string string to be parsed
*/
function Text_LanguageDetect_Parser($string, $db=null, $unicode_db=null) {
if (isset($db)) $this->_db_filename = $db;
if (isset($unicode_db)) $this->_unicode_db_filename = $unicode_db;
function Text_LanguageDetect_Parser($string) {
$this->_string = $string;
}
/**
* Returns true if a string is suitable for parsing
*
* @static
* @access public
* @param string $str input string to test
* @return bool true if acceptable, false if not
*/
function validateString($str) {
public static function validateString($str) {
if (!empty($str) && strlen($str) > 3 && preg_match('/\S/', $str)) {
return true;
} else {
@ -222,8 +218,7 @@ class Text_LanguageDetect_Parser extends Text_LanguageDetect
// unicode startup
if ($this->_compile_unicode) {
$blocks =& $this->_read_unicode_block_db();
$blocks = $this->_read_unicode_block_db();
$block_count = count($blocks);
$skipped_count = 0;
@ -349,6 +344,4 @@ class Text_LanguageDetect_Parser extends Text_LanguageDetect
}
}
/* vim: set expandtab tabstop=4 shiftwidth=4 softtabstop=4: */
?>
/* vim: set expandtab tabstop=4 shiftwidth=4 softtabstop=4: */

2281
inc/3rdparty/libraries/readability/Readability.php vendored Normal file → Executable file

File diff suppressed because it is too large

@ -3,8 +3,8 @@
// Author: Keyvan Minoukadeh
// Copyright (c) 2013 Keyvan Minoukadeh
// License: AGPLv3
// Version: 3.1
// Date: 2013-03-05
// Version: 3.2
// Date: 2013-05-13
// More info: http://fivefilters.org/content-only/
// Help: http://help.fivefilters.org
@ -25,14 +25,10 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
// Usage
// -----
// Request this file passing it your feed in the querystring: makefulltextfeed.php?url=mysite.org
// The following options can be passed in the querystring:
// * URL: url=[feed or website url] (required, should be URL-encoded - in php: urlencode($url))
// * URL points to HTML (not feed): html=true (optional, by default it's automatically detected)
// * API key: key=[api key] (optional, refer to config.php)
// * Max entries to process: max=[max number of items] (optional)
// Request this file passing it a web page or feed URL in the querystring: makefulltextfeed.php?url=example.org/article
// For more request parameters, see http://help.fivefilters.org/customer/portal/articles/226660-usage
error_reporting(E_ALL ^ E_NOTICE);
//error_reporting(E_ALL ^ E_NOTICE);
ini_set("display_errors", 1);
@set_time_limit(120);
@ -76,8 +72,8 @@ header('X-Robots-Tag: noindex, nofollow');
////////////////////////////////
// Check if service is enabled
////////////////////////////////
if (!$options->enabled) {
die('The full-text RSS service is currently disabled');
if (!$options->enabled) {
die('The full-text RSS service is currently disabled');
}
////////////////////////////////
@ -121,8 +117,8 @@ $options->smart_cache = $options->smart_cache && function_exists('apc_inc');
////////////////////////////////
// Check for feed URL
////////////////////////////////
if (!isset($_GET['url'])) {
die('No URL supplied');
if (!isset($_GET['url'])) {
die('No URL supplied');
}
$url = trim($_GET['url']);
if (strtolower(substr($url, 0, 7)) == 'feed://') {
@ -161,10 +157,12 @@ if (isset($_GET['key']) && ($key_index = array_search($_GET['key'], $options->ap
if (isset($_GET['links'])) $redirect .= '&links='.urlencode($_GET['links']);
if (isset($_GET['exc'])) $redirect .= '&exc='.urlencode($_GET['exc']);
if (isset($_GET['format'])) $redirect .= '&format='.urlencode($_GET['format']);
if (isset($_GET['callback'])) $redirect .= '&callback='.urlencode($_GET['callback']);
if (isset($_GET['callback'])) $redirect .= '&callback='.urlencode($_GET['callback']);
if (isset($_GET['l'])) $redirect .= '&l='.urlencode($_GET['l']);
if (isset($_GET['xss'])) $redirect .= '&xss';
if (isset($_GET['use_extracted_title'])) $redirect .= '&use_extracted_title';
if (isset($_GET['content'])) $redirect .= '&content='.urlencode($_GET['content']);
if (isset($_GET['summary'])) $redirect .= '&summary='.urlencode($_GET['summary']);
if (isset($_GET['debug'])) $redirect .= '&debug';
if ($debug_mode) {
debug('Redirecting to hide access key, follow URL below to continue');
@ -177,7 +175,7 @@ if (isset($_GET['key']) && ($key_index = array_search($_GET['key'], $options->ap
///////////////////////////////////////////////
// Set timezone.
// Prevents warnings, but needs more testing -
// Prevents warnings, but needs more testing -
// perhaps if timezone is set in php.ini we
// don't need to set it at all...
///////////////////////////////////////////////
@ -199,7 +197,7 @@ if (isset($_GET['key']) && isset($_GET['hash']) && isset($options->api_keys[(int
}
$key_index = ($valid_key) ? (int)$_GET['key'] : 0;
if (!$valid_key && $options->key_required) {
die('A valid key must be supplied');
die('A valid key must be supplied');
}
if (!$valid_key && isset($_GET['key']) && $_GET['key'] != '') {
die('The entered key is invalid');
@ -250,6 +248,28 @@ if ($options->favour_feed_titles == 'user') {
$favour_feed_titles = $options->favour_feed_titles;
}
///////////////////////////////////////////////
// Include full content in output?
///////////////////////////////////////////////
if ($options->content === 'user') {
if (isset($_GET['content']) && $_GET['content'] === '0') {
$options->content = false;
} else {
$options->content = true;
}
}
///////////////////////////////////////////////
// Include summaries in output?
///////////////////////////////////////////////
if ($options->summary === 'user') {
if (isset($_GET['summary']) && $_GET['summary'] === '1') {
$options->summary = true;
} else {
$options->summary = false;
}
}
///////////////////////////////////////////////
// Exclude items if extraction fails
///////////////////////////////////////////////
@ -272,15 +292,6 @@ if ($options->detect_language === 'user') {
$detect_language = $options->detect_language;
}
if ($detect_language >= 2) {
$language_codes = array('albanian' => 'sq','arabic' => 'ar','azeri' => 'az','bengali' => 'bn','bulgarian' => 'bg',
'cebuano' => 'ceb', // ISO 639-2
'croatian' => 'hr','czech' => 'cs','danish' => 'da','dutch' => 'nl','english' => 'en','estonian' => 'et','farsi' => 'fa','finnish' => 'fi','french' => 'fr','german' => 'de','hausa' => 'ha',
'hawaiian' => 'haw', // ISO 639-2
'hindi' => 'hi','hungarian' => 'hu','icelandic' => 'is','indonesian' => 'id','italian' => 'it','kazakh' => 'kk','kyrgyz' => 'ky','latin' => 'la','latvian' => 'lv','lithuanian' => 'lt','macedonian' => 'mk','mongolian' => 'mn','nepali' => 'ne','norwegian' => 'no','pashto' => 'ps',
'pidgin' => 'cpe', // ISO 639-2
'polish' => 'pl','portuguese' => 'pt','romanian' => 'ro','russian' => 'ru','serbian' => 'sr','slovak' => 'sk','slovene' => 'sl','somali' => 'so','spanish' => 'es','swahili' => 'sw','swedish' => 'sv','tagalog' => 'tl','turkish' => 'tr','ukrainian' => 'uk','urdu' => 'ur','uzbek' => 'uz','vietnamese' => 'vi','welsh' => 'cy');
}
$use_cld = extension_loaded('cld') && (version_compare(PHP_VERSION, '5.3.0') >= 0);
/////////////////////////////////////
@ -330,7 +341,7 @@ if ($options->cors) header('Access-Control-Allow-Origin: *');
//////////////////////////////////
if ($options->caching) {
debug('Caching is enabled...');
$cache_id = md5($max.$url.$valid_key.$links.$favour_feed_titles.$xss_filter.$exclude_on_fail.$format.$detect_language.(int)isset($_GET['pubsub']));
$cache_id = md5($max.$url.(int)$valid_key.$links.(int)$favour_feed_titles.(int)$options->content.(int)$options->summary.(int)$xss_filter.(int)$exclude_on_fail.$format.$detect_language.(int)isset($_GET['pubsub']));
$check_cache = true;
if ($options->apc && $options->smart_cache) {
apc_add("cache.$cache_id", 0, 10*60);
@ -468,7 +479,7 @@ if ($img_url = $feed->get_image_url()) {
////////////////////////////////////////////
// Loop through feed items
////////////////////////////////////////////
$items = $feed->get_items(0, $max);
$items = $feed->get_items(0, $max);
// Request all feed items in parallel (if supported)
$urls_sanitized = array();
$urls = array();
@ -550,24 +561,43 @@ foreach ($items as $key => $item) {
$is_single_page = false;
if ($single_page_response = getSinglePage($item, $html, $effective_url)) {
$is_single_page = true;
$html = $single_page_response['body'];
// remove strange things
$html = str_replace('</[>', '', $html);
$html = convert_to_utf8($html, $single_page_response['headers']);
$effective_url = $single_page_response['effective_url'];
debug("Retrieved single-page view from $effective_url");
// check if action defined for returned Content-Type
$mime_info = get_mime_action_info($single_page_response['headers']);
if (isset($mime_info['action'])) {
if ($mime_info['action'] == 'exclude') {
continue; // skip this feed item entry
} elseif ($mime_info['action'] == 'link') {
if ($mime_info['type'] == 'image') {
$html = "<a href=\"$effective_url\"><img src=\"$effective_url\" alt=\"{$mime_info['name']}\" /></a>";
} else {
$html = "<a href=\"$effective_url\">Download {$mime_info['name']}</a>";
}
$extracted_title = $mime_info['name'];
$do_content_extraction = false;
}
}
if ($do_content_extraction) {
$html = $single_page_response['body'];
// remove strange things
$html = str_replace('</[>', '', $html);
$html = convert_to_utf8($html, $single_page_response['headers']);
debug("Retrieved single-page view from $effective_url");
}
unset($single_page_response);
}
}
if ($do_content_extraction) {
debug('--------');
debug('Attempting to extract content');
$extract_result = $extractor->process($html, $effective_url);
$readability = $extractor->readability;
$content_block = ($extract_result) ? $extractor->getContent() : null;
$content_block = ($extract_result) ? $extractor->getContent() : null;
$extracted_title = ($extract_result) ? $extractor->getTitle() : '';
// Deal with multi-page articles
//die('Next: '.$extractor->getNextPageUrl());
$is_multi_page = (!$is_single_page && $extract_result && $extractor->getNextPageUrl());
if ($options->multipage && $is_multi_page) {
if ($options->multipage && $is_multi_page && $options->content) {
debug('--------');
debug('Attempting to process multi-page article');
$multi_page_urls = array();
@ -580,7 +610,7 @@ foreach ($items as $key => $item) {
// check it's not what we have already!
if (!in_array($next_page_url, $multi_page_urls)) {
// it's not, so let's attempt to fetch it
$multi_page_urls[] = $next_page_url;
$multi_page_urls[] = $next_page_url;
$_prev_ref = $http->referer;
if (($response = $http->get($next_page_url, true)) && $response['status_code'] < 300) {
// make sure mime type is not something with a different action associated
@ -605,13 +635,15 @@ foreach ($items as $key => $item) {
// did we successfully deal with this multi-page article?
if (empty($multi_page_content)) {
debug('Failed to extract all parts of multi-page article, so not going to include them');
$multi_page_content[] = $readability->dom->createElement('p')->innerHTML = '<em>This article appears to continue on subsequent pages which we could not extract</em>';
$_page = $readability->dom->createElement('p');
$_page->innerHTML = '<em>This article appears to continue on subsequent pages which we could not extract</em>';
$multi_page_content[] = $_page;
}
foreach ($multi_page_content as $_page) {
$_page = $content_block->ownerDocument->importNode($_page, true);
$content_block->appendChild($_page);
}
unset($multi_page_urls, $multi_page_content, $page_mime_info, $next_page_url);
unset($multi_page_urls, $multi_page_content, $page_mime_info, $next_page_url, $_page);
}
}
// use extracted title for both feed and item title if we're using single-item dummy feed
@ -639,7 +671,11 @@ foreach ($items as $key => $item) {
$html .= $item->get_description();
} else {
$readability->clean($content_block, 'select');
if ($options->rewrite_relative_urls) makeAbsolute($effective_url, $content_block);
// get base URL
$base_url = get_base_url($readability->dom);
if (!$base_url) $base_url = $effective_url;
// rewrite URLs
if ($options->rewrite_relative_urls) makeAbsolute($base_url, $content_block);
// footnotes
if (($links == 'footnotes') && (strpos($effective_url, 'wikipedia.org') === false)) {
$readability->addFootnotes($content_block);
@ -658,7 +694,7 @@ foreach ($items as $key => $item) {
} else {
$html = $content_block->ownerDocument->saveXML($content_block); // essentially outerHTML
}
unset($content_block);
//unset($content_block);
// post-processing cleanup
$html = preg_replace('!<p>[\s\h\v]*</p>!u', '', $html);
if ($links == 'remove') {
@ -671,130 +707,155 @@ foreach ($items as $key => $item) {
}
}
if ($valid_key && isset($_GET['pubsub'])) { // used only on fivefilters.org at the moment
$newitem->addElement('guid', 'http://fivefilters.org/content-only/redirect.php?url='.urlencode($item->get_permalink()), array('isPermaLink'=>'false'));
if ($valid_key && isset($_GET['pubsub'])) { // used only on fivefilters.org at the moment
$newitem->addElement('guid', 'http://fivefilters.org/content-only/redirect.php?url='.urlencode($item->get_permalink()), array('isPermaLink'=>'false'));
} else {
$newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
}
// filter xss?
if ($xss_filter) {
debug('Filtering HTML to remove XSS');
$html = htmLawed::hl($html, array('safe'=>1, 'deny_attribute'=>'style', 'comment'=>1, 'cdata'=>1));
}
// add content
if ($options->summary === true) {
// get summary
$summary = '';
if (!$do_content_extraction) {
$summary = $html;
} else {
$newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
}
// filter xss?
if ($xss_filter) {
debug('Filtering HTML to remove XSS');
$html = htmLawed::hl($html, array('safe'=>1, 'deny_attribute'=>'style', 'comment'=>1, 'cdata'=>1));
}
$newitem->setDescription($html);
// set date
if ((int)$item->get_date('U') > 0) {
$newitem->setDate((int)$item->get_date('U'));
} elseif ($extractor->getDate()) {
$newitem->setDate($extractor->getDate());
}
// add authors
if ($authors = $item->get_authors()) {
foreach ($authors as $author) {
// for some feeds, SimplePie stores author's name as email, e.g. http://feeds.feedburner.com/nymag/intel
if ($author->get_name() !== null) {
$newitem->addElement('dc:creator', $author->get_name());
} elseif ($author->get_email() !== null) {
$newitem->addElement('dc:creator', $author->get_email());
// Try to get first few paragraphs
if (isset($content_block) && ($content_block instanceof DOMElement)) {
$_paras = $content_block->getElementsByTagName('p');
foreach ($_paras as $_para) {
$summary .= preg_replace("/[\n\r\t ]+/", ' ', $_para->textContent).' ';
if (strlen($summary) > 200) break;
}
}
} elseif ($authors = $extractor->getAuthors()) {
//TODO: make sure the list size is reasonable
foreach ($authors as $author) {
// TODO: xpath often selects authors from other articles linked from the page.
// for now choose first item
$newitem->addElement('dc:creator', $author);
break;
} else {
$summary = $html;
}
}
// add language
if ($detect_language) {
$language = $extractor->getLanguage();
if (!$language) $language = $feed->get_language();
if (($detect_language == 3 || (!$language && $detect_language == 2)) && $text_sample) {
try {
if ($use_cld) {
// Use PHP-CLD extension
$php_cld = 'CLD\detect'; // in quotes to prevent PHP 5.2 parse error
$res = $php_cld($text_sample);
if (is_array($res) && count($res) > 0) {
$language = $res[0]['code'];
}
} else {
//die('what');
// Use PEAR's Text_LanguageDetect
if (!isset($l)) {
$l = new Text_LanguageDetect('libraries/language-detect/lang.dat', 'libraries/language-detect/unicode_blocks.dat');
}
$l_result = $l->detect($text_sample, 1);
if (count($l_result) > 0) {
$language = $language_codes[key($l_result)];
}
unset($_paras, $_para);
$summary = get_excerpt($summary);
$newitem->setDescription($summary);
if ($options->content) $newitem->setElement('content:encoded', $html);
} else {
if ($options->content) $newitem->setDescription($html);
}
// set date
if ((int)$item->get_date('U') > 0) {
$newitem->setDate((int)$item->get_date('U'));
} elseif ($extractor->getDate()) {
$newitem->setDate($extractor->getDate());
}
// add authors
if ($authors = $item->get_authors()) {
foreach ($authors as $author) {
// for some feeds, SimplePie stores author's name as email, e.g. http://feeds.feedburner.com/nymag/intel
if ($author->get_name() !== null) {
$newitem->addElement('dc:creator', $author->get_name());
} elseif ($author->get_email() !== null) {
$newitem->addElement('dc:creator', $author->get_email());
}
}
} elseif ($authors = $extractor->getAuthors()) {
//TODO: make sure the list size is reasonable
foreach ($authors as $author) {
// TODO: xpath often selects authors from other articles linked from the page.
// for now choose first item
$newitem->addElement('dc:creator', $author);
break;
}
}
// add language
if ($detect_language) {
$language = $extractor->getLanguage();
if (!$language) $language = $feed->get_language();
if (($detect_language == 3 || (!$language && $detect_language == 2)) && $text_sample) {
try {
if ($use_cld) {
// Use PHP-CLD extension
$php_cld = 'CLD\detect'; // in quotes to prevent PHP 5.2 parse error
$res = $php_cld($text_sample);
if (is_array($res) && count($res) > 0) {
$language = $res[0]['code'];
}
} catch (Exception $e) {
//die('error: '.$e);
// do nothing
}
}
if ($language && (strlen($language) < 7)) {
$newitem->addElement('dc:language', $language);
}
}
// add MIME type (if it appeared in our exclusions lists)
if (isset($mime_info['mime'])) $newitem->addElement('dc:format', $mime_info['mime']);
// add effective URL (URL after redirects)
if (isset($effective_url)) {
//TODO: ensure $effective_url is valid witout - sometimes it causes problems, e.g.
//http://www.siasat.pk/forum/showthread.php?108883-Pakistan-Chowk-by-Rana-Mubashir--25th-March-2012-Special-Program-from-Liari-(Karachi)
//temporary measure: use utf8_encode()
$newitem->addElement('dc:identifier', remove_url_cruft(utf8_encode($effective_url)));
} else {
$newitem->addElement('dc:identifier', remove_url_cruft($item->get_permalink()));
}
// add categories
if ($categories = $item->get_categories()) {
foreach ($categories as $category) {
if ($category->get_label() !== null) {
$newitem->addElement('category', $category->get_label());
}
}
}
// check for enclosures
if ($options->keep_enclosures) {
if ($enclosures = $item->get_enclosures()) {
foreach ($enclosures as $enclosure) {
// thumbnails
foreach ((array)$enclosure->get_thumbnails() as $thumbnail) {
$newitem->addElement('media:thumbnail', '', array('url'=>$thumbnail));
} else {
//die('what');
// Use PEAR's Text_LanguageDetect
if (!isset($l)) {
$l = new Text_LanguageDetect();
$l->setNameMode(2); // return ISO 639-1 codes (e.g. "en")
}
$l_result = $l->detect($text_sample, 1);
if (count($l_result) > 0) {
$language = key($l_result);
}
if (!$enclosure->get_link()) continue;
$enc = array();
// Media RSS spec ($enc): http://search.yahoo.com/mrss
// SimplePie methods ($enclosure): http://simplepie.org/wiki/reference/start#methods4
$enc['url'] = $enclosure->get_link();
if ($enclosure->get_length()) $enc['fileSize'] = $enclosure->get_length();
if ($enclosure->get_type()) $enc['type'] = $enclosure->get_type();
if ($enclosure->get_medium()) $enc['medium'] = $enclosure->get_medium();
if ($enclosure->get_expression()) $enc['expression'] = $enclosure->get_expression();
if ($enclosure->get_bitrate()) $enc['bitrate'] = $enclosure->get_bitrate();
if ($enclosure->get_framerate()) $enc['framerate'] = $enclosure->get_framerate();
if ($enclosure->get_sampling_rate()) $enc['samplingrate'] = $enclosure->get_sampling_rate();
if ($enclosure->get_channels()) $enc['channels'] = $enclosure->get_channels();
if ($enclosure->get_duration()) $enc['duration'] = $enclosure->get_duration();
if ($enclosure->get_height()) $enc['height'] = $enclosure->get_height();
if ($enclosure->get_width()) $enc['width'] = $enclosure->get_width();
if ($enclosure->get_language()) $enc['lang'] = $enclosure->get_language();
$newitem->addElement('media:content', '', $enc);
}
} catch (Exception $e) {
//die('error: '.$e);
// do nothing
}
}
/* } */
if ($language && (strlen($language) < 7)) {
$newitem->addElement('dc:language', $language);
}
}
// add MIME type (if it appeared in our exclusions lists)
if (isset($mime_info['mime'])) $newitem->addElement('dc:format', $mime_info['mime']);
// add effective URL (URL after redirects)
if (isset($effective_url)) {
//TODO: ensure $effective_url is valid witout - sometimes it causes problems, e.g.
//http://www.siasat.pk/forum/showthread.php?108883-Pakistan-Chowk-by-Rana-Mubashir--25th-March-2012-Special-Program-from-Liari-(Karachi)
//temporary measure: use utf8_encode()
$newitem->addElement('dc:identifier', remove_url_cruft(utf8_encode($effective_url)));
} else {
$newitem->addElement('dc:identifier', remove_url_cruft($item->get_permalink()));
}
// add categories
if ($categories = $item->get_categories()) {
foreach ($categories as $category) {
if ($category->get_label() !== null) {
$newitem->addElement('category', $category->get_label());
}
}
}
// check for enclosures
if ($options->keep_enclosures) {
if ($enclosures = $item->get_enclosures()) {
foreach ($enclosures as $enclosure) {
// thumbnails
foreach ((array)$enclosure->get_thumbnails() as $thumbnail) {
$newitem->addElement('media:thumbnail', '', array('url'=>$thumbnail));
}
if (!$enclosure->get_link()) continue;
$enc = array();
// Media RSS spec ($enc): http://search.yahoo.com/mrss
// SimplePie methods ($enclosure): http://simplepie.org/wiki/reference/start#methods4
$enc['url'] = $enclosure->get_link();
if ($enclosure->get_length()) $enc['fileSize'] = $enclosure->get_length();
if ($enclosure->get_type()) $enc['type'] = $enclosure->get_type();
if ($enclosure->get_medium()) $enc['medium'] = $enclosure->get_medium();
if ($enclosure->get_expression()) $enc['expression'] = $enclosure->get_expression();
if ($enclosure->get_bitrate()) $enc['bitrate'] = $enclosure->get_bitrate();
if ($enclosure->get_framerate()) $enc['framerate'] = $enclosure->get_framerate();
if ($enclosure->get_sampling_rate()) $enc['samplingrate'] = $enclosure->get_sampling_rate();
if ($enclosure->get_channels()) $enc['channels'] = $enclosure->get_channels();
if ($enclosure->get_duration()) $enc['duration'] = $enclosure->get_duration();
if ($enclosure->get_height()) $enc['height'] = $enclosure->get_height();
if ($enclosure->get_width()) $enc['width'] = $enclosure->get_width();
if ($enclosure->get_language()) $enc['lang'] = $enclosure->get_language();
$newitem->addElement('media:content', '', $enc);
}
}
}
$output->addItem($newitem);
unset($html);
$item_count++;
@ -831,7 +892,7 @@ if (!$debug_mode) {
}
if ($add_to_cache) {
ob_start();
$output->genarateFeed();
$output->genarateFeed(false);
$output = ob_get_contents();
ob_end_clean();
if ($html_only && $item_count == 0) {
@ -842,7 +903,7 @@ if (!$debug_mode) {
}
echo $output;
} else {
$output->genarateFeed();
$output->genarateFeed(false);
}
if ($callback) echo ');';
}
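
Note on the cache ID hunk above: the new line casts each boolean option to int before concatenating. A minimal sketch of why that matters, assuming a simplified key built from two flags (the helper names `cache_id_uncast` and `cache_id_cast` and the `'url'` prefix are hypothetical, not part of the project). PHP converts `false` to an empty string, so different flag combinations can collapse into the same key unless they are cast first.

```php
<?php
// Hypothetical helpers for illustration only; not part of the project code.

// Concatenates raw booleans: true becomes "1", false becomes "".
function cache_id_uncast($content, $summary) {
    return md5('url' . $content . $summary);
}

// Casts each flag to int first: true becomes "1", false becomes "0".
function cache_id_cast($content, $summary) {
    return md5('url' . (int)$content . (int)$summary);
}

// Without casts, (true, false) and (false, true) both concatenate to "url1".
var_dump(cache_id_uncast(true, false) === cache_id_uncast(false, true)); // bool(true)

// With casts they become "url10" and "url01", so the cache IDs differ.
var_dump(cache_id_cast(true, false) === cache_id_cast(false, true)); // bool(false)
```

The same reasoning would apply to any flag that feeds the key: a fixed-width representation per flag keeps each position in the string unambiguous.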

@ -66,6 +66,38 @@ class DummySingleItem {
// HELPER FUNCTIONS
///////////////////////////////
// Adapted from WordPress
// http://core.trac.wordpress.org/browser/tags/3.5.1/wp-includes/formatting.php#L2173
function get_excerpt($text, $num_words=55, $more=null) {
if (null === $more) $more = '&hellip;';
$text = strip_tags($text);
//TODO: Check if word count is based on single characters (East Asian characters)
/*
if (1==2) {
$text = trim(preg_replace("/[\n\r\t ]+/", ' ', $text), ' ');
preg_match_all('/./u', $text, $words_array);
$words_array = array_slice($words_array[0], 0, $num_words + 1);
$sep = '';
} else {
$words_array = preg_split("/[\n\r\t ]+/", $text, $num_words + 1, PREG_SPLIT_NO_EMPTY);
$sep = ' ';
}
*/
$words_array = preg_split("/[\n\r\t ]+/", $text, $num_words + 1, PREG_SPLIT_NO_EMPTY);
$sep = ' ';
if (count($words_array) > $num_words) {
array_pop($words_array);
$text = implode($sep, $words_array);
$text = $text.$more;
} else {
$text = implode($sep, $words_array);
}
// trim whitespace at beginning or end of string
// See: http://stackoverflow.com/questions/4166896/trim-unicode-whitespace-in-php-5-2
$text = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $text);
return $text;
}
function url_allowed($url) {
global $options;
if (!empty($options->allowed_urls)) {
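
Note on `get_excerpt()` in the hunk above: the word limit relies on asking `preg_split()` for one piece more than is wanted. The following is a minimal restatement of that technique, assuming a plain `'...'` suffix instead of `&hellip;`; the name `excerpt_words` is hypothetical and the Unicode whitespace trimming of the original is left out.

```php
<?php
// Illustrative sketch only; excerpt_words() is a hypothetical name.
function excerpt_words($text, $num_words = 55, $more = '...') {
    $text = strip_tags($text);
    // Request one piece more than needed. If that extra piece exists,
    // the text is longer than $num_words and has to be truncated.
    $words = preg_split("/[\n\r\t ]+/", $text, $num_words + 1, PREG_SPLIT_NO_EMPTY);
    if (count($words) > $num_words) {
        array_pop($words); // drop the overflow piece (the unsplit remainder)
        return implode(' ', $words) . $more;
    }
    return implode(' ', $words);
}

echo excerpt_words('<p>one two three four five</p>', 3), "\n"; // one two three...
echo excerpt_words('<p>one two</p>', 3), "\n";                 // one two
```

Because of the split limit, the last piece holds the whole unsplit remainder, so a long article costs no more to split than a short one.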
@ -165,14 +197,6 @@ function convert_to_utf8($html, $header=null)
if (strtolower($encoding) != 'utf-8') {
debug('Converting to UTF-8');
$html = SimplePie_Misc::change_encoding($html, $encoding, 'utf-8');
/*
if (function_exists('iconv')) {
// iconv appears to handle certain character encodings better than mb_convert_encoding
$html = iconv($encoding, 'utf-8', $html);
} else {
$html = mb_convert_encoding($html, 'utf-8', $encoding);
}
*/
}
}
}
@ -196,7 +220,7 @@ function makeAbsolute($base, $elem) {
}
function makeAbsoluteAttr($base, $e, $attr) {
if ($e->hasAttribute($attr)) {
// Trim leading and trailing white space. I don't really like this but
// Trim leading and trailing white space. I don't really like this but
// unfortunately it does appear on some sites. e.g. <img src=" /path/to/image.jpg" />
$url = trim(str_replace('%20', ' ', $e->getAttribute($attr)));
$url = str_replace(' ', '%20', $url);
@ -353,3 +377,13 @@ function debug($msg) {
flush();
}
}
function get_base_url($dom) {
$xpath = new DOMXPath($dom);
$base_url = @$xpath->evaluate('string(//head/base/@href)', $dom);
if ($base_url !== '') {
return $base_url;
} else {
return false;
}
}
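
Note on `get_base_url()` above: the caller in the feed loop falls back to the effective URL when the page declares no `<base href>`. A small self-contained sketch of that lookup plus fallback, assuming PHP's DOM extension is available; the helper name `base_or_page_url` and the example URLs are hypothetical.

```php
<?php
// Sketch only: combines the XPath lookup and the caller's fallback in one
// hypothetical helper.
function base_or_page_url($html, $page_url) {
    $dom = new DOMDocument();
    // Suppress warnings raised by imperfect real-world markup.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // string() evaluates to '' when no matching attribute exists.
    $base = $xpath->evaluate('string(//head/base/@href)');
    return ($base !== '') ? $base : $page_url;
}

$with_base = '<html><head><base href="http://cdn.example.com/a/"></head><body><p>x</p></body></html>';
$without   = '<html><head><title>t</title></head><body><p>x</p></body></html>';

echo base_or_page_url($with_base, 'http://example.com/post'), "\n"; // http://cdn.example.com/a/
echo base_or_page_url($without, 'http://example.com/post'), "\n";   // http://example.com/post
```

Resolving relative links against the declared base rather than the page URL matters for pages that serve assets from a different host or directory.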

105 inc/3rdparty/simple_html_dom.php vendored Normal file → Executable file

@ -34,7 +34,7 @@
* @author S.C. Chen <me578022@gmail.com>
* @author John Schlick
* @author Rus Carroll
* @version 1.5 ($Rev: 202 $)
* @version 1.5 ($Rev: 210 $)
* @package PlaceLocalInclude
* @subpackage simple_html_dom
*/
@ -269,7 +269,10 @@ class simple_html_dom_node
{
return $this->children;
}
if (isset($this->children[$idx])) return $this->children[$idx];
if (isset($this->children[$idx]))
{
return $this->children[$idx];
}
return null;
}
@ -330,14 +333,14 @@ class simple_html_dom_node
function find_ancestor_tag($tag)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
// Start by including ourselves in the comparison.
$returnDom = $this;
while (!is_null($returnDom))
{
if (is_object($debug_object)) { $debug_object->debugLog(2, "Current tag is: " . $returnDom->tag); }
if (is_object($debug_object)) { $debug_object->debug_log(2, "Current tag is: " . $returnDom->tag); }
if ($returnDom->tag == $tag)
{
@ -374,7 +377,7 @@ class simple_html_dom_node
$text = " with text: " . $this->text;
}
}
$debug_object->debugLog(1, 'Innertext of tag: ' . $this->tag . $text);
$debug_object->debug_log(1, 'Innertext of tag: ' . $this->tag . $text);
}
if ($this->tag==='root') return $this->innertext();
@ -532,7 +535,9 @@ class simple_html_dom_node
foreach ($head as $k=>$v)
{
if (!isset($found_keys[$k]))
{
$found_keys[$k] = 1;
}
}
}
@ -554,7 +559,7 @@ class simple_html_dom_node
protected function seek($selector, &$ret, $lowercase=false)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
list($tag, $key, $val, $exp, $no_key) = $selector;
@ -615,7 +620,7 @@ class simple_html_dom_node
// this is a normal search, we want the value of that attribute of the tag.
$nodeKeyValue = $node->attr[$key];
}
if (is_object($debug_object)) {$debug_object->debugLog(2, "testing node: " . $node->tag . " for attribute: " . $key . $exp . $val . " where nodes value is: " . $nodeKeyValue);}
if (is_object($debug_object)) {$debug_object->debug_log(2, "testing node: " . $node->tag . " for attribute: " . $key . $exp . $val . " where nodes value is: " . $nodeKeyValue);}
//PaperG - If lowercase is set, do a case insensitive test of the value of the selector.
if ($lowercase) {
@ -623,7 +628,7 @@ class simple_html_dom_node
} else {
$check = $this->match($exp, $val, $nodeKeyValue);
}
if (is_object($debug_object)) {$debug_object->debugLog(2, "after match: " . ($check ? "true" : "false"));}
if (is_object($debug_object)) {$debug_object->debug_log(2, "after match: " . ($check ? "true" : "false"));}
// handle multiple class
if (!$check && strcasecmp($key, 'class')===0) {
@ -645,12 +650,12 @@ class simple_html_dom_node
unset($node);
}
// It's passed by reference so this is actually what this function returns.
if (is_object($debug_object)) {$debug_object->debugLog(1, "EXIT - ret: ", $ret);}
if (is_object($debug_object)) {$debug_object->debug_log(1, "EXIT - ret: ", $ret);}
}
protected function match($exp, $pattern, $value) {
global $debug_object;
if (is_object($debug_object)) {$debug_object->debugLogEntry(1);}
if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}
switch ($exp) {
case '=':
@ -672,7 +677,7 @@ class simple_html_dom_node
protected function parse_selector($selector_string) {
global $debug_object;
if (is_object($debug_object)) {$debug_object->debugLogEntry(1);}
if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}
// pattern of CSS selectors, modified from mootools
// Paperg: Add the colon to the attrbute, so that it properly finds <tag attr:ibute="something" > like google does.
@ -683,7 +688,7 @@ class simple_html_dom_node
// $pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
$pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
preg_match_all($pattern, trim($selector_string).' ', $matches, PREG_SET_ORDER);
if (is_object($debug_object)) {$debug_object->debugLog(2, "Matches Array: ", $matches);}
if (is_object($debug_object)) {$debug_object->debug_log(2, "Matches Array: ", $matches);}
$selectors = array();
$result = array();
@ -718,12 +723,14 @@ class simple_html_dom_node
return $selectors;
}
function __get($name) {
function __get($name)
{
if (isset($this->attr[$name]))
{
return $this->convert_text($this->attr[$name]);
}
switch ($name) {
switch ($name)
{
case 'outertext': return $this->outertext();
case 'innertext': return $this->innertext();
case 'plaintext': return $this->text();
@ -732,22 +739,30 @@ class simple_html_dom_node
}
}
function __set($name, $value) {
switch ($name) {
function __set($name, $value)
{
global $debug_object;
if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}
switch ($name)
{
case 'outertext': return $this->_[HDOM_INFO_OUTER] = $value;
case 'innertext':
if (isset($this->_[HDOM_INFO_TEXT])) return $this->_[HDOM_INFO_TEXT] = $value;
return $this->_[HDOM_INFO_INNER] = $value;
}
if (!isset($this->attr[$name])) {
if (!isset($this->attr[$name]))
{
$this->_[HDOM_INFO_SPACE][] = array(' ', '', '');
$this->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_DOUBLE;
}
$this->attr[$name] = $value;
}
function __isset($name) {
switch ($name) {
function __isset($name)
{
switch ($name)
{
case 'outertext': return true;
case 'innertext': return true;
case 'plaintext': return true;
@ -765,7 +780,7 @@ class simple_html_dom_node
function convert_text($text)
{
global $debug_object;
if (is_object($debug_object)) {$debug_object->debugLogEntry(1);}
if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}
$converted_text = $text;
@ -777,7 +792,7 @@ class simple_html_dom_node
$sourceCharset = strtoupper($this->dom->_charset);
$targetCharset = strtoupper($this->dom->_target_charset);
}
if (is_object($debug_object)) {$debug_object->debugLog(3, "source charset: " . $sourceCharset . " target charaset: " . $targetCharset);}
if (is_object($debug_object)) {$debug_object->debug_log(3, "source charset: " . $sourceCharset . " target charaset: " . $targetCharset);}
if (!empty($sourceCharset) && !empty($targetCharset) && (strcasecmp($sourceCharset, $targetCharset) != 0))
{
@ -1045,10 +1060,10 @@ class simple_html_dom
// prepare
$this->prepare($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
// strip out comments
$this->remove_noise("'<!--(.*?)-->'is");
// strip out cdata
$this->remove_noise("'<!\[CDATA\[(.*?)\]\]>'is", true);
// strip out comments
$this->remove_noise("'<!--(.*?)-->'is");
// Per sourceforge http://sourceforge.net/tracker/?func=detail&aid=2949097&group_id=218559&atid=1044037
// Script tags removal now preceeds style tag removal.
// strip out <script> tags
@ -1078,10 +1093,15 @@ class simple_html_dom
// load html from file
function load_file()
{
//external error: NOT related to dom loading
$extError=error_get_last();
$args = func_get_args();
$this->load(call_user_func_array('file_get_contents', $args), true);
// Throw an error if we can't properly load the dom.
if (($error=error_get_last())!==null) {
$error=error_get_last();
if ($error!==$extError) {
$this->clear();
return false;
}
@ -1198,22 +1218,22 @@ class simple_html_dom
if ($success)
{
$charset = $matches[1];
if (is_object($debug_object)) {$debug_object->debugLog(2, 'header content-type found charset of: ' . $charset);}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'header content-type found charset of: ' . $charset);}
}
}
if (empty($charset))
{
$el = $this->root->find('meta[http-equiv=Content-Type]',0);
$el = $this->root->find('meta[http-equiv=Content-Type]',0, true);
if (!empty($el))
{
$fullvalue = $el->content;
if (is_object($debug_object)) {$debug_object->debugLog(2, 'meta content-type tag found' . $fullvalue);}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'meta content-type tag found' . $fullvalue);}
if (!empty($fullvalue))
{
$success = preg_match('/charset=(.+)/', $fullvalue, $matches);
$success = preg_match('/charset=(.+)/i', $fullvalue, $matches);
if ($success)
{
$charset = $matches[1];
@ -1221,7 +1241,7 @@ class simple_html_dom
else
{
// If there is a meta tag, and they don't specify the character set, research says that it's typically ISO-8859-1
if (is_object($debug_object)) {$debug_object->debugLog(2, 'meta content-type tag couldn\'t be parsed. using iso-8859 default.');}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'meta content-type tag couldn\'t be parsed. using iso-8859 default.');}
$charset = 'ISO-8859-1';
}
}
@ -1231,14 +1251,19 @@ class simple_html_dom
// If we couldn't find a charset above, then lets try to detect one based on the text we got...
if (empty($charset))
{
// Have php try to detect the encoding from the text given to us.
$charset = mb_detect_encoding($this->root->plaintext . "ascii", $encoding_list = array( "UTF-8", "CP1252" ) );
if (is_object($debug_object)) {$debug_object->debugLog(2, 'mb_detect found: ' . $charset);}
// Use this in case mb_detect_charset isn't installed/loaded on this machine.
$charset = false;
if (function_exists('mb_detect_encoding'))
{
// Have php try to detect the encoding from the text given to us.
$charset = mb_detect_encoding($this->root->plaintext . "ascii", $encoding_list = array( "UTF-8", "CP1252" ) );
if (is_object($debug_object)) {$debug_object->debug_log(2, 'mb_detect found: ' . $charset);}
}
// and if this doesn't work... then we need to just wrongheadedly assume it's UTF-8 so that we can move on - cause this will usually give us most of what we need...
if ($charset === false)
{
if (is_object($debug_object)) {$debug_object->debugLog(2, 'since mb_detect failed - using default of utf-8');}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'since mb_detect failed - using default of utf-8');}
$charset = 'UTF-8';
}
}
@ -1246,11 +1271,11 @@ class simple_html_dom
// Since CP1252 is a superset, if we get one of it's subsets, we want it instead.
if ((strtolower($charset) == strtolower('ISO-8859-1')) || (strtolower($charset) == strtolower('Latin1')) || (strtolower($charset) == strtolower('Latin-1')))
{
if (is_object($debug_object)) {$debug_object->debugLog(2, 'replacing ' . $charset . ' with CP1252 as its a superset');}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'replacing ' . $charset . ' with CP1252 as its a superset');}
$charset = 'CP1252';
}
if (is_object($debug_object)) {$debug_object->debugLog(1, 'EXIT - ' . $charset);}
if (is_object($debug_object)) {$debug_object->debug_log(1, 'EXIT - ' . $charset);}
return $this->_charset = $charset;
}
@ -1616,14 +1641,14 @@ class simple_html_dom
protected function remove_noise($pattern, $remove_tag=false)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
$count = preg_match_all($pattern, $this->doc, $matches, PREG_SET_ORDER|PREG_OFFSET_CAPTURE);
for ($i=$count-1; $i>-1; --$i)
{
$key = '___noise___'.sprintf('% 5d', count($this->noise)+1000);
if (is_object($debug_object)) { $debug_object->debugLog(2, 'key is: ' . $key); }
if (is_object($debug_object)) { $debug_object->debug_log(2, 'key is: ' . $key); }
$idx = ($remove_tag) ? 0 : 1;
$this->noise[$key] = $matches[$i][$idx][0];
$this->doc = substr_replace($this->doc, $key, $matches[$i][$idx][1], strlen($matches[$i][$idx][0]));
@ -1641,7 +1666,7 @@ class simple_html_dom
function restore_noise($text)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
while (($pos=strpos($text, '___noise___'))!==false)
{
@ -1649,7 +1674,7 @@ class simple_html_dom
if (strlen($text) > $pos+15)
{
$key = '___noise___'.$text[$pos+11].$text[$pos+12].$text[$pos+13].$text[$pos+14].$text[$pos+15];
if (is_object($debug_object)) { $debug_object->debugLog(2, 'located key of: ' . $key); }
if (is_object($debug_object)) { $debug_object->debug_log(2, 'located key of: ' . $key); }
if (isset($this->noise[$key]))
{
@ -1674,7 +1699,7 @@ class simple_html_dom
function search_noise($text)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
foreach($this->noise as $noiseElement)
{
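
Note on the charset hunk in this file: the new code only calls `mb_detect_encoding()` when the mbstring extension is loaded, and otherwise falls through to the UTF-8 default. A minimal sketch of that guard, assuming a candidate list of UTF-8 and ISO-8859-1 (the diff itself uses UTF-8 and CP1252); `guess_charset` is a hypothetical name.

```php
<?php
// Sketch of the guarded detection step; guess_charset() is hypothetical.
function guess_charset($text, $default = 'UTF-8') {
    $charset = false;
    // mbstring is an optional extension, so check before calling into it.
    if (function_exists('mb_detect_encoding')) {
        $charset = mb_detect_encoding($text, array('UTF-8', 'ISO-8859-1'));
    }
    // Either mbstring is missing or detection failed: use the default.
    if ($charset === false) {
        $charset = $default;
    }
    return $charset;
}

echo guess_charset('plain ascii text'), "\n";
```

Initialising `$charset` to `false` before the guard lets a single check cover both the "extension missing" and the "detection failed" cases.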

@ -0,0 +1,45 @@
# Author: zinnober
tidy: no
prune: no
# Set author
author: //a[@rel='author']
# Set date
date: //span[@class='Datum']
# Content is here
body: //div[@class='Artikel']
# Tidy up before article
strip: //div[@id='FAZHeaderNeu']
strip: //h2[@itemprop='headline']
strip: //span[@class='Datum']
strip: //span[@class='Autor']
strip_id_or_class: ArticlePagerTop
strip: //div[@class='FAZArtikelEinleitung']/h2
# General cleanup
strip: //div[@class='clear']
strip: //span[@class='Bildnachweis']
strip: //iframe
strip_id_or_class: Community
strip: ' · '
# Remove tracking and ads
strip_image_src: /l.gif?
strip: //img[@width='1']
strip_id_or_class: invisible
strip_id_or_class: Anzeige
strip_id_or_class: billboard
# Remove clutter after article
strip_id_or_class: Tagline
strip_id_or_class: ArtikelAbbinder
strip_id_or_class: FAZArtikelKommentare
strip_id_or_class: ArtikelKommentieren
strip_id_or_class: FAZContentRight
# Try it yourself
test_url: http://blogs.faz.net/wost/2014/08/17/viel-fuck-und-wenig-guter-sex-1239/
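
Note on the site config format used by the file above: each line is either a `#` comment or a `directive: value` pair, and directives such as `strip` may repeat. The following is a hypothetical, simplified reader for that layout. It is not the parser the project ships and it ignores special forms such as `replace_string(...)`; it only illustrates how the lines could be grouped.

```php
<?php
// Hypothetical, simplified reader for "directive: value" site config lines.
function parse_site_config($text) {
    $config = array();
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and comments.
        if ($line === '' || $line[0] === '#') continue;
        // Split on the first colon only: values such as URLs or XPath axes
        // (e.g. "following-sibling::p") contain colons themselves.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) continue;
        $directive = trim($parts[0]);
        $value = trim($parts[1]);
        // Directives may repeat, so collect every value in a list.
        $config[$directive][] = $value;
    }
    return $config;
}

$sample = "# Content is here\nbody: //div[@class='Artikel']\nstrip: //iframe\nstrip: //div[@class='clear']\ntidy: no\ntest_url: http://example.com/a\n";
$config = parse_site_config($sample);
echo count($config['strip']), "\n"; // 2
echo $config['body'][0], "\n";      // //div[@class='Artikel']
echo $config['test_url'][0], "\n";  // http://example.com/a
```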

@ -0,0 +1,12 @@
title: //title
body: //iframe
replace_string(<![CDATA[): _
replace_string(]]>): _
single_page_link: //link[@type='application/xml+oembed']
prune: no
tidy: no
http://www.dailymotion.com/video/x1vk5oh_before-they-were-on-game-of-thrones_people

View File

@ -0,0 +1,3 @@
<?php
// this is here to prevent directory listing over the web
?>

View File

@ -0,0 +1,6 @@
title: //h2
body: div[@id='illustration'] | //p
prune: no
tidy: no
test_url: http://mobile.lemondeinformatique.fr/actualites/lire-les-datacenters-d-apple-google-et-facebook-eco-responsables-selon-greenpeace-le-monde-informatique-57122.html

11
inc/3rdparty/site_config/custom/ted.com.txt vendored Executable file
View File

@ -0,0 +1,11 @@
title: //title
body: //div[@class='talk-article__body talk-transcript__body'] | //div[@class='media__image media__image--thumb talk-link__image']
strip_id_or_class: talk-transcript__para__time
single_page_link: //a[@id='hero-transcript-link']
#prune: no
tidy: no
test_url: http://www.ted.com/talks/andrew_solomon_how_the_worst_moments_in_our_lives_make_us_who_we_are

View File

@ -1,3 +1,2 @@
<?php
// this is here to prevent directory listing over the web
?>
<?php
// this is here to prevent directory listing over the web

View File

@ -0,0 +1,14 @@
body: //div[@id='articlebody']
title: //h1
author: //p[@id='by']//a
next_page_link: //span[@class='next']/a
# Not the same as below!
prune: yes
tidy: no
# Annoying 'next' links plainly inside the article body
strip: //*[text()[contains(.,'Next: ')]]
test_url: http://psychology.about.com/od/theoriesofpersonality/ss/defensemech.htm

8
inc/3rdparty/site_config/standard/24ways.org.txt vendored Normal file → Executable file
View File

@ -1,6 +1,6 @@
title: //div[@class='meta']/h2/a
author: //div[@class='meta']/h2/following-sibling::p/a/text()
date://div[@class='meta']/h2/strong
body: //div[@id='article']
title: //div[@class='meta']/h2/a
author: //div[@class='meta']/h2/following-sibling::p/a/text()
date://div[@class='meta']/h2/strong
body: //div[@id='article']
strip: //div[@class='domore']
test_url: http://24ways.org/2011/composing-the-new-canon

View File

@ -0,0 +1,8 @@
title: //h1[contains(@class, 'entry-title')]
date: //meta[@name='weibo: article:create_at']/@content
body: //div[contains(@class, 'mainContent')]
strip_id_or_class: related_topics
prune: no
test_url: http://www.36kr.com/p/207879.html

8
inc/3rdparty/site_config/standard/37signals.com.txt vendored Normal file → Executable file
View File

@ -1,6 +1,6 @@
title: //div[@class='post_header']//h2/a
author: //span[@class='author']
date: //span[@class='date']
body: //div[@id='Content']
title: //div[@class='post_header']//h2/a
author: //span[@class='author']
date: //span[@class='date']
body: //div[@id='Content']
test_url: http://37signals.com/svn/posts/2785-the-end-of-the-it-department

16
inc/3rdparty/site_config/standard/3quarksdaily.com.txt vendored Normal file → Executable file
View File

@ -1,9 +1,9 @@
body: //div[@class='content']
date: //div[@class='content']/h2
strip: //div[@class='content']/h2
title: //div[@class='content']/h3
strip: //div[@id='postmenu']
strip: //div[@class='trackback']
tidy: no
body: //div[@class='content']
date: //div[@class='content']/h2
strip: //div[@class='content']/h2
title: //div[@class='content']/h3
strip: //div[@id='postmenu']
strip: //div[@class='trackback']
tidy: no
test_url: http://www.3quarksdaily.com/3quarksdaily/2012/01/martin-luther-king-i-have-a-dream.html

0
inc/3rdparty/site_config/standard/3voor12.vpro.nl.txt vendored Normal file → Executable file
View File

4
inc/3rdparty/site_config/standard/43folders.com.txt vendored Normal file → Executable file
View File

@ -1,4 +1,4 @@
body: //*[@class = 'content']
author: //*[@class = 'submitted']/a
body: //*[@class = 'content']
author: //*[@class = 'submitted']/a
date: substring-after(//*[@class = 'submitted']/text(), '|')
test_url: http://www.43folders.com/2011/04/22/cranking

50
inc/3rdparty/site_config/standard/500px.com.txt vendored Normal file → Executable file
View File

@ -1,27 +1,27 @@
# very loose setup for both 500px.com/photo/* and 500px.com/blog/*
# photo page example: http://500px.com/photo/4181666
# blog page example: http://500px.com/blog/110
# avoid "no text" error
tidy:no
prune:no
# reorganize photo page elements
#body://div[contains(@class,'container')]
move_into(body)://div[contains(@id,'thephoto')]
move_into(body)://div[contains(@id,'description')]
move_into(body)://div[contains(@id,'tags')]
move_into(body)://div[contains(@id,'photo-info')]
# clean photo page info
strip://span[contains(@id,'copyright')]
strip://*[contains(@id,'store')]
strip://*[contains(@id,'user-info')]
strip://*[contains(@id,'photo-stats')]
strip://*[contains(@id,'voting_controls_container')]
strip://*[contains(@id,'more-photos')]
strip://*[contains(@id,'embed-photo')]
# clean blog page side bar
# very loose setup for both 500px.com/photo/* and 500px.com/blog/*
# photo page example: http://500px.com/photo/4181666
# blog page example: http://500px.com/blog/110
# avoid "no text" error
tidy:no
prune:no
# reorganize photo page elements
#body://div[contains(@class,'container')]
move_into(body)://div[contains(@id,'thephoto')]
move_into(body)://div[contains(@id,'description')]
move_into(body)://div[contains(@id,'tags')]
move_into(body)://div[contains(@id,'photo-info')]
# clean photo page info
strip://span[contains(@id,'copyright')]
strip://*[contains(@id,'store')]
strip://*[contains(@id,'user-info')]
strip://*[contains(@id,'photo-stats')]
strip://*[contains(@id,'voting_controls_container')]
strip://*[contains(@id,'more-photos')]
strip://*[contains(@id,'embed-photo')]
# clean blog page side bar
strip://*[contains(@class,'col d3 clearafter')]
test_url: http://500px.com/photo/3641041?from=editors

4
inc/3rdparty/site_config/standard/512pixels.net.txt vendored Normal file → Executable file
View File

@ -1,2 +1,2 @@
title: substring-before(//title, '&mdash;')
test_url: http://512pixels.net/more-on-linked-lists/
title: //meta[@property='og:title']/@content
test_url: http://www.512pixels.net/blog/2014/10/the-move

14
inc/3rdparty/site_config/standard/5by5.tv.txt vendored Normal file → Executable file
View File

@ -1,9 +1,9 @@
body: //*[@id="episode"]
prune: no
tidy: no
autodetect_next_page: no
strip_id_or_class: player
body: //*[@id="episode"]
prune: no
tidy: no
autodetect_next_page: no
strip_id_or_class: player
strip://*[@id="header"]
test_url: http://5by5.tv/buildanalyze/60

View File

@ -0,0 +1,7 @@
title: //*[@id='sstitle']
body: //div[@id='sstory']
strip_id_or_class: newsoptions
prune: no
test_url: http://www.7newsbelize.com/sstory.php?nid=25654
test_url: http://www.7newsbelize.com/7news.xml

14
inc/3rdparty/site_config/standard/944.com.txt vendored Normal file → Executable file
View File

@ -1,9 +1,9 @@
title: //h2[@class='border']
body: //div[@class='padding']
convert_double_br_tags: yes
strip: //div[@id='social_sharing']
strip: //div[@class='socialLinks']
title: //h2[@class='border']
body: //div[@class='padding']
convert_double_br_tags: yes
strip: //div[@id='social_sharing']
strip: //div[@class='socialLinks']
test_url: http://www.944.com/articles/mild-obsessions-frock-la-get-to-know-victoria-tik-s-haute-sustainable-fashion-line/

40
inc/3rdparty/site_config/standard/README.md vendored Executable file
View File

@ -0,0 +1,40 @@
Full-Text RSS site config files
================
[Full-Text RSS](http://fivefilters.org/content-only/), our article extraction tool, uses site-specific extraction rules to improve results. Each time a URL is processed, it checks whether extraction rules exist for that site. If no rules are found, it tries to detect the content block automatically.
This repository contains the site-specific extraction rules we rely on in Full-Text RSS.
### Contributing changes
We run automated tests on these files to detect issues. If you'd like to help keep these up to date, please look at the [test results](http://siteconfig.fivefilters.org/test/) and see which files you'd like to contribute fixes for.
We chose GitHub for this set of files because they offer one feature which we hope will make contributing changes easier: [file editing](https://github.com/blog/844-forking-with-the-edit-button) through the web interface.
You can now make changes to any of our site config files and request that your changes be pulled into the main set we maintain. This is what GitHub calls the Fork and Pull model:
> The Fork & Pull Model lets anyone fork an existing repository and push changes to their personal fork without requiring access be granted to the source repository. The changes must then be pulled into the source repository by the project maintainer. This model reduces the amount of friction for new contributors and is popular with open source projects because it allows people to work independently without upfront coordination.
When we receive a pull request we'll review the changes and if everything's okay we'll update our copy.
If a site is not in our set, you can create a file for it in the same way. See [Creating files on GitHub](https://github.com/blog/1327-creating-files-on-github).
### How to write a site config file
The quickest and simplest way is to use our [point-and-click interface](http://siteconfig.fivefilters.org). It's a simple tool only intended to create a rule to extract the correct content block.
For further refinements, e.g. selecting the title, stripping elements, dealing with multi-page articles, please see our [help page](http://help.fivefilters.org/customer/portal/articles/223153-site-patterns).
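As a rough illustration of what such a file can look like, here is a minimal sketch of a site config. The site (`example.com`) and every XPath expression in it are hypothetical; only the directive names are taken from files in this set, and the comments describe how those directives appear to be used there.

```
# Hypothetical config for example.com (illustrative only)
title: //h1[@class='entry-title']
author: //span[@class='author']
date: //span[@class='date']
body: //div[@id='article']
# Remove elements matching an XPath expression
strip: //div[@class='related']
# Remove elements whose id or class contains this string
strip_id_or_class: comments
prune: no
tidy: no
test_url: http://example.com/2014/11/some-article
```

Each line is a `directive: value` pair, lines starting with `#` are comments, and `test_url` names a page the rules can be checked against.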
### Instapaper
When we introduced site patterns, we chose to adopt the [same format](http://blog.instapaper.com/post/730281947) used by Instapaper. This allows us to make use of the existing extraction rules contributed by Instapaper users.
Marco, Instapaper's creator, graciously opened up the database of contributions to everyone:
> And, recognizing that your efforts could be useful to a wide range of other tools and services, I'll make the list of all of these site-specific configurations available to the public, free, with no strings attached.
Most of the extraction rules in our set are borrowed from Instapaper. You can see the list maintained by Instapaper at [instapaper.com/bodytext/](http://instapaper.com/bodytext/) (no longer available since Instapaper was sold).
### Testing site config files
Currently you need a copy of Full-Text RSS to test changes to the site config files. In the future we will try to make this process easier.

18
inc/3rdparty/site_config/standard/aachener-nachrichten.de.txt vendored Normal file → Executable file
View File

@ -1,10 +1,10 @@
title: //meta[@property='og:title']/@content
body: //*[@class='fliesstext_detail' or @class='detail_fliesstext'] | //img[@itemprop="image" and starts-with(@src, "/sixcms/media.php/")]
strip_id_or_class: socialshareprivacy1
strip_id_or_class: zvaFacebookButton
tidy: no
prune: no
title: //meta[@property='og:title']/@content
body: //*[@class='fliesstext_detail' or @class='detail_fliesstext'] | //img[@itemprop="image" and starts-with(@src, "/sixcms/media.php/")]
strip_id_or_class: socialshareprivacy1
strip_id_or_class: zvaFacebookButton
tidy: no
prune: no
test_url: http://www.aachener-nachrichten.de/lokales/aachen-detail-an/2517757

18
inc/3rdparty/site_config/standard/aachener-zeitung.de.txt vendored Normal file → Executable file
View File

@ -1,10 +1,10 @@
title: //meta[@property='og:title']/@content
body: //*[@class='fliesstext_detail' or @class='detail_fliesstext'] | //img[@itemprop="image" and starts-with(@src, "/sixcms/media.php/")]
strip_id_or_class: socialshareprivacy1
strip_id_or_class: zvaFacebookButton
tidy: no
prune: no
title: //meta[@property='og:title']/@content
body: //*[@class='fliesstext_detail' or @class='detail_fliesstext'] | //img[@itemprop="image" and starts-with(@src, "/sixcms/media.php/")]
strip_id_or_class: socialshareprivacy1
strip_id_or_class: zvaFacebookButton
tidy: no
prune: no
test_url: http://www.aachener-zeitung.de/sixcms/detail.php?template=az_detail&id=2552718

10
inc/3rdparty/site_config/standard/abc.es.txt vendored Normal file → Executable file
View File

@ -1,7 +1,7 @@
title: //meta[@property='og:title']/@content
body: //div[@class='datosi' or @class='date' or @class='photo-alt1' or @class='text']
strip_id_or_class: colB
prune: no
title: //meta[@property='og:title']/@content
body: //div[@class='datosi' or @class='date' or @class='photo-alt1' or @class='text' or @itemprop='articleBody']
strip_id_or_class: colB
prune: no
test_url: http://www.abc.es/20120209/tv-series/abci-house-ultima-temporada-201202090936.html

26
inc/3rdparty/site_config/standard/abc.net.au.txt vendored Normal file → Executable file
View File

@ -1,10 +1,18 @@
title: //h1
author: //div[@class="byline"]/a
date: //span[@class="timestamp"]
strip: //p[@class="topics"]
strip: //h1
strip: //div[@class="byline"]
strip: //p[@class="published"]
title: //div[@class='article section']//h1
author: //div[@class="byline"]/a
date: //span[@class="timestamp"]
body: //div[@class="page section"]
strip: //a[@class="inline-caption"]
strip: //p[@class="ticker section noprint"]
strip: //p[@class="topics"]
strip: //h1
strip: //div[@class="byline"]
strip: //p[@class="published"]
strip: //div[contains(@class,"featured-scroller")]
test_url: http://www.abc.net.au/news/2011-11-08/crabb-carbon-legislation-abbott-demolition/3652544
strip_id_or_class: footer
tidy: no
test_url: http://www.abc.net.au/news/2013-03-27/open-speed-highways-change-clp-giles/4597892
test_url: http://www.abc.net.au/news/2013-04-30/credit-growth-remains-subdued/4660054?section=business

52
inc/3rdparty/site_config/standard/abcnews.go.com.txt vendored Normal file → Executable file
View File

@ -1,27 +1,27 @@
title: //h1[@class='headline']
body: //div[@id='storyText']
# for video entries
body: //img[@id='ff-img'] | //div[@id='meta']//div[contains(@class, 'overview')]
author: //div[@class='byline']
date: //div[@class='date']
strip: //*[@id='date_partner']
strip: //div[@class='breadcrumb']
strip: //div[contains(@class,'show_tools')]
strip: //div[@id='sponsoredByAd']
strip: //div[contains(@class,'rel_container')]
strip: //p[a[starts-with(@href, 'http://www.twitter.com')]]
strip: //p[a[starts-with(@href, 'http://www.facebook.com')]]
strip: //p[contains(., 'Click here to return to')]
#strip_id_or_class: media
strip_id_or_class: mediaplayer
replace_string(<link rel="image_src" href="http): <img id="ff-img" src="http
prune: no
single_page_link: concat(//li[@class='pager']//a/@href, '&singlePage=true')
test_url: http://abcnews.go.com/Politics/newt-gingrich-rocky-rollout-presidential-campaign-recover/story?id=13632744
# multi-page
title: //h1[@class='headline']
body: //div[@id='storyText']
# for video entries
body: //img[@id='ff-img'] | //div[@id='meta']//div[contains(@class, 'overview')]
author: //div[@class='byline']
date: //div[@class='date']
strip: //*[@id='date_partner']
strip: //div[@class='breadcrumb']
strip: //div[contains(@class,'show_tools')]
strip: //div[@id='sponsoredByAd']
strip: //div[contains(@class,'rel_container')]
strip: //p[a[starts-with(@href, 'http://www.twitter.com')]]
strip: //p[a[starts-with(@href, 'http://www.facebook.com')]]
strip: //p[contains(., 'Click here to return to')]
#strip_id_or_class: media
strip_id_or_class: mediaplayer
replace_string(<link rel="image_src" href="http): <img id="ff-img" src="http
prune: no
single_page_link: concat(//li[@class='pager']//a/@href, '&singlePage=true')
test_url: http://abcnews.go.com/Politics/newt-gingrich-rocky-rollout-presidential-campaign-recover/story?id=13632744
# multi-page
test_url: http://abcnews.go.com/Blotter/family-freed-american-hostage-somalia-seals-obama/story?id=15439544

16
inc/3rdparty/site_config/standard/accesstoinsight.org.txt vendored Normal file → Executable file
View File

@ -1,9 +1,9 @@
title: //div[@id='H_docTitle']
body: //div[@id='H_meta' or @id='H_content' or @id='F_footer']
strip_id_or_class: F_toenail
prune: no
title: //div[@id='H_docTitle']
body: //div[@id='H_meta' or @id='H_content' or @id='F_footer']
strip_id_or_class: F_toenail
prune: no
test_url: http://www.accesstoinsight.org/lib/authors/nyanaponika/wheel026.html

4
inc/3rdparty/site_config/standard/acidcow.com.txt vendored Normal file → Executable file
View File

@ -1,3 +1,3 @@
body: //div[starts-with(@id, 'news-id-')]
body: //div[starts-with(@id, 'news-id-')]
test_url: http://acidcow.com/fun/20933-acid-picdump-83-pics.html

14
inc/3rdparty/site_config/standard/acquia.com.txt vendored Normal file → Executable file
View File

@ -1,9 +1,9 @@
title://h1[@class="title"]
author://div[@class="submitted"]/span/a
date://div[@class="submitted"]/span
body://div[@class="content-wrapper"]
strip://div[@id="skip-link"]
strip://div[@id="region-content-3-3"]
title://h1[@class="title"]
author://div[@class="submitted"]/span/a
date://div[@class="submitted"]/span
body://div[@class="content-wrapper"]
strip://div[@id="skip-link"]
strip://div[@id="region-content-3-3"]
strip://div[@id="section-footer"]
test_url: https://www.acquia.com/blog/drupals-long-warmth-toward-third-party-code

6
inc/3rdparty/site_config/standard/acroswing.fr.txt vendored Normal file → Executable file
View File

@ -1,5 +1,5 @@
tidy:no
date: //time[@class='updated']
dissolve: //ul[@class='video-gallery']/li
tidy:no
date: //time[@class='updated']
dissolve: //ul[@class='video-gallery']/li
dissolve: //ul[@class='video-gallery']
test_url: http://www.acroswing.fr/actualites/competition_rock/selectif_bellegarde_sur_valserine__2012-02-26.php

View File

@ -0,0 +1,6 @@
# Generated by FiveFilters.org's web-based selection tool
# Place this file inside your site_config/custom/ folder
# Source: http://siteconfig.fivefilters.org/grab.php?url=http%3A%2F%2Fwww.adme.ru%2Ftvorchestvo-hudozhniki%2Fprostoj-kak-5-kopeek-hudozhnik-557405%2F
body: //article[contains(concat(' ',normalize-space(@class),' '),' article ')]
test_url: http://www.adme.ru/tvorchestvo-hudozhniki/prostoj-kak-5-kopeek-hudozhnik-557405/

View File

@ -0,0 +1,5 @@
title: //h1[@class='articleTitle ']
body: //div[@class='bodyText widget storyContent']
strip: //p/span[@class='quote']/..
strip_id_or_class: 'pull1'
test_url: https://www.aftenposten.no/meninger/spaltister/Portrett-av-scenekunstneren-som-ung-mann-7167959.html

View File

@ -0,0 +1,13 @@
author: //article//address[contains(@class, 'author')]
body: //article[.//div[contains(@class, 'abBodyText')]]//*[contains(@class, 'abLeadText') or contains(@class, 'abBodyText') or contains(@class, 'abImageBlock') or contains(@class, 'abIGSatellite')]
strip: //address//img
strip: //footer
strip_id_or_class: abSticky
prune: no
test_url: http://www.aftonbladet.se/sportbladet/hockey/sverige/allsvenskan/article17498194.ab
test_url: http://www.aftonbladet.se/debatt/article16207536.ab
test_url: http://www.aftonbladet.se/debatt/debattamnen/politik/article17483377.ab
test_url: http://www.aftonbladet.se/rss.xml

26
inc/3rdparty/site_config/standard/aht.seriouseats.com.txt vendored Normal file → Executable file
View File

@ -1,15 +1,15 @@
body: //div[@id='content']
# clean up recipe pages
strip: //h2[@class='fn'] | //h2[@class='double-lined'] | //h3 | //div[@id='threeColumn2'] | //div[@id='threeColumn3']
#recipe pages
strip_id_or_class: "recipe-feedback"
strip_id_or_class: "comments"
strip_id_or_class: "procedure-number"
strip_id_or_class: "more-with-author"
#slice
strip_id_or_class: "inner"
body: //div[@id='content']
# clean up recipe pages
strip: //h2[@class='fn'] | //h2[@class='double-lined'] | //h3 | //div[@id='threeColumn2'] | //div[@id='threeColumn3']
#recipe pages
strip_id_or_class: "recipe-feedback"
strip_id_or_class: "comments"
strip_id_or_class: "procedure-number"
strip_id_or_class: "more-with-author"
#slice
strip_id_or_class: "inner"
test_url: http://aht.seriouseats.com/archives/2009/12/the-burger-lab-salting-ground-beef.html

View File

@ -0,0 +1,6 @@
body: //div[@id='main-column']//div[@class='content']
prune: no
test_url: http://www.albayan.ae/across-the-uae/education/2013-08-29-1.1949645
test_url: http://www.albayan.ae/1.448?ot=ot.AjaxPageLayout

0
inc/3rdparty/site_config/standard/alex.mullr.net.txt vendored Normal file → Executable file
View File

View File

@ -0,0 +1,4 @@
body: //section[@class='content']
date: //span[1]
author: //h1[@id='sitetitle']
test_url: http://alexduner.com/blog/something-i-learned-today

View File

@ -0,0 +1,4 @@
body: //section[@class='content']
date: //span[1]
author: //h1[@id='sitetitle']
test_url: https://alexduner.squarespace.com/blog/2013/1/tech-culture-from-the-outside-looking-in

20
inc/3rdparty/site_config/standard/alistapart.com.txt vendored Normal file → Executable file
View File

@ -1,12 +1,12 @@
title: //h1[@class='title']
author: //h3[@class='byline']/a
date: //div[@class='ishinfo']
body: //*[@id='articletext']
strip_id_or_class: 'ishinfo'
strip_id_or_class: 'metastuff'
strip_id_or_class: 'learnmore'
strip_id_or_class: 'discuss'
title: //h1[@class='title']
author: //h3[@class='byline']/a
date: //div[@class='ishinfo']
body: //*[@id='articletext']
strip_id_or_class: 'ishinfo'
strip_id_or_class: 'metastuff'
strip_id_or_class: 'learnmore'
strip_id_or_class: 'discuss'
prune: no
test_url: http://www.alistapart.com/articles/organizing-mobile/

14
inc/3rdparty/site_config/standard/aljazeera.com.txt vendored Normal file → Executable file
View File

@ -1,8 +1,8 @@
title: //span[@id='DetailedTitle']
body: //td[@id='tdTextContent']
strip_id_or_class: Skyscrapper_Body
date: //span[@id='ctl00_cphBody_lblDate']
author: //div[@id="dvAuthorInfo"]//a/text()
strip: //table[ tbody/tr/td/object ]
prune: no
title: //span[@id='DetailedTitle']
body: //td[@id='tdTextContent']
strip_id_or_class: Skyscrapper_Body
date: //span[@id='ctl00_cphBody_lblDate']
author: //div[@id="dvAuthorInfo"]//a/text()
strip: //table[ tbody/tr/td/object ]
prune: no
test_url: http://www.aljazeera.com/indepth/opinion/2012/01/2012114121925380575.html

24
inc/3rdparty/site_config/standard/allrecipes.com.txt vendored Normal file → Executable file
View File

@ -1,14 +1,14 @@
title: //h1[@id='itemTitle']
body: //img[@id="ctl00_CenterColumnPlaceHolder_recipe_photoStuff_imgPhoto"] | //div[@id='ctl00_CenterColumnPlaceHolder_recipe_divSubmitter'] | //div[contains(@class, 'recipe-details-content')]
strip: //div[@class='top-left' or @class='top-right' or @class='bot-left' or @class='bot-right']
strip: //div[contains(@class, 'rightcoltoolsdiv')]
strip: //div[contains(@class, 'servings-form')]
strip: //p[@class='nutritional-information']
strip: //a[contains(@class, 'nutritional-information') or contains(@class, 'nutritionanchor')]
strip: //div[@id='nutri-info']/div[contains(@class, 'title')]
strip: //img[@id='ctl00_CenterColumnPlaceHolder_recipe_imgSubmitter']
strip_id_or_class: eshaAttribute
strip_id_or_class: eshaParagraph
prune: no
title: //h1[@id='itemTitle']
body: //img[@id="ctl00_CenterColumnPlaceHolder_recipe_photoStuff_imgPhoto"] | //div[@id='ctl00_CenterColumnPlaceHolder_recipe_divSubmitter'] | //div[contains(@class, 'recipe-details-content')]
strip: //div[@class='top-left' or @class='top-right' or @class='bot-left' or @class='bot-right']
strip: //div[contains(@class, 'rightcoltoolsdiv')]
strip: //div[contains(@class, 'servings-form')]
strip: //p[@class='nutritional-information']
strip: //a[contains(@class, 'nutritional-information') or contains(@class, 'nutritionanchor')]
strip: //div[@id='nutri-info']/div[contains(@class, 'title')]
strip: //img[@id='ctl00_CenterColumnPlaceHolder_recipe_imgSubmitter']
strip_id_or_class: eshaAttribute
strip_id_or_class: eshaParagraph
prune: no
test_url: http://allrecipes.com/Recipe/Taco-Pie/Detail.aspx?src=rotd

21
inc/3rdparty/site_config/standard/allthingsd.com.txt vendored Normal file → Executable file
View File

@ -1,10 +1,13 @@
title://div[@class="article-title"]/h1[@class="title"]
date: //p[@class="article-date"]
body://*[@class="article-body article-text"]
# Trim out related posts at bottom of article
strip://blockquote[@class="memo"]
# Yup, no idea why author won't work...
author://div[@class="page-header article-header clearfix"]/p[@class="title"]
title://div[@class="article-title"]/h1[@class="title"]
date: //p[@class="article-date"]
body://div[contains(@class, "article-body")]
# Trim out related posts at bottom of article
strip://blockquote[@class="memo"]
tidy: no
# Yup, no idea why author won't work...
author://div[@class="page-header article-header clearfix"]/p[@class="title"]
# [Marco:] Author won't work here because the page defines the "home" link under the author's name as rel="author", which always gets priority if the page has defined it.
test_url: http://allthingsd.com/20120513/exclusive-yahoos-thompson-out-levinsohn-in-board-settlement-with-loeb-nears-completion/
test_url: http://allthingsd.com/20120513/exclusive-yahoos-thompson-out-levinsohn-in-board-settlement-with-loeb-nears-completion/
test_url: http://allthingsd.com/20131010/google-cio-ben-fried-on-how-google-works/

12
inc/3rdparty/site_config/standard/allyou.com.txt vendored Normal file → Executable file
View File

@ -1,8 +1,8 @@
title: //div[@id='pageHdr']//h1
body: //div[@id='pageHdr']/*[@class='dek'] | //div[@id='printArticle' or @id='slideShowPrint']
strip: //div[contains(@class, 'infoBox') or @id='infoBox']
single_page_link: //li[@id='print']/a
title: //div[@id='pageHdr']//h1
body: //div[@id='pageHdr']/*[@class='dek'] | //div[@id='printArticle' or @id='slideShowPrint']
strip: //div[contains(@class, 'infoBox') or @id='infoBox']
single_page_link: //li[@id='print']/a
prune: no
test_url: http://www.allyou.com/budget-home/money-shopping/freebies-online-00400000066392/

18
inc/3rdparty/site_config/standard/alphabeta.argaam.com.txt vendored Normal file → Executable file
View File

@ -1,11 +1,11 @@
body: //div[@class = 'entry']
date: substring-after(//p[@class="date"],'بتاريخ ')
strip_id_or_class: date
strip_id_or_class: follow-single
strip_id_or_class: ratingblock
strip_id_or_class: newRatingHolder
strip_id_or_class: postmetadata
strip_id_or_class: addthis_toolbox
strip_id_or_class: addthis_default_style
body: //div[@class = 'entry']
date: substring-after(//p[@class="date"],'بتاريخ ')
strip_id_or_class: date
strip_id_or_class: follow-single
strip_id_or_class: ratingblock
strip_id_or_class: newRatingHolder
strip_id_or_class: postmetadata
strip_id_or_class: addthis_toolbox
strip_id_or_class: addthis_default_style
strip_id_or_class: size-full
test_url: http://alphabeta.argaam.com/?p=35657

16
inc/3rdparty/site_config/standard/alriyadh.com.txt vendored Normal file → Executable file
View File

@ -1,9 +1,9 @@
body: //div[@id = "article-view"]
body: //div[contains(@class, 'article')]//div[contains(@class, 'photo_bg')]
author: //p[@class = "author"]
strip: //h1
strip: //h2
strip_id_or_class: author
prune: no
test_url: http://www.alriyadh.com/2011/10/10/article674357.html
body: //div[@id = "article-view"]
body: //div[contains(@class, 'article')]//div[contains(@class, 'photo_bg')]
author: //p[@class = "author"]
strip: //h1
strip: //h2
strip_id_or_class: author
prune: no
test_url: http://www.alriyadh.com/2011/10/10/article674357.html
test_url: http://www.alriyadh.com/net/article/780935

0
inc/3rdparty/site_config/standard/alseraj.net.txt vendored Normal file → Executable file
View File

0
inc/3rdparty/site_config/standard/alt1040.com.txt vendored Normal file → Executable file
View File

View File

@ -0,0 +1,4 @@
single_page_link: //div[contains(@class, 'story_tools')]//a[contains(@href, '/print/')]
test_url: http://www.alternet.org/civil-liberties/noam-chomsky-surveillance-state-beyond-imagination-being-created-one-freest
test_url: http://feeds.feedblitz.com/alternet

0
inc/3rdparty/site_config/standard/altfoto.com.txt vendored Normal file → Executable file
View File

16
inc/3rdparty/site_config/standard/alumni.stanford.edu.txt vendored Normal file → Executable file
View File

@ -1,10 +1,10 @@
title: //h1
author: substring-after(//div[@class="enableBullets"]/preceding-sibling::p[1], "By ")
date: //div/a[contains (@href, "issue")]
move_into(//div[@class="enableBullets"]/p): (//div[@id="content"]//img)[1]
title: //h1
author: substring-after(//div[@class="enableBullets"]/preceding-sibling::p[1], "By ")
date: //div/a[contains (@href, "issue")]
move_into(//div[@class="enableBullets"]/p): (//div[@id="content"]//img)[1]
body: //div[@class="enableBullets"]
test_url: http://alumni.stanford.edu/get/page/magazine/article/?article_id=54819

View File

@ -0,0 +1,6 @@
body: //div[@id='content']//div[contains(@class, 'content')]
strip_id_or_class: widget
strip: //a[contains(@href, 'upm_export=')]
test_url: http://amandala.com.bz/news/feed/
test_url: http://amandala.com.bz/news/poor-pse-results-30-raise/

36
inc/3rdparty/site_config/standard/amazon.com.txt vendored Normal file → Executable file
View File

@ -1,19 +1,19 @@
title: //span[@id = 'btAsinTitle']
body: (//*[@id='prodImageCell']//a)[1] | //div[@id = 'ps-content'] | //span[@id='actualPriceValue'] | //h2[.='Product Details']/following-sibling::div | //div[@class='h2' and .='Product Description']/following-sibling::div
#strip_id_or_class: quantityDropdownDiv
#strip_id_or_class: addToCartSpan
#strip_id_or_class: oneClickDiv
strip_id_or_class: nocontent
strip_id_or_class: masDynamicConten
strip_id_or_class: dynamic-content
prune: no
find_string: <span id="actualPriceValue">
replace_string: <span id="actualPriceValue"><br />Price:
strip_id_or_class: collapsePS
strip_id_or_class: expandPS
strip_id_or_class: psPlaceHolde
strip: //li[contains(., 'update product info') or contains(., 'give feedback on images')]
title: //span[@id = 'btAsinTitle']
body: (//*[@id='prodImageCell']//a)[1] | //div[@id = 'ps-content'] | //span[@id='actualPriceValue'] | //h2[.='Product Details']/following-sibling::div | //div[@class='h2' and .='Product Description']/following-sibling::div
#strip_id_or_class: quantityDropdownDiv
#strip_id_or_class: addToCartSpan
#strip_id_or_class: oneClickDiv
strip_id_or_class: nocontent
strip_id_or_class: masDynamicConten
strip_id_or_class: dynamic-content
prune: no
find_string: <span id="actualPriceValue">
replace_string: <span id="actualPriceValue"><br />Price:
strip_id_or_class: collapsePS
strip_id_or_class: expandPS
strip_id_or_class: psPlaceHolde
strip: //li[contains(., 'update product info') or contains(., 'give feedback on images')]
test_url: http://www.amazon.com/Common-Sense-Forestry-Living-Mother/dp/1931498210/

8
inc/3rdparty/site_config/standard/americandrink.net.txt vendored Normal file → Executable file
View File

@ -1,6 +1,6 @@
title: //div[@class='head']/h2/a
author: //div[@class='head']/a
date: //div[@class='head']/p[@class='date']/a
body: //div[@class='copy']
title: //div[@class='head']/h2/a
author: //div[@class='head']/a
date: //div[@class='head']/p[@class='date']/a
body: //div[@class='copy']
strip: //p[@class='meta']
test_url: http://americandrink.net/post/10567188712/free-the-hooch

18
inc/3rdparty/site_config/standard/americascup.com.txt vendored Normal file → Executable file
View File

@ -1,10 +1,10 @@
title: //div[@class="editorial-content"]/h3
body: //div[@class="hero-image" or @class="editorial-content"]
strip: //ul[@class="hero-caption"]
strip_id_or_class: footer
prune: no
tidy: no
title: //div[@class="editorial-content"]/h3
body: //div[@class="hero-image" or @class="editorial-content"]
strip: //ul[@class="hero-caption"]
strip_id_or_class: footer
prune: no
tidy: no
test_url: http://www.americascup.com/en/Latest/News/2012/3/Coutts-and-Peyron-tell-transformative-tale-at-Global-Sports-Forum/

6
inc/3rdparty/site_config/standard/americastestkitchenfeed.com.txt vendored Normal file → Executable file
View File

@ -1,5 +1,5 @@
title: //h1[@class="post-title"]
author: //span[@class="author"]/a
date: //span[@class="date"]
title: //h1[@class="post-title"]
author: //span[@class="author"]/a
date: //span[@class="date"]
body: //div[@class="post-content main"]
test_url: http://www.americastestkitchenfeed.com/gadgets-and-gear/2012/07/chill-out-with-tovolos-king-cube-silicone-ice-cube-tray/

View File

@ -0,0 +1,8 @@
title: //title
body: //div[@class="entry-content"]
author: //span[@class="author vcard"]
date: //span[@class="entry-date"]
test_url: http://www.amptoons.com/blog/2013/03/14/open-thread-and-link-farm-i-hate-being-sick-edition/

26
inc/3rdparty/site_config/standard/anandtech.com.txt vendored Normal file → Executable file

@ -1,11 +1,15 @@
author: //a[@class='b'][1]
date: substring-after(substring-before(//div, 'Posted in'), ' on ')
strip_image_src: /content/images/globals/
strip: //h2[. = 'Page 1']/preceding::p
strip: //h2
prune: no
single_page_link: concat('http://www.anandtech.com/print/', substring-after(//meta[@property='og:url']/@content, '/show/'))
test_url: http://www.anandtech.com/show/5812/eurocom-monster-10-clevos-little-monster/
body: //section[@class='main_cont']/img | //div[@class='articleContent']
title: //div[@class='blog_top_left']//h2
author: //a[@class='b'][1]
date: substring-after(substring-before(//div, 'Posted in'), ' on ')
strip_image_src: /content/images/globals/
strip: //h2[. = 'Page 1']/preceding::p
strip: //h2
prune: no
single_page_link: concat('http://www.anandtech.com/print/', substring-after(//meta[@property='og:url']/@content, '/show/'))
test_url: http://www.anandtech.com/show/8370/gigabyte-am1m-s2h-review
test_url: http://www.anandtech.com/show/8402/sandisk-releases-ultra-ii-ssd-the-second-tlc-nand-ssd-in-the-market
test_url: http://www.anandtech.com/show/8400/arms-cortex-m-even-smaller-and-lower-power-cpu-cores
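The `date` and `single_page_link` rules in this file rely on the XPath 1.0 string functions `substring-before`, `substring-after`, and `concat`. As a rough illustration only (this is not the Full-Text RSS implementation), the sketch below mimics those functions in plain Python. The helper names and the sample byline text are hypothetical, and the helpers assume a non-empty separator.

```python
def substring_before(text: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-before: text preceding the first sep, '' if sep is absent."""
    head, found, _ = text.partition(sep)
    return head if found else ""


def substring_after(text: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-after: text following the first sep, '' if sep is absent."""
    _, found, tail = text.partition(sep)
    return tail if found else ""


def print_link(og_url: str) -> str:
    """Hypothetical helper mirroring the single_page_link rule:
    concat('http://www.anandtech.com/print/', substring-after(og:url, '/show/'))."""
    return "http://www.anandtech.com/print/" + substring_after(og_url, "/show/")


def byline_date(div_text: str) -> str:
    """Hypothetical helper mirroring the date rule:
    substring-after(substring-before(text, 'Posted in'), ' on ')."""
    return substring_after(substring_before(div_text, "Posted in"), " on ")


if __name__ == "__main__":
    # The URL is one of the test_url entries; the byline string is made up.
    print(print_link("http://www.anandtech.com/show/8370/gigabyte-am1m-s2h-review"))
    print(byline_date("by Jane Doe on August 12, 2014 Posted in Motherboards").strip())
```

Both rules return an empty string when the marker text is missing, which is why a page without `/show/` in its `og:url` would yield only the bare print prefix.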


@ -0,0 +1,5 @@
body: //div[@class='post_content']
date: //div[@class='date_day'] | //div[@class='date_month']
test_url: http://www.androidpolice.com/2014/03/30/music-boss-for-pebble-can-now-control-playback-and-volume-on-chromecast-content-from-your-smartwatch/

16
inc/3rdparty/site_config/standard/andyrutledge.com.txt vendored Normal file → Executable file

@ -1,9 +1,9 @@
title: //h2
author: string('Andy Rutledge')
date: //div[@class='articledate']
body: //div[@class='copybody']
strip: //*[@class='space']
strip: //*[@class='articleFoot']
title: //h2
author: string('Andy Rutledge')
date: //div[@class='articledate']
body: //div[@class='copybody']
strip: //*[@class='space']
strip: //*[@class='articleFoot']
test_url: http://www.andyrutledge.com/hungry-for-a-better-menu.php

14
inc/3rdparty/site_config/standard/annatravelling.wordpress.com.txt vendored Normal file → Executable file

@ -1,9 +1,9 @@
title: //h1[@class="title"]
author: ("Anna Manasova")
# is ignored, unfortunately
date: //p[@class="date"]
title: //h1[@class="title"]
author: ("Anna Manasova")
# is ignored, unfortunately
date: //p[@class="date"]
body: //div[@class="entry"]
test_url: http://annatravelling.wordpress.com/2011/11/07/a-day-of-cooking-thai/


@ -0,0 +1,23 @@
# Author: zinnober
prune: no
title: substring-before(//div[@id='content']/h1, ',')
single_page_link: //a[@title='Seite drucken']
body: //div[@id='detail-body']
replace_string(<span class="description">): <em>
replace_string(<p class="leadtext"><small>): <p class="leadtext">
# Fix headlines
replace_string(Patrick Hollstein): &nbsp;
replace_string(APOTHEKE ADHOC): &nbsp;
replace_string(dpa): &nbsp;
replace_string(Katharina Lübke): &nbsp;
replace_string(Julia Pradel): &nbsp;
replace_string(Franziska Gerhardt): &nbsp;
test_url: http://www.apotheke-adhoc.de/nachrichten/politik/nachricht-detail-politik/deutscher-apothekertag-antraege-gegen-lieferengpaesse-2/
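The `replace_string(find): replace` rules in this file read as literal substitutions on the page's HTML. A minimal sketch of that reading follows, assuming plain, ordered, case-sensitive string replacement applied before extraction. The function name, the shortened rule list, and the sample HTML are hypothetical, and this is not the actual Full-Text RSS code.

```python
# A few of the find/replace pairs from the config above, in file order.
RULES = [
    ('<span class="description">', "<em>"),
    ('<p class="leadtext"><small>', '<p class="leadtext">'),
    ("dpa", "&nbsp;"),
]


def apply_replacements(html: str, rules: list[tuple[str, str]]) -> str:
    """Apply each literal find/replace pair to the HTML, in order."""
    for find, replace in rules:
        html = html.replace(find, replace)
    return html


if __name__ == "__main__":
    sample = '<p class="leadtext"><small>Intro</small></p> dpa'
    print(apply_replacements(sample, RULES))
```

Under this assumption a rule only rewrites the text it matches, so closing tags such as `</small>` stay in place unless another rule or a later cleanup step handles them.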

34
inc/3rdparty/site_config/standard/applature.com.txt vendored Normal file → Executable file

@ -1,18 +1,18 @@
title: //h1[contains(@class, 'title')]
body: //div[@id='mainContent']//div[contains(@class, 'section_content')] | //ul[@class='section_footer']
date: //div[@class='date']
strip_id_or_class: sharethis
strip_id_or_class: stats
strip_id_or_class: apply_form
strip_id_or_class: job_map
strip_id_or_class: respond
strip: //h1//span[@class='type']
strip: //li[@class='print' or @class='map']
replace_string(<ul class="section_footer" style="display): <ul class="section_footer" style="display-bla
prune: no
tidy: no
title: //h1[contains(@class, 'title')]
body: //div[@id='mainContent']//div[contains(@class, 'section_content')] | //ul[@class='section_footer']
date: //div[@class='date']
strip_id_or_class: sharethis
strip_id_or_class: stats
strip_id_or_class: apply_form
strip_id_or_class: job_map
strip_id_or_class: respond
strip: //h1//span[@class='type']
strip: //li[@class='print' or @class='map']
replace_string(<ul class="section_footer" style="display): <ul class="section_footer" style="display-bla
prune: no
tidy: no
test_url: http://applature.com/mining-jobs/jobs/nickel-west-leinster-analytical-laboratory-technician/

12
inc/3rdparty/site_config/standard/apple.com.txt vendored Normal file → Executable file

@ -1,7 +1,7 @@
strip: //p[@class='sosumi']
# Aren't they witty?
# I can't work out what causes the  before the title.
title: //h1[@class='title']
strip: //h1[@class='title']
strip: //p[@class='sosumi']
# Aren't they witty?
# I can't work out what causes the  before the title.
title: //h1[@class='title']
strip: //h1[@class='title']
test_url: http://www.apple.com/pr/library/2011/02/15appstore.html

Some files were not shown because too many files have changed in this diff