Mirror of https://github.com/moparisthebest/wallabag, synced 2024-12-26 01:09:19 -05:00
d59536deea
Includes next_page_link for multi-page articles and strips pesky inline 'Next' links from the article body. Also includes an XPath for the author, but I can't see where this is used in the wallabag UI. The 'tidy' option is turned off because it broke bulleted lists. Tested with psychology.about.com and food.about.com.
body: //div[@id='articlebody']
title: //h1
author: //p[@id='by']//a

next_page_link: //span[@class='next']/a
# Not the same as the inline 'Next: ' links stripped below!

prune: yes
tidy: no

# Annoying 'Next: ' links that appear as plain text inside the article body
strip: //*[text()[contains(.,'Next: ')]]

test_url: http://psychology.about.com/od/theoriesofpersonality/ss/defensemech.htm