Ben Gamari / html-parse
Commit c2d36f08, authored Oct 07, 2017 by Ben Gamari
Add some typical performance numbers
Parent: ff8bcf57
Changes: 2
README.mkd @ c2d36f08
@@ -12,3 +12,29 @@ For instance,
>>> parseTokens "<div><h1>Hello World</h1><br/><p class=widget>Example!</p></div>"
[TagOpen "div" [],TagOpen "h1" [],ContentText "Hello World",TagClose "h1",TagSelfClose "br" [],TagOpen "p" [Attr "class" "widget"],ContentText "Example!",TagClose "p",TagClose "div"]
```
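The parser returns a flat token stream rather than a tree, so any nesting has to be reconstructed by the consumer. As a minimal sketch, the code below uses hypothetical stand-in types that mirror only the constructors visible in the output above (the real `Text.HTML.Parser` types use `Text` rather than `String` and have more constructors). It collects the text content and checks tag nesting with a stack:

```haskell
-- Hypothetical stand-in types: they mirror only the constructors visible in
-- the doctest output above. The real Text.HTML.Parser types use Text rather
-- than String and include more constructors.
data Attr = Attr String String
  deriving (Show, Eq)

data Token
  = TagOpen String [Attr]
  | TagSelfClose String [Attr]
  | TagClose String
  | ContentText String
  deriving (Show, Eq)

-- | Collect the text content of a token stream, discarding all tags.
textContent :: [Token] -> [String]
textContent tokens = [t | ContentText t <- tokens]

-- | Check that every TagClose matches the most recently opened tag, using a
-- stack of open tag names. Self-closing tags and text leave the stack alone.
balanced :: [Token] -> Bool
balanced = go []
  where
    go :: [String] -> [Token] -> Bool
    go open (TagOpen name _ : rest) = go (name : open) rest
    go (top : open) (TagClose name : rest)
      | top == name = go open rest
    -- A close tag that does not match the top of the stack (or arrives
    -- when nothing is open) means the stream is unbalanced.
    go _ (TagClose _ : _) = False
    go open (_ : rest) = go open rest
    go open [] = null open
```

Applied to the tokens from the example above, `textContent` yields `["Hello World","Example!"]` and `balanced` yields `True`. A real consumer would also need a policy for void elements such as `<br>` written without the self-closing slash, which would presumably arrive as an ordinary `TagOpen` and make this check fail; the sketch ignores that case.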
## Performance
Here are some typical performance numbers taken from parsing a fairly long [Wikipedia article](https://en.wikipedia.org/wiki/New_York_City):
```
benchmarking Forced/tagsoup fast Text
time 171.2 ms (166.4 ms .. 177.3 ms)
0.999 R² (0.997 R² .. 1.000 R²)
mean 171.9 ms (169.4 ms .. 173.2 ms)
std dev 2.516 ms (1.104 ms .. 3.558 ms)
variance introduced by outliers: 12% (moderately inflated)
benchmarking Forced/tagsoup normal Text
time 176.9 ms (167.3 ms .. 188.5 ms)
0.998 R² (0.994 R² .. 1.000 R²)
mean 180.7 ms (177.5 ms .. 183.7 ms)
std dev 4.246 ms (2.316 ms .. 5.803 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking Forced/html-parser
time 20.88 ms (20.60 ms .. 21.25 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 20.99 ms (20.81 ms .. 21.20 ms)
std dev 446.1 μs (336.4 μs .. 596.2 μs)
```
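The headline figures are easier to compare as ratios. The sketch below plugs the mean run times reported above into a trivial helper; the names are hypothetical and not part of the package:

```haskell
-- Mean run times in milliseconds, copied from the benchmark output above.
meanTagsoupFast, meanTagsoupNormal, meanHtmlParse :: Double
meanTagsoupFast   = 171.9
meanTagsoupNormal = 180.7
meanHtmlParse     = 20.99

-- | Factor by which the candidate is faster than the baseline.
-- Both times must be given in the same unit.
speedup :: Double -> Double -> Double
speedup baseline candidate = baseline / candidate
```

`speedup meanTagsoupFast meanHtmlParse` evaluates to roughly 8.2 and `speedup meanTagsoupNormal meanHtmlParse` to roughly 8.6, so on this input html-parse ran about eight times faster than either tagsoup configuration. Comparing means alone ignores the confidence intervals, although here the intervals are narrow relative to the gap.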
html-parse.cabal @ c2d36f08
@@ -6,15 +6,42 @@ description:
upon the @attoparsec@ library. The parsing strategy is based upon the HTML5
parsing specification with few deviations.
.
For instance,
.
>>> parseTokens "<div><h1 class=widget>Hello World</h1><br/>"
[TagOpen "div" [],
TagOpen "h1" [Attr "class" "widget"],
ContentText "Hello World",
TagClose "h1",
TagSelfClose "br" []]
.
The package targets similar use-cases to the venerable @tagsoup@ library,
but is significantly more efficient, achieving parsing speeds of over 50
megabytes per second on modern hardware and typical web documents.
Here are some typical performance numbers taken from parsing a Wikipedia
article of moderate length:
.
@
benchmarking Forced/tagsoup fast Text
time 171.2 ms (166.4 ms .. 177.3 ms)
0.999 R² (0.997 R² .. 1.000 R²)
mean 171.9 ms (169.4 ms .. 173.2 ms)
std dev 2.516 ms (1.104 ms .. 3.558 ms)
variance introduced by outliers: 12% (moderately inflated)
.
benchmarking Forced/tagsoup normal Text
time 176.9 ms (167.3 ms .. 188.5 ms)
0.998 R² (0.994 R² .. 1.000 R²)
mean 180.7 ms (177.5 ms .. 183.7 ms)
std dev 4.246 ms (2.316 ms .. 5.803 ms)
variance introduced by outliers: 14% (moderately inflated)
.
benchmarking Forced/html-parser
time 20.88 ms (20.60 ms .. 21.25 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 20.99 ms (20.81 ms .. 21.20 ms)
std dev 446.1 μs (336.4 μs .. 596.2 μs)
@
homepage: http://github.com/bgamari/html-parse
license: BSD3
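The description's throughput claim (over 50 megabytes per second) is input size divided by parse time. The size of the benchmarked article is not stated here, so the figure cannot be rederived from the timings alone; the sketch below only shows the conversion, assuming 1 MB = 10^6 bytes and times in milliseconds as in the benchmark output. The helper name is hypothetical:

```haskell
-- | Throughput in megabytes per second (assuming 1 MB = 10^6 bytes),
-- given an input size in bytes and a mean parse time in milliseconds.
throughputMBps :: Int -> Double -> Double
throughputMBps bytes millis = megabytes / seconds
  where
    megabytes = fromIntegral bytes / 1e6
    seconds   = millis / 1e3
```

For instance, a hypothetical 1,000,000-byte document parsed in 20 ms gives `throughputMBps 1000000 20`, which is 50 up to floating-point rounding.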