How to Extract all Images from a Webpage with Ruby
Here is a little Ruby snippet that will download all the pictures from a webpage.
Rather than using XPath, we are going to first reduce the page source to everything inside of quotes. Some websites use JSON within a script tag to lazy-load images an...
Written by Sean Behan on 08/01/2018
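The excerpt stops before the author's code, so it isn't reproduced here. As a minimal Ruby sketch of the idea it describes (collect every quoted string in the page source, then keep the ones that look like image URLs), assuming a hypothetical `image_urls` helper and an assumed list of image extensions; the actual download step (for example with `open-uri` or `Net::HTTP`) is left out:

```ruby
require "uri"

# Assumed list of file extensions that count as images.
IMAGE_EXTENSIONS = %w[jpg jpeg png gif webp svg].freeze

# Returns absolute URLs for every quoted string in the page source that ends
# in an image extension. This covers <img src="..."> as well as image paths
# embedded in JSON inside <script> tags.
def image_urls(html, base_url)
  # Each scan result is [double_quoted, single_quoted]; one side is nil.
  quoted = html.scan(/"([^"]*)"|'([^']*)'/).flatten.compact

  candidates = quoted.select do |value|
    path = value.split("?").first.to_s # ignore any query string
    IMAGE_EXTENSIONS.include?(File.extname(path).delete(".").downcase)
  end

  urls = candidates.map do |value|
    begin
      URI.join(base_url, value).to_s # resolve relative paths against the page
    rescue URI::Error
      nil # skip strings that aren't valid URIs
    end
  end
  urls.compact.uniq
end

page = %(<img src="/img/a.png"><script>var d = {"hero": "https://cdn.example.com/b.JPG?w=2"}</script>)
p image_urls(page, "https://example.com/page")
# prints ["https://example.com/img/a.png", "https://cdn.example.com/b.JPG?w=2"]
```

This is a heuristic: an apostrophe in ordinary page text can throw off the quote pairing, and JSON-escaped slashes (`\/`) are not unescaped, so such paths are skipped.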
Regex for Extracting URLs in Plain Text
Here is a regex for extracting URLs from plain text, that is, URLs that are not already hyperlinked and are not the source attributes of images or iframes.
This example is in PHP. I was trying to format a WordPress page to auto-hyperlink URLs but preserve embedded ima...
Written by Sean Behan on 04/14/2017
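The PHP regex itself is cut off above. A rough Ruby sketch of the same goal (turn bare URLs into links without touching URLs that already sit in an `href` or `src`), assuming that a URL directly after `=`, a quote, or `>` is already part of markup; `BARE_URL` and `autolink` are hypothetical names:

```ruby
# A bare http(s) URL: not directly preceded by =, a quote, or >, which is
# a rough sign that the URL already sits in an attribute or inside a link.
BARE_URL = %r{(?<![="'>])https?://[^\s<>"']+}

# Wrap each bare URL in an <a> tag and leave everything else as is.
def autolink(text)
  text.gsub(BARE_URL) { |url| %(<a href="#{url}">#{url}</a>) }
end

puts autolink(%(Read https://example.com/a and <img src="https://example.com/b.png">))
# prints Read <a href="https://example.com/a">https://example.com/a</a> and <img src="https://example.com/b.png">
```

The lookbehind is only a heuristic, and trailing punctuation such as a sentence-ending period will be pulled into the link.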
Matching email addresses in JavaScript
const regex = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/g
"hello sean@example.com how are you? do you know bob@example.com?".match(regex)
// => ["sean@example.com", "bob@example.com"]
Written by Sean Behan on 03/24/2017
How to Create a Slug in Python with the Re Module
There are a few third-party modules that do this sort of thing, but there is an elegant solution using out-of-the-box Python functionality: you don't have to install any dependencies if you use the `re` module.
import re
text = ' asdfladf ljklasfj 2324...
Written by Sean Behan on 03/02/2017
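The Python code above is truncated. For readers following the Ruby posts on this page, a minimal Ruby sketch of a dependency-free slug built from regex substitutions alone; `slugify` is a hypothetical name, and the rules (lowercase ASCII letters, digits, and single hyphens) are assumptions:

```ruby
# Builds a URL slug using only core Ruby regex substitutions.
def slugify(text)
  slug = text.downcase
  slug = slug.gsub(/[^a-z0-9\s-]/, "") # keep ASCII letters, digits, whitespace, hyphens
  slug = slug.gsub(/[\s-]+/, "-")      # collapse whitespace/hyphen runs into one hyphen
  slug.gsub(/\A-|-\z/, "")             # trim a leading or trailing hyphen
end

puts slugify(" Hello, World!  It's 2017 ") # prints hello-world-its-2017
```

Non-ASCII letters are dropped rather than transliterated, so "café" becomes "caf".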
A Ruby Regex for Removing Links and Images from Text
r = /https?:\/\/[\S]+/i
your_string.gsub(r, '')
Here's the regex on Rubular if you want to play around with it yourself: http://rubular.com/r/SRKkYrW4IJ
Written by Sean Behan on 11/14/2013
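`gsub` returns a new string (use `gsub!` to change it in place), and removing a URL from the middle of a sentence leaves a double space behind. A small runnable variant that also tidies that up; `strip_links` is a hypothetical helper, and collapsing every run of spaces is an assumption that may not suit all text:

```ruby
# The pattern from the post: http:// or https:// followed by non-whitespace.
LINK = /https?:\/\/[\S]+/i

# Remove every URL, then collapse the doubled spaces left behind and trim the ends.
def strip_links(text)
  text.gsub(LINK, "").squeeze(" ").strip
end

puts strip_links("Look at http://example.com/cat.jpg and https://example.org now")
# prints Look at and now
```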
Email Regex
Regular Expression that Matches Email Addresses:
/\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/
Written by Sean Behan on 06/17/2012
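The `\b` anchors make this pattern good for finding addresses inside text, but not for validating a whole input such as a form field. A Ruby sketch of the difference; `valid_email?` is a hypothetical helper, and the `{2,}` TLD length (instead of `{2,4}`) is an assumption so that longer TLDs such as `.museum` are accepted:

```ruby
# Unanchored: finds an address anywhere in a string.
EMAIL_IN_TEXT = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b/

# Anchored with \A and \z so the entire string must be one address.
# Ruby's ^ and $ match at line boundaries, so with them a multi-line
# string like "junk\nme@example.com" would still pass.
EMAIL_ONLY = /\A[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\z/

def valid_email?(str)
  EMAIL_ONLY.match?(str)
end

EMAIL_IN_TEXT.match?("mail me@example.com today") # => true
valid_email?("mail me@example.com today")         # => false
valid_email?("me@example.museum")                 # => true
```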
Absolutize Relative Links Using PHP and Preg_Replace_Callback
I was in the market for a simple PHP script to replace relative hrefs in scraped web pages with their absolute URLs, so I wrote one myself. I used the preg_replace_callback function so that I could pass the parsed results to the callback as a single variable.
<?php
$domain =...
Written by Sean Behan on 06/17/2012
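The PHP code is truncated above. Ruby's closest equivalent to `preg_replace_callback` is `String#gsub` with a block: the block runs once per match and its return value becomes the replacement. A minimal sketch, assuming links appear as quoted `href` attributes and that the page's own URL is the right base (a `<base>` tag would change that); `HREF` and `absolutize_links` are hypothetical names, and it uses a regex rather than an HTML parser:

```ruby
require "uri"

# href="..." or href='...': group 1 is the quote character, group 2 the URL.
HREF = /href=(["'])(.*?)\1/i

# Rewrites each href as an absolute URL resolved against base_url.
def absolutize_links(html, base_url)
  html.gsub(HREF) do
    quote = Regexp.last_match(1)
    href = Regexp.last_match(2)
    absolute =
      begin
        URI.join(base_url, href).to_s
      rescue URI::Error
        href # leave anything URI can't parse as it was
      end
    "href=#{quote}#{absolute}#{quote}"
  end
end

html = %(<a href="/about">About</a> <a href='contact.html'>Contact</a>)
puts absolutize_links(html, "https://example.com/blog/post.html")
# prints <a href="https://example.com/about">About</a> <a href='https://example.com/blog/contact.html'>Contact</a>
```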
Regular Expression for finding absolute URLs
A regular expression for finding absolute URLs in a block of text, such as a log file.
/(https?:\/\/\S+)/
Written by Sean Behan on 06/17/2012
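A Ruby sketch of running a URL pattern over log text with `String#scan`. It uses `https?://\S+`, which assumes a URL ends at the first whitespace character; that pattern also matches `https` links and a URL at the very end of the input, and it does not pull trailing whitespace into the match. `absolute_urls` is a hypothetical helper:

```ruby
# http:// or https:// followed by a run of non-whitespace characters.
ABSOLUTE_URL = %r{https?://\S+}

# With no capture groups in the pattern, scan returns the whole matches.
def absolute_urls(text)
  text.scan(ABSOLUTE_URL)
end

log = <<~LOG
  10.0.0.1 GET /index referer=http://example.com/start
  10.0.0.2 GET /about referer=https://example.org/a?b=1
LOG

p absolute_urls(log)
# prints ["http://example.com/start", "https://example.org/a?b=1"]
```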
Email Obfuscation and Extraction from Text with Rails
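There is a helper method for handling the obfuscation of email addresses in Rails. (In Rails 4.0 and later, the :encode option was removed from mail_to and is provided by the actionview-encoded_mail_to gem instead.)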
mail_to "me@domain.com", "My email", :encode => "hex"
# => an <a> tag reading "My email" whose mailto: href is hex-encoded
If you then want to extract an email address (or all email addresses) from a block of text, here is the...
Written by Sean Behan on 06/17/2012
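The extraction code is cut off above. Since a Rails app is plain Ruby underneath, a minimal sketch with `String#scan` (all matches) and `String#[]` (first match or `nil`); the pattern is the one from the "Email Regex" post with the TLD length opened up to `{2,}`, which is an assumption, and `extract_emails` and `extract_email` are hypothetical names:

```ruby
EMAIL_PATTERN = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b/

# Every distinct address in the text, in order of first appearance.
def extract_emails(text)
  text.scan(EMAIL_PATTERN).uniq
end

# Just the first address, or nil when there is none.
def extract_email(text)
  text[EMAIL_PATTERN]
end

note = "Write to sean@example.com or bob@example.com; cc sean@example.com."
p extract_emails(note)             # prints ["sean@example.com", "bob@example.com"]
p extract_email("no address here") # prints nil
```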
Parse for Links with Prototype JS
Parsing for links with the Prototype JavaScript library is easy. Here is the pattern for finding links:
/(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?/
And to implement it you can loop through your con...
Written by Sean Behan on 06/17/2012
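The Prototype loop is cut off above. Keeping to Ruby like the other sketches on this page, here is the same link pattern ported to a Ruby regex, with one change worth noting: its groups are made non-capturing (`(?:...)`), because when a pattern has capture groups Ruby's `String#scan` returns the group values instead of the whole match. Redundant escapes are also dropped (`\w` already includes `_`); `LINK_PATTERN` and `find_links` are hypothetical names:

```ruby
# The Prototype post's link pattern in %r{} form, with non-capturing groups.
LINK_PATTERN = %r{(?:http|https)://[\w-]+(?:\.[\w-]+)+(?:[\w\-.,@?^=%&:/~+#]*[\w\-@?^=%&/~+#])?}

def find_links(text)
  text.scan(LINK_PATTERN)
end

p find_links("Links: http://example.com/page?id=1, and https://www.example.org.")
# prints ["http://example.com/page?id=1", "https://www.example.org"]

# With a capturing group, scan returns the groups instead of the links:
p "http://example.com".scan(%r{(http|https)://\w+\.com})
# prints [["http"]]
```

The final character class leaves out `.` and `,`, which is why trailing sentence punctuation is not included in the matches.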