GitHub - philipjkim/goreadability: Webpage summary extractor using Facebook Open Graph and arc90's readability
goreadability is a tool for extracting the primary readable content of a webpage. It is a Go port of arc90's readability project, based on ruby-readability.
As of v2.0, goreadability uses Open Graph tag values when they exist. You can disable the Open Graph lookup and follow the traditional readability rules by setting Option.LookupOpenGraphTags to false.
    // URL to extract contents (title, description, images, ...)
    url := "https://en.wikipedia.org/wiki/Lego"

    // Default option
    opt := readability.NewOption()

    // You can modify some option values if needed.
    opt.ImageRequestTimeout = 3000 // ms

    content, err := readability.Extract(url, opt)
    if err != nil {
        log.Fatal(err)
    }

    log.Println(content.Title)
    log.Println(content.Description)
    log.Println(content.Images)
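To make the "use Open Graph values if they exist, otherwise fall back" behaviour described above more concrete, here is a minimal standard-library sketch of the idea. It is not goreadability's implementation: the `Summary` type and the `openGraphTags` and `summarize` functions are hypothetical names invented for illustration, the regular expressions only handle double-quoted `property`/`content` attributes in that order, and the fallback covers only the `<title>` element rather than the full readability content scoring.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Summary holds the fields this sketch extracts. It is a hypothetical type,
// not the content struct returned by goreadability.
type Summary struct {
	Title       string
	Description string
	Image       string
}

// ogMeta matches <meta property="og:NAME" content="VALUE"> with double-quoted
// attributes in exactly that order. Real pages vary more than this; a regexp
// is used only to keep the sketch free of third-party dependencies.
var ogMeta = regexp.MustCompile(`(?is)<meta\s+property="og:([a-z:_]+)"\s+content="([^"]*)"`)

// titleTag matches the contents of the first <title> element.
var titleTag = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// openGraphTags returns the first value found for each og:* property.
func openGraphTags(html string) map[string]string {
	tags := make(map[string]string)
	for _, m := range ogMeta.FindAllStringSubmatch(html, -1) {
		name := strings.ToLower(m[1])
		if _, seen := tags[name]; !seen {
			tags[name] = m[2]
		}
	}
	return tags
}

// summarize prefers Open Graph values when lookupOpenGraph is true and
// falls back to the <title> element when no og:title was found.
func summarize(html string, lookupOpenGraph bool) Summary {
	var s Summary
	if lookupOpenGraph {
		og := openGraphTags(html)
		s.Title = og["title"]
		s.Description = og["description"]
		s.Image = og["image"]
	}
	if s.Title == "" {
		if m := titleTag.FindStringSubmatch(html); m != nil {
			s.Title = strings.TrimSpace(m[1])
		}
	}
	return s
}

func main() {
	page := `<html><head>
<title>Lego - Wikipedia</title>
<meta property="og:title" content="Lego">
<meta property="og:image" content="https://example.com/lego.png">
</head><body></body></html>`

	fmt.Printf("%+v\n", summarize(page, true))  // Open Graph values preferred
	fmt.Printf("%+v\n", summarize(page, false)) // <title> fallback only
}
```

The boolean parameter plays the role that Option.LookupOpenGraphTags plays in the library: when it is false, the Open Graph tags are ignored entirely and only the fallback path runs.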