GitHub - philipjkim/goreadability: Webpage summary extractor using Facebook Open Graph and arc90's readability

goreadability is a tool for extracting the primary readable content of a webpage. It is a Go port of arc90's readability project, based on ruby-readability.

As of v2.0, goreadability uses Open Graph tag values when they exist. You can disable the Open Graph lookup and follow the traditional readability rules by setting Option.LookupOpenGraphTags to false.

    // URL to extract contents (title, description, images, ...)
    url := "https://en.wikipedia.org/wiki/Lego"

    // Default option
    opt := readability.NewOption()

    // You can modify some option values if needed.
    opt.ImageRequestTimeout = 3000 // ms

    content, err := readability.Extract(url, opt)
    if err != nil {
    	log.Fatal(err)
    }

    log.Println(content.Title)
    log.Println(content.Description)
    log.Println(content.Images)
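
The Open Graph preference described above can be pictured with a small standard-library sketch. This is not goreadability's implementation: `pickTitle`, `ogTitleRe`, `titleRe`, and the `lookupOpenGraph` flag are hypothetical names made up for illustration, and the regular expressions only handle the simplest attribute layout (a real extractor would use an HTML parser). It only shows the idea of using `og:title` when it is present and the lookup is enabled, and falling back to the page's `<title>` otherwise.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ogTitleRe matches <meta property="og:title" content="..."> in that exact
// attribute order. Hypothetical helper for illustration only; it does not
// handle quotes inside the content value.
var ogTitleRe = regexp.MustCompile(`(?i)<meta\s+property=["']og:title["']\s+content=["']([^"']*)["']`)

// titleRe captures the text inside the <title> element.
var titleRe = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// pickTitle returns the og:title value when lookupOpenGraph is true and the
// tag is present and non-empty; otherwise it falls back to <title>.
// It returns "" when neither is found.
func pickTitle(html string, lookupOpenGraph bool) string {
	if lookupOpenGraph {
		if m := ogTitleRe.FindStringSubmatch(html); m != nil {
			if t := strings.TrimSpace(m[1]); t != "" {
				return t
			}
		}
	}
	if m := titleRe.FindStringSubmatch(html); m != nil {
		return strings.TrimSpace(m[1])
	}
	return ""
}

func main() {
	page := `<html><head><title>Lego - Wikipedia</title>
<meta property="og:title" content="Lego"></head><body></body></html>`

	fmt.Println(pickTitle(page, true))  // Open Graph value preferred
	fmt.Println(pickTitle(page, false)) // lookup disabled, <title> used
}
```

With the lookup enabled the sketch prints `Lego`; with it disabled it prints `Lego - Wikipedia`, which mirrors the effect of turning Option.LookupOpenGraphTags off.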