Perl soared to popularity as a language for creating and managing web content, but with LWP (Library for WWW in Perl), Perl is equally adept at consuming information on the Web. LWP is a suite of modules for fetching and processing web pages. The Web is a vast data source that contains everything from stock prices to movie credits, and with LWP all that data is just a few lines of code away. Anything you do on the Web, whether it's buying or selling, reading or writing, uploading or downloading, from news to e-commerce, can be controlled with Perl and LWP. You can automate Web-based purchase orders as easily as you can set up a program to download MP3 files from a web site.
Perl & LWP covers: understanding LWP and its design; fetching and analyzing URLs; extracting information from HTML using regular expressions and tokens; working with the structure of HTML documents using trees; setting and inspecting HTTP headers and response codes; managing cookies; accessing information that requires authentication; extracting links; cooperating with proxy caches; and writing web spiders (also known as robots) in a safe fashion.
Perl & LWP includes many step-by-step examples that show how to apply the various techniques. Programs to extract information from the web sites of BBC News, Altavista, ABEBooks.com, and the Weather Underground, to name just a few, are explained in detail, so that you understand how and why they work. Perl programmers who want to automate and mine the web can pick up this book and be immediately productive. Written by a contributor to LWP, and with a foreword by one of LWP's creators, Perl & LWP is the authoritative guide to this powerful and popular toolkit.
This book can teach you expert-level web scraping/munging.
Published by Thriftbooks.com User, 21 years ago
If you aren't yet comfortable using object-oriented Perl modules, the multitude of examples will at least allow you to see how it's done, even if you're a bit fuzzy on what's happening 'underneath' when you call object methods. If you're comfortable learning how to do something without knowing exactly why it works, then the author's clear step-by-step explanations and numerous progressively more powerful examples should make this book accessible even to relatively inexperienced Perl programmers.
More experienced programmers will understand better why things work, but any Perl programmer will set this book down feeling empowered to turn the web into their own valet. No longer do you need to check multiple sites looking for interesting information. Instead, you can readily author code to do that for you and alert you when items of interest are found. You can use these tools to free up personal time, to harvest information to inform business decisions, to automate tedious web application testing, and a zillion other things.
The author's clear exploration of the relevant Perl modules leaves the reader with a good depth of understanding of what these modules do, when you might want to use which module, and how to use them for real-world tasks. Before reading the book, I knew of these modules, but they were a rather intimidating pile. I'd used a few of them on occasion for rather limited projects, but was reluctant to invest the time required to read all of the documentation from the whole collection. Mountains of method-level documentation do not a tutorial make. This book takes all of that information, selects the most important parts, and ensures that those parts are covered in progressively more powerful and/or flexible examples.
If you know Perl and you're sick of 'working the web' to get information and you want the web to work for you instead, then you need this book.
I had a personal project that was on the back burner for a couple of years because it just sounded too hard. The weekend after I finished this book, I wrote what I had previously thought to be the hard part of that project and it was both easy and fun. This book makes hard things not just possible, but actually easy.
-matt
Fabulous book!
Published by Thriftbooks.com User, 22 years ago
This book is a comprehensive and authoritative guide to web automation. It reads as both a gentle tutorial and a well-organized reference. Basic HTTP operation, regexp HTML parsing, tokenizing, cookie authentication, form handling, and robot spidering are covered extensively in numerous case studies and practical examples.
Naturally, I was impressed by the simple, consistent treatment of examples: inspect source and find the interesting bits, code things up and then enhance to suit. :-)
A particularly satisfying thing to me is the sane way of working that the author assumes. So many people seem to just bungle their way through web programming while ignoring basics like the robots.txt file. This book helps to prevent this.
One would think that only a thick tome would be sufficient to cover such vast territory, but the author (who is an active LWP module developer) does a fabulous job covering this extensive subject matter.
I recommend this book both to anyone starting out on their way to working with the underside of the web and to accomplished professionals in need of a full reference manual.
Very Informative and useful
Published by Thriftbooks.com User, 22 years ago
As a web programmer, I had dealt with several projects involving web automation and writing simple crawlers even before I read "Perl & LWP". It was the first book I've read on the subject, and I'm by no means disappointed. The book is very well organized, very informative, and hits the nail on the head. I am pleased. I noticed some inaccuracies in the discussions, and some chopped-off paragraphs and sentences, but these don't affect the usability of the book much. Author Sean Burke does a great job of walking one through most aspects of web automation and data extraction on the web using Perl and LWP (libwww in Perl). The code the book gives is very well organized, well written, and easily debuggable. The steps are pretty consistent across all the examples: a) inspect the HTML source code of the page; b) determine the tokens and patterns of interest; c) write the first code; d) fine-tune the code.
As usual, I'll be commenting on individual chapters to give you an idea of the coverage of the book in more detail...
Excellent coverage of LWP, packed full of useful examples
Published by Thriftbooks.com User, 22 years ago
I was definitely interested when I first heard that O'Reilly were publishing a book on LWP. LWP is a definitive collection of Perl modules covering everything you could think of doing with URIs, HTML, and HTTP. While 'web services' are the buzzword-friendly technology of the day, sometimes you need to roll your sleeves up and get a bit dirty scraping screens and hacking at HTML. For such a deep subject, this book weighs in at a slim 242 pages. This is a very good thing. I'm far too busy to read the massive shelf-destroying tomes that seem to be churned out recently. It covers everything you need to know with concise examples, which is what makes this book really shine. You start with the basics using LWP::Simple and move through to more advanced topics using LWP::UserAgent, HTTP::Cookies, and WWW::RobotRules. Sean shows finger-saving tips and shortcuts that take you more than a couple of notches above what you can learn from the lwpcook manpage, with enough depth to satisfy somebody who is an experienced LWP hacker. This book is a great reference: just flick through and you'll find a relevant chapter with an example to save the day. Chapters include filling in forms and extracting data from HTML using regular expressions, then more advanced topics using HTML::TokeParser, and then my preferred tool, the author's own HTML::TreeBuilder. The book ends with a chapter on spidering, with excellent coverage of design and warnings to get you started on your web trawling.
A must-read for exploiting the web in a GOOD way.
Published by Thriftbooks.com User, 22 years ago
A great book for anyone who wishes to automate daily tasks on the web. Sean does an outstanding job of showing how Perl can be used to extract and manipulate not just data but useful information efficiently from the web's vast data resources. I've already adapted an example from this book (link-checking spider) for sites I maintain. Yes, I've known of the LWP module prior to this book. But as a lazy programmer, I rely on others to show me the way. Sean does just that...