FF string search "find in tab" ignores manual line breaks, which is good. Can it also ignore the Paragraph Mark? This would be wonderful.
It would be wonderful if FF "find in tab" would ignore the Paragraph Mark Sign. This is all I want - string search and Paragraph Marks ignored - professional search tools do this by default. Please help.
All Replies (15)
I'm not aware of any built-in feature for this. Maybe there's an add-on which has this capability??
Unfortunately feature suggestions tend to get lost in the support forum.
To get this where you want it to go, here are 3 options (you can use more than 1):
- Feedback site: https://input.mozilla.org/feedback
- Choose a mailing list that seems most appropriate: http://lists.mozilla.org/listinfo
- File a request for enhancement in the bug tracking system:
Here's something to experiment with: the built-in find will not cross certain boundaries, but if you hack the page, then you can search more freely. The appearance is somewhat wrecked, but it might be worth it in some cases. To test out this idea:
When viewing the page, open the web console in the lower part of the tab using either:
- Ctrl+Shift+k
- "3-bar" menu button > Developer > Web Console
- (menu bar) Tools menu > Web Developer > Web Console
At the bottom, next to the caret (>>), paste the following code and press Enter:
// Move the contents of each p, div, and h1-h3 element into a block-styled span, working from last to first
var allblocks = document.querySelectorAll("p,div,h1,h2,h3");
for (var i = allblocks.length - 1; i >= 0; i--) {
  var block = allblocks[i];
  var spanNew = document.createElement("span");
  // Keep headings bold so the page stays readable
  var isHeading = block.nodeName.charAt(0) == "H";
  spanNew.setAttribute("style", "display:block; margin:1em 0;" + (isHeading ? " font-weight:bold;" : ""));
  // Put the span in front of the original element, move every child node over, then drop the original
  block.parentNode.insertBefore(spanNew, block);
  while (block.firstChild) spanNew.appendChild(block.firstChild);
  block.remove();
}
This should relocate the contents of div, p, and h1-h3 tags into span elements, which Find will search across. Is that useful at all?
Jefferson,
thank you for your replies - you are great as always. The issue with the paragraph mark is a little broader for me: I have thousands of HTML files that I converted from PDF for better accessibility, and I search tens of them at a time in a FF window, so hacking a specific page is unfortunately not practical. If you look closer at the paragraph mark issue, it is actually a bug in FF: the "find in tabs" search algorithm is not reliable because of this limitation. In addition, if FF can ignore the manual line break, it should also be possible to ignore the paragraph mark. BTW, this problem exists on any web page where paragraph marks are used for formatting purposes. Who owns the code for "find in tab"?
Oh, I see. The PDF-to-HTML converter probably only cares about appearance and isn't forming logical segments of text as you would find in normal web pages. I guess my question is, how bad is it? Could you paste a sample selection in a Pastebin and provide a link to it?
As for how it might be addressed, I would try the previous links (mailing list, Bugzilla) once we can articulate what needs to be changed.
I see you are smart and you understand the issue. The PDF converter puts a Paragraph Mark at the end of each line for formatting purposes - this is how bad it is. All that is required is the option that Find in Tabs ignores the Paragraph Mark.
Thank you for your help.
Okay, I think it would be useful for the benefit of developers who might look at this bug report to see some of the HTML that is generated.
It sounds as though you are saying every line might start and end with p tags:
<p>and some more text... ending here</p>
But the very specific details probably matter in assessing whether this is something that could be added quickly.
http://pastebin.com/ (no registration required)
Thank you very much for your reply.
I have copied a test file to the pastebin - I hope it works. But to give you better access to my problem, here is the link to the product I use, Able2Extract Professional 9: http://www.investintech.com/prod_a2e_pro.htm It is, to my knowledge, the best PDF conversion product available. If you download a trial version you can produce authentic output. Thank you again.
Is the issue with any PDF or only with PDFs where the program performs OCR?
To share your Pastebin, you'll need to post the link (or maybe you intend to post it only in the bug report).
Thank you for your reply - the issue is with any PDF. Do you really want me to provide a link to my file system? I'm not so sure that I want to do this....
But we are getting a little bit off topic and looking at the problem backwards: the issue is that FF "Find in Tabs" could easily "IGNORE PARAGRAPH MARKS" and the problem would be solved. Any other approach to solving the Paragraph Mark issue is much, much more complicated and will produce unpredictable results. I would really appreciate it if one of the developers of "Find in Tabs" would simply insert a few lines of code and fix that FF bug. Thank you very much for your help and assistance.
I meant a link to the Pastebin you created. You don't have to give an example from a sensitive PDF. If it happens to all PDFs, or all PDFs of a particular type, you can pick any one you like, convert it, then view source of the converted page and paste that into a Pastebin. I guess my point is that having someone else convert something at random may not yield the specific HTML structure that is causing you a problem, which would be a waste of everyone's time.
I did as advised:
The source text is copied to pastebin with the name: PDF Paragraph Marks.
I hope I did it right.
Thank you for your time and help
Okay, I found it: http://pastebin.com/RtTUHAyr
Every line is a fragment encapsulated in two elements, a span and a div:
<div style="position:absolute;left:70.96px;top:187.80px" class="cls_006"><span class="cls_006">"Delivered at Frontier" means that the seller delivers when the goods are placed at the disposal</span></div>
<div style="position:absolute;left:70.96px;top:200.52px" class="cls_006"><span class="cls_006">of the buyer on the arriving means of transport not unloaded, cleared for export, but not cleared</span></div>
Find does not cross over between two div's, just as it does not cross the boundary between two p's. So you would want to file a bug referring to the ability to search across both of these kinds of elements.
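For the bug report, it may help to show the difference in a few lines of JavaScript. This is only a simplified model and not Firefox's actual Find code: the two strings are the text of the two div's quoted above, and the function names (normalize, findWithinBlocks, findAcrossBlocks) are made up for illustration.

```javascript
// Simplified model (an assumption, not Firefox's actual Find implementation):
// each string stands for the text of one <div> from the Pastebin sample.
const blocks = [
  '"Delivered at Frontier" means that the seller delivers when the goods are placed at the disposal',
  "of the buyer on the arriving means of transport not unloaded, cleared for export, but not cleared",
];

// Collapse runs of whitespace so that line wrapping does not affect matching.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Behavior described in this thread: a match has to fit inside a single block.
function findWithinBlocks(blocks, phrase) {
  const needle = normalize(phrase);
  return blocks.some((text) => normalize(text).includes(needle));
}

// Requested behavior: treat the boundary between two blocks like a single space.
function findAcrossBlocks(blocks, phrase) {
  return normalize(blocks.join(" ")).includes(normalize(phrase));
}

console.log(findWithinBlocks(blocks, "disposal of the buyer")); // false
console.log(findAcrossBlocks(blocks, "disposal of the buyer")); // true
```

The phrase "disposal of the buyer" starts at the end of the first div and continues in the second, so it is only found when the boundary between the two is treated as ordinary whitespace.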
Thank you very much for the explanation and clarification. I opened an account in Bugzilla and filed a bug report under: Bug 1120148 - quick find search recognizes Paragraph Mark as character
I'm afraid I'm not particularly good at reporting this technical matter properly. Could you do me a great favor and file the appropriate bug report so that the developers understand what the issue is and can better address this issue? PLEASE!
Okay, I added a comment with a link to this thread for further background.
Thank you sooooooo much! I hope we are successful. Thank you.