Using Line Input for Unicode (UTF-8 included), the fast way

A few days ago I was involved with the word count routines. I realized that a speed contest over a few extra ms was not what we would call productive. What would be productive is a utility for searching those words, so I decided to write one.
For that task I wrote some functions. One set of functions loads the text. Olaf Schmidt's routine ReadUnicodeOrANSI is perfect for reading any kind of text (as long as the text follows some rules), but I wanted the old-fashioned Line Input to fill my document class. So I made a combination: a Line Input for ANSI, Unicode LE16, BE16 and UTF-8, working on my own buffer. I found that calling LOF or Seek on every read causes a huge delay. For files opened in Binary mode VB provides no buffering, which is right if you have to read and write in the same session. Here, though, the binary file is only read, so we need our own buffer, and it does not have to be like VB's buffers: we can use a much larger one. Here I use a 327680-byte buffer.
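A minimal sketch of the buffering idea, written in Python rather than VB6 so it can run on its own (it is not the code from Form01.zip). The file is read in large chunks (327680 bytes by default, the size quoted above) and lines are cut out of the buffer in memory, so the file is touched only when the buffer runs dry instead of on every line. The class name `BufferedLineReader` is hypothetical. It treats LF as the line end and drops a preceding CR, which is only correct for byte-oriented text (ANSI, UTF-8); UTF-16 needs word-aligned scanning.

```python
import io

BUF_SIZE = 327680  # buffer size quoted in the post (320 KB)


class BufferedLineReader:
    """Line reader over a binary stream with one large, manually managed buffer."""

    def __init__(self, stream, buf_size=BUF_SIZE):
        self.stream = stream
        self.buf_size = buf_size
        self.buf = b""
        self.pos = 0          # next unread byte in self.buf
        self.eof = False

    def _fill(self):
        # Keep the unconsumed tail (a partial line) and append the next chunk
        chunk = self.stream.read(self.buf_size)
        if not chunk:
            self.eof = True
        self.buf = self.buf[self.pos:] + chunk
        self.pos = 0

    def read_line(self):
        """Next line as bytes without CR/LF, or None when the file is exhausted."""
        while True:
            idx = self.buf.find(b"\n", self.pos)
            if idx != -1:
                line = self.buf[self.pos:idx]
                self.pos = idx + 1
                return line[:-1] if line.endswith(b"\r") else line
            if self.eof:
                if self.pos < len(self.buf):
                    line = self.buf[self.pos:]
                    self.pos = len(self.buf)
                    return line[:-1] if line.endswith(b"\r") else line
                return None
            self._fill()


if __name__ == "__main__":
    data = io.BytesIO(b"alpha\r\nbeta\ngamma")
    reader = BufferedLineReader(data, buf_size=4)  # tiny buffer forces refills
    while (line := reader.read_line()) is not None:
        print(line)
```

With a tiny `buf_size` the lines straddle buffer boundaries; the unconsumed tail is carried into the next fill, so each line still comes out whole.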
For ANSI we read one byte at a time from the buffer; for LE16 or BE16 we read one word (2 bytes). UTF-8 is a little more complicated, but we can find the end of a line without parsing how many bytes each character uses. We use a small parser that tests bit 8 (&H80): if it is clear, the byte is a one-byte character, while every byte of a multi-byte character has this bit set to 1. Since CR and LF are one-byte characters, a line end can never be confused with part of a multi-byte character. Because the bytes have to be translated to LE16 (VB's native string format), we copy each line's characters to a second buffer and then do the translation.
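The per-encoding line-end search can be sketched like this (Python again, not the author's VB6; the names `next_line_end` and `split_lines` are hypothetical, and cp1252 stands in for the system ANSI code page, which is an assumption). Single-byte encodings are scanned byte by byte, and any byte with &H80 set is skipped without decoding because it belongs to a multi-byte UTF-8 character and can never be LF. UTF-16 is scanned one 16-bit word at a time in the file's byte order. Each line's bytes are then sliced out (the "second buffer") and decoded.

```python
# (unit size in bytes, Python codec) per encoding.
# cp1252 stands in for the system ANSI code page (an assumption).
ENCODINGS = {
    "ansi": (1, "cp1252"),
    "utf-8": (1, "utf-8"),
    "utf-16-le": (2, "utf-16-le"),
    "utf-16-be": (2, "utf-16-be"),
}


def next_line_end(buf: bytes, start: int, kind: str) -> int:
    """Byte offset of the next LF code unit at or after start, or -1."""
    unit = ENCODINGS[kind][0]
    order = "big" if kind == "utf-16-be" else "little"
    i = start
    while i + unit <= len(buf):
        if unit == 1:
            b = buf[i]
            if b & 0x80:
                # High bit set: part of a multi-byte UTF-8 char (or a non-ASCII
                # ANSI char); never LF, so the sequence need not be decoded
                pass
            elif b == 0x0A:
                return i
        else:
            # One 16-bit word in the file's byte order
            if int.from_bytes(buf[i:i + 2], order) == 0x000A:
                return i
        i += unit
    return -1


def split_lines(buf: bytes, kind: str) -> list:
    """Split raw text bytes (BOM already removed) into decoded lines."""
    unit, codec = ENCODINGS[kind]
    lines, start = [], 0
    while True:
        end = next_line_end(buf, start, kind)
        raw = buf[start:] if end == -1 else buf[start:end]  # the "second buffer"
        text = raw.decode(codec)                            # translate to str
        if text.endswith("\r"):
            text = text[:-1]
        if end == -1:
            if text:  # no empty line after a trailing line break
                lines.append(text)
            return lines
        lines.append(text)
        start = end + unit
```

Scanning whole words matters for UTF-16: in the UTF-16 LE bytes of `"a\u0a41\u0100\nb"` the byte pair 0A 00 also occurs at odd offset 3, where a plain byte search would report a false line end.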
The second set of functions provides InStr and InStrRev with any LCID (locale ID). Because we want vbTextCompare behavior with an arbitrary LCID, we have to write our own routines. (Windows versions newer than XP provide APIs for this, but we can easily build what we want ourselves, as you can see in the source.)
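A sketch of InStr/InStrRev with a pluggable case-folding rule standing in for the LCID (Python; the names `instr`, `instrrev`, `default_fold` and `turkish_fold` are hypothetical, and `turkish_fold` is a toy rule, not a real LCID table). Each character is folded separately so positions stay aligned with the original string, and results are 1-based with 0 meaning "not found", as with VB's InStr.

```python
from typing import Callable, List

Fold = Callable[[str], str]


def default_fold(c: str) -> str:
    return c.lower()


def turkish_fold(c: str) -> str:
    # Toy locale rule (assumption): Turkish maps I -> dotless ı and İ -> i
    return {"I": "ı", "İ": "i"}.get(c, c.lower())


def _fold_all(s: str, fold: Fold) -> List[str]:
    # One folded element per original character keeps indexes aligned,
    # even when a fold produces more than one character
    return [fold(c) for c in s]


def instr(start: int, haystack: str, needle: str, fold: Fold = default_fold) -> int:
    """1-based position of needle at or after start, or 0 if not found."""
    hf, nf = _fold_all(haystack, fold), _fold_all(needle, fold)
    n = len(nf)
    for i in range(start - 1, len(hf) - n + 1):
        if hf[i:i + n] == nf:
            return i + 1
    return 0


def instrrev(haystack: str, needle: str, start: int = -1,
             fold: Fold = default_fold) -> int:
    """1-based position of the last match lying entirely within the first
    `start` characters (-1 means the whole string), or 0 if not found."""
    hf, nf = _fold_all(haystack, fold), _fold_all(needle, fold)
    limit = len(hf) if start == -1 else min(start, len(hf))
    n = len(nf)
    for i in range(limit - n, -1, -1):
        if hf[i:i + n] == nf:
            return i + 1
    return 0
```

With the default fold, "I" matches "i"; with the Turkish rule it matches the dotless "ı" instead, which is why a text compare has to know the locale.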

You need bible13.txt. It is in ANSI format, but you can open it in Notepad and save it again in any Unicode format.
Information on the other word count routines is here.
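The post does not say how the file format is recognized; a common approach is to look at the byte order mark, which older Notepad versions write when saving as Unicode, Unicode big endian or UTF-8. A minimal Python sketch under that assumption (`detect_encoding` is a hypothetical name; a file without a BOM is simply treated as ANSI, and UTF-32 is not considered).

```python
# Byte order marks checked in order; the returned names match the
# encoding kinds used for line splitting (ansi, utf-8, utf-16-le, utf-16-be)
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]


def detect_encoding(head: bytes):
    """Return (kind, bom_length) from the first bytes of a file.
    No BOM is treated as ANSI (an assumption; BOM-less UTF-8 is misread)."""
    for bom, kind in BOMS:
        if head.startswith(bom):
            return kind, len(bom)
    return "ansi", 0
```

The BOM length tells the reader how many bytes to skip before the first line.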

[Screenshot: find.jpg (41.3 KB)]


The text box shows a preview of at most 9 lines (we don't feed the whole text into the text box).
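One way to pick the (at most) 9 preview lines around a hit, assuming the hit is centered where possible and the window is clamped at the start and end of the text; `preview` is a hypothetical Python helper, not taken from the attachment.

```python
def preview(lines, hit, max_lines=9):
    """Return at most max_lines lines around the 0-based index hit,
    centered where possible and clamped to the ends of the text."""
    half = max_lines // 2
    start = max(0, min(hit - half, len(lines) - max_lines))
    return lines[start:start + max_lines]
```

Only this slice is handed to the text box, so the cost of a preview does not grow with the size of the document.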

Setting docpara = 0 is needed if we want to load a new document into the doc object, so change this Sub in the Document object:

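Sub EmptyDoc()
    ' Reset the document so a new text can be loaded into the doc object
    delDocPara = 0                  ' reset the deleted-paragraph counter (meaning assumed from the name)
    docpara = 0                     ' no paragraphs in use; required before loading a new document
    docmax = 20                     ' initial capacity of the paragraph arrays
    ReDim para(1 To docmax)         ' paragraph storage
    ReDim DocParaBack(0 To docmax)  ' previous-paragraph links (assumed from the name)
    ReDim DocParaNext(0 To docmax)  ' next-paragraph links (assumed from the name)
End Sub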
Attached file: Form01.zip (12.5 KB)
