How to detect UTF-8-based encoded strings
Hi
A customer of ours asked us to build a VB6 scraper with multi-language support, for which we needed to detect UTF-8 encoded strings so we could decode them later and display them properly in the application UI. This need comes from a VB6 limitation: its controls don't natively support UTF-8, unlike .NET, where you can tell a control to expect UTF-8 encoding. VB6 natively supports only ISO 8859-1 and/or Windows-1252, so textboxes, dropdowns, listview controls and others can't be set up to expect UTF-8. As a result, we would see weird symbols such as Ã©, Ã¨ among others, which makes a mess of the display.
So, the following function contains the full set of UTF-8 encoded punctuation marks and symbols from languages like Spanish, Italian, German, Portuguese, French and others, based on an excellent UTF-8 list we got from this link - Ref. http://home.telfort.nl/~t876506/utf8tbl.html
Basically, the function checks whether any of the listed UTF-8 encoded sequences, separated by | (pipe), occurs in the string passed in, trying a plain substring search first. If that finds nothing, it falls back to a search based on ASCII values to get a match. Say, a string like "Societ" returns FALSE when calling isUTF8("Societ"), while the same string followed by the UTF-8 encoded form of "à" (as in the Italian "Società", society in English) returns TRUE.
When isUTF8() returns TRUE, you can decode the string with the DecodeUTF8() function so it displays properly. We found that function somewhere else some time ago and have included it in this post as well.
Code:
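Function isUTF8(ByVal ptstr As String) As Boolean
    Dim tUTFencoded As String
    Dim tUTFencodedaux As Variant
    Dim tUTFencodedASCII As String
    Dim ptstrASCII As String
    'declare each counter explicitly; "Dim iaux, iaux2 As Integer" leaves iaux as a Variant
    Dim iaux As Long, iaux2 As Long
    Dim ffound As Boolean

    ffound = False

    'character codes of the input string, separated by pipes
    ptstrASCII = ""
    For iaux = 1 To Len(ptstr)
        ptstrASCII = ptstrASCII & Asc(Mid(ptstr, iaux, 1)) & "|"
    Next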
tUTFencoded = "„|…|‡|‰|‘|–|Œ|á||â|ä|ã|å|ç|é|è|ê|ë|*|ì|î|ï|ñ|ó|ò|ô|ö|õ|ú|ù|û|ü|€|°|¢|£|§|€|¶|Ÿ|®|©|„|´|¨|‰|†|˜|ˆž|±|‰|‰|¥|µ|ˆ‚|ˆ‘|ˆ|€|ˆ|ª|º|Ω|æ|ø|¿|¡|¬|ˆš|’|‰ˆ|ˆ†|«|»|€||€|ƒ|•|’|“|€“|€”|€œ|€|€˜|€™|÷|—Š|ÿ|Ÿ|„|‚|€|€|fi|‚|€|·|€š|€ž|€|‚|š|Á|‹|ˆ|Í|Ž|Ï|Œ|“|”||’|š|›|™|ı|†|œ|¯|˜|™|š|¸|˝|›|‡" & _
"|š|¦|²|³|¹|¼|½|¾|Ð|—|Ý|ž|ð|ý|þ" & _
"‰|ˆž|‰|‰|ˆ‚|ˆ‘|ˆ|€|ˆ|Ω|ˆš|‰ˆ|ˆ†|—Š|„|fi|‚||ı|˜|™|š|˝|›|‡"
    tUTFencodedaux = Split(tUTFencoded, "|")
    If UBound(tUTFencodedaux) > 0 Then
        iaux = 0
        Do While Not ffound And iaux <= UBound(tUTFencodedaux)
            'skip empty entries: InStr() with an empty search string always matches
            If Len(tUTFencodedaux(iaux)) > 0 Then
                'direct substring search
                If InStr(1, ptstr, tUTFencodedaux(iaux), vbTextCompare) > 0 Then
                    ffound = True
                Else
                    'ASCII numeric search: build the numeric sequence of the entry
                    tUTFencodedASCII = ""
                    For iaux2 = 1 To Len(tUTFencodedaux(iaux))
                        tUTFencodedASCII = tUTFencodedASCII & Asc(Mid(tUTFencodedaux(iaux), iaux2, 1)) & "|"
                    Next
                    'compare numeric sequences; the leading pipe keeps "95|" from matching inside "195|"
                    If InStr(1, "|" & ptstrASCII, "|" & tUTFencodedASCII) > 0 Then
                        ffound = True
                    End If
                End If
            End If
            iaux = iaux + 1
        Loop
    End If
    isUTF8 = ffound
End Function
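The lookup list above only works when every entry survives copy/pasting, so here is a rough sketch of a different technique that needs no list: checking the UTF-8 byte structure itself (a lead byte followed by the right number of continuation bytes). This is not the author's method. LooksLikeUTF8 is a hypothetical name, and the sketch assumes each character of the string holds one raw byte on a Windows-1252 system. It is simplified and does not reject overlong or surrogate encodings.

```vb
'Hypothetical helper, not part of the original post.
'Assumes one raw byte per character (ANSI code page Windows-1252).
Function LooksLikeUTF8(ByVal s As String) As Boolean
    Dim i As Long
    Dim k As Long
    Dim c As Integer
    Dim nCont As Long          'continuation bytes expected after the lead byte
    Dim fMultiByte As Boolean  'True once a multi-byte sequence was seen

    i = 1
    Do While i <= Len(s)
        c = Asc(Mid$(s, i, 1))
        If c < &H80 Then
            nCont = 0                       'plain ASCII
        ElseIf c >= &HC2 And c <= &HDF Then
            nCont = 1                       'lead byte of a 2-byte sequence
        ElseIf c >= &HE0 And c <= &HEF Then
            nCont = 2                       'lead byte of a 3-byte sequence
        ElseIf c >= &HF0 And c <= &HF4 Then
            nCont = 3                       'lead byte of a 4-byte sequence
        Else
            Exit Function                   'invalid lead byte: returns False
        End If

        'the sequence must not run past the end of the string
        If i + nCont > Len(s) Then Exit Function

        'every following byte must look like 10xxxxxx
        For k = 1 To nCont
            If (Asc(Mid$(s, i + k, 1)) And &HC0) <> &H80 Then Exit Function
        Next

        If nCont > 0 Then fMultiByte = True
        i = i + nCont + 1
    Loop

    'pure ASCII text needs no decoding, so report it as False
    LooksLikeUTF8 = fMultiByte
End Function
```

Like isUTF8("Societ"), this returns False for plain ASCII input, and it returns False as soon as it meets an accented Windows-1252 character that is not followed by a continuation byte.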
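Function DecodeUTF8(ByVal s As String) As String
    Dim i As Long
    Dim c As Integer
    Dim n As Long

    'sentinel: the inner loop below stops one character short of the end
    s = s & " "
    i = 1
    Do While i <= Len(s)
        c = Asc(Mid(s, i, 1))
        If c And &H80 Then
            'count this byte plus the continuation bytes (10xxxxxx) that follow
            n = 1
            Do While i + n < Len(s)
                If (Asc(Mid(s, i + n, 1)) And &HC0) <> &H80 Then
                    Exit Do
                End If
                n = n + 1
            Loop
            If n = 2 And (c = &HC2 Or c = &HC3) Then
                'two-byte sequence for U+0080..U+00FF: rebuild the single-byte code
                c = Asc(Mid(s, i + 1, 1)) + &H40 * (c And &H1)
            Else
                'anything else does not fit in a single byte: show "¿" instead
                c = 191
            End If
            s = Left(s, i - 1) + Chr(c) + Mid(s, i + n)
        End If
        i = i + 1
    Loop
    'drop the sentinel so the caller gets no trailing space
    DecodeUTF8 = Left(s, Len(s) - 1)
End Function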
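To show how the two functions are meant to work together, here is a minimal usage sketch. DemoDecodeIfNeeded is a hypothetical routine name, and the sketch assumes a Windows-1252 system plus the complete tUTFencoded list from the author's download. The test string is built with Chr() so that no characters get lost in copy/pasting.

```vb
'Hypothetical demo routine; isUTF8 and DecodeUTF8 are the functions from this post.
Sub DemoDecodeIfNeeded()
    Dim sRaw As String
    Dim sShown As String

    '"Societ" plus the two bytes C3 A0, the UTF-8 encoding of "à"
    sRaw = "Societ" & Chr(195) & Chr(160)

    If isUTF8(sRaw) Then
        sShown = DecodeUTF8(sRaw)   'decode only when UTF-8 was detected
    Else
        sShown = sRaw               'leave other strings untouched
    End If

    Debug.Print sShown
End Sub
```

Checking with isUTF8() first matters because DecodeUTF8() replaces any high byte that is not part of a two-byte sequence with "¿", which would damage a string that was already in Windows-1252.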
Hope it helps
Regards
Diego Sendra
e-mail: contact@diegosendra.com
http://www.diegosendra.com
*Please note: you have to download the function from http://www.diegosendra.com/samples/c...VB6_isUTF8.txt, since some of the UTF-8 encoded symbols in the tUTFencoded variable were lost when the code was copied and pasted into this thread.
Last edited by diebythesword76; 06-24-2013 at 08:27 PM.