FileReader ascii and unicode


Thread: FileReader ascii and unicode

  1. #1
    Join Date
    Jan 2005
    Posts
    4

    FileReader ascii and unicode

    Hi Again

    Sorry I am so thick. When I am a Java wizard I will answer questions instead of asking them!!

    This little snippet of example code duplicates text files, but on closer inspection it is pushing integer ASCII values around rather than actual characters (c is an integer variable).

    I want to deal with a stream of actual characters rather than integers. How do I do that, please? Or shall I just search the web for the Java equivalent of the good old QBasic CHR$() function?

    I thought Java was Unicode based, yet I have looked at the values of c in this proglet and they are plain old ASCII, so what is going on, I wonder? As an aside, how do I handle characters as Unicode values rather than 'x' style character literals?

    import java.io.*;

    public class Copy {
        public static void main(String[] args) throws IOException {
            File inputFile = new File("sample.txt");
            File outputFile = new File("outagain.txt");

            FileReader in = new FileReader(inputFile);
            FileWriter out = new FileWriter(outputFile);
            int c;  // holds one character code per read, or -1 at end of file

            // Copy one character at a time until the reader reports end of file.
            while ((c = in.read()) != -1)
                out.write(c);

            in.close();
            out.close();
        }
    }

  2. #2
    Join Date
    Sep 2004
    Posts
    150

    Re: FileReader ascii and unicode

    Originally posted by gogokev


    This little snippet of example code duplicates text files, but on closer inspection it is pushing integer ASCII values around rather than actual characters (c is an integer variable).

    I want to deal with a stream of actual characters rather than integers. How do I do that, please? Or shall I just search the web for the Java equivalent of the good old QBasic CHR$() function?

    I thought Java was Unicode based, yet I have looked at the values of c in this proglet and they are plain old ASCII, so what is going on, I wonder? As an aside, how do I handle characters as Unicode values rather than 'x' style character literals?

    Unicode is basically ASCII on steroids. ASCII defines only 128 characters, which is not enough for much beyond English, so Unicode was designed to support far more languages while staying backward compatible: the first 128 Unicode code points are exactly the 128 ASCII characters. That's why the values look the same; for plain English text they are the same numbers. Java stores each char in 16 bits (2 bytes) instead of one byte, and FileReader decodes the bytes in the file into those chars using the platform's default character encoding.
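
    To tie that back to the CHR$() question, here is a minimal sketch. The class name CharCodes and the helpers chr and asc are made up for illustration; they are not part of the Java library. It shows that a character literal, a Unicode escape and a cast from an int all produce the same char.

```java
class CharCodes {
    // Rough stand-in for QBasic's CHR$(): numeric code to a one-character string.
    // Only valid for codes 0..65535; the cast keeps the low 16 bits.
    static String chr(int code) {
        return String.valueOf((char) code);
    }

    // Rough stand-in for QBasic's ASC(): numeric code of the first character.
    static int asc(String s) {
        return s.charAt(0);
    }

    public static void main(String[] args) {
        char fromLiteral = 'A';
        char fromEscape = '\u0041';   // Unicode escape for the same character
        char fromNumber = (char) 65;  // cast from an int

        System.out.println(fromLiteral == fromEscape && fromEscape == fromNumber); // prints true
        System.out.println(asc("A"));    // prints 65, same as the ASCII code
        System.out.println(chr(0x00E9)); // e with an acute accent, outside ASCII
    }
}
```

    How the last line displays depends on the encoding your console uses.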

    Handling text as "bytes" is considered archaic, and will not be very useful in the future (I imagine).

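    You could open the file as binary (FileInputStream and FileOutputStream instead of FileReader and FileWriter), and then you can read and write bytes too. You would need to cast your chars or ints to bytes. The cast just drops the extra byte for a char, or the extra three bytes for an int, so any character that does not fit in one byte gets mangled.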
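
    A rough sketch of the byte-level approach, under some assumptions: ByteCopy, copy and lowByte are hypothetical names, and in-memory streams stand in for FileInputStream and FileOutputStream so the example runs without any files. It also shows what the cast throws away.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class ByteCopy {
    // Copies raw bytes without decoding them into characters.
    static void copy(InputStream in, OutputStream out) throws IOException {
        int b;
        while ((b = in.read()) != -1) {  // b is 0..255, or -1 at end of stream
            out.write(b);
        }
    }

    // Narrowing cast: keeps only the low 8 bits of the 16-bit char.
    static byte lowByte(char c) {
        return (byte) c;
    }

    public static void main(String[] args) throws IOException {
        // "Hi" followed by the three UTF-8 bytes of the euro sign.
        byte[] original = {72, 105, (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(original), sink);
        System.out.println(Arrays.equals(original, sink.toByteArray())); // prints true

        System.out.println(lowByte('A'));             // prints 65
        System.out.println(lowByte('\u20AC') & 0xFF); // prints 172; the high byte 0x20 is lost
    }
}
```

    For a straight file copy the byte version is actually the safer one, since nothing is decoded or re-encoded along the way.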

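    The file reader's read() returns an int (32 bits) so that it can return -1 to signal end of file, a value no real character can have. Anything else it returns is a 16-bit value in the range 0 to 65535, which you can cast to char to get the actual character.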
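
    Here is a small sketch of reading "actual characters" from a Reader. ReadChars and readAll are illustrative names only, and a StringReader stands in for the FileReader from the original post so it runs without a file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadChars {
    // Reads everything from the reader, converting each int from read() to a char.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {  // -1 means end of stream, not a character
            char ch = (char) c;          // safe here: c is in the range 0..65535
            sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("caf\u00e9")));

        // Casting -1 to char gives the 16-bit value 0xFFFF, so a char alone
        // could not carry an "end of stream" marker.
        System.out.println((int) (char) -1); // prints 65535
    }
}
```

    The same loop works unchanged with a FileReader; only the cast to char is new compared with the original Copy program.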

    edit: Unicode is not limited to 16 bits. Its code space runs from U+0000 to U+10FFFF (a little over 1.1 million possible code points, not 2^32), and the UTF-32 encoding stores each code point in 4 bytes. Java keeps its 16-bit char and represents code points above U+FFFF as a pair of chars (a surrogate pair).
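
    A short sketch of what that means in practice, using only methods from java.lang.Character and String. The class name CodePoints and the helper fromCodePoint are made up for this example.

```java
class CodePoints {
    // Builds a string from a single Unicode code point, which may need two chars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600);  // a code point above U+FFFF

        System.out.println(s.length());                            // prints 2 (two 16-bit chars)
        System.out.println(s.codePointCount(0, s.length()));       // prints 1 (one character)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // prints 1f600
        System.out.println(Character.MAX_CODE_POINT == 0x10FFFF);  // prints true
    }
}
```

    For ordinary text this rarely matters, but it is the reason length() can be larger than the number of characters you see.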
