Importing Excel Data


DevX Home    Today's Headlines   Articles Archive   Tip Bank   Forums   

Results 1 to 11 of 11

Thread: Importing Excel Data

  1. #1
    Join Date
    Feb 2010
    Location
    Central WV, USA
    Posts
    212

    Importing Excel Data

    What the best way to import Excel data into a VB.net collection. I have many items and 12 months of data for each item in colums. Do I have to name the ranges first in excel?

  2. #2
    Join Date
    Feb 2004
    Location
    Longueuil, Québec
    Posts
    577
    Hi rrjii.

    I am back in Quebec city, but not at the same hotel as last week, and they have a good Internet connection, so I might be able to drop in once or twice during the week.

    But my portable is an old thing that is not able to run Visual Studio, so my answers will be from memory. Be aware that if you try what I tell you, you will probably have to correct a few things. But the technique will be there.

    As you suggest, working with ranges makes things easier, specially if the format of the Excel document changes in time (Adding or removing sheets, adding or removing columns). A piece of code such as WorkBook(1).WorkSheets(3).Columns(10) can end up all messed up if the sheet or the column is moved, while WorkBook(1).WorkSheets("Sales").Columns("Tax") will work as long as the "Sales" worksheet and the "Tax" column are not deleted and still contain the proper data.

    And if the rows and columns containing the data are all bunched together, it would also be useful to create a named range with the block of data, excluding the headers and everything that is not purely data, so that you could use code similar to the following one to move through the data.

    Given an Excel file named "TheMonth.xls", in which you have a range named "MonthlySales" that holds all the data:

    Code:
    Dim app As Microsoft.Office.Interop.Excel.Application 'Not sure of the order of the words in the namespace
    Dim book As Microsoft.Office.Interop.Excel.WorkBook
    Dim range As Microsoft.Office.Interop.Excel.Range
    Dim oneObject As YourObject
    Dim yourCollection As New YourCollection
    
    app = New Microsoft.Office.Interop.Excel.Application
    book = app.WorkBooks.Open("TheMonth.xls")
    range = book.Ranges("MonthlySales")
    
    For row As Integer = 1 To range.Rows.Count
        oneObject = New YourObject
        'Maybe the row comes after the column in the following, I do not remember
        oneObject.Property1 = CInt(range.Cells(row,1).Value)
        oneObject.Property2 = CDate(range.Cells(row,2).Value)
        '... and so on, adjusting for the type of each property
        yourCollection.Add(oneObject)
    Next
    
    app.Quit()
    If you decided to go with a named range for each column, the code will be a little more verbose and a little slower, but it will help if the sheet format is changed in the future.

    This would look something like this:

    Code:
    Dim app As Microsoft.Office.Interop.Excel.Application 'Not sure of the order of the words in the namespace
    Dim book As Microsoft.Office.Interop.Excel.WorkBook
    Dim range As Microsoft.Office.Interop.Excel.Range
    Dim oneObject As YourObject
    Dim yourCollection As YourCollection
    
    app = New Microsoft.Office.Interop.Excel.Application
    book = app.WorkBooks.Open("TheMonth.xls")
    range = book.Ranges("MonthlySales")
    
    For row As Integer = 1 To range.Rows.Count
     oneObject = New YourObject
        'Maybe the row comes after the column in the following, I do not remember
        oneObject.Property1 = CInt(range.Cells(row,Range("NameOfColumn1").Column).Value))
        oneObject.Property2 = CDate(range.Cells(row,row,Range("NameOfColumn2").Column).Value))
    Next
    
    app.Quit()
    As stated earlier, take this as a guide. Without the applications on my portable, I cannot check the validity of what I wrote here, but I should be close. This should give you the basic idea.

    Have a good week

    And what are you doing, working on Sunday .
    Jacques Bourgeois
    JBFI
    http://www3.sympatico.ca/jbfi/homeus.htm

  3. #3
    Join Date
    Feb 2010
    Location
    Central WV, USA
    Posts
    212
    Thanks, I'm going to work with it today and see if I can get some results.

    Sundays? It seems my work is never done. And if it is done, I can dream up more rather quickly.


    Later...

    Do I need to add any references or add any Import statements at the top, because the namespaces (I think that what they're called) don't seem to come together. You start with Microsoft then you have (C#, SQL, VB, win32) then it just forks out into infinity.


    I also found this thread, it's looks more complicated, and it has the same namespace problems when you paste it in your Editor.

    http://forums.devx.com/showthread.php?t=155202


    I'll continue looking, and if I get it figured out, I'll post the answer.


    ..............

    Ok, I've had a little luck

    1st GOTO your "Add Referance" and add the MS.office.excel.iterop
    2nd add "Imports Microsoft.Office.Interop" at the top of your form

    ...Proceeding from here...
    Last edited by rrjii2000; 03-22-2010 at 03:25 PM.

  4. #4
    Join Date
    Feb 2010
    Location
    Central WV, USA
    Posts
    212
    Can you elaborate, when you get time, on the object (oneObject) that you're using to represent the data that will be collected? In my case I have a wellname and 12 columns of data. So I'm going to select (by row) the 13 colums then name the range.

    I'm assuming the object your refering to above will hold this row of data, and allow it to be broken down into individual cells. But I'm not sure how to go about defining the object, will I need to make a class that somehow defines the worksheet format? And should all the cells be read as strings then converted, or does it matter?




    ...Later on , Ok I think I see what your doing

    Your accessing it by property, whereas my defined ranges will contain all 13 properties.

    I would like to be able to grap the whole row with my predefined class (a property for each column) and then immediately be able to reference it by accessing my class properties.
    This would be done one row at a time, and immediately added to my collection. I have the class setup now, but I'm still unable to make it work as i've described.
    Last edited by rrjii2000; 03-22-2010 at 04:05 PM.

  5. #5
    Join Date
    Feb 2010
    Location
    Central WV, USA
    Posts
    212
    While not exactely as I invisioned it, it does work. I decided to import by row number, and NOT by naming my ranges. I've got 8 years of data that would need to be named, and after some thought that was really not an option. So I'm just grabbing raw data and sorting it out after the fact.

    Code:
            Dim oXL As Microsoft.Office.Interop.Excel.Application = New Microsoft.Office.Interop.Excel.Application
            Dim oWB As Microsoft.Office.Interop.Excel.Workbook '= app.Workbooks.Open(Trim(txtFileName.Text))
            Dim osheet As Microsoft.Office.Interop.Excel.Worksheet
            Dim oRange As Range
    
    
            oXL = New ApplicationClass()
            oWB = oXL.Workbooks.Open(txtFileName.Text)
    
    
            osheet = CType(oWB.Sheets(1), Microsoft.Office.Interop.Excel.Worksheet)
            Dim ds As New DataSet()
    
            Dim sb As New StringBuilder()
            Dim jValue As Integer = osheet.UsedRange.Cells.Columns.Count
            Dim iValue As Integer = osheet.UsedRange.Cells.Rows.Count
    
            For i As Integer = 1 To iValue ' Row count
                Dim NewXL As New ExcelPWS
    
                lblStatus.Text = "Importing Row # " & i
                Me.Refresh()
    
                oRange = CType(osheet.Cells(i, 1), Microsoft.Office.Interop.Excel.Range)
                NewXL.API = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 2), Microsoft.Office.Interop.Excel.Range)
                NewXL.WellNum = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 3), Microsoft.Office.Interop.Excel.Range)
                NewXL.WellName = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 4), Microsoft.Office.Interop.Excel.Range)
                NewXL.RigWork = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 5), Microsoft.Office.Interop.Excel.Range)
                NewXL.HECPumper = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 6), Microsoft.Office.Interop.Excel.Range)
                NewXL.ReadingType = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 7), Microsoft.Office.Interop.Excel.Range)
                NewXL.PrevYRAve = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 8), Microsoft.Office.Interop.Excel.Range)
                NewXL.Jan = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 9), Microsoft.Office.Interop.Excel.Range)
                NewXL.Feb = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 10), Microsoft.Office.Interop.Excel.Range)
                NewXL.Mar = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 11), Microsoft.Office.Interop.Excel.Range)
                NewXL.Apr = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 12), Microsoft.Office.Interop.Excel.Range)
                NewXL.May = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 13), Microsoft.Office.Interop.Excel.Range)
                NewXL.Jun = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 14), Microsoft.Office.Interop.Excel.Range)
                NewXL.Jly = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 15), Microsoft.Office.Interop.Excel.Range)
                NewXL.Aug = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 16), Microsoft.Office.Interop.Excel.Range)
                NewXL.Sep = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 17), Microsoft.Office.Interop.Excel.Range)
                NewXL.Oct = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 18), Microsoft.Office.Interop.Excel.Range)
                NewXL.Nov = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 19), Microsoft.Office.Interop.Excel.Range)
                NewXL.Dec = oRange.Text.ToString
    
                oRange = CType(osheet.Cells(i, 20), Microsoft.Office.Interop.Excel.Range)
                NewXL.Total = oRange.Text.ToString
    
                lstImport.Items.Add(NewXL.WellName)
                XLCollection.Add(NewXL)
            Next i
    
            Dispose()
    End sub

  6. #6
    Join Date
    Feb 2010
    Location
    Central WV, USA
    Posts
    212
    I think I've screwed around and got one of my Excel file locked into ReadOnly by accessing it with VB. Whats the best way to fix this? And is there a sure fire way to end my sub that ensures the file is closed?

  7. #7
    Join Date
    Feb 2004
    Location
    Longueuil, Québec
    Posts
    577
    Hi.

    A little break.

    A lot of stuff since the last time, but by what I see, you were able to figure it out.

    -----

    You found the Interop. Good. I though you had already began programming on the thing, and knew about that, so I did not care to talk about that il my last post.

    Excel, as any Office program, is not a .NET application. It is an ActiveX/COM application, what we used to do before .NET (in Visual Basic 6 for instance).

    When they designed .NET, Microsoft worked hard to make it more secure and to make better use of the memory than the old stuff did (they have more experience on those subject after so many years in the field).

    Because .NET and ActiveX/COM are not designed the same way, a .NET program cannot access directly an ActiveX/COM program. It needs a kind of translator that can be used as a bridge between the 2 worlds. That is the role of the Interop. You need one any time you need to communicate between a .NET application and an old (ActiveX/COM) application.

    -----

    I have not seen your sheet, but if it built like most sheets, and it looks as it is by your code, the row is the equivalent of an object (the oneObject in my example, it would be a Well in your case if you have a class that defines a well. Is that what ExcelPWS is? That name if meaningless to me). The column is the equivalent of a property.

    You are right that giving a ranged name to each row would be too much, specially since the number of rows will probably change with time.

    If you look carefully at my last code, you will also see that I do reference the rows by number instead of by the row. It's the columns that I treat as a Names range. So if a column is ever inserted in the middle, calling it by its name will still work. Calling it by its number would not work because column 9 could become column 10. You would have to make changes in the code, and that could lead to a lot of work or to new bugs.

    -----

    For the readonly problem, I am not sure, but I think I see why.

    You do not close the Excel and/or the Excel application. You should do so with the following code:

    Code:
    oXL.Quit()
    Calling Dispose as you do is not a good thing, and it does not closes Excel and/or the workbook. If you call the Task Manager (Ctrl-Alt-Del), and look in the processes tab, you probably have one or many copies of Excel running. Those are the Excel applications you started with New Microsoft.Office.Interop.Excel.Application. You did not close them, so they are still opened. That means that the file is still opened in one of those copy of Excel, and by default, an Excel workbook is locked in read-only mode when it is already opened by someone or by an other copy of Excel.

    You must close Excel with a Quit command, as I do in the above line.

    Calling Dispose is useless in most instances. In some cases, it can also cause unexpected problems. Without making it too complex, because is it complex, Dispose is usually called by the class itself when it needs to free some of the memory it used.

    Although I cannot know exactly by the code you are sending, I suspect that the Sub is in a form, and that you are trying to close the form and/or free the memory. Although it works and might not cause any problem, you should never call Dispose on a form.

    Doing so, you are cleaning things too fast. Disposes frees the memory used by most of the objects defined in the form. If you program the FormValidating, FormValidated, FormClosing or FormClosed events, which is often the case in a Form in order to make sure that the data on the form is saved before the form is closed, or for cleanup purposes, or for any other reason, those events won't be called by requesting a Dispose.

    The right way to get out of a form is to call it's Close method.

    Code:
    Me.Close()
    The Me is not necessary, but it references the class (a form is a class) in which the code is running, and it makes it clearer that it is the form that you are clearing.

    Calling Me.Close will call all the events that might have been programmed to be called when discarding a form... and will also call Dispose when this has been done.

    -----

    Finally (have to go soon), a little trick.

    Weren't you tired of typing Microsoft.Office.Interop. on almost every line? Don't you find that the code is hard to read because it is cluttered by those.

    You can prevent that 2 ways.

    If you use the interop in only one file in your project, go to the top of the file, before the line with Public Module or Public Class and add the following:

    Code:
    Imports Microsoft.Office.Interop
    You have just told the compiler to use that automatically in the file, so that typing both of those will be the same:

    Code:
    Dim oXL As Microsoft.Office.Interop.Excel.Application = New Microsoft.Office.Interop.Excel.Application
    Dim oXL As Excel.Application = New Excel.Application
    Which one would you rather use? Since it works everythere in the file, imagine how uncluttered your code would be compared to what you have now.

    And if you use Excel in many Forms / Classes / Modules in your application, instead of having to write the Imports statement in all the files, you can apply it to the whole application, by going into the page were you set the References. If you scroll down the page, you will see at the bottom (if you are in Visual 2005 or more), a list of all the namespaces in available in your project. Simply check Microsoft.Office.Interop, and you will not have to type it anymore anywhere in the project where you use an Excel object.

    A last one, really have to go, all the lines where you assign to oRange, such as

    Code:
    oRange = CType(osheet.Cells(i, 2), Microsoft.Office.Interop.Excel.Range)
    If you define your oRange object as an Excel Range, you won't have to Convert (CType) as you do everywhere. Not only will it make the code easier to read, but it will also be faster, since CType requires time and memory.

    Supposing you have applied the Imports as explained earlier, your code would thus looks like that:

    Code:
    Dim oRange As Excel.Range
    oRange = osheet.Cells(i, 2)
    ...
    If you look in the documentation, Cells return a Range. You do not have to convert it into a Range, since it is already one.

    -----

    No time to revise, I hope I did not make any mistake, all of this was written from memory without the possibility to revise or check since I do not have Visual Studio on my portable.

    I will also probably not be able to answer tomorrow, I hit the road back to Montreal right after my training session and will arrrive late at home.
    Jacques Bourgeois
    JBFI
    http://www3.sympatico.ca/jbfi/homeus.htm

  8. #8
    Join Date
    Feb 2010
    Location
    Central WV, USA
    Posts
    212
    As Always, thanks for your thoughts. I never gave any thought that the "dispose()" at the end was actually a class destructor, I thought it was being used to unload excel. But that problem has now been fixed, as you have enlightened me

    BTW ExcelPWS is Excel Production Work Sheet, the class doesn't define a well, it defines what a Production Work Sheet is, which ended up having 20 fields.

    The namespace shortcut you spoke of is very helpfull. In the past I thought you had to place namespaces in the top of your class to be able to use them period. But I now understand they're used as a shortcut and don't add any functionality other than shorter code.

    I'll also try and implement the range object you described and get rid of that "Ctype". Because the execution was very slow. How slow? It would take about 5 minutes to read 1500 rows.

  9. #9
    Join Date
    Feb 2004
    Location
    Longueuil, Québec
    Posts
    577
    5 minutes to read 1500 rows is a little slow, but not that much.

    Communications between .NET and Excel is a process that involves, back and forth, 4 components:

    Application -> Interop -> OLE -> Excel -> OLE -> Interop -> Application

    Over that, some data types (such as dates) do not have the same format in .NET and Excel, and a translation has to be made, and this, before your own CType.

    Over that, you have security checks because, which always happens when one application is talking to another one, specially when you go from .NET (very secure) to ActiveX (not as secure - Excel is an ActiveX/COM application)

    When you type osheet.Cells(i, 14), you are in fact asking for osheet.Cells(i, 14).Value. For each cell you call there is one of these round trips to get the Cells(i,n), and a second one to retrieve

    You made a mistake that I would say is made by 90% of data users: you used Excel to accumulate and store data.

    Excel has not been designed as a data repository. Excel is an analytical tool. It has been optimized to perform lot of complex mathematical operations, on the data it contains. Has a data repository, its stinks.

    Accumulating data should be done in a database (Access, SQL Server). A database is not very good at crunching numbers, but is optimized for data retrieval speed and data validation. When you need to analyze the data, you export it to Excel. Those 2 pieces of software have been designed to work together.

    I have been called once at a big public company because they got stuck with the limit of 65000 lines in an Excel sheet. We transferred the data in a database and they found out that 1/3 of the data had been entered twice. Excel is unable to detect that, a database is built to do that. Their calculations where a couple of millions off because they used the wrong software (Excel) for data input.

    Even worse, a lot of the data was not typed correctly. They had things such as Dec. 32 in date columns, and 25,ooo (the letter o instead of zéro), so that data was skipped by Excel in calculations. In a properly constructed database, such typing errors are spotted as soon as the user types them.

    Most people will use Excel because you start it and type the first number right on. Just as a program, a database needs to be designed and prepared before it can receive data, wich can take a few hours for simple ones, many days and even weeks for bigger ones.

    Retrieving your 15000 rows from a database would be a matter of seconds, since you would retrieve all the data in one round trip to the database, instead of 1500 rows * 20 fields * 2 trips (Cells + Value) = 60000 round trips through many layers and conversions of dlls as you have now.

    ----

    Ways to help?

    With the code you have, I don't see any, although, I have to admit that I am not an expert on Excel, even if I have worked a lot with it. In my world, what you are doing is done in a Database, not Excel. There might be some methods in Excel that I do not know that could make you retrieve data faster, maybe one row at a time instead of one cell at a time. You would then minimize the number of round trips by 20, which would surely be a big improvement.

    Here are a few tips and leads.

    When I have to work with Excel from .NET, I try to do as much as the work as I can inside Excel, in VBA macros, and call those macros from .NET. That makes me save a lot because there are less trips back and forth. Internally, VBA macros work a lot faster because the work is done inside the Excell memory space, without having to convert all the time.

    But with what you are doing, it could not be done, Excel is unable to see your .NET class and collection. So the code has to be done in .NET.

    And now that I have more information, the idea of working with a named range for the columns is not as good. Retrieving data through a name makes the application easier to understand, debug and maintain, but involves a penalty on performance, because in the background, the system still works with numbers. When you ask for Columns("BillNumber") or Cells(1,"BillsNumber"), the system has to check which column is "BillNumber" to know if it is columns 17 or 18. That is good for you, because if the format of the sheet has change it can still retrive the right data, but that is bad on performance because it does the checkup each time.

    The only way to get rid of the CType would be to define the properties of your class as Object, because Excel returns the value of a cell as an Object. You might save some time retrieving the data, but you would lose further down the drain. Because an Object can contain anything, it is very slow to use. And you would have to use CType anyway further down the drain because you cannot do any operation on an Object (such as adding 2 Object variables) unless you convert it to something usable in the operation.

    I have no idea of the type or amount of manipulations you do with the data once it is loaded into your collection. But maybe, loading an Excel sheet into a .NET collection is not a good idea. Although working with classes and collections is usually the best way to go, there are always exceptions. Mathematicians and people who work in geomatics (maps) will usually prefer a matrix (an array) over a collection, because they have a very big amount of numbers on wich to perform a big amount of operations, and 2 dimensional arrays are faster than a collection of objects for those type of operations.

    In your case, you used the wrong software to record the data: Excel instead of Access. As I told you before, it is done all the time everywhere, you are not alone. As far as I amconcerned, it is probably me most common mistake in the world of of computing. Would you use Excel for Word processing. Or a Word table to replace a big Excel sheet? As far as data is concerned, entering data in an Excel sheel instead of a database is as big an abherration.

    Because the data is not in a format made for data, techniques that would normally be good, such as loading the data into classes and collections, might suddenly cause problems, such as your long 5 minutes transfer rate.

    What you did with the text files is the right thing to do. Text files have no functionnality built into them, so you have to load them somehow into an application before using them. Their format is thus designed so that they can be loaded fast into any application.

    However, and Excel sheet is made to be loaded in Excel, so it is designed to load fast in that application. Any transfer somewhere else involves having to load first into Excel, and then connect / validate / convert in order to used them.

    In your case, unless you are ready to learn to work with a database and export to Excel data to the database, which might be a lot of work because you are inexperienced with databases, it might be better to use VBA macros in the Excel document itself, instead of bringing everything in .NET. VBA uses an older version of Visual Basic, and there are marked differences between Visual Basic for Applications and Visual Basic .NET, but there are also a lot of similitudes.

    Since the VBA macro works from inside Excel, has data types that are the same as those in Excel, the impact on performance would be very great.

    Good luck
    Jacques Bourgeois
    JBFI
    http://www3.sympatico.ca/jbfi/homeus.htm

  10. #10
    Join Date
    Feb 2010
    Location
    Central WV, USA
    Posts
    212
    From the outside it might appear that i'm using excel for a data container, but there's lots of calculations being done on the production data. Now after all those are done, it might be a good idea to somehow archive the data out into MS Access (I used to use foxpro, that one didn't last long). But moving forward the Data will be managed by the application i'm creating and excel won't be used other than doing some preliminary calculations before the data is imported.

    I'd like to learn how to better to VBA inside excel, there are many other Excel sheets we use that could use some streamlining. But I've got to take things one at a time. My whole programming stratagy has changed over the past few weeks. And there's still much to learn on the .net front. After I get this program done, there's an accounts recievables program that's next on the list, and I want it to be more OOP structured throughout. My current app is still being used as a learning platform, and I'm sure some areas are not what they should be. But one day I'll be able to go back and fix them the right way.

  11. #11
    Join Date
    Feb 2004
    Location
    Longueuil, Québec
    Posts
    577
    Moving data between Access and Excel is a very simple task, specially if you do it from Access.

    Import Excel into Access with File...Get External Data.

    Export to Excel from Access with File...Export, or by putting the cursor on a table or query and click on the publishing button on the toolbar. That button shows a Word icon by default, but if you "open" it, you will see that there is an option to Analyze with Excel.
    Jacques Bourgeois
    JBFI
    http://www3.sympatico.ca/jbfi/homeus.htm

Similar Threads

  1. Summarize Data in Excel
    By Ron Weller in forum Database
    Replies: 2
    Last Post: 02-26-2008, 08:45 PM
  2. Importing data from an Excel file
    By admol in forum .NET
    Replies: 1
    Last Post: 02-15-2007, 07:57 PM
  3. Importing SQL Table to the Data Grid
    By software_develo in forum .NET
    Replies: 5
    Last Post: 11-18-2005, 05:11 PM
  4. Replies: 1
    Last Post: 07-25-2001, 07:17 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
HTML5 Development Center
 
 
FAQ
Latest Articles
Java
.NET
XML
Database
Enterprise
Questions? Contact us.
C++
Web Development
Wireless
Latest Tips
Open Source


   Development Centers

   -- Android Development Center
   -- Cloud Development Project Center
   -- HTML5 Development Center
   -- Windows Mobile Development Center