The Gnumeric Spreadsheet Internal Design September 2, 1998 Miguel de Icaza Instituto de Ciencias Nucleares, Universidad Nacional Autónoma de México
miguel@gnu.org
September 2, 1998
Gnumeric Internal Design Design Goals The Gnumeric Spreadsheet is a spreadsheet that is intended to grow in the future to provide all of the features available in commercial-grade spreadsheets. I am not only intending to provide Gnumeric with a very good computation engine, but I am also interested in making the GUI experience for the user as pleasant as possible, and that includes taking a lot of ideas from existing spreadsheets. Target Gnumeric should be compatible as much as possible with Microsoft Excel in terms of the user. This means formulas and expressions should be compatible unless we find a serious design problem in Excel. An example of a design problem in Excel is the fact that various internal functions have limits on the number of arguments they can take: this is just bad coding and this sort of limitations is unacceptable in Gnumeric. When writing code for Gnumeric, no hard coded limits should be set (currently Gnumeric breaks this rule by having hardcoded in a few places the maximum number of columns to be 256). Basic organization The Gnumeric spreadsheet is basically organized as Workbooks (see src/sheet.h for the definition of the workbook object). There might be various workbooks loaded at the same time. Every one of these workbooks might have a variable number of Sheets (see src/sheet.h for the definition of the Sheet object), and finally each sheet contains cells (The definition of a Gnumeric Cell is in src/cell.h). Workbooks only take care of keeping various sheets together and giving a name to them. item Sheets are the repository of information: cells are kept here, information on the columns and rows is kept here, and the styles attached to the regions is also kept here. Sheets might have multiple views, this is required to support split views and in the future to support the GNOME document model. The actual front-end to the Sheet object is the SheetView object: SheetView object each one has a number of components: Their scrollbars. Their cell display engine (more in a second). Their bar display (column and row display). The cell display engine is basically a modified GnomeCanvas that can deal with keystrokes and can do some extra spreadsheet oriented tasks. This cell display engine is of type GnumericSheet. GnumericSheet objects usually contain a number of Gnome Canvas items specially designed to be used for a spreadsheet: the Grid Item and the Cursor Item: The Grid item takes care of rendering the actual contents of the Sheet object and displaying the Cells in the way the user requested, this is the actual "core" display engine. The Cursor item is the item that actually draws the spreadsheet cursor. This item, as every other Gnome Canvas item can take events and this one is specially interesting, as it provides the basic facilities for dragging a region of cells. During the course of a user session, Gnumeric will create Items of type Editor, a special item designed to display the contents of a cell as it is being typed and it is syncronized with a GtkEntry (to provide an Excel-like way of typing text into the cells). Sheets contain information for columns and rows in doubly linked lists to facilitate their traversal (in the future, when bottlenecks are identified, we will provide an alternate quick access method based on hash tables, but the information will still be linked forward and backwards). The column and row information is stored in GList's that contain ColRowInfo structures. These structures include a number of interesting bits: Their assigned position (or -1 if they are the "default" style), field name "pos". The actual width used in pixels (for the current magnification setting) as well as their logical size, plus the margins required in pixels for displaying various cell adornements. When a cell is allocated, both ColRowInfos (for column and row) are allocated and properly linked, the cell is made to point to these new structures. The column ColRowInfos have a field called "data", this is a linked list of Cells for every row where a cell exists. Cells are stored in a hash table inside the Sheet data structure for quick retrieval and they are also linked properly in their respective columns. A cell might display information in more than one column, so Gnumeric needs to keep track of which cells (column, row pairs) are being managed by which cell. This information is kept in a per-row fashion in the data pointer in the ColRowInfo structure for the rows. The registration and unregistration of the view areas code is on the cellspan.c file. Formula storage When a formula is encountered, Gnumeric parses the formula into a tree structure. The tree structure is later used to evaluate the expression at a given location (each cordinate in a cell references is stored as either an absolute reference or a relative reference to the cell position). To speed up formula duplication, Gnumeric reference counts the parsed expression, this allow for quick duplication of expressions. It should be noted that some file formats (the Excel file format) uses formula references, which can preserve the memory saving features of reference counting. Currently Gnumeric does not provide any way to keep these references, a possible scheme for saving correctly these cells would be to keep in a special list any formula being saved that has a reference count bigger than one and keep track of these duplicates and generate formula references in the output file rather than outputing the actual formula. Resource management Data structures in Gnumeric are lightweight, they are designed to consume little memory by reusing as much information as possible. This is achieved by making common information be hashed and reference-counted. This is done with strings, parser/interprester symbols and styles. To read more about this, check: for strings: src/str.c and src/str.h for styles: src/style.c and src/style.h for symbols: src/symbol.c and src/symbol.h Plugins