Variables & Data Structures in Programming

The purpose of any IT system is to process data, whatever form that data takes (numbers, text, images, sounds, etc.). One of the most fundamental features of any programming language, therefore, is the ability to store the data in a structured format.

Variables

The simplest form of storage is the variable: an area of memory that stores one item of data, such as a number or a character. Variables serve two purposes. First, the programmer can choose their names, which makes programs easier to write and read. Second, you can write programs or functions that will work with any values, rather than with fixed ones. If you're already familiar with spreadsheets, you can think of variables as being like cells, which you can use in formulae regardless of the values they contain. All procedural programming languages, such as C, BASIC and Pascal, have variables, although they may support different types and let you manipulate them in different ways.

Some languages are strongly typed (see Type below), whereas in others variables have no declared type at all. Some require that you declare a variable before you use it, and others let you go straight in and define a variable's value without declaring it first.

Declaring a variable gives the variable a name, and, in most programming languages, gives it a type - in effect it creates the container that stores your value. See the Type section for examples of declarations.

When you define a variable, you are simply giving it a value.

Type

Most procedural programming languages support some sort of typing - that is, each variable can only store one type of value. Anyone who has created a database will be familiar with this idea: each field in an Access database is also given a type, be it number, text, memo, date, etc. The types supported vary from language to language, but will include some or all of the following (and maybe more!):

Integer (int), short integer, long integer: Integers are whole numbers, and integer variables are used when you know there is never going to be anything after the decimal point; e.g. if you're writing a lottery ball generator, all the balls have whole numbers on them. The difference between short integers, integers and long integers is the number of bytes (see the number bases section for details) used to store them. This varies according to the operating system, hardware and compiler you're using, but these days you can assume that an integer will be at least 16 bits, and a long integer at least 32. In a 32-bit environment, it is more efficient to work with a whole 32-bit word, so most compilers for such systems make a plain integer 32 bits long (the same size as a long integer) unless you specify a short one.

Float, single, double: Floating point numbers are ones that can contain fractional parts, i.e. they are not restricted to whole numbers. Single and double are analogous to the short and long qualifiers used with integers: they indicate how many bits are used to store the variable (typically 32 for single precision and 64 for double). Floating point arithmetic can lead to problems with rounding and precision, because many decimal fractions (such as 0.1) can't be stored exactly in binary. If you're dealing with a fixed number of decimal places, therefore, it is usually more reliable to use integers and multiply all your values by a power of 10. For example, if you're dealing with money, it's better to work in pence and use integers than to work in pounds and use floating point variables.

Char: A char variable is a common sight in C and C++ programs (C has no built-in string type), and is used to store a single text character. The value it actually stores is an integer representing the code (e.g. ASCII) for the character.

Boolean: A boolean variable can store one of two values, either TRUE or FALSE. Like char, this is usually stored as an integer; in Visual BASIC, for example, FALSE is 0 and TRUE is -1, and the TRUE and FALSE values themselves are constants (see below).

Fixed-length string: Strings are variables that contain text, and they come in two sorts. With a fixed-length string, you declare how many characters the string is going to hold. Certain API calls in Windows require the use of fixed-length strings, but otherwise they are rarely used in BASIC. In C, strings are implemented as arrays of chars, ending with a null character ('\0').

Variable-length string: A variable-length string is one where you don't define the length. This is the default type in BASIC, and is useful for taking user input where you don't know what the response will be. The maximum length of the string will depend on your environment, but it should be at least 255 characters.

These types may go by different names in different languages: for example, an integer in one might be a short in another, and a single in one might be a float elsewhere.

VisualBASIC

dim name as string
dim age as integer
dim height as single
dim sex as string * 1
dim married as boolean

C or C++

char*	name;
int	age;
float	height;
char	sex;
int	married;

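Note that there is no string type in C, but you can use a char pointer (char* - I'm not going to go into the complexities of pointers and references here!); C++ adds a string type (std::string) in its standard library. Older versions of C also have no Boolean type, which is why an int is used for married above; C++ has bool, and C has had one (_Bool, usually written bool via <stdbool.h>) since the C99 standard. I've used a fixed-length string in the BASIC example to represent a single character (either M or F).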

BASIC also allows you to declare the type of a variable by using a suffix on its name, e.g. $ for strings, % for integers, etc. In the above example, I could have used name$ and age%, and the variables would have been set to a string and an integer automatically.

See how variables and types are used in Python on the Advanced ICT YouTube channel.

Some languages, such as JavaScript, don't give variables a fixed type (a variable can hold a number at one moment and a string the next), but still have a declaration statement, e.g.:

var name, age, height, sex, married;

This might sound easier, but can lead to confusing results when it isn't clear whether your variables hold numbers or strings. For example, say you have two variables, a = 123 and b = 456: what would you expect c = a + b to give you? The + operator is also used to concatenate strings, so the value of c depends on what a and b currently hold. If both are numbers, c will be 579, but if either holds a string (for example, a value read from a text box on a form), c will be "123456"! To be sure, you'd have to use something like c = (a * 1) + (b * 1) (for addition) or c = a + "" + b (for string concatenation).

Scope

A variable can't always be used throughout the whole of your program: each variable has a scope, which determines where its value can be read or changed.

Global variables are variables that can be used throughout your program - that is, the scope of a global variable is the entire application. Most variables, however, will be local: local variables can only be used in the function (or procedure) in which they were declared. If another function needs the value, you normally pass it in as a parameter. Scope is therefore hierarchical: code inside a function can see the global variables as well as its own local ones, but the main body of the program can't see a function's local variables. In languages that allow functions to be nested inside one another, such as Pascal, an inner function can also see the variables of the function that encloses it.

Because of this, you should avoid giving a local variable the same name as a global one: in most languages, including C, C++ and JavaScript, that creates a separate local variable which hides (or "shadows") the global one inside that function, which can be confusing. However, if you declare a variable in one function, you can safely declare another variable with the same name in another function; they will effectively be different variables and can hold different values.

Finally, by default a variable declared in a function only exists while the function is running: each time you call the function, the variable is created afresh and reset. If this isn't what you want, you can declare your variable as static. This means that its value persists after the function finishes, and it will still hold that value the next time the function is called; it is still local, though, so it can only be used inside that function. Static variables still have a type (e.g. integer, float, etc.).

Arrays

Sometimes you might want variables to represent collections of similar items or objects, for example the six balls picked by a lottery simulator. You could create six separate integer variables, called something like first_ball, second_ball, third_ball, etc., but it would be better to use an array.

An array is a collection of variables of the same type, with the same name, differentiated by a numbered index. In the lottery example, we could use an array of integers called ball, and refer to the individual balls as ball(0), ball(1), ball(2), etc. Note that the index usually starts at zero rather than one. What's useful about arrays is that you can use the same code to process all of the items in the array using a loop.

Arrays can also be used to implement a simple look-up table. For example, if you wanted to generate a day of the week at random, you could use a string array to contain the days of the week. You could then generate a random integer from 0 to 6, and use this as the index to look up the day of the week. To do this in Visual BASIC or JavaScript, the code would be:

VisualBASIC

dim day(6) as string
dim day_number as integer

day(0) = "Sunday"
day(1) = "Monday"
day(2) = "Tuesday"
day(3) = "Wednesday"
day(4) = "Thursday"
day(5) = "Friday"
day(6) = "Saturday"

randomize ' seed the random number generator so each run differs
day_number = int(rnd() * 7)

msgbox(day(day_number))

JavaScript

var day = new Array(7);
var day_number;

day[0] = "Sunday";
day[1] = "Monday";
day[2] = "Tuesday";
day[3] = "Wednesday";
day[4] = "Thursday";
day[5] = "Friday";
day[6] = "Saturday";

day_number = Math.floor(Math.random() * 7); // a whole number from 0 to 6

alert(day[day_number]);

Early versions of Visual BASIC (and possibly other languages) also allowed you to have arrays of controls (e.g. fields or buttons). This allowed you to write one function or procedure that could then apply to any of the objects in the array.

I have written a TES Subject Genius article about how arrays can be used for selection.

NB. Arrays are sometimes called vectors; C++, for example, has std::vector in its standard library, which works like an array that can grow or shrink as the program runs.

More Complicated Structures

Some types of data may require more complicated structures to store them. Languages such as C++ allow you to create compound data structures (e.g. classes, unions, structs, etc.) by combining variables. For example, you might want to create a time type, which uses integers to separately store the numbers of hours and minutes. You can then create functions for manipulating your type, e.g. setting, comparing and adding times, because the standard arithmetic operators don't know how to work with it. Some languages, including C++, also let you redefine operators such as + and < so that they work with your own types; this is known as operator overloading.

Examples of other types of data structure include trees, queues, lists and stacks.

Data Abstraction

Your code should be written in such a way that the details of how the data are stored are hidden from the rest of the code - this is known as data abstraction. So, for example, if you have a program that uses a stack, you shouldn't read from, or write to, the stack directly, but you should write and use functions for the purpose, e.g. one called push() to put a value on the stack, and one called pop() to remove the top value. Not only does this allow you to centralise data-storage functions and reduce repetition of code, but it also allows you to change how a data structure is implemented without having to re-write all your code. If you wanted to change your stack implementation from using an array to using a linked list, for example, you would only need to change your push() and pop() functions, not all of the functions that used the stack.

Constants

Finally, most programming languages also support the use of constants. These are used in the same way as variables, but their values can't be changed once the program is running. They are useful because you can refer to mathematical values by name, rather than having to remember them, or you can personalise many parts of your program by setting a constant to the name of the user at the start of the program. Programming languages may also include many predefined constants of their own. In Visual BASIC, for example, these include True and False, and also the arguments used for certain functions, such as msgbox() types (e.g. vbYesNo), which are actually just integers.

With compiled languages, the compiler often works by replacing the names of the constants with their actual value (like Find and Replace in a text editor or word processor) before compiling the code, thereby reducing the amount of memory required for storage and making the code more efficient.