What Is Metadata, Anyway?

What is metadata?

Metadata is just information about something.

Me: Hey, is that an apple?

You: Yeah, it’s red and round and crunchy.

Congratulations, you’ve created metadata!

In a place like a library or an archive, we create metadata about the things we own. This can get confusing for some people because a lot of what we own are just WORDS, wrapped up in a book jacket.

You: Wait, you are going to create words to describe these other words?

Me: …Yes.

You: Like… you are going to create information about other pieces of information?

Me: …Yes. Sorry.

Librarians avoid confusing themselves by thinking of the materials in a library as OBJECTS, much like an apple, except that the object is a book (or a CD, or an e-book, or a magazine). So, if *you* think of a book as an object, there is actually a lot you can say about it! You can describe its name (the title), who made it (the author), who made it into a physical thing and where that happened (the publisher and the city of publication), how big it is (page count and dimensions), and some of its features (illustrations, subject). All of this is metadata.

You (smugly): But what if the “object” isn’t an object? What if it’s a song in my phone?

Me: You can’t trip me up that easily.

Even electronic objects are still objects. You can think of them as virtual balls, round and whole and bouncing around the digital world as people upload and download and share. So the metadata for a digital object is pretty much the same as for a “physical” object, but the terms are a little different. You can describe its name (title), who made it (author/songwriter), who made it into something for others to use (recording company/hosting website), how big it is (file size), and some of its features (file format, bit rate, playing speed).
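If it helps to see that written down, here is a minimal sketch in Python of one metadata "shape" used for both a physical and a digital object. The field names, the `Metadata` class, the `describe` helper, and both sample records are made up for illustration; they are not a real cataloging standard or anything the library actually uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metadata:
    """A few descriptive fields for one object (illustrative, not a real standard)."""
    title: str          # its name
    creator: str        # who made it
    publisher: str      # who made it into something others can use
    size: str           # how big it is
    features: List[str] = field(default_factory=list)


def describe(item: Metadata) -> str:
    """Return a one-line description built only from the metadata."""
    return f"{item.title} by {item.creator} ({item.size})"


# Made-up sample records: one physical object, one digital object.
book = Metadata(
    title="An Example Book",
    creator="A. Author",
    publisher="Example Press, Houston",
    size="250 pages, 23 cm",
    features=["illustrations", "subject: apples"],
)
song = Metadata(
    title="An Example Song",
    creator="A. Songwriter",
    publisher="Example Records",
    size="4.2 MB",
    features=["file format: MP3", "bit rate: 192 kbps"],
)

for item in (book, song):
    print(describe(item))
```

The same questions get asked of both objects; only the answers (pages versus megabytes) change.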

Metadata is not *too* scary, because you create metadata all the time (tell me what that thing is you are holding, tell me about your car, tell me about your day). And the best part about creating metadata is what happens when you get a lot of things together in one place (like a library! Where you might have thousands of things in one small place!). You can take all the metadata and put *it* in one place too (like a library catalog). Then anyone who wants to find an object doesn’t need to look at everything; they can look in the catalog and be directed to wherever the object is. And this is why metadata is so important.
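The catalog idea can be sketched in a few lines of Python. This is only a toy, assuming a catalog is a list of metadata records that each carry a shelf location; the titles, subjects, locations, and the `find` function are all invented for the example.

```python
# A made-up catalog: each entry is the metadata for one object, plus where it lives.
catalog = [
    {"title": "Growing Apples", "subject": "apples", "location": "Shelf A3"},
    {"title": "Car Repair Basics", "subject": "cars", "location": "Shelf B7"},
    {"title": "Apple Recipes", "subject": "apples", "location": "Shelf C1"},
]


def find(records, **criteria):
    """Return the locations of records whose metadata matches every criterion."""
    return [
        record["location"]
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


print(find(catalog, subject="apples"))
```

Notice that the search never touches the objects themselves. It reads only the metadata and hands back a location, which is the whole point of a catalog.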


In depth: Digital Divide

As most of us are aware, sometime in the ’90s more people in the U.S. started using computers than typewriters, mimeographs, or pens.

It was an interesting time to be a teenager, using the school’s computers to type research papers that you researched from physical books and newspapers and microfilm, or perhaps from a CD-ROM. Most people didn’t get onto the internet until the late ’90s. But the move from paper to electronic was so incredibly fast it’s a wonder that heads weren’t spinning.

Here at UHD, things were no different. In the archives, the challenge is to try to create accessible collections which straddle the physical/digital divide fairly evenly. But it’s a very complex problem to navigate. There is a bright line for UHD in terms of how records were created over the years: pre-2004, most reports and documents were created with the intent of printing. Post-2004, most reports and documents were created with the intent of never or rarely printing them. There are two noticeable outcomes of this shift in practice:

1) most records now came with embedded searchability, thus diminishing the need for traditional tables of contents, page numbers, indexes, etc., and

2) records could be and often were much longer and more detailed. Graphs from a spreadsheet or database, embedded into a WordPerfect document? No problem for the creator, but a potential headache for the archivist down the road who now has not one, but two or even three documents that could be considered “archival.”

So relatively quickly the university (and all universities, and all businesses, and all regular ol’ human beings) moved from the physical to the digital. The divide is deep, terribly deep, and the archival community is still coming to grips with it. For thousands of years, archivists have built their work around physical objects that could last for centuries if only kept away from wind, water, and sun. Now we are faced with a huge amount of material that, quite honestly, we may not be able to save (or at least, not on the timescales we were used to before). This has placed archivists in the position of thinking not just about what a document was written ON, but what the document is written ABOUT, and considering the content to be the thing that must be preserved. The information, not the book or tablet or scroll, is paramount, because the physical format of a digital object is so incredibly fragile that there is no known way to preserve it long-term. It’s been a sea change in archives, and it will probably continue to change our work in fundamental ways and change how we give our communities access to the materials we save.

Discovering the Digital Archives

Archives, historically, have had a very difficult time being “discovered.” Most researchers or users find out about collections from their peers or from the archivist in a reference interview. In an effort to increase the usability of the UHD Digital Archives, we’ve launched a new page on our site: Discover the Collections.

(Image: detail of the Discover the Collections page)

Here users can find their way around the collections visually, in a more intuitive and contextualized way. Using regular hyperlinking technology, they can move from one collection to another seamlessly, and new collections can be added into the “universe” as they come online.

UHD and the Web

As the University embarks on the third renovation of its web presence, and at the end of the 40th anniversary of the merger of STJC and UH, it seems like an opportune time to revisit how the University’s web presence has changed over the years.

Of course, before the mid-’90s most institutions had no web presence at all; the Internet was by and large for research groups to talk to one another. But after the creation of HTML and the revolution it caused in the way that information was presented and shared over the Internet, there was a “population explosion” of new webpages and search engines (Yahoo! being one of the longest-lived). By 1997, most universities had a website, although these sites were built in plain HTML and were therefore very simple by today’s standards. The Internet Archive’s Wayback Machine can help us see the past websites of UHD. Starting in 1997, UHD “piggybacked” on the University of Houston website with the address www.dt.uh.edu. In 2000, however, the University decided to begin hosting its own site and www.uhd.edu was born. The first standalone website was still simplistic by the expectations of 2015; there were no forms or portals. Library research within academic journals was still mostly done via CD-ROM.

In 2009, however, with the advances in website design beyond simple informational pages and into the “social” needs of users, the University renovated its site to include more interaction and online tasks. The new site was much larger as well (just before we began our current renovations, the site contained over 7,500 distinct pages).

Now, six years later, we embark on another renovation of the website, as the web design industry has shifted its focus yet again. From the informational sites of the mid-’90s, to the social sites of the early 2000s, and now on to the idea that websites should act as “stories,” our website is expected to form an engaging picture of the organization, rather than just act as a place to store or input information.

How the Archives are Arranged

The UHD Archives are arranged to mirror the organization of the University itself. We started with the current organizational chart and thought about how to arrange materials so that no one group of records would be substantially larger than any other. In the end, seven Record Groups were created (see below for the full listing of the Groups).

A Record Group is an organizational strategy that brings many different offices or business functions under one “umbrella” to make the relationships among them clear to researchers. In the case of UHD, we realized that there are a few main areas under which we can group certain offices. Under each Record Group, there are several Collections which correspond to different programs or offices. Within each Collection, there might be several Series to help further organize the materials, and within each Series it would be possible to have Subseries that contain many individual items or groups of items that relate solely to that specific Subseries. This is common archival practice, and something any researcher would see in any archives in the U.S., Canada, or Australia. An example of how these groups nest within each other:

Record Group: RG-C (Colleges and Academic Programs)
Collection: College of Humanities and Social Sciences (1974-present)
Series: Center for Public Deliberation (2006-present)
Subseries: Achieving the Dream Program (2008)

You can see that each sub-grouping tightens the net, so to speak, until we reach a fine level of granularity. Reading up from the bottom (starting at the subseries level) lets a researcher see who was responsible for each business activity at the University. A person may not realize that the Achieving the Dream program was funded by the Center for Public Deliberation, but the organizational structure makes it clear that the Center is part of the College and that the two groups worked together on the program.
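For readers who think in code, the nesting above can be modeled as a chain of parent links. This is a minimal sketch, not how the UHD Archives actually store anything: the `Level` class and the `lineage` function are hypothetical names, and the only real data are the four level names taken from the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Level:
    """One level of the arrangement, with a link to the level that contains it."""
    kind: str
    name: str
    parent: Optional["Level"] = None


def lineage(level: Optional[Level]) -> List[str]:
    """Walk from a level up to its Record Group, listing each step on the way."""
    chain = []
    while level is not None:
        chain.append(f"{level.kind}: {level.name}")
        level = level.parent
    return chain


record_group = Level("Record Group", "RG-C (Colleges and Academic Programs)")
collection = Level("Collection", "College of Humanities and Social Sciences", record_group)
series = Level("Series", "Center for Public Deliberation", collection)
subseries = Level("Subseries", "Achieving the Dream Program", series)

# Reading up from the bottom shows who was responsible at every level.
for step in lineage(subseries):
    print(step)
```

Starting from the subseries and following the parent links reproduces the bottom-up reading described above: program, then Center, then College, then Record Group.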

Here is the full listing of the Record Groups and some representative Collections within each Record Group:

RG-AA (Academic Affairs)

Contains the records of the Office of the Provost, Library, Planning, Enrollment Management, Advising, and Student Affairs

RG-C (Colleges and Academic Programs)

Contains records of the individual Colleges, distance education, English Language Institute, and sponsored programs

RG-ESO (Employment Services and Operations)

Contains all records related to Human Resources, training, and employment

RG-F (Administration and Finance)

Contains the records of Facilities, Information Technology, Budget, Compliance, and Police

RG-PA (Public Affairs)

Contains records and output of University Relations, Corporate Relations, Giving, and Events (including scholarship fundraising)

RG-PO (President’s Office)

Each collection is a President’s tenure


This group contains all materials related to South Texas Junior College and its administration, as well as the purchase of STJC by the University of Houston in 1974. It is broken down into several subgroups to facilitate searching.

RG-V (Visual Materials)

Contains any and all photographs, physical and digital, separated by time period and office