Technological context

The possibilities for automating agency rulemaking and adjudication have increased because of the availability of low-priced desktop computers and associated networking hardware and software. Networking technologies in particular have expanded the possibilities for using information technology to transfer information in electronic form between participants in regulatory proceedings and agency decisionmakers.

Networks

Two technological configurations or "architectures" make automation of the regulatory process feasible: local area networks ("LANs") of desktop computers and wide area networks linking geographically dispersed LANs.

Local area networks

LANs permit users of desktop computers to enjoy the performance and autonomy advantages of running software directly on their own computers, while sharing files and certain other resources with other users through the local area network and its associated "servers". 10 Such an architecture invites agency work groups to perform sequential steps such as data entry, editing, and final production by accessing shared files through a local area network. The latest generation of desktop computers has made it feasible to exchange fairly large files, including graphical images, through local area networks even when many users access them simultaneously. As a result, members of an agency work group can work on files electronically without sacrificing speed of access or giving up the use of graphical images.

Wide area networks

Wide area networking permits local area networks to be joined electronically regardless of their geographic proximity to each other. Thus, a LAN in a Washington agency headquarters can exchange information with a LAN in an Alaska field office through a wide area network. Advances in modem speeds and wide area networking protocols began to make it realistic in about 1994 for significant quantities of graphical information and large text files to be exchanged over wide area networks. Sufficient bandwidth 11 is available through the public switched telephone network to perform these types of information exchange routinely.

Until the Internet became popular, however, most wide area networks were proprietary, involving mainframe computers at one end of a connection running proprietary applications and desktop computers or dumb terminals at the other end of the connection running proprietary interface software, sometimes utilizing international data communications standards such as X.25 over the communications link itself.

The Internet

The Internet is an international network of computers and computer networks connected to each other through routers using the TCP/IP protocols and sharing a common name and address space. One can communicate with any computer connected to the Internet simply by establishing a connection to an Internet router or node. 12 The Internet is not a corporation or administrative arrangement; 13 it is a method for connecting computer systems, and the phenomenon of very widespread adherence to that method. The Internet began in the 1960s with federally subsidized connections among universities and government research laboratories. An "acceptable use policy" limited traffic unrelated to research and education. By 1990, the Internet's potential as a model for a National Information Infrastructure had been recognized, and the federal government began to reduce the subsidy and to encourage private entities to take over responsibility for basic communication and traffic management functions. By 1995, most of the traffic on the Internet involved unsubsidized facilities and private traffic. The Internet is the archetypical open network.

The Internet has accelerated an independent tendency toward two important technological phenomena: client/server computing and distributed database management, both of which are significant to agency automation. The client/server model permits software developers to allocate tasks between a "server" computer and a "client" computer, connected to each other through a network, so as to maximize performance, security, and other design criteria. For example, the client/server model permits the client, such as an individual user's desktop computer, to perform most or all of the tasks associated with graphical image management and screen displays, thus making it easier to implement windows-based and other highly graphical computing sessions without burdening the communications link with large quantities of data necessary to describe all the features of a particular screen image. 14 Using this model, the individual user works with a Microsoft Windows or Macintosh display, communicating data and instructions to the server by pointing and clicking with the mouse. The client sends greatly abbreviated messages of one or a few characters across the communications link based on its "mediation" between the graphical interface and the communications link. The server can cause a particular image or change in the user display to be presented by sending a similarly abbreviated character or character string based on its "knowledge" of the graphical images the client already has and on its "knowledge" of the operational details of the client's interface software. 15
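The division of labor described above can be illustrated with a toy model. This is a sketch under stated assumptions, not a real display protocol: the command format (`SHOW 451`, `C7`), the cache, and all function names are hypothetical. The client already holds the screen images locally, so the server sends only a short command naming a picture, echoing the "Show picture number 451" example in footnote 15, and the client reports a mouse click in a couple of characters.

```python
# Toy model of the client/server display exchange. All names and the command
# format are hypothetical; real window-system protocols are far richer.

# Bitmaps the client already holds locally, keyed by picture number.
# Zero-filled byte strings stand in for ~30 KB screen images.
CLIENT_IMAGE_CACHE = {
    451: bytes(30_000),
    452: bytes(30_000),
}


def server_message(picture_number: int) -> bytes:
    """Server side: instead of sending every pixel, name the picture to show."""
    return f"SHOW {picture_number}\n".encode("ascii")


def client_handle(message: bytes) -> bytes:
    """Client side: decode the command and fetch the bitmap from the local cache."""
    verb, number = message.decode("ascii").split()
    if verb != "SHOW":
        raise ValueError(f"unknown command: {verb}")
    return CLIENT_IMAGE_CACHE[int(number)]


def client_click(button_id: int) -> bytes:
    """Client side: a mouse click on a screen button becomes a few characters."""
    return f"C{button_id}".encode("ascii")


msg = server_message(451)
image = client_handle(msg)
print(len(msg), "bytes on the wire instead of", len(image))
```

The point of the design is that only the short commands cross the communications link; the bulky image data stays on the client.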

The second important phenomenon is distributed database management. This concept allows a user to combine data actually stored on a multiplicity of computers. For example, a client interested in a particular submission in notice-and-comment rulemaking could retrieve the docket kept on computer A, and also select a particular item from the docket, which would cause the desired document to be retrieved from computer B. Another document, for example, an opposing party's response, might be retrieved from computer C. Depending on the quality of the user's client software, the user might retrieve all of this material and have it presented to her as an integrated set with no indication that the elements of the set came from different computers.
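The docket example can be sketched in a few lines. In this sketch each "computer" is simply a dictionary standing in for a remote server, and the docket number, document ids, and function names are all hypothetical. The client retrieves the docket from computer A, follows each entry to the computer that holds the document, and returns a single integrated list.

```python
from typing import Dict, List, Tuple

# Each "computer" is modeled as a dictionary; in a real system each would be a
# separate server reached over the network. Docket and document ids are hypothetical.
COMPUTER_A: Dict[str, List[Tuple[str, str]]] = {
    # docket number -> list of (document id, computer holding that document)
    "RM-95-01": [("doc-17", "B"), ("doc-22", "C")],
}
COMPUTER_B = {"doc-17": "Comment of ABC Company on the proposed rule"}
COMPUTER_C = {"doc-22": "Response of XYZ Association to ABC Company"}

HOSTS = {"A": COMPUTER_A, "B": COMPUTER_B, "C": COMPUTER_C}


def fetch(host: str, key: str):
    """Stand-in for a network request to the named computer."""
    return HOSTS[host][key]


def assemble_docket(docket_no: str) -> List[Tuple[str, str]]:
    """Get the docket from computer A, then each listed document from wherever it
    is stored, and return one integrated list that hides where each item came from."""
    return [(doc_id, fetch(host, doc_id)) for doc_id, host in fetch("A", docket_no)]


for doc_id, text in assemble_docket("RM-95-01"):
    print(doc_id, "|", text)
```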

The flexibility and power of the client/server and distributed database models are enhanced when the protocols for implementing them are non-proprietary and "open." With open protocols, many different designers and vendors can supply the pieces that are combined into a distributed database or a client/server application.

Internet communication makes more efficient use of communications capacity than some alternatives, such as dialup bulletin boards. When a member of the public establishes a dialup connection to an agency electronic bulletin board, the connection is held open for the entire duration of the interaction between the member of the public and the agency system. When a member of the public communicates with an agency server through the Internet, communications capacity is used only when packets of information actually are being transmitted. The only part of an Internet session that holds open a telephone connection, reserved for that session whether or not information is moving across it, is the connection between the requester's modem and the (usually local) point of presence of an Internet access provider.
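The difference can be put in rough numbers. The session length, data volume, and line rate below are assumptions chosen only for illustration, and protocol overhead is ignored. A dialup bulletin board ties up a circuit for the whole session, while packet transmission consumes shared capacity only for the time the data actually takes to send.

```python
def circuit_seconds_reserved(session_minutes: float) -> float:
    """Dialup bulletin board: the circuit is held for the entire session."""
    return session_minutes * 60


def packet_seconds_used(bytes_transferred: int, link_bps: int) -> float:
    """Internet: shared capacity is consumed only while packets are in transit
    (protocol overhead ignored in this sketch)."""
    return bytes_transferred * 8 / link_bps


session_minutes = 30         # assumed length of a browsing session
bytes_transferred = 200_000  # assumed total data actually retrieved
link_bps = 28_800            # assumed 28.8 kbps line rate

held = circuit_seconds_reserved(session_minutes)
used = packet_seconds_used(bytes_transferred, link_bps)
print(f"circuit held {held:.0f} s; packets used {used:.0f} s ({used / held:.1%})")
```

Under these assumptions, the packet-switched exchange uses about 3 percent of the capacity a dedicated circuit would tie up. As the paragraph notes, the user's own line to the local point of presence is still held for the full session.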

Despite its technological significance, the Internet worries many computer professionals. Its open nature means a significant loss of control for computer managers who connect to it: once connected, they are potentially vulnerable to unknown persons who also connect to the Internet. While a variety of "firewall" techniques exist to prevent intruders from engaging in unauthorized activities on a computer system connected to the Internet, an Internet connection opens up security risks that do not exist in the absence of external network connections. In addition, because responsibility for the operational status and performance of Internet facilities is diffused among operators of multiple backbones, operators of Internet access services, operators of nodes that supply particular information items, and user facility managers, there is also concern that the Internet is less reliable than closed, proprietary network solutions.

Also, the Internet grew up around UNIX and TCP/IP, an operating system and a suite of network protocols that may be unfamiliar to computer professionals trained and experienced on proprietary DEC or IBM systems.

Moreover, as the public switched telephone system becomes more digital and shifts wholly or partly to packet switching, 16 there may be less need for the Internet as a distinct networking protocol. At most, however, this would eliminate the need for specialized Internet features at the IP (network) and TCP (transport) layers of the Open Systems Interconnection ("OSI") stack. 17 There would still be a need for non-proprietary open standards at the higher, application layers, such as the Telnet, 18 FTP, 19 Gopher, 20 WAIS, 21 and World Wide Web 22 applications associated with the Internet.

Despite skepticism about the Internet's limitations, its use to distribute public and private information outside the research context has grown rapidly. The Villanova Center for Information Law and Policy at Villanova Law School maintains an Internet server that, among other things, offers a "Federal Web Locator" comprising pointers to some 650 Internet servers maintained by federal government entities. 23 More than 1,874,926 requests for information have been handled through this Federal Web Locator. The Library of Congress Thomas system, implemented through the Internet's World Wide Web, is one of the few places where the full text of bills introduced in the House and Senate can be obtained electronically. In mid-1995, OMB sponsored several electronic "town hall" discussions on the Internet, using newsgroup and listserv applications. Several hundred people participated, from all over the country and representing diverse backgrounds and points of view. Private entities also use the Internet: the Teachers Insurance and Annuity Association ("TIAA"), one of the largest pension plans in the country, has made participant and beneficiary information available through an Internet Gopher server. 24 Both the Nuclear Regulatory Commission and the Department of Transportation have established World Wide Web "home pages," shown in figures 1 and 2.

Figure 1 - DOT Home Page

Figure 2 - NRC Home Page

On the other hand, present Internet applications are better at electronic publishing than at facilitating interactive communication. It is relatively straightforward to publish a notice of proposed rulemaking and associated documents and images through a Gopher menu or a World Wide Web page. Establishing an appropriate framework for exchanging pleadings or statements of litigation position in an adjudication is more difficult. One possible starting point for structuring an adversarial proceeding is newsgroup or listserv technology like that used in the OMB town hall initiatives, in which a group of people interested in the same subject post messages, all of which can be seen by the other participants. Under this concept, one party could post a complaint, opposing parties could then post their answers, the first party could post responses, and so on. The newsgroup or list then would serve as the docket, containing the full contents of all the submissions.

Alternatively, some World Wide Web applications permit the creation of specialized forms which might be adapted to an interactive administrative litigation context. In any event, these possibilities are at present untried and therefore unproved.
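The newsgroup-as-docket concept can be sketched as a simple data structure. This is a hypothetical illustration, not a description of any existing newsgroup or listserv software: every posting is numbered in sequence, visible to all participants, and may record which earlier posting it answers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Posting:
    number: int                        # sequential docket entry number
    party: str
    kind: str                          # e.g. "complaint", "answer", "reply"
    text: str
    responds_to: Optional[int] = None  # docket number of the posting answered


@dataclass
class Docket:
    # Every participant sees the same list of postings, as in a newsgroup.
    postings: List[Posting] = field(default_factory=list)

    def post(self, party: str, kind: str, text: str,
             responds_to: Optional[int] = None) -> Posting:
        """Append a posting; a response must point at an existing entry."""
        if responds_to is not None and not any(p.number == responds_to for p in self.postings):
            raise ValueError(f"no docket entry {responds_to}")
        entry = Posting(len(self.postings) + 1, party, kind, text, responds_to)
        self.postings.append(entry)
        return entry

    def responses_to(self, number: int) -> List[Posting]:
        """All postings that directly answer the given docket entry."""
        return [p for p in self.postings if p.responds_to == number]


docket = Docket()
c = docket.post("Agency staff", "complaint", "Respondent violated the rule.")
a = docket.post("Respondent", "answer", "Denied.", responds_to=c.number)
r = docket.post("Agency staff", "reply", "Records show otherwise.", responds_to=a.number)
for p in docket.postings:
    print(p.number, p.party, p.kind, "->", p.responds_to)
```

The party names and posting types are hypothetical. An actual adjudicatory docket would also need service rules, deadlines, and authentication of each filer.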

Acquisition of information

Information can be acquired by an agency computing system by receiving the information from submitters in electronic formats, or by receiving the information in paper formats and converting it into electronic form through some electronic input device.

From electronic submissions

Agencies can receive information intended for their automated systems in electronic formats, through network connections or by receiving physical computer-readable media such as magnetic tapes, cartridges, and diskettes.

Receiving information in electronic formats relieves agencies of burdensome data entry tasks: the data from outside parties goes directly into the computer system without having to be keyboarded or scanned. The risk of errors and omissions in the data entry process is virtually eliminated, although errors can occur in handling electronic submissions as well.

There are, however, at least two general limitations of electronic submission that undercut its cost-saving and error-reduction potential. First, not everyone entitled to submit information to agencies has access to the technology necessary to prepare electronic submissions. As the desktop computer revolution continues, the portion of the population without access to basic computer technology is diminishing rapidly, 25 but there probably will always be some irreducible minimum that is unable or unwilling to use basic computer technology. Moreover, to the extent that electronic submissions must be presented in sophisticated or proprietary formats, the portion of the population lacking access to the necessary filing technology grows correspondingly. 26

The other limitation on electronic submission arises from the richness of formatting used in conventional submissions. Paper documents permit a rich variety of concepts to be communicated typographically: by typeface, type size, position on the page, and the physical placement of related elements, as in tables. 27 There is no single computer protocol or standard that permits all of these to be expressed unambiguously in computer-readable format. Different vendors, especially vendors of word processing and desktop publishing software, each have their own methods of coding typographic design elements. But if an agency adopts one of these proprietary page description languages, it excludes users of other products, or at least significantly increases their costs. For example, an agency could specify that electronic submissions be presented in the popular WordPerfect word processing format. Such a specification would enable submitters to express a variety of typographical features, but it would disadvantage users of Microsoft Word and Lotus Ami Pro. There are non-proprietary standards or "grammars" for expressing a rich variety of typographical elements. The most flexible and prominent is Standard Generalized Markup Language ("SGML"). 28 Not many consumer-oriented products have SGML capability yet, however.
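The idea behind SGML-style markup, in which tags describe what each part of a document *is* rather than how a particular device should print it, can be sketched as follows. The tag names are hypothetical, because SGML lets each document type define its own. The sample is written in fully tagged form with no omitted tags, which also makes it well-formed XML, so Python's standard `xml.etree` parser can read it. A full SGML parser would additionally handle document type definitions and tag omission. The same marked-up submission is then rendered for two different output devices, and a field is retrieved by its structural role.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical tag set describing the structure of a rulemaking comment.
SUBMISSION = """
<comment>
  <caption><party role="commenter">ABC Company</party><docket>RM-95-01</docket></caption>
  <heading>Summary</heading>
  <para>The proposed reporting interval is too short.</para>
</comment>
"""


def render_plain_screen(root: ET.Element) -> str:
    """Render for a character terminal: headings in capitals, labels for the caption."""
    lines = []
    for el in root.iter():
        if el.tag == "party":
            lines.append(f"Commenter: {el.text}")
        elif el.tag == "docket":
            lines.append(f"Docket: {el.text}")
        elif el.tag == "heading":
            lines.append(el.text.upper())
        elif el.tag == "para":
            lines.append(el.text)
    return "\n".join(lines)


def render_web_page(root: ET.Element) -> str:
    """Render the same structure as HTML for a graphical browser."""
    out = []
    for el in root.iter():
        if el.tag == "party":
            out.append(f"<p><b>{html.escape(el.text)}</b></p>")
        elif el.tag == "heading":
            out.append(f"<h2>{html.escape(el.text)}</h2>")
        elif el.tag == "para":
            out.append(f"<p>{html.escape(el.text)}</p>")
    return "\n".join(out)


root = ET.fromstring(SUBMISSION.strip())
print(render_plain_screen(root))
print(render_web_page(root))
# Structure also supports retrieval, e.g. "who is the commenter?"
print(root.find(".//party[@role='commenter']").text)
```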

In addition, specialized software applications could be written for a particular agency or a particular proceeding that would structure user responses and create a computer file with the responses represented in an appropriate format for computer acceptance. The experience of using such software would be like filling out an electronic form.
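A minimal sketch of such form software follows. The field list, prompts, and function name are hypothetical, and JSON is used only as a convenient machine-readable output format for the sketch; the text does not prescribe any particular format. The software checks the submitter's answers against the form definition and writes them in a fixed layout that the agency's system can accept without retyping.

```python
import json

# Hypothetical form definition for one proceeding: field name -> (prompt, required?)
FORM = {
    "docket": ("Docket number", True),
    "name": ("Name of commenter", True),
    "organization": ("Organization (if any)", False),
    "comment": ("Text of comment", True),
}


def build_submission(answers: dict) -> str:
    """Check the answers against the form and emit a file in a fixed layout."""
    missing = [name for name, (_, required) in FORM.items()
               if required and not answers.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    unknown = set(answers) - set(FORM)
    if unknown:
        raise ValueError("unexpected fields: " + ", ".join(sorted(unknown)))
    # Every field appears, in form order, so the agency system can rely on the layout.
    record = {name: answers.get(name, "") for name in FORM}
    return json.dumps(record, indent=2)


print(build_submission({"docket": "RM-95-01", "name": "Jane Doe",
                        "comment": "I support the proposed rule."}))
```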

From paper submissions

Agencies can receive information on paper and perform the tasks necessary to transform it into a format used by their computer systems. There are two basic ways of accomplishing the transformation: scanning and keyboarding. Scanning involves taking a digital picture of each page with a device that works mechanically much like a copying machine. The resulting digital picture, or image, is a bitmap. That means that it does not represent individual characters through codes that can be interpreted by a computer; 29 rather, it represents light and dark spots on the page, and sometimes colors as well. A user cannot edit a scanned image with a word processing program or search it for particular characters or words. Instead, the computer system moves the bitmap of the page around, storing it, presenting it on user displays, copying it and sending it to other computer systems, or printing it, much as it would handle any other graphical image.

An additional step can be performed on scanned page images to generate a version of each page in character-based format. Optical Character Recognition ("OCR") searches for patterns of light and dark spots on the page that correspond to the patterns made by alphanumeric characters, and substitutes a sequence of characters (more precisely, their ASCII codes) for the character images it finds on the page. Many legal applications, such as those regularly used to manage discovery documents in complex civil litigation, keep two linked files for each page of material: the bitmapped image and the OCR translation. The software permits users to search the OCR version for words and phrases and presents the OCR file, the bitmapped version of the corresponding page, or both, at the user's option.
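The core idea of matching patterns of light and dark spots can be shown with a toy template matcher. Commercial OCR engines use far more sophisticated methods. This sketch assumes the page has already been segmented into character cells, and it uses tiny hypothetical 3x5 glyphs. Each scanned glyph is compared with stored templates cell by cell (a Hamming distance), the closest template wins, and the resulting character codes are kept alongside the image, as in the linked-file applications described above.

```python
# Toy 3x5 glyph templates ('#' = dark spot, '.' = light spot).
TEMPLATES = {
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def distance(a, b):
    """Count the cells where two glyph bitmaps differ (Hamming distance)."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def recognize(glyph):
    """Return the character whose template is closest to the scanned glyph."""
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], glyph))


def ocr_page(glyphs):
    """Translate a sequence of already-segmented glyph bitmaps into text."""
    return "".join(recognize(g) for g in glyphs)


# A "scanned" word; the L and T each carry one stray dark spot.
scanned = [
    ["#..", "#..", "#.#", "#..", "###"],  # L plus a speck
    ["###", "#.#", "#.#", "#.#", "###"],  # clean O
    ["###", ".#.", ".#.", ".#.", "##."],  # T plus a speck
]

# The linked pair: the bitmap image and its OCR translation.
page = {"image": scanned, "text": ocr_page(scanned)}
print(page["text"], [ord(c) for c in page["text"]])  # LOT [76, 79, 84]
```

A stray mark that pushes a glyph closer to the wrong template produces exactly the kind of recognition error discussed in the next paragraph.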

OCR technology works well only with "clean" bitmaps, ones that contain no extraneous spots or marks and no handwritten annotations. Moreover, even on clean bitmaps, OCR rarely achieves an accuracy rate much better than 95%, meaning that the OCR file from a typical bitmapped page of typewritten text would contain ten to twenty "typographical" errors. Some recent research indicates the potential for improving accuracy by a few percentage points. For example, NRC's Licensing Support System ("LSS") Steering Committee was informed at its December 1994 meeting that the Information Science Research Institute at the University of Nevada, Las Vegas 30 was able to achieve 97.4 to 98.5% accuracy scanning second-generation DOE documents with a Calera Recognition Systems system.
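The link between an accuracy rate and errors per page is simple arithmetic, but it depends on what is being counted. The sketch below assumes, as the "ten to twenty" figure suggests, that accuracy is measured per word and that a typewritten page holds roughly 200 to 400 words. Both are assumptions, not figures from the studies cited. Measured per character, the same percentages would imply many more errors per page.

```python
def expected_errors(words_per_page: int, accuracy: float) -> float:
    """Expected number of misrecognized words on one page."""
    return words_per_page * (1 - accuracy)


# 95% is the typical figure above; 97.4% and 98.5% are the reported research results.
for accuracy in (0.95, 0.974, 0.985):
    low, high = expected_errors(200, accuracy), expected_errors(400, accuracy)
    print(f"{accuracy:.1%}: {low:.0f} to {high:.0f} errors per page")
```

Under these assumptions, 95% accuracy yields the ten to twenty errors described above, while the 98.5% result cuts that to roughly three to six.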

Many OCR software products are adroit at identifying possible errors and inviting human inspection and correction. Using such error-correction software, however, requires considerable human input, amounting to a minute or more per page.

Keyboarding involves a human keyboard operator typing the entire contents of a document submitted on paper. Keyboard operators can be trained to recognize certain information elements from their presentation on the printed page, such as the parties in the caption of a pleading. The operator then can enter those data elements in particular fields, thus preserving some of the information represented by typographical position or features on the printed page. No individual keyboard operator can avoid making mistakes, but errors can be reduced by having every page keyboarded twice, by two different operators, and then having a computer compare the two versions to flag differences as possible errors. Software products are available to facilitate this technique. Obviously such a quality control technique doubles the labor costs of the keyboarding step.
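The computer comparison step of double keying can be sketched with Python's standard `difflib` module. This is not a description of any particular commercial product. The two keyings are compared word by word, and every span where they disagree is flagged for a human to check against the paper original. The sample sentences are invented for illustration.

```python
import difflib


def compare_keyings(first: str, second: str) -> list:
    """Return the word-level spots where two independent keyings disagree."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            disagreements.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements


operator_1 = "The respondent filed its answer on March 3, 1995."
operator_2 = "The respondent filed its anwser on March 8, 1995."
for mine, theirs in compare_keyings(operator_1, operator_2):
    print(f"check: {mine!r} vs {theirs!r}")
```

This flags "answer" against "anwser" and "3," against "8,". Note that an error both operators happen to make identically would pass undetected, which is why the technique reduces errors rather than eliminating them.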

From non-text electronic submissions

Any system for automating agency rulemaking and adjudicatory procedures must deal with the likelihood that some submissions will comprise non-text information, such as video images and sound files. The problems of representing videotapes or multimedia computer files are similar to, but more serious than, those of representing textual information filed on paper. As with paper, the information is not directly machine processable, and word patterns cannot be searched for without further human processing. Unlike paper, however, such material cannot be scanned and run through OCR to create a parallel text file for indexing or full-text retrieval. One approach is to keyboard certain identifying and descriptive data, linking such "header" records with the video image or sound file.
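The header-record approach can be sketched as follows. The field names, docket number, and file paths are hypothetical. Each keyboarded header carries descriptive text plus a pointer to the stored media file. Searches run over the header text only, since the video or sound itself cannot be searched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaHeader:
    """Keyboarded descriptive data for a non-text submission (hypothetical fields)."""
    docket: str
    filer: str
    media_type: str   # e.g. "video" or "audio"
    description: str
    file_path: str    # link to the stored video or sound file itself


def search_headers(headers: List[MediaHeader], word: str) -> List[MediaHeader]:
    """Full-text search is possible only over the keyboarded header, not the media."""
    word = word.lower()
    return [h for h in headers
            if word in h.description.lower() or word in h.filer.lower()]


headers = [
    MediaHeader("RM-95-01", "ABC Company", "video",
                "Demonstration of valve failure at test site", "media/rm9501-001.mpg"),
    MediaHeader("RM-95-01", "Citizens Group", "audio",
                "Recorded testimony at public meeting", "media/rm9501-002.wav"),
]
for h in search_headers(headers, "valve"):
    print(h.file_path)
```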

Storage

Once information has been acquired by a computer system in electronic form, whether in bitmapped or character-based format, it can be stored in one of two basic ways: on optically readable media or on magnetically readable media. Optically readable media have the advantage of much greater information density than most magnetic media. A single platter the size of the audio CDs used to distribute most popular music can store some 500 megabytes, about 200,000 typewritten pages, in character format. In addition, some optical media do not permit the information stored on them to be altered, which makes them a more permanent storage medium than magnetic alternatives.

Both optical and magnetic media come in easily removable formats, such as the CD-ROMs popular on desktop computers and 3.5-inch magnetic diskettes. Both types of media also come in forms more or less permanently attached to the devices used for writing to and reading from them, such as high-speed, high-capacity magnetic and optical disk storage systems. Devices for writing to and reading from all of these forms of storage connect to the rest of the computing system through standard protocols, which makes the rest of the system relatively indifferent to the particular type of storage medium used.

Because bitmaps represent information far less densely than character-based formats do, 31 systems that use bitmap images require much more storage capacity than systems that rely on character formats.
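Footnote 31's per-page figures make the difference concrete. The docket size below is a hypothetical example. The last calculation, which is also an assumption, shows the raw size of a one-bit-per-pixel scan at 300 dots per inch. That it is about a megabyte suggests the thirty-kilobyte bitmap figure presumes compression.

```python
# Per-page sizes from footnote 31; the docket size is an assumed example.
TEXT_BYTES_PER_PAGE = 1_000     # "about one kilobyte"
IMAGE_BYTES_PER_PAGE = 30_000   # "about thirty kilobytes"


def storage_megabytes(pages: int, bytes_per_page: int) -> float:
    """Storage needed for a collection of pages, in megabytes (10**6 bytes)."""
    return pages * bytes_per_page / 1_000_000


pages = 10_000  # hypothetical docket size
print("character format:", storage_megabytes(pages, TEXT_BYTES_PER_PAGE), "MB")
print("bitmap format:   ", storage_megabytes(pages, IMAGE_BYTES_PER_PAGE), "MB")

# Uncompressed 1-bit scan of an 8.5 x 11 inch page at an assumed 300 dots per inch.
raw_bits = int(8.5 * 300) * int(11 * 300)
print("uncompressed 300 dpi page:", raw_bits // 8, "bytes")
```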

Retrieval and presentation to agency personnel

The particulars of information retrieval and presentation subsystems depend on the design of software applications used to perform those functions. Theoretically, almost any kind of retrieval specification and presentation format is possible. In general, however, there are a number of constraints. Systems can retrieve related pieces of information, such as all the documents making up a single docket, only if all of those documents are linked to that docket. Similarly, it is possible to retrieve all documents filed by ABC company only if each document from that filer has information in it allowing a computer to determine that it has been filed by ABC company. Thus the retrieval strategies are limited by input formats. Good database designers begin their design with an inquiry into how the information in the database is to be retrieved and used and work from that understanding to a design of data input formats.
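The dependence of retrieval on input formats can be shown with a small sketch. The records and field names are hypothetical. A query can match only on fields that were actually captured when each document was entered, so a document whose filer was never recorded cannot be found by filer, even if ABC Company in fact filed it.

```python
from typing import Dict, List

# Hypothetical document records; what can be retrieved depends on the fields
# captured at input time.
documents: List[Dict[str, str]] = [
    {"id": "doc-1", "docket": "RM-95-01", "filer": "ABC Company"},
    {"id": "doc-2", "docket": "RM-95-01", "filer": "XYZ Association"},
    {"id": "doc-3", "docket": "RM-95-01"},  # filer never recorded at input
    {"id": "doc-4", "docket": "RM-95-02", "filer": "ABC Company"},
]


def retrieve(records: List[Dict[str, str]], **criteria: str) -> List[str]:
    """Return ids of records whose fields match every criterion.
    A record lacking a field can never match a criterion on that field."""
    return [r["id"] for r in records
            if all(r.get(name) == value for name, value in criteria.items())]


print(retrieve(documents, docket="RM-95-01"))    # ['doc-1', 'doc-2', 'doc-3']
print(retrieve(documents, filer="ABC Company"))  # ['doc-1', 'doc-4']
```

This is why the design work starts from the intended queries: deciding at the outset to retrieve by filer means the input format must capture the filer for every document.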

With respect to presentation, the main choices involve paper versus screen images, text-based versus graphical displays, 32 and screen displays specialized for the particular agency or proceeding versus generic screen displays. 33 Historically, users resisted reading large amounts of information on computer displays because of the displays' modest resolution and relatively small size. Now, large-screen, high-resolution displays, available at affordable prices, replicate the quality of a photocopied page, which is acceptable to the vast majority of users.

Digital signatures

Digital signatures are a potentially important way of satisfying the legal authentication requirements introduced in Part VIII of this report. In the typical digital signature application, the contents of a document are summarized mathematically into a hashcode or digest, and the document is transmitted along with an encrypted copy of the digest. Either public-key or private-key (shared secret) encryption can be used to encode the digest. Someone wishing to authenticate the document computes a new digest using the same mathematical function used by the sender, and then decrypts the transmitted digest using the key associated with the sender. If the digests match, the document came from the holder of that key and its contents have not been tampered with. Any tampering or other forgery causes the new digest not to match the decrypted digest associated with the purported signer's key. The Food and Drug Administration has adopted a functionally oriented digital signature rule that can be adapted to other agency needs when digital signatures involving encryption seem desirable. Other, less sophisticated techniques also can be used for electronic signatures. These include signers' names expressed in computer-readable character codes, with or without confidential personal identification numbers like those used at automatic teller machines in the banking industry.
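The digest-then-encrypt sequence can be traced in a short sketch. It is strictly illustrative: it uses "textbook" RSA with a fixed, far-too-small key built from two known Mersenne primes and no padding, so it offers no real security. A real system would use a vetted cryptographic library and a standard signature scheme. SHA-256 from Python's `hashlib` serves as the digest function. The shared-secret variant is shown with the standard `hmac` module, where a keyed hash plays the role of the encrypted digest. The key string is a hypothetical placeholder.

```python
import hashlib
import hmac

# Toy RSA key pair (ILLUSTRATION ONLY: tiny fixed primes, no padding, insecure).
P = 2**61 - 1                       # a known (Mersenne) prime
Q = 2**89 - 1                       # another known (Mersenne) prime
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest(document: bytes) -> int:
    """Summarize the document with a cryptographic hash, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Sender: encrypt the digest with the private key."""
    return pow(digest(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Recipient: decrypt with the public key and compare with a freshly computed digest."""
    return pow(signature, E, N) == digest(document)


original = b"Comment of ABC Company on docket RM-95-01."
sig = sign(original)
print(verify(original, sig))  # True
print(verify(b"Comment of ABC Company, altered.", sig))

# Shared-secret ("private key") variant: both sides hold the same key.
SHARED_KEY = b"hypothetical key shared by filer and agency"
tag = hmac.new(SHARED_KEY, original, hashlib.sha256).digest()
print(hmac.compare_digest(tag, hmac.new(SHARED_KEY, original, hashlib.sha256).digest()))  # True
```

The altered document produces a different digest, so verification against the original signature fails.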

Public access

Public access to the contents of agency computer systems can be provided in one or more of three basic ways: through printed output generated by the agency computer system; through agency-owned and agency-operated workstations in public reading rooms on agency premises; and remotely, through computers owned and controlled by members of the public or by intermediaries such as public libraries. Direct remote access usually is the most convenient for the public. It eliminates the delay associated with preparing printed copies in response to requests, enables the public to use computer formats, and eliminates the need for members of the public to travel to an agency facility. 34

Remote access can be provided either through agency-specific electronic bulletin boards or through the Internet or other wide area networks.

Any form of remote access theoretically increases the risk of unauthorized intrusion into an agency computer system, although well-recognized protection techniques reduce the risk to well within acceptable levels.

The quality of remote public access depends on the interaction between the bandwidth 35 available for remote access and the type of information likely to be retrieved. If a typical public inquiry can be satisfied only by transferring large numbers of bitmaps or graphical images, a high-bandwidth connection 36 is necessary to service the requests satisfactorily. On the other hand, if the typical public request involves a relatively small number of pages in character-based format, such requests easily can be satisfied through low-speed modems and dialup telephone connections. 37 A remote user with a 14.4 kilobit per second modem and an ordinary voice-grade telephone line could download a 100-page document stored in text format in about two minutes, but the same document would take about an hour in image format.
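Download times follow from page size and modem speed. The sketch below uses footnote 31's per-page sizes and the raw 14.4 kbps line rate, ignoring protocol overhead and compression (assumptions of the sketch). On those figures, the text version takes about one minute and the image version about 28 minutes. The two-minute and one-hour estimates above correspond to pages roughly twice those sizes. Either way, the image version takes about thirty times as long.

```python
def download_seconds(pages: int, bytes_per_page: int, modem_bps: int) -> float:
    """Transfer time at the raw modem rate, ignoring protocol overhead and compression."""
    return pages * bytes_per_page * 8 / modem_bps


MODEM_BPS = 14_400
text_s = download_seconds(100, 1_000, MODEM_BPS)    # footnote 31: ~1 KB per text page
image_s = download_seconds(100, 30_000, MODEM_BPS)  # footnote 31: ~30 KB per page image
print(f"text: {text_s / 60:.1f} min, image: {image_s / 60:.1f} min, "
      f"ratio {image_s / text_s:.0f}x")
```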

Footnotes

10 A server is a computer attached to a network that performs functions for more than one user. Some servers on small local area networks differ little if at all from individual desktop computers but specialize in providing access, through the LAN, to files stored on their hard disks. Other servers, such as those typically connected to wide area networks including the Internet, have greater performance and capacity than the individual computers they support and may run specialized applications such as electronic mail, World Wide Web servers, or Gopher servers.

11 Bandwidth is a measure of the capacity of a communications link, frequently measured in bits per second. The typical bandwidth over a dialup telephone connection in 1988 was 1,200 bits per second, limited by the speed of low-cost modems. In mid-1995 the typical dialup connection bandwidth is moving to 28.8 kilobits per second, because of newly available international standards operating at that speed. Typical dedicated data lines leased from local exchange telephone companies operate at 56 kilobits per second, "T1" (1.544 megabits per second), or "T3" (45 megabits per second).

12 Widely used Internet applications, in addition to email, include telnet, a method of establishing a remote terminal connection to another computer across the Internet; file transfer protocol ("ftp"), a means for transferring files between computers linked together by the Internet; gopher, a user-friendly menuing system for making files and text available; news and newsgroups, a means for electronic discussions in which posted messages and their replies are accessible to anyone connected to an Internet node; and the World Wide Web.

13 There is no such thing as a president or board of directors of the Internet, although there are voluntary cooperative bodies such as the Internet Engineering Task Force ("IETF") that discuss and formulate standards and protocols through documents called requests for comments ("RFCs").

14 Without the client/server model, the host would have to send every pixel on the screen every time any part of the screen display on the user's terminal changed. Pixels are the dots that, when formed into patterns, show characters and other meaningful shapes.

15 Instead of sending every pixel for a new screen image the host (now called the "server") sends a brief message that says, "Show picture number 451."

16 The asynchronous transfer mode ("ATM") is a relatively new protocol receiving favorable attention from telephone companies. It uses a specialized form of packet switching with standard-sized packets and "policy-based routing," which means that packets that cannot tolerate delay are sent before packets that can.

17 OSI is a model of computer system functions adopted by international standards bodies. It is divided into seven layers, ranging from hardware, in layer one, to applications, in layer seven. The model contemplates that hardware and software performing functions at particular layers can interact with functions at other layers by adhering to standards defining the layers even though different designers and vendors are involved.

18 Telnet is a basic Internet application enabling one node connected to the Internet to start a terminal session on another node.

19 File transfer protocol ("ftp") is a basic Internet application enabling one node to transfer files to or from another node.

20 Gopher is an easy to use menuing system for making files available through the Internet.

21 Wide Area Information Servers ("WAIS") is a full-text search system built on Z39.50, an ANSI/NISO standard for full-text searching across distributed networked databases.

22 World Wide Web is a popular Internet application, in which documents appear with hypertext links that, when activated, retrieve other information from the same or other Internet nodes.

23 http://www.vcilp.org

24 Teachers Insurance and Annuity Association/College Retirement Equities Fund, THE PARTICIPANT, Aug. 1995, at p. 10 (reporting that TIAA-CREF logged more than 250,000 connections to its Gopher service from early March to the end of July 1995). See also gopher://gopher.tiaa-cref.org, http://chronicle.merit.edu/.vendors/.tiaa/home.html.

25 Those without their own computers have increasing access through hardware, software, and communications links in public libraries and in commercial enterprises such as Kinko's.

26 It is important to realize, however, that one may need proprietary format technology only to submit information, and may not need it to access information already filed, and that larger populations may need only to access information using the proprietary formats, not to submit such information.

27 Effective legal representation is, in significant part, persuasiveness, and the appearance of a document is an important tool for a legal representative, or so many practicing lawyers believe.

28 Standard Generalized Markup Language ("SGML") is an international standard that permits (mostly) textual material to be marked up with tags that allow a variety of output devices, including video displays and different kinds of printers, to present the material with formatting appropriate for each device. It allows textual databases to be organized based on the conceptual structure of documents, making headings, indexes, and body text computer-recognizable. It thus avoids approaches that embed formatting instructions or codes tied to the characteristics of a particular output device. The popularity of SGML has grown with increasing markets for CD-ROM products and with the popularity of the World Wide Web, whose markup language, HTML, is itself defined as an application of SGML. See generally Henry H. Perritt, Jr., Format and Content Standards for the Electronic Exchange of Legal Information, 33 JURIMETRICS J. 265 (1993) (explaining utility of SGML for legal information).

29 The most popular coding scheme for representing alphanumeric information is the American Standard Code for Information Interchange ("ASCII"). Using ASCII, one need not communicate the pattern of dots that draws the letter "A." One simply sends its code, the decimal number 65.

30 The University of Nevada is a contractor to the Department of Energy, responsible, among other things, for "research in support of DOE development of a computerized Licensing Support System (LSS) based on OCR technology, as agreed to in negotiations with the NRC for licensing of a potential high-level nuclear waste repository at Yucca Mountain. The research will focus on enhancing existing technology in order to create faster, more efficient, and more effective software. UNLV will design, develop, and execute a research program aimed at increasing the efficiency of the LSS. This plan will be drafted in conjunction with DOE and its contractors to ensure that the focus of the research provides optimal benefits to the LSS. Examples of possible areas include reprocessing of text, improving recognition algorithms, intelligent automation of indexing processes, improving retrieval effectiveness, and researching hardware vs. software solutions to document searching." 54 Fed. Reg. 52981 (Dec. 26, 1989) (award of cooperative agreement).

31 A typical character-based representation of a textual page would be about one kilobyte. A typical bitmap of the same page would be about thirty kilobytes.

32 Graphical displays are easier for people to use because they enlarge the possibilities for representing and emphasizing information. Graphical displays require more bandwidth between the originating computer and the display, and require more processing power, however.

33 Specialized screen displays fit the nature of agency information better, but require more investment to program, and may -- depending on whether the client/server model is used -- be usable only by those with specialized software on their computers. The designer of a specialized graphical interface is likely to want the client computer to perform many of the screen management tasks, in order to reduce demands on bandwidth and server processing power. That approach would require that client users have copies of the specialized client software.

34 There may be circumstances in which direct access is not the most convenient option for users, because of the transaction costs of establishing direct connections or because they lack familiarity with the technology applications used in direct access. For these consumers, visiting a terminal in a public reference room may be more convenient. Obviously the tradeoff depends on how far away the nearest public reference room terminal is.

35 Bandwidth is the rate of information that can be transferred through a communications channel. It thus measures capacity of a network and usually is expressed in kilo (thousands) of bits per second (kbps), or mega (millions) of bits per second (mbps). For example, a 100 megabit per second Ethernet LAN has ten times the information transfer capacity of a 10 megabit per second Ethernet LAN.

36 "High" in this context means at least 56 kbps, up to 1.544 mbps (T1).

37 "Low" in this context means 9.6-28.8 kbps.
