The possibilities for automating agency rulemaking and adjudication have increased because of the availability of low-priced desktop computers and associated networking hardware and software. Networking technologies, particularly, have expanded possibilities for using information technology to transfer information in electronic form between participants in regulatory proceedings and agency decisionmakers.
Until the Internet became popular, however, most wide area networks were proprietary, involving mainframe computers at one end of a connection running proprietary applications and desktop computers or dumb terminals at the other end of the connection running proprietary interface software, sometimes utilizing international data communications standards such as X.25 over the communications link itself.
The Internet has accelerated an independent tendency toward two important technological phenomena: client/server computing and distributed database management, both of which are significant to agency automation. The client/server model permits software developers to allocate tasks between a "server" computer and a "client" computer, connected to each other through a network, so as to maximize performance, security, and other design criteria. For example, the client/server model permits the client, such as an individual user's desktop computer, to perform most or all of the tasks associated with graphical image management and screen displays, thus making it easier to implement windows-based and other highly graphical computing sessions without burdening the communications link with large quantities of data necessary to describe all the features of a particular screen image. 14 Using this model, the individual user works with a Microsoft Windows or Macintosh display, communicating data and instructions to the server by pointing and clicking with the mouse. The client sends greatly abbreviated messages of one or a few characters across the communications link based on its "mediation" between the graphical interface and the communications link. The server can cause a particular image or change in the user display to be presented by sending a similarly abbreviated character or character string based on its "knowledge" of the graphical images the client already has and on its "knowledge" of the operational details of the client's interface software. 15
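The saving in communications load described above can be illustrated with a minimal sketch. It compares the bytes needed to send every pixel of a screen with the bytes needed to send a short instruction naming an image the client already holds. The screen size, the "SHOW" message format, and the image table are hypothetical choices for illustration, not details from any actual system.

```python
# Hypothetical display: 640 x 480 pixels at 1 bit per pixel (an assumption).
WIDTH, HEIGHT, BITS_PER_PIXEL = 640, 480, 1

# Hypothetical table of images the client already stores locally.
CLIENT_IMAGES = {451: "docket-menu.bmp"}


def full_screen_bytes(width=WIDTH, height=HEIGHT, bits_per_pixel=BITS_PER_PIXEL):
    """Bytes crossing the link if the host sends every pixel of the screen."""
    return width * height * bits_per_pixel // 8


def command_bytes(picture_number):
    """Bytes crossing the link if the server only names a stored image."""
    message = f"SHOW {picture_number}"  # hypothetical message format
    return len(message.encode("ascii"))


def client_handle(message):
    """Client-side mediation: map a short message to a locally stored image."""
    verb, number = message.split()
    if verb != "SHOW":
        raise ValueError(f"unknown command: {verb}")
    return CLIENT_IMAGES[int(number)]


if __name__ == "__main__":
    print("full screen:", full_screen_bytes(), "bytes")
    print("command:", command_bytes(451), "bytes")
    print("client displays:", client_handle("SHOW 451"))
```

Under these assumptions the abbreviated message is smaller than the full screen image by several orders of magnitude, which is the point of allocating display tasks to the client.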
The second important phenomenon is distributed database management. This concept allows a user to combine data actually stored on a multiplicity of computers. For example, a client interested in a particular submission in notice-and-comment rulemaking could retrieve the docket kept on computer A, and also select a particular item from the docket, which would cause the desired document to be retrieved from computer B. Another document, for example, an opposing party's response, might be retrieved from computer C. Depending on the quality of the user's client software, the user might retrieve all of this material and have it presented to her as an integrated set with no indication that the elements of the set came from different computers.
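The docket example can be sketched in a few lines. Here each of the three computers is modeled as a simple in-memory dictionary, and the client assembles an integrated view without exposing which computer supplied each item. The host contents, keys, and function names are hypothetical; a real system would retrieve the items over a network.

```python
# Computer A holds the docket; each entry points to the host that stores the document.
COMPUTER_A = {
    "docket": [
        {"title": "Comment of Party X", "host": "B", "key": "comment-x"},
        {"title": "Response of Party Y", "host": "C", "key": "response-y"},
    ]
}
# Computers B and C hold the documents themselves (hypothetical contents).
COMPUTER_B = {"comment-x": "Text of the comment"}
COMPUTER_C = {"response-y": "Text of the response"}
HOSTS = {"B": COMPUTER_B, "C": COMPUTER_C}


def fetch_docket():
    """Retrieve the docket list from computer A."""
    return COMPUTER_A["docket"]


def fetch_document(entry):
    """Follow a docket entry to whichever computer stores the document."""
    return HOSTS[entry["host"]][entry["key"]]


def integrated_view():
    """Return (title, text) pairs with no indication of the source computer."""
    return [(entry["title"], fetch_document(entry)) for entry in fetch_docket()]


if __name__ == "__main__":
    for title, text in integrated_view():
        print(f"{title}: {text}")
```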
The flexibility and power of the client/server and distributed database models are enhanced when the protocols for implementing them are non-proprietary and "open." With open protocols, a multiplicity of designers and vendors can supply the pieces that can be combined into a distributed database or a client/server application.
Internet communication makes more efficient use of communications capacity than some alternatives such as dialup bulletin boards. When a member of the public establishes a dialup connection to an agency electronic bulletin board, the connection is held open for the entire duration of the interaction between the member of the public and the agency system. When a member of the public communicates with an agency server through the Internet, communications capacity is used only when packets of information actually are being transmitted. The only part of an Internet session that involves holding open a telephone connection, reserved for that particular session regardless of whether information is moving across it, is the telephone connection between a requester's modem and a -- usually local -- point of presence for an Internet access provider.
Despite its technological significance, the Internet worries many computer professionals. Its open nature means a significant loss of control by computer managers who hook up to it. Once they connect, they are potentially vulnerable to unknown persons who also connect to the Internet. While a variety of "firewall" techniques exist to prevent intruders from unauthorized activities on a computer system connected to the Internet, an Internet connection opens up security risks that do not exist in the absence of external network connections. In addition, because responsibility for the operational status and performance of Internet facilities is diffused among operators of multiple backbones, operators of Internet access services, operators of nodes that supply particular information items, and user facility managers, there is also concern that the Internet is less reliable than closed and proprietary network solutions.
Also, the Internet is based on UNIX and TCP/IP, operating system and network application technologies that may be unfamiliar to computer professionals trained and experienced on proprietary DEC or IBM systems.
Moreover, as the public switched telephone system becomes more digital and shifts to packet switching partially or entirely, 16 there may be less need for the Internet as a distinct networking protocol. But at most, this would eliminate the need for specialized Internet features at the IP (network) and TCP (transport) layers of the Open Systems Interconnection ("OSI") stack. 17 There would continue to be a need for non-proprietary open standards at the higher, application layers, such as the Telnet, 18 FTP, 19 Gopher, 20 WAIS, 21 and World Wide Web 22 applications associated with the Internet.
Despite skepticism about the Internet's limitations, its use to distribute public and private information outside the research context has grown rapidly. The Villanova Center for Information Law and Policy at Villanova Law School maintains an Internet server that, among other things, has a "Federal Web Locator," comprising pointers to some 650 Internet servers maintained by federal government entities. 23 More than 1,874,926 requests for information have been handled through this Federal Web Locator. The Library of Congress Thomas System, implemented through the Internet's World Wide Web, is one of the few places where the full text of bills introduced in the House and Senate can be obtained electronically. In mid-1995, OMB sponsored several electronic "town hall" discussions on the Internet, using newsgroup and listserv applications. Several hundred people from all over the country participated, representing diverse backgrounds and points of view. Private entities such as the Teachers Insurance and Annuity Association ("TIAA"), one of the largest pension plans in the country, have made participant and beneficiary information available through an Internet Gopher server. 24 Both the Department of Transportation and the Nuclear Regulatory Commission have established World Wide Web "home pages," shown in Figures 1 and 2.
Figure 1 - DOT Home Page
Figure 2 - NRC Home Page
On the other hand, present Internet applications are better at electronic publishing than at facilitating interactive communication. It is relatively straightforward to publish a notice of proposed rulemaking and associated documents and images through a Gopher menu or a World Wide Web page. Establishing an appropriate framework for exchanging pleadings or statements of litigation position in an adjudication is more difficult. One possible starting point for developing an adversarial proceeding matrix is newsgroup or listserv technology like that used in the OMB town hall initiatives, in which a group of people interested in the same subject post messages, all of which can be seen by the other participants. Under this concept, one party could post a complaint, followed by opposing parties posting their answers, followed by the first party's responses, and so on. The newsgroup or list then would represent the docket with the full contents of all of the submissions.
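The idea of a newsgroup or list serving as the docket can be sketched as a simple data structure: an ordered list of postings, each optionally replying to an earlier one. This is only an illustration of the concept; the class names, fields, and posting kinds are hypothetical, and no existing newsgroup software is being described.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Posting:
    number: int                     # position in the docket, starting at 1
    party: str
    kind: str                       # e.g. "complaint", "answer", "response"
    text: str
    reply_to: Optional[int] = None  # number of the posting being answered


@dataclass
class Docket:
    postings: List[Posting] = field(default_factory=list)

    def post(self, party, kind, text, reply_to=None):
        """Append a submission; every participant can see every posting."""
        if reply_to is not None and not any(p.number == reply_to for p in self.postings):
            raise ValueError(f"no posting numbered {reply_to}")
        posting = Posting(len(self.postings) + 1, party, kind, text, reply_to)
        self.postings.append(posting)
        return posting

    def replies_to(self, number):
        """List the postings that answer a given earlier posting."""
        return [p for p in self.postings if p.reply_to == number]


if __name__ == "__main__":
    docket = Docket()
    complaint = docket.post("Party A", "complaint", "Text of complaint")
    docket.post("Party B", "answer", "Text of answer", reply_to=complaint.number)
    docket.post("Party A", "response", "Text of response", reply_to=2)
    for p in docket.postings:
        print(p.number, p.party, p.kind, "-> replies to", p.reply_to)
```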
Alternatively, some World Wide Web applications permit the creation of specialized forms which might be adapted to an interactive administrative litigation context. In any event, these possibilities are at present untried and therefore unproved.
Receiving information in electronic formats relieves the agencies of burdensome data entry tasks; the data from outside parties goes directly into the computer system without having to be keyboarded or scanned. The risk of errors and omissions in the data entry process is virtually eliminated, although errors can occur in handling electronic submissions as well.
There are, however, at least two general limitations of electronic submission that vitiate its cost-saving and error-reduction potential. First, not everyone in the population entitled to submit information to agencies has access to the technology necessary to prepare electronic submissions. As the desktop computer revolution continues, the portion of the population without access to basic computer technology is diminishing rapidly, 25 but there probably will always be some irreducible minimum that is unable or unwilling to use basic computer technology. Moreover, to the extent electronic submissions must be presented in sophisticated or proprietary formats, the size of the relevant population lacking access to relevant filing technology increases proportionately. 26
The other limitation on electronic submission arises from the richness of formatting used in conventional submissions. Paper documents permit a rich variety of concepts to be communicated typographically, by typeface, by typesize, position on page, and physical placement of related elements, as in tables. 27 There is no single computer protocol or standard that permits all of these to be expressed unambiguously in computer-readable format. Different vendors, especially word processing software and desktop publishing software vendors, each have their own methods of coding typographic design elements. But if an agency uses one of these proprietary page description languages it excludes -- or at least significantly increases the cost to -- users of other products. For example, an agency can specify that electronic submissions must be presented in the popular WordPerfect word processing format, and such a specification would enable submitters to express a variety of typographical features, but it would disadvantage users of the Microsoft Word and Lotus Ami Pro products. There are non-proprietary standards or "grammars" for expressing a rich variety of typographical elements. The most flexible and prominent is Standard Generalized Markup Language ("SGML"). 28 Not many consumer-oriented products have SGML capability yet, however.
In addition, specialized software applications could be written for a particular agency or a particular proceeding that would structure user responses and create a computer file with the responses represented in an appropriate format for computer acceptance. The experience of using such software would be like filling out an electronic form.
There is an additional step that can be performed with scanned page images that generates a version of each page in character-based format. Optical Character Recognition ("OCR") searches for patterns of light and dark spots on the page corresponding to the patterns made by alpha-numeric characters and substitutes a sequence of alpha-numeric characters (or, more likely, ASCII codes) for the character images it finds on the page. Many legal applications, such as those regularly used to manage discovery documents in complex civil litigation, have two, linked, files for each page of material, the bitmapped image and the OCR translation. The software permits users to search the OCR version for words and phrases and presents both the OCR file and the bitmapped version of the corresponding page at the user's option.
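The linked-file arrangement can be sketched as a record that pairs each page's bitmap with its OCR text, so that a search runs against the OCR text and returns the corresponding image. The record layout, file names, and function name are hypothetical and are not taken from any particular litigation-support product.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: str
    bitmap_file: str  # scanned image of the page
    ocr_text: str     # character-based translation produced by OCR


def search(pages, phrase):
    """Search the OCR text; each hit carries a link to its bitmapped page."""
    needle = phrase.lower()
    return [page for page in pages if needle in page.ocr_text.lower()]


if __name__ == "__main__":
    pages = [
        Page("doc1-p1", "doc1-p1.tif", "Notice of proposed rulemaking"),
        Page("doc1-p2", "doc1-p2.tif", "Comments are due within sixty days"),
    ]
    for hit in search(pages, "rulemaking"):
        print(hit.page_id, "->", hit.bitmap_file)
```

A search of this kind is only as good as the OCR text it runs against: a word misrecognized by OCR will not be found even though it is legible in the bitmap.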
OCR technology works well only with "clean" bitmaps, ones not containing extraneous spots or marks and ones not containing handwritten annotations. Moreover, even on clean bitmaps, OCR rarely has an accuracy rate much better than 95%, meaning that the OCR file from a typical bitmapped page of typewritten text would have from ten to twenty "typographical" errors on it. Some recent research indicates the potential for improving accuracy by a few percentage points. For example, NRC's LSS Steering Committee was informed in its December 1994 meeting that the Information Science Research Institute at the University of Nevada 30 was able to achieve 97.4 to 98.5% accuracy, scanning second generation DOE documents with a Calera Recognition Systems system.
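The arithmetic behind the per-page error estimate can be sketched as follows. The text does not say whether accuracy is measured per character or per word; this sketch assumes per-word accuracy and a hypothetical page of 250 to 400 words, under which a 95% rate yields roughly the ten to twenty errors per page mentioned above.

```python
def expected_errors_per_page(units_per_page, accuracy):
    """Expected number of misrecognized units (words or characters) on a page.

    units_per_page: assumed count of words on the page (hypothetical).
    accuracy: fraction recognized correctly, e.g. 0.95 for 95%.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return units_per_page * (1.0 - accuracy)


WORDS_PER_PAGE = 250  # hypothetical page length, not a figure from the text

if __name__ == "__main__":
    for accuracy in (0.95, 0.974, 0.985):
        errors = expected_errors_per_page(WORDS_PER_PAGE, accuracy)
        print(f"{accuracy:.1%} accuracy: about {errors:.1f} errors per page")
```

On the same assumptions, moving from 95% to the 97.4-98.5% range cuts the expected errors per page by half or more, which is why a few percentage points matter.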
Many OCR software products are adroit at identifying possible errors and inviting human inspection and correction. Using such error-correction software, however, requires considerable human input, amounting to a minute or more per page.
Keyboarding involves a human keyboard operator typing the entire contents of a document submitted on paper. Keyboard operators can be trained to recognize certain information elements based on presentation on the printed page, such as the parties in the caption on a pleading. The keyboard operator then can enter certain data elements in particular fields, thus preserving some of the information represented by typographical position or features on the printed page. No individual keyboard operator can avoid making mistakes, but techniques and facilitating software products are available that reduce keyboarding errors by having every page keyboarded twice by two different operators, followed by a computer comparison, to identify differences and therefore possible errors. Obviously such a quality control technique doubles the labor costs of the keyboarding step.
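The computer comparison step in double keying can be sketched with Python's standard difflib module: two operators' transcriptions of the same page are compared word by word, and every disagreement is flagged for human review. The sample sentences and the function name are hypothetical, and commercial keyboarding products may use different comparison methods.

```python
import difflib


def keying_discrepancies(first_pass, second_pass):
    """Return (first, second) word runs wherever two transcriptions disagree."""
    a, b = first_pass.split(), second_pass.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    operator_1 = "The respondent filed an answer on May 3"
    operator_2 = "The respondent filed an anwser on May 8"
    for first, second in keying_discrepancies(operator_1, operator_2):
        print(f"operator 1: {first!r}   operator 2: {second!r}")
```

The comparison identifies only where the two versions differ, not which one is right, and it cannot catch an error that both operators make identically.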
Both optical and magnetic media come in easily removable formats, such as popular CDROMs used on desktop computers, and 3.5 inch magnetic diskettes. Both types of media also come in forms more or less permanently attached to devices used for writing to and reading from them, such as high speed, high capacity, magnetic and optical disk storage systems. Devices for writing to and reading from all of these forms of storage connect to other parts of the computing systems through standard protocols, thus making the other parts of the computing systems relatively indifferent to the particular type of storage medium used.
Because of the major differences in the density of representation of intelligence achieved by bitmaps compared with character-based representations, 31 much more storage capacity is required for systems that use bitmap images than for systems that rely on character formats.
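The difference in storage requirements can be worked through using the per-page estimates given in the accompanying footnote (about one kilobyte for a character-based page and about thirty kilobytes for a bitmap of the same page). The collection size of 100,000 pages is a hypothetical figure chosen only for illustration.

```python
TEXT_KB_PER_PAGE = 1     # character-based page, per the footnote's estimate
BITMAP_KB_PER_PAGE = 30  # bitmapped page, per the footnote's estimate


def compare_storage(pages):
    """Storage needed for the same pages in character and bitmap form."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    text_kb = pages * TEXT_KB_PER_PAGE
    bitmap_kb = pages * BITMAP_KB_PER_PAGE
    return {"text_kb": text_kb, "bitmap_kb": bitmap_kb, "ratio": bitmap_kb / text_kb}


if __name__ == "__main__":
    result = compare_storage(100_000)  # hypothetical collection size
    print(f"text:   {result['text_kb']:,} KB")
    print(f"bitmap: {result['bitmap_kb']:,} KB")
    print(f"ratio:  {result['ratio']:.0f} to 1")
```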
With respect to presentation, the main choices involve paper versus screen images, text based versus graphical, 32 and screen displays specialized for the particular agency or proceeding versus generic screen displays. 33 Historically, there was considerable consumer resistance to reading large amounts of information on computer displays, owing to the modest resolution and relatively small size of such displays. Now, large screen, high resolution displays, available at affordable prices, replicate the quality of a printed image available from a copy machine, which is acceptable to the vast majority of users.
Remote access can be provided either through agency-specific electronic bulletin boards or through the Internet or other wide area networks.
Any form of remote access theoretically increases the risk of unauthorized intrusion into an agency computer system, although well recognized techniques of protection reduce the risk to well within acceptable levels.
The quality of remote public access depends on the interaction between bandwidth 35 available for remote access and the type of information likely to be retrieved. If a typical public inquiry can be satisfied only by transferring large numbers of bitmaps or graphical images, a high bandwidth connection 36 is necessary to service the requests satisfactorily. On the other hand, if relatively small numbers of pages in character-based format constitute the typical public request, such requests easily can be satisfied through low speed modems and dialup telephone connections. 37 If a remote user has a 14.4 kilobit per second modem and an ordinary voice grade telephone line the user could download a 100 page document stored in text format in about 2 minutes, but the same document would take about an hour in image formats.
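The download comparison can be worked through as a rough calculation. The sketch below assumes the per-page sizes given elsewhere in this document (about one kilobyte per text page and about thirty kilobytes per bitmapped page), treats a kilobyte as 1,000 bytes, and ignores modem and protocol overhead, so it yields a lower bound. The raw times it produces are shorter than the two minutes and one hour stated above; the source does not explain the difference, which may reflect overhead or larger page sizes. The thirty-to-one ratio between the two formats is the same in both.

```python
TEXT_KB_PER_PAGE = 1     # assumed size of a character-based page
BITMAP_KB_PER_PAGE = 30  # assumed size of a bitmapped page


def transfer_seconds(pages, kb_per_page, modem_bps, bits_per_byte=8):
    """Lower-bound transfer time, ignoring protocol and modem overhead."""
    total_bits = pages * kb_per_page * 1000 * bits_per_byte
    return total_bits / modem_bps


if __name__ == "__main__":
    MODEM_BPS = 14_400  # the 14.4 kilobit per second modem from the text
    text_s = transfer_seconds(100, TEXT_KB_PER_PAGE, MODEM_BPS)
    image_s = transfer_seconds(100, BITMAP_KB_PER_PAGE, MODEM_BPS)
    print(f"text:   {text_s / 60:.1f} minutes")
    print(f"images: {image_s / 60:.1f} minutes")
    print(f"ratio:  {image_s / text_s:.0f} to 1")
```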
Footnotes
10 A server is a computer attached to a network that performs functions for more than one user. Some servers on small local area networks differ little if at all from individual desktop computers but specialize in access to files stored on their hard disks through the LAN. Other servers, such as those typically connected to wide area networks including the Internet, have greater performance and capacity than individual computers they support and may run specialized applications such as electronic mail, World Wide Web servers, or Gopher servers.
11 Bandwidth is a measure of the capacity of a communications link, frequently measured in bits per second. The typical bandwidth over a dialup telephone connection in 1988 was 1,200 bits per second, limited by the speed of low cost modems. In mid-1995, the typical dialup connection bandwidth is moving to 28.8 kilobits per second, because of newly available international standards operating at that speed. Typical dedicated data lines leased from local exchange telephone companies operate at 56 kilobits per second, "T1" (1.544 megabits per second), or "T3" (45 megabits per second).
12 Widely-used Internet applications (in addition to email), include telnet, a method of establishing a remote terminal connection to another computer across the Internet; file transfer protocol ("ftp"), a means for transferring files between computers linked together by the Internet; gopher, a user-friendly menuing system for making files and text available; news and newsgroups, a means for electronic discussions in which posted messages and their replies are accessible to anyone connected to an Internet node; and World Wide Web.
13 There is no such thing as a president or board of directors of the Internet, although there are voluntary cooperative bodies such as the Internet Engineering Task Force ("IETF") that discuss and formulate standards and protocols through documents called requests for comments ("RFCs").
14 Without the client/server model, the host would have to send every pixel on the screen every time any part of the screen display on the user's terminal changes. Pixels are the dots that, when formed into patterns, show characters and other meaningful shapes.
15 Instead of sending every pixel for a new screen image the host (now called the "server") sends a brief message that says, "Show picture number 451."
16 The asynchronous transfer mode ("ATM") is a relatively new protocol receiving favorable attention from telephone companies. It uses a specialized form of packet switching with standard sized packets and "policy-based routing" which means that packets which cannot tolerate delay are sent before packets that can.
17 OSI is a model of computer system functions adopted by international standards bodies. It is divided into seven layers, ranging from hardware, in layer one, to applications, in layer seven. The model contemplates that hardware and software performing functions at particular layers can interact with functions at other layers by adhering to standards defining the layers even though different designers and vendors are involved.
18 Telnet is a basic Internet application enabling one node connected to the Internet to start a terminal session on another node.
19 File transfer protocol ("ftp") is a basic Internet application enabling one node to transfer files to or from another node.
20 Gopher is an easy to use menuing system for making files available through the Internet.
21 Wide Area Information Servers ("WAIS") is the proprietary name for Z39.50, an international standard for full text searching across distributed networked databases.
22 World Wide Web is a popular Internet application, in which documents appear with hypertext links that, when activated, retrieve other information from the same or other Internet nodes.
23 http://www.vcilp.org
24 Teachers Insurance and Annuity Association/College Retirement Equities Fund, THE PARTICIPANT, Aug. 1995, at p. 10 (reporting that TIAA-CREF logged more than 250,000 connections to its Gopher service from early March to the end of July, 1995). See also Gopher://gopher/tiaa-cref.org, http://chronicle.merit.edu/.vendors/.tiaa/home.html.
25 Those without their own computers have increasing access through hardware, software, and communications links in public libraries and in commercial enterprises such as Kinko's.
26 It is important to realize, however, that one may need proprietary format technology only to submit information, and may not need it to access information already filed, and that larger populations may need only to access information using the proprietary formats, not to submit such information.
27 Effective legal representation is, in significant part, persuasiveness, and the appearance of a document is an important tool for a legal representative -- or so many practicing lawyers believe.
28 Standard Generalized Markup Language ("SGML") is an international standard that permits (mostly) textual material to be marked up with tags that allow a variety of output devices, including video displays, and different kinds of printers, to present the material with formatting appropriate for the device. It allows textual databases to be organized based on the conceptual structure of documents, making headings, indexes, and body text computer-recognizable. It thus avoids approaches that embed formatting instructions or codes according to the characteristics of a particular output device. The popularity of SGML has grown with increasing markets for CDROM products and with the popularity of the World Wide Web, which uses a markup language, html, that resembles SGML in many ways. See generally Henry H. Perritt, Jr., Format and Content Standards for the Electronic Exchange of Legal Information, 33 JURIMETRICS J. 265 (1993) (explaining utility of SGML for legal information).
29 The most popular coding scheme for representing alphanumeric information is the American Standard Code for Information Interchange ("ASCII"). Using ASCII, one need not communicate the bits that describe the letter "A." One simply sends the decimal number "65."
30 The University of Nevada is a contractor to the Department of Energy, responsible, among other things for "research in support of DOE development of a computerized Licensing Support System (LSS) based on OCR technology, as agreed to in negotiations with the NRC for licensing of a potential high-level nuclear waste repository at Yucca Mountain. The research will focus on enhancing existing technology in order to create faster, more efficient, and more effective software. UNLV will design, develop, and execute a research program aimed at increasing the efficiency of the LSS. This plan will be drafted in conjunction with DOE and its contractors to ensure that the focus of the research provides optimal benefits to the LSS. Examples of possible areas include reprocessing of text, improving recognition algorithms, intelligent automation of indexing processes, improving retrieval effectiveness, and researching hardware vs. software solutions to document searching." 54 Fed. Reg. 52981 (Dec. 26, 1989) (award of cooperative agreement).
31 A typical character-based representation of a textual page would be about one kilobyte. A typical bitmap of the same page would be about thirty kilobytes.
32 Graphical displays are easier for people to use because they enlarge the possibilities for representing and emphasizing information. Graphical displays require more bandwidth between the originating computer and the display, and require more processing power, however.
33 Specialized screen displays fit the nature of agency information better, but require more investment to program, and may -- depending on whether the client/server model is used -- be usable only by those with specialized software on their computers. The designer of a specialized graphical interface is likely to want the client computer to perform many of the screen management tasks, in order to reduce demands on bandwidth and server processing power. That approach would require that client users have copies of the specialized client software.
34 There may be circumstances in which direct access is not most convenient for users because of the transaction costs of establishing direct connections, or because they lack familiarity with the technology applications used in direct access. For these consumers, visiting a terminal in a public reference room may be more convenient. Obviously the trade off depends on how far away the nearest public reference room terminal is.
35 Bandwidth is the rate at which information can be transferred through a communications channel. It thus measures the capacity of a network and usually is expressed in kilobits (thousands of bits) per second (kbps) or megabits (millions of bits) per second (mbps). For example, a 100 megabit per second Ethernet LAN has ten times the information transfer capacity of a 10 megabit per second Ethernet LAN.
36 "High" in this context means at least 56 kbps, up to 1.544 mbps (T1).
37 "Low" in this context means 9.6-28.8 kbps.