Extensible Provider for Windows PowerShell - ExtBrain


Charles University in Prague

Faculty of Mathematics and Physics

MASTER THESIS

Bc. Josef Závišek

Extensible Provider for Windows PowerShell

Department of Distributed and Dependable Systems

Supervisor: Mgr. Pavel Ježek

Consultant: Ing. Tomáš Novotný

Study program: Computer science, software systems

2011


I would like to thank all the people who helped me with this thesis. Firstly, I would like

to express my thanks to my supervisor Mgr. Pavel Ježek for his professional guidance,

valuable ideas and help with the text of this thesis. Special thanks go to my consultant

Ing. Tomáš Novotný for all the support and encouragement, for help with the problems that occurred during software development, and for the hours spent in consultations. I cannot fail to mention

Mgr. Zuzana Hořká, who helped me with the language corrections. And finally, I want to

express thanks to my family and my girlfriend for always being there for me.

I hereby declare that I have completed this master thesis independently and that I have listed all the literature and publications used. I have no objection to the use of this thesis in compliance with §60 of Act No. 121/2000 Coll. (the Copyright Act), as amended.

In Prague, April 7, 2011 ..............................................

Josef Závišek


CONTENTS

ANNOTATION

1 INTRODUCTION
  1.1 Motivation
    1.1.1 Tools for data manipulation
    1.1.2 What can be improved
    1.1.3 ExtBrain project
  1.2 PowerShell overview
  1.3 Goals
    1.3.1 Why universal provider
    1.3.2 Goals summary

2 ANALYSIS
  2.1 Introduction to PowerShell
    2.1.1 PowerShell architecture
    2.1.2 Extending PowerShell
    2.1.3 Requirements to implement a custom provider
  2.2 Universal filesystem provider
    2.2.1 Idea of the provider
    2.2.2 Requirements
  2.3 Project separation into modules
  2.4 Reading process
    2.4.1 Path matching
    2.4.2 Readers
    2.4.3 Transferring data between readers
    2.4.4 Reader items architecture
    2.4.5 Expansion and conversion
    2.4.6 Stream sharing during reading process
    2.4.7 Sharing an archive object
  2.5 Writing process
    2.5.1 Write path accumulation and collisions
    2.5.2 Write items separately or together
    2.5.3 Stream disposing problem
  2.6 The library for compression and decompression
  2.7 SevenZip analysis
    2.7.1 SevenZip wrapper
    2.7.2 SevenZip COM API overview
    2.7.3 C# SevenZip architecture

3 IMPLEMENTATION
  3.1 Modules description
  3.2 SevenZip library
    3.2.1 COM interop classes
    3.2.2 Decompression interop classes
    3.2.3 Loading SevenZip library into C#
    3.2.4 The decompression process
    3.2.5 Compression interop classes
    3.2.6 The compression process
    3.2.7 SevenZip API in C#
  3.3 Implementation of the PowerShell extension
    3.3.1 Snap-In implementation
    3.3.2 FileSystem provider implementation
  3.4 Reading process
    3.4.1 Readers architecture
    3.4.2 Readers
    3.4.3 Conversion & Expansion
  3.5 Writing process
    3.5.1 Writers architecture
    3.5.2 Writers
    3.5.3 Input items for writing
    3.5.4 Collisions
  3.6 Stream tools

4 USER DOCUMENTATION
  4.1 SevenZip library usage
  4.2 Provider usage
  4.3 Future extensions

5 CONCLUSION
  5.1 Evaluation
  5.2 Comparison with other products
  5.3 Future visions

REFERENCES

APPENDICES


ANNOTATION

Title: Extensible provider for Windows PowerShell

Author: Bc. Josef Závišek

Department: Department of Distributed and Dependable Systems

Supervisor: Mgr. Pavel Ježek

Supervisor’s email address: [email protected]

Consultant: Ing. Tomáš Novotný

Abstract:

This thesis deals with the design and implementation of an extensible provider for Windows PowerShell. The provider allows registering adapters that provide access to various data stores. The thesis gives an introduction to PowerShell and outlines how new extensions are realized. It then elaborates the architecture of the provider in detail. The next part is devoted to the design and implementation of an adapter for compressed files. For this purpose, the SevenZip library is used, which had to be adapted for use from the C# language. Therefore, the thesis also includes a description of the wrapper that allows the library to be used from managed code.

Keywords: extensible, provider, PowerShell, SevenZip, COM, wrapper

Title (in Czech): Rozšiřitelný provider pro Windows PowerShell

Author: Bc. Josef Závišek

Department: Department of Distributed and Dependable Systems

Supervisor: Mgr. Pavel Ježek

Supervisor's email address: [email protected]

Consultant: Ing. Tomáš Novotný

Abstract:

The presented thesis deals with the design and implementation of an extensible provider for Windows PowerShell. The provider allows the registration of adapters that make various data stores accessible. The thesis gives a brief introduction to PowerShell and outlines the way new extensions are realized. The architecture of the provider is then elaborated in more detail. The next part of the thesis is devoted to the design and implementation of an adapter for compressed files. The SevenZip library is used for this purpose, which had to be adapted for use from the C# language. The thesis therefore also includes a description of the library and of the implementation of a wrapper that allows the library to be used in managed code.

Keywords: extensible, provider, PowerShell, SevenZip, COM, wrapper


1 INTRODUCTION

1.1 Motivation

The expansion of the IT industry in recent years has intensified competition among companies and thus increased the demands on developers. They have to perform their daily tasks more quickly while keeping the reliability and quality of their work at a high level. As a result, many companies develop custom tools and frameworks that give them an advantage over competitors and help them do their jobs faster and better.

Developers' everyday work includes not only writing code but also additional data manipulation such as copying, deleting and moving, since the data delivered by customers are often wrong or incomplete. This happens especially when new versions of the software are released, and since many projects are developed iteratively, the time taken by this additional work keeps growing.

The intention of this thesis is to improve the way developers manipulate data. Since this is a very extensive area, let us first look at the tools that are currently used and specify what in particular can be improved.

1.1.1 Tools for data manipulation

The most commonly used utilities for data manipulation are so-called file managers, and there are scores of them. The most popular ones include Altap Salamander, FreeCommander, Total Commander, Midnight Commander, GNOME Commander and others. A typical file manager application consists of three main parts: two panels that show the structure of the storage, and a command line, which is essentially a minimized command (shell) window. This brings us to another type of frequently used tool – command-line shells.

Command-line shells enable interaction with the computer as well as data manipulation. They include a set of commands through which the client controls the computer. These commands can often be put into scripts and thus perform more complex tasks. Similarly to file managers, many shells exist on Windows, Unix and other platforms; well-known ones are, for example, Bourne Shell, C Shell, Emacs Shell, PowerShell and cmd.exe. Shells and file manager applications are often connected together, as many file managers enable invoking shell commands.

1.1.2 What can be improved

The problem with file managers is that they are often old and thus tied to old shells (e.g. Total Commander, which is developed in Delphi and still uses the old cmd.exe). This fact gave rise to the idea of building a new file manager that takes advantage of a newer shell. The idea became fundamental for the ExtBrain project, particularly for its part called Commander. The project is oriented towards Windows PowerShell, which is still relatively young: its first release appeared in 2006, and so far no file manager that would leverage PowerShell's infrastructure has been developed. Since PowerShell's popularity has been growing in recent years, many extensions have appeared, and the need for an application that uses it effectively is therefore increasing. The ExtBrain project aims to fill this gap, and this thesis forms part of it.

1.1.3 ExtBrain project

ExtBrain is a research project being created at the Czech Technical University. Its aim is to simplify the everyday tasks of researchers, software developers, project managers and other power users who need to extract, manipulate and exchange information. The project consists of four main parts called ExtBrain Communicator, ExtBrain Commander, ExtBrain Extractor and ExtBrain for Android. More information about this project can be found at [1].

This thesis relates to the ExtBrain Commander part, whose intention is to develop a file manager application that leverages the PowerShell infrastructure. However, it is not focused only on the application itself; its aim is to create a whole platform for data manipulation. The structure of the ExtBrain Commander project is sketched in Figure 1.

Figure 1 ExtBrain Commander structure

However, PowerShell must be adjusted in order to be used in such a project, and this thesis deals with these necessary adjustments. Therefore, before the goals of this thesis are specified, let us first introduce the PowerShell concepts.

1.2 PowerShell overview

Every release of Microsoft Windows has included a command-line tool through which the client could control the computer. Old versions of these tools supported only basic operations; for other purposes, separate applications had to be implemented, which were then invoked from these shells. In order to automate various tasks, they included a scripting language. Well-known BAT files could put together several commands and perform more complicated operations, but this scripting language is very limited and not suitable for creating complex scripts.

Windows PowerShell is a new command-line shell from Microsoft, developed to replace the old cmd.exe. It brought many benefits, which are briefly discussed in this chapter. More information can be found in [2], [3], [4].

Cmdlets

Rather than building custom applications for each task, PowerShell includes small utility programs called cmdlets. These can either be run directly from the command shell prompt or called from within a batch file or script. Writing a custom cmdlet is very simple, and a lot of them have appeared since PowerShell's release. A good thing about them is that Microsoft strictly keeps their naming consistent. Every cmdlet name consists of a verb and a noun, where the verb expresses the intended action and the noun represents the item to which the action relates. This helps with learning them, as their names can often be guessed. Moreover, users can define alias names for each cmdlet and adjust them according to their own preferences. Many standard cmdlets have aliases whose names are adopted from the old cmd.exe or Unix shells.


Object pipe

PowerShell is written in .NET¹ and it is object oriented. “Unlike traditional command-line environments that work by returning text results to the end user or routing (“piping”) text to different command-line utilities, Windows PowerShell manipulates .NET Framework objects directly.” [5] For example, when the Get-Service cmdlet is called in Windows PowerShell, a .NET Framework object that represents the service is written to the pipe. It can then be passed to other cmdlets for further processing. A sample is shown in Figure 2.

Figure 2 Object pipe sample

Scripting

Windows PowerShell uses its own scripting language. This language is dynamically typed and is designed to be consistent with higher-level languages used in .NET, such as C#. It supports variables, functions, branching, loops and other language constructs. PowerShell provides full access to the whole .NET Framework, COM² and WMI³, and therefore almost any operation achievable in .NET can be done through PowerShell as well. Moreover, tasks can be performed both on a local and on a remote computer.

Hosting

One advantage is that PowerShell can be hosted in another managed application⁴ which can instantiate a Runspace (an instance of the PowerShell runtime). The application can then run cmdlets within the Runspace and display the results to the client or perform additional tasks.
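For illustration, the following sketch (not taken from the thesis code; the executed command is an arbitrary example) hosts the engine in a console application, runs one pipeline in a Runspace and prints the results:

using System;
using System.Management.Automation;
using System.Management.Automation.Runspaces;

class HostingSample
{
    static void Main()
    {
        // Create and open an instance of the PowerShell runtime.
        using (Runspace runspace = RunspaceFactory.CreateRunspace())
        {
            runspace.Open();

            // Run a single command line and display its results to the client.
            using (Pipeline pipeline = runspace.CreatePipeline("Get-Process"))
            {
                foreach (PSObject result in pipeline.Invoke())
                {
                    Console.WriteLine(result);
                }
            }
        }
    }
}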

Providers

PowerShell providers are .NET programs that allow the client to work with data stores as if they were mounted drives. This simplifies access to data outside the PowerShell environment. For example, the client can access the registry as if it were a file system, which means the same cmdlets can be used as when working with files and folders. The cmdlets that relate to providers are shown in Table 1; through these, the common operations on the storage can be achieved.

Cmdlet          Alias   cmd command   Description
Get-Location    gl      pwd           Gets the current directory.
Set-Location    sl      cd, chdir     Changes the current directory.
Copy-Item       cpi     copy          Copies files.
Remove-Item     ri      del           Removes a file or directory.
Move-Item       mi      move          Moves a file.
Rename-Item     rni     rn            Renames a file.
New-Item        ni      n/a           Creates a new empty file or folder.
Clear-Item      cli     n/a           Clears the contents of a file.
Set-Item        si      n/a           Sets the contents of a file.
mkdir           n/a     md            Creates a new directory.
Get-Content     gc      type          Sends the contents of a file to the output stream.
Set-Content     sc      n/a           Sets the contents of a file.

Table 1 Provider cmdlets

¹ The .NET Framework is a programming infrastructure created by Microsoft for building, deploying and running applications and services.
² COM is a platform-independent, distributed, object-oriented system for creating binary software components that can interact. [29]
³ Windows Management Instrumentation (WMI) is the infrastructure for management data and operations on Windows-based operating systems. [28]
⁴ A managed application is an application executed under the “management” of a Common Language Runtime virtual machine.

1.3 Goals

Looking at existing file managers, most of them have a plugin architecture that can be extended via so-called packer plugins. These plugins allow users to browse the content of files. A good example is the packer plugin for archive files in Total Commander, which enables users to go through the content of archives in the same way as if they were going through the file system. PowerShell, however, does not have this ability. Microsoft included several default provider implementations through which the client can access the computer's file system, the registry or the certificate store and perform common operations like copying, reading, deleting and moving. It is not possible, though, to access data inside files. The aim of this thesis is to add such support to PowerShell so that the future file manager application can use it. Let us see how.

The idea is to rewrite the original PowerShell file system provider and extend its functionality so it can register adapters for various file types. Each adapter will then provide access to the inner structure of a particular file format. Let us briefly discuss why a universal provider is a good way of adding such functionality to PowerShell.

1.3.1 Why universal provider

Why implement a universal provider

Although data come in different flavours, some similarities can be found when they are considered at a higher level. One of them is that data are often hierarchical. For example, an xml file contains elements and the elements contain their subelements, archive files contain a set of files, databases contain tables with rows where each row contains several columns, etc.

That is what PowerShell providers are designed for. Since users who work with PowerShell are aware of the cmdlets that can be used with providers, they can operate on the storage natively. In addition, it enables them to take advantage of everything that PowerShell provides, such as streaming objects through the pipeline, consistent formatting and output, scripts, functions and more. By implementing a data provider for a particular backend data store, the client can access data in the same way regardless of how the data are stored in the backend. This also enables scripts which access data to decouple themselves from storage details.

Why not implement a bunch of cmdlets

Although data could be accessed via a set of cmdlets, users would need to learn new commands for each type of data. PowerShell already includes cmdlets that are designed to be consistent and intuitive for the user. For instance, rather than adding new commands that unpack compressed archives, it is better to reuse those prepared for going through the file system in order to go through the content of an archive file.


Why not implement a set of providers

Another option is to implement a provider for each data type separately and add it to PowerShell independently – for example, one provider for xml files, one for archive files, etc. However, clients would then have to register a new drive for each file that they want to work with and afterwards switch to the newly added drive.

1.3.2 Goals summary

Let us summarize the goals of this thesis. The first objective is to design the provider architecture so it can be extended by adapters in the future. A related aim is to decouple data manipulation from the provider so it can be used separately. The secondary objective is to add an adapter for archive files in order to verify that the architecture of the provider and the data manipulation module works. As a result, the thesis covers these goals:

- design of the architecture of the modular data tool
- implementation of the provider that will use this data tool
- selection of a suitable library for archive files
- implementation of the adapter that will enable manipulating archive entries
- implementation of the extension for PowerShell with the provider


2 ANALYSIS

2.1 Introduction to PowerShell

2.1.1 PowerShell architecture

PowerShell consists of three main parts – a central execution engine, a set of cmdlets and providers, and a customizable user interface (shown in Figure 3). It ships with several default implementations of cmdlets, providers and the user interface, and many third-party implementations are provided by other groups or external companies.

Figure 3 PowerShell architecture

Host application

The engine is designed to be hostable in different application environments. The host is an application which uses PowerShell functionality to perform its tasks. The host can be a console application, a Windows application (with a UI) or a web application (ASP.NET). In order to communicate with the PowerShell engine, it has to implement the host interface, which includes:

1. Getting input from users

2. Reporting progress information

3. Output and error reporting

Page 12: Extensible Provider for Windows PowerShell - ExtBrain

12

PowerShell contains a default console-based implementation of the host called PowerShell.exe. For those who do not like this environment, several more user-friendly and sophisticated environments are available, such as PowerGUI [6], PowerShell Plus [7] and others.

Engine

The PowerShell engine provides the execution environment for cmdlets, providers,

functions, filters, scripts and external executables and contains the core execution

functionality. This functionality is exposed through the Runspace interface, which is used by

the hosting application to interact with the engine. “At a high level, the engine consists of a

runspace, which is like an instance of the engine, and one or more pipelines, which are

instances of command lines.” [8] Pipelines interact with cmdlets through the Cmdlet

interface and similarly with providers through a well-defined set of provider interfaces.

Snap-In

The Snap-Ins represents the key entry point when extending PowerShell. Good

definition of Snap-In is given by [8]: “A snap-in is a .NET assembly or set of assemblies that

contains cmdlets, providers, type extensions, and format metadata.” The default Microsoft‟s

implementations of providers and cmdlets are also implemented within these Snap-Ins:

- Microsoft.PowerShell.Diagnostics

- Microsoft.WSMan.Management

- Microsoft.PowerShell.Core

- Microsoft.PowerShell.Management.Utility

- Microsoft.PowerShell.Management.Host

- Microsoft.PowerShell.Management

- Microsoft.PowerShell.Security

2.1.2 Extending PowerShell

PowerShell can be extended by adding a new Snap-In which contains cmdlets or providers. Basically, any .NET assembly can become a PSSnapIn if it contains an implementation of the installer class. Two types of installer classes are provided – PSSnapIn and CustomPSSnapIn. The Snap-In implementation has to inherit from one of these two and override a few fields. Inheriting from CustomPSSnapIn gives the user more control over which cmdlets and providers will be registered when the Snap-In is loaded.
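As a minimal sketch (the class name and property values are illustrative, not the thesis implementation), a Snap-In can look as follows; the RunInstaller attribute marks the class for the installation utility:

using System.ComponentModel;
using System.Management.Automation;

// The three overridden properties identify the Snap-In to PowerShell; cmdlets
// and providers found in the assembly are registered when the Snap-In is loaded.
[RunInstaller(true)]
public class SampleSnapIn : PSSnapIn
{
    public override string Name { get { return "SampleSnapIn"; } }
    public override string Vendor { get { return "Sample Vendor"; } }
    public override string Description
    {
        get { return "Registers sample cmdlets and providers."; }
    }
}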

Writing a cmdlet

A cmdlet is designed to perform one particular task. Its implementation is simple, since the PowerShell runtime helps with parameter parsing, error reporting and output writing. The cmdlet has to inherit from the PSCmdlet class and override the method called ProcessRecord, which contains the cmdlet logic. Parameters of the cmdlet are automatically bound to the class properties that are marked with the ParameterAttribute; conversion to the specified type is done behind the scenes. Additionally, the cmdlet can perform parameter validations and transformations, define default parameter positions, perform start-up or closing operations and other tasks.
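A minimal cmdlet following this description might be sketched as follows (the verb-noun pair and the parameter are illustrative only):

using System.Management.Automation;

// Invoked as: Get-Greeting -Name "World"
[Cmdlet(VerbsCommon.Get, "Greeting")]
public class GetGreetingCommand : PSCmdlet
{
    // The runtime binds this parameter and converts the argument automatically.
    [Parameter(Position = 0, Mandatory = true)]
    public string Name { get; set; }

    // Contains the cmdlet logic; called by the PowerShell runtime.
    protected override void ProcessRecord()
    {
        WriteObject("Hello, " + Name);
    }
}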

Writing a provider

Implementation of a provider involves a decision about provider capabilities – whether it will read items from the storage, navigate through the storage, etc. PowerShell contains several classes that the provider can inherit from. Each class defines virtual methods through which the standard provider cmdlets are performed. To be more specific, if the provider supports navigating through the storage, it should be derived from the NavigationCmdletProvider class, which defines methods like ItemExists, GetChildName and GetParentPath that are used to perform, for example, the Set-Location cmdlet. If the client does not override one of the parent's virtual methods, PowerShell tries to use its default implementations. The requirements for provider implementations are discussed in the following chapter in more detail.
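A navigation-capable provider can then be sketched like this (the provider name and the trivial method bodies are placeholders, not the thesis implementation):

using System.Management.Automation;
using System.Management.Automation.Provider;

[CmdletProvider("SampleProvider", ProviderCapabilities.None)]
public class SampleNavigationProvider : NavigationCmdletProvider
{
    // Used e.g. by Set-Location and Test-Path to validate a path.
    protected override bool ItemExists(string path)
    {
        return true; // a real provider would query its data store here
    }

    protected override bool IsValidPath(string path)
    {
        return !string.IsNullOrEmpty(path);
    }

    // Used when resolving paths during navigation.
    protected override string GetChildName(string path)
    {
        int index = path.LastIndexOf('\\');
        return index < 0 ? path : path.Substring(index + 1);
    }
}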

2.1.3 Requirements to implement a custom provider

In PowerShell, providers present consistent interfaces to custom data stores. There are several types of providers, and developers must choose which one to use for controlling access to the data store. The selection of the provider base class affects which cmdlets actually work with the provider. The following section describes what options developers have and which type of provider enables a particular set of operations on data. Additional useful information about provider concepts is available in [9].

Provider base classes

Every provider base class inherits from CmdletProvider, which contains several methods and properties for interacting with the provider infrastructure or the host. Providers usually do not inherit directly from this class, as it defines no methods for performing tasks on data. It does, however, define two helpful callbacks called Start and Stop that can be used to observe when the provider is initialized or terminated and thus run start-up or clean-up code. It is more reasonable to derive from one of the classes described in the following paragraphs. Their hierarchy is sketched in Diagram 1.

Diagram 1 Hierarchy of provider base classes

DriveCmdletProvider

The DriveCmdletProvider class is the base class for all providers that enable the creation and removal of drives. Drives provide a way in PowerShell to logically or physically partition a provider's data store. This base class contains methods for default drive initialization and for adding or removing drives. PowerShell stores information about the current location and drive in the PSDriveInfo property of the CmdletProvider class, which is accessible to descendants.

ItemCmdletProvider

By deriving from this base class, the provider indicates that it can access data located by paths. A path can point to one or more items within the store. An important fact is that PowerShell allows both slashes and backslashes in paths; therefore, the provider should normalize them accordingly. The operations defined by this base class allow users to retrieve, clear and invoke data. ItemCmdletProvider inherits from the DriveCmdletProvider class, so it has to support all the previously mentioned drive operations. Diagram 2 shows the main methods to be overridden.


Diagram 2 DriveCmdletProvider and ItemCmdletProvider classes

ContainerCmdletProvider

The ContainerCmdletProvider is a descendant of the previously mentioned class and adds functionality that introduces a sense of hierarchy. Several cmdlets work on multi-layered data storage; they provide a way to perform tasks like copying, removing or renaming items. The main methods to be overridden are shown in Diagram 3.

NavigationCmdletProvider

This base class adds support for nested storages and relative paths. Using the filesystem as an example, nested containers would be directories and subdirectories; these are not supported by the ContainerCmdletProvider. Relative paths make it possible to use the current location set by the Set-Location cmdlet as a starting point when passing paths to other provider cmdlets. The main methods of this base class are shown in Diagram 3.

Diagram 3 ContainerCmdletProvider and NavigationCmdletProvider classes

IContentCmdletProvider & IPropertyCmdletProvider

There are two optional interfaces that the provider can implement. These define methods for getting and setting the content or properties of items within the storage. IContentCmdletProvider defines methods that return content reader or writer objects through which the content is obtained and set. Therefore, another requirement is to provide such readers and writers, which must implement the IContentReader and IContentWriter interfaces. The main methods of these interfaces are shown in Diagram 4.
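For illustration, a simple content reader over a text stream could be sketched as follows (an assumed shape; the readers actually used by the provider are described in chapter 3):

using System.Collections;
using System.IO;
using System.Management.Automation.Provider;

// Returns lines of a text file to Get-Content, readCount items at a time.
public class LineContentReader : IContentReader
{
    private readonly StreamReader reader;

    public LineContentReader(Stream stream)
    {
        reader = new StreamReader(stream);
    }

    public IList Read(long readCount)
    {
        // By PowerShell convention a readCount of 0 means "read to the end";
        // that case is omitted here for brevity.
        var lines = new ArrayList();
        string line;
        while (lines.Count < readCount && (line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }

    public void Seek(long offset, SeekOrigin origin)
    {
        reader.BaseStream.Seek(offset, origin);
        reader.DiscardBufferedData();
    }

    public void Close() { reader.Close(); }

    public void Dispose() { reader.Dispose(); }
}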


Diagram 4 Optional provider interfaces

Conclusion

The methods shown in the diagrams of the provider base classes mostly have counterparts with the “DynamicParameters” suffix (e.g. CopyItem and CopyItemDynamicParameters). These methods can be implemented when the developer of the provider needs to add a custom option to the associated cmdlet (e.g. Copy-Item). Developers do not have to override all the methods of the provider base classes, as PowerShell can use its own default implementations. However, it is highly recommended to override them, because otherwise PowerShell tries to figure out what the storage structure is and calls other provider methods repeatedly (like ItemExists) in order to gain the necessary information.

2.2 Universal filesystem provider

2.2.1 Idea of the provider

The idea of the universal provider is the following: the user invokes cmdlets that are reserved for going through the storage. When the provided path points to the filesystem, the provider behaves in the same way as the original one. However, when the user passes a path which points to an item inside a file, the provider checks whether it has a registered reader which can access the inner file content. If there is such a reader, the provider uses it to read the content and enables the client to navigate through the inner file structure, obtain items from the file, and copy, move, delete or rename them.

PS> Get-Item D:\Temp\archive.zip\Folder\file.xml\root\*
PS> Copy-Item D:\Temp\archive.zip\file.txt D:\NewFolder\file.7z\Folder
PS> Remove-Item D:\Temp\archive.7z\Folder\*.txt
PS> Move-Item D:\Temp\file.txt D:\Temp\existingArchive.7z\Folder
PS> Get-ChildItem D:\Temp\archive.7z

Figure 4 Universal provider usage samples

Next, when the client invokes the cmdlet for creating a new item, or moves an existing item to a new location, the provider analyses the path and creates all its segments accordingly. This means that when the specified path points to a location inside a file, the provider creates the file with the appropriate structure. The provider will contain registered writers, where each writer will process one or more file types.


Another scenario is when the user asks for the file content. The provider checks which reader is registered for the file type at the path and returns the most suitable object. For example, it returns an XDocument object directly from xml files rather than passing a string to the pipe. Samples of commands for these operations are sketched in Figure 4.

2.2.2 Requirements

The development of a provider with the features mentioned above includes several tasks. The following paragraphs give an overview of them; they are elaborated later in more detail.

Separate project into modules

The aim is not to implement all the features in the provider itself but to develop separate modules which can be used independently, even without PowerShell. There must be a part which performs data manipulation, a part which manages the registered readers and writers, etc. Therefore, decisions about these modules must be discussed.

Implement PowerShell provider

For the purposes of this thesis, it is necessary to derive from NavigationCmdletProvider and implement all the methods of the provider base classes – the drive, item and container methods as well as the navigation methods. By deriving from any ancestor class, the provider would lose the navigation functionality that is crucial for this project.

Figure out how the reading and writing process will work

The writing and reading processes bring many questions: How will the loaded data be passed to other readers? Will reading be performed immediately or lazily? How will the readers be selected, and who will be responsible for this? Will the path be processed part by part or by whole segments? These and other questions must be taken into account.

Choose and adapt the library for archive files

The intention is to add support for archive files, as they are a good example of further structured data. For this reason, the SevenZip library was chosen; what led to its selection and how it was adapted for C# code must be discussed.

Implement sample readers and writers

In order to see whether the architecture works, it is necessary to include some default reader and writer implementations. The thesis provides support for archive file processing as well as readers for more convenient xml and text file reading.

2.3 Project separation into modules

Let us see how the development of the universal provider can be separated into individual parts. Obviously, one part of the work must include the provider itself and the Snap-In which will be registered in PowerShell. However, data reading and modification must be extracted out of PowerShell's scope: they should not be tied inside the provider, so they can be used by other projects separately. This therefore forms another independent module. The provider will delegate its method calls to this module and obtain data from it.


Diagram 5 Modules

The data module must have an architecture through which the ability to read and modify a particular file type can be added in the future. The idea is that the data module will have multiple registered adapters, where each adapter provides access to one file type. One adapter could theoretically cover both the reading and the writing capability, but then developers who would like to extend only the reading or only the writing part of the data module would have no way to accomplish this. This resulted in the idea of separating the reading and writing parts so they can be extended independently.

It is important to realize that the relation between a particular file type and the readers (or writers) registered for it is n to n. To be more specific, one reader can read multiple file types, and one file type can be read by multiple readers. A good example of such a relationship is an xml file, which can be read both by the text reader and by the xml reader, while the text reader can read both text and xml files. That is why there must be a module that holds these relations and carries the preferences according to which the readers (or writers) are selected. This module must be easily replaceable in order to let developers using the data module implement the rules themselves, or to let the client decide which reader will be used when reading a particular data file. Since the reading and writing parts are separated and the preferences applied during reading can differ from those applied during writing, there will be two modules for this purpose. This division is sketched in Diagram 5.

Another quite separate part of the thesis is a library for compression and decompression, which is used for the implementation of the archive reader and writer. The library, together with a sample application that shows its usage, thus forms another outcome of this thesis.

2.4 Reading process

The idea of the process

The idea of the reading process is the following: the data module obtains a path from the user. First, it calls the reader provider module, which selects the first reader according to the path root. Next, the obtained reader loads the matching items. It can happen that the path points to entries inside a file and the first reader does not know how to load them. Therefore, the items are passed to the expansion procedure, which processes the next path parts recursively until the whole path is matched. Each iteration of this process calls the reader provider module in order to select the most suitable reader, one which understands the structure of the item that should be expanded. The code sketch in Figure 5 shows the intended algorithm.

ArrayOfResults ReadItemsFromPath(path)
{
    rootReader = readerProvider.GetRootReader(path);
    inputItems = rootReader.ExpandRootPath(path);
    return ExpandItems(inputItems);
}

ArrayOfResults ExpandItems(ArrayOfItems items)
{
    foreach (item in items)
    {
        // Items whose path is fully matched are final results.
        if (IsEmpty(item.UnprocessedPath))
        {
            AddToResults(item);
            continue;
        }

        // Otherwise select a reader that understands the item and expand further.
        nextReader = readerProvider.GetReaderFor(item);
        subItems = nextReader.ExpandItem(item);
        AddToResults(ExpandItems(subItems));
    }
    return results;
}

Figure 5 Reading process algorithm code sketch

Each part of the path will belong to one reader. The reader provider will carry the rules according to which the readers are selected. A sample of how the division of a path among readers could look is shown in Figure 6. Here, the archive.zip file opened by the root reader is passed to the SevenZip reader, which extracts the xml file. The file is then processed by the xml reader, which loads the matching elements.

Figure 6 Division of path parts among readers

The settings of the reading process will be influenced through public properties of the readers. These can include settings of the encoding, buffer size, etc.

Requirements

In order to implement the process described above, the following things have to be considered:

- how the path will be matched
- the division of the readers and their selection during the reading process
- what data will be passed between the readers and what their responsibilities will be
- whether to read items immediately or lazily

2.4.1 Path matching

Read path segments or individual path parts

One question that comes with the reading process is whether the readers should always process only one part of the path, or a whole path segment consisting of multiple parts. It can be answered by looking at the path in Figure 7 and going over its reading process.


D:\Temp\archive.7z\*\*.*

Figure 7 Sample path

If the archive file reader processed the parts separately, the items that match the first asterisk would be returned immediately to the data module without trying to expand the next path part. In the next iteration, however, these items would be passed back to the same reader, which would have to open the archive file archive.7z again in order to expand “*.*”. This would result in continual file reopening and inefficiency. As a consequence, the readers must process whole path segments and return items to the data module only when they cannot continue reading and a different reader must be used.

Extension with special wildcards

One aim was to give users more power in item selection. Although the path can contain wildcards, users have no way to request recursive searching through them. In PowerShell, a recursive search is specified by the additional -Recurse parameter. That is sufficient when the search is done within one store. But when paths go through files, users need the ability to specify which path parts should be expanded recursively and which parts should be expanded with a one-level search only. Switching all readers into recursive mode would cause performance issues, because unnecessary file parsing and reading would be done.

For this reason, the paths that can be used with the data module are extended with new wildcard combinations that make it possible to specify which path parts will be expanded with a recursive search. Moreover, further wildcard combinations were added to restrict whether the search includes only files, only folders or both. These wildcard combinations are shown in Table 2.

              One-level search   Recursive search
All items     *                  **
All folders   *.                 **.
All files     *.*                **.*

Table 2 Wildcard combinations

The usage of this extension is sketched in Figure 8. The path in the figure specifies a recursive search for all zip files but a one-level search within the zip files.

D:\Temp\**\*.zip\Folder\*\file.xml

Figure 8 Demonstration of wildcard extension
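As an illustration only (the thesis does not prescribe this implementation), a single non-recursive path part could be matched by translating the wildcard into a regular expression:

using System.Text.RegularExpressions;

static class WildcardMatcher
{
    // Matches one path part such as "*.zip" against an item name. The recursive
    // "**" combinations are handled by the expansion algorithm itself, not by
    // name matching, so only "*" is translated here.
    public static bool MatchesPart(string pattern, string name)
    {
        string regex = "^" + Regex.Escape(pattern).Replace(@"\*", @"[^\\]*") + "$";
        return Regex.IsMatch(name, regex, RegexOptions.IgnoreCase);
    }
}

// Example: WildcardMatcher.MatchesPart("*.zip", "archive.zip") returns true.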

Ambiguity in path matching

The extension with recursive wildcards, however, brings the problem of ambiguous matches. Let us consider the path Folder\file.zip and the pattern **\*.zip given by the user. Figure 9 shows how this path can match the pattern. In Sample 1, the whole pattern matched and no unprocessed path remains. In Sample 2, however, only the first pattern part matched and the remainder points to entries inside file.zip.

Figure 9 Ambiguous matching sample


The decision about how the readers will behave can basically take three options into account: (1) strictly require readers to match as many pattern parts as possible, (2) do the opposite and match modestly, i.e. the smallest pattern segment, or (3) try to resolve all matching possibilities and process each matching result separately. The best option is definitely to return all matches, since users should not lose any results. As a consequence, the readers must account for the ambiguity and may return one item multiple times when different matches appear.

2.4.2 Readers

It can be noticed that readers fall into two categories: (1) those which need only a path to load data from, and (2) those which need an already loaded item to get subitems from. For example, the file system or ftp reader belongs to the first category, as the only thing it requires is a path. But readers like the archive file reader or the xml reader need a file to operate on (an archive file or an xml file). Therefore, two categories of readers are distinguished:

1. Root readers
2. Content readers

The root readers read data from a particular storage, while the content readers read data from an already loaded file. When a path is obtained from the user, the reader provider determines the suitable root reader, which is used to expand the first path segment. Once the root reader has loaded the data, the content readers are used. Let us see what requirements the readers must implement:

Matching

Since the path may contain wildcards, the readers must provide path matching. However, matching should be extracted out of the readers' scope, because future extensions might want to implement custom matching rules or provide additional wildcard extensions.

Lazy loading

Readers should return matching items lazily, because some path patterns provided by users may contain wildcard combinations that match a large number of items, while the user might want only a subset of them or might want to stop searching once the desired item appears.
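In C#, such lazy production of matches maps naturally onto iterators. A sketch, using the filesystem as an example store:

using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LazyExpansion
{
    // Matches are yielded one by one, so a consumer that takes only the first
    // item never forces the remaining directory entries to be examined.
    public static IEnumerable<string> ExpandSegment(string directory, string pattern)
    {
        foreach (string path in Directory.EnumerateFiles(directory, pattern))
        {
            yield return path;
        }
    }
}

// Usage: only as much work is done as the caller consumes.
// string first = LazyExpansion.ExpandSegment(@"D:\Temp", "*.zip").FirstOrDefault();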

Settings

The reading process must be customizable. Therefore, the readers should expose properties through which their expansion process can be influenced. These properties might include settings of the used encoding, buffer size, line separators (e.g. for the text reader), etc. Whoever instantiates the reader decides how the reader behaves during the reading process.

2.4.3 Transferring data between readers

Requirements

Obviously, the readers must cooperate. The reader that loads one path part must somehow transfer the loaded data to the next reader so it can perform further expansion. To be more specific, the SevenZip reader needs to obtain the parent archive file from the root reader so it can extract items.

Stream with unprocessed path is not enough

Looking at the content readers, there are two basic pieces of information they must obtain: the data of the file to expand and the path to items within the file. The simplest straightforward solution would be to pass the Stream of the file, together with the unprocessed path string, to the following reader. This reader would then read the obtained Stream and load the items that match the next path part. However, this approach has several drawbacks.

Drawbacks

Let us consider the reading process of the path in Figure 10. The root reader would open the Stream of file.ukn and pass it to the data module. The data module would then ask the reader provider module for the next suitable reader. The problem appears when no reader for the file is found: the Stream would have been opened too early, even though it is not needed and no further expansion will be done. Moreover, if file.ukn were inside an archive file, an unnecessary extraction would have been performed. The reading process would thus be very inefficient.

D:\Temp\file.ukn\file.txt

Figure 10 Sample path

Another problem comes with Stream disposal. Many stream utilities in .NET are designed to close the stream they operate on. This design suits common scenarios well; however, if the Stream were closed during the reading process, the following reader could no longer read it and expand items from it.
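One common remedy, shown here as a sketch rather than as the thesis's actual solution, is a wrapper that delegates to an inner stream but ignores disposal, so utilities that close their input cannot dispose the shared stream:

using System.IO;

public class NonClosingStream : Stream
{
    private readonly Stream inner;

    public NonClosingStream(Stream inner) { this.inner = inner; }

    public override bool CanRead { get { return inner.CanRead; } }
    public override bool CanSeek { get { return inner.CanSeek; } }
    public override bool CanWrite { get { return inner.CanWrite; } }
    public override long Length { get { return inner.Length; } }
    public override long Position
    {
        get { return inner.Position; }
        set { inner.Position = value; }
    }

    public override void Flush() { inner.Flush(); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return inner.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return inner.Seek(offset, origin);
    }

    public override void SetLength(long value) { inner.SetLength(value); }

    public override void Write(byte[] buffer, int offset, int count)
    {
        inner.Write(buffer, offset, count);
    }

    // Deliberately does not dispose the inner stream.
    protected override void Dispose(bool disposing) { }
}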

Consequences

Consequently, readers must create more complex items. Each item will relate to one path part and will carry all the information necessary for loading that part and for subsequent processing. The idea is that all readers will define their products, which will implement one common interface. These reader products (called “reader items” in the following text) will then be transferred between readers during the reading process, and they will know how to load the underlying data of the related path segment.

Responsibilities of reader items

Let us explore what capabilities reader items should have:

1. Lazy loading

The reader item must enable reading the data of the underlying object. But the data loading must be postponed until the moment it is actually needed, so that no unnecessary operations are performed too early.

2. Reloading support

The owner of the item may want to test the data before reading them. For example, one reading can be done by the reader provider module, which can test the item's signature in order to select the most suitable reader for further expansion, and another by the following reader. Consequently, the item must be able to load the underlying data multiple times.

3. Multiple reading methods

In many cases, items can be read in different ways. For example, a text file can be read as a string as well as an array of lines. The ability to read files in various ways lets writers select the most suitable type for writing. The aim is to use the data module for data transformations as well; therefore, a plain data format is not enough and the reader items must provide reading methods that can load and parse data into various objects.

4. Delete, rename and move

The reader items must be able to delete, rename or move the underlying object. Although it might seem that this should be the concern of the writing part, that part is separate, with its own path matching, and it does not know about reader item locations.

2.4.4 Reader items architecture

Lazy and multiple loading support

In order to read an item's underlying data lazily, the reader item must be able to start the whole reading process of each path segment. Let us see this in an example. Figure 11 assumes that the path was expanded by three readers – the FileSystemRootReader, SevenZipReader and XmlReader. Each reader produced one reader item carrying information about one path segment. The reader item of one reader is always sent to the following reader so it can perform the next segment's expansion. When the client asks for data, the final XmlReaderItem needs to open and load all three path segments.

Figure 11 Sample reading process

Solution

The solution here is simple. Each reader item needs to hold a reference to the product of the previous reader, which knows how to load the previous path part. When the request for data is sent from the client, the most nested reader item asks the previous item to load its underlying data first. This results in a cascade of reading events across all the reader items produced from the path. In order to be able to modify the attributes of the reading process through properties of the readers, the items must also hold a reference to their parent readers and use their settings.
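A sketch of this parent chain, assuming a stream-based representation of the underlying data (the actual interfaces of the thesis are described in chapter 3):

using System.IO;

// Each reader item holds the item produced by the previous reader; opening
// the data cascades through the whole chain and may be repeated on each call.
public abstract class ReaderItemBase
{
    private readonly ReaderItemBase parent;

    protected ReaderItemBase(ReaderItemBase parent)
    {
        this.parent = parent;
    }

    public Stream OpenStream()
    {
        // Load the previous path segment first (null for a root item).
        Stream parentData = parent != null ? parent.OpenStream() : null;
        return OpenOwnSegment(parentData);
    }

    // E.g. a SevenZip item would extract its entry from the parent archive stream.
    protected abstract Stream OpenOwnSegment(Stream parentData);
}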

Reading methods registration

One part of the reader items architecture is focused on the question: how will the reading methods be registered within the reader item classes? The reader items must support multiple ways of reading, so they must be able to tell others what their supported types are, and they have to somehow map reading methods to client requests. For instance, one XML document (represented by an instance of the xml reader item class) can be read as an XElement as well as an XDocument, so the item must provide methods that can produce both and select between them accordingly.

Proposed solution

The first approach assumes that reader items will contain a pair of generic methods, CanReadAs<T> and Read<T>, where the generic parameter T specifies the type that the caller wants to obtain from the reading method. The signatures of these methods are shown in Figure 12.

bool CanReadAs<T>();
T Read<T>();

Figure 12 Methods' signature

To implement these two methods, each instance of the reader item class must somehow map the requested type to the reading method providing its creation. For this purpose, each reader item could create an object which would manage its reading methods (in this sample called MethodsMap) and which would select them accordingly. This methods map could directly cast the objects returned from the registered methods to the needed type and ensure consistency of the methods with the types they return.

The process of method registration and selection would be the following: first, the reader item creates a methods map object and registers its reading methods in it (probably during the reader item's instantiation). When the user calls the read method and specifies the needed type, the reader item uses the methods map object to select the appropriate method and launch it. This process is sketched in Figure 13.

Figure 13 Methods registration via MethodsMap
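The MethodsMap object could be sketched as follows (an assumed shape, not the thesis code), mapping a requested type to a registered factory delegate:

using System;
using System.Collections.Generic;

public class MethodsMap
{
    // Maps the requested result type to the reading method that produces it.
    private readonly Dictionary<Type, Func<object>> methods =
        new Dictionary<Type, Func<object>>();

    public void Register<T>(Func<T> readingMethod)
    {
        methods[typeof(T)] = () => readingMethod();
    }

    public bool CanReadAs<T>()
    {
        return methods.ContainsKey(typeof(T));
    }

    public T Read<T>()
    {
        // The cast is safe because Register<T> ties each method to its type.
        return (T)methods[typeof(T)]();
    }
}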

Pros and cons

This approach to reading method registration has several advantages:

1. Handy usage

When the reader item object is obtained, it is easy to check whether a particular type can be retrieved by using the CanReadAs<T> method and to obtain an instance of that type using the Read<T> method.

2. Simple implementation

The implementation of each reader item class can use a MethodsMap to register all reading methods. The Read<T> method only calls the MethodsMap object to invoke the appropriate reading method for the given type. The CanReadAs<T> method only asks the MethodsMap object whether it contains any registered method for the given type.

3. Runtime registration

The reader item can decide at runtime, when the object is constructed, which reading methods will be registered.

Despite these visible advantages, the approach also has these issues:

1. Repeated method registration

The method registration has to be done each time a reader item is instantiated, even though it could be done only once at program start. Since many items can match a path pattern given by the user, repeated registration could decrease efficiency.

2. No option to register more methods returning the same type

Let us assume that the TextReaderItem class needed to register two reading methods returning the string type with two different semantics. The MethodsMap would have to be extended to register multiple reading methods for one type and to give users a way to select among them.

Solution modification

Still, this solution can be changed slightly to avoid the mentioned issues while keeping its good features. The suggested idea is to register the methods through interfaces: create an interface for each form of reading method and implement these interfaces explicitly. This permits implementing multiple reading methods with the same return type, and no repeated registration is needed. However, how can the convenient methods CanReadAs<T> and Read<T> be kept? These two can be implemented as extension methods5 of the reader item interface, checking whether the reader item implements the particular reading interface. When colliding reading methods appear, they can provide overloads with parameters that distinguish them.

5 Extension methods enable adding methods to existing types without creating a new derived type, recompiling, or otherwise modifying the original type. [33]
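A minimal sketch of this interface-based variant follows; the names IReadAs<T> and ReadValue are assumed for illustration.

// Illustrative sketch: each reading form is an interface, and
// CanReadAs<T>/Read<T> become extension methods over the item interface.
public interface IReaderItem { }

public interface IReadAs<T>
{
    T ReadValue();
}

public static class ReaderItemExtensions
{
    public static bool CanReadAs<T>(this IReaderItem item)
    {
        return item is IReadAs<T>;
    }

    public static T Read<T>(this IReaderItem item)
    {
        return ((IReadAs<T>)item).ReadValue();
    }
}

// An item readable both as string and as byte[]; explicit interface
// implementation keeps the item's public surface clean.
public class TextReaderItem : IReaderItem, IReadAs<string>, IReadAs<byte[]>
{
    private readonly byte[] data;

    public TextReaderItem(byte[] data)
    {
        this.data = data;
    }

    string IReadAs<string>.ReadValue()
    {
        return System.Text.Encoding.UTF8.GetString(data);
    }

    byte[] IReadAs<byte[]>.ReadValue()
    {
        return data;
    }
}

Two colliding readings of the same return type would be expressed as two distinct interfaces (or one interface with a distinguishing parameter), which is exactly the overload mechanism mentioned above.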

Delete, rename and move support

When reader items know about their parent items, it is simpler to add functionality for deletes and updates, because these operations must also change all parent items' data accordingly. It is important to realize that the updating process runs in the opposite direction to the reading process: the most nested item must update itself first and then pass the new data to its parent. Let us propose how these operations will behave.

Solution

The delete operation is the easiest, as it only loads the parent data, deletes itself from them and then asks the parent to update. The rename operation is very similar: it must load the parent data, locate the item to rename within them, perform the rename and send an update request with the updated data to the parent item. More complicated is the move operation. The issue is that the target path can lead into a totally different storage, as Figure 14 shows.

PS> Move-Item D:\Temp\archive.zip\file.txt D:\Folder\file.zip\Folder\file.txt

Figure 14 Sample of move cmdlet

Two situations can appear:

- the target path points to the storage from which the item was loaded

- the target path points to a different storage

It is important to distinguish these two situations, because moving within one storage is often much faster (the node is only renamed). When the target path points to a different storage, the move operation behaves as a delete followed by a write. Since the reader items are separated from the writing module, this operation is controlled by the data module, and items perform these operations only within the scope of their container (storage). In order to decide to which storage the target path points, reader items define a method which determines whether the target storage is equal to the reader item's own. If so, it returns a relative path that is passed to the reader item's move method in the next step. When this is not the case, the data module requests the reader item to perform a delete and the writing module to write the item to the new location.
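A minimal sketch of such a contract, with hypothetical member names, could be:

// Illustrative sketch of the storage test described above.
public interface IMovableReaderItem
{
    // Returns true and the storage-relative remainder of the path when
    // targetPath points into this item's storage; false otherwise.
    bool TryGetRelativePath(string targetPath, out string relativePath);

    // Fast in-storage move (typically just a rename of the node).
    void Move(string relativePath);

    // Used by the data module when the target is a different storage:
    // the item deletes itself and the writing module writes the copy.
    void Delete();
}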

2.4.5 Expansion and conversion

Scenario description

When looking at the path in Figure 15, the question arises which reader will be the last one to process the file. Obviously, the first reader to get to the item will be the file system root reader. However, the XML reader is the one designated for XML files and the one which understands their inner structure. Yet both reading the file as a Stream with the root reader and reading it as an XDocument or XElement with the XML reader can be suitable in some situations. Apparently, duplicating the reading methods in all readers is not admissible.

D:\Temp\file.xml

Figure 15 Path sample

Solution

The proposed solution distinguishes between the expansion and conversion processes. An item is expanded until all path parts are matched and is then returned to the user immediately. When the user sends a request to read the item and supplies a type which the reader item is not able to construct, the conversion process is launched. This is shown in Figure 16, where the requested type is XElement. The item is first loaded by the expansion process and then converted to the XML reader item, which knows how to load the needed XElement.

Figure 16 Expansion and conversion process

Both the expansion and conversion processes must select suitable readers to be used in the next iteration of the process. Expansion needs to select a reader according to the type of the last loaded item, while conversion needs to map types to readers that can produce items able to create objects of that type. Therefore, the reader providers carrying these selection rules may be implemented separately for the expansion and conversion processes.

2.4.6 Stream sharing during reading process

The reading process involves important decisions regarding stream closing. The situation which needs more attention is when the path targets multiple items within a file (or another item). A sample of such a path is shown in Figure 17.

D:\Temp\archive.7z\Test\*

The file structure:
archive.7z\Test\a.txt
archive.7z\Test\b.txt

Figure 17 Path sample for stream sharing problem


Scenario description

Let us assume that archive.7z contains two files, a.txt and b.txt, in a folder called Test. The reading process would be the following: the reader provider module determines that the first path part belongs to the file system root reader, which expands it into one reader item representing the archive.7z file. Then the next part is evaluated and passed together with the reader item to the SevenZip reader. Since all reading is done lazily, the expand method of the SevenZip reader immediately returns an enumerator and waits until its MoveNext method is invoked. After that, the evaluation process starts and the SevenZip reader asks for the stream of the parent archive.7z item and finds the first matching file within the archive. Next, it wraps it into a SevenZip reader item and yields it immediately. At this point, the client gets the final SevenZip reader item (as the whole path was expanded) and can perform any further operation with it. Commonly, he/she will read its data immediately. When this happens, the reader item representing the a.txt file asks the parent reader item for the archive.7z stream again so it can be extracted. But the archive stream is still open, since the yielding in the parent reader has not finished yet and the b.txt file is still pending. As a result, the stream must be shared for reading. This process is sketched in Figure 18.

Figure 18 Reading process with sharing problem

Simple stream sharing is not enough

Although it might seem that opening the stream with the read share mode would be sufficient, it is not. To be more precise, it would be sufficient for the situation above, since reopening the archive stream can be done quickly on the file system. But if the archive.7z file were an inner item of another archive, reopening the archive stream would mean extracting it again. Or, if the file were located on an FTP server, it would result in a new download. Thus, it is necessary to reuse the obtained stream while the reading is not finished and keep it open as long as needed.

Solution

The proposed solution introduces a stream provider object that controls stream opening and closing as well as reading. It holds a reference to the original stream and exposes its data to clients. The clients obtain a custom stream implementation whose only responsibility is to delegate all calls of Read, Dispose and other stream methods to the stream provider object, so it can control the reading process. The stream provider must keep the position of each stream given to a client and perform a seek operation before each read method call.

The process of reading will be the following: the client asks for data. The stream provider creates and registers a fake stream and returns it to the client. When the client calls the Read method on this fake stream, it asks the parent stream provider for data. The stream provider then identifies the stream from which the request was sent, finds its position, performs the seeking and loads the data, which are then sent to the client. A sample of what the stream provider state could look like is shown in Figure 19.

Figure 19 Stream provider

In order to be able to decide when to close the base stream, the provider must know how many streams are still alive (not disposed). This is done by reference counting performed in the Get and Dispose methods. Another fact is that the provider has to be capable of opening the stream again even after the yielding has finished, because reader items can be read multiple times, or with a delay after yielding, and thus start the reading process again. That is why the stream provider takes a function for stream retrieval rather than the stream itself.
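A condensed sketch of this design under the assumptions above is given below; the names StreamProvider and ClientStream are illustrative, and only the members the scenario needs are implemented.

using System;
using System.IO;

// Illustrative sketch: clients get lightweight proxy streams; the
// provider tracks per-client positions, seeks before each read,
// reference-counts live proxies and can reopen the base stream on
// demand via the supplied factory function.
public class StreamProvider
{
    private readonly Func<Stream> openStream; // reopens even after yielding
    private Stream baseStream;
    private int liveClients;

    public StreamProvider(Func<Stream> openStream)
    {
        this.openStream = openStream;
    }

    public Stream Get()
    {
        if (baseStream == null)
            baseStream = openStream();
        liveClients++;
        return new ClientStream(this);
    }

    private int Read(ClientStream client, byte[] buffer, int offset, int count)
    {
        // Seek to this client's own position before reading.
        baseStream.Seek(client.Position, SeekOrigin.Begin);
        int read = baseStream.Read(buffer, offset, count);
        client.Position += read;
        return read;
    }

    private void Release()
    {
        // Close the base stream only when the last client is disposed.
        if (--liveClients == 0 && baseStream != null)
        {
            baseStream.Dispose();
            baseStream = null; // Get() can reopen it later
        }
    }

    // Fake stream handed to clients; delegates everything to the provider.
    private class ClientStream : Stream
    {
        private readonly StreamProvider provider;

        internal ClientStream(StreamProvider provider)
        {
            this.provider = provider;
        }

        public override long Position { get; set; }

        public override int Read(byte[] buffer, int offset, int count)
        {
            return provider.Read(this, buffer, offset, count);
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing) provider.Release();
            base.Dispose(disposing);
        }

        // Remaining Stream members are stubs in this sketch.
        public override bool CanRead { get { return true; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return false; } }
        public override long Length { get { throw new NotSupportedException(); } }
        public override void Flush() { }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
        public override void SetLength(long value) { throw new NotSupportedException(); }
        public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    }
}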

2.4.7 Sharing an archive object

A very analogous situation to the stream sharing arises with archives. The process is the same as described above; however, now the problem is not reopening the stream but repeatedly loading the archive object in the SevenZip reader. In order to obtain an archive object, SevenZip has to parse its headers and perform several steps to create it. This takes time, so it is inefficient to create an archive object multiple times when it is not necessary.

Solution

As a result, the SevenZip reader has to implement archive sharing in a very similar way to the stream sharing. It must also do reference counting and dispose of the archive object only when all of its subitems have been loaded. Consequently, reader items of the SevenZip reader will take an archive provider object that controls access to the parent archive. When an item requests the archive object, the archive provider returns a wrapped archive through which it can detect when the operation with the archive has ended.

In fact, a very similar situation will appear in other readers as well, because readers return matching items lazily. Therefore, each reader implementation will have to figure out how to share the expanded file between the reader and its products.


2.5 Writing process

Requirements

The writing process differs from the reading process because it must run in the opposite direction. Similarly to the update operation of reader items discussed above, the most nested path part must be constructed before the others so it can provide data for the parent parts. Let us see this on the sample shown in Figure 20. Here, file.txt must be constructed first and then passed to the archive writer, which creates the zip file with the corresponding entry. After that, the data of the archive file can be written to the file system. An important fact is that writers must also be able to write data into existing locations and perform update operations on existing path parts.

D:\Temp\newfile.zip\Folder\file.txt

Figure 20 Path sample for the writing process

Idea of the writing process

The proposed idea of this process is the following: firstly, the user supplies a target path to the data module, which then uses the writer provider module to construct a write object. This object consists of several writer items, where each writer item is a product of one writer and carries information about one path segment, similarly to reader items in the reading process. A sample of what the object can look like for the path above is shown in Figure 21. When the data module obtains the constructed write object, it starts the writing procedure by invoking its write method and supplying the data to be written. The outer writer item prepares its path part and then asks its inner item to provide data. The inner item does the same work recursively. The most nested item constructs data from those obtained from the data module and stops the recursion.

Figure 21 Write object structure

The process has to be able to handle the case when the target path contains parts which already exist. In that case, writer items which represent an existing path part must provide the original data to their subitems so these can update them. This is sketched in Figure 22, where the root writer item passes the data of the existing.zip file to the SevenZip writer item. Since its entry file.txt does not exist, no data are sent to the following item.


Figure 22 Passing data for update between writer items

2.5.1 Write path accumulation and collisions

Scenario description

The problem that comes with recursive searching is the following: let us assume that the user invokes the Copy-Item cmdlet shown in Figure 23. Since the path contains a recursive search through all subfolders of the folder "Input", multiple files with the name "text.txt" can match. The question is what names should be used for the output files. Evidently, they cannot all have the same name "text.txt". Moreover, in the second copy command of the Figure the user would expect the locations of the items present in the folder "Input" to be preserved in the output. This is also connected with the question of how collisions will be resolved during writing. The aim is not to couple this functionality with the writers but to solve it separately, so the collision solver can be replaced.

PS> Copy-Item D:\Input\**\text.txt D:\Output\
PS> Copy-Item D:\Input\** D:\Output\

Figure 23 Copy sample with recursive wildcards

Solution

The first problem can be solved by adding this restriction: the items which will be written must record the path which was substituted for the wildcards (this path is referred to as a write path in the following text). The combination of this write path with the target path determines the final item's location. Recording the write path is actually the responsibility of the readers, since they must accumulate it during the matching process.

When dealing with collisions, the proposed idea is to define policy classes and assign them to those writers which can encounter the collision issue. Each policy class defines a particular behaviour of a writer. For instance, one policy class can define rewrite behaviour, so the writer overrides a colliding item with new data. Policies must implement one common interface that the writers work with, because the policy classes must be easily replaceable within the writers.

The question is what policy classes should get from the writer and what they should return. The idea is to give them the path of a colliding item together with a function through which the uniqueness of an item's path can be tested. That is because a policy class may suggest a new item name and immediately check whether the new name no longer collides. If the policy class does not change the colliding item's name, it must be able to tell what operation (rewrite or update) should be done. Therefore, the policy class should return the final path together with an attribute identifying the intended action.
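A minimal sketch of this policy contract, using hypothetical names, might look as follows:

using System;

// Illustrative sketch of a collision policy: it receives the colliding
// path plus a uniqueness test and returns the final path and action.
public enum CollisionAction
{
    Rewrite,  // replace the colliding item with the new data
    Update    // merge the new data into the existing item
}

public class CollisionResolution
{
    public string FinalPath { get; set; }
    public CollisionAction Action { get; set; }
}

public interface ICollisionPolicy
{
    CollisionResolution Resolve(string collidingPath,
                                Func<string, bool> isUnique);
}

// Example policy: append a counter until the suggested name is unique;
// the renamed item is then written as new data.
public class RenamePolicy : ICollisionPolicy
{
    public CollisionResolution Resolve(string collidingPath,
                                       Func<string, bool> isUnique)
    {
        int counter = 1;
        string candidate = collidingPath;
        while (!isUnique(candidate))
        {
            candidate = collidingPath + " (" + counter + ")";
            counter++;
        }
        return new CollisionResolution
        {
            FinalPath = candidate,
            Action = CollisionAction.Rewrite
        };
    }
}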


2.5.2 Write items separately or together

Another question that comes with the writing process is whether the items should be written all together or individually. Let us consider the advantages and disadvantages of both approaches and explain them on an example.

Scenario

As discussed in the previous section, input items may contain a write path that was substituted for wildcards. A sample of when this can happen is shown in Figure 24. It shows two input items that match the source path pattern and the write paths (in brackets) that were substituted for the wildcards. In order to keep the same structure as in the source storage, the write paths of the matched items must be combined with the target path given by the user.

PS> Copy-Item D:\Folder\**\*.txt D:\Temp\
--> D:\Folder\Test\file.txt (Test\file.txt)
--> D:\Folder\Test\file.7z\text.txt (Test\file.7z\text.txt)

Figure 24 Copying sample

Pros and cons

As can be noticed, the write path of the second matching item leads across the archive file file.7z. The problem with writing multiple objects together is that the write object is built only once, according to the target path given by the user, and the write paths of the matching items are not included. Therefore, the last writer will handle all input items without switching to different writers, even when a write path goes through a file.

If the items were written individually, the paths could be combined immediately by the data module before the write object is built. The advantage is that the building of the write object could then select a different writer for each item and write file.7z with an appropriate writer.

On the other hand, individual writing would result in continual rebuilding of the write object and, more importantly, in reopening the whole target path. Collective writing does not have these problems, as the items are written only once by the last writer.

Conclusion

Both approaches have drawbacks; however, efficiency is more important here, and thus collective writing is better. Moreover, it can be expected that common use cases will not match items across multiple storages during the copy procedure, and users will not want to create many different files with paths that they could possibly match.

2.5.3 Stream disposing problem

Scenario

Nested writer items must provide a Stream with data for their parent items so these can write the data to the prepared destination. The issue is that some classes (like StreamWriter) are designed to close the underlying stream when they are disposed. This is expected behaviour, but for the purposes of nested writer items it is not convenient, because these items operate with data in memory and closing the stream means losing them. The question is how to prevent this stream closing.

Solution

The classes that cause this problem are always disposable; however, ignoring their dispose can cause other resources to remain unreleased and to leak. Creating a fake stream whose dispose operation does nothing is not a good idea either, as it goes against the Stream class design.


The proposed solution is to create a stream pipe. The resulting data will be filled into the pipe's input and then read from the pipe's output. The pipe closes the underlying stream when the reading of its content is finished. When the client asks the pipe for the input or the output, it provides a fake stream which delegates its calls to the methods of the pipe; the pipe receives and sends data through these fake streams. This design anticipates a future extension to a multithreaded pipe which would return output immediately and coordinate the process writing to the input according to the requirements on the output (hence a producer-consumer model).
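A minimal sketch of the single-threaded pipe is shown below; the names (StreamPipe, PipeEnd) and the in-memory buffer are illustrative simplifications of the design described above.

using System;
using System.IO;

// Illustrative sketch: both ends are fake streams that delegate to the
// pipe, so the pipe (not e.g. StreamWriter) decides when the buffered
// data are released.
public class StreamPipe
{
    private readonly MemoryStream buffer = new MemoryStream();

    // Writers may dispose this end freely; the buffer survives.
    public Stream Input
    {
        get { return new PipeEnd(this, true); }
    }

    // Reading end; disposing it signals that reading has finished.
    public Stream Output
    {
        get
        {
            buffer.Position = 0;
            return new PipeEnd(this, false);
        }
    }

    private void EndDisposed(bool writableEnd)
    {
        // Close the underlying buffer only when the output was consumed.
        if (!writableEnd)
            buffer.Dispose();
    }

    private class PipeEnd : Stream
    {
        private readonly StreamPipe pipe;
        private readonly bool writable;

        internal PipeEnd(StreamPipe pipe, bool writable)
        {
            this.pipe = pipe;
            this.writable = writable;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            pipe.buffer.Write(buffer, offset, count);
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            return pipe.buffer.Read(buffer, offset, count);
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing) pipe.EndDisposed(writable);
            base.Dispose(disposing);
        }

        public override bool CanRead { get { return !writable; } }
        public override bool CanWrite { get { return writable; } }
        public override bool CanSeek { get { return false; } }
        public override long Length { get { return pipe.buffer.Length; } }
        public override long Position
        {
            get { return pipe.buffer.Position; }
            set { throw new NotSupportedException(); }
        }
        public override void Flush() { }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
        public override void SetLength(long value) { throw new NotSupportedException(); }
    }
}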

2.6 The library for compression and decompression

One goal of this thesis was to include a reader and writer for archive files. To achieve this, a library for compression and decompression had to be selected. Many libraries deal with data de/compression, but only a few of them are both open source and intended for the Windows platform. The final selection was made from the libraries mentioned below, which were the most suitable for this project.

SharpZipLib

SharpZipLib (formerly NZipLib) is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform. "It is implemented as an assembly (installable in the GAC6), and thus can easily be incorporated into other projects (in any .NET language)" [10]. It is an open source library whose development and support are still alive. The library seems to be very stable and its API is easy to understand and use.

6 The Global Assembly Cache (GAC) is the cache present on each computer with the common language runtime installed. It stores assemblies designated to be shared by several applications. More information can be found at [32].

SevenZip

SevenZip is also an open source library and it includes probably the widest range of supported archive types. These include the increasingly used 7z as well as zip, rar, iso, gzip, bzip2, arj, z, etc. A strength of SevenZip is also its very good compression ratio. Unfortunately, this library is implemented in C++, so it would be necessary to implement a C# wrapper. More information about this library can be found at [11] [12].

SevenZipSharp

"SevenZipSharp is managed 7-zip library written in C# that provides data (self-)extraction and compression. It wraps 7z.dll or any compatible one and makes use of LZMA SDK." [13] Its API is also well designed and easy to learn. However, during the examination it turned out that some features were not stable enough and that some parts of the functionality for stream de/compression and archive updating were missing. Moreover, it was unclear whether the author of the library would continue the development or whether future support would be provided.

Final selection

SharpZipLib was rejected because of its small range of supported archive formats. SevenZipSharp suffered from the issues mentioned above (at the time this thesis was written) and therefore was not used either. As a result, the SevenZip library was selected as the most suitable for this project. Its wide range of supported archive types meets our requirements.

2.7 SevenZip analysis

Since the SevenZip library is written in the C++ language, it cannot be used directly. Therefore, preparatory work must be done. Firstly, it is necessary to decide how C# code will call the library and to write a wrapper through which it can be accessed. Secondly, the user interface in C# must be designed and several questions related to usability must be discussed. These two phases are elaborated in the following sections.

2.7.1 SevenZip wrapper

Since SevenZip is written in the C++ language, it must be made available to C# code. There are basically three options for calling a native library from managed code.

C++/CLI7 wrapper

By using C++/CLI, an application may simultaneously use the managed heap (by way of tracking pointers) and any native memory region. Therefore, it is possible to wrap native C++ objects and functions and then use them from an independent C# project. Simply put, C++/CLI is the means of programming for the .NET platform in C++. Formally, it is Microsoft's language specification, standardized by ECMA8 (and also by ISO9), intended to supersede Managed Extensions for C++ and to simplify the older Managed C++ syntax. In addition to the facilities provided by C++, it provides additional keywords, classes, exceptions, namespaces and library facilities, as well as garbage collection. More information can be found at [14].

7 Common Language Infrastructure (CLI) is Microsoft's language specification standardized by the European Computer Manufacturers Association (ECMA). It allows applications to be written in a variety of high-level programming languages and executed in different system environments. More information is available at [31] and [30].
8 ECMA International is an international standards organization for information and communication systems. More information is available at [34].
9 International Organization for Standardization (ISO) is an international standards organization composed of representatives from various national standards organizations. More information is at [35].

Explicit P/Invoke

Platform Invocation Services (P/Invoke) allows managed code to call unmanaged functions that are implemented in a DLL from C#. A few steps must be followed when calling native functions directly. Firstly, managed code must provide the compiler with a declaration of the unmanaged function. Optionally, it may also provide the C# compiler with a description of how parameters and return values are marshalled from and to the unmanaged code. Next, the DllImport attribute must be attached to the method declarations and it must specify the name of the DLL that contains the native function. An important requirement is that the native function must be exported from the native DLL. More information about function exporting can be found at [15] and about direct calling of native functions at [16].

Accessing library via COM API

The SevenZip library implements a COM interface that can be used when accessing SevenZip from other languages. Even the console application which is a part of the SevenZip source code accesses SevenZip classes via its COM API. However, the COM implementation in SevenZip is not typical Windows COM, and the way objects are obtained slightly differs. In order to reference COM objects and interfaces in C# code, it is necessary to include .NET Framework definitions of the COM interfaces in the C# build (often referred to as interop classes and interfaces) which specify how the native types will be marshalled into .NET types. For this purpose, the .NET SDK contains a tool called TlbImp.exe which can generate these interop classes automatically. However, because of the SevenZip differences, it does not work here. More information about using COM from managed code is at [17].

Final decision

The problem with a C++/CLI wrapper is that the SevenZip project structure is quite complicated and many classes would need wrappers. Moreover, some parts of SevenZip are written in native C or even in assembler, and these parts caused errors during compilation of the project with CLI headers.

The P/Invoke approach is also problematic. The SevenZip library contains several classes through which archives can be compressed and decompressed, but it contains no high-level methods that put these classes together. The export functions in the P/Invoke approach would have to implement these methods from scratch, which would result in a heap of code inserted into the SevenZip code.

As a result, the only option left is the SevenZip COM API. No additional code has to be added to SevenZip; however, the interop classes with marshalling rules must be written manually.

2.7.2 SevenZip COM API overview

During deeper library investigation it emerged that the SevenZip implementation of the COM interface differs from standard Windows COM in the way the COM objects are obtained. This is because most 7-Zip functions return direct pointers to interface implementations and do not use the QueryInterface method10. Owing to that, using the Visual Studio tool TlbImp.exe for generating interop classes is unfeasible and it is necessary to implement these interop classes manually.

10 In standard COM, a caller first obtains an implementation of the IUnknown interface. This interface defines the QueryInterface method through which the caller can retrieve references to other interfaces that the component implements.

Let us briefly look at the COM interfaces exposed by SevenZip. The most important are the archive interfaces (IInArchive and IOutArchive) through which the manipulation with archives is done. Instances of these two are obtained from the SevenZip library, and thus the developer must only declare them and specify the marshalling of parameters for each of their methods. A different situation comes with the stream interfaces. Their implementations are passed from managed code to native code, and therefore an implementation must be supplied. SevenZip then calls their methods in order to read the archive file's data or to write the result of compression. These interfaces are shown in the Diagram 6.

Diagram 6 SevenZip archive and stream interfaces

SevenZip defines several callback interfaces which are used to control the compression and extraction processes. Their instances are passed from managed code to native code, so each of these interfaces must be implemented by the developer who is using the SevenZip COM API. The open and extract callbacks are passed to SevenZip during extraction, and the update callback is used when a new archive is created or an existing archive is updated. All these callbacks contain the methods of the IProgress interface, which is used for progress reporting.

SevenZip also supports archive encryption, and thus two interfaces are defined for this purpose. The first one is used during extraction, the second one during compression. Their methods slightly differ in the number of parameters of their CryptoGetTextPassword method. Diagram 7 shows the main methods of the mentioned interfaces.

Diagram 7 SevenZip callback interfaces

2.7.3 C# SevenZip architecture

The adaptation of the SevenZip library for C# code presents an important outcome of this thesis. Its design emphasises convenient usage from C# code. The aim was to design several samples of usage first and then adjust the rest of the code to them accordingly. The following sections describe the factors that influenced the final design.

Code sketch

SevenZip internally uses one object to represent the archive file it is currently working with. This idea is adopted in the C# design. The proposed solution assumes one object given to the client through which he/she modifies its inner content. In contrast to SevenZipSharp, which uses two different classes for compression and extraction, this approach unifies both procedures. Since the compression and decompression operations always work with an archive stream, the archive object must be designed as disposable so that it can automatically close the used streams at the end of the process. Items within the archive can simply be exposed via a public entries array on the archive object. The code sketch in Figure 25 shows how the API could look.

using (var archive = SevenZip.Open(
    File.Open("archive.7z", FileMode.Open),
    ArchiveFormat.SevenZip))
{
    archive.Entries[0].Extract(File.Create(@"d:\Temp\TestFiles\old.txt"));
    archive.Entries[0].Update(@"d:\Temp\TestFiles\a.txt", "new.txt");
    archive.Entries[1].Delete();
    archive.AddEntry(@"d:\Temp\TestFiles\another.txt", "another.txt");
}

Figure 25 C# SevenZip code sketch

Archive updates

The initial assumption was that SevenZip allows updating an archive item inside one opened stream. However, the investigation revealed that SevenZip needs to operate on two streams during the update procedure: one stream to read the old state from and one stream to write the resulting data to. In Figure 26 it can be seen that the SevenZip FileManager application (which is a part of the SevenZip installation pack) also creates a temporary file when an update is performed. Because of this, the C# design must also distinguish between the situations when the archive is opened for reading and when an update or creation procedure takes place.

Figure 26 Monitoring of the SevenZip process which creates the temporary file

As a result, the sketch was changed and more methods for archive object retrieval were added. The initial Open method was substituted by three factory methods: Read (to obtain a readable archive), Create (for a new updatable archive) and Open (for an existing updatable archive). Each of these methods returns a slightly different archive object which exposes only the available operations.

When to perform update operations

Another question that appeared is: when should the methods for archive modification be performed? SevenZip allows the client to perform all updating operations through one method call, so it would be inefficient to execute the methods for entry adding, deleting or updating immediately and repack the whole stream again and again. What is more, extracting does not change the internal archive state, but deletes and updates do. The delete operation would decrease the number of entries inside the archive, and this might confuse methods invoked in the code below it. Consequently, this resulted in the decision to leave the state of the archive unchanged until the archive object is disposed. Methods which have no impact on the internal archive state, like extract, can run immediately. The others (deletes, updates or new entry additions) must be postponed until the archive disposal: they only record information about the intended action and perform it later.
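A minimal sketch of this record-and-defer behaviour, with illustrative names, could be:

using System;
using System.Collections.Generic;

// Illustrative sketch: modifying calls only record the intended action;
// one repack applies them all when the archive object is disposed.
public class DeferredArchiveSketch : IDisposable
{
    private enum PendingKind { Add, Update, Delete }

    private class PendingOperation
    {
        public PendingKind Kind;
        public int EntryIndex;      // used by Update/Delete
        public string SourceFile;   // used by Add/Update
        public string EntryName;    // used by Add
    }

    private readonly List<PendingOperation> pending =
        new List<PendingOperation>();

    public void DeleteEntry(int index)
    {
        pending.Add(new PendingOperation
        {
            Kind = PendingKind.Delete,
            EntryIndex = index
        });
    }

    public void AddEntry(string sourceFile, string entryName)
    {
        pending.Add(new PendingOperation
        {
            Kind = PendingKind.Add,
            SourceFile = sourceFile,
            EntryName = entryName
        });
    }

    public void Dispose()
    {
        if (pending.Count == 0)
            return;
        // In the real wrapper this would drive a single SevenZip update
        // call reading the old stream and writing the new one.
        RepackOnce(pending);
        pending.Clear();
    }

    private void RepackOnce(List<PendingOperation> operations)
    {
        // Sketch only: the repack itself is out of scope here.
    }
}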

Compression settings

Another consideration is required when integrating the feature through which the client can set the properties of the compression process. First, SevenZip's capabilities were examined, and it was revealed that SevenZip allows setting the attributes of the compression only once – at the beginning of the process. During a running compression the client cannot influence its settings any more. Unfortunately, this appears to be a big shortcoming of SevenZip compared to the SharpZipLib library, which provides such functionality. A program which would like to select a different compression level for each item (e.g. lower for already archived files and higher for the others) has no opportunity to accomplish this (or rather has, but only by running a new update process with new settings).

The properties of the compression process have unique string identifiers assigned in the SevenZip code, and some of them relate only to a particular archive format. Moreover, setting one compress option can impact the others. These aspects resulted in the decision that the API should provide a feature through which the client can define custom sets with multiple compression options and then reuse them in more compression processes. For instance, a custom options set could contain the settings of the archive compression ratio, the number of threads used, the compression method used, etc., all applied through one command.


Since most of the SevenZip compress property identifiers have noncommittal names, it is necessary to provide default implementations of these sets, so the client does not have to know them. Providing default set implementations brings another advantage: the client does not have to know the exact type of each compression option value, which has to be set precisely in compliance with the SevenZip library's expectations. Still, the opportunity to set a property using its name must be left open for advanced users, so they can take advantage of the whole SevenZip functionality. Figure 27 shows how the user code for setting compression options could look.

using (var archive = SevenZip.Create(
    File.Create(@"d:\Temp\TestFiles\Output\out.7z"),
    ArchiveFormat.SevenZip))
{
    archive.Password = "heslo";
    archive.SetCompressOptions(new SevenZipCompressOptions
    {
        CompressHeaders = true,
        EncryptHeaders = true,
        ThreadsCount = 3
    });
    archive.SetCompressOption("x", (uint)3);
    archive.AddEntry(@"d:\Temp\TestFiles\Files\file.doc", "file.doc");
    archive.AddEntry(@"d:\Temp\TestFiles\Files\a.txt", "a.txt");
}

Figure 27 Compression properties setting


3 IMPLEMENTATION

PowerShell is a Microsoft product designed for the Windows platform and implemented in the C# language. Therefore, the code of this thesis is also written in C# and uses .NET Framework version 3.5. The provider itself is intended for PowerShell 2.0. This thesis became a part of the ExtBrain project and thus is integrated into it. Most of the code forms its Framework.Shell namespace, and some universal classes are added to the Framework.Core namespace. This chapter discusses only the main classes and the key implementation concepts. An additional description of the classes, their methods and parameters is available in the Documentation.chm file.

3.1 Modules description

The division of the implementation follows the architectural concepts discussed in the analysis chapter. Hence, it consists of these parts:

1. The part that is focused on communication with PowerShell

2. The data module, which presents an independent tool for data manipulation

3. The set of readers together with the reader providers

4. The set of writers together with a write object builder

5. The SevenZip library

Figure 28 Modules scheme

These modules are sketched in Figure 28. Each module consists of several classes, and their division into namespaces is shown in the following list:

1. Framework.Shell.PowerShell – classes related to PowerShell

2. Framework.Shell.Data – the data module

3. Framework.Shell.Data.Reading – the readers and reader providers


4. Framework.Shell.Data.Writing – the writers and writer object builder

5. Framework.Shell.SevenZip – the SevenZip library

6. Framework.Core – some classes are placed into this namespace; however, it is formed mainly by classes that were created earlier and relate to other ExtBrain project parts

Let us look at the main classes contained within each of these namespaces. The communication with PowerShell is done within two main classes – ExtbrainSnapIn and ExtbrainFileSystemProvider. The Snap-In is used for provider registration within the PowerShell runspace. The provider utilizes the data module to perform the requests given by users. The key class of the data module is called FileTool and it exposes functions for data reading, writing and other manipulation.

The reading part consists of several sample readers that can obtain data from the file system, XML files, archive files and text files. The readers are selected according to rules carried by the ReaderProvider and ConvertReaderProvider classes. These two are used during the expansion and conversion processes and are employed by the FileTool class.

The writing part also contains several writers that enable writing data to the file system, archive files, XML files as well as text files. The target path is analysed by the WriterObjectBuilder, which selects a writer for each path segment.

Separate from these is the SevenZip library. Its main class is called SevenZip and it exports several methods for the creation of an archive object. Archives are represented by two main classes called ReadArchive and WriteArchive. Next, the library contains classes for entries, entry and archive properties, callback classes and many others. Each of these parts is discussed in more detail in the following chapters.

3.2 SevenZip library

The implementation of the managed SevenZip wrapper and the interaction with the native library brought many unexpected difficulties; unfortunately, the SevenZip COM API is not described in the documentation. Hence, a lot of time was spent studying the C++ classes and testing their behaviour.

Consequently, the following chapters give a deeper description of the SevenZip COM API and explain the requirements and expectations of the native code. The last chapter of this section shows the API design of the managed wrapper.

3.2.1 COM interop classes

The SevenZip library defines a couple of interfaces through which the developer can access SevenZip features. Some of them are used during both compression and decompression and some of them relate only to one part. Those needed in both are described first so the other sections may refer to them.

Classes for data transfer – streams

First of all, it is necessary to transfer archive binary data to the external SevenZip COM library and newly compressed data from the library back to the managed C# code. For this purpose, SevenZip introduces the IInStream and IOutStream interfaces. These two define the methods Read, Write, Seek and SetSize with obvious responsibilities. Their signatures are shown in Figure 29.

uint Read(IntPtr data, uint size);
int Write(IntPtr data, uint size, IntPtr processedSize);
void Seek(long offset, uint seekOrigin, IntPtr newPosition);
int SetSize(long newSize);

Figure 29 Signature of stream methods


Two additional interfaces, ISequentialInStream and ISequentialOutStream, are provided by SevenZip. The difference between IInStream and ISequentialInStream is that the sequential variant defines only the Read method and does not support seeking. That is because SevenZip does not need the seeking ability for some operations (e.g. an item that is being compressed is read sequentially). A similar situation arises with ISequentialOutStream: unlike IOutStream, it defines only the Write method and provides no Seek or SetSize operations.

When implementing these methods, it is essential to be careful with the selection of parameter types. As can be noticed in the Figure above, some parameters have the IntPtr type even though it seems they could have the long type. The reason is that SevenZip sometimes unexpectedly sets these parameters to null, and converting them into long caused marshalling exceptions.

The SevenZip.StreamImplementation class implements all the mentioned stream interfaces. It wraps a standard Stream and makes it usable by the native SevenZip library. The individual methods are implemented via calls to the methods of the wrapped stream (a minimal sketch of this delegation follows the list below). An instance of the StreamImplementation class is passed to SevenZip both during extraction and compression, and it is utilized for each binary data transfer between the COM SevenZip library and managed C# code. The scenarios in which a stream is wrapped into a StreamImplementation object are the following:

- an archive file stream is passed to the native SevenZip library for decompression

- extracted file data are retrieved from SevenZip and passed to managed code

- file streams that are about to be compressed are passed to the SevenZip library from managed code

- the result of the compression is retrieved from SevenZip and passed to managed code
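The following minimal sketch (not the thesis code; names are illustrative) shows how such a wrapper can service the native Read and Seek calls, including the null-pointer check motivated by the IntPtr parameters mentioned above.

using System;
using System.IO;
using System.Runtime.InteropServices;

// Illustrative sketch of delegating native stream calls to a wrapped
// managed Stream.
public class StreamWrapperSketch
{
    private readonly Stream inner;

    public StreamWrapperSketch(Stream inner)
    {
        this.inner = inner;
    }

    // Fill the unmanaged buffer 'data' with up to 'size' bytes and
    // return the number of bytes actually read.
    public uint Read(IntPtr data, uint size)
    {
        var buffer = new byte[size];
        int read = inner.Read(buffer, 0, (int)size);
        Marshal.Copy(buffer, 0, data, read);   // managed -> unmanaged copy
        return (uint)read;
    }

    public void Seek(long offset, uint seekOrigin, IntPtr newPosition)
    {
        long position = inner.Seek(offset, (SeekOrigin)seekOrigin);
        // SevenZip may pass a null pointer here, so it must be checked.
        if (newPosition != IntPtr.Zero)
            Marshal.WriteInt64(newPosition, position);
    }
}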

SevenZip properties

Another topic that has to be mentioned concerns the properties of an archive or of archive entries. SevenZip uses a uint number for property identification and a special structure for storing the property value. This structure is called PROPVARIANT in the SevenZip code and is defined as a struct containing a C++ union. The union can contain multiple fields that share the same memory space. A better explanation is given by MSDN, whose definition of the union is the following: "A union is a user-defined data or class type that, at any given time, contains only one object from its list of members" [18]. Because C# has no unions, the solution is to define a type with an explicit layout.

The type corresponding to the SevenZip PROPVARIANT has the same name in this thesis. To define a type with an explicit layout in C#, the StructLayout attribute has to be added with the parameter LayoutKind.Explicit. Consequently, each field of the type can carry the FieldOffset attribute, which takes the number of bytes specifying the position of the field in memory. An accurate implementation has to match the layout of the C++ union precisely. The sample code is shown in Figure 30.

[StructLayout(LayoutKind.Explicit)]
public struct PropVariant
{
    [FieldOffset(0)]
    ushort vt;
    [FieldOffset(8)]
    public IntPtr Pointer;
    [FieldOffset(8)]
    public sbyte SByte;
    [FieldOffset(8)]
    public byte Byte;
    [FieldOffset(8)]
    public short Int16;
    ...

Figure 30 Class with an explicit layout

The SevenZip PROPVARIANT struct uses the first two bytes for VARTYPE, a ushort number taking 2 bytes. VARTYPE determines the type of the item stored in the structure. Next, the struct contains three ushort fields reserved for internal purposes, which take another 6 bytes. Then the union is defined to store the property value. An important fact, as Microsoft specifies, is: "A union requires enough storage to hold the largest member in its member list" [18]. With this in mind, the C# object's memory representation has to be aligned precisely to match SevenZip's expectations; an inaccurate memory size of the C# PropVariant object led to errors during PropVariant array transfers. Careful investigation revealed that the union part of the SevenZip PROPVARIANT struct takes at most 8 bytes on 32-bit systems and 12 bytes on 64-bit systems. To ensure that these sizes are matched accurately, the easiest way is to include in the C# PropVariant class an artificial field of a type with the needed size. This type is called PropArray and has two fields of the uint and IntPtr types. The uint field always takes 4 bytes and the IntPtr size changes with the OS architecture from 4 bytes to 8 bytes. The sum of these two produces the needed values. The definition of PropArray is shown in Figure 31.

[StructLayout(LayoutKind.Sequential)]
struct PropArray
{
    readonly uint elementsCount;
    readonly IntPtr elementsPointer;
}

Figure 31 PropArray class definition

The PropVariant class also defines a couple of methods to make its usage more convenient. These are the methods GetValue and GetObject with the generic parameter T. The methods check whether the generic parameter corresponds to the type stored in the VARTYPE field and, if so, select the relevant field for value retrieval. Easy usage is supported by a generic class named IntKey. Instances of this class work as property identifiers that carry both the type of the property (via the generic parameter) and the uint identifier of the property. A sample IntKey definition is shown in Figure 32.

// IIntKey<T> identifier definition in the PropertyName class
public static class PropertyName
{
    public static readonly IIntKey<string> Path =
        new IntKey<string>((ulong)PropertyId.Path);
    ...
}

Figure 32 IntKey identifier definition

This enables keeping the definition of a property together with its identifier and type in one place. A sample usage of the IntKey class is sketched out in Figure 33.

// Invoking a function with an IIntKey<T> parameter
GetPropertyObject(itemIndex, PropertyName.Path);

// Method taking an IIntKey<T> identifier and parsing the value from
// a PropVariant object
public T GetPropertyObject<T>(uint itemIndex, IIntKey<T> propertyKey)
    where T : class
{
    PropertyId propertyId = (PropertyId)propertyKey.Value;
    PropVariant variant = new PropVariant();
    InArchive.GetProperty(itemIndex, propertyId, ref variant);
    return variant.GetObject<T>();
}

Figure 33 IntKey identifier usage sample

3.2.2 Decompression interop classes

Let us now focus on the classes and interfaces that relate only to the decompression process. This process is the easier one, and many of its concepts are similar to those used for compression.

Classes for controlling the extraction process – callbacks

The extraction and compression operations are controlled by callbacks. These callbacks are passed to the COM library, and their methods are called by SevenZip during both processes.

For decompression purposes SevenZip defines two main interfaces: IArchiveOpenCallback and IArchiveExtractCallback. The first one is passed to SevenZip during archive opening, and SevenZip calls its methods to pass information about the number of files and bytes that will be processed. However, SevenZip calls these methods only for a subset of the supported archive types (e.g. rar, tar, cpio, lzh) and often leaves the parameter specifying the file count empty. More important is that SevenZip checks whether the obtained open callback object implements the ICryptoGetTextPassword interface. This interface defines the method CryptoGetTextPassword, through which the managed code can convey the password of an encrypted archive to the native SevenZip library.

The second interface, IArchiveExtractCallback, serves as a controller of the extraction process. The PrepareOperation method is called by SevenZip before each item is processed, and its parameter specifies whether the actual item will be skipped, tested or extracted; this depends on the extraction configuration discussed below. Indispensable is the GetStream method, which asks the client to provide a stream that will be used for the data extraction. Its first parameter determines the index of the actual item in the archive. The third parameter tells the type of the intended operation with the item (i.e. skip, test or extract). The method's signature is shown in Figure 34.

int GetStream(uint index, out ISequentialOutStream outStream,
              AskMode askExtractMode);

Figure 34 GetStream method signature

Implementations of these interfaces can be found in the SimpleOpenCallback and SimpleExtractCallback classes. SimpleExtractCallback implements one additional COM interface called IProgress with two methods, SetTotal and SetCompleted. Before the extraction of an archive item starts, SevenZip calls the SetTotal method, which defines how many bytes will be processed. Then it repeatedly calls the SetCompleted method, which informs the client about the actual extraction progress.

Since some archives may have both the header and the contained files encrypted, it is necessary to implement the ICryptoGetTextPassword interface on both the SimpleOpenCallback and SimpleExtractCallback classes. The password is requested during the opening of an archive with encrypted headers or before the extraction of an encrypted item in the archive (only one request is sent during one extraction process).

IInArchive

The crucial interface for accessing the decompression functionality is IInArchive. This interface defines a couple of methods for archive data retrieval. The first one, Open, is called by the client before any other. It enables passing an archive stream with the compressed data to SevenZip, together with an instance of IArchiveOpenCallback which controls the archive opening process. The next parameter of the method is called maxCheckStartPosition and it is designated for the header size determination. SevenZip reads several header bytes and checks whether the signature of the file in the stream corresponds to the previously specified archive type (specified when obtaining the IInArchive object from SevenZip – this will be discussed later). The number of checked header bytes may be limited with the maxCheckStartPosition parameter.

The most important method is called Extract; its signature is shown in Figure 35. The method's invocation launches the extraction process. Each entry in a SevenZip archive has a unique uint identifier. Therefore, the first parameter of the method is an array of indices that tells SevenZip which entries are about to be extracted. This array has to be sorted from the lowest to the highest index. The second parameter specifies the number of items that will be processed; the 0xFFFFFFFF value has a special meaning and stands for "all items in the archive". Next, SevenZip allows running the extraction process without writing the results to the output stream(s). To do so, the third parameter, called testMode, has to be set to a non-zero number.

int Extract(uint[] indices, uint numItems, int testMode,
            IArchiveExtractCallback extractCallback);

Figure 35 Extract method signature

The Close method of IInArchive has to be called after the reading of the archive is finished. SevenZip unlocks the archive stream after the method invocation.

IInArchive defines additional methods for reading archive properties such as the CRC, the compression method used, the creation time, etc. These methods are divided into two groups – those which relate to the archive itself and those which are associated with archive entries. Both groups contain three methods. The first group includes the GetNumberOfArchiveProperties, GetArchivePropertyInfo and GetArchiveProperty methods. Their signatures are shown in Figure 36.

uint GetNumberOfArchiveProperties();
void GetArchivePropertyInfo(uint index, out string name,
    out PropertyId propId, out VarEnum varType);
void GetArchiveProperty(PropertyId propId, ref PropVariant value);

Figure 36 Methods for archive properties retrieval

The process of obtaining an archive property is the following: firstly, the GetNumberOfArchiveProperties method has to be called to find out how many properties are defined within the archive. Secondly, the GetArchivePropertyInfo method follows. This method takes a uint number as a parameter – the index of the property; the valid range of indices spreads from zero to the properties count minus one. The method returns an identifier of the property, its name and its type. Finally, the identifier has to be passed to the GetArchiveProperty method through the first parameter, and the value of the property is set into the second output parameter. The code for property retrieval is sketched out in Figure 37. When the property value is retrieved, it is encapsulated into a data class called Property. This is the class that the client works with.

var result = new List<Property>();
for (int index = 0; index < InArchive.GetNumberOfArchiveProperties(); index++)
{
    string name;
    PropertyId propId;
    VarEnum varType;
    InArchive.GetArchivePropertyInfo((uint)index, out name, out propId,
        out varType);
    var value = new PropVariant();
    InArchive.GetArchiveProperty(propId, ref value);
    result.Add(new Property(propId, name, varType, value.GetObject()));
}

Figure 37 Sample of archive property retrieval

The process of getting the properties of an item stored in the archive is very similar. The three methods from the first group are substituted by the GetNumberOfProperties, GetPropertyInfo and GetProperty methods. These methods form the second group, related to an archive entry, and have the same responsibilities as their corresponding partners. One difference can be found in the last of them: unlike the GetArchiveProperty method from the first group, the GetProperty method takes one additional parameter called entryId, which identifies the archive item to which the properties relate.

3.2.3 Loading SevenZip library into C# In order to be able to use SevenZip functionality one remaining part is missing. This part

refers to SevenZip library loading and instantiation of COM objects.

The installation process of SevenZip puts into the installation folder file named 7z.dll

which is the one that managed code will cooperate with. To load the unmanaged library into

memory, several Windows API native functions are needed. The question is how to invoke

these functions. Thankfully, the interop features of the common language run-time, called

Platform Invoke (P/Invoke), are very complete. To call such a method, its signature has to be

added in a C# class or struct. This definition has to be preceded by an extern keyword. This

keyword is the hint to the compiler telling that the method will be implemented by a function

exported from a DLL. Therefore, there is no need to supply a method body. The requirement

for P/Invoke methods is that they must be declared as static since there is no consistent notion

of an instance in the Windows API. Importantly, P/Invoke methods are nothing more than

metadata that the just-in-time (JIT) compiler uses to wire managed code to an unmanaged DLL

function at run time. An important piece of information required to perform this wiring to the

unmanaged world is the name of the DLL from which the unmanaged method is exported.

This information is provided by the DllImport attribute that has to be placed before the C# declaration of the function. A sample P/Invoke method declaration is shown in Figure 38.

[DllImport("User32.dll")] static extern Boolean MessageBeep(UInt32 beepType);

Figure 38 P/Invoke method declaration

One of the rules of C# is that its call syntax can only access CLR data types such as

System.UInt32, System.Boolean etc. “C# is expressly unaware of C-based data


types used in the Windows API such as UINT and BOOL, which are just typedefs of the C

language types.” [19] Hence, data has to be marshalled between the managed and unmanaged

world.

For the library loading purposes two functions are needed. The LoadLibrary function loads a dynamic library into the address space of the calling process and FreeLibrary serves for library unloading. Both of these functions are defined in the kernel32.dll library, so the name kernel32.dll has to be set as the first parameter of the DllImport attribute. The LoadLibrary method takes a string parameter filename with the path to the target library. A problem can arise with different versions of the host OS. The Windows 9x family of products lacks Unicode support, while Windows NT and higher versions use Unicode natively. Therefore, the DllImport attribute includes a handy optional property called CharSet. By setting this property to the value CharSet.Auto, the CLR uses an appropriate character set based on the host OS. The default value of the property is CharSet.Ansi. This default is unfortunate as it negatively affects the performance of text parameter marshalling for interop calls. The CharSet property is accompanied by the additional BestFitMapping property. The CLR converts any Unicode characters to ANSI characters on Windows 98 or Windows ME and by default, it uses close-matching. For instance, the copyright character can be translated into the ANSI character 'c'. To prevent this behaviour the BestFitMapping property must be set to false.
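Putting the discussed properties together, a LoadLibrary declaration honouring all of them might look like this minimal sketch:

[DllImport("kernel32.dll", CharSet = CharSet.Auto,
    BestFitMapping = false, SetLastError = true)]
static extern IntPtr LoadLibrary(string lpFileName);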

A frequently omitted property of the DllImport attribute is SetLastError. This property helps when dealing with the different approaches to error handling in the Windows API and in managed code. If its value is set to true, the CLR will cache the error set by the API function after each call to an extern method. The error HRESULT can then be retrieved by calling the Marshal.GetHRForLastWin32Error method. It is then appropriate to check whether the call of the unmanaged function succeeded and if not, use the Marshal.ThrowExceptionForHR method. It throws an exception in managed code that corresponds to the HRESULT error code.
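The whole pattern can be sketched as follows (assuming a LoadLibrary declaration with SetLastError set to true and a path variable with the library path):

var handle = LoadLibrary(path);
if (handle == IntPtr.Zero)
{
    // translate the Win32 error cached by the CLR into a managed exception
    int hr = Marshal.GetHRForLastWin32Error();
    Marshal.ThrowExceptionForHR(hr);
}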

To improve reliability and safety during interop with native code, .NET introduces SafeHandles. Before their existence all handles had to be wrapped into the IntPtr managed wrapper object. But the handles stored inside IntPtr objects obtained from P/Invoke calls could leak when the program ran into exceptional conditions such as a rude thread abort, out-of-memory, stack overflow etc. For instance, if an asynchronous exception occurs right after a P/Invoke call returns the result but before the resulting IntPtr handle is stored, the resource allocated by the native code will leak without any hope of freeing it. Moreover, since the OS recycles handle values periodically, a leaked IntPtr handle could end up pointing to some other resource instead of the original one, thus opening a potential security risk. This might happen especially in long-running processes. .NET 2.0 introduced the SafeHandle class which was designed to address these problems.

When using SafeHandle instead of IntPtr, the runtime guarantees that the store operation is atomic and an IntPtr handle will be safely saved inside a SafeHandle object. Besides, SafeHandle inherits from CriticalFinalizerObject which guarantees that the finalizer will be run, won't be aborted by the host, and that it will be run after the finalizers of other objects collected at the same time (this is important because it ensures that classes like FileStream can run a normal finalizer to flush out existing buffered data without worrying about the state of the SafeFileHandle). SafeHandles also solve the second issue since the P/Invoke layer automatically increments the safe-handle ref-count anytime the underlying handle value is passed to native code in a Win32 method call, and decrements it upon completion. “This will ensure that any out-of-band asynchronous call to methods like CloseHandle (perhaps from another thread) will not release the OS handle while there are other P/Invoke calls that are pending.” [20]


Internally, the .NET Framework uses a large number of SafeHandle-derived types, one for each type of unmanaged resource it needs to deal with. However, only a few of them are exposed (SafeFileHandle, SafeWaitHandle). When dealing with unmanaged resources where no public SafeHandle-derived type is suitable, the recommendation is to create a custom derivative. An example of a SafeHandle implementation is shown in Figure 39. To make writing a custom SafeHandle easier, the .NET Framework provides two additional public SafeHandle-derived types, SafeHandleZeroOrMinusOneIsInvalid and SafeHandleMinusOneIsInvalid. A SafeHandle has to be able to tell whether the IntPtr it is storing is valid for the relevant type of resource. Since the majority of resource handles in the Win32 world are invalid when they are -1 or when they are either 0 or -1, these classes have been provided to incorporate those checks.

[DllImport("kernel32.dll", SetLastError = true)] public static extern SafeLibraryHandle LoadLibrary(string lpFileName); [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)] [DllImport("kernel32.dll", SetLastError = true)] [return: MarshalAs(UnmanagedType.Bool)] private static extern bool FreeLibrary(IntPtr hModule); [SecurityPermission(SecurityAction.LinkDemand, UnmanagedCode = true)] public sealed class SafeLibraryHandle : SafeHandleZeroOrMinusOneIsInvalid { private SafeLibraryHandle() : base(true) { } protected override bool ReleaseHandle() { return FreeLibrary(handle); } }

Figure 39 SafeHandle implementation

Because LoadLibrary returns a handle, it calls for SafeHandle usage. That is why

the custom implementation of SafeHandle called SafeLibraryHandle is used to store

a handle to the loaded library. Thanks to that, reference counting takes place and it is ensured

that the library is freed only after all references have been released. But it is important to

match each call to LoadLibrary with a call to FreeLibrary to decrease the reference

count.

As can be seen in Figure 39, the FreeLibrary method is marked with a ReliabilityContract attribute. The ReliabilityContract attribute is a way to tell the CLR what kind of guarantees the method implementation makes when it encounters extraordinary conditions.

Obtaining COM objects from SevenZip library

The whole process of loading would be useless without the ability to call an unmanaged function defined within the library. To do so, the GetProcAddress function from kernel32.dll is convenient. This function takes a handle to the library and a name of the desired function and returns the function pointer. Its signature is shown in Figure 40.

[DllImport(("kernel32.dll"] static extern IntPtr GetProcAddress(SafeLibraryHandle hModule, string procName);

Figure 40 GetProcAddress method's signature


The whole unmanaged DLL library loading is wrapped into a class called Win32Library. The custom SafeLibraryHandle class and the P/Invoke methods are hidden inside the class, separated from the rest of the code. One more improvement is added to hide the low level GetProcAddress method. It is wrapped into a generic method called GetFunction which takes an identifier of the function in the unmanaged library. This identifier is of the IKey<T> type that is the common interface for identifiers enriched with a generic type. Similarly to IntKey<T> discussed above, the Key<T> holds the string identifier together with the type stored in the generic parameter. This enables generic functions both to obtain the string identifier and to infer the type from one parameter, thus providing better type safety. The implementation of the GetFunction method is shown in Figure 41.

public TDelegate GetFunction<TDelegate>(IKey<TDelegate> functionKey)
    where TDelegate : class
{
    var address = GetProcAddress(handle, functionKey.ToString());
    if (address == IntPtr.Zero)
        return null;

    return (TDelegate)(object)Marshal
        .GetDelegateForFunctionPointer(address, typeof(TDelegate));
}

Figure 41 Implementation of the GetFunction method

Wrapping these low level techniques into a separate class leads to several advantages:

- it provides better reusability when loading other unmanaged libraries

- it provides type-safe delegate wrappers over GetProcAddress

- it converts Win32 errors to managed exceptions

- it uses SafeHandles for an unmanaged library

- it wraps the raw P/Invokes to LoadLibrary, GetProcAddress, and FreeLibrary

Now, when the class for loading unmanaged libraries is prepared, it is much easier to call functions from the SevenZip unmanaged library. The SevenZip source code contains the Archive2.def file exporting the CreateObject function. This is the essential function through which COM objects are obtained from SevenZip. Its declaration in the SevenZip code is shown in Figure 42.

STDAPI CreateObject(const GUID *clsid, const GUID *iid, void **outObject)

Figure 42 CreateObject method's signature

In order to map this function to managed code, its delegate has to be declared. This delegate is used as a type parameter of the key identifier which is passed to the GetFunction method which locates the function in the unmanaged library. The declaration of the delegate and the key is shown in Figure 43.

[UnmanagedFunctionPointer(CallingConvention.StdCall)]
delegate int CreateObjectDelegate([In] ref Guid classId,
    [In] ref Guid interfaceId, out IntPtr outObject);

static readonly IKey<CreateObjectDelegate> createObjectKey =
    new Key<CreateObjectDelegate>("CreateObject");

Figure 43 CreateObject delegate and key


The process of calling the SevenZip unmanaged function is the following: Firstly, an instance of the Win32Library class is created whose constructor takes the path to the desired DLL. Then it loads the DLL into memory. Secondly, the caller invokes the GetFunction method and provides it with the Key<TDelegate> object that contains a string identifier and the required delegate type. The GetFunction method internally calls GetProcAddress and takes care of marshalling and casting its result to the required delegate type. Finally, the caller uses the obtained delegate and invokes the referenced function, supplying the SevenZip GUIDs. The function returns a pointer to an unmanaged COM object that is converted into a managed representation by the Marshal.GetTypedObjectForIUnknown method. The simplified process is sketched out in Figure 44.

using (var library = new Win32Library(@"c:\Program Files\7-Zip\7z.dll", false))
{
    IKey<CreateObjectDelegate> createObjectKey =
        new Key<CreateObjectDelegate>("CreateObject");
    CreateObjectDelegate createObjectFunc = library.GetFunction(createObjectKey);

    Guid interfaceId = ...; // IInArchive guid
    Guid classId = ...;     // guid of particular archive class
    IntPtr interfaceAddress = IntPtr.Zero;
    createObjectFunc(ref classId, ref interfaceId, out interfaceAddress);

    IInArchive archive = (IInArchive)Marshal.GetTypedObjectForIUnknown(
        interfaceAddress, typeof(IInArchive));
    archive.Open( ... );
    archive.Extract( ... );
    ...
} // 7z library is released from memory

Figure 44 Invoking native SevenZip function

Figure 44 omits necessary checks and a better separation of the code into a number of methods. The sample serves as an overview of the loading process.


Diagram 8 Archive COM object retrieval

3.2.4 The decompression process

The previous chapters described the decompression API and the loading of the unmanaged SevenZip library into memory. The only thing that remains is to put these steps into the correct order and describe the interaction between the unmanaged and managed world.

The first step is to load the SevenZip library. The managed code resolves the path to the DLL file and passes it to the Win32Library class constructor. It internally calls the LoadLibrary unmanaged function which returns a SafeHandle instance. If the library is correctly loaded into memory and the SafeHandle is valid, then managed code locates its CreateObject function by calling GetProcAddress.

The second step is to obtain a COM object representing an archive – an instance of the IInArchive interface. SevenZip defines GUIDs for each of its supported archive types (e.g.

7z, zip, gzip, iso, etc.). These GUIDs are defined in the Guid.txt file which is a part of the

SevenZip source code package. In order to obtain an archive COM object, managed code must

supply two GUIDs: the first one specifies the type of the archive and the second one is the

GUID of the IInArchive interface. When the GUIDs are passed to the unmanaged part

through CreateObject function its result is sent back to managed code. This part of the

process is shown in the Diagram 8.

Next, the Open method of the obtained COM archive object is called, providing the stream of an archive file and an instance of the IArchiveOpenCallback. At that time SevenZip starts to read the archive's header bytes. Moreover, it checks the file's signature and compares whether it corresponds to the format specified by the GUID identifier. If the header of the archive


is encrypted, unmanaged code asks for a password by calling the CryptoGetTextPassword method of the IArchiveOpenCallback object. After that all other IInArchive methods can be called. Managed code can consequently retrieve archive properties or launch the extraction process.

The Extract method on the IInArchive COM object starts the extraction process. It takes one parameter of the IArchiveExtractCallback type that is used by unmanaged code during the extraction. The order in which SevenZip calls the methods of the IArchiveExtractCallback object is sketched in Figure 45.

Figure 45 Order of method calls during the extraction process

The methods SetTotal, SetCompleted, PrepareOperation and SetOperationResult are used only to inform the client about the extraction process; the data about the process flow from unmanaged code to managed code. A different situation comes with the GetStream and CryptoGetTextPassword methods which are called by SevenZip. Here, data are provided by the managed code and transferred to the unmanaged part. These methods are called by SevenZip when it needs a stream for item extraction or a password for item decryption. Writing to the stream itself starts after the PrepareOperation method call. From that moment SevenZip repeatedly alternates between writing data to the stream and calling the callback's SetCompleted method. The extraction of the item ends when the SetOperationResult method is called. The part of the process from the Open method on is shown in Diagram 9.


Diagram 9 Extraction process

3.2.5 Compression interop classes

The compression API of SevenZip is very similar to the decompression part and its usage becomes very intuitive once the decompression part is well understood. IInStream, IOutStream and their sequential variants are also used for binary data transfer during the compression process. The new interfaces are IArchiveUpdateCallback, ISetProperties, ICryptoGetTextPassword2 and IOutArchive. An important fact is that these interfaces serve both for creating a new archive and for updating an existing one.

Callbacks

As was the case with the decompression, SevenZip defines a callback interface in order to control the compression process. For this purpose the IArchiveUpdateCallback is defined. Its most important method is GetUpdateItemInfo. SevenZip calls it for each processed item, whose index is passed in the method's first parameter. The other three are output parameters that must be filled by the callback. The first of them, named newData, determines whether the item is new and SevenZip should ask for its data stream. The value 1 means that a new stream with data will be provided for the item and the value 0 stands for the opposite. If the value is set to 1, then SevenZip calls the callback's GetStream method which supplies the item data stream. A very similar situation comes with the second output parameter called newProperties. This parameter can also be set to 1 or 0 and if the value 1 is set, then SevenZip asks for new properties of the item by calling the GetProperty method of the callback. This method is called multiple times, once for each property. SevenZip provides an index parameter to determine the related item in the archive and an identifier of the property (e.g.


CreationTime, Size, Path). The third output parameter of the GetUpdateItemInfo method is called indexInArchive. This number specifies the internal archive index which will be used to access the item within the archive. The value -1 has a special meaning, indicating that the index for the item can be selected by SevenZip and it does not matter which exact value will be used. To delete an item from the archive, this method must set the output parameters newData and newProperties to the value 0 and indexInArchive has to be set to an index which is different from the one the item already had. Examples of settings returned by the GetUpdateItemInfo method are shown in Table 3.

Desired operation                                           newData   newProperties   indexInArchive
New item in archive (arbitrary index within the archive)       1            1              -1
New item in archive (index within the archive set to 3)        1            1               3
Update properties of the item with index 2                     0            1               2
No operation with the item with index 4                        0            0               4
New data stream for the item with index 5 but with the
same properties                                                 1            0               5
Delete the item with index 6                                    0            0              -1

Table 3 Return values of the GetUpdateItemInfo method
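To illustrate, a managed implementation of the callback method that reports every item as new could look like the following sketch (the exact managed signature is an assumption; SevenZip only reads the three output parameters as described above):

// returns S_OK (0); corresponds to the first row of Table 3
public int GetUpdateItemInfo(uint index, out int newData,
    out int newProperties, out uint indexInArchive)
{
    newData = 1;                          // SevenZip will call GetStream
    newProperties = 1;                    // SevenZip will call GetProperty
    indexInArchive = unchecked((uint)-1); // let SevenZip choose the index
    return 0;
}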

It is important to note that old versions of SevenZip could not update so called solid archives (more information about solid archives can be found at [21]). This feature appeared with the 9.xx versions of the library which were released during 2010. Older versions of SevenZip throw an error telling that the feature is not implemented. Since the code in this thesis is adjusted to support the version 9.20 of the SevenZip library, all values listed in Table 3 are valid.

IArchiveUpdateCallback is accompanied by the ICryptoGetTextPassword2 interface with the method CryptoGetTextPassword2. SevenZip calls this method to ask for the archive password. The password is requested only once, which means that one password is used for all items of the archive file. The callback interface is implemented by the MultipleUpdateCallback class.

IOutArchive

The main compression interface is IOutArchive and it defines the only method that can be used with the COM object representing the archive file. This method is named UpdateItems and its invocation starts the compression/update process. It takes three parameters: the stream into which the resulting data are written, the number of processed items and an update callback. At this point, it is necessary to distinguish the situation when a new archive is created from the one when an old archive is updated. The first one is much easier since the IOutArchive COM object can be directly retrieved by calling the CreateObject function, supplying only the archive format GUID. Then the UpdateItems method can be invoked immediately on the obtained COM object.

In contrast, during the update operation the IInArchive object has to be created first. This is achieved by using exactly the same procedure that was used for archive extraction. It means that CreateObject has to be invoked to obtain the IInArchive COM object, followed by the Open method to deliver the archive stream and so on. When this procedure is finished, the IInArchive COM object can be cast into IOutArchive and consequently the UpdateItems method may be called. It is important to set its second parameter numItems correctly. Even if the update operation will delete some items from the old archive, they must be included in the numItems parameter. For instance, if the archive contained


6 items, 2 of them will be added and 3 deleted, the parameter has to be set to the value 8 (6 – to go through all old items and determine which of them should be deleted, 2 – to go through the new entries).

Classes for compression setting

One more interface is left to complete the description of the compression part – ISetProperties with the SetProperties method. The user of SevenZip can customize the compression process by setting its properties. These properties can influence the compression level, the number of threads used by SevenZip, the used encryption method and many others. Each property is identified by its name (e.g. the “x” string stands for the compression level, “mt” for the threads count). These names are sufficiently described in the SevenZip documentation. What can be slightly confusing is that the setting of one property can influence the others (e.g. the compression level can set two properties called “number of fast bytes” and “number of passes”) and not all properties can be set on all archive formats. If a property is set wrongly, SevenZip will ignore it. Every IOutArchive COM object can be cast into the ISetProperties interface and then the SetProperties method can be invoked. This method takes three arguments – a pointer to the array containing the string identifiers of the properties, a pointer to the array of their values (one value is one PropVariant object) and a uint number of properties. These values have to be prepared by managed code so the unmanaged part can read them. The string identifiers are marshalled by the StringToBSTR method and the array of values must be properly allocated and freed with the pair of methods GCHandle.Alloc and GCHandle.Free. In managed code, each compression setting is wrapped into a custom class called CompressProperty that contains both the identifier and the value. A sample of how to provide a compression setting to the unmanaged code is sketched out in Figure 46.

void SetCompressOptions(IEnumerable<CompressProperty> properties)
{
    var names = properties.Select(prop => prop.Name).ToArray();
    var values = properties.Select(prop => prop.PropVariant).ToArray();
    var namesHandle = GCHandle.Alloc(names, GCHandleType.Pinned);
    var valuesHandle = GCHandle.Alloc(values, GCHandleType.Pinned);
    try
    {
        ((ISetProperties)OutArchive).SetProperties(
            namesHandle.AddrOfPinnedObject(),
            valuesHandle.AddrOfPinnedObject(),
            names.Length);
    }
    finally
    {
        namesHandle.Free();
        valuesHandle.Free();
    }
}

Figure 46 Compression options setting
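A hypothetical usage of this method might then look as follows (the CompressProperty constructor shape is an assumption; “x” and “mt” are the property names described above):

var options = new List<CompressProperty>
{
    new CompressProperty("x", 9),  // maximum compression level
    new CompressProperty("mt", 4)  // four compression threads
};
SetCompressOptions(options);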

3.2.6 The compression process

The compression process starts with the same steps as decompression. Since the loading of the library has already been discussed, let us move to the point where the CreateObject function has already been obtained. After that the first thing is to retrieve an archive COM object. For a new archive the IOutArchive is created via the CreateObject function. This step is substituted by three steps when updating an existing archive. These steps are creating an IInArchive object, calling its Open method and then casting it into the IOutArchive interface. From this point updating and creating a new archive


continue with the same steps. First, the UpdateItems method is called which starts the compression process. The IArchiveUpdateCallback object is passed to unmanaged code and SevenZip calls its methods for each processed item in the order sketched in Figure 47.

Figure 47 Order of method calls during the compression process

Because the compression operation regularly runs in several threads, the SetOperationResult method call is often followed by multiple SetCompleted method calls. Because of that, it is not a good idea to close the output stream in the SetOperationResult method call as SevenZip may continue writing into the stream. The writing to the output stream itself starts right after the GetStream method call. The diagram of the compression process is shown in Diagram 10.

Diagram 10 Compression process


3.2.7 SevenZip API in C#

The interaction with the native library was discussed in the previous chapters. However, the API description of the C# wrapper is missing. The following sections are intended for this purpose.

Archive classes

In order to separate the archive object implementation and follow the principle “Program to an interface, not an implementation”, two archive interfaces are proposed – IReadArchive and IWriteArchive. These interfaces are those that the clients work with, thus they remain unaware of the specific types of the objects that they use. Since there is no difference between the creation and update operations from the client perspective, the IWriteArchive is used for both. That is also supported by the fact that SevenZip defines only one interface for an archive modification as well. The implementations of these interfaces are called ReadArchive and WriteArchive.

The archive objects are obtained via the static factory methods Read, Create and Update declared within the SevenZip class. Items inside the archive are exposed through the public Entries property. Since the capabilities of archive entries differ in the ReadArchive and WriteArchive, two entry interfaces IReadEntry and IEntry are proposed. These represent inner archive items and expose their available methods. As a result, the SevenZip.Read method returns the IReadArchive with IReadEntry items and the SevenZip.Create and SevenZip.Update methods return the IWriteArchive with IEntry items. Since IEntry has the same capabilities as IReadEntry and provides only a few additional methods, IEntry inherits from the IReadEntry interface, as do their implementations ReadEntry and Entry. The same holds for the archives and therefore they are also put into a hierarchy. The final design of the archive classes is shown in Diagram 11. As can be noticed, the code that communicates with the native library is extracted out of the ReadArchive and WriteArchive classes into a custom class called Archive. That is because it is better to keep the communication with the native code in one place rather than scattered throughout the whole code.

Diagram 11 Structure of archive classes

The attributes of entries as well as the attributes of an archive are exposed to the client by public property arrays declared both on the archive and entry interfaces. Because the common use case often does not care about attributes and includes only data compression or decompression, their values are resolved on demand and not at the time when the archive COM object is loaded. Consequently, the entries hold a reference to the parent Archive object and use it when the user requests a property.
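To give an impression of the resulting API, listing the entries of an existing archive might look like this sketch (the exact parameter list of Read and the Path property of an entry are assumptions; the way the format is specified is discussed below in Specification of the archive format):

using (IReadArchive archive = SevenZip.Read(ArchiveFormat.Zip,
    File.OpenRead(@"d:\Temp\TestFiles\archive.zip")))
{
    foreach (IReadEntry entry in archive.Entries)
        Console.WriteLine(entry.Path); // attributes are resolved on demand
}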

Compress options

In order to be able to influence the compression process, the IWriteArchive object exposes a method called SetCompressOption through which one compress property identified by its name can be set. It is intended for advanced users who are aware of these option names and their types. SevenZip describes them well in its documentation. Since the archive modification process is deferred until the archive object is disposed, it is allowed to call the SetCompressOption method at any time before the disposal.

However, to make it easier for users who are not aware of the option names, the work provides another way of setting compress options. It defines the interface called ICompressOptions and includes several default implementations that are shown in Diagram 12. They include public properties where each property relates to one compress option. The whole options set can then be passed to the SetCompressOptions extension method of the IWriteArchive class. But why is the method declared outside the WriteArchive class scope, as an extension method? The reason is that the method does not need any internal archive properties and can use the SetCompressOption method for its implementation. Although it might seem that both methods belong to the WriteArchive and both should be defined within the class, it is better to keep a small set of methods that ensure object consistency and implement the rest as extension methods, which do not have access to the inner object fields and hence cannot corrupt its internal state.

Diagram 12 Default ICompressOptions implementations

It can be noticed that the compress options inherit from a generic interface rather than from ICompressOptions directly. The fact that some properties relate to a particular archive format led to the idea that the API should somehow force the client to set the compress options correctly. In other words, when the client is setting for example the compression level, he/she must be aware of the archive format since not all values of the compression level can be set on all archive types. Because of that, the IWriteArchive and ICompressOptions interfaces are enriched with the generic parameter TArchive. As a result, the compiler will throw an error when the generic parameter TArchive of the ICompressOptions object differs from the parameter of the IWriteArchive object and prevent the client from misusing options with a wrong archive format.
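The effect can be demonstrated on a sketch like the following (the concrete option class names are hypothetical; only the generic parameters matter):

IWriteArchive<SevenZip.Zip> archive = SevenZip.Create<SevenZip.Zip>(
    File.Create(@"d:\Temp\TestFiles\Output\out.zip"));

ICompressOptions<SevenZip.Zip> zipOptions = new ZipCompressOptions();
archive.SetCompressOptions(zipOptions);   // compiles

ICompressOptions<SevenZip.BZip2> bzipOptions = new BZip2CompressOptions();
archive.SetCompressOptions(bzipOptions);  // compile-time error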

Specification of the archive format

One thing to figure out was how the archive format will be specified. This piece of information has to be supplied by the client. In native SevenZip code, every archive class is


identified by its GUID. This GUID has to be provided by managed code when creating an archive COM object. To hide these GUIDs from the client, they were substituted with an enumeration called ArchiveFormat containing members for each supported archive format. But this is a little odd since the WriteArchive generic parameter TArchive already semantically specifies the archive type. So the idea was to specify the format with the generic type parameter rather than the ArchiveFormat enumeration. This solution is shown in Figure 48 where the format of the archive is specified by the generic parameter of the Create method. It gains the GUID from the SevenZip.BZip2 class and creates an instance of IWriteArchive<SevenZip.BZip2>.

using (var archive = SevenZip.Create<SevenZip.BZip2>(
    File.Create(@"d:\Temp\TestFiles\Output\out.bzip2")))
{
    archive.AddEntry(@"d:\Temp\TestFiles\Files\file.doc", "file.doc");
}

Figure 48 Creation of generic archive class

From the sample above it is obvious that the SevenZip class must provide classes for each archive format that carry the discussed GUIDs. Also, the factory methods have to take a generic parameter as well. This is shown in the example in Figure 49. This solution enables checking the correctness of the used compress options at compile time. Moreover, it can be easily restricted what types of archives can be used with the Create and Update methods.

[Guid("23170f69-40c1-278a-1000-000110010000")] public class Zip : IWritableArchive { } [Guid("23170f69-40c1-278a-1000-000110020000")] public class BZip2 : IWritableArchive { } public static IWriteArchive<TArchive> Create<TArchive>( ... ) where TArchive : class, IWritableArchive { ... } public static IWriteArchive<TArchive> Update<TArchive>( ... ) where TArchive : class, IWritableArchive { ... }

Figure 49 Helper classes for archive format definition

However, this API was finally slightly relaxed and the client is allowed to give up the type safety and specify the format not by the generic parameter but by using the ArchiveFormat enumeration. The reason for this came from usage. Some applications may detect the format of an archive at runtime (e.g. according to the file extension) and they would need to map every extension to all three factory methods. For example, references to the Read<SevenZip.Zip>, Update<SevenZip.Zip> and Create<SevenZip.Zip> methods would have to be kept in order to read and modify a zip file whose type is recognized at runtime. That is why a non-generic version of the WriteArchive was introduced and similarly non-generic versions of the factory methods. The zip file extension can then be mapped to only one ArchiveFormat value instead of three method variants.
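A sketch of that runtime-detection scenario with the non-generic API (the extension-to-format mapping and the ArchiveFormat member names are illustrative):

ArchiveFormat format = Path.GetExtension(fileName) == ".zip"
    ? ArchiveFormat.Zip
    : ArchiveFormat.SevenZip;

using (var archive = SevenZip.Read(format, File.OpenRead(fileName)))
{
    foreach (var entry in archive.Entries)
        Console.WriteLine(entry.Path);
}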

Transferring new entries data

One thing to deal with during the implementation was how to transfer the properties of a newly added entry. The first thought was to pass this information in several parameters of the AddEntry method defined on IWriteArchive. These would include an inner archive path, creation time, last access time etc. In fact, there are too many properties that SevenZip can store together with an entry. For this reason, a new interface IWriteEntry together with its default implementation called WriteEntry is introduced. WriteEntry is only a data container that assembles entry properties. Since the client often wants to quickly create a new


entry from FileInfo, Stream or other objects, the API provides several overloads which take care of the WriteEntry creation. This data class can be used for update operations as well. However, SevenZip allows three ways of updating:

1. Update the whole entry

2. Update entry properties and keep old data (e.g. only the inner archive path can be changed and hence a move operation can be performed this way)

3. Update entry data and keep old properties

As a result, the WriteEntry class is divided into two parts. Those properties of an entry that can be updated are extracted into a custom data class called EntryProperties. The WriteArchive class then defines three update methods: Update, UpdateData and UpdateProperties. This final design is shown in Diagram 13.

Diagram 13 Classes that hold new entry data

3.3 Implementation of the PowerShell extension

The next module of this thesis is focused on the implementation of the PowerShell extension and the classes that communicate with the PowerShell runtime. Let us highlight the main concepts used during the extension development.

3.3.1 Snap-In implementation

Every extension of Windows PowerShell starts with the implementation of the Snap-In installer class which must be contained within a .NET assembly. The following sections describe the process of the Snap-In's creation.

Selection of the Snap-In base class

Two types of Snap-In installer classes are introduced by PowerShell – PsSnapIn and CustomPsSnapIn. If the Snap-In inherits from the PsSnapIn class, PowerShell will register all the providers and cmdlets contained within the Snap-In's assembly. Inheriting from the CustomPsSnapIn class brings the ability to specify the classes of cmdlets and providers which will be registered. The set of cmdlets is defined through the Cmdlets property, a collection of CmdletConfigurationEntry objects and similarly, providers through the Providers property, a collection of ProviderConfigurationEntry. The type that implements the provider is passed to the ProviderConfigurationEntry class constructor together with the name of the provider. Optionally the client can specify a help file that is shown to the user when calling the Get-Help cmdlet on the provider.
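A Snap-In built this way might be sketched as follows (the class name and description strings are illustrative; note that the base class in the PowerShell assemblies is spelled CustomPSSnapIn):

[RunInstaller(true)]
public class ExtbrainShellSnapIn : CustomPSSnapIn
{
    public override string Name { get { return "Extbrain.Shell"; } }
    public override string Vendor { get { return "Extbrain"; } }
    public override string Description
    {
        get { return "Provider for navigation inside structured files."; }
    }

    public override Collection<ProviderConfigurationEntry> Providers
    {
        get
        {
            return new Collection<ProviderConfigurationEntry>
            {
                // provider name, implementing type and an optional help file
                new ProviderConfigurationEntry("ExtbrainFileSystem",
                    typeof(ExtbrainFileSystemProvider), null)
            };
        }
    }
}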

Snap-In registration and loading into a running shell

The Snap-In class must be marked with the RunInstallerAttribute that is a member of the System.ComponentModel namespace so it can work with installutil.exe, a tool that the .NET Framework provides for installing and uninstalling managed applications on the computer. The Snap-In has to override the public Name property. A registry key is created at Snap-In registration time and the Snap-In's name is used as the key name.

When the DLL containing the Snap-In is registered, it is necessary to load it into the shell. For this purpose PowerShell defines a cmdlet called Add-PsSnapIn. It takes the name of the Snap-In as the first parameter. To verify whether the loading succeeded, the Get-PsSnapIn cmdlet can be used which lists all available Snap-Ins. Since installutil.exe can be invoked directly from PowerShell, both registration and loading can be done inside PowerShell. The sample is shown in Figure 50.

PS D:\Temp\Extbrain> C:\Windows\Microsoft.NET\Framework\v2.0.50727\installutil.exe Extbrain.Shell.dll
PS D:\Temp\Extbrain> Add-PsSnapIn Extbrain.Shell

Figure 50 Snap-In registration

Specification of the format of objects written to the output

PowerShell makes it possible to customize the final format of output objects that were produced by the providers or cmdlets within the Snap-In. This format is described in simplified xml files with an accurately specified structure. Every Snap-In can provide its own format files by overriding the Formats property. This property is a collection of the FormatConfiguration class where each instance contains a path to one xml format file. When an object is written to the pipe, PowerShell uses reflection to determine its type and checks whether a format for the obtained type is supplied.

The structure of the format has to contain a definition of one or more view elements where each view can be applied to multiple object types. Each view has a name element that is used to identify the view. Then it contains definitions of the types to which the view will be applied. Next, the control definition follows, describing the output view. This can be TableControl, ListControl, WideControl or CustomControl and each of them can select what properties of the output objects will be shown to the user. A sample of the format file is shown in Figure 51 and more information about output formatting is available at [22].

<View> <Name>System.Globalization.CultureInfo</Name> <ViewSelectedBy> <TypeName>Deserialized.System.Globalization.CultureInfo</TypeName> <TypeName>System.Globalization.CultureInfo</TypeName> </ViewSelectedBy> <TableControl> <TableHeaders> <TableColumnHeader><Width>16</Width></TableColumnHeader> <TableColumnHeader/> </TableHeaders> <TableRowEntries><TableRowEntry><TableColumnItems> <TableColumnItem><PropertyName>LCID</PropertyName></TableColumnItem>

Page 59: Extensible Provider for Windows PowerShell - ExtBrain

59

<TableColumnItem><PropertyName>Name</PropertyName></TableColumnItem> </TableColumnItems></TableRowEntry></TableRowEntries> </TableControl> </View>

Figure 51 Format file

Debugging Snap-In

Cmdlets and providers within the Snap-In can be debugged from the Visual Studio environment. However, the breakpoints are sometimes not hit in Visual Studio 2010 in combination with an x64 Windows OS and a few steps have to be done to make it work.

The standard way of configuring the project in Visual Studio for debugging includes setting these fields in the project properties window:

1. Pre-build events

In order to avoid a conflict with the old Snap-In, it is best to uninstall it before the new registration is done. This is achieved by calling the installutil.exe tool with the /u option and supplying the path to the old DLL version of the Snap-In.

2. Post-build events

The new version of the assembly containing the Snap-In has to be registered by calling the installutil.exe tool and providing it with the path to the assembly.

3. Start external program

This field will contain the path to powershell.exe.

4. Command line arguments

The registration is not enough and the Snap-In has to be added to the running shell. This is done by calling the Add-PSSnapIn cmdlet. Powershell.exe makes it possible to specify a file with a script that should be invoked when the shell starts. This is done by providing the -File option with the file path. It is important to include the -NoExit parameter. Without this parameter PowerShell would end immediately after running the startup commands. More about PowerShell command line arguments can be found at [23]. An example configuration is sketched right after this list.
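The resulting configuration might look as follows (all paths are illustrative; $(TargetPath) is the Visual Studio macro for the built assembly):

Pre-build event:         installutil.exe /u "$(TargetPath)"
Post-build event:        installutil.exe "$(TargetPath)"
Start external program:  C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Command line arguments:  -NoExit -File "D:\Temp\Extbrain\startup.ps1"

where startup.ps1 contains the single command Add-PsSnapIn Extbrain.Shell.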

When the problem with breakpoints not being hit appears on an x64 system, this workaround should work:

1. Add an existing project to the solution explorer and select powershell.exe

2. Set the newly added project as the startup project

3. Open the project properties and set the "Debugger Type" field to "Managed (v2.0, v1.1, v1.0)"

4. Set the same command line arguments as were used in the standard procedure

3.3.2 FileSystem provider implementation

The provider implementation is located in the ExtbrainFileSystemProvider class. The implementation is straightforward since in most cases it only calls the data module. But let us mention several things that might serve for better orientation when implementing a new provider.

Provider capabilities

One thing that the provider must specify is what capabilities it will support. The capabilities are declared via the CmdletProvider attribute which must be placed before the provider class declaration. It is very important to include them because they influence the way in which PowerShell works with the provider. More information about provider capabilities is available at [24]. So far, the provider in this thesis implements the following two capabilities (a sketch of the resulting declaration follows the list):


1. ExpandWildcards

This capability declares the ability to handle wildcards within a provider internal path. The PowerShell runtime performs this operation itself if the provider does not supply this capability. Since the paths used with the data module define custom wildcard combinations, it is necessary to include this one.

2. ShouldProcess

Support of the ShouldProcess capability means that the provider calls the ShouldProcess method before any modification to the data is done. PowerShell then prompts the user to determine whether to continue with an operation that modifies one or more items in the provider's store.
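The resulting provider declaration might be sketched as follows (the chosen base class is an assumption since PowerShell offers several provider base classes):

[CmdletProvider("ExtbrainFileSystem",
    ProviderCapabilities.ExpandWildcards | ProviderCapabilities.ShouldProcess)]
public class ExtbrainFileSystemProvider : NavigationCmdletProvider
{
    // methods such as GetItem, ExpandPath, GetParentPath etc.
    // delegate to the data module
}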

PowerShell calls

What might be a little confusing is that the provider is instantiated every time a new command is invoked. That is probably because PowerShell does not want to keep too many resources in memory for too long. The order in which PowerShell calls the provider methods differs with each used command. For instance, the Get-Item command with the D:\Temp\*.xml parameter results in this order of method calls:

1. GetParentPath and GetChildName methods for D:\Temp\*.xml

2. ExpandPath to get matching items

3. GetParentPath and GetChildName methods for the path of each matched item

4. GetItem for each matched item

The odd thing about this is that PowerShell invokes these methods multiple times, often with the same parameters. This is noticeable particularly on the GetParentPath and GetChildName methods which are invoked for each path segment. The reason for this is probably that PowerShell tries to figure out what the structure of the storage is.

3.4 Reading process

Another important part of the thesis deals with the reading process and with the selection of the readers. During the process several readers are used in combination with reader provider classes. Let us discuss them a little.

3.4.1 Readers architecture

Readers interfaces

As was mentioned in the analysis part, the readers must be divided into two categories. These two categories are represented by two reader interfaces – IRootFileReader and IContentFileReader. Root readers are used for the expansion of the leading path part and after that, the content readers are employed. The division of the readers included in this thesis into these two categories is shown in Diagram 14.


Diagram 14 Division of readers

Data transferring between readers

Each reader defines one or multiple products that implement the interface called IReaderItem. This is the common interface for objects which carry data of one path part and enable their reading and manipulation. However, these items are not those that are transferred between readers during the reading process. They are wrapped into another class called ExpandInfo. That is because some properties serve only the purposes of the reading process (like the unprocessed path which is needed only during the expansion) and they were extracted out of the IReaderItem interface scope. The structure of the IExpandInfo objects that are passed between the readers is shown in Diagram 15. As can be noticed, the ExpandRoot and ExpandItem methods return the IEnumerable<IExpandInfo> type, thus readers can yield matched items lazily.

Diagram 15 IExpandInfo interface

The reading methods of each reader item are added as explicit IReaderItem<T> interface implementations where the T generic parameter determines the type of object returned from the reading method. Figure 52 shows the reading methods registration in the XmlReaderItem class.


class XmlReaderItem : IReaderItem<XDocument>, IReaderItem<XElement>,
    IReaderItem<XPathNodeIterator>
{
    XDocument IReaderItem<XDocument>.Read() { … }
    XElement IReaderItem<XElement>.Read() { … }
    XPathNodeIterator IReaderItem<XPathNodeIterator>.Read() { … }
}

Figure 52 Reading methods registration

3.4.2 Readers

Let us briefly look at the implementations of each reader included in this thesis. The following sections do not describe all their methods but only the main concepts.

FileSystemRootReader

Since the capabilities of files and directories differ on the file system, this reader defines two products – FileRootReaderItem and DirectoryRootReaderItem. These are wrapped into the ExpandInfo class during the reading process. The reader includes a recursive algorithm that expands the path pattern. It works in the following manner: Each iteration cuts off one path part and lists all child names that can possibly match the current path part. It is important to load only top level child items when no recursive wildcard appears in the path. Otherwise, it loads the names of items from all levels. When the names are loaded, it uses the FileSystemMatcher class to determine whether the path matched the pattern and how. When an item succeeds in matching, it is wrapped together with a new unprocessed and matched path and sent to the next iteration.

SevenZipFileReader

The SevenZip file reader enables navigation through archives supported by the SevenZip library. It also defines two products – one for the file entries and one for the directory entries. The reader itself does not perform any extraction. It reads only archive headers to get the list of the items' paths and matches them. The SevenZip reader uses the same matcher as the file system reader – FileSystemMatcher. However, it always matches the whole path of an entry rather than part by part. The reason for this is that the archive entries do not contain any reference to their child items. Entry A is a child of entry B if the path to B is a prefix of the path to A. However, this information is not explicitly stored within the archive file.

The problem with archive sharing during the reading process is solved via the ArchiveProvider and ArchiveWrapper classes. The reader items sent to the output carry a reference to the ArchiveProvider class which controls the opening and closing of one archive. Whenever an item needs the parent archive, it asks the provider. The provider returns an ArchiveWrapper object through which it watches the operations performed with the archive.

XmlFileReader

The xml reader enables navigation and operations in xml files. However, it does not use XPath for this. It uses simplified navigation that is similar to the one used on the file system. That is because writing a result to a path determined by XPath is complicated and the aim was to keep the consistency of the paths used both in the reader and the writer. However, users still have the chance to run XPath scripts as the xml reader items make it possible to load an XPathNavigator of the parent xml document and send it to the output. In order to be able to identify an element at the path unambiguously, a suffix with its position is added when multiple elements with the same name appear at the same level. Sample paths are listed in Figure 53.


data.xml\root\item_0
data.xml\root\item_1
data.xml\root\item_2

Figure 53 XmlFileReader sample paths

BinaryFileReader & TextFileReader

These two readers do not support navigation. They are used only to improve the reading capabilities of the provider, so that it can return lines or words from text files etc. They are used mainly during the conversion process when the user requests a type which is not supported by the reader which expanded the item.

3.4.3 Conversion & Expansion

The expansion and conversion processes use two interfaces, IReaderProvider and IConvertReaderProvider, that define methods through which the suitable readers are selected (shown in Diagram 16). Their implementations are called ReaderProvider and ConvertReaderProvider. They contain dictionaries that map readers to file types or to the types that their products can create.

The IConvertReaderProvider also defines a method which selects the default reader for a particular file type. The reason for that is the following: When the user invokes the Get-Content cmdlet in PowerShell, the data module must return some data. The original file system provider always reads the file as a string and sends it to the pipe. However, the intention here is to select the most suitable data type for the specified path. That is why the implementation of IConvertReaderProvider decides what objects the user will get in the output.

Diagram 16 Conversion and expansion interfaces

3.5 Writing process

Another module presented in this work is focused on the writing process and the selection of writers. The following chapters discuss the classes and interfaces related to this part.

3.5.1 Writers architecture

The writers are also separated into two categories – root writers and content writers (the IRootFileWriter and IContentFileWriter interfaces). The content writers are intended to produce new data while the root writers write them to the particular storage. Unlike in the reading part, the products of the root and content writers slightly differ. That is because some path parts can also point to an item which already exists, and the data of the original item must be sent for the update operation. The root writer items do not need any data. As a consequence, there are two writer item interfaces designed as shown in Diagram 17. They are called IRootWriterItem and IInnerWriterItem. As can be noticed, the writer items have a recursive data structure as one writer item can contain other nested items.


Diagram 17 Writer and WriterItem interfaces

The writing process works in the following way: Firstly, the FileTool class is used – particularly its Write method. The parameters of this method are the target path and the list of items carrying the input data. These data items may come directly from the user (e.g. when the set-content or add-content cmdlet is called) or be loaded with the readers (e.g. when copy-item is called). The FileTool class then calls the WriteObjectBuilder which analyses the target path and divides it into segments. Then, a suitable writer is selected for each path segment and its Write method is called. The first writer creates its product which is combined together with the products of the other writers into one write object. This object consists of one root writer item (IRootWriterItem) which can optionally contain several nested inner writer items (IInnerWriterItem). The write object is then passed back to the FileTool which starts the writing process by calling the Write method on the obtained object.

This thesis includes several writer implementations. They support creating and updating archive files, xml files and text files. Their division into the two mentioned categories is shown in Diagram 18.

Diagram 18 Writers

3.5.2 Writers

The implementation of the writers is quite straightforward. However, let us highlight several important points that influenced the final design of some writers.

SevenZipFileWriter

Unexpected obstacles were caused by the append and rewrite operations of colliding items in the SevenZipFileWriter. That is because there are always two ways the items can collide: (1) with items existing within the storage or (2) with other input items. Let us show it


on the example shown in Figure 54. It assumes three colliding items – two input items called file.dat and one item with the same name present in the archive. When the writer is switched to the append mode (by the writer policy which is discussed later), the resulting entry within the archive must contain the data of all three colliding items. In the case of the rewrite mode, the final item must contain only the data of the last input item. The problem is that one update process on the SevenZip archive object can include only one change of a particular entry. Thus, it is not possible to update an item with the first input entry and then with the second one in one update process. And running a new update process for each input item would be inefficient.

Figure 54 Collisions inside archive

As a result, the implementation first iterates through the input items and records the intended operations and item data. When all input items are processed, it is capable of recognizing which items were colliding and forms the final data for the update. The writing itself is done via one update process after all results are prepared.

XmlFileWriter

The problem that came up when implementing the XmlFileWriter is that an xml document cannot contain some special characters. This appeared especially when writing binary data to an element's content. Therefore, the writer uses a simple heuristic for text content recognition. It explores the character distribution and decides whether the file contains text. If not, it converts its stream into the Base64 format [25] and then writes it to the element content. This is also done when the writing of a text file fails. Future work presumes improvements of this heuristic since there are many other options how text content can be recognized. An additional fact is that the writer uses the same paths as those mentioned with the XmlFileReader, thus it can be used in a similar way as the file system writer.

3.5.3 Input items for writing

The implementation of the writing part also requires a decision about the type of items which will carry the data of input items. Since the reading module returns IReaderItem objects, these could possibly be used for writing as well. However, the IReaderItem interface defines additional methods for delete, rename and other operations. These methods are not needed by writers, as they need only a method for reading. Moreover, clients who would like to use the writing part separately would have to implement their own reader item and wrap their data into it. Therefore, the IDataItem interface is introduced, which defines only those members that are actually needed by the writers. Data of input items are not read until the writers explicitly need them, and the writers can select the most suitable data type. For this purpose, the IDataItem interface defines the generic Read<T> method that provides data of the required type. Results of the reading process implement the IDataItem interface as well, so they can be passed to the writing process directly.
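A minimal sketch of the interface, reconstructed from this description, might look as follows; the exact member set is given by the source code.

public interface IDataItem
{
    // data are not read until a writer explicitly asks for them; the writer
    // selects the most suitable representation via the type parameter
    T Read<T>();
}

// a writer choosing its preferred representation (illustrative):
// Stream data = item.Read<Stream>();
// string text = item.Read<string>();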


3.5.4 Collisions

Every writer must account for name collisions when writing multiple items. The intention was to separate the writer logic from the part that decides how the writer behaves when a collision occurs. Therefore, the IWritePolicy interface is proposed. Every writer which works with paths and which may encounter the collision problem defines a public property of the IWritePolicy type through which the writer behaviour can be set. The interface defines one method called Resolve. This method takes the colliding path and a reference to a method through which the uniqueness of a path can be tested. Thus, the policy class may suggest a new item name and check it immediately. The result of the method is of the WritePolicyResult type, which carries the final item path (possibly renamed) and a flag determining whether the item at the path (if any) should be updated or rewritten. Diagram 19 shows the structure of the policy classes.

Diagram 19 WritePolicy interface
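To make the contract concrete, the following fragment sketches the policy types described above. The exact signatures are defined by Diagram 19 and the source code, so the delegate and field names used here should be treated as illustrative.

// illustrative shapes only; the exact signatures are given by Diagram 19
public delegate bool PathUniquenessTest(string path);

public struct WritePolicyResult
{
    public string FinalPath;   // final (possibly renamed) item path
    public bool Update;        // true = update the colliding item, false = rewrite it
}

public interface IWritePolicy
{
    WritePolicyResult Resolve(string collidingPath, PathUniquenessTest isUnique);
}

// a hypothetical renaming policy could, for example, probe "file.txt",
// "file (1).txt", "file (2).txt", ... until isUnique returns true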

3.6 Stream tools

The problems with stream sharing and stream closing are solved via two main classes – StreamProvider and StreamPipe. The first one solves the problem with stream sharing, the second one the problems with stream closing.

StreamProvider

Since the data of one item can be read at the same time from multiple locations, access to these data must be controlled. The proposed solution uses the StreamProvider class in combination with a custom implementation of Stream called ForwardingStream. StreamProvider contains the public method Get returning a ForwardingStream object whose only responsibility is to pass all calls of Read, Dispose and other stream methods to the StreamProvider so that it can control access to the data. Each instance of the ForwardingStream obtains an identifier from the StreamProvider so that the provider can recognize it and monitor its position within the stream. When the Read method is called on the ForwardingStream object, it asks the StreamProvider for data. The provider determines the caller's position, performs seeking and reads the data, which are then passed through the ForwardingStream to the client. The structure of the StreamProvider and ForwardingStream is shown in Diagram 20.


Diagram 20 StreamProvider

In order to decide when to close the base stream, the StreamProvider must know how many streams are still alive (not disposed). This is done by reference counting, which is performed in the Get and Dispose methods. Another fact is that the provider has to be capable of reopening the stream even in the case when no reading has been performed for a while. Therefore, the StreamProvider takes a function for stream retrieval rather than taking the stream itself. It can be noticed that the ForwardingStream does not hold any reference to the StreamProvider. That is because the StreamProvider registers its methods within the ForwardingStream during object creation, so the ForwardingStream holds references to these methods directly.
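The cooperation of the two classes can be summarized by the following condensed sketch. It forwards only the Read, Write and Dispose calls, omits thread safety and protection against repeated disposal, and all signatures are reconstructed from the description above rather than taken from the actual sources.

using System;
using System.Collections.Generic;
using System.IO;

// forwards only the registered delegates; unsupported members throw
public class ForwardingStream : Stream
{
    private readonly Func<byte[], int, int, int> read;   // may be null
    private readonly Action<byte[], int, int> write;     // may be null
    private readonly Action dispose;

    public ForwardingStream(Func<byte[], int, int, int> read,
                            Action<byte[], int, int> write, Action dispose)
    { this.read = read; this.write = write; this.dispose = dispose; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (read == null) throw new NotSupportedException();
        return read(buffer, offset, count);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (write == null) throw new NotSupportedException();
        write(buffer, offset, count);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing && dispose != null) dispose();
        base.Dispose(disposing);
    }

    public override bool CanRead { get { return read != null; } }
    public override bool CanWrite { get { return write != null; } }
    public override bool CanSeek { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin)
    { throw new NotSupportedException(); }
    public override void SetLength(long value)
    { throw new NotSupportedException(); }
}

public class StreamProvider
{
    private readonly Func<Stream> open;   // retrieval function allows reopening
    private readonly Dictionary<int, long> positions = new Dictionary<int, long>();
    private Stream baseStream;
    private int nextId, alive;

    public StreamProvider(Func<Stream> open) { this.open = open; }

    public Stream Get()
    {
        int id = nextId++;
        positions[id] = 0;
        alive++;                          // reference counting in Get...
        return new ForwardingStream(
            (buffer, offset, count) => Read(id, buffer, offset, count),
            null,
            () => Release(id));
    }

    private int Read(int id, byte[] buffer, int offset, int count)
    {
        if (baseStream == null)
            baseStream = open();          // (re)open on demand
        baseStream.Seek(positions[id], SeekOrigin.Begin);
        int read = baseStream.Read(buffer, offset, count);
        positions[id] += read;            // remember this client's position
        return read;
    }

    private void Release(int id)
    {
        positions.Remove(id);
        if (--alive == 0 && baseStream != null)   // ...and in Dispose
        {
            baseStream.Dispose();
            baseStream = null;
        }
    }
}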

StreamPipe

StreamPipe is the class that prevents premature stream closing. Its idea is simple: the pipe provides access to its input (via the GetInput method), through which the pipe can be filled with data, and to its output (via the GetOutput method), through which the data can be obtained back from the pipe. Both of these methods return a custom stream implementation which delegates its method calls to the pipe. For this purpose, the ForwardingStream class is reused. Since the current implementation is single-threaded, the pipe always checks whether the writing has finished before reading is done. When the input stream is closed, the pipe does not dispose its underlying data but waits until the reading from the pipe's output is completed.
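A simplified single-threaded sketch of the pipe, reusing the three-delegate ForwardingStream shape from the previous sketch, might look as follows; again, this is an illustration of the idea rather than the actual implementation.

using System;
using System.IO;

public class StreamPipe
{
    private readonly MemoryStream buffer = new MemoryStream();
    private bool writingFinished;
    private long readPosition;

    public Stream GetInput()
    {
        // closing the input only marks the end of writing;
        // the underlying data are NOT disposed yet
        return new ForwardingStream(
            null,
            (b, o, c) => buffer.Write(b, o, c),
            () => writingFinished = true);
    }

    public Stream GetOutput()
    {
        return new ForwardingStream(
            (b, o, c) =>
            {
                // single-threaded: reading is valid only after writing finished
                if (!writingFinished)
                    throw new InvalidOperationException("writing not finished");
                buffer.Position = readPosition;
                int read = buffer.Read(b, o, c);
                readPosition += read;
                return read;
            },
            null,
            () => buffer.Dispose());   // data released when reading completes
    }
}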


4 USER DOCUMENTATION

4.1 SevenZip library usage

The C# wrapper of the SevenZip library which is included in this thesis can be used separately from PowerShell or the data module. It is designed for SevenZip version 9.20, but compatibility with other versions is presumed. The implementation always searches for a newer version of the SevenZip library DLL at its standard installation path (C:\Program Files\7-zip\7z.dll). However, developers do not have to install the SevenZip library, as they can simply copy the 7z_x32.dll or 7z_x64.dll file (according to the OS architecture) from this project to the folder of the program.

The classes of the SevenZip C# API are contained within FrameworkShell.dll in the Framework.Shell.SevenZip namespace. However, it uses several common methods from the Framework.Core project and therefore the FrameworkCore.dll file must also be copied to the program folder. The key class is called SevenZip and it contains three factory methods that create an archive object – they are called Read, Create and Open. The first one is used for extraction and for reading properties, the second one to create a new archive file, and the third to update an existing archive. This thesis also includes a sample application called SevenZipTestApp that shows the library usage in more detail. Nevertheless, let us summarize the main use cases of the library.

Extraction

The entry point for data extraction is the SevenZip.Read factory method. It creates an instance of IReadArchive, which represents an archive file. The archive object contains the Entries property (an array of IReadEntry). Both the archive object and each entry contain a public property called Properties through which additional information about the archive or entry can be obtained. The invocation of the Extract method of a particular entry starts the extraction process. Figure 55 shows reading the properties and extracting the first entry in the archive. Some files can be encrypted, and the figure also shows how the password can be set.

using (var archive = SevenZip.Read(
    File.OpenRead(@"d:\Temp\encryptedFile.zip"), ArchiveFormat.Zip))
{
    archive.Password = "heslo";

    // reading archive properties
    foreach (var property in archive.Properties) { ... }

    // reading entry properties
    foreach (var entry in archive.Entries)
        foreach (var property in entry.Properties) { ... }

    // extraction
    archive.Entries[0].Extract(File.Create(@"d:\Temp\Output\file.txt"));
}

Figure 55 Archive reading

It is necessary to keep in mind that the archive object obtained from any factory method must be disposed when the reading or updating is finished. That is because it must release the COM object obtained from the native library and release the archive stream.


Compression & Updates

For data compression it is necessary to create an instance of IWriteArchive. This is done via the SevenZip.Create or SevenZip.Open method. These two differ in the number of parameters they take. While the Create method needs only one output stream, the Open method updates an existing archive and thus must also obtain a Stream of the original archive file. New entries are added through the method called AddEntry defined on the IWriteArchive interface. This interface also exposes the public property with entries; however, their type is not IReadEntry but IEntry. This interface inherits from the previous one and adds methods for delete and update operations. They are called Delete, Move, Update, UpdateData and UpdateProperties. A sample update is shown in Figure 56.

using (var archive = SevenZip.Open<SevenZip.SevenZ>(
    File.OpenRead(@"d:\Temp\TestFiles\File.7z"),
    File.Create(@"d:\Temp\TestFiles\Output\out2.7z")))
{
    archive.AddEntry(@"d:\Temp\TestFiles\Files\a.txt", "new.txt");

    // Delete can precede Extract since Delete is performed during disposal
    archive.Entries[3].Delete();
    archive.Entries[3].Extract(File.Create(@"d:\Temp\TestFiles\Output\b.txt"));
    archive.Entries[2].Update(@"d:\Temp\TestFiles\Files\c.txt");
}

Figure 56 Archive update

It is important to note that all update operations are postponed until the archive disposal. Therefore, invocations of the Extract method can follow deletes and updates which change the archive structure, since the archive remains in its initial state until the dispose time.

Setting compression properties

Sometimes it might be useful to compress files with a different compression ratio or to specify different properties of the compression process. This functionality is enabled through the methods SetCompressOption and SetCompressOptions. The first one serves advanced users who are aware of the option names and types (described in the SevenZip documentation). An easier way is to use the second method and pass one of the prepared option sets to the SetCompressOptions method. This is shown in Figure 57.

using (var archive = SevenZip.Create(File.Create(@"d:\Temp\out.7z"),
    ArchiveFormat.SevenZip))
{
    archive.Password = "heslo";
    archive.SetCompressOptions(new SevenZip.SevenZipCompressOptions
    {
        CompressHeaders = true,
        EncryptHeaders = true,
        ThreadsCount = 3
    });

    // "x" represents compression level
    archive.SetCompressOption("x", (uint)3);

    archive.AddEntry(@"d:\Temp\TestFiles\Files\file.doc", "file.doc");
    archive.AddEntry(@"d:\Temp\TestFiles\Files\a.txt", "a.txt");
}

Figure 57 Sample with compress options


4.2 Provider usage

Installation

The provider is compatible with PowerShell version 2.0. The library that contains the Snap-In installer class and the provider is located in the FrameworkShell.dll file. The Snap-In must be installed first and then loaded into the PowerShell runspace. To accomplish this, the user must follow several steps. The thesis also includes a script which the user can utilize. Nevertheless, let us explain what must be done:

1. Register the Snap-In library

Firstly, it is necessary to launch the .NET framework installer tool called Installutil.exe [26] with the path to the FrameworkShell.dll. Since this library uses functionality from the FrameworkCore project, the FrameworkCore.dll file must be placed into the same directory where the FrameworkShell.dll is located. This step only creates registration information for the Snap-In so that it can later be loaded into the PowerShell runspace. In order to see whether the registration completed successfully, the user can invoke the Get-PSSnapin cmdlet with the -registered parameter. This is shown in Figure 58.

c:\Windows\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe Framework.Shell.dll
Get-PSSnapin -registered

Figure 58 Snap-In registration

2. Remove colliding providers

PowerShell does not allow registering multiple providers that control access to the same stores. Therefore, the default file system provider must be unregistered, because system drives cannot be controlled by two providers. The script in Figure 59 shows how this can be done. It loads all available drives and removes those where the default file system provider is used. The registration of new drives runs automatically when the new Snap-In with the provider is added (discussed in the following step).

Get-PSDrive | Where-Object { $_.Provider.Name -eq "FileSystem" } | Remove-PSDrive -Force

Figure 59 Unregistering default FileSystemProvider

3. Register the new Snap-In with the provider

The Snap-In is loaded into the runspace via the Add-PSSnapin cmdlet, providing it with the Snap-In's name (in this thesis called Extbrain.Shell). This is shown in Figure 60. Together with the Snap-In, PowerShell also loads the format file that describes how the objects written to the pipe will be displayed. Therefore, the format file (called ExtBrain.formats.ps1xml) must be placed in the folder from which the Snap-In was registered. If the user does not have SevenZip installed on the computer, the folder should also contain the 7z_x86.dll (or 7z_x64.dll) file. That is because the provider uses the SevenZip wrapper, which searches for the native SevenZip library.

Add-PSSnapin Extbrain.Shell

Figure 60 Loading Snap-In to the runspace


PowerShell starts to use the provider when the user switches to one of its drives. Therefore, the Set-Location cmdlet should be invoked first. After that, the provider is ready to use.

Usage

The provider supports drive, item, container, navigation as well as content cmdlets. Moreover, these cmdlets can also be used inside archive and xml files. The provider also improves the content cmdlets for binary, xml and text files. Let us see several samples of usage and demonstrate the implemented improvements.

Item cmdlets

The provider supports all item-oriented cmdlets, which include Get-Item, Clear-Item and Invoke-Item. However, in addition to a standard provider, the input paths used with these cmdlets can point at a location inside an xml or archive file, and they can be combined together. All these paths can contain additional recursive wildcards, so the user can search even inside files that are supported by readers. Figure 61 shows samples which demonstrate that.

# loads all elements inside the root element of file.xml which is contained in
# the archive file
PS D:\> Get-Item D:\Temp\archive.zip\Folder\file.xml\root\*

# searches for all text files within zip archives (with the name archive.zip)
# that are located in any subfolders of the Temp folder
PS D:\> Get-Item D:\Temp\**.\archive.zip\*.txt

# invokes the default action for the text.txt file contained within two archives
PS D:\> Invoke-Item D:\Temp\archive.7z\Folder\innerArchive.zip\text.txt

# clears an item contained within an archive
PS D:\> Clear-Item D:\Temp\archive.zip\file.txt

Figure 61 Item cmdlets demonstration

Container cmdlets

Copy-Item, Get-ChildItem, New-Item, Remove-Item and Rename-Item form the container group of cmdlets. All of these are supported by the provider. Thus, the user can copy a file into a new or existing archive, rename or remove archive entries or xml elements, etc. More illustrative are the samples shown in Figure 62.

# gets the top-level entries contained within archive.7z
PS D:\> Get-ChildItem D:\Temp\archive.7z

# creates a new archive with a text file that will contain an element from file.xml
PS D:\> Copy-Item D:\Temp\file.xml\root\item D:\Temp\newArchive.7z\file.txt

# copies all country elements from file.xml to a text file
PS D:\> Copy-Item D:\Temp\file.xml\**\country D:\Temp\Out\file.txt

# creates a text file inside two archives
PS D:\> New-Item D:\Temp\archive.7z\Folder\innerArchive.zip\text.txt -value "This is file content"

# renames the root element within file.xml
PS D:\> Rename-Item D:\Temp\file.xml\root newRootName

Figure 62 Container cmdlets demonstration


Navigation cmdlets

Since the provider supports navigation cmdlets, the user can go through the file system and additionally through the content of archive and xml files. The navigation cmdlets are Set-Location, Get-Location and Move-Item. Samples of their usage are shown in Figure 63.

# sets the current location to a position inside an archive or xml file
# (the first sample uses an alias cmdlet name)
PS D:\> cd D:\Temp\archive.7z\Folder
PS D:\> Set-Location D:\Temp\file.xml\root\

# sets the location to a position inside an archive and then lists all subelements
PS D:\> Set-Location D:\Temp\archive.7z\Folder\
PS D:\> ls

# moves a text file from the archive to the folder with the name Out
PS D:\> Move-Item D:\Temp\archive.7z\Folder\file.txt D:\Temp\Out

Figure 63 Navigation cmdlets demonstration

All items are regarded as containers from the provider's view, because there can be registered readers and writers that can access their inner structure. Therefore, move as well as copy operations always try to create the item inside the target path, and thus they combine the target path with the path of the moved or copied item. Let us explain this on the example shown in Figure 64. Here, the provider will move the item into the archive.7z file rather than create a text file with the name archive.7z. Thus, the final item's path will be D:\Temp\archive.7z\file.txt and not D:\Temp\archive.7z.

PS D:\> Move-Item D:\Temp\file.txt D:\Temp\archive.7z

Figure 64 Sample of move operation

Content cmdlets

A very useful group of cmdlets are the content cmdlets. These include Get-Content, Set-Content, Add-Content and Clear-Content. Users can manipulate the content of xml, text and binary files. The provider adds an additional parameter to Get-Content called resultType, through which users can specify the type of object written to the output. These cmdlets can be handily combined together, as shown in Figure 65.

# gets the content of a text file contained within an archive
PS D:\> Get-Content D:\Temp\archive.7z\file.txt

# loads an XElement from the xml file
PS D:\> Get-Content D:\Temp\file.xml\root\item -resultType XElement

# reads the lines of file.csv
PS D:\> Get-Content file.csv -resultType IEnumerable<string>

# sets the content of a text file inside an archive or the content of an element
PS D:\> Set-Content D:\Temp\file.xml\root\item "Element content"

# fills the main element with elements from file.xml
PS D:\> Get-Content D:\Temp\file.xml\root\* | Set-Content D:\Temp\data.xml\main

# appends elements to the item element within the data.xml file
PS D:\> Get-Content D:\Temp\file.xml\root\* | Add-Content D:\data.xml\main\item

# appends the content of text files to the data.txt file
PS D:\> Get-Content D:\Temp\*.txt | Add-Content D:\Out\data.txt

# clears the content of a text file
PS D:\> Clear-Content D:\Temp\archive.7z\Folder\file.txt

Figure 65 Content cmdlets demonstration

4.3 Future extensions

This thesis is based on the assumption that it will be extended in the future. Therefore, most parts are designed to be replaceable or to allow future addition of new features. Let us see which parts can be improved or replaced.

New reader implementation

So far the provider supports accessing data inside archive files and xml files, and improves reading of text files. In addition to these, developers may add custom readers which make different storages accessible or enable navigation through the content of files that are not currently supported. These steps must be followed in order to add a new reader:

Implement IRootFileReader or IContentFileReader interface

Firstly, developers must decide what type of reader they will add. As mentioned before, root readers are intended to load data from a particular storage (like FTP, a database, etc.), while content readers enable access to the inner structure of an already loaded file. Both the IRootFileReader and IContentFileReader interfaces contain only one method. Their signatures are shown in Figure 66. Root readers obtain the path they should expand, while content readers obtain a more complex structure called ExpandInfo. This structure includes the path part that has already been processed, the inner path that should be expanded by the current reader, and a product (an IReaderItem implementation) of the previous reader. This product provides reading methods through which the previous path part can be loaded.

IEnumerable<IExpandInfo> ExpandRoot(string path);            // root readers
IEnumerable<IExpandInfo> ExpandItem(IExpandInfo expandInfo); // content readers

Figure 66 Readers’ methods

Reader implementations are expected to follow these steps:

1. Check the correctness of the obtained path/obtained data
2. Start the process that searches for matching items
3. When a matching item is found, wrap information about it into a custom IReaderItem implementation
4. Fill the ExpandInfo structure with the IReaderItem object and other properties that are necessary for the next expansion
5. Send it to the output

The readers may return an empty list of items if they do not know how to expand the path or parse the obtained data. Otherwise, they should return one ExpandInfo object for each matched item and fill its properties accordingly.
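The following toy content reader illustrates these steps. All types below are simplified stand-ins reconstructed from the description above – the real IExpandInfo, IReaderItem and IContentFileReader interfaces are richer – so every signature here should be treated as an assumption.

using System.Collections.Generic;

public interface IReaderItem
{
    T Read<T>();
}

public class ExpandInfo
{
    public string ProcessedPath;   // path part that has already been processed
    public string InnerPath;       // path part the current reader should expand
    public IReaderItem Item;       // product of the previous reader
}

public class LineReaderItem : IReaderItem
{
    private readonly string line;
    public LineReaderItem(string line) { this.line = line; }
    public T Read<T>() { return (T)(object)line; }   // toy: supports string only
}

// toy content reader: exposes the line of the parent's text content whose
// (1-based) number equals the inner path, e.g. "...\file.txt\2"
public class LineContentReader
{
    public IEnumerable<ExpandInfo> ExpandItem(ExpandInfo info)
    {
        string text = info.Item.Read<string>();    // 1. check the obtained data
        if (text == null)
            yield break;                           // empty list = cannot expand

        string[] lines = text.Split('\n');         // 2. search for matching items
        int number;
        if (!int.TryParse(info.InnerPath, out number)
            || number < 1 || number > lines.Length)
            yield break;

        yield return new ExpandInfo                // 3.-5. wrap, fill and output
        {
            ProcessedPath = info.ProcessedPath + "\\" + info.InnerPath,
            InnerPath = null,
            Item = new LineReaderItem(lines[number - 1])
        };
    }
}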

Implement the IReaderItem interface

The ExpandInfo structure returned from the reader's expand method must be filled with a reader product – an implementation of IReaderItem. IReaderItem is the common interface for an object carrying the data of one path part. It must implement several properties and methods. These, together with their meanings, are shown in Table 4.


Properties    Meaning
Name          Name of the item that matched the path.
Path          Full path to the item.
WritePath     Specifies the path that should be used during the writing process.
              This path will be combined with the target path given by the user
              during copying.
IsContainer   Specifies whether the item is a container in the scope of the
              parent storage.
HasChildren   Specifies whether the item has children in the scope of the parent
              storage.

Methods       Meaning
Clear         Clears the content of the item.
Update        Updates the item content.
Delete        Deletes the item.
GetChildren   If the item is a container, this method must return its children
              according to the specified filter.
MakeRelative  Decides whether the target path of a move operation leads into the
              same storage from which the item was loaded. If so, the method must
              return a relative path within the storage (e.g. the xml reader
              returns the path that is relative to the parent xml file path).
              Otherwise, this method returns null.
Move          Moves the item to the path (result of the MakeRelative method) in
              the scope of the parent storage.
ReadDefault   Returns the default object (this object will be returned to
              provider users when no type is specified for the Get-Content
              cmdlet).

Table 4 Reader methods and properties description

Some of these methods are intended to change the item's underlying data. If a reader does not support data modification, its reader items must throw an exception in the corresponding methods. An important fact is that the reader items of content readers must call their parent items whenever any update operation is invoked. That is because the changes must be propagated through all path parts.

In addition to the members defined by the IReaderItem interface, the reader item must implement one or several reading methods that load the underlying data. For instance, the current xml reader supports loading XElement as well as string, which can be used by the following readers or by users. These reading methods must be added as explicit implementations of the IReaderItem<T> interface, where the T parameter specifies the type of object returned from the method.
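The pattern can be sketched as follows; the member lists are abbreviated and the name of the reading method is an assumption based on the description above.

using System.Xml.Linq;

// abbreviated stand-ins: the real IReaderItem contains the members of Table 4
public interface IReaderItem { }

public interface IReaderItem<T> : IReaderItem
{
    T Read();   // the method name is an assumption
}

// a reader item offering its data both as XElement and as string
public class XmlReaderItem : IReaderItem<XElement>, IReaderItem<string>
{
    private readonly XElement element;
    public XmlReaderItem(XElement element) { this.element = element; }

    // explicit implementations – one per supported result type
    XElement IReaderItem<XElement>.Read() { return element; }
    string IReaderItem<string>.Read() { return element.ToString(); }
}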

Register the reader in ReaderProvider and optionally in ConvertReaderProvider

The last step is to register the reader within the reader provider class and the convert reader provider class. ReaderProvider maps file types to readers, and ConvertReaderProvider maps result types to readers. It is necessary to add a new definition into these two in order to use the newly defined reader during the expansion and conversion processes.

New writer implementation

Similarly to the readers, new writers can also be added, and two types of them are available. Developers must select between the IRootFileWriter and IContentFileWriter interfaces. Root writers are intended to write data to different storages, while the intention of content writers is to create data with an appropriate structure.

Root writer

A new root writer must implement the IRootFileWriter interface with a Write method which is called during write object building. This method returns an IRootWriterItem, and the writer must provide a custom implementation of this item. It carries the information necessary for writing the path part to which the item relates. What data the item will contain is left to developers. Therefore, IRootWriterItem defines only two properties: InnerPath, which carries the path that the writer item represents, and InnerWriterItem (of type IInnerWriterItem), which holds a reference to the writer item produced by the following writer (if any). The inner writer item is filled during write object building by the WriteObjectBuilder class. The interfaces related to root writers are shown in Diagram 21.

Diagram 21 Root writer interfaces

The Write method of the IRootWriterItem launches the writing process and obtains the data in its parameter. The writer item is expected to perform these steps (a sketch follows the list):

1. Create the inner path that the item represents and prepare the location for data writing
2. Write the data
   a. If the InnerWriterItem is filled, the writer item must call its Write method and provide it with the obtained data items. The result of the method must then be written to the prepared location.
   b. If the InnerWriterItem is empty, the writer item must iterate through all the obtained data items and write them to the prepared location.
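The sketch below illustrates these steps for a hypothetical file system root writer item. The IDataItem and IInnerWriterItem shapes are simplified stand-ins reconstructed from this chapter, not the actual interfaces.

using System.Collections.Generic;
using System.IO;

// simplified stand-ins; the real interfaces are richer
public interface IDataItem { T Read<T>(); }

public interface IInnerWriterItem
{
    // originalData is null when a new item is being created (see below)
    Stream Write(IEnumerable<IDataItem> dataItems, Stream originalData);
}

public class FileSystemRootWriterItem
{
    public string InnerPath;                  // path part this item represents
    public IInnerWriterItem InnerWriterItem;  // filled by the write object builder

    public void Write(IEnumerable<IDataItem> dataItems)
    {
        // 1. prepare the location for data writing
        Directory.CreateDirectory(Path.GetDirectoryName(InnerPath));

        using (Stream target = File.Create(InnerPath))
        {
            if (InnerWriterItem != null)
            {
                // 2a. delegate to the inner writer item, persist its result
                using (Stream result = InnerWriterItem.Write(dataItems, null))
                    result.CopyTo(target);
            }
            else
            {
                // 2b. no inner writer: write all data items directly
                foreach (IDataItem item in dataItems)
                    using (Stream data = item.Read<Stream>())
                        data.CopyTo(target);
            }
        }
    }
}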

Content writer

The implementation of content writers differs from that of root writers only in a few points. Their Write method returns IInnerWriterItem instead of IRootWriterItem, thus a slightly different writer product. This product interface contains a Write method that has one additional parameter: a Stream with the data from the parent writer item that should be updated. This is because some parts of the path may already exist, and the writer item must get the original data of an existing path part so that it can update them and pass them back to the parent. These original data are transferred right through this additional parameter. Therefore, the implementation of the Write method must distinguish two situations:

1. Update – when data from the parent are obtained
2. Create new – when data from the parent are empty (null)

Both of these situations must follow this rule: when the InnerWriterItem is present, the current writer item must provide it with the data obtained from the user and also with the data from the target path when an item at the target path exists. These steps must be followed when the parent provides the original data of the target path:

1. Read the original data
2. Ensure the target path and prepare the target location
3. Add the new data
4. Return the updated data to the parent

An important fact is that the writer item must close the data Stream obtained from the parent item even if it does not use it. That is because the parent item must recognize whether the original data can be overwritten.


The create operation is easier. The only task is to create a file with an appropriate structure (one that also contains the target path) and fill it with the new data. As a result, the create operation includes only these steps (both situations are sketched below):

1. Create the target path and prepare the target location
2. Add the new data
3. Return the new data to the parent
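Both situations are illustrated by the following hypothetical content writer item, which simply appends the new data; it reuses the IDataItem and IInnerWriterItem stand-ins from the previous sketch and omits the delegation to a possible inner writer item for brevity.

using System.Collections.Generic;
using System.IO;

public class AppendingContentWriterItem : IInnerWriterItem
{
    public Stream Write(IEnumerable<IDataItem> dataItems, Stream originalData)
    {
        var result = new MemoryStream();

        if (originalData != null)
        {
            // update: 1. read the original data...
            originalData.CopyTo(result);
            // ...and close the parent's stream (mandatory even if unused)
            originalData.Dispose();
        }
        // create: originalData is null, we start from an empty result

        // 2.-3. ensure the target location and add the new data
        foreach (IDataItem item in dataItems)
            using (Stream data = item.Read<Stream>())
                data.CopyTo(result);

        result.Position = 0;
        return result;   // 4. return the (updated) data to the parent
    }
}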

Writer registration

Analogously to readers, new writers must also be registered in order to be selectable during write object building. Therefore, it is necessary to add a new definition into the WriteObjectBuilder class. This class tests each path part and matches it against the patterns provided by the registered writers. When a pattern corresponds to the tested path part, the related writer is chosen.

ReaderProvider, ConvertReaderProvider, WriteObjectBuilder

These three objects influence the way in which the readers and writers are selected. So far, only one reader or writer has been registered for each file type. But new readers and writers will be added in the future. Then it may happen that multiple readers or writers will be able to process the same file type differently.

Therefore, when such a collision appears, the reader providers and the write object builder must follow some rules in order to select the most suitable reader or writer. Since the data module works only with interfaces, the reader providers and the write object builder can be easily substituted with custom implementations. Developers thus have the ability to implement a custom set of rules that influence the selection of readers and writers. Moreover, the future implementation presumes an extension that will enable the client to specify which reader or writer will be selected. But this extension will be reasonable only when more readers and writers are implemented.

Matchers

The current file system, SevenZip and xml readers use a matcher during the expansion of a path. They iterate through the items within the storage and ask the matcher whether a particular item's path corresponds to the pattern given by the user. The matcher is associated with the reader in the reader and convert providers. Its implementation can be substituted, and the matching rules can be changed this way.
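As an illustration of such a substitution, the fragment below shows a possible matcher contract and one replaceable implementation. The interface name and signature are assumptions, since the real matcher interface is defined in the source code.

using System.Text.RegularExpressions;

// hypothetical matcher contract
public interface IMatcher
{
    bool IsMatch(string itemPath, string pattern);
}

// a matcher that treats '*' as "any characters" and '?' as "one character"
public class WildcardMatcher : IMatcher
{
    public bool IsMatch(string itemPath, string pattern)
    {
        string regex = "^" + Regex.Escape(pattern)
            .Replace(@"\*", ".*")
            .Replace(@"\?", ".") + "$";
        return Regex.IsMatch(itemPath, regex, RegexOptions.IgnoreCase);
    }
}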


5 CONCLUSION

5.1 Evaluation

The objective of this thesis was to design and implement a modular PowerShell provider that allows developers to add adapters enabling manipulation of data contained within files or located in other storages. The secondary objective was to implement an adapter for archive files.

As a result, this thesis presents a provider that fulfils all the stated requirements. It uses a separate data module which allows registration of readers and writers through which the content of files can be accessed and changed. This data module can be used separately from the provider and hence is easily includable in other projects. Currently, the data module contains readers and writers for archive and xml files, so users can navigate through them and manipulate their content. In addition, it includes readers of text, xml and binary files that improve the loading of data into various .NET objects which can be directly written to the PowerShell pipe. The thesis also contains an implementation of a PowerShell Snap-In installer class, and consequently the provider can be easily loaded into the PowerShell runspace.

The adapter for archive files uses the SevenZip library. First, however, this library had to be adapted for usage from managed code, and a custom C# wrapper of the library is introduced. This wrapper allows extracting, compressing as well as updating all archive files supported by the native library. The native library is accessed via its COM API. However, since it differs from standard Windows COM, it was necessary to implement the whole layer of interop classes manually and map the native types precisely. Unfortunately, the native library is not sufficiently documented, and therefore this thesis also provides a detailed description of the SevenZip COM API.

5.2 Comparison with other products

So far, no similar extension to PowerShell has been found. Current extensions are mostly focused on adding new providers and cmdlets, and thus the idea of a universal provider is unique. Moreover, these extensions are often closely tied to PowerShell and cannot be used separately. Probably the most similar kind of software is called PowerShell Community Extensions (PSCX). Let us discuss it briefly.

PowerShell Community Extensions

PSCX is aimed at providing a widely useful set of additional cmdlets, providers, aliases, filters, functions and scripts for Windows PowerShell. These extensions are produced by a community of developers who wanted more cmdlets than Microsoft was able to deliver. The cmdlets and providers are not specialized in one particular area but allow working with Active Directory, the clipboard, the file system, .NET assemblies and others. However, it can be seen that the intention of this software is different, as it contains many separate cmdlets and providers without any attempt to put them together into one universal provider. More information about PSCX is available at [27].

5.3 Future visions

This thesis assumes that its development will continue in the future. Let us briefly look at some extensions that are currently planned.


New readers and writers

The future work presumes the implementation of additional readers and writers. The intention is to add support for accessing FTP as well as database storages. These two are currently the most needed; however, many others can be devised.

Multithreading support

The whole module for data reading and writing is currently single-threaded. That is noticeable especially when multiple readers are combined together. The nested reader always waits until the previous reader loads the parent data, although the readers could work simultaneously. The idea is to create a multithreaded pipe where one reader would fill data into the pipe's input and another reader could read from the pipe's output immediately. This would increase the speed when reading large files or when many readers are combined together.

Hotplug

Another thing that is currently missing is support for hotplug. The current provider does not recognize a newly connected flash drive, and PowerShell registers its own file system provider for it. The plan is to substitute that provider when a new drive appears.

FileManager application

The main part that will be built on this prepared framework is the file manager application. The intention is to integrate the universal provider into a desktop application which will take advantage not only of this provider but of the whole PowerShell infrastructure.


REFERENCES

1. ExtBrain. ExtBrain. [Online] http://extbrain.felk.cvut.cz/.
2. Kopczynski, Tyson, Handley, Pete and Shaw, Marco. Windows PowerShell Unleashed (2nd Edition). s.l. : Sams, 2009. ISBN: 978-0-672-32953-1.
3. Jones, Don and Hicks, Jeffery. Windows PowerShell 2.0. Napa : SAPIEN Technologies, Inc., 2006. ISBN: 978-0-9821314-2-8.
4. Microsoft. Windows PowerShell Core. TechNet. [Online]
5. Microsoft. PowerShell. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/ee809360.aspx.
6. PowerGUI.org. PowerGUI. PowerGUI home page. [Online] http://www.powergui.org/index.jspa.
7. Idera. PowerShell Plus. Idera. [Online] http://www.idera.com/products/powershell/powershell-plus/.
8. Kumaravel, Arul, et al. Professional Windows PowerShell Programming. Indianapolis : Wiley Publishing, Inc., 2008. ISBN: 978-0-470-17393-0.
9. Microsoft. Windows PowerShell Provider Concepts. Web MSDN Library. [Online]
10. SharpZipLib. SharpZipLib. SharpDevelop. [Online] http://www.sharpdevelop.net/OpenSource/SharpZipLib/.
11. Pavlov, Igor. SevenZip. SevenZip. [Online] http://www.7-zip.org/.
12. Wikipedia. SevenZip. Wikipedia. [Online] http://en.wikipedia.org/wiki/7-Zip.
13. Markovtsev, Vadim. SevenZipSharp. CodePlex. [Online] http://sevenzipsharp.codeplex.com/.
14. Microsoft. C++/CLI Migration Primer. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/ms235289%28VS.80%29.aspx.
15. Microsoft. Exporting from a DLL. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/z4zxe9k8%28v=vs.80%29.aspx.
16. Microsoft. Platform Invoke Tutorial. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/aa288468%28v=vs.71%29.aspx.
17. Microsoft. COM Interop Part 1: C# Client Tutorial. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/aa645736%28v=VS.71%29.aspx.
18. Microsoft. Unions. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/5dxy4b7b%28v=vs.80%29.aspx.
19. Clark, Jason. Calling Win32 DLLs in C# with P/Invoke. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/magazine/cc164123.aspx.
20. Krishnaswamy, Ravi. SafeHandles: the best V2.0 feature of the .NET Framework. MSDN Blogs. [Online] March 15, 2005. http://blogs.msdn.com/b/bclteam/archive/2005/03/15/safehandles-the-best-v2-0-feature-of-the-net-framework-ravi-krishnaswamy.aspx.
21. Wikipedia. Solid compression. Wikipedia. [Online] http://en.wikipedia.org/wiki/Solid_compression.
22. Microsoft. about_Format.ps1xml. TechNet. [Online] http://technet.microsoft.com/en-us/library/dd315396.aspx.
23. Microsoft. PowerShell.exe Console Help. TechNet. [Online] http://technet.microsoft.com/en-us/library/dd315276.aspx.
24. Microsoft. ProviderCapabilities Enumeration. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/system.management.automation.provider.providercapabilities%28v=vs.85%29.aspx.
25. Wikipedia. Base64. Wikipedia. [Online] http://en.wikipedia.org/wiki/Base64.
26. Microsoft. Installer Tool (Installutil.exe). Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/50614e95%28v=VS.80%29.aspx.
27. PowerShell Community. PowerShell Community Extensions. CodePlex. [Online] http://pscx.codeplex.com/.
28. Microsoft. Windows Management Instrumentation. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/aa394582%28v=vs.85%29.aspx.
29. Microsoft. Component Object Model (COM). Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/ms680573%28VS.85%29.aspx.
30. Wikipedia. Common Language Infrastructure. Wikipedia. [Online] http://en.wikipedia.org/wiki/Common_Language_Infrastructure.
31. ECMA. Common Language Infrastructure (CLI) Partitions I to IV. ECMA 335. [Online] June 4, 2006. http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-335.pdf.
32. Microsoft. Global Assembly Cache. Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/yf1d93sz.aspx.
33. Microsoft. Extension Methods (C# Programming Guide). Web MSDN Library. [Online] http://msdn.microsoft.com/en-us/library/bb383977.aspx.
34. ECMA. ECMA International Welcome page. ECMA International. [Online] http://www.ecma-international.org/.
35. International Organization for Standardization. Home page. International Organization for Standardization. [Online] http://www.iso.org/iso/home.htm.


APPENDICES

Contents of the enclosed CD

Bin         The directory containing the compiled binaries.
Doc         The directory containing the documentation of the source code.
Src         The directory containing the source code.
Text        The directory containing the text of this thesis in PDF format.
readme.txt  The text document containing information about the CD structure.