1. Example
<family lastName="McDowell" state="CA">
<person firstName="Gayle">Some Message</person>
</family>
Element --> Tag Attributes END(tag head end) Children END(tag tail end)
Attribute --> Tag Value
END --> 0
Tag --> some predefined mapping to int
Value --> string value END()
family -> l, person ->2, frstName ->3, LastName -> 4j state -> 5).
<family lastName="McDowell" state="CA">
<person firstName="Gayle">Some Message</person>
</family>
Becomes:
1 4 McDowell 5 CA 0 2 3 Gayle 0 Some Message 0 0
/
/ 1. Read tag name // Image
// 2. Read attributes
// 3. End of TagHead
// 4. End of tag , /> OR >
// 1. TagHead
// 2. Attribute
// 3. TagHead End
// 4. Tag Value
// 5. Tag's Children
// 6. TagTail End
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
02
03Tove 01
04Jani 01
05Reminder 01
06Don't forget me this weekend! 01
01
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
http://chimera.labs.oreilly.com/books/1234000001805/ch24.html#the_butler_did_it
@XmlRootElement
public
class
Person
{
public
String
name
;
public
Address
address
;
public
int
age
;
}
public
class
Address
{
public
String
city
,
street
;
public
int
number
,
zip
;
}
<
person
>
<
name
>
Pat
Niemeyer
</
name
>
<
address
>
<
city
>
St
.
Louis
</
city
>
<
street
>
Java
St
.</
street
>
<
number
>
1234
</
number
>
<
zip
>
54321
</
zip
>
</
address
>
<
age
>
42
</
age
>
</
person
>
<!--
Simple
-->
<
Sentence
>
This
is
text
.</
Sentence
>
<!--
Element
-->
<
Paragraph
><
Sentence
>
This
is
text
.</
Sentence
></
Paragraph
>
<!--
Mixed
-->
<
Paragraph
>
<
Sentence
>
This
<
verb
>
is
</
verb
>
text
.</
Sentence
>
</
Paragraph
>
<!--
Empty
-->
<
PageBreak
></
PageBreak
>
Attributes
<
Document
type
=
"LEGAL"
id
=
"42"
>...</
Document
>
<
Image
name
=
"truffle.jpg"
/>
2. Implementation
Part1: Encoding
private Map tagMap; // tag gets mapped to a pre-defined integer value
private static final Byte[] END={0,1};
private List tokens;
private int currentTokenIndex;
byte[] encode(char[] input) throws IOException
{
// STEP1: Process Input
tokenize(input);
currentTokenIndex = 0;
// STEP2: Encode processed input
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
encodeTokens(outputStream);
}
void encodeTokens(ByteArrayOutputStream output)
{
//
nextToken("<");
// 1. Read tag name // Image
String tagName = nextToken();
output.write( getTagCode(tagName) );
// 2. Read attributes
while ( !hasNextToken(">") && !hasNextTokens("/", ">") )
{
// read next attribute
String key = nextToken(); // name
nextToken("="); // =
String value = nextToken(); // "truffle.jpg"
output.write( getTagCode(key) );
for(char c: vlaue.toCharArray() )
output.write(c);
// NOTE: after attribute value, it is the End
output.write(END[0]);
output.write(END[1]);
}
// 3. End of TagHead
output.write(END[0]);
output.write(END[1]);
// 4. End of tag , /> OR >
// CASE1: Attribute End like
if ( hasNextTokens("/", ">")) // />
{
nextToken("/");
nextToken(">");
}
// CASE2: Tag End like Gambardella, Matthew
else
{
nextToken(">");
while( !hasNextTokens("<","/"))
encodeTokens(output); // Encode child
// Encoding tag,
nextToken("<");
nextToken("/");
nextToken(tagName);
nextToken(">");
}
// NOTE: end of tag
output.write(END[0]);
output.write(END[1]);
}
Ralls, Kim
Midnight Rain
Fantasy
5.95
2000-12-16
A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.
Part2: Compress further
You can treat file as a general stream of characters and use any number of comparison techniques: Shannon-Fano coding, Huffman coding or Arithmetic coding
//assuming a mapping of
// family -> 1,
// person ->2,
// firstName -> 3
// lastName -> 4
// state -> 5
//
//
// Some Message
//
// Becomes:
// 1 4 McDowell 5 CA 0 2 3 Gayle 0 Some Message 0 0.
void encode(Element root, StringBuffer sb)
{
// validate the input
if (root == null)
return;
// 1. TagHead
encode(root.getTagName(), sb);
// 2. Attribute
for (Attribute a: root.attributes())
{
encode(a, sb);
}
// 3. TagHead End
encode("0",sb);
// 4. Tag Value
if ( root.hasValue())
encode(root.getVlaue(), sb);
// 5. Tag's Children
for (Element e: root.children())
{
encode(e, sb);
}
// 6. TagTail End
encode("0", sb);
}
void encode(String s, StringBuffer sb)
{
sb.append(s).append(" ");
}
void encode(Attribute a, StirngBuffer sb)
{
encode(a.getTagName(), sb);
encode(a.getValue(), sb);
}
class Element
{
String getTagNmae() {return null;}
Attribute[] attributes() {return null;}
Element[] children() {return null;}
String getValue() {return null;}
boolena hasValue() {return false;}
}
class Attribute
{
String getTagName() {return null;}
String getValue() {return null;}
}
3. Similar Ones
https://github.com/wzhishen/cracking-the-coding-interview/blob/master/src/chap17/Q10.java